Data is the new oil. Simply having access to data is not enough. To make informed decisions, you must be able to visualize and analyze your data in a meaningful way.
In this article I will show you how you can build a self-serve analytics platform by letting your customers ask questions about data in plain English.
Azure Data Explorer is a powerful analytics platform designed to handle large volumes of data with low latency.
It is a cloud-based service in Azure that enables users to perform real-time data analysis on large datasets, making it ideal for businesses that need to process vast quantities of data quickly.
One of the key benefits of Kusto is its scalability. The platform is capable of handling petabytes of data with ease, making it ideal for businesses of all sizes. This means that companies can easily scale up or down their data processing capabilities as their needs change, without having to worry about investing in additional hardware or infrastructure.
Kusto is also highly versatile, supporting a wide range of data sources, including structured, semi-structured, and unstructured data. This makes it an ideal choice for businesses that need to process a diverse range of data types, such as log files, IoT data, and social media feeds.
One of the most popular use cases for Kusto is in conjunction with Power BI, Microsoft's business intelligence tool.
By integrating Kusto with Power BI, businesses can build powerful analytics solutions that enable them to gain insights into their data quickly and easily. This integration enables users to create interactive dashboards and reports that provide a real-time view of their data, helping them make more informed decisions.
The need for self-serve data analytics is becoming increasingly important, customers want to be able to access and analyze their own data, without having to rely on engineering or data analysts.
This is where the ability to query a database in plain English becomes essential.
With the help of GPT 3.5, you can translate natural language into queries. By using this paradigm you can now offer your customers the ability to ask questions about their data in plain English.
This means that customers can ask questions like
"What were my sales last month?" or "Which products are selling the most?" and your system will automatically translate these questions into queries and query the database.
OpenAI can be used to query databases in two ways, by translating plain English customer queries into query language or by using embeddings to convert the customer's question into a numerical representation that can be matched with the stored data, the best approach to use depends on the type of data that is being analyzed.
This is because embeddings can capture the semantic meaning of text data, even if the wording or phrasing of a question is not precise. For more information on how to do this you can read my previous article – Build a chatbot to query your documentation using Langchain and Azure OpenAI
In this article we will use the first approach to “translate” the customer’s data into a query language, in our case KQL (Kusto query language) and we will query an analytical DB containing sales data.
Here is the link on how to create a free cluster for this demo
AAD_TENANT_ID = "<your aad tenant id>"
KUSTO_CLUSTER = "https://<your azure data explorer name>.westeurope.kusto.windows.net"
KUSTO_DATABASE = "<your kusto database name>"
KUSTO_TABLE = "sales"
KUSTO_MANAGED_IDENTITY_APP_ID = "<your aad app registration id>"
KUSTO_MANAGED_IDENTITY_SECRET = "<your kusto managed identity secret>"
OPENAI_DEPLOYMENT_ENDPOINT = "https://<your openai>.openai.azure.com/"
OPENAI_API_KEY = "<your openai api key>"
OPENAI_DEPLOYMENT_NAME = "<your gpt35 deployment name>"
OPENAI_DEPLOYMENT_VERSION = "2023-03-15-preview"
OPENAI_MODEL_NAME="gpt-35-turbo"
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder
from azure.kusto.data.exceptions import KustoServiceError
from azure.kusto.data.helpers import dataframe_from_result_table
from dotenv import load_dotenv
import utils
import pandas as pd
import os
import openai
import json
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_DEPLOYMENT_ENDPOINT = os.getenv("OPENAI_DEPLOYMENT_ENDPOINT")
OPENAI_DEPLOYMENT_NAME = os.getenv("OPENAI_DEPLOYMENT_NAME")
OPENAI_MODEL_NAME = os.getenv("OPENAI_MODEL_NAME")
OPENAI_DEPLOYMENT_VERSION = os.getenv("OPENAI_DEPLOYMENT_VERSION")
# Configure OpenAI API
openai.api_type = "azure"
openai.api_version = OPENAI_DEPLOYMENT_VERSION
openai.api_base = OPENAI_DEPLOYMENT_ENDPOINT
openai.api_key = OPENAI_API_KEY
AAD_TENANT_ID = os.getenv("AAD_TENANT_ID")
KUSTO_CLUSTER = os.getenv("KUSTO_CLUSTER")
KUSTO_DATABASE = os.getenv("KUSTO_DATABASE")
KUSTO_TABLE = os.getenv("KUSTO_TABLE")
KUSTO_MANAGED_IDENTITY_APP_ID = os.getenv("KUSTO_MANAGED_IDENTITY_APP_ID")
KUSTO_MANAGED_IDENTITY_SECRET = os.getenv("KUSTO_MANAGED_IDENTITY_SECRET")
cluster = KUSTO_CLUSTER
kcsb = KustoConnectionStringBuilder.with_aad_application_key_authentication(cluster, KUSTO_MANAGED_IDENTITY_APP_ID, KUSTO_MANAGED_IDENTITY_SECRET, AAD_TENANT_ID)
client = KustoClient(kcsb)
query = "sales | take 10"
response = client.execute("retail", query)
for row in response.primary_results[0]:
print("Item:{}".format(row["item_description"]))
kusto_query_template_prefix = """
<|im_start|>system
I have an Azure Data Explorer (Kusto) table containing the following columns:
year, month, supplier, item_code, item_description,item_type,retail_sales,retail_transfers,warehouse_sales
Write an KQL query based on the user input below. Answer in a concise manner. Answer only with the KQL query where the table name is T, no extra text.
user input:
"""
template_sufix = "<|im_end|>\n<|im_start|>assistant"
def call_openai(template_prefix, text):
prompt = template_prefix + text + template_sufix
response = openai.Completion.create(
engine=utils.OPENAI_DEPLOYMENT_NAME,
prompt=prompt,
temperature=0,
max_tokens=4096,
top_p=0.95,
frequency_penalty=0,
presence_penalty=0,
stop=["<|im_end|>"])
response = response['choices'][0]['text']
response = utils.remove_chars("\n", response)
response=utils.start_after_string("Answer:", response)
response=utils.remove_tail_tags("<|im_end|>", response)
return response
def call_openai_kql_response(text):
response = call_openai(kusto_query_template_prefix, text)
query = response.replace("T", "sales")
query = query.replace("```", "")
response = client.execute("retail", query)
df = dataframe_from_result_table(response.primary_results[0])
return df
df = call_openai_kql_response("I would like to get the total retail sales for 2020 by months")
print(df)
df = call_openai_kql_response("I would like to get the top 10 suppliers by total retail sales")
print(df)
df = call_openai_kql_response("I would like to get the total retail sales by year and month for the top 10 suppliers")
print(df)
This is just sample code.
In order to create a highly scalable and efficient system to let your customers query your data in English, you must build a robust infrastructure.
Some of the things you need to consider are:
The source code for this article can be found in this repo:
I hope you found this article enjoyable and easy to follow and it will help you get into the right direction in order to build a self-serve analytics application.
Thanks
Denise
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.