How to Query Live Amazon Athena Data in Natural Language in Python using LlamaIndex



Use LlamaIndex to query live Amazon Athena data in natural language using Python.

Start querying live data from Amazon Athena using the CData Python Connector for Amazon Athena. Leverage the power of AI with LlamaIndex and retrieve insights using simple English, eliminating the need for complex SQL queries. Benefit from real-time data access that enhances your decision-making process, while easily integrating with your existing Python applications.

With built-in, optimized data processing, the CData Python Connector offers unmatched performance for interacting with live Amazon Athena data in Python. When you issue complex SQL queries from Python, the driver pushes supported SQL operations, like filters and aggregations, directly to Amazon Athena and utilizes the embedded SQL engine to process unsupported operations client-side (often SQL functions and JOIN operations).

Whether you're analyzing trends, generating reports, or visualizing data, our Python connectors enable you to harness the full potential of your live data source with ease.

About Amazon Athena Data Integration

CData provides the easiest way to access and integrate live data from Amazon Athena. Customers use CData connectivity to:

  • Authenticate securely using a variety of methods, including IAM credentials, access keys, and Instance Profiles, catering to diverse security needs and simplifying the authentication process.
  • Streamline their setup and quickly resolve issues with detailed error messaging.
  • Enhance performance and minimize strain on client resources with server-side query execution.

Users frequently integrate Athena with tools like Tableau, Power BI, and Excel to run in-depth analytics from their preferred environment.

To learn more about unique Amazon Athena use cases with CData, check out our blog post: https://www.cdata.com/blog/amazon-athena-use-cases.


Getting Started


Overview

Here's how to query live data with CData's Python connector for Amazon Athena data using LlamaIndex:

  • Import required Python, CData, and LlamaIndex modules for logging, database connectivity, and NLP.
  • Retrieve your OpenAI API key for authenticating API requests from your application.
  • Connect to live Amazon Athena data using the CData Python Connector.
  • Initialize the OpenAI language model.
  • Create SQLDatabase and NLSQLTableQueryEngine instances to handle natural language queries.
  • Execute natural language queries (e.g., "Who are the top-earning employees?") to get structured responses from the database.
  • Analyze retrieved data to gain insights and inform data-driven decisions.

Import Required Modules

Import the necessary modules for logging, CData connectivity, database connections, and natural language querying.

import os
import logging
import sys

# Configure logging
logging.basicConfig(stream=sys.stdout, level=logging.INFO, force=True)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

# Import required modules for CData and LlamaIndex
import cdata.amazonathena as mod
from sqlalchemy import create_engine
from llama_index.core.query_engine import NLSQLTableQueryEngine
from llama_index.core import SQLDatabase
from llama_index.llms.openai import OpenAI
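Before running the imports, it can help to confirm that the required packages are installed. The following is a minimal sketch that uses only the standard library. The helper name missing_modules is hypothetical and is not part of CData or LlamaIndex; the module names are taken from the import statements in this article.

```python
import importlib.util

# Module names taken from the import statements in this article.
REQUIRED_MODULES = [
    "cdata.amazonathena",
    "sqlalchemy",
    "llama_index.core",
    "llama_index.llms.openai",
]


def missing_modules(names):
    """Return the subset of `names` that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (for example "cdata") is itself absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_modules(REQUIRED_MODULES):
        print(f"Missing module: {name}")
```

If the script prints nothing, every module in the list was found and the imports should succeed.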

Set Your OpenAI API Key

To use OpenAI's language model, make your API key available as the OPENAI_API_KEY environment variable.

# Retrieve the OpenAI API key from the environment variables
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

As an alternative, you can add your API key directly within your code (though this method is not recommended for production environments due to security risks):

# Directly set the API key (not recommended for production use)
OPENAI_API_KEY = "your-api-key-here"
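Reading os.environ["OPENAI_API_KEY"] directly raises a bare KeyError when the variable is missing. A small wrapper can produce a clearer message instead. This is only a sketch; load_openai_api_key is a hypothetical helper name, and the env parameter exists purely so the function can be exercised without touching the real environment.

```python
import os


def load_openai_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise a readable error."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running this script."
        )
    return key


# Usage (assumes the variable has been exported in your shell):
# OPENAI_API_KEY = load_openai_api_key()
```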

Create a Database Connection

Next, establish a connection to Amazon Athena with the CData connector, using a connection string that contains the required connection properties.

Authenticating to Amazon Athena

To authorize Amazon Athena requests, provide the credentials for an administrator account or for an IAM user with custom permissions:

  • Set AccessKey to the access key ID.
  • Set SecretKey to the secret access key.

Note: Though you can connect as the AWS account administrator, it is recommended to use IAM user credentials to access AWS services.

Obtaining the Access Key

To obtain the credentials for an IAM user, follow the steps below:

  1. Sign in to the IAM console.
  2. In the navigation pane, select Users.
  3. To create or manage the access keys for a user, select the user and then select the Security Credentials tab.

To obtain the credentials for your AWS root account, follow the steps below:

  1. Sign in to the AWS Management Console with the credentials for your root account.
  2. Select your account name or number and select My Security Credentials in the menu that is displayed.
  3. Click Continue to Security Credentials and expand the Access Keys section to manage or create root account access keys.

Authenticating from an EC2 Instance

If you are using the CData Python Connector for Amazon Athena from an EC2 instance and have an IAM role assigned to the instance, you can use the IAM role to authenticate. To do so, set UseEC2Roles to true and leave AccessKey and SecretKey empty. The connector will automatically obtain your IAM role credentials and authenticate with them.

Authenticating as an AWS Role

In many situations it may be preferable to use an IAM role for authentication instead of the direct security credentials of an AWS root user. To do so, specify the RoleARN; the connector then attempts to retrieve credentials for the specified role. If you are connecting to AWS from outside (rather than from an already-authenticated environment such as an EC2 instance), you must also specify the AccessKey and SecretKey of an IAM user that assumes the role. Roles may not be used when specifying the AccessKey and SecretKey of an AWS root user.

Authenticating with MFA

For users and roles that require multi-factor authentication, specify the MFASerialNumber and MFAToken connection properties. The connector then submits the MFA credentials in a request to retrieve temporary authentication credentials. The duration of the temporary credentials can be controlled with the TemporaryTokenDuration property (default 3600 seconds).
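The authentication options described above can be summarized as a small function that assembles the matching connection properties. This is a sketch under stated assumptions: athena_auth_properties is a hypothetical helper, and the property names are copied from the prose in this section (the example connection string in this article uses AWS-prefixed forms such as AWSAccessKey, so check the connector's documentation for the names your version expects).

```python
def athena_auth_properties(access_key=None, secret_key=None, use_ec2_roles=False,
                           role_arn=None, mfa_serial=None, mfa_token=None,
                           token_duration=None):
    """Return a dict of authentication properties following the rules in the text."""
    props = {}
    if use_ec2_roles:
        # EC2 instance role: AccessKey and SecretKey must be left empty.
        if access_key or secret_key:
            raise ValueError("Leave AccessKey and SecretKey empty with UseEC2Roles")
        props["UseEC2Roles"] = "true"
    else:
        # Outside EC2, IAM user credentials are needed (also to assume a role).
        if not (access_key and secret_key):
            raise ValueError("AccessKey and SecretKey are required")
        props["AccessKey"] = access_key
        props["SecretKey"] = secret_key
    if role_arn:
        props["RoleARN"] = role_arn
    if mfa_serial or mfa_token:
        # MFA needs both the device serial number and a current token.
        if not (mfa_serial and mfa_token):
            raise ValueError("MFASerialNumber and MFAToken must be set together")
        props["MFASerialNumber"] = mfa_serial
        props["MFAToken"] = mfa_token
        if token_duration is not None:
            props["TemporaryTokenDuration"] = str(token_duration)
    return props
```

For example, athena_auth_properties(use_ec2_roles=True) yields only the UseEC2Roles setting, while passing an access key, secret key, and role_arn yields the three properties needed to assume a role from outside AWS.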

Connecting to Amazon Athena

In addition to the AccessKey and SecretKey properties, specify Database, S3StagingDirectory and Region. Set Region to the region where your Amazon Athena data is hosted. Set S3StagingDirectory to a folder in S3 where you would like to store the results of queries.

If Database is not set in the connection, the connector connects to the default database set in Amazon Athena.
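Hand-writing a long connection string is error-prone, so one option is to build it from a dictionary. The sketch below is illustrative only: build_connection_string is a hypothetical helper, the Key='value'; format mirrors the example connection string in this article, and the property names and sample values are the ones used there. Reading real credentials from environment variables rather than literals is advisable.

```python
def build_connection_string(props):
    """Join connection properties as Key='value'; pairs, skipping empty values."""
    parts = []
    for name, value in props.items():
        if value is None or value == "":
            continue  # Unset properties fall back to the connector's defaults.
        text = str(value)
        if "'" in text or ";" in text:
            raise ValueError(f"Unsupported character in value for {name}")
        parts.append(f"{name}='{text}';")
    return "".join(parts)


# Sample values copied from the example in this article.
connection_string = build_connection_string({
    "AWSAccessKey": "a123",
    "AWSSecretKey": "s123",
    "AWSRegion": "IRELAND",
    "Database": "sampledb",
    "S3StagingDirectory": "s3://bucket/staging/",
})
```

The resulting string can be appended after the ":///?" part of the URL passed to create_engine.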

# Create a database engine using the CData Python Connector for Amazon Athena
engine = create_engine("cdata_amazonathena_2:///?AWSAccessKey='a123';AWSSecretKey='s123';AWSRegion='IRELAND';Database='sampledb';S3StagingDirectory='s3://bucket/staging/';")

Initialize the OpenAI Instance

Create an instance of the OpenAI language model. Here, you can specify parameters like temperature and the model version.

# Initialize the OpenAI language model instance
llm = OpenAI(temperature=0.0, model="gpt-3.5-turbo", api_key=OPENAI_API_KEY)

Set Up the Database and Query Engine

Now, set up the SQL database and the query engine. The NLSQLTableQueryEngine allows you to perform natural language queries against your SQL database.

# Create a SQL database instance
sql_db = SQLDatabase(engine)  # This includes all tables

# Initialize the query engine for natural language SQL queries
query_engine = NLSQLTableQueryEngine(sql_database=sql_db, llm=llm)
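The comment in the snippet above notes that SQLDatabase(engine) includes all tables. For a large Athena catalog you may prefer to limit what the model sees. The following is a hedged sketch, assuming the include_tables argument of SQLDatabase and the tables argument of NLSQLTableQueryEngine behave as in current LlamaIndex releases. The function name build_query_engine and the table name Employees are hypothetical, and table reflection through the CData dialect has not been verified here.

```python
def build_query_engine(engine, llm, table_names):
    """Create a natural language query engine limited to `table_names`."""
    # Imported lazily so this helper can be defined before LlamaIndex is installed.
    from llama_index.core import SQLDatabase
    from llama_index.core.query_engine import NLSQLTableQueryEngine

    # Reflect only the listed tables instead of the whole database.
    sql_db = SQLDatabase(engine, include_tables=list(table_names))
    # Restrict text-to-SQL generation to the same tables.
    return NLSQLTableQueryEngine(
        sql_database=sql_db, llm=llm, tables=list(table_names)
    )


# Example (assumes `engine` and `llm` from the earlier steps and a table named Employees):
# query_engine = build_query_engine(engine, llm, ["Employees"])
```

Limiting the table list keeps the schema sent to the model small, which tends to reduce token usage and the chance of the model picking an unrelated table.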

Execute a Query

Now you can execute a natural language query against your live data source. In this example, we query for the top-earning employees.

# Define your query string
query_str = "Who are the top earning employees?"

# Get the response from the query engine
response = query_engine.query(query_str)

# Print the response
print(response)
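Because the answer is generated from SQL that the model writes, it is often worth checking the statement that was actually run. Here is a minimal sketch, assuming the response object exposes a metadata dictionary with a sql_query entry (as NLSQLTableQueryEngine responses do in recent LlamaIndex versions; verify this for the version you have installed). The helper name summarize_response is hypothetical.

```python
def summarize_response(response):
    """Return the answer text and the generated SQL (if available) as a dict."""
    # `metadata` may be missing or None depending on the response type.
    metadata = getattr(response, "metadata", None) or {}
    return {
        "answer": str(response),
        "sql_query": metadata.get("sql_query"),
    }


# Example (assumes `response` from the query above):
# summary = summarize_response(response)
# print(summary["sql_query"])
# print(summary["answer"])
```

Reviewing the generated SQL alongside the answer makes it easier to spot cases where the model queried the wrong table or column.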

Download a free, 30-day trial of the CData Python Connector for Amazon Athena and start querying your live data seamlessly. Experience the power of natural language processing and unlock valuable insights from your data today.
