A Comparison of JDBC & ODBC Drivers for Amazon Athena



The metrics in this article were measured using the most recent drivers available as of March 2018.

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. With the CData Drivers for Amazon Athena, you get top-of-the-line performance with those queries through standards-based interfaces such as JDBC and ODBC.

Preparation

This article compares the Amazon-supported ODBC Driver for Amazon Athena 1.0.1 [1] to the CData Software ODBC Driver for Amazon Athena [3], and the Amazon-supported JDBC Driver for Amazon Athena 1.0.2 [2] to the CData Software JDBC Driver for Amazon Athena [4]. To make the comparison reproducible, we generated the sample data using the JFairy library. You can download the source for a simple Java application (with required libraries) to create your own sample data.

The test machine specifications are as follows:
Operating System: Windows 10
Processor: Intel® Core™ i7-6700 CPU @ 3.40GHz
Installed Memory (RAM): 8.00 GB
System type: 64-bit Operating System

Since the drivers are being compared side-by-side, the performance of the machine itself is relatively unimportant; what matters is how the drivers compare relative to one another.


Comparison



The relevant details for the table are below:

Table Size    Number of Rows    Number of Columns
1.2 GB        6,000,000         19

The main goal of this investigation was to compare the relative performance of the drivers. We did this by running the same queries with each driver. The queries are listed below:

  1. SELECT * FROM [table] LIMIT 100000
  2. SELECT * FROM [table] LIMIT 1000000
  3. SELECT * FROM [table] LIMIT 6000000


Results



For the ODBC Drivers, we connected to Athena using a DSN from an ADO.NET console application and executed the above queries repeatedly. The results were read and stored in a new object variable for every field in each row. For the JDBC Drivers, we modeled the same behavior in a simple Java application, using the java.sql library. The times you see in the chart below are based on averages, which should serve to level out any outliers due to spikes in network traffic, etc.
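The JDBC side of the procedure described above can be sketched roughly as follows. This is a minimal illustration, not the application used for the published numbers: the class name `QueryTimer`, its method names, and the command-line arguments are hypothetical, and the timed region (execution plus reading every field) is an assumption, since the article does not say exactly where the timer starts and stops.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical timing harness; names and structure are assumptions.
final class QueryTimer {

    /** Runs one query, reads every field of every row, and returns elapsed seconds. */
    static double timeQuery(Connection conn, String sql) throws SQLException {
        long start = System.nanoTime();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int columnCount = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                for (int col = 1; col <= columnCount; col++) {
                    // Store each field in a fresh object variable, as described above.
                    Object field = rs.getObject(col);
                }
            }
        }
        return (System.nanoTime() - start) / 1e9;
    }

    /** Arithmetic mean of the recorded run times. */
    static double average(double[] seconds) {
        if (seconds.length == 0) {
            throw new IllegalArgumentException("at least one run is required");
        }
        double total = 0.0;
        for (double s : seconds) {
            total += s;
        }
        return total / seconds.length;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: QueryTimer <jdbc-url> <sql> <runs>");
            return;
        }
        int runs = Integer.parseInt(args[2]);
        double[] seconds = new double[runs];
        // The JDBC URL format depends on the driver on the classpath.
        try (Connection conn = DriverManager.getConnection(args[0])) {
            for (int i = 0; i < runs; i++) {
                seconds[i] = timeQuery(conn, args[1]);
            }
        }
        System.out.printf("average over %d runs: %.2f s%n", runs, average(seconds));
    }
}
```

Running the same class against each driver's connection URL with an identical query and run count keeps the comparison like-for-like; only the driver on the classpath changes.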

Query Times by Driver (in seconds)
Query                  Amazon ODBC    CData ODBC                Amazon JDBC    CData JDBC
1 (100,000 rows)       45.34          10.46 (> 3x faster)       42.74          6.84 (> 5x faster)
2 (1,000,000 rows)     427.09         47.29 (> 8x faster)       435.53         36.22 (> 11x faster)
3 (6,000,000 rows)     2,396.64       206.55 (> 10x faster)     2,604.46       191.58 (> 12x faster)
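The "> Nx faster" labels in the table appear to be consistent with the time saved relative to the CData run, that is (Amazon time - CData time) / CData time, rounded down. That reading is inferred from the published numbers rather than stated in the article. The sketch below computes both that figure and the plain ratio of the two times; the class and method names are hypothetical.

```java
// Hypothetical helper; the "times faster" definition is an inference
// from the table, not a formula given in the article.
final class Speedup {

    /** How many times longer the slower run took (slow / fast). */
    static double ratio(double slowSeconds, double fastSeconds) {
        return slowSeconds / fastSeconds;
    }

    /** "Times faster" read as time saved relative to the faster run. */
    static double timesFaster(double slowSeconds, double fastSeconds) {
        return (slowSeconds - fastSeconds) / fastSeconds;
    }

    public static void main(String[] args) {
        String[] labels = {
            "ODBC, 100,000 rows", "JDBC, 100,000 rows",
            "ODBC, 1,000,000 rows", "JDBC, 1,000,000 rows",
            "ODBC, 6,000,000 rows", "JDBC, 6,000,000 rows"
        };
        // {Amazon seconds, CData seconds} pairs taken from the results table.
        double[][] pairs = {
            {45.34, 10.46}, {42.74, 6.84},
            {427.09, 47.29}, {435.53, 36.22},
            {2396.64, 206.55}, {2604.46, 191.58}
        };
        for (int i = 0; i < pairs.length; i++) {
            System.out.printf("%-22s ratio %.2f, times faster %.2f%n",
                    labels[i],
                    ratio(pairs[i][0], pairs[i][1]),
                    timesFaster(pairs[i][0], pairs[i][1]));
        }
    }
}
```

Under the plain-ratio reading, each CData run finished in roughly one fourth to one thirteenth of the corresponding Amazon time, so the table's labels are the more conservative of the two ways to state the gap.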

As the results show, the CData drivers significantly outperformed the Amazon drivers when working with large result sets, retrieving and processing results anywhere from more than three to more than twelve times faster. The performance gap grew even wider as the size of the result set increased.

The average runtime for the larger two queries is compared in the charts below:

[Chart: Results for 1,000,000 Rows]

[Chart: Results for 6,000,000 Rows]


Conclusion



The CData drivers' performance far exceeds that of the Amazon-supported drivers. Our developers have put extensive effort into optimizing how the drivers process the results returned by Amazon, to the point that the drivers appear to be limited only by web traffic and server processing times. This advantage is most pronounced when a driver is required to process large amounts of data.

References



  1. https://docs.aws.amazon.com/athena/latest/ug/connect-with-odbc.html
  2. https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html
  3. https://www.cdata.com/drivers/athena/odbc/
  4. https://www.cdata.com/drivers/athena/jdbc/