A Comparison of Drivers for Amazon DynamoDB



The metrics in this article are from the most up-to-date drivers available as of July 2019.

In this article, we compare the performance of the CData JDBC Driver for Amazon DynamoDB with that of a comparable JDBC driver produced by another company (Competitor 1). In our testing, the CData Driver outperformed the Competitor driver, querying and processing data more than three times faster on a small table and more than five times faster on a large one. The difference in performance is largely due to better use of client-side and server-side resources. Details of the comparisons follow.

Since the drivers are being compared side-by-side, the performance of the machine itself is relatively unimportant; what matters is how the drivers compare relative to one another.

The Data



To provide a reproducible comparison, we use the sample restaurants dataset made publicly available by MongoDB, Inc. To create a large dataset (around 10 million records), we added the original dataset to Amazon DynamoDB multiple times.

Table            Number of Rows
restaurants      25,360
restaurants_2    10,020,921

JDBC Driver Read Performance



First, we compared the relative performance of the drivers by running the same queries through each driver in a simple Java application. To simulate actual data processing beyond simply fetching rows, we read and process the value of every field in each row. The exact queries tested are listed below:

  1. SELECT borough, restaurant_id, cuisine, name FROM restaurants
  2. SELECT borough, restaurant_id, cuisine, name FROM restaurants_2
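The read test described above can be sketched roughly as follows. This is a minimal illustration, not the program used for the published numbers: the class name `ReadBenchmark` and its helper methods are hypothetical, the JDBC URL is passed in as an argument because the connection strings used in the tests are not given, and including connection time in the measurement is an assumption.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical harness: runs each query and reads every field of every row.
class ReadBenchmark {

    // Reads every column of every row so the driver has to materialize each value.
    // Returns the number of rows processed.
    static long drain(ResultSet rs) throws SQLException {
        int columns = rs.getMetaData().getColumnCount();
        long rows = 0;
        while (rs.next()) {
            for (int i = 1; i <= columns; i++) {
                rs.getObject(i); // JDBC column indexes start at 1
            }
            rows++;
        }
        return rows;
    }

    // Times one query (connect, execute, read all rows) in milliseconds.
    // Assumption: connection time is part of the measurement.
    static long timeQueryMillis(String jdbcUrl, String sql) throws SQLException {
        long start = System.nanoTime();
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            drain(rs);
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.err.println("usage: java ReadBenchmark <jdbc-url>");
            return;
        }
        String[] queries = {
            "SELECT borough, restaurant_id, cuisine, name FROM restaurants",
            "SELECT borough, restaurant_id, cuisine, name FROM restaurants_2"
        };
        for (String sql : queries) {
            System.out.println(timeQueryMillis(args[0], sql) + " ms: " + sql);
        }
    }
}
```

Running the same class against each driver, with only the JDBC URL changed and the driver JAR on the classpath, keeps the measured work identical across the two drivers.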

We set the provisioned read capacity in DynamoDB for the queried tables to 1000 read capacity units for the duration of our tests. The query times are below.

JDBC Query Times by Company (in milliseconds)

Query            CData Software       Competitor 1
1 (~25k rows)    2728 (+217.9%)       8673
2 (~10m rows)    139818 (+462.4%)     786368
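The percentages in parentheses are consistent with reading them as "how much longer the Competitor driver took, relative to the CData time", that is, (slower / faster - 1) × 100. The article does not state the formula, so treat the sketch below as an inference that happens to reproduce the published figures; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the percentages from the query times.
class Speedup {

    // Percentage by which the slower time exceeds the faster one.
    static double percentFaster(long fastMillis, long slowMillis) {
        return ((double) slowMillis / fastMillis - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Query times in milliseconds, taken from the results table.
        System.out.println(String.format(Locale.ROOT, "Query 1: +%.1f%%", percentFaster(2728, 8673)));
        System.out.println(String.format(Locale.ROOT, "Query 2: +%.1f%%", percentFaster(139818, 786368)));
    }
}
```

Under this reading, +217.9% corresponds to the Competitor driver taking roughly 3.2 times as long on the small table, and +462.4% to roughly 5.6 times as long on the large one.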

Note that these numbers come from a non-default configuration of the Competitor driver. By default, the Competitor driver issues each query using a single thread, whereas the CData Driver uses up to four threads by default and can be configured to use as many as needed. For this article, we tested only after increasing the Competitor driver's thread count to four to match the default setting of the CData Driver, so with both drivers as installed, the CData Driver compares even more favorably than our results indicate. Even with matched thread counts, the CData Driver retrieves and processes result sets significantly faster than the Competitor driver.

JDBC Driver Resource Usage



While testing read performance, we also measured client-side and server-side resource usage, looking specifically at client-side memory and CPU usage and at consumption of the provisioned read capacity units (RCUs). The charts below were captured by running a sample Java program and monitoring its CPU and memory usage with Java VisualVM. We used Java 8 update 211 with a maximum heap size of 4.27 GB.
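For readers who want similar numbers without attaching VisualVM, the JVM exposes heap and CPU figures through its management beans. The sketch below is an alternative to the VisualVM setup described above, not the method used for the charts; the class name `ResourceSnapshot` is hypothetical, and the CPU reading relies on the `com.sun.management` extension, which is present on HotSpot/OpenJDK but not guaranteed on every JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper for sampling heap and CPU usage from inside the test program.
class ResourceSnapshot {

    private static final long MB = 1024L * 1024L;

    // Heap currently in use, in megabytes.
    static long usedHeapMb() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / MB;
    }

    // Maximum heap the JVM will try to use (controlled by -Xmx), in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    // CPU load of this JVM process between 0.0 and 1.0, or a negative value if unavailable.
    static double processCpuLoad() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuLoad();
        }
        return -1.0;
    }

    public static void main(String[] args) {
        System.out.println("heap used (MB): " + usedHeapMb());
        System.out.println("heap max (MB):  " + maxHeapMb());
        System.out.println("process CPU:    " + processCpuLoad());
    }
}
```

Sampling these values on a timer while the query runs gives a rough equivalent of the VisualVM charts. The first CPU reading is often zero or negative because the JVM has not yet collected enough samples.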

Querying with High Read Capacity

For this comparison, we ran a query for a large number of rows, with a high read capacity allocation for the DynamoDB table: SELECT borough, restaurant_id, cuisine, name FROM restaurants_2

[Chart: CData Driver CPU and memory usage]

[Chart: Competitor 1 Driver CPU and memory usage]

When we provision a high read capacity (1000 read capacity units), the differences in how each driver utilizes available client-side resources are stark. Based on the charts, the CData Driver maintains high client-side resource usage, using around 37% of the CPU and averaging nearly 700 MB of heap usage.

In contrast, the Competitor driver uses only around 4% of the CPU and averages around 110 MB of heap usage. By making better use of client-side resources, the CData Driver requests and processes data more than five times faster than the Competitor driver. Finishing the read faster not only saves time; it also means that you are making the best use of the resources provisioned for the DynamoDB table.

DynamoDB Read Capacity Consumption

While testing client-side resource usage, we also captured the read capacity consumed by each driver (with 1000 read capacity units provisioned). The graph below shows the read capacity consumed by each driver for the same query. The first spike in consumed read capacity represents the CData Driver, while the second, smaller bump represents the Competitor driver.

[Graph: DynamoDB CloudWatch Metrics]

Based on the graph, the CData Driver makes significantly better use of the provisioned read capacity, utilizing around 70% of the available read capacity units. The Competitor driver, meanwhile, uses less than 20% of the available read capacity (despite being configured in the JDBC URL to use 100%), which further explains why the Competitor driver takes longer to request and process the table data.
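Utilization figures like these can be derived from the CloudWatch `ConsumedReadCapacityUnits` metric, which is reported as a sum over each period: dividing the sum by the period length in seconds gives average units consumed per second, which can then be compared with the provisioned value. The sketch below shows that arithmetic only. The class name is hypothetical, and the sample input (42,000 units over a 60-second period) is an illustrative value chosen to land on 70%, not a figure taken from the tests.

```java
// Hypothetical helper: converts a CloudWatch period sum into a utilization fraction.
class CapacityUtilization {

    // consumedSum:          Sum of ConsumedReadCapacityUnits for one period
    // periodSeconds:        length of that period in seconds
    // provisionedPerSecond: provisioned read capacity units for the table
    static double utilization(double consumedSum, int periodSeconds, double provisionedPerSecond) {
        double consumedPerSecond = consumedSum / periodSeconds;
        return consumedPerSecond / provisionedPerSecond;
    }

    public static void main(String[] args) {
        // Illustrative input only: 42,000 units in 60 s against 1000 provisioned units.
        double u = utilization(42_000, 60, 1000);
        System.out.println("utilization: " + Math.round(u * 100) + "%");
    }
}
```

A driver that keeps this fraction high for the duration of a scan finishes sooner, because a table scan has to read the same amount of data regardless of how quickly it is requested.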

Conclusion



CData Software drivers regularly prove to be faster than the equivalent competitor product, particularly when dealing with large datasets. We realize that speed is only one measurement, but the performance of our drivers is a reliable indicator of the depth and technical prowess embedded in all of our drivers and data access technologies. Our developers have spent countless hours optimizing how the drivers process the results returned by DynamoDB, to the point that the drivers seem to be limited only by network traffic and server processing times.

Download a free, 30-day trial of any of our DynamoDB drivers and experience the CData difference for yourself.