A Comparison of Drivers for Azure Cosmos DB



The metrics in this article are from the most up-to-date drivers available as of July 2019.

In this article, we compare the performance of the CData ODBC Driver for Cosmos DB to the equivalent native driver, looking at two measures.

First, we compare read performance, measuring the time it takes to query an Azure Cosmos DB instance for data and process the result set (reading every field of every row). In our tests, the CData Driver was at least three times faster than the native driver.

Next, we compare the resource usage of each driver for read queries, focusing on CPU and network usage, and use these measurements to explain why the CData Driver performs better. We then extend this test to see how both drivers scale as more server resources are allocated to Cosmos DB, by raising the provisioned Request Units (RUs) in the Azure Cosmos DB settings.

Because the drivers are compared side by side, the absolute performance of the test machine is relatively unimportant; what matters is how the drivers perform relative to one another.

The Data



To provide a reproducible comparison, we copied the sample restaurants dataset made publicly available by MongoDB, Inc., duplicating its records until the database held 10 million of them. Details for the queried table are below:

Table       | Number of Rows
restaurants | 10,000,000

ODBC Driver Read Performance



First, we tested the relative performance of the drivers by running the same two queries with each driver:

  1. SELECT * FROM restaurants LIMIT 25000
  2. SELECT * FROM restaurants

To simulate actual processing of the data from Cosmos DB, we read the values of every field in each row. The times required for each product to process the results are in the table below.

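ODBC Query Times by Driver (in milliseconds)

Rows Queried | CData Software  | Native
25,000       | 1,441 (+382%)   | 6,945
10,000,000   | 437,526 (+203%) | 1,325,441

Percentages show how much longer the native driver took than the CData Driver for the same query.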

As the results show, the CData ODBC Driver processed both result sets significantly faster than the native driver: nearly five times faster for 25,000 rows and about three times faster for the full 10 million rows.

ODBC Driver Resource Usage



After testing read performance, we ran a separate test to measure and compare resource usage, looking specifically at network and CPU usage. The charts below were generated by running a sample C program and monitoring it with Windows Resource Monitor (we ran two instances of Resource Monitor so we could capture CPU usage and network usage simultaneously). For the resource usage test, we looked at two separate use cases based on the Request Units (RUs) allocated in Azure Cosmos DB, and we limited the results to 2.5 million rows.

1. Querying a Large Number of Rows with Minimum RUs

For the initial comparison, we ran a query for a large number of rows, with the minimum number of RUs (400) in Cosmos DB: SELECT * FROM restaurants LIMIT 2500000

CData Driver

Native Driver*

* Note the change in scale for the Network graph.

The graphs above show that when the RUs are set to the minimum, both drivers retrieve data and then pause retrieval, due to rate limiting based on the allocated RUs. However, the CData Driver appears to process the data while retrieving it, whereas the native driver appears to retrieve the data first and then process it. Additionally, with the minimum RUs set, the CData Driver's network usage peaks between 30 and 40 Mbps, whereas the native driver peaks between 6 and 7 Mbps (note the scale change for the native driver). The higher retrieval rate indicates that the CData Driver makes better use of the available network bandwidth, which translates into faster query processing.

2. Querying a Large Number of Rows with High RUs

For the second comparison, we ran the same query for a large number of rows, this time with high RUs in Cosmos DB: SELECT * FROM restaurants LIMIT 2500000

CData Driver

Native Driver*

* Note the change in scale for the Network graph.

When we set the RUs to a higher value (3,500), the performance of the CData Driver improves dramatically and the differences in how each driver uses the available resources become more pronounced. The graphs show that the CData Driver keeps retrieving and processing the data simultaneously but retrieves much more data per second (around 70 Mbps), making full use of the allocated RUs. The native driver sees little to no improvement with higher RUs and continues to retrieve the data first and process it separately. It retrieves data at the same rate (around 7 Mbps) as when the RUs were set to the minimum, showing no sign of using the additional RUs.

Conclusion



The CData Software driver consistently proved faster than the native driver, particularly with large data sets and high RU allocations. Our developers have spent countless hours optimizing how the driver requests data and processes the results returned by the Cosmos DB database, capitalizing on the RUs allocated to a system. This engineering means you get the best performance possible based on your allocated RUs. Download a free, 30-day trial of any of our Cosmos DB drivers and experience the CData difference for yourself.
