CData Elasticsearch Driver Features & Differentiators



Elasticsearch is a distributed, RESTful, full-text search engine designed to store, index, retrieve, and manage document-oriented or semi-structured data. Common uses for Elasticsearch range from building a simple search engine for a web site or document collection, to supporting auto-completion, analytics, AI and cognitive computing workloads.

Because of the schemaless / NoSQL nature of Elasticsearch, data management is often a challenge. Through REST APIs users can access the search capabilities and features of Elasticsearch, but that requires custom development to connect it with other applications and services. The other option of using the SQL API and vendor-supplied libraries offers limited read-only SQL query capability and still only offers a small subset of tool integrations.

The CData Elasticsearch drivers offer the best of both worlds. The CData Drivers expand on the capabilities of the REST and SQL APIs, providing standards-compliant SQL-92 interfaces for data management, and the support for standards like ODBC / JDBC / ADO.NET provides near-universal seamless tool integration. The effectiveness of this methodology is best represented by our knowledge base of Elasticsearch integrations, which is only a small sample of what is possible.

In this article, we explore how CData drivers:


Access to All of Elasticsearch

CData Drivers for Elasticsearch provide unmatched access to Elasticsearch with many features beyond those available from the native drivers. The distinctive features for enabling access to Elasticsearch are outlined below:

  • Connects to the SQL endpoint and REST endpoint (as opposed to only the SQL endpoint)
  • Enables SQL access to Elasticsearch v2.2 and above (compared to v6.3 and above)
  • Compatible with Open Source Elasticsearch subscriptions in addition to other commercially available subscriptions
  • Returns the relevance score as a _score column for queries without explicitly requiring the SCORE() function
  • Allows querying multiple indices (details below)

Querying Multiple Indices

Elasticsearch SQL supports querying multiple indices (or tables) through pattern matching. The CData Drivers support querying multiple indices, but for both the REST and SQL endpoints (as opposed to only the SQL endpoint). Multiple indices can be queried using any of the following formats:

Result SourceSQL StatementREST API Endpoint
All available indicesSELECT * FROM [_all]/_all/_search
A list of specific indicesSELECT * FROM [index1, index2, index3]/index1%2Cindex2%2Cindex3/_search
Indices matching a wildcard patternSELECT * FROM [index*]/index*/_search
Indices mathing a pattern (with exclusions)SELECT * FROM [index*, -index3]/index*%2C-index/_search

SQL Support

The CData Drivers for Elasticsearch fully support standard SQL queries, whether crafted manually for custom applications and visualizations or generated automatically by a BI, reporting, or ETL tool.

Full CRUD Support

Since the CData Drivers include support for both the SQL and REST API endpoints, users can submit SELECT, INSERT, UPDATE, and DELETE statements to easily read from and write to Elasticsearch with CData Drivers.

OperationSQL StatementEndpoint(s) Used
Query all documents in an indexSELECT * FROM [myIndex];REST, SQL
Create a new document in an indexINSERT INTO [myIndex]
  (myField1, myField2, myField3, ...)
VALUES
  ('myValue1', 'myValue2', 'myValue3', ...);
REST
Update an existing document in an indexUPDATE [myIndex] SET
  myField1 = 'myNewValue1'
WHERE
  myId = '1';
REST
DELETE an existing document from an indexDELETE FROM [myIndex]
WHERE
  myId = '1';
REST

Filtering

CData Drivers allow for filtering using LIKE and = operators, as opposed to only supporting LIKE operators, offering more precise querying. This functionality is useful, but can still lead to imprecise results. For example, Elasticsearch may return records that contain the same words but in a different order: a search performed using the value "blue sky" will return a record with "sky blue." The Query Type logic built-in to the CData drivers identifies columns as Analyzed or Non-Analyzed and the drivers issue the best possible Elasticsearch query based on the specific operator and the search value.

WHERE ClauseColumn TypeElasticsearch Query Type
column = 'value'AnalyzedQuery String Query
column = 'value with spaces'AnalyzedMatch Phrase Query
column LIKE 'v_lu%'AnalyzedQuery String with wildcards
column = 'value'Non-AnalyzedQuery String Query*
column = 'value with spaces'Non-AnalyzedWildcard Query
column LIKE 'v_lu%'Non-AnalyzedWildcard Query with wildcards

* Non-analyzed columns can be matched case-insensitive, so four popular cases are checked: 'myValue' OR 'MYVALUE' OR 'myvalue' OR 'Myvalue'

JOIN Support

In Elasticsearch, the join datatype creates a parent/child relation within documents of the same index. CData Drivers can leverage the join datatypes to split related tables and enable SQL JOIN queries across those parent/child relationships. If you need to JOIN across indexes, the built-in SQL engine can perform a client-side, in-memory JOIN to eliminate restrictions on how you work with your data. Using the Elasticsearch example, we can build related tables using the join datatype.

For example, if we have the following documents in the sample index, we get different results from a query, depending on whether we use the native drivers or the CData drivers.

Example Index: sample

{
    "_index": "sample",
    "_type": "doc",
    "_id": "2",
    "_score": 1,
    "_source": {
        "text": "What is your favorite motorcycle?",
        "question_id": {
            "name": "question"
        }
    }
},
{
    "_index": "sample",
    "_type": "doc",
    "_id": "1",
    "_score": 1,
    "_source": {
        "text": "What is your favorite company?",
        "question_id": {
            "name": "question"
        }
    }
},
{
    "_index": "sample",
    "_type": "doc",
    "_id": "3",
    "_score": 1,
    "_routing": "1",
    "_source": {
        "text": "CData Software",
        "question_id": {
            "name": "answer",
            "parent": "1"
        }
    }
},
{
    "_index": "sample",
    "_type": "doc",
    "_id": "4",
    "_score": 1,
    "_routing": "1",
    "_source": {
        "text": "Elastic",
        "question_id": {
            "name": "answer",
            "parent": "1"
        }
    }
},
{
    "_index": "sample",
    "_type": "doc",
    "_id": "5",
    "_score": 1,
    "_routing": "1",
    "_source": {
        "text": "Suzuki V-Strom 650",
        "question_id": {
            "name": "answer",
            "parent": "2"
        }
    }
},
{
    "_index": "sample",
    "_type": "doc",
    "_id": "6",
    "_score": 1,
    "_routing": "1",
    "_source": {
        "text": "Yamaha FZ6",
        "question_id": {
            "name": "answer",
            "parent": "2"
        }
    }
}

The native drivers expose a single table, sample, with a single column, text, but with no preservation of the relationship between the documents. The CData Drivers expose two tables, sample_question and sample_answer based on the relationship between the documents in the index.

Native Drivers

When querying the data, the native Elasticsearch driver returns a single table with one column, 'text':

Query Result (sample)
text
What is your favorite company?
What is your favorite motorcycle?
CData Software
Elastic
Suzuki V-Strom 650
Yamaha FZ6

CData Drivers

In contrast, CData Drivers expose two separate related tables, with several columns, including foreign keys to define the relationships:

Parent (sample_question)
_id_scoretext
10.0What is your favorite company?
20.0What is your favorite motorcycle?
Child (sample_answer)
_idsample_question_id_scoretext
310.0CData Software
410.0Elastic
520.0Suzuki V-Strom 650
620.0Yamaha FZ6

Support for JSON Structures

CData Drivers provide support for querying JSON structures, like arrays and nested JSON objects, which can often be found in Elasticsearch records. With CData, users will be able to get exactly the data they want from Elasticsearch, thanks to built-in schema discovery and JSON structure flattening.

Array Support

Native Elasticsearch drivers do not (at the time of writing) support querying data found in arrays. CData Drivers not only support accessing the arrays (as raw JSON data), but also support flattening the arrays to drill down into the relevant data. Consider the following JSON array:

{
  ...
  "coords" : [35.91,-79.06],
  ...
}

CData Elasticsearch Drivers expose the columns coords coords.0 and coords.1 (sample results below).

coordscoords.0coords.1
[35.92,-79.06]35.91-79.06

Nested Type Support

Data stored for use with Elasticsearch often contains nested JSON objects. Without support for these objects, native drivers will miss a lot of available data in their queries. CData Drivers support searching nested objects and include functionality to flatten the nested objects to atomize the data. Consider the following JSON object:

{
  "manager": {
    "name": "Ms. Manager",
    "age": 35,
    "location": "Flagship Store",
    ...
  }
}

CData Elasticsearch Drivers expose manager as a column, along with a separate column for each nested element: manager.name, manager.age, manager.location (sample results below):

managermanager.namemanager.agemanager.location
{ "name": "Ms. Manager", "age": 35, "location": "Flagship Store", ... }Ms. Manager35Flagship Store

Platform Support

Outside of the version and subscription restrictions, one of the largest limitations of the native Elasticsearch drivers is that they only support the JDBC and ODBC standards. In comparison, CData offers connectivity across a variety of data-centric standards like ODBC (Windows, Linux, Mac), JDBC, and ADO.NET and directly within tools like Excel, SSIS and PowerShell. Whether you are using tools and applications for BI, analytics, reporting, and ETL or building custom applications, CData drivers provide live access to Elasticsearch data wherever you need it.

Performance Comparison

The CData JDBC Driver is able to read data more than twice as fast as the native driver. The CData engineers built a driver that makes better use of client-side resources to work with large datasets quickly.

JDBC Query Times by Company (in milliseconds)
Rows Queried CData Software Native
25,000 1,016.9 (+35%) 1,375.4
~10,000,000 199,577 (+155%) 508,338

For more information, refer to our performance comparison article.

Other Features

Beyond these major features and key differentiators, CData Drivers for Elasticsearch come with additional features, that are included across all of the full range of data sources that CData supports.

  • Collaborative query processing - a powerful, built-in SQL engine pushes supported complex queries (filtering, aggregations, JOINS, etc.) down to Elasticsearch for server-side processing and features client-side processing for unsupported queries
  • Active development & maintenance — changes to the Elasticsearch API are implemented and supported as soon as possible, keeping up with the latest versions and updates
  • World-class support — our Support Team is available to answer your questions via email, live chat, and phone.

Free Trials & More Information

You will find the latest version of our Elasticsearch drivers here and all of our drivers include a free, 30-day trial. If you want to learn more about using our drivers in your existing tools or applications, you can read more in our Knowledge Base.