JSON Drivers: Parsing Hierarchical Data



Modern services return or store data as JSON objects, arrays, or any combination thereof. While this is convenient for storing hierarchical data, it can be difficult to work with in common BI, reporting, and ETL tools. There are two prevailing techniques for dealing with nested JSON data:

  • Using horizontal and vertical flattening to drill down into the nested arrays and objects
  • Parsing the data structure and building a relational model based on the existing hierarchy

This article will describe the differences between the two techniques and how to configure the driver to use either of the techniques. In another Knowledge Base article, we explore more deeply how JSON data can be interpreted using horizontal and vertical flattening.

Document, Flattened Documents, and Relational Modeling



The CData Drivers for JSON manage NoSQL data by reporting full documents, implicitly JOINing flattened documents, or building a relational model. When reporting full documents, the driver will return nested object arrays as aggregated JSON objects. When working with flattened documents, the driver will interpret nested array objects as a single table, allowing you to implicitly perform a JOIN by running a SELECT query. When building a relational model, the driver will interpret the results as individual tables based on the discovered objects and structures, allowing you see a relational model of the NoSQL data and perform explicit SQL JOIN queries on the data. Follow along below for examples of each technique.

  • Document: A top-level, document view of your NoSQL data, including aggregated arrays.
  • Flattened Documents: Nested array objects are implicitly JOINed to parent objects.
  • Relational Modeling: All of your NoSQL data viewed as individual tables.

For reference, these examples are based on the following sample JSON document.

Sample Document



The sample document includes entries for people, the cars they own, and various maintenance services performed on those cars.

people.json

{
  "people": [
    {
      "personal": {
        "age": 20,
        "gender": "M",
        "name": {
          "first": "John",
          "last": "Doe"
        }
      },
      "vehicles": [
        {
          "type": "car",
          "model": "Honda Civic",
          "insurance": {
            "company": "ABC Insurance",
            "policy_num": "12345"
          },
          "maintenance": [
            {
              "date": "07-17-2017",
              "desc": "oil change"
            },
            {
              "date": "01-03-2018",
              "desc": "new tires"
            }
          ]
        },
        {
          "type": "truck",
          "model": "Dodge Ram",
          "insurance": {
            "company": "ABC Insurance",
            "policy_num": "12345"
          },
          "maintenance": [
            {
              "date": "08-27-2017",
              "desc": "new tires"
            },
            {
              "date": "01-08-2018",
              "desc": "oil change"
            }
          ]
        }
      ],
      "source": "internet"
    },
    {
      "personal": {
        "age": 24,
        "gender": "F",
        "name": {
          "first": "Jane",
          "last": "Roberts"
        }
      },
      "vehicles": [
        {
          "type": "car",
          "model": "Toyota Camry",
          "insurance": {
            "company": "Car Insurance",
            "policy_num": "98765"
          },
          "maintenance": [
            {
              "date": "05-11-2017",
              "desc": "tires rotated"
            },
            {
              "date": "11-03-2017",
              "desc": "oil change"
            }
          ]
        },
        {
          "type": "car",
          "model": "Honda Accord",
          "insurance": {
            "company": "Car Insurance",
            "policy_num": "98765"
          },
          "maintenance": [
            {
              "date": "10-07-2017",
              "desc": "new air filter"
            },
            {
              "date": "01-13-2018",
              "desc": "new brakes"
            }
          ]
        }
      ],
      "source": "phone"
    }
  ]
}

Document Modeling



Using a top-level document view of NoSQL data provides ready access to top-level fields and objects, forgoing the time and resources to process and parse hierarchical arrays. You can configure CData drivers to view a JSON store based on the top-most repeated element and treat all nested arrays as single columns. In this mode, the driver uses streaming to read the JSON data, only parsing the returned data once per query.

Below is a sample query and the results, based on the sample document above. The query results in a single "people" table based on the JSON path "$.people".

Connection String

Set the Data Model connection property to "Document" and set the JSON Path connection property to "$.people" to perform the query above and see the example result set.

DataModel=Document;JSONPath='$.people';

Metadata

The table below describes the metadata based on using the Document data model.

People
Column Data Type
personal.age Integer
personal.gender String
personal.name.first String
personal.name.last String
source String
vehicles String

Query

In this query, we pull top-level object elements and an array into our results. The top-level object elements are available due to default object flattening. The array is returned as aggregated JSON.

SELECT 
  [personal.age] AS age, 
  [personal.gender] AS gender, 
  [personal.name.first] AS name_first, 
  [personal.name.last] AS name_last, 
  [source], 
  [vehicles]
FROM 
  [people]

Results

With a document view of the data, the "personal" object is flattened into 4 columns and the "source" and "vehicles" elements are returned as individual columns, resulting in a table with 6 columns.

age gender name_first name_last source vehicles
20 M John Doe internet [{"type":"car","model":"Honda Civic","insurance":{"company":"ABC Insurance","policy_num":"12345"},"maintenance":[{"date":"07-17-2017","desc":"oil change"},{"date":"01-03-2018","desc":"new tires"}]},{"type":"truck","model":"Dodge Ram","insurance":{"company":"ABC Insurance","policy_num":"12345"},"maintenance":[{"date":"08-27-2017","desc":"new tires"},{"date":"01-08-2018","desc":"oil change"}]}]
24 F Jane Roberts phone [{"type":"car","model":"Toyota Camry","insurance":{"company":"Car Insurance","policy_num":"98765"},"maintenance":[{"date":"05-11-2017","desc":"tires rotated"},{"date":"11-03-2017","desc":"oil change"}]},{"type":"car","model":"Honda Accord","insurance":{"company":"Car Insurance","policy_num":"98765"},"maintenance":[{"date":"10-07-2017","desc":"new air filter"},{"date":"01-13-2018","desc":"new brakes"}]}]

Benefits & Considerations

With the document model, you are able to see all of the top-level data in a JSON store or service, along with the aggregated array data, in a single table. Your are able to submit simple queries to work with the top-level data. Any given query will result in a single request to read and parse the JSON data, which means faster performance and better compatibility with streaming functionality. One consideration is your need for any data stored in an array and the ability of your tool or application to process JSON arrays in a meaningful way.

Flattened Documents Modeling



For users who simply need access to the entirety of their JSON data, flattening the data into a single table is the best option. You can configure the driver to parse a single table from the JSON data, based on JSON paths in the data. In this mode, nested object arrays are treated as separate tables, but implicitly JOINed to the parent table. The driver uses streaming and only parses the JSON data once per query. With the flattened documents model, you can perform implicit JOIN statements on the data using dot notation to drill down into nested elements in the JSON data.

Below is a sample query and the results, based on the sample document above, parsing based on the JSON paths "$.people", "$.people.vehicles", and "$.people.vehicles.maintenance" (this implicitly JOINs the "people" collection with the "vehicles" collection and implicitly JOINs the "vehicles" collection with the "maintenance" collection).

Connection String

Set the Data Model connection property to "FlattenedDocuments" and set the JSON Path connection property to "$.people;$.people.vehicles;$.people.vehicles.maintenance;" to perform the query above and see the example result set.

DataModel=FlattenedDocuments;JSONPath='$.people;$.people.vehicles;$.people.vehicles.maintenance;'

Metadata

The table below describes the meta-data based on using the Flattened Documents data model.

People
Column Data Type
people:_id String
personal.age Integer
personal.gender String
personal.name.first String
personal.name.last String
source String
vehicle:_id String
type String
model String
insurance.company String
insurance.policy_num String
maintenance:_id String
date Date
desc String

Query

In this query, we are able to drill into the nested elements in each "people" object. Since we included the "vehicles" collection as a JSON path, we can query the element of a "vehicle" explicitly.

SELECT 
  [personal.age] AS age, 
  [personal.gender] AS gender, 
  [personal.name.first] AS name_first, 
  [personal.name.last] AS name_last, 
  [source], 
  [type], 
  [model], 
  [insurance.company] AS ins_company, 
  [insurance.policy_num] AS ins_policy_num,
  [date] AS maint_date, 
  [desc] AS maint_desc
FROM 
  [people]

Results

With horizontal and vertical flattening based on the described paths, each "vehicle" object is implicitly JOINed to its parent "people" object and each "maintenance" object is implicitly JOINed to its parent "vehicle" object to produce a table with 8 rows (2 "maintenance" objects each for 2 "vehicles" and 2 "vehicles" each for 2 "people").

age gender first_name last_name source type model ins_company ins_policy_num maint_date maint_desc
20 M John Doe internet car Honda Civic ABC Insurance 12345 2017-07-17 oil change
20 M John Doe internet car Honda Civic ABC Insurance 12345 2018-01-03 new tires
20 M John Doe internet truck Dodge Ram ABC Insurance 12345 2017-08-27 new tires
20 M John Doe internet truck Dodge Ram ABC Insurance 12345 2018-01-08 oil change
24 F Jane Roberts phone car Toyota Camry Car Insurance 98765 2017-05-11 tires rotated
24 F Jane Roberts phone car Toyota Camry Car Insurance 98765 2017-11-03 oil change
24 F Jane Roberts phone car Honda Accord Car Insurance 98765 2017-10-07 new air filter
24 F Jane Roberts phone car Honda Accord Car Insurance 98765 2018-01-13 new brakes

Benefits & Considerations

With flattened documents, you are able to see all of the data in a JSON store or service in a single table. You are able to submit simple queries to drill down into the hierarchical data. Any given query will result in a single request to read and parse the JSON data, which means faster performance and better compatibility with streaming functionality. When working with flattened documents, users need to consider whether the tool or application you are working with works better with discrete entities versus a single pre-JOINed dataset.

Relational Modeling



The CData drivers can be configured to create a relational model of the data in the JSON file or source, treating nested object arrays as individual tables, including relationships to parent tables. This is particularly useful if you need to work with your JSON data in existing BI, reporting, and ETL tools that expect a relational data model. The model interpreted is based on the JSON paths in the data for each object array you wish to see as a table. If you build a relational model, then any time you perform a JOIN query, the JSON file or source will be queried once for each "table" included in the query.

Below is a sample query and the results, based on the sample document above, using a relational model based on the JSON paths "$.people", "$.people.vehicles", and "$.people.vehicles.maintenance".

Connecting String

Set the Data Model connection property to "Relational" and set the JSON Path connection property to "$.people;$.people.vehicles;$.people.vehicles.maintenance;" to perform the query above and see the example result set.

DataModel=Relational;JSONPath='$.people;$.people.vehicles;$.people.vehicles.maintenance;'

Metadata

The tables below describe the metadata based on using the Relational data model.

People
Column Data Type
_id String
personal.age Integer
personal.gender String
personal.name.first String
personal.name.last String
source String

Vehicles
Column Data Type
_id String
insurance.company String
insurance.policy_num String
model String
type String

Maintenance
Column Data Type
_id String
date Date
desc String


Query

In this query, we explicitly JOIN the "people", "vehicles", and "maintenance" tables.

SELECT 
  [people].[personal.age] AS age, 
  [people].[personal.gender] AS gender, 
  [people].[personal.name.first] AS first_name, 
  [people].[personal.name.last] AS last_name, 
  [people].[source], 
  [vehicles].[type], 
  [vehicles].[model], 
  [vehicles].[insurance.company] AS ins_company, 
  [vehicles].[insurance.policy_num] AS ins_policy_num, 
  [maintenance].[date] AS maint_date, 
  [maintenance].[desc] AS maint_desc
FROM 
  [people]
JOIN 
  [vehicles] 
ON 
  [people].[_id] = [vehicles].[people_id]
JOIN 
  [maintenance] 
ON 
  [vehicles].[_id] = [maintenance].[vehicles_id]

Results

Using a relational model, any JOINs are controlled by the query. In this case, each "maintenance" object is JOINed to its parent "vehicle" object, which is JOINed to its parent "people" object to produce a table with 8 rows (2 "maintenance" entries for each of 2 "vehicles" each for 2 "people").

age gender first_name last_name source type model ins_company ins_policy_num maint_date maint_desc
20 M John Doe internet car Honda Civic ABC Insurance 12345 2017-07-17 oil change
20 M John Doe internet car Honda Civic ABC Insurance 12345 2018-01-03 new tires
20 M John Doe internet truck Dodge Ram ABC Insurance 12345 2017-08-27 new tires
20 M John Doe internet truck Dodge Ram ABC Insurance 12345 2018-01-08 oil change
24 F Jane Roberts phone car Toyota Camry Car Insurance 98765 2017-05-11 tires rotated
24 F Jane Roberts phone car Toyota Camry Car Insurance 98765 2017-11-03 oil change
24 F Jane Roberts phone car Honda Accord Car Insurance 98765 2017-10-07 new air filter
24 F Jane Roberts phone car Honda Accord Car Insurance 98765 2018-01-13 new brakes

Benefits & Considerations

With relational modeling, you are able to build a model of your data based on the discrete entities in the JSON data or service, which provides better compatibility with some BI, reporting, and ETL tools. However, in order to build and query a relational model, you will sacrifice some performance. If you are working with data from several tables in the relational model, the driver will query and parse data for each table in the query. For example, the query above requires three separate requests.

Related Articles