Standards-Based Access to NoSQL Data Sources



NoSQL data sources like MongoDB provide a solution for high volume, scalable, and agile data management; there is a high demand to integrate these data sources with existing infrastructure and familiar tools. Standards-based data access facilitates your expansion into MongoDB by enabling you to leverage existing skills using standards like ODBC, JDBC, and ADO.NET. Standards-based drivers also provide maximum interoperability with BI (business intelligence), analytics, and ETL.

The ability to connect to MongoDB and other NoSQL data sources as a relational database is a critical component of integration with BI, analytics, and ETL. The major BI (business intelligence) tools, such Crystal Reports, Tableau, QlikView, and TIBCO Spotfire, depend on the SQL interface provided by standards-based drivers.

CData drivers enable bidirectional access using standard SQL to MongoDB data. CData drivers model MongoDB objects as normalized relational tables. This article shows how to access objects in NoSQL data sources from any CData driver technology. In this article, you will use the ODBC Driver for MongoDB to create reports in a popular BI tool, Crystal Reports.


Connect to MongoDB from BI, Analytics, and ETL



CData drivers enable out-of-the-box connectivity to NoSQL data sources from your application. The drivers automatically generate schemas, or table definitions, that enable you to start querying MongoDB objects with SQL as soon as you connect. Simply set the DetectDataTypes property to the number of rows the driver will scan to infer column data types.

The driver models all top level native properties as columns of an appropriate type. The inferred data types of the following JSON object are shown below:

{
  id: 12,
  name: "Lohia Manufacturers Inc.",
  address: {street: "Main Street", city: "Chapel Hill", state: "NC"},
  offices: ["Chapel Hill", "London", "New York"]
  annual_revenue: 35,600,000
}
Column NameData TypeExample Value
idInteger12
nameStringLohia Manufacturers Inc.
addressString {street: "Main Street", city: "Chapel Hill", state: "NC"}
officesString["Chapel Hill", "London", "New York"]
annual_revenuedouble35,600,000


Access Nested Data from SQL



CData drivers provide easy access to nested data through the XPath standard. You can call the JSON_EXTRACT SQL function to drill down to nested properties from SQL. Use the XPath as the second argument to flatten the JSON hierarchy to the level you want. Use an SQL alias to define the dataset's column names.

The following examples use the raw JSON below of a nested array of objects:

offices: ["Headquarters", {street: "600 Main Street", city: "Chapel Hill", state: "NC"}]

To flatten the fields of the offices object, use the following SQL:

SELECT JSON_EXTRACT(offices,'[1]') AS Headquarters, 
JSON_EXTRACT(offices,'[2].street') AS Street, 
JSON_EXTRACT(offices,'[3].city') AS City, 
JSON_EXTRACT(offices,'[4].state') AS State
FROM Customers

The dataset returned from the preceding query contains the following columns:

Column NameData TypeExample Value
HeadquartersStringHeadquarters
StreetString600 Main Street
CityStringChapel Hill
StateStringNC


Access Nested Data from Custom Tables



CData drivers provide two methods to outline the exact fields you want to pull from MongoDB, which you are then able to query as tables:

  • Specify the XPath in your query with the JSON_EXTRACT formula.
  • Create static schemas to map columns to objects and arrays nested in the JSON response.

You can define schemas, or table definitions, in simple, XML-based files. Schema files can be used with any CData driver technology.

The schema in this example will map columns to the fields of the following array of objects:

offices: ["Headquarters", {street: "600 Main Street", city: "Chapel Hill", state: "NC"}]

The following definitions map columns to the fields in the example:

<attr name="Headquarters" xs:type="string" other:bsonpath="offices.0" />
<attr name="OfficeStreet" xs:type="string" other:bsonpath="offices.1.street" />
<attr name="OfficeCity" xs:type="string" other:bsonpath="offices.1.city" />
<attr name="OfficeState" xs:type="string" other:bsonpath="offices.1.state" />

Each attr defined in the rsb:info section defines a column from the database. Define a column by providing the following information:

  • Name: Set name to the column name.
  • Data Type: Set xs:type to define the data type. The supported data types for the xs:type attribute are integer, double, datetime, string, long, and boolean.
  • XPath: Set other:bsonpath to the XPath to the JSON element you want to retrieve. Elements of arrays are indexed starting from 0. Each '.' in the other:bsonpath attribute signifies a child of the current element.

Schemas are defined in .rsd files with the table name specified by the filename. Schemas consist of column definitions and calls to the CRUD (create, read, update, and delete) operations shipped with the driver. Below is a complete schema, Office.rsd. The only part of the schema you will need to modify is the rsb:info block; copy the rest of the script as is.

<rsb:script xmlns:rsb="https://www.cdata.com/ns/rsbscript/2">
  <rsb:info title="Office" description="An example schema.">  
    <!-- Column definitions -->
    < attr name="OfficeName" xs:type="string" other:bsonpath="offices.0" />
    < attr name="OfficeStreet" xs:type="string" other:bsonpath="offices.1.street" />
    < attr name="OfficeCity" xs:type="string" other:bsonpath="offices.1.city" />
    < attr name="OfficeState" xs:type="string" other:bsonpath="offices.1.state" />
  </rsb:info>
  <rsb:script method="GET">
    <rsb:call op="mongodbadoExecuteSelect">
      <rsb:push />
    </rsb:call>
  </rsb:script>

  <rsb:script method="POST">
    <rsb:call op="mongodbadoExecutePost">
      <rsb:push />
    </rsb:call>
  </rsb:script>

  <rsb:script method="MERGE">
    <rsb:call op="mongodbadoExecuteMerge">
      <rsb:push />
    </rsb:call>
  </rsb:script>

  <rsb:script method="DELETE">
    <rsb:call op="mongodbadoExecuteDelete">
      <rsb:push />
    </rsb:call>
  </rsb:script>
</rsb:script>

You can use CData drivers to obtain the raw JSON for any MongoDB object:

  1. Set the Logfile connection property to a location where the driver has write permissions (ex: C:\MongoDBLog.txt).
  2. Set the Verbosity connection property to 3. The driver will log the body of HTTP requests, containing the JSON.
  3. Connect to the MongoDB database. The driver will detect the objects in your database.
  4. Execute a SELECT * query to the MongoDB object you want.
  5. The log file should now contain the raw JSON response from your server.

To use a custom schema, set the Location connection property to the schema file and reconnect.


Connect to MongoDB from Crystal Reports



The ODBC driver has almost ubiquitous support across the major BI tools. It is managed using Windows tools. Follow the steps below to use the ODBC Data Source Administrator to connect to MongoDB as an ODBC data source in Crystal Reports:

  1. Specify connection properties in a DSN (data source name). You can use the Microsoft ODBC Data Source Administrator to create and configure ODBC DSNs. Typical connection properties are the following:

    • Server: Enter the name or address of the MongoDB server you want to connect to.
    • Database: Enter the name of the database you want to connect to.
    • User: Enter the username of an authorized user.
    • Password: Enter the password of an authorized user.
    • AuthDatabase: If you are connecting to a different database than the authentication database, enter the name of the authentication database.
  2. In Crystal Reports, open a new report and click Create New Connection -> ODBC.

  3. In the resulting wizard, click Select Data Source and select the DSN in the Data Source Name menu.

Design a Report



After adding an ODBC connection to MongoDB, you can then use the Report Wizard to add MongoDB objects to your report.

In the Report Wizard, click Create New Connection -> ODBC and select the MongoDB DSN. Configure the data source by selecting the tables and fields needed in the report. We use a dataset from the Bureau of Transportation Statistics' domestic flights for 2015. You can then create charts and other report objects. Your report objects can summarize and sort report fields.


Leverage Server-Side Processing



With a fast database like MongoDB, you can see marked performance gains by formulating large result sets on the server. CData drivers enable you to take advantage of MongoDB's aggregation framework to create responsive reports that reflect changes to the data. As you build charts and other report objects in Crystal Reports, Crystal Reports builds a query to the driver. The driver makes the corresponding request to MongoDB APIs.

The driver exposes the capabilities of the MongoDB API as standard SQL. Below are some report objects and the corresponding SQL queries. The dataset used is from the Bureau of Transportation Statistics' domestic flights dataset for 2015.

The following query finds the best average time to catch a flight to avoid delays. The query retrieves the average arrival delay for each departure time. The arrival time is the difference in minutes between the scheduled and actual arrival time. The number of minutes a plane arrived early are represented as negative numbers:

SELECT AVG(ARR_DELAY) AS AvgARR_DELAY, DEP_TIME
FROM Airlines
GROUP BY DEP_TIME

To create this chart, click Insert -> Chart in a report and enter the following in the Chart Wizard:

  • In the On Change Of menu, select DEP_TIME.
  • In the Show Values box, select ARR_DELAY and set the summary option to average.

You can work with MongoDB just as any other ODBC data source. See the Online Knowledge Base for more guides to integrating your business intelligence and ETL solutions with MongoDB.