How to work with JDBC-ODBC Bridge Data in Apache Spark using SQL



Access and process JDBC-ODBC Bridge Data in Apache Spark using the CData JDBC Driver.

Apache Spark is a fast and general engine for large-scale data processing. When paired with the CData JDBC Driver for JDBC-ODBC Bridge, Spark can work with live JDBC-ODBC Bridge data. This article describes how to connect to and query JDBC-ODBC Bridge data from a Spark shell.

The CData JDBC Driver is designed for high-performance access to live JDBC-ODBC Bridge data, thanks to optimized data processing built into the driver. When you issue complex SQL queries to JDBC-ODBC Bridge, the driver pushes supported SQL operations, such as filters and aggregations, directly to JDBC-ODBC Bridge and uses its embedded SQL engine to process unsupported operations (often SQL functions and JOIN operations) client-side. Because the driver queries metadata dynamically, you can work with and analyze JDBC-ODBC Bridge data using native data types.

Install the CData JDBC Driver for JDBC-ODBC Bridge

Download the CData JDBC Driver for JDBC-ODBC Bridge installer, unzip the package, and run the JAR file to install the driver.

Start a Spark Shell and Connect to JDBC-ODBC Bridge Data

  1. Open a terminal and start the Spark shell, passing the CData JDBC Driver for JDBC-ODBC Bridge JAR file to the --jars parameter. Quote the path, because it contains spaces:

    $ spark-shell --jars "/CData/CData JDBC Driver for JDBC-ODBC Bridge/lib/cdata.jdbc.jdbcodbc.jar"
  2. With the shell running, you can connect to JDBC-ODBC Bridge with a JDBC URL and use the load() function to read a table. To connect to an ODBC data source, specify either the DSN (data source name) or an ODBC connection string. For a connection string, set Driver and the connection properties for your ODBC driver.

    Built-in Connection String Designer

    For assistance in constructing the JDBC URL, use the connection string designer built into the JDBC-ODBC Bridge JDBC Driver. Either double-click the JAR file or execute it from the command line:

    java -jar cdata.jdbc.jdbcodbc.jar

    Fill in the connection properties and copy the connection string to the clipboard.

    Configure the connection to JDBC-ODBC Bridge, using the connection string generated above.

    scala> val jdbcodbc_df = spark.sqlContext.read.format("jdbc").option("url", "jdbc:jdbcodbc:Driver={ODBC_Driver_Name};Driver_Property1=Driver_Value1;Driver_Property2=Driver_Value2;...").option("dbtable","Account").option("driver","cdata.jdbc.jdbcodbc.JDBCODBCDriver").load()
  3. Once you connect and the data is loaded, the table schema is displayed.
  4. Register the JDBC-ODBC Bridge data as a temporary view:

    scala> jdbcodbc_df.createOrReplaceTempView("account")
  5. Perform custom SQL queries against the data using commands like the one below:

    scala> jdbcodbc_df.sqlContext.sql("SELECT Id, Name FROM Account WHERE Id = 1").collect.foreach(println)

    The results are displayed in the console.
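
The JDBC URL used in step 2 can also be assembled in code rather than typed by hand. The following is a minimal sketch, not part of the CData driver: the JdbcOdbcUrl object and its build method are hypothetical helpers, the "jdbc:jdbcodbc:" prefix and the name=value;name=value layout are taken from the URL shown above, and the DSN property name and the MyDsn value are assumptions based on the DSN option mentioned in step 2.

```scala
// Hypothetical helper (not part of the CData driver): assembles a
// "jdbc:jdbcodbc:" URL from ODBC connection properties.
object JdbcOdbcUrl {
  private val Prefix = "jdbc:jdbcodbc:"

  /** Joins name=value pairs with ';', in the order given, after the URL prefix. */
  def build(props: Seq[(String, String)]): String = {
    require(props.nonEmpty, "at least one connection property is required")
    props.map { case (name, value) => s"$name=$value" }.mkString(Prefix, ";", "")
  }

  def main(args: Array[String]): Unit = {
    // Option 1: an ODBC connection string (Driver plus driver-specific properties).
    // The property names and values are the placeholders used in this article.
    println(build(Seq(
      "Driver" -> "{ODBC_Driver_Name}",
      "Driver_Property1" -> "Driver_Value1")))

    // Option 2: a preconfigured data source name.
    // "DSN" as the property name and "MyDsn" as its value are assumptions.
    println(build(Seq("DSN" -> "MyDsn")))
  }
}
```

In the Spark shell, the resulting string could be passed as the url option in step 2, for example .option("url", JdbcOdbcUrl.build(Seq("DSN" -> "MyDsn"))). Keeping the properties as a sequence of pairs preserves their order and makes it easier to swap between a DSN and a full connection string.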

Using the CData JDBC Driver for JDBC-ODBC Bridge in Apache Spark, you can perform fast and complex analytics on JDBC-ODBC Bridge data, combining the power and utility of Spark with your data. Download a free, 30-day trial of any of the 200+ CData JDBC Drivers and get started today.

The JDBC-ODBC Bridge provides JDBC access from any Java App to ODBC data sources on Windows, Linux and Mac. Whether your organization uses Java-based tools for reporting and analytics, or builds custom Java solutions, the CData JDBC-ODBC Bridge provides an easy way to connect with any ODBC data source.