Establish a Live Connection with Spark Data using Tableau Bridge



The CData ODBC Driver for Spark enables you to integrate live Spark data into Tableau Cloud dashboards using the Tableau Bridge.

The Tableau Bridge enables you to publish dashboards to Tableau Cloud while keeping live connections to data sources that Tableau Cloud cannot reach directly, such as on-premises data. In this article, you will use the Tableau Bridge to keep a published workbook current by maintaining a live connection to the underlying Spark data.

The CData ODBC drivers offer unmatched performance for interacting with live Spark data in Tableau Cloud due to optimized data processing built into the driver. When you issue complex SQL queries from Tableau Cloud to Spark, the driver pushes supported SQL operations, like filters and aggregations, directly to Spark and utilizes the embedded SQL engine to process unsupported operations (often SQL functions and JOIN operations) client-side. With built-in dynamic metadata querying, you can visualize and analyze Spark data using native Tableau data types.
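To make this division of work concrete, the following Python sketch is a toy model of the idea. It is not the driver's actual implementation. A simulated "remote" side, standing in for Spark, evaluates only a filter and a grouped sum. The join of the two reduced results is then finished locally, the way an embedded client-side engine would do it. The sample tables and all function names are hypothetical.

```python
# Toy model of query splitting; NOT the CData driver's implementation.
# "Remote" functions stand in for operations pushed down to Spark;
# the join is finished client-side on the already-reduced rows.
from collections import defaultdict

# Hypothetical sample data standing in for two Spark tables.
CUSTOMERS = [
    {"id": 1, "name": "Ada", "country": "US"},
    {"id": 2, "name": "Bo", "country": "DE"},
    {"id": 3, "name": "Cy", "country": "US"},
]
ORDERS = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 75.0},
    {"customer_id": 3, "amount": 10.0},
]


def remote_filter(rows, predicate):
    """Simulates a WHERE filter evaluated by the data source."""
    return [r for r in rows if predicate(r)]


def remote_sum_by(rows, key, value):
    """Simulates a GROUP BY ... SUM(...) aggregation evaluated by the source."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[key]] += r[value]
    return [{key: k, "total": v} for k, v in totals.items()]


def client_side_join(left, right, left_key, right_key):
    """Simulates a hash join evaluated locally by an embedded engine."""
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


def us_customer_totals():
    # Only the filtered and pre-aggregated rows would cross the network.
    us_customers = remote_filter(CUSTOMERS, lambda r: r["country"] == "US")
    order_totals = remote_sum_by(ORDERS, "customer_id", "amount")
    joined = client_side_join(us_customers, order_totals, "id", "customer_id")
    return sorted((r["name"], r["total"]) for r in joined)
```

Calling `us_customer_totals()` returns `[('Ada', 150.0), ('Cy', 10.0)]`. In this model, the full Orders table never leaves the source; only one aggregated row per customer does.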

Connect to Spark as an ODBC Data Source

If you have not already done so, specify the connection properties in an ODBC DSN (data source name); this is the last step of the driver installation. You can use the Microsoft ODBC Data Source Administrator to create and configure ODBC DSNs.

Set the Server, Database, User, and Password connection properties to connect to SparkSQL.

When you configure the DSN, you may also want to set the Max Rows connection property. This will limit the number of rows returned, which is especially helpful for improving performance when designing reports and visualizations.

Add Spark Data to a Dashboard

  1. From a new workbook, click Data -> New Data Source -> Other Databases (ODBC).
    Select the DSN you configured (for example, CData SparkSQL Source).
  2. In the Database menu, select CData.
  3. In the Table box, enter a table name or click New Custom SQL to enter an SQL query. This article retrieves the Customers table.
  4. Drag the table onto the join area. At this point, you can include multiple tables, leveraging the built-in SQL engine to process complex data requests.
  5. Click the tab for your worksheet. The columns appear in the Data pane as Dimensions and Measures, which you can drag onto the worksheet to create visualizations.

Set Up Tableau Bridge as a Service

  1. In the Server menu, select Start Tableau Bridge Client.
  2. Sign in to the Tableau Bridge using a site admin level account.
  3. If prompted, select the Tableau Cloud site where you want to publish live data. The Bridge client opens and is accessible from the system tray.
  4. By default, the Tableau Bridge client is set to Application mode. Select 'Switch to service' to enable Tableau Bridge to handle live connections.
  5. Log in to your Tableau Cloud site as an administrator.
  6. From your site, click Settings, then Bridge.
  7. In the Bridge settings, under Enable Clients to Maintain Live Connections, check the box labeled 'Enable Tableau Bridge clients to maintain live connections to on-premises data.'

Publish a Dashboard Containing the Live Data Source

Having configured both the Tableau Bridge and Tableau Cloud to enable live data connections, you can now publish your workbook to Tableau Cloud. From the Server menu, select Publish Workbook.

After choosing the workbook name and the project you want to publish to, configure the Spark connection (made through the CData ODBC driver for Spark) so that it is published separately from the workbook as a live data source, rather than embedded in it.

  1. Under Data Sources, select the option to Edit the embedded data sources in the workbook.
  2. Change Publish Type to 'Published separately,' then select a desired means of authentication.
  3. Finally, select 'Maintain connection to a live data source' and click the green Publish Workbook button.

The published workbook now updates alongside the underlying Spark data. From a published dashboard, simply click the Refresh button to reflect the most recent changes.
