How to Connect to & Open Databricks Data in Microsoft Excel
This article uses the CData ODBC driver for Databricks to import data into Excel with Microsoft Query. It also demonstrates how to supply parameters to a Microsoft Query query from spreadsheet cells.
The CData ODBC driver for Databricks uses the standard ODBC interface to link Databricks data with applications like Microsoft Access and Excel. Follow the steps below to use Microsoft Query to import Databricks data into a spreadsheet and provide values to a parameterized query from cells in a spreadsheet.
About Databricks Data Integration
Accessing and integrating live data from Databricks has never been easier with CData. Customers rely on CData connectivity to:
- Access all versions of Databricks from Runtime Versions 9.1 - 13.X to both the Pro and Classic Databricks SQL versions.
- Leave Databricks in their preferred environment thanks to compatibility with any hosting solution.
- Authenticate securely in a variety of ways, including personal access token, Azure Service Principal, and Azure AD.
- Upload data to Databricks using Databricks File System, Azure Blob Storage, and AWS S3 Storage.
While many customers use CData's solutions to migrate data from different systems into their Databricks data lakehouse, several customers use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers use SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMS.
Read more about common Databricks use-cases and how CData's solutions help solve data problems in our blog: What is Databricks Used For? 6 Use Cases.
Getting Started
If you have not already, first specify connection properties in an ODBC DSN (data source name). This is the last step of the driver installation. You can use the Microsoft ODBC Data Source Administrator to create and configure ODBC DSNs.
To connect to a Databricks cluster, set the properties as described below.
Note: You can find the required values in your Databricks instance by navigating to Clusters, selecting the desired cluster, and then selecting the JDBC/ODBC tab under Advanced Options.
- Server: Set to the Server Hostname of your Databricks cluster.
- HTTPPath: Set to the HTTP Path of your Databricks cluster.
- Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).
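The three properties above can also be pictured as a single ODBC connection string. The following is a minimal sketch that only assembles such a string in Python; it does not open a connection. The driver name in the default argument and the host, path, and token values are assumptions made for illustration, so check the name your installation registers in the ODBC Data Source Administrator.

```python
def _quote(value):
    """Wrap a value in braces if it contains characters ODBC treats specially."""
    if any(ch in value for ch in ";{}"):
        return "{" + value.replace("}", "}}") + "}"
    return value


def build_connection_string(server, http_path, token,
                            driver="CData ODBC Driver for Databricks"):
    """Assemble a DSN-less connection string from Server, HTTPPath, and Token.

    The default driver name is an assumption; use the name shown in your
    ODBC Data Source Administrator.
    """
    parts = [
        ("Driver", "{" + driver + "}"),
        ("Server", _quote(server)),
        ("HTTPPath", _quote(http_path)),
        ("Token", _quote(token)),
    ]
    return ";".join(f"{key}={value}" for key, value in parts) + ";"


# Hypothetical placeholder values, not real credentials.
example = build_connection_string(
    server="example.cloud.databricks.com",
    http_path="/sql/protocolv1/o/0/example-cluster",
    token="my-personal-access-token",
)
```

When a DSN has already been configured as described above, the DSN name alone is enough for Excel; the sketch is only meant to show how the three values fit together.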
You can then work with live Databricks data in Excel.
NOTE: In recent versions of Excel, Microsoft Query is not visible by default. To make it visible, navigate to Options > Data and check From Microsoft Query (Legacy) under the Show legacy data import wizards section.
- In Excel, open the Data tab and choose Get Data -> Legacy Wizards -> From Microsoft Query (Legacy).
- Choose the Databricks DSN. Select the option to use Query Wizard to create/edit queries.
- In the Query Wizard, expand the node for the table you would like to import into your spreadsheet. Select the columns you want to import and click the arrow to add them to your query. Alternatively, select the table name to add all columns for that table.
- The Filter Data page allows you to specify criteria. For example, you can limit results by setting a date range.
- If you want to use parameters in your query, select the option to edit the query in Microsoft Query.
To set a parameter in the query, you will need to modify the SQL statement directly. To do this, click the SQL button in the Query Editor. If you set filter criteria earlier, you should have a WHERE clause already in the query.
To use a parameter, use a "?" character as the placeholder for a field's value in the WHERE clause. For example, if you are importing the Customers table, you can set "Country=?".
- Close the SQL dialog when you are finished editing the SQL statement. You will be prompted to enter a parameter value. Leave the box in the dialog blank, because you will select a cell to provide this value in a later step.
- Click File -> Return Data to Microsoft Excel. The Import Data dialog is displayed. Enter a cell where results should be imported.
- Close the Import Data dialog. You will be prompted to enter a parameter value. Click the button next to the parameter box to select a cell. Select the option to automatically refresh the spreadsheet when the value changes.
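To see how the "?" placeholder behaves outside Excel, the sketch below runs the same kind of parameterized query in Python. It uses the standard-library sqlite3 module with an in-memory table as a stand-in for the Databricks DSN, since sqlite3 also accepts "?" placeholders. The Customers rows and the CompanyName column are hypothetical sample data, not values from Databricks.

```python
import sqlite3

# In-memory stand-in for the Databricks source (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CompanyName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("Alpha GmbH", "Germany"), ("Beta Inc", "USA"), ("Gamma AG", "Germany")],
)

# The "?" marks where the parameter value is substituted at execution time,
# just as Microsoft Query substitutes the value of the selected cell.
QUERY = (
    "SELECT CompanyName, Country FROM Customers "
    "WHERE Country = ? ORDER BY CompanyName"
)


def customers_in(country):
    """Run the parameterized query with the given Country value."""
    return conn.execute(QUERY, (country,)).fetchall()


german_customers = customers_in("Germany")
```

Changing the argument to `customers_in` plays the role of editing the linked cell in the spreadsheet: the SQL text stays the same and only the bound value changes.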