How to update Databricks from Excel



This article explains how to transfer data from Excel to Databricks using the Excel Add-In for Databricks.

The CData Excel Add-In for Databricks enables you to edit and save Databricks data directly from Excel. This technique is useful if you want to work on Databricks data in Excel and push your changes back, or if you have a whole spreadsheet you want to import into Databricks. This example uses the Customers table; however, the same process works for any table that the CData Excel Add-In can retrieve.

About Databricks Data Integration

Accessing and integrating live data from Databricks has never been easier with CData. Customers rely on CData connectivity to:

  • Access all versions of Databricks, from Runtime Versions 9.1 - 13.X to both the Pro and Classic Databricks SQL versions.
  • Leave Databricks in their preferred environment thanks to compatibility with any hosting solution.
  • Securely authenticate in a variety of ways, including personal access token, Azure Service Principal, and Azure AD.
  • Upload data to Databricks using Databricks File System, Azure Blob Storage, and AWS S3 Storage.

While many customers use CData's solutions to migrate data from different systems into their Databricks data lakehouse, several customers use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers use SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMS.

Read more about common Databricks use cases and how CData's solutions help solve data problems in our blog: What is Databricks Used For? 6 Use Cases.


Getting Started


Establish a Connection

If you have not already done so, create a new Databricks connection by clicking From Databricks on the ribbon.

To connect to a Databricks cluster, set the properties as described below.

Note: The required values can be found in your Databricks instance by navigating to Clusters, selecting the desired cluster, and opening the JDBC/ODBC tab under Advanced Options.

  • Server: Set to the Server Hostname of your Databricks cluster.
  • HTTPPath: Set to the HTTP Path of your Databricks cluster.
  • Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).
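As a convenience, the three properties above can be gathered and checked before you type them into the connection dialog. The Python sketch below is illustrative only and is not part of the add-in: the function name build_connection_string is hypothetical, and the semicolon-delimited Key=Value layout is an assumption made for this example. Only the property names (Server, HTTPPath, Token) come from the list above.

```python
def build_connection_string(server: str, http_path: str, token: str) -> str:
    """Join the three connection properties into a 'Key=Value;' string.

    The output layout is an assumption for illustration; the add-in itself
    collects these values through its connection dialog.
    """
    props = {"Server": server, "HTTPPath": http_path, "Token": token}

    # Reject empty or whitespace-only values early, naming each one.
    missing = [name for name, value in props.items() if not value.strip()]
    if missing:
        raise ValueError("Missing connection properties: " + ", ".join(missing))

    # Strip stray whitespace that often sneaks in when copying from a web page.
    return "".join(f"{name}={value.strip()};" for name, value in props.items())


# Placeholder values only; substitute the ones from the JDBC/ODBC tab.
example = build_connection_string(
    "my-workspace.example.com",
    "sql/protocolv1/o/0/example-cluster",
    "example-token",
)
print(example)
```

Checking for blank values up front makes a copy-and-paste mistake visible before the connection attempt fails with a less specific error.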

Retrieve Data from Databricks

To insert data into Databricks, you first need to retrieve data from the Databricks table you want to add to. This links the Excel spreadsheet to the selected Databricks table. After you retrieve data, any changes you make to the data are highlighted in red.

  1. Click the From Databricks button on the CData ribbon. The Data Selection wizard is displayed.
  2. In the Table or View menu, select the Customers table.
  3. In the Maximum Rows menu, select the number of rows you want to retrieve. If you want to insert rows, you need to retrieve only one row. The Query box will then display the SQL query that corresponds to your request.
  4. In the Sheet Name box, enter the name for the sheet that will be populated. By default, the add-in creates a new sheet with the name of the table.
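To give a sense of the kind of query that a table and row-limit selection corresponds to, here is a minimal Python sketch. The helper build_select_query is hypothetical, and the exact text the add-in shows in the Query box may differ (for example, in how it quotes identifiers); the sketch only assumes a plain SELECT with a LIMIT clause.

```python
import re


def build_select_query(table: str, max_rows: int) -> str:
    """Build a simple 'SELECT * ... LIMIT n' statement for one table.

    Hypothetical helper: the add-in generates its own query text, which
    may not match this output character for character.
    """
    # Accept only plain identifiers so the table name cannot smuggle in SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"Not a plain table identifier: {table!r}")
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    return f"SELECT * FROM {table} LIMIT {max_rows}"


# Retrieving a single row is enough when the goal is to insert new rows.
print(build_select_query("Customers", 1))
```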

Insert Rows into Databricks

After retrieving data, you can add data from an existing spreadsheet in Excel.

  1. In a cell after the last row, enter a formula referencing the corresponding cell from the other spreadsheet; for example, =MyCustomersSheetInExcel!A1.
  2. After using a formula to reference the cells you want to add to Databricks, select the cells that you are inserting data into and drag the formula down as far as needed. The referenced values you want to add will be displayed on the Customers sheet.
  3. Highlight the rows you want to insert and click the Update Rows button.

As each row is inserted, the Id value will appear in the Id column and the row's text will change to black, indicating that the record has been inserted.
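The fill pattern from steps 1 and 2 can be previewed outside Excel. The Python sketch below lists the relative references that result from entering a formula such as =MyCustomersSheetInExcel!A1 and filling it across and down. The function names are hypothetical, and the sketch assumes the source sheet name contains no spaces (Excel requires single quotes around sheet names that do).

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("Column index must be 1 or greater")
    letters = ""
    while index > 0:
        # Excel columns are a bijective base-26 system, hence the minus one.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def reference_formulas(sheet, first_row, row_count, column_count):
    """Return a grid of formulas referencing cells on another sheet.

    Each inner list is one spreadsheet row; columns always start at A.
    """
    return [
        [f"={sheet}!{column_letter(col)}{row}" for col in range(1, column_count + 1)]
        for row in range(first_row, first_row + row_count)
    ]


# Two rows by three columns, starting from A1 on the source sheet.
for formula_row in reference_formulas("MyCustomersSheetInExcel", 1, 2, 3):
    print(formula_row)
```

Comparing this grid with the cells on the Customers sheet is a quick way to confirm that the fill covered every row and column you intended to insert.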

Ready to get started?

Download a free trial of the Excel Add-In for Databricks to get started.

Learn more:

Excel Add-In for Databricks

The Databricks Excel Add-In is a powerful tool that allows you to connect with live Databricks data, directly from Microsoft Excel.

Use Excel to read, write, and update Databricks. Perfect for mass imports, exports, and updates, data cleansing and de-duplication, Excel-based data analysis, and more!