How to create an RPA flow for Databricks Data in UiPath Studio



Use the Databricks ODBC Driver to create workflows that access real-time Databricks data without any coding.

UiPath is a Robotic Process Automation (RPA) platform with rich features and an easy-to-use UI that enables non-developers to create process automation. In UiPath Studio, you build an RPA program much as you would draw a diagram. With the CData ODBC Driver for Databricks, users can embed Databricks data in the workflow.

This article walks through using the Databricks ODBC Driver in UiPath Studio to create an RPA program that accesses Databricks data.

About Databricks Data Integration

Accessing and integrating live data from Databricks has never been easier with CData. Customers rely on CData connectivity to:

  • Access all versions of Databricks from Runtime Versions 9.1 - 13.X to both the Pro and Classic Databricks SQL versions.
  • Leave Databricks in their preferred environment thanks to compatibility with any hosting solution.
  • Securely authenticate in a variety of ways, including personal access token, Azure Service Principal, and Azure AD.
  • Upload data to Databricks using Databricks File System, Azure Blob Storage, and AWS S3 Storage.

While many customers are using CData's solutions to migrate data from different systems into their Databricks data lakehouse, several customers use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers are using SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMS.

Read more about common Databricks use-cases and how CData's solutions help solve data problems in our blog: What is Databricks Used For? 6 Use Cases.


Getting Started


Configure the Connection to Databricks

If you have not already, first specify connection properties in an ODBC DSN (data source name). This is the last step of the driver installation. You can use the Microsoft ODBC Data Source Administrator to create and configure ODBC DSNs.

To connect to a Databricks cluster, set the properties as described below.

Note: You can find the required values in your Databricks instance by navigating to Clusters, selecting the desired cluster, and opening the JDBC/ODBC tab under Advanced Options.

  • Server: Set to the Server Hostname of your Databricks cluster.
  • HTTPPath: Set to the HTTP Path of your Databricks cluster.
  • Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).
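To sanity-check the three values before entering them in the DSN, the following Python sketch assembles them into an ODBC-style connection string. The function name build_connection_string and the sample values are hypothetical, and the driver name is an assumption about how the driver is registered on your machine; check the ODBC Data Source Administrator for the actual name.

```python
def build_connection_string(server, http_path, token,
                            driver="CData ODBC Driver for Databricks"):
    """Assemble a DSN-less ODBC connection string from the three properties.

    The default driver name is an assumption; use the name shown in the
    ODBC Data Source Administrator on your machine.
    """
    props = {"Server": server, "HTTPPath": http_path, "Token": token}

    # Fail early if any of the three required properties is empty.
    missing = [name for name, value in props.items() if not value]
    if missing:
        raise ValueError("Missing connection properties: " + ", ".join(missing))

    parts = ["Driver={" + driver + "}"]
    parts += [f"{name}={value}" for name, value in props.items()]
    return ";".join(parts) + ";"


# Placeholder values only; substitute the ones from the JDBC/ODBC tab.
example = build_connection_string(
    server="example-host.cloud.databricks.com",
    http_path="/sql/example/path",
    token="example-token",
)
print(example)
```

The sketch only builds and validates the string. It does not open a connection, and it does not handle values that themselves contain semicolons or braces.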

Connect UiPath Studio to Databricks Data

You are now ready to use the Databricks ODBC DSN in UiPath Studio. Follow these steps:

  1. From the Start page, click Blank to create a New Project.
  2. Click Manage Packages then search for and install UiPath.Database.Activities.
  3. Navigate to the Activities panel and drop a Flowchart (Workflow -> Flowchart -> Flowchart) onto the process.
  4. Drop a database Connect activity (App Integration -> Database -> Connect) after the Start activity.
  5. Double-click the Connect activity and configure the Connection.
    1. Click the Connection Wizard
    2. Select "Microsoft ODBC Data Source"
    3. In Connection Properties, select your DSN (CData Databricks Source) and click OK
  6. To store the connection, create a variable and bind it to the DatabaseConnection property under Output in the Properties section.

Create an Execute Query Activity

With the connection configured, we are ready to query Databricks data in our RPA.

  1. From the Activities navigation, select Execute Query and drop it on the Flowchart.
  2. Double-click the Execute Query activity and set the properties as follows:
    • ExistingDbConnection: Your Connection variable
    • Sql: A SELECT statement, such as SELECT City, CompanyName FROM Customers WHERE Country = 'US'
    • DataTable: Create and use a variable with the Type System.Data.DataTable
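To illustrate what the Execute Query step produces, here is a minimal Python sketch that runs the same SELECT and collects the column names and rows. It uses an in-memory SQLite database as a stand-in for the ODBC connection, and the Customers rows are made up for the example; none of this is UiPath or driver code.

```python
import sqlite3

# Stand-in for the ODBC connection: an in-memory SQLite database with a
# made-up Customers table. Against Databricks, the same SELECT would run
# through the DSN instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (City TEXT, CompanyName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        ("Raleigh", "Example Co", "US"),
        ("Toronto", "Sample Ltd", "CA"),
        ("Austin", "Demo Inc", "US"),
    ],
)


def run_query(connection, country):
    """Run the article's SELECT with the country passed as a parameter."""
    cursor = connection.execute(
        "SELECT City, CompanyName FROM Customers WHERE Country = ?",
        (country,),
    )
    # cursor.description holds one tuple per result column; item 0 is the name.
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()


columns, rows = run_query(conn, "US")
print(columns)
for row in rows:
    print(row)
```

The columns and rows pair plays the role of the DataTable variable: a header plus a list of records that the next activity can consume. Passing the country as a parameter rather than concatenating it into the SQL text is a design choice of this sketch.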

Create a Write CSV Activity

With the Connection and Execute Query activities configured, we are ready to add a Write CSV activity to the Flowchart to replicate the Databricks data.

  1. From the Activities navigation, select Write CSV and drop it after the Execute Query activity.
  2. Double-click the Write CSV activity and set the properties as follows:
    • FilePath: Set to a file (new or existing) on disk (e.g., C:\UiPath[id]-data.csv)
    • DataTable: Set to the DataTable variable you created earlier
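The export step can be pictured as the following Python sketch, which writes a header row and the data rows to a CSV file. The function name write_csv, the sample rows, and the temporary output path are hypothetical. The sketch overwrites any existing file, which is an assumption of this example and not a statement about how the Write CSV activity is configured.

```python
import csv
import os
import tempfile


def write_csv(path, columns, rows):
    """Write a header row followed by the data rows, replacing any existing file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(rows)
    return len(rows)


# Made-up sample data standing in for the DataTable from the query step.
sample_columns = ["City", "CompanyName"]
sample_rows = [("Raleigh", "Example Co"), ("Austin", "Demo Inc")]

# Write to a temporary directory so the example does not touch real files.
output_path = os.path.join(tempfile.mkdtemp(), "customers-data.csv")
written = write_csv(output_path, sample_columns, sample_rows)
print(f"Wrote {written} rows to {output_path}")
```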

Connect the Activities and Run the Flowchart

If they are not already connected, connect each Activity that you created to complete the RPA project for extracting Databricks data and exporting it to CSV.

Click Run to extract Databricks data and create a CSV file.

In this article, we used the CData ODBC Driver for Databricks to create an automation flow that accesses Databricks data in UiPath Studio. Download a free, 30-day trial of the ODBC Driver and start working with live Databricks data in UiPath Studio today!

Learn more:

Databricks ODBC Driver

The Databricks ODBC Driver is a powerful tool that allows you to connect with live data from Databricks, directly from any applications that support ODBC connectivity.

Access Databricks data like you would a database - read, write, and update through a standard ODBC Driver interface.