How to Build an ETL App for Excel Online Data in Python with CData



Create ETL applications and real-time data pipelines for Excel Online data in Python with petl.

The rich ecosystem of Python modules lets you get to work quickly and integrate your systems more effectively. With the CData Python Connector for Excel Online and the petl framework, you can build Excel Online-connected applications and pipelines for extracting, transforming, and loading Excel Online data. This article shows how to connect to Excel Online with the CData Python Connector and use petl and pandas to extract, transform, and load Excel Online data.

With built-in, optimized data processing, the CData Python Connector offers unmatched performance for interacting with live Excel Online data in Python. When you issue complex SQL queries against Excel Online, the driver pushes supported SQL operations, like filters and aggregations, directly to Excel Online and uses the embedded SQL engine to process unsupported operations (often SQL functions and JOIN operations) client-side.

Connecting to Excel Online Data

Connecting to Excel Online data looks just like connecting to any relational data source. Create a connection string using the required connection properties. For this article, you will pass the connection string as a parameter to the connector's connect function.

You can connect to a workbook by providing authentication to Excel Online and then setting the following properties:

  • Workbook: Set this to the name or Id of the workbook.

    If you want to view a list of information about the available workbooks, execute a query to the Workbooks view after you authenticate.

  • UseSandbox: Set this to true if you are connecting to a workbook in a sandbox account. Otherwise, leave this blank to connect to a production account.
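The properties above end up as semicolon-separated Name=Value pairs in a single connection string. As a minimal sketch, the snippet below assembles such a string from a dictionary. The helper name build_connection_string and the workbook name Test.xlsx are hypothetical, chosen only for illustration; the connector itself only ever sees the final string.

```python
def build_connection_string(props):
    """Join connection properties into 'Name=Value;Name=Value' form.

    Hypothetical helper for illustration; not part of the CData connector.
    Properties whose value is None or empty are skipped, which mirrors
    leaving an optional setting such as UseSandbox blank.
    """
    parts = []
    for name, value in props.items():
        if value is None or value == "":
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{name}={value}")
    return ";".join(parts)


conn_str = build_connection_string({
    "InitiateOAuth": "GETANDREFRESH",
    "OAuthSettingsLocation": "/PATH/TO/OAuthSettings.txt",
    "Workbook": "Test.xlsx",   # placeholder workbook name (assumption)
    "UseSandbox": None,        # left blank: production account
})
print(conn_str)
```

This sketch does not handle values that themselves contain semicolons; check the help documentation for the quoting rules if a property value needs one.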

Excel Online uses the OAuth authentication standard. See the Getting Started section in the help documentation for a guide to authenticating. Getting Started also walks you through executing SQL against worksheets and ranges.

After installing the CData Excel Online Connector, follow the procedure below to install the other required modules and start accessing Excel Online through Python objects.

Install Required Modules

Use the pip utility to install the required modules and frameworks:

pip install petl
pip install pandas

Build an ETL App for Excel Online Data in Python

Once the required modules and frameworks are installed, we are ready to build our ETL app. Code snippets follow, but the full source code is available at the end of the article.

First, be sure to import the modules (including the CData Connector) with the following:

import petl as etl
import pandas as pd
import cdata.excelonline as mod

You can now connect with a connection string. Use the connect function for the CData Excel Online Connector to create a connection for working with Excel Online data.

cnxn = mod.connect("InitiateOAuth=GETANDREFRESH;OAuthSettingsLocation=/PATH/TO/OAuthSettings.txt")

Create a SQL Statement to Query Excel Online

Use SQL to create a statement for querying Excel Online. In this article, we read data from the Test_xlsx_Sheet1 entity.

sql = "SELECT Id, Column1 FROM Test_xlsx_Sheet1 WHERE Column2 = 'Bob'"
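Hard-coding 'Bob' in the statement is fine for a demo. When the filter value comes from user input, passing it as a query parameter avoids quoting problems. The sketch below uses Python's built-in sqlite3 module as a stand-in for the connector, because it follows the same DB-API pattern of execute(sql, params). Whether the CData connector accepts the same ? placeholder style is an assumption here and should be checked against its help documentation. The sample rows are made up.

```python
import sqlite3

# Stand-in for the Excel Online connection (assumption): an in-memory
# SQLite database with a table shaped like the article's Test_xlsx_Sheet1.
cnxn = sqlite3.connect(":memory:")
cnxn.execute("CREATE TABLE Test_xlsx_Sheet1 (Id TEXT, Column1 TEXT, Column2 TEXT)")
cnxn.executemany(
    "INSERT INTO Test_xlsx_Sheet1 VALUES (?, ?, ?)",
    [("1", "zeta", "Bob"), ("2", "alpha", "Bob"), ("3", "beta", "Alice")],
)

# The filter value is passed separately instead of being pasted into the SQL.
sql = "SELECT Id, Column1 FROM Test_xlsx_Sheet1 WHERE Column2 = ?"
rows = cnxn.execute(sql, ("Bob",)).fetchall()
print(rows)
cnxn.close()
```

petl's fromdb function forwards any extra positional arguments to the cursor's execute method, so the same parameter tuple could in principle be passed through it, subject to the placeholder style the connection supports.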

Extract, Transform, and Load the Excel Online Data

With the connection open and the query defined, we can use petl to extract, transform, and load the Excel Online data. In this example, we extract Excel Online data, sort the data by the Column1 column, and load the data into a CSV file.

Loading Excel Online Data into a CSV File

table1 = etl.fromdb(cnxn,sql)

table2 = etl.sort(table1,'Column1')

etl.tocsv(table2,'test_xlsx_sheet1_data.csv')
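To make the three steps above concrete, the sketch below performs the same extract, sort, and load sequence using only the standard library: sqlite3 stands in for the Excel Online connection, sorted() for etl.sort, and csv.writer for etl.tocsv. It illustrates what the pipeline produces, not how petl is implemented, and the table contents are made up.

```python
import csv
import io
import sqlite3

# Stand-in source (assumption): in-memory SQLite instead of Excel Online.
cnxn = sqlite3.connect(":memory:")
cnxn.execute("CREATE TABLE Test_xlsx_Sheet1 (Id TEXT, Column1 TEXT, Column2 TEXT)")
cnxn.executemany(
    "INSERT INTO Test_xlsx_Sheet1 VALUES (?, ?, ?)",
    [("1", "zeta", "Bob"), ("2", "alpha", "Bob"), ("3", "beta", "Alice")],
)

# Extract: run the query and take the header from the cursor description.
cur = cnxn.execute("SELECT Id, Column1 FROM Test_xlsx_Sheet1 WHERE Column2 = 'Bob'")
header = [col[0] for col in cur.description]
rows = cur.fetchall()

# Transform: sort the data rows by the Column1 field.
key_index = header.index("Column1")
sorted_rows = sorted(rows, key=lambda row: row[key_index])

# Load: write header plus rows as CSV (to a string buffer here, not a file).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(sorted_rows)
csv_text = buffer.getvalue()
print(csv_text)
cnxn.close()
```

Running the equivalent against a small, known data set like this one is a quick way to confirm the sort key and column order before pointing the petl pipeline at a live workbook.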

In the following example, we add new rows to the Test_xlsx_Sheet1 table.

Adding New Rows to Excel Online

table1 = [ ['Id','Column1'], ['NewId1','NewColumn11'], ['NewId2','NewColumn12'], ['NewId3','NewColumn13'] ]

etl.appenddb(table1, cnxn, 'Test_xlsx_Sheet1')
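In the list-of-lists table above, petl treats the first inner list as the header and every following list as a data row. A short pre-flight check can catch a row with a missing or extra value before anything is sent to Excel Online. The helper check_table below is hypothetical and optional; petl does not require it.

```python
def check_table(table):
    """Return (header, rows) after verifying every row matches the header width.

    Hypothetical pre-flight helper for illustration; not part of petl.
    """
    if not table:
        raise ValueError("table is empty: expected a header row")
    header, rows = table[0], table[1:]
    for number, row in enumerate(rows, start=1):
        if len(row) != len(header):
            raise ValueError(
                f"row {number} has {len(row)} values, expected {len(header)}"
            )
    return header, rows


new_rows = [
    ['Id', 'Column1'],
    ['NewId1', 'NewColumn11'],
    ['NewId2', 'NewColumn12'],
    ['NewId3', 'NewColumn13'],
]
header, rows = check_table(new_rows)
print(header, len(rows))
```

If the check passes, the same list can be handed to etl.appenddb unchanged.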

With the CData Python Connector for Excel Online, you can work with Excel Online data just like you would with any database, including direct access to data in ETL packages like petl.

Free Trial & More Information

Download a free, 30-day trial of the CData Python Connector for Excel Online to start building Python apps and scripts with connectivity to Excel Online data. Reach out to our Support Team if you have any questions.



Full Source Code


import petl as etl
import pandas as pd
import cdata.excelonline as mod

cnxn = mod.connect("InitiateOAuth=GETANDREFRESH;OAuthSettingsLocation=/PATH/TO/OAuthSettings.txt")

sql = "SELECT Id, Column1 FROM Test_xlsx_Sheet1 WHERE Column2 = 'Bob'"

table1 = etl.fromdb(cnxn,sql)

table2 = etl.sort(table1,'Column1')

etl.tocsv(table2,'test_xlsx_sheet1_data.csv')

table3 = [ ['Id','Column1'], ['NewId1','NewColumn11'], ['NewId2','NewColumn12'], ['NewId3','NewColumn13'] ]

etl.appenddb(table3, cnxn, 'Test_xlsx_Sheet1')
