CData AWS Glue Connector Trial

You can test and build out Proofs-of-Concept for the CData AWS Glue Connectors using the CData JDBC Drivers.

Date Entered: 3/19/2021 Last Updated: 6/9/2021

The CData AWS Glue Connectors are available on a subscription basis only from the AWS Marketplace. If you wish to test the connectivity without subscribing to the Connectors from the Marketplace, you can download a free trial of the CData JDBC Driver for the same data source. Upload the driver to an Amazon S3 bucket, store your connection properties in AWS Secrets Manager, create a custom connector in AWS Glue Studio, and then build Glue Jobs with live connectivity to your chosen data source. Once you have a proof of concept, you can subscribe to the Glue Connector from the AWS Marketplace and simply swap to the new connector in your configured Glue Jobs.

Gathering Necessary Information

There are several key pieces of information you will need in order to create the custom connector and begin working with your data in Glue Studio:

The JDBC URL (connection string) for your chosen data source
An RTK (run-time key) for the JDBC Driver
The Class name for the JDBC Driver
An AWS IAM role for running Glue Jobs

NOTE: This article uses Salesforce as an example data source. You can refer to the Help Documentation for your specific JDBC Driver to find more information about the required connection properties and Class name for the JDBC Driver.

Building the JDBC URL

For assistance in constructing the JDBC URL, use the connection string designer built into the JDBC Driver. Either double-click the JAR file or execute the JAR file from the command-line.

java -jar cdata.jdbc.salesforce.jar

Fill in the connection properties and make note of the connection string.

Using the built-in connection string designer to generate a JDBC URL (Salesforce is shown.)

As an example, a typical connection string for Salesforce looks like the following: jdbc:salesforce:User=myuser@domain.com;Password=mypassword;SecurityToken=mysecuritytoken;

Obtaining the RTK

To complete the JDBC URL, you will need the value of the run-time key (RTK) for your driver. You can obtain an RTK from our Support Team. Once you receive the RTK, you will need to append it as follows to the JDBC URL (again, using Salesforce as an example):

jdbc:salesforce:User=myuser@domain.com;Password=mypassword;SecurityToken=mysecuritytoken;RTK=54321...12345
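If you prefer to assemble the URL in a script rather than by hand, the following minimal Python sketch joins a set of connection properties and appends the RTK last. It assumes the jdbc:<driver>: prefix used in the JDBC URL base later in this article. The function name build_jdbc_url is hypothetical and is not part of the CData driver, and the RTK shown is the same placeholder used above.

```python
def build_jdbc_url(driver, properties):
    """Join connection properties into a 'jdbc:<driver>:Name=Value;...' string.

    Hypothetical helper for illustration; not part of the CData JDBC Driver.
    """
    for name, value in properties.items():
        # A semicolon inside a value would be read as a property delimiter.
        if ";" in str(value):
            raise ValueError(f"Property {name!r} contains ';'")
    pairs = ";".join(f"{name}={value}" for name, value in properties.items())
    return f"jdbc:{driver}:{pairs}"


props = {
    "User": "myuser@domain.com",
    "Password": "mypassword",
    "SecurityToken": "mysecuritytoken",
}
# Append the run-time key last (placeholder value, not a real key).
props["RTK"] = "54321...12345"

url = build_jdbc_url("salesforce", props)
print(url)
```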

JDBC Driver Class Name

The JDBC Driver Class name can be found in the Help documentation for your JDBC Driver in: Getting Started -> Establishing a Connection. As an example, the Driver Class name for the Salesforce JDBC Driver is cdata.jdbc.salesforce.SalesforceDriver.

After gathering the necessary information, you are ready to build an AWS Glue Job with connectivity to your data source.

Create (or Update) an IAM Role

When you create the AWS Glue job, you specify an AWS Identity and Access Management (IAM) role for the job to use. The role must grant access to all resources used by the job, including Amazon S3, for any sources, targets, scripts, temporary directories, and AWS Glue Data Catalog objects. The role must also grant access to the Custom connector from AWS Glue.

The following policies should be added to the IAM role for the AWS Glue job, at a minimum:

AWSGlueServiceRole (For accessing Glue Studio and Glue Jobs)
AmazonEC2ContainerRegistryReadOnly (For accessing the Custom connector created below)

If you will be accessing data found in Amazon S3, add:

AmazonS3FullAccess (For reading from and writing to Amazon S3)

Last, since you will be using AWS Secrets Manager to store confidential connection properties (see more below), you will need to add an inline policy similar to the following, granting access to the specific secrets needed for the Glue Job:


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetResourcePolicy",
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret",
                "secretsmanager:ListSecretVersionIds"
            ],
            "Resource": [
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes128-1a2b3c",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes192-4D5e6F",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes256-7g8H9i"
            ]
        }
    ]
}
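If you manage several Glue Jobs, it can help to generate this inline policy from a list of secret ARNs instead of editing the JSON by hand. The sketch below is one way to do that in Python. The function name secrets_read_policy and the example ARN are hypothetical; substitute the ARN of your own secret.

```python
import json

# The four read-only Secrets Manager actions from the policy shown above.
SECRET_READ_ACTIONS = [
    "secretsmanager:GetResourcePolicy",
    "secretsmanager:GetSecretValue",
    "secretsmanager:DescribeSecret",
    "secretsmanager:ListSecretVersionIds",
]


def secrets_read_policy(secret_arns):
    """Build an inline policy document limited to the given secret ARNs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(SECRET_READ_ACTIONS),
                "Resource": list(secret_arns),
            }
        ],
    }


# Hypothetical ARN for illustration only.
policy = secrets_read_policy(
    ["arn:aws:secretsmanager:us-west-2:111122223333:secret:CDataSalesforceSecret-a1b2c3"]
)
print(json.dumps(policy, indent=4))
```

The resulting JSON can be pasted into the IAM console as an inline policy. Listing each secret explicitly keeps the role from reading unrelated secrets.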

Using the CData JDBC Driver in AWS Glue Studio

To use a CData JDBC Driver in AWS Glue Studio, you need to upload the driver to Amazon S3, create a custom connector & connection, and create a Glue Job.

Upload the CData JDBC Driver

Open the Amazon S3 Console
Select an existing bucket (or create a new one)
Click Upload
Select the JAR file (e.g. cdata.jdbc.salesforce.jar) found in the lib directory in the installation location for the driver

Uploading a JDBC Driver to an Amazon S3 bucket (Salesforce is shown)
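The same upload can be scripted. The sketch below assumes you pass in a boto3 S3 client (boto3.client("s3")) and uses its upload_file method. The function name upload_driver, the bucket name, and the key prefix are hypothetical. It returns the S3 URL of the uploaded JAR, which is the value you would later select as the Connector S3 URL.

```python
import os


def upload_driver(s3_client, jar_path, bucket, prefix=""):
    """Upload the driver JAR and return its s3:// URL.

    s3_client is expected to behave like boto3.client("s3").
    """
    key = prefix + os.path.basename(jar_path)
    s3_client.upload_file(Filename=jar_path, Bucket=bucket, Key=key)
    return f"s3://{bucket}/{key}"


# Example usage (requires boto3 and valid AWS credentials):
#   import boto3
#   url = upload_driver(
#       boto3.client("s3"),
#       "lib/cdata.jdbc.salesforce.jar",  # path inside the driver install directory
#       "my-glue-drivers",                # hypothetical bucket name
#       "jdbc/",                          # hypothetical key prefix
#   )
```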

Store Connection Properties in the Secrets Manager

Open the AWS Secrets Manager Console
Click "Store a new secret"
Select "Other type of secrets" and add a row for each connection property and value from your JDBC URL, including the RTK
Click Next
Name the Secret (e.g. CDataSalesforceSecret) and click Next
Disable credential rotation and click Next
Review and Save the Secret
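As an alternative to typing each row in the console, the key/value pairs can be derived from the JDBC URL you built earlier. The sketch below splits a URL of the form jdbc:<driver>:Name=Value;... into a dictionary and stores it with the create_secret call of a boto3 Secrets Manager client. The function names are hypothetical, and the parser assumes no property value contains a semicolon.

```python
import json


def jdbc_url_to_secret(jdbc_url):
    """Split 'jdbc:<driver>:Name=Value;...' into a {Name: Value} dict."""
    # Drop the 'jdbc' and '<driver>' segments; keep everything after them.
    _, _, prop_string = jdbc_url.split(":", 2)
    secret = {}
    for pair in prop_string.split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only, so values may contain '='.
        name, _, value = pair.partition("=")
        secret[name] = value
    return secret


def store_secret(secrets_client, name, jdbc_url):
    """Store the parsed properties as one secret.

    secrets_client is expected to behave like boto3.client("secretsmanager").
    """
    secret = jdbc_url_to_secret(jdbc_url)
    secrets_client.create_secret(Name=name, SecretString=json.dumps(secret))
    return secret
```

The key names stored in the secret (User, Password, and so on) are the names you will reference as placeholders in the JDBC URL base when creating the custom connector.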

Create a Custom Connector

Open the AWS Glue Studio Console
Navigate to Connectors and click "Create custom connector"
Fill in the Custom connector properties:
- Connector S3 URL: Browse S3 and select the .jar file you uploaded earlier (be sure to select the actual file and not just the bucket containing the .jar file)
- Name: A unique name (e.g. CData Salesforce Connector)
- Connector type: JDBC (default)
- Class name: The class name for the driver (e.g. cdata.jdbc.salesforce.SalesforceDriver)
- JDBC URL base:
  The JDBC URL needs to be specifically formatted for AWS Glue Studio. The URL will include key-value pairs of the connection property name (e.g. "User") and a placeholder for the value, written as a dollar sign ($) followed by the name in curly braces ({...}) (e.g. "${User}"). Use the same name inside the placeholder that you used as the key for each property in your AWS Secret (see above). The JDBC URL base for the AWS Secret above would be as follows:
  jdbc:salesforce:User=${User};Password=${Password};SecurityToken=${SecurityToken};RTK=${RTK}
- URL parameter delimiter: A semi-colon (;)
Click "Create connector"
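Because a malformed placeholder is easy to miss, you may want to generate the JDBC URL base from the list of property names rather than typing it. The following Python sketch does this; the function name glue_jdbc_url_base is hypothetical, and the property names must match the keys stored in your AWS Secret.

```python
def glue_jdbc_url_base(driver, property_names):
    """Build a Glue Studio JDBC URL base with ${Name} placeholders."""
    # Doubled braces in the f-string emit literal '{' and '}' characters.
    placeholders = ";".join(f"{name}=${{{name}}}" for name in property_names)
    return f"jdbc:{driver}:{placeholders}"


url_base = glue_jdbc_url_base(
    "salesforce", ["User", "Password", "SecurityToken", "RTK"]
)
print(url_base)
```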

Create a Connection (From the Custom Connector)

After creating the Connector:

Click on the Custom Connector from the AWS Glue Studio Connectors console
Click "Create connection"
Fill in the Connection properties:
- Name: A unique name (e.g. cdata-jdbc-salesforce)
- Connection credential type: Select "default" (you should see your JDBC URL base from above in Connection URL preview)
- AWS Secret: Select the AWS Secret created earlier
NOTE: You will see fields for your connection properties. Leave these blank. The values will be pulled from the AWS Secret.
Click "Create connection"

Create the Glue Job

After creating the Connection:

In Glue Studio, under "Your connections," select the connection you created

Click "Create job"

The visual job editor appears. A new Source node, derived from the connection, is displayed on the Job graph. In the node details panel on the right, the Source Properties tab is selected for user input.

Configure the Source Node Properties:

You can configure the access options for your connection to the data source in the Source properties tab. Refer to the AWS Glue Studio documentation for more information. Here we provide a simple walk-through.

In the visual job editor, make sure the Source node for your connector is selected. Choose the Source properties tab in the node details panel on the right, if it is not already selected.
The Connection field is populated automatically with the name of the connection associated with the custom connector.
Enter information about the data location in the data source. Provide either a source table name or a query to use to retrieve data from the data source. An example of a query is SELECT * FROM Account.
To pass information from the data source to the transformation nodes, AWS Glue Studio must know the schema of the data. Select "Use Schema Builder" to specify the schema interactively.
Configure the remaining optional fields as needed. You can configure the following:
- Partitioning information - for parallelizing the read operations from the data source
- Data type mappings - to convert data types used in the source data to the data types supported by AWS Glue
- Filter predicate - to select a subset of the data from the data source
You can view the schema generated by this node by choosing the Output schema tab in the node properties panel.
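For reference, the source node settings above correspond to a read call in the job's Python script. The sketch below is an approximation based on our understanding of how AWS Glue reads from custom JDBC connectors: it assumes the "custom.jdbc" connection type and the connectionName, dbTable, and query option names. The helper names are hypothetical; compare against the Script tab of your own job before relying on it.

```python
def source_options(connection_name, table=None, query=None):
    """Build connection options for a custom JDBC source node.

    Exactly one of table or query must be supplied, mirroring the
    choice offered in the Source properties tab.
    """
    if (table is None) == (query is None):
        raise ValueError("Provide exactly one of table or query")
    options = {"connectionName": connection_name}
    if table is not None:
        options["dbTable"] = table
    else:
        options["query"] = query
    return options


def read_source(glue_context, options):
    """Read a DynamicFrame through the custom connector.

    glue_context is expected to be an awsglue GlueContext, which is
    only available inside a running Glue Job.
    """
    return glue_context.create_dynamic_frame.from_options(
        connection_type="custom.jdbc",
        connection_options=options,
    )


opts = source_options("cdata-jdbc-salesforce", query="SELECT * FROM Account")
print(opts)
```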

Edit, Save, and Run the Job

Edit the job by adding and editing the nodes in the job graph. See Editing ETL jobs in AWS Glue Studio for more information.

After you complete editing the job, enter the job properties.

Select the Job properties tab above the visual graph editor.
Configure the following job properties when using custom connectors:
- Name: Provide a job name
- IAM Role: Choose (or create) an IAM role with the necessary permissions, as described previously
- Type: Choose "Spark"
- Glue version: Choose "Glue 2.0 - Supports spark 2.4, Scala 2, Python 3"
- Language: Choose "Python 3"
- Use the default values for the other parameters. For more information about job parameters, see "Defining Job Properties" in the AWS Glue Developer Guide
At the top of the page, choose "Save."
A green top banner appears with the message: "Successfully created Job."
After you successfully save the job, you can choose "Run" to run the job.
To view the generated script for the job, choose the "Script" tab at the top of the visual editor. The "Job runs" tab shows the job run history for the job. For more information about job run details, see "View information for recent job runs."
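Once the job is saved, you can also start it and watch its status from a script instead of the console. The sketch below assumes a boto3 Glue client (boto3.client("glue")) and uses its start_job_run and get_job_run calls. The function name run_job_and_wait, the polling interval, and the job name in the usage comment are hypothetical.

```python
import time

# Job run states after which the run will not change again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def run_job_and_wait(glue_client, job_name, poll_seconds=30, max_polls=120):
    """Start a Glue Job and poll until it reaches a terminal state.

    Returns (run_id, final_state). glue_client is expected to behave
    like boto3.client("glue").
    """
    run_id = glue_client.start_job_run(JobName=job_name)["JobRunId"]
    for _ in range(max_polls):
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)
        state = run["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return run_id, state
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_name!r} run {run_id} did not finish in time")


# Example usage (requires boto3 and valid AWS credentials):
#   import boto3
#   run_id, state = run_job_and_wait(boto3.client("glue"), "my-salesforce-job")
```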

At this point, you have live connectivity to your data source from your Glue Jobs. You can configure Jobs to perform data flows to and from the AWS ecosystem and any of the data sources available from CData. You will be able to build out your proofs-of-concept and otherwise test the connector as needed.

Changing to Use the Glue Connector

Once you have tested the connectivity and its use in AWS Glue Studio, you can subscribe to the AWS Glue Connector. We have a full deployment guide for the CData AWS Glue Connector for Salesforce, but the principles apply to any of the CData AWS Glue Connectors. Once you have deployed the CData AWS Glue Connector and configured a connection, you can open your Glue Job, select your source node(s) that use the Custom Connector, and change them to use the CData AWS Glue Connector and the corresponding connection.

Changing from a Custom Connector to an AWS Glue Connector.

We appreciate your feedback. If you have any questions, comments, or suggestions about this entry, please contact our support team at support@cdata.com.
