Replicate Data from Multiple Local Files Using CData Sync



CData Sync is able to point to either a directory or a folder, pull all files that are similarly formatted, and replicate them into a single destination table. After the initial replication, CData Sync checks the last modified date of each file in the directory and pulls in only the new files, preventing the replication of duplicate data.

Organizations regularly produce related files for periodic data (like receipts, invoices, etc.). For this article, we presume that there is a local folder on your machine, which contains CSV files, so that we may replicate related data from those files into a Snowflake data warehouse.

To begin, log into your CData Sync instance, so that we may set up a CSV file as a source connector from CData Sync.

Configuring a CSV Source to Read Data Across Multiple Files

To replicate data from CSV files, first set up CSV Source connection in CData Sync.

  1. Log into CData Sync and navigate to the Connections tab. Click + Add Connection, then select Sources, and search for and select CSV.
  2. Enter the connection properties, including Connection Type (Local), Auth Scheme, and URI. The URI should be set to the local directory on your machine where the CSV files are stored. Set Aggregate Files to True. Click Save and Test.

Configuring Snowflake as a Destination

After testing and saving our CSV connection, we need to connect to the destination, in this case, Snowflake.

  1. Click the Connections tab, select Destinations, and choose Snowflake.
  2. Enter your Snowflake connection properties, including Warehouse, URL, Auth Scheme, User, Password, Database, and Schema and click Save and Test.

Replicating Multiple Files Into Snowflake

Now, we are ready to create a job to replicate multiple files from a local folder to our Snowflake data warehouse.

  1. Click Jobs in the Sync interface and then select + Add Job.
  2. Give your job a name and select the CSV source and Snowflake destination connections that we just created. Click Add Job.
  3. After the job has been created, select the job and click on the Task tab. Click + Add Tasks and select AggregatedFiles. Click Add Tasks (1).
  4. You may replicate the data to an existing table or create a new one (to change the destination table, click the edit button for the Destination Information on the Task's Overview tab).
  5. Navigate to the Task's Column table to ensure the column mapping is correct.
  6. On the Job screen, click Run.
  7. Next, check your Snowflake instance to verify that indeed the data from our CSV files was replicated to there.

Free Trial & More Information

Now that you have seen how to replicate data from your local machine to Snowflake using CData Sync, visit our CData Sync page to read more information about Sync and download a free trial. Start consolidating your enterprise data to a data warehouse today! As always, our world-class Support Team is ready to answer any questions you may have.