Data Engineering Resources



ETL Data Transformation

Transformation is the process of cleansing and aggregating data to prepare it for analysis. From an architectural standpoint, there are two ways to categorize ETL transformation:

  • Multistage data transformation - This is the classic extract, transform, load process. Extracted data is moved to a staging area where transformations happen 'in-flight', or before loading data into a warehouse.
  • In-warehouse data transformation - The process changes to resemble ELT. Data is extracted and loaded into the warehouse, and transformations are pushed down for processing in the underlying data warehouse.

Today, transforming data in-warehouse rather than prior to loading has become common for two reasons:

  • The scalability and increased performance of the modern analytics database
  • The potential to write in-database transformations in SQL, the data analysis language of choice for most analysts

These advantages offer tremendous benefits to project cost and speed, making it the default method for most data integration projects.

ETL Transformation Types

No matter where in the process transformation takes place, it's a significant step in the analytics workflow. Transformations prepare the data for analysis. Some of the most common types include.

Basic ETL Data Transformations

  • Deduplication - Finding and removing duplicates
  • Cleaning - Mapping values like Chief Executive Officer to CEO, zero to 0, N to No and Y to Yes, etc.
  • Key restructuring - Establishing key relationships across tables
  • Format revision - Character set conversion, unit of measurement conversion, date/time conversion, etc.

Advanced ETL Data Transformations

  • Summarization: Values are summarized to obtain total figures which are calculated and stored at multiple levels as business metrics, e.g. summing all expenditures to attain a total cost metric.
  • Derivation: Applying business rules to your data that extract new calculated values from existing data, for example, creating a profitability metric that accounts for costs.
  • Filtering: Selecting only certain rows and/or columns.
  • Integration : Give each data element one standard name with one standard definition. Data integration integrates data from across sources that use different data names and values for the same data element.
  • Aggregation : Data elements are collected from multiple data sources and databases.
  • Joining: Connecting data from multiple sources - for example, adding ad spend data across multiple platforms, such as Google Adwords and Facebook Ads.
  • Splitting: Splitting a single column into multiple columns.
  • Data validation: Simple or complex data validation.

CData Sync: Powerful ETL & ELT Data Transformation

CData Sync is a secure, intuitive solution for managing enterprise data replication, ETL and ELT. Create petabyte-scale data replication workflows in minutes and connect more than 100 enterprise data sources to every major data warehouse. Manage ETL and ELT on-demand. Download a free trial to get started today.





Ready to get started?

Automate data replication from any data source to any database or data warehouse with a few clicks.

Download for a free trial:


FREE TRIAL