ETL

The term Extract, Transform, Load (ETL) refers to a process where data is Extracted from one of more possibly disparate data sources,  Transformed as required to the desired final format, and Loaded into the destination

 

Data Quality

We usually find “bad data” in the sources,  so Data Quality and profiling is usually required as part of the preparation phase before we design the ETL steps.  Once we know what we are dealing with we accommodate it in the Load and Transform steps

 

Extract

Sources can include databases, spreadsheets, XML, JSON, flat files – pretty much any data source without limit.  We once developed a system which processed incoming emails with an Excel attachment,  which were loaded into Oracle – all logged and event based emails sent as required.

 

Transform

Where there are diverse data sources this is unlikely to be “one size fits all” – we are likely to have to develop different Transform processes for different sources.  The target is to cleanse,  validate and format the data so it is ready to be loaded into the target data store/s.  This stage my include the application of business rules and chcks on data integrity

 

Load

The resulting data is usually loaded into a single target,  such as a data warehouse or new repository but there is no reason why there shouldn’t be multiple load targets

 

Integration vs Migration

These have a lot in common,  and both may use similar concepts and ETL tools.  The business case and objectives are different however:

 

Data Integration

This is an ongoing process – as new/modified data appears in the source it is propagated to the target,  either in near real time (seconds or low minutes) or periodically (eg daily or monthly).  The target is often a data warehouse or business reporting system. Unlike migration,  the emphasis is on synchronisation. Often used for the business to get “a single version of the truth” where underlying data is spread over different systems. Commercial  ETL or our own data integration tools may be used.

 

Data Migration

This is usually run once,  but there are use cases where it may be repeated periodically.  Unlike Integration,  the emphasis is on a “one-off” transfer. This is commonly used to move data from a legacy system or systems onto a new platform or from desktop databases to an enterprise database;  the source systems are usually decommissioned once the process is complete. Commercial  ETL or our own data migration tools may be used.

 

Where EIT can help

We can provide specialist  consultancy skills either on or off premise to supplement your in-house team,  covering all aspects of your project from analysis and design through to technical implementation using our experienced developers.

Our recommended ETL toolset is Talend (we are a Talend partner) but also use out own ETL tools where appropriate.

More on Talend