The term Extract, Transform, Load (ETL) refers to a process where data is Extracted from one of more possibly disparate data sources, Transformed as required to the desired final format, and Loaded into the destination
We usually find “bad data” in the sources, so Data Quality and profiling is usually required as part of the preparation phase before we design the ETL steps. Once we know what we are dealing with we accommodate it in the Load and Transform steps
Sources can include databases, spreadsheets, XML, JSON, flat files – pretty much any data source without limit. We once developed a system which processed incoming emails with an Excel attachment, which were loaded into Oracle – all logged and event based emails sent as required.
Where there are diverse data sources this is unlikely to be “one size fits all” – we are likely to have to develop different Transform processes for different sources. The target is to cleanse, validate and format the data so it is ready to be loaded into the target data store/s. This stage my include the application of business rules and chcks on data integrity
The resulting data is usually loaded into a single target, such as a data warehouse or new repository but there is no reason why there shouldn’t be multiple load targets
Integration vs Migration
These have a lot in common, and both may use similar concepts and ETL tools. The business case and objectives are different however:
This is an ongoing process – as new/modified data appears in the source it is propagated to the target, either in near real time (seconds or low minutes) or periodically (eg daily or monthly). The target is often a data warehouse or business reporting system. Unlike migration, the emphasis is on synchronisation. Often used for the business to get “a single version of the truth” where underlying data is spread over different systems. Commercial ETL or our own data integration tools may be used.
This is usually run once, but there are use cases where it may be repeated periodically. Unlike Integration, the emphasis is on a “one-off” transfer. This is commonly used to move data from a legacy system or systems onto a new platform or from desktop databases to an enterprise database; the source systems are usually decommissioned once the process is complete. Commercial ETL or our own data migration tools may be used.
Where EIT can help
We can provide specialist consultancy skills either on or off premise to supplement your in-house team, covering all aspects of your project from analysis and design through to technical implementation using our experienced developers.
Our recommended ETL toolset is Talend (we are a Talend partner) but also use out own ETL tools where appropriate.