
Data migration using Kettle (Pentaho)

February 19, 2009

I have been using Kettle (a Pentaho tool) for quite some time to migrate data from all kinds of data sources to PostgreSQL, and I can now say it has handled every job I have thrown at it.

All you need is the JDBC driver for your source and target databases. Most common database drivers already ship with the standard Kettle installation, but if your data source is not covered you can easily configure it by placing the JDBC driver in Kettle's classes folder. That is exactly what I did when I had to convert DBF files to PostgreSQL: I downloaded a DBF JDBC driver, configured it, and moved the data across in just a few minutes.
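If you want to confirm that a newly added driver is actually being picked up, you can open a connection by hand before building the transformation. Here is a throwaway sanity check in the Rhino JavaScript that Kettle's scripting step runs (Rhino can call straight into Java classes); the DBF driver class name and URL format below are just placeholders for illustration, not the exact driver I used:

    // Throwaway check: load the driver and open a connection by hand.
    // The driver class and URL are hypothetical; adjust them to match
    // the jar you dropped into the Kettle installation.
    java.lang.Class.forName("com.hxtt.sql.dbf.DBFDriver");
    var conn = java.sql.DriverManager.getConnection("jdbc:dbf:/data/dbf_files");
    java.lang.System.out.println("Connected: " + !conn.isClosed());
    conn.close();

If that prints "Connected: true", Kettle's Table Input and Output steps will be able to use the same driver.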

Kettle also provides plenty of options for data cleansing, all available as drag-and-drop controls in its very user-friendly GUI. If the built-in controls do not cater for your needs, you can always add custom logic in JavaScript; I did this as well and was able to cover almost everything that way. You can load and save your jobs, and Kettle draws a clear diagram showing every step involved in the ETL work, which is divided into transformations and jobs.
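For instance, a typical cleansing step might trim a name field and parse a legacy date column. The snippet below is a minimal sketch of what goes into a "Modified Java Script Value" step, with made-up field names (customer_name, signup_date_raw); inside the step every input field is available as a variable, and Kettle ships helper functions such as trim() and str2date():

    // Cleansing sketch with hypothetical field names; each input field
    // from the stream is available as a variable of the same name.
    var clean_name = customer_name == null ? "" : trim(customer_name);
    // Parse a legacy yyyyMMdd string into a real date value.
    var signup_date = str2date(signup_date_raw, "yyyyMMdd");
    // List clean_name and signup_date in the step's output fields grid
    // so they are added to the rows flowing to the next step.

Anything the standard steps cannot express usually fits in a few lines like these.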

I have personally tested it by migrating data from Oracle, SQL Server, CSV, XML and DBF to PostgreSQL without much trouble, so I highly recommend Kettle to anyone looking for an open source data migration tool that works with PostgreSQL.

Shoaib Mir
shoaibmir[@]gmail.com