
Data migration using Kettle (Pentaho)

I have been using Kettle (a Pentaho tool) for quite some time to migrate data from different kinds of data sources to PostgreSQL, and I can now say it has handled all kinds of jobs for me.

All you need is the JDBC driver for your source and target databases, and that is it. Most database drivers already ship with the standard Kettle installation, but if you need a data source that is not covered, you can easily configure it by placing its JDBC driver in Kettle's classes folder. I did exactly that when I had to convert DBF files to PostgreSQL: I downloaded a DBF JDBC driver, configured it, and moved the data across in just a few minutes.
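
As a rough illustration of what "only needing the JDBC driver" means, the sketch below simply opens a PostgreSQL connection with the driver jar on the classpath; the host, database name and credentials are placeholders, and the same connection details are what you would type into Kettle's database connection dialog.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class PgConnectionCheck {
        public static void main(String[] args) throws Exception {
            // Older JDBC drivers may need to be loaded explicitly; newer ones
            // register themselves with DriverManager once the jar is on the classpath.
            Class.forName("org.postgresql.Driver");

            // Placeholder host, database and credentials.
            String url = "jdbc:postgresql://localhost:5432/targetdb";
            Connection conn = DriverManager.getConnection(url, "postgres", "secret");
            System.out.println("Connected to PostgreSQL "
                    + conn.getMetaData().getDatabaseProductVersion());
            conn.close();
        }
    }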

Kettle also gives you a lot of options for data cleansing, and all of it can be done with drag-and-drop controls in the GUI, which is very user friendly. If the default controls do not cater for your needs, you can always add your own logic in JavaScript steps, which I did as well, and that covered almost everything else. You can load and save your jobs, and Kettle draws a clear diagram showing every step involved in the ETL tasks, which are organised into transformations and jobs.
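
To give a feel for the kind of row-level cleanup I mean, here is the sort of logic you would otherwise drop into one of Kettle's scripting steps, sketched as plain Java purely for illustration (the field semantics are invented):

    public class RowCleanser {

        // Trim whitespace and turn empty strings into nulls before loading.
        static String cleanText(String value) {
            if (value == null) return null;
            String trimmed = value.trim();
            return trimmed.isEmpty() ? null : trimmed;
        }

        // Normalise a yes/no style flag coming from a legacy source.
        static Boolean cleanFlag(String value) {
            String v = cleanText(value);
            if (v == null) return null;
            return v.equalsIgnoreCase("y") || v.equalsIgnoreCase("yes") || v.equals("1");
        }

        public static void main(String[] args) {
            System.out.println(cleanText("  Jane Doe  ")); // prints "Jane Doe"
            System.out.println(cleanFlag(" Y "));          // prints "true"
        }
    }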

I myself have tested it migrating data from Oracle, SQL Server, CSV, XML and DBF to PostgreSQL without much trouble, so I can highly recommend it to anyone looking for an open source data migration tool that works with PostgreSQL.

Shoaib Mir
shoaibmir[@]gmail.com

  1. February 19, 2009 at 11:57 am

    Good stuff with DBF to Pg.
    Thanks for sharing!

  2. February 19, 2009 at 12:06 pm

    But seriously, with DBF I think the best approach is to convert the DBF files to CSV (I didn't find good JDBC driver support for DBF) and then move that into PG; both steps can be done within a single Kettle transformation.

  3. aryps
    March 12, 2009 at 5:09 am

    I agree, good stuff.
    Did you also try it with MDF files?
    Would it work with those as well?

    Best,

  4. March 12, 2009 at 9:52 am

    MDF… those are SQL Server data files, so the best bet is to attach them to a SQL Server instance and then use Kettle to move the data from SQL Server to any other database of your choice.

  5. Alberto
    March 17, 2009 at 7:32 am

    Have you ever used Kettle + PG in a data warehouse project? If so, do you recommend both for this kind of application? Thanks!

  6. March 17, 2009 at 10:29 am

    Yes, I did. It's just that in that case you need to do some performance tweaks on the PostgreSQL side. For the data loading part, Kettle even gives you support for slave servers, which means you can run your loads in a distributed environment. Other than that, the BI side of Pentaho was really helpful, with very nice MDX support for queries.

  7. veeolina
    May 8, 2009 at 2:41 pm

    Hi Shoaib Mir,

    I am a newbie with PostgreSQL and Kettle, and I want to migrate from Oracle to PostgreSQL.
    Could you tell me, step by step, how to do the migration?

    Thanks in advance

  8. May 8, 2009 at 4:04 pm

    It's very simple…

    Create new JDBC connections within Kettle for both PostgreSQL and Oracle, and then just follow the instructions at http://wiki.pentaho.com/display/COM/Use+Kettle+to+Migrate+Sample+Data+to+MySQL, changing the source database to Oracle and the target to PostgreSQL.
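
    If it helps to see what that transformation boils down to underneath, here is a bare-bones JDBC sketch of copying a single table from Oracle to PostgreSQL. The hosts, credentials and the customers table are placeholders; in Kettle you configure the equivalent with a table input step and a table output step.

        import java.sql.*;

        public class OracleToPg {
            public static void main(String[] args) throws Exception {
                // Placeholder hosts, SIDs, credentials and table/column names.
                Connection src = DriverManager.getConnection(
                        "jdbc:oracle:thin:@orahost:1521:ORCL", "scott", "tiger");
                Connection dst = DriverManager.getConnection(
                        "jdbc:postgresql://pghost:5432/targetdb", "postgres", "secret");

                dst.setAutoCommit(false); // commit in batches for faster loading

                Statement read = src.createStatement();
                ResultSet rs = read.executeQuery("SELECT id, name FROM customers");
                PreparedStatement write = dst.prepareStatement(
                        "INSERT INTO customers (id, name) VALUES (?, ?)");

                int pending = 0;
                while (rs.next()) {
                    write.setLong(1, rs.getLong("id"));
                    write.setString(2, rs.getString("name"));
                    write.addBatch();
                    if (++pending % 1000 == 0) { // flush every 1000 rows
                        write.executeBatch();
                        dst.commit();
                    }
                }
                write.executeBatch();
                dst.commit();

                rs.close(); read.close(); write.close();
                src.close(); dst.close();
            }
        }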

  9. veeolina
    May 11, 2009 at 8:01 pm

    Thanks, I'll try it.
