Friday, February 20, 2009

Exporting a Kettle Repository to Files

Hi All!

Today I'd like to announce KREX, a small solution I put together to export a Kettle (a.k.a. Pentaho Data Integration) Repository to individual transformation (.ktr) and job (.kjb) files.

The idea for this was inspired by this thread on the Pentaho forums, started by kandrews. He (she?) wrote:
Has anyone ever been able to export a PDI repository and convert it somehow into regular non-repository .kjb & .ktr files? If you have done this already or this functionality already exists please let me know.

My initial thoughts are possibly an XLS translation against the XML from the repository export. Thoughts?
Well, I hope this helps! Enjoy, and let me know if it's useful. Be advised that in the same thread, Matt Casters already revealed that the functionality to do this will soon be built into PDI, but until then this may be of use.

To start using KREX,

  • Check out the repository or download the Job and Transformation files to your file system.

  • Open the main Job file export_repository_to_files.kjb using Pentaho Data Integration 3.2's Spoon (currently a Milestone 1 release)

  • Configure the Set Source Repository Step in the set_source_repo_and_target_directory transformation to match the repository you want to export

  • Run the main job file (export_repository_to_files.kjb)

If all goes well, you should now have a directory called pdi_repo_export in your home directory which contains a subdirectory named after your exported repository containing the directory tree with the .ktr and .kjb files.

Here's a quick screenshot of the main job, just to give you an idea:
[Screenshot: krex]

The heart of the job is formed by the very last transformation, which does the actual legwork of extracting and saving the individual transformations:

[Screenshot: krex2]

The steps before that are mainly configuration and ensuring that the directory tree that is to contain the files is created before we attempt to write any files.
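
For readers who want to see the idea outside of Kettle, here is a minimal Python sketch of the same two phases: create the directory tree first, then write one file per transformation and job. It is not how KREX itself is implemented (KREX is a Kettle job and transformations). The function name `export_repository_xml` is made up for illustration, and the XML layout it reads (a `repository` root with `transformations/transformation` and `jobs/job` children, where a transformation keeps its name and directory under `info` and a job keeps them as direct children) is an assumption about the repository export format, so check it against your own export before relying on it.

```python
import os
import xml.etree.ElementTree as ET


def export_repository_xml(xml_text, target_dir):
    """Write each transformation/job in a repository export to its own file.

    ASSUMPTION: the export looks like
      <repository>
        <transformations><transformation><info><name/><directory/>...
        <jobs><job><name/><directory/>...
    Verify this against a real export; the layout is not taken from KREX.
    """
    root = ET.fromstring(xml_text)
    written = []
    # (path to the elements, path to the name, path to the directory, extension)
    kinds = [
        ("transformations/transformation", "info/name", "info/directory", ".ktr"),
        ("jobs/job", "name", "directory", ".kjb"),
    ]
    for elem_path, name_path, dir_path, ext in kinds:
        for elem in root.findall(elem_path):
            name = elem.findtext(name_path)
            if not name:
                continue  # skip entries without a usable name
            # Repository directories look like "/a/b"; map them under target_dir.
            repo_dir = (elem.findtext(dir_path) or "/").strip("/")
            out_dir = target_dir
            if repo_dir:
                out_dir = os.path.join(target_dir, *repo_dir.split("/"))
            # Make sure the directory exists before attempting to write the file.
            os.makedirs(out_dir, exist_ok=True)
            out_path = os.path.join(out_dir, name + ext)
            ET.ElementTree(elem).write(
                out_path, encoding="utf-8", xml_declaration=True
            )
            written.append(out_path)
    return written
```

To mimic the layout described above, you could pass something like `os.path.join(os.path.expanduser("~"), "pdi_repo_export", "my_repo")` as `target_dir`, where `my_repo` is a placeholder for your repository name. The sketch does not sanitize names that contain characters your file system rejects.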

If you have any suggestions or comments, I welcome you to post them here. If you are trying to use KREX but run into an issue, please use the KREX issue list.

If you are looking for more tips and tricks with Kettle and Pentaho in general, stay tuned. The "Building Pentaho Solutions" book I'm writing for Wiley together with Jos van Dongen will contain tons and tons of practical tips and solutions, and explain many of Pentaho's technologies and concepts in thorough detail.

Cheers and until next time,

Roland

11 comments:

Anonymous said...

worked like a charm. Thank you

Ben said...

This is exactly what I am looking for. Does it work with Kettle 4.x as well?

rpbouman said...

Hi Benjamin,

I think so. Have you tried?

Anonymous said...

When I download the files and import them into Spoon 4.4.3, all of them are blank, no entries. IS that an issue with version 4, or am I downloading the wrong files?

rpbouman said...

@Anonymous, have you tried out checking out from svn and opening the files from your local drive?

Anonymous said...

My fault - I got the files properly this time and it worked wonderfully. Thank so much! What a time saver...

rpbouman said...

No problem. Glad it works for you now! :)

Unknown said...

Wow - this worked amazing ! I am working in Community 5.1 and had no issues. I just used the transformation export_repo_xml_to_files.ktr hardcoded some of the variables and it worked like a charm. Hats off!

Anonymous said...

Very useful, thank you!

Still waiting for this functionality in Spoon...

Danny Teok said...

Very useful, indeed! Thank you!

By the way, the third step does not work:
"Configure the Set Source Repository Step in the set_source_repo_and_target_directory transformation to match the repository you want to export".

The "Set Source Repository" step is not there.

I've exported the enterprise repository by hand and did some changes to the rest (hard coded locations) to make it work. Still works like a charm!

Anonymous said...

Very, very useful!
I am working in Community 6.1 and had no issues.
To set source repository, set values of master job parameters.
Thx a lot!
Tommaso
