Roland Bouman's blog: Kettle Solutions

Thursday, December 09, 2010

Parameterizing SQL statements in the Kettle Table Input step: Variables vs Parameters

I get this question on a regular basis, so I figured I might as well blog it, in the hope it will be useful for others. Here goes:

Let's say that I want to delete all records that match an id from a set of tables. The table names come in as rows into the Execute SQL Script step (check execute for every row). Next I write:
DELETE FROM {table_name} WHERE id = {identifier}
as the SQL to execute. In the parameters grid at the bottom right, I have two fields: table_name and identifier. What is the syntax for substituting the table_name and identifier parameters in the sql script?

(Although this particular question focuses on the "Execute SQL Script" step, it also applies to the "Table Input" step, and probably a few more steps I can't recall right now.)

The parameters grid can be used for prepared statement value placeholders. In the SQL statement these placeholders are denoted as questionmarks (?). These are positional parameters: they get their value from those fields in the incoming stream that are entered in the parameters grid, in order.
Here's an example of the correct usage of these placeholders:

DELETE FROM myTable WHERE id = ?

Here, the ? in the WHERE clause will be bound to the value of the first field from the incoming stream entered in the parameters grid. Because there is only one such placeholder, there can be only one field in the parameters grid.

An important thing to realize is that these parameters can only be used to parameterize value expressions. So, this kind of parameter does not work for identifiers, nor do they work for structural elements of the SQL statement, such as keywords. So this kind of parameter cannot be used to parameterize the table name which seems to be the intention in the original example posed in the question.

There is a way to parameterize the structural elements of the SQL statement as well as the parameters. You can apply variable substitution to the SQL statetment.

Kettle Variables can be defined by a Set Variables step, or by specifying parameters at the transformation level. They get their value from "the environment": for example, parameters get their value initially when the transformation is started, and regular variables are typically set somewhere in the job that is calling your transformation.

In text fields, including the SQL textarea of the Table input step or the Execute SQL Script step, you denote those variables with this syntax: ${VARIABLE_NAME}. So to parameterize the table name we could use something like this:

DELETE FROM ${TABLE_NAME}

In order to force kettle to apply variable substitution to the SQL statement, you have to check the "variable substitution" checkbox. If this checkbox is checked, then all variables are simply substituted with their (string)value during transformation initialization. This is a lot like the way macro's are substituted by the pre-processor in C/C++ code.

When comparing variables with parameters, two important things should be mentioned here:

Unlike value placeholders, variables can be used to manipulate any aspect of the SQL statement, not just value expressions. The variable value will simply become the text that makes up the SQL statement, it is your responsibility it results in a syntactically valid and correct SQL statement.

Variables are evaluated once during transformation initalization. So if you want to vary the variable value, you'll have to call the transformation again for the change to take effect. For the same reasons, you cannot set the value of a variable and read it within the same transformation: setting the variable value occurs at runtime, but evaluating it occurs at initialization time.

Finally, here's a screenshot that summarizes these different ways to parameterize SQL statements in kettle:

If you want to read more about this topic, it's covered in both our books Pentaho Solutions and Pentaho Kettle Solutions. Another title you might be interested in is Maria Roldan's Pentaho 3.2 Data Integration: Beginner's Guide.

Wednesday, August 11, 2010

Back to blogging....

It has been a while since I posted on my blog - in fact, I believe this is the first time ever that more than one month passed between posts since I started blogging. There are a couple of reasons for the lag:

Matt Casters, Jos van Dongen and me have spent a lot of time finalizing our forthcoming book, Pentaho Kettle Solutions (Wiley, ISBN: 978-0-470-63517-9). The book is currently being produced, and should be available according to schedule in early September 2010. If you're interested, you might like to read one of my earlier posts that explains the organization and outline of the book.

(I should point out that we have reorganized the outline as the project progressed, so the final result will not have all the chapters mentioned in that post. We do however cover most of the topics mentioned.)

I have been checking out Quipu, a promising Open Source data warehouse management solution. Quipu provides a repository-based extensible code-generator that allows you to generate and maintain a data warehouse based on the Data Vault model. One of the things I want to do in the short term is write templates that allows it to work for MySQL, and after that, I want to see if I can get Quipu to generate Kettle Jobs and transformations.

I have been working on a couple of software projects. Two of them are currently available as open source on google code:

mql-to-sql
This allows you to use the Metaweb Query Language (MQL) to query a RDBMS. If you're wondering what this is all about: MQL is the query language used by Freebase (which is a collaborative "database-of-everything" or "wikipedia-gone-database").

While Freebase is interesting in its own right, I am particularly enthused about the MQL query language. I feel that MQL is an exceptionally good solution for flexible, expressive and secure data access for modern (AJAX) Web applications. Even though MQL was not developed with relational database systems in mind, I think it is a pretty good fit.

Anyway, this is very much a work in progress, and I appreciate your feedback on it. If you're interested, you can read a bit more about my take on RDBMS data access for web applications and MQL on the mql-to-sql project home page. I have also put up an online demo that allows you to query the sakila MySQL sample database using MQL.

kettle-cookbook
This is a project that provides auto-documentation for Kettle (a.k.a. Pentaho Data Integration).

The project consists of a bunch of Kettle Jobs and transformations (as well as some XSLT stylesheets) that extract data from Kettle Jobs and transformation and transform it into a collection of human-readable HTML documents along with a table of contents. The resulting documentation looks and feels a bit like JavaDoc documentation.

If all goes well, I will be presenting kettle-cookbook in a Pentaho web seminar, which is currently scheduled for September 15, 2010

I've been enjoying 4 weeks of vacation (I started working this week). There's not much to tell about that, other than that it was great spending a lot of time with my family. I plan to do this more often, and now that Kettle Solutions is finished I should be able to find more time to do it.

I've been looking at two emerging HTML5 APIs for client-side structured storage, the Web SQL Database API and the Indexed DB API.

I have developed a few thoughts about the ongoing debate (see this and article for some background) about which one is better and I see a role for something like MQL here too. I will probably write something about this in the next few weeks

Yesterday, I got wind of the JS1k contest! Basically, the challenge is to write an interesting standalone JavaScript demo program that must be no larger than 1024 bytes. It is amazing and inspiring to see what people manage to do with a modern browser in 1k of JavaScript code.

I decided to try it myself, and you can find my submission here: An interactive SQL query tool for the Web SQL DB API.

Essentially, you get a textarea where you can enter arbitrary SQLite queries, and button to execute the SQL, and a result area that will print a feedback message and a result table (if applicable). As a bonus, there's a button to get a listing of the available database objects (using a query on sqlite_master) and an explain button to show the query plan of the current SQL statement.

The demo works on recent versions of Google Chrome, Apple Safari and Opera. It can run offline, and does not require any plugins. I should say that I expect my submission will be rejected by the judges since the demo is not functional on Mozilla Firefox, which is a requirement. (That is, the script will detect that the Web SQL Database API is not supported and print a message to that effect). However, it was still fun to try my hand at it.

Ok - that's it for now. I will try and post more regularly and write about these and other things in the near future. Don't hesitate to leave a comment if you have any questions or suggestions.

Thursday, December 09, 2010

Parameterizing SQL statements in the Kettle Table Input step: Variables vs Parameters

Wednesday, August 11, 2010

Back to blogging....

DataZen winter meetup 2025