Wednesday, August 11, 2010

Back to blogging....

It has been a while since I posted on my blog - in fact, I believe this is the first time ever that more than one month passed between posts since I started blogging. There are a couple of reasons for the lag:

  • Matt Casters, Jos van Dongen and me have spent a lot of time finalizing our forthcoming book, Pentaho Kettle Solutions (Wiley, ISBN: 978-0-470-63517-9). The book is currently being produced, and should be available according to schedule in early September 2010. If you're interested, you might like to read one of my earlier posts that explains the organization and outline of the book.

    (I should point out that we have reorganized the outline as the project progressed, so the final result will not have all the chapters mentioned in that post. We do however cover most of the topics mentioned.)

  • I have been checking out Quipu, a promising Open Source data warehouse management solution. Quipu provides a repository-based extensible code-generator that allows you to generate and maintain a data warehouse based on the Data Vault model. One of the things I want to do in the short term is write templates that allows it to work for MySQL, and after that, I want to see if I can get Quipu to generate Kettle Jobs and transformations.

  • I have been working on a couple of software projects. Two of them are currently available as open source on google code:

    mql-to-sql
    This allows you to use the Metaweb Query Language (MQL) to query a RDBMS. If you're wondering what this is all about: MQL is the query language used by Freebase (which is a collaborative "database-of-everything" or "wikipedia-gone-database").

    While Freebase is interesting in its own right, I am particularly enthused about the MQL query language. I feel that MQL is an exceptionally good solution for flexible, expressive and secure data access for modern (AJAX) Web applications. Even though MQL was not developed with relational database systems in mind, I think it is a pretty good fit.

    Anyway, this is very much a work in progress, and I appreciate your feedback on it. If you're interested, you can read a bit more about my take on RDBMS data access for web applications and MQL on the mql-to-sql project home page. I have also put up an online demo that allows you to query the sakila MySQL sample database using MQL.

    kettle-cookbook
    This is a project that provides auto-documentation for Kettle (a.k.a. Pentaho Data Integration).

    The project consists of a bunch of Kettle Jobs and transformations (as well as some XSLT stylesheets) that extract data from Kettle Jobs and transformation and transform it into a collection of human-readable HTML documents along with a table of contents. The resulting documentation looks and feels a bit like JavaDoc documentation.

    If all goes well, I will be presenting kettle-cookbook in a Pentaho web seminar, which is currently scheduled for September 15, 2010


  • I've been enjoying 4 weeks of vacation (I started working this week). There's not much to tell about that, other than that it was great spending a lot of time with my family. I plan to do this more often, and now that Kettle Solutions is finished I should be able to find more time to do it.

  • I've been looking at two emerging HTML5 APIs for client-side structured storage, the Web SQL Database API and the Indexed DB API.

    I have developed a few thoughts about the ongoing debate (see this and article for some background) about which one is better and I see a role for something like MQL here too. I will probably write something about this in the next few weeks

  • Yesterday, I got wind of the JS1k contest! Basically, the challenge is to write an interesting standalone JavaScript demo program that must be no larger than 1024 bytes. It is amazing and inspiring to see what people manage to do with a modern browser in 1k of JavaScript code.

    I decided to try it myself, and you can find my submission here: An interactive SQL query tool for the Web SQL DB API.

    Essentially, you get a textarea where you can enter arbitrary SQLite queries, and button to execute the SQL, and a result area that will print a feedback message and a result table (if applicable). As a bonus, there's a button to get a listing of the available database objects (using a query on sqlite_master) and an explain button to show the query plan of the current SQL statement.

    The demo works on recent versions of Google Chrome, Apple Safari and Opera. It can run offline, and does not require any plugins. I should say that I expect my submission will be rejected by the judges since the demo is not functional on Mozilla Firefox, which is a requirement. (That is, the script will detect that the Web SQL Database API is not supported and print a message to that effect). However, it was still fun to try my hand at it.


Ok - that's it for now. I will try and post more regularly and write about these and other things in the near future. Don't hesitate to leave a comment if you have any questions or suggestions.

DuckDB Bag of Tricks: Reading JSON, Data Type Detection, and Query Performance

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is about a particular challenge...