Book: "Pentaho Kettle Solutions", Matt Casters, Roland Bouman, & Jos van Dongen, Wiley 2010 Book: "Pentaho Solutions", Roland Bouman & Jos van Dongen, Wiley 2009

Wednesday, June 06, 2007

Just send Microsoft your feedback...

I have a windows live mail account. Sure, laugh at me ;)

I just stumbled upon one of the most disgusting pieces of internet "journalism" ever:
Kicken_Dumpen
The message explains how on Komodo a 9 year old boy got killed after being attacked by a Komodo Dragon.

In itself this is just a tragic piece of news. What I find disgusting is that people are invited to vote whether this is "Kicken" - which means "Cool!" or whether it should be removed because it is not cool - "Dumpen". As you can see, by majority vote this is considered "Cool".

Bah! I feel sick to the stomach, and ashamed. Not only by the majority vote but by the sheer decadence and heartlessness of combining a story of such a family tragedy with a senseless and completely irrelevant poll.

Presumably, MSN does good business in selling this type of pages for advertisements, as can be seen in the right bottom, which shows the logo of Essent, a dutch electrical power supplier.

Normally I don't complain, but this time I felt compelled to inform MSN how disgusting I think this is....finally I found this "feedback form"
Microsoft_feedback
(This is just a button to submit the form: apparently we are not required to actual judge something or give feedback.) Thank you so much, Microsoft!

Tuesday, June 05, 2007

Google Gears Query Tool: bugfixes, new features

Yes, I committed another deadly sin programming for the off-line web :0

A few days ago, I wrote how I didn't test the Google Gears Query Tool with Internet Explorer.

This time, I was kindly informed by Jon Stephens that the query tool didn't check to see whether Google Gears is actually installed. It's fixed now. You should see the following message if the query tool detects that google gears is not yet installed: get-gg-pplease

Apart from that, I also added a few decorations and features. Most notably, the SQL statement textarea now uses a monospace font, and an SQL statement history was added.

Check it out:
ggqt
The statement history will simple maintain a stack of successfully executed statements, and put the most recent statements on top. Selecting a statement from the listbox will copy the SQL back to the textarea.

News from The MySQL UDF Repository


The MySQL UDF Repository is moving forward! Although it was only recently launched we've been contacted by quite a few interested parties already.

For example, the MySQL Repository has been mentioned by Ronald Bradford in this week's Logbuffer, the "Carnival of the Vanities" hosted by the Pythian Group. Thanks Ronald!

To facilitate the community process, we've set up a Google Group. The actual repository will remain located where it is now, but if you want to keep track of what is going on, please contact us to become a member of the group.

Becoming a member does not oblige you to anything, but please do take a moment to read about the intended usage of the group. You might also want to read a little bit more about what the MySQL UDF Repository tries to be. Here you will find a few ideas on how to achieve some of the goals of the MySQL UDF repository. All these goals are crucial to make the mission of the MySQL UDF Repository succeed.

MySQL UDF Repository Welcomes Arnold Daniels

We are glad and proud to welcome Arnold Daniels as a member of the MySQL UDF Repository Google Group!

Arnold is part of Javeline and also author of libmyxql. Although him joining is good news in itself, it gets even better: in his blog Arnold has announced that he will move libmyxql to the repository!!

Moving libmyxql over to the repository is a logical step for a few reasons. There has been a proposal for an XML library on the roadmap. If you have seen Arnold's library it is immediately clear that it offers a few of the most important requirements to be added to the repository:

  • It is a coherent collection of functions targeted at a set of similar tasks

  • It is already reasonably documented

  • The documentation includes examples

  • The functions have been given short yet intuitive names, and are properly prefixed to avoid name clashes

Arnold confided to me that in spite of the apparent quality, the library did not receive as much attention as he'd anticipated, and I hope that the move to the repository will remedy this.

That said, a few changes will be made to the library as it is now:

  • The code will be modified to compile on windows, linux and Mac OS/X

  • The name of the source file and binary will be changed to lib_mysqludf_xql for the sake of consistency within the repository

  • The mapping from relational data to XML will be enormously simplified, making it much easier for the user to call the functions

On order to grasp how it is possible to improve the usability of this already good library, take a look at the following signature of the xql_element function from lib_mysqludf_xql:

xql_element(
tagName, content
, att_name1, att_value2,..., att_nameN, att_valueN
)
So, we can pass the tagname of the element first, then optionally, the contents of the xml element, and then optionally a list of name/value pairs to specify a set of attributes for the XML element. When applied to the film table in the sakila database, it could look like this:

mysql> select xql_element(
-> 'film'
-> , title
-> , 'film_id'
-> , film_id
-> , 'release_year'
-> , release_year
-> )
-> from film
-> limit 1
And the result would be something like this:

<film id="1" release_year="2006">ACADEMY DINOSAUR</film>
This is all fine, but we think there is a better way.

Generating XML from relational data is a matter of translation of one data model to another data model. As it happens, data models usully bear a great deal of similarity - a film row becomes a film tag, a film_id column becomes a film_id attribute and so on and so forth. It is likely that in most cases, one would like to preserve the metadata across models. As it happens, the MySQL UDF interface for argument processing contains a very useful but undocumented feature which allows the previous result to be generated with a call like this:

mysql> select xql_element(
-> title as film
-> , film_id
-> , release_year
-> )
-> from film
-> limit 1

So what is actually happening here? Well, for film_id and release_year we can simply re-use the expression text of the udf argument as attribute name. In our example, we want the value of the title column inside our film tag, so in that case we need to provide an alias so that it reads title as film.

The ability to use the expression text in the UDF is not a very well known feature. This is probably because it is undocumented. However, you can find the definition in mysql_com.h:
typedef struct st_udf_args
{
unsigned int arg_count; /* Number of arguments */
enum Item_result *arg_type; /* Pointer to item_results */
char **args; /* Pointer to argument */
unsigned long *lengths; /* Length of string arguments */
char *maybe_null; /* Set to 1 for all maybe_null args */
char **attributes; /* Pointer to attribute name */
unsigned long *attribute_lengths; /* Length of attribute arguments */
} UDF_ARGS;
So, inside an UDF, args->attributes[i] holds the expression text or alias of the argument, and args->attribute_lenghts[i] holds the length of the string in args->attributes[i].

We have already successfully used this feature of the UDF interface extensively in the lib_mysqludf_json library.

If you are interested in lib_mysqludf_xml, or if you want to contribute, drop us a line and we'll make sure you get updated.

New library: lib_mysqludf_udf

We've also added a new library to the repository to make it easier to inspect the internals of the UDF interface. This library is primarily intended for UDF authors, and for debugging purposes.

Internally we have already been able to use it to answer a question like this "Is the string length returned by an UDF limited to 64kb, or not?" With lib_mysqludf_udf the answer is simple:

mysql> select length(udf_arg_value(repeat(' ',2 * 65535)));
+----------------------------------------------+
| length(udf_arg_value(repeat(' ',2 * 65535))) |
+----------------------------------------------+
| 131070 |
+----------------------------------------------+
1 row in set (0.02 sec)

So, no, UDF's can return more than 64k ;)

Coming Soon: Compiling for Windows using Visual Studio Express


Shortly we will be delivering Windows binaries of the repository libraries, compiled with Visual Studio Express. Please allow for some time for us to publish the details concerning the installation procedure.

For those that are interested in building the sources themselves with visual studio, we will shortly publish a tutorial on how to go about that.

Ok - well, that's it for today - hope to see you soon on our google group.

Roland Bouman

Sunday, June 03, 2007

What MySQL can do to enter the off-line Web

Disclaimer - views expressed in this blog (and this entry) are my own and do not necessarily reflect the views of MySQL AB

Ever since I wrote my blog entry about Google Gears and the query tool for the browser embedded offline Google Gears database service, I have been wondering how MySQL might fit in here.

I have heard an idea to write a MySQL storage engine for SQLite and although I do not think this is necessarily a bad idea, I don't think it will be immediately useful for typical applications powered by Google Gears. Personally, I think the following things might be of more use:

A modification of the Google Gears browser extension that allows a local MySQL database to be used instead of the embedded SQLite database

This would be interesting in case you would need a particular MySQL feature that is not available in SQLite. I'm thinking mainly of stored procedures but more so about replication.

The local MySQL server could be setup as a master that replicates to a remote slave. The remote slave would be hosted by your company and that backend would somehow merge all the private slaves from its employees into the corporate database.

Of course, the idea for the SQLite storage engine could fit here too - a local MySQL Server with the SQLite storage engine could be set up as master to replicate to the corporate slave.

A Google Gears Worker Pool application that synchronizes the embedded SQLite database with the remote, corporate MySQL database

This idea would of course require some server component that may be accessed by the Worker application, and it would presumably take the form of a web-service. This could be implemented either as a piece of middleware that sits in a HTTP server or as a component that is part of the MySQL server itself. This product would achieve in part the same functionality as replication from the client's master to the remote slave but I think there are a number of advantages here.

First of all, the implementation would not be intrusive to Google Gears - users can still use the ordinary Google Gears extension, and MySQL AB does not need to ensure that their modifications to the browser extension are compatible with future Google Gears developments.

Second, this application could in principle allow a true synchronization rather than just replication. In other words, it could pull data from the corporate database as needed, and push local modifications to be merged with the remote database. Obviously the possibilities here would depend on the flexibility of the server-side component accessed by the worker pool application.

Thirdly, users would not need to install an extra MySQL server on their webclient host. Arguably this could be seen as a disadvantage for MySQL AB as it would result in fewer installations of MySQL Server, but personally I don't think this is the right angle. I think that for most people, an extra MySQL Server that sits on their machine just for the purpose of replication will not be very attractive.

To me, using the Google Gears Worker Pool to drive synchronization seems like the most sensible choice that fits good with the whole offline-web idea. But what do you think? Can MySQL play a role in the offline web like SQLite can now?

Bugfixes to the Quick And Dirty Query Tool (Google Gears Database Service Demo)

Yesterday, when I published my blog entry on the Google Gears database service, I made the terrible mistake of not testing my Google Gears Query Tool in Internet Explorer.

I fixed a number of issues, and it from what I can tell, it now works for Firefox (tested: 2.0.0.4 in Windows XP and Ubuntu Linux Feisty Fawn) and Internet Explorer (tested: 6.0 on Windows XP)

For Internet Explorer, there is however one issue I cannot solve. I suspect it is a bug in the Google Gears extension. The bug may be reproduced by opening the Query Tool page and hitting the "Execute" button, leaving the textarea for the SQL text empty. You will get a popup that looks something like this:
gg-crash
And then the browser crashes.

I already reported it, so if you have any more information regarding this bu, please add to that thread to share your observations.

I also added a new feature. You can now view the query plan for your queries by hitting the "Explain" button. This will only work for SELECT statements, and you will see an error message otherwise. Explain is implemented simply by executing the current statement text in the textarea, prepended by the string "EXPLAIN ". The resultset of this is then rendered just like any normal resultset.

Saturday, June 02, 2007

Google Gears: Webbrowser embedded Database

Google Gears is an open source browser extension created by Google. It provides a framework that allows the creation of offline webbrowser applications. At the moment it provides three services:

Local Server

A data store for static resources. This allows efficient client-side caching of images, (client-side) scripts and webpages

Database

An embedded relational database management system, based on SQLite. SQLite should be familiar to most PHP developers, as it is being shipped with PHP since version 5

Worker Pool

A form of threading support inside the browser that allows webapplications to initiate long running processes without hampering the responsiveness of the user interface.
All these services can be accessed from within the browser using a javascript API.

Users only needs to install the extension in order for the browser to be able to access the services when browsing pages.

If you want to get an immediate taste of the database service, be sure to install Google Gears and take a look at my offline, browser-based database client.

Currently, the extension is available for mozilla Firefox and Internet Explorer.

What does this buy us?


Google Gears offers an enormous potential for web application developers, for a number of reasons:
  • The local server can be used to setup a specialized cache that can help to run web applications off-line.
  • The database service offers not only a structural solution for maintaining state at the client side, it even allows for persistence. That is, if you close the browser, shut down the computer, and restart, the application can pick up where it left off!
  • I am not really sure about the worker pool but I can imagine this can be very useful to perform tasks that require a connection with a server. For example, the worker could be instantiated that attempts to synchronize the local state and data gathered by the application to a webservice that acts as the terminal destination.
Of course, we all have to see real-world results, but I think this truly is a revolutionary development. It finally makes sense to start thinking of "browser applications" instead of "web applications", because the application does not necessarily have to connect to a HTTP server anymore to be useful.

I would not be surprised if this would initiate a new wave of emancipation of relational databases. Literally everybody that can operate a webbrowser will now own a database, and all the glue to make it work together with the user interface is already there. Also, a number of mobile service applications might be implemented on top the Google Gears embedded SQLite database instead of a specific standalone database product.

The web browser offers a reasonably sufficient environment for creating data oriented end-user applications, and the combination might prove to be a killer solution. The SQLite database can still be accessed outside the browser too, and be integrated with other applications.

Prerequisites

There are few preparations that need to be taken before you can use google gears in a web page. The prerequisites are that the browser that will view the page has the Google Gears browser extension installed. The page itself needs to include the gears_init.js script in order to use the javascript API to access the services. This single line in the <head> element of the page allows access to the API:

<script type="text/javascript" src="gears_init.js"></script>

The gears_init.js script needs to reside at the same location as the page in this case, but it may be referenced from any location.

One of the fascinating things is that neither the page nor the script need be on a webserver. You can simple store the script and the page on the local filesystem.

When opening the page in the browser, you will be prompted to allow the page to access the Google Gears services:
GG_Warning
So the user needs to allow access for each different site that wants to use the services.

A closer look at the Database service

The Google Gears database service can be accessed using a simple API. All of the Google Gears APIs are javascript (ecmascript) APIs.

Creating a database object and opening a database


First a database object needs to be created using a line like this:

var db = google.gears.factory.create("beta.database", "1.0");

The variable db can then be used as a handle to open the actual database. The following line will create a database named GGDB:

db.open("GGDB");

Database names are unique per application, or rather origin. All pages that are located in a particular scheme, host and port are considered to be part of the same application. (For the web location http://www.foo.com/, http is the scheme, www.foo.com is the domain and the port is the default port for the scheme, which is 80 in this case). Pages can only access databases that were created by a page in the same domain.

This 'same domain' restriction is an understandable security measure: if we visit https://www.mybank.com/, we do not want the data stored in any of the databases used by that site to be accessible by pages from http://www.bankrobbers.com/.

Creating a table

The database object implements the execute method, which allows one to execute SQL statements. For example, if we want to create a table to store contacts, we could try something like this:

db.execute(
"CREATE TABLE persons("
+ " id INTEGER PRIMARY KEY AUTOINCREMENT"
+ ", first_name"
+ ", last_name"
+ ")"
);
When you are not used to SQLite's specific philosophy of data storage, the previous statement must've made you frown at least once.

Yup people, SQLite does not require us to specify column data types. It is allowed to include an identifier where you'd normally put a data type name. However, this 'data type' does not restrict the values that maybe stored in that column. There is only one exception to that rule: when a column is defined with INTEGER PRIMARY KEY AUTOINCREMENT, it denotes a surrogate key that will store only integral values.

In spite of the fact that SQLite does not use data types for table definitions, SQLite does have a type system. SQLite has a concept called storage classes: each value entered into the database is associated with one of the following storage classes:

    NULL

    A special storage class for SQL NULL values

    INTEGER

    Storeas integral values

    REAL

    Stores numerical floating point values

    TEXT

    Variable length strings

    BLOB

    Variable length binary strings

However, a storage class is bound on the level of a value, not a column. So, one column can contain multiple values, and each is associated to one of the storage classes.

Inserting data


In Google Gears, all interaction with the database is performed by issuing SQL statements through the execute method of the Database object. So, if we want to insert data into our persons table, we need to do this with a SQL INSERT statement.

The first argument of the execute method is the actual SQL statement, and we can use this to pass a complete INSERT statement, including any value literals. However, we can also use parameterized statements, using placeholders instead of value literals. The remainder of the argumentlist must then be used to provide values for the placeholders in the SQL statement text:

db.execute(
"INSERT INTO persons (first_name,last_name)"
+ "VALUES (?,?)"
, "Roland"
, "Bouman"
);
This manner of passing values offers the advantage that you do not need to escape quotes from the values: this is automatically taken care of by SQLite.

Processing resultsets

A call to the execute method always returns a ResultSet object. In the previous examples, execution of the statements did not result in a set of rows, but a javascript ResultSet object is returned regardless.

The following snippet illustrates how you can process the rows in the ResultSet object:

var result = db.execute( //execute query, get result
"SELECT * FROM persons"
);
var fieldCount = result.fieldCount(); //get number of columns
while(result.isValidRow()){ //as long as we did not process all rows
for(var i=0; i<fieldCount; i++){ //loop over all columns
var value = result.field(i); //get value of the current column
}
result.next(); //process the next row
}
result.close();
The fieldCount method of the ResultSet object can be used to return the number of columns associated with the ResultSet object. If the SQL statement that yielded the ResultSet object did not return a set of rows, the fieldCount method returns 0.

The isValidRow() method of the ResultSet object returns a boolean that indicates whether there are still rows associated with the ResultSet object that may be processed. The next() method of the ResultSet object proceeds to the next row in the resultset. After calling the next() method, the isValidRow() method must be used to detect whether the resultset is already exhausted. Together, the isValidRow() and next() methods can be uses this to drive a loop to iterate over all the rows associated with the ResultSet object.

The field(columnPosition) method of the ResultSet object can be used to obtain the value for the column at the specified position. The fieldByName(columnName) method of the ResultSet object can also be used to obtain the value of a column, but as indicated by the method name, the column must be specified using the name of the column. The fieldName(columnPosition) may be used to retrieve the name of the column at the specified position.

When we are done processing the resultset, we are required to call the close() method of the ResultSet object. There are plans to implement a way to automatically close the ResultSet object when it isn't used anymore, but even then it can't hurt to close the ResultSet object explicitly.

Handling Errors

Although Google Gears is great, we still are not in a perfect world. Runtime errors may and will occur, and to handle them you need to use the javascript try...catch syntax. This is not so much a Google Gears thing, but as the database API will throw a runtime error for syntax errors, database constraint violations etcetera, you simply myst use try...catch blocks everywhere.

The following snippet illustrates how to do this:

try {

//stuff that can go wrong

} catch(e){

//use the message property
//from the Exception object
alert(e.message);

}

A Quick and Dirty Command Line Client


To play around with the Google Gears Database service, I made a quick and dirty browser-based Command line client for the SQLite database. You can use the online version here. If you want to use it offline, no problem, just download save the page anywhere on your local disk. Just be sure to save a copy of the gears_init.js script in the same directory. That's it....on to the offline web!