Percona Live MySQL User's Conference, San Francisco, April 10-12th, 2012 Book: "Pentaho Kettle Solutions", Matt Casters, Roland Bouman, & Jos van Dongen, Wiley 2010 Book: "Pentaho Solutions", Roland Bouman & Jos van Dongen, Wiley 2009

Sunday, July 16, 2006

Hands on: MySQL Samples For Pentaho 3 - (Making a MySQL Samples directory)

In my previous blog entry I described some configuration tasks that make the pre-configured pentaho demo environment installation act more like a server.

In this entry I will describe how to add a folder for our own MySQL samples. We will use this folder to deploy our real samples to. We want to use our own folder for two reasons:

  • presentation - it's much nicer if our samples stand out among the common ones everybody knows already

  • maintenance - we can easily remove, rename or do whatever we want with all of our samples without affecting the other ones



Let's start off right away. Oh yeah, we won't need to stop pentaho - in fact, to demonstrate that, make sure it is started right now. Remember, you can start the pentaho BI platform by running the start-pentaho script located in the root of the preconfigured installation.

Creating a directory on the filesystem


The first step is to create a directory on the filesystem. This will show up as a folder on the pentaho entry page next to all the other pentaho folders.

Just create a directory named MySQL in samples directory, which is located at

%PCI%/pentaho-solutions/samples.

So, now we have a directory:

%PCI%/pentaho-solutions/samples/MySQL.

Adding index files


We're not done yet. This is not enough to make pentaho display a MySQL folder at the entry page. We need to put an index.xml file in the directory to do that. The easiest way to do this is to copy a index.xml file from one off the other sample directories. For example go to the dashboard sample directory:

%PCI%/pentaho-solutions/samples/dashboard

And open the index.xml file. You should see something like this:



Well, it's just guess-work on my part, but I think that the index document element actually represents an index-entry of the actual index one level higher- that is, on the level of the samples directory. The name and description elements contain a variable reference - hence the % prefix. We'll discuss variables and references later on. The icon element contains some sort of reference to a resource that is used to graphically represent the item. The visible element contains either true or false, which denotes wheter the entry should be displayed. The display-type element contains a value that most probably tells pentaho how to display the contents of the entry.

Let's focus on those variabeles - you know, the funny values inside the name and description elements. Well, a bit of looking around learned me that the variables are defined in the index.properties file. Here's what the index.properties file for the dashboard samples looks like:



Obviously, the red scribbling is by my hand to show you where the variables are 'declared' so to speak.

Now what could be the reason for putting this in a separate file? This all becomes clear when you look inside the directory associated with the "Getting Started" samples folder:

%PCI%/pentaho-solutions/samples/getting-started

Whereas we only have just index.xml and index.properties for "Dashboards", we have a whole bunch of .properties files for the "Getting Started" samples. To name just a few: index_cn.properties, index_cn.properties, index_es.properties...

It doesnt take a scientist to conclude that the suffixes denote a landcode, and these .properties files are used to localize.

Now that we know how it is organized, we can apply it to our MySQL samples folder. Simply copy a index.xml file and a index.properties file, and place them inside the MySQL directoy we just created. First, open the index.xml file and make sure what the names are of the variables that are used there. Then, edit the .properties file. Make sure that the names of the variables referenced in the index.xml file are in fact defined in the index.properties file.

(I'm telling this explictly - the pentaho samples do not all use the same variable names. Compare for example those from the "Getting Started" and "Dasboards" samples. In my case, I copied my index.xml and index.properties from the "Getting Started" samples)

Set the value of the directory_name variable to the name of our samples directory: "MySQL". Write a nice description and assign it to the directory_description variable.

Refresh Solution Repository


Ok - we're nearly done. The only thing we need to do now is to make Pentaho aware of the changes. We can do this from the front page by clicking the "Solution Repository" link in the "Refresh" section on the front-page:



A new window will open to indicate wheter the action succeeded:



You're going to need to do this a lot. Everytime you decide to change an item that is already deployed, you must do this or else it will go by unnoticed.

Custom Icon


Ok - we've succeeded in adding a new directory. This s nice of course, but it still looks just like the other folders - we don't want that. We want ours to stand out and attract attention. The easiest way to do that, is to add a custom icon.

We don't need to think much about what image we should elect: we're more than happy to settle for that friendly little dolphin everybody knows so well:



(url: http://www.mysql.com/common/logos/mysql_100x52-64.gif)

So initially I just downloaded that and converted it to png format. Then I put it in the MySQL samples directory. After that, I opened the index.xml file, and modified the contents of the icon element, so that it's value read:

mysql_100x52-64.gif.png|MySQL.png

Remember that the original value for the icon element found in the index.xml file for the "Dashboards" folder was this:

folder.png|dashboard.jpg

And indeed, the dashboard directory also contains a folder.png image.

(My reasoning was that the folder image that is used as icon for the other Pentaho samples had tot do with the folder.png stored in each individual folder)

To cut a long story short - the image file needs to be located one level higher: inside the %PCI%/pentaho-solution folder. To do it, first make a copy of the folder.png located there, and modify that so that it looked something like this:
.

After that modification, this is what my Pentaho entry page looks like:

.

Of course, I'm reaching in the dark - why do I need such a funny picture ? What does the pipe (|) mean for these icon elements?

(If anyone could help me with that, please add a comment to the blog)

Next time


We'll continue to build on the MySQL Samples for Pentaho.

Wednesday, July 12, 2006

Hands on: MySQL Samples For Pentaho 2 (PCI Post-installation tasks)

In my previous blog entry I described how to install a Pre-configured version of the Pentaho BI Platform. That entry concluded with testing a few out of the box samples to verify that the BI Platform is working correctly.

In this entry I will discuss some Pentaho (PCI) post-installation tasks. All these are also described in the Pentaho Quick start Guide.

Running the PCI as a server



A demo environment with samples is nice, but it's even nicer when you can offer the samples right at the fingertips of your audience. Suppose you want let a business play around with some of the samples: you certainly do not want to force them to crawl behind your demo machine. That would be like going to a restaurant and find out that they've only got one plate to eat from, so that all the customers need to queue before they can have dinner.

In order to allow the PCI to be accessed from a remote client, we need to edit a configuration file: the web.xml file. You can find it in the

%PCI%/jboss/server/default/deploy/pentaho.war/WEB-INF

directory.

The WEB-INF/web.xml file contains a so-called java servlet web application deployment descriptor. That's just what the Pentaho BI platform is - a web application based on java servlet technology. For the PCI, the servlet container happens to be JBoss.

Anyway, once you found the web.xml file, it's probably a good idea to make a copy before you start editing. You can open file with any text-based editor. Search for the string base-url. You should end up finding a fragment such as this:



So, we've located the parameter with the name base-url and it's value is now http://localhost:8080/pentaho. We need to change the hostname (now: localhost) to either the DNS-name or the IP address of our server. It is probably a wise idea to use a DNS-name instead of the IP address, so here we go:

http://%My Computer's Name%:8080/pentaho

Now, save the file and (Re)start the server (using the start-pentaho script). Test if your modification was successful by navigating to http://%My Computer's Name%/pentaho. Of course, you should also test if it still works when you try it from a remote machine. If that is not the case, it's most likely that this is due to a firewall that blocks remote requests or outbound responses on port 8080.

If nothing happens, or when you get an error message, you most likely made a typo. Besides comparing with the old web.xml file, you can monitor the output of the start-pentaho script. Any errors will be output by the script and it might give you a hint as to what's wrong.

Ok, so now we can reach our pentaho demo environment from a remote machine.

You can also configure the server to use a port other than 8080. For example, port 80, the default http port, seems a sensible choice. This would allow users to connect to the much friendlier-looking url:

http://%My Computer's Name%/pentaho.

To modify the server port, you must modify the pentaho web app configuration file (the web.xml file we've just been editing) accordingly. So, again, we edit the value of the base-url parameter, deleting the :8080 bit so that it reads:

http://%My Computer's Name%/pentaho

We also need to modify the server configuration file. According to page 10 of the Pentaho Advanced Install guide, this corresponds to the server.xml file that is located in the

%PCI%/jboss/server/default/deploy/jbossweb-tomcat55.sar

directory. Again, make a copy of the original first, and store it somewhere safe. Then, you can open the original with a regular text editor. Once you've opened it, you should look for the fragment that defines the HTTP connector. The lines look something like this:



Just modify the port attribute of the Connector element so that it reads 80. Save the file and (re)start the server with the start-pentaho script.

It is worth noting that this this did not work for me at first. I got the following error:

20:07:13,234 ERROR [Http11Protocol] Error initializing endpoint
java.net.BindException: Address already in use: JVM_Bind:80

This indicates that another application is already using this port. Now, usually this would be another HTTP server, but I thought of that and made sure none of the other webservers I have installed were running. Finally, I found that Skype was using port 80. The issue was solved by configuring skype not to use port 80. (For everybody laughing their eyeballs out: this option was on by default.)

Configuring Mail


In my previous blog entry, I mentioned that not all of the packaged samples will run out of the box. For example, this is the case with the Bursting examples.

To configure mail, we need to edit the email_config.xml file which resides in the directory:

%PCI%/pentaho-solutions/system/smtp-email/

Again it's best to make a copy of the file first. Then, open it with a plain text editor.

If you want to use a google mail adres, I recommend you discard the original email_config.xml. Then, rename the email_config_gmail.xml file in the same directory to email_config.xml.

In most cases, you'll need to provide values for only these elements:


  • mail.smtp.host - provide the name of your smtp server

  • mail.userid - the user name

  • mail.password - the password for this mail



The config files have excellent inline comments and it should not be a problem to set this up.

When you're done, start pentaho if you did not already do so. Changing the mail configuration does not require a restart as far as I can tell.

Now, navigate to the Bursting Examples section. From there, follow the sequence of instructions to test if the workflow engine is working correctly. You can tell that it does when you see that your mailbox is being spammed by your pentaho demoserver.

Next time


Ok, so far we've just been busy setting up the pentaho BI platform: installing and configuring. Next time, we will get ourselves some development tools, and actually build our first MySQL sample.

Monday, July 10, 2006

Hands on: MySQL Samples For Pentaho 1

I got a quite a few encouraging comments in response to my previous blog entry, Some thoughts on Pentaho and MySQL. Now let's see if I can turn all this talk about Pentaho and MySQL into something we can touch and look at.

I want to make a start in creating a bundle of MySQL centric Pentaho samples. The idea is to end up with at least one simple educational sample to for each of the main flavours of functionality offered by pentaho.

Pentaho


As far as Pentaho is concerned: I'm taking the preconfigured demo environment as a basis for a couple of reasons:

  • Right now I'm not really interested in getting lost in the intricacies of setting up a J2EE server. There's plenty of time for that later on

  • As an extension to my first point, the samples will be easier to deploy for other people as well

  • The samples can be included in a future Pentaho demo environment with minimal effort



(Later, if and when I have a decent set of samles I might get rough and set up a demo pentaho environment from scratch to see how it works out if I replace the hypersonic database with MySQL.)

MySQL


For MySQL, I will assume a 5.0 generally available release. I will probably build most of the samples around the Sakila sample database. I hope it will be included in the MySQL release, and even if that's not the case, it is sufficiently easy to set it up.

It will also be quite interesting to build some things around some of the MySQL server level entities. The SHOW commands, several of the server variables and of course the mysql database are really always there, even in earlier MySQL versions. I envision that it should not be too difficult to build a few pentaho solutions that could actually be useful for real world database administration. Just a thought.

Functionalities



I want to have samples to cover at least the following areas of the BI space:

  • Reporting

  • ETL



Those are no-brainers actually. I think I really should also do


  • Analytics

  • Graphing



It starts to get really interesting if I could be able to use the more sophisticated Pentaho features:


  • Datamining

  • Business Rules

  • Dashboards / KPI visualization

  • Bursting

  • Workflow



Taking the first step...



Ok, let's not hesistate and make a start right away:

Installing the Pentaho BI Platform



  1. Use a web browser to navigate to the Pentaho download area

  2. Pick up some of the pentaho documentation while you're in the download area. I recommend you download the Pentaho Quick Start and the Creating Pentaho Solutions pdf documents. Besides, this will give you something to read while the BI Platform is busy downloading.

  3. Download the Pentaho BI Platform (pre-configured installation or PCI)

  4. Install the BI Platform. The Quick start guide contains just enough details, but I'm not afraid to spoil the surprise: Windows and Mac users only need to unzip the distribution. Linux users need to install a rpm.

  5. Installation results in a pentaho-demo directory. This is the serverroot, and it is commonly referred to as the PCI root or PCI install directory or something similar. To start the server, windows users run start-pentaho.bat; *nix users run start-pentaho.sh.

  6. Open an internet browser, and navigate to: http://localhost:8080/. This may take a little while - the server needs to warm up.



You should see a web page that looks something like this:





On the right side, you see a few links to access a few administrative tasks. The folders on the left side contain the samples, and we'll be adding a new folder there to hold our MySQL samples.

Before we go there, we should first try a few of the packaged samples to verify that the server is functioning correctly. It is important to realize that some of the samples need some configuration before they'll work. However, we'll avoid that for now - There's penty of time to set that. Again, the Quick start guide gives you all the info you need in just 15 pages, but let's highlight a couple of things that work right out of the box and that are visually appealing:


  1. Getting Started::Hello World

  2. Reporting. Just give a few of there reports a sping

  3. Chart Examples. Shows some of the included charting capabilities

  4. Analysis / OLAP Examples. Demonstrates slice and dice

  5. Dashboards


When you're done with pentaho, locate the stop-pentaho script in the PCI installation directory. Execute the script to stop de server.

Next time


Next time, I'll discuss some configuration tasks. I'll also discuss installing the sakila sample database. Stay tuned!

Sunday, July 09, 2006

Some thoughts on Pentaho and MySQL

Three weeks have gone by since the presentation of the "Business Intelligence with MySQL and Pentaho" webinar.

People that have been reading this blog will probably know how enthusiastic I am about Pentaho, even before I joined MySQL. Any effort that helps these superb products go together "...like peas and carrots..." is certainly something I like to contribute to as much as I can.

MySQL has been supporting features that make it extraordinarily useful in reporting / datawarehousing environments for quite a long time, and one of the major features of the upcoming 5.1 release, partitioning will probably not be the last one on the list. Pentaho is of course a much younger project, but that makes it all the more impressive if you see what they've achieved so far. Just download the latest release of their prefconfigured demo environment, and browse through the samples: ETL, Reporting, OLAP, Datamining, Dashboarding, Graphing - all under one hood, integrated in a workflow engine. Naturally, I believe that the Gold Partnership between Pentaho and MySQL is a good move that will lead to many mutual advantages.

So, I think it's not hard to see that this combination is one where both parties gain tremendous value from each other. Actually, it's probably not at all an exaggeration to state that you literally do not need any other branch of products or tools to implement your long term enterprise BI solution: just MySQL and Pentaho will prove to be entirely sufficient for most purposes. (But yeah, your business users will still use Excel Spreadsheets - and: let them if they want to! - Pentaho integrates there too).

Of course, time will have to tell if I'm right on this one, but I think it's likely that Pentaho has the potential to give the P in LAMP a whole new meaning.

That's not to say things are exactly perfect right now. Pentaho is sensitive to that, and are committed to make it increasingly easy for MySQL users to get the power of Pentaho at their fingertips. And to do that, they need help. Actually, Pentaho, like MySQL and a lot of other open source projects, rely on feedback of those users that know what's best for them. This is what Nicholas Goodman has to say about this:


We want to understand how to make it increasingly easy to use Pentaho with MySQL. In return for providing Pentaho with much needed feedback on ease of use and the user experience for installation/configuration Pentaho is giving away a Mac Mini.


Of course, I don't know if it counts, but here are a few of my thoughts regarding this:


  • Add a (web)interface to Pentaho that makes it easy to install and/or upgrade the Connector/J, the MySQL jdbc driver, so that it can be used to build pentaho solutions. Right now, this is not really difficult (read more), but it is a bit of a hassle. Another wild idea would be to have a Pentaho workflow check the download site to see if there's a newer version of Connector/J so it can alert the administrator or prompt for upgrade.

  • Include a setup for the sakila sample database, and add some samples that demonstrate Pentaho using this database.



With regard to my second suggestion - a little while ago, I wrote a little article on kettle - now officially called "Pentaho Data Integration". In that article, I only scratched the surface, but I already promised then to write another article to illustrate a slightly more realistic use case. Well, I still intend to do that, and it should pop up in the next few days on my blog. So, if you're interested, stay tuned.

If you want to know more about MySQL and Pentaho and how these two can strengthen each other, download and view the webinar.