Roland Bouman's blog: 2005

Friday, December 30, 2005

Formatting SQL code made easy

I was just taking a quick glance at AMIS technology blog. This blog aggregate contains quite a few high quality Oracle related entries, as well as some entries relating to eclipse and some other open source developments.

There's an entry by Aino Andriessen referring to a an entry to Eddie Awad's blog. Eddie's been searching the web for SQL pretty printers/formatters, and he has come up with a few interesting ones.

Now, I've been needing such a utility ever since I'm using MySQL 5.0 views. Suppose you've made a complex view using a CREATE VIEW statement, and you want to review your code afterwards.

You could of course maintain the source code to your view outside the database and review that, but why not use the information_schema? The VIEWS system view has a column, VIEW_DEFINITION, that contains the SQL SELECT expression underlying your view. That way, you'd be sure to review the actual code: mistakes are out of the question.

Yes, that's very true: you actually are reviewing exactly the code underlying the view...wich is *NOT* the code you entered in your CREATE VIEW statement; actually, it's not at all the code you entered. This feature request (bug #11082) and the link in that bug report to this message on the MySQL internals list illustrate this perfectly.

I'll just paste the CREATE VIEW statement appearing in that message here:


CREATE VIEW DEBITEUR_NAW
(
 RELATIEID,
 NAAM,
 ADRES,
 POSTCODE,
 PLAATS,
 LAND,
 DEBITEURNUMMER,
 TAV,
 BTWNUMMER,
 BETAALTERMIJN,
 TELEFOON,
 FAX,
 ACTUEEL,
 EMAIL,
 KVKNUMMER
) AS
select d.relatieID,
       r.naam,
       a.adres,
       a.postcode,
       a.plaats,
       l.omschrijving,
       d.debiteurnummer,
       d.Tav,
       d.btwnummer,
       d.betaaltermijn,
       (select telefoonnummer from relatie_telefoon
         where relatieid = r.relatieid and telefooncode = 1 and nummer = 1),
       (select telefoonnummer from relatie_telefoon
         where relatieid = r.relatieid and telefooncode = 2 and nummer = 1),
       r.actueel, r.email, r.kvknummer
from debiteur d
join adres a on (d.relatieid = a.relatieid and a.adrescode = 1)
join relatie r on (d.relatieid = r.relatieid)
left join land l on (a.landcode = l.landcode)
;

Ok, everything's hunky dory here. Now, let's ask the information schema for the view definition:


SELECT  view_definition
FROM    information_schema.views
WHERE   table_schema = schema()
AND     view_name = 'DEBITEUR_NAW'
;

And, suprise, surprise:


select `d`.`relatieid` AS `RELATIEID`,`r`.`naam` AS `NAAM`,`a`.`adres` AS
`ADRES`,`a`.`postcode` AS `POSTCODE`,`a`.`plaats` AS
`PLAATS`,`l`.`omschrijving` AS `LAND`,`d`.`debiteurnummer` AS
`DEBITEURNUMMER`,`d`.`tav` AS `TAV`,`d`.`btwnummer` AS
`BTWNUMMER`,`d`.`betaaltermijn` AS `BETAALTERMIJN`,(select
`serp`.`relatie_telefoon`.`telefoonnummer` AS `telefoonnummer` from
`serp`.`relatie_telefoon` where ((`serp`.`relatie_telefoon`.`relatieid` =
`r`.`relatieid`) and (`serp`.`relatie_telefoon`.`telefooncode` = 1) and
(`serp`.`relatie_telefoon`.`nummer` = 1))) AS `TELEFOON`,(select
`serp`.`relatie_telefoon`.`telefoonnummer` AS `telefoonnummer` from
`serp`.`relatie_telefoon` where ((`serp`.`relatie_telefoon`.`relatieid` =
`r`.`relatieid`) and (`serp`.`relatie_telefoon`.`telefooncode` = 2) and
(`serp`.`relatie_telefoon`.`nummer` = 1))) AS `FAX`,`r`.`actueel` AS
`ACTUEEL`,`r`.`email` AS `EMAIL`,`r`.`kvknummer` AS `KVKNUMMER` from
(((`serp`.`debiteur` `d` join `serp`.`adres` `a` on(((`d`.`relatieid` =
`a`.`relatieid`) and (`a`.`adrescode` = 1)))) join `serp`.`relatie` `r`
on((`d`.`relatieid` = `r`.`relatieid`))) left join `serp`.`land` `l`
on((`a`.`landcode` = `l`.`landcode`)))

Is what is returned. Nasty, eh? Actually, this pretty much forces you to maintain a copy of the source outside the database if you're actively maintaining this view.

Suppose you didn't. Or suppose you're not in control of that: you could be in a position where you arrive at a customer that needs a problem fixed, and you're confronted with just the database. What will you do then huh?

Well, now you can just have your code formatted online. This is what it did:


SELECT `D`.`RELATIEID` AS `RELATIEID`,
       `R`.`NAAM` AS `NAAM`,
       `A`.`ADRES` AS `ADRES`,
       `A`.`POSTCODE` AS `POSTCODE`,
       `A`.`PLAATS` AS `PLAATS`,
       `L`.`OMSCHRIJVING` AS `LAND`,
       `D`.`DEBITEURNUMMER` AS `DEBITEURNUMMER`,
       `D`.`TAV` AS `TAV`,
       `D`.`BTWNUMMER` AS `BTWNUMMER`,
       `D`.`BETAALTERMIJN` AS `BETAALTERMIJN`,
       (SELECT `SERP`.`RELATIE_TELEFOON`.`TELEFOONNUMMER` AS `TELEFOONNUMMER`
        FROM   `SERP`.`RELATIE_TELEFOON`
        WHERE  ((`SERP`.`RELATIE_TELEFOON`.`RELATIEID` = `R`.`RELATIEID`)
                AND (`SERP`.`RELATIE_TELEFOON`.`TELEFOONCODE` = 1)
                AND (`SERP`.`RELATIE_TELEFOON`.`NUMMER` = 1))) AS `TELEFOON`,
       (SELECT `SERP`.`RELATIE_TELEFOON`.`TELEFOONNUMMER` AS `TELEFOONNUMMER`
        FROM   `SERP`.`RELATIE_TELEFOON`
        WHERE  ((`SERP`.`RELATIE_TELEFOON`.`RELATIEID` = `R`.`RELATIEID`)
                AND (`SERP`.`RELATIE_TELEFOON`.`TELEFOONCODE` = 2)
                AND (`SERP`.`RELATIE_TELEFOON`.`NUMMER` = 1))) AS `FAX`,
       `R`.`ACTUEEL` AS `ACTUEEL`,
       `R`.`EMAIL` AS `EMAIL`,
       `R`.`KVKNUMMER` AS `KVKNUMMER`
FROM   (((`SERP`.`DEBITEUR` `D`
          JOIN `SERP`.`ADRES` `A`
            ON (((`D`.`RELATIEID` = `A`.`RELATIEID`)
                 AND (`A`.`ADRESCODE` = 1))))
         JOIN `SERP`.`RELATIE` `R`
           ON ((`D`.`RELATIEID` = `R`.`RELATIEID`)))
        LEFT JOIN `SERP`.`LAND` `L`
               ON ((`A`.`LANDCODE` = `L`.`LANDCODE`)))

Pretty cool huh? I just chose the default options, but you can do stuff like control the letter case and the position of the comma's in lists etc.

It doesn't finish here though: This online tool can take on several input dialects (MS SQL, Oracle, Access, MySQL and generic) and, dig this, several output formats in addition to SQL: PHP, Java, VB and lots and lots more. (In those cases, the code for a string expression in the pertinent language is returned). You can also control

In addition to the online tool, Eddie Awad's blog entry mentions an applet, and a downloadable tool to. I haven't tried those, but you can check em out for yourselves of course.

Thursday, December 29, 2005

MySQL udf.NET, wtf?

I just read a post on Brian Aker's blog hinting on plans to embed MONO, the open source fork of the Microsoft .NET framework, into MySQL as an UDF language, or even an alternative Stored Procedure language.

Brian's post's just a brief mental note made public. It refers to a post on Miguel de Icaza's blog. Unfortunately, the link is broken, but I was able to dig up what I think is the intended adres.

Well, I think this is great and exciting news! I did a couple of things with Microsoft C# .NET, and however shallow my impression might be, I think the framework's solid - I like it. (I like C# too, but that's technically a different matter.) I haven't had an opportunity to check out MONO, but I heard some very promising things about it. (I heard someone claim binary compatibility - imagine, no recompile across OS-platforms!)

Anyone not experienced in C/C++ that did program an UDF will probably welcome the addition of a managed language to write their UDF. Also, having the ability to write stored procedures in some language based on the framework is bound to open up MySQL to a whole new group of users. It will definitely keep the Gap between MS SQL server 2005 and MySQL a lot smaller.

It makes me wonder though. The following quote is from the chapter 17.2.1. CREATE PROCEDURE and CREATE FUNCTION from the MySQL Reference Manual (v 5.0):

A framework for external stored procedures will be introduced in the near future. This will allow you to write stored procedures in languages other than SQL. Most likely, one of the first languages to be supported is PHP because the core PHP engine is small, thread-safe, and can easily be embedded. Because the framework is public, it is expected that many other languages can also be supported.

Question is, is PHP still in the picture? I sure do hope so! What about Java (I'm thinking Oracle compatibilty here)?
Anyway, exciting stuff ahead!

Ubuntu on an external hard drive - It's a megablast!

Wow, it's just such a thrill to have ubuntu sitting here on a portable drive!!

I just hooked my external drive to the IBM Thinkpad laptop I use for my job.

True, for some reason, I had to install the drive first under Windows XP (I'd expected the BIOS to pick it up right away, but it didnt), but after that, I could boot Ubuntu right away, just like I did on my desktop upstairs.

(Some people might regard it as a bit of a makeshift solution, but I like having the internal hard drive as the default boot device. During startup, I just hit whatever function key BIOS happens to accept to pop up a list of boot devices - F8 on my desktop, F12 on my laptop - and pick my external hard-drive if I want to boot Ubuntu. It makes sense, I really cannot be sure the external drive is hooked up, whereas It's safe to assume the interal hard drive's still there ;-).

So, Ubuntu has just awoken in this strange, new, compact body. From there, I used the graphical "networking" tool to set up my wireless adapter (built-in in my thinkpad, my desktop uses and old fashioned umbilical chord) and lo-and-behold: here I am, sitting in the living downstairs, with my laptop posting this.

I admit, I did reboot once after tinkering with the wireless adapter configuration. It just would not work at once, and the rebooting reflex is kinda engraved in my backbone, being a Windows user since the 3.11 version.

Thanks again Dave (a.k.a. DaBruGo), I really, really am so much obliged to you for posting this solution of yours!! I'll just repeat it here: those that want to have a dual boot solution for ubuntu and they want ubuntu to sit on an external drive, look no further! This thread in the ubuntu forums contains indispensible valuable information. Right, people are even posting replies that tell you how to do this with an USB flash memory stick, imagine that!

Wednesday, December 28, 2005

Linux for Human Beings - AND Roland too!

I've been busy exploring open source software for some years now.

I think it was Erik, a friend of mine, that first told me and explained what open source software was about. Then, he was right in the middle of developing the backup tool Abakt and we then discussed XML as a format for storing configuration data. He used (and still uses, I think) an open source SAX parser, I can't remember which one exactly.

Anyway, he introduced me to MySQL, OpenOffice, Mozilla - all these different kinds of products. My initial reaction was one of disbelief: How in the world could this software be free of charge? Put another way, how can it be free of charge, and be of good quality too? What's the catch?

I then briefly explored MySQL (Still a 4.0 I think?) and at the time, I thought I knew what the catch was: MySQL was not that good. At least, not compared to what I was used too (Oracle, MSSQL, Interbase even). It didn't have a lot of things that I used (and still use) a lot. Declarative Check Constraints, Views, Stored Procedures. Besides, it had all these strange features such as really eerie NULL handling. On top of that, you had to configure all that InnoDB stuff and go to all kinds of trouble to enable transactions and foreing keys.

Mozilla was sort of the same story. At the time, I didn't have built-in XSLT support (like IE does), and although it claimed it implemented only standard DOM to perform DHTML, I really was unimpressed as the interface IE offered seemed so much richer and easier to me.

Of course, I just consumed the products, briefly glancing over the features. And my perpective was quite narrow towards open source software, almost exlusively focussed on the costs, and not on the inherent quality or the community process. Some of the things I really did dig then was the Java programming language. I thought that was open source too, because it was free - of cost that is. Also, I had a tendency to compare all this stuff to the equivalents I was familiar with - being mostly Microsoft products, or commercially available software running on the Windows platform.

Gradually, my attitude changed. A product that definitely made a difference in this respect was PHP. At the time, I was using three different technologies to develop web applications: Oracle Mod_plsql, Java servlet pages and Microsoft Active Server Pages. To me, the problem with all these technologies was their lack of portability. I want to develop applications and let my customers decide what underlying infrastructure (Web server, database server, and of course: Operating System) is best for them. With these three, this is an utopy.

I stumbled into PHP again via Erik. Also, I noticed that funny php extension on all these webpages I was browsing, and I made a note I had to check it out sometime, ifnot to know why not to use it. Finally I downloaded it, and read the installation notes. Then, I set up in both Apache and Microsoft Internet Information Server - both up and running within 10 minutes. This was amazing! This was a product that I legally did not pay for, it had clear documentation AND it worked exactly as described on two very different webservers (at least, marketing-wise). Better still than not paying for it, I could even use PHP to build applications on a commercial basis!

(Actually, I was using Apache quite alot then too, but listen up, hear this and try not to laugh at me: I did't know Apache was open source! See, I was familiar with what was (and still is) called "the Oracle HTTP Server _{powered by apache}" wich is about 98% apache that's shipped with the oracle database as of version 8 (among that 2% is that mod_plsql I've blogged about here an there). Mind you, they havent even succeeded in using a Apache 2 based product, it's still Apache 1.3 there. tsk..).

From then on, I looked at open source software from a different point of view. I started using MySQL. The 4.1 is of course objectively a big improvement over the 4.0 version I met initially, but of course, my change of attitude has made all the difference. And my confidence in open source software has got another boost by getting involved (in my own modest way) in the MySQL forums, and in writing the odd blog or article.

However, all this time, I have been glued to the Windows operating system. I've been wanting to convert there too, but there's something about the nature of an OS that doesn't make me change. I mean, I don't like particularly configuring stuff, making my computer and OS see my hardware, solving IRQ conflicts, man you name it.

But now, something has come to my attention that might change all that. It's the ubuntu Linux distribution. Ubuntu advertises itself as "Linux for human beings". They do not explicitly say to what kind of a user base the other Linuxes are targeted, but for now, I'll gladly assume the mean "human beings that do like to configure, make the OS see my hardware and solving IRQ conflicts".

People might be wondering why I'm picking it up so late, but the result's what matters to me. (In case anyone cares, I picked it up through Morgan Tocker's blog wich I picked up through PlanetMySQL: thanks Morgan, thanks PlanetMySQL!).

Actually, I'm writing this blog from my freshly installed Ubuntu. It ships with Mozilla Firefox. I'm really impressed with how it looks, and how fast it works!

I'm still runnin it dual boot with my Windows XP OS, and I'll probably hang on to that too (actually, I haven't got much of a choice there as I need it to serve most of my customers). I had some installation troubles of course: I wanted to run it from an external hard drive, in part because I was a bit afraid the installation might corrupt my windows disk.

Anyway, power to the community, here as well: on the ubuntu forums, it took some 10 hours for me to get the issue solved....using a pointer to an existing thread (that I would've found if I had searched better). So, thanks for the pointer aysiu. And of course, many thanks, and a big thumbs up for my hero of the day: DaBruGo, who has so kindly explained to us all how to set ubuntu up using an external hard drive.

No helpdesk can beat this kind of support - it's just amazing.

Friday, December 23, 2005

Reporting for the Masses: Eclipse/BIRT, Apache Tomcat and MySQL

I got it working...

Eclipse

About two monts or so ago, my attention was caught by that remarkable open source product, Eclipse. (Maybe a lot of people are saying "..duh!.." out loud right now, but hey, I can't hear you)

I wrote the MySQL connector/J example accompanying the article in the Call Level Interface section of Andrew Gilfrin's site www.mysqldevelopment.com, using eclipse, very much to my satisfaction.

When I started to write the example, I was using sun's netbeans, but it was simply too slow. Especially as my source grew bigger (I'm talking only several hundreds of lines here, shouldn't be a big deal) the editor was just lagging and freezing up to a level that simply wasn't acceptable to me. Clearly, I needed something else and eclipse happened to be the first thing I thought of using instead.

BIRT

Browsing through some of the other, non-core, eclipse projects I noticed the BIRT project. And, quite shortly after becoming aware of BIRT's existence, an article was published on the MySQL website: Using BIRT To Report On Bugzilla in MySQL. I knew then that I had to check it out for sure.

For those that never heard about it, BIRT stands for: "Business Intelligence and Reporting Tools". BIRT lets you extract and present data stored in various types of data sources (such as relational databases, but also comma seperated files, and some others). Such a presentation, or report, can be a powerful tool for the upper and middle management to analyze, control and plan their business. (You can take a look at a DEMO flash movie here)

Of course, reports can have a much more modest purposes too. For example, an order entry system could be required to print out an entire order including order lines so it can be printed out and sent to the client as an invoice.

Now, I have worked with some closed source Business Intelligence and Reporting Tools, and I think it's fair to say that at this moment, BIRT is best characterized as a reporting tool rather than a Business Intelligence tool.

True Business Intelligence tools offer all kinds of instruments to analyze data, quite often in a dynamic, flexible, ad hoc manner. At this moment, BIRT is not such a tool. Those that are wondering what tools qualify as true Business Intelligence tools: I'm thinking of OLAP and Datamining tools such Cognos Powerplay, Oracle Discoverer, Microsoft Analysis Services.

BIRT let's you define sets of data (for example, from a SQL query) and then render that data into a mainly static layout. There is quite some choice of course: Data maybe rendered as simply plaintext, or in a complex tabular grouped format or even as a colourful graph or diagram. It should be possible to add some user interaction too, because you can put hyperlinks and javascript into the report (haven't tried that yet though).

There is also limited support for some analytics. The data drawn from the query can be grouped in various levels. Aggregate functions can be independantly applied to these groupings, showing both detail rows as well as the aggregates (think of a grand total for all ordered items on the invoice).

Actually, I think BIRT bears quite some resemblance to Microsoft Reporting Services. This is especially true when designing Reports. The Eclipse IDE, in which you design BIRT reports, matches Microsoft Visual Studio (which is used to create MS Reporting Services Reports). Both reporting platforms are extensible. Of course, BIRT is so by definition, as it is open source; but the methods for extending Reporting Services are documented and the specification is available. The language in which reports are defined is an XML format for both platforms .

(BTW - the XML-format this is something I like very much in both these products, It offers one all kinds of opportunities to manipulate report sources. For example, because this is all XML, it shouldn't be too hard to translate reports sources from one platform to the other using simple XSLT transformations. Another thing you could do is build your own implementation, generating source code in say, PHP. Again, all you'd need are XSLT transformations).

Of course, there are also some differences between these products. Here are some obvious ones which do not directly impact the functionality of the report:

BIRT is open source under the Eclipse Public License - Reporting Services is closed proprietary source software, subject to its own product license

BIRT is based on java technology - Reporting Services is built on .NET technology

BIRT has jdbc database connectivity - Reporting Services has ODBC and ADO .NET connectivity

Syntactically, BIRT expressions are Javascript expression - Reporting Services expression are Visual Basic .NET expressions

BIRT does *NOT* have crosstabs as of yet - Reporting Services has of course

BIRT datasources are defined on the report level - Reporting Services datasources are defined on the Report Server and/or Project level (wich I think is much more convenient than in BIRT)

A lot of output formats are available for Reporting Services - BIRT can only output HTML or PDF (I'm really missing an XML output format)

There are significant differences in Architecture:

Although you can use Microsoft Reporting services on various databases (ODBC, ADO) you really need a Microsoft SQL Server instance, always! The report data is processed and cached in a SQL Server database, no matter the actual data source. The actual report is run from the cached data. BIRT is dependant only upon a runtime library. No datastore is required to cache data.

Reporting Services is integrated into Microsoft SQL Server and Microsoft Internet Information Services (webserver). So, there, deployment is really easy. BIRT is just a reporting tool. You'll have to hook it up to a Application Server yourself - one capable of running java servlets. Deployment is then done manually, copying the source files to the application server

Tomcat

Of course, for it to be a serious solution, BIRT really needs to be hooked up to an application server. You can't expect your users to open eclipse to preview their report.

Luckily, the BIRT manual explains exactly how you can run your reports from your Apache Tomcat application server. Some time ago, I installed one of those too, but I never got round to playing around with it. This was my first chance to do that too.

I expected it to be big deal - it's not. In a minimal setup, you need to create a BIRT directory under tomcat's webapps directory and copy one of BIRT's components, the report viewer to that directory. It's a walk in the park, I had it running withing minutes, having read the relevant chapter in the BIRT manual.

MySQL - Sakila

I used the sakila sample database (download here) to do some reporting.

I devised two little queries that I think would be useful had I started a career as a DVD rental shop owner: Rentals per store through time, and Rentals per category.

Here's Rentals per store:


select     DATE_FORMAT(last_day(r.rent_date),'%Y') as rent_year
,          DATE_FORMAT(last_day(r.rent_date),'%c') as rent_month_number
,          DATE_FORMAT(last_day(r.rent_date),'%b') as rent_month_abbreviation
,          i.store_id                              as store
,          count(*)                                as count_rentals
from       rental r
inner join inventory i
on         r.inventory_id = i.inventory_id
group  by  rent_year
,          rent_month_number
,          store

I rendered it as bar graph:

a bar graph of sakila rentals per store through time

Here's Rentals per category:


select     c.name               category
,          count(*)
from       rental    r
inner join inventory i
on         r.inventory_id = i.inventory_id 
inner join film      f
on         i.film_id      = f.film_id
left join  category  c
on         f.category_id  = c.category_id
group by   category

I rendered it as a Piechart:

a pie chart of sakila rentals per category

Quite interesting results, actually. I'm not experienced in DVD rentals, but this does look like quite a homogenous set of data to me. Anyway, I just heard Mike Hillyer is going to discuss the sakila database at the MySQL user conference in april 2006. We'll probably hear exactly how he generated this set.

I just want to add that it took me just a couple of hours to explore BIRT and the underlying XML format (dont worry, designing is all GUI) enough to come up with these. Apart from these graphs, my first report also includes a table of rentals per store with subtotals for several grouping levels. I skipped the tutorials, I just watched the Demo Flash movie (link in the top). My conclusion is that eclipse/BIRT is intuitive enough to build serious reports.

Also, I'm impressed with the performance running the reports. I'm running eclipse/birt, mysql 5.0.17 and Tomcat all on my 1700 Mhz/ 1G Ram Thinkpad, and the graphs appear almost instantly (Yup, disabled my browser cache - it really is fast). We are looking at 16088 rows of rental data - I'm impressed this speed is possible even without the data cache such a used by Microsft Reporting services.

Final thoughts

You can put together a Reporting Platform based exclusively on open source products. It's quite easy to set it all up. BIRT may not be as advanced as some comercial products, but because no financial investments are involved purchasing software, theres no risk in trying. In fact, I think that in a lot of cases, the functionality provided by BIRT is quite sufficient to serve most reporting requirements.

Although BIRT may lack some more advanced features such as crosstabs, most of these will probably be added within an overseeable amount of time. Take a look at the BIRT project plan to see what is going on. I'm quite convinced we won't have to wait too long before we see a crosstab inside BIRT

Tuesday, December 20, 2005

'Doing' MS Access

I've just come home visiting one of my Inter Access co-workers.

Barbara is a very capable Oracle Developer, Information Analyst and BI Consultant. She enjoys the craftmanship so much that she still finds time to help out friends managing their little IT solutions.

For example, one of her friends owns a sports clothing store. She built them an order entry system in that popular little Database by Microsoft known as Access.

As you might expect, the system has grown over time, and has had too little administrative attention. Now, it's about 30Mb of data. This may not seem much, until you try to run a query. Size really is a relative measure.

What's worse, the number of users has grown too. Again, it may not seem much, but five users that are simultaneously working with one Access database is really about the limit. At least, it seems to be the case here.

She decided that it might be a good idea to try porting the application's backend to MySQL, and she asked me if I had any experience in porting such an application. 'Yeah..' I said, lying through my teeth of course, '..sure I have. I do that stuff all the time.'. 'Lyer..', she said, '..but do you think you can give me a hand?'.

Of course, I said yes. I really never did this, although I was aware of the builtin functionalities of the migration toolkit to do the legwork. And I'm always eager to goahead and try these kinds of things, especially since this is sort of a real, mission-critical application. (Just because we're doing this in our spare time, and that this concerns only a small sportsclothes shop does not change the fact that this application is mission-critical and that the migration should be planned and executed carefully).

Luckily, the application was already built in two separate chunks. The front end is an Access application: It contains some stored Queries ('Views'), some Forms and some VB code. The backend is also an access database. This one is worth 35 Mb of base tables. The front end communicates with the backend through a so-called 'linked table'. (For the MySQL techies: this offers the same functionality as the FEDERATED storage engine in MySQL)

In our first endeavours, we used the MySQL Migration Toolkit (download: here) to reverse engineer the database schema and to load the data into MySQL. This really went remarkebly well. We used default options in most of the cases, and I can only conclude that this is a very good way to import an access database into MySQL.

Then, we used the MySQL ODBC driver to link the existing Ms Access front end to the MySQL backend database. The immediate first impression regarding this solution is that query performance is much better than in the old situation, so that's a thumbs up for MySQL. And we didn't even do any tuning sofar.

A bit of a bummer is that we did encounter problems inserting data into tables with socalled AutoNumber columns (such a column is very much like an auto_increment in mysql).
In the fist place, the migration toolkit did NOT automatically map those to mysql auto_increment columns. This was fixed soon enough though. We just added the auto_increment attribute ourselves.

A more serious problem was that Access did not synchronize well upon insertion of a new row. The row would be put into the database, but immediately after insertion, the access front end would show the row as #deleted#. Refreshing the view on the data from inside Access would show the row, proving the insert did indeed take place, and proving access is capable of retrieving the newly inserted row.

We did some searching on the MySQL Access Migration forum but alas, to no avail. Lucky for us, a bit of browsing around brought us to the -unbeatable - reference manual. The "deleted" problem is described there exactly, and a workaround is ginven there too.

(By the way, the manual has an entire section devoted to using the MySQL ODBC driver in combination with Microsoft Access)

We had to change just two things. First, we had to enable the Return matching rows option in the MyODBC configuration dialog. This is available from the ODBC Driver manager. Then, we had to add a timestamp column to the tables:


    alter table table1
    add dummy timstamp
    default current_timestamp
    on update current_timestamp

...and that fixed the problem.

A point worth mentioning is that it really doesn't matter how you call that column. You don't even need to use it in your application. It just needs to be there, that's all.

Unfortunately, we are both at a loss as to explain why (and how) this works. We even did some ODBC tracing to see what happened. We couldn't find out how it works, but we could see something else that's interesting. After the INSERT, Access performs a SELECT, presumably to synchronize it's view on the data. Judging by the trace, that SELECT statement does NOT use the timestamp column in the WHEREclause.

Now, there still have to be some tests, but at least, we're very confident that this port to MySQL will be quite successful and without siginificant effort (thanks to the Migration toolkit)

Barbara promised she'd let me know if there's any progress or any troubles, so I will report them back in this blog, '...for the world to knwow...'

Friday, December 16, 2005

Some MySQL stuff I have been doing during my Holidays

During my holiday, I alo spent some time with MySQL.

I've been working on an article series for mysqldevelopment.com. This time, Andrew, Markus and yours truly have teamed up to cover (or at least, introduce) the interfaces you use to access MySQL from the (end-user) application. So far, we've come up with an introduction and an article describing Connector/J, the MySQL JDBC interface.

Given the amount of available API's/Connectors, it seems we've got our work cut out for us. But hey, dear reader, if you feel you want to contribute, don't hesitate and drop us a line.

I've been working on MySQL UDF's too. What a great and powerful feature! As I described in a previous entry (This is too wicked), I'm building a set of UDF's to port between Oracle and MySQL (not neccessarily in that direction)

To me, the most amazing thing sofar has been the discovery that MySQL already implements so much Oracle specific functions. I did notice a few before, but It really became clear to me once I started making an alphabetical list of functions (still busy with it). Or rather, since I started mapping Oracle's alphabetical list of functions from the Oracle SQL Reference to the MySQL functions.

I sure did learn a lot since I wrote my first UDF (Using an UDF to Raise Errors from inside MySQL Procedures and/or Triggers). I finally got rid of that pesky #include "winsock.h", and I developed a couple of macros, typedefs and such that make up a nice little reusable toolkit for creating UDF's (well, at least: I think so). I'll think I'll describe it in the blog in the near future.

Thursday, December 15, 2005

Having a bit of a Holiday

I've been enjoying a bit of a holiday lately. Nothing much, just two weeks off from my regular job.

Frankly, not having to go to work, celebrate "Sinterklaas" with my family, reading a novel (Sinterklaas gave me the Dutch translation of Michel Houellebecq's last novel, "La possibilité d’une île", Dank U Sinterklaasje! A certain iPOD Nano device also arrived on the 5th of december - what timing, I love it!) and making a walk on the beach (Yup! we saw a seal, a wild one that is, in Katwijk aan Zee. It looked as if it was making an attempt to swim into the Rhine River, in fact it was at the foot of the dunes, well on the shore).

Tuesday, November 29, 2005

This is TOO Wicked!

Wow, this really is too cool to believe!

I'm one of the lucky three grand prize winners in the You Make MySQL 5.0 Rock! contest. I bet that sounds cheesy, but I really never imagined I would be one of those lucky three. It goes without saying I'm very very pleased indeed! Brilliant!

I just want to say: Thank You! to the MySQL team, it's a great honour. And hey, Arjen, I didn't mind sucking up at all ;-), as a matter of fact, I can't wait until the next contest.

Well, now that I'm at it, I want to speak up in favour of a couple of MySQL community personalities that I respect and think highly of. To a large extent, my motivation to occupy myself with MySQL has to do with being inspired reading their posts on various MySQL forums, their blogs and in an isolated case, their book. So, here we go:

Andrew Gilfrin: Andrew barely needs introduction: without doubt, one of the people most present on various of the MySQL forums, and of course, the man behind www.mysqldevelopment.com. But hey, this guy is so busy, isn't there anyone out there that can help out, and contribute to the site? Come take a look and help to make it even better still!
Markus Popp: The brains behind that wonderful and unique initiative, www.db4free.net, making open source database product accessible to literally everyone (with a connection and an email adress, that is). Congrats to you, dude!
Jay Pipes: Apart from being a very helpful and knowledgable MySQL forum member, Jay's also one of the Authors of Pro MySQL. I think that book's just great stuff, taking the principle perspective of MySQL as a professional database product.
Felix Geerinckx: I've seldom seen a forum member that was so accurate and quick in answering forum questions. And that's exactly what Felix does. I bet he's the modest type too, because unlike myself, you can never catch him plugging his own pages into the forum. Felix, where's your blog?
Beat Vontobel: Well, who could've missed that, a MySQL Stored Procedure debugger. Amazing! Congratulations Beat! The day will come that somebody will aggregate your blogs in a Internet Explorer friendly page ;D)
Giuseppe Maxia: Finally, someone senior enough has taken it upon himself to create some structure and organize the various MySQL general purpose routines that are scattered around. So, come on, check out the MySQL General Purpose Stored Routines Library and post your comments and contributions.
Arjen Lentz: Maybe I'm not the best judge, but it is my impression that Arjen is one of the major reasons why this community is thriving, and literally alive and kicking. For one, the PlanetMySQL initiative is really great. Just one page to see all those interesting items and blogs. This is daily digest for me

So there you have it. You may think it's corny, or you may think it's a matter of blunt, bad taste advertising...so be it. I benefit a lot from these people, and to me, it just seems the proper thing to publicly pay them tribute. Props to all of you!

Meanwhile, work has to be done too.

For one, Markus and Andrew and myself have been, and still are, quite busy writing an series of articles on connecting to MySQL from an application programming language for mysqldevelopment.com. You can expect to see those soon over there.

Second, I'm getting kinda hooked writing UDF's and delving into the associated materials. The C language (after about 8 years, it's coming all back to me), some of the MySQL source (up till now, only the headers). I took it upon me to write a library to implement most of the Oracle functions. Why? Because I might need it for a migration, and because it's fun!
I decided to make a plan to do it too, although I can't dislose too much about that right now. Let me just say I'm implementing them in alphabetical order (about to do DECODE() now), until there are no more entries on the list ;-).

Bye and greetings from one happy Roland.

Wednesday, November 23, 2005

Using an UDF to Raise Errors from inside MySQL Procedures and/or Triggers

Yes, this is really a feature that would be most welcome.

Ever since I started following the MySQL Stored Procedure/Functions and Trigger Forims (say, June 2005) this feature, or rather, the lack thereof, has been the subject of a substantial amount of threads (see: http://forums.mysql.com/read.php?98,24044,24100, http://forums.mysql.com/read.php?99,22523,23797#msg-23797 or more recent: http://forums.mysql.com/read.php?99,55108,55541)

Luckily, this has been recognized by the MySQL developers, and there's an entry in the MySQL 5.0 Technical FAQ that addresses the issue.

But, there's always people that can't wait. For example, Scott Maxwell devised a most creative solution. Props to you Scott!

Well, I scribbled up a little solution that takes another aproach to the problem. I won't pretend I'm the first one that thought of this. It was actually suggested quite a few times, here and there. I must say, I did not see anyone offering the code to do it, but this maybe just ignorance on my part. At least it gave me the opportunity to touch a whole new field of MySQL development: User Defined Functions or UDFs.

Using an UDF

A UDF is an external function that's dynamically linked to the MySQL server software. You can look it all up in the reference manual. There are some obscurities, especially when you're not an C programmer.

(However, the obscurities could be intentional. Maybe it's meant that way, just to scare people off that don't know what they're doing. Well, I'm the first to admit it: I am not an experienced C programmer, and most of the time (well...) I don't know what I'm doing (not just in C, applies to almost any programming language). I guess the difference between starting to do something not knowing what you're doing and finishing doing that is called experience. Or failure. Anyway, the key is: not letting other people know which one it is, and at the same time pretending you do know the difference yourself.)

First, you need have the object code that contains the UDF. The object code is just the compiled, binary code that contains your functions. Now this compilation must be done in a certain manner so that it results in a library that can be dynamically linked to the MySQL Server process. I don't know how this works out for a linux box (but it has something to do with .so files, your guess is probably better than mine is), but in Windows Speak this is called a DLL (Dynamicaly Linked Library).

Once you have the binary, it must be placed in some location that's accessible to MySQL (I'll tell you in a bit where i put it). Finally, you must issue some specialized DDL command so that MySQL knows it exists (yep, I'll tell you that too). From that point on, you will have this UDF accessible throughout the server. Just like the builtin functions, you will never have to prefix it with a database name. You will never have to grant it to anyone either. It's just like using builtin functions, such as LOWER() or GREATEST().

I'll try the best I can to describe what I did to get it working. I really hope it helps.

Setting up a Project

We have a very modest goal here. We just need one little function that will do nothing but raise an error. Normally, I'm quite good at breaking stuff, and my code gives runtime errors all the time, so this shouldnt be too hard. So let's get started and code some stuff.

Well, the relevant pages in the manual say that you should write the function in C. You probably could do it in other languages as well as long as calling conventions, stack properties and who knows whatnot will be the same. But, I'm not Indiana Jones: just writing a little bit of C is adventure enough for now.

I just used Visual Studio C++ 2003 for no particular reason. It's just what I happen to run. You could probably achieve the same with another IDE and Compiler. But hey, I don't know those, so I won't describe that (Maybe when I get Eclipse hooked up to a freely obtainable compiler I'll relive the experience). BTW, you can download the free (as in bear) VC++ express edition from you know who here. Mind you, it's not entirely the same, but it should support the basic features I'm using here.

I opened a new, blank solution and added a win32 empty DLL project: "MySQLRaiseErrorUDF.vcproj". I added one source C/C++ source file: "MySQLRaiseErrorUDF.cpp", and one Module Definition file: "MySQLRaiseErrorUDF.def". At this stage, both are still empty. However, Visual Studio knows that it should offer C/C++ specific property pages in the Project options dialog, which is good, cause that's what we'll be setting up next.

(Note that the .cpp extension is usually associated with C++ source files. However, we do not use any C++ specific stuff. It just happend to be the only sort of source file I could create using the VS wizard.)

I right-clicked the project to modify the following options:

In the "General" page of the "C/C++" options folder, I modified the "Additional Include Directories" property to set it to the "include" folder under the MySQL installation directory (available after a full or custom/development install). We need this to be able to refer to the "mysql.h" header file in our source.

In the "Input" page of "Linker" folder, I included the "MySQLRaiseErrorUDF.def" file we just added to the project.

Coding the DLL

Now, we can start coding. First, our C source needs to include the "mysql.h" header file. This contains references to "winsock.h" which is included in the windows platform sdk. Most probably, the latter has a different counterpart on other operating systems. (Anyone that can enlighten me concerning this, and even help making the source compilable for multiple platforms, is most invited.) So, we start like this:


#include <winsock.h>
#include <mysql.h>

A simple, non-aggregate function that is to be called from the MySQL side corresonds to three functions in the C code. Say, we want to create a function called: raise_error(). Then, our C file must at least contain the code for that, and optionally contain a raise_error_init() and a raise_error_deinit() function. (look it up in the manual)

We'll just rush on to coding the raise_error_init function. Once we got that, you'll see that the raise_error() is really very trivial.

Now, raise_error_init() functions MUST always have this signature:
(Look here for the relevant page in the reference manual.)


my_bool raise_error_init(
    UDF_INIT *initid
,   UDF_ARGS *args
,   char *message
);

There's not much to explain as to why it must be like this: that's just the way MySQL was built. I can tell what this signature contains, and hope that that'll be enough of an explanation.

`initid`

The initid is declared as a pointer to a struct of the UDF_INIT type. This is defined in mysql_com.h (ln 369..376). It doesnt really matter right now what a pointer is, or what a struct is. What does matter, is that the initid variable is like a composite variable. You can think of it like a record which has a couple of fields (members is the appropriate term here) that contain information about our function.

On the one hand, mysql passes initid to our raise_error_init() function so that we get some information on the context that's calling our raise_error() function; On the other hand, we can change the values in the fields of initid to provide information to our raise_error(), because the very same initid will be passed on by MySQL to the raise_error() after raise_error_init() finishes.

`args`

Like initid, args is a pointer to a particular struct, this time, the UDF_ARGS struct. That's defined in mysql_com.h (ln 356..365).

args provides information on the arguments passed to our raise_error() function. Again, we can both read and write data to and from the members of this struct just before the raise_error() itself is called.

`message`

Now we are arriving at the heart of the matter.
The message variable is again a pointer (man, these guys just love pointers). Maybe now it's a good moment to explain what a pointer is exactly.

A pointer is actually just an integer value. But it's a special integer value: it's an integer that identifies, or "points to", a position in the computer's memory. Quite literlly, the pointer's integer value is the address of a byte in the computer's RAM.

Pointers have some added semantics. Their type definition also includes the datatype of the stuff that's stored at that address. Or more appropriately, it includes the datatype that will be used to interpret whatever happens to be in the computer's memory starting from the address identified by the pointer integer value. For example, the message argument is a char pointer, notated as char * in C.

Now, the contract of the XXX_init() function holds that message points to a buffer of a particular size, MYSQL_ERRMSG_SIZE bytes long. Knowing what we know now, we can also say that message identifies an address that demarcates a stretch of consecutive bytes that are reserved.

Furthermore, XXX_init() expects us to write to this memory a string of characters that human users may read and recognize as a meaninful error message text. At least, if we can detect an error right now.

The main reason to use this feature for an ordinary function would be to check that the UDF has been called with the right amount and types of arguments in order to exit in case of invalid arguments. Now, for the function we're writing now it's easy: raise_error() should always generate an error; that's the entire reason for it's existence. And, while we're at it, it would be cool to use the value passed by the user as argument as error message. That would pretty much conclude the functionality of raise_error().

Returning from `raise_error_init()`

Actually, the message argument to raise_error_init() is only half of the story concerning error generation. (see the manual for more details) In order to let MySQL know that an error occurred we MUST return a 1 (one). Normally, a zero (0) should be returned, indicating that execution may proceed execution.

`raise_error_init()`

So, finally, here's the code for raise_error_init():


my_bool raise_error_init(
    UDF_INIT *initid
,   UDF_ARGS *args
,   char *message
){ 
    unsigned int argStringLength;
    if(args->arg_count==1
    && args->arg_type[0]==STRING_RESULT)
    {
        argStringLength = strlen(args->args[0]) 
                        + 1
                        ;
        memcpy(
            message
        ,   args->args[0]
        ,   argStringLength>MYSQL_ERRMSG_SIZE
        ?   MYSQL_ERRMSG_SIZE
        :   argStringLength
        );
    } else {
        memcpy(
            message
        ,   "Unspecified error raised"
        ,   25
        );
    }    
    return 1;
}

Basically, what happens is that we check if a string argument was passed:


    if(args->arg_count==1
    && args->arg_type[0]==STRING_RESULT
    && args->args[0]!=NULL)

and if so, we calculate the number of bytes needed to hold that string


        argStringLength = strlen(args->args[0]) 
                        + 1
                        ;

(Mind you, the + 1 is there because strings are null-terminated (\0 in C))

Then, this information is used to copy the stretch of memory pointed to by the argment to the strecth of memory pointed to by our message:


        memcpy(
            message
        ,   args->args[0]
        ,   argStringLength>MYSQL_ERRMSG_SIZE
        ?   MYSQL_ERRMSG_SIZE
        :   argStringLength
        );

and here, the precaution has been taken to restrict the number of bytes written to message:


        ,   argStringLength>MYSQL_ERRMSG_SIZE
        ?   MYSQL_ERRMSG_SIZE
        :   argStringLength

Now there's also the possibility that the wrong number and types of argument were passed to our UDF. In that case, we just copy a default message to message:


        memcpy(
            message
        ,   "Unspecified error raised"
        ,   25
        );

The only thing we have to do now is return a 1 (one), and our raise_error_init() is ready:


    return 1;

`raise_error()`

For the actual function it isn't really important what we put in there, or what return type it will have. Remember, the function will never be executed because raise_error_init() already stopped before control is passed to raise_error(). Let's leave the details undiscussed right now. Let's just say that our raise_error() really does not do anything.


long long raise_error(
    UDF_INIT *initid
,   UDF_ARGS *args
,   char *is_null
,   char *error
){
    return 0;
}

so, here it is, doing nothing.

Module Definiton and Compilation

So far we just typed C code. But what we need is that this is all compiled into a DLL, in such a manner that other programs, like MySQL Server, will be able to invoke these functions. We must expose our functions using the module definition file. We created that as "MySQLRaiseErrorUDF.def". It's done like this:


LIBRARY MySQLRaiseErrorUDF
EXPORTS
    raise_error_init
    raise_error

That's it. Not too hard eh?
Now, hit the compile button and generate a DLL.

Deployment

I put the DLL in the bin directory under the MySQL installation directory. Then, on the command line i did:


mysql> create function raise_error returns int soname 'MySQLRaiseErrorUDF.dll';
Query OK, 0 rows affected (0.04 sec)
mysql> select raise_error();
ERROR:
Unspecified error raised

as you can see, an error is raised.


mysql> select raise_error('Whoops!');
ERROR:
Whoops!
mysql>

Getting rid of the function is also easy:


mysql> drop function raise_error;
Query OK, 0 rows affected (0.00 sec)

(You need to do that before you can overwrite a DLL that's linked to the server via a UDF).

Limitations

Right now there are some limitations to this approach.
The most important is that our argument really must be a constant value in order to return it:


mysql> set @err:='Whoops'
    -> ;
Query OK, 0 rows affected (0.00 sec)

mysql> select raise_error(@Whoops);
ERROR:
Unspecified error raised

This occurs because of the way the UDF_ARGS is initialized when it is passed to raise_error_init(). The argument values are only initialized for constant argument expressions.

Of course, we can get around it using prepared statements:


create procedure raise_application_error(
    errno int
,   errm  varchar(255)
) begin
    set @errno := errno;
    set @errm  := errm;
    set @stmt  := concat(
        'select raise_error'
    ,   '(''',errm,''')'
    );
    prepare stmt from @stmt;
    execute stmt;
    deallocate prepare stmt;
end;
//

mysql> set @msg := 'Whoops, my bad, sorry!'
    -> //
Query OK, 0 rows affected (0.00 sec)

mysql> call raise_application_error(-20000,@msg);
    -> //
ERROR:
Whoops, my bad, sorry!
mysql>

This example also proves we can raise the error from inside a procdure. By the way, it works for triggers too, I checked.

Now what?!

Well, I would really like to do it the other way around too: reporting the current error via an UDF. But from what I've seen so far, this is not possible.

Anyway,I'm really glad I got my first UDF up an running. It took me some time (especially the module definition was not clear from any background docs i know), but at least now I know some of the pitfalls etc.

Saturday, November 12, 2005

OSDBConsortium? Cool!

I just read this very interesting post on Kaj Arnö's blog.

Word is, the major database open-sourcerers ;-) (MySQL, PostgreSQL, SQLite, Firebird and others) might start a consortium, and collaborate more where it is convenient.

I find this very interesting. If it works out, it may be a new milestone in professionalism and standardistation, and help out people choosing the right product.

Already, a link is given for a putative website. Right now, it's nothing, but I'll be putting a link up in my sidebar as soon as there's more news.

Friday, November 11, 2005

SQL 2003 drafts for download

Funny how seem to find stuff by just looking out for it...(duh!) like these drafts of the 2003 version of the sql standard.

Wednesday, November 09, 2005

Download ISO 9075:1999.....Again!

Yep, I just spotted another location where you can download most of the 1999 version of the SQL Standard.

It's right here: http://www.cse.iitb.ac.in/dbms/Data/Papers-Other/SQL1999/.

As far as I can see, here the 3rd part is missing too, like in the location I blogged about before.

Not that I would encourage anyone to illegally download it, of course. I mean, altough it's a standard, I really think you should pay money to ISO and IEC, even though it´s a teeny weeny outdated.

Tuesday, November 08, 2005

Powerful MySQL Aggregate Functions

When I started exploring the MySQL SQL Dialect, one of the first things that struck me was the quite support for some quite exotic aggregate functions. Some of the are really useful too, and frankly, I'm at a loss why products like Microsoft SQL Server and Oracle don't support them too.

Just for those who don't know what I'm talking about, here are some examples of my favourites, based on the sakila sample database.

`COUNT(DISTINCT expr1[,expr2,...,exprN])`

Most RDBMS-es support COUNT(DISTINCT expr1). That returns the number of distinct values among all occurrences of expr1. MySQL takes this one step further, and allows you to count the number of distinct combinations of values.
For example, to count the number of film category/actor combinations, we can just do:


SELECT     count(DISTINCT f.category_id, fa.actor_id)
FROM       film f
INNER JOIN film_actor fa
ON         f.film_id = fa.film_id

The only way to achieve this result with the traditional version of COUNT(DISTINCT expr1) is to concatenate all the different expressions into on big expression. Believe me, it's a hassle, because you can't just concatenate. Just consider these category_id / film_id combinations: 1 - 22 and 12 - 2. So, the concatenation approach only works when you use some kind of separator...Which is not a solution, but just another hassle, because you have to think of a separator that does not appear within the data we're concatenating.

`GROUP_CONCAT([DISTINCT] expr1 [ORDER BY expr1[,..,exprN] [SEPARATOR expr1])`

Group_Concat performs string concatenation of the occurrences of expr1 (or distinct occurrences, if applicable). You can sort the occurrences that make up the aggregate inline using a normal ORDER BY syntax. Last but not least, you can separate the occurrences with a particular separator (the default is a comma).
For example, it allows us to write a query that shows us the film title and the list of actors, and all on the same line:


SELECT     f.title
,          group_concat(a.last_name)
FROM       film f
INNER JOIN film_actor fa
ON         f.film_id = fa.film_id
INNER JOIN actor a
ON         fa.actor_id = a.actor_id
GROUP BY   f.title

This really is an extraordinarily and remarkably powerful feature, which is very useful!
In other rdbms-es, you should either do this on the client side or write your own function to do this, and by the looks of it, quite a bunch of people think they have to do this in MySQL too (see: http://forums.mysql.com/read.php?98,53349,53363, http://forums.mysql.com/read.php?102,53112,53262). Such a function would use cursor traversal to loop throuh the list of actors given a particular film, and inside the loop, ordinary string concatenation would be used to deliver the final result.
This wouldnt be so bad if you'd only have to write that function just once. Of course, you can't, not in a straightforward way anyhow because the cursor is probably different for each case where you'd need it. Besides, this approach will usually perform quite poorly, especially when you'd use it for something like the film / actors example, becuase the cursor has to be opened for each film record.

So, here's my big thumbs up for the clever guy or gall that came up with this feature. Thank you!

Saturday, November 05, 2005

MySQL 5: Prepared statement syntax and Dynamic SQL

Every now and then, people are puzzled by the precise status and extent of MySQL support for dynamic SQL.

Statement Handling

MySQL support the prepared statement syntax. For the better part, a prepared statement is much like a 'normal', immediate statement. The main difference is seen in the way the statement is processed by the server.

Immediate Statements

When an immediate statement is issued, it is processed directly. Processing comprises the following steps:

Parsing: lexical and syntactic analysis of the statement

Planning: optimizer devises a strategy to realise the required result or action, the execution plan or query plan

Execution: retrieval/seeking, writing and reading of data and, if applicable, the construction of a resultset

After these steps, the server responds to the request by sending the client a resultset (if applicable), or an acknowledgement that the statement was executed. Of course, all these actions are performed in concert, and the client is not aware of these different steps taking place. This becomes clear when typing the following statement into the MySQL command line client tool:


mysql> select count(*) from information_schema.schemata;

+----------+
| count(*) |
+----------+
|        6 |
+----------+
1 row in set (0.10 sec)

Immediately (well, almost) the request to select the number of schemata is responded to by returning the resultset.

Prepared Statements

A prepared statement is initiated by the PREPARE statement. A preparation for a query equivalent to previous one could look like this:


mysql> prepare stmt from 
    -> 'select count(*) from information_schema.schemata';
Query OK, 0 rows affected (0.00 sec)
Statement prepared

This time, we didn't get a resultset.
The PREPARE statement instructs the server to parse the query, and possibly, to devise the execution plan. PREPARE associates an identifier with the statement, stmt, wich acts as a handle to refer to the statement and the corresponding execution plan.

Actual execution is postponed until called for by the EXECUTE statement, using the handle to identify the prepared statment to execute.


mysql> execute stmt;
+----------+
| count(*) |
+----------+
|        6 |
+----------+
1 row in set (0.00 sec)

Which is the same result as we got when we issued the immediate statement.

This seems like a overly complex way to do what we could also do with just one statement. It is - until we execute it again:


mysql> execute stmt;
+----------+
| count(*) |
+----------+
|        6 |
+----------+
1 row in set (0.00 sec)

....and again and again and again.

This actually touches upon the major purpose of the prepared statement concept: when a statement is to be repeatedly executed, a prepared statement is potentially more efficient. Because only the execution step needs to be repeated, there is less time wasted on parsing and building the query plan each time the result is required.
Increased efficiency is even more evident for statements that do not return a resultset (such as INSERT, UPDATE and DELETE) because the actions needed to construct a resultset are generally more timeconsuming than parsing and creating an execution plan.

Now, we need to elaborate just a little on this efficiency argument. In the Reference manual, you will bump in to this phrase pretty quickly:

MySQL 5.0 provides support for server-side prepared statements. This...takes advantage of the efficient client/server binary protocol...provided that you use an appropriate client programming interface. Candidate interfaces include the MySQL C API client library (for C programs), MySQL Connector/J (for Java programs), and MySQL Connector/NET.

An alternative SQL interface to prepared statements is available. This interface is not as efficient as using...a prepared statement API...

A little further on, the manual explains that the prepared statement syntax is available from within the SQL language primarily for development purposes; NOT to gain efficiency.

(I did some tests that suggest that the SQL prepared statement syntax is slower than immediate statements, but I don't know if this has to do with the query cache. I used INSERT statements BTW)

Using Parameters

A very powerful feature of prepared statments is the possibility to bind parameters to it. Parameters are specified by writing special ? placholders inside the sql statement that is to be prepared:


mysql> prepare stmt from 
    -> 'select count(*) from information_schema.schemata where schema_name = ?'
;
Query OK, 0 rows affected (0.00 sec)
Statement prepared

When EXECUTE-ing the statement, these placeholders must be bound to user variables with the USING syntax:


mysql> set @schema := 'test';
Query OK, 0 rows affected (0.00 sec)

mysql> execute stmt using @schema;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

So, even though the statement has been prepared, we can still enjoy the flexibility of controlling the query result.

One thing to keep in mind is that parameters are not implemented using simple string substitution. For example, the placeholder in the previous example is not quoted inside the statement. It merely provides a slot for a value, and the binding process takes care of transferring that value to the slot inside the statement. Quotes are merely syntactic methods to distinguish a string from the surrounding code. Because the binding process is way beyond the level of parsing, it does not make sense to use quotes.

You can't just put a placeholder wherever you like. Parameter placeholders can appear only in those places inside the statement where you would normally expect an expression. In particular, you cannot use parameter placeholders to parameterize identifiers or entire statement structures. So, the following attempt fails with a syntax error, because the parameter placeholder appears where you would normally put an identifier:


mysql> prepare stmt from 'create table ? (id int unsigned)';
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your
MySQL server version for the right syntax to use near '? (id int unsigned)' at line 1

However, is is not impossible to paramterize identifiers using the prepared statement syntax. I'll show that in a bit. It's just that you can't do it using parameters, because parameters are just a special cases of expressions, like column references, literals, etc.

Multiple Parameters

We are not restricted to just one parameter, we can use several:


mysql> prepare stmt from 
    -> 'select count(*) 
    -> from information_schema.schemata 
    -> where schema_name = ? or schema_name = ?'
;
Query OK, 0 rows affected (0.00 sec)
Statement prepared
mysql> execute stmt 
    -> using @schema1,@schema2
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

Parameter binding occurs in a positional manner.
Each ? placeholder must be matched bij exactly one user variable in the USING clause, where the first placeholder matches the first user variable, the second placeholder matches the second user variable and so on.

You really must match each placeholder by exactly one user variable, or else you will encounter an error:


1210 (HY000): Incorrect arguments to EXECUTE

Dynamic SQL

For no particular reason, PREPARE accepts either a string literal, or a user-defined variable to define the statement to prepare. It would've been just as conceivable to accept just a statement, like so:


mysql> prepare stmt from 
    -> select count(*) from information_schema.schemata;

But no, this does not work. This just results in a syntax error.

Anyway, we already saw how PREPARE accepts a statement in the form of a string literal. It's pretty much the same for a global user variable:


mysql> set @sql_text := 'select count(*) from information_schema.schemata';

mysql> prepare stmt from 
    -> @sql_text

Query OK, 0 rows affected (0.00 sec)
Statement prepared

Because we can freely assign whatever value we want to @sql_text user variable, we can use this as a device to employ dynamic SQL.

Now we know how to dynamically manipulate our identifiers too: we just manipulate our string before assigning it to the user variable, like so:


mysql> set @table_name := 'mytable';
Query OK, 0 rows affected (0.02 sec)

mysql> set @sql_text:=concat('create table ',@table_name,'(id int unsigned)');
Query OK, 0 rows affected (0.00 sec)

Cleaning up

There's one extra step in the process of using prepared statements that I did not yet mention. That's cleaning up (I guess that tells you something about me, right?). Once you've prepared a statement, the handle and the associated objects on the server's side will remain to exist until the client's session is over. This means that client will keep some of the server's resources occupied. Therefore, it's good practice to clean up afterwards. When you're sure you're done working with the statement, you should DEALLOCATE it:


mysql> deallocate prepare stmt;

This just tells the server to get rid of all the resources associated with the statement handle, and to forget about the statement handle as well.

Some links

Although it does not seem that the prepared statement syntax was designed for it (word is there will be true dynamic sql support in MySQL 5.1 in the form of the EXECUTE IMMEDIATE syntax), it sure does the job. If you want to read some articles on actually using this feature for a real purpose, check out these links:

"The Wizard revisited": By Beat Vontobel
"The power of dynamic queries": by Giuseppe Maxia
"Scheduling Stored Procedure Execution": by yours truly

Tuesday, November 01, 2005

Some more thoughts on Crosstabs

I noticed my previous blog on creating crosstabs in MySQL has generated some attention. It made me find out that I didn't do my homework as well as I would've liked.

Whilst writing it, I googled a bit for solutions, and I did stumble into a PERL solution written by Giuseppe Maxia. However, I totally missed Giuseppe's excellent article that is referred to by Beat Vontobel's blog entry describing a stored procedure that solves this problem for the general case.

Thank you guys, I learnt a lot from those!

Reading all that made me rethink the problem. In doing so, I thought of a little syntax that I would like to use to define a crosstab query. I don't intend to do anything with this right away, but I'm just curious what other people think. So, applied to the example from my earlier blog entry, I would like to write this:


select     f.category_id
,          f.film_id
,          f.title
,          i.store_id
,          r.inventory_id   
from       film f
inner join inventory i
on         f.film_id = i.film_id
left join  rental r
on         i.inventory_id = r.inventory_id
RENDER AS CROSSTAB
ROWS
    LEVEL(category_id) as category
,   LEVEL(film_id,title) as film
COLUMNS
    LEVEL(store_id) as store
CELLS (category,store)
    count(distinct film_id) as count_films
,   count(inventory_id) as count_rentals
CELLS (film,store)
    count(inventory_id) as count_rentals

As for the semantics, ROWS is like a operator that generates rows by applying some sort of 'cascading GROUP BY' to the rows from the SELECT expression using the LEVEL specifications, like this:

category LEVEL: GROUP BY category_id
film LEVEL: GROUP BY category_id, film_id, film_title

The COLUMNS operator is like ROWS, but it generates columns that slice up the rows generated by ROWS.

Finally, CELLS generates a group of cells at the intersections of the row/column pairs of LEVEL specifications between the parentheses. In the intersections, the values appear of the expressions in the comma separated list following the intersection specification.

This would generate a crosstab with this structure:


                                  +-------------------------------------------------------+
                                  |                         store                         |
                                  +---------------------------+---------------------------+
                                  |    1                      |    2                      |
--------+-------------------------+------------+--------------+------------+--------------+
category|film                     |            |              |            |              |
--------+-------+-----------------+count_films |count_rentals |count_films |count_rentals |
        |film_id|title            |            |              |            |              |
--------+-------+-----------------+------------+--------------+------------+--------------+
       1|                         |     29     |     103      |     35     |     125      |
--------+-------------------------+---------------------------+------------+--------------+
       1|     19|AMADEUS HOLY     |            |      13      |            |       0      |
        .       .                 .            .              .            .              .
        ...more rows here...            .              .            .              .
        .       .                 .            .              .            .              .
       1|    991|WORST BANGER     |            |       8      |            |      14      |
--------+-------+-----------------+------------+--------------+------------+--------------+
        .       .                 .            .              .            .              .
 some more rows are here   .            .              .            .              .
        .       .                 .            .              .            .              .

(This is only the structure, I have to think a bit how this would have to be rendered in a resultset)

Monday, October 31, 2005

Creating Crosstabs in MySQL

In some cases, it would be very nice to be able to get a crosstab result from a query. You can't really do that in MySQL, at least not directly; as matter of fact, you can't do it in most rdbms-es (Interestingly, MS Access does have a language construct for crosstabs: TRANSFORM...PIVOT..)
If you really need a crosstab, here are some techniques that might be of use. I'll illustrate them using the updated version of the sakila database. I really think the following stuff will be a lot more fun if you download and install this database first. If you think that's too much of a hassle, then just open the diagram of this database in a new window to help you understand queries.

Step 1: decide what objects you want to know something about

Let's say we want to know something about films. We will need something to recognize a film, and we'll be choosing the title to do that:


select  f.title
from    films   f

This will just return a list of all films.

Step 2: decide what you want to know about those objects

Well, suppose we want to know how well our films are being rented, we could do:


select     f.title
,          count(r.inventory_id) rentals
from       film             f
inner join inventory        i
on         f.film_id      = i.film_id
left join  rental           r
on         i.inventory_id = r.inventory_id
group by   f.title

This query takes all films, and finds the stores that have the film in their inventory. Because we do this using an inner join, we exclude the films that aren't in any store's inventory. This is good: by definition, there can't be any rentals for those films, so it does not make sense to include them in the result.

The inventories are left joined to the rentals. This ensures we won't exclude films that have never been rented. In this case, we definitely want to include these films. If you were a DVD rental store manager, wouldn't you want to know about those films that are for rent but have never been rented before?

Step 3: decide how you want to categorize your data

Now, we do know how many times a particular film has been rented, wich is good. But there's more to a rental than a film ending up to be rented. For starters, a film isn't "just" rented: it is being rented by a particular customer. Second, the customer doesn't "just" rent the film: it's rented from a particular store. The different customers that can rent films, and the different stores where they rent film from offer a perspective that let's us 'slice' out a subset of all the rentals.

For this example, let's stick to the stores to categorize our data.

Given our previous query, it's pretty easy to get our hands on the actual information. We just need to add the store_id to the select and group by list from the previous query:


select     f.title
,          i.store_id
,          count(r.inventory_id) rentals
from       film             f
inner join inventory        i
on         f.film_id      = i.film_id
left join  rental           r
on         i.inventory_id = r.inventory_id
group by   f.title
,          i.store_id

This will give us the number of times a film was rented per store.

Hey, you promised us a crosstab, now pay up!

But wait! This is not a crosstab, this is just a tablular resultset! That's right of course. But we know all the information is there, so what's the matter then?

Actually, a crosstab is just a particular presentation of data. To present a crosstab of this resultset, we could have the films along it's vertical axis (in the direction of the rows), the stores along it's horizontal axis (in the direction of the columns). (We could just as well turn it the other way around, but because we have so much movies, it's probably more practical to have them placed along the vertical axis.) The number of rentals would then be at the row-column intersections or the cells.

Now, as for having the films along the vertical axis - that's pretty much what our queries have been doing all along. A SELECT expression are all about dynamically generating resultset rows according to data in database tables. We can be quite confident that this will not pose a problemn.

Having the stores along the horizontal axis is an entirely different story. Columns in the resultset of a SELECT expression correspond to the individual select list expressions. We sure do control generation of resultset columns ourselves: the expression that we type is evaluated to data appearing in the column, and if we give that expression an alias, that will be used as the column heading. However, we can only exert this control in a static manner, by typing them as part of the statement. What we would like to have is a kind of device that would generate columns dynamically, driven by the data (in our case: the rows from the store table).
Actually, we need to generate columns quite like the way a SELECT expression generates rows.

Let's accept this situation - for the moment.

Step 4: Coding the horizontal axis

To find a way out, let's just pretend our collection of stores is not dynamic, (it doesn't change that often anyway) and try to achieve only the layout of a crosstab. This would involve us typing a separate SELECT list expression for each store, thus statically creating a column definition for each store.
So let's inspect the store table and see for what stores we would have to type an extra column expression:


select  *
from    store

gives us:


+----------+---------+------------+
| store_id | manager | address_id |
+----------+---------+------------+
|        1 |       1 |          1 |
|        2 |       2 |          2 |
+----------+---------+------------+

That's not too bad. Only two stores, wich means we'll only have to type two expressions. Let's concentrate on these expressions.
In our previous queries, we used:


count(r.inventory_id)

to calculate the number of rentals. Now, we still want to calculate the number of rentals, but for a particular column, we only want to count those rentals that occurred in the store corresponding to the column. We need something like a conditional version of count. So, suppose we would be writing the expression for the store with store_id = 1, we could try something like this:


count(
    if(i.store_id=1
    ,   r.inventory_id
    ,   null
    )
)

But wait! Something's wrong here. What if a particular film would be in the Inventory of the store with store_id = 2, but not in the inventory of Store with store_id = 1? Well, the if expression would always return nnull, because no records will be found for the store with store_id = 1. Count will return the count of non-null values, but zero if there are only null values. So, this expression will return zero for the store with store_id = 1.

This is bad news! It means we cannot distinguish between a film that does not exist in the inventory of a particular store and a film that does exist in the inventory but isn't rented at all. Lucky for us, we can emulate count using sum, like this:


sum(
    if(i.store_id=1
    ,   1
    ,   null
    )
)

The if will return 1 for each record where the store_id = 1, and null in other cases.
SUM will faithfully sum these 1's while ignoring the null's, effectively yielding the count.
If sum is fed with only null arguments, sum will return null. This is good, because the result will now let us distinguish between having a film that happens not to exists in the inventory of a particular store as opposed to having the film in the inventroy without ever renting it.

So, the query looks like this:


select     f.title
,          sum(
               if(i.store_id=1
               ,   1
               ,   null
               )
           )                "store_id=1"
,          sum(
               if(i.store_id=2
               ,   1
               ,   null
               )
           )                "store_id=2"
from       film             f
inner join inventory        i
on         f.film_id      = i.film_id
left join  rental           r
on         i.inventory_id = r.inventory_id
group by   f.title

So, there's our crosstab. It wasn't hard at all, just a bit tedious.

Step 4 - continued: Generating the statement

Up untill now, we've put up with the nuisance of 'generating' columns by typing them. Now it's time to stop putting up with that. We've hand-coded a working Example, and it has taken us enough time. We do not want to waste more time on the same problem in the future. And really, we're bound to encounter this problem again. Sooner or later, the data in the store table will change, probably invalidating our crosstab query.

But hey, it's not so bad, is it? We know exactly wich parts of the code correspond to occurrences in the store table, right? And we know what parts of the code are independant of all that. So, let's try and devise something that will generate all that code for us.


    select   concat(
                 'select   f.title','\n'
             ,   group_concat(
                     concat(
                         ',          sum(','\n'
                     ,   '               if(i.store_id=',s.store_id,'\n'
                     ,   '               ,   1','\n'
                     ,   '               ,   null','\n'
                     ,   '               )\n'
                     ,   '           )'
                     ,   '                "store_id=',s.store_id,'"\n'
                     )
                     order by s.store_id
                     separator ''
                 )
             ,   'from       film             f','\n'
             ,   'inner join inventory        i','\n'
             ,   'on         f.film_id      = i.film_id','\n'
             ,   'left join  rental           r','\n'
             ,   'on         i.inventory_id = r.inventory_id','\n'
             ,   'group by   f.title','\n'
             ) statement
    from     store s

This query returns just one row. The one column will contain a string value that is exactly equal to the statement text of our previous hand-coded crosstab query. Compare the hand-coded crosstab-query SELECT expression with the SELECT list of the generator. As you can see, the generator does not much more than select the concatenated string literals that make up the crosstab query. The only variable element is the GROUP_CONCAT call wich aggregates all the rows from the store table in the FROM list of the generator, yielding the collection of SELECT list expressions that makes up the horizontal axis of the crosstab, and of course the data appearing in the cells.

(By the way, GROUP_CONCAT is a commonly recurring element in these generators. I just love this feature, wich really seems to be unique to MySQL. So much for open source products not being innovative)

Now that we can generate the crosstab query SELECT expression, we will never have to worry about data changing in the store table. We can simply run this query to obtain an updated query for the crosstab.

Step 4 - continued: wrapping generation and execution in a stored procedure

Generating the query is certainly an improvement. By generating the statement, we created a single point of definition that handles the duplication of code (the Column expression) in a structured, data driven way. That saves us time, and more important, it will always be correct. It will never contain syntax errors, and it will always cost the same amount of time to create an updated version of the Statement should the data in the store stable be modified.

It's still a bit of a nuisance that we can't just execute the crosstab query with a single statement though.
Right now, we always need at least two statements: one to generate the statement, and one to execute the generated statement.
We would really like to do that in one statement.

Lucky for us, we can actually do that.
We can create a stored procedure that performs these steps in the right order:

CREATE PROCEDURE `sakilar2`.`p_film_rentals_per_store`()
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
begin
    select   concat(
                 'select   f.title','\n'
             ,   group_concat(
                     concat(
                         ',          sum(','\n'
                     ,   '               if(i.store_id=',s.store_id,'\n'
                     ,   '               ,   1','\n'
                     ,   '               ,   null','\n'
                     ,   '               )\n'
                     ,   '           )'
                     ,   '                "store_id=',s.store_id,'"\n'
                     )
                     separator ''
                 )
             ,   'from       film             f','\n'
             ,   'inner join inventory        i','\n'
             ,   'on         f.film_id      = i.film_id','\n'
             ,   'left join  rental           r','\n'
             ,   'on         i.inventory_id = r.inventory_id','\n'
             ,   'group by   f.title','\n'
             ) statement
    into     @film_rentals_per_store_sql
    from     store s
    order by s.store_id
    ;
    prepare  film_rentals_per_store
    from     @film_rentals_per_store_sql
    ;
    execute  film_rentals_per_store
    ;
    deallocate 
    prepare  film_rentals_per_store;
end

Inside the procedure, we first generate the sql that defines the query, and then use the prepared statement syntax to execute the generated statement. It's not as if we really need statement preparation; we just need some dynamic SQL. The prepared statement syntax happens to be the only construct that supports this in MySQL right now.

By now, we've achieved pretty much all we wanted. We can get a crosstab query result by simply calling the procedure. I'ts a bit of a pity we cannot setup a more generalized approach, but I'll settle with this for a while.