Tuesday, November 01, 2005

Some more thoughts on Crosstabs

I noticed my previous blog on creating crosstabs in MySQL has generated some attention. It made me find out that I didn't do my homework as well as I would've liked.

Whilst writing it, I googled a bit for solutions, and I did stumble into a PERL solution written by Giuseppe Maxia. However, I totally missed Giuseppe's excellent article that is referred to by Beat Vontobel's blog entry describing a stored procedure that solves this problem for the general case.

Thank you guys, I learnt a lot from those!

Reading all that made me rethink the problem. In doing so, I thought of a little syntax that I would like to use to define a crosstab query. I don't intend to do anything with this right away, but I'm just curious what other people think. So, applied to the example from my earlier blog entry, I would like to write this:


select f.category_id
, f.film_id
, f.title
, i.store_id
, r.inventory_id
from film f
inner join inventory i
on f.film_id = i.film_id
left join rental r
on i.inventory_id = r.inventory_id
RENDER AS CROSSTAB
ROWS
LEVEL(category_id) as category
, LEVEL(film_id,title) as film
COLUMNS
LEVEL(store_id) as store
CELLS (category,store)
count(distinct film_id) as count_films
, count(inventory_id) as count_rentals
CELLS (film,store)
count(inventory_id) as count_rentals


As for the semantics, ROWS is like a operator that generates rows by applying some sort of 'cascading GROUP BY' to the rows from the SELECT expression using the LEVEL specifications, like this:

category LEVEL: GROUP BY category_id
film LEVEL: GROUP BY category_id, film_id, film_title

The COLUMNS operator is like ROWS, but it generates columns that slice up the rows generated by ROWS.

Finally, CELLS generates a group of cells at the intersections of the row/column pairs of LEVEL specifications between the parentheses. In the intersections, the values appear of the expressions in the comma separated list following the intersection specification.

This would generate a crosstab with this structure:


+-------------------------------------------------------+
| store |
+---------------------------+---------------------------+
| 1 | 2 |
--------+-------------------------+------------+--------------+------------+--------------+
category|film | | | | |
--------+-------+-----------------+count_films |count_rentals |count_films |count_rentals |
|film_id|title | | | | |
--------+-------+-----------------+------------+--------------+------------+--------------+
1| | 29 | 103 | 35 | 125 |
--------+-------------------------+---------------------------+------------+--------------+
1| 19|AMADEUS HOLY | | 13 | | 0 |
. . . . . . .
...more rows here... . . . .
. . . . . . .
1| 991|WORST BANGER | | 8 | | 14 |
--------+-------+-----------------+------------+--------------+------------+--------------+
. . . . . . .
some more rows are here . . . . .
. . . . . . .



(This is only the structure, I have to think a bit how this would have to be rendered in a resultset)

4 comments:

Anonymous said...

can re-upload your previous database example or send it to me so that I may try it too...


my e-mail address is samueln@onpoint-it.com

kind regards
thank you

rpbouman said...

Hi!

The database is Sakila. The SQL sprung from imagination - MySQL does not understand this syntax, nor does any other RDBMS.

If you want to create crosstabs, look here:

The Wizard revisited: Dynamic Crosstabs using MySQL Stored Procedures By Beat Vontobel

MySQL Proxy tutorial - Crosstabs By Giuseppe Maxia

Or check out an OLAP tool like Mondrian

Unknown said...

Hi Roland,

Any luck yet with being able to render the result set?

Deepak

rpbouman said...

Hi Deepak,

not sure what you are talking about. Are you referring to the other crosstab post, http://rpbouman.blogspot.com/2005/10/creating-crosstabs-in-mysql.html?

DuckDB bag of tricks: Processing PGN chess games with DuckDB - Rolling up each game's lines into a single game row (6/6)

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is the sixth installment of a s...