Tuesday, November 08, 2005

Powerful MySQL Aggregate Functions

When I started exploring the MySQL SQL Dialect, one of the first things that struck me was the quite support for some quite exotic aggregate functions. Some of the are really useful too, and frankly, I'm at a loss why products like Microsoft SQL Server and Oracle don't support them too.

Just for those who don't know what I'm talking about, here are some examples of my favourites, based on the sakila sample database.

COUNT(DISTINCT expr1[,expr2,...,exprN])


Most RDBMS-es support COUNT(DISTINCT expr1). That returns the number of distinct values among all occurrences of expr1. MySQL takes this one step further, and allows you to count the number of distinct combinations of values.
For example, to count the number of film category/actor combinations, we can just do:

SELECT count(DISTINCT f.category_id, fa.actor_id)
FROM film f
INNER JOIN film_actor fa
ON f.film_id = fa.film_id

The only way to achieve this result with the traditional version of COUNT(DISTINCT expr1) is to concatenate all the different expressions into on big expression. Believe me, it's a hassle, because you can't just concatenate. Just consider these category_id / film_id combinations: 1 - 22 and 12 - 2. So, the concatenation approach only works when you use some kind of separator...Which is not a solution, but just another hassle, because you have to think of a separator that does not appear within the data we're concatenating.

GROUP_CONCAT([DISTINCT] expr1 [ORDER BY expr1[,..,exprN] [SEPARATOR expr1])


Group_Concat performs string concatenation of the occurrences of expr1 (or distinct occurrences, if applicable). You can sort the occurrences that make up the aggregate inline using a normal ORDER BY syntax. Last but not least, you can separate the occurrences with a particular separator (the default is a comma).
For example, it allows us to write a query that shows us the film title and the list of actors, and all on the same line:

SELECT f.title
, group_concat(a.last_name)
FROM film f
INNER JOIN film_actor fa
ON f.film_id = fa.film_id
INNER JOIN actor a
ON fa.actor_id = a.actor_id
GROUP BY f.title

This really is an extraordinarily and remarkably powerful feature, which is very useful!
In other rdbms-es, you should either do this on the client side or write your own function to do this, and by the looks of it, quite a bunch of people think they have to do this in MySQL too (see: http://forums.mysql.com/read.php?98,53349,53363, http://forums.mysql.com/read.php?102,53112,53262). Such a function would use cursor traversal to loop throuh the list of actors given a particular film, and inside the loop, ordinary string concatenation would be used to deliver the final result.
This wouldnt be so bad if you'd only have to write that function just once. Of course, you can't, not in a straightforward way anyhow because the cursor is probably different for each case where you'd need it. Besides, this approach will usually perform quite poorly, especially when you'd use it for something like the film / actors example, becuase the cursor has to be opened for each film record.

So, here's my big thumbs up for the clever guy or gall that came up with this feature. Thank you!

10 comments:

Anonymous said...

I believe this particular feature would be one of Felix's favorite functions, wouldn't you agree? :)

rpbouman said...

Is it? I don't think a spreadsheet like MS Excel has anything like it, so it could very well be :)

Dexter Wintergreen said...

thanks Roland for the pointer to group_concat, it provides exactly the flat view I need for presentation layer reporting and charting. Earlier I created nested cursors to achieve this, not really happy with soln and after some strong coffee I find your blog entry. I should have searched the group by docs better...
regardless, bedankt!

Anonymous said...

thanks for this short but usefull information ... exactly what I needed to process some many to many relation ships

Unknown said...

I cannot thank you enough. You've saved me an extraordinary amount of time with group_concat. Truly, truly grateful!

stefano said...

Hi.

I used similar aggregate functions in Oracle 8.1.7. Can't remember the exact function name, though

Anonymous said...

And that's just for built-in functions. You can actually create your own functions and call them as select fields. The only shameful limitation about it is the fact that MySQL functions do not support dynamic SQL.

Anonymous said...

And that's just for built-in functions. You can actually create your own functions and call them as select fields. The only shameful limitation about it is the fact that MySQL functions do not support dynamic SQL.

Unknown said...

can we create dynamic columns as how we use in maria db like wise the same using mysql commands,is that possi??

rpbouman said...

Hindhuja,

I don't see how dynamic columns have anything todo with this articles, but the answer is no, MySQL does not support dynamic multi-valued columns.

SAP HANA Trick: DISTINCT STRING_AGG

Nowadays, many SQL implementations offer some form of aggregate string concatenation function. Being an aggregate function, it has the effe...