I never had the pleasure of meeting Alex in person, but his articles are always interesting and of high quality, and the SQL wizardry he pulls off is downright inspiring. I really feel humbled by the creativity of some of his solutions and his apparent experience with multiple RDBMS products.
Alex' most recent post is about aggregation, and finding a top 3 based on the aggregate:
In MySQL I have a table calledmeanings
with three columns:word
,meaning
,person
.word
has 16 possible values,meaning
has 26. A person assigns one or more meanings to each word. In the sample above, person 1 assigned two meanings to word 2. There will be thousands of persons. I need to find the top three meanings for each of the 16 words, with their frequencies. Is it possible to solve this with a single MySQL query?
Alex presents a solution that uses
GROUP_CONCAT
basically as a poor man's windowing function, a technique I have described on several occasions in the past for ranking, median and percentile solutions in MySQL.Now, Alex' solution is very clever and there are some elements that I think are very creative. That said, I think his solution can be improved still. Normally I wouldn't write a blog about it, and simply leave a comment on his blog, but his blog supports comments only for general articles, which is why I present it here:
SELECT word
, CONCAT(
SUBSTRING_INDEX(
GROUP_CONCAT(meaning ORDER BY num DESC), ',', 1
)
, ' ('
, SUBSTRING_INDEX(
GROUP_CONCAT(num ORDER BY num DESC), ',', 1
) / SUM(num) * 100
, '%)'
) rank1
, CONCAT(
SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT(meaning ORDER BY num DESC), ',', 2
), ',', -1
)
, ' ('
, SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT(num ORDER BY num DESC), ',', 2
), ',', -1
) / SUM(num) * 100
, '%)'
) rank2
, CONCAT(
SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT(meaning ORDER BY num DESC), ',', 3
), ',', -1)
, ' ('
, SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT(num ORDER BY num DESC), ',', 3
), ',', -1
) / SUM(num) * 100
, '%)'
) rank3
FROM (
SELECT word, meaning, COUNT(*) num
FROM t_meaning m
GROUP BY word,meaning
) a
GROUP BY word
This gives me output like this:
+------+--------------+--------------+--------------+
| word | rank1 | rank2 | rank3 |
+------+--------------+--------------+--------------+
| 1 | 16 (3.9728%) | 17 (3.9648%) | 12 (3.9632%) |
| 2 | 9 (3.9792%) | 10 (3.9632%) | 20 (3.9328%) |
| 3 | 20 (3.9744%) | 13 (3.968%) | 1 (3.9648%) |
| 4 | 26 (3.952%) | 7 (3.9456%) | 17 (3.9424%) |
| 5 | 9 (4.008%) | 21 (3.9824%) | 20 (3.936%) |
| 6 | 19 (3.9504%) | 10 (3.9488%) | 13 (3.9408%) |
| 7 | 23 (4.0464%) | 12 (3.976%) | 19 (3.9648%) |
| 8 | 23 (4.0112%) | 3 (4.0096%) | 8 (3.9328%) |
| 9 | 10 (4.016%) | 19 (3.984%) | 15 (3.9616%) |
| 10 | 10 (4.0304%) | 14 (3.9344%) | 11 (3.9312%) |
| 11 | 16 (3.9584%) | 6 (3.9296%) | 19 (3.9232%) |
| 12 | 7 (3.9968%) | 1 (3.9392%) | 26 (3.9264%) |
| 13 | 8 (4.048%) | 25 (3.9712%) | 23 (3.9616%) |
| 14 | 16 (3.9936%) | 26 (3.9632%) | 4 (3.9536%) |
| 15 | 22 (4.0608%) | 12 (4.0048%) | 1 (3.9632%) |
| 16 | 14 (4.0032%) | 18 (3.9712%) | 4 (3.9488%) |
+------+--------------+--------------+--------------+
16 rows in set (0.63 sec)
On my laptop, my solution is about 30% faster than the one presented by Alex. Personally I think mine is easier to understand too, but that is a matter of taste.
Anyway, I'm just posting this to share my solution - I do not intend to downplay the one presented by Alex. Instead, I invite everyone interested in SQL, MySQL and PostgreSQL to keep an eye on Alex' blog as well as his excellent answers on Stackoverflow. He's an SQL jedi master in my book :)
Of course, if you have a better solution to crack this problem in MySQL, please leave a comment. I'd love to hear what other people are doing to cope with these kinds of queries.
2 comments:
Hi Roland,
Have just came upon this post.
A more general solution, allowing for top-n rows per group:
http://code.openark.org/blog/mysql/sql-selecting-top-n-records-per-group
Nice Article !
I have also work around this topic and created small alternative demo to find N record for each group in MySQL.
Please visit my article.
http://www.dbrnd.com/2015/08/find-top-n-records-for-each-group-in-mysql/
Post a Comment