Hi! It's been a while since I posted - I admit I'm struggling a bit to balance time and attention between the day job, writing a book, preparing my talks for the MySQL user's conference, and of course family life.
A month ago or so I read a couple of posts about using the Google Chart API to visualize database size. Although I personally would not consider using Google Charts (in its current form) for serious application monitoring, I am quite charmed by its ease of use and availability. Time to give it a try myself.
I inspected the PL/SQL code provided by Alex Gorbachev, and the MySQL code by Walter Heck, as well as the improved implementation by Ruturaj Vartak.
Although I applaud both Walter and Ruturaj's efforts in porting this code, I think their code can be improved still. In this short article I'd like to illustrate how a minor modification can double performance.
Code Analysis

Let's take a brief moment to analyze Walter's original code. I will cite a few fragments of it. (I have made some simplifications and removed some distractions to make the logic of the code as easy as possible to understand. If you find an error in my citation, please consider the possibility that it is my doing, not Walter's.)
His program takes the form of a MySQL stored function. It accepts a few parameters to configure the chart (chart type and size) and outputs a fully functional URL which can be used to retrieve a .png image that shows the size (in MB) of all databases managed by the MySQL server. This is the function signature:

CREATE FUNCTION FNC_GOOGRAPH_DB_SIZE(
    -- parameter list omitted from this citation
) RETURNS varchar(255)

The stored function is implemented by retrieving size metadata from the information_schema.TABLES table. The size of each table's data and indexes is added and summed per database. The results are traversed using a cursor loop.
Here's the cursor declaration:

DECLARE c_schema_sizes CURSOR FOR
SELECT t.table_schema
,      SUM(t.data_length + t.index_length) / 1024 / 1024
FROM   information_schema.tables t
GROUP  BY t.table_schema;

Note that the figure is immediately expressed in megabytes by dividing twice by 1024: divide by 1024 once to turn the raw bytes into kilobytes, then divide by 1024 again to turn the kilobytes into megabytes.
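To see this arithmetic in isolation, here's a standalone illustration (the byte count is just an example figure):

```sql
SELECT 1073741824 / 1024 / 1024; -- 1073741824 bytes = 1048576 KB = 1024 MB
```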
During iteration, a list of database names and a list of database sizes are built through string concatenation. These are required to supply the data series and labels to the chart.
Only the code for constructing the data series is shown here:
/* Get the percentage of the total size as the graph's data */Note that instead of making a list of actual sizes, a percentage is taken. Keep that in mind, we'll discuss this in more detail in the next section.
IF v_chart_data = '' THEN
SET v_chart_data =
ROUND(v_data_length_sum / v_data_length_total, 2) * 100;
SET v_chart_data = CONCAT(v_chart_data,',',
ROUND(v_data_length_sum / v_data_length_total, 2) * 100);
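To make the scaling concrete, here's a standalone calculation with made-up figures - a database of 120 MB out of a 480 MB total:

```sql
SELECT ROUND(120 / 480, 2) * 100; -- yields 25, i.e. 25% of the total size
```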
Finally, the actual URL is built, using the list of database sizes as the data series and the list of database names as data labels. Here's the URL construction code:
SET v_url = 'http://chart.apis.google.com/chart?';
SET v_url = CONCAT(v_url, 'cht=', p_chart_type);
SET v_url = CONCAT(v_url, '&chs=', p_width , 'x', p_height);
SET v_url = CONCAT(v_url, '&chtt=Database Sizes (MB)');
SET v_url = CONCAT(v_url, '&chl=', v_chart_labels);
SET v_url = CONCAT(v_url, '&chd=t:', v_chart_data);
SET v_url = CONCAT(v_url, '&chdl=', v_legend_labels);
About Google Chart Data Formats

We just mentioned that the data points are not simply concatenated - rather, the data size per database is expressed as a percentage of the total size of all databases. This is not just some arbitrary choice - it is in fact required by the Google Chart API. Here's a quote from the Overview of data formats:

Before you can create a chart you must encode your data into a form that is understood by the Chart API. Use one of the following formats:
- Text encoding uses a string of positive floating point numbers from zero to one hundred.

For this reason, the code divides everything by v_data_length_total and multiplies by 100. In our code analysis, we did not discuss v_data_length_total because it was not really essential to the logic of the program. This is how it is computed:
SELECT ROUND(SUM(t.data_length + t.index_length) / 1024 / 1024)
FROM   information_schema.tables t

Note that this query is very similar to the query used in the cursor declaration. In both cases, the information_schema.TABLES table is queried to calculate the combined size of table data and indexes. The main difference is the grouping: in the cursor declaration, a GROUP BY table_schema clause was used to calculate the size per database. Here, grouping is absent, so the size is calculated across all databases, i.e. the combined size of all tables and indexes in the entire server.
Text encoding with Data Scaling

Now, as it turns out, the text encoding used by this code comes in two flavours. The same documentation describes the second one:

Text encoding with data scaling uses a string of positive and negative floating point numbers in combination with a scaling parameter.

So, as an alternative to 'pre-scaling' the values in the data series to a 100-point scale, we can also let Google do the work, provided we pass this scale factor.
Because the code to calculate the scale factor is already present, we can simply remove the scaling computation inside the loop, and add the scale factor to the URL like so:

-- inside loop: remove percentage calculation:
IF v_chart_data = '' THEN
    SET v_chart_data = ROUND(v_data_length_sum, 2);
ELSE
    SET v_chart_data = CONCAT(v_chart_data, ',', ROUND(v_data_length_sum, 2));
END IF;

-- outside loop: add scaling parameter to URL:
SET v_url = CONCAT(v_url, '&chds=0,', v_data_length_total);

However, we can do better than that.
An alternative method to calculate the scaling factor

The problem with the current solution is that it has to do a separate query to obtain the total size. We already noticed that the two queries are quite similar, save for the grouping. Because we are already looping through all the results anyway, we can calculate the total ourselves, allowing us to avoid the separate query altogether.

We would need to add proper initialization for the v_data_length_total variable and add the following line inside the loop:

SET v_data_length_total := v_data_length_total + v_data_length_sum;

When comparing performance before and after this change, we observe a dramatic difference. On my laptop, the original function takes anywhere between 45 and 50 seconds. After incorporating these changes, the time is halved: the function takes 'only' 22 to 24 seconds.
The fact that the second solution is twice as fast (~23 seconds instead of ~46) is no coincidence: we are doing half the number of queries (1 instead of 2). Don't get me wrong - this is still very slow, but that comes down entirely to poor information_schema performance. In this particular case, there is not much more we can do to the code itself to gain performance.
More performance improvements?

I am quite convinced that the bottleneck in the database size chart function is the query on the information_schema.TABLES table. To be more exact, it's the fact that we're accessing the DATA_LENGTH and INDEX_LENGTH columns. Just watch this:
mysql> select count(*) from information_schema.tables;
+----------+
| count(*) |
+----------+
|     1576 |
+----------+
1 row in set (0.08 sec)

mysql> select count(data_length) from information_schema.tables;
+--------------------+
| count(data_length) |
+--------------------+
|               1563 |
+--------------------+
1 row in set (25.98 sec)

mysql> select count(index_length) from information_schema.tables;
+---------------------+
| count(index_length) |
+---------------------+
|                1563 |
+---------------------+
1 row in set (25.41 sec)

As you can see, simply accessing the DATA_LENGTH and INDEX_LENGTH columns causes the query to become slow.
Notice also that the time spent to execute this stand-alone query is about the same as it takes for the improved function to complete. Basically, this tells us the function spends all its time performing the query - timewise, the contribution of the remainder of the function, such as cursor traversal, creating the value lists and building the URL is simply negligible.
This means that we simply can't improve the performance of the database size chart function unless we can improve the performance of this query. Because we have no other general way of obtaining index and data size, this is the end of the line.
Other Improvements

Even though we probably can't improve the performance of Walter's function any further, I still think it's possible to improve the code.
I guess this is kind of a pet peeve of mine, but I dislike using MySQL cursors. There's a lot of syntax involved to set them up. Usually, they can be avoided anyway.
MySQL Cursors are also pretty slow. This is something you really start to notice when traversing tens of thousands of rows. Not that that really matters for building Google Chart URLs: Most likely, you will hit a limitation in either chart size or URL length, so you can't really gain a lot of performance by eliminating cursors for this particular purpose.
Still, I think that eliminating the cursor will help make the code less complex, so let's try anyway.
Building Lists: We just explained how the cursor was used to build various lists of values. As it happens, MySQL supports the GROUP_CONCAT() function, which was designed especially for that purpose. GROUP_CONCAT() is an aggregate function, just like COUNT() or SUM(). Like other aggregate functions, GROUP_CONCAT() produces a single summary result for a group of rows. It does so by first concatenating its arguments for each row in the group, and then concatenating the per-row results for the entire group of rows, optionally separating the row results with a separator.
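For example, this query (a minimal sketch, not part of Walter's code) produces one pipe-separated list of all database names in a single row:

```sql
SELECT GROUP_CONCAT(DISTINCT table_schema ORDER BY table_schema SEPARATOR '|')
FROM   information_schema.tables;
```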
Now, there is an important limitation with this function that causes many people to avoid GROUP_CONCAT altogether. Here's the relevant text from the manual:

The result is truncated to the maximum length that is given by the group_concat_max_len system variable, which has a default value of 1024.

Basically, this says that if you don't take proper precautions, the lists you generate with GROUP_CONCAT may be truncated. This is obviously bad news! However, there is a very simple workaround, which is also hinted at in the documentation:

The value can be set higher, although the effective maximum length of the return value is constrained by the value of max_allowed_packet.

So, the workaround is simple: assigning the value of max_allowed_packet to group_concat_max_len will allow the longest possible list of values. You may argue that this might still not be long enough. However, that is a moot point. Just read up on max_allowed_packet:

The maximum size of one packet or any generated/intermediate string.

In other words, no MySQL string will ever exceed max_allowed_packet in length - the same limit holds for any other method of concatenating strings within MySQL, including cursor loops.
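The workaround thus boils down to a simple save-raise-restore pattern around the query, sketched here with a user variable:

```sql
SET @old_gcml := @@group_concat_max_len;            -- save the current value
SET @@group_concat_max_len := @@max_allowed_packet; -- raise to the practical maximum
-- ... run the GROUP_CONCAT() query here ...
SET @@group_concat_max_len := @old_gcml;            -- restore the original value
```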
A cursor-less database size chart function

Without further ado, this is how I would write the database size chart function:

CREATE FUNCTION f_dbsize_google_chart(
    p_chart_type ENUM('bhs','p')
,   p_width MEDIUMINT UNSIGNED
,   p_height MEDIUMINT UNSIGNED
) RETURNS LONGTEXT
READS SQL DATA
BEGIN
    DECLARE v_sum_size, v_max_size DOUBLE;
    DECLARE v_size_series, v_size_labels LONGTEXT;
    -- store current group_concat_max_len so we can reset it
    DECLARE v_group_concat_max_len BIGINT UNSIGNED DEFAULT @@group_concat_max_len;

    -- ensure group_concat capacity
    SET @@group_concat_max_len := @@max_allowed_packet;

    -- get the database sizes
    SELECT ROUND(SUM(size), 2), MAX(size)
    ,      GROUP_CONCAT(ROUND(size, 2) ORDER BY size, table_schema)
    ,      GROUP_CONCAT(table_schema ORDER BY size, table_schema SEPARATOR '|')
    INTO   v_sum_size, v_max_size
    ,      v_size_series, v_size_labels
    FROM (
        SELECT table_schema
        ,      SUM(data_length + index_length) / 1024 / 1024 size
        FROM   information_schema.tables
        WHERE  table_type = 'BASE TABLE'
        GROUP  BY table_schema
    ) a;

    -- restore original group_concat_max_len
    SET @@group_concat_max_len := v_group_concat_max_len;

    -- build URL
    RETURN CONCAT(
        'http://chart.apis.google.com/chart'
    ,   '?cht='  , p_chart_type
    ,   '&chs='  , p_width, 'x', p_height
    ,   '&chds=0,', v_max_size
    ,   '&chd=t:', v_size_series
    ,   '&chdl=' , v_size_labels
    ,   '&chl='  , replace(v_size_series, ',', '|')
    ,   '&chtt=MySQL Database Size (', v_sum_size, 'MB)'
    );
END;

Here's a quick summary that points out some differences with regard to the original code:
- Instead of accepting a CHAR(1) for the chart type, an ENUM('bhs','p') is used to restrict the value to a listed type.
- Instead of returning a varchar(3000), this function returns LONGTEXT, effectively leaving it to the Google Chart API to report any URL length limitation.
- Instead of a cursor, we use a single SELECT...INTO statement.
- The query in the SELECT...INTO statement uses a subquery in the FROM clause which is functionally equivalent to the query in the cursor declaration of the original code. A small but important improvement is the addition of a condition that restricts the result to base tables only. This automatically excludes the information_schema itself, as well as databases that contain only views.
- The outer query is functionally equivalent to the cursor loop. It uses one GROUP_CONCAT() expression to obtain the list of values, and another GROUP_CONCAT() expression to obtain the list of labels. Here, we also calculate the scaling factor by taking the MAX() of the database sizes.
- To work around the truncation problem with GROUP_CONCAT, we set group_concat_max_len to the maximum practical value. And we don't just set group_concat_max_len and leave it at that: we restore its original value at the end of the stored function.