Tuesday, March 13, 2007

So, are Database Stored Procedures Good or Bad?

A little less than a week ago, I opened a zoomerang survey about database stored procedures. In this post, I'm presenting the results.

In just a few days, the survey was visited 232 times, and no less than 155 people completed the survey! (There was a very small (2) number of people that answered some, but not all the items, and these are excluded from the results proper.)

I didn't really know what to expect, but I did not expect as much as this! So, I'm quite pleased, and I want to thank everybody that took the time to complete the survey. Thank You very, very much, I appreciate your efforts a lot!

Before we Begin


Before I present the results, I want to explain a few things about the survey.

First of all, I want to stress that the results I am about to present are not to be generalized. There is no way to discover to what extent the responses are representative for a larger group.

To put things in perspective, one should realize that I actively solicited response by posting announcements on my own blog, the O'Reilly Database blog and the MySQL stored procedure forum. In addition, I posted to the PlanetMySQL Google Group.

If you want to use the results somewhere, feel free but be sure to link back here so people can judge for themselves what the status is of the results. Thank you.

Secondly, why do I need this survey? Isn't it clear when and where to use stored procedures?

Well, I have the impression that there are quite a few myths concerning stored procedures in general. Stored procedures seem to be the kind of topic that deeply divides database professionals in two camps. The discussion whether stored procedures are "Good" or "Bad" seems to have revived ever since MySQL started supporting them.

I happen to believe that stored procedures are Good in some cases, but Bad in others. I also think that in most cases, it is possible to deduce whether one should use a stored procedure - and when not. I've been wanting to write an article about this for some time now. However, there are already a lot of articles about it, and I decided it would be fun to link it to an inquiry so it would be easier for people to relate to it.

(The article is going to have to wait until next time - I'm just presenting the results of the survey -your opinion- today)

Now - without further ado - the results!

Performance


The first item in the survey was:
Certain application tasks can be implemented either as a server-side stored procedure or using client-side application code and 'pure' SQL statements. What is your opinion on the difference in performance between these two approaches?
Three alternative answers were offered:
  • Server side stored procedures are faster than client side application code and pure SQL statements (72%; 112)
  • Server side stored procedures are slower than client side application code and pure SQL statements (8%; 12)
  • I have a different opinion (20%; 31)
Here's a pie chart of the results:

Database Stored Procedures - Good or Bad-RawData_html_m368298f8

I must say, this one really surprised me. There seems to be quite a large group of people that have the opinion that procedures outperform normal, plain SQL statements. The people that think the opposite are in the smallest group. There is quite a substantial group that did not want to choose a particular stand and chose to formulate a different opinion.

In the group that chose "I have a different opinion", the vast majority (25 respondents, 16% of the total) provided an answer to the effect that it all depends on the specific situation. In some cases, a reason was given, usually to indicate that stored procedures could help to reduce network roundtrips.

One respondent had a very original and interesting answer:
Maybe faster but a severe CPU scalability risk.
If I understand correctly, what is meant is that although executing an individual stored procedure maybe faster than executing the equivalent application code, stored procedures can have an adverse effect on scaling up the solution.

Because stored procedures reside in the database, and are executed on the database server, the application will be limited by the processing power of the database server. This can be remedied by physically separating the processing tasks from the database tasks. This allows one to grow the application layer more independently from the data layer - something which will become more and more important as the number of application users grows.

Two persons gave a response that was very MySQL specific. Here are the responses:
Neither. Depends with MySQL on num of things...
and
*MySQL* SP are slower than normal SQL
I find the latter comment interesting because MySQL stored procedures are indeed different from, say, Oracle or MS SQL procedures in this respect. MySQL compiles stored procedures very differently than Oracle or MS SQL does. I'm planning to illustrate this in more detail next time.

A minority (2 respondents) just didn't have a specific opinion, and 1 person argued that it didn't matter at all, and that stored procedure performance would be about the same as that of application code running ordinary SQL statements.

Portabilty

The second item in the survey:
What is your opinion on the effect that stored procedures have for the portability of database applications?
Again, three alternative answers were offered:
  • Using stored procedures increases the portability of database applications (21%; 33)
  • Using stored procedures decreases the portability of database applications (68%; 105)
  • I have a different opinion (11%; 17)
And the pie chart of the results:

Database Stored Procedures - Good or Bad-RawData_html_m7d5cfeeb

There's a large majority that thinks stored procedures make applications less portable. The second largest group thinks the opposite. The group of people that have another opinion is now much smaller as compared to the previous item, but still substantial.

The largest (7) group of people with another opinion say something to the effect of "It depends". I was a bit surprised that only two people answered:
Portability of what - application or database?
Three individuals answered that portability is indeed hampered, but they argued that portability just isn't an issue at all, because they don't port. (Presumably, they mean they don't ever port the database)

Appropriateness

The third item tried to probe how people felt about database stored procedures in general. Given that databases are supposed to offer an implementation of the relational model, some people argue that stored procedures are actually completey alien and should not be a part of them. So my third item was:
Which of the following statements best matches your own opinion regarding stored procedure support in relational database products?
Again, three alternatives:

  • A stored procedure is a completely non-relational thing. Therefore, stored procedures do not belong inside a relational database management system. (12%; 19)

  • Stored procedures offer functionality that is complementary to the relational features of a relational database management system. Therefore, stored procedures are a useful feature for building database applications. (83%; 127)

  • I have a different opinion (5%; 7)

Here's a piechart of the results:

Database Stored Procedures - Good or Bad-RawData_html_42b9a6c1

So, this certainly is the least controversial item when looking at the numbers: a clear majority seems convinced that stored procedures are a useful tool for building applications. Although it is still considerable, it is the minority that feels stored prodedures are inappropriate (the survey does not grant extra points for passionate defense of the opinons ;).

In this case the group of people with different opinions was small, at least as compared to the other items. It was also not really possible to categorize these responses any further: everyone had their own reason for not choosing one of the other alternatives.

Programming Languages

The survey also inquired after the programming languages people use for their stored procedures and application. These item are a bit different as they ask about a fact rather than an opinion. Also, people could select multiple answers in addition to adding extra options.

First of all, I wanted to know which stored procedure languages are used. The item:
Which stored procedure language do you use regularly to build databases/database applications? Please check only those languages from which you use procedural constructs (rather than pure SQL constructs)
People could choose from these languages:

  • DB2 Stored Procedures - IBM DB2 (5%; 8)

  • java stored procedures - Oracle (5%; 8)

  • MySQL Stored Procedures - MySQL (45%; 65)

  • PL/pgSQL - Postgres (18%; 26)

  • PL/SQL - Oracle (36%; 52)

  • T-SQL - Sybase (7%; 10)

  • T-SQL - MS SQL Server (33%; 48)

  • .NET stored procedures - MS SQL Server (6%; 9)

  • Other, specify database vendor and language name (10%; 15)
Here's a bar chart, sorted by frequency:

Database Stored Procedures - Good or Bad-RawData_html_1d6d0594

So, most people that did the survey use MySQL. This is not very surprising as it is my usual topic (and I expect most respondents to be reader of my blog). Still, a considerable number of respondents use other languages (too); Oracle PL/SQL, and T-SQL (MS SQL) are really not that far behind. Beyond that, the numbers become considerably smaller, although PL/pgSQL (Postgres) and "Other" are still quite significant.

The "Other" category indicated two important omissions on my part: 6 respondents (4%) named "Firebird", "Interbase" or some combination of both. I feel I should've included that in the survey and I'm ashamed to say it did not cross my mind when I created the survey. Second to that, 5 respondents (3%) specified that they used no stored procedures at all, or at least not regularly. I should have made a separate category for that too I guess.

What is maybe one of the most interesting obervations is the relatively small but still substantial amount of general purpose languages in the stored procedure arena. Oracle java stored procedures (8 respondents; 5%) and MS SQL .NET stored procedures (9 respondents; 6%) show that these things are actually used. It would be very interesting to find out if these respondents used that in addition to or instead of traditional PL/SQL and T-SQL.

The last item in the survey was about application programming languages. The item:
Which computer languages do you use regularly for programming database applications?
The options were:

  • C (17%; 25)

  • C++ (15%; 23)

  • C# .NET (19%; 28)

  • COBOL (1%; 1)

  • Delphi (5%; 8)

  • Java (38%; 57)

  • Javascript (14%; 21)

  • LISP (0%; 0)

  • Perl (33%; 50)

  • Python (15%; 23)

  • PHP (53%; 80)

  • Smalltalk (1%; 2)

  • VB (9%; 13)

  • VB .NET (8%; 12)

  • VBScript (5%; 8)

  • .NET - other (1%; 1)

  • Other (15%; 23)
And the bar chart:

Database Stored Procedures - Good or Bad-RawData_html_15ca2d92

Clearly, PHP is the most popular language among the respondents, but Java and Perl are also quite strong. After that, there are quite some languages that certainly are used, but not by far as much as PHP, Java and Perl.

The "other" category showed quite a few recurring responses, again indicating I forgot to mention a few important ones. Ruby, Rails, and combinations of both were mentioned 13 times (8%) which is too substantial to ignore. PL/SQL was mentioned 4 times (3%) and it would be interesting to see if these respondents mean they use PL/SQL to create full blown end-user (web)applications (this is a feature I heavily used when I was working as application developer) or 'just' utilities, ETL and the like.

Summary



  • Most repondents think stored procedures are faster than pure SQL and application code

  • Most repondents think stored procedures make the application less portable

  • Most repondents think stored procedures are a useful addition to the database

  • Most repondents use MySQL, PL/SQL and/or MS SQL stored procedures

  • Most repondents use PHP, Java and/or Perl


Stuff that remains to be done


These are just the results. I'm planning to do a real analysis of the results to see if I can identify groups of respondents. For example, it seems obvious that respondents that use PHP and MySQL stored procedures are also the ones that think stored procedures are a good tool for application development, and that they are faster but less portable (it's a safe assumption, as these sum up all majorities ;). But how well do these groups really hang together? This an other things shortly on this blog.

3 comments:

Unknown said...

Really interesting results, Roland! Thanks for posting them. Not sure why anyone would think that stored procedures make portability *better* ?! All RDBMS system implement a sub or super-set of ANSI, so portability is virtually killed with stored procedures ;)

As for performance, it's an interesting subject. I was the one that answered "For MySQL, totally depends..." because for simple procedures that *are not repeatedly executed on ***every*** connection, there is little performance benefit. Why? Because, unlike every major RDBMS implementation of an SP cache, MySQL puts the cache in the connection thread, not a global cache. What effects does this have? Well, stored procedures have to be recompiled for each connection thread... therefore you aren't really gaining any benefits of SP compilation if you just execute a SP once per connection. But, by the same token, lock contention for stored procedure compilation and caching is much less in MySQL than with other systems since the thread itself manages a cache of the SPs, not a global manger, which requires no lock management for various requesting threads. It will be interesting to see how this benefit/disadvantage is handled in the future coding of the SP source...

Anyway, looking forward to your article!

Cheers

rpbouman said...

Hi Jay,

thanks for your reply.

Actually I wanted to write that article ever since I read your article on MySQL Stored Procedure Performance ;) So thank you for the inspiration.

Regarding portability: I have a few thoughts about it, but I can give it away now: it has to do with what you want to port, the application or the database.

BTW, On the O'Reilly blog, I already got a comment from an individual that is very outspoken in the increased portability

Anonymous said...

Thanks for your nice post! Keep it up

SAP HANA Trick: DISTINCT STRING_AGG

Nowadays, many SQL implementations offer some form of aggregate string concatenation function. Being an aggregate function, it has the effe...