Book: "Pentaho Kettle Solutions", Matt Casters, Roland Bouman, & Jos van Dongen, Wiley 2010 Book: "Pentaho Solutions", Roland Bouman & Jos van Dongen, Wiley 2009

Friday, January 25, 2008

I don't do rants...normally

Normally, I don't do rants. But now I feel compelled to debunk the - IMO - downright slanderous post entitled "14 reasons not to use MySQL or other mid-range database management systems" by Curt Monash - an otherwise seemingly un-offensive blogger.

Anyway - here's my maybe somewhat emotional analysis. But really, this post is a load of cr*p! I mean, more than half of the "reasons" are not even about MySQL (as in the database - not the company), and the other aren't even reasons.

Judge yourself.

Many enterprises get quantity discounts. License and in some case even maintenance fees may not be bad at all.


You make it seem like this is an advantage! With MySQL, *everybody* gets a volume discount. We don't care if you're big or small, if you need many MySQL servers you simply pay *one single flat fee* to cover for support for *all you can eat*

http://www.mysql.com/products/enterprise/unlimited.html

Who cares if the system contains code for features you don’t need? Hardware is really cheap these days.


This is not a reason to go for a high end database - merely something that might not hamper you in doing so. That said, it's not that hard to argue why less is more. Take for example MS SQL's xp_cmdshell which lets you run an arbitrary OS-level command. Well, if your app has an injection vulnerability, an extra feature like this just allows a cracker more room to do damage. So it really pays off to be sure your database is exactly that - a database.

If you already have DBAs on staff, how much work is it to administer a few more small systems? Besides, an Oracle or SQL Server DBA has access to some pretty good remote tools, which let her administer many database servers at once.


I don't see how this is an argument to buy into high end databases. First of all, if you run MySQL or any database that is anything worth to you, you need a DBA. Period. The "besides" is hilarious - what's makes you think MySQL DBA's do not have such tools? In many shops I've seen Oracle and MS SQL DBAs use remote desktop or shell access to manage their databases, in much the same way as many MySQL DBAs manage theirs. And, in addition MySQL Enterprise comes with an enterprise monitor which lets you manage and monitor many, many (hundreds) of MySQL instances remotely.

http://www.mysql.com/products/enterprise/monitor.html

You can even get it for trial:

http://www.mysql.com/ent-trial-reg-2007/

If you run a Windows-only shop, why not go Microsoft soup-to-nuts?


heh, maybe because you want to rely on what you know will work - not on what you are promised that it will work. Sometime ;) Seriously - the choice for MS vs non-MS is all about choice vs piece of mind. Nobody ever got fired for buying Microsoft, so many IT managers buy a site-wide all MS license to get some piece of mind. They get some assurances that because it's MS everything is integrated and aligned and whatnot. But if you unpack the shiny boxes - that's when you find out how incredibly compatible MS is with MS. Not.

Contrast that to open source - not just MySQL. If it doesn't work or you don't like what you see: fine, grab a new. Try that with MS. Once people have bought into it, they feel compelled to make it work, why else did they spend all that money on it? We're talking about a company that can't even keep their frigging *browser* compatible between minor versions....seriously.

SQL Server used to be a mid-range DBMS, and still plays that role in many Oracle and DB2 shops today.


heh, this is hardly an argument. "Rain is wet, and still falls into many seas world-wide". I mean, how can this be a reason not to go for MySQL or another open source database?

Early on, Microsoft did a great job of usability engineering on SQL Server administration tools.


I won't contest that they did. But we are seeing it no being gradually replaced with more and more MS Visual Studio headaches. Is that what people really want?

Largely in response to Microsoft competition, Oracle radically improved its own tools. For sufficiently simple databases, installation and administration really aren’t that hard in any of the high-end DBMS.


gah...so how is this an argument for not choosing for MySQL? Seems like you should have titled your article "14 reasons why I like MS SQL" - not "14 reasons why people should not choose MySQL"

Oracle, SQL Server, DB2, and Informix all offer cheap or free low-end editions, with good upwards compatibility. Those might happen to meet your deployment needs, now and in the future.


I won't contest that the "express" editions are very useful development tools. But you must be joking when you are saying that these are serious deployment platforms. In fact, if you like these "Express" editions so much, you should be in favour of more MySQL, more postgres, more SQLite. Because those products are the sole reason that finally made the big vendors look for a way to make their overpriced offerings more appealing. Go rewind Tom Kytes Oracle XE announcement podcast - they started giving it away for free because they are basically forced to.

If your application grows so quickly that you really do wind up needing a high-end database management system underneath, you won’t have to rewrite it.


That's a load of it. MySQL is all about scaling beyond the scale-up threshold. Don't buy into bigger and more expensive - buy more and inexpensive.

http://www.mysql.com/why-mysql/scaleout.html

Most high-end database management systems have more robust datatype support than most mid-range DBMS, the PostgreSQL family of products excepted.


Not sure what this means exactly - I guess this is not specific enough for me to recognize anything that has to do with MySQL.

Upstart mid-range database management systems have a variety of maturity issues. What are the most common kinds of error messages you see in a typical week? If you use the Web a lot, MySQL errors may be in the top three. Those memory buffers seem to fill to the choking point all too often.


So now MySQL is an upstart? heh...yeah, you're right they've been around only 12 years. Personally the most common errors I see on the web are broken IE-specific pages but I guess that's another story. (actually I'm curious if that's in your top three as well) The most likely reason is that most dynamic websites are LAMP sites - MySQL is simply most ubiquitous. I don't know how ever you were going to substantiate your "memory buffers" remark but most mysql errors I see on the web are "out of connections" - i.e. the site grew so hard they need to phase in another MySQL instance.

Individual features may also not be very mature yet. MySQL has long offered stable transactions and decent clustering, but not necessarily with the same storage engines (and not necessarily either in the most common configurations). And how is performance on relatively new features like declarative referential integrity, user-defined functions, or stored procedures?


Referential integrity relatively new...boy you must be an old fart to call something that's been in there for more than 7 years "relatively new". And what about the new features in Oracle and MS SQL? I don't hear you complaining about the "relatively new" partitioning features in Oracle...

There are more and better third-party tools for popular high-end DBMS than there are for upstart mid-range database management systems.


And all of those are adding support for what database again? Right, MySQL, because at some point developers keen on those tools moved at some point to MyQL

Nobody you know ever got fired for recommending a traditional, over-engineered computing platform.


I think you hould make that your slogan Curt.


"On the whole, I think there should be a lot more use of mid-range database management systems than there is today. But the case isn’t entirely one-sided."

Well, you sure make a fine point out of making it look like a one sided issue.

Wednesday, January 23, 2008

Gartner on the Sun / MySQL deal

Heh, who knew we'd see the day that Gartner would write this:
Recommendations for all enterprises
...
MySQL is capable of supporting clusters, delivering atomic, consistent, isolated and durable (ACID) functionality and handling some mission-critical applications. Evaluate MySQL as an alternative to IBM DB2, Oracle and other DBMSs.
Well, fact of the matter is, they did write it in a recent report on the MySQL / Sun deal. You can download the Gartner analysis right here free of charge.

Friday, January 11, 2008

MySQL Cluster Study Guide getting good comments

One of the major projects I worked on last year was the MySQL 5.1 Cluster DBA certification. I did the background research to come up with a sensible scope and outline of the exam, wrote most of the questions and actively looked for feedback from our cluster engineering and development team as well as consultants that are deploying MySQL Cluster in real-world situation.

For example, I got great feedback on the exam from MySQL Cluster Senior Software Engineer Stewart Smith
...Response from the exam is overwhelmingly positive and as it’s new, afterwards I discussed a few questions with Roland (in fact, all the ones I got wrong). The (good!?) news is that I legitimately brain-farted on some.

... this isn’t some trivial know-a-tiny-bit-and-score-really-high certification...this is a serious one).

I may have heard at some point one of our people saying they thought it was the best certification exam we have...


I guess that could've turned out worse, heh? ;-)


I was also involved in the preparation and coordination of the creation of the accompanying study guide, did a fair share of reviewing the content and eventually I authored almost 3 chapters.

All in all it was a big project, and I learned a great deal in the process. Unfortunately, due to a number of reasons it took a while before we could actually publish the book, but now that we did, we are getting a number of pretty good comments. Take for example, Alan Snelson ordered the book at lulu.com en judging from his blog he enjoyed it a lot:
I booked the exam a couple of weeks ago thinking I might need some incentive to fully digest MySQL Cluster details. I need not have worried, the book provides a great introduction to MySQL Cluster in easy-to-read chapters bringing you up to speed in no time, much more entertaining than the reference manual!


Thanks Alan for your kind words! I'm glad you passed the exam.

Are you curious about the MySQL Cluster Certification? You can read more about it in the candidate guide, and if you feel lucky, you can even take a free evaluation test before you sign up for the real thing.

Enjoy, and good luck with your Exams ;-)

Wednesday, January 09, 2008

TeProFoKATeMUR - News from The Project Formerly Known as The MySQL UDF Repository

Hi all - boy, have I got news for you!

I would like to announce that The MySQL UDF Repository is no more. The rumour is that its growing popularity has caused it to be noticed by some folks at MySQL AB's legal department, kindly requesting that the project's name be changed to something even cooler.

However, these are just rumours - this is only the internet, so relying on a first-hand source, I can safely confide in you that the project's name and website are being changed mainly out of religious conviction to offer even more and better free UDFs for the MySQL database.

During the transition, the project can be referred to using this unpronounceable symbol:



However, in some circles, people refer to this symbol as TeProFoKATeMUR, which is believed to be an acronym of ThE Project Formerly Known As "ThE MySQL UDF Repository". The real incrowd however prefers to call it simply The Symbol.

The latest news that I heard through the grape vine is that the Project's identity crisis may already be over as I am writing this. I hear people whispering that a new name for the project will be announced shortly, and although the final name is not yet released it will most likely be either one of "The Order of Valiant Knights In Defense of the Honour of Fast-performing Freely Available Functions for MySQL" or simply The UDF Repository For MySQL.

I wouldn't know whether it is good or bad that the project is abandoning it's name. What I do know is that the project's new website is looking way better than the old one.

udf-site

It has got an easier to remember URL too: http://www.mysqludf.org/. The Project's google group has not moved though - it is still available at http://groups.google.com/group/mysql-udf-repository and welcoming new members all the time.

If this is indicative for the other things the project has to offer, then these changes sure are promising. In terms of last developments, it is believed that the group is working on a UDF library for MySQL that wraps around libcurl to enable internet access using MySQL functions:

mysql> select * from mysql.func;
+----------------------------+-----+----------------------+----------+
| name | ret | dl | type |
+----------------------------+-----+----------------------+----------+
| http_request | 0 | lib_mysqludf_curl.so | function |
| lib_mysqludf_curl_info | 0 | lib_mysqludf_curl.so | function |
| http_version_1_0 | 2 | lib_mysqludf_curl.so | function |
| http_version_1_1 | 2 | lib_mysqludf_curl.so | function |
| http_method_get | 2 | lib_mysqludf_curl.so | function |
| http_method_delete | 2 | lib_mysqludf_curl.so | function |
| http_version_none | 2 | lib_mysqludf_curl.so | function |
| http_method_head | 2 | lib_mysqludf_curl.so | function |
| http_method_post | 2 | lib_mysqludf_curl.so | function |
| http_method_put | 2 | lib_mysqludf_curl.so | function |
| http_method_trace | 2 | lib_mysqludf_curl.so | function |
| http_authtype_basic | 2 | lib_mysqludf_curl.so | function |
| http_authtype_digest | 2 | lib_mysqludf_curl.so | function |
| http_authtype_gssnegotiate | 2 | lib_mysqludf_curl.so | function |
| http_authtype_ntlm | 2 | lib_mysqludf_curl.so | function |
| http_authtype_any | 2 | lib_mysqludf_curl.so | function |
| http_authtype_anysafe | 2 | lib_mysqludf_curl.so | function |
| url_encode | 0 | lib_mysqludf_curl.so | function |
| url_query | 0 | lib_mysqludf_curl.so | function |
| url_encode_component | 0 | lib_mysqludf_curl.so | function |
+----------------------------+-----+----------------------+----------+
20 rows in set (0.00 sec)

mysql> select lib_mysqludf_curl_info();
+-----------------------------------------------------------------------------------+
| lib_mysqludf_curl_info() |
+-----------------------------------------------------------------------------------+
| lib_mysqludf_curl version 0.0.1/libcurl/7.16.4 GnuTLS/1.6.3 zlib/1.2.3 libidn/1.0 |
+-----------------------------------------------------------------------------------+
1 row in set (0.11 sec)

mysql> set @page := http_request('http://www.mysqludf.org/' as url);
Query OK, 0 rows affected (0.04 sec)

mysql> select substring_index(substring_index(@page,'</title>',1),'<title>',-1);
+-------------------------------------------------------------------+
| substring_index(substring_index(@page,'</title>',1),'<title>',-1) |
+-------------------------------------------------------------------+
| UDF Repository for MySQL - Home |
+-------------------------------------------------------------------+
1 row in set (0.00 sec)

An interesting year ahead for people interested of MySQL UDFs....

Tuesday, January 08, 2008

Speaking at the 2008 MySQL Conference

Yes - it's that time again ;-)


The new year has just started which means the MySQL Conference and Expo is little more than 3 months away. I am very glad to have the opportunity to attend, and even more glad to be speaking.

I will be doing two sessions:I'll provide some more details about my sessions shortly, but enough of that for now. Instead, I'd like to share a list of some of the sessions (in no particular order) I am looking forward to attend:

  • Astronomy, Petabytes and MySQL, Kian-Tat Lim; Stanford Linear Accelerator Center:
    The Large Synoptic Survey Telescope (LSST) will produce more than 100 petabytes of data when it goes on-line in 2014. All the non-pixel data will be managed by a database. This database is expected to contain some 50 billion astronomical objects, 3 trillion detections of those objects, and all the project’s metadata; a single copy will amount to tens of petabytes.

  • Architecture of Maria, the New Transactional Storage Engine for MySQL - Michael "Monty" Widenius; MySQL AB: Overall design of this new transactional (MVCC) storage engine.

  • DBSlayer: A Simpler Way To Proxy, Derek Gottfrid; The New York Times:
    The DBAccessLayer or DBSlayer (http://code.nytimes.com/projects/dbslayer) for short is a lightweight, open source, web aware proxying and pooling layer for MySQL used for backend architecture at nytimes.com. Assembled out of a few open-source libraries, DBSlayer communicates with clients using HTTP as a protocol and JSON as an encoding, instead of the custom binary protocols common to other proxying mechanisms. This enables the DBSlayer to easily interoperate with any programming languages that speak HTTP and JSON.

  • External Language Stored Procedures for MySQL, Antony Curtis and Eric Herman; MySQL AB:
    With version 5.0, MySQL delivered a stored procedure implementation and here we extend it to provide general support for external language stored procedures. We detail the philosophy and design of the external language stored procedure framework and its implementation within the server and libraries. We then also detail how an external language plug-in is written using the simple skeleton plug-in which exercises the basic interfaces between the server and external stored procedures.

  • MySQL Cluster with Replication for Financial Transactions - Cicero Torteli; Paggo
  • Optimizing MySQL and InnoDB on Solaris 10 for World's Largest Photo Blogging Community - Farhan Mashraqi; Fotolog:
    Fotolog is a top 19 Internet destination with more than 12 million members, 315 million photos and more than 3 billion page views a month. Through modifications to its data architecture, Fotolog was able to serve four times the number of users using the same number of database servers. A non-conventional, hybrid presentation that conveys the importance of scalability, performance tuning and schema optimizations in a practical way.
  • Geographical Information and OpenGIS on MySQL - Markus Franz, BF Blogform Search GmbH:
    In this presentation, Markus Franz shows how to efficiently handle geographical information in a MySQL based environment. The presentation covers setup of a structured database, calculation of distances between two or more points, social tagging for data and solutions to automatically guess geo tags from existing content. This session is highly recommended for those who look for powerfull ways to enhance existing business infrastructures with geographical information.
  • MySQL Proxy Wizardry - Giuseppe Maxia; MySQL AB:
    Are you tired of the limiting stored routines language? Can you distribute queries to several servers transparently? Try the power of MySQL Proxy. You will see an apparently normal MySQL server do all the above, and much more, with the addition of MySQL Proxy, a lightweight application that extends the server features on the fly.

  • The Lost Art of the Self Join Beat Vontobel, Meteonews AG:
    Can you solve a Sudoku with only one SELECT statement? No temporary tables, SQL extensions, stored procedures or functions, not even non-relational features such as ORDER or GROUP BY—just plain joins and predicates? You can! With a combination of (admittedly quite a few) self joins.

  • Developing INFORMATION_SCHEMA Plugins, Mark Leith; MySQL AB:
    A walk through of a very basic ‘HELLO WORLD’ INFORMATION_SCHEMA table, showing a user the very bare minimum that is required to create a pluggable I_S table. Then, examples of making some more interesting plugins – such as including the SIGAR OS statistics gathering library to collect CPU, Memory, Disk, Network and OS Configuration data within I_S tables, and a general ‘df -h’ INFORMATION_SCHEMA table.
  • EXPLAIN demystified, Baron Schwartz; The Rimm-Kaufman Group, LLC:
    I learned EXPLAIN by studying source code, documentation, and learning from the query optimizer team. I verified everything I learned with complex test cases. Along the way I discovered how to reverse-engineer EXPLAIN into a query execution plan. Finally, I wrote a program to do this—MySQL Visual Explain, the first of its kind. In this session I’ll teach EXPLAIN from the ground up.


And this is just the tip of the iceberg!

There is just so much good stuff, it's hard to single out a few. Anyway - take a look at the schedule yourself. Word is that you can get Early Bird discounts up to somewhere in Februari, and there is also a 15% Alumni discount for attendees of past conferences. Check out the details here.

CU @ UC!!!