Thursday, January 06, 2011

MQL-to-SQL: A JSON-based query language for your favorite RDBMS - Part I

Yesterday, I wrote about how I think this year's MySQL conference will differ from prior editions. I also wrote that I will attend and that I will be speaking on MQL-to-SQL.

I promised I would explain a little bit more background about my talk, so here's the first installment.

Abstract: MQL is a JSON-based database query language that has some very interesting features as compared to SQL, especially for modern (AJAX) web-applications. MQL is not a standard database query language, and currently only natively supported by Freebase. However, with MQL-to-SQL, a project that provides a SQL adapter for MQL, you can run MQL queries against any RDBMS with a SQL interface.

This article covers mostly background information on modern web applications, JavaScript and JSON. This background information should help you understand why a JSON-based database query language is a good fit for modern web applications. A following installment will discuss the MQL database query language and how it relates to SQL. A third article will cover the mql-to-sql project itself.

MQL, a JSON-based database query language


MQL (pronounced as Mickle) is an abbreviation of Metaweb Query Language, which is the JSON-based database query language natively supported by Freebase. If you're unfamiliar with the terms Metaweb and Freebase, see this quote from Wikipedia about Metaweb:
Metaweb Technologies, Inc. is a United States company based in San Francisco that is developing Freebase, described as an "open, shared database of the world's knowledge".
...and this quote about Freebase:
On March 3, 2007 Metaweb publicly announced Freebase, described by the company as "an open shared database of the world's knowledge," and "a massive, collaboratively-edited database of cross-linked data." Often understood as a wikipedia-turned-database, Freebase provides an interface that allows non-programmers to fill in structured, or 'meta-data', of general information, and to categorize or connect data items in meaningful, or 'semantic' ways.
So, Freebase is a database of just about everything (think of it as a machine readable version of wikipedia), and MQL is its query language.

As a quick primer on the MQL query language, here's a simple MQL query (taken from the Freebase manual)
{
"query": {
"type":"/music/artist",
"name":"The Police",
"album":[]
}
}
...and here's the result:
{
"status": "200 OK",
"code": "/api/status/ok",
"transaction_id":"cache;cache01.p01.sjc1:8101;2008-09-18T17:56:28Z;0029",
"result": {
"type": "/music/artist",
"name": "The Police",
"album": [
"Outlandos d'Amour",
"Reggatta de Blanc",
"Zenyatta Mondatta",
"Ghost in the Machine",
"Synchronicity"
]
}
}
I won't discuss these samples just yet - the MQL query language is the topic of the next installment. This is just a quick sample to give you an idea what MQL queries and their results look like. If you're didn't recognize the syntax already - read on, the JSON syntax is covered in the next section of this article.

The collection of data accumulated in Freebase is impressive - you should check it out sometime. However, this post is not so much about Freebase - it's the MQL query language I want to discuss here.

JSON

MQL is based on JSON (pronounced as Jason), which is an abbreviation of JavaScript Object Notation. If you're not familiar with JSON, hang on and read all of this section - it provides just enough background to understand why it is highly relevant for modern web-applications, and it provides a good-enough description of the JSON syntax for you to read and write MQL queries. You might also want to check out the Wikipedia entry on JSON for an objective overview.

If you are already familiar with JSON, then you can safely skip through to the next section. But please do take a minute to review the JSON object code sample - it is used as basis for developing simple MQL queries in the next article in this series.

Now, back to JSON.

What is JSON?

JSON is a data exchange format. Syntactically, JSON is a proper subset of JavaScript, the de-facto web browser scripting language. The JSON JavaScript subset is defined in such a way that it can express JavaScript values and objects of arbitrary complexity. However, JSON cannot be used to create an executable program - all dynamic, executable elements (functions) have been stripped away.

Because JSON literally is JavaScript (just not the full set), JavaScript programs can easily read and write data expressed as JSON. For this reason, JSON is often characterized as a JavaScript object serialization format.

An excellent description of the JSON syntax can be found at the JSON homepage. But if you don't feel like reading up on the details of JSON, no worries - you just need to remember a few JSON features to read and write MQL queries:
  • 3 scalar* data types:
    • strings - for example: "Oxygen", or 'O'
    • numbers - like 8 (integer), 13.6181 (float) and -2.1835e+2 (float)
    • booleans - true and false are the only possible boolean values
  • 2 composite data types:
    • arrays - an ordered comma-separated list of values, enclosed in square braces: ["Oxygen-16", "Oxygen-17", "Oxygen-18"]. The individual values in an array are usually referred to as elements
    • objects - an unordered comma-separated list of key/value pairs (which are uniquely named items), enclosed in curly braces:
      {
      "name": "Oxygen",
      "symbol": 'O',
      "atomic_number": 8,
      "ionization_energy": 13.6181,
      "melting_point": -2.1835e+2,
      "isotopes": ["Oxygen-16", "Oxygen-17", "Oxygen-18"]
      }​
      The key/value pairs of an object are usually called properties or members. Note that name and value are separated by a colon (:). Keys are strings, and have to be quoted. Values can be of any of the data types described above, including arrays (like the "isotopes" property in the example above) and objects.
  • A special null-value (which actually constitutes a data type of its own)

(* I should point out that JavaScript is an object-oriented language. At runtime, all data types described above, including the array and "scalar" types, are in fact objects. Objects are by definition composite, and not scalar. However, the list above is about JSONs syntactical constructs, not about their runtime JavaScript representation)

As you can see, JSON provides a few very simple rules. Yet it allows you to build data structures of arbitrary complexity: the object example above illustrates how you can bundle a collection of named values together to represent structures of a higher order. In this particular case, we denoted an object that represents the chemical element Oxygen. And, we needn't have stopped here: we could've added more key/value pairs to represent other complex properties of the element Oxygen, representing things like the person who first discovered it, a list of well known substances that contain Oxygen and so on and so forth.

AJAX and JSON


I already mentioned something about JSON in relation to JavaScript and modern web applications. By modern web applications, I mean web applications that offer a rich and highly interactive user interface that is based on AJAX (An Acronym for Asynchronous JavaScript And XML) technology.

A key feature of AJAX is the usage of client-side JavaScript code for maintaining non-blocking background communication with the web server. This is typically done using a specialized object called the XMLHttRequest. AJAX applications also tend to use JavaScript to dynamically change the contents or appearance of the page (a technique called dynamic HTML or DHTML). This allows web developers to create applications that can avoid a relatively slow page reload most of the time, making the application appear more responsive.

Going by the meaning of the AJAX acronym, you may be under the impression that AJAX applications are all about using asynchronous XML data exchange using the XMLHttpRequest, and thus the relevance or need for JSON might be lost on you. I would say that, yes, you're not wrong: AJAX is often implemented by using the XMLHttpRequest object to communicate with the server using XML messages. But for several reasons, JSON is gaining popularity as data exchange format instead of XML, and techniques like JSONP are used in addition to the XMLHttpRequest object. This technique actually has an advantage over using the XMLHttpRequest because it can be used to do requests to services that reside on another domain than the current web page, whereas this is not allowed with the XMLHttpRequest (at least, not by default and not without explicitly asking the user for confirmation). This makes JSONP a great tool for creating mash-up applications.

I'm sure you're not surprised to hear from me that the internet offers many places where you can get your XML vs JSON brawl on - I'm not particularly interested in that discussion, and I don't feel anybody has to choose sides. The point I want to bring across is that for AJAX applications, JSON is a respected data exchange format. It is excellently supported by both web browsers as well as popular AJAX frameworks, and companies like Amazon, YAHOO! and Google deliver more and more webservices that use JSON as data exchange format.

Next time


Now that you've seen how JSON works, and how it relates to AJAX web applications, you're ready to take a look at MQL queries. This is the topic of the next blog post.

3 comments:

Anonymous said...

Hi Roland! Does this MQL have anything to do with the Pentaho Metadata layer? Like, is it the same MQL ran as a process action?

rpbouman said...
This comment has been removed by the author.
rpbouman said...

Hi Fabio,

no. The MQL in this article is "Metaweb Query Language" (http://mql.freebaseapps.com/ch03). The Pentaho MQL is "Metadata Query Language" (http://wiki.pentaho.com/display/ServerDoc1x/03.+Pentaho+Metadata+MQL+Schema).

Confusing? Unfortunately, yes.

In fact, when I was messing with this stuff, I googled once for MQL, and was able to count no less than 13 different query languages called MQL. In many cases, the "M" stands for a different word, but not always. For example, Microsoft has a Metadata Query Language too.(http://msdn.microsoft.com/en-us/library/windows/desktop/ee719796(v=vs.85).aspx)

The postfix QL is particularly overloaded, you'll find many other QL's for other letters.

SAP HANA Trick: DISTINCT STRING_AGG

Nowadays, many SQL implementations offer some form of aggregate string concatenation function. Being an aggregate function, it has the effe...