Wednesday, April 15, 2015

MDX: retrieving the entire hierarchy path with Ancestors()

A couple of days ago I wrote about one of my forays into MDX land (Retrieving denormalized tabular results with MDX). The topic of that post was how to write MDX so as to retrieve the kind of flat, tabular results one gets from SQL queries. An essential point of that solution was the MDX Ancestor() function.

I stumbled upon the topic of my previous blogpost while I was researching something else entirely. Creating flat tables and looking up individual ancestors is actually a rather specific application of a much more general solution I found initially.

Pivot tables and the "Show Parents" functionality

GUI OLAP tools typically offer a pivot table query interface. They let you drag and drop measures and dimension items, like members and levels to create a pivot table. The cells of the pivot table are aggregated values of the measures, and the row and column headers of the pivot table are dimension members, which are typically derived from a level that was dragged into the pivot table.

Please recall the sales cube example I introduced in my previous post:



Now, suppose we would drag the Sales quantity measure unto the columns axis of our pivot table, and drag the Quarters level from the Time dimension unto the rows axis. The GUI tool might generate an MDX query quite like the one I introduced in my previous post:
SELECT  Measures.Quantity     ON COLUMNS
,       Time.Quarters.Members ON ROWS
FROM    SteelWheelsSales
Here's how this is rendered in Saiku Analytics:



And here's how it looks in Pivot4J:



Now, as I pointed out in my previous post, the problem with this result is that we don't see any context: we cannot see to which year the quarters belong. Both tools have a very useful feature called "Show parents". This is a toggle button that changes the view so that the headers show the values of the corresponding higher levels. For example, this is what the previous result looks like in Pivot4J when "Show Parents" is toggled:

As you can see, the year level and even the "All level" is now visible.

In Saiku we can achieve a similar thing, but the other way around: you can add the year and the all level, at which point totals are shown for these higher levels:



And you can then choose "Hide Parents" to get rid of the rows for the higher level aggregats, leaving you with essentially the same view of the data as shown in the last Pivot4J screenshot.

Implementing Show/Hide Parents

In Saiku, the "Hide Parents" functionality is achieved by post-processing the resultset: when the result is iterated to render the table, rows for all but the lowest level are filtered away and discarded.

In Pivot4J, it works a little bit different. Here's how Xavier Cho describes it:
the information of the parents is obtained by the members present on the axes. Pivot4J accesses it through Olap4J API which exposes a member's parent and ancestors via Member.getParentMember() and Member.getAncestorMembers() respectively:

http://www.olap4j.org/api/org/olap4j/metadata/Member.html

References to the member instances in a given MDX can be obtained by its CellSet interface, which is equivalent to what is ResultSet for JDBC. In addition, Pivot4J exposes the member instance for each cells to the expression language context, so you can reference itself, or its parent or ancestors in a property expression too.

In summary, if you are trying to access the parent of a member included in MDX, you'll first need to execute the query using the Olap4J then get it from the resulting CellSet instance.

A pure MDX expression

I thought it would be fun to try and rewrite our original query in such a way that its result would give us this information.

The Ancestors() function

As it turns out, we can do this for one particular hierarchy in our query by creating a Calculated Member on the Measures hierarchy that applies the Ancestors() function to the current member of the hierarchy for which we want the path.

The Ancestors() function takes 2 arguments
  1. A member for which to find ancestor members (members at a higher level that contain the argument member)
  2. An argument that specifies how many levels to traverse up.
The function returns a set of members that are an ancestor of the member passed as first argument.

Specifying the first argument is easy: we simply want to find ancestors for whatever member, so we can specify it as <Hierarchy Name>.CurrentMember and it will just work.

The second argument can be specified in 2 ways:
  • As a level: the second argument specifies a level and all ancestors up to that level will be retrieved
  • As a integer representing a distance: the second argument specifies the number of levels that will be traversed upwards
The first form is useful if you want to retrieve ancestors up to a specific level. I want to retrieve all ancestors, so the number of levels I want the function to traverse is in fact equal to the level number of the first argument. We can conveniently specify this with the LEVEL_NUMBER property using an expression like:

<Hierarchy Name>.CurrentMember.Properties("LEVEL_NUMBER")

But this is not yet entirely right, since this form of the Properties() function always returns a string, even though the LEVEL_NUMBER property is actually of the integer type. The standard MDX Properties() function allows an optional second argument TYPED. When this is passed, the property will be returned as a value having its declared type.

Unfortunately, Mondrian, a.k.a. Pentaho Analysis Services does not support that form of the Properties() function (see: MONDRIAN-1795). So, in order to retrieve the level number as an integer value, we have to apply the CInt() function to convert the string representation of the level number to an integer.

So, our call to the Ancestors() function will look like this:

Ancestors(<Hierarchy Name>.CurrentMember, CInt(<Hierarchy Name>.CurrentMember.Properties("LEVEL_NUMBER")))

A simpler alternative: Ascendants()

If it is acceptable to also include the CurrentMember itself, then we can even simplify this quite a bit by using the Ascendants() function. The Ascendants() function takes a single member as argument, and returns the set of ancestor members as well as the argument member, all the way up to the member at the top level. With Ascendants(), our expression would simply be: Ascendants(<Hierarchy Name>.CurrentMember)

We will continue this post using Ancestors(), but the approach can be easily applied to Ascendants() instead.

Converting the set of Ancestor members to a scalar value

However we can't just use the bare Ancestors() expression in our query, nor can we use it as is to create a calculated member. That's because Ancestors() returns a set of members, while we want something that we can retrieve from the cells in the result.

As an initial attempt we can try and see if we can use the SetToStr() function, which takes a set as argument and returns a string representation of that set. So We can now finally write a query and it would look something like this:
WITH
MEMBER  Measures.[Time Ancestors]
AS      SetToStr(
          Ancestors(
            Time.CurrentMember, 
            CInt(
              Time.CurrentMember.Properties("LEVEL_NUMBER")
            )
          )
        )
SELECT  Measures.[Time Ancestors] ON COLUMNS
,       Time.Quarters.Members ON ROWS
FROM    SteelWheelsSales
The results might look something like this:
Time Time Ancestors
QTR1 {[Time].[2003], [Time].[All Years]}
QTR2 {[Time].[2003], [Time].[All Years]}
QTR3 {[Time].[2003], [Time].[All Years]}
QTR4 {[Time].[2003], [Time].[All Years]}
QTR1 {[Time].[2004], [Time].[All Years]}
QTR2 {[Time].[2004], [Time].[All Years]}
QTR3 {[Time].[2004], [Time].[All Years]}
QTR4 {[Time].[2004], [Time].[All Years]}
QTR1 {[Time].[2005], [Time].[All Years]}
QTR2 {[Time].[2005], [Time].[All Years]}
Well this certainly looks like we're on the right track! However, there are at least two things that are not quite right:
  • The string representation returned by SetToStr() looks very much like how one would write the set down as a MDX set literal (is that a thing? It should be :-). While entirely correct, it does not look very friendly and it is certainly quite a bit different from what our GUI tools present to end-users
  • The order of the members. It looks like Ancestors() returns the members in order of upward traversal, that is to say, from lower levels (=higher level numbers) to higher levels (=lower level numbers). The fancy way of saying that is that our result suggests that Ancestors() returns its members in post-natural order. We'd like the members to be in natural order, that is to say, in descending order of level (from high to low). Note that the specification of Ancestors() does not specify or require any particular order. So in the general case we should not rely on the results to be in any particular order.
First, let's see if we can fix the order of ancestor members. There's two different MDX functions that seem to apply here:
  • Order() is general purpose function that can be used to order the members of a set by an arbitrary numberic expression.
  • Hierarchize() is designed to order members into hierarchical order, that is to say, the members are ordered by their level number and by the level number of any of its ancestors.
While Order() is a nice and reasonable choice, Hierarchize() seems tailored exactly for our purpose so that's what we'll use:
WITH
MEMBER  Measures.[Time Ancestors]
AS      SetToStr(
          Hierarchize(
            Ancestors(
              Time.CurrentMember, 
              CInt(
                Time.CurrentMember.Properties("LEVEL_NUMBER")
              )
            )
          )
        )
SELECT  Measures.[Time Ancestors] ON COLUMNS
,       Time.Quarters.Members ON ROWS
FROM    SteelWheelsSales
And the result will now look like:
Time Time Ancestors
QTR1 {[Time].[All Years], [Time].[2003]}
QTR2 {[Time].[All Years], [Time].[2003]}
QTR3 {[Time].[All Years], [Time].[2003]}
QTR4 {[Time].[All Years], [Time].[2003]}
QTR1 {[Time].[All Years], [Time].[2004]}
QTR2 {[Time].[All Years], [Time].[2004]}
QTR3 {[Time].[All Years], [Time].[2004]}
QTR4 {[Time].[All Years], [Time].[2004]}
QTR1 {[Time].[All Years], [Time].[2005]}
QTR2 {[Time].[All Years], [Time].[2005]}
Now, as for obtaining a more friendly, human-readable string representation of the set, this is a considerably more open requirement. One the one hand there is the matter of how to represent each member in the ancestor set; on the other hand there is the matter of extracting this information from the resultset and using it in the GUI.

To represent members we have a handful of options: we could use the member name, or we could use its key value; however since we want to expose the information to the user, the only thing that seems really suitable is the member caption. Placing that data into the GUI is an implementation detail that need not concern us too much at this point. Let's say we aim to return the data as a comma-separated list, and assume our GUI tool is capable of extracting that data and then use it to render a result.

The function that seems to suit our need is called Generate(). There are actually 2 forms of Generate(), which frankly seem to suit completely different purposes. The form we're interested in is functionally quite similar to the MySQL-builtin aggregate function GROUP_CONCAT().

The arguments to this form of Generate() are:
  1. A set. This is where we'll feed the Ancestors() expression in
  2. A string expression. This expression will be evaluated for each member in the set passed as first argument. We'll use this to retrieve the caption of the current member of the hiearchy for which we're generating the ancestors list.
  3. A separator. Generate() concatenates the result values returned by the string expression passed as second argument, and this string will be used to separate those values. Since we want to obtain a comma-separated list, we'll use the literal string ", " for this argument.
The result is a single string value.

Putting it together, our query becomes:
WITH
MEMBER  Measures.[Time Ancestors]
AS      Generate(
          Hierarchize(
            Ancestors(
              Time.CurrentMember, 
              CInt(
                Time.CurrentMember.Properties("LEVEL_NUMBER")
              )
            )
          )
        , Time.CurrentMember.Properties("MEMBER_CAPTION")
        , ","
        )
SELECT  Measures.[Time Ancestors] ON COLUMNS
,       Time.Quarters.Members ON ROWS
FROM    SteelWheelsSales
And the result:
Time Time Ancestors
QTR1 All Years,2003
QTR2 All Years,2003
QTR3 All Years,2003
QTR4 All Years,2003
QTR1 All Years,2004
QTR2 All Years,2004
QTR3 All Years,2004
QTR4 All Years,2004
QTR1 All Years,2005
QTR2 All Years,2005
And we can repeat this process for every hierarchy on every axis, just like we did with the Ancestor() function in the previous post.

2 comments:

Fabian said...

I usually solve context problems by simply using better labels. In this case QTR1 could be Q1/2003, Q1/2004, Q1/2005.

Time Time Ancestors
Q1/2003 All Years,2003
Q2/2003 All Years,2003
Q3/2003 All Years,2003
Q4/2003 All Years,2003
Q1/2004 All Years,2004
Q2/2004 All Years,2004
Q3/2004 All Years,2004
Q4/2004 All Years,2004
Q1/2005 All Years,2005
Q2/2005 All Years,2005

Now the context is implicit, labels are still short, now you don't need ancestors to understand the MDX results.

But end users always to ask for ancestors, so my response is just press show ancestors button. This kind of options are efficient and allow me to keep the MDX simple, also important for maintenance reasons, maintenance, maintenance, maintenance.

Fortunately with Mondrian you can define alternate hierarchies, for example Quarter / Year / Month / Date to simplify user navigation when they try to compare quarters. In this example QTR1 label has sense.

rpbouman said...

Fabian,

of course it's good advice to make descriptive labels.

But IMO, from a query tool's perspective, the problem does not go away. There will always be schemas with labels that need context of ancestors and the query tool better be able to deal with it.

Because of this particular use case, I'm not really that worried about maintenance. The query tool should simply have a powerful generator that can figure out how to translate the user's requests into a query that can be executed and retrieve all required information.

If the query tool decides it needs to generate extra elements that are not explicitly present in the user's model, then so be it.

Final consideration - looking at the kind of labels you propose, I'm wondering whether this is really that good. It seems to me there is a risk of cluttering the view, because you're going to need a lot of text in some cases. In addition, creating these full context labels will have impact on the DWH loading process and add a maintenance cost on that end. I'm not saying that's bad, but I'm just pointing out that your solution requires maintenance too, just in another place.

It seems to me the ideal situation would be if the query tool would be able to access all properties associated with a member and allow the user to define precisely how to generate the labels, using whatever member name, caption, property or even properties from related members (at ancestor or maybe even child level). Maybe this is a bit much for a ad-hoc analysis tool but for a report writer this seems like a hard requirement.

DuckDB Bag of Tricks: Reading JSON, Data Type Detection, and Query Performance

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is about a particular challenge...