Ancestor()
function.
I stumbled upon the topic of my previous blogpost while I was researching something else entirely. Creating flat tables and looking up individual ancestors is actually a rather specific application of a much more general solution I found initially.
Pivot tables and the "Show Parents" functionality
GUI OLAP tools typically offer a pivot table query interface. They let you drag and drop measures and dimension items, like members and levels to create a pivot table. The cells of the pivot table are aggregated values of the measures, and the row and column headers of the pivot table are dimension members, which are typically derived from a level that was dragged into the pivot table.Please recall the sales cube example I introduced in my previous post:
Now, suppose we would drag the Sales quantity measure unto the columns axis of our pivot table, and drag the Quarters level from the Time dimension unto the rows axis. The GUI tool might generate an MDX query quite like the one I introduced in my previous post:
SELECT Measures.Quantity ON COLUMNS , Time.Quarters.Members ON ROWS FROM SteelWheelsSalesHere's how this is rendered in Saiku Analytics:
And here's how it looks in Pivot4J:
Now, as I pointed out in my previous post, the problem with this result is that we don't see any context: we cannot see to which year the quarters belong. Both tools have a very useful feature called "Show parents". This is a toggle button that changes the view so that the headers show the values of the corresponding higher levels. For example, this is what the previous result looks like in Pivot4J when "Show Parents" is toggled:
As you can see, the year level and even the "All level" is now visible.
In Saiku we can achieve a similar thing, but the other way around: you can add the year and the all level, at which point totals are shown for these higher levels:
And you can then choose "Hide Parents" to get rid of the rows for the higher level aggregats, leaving you with essentially the same view of the data as shown in the last Pivot4J screenshot.
Implementing Show/Hide Parents
In Saiku, the "Hide Parents" functionality is achieved by post-processing the resultset: when the result is iterated to render the table, rows for all but the lowest level are filtered away and discarded.In Pivot4J, it works a little bit different. Here's how Xavier Cho describes it:
the information of the parents is obtained by the members present on the axes. Pivot4J accesses it through Olap4J API which exposes a member's parent and ancestors via Member.getParentMember() and Member.getAncestorMembers() respectively:
http://www.olap4j.org/api/org/olap4j/metadata/Member.html
References to the member instances in a given MDX can be obtained by its CellSet interface, which is equivalent to what is ResultSet for JDBC. In addition, Pivot4J exposes the member instance for each cells to the expression language context, so you can reference itself, or its parent or ancestors in a property expression too.
In summary, if you are trying to access the parent of a member included in MDX, you'll first need to execute the query using the Olap4J then get it from the resulting CellSet instance.
A pure MDX expression
I thought it would be fun to try and rewrite our original query in such a way that its result would give us this information.The Ancestors()
function
As it turns out, we can do this for one particular hierarchy in our query by creating a Calculated Member on the Measures hierarchy that applies the Ancestors()
function to the current member of the hierarchy for which we want the path.
The
Ancestors()
function takes 2 arguments- A member for which to find ancestor members (members at a higher level that contain the argument member)
- An argument that specifies how many levels to traverse up.
Specifying the first argument is easy: we simply want to find ancestors for whatever member, so we can specify it as
<Hierarchy Name>.CurrentMember
and it will just work.
The second argument can be specified in 2 ways:
- As a level: the second argument specifies a level and all ancestors up to that level will be retrieved
- As a integer representing a distance: the second argument specifies the number of levels that will be traversed upwards
LEVEL_NUMBER
property using an expression like:
<Hierarchy Name>.CurrentMember.Properties("LEVEL_NUMBER")
But this is not yet entirely right, since this form of the
Properties()
function always returns a string, even though the LEVEL_NUMBER
property is actually of the integer type. The standard MDX Properties()
function allows an optional second argument TYPED
. When this is passed, the property will be returned as a value having its declared type.
Unfortunately, Mondrian, a.k.a. Pentaho Analysis Services does not support that form of the
Properties()
function (see: MONDRIAN-1795). So, in order to retrieve the level number as an integer value, we have to apply the CInt()
function to convert the string representation of the level number to an integer.
So, our call to the
Ancestors()
function will look like this:
Ancestors(<Hierarchy Name>.CurrentMember, CInt(<Hierarchy Name>.CurrentMember.Properties("LEVEL_NUMBER")))
A simpler alternative: Ascendants()
If it is acceptable to also include the CurrentMember
itself, then we can even simplify this quite a bit by using the Ascendants()
function. The Ascendants()
function takes a single member as argument, and returns the set of ancestor members as well as the argument member, all the way up to the member at the top level. With Ascendants()
, our expression would simply be:
Ascendants(<Hierarchy Name>.CurrentMember)
We will continue this post using
Ancestors()
, but the approach can be easily applied to Ascendants()
instead.
Converting the set of Ancestor members to a scalar value
However we can't just use the bareAncestors()
expression in our query, nor can we use it as is to create a calculated member. That's because Ancestors()
returns a set of members, while we want something that we can retrieve from the cells in the result.
As an initial attempt we can try and see if we can use the
SetToStr()
function, which takes a set as argument and returns a string representation of that set. So We can now finally write a query and it would look something like this:
WITH
MEMBER Measures.[Time Ancestors]
AS SetToStr(
Ancestors(
Time.CurrentMember,
CInt(
Time.CurrentMember.Properties("LEVEL_NUMBER")
)
)
)
SELECT Measures.[Time Ancestors] ON COLUMNS
, Time.Quarters.Members ON ROWS
FROM SteelWheelsSales
The results might look something like this:
Time | Time Ancestors |
---|---|
QTR1 | {[Time].[2003], [Time].[All Years]} |
QTR2 | {[Time].[2003], [Time].[All Years]} |
QTR3 | {[Time].[2003], [Time].[All Years]} |
QTR4 | {[Time].[2003], [Time].[All Years]} |
QTR1 | {[Time].[2004], [Time].[All Years]} |
QTR2 | {[Time].[2004], [Time].[All Years]} |
QTR3 | {[Time].[2004], [Time].[All Years]} |
QTR4 | {[Time].[2004], [Time].[All Years]} |
QTR1 | {[Time].[2005], [Time].[All Years]} |
QTR2 | {[Time].[2005], [Time].[All Years]} |
- The string representation returned by
SetToStr()
looks very much like how one would write the set down as a MDX set literal (is that a thing? It should be :-). While entirely correct, it does not look very friendly and it is certainly quite a bit different from what our GUI tools present to end-users - The order of the members. It looks like
Ancestors()
returns the members in order of upward traversal, that is to say, from lower levels (=higher level numbers) to higher levels (=lower level numbers). The fancy way of saying that is that our result suggests thatAncestors()
returns its members in post-natural order. We'd like the members to be in natural order, that is to say, in descending order of level (from high to low). Note that the specification ofAncestors()
does not specify or require any particular order. So in the general case we should not rely on the results to be in any particular order.
Order()
is general purpose function that can be used to order the members of a set by an arbitrary numberic expression.Hierarchize()
is designed to order members into hierarchical order, that is to say, the members are ordered by their level number and by the level number of any of its ancestors.
Order()
is a nice and reasonable choice, Hierarchize()
seems tailored exactly for our purpose so that's what we'll use:
WITH
MEMBER Measures.[Time Ancestors]
AS SetToStr(
Hierarchize(
Ancestors(
Time.CurrentMember,
CInt(
Time.CurrentMember.Properties("LEVEL_NUMBER")
)
)
)
)
SELECT Measures.[Time Ancestors] ON COLUMNS
, Time.Quarters.Members ON ROWS
FROM SteelWheelsSales
And the result will now look like:
Time | Time Ancestors |
---|---|
QTR1 | {[Time].[All Years], [Time].[2003]} |
QTR2 | {[Time].[All Years], [Time].[2003]} |
QTR3 | {[Time].[All Years], [Time].[2003]} |
QTR4 | {[Time].[All Years], [Time].[2003]} |
QTR1 | {[Time].[All Years], [Time].[2004]} |
QTR2 | {[Time].[All Years], [Time].[2004]} |
QTR3 | {[Time].[All Years], [Time].[2004]} |
QTR4 | {[Time].[All Years], [Time].[2004]} |
QTR1 | {[Time].[All Years], [Time].[2005]} |
QTR2 | {[Time].[All Years], [Time].[2005]} |
To represent members we have a handful of options: we could use the member name, or we could use its key value; however since we want to expose the information to the user, the only thing that seems really suitable is the member caption. Placing that data into the GUI is an implementation detail that need not concern us too much at this point. Let's say we aim to return the data as a comma-separated list, and assume our GUI tool is capable of extracting that data and then use it to render a result.
The function that seems to suit our need is called
Generate()
. There are actually 2 forms of Generate()
, which frankly seem to suit completely different purposes. The form we're interested in is functionally quite similar to the MySQL-builtin aggregate function GROUP_CONCAT()
.
The arguments to this form of
Generate()
are:
- A set. This is where we'll feed the
Ancestors()
expression in - A string expression. This expression will be evaluated for each member in the set passed as first argument. We'll use this to retrieve the caption of the current member of the hiearchy for which we're generating the ancestors list.
- A separator.
Generate()
concatenates the result values returned by the string expression passed as second argument, and this string will be used to separate those values. Since we want to obtain a comma-separated list, we'll use the literal string", "
for this argument.
Putting it together, our query becomes:
WITH MEMBER Measures.[Time Ancestors] AS Generate( Hierarchize( Ancestors( Time.CurrentMember, CInt( Time.CurrentMember.Properties("LEVEL_NUMBER") ) ) ) , Time.CurrentMember.Properties("MEMBER_CAPTION") , "," ) SELECT Measures.[Time Ancestors] ON COLUMNS , Time.Quarters.Members ON ROWS FROM SteelWheelsSalesAnd the result:
Time | Time Ancestors |
---|---|
QTR1 | All Years,2003 |
QTR2 | All Years,2003 |
QTR3 | All Years,2003 |
QTR4 | All Years,2003 |
QTR1 | All Years,2004 |
QTR2 | All Years,2004 |
QTR3 | All Years,2004 |
QTR4 | All Years,2004 |
QTR1 | All Years,2005 |
QTR2 | All Years,2005 |
Ancestor()
function in the previous post.
2 comments:
I usually solve context problems by simply using better labels. In this case QTR1 could be Q1/2003, Q1/2004, Q1/2005.
Time Time Ancestors
Q1/2003 All Years,2003
Q2/2003 All Years,2003
Q3/2003 All Years,2003
Q4/2003 All Years,2003
Q1/2004 All Years,2004
Q2/2004 All Years,2004
Q3/2004 All Years,2004
Q4/2004 All Years,2004
Q1/2005 All Years,2005
Q2/2005 All Years,2005
Now the context is implicit, labels are still short, now you don't need ancestors to understand the MDX results.
But end users always to ask for ancestors, so my response is just press show ancestors button. This kind of options are efficient and allow me to keep the MDX simple, also important for maintenance reasons, maintenance, maintenance, maintenance.
Fortunately with Mondrian you can define alternate hierarchies, for example Quarter / Year / Month / Date to simplify user navigation when they try to compare quarters. In this example QTR1 label has sense.
Fabian,
of course it's good advice to make descriptive labels.
But IMO, from a query tool's perspective, the problem does not go away. There will always be schemas with labels that need context of ancestors and the query tool better be able to deal with it.
Because of this particular use case, I'm not really that worried about maintenance. The query tool should simply have a powerful generator that can figure out how to translate the user's requests into a query that can be executed and retrieve all required information.
If the query tool decides it needs to generate extra elements that are not explicitly present in the user's model, then so be it.
Final consideration - looking at the kind of labels you propose, I'm wondering whether this is really that good. It seems to me there is a risk of cluttering the view, because you're going to need a lot of text in some cases. In addition, creating these full context labels will have impact on the DWH loading process and add a maintenance cost on that end. I'm not saying that's bad, but I'm just pointing out that your solution requires maintenance too, just in another place.
It seems to me the ideal situation would be if the query tool would be able to access all properties associated with a member and allow the user to define precisely how to generate the labels, using whatever member name, caption, property or even properties from related members (at ancestor or maybe even child level). Maybe this is a bit much for a ad-hoc analysis tool but for a report writer this seems like a hard requirement.
Post a Comment