**TL;DR**: A Year-to-Date solution based on a

**SUM**()

window function is simple to code and maintain as well as efficient to execute.
This as compared to a number of alternative implementations, namely a self-`JOIN`

(combined with a `GROUP BY`

), a subquery, and a `UNION`

(also combined with a `GROUP BY`

).
Note: this is the 4th post in a series.

- For sample data and setup, please see the 1st post in this series.
- For a solution based on a self-

and**JOIN**

, please find the 2nd post in this series.**GROUP BY** - For a solution based on a subquery, please find the 3rd post in this series.

### Using a `UNION`

We mentioned how the solution with the

`JOIN`

relates each row of the main set with a subset of "extra" rows over which the YTD value is calculated by aggregating over the key of the main set using a `GROUP BY`

.
It may not be immediately obvious, but we can also use the SQL

`UNION`

(or rather, `UNION ALL`

) operator to generate such a related subset.
Just like with the `JOIN`

-solution, this can then be aggregated using `GROUP BY`

.
An example will help to explain this:selectSalesYear , SalesMonth ,sum(SumOfSalesAmount)asSumOfSalesAmount ,sum(YtdOfSumOfSalesAmount)asYtdOfSumOfSalesAmountfrom(selectSalesYear , SalesMonth , SumOfSalesAmount , SumOfSalesAmountasYtdOfSumOfSalesAmountfromSalesYearMonthunion all-- JANUARYselectSalesYear , SalesMonth + 1 -- february ,null, SumOfSalesAmountfromSalesYearMonthwhereSalesMonth = 1union allselectSalesYear , SalesMonth + 2 -- march ,null, SumOfSalesAmountfromSalesYearMonthwhereSalesMonth = 1union all... and so on, all for JANUARY ...union allselectSalesYear , SalesMonth + 11 -- december ,null, SumOfSalesAmountfromSalesYearMonthwhereSalesMonth = 1union all-- FEBRUARYselectSalesYear , SalesMonth + 1 -- march ,null, SumOfSalesAmountfromSalesYearMonthwhereSalesMonth = 2union all... and so on, for the rest of FEBRUARY, and then again for MARCH, APRIl, MAY, JUNE, JULY, AUGUST, SEPTEMBER, OCTOBER... -- NOVEMBERselectSalesYear , SalesMonth + 1 -- december ,null, SumOfSalesAmountfromSalesYearMonthwhereSalesMonth = 11 ) Salesgroup bySalesYear , SalesMonth

#### Duplicating metric-data so it contributes to the following months

In the top of the

`UNION`

we simply provide the entire resultset from `SalesYearMonth`

, reporting the `SumOfSalesAmount`

-metric as is, but also copying it to `YtdSumOfSalesAmount`

.
The other parts of the `UNION`

are used to selectively duplicate the data for the `SumOfSalesAmount`

-metric into the `YtdSumOfSalesAmount`

, so that its data contributes to the `YtdSumOfSalesAmount`

for all following months.
We start by grabbing january's data by applying the condition that demands that the

`SalesMonth`

equals 1:-- JANUARY. This is done for a total of 11 times, each time adding 1, 2, 3 and so on - all the way up to 11 - to theselectSalesYear , SalesMonth + 1 -- february ,null, SumOfSalesAmountfromSalesYearMonthwhereSalesMonth = 1union all... repeat to duplicate january's data into february, march, and so on all the way up to december...

`SalesMonth`

attribute.
This ensures january's data, as captured by the condition in the `WHERE`

clause, is reported also in february (`SalesMonth + 1`

), march (`SalesMonth + 2`

), and so on, all the way up to december (`SalesMonth + 11`

).
After the string of

`UNION`

s for january appear more parts to duplicate the data also for february and all following months:
-- FEBRUARY
**select**SalesYear , SalesMonth + 1 -- march ,

**null**, SumOfSalesAmount

**from**SalesYearMonth

**where**SalesMonth = 02

**union all**... repeat to duplicate february's data into march, april, and so on all the way up to december... . Again, february's data is selected by applying the condition

, and this happens now 10 times, again adding a number to thewhereSalesMonth = 2

`SalesMonth`

so it is duplicated to march, april, may, all the way up to december - in other words, all months following february.
What we thus did for january and februry is repeated for march, april, and so on for all months up to november. November is the last month we need to do this for: November's data still needs to be copied to december, but as december is the last month, that data only needs to be counted in december itself.

While it may seem wasteful to duplicate all this data, it really is not that different in that respect from the other solutions we've seen so far. It's just that now it's really in your face, because there is a pretty direct correspondence between the SQL code and the data sets that are being handled. The

`JOIN`

and subquery solutions hande similar amounts of data, it's just achieved with way less code, and in a far more implicit manner.
#### Original metric is retained

Note that the original metric also computes correctly, because the parts of the union only duplicate the data to the YTD column. The union parts that duplicate the data to the subsequent months select a

`NULL`

for the original metric.
So the data for the original metric is never duplicated, and thus retains its normal value.
#### Drawbacks to the `UNION`

-solution

The main drawback to the

`UNION`

-solution is its maintainability.
A lot of code is required, far more than for any of the methods we have seen so far.
Despite the indiviual patterns are simple (condition to get one month, adding a number to project that data to a future month), it is suprisingly easy to make a little mistake somewhere
We just argued that this solution is not so much different from the

`JOIN`

solution, but that remark only pertains to how the calculation is performed.
The `JOIN`

-solution generates the data it operates upon dynamically and declaratively; the `UNION`

solution does this statically and explicitly.
This is also why it is impossible to generalize ths approach for any arbitrary `JOIN`

: YTD is a special case, because we know exactly how often we should duplicate the data as this is dictated by the cyclical structure of our calendar.
### Next installment: Solution 4 - window functions

In the next installment we will present and discuss a solution based on a window functions.

## No comments:

Post a Comment