Book: "Pentaho Kettle Solutions", Matt Casters, Roland Bouman, & Jos van Dongen, Wiley 2010 Book: "Pentaho Solutions", Roland Bouman & Jos van Dongen, Wiley 2009

Wednesday, January 27, 2010

Easter Eggs for MySQL and Kettle

To whom it may concern,

A MySQL stored function to calculate easter day


I uploaded a MySQL forge snippet for the f_easter() function. You can use this function in MySQL statements to calculate easter sunday for any given year:

mysql> select f_easter(year(now()));
+-----------------------+
| f_easter(year(now())) |
+-----------------------+
| 2010-04-04 |
+-----------------------+
1 row in set (0.00 sec)

Anonymous Gregorian algorithm


To implement it, I simply transcribed the code of the "Anonymous Gregorian algorithm" from wikipedia's Computus article.

You might ask yourself: "how does it work?". Frankly, I don't know. Much like a tax form, I treat the calculation as a black box. But, it's wikipedia, so it must be right, right?

A Javascript snippet to calculate easter day


I also transcribed the algorithm to javascript, so I could use it in Kettle (a.k.a. Pentaho Data Integration). Of course, nothing should stop you from using it for another environment, such as a webpage.

I don't have a proper place to host that code, so I'm listing it here:

//Script to calculate Easter day
//according to the "Anonymous Gregorian algorithm"
function easterDay(year) {
var a = year % 19,
b = Math.floor(year / 100),
c = year % 100,
d = Math.floor(b / 4),
e = b % 4,
f = Math.floor((b + 8) / 25),
g = Math.floor((b - f + 1) / 3),
h = (19 * a + b - d - g + 15) % 30,
i = Math.floor(c / 4),
k = c % 4,
L = (32 + 2 * e + 2 * i - h - k) % 7,
m = Math.floor((a + 11 * h + 22 * L) / 451),
n = h + L - 7 * m + 114;
return new Date(year,
Math.floor(n / 31) - 1,
(n % 31) + 1);
}

easter = easterDay(year);

To use this in your kettle transformations, create a stream with an field of the Integer type called year. The year field should take on the value of some year. In the step, create one output field of the Date type to take on the value of the easter script variable. (For usage outside kettle, just use the easterDay() function as you see fit)

Nice, but so what?


The thought may have crossed your mind: "So what, who cares - why should I ever want to know when it's easter day?"

Apparently, if you think like that, you don't like eggs very much. That's ok - I don't blame you. But I happen to like eggs, and people in the egg business like people that like eggs like me so they can sell them more eggs. In fact, they like selling eggs so much, that it makes a big difference to them whether their business intelligence reports say: "On March 22, 2008, we sold 10 times more eggs than on February 22 and May 22 of the same year" as compared to "In 2008, on the day before Easter, we only sold half the amount of eggs as compared to the day before Easter in 2009".

In order to report these facts, special events and holidays like easter are stored in a date dimension. (I wrote about creating a localized date dimension, a date dimension that speaks your language some time ago)

So there you go: you could use these solutions in order to build a date dimension that understands easter. The nice thing about easter is that it can be used to derive a whole bunch of other Christian holidays, like good friday, ascension, and pentecost, and in many western countries, these will be special days with regard to the normal course of business. I leave all these as an exercise to the reader, but trust me - calculating easter is the key to a solving a lot of these problems.

19 comments:

water outbreaks said...

Side note:

As an exercise left to the other readers, the computation of the Orthodox Easter may be equally or perhaps more important depending on the country's religious make up.

Roland Bouman said...

Hi water outbreaks,

pardon my ignorance - I wasn't aware of such a date. The Christian holidays I referred to can all be calculated by simple addition / substraction of days. You're implying the calculation of Orthodox Easter is not trivial, and I am the last one to doubt it. Should I have been aware of any hairy issues, I would have mentioned them.

Of course, if you can spare us a moment, feel free to elaborate what Orthodox Easter is, and if you have a way to calculate it, it'd be great if you could share that too.

Thanks in advance, and kind regards,

Roland.

Gerry Narvaja said...

Easter is calculated based on the lunar calendar and the Spring equinox (Northern Hemisphere), which means you should be able to use a similar formula for Passover, Ramadan and other important religious holidays.

Maybe you should start a forge.mysql.com page on religious calendars.

My $.02

Roland Bouman said...

Gerry, yes, indeed, easter is based on the lunar cycle. But from what I understand after reading a bit on the subject, there's also some convention and synchronization with the ecclestial calendar going on.

Anyway, I did look for calculations for other holidays as well, and found at least some that seem reasonably doable for Passover and Rosh Hashannah:

http://quasar.as.utexas.edu/BillInfo/ReligiousCalendars.html

crorella said...

Hi Roland,

nice post, let me say that I had 0 experience with Kettle, but after buying your (and Jos) book I learned to work with this tool and also many other things about Pentaho. I even implement this algorithm transformation... it took me less than 2 mins! (http://bit.ly/b1292s)

This post give me the idea to complement my date dimension with other religious holidays, there many in the form "The first day of month's nn week".

I'm looking forward for the next book about Kettle :)

Regards

Matt Casters said...

I couldn't help myself and posted another version using Kettle UDJC:

http://www.ibridge.be/?p=180

Roland Bouman said...

@crorella: thanks for the kind words! great you managed to put it together in kettle so fast - I'm glad, that was exactly my intention.

@Matt: hehe, as I was writing it, I was contemplating doing a Janinio, Calculator and Formula example, but resisted - and rightly so, now we can use your work instead :)

Thanks!

Rafa said...

This came at the right time! I was doing some changes to my time_dim at the time I saw this. Easter was the only special day left, and it felt on my lap :D

Thanks a lot Roland.

Roland Bouman said...

Hi Rafa,

glad this was helpful for you :)

Can you tell us what other special dates you recognize? I'd be interested in hearing about them, and maybe build a little toolkit for these things.

Rafa said...

Hi Roland,

It would be very nice to have a toolkit for special/holidays. But the special days and holidays are always very dependent to what country/religion you are building information for. Most of the floating holidays are Easter related (as for my knowledge) and in my case (I'm working with Portuguese and Spanish data), besides all the static holidays, I just needed to calculate Easter and three other holidays: First one is 47 days before easter, second one is 2 days before easter and the other is 60 days after easter.

I thing it is an interesting Idea, but I'm not seeing a way to do that without having to make cases for a considerable big set of countries.

Cheers!

Roland Bouman said...

Rafa,

Indeed. The ones I care about are either fixed on a certain date (new years day, christmas, queens day, liberation day) with a regular recurrence, or the christian holidays that can be derived from easter: good friday (friday before easter), ascension (40 days after easter) and pentecost (10 days after ascension).

What's with 47 days before easter and 60 days after easter?

Regarding locality: taking that into account was exactly the idea. It is doable to manually set up special cases for one country/locality. The challenge would be to come up with something that is readily usable even if you're doing an ETL gig outside your country.

Rafa said...

Roland,

The movable holidays I need for Portugal are:
Carnival - 47 days before Easter
Good Friday - 2 days before Easter
Corpus Christi (feast) - 60 days after Easter

More info:
http://en.wikipedia.org/wiki/Public_holidays_in_Portugal

As for the toolkit, it could be built to work with country related information based on a local parameter and using some kind of DB, web service or any kind of properties file. The information might be incrementally added by the users and shared to all.

In my case, I have a SQL procedure to handle time dimension but it would be very nice to have a ETL step (on kettle) to handle this task and automatically set special days.

But I think that in order to make a complete toolkit to populate time dimensions, a way to define Business special days and boundaries should be considered. Depending on the brand/business in case, they are very subjective too.

In my opinion, your idea is very good and it is of good help for every ETL designer for sure.

Cheers

John Dzilvelis said...

Hi Roland. The Orthodox Easter date is always after Passover. The relevence is not just for the Egg company either. The 40 days leading up to Orthodox Easter includes a fast, especially from meat.

Anonymous said...

Roland: in the USA at least, 47 days before Easter is "Mardi Gras" - aka "Fat Tuesday" - which is the last day before Lent begins on "Ash Wednesday"

Previous to this, I've been using the Date::Easter perl module, but obviously when not using perl that's a bit of a hassle. Having the SQL function makes things quite a bit easier. Thanks!

Flavio Daher said...

Man, this is so useful here in Brazil! Obtaining the exact date of the Carnival holiday, 47 days before Easter, is a not so trivial calculation (as stated above).

This information is very for resource planning and business projections since this holiday increases local tourism and causes a central economic impact for the season.

Thanks for the tip!

Lazer epilasyon Adana said...

Regarding locality: taking that into account was exactly the idea. It is doable to manually set up special cases for one country/locality. The challenge would be to come up with something that is readily usable even if you're doing an ETL gig outside your country.

Roland Bouman said...

Lazer epilasyon Adana,

I think it's not really worth waiting until you find a generic, good-for-all-cases approach. What I think does make sense is create a toolkit of utilities that help one tackle common or at least repeatedly occurring cases.

If a few people would share their stuff, i think such a toolkit could grow quite quickly.

Mahesh Chowdary said...

Hi All,
I have Table called Test with some fields like
Id NOTNULL,Name NOTNULL,Amount Null.
when i inserted data into test table like ID and Amount,the trigger has to be raised.and i want to display user msg like name should must eneter.how to achieve this by using triggers

Roland Bouman said...

Mahesh,

please take off-topic questions to public forums.