Book: "Pentaho Kettle Solutions", Matt Casters, Roland Bouman, & Jos van Dongen, Wiley 2010 Book: "Pentaho Solutions", Roland Bouman & Jos van Dongen, Wiley 2009

Wednesday, September 14, 2005

Connect by problem: Yep, it's solved alright

Those that have read the past few of my posts on this blog will probably think by now that I'm obsessed with hierarchies. Well, that's right! So, without much ado, here's another related post.

Do you remember that 'slight' CONNECT BY problem I wrote about in my post of 18 august? I've got some wonderful news: it really is solved. Some revision of the Oracle SQL Reference (9.2 - Release 2) tells me that Oracle has a paricular construct that deals with it.

A short recapitulation. Oracle has a special SQL construct that allows you to write a recursive query. This is used to query hierarchies that are modelled using a adjacency list model. Implemented in a relational database, you'd need a table that stores each node in the hierarchy as a row. The table would be set up to have a self-referencing relationship, so that each row would be able to have a 'pointer' to the row that represents the parent node in the hierarchy:

create table employee(
id number not null
, person_id number not null
, formationplace_id number not null
, boss_employee_id number
, constraint pk_employee
primary key(id)
, constraint fk_employee_boss
foreign key(boss_employee_id)
references employee(id)

Here, for a particular employee, the record that represents that employee would have the boss_employee_id hold the value of the id column of the employee record that represents it's boss. So, boss_employee_id points to id, identifying the parent record in the hierarchy.

Now, with the usual SQL constructs, you can't write a query that selects an entire tree; at least, you cannot solve it for an arbitrary number of levels. This is because this structure is recursive. You'd need a recursive query construct to solve it.

Of course, you could use PL/SQL, or an external language with either recursive or iterative possibilities to traverse an entire tree, but these are certainly not 'usual SQL constructs'. (Anyway, you could only use such a tool to traverse the tree, and not to define the resultset for use in other queries)

Now, Oracle has a special language construct to deal with this type of structure, CONNECT BY...PRIOR...START WITH. To select a node and all of it's descendants, we'd do:

select *
from employee e
connect by
prior = e.boss_employee_id
start with = :id_of_interest

So, this tells oracle to get the employee with id equal to :id_of interest first. This represent the root of the tree we are retrieving. then, the records are fetched that have the boss_employee_id equal to the id of the first node. This yields the direct children of our first node. Now, for each of these, we could repeat the process to fetch their direct children until no more rows are found.

The direction -top to bottom- can be reversed by reversing the operands in the CONNECT BY PRIOR clause:

select *
from employee e
connect by
prior e.boss_employee_id =
start with = :id_of_interest

The initial node is the same, but the query will now retrieve it's ancestors rather than it's descendants.

Now, let's stick with the query that get's us all the descendants. What we'd like to be able to do is to print out the hierarchy in some sort of tree view. To be sure, what we mean is a depth-first ordering of the nodes. So, first our initial node, than after that it's first child, and then the children of that node...
When you give it a couple of cracks, you will this discover that you can't. At least not without another 'special' language construct. The construct I'm referring to is the ORDER SIBLINGS BY clause. This just keeps the siblings together, order by some expression:

select *
from employee e
connect by
prior = e.boss_employee_id
start with = :id_of_interest
order siblings by

This is of course much nicer than the makeshift solution that i came up with. It's easier to read, and it does not suffer from the major flaw in my solution, which was not suitable for an arbitrary depth.

So, I'm glad that one's solved. I guess I should've read that SQL Reference better.

1 comment:

Mallesh said...

This was really helpful for me, when I was in real trouble. Thanks for the blog