I reread the article by Mike Hillyer, especially the bit that handles inserting new nodes. I like the function I wrote because it uses a single insert statement to handle both the rearrangement of the tree as well as the insertion of the new node. This also relieves me from the task of explicitly locking tables.
On the other hand, the wee lines Mike uses to perform an insert seem so elegant, that I can't help thinking I'm doing something horribly wrong. But then again, in Mike's article, the user that's inserting the nodes somehow magically knows where to make room in the tree. The whole point of using a hybrid model like I want to, is to allow for easy inserts (and updates for that matter, although I'd still have to implement it) as well as straighforward, efficient extraction. On the other hand, what I really want is to encapsulate all the tree maintenance within triggers, but I know already that this approach will not be easy to realize in MySQL.
Think of it, I would want to build triggers that handle insert,update and delete, but in order to insert or to update or to delete, I would have to rearrange the tree and run some updates as a side effect. This would of course fire my update trigger, which would have to be so smart as to prevent reentry. The fact that mysql only supports row triggers makes it all very cumbersome.
In Oracle or MSSQL I'd probably solve this by writing a view with INSTEAD OF triggers. These triggers would perform the necesary actions in the base table, and everything would be hunky dory.
The table would not be granted to anyone, but the view would. The view wouldn't even expose the left and right attributes upon which the nested set model relies. Anyway, I can't do this in MySQL, not now anyway, so I'm still willing to use other soltions.
Now, I have not yet investigated it thoroughly, but I think that, besides the hybrid solution, there's another way to handle hierarchies that is about as intuitive as the adjacency list model, and yet nearly as manageable as the nested set model when it comes to extracting data from it. I don't know if there's an official name for it, but I will call it the 'ordered level' model until someone prompts me for something better than that.
create table ordered_level(
id int unsigned not null auto_increment
, scope int unsigned not null
, position int unsigned not null
, level int unsigned not null
, label varchar(32) not null
, constraint pk_ordered_level
primary key (id)
, constraint uk_ordered_level
unique (
scope
, position
)
);
insert into ordered_level(scope,position,level,label)values
(1,1,1'Clothing')
,(1,2,2'Footwear')
,(1,5,2'Suits')
,(1,3,3'Shoes')
,(1,4,3'Socks')
,(2,1,1'Food')
,(2,2,2'Beverages')
,(2,5,2'Pastry')
,(2,3,3'Alcoholic')
,(2,4,3'Non-Alcoholic')
Like the nested set model, the ordered level model may be seen as a serialization of the depth first traversal of the tree underlying the hierarchy. The difference with the nested set as described by Joe Celko and Mike Hillyer is that instead of maintaining a
left_node
and a right_node
property, we now rember only the position
and level
properties. The
position
is quite like left_node
in the nested set in that it stores a depth first traversal path through the tree. The level
is needed to add depth to the tree. Together, level
and position
are sufficient to maintain the hierarchy.The
scope
property is there to allow for multiple trees. So, the position
reflects a depth first traversal within the tree identified by scope
. The advantadge to me previous attemt is that it's now a lot easier to keep trees separate. This is reflected in updates: instead of having a single insert potetially update the entire forest, the worst what can happen here is only an entire tree needs to be updated.Querying is not very hard compared to the original nested set, and some queries are actually easier:
/**
* printing an entire tree
*
*/
select scope
, id
, position
, level
, concat(
repeat(' ',4*(level-1))
, label
) label
from ordered_level ol
order by scope
, position
/*
* printing a tree starting with an arbitrary node
*
*/
select r.scope scope
, d.position position
, r.id rid
, r.level rlevel
, r.label rlabel
, d.id did
, d.level dlevel
, concat(
repeat(' '
, (d.level-1)*4)
, d.label
) label
from ordered_level r
inner join ordered_level d
on r.scope = d.scope
and r.position <= d.position
and r.level <= d.level
where d.position < (
select coalesce(min(position),d.position+1)
from ordered_level n
where n.scope = r.scope
and n.level = r.level
and n.position > r.position
)
order by r.scope
, r.level
, r.position
, d.position
For the latter ,there's a variant that does not need a correlated subquery:
select r.scope scope
, d.position position
, r.id rid
, r.level rlevel
, r.label rlabel
, d.id did
, d.level dlevel
, concat(
repeat(' '
, (d.level-1)*4)
, d.label
) dlabel
from (select r.id id
, r.scope scope
, r.position position
, r.level level
, r.label label
, min(n.position) next_position
from ordered_level r
left join ordered_level n
on r.scope = n.scope
and r.level = n.level
and r.position < n.position
group by r.id
, r.scope
, r.position
, r.level
, r.label
) as r
inner join ordered_level d
on r.scope = d.scope
and r.position <= d.position
and r.level <= d.level
where (r.next_position > d.position
or r.next_position is null)
order by scope
, r.level
, r.position
, position
Getting all the leafs is harder than in the nested set though:
/**
* Getting all the leafs
*
*/
select *
from ordered_level ol
where not exists (
select null
from ordered_level p
where p.scope = ol.scope
and p.level > ol.level
)
Getting all the roots is a similar query:
/**
* Getting all the roots
*
*/
select *
from ordered_level ol
where not exists (
select null
from ordered_level p
where p.scope = ol.scope
and p.level < ol.level
)
When we don't want to make the strucure redundant by combining it with the adjacency list model, we must resort to something like this to retrieve a parent or the children:
/**
*
* Getting the parent
*/
select c.label
, p.label
from ordered_level c
left join ordered_level p
on c.scope = p.scope
and c.position > p.position
and c.level > p.level
where p.position = (
select max(pp.position)
from ordered_level pp
where pp.scope = c.scope
and pp.position < c.position
and pp.level < c.level
)
or p.position is null
/**
*
* Getting the children
*
*/
select r.id
, r.level
, r.label
, d.id
, d.level
, d.position
, d.label
from ordered_level r
inner join ordered_level d
on r.scope = d.scope
and r.position <= d.position
and r.level <= d.level
where d.position < coalesce((
select min(n.position)
from ordered_level n
where n.scope = r.scope
and n.level = r.level
and n.position > r.position
),d.position+1)
and d.level = r.level + 1
and r.id = 18
Getting all ancestors is similar:
/*
*
* getting the ancestors
*
*/
select a.id
, a.position
, a.level
, concat(
repeat(' ',4*(a.level-1))
, a.label
) label
from ordered_level c
inner join ordered_level a
on c.scope = a.scope
and c.position >= a.position
and c.level >= a.level
where a.position = (
select max(aa.position)
from ordered_level aa
where aa.scope = c.scope
and aa.level = a.level
and aa.position <= c.position
group by aa.level
)
and c.id = 16
For those who are wondering: no, I did not just make this model up. I picked it up working with Microsoft Project. When you save a project in the database, the task list is stored according to such a datamodel. I've known that structure for quite some time, but only now I see the resemblance to the nested set, in that this too relies on linearization of the depth first traversal of the hierarchy.