Saturday, October 16, 2010

Hibernate: Joins Without Associations

I think I have seen this topic come up dozens of times on the Hibernate discussion forums:

How can I join a table in Hibernate when there is no association in the object model?


And the usual answer is:

Use Theta-style Joins in HQL, and it can't be done in Criteria API.


While this is basically true, I would like to elaborate a little bit more on that.

First of all, we should distinguish between joining and join fetching in Hibernate. Are we using the join in order to fetch a child collection or a single entity within the same query, or not? Because if we are not, there is the chance we don't need a Join at all, and a simple Exists-subquery will do as well, because the joined entity only appears in the From-/Where-clauses, not in the Select-clause. And in a correlated subquery, we can connect pretty much everything to the main query. This has other advantages too, e.g. avoiding unwanted join duplicates and achieving better performance.

Thinking about join-fetching, this might also give us an idea (at least by my interpretation) on why Theta-Joins are required in HQL. They basically look like this:

select foo, bar
from FooEntity as foo, BarEntity as bar
where foo.someothercol = 'foo' and
foo.somecol = bar.somecol


So we will end up with a resultset that contains a list, with each list element being an object-array of size 2, with obj[0] representing a FooEntity and obj[1] representing a BarEntity .

If we had by contrast an association between FooEntity and BarEntity, we could right something like:

select foo
from FooEntity foo
inner join fetch foo.Bars
where foo.someothercol = 'foo'


Looking at the query alone makes it pretty clear, that there must be a distinct definition on how the association between FooEntity and BarEntity looks like. This definition, usually representing a foreign key relation, will be applied on join fetches, lazy loading and - maybe most importantly - when updating associations. That is what Hibernate associations are all about. FooEntity.Bars will always contain child-rows following that association definition (either all child-rows, or a subset). This is why we can not choose any other arbitrary join criteria in order to fetch other BarEntities into FooEntity.Bars. It just does not make sense.

"But I want to use Criteria API", I can hear you say. OK, I'll come to that in a moment.

First I should mention at that point, that Hibernate also offers nice support for legacy databases in this context. Associations can be defined even when referencing non-primary keys. Unfortunately the NHibernate-guys have not ported that feature yet.

And I want to note that the restriction on associated rows can be narrowed down even more by providing an additional "With"-expression in de Join-Clause or within the Where-clause in HQL.

But I digress. So we have a solution for HQL, and we know why associations are needed for join fetching, and when there is no join fetching but only joining, we might be better off using an Exists subquery anyway. So what about Criteria API?

Fact is: Joining requires an association under Criteria API. Fact is also: There are join criterias that can not be expressed by an association. So the solution I came up with was: Use a database view as a kind of mediator. This view, let's call it FooBar, contains the join criteria expression in it's From/Where-clause, and selects the primary keys of the two associated tables Foo and Bar:

create view FooBar as
select Foo.Id as FooId,
Bar.Id as BarId
from Foo
inner join Bar on foo.somecol = bar.somecol


The view can as well contain further columns, and even a primary key (constructed on-the-fly), or might be materialized for performance reasons.

It is mapped into the Hibernate object model just as any table would be in case of a 1:1 or 1:N relation (and we can then define associations both to Foo and to Bar), or as a N:M mapping table. Hibernate won't even know (nor care), that it's a view. In our object model, we make those associations read-only (as we rather won't try to update the view), simply by providing public access only to an enumerator/iterator, resp. a getter but no setter for a single associated entity.

We can now join-fetch formerly non-associated entities under Criteria API as well, or project some of their columns into a flat resultset (AKA DTO-fetching). And we are able to provide further criteria in our queries' Where-clauses. Works like charm!

From a design viewpoint, we should not pollute our domain model with too many of those view-associations. What we can do is to provide an additional table mapping for those cases, e.g. by subclassing our main entities.

With all those options at hand, there is very little we can NOT do with Joins in Hibernate. And in that case, there is always native SQL support in Hibernate as well.

More Hibernate postings:

For database tuning tips, you may want to have a look at:


Friday, October 15, 2010

HashMap Vs. TreeMap

This is one of my favorite programming interview questions: "What's the difference between Hash Tables and Binary Search Trees?" (or: "HashMap vs. TreeMap"). Let's see...


Hash Tables (HashMap)Binary Search Trees (TreeMap)
AlgorithmKeys are mapped to values by using hash functions.

Hash functions transform the key to a numeric index (usually by calculating an integer value from the key first, and then applying a "modulo arraysize" operation). This index identifies an array element ("bucket"), where all corresponding values reside (values which all stem from keys with equal hash value).

The key must still be looked up within the bucket, but as hashing algorithms are supposed to ensure a good hash value distribution, hence small bucket size, this lookup is very fast.

The bucket array may have to be resized dynamically when a certain load factor is reached.
Node-based tree data structure, where every node contains one record (key-only, or key and value), and has a left subtree (with smaller keys) and a right subtree (with larger keys); hence keys must be comparable.

Self-balancing trees, like red-black-trees, keep their height as small as possible for fast searching and inserting.
Java ImplementationsHashtable
HashMap
HashSet
TreeMap
.NET ImplementationsHashtable
Dictionary
HashSet
SortedDictionary
SortedList
C++ STL Implementationsstd::unordered_map
std::unordered_set
std::map
SortedNoYes
Range SearchNoYes
Runtime ComplexityInserting: O(1) (normal case), O(N) (worst case, only with bad hash algorithm)

Searching: O(1) (normal case), O(N) (worst case, only with bad hash algorithm)

More details
Inserting: O(log(n)) (when balanced)

Searching: O(log(n)) (when balanced)

More details
Implementation NotesEqual objects must produce the same hash code, however unequal objects need not produce distinct hash codes. Equals() implementations must be reflexive, symmetric, transitive, consistent. More details.

.NET's ValueType.Equals() and GetHashCode() methods are based on reflection, hence slow; you should provide your own implementation in structs.
CompareTo() implementations must be reflexive, symmetric, transitive, consistent. More details.

Thursday, October 14, 2010

The Story Behind Balsamiq Mockups

Peldi Guilizzoni writes on the creation of Balsamiq Mockups. They have indeed built a wonderful tool, elegant in its design, simple in its usage, yet extremely powerful in what can be achieved by applying it. Our UI folks have been using Balsamiq Mockups since summer 2008, but I just happened to read the history of its creation today. What a truly inspiring story - sometimes one has to take risks to fulfill his dream.

Saturday, October 02, 2010

The Best Programmers...

The best programmers are not marginally better than merely good ones. They are an order-of-magnitude better, measured by whatever standard: conceptual creativity, speed, ingenuity of design, or problem-solving ability.
(Randall E. Stross)