Wednesday, January 26, 2011

Is "Select Distinct" a Code Smell?

From my experience, "Select distinct" indeed is a code smell (and I am not alone). Why would I fetch something, just to get rid of it again in the next step? There are situations when "distinct" makes sense, but they are rare...

In 9 out of 10 cases when I encountered a "Select distinct", the programmer tried to remove duplicate rows caused by joins along multiple too-many associations (AKA cartesian product). This not only is bad programming style, it might also kill performance.

There is always a way to rephrase such a query in order to avoid join duplicates, e.g. by replacing joins with exist-clauses, using subselects within the select-clause, derived tables in the from-clause, etc.