Friday, January 28, 2011

SqlServer: When querying previously inserted rows within the same transaction is slow...

... this might have to do with table statistics not having been updated yet!

I was banging my head on this problem today: One and the same query took minutes when run within a transaction, and only seconds when run right after transaction commit. What's special about that case is, that the query reads from a table, that undergoes massive inserting just the statement before (insert-by-select).

As I was testing in an isolated environment without any other connections, it was pretty clear from that start, that this was not a locking issue. I suspected a query optimizer problem from the beginning, and tried "option(recompile)", rephrasing the query (it was pretty complex, three levels of subqueries and the like) and even indexed views, but to no avail.

I was about to dig into a huge profiled transaction batch execution plan (the same query's post-transaction execution plan had looked fine), when I found this posting, and scales fell from my eyes. My immediate guess: this was not an index issue (as the original poster had presumed), this was an "outdated statistics" issue! See, every index has statistics attached, and rebuilding an index automatically updates its statistics as well.

But instead of calling some heavyweight "dbcc dbreindex" as explained in the posting (index rebuild should go to a maintenance plan, but not be placed in an application runtime SQL batch!), I simply invoked "update statistics" on the table in question, right between the insert statement and the query. Voila! Query optimizer is back on track, and now executes the same query about a hundred times faster. I should note though, that "update statistics" also comes with a certain overhead (two to three seconds in our case).

I was aware that SqlServer decides on its own when to update statistics. Seems as if 50,000 new rows and a complex follow-up query isn't necessarily considered sufficient reason for a statistics update, at least not when being part of the same transaction.

Wednesday, January 26, 2011

Is "Select Distinct" a Code Smell?

From my experience, "Select distinct" indeed is a code smell (and I am not alone). Why would I fetch something, just to get rid of it again in the next step? There are situations when "distinct" makes sense, but they are rare...

In 9 out of 10 cases when I encountered a "Select distinct", the programmer tried to remove duplicate rows caused by joins along multiple too-many associations (AKA cartesian product). This not only is bad programming style, it might also kill performance.

There is always a way to rephrase such a query in order to avoid join duplicates, e.g. by replacing joins with exist-clauses, using subselects within the select-clause, derived tables in the from-clause, etc.

Sunday, January 23, 2011

Hibernate Performance Tuning Tips

Top 10 Hibernate Performance Tuning Recommendations:

(1) Avoid join duplicates (AKA cartesian products) due to joins along two or more parallel to-many associations; use Exists-subqueries, multiple queries or fetch="subselect" (see (2)) instead - whatever is most appropriate in the specific situation. Join duplicates are already pretty bad in plain SQL, but things get even worse when they occur within Hibernate, because of unnecessary mapping workload and child collections containing duplicates.

(2) Define lazy loading as the preferred association loading strategy, and consider applying fetch="batch-size" or "subselect" rather than "select". Configure eager loading only for special associations, but join-fetch selectively on a per-query basis.

(3) In case of read-only services with huge query resultsets, use projections and fetch into flat DTOs (e.g. via AliasToBean-ResultTransformer), instead of loading thousands of mapped objects into the Session.

(4) Take advantage of StatelessSessions, HQL Bulk Update and Delete statements, as well as Insert-By-Select (supported by HQL as well).

(5) Set FlushMode to "Never" on Queries and Criteria, when flushing is not necessary at this point.

(6) Set ReadOnly to "true" on Queries and Criteria, when objects returned will never be modified.

(7) Consider clearing the whole Session after flushing, or evict on a per-object basis, once objects are not longer needed.

(8) Define a suitable value for jdbc.batch_size (resp. adonet.batch_size under NHibernate).

(9) Use Hibernate Query-Cache and Second Level Caching where appropriate (but go sure you are aware of the consequences).

(10) Set hibernate.show_sql to "false" and ensure that Hibernate logging is running at the lowest possible loglevel (also check log4j/log4net root logger configuration).

Special Addendum:

NHibernate's 2.x ProjectionList implementation added considerable runtime-overhead, namely by invoking some reflection API calls for every resultset row. This might slow down queries with large resultsets.

As those reflection API calls were invariant in regard to the resultset rows being iterated over (as a matter of fact, what these calls actually did was to gather resultset column schema information), and I needed an immediate fix and did not have time to wait for any NHibernate patch, I forked my own ProjectionList from NHibernate's code, did all those reflection calls once, and re-used the result from that point on. Now I doubt this is still an issue under NHibernate 3.x (and I think Hibernate under Java was never affected), but if you are using NHibernate 2.x and your profiling tool shows lots of CPU cycles being burnt in ProjectionList, you might want to give this approach a try.


All those advices are Hibernate-specific. For additional database tuning tips, you may want to have a look at:




More Hibernate postings:

Fetching Strategies In Hibernate

One thing always to remember is that the fetching strategies - as defined within Hibernate mapping configuration - only affect data retrieval via Session.get(), Session.load() and association navigation, plus Criteria queries, as long as they do not override fetch behavior via Criteria.setFetchMode(). HQL in contrast works in a kind of WYSIWYG-manner - whatever fetching is defined by the query will be done, not more, and not less, with mapping configuration being ignored as far as fetching is concerned.

More Hibernate postings: