Thursday, February 17, 2011

When DataTable.Select() Is Slow, Use DataView.FindRows()

ADO.NET's DataTable.Select() comes with a runtime complexity of O(N). This becomes a problem when the DataTable contains many rows and DataTable.Select() is invoked many times. The reason is obvious: just as in LINQ-to-Objects queries, there is no indexed lookup. DataTable.Select() has to scan all rows and compare all values.

What I usually ended up doing in those cases was to loop over the DataTable once, fill a Dictionary using the Select's lookup criteria as each entry's key and the DataRow as its value, and use the Dictionary for lookups from that point on. This works fine, but it clutters the code with all those Dictionaries, which in addition must be synchronized manually whenever the underlying DataTable changes.
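For illustration, here is a minimal sketch of that manual approach (table and column names are invented, chosen to match the FindRows example below): the index is built once in O(N), and each subsequent lookup is O(1) - but the Dictionary goes stale as soon as the table changes.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class DictionaryLookupDemo
{
    // Build the index once -- O(N) -- keyed by the lookup criteria.
    // Must be rebuilt manually whenever the underlying table changes.
    public static Dictionary<string, List<DataRow>> BuildIndex(DataTable table)
    {
        var index = new Dictionary<string, List<DataRow>>();
        foreach (DataRow row in table.Rows)
        {
            string key = row["name"] + "|" + row["city"];
            if (!index.TryGetValue(key, out List<DataRow> rows))
                index[key] = rows = new List<DataRow>();
            rows.Add(row);
        }
        return index;
    }

    static void Main()
    {
        var table = new DataTable("persons");
        table.Columns.Add("name", typeof(string));
        table.Columns.Add("city", typeof(string));
        table.Rows.Add("Earp", "Tombstone");
        table.Rows.Add("Holliday", "Glenwood Springs");

        // O(1) lookup instead of table.Select("name = 'Earp' AND city = 'Tombstone'")
        var index = BuildIndex(table);
        Console.WriteLine(index["Earp|Tombstone"].Count); // 1
    }
}
```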

The DataView class offers an alternative. By setting the DataView's Sort property to match the search criteria, DataView.FindRows() allows indexed lookup with the usual binary search tree runtime complexity of O(log(N)) (yes, I know, a Hashtable lookup might even run in O(1), but O(log(N)) will do here). For example:

// do this just one time
DataView view = new DataView(table);
view.Sort = "name, city";

// do this N times
DataRowView[] res = view.FindRows(
    new object[] { "Earp", "Tombstone" });

I should probably mention that DataTables and DataRows actually do offer indexed searching in two special cases:

(1) When a primary key is defined on a DataTable, DataTable.Rows.Find() (resp. DataTable.FindBy[KeyColName]() on typed DataSets) allows for indexed primary key lookup.
(2) When a child relation is defined on a DataTable, DataRow.GetChildRows() (resp. DataRow.Get[ChildRelName]() on typed DataSets) will return the parent row's children by indexed lookup as well.
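A minimal sketch of case (1) - column names invented for the example; setting the PrimaryKey property is what gives DataTable its internal index:

```csharp
using System;
using System.Data;

static class PrimaryKeyFindDemo
{
    public static DataTable BuildTable()
    {
        var table = new DataTable("persons");
        DataColumn id = table.Columns.Add("id", typeof(int));
        table.Columns.Add("name", typeof(string));
        table.PrimaryKey = new[] { id };  // enables the internal index
        table.Rows.Add(1, "Earp");
        table.Rows.Add(2, "Holliday");
        return table;
    }

    static void Main()
    {
        DataTable table = BuildTable();

        // indexed primary key lookup -- no table scan involved
        DataRow row = table.Rows.Find(2);
        Console.WriteLine(row["name"]); // Holliday
    }
}
```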

Sunday, February 06, 2011

When SQL IS Broken

I think it was Mr. Benedict in "Last Action Hero", who said:

"Eliminate all logical solutions to a problem - and the illogical becomes invariably true."

I received a distress call the other day, concerning a legacy application locking a SqlServer 2000 database, together with some profiler tracefiles.

Looking at the traces, I indeed saw how certain statements were blocked for minutes, until other connections committed their transactions. Nothing extraordinary, one might think: of course uncommitted data updates lead to data locks, which block ensuing statements that try to read or write the same data. Maybe those transactions were just taking too long. But here comes the catch: they did not contain any Inserts/Updates/Deletes, just Selects (the only reason for the transaction's existence being that a programmer some fifteen years ago had decided to activate implicit transactions). And with isolation level "Read Committed", and those queries lasting only a few milliseconds, there should not have been any long-lived locks.

Unfortunately, SqlServer 2000 tracing does not support "Blocked process report" events. This trace event was only introduced in SqlServer 2005, and shows in detail which SQL statements block which others (you can find it in the "Errors and Warnings" event category; hint: don't forget to activate 'show advanced options' and set an appropriate 'blocked process threshold' via sp_configure). Of course, one can always have a look at the Management Studio Activity Monitor, but the statement information there is not as fine-grained (all it displays is the last statement within a batch, not necessarily the one causing the blocking). And I hadn't received any Activity Monitor data for my analysis.
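On SqlServer 2005 or later, the sp_configure steps mentioned above look like this (the threshold value of 5 seconds is just an example):

```sql
-- enable advanced options, then emit a blocked process report
-- whenever a statement has been waiting longer than the threshold
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold', 5;  -- seconds
RECONFIGURE;
```

Once configured, the "Blocked process report" event can be selected in a Profiler trace under "Errors and Warnings".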

So when I took a closer look at the tracefile, all I could do was scan through all of those Lock Acquired / Lock Released events. This is a really tedious task, not only because there may be hundreds of locks being acquired and released at every scope from row to table, but most of all because you only see which locks get acquired, not which locks fail to be acquired (and hence lead to blocking).

The statement being blocked turned out to be a single-row update. I saw a key lock and a page lock being acquired, and then things stood still. So what I did was to look for an identical statement within the trace, but one which was not blocked. Luckily, I found one, and on closer examination I noticed that the next lock acquisition would have been a page-level lock mode upgrade from "Intent Update (IU)" to "Intent Exclusive (IX)". That's also not surprising, just standard procedure when a row is about to be updated.

The only thing that collides with such a lock upgrade would have been an existing "Shared (S)" lock on the same page (or on a row within that page). But how could there be a shared lock, when no other query accessing the same data was currently running (taking into consideration isolation level "Read Committed")? OK, some minutes before, a query accessing the same table had been executed, and its transaction was still open (due to implicit transactions), but that statement's shared locks should have been gone by then. OK, it didn't make a lot of sense to use a transaction for a read-only operation (again: isolation level "Read Committed", not "Repeatable Read"), and applying a client-side cursor was also kind of weird. But the cursor had been closed at this point, so its locks should have been gone. Legacy applications sometimes act weird, but we should still be able to expect proper database locking.

I built a repro testcase, and sure enough could reproduce the behavior on SqlServer 2000: there were several shared locks left on the query's connection, even after the query's cursor had been closed. In comparison, running the same testcase on SqlServer 2005 resulted in no dangling IS locks, hence no blocking. Well, it's about time to upgrade ten-year-old database systems anyway. You see, "SQL(Server 2000) WAS broken" in this case.
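For the record, a sketch of the spirit of that repro (table, column, and cursor names are invented; my actual testcase used a client-side cursor, which a server-side T-SQL cursor only approximates):

```sql
SET IMPLICIT_TRANSACTIONS ON;  -- mimics the legacy application's setting

-- the SELECT implicitly opens a transaction that is never committed
DECLARE c CURSOR FOR
    SELECT name, city FROM persons WHERE city = 'Tombstone';
OPEN c;
FETCH NEXT FROM c;
CLOSE c;
DEALLOCATE c;

-- the implicit transaction is still open here; on SqlServer 2000 the
-- cursor's shared/IS locks lingered with it, blocking a later UPDATE
-- on the same page from another connection, whereas SqlServer 2005
-- released them. Inspect the connection's locks with:
EXEC sp_lock @@SPID;
```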