Sunday, December 18, 2011

JAX-WS: WSDL Fetched During Service Proxy Creation

I did some webservice client programming again lately, and was surprised to learn that by default, the JAX-WS service client library fetches the WSDL from its original location again each time a service proxy is created. I am pretty sure that not everyone is aware of that, and that this leads to production system problems once WSDLs become unavailable.

The JAX-WS FAQs mention runtime binding as the reason behind this, but as the WSDL is not supposed to change, I wonder why it isn't simply mapped into a generated Java class, or added as a local resource. Anyway, this can be done manually using the JAX-WS wsgen tool (command-line option "-wsdl"), adding the local WSDL file to your JAR file, and then setting the service proxy's WSDL_LOCATION constant to:
new URL(
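The snippet above is cut off here, but the general idea can be sketched as follows. Note that the service name, namespace and WSDL resource path below are all made up for illustration; adapt them to the artifacts wsgen/wsimport generated for your service.

```java
import java.net.URL;
import javax.xml.namespace.QName;

// Sketch: point the generated service proxy at a WSDL packaged inside the
// JAR, instead of letting JAX-WS re-fetch it from its original location.
public class LocalWsdl {
    // hypothetical namespace and service name -- take these from your WSDL
    static final QName SERVICE_QNAME =
            new QName("http://example.com/stock", "StockQuoteService");

    static URL localWsdlLocation() {
        // resolves to the copy of the WSDL shipped inside the JAR
        return LocalWsdl.class.getResource(
                "/META-INF/wsdl/StockQuoteService.wsdl");
    }

    public static void main(String[] args) {
        // with a generated stub, proxy creation would then look like:
        // StockQuoteService service =
        //         new StockQuoteService(localWsdlLocation(), SERVICE_QNAME);
        System.out.println(SERVICE_QNAME);
    }
}
```

This way, proxy creation no longer depends on the original WSDL URL being reachable.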

Wednesday, December 14, 2011

Chickens Can Do Your Job

And because chickens can do a developer's job, most interviewers don't verify a candidate's skills by asking technical questions. That might hurt the candidate's feelings after all! Some interviewers might have heard of studies stating that, within a group of people with identical formal qualifications, the top-notch folks can outperform their counterparts by a factor of 10 in efficiency (that is, coding productivity, code quality, correctness and performance, whatever measure you choose). More than that, even people with lower formal qualifications can be up to a factor of 10 more efficient than the other end of the spectrum.

But does that mean most companies will be searching for top-notch developers only, and once they have found them, pay them accordingly, treat them with respect, and give them tasks that match their skills? No! Developers are developers are developers. They are interchangeable, and what they don't know now, they will simply learn on the job. It's that easy!

Tuesday, November 22, 2011

COM Programming

Luckily, with all those ATL macros and wizards available, or with COM-specific annotations in the case of .NET, COM development has become a lot easier over the years. Still, I would recommend that everyone who is programming COM, or even just reusing an existing COM component, at least once walk through the complete process of manually implementing and registering a COM component (including the ClassFactory and hand-written registration code), to find out what's going on under the hood. A very nice step-by-step tutorial can be found at

Monday, October 03, 2011

Tips For Lightning-Fast Insert Performance On SQL Server

How to achieve lightning-fast insert performance on SQL Server:

1. Increase ADO.NET BatchSize to eliminate unnecessary network roundtrips, e.g. SqlDataAdapter.UpdateBatchSize when working with DataAdapters, or SqlBulkCopy.BatchSize when applying SqlBulkCopy.

2. Limit the number of indices on the target table to what is really essential for query performance. The SQL Server Index Tuning Wizard can come in handy here.

3. Place indices on master table columns referenced by the target table's foreign keys, so that referential integrity can be checked quickly.

4. Choose the target table's clustered index wisely, so that inserts won't lead to clustered index node splits. Usually an identity column (AKA "autoinc") is a good choice. If you don't have autoinc primary keys, consider introducing a new identity column just for the sake of making it your clustered index.

5. Let the client insert into a temporary heap table first (that is, a table that has no clustered index, or ideally no index at all). Then issue one big "insert-into-select" statement to push all that staging table data into the actual target table. The "insert-into-select" statement must contain an "order-by" clause which guarantees ordering by the target's clustered index.

6. Apply SqlBulkCopy.

7. Decrease transaction logging by choosing the bulk-logged recovery model, or by setting SqlServer trace flag 610.

8. If your business scenario allows for it, place a table lock before inserting. This makes any further locking unnecessary, and is an especially viable option for staging tables as described in (5). SqlBulkCopy also supports table locks via SqlBulkCopyOptions.

9. Place database datafile and logfile on two physically separated devices, e.g. on two disks or two SAN LUNs configured for different spindles.

10. Prefer server-side processing (e.g. by means of "insert-into-select") to client-to-server roundtrips wherever possible.

11. This is probably the fastest insert approach I have ever heard of (taken from this sqlbi whitepaper, see the final paragraph): Create a new heap table just for the current insert batch, SqlBulkCopy the data into that table, create a suitable clustered index on the table, and then add the table as a new partition to an existing partitioned table.

12. Check the execution plan when inserting, and make sure it does not contain anything unexpected or dispensable that might slow down your inserts, e.g. UDF calls during check constraint execution, heavyweight trigger code, referential integrity checks without index usage, or indexed view updates.
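Tips 5 and 8 combined can be sketched in T-SQL like this (table and column names are made up for illustration, and dbo.Orders is assumed to be clustered on OrderId):

```sql
-- staging heap: no clustered index, cheap to insert into
CREATE TABLE dbo.OrdersStaging (
    OrderId  INT           NOT NULL,
    Customer NVARCHAR(50)  NOT NULL,
    Amount   DECIMAL(18,2) NOT NULL
);

-- clients bulk-insert into the heap first ...

-- ... then one server-side statement pushes everything into the target,
-- ordered by the target's clustered index, under a single table lock
INSERT INTO dbo.Orders WITH (TABLOCK) (OrderId, Customer, Amount)
SELECT OrderId, Customer, Amount
FROM dbo.OrdersStaging
ORDER BY OrderId;

TRUNCATE TABLE dbo.OrdersStaging;
```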

For additional database tuning tips, you may want to have a look at:

Sunday, September 04, 2011

How To Make A Borderless WPF Window Movable

Say you have a borderless WPF window (WindowStyle="None"). The question arises: how can you still move/drag the window around on the desktop, now that there simply is no window title bar anymore?

It's actually pretty simple:

MouseDown += delegate { DragMove(); };

Thanks to Marlon for this hint.

Saturday, September 03, 2011

Is Your WPF ProgressBar Eating Up Unnecessary CPU Cycles?

When your WPF ProgressBar is causing lots of CPU usage, even when it is no longer visible, this is most likely caused by the IsIndeterminate property, which, when set to true, keeps the animation running in the background.

One solution is to bind the IsIndeterminate property to the same underlying value as the Visibility property, for example something like this:

<ProgressBar IsIndeterminate="{Binding IsBusy}"
             Visibility="{Binding IsBusy,
                          Converter={StaticResource VisibilityConverter}}"/>

The IsBusy property of the bound DataSource can then represent whatever condition is suitable in this scenario.

Thanks to Adam for this tip!

Friday, July 01, 2011

Confessions Of An IT Manager

As I stumbled back out into the light of the car-park, I wondered, for the life of me, how the IT industry had got in such a mess that unqualified and untrained people were in responsible positions within organisations all over the country, managing the databases that are the lifeblood of the enterprise.

What would happen if the same state of affairs infected Surgery, so that there was a chance of being operated on by someone whose only qualification was that he'd cut open his teddy bear as a child, using his 'My Little Doctor' kit, but managed to bullshit his way into a job armed with a bogus CV and a string of false references.

From: "Confessions of an IT Manager"

Thursday, June 30, 2011

Free EBooks On Redgate.Com

RedGate offers several free eBooks on SqlServer and other topics (some of them only announced and yet to become downloadable) on their bookstore. I had a look at "SQL Server Execution Plans" by Grant Fritchey, which seems to be pretty good. So now we can probably forgive RedGate for turning Reflector into a commercial product, after promising the opposite two years before, right?

Thursday, February 17, 2011

When DataTable.Select() Is Slow, Use DataView.FindRows()

ADO.NET's DataTable.Select() comes with a runtime complexity of O(N). This turns into a problem when the DataTable contains a lot of rows, and DataTable.Select() is invoked many times. It's pretty obvious why - just like in LinqToObjects-queries - there is no indexed lookup. DataTable.Select() needs to scan through all rows, and compare all values.

What I usually ended up doing in those cases was to loop over the DataTable once, fill a Dictionary using the Select's lookup criteria as the entry's key and the DataRow as its value, and use that Dictionary for lookups from then on. This works fine, but it clutters the code with all those Dictionaries, which, in addition, must be synchronized manually each time the underlying DataTable changes.
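For illustration, here is that "build a lookup dictionary once, then query it N times" pattern, sketched in Java (the Row type and the (name, city) key are hypothetical stand-ins for DataRow and the Select criteria):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowIndex {
    public static class Row {
        public final String name;
        public final String city;
        public Row(String name, String city) { this.name = name; this.city = city; }
    }

    private final Map<String, List<Row>> index = new HashMap<>();

    public RowIndex(List<Row> rows) {
        // one O(N) pass over all rows ...
        for (Row r : rows) {
            // '\0' separator avoids key collisions like ("ab","c") vs ("a","bc")
            index.computeIfAbsent(r.name + "\0" + r.city,
                                  k -> new ArrayList<>()).add(r);
        }
    }

    // ... then constant-time lookups instead of repeated full scans
    public List<Row> find(String name, String city) {
        return index.getOrDefault(name + "\0" + city, List.of());
    }
}
```

The manual-synchronization drawback mentioned above applies here just the same: the index silently goes stale when the underlying rows change.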

The DataView class offers an alternative. By defining the DataView's Sort property in accordance with the search criteria, DataView.FindRows() allows indexed lookups with the usual binary search tree runtime complexity of O(log(n)) (yes I know, Hashtable lookups might even execute at O(1), but O(log(n)) will do here). For example:

// do this just one time
DataView view = new DataView(table);
view.Sort = "name, city";

// do this N times
DataRowView[] res = view.FindRows(
new object[] { "Earp", "Tombstone" });

I should probably mention that DataTables and DataRows actually do offer indexed searching in two special cases:

(1) When there is a primary key defined on a DataTable, DataTable.Rows.Find() (or DataTable.FindBy[KeyColName]() on typed DataSets) allows for indexed primary key lookup.
(2) When there is a child relation defined on a DataTable, DataRow.GetChildRows() (or DataRow.Get[ChildRelName]() on typed DataSets) will return the parent row's children via indexed lookup as well.

Sunday, February 06, 2011

When SQL IS Broken

I think it was Mr. Benedict in "Last Action Hero", who said:

"Eliminate all logical solutions to a problem - and the illogical becomes invariably true."

I received a distress call the other day, concerning a legacy application locking a SqlServer 2000 database, together with some profiler tracefiles.

Looking at the traces, I indeed saw how certain statements were blocked for minutes, until other connections committed their transactions. Nothing extraordinary, one might think: of course uncommitted data updates lead to data locks, which will block ensuing statements that try to read or write the same data. Maybe those transactions were just taking too long. But here comes the catch: they did not contain any Inserts/Updates/Deletes, just Selects (the only reason for the transaction's existence being that a programmer some fifteen years ago had decided to activate implicit transactions). And with Isolation Level "Read Committed", and those queries lasting only a few milliseconds, there should not have been any long-living locks.

Unfortunately, SqlServer 2000 tracing does not support "Blocked process report" events. This tracing event was only introduced in SqlServer 2005, and delivers in detail which SQL statements block which other statements (you can find it in the "Errors and Warnings" event category. Hint: don't forget to activate 'show advanced options', and set an appropriate 'blocked process threshold' via sp_configure()). Of course one can always have a look at the Management Studio Activity Monitor, but the statement information there is not so fine-grained (all it displays is the last statement within a batch, not necessarily the one causing the blocking). And I hadn't received any activity monitor data for my analysis.

So when I took a closer look at the tracefile, all I could do was scan through all of those Lock Acquired / Lock Released events. This is a really tedious task, not only because there might be hundreds of locks being acquired and released at any scope from row to table, but most of all because you only see which locks get acquired, not which locks fail to be acquired (and hence lead to blocking).

The statement being blocked turned out to be a single-row update. I saw a key lock and a page lock being acquired, and then things stood still. So what I did was look for an identical statement within the trace, but one which was not blocked. Luckily, I found one, and on closer examination I noticed that the next lock acquisition would have been a page-level lock mode upgrade from "Intent Update (IU)" to "Intent Exclusive (IX)". That's also not surprising, just standard procedure when a row is about to be updated.

The only thing that would collide with such a lock upgrade would have been an existing "Shared (S)" lock on the same page (or a row within that page). But how could there be a shared lock, when no other query accessing the same data was currently running (taking into consideration Isolation Level "Read Committed")? OK, some minutes before, a query accessing the same table had been executed, and its transaction was still open (due to implicit transactions), but that statement's shared locks should have been gone by then. OK, it didn't make a lot of sense to use a transaction for a read-only operation (again: Isolation Level "Read Committed", not "Repeatable Read"), and applying a client-side cursor was also kind of weird. But the cursor had been closed at this point, so its locks should have been gone. Legacy applications sometimes act weird, and we should still expect proper database locking.

I built a repro testcase, and sure enough, could reproduce it on SqlServer 2000. There were several shared locks on the query's connection, even once the query's cursor had been closed. In comparison, running the same testcase on SqlServer 2005 resulted in no dangling IS locks, hence no blocking. Well, it's about time anyway to upgrade ten-year-old database systems. You see, "SQL(Server 2000) WAS broken" in this case.

Friday, January 28, 2011

SqlServer: When querying previously inserted rows within the same transaction is slow...

... this might have to do with table statistics not having been updated yet!

I was banging my head against this problem today: one and the same query took minutes when run within a transaction, and only seconds when run right after transaction commit. What's special about this case is that the query reads from a table that underwent massive inserting in the statement just before (insert-by-select).

As I was testing in an isolated environment without any other connections, it was pretty clear from the start that this was not a locking issue. I suspected a query optimizer problem from the beginning, and tried "option(recompile)", rephrasing the query (it was pretty complex, three levels of subqueries and the like) and even indexed views, but to no avail.

I was about to dig into a huge profiled transaction batch execution plan (the same query's post-transaction execution plan had looked fine), when I found this posting, and the scales fell from my eyes. My immediate guess: this was not an index issue (as the original poster had presumed), it was an "outdated statistics" issue! See, every index has statistics attached, and rebuilding an index automatically updates its statistics as well.

But instead of calling some heavyweight "dbcc dbreindex" as explained in the posting (index rebuilds belong in a maintenance plan, not in an application's runtime SQL batch!), I simply invoked "update statistics" on the table in question, right between the insert statement and the query. Voila! The query optimizer is back on track, and now executes the same query about a hundred times faster. I should note, though, that "update statistics" also comes with a certain overhead (two to three seconds in our case).
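The resulting batch structure looks roughly like this (table and column names are made up for illustration):

```sql
BEGIN TRAN;

-- massive insert-by-select into the table the follow-up query reads
INSERT INTO dbo.Fact (CustomerId, Amount)
SELECT CustomerId, Amount FROM dbo.Staging;

-- refresh statistics manually, since the automatic update does not
-- necessarily kick in within the same transaction
UPDATE STATISTICS dbo.Fact;

-- the complex follow-up query now gets up-to-date cardinality estimates
SELECT CustomerId, SUM(Amount)
FROM dbo.Fact
GROUP BY CustomerId;

COMMIT;
```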

I was aware that SqlServer decides on its own when to update statistics. Seems as if 50,000 new rows and a complex follow-up query aren't necessarily considered sufficient reason for a statistics update, at least not when they are part of the same transaction.

Wednesday, January 26, 2011

Is "Select Distinct" a Code Smell?

From my experience, "Select distinct" indeed is a code smell (and I am not alone). Why would I fetch something, just to get rid of it again in the next step? There are situations when "distinct" makes sense, but they are rare...

In 9 out of 10 cases where I encountered a "Select distinct", the programmer was trying to remove duplicate rows caused by joins along multiple to-many associations (AKA cartesian products). This not only is bad programming style, it might also kill performance.

There is always a way to rephrase such a query in order to avoid join duplicates, e.g. by replacing joins with exists-clauses, using subselects within the select-clause, derived tables in the from-clause, etc.
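A minimal sketch of the exists-clause rewrite (table and column names are made up):

```sql
-- the smell: duplicates produced by the join, then removed again
SELECT DISTINCT c.CustomerId, c.Name
FROM Customer c
JOIN OrderHeader o ON o.CustomerId = c.CustomerId;

-- rephrased: no duplicates produced in the first place
SELECT c.CustomerId, c.Name
FROM Customer c
WHERE EXISTS (SELECT * FROM OrderHeader o
              WHERE o.CustomerId = c.CustomerId);
```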

Sunday, January 23, 2011

Hibernate Performance Tuning Tips

Top 10 Hibernate Performance Tuning Recommendations:

(1) Avoid join duplicates (AKA cartesian products) due to joins along two or more parallel to-many associations; use Exists-subqueries, multiple queries or fetch="subselect" (see (2)) instead - whatever is most appropriate in the specific situation. Join duplicates are already pretty bad in plain SQL, but things get even worse when they occur within Hibernate, because of unnecessary mapping workload and child collections containing duplicates.

(2) Define lazy loading as the preferred association loading strategy, and consider applying fetch="batch-size" or "subselect" rather than "select". Configure eager loading only for special associations, but join-fetch selectively on a per-query basis.

(3) In case of read-only services with huge query resultsets, use projections and fetch into flat DTOs (e.g. via the AliasToBeanResultTransformer), instead of loading thousands of mapped objects into the Session.

(4) Take advantage of StatelessSessions, HQL Bulk Update and Delete statements, as well as Insert-By-Select (supported by HQL as well).

(5) Set FlushMode to "Never" on Queries and Criteria, when flushing is not necessary at this point.

(6) Set ReadOnly to "true" on Queries and Criteria, when objects returned will never be modified.

(7) Consider clearing the whole Session after flushing, or evict on a per-object basis, once objects are no longer needed.

(8) Define a suitable value for jdbc.batch_size (or adonet.batch_size under NHibernate).

(9) Use the Hibernate Query Cache and Second Level Cache where appropriate (but make sure you are aware of the consequences).

(10) Set hibernate.show_sql to "false" and ensure that Hibernate logging is running at the lowest possible loglevel (also check log4j/log4net root logger configuration).
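As an illustration of (8) and (10), such settings typically end up in hibernate.cfg.xml; the values below are merely plausible examples, not recommendations:

```xml
<!-- hibernate.cfg.xml fragment -->
<property name="hibernate.jdbc.batch_size">50</property>
<property name="hibernate.show_sql">false</property>
<!-- NHibernate equivalent: <property name="adonet.batch_size">50</property> -->
```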

Special Addendum:

NHibernate's 2.x ProjectionList implementation added considerable runtime overhead, namely by invoking some reflection API calls for every resultset row. This can slow down queries with large resultsets.

As those reflection API calls were invariant with regard to the resultset rows being iterated over (as a matter of fact, what these calls actually did was gather resultset column schema information), and I needed an immediate fix and did not have time to wait for an NHibernate patch, I forked my own ProjectionList from NHibernate's code, performed all those reflection calls once, and re-used the result from that point on. I doubt this is still an issue under NHibernate 3.x (and I think Hibernate under Java was never affected), but if you are using NHibernate 2.x and your profiling tool shows lots of CPU cycles being burnt in ProjectionList, you might want to give this approach a try.

All of this advice is Hibernate-specific. For additional database tuning tips, you may want to have a look at:

More Hibernate postings:

Fetching Strategies In Hibernate

One thing to always remember is that the fetching strategies - as defined within the Hibernate mapping configuration - only affect data retrieval via Session.get(), Session.load() and association navigation, plus Criteria queries as long as they do not override the fetch behavior via Criteria.setFetchMode(). HQL, in contrast, works in a WYSIWYG manner: whatever fetching is defined by the query will be done, no more and no less, with the mapping configuration being ignored as far as fetching is concerned.
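For example, a hypothetical HQL query that join-fetches an association regardless of what the mapping declares might look like this (entity and property names are made up):

```
from Customer c
    left join fetch c.orders
where c.city = :city
```

Here the orders collection is loaded eagerly for this one query, even if the mapping defines it as lazy.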
