Failing The Turing Test: March 2007

Monday, March 19, 2007

Hints And Pitfalls In Database Development (Part 1): Missing Indices

I have been involved in a lot of database tuning lately. This has partly to do with the fact that some of our databases simply outgrew the originally anticipated data dimensions. Furthermore I sometimes have to deal with application modules from "external contributors".

So I am trying to sum up some of my experiences on database development over the years in a little mini-series. Most of those things should be quite obvious to many developers, others might be piece of news for one or the other. I will fall back into some SqlServer terminology at times, but many principles should apply to any major database.

So let me start with the number one reason for bad database performance:

Missing Indices

It's just amazing how creating indices tends to be forgotten again and again. I don't know, it might have to do with the fact that on 4th dimension platforms like Microsoft Access, where a lot of folks come from, they did not have to worry too much about indexing. Databases were kind of smaller there, and the database schema wizard most likely made any foreign key an index by default anyway.

I often hear the argument that "indices are going to slow down our updates". Well I don't know what is worse, slowing down updates by a factor of 0.1, or slowing down queries a factor of 100. The numbers vary of course. And it is true, indices that will not be applied by the query optimizer hence won't lead to any performance gain should not be introduced in the first place. The best way to find out is to have a look at the query execution plans, and see if the indices are being actually being used.

So in general, foreign keys and other query criteria fields are candidates for indices, with the exception of attributes with small value ranges like booleans or a tiny set of numbers, large varchar columns and tables with a small number of rows.

But sometimes even "golden rules" of when to create or not to create indices can turn out to be wrong. One of those rules is to avoid indices on columns with sparse value distribution, like a status attribute with 5 different values. But I experienced a case when the query optimizer made completely wrong assumptions about row counts for one of those status values. I added an index on this column, which implicitly induced statistic maintenance, and that in turn helped the optimizer to make the right decisions.

That being said, it is undoubtedly important to avoid excessive index creation, as this can lead to performance penalties during inserts, updates and deletes.

In addition, composite indices can lead to major speedups as well, when applied to tuples of columns which are typically jointly queried (note: leftmost index columns must all appear in a query for the index to be applied).

The good news is: you are not alone. There are plenty of tools which help finding a good indexing strategy. I start up SqlServer's Database Engine Tuning Advisor (formerly known as Index Tuning Wizard) quite frequently, and in addition I have several schema check scripts at hand which - among other things - look for missing indices, e.g. on foreign key columns.

And: Indices must be rebuilt frequently. This serves two purposes: (1) The index nodes then have a fill factor for optimal balance between read and write performance, and (2) Index rebuilding updates index statistics as well. Outdated statistics can be a major cause for slow query performance.

Follow-Ups:

Tuesday, March 13, 2007

Automatic CompSci Research Paper Generator

SCIgen is a program that generates random Computer Science research papers, including graphs, figures, and citations. While the content is completely gibberish, this did not stop SCIgen's creators from submitting to and presenting the generator's work at conferences. Conferences with such low submission standards that they accepted the papers.

Those folks did not even bother to take a look at the auto-generated slides before giving their talk, they just went along - absolutely priceless!

Monday, March 05, 2007

Windows For Warships 3.11

News of the day: British Royal Navy T45 destroyers to be equipped with Windows 2000.

From the text:

Windows platforms may be troublesome to maintain, but most civilian sysadmins simply wouldn't believe the resources the navy can throw at problems. A present-day Type 42 destroyer carries at least four people who have absolutely nothing else to do but care for the ship's command system. As of just a few years ago, this was still a pair of antique 24-bit, 1MHz machines each with about 25KB of RAM.

There may also be perfectly valid criticisms to be made regarding Windows usability. When triggering missile decoys with seconds to spare, one doesn't need a superfluous pop-up box saying "Do you want to use soft kill?" with "Confirm" and "Cancel" buttons. But this kind of thing isn't going to faze you if you're used to entering instruction sets such as "PE L5414.10N L00335.67E R6000 TMDA [INJECT]" from memory without backspace or delete. During combat, mind. The one group of users to whom Windows 2000 might look pretty marvellous are RN warfare operators.

Friday, March 02, 2007

Why Great Coders Get Paid Far Too Little

Kevin Barnes picks up a similar topic I was talking about the other day - the question why great coders get paid far too little. Very insightful posting, just as some statements from the comments section. Let me quote:

Also, the industry keeps complaining about a labor shortage, while employees complain about a job shortage. I finally got to the bottom of this. Industry is finding a shortage of good people, whereas crappy people can’t get jobs. Universities, with dropping enrollment in engineering programs, are lowering standards so that more people can train to be engineers. Poor engineers enter the job market, can't get jobs, and when they do, they lower the overall pay range. As a result, good people see low wages and a job shortage, and go into other fields, increasing the problem.

Great coders do not fit in typical organizational hierarchies. Because the hierarchy rarely recognizes the value produced by higher quality developers until they are gone.

I've been in this market for over 20 years. What I noticed is that in 90% of the cases, the manager has no clue who is good and who's mediocre. Mostly, he criteria is based on how each one advertise himself.

The difference between programmers, doctors and lawyers is that if a doctor or a lawyer is mediocre you will notice it pretty soon. With the programmers, only time will show the truth.