Sunday, April 29, 2007

Hints And Pitfalls In Database Development (Part 3): Database Programming Requires More Than SQL Knowledge

First of all, "knowing SQL" is a rather broad term. SQL basics can be taught within a day, but gaining real in-depth SQL know-how, including all kind of database-specific extensions, might take years of practice.

Then there are always plenty of ways of how to build a solution in SQL, and only a small subset of those are really good ones. It is important to be aware of the implications of certain SQL constructs - which kind of actions the database engine has to undertake in order to fulfill a given task.

Let me provide a little checklist - those are things every database programmer should know about in my opinion:

  • ANSI SQL and vendor-specific additions (syntax, functions, procedural extensions, etc). What can be done in SQL, and when should it be done like that.
  • Database basics (ACID, transactions, locking, indexing, stored procedures, triggers, and so on).
  • Database design (normalization plus a dose of pragmatism, referential integrity, indices, table constraints, stuff like that).
  • Internal functioning (for instance B-trees, transaction logs, temp databases, caching, statistics, execution plans, prepared statements, file structure and requirements for physical media, clustering, mirroring, and so on).
  • How do certain tasks impact I/O, memory and CPU usage.
  • Query optimizer: what it can do, and what it can't do.
  • Error handling, security (for example how to avoid SQL injection, ...).
  • Database tools: profiling, index tuning, maintenance plans (e.g. backup and reindexing), server monitoring.
  • Interpretation of execution plans.

Previous Posts:


Thursday, April 05, 2007

Hints And Pitfalls In Database Development (Part 2): Let The Database Enforce Data Constraints

Letting the database enforce data constraints might sound obvious, but as a matter of fact I had to listen to some people just a few months ago who advocated checking for referential integrity within business code instead of letting the database do so, because "foreign keys are slow" - I mean geez, what kind of nonsense is that?! I don't know, they probably forgot to define their primary keys as well, hence didn't have any index for fast lookup (*sarcasm*).

No other entity is better suited to safeguard data integrity than the database. It's closest to the data, it knows most about the data, and it's the single point every chunk of data has to pass through before being persisted.

And I am not only talking about primary keys and foreign key constraints. I put all kinds of checks on my databases, for example unique compound indices whenever I know that a certain combination of column values can only occur once. Or table constraints that enforce certain rules on the data which are being checked by the database on each insert and update. Triggers are another possibility, but care must be taken - they might be slower than table constraints.

Setting up constraints usually is quite easy, easier than implementing the same rules in business code. Of course performance matters, and it is important to balance costs and benefits. I do not recommend to code complex SQL-statements within constraint checks - they should be limited to simple logical expressions.

During application development, the database will scream immediately when data integrity rules are being violated. In the unlikely case that such a programming mistake slips through into production code, the customer will be confronted with some error message, but his data is unendangered, any you will find out about it immediately and can provide a fix at no time - not months or years later, when data corruption would have occurred, which then might have been impossible to repair. This has saved us on several occasions.

Plus you never know if other clients are going to be manipulating data in the future, clients that might not be aware of the existence of those data rules. Data normally lives longer than application code.

Previous Posts: