Saturday, January 01, 2005

Mediocre Programming And The Lack Of A Well-Defined Test Plan

The German federal labor office is facing a software-"MCA" ("maximum credible accident") those days. Their new unemployment benefit software solution - a software project which received enormous attention lately, mainly due its heavily criticized time schedule (IBM had backed out, doubting its realizability) and questionable implementation - seems to fail miserably on its bank account formatting function: Leading zeros are being appended instead of prepended (e.g. account number "123456" is being formatted as "12345600" instead of "00123456"), which results in erroneous transfers.

Hundreds of thousands of unemployed are not receiving their benefits. The responsible federal agency had already declared victory as things had finally seemed to work - but disaster stroke only two days later.

Hard to imagine how this is possible - this functionality must be at the center of every test plan. The coding error itself is horrible, no doubt (never let mediocre programmers touch your crown jewels), but human failure is as certain as anything in software development. The testing process has to ensure that such errors will never slip into production code.

Somehow this reminds of an incident that happened to me years ago. We had let a college intern write some amount formatting function (this was in the days of Java 1.0, way long before java.text.NumberFormat). The code was clumsy and hard to read, but it seemed to work (and we had some time pressure, as always) - so we gave green light to merge it into our production code branch. Some days before installation, I had another look at it (there was just this unpleasant feeling about it, that wouldn't leave me alone), and from some more detailed static analysis I noticed that it would fail on a certain value range: the leading sign was reversed in those cases (from plus to minus, and vice versa). Our test data accidentally didn't contain values within that range. Now that would have been a "maximum credible accident" as well. Luckily we were just on time to fix it. Our test data was adjusted accordingly the very same day.