Friday, June 29, 2007

More Performance Tuning

I was occupied with some performance tuning on our .NET client/server solution this week. Here are some of the conclusions I have drawn.

One of the bottlenecks turned out to be caused by some missing database indices. This again reminded me that I have hardly ever experienced a case where there were too many indices, but plenty of cases where there were too few. Foreign keys AND where-criteria attributes are prime candidates for indices, and I'd rather have a really good reason before leaving out an index on those columns. I know some folks will disagree and point out the performance penalty for maintaining indices, and that database size will grow. While I agree those implications exist, they are negligible in comparison to queries that run a hundred times faster with an index in the right place.
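
To make this concrete, here is a minimal sketch (table and column names are made up for illustration): one index backing a foreign key, one backing a typical where-criteria column.

-- hypothetical schema: Orders.CustomerId is a foreign key,
-- Orders.OrderDate shows up in where-clauses all the time
create index IX_Orders_CustomerId on Orders (CustomerId)
create index IX_Orders_OrderDate on Orders (OrderDate)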

Also, the reasoning that relatively small tables don't need indices at all simply does not hold up in reality. While an index seek might not be much faster than a scan on a small number of rows, missing an index on a column that is being searched also implies that there are no statistics attached to this column (unless statistics are created manually - and guess how many developers will do so). Hence the query optimizer might be wrong about the data distribution, which can lead to ill-performing query plans. The optimizer might also make different decisions simply because it considers the fact that there is no index on certain columns.
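
If an index really is not wanted, manually created statistics are a one-liner (again with hypothetical names) and at least hand the optimizer the distribution data it otherwise lacks:

-- column-level statistics without the maintenance cost of a full index
create statistics ST_Orders_Status on Orders (Status)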

There are several ways to find out about missing indices: running the Tuning Advisor, checking execution plans for anomalies, and of course some excellent tool scripts from SqlServerCentral and from Microsoft.
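
On SQL Server 2005 and later, those scripts typically boil down to querying the missing-index DMVs; a stripped-down sketch:

-- which columns the optimizer would have liked an index on,
-- and how much impact it estimates
select d.statement as table_name,
       d.equality_columns,
       d.inequality_columns,
       s.user_seeks,
       s.avg_user_impact
from sys.dm_db_missing_index_details d
join sys.dm_db_missing_index_groups g on g.index_handle = d.index_handle
join sys.dm_db_missing_index_group_stats s on s.group_handle = g.index_group_handle
order by s.user_seeks * s.avg_user_impact desc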

A query construct that turns out to be problematic at times under SqlServer looks something like this:

where (table.attribute = @param or @param is null)

While these expressions help to keep SQL statements simple when applying optional parameters, they can confuse the query optimizer. Think about it - with @param holding a value, the optimizer has a restricting criterion at hand, but with @param being null it doesn't. When the optimizer re-uses an execution plan previously compiled for the same statement, trouble is around the corner. This is especially true for stored procedures. Invoking stored procedures with recompile usually solves this issue. But the same problem may occur with client-generated SQL code as well. Options then are forcing an explicit execution plan for the query (tedious, plus if you still have to support SqlServer 2000 you are out of luck), using join hints, or finally re-phrasing the query (in one case we moved some criteria from the where-clause to the join-clause, which led the optimizer back onto the right path again - additional criteria might help, too).
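
As a sketch of the recompile approach (procedure, table and parameter names are made up):

-- compile a fresh plan on every invocation, instead of re-using one
-- that was built for a different @customerId situation
create procedure GetOrders @customerId int = null
with recompile
as
select * from Orders
where (Orders.CustomerId = @customerId or @customerId is null)

-- alternatively on a per-call basis:
-- exec GetOrders @customerId = 42 with recompile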

By the way, I can only emphasize the importance of mass data tests and of database profiling. I don't want my customers to be the ones who find out about my performance problems, and I want to know at any time what is being shoveled back and forth between my application and my database.

I also investigated another performance problem that appeared when reloading a certain set of records with plenty of child data. Convinced that I would just have to profile for the slowest SQL statements, I was surprised to find that they all performed well. In reality, the time got lost within the UI, with too much refreshing going on (the reason being some over-eagerly implemented toolbar wrapping code which re-instantiated a complete Infragistics Toolbar Manager - external code, not ours).

I enjoy runtime tuning. A clear goal, followed through all obstacles, and at the end watching the same application run several times faster - those are the best moments for a developer.

Saturday, June 23, 2007

Pure Ignorance Or The Not-Invented-Here Syndrome?

I just finished listening to this .NET Rocks episode with guest Jeff Atwood of Coding Horror fame. I can really identify with a lot of what he says and what he propagates on his blog. One topic was brought up by Richard Campbell, I think, when he mentioned that nothing hurts him more than watching a $60K or $80K developer working on problems for months or years (never mind whether successfully or not), when there are existing solutions available, either free in the form of open source projects, or as commercial products priced at a few hundred bucks. Jeff wholeheartedly agreed.

They hit the nail on the head. I don't know whether it's pure ignorance or just the infamous Not-Invented-Here Syndrome, but this just seems to happen again and again. For my part, I blame the decision makers approving budgets for such projects just as much as the developers who try to solve problems that have already been solved a hundred times before, and a hundred times better.

Here are some examples:

Friday, June 22, 2007

Bidding For An Atari Pong Videogame

Please don't tell anybody, but I am currently bidding for an Atari Pong video game (the so-called Sears edition) on eBay.



Wednesday, June 20, 2007

eBay'ed An IBM 5150

After weeks of trying, I finally managed to buy an IBM 5150 on eBay (yes, that's the first IBM PC). EUR 100 plus EUR 70 for shipping - but hey, it's worth the price.



I found original IBM DOS 2.0 disks as well, which complete the purchase (IBM DOS 1.1 or 1.0 would have been even better, but those are rarer still).

Friday, June 15, 2007

In Defense Of Data Transfer Objects

In the J2EE world, Data Transfer Objects have been branded as bad design or even as an anti-pattern by several Enterprise Java luminaries. Their bad image also dates back to the time when DTOs were a must at EJB container boundaries, as many developers ended up using Session EJBs in the middle tier which received/passed DTOs from/to the client, tediously mapping them to Entity Beans internally.

Then came Hibernate, which alongside its excellent O/R-mapping capabilities allowed for passing entities through all tiers - attached in case of an Open-Session-In-View scenario, detached in case of a loosely coupled service layer. So no more need for DTOs and that cumbersome mapping approach - this seems to be the widely accepted opinion.

And it's true, DTOs might be a bad idea in many, maybe even most, cases. E.g. it doesn't make a lot of sense to map Hibernate POJOs to DTOs when the POJO and DTO classes would just look the same.

But what if internal and external domain models differ? One probably does not want to propagate certain internal attributes to the client, because they only matter inside the business layer. Or some attributes simply must be sliced off for certain services, because they are of no interest in that context.

What if a system has been designed with a physical separation between web and middle tier (e.g. for security and scalability reasons)? An EJB container hosting stateless session beans is still a first-class citizen in this case. Other services might be published as webservices. It's problematic to transfer Hibernate-specific classes over RMI/IIOP or SOAP. Even if it's possible (as is the case under RMI/IIOP), this necessarily makes the client Hibernate-aware.

While it is true that Hibernate (as well as EJB3 resp. the Java Persistence API) is based on lightweight POJOs for entities, Hibernate has to inject its own collection classes (e.g. PersistentSet) and CGLib-generated entity subclasses. That's logical due to the nature of O/R-mapping, but having those classes transferred over service layer boundaries is not a great thing to happen. And there are more little pitfalls, for example state management on detached entities - how can an object be marked for deletion when it is detached from its Hibernate session?

Sorry, but I have to stand up and defend DTOs for these scenarios. Don't get me wrong, I appreciate Hibernate a lot and use it extensively within the middle tier, but I also don't see the problem of mapping Hibernate POJOs to DTOs at external service boundaries, especially when it is done in a non-invasive way. No mapping code has to pollute the business logic, no hardwiring is necessary - it can all be achieved by applying mapping frameworks like Dozer, using predefined mapping configurations. What goes over the wire at runtime is exactly the same as declared at compile time: a clear service contract, no obsolete entity attributes, no object trees without knowing where the boundaries are, and no surprising LazyInitializationExceptions on the client.
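
To illustrate, here is a minimal sketch - the entity and DTO classes are made up, and the API shown is the org.dozer one from the more recent Dozer releases:

import org.dozer.DozerBeanMapper;
import org.dozer.Mapper;

public class DtoMappingSketch {

    // Hypothetical entity: riskScore only matters inside the business layer.
    public static class CustomerEntity {
        private String name;
        private int riskScore;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getRiskScore() { return riskScore; }
        public void setRiskScore(int riskScore) { this.riskScore = riskScore; }
    }

    // Hypothetical DTO: declares only what the service contract exposes.
    public static class CustomerDTO {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        CustomerEntity entity = new CustomerEntity();
        entity.setName("ACME Corp.");
        entity.setRiskScore(42);

        // Dozer maps matching properties by name; riskScore has no
        // counterpart on the DTO and is sliced off automatically.
        Mapper mapper = new DozerBeanMapper();
        CustomerDTO dto = mapper.map(entity, CustomerDTO.class);

        System.out.println(dto.getName()); // prints "ACME Corp."
    }
}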

Monday, June 04, 2007

Learning From Experience (Introduction)

Last weekend I dug out an old university project of mine - Visual Chat, a graphical Java chat system that I built for a class assignment exactly a decade ago. I was allowed to open-source Visual Chat back then, and I still receive questions from people today who are working on extending the application. Back in 1997 I was clearly still an apprentice, as this was one of my first Java resp. OOP projects at all. When I look at the source code today, I remember what I learned in the process of developing it, but I also see that I was still missing a lot of experience.

Learning not only in theory but also from real projects is particularly important in software development, and this is something that - in my opinion - is done far too little during education. Designing a system at a larger scale is a completely different task from solving isolated algorithmic problems (which is what you normally get to do for homework).

Class assignments that simulate real projects are great because one has the freedom to make mistakes and learn from them, much more so than in professional life. Around 500,000 people have signed up at the Visual Chat test installation site so far, so I could learn a lot from monitoring what was going on at runtime. The worst thing that could have happened was that people would not like my chat program and would switch to another one (as, I am sure, many did). There was no danger of financial losses (or worse) for any user in case of an application error. I am glad I had this opportunity - it helped me to build better products once the chips were down at work, e.g. when developing banking or hospital information systems.

Receiving solid training is important in our profession, but only as long as it is accompanied by applying what one has learned in the classroom. Half-baked knowledge is dangerous, and it often takes two or three tries to get it right. Only a small fraction of people are able to do a perfect job from the beginning (and even a genius like Linus Torvalds openly admits that he, too, wrote some ugly code once upon a time).

So I decided to do two things. I am going to write a little series about what I learned back then, and what I have learned since (looking at the code today), hoping that novices reading those articles can benefit by taking this path as a short-cut instead of gathering the same experience on their own (which would be far more time-consuming). And I will do some refactoring on the old code, so everyone who has decided to continue development work on Visual Chat will be able to take advantage of that as well.

Here is my topic list at this time of writing:

1. OOP Design - On loose coupling, the sense of interfaces and the concept of Singletons
2. Multithreading on the server - Gaining stability and performance through queuing
3. Multithreading on the client - The secrets of the AWT EventDispatch thread, EventQueues and repainting
4. How not to serialize a java.awt.Image
5. What are asynchronous Sockets (and why are they not supported in JDK1.1)?
6. Conclusions