Friday, August 31, 2007

Word COM Interop: Problems With Application.Selection

Most code samples out there use the Application.Selection component for creating Word documents with COM Interop. From the Word object model documentation:

The Selection object represents the area that is currently selected. When you perform an operation in the Word user interface, such as bolding text, you select, or highlight, the text and then apply the formatting. The Selection object is always present in a document. If nothing is selected, then it represents the insertion point. In addition, it can also be multiple blocks of text that are not contiguous.


Often those code samples do not even bother to set the selection beforehand (e.g. by invoking Document.Range(begin, end).Select()), as they misconceive that Application.Selection just keeps on matching while they insert content to the end of the document. Even with Range.Select() there are potantial race conditions, as we will soon see...

The problem here becomes obvious when taking a closer look at the object model: the Selection object is attached to the Application object, not to the Document object. If the application's active document changes, its selection will refer to a different document than before.

Now it still seems to be save at first sight as long as the caller operates on one document only, but not even this is the case: If the user opens a Word document from his desktop in the meantime, this is not going to fork a new Word process, but the new document will be attached to the running Word Interop process (hence the same Application). This Application has run window-less in the background so far, but now will pop up a Word window, where the user can watch in real-time how the Interop code suddenly manipulates the WRONG document, because from this moment on Application.Selection refers to the newly opened document.

How come this simple fact is not mentioned in the API documentation? I have found several newsgroup postings on that issue, people are struggling with this. So what might be a possible solution? Working with the Range API on document ranges (e.g. Document.Range(begin, end)), instead of application selections.

Previous Posts:

Thursday, August 23, 2007

COM Interop Performance

COM Interop method calls are slow. When you think about it, that does not come as a surprise. Managed parameter types need to be marshalled to unmanaged parameter types and vice versa. This hurts performance especially with plenty of calls to methods which only accomplish a tiny piece of work, like setting or getting property values. 99% percent of the time might be lost on call overhead then.

There are more issues, e.g. incompatible threading models that may lead to message posting and thread context switching, memory fragmentation due to all the parameter marshalling effort, unnecessarily long-lived COM objects which are not disposed by the caller, but by the garbage collector much, much later. So when handled improperly, systems applying COM Interop not only tend to be slow, but also degrade over time regarding performance and system resources.

Most of our COM Interop code is related to MS Office integration, assemblying word documents (content and formatting) for example, this kind of stuff. We had to do quite some tuning to kind of reach a level of acceptable runtime behavior. Limiting the number of method calls and keeping references to COM objects (respectively their COM interop wrappers) for reuse was one step towards improvement, while explicitly disposing them as soon as not needed any more was another one. And never forget to invoke Application.Quit() when done, unless no one cares for zombie Word processes sucking up system resources.

Previous Posts:
Follow-Ups:

Sunday, August 12, 2007

Handling The Bozo Invasion

Invaluable advice from Joel Spolsky in "Getting Things Done When You're Only a Grunt":

Strategy 4: Neutralize The Bozos

Even the best teams can have a bozo or two. The frustrating part about having bad programmers on your team is when their bad code breaks your good code, or good programmers have to spend time cleaning up after the bad programmers.

As a grunt, your goal is damage-minimization, a.k.a. containment. At some point, one of these geniuses will spend two weeks writing a bit of code that is so unbelievably bad that it can never work. You're tempted to spend the fifteen minutes that it takes to rewrite the thing correctly from scratch. Resist the temptation. You've got a perfect opportunity to neutralize this moron for several months. Just keep reporting bugs against their code. They will have no choice but to keep slogging away at it for months until you can't find any more bugs. Those are months in which they can't do any damage anywhere else.


I agree for the scenario described above, that is if you have to deal with one or two bozos under your coworkers. There might be other situations, though. For example bozos placed in strategic positions, so they can highly influence your work. Or company growth based on hiring legions of morons. An invasion of bozos usually is a sign that an organization is doomed anyway. The path to downfall might be long and winding, and being doomed doesn't necessarily mean going out of business - just a slow and steady decline with failed projects paving the way. This is just inevitable. It's better to leave the sinking ship earlier than later in those cases.