Wednesday, October 31, 2007

How To Trim A .NET Application's Memory Workingset

.NET developers who have monitored their application's memory consumption in the Windows Task Manager, or in slightly more sophisticated performance monitors like PerfMon, might have noticed the effect of memory usage slowly rising and hardly ever dropping. Once the whole operating system starts to run low on memory, the CLR finally seems to give back memory as well. And as some people have noted, memory usage also goes down once an application's main window is minimized.

First of all, it's important to note that by default the Windows Task Manager only shows the amount of physical memory acquired. There is another column for displaying virtual memory usage, but it's hidden by default. So when physical memory usage drops, it's not necessarily the CLR returning memory - it may just be physical memory being swapped out to disk.

So memory consumption does drop at some point in time - just probably too late. These symptoms give us a first clue that we are not dealing with memory leaks here (of course memory leaks are less likely to happen in managed environments than in unmanaged ones, but they are still possible - e.g. static variables holding whole trees of objects that could otherwise be reclaimed, or that event listener that should have been unregistered but wasn't). Also, whatever amount of native heap the CLR has allocated, the size of the managed heap within that native heap is a whole different story. The CLR might just decide to keep native memory allocated even if it could be freed after a garbage collection pass.

And this does not look like a big deal at first glance - so what if the CLR keeps some more memory than necessary, as long as it's being returned once in a while? But the thing is, the CLR's decisions on when the right moment for freeing memory has arrived (or for that matter, the OS swapping unused memory pages to disk) might not always coincide with the users' expectations. And I have also seen Citrix installations with plenty of .NET Winforms applications running in parallel, soaking up a lot more resources than necessary, hence restraining the whole system.

Some customers tend to get nervous when they watch a simple client process holding 500MB or more of memory. "Your application is leaking memory" is the first thing they will scream. And nervous programmers will profile their code, be unable to find a leak, and then start invoking GC.Collect() manually - which not only doesn't help, but is generally a bad idea.

Under Java there is a default maximum heap size (the default value depends on the Java VM), which can be overridden by passing the "-Xmx" commandline parameter to the runtime. Once the limit is reached, the garbage collector will be forced to run once more, and if that doesn't free enough memory either, an OutOfMemoryError is thrown. This might be bad news for the Java application, but at least it will not bring down the whole system.

I don't know of a counterpart to "-Xmx" in the .NET world. The Process.MaxWorkingSet property allows limiting the physical memory a process may occupy. I have read several postings recommending this approach for keeping the .NET memory footprint low, but I am not so sure; besides, setting Process.MaxWorkingSet requires admin privileges - something that application users will not (and should not) have.

A better choice is the Win32 API function SetProcessWorkingSetSize(), called with a special value of -1 for both the minimum and maximum working set size.

From MSDN:

BOOL WINAPI SetProcessWorkingSetSize(
  __in HANDLE hProcess,
  __in SIZE_T dwMinimumWorkingSetSize,
  __in SIZE_T dwMaximumWorkingSetSize
);

If both dwMinimumWorkingSetSize and dwMaximumWorkingSetSize have the value (SIZE_T)-1, the function temporarily trims the working set of the specified process to zero. This essentially swaps the process out of physical RAM memory.

What SetProcessWorkingSetSize() does is invalidate the process's memory pages. What we have achieved at this point is that our application's physical memory usage is trimmed to the bare minimum. All that unused memory will not be reloaded into physical memory as long as it is not being accessed. The same is true for .NET assemblies which have been loaded but are not currently used.

And the good news: this does not require the user to have admin rights. By the way, SetProcessWorkingSetSize is what's being invoked when an application window is minimized, which explains the effect described above.

I should note that there might be a performance penalty associated with this approach, as it can lead to a higher number of page faults following the invocation, in case other processes have claimed the physical memory in the meantime.

Obviously Windows' virtual memory implementation cannot always swap out unused memory this aggressively on its own. My guess is that what further hinders it is the constant relocation of objects within the native heap caused by garbage collection (a lot of different memory pages are being touched over time, hence hardly ever paged out to disk).

A Timer can be used for repeated invocations of SetProcessWorkingSetSize(), with a reasonable interval between two calls of maybe 15 or 30 minutes (this depends heavily on the kind of application and its workload). Another possibility is to check the physical memory being used from time to time, and invoke SetProcessWorkingSetSize() once a certain threshold has been reached. A word of warning though - I do not advocate invoking it too often either. Also, don't set actual minimum and maximum working set sizes (let the operating system take care of that); just pass the -1 values in order to swap out memory - after all, that's what we are trying to achieve.
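A minimal C# sketch of the threshold-based variant might look like this (the 200 MB limit is an arbitrary example value of my own, not a recommendation, and the class and method names are made up):

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

static class WorkingSetTrimmer
{
    // SIZE_T is pointer-sized, so IntPtr is the correct P/Invoke type
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetProcessWorkingSetSize(
        IntPtr hProcess, IntPtr dwMinimumWorkingSetSize, IntPtr dwMaximumWorkingSetSize);

    const long Threshold = 200 * 1024 * 1024; // example value, tune per workload

    // Call this periodically, e.g. from a System.Threading.Timer
    public static void TrimIfNeeded()
    {
        if (Environment.WorkingSet > Threshold)
        {
            // -1/-1 swaps the working set out instead of setting hard limits
            SetProcessWorkingSetSize(Process.GetCurrentProcess().Handle,
                                     new IntPtr(-1), new IntPtr(-1));
        }
    }
}
```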

The complete code:

// SIZE_T is pointer-sized, so IntPtr is the correct P/Invoke type
[DllImport("kernel32.dll", SetLastError = true)]
static extern bool SetProcessWorkingSetSize(IntPtr hProcess, IntPtr dwMinimumWorkingSetSize, IntPtr dwMaximumWorkingSetSize);

SetProcessWorkingSetSize(Process.GetCurrentProcess().Handle, new IntPtr(-1), new IntPtr(-1));

Anyway, our Citrix customers are happy again, and no one has ever screamed "Memory leak!" since we implemented that workaround.

Thursday, October 25, 2007

Five Easy Ways To Fail

Joel Spolsky describes the most common reasons for software projects to go awry in his latest article "How Hard Could It Be? Five Easy Ways to Fail".

Kind of as expected, a "mediocre team of developers" comes up as number one, and as usual Joel Spolsky describes it much more eloquently than I ever could:

#1: Start with a mediocre team of developers

Designing software is hard, and unfortunately, a lot of the people who call themselves programmers can't really do it. But even though a bad team of developers tends to be the No. 1 cause of software project failures, you'd never know it from reading official postmortems.

In all fields, from software to logistics to customer service, people are too nice to talk about their co-workers' lack of competence. You'll never hear anyone say "the team was just not smart enough or talented enough to pull this off." Why hurt their feelings?

The simple fact is that if the people on a given project team aren't very good at what they do, they're going to come into work every day and yet--behold!--the software won't get created. And don't worry too much about HR standing in your way of hiring a bunch of duds. In most cases, I assure you they will do nothing to prevent you from hiring untalented people.

I tend to question the four other reasons he mentions, though (mainly estimating and scheduling issues). Don't get me wrong, he surely has his points, but I would rank other problem areas higher than that - lack of management support, amateurish requirements analysis, or suffering from the NIH syndrome among them.

Monday, October 15, 2007

Hints And Pitfalls In Database Development (Part 5): The Importance Of Choosing The Right Clustered Index

In database design, a clustered index defines the physical order in which data rows are stored on disk (note: the most common data structure for storing rows both in memory and on disk is the B-tree, so the term "page" can also be interpreted as "B-tree leaf node" in the following text - it's not necessarily a 1:1 match, but you get the point). In most cases the default clustered index is the primary key. The trouble starts when people don't give it any further thought and stick with that default, no matter whether the primary key is a good choice for physical ordering or not...

File I/O happens at page level, so reading a row implies that all other rows stored within the same physical disk page are read as well. Wouldn't it make sense to group together those rows which are most likely to be fetched en bloc too? This limits the number of page reads and avoids having to switch disk tracks (a costly operation).

So the secret is to choose an attribute for clustering which causes the least overhead for I/O. Those rows that are most likely going to be accessed together should reside within the same page, or at least in pages next to each other.

Usually an auto-increment primary key is a good choice for a clustered index. Rows that have been created consecutively will then be stored consecutively, which fits in case they are also likely to be accessed together. On the other hand, if a row contains a date column and data is mainly selected based on these date values, that column might be the better option for clustering. And for child rows it's probably a good idea to cluster the table on the foreign key column referencing the parent row - a parent row's child rows can then be fetched in one pass.
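In SQL Server syntax, the child-table case might be sketched like this (the table and column names are made up for illustration):

```sql
-- Cluster order items by their parent order's key, so all items of one
-- order end up physically adjacent and can be fetched in a single pass.
CREATE TABLE OrderItem (
    OrderItemId INT IDENTITY NOT NULL,
    OrderId     INT NOT NULL REFERENCES Orders(OrderId),
    Quantity    INT NOT NULL,
    CONSTRAINT PK_OrderItem PRIMARY KEY NONCLUSTERED (OrderItemId)
);

CREATE CLUSTERED INDEX IX_OrderItem_OrderId ON OrderItem (OrderId);
```

Note that the primary key has to be declared NONCLUSTERED explicitly, since SQL Server would otherwise cluster on it by default.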

I work on a project that uses unique identifiers (GUIDs) for primary keys. This has several advantages, the client being able to create primary keys in advance among them. But unique identifier primary keys are a bad choice for a clustered index, as their values are distributed more or less randomly, hence the physical order on disk will be just as random. We have achieved a many-fold performance speedup by choosing more suitable columns for clustered indexing.


Friday, October 05, 2007

Fun With WinDbg

I did some debugging on an old legacy reporting system this week, using WinDbg. The reporting engine terminated prematurely after something like 1000 printouts.

After attaching WinDbg and letting the reporter run for half an hour, a first chance exception breakpoint hit because of this memory access violation:

(aa8.a14): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=665b0006 ecx=7c80ff98 edx=00000000 esi=00000000 edi=00000000
eip=665a384f esp=0012bdc4 ebp=00000005 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
*** ERROR: Symbol file could not be found. Defaulted to export symbols for GEEI11.dll -
665a384f 668b7804 mov di,word ptr [eax+4] ds:0023:00000004=????

Trying to access address 0x00000004 ([EAX+4]), one of the reporting DLLs was obviously doing pointer arithmetic on a NULL pointer. The previous instruction was a call to GEEI11!WEP+0xb47c, which happened to be the import fixup for GlobalLock:

665a3849 ff157c445b66 call dword ptr [GEEI11!WEP+0xb47c (665b447c)]
665a384f 668b7804 mov di,word ptr [eax+4] ds:0023:00000004=????

GlobalLock takes a global memory handle, locks it, and returns a pointer to the actual memory block, or NULL in case of an error. According to the Win32 API calling conventions (stdcall), EAX is used for 32bit return values.

The reporting engine code calling into GlobalLock was too optimistic and did not check for a NULL return value.
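The defensive pattern the engine was missing might be sketched like this (Windows-only, with a hypothetical helper name of my own):

```c
#include <windows.h>

/* Allocate and lock a movable global memory block, checking both calls
   for failure instead of blindly dereferencing the result. */
static void *alloc_and_lock(SIZE_T bytes, HGLOBAL *outHandle)
{
    HGLOBAL h = GlobalAlloc(GMEM_MOVEABLE, bytes);
    if (h == NULL)
        return NULL;             /* out of memory, or out of handles */

    void *p = GlobalLock(h);
    if (p == NULL) {             /* invalid handle; GetLastError() says why */
        GlobalFree(h);
        return NULL;
    }
    *outHandle = h;
    return p;                    /* caller must GlobalUnlock + GlobalFree */
}
```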

The next question was: why would GlobalLock return NULL? Most likely because of an invalid handle being passed in. Where could the parameter be found? In the ESI register - it was the one pushed onto the stack before the call to GlobalLock, thus must be the one and only function parameter, and it is callee-saved, so GlobalLock had restored it in its epilog.

665a3848 56 push esi
665a3849 ff157c445b66 call dword ptr [GEEI11!WEP+0xb47c (665b447c)]

0:000> r
eax=00000000 ebx=665b0006 ecx=7c80ff98 edx=00000000 esi=00000000 edi=00000000
eip=665a384f esp=0012bdc4 ebp=00000005 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246

As expected, ESI was 0x00000000, and GetLastError confirmed this as well:

0:000> !gle
LastErrorValue: (Win32) 0x6 (6) - The handle is invalid.
LastStatusValue: (NTSTATUS) 0xc0000034 - The object name was not found.

Doing some further research, I found out that the global memory handle was NULL because a prior invocation of GlobalAlloc had failed. Again, the caller had not checked for NULL at that point. And GlobalAlloc failed because the system had run out of global memory handles, as there is an upper limit of 65535 per process. The reporting engine leaked those handles, neglecting to call GlobalFree() in time, and after a while (about 1000 reports) had run out of handles.

By the way, I could not figure out how to dump the global memory handle table in WinDbg. It seems to support all kinds of Windows handles, with the exception of global memory handles. Please drop me a line in case you know how to do that.

Now, there is no way to patch the reporting engine, as it's an old third-party binary, so the solution we will most likely implement is to restart the engine process periodically - all handles are freed when the old process terminates.