In this case we easily managed to improve performance by a factor of up to 30000. Workloads that took seconds or even minutes before now finish within a few milliseconds. Here is what I did:
- Using a standard XML parser instead of a self-implemented, byte-wise parser.
- Buffering read data in byte chunks that grow dynamically by powers of two instead of concatenating single bytes to an MFC CString (the latter implied constant reallocation of CString's internal buffer).
- One library was written in Java, and its bytecode had been obfuscated. The obfuscation covered string constants, including the XML element names used inside the innermost loop of the XML-parsing algorithm. Each time one of those strings was referenced at runtime, the inverse obfuscation algorithm was invoked on it to "decrypt" the string back to its original content. The performance impact was devastating, especially as the "decryption" algorithm was not what I would consider lightweight.
- Removing a memory violation that the original vendor had circumvented by delivering a debug version of one library. The debug version always protects the method stack with some trailing bytes.
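
To illustrate the buffering point, here is a minimal sketch of a byte buffer whose capacity doubles whenever it runs full. This is not the original code: the class name `GrowingBuffer`, its members, and the initial capacity of 16 bytes are assumptions made up for this example. The reallocation counter is only there to make the effect visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical illustration (not the original code): a byte buffer whose
// capacity doubles when full, so appending n bytes one at a time triggers
// only O(log n) reallocations.
class GrowingBuffer {
public:
    void append(const char* data, std::size_t len) {
        if (len == 0) return;
        assert(data != nullptr);
        if (size_ + len > capacity_) {
            // Start at an assumed 16 bytes, then double until the data fits.
            std::size_t newCap = capacity_ ? capacity_ : 16;
            while (newCap < size_ + len) newCap *= 2;
            std::unique_ptr<char[]> bigger(new char[newCap]);
            if (size_ > 0) std::memcpy(bigger.get(), data_.get(), size_);
            data_ = std::move(bigger);
            capacity_ = newCap;
            ++reallocations_;  // bookkeeping for demonstration only
        }
        std::memcpy(data_.get() + size_, data, len);
        size_ += len;
    }

    const char* data() const { return data_.get(); }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    std::size_t reallocations() const { return reallocations_; }

private:
    std::unique_ptr<char[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
    std::size_t reallocations_ = 0;
};
```

With these assumptions, appending one million single bytes reallocates only 17 times (16, 32, ... up to 1048576 bytes), whereas a buffer that grows by exactly one byte per append would have to reallocate and copy on every call. `std::vector<char>` gives you the same kind of geometric growth out of the box, so in new code that would usually be the simpler choice.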