Sunday, July 23, 2006

MinGW's tricky prologue code

Continuing with my ongoing test program, extract_kyra.exe from the scummvm tools, I've been looking at the very first call in main. It would appear that this exe was compiled with stack-checking runtime code enabled. That first call is to a short procedure that takes a single parameter in the eax register: the number of bytes to subtract from esp. Here's a disassembly of the procedure:

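push ecx                  ; save ecx (restored before returning)
mov ecx, esp
add ecx, 8                ; ecx = caller's esp (skip saved ecx and return address)

loc_405336:
cmp eax, 1000h            ; at least one full 4K page still to allocate?
jb short loc_40534D
sub ecx, 1000h            ; step down one page
or dword ptr [ecx], 0     ; touch the page without changing its contents
sub eax, 1000h
jmp short loc_405336

loc_40534D:
sub ecx, eax              ; remaining partial page
or dword ptr [ecx], 0
mov eax, esp
mov esp, ecx              ; switch to the new stack pointer
mov ecx, [eax]            ; restore the saved ecx
mov eax, [eax+4]          ; fetch the return address
jmp eax                   ; return without popping anything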

It not only subtracts the requested number of bytes from the stack pointer, it also tests the stack every 4K to ensure that a stack overflow hasn't occurred. This is all very amusing, but it isn't the kind of stuff we want to see in a decompilation. If you're interested in low-level detail like this, a disassembler is the tool to use to discover it.
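To make the net effect concrete, here is a minimal C++ sketch that models the routine on plain integers. It is not Boomerang code; the names simulateAllocStack and AllocStackResult are made up for illustration, and the model assumes the caller's esp is the value just before the call instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the routine's effect; all names are illustrative.
struct AllocStackResult {
    std::uint32_t newEsp;               // esp as the caller sees it afterwards
    std::vector<std::uint32_t> probes;  // addresses touched by "or dword ptr [ecx], 0"
};

// callerEsp: value of esp just before the call instruction.
// bytes:     value passed in eax.
AllocStackResult simulateAllocStack(std::uint32_t callerEsp, std::uint32_t bytes) {
    assert(bytes <= callerEsp);  // this model does not handle address wrap-around
    const std::uint32_t kPage = 0x1000;
    AllocStackResult r;
    std::uint32_t ecx = callerEsp;  // push ecx; mov ecx, esp; add ecx, 8
    std::uint32_t eax = bytes;
    while (eax >= kPage) {          // cmp eax, 1000h / jb
        ecx -= kPage;               // sub ecx, 1000h
        r.probes.push_back(ecx);    // or dword ptr [ecx], 0
        eax -= kPage;               // sub eax, 1000h
    }
    ecx -= eax;                     // sub ecx, eax
    r.probes.push_back(ecx);        // final probe at the new top of stack
    r.newEsp = ecx;                 // mov esp, ecx
    return r;
}
```

For example, calling simulateAllocStack(0x100000, 0x2500) touches 0xFF000, 0xFE000 and 0xFDB00 and leaves esp at 0xFDB00, which is simply the old esp minus the requested size.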

The first thing we have to do is modify Win32BinaryFile::IsStaticLinkedLibProc to return true when it is passed the address of this procedure. I decided to encapsulate the pattern which recognises this procedure in a function called IsMinGWsAllocStack which just does a memcmp on the literal bytes. If the procedure contained any relocations I'd have to do something more complicated, but thankfully this one doesn't.
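A rough sketch of that kind of matcher follows. The byte values are hand-assembled from the disassembly above, so the exact encoding is an assumption that should be checked against the real exe, and the function name and signature here are illustrative rather than the ones in Win32BinaryFile.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Assumed encoding of the routine, hand-assembled from the listing (45 bytes).
static const unsigned char kAllocStackPattern[] = {
    0x51,                                // push ecx
    0x89, 0xE1,                          // mov ecx, esp
    0x83, 0xC1, 0x08,                    // add ecx, 8
    0x3D, 0x00, 0x10, 0x00, 0x00,        // cmp eax, 1000h
    0x72, 0x10,                          // jb short +10h
    0x81, 0xE9, 0x00, 0x10, 0x00, 0x00,  // sub ecx, 1000h
    0x83, 0x09, 0x00,                    // or dword ptr [ecx], 0
    0x2D, 0x00, 0x10, 0x00, 0x00,        // sub eax, 1000h
    0xEB, 0xE9,                          // jmp short -17h
    0x29, 0xC1,                          // sub ecx, eax
    0x83, 0x09, 0x00,                    // or dword ptr [ecx], 0
    0x89, 0xE0,                          // mov eax, esp
    0x89, 0xCC,                          // mov esp, ecx
    0x8B, 0x08,                          // mov ecx, [eax]
    0x8B, 0x40, 0x04,                    // mov eax, [eax+4]
    0xFF, 0xE0                           // jmp eax
};

// Hypothetical helper: true if the bytes at `code` are exactly the pattern.
// `available` is how many bytes may safely be read from `code`.
bool matchesAllocStackPattern(const unsigned char* code, std::size_t available) {
    assert(code != nullptr);
    if (available < sizeof(kAllocStackPattern))
        return false;  // too close to the end of the section to match
    return std::memcmp(code, kAllocStackPattern, sizeof(kAllocStackPattern)) == 0;
}
```

A plain memcmp works here because both jumps are short relative branches and there are no absolute addresses in the body, so the bytes are the same wherever the routine is loaded.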

Next, I need to modify Win32BinaryFile::SymbolByAddress to return an appropriate name for any procedure that matches the pattern. I chose __mingw_allocstack, but I might modify this later if a more appropriate name (like the one the mingw programmers actually used) becomes known to me.

Finally, I need to modify PentiumFrontEnd::helperFunc to recognise this name and replace the call with a single statement: *32* r28 := r28 - r24.
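In outline, the hook boils down to a lookup from callee name to replacement semantics. The sketch below is only a stand-in: helperReplacement is a made-up function, the real PentiumFrontEnd::helperFunc has a different signature that this post doesn't show, and the statement is kept as a plain string rather than a real RTL object. Going by what the routine does, r28 is esp and r24 is eax.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the front end hook. Returns true and fills
// `replacement` when a call to `calleeName` should be replaced by a
// single statement instead of being decompiled as a call.
bool helperReplacement(const std::string& calleeName, std::string& replacement) {
    if (calleeName == "__mingw_allocstack") {
        // esp := esp - eax, written the way the post gives it
        replacement = "*32* r28 := r28 - r24";
        return true;
    }
    return false;  // not a recognised helper; leave the call alone
}
```

Keying the replacement on the symbol name rather than the address keeps the pattern matching in the binary file loader and the semantics in the front end, which is the split described in the three steps above.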

Here are my changes to Win32BinaryFile.cpp and pentiumfrontend.cpp.

The next two procedure calls in main look like they're part of the runtime initialisation code too. I'll recognise and remove them. In summary, I now recognise five static library procedures in MinGW exes: __mingw_allocstack, __mingw_frame_init, __mingw_frame_end, __mingw_cleanup_setup, and an inline asm implementation of malloc. As a result, I have reduced the number of user procedures to be decompiled from 70+ to 7. Here they are; still atrocious, but there are now fewer of them. Turning on Mike's dataflow-based type analysis makes the output different and sometimes better. Seeing as Mike is actively working to make this better, I've modified the GUI to use dataflow-based type analysis by default. Eventually I've got to add some more options to the init phase of the workflow so you don't have to recompile to fiddle with different options.


  1. I think this function is alloca (or more likely __alloca). The parameter is the amount of data to allocate. The strange stack accesses are probing consecutive pages of the stack so the VMM commits pages. I haven't confirmed that this is correct by checking MinGW, but it looks very much like Visual C++'s alloca code.

  2. IDA Pro identifies it as __chkstk, but looking at the MinGW code it appears to have some other name. Whatever, we know what it does.

  3. What about AI and pattern training, based on standardized test programs? When the high level code of such a program is known, the compiler/linker added parts can be identified automagically, and can be remembered in a library.