Types, Globals and Varargs

I have a sample input program that has some code similar to this:

228 { *32* r24, *32* r28 } := CALL knownLibProc( .. arguments .. )
..
307 *32* m[r24{228}] := 232
308 *32* m[r24{228} + 4] := 91
309 *32* m[r24{228} + 8] := "some string"

where knownLibProc returns a pointer to a struct in r24. Early in the decompilation this type will be propagated into statements 307, 308 and 309, producing:

307 *32* m[r24{228}].size := 232
308 *32* m[r24{228}].id := 91
309 *32* m[r24{228}].name := "some string"

Our intermediate representation doesn't have an operator equivalent to C's -> operator, so the above is more like writing (*p).size; the backend takes care of that and emits a -> instead. Unfortunately I was hitting an assertion failure before I even got that far. The problem was that the 228 instance of r24 was being assigned a local variable, and that local was not inheriting the return type of the call. So the ad hoc type analysis would look at an expression like m[local30].size and conclude that m[local30] has an impossible type, because the type of local30 was int.
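A minimal sketch of the idea in Python, not Boomerang's actual code: when a definition is renamed to a local, the local takes over whatever type the definition already had, and only falls back to int when nothing is known. The Struct and Pointer classes, the name_local and member_access helpers, and the record layout are all hypothetical names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Struct:
    name: str
    members: tuple  # (byte offset, member name) pairs


@dataclass(frozen=True)
class Pointer:
    target: object


def name_local(types, definition, local, default="int"):
    """Give `local` the type already known for `definition`, else `default`."""
    types[local] = types.get(definition, default)


def member_access(types, var, offset):
    """Rewrite m[var + offset] as a member access, or reject it as ill-typed."""
    t = types[var]
    if not (isinstance(t, Pointer) and isinstance(t.target, Struct)):
        raise TypeError(f"m[{var}] has an impossible type: {var} is {t}")
    return f"m[{var}].{dict(t.target.members)[offset]}"


# Hypothetical layout matching the member names used in the listing above.
record = Struct("record", ((0, "size"), (4, "id"), (8, "name")))

# The call's return type, as it would come from the libproc signature.
types = {"r24{228}": Pointer(record)}
name_local(types, "r24{228}", "local30")

print(member_access(types, "local30", 4))  # prints m[local30].id
```

If name_local is called for a definition with no recorded type, the local ends up as int and member_access raises, which is the toy equivalent of the assertion failure described above.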

Fixing this was not as easy as it should have been, because ad hoc type analysis is such a big mess. While investigating this bug I found that globals passed as arguments to library procedures were not being typed with the obviously valuable type information in the libproc's signature. I then discovered that even when they were given the correct type, the initial values for those globals were not being calculated correctly. In the case of a struct type (called a compound type in Boomerang) we weren't calculating any initial value at all. This was obviously a terrible oversight.
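To illustrate what calculating an initial value for a compound type involves, here is a rough Python sketch under simplifying assumptions: the data section is a flat bytes object, every member is a fixed-size scalar, and each member is described by a name, a bit offset and a struct-module format string. The initial_value function and the member list are hypothetical, not anything from Boomerang.

```python
import struct


def initial_value(image, address, members):
    """Read one initial value per member of a compound-typed global.

    `members` is a list of (name, bit offset, struct format) triples.
    """
    return {
        name: struct.unpack_from(fmt, image, address + bits // 8)[0]
        for name, bits, fmt in members
    }


# A fake little-endian data section holding two 32-bit members.
image = struct.pack("<II", 232, 91)
members = [("size", 0, "<I"), ("id", 32, "<I")]

print(initial_value(image, 0, members))  # prints {'size': 232, 'id': 91}
```

Pointer members such as a string would need one more step, following the stored address into the image, which this sketch leaves out.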

Finally, I've found a problem with calls that take a variable number of arguments. I was under the impression that the new defcollectors were effectively collecting live statements and adding them as appropriate, but apparently not. For the binary I am working on, it appears that the last argument of a vararg call is always 0, so I should be able to put in a hack that adds arguments to the call until I hit one with that constant.
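The hack could look something like the following Python sketch. It assumes the values at successive argument locations are already available as a list, which is a big simplification of what the decompiler actually has to work with; collect_varargs, its parameters and the limit of 32 are invented for this example.

```python
def collect_varargs(candidates, fixed_count, sentinel=0, limit=32):
    """Return the fixed arguments plus varargs up to and including the sentinel.

    `candidates` holds the values found at successive argument locations.
    `limit` caps how far past the fixed arguments the search may go.
    """
    args = list(candidates[:fixed_count])
    for value in candidates[fixed_count:fixed_count + limit]:
        args.append(value)
        if value == sentinel:
            return args
    raise ValueError("no terminating constant found")


# One fixed argument, then varargs terminated by the constant 0.
print(collect_varargs(["fmt", "a", "b", 0, 99], fixed_count=1))
# prints ['fmt', 'a', 'b', 0]
```

The limit matters because the assumption only holds for this particular binary; without it, a call that does not end in 0 would swallow every location available.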

Otherwise, the UI continues to evolve. I can now view struct information for any named type, which is particularly useful for sorting out padding issues. One day I might consider an analysis to determine padding for a struct automatically from use information, but for the moment it's easiest to just write a sample program with the original header information, calculate byte offsets to members and compare them with the bit offsets in the parsed signature information.
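As a stand-in for that sample program, the same comparison can be sketched in Python with ctypes, which lays out Structure subclasses using the native C compiler's alignment rules. The Record struct below is hypothetical (member names borrowed from the earlier listing, member types assumed); a real check would mirror the original header, and would have to run on a platform with the same ABI as the binary being decompiled.

```python
import ctypes


class Record(ctypes.Structure):
    # Hypothetical members; the real list would come from the original header.
    _fields_ = [
        ("size", ctypes.c_uint32),
        ("id", ctypes.c_uint32),
        ("name", ctypes.c_char_p),
    ]


def bit_offsets(struct_type):
    """Map each member name to its offset in bits, as the platform lays it out."""
    return {
        name: getattr(struct_type, name).offset * 8
        for name, *_ in struct_type._fields_
    }


def mismatches(struct_type, parsed_bits):
    """Return members whose real bit offset differs from the parsed one."""
    actual = bit_offsets(struct_type)
    return {
        name: (actual[name], parsed_bits.get(name))
        for name in actual
        if actual[name] != parsed_bits.get(name)
    }


# Bit offsets as they might appear in the parsed signature information.
parsed = {"size": 0, "id": 32, "name": 64}
print(mismatches(Record, parsed))  # an empty dict means the layouts agree
```

Any entry in the result points at a member where the parsed signature and the compiler disagree, which is usually where padding was missed.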
