Tuesday, June 06, 2006

Types, Globals and Varargs

I have a sample input program that has some code similar to this:

228 { *32* r24, *32* r28 } := CALL knownLibProc( .. arguments .. )
307 *32* m[r24{228}] := 232
308 *32* m[r24{228} + 4] := 91
309 *32* m[r24{228} + 8] := "some string"

where knownLibProc returns a pointer to a struct in r24. Early in the decompilation this type will be propogated into the statements in 307, 308 and 309 producing:

307 *32* m[r24{228}].size := 232
308 *32* m[r24{228}].id := 91
309 *32* m[r24{228}].name := "some string"

our intermediate representation doesn't have an operator equivalent to C's -> operator, the above is more like writing (*p).size, but the backend takes care of that and will emit a -> instead. Unfortunately I was getting an assert fault before I even get to that. The problem was that the 228 instance of r24 was being assigned a local variable, and that local was not inheriting the return type of the call. So the adhoc type analysis would take a look at an expression like m[local30].size and come to the conclusion that m[local30] has an impossible type because the type of local30 was int.

Fixing this was not as easy as it should be because adhoc type analysis is such a big mess. Investigating this bug I found that globals that are passed as arguments to library procedures were not being typed with the obviously valuable type information in the libproc's signature. I then discovered that even when they were typed with the correct type the initial values for those globals were not being calculated correctly. In the case of a struct type (called a compound type in Boomerang) we weren't calculating any initial value at all. This was obviously a terrible oversight.

Finally, I've found a problem with variable number of arguments calls. I was under the impression that the new defcollectors were effectively collecting live statements and adding them as appropriate, but apparently not. For the binary I am working on, it appears that the last argument of a vararg call is always 0, so I should be able to put a hack in to add arguments to call until I hit one with that constant.

Otherwise, the UI continues to evolve. I can now view struct information for any named type, which is particularly useful at sorting out padding issues. One day I might consider an analysis to determine padding for a struct automatically from use information, but for the moment it's easiest to just write a sample program with the original header information, calculate byte offsets to members and compare them with the bit offsets in the parsed signature information.

No comments:

Post a Comment