Sunday, June 18, 2006

Order of Decompilation

Which order you decompile the procedures of a program in can determine how much information you have available to make good analysis. In Boomerang we've developed a system of depth first search of the call graph, which takes into account recursion, to have the most information about parameters available when needed. For example, if a program consists of two procedures, one called by the other, it makes the most sense to decompile the procedure which is the leaf in the call graph so that the procedure that calls it has the most information about what parameters it needs to pass to it.

What happens if the way the leaf procedure takes a parameter that is a pointer to a compound type? By the parameter's usage the decompiler may be able to determine that it is a pointer, it might even be able to determine that it is a pointer to a compound, but unless every member in the compound is accessed in a way that restricts that member's type sufficiently, the type that the decompiler can determine for the parameter is going to be very vague.

On the other hand, the value that the calling procedure is passing to the leaf procedure may be the result of a library call, or in some other way have a well defined type. As such, from a typing perspective, it would appear to make more sense to do top down processing of procedures in the call graph. However, this can only be done once a bottom up processing has been completed.

As such, I am currently looking at implementing an analysis that determines when additional type information for a called procedure that has been decompiled is discovered and marks that procedure as requiring additional processing. Hopefully no processing in the called procedure will need to be "undone".

No comments:

Post a Comment