Sunday, May 28, 2006

More relocation mayhem

I couldn't get to sleep last night; something about relocations was nagging at me. Finally, around 2am, it hit me, so I got up and sent a long email to Mike. The problem is that we've been thinking about relocations far too literally. The ElfBinaryFile loader treats relocations the way an operating system's loader would: as numbers to be added to addresses in the image. But a decompiler wants a lot more information than that. Each relocation tells us which symbol a value in the code refers to, which is pure gold, and we just ignore it. For example, suppose you have a relocation like this:

00000009 R_386_32 myarray

To a traditional loader this says: look up the symbol myarray and calculate its address in memory, then go to offset 9 in the .text section and add that address to whatever is stored there. But to a decompiler, it says that the expression we generate for the value at offset 9 should include a global reference to myarray. So say the instruction containing offset 9 was decoded as something like this:

r24 := m[r27 * 4 + 12]

then we should be processing this relocation to change the expression to something like this:

r24 := m[a[myarray] + r27 * 4 + 12]

which the decompiler can recognise as an array reference and finally produce code like:

local1 := myarray[local2 + 3]
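To make the idea concrete, here is a minimal C++ sketch using a toy operand model rather than Boomerang's real expression classes; `MemOperand`, `RelocMap`, `applyRelocation`, `toRtl` and `toArrayAccess` are all hypothetical names. Instead of adding a number, it records the symbol the relocation names, then rewrites a symbol-relative displacement that is a whole number of elements as an index. It assumes the element size equals the scale factor, and that the 12 in the displacement is the addend stored in the field (the REL form used on i386).

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified x86 memory operand: m[index * scale + disp],
// optionally made relative to a global symbol supplied by a relocation.
struct MemOperand {
    std::string index;                  // index register, e.g. "r27"
    int scale;                          // 1, 2, 4 or 8
    int32_t disp;                       // constant displacement (REL addend), e.g. 12
    std::optional<std::string> symbol;  // set when a relocation names a global
};

// Relocations keyed by the offset (within .text) of the field they patch.
using RelocMap = std::map<uint32_t, std::string>;

// Decompiler view of R_386_32: don't add an address to the displacement,
// just remember that the displacement is relative to the named symbol.
void applyRelocation(MemOperand &op, uint32_t dispFieldOffset, const RelocMap &relocs) {
    auto it = relocs.find(dispFieldOffset);
    if (it != relocs.end())
        op.symbol = it->second;
}

// Render the operand in Boomerang-style RTL notation.
std::string toRtl(const MemOperand &op) {
    std::string s = "m[";
    if (op.symbol)
        s += "a[" + *op.symbol + "] + ";
    s += op.index + " * " + std::to_string(op.scale) + " + " + std::to_string(op.disp) + "]";
    return s;
}

// If the operand is symbol-relative and the displacement is a whole number of
// elements (element size assumed equal to scale), express it as an array access.
std::optional<std::string> toArrayAccess(const MemOperand &op) {
    if (!op.symbol || op.scale <= 0 || op.disp % op.scale != 0)
        return std::nullopt;
    return *op.symbol + "[" + op.index + " + " + std::to_string(op.disp / op.scale) + "]";
}
```

For the operand `r27 * 4 + 12` with a relocation naming `myarray` at offset 9, `toRtl` yields `m[a[myarray] + r27 * 4 + 12]` and `toArrayAccess` yields `myarray[r27 + 3]`; renaming `r27` to `local2` gives the line above.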

What's funny is that Boomerang does a lot of work to figure out what a constant refers to and produce exactly these kinds of expressions. If we took proper note of what the relocation information is telling us, we wouldn't need to do all this work.
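For contrast, this is roughly everything a conventional loader does with an entry like `00000009 R_386_32 myarray`. It is a minimal sketch, not ElfBinaryFile's actual code: it assumes the REL form (the addend A is already stored in the 4-byte field), a flat byte buffer for the section, and a symbol address S that has already been resolved; `readLE32`, `writeLE32` and `applyR386_32` are hypothetical names. Once S + A is written back, the image holds only a number, and the fact that it meant `myarray` is gone.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Read a 32-bit little-endian value (i386 byte order) regardless of host order.
uint32_t readLE32(const uint8_t *p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Write a 32-bit value in little-endian byte order.
void writeLE32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; ++i)
        p[i] = static_cast<uint8_t>(v >> (8 * i));
}

// Conventional handling of R_386_32 (REL form): read the addend A stored at
// 'offset' and overwrite it with S + A, where S is 'symbolAddress'.
void applyR386_32(std::vector<uint8_t> &section, std::size_t offset, uint32_t symbolAddress) {
    if (offset > section.size() || section.size() - offset < 4)
        throw std::out_of_range("relocation field outside section");
    uint8_t *field = section.data() + offset;
    uint32_t addend = readLE32(field);         // A
    writeLE32(field, symbolAddress + addend);  // S + A; the symbol name is discarded
}
```

With 12 stored at offset 9 and `myarray` resolved to 0x08049000, the field ends up holding 0x0804900C, which is exactly the kind of bare constant Boomerang then has to work backwards from.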

But now the hard part: Boomerang is not designed to process relocations this way. The BinaryFile classes are completely separate from the NJMCDecoder classes. When we want to decode an instruction, we simply pass in the native address we want to decode and the delta between the native address and the host address. Even if we could modify the decoder to add relocation information to the expressions it generates (and I'm having some difficulty seeing how we can, as dis_Mem appears to hide the addresses of immediate values), we'd have to redesign that interface to pass in a BinaryFile object. I see a lot of work in my future.
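One possible shape for that redesign, sketched in C++ purely as an illustration: rather than handing the decoder a whole BinaryFile, give it a narrow lookup interface that answers "does a relocation patch the field at this address, and for which symbol?". `RelocationSource`, `MapRelocations` and `displacementExpr` are hypothetical names, not Boomerang's API, and the sketch assumes the decoder knows the address of the displacement field it just read, which is precisely what dis_Mem currently hides.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical interface a BinaryFile-like loader could implement and pass to
// the decoder alongside the native address and host delta.
class RelocationSource {
public:
    virtual ~RelocationSource() = default;
    // Symbol named by the relocation that patches the field at 'addr', if any.
    virtual std::optional<std::string> symbolAt(uint32_t addr) const = 0;
};

// Trivial in-memory implementation, e.g. populated from .rel.text.
class MapRelocations : public RelocationSource {
public:
    void add(uint32_t addr, std::string sym) { relocs_[addr] = std::move(sym); }
    std::optional<std::string> symbolAt(uint32_t addr) const override {
        auto it = relocs_.find(addr);
        if (it == relocs_.end())
            return std::nullopt;
        return it->second;
    }
private:
    std::map<uint32_t, std::string> relocs_;
};

// Build the expression for a 32-bit displacement read at 'fieldAddr'. Without
// relocation info the decoder can only emit the number; with it, it can also
// emit a symbolic global reference.
std::string displacementExpr(uint32_t fieldAddr, int32_t disp, const RelocationSource *relocs) {
    std::string num = std::to_string(disp);
    if (relocs) {
        if (auto sym = relocs->symbolAt(fieldAddr))
            return "a[" + *sym + "] + " + num;
    }
    return num;
}
```

Keeping the interface this small would let the decoder stay ignorant of ELF details, and passing a null pointer preserves today's behaviour for callers that have no relocation information.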
