Saturday, May 27, 2006

Overlapping registers

The x86 instruction set is an ugly mess. Often, in the desire to make things more flexible, people make them harder to understand. In the case of instruction sets, this makes a decompiler's job more difficult. Consider the following x86 asm code:

mov eax, 302
mov al, 5
mov ebx, eax


What value is in ebx? It is easier to see if we write 302 as 12Eh. The write to al replaces only the low byte (2Eh) with 05h, so ebx contains 105h, that is, 261. In Boomerang, the decoder would turn those three instructions into this RTL:

*32* r24 := 302
*8* r8 := 5
*32* r27 := r24


This is clearly wrong, as the information that r8 overlaps with the bottom 8 bits of r24 is completely absent. This is more correct:

*32* r24 := 302
*16* r0 := truncu(32, 16, r24)
*8* r12 := r24@15:8
*8* r8 := truncu(32, 8, r24)
*8* r8 := 5
*32* r24 := r24@31:8 | zfill(8, 32, r8)
*16* r0 := truncu(32, 16, r24)
*32* r27 := r24


But just look at the explosion in the number of statements. I haven't even included statements to define bx, bh, and bl, which should go after the assignment to ebx. Boomerang currently contains code to add these statements, but because of the explosion of statements it is typically disabled. The users of Boomerang would rather have wrong RTL than unreadable RTL.
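To make the overlap concrete, here is a minimal Python sketch of the behaviour those extra statements are trying to capture. The class and method names are made up for illustration and are not part of Boomerang; the only assumption is the standard x86 rule that writing al leaves bits 31:8 of eax untouched.

```python
MASK32 = 0xFFFFFFFF


class EaxFamily:
    """Toy model of eax and its overlapping views ax, ah and al (hypothetical helper)."""

    def __init__(self):
        self.eax = 0

    @property
    def ax(self):
        # ax is the low 16 bits of eax
        return self.eax & 0xFFFF

    @property
    def ah(self):
        # ah is bits 15:8 of eax
        return (self.eax >> 8) & 0xFF

    @property
    def al(self):
        # al is bits 7:0 of eax
        return self.eax & 0xFF

    def set_eax(self, value):
        self.eax = value & MASK32

    def set_al(self, value):
        # Writing al replaces only bits 7:0; bits 31:8 keep their old value.
        self.eax = (self.eax & 0xFFFFFF00) | (value & 0xFF)


def run_example():
    regs = EaxFamily()
    regs.set_eax(302)  # mov eax, 302
    regs.set_al(5)     # mov al, 5
    ebx = regs.eax     # mov ebx, eax
    return ebx, regs


if __name__ == "__main__":
    ebx, regs = run_example()
    print(hex(ebx), ebx)
```

Every view of the register is derived from one stored value here, which is exactly the information the naive three-statement RTL throws away.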

Can we improve on this? I am currently writing some code that will search the procedure for any use of the 16-bit and 8-bit overlapping registers. If it finds no use of a particular register, it will not output statements to define that register. So, for the code above, the RTL would only be:

*32* r24 := 302
*8* r8 := 5
*32* r24 := r24@31:8 | zfill(8, 32, r8)
*32* r27 := r24


This is readable at decode time, and very early in the decompilation stage it reduces to:

*32* r24 := 302
*8* r8 := 5
*32* r24 := 261
*32* r27 := 261


Dead code elimination will then remove the first statement. The second statement will go away when unused statements are removed, because r24 will inevitably become an output of the procedure, and the decompiler will not add r8 as an output since r24 already covers it.
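The filtering idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Boomerang's actual code: the Stmt record, the synthetic flag and the SUB_REGISTERS set are hypothetical, and "use" is taken to mean a read by a statement that came directly from a decoded instruction. The register numbers (r24 = eax, r0 = ax, r12 = ah, r8 = al) follow the RTL above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    size: int                  # width of the assignment in bits
    dest: str                  # register being defined
    expr: str                  # right-hand side, kept as text for display
    uses: frozenset            # registers read by the right-hand side
    synthetic: bool = False    # True if added only to model register overlap


# Overlapping views of eax, using the register numbers from the RTL above.
SUB_REGISTERS = {"r0", "r12", "r8"}


def prune_unused_overlaps(stmts):
    """Drop synthetic definitions of sub-registers that no decoded statement reads."""
    used = set()
    for s in stmts:
        if not s.synthetic:
            used |= s.uses
    return [
        s for s in stmts
        if not (s.synthetic and s.dest in SUB_REGISTERS and s.dest not in used)
    ]


def render(s):
    return f"*{s.size}* {s.dest} := {s.expr}"


def U(*regs):
    return frozenset(regs)


# The fully expanded RTL for the three-instruction example.
FULL_RTL = [
    Stmt(32, "r24", "302", U()),
    Stmt(16, "r0", "truncu(32, 16, r24)", U("r24"), synthetic=True),
    Stmt(8, "r12", "r24@15:8", U("r24"), synthetic=True),
    Stmt(8, "r8", "truncu(32, 8, r24)", U("r24"), synthetic=True),
    Stmt(8, "r8", "5", U()),
    Stmt(32, "r24", "r24@31:8 | zfill(8, 32, r8)", U("r24", "r8"), synthetic=True),
    Stmt(16, "r0", "truncu(32, 16, r24)", U("r24"), synthetic=True),
    Stmt(32, "r27", "r24", U("r24")),
]

if __name__ == "__main__":
    for s in prune_unused_overlaps(FULL_RTL):
        print(render(s))
```

Run on the eight-statement version, the sketch keeps the same four statements shown earlier. The synthetic update of r24 survives because its destination is the full 32-bit register, not one of the overlapping views. A real implementation would also have to treat calls and procedure outputs as potential uses.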
