Posts

Showing posts from July, 2006

And then, some calls just don't return

A small number of library procedures are "called" but never actually return. Eventually I'd like to have a way to specify these procedures with anotations in the signature files, but for the moment they are hard coded in the frontend. That's acceptable for the moment as there is only five: _exit, exit, ExitProcess, abort and _assert . Thing is, what happens when you have a branch over one of these calls, as you often do. Typically you end up with phi statements that refer to the call or the statements before it because there's no way to show that these definitions are killed by the call but not defined by it. We could add this to the SSA form but a simpler solution is available. Whenever the decompiler determines that the destination of a call is one of these no return procedures then we simply drop the out edge of the containing basic block. Without an out edge the definitions cannot flow beyond the call. Using dataflow based type analysis and some of m

MinGW's tricky prologue code

Continuing with my ongoing test program extract_kyra.exe from the scummvm tools I've been looking at the very first call in main . It would appear that this exe was compiled with stack checking runtime code enabled. That very first call is to a short little procedure that takes a single parameter in the eax register; the number of bytes to subtract from esp . Here's a disassembly of the procedure: push ecx mov ecx, esp add ecx, 8 loc_405336: cmp eax, 1000h jb short loc_40534D sub ecx, 1000h or dword ptr [ecx], 0 sub eax, 1000h jmp short loc_405336 loc_40534D: sub ecx, eax or dword ptr [ecx], 0 mov eax, esp mov esp, ecx mov ecx, [eax] mov eax, [eax+4] jmp eax It not only s

Chronicle of a Decompilation

Image
I've been promising to do this for a while, so here goes. I have downloaded a small open source utility called extract_kyra which is part of the scummvm tools . It doesn't really matter what the utility does. In fact, I prefer not to know as it gives me an unfair advantage compared to, say, decompiling malware. What's important is that this tool is under the GPL, so it is permisible for me to decompile it. I will be writing a number of these posts that describe all the issues I've run into and the steps I've completed to produce the decompilation. I promise that I will not go and download the source code for this program until such time as I believe my decompilation is "complete". The first problem with decompiling this program is that Boomerang has failed to find main() . I can tell this because the load phase of the workflow does not list main as a found entrypoint, it only lists start . This is because this exe was compiled with MinGW and it

Bi-directional Dataflow Type Inference

After reading this paper I've starting implementing a brand new type analysis in Boomerang. The first step is calculating reverse dominance frontiers. A regular dominance frontier is the set of nodes that have a predecessor that is dominated by the given node, but is not itself dominated. These are the nodes where phi statements are to be placed when transforming a program into SSA form. The reverse dominance frontier is much the same, except it is the successors that must be post -dominated. Boomerang already calculates immediate post-dominators for use by the code generation phase, but we've never before had a use for reverse dominator frontiers. The paper describes an extension to SSA form called static single information form (SSI) which introduces a new kind of assignment: the sigma function statement which are placed on the reverse dominator frontiers. The purpose of this new statement is to split definitions before uses on different code paths. I will be using

Short Circuit Analysis

I've checked in some code that detects branches to branches and merges them if they meet some basic requirements. As such, Boomerang can now generate code like: if (a < b && b < c) { // x } else { // y } instead of generating a goto statement. I've checked in a couple of test programs that exercise this analysis. I havn't looked at how this analysis effects loops or more obscure control structures.. so there could well be bugs here.

Unfinished Assortments

I received an email of inspiration from Mike a week ago outlining how similar conjunctions and disjunctions in short circuited if statements are at the binary level. After looking at the problem myself I found it was a pretty simple problem. If one of the two out edges of a branch node is another branch node which contains only a branch statement, and the destination of that statement is shared with the first branch statement then you can merge the condition of the second branch into the condition of the first branch. Exactly what combination of fall/taken edges are shared by the two branches is what determines whether an || or a && should be used. This is a pretty easy transformation to do just before code generation (or just after decompilation) and I'm about half way through implementing it. Unfortunately I got sidetracked. My work, you know, the people who pay me, they have me doing - wait for it - Java development. I moaned about not just being a one trick pony