Saturday, May 27, 2006

A(nother) GUI for Boomerang

Quite a while ago I attempted to write a GUI for Boomerang. In fact, I've done this a couple of times. The stalling point has always been: what good is a GUI? Decompilers are supposed to be automatic. You should be able to give a decompiler an executable and trust it to spit out a C file that meets some definition of program equivalence with that executable. So if the decompiler works perfectly, what is there for the user to do? Surely anything they can offer will be more productively applied to the output, and for that they can just use standard source code manipulation tools.

Well, there are two problems with this line of thinking. First, there's the sad fact that no decompiler is perfect. In fact, the state of the art is far, far from adequate, let alone perfect. Second, standard source code manipulation tools are woefully underpowered for even the simplest tasks of traditional reverse engineering (where traditional means "starting from source code"). For example, renaming a function or a variable is still a completely manual process for most C programmers. Java programmers might have a few fancier tools - what they call "refactoring" tools - but for C programmers it's basically a text editor and sometimes something like Cscope.

So there certainly is a need for a GUI for a decompiler. I started by documenting the workflow of Boomerang. I figured we need to present this workflow as a key paradigm to the user, so it's best to get it right up front. I identified five key steps:
  1. Initialize
  2. Load
  3. Decode
  4. Decompile
  5. Generate Code
A sixth step, making the code maintainable, is more freeform than the rest, so I've not presented it explicitly in the workflow tab of the GUI. At each stage, widgets present information to the user so they can monitor the progress of the decompiler. The user is also given the opportunity to enter or update the information that will be used in the next step.
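To make the workflow concrete, here is a minimal sketch of how the five stages could be modelled as a strictly ordered sequence. This is not Boomerang's actual code: `Stage`, `stageName`, and `Workflow` are hypothetical names chosen for illustration, and the only thing taken from the workflow itself is the order of the stages.

```cpp
#include <cassert>
#include <string>

// The five workflow stages, in the order the user moves through them.
enum class Stage { Initialize, Load, Decode, Decompile, GenerateCode };

// Label for a stage, e.g. for a workflow tab. (Hypothetical helper.)
inline std::string stageName(Stage s) {
    switch (s) {
        case Stage::Initialize:   return "Initialize";
        case Stage::Load:         return "Load";
        case Stage::Decode:       return "Decode";
        case Stage::Decompile:    return "Decompile";
        case Stage::GenerateCode: return "Generate Code";
    }
    return "Unknown";
}

// Tracks the current stage. The user reviews and edits the inputs of the
// next stage, then asks the workflow to advance.
class Workflow {
public:
    Stage current() const { return current_; }

    // Moves to the next stage; returns false if already at the last one.
    bool advance() {
        if (current_ == Stage::GenerateCode) return false;
        current_ = static_cast<Stage>(static_cast<int>(current_) + 1);
        return true;
    }

private:
    Stage current_ = Stage::Initialize;
};
```

Keeping the stages in a single ordered enum means the GUI only ever needs to ask "what comes next?", which matches the idea of presenting the workflow itself as the key paradigm.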

So far, the user can see and edit the entry points of the program and where each section of the program is loaded into memory before proceeding to the decode step. The user is then presented with the procedures of the program as they are decoded, along with any library procedures that statements have been found to call. I hope to allow the user to specify information about unknown library procedures, and to remove any unwanted procedures, before proceeding to decompilation.

At decompilation time, the user is shown a truncated call graph built as the decompilation progresses. Procedures that have been decompiled or are currently under decompilation are shown in blue type, whereas procedures that have been delayed awaiting the completion of the decompilation of their children (or as a result of recursion) are left in black type. The procedure currently under work is made the current selection of the tree and all parent nodes are expanded. Double clicking on any procedure in the tree will open a new tab containing the RTL of that procedure. I hope to allow the user to manipulate the RTL before proceeding to the next step.
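The colouring rule can be illustrated with a small sketch. It assumes a plain depth-first, callees-first traversal in which a call back to a procedure that is still waiting (recursion) is simply skipped; Boomerang's real handling of recursion is more involved, and `CallGraph`, `ProcState`, and `Decompiler` here are hypothetical names, not its API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Unvisited and Delayed procedures are shown in black; a procedure turns
// blue once its own decompilation has started or finished.
enum class ProcState { Unvisited, Delayed, Decompiled };

// Caller name -> names of the procedures it calls.
struct CallGraph {
    std::map<std::string, std::vector<std::string>> callees;
};

class Decompiler {
public:
    explicit Decompiler(CallGraph g) : graph_(std::move(g)) {}

    // Decompiles the children of proc first, then proc itself.
    void decompile(const std::string& proc) {
        // Already finished, or a recursive call back to a delayed procedure.
        if (state(proc) != ProcState::Unvisited) return;

        state_[proc] = ProcState::Delayed;  // waiting on children: black
        auto it = graph_.callees.find(proc);
        if (it != graph_.callees.end())
            for (const auto& child : it->second) decompile(child);

        assert(state(proc) == ProcState::Delayed);
        state_[proc] = ProcState::Decompiled;  // now blue
        order_.push_back(proc);
    }

    ProcState state(const std::string& proc) const {
        auto it = state_.find(proc);
        return it == state_.end() ? ProcState::Unvisited : it->second;
    }

    // Colour used for the procedure's entry in the call graph tree.
    static std::string colour(ProcState s) {
        return s == ProcState::Decompiled ? "blue" : "black";
    }

    // Procedures in the order their decompilation completed.
    const std::vector<std::string>& order() const { return order_; }

private:
    CallGraph graph_;
    std::map<std::string, ProcState> state_;
    std::vector<std::string> order_;
};
```

With a graph where `main` calls `f` and `g`, `f` calls `g`, and `g` calls itself, the sketch finishes `g` first, then `f`, then `main`, so the tree turns blue from the leaves upwards.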

The code generation stage presents the user with a tree of clusters (directories and files) to which the code of each procedure will be generated. The user can add new clusters and drag procedures from cluster to cluster. Double clicking on a file will open a new tab (or an existing one if already opened) with the contents of the file. Double clicking on a procedure will do the same but scroll the contents such that the selected procedure is at the top of the window. At the moment, the only operations available for manipulating the contents of the file are the usual text editor commands. However, I hope to offer a range of commands, such as renaming procedures and variables, that make improving the quality of the output easier.
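Dragging a procedure from one cluster to another boils down to a small bookkeeping operation. The sketch below is only an illustration under simple assumptions: clusters are identified by a path string, each procedure lives in exactly one cluster, and `ClusterTree` with its member functions is a hypothetical class, not part of Boomerang.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

class ClusterTree {
public:
    // Creates an empty cluster (file) if it does not already exist.
    void addCluster(const std::string& path) { clusters_[path]; }

    // Puts proc into the given cluster, removing it from its previous
    // cluster if it had one. Returns false if the cluster is unknown.
    bool moveProc(const std::string& proc, const std::string& path) {
        auto target = clusters_.find(path);
        if (target == clusters_.end()) return false;

        auto owner = owner_.find(proc);
        if (owner != owner_.end()) clusters_[owner->second].erase(proc);

        target->second.insert(proc);
        owner_[proc] = path;
        return true;
    }

    // Cluster currently holding proc, or an empty string if unassigned.
    std::string clusterOf(const std::string& proc) const {
        auto it = owner_.find(proc);
        return it == owner_.end() ? std::string() : it->second;
    }

    // Procedures that will be generated into the given cluster.
    const std::set<std::string>& procsIn(const std::string& path) const {
        static const std::set<std::string> empty;
        auto it = clusters_.find(path);
        return it == clusters_.end() ? empty : it->second;
    }

private:
    std::map<std::string, std::set<std::string>> clusters_;  // path -> procs
    std::map<std::string, std::string> owner_;               // proc -> path
};
```

Keeping a reverse map from procedure to cluster makes a drag-and-drop move cheap, and it also answers the "which file do I open?" question when the user double clicks a procedure.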

An option presented at the initialization stage (and also available from a menu) lets the user monitor the decompilation step at a finer grain. After enabling "decompilation debugging", a Step button is added to the status bar of the GUI. Upon completion of the decode step, the user can select the procedures whose decompilation they want to monitor, or accept the default, all procedures. On proceeding to the decompilation stage, progress will now stop, the status bar will read "decompiling proc1: before initialise", and a new tab containing the RTL of proc1 will be opened. When the user has finished inspecting the RTL of proc1, they can press Step and the decompilation will continue. For each procedure selected in the decode stage, around eight messages will be displayed in the status bar, and the user will need to press Step after each one. Each time, the RTL of the procedure will be updated. This allows the user to spot bugs in the decompiler. At the moment, I'm using this to fix the bugs, but in the future, users may prefer to just fix the RTL.
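The decision of when to stop and wait for Step can be sketched without any GUI code. The following is a simplified, single-threaded illustration: `StepMonitor` and its methods are hypothetical, the phase name is passed in by the caller, and an empty watch list is assumed to mean "all procedures". In the real GUI the pause would block the decompiler until the button is pressed; here the function just reports whether a pause is needed.

```cpp
#include <cassert>
#include <set>
#include <string>

class StepMonitor {
public:
    // Corresponds to switching "decompilation debugging" on or off.
    void setEnabled(bool on) { enabled_ = on; }

    // Adds a procedure to monitor. An empty watch list means "all".
    void watch(const std::string& proc) { watched_.insert(proc); }

    // Called by the decompiler before each phase of a procedure.
    // Returns true if it should pause until the user presses Step,
    // and in that case updates the status bar text.
    bool checkpoint(const std::string& proc, const std::string& phase) {
        if (!enabled_) return false;
        if (!watched_.empty() && watched_.count(proc) == 0) return false;
        status_ = "decompiling " + proc + ": " + phase;
        return true;
    }

    // Text currently shown in the status bar.
    const std::string& status() const { return status_; }

private:
    bool enabled_ = false;
    std::set<std::string> watched_;
    std::string status_;
};
```

A decompiler loop would call `checkpoint` at each of its phases and, whenever it returns true, refresh the RTL tab and wait for the Step button before carrying on.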

I'm writing the GUI in C++ on Windows using the Qt4 toolkit. I will ensure it compiles and works on Linux before I check it into the Boomerang CVS repository. Shortly after that, I will be making binary packages for both Windows and Linux.
