Saturday, June 24, 2006

What license is that MTA?

A Mail Transfer Agent is that bit of software that listens on port 25 and accepts mail when you send it. There are a lot of them available, but which ones are truly free?

I find that a good moral compass on questions of licensing is to look at the OpenBSD project. What they use is typically the most free you can get. So what do they use?

Sendmail, which has these license terms. They're pretty ass. Basically you can distribute it if you're "open source" in the GPL sense of the term, meaning you promise to hand over source code, or if you are "freeware". So yeah, if you want to make a binary-only CD of OpenBSD and include Sendmail, you're going to have to promise whoever you give it to that you'll give them the source if they ask, or else you can't charge them anything more than distribution costs. Seems kind of anti-OpenBSD-philosophy to me. But maybe there's nothing better out there.

What about qmail? Ask anyone and they'll tell you it's more secure than Sendmail, so why doesn't OpenBSD, the poster OS for security, use that? Well, the distribution terms for qmail are about as free as Microsoft's shared source licenses; i.e., they're not.

What about Postfix? They've got a happy mouse with a mail sack logo and everyone loves them, but what license have they got? Well, it's a bit hard to find out. If you dig through the web page you won't find a reference to it, but if you download the source code you'll find it is under the IBM Public License. Which is not only very GPL-like, it is also incompatible with the GPL. Something to do with patents.

Exim is pretty popular. It's under the GPL.

There's the Courier MTA, which most people probably thought was just an MDA; I know I did. It's also under the GPL.

The Apache project has an MTA called James which is written in Java. Ironically, it's the most free. It's under the Apache License 1.1, which has an obnoxious advertising clause but doesn't require you to provide source code. If you like your Java there's also the JBoss MTA, which is under the LGPL.

So I guess if you want a BSD licensed MTA you need to go dig up an old copy of Sendmail. I was honestly surprised and thought OpenBSD would have forked Sendmail way back when it was BSD. OpenMTA? Maybe, someday.

Sunday, June 18, 2006

Order of Decompilation

The order in which you decompile the procedures of a program determines how much information is available for good analysis. In Boomerang we've developed a system of depth-first search of the call graph, which takes into account recursion, so that the most information about parameters is available when needed. For example, if a program consists of two procedures, one called by the other, it makes the most sense to first decompile the procedure which is the leaf in the call graph, so that the procedure that calls it has the most information about what parameters it needs to pass.
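The bottom-up ordering amounts to a post-order walk of the call graph. Something like this sketch, where all the names are made up for illustration (this is not Boomerang's actual code) and recursion is handled by simply skipping back edges:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// caller -> list of callees
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Visit callees before callers; a procedure involved in recursion is
// not revisited when its back edge is reached.
void visit(const std::string& proc, const CallGraph& g,
           std::set<std::string>& inProgress, std::set<std::string>& done,
           std::vector<std::string>& order) {
    if (done.count(proc) || inProgress.count(proc))
        return; // already ordered, or a recursive back edge
    inProgress.insert(proc);
    auto it = g.find(proc);
    if (it != g.end())
        for (const auto& callee : it->second)
            visit(callee, g, inProgress, done, order);
    inProgress.erase(proc);
    done.insert(proc);
    order.push_back(proc); // emitted only after all callees: leaves come first
}
```

Running this from the entry point yields the leaf procedures first, which is exactly the order in which decompiling them gives callers the most parameter information.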

What happens if the leaf procedure takes a parameter that is a pointer to a compound type? From the parameter's usage the decompiler may be able to determine that it is a pointer, and it might even be able to determine that it is a pointer to a compound, but unless every member in the compound is accessed in a way that restricts that member's type sufficiently, the type that the decompiler can determine for the parameter is going to be very vague.

On the other hand, the value that the calling procedure is passing to the leaf procedure may be the result of a library call, or in some other way have a well defined type. As such, from a typing perspective, it would appear to make more sense to do top down processing of procedures in the call graph. However, this can only be done once a bottom up processing has been completed.

As such, I am currently looking at implementing an analysis that detects when additional type information is discovered for a called procedure that has already been decompiled, and marks that procedure as requiring additional processing. Hopefully no processing in the called procedure will need to be "undone".

Tuesday, June 13, 2006

Binary Release Alpha 0.3

I've added a new binary release of Boomerang for win32 to the sourceforge project. The Linux binaries are also up. If you'd like to try making a release for some other platform, please let me know.

What can the new release do? Well, it crashes less. It supports ELF .o files much better than previous releases. It includes some changes that make for better output in some instances. Overall, just general improvements.

What can't the new release do? This is still an alpha release. That means we don't expect you to be able to do very much work with it. Running it on /bin/ls will still give you horrendous output, but try /bin/arch.

Saturday, June 10, 2006

So Many Types and No (Good) Type Analysis

The type analysis in Boomerang remains a nice big mess. There have been three attempts at a type analysis system: dataflow based, constraint based and ad hoc. At the moment the ad hoc analysis gives the best results, and the other two crash, a lot. Sometimes there is an absolute orgy of types in an input program, and the type analysis will assign the type void to a local. I've just added some more ad hoc type analysis that will recognise when the programmer is assigning the elements of an object of unknown type to an object of known type, and copy the known types of those elements to the unknown type. This is very specific, but hopefully it occurs in more than just the one input program I was looking at. In C the programmer would have written something like this:

struct foo *p = someFuncThatReturnsAFoo();
p->name = global.name;
p->count = global.count;
p->pos = global.pos;
p->other = 0;

If that call is to a library proc we will have the struct foo, and know that p is a pointer to one, but for global we probably only know that it has a size of 24 bytes, if we know anything. Clearly we need to invent a type for global, and clearly that type must be a compound consisting of 3 elements, so why not copy the names and types of those elements from what we are assigning to?

The question is: can a generic universal type analysis algorithm do this? Or is this simply a heuristic that works well for some programs but would produce strange and unpredictable results for others?
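The copying rule itself is tiny. Here is a toy sketch of it, with compound types reduced to name-to-type maps and names that are mine, not Boomerang's:

```cpp
#include <map>
#include <string>
#include <vector>

// member name -> type name, a stand-in for a compound type
using Compound = std::map<std::string, std::string>;

// Invent a compound type for the unknown object by copying the names
// and types of exactly those members that were assigned to the object
// of known type.
Compound inferFromKnown(const Compound& known,
                        const std::vector<std::string>& copiedMembers) {
    Compound inferred;
    for (const auto& member : copiedMembers) {
        auto it = known.find(member);
        if (it != known.end())
            inferred[member] = it->second;
    }
    return inferred;
}
```

For the struct foo example above, the members name, count and pos would be copied into the invented type for global, while other (assigned a constant) contributes nothing.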

Friday, June 09, 2006

Conflation of Pointer and Array Types

A common source of confusion for new C programmers is C's conflation of pointers and arrays. I often think of the dynamic semantics of the language when I'm thinking deeply about passing arrays to functions. Typically, you can tell an experienced programmer that C always passes arrays by reference, never by value, and they won't go wrong.

Not all languages are like this, so in Boomerang we try to represent pointers and arrays as separate, non-conflated types. In our type system an array is a type used to describe those bytes in memory that contain a finite number of objects of a particular base type. Similarly, a pointer is a type used to describe those bytes in memory that contain an address, which if followed will reveal a single object of a particular base type.

As such, it is necessary to refer to some things explicitly in the Boomerang type system that are typically implied by the C type system. For example, a C string is often written in the C type system as char *, or "a pointer to a character". This is clearly a misnomer. The only string that is accurately described by this type is the empty string, as C strings must be zero terminated, and if you're only pointing to a single character then that character must be zero. In the Boomerang type system we would write a C string as char[] *, or "a pointer to an (unspecified length) array of characters".

When I wrote Boomerang's header file parser I had the choice of which type system to use. Should I assume the signature file was using the C type system and silently translate the contents into the Boomerang type system? Or should I allow the user to specify types exactly as they would appear if we were to call Type::print on the resultant object? I tried both and found that 99% of the time the C type and the Boomerang type were the same, so the signature file expects the types to be in Boomerang format. This means that occasionally you will see something weird in a Boomerang signature file. For example:

typedef struct {
    int token;
    const char* name;
    OptionValueType type;
    ValueUnion value;
    Bool found;
} OptionInfoRec;
typedef OptionInfoRec *OptionInfoPtr;
typedef OptionInfoRec OptionInfoRecs[];

typedef OptionInfoRecs *AvailableOptionsFunc(int chipid, int bustype);

What's that function pointer type returning? It's a pointer to an array of OptionInfoRec structs. In C we'd just use OptionInfoPtr, because we assume the programmer knows that an AvailableOptionsFunc will return more than just one OptionInfoRec. In fact, an AvailableOptionsFunc is supposed to return a pointer to a null terminated array of OptionInfoRec structs, where "null terminated" means most of the members of the last OptionInfoRec are zero. It's pretty hard to define a sensible type for that, but C programmers work with types like that all the time so we have to try.
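A consumer of such a null-terminated array has to test the members themselves for the sentinel. Here's a sketch with a cut-down, two-member stand-in of my own rather than the real X.org definition:

```cpp
#include <cstddef>

// Cut-down stand-in for OptionInfoRec: the array ends with a record
// whose members are all zero.
struct OptionRec {
    int token;
    const char* name;
};

// Count the records before the zeroed sentinel record.
std::size_t countOptions(const OptionRec* recs) {
    std::size_t n = 0;
    while (recs[n].token != 0 || recs[n].name != nullptr)
        ++n;
    return n;
}
```

Nothing in the C type OptionInfoRecs * tells you about the sentinel convention; that contract lives entirely in code like the loop above, which is exactly why a sensible type for it is so hard to write down.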

This also means that in the code generator for C we have to recognise when certain operations are not needed. For example, this assignment:

344 *32* m[local6][local1].name := "foo"

would cause code like this to be emitted:

OptionInfoRecs *local6;
int local1;
...
(*local6)[local1].name = "foo";

which is correct, but is not very pretty. We're not using the full syntactic sugar of the language. This is much better:

OptionInfoRec *local6;
int local1;
...
local6[local1].name = "foo";

and is probably how the programmer wrote it in the first place.

Thursday, June 08, 2006

Reverse Strength Reduction

Strength reduction is a compiler optimisation that tries to replace expensive operations involving a loop variant with less expensive operations. The most common example of strength reduction is the replacement of array indexing with pointer arithmetic. Consider this program fragment:

for (int i = 0; i < 25; i++)
arr[i] = 9;

where arr is an array of integers. An unoptimised compilation, in RTL form, might look like this:

1 *32* r25 := 0
2 BRANCH 6
Highlevel: r25 >= 25
3 *32* m[a[arr] + r25 * 4] := 9
4 *32* r25 := r25 + 1
5 GOTO 2

which would actually be some very nice RTL for the decompiler to discover, because it is easy to recognise that arr is an array with stride 4 and replace statement 3 with:

3 *32* arr[r25] := 9

Unfortunately, the compiler will have gotten to the RTL before we have, and most any compiler will do some strength reduction to get rid of that multiply on the left side of statement 3. So what the decompiler will see is more like this:

1 *32* r25 := 0
2 BRANCH 6
Highlevel: r25 >= 100
3 *32* m[a[arr] + r25] := 9
4 *32* r25 := r25 + 4
5 GOTO 2
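In C terms, the strength-reduced RTL above corresponds to something like this sketch (the function is my own illustration, not output from any tool): the counter now steps in bytes, the bound is 100 instead of 25, and the multiply is gone.

```cpp
// Strength-reduced form of "for (i = 0; i < 25; i++) arr[i] = 9":
// the index register walks in byte steps of 4 up to offset 100.
void fill(int* arr) {
    for (char* p = (char*)arr; p != (char*)arr + 100; p += 4)
        *(int*)p = 9;
}
```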

What we need to do is some analysis that will reverse the strength reduction and turn this RTL back into the preferable RTL. As it turns out, this isn't too hard to do for simple loops. Here's what the RTL looks like once it is in SSA form:

1 *32* r25 := 0
81 *32* r25 := phi(1, 4)
2 BRANCH 6
Highlevel: r25{81} >= 100
3 *32* m[a[arr] + r25{81}] := 9
4 *32* r25 := r25{81} + 4
5 GOTO 81

We can simply do a pass over the statements of the procedure looking for any assignment of the form x := x{p} + c, where c is a constant and p is a PhiAssign that has only two elements: the assignment we're looking at and another assignment of the form x := 0. This is a "high level pattern", and when matched it indicates that we can apply a "transformation". All we have to do to transform this RTL into the array indexing form is replace every occurrence of r25{81}, except for the one in the increment statement, with r25{81} * c, and replace c in the increment statement with 1. Another high level pattern will recognise m[a[arr] + r25{81} * 4] and replace it with arr[r25{81}].

Wednesday, June 07, 2006

Using Qt4 and the Boehm Garbage Collector on Linux

I had a major issue where the Boehm garbage collector was crashing and spitting errors because of my use of Qt4's QThread class. The problem was simple enough: Qt4 calls pthread_create when it should be calling GC_pthread_create. I could have solved this problem by modifying qthread_private.cpp to do this, but that would mean distributing a modified Qt4, which is just silly for such a small change. So after much annoyance, I managed to come up with a solution that, although not pretty, seems to work. As such, there will be a Linux version of the GUI available to download when I make binary packages sometime in the next week.

Forcing a Signature

A while ago I added a bool to the Signature class that allowed the user to specify that the signature was already fully formed and did not need any processing. This was part of the "symbol file" hack that we used to do procedure-at-a-time decompilation using the command line tool. I noticed today that we were not honouring the forced bit anymore; for example, we were removing unused parameters and returns from the signature, so I fixed that. It occurred to me that any procedure we discover via a function pointer is an excellent candidate for setting the forced bit on. The results were pretty spectacular, as locals and globals were soon inheriting type information from the parameters of the signature. Unfortunately, the same could not be said of returns. In particular, consider some code like this:

mystruct *proc5()
12 { *v** r24 } := malloc(343)
13 m[r24{12} + 4] := "foo"
14 RET r24{12}

It's pretty clear that any local we create for r24 should be of type mystruct*, but the first definition of r24 has us assigning it a type of void*. We could use some kind of lattice logic to determine that mystruct* is "better" than void*, but the fact that the signature is forced tells us that there is no debate: r24 must have the type mystruct*. This analysis can be done in the early decompilation stage, but where do we put the type? At the moment I'm considering changing the type of the return in the call statement. So line 12 will become:

12 {*mystruct** r24 } := malloc(343)

and immediately this type will flow on to turn statement 13 into something similar to:

13 m[r24{12}].name := "foo"

which is much more preferable.

Tuesday, June 06, 2006

Types, Globals and Varargs

I have a sample input program that has some code similar to this:

228 { *32* r24, *32* r28 } := CALL knownLibProc( .. arguments .. )
..
307 *32* m[r24{228}] := 232
308 *32* m[r24{228} + 4] := 91
309 *32* m[r24{228} + 8] := "some string"

where knownLibProc returns a pointer to a struct in r24. Early in the decompilation this type will be propagated into statements 307, 308 and 309, producing:

307 *32* m[r24{228}].size := 232
308 *32* m[r24{228}].id := 91
309 *32* m[r24{228}].name := "some string"

Our intermediate representation doesn't have an operator equivalent to C's -> operator; the above is more like writing (*p).size, but the backend takes care of that and will emit a -> instead. Unfortunately I was getting an assert fault before I even got to that. The problem was that the 228 instance of r24 was being assigned a local variable, and that local was not inheriting the return type of the call. So the ad hoc type analysis would take a look at an expression like m[local30].size and come to the conclusion that m[local30] has an impossible type, because the type of local30 was int.

Fixing this was not as easy as it should be, because ad hoc type analysis is such a big mess. Investigating this bug, I found that globals passed as arguments to library procedures were not being typed with the obviously valuable type information in the libproc's signature. I then discovered that even when they were typed correctly, the initial values for those globals were not being calculated correctly. In the case of a struct type (called a compound type in Boomerang) we weren't calculating any initial value at all. This was obviously a terrible oversight.

Finally, I've found a problem with calls that take a variable number of arguments. I was under the impression that the new defcollectors were effectively collecting live statements and adding them as appropriate, but apparently not. For the binary I am working on, it appears that the last argument of a vararg call is always 0, so I should be able to put a hack in to add arguments to the call until I hit one with that constant.

Otherwise, the UI continues to evolve. I can now view struct information for any named type, which is particularly useful for sorting out padding issues. One day I might consider an analysis to determine padding for a struct automatically from use information, but for the moment it's easiest to just write a sample program with the original header information, calculate byte offsets to members and compare them with the bit offsets in the parsed signature information.
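Calculating those byte offsets in the sample program is a one-liner per member with offsetof. A sketch with a hypothetical struct (the exact offsets depend on the ABI, which is the whole point of measuring them):

```cpp
#include <cstddef>

// Hypothetical struct standing in for a type parsed from a signature
// file. On most ABIs the compiler pads tag out to int alignment, so
// count usually lands at offset 4, not 1, and offsetof() reveals it.
struct Example {
    char   tag;
    int    count;
    double value;
};
```

Printing offsetof(Example, tag), offsetof(Example, count) and offsetof(Example, value) from such a program gives the compiler's real layout, which can then be compared against the offsets recorded in the parsed signature information.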

Sunday, June 04, 2006

Memory Leaking and Automatic Collection

I checked in a change to Makefile.in today that lets one disable the garbage collector more easily. I then tried out a few memory leak detection tools. First I tried ccmalloc. I couldn't even get it working; it just crashes, even with the sample program on the web site. Then I gave mpatrol a go. I'd heard good things about mpatrol. Unfortunately it doesn't seem to like dynamic loading of shared libraries, and (for no good reason) we don't link to the loaders statically in Boomerang. So I gave up and installed valgrind. It still rocks. Not only did it tell me how much memory we were leaking and where from, it also told me about some use-before-define errors I wasn't aware of. I checked in fixes.

Next, I had a look at Boost's shared_ptr class. I'm hoping to figure out a way to easily add something like this to replace the garbage collector. Unfortunately, the use of smart pointers is anything but easy. You'd think that you could define something like this:

typedef boost::shared_ptr<prog> ProgPtr;

and then replace every occurrence of Prog* with ProgPtr. Of course, that doesn't work. Not only do you have the problem of no operator= from a raw pointer being defined for a shared_ptr, but you also have the problem of where to put this declaration. You can't put it after the definition of the Prog class, because how do you forward declare ProgPtr if your Prog class includes any references to ProgPtr? So you try to put the typedef before the class definition, but then you get errors which indicate that a forward declaration of Prog is not sufficient.
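For what it's worth, the forward declaration problem does have a way out: shared_ptr tolerates an incomplete type everywhere except construction and destruction, so the typedef can sit between the forward declaration and the class definition. A sketch using std::shared_ptr, the standard descendant of boost::shared_ptr (and note that reference counting still leaks on cycles, which a garbage collector handles for free):

```cpp
#include <memory>

class Prog;                              // forward declaration suffices here
typedef std::shared_ptr<Prog> ProgPtr;   // ok: Prog may be incomplete

// The class itself can freely use the typedef, even for members that
// point back at Prog, because shared_ptr allows incomplete types.
class Prog {
public:
    ProgPtr child;
};
```

This compiles and behaves as expected, though converting a large codebase still means touching every new and every raw-pointer assignment, so the "thousands of delete statements" comparison below isn't far off.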

In many ways, this is the greatest strength of the garbage collector: it's a drop-in replacement. I'm starting to think that adding all those thousands of delete statements would be less trouble than using smart pointers. If anyone knows how to migrate to smart pointers without so much pain, I'd appreciate an email.

Friday, June 02, 2006

Debugging the Decompiler

One of the most useful features of the new GUI will be the ability to step through a decompilation and inspect the RTL at each step. To date I have implemented a Step button that allows the user to inspect a procedure before each major phase of the decompilation on that procedure. In the future, I intend to add more debugging points, perhaps even to the resolution of a single RTL change. I expect that some way for the user to specify the desired level of resolution will be required. Whether that is a bunch of menu options, a spinner, or even multiple Step buttons (step to next change, step to next analysis, step to next phase, step to next procedure, etc.), I haven't decided.

The UI already has a coarse form of breakpoints. At the decoding phase you can specify which procedures you want to inspect, and the decompilation will run without stopping until it gets to one of those procedures. It would be sensible to allow the user to set a breakpoint on a particular line of the RTL and run the decompilation until that line is changed.

Mike has sent me a paper which defines the term "time travel debugging", where the user has not only the ability to step forward, but also the ability to step backwards. This is particularly useful in conjunction with breakpoints. If something has gone wrong, you need only place a breakpoint on that something and run backwards until the breakpoint fires; then you can see what broke. Implementing something similar in Boomerang is definitely possible, but it requires something like the current "memo" system to checkpoint the RTL at appropriate intervals.
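The checkpointing side of that is straightforward. A toy sketch, with the RTL reduced to a string and names that are mine rather than the real memo classes:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Record a snapshot of the IR at each debugging point so the user can
// step backwards through the decompilation later.
class MemoHistory {
    std::vector<std::string> snapshots;
public:
    void checkpoint(const std::string& rtl) { snapshots.push_back(rtl); }

    // Fetch the state as it was n steps before the latest checkpoint;
    // returns false when the history doesn't reach back that far.
    bool rewind(std::size_t n, std::string& out) const {
        if (n >= snapshots.size())
            return false;
        out = snapshots[snapshots.size() - 1 - n];
        return true;
    }
};
```

The real trade-off is granularity: snapshotting whole-procedure RTL at every single change would be expensive, which is why checkpoints "at appropriate intervals" (per phase, say) are the practical choice.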

Multithreaded Garbage Collection on Linux

I tried compiling the Boomerang GUI on Linux last night. After much effort getting Qt4 to compile, I was hopeful that everything would just work. Unfortunately the trick I used on WIN32 to get the garbage collector to only collect memory allocated by the decompiler thread doesn't work on Linux. Apparently you're supposed to call GC_pthread_create instead of pthread_create to start new threads; well, I'm not calling either, I'm getting Qt to create my thread for me. So what to do? I guess I could modify Qt to use GC_pthread_create, but that means any binaries I distribute would have to include my modified Qt. I'm going to look into ways to register a thread with the allocator directly.

Another alternative is to just stop all this garbage collector nonsense, but modifying Boomerang to delete objects when appropriate is just out of the question. I have seriously considered using smart pointers, possibly from Boost, but as of yet I've not made a firm decision. It would be nice to remove the libgc dependency, especially as I have been thinking of removing the libexpat dependency recently.

Thursday, June 01, 2006

Displaying Pretty RTL

I did some fixes to the html output option in the various print() functions of Boomerang today. This is all so I can display RTL as pretty as possible. I'm thinking that hovering the mouse over a RefExp should highlight the line that is referenced by it. That's my first goal, and then I'll work on a context menu. All this is possible because I can hide pointers in the html output which refer back to the intermediate representation. Qt4 has the advantage that good html display widgets are standard parts of the toolkit. What I don't intend to do is to allow freeform manipulation of the RTL. That would require an RTL parser, which I'm simply not in the mood to write, at least this month.