Tuesday, April 24, 2007

FreeSynd version 0.2 released

The never ending effort to recreate a game from 1993 continues. There's now nine developers in the project and none of us do much. Every now and then I get bored and fix something.. go to check it in and discover that, shock, someone else has checked something in too! Then I have to update and resolve conflicts. Take the good with the bad I guess.

Anyway, in release 0.2 you will be happy to discover that we have done some work to get sound faithfully reproduced. For a while, we were focusing on music and trying to get the SDL midi synthesiser to play back this ancient XMIDI format. Believe it or not, this was actually acheived. Slight problem though, Syndicate used custom instruments. This was not uncommon at the time. The result was that the music just didn't sound good.. but it was good enough to ignore for a while.

Having declared that I wasn't going to touch sound.. I got bored one day and changed my mind. I told myself that I really ment I wasn't going to touch music so fiddling with the sound effects was fine. Turns out the sound effects were stored in the data files in much the same way as the sprites are, so it wasn't difficult to get it going. Pretty soon I had the whooshy menu sounds going.. so then I moved onto the weapons and other in-game sounds.

After completing that I figured, what the hell, might as well look at music. Turns out that SDL_Mixer uses a library called timidity to convert MIDI to wav output.. I actually thought it used whatever real MIDI hardware you had.. but apparently not. Anyway, SDL_Mixer will read a file called timidity.cfg in the current directory, in which you can specify custom instruments. You can download packs of these instruments, along with a suitable config file, from various places on the web. At first I figured this was the solution to the custom instrument problem.

Unfortunately, Syndicate predates all this stuff. Sure, MIDI was around when Syndicate came out. Sure, XMIDI is a MIDI format. But your average home PC that Syndicate ran on didn't have MIDI hardware.. it had an adlib card, or a sound blaster.. so that's what Bullfrog made their music for. Some people owned Amigas, and I'm sure Syndicate for the Amiga used the Amiga MIDI hardware to play beautiful sound.. but on a PC of the day, the best you could hope for was that the user had a Gravis Ultra Sound card.. and that's was so rare that Bullfrog never bothered to release GUS support for Syndicate.

Which is a shame, cause it turns out that the instrument format used by timidity is none other than the GUS instrument patch format. So, if Bullfrog had supported the GUS when they released Syndicate, we'd have faithful XMIDI playback right now. But they didn't.

Of course, if anyone out there knows how to make GUS patches from a sample.opl file, please, send me an email.

Instead, what I did was take some advice that someone gave me a while ago: just press CTRL-F6 in dosbox and save a WAV of the original game music. I then edited the wav in Audacity, exported it as an mp3, dumped it in the FreeSynd directory and told SDL_Mixer to play it. I did this for the intro and the great mood music for actual gameplay.

Of course, that gave me the problem that the intro music (and sound effects) were not synchronized with the FLI playback. So I tried fiddling with the frames-per-second of the FLI.. that gave me about half a solution.. but the real problem was that there are actual pauses in the intro.. there's a big table of them. They mainly correspond with those captions you see down the bottom of the screen (eg "CITY NAME: NEW HESSEN EUROPE") which I also wasn't doing. Once I started displaying the captions and doing the pauses, the mp3 started to line up.

So there ya go.. another release of FreeSynd. Right on track to being feature complete by 2013.

Wednesday, April 11, 2007

Boomerang 2

I was recently asked what I would do differently if I was writing a decompiler from scratch today. Having worked on decompilers for the past 9 years myself and worked with others who have worked on decompilers for longer still, I like to think we now know enough about it to recognise some of the mistakes we've made and identify some areas of improvement.

The primary difficulty of decompilation is that most analysis requires interprocedural information. Most everyone recognises this need early on so there's an assumption that you need to keep the intermediate representation for the entire program in memory at the same time. For many years we've had a hell of a lot of memory to play with (4 gig will be standard on desktop PCs next year) and with 64 bit architectures and virtual memory you can swap to disk for as much memory as you need. So it would appear that structuring a decompiler like a compiler, with intermediate disk formats for each compilation module is unnecessary. I think this is a mistake.

Having intermediate disk forms is more than just a solution to the memory issue. For a start, there's the ability to start and stop the analysis. In compilation, this is used to great effect when making incremental changes to source files; the files that are not changed do not need to be recompiled to their intermediate disk forms. It's a rare use case where one incrementally changes an input binary to a decompiler, (this does happen though, for example, when doing incremental comparisons of different versions of a program), but a decompilation takes a long time and a single bug can crash the decompiler resulting in hours of lost work. Interruptability alone is justification for intermediate disk forms, but there's another reason.

Because decompilation is still experimental engineering, inspection of intermediate forms is important. The Boomerang decompiler had a number of ways in which we could inspect the intermediate representation. We could stop the decompilation early, print out call graphs, dataflow graphs and register transfer language (RTL). A number of attempts were made at producing a GUI.. the primary purpose being to provide feedback on the progress of the decompilation and to allow inspection at the different stages. Tools to convert intermediate disk forms (which should be stored as efficiently as practical) into a form that is presentable to an engineer are much better seperated from the GUI, which now becomes primarily for feedback of progress.

This suggests another benefit of intermediate disk forms: no longer is it necessary to have a single tool, "the decompiler", to work on the intermediate representation. The decompiler can be seperated into individual tools which do a specific job and do it well. So what kind of tools do we need?

Firstly, a tool which uses a loader library to put the executable image into memory, and an instruction decoder library to turn individual instructions into intemediate representation, would use the variety of techniques that we've developed over the years to find all the code in an executable and, using another library, output it in the intermediate disk forms.

The inspection tools can be used at this point to confirm that all the code has been found and is loaded correctly. Because the intermediate representation is on disk, in a collection of files, the engineer can make a backup of these files and the decoding stage not need be repeated should some later analysis mess up the decompilation.

Following the decoding stage, a lot of analysis and data gathering occurs. Converting the intermediate representation to SSA form. Performing use/def analysis, liveness analysis, parameter analysis, etc. All this information could go into seperate files.. that way the previous intermediate forms can be kept around to do other analysis on, and an invalid analysis result can be recalculated without affecting other analysis phases.

After collecting all this information, the process of decompilation is essentially just transforming the intermediate representation from one form to another. As such, the workhorse of the decompiler is the transformation tool. It should load up the appropriate parts of the intermediate representation, read a script written in a domain specific language and then write back out the intermediate representation.

Often a transformation will require the incremental update of an analysis. I believe this can be acheived using a simple timestamp system on the intermediate disk forms. Yes, what I'm saying is that something similar to a Makefile can be used to drive the entire decompilation process.

The last tool, of course, is the code generation tool. For each designed language we need a tool that will take the intermediate representation and produce "source code" files.

That's enough about architecture. Already I've described a decompiler system which is very different to anything that has been built todate. If you're thinking about writing a decompiler, I hope you'll consider something similar to this.

Another thing I like to think I've learnt over the years of working on decompilers is that it sucks having to come up with an intermediate representation all by yourself. In fact, coming up with an intermediate representation for compilers could be a life time obsession all on its own. There's a lot of schools of thought on the subject. Obviously, something like the tuples of simple compiler is not a lot of good for decompiler. One needs to at least represent all the statements of a high level language as well as the individual instructions you might see in an assembly listing. There's a lot of work in it.. so why not take some advice from people who think about this kind of stuff?

The LLVM Compiler Infrastructure Project has a well thought out "virtual instruction set" which allows such complicated operands that it can express high level languages to such a degree that a simple PERL script can turn a C program into LLVM instructions (not that I recommend that). Perhaps more importantly, it has a mature, efficient on-disk representation and a number of useful analysis phases that work on it. There's an active community where you can report bugs, get help and recruit interested parties. I think you could do worse.