Monday, May 29, 2006

Woe of the Instruction Decoder

Boomerang uses the NJMC toolkit to decode instructions. The decoders are the files in frontend/machine/pentium (or sparc or ppc or whatever you're interested in). We chose this technology because it meant we didn't have to write code to implement a new architecture; we could just write a "specification" file. Unfortunately, the NJMC toolkit is slowly rotting. It is hard to build. I've never built it. Mike has built it a couple of times (and failed a lot more times). Every architecture is different and no one maintains it. We also have some issues with the code it generates: it produces huge cpp files which struggle to compile on some build targets and make the resulting binary much bigger than it could be.

So how much work is it to replace? I considered writing a new tool that would take the same input files as the NJMC toolkit and generate the same output, but that only solves half the problems. Then I started to wonder: what's wrong with just taking a simple disassembler and modifying it to return a DecodeResult? There's even some BSD code available for a number of architectures that we could base it on. We could modify PentiumFrontend::decodeInstruction to call both the NJMC toolkit generated code and the new code. If the results differ, we output a warning to the LOG file and use the result from the original code. Once all the warnings have been fixed, we can retire the original code.
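To make the side-by-side idea concrete, here is a rough sketch of what the comparison could look like. It is not Boomerang code: the DecodeResult struct below is a cut-down stand-in with made-up fields, and sameResult, Decoder and decodeChecked are hypothetical names invented for the illustration. The log is just a vector of strings so the example stands on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for Boomerang's DecodeResult. The real class
// carries more state; these fields are assumptions for the sketch.
struct DecodeResult {
    bool valid;        // did the bytes decode to an instruction?
    int numBytes;      // length of the decoded instruction
    std::string rtl;   // printed form of the semantics, used only for comparison
};

// Two results agree if every field we care about matches.
bool sameResult(const DecodeResult& a, const DecodeResult& b) {
    return a.valid == b.valid && a.numBytes == b.numBytes && a.rtl == b.rtl;
}

// A decoder maps an address to a result. In practice one of these would wrap
// the NJMC-generated code and the other the disassembler-based code.
using Decoder = std::function<DecodeResult(std::uint32_t pc)>;

// Run both decoders, log any disagreement, and always return the result of
// the original (trusted) decoder.
DecodeResult decodeChecked(std::uint32_t pc,
                           const Decoder& oldDecoder,
                           const Decoder& newDecoder,
                           std::vector<std::string>& log) {
    assert(oldDecoder && newDecoder);  // both callbacks must be set

    DecodeResult trusted = oldDecoder(pc);
    DecodeResult candidate = newDecoder(pc);

    if (!sameResult(trusted, candidate)) {
        std::ostringstream msg;
        msg << "WARNING: decoder mismatch at 0x" << std::hex << pc
            << ": old=\"" << trusted.rtl << "\" new=\"" << candidate.rtl << "\"";
        log.push_back(msg.str());
    }
    return trusted;
}
```

The point of returning the old result unconditionally is that the new decoder can be arbitrarily wrong without changing what Boomerang does; it only adds noise to the log until it is fixed.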

In the meantime, Mike and I have managed to put together a binary package for x86 Linux users that they can use instead of going through the pain of trying to build the toolkit.
