More GUI Work and Relocations
Today I got a lot of work done on the GUI. I can now edit signature files in a tab at decode time, and the corresponding signatures and parameters are shown in the Library Procedures table. On the rare occasions when a decompilation actually makes it to code generation without something going wrong, I can now open the output file and edit it in a tab. I even gave my main window a title.
On the topic of relocations and symbol information: I can now load a Linux .o file and get absolutely no addresses in my RTL. This is because I take a relocation at memory location x as an absolute guarantee that memory location x contains an address. I look up the address in the symbol map and replace the constant that would ordinarily be produced with an a[global] expression. One surprise I had on my test binary was that string constants are not assigned a symbol. I expected at least a "no name" symbol. As such, I speculatively test the memory at the given address and, if a suitable string constant is found, I replace the address constant with the string constant.
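To make the order of those checks concrete, here is a rough sketch in Python. It is not Boomerang's actual code: the function names, the tuple representation of expressions, and the "printable ASCII terminated by NUL" test for a suitable string are all assumptions made for illustration.

```python
from typing import Dict, Optional, Set, Tuple, Union

Expr = Tuple[str, Union[int, str]]


def read_c_string(memory: bytes, base: int, addr: int,
                  min_len: int = 1, max_len: int = 256) -> Optional[str]:
    """Speculatively read a NUL-terminated printable ASCII string at addr.

    `memory` is a loaded section image whose first byte lives at `base`.
    Returns None if nothing string-like is found (hypothetical heuristic).
    """
    offset = addr - base
    if offset < 0 or offset >= len(memory):
        return None
    end = memory.find(b"\x00", offset, offset + max_len + 1)
    if end == -1:
        return None
    raw = memory[offset:end]
    if len(raw) < min_len:
        return None
    # Accept printable ASCII plus tab, newline and carriage return.
    if not all(32 <= b < 127 or b in (9, 10, 13) for b in raw):
        return None
    return raw.decode("ascii")


def operand_for_constant(location: int, value: int,
                         relocations: Set[int],
                         symbols: Dict[int, str],
                         memory: bytes, base: int) -> Expr:
    """Choose the expression to emit for a constant decoded at `location`."""
    if location not in relocations:
        # No relocation here, so there is no guarantee this is an address.
        return ("const", value)
    name = symbols.get(value)
    if name is not None:
        # Stands in for an a[global] expression.
        return ("addr_of_global", name)
    text = read_c_string(memory, base, value)
    if text is not None:
        return ("str", text)
    # Relocated, but no symbol and no plausible string: keep the constant.
    return ("const", value)
```

With a symbol map of `{0x2000: "counter"}` and a data image starting with `b"hello\x00"` at base `0x1000`, a relocated constant `0x2000` becomes `("addr_of_global", "counter")` and a relocated constant `0x1000` becomes `("str", "hello")`. The last fallback is a choice made for this sketch; the post does not say what happens when neither lookup succeeds.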
When you consider that I'm doing all this at decode time, it leaves the decompile stage to focus on the hard problems of recognising parameters, etc. I've yet to decide if an STT_FUNC symbol implies that the procedure conforms to a known ABI. This is clearly true for some binaries but may not be true for all. Perhaps a user-specified option is the way to go there. Then I could recognise parameters at decode time.
Another interesting source of information is the STT_FILE symbols. These are definitely of interest as they tell us how the output of the decompiler should be clustered. "Clustering" is a term used for any grouping of code or data. One could say that OO programming is a clustering philosophy. Boomerang has supported emitting procedures to a tree of files for some time, although all globals currently go into the "main" output file (typically named program.c). Of course, this will be of more interest to me when I actually have output worthy of clustering.
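As a rough illustration of how STT_FILE symbols could drive clustering, here is a small Python sketch. It leans on the ELF convention that a file symbol precedes the other local symbols for that file in the symbol table. The `Symbol` type, the `cluster_by_file` function and the choice to send everything else to a default cluster are hypothetical, not something Boomerang does today.

```python
from typing import Dict, Iterable, List, NamedTuple

# Symbol type and binding values from the ELF specification.
STT_OBJECT, STT_FUNC, STT_FILE = 1, 2, 4
STB_LOCAL = 0


class Symbol(NamedTuple):
    name: str
    sym_type: int
    binding: int


def cluster_by_file(symbols: Iterable[Symbol],
                    default_cluster: str = "program.c") -> Dict[str, List[str]]:
    """Group function and object symbols by the STT_FILE symbol before them.

    `symbols` must be in symbol-table order. Only local symbols can be
    attributed this way, since global symbols follow all of the locals;
    they fall back to `default_cluster` (an assumption of this sketch).
    """
    clusters: Dict[str, List[str]] = {default_cluster: []}
    current = default_cluster
    for sym in symbols:
        if sym.sym_type == STT_FILE:
            current = sym.name
            clusters.setdefault(current, [])
        elif sym.sym_type in (STT_FUNC, STT_OBJECT):
            target = current if sym.binding == STB_LOCAL else default_cluster
            clusters[target].append(sym.name)
    return clusters
```

The obvious limitation is that global symbols carry no file attribution under this scheme, so a real implementation would need another source of information (or a guess) to place them.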