Open sourcing hopefully also means that a community will build around it, improving the overall quality of the tool.
Malware hunting biz and nautical jargon fan Avast has released its machine-code decompiler RetDec as open source, in the hope of arming like-minded haters of bad bytes and other technically inclined sorts with better analytical tools. As discussed at the Botconf 2017 conference in France earlier this month, …
If this is real, then it's probably going to be the biggest game-changer since ..... well, forever, really. At least, since programs got bigger than a few tens of kilobytes ..... Who remembers modifying 8-bit games for things like extra lives, infinite energy or different control keys? Admittedly this could often be done "blind", without disassembling the code in full, just because there were only a limited number of instances of, say, "DECrease" instructions ..... One of them reduces your energy on collision with an enemy ..... so note them all down, change just one of the DECs to NOP (no operation), and if that crashes the program horribly then move on to the next, and keep going until you find the one that lets you blunder into things with no ill effect.
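The "blind" DEC-to-NOP cheat described above can be sketched in a few lines. This is a hypothetical illustration, assuming a Z80-style machine where 0x3D is "DEC A" and 0x00 is "NOP"; it scans raw bytes, so (just as in the old blind approach) it will also flag operand bytes that merely happen to match the opcode.

```python
# Assumed Z80-style opcodes: 0x3D = "DEC A", 0x00 = "NOP".
DEC_A = 0x3D
NOP = 0x00

def find_candidates(image: bytes) -> list[int]:
    """Return the offset of every byte that looks like a DEC A instruction."""
    return [i for i, b in enumerate(image) if b == DEC_A]

def patch_nop(image: bytes, offset: int) -> bytes:
    """Return a copy of the image with the byte at `offset` NOPped out."""
    patched = bytearray(image)
    patched[offset] = NOP
    return bytes(patched)
```

You would then try each candidate offset in turn, reloading the game between attempts, until the collision no longer drains your energy.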
The output from `strings` often includes things that look like variable and/or function names, so these may be recoverable -- which would improve the comprehensibility of the output "source" code. (Otherwise it would be necessary to infer them.)
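For anyone who hasn't used the Unix `strings` tool, the idea is simple enough to sketch in Python: pull out every run of printable ASCII above some minimum length, which is where leftover symbol names tend to turn up. The 4-byte minimum mirrors the tool's usual default.

```python
import re

def extract_strings(blob: bytes, min_len: int = 4) -> list[str]:
    """Rough Python equivalent of the Unix `strings` tool: return every run
    of printable ASCII at least `min_len` bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, blob)]
```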
With output modules for different languages, it should be possible for code written in one language to be decompiled into another language. This would mean two programmers could collaborate on a project without even having a programming language in common. (Alternatively, it could disprove the Church-Turing hypothesis, by showing that a given program would be impossible to write in some language or group of languages. Which would be an interesting result, either way.)
Hardware for which no driver exists for modern Windows versions would no longer be obsolete: drivers from a legacy OS could be forward-ported.
There are many mission-critical, custom programs in daily use in industry for which the Source Code has been lost, and which depend on features of ancient operating systems, which in turn is preventing their users from upgrading their software -- and possibly even their hardware, if it uses things like ISA cards or RS232 serial ports, or relies on software timing loops thus limiting it to a certain maximum processor speed. Decompiling them as a starting point would be much less risky than attempting the from-scratch rewrite that would otherwise be needed to get them to work on a more modern OS.
If it was the default action for compiled binary attachments in e-mails to launch an IDE as opposed to executing the code, that would have definite positive security implications.
And then of course, there is the matter that people will be empowered to take Freedoms One and Three by force .....
@JulieM do not forget that 1) the decompiled source will reflect the code after all the optimizations that the original compiler applied, hence it will be removed from the original programmer's intent; 2) it will not have any of the symbolic names that the original programmer intended; and finally 3) it will not reflect the design of the original source, since all the static program constraints (things like encapsulation etc.) will have been optimized away.
The tool is meant for providing a more readable form of what the program actually does, which is very useful in itself. However, I would not put collaboration between projects without appropriate language bindings in this bucket because collaboration implies a statement of intent, which is next to impossible if the design is hidden.
I was going to say something very similar, and add to it by saying that decompilers are not exactly new.
I know it's a bit clumsy, but various debuggers have disassemblers built into them to turn lumps of machine code into something more readable.
I mean, dbx and gdb have been around a good long time, and I used adb, cdb and in fact the original db (on UNIX edition 6) 35+ years ago.
Whilst I would not want to decompile a complete software suite using one of these tools, investigating interesting bits of code has always been possible.
This isn't a game-changer. I've been using decompilers recently, and they all fall down, and fast. While they can offer insights as to what is happening in the code, they get lost very easily and the result is garbage. Thus disassembly is the real reference.
I am glad that they released the source. Most of the decompilers need extension and tuning.
One would expect a decompiler to be supplied with its own Source Code as a matter of course. If you downloaded it as a pre-built binary, wouldn't that be the absolute first thing you would try it out on, as a sort of smoke test? And the next test would be to recompile the generated Source Code, expecting it to produce a binary that was bitwise-identical to the decompiler input.
I wonder if anyone has taken RetDec's product and run it back through a compiler and linker to see if it (a) succeeds in those steps, (b) runs, and (c) approximates the original executable's behaviour and performance reasonably? Otherwise, how do you know that what RetDec is producing is credible?
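The round-trip test the two commenters above describe is straightforward to sketch. This is a hypothetical harness: the `retdec-decompiler` and `cc` command names and flags are assumptions about a particular install, not confirmed by the article, and for real binaries the byte-for-byte comparison would usually fail even when behaviour is preserved.

```python
import filecmp
import os
import subprocess

def bitwise_identical(path_a: str, path_b: str) -> bool:
    """True iff the two files match byte-for-byte."""
    return filecmp.cmp(path_a, path_b, shallow=False)

def round_trip(binary: str, workdir: str) -> bool:
    """Decompile `binary`, recompile the output, and compare against the
    original. Command names are placeholders; adjust for your setup."""
    src = os.path.join(workdir, "out.c")
    rebuilt = os.path.join(workdir, "rebuilt")
    subprocess.run(["retdec-decompiler", binary, "-o", src], check=True)
    subprocess.run(["cc", "-o", rebuilt, src], check=True)
    return bitwise_identical(binary, rebuilt)
```

A weaker but more realistic check would compare the two binaries' observable behaviour (exit codes and output on a test corpus) rather than their bytes.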
I took that as meaning that right now, it's only capable of working with 32-bit native code instruction sets. But RetDec is designed as a modular architecture, so would only need new input modules to be written to make it compatible with other instruction sets -- including Java, Python and other bytecodes.
"like compressing an image with a lossy algorithm and then re-enlarging it"
No, because lossy compression by definition loses something, so what you de-compress (the opposite of compression is decompression, not enlargement) is not what you originally compressed, only something that is indistinguishable from the original within a reasonable degree of tolerance.
The better imaging analogy would be "like turning a painted image into an array of coloured pixels". You "lose" the process by which the final image was arrived at - the canvas washing, the background fields of colour onto which the detail was overpainted, the different brush strokes and so on - but what you get is an exact representation of how the finished image *could* have been constructed.
Likewise, there is no real reason why the technology could not be used on Java bytecode or MSIL - those are just "instruction sets" for a virtual machine, rather than a physical one. The principle is exactly the same. The real reason those aren't supported is that they don't need to be - alternatives already exist (and to an extent are even built into the SDKs for those VMs).
Honestly, why write a technically aimed and focused article then try and dumb it down (and to do so so very badly) ?
I wouldn't underestimate the space gained by stripping the executables though.
When you've got a big executable, like the 100 DLLs of an office suite, that can make quite a difference to load time and to its ability to respond to user actions.
If they are stripped, the de-compile will likely fall back on some default variable-naming system. IOW it will be "Pseudo C" (or whatever they modelled their HLL on), with "Fn1" creating "Var1", "Var2", "Var3" etc.
An ability to override those defaults can really help when you're gradually working out what the code does. It does not change the code logically, but boy does it improve readability.
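That gradual renaming workflow can be sketched mechanically. This is a minimal illustration using the "Fn1"/"Var1" default names from the comment above; a whole-word regex is used so that renaming Var1 doesn't clobber Var10.

```python
import re

def rename_identifiers(pseudo_c: str, mapping: dict[str, str]) -> str:
    """Replace decompiler default names (Fn1, Var1, ...) with human-chosen
    ones, matching whole identifiers only."""
    def substitute(match: re.Match) -> str:
        return mapping.get(match.group(0), match.group(0))
    return re.sub(r"\b\w+\b", substitute, pseudo_c)
```

As you work out what each function and variable actually does, you extend the mapping and re-run it over the decompiled output.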
In every major software agreement I've read, decompiling is specifically forbidden or it will void the license. I wonder why ... oh wait a minute, Excel doesn't find what I was looking for because when I sorted the sheet, the pointers got screwed up ... It would sure be embarrassing for a Major Software House to have their s##t code exposed for all to see. And they actually charge money for it. Oh, that's why. Decompile at your own risk.
[Posting anonymously because I have to use Excel as it's Corporate Standard, and I don't want trouble.]
I've finally had a quick muck about with it, and it's certainly got some promise.
The output is for sure not identical to the original source, but it does successfully recompile to produce a bitwise-identical binary -- which is the really crucial bit.
On the bad side, RetDec eats RAM. Don't even think about running it on a box with only eight gigs, if you're interested in decompiling anything more complex than MS-DOS executables that will fit in 256kB. (Which is still not to disparage any such efforts. There almost certainly is at least one factory somewhere in the world, whose production line is dependent upon an ancient 8088 machine still running some long-since-abandoned software to talk to a custom hardware interface .....)
And with that, I'm off to order a bigger motherboard and some RAM .....