Merry Xmas, fellow code nerds: Avast open-sources decompiler

Malware hunting biz and nautical jargon Avast has released its machine-code decompiler RetDec as open source, in the hope of arming like-minded haters of bad bytes and other technically inclined sorts with better analytical tools. As discussed at the recent Botconf 2017 in France earlier this month, …

  1. Bronek Kozicki
    Thumb Up

    Nice!

    Open sourcing hopefully also means that a community will build around it, improving the overall quality of the tool.

    1. Anonymous Coward

      Re: Nice!

      That will be Avast's goal. Instead of having to pay their own engineers to maintain and improve it people in the community will do it for them for free.

      Cheaper and better than outsourcing the maintenance to India.

    2. JulieM

      Re: Nice!

      Well, if they hadn't Open Sourced their decompiler, somebody else would have .....

  2. Doctor Evil

    Awesome work!

    Jakub Kroustek and the Threat Intelligence team at Avast -- you guys rock! Thanks!

  3. Mephistro
    Thumb Up

    Another thumbs up for Avast!

    Nuff said.

  4. cutterman

    Good for them! I'll have a poke around with it and report back.

  5. Duncan Macdonald

    Intel ME

    How long until someone produces a full decompile of Intel's secret ME code (for the embedded god-mode CPU in current x86 chips)?

    1. Degenerate Scumbag

      Re: Intel ME

      Hopefully some enterprising nerd is on the case. We already know it's based on MINIX 3, so a comparison of the decompiled code with the MINIX 3 source could tell us quite a lot about Intel's customisations and how it all works.

    2. g00se
      Thumb Up

      Re: Intel ME

      Hopefully soon. And some EFI implementations too. There can't be that many at large, I wouldn't have thought.

  6. Anonymous Coward

    I am aghast at avast

  7. JulieM

    This is game-changing stuff

    If this is real, then it's probably going to be the biggest game-changer since ..... well, forever, really. At least, since programs got bigger than a few tens of kilobytes ..... Who remembers modifying 8-bit games for things like extra lives, infinite energy or different control keys? (Admittedly this could often be done "blind", without disassembling the code in full, just because there were only a limited number of instances of, say, "DECrease" instructions ..... One of them reduces your energy on collision with an enemy ..... so note them all down, change just one of the DECs to NOP, no operation, and if that crashes the program horribly then move on to the next, and keep going until you find the one that lets you blunder into things with no ill effect.)
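The "blind" patching trick described above can be sketched in a few lines. This is only an illustration under stated assumptions: the comment names no particular CPU, so the Z80 opcodes for `DEC A` (0x3D) and `NOP` (0x00) are used here, and the `game` bytes are a made-up fragment, not a real game.

```python
# Assumption: Z80 opcodes. DEC A and NOP are both single-byte instructions,
# so one can be swapped for the other without shifting any other code.
DEC_A = 0x3D
NOP = 0x00


def find_candidates(code: bytes, opcode: int = DEC_A) -> list[int]:
    """Return every offset where the opcode byte appears.

    This is a naive byte scan: a matching byte may really be data or an
    operand rather than an instruction, which is why some patches crash.
    """
    return [i for i, b in enumerate(code) if b == opcode]


def patch_one(code: bytes, offset: int, replacement: int = NOP) -> bytes:
    """Return a copy of the code with a single byte replaced."""
    patched = bytearray(code)
    patched[offset] = replacement
    return bytes(patched)


# Hypothetical fragment: LD A,5 / DEC A / RET / DEC A / NOP
game = bytes([0x3E, 0x05, 0x3D, 0xC9, 0x3D, 0x00])
candidates = find_candidates(game)

# One trial image per candidate; each would be loaded and play-tested in turn.
trials = [patch_one(game, offset) for offset in candidates]
```

Each entry in `trials` differs from the original by exactly one byte, which matches the try-one-and-move-on approach: if a trial crashes, discard it and test the next.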

    The output from `strings` often includes things that look like variable and/or function names, so these may be recoverable -- which would improve the comprehensibility of the output "source" code. (Otherwise it would be necessary to infer them.)
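For anyone who has not used it, what `strings` does can be approximated as below. This is a rough sketch, not the real tool: it only looks for runs of printable ASCII of at least four characters, and the `blob` is invented sample data rather than a real executable.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len bytes long."""
    # Bytes 0x20-0x7e are the printable ASCII characters, space included.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical binary blob with a couple of leftover symbol names in it.
blob = b"\x7fELF\x00\x01main\x00\x02\x03update_score\x00ab\x00"
names = printable_strings(blob)
```

Here `names` picks up `main` and `update_score`, while the shorter runs (`ELF`, `ab`) fall below the length threshold. Whether any such names survive in a given binary depends on how it was built and stripped.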

    With output modules for different languages, it should be possible for code written in one language to be decompiled into another language. This would mean two programmers could collaborate on a project without even having a programming language in common. (Alternatively, it could disprove the Church-Turing hypothesis, by showing that a given program would be impossible to write in some language or group of languages. Which would be an interesting result, either way.)

    Hardware for which no driver exists for modern Windows versions would no longer be obsolete: drivers from a legacy OS could be forwards-ported.

    There are many mission-critical, custom programs in daily use in industry for which the Source Code has been lost, and which depend on features of ancient operating systems, which in turn is preventing their users from upgrading their software -- and possibly even their hardware, if it uses things like ISA cards or RS232 serial ports, or relies on software timing loops thus limiting it to a certain maximum processor speed. Decompiling them as a starting point would be much less risky than attempting the from-scratch rewrite that would otherwise be needed to get them to work on a more modern OS.

    If it was the default action for compiled binary attachments in e-mails to launch an IDE as opposed to executing the code, that would have definite positive security implications.

    And then of course, there is the matter that people will be empowered to take Freedoms One and Three by force .....

    1. Bronek Kozicki

      Re: This is game-changing stuff

      @JulieM do not forget that 1) the decompiled source will reflect the code after all the optimizations that the original compiler applied, hence it will be removed from the original programmer's intent; 2) it will not have any of the symbolic names that the original programmer intended; and finally 3) it will not reflect the design of the original source, since all the static program constraints will have been optimized away (things like encapsulation etc.).

      The tool is meant for providing a more readable form of what the program actually does, which is very useful in itself. However, I would not put collaboration between projects without appropriate language bindings in this bucket because collaboration implies a statement of intent, which is next to impossible if the design is hidden.

      1. Peter Gathercole Silver badge

        Re: This is game-changing stuff

        I was going to say something very similar, and add to it by saying that decompilers are not exactly new.

        I know it's a bit clumsy, but various debuggers have decompilers built into them to turn lumps of machine code into something more readable.

        I mean, dbx and gdb have been around a good long time, and I used adb, cdb and in fact the original db (on UNIX edition 6) 35+ years ago.

        Whilst I would not want to decompile a complete software suite using one of these tools, investigating interesting bits of code has always been possible.

    2. Anonymous Coward

      Re: This is game-changing stuff

      Don't see why it wouldn't be real. It wouldn't be the first commercial decompiler either: https://www.hex-rays.com/products/decompiler/index.shtml First open source though, from what I know.

    3. Brian Miller

      Re: This is game-changing stuff

      This isn't changing any games. I've been using decompilers recently, and they all fall down, and fast. While they can offer insights as to what is happening in the code, they get lost very easily and the result is garbage. Thus disassembly is the real reference.

      I am glad that they released the source. Most of the decompilers need extension and tuning.

      1. JulieM

        Re: This is game-changing stuff

        One would expect a decompiler to be supplied with its own Source Code as a matter of course. If you downloaded it as a pre-built binary, wouldn't that be the absolute first thing you would try it out on, as a sort of smoke test? And the next test would be to recompile the generated Source Code, expecting it to produce a binary that was bitwise-identical to the decompiler input.
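The "bitwise-identical" check suggested above is easy to automate. The sketch below assumes nothing about RetDec itself: it simply compares two files, with the decompile and recompile steps left to whatever toolchain is in use, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bitwise_identical(original: Path, rebuilt: Path) -> bool:
    """True only if both files have the same size and the same SHA-256."""
    return (
        original.stat().st_size == rebuilt.stat().st_size
        and sha256_of(original) == sha256_of(rebuilt)
    )


# Usage, with hypothetical file names:
#   bitwise_identical(Path("tool.orig"), Path("tool.rebuilt"))
```

Bear in mind this is a strict test. A rebuilt binary can differ for reasons that have nothing to do with the decompiler, such as a different compiler version, different flags or an embedded build timestamp.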

  8. Anonymous Coward
    Big Brother

    Doh!

    Useful to find the slurping code allegedly in Kaspersky

  9. Anonymous Coward

    reversing the reversal

    I wonder if anyone has taken RetDec's product and run it back through a compiler and linker to see if it (a) succeeds in those steps, (b) runs, and (c) approximates the original executable's behaviour and performance reasonably? Otherwise, how do you know that what RetDec is producing is credible?

  10. This post has been deleted by a moderator

    1. JulieM

      Re: 32 bit only?

      I took that as meaning that right now, it's only capable of working with 32-bit native code instruction sets. But RetDec is designed as a modular architecture, so would only need new input modules to be written to make it compatible with other instruction sets -- including Java, Python and other bytecodes.

    2. prinox

      Re: 32 bit only?

      Using intrinsics in the original code also doesn't help: it doesn't know anything about instructions operating on YMM registers.

    3. Anonymous Coward

      Re: 32 bit only?

      I believe there is a branch that targets x86-64, as in:

      https://github.com/avast-tl/retdec/blob/x86_64-enabled/CHANGELOG.md

      Hope it helps!

  11. Deltics
    Boffin

    "like compressing an image with a lossy algorithm and then re-enlarging it"

    No, because lossy compression by definition loses something, so what you de-compress (the opposite of compression is not enlargement) is not what you originally compressed, only something that is indistinguishable from that original within a reasonable degree of tolerance.

    The better imaging analogy would be "like turning a painted image into an array of coloured pixels". You "lose" the process by which the final image was arrived at - the canvas washing, background fields of colours onto which the detail was over painted, the different brush strokes etc. but what you get is an exact representation of how the finished image *could* have been constructed.

    Likewise, there is no real reason why the technology could not be used on Java bytecode or MSIL - those are just "instruction sets" for a virtual machine, rather than a physical one. The principle is exactly the same. The real reason those aren't supported is because they don't need to be - alternatives already exist (and to an extent are even already built into the SDKs for those VMs).

    Honestly, why write a technically aimed and focused article then try and dumb it down (and to do so so very badly) ?

    1. Anonymous Coward

      No. You do lose rather useful stuff like comments and, if the code is stripped, meaningful variable and function names, without which it is so much more challenging to work out what the program is doing.

      1. JulieM
        Boffin

        Build Time Options

        It's possible to leave the variable and function names in a compiled program, for the benefit of debugging tools; or to remove them, for slightly smaller overall size and slightly faster execution.

        `strings` is your friend ;)

        1. John Smith 19 Gold badge
          Unhappy

          "It's possible to leave the..names in a compiled program, for the benefit of debugging tools;"

          True.

          I wouldn't underestimate the space gained by stripping the executables though.

          When you've got a big executable, like the 100 DLLs of an office suite, that can make quite a difference in load time, and the ability to respond to user functions.

          If they are stripped the de-compile will likely fall back on some default variable naming system. IOW it will be "Pseudo C" (or whatever they modeled their HLL on) with "Fn1" creating "Var1" "Var2," "Var3" etc.

          An ability to override those defaults can make gradually working out what the code does a lot easier. It does not change the code logically, but boy does it improve readability.
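Overriding default names like these can be as simple as a whole-word search and replace over the decompiled text. The sketch below is only an illustration: the `Fn1`/`Var1` placeholders follow the naming guessed at above, the `decompiled` line is invented, and nothing here reflects how any particular decompiler actually handles renaming.

```python
import re


def rename_identifiers(source: str, names: dict[str, str]) -> str:
    """Replace whole-word placeholder identifiers with chosen names."""
    if not names:
        return source
    # Longest keys first, and \b on both sides, so Var1 never matches
    # inside Var10.
    keys = sorted(names, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: names[m.group(1)], source)


# Hypothetical decompiler output and a hand-built rename table.
decompiled = "int Fn1(int Var1) { int Var2 = Var1 * 2; return Var2 + Var10; }"
readable = rename_identifiers(
    decompiled,
    {"Fn1": "double_plus", "Var1": "x", "Var2": "doubled"},
)
```

The logic of the code is untouched; only the labels change. `Var10` is left alone because it is not in the table, which is how you would work through a listing a few names at a time.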

      2. foxyshadis

        If you're at the point where you need a decompiler, it's because you have no access to the source and you never will, so all talk about how much better having the original source would be is absolutely meaningless.

        1. Bronek Kozicki

          I do not agree with the "meaningless" portion, even though you are correct on the first part. The problem is that some people will not realize it.

        2. Badvok

          "If you're at the point where you need a decompiler, it's because you have no access to the source and you never will"

          Says someone who has obviously never found a bug in a compiler.

  12. Anonymous Coward

    Decompiling specifically voids license (wonder why)

    In every major software agreement I've read, decompiling is specifically forbidden or it will void the license. I wonder why ... oh wait a minute, Excel doesn't find what I was looking for because when I sorted the sheet, the pointers got screwed up ... It would sure be embarrassing for a Major Software House to have their s##t code exposed for all to see. And they actually charge money for it. Oh, that's why. Decompile at your own risk.

    [Posting anonymously because I have to use Excel as it's Corporate Standard, and I don't want trouble.]

  13. John Styles

    I wonder how well it does at working out C++ class hierarchies, spotting vtables, turning a bunch of functions back into methods etc. etc.

  14. JulieM

    RetDec Redux

    I've finally had a quick muck about with it, and it's certainly got some promise.

    The output is for sure not identical to the original source, but it does successfully recompile to produce a bitwise-identical binary -- which is the really crucial bit.

    On the bad side, RetDec eats RAM. Don't even think about running it on a box with only eight gigs, if you're interested in decompiling anything more complex than MS-DOS executables that will fit in 256kB. (Which is still not to disparage any such efforts. There almost certainly is at least one factory somewhere in the world, whose production line is dependent upon an ancient 8088 machine still running some long-since-abandoned software to talk to a custom hardware interface .....)

    And with that, I'm off to order a bigger motherboard and some RAM .....
