Forget anonymity, we can remember you wholesale with machine intel, hackers warned

Anonymous programmers, from malware writers to copyright infringers and those baiting governments with censorship-foiling software, may all be unveiled using stylistic programming traits which survive into the compiled binaries – regardless of common obfuscation methods. The work, titled De-anonymizing …

  1. Ken Moorhouse Silver badge

    Programming style evolves over time

    I could draw (paint?) analogies between programming and being a painter (for example). Art historians can identify painters by characteristics of a painting, including how a painter's style evolves over a period of time. There are problems with attribution though: students of a painter adopt similar characteristics to their tutor, and students often help the master with the incidentals of a work of art (e.g., Gainsborough getting his assistants to do the landscape background while he concentrated on the portrait). Then there are new techniques: new types of paint and canvas (Hockney moving from conventional canvas to photographic collage and then to tablet being a good example) which necessitate a change in style - analogous to a new or updated programming language installed on a different PC or with a different targeted platform.

    As mentioned earlier, Stack Overflow copy and paste is an example of how things change in the programmer's world: a piece of code that is homogeneously constructed, but sporadically interspersed with anachronistic styles where sites such as Stack Overflow have been dipped into for inspiration. Then there are future works by the same programmer, where those code snippets are bedded in to the coder's customary style.

  2. Version 1.0 Silver badge

    Hungry for results?

    "we can de-anonymize them from optimized executable binaries with 64 per cent accuracy."

    That's slightly better than I can do if I flip a coin - let's look at this from a different angle - there are 10 hamburgers in front of you, 3 or 4 of them have botulism ... are you hungry?

    1. Anonymous Coward
      Anonymous Coward

      Re: Hungry for results?

      If I have a suspect pool of 20 or 30 programmers then identifying the author with 64% accuracy would be very useful - in a sinister way. I'm sure you could use the technique to rank the authors in order from most likely to least and then investigate further from the top. A lot more efficient than simply investigating everyone.

      As the presenter points out in the video, governments have used similar techniques to identify and prosecute programmers that contributed to "illegal" websites.

      1. fajensen
        Big Brother

        Re: Hungry for results?

        A lot more efficient than simply investigating everyone.

        Yess - and with Machine Learning it is pretty damn hard to work out how the machine actually reached its conclusions (it's a research subject), which makes it all the easier to fudge the results to narc out "the right people" and get away with it too. Especially if we are not exactly talking legal proceedings but are more in the territory of "no flight lists" and "signature strikes"*.

        Remember, if a computer says something is true, it is!

        *) Or maybe not, plod is as dumb as a sack of broken hammers when IT is involved.

    2. Hargrove

      Re: Hungry for results?

      That's slightly better than I can do if I flip a coin

      Splendid comment. It is obvious and only common sense, but common sense is in vanishingly short supply these days.

    3. SoaG

      Re: Hungry for results?

      You've got a 100-sided coin that lands the same way 64% of the time?

      1. Version 1.0 Silver badge

        Re: Hungry for results?

        "You've got a 100-sided coin that lands the same way 64% of the time?"

        OK - let's put it another way - of the 100 people investigated, and charged with writing the infringing application, 34 of them will be completely innocent and the chances are not good that 32 of the others had anything to do with the application either.

        1. Michael Wojcik Silver badge

          Re: Hungry for results?

          OK - let's put it another way - of the 100 people investigated, and charged with writing the infringing application, 34 of them will be completely innocent and the chances are not good that 32 of the others had anything to do with the application either.

          I cannot for the life of me figure out what scenario you are describing, but it doesn't appear to be at all related to anything described in the paper.

          First, they're talking about single authorship, so of the hypothetical "100 people investigated" (by, apparently, the world's least-competent police force), only zero or one would be guilty, and at least 99 would be innocent.

          Second, let's assume the 0.64 accuracy rate does extend to some pool of 100 candidates that the model has been trained on, and the single guilty party is among them. The classifier is presented with input and indicates candidate A is the closest match. Disregarding all other factors, for some reason, the investigators interview candidate A. There's a 0.64 chance they have the guilty party, and a 0.36 chance they don't. So what? It's a place to start. Picking a starting interviewee at random has only a 0.01 chance of being correct, so they've improved their odds significantly.

          Third, the hypothetical suggestion that someone might make stupid decisions based on weak evidence doesn't negate the importance of that evidence. A Perfect Bayesian Reasoner already knew it was weak, and treated it as such. Any other process for accounting for that evidence is inferior, but that's not the fault of the evidence. Nor does that suggestion vacate the importance of the mechanism used to extract that evidence, or of the research that led to the mechanism.

          We see in posts like yours a typical Reg commentator fallacy: if there's any objection that can be raised to research, then that research is useless. It's tiresome, sophomoric anti-intellectualism.

    4. Michael Wojcik Silver badge

      Re: Hungry for results?

      That's slightly better than I can do if I flip a coin

      It's significantly better, and that's only with two alternatives. If they're identifying among a pool of 3 candidates with 64% accuracy, then they're almost twice as accurate as your coin. And so on.

      And for this paper, their pool of candidates was 20 programmers.

      But thanks for playing.
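For anyone who wants to check the coin-flip comparison, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that the quoted 64 per cent figure holds unchanged for each pool size, and it uses uniform random guessing (1/N for a pool of N) as the baseline. The function names are hypothetical and not taken from the paper.

```python
from fractions import Fraction

# Quoted accuracy from the article; treated here as fixed for every pool size,
# which is an assumption made only to illustrate the comparison.
REPORTED_ACCURACY = Fraction(64, 100)


def chance_accuracy(pool_size: int) -> Fraction:
    """Accuracy of guessing one author uniformly at random from the pool."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return Fraction(1, pool_size)


def lift_over_chance(accuracy: Fraction, pool_size: int) -> Fraction:
    """How many times better than random guessing the given accuracy is."""
    return accuracy / chance_accuracy(pool_size)


if __name__ == "__main__":
    for pool_size in (2, 3, 20):
        baseline = chance_accuracy(pool_size)
        lift = lift_over_chance(REPORTED_ACCURACY, pool_size)
        print(f"pool={pool_size:>2}  chance={float(baseline):.3f}  "
              f"lift={float(lift):.2f}x")
```

With two candidates the lift over a fair coin is modest, but the baseline shrinks as the pool grows, so the same accuracy figure means a much larger improvement over guessing.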

  3. Anonymous Coward
    Anonymous Coward

    indents/no indents

    I'd much rather have well-commented code than indents, and my editor of choice can soon indent code for me if I need it to. Personally I use both, but then from a very early age I had to support someone else's code. They never put a single comment in their code, never mind followed change control procedures (which were very minimal), and did very limited testing, but management thought the world of them. It was also the age of short variable names, which didn't help. Overall I was very lucky in the environment I worked in then, as it taught me many lessons which I used throughout my career to improve procedures, fault solve and debug. My comments are for me as much as anyone else, as I don't expect to instantly remember n years later why I did something in a particular way, which could often be due to a bug in the compiler or OS at that time.

  4. Whitter
    Joke

    Bug count

    No bugs? My code.

    What do you mean there's no bugless code?

    Must have got lost in the process loop... erm...

    1. Sir Runcible Spoon
      Joke

      Re: Bug count

      10 PRINT "hello world"

      20 GOTO hell

  5. Anonymous Coward
    Anonymous Coward

    Awesome

    So all hackers work alone on their code then?

    I'm glad they've worked that out.

    Nobody copy and pastes from stack overflow or reuses open source code then?

    Right guys, let's go home, sounds like they've got this nailed!

    1. Hargrove

      Re: Awesome

      Related to this comment, nobody uses hacking tools that do the coding for you either?

      If this article represents the state of the art 'mongst the white hats, the black hats have it made.

      1. fajensen

        Re: Awesome

        The black hats have a trillion dollar black budget courtesy of the tax payers - they are already made!

    2. Michael Wojcik Silver badge

      Re: Awesome

      You know what you might enjoy? Learning to comprehend what you read. After that, I'd suggest taking up critical thinking. It'd be a stretch, but who knows - you might surprise us.

  6. amanfromMars 1 Silver badge

    Coming this year to a SCADA Operating System using/abusing you. More FUD Scaremongering.

    Happy New Year, One and All. And does not the tale we comment on here not advise us that all systems are vulnerable, and both practically and virtually indefensible and therefore always susceptible to disruptive exploitation which in extremis can be command takeover and makeover controlling?

    And there is nothing really effective to be done to halt the progress?

    Methinks, we all know that it does. And that makes for interesting future space place programming. :-)

  7. ammabamma
    Devil

    Wrong lesson learned

    So what Messr Aylin is saying is that when I write my nefarious program of dastardlyness, I should run it through a source filter first to emulate someone else's coding idiosyncrasies (like 1980s_coder's lack of indentation) or less maliciously, run it through a source minifier?

    Hmmm...

    1. FatGerman Silver badge

      Re: Wrong lesson learned

      Er... indentation and 'minifying' won't affect the compiled code one iota. To make your code look like somebody else's code you need to *think* like they do. (Insert reference to bad Clint Eastwood movie 'Firefox' here).
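The point about layout not surviving compilation can be tried out in a small sketch. It uses Python bytecode as a stand-in for a native binary, which is an assumption for illustration only, since the research concerns compiled executables. The sample sources and the helper `function_bytecode` are hypothetical.

```python
import types

# Same logic, conventional layout.
SRC_TIDY = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

# Same logic again, with different indentation, a blank line and comments.
SRC_MESSY = """
# same function, different layout
def total(values):
  result = 0

  for v in values:
        result += v   # odd indentation, trailing comment
  return result
"""

# Different approach to the same task: the author "thinks" differently.
SRC_OTHER = """
def total(values):
    return sum(values)
"""


def function_bytecode(source: str, name: str) -> bytes:
    """Compile the source and return the raw bytecode of the named function."""
    module_code = compile(source, "<sketch>", "exec")
    for const in module_code.co_consts:
        if isinstance(const, types.CodeType) and const.co_name == name:
            return const.co_code
    raise LookupError(f"no function named {name!r} in source")


if __name__ == "__main__":
    tidy = function_bytecode(SRC_TIDY, "total")
    messy = function_bytecode(SRC_MESSY, "total")
    other = function_bytecode(SRC_OTHER, "total")
    print("layout changes alter bytecode:", tidy != messy)
    print("a different approach alters bytecode:", tidy != other)
```

Only the instruction bytes are compared here. Line-number tables do differ between the two layouts, but they are debugging metadata rather than the compiled logic.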

      When it comes to pasting stuff from StackOverflow (otherwise known as "I'm an incompetent freelancer, please do my job for me") - I doubt many malicious coders would go there. Cracking the problem is all the fun for them, and they tend to work alone and very idiosyncratically. OTOH they're the people most likely to find a way to anonymise themselves against this type of analysis. Probably won't be long before we see 'GACC' (Gnu Anonymising C Compiler) appear....

    2. Stjalodbaer

      Re: Wrong lesson learned

      Always use a ressembler.

  8. Robert Grant

    Another hack by the Google Closure Compiler!

  9. anonymous boring coward Silver badge

    This might have had some value before the research was publicised. Not so much now.

    1. amanfromMars 1 Silver badge

      The Big IntraNetworking Thing in the Internet of Things is an AI XSSXXXXOSkeletal Thing

      Hi, anonymous boring coward,

      Research being publicised and published is what terrifies systems in operations for command and control, and why captive mainstream media outlets are so terribly entertaining rather than surprisingly educational.

      An ignorant world is an increasingly dangerous place though, and especially so for the likes of that and those responsible for grand deceits and failing virtual reality programs?! ...... New American Century Projects ..... because such a state of ignorant ruling affairs is not natural or acceptable to wiser beings minded to change things remotely and relatively anonymously

      1. amanfromMars 1 Silver badge

        Re: The Big IntraNetworking Thing in the Internet of Things is an AI XSSXXXXOSkeletal Thing

        And who and/or what be the postmodern, latter day Hitlerian Saints and Immaculate Sinners in those Versions with Vision and Provisions for New World Order Programming ......... Mass Premeditated and Premoderated and Mediated Mind Command and Control? Any concrete ideas or wild crazy guesses?

        IT and they haven’t gone away, you know, ….. such as would be with AI, Immaculately Resources Assets of Universal Virtual Force, although certainly quite different from what one may have presumed to be leading from before.

  10. Anonymous Coward
    Anonymous Coward

    triviall countermeasues

    just adjust yur style two throwoff the analisys' in those cases when ur writing malware

    simplest thing in teh world.

    1. amanfromMars 1 Silver badge

      Re: triviall countermeasues

      just adjust yur style two throwoff the analisys' in those cases when ur writing malware

      simplest thing in teh world. ... Anonymous Coward

      Although, of course, in not such an alarmingly different manner as that, AC, if one is destined to be really effective and remain continually highly disruptive, buried deep and delving within deserving systems and/or failed exclusive executive order administrations.

      The crack magic trick is, is it not, to be practically invisible and virtually omnipotent/anonymous and almighty, and that has one appearing to be most meek and unrecognisable in plain text sight. Then can there be heavenly fireworks with immaculate displays of alternative explosive worth.

      Such does make one though, in the eyes, hearts and minds of those in the know and in the need to know, both extremely valuable and marvellously dangerous. It is not a pleasant place or comfortable space for anyone or everyone.

    2. Ken Moorhouse Silver badge

      Re: triviall countermeasues

      >just adjust yur style two throwoff the analisys' in those cases when ur writing malware

      simplest thing in teh world.<

      Which reminds me: Think of a program as being an iceberg: the majority of it lies underneath the visible surface as regards those that interact with it (the average user of that app). But what is on the surface can sometimes give some good clues as to what lies beneath. If the person I have quoted above (sorry to pick on you m8, but you are AC anyway so unidentifiable, and I have a feeling you've adjusted your style to demonstrate your point, you're really William Shakespeare aren't you?) were to be a malware writer then they would need to pay attention to detail - if they were hacking a banking app I don't think people would be inclined to believe their request to "Clik hear 2 verfy who u r". Sometimes with spam emails it is possible to identify, not just from the occasional typo but by sentence construction, not just that this is a scam, but the nationality of the scammer.

      There was a phase where malware was put through something like UPX to obfuscate its contents, but anyone trying to work out the legitimacy of such executables on their PCs could use a hex editor to look at the headers (is Microsoft using UPX now? I don't think so ((presses delete key))). I think anti-malware software reaches a similar conclusion.
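The hex-editor check can be approximated in a few lines. This is a rough sketch, assuming the packer left its usual markers in place: UPX-packed executables commonly carry section names such as `UPX0` and `UPX1` and a `UPX!` magic string near the start of the file. Those markers are easy to rename or strip, so a negative result proves nothing. The function names and the 4096-byte window are illustrative choices, not part of any tool.

```python
# Byte strings commonly left near the start of an executable by UPX.
# Assumption: the packer's default markers have not been renamed or removed.
UPX_MARKERS = (b"UPX!", b"UPX0", b"UPX1")


def looks_upx_packed(data: bytes, header_bytes: int = 4096) -> bool:
    """Return True if any known UPX marker appears in the leading bytes."""
    head = data[:header_bytes]
    return any(marker in head for marker in UPX_MARKERS)


def file_looks_upx_packed(path: str, header_bytes: int = 4096) -> bool:
    """Read only the start of the file at `path` and apply the same check."""
    with open(path, "rb") as handle:
        return looks_upx_packed(handle.read(header_bytes), header_bytes)


if __name__ == "__main__":
    # Synthetic byte strings standing in for real files.
    packed_like = b"MZ" + b"\x00" * 64 + b"UPX0\x00\x00\x00\x00" + b"\x00" * 32
    plain_like = b"MZ" + b"\x00" * 200
    print("packed-like sample flagged:", looks_upx_packed(packed_like))
    print("plain sample flagged:", looks_upx_packed(plain_like))
```

A match is only a hint that is worth a closer look, which is roughly the conclusion the comment attributes to anti-malware software.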

    3. Tail Up
      Paris Hilton

      Re: triviall countermeasues

      Then you'd easily add just your style to overshakespeare those of the ProgrammingWord@Command, says you, AC?

      Paris, because boobz.

  11. Anonymous Coward
    Anonymous Coward

    Completely useless research.

    Here's some bashing, because this really deserves it. Something like this could only be dreamt of and started by those who don't understand programming.

    1) 64% chance to deanonymise a small sample set of hand-picked 100 programmers presumably with wildly distinctive ways of programming is utterly useless. How many programmers are there in the world? I very much expect the accuracy to drop off a cliff past a certain point.

    2) Programmers' coding styles evolve: they evolve as they get better at it, they evolve when hardware changes, they evolve depending on how much alcohol they've had.

    3) Right now their accuracy is as it is, but I presume this changes drastically depending on what compiler they use. As compilers get even better at optimising, their accuracy will drop.

    4) Sure, there may still be "traits" like one programmer preferring one data structure or control structure over another, but let's not forget how many other programmers' code or libraries one can use. It'd be completely pointless to try to attribute a compiled binary that's 80% open-source libraries, and I expect the accuracy will drop even further.

    5) None of this helps authorities to catch or identify those responsible. The sophisticated ones will learn to mimic, like how they're just as likely now to leave "chinese/russian/english code comments" leftovers or originate from a "North Korean IP". The sly ones _NEVER_ make it obvious it's them.

    Common sense and logic will be able to tell you all this, without going into however many resources have been poured into researching this.

    Half or more of the stuff about "cutting-edge" computer security threats is snake oil, served either to gain more funding from fear or for political purposes, to pass liberty-eroding legislation.

  12. Sean H

    Source Formatting Opinions

    I was interested to see this raise so much attention. I'd thought the pretty printer tools meant you could code how you liked and then format it how your organisation, or team leader, or girlfriend's dad, would find acceptable. Me, I like the vertical alignment of {}, but I'm old enough now to realise that's just me, I can't instantly see the opening brace that goes with a particular closing one unless it's directly above. But modern editors solve that, highlight one brace and it highlights the other, however aligned. And if I have to work on something for long I can always pretty-print it "my" way to make it easier, and - theoretically - re-pretty-print it with a different set of preferences afterwards.

    I'm left with only one major gripe, and that's Python. Where indentation is part of the language, well, I thought that was a bad idea in makefiles, and see no excuse for it anywhere. Mr Python wanted to impose his own indentation preference, and didn't like that fiddling punctuation noise, well, IMHO a crappy set of requirements for a language. I'm disappointed to see it hasn't faded into the obscurity it deserved.

    While I'm here, I have a minor aversion to anything "optional". Semicolons in scripting languages, that sort of thing. To me there should be just one correct way to write the syntax, not a lot of woolly alternatives that produce the same compiled code. Names excepted, of course. I used to like Java til it got over-bloated (around Java 2 or so), I liked Pascal and Modula once-upon-a-time, and now I like Erlang, which has the nit-pickiest compiler I ever met, but once you know the syntax it's trivial, and there's never any doubt about whether you need a punctuation character or can get away without it.

    I like to keep the entire language spec in my head. Good luck doing that with C++.

    1. nijam Silver badge

      Re: Source Formatting Opinions

      Hooray, another person who thinks Python is a half-century leap backwards! (Although not *quite* as bad as makefiles, where it mattered whether the indentation was done using TABs or SPACEs).

  13. Zippy's Sausage Factory

    I reckon

    The most use this will get is going to be in lawsuits - proving who did/didn't write some code, and therefore who does/doesn't own it.

  14. sisk

    I would imagine that their accuracy rate would drop rather radically as the pool of programmers they're trying to identify increases. After all there are only so many ways to implement a given function.

    1. amanfromMars 1 Silver badge

      IPO ProjectUS re Global Operating Devices

      :-) Quite so, SISk. AIMagiCQ roads are Absolutely Fabulous Fabless Advanced IntelAIgent Route and AIRoutes to Perfect Enough Virtual Reality Root in All Manner of Master Spider Webs ....... Phormer Networks with Exclusive Orderly Executive Administration Rights and Ab Fab Fabless Permissions.

      For All Manner of Virtualisations in Future Presentations ........ Expanding Time Lines .... MagiCQ Trails in Immaculate Tales?:-)
