Hold on a moment... Isn't this the same country that has a multitude of software companies with EULAs and other contracts that specifically name reverse-engineering as a prohibited activity? I guess reverse-engineering would make them liable for considerable damages. The military must be threatening to send drones to those companies, or they'll be required to attend the inevitable sueballs, regardless of their name being "DARPA".
What DARPA wants, DARPA gets: A non-hacky way to fix bugs in legacy binaries
Imagine a world where, rather than inspiring fear and trembling in even the stoutest of IT professionals' hearts, snipping bugs out of, or adding features to, legacy closed-source binaries was just another basic, low-stress task. A couple of years into a five-year DARPA project and we're perhaps well on our way there, thanks …
COMMENTS
-
-
Friday 18th August 2023 11:52 GMT Pascal Monett
Well, first of all, it's Da Gubbermint. You are doing your patriotic duty in letting it reverse-engineer and modify binaries for which there is no code.
Second, those EULAs, apparently, all belong to defunct companies whose programmers are likely retired today (if they are still alive), and whose IP has not been bought by some company still active today.
Third, even if that IP had been bought, there's a fair chance that the accounting department (or internal library) has no clue that they have the rights to that code. Also, there is a non-zero chance that, even if they know, they have no desire to tell anyone, because they don't want to put a finger into the update process of a piece of software for which they might have the source code (having the IP rights doesn't mean you have the code) but don't have anyone capable of modifying it.
Because if they show up and make a fuss about IP, then they run the very real risk of being liable for the changes.
I know I wouldn't want that.
-
Friday 18th August 2023 12:00 GMT b0llchit
They probably won't be liable for any changes, due to convoluted successor-of-interest clauses in the contracts.
It's probably a great opportunity for the IP holder to overcharge the government for a small change, with the exclusion of any assurances and liability, because it's classified as legacy code.
-
-
-
Friday 18th August 2023 11:36 GMT Andy Non
How did they allow themselves to get into this position anyway?
As part of the contract it should have been mandatory for the source code to have been handed over, or at the very least held in escrow in case the developers moved on, died etc. I've developed many business critical applications over the years and the more savvy companies insisted on either having the source code or the source to be held in escrow with agreed circumstances and procedures for making it available to the company.
-
Friday 18th August 2023 11:55 GMT short
Re: How did they allow themselves to get into this position anyway?
Maybe it's not really their code they want to fiddle with. Maybe it's some code running in some gear they've compromised that they'd like to run with some convenient new features, then reinsert without drawing a lot of attention?
Clearly, 'maybe' is doing some heavy lifting there. But who wouldn't want to patch others' militaries' code if the opportunity arose?
I can easily believe they've lost the source (or toolchains) for an awful lot of stuff, too.
-
Tuesday 22nd August 2023 12:08 GMT Alan Brown
Re: How did they allow themselves to get into this position anyway?
"I can easily believe they've lost the source (or toolchains) for an awful lot of stuff, too."
I've worked on a number of science cases where the source or toolchain may be available, but is so expensive that RE is preferable in order to work out what it's doing and reimplement it (this is why a lot of places don't like using proprietary software; open source means that you can, eventually, rerun tests 30 years later, etc.).
I was on board with the idea of REing old software so it can be updated/replaced, but the idea of "patching binaries" will fail on anything which is cryptographically signed, and is unnecessary if you own the equipment it's running on.
This clearly has applications in government-sponsored malware (if it hasn't already happened elsewhere), but I can see a burgeoning use case of much older proprietary software being pulled apart by those wanting to reimplement it (and others wanting to snark the likes of Adobe, etc.).
-
-
Friday 18th August 2023 12:00 GMT Pascal Monett
Re: How did they allow themselves to get into this position anyway?
That is the marvel of the software industry.
You need to realize that source code used to be treated like gold. You had the rights to the final product, but if you wanted the source code, you paid extra (and I mean HEAVY extra).
And that's not too long ago either. About ten years ago I was working for a European Institution that had a business-critical Notes application for which it had paid an ungodly amount of money to have the rights to the source code (and that was accompanied by an iron-clad contract in which, if ever said source code was published anywhere, a number of first-borns would have to be sacrificed on the night of a new moon).
Even today, having the source code is not a given. Go ask some web designer to do you a website and get the quote for having the website and the source code for all of it. I'm thinking you'll be looking at a different figure compared to just the website and support.
Personally, I've always programmed in Lotus Notes. For me, it was a given that the applications I was writing for the client would not be code-locked.
I know of quite a few software houses in Luxembourg that don't do that today.
-
Sunday 20th August 2023 22:43 GMT david 12
Re: How did they allow themselves to get into this position anyway?
We would have charged an arm and a leg for source code, but escrow was cheap or nothing to us. If the client wanted to pay for escrow, they could do that.
But (1) If clients could do anything with source code, they wouldn't have contracted us in the first place, and (2), escrow is an ongoing cost nobody wants to pay.
Slightly more common was a requirement for business-continuity insurance. That probably wouldn't have helped them, but for some companies it was just a standard contract term.
-
Friday 18th August 2023 22:15 GMT gobaskof
Re: How did they allow themselves to get into this position anyway?
You would think, but US Government procurement is a sticky business of poorly written tenders and lowest-cost bids that meet the letter but not the spirit of the tender.
I worked in a half-a-billion-dollar laboratory when I was a foreign contractor in a US Government lab. We needed incredible temperature control across the laboratory. No long-term maintenance was considered, and the specially designed control boards were unavailable for repairs, so when one failed it was near impossible to get replacements.
-
Tuesday 22nd August 2023 12:10 GMT Alan Brown
Re: How did they allow themselves to get into this position anyway?
"lower cost bids that meet the letter but not the spirit of the tender"
Often with the active collusion of those supposed to be in charge of the Tender
The number of times tender requirements have been "rewritten" by procurement is legion. Always go over things with a fine-tooth comb before approving what they send out.
-
-
Saturday 19th August 2023 07:38 GMT JohnG
Re: How did they allow themselves to get into this position anyway?
When I was writing ITTs for a public entity, for a system procured on behalf of the European Commission, it was a prerequisite that all source code in operational systems would be included with deliveries and that the IP would be transferred to the EC. A side effect of this was that open source software could not be used in the operational systems.
It was similar when I was working in defence. If the ship and its systems have a 25-year lifetime, the customer wanted the source, or an escrow agreement to gain access to the code in the event that the company went into administration.
Of course, it's all very well having the source but you then have to find people/companies to work with it. Typically, they will be wary of liability issues.
-
-
Friday 18th August 2023 12:58 GMT Stephen Booth
Java devs and Pythonistas?
Java is shipped as byte code that (unless obfuscated) can already be converted back into reasonable source code with minimal effort. I expect Python is similarly easy. Good job too, as I'm pretty sure the tool would only be easily applicable to statically compiled languages, not interpreted/JIT-ed ones.
Any tool that can cope with C will probably do a reasonable job on Fortran binaries etc., but more modern languages like Rust might be more of a challenge.
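Python's transparency is easy to demonstrate with the standard-library `dis` module — a minimal sketch (note the opcode name for addition differs between Python versions):

```python
# Python bytecode stays close to the source: the stdlib `dis` module
# lists the compiled instructions of any function, which is part of why
# .pyc files decompile back to readable source so easily.
import dis

def add(a, b):
    return a + b

ops = [ins.opname for ins in dis.get_instructions(add)]
# The addition shows up as a single bytecode op: BINARY_ADD before
# Python 3.11, BINARY_OP from 3.11 onwards.
assert "BINARY_ADD" in ops or "BINARY_OP" in ops
print(ops)
```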
-
Friday 18th August 2023 13:55 GMT Jou (Mxyzptlk)
Americans and their love for cool sounding abbreviations...
V-SPELL = Verified Security and Performance of Large Legacy Software
They are masters of that art. Proven so many times. Including many examples where the abbreviation and the long version completely contradict each other, to mask the true intention.
-
Friday 18th August 2023 17:42 GMT Lil Endian
Re: Americans and their love for cool sounding abbreviations...
I'd suggest "Geeks and their love for cool sounding abbreviations..." as I must confess I'm partial to an acronym. Who am I kidding, every hacker knows that projects start with sub-zero acronyms! One of my favourites was FLAMES (Fundamental Logical Algorithmically Mapped Encryption Structures), which was basically a data structure used to compare "components" using bitwise operations for speed. Or so I told myself :)
-
-
Friday 18th August 2023 14:26 GMT abend0c4
Beware the perils of space
Or, rather, lack thereof.
I've done rather more than my fair share of staring blankly at hex dumps of assorted ROMs trying to work out what was going on and one of the issues with more "mature" systems is that a considerable amount of ingenuity was often expended to get the code to fit the space available. I vaguely recall that the Commodore PET ROM relied rather heavily on jumping into the middle of instructions (often operands whose values were also valid opcodes).
Even if your decompiler can solve those kinds of oddities, I suspect the recompiled code stands a good chance of being bigger than the original, as it's unlikely to make use of such arcane optimisations. Perhaps not even the expletive-powered encouragement of the deputy chairman of the Conservative party would be sufficient to make it go back where it came from.
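The jump-into-the-middle trick is easy to illustrate. A toy sketch (a hand-picked subset of 6502 opcode lengths, not a real disassembler) showing how the same bytes yield two different instruction streams depending on the entry point:

```python
# Toy illustration of overlapping decode: the same byte stream decodes
# to different instructions depending on where you jump in.
# Lengths for the handful of real 6502 opcodes used below.
LENGTHS = {0xA9: 2,   # LDA #imm
           0x2C: 3,   # BIT abs
           0x60: 1}   # RTS

def decode_opcodes(code, start):
    """Walk the byte stream from `start`, returning each opcode seen."""
    pc, seen = start, []
    while pc < len(code):
        op = code[pc]
        seen.append(op)
        pc += LENGTHS[op]
    return seen

# Classic 6502 trick: 0x2C (BIT abs) "swallows" the next two bytes as its
# operand, so entering at offset 0 skips the LDA, while entering at
# offset 1 executes it - two routines sharing the same bytes.
code = bytes([0x2C, 0xA9, 0x01, 0x60])
assert decode_opcodes(code, 0) == [0x2C, 0x60]  # BIT $01A9 / RTS
assert decode_opcodes(code, 1) == [0xA9, 0x60]  # LDA #$01  / RTS
```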
-
Friday 18th August 2023 18:36 GMT Jou (Mxyzptlk)
Re: Beware the perils of space
> Commodore PET ROM relied rather heavily on jumping into the middle of instructions
I remember programming, on the C16, a faster draw-line routine than the one BASIC supplied, using the built-in assembler/disassembler. During initialization of the line draw I self-modified the loop to switch between DEX and INX, and a few BEQ to BNE instructions, depending on which of the eight quadrants (or four possible general directions) we were in.
-
Friday 18th August 2023 19:57 GMT Lil Endian
Re: Beware the perils of space
"...staring blankly at hex dumps of assorted ROMs..." I totally hear what you're saying, abend0c4!
That made me think of the Acorn Atom (that I still have) which has a free (EP)ROM slot that can take a daughter board, seating eight ROMs, individually selectable. That's populated with ROMs for disassembly, hex dumps etc etc. Ah! I know that blank stare all too well!
About your point on space: yep! Not a discipline that someone who's only banged out some C#/.NET spaghetti is going to manage too well, methinks!
-
Wednesday 30th August 2023 10:48 GMT Jamie Jones
Re: Beware the perils of space
Good point. And I used to do similar on the ZX Spectrum. Saving one byte was a great thing.
Other examples: "XOR A" (XOR A with itself) was 1 byte, whilst "LD A, 0" was 2 bytes.
Blanking unused bits of the screen, and using the area to store data...
Also a lot of what would be called "self-modifying code" to speed up loops etc.
Fun times.
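The byte counts check out against the standard Z80 encodings — a quick sketch:

```python
# The two standard Z80 encodings for zeroing the accumulator:
XOR_A  = bytes([0xAF])        # XOR A  -> A = A ^ A = 0  (1 byte)
LD_A_0 = bytes([0x3E, 0x00])  # LD A,0 -> A = 0          (2 bytes)

# One byte saved every time you zero A - which adds up on a 48K machine.
assert len(XOR_A) == 1 and len(LD_A_0) == 2

# The identity the trick relies on: anything XORed with itself is zero.
assert all(a ^ a == 0 for a in range(256))
```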
-
-
-
Friday 18th August 2023 17:17 GMT Anonymous Coward
Seriously???????????
Has everyone forgotten what patching used to be? In the beginning, patching was clipping a wire or adding a new wire. Then patching was inserting or removing cards in a deck. Next was to replace a CPU instruction with a jump, adding your patch at the jump target, and jumping back to the next instruction. And that's where patching ended and recompile-and-replace took over.
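That jump-out-and-back style of patch can be sketched in a few lines — here on a toy flat binary image where the opcode and all offsets are made up purely for illustration:

```python
# Classic "jump out, patch, jump back" on a toy flat binary image.
# JMP is a pretend 2-byte "jump to absolute offset" instruction and all
# offsets fit in one byte - everything here is hypothetical.
JMP = 0xE9  # invented opcode, followed by a 1-byte target offset

def hook(image, at, patch, free):
    """Overwrite the 2-byte instruction at `at` with a jump to `free`,
    write `patch` plus the displaced instruction there, then jump back."""
    img = bytearray(image)
    displaced = bytes(img[at:at + 2])            # the instruction we clobber
    img[at:at + 2] = bytes([JMP, free])          # jump out to the patch area
    tail = patch + displaced + bytes([JMP, at + 2])  # patch, redo original, return
    img[free:free + len(tail)] = tail
    return bytes(img)

image = bytes([0x01, 0x02, 0x03, 0x04]) + bytes(8)   # 4 "code" bytes + free space
patched = hook(image, at=0, patch=bytes([0xAA]), free=4)
assert patched[0:2] == bytes([JMP, 4])                    # site now jumps out
assert patched[4:9] == bytes([0xAA, 0x01, 0x02, JMP, 2])  # patch + original + back
```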
-
Saturday 19th August 2023 09:45 GMT Anonymous Coward
Re: Seriously???????????
Has everyone forgotten what patching used to be?
Be serious. Most of the new kids barely know what a baud rate is, let alone educate themselves about historic gear with instructions on parchment (actually, the fact that instructions came with something dates it in itself).
:)
-
Saturday 19th August 2023 13:34 GMT John Brown (no body)
Re: Seriously???????????
"actually, the fact that instructions come with something dates it in itself"
True :-) The first version of WordStar I used, on CP/M, came with a manual so complete that it included, as a standard user-level procedure, step-by-step instructions on how to patch the binary with hex codes to make it work with printers that used different codes for bold, italic, half-linefeed, overprinting, etc., so that OKI, Epson, or Juki printers could be used to most of their potential, even though they might not have been invented when WordStar came out :-)
-
-
-
Friday 18th August 2023 17:35 GMT trevorde
Pure sorcery
Worked for a company which had a product which did exactly this. It extracted the image from a car's ECU and listed all the variables and functions. This was then imported into Matlab where you could then replace the functions with your own or define a function to replace a variable. You could then press the magic button and it would reflash the ECU.
I looked under the covers and it was an unholy mix of Matlab, GNU disassemblers, assemblers and string replacement. They told me it was written by the bloke who developed the BitTorrent protocol. Three hundred years ago he would have been burnt at the stake for witchcraft.
-
Friday 18th August 2023 18:20 GMT Anonymous Coward
Good luck with that.
Decompilation is the easy bit. Trying to modify and then recompile the code with the exact same dependencies, libraries and entrypoints is the difficult bit.
I'm old enough to remember when it was possible to modify the BIOS via EPROM.
However it does highlight the vulnerability around legacy code.
Incidentally, how did somebody supposedly conversant with the topic manage to write an entire article without mention of escrow (big upvote for the fellow commentards that did)? Is this the Florida Man style taking over?
-
Friday 18th August 2023 18:20 GMT DS999
There are a lot of problems with this
Converting machine code to, say, C is easy, but you might not like the output. There are a lot of different compilers out there, especially in the embedded world, and they all have somewhat different formats. For embedded or older code where storage was at a premium, the symbols have probably been stripped, so you don't get the original variable names (though if they were a, b, c, d, etc., maybe you don't mind). Even if they weren't, the decompiler has to understand the format to match them up correctly.
Then you get into stuff like optimization. If the compiler unrolls a loop you'd want the decompiler to re-roll it to make the code easier to follow but that's not an easy job. There are tons of tricks optimization will play that will make decompiled source less readable, the more of them you can undo the more readable and understandable it will be.
Given that decompilers have existed forever, but every one I've seen does the naive direct translation without accounting for symbols and optimization, I'm assuming they want the gold-plated version, which will not be easy or cheap to produce. It is probably not far from the effort required to write an optimizing compiler - and there's a reason hardly anyone writes new compilers these days rather than using gcc or llvm.
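Re-rolling an unrolled loop is essentially a pattern-matching job. A toy sketch of the idea (matching on statement text rather than a real compiler IR, purely illustrative):

```python
# Toy "re-roll" pass: spot statements that are identical except for a
# constant stepping by a fixed stride, and collapse them into one loop.
# Real decompilers do this over an IR, not over source strings.
import re

def reroll(stmts):
    """Collapse e.g. ['a[0] = f(0)', 'a[1] = f(1)', ...] into a for-loop."""
    templ = [re.sub(r'\d+', '{i}', s) for s in stmts]
    nums = [[int(n) for n in re.findall(r'\d+', s)] for s in stmts]
    # All statements must share a template, with one index value each.
    if len(set(templ)) == 1 and all(len(set(ns)) == 1 for ns in nums):
        idx = [ns[0] for ns in nums]
        step = idx[1] - idx[0]
        if all(b - a == step for a, b in zip(idx, idx[1:])):
            return (f"for i in range({idx[0]}, {idx[-1] + step}, {step}): "
                    + templ[0].replace('{i}', 'i'))
    return None  # couldn't re-roll

unrolled = ["a[0] = f(0)", "a[1] = f(1)", "a[2] = f(2)", "a[3] = f(3)"]
assert reroll(unrolled) == "for i in range(0, 4, 1): a[i] = f(i)"
```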
-
Saturday 19th August 2023 13:41 GMT John Brown (no body)
Re: There are a lot of problems with this
Yeah, optimising compilers can be a double-edged sword. Try some clever "trick" the compiler doesn't understand: it thinks it's something it can optimise, and bang, your code doesn't do what you expected. Sometimes you really need to understand how the compiler works, and the mindset of its author[s], to get your code to work the way you intended. Luckily for me, I pretty much stopped any serious programming/coding many years ago, so I don't worry about stuff like that any more :-)
I also assume that disassemblers have moved on in leaps and bounds since I last played with stuff like that. Telling the difference between code and data was still a game of hit and miss the last time I used one. Getting a bunch of random code output from data that didn't seem to do anything, mixed in with actual code, was always fun.
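The code-versus-data problem is why modern disassemblers prefer recursive traversal over a linear sweep: follow control flow from the entry point and mark only reachable bytes as code. A toy sketch with an invented three-opcode instruction set:

```python
# Toy recursive-traversal disassembly over a made-up instruction set:
# bytes never reached by control flow are treated as data.
NOP, JMP, RET = 0x00, 0x01, 0x02   # JMP takes a 1-byte target operand

def mark_code(image, entry):
    code, work = set(), [entry]
    while work:
        pc = work.pop()
        while pc not in code and pc < len(image):
            op = image[pc]
            code.add(pc)
            if op == RET:
                break
            if op == JMP:
                code.add(pc + 1)           # the operand byte is code too
                work.append(image[pc + 1]) # follow the jump target
                break
            pc += 1                        # NOP falls through
    return code

# Entry: NOP, JMP 5; bytes 3-4 are embedded data; offset 5 is RET.
image = bytes([NOP, JMP, 5, 0xDE, 0xAD, RET])
reached = mark_code(image, 0)
assert reached == {0, 1, 2, 5}                       # code bytes
assert set(range(len(image))) - reached == {3, 4}    # data bytes
```

A linear sweep would have blindly decoded bytes 3 and 4 as instructions — exactly the "random code output from data" described above.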
-
-
-
-
-
Sunday 20th August 2023 01:49 GMT jake
Re: There are a lot of problems with this
Show me where I said I "expected everyone to know"?
Because I can assure you that I don't expect that at all.
But I DO expect anybody who considers themselves to be a professional programmer to at least have clues. If you don't know the tools that you make your bread & butter with inside out, how can you consider yourself to be a professional?
-
Sunday 20th August 2023 20:54 GMT Jou (Mxyzptlk)
Re: There are a lot of problems with this
> Show me where I said I "expected everyone to know"?
OK
> > One should ALWAYS understand how the compiler works
Not said directly, but clearly implied. Well, actually you did not communicate which level you are expecting, and reading someone's mind over the internet is not yet implemented as an RFC. Hence the question "beyond the general concept", to which you clearly communicated you are way "beyond the general concept". So, what exact level are you expecting? You seem to have an advantage as a contributor and can specify that much better than I can.
-
Sunday 20th August 2023 21:24 GMT jake
Re: There are a lot of problems with this
"Not said directly, but clearly implied. Well, actually you did not communicate which level you are expecting, and reading someones mind over the internet is not yet implemented as RFC."
Read the rest of my post where I modified that with "unless you're just an easily replaceable cog in the machine, and content to stay that way."
"So, what exact level are you expecting?"
Probably not the answer you are looking for, but you wouldn't believe the number of times I've run across a so-called "programmer" who had spent days or weeks trying to track down a bug in his code ... but it turned out to be a rather well documented compiler bug (or sometimes a feature, depending on the angle you observed it from).
Again, if you don't know the toolset inside out, are you truly a professional in that field?
-
Sunday 20th August 2023 22:05 GMT Jou (Mxyzptlk)
Re: There are a lot of problems with this
That depends. I found several powershell cmdlets with limits or bugged behaviour, so I had to implement what I needed on my own.
Remove-Item -Recurse is well known to be broken, that is easy.
Get-ADGroupMember is broken when you have a group with members from across trusts (aka interforest) - it just throws an error. So I had to program it on my own, including -Recurse, to crawl all trusts and forests to get more than just a SID, verified from whichever domain controller is responsible, since the /ForeignSecurityPrincipals pseudo-OU in the customer AD was filled with bad entries.
Get-ChildItem -Recurse on a large directory takes too much RAM (> 5 GB, but I've seen > 16 GB too), so I had to dig out my old C16 (with 12K of memory) knowledge to recurse in a style which does NOT need that much RAM.
According to MS documentation you cannot set the "set explicit view permissions on the DFS folder" option on a DFS-N folder link in PowerShell - but I needed it for a larger DFS-N with several thousand links across over a hundred namespaces, so it had to be done in PowerShell (for my pride, to prevent errors from wrong clicks, and to save time). In the end it was easier than expected once I found the way.
All those examples are still in PowerShell 5.1, since that is available everywhere from Server 2008 R2 to Server 2022. And without external modules from, for example, the PowerShell Gallery, since the customer is paranoid about that due to bad experiences. I rarely have to go down to PS 3.0 or 2.0, and I don't like to, since long paths are not well supported there and a few other things are missing or more cumbersome.
OK, enough bragging with my bit of PowerShell knowledge. There is more I could brag about, but there are SO many people out there who know PowerShell way deeper than I do, and on whose knowledge I built to break the limits of what is supplied directly. Like the solutions in the examples above - which cannot be found on Stack Exchange and the like, except you may find a post from me here and there. So others see me as a professional, and they clearly tell me so, but I see myself as still on the way to being a pro in PowerShell, since I see how tall the mountain actually is.
-
Friday 25th August 2023 07:53 GMT Roo
Re: There are a lot of problems with this
To be honest, more often than not the bug has been within the developer's (my) code rather than the compiler. That said, I believe developers *should* at least be able to determine that the compiler is misbehaving, through proper testing of their code and using appropriate tools such as another compiler and/or a disassembler. In my experience, the vast majority of developers who insist that the tools are to blame don't meet that criteria. :(
I've only had two occasions (in 36 years of using compilers) where the code was categorically correct and the compiler was misbehaving (i.e. not doing what the documentation said it should do). It took a couple of days to pin it down and fix it in both cases. There was a false positive raised by a cow-orker who had been kicking furniture around the office while mooing about the compiler being busted for over 3 weeks, which I debunked in 10 minutes using a disassembler - because 1) the lying dickhead refused to show his source, 2) he didn't know the difference between K&R functions and ANSI prototypes, and 3) he failed to actually unit-test his shit code.
-
-
-
-
-
-
-
-
-
-
Friday 18th August 2023 23:52 GMT Bebu
The sinister fingers of contemporary AI in this? :)
I imagine any particularly old binary built with old toolchains (with intact symbols a plus, debugging info a real bonus) could be fairly easily decompiled into a reasonable replica of the original source code.
If you had the original toolchain and a (cross) environment to run it, then training AI/ML to recognise common programming idioms and algorithms could produce pretty decent code.
Not all code would have been C/C++ as I imagine there could have been a fair bit of Ada and PL/1. Ada was developed to replace literally hundreds of programming languages then in use by the DOD. One ring ;) but in the end it looks like it was C that in darkness ruled them all. ;)
I am curious whether decompiling CISC binaries is easier than RISC binaries. Or harder, or the same?
I also wonder why DARPA would want to resurrect old code rather than starting from scratch with an updated design? Perhaps the programmers of yore were giants in the golden age of their craft and the lesser creatures of this bronze age are mere squabbling dwarves. ;)
-
Monday 21st August 2023 05:31 GMT amanfromMars 1
Future things don’t work better that way ..... stuck with old faltering worn out tired systems.
V-SPELLS aims to radically broaden adoption of software verification by enabling incremental introduction of superior technologies into systems that cannot be redesigned from scratch and replaced as a whole.
That is commendable and just as one would expect things to happen but its success is fundamentally catastrophically limited to be always bested by the introduction of superior technology systems as replacements for such aged outmoded wholes, which in all honesty would really need to be realised as totally unfit for and dangerously detrimental to future great purpose.
Imagine the state and performance of an early Model T Ford type system vehicle, incrementally modified but not redesigned, whenever up against the likes of the finest and latest from the Porsche VAG garage stables and you should see the obvious problem which cannot be resolved by V-SPELLS and DARPA ‽
-
Thursday 28th September 2023 20:04 GMT MikeyDoesn'tLike It
Time will tell
I've been a software developer with a Government contractor for over 40 years, and EVERY contract I've developed software for has delivered the source code as well as the executables to the Government. O&M and sustainment contracts sometimes even cause the supporting contractor to change as well, and the software, both source and executables, transfers with the contract - with the possible exception of existing proprietary software that was used in the original development, in which case linkable libraries or executables are delivered.
So in that case I don't really see the problems that this is supposedly trying to solve.
However, more recently many contracts have used more and more COTS software, as delivered systems have become more mainstream and less stove-piped. As I have seen recently myself, many of those companies no longer exist, either going out of business or being absorbed by larger companies which, in turn, abandon their original product lines. In that case, I can see a need for this capability, as it would preclude any requirement to rewrite the software from scratch, with its requisite massive costs in both labor and time.
Will this actually be a solution, given the potential licensing pitfalls with this strategy that others have pointed out? Or will it result in the Government going back to developing its own software as before, or continuing down this road? Time will tell.