Justine
This lady is a real computer scientist. She shows up the current emphasis on 'coding' for the shallow drivel it is.
A bunch of almost unbelievably clever tech tricks come together into something practical with redbean 2: a webserver plus content in a single file that runs on any x86-64 operating system. The project is the culmination – so far – of a series of remarkable, inspired hacks by programmer Justine Tunney: αcτµαlly pδrταblε …
All respect to Justine, but creating polyglots is not "computer science" under any useful definition of the term. It's good old-fashioned hacking, of the sort that could get you a nice article in 2600 or PoC||GTFO.
Indeed, if you read the PoC||GTFO collections you'll find plenty of examples of similar work.
This comes across as a fanboy squee spasm - one that completely unravels in the bootnote.
Some of us are old enough to remember when the PE stub was used to supply a working DOS version so the file was a "fat binary" that could run under DOS and Windows. And I can begin to sketch out in my head how I'd generalise the process, if someone paid me to do it. So it would have been nice to read a technical discussion of what the blocking problems were and how they'd been overcome.
[Author here]
Well, it is, somewhat, yes. It's a remarkable technical accomplishment. I am not sure if that makes it actually very _useful_ in the real world, no.
Ubuntu (with its derivatives) is the dominant Linux out there, and Ubuntu contains changes that stop this working straight out of the box, AIUI.
I don't think that invalidates the exercise, though.
> Some of us are old enough to remember when the PE stub was used to supply a working DOS version
Really? Can you cite any examples at all?
> the file was a "fat binary" that could run under DOS and Windows.
That's not what a "fat binary" means, but I think from your scare quotes that you know that.
> So it would have been nice to read a technical discussion of what the blocking problems were and how they'd been overcome.
I gave copious links to the various subcomponents (APE, Cosmo etc.) so that you can read the author's own explanations. I think repeating it all would have made the piece overly long and too complex for most people, without really adding any value to it. Tunney's own explanations are good and cogent, and I fear that anyone unable to follow them is probably not going to be able to understand how this was done and what an impressive achievement it is.
It's a fine technical accomplishment. I'm not sure it rises to "remarkable" if you follow this sort of work. Like, say, PoC||GTFO 14, which is a PDF (of the issue's contents, of course), a zip archive with the samples from the articles, and a Nintendo ROM with a playable game, all in one file.
And the text of the PDF includes the PDF's own MD5 sum, so they had to calculate an MD5 collision on top of their three-way polyglot. (Because, of course, if you take the MD5 hash of a PDF and then add it to the text, you'll change the hash.)
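Here is a minimal Python sketch of why that self-reference is hard, using the standard hashlib module. The document text is a made-up stand-in, not the actual PDF. Embedding the digest changes the bytes being hashed, so the published digest no longer matches, and re-hashing only moves the target again.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of data."""
    return hashlib.md5(data).hexdigest()


# Hypothetical stand-in for the PDF's text; the real issue is far more elaborate.
doc = b"This document's MD5 is: "
claimed = md5_hex(doc)              # digest of the text before the hash is added

doc_with_hash = doc + claimed.encode("ascii")
actual = md5_hex(doc_with_hash)     # digest of what would actually be published

print("claimed:", claimed)
print("actual: ", actual)
print("match:  ", claimed == actual)
```

Making the two agree needs deliberate collision work of the kind described above, not simple iteration.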
So, yes, congratulations to Justine for some fine hackery in the best sense, attending to how things work at the low level and bumming bytes. Good stuff. Not astonishing, though.
Even more impressive: the fact that there are still some people intelligent enough, and dedicated enough, to actually understand how our computers work and to build the environment to demonstrate it.
Back in the day, I purchased the Norton's PC Bible. It was a very enlightening experience, finding out how things actually happened at the hardware level. Thanks to that book, I toyed around with creating moving objects in graphics mode on my 4-colour IBM-PC 640*320 CRT screen (that was high-resolution, back then).
These days, there is no more PC Bible, there is a PC library, and it would take years for a newcomer to grasp all of it.
Kudos to the creator of this tool.
> With Cosmo and the APE format, you can write a C program and compile it to a single file that will load and run on six totally different operating systems. Oh, and if that wasn't enough, the same binary can be booted directly from the PC BIOS, as well.
A truly impressive piece of work. Unfortunately, I can almost hear the virus-writers and spyware creators (including government establishments across the world) drooling over the possibilities this opens up.
Maybe it's just too powerful?
It's certainly something that will be explored but it should also be possible to block the executables because of the format used.
As for universal apps running in the browser, this is being driven by WASM, which comes with a bigger API. It ends up achieving many of the same things, but by different means.
Nah. Polyglots already existed before this. Small hosted execution environments existed before this. Unless you're a fairly feeble exploit developer, this is not new.
As I wrote above, it's good work. It's not breaking new ground, particularly not in the world of advanced malware (APTs and such).
That “x” is a Latin letter, so it retains the Latin /ks/ pronunciation (like the Greek letter “ξ”). The Greek letter that is pronounced /x/, with the sound of “ch” in “loch”, is “χ”. (Note that the “c”, “b”, and “l” in “εxεcµταblε” are also Latin letters.)
Ah, the whimsical path that is the comments on an El Reg article. A few articles back we had a comments thread that included recommendations for high-quality butters, and now we've got Greek language tuition. I shall have to start dropping in snippets about Chopin and Liszt. Perhaps someone else could talk about Japanese armour?
Time for an eye test.
I just got a handheld magnifying glass and read through six months of unread letters in order. All print is small print! One letter told me I was getting an appointment with an ophthalmologist, which I badly need but never asked for, so it must be a psychic ophthalmologist. The next letter said it was on the 6th of June, damn. The next letter said it had been rearranged for later this month, hooray.
Same with the council letters demanding council tax, then taking me to court, then setting bailiffs on me, then giving me a cheque for £130 as an apology for the pandemic, then another £150 as an apology for the cost of living. A text, email or phone call would be preferable.
And the census and TV licence morons which I would have ignored anyway - won't somebody think of the trees?
Because my energy company suspected I'd been stealing energy from them (for using energy without topping up), they accused me and cut off my supply. I was £9 in credit at the time; I've been ill. Why cut off someone in credit? I've threatened to take them to the small claims court unless they recompense me for the contents of my freezer, my fridge and my good name. For what are we worth but our reputation? I asked for around £15.
"Why cut off someone in credit?"
My credit with British Gas keeps rising. Unfortunately, reducing my monthly payment requires me to queue and debate with a human on the phone. Increasing the amount, on the other hand, is just a case of amending my online account details.
Now that it is summer, my boiler is totally shut down until the autumn. My rare use of the gas hob doesn't register more than about 0.01 of a unit, which is too small to show in the online readings. It would not be entirely unexpected to be accused of bypassing the meter, which is accessible outside to any meter inspector.
Eon have a slightly better online accounts system for my electricity - except you cannot reduce your monthly payment by more than 10%.
It is obvious that neither company has an accounting system that is any use for automatically correlating customer credit/debit totals and monthly payments.
It will be interesting to see what the regulator's proposed new rules about accumulating credit balances will do to the situation.
> cancel the direct debit. if (when) they get snotty, tell them you've already paid.
Ah, but then the energy company will just move the OP onto the expensive standard tariff.
Better to complain to the Direct Debit people saying that the energy supplier is abusing the service and should be prevented from offering it, cc-ing Ofgem and your MP.
Old timers will remember when MS-DOS programs were either .COM or .EXE, depending on whether they ran in a single 64k segment or needed the OS to set up the environment using header information. .COM files were 8080 programs that could also run on 8086 processors, with PC-DOS or CP/M as the “operating system”: while redbean is undoubtedly clever, it is not conceptually different from a 1979 design.
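For readers who never met the two formats, here is a rough Python sketch of the distinction as I understand it. An .EXE carries an "MZ" header that the loader interprets. A .COM image is simply copied in after the 256-byte Program Segment Prefix and entered at offset 0x100. The function name and messages are illustrative only, and the size check ignores the room the stack needs.

```python
PSP_SIZE = 0x100        # 256-byte Program Segment Prefix that precedes a .COM image
SEGMENT_SIZE = 0x10000  # one real-mode segment: 64 KiB


def dos_load_kind(image: bytes) -> str:
    """Simplified sketch of how DOS picks a loading strategy for a program.

    As I understand it, DOS trusts the 'MZ' signature (some versions also
    accept 'ZM') rather than the file extension; everything else is loaded
    as a flat .COM image. This is an illustration, not DOS's loader.
    """
    if image[:2] in (b"MZ", b"ZM"):
        return "EXE: header tells DOS about relocations, stack and entry point"
    if len(image) > SEGMENT_SIZE - PSP_SIZE:
        return "too big: a .COM image must fit in one segment after the PSP"
    return "COM: copied verbatim to offset 0x100 and entered at its first byte"


print(dos_load_kind(b"\xcd\x20"))          # INT 20h: a complete (do-nothing) .COM program
print(dos_load_kind(b"MZ" + bytes(62)))    # starts with an EXE header
```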
Object file formats date from the 1960s, with little difference until ELF was created to allow shared objects to be loaded in Unix System V, for an ABI without a hardware dependence on the interrupt vector table. ELF vs COFF used to be a question of load time and space, but is largely irrelevant today. It’s a badge of shame that modern OSes do not support multiple executable file formats.
One day we’ll be able to load .dll files on Linux (without LoadLibrary) and .so files on Windows (without clang/CMake)... maybe this is the kind of initiative to get things going.
You're incorrect in saying that ".COM files were 8080 programs that could also run on 8086 processors". And who would want to anyway given that the system calls would be completely different for anything other than pretty simplistic programs.
The machine code of the two processors is completely different and although the technical file format of .COM files is the same between DOS and CP/M, there's no way a CP/M program was going to run on a DOS machine without an emulator (of which there were some) involved.
So yes, getting this to work across multiple O/Ses is quite conceptually different to a simplistic file format which happens (for "reasons") to be the same on two O/Ses.
The 8086 processor has segmented memory for compatibility with 8080 programs: for .COM files, DOS set all the segment registers to the same 64k segment, loaded the .COM file and started executing from the first instruction (whether valid or not). Calls to the OS were via processor interrupts, and would execute whatever code was referenced in the interrupt vector table. SCP/MS/PC-DOS used the same interrupts (e.g. INT 16h for keyboard) as CP/M; they were both essentially boot loaders rather than what we'd call an OS today.
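The segment arithmetic behind "the same 64k segment" fits in a few lines of Python. The load segment 0x0700 is a hypothetical example value. The 20-bit wraparound shown at the end is the behaviour that the A20 line later had to emulate.

```python
def linear_address(segment: int, offset: int, a20_enabled: bool = False) -> int:
    """Physical address for a real-mode segment:offset pair.

    The 8086 forms segment * 16 + offset. With only 20 address lines, a
    result above 0xFFFFF wraps round to the bottom of memory; later CPUs
    only reproduce that wrap while the A20 line is disabled.
    """
    address = (segment << 4) + offset
    return address if a20_enabled else address & 0xFFFFF


# Hypothetical load segment: a .COM program with CS = DS = ES = SS = 0x0700.
print(hex(linear_address(0x0700, 0x0100)))   # entry point, just after the PSP
print(hex(linear_address(0x0710, 0x0000)))   # same byte, different segment:offset
print(hex(linear_address(0xFFFF, 0x0010)))   # wraps to 0x0 with A20 off
print(hex(linear_address(0xFFFF, 0x0010, a20_enabled=True)))
```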
It's a semantic distinction to say that 8086 .COM programs were not 8080 programs: they couldn't reliably change segment registers, so they would probably fail if they included non-8080 instructions. It's fair to say that few "useful" programs written for CP/M would run on DOS, but VisiCalc was one of them.
>there's no way a CP/M program was going to run on a DOS machine without an emulator
Not strictly true, at least at first. The original QDOS/MS-DOS used CP/M BIOS calls (including the notorious jump around A20 trick which caused so much trouble on later systems). When MS-DOS changed from 1.4 to 2.0 it included system calls that emulated low level UNIX file system calls -- open, close, read, write and so on. The CP/M interface was still in there for years, for all I know it may have lasted to MS-DOS 6 or beyond.
The main reason people didn't run CP/M programs on a PC is that CP/M-86 turned up too late to be significant (and if MS-DOS programs detected it, they issued all sorts of threatware notices). CP/M was for 8-bit machines, so the emulators primarily accommodated Z80 machine code (also, strictly speaking, 8080/8085). There was a handy program called "Uniform" that read all the different CP/M-style disks, which were all in manufacturer-proprietary formats, so you could carry on as before using either an emulator or a plug-in card.
The PE format is in fact completely portable: the limitation is the loaders, not the format.
The author has only seen executables in which the 'DOS' program is only "This program requires Windows" (The 'Hello World' program of PE), but that was a choice that individual developers made: to provide an executable linked to MS Windows, and only one other executable, and that a stub.
Programs developed to replace existing DOS programs could, and sometimes did, include both programs. The Win98SE installer is an example. But more commonly, limitations of Windows disk space and DOS floppy disk space made that a stupid idea.
The container format can contain multiple binaries (and multiple resources). It's not just limited to two. The requirement is that the OS doing the Load-and-GO operation has to recognize its own executables in the PE file.
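Here is a toy Python sketch of the MZ/PE layout, using the standard struct module. The DOS header's e_lfanew field at offset 0x3C holds the file offset of the "PE\0\0" signature. Whatever sits between the DOS header and that point is the DOS program, whether it is a one-line stub or a complete application. The helper names are hypothetical and the image is not runnable, because real headers fill in many more fields.

```python
import struct

E_LFANEW = 0x3C         # DOS-header field holding the file offset of the PE header
DOS_HEADER_SIZE = 0x40


def build_toy_image(dos_program: bytes) -> bytes:
    """Assemble a toy, non-runnable MZ/PE skeleton around a DOS program."""
    header = bytearray(DOS_HEADER_SIZE)   # real headers fill in many more fields
    header[0:2] = b"MZ"
    pe_offset = DOS_HEADER_SIZE + len(dos_program)
    struct.pack_into("<I", header, E_LFANEW, pe_offset)
    return bytes(header) + dos_program + b"PE\0\0"  # PE headers and sections would follow


def split_image(image: bytes) -> tuple[bytes, int]:
    """Return (DOS header plus DOS program, offset of the PE signature)."""
    if image[:2] != b"MZ":
        raise ValueError("no MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, E_LFANEW)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("e_lfanew does not point at a PE signature")
    return image[:pe_offset], pe_offset


image = build_toy_image(b"<a complete DOS program could live here>")
dos_part, pe_offset = split_image(image)
print(len(dos_part), pe_offset)
```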
I think that many, perhaps most, Linux systems don't have any default support for the PE format outside of its mandated use to contain EFI (boot) information. However, they do have support for script files, and as I understand it, the APE format leverages that by embedding a binary structure that can alternately be interpreted as a script loader or as an executable container. Someone who is more familiar with shell scripts may comment.
the APE format leverages that by embedding a binary structure that can alternately be interpreted as a script loader or as an executable container
Yes, plus some other goodies.
Basically, PEs are recognized on Windows by the magic number in the first two bytes, which happens to be "MZ" (for good historical reasons) in ASCII. There are similar magic numbers for various binary executable formats used by many other OSes. (Claiming "any x86 OS" is clearly a little broad, because who knows how many people have implemented their own experimental OSes with crazy formats and conventions.)
Bourne shell scripts have to contain legal Bourne shell commands, but don't have to start with anything in particular, because they predate the interpreter ("hash-bang") line.
So, when a POSIX OS is asked to execute a file, it sees if it starts with the magic number of any of the executable formats it knows how to run. If not, it has to try to run it as a Bourne shell script. (It can't really apply any heuristics except that if the first byte is binary 0, it can't be a valid Bourne shell script. Otherwise, in POSIX land, even a 2-byte file could be a valid script, because you can use any character other than NUL or / in a filename, and the script might just execute another file. But I digress.)
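The dispatch described above can be sketched roughly in Python. The magic-number tables are deliberately tiny and hypothetical. On Linux, strictly speaking, the kernel's execve fails with ENOEXEC, and it is the calling shell that then reads the file as a script. The point is that the same leading "MZ" is a known signature to one loader and unremarkable text to the other.

```python
# Hypothetical, much-simplified tables of what each loader recognises.
LINUX_KNOWN = {b"\x7fELF": "ELF executable", b"#!": "interpreter script"}
WINDOWS_KNOWN = {b"MZ": "PE executable"}


def classify(first_bytes: bytes, known: dict[bytes, str]) -> str:
    """Match a file's leading bytes against one OS's magic numbers.

    The two fallback messages follow the POSIX behaviour described above.
    """
    for magic, kind in known.items():
        if first_bytes.startswith(magic):
            return kind
    if first_bytes[:1] == b"\x00":
        return "rejected: a leading NUL cannot start a shell script"
    return "unrecognised: a POSIX shell falls back to running it as a script"


ape_like = b"MZ_payload='..."   # starts with MZ but is also shell syntax
print(classify(ape_like, WINDOWS_KNOWN))
print(classify(ape_like, LINUX_KNOWN))
```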
So, I can start a Bourne shell script with "MZ" and it looks like a PE to Windows. If I write that script carefully, I can actually make it both a valid Bourne script and a valid PE, because they're both pretty forgiving. In the case of APE, the script starts off by turning that MZ into the beginning of a variable name, as you can see if you follow the link in the article.
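Here is a toy version of the "MZ as the start of a variable name" trick, sketched in Python and run through the system's sh. The variable name MZ_payload and the payload bytes are made up for illustration, and APE's real header is considerably more involved. The single-quoted string swallows the binary bytes, so those bytes must not contain a quote or a NUL.

```python
import os
import subprocess
import tempfile


def make_mz_shell_polyglot(payload: bytes, script: str) -> bytes:
    """Toy MZ/shell polyglot in the spirit of APE (not APE's real layout).

    The file begins with 'MZ', so a loader checking the first two bytes sees
    a DOS/PE signature. A shell instead reads 'MZ_payload=' as the start of
    a variable assignment whose single-quoted value swallows the binary bytes.
    """
    if b"'" in payload or b"\x00" in payload:
        raise ValueError("payload must not contain a single quote or NUL byte")
    return b"MZ_payload='" + payload + b"'\n" + script.encode("ascii")


# Hypothetical "machine code" bytes standing in for the binary part.
blob = make_mz_shell_polyglot(b"\x90\xe9\xeb\n\x90", "echo hello from the shell path\n")

with tempfile.NamedTemporaryFile(suffix=".sh", delete=False) as f:
    f.write(blob)
    path = f.name
try:
    result = subprocess.run(["sh", path], capture_output=True, text=True, check=True)
finally:
    os.unlink(path)

print(blob[:2], result.stdout.strip())
```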
Then, if it's running as a PE, great; you just stick in the various PE sections after redirecting around the script stuff. If it's running as a script, it can do various things to get itself re-executed as the proper sort of file.
And then there's the web-app zip trick, which is one of the first tricks you learn when writing polyglots, and one that was used by Phil Katz's original PKZip (in the pksfx utility). A zip file has its metadata at the end, and the directory in the metadata contains offsets to the compressed contents. So anything can come before the zip data; decompressors will start at the end of the file and then jump into the middle of it. So you just take your executable and append the zip data to it, and you have both an executable and a zip archive.
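The append-a-zip trick takes only a few lines of Python with the standard zipfile module. The 64-byte "executable" is a hypothetical placeholder. CPython's zipfile locates the end-of-central-directory record at the end of the file and compensates for the bytes in front, so the archive reads normally even though the file starts with "MZ".

```python
import io
import zipfile


def append_zip(prefix: bytes, files: dict[str, bytes]) -> bytes:
    """Return prefix followed by a freshly built zip archive of files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return prefix + buf.getvalue()


stub = b"MZ" + b"\x90" * 62   # hypothetical stand-in for an executable
combined = append_zip(stub, {"index.html": b"<h1>hello</h1>"})

# The reader starts from the end-of-central-directory record at the end of
# the file; CPython's zipfile also corrects for the prepended bytes.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    print(zf.namelist(), zf.read("index.html"))
```

Tools differ in whether they compensate like this. Info-ZIP's `zip -A` rewrites the stored offsets so that stricter readers cope too.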
I do this occasionally with a PNG image and a zip archive. The PNG image is of text that reads "This is a zip file. Rename it with a zip extension and open it again." It's handy for sending zips to people whose email clients block them, for example.
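You can see why the PNG/zip combination works by walking the PNG chunk list the way a decoder does. The image ends at the IEND chunk, and decoders typically ignore whatever follows, while the zip reader works from the end of the file. The 1x1 greyscale image here is hand-built for the example with struct and zlib; a real image would come from an editor.

```python
import io
import struct
import zipfile
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(ctype: bytes, data: bytes) -> bytes:
    """One PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """A hand-built 1x1 8-bit greyscale PNG (placeholder for a real image)."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x80")   # filter byte 0, then one grey pixel
    return (PNG_SIGNATURE + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"IDAT", idat) + png_chunk(b"IEND", b""))


def png_end_offset(data: bytes) -> int:
    """Walk the chunk list as a PNG reader does; return the offset after IEND."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        ctype = data[pos + 4:pos + 8]
        pos += 12 + length   # 4-byte length + 4-byte type + data + 4-byte CRC
        if ctype == b"IEND":
            return pos
    raise ValueError("no IEND chunk")


png = tiny_png()
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("notes.txt", "the actual payload")
combined = png + zip_buf.getvalue()

print(png_end_offset(combined) == len(png))              # PNG reader stops here
print(zipfile.ZipFile(io.BytesIO(combined)).namelist())  # zip reader still works
```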
In today's common OS/userland design, this is impressive. So many libraries, so many dependencies. I have not looked into the Ubuntu issue mentioned, but if I were a betting man, I would bet on new systemd features.
There are still programs, though, that eschew pulling in a whole library just to get the one function they need out of the dozens or hundreds it provides. These are good candidates for this approach.
I disagree with the notion that this is not computer science, but hacking (in the traditional, beneficial sense of hacking). In my studies of computer science, this is exactly what computer science can be leveraged for. I believe computer science practised at this level is hacking, and vice versa. Not every computer scientist can do something such as this; likewise for the hackers. I suggest that the title of Computer Science Boffin be bestowed upon Justine Tunney.