Buffer overruns? Still?
Someone take me outside and shoot me.
I feel like I'm trapped in that never-ending circle of hell where, whatever you do, whatever happens, you'll end up back in the same place every time.
The most dangerous type of software bug is the out-of-bounds write, according to MITRE this week. This type of flaw is responsible for 70 CVE-tagged holes in the US government's list of known vulnerabilities that are under active attack and need to be patched, we note. Out-of-bounds write, sometimes labeled CWE-787, also took …
Look, if this is still being used in 20 years then we’ve got bigger problems.
Look, I’ve left out the code to do sanitization so remember to wrap this in a sanitization function, but this should do what you want.
Look, this code base is crazy interdependent so don’t touch anything unless it’s broken, just pop a new module on the side to add functionality :)
Look…. etc
Yep, not just the copy-and-paste stuff on Stack Overflow but also examples on instructional websites and videos.
They all say something like "remember to add exception handling" or "sanitise input", leaving the implementer to do it without providing any good advice on the subject.
It's easy to write memory-safe C and C++, but people don't: too busy making something work and forgetting to make it safe.
Some years back I ran a small team of programmers along with my main job of designing the software they were writing.
I got management complaining about the time one of my programmers was taking to complete his work. Too long, too long, they said.
I pointed out that he had the best-quality code, had fewer system, integration, and unit test problems, and had virtually nothing reported by customers. He was, I pointed out, by far their best programmer.
The response was that he was too slow, and that they had releases to get out.
Rust will not fix that problem.
"The required techniques of effective reasoning are pretty formal, but as long as programming is done by people that don't master them, the software crisis will remain with us and will be considered an incurable disease. And you know what incurable diseases do: they invite the quacks and charlatans in, who in this case take the form of Software Engineering gurus."---
Edsger Dijkstra
Whatever happened to LISP? It is fast, memory-safe, etc., and a pleasure to program in. A modern LISP has everything you need except acceptance. The newer programming languages haven't learned the lessons of LISP and reimplement everything in a buggy or limited way. LISP 1.5 is over 60 years old.
Well, there are warnings you can enable in LLVM-based C/C++ compilers that find a lot of these things.
Other C/C++ tricks include using strncpy, snprintf, etc., which have buffer checking built in; sanitizing "foreign" (and even internal) data and structures; and the simpler, more obvious fixes, such as 'fgets' instead of 'gets' on stdin.
Custom APIs with output to buffers should ALSO do size checking and have the output buffer size specified. The simple things.
No need to implement garbage collection memory control, nor use Rust to fix this.
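To make that concrete, here is a minimal sketch of the bounded calls mentioned above (buffer sizes and the message are arbitrary illustration values):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char name[32];
    char greeting[64];

    /* fgets takes the destination size; gets() had no way to know it
       and was removed from the language in C11 */
    if (fgets(name, sizeof name, stdin) == NULL)
        return 1;
    name[strcspn(name, "\n")] = '\0';   /* strip the trailing newline */

    /* snprintf never writes more than the given size (including the
       terminating NUL), where sprintf would happily overrun */
    snprintf(greeting, sizeof greeting, "Hello, %s", name);
    puts(greeting);
    return 0;
}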
"All" you need to do is spend time implementing a nice CI environment with all the flags and sanitisers on and one or more static analyser tools as part of the build process and fails the build when it finds something. And first you have to get that past the manager who thinks that the developer just wants to mess around with a build process which already works and it will only throw up more errors which slow things down even more.
We have a problem with the software we write, but a lot of the time it's not down to developers.
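For the record, a minimal sketch of the kind of build step meant here, assuming gcc/clang and a couple of commonly used analysers (the exact flags and tool choices are a matter of taste and will vary per project):

# warnings as errors, plus runtime sanitisers in the test build
cc -Wall -Wextra -Werror -fsanitize=address,undefined -g -o app app.c
# static analysis that fails the pipeline when it finds something
cppcheck --error-exitcode=1 app.c
clang-tidy app.c -- -Wall -Wextra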
Listen to this
https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare/
and you will see that Index Errors are an OPP (Old Persistent Problem). The FORTRAN people had this decades ago.
We have index errors in C, C++, Java, C# programs today, we must assume.
Software engineers' understanding of their own "perfection" is wrong. Homo sapiens is not a machine, but a highly capable, error-prone being.
Machines should be as tolerant of human mistakes as possible. C and C++ are good for code generators, because machines will eventually be "perfect".
We were already much better with memory-safe ALGOL, but C+Unix won out for cheapness.
Why would this go away when we still use the same core toolchain that we've used for decades and, at machine level, more or less the same instruction sets that have been in circulation for similar amounts of time?
Compilers, language features and user interfaces have obviously evolved. But C, C++? Still the same beasts that they always were. Capable beasts, yes, but certainly not infallible.
DLL Hell and API dependency means that auditing all aspects of possible overruns is nigh impossible.
I am under no illusions that Rust is a panacea to the world's problems, but it certainly exhibits some good ideas.
I remember when Linux was starting to get around, there was no malware written for it, and the assumption seemed to be that this was down to its superior engineering (hah!) compared to Windows.
Turns out it was just that nobody cared enough to look for the holes. I wonder what fresh hells Rust is leading us into now that they've finally --- apparently and yet again --- solved memory allocation.
I would suggest you are indulging in a little revisionist history.
While both OSes have had code of questionable quality, the "engineering" of Windows to maintain backwards compatibility or focus on single-user operation has long been its Achilles heel, and security issues often persisted even when alternatives to the problem components were readily available.
Things like ActiveX, or the SMBv1/DDE implementations that allowed security issues to quickly traverse from a browser to other applications and other systems on common networks.
In comparison, most of Linux's worst engineering issues over the last 20 years were "optional" rather than core functionality and allowed alternatives (of varying ease of use...).
Backward compatibility is probably the biggest issue for Windows (it being driven by the marketing and finance department the next biggest).
Truth is that software development resources are hard to find (have been for decades) and most managers can't afford to have them working on stuff that has no financial or marketing driven purpose. People continue to use old applications because they are used to them (warts and all) and the sheer number and variety of users it directly supports imposes a massive burden on the Windows development team.
Linux has - possibly - more people overall dependent on it now, but the vast majority are only secondarily dependent (they do things that rely on a Linux server backend). That reduces and simplifies the interface between old and new code.
I still maintain some code from 1977 written in Fortran on a long-dead mainframe architecture. The user manuals are fantastic - they even estimate how much runtime and power consumption calling on the model would entail.
A tiny amount of equipment designed using this code remains in circulation, and will do so for approximately the next ten years; therefore keeping the tools around is relatively important.
But not important enough to do a proper job of refactoring it onto a current platform; and proving that the refactoring performs equivalently to the originals.
I would open source it for the tiny community of people that would be interested; however, its legal status is somewhat in limbo. The rights to use it were divested amongst a dozen companies when they were privatised, and nobody owns the source. The legal entity that owned the copyright was dissolved by an act of parliament quite some time ago.
It's not important enough for anyone to really care of course; but abandonware is hardly the image certain corporate types want out there.
Somewhat amusingly, in an unrelated sector to my own, an amusement park has terrible reliability problems with one of its rides due to the very problem these tools were designed to solve. Perhaps I should offer my services.
You're not entirely wrong, in that it's true that lots of people used to assume (many still do, I guess) that Linux was flat-out immune to malware, and that's not true. It's also true that a sizeable part of why most malware targets Windows is that Windows is by far the biggest target, especially in the sector of non-techy users who are unlikely to follow good security practices. A Linux user who did everything as su would be as easy to infect as a contemporary Windows user - possibly more so, if they're arrogant enough not to use any anti-malware.
However - there are a few other things that are definitely true. The first is that Windows engineering, on the security side at least, used to be really, really bad on consumer versions of the OS. Fans can and will debate the NT branch, but nobody sane could deny that the other branch (95, 98, ME) was a dumpster fire.
The second is that Microsoft is really serious about backwards compatibility, but that comes with a price. Namely, the need to support at least part of that pre-Win2k crap. They've done a lot to secure it, but in the end it's really like polishing a turd. They know it, they really do have good engineers and they know it's an unsolvable problem, but they have no alternative. Microsoft dropping BC would most likely kill the company. The worst part is that, since it all still works, tons of developers are still using it even in new software. So it's not even a problem that will eventually obsolete itself away.
Cough louder cough, extremely loud cough, use a garbage collected language for everything outside of systems programming?
Golang exists for a reason, and if we can build massively distributed server systems with it that have no performance issues just because the language is garbage collected (and even if the GC trips one up, there are ways to mitigate that), then there is no reason why we cannot build RDBMSes, web servers, CTI gateways, etc. with it.
"rewrite it in XYZ" is massively easier to sell to people, when the language proposed is simple and familiar. Oh, not having organizational dramas also helps a lot.
Michael Stonebraker's memoirs are available online, and one day I spent a bit of time browsing in them. Somewhere along the line, he said, he and colleagues had the notion of writing Postgres (initially, or a newer version, I forget) in LISP. I don't doubt that would have been safer, but then they looked at the performance, gave up, and went back to C.
Most JavaScript/TypeScript implementations (e.g. Node.js, and the Chromium browser and therefore most browsers) use V8 as the back end.
Of course interpreted languages have their own problems - especially related to unsanitized inputs.
Go probably has the best security record, but it's also the least used, and is mostly used at Google, which tends to take security seriously.
Well, use a memory-safe language. Rust is, Golang is. They just have different mechanisms for achieving it. Golang is GC'd; Rust mostly works out memory reclamation at compile time and falls back to reference counting at runtime when it can't.
So, I think golang and rust exist for pretty much the same reason.
There exist quite a few languages without mark-and-sweep GC which are memory-safe.
Rust, Swift, Sappeur (mine) to name a few.
It must be said that in non-GC languages the software engineer must break circular references, or you have a memory leak (which is usually NOT exploitable for subversion, but can be used for a DoS attack).
1. Separate I and D spaces
2. Use the MPU to enforce per-process limits
3. Hardware assist on privileges (not just user/super but > 5 levels)
Then the O/S has a fighting chance to secure itself.
I remember studying this in computer science in 1982. But, we still don't implement it.
Yep.
But that's because x86 / x64 software and Windows OS are written under assumptions that mean they would have to be completely rewritten and all backward compatibility would stop.
The problem was in carrying forward the initial design of the 8086 - and later chips actually marketed the fact that they treated things with a flat memory model, etc. etc.
Since then, we had DEP and ASLR and all kinds of "bodge jobs" to try to implement what you're talking about, but fatally, they all kill backward compatibility and require OS rewrites.
DEP, literally, was one that was supposed to "solve" this for Windows. You have DEP enabled on your machine now. It doesn't solve the problem.
Fixing it in hardware won't do anything - there are almost certainly hardware architectures designed like that out there. The problem is that none of the popular OS or software will work on them because they rely on code/data tricks to operate.
Instead we have a 30-year-plus slow revisionism and culling of parts of backward compatibility and trying to fudge it into the OS piecemeal.
We're still bound by decisions made in the 70's, to a large extent. When everyone was trying to squeeze every cycle they could out of their chips and something like separate instruction and data spaces would have been deemed entirely unnecessary and affecting performance.
And, unfortunately, machines have pretty much stagnated in terms of sheer processing performance (more cores, yes, but faster cores? No). So even things like "emulation" of such for backwards compatibility isn't viable because it would just make computers still appear slower than they are without such technology (when running legacy software, at least).
It needs a redesign from the ground up, or a designed-for-it-from-day-one architecture to take hold of the market (I'm not sure which ones would be "closer", but things like ARM, RISC-V, etc.). That's not going to happen any time soon.
And though we have fixed some problems and deprecated some functions of those early chips in more modern designs (64-bit was a transition not just in word-size, but in deprecating a lot of legacy nonsense), nothing will actually fix the problem after the event.
We'll just keep plugging holes with our fingers until the dam is nothing but fingers, I imagine.
@Lee D: "But that's because x86 / x64 software and Windows OS are written under assumptions that mean they would have to be completely rewritten and all backward compatibility would stop"
Completely redesign the hardware and software and run the legacy apps in an emulator. They did manage to design a reliable platform with the Xbox.
And now Intel are proposing removing rings 1 and 2, as Windows never bothered to use them in about 38 years; MS thought it was better to pile crud on top of crud instead. Linux is no better in this regard, apart from having only spent 32 years not using rings 1 and 2.
That's nice, but I think we need a bit more:
* Separate return-address, parameter, and data stacks
* Hardware-enforced prohibition against code stuffing arbitrary values onto the return stack and onto the parameter stack - must use JSR, RTS / PUSHA, POPA, etc. instructions, respectively
* Hardware-enforced prohibition against arbitrary manipulation of the return and parameter stack pointers.
These items, plus the first two you suggested, should prevent code being overwritten and ROPing, but won't prevent other sorts of memory overwrites.
A bunch of tag bits for each memory location would prevent array bounds overruns, and prevent storing, say, a double "into" a char and thus overwriting adjacent memory locations, but would do nothing to prevent the error of storing the wrong value/variable/register into a particular location.
Question is, "Will people pay for the 'extra' memory required by the tag bits, and for any other hardware costs associated with all this?"
1. Separate I and D spaces
Harvard architecture. Has advantages AND disadvantages. But Intel memory management already has read/write and read-only page flags, so it is pretty easy to make code memory "not writeable" today (see the sketch after this comment).
2. Use the MPU to enforce per-process limits
easier to do using task switching parameters and a reasonably-written pre-emptive scheduler. YMMV on RTOS
3. Hardware assist on privileges (not just user/super but > 5 levels)
Why does hardware need to 'assist' with that many? Typically hardware has separate system and user stacks and/or page tables, even entire register banks - but normally just 2. And each application's address/page table is different (i.e. separate address spaces) already. Just why do you need > 5 when you already have things LIKE separate address spaces for applications?
Again, scheduler params and the existing password/group security (Linux/BSD/UNIX/etc.) can manage this kind of multi-level security with a reasonably-written pre-emptive scheduler, though I understand there are some ACL mods for Linux available to extend this.
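On the "not writeable" point, a minimal POSIX sketch of flipping a page's permissions from user space (Linux/BSD; MAP_ANONYMOUS is a common extension rather than strict POSIX):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);

    /* grab one page of ordinary read/write memory */
    unsigned char *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;
    memcpy(buf, "hello", 6);

    /* make the page read-only: a later stray write through this
       region now faults immediately instead of silently corrupting */
    if (mprotect(buf, (size_t)page, PROT_READ) != 0)
        return 1;

    printf("%s\n", (char *)buf);   /* reading is still fine */
    /* buf[0] = 'H'; */            /* this would now raise SIGSEGV */

    munmap(buf, (size_t)page);
    return 0;
}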
Most commenters seem to be talking about segregation of data and instruction but I read the problem as being separation of different pieces of data. A much finer grained problem.
If I malloc a memory segment and get given, say, 16 bytes at address X, I need to ensure I don't scribble past the end of that allocation (or before the start) because I might overwrite (or read) something else. The something else could be security related or whatever. It's probably not going to be executable instructions.
Instead of my program talking to the hardware in terms of memory addresses there needs to be a level of indirection. Malloc gives me an identifier instead of an address. All my memory accesses are in terms of identifier + offset. The hardware translates the identifier into a real address and validates the bounds. If that fails I get a signal. When I'm done, free tells the runtime to forget about the identifier so read-after-free errors can't happen.
C-language semantics can already express this, the pointer data type doesn't necessarily have to represent a memory address. Downside is the extra translation at access time plus need to track possibly billions of allocations per process. Some sort of LRU cache might make that practical.
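A minimal sketch of that indirection idea, just to make the shape concrete (hypothetical names, a fixed-size table, no caching; hardware schemes such as CHERI-style capabilities aim to do the translation and bounds check in silicon rather than in software like this):

#include <stdlib.h>
#include <string.h>

/* hypothetical handle-based allocator: callers get an opaque id,
   never a raw address, and every access is bounds-checked */
typedef struct { void *base; size_t size; int live; } slot_t;

#define MAX_SLOTS 1024
static slot_t table[MAX_SLOTS];

int h_alloc(size_t size)                  /* returns a handle, or -1 */
{
    for (int id = 0; id < MAX_SLOTS; id++) {
        if (!table[id].live) {
            table[id].base = malloc(size);
            if (table[id].base == NULL)
                return -1;
            table[id].size = size;
            table[id].live = 1;
            return id;
        }
    }
    return -1;
}

int h_write(int id, size_t off, const void *src, size_t len)
{
    if (id < 0 || id >= MAX_SLOTS || !table[id].live)
        return -1;                        /* bad or freed handle */
    if (off > table[id].size || len > table[id].size - off)
        return -1;                        /* would run past the end */
    memcpy((char *)table[id].base + off, src, len);
    return 0;
}

void h_free(int id)
{
    if (id >= 0 && id < MAX_SLOTS && table[id].live) {
        free(table[id].base);
        table[id].live = 0;               /* later accesses now fail */
    }
}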
Maybe this is what Rust does under the covers, I don't know. My entire knowledge of Rust can be summed up as "it's magic".
All of that said, I've programmed C and C++ for 35+ years and *very* rarely run into these classes of errors. I don't even use smart pointers particularly often. Whenever I see them used in code I review, it's generally as a crutch by those who don't want to think things through. So long as you have a very clearly defined view of who owns what, a.k.a. software architecture, then things usually work out.
It is very hard/impossible to reproduce the safety assurances of a proper memory safe language in hardware mechanisms. Also, it is NOT just about separating code and data, as function pointers and vtables are effectively a mix of both.
ARM tries to do some of this with their fat pointers, but it looks very expensive to me.
For example, in Sappeur you must declare multithreaded objects as such, in order to ensure automatic locking. How would you do this in hardware ?
Anything that involves drivers, kernel-level code, bus-interfacing etc. requires you to be able to manipulate things provided to you as "unstructured" RAM freely.
To do that in Rust, you need to surround the code with "unsafe" modifiers, which instantly destroy the guarantees of Rust of all that code AND any code that might be near it.
So although we can fix bugs in, e.g. document handling, and web-page processing, buffer overflows etc. are ALWAYS going to be inherent in lower-level code which is where they are also most dangerous.
Anything that's given to you as a memory-mapped set of data, which you then have to interpret and write to, is a serious risk... and Rust doesn't help one bit in dealing with that.
That's basically everything to do with PCIe, USB, TCP/IP acceleration, device drivers, filesystems, DMA, just about everything at the kernel level.
And although you can in theory convert everything to Rust, that code will end up with unsafe keywords EVERYWHERE in order to do so, and doing that destroys guarantees of pretty much all the Rust code. So you're back to square one, after spending an age converting decades of legacy and tested code to new fancy Rust and probably introducing myriad other subtle bugs along the way.
Even in the Linux kernel, which has started accepting some Rust, Rust usage is limited to certain particular areas. Because you can't write a "safe" Rust driver for almost all the hardware that exists in a machine.
Even in low-level code, the number of places where you need to do unsafe operations is limited. Rust still helps by specifically marking where those regions are with unsafe blocks. And outside of those, there are lots of operations in Rust that are not unsafe, where the C equivalent effectively is.
Of course, Rust is not a panacea. What it might be is a little bit better than C which will help somewhat with avoiding problems. That's not a bad thing.
The problem is that these days pretty much everything you're doing may be asynchronous and likely involves some sort of data-sharing with code that at the very least is separately compiled, but in general is controlled by a mechanism your own code and its compiler and runtime are unaware of.
Memory safety within individual modules can only get you so far. You also need to consider, for example, using message-passing rather than shared memory and there's a lot that hardware could potentially do to assist. Although static analysis by a compiler is essentially free (at runtime), the rest has a cost.
I have some experience creating HTTP servers in both C++ and Sappeur. I can assure you the number of exceptions from memory safety in the Sappeur version is very small (four "inline_cpp" sections).
http://gauss.ddnss.de/
There are about 60 "inline_cpp" sections in TCP.ai, System.ai and Math.ai, but those are "system libraries" which finally call the POSIX functions. These system libraries are supposed to become eventually "perfect" from a lot of re-use (read: debugging) in different projects.
So probably 90% of the gauss web server code is memory-safe. In my experience, this web server runs extremely stably (as compared to an equivalent C++ version). Of course it had deterministic crashes during development, but that is exactly what we want: immediate, localized crash upon programming error. No long-running, covert corruption of memory.
So, even if your code is "just 90% memory safe", it is a huge progress from "100% memory unsafe". Each and every method implemented in a memory-safe way is shoring up safety and security.
Every time I've said the same thing I get downvoted, hard.
Because there are still programmers that believe in their own invulnerability, that "real programmers know how to code!", and can reliably and eternally write proper memory-protecting code. Because, you know, they *never* have the unknown happen to them...
How many of these projects with high value CVE are new enough to have used Rust?
How long would it take to re-write one of them in Rust (or any other language du jour) along with the testing, etc?
It is all very well saying this new language/style/feature would have stopped XYZ bug; it is another matter entirely to deal with decades of technical debt and hundreds of millions of lines of existing code.
Yes. Also, the memory-overwrite bug being considered the most dangerous type of bug does not mean it is the most common and/or highest-scoring type of bug.
The majority of the high-CVE bugs on these known-bug lists are nowadays not related to low-level memory access but rather to patchy web services that leave networked systems vulnerable to remote attacks. As such services are typically already written in a somewhat memory-safe high-level language such as Java, JavaScript, Python etc., it wouldn't really make a difference even if these patchy web services had been originally written in Rust.
Using Rust in the low-level stack is a good idea though.
I think that tide is finally turning. Not necessarily towards rust: there are arguably simpler languages (the garbage-collected sort) that lots of applications are now being built in.
However, it is well to remember that the CVEs mentioned in the last few paragraphs, authentication errors and the like, can be written in any language. No amount of type-checking or fencing will protect against implementing the wrong algorithm. Things like that are also often difficult to test: tests are good for verifying that the functionality you intended is there and behaves as expected. It is much harder to check the negative space: that there isn't some other, unintended functionality, like the ability to log in with an empty password...
And yet log4j/log4shell was written in a type-safe, memory-safe language, involved no memory errors of the sort described, and is still not "solved" or fully patched: it's everywhere. Because someone implemented an algorithm with more flexibility than it needed. The code was/is performing as designed, it was just unsafe as designed...
As memory-safe languages (and memory-safe subsets of C++) are increasingly used, especially for network-facing code, that 70% number will inevitably go down (IMO). Writing the wrong algorithm, and accidentally violating security, or providing unexpected behaviour is going to be with us forever.
As I wrote before, the other 30% of CVE bugs (as of 2023) must be taken care of: correct scanners, parsers, appropriate protocols, proper testing at all levels from unit to system. Appropriate system design. KISS. Static error checkers. Fuzzing, code reviews, bug bounties.
It's called computer science because there are no simple solutions for all problems. But squash those 70%, so that engineers can focus on the other 30%.
"The most dangerous type of software bug is the out-of-bounds write, according to MITRE this week...” “...and using memory-safe languages like Rust can help here..."
Wouldn't the use of Ada also be of some help in mitigating this problem? Isn't healthy, rigorous protection against out-of-bounds writes one of Ada's strong points?
-------------------------------------------------------------------------------------------
"Simplicity is prerequisite for reliability."-- Edsger Dijkstra
Sure: Ada, Pascal, Modula-[23], Go, Swift, Java, C#, JavaScript, Python, all of the Lisps, all of the MLs: the list is long. Really, memory unsafety is a pretty unusual property for a 3rd-gen programming language. It's just that a couple of examples have historically been really, really popular (C, C++).
If you have your program divided into P and D memory spaces you can avoid all the issues with buffer overruns changing the execution code. P-space is the only memory space that can be addressed by the program counter and thus execute instructions, and it's marked read-only as soon as it is loaded into a process; D-space is data space: it's read/write (sometimes read-only) but can't be used to execute instructions.
DEC had this right in the 1980s with the VAX/VMS systems. It's all these crappy x286 derivatives that we have to use these days that let us down.
See this simple example:
struct customer
{
    char firstname[20];
    char lastname[20];
    char street[20];
    char postcode[6];
    char city[20];
    unsigned long long creditlimit;
};
Now let's assume there is an automatic "account creation" web service. It will set the name and address as specified by the customer and automatically set the credit limit to 1000 Euros. Now what happens if an attacker enters a 40-character city? It will overwrite the credit limit to something like 10^18 Euros.
Of course this is a contrived example, as banks will (hopefully) be diligent in input checking, but you can see that a lack of memory safety definitely is risky. It could also be the bank's own programming error mistakenly overwriting the credit limit.
Separating code and data is a much weaker assurance than Memory Safety from the compiler+runtime.
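For what it's worth, a rough sketch of the unchecked copy that produces exactly that overflow, next to a bounded version (it uses the struct customer definition above; the function names are just illustration):

#include <stdio.h>
#include <string.h>

/* assumes the struct customer definition shown above */

/* unsafe: strcpy has no idea how big c->city is, so 40 bytes of
   attacker input run straight over creditlimit */
void set_city_unsafe(struct customer *c, const char *input)
{
    strcpy(c->city, input);
}

/* bounded: snprintf writes at most sizeof c->city bytes (including
   the NUL); the return value tells the caller whether the input was
   too long, so it can be rejected instead of silently truncated */
int set_city_checked(struct customer *c, const char *input)
{
    int n = snprintf(c->city, sizeof c->city, "%s", input);
    return (n >= 0 && (size_t)n < sizeof c->city) ? 0 : -1;
}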
If you use decent engineering practices (code reviews, good tests, static analyzers, sanitizers and/or Valgrind) then not a lot will get through.
I guess the exception to that is constrained embedded devices where you don't have the luxury of having all that tooling.
No: all user-supplied data can be handled safely. The real answer is, don't blindly interpolate user-supplied text inside some other specially structured text (e.g. SQL or HTML) without the necessary escaping.
Wrong:
sql.execute("select * from users where name='" + name +"';")
Right:
sql.execute("select * from users where name=?;", name)
Sanitising inputs, e.g. to reject single quotes inside names, is not the solution, and rejects valid values ("O'Hare" for example). You just have to handle your user data properly.
There are some interesting language developments to help enforce this, distinguishing programmer-supplied literal strings from user-supplied data. e.g.
https://docs.python.org/3/whatsnew/3.11.html#whatsnew311-pep675
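The same "right" pattern exists in C too; a rough sketch with SQLite's prepared-statement API, mirroring the wrong/right example above (table and column names as in that example):

#include <stdio.h>
#include <sqlite3.h>

/* bind the user-supplied name as a parameter instead of splicing it
   into the SQL text: quotes, semicolons, etc. in the value can no
   longer change the structure of the statement */
int find_user(sqlite3 *db, const char *name)
{
    sqlite3_stmt *stmt = NULL;
    int rc = sqlite3_prepare_v2(db,
        "SELECT * FROM users WHERE name = ?;", -1, &stmt, NULL);
    if (rc != SQLITE_OK)
        return rc;

    sqlite3_bind_text(stmt, 1, name, -1, SQLITE_TRANSIENT);

    while ((rc = sqlite3_step(stmt)) == SQLITE_ROW)
        printf("%s\n", (const char *)sqlite3_column_text(stmt, 0));

    sqlite3_finalize(stmt);
    return rc == SQLITE_DONE ? SQLITE_OK : rc;
}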