Sorry Liam, in the future you'll have to let me proofread your articles!
Liam, you've written many thoughtful and entertaining articles that I have loved reading. But given the giant blunder you've committed in your latest, I have to recommend that you send your drafts to me first, to avoid a catastrophic repeat.
The problem is this phrase: "Intel delivered that with the 8086, which, rather than being limited to 64 KB, could handle 16 separate segments of 64 KB – for a total of one whole megabyte of memory space."
What's even more shocking is that none of the commenters seem to have caught the blunder yet, so it risks sinking in, sedimenting into accepted general lore, and one day being regurgitated by AIs as true gospel!
Having been along for the ride since those days of CP/M, I’m obviously facing a diminishing life expectancy myself, and the original technical geniuses at Intel, who designed what you fail to understand, can no longer fight for themselves.
So I’m taking up arms against this fallacy and it won’t be pretty, sorry mate!
The 8086/88 could in fact handle 65,536, or 2^16, different segments, because the segment registers were themselves 16 bits wide. Segment and offset are used together to form the effective (or physical) address, and laid side by side they would have formed a 32-bit value: enough for 4 gigabytes of memory space, and without very much change to the initial generation of application software!
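To make that arithmetic concrete, here is a minimal C sketch of that hypothetical "laid side by side" scheme. This is my own illustration of what a full 16-bit shift would have given, not what Intel actually shipped; the real mapping comes further down.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical scheme: 16-bit segment simply concatenated with the
     * 16-bit offset. NOT what the 8086 does; it shows the 32-bit, 4 GB
     * space a full 16-bit shift would have yielded. */
    static uint32_t concat_address(uint16_t segment, uint16_t offset)
    {
        return ((uint32_t)segment << 16) | offset;
    }

    int main(void)
    {
        /* 65,536 segments of 65,536 bytes each = 4 GB, but every segment
         * is a sealed 64 KB room: no two segments ever overlap. */
        printf("seg 0x0001, off 0x0000 -> 0x%08X\n", (unsigned)concat_address(0x0001, 0x0000));
        printf("seg 0xFFFF, off 0xFFFF -> 0x%08X\n", (unsigned)concat_address(0xFFFF, 0xFFFF));
        return 0;
    }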
So why didn’t they, or why did they do what they did?
As you allude to, memory banking to extend the 64 KB address space offered by a lot of early computer designs had been around for a while. Let's remember that very few machines using 16-bit addressing actually had a full complement of 64 KB of RAM! I don't think CP/M was run on ferrite core memory, so the starting point would have been the Intel 1103 DRAM at 1 Kbit (1,024 bits) per chip.
But with the success of those 8/16-bit (ALU/address-bus width) microcomputers running CP/M, that 64 KB limit became an issue, just as it had on many other architectures, which, like the larger PDP-11 models, resorted to MMUs supporting segmentation and bank switching to extend the physical address space beyond what applications saw logically.
Various S-100 vendors, the type of machine where CP/M dominated, added RAM cards with usually proprietary bank-switching support when 64 KB of RAM got too tight, and Digital Research developed a multi-user companion, MP/M, to abstract the bank switching and support the larger amounts of RAM that required.
Note that this mostly meant that you could run several applications with a maximum of 64k each, not that an application could use more than 64KB of RAM... easily.
In fact, the first Unix system I installed myself was an SCO Xenix (sorry!) that I put on an 8086 box from Siemens with a discrete MMU: it ran Multiplan for Xenix just fine!
Now switching RAM banks in 64 KB chunks, the granularity you imply, creates a huge OS problem from the very start: you need to load those apps, and you need to transfer data between applications and the OS. If you switch whole banks under your own feet, that's like trying to rearrange stuff between rooms without a connecting door. And those early applications didn't always use all available RAM: they came from modest roots and made do with far less. Your initial MP/M system might only have had 128 KB of RAM instead of 64, but you'd still want to pack in everything that would fit, not just one application per 64 KB partition, so you'd want to allow physical overlap!
See where your notion of sixteen 64 KB partitions goes terribly off the rails?
Now the early bank-switching logic had to be implemented in the then-standard TTL logic chips, which were only relatively cheap: using smaller banks meant both more chips and bigger tables to manage them. 16 KB chunks were a common option; some designs supported other sizes.
And if this reminds you of EMM and Intel Above Boards, then you’re right on track, because that was a bit of a repeat kludge.
The 8086 designers aimed to bring on chip the best of the discrete MMU design ideas around at the time, with the minimum number of gates and the potential for both immediate benefits and future expansion.
The result is obviously a compromise, but I challenge anyone who wants to criticise the result to find a better one given the constraints!
Using a full 16 bits for the segment address made things, dare I say, "orthogonal", a term not often associated with x86, but apt here, since the 8086 is otherwise 16-bit all around, ALU included. It's also cheap in terms of transistors within the CPU.
Managing the segments via a fully featured MMU table, which allows for all sorts of mapping, would have been quite another level, both in terms of logic and because the table itself would take up memory and either extra memory cycles or caches, much like a small page table in VAX-class (or 80386) designs. Such an external MMU would also have allowed overcommitting physical RAM via disk, if bus-fault logic and proper exception handling were included: that's what that 8086 Xenix system did, or what a big PDP-11 would do. But the 8086 itself wasn't designed to compete with those.
So instead the 8086 designers went with a hardwired, computed mapping: the segment value, shifted left by a fixed number of bits, is simply added to the 64 KB offset, not mapped via an external table, to get the effective or physical address. That got you quite a few benefits, as I'll explain below.
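For the shift they actually picked, 4 bits, the mapping works out like this. A minimal C sketch of the well-known formula, including the 20-bit wraparound of the original 8086; the function and variable names are mine:

    #include <stdint.h>
    #include <stdio.h>

    /* The 8086's mapping: segment shifted left by 4 bits, added to the
     * 16-bit offset, giving a 20-bit (1 MB) physical address. */
    static uint32_t phys_8086(uint16_t segment, uint16_t offset)
    {
        return (((uint32_t)segment << 4) + offset) & 0xFFFFF; /* 20-bit wrap */
    }

    int main(void)
    {
        /* Segments can start every 16 bytes, so many segment:offset pairs
         * name the same physical byte. */
        printf("1000:0000 -> 0x%05X\n", (unsigned)phys_8086(0x1000, 0x0000)); /* 0x10000 */
        printf("0FFF:0010 -> 0x%05X\n", (unsigned)phys_8086(0x0FFF, 0x0010)); /* 0x10000 as well */
        return 0;
    }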
They could have chosen a shift of 1 bit to get 128 KB instead of 64, via effectively 17 address bits. They could have chosen a shift of 2 for 256 KB, ... up to perhaps a shift of 8 bits, to get to a whopping 16 MB via an effective 24-bit address. Please note that we've already excluded the full 16-bit shift, with its 32-bit effective address, because it closes the doors between segments.
So they looked at each shift value between 1 and 16 and gauged what it delivered in value vs the alternatives.
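Just to lay that trade-off out, here's a small C sketch of my own (not anything from Intel's notes) enumerating the candidates: each extra bit of shift doubles the addressable space, but also doubles the spacing between possible segment start addresses, i.e. coarsens how segments may overlap.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* For a shift of k bits: 2^(16+k) bytes of physical space, and
         * segments may start every 2^k bytes. */
        for (int shift = 1; shift <= 16; shift++) {
            uint64_t space = 1ULL << (16 + shift);
            uint32_t step  = 1u << shift;
            printf("shift %2d: %8llu KB of space, segment starts every %5u bytes\n",
                   shift, (unsigned long long)(space / 1024), (unsigned)step);
        }
        return 0;
    }

The 8086's 4-bit shift lands at 1 MB with segments starting on any 16-byte paragraph; the full 16-bit shift lands at 4 GB, but with every segment a sealed, non-overlapping box.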
And here, in hindsight I'd say, and only here, they could have picked a better number than the 4-bit shift, with 20 bits of effective address, that they settled on. I'd love to read the notes on why they didn't go with 8, but perhaps somebody else would like to hazard a guess?
But how could they know that this architecture would run the world of PCs for at least a decade, when they had far more reasonable designs like the 80286 and the 80386 already on the drawing board, which would mimic a big PDP-11/70 or a VAX: the former with full mapping-table support and segment-fault handling, the latter with both a full 32-bit address space and page-based virtual memory?
Where Intel didn't skimp was in supporting distinct segment registers for all the parts of an application that might naturally occupy disjoint RAM spaces: code and data, but also the stack and something funny and new, a heap. The big advantage was that those segments were implicit and didn't need to be explicitly "mentioned", included or quoted within the code: code fetches would use the code segment base, data accesses the data segment, the stack the stack segment... you get the picture. Then there was even an extra segment, just in case you'd need one, accessible via an override at the cost of an extra code byte, not a full segment address on top of the offset.
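A toy C model of that implicit selection (my own illustration, not Intel documentation): four 16-bit segment registers, a default one per kind of access, and an override that merely swaps which register is used, at the cost of one prefix byte in the real encoding.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy model of the 8086's four segment registers and their defaults. */
    enum seg { CS, DS, SS, ES };

    struct cpu { uint16_t segreg[4]; };

    static uint32_t ea(const struct cpu *c, enum seg s, uint16_t offset)
    {
        return (((uint32_t)c->segreg[s] << 4) + offset) & 0xFFFFF;
    }

    int main(void)
    {
        struct cpu c = { .segreg = { 0x1000, 0x2000, 0x3000, 0x4000 } };

        printf("code fetch  (CS implicit): 0x%05X\n", (unsigned)ea(&c, CS, 0x0100));
        printf("data access (DS implicit): 0x%05X\n", (unsigned)ea(&c, DS, 0x0100));
        printf("stack push  (SS implicit): 0x%05X\n", (unsigned)ea(&c, SS, 0x0100));
        printf("ES override (one extra byte): 0x%05X\n", (unsigned)ea(&c, ES, 0x0100));
        return 0;
    }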
Now you need to remember that the first part of the mission was to run 8-bit 8080 code with a minimum of change. Code at the time typically wasn't written in some high-level language, but was machine code written with assembly mnemonics, and you couldn't just compile that for another architecture... though in fact they did support translating it as far as was possible at the time.
But mixing code and data used to be considered proper, even genius, or von Neumann; it was also the only original option. And things like recursion, variable-length strings, complex linked data structures, or object-oriented programming weren't done on small computers ... until much later.
Then of course stacks and heaps might crash and burn, while turning data into code caused all kinds of other problems. But your old 64 KB assembly code, with hard-wired data offsets strewn all across it, would simply be loaded with identical code and data segment addresses, that is, full overlap, while, say, a Cobol program would have run just fine with code and data "tight but separate", just as it originally ran on machines of about the same size a decade earlier.
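A quick numeric illustration, with hypothetical segment values of my own choosing: equal segment registers give the ported 8080 program exactly its old single 64 KB room, while distinct values give a disciplined program separate 64 KB windows for code and data.

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t phys(uint16_t seg, uint16_t off)
    {
        return (((uint32_t)seg << 4) + off) & 0xFFFFF;
    }

    int main(void)
    {
        /* Ported 8080-style program: CS == DS == SS, full overlap; every
         * hard-wired offset still lands where it always did. */
        uint16_t flat = 0x1000;
        printf("flat:  code/data 0x%05X..0x%05X (one shared window)\n",
               (unsigned)phys(flat, 0x0000), (unsigned)phys(flat, 0xFFFF));

        /* Disciplined program: code and data "tight but separate". */
        uint16_t cs = 0x1000, ds = 0x2000;
        printf("split: code 0x%05X..0x%05X, data 0x%05X..0x%05X\n",
               (unsigned)phys(cs, 0x0000), (unsigned)phys(cs, 0xFFFF),
               (unsigned)phys(ds, 0x0000), (unsigned)phys(ds, 0xFFFF));
        return 0;
    }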
Of course, people did funny things with equivalence statements in Fortran (variant records in Pascal), or even tried recursion via a proxy function, because only direct recursion was detected by the compiler, which would then abort code generation... Yup, that was me, on a PDP-11; it's a quick way to cause a segment fault, because the PDP-11 could detect segment overruns.
For those with discipline, or using a safe language like Cobol, separating the code and data segments immediately provided extra room, while very few stacks would ever grow large without recursion.
And then, if that wasn't enough, there was always the possibility of using long (segment plus offset) addresses for code, or data, or both, to take advantage of every last bit of RAM your PC might have: still better than being tied to 64 KB on a machine that increasingly had more, and which, unlike those MP/M machines, you didn't have to share.
Getting the original 8086 segmented architecture wrong, an architecture that ruled PCs long past the day the 80386 launched, is hard to forgive.
Sorry, Liam, your site admin can provide you with my e-mail address!