DOS "small" and "large" memory models
> I remember competing in the BCS's annual programming competition back then, too: each team was given a PC with a copy of Quick-C and you had to keep it small and not bust the "small" memory model, which if memory serves was something like 640KB. Taught you to think about the algorithm and not just throw a highly recursive, clunky monster at it and hope, because the judges (of which I later became one) would see that coming and would have test cases that would make the code bust the RAM limit. I once set a question (Sudoku solver) which did that, for that precise reason - if you brute-forced it, you'd blow up, so you had to write a vaguely clever algorithm.
Your bio memory is unfortunately befuddled by early PC memory abstractions...
The 8086 or "DOS" memory model took the 8008/8080 or "8-bit" memory model, which generally consisted of 8-bit registers and ALUs with a 16-bit effective memory address space, reached either by combining two 8-bit registers as base and offset (e.g. the 6502) or by including a few 16-bit registers in a generally 8-bit architecture (8008/8080/Z-80 and lots of others), and extended it via 16-bit registers (which could also be used in an 8-bit manner, e.g. "AX" (16-bit) also being usable as "AL" (lower 8 bits) and "AH" (upper 8 bits)) and via a "segmentation" approach.
A segment was essentially the 64KB area that a 16-bit offset could address natively, typically implied or translated behind the scenes by an MMU (memory management unit). E.g. PDP-11 machines would have code, data, stack and heap segments that could be mapped to distinct physical memory spaces, e.g. per process, allowing different processes to run with physical memory isolation while the machine as a whole used far more than a single 16-bit (64KB) physical address space.
The 8086/8088 only went half-way: instead of a full-function MMU with flexible mapping and segment faults for transparent on-the-fly translation and virtualization, it simply shifted the 16-bit segment register four bits to the left and added the 16-bit offset on top. That gave it an effective 20-bit (1 MByte) address space with a fixed physical mapping, where different segments could overlap to a large degree in the same physical address space: the idea was that lots of programs wouldn't actually need a full 64KB code, data, stack or heap segment, so letting segments start at 16-byte granularity (rather than making them non-contiguous via a full 16-bit shift) avoided wasting RAM when typical segments were smaller.
The only reason that 1024KB address space became an effective 640KB on PCs was that IBM's PC designers mapped the upper 384KB to I/O and ROM: they just couldn't imagine that the Apple ][ replacement they were designing might ever actually use the full 20-bit address range, which today has reached 64 bits (while IBM's "proper" single address space architecture, the AS/400 or i Series, went from 48 to 128 bits during that time...).
The overhead of using a real MMU, including exception handling, was pretty minimal even in those early days, comparable to what the IBM PC-AT then used to implement 24-bit DMA for floppy operations. But that's just one of those many personal computing "what-ifs" that are so interesting to lose yourself in, ex post.
A "small memory model" program was then basically an "8-bit" application, perhaps using 16-bit registers and arithmetic, but only 16-bit addresses/offsets for everything: code, data, stack and heap.
The benefit was tight/native "single action" 16-bit addresses being used throughout, even if very few instructions actually completed in a single clock cycle in those early and pre-RISC days.
If 64KB wasn't enough, programmers would have to use a "large memory" model, which implied using "DWORD" addresses: a full 16-bit segment plus a 16-bit offset, 32 bits in total, even though on an 8086 those 32 bits of address only yielded 20 bits of physical address space.
The overhead was significant, but if your code or your data no longer fit into a 16-bit address space, you could at least make do. Compilers of the day actually supported choosing between "small" and "large" for each domain: you could combine a "small code" application with a "large data" model (the "compact" model), the other way around ("medium"), or make both large ("large").
I don't think that "large stack" applications were supported, and I'm not sure about segmented heaps either.
Needless to say, it was a mess, especially once applications and operating systems needed to support both 16-bit relative addresses and 32-bit DWORD parameters in calling conventions, and especially with so few registers to use in the case of x86. But in those days it was considered a privilege to be able to compute at all: anything was better than a human computer, having to resort to pencil and paper, or having to wait for a time-sharing slot.
Recursion was great for transitioning from extremely hardware-oriented early code to mathematical abstractions, but it meant that a lot of critical data structures wound up on a stack that could take at most 64KB of RAM. Actually, heap and stack were typically forced into a single segment, growing from the bottom and the top towards each other, only to crash terribly once they met, if "non-typical" input data led them on such a collision course...
The 80286 protected mode implemented the full "PDP-11"-class memory abstraction: it eliminated the fixed mapping of segment addresses (the 4-bit left shift), replacing it with a full MMU and an exception handling mechanism, enabling physical memory overcommit and on-demand swapping of memory segments. The physical address space was extended to 24 bits, while DWORD pointers still consumed 32 bits and registers mostly remained 16-bit.
Since VAX-like abstractions with 32-bit registers, 32-bit offsets and 4K page granularity followed only a few years later via the 80386, the "PDP-11"-like memory model of the 80286 never really took off, which turned out to be a great thing: virtual 8086 mode and DOS were bad enough already.