A path out of bloat: A Linux built for VMs

How hard can you cut down Linux if you know it will never run on bare metal? Further than any distro vendor we know of has tried to go. This article is the fourth based on the Reg FOSS desk's talk at FOSDEM 2024. The first part talked about the problem of software bloat, the second about the history of UNIX, and the third …

  1. abufrejoval

    Windows Subsystem for Linux uses 9P and why both IBM and Intel hated VMs

    In my view WSL mostly exists so that even Linux users have to pay an M$ software tax, which is why I abhor it generally (and continue with Cygwin out of spite).

    But I did notice that they know how to use the good stuff (9P) for their ulterior world-domination goals.

    Did IBM invent the hypervisor?

    I'd say that people at IBM invented the hypervisor, but pretty much against IBM's will.

    IBM was all bent on making TSS (time sharing system) or Big Blue's take on a Multics like OS a success instead, and it's failure was recorded in a famous study and book called "The Mythical Man Month", AFAIK.

    But there were far too many people with 360 workloads out there who needed to make them all work at once on their newest 360 machines, some of which came with the extra hardware bits that made VMs possible. So some people at IBM started this skunkworks project, that then later some IBM execs noticed and turned into a product, the VM370, pretty much out of necessity because TSS had utterly failed.

    So I really don't want to give IBM the hypervisor credit, but to the people who made it happen there anyway.

    And to Intel's everlasting shame they made sure their 80386 didn't have that same full set of extra hardware bits, so VMs couldn't be done on their 32-bit CPU, only 16-bit VMs were supported.

    It would have been an obvious and easy thing to do, but Intel was evidently afraid they'd sell fewer CPUs if people started consolidating their workloads.

    And that's why VMs on x86 became such complex beasts: the abstractions were simply never at a similar height as on the 370.

    Again it was a skunkworks project, though not by Intel people: Mendel Rosenblum, Diane Greene and other collaborators (the VMware founders) enabled VM support via bits Intel had added to x86 for the notebook-oriented 80386SL and 80486SL CPUs, which introduced a System Management Mode, or ring -1 layer, into the CPU to allow operating systems like DOS to run on battery-powered hardware. In Intel's typical cut-and-paste manner, that got included even on non-mobile chips, where it had no official purpose.

    VMware employed a few other patented tricks, like binary translation of privileged guest code, to make it perform well enough for real use, and was sailing towards a future of riches, which a very furious Intel then wanted to shoot down rather quickly. Intel had withheld VMs from its 32-bit CPUs because it wanted to sell more of them, not to have this upstart eat the extra value.

    Only then did Intel add the necessary hardware bits and sponsor Xen's transition from a software VM approach to "hardware virtualization", so VMware's patents lost their value and the company eventually became ripe for an internal takeover via one of Intel's creatures: Mr. Gelsinger, who had held the keys to VMs before and probably chose to withhold them.

    1. martinusher Silver badge

      Re: Windows Subsystem for Linux uses 9P and why both IBM and Intel hated VMs

      >Did IBM invent the hypervisor?

      I might be missing something but I've always thought that a hypervisor was just a marketing term for 'operating system'. The concept of running a system on top of a system is possibly alien to people who grew up with the ubiquitous PC, Windows and what-have-you, but it was commonly understood well before that. This might be the reason why Intel was 'furious' -- it's a bit like Ford discovering that someone had patented something called a 'wheel'. (This doesn't mean that a product like VMware isn't useful or valuable; a lot of work is needed to make the theoretical concept a practical reality. But it's not unique -- those of us who've worked with industrial systems are familiar with 'real time' extensions to Windows XP, a set of 'drivers' that converted XP into a hard real-time system.)

      1. Anonymous Coward

        Re: Windows Subsystem for Linux uses 9P and why both IBM and Intel hated VMs

        > I might be missing something but I've always thought that a hypervisor was just a marketing term for 'operating system'.

        Hypervisors (especially native ones) present operating systems with a view of a hardware platform in which they do not have to bother whether that hardware is shared or not with other OS instances.

        Examples:

        Hypervisor: presents a BIOS. Operating system: uses a BIOS.

        Hypervisor: leverages HW support such as VT-x, VT-d and VT-c. Operating system: accesses CPU rings 0-3 and TLBs, accesses devices independently and securely, and accesses network interfaces.

        Even hosted hypervisors (aka type 2) are different from the operating system on top of which they execute.
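To make the VT-x point concrete: on Linux/x86 the CPU features the kernel detects are listed on the `flags` lines of `/proc/cpuinfo`, where `vmx` indicates Intel VT-x and `svm` indicates AMD-V. The sketch below only illustrates where a hypervisor's host would look for that support; the function name `hw_virt_support` is hypothetical, it checks nothing but the CPU flags line (so VT-d and VT-c are not covered), and inside a guest without nested virtualisation the flag is typically hidden.

```python
from pathlib import Path

# CPU flag -> the hardware-virtualisation extension it indicates (x86 only).
VIRT_FLAGS = {"vmx": "Intel VT-x", "svm": "AMD-V"}


def hw_virt_support(cpuinfo_text: str) -> set[str]:
    """Return the virtualisation extensions named on any 'flags' line."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found |= {VIRT_FLAGS[f] for f in value.split() if f in VIRT_FLAGS}
    return found


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(hw_virt_support(cpuinfo.read_text()) or "no VT-x/AMD-V flag visible")
```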

      2. Liam Proven (Written by Reg staff) Silver badge

        Re: Windows Subsystem for Linux uses 9P and why both IBM and Intel hated VMs

        [Author here]

        > a hypervisor was just a marketing term for 'operating system'.

        Not really, no. Tons of OSes have no concept of a VM or of running other OSes under them.

        I've read an argument that Unix is inherently an OS which presents every app as a sort of VM, but IMHO it's stretching a point.

        Some hypervisors aren't part of an OS but part of another system component, such as the system firmware.

        A hypervisor is not an OS function, but it may be. An OS doesn't have to have a hypervisor, but it may have. A hypervisor need not have other functions of an OS, but it may have.

        I think it's a mistake to equate them.

        > it was commonly understood well before that

        It was, for some 25 years or so before it was a thing on the x86 platform. But even so, someone was first, and that first version was from IBM, as I describe in this 2011 article that was linked in the one you're commenting on.

        https://www.theregister.com/2011/07/14/brief_history_of_virtualisation_part_2/

    2. ldo

      Re: Did IBM Invent The Hypervisor?

      If you look up references in the usual places, you will come across terms like “type 1 hypervisor” and “type 2 hypervisor”. I forget which is which, but one is the absolute bare minimum of a hardware-abstraction layer, just sufficient to provide virtualization, while the other is a full-function OS in its own right.

      The reason why I forget which is which is because the distinction just isn’t that important any more. Turns out that managing VMs is just like any other sysadmin task, and so you need the usual range of sysadmin tools—e.g. editors for creating/modifying config files, general file- and filesystem-manipulation tools, network management, process management, memory management etc—as for dealing with a full-function OS. And so it follows that the best kind of hypervisor is a full-function OS.

    3. Liam Proven (Written by Reg staff) Silver badge

      Re: Windows Subsystem for Linux uses 9P and why both IBM and Intel hated VMs

      [Author here]

      > So I really don't want to give IBM the hypervisor credit, but to the people who made it happen there anyway.

      A fair point, but in terms of day jobs, ownership, etc., it's a bit of a hair split, I fear.

      > And to Intel's everlasting shame they made sure their 80386 didn't have that same full set of extra hardware bits, so VMs couldn't be done on their 32-bit CPU, only 16-bit VMs were supported.

      > It would have been an obvious and easy thing to do, but Intel was evidently afraid they'd sell fewer CPUs if people started consolidating their workloads.

      IMHO, no. Not in 1987.

      Virtual 8086 mode "VMs" took modest resources and were doable on an early 32-bit PC in the 1980s with 2-4MB of RAM.

      Full 32-bit VMs were much harder, and were a Big Computers thing at that point. Anyway, there were barely any 386 OSes to run under one another.

      That stuff took years more work and another decade or more of commoditisation and increases in memory sizes and things. You are asking for too much too soon.

      I mean, it's arguable that a better design of the ISA would have prevented it, but overall this is the thesis of the talk and this article series, and my earlier talks:

      My whole point here is that microcomputers threw away a lot of useful stuff from bigger systems, and then reintroduced it in a half-assed, badly implemented way. That is my core argument. What I'm trying to make from it is a case for "what did we throw away that we could reintroduce as a better way to do things than modern half-baked implementations?"

    4. Michael Wojcik Silver badge

      Re: Windows Subsystem for Linux uses 9P and why both IBM and Intel hated VMs

      > IBM was all bent on making TSS (time sharing system) or Big Blue's take on a Multics like OS a success instead, and it's failure was recorded in a famous study and book called "The Mythical Man Month", AFAIK.

      Brooks' The Mythical Man Month is based on the development of OS/360, not TSS; and OS/360 was definitely not a failure. (And you want "its", not "it's".) Brooks only mentions TSS a couple of times in passing.

      > So some people at IBM started this skunkworks project, that then later some IBM execs noticed and turned into a product, the VM370, pretty much out of necessity because TSS had utterly failed.

      Sort of, but your details aren't correct. Melinda Varian's history of IBM VM, "VM and the VM Community: Past, Present, and Future" [Varian 1997], is the authoritative source here; it was frequently cited by People Who Were There, such as Lynn Wheeler, in Usenet discussions and the like. As Varian describes, Norm Rasmussen at IBM's Cambridge Scientific Center[1] was interested in making the S/360 more appealing to academia, and was dubious about the prospects for TSS, which was just starting development at the time (1964) as a potential optional time-sharing OS for the S/360 Model 67, which had a hardware VMM add-on (the "Blaauw Box"). Rasmussen had been in charge of the group that did the proposal for the Model 67 (it was then the Model 66; the number changed for some reason before it became publicly available).

      Rasmussen put Bob Creasy in charge of the project to develop an alternative time-sharing multi-user OS for the '67. This was in 1964, the same year TSS development started. So TSS and CP/CMS were developed essentially in parallel.

      CSC, like the other IBM Scientific Centers[2], was not your typical IBM office. It had a much more relaxed culture and the freedom to do a lot of experimental work. Even when I was at the same site decades later, there were rooms full of experimental hardware (some of which we used in development) and plenty of interesting Internal Use Only software tools. So it certainly had a bit of the skunkworks nature. And as Varian describes, Rasmussen used a variety of tricks to pay for CP/CMS development and keep it low-profile. But IBM President Vincent Learson was inadvertently told about CSC's development of CP/CMS (with a demonstration of CTSS, from which CP/CMS inherited a lot of look-and-feel, by Lyndalee Korn) in 1965. So it's not like it was a complete secret from upper management.

      Now, IBM did try to do precisely what Brooks warns about with TSS: after the initial announcement, when pre-release customers started finding a lot of stability issues with TSS, they threw developers at it (also described by Varian). And TSS was essentially a commercial failure. It was never generally released. But TSS/360 was supported for a few years and some customers continued to run it for years. IBM ported it to the S/370 as TSS/370, which was available as a PRPQ (i.e., you had to ask for it, and then IBM would decide what to charge you).

      IBM's real competitor for what eventually became VM/CMS is TSO. But that takes us out of the world of VMs.

      [1] In Cambridge, Massachusetts. I worked for IBM TCS in the late '80s and early '90s in the same Kendall Square building that housed CSC and knew some of the CSC people, one of whom mentioned that "CMS" originally stood for "Cambridge Monitor System" and was only renamed to "Conversational" when IBM decided to make it a product.

      [2] For example, IBM also had a Palo Alto Scientific Center, similar to, if less famous than, Xerox PARC.

  2. Doctor Syntax Silver badge

    Since as long as I can remember in computing the goal of the multiuser machine has been to present each of the users with the illusion that they were operating their own individual machine by slicing up the resources, especially processor time and memory. My experience of that goes right back to being a user on an experimental system running on a 1904 (the illusion was somewhat shattered by the fact that I managed to crash the entire 1904).

    The core task facing system developers was how to provide that illusion with minimum resources but maximum performance and security.

    Roll forward a good many years from that and I found myself running Unix boxes of various sizes for various businesses which were rather less experimental. Under Unix we would be running an instance or maybe more than one instance of an RDBMS server and each instance might run one or more databases and against a database we would have applications serving individual users. As far as possible the users would indeed see what looked like individual machines - they would certainly get menus tuned to their roles in the business. Not totally isolated, of course, because their activities amended the contents of the database they were sharing: only one user would be able to enter an order for the last item in stock, etc. The overhead for that was one kernel with its data structures in memory, (usually) one RDBMS server with its data structures and the applications with their per-user data structures in memory.

    The modern trend seems to be to have an overall kernel/hypervisor - either as 2 layers or a hypervisor running on bare metal - with its/their data, and then a host of virtual machines, each with its own kernel and its own data, running - what? A database, a web server, whatever application sits on that combination? Whatever it is, all with their own memory allocations. Even if the virtual machines were to share the same code images in memory, they're all going to need their individual memory allocations for kernel data, server data etc. as well as the application memory for the individual tasks. And while the kernel/hypervisor is busy hypervising, doing all the usual kernel tasks of managing time, memory and IO for the virtual machines, each virtual machine kernel is going to be doing the same thing for each process running on it, including the virtualised IO.
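The per-VM overhead argument can be put as rough arithmetic. In this sketch every figure is a made-up placeholder rather than a measurement, the `Footprint` fields are hypothetical, and page sharing between identical guests (e.g. KSM) is ignored: with n services, the single-host model pays for one kernel and one server instance, while one-VM-per-service pays for a hypervisor plus n kernels and n servers.

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    """Hypothetical per-component memory costs, in MiB."""
    kernel: int      # one kernel's code and data
    server: int      # one server instance, e.g. an RDBMS
    app: int         # one service's application working set
    hypervisor: int  # host kernel/hypervisor overhead


def shared_host(f: Footprint, n: int) -> int:
    """One kernel and one server instance serving n applications."""
    return f.kernel + f.server + n * f.app


def vm_per_service(f: Footprint, n: int) -> int:
    """A hypervisor plus n guests, each with its own kernel and server."""
    return f.hypervisor + n * (f.kernel + f.server + f.app)


if __name__ == "__main__":
    f = Footprint(kernel=100, server=500, app=50, hypervisor=200)  # placeholders
    print(shared_host(f, 10), vm_per_service(f, 10))  # 1100 6700
```

The gap works out to hypervisor + (n - 1) × (kernel + server), so under these assumptions it grows linearly with the number of services.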

    As I understand this proposal it's a means of paring back the resources seen by each virtual machine to reduce bloat. If one were to wish to go further the virtual machines could be eliminated and the single layer of kernel could oversee the services directly and maybe one database instance could run all the databases for the services etc.

    OK, I've heard the sneers about cattle vs pets. What we ran weren't pets, they were work horses.

    1. Martin Gregorie

      More comparisons needed...

      It would be interesting if somebody more qualified to comment than I am could compare the Unix/Plan 9 design philosophy with some mainframe architectures that were better designed than the System/360, such as the ICL 2900/3900 series and the later Burroughs systems.

      I only know about how the 1900 was organised, plus a bit about the Ferranti-Packard 6000 (from which the ICL 1900 series was developed), IBM's AS/400 and, of course, the ICL 2900 series.

      The 1900 was interesting because, in the middle-range machines anyway, the accumulators, PC, CCR, etc. were all virtualized and packaged into the first 32 words of the program. The only hardware registers contained the DATUM and LIMIT of the running process, so task switches were fast: just push the current DATUM and LIMIT onto the list and replace them with the new task's DATUM and LIMIT. No task could address words outside its DATUM and LIMIT. Programs were easily moved within memory or to/from disk because they were by definition a single contiguous block of memory. The exact same mechanism was used to execute the GEORGE 3 OS, and as an ex-1903/4 GEORGE 3 admin I can say that it all worked pretty well.
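The DATUM/LIMIT scheme described above is classic base-and-limit relocation. The following is a minimal Python model of it, not ICL code: the `Machine` and `Task` names are hypothetical, core is a list of words, and each program's virtualised accumulators are assumed to sit in its own words 0-31, so a task switch only reloads the two registers.

```python
from dataclasses import dataclass


class AddressError(Exception):
    """A program tried to address a word outside its DATUM..LIMIT window."""


@dataclass
class Task:
    name: str
    datum: int  # absolute address of the program's word 0
    limit: int  # number of words the program may address


class Machine:
    def __init__(self, words: int):
        self.core = [0] * words
        # The only relocation hardware: DATUM and LIMIT of the running task.
        self.datum = 0
        self.limit = 0
        self.current = None

    def switch_to(self, task: Task) -> None:
        # Accumulators, PC, etc. live in words 0-31 of the task's own block,
        # so nothing else needs saving: just load the new DATUM and LIMIT.
        self.current = task
        self.datum, self.limit = task.datum, task.limit

    def _absolute(self, addr: int) -> int:
        if not 0 <= addr < self.limit:
            raise AddressError(f"address {addr} outside 0..{self.limit - 1}")
        return self.datum + addr

    def read(self, addr: int) -> int:
        return self.core[self._absolute(addr)]

    def write(self, addr: int, value: int) -> None:
        self.core[self._absolute(addr)] = value

    def relocate(self, task: Task, new_datum: int) -> None:
        # A program is one contiguous block, so moving it is a block copy
        # plus a new DATUM; the addresses the program itself uses don't change.
        if new_datum < 0 or new_datum + task.limit > len(self.core):
            raise ValueError("destination block does not fit in core")
        block = self.core[task.datum:task.datum + task.limit]
        self.core[new_datum:new_datum + task.limit] = block
        task.datum = new_datum
        if self.current is task:
            self.datum = new_datum
```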

      As a big plus, George 3 was the first OS I saw with a hierarchic filing system - in 1970!

      FWIW I designed and wrote systems and/or was sysadmin on 1900s for 6 years, 3 years on 2900s, and around 5 years on AS/400s.

      I wish I knew more about the 2900/3900 machines (the 2903/4 was a 2900 disk controller in a fancy box with a screen and keyboard, running a 1900 emulator and bog-standard 1900 system software), but apart from that all I know is that the 2900 ran every process in its own VM, that the underlying microcode (running on a 2MHz 6809 chip in the 2960) allowed you to run COBOL in a byte-oriented VM and (presumably) Algol 60 or Fortran in a word-oriented VM, and that there was a choice of running applications in VMs containing either George 3 or VME/B.

      I have a feeling that the ICL 2900/3900 and IBM AS/400 and operating systems were fairly similar, being written at more or less the same time and with one (to me anyway) a similar fault@: neither have hierarchic filing systems.

      1. martinusher Silver badge

        Re: More comparisons needed...

        The PC was a very useful platform but caused a huge leap backwards in computer architecture. Much of the history of the PC could be described as "waiting to catch up", which it never quite did because of the huge number of people who grew up knowing it and its software as a 'computer'. The mass became so large that the tail is now wagging the dog -- it's now comparatively rare to find people who don't think of a computer primarily in terms of a keyboard and screen attached to a processing unit.

        I count myself lucky because I grew up in a time 'before PC' (and those dreadful BASIC based single board units). I didn't fully understand what I was experiencing but I was exposed to the concepts you describe and it stuck with me (because, fortunately, I dived off into the real time world and so avoided a lot of Windows voodoo).

        A lot of this is now moot because of the ready availability of very powerful processors and memory for pennies (a Pi Zero would probably give a 1970s mainframe a run for its money). This has taken away the incentive to pare systems down from what's easy to build to what's actually necessary. Bloat, though, does lead to reliability and (eventually) performance problems, problems that might get hidden by ever more powerful hardware but will never get solved this way.

        1. ldo

          Re: caused a huge leap backwards in computer architecture

          I remember in the 1990s when the advent of full-featured multiuser, multitasking OSes like Linux, the BSDs and others exposed the shoddiness of a lot of the x86 PC hardware of the time. When all you were running was Windows 9x, with an uptime measured in hours at most, hardware unreliability barely registered on your radar. Quite a different story for machines that ran continuously for days, weeks, months.

          That problem seems to have solved itself.

      2. Michael Wojcik Silver badge

        Re: More comparisons needed...

        > I have a feeling that the ICL 2900/3900 and IBM AS/400 and operating systems were fairly similar, being written at more or less the same time

        Eh? The AS/400 was released 14 years after the 2900.

        Maybe you're thinking of the '400's most direct predecessor, the System/38? That was only 4 years after the 2900. (The System/32 is roughly contemporaneous with 2900, but is only quite distantly related to the AS/400.)

        > and with one (to me anyway) a similar fault@: neither have hierarchic filing systems.

        OS/400 gained a hierarchical file system in parallel to its original one not that long after it originally came out.

        The AS/400 was an evolutionary step from the S/38, incorporating aspects of IBM's failed Future Systems project (such as the Single Level Store concept and sort-of-OO treatment of OS objects) and a mild version of a capability architecture, with application software running on a virtual machine in the same sense as, say, the JVM — low-level interpretation, in effect.

    2. Liam Proven (Written by Reg staff) Silver badge

      [Author here]

      > Since as long as I can remember in computing the goal of the multiuser machine has been to present each of the users with the illusion that they were operating their own individual machine

      Not really, no.

      A classic mainframe is multiuser, but not interactive. Lots of users submit jobs, the mainframe works out when to run them, runs them, and delivers output. No semblance of unique operation here.

      A fileserver or print server is multiuser, but the users never interact even that much. They store stuff, they get it back, or they all print at once to one printer but the jobs emerge separately, and they don't need to know how it happened.

      DOS wasn't doing any time slicing or sharing or anything: you *were* the only user.

      The original IBM hypervisor-type systems had sort of both: an underlying OS that wasn't interactive and that nobody interacted with, and a user-facing OS that was as single-tasking as DOS. Separation of functions: a good idea we've forgotten.

      ISTM that what you're expressing is the Unix view of the world, as I mentioned in an earlier comment, but the whole point of my FOSDEM talks has been to remind people that there _are_ other views of the world that are not like the Unix model at any layer of the system, that some of these different models work better, and we should not try to shoehorn all of computing into the one conceptual model everyone's most familiar with.

      1. Richard 12 Silver badge

        I think you're arguing about the definition of "user"

        In a batch processing system, the users submit work and get the result back later, once their batch has been processed.

        In an interactive system, users submit work and get the result back later, once they get their time slice.

        The only real difference is the size of the time slice. IIRC, some batch processing mainframes used to limit the amount of time a single job was permitted, freezing it (and unloading the tape) if it took too long, continuing it later.

        Early pre-emptive multitasking, I guess. Letting every batch run to completion would be cooperative, if you squint.
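The claim that batch and time-sharing differ mainly in the size of the slice can be illustrated with a toy scheduler. This sketch ignores I/O, priorities and switching cost, and the job names and durations are arbitrary: round-robin with a small quantum lets a short job finish early, while a quantum at least as long as the longest job degenerates into run-to-completion batch order.

```python
from collections import deque


def run_to_completion(jobs):
    """Batch style: each (name, work) job runs until done, in submission order.
    Returns the finish time of each job."""
    clock, finished = 0, {}
    for name, work in jobs:
        clock += work
        finished[name] = clock
    return finished


def round_robin(jobs, quantum):
    """Time-sharing style: each job runs for at most `quantum` units,
    then is pre-empted and re-queued if it still has work left."""
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    clock, finished = 0, {}
    queue = deque(jobs)
    while queue:
        name, remaining = queue.popleft()
        ran = min(quantum, remaining)
        clock += ran
        if remaining > ran:
            queue.append((name, remaining - ran))
        else:
            finished[name] = clock
    return finished


if __name__ == "__main__":
    jobs = [("long", 10), ("short", 1)]
    print(run_to_completion(jobs))  # {'long': 10, 'short': 11}
    print(round_robin(jobs, 2))     # {'short': 3, 'long': 11}
```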

  3. MarkMLl

    BTDT

    > How hard can you cut down Linux if you know it will never run on bare metal? Further than any distro vendor we know of has tried to go.

    Demonstrated adequately by User Mode Linux (UML), which has been a standard build target for a considerable number of years.

    The major thing that it couldn't do, when I last looked, was run a 32-bit guest on a 64-bit host. Which is a great pity, since if a guest is only going to be used for work which is not memory-intensive there's no real reason for it to carry around the burden of 64-bit pointers etc.
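To put a rough number on "the burden of 64-bit pointers": under natural alignment, a linked-list node holding a 32-bit value and a next pointer takes 8 bytes with 4-byte pointers (ILP32, as on i386) but 16 bytes with 8-byte pointers (LP64, as on x86-64 Linux), because the wider pointer also forces 4 bytes of padding. The sketch below computes that layout by hand; it assumes the usual natural-alignment rules rather than querying a real compiler, and the node layout is just an example.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def struct_size(fields: list[tuple[int, int]]) -> int:
    """Size of a C struct whose fields are (size, alignment) pairs, laid out
    in order with natural alignment and tail padding."""
    offset, max_align = 0, 1
    for size, alignment in fields:
        offset = align_up(offset, alignment) + size
        max_align = max(max_align, alignment)
    return align_up(offset, max_align)


# struct node { int32_t value; struct node *next; };
ILP32_NODE = [(4, 4), (4, 4)]  # 32-bit guest: 4-byte pointer
LP64_NODE = [(4, 4), (8, 8)]   # 64-bit guest: 8-byte pointer

if __name__ == "__main__":
    print("ILP32:", struct_size(ILP32_NODE), "bytes per node")  # 8
    print("LP64:", struct_size(LP64_NODE), "bytes per node")    # 16
```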

    1. MarkMLl

      Re: BTDT

      I was either fed dud information on the forum by the developers, or the situation has improved over the last few years.

      If ARCH=um, it's possible to select either a 64- or a 32-bit guest kernel build, either from the menu system or by using SUBARCH=i386: the result is the same.

      On an x86-64 host, I can build either a 64- or 32-bit kernel+modules set which appears to be consistent.

      However, the overall setup and boot has probably grown in complexity over the years: it used to be that you could fire up e.g. a UML kernel and immediately load from ISO media (e.g. Slackware from a CD or disc image), but despite spending a day tinkering I wasn't able to get anything working.

      Revisiting it after a few days, I find that the best documentation comes from Debian's "man linux.uml", which is an extended derivative of https://docs.kernel.org/next/virt/uml/user_mode_linux_howto_v2.html (not very well indexed or linked to).

      Following that carefully, I was able to boot UML and get a root login, which leaves me reasonably confident that I could rebuild an arbitrary kernel version.

      I've not tried comparing operation speed with Qemu/KVM or Docker etc., but this is probably something which is still useful if one needs a specific kernel version, not just a particular set of libraries etc. as one gets with most container environments.

  4. lordminty

    The really clever thing about IBM mainframe VM/CMS was...

    Its 'Shared Segments'.

    Instead of IPLing (booting) CMS for each user from DASD (disk), it was already in CP's (the mainframe OS, aka VM) memory. When you logged into your CMS VM (actual virtual machine) it was there instantaneously as a read-only copy.

    When you IPL VM (the OS, aka CP), CP loaded all the shared segments into upper RAM, where they lived all the time the machine was up.

    The beauty of it was that when you upgraded VM, you upgraded CMS. So every CMS VM got upgraded. None of this lark of having multiple old OSes on different VMs clogging up your estate that all need to be upgraded individually.
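The shared-segment idea can be modelled in a few lines. This is a hypothetical sketch, not how CP is implemented: the resident image is a single `bytes` object, each guest gets a read-only `memoryview` of it rather than its own copy, and (an assumption of this sketch) a guest picks up an upgraded image the next time it IPLs.

```python
class SharedSegments:
    """Hypothetical stand-in for CP's table of resident shared segments."""

    def __init__(self) -> None:
        self._resident: dict[str, bytes] = {}

    def load(self, name: str, image: bytes) -> None:
        # Loaded once, at "CP IPL" or upgrade time; replaces the old copy.
        self._resident[name] = bytes(image)

    def attach(self, name: str) -> memoryview:
        # A memoryview over bytes is read-only, and no data is copied.
        return memoryview(self._resident[name])


class GuestVM:
    def __init__(self, user: str, segments: SharedSegments) -> None:
        self.user = user
        self.segments = segments
        self.cms = None

    def ipl(self, name: str = "CMS") -> None:
        # No disk read: IPL just maps the image that is already in memory.
        self.cms = self.segments.attach(name)


if __name__ == "__main__":
    segs = SharedSegments()
    segs.load("CMS", b"CMS release 5")
    alice, bob = GuestVM("alice", segs), GuestVM("bob", segs)
    alice.ipl()
    bob.ipl()
    print(alice.cms.obj is bob.cms.obj)  # True: one copy, two guests
```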

    Are there any x64 or Unix/Linux hypervisors that load a shared read-only copy of a guest OS image in memory?

    And Shared Segments weren't just for CMS. They were used by 4GLs like FOCUS and RAMIS. Again, when you upgraded it, everything using it got upgraded. No more having gazillions of SQL or Oracle versions around.

    I also worked on MVS and one thing I wanted to try before I moved on was a single shared read-only SYRES volume, to IPL our 14 MVS systems from. It was technically possible.

    Do any x86/*nix virtualisation solutions use shared read-only boot disks?

    1. John Riddoch

      Re: The really clever thing about IBM mainframe VM/CMS was...

      "Do any x86/*nix virtualisation solutions use shared read-only boot disks?"

      Solaris sparse root zones shared the binaries from the global zone. It was efficient, but a pain to manage because it lost flexibility: every package had to be installed in the global zone, and moving zones between servers was even more complicated.

    2. Liam Proven (Written by Reg staff) Silver badge

      Re: The really clever thing about IBM mainframe VM/CMS was...

      [Author here]

      > Do any x86/*nix virtualisation solutions use shared read-only boot disks?

      Well, you know, there is the system that I linked to in the article!

      «

      There is some prior art in the form of this diskless VMs guide, which includes booting VMs from the host over iPXE, which replaces even the GRUB bootloader.

      »

      The system:

      https://github.com/Technohacker/central-diskless-boot

      The boot loader:

      https://ipxe.org/

      1. lordminty

        Re: The really clever thing about IBM mainframe VM/CMS was...

        Thanks for taking the time to respond Liam.

        The diskless boot looks very interesting. I wonder if it would work on a Raspberry Pi Proxmox hypervisor? When I get time I might have a play.

        Now if we can move the master image into a RAM disk, Linux can match IBM VM from 1967!

    3. Dazed and Confused

      Re: The really clever thing about IBM mainframe VM/CMS was...

      > Do any x86/*nix virtualisation solutions use shared read-only boot disks?

      When I have multiple VMs running the same OS I use a shared backing-store image with a COW R/W layer on top, which means you get instant deploys. This is on KVM/QEMU, and it works really well. It seems very cache-efficient on the host, so the performance is good. KSM then seems to de-dupe the memory pretty well, so we end up using a lot less RAM than we'd initially estimated.
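The backing-image-plus-overlay arrangement described here is what QEMU's qcow2 backing files provide; with hypothetical file names, an overlay can be created with `qemu-img create -f qcow2 -b base.qcow2 -F qcow2 vm1.qcow2`. The toy model below only illustrates the copy-on-write logic, with blocks as Python objects and hypothetical class names; it is not the qcow2 format. Creating an overlay copies nothing, which is why deploys are instant, and reads of untouched blocks all go to the same base image, which is plausibly why the host page cache is used so efficiently.

```python
class BackingImage:
    """Read-only base image shared by every VM (hypothetical block store)."""

    def __init__(self, blocks):
        self._blocks = tuple(blocks)  # tuple: cannot be modified in place

    def __len__(self) -> int:
        return len(self._blocks)

    def read(self, index: int) -> bytes:
        return self._blocks[index]


class CowOverlay:
    """Per-VM writable layer: writes land here, reads fall through to the base."""

    def __init__(self, base: BackingImage):
        self.base = base
        self.dirty: dict[int, bytes] = {}  # only blocks this VM has written

    def read(self, index: int) -> bytes:
        if index in self.dirty:
            return self.dirty[index]
        return self.base.read(index)

    def write(self, index: int, data: bytes) -> None:
        if not 0 <= index < len(self.base):
            raise IndexError(index)
        self.dirty[index] = data  # the shared base is never touched
```

KSM is a separate mechanism: it merges identical pages in the guests' RAM, whereas the overlay only shares blocks on disk and in the host cache.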

  5. Børge Nøst

    Your next talk?

    I hope that will involve microkernels, SASOS and the Mill cpu - the unholy trinity made for each other that I'm beginning to fear will never happen. (Someone please check in on the Mill company...)

    Has anyone else amended their hw to fit sw ideas better? (Ok, ID-tagged caches are there, but that helps regular OSes too.)

    1. John Riddoch

      Re: Your next talk?

      AWS pretty much roll their own hardware stack using the Nitro cards which do most of the hypervisor/IO work on each node. That's obviously a niche case where hyperscale cloud providers can make savings by making dedicated hardware whose only purpose is to run virtual machines.

  6. tpepper

    Existing microvms

    Clear Containers, Kata Containers, etc. have been doing this for almost a decade. There’s good work out there around trimming to virtio target contexts.

    1. Nate Amsden

      Re: Existing microvms

      When I was reading this I thought of containers too. Not sure what the author's point is, but speaking as someone who has been using VMware for 25 years (and Linux for 28 years), this concept doesn't sound useful to me. One of the big points of VMs is better isolation; I want local filesystems, local networking, etc. in the VM. If I don't want that overhead then I can (and do) use LXC, which I have been using for about 10 years now (both at home and in mission-critical workloads of stateless systems at work). I've never been a fan of Docker-style containers myself.

      When I think of a purpose-built guest for a VM it mostly comes down to the kernel, specifically being able to easily hot-add and, more importantly, hot-remove CPUs and memory on demand (something that VMware at least cannot do). I think I have read that Hyper-V has more flexibility, at least with Windows guests and memory, but I'm not sure of the specifics. Ideally there would be the OPTION (perhaps a VM-level config option) that if, for example, CPU and/or memory usage gets too high for too long *AND* there are sufficient resources on the hypervisor, the guest can automatically request additional CPU core(s) and/or memory, then release them after things calm down. I believe in Linux you can set a CPU to "offline" (I have never tried it, so I'm unsure of the effects, if any), but you still can't fully remove it from the VM in VMware (at least; unsure about Hyper-V/Xen/KVM) without powering the VM off.
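For reference, the Linux side of that "offline" knob is the sysfs CPU hotplug interface: writing 0 or 1 to `/sys/devices/system/cpu/cpuN/online` takes a CPU out of service or brings it back, and `/sys/devices/system/cpu/online` lists the CPUs currently online. The helpers below are a sketch that needs root and a hotplug-capable kernel; the function names are hypothetical, and the sysfs root is a parameter only so they can be exercised against a scratch directory. On x86, cpu0 often has no `online` file at all. As the comment says, offlining only stops the guest scheduling on the vCPU; on its own it does not remove the vCPU from the VM's configuration.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def set_cpu_online(cpu: int, online: bool, root: Path = SYSFS_CPU) -> None:
    """Ask the kernel to take a CPU offline (online=False) or back online."""
    control = root / f"cpu{cpu}" / "online"
    if not control.exists():
        # e.g. cpu0 on many x86 systems, or a kernel without CPU hotplug
        raise ValueError(f"cpu{cpu} cannot be hot-plugged here")
    control.write_text("1" if online else "0")


def online_cpus(root: Path = SYSFS_CPU) -> str:
    """Return the kernel's list of online CPUs, e.g. '0-3'."""
    return (root / "online").read_text().strip()


if __name__ == "__main__":
    if (SYSFS_CPU / "online").exists():
        print("online CPUs:", online_cpus())
```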

      Side note: Linux systems, I guess, can freeze if you cross the 3GB boundary when hot-adding memory, so VMware doesn't allow you to go past 3GB if you are below 3GB. That's a bit annoying: it means that if you built a VM with 2GB of memory and want to hot-add up to 4GB, the VM has to be shut off. So fixing that would be another nice thing for a purpose-built VM guest OS too.

      Most distro-specific issues, especially hardware drivers, are of course basically gone in VMs. I spent countless hours customizing Red Hat PXE kickstart installers with special drivers because the defaults didn't include support for some piece of important hardware; the most problematic at the time (pre-2010) was probably the Intel e1000e NIC, as well as Broadcom NICs sometimes (and on at least one occasion I needed to add support for a SATA controller). You can't kickstart without a working NIC... but wow, the pain of determining the kernel, then finding the right kernel source to download, compiling the drivers and inserting them into the bootable stuff. I think that is the only time in my life that I used the cpio command. Intel had a hell of a time iterating on their e1000e NICs, making newer versions of them that look the same, sound the same, but only work with a specific newer version of the driver.

      The exception may be Windows on the drivers front. I've installed a bunch of Windows Server 2019 VMs over the past year, and I make a point of attaching TWO ISO images to the VM when I create it: the first is for the OS itself, and the second is a specific version of the VMware Tools ISO that has the paravirtual SCSI drivers on it (newer versions of the ISO either don't have the drivers or they didn't work last I checked). That way I don't have to mess around with changing ISO images during the install. I don't have any automation around building Windows VMs, as I'm not a Windows person, but I have quite a bit around building Ubuntu VMs. It's so strange to me that MS doesn't include these drivers out of the box; they've been around for at least 10 years now. I'm not sure if they include VMXNET3 drivers. I don't need networking during install, and installing VMware Tools once the install is done is the first thing I do, which grabs those drivers.

      I never touched Plan 9, I don't think, but the name triggered a memory from the 90s, when I believe I tried to install Inferno OS (and I think I failed, or at least lost interest pretty quickly): https://en.wikipedia.org/wiki/Inferno_(operating_system) "Inferno was based on the experience gained with Plan 9 from Bell Labs, and the further research of Bell Labs into operating systems, languages, on-the-fly compilers, graphics, security, networking and portability."

      Perhaps someone who knows more (maybe the author) could chime in on why they are interested in Plan 9 and not Inferno. The description implies Inferno was built on lessons learned from Plan 9, so I assume it would be a better approach, at least in its creators' view.

      I dug a little deeper into Inferno recently and found what I thought was a funny bug report, the only one in its GitHub repository, for software that hasn't seen a major release in 20 years (according to Wikipedia, anyway):

      https://github.com/inferno-os/inferno-os/issues/8 The reporter was suggesting they update one of the libraries due to security issues, in code that was released in 2002. It just made me laugh: of all the things to report, and they reported it just a few months ago.

      Side note: I disable the framebuffer(?) in my Linux VMs at work by default:

      https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-security/GUID-15D965F3-05E3-4E59-9F08-B305FDE672DD.html

      If you do that, you need to update GRUB (these are the options I use; I suspect only nofb and nomodeset are related to the change):

      perl -pi -e 's/GRUB_CMDLINE_LINUX_DEFAULT.*/GRUB_CMDLINE_LINUX_DEFAULT="spectre_v2=off nopti nofb nomodeset ipv6.disable=1 net.ifnames=0 biosdevname=0"/g' /etc/default/grub

      perl -pi -e 's/^#GRUB_TERMINAL/GRUB_TERMINAL/' /etc/default/grub

      If you don't do that in GRUB (and regenerate its config afterwards, e.g. with update-grub on Ubuntu), you'll just see a blank screen in VMware when the system boots.

      There has been at least one VMware security bug over the years involving guest escape via the framebuffer or something similar (maybe just this one: https://www.blackhat.com/presentations/bh-usa-09/KORTCHINSKY/BHUSA09-Kortchinsky-Cloudburst-PAPER.pdf), so I figure I'll disable it, since I don't need it anyway.

    2. Liam Proven (Written by Reg staff) Silver badge

      Re: Existing microvms

      [Author here]

      > have been doing this for almost a decade.

      I did look into this and, AFAICS, no, they haven't: not to the extent I am proposing.

      Secondly, do bear in mind that I am suggesting this for a specific reason and in a specific context, which was described at length in parts 1, 2 & 3 of this little series. If you haven't read them, then you may not understand this context, so I suggest doing that and then re-evaluating part 4. The origin of this article was merely the last few minutes of a nearly hour-long talk.

  7. ldo

    IBM As An Example Of How To Do Interactive Computing? Really??

    IBM were, and are, legendary for their complicated, convoluted, inflexible and above all expensive ways of solving problems. Remember Conway’s Law: “any piece of software reflects the organizational structure that produced it”. Why run one OS on top of another, instead of adding the functionality in directly and cutting out the overhead? Because the different parts were produced by different divisions of the company, and that was the only way they could get their stuff to work together.

    Consider that other companies, in particular DEC, could produce higher-performance interactive timesharing systems, with more versatility and functionality, at a fraction of IBM's cost.

    Why do you think Unix was developed, not on an IBM mainframe, but on a DEC machine?

    1. Liam Proven (Written by Reg staff) Silver badge

      Re: IBM As An Example Of How To Do Interactive Computing? Really??

      [Author here]

      > Why run one OS on top of another, instead of adding the functionality in directly and cutting out the overhead?

      1. Do please remember that this is just part 4 of a 4-part series, and this is the smallest, least important part.

      2.

      > adding the functionality in directly

      Because as p1 said, the problem I seek to address is large, over-complex systems.

      3.

      > cutting out the overhead

      Because you are not cutting it. You're just moving it.

      The result is a larger, more complex system.

      That's the problem we are in.

      And as with one of your previous comments: when I argued that there are too many Linux desktops and they are too similar, you responded "well write your own then and show us how it's done"...

      Your proposed solution to the problem is what caused the problem, and your solution makes it worse.

      1. ldo

        Re: The result is a larger, more complex system.

        Actually, it was not: all you had to do was compare how DEC dealt with its multitude of operating systems for the PDP-11. RSTS/E was able to offer compatibility shims with some of the others through blocks of resident code, no more than 8KiB in size, called “run-time systems”. No need for full-on hypervisors at all.

        Conway’s Law applies again: compatibility between OSes was easier, because communication between the groups working on those different OSes was easier.

        And you yourself tried to talk about “microVMs”, did you not? Are they not supposed to be lighter-weight and lower-overhead than full-on virtualization?

        And we have them too, now: we call them “containers”.

        1. Michael Wojcik Silver badge

          Re: The result is a larger, more complex system.

          Containers ≠ VMs. Not at all.

          And My VM is Lighter (and Safer) than your Container.

          1. ldo

            Re: Containers ≠ VMs

            Precisely my point. They have less overhead than VMs, because they don’t require running their own kernel.
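            A quick way to see the "no kernel of their own" point: a container reports its host's kernel release, while a VM reports whatever kernel the guest booted. A minimal Python sketch; run it on a host and inside one of its containers and the strings should match (some sandboxed runtimes may deliberately report a different release).

```python
# Minimal sketch: print the release of the kernel this process runs on.
import os
from pathlib import Path


def kernel_release() -> str:
    """Release string of the running kernel, as uname(2) reports it."""
    return os.uname().release


def proc_release() -> str:
    """The same value read from procfs (Linux only)."""
    return Path("/proc/sys/kernel/osrelease").read_text().strip()


if __name__ == "__main__":
    print(kernel_release())
```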

  8. HuBo

    Long comments suck (I never read them)

    I like both ideas, the post-FOSDEM 24's stacked micro-VM concept, and the FOSDEM 21's linear address space and persistent memory, and think they should be combined really, all within a post-exascale 128-bit perspective. The VM approach is a bit more focused on hardware-agnostic modularity than performance though, so it might benefit datacenters more than HPC. The questions remain (in my mind) as to whether memory-oriented VMs, or register-based ones, or stack-VMs are best, especially in "impedance"-matching CISC and RISC CPUs. That there is a wide choice of them available today (Dis, JVM, BEAM, Squeak, WebAssembly, ...) is testimony to the popularity of this approach (IMHO).
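    To make the stack-vs-register distinction concrete, here is a toy Python sketch evaluating (a + b) * c both ways. The opcodes are invented for illustration and do not correspond to Dis, the JVM, BEAM, Squeak or WebAssembly.

```python
# Toy sketch: one expression, two VM styles. Opcodes are hypothetical.


def run_stack(program, env):
    """Stack VM: operands are implicit; each op pops inputs, pushes a result."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(env[args[0]])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def run_register(program, env):
    """Register VM: each instruction names its destination and sources, so
    there is no push/pop traffic, at the cost of wider instructions."""
    regs = {}
    for op, dst, *srcs in program:
        if op == "load":
            regs[dst] = env[srcs[0]]
        elif op == "add":
            regs[dst] = regs[srcs[0]] + regs[srcs[1]]
        elif op == "mul":
            regs[dst] = regs[srcs[0]] * regs[srcs[1]]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs["r0"]  # convention in this sketch: result lands in r0


# (a + b) * c in each style
STACK_PROG = [("push", "a"), ("push", "b"), ("add",), ("push", "c"), ("mul",)]
REG_PROG = [("load", "r1", "a"), ("load", "r2", "b"), ("load", "r3", "c"),
            ("add", "r0", "r1", "r2"), ("mul", "r0", "r0", "r3")]
```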

    In past comment sections, a few Kommentards suggested that some aspects of the approach may be old-hat-ish. For example, Michael Wojcik suggested that linear addressing with memory-mapped stores (rather than tiered storage) was as old as OS/400's Single-Level Store from the 1980s (or even the 1970s, really), which is correct. But I think that what was expensive, rather "special order", and proprietary back then has a chance of becoming more mainstream in the near future, thanks to tech developments, especially MRAM, around 2035 (if those predictions are to be believed). In any case, it should take a good 10 years to put together a micro-VM, persistent-storage, 128-bit OS, so it is best to start now and be ready when MRAM (or similar) becomes more widely available.

    Luckily, as noted by jake, we've worked with 128 bits for a while, and have IEEE-754 quadruple precision ready to go (and used in astronomy). We may also split 128-bit data items into two 64-bit floating-point numbers to represent a complex number (useful in quantum mechanics and EE), into an integer numerator and denominator to represent a fraction, or even just into two real 64-bit numbers to use in a vectored op. So 128 bits is a natural fit for those (an easy choice).
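    The two 128-bit splits described above can be sketched with Python's struct module, packing each pair into exactly 16 bytes. This is only a storage layout; it does not implement IEEE-754 quadruple precision, which Python has no native type for.

```python
# Minimal sketch: two ways to fill a 128-bit (16-byte) slot.
import struct
from fractions import Fraction

PAIR_F64 = struct.Struct("<dd")  # real, imaginary as little-endian doubles
PAIR_I64 = struct.Struct("<qq")  # numerator, denominator as signed int64


def pack_complex(z: complex) -> bytes:
    return PAIR_F64.pack(z.real, z.imag)


def unpack_complex(raw: bytes) -> complex:
    re, im = PAIR_F64.unpack(raw)
    return complex(re, im)


def pack_fraction(q: Fraction) -> bytes:
    # struct raises struct.error if either part does not fit in 64 bits
    return PAIR_I64.pack(q.numerator, q.denominator)


def unpack_fraction(raw: bytes) -> Fraction:
    num, den = PAIR_I64.unpack(raw)
    return Fraction(num, den)
```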

    The FOSDEM 21 talk does mention how files might be treated under the linear-address/persistent-memory approach, but one can understand some folks' worries, as expressed for example by doublelayer, about file expansion and the needed allocation of new clusters (or similar). In my understanding, the need for file-specific operations remains, but they take place within random-accessible memory space rather than a separate, tiered address space. This is expected to ease the process to some degree (but others may want to pitch in on expected advantages and potential drawbacks).
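    As a rough present-day analogy (not OS/400's single-level store, nor the design from the FOSDEM 21 talk), a memory-mapped file on Linux shows the same split: once mapped, its bytes are edited like ordinary memory, but growing it is still an explicit file operation followed by a fresh mapping. A minimal Python sketch:

```python
# Minimal sketch: mmap as a stand-in for memory-addressed storage.
import mmap
import os
import tempfile


def demo(path: str) -> bytes:
    with open(path, "wb") as f:
        f.write(b"hello")

    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[0:1] = b"J"  # a plain memory write, no write() call

        # Growing the "object" still needs a file-level step: truncate()
        # extends the file with zero bytes, then we map it again.
        f.truncate(11)
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[5:11] = b" world"
            mm.flush()

    with open(path, "rb") as f:
        return f.read()
```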

    It may take 10 years to develop, but is definitely the future, and so this development should start now (as an integration of mainly "circuit less traveled" concepts, from the past, actualized to take advantage of the progress in tech that has happened since). Great ideas from Liam!

  9. zeigerpuppy

    Plan9 file-system works nicely with KVM. We use it in our linux (Devuan) based hypervisors for data volumes...

    Strong points are the permission mapping (possible per-VM on the same share) and ease of use. It's great to back the volume with a ZFS dataset in the hypervisor...

    However, 9pfs is less suited to certain workloads. In particular, it has poor database storage compatibility and suffers from very poor small-file performance (if sync writes are needed).

    In KVM, virtiofs is much more performant...

    1. ldo

      Re: However, 9pfs is less suited to certain workloads

      Does Plan9 have a VFS layer, like Linux, to allow addition of alternative filesystems?

      1. Liam Proven (Written by Reg staff) Silver badge

        Re: However, 9pfs is less suited to certain workloads

        [Author here]

        > Does Plan9 have a VFS layer

        AIUI: it has *nothing else*.

        9front in particular comes with a choice of 2 root filesystem formats.

        1. ldo

          Re: However, 9pfs is less suited to certain workloads

          For example, did you know that ext4 on Linux has the option of case-insensitive filenames, settable on a per-directory basis?

          Does Plan9 offer something similar?

  10. Anonymous Coward

    Hypervisors as they are written now largely exist...

    ...because PCs have very bizarre hardware.

    Otherwise the closer you are to chroot, the better. An OS should be able to run an OS just like it runs any other program.

    The original IBM System/360 was basically a way to consolidate a whole bunch of older computer systems onto one single computer.

  11. Mrs Spartacus

    Stop it, I'm going misty-eyed

    Ah, all this talk of George 3 and MULTICS (at my university in 1979), and the AS/400 (the second company I worked for) is making me long for the days of proper music and flared trousers.

    OK, maybe not the flared trousers, they were always a disaster. Was I the only teenager who thought they looked ridiculous at the time? Was I the only one who said "no, the Emperor's flares are rubbish"?

    Time for a coffee and my meds <sniff>.

  12. Updraft102

    "Linux running on the bare metal is a tiny niche now. The vast majority of Linux servers are running on some kind of hypervisor, even if that's provided by another Linux distro."

    Another distro which is running on bare metal, right? Seems like it is not really all that niche. And all of the IoT things out there are Linux on bare metal too. There has to be some kernel running on the bare metal... what else would it be?

  13. Greywolf40

    IIRC, Warp 4 ran Windows inside a VM. I recall poking around in my copy and seeing a "network", but I didn't have a network at the time. My inference: Warp 4 set up a virtual network inside the box to link Windows to OS/2. I still don't know if my interpretation was/is correct. Can anyone confirm/clarify?

    Thanks.

  14. Henry Wertz 1 Gold badge

    The other solution for Linux virtualization

    The *other* solution for Linux virtualization: lxc (Linux Containers) and the like. Not as commonly used, for sure, but they are very efficient and effective! No virtualization overhead, everything is bare metal, but you can still have per-container RAM, CPU, disk usage, and network quotas if you want to.

    This is just a chroot jail "on steroids" (and BSD has a BSD Jails solution that provides this type of functionality too, from what I've read). It's a chroot jail, but with its own view of processes; it appears to have root and user accounts within it, and you can even give it its own virtual Ethernet so it gets its own IP addresses. Ubuntu has a "cloud-init" version going back to very old releases, where it still has a startup script and shutdown script so you can "boot" and "shut down" the thing, but it basically runs nothing on startup (maybe a script to set up /dev, /proc, /sys and the IP address if it's not sharing the host IP) until you install or put something in there to run on "bootup". You install a cloud-init image with lxc, then you can run a command to get a root or user shell in there and set things up as needed. There are tools to limit RAM usage, CPU time, disk quota, etc. (which I didn't use).
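    On a cgroup v2 host, the RAM and CPU limits mentioned above end up as plain files in the container's cgroup directory, and LXC exposes the same knobs through its lxc.cgroup2.* configuration keys. A minimal Python sketch, assuming cgroup v2 and root; the exact cgroup path LXC uses varies by version and configuration, so it is left as a parameter here.

```python
# Minimal sketch: set memory and CPU caps on one cgroup v2 directory.
from pathlib import Path
from typing import Optional


def set_limits(cgroup: Path, mem_bytes: Optional[int] = None,
               cpu_quota_us: Optional[int] = None,
               cpu_period_us: int = 100_000) -> None:
    """Write memory.max and cpu.max for one cgroup.

    memory.max takes a byte count, or 'max' for unlimited. cpu.max takes
    '<quota> <period>' in microseconds, so '50000 100000' means half a CPU;
    writing just 'max' removes the quota and leaves the period unchanged.
    """
    (cgroup / "memory.max").write_text(
        "max" if mem_bytes is None else str(mem_bytes))
    (cgroup / "cpu.max").write_text(
        "max" if cpu_quota_us is None else f"{cpu_quota_us} {cpu_period_us}")
```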

    I have an antiquated Ubuntu 11.04-era MythTV setup that I got running using a chroot-jail-type setup (after the ancient Dell croaked, having just turned 21 years old, I used a full backup to resurrect it). After I had too much trouble trying to pull MythTV, MySQL, and Apache over to run directly under an Ubuntu 22.04 system, I tried a chroot jail and there was very little issue at all! (I pulled out the bits needed to start those three and put them into a script that starts just those three services, and it worked straight off!)
