FreeBSD can now boot in 25 milliseconds

Replacing a sort algorithm in the FreeBSD kernel has improved its boot speed by a factor of 100 or more… and although it's aimed at a micro-VM, the gains should benefit everyone. MicroVMs have been a hot area of technology R&D for the last half decade or so. The core idea is a re-invention of some of the concepts and technology that IBM …

  1. John Robson Silver badge

    Pretty impressive

    that a single sort algorithm was taking 99+% of the boot (init?) time.

    Oh it didn't:

    "FreeBSD’s mi_startup function, which kicks off machine-independent system initialization routines, was using a bubblesort to order the functions it called; while this was reasonable in the 90s given the small number of routines needing to be ordered at that point, there are now over 1000 such routines and the bubblesort was getting slow. Replacing it with a quicksort will save 2 ms. (Not yet committed as of 22 August 2023).

    "

    1. b0llchit Silver badge
      Facepalm

      Re: Pretty impressive

      Bubblesort? That is never appropriate in production systems. It is what novices come up with when they design their first sorting algorithm, and using it in practice is always a mistake. Any good teacher will tell you the (w)holy sorted story in graded stability versions.

      1. Anonymous Coward
        Anonymous Coward

        Re: Pretty impressive

        Does anyone remember "syscon" for Novell in the '80s? This was the main admin GUI, responsible for modifying user accounts, print services, etc.

        I was a teaching assistant at a Florida uni and we had "Superset" visit (i.e. Drew Major and the rest of the core Novell coders)

        We had something like 125 Novell servers in various departments scattered across campus, and I helped admin several for the computer science department, which had about 20K accounts for the comp sci students.

        Major tries to pop up the user list in syscon, and it's sitting there flashing "Please Wait" for a LONG time, like 5 minutes (which we considered a standard response time).

        He asks me how many people are in the list, and I say "about 8K", at which point one of the other guys leans across and says to Drew, "I think that uses a bubble sort, we were kind of in a hurry"

        My data structures prof was there, and I saw him turn white.

        And yes, there was a point release the next month that vastly improved that performance, so I think that was a true assessment.

      2. John Robson Silver badge

        Re: Pretty impressive

        If you're sorting a handful of items then it's not a bad choice... but the number of items has ballooned since it was chosen, and the algorithm still worked, and was fast enough that it didn't really matter when compared with the delays of waiting for hardware to do its stuff...

        I mean we're talking about a 2ms saving, on a process which (before other optimisations) took 10s.

        1. b0llchit Silver badge

          Re: Pretty impressive

          Zero, One, (near) Infinite(*).

          Those are the numbers you need to think of. If you allow more than One, you are in (near) Infinite territory, and that is too many for any stupid algorithm. Conclusion: bubblesort is never appropriate, and there are better stable-sort algorithms.

          (*)Some argue "Two" should be included in the list, and bubblesort would still not be the appropriate algorithm in that case.

          1. John Robson Silver badge

            Re: Pretty impressive

            There are plenty of cases where you are handling a reasonably small list of items.

            I'm more interested in whether in fact all of the now hundreds of items on that list are needed in a microVM...

      3. matthewdjb

        Re: Pretty impressive

        I first used quicksort in anger in 1990. I learned it during my CS degree in the late 80s. It was already two decades old then.

        1. John Brown (no body) Silver badge

          Re: Pretty impressive

          My A level "project" was a comparison of sort routines, running on a 1.77MHz Z80 CPU. Bubble sort was the slowest, obviously. I can't even remember what the others were: quick sort, shell sort, insertion sort, others, something like that. It was over 40 years ago and I've not needed to deal with stuff like that for 30 years :-)

          1. david 12 Silver badge

            Re: Pretty impressive

            I've not needed to deal with stuff like that for 30 years

            10 years ago, I finally just took the sort out. System ran fast enough that it didn't matter what the order of operations was.

          2. MacroRodent

            Re: Pretty impressive

            I once coded shellsort in 8088 assembler in one of my first computer jobs. It is not much more code than bubblesort and performs a bit better as the number of items increases. But in that particular application (crude 3D computer graphics) it probably did not make much difference; we did not have very complex scenes. But it made me feel like a guru...

            1. Michael Wojcik Silver badge

              Re: Pretty impressive

              Shellsort is an interesting case because its theoretical performance is difficult to analyze.

              Quicksort, of course, has bad worst-case performance in the naive implementation. There are various tricks you can use to avoid that, and there's Introsort, which detects adverse performance and switches to a different sort.

              Radix sorts are even faster than comparison sorts like Quicksort under the right conditions; the problem is that those conditions don't fit most cases.

              These days, state-of-the-art sorts seem to be things like the widely-used Timsort (a mergesort / insertion sort hybrid that employs a number of techniques for improving real-world performance and has the advantage of being stable, i.e. not reordering items with the same key) and Learned Sort (which builds a model of the input's distribution and uses that to approximate a radix sort).

              For sorting large amounts of data ("external sorting"), there are various partition / parallel sort / merge approaches, some of which are fairly ornate (such as the classic polyphase mergesort). There's a tradeoff between partitioning and merging: you can partition so none of the subfiles have overlapping keys, in which case merging is just concatenation; or you can partition randomly, and then during merge you compare the keys of the next record from each subfile. You could even do external Shell sorting where you use the Shell stride to do the partitioning and merging.

              But even for small N, bubble sort is pretty terrible. It's not that easy to code or understand; I'd say naive mergesort is easier to write and read if the language has decent support for recursion and allocation of auxiliary space. And gnome sort is simpler than bubble sort, with the same performance characteristics. Selection sort is even easier to understand; you can describe that to a novice in two sentences. So even if you expect N to remain small, it's hard to see how bubble sort is ever the right choice – except when it's the only sort you know and you can't be bothered to do any research.
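
              For what it's worth, a gnome sort in C (an illustrative sketch, sorting ints): one index, one loop, and the same O(n^2) worst case and O(n) sorted-input behaviour as bubble sort:

              ```c
              #include <stddef.h>

              /* Gnome sort: walk forward while adjacent pairs are in order;
               * on an out-of-order pair, swap and step back one position. */
              static void
              gnome_sort(int *a, size_t n)
              {
                  size_t i = 1;

                  while (i < n) {
                      if (a[i - 1] <= a[i]) {
                          i++;            /* in order: keep walking */
                      } else {
                          int tmp = a[i - 1];
                          a[i - 1] = a[i];
                          a[i] = tmp;
                          if (i > 1)
                              i--;        /* re-check the previous pair */
                      }
                  }
              }
              ```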

              1. that one in the corner Silver badge

                Re: Pretty impressive

                > bubble sort is pretty terrible. It's not that easy to code or understand

                Huh? It is often used as My First Sort simply because it *is* so easy to code and understand! But that is the main reason to query your statements:

                > So even if you expect N to remain small, it's hard to see how bubble sort is ever the right choice – except when it's the only sort you know and you can't be bothered to do any research.

                Or when you know extremely well what you are doing, on a resource-constrained system, where you need an in-place, non-recursive algorithm (i.e. has a tiny memory overhead, including stack usage) which requires only a tiny bit of code to implement.

                You may wish to expand - or possibly shrink - your programming horizons before making such sweeping statements.

          3. robinsonb5

            Re: Pretty impressive

            Just out of interest, did you also compare them for code footprint and working RAM / stack usage?

      4. Hans 1

        Re: Pretty impressive

        Never say never; it all depends on how many things you have to sort.

        1. TheMajectic

          Re: Pretty impressive

          Absolutely agree, everything has its place

        2. Bebu
          Headmaster

          Re: Pretty impressive

          《it all depends how many things you have to sort.》

          Indeed. Whether it's worth sorting at all is equivocal when a linear search may be faster, even in the worst case, than the overhead of sorting. It also depends on how often the benefit of having the data sorted is used - probably only once when booting.

      5. Norman Nescio

        Re: Pretty impressive

        Bubblesort? That is never appropriate in production systems.

        Never is quite a long time. It is an interesting student exercise to devise a set of plausible system constraints that would end up with bubble sort being a reasonable choice. Sorting is complex enough for Donald Knuth to have devoted a large part of one of the volumes of his magnum opus (Volume 3 of The Art of Computer Programming) to it.

        Sorting is fundamental in the development of computers. It was entirely possible to have tape-based systems (no hard drives) where a major function was to sort the records on a tape. If you have a single tape drive, and less memory in the computer than the space taken by records on the tape, ending up with a sorted data set in the most time-efficient way is an interesting challenge. Things improve if you can use two, or even three tapes (or even 10), or have a limited amount of random access storage, like a small drum or hard disk. There's a pretty illustration from Knuth's work in this Stack Exchange Retrocomputing question and answer: StackExchange:Retrocomputing:What "spectacular to watch" algorithms were used for sorting tapes?

        Bubblesort can be appropriate if you know that the dataset to be sorted is (a) small and (b) almost completely sorted already. Some other efficient general-purpose sorting algorithms don't perform as well in this specific instance. If you are working in extremely memory-constrained environments (think microcontrollers) and need, for example, an in-place sort, then bubble sort can be a good candidate for small-enough datasets. Examples of in-place sorting algorithms with other constraints can be found here: https://github.com/ceorron/stable-inplace-sorting-algorithms
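
        For illustration, a minimal sketch of that case in C: bubble sort with an early-exit flag, which costs a single O(n) pass over input that is already sorted (illustrative code, not from any particular codebase):

        ```c
        #include <stdbool.h>
        #include <stddef.h>

        /* Bubble sort with early exit: if a full pass makes no swaps, the
         * array is sorted, so nearly-sorted input costs about one pass. */
        static void
        bubble_sort(int *a, size_t n)
        {
            bool swapped = true;

            while (swapped && n > 1) {
                swapped = false;
                for (size_t i = 1; i < n; i++) {
                    if (a[i - 1] > a[i]) {
                        int tmp = a[i - 1];
                        a[i - 1] = a[i];
                        a[i] = tmp;
                        swapped = true;
                    }
                }
                n--;    /* the largest element has bubbled to the end */
            }
        }
        ```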

        But the point is not to say that bubblesort was appropriate in this instance (BSD booting) - it obviously wasn't. The constraints on your sorting environment can determine the best option. Does it need to be stable? Need it be in-place? Is the dataset to be sorted larger than the working memory available? Is the dataset semi-sorted already? What are the most efficient opcodes for the hardware architecture you are using? Can the sort be parallelized? All can be relevant.

        Oh: and this is a fun site to look at some side-by-side sorting algorithm animations: Toptal: Sorting Algorithms Animations

        NN

      6. Ignazio

        Re: Pretty impressive

        There are values of N for which the O(N^2) algorithm beats the O(N log N) one, once you include memory requirements and multiplicative constants. Granted, if memory serves, that limit is 12 or so...

    2. abend0c4 Silver badge

      Re: Pretty impressive

      I don't know a whole lot about the FreeBSD kernel, but, as I understand it, the system initialisation routines (or declarations thereof along with their parameters and execution priority) are spread through various parts of the code and rely on linker sets to gather them together at link time in one contiguous block.

      The initialisation process then sorts that block in priority order before executing each initialisation routine in turn.

      If you were looking to optimise further still, presumably it would be better to sort the data into the correct order within the executable as part of the build process: after that point it's never going to change. You could then dispense with the runtime sort altogether.

      1. cookieMonster Silver badge
        Joke

        Re: Pretty impressive

        What about replacing the init with systemd??

        /thats a joke icon

        /ducks and runs for cover

        1. Arthur the cat Silver badge
          Joke

          Re: Pretty impressive

          Good job you put the joke icon on that, I was getting ready to call Liam Neeson.

        2. jake Silver badge

          Re: Pretty impressive

          THAT'S NOT FUNNY, MAN!

          Seriously, don't give Marketing any ideas.

        3. katrinab Silver badge
          Mushroom

          Re: Pretty impressive

          On my hardware, which is somewhat faster than the target machine described in the article, Debian boots in 1 second, FreeBSD boots in 5 seconds. If it can now boot in 25ms, then sure, maybe SystemD could get it down to 5ms, but given I'll happily trade 4 seconds of boot time for a sane and usable init system, I'm really not going to be worried about 20ms.

          1. John Robson Silver badge

            Re: Pretty impressive

            But you're not creating and booting a machine for every http request that comes in... a workstation is a somewhat different use case.

            1. katrinab Silver badge
              WTF?

              Re: Pretty impressive

              Why on earth would you want to do that?

              If it was Windows ME, then maybe I could understand. Not FreeBSD, Linux, or even more recent versions of Windows.

              1. John Robson Silver badge

                Re: Pretty impressive

                Because that's apparently what the cool clouds are doing now.

                I mean it's an interesting concept, absolute isolation between instances of a process...

              2. Anonymous Coward
                Anonymous Coward

                Re: Pretty impressive

                The use case from the article was serverless services, like AWS Lambda. Lambda services normally pull job requests from a queue. Then, if there are too many job requests in the queue, Lambda will automatically spin up another instance of your service to help handle the increased load.

                When I was working with it a couple years ago, it would normally take around 30s to get a new instance ready to handle requests. Dropping the startup time to < 1s would have been great since it eliminates a latency issue we had to design around.

                It's also a cost savings for the service owner: since you don't want your customers to wait 30s for their request to be completed, you normally have to keep some instances up at all times, even if they're not being used. But you get charged for each available instance, even if it's not actually doing any work, so being able to spin up a new instance with a negligible latency cost would help minimize the overhead costs. You could have exactly as many running instances as you need to fill the immediate workload.

                Caveats: a) Spinning up a new instance also includes things like copying your service executable to the new VM and starting that executable, so just optimizing the VM boot time isn't a silver bullet. b) Lambda is not for everyone, consult your local practitioner to see whether Lambda may be right for you. If used incorrectly Lambda may cause poor performance, service downtime or excessive fees.

                1. John Robson Silver badge

                  Re: Pretty impressive

                  "Caveats: a) Spinning up a new instance also includes things like copying your service executable to the new VM and starting that executable, so just optimizing the VM boot time isn't a silver bullet. b) Lambda is not for everyone, consult your local practitioner to see whether Lambda may be right for you. If used incorrectly Lambda may cause poor performance, service downtime or excessive fees."

                  a) It's not a silver bullet - but it is one part of the spinup which is used by literally every instance, so savings here *really* add up.

                  b) Absolutely

    3. Anonymous Coward Silver badge
      Boffin

      Re: Pretty impressive

      It wasn't making the boot 100x quicker; it was making the sort 100x quicker (contrary to what the article says).

      From the embedded tweets...

      "it now spends 7% of its time running a bubblesort"

      and subsequently

      "We're now running a mergesort which is ~100x faster"

      So that part is now (approximately) 0.07% of the original boot time, so the new boot time is 6.93% quicker, or 93.07% of the original time.

      Meanwhile, all the other optimisations he's made have reduced the boot time to the point that 2ms is significant enough to look at.

      Looping around, "quicksort will save 2ms" - if you take that 2ms as being 6.93%, then the previous boot time was 28.9ms. Given the precisions involved, that's close enough to the claimed figures.
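
      Spelling that arithmetic out, with T as the original boot time:

      ```
      sort cost:  0.07T  ->  0.07T / 100 = 0.0007T
      saving:     0.07T - 0.0007T = 0.0693T
      if 0.0693T = 2 ms, then T = 2 / 0.0693 ≈ 28.9 ms
      ```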

      In summary, the line in the article "Replacing a sort algorithm in the FreeBSD kernel has improved its boot speed by a factor of 100 or more" is complete bollocks.

    4. Ignazio

      Re: Pretty impressive

      Came here to mock the same misreporting of the tweet thread

  2. druck Silver badge

    The bubble has burst

    Getting over the shock of finding that anyone was still using a bubble sort, let alone in a kernel: I like the idea of a MicroVM. It's a good abstraction which should be used to reduce complexity in all OSes.

    1. Charlie Clark Silver badge

      Re: The bubble has burst

      Certainly not my area of expertise, but I believe bubble sort is used in some cases by the sort algorithm in Python, and that's considered to be pretty amazing.

      1. Paul Crawford Silver badge

        Re: The bubble has burst

        As above, never use a crap algorithm when your library already has qsort() and similar properly implemented.

        1. Vometia has insomnia. Again.

          Re: The bubble has burst

          FreeBSD already has a decent version of qsort in its kernel so I'm not sure why the bubble sort is there. I guess it might just be one of those "this'll do for now, I'll come back to it later" things that ends up being overlooked for years.

          1. Orv Silver badge

            Re: The bubble has burst

            If your list of items to sort is *very* short (as it seems to have been at first), then bubblesort can actually be faster.

            I've also seen situations where a "worse" sorting algorithm ran faster because it fit entirely in processor cache, whereas a more sophisticated sort did not.

            1. that one in the corner Silver badge

              Re: The bubble has burst

              > If your list of items to sort is *very* short (as it seems to have been at first), then bubblesort can actually be faster

              This.

              What seems to be being missed in this discussion is that, if you look carefully, a number of high performance implementations of "clever" sort algorithms will (or used to) bottom-out into a bubblesort when the number of items gets small. It is faster to run an inline unrolled loop version of bubblesort for, say, n=2 to n=6 (where the n=2 case is itself almost degenerate, of course) than it is to, for example, partition that set, perform a recursive call and join the results together again.
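
              As a sketch of that pattern (illustrative C, assuming insertion sort as the small-n fallback; most production quicksorts bottom out into insertion sort, though an unrolled bubble sort would slot into the same place):

              ```c
              #include <stddef.h>

              #define CUTOFF 8    /* below this, the simple quadratic sort wins */

              /* Simple quadratic sort for tiny ranges. */
              static void
              small_sort(int *a, size_t n)
              {
                  for (size_t i = 1; i < n; i++) {
                      int key = a[i];
                      size_t j = i;
                      while (j > 0 && a[j - 1] > key) {
                          a[j] = a[j - 1];
                          j--;
                      }
                      a[j] = key;
                  }
              }

              /* Quicksort that bottoms out into the simple sort for small runs. */
              static void
              hybrid_sort(int *a, size_t n)
              {
                  if (n <= CUTOFF) {
                      small_sort(a, n);
                      return;
                  }
                  /* Lomuto partition around the last element. */
                  int pivot = a[n - 1];
                  size_t p = 0;
                  for (size_t i = 0; i + 1 < n; i++) {
                      if (a[i] < pivot) {
                          int t = a[i]; a[i] = a[p]; a[p] = t;
                          p++;
                      }
                  }
                  int t = a[p]; a[p] = a[n - 1]; a[n - 1] = t;

                  hybrid_sort(a, p);                  /* elements below the pivot */
                  hybrid_sort(a + p + 1, n - p - 1);  /* elements above the pivot */
              }
              ```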

              Deciding where and when to perform this level of optimisation is itself a fun (!) exercise, where all your good work last week is totally ruined when the next CPU comes along with different code and/or data cache responses (especially in the days when those caches started appearing for the first time in CPUs we could actually afford to buy).

              1. Orv Silver badge

                Re: The bubble has burst

                Bubble sort also outperforms most (but not all) other sorts when the list is already sorted, or nearly so.

        2. Wanting more

          Re: The bubble has burst

          Isn't quicksort recursive? That can cause problems in environments which have limited stack space / memory. Sometimes an exchange sort is the better choice.

          1. Michael Wojcik Silver badge

            Re: The bubble has burst

            Quicksort can be (and often is) implemented iteratively rather than recursively, if there's a ceiling on N (which there is in all real-world implementations). It will still need O(lg N) auxiliary space but the coefficients are small and if lg N is always less than, say, 32, it's unlikely to be a problem.

            Even with recursive Quicksort, if you're careful, you can keep your levels of recursion to O(lg N).

            And even in a heavily-constrained environment, gnome sort would beat bubble sort, because it would use fewer registers.
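
            A minimal sketch of the iterative version being described (illustrative C, not any particular library's implementation): an explicit stack of pending subranges replaces recursion, and always deferring the larger half while looping on the smaller bounds the stack depth at O(lg N), so 64 slots cover any array that fits in a size_t:

            ```c
            #include <stddef.h>

            static void
            iter_qsort(int *a, size_t n)
            {
                struct { size_t lo, hi; } stk[64];  /* O(lg N) pending ranges */
                size_t top = 0;
                size_t lo = 0, hi;

                if (n < 2)
                    return;
                hi = n - 1;

                for (;;) {
                    while (lo < hi) {
                        /* Lomuto partition around the last element. */
                        int pivot = a[hi];
                        size_t p = lo;
                        for (size_t i = lo; i < hi; i++) {
                            if (a[i] < pivot) {
                                int t = a[i]; a[i] = a[p]; a[p] = t;
                                p++;
                            }
                        }
                        int t = a[p]; a[p] = a[hi]; a[hi] = t;

                        /* Defer the larger side; keep working on the smaller. */
                        if (p - lo < hi - p) {
                            stk[top].lo = p + 1; stk[top].hi = hi; top++;
                            if (p == lo)
                                break;      /* left side is empty */
                            hi = p - 1;
                        } else {
                            if (p > lo) {
                                stk[top].lo = lo; stk[top].hi = p - 1; top++;
                            }
                            lo = p + 1;
                        }
                    }
                    if (top == 0)
                        break;
                    top--;                  /* resume a deferred range */
                    lo = stk[top].lo;
                    hi = stk[top].hi;
                }
            }
            ```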

        3. Charlie Clark Silver badge

          Re: The bubble has burst

          See https://en.wikipedia.org/wiki/Timsort

        4. spuck

          Re: The bubble has burst

          This was along my line of thinking. Maybe bubblesort was used because it's a quick and dirty way to do "good enough" cheaply, and qsort() was not implemented in the kernel code. Once the system is booted and you can link to the standard C library, then qsort(), by all means...

          1. Michael Wojcik Silver badge

            Re: The bubble has burst

            For small N the Standard C qsort() is not going to perform well, because comparisons are expensive (a call through a function pointer for each one). Generally, for small N, you don't care; but if you know N is going to be small, and performance is important anyway, then inlining some other sort – even one that's not O(N lg N) in time complexity – is better.
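
            For reference, the interface in question - every comparison goes through the comparator function pointer, which the compiler generally can't inline:

            ```c
            #include <stdio.h>
            #include <stdlib.h>

            /* Comparator for qsort(): correct and fully generic, but it costs
             * an indirect call per comparison, which resists inlining. */
            static int
            cmp_int(const void *pa, const void *pb)
            {
                int a = *(const int *)pa, b = *(const int *)pb;
                return (a > b) - (a < b);
            }

            int
            main(void)
            {
                int nums[] = { 42, 7, 19, 3, 27 };
                size_t n = sizeof nums / sizeof nums[0];

                qsort(nums, n, sizeof nums[0], cmp_int);
                for (size_t i = 0; i < n; i++)
                    printf("%d ", nums[i]);
                putchar('\n');
                return 0;
            }
            ```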

    2. Paul Crawford Silver badge

      Re: The bubble has burst

      The microVM abstraction is a great idea - for cases when host & guest are cooperative.

      My most common use-case for a VM is running an older OS for software support, or to build binaries for running on other old machines, and there the traditional hardware emulation is more or less a necessity, as such OSes never even contemplated virtualisation.

      1. An_Old_Dog Silver badge

        Hold Your Horses

        One of the current major use cases for hardware emulators such as Qemu, VirtualBox, etc. is running older OSes for which one does not have the source code and which are no longer supported.

        Frequently the older OS is being run in order to run older, unsupported application software.

        This wonderful new microVM system requires mutual knowledge and cooperation between the microVM and the guest.

        As the microVM hypervisor is "improved", the features and interfaces will change, which in turn means no-longer-supported OSes will not be modified to run under the new microVM. This will stop you from using the older, static OS with the new-and-improved microVM. Running older versions of the microVM hypervisor may expose one to unacceptable security risks. Thus the need for hardware emulation.

        From TFA: The idea is that you don't need to know anything about the infrastructure: your program calls another program, and the management tooling spawns as many instances as needed to run that specific operation, return the result, and then delete the VMs used to run the calculations. You never need to know where it happened or how.

        You do need to know, if you are doing things at large scale. Our ancestors thought the forests, fishes, and game animals were limitless. They were wrong. A CPU context switch is one of the most expensive operations, and enough guppies can eat a whale.

        This new technology can be useful, but it is no panacea.

        1. matjaggard

          Re: Hold Your Horses

          I don't think anyone is suggesting replacing all virtual machines with this.

          This will be ideal for running certain things: Docker on Mac, where the performance is currently awful, and lots of different flavours of function-as-a-service. Containers might soon be a thing of the past, since this could remove 95% of their benefit over VMs.

          1. Michael Wojcik Silver badge

            Re: Hold Your Horses

            Lightweight VMs have a number of advantages over containers, such as kernel separation.

    3. NeilPost

      Re: The bubble has burst

      I was surprised that hypervisor and OS developers have not been doing this abstraction for years - with Windows, for example, reporting a VMware or Hyper-V motherboard.

      However I can see it leading to a whole new world of security pain, with fake or bent VirtIO being injected to scrape data or allow nefarious access, with guest OSes not doing a pile of 'unnecessary' compute as they 'trust' the Hypervisor underneath.

  3. trevorde Silver badge

    Meanwhile on Windows

    Updating ... 10% complete ... Do not turn off your computer ... Windows will restart automatically after installing updates ...

  4. jake Silver badge

    Strangely enough ...

    ... I spec a BSD when I require uptime measured in years. Possibly decades.

    1. Doctor Syntax Silver badge

      Re: Strangely enough ...

      And when you need a new process you just exec one instead of firing up a whole new kernel to run it. Seriously, if I follow the description correctly, this is what's happening: a job needs a new process, so you spend 25ms firing up a new server; the new server then execs the required process; and somewhere along the line the new server also has to work out how to connect to the server that needed it. Marketing then calls it "serverless computing". "Yes, Minister" series one, episode one had a name for that: "getting rid of the difficult bit in the title".

      1. david 12 Silver badge

        Re: Strangely enough ...

        FWIW, I think the actual use case for AWS Lambda is "when there are lots of service requests, quickly spin up new servers and share the load". AWS charges per Lambda call, so yes, they are counting the calls and can manage the load. AWS doesn't have to spin down Lambda servers at the end of each request: it just has to follow the curve up and then down.

        Micro jobs are typically run by clients, not servers, so 'connection' is just normal overhead. Conversely, if your process takes 2 seconds, connection and 25 ms of spin-up are not critical, and you save money on the flexible load demand.

  5. bazza Silver badge

    VM vs Process

    I'm interested in whether or not this microVM idea is better or worse than an alternative, which for the purpose of this chat is as follows: provide a system call shim for the guest application instead of a whole VM.

    A reason why this is possibly viable, for this type of usage, is that such applications aren't necessarily looking for vast swathes of their native OS's utility stack to be present. If you're aiming to put something inside a microVM, you're not aiming to provide SystemD, loads of services, etc. That'd be a macroVM...

    Several OSes already provide such shims - over the years Solaris, QNX, FreeBSD and Windows have all provided Linux system call interfaces for Linux binaries. How hard would it be for Linux to provide, say, a FreeBSD system call interface for FreeBSD applications? Shims take 0ms to be ready...

    Obvs it'd be difficult to provide a Windows system call shim.

    And if one is cutting back a guest OS so much that, in effect, all it's doing is some scheduling (not even that, if the guest application is single-threaded) or passing calls through the thinnest possible layer to the host OS, really, what is the difference between that and the guest application being just another process in the host OS?

    Given that things like encrypted guest memory (thanks to AMD's CPU ideas, and Intel's similar) depend on hardware that can also be used for encrypted process memory, what's the difference between a VM and a process?

    1. katrinab Silver badge
      Meh

      Re: VM vs Process

      The Windows shim you describe does exist. It is called Wine. I've never really seen a use-case for it. Gaming on Linux maybe, but you are probably better off just running full-fat Windows for that.

      1. bazza Silver badge

        Re: VM vs Process

        Strictly speaking, Wine is not a system call shim. It's a reimplementation of the Win32.dll and associated libraries. The equivalent would be FreeBSD recreating glibc, instead of providing the Linux system call interface.

      2. _andrew

        Re: VM vs Process

        Many (many) years ago I made good use of Wine to run a Windows ('95-vintage, I think) command-line assembler for a Motorola DSP. Worked really well, surprisingly, and allowed me to use "real" make and work in a bash shell and other development niceties, without having to maintain and feed an actual Windows box.

      3. david 12 Silver badge

        Re: VM vs Process

        In the opposite direction, the system call shim for Unix on Windows was SFU. MS dropped support some time ago.

        1. bazza Silver badge

          Re: VM vs Process

          SFU was not a system call interface shim. It was a POSIX environment that could be added to Windows. But you could not take a binary for another OS and run it as-is on top of SFU.

          Windows grew a proper Linux system call interface shim in the shape of WSL 1.0, where userland was Linux binaries (not recompiled - stock binaries) running on top of the NT kernel via a shim.

          The import of this is often missed. Windows is, fundamentally, a proactor operating system (callbacks are the way you deal with events in Windows). *nix is, fundamentally, a reactor operating system, in that you get function calls such as select() that work on, well, any file descriptor, and react to events (you can decide what to do with the event after it's been raised; you don't need to commit to a course of action before the event occurs).

          This is such a massive fundamental difference that it shows up in quite a lot of stuff; Boost, ZeroMQ, Cygwin - they've all had to deal with Windows proactor / *nix reactor in various different ways. Then MS did WSL 1.0. This means that, internally, the Windows NT kernel has become capable of supporting reactor-model code. That must have been a fair old change inside Windows.

          Windows has long had an implementation of the select() function call, but it's been a minimal implementation, only for network sockets; you can't use it for files, serial ports, IPC, shared memory, etc.

          Why the big fuss? On a reactor operating system you can implement proactor model code. You can't implement reactor code on a proactor operating system. This is the problem that Cygwin ran into, Boost rolled with, ZeroMQ ignored.

          1. abend0c4 Silver badge

            Re: VM vs Process

            You can't implement reactor code on a proactor operating system.

            I think that needs a little further explanation.

            If I wanted to handle asynchronous callbacks in a synchronous event loop, could I not simply store the salient details from the callback in a queue for the event loop to process later?

            The problems with "select" are well-known and long-standing (hence epoll, kqueue, etc). It only came into being with the sockets API when networking was added to BSD. Before that, there wasn't really much of a concept of asynchronous programming in Unix (unlike, say, RSX) with the exception of signals and the rather limited and grudging way it was introduced has cast a long shadow.

            1. bazza Silver badge

              Re: VM vs Process

              It's quite illuminating looking back through Unix and Windows history.

              Back in the bad old days, implementations of select() in various POSIXes / Unixes would have to poll each file descriptor for readiness. This was because the operating system didn't have the support for scheduling processes in response to interrupts from devices. Possibly, the devices themselves didn't raise interrupts anyway (things were a lot more primitive). This made select() (or anything like it) pretty inefficient. As POSIXes / Unixes improved, the whole device driver and kernel architectures changed, whereby a process could be properly suspended, and woken up as a result of the device underpinning the file descriptors becoming ready.

              Meanwhile, over in Transputers, the very essence of the whole programming model was waiting synchronously on channels passing data (Communicating Sequential Processes - a useful evolution of the Actor Model that ZeroMQ implements. CSP is now coming back into fashion in languages like Go, Rust, and Erlang of course).

              Windows went another way - asynchronous call backs.

              When the Cygwin developers came to implement their select() (and similar functions), they realised to their horror that, fundamentally, it was impossible to have a process block on Windows waiting for a bunch of events, unless they were all network sockets (in which case Windows' own select() would do the job). So, in Cygwin, they had to spin up a thread per file descriptor, each one polling that descriptor. Very old school, very inefficient (unless all the file descriptors were actually sockets). Somewhere in the mists of the Cygwin dev chat there's a hilarious thread of conversation where they're exploring how to implement select(), and the disbelief that it was impossible!

              When Boost came to do their network aio library they chose to do async call backs, purely so that Boost could be implemented sanely on Windows. Dig around deep enough in Boost's documentation and you can find this reasoning. Boost's documentation uses the words "Proactor" and "Reactor" in discussing the topic.

              ZeroMQ straight up does not support IPC pipes on Windows, because it's impossible to implement ZeroMQ's polling on Windows.

              The issue with trying to do reactor in Windows is that all you can do is wait until some I/O has been completed. So, for example, reading from an IPC pipe. In Windows, you have to have some sort of asynchronous execution (a thread, some sort of async keyword, all depending on the language), and all that can do is carry out a read of some amount of data from the pipe. It blocks until the read operation is complete. Sure, you can then pass the results on to some other main thread. However, the point is that the read operation has already happened; it's over and done with by this point in time. So, what happens if the main thread decides that, actually, owing to other events on other objects, reading from that IPC pipe was not the right thing to do? Or that reading that amount of data was insufficient / too much? Or that the pipe should have been closed and not read at all?

              This might sound overly fussy, but it does matter, especially in code that's got to deal with errors. In Windows, you can have an async call back stuck trying to read a socket that is never, ever going to complete because the other end has died. Sure, eventually something in the OS may timeout and close the socket (causing an exception in the blocked read). But it's a right nuisance if all you want is for the application to close, but it can't because there's a thread stuck somewhere trying to read data that's never going to turn up. With a reactor system, you don't even begin to try the read() until your reactor has been told that there is data to read, so the application can easily quit cleanly.

              With select(), epoll() or any other reactor, no actual I/O has happened on the file descriptor yet. On Windows, your code has to read data to know the event has occurred (sometime in the past).
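
              A minimal sketch of that reactor pattern (illustrative C; assumes two already-open descriptors): select() reports readiness before any I/O has been attempted, so the program remains free to read, close, or shut down cleanly:

              ```c
              #include <sys/select.h>
              #include <unistd.h>

              /* Reactor loop over two descriptors: block until one is readable,
               * then decide what to do. No read has been issued at that point. */
              static int
              reactor_loop(int fd_a, int fd_b)
              {
                  for (;;) {
                      fd_set rfds;
                      FD_ZERO(&rfds);
                      FD_SET(fd_a, &rfds);
                      FD_SET(fd_b, &rfds);
                      int maxfd = (fd_a > fd_b) ? fd_a : fd_b;

                      /* Wait for readiness, not completion. */
                      if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
                          return -1;

                      if (FD_ISSET(fd_a, &rfds)) {
                          char buf[512];
                          ssize_t r = read(fd_a, buf, sizeof buf);
                          if (r <= 0)
                              return 0;  /* peer gone: quit cleanly, nothing stuck */
                          /* ... handle r bytes from fd_a ... */
                      }
                      if (FD_ISSET(fd_b, &rfds)) {
                          /* The event is known before any read is issued; we could
                           * equally decide here to close fd_b without reading it. */
                          char buf[512];
                          ssize_t r = read(fd_b, buf, sizeof buf);
                          if (r <= 0)
                              return 0;
                          /* ... handle r bytes from fd_b ... */
                      }
                  }
              }
              ```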

              This is why WSL 1.0 is so fascinating. MS must now have built in support for reactors into the kernel that can operate efficiently on more than just sockets. Otherwise, a ton of stuff (e.g. dBus - not really found on WSL 1, but something that is totally dependent on an efficient reactor for IPC pipes) would run like a dog and take a ton of CPU time.

              Of course, I'm assuming that they have made NT kernel changes to make reactors in WSL 1.0 efficient. If they have, then it'd be quite nice if they exposed that functionality through win32.dll as well.

              1. abend0c4 Silver badge

                Re: VM vs Process

                I'm grateful for the additional details and, If nothing else, you've confirmed my hatred of design patterns: giving things names doesn't contribute to their explanation and often simply leads to their obfuscation.

                I think there's a huge difference between You can't implement reactor code on a proactor operating system and the specific case that the exact semantics of select can't be reproduced efficiently on Windows.

                It clearly can't because, at the very least, Windows doesn't have the "everything is a file" paradigm that Unix does - for example, IPC pipes. That doesn't mean it's impossible for ZeroMQ to support, it means they'd need significantly more platform-specific code to do so and it's clearly not something they see as being worthwhile. I believe there was some discussion of using IOCP that fizzled out.

                1. bazza Silver badge

                  Re: VM vs Process

                  Jargon can indeed be a problem. However, I liked Boost's labelling of Proactor and Reactor - a proactor is proactive in deciding what to do (by pointing to a callback in advance of the event), whereas a reactor can decide what to do as / when. "Actor Model" and "Communicating Sequential Processes" are fairly opaque in their meaning!

                  I think it quite interesting that three major open source projects have either tried and bodged, folded, or not bothered. I've also tried writing something like select()'s functionality on Windows involving sockets and IPC pipes, and gave up. It was simpler just to do everything as a socket. There's nothing in the docs for IO completion ports that seems to offer a way either (the *Completion* part is probably a big hint!).

      4. An_Old_Dog Silver badge

        Re: VM vs Process

        I use Wine to run a bunch of useful little Windows-only, niche programs which will never be updated, let alone re-written for Unix/Linux. These programs have no Unix/Linux equivalent.

      5. robinsonb5

        Re: VM vs Process

        I have a portable audio recorder which stores recordings on an SD card in a proprietary format. There's a Windows-only vendor-supplied utility for converting them to WAV files, and I'm very grateful for the ability to run it under WINE.

      6. Bill Gray
        Megaphone

        Re: VM vs Process

        About three decades ago, I quit my day job and wrote desktop planetarium software for DOS, then moved it to Windows. When it came time to do a Linux version, I tried simply running the program under Wine. There were a few bugs, almost all of which turned out to be "me" problems (bugs I'd made that had somehow gone undetected on "real" Windows). Fixing these and working around the few remaining Wine issues, I effectively had the desired Linux version of my program, with very little effort.

        The required changes were roughly at the level I've encountered when Microsoft created a new fork of Windows (a few changes required for Vista, then for Win7, etc.) You do have the problem that vendors will make those fixes for a Microsoft fork of Windows, but will rarely make them for Wine.

        Note that this was about ten years back. Wine has improved a fair bit since then.

        If you have source code, Wine is an excellent idea. I'd recommend it just for the ability to catch bugs. I've had decent success with closed-source code on Wine, but when it fails, there's less you can do to fix/work around it.

        1. Liam Proven (Written by Reg staff) Silver badge

          Re: VM vs Process

          [Author here]

          > Fixing these and working around the few remaining Wine issues, I effectively had the desired Linux version of my program, with very little effort.

          This is essentially how Corel ported the Windows edition of WordPerfect Office 2000 to Linux.

          Sadly, they signed a licensing deal with Microsoft to get VBA and the MS Office 2000 look and feel, and one of MS' terms of the deal was that Corel discontinue all Linux development.

          Corel Linux OS got sold off to become Xandros. The NetWinder ARM desktop was spun off. And both the native Linux WordPerfect and the `winelib` port were simply deep-sixed.

          Then, for MS Office XP, MS simply changed the look and feel and the licensed version was now obsolete and pointless.

      7. that one in the corner Silver badge

        Re: VM vs Process

        > It is called Wine. I've never really seen a use-case for it.

        So, you can never see a use for running a Windows-only executable on a Linux box?

        After all the stories about how "oh, we would change but we rely on X"?

        Personally, I use Wine to run a favourite Windows-only program that works perfectly well for me but hasn't been updated for over a decade: I just like the way it works and am used to it. Dual-booting would be a bit bleeping ridiculous (it is an editor, so edit, reboot, test, reboot, edit, reboot gets boring fast), and even a VM (with the faff of file sharing - OK, that ought to only need doing once, but - ought) is hardly as smooth an experience as just having its window appear amongst the rest.

    2. Vometia has insomnia. Again.

      Re: VM vs Process

      Back in the ICL days, VME called what is nowadays usually referred to as a process as a VM, and its idea of a process is what's now termed a thread; not sure if the latter was widely used (or used at all) back then, though. But AFAIK that's just a terminology difference rather than any equivalence of functionality regarding a "microVM".

      1. bazza Silver badge

        Re: VM vs Process

        I guess it's all down to what one means by a "machine". These days, we tend to think of it as the thing that has an IP address, shared by all its processes. Containers have blurred that, microVMs seem to be an extreme way of achieving IP addresses per process, and so on.

        I think people assume that there is somehow stronger separation between code running in VMs than in processes scheduled by a single OS, that one process definitely cannot interact with another or escape from the VM into the wider machine. I'm not sure that's true any more, certainly not with microVMs where half the action is actually occurring in the host.

      2. TV nerd

        Re: VM vs Process

        VME's idea of a VM was a little more involved, as far as I remember it. In particular, the lower half of the address space was "per-VM". Each VM could also be run with its own set of system/application libraries. Hardware-assisted security features helped prevent cross-VM interactions.

        VMs could also host emulations, as finally happened with George and, to a lesser extent, Unix.

        I guess my take is that the combination of the "micro VM" and "containers" gets Linux systems to where VME was in, er, 1973?

    3. Orv Silver badge

      Re: VM vs Process

      I think this probably makes sense in cloud scenarios where you're often paying by the minute for resources consumed. You don't want extra computing power just lying around sucking up billable minutes.

  6. Bebu
    Windows

    Full circle?

    《designing OSes specifically to run as guests under another OS》

    I was thinking this is not so different from paravirtualization, which was a pretty big part of x86_32 virtualization in the early days due to the limitations of the 32-bit architecture.

    I vaguely recall VMware used to trap privileged/illegal instructions and patch the running code in the VM to make the equivalent paravirtualization syscall. Open source kernels could be recompiled to use the para. calls directly.

    I have wondered how much of that went back to User Mode Linux (UML), which I had fiddled with at the time and thought insanely clever. :)

    1. Vometia has insomnia. Again.

      Re: Full circle?

      It was the case with IBM's VM, at least sometimes. One example is MUSIC, which could run directly on the hardware but required VM if you wanted any networking; and I think IBM's Unix-for-S/370 needed it for various other "VM assists" too. Plus various other mainframe OSes just ran better under VM than they did on native hardware, because it did the paging and stuff that they didn't bother with themselves.

    2. emfiliane

      Re: Full circle?

      Xen restarted modern paravirtualization (in the old times *everything* was a little bespoke), and didn't even get unmodified-guest virtualisation until version 3.0, when it also got 64-bit support. It was around the same time that VMware decided to throw in with the PV world, and aside from Linuxes and Unixes went so far as to get Microsoft to make a few kernel changes just for them.

      Then PV dried up and effectively disappeared a couple of years later, scrubbed from both of their marketing materials. I guess 64-bit and processor VM support really did obsolete the whole idea.

  7. Jamie Jones Silver badge
    Happy

    25ms?

    That's about as fast as my Sinclair ZX81!

    1. milliemoo83

      Re: 25ms?

      And still crashes less than Starbug.

  8. Ian Johnston Silver badge

    Sorting the init routines suggests they can be run in any order. So why sort them?

    1. that one in the corner Silver badge

      To get your services/servers up before your clients?

      Like the way that you name init[1] scripts 00fred, 99lucy and 50jim, expecting that they'll actually get sorted into "numeric" order before being invoked.

      [1] other init schemes are available, you heathen!

  9. adam 40
    Go

    Sorted!

    innit!???

  10. RLWatkins

    This has gone beyond being merely ridiculous.

    I love the performance, but....

    Firing up a VM, even a lightweight one, to execute a single function or to run a sub-second process is the sort of thing that impresses gee-whiz kids, not experienced designers of actual systems. Don't ever do this. There are better ways.

    Yet in the real world we *do* see systems designed this way, and with nobody's name on them there's no way for us to know who never to hire.

    Like the man said, Hell is full and gee-whiz kids are walking the Earth.

  11. that one in the corner Silver badge

    first, get it working at all; then, make it go fast.

    If I may offer the version I was taught:

    First, get it working.

    Second, get it working correctly.

    Third, get it working fast.

    I would hate to think that El Reg was in any way accidentally condoning the current trend of apparently forgetting step two.
