Re: VM vs Process
It's quite illuminating looking back through Unix and Windows history.
Back in the bad old days, implementations of select() in various POSIXes / Unixes had to poll each file descriptor for readiness. This was because the operating system had no support for scheduling processes in response to interrupts from devices; quite possibly the devices themselves didn't raise interrupts anyway (things were a lot more primitive). That made select() (or anything like it) pretty inefficient. As POSIXes / Unixes improved, the device driver and kernel architectures changed, whereby a process could be properly suspended and then woken up when the device underpinning one of its file descriptors became ready.
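To make that concrete, here's a minimal sketch of the readiness model as it looks on a modern Linux, using epoll (error handling kept deliberately thin): the process sleeps in the kernel until the descriptor is marked ready, and only then issues the read.

```cpp
// Minimal sketch of the readiness model on Linux: the process is suspended
// inside epoll_wait() until the kernel, driven by the device, marks the
// descriptor readable; only then is any read actually issued.
#include <sys/epoll.h>
#include <unistd.h>

ssize_t read_when_ready(int fd, char* buf, size_t len) {
    int ep = epoll_create1(0);
    if (ep < 0) return -1;

    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);

    epoll_event out{};
    // Sleeps here: no polling loop, the scheduler wakes us when fd is ready.
    int n = epoll_wait(ep, &out, 1, -1);
    close(ep);
    if (n <= 0) return -1;

    // Only now does any I/O actually happen on the descriptor.
    return read(fd, buf, len);
}
```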
Meanwhile, over in Transputer land, the very essence of the whole programming model was waiting synchronously on channels passing data (Communicating Sequential Processes, a close relative of the Actor Model that ZeroMQ implements; CSP is now coming back into fashion in languages like Go and Rust, and Erlang's actors are a near cousin, of course).
Windows went another way: asynchronous callbacks.
When the Cygwin developers came to implement their select() (and similar functions), they realised to their horror that, fundamentally, it was impossible to have a process block on Windows waiting for a bunch of events, unless they were all network sockets (in which case Windows' own select() would do the job). So, in Cygwin, they had to spin up a thread per file descriptor, each one polling its descriptor for readiness. Very old school, very inefficient (unless all the file descriptors were actually sockets). Somewhere in the mists of the Cygwin dev chat there's a hilarious thread of conversation where they're exploring how to implement select(), and the disbelief that it was impossible!
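For illustration only (this is the shape of the workaround, not Cygwin's actual code), a thread-per-descriptor emulation looks roughly like this; `probe()` is a hypothetical callback answering "is this handle readable right now?" for whatever kind of object the descriptor represents:

```cpp
// Rough shape of the thread-per-descriptor workaround: one helper thread per
// handle, each busy-polling for readiness, while the "select" caller sleeps
// on a condition variable until one of them reports success.
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

int wait_any(const std::vector<int>& fds, const std::function<bool(int)>& probe) {
    std::mutex m;
    std::condition_variable cv;
    std::atomic<int> ready_fd{-1};

    std::vector<std::thread> pollers;
    for (int fd : fds) {
        pollers.emplace_back([&, fd] {
            while (ready_fd.load() == -1) {
                if (probe(fd)) {                       // poll the handle...
                    int expected = -1;
                    {
                        std::lock_guard<std::mutex> lk(m);
                        ready_fd.compare_exchange_strong(expected, fd);
                    }
                    cv.notify_one();
                    return;
                }
                // ...and if it isn't ready, sleep a bit and poll again.
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
            }
        });
    }

    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return ready_fd.load() != -1; });
    lk.unlock();
    for (auto& t : pollers) t.join();
    return ready_fd.load();
}
```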
When Boost came to do their asynchronous network I/O library (Boost.Asio), they chose async callbacks, purely so that it could be implemented sanely on Windows. Dig around deep enough in Boost's documentation and you can find this reasoning; the documentation uses the words "Proactor" and "Reactor" in discussing the topic.
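The completion-handler style looks roughly like this in Boost.Asio (a minimal sketch; assume the socket has been connected elsewhere):

```cpp
// Minimal Boost.Asio sketch of the proactor / completion-handler style:
// you hand the library a buffer and a callback; the callback only fires
// once the read has already completed (or failed).
#include <boost/asio.hpp>
#include <array>
#include <iostream>

int main() {
    boost::asio::io_context io;
    boost::asio::ip::tcp::socket sock(io);   // assume connected elsewhere

    std::array<char, 512> buf;
    sock.async_read_some(boost::asio::buffer(buf),
        [&](const boost::system::error_code& ec, std::size_t n) {
            // By the time we get here, the I/O has already happened.
            if (!ec)
                std::cout << "read " << n << " bytes\n";
        });

    io.run();   // the proactor event loop dispatches completions
}
```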
ZeroMQ straight up does not support IPC pipes on Windows, because it's impossible to implement ZeroMQ's polling there.
The issue with trying to do a reactor on Windows is that all you can do is wait until some I/O has completed. Take reading from an IPC pipe, for example. On Windows you need some sort of asynchronous execution (a thread, some sort of async keyword, depending on the language), and all it can do is carry out a read of some amount of data from the pipe; it blocks until that read operation is complete. Sure, you can then pass the result on to some main thread. However, the point is that the read has already happened; it's over and done with by that point in time. So what happens if the main thread decides that, owing to other events on other objects, reading from that IPC pipe was not the right thing to do after all? Or that reading that amount of data was insufficient / too much? Or that the pipe should have been closed and not read at all?
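Here's a portable sketch of that shape (using a POSIX pipe and a plain thread purely for illustration, not Windows APIs): by the time the main thread sees anything, the read has already been performed and can't be taken back.

```cpp
// Portable sketch of the proactor shape described above: a worker thread
// performs the blocking read, and the main thread only ever sees data that
// has *already* been read. If the main thread has meanwhile decided the read
// was the wrong thing to do, it's too late.
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <unistd.h>
#include <vector>

std::mutex m;
std::queue<std::vector<char>> completed;   // reads that have already happened

void reader_thread(int pipe_fd) {
    std::vector<char> buf(1024);
    // Blocks until the read completes; there is no way to "un-ask" for it.
    ssize_t n = read(pipe_fd, buf.data(), buf.size());
    if (n > 0) {
        buf.resize(n);
        std::lock_guard<std::mutex> lk(m);
        completed.push(std::move(buf));
    }
}

int main() {
    int fds[2];
    if (pipe(fds) != 0) return 1;          // stand-in for an IPC pipe
    std::thread t(reader_thread, fds[0]);

    // ... main thread logic; by the time it pops from `completed`, the bytes
    // have already been consumed from the pipe, whether or not that still
    // made sense.
    write(fds[1], "hello", 5);
    t.join();
    std::printf("completed reads queued: %zu\n", completed.size());
}
```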
This might sound overly fussy, but it does matter, especially in code that's got to deal with errors. On Windows, you can have an async callback stuck trying to read a socket that is never, ever going to complete because the other end has died. Sure, eventually something in the OS may time out and close the socket (causing an exception in the blocked read). But it's a right nuisance if all you want is for the application to close, yet it can't because there's a thread stuck somewhere trying to read data that's never going to turn up. With a reactor system, you don't even begin to try the read() until your reactor has been told that there is data to read, so the application can easily quit cleanly.
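For contrast, a reactor makes that clean shutdown almost trivial; a common POSIX idiom (sometimes called the self-pipe trick) is to wait on a "quit" pipe alongside the data descriptor, sketched below:

```cpp
// Reactor-style shutdown, assuming POSIX: the loop waits for readiness on
// both the data fd and a "quit" pipe. To stop cleanly, another thread just
// writes a byte to the quit pipe; no read is ever left stranded, because no
// read is attempted until select() reports there is data.
#include <sys/select.h>
#include <unistd.h>
#include <algorithm>

void reactor_loop(int data_fd, int quit_fd) {
    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(data_fd, &readable);
        FD_SET(quit_fd, &readable);
        int maxfd = std::max(data_fd, quit_fd);

        if (select(maxfd + 1, &readable, nullptr, nullptr, nullptr) < 0)
            break;

        if (FD_ISSET(quit_fd, &readable))
            break;                        // clean shutdown, nothing stuck

        if (FD_ISSET(data_fd, &readable)) {
            char buf[512];
            ssize_t n = read(data_fd, buf, sizeof buf);  // won't block now
            if (n <= 0)
                break;                    // peer closed or error
            // ... handle n bytes ...
        }
    }
}
```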
With select(), epoll() or any other reactor, no actual I/O has happened on the file descriptor yet. On Windows, your code has to read the data in order to know the event has occurred (at some point in the past).
This is why WSL 1.0 is so fascinating. MS must now have built reactor support into the NT kernel that can operate efficiently on more than just sockets. Otherwise a ton of stuff (e.g. D-Bus, not really found on WSL 1, but something that is totally dependent on an efficient reactor over IPC pipes) would run like a dog and take a ton of CPU time.
Of course, I'm assuming that they have made NT kernel changes to make reactors in WSL 1.0 efficient. If they have, it'd be quite nice if they exposed that functionality through the Win32 API as well.