Simpler Solution?
Er, why not just use FreeBSD?
If it's FreeBSD levels of performance that they want, using FreeBSD guarantees getting that. Improving Linux's network performance is certainly possible, but success is not guaranteed.
Facebook wants better comms performance from the Linux kernel, and is recruiting developers to get it. Its job ad, here, says the House of Zuck wants a Linux kernel software engineer who will focus on the networking subsystem. “Our goal over the next few years is for the Linux kernel network stack to rival or exceed that of …
Probably because in other respects, Linux fits the needs of Facebook better. Not being Facebook, I cannot say why. Linux has wider hardware support, but that probably is a non-issue for a company that buys servers by the boatload, and therefore could specify devices that work well with FreeBSD. Perhaps their software has developed dependencies on the details of Linux.
@DougS,
"FreeBSD may have superior network performance, but Linux has superior performance in most other metrics that matter for a kernel."
Maybe, but they've seemingly identified network performance as their bottleneck. Hence their saying they want to improve on that. Other kernel performance metrics (memory allocation, context switch times, etc, etc) are clearly irrelevant to their system performance, so FreeBSD as-is would bring them a performance benefit (assuming they've not got dependencies on the specifics of Linux).
It's not surprising that their bottleneck is network performance. As soon as you start handling vast amounts of data the system I/O performance is king and almost nothing else matters in comparison.
For instance, one of the biggest problems GPUs have in supercomputers is that they're not directly addressable node to node. To get data from one GPU to another in a different node, it has to go via a PCIe bus to a CPU/memory, back across the PCIe bus to some sort of NIC, across some sort of interconnect (Myrinet, whatever), from the destination NIC across another PCIe bus into another CPU/memory and finally across that PCIe bus one last time to the destination GPU. Great compute performance, terrible I/O, resulting in sustained performance being nothing like as good as peak performance (though of course that's very application dependent).
@John Robson,
"maybe if they used BSD then the memory *would* be a bottleneck, and a slower one than the linux networking bottleneck."
All right, suppose that BSD was crap at things like memory, scheduling, I/O, etc. If so, how does it manage to overcome all that to deliver better network performance than Linux? A network stack is a pretty thorough workout for pretty much everything the OS has to offer.
Short answer - FreeBSD is not slow at those other things.
"All right, suppose that BSD was crap at things like memory, scheduling, I/O, etc. If so, how does it manage to overcome all that to deliver better network performance than Linux? A network stack is a pretty thorough work out for pretty much everything the OS has to offer."
Perhaps because it can crank out superior performance at certain optimized workloads. It's sort of like quoting an "up to" Internet speed: they can achieve it in controlled conditions, but attempting to achieve this in a general workload may not be possible.
IOW, FreeBSD may have a better network stack but it's probably too specialized, and Facebook's workload can't take advantage of this. So they need something with FreeBSD's network performance but better tuned for a different kind of workload.
FreeBSD may have superior network performance, but Linux has superior performance in most other metrics that matter for a kernel.
No it doesn't. Linux is trying to be all things to all people, and just isn't optimised for some of the key workloads that Facebook has. FreeBSD on the other hand has benefited from being a bit niche, in that companies that use it extensively - such as Yahoo! - have been able to get tweaks into it that do target those workloads.
FreeBSD has also benefited from not having so many developers and commercial entities pulling it in so many directions, nor has it suffered from well-meaning developers screwing up performance with things like "O(1) algorithms everywhere" that ignored the bigger picture by creating bottlenecks at higher-level abstractions.
Perhaps it's from a philosophical point of view, that the GPL is their preferred license.
However, the more accurate reasoning is probably that any engineer working to improve the Linux network stack is improving the data centre, mobile users and a small number of desktop users at the same time. Three birds, one stone.
@Nextweek,
"Perhaps its from a philosophical point of view, that the GPL is their preferred license."
Perhaps, perhaps not. They are a money making profit machine, so they'd be quite motivated to use whatever it takes to maximise that. However they have open sourced quite a lot of the things they've done, which is certainly to their credit, so who knows!
"any engineer working to improve the Linux network stack is improving the data centre, mobile users and a small number of desktop user at the same time"
That assumes that they push their code upstream. They could just fork the kernel, and since they aren't distributing anything with Linux in it, they can keep the improvements to themselves while staying in compliance with the GPL.
"Perhaps its from a philosophical point of view, that the GPL is their preferred license."
I don't see the relevance. The BSD licence lets them do pretty much what they want with the OS. The stuff they develop themselves can be licenced how they choose.
I suspect it's more likely what others have mentioned. Their own stuff is tied into the Linux way, e.g. expecting system files in certain places, reliance on /proc, or any of the other differences between the two systems.
My guess would be "for performance reasons". You can gain quite a lot of performance by avoiding context switches, but in networking the classical layer model makes that difficult. You may want either to move network drivers to user mode, or to move application APIs to kernel mode.
I honestly can't say what FB would do, but I know what I would want to research given the objective of improving the performance of the network stack.
Why would you have context switches with epoll[1]?
[1]: http://linux.die.net/man/4/epoll
Edit: actually, did you mean mode switches (userland->kernel->userland)? They are not expensive enough to matter, and in the end the bytes have to be pushed to the network card (or read from it, but pushing is usually the dominant part).
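To make the epoll point concrete, here is a minimal sketch of readiness-based I/O, where one thread services many sockets instead of parking a blocked thread on each. It uses Python's selectors module, which picks epoll on Linux and kqueue on FreeBSD. The function name echo_once and the use of local socket pairs in place of real network connections are assumptions made purely for illustration.

```python
import selectors
import socket


def echo_once(messages):
    """Send each message down its own socket pair and collect what a single
    event loop reads back, with no thread per connection.

    Hypothetical helper for illustration; messages must be short and non-empty.
    """
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on FreeBSD
    writers = []
    for index, message in enumerate(messages):
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        # Ask the kernel to tell us when this socket has data to read.
        sel.register(reader, selectors.EVENT_READ, data=index)
        writer.sendall(message)
        writers.append(writer)

    received = {}
    while len(received) < len(messages):
        # One call reports every socket that is ready; nothing blocks per socket.
        for key, _events in sel.select(timeout=1.0):
            received[key.data] = key.fileobj.recv(4096)
            sel.unregister(key.fileobj)
            key.fileobj.close()

    for writer in writers:
        writer.close()
    sel.close()
    return [received[i] for i in range(len(messages))]


if __name__ == "__main__":
    print(echo_once([b"one", b"two", b"three"]))
```

Each recv() is still a trip into the kernel and back, so this avoids per-connection threads and the switching between them, not the userland->kernel->userland mode switches themselves.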
>research given objective
Quite.
A network stack that is intended primarily for routing (or packet filtering) has different design goals to one that's intended only to run as an end system. And you can optimise end systems differently depending on whether you're looking to increase the performance of a small number of streams for a small number of processes or to get the maximum throughput for hundreds or thousands of streams and a large number of processes/threads. General purpose operating systems have to provide performance across a wide range of use cases and they're bound to be suboptimal for many of them.
If Facebook are serious about networking performance, they should really be looking at running as much of the networking stack as possible on hardware designed for that purpose. A typical server-style Ethernet adapter with ring buffers to reduce the interrupt load to manageable levels isn't really where you would want to start if your goal is to maximise performance. At extreme load, what you throw away matters as much as what you process, and finding out when you've run out of receive buffers that half of the packets would have been better discarded is not helpful!
FaceBook offers no original content; their freetard userbase is egotistical and lacks self-respect. The website itself is mostly incomprehensible and confusing on purpose, and privacy is practically non-existent. In short: FaceBook is a mess.
And it runs on PHP? The Tonka toy of programming languages.
So, they want an insider (well, contributor) who they can manipulate into changing Linux code to their liking. A company notorious for scrupulous and invasive information-gathering wants to meddle with the Linux kernel?
How about: Hell No!
Then again, I enjoy Mr Torvalds' rants just like the next guy who isn't the target.
I think you need to look a bit more into what Facebook actually runs on. It might look like PHP in places, but it's far from the PHP most people know, and it isn't exclusively PHP either; that's just the presentation layer. Facebook have also made a lot of their software open source for anyone to use.
The article (and ad) reference adding support for new (IPv6, I think) multicast protocols, and speeding up and stabilizing IPv6. I don't think they're comparing FreeBSD and Linux for opening up a socket and dumping max MB/s (I think both kernels can use a sendfile() style call to read a file off disk straight into an Ethernet device memory buffer, so you can't get much faster than that...), but rather the performance of the IPv6-specific stuff itself.
... actually I have no comment on that.
But, the fact of the matter is that it's informative when one implementation is faster than another, to see if the two have totally different designs (and one design is simply faster), if there's just numerous intangible tweaks, if there's a few adjustments that are the difference (probably not tunables or Facebook would have found "the magic tune for our workloads" and been done with it?) or what.
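On the sendfile() point above, a rough sketch of what that kind of call looks like from userland, using Python's socket.sendfile(), which uses os.sendfile() where the platform offers it and otherwise falls back to plain send(). The function name send_file_over_socket is made up, and a local socket pair stands in for a real network connection, so this shows the API shape only and says nothing about either kernel's actual throughput.

```python
import os
import socket
import tempfile


def send_file_over_socket(path):
    """Push a file into a connected socket with socket.sendfile() and return
    (bytes_sent, bytes_received_at_the_other_end).

    Hypothetical helper for illustration. Keep the file small: nothing drains
    the receiving end until the send has finished, so a large file would
    fill the socket buffer and block.
    """
    sender, receiver = socket.socketpair()
    try:
        with open(path, "rb") as f:
            # The file's bytes are handed to the socket by the kernel where
            # possible, without being copied through a userland buffer.
            sent = sender.sendfile(f)
        sender.close()  # signals end-of-stream to the receiver

        chunks = []
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
        return sent, b"".join(chunks)
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 4096)
    try:
        sent, data = send_file_over_socket(tmp.name)
        print(sent, len(data))
    finally:
        os.unlink(tmp.name)
```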