> no server is more than 10 per cent more burdened than others in 10 jumps instead of 100,
I've read that about ten times. What?
Professor Mikkel Thorup of the University of Copenhagen claims his research team has vastly improved a dynamic load balancing algorithm for server workloads that is already used by tech giants like Google and Vimeo. The algorithm ensures that incoming requests from clients, like those connecting to streaming video services, …
Abstract: In dynamic load balancing, we wish to distribute balls into bins in an environment where both balls and bins can be added and removed. We want to minimize the maximum load of any bin, but we also want to minimize the number of balls and bins that are affected when adding or removing a ball or a bin. We want a hashing-style solution where, given the ID of a ball, we can find its bin efficiently.
So server A is less than 10% more burdened than server B. If B has 50 connections, A has at most 55.
So if the binning system balances load in 10 jumps as opposed to 100, the balancing function is working more efficiently. Each of those jumps has a cost, so achieving balance faster (in fewer moves) is cheaper, and it also means a running cluster stays closer to equilibrium.
At least if the article author's numbers and quoting are right. The text, while interesting, is dense enough that I gotta stare at it over a cup of joe at end of shift. Not lunch material. Too much fiber.
Has anybody built a library implementation of this yet?
I'm reading a marketing press release.
What they are describing sounds a bit like the old probes that pulled server metrics such as CPU and RAM for weighted load balancing, exactly like we used to do about 20 years ago. Yes, it levels out load changes as sessions stop and new ones are added, so nothing new there at all. Same thing with their description of sticky sessions, which seems a bit odd given the move to stateless techniques where possible.
It suggests that people will start using it immediately - well, it's decades old, so I'd guess some already are.
I'm also struggling with the idea of a billion servers for one platform, that seems rather a lot for a world population of 7.9 billion.
But as they say 76.345% of all statistics are made up on the spot.
From my perspective, nothing new here, just a regurgitation of already known problems and already engineered solutions.
No, this is nothing like "the old probes that pulled server metrics such as CPU and RAM when doing weighted load balancing".
To understand what this is about, imagine a distributed object storage system. There are M servers ("bins"). Each incoming object ("ball") is written to one of those servers. You want to distribute them evenly, so that you don't run out of capacity on one server while others have free space.
When writing, it would be easy to pick the server with the most free space. That's what you're describing.
The problem is, how do you *read back* a given object? Either you have to search for it across every server, or you have to maintain a huge database of every object and its location.
To avoid the huge database, you want to locate an object from just a hash of its name. Such an algorithm will necessarily require objects to be moved when you add or remove a storage server, and you want to minimise the number of movements in that case.
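To see why a naive hash isn't enough, here's a toy sketch (my own illustration, not from the paper): placing objects with `hash(name) % M` locates any object instantly, but adding a single server reshuffles almost everything.

```python
import hashlib

def bucket(name: str, num_servers: int) -> int:
    """Map an object name to a server index with a stable hash."""
    h = int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "big")
    return h % num_servers

names = [f"object-{i}" for i in range(10_000)]
before = {n: bucket(n, 10) for n in names}
after = {n: bucket(n, 11) for n in names}   # add one server: 10 -> 11

moved = sum(1 for n in names if before[n] != after[n])
print(f"{moved / len(names):.0%} of objects moved")  # roughly 90% with mod-hashing
```

That mass movement on every membership change is exactly what consistent hashing was invented to avoid.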
This is what storage systems like Ceph and Swift do. They use variations on "consistent hashing" which is mentioned in the paper as the previous state-of-the-art. However, such algorithms give effectively 'random' distribution. If all the objects are the same size then this gives pretty good results, but if you have a handful of huge objects mixed in, you can get badly out of balance.
The paper describes a better algorithm which avoids having a central database, maintains a good balance within some constraint factor, and minimises the number of servers you have to look on to find a particular object.
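A toy sketch of the bounded-loads idea this line of work builds on (this is the earlier "consistent hashing with bounded loads" scheme, not the paper's new algorithm): cap every server at roughly `c` times the average load, and when an object's home server is full, walk clockwise to the next server with spare capacity. Those forward steps are the "jumps" the article is counting.

```python
import bisect
import hashlib
import math
from collections import Counter

def h(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def place_all(servers, names, c=1.10):
    """Consistent hashing with bounded loads: no server exceeds
    ceil(c * average load); overflow walks clockwise ("jumps")."""
    points = sorted((h(s), s) for s in servers)
    cap = math.ceil(c * len(names) / len(servers))
    load = Counter()
    placement = {}
    for name in names:
        i = bisect.bisect(points, (h(name), ""))
        for k in range(len(points)):
            server = points[(i + k) % len(points)][1]
            if load[server] < cap:
                load[server] += 1
                placement[name] = server
                break
    return placement, load

placement, load = place_all([f"server-{i}" for i in range(10)],
                            [f"object-{i}" for i in range(10_000)])
print(max(load.values()), min(load.values()))  # max stays within the 10% cap
```

Reads follow the same clockwise walk, so lookups only ever touch a short chain of servers; the contribution claimed in the article is shrinking how long those chains get under churn.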
This is a big deal. You should give them credit for it, rather than dismissing something you don't understand as being nonsense.