Network congestion algorithms have design flaw, says MIT

Trying to create a network that's fair, equitable, and starvation-free may simply be impossible – at least with current congestion control algorithms (CCAs), an MIT study has found. Published ahead of a conference presentation this week, the MIT paper [PDF] found that, regardless of approach, CCAs like Google's BBR, FAST, and …

  1. cjcox

    Not necessarily related

    MIT is also rebranding itself simply as "Karen"

  2. Version 1.0 Silver badge
    Unhappy

    Congestion algorithm solution

    We've seen congestion issues for years but the current solution is to privatize the organization, pay big bonuses to the people running the organization and then, when congestion becomes a problem, just flush the untreated issues into a river. Congestion issues are being solved everywhere in the UK nowadays in a very profitable manner but let's not go swimming.

    1. Korev Silver badge
      Coat

      Re: Congestion algorithm solution

      Too many people flushing their buffers?

  3. John Smith 19 Gold badge
    Unhappy

    Too many big buffers in too many nodes

    The internet was designed to run with packets (occasionally) being thrown away.

    And every time that happens now, to get a new packet for the data stream you want, all those buffers need to be worked through.

    Unless you split out the jitter- and delay-sensitive packets and give them priority (i.e. your VoIP packets and game control packets vs that big download of pron you've got going on), that's always going to be the case.
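    A rough sketch of that kind of split, assuming a toy two-class scheduler in Python (the class, the dscp field and the marking values are invented here for illustration, not taken from any real router):

      from collections import deque

      # Toy strict-priority scheduler: latency-sensitive traffic (VoIP, game
      # control) is always served before bulk traffic. Field names are made up.
      LATENCY_SENSITIVE_DSCP = {46, 40}   # e.g. EF-style markings (assumption)

      class TwoClassScheduler:
          def __init__(self):
              self.priority = deque()   # VoIP / game control packets
              self.bulk = deque()       # that big download

          def enqueue(self, packet):
              if packet["dscp"] in LATENCY_SENSITIVE_DSCP:
                  self.priority.append(packet)
              else:
                  self.bulk.append(packet)

          def dequeue(self):
              # Strict priority: drain the latency-sensitive queue first.
              if self.priority:
                  return self.priority.popleft()
              return self.bulk.popleft() if self.bulk else None

      sched = TwoClassScheduler()
      sched.enqueue({"dscp": 0, "payload": "bulk download chunk"})
      sched.enqueue({"dscp": 46, "payload": "VoIP frame"})
      print(sched.dequeue()["payload"])   # -> "VoIP frame" jumps the queue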

    1. runt row raggy

      Re: Too many big buffers in too many nodes

      DSL called. It wants 1990 back.

    2. mtrantalainen

      Re: Too many big buffers in too many nodes

      Yes, the internet is designed to route packets, and a congestion control algorithm tries to find the optimal transmit rate when the network behavior is unknown (symmetric or asymmetric, data loss caused by poor radio links vs packets dropped because of too-heavy traffic).

      Basically you have algorithms such as Vegas, which try to keep the RTT as low as possible at all times, and algorithms such as Cubic, which try to make sure the bandwidth is fully utilized at all times.

      Combined with the fact that some routers have insanely huge queue buffers (e.g. up to 2-5 seconds at the full transmission speed of their own interfaces), this ends up causing huge delays for everybody. Users of Cubic who only care about throughput (e.g. a user downloading install files for the latest game, or a Windows update) think everything is well, but gamers and teleconference users think the network is not usable at all.

      The problem with Cubic is that once it has filled a router's overly huge queue buffer, the only way to push it back out is to fill that buffer even faster, because such routers only start dropping packets when the buffer overflows. And Cubic will not reduce its transmit speed until the loss from those dropped packets signals that the transmit speed is too high.

      Unfortunately, as long as any commonly used operating system ships Cubic (or a similar bandwidth-maximizing congestion control algorithm), the only thing we can do to reduce latency is to make the queue buffers in routers smaller.

      However, reducing queue buffers would reduce the max bandwidth that can be (easily) measured in benchmarks, so router manufacturers actually have a negative incentive to do this in the real world.

      I think the only way to fix this in the long run is to start by fixing the benchmarks! If people benchmarking new routers always reported the high latency seen during the bandwidth testing, then manufacturers would do something.

      I think the optimal router queue would have a low timeout for packets in the queue: something like limiting the queue length to at most 2 ms at the router's current transmit speed. Any packets that stayed in the queue for longer would be dropped. That way any connection that tries to transmit too fast would be punished with packet drops, but clients using congestion control algorithms that can accurately measure the optimal transmit speed (e.g. BBR, Vegas, CDG) could use the connection at full speed. Note that this queue timeout doesn't need to be implemented by every router – you would just need at least one router on the full end-to-end connection to enforce the limit and Cubic would be forced to behave.

      Cubic is basically an optimal algorithm for maximizing the queue length of all routers without dropping packets!
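      A minimal sketch of the 2 ms queue-timeout idea above, assuming a software queue that timestamps packets on arrival and ages out anything that has waited longer than the budget (class and field names are invented for illustration; a real router would do this per interface, in hardware or in the kernel):

        import time
        from collections import deque

        QUEUE_BUDGET_S = 0.002   # ~2 ms of queueing allowed, as suggested above

        class TimeBoundedQueue:
            """FIFO that drops packets which have waited longer than the budget."""

            def __init__(self, budget_s=QUEUE_BUDGET_S):
                self.budget_s = budget_s
                self.q = deque()   # entries are (arrival_time, packet)

            def enqueue(self, packet):
                self.q.append((time.monotonic(), packet))

            def dequeue(self):
                # Age out anything that exceeded its time budget; a loss-based
                # sender like Cubic sees the drop and is forced to back off.
                now = time.monotonic()
                while self.q:
                    arrived, packet = self.q.popleft()
                    if now - arrived <= self.budget_s:
                        return packet
                    # else: packet sat too long in the queue -> dropped
                return None

      Delay-based senders such as BBR, Vegas or CDG, which keep the standing queue short, would rarely hit the timeout, while a Cubic flow that insists on keeping the buffer full would see steady drops, which is exactly the incentive described above.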

  4. Kevin McMurtrie Silver badge

    Duh

    Given the inputs available to the algorithm, not all scenarios can be uniquely detected. For some of those scenarios, you may choose greedy or cooperative solutions. This is why TCP is still around despite so many claims of amazingly better algorithms.

    1. Bitsminer Silver badge

      Re: Duh

      There are a variety of TCP congestion control algorithms....

      TCP Tahoe, TCP Reno, TCP New Reno and TCP Vegas...

      Three guesses as to why the developers tend to rely on, umm, gambling towns as motivation for their codes.

      1. runt row raggy

        Re: Duh

        Or maybe they were the names of the BSD distributions that introduced them. Also, there are tens of congestion control schemes with no obvious connection to places or gambling at all.

    2. mtrantalainen

      Re: Duh

      Congestion control algorithms try to solve an interesting social problem.

      If everybody else were running a congestion control algorithm that tried to be as nice as possible, you could improve your personal internet experience simply by using a more aggressive CCA and going "elbows out" on the internet – basically you could bully more bandwidth for yourself, because all the other users running nice algorithms would see that you "need" more bandwidth and slow down their own traffic to make more space for you.

      So there's obviously an incentive to run a more aggressive CCA, for the benefit of the user running it.

      However, at the same time, all users would benefit from low latency connections.

      On the current internet, with hugely oversized queue buffers in the routers, a couple of users optimizing purely for bandwidth (e.g. using Cubic or Westwood) will fill every queue buffer on the route they are using, and the latency for all users of those routes suffers as a result.

      The only real fix I can see is to change routers to start punishing flows that fill the queue with too many packets. The RED (Random Early Detection) algorithm in routers, tuned much more aggressively than usual, might be the correct answer. That would cause Cubic to reduce its transmit speed much sooner when links are congested, and better algorithms that avoid filling the queue could use more bandwidth.

      I think many better routers are already running RED, so we just need more aggressive settings for it to enforce a shorter queue. Luckily you only need one router on a single end-to-end connection to force Cubic et al to behave. If Cubic tries to send too much, that single router will start to build a queue and RED will then start to randomly drop packets. Since Cubic is the one pushing too many packets into the queue, the probability of dropping Cubic's packets is higher than that of dropping packets from better algorithms, which have only a few packets in the queue.
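      For reference, a small sketch of the RED idea being leaned on here: the drop probability ramps up between a minimum and a maximum average queue length, and tuning it "more aggressively" essentially means lowering those thresholds (the numbers below are purely illustrative, not recommended settings):

        import random

        # Illustrative RED-style parameters (assumptions, not recommendations).
        MIN_THRESHOLD = 5      # packets: below this average, never drop
        MAX_THRESHOLD = 15     # packets: above this average, always drop
        MAX_DROP_PROB = 0.10   # drop probability reached at MAX_THRESHOLD

        def red_drop(avg_queue_len: float) -> bool:
            """Return True if the arriving packet should be randomly dropped."""
            if avg_queue_len < MIN_THRESHOLD:
                return False
            if avg_queue_len >= MAX_THRESHOLD:
                return True
            # Linear ramp of drop probability between the two thresholds.
            frac = (avg_queue_len - MIN_THRESHOLD) / (MAX_THRESHOLD - MIN_THRESHOLD)
            return random.random() < frac * MAX_DROP_PROB

        # A flow that keeps the average queue long (Cubic-style) sees far more
        # random drops than one that keeps only a couple of packets queued.
        print(sum(red_drop(12) for _ in range(10_000)))   # dropped fairly often
        print(sum(red_drop(3) for _ in range(10_000)))    # never dropped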

  5. Warm Braw

    Arguably, there's too much reliance on good behaviour

    The interesting thing about these attempts at "fair" capacity sharing is that they're pretty much dependent on transport implementations, entirely under the control of end users, doing "the right thing". There are things a "greedy" system could do (such as unnecessary retransmissions) that might reduce the likelihood of its traffic being dropped compared with that of another user. I don't think we can forever rely on Internet users simply using the protocol stack that came with their operating system.

  6. Eclectic Man Silver badge
    Boffin

    Complexity Theory?

    I wonder if the mathematical complexity theorists have looked at this problem and determined whether or not it is, in general, NP-complete? It sounds as though designing an effective CCA is related to the 'travelling salesman problem', which is known to be a difficult problem in general, as are many problems concerning graphs and networks.

    1. david 12 Silver badge

      Re: Complexity Theory?

      When a comp-sci graduate took over my work, the first thing he told management was that the problem was "a difficult problem" (it was a traveling salesman problem).

      Actually, it was already solved: on a 1980's PC the brute-force solution took less time than the screen display.

      "NP-complete" is a scare-word. It's a class that includes small easy problems as well as large slow problems.

      1. Eclectic Man Silver badge

        Re: Complexity Theory?

        NP-completeness refers to the complexity class of a problem, not to specific instances of the problem itself. The travelling salesperson problem can be solved easily for a few nodes and edges, but for the general problem with a large number of nodes and edges (i.e. one where a brute-force attack is not viable in a reasonable amount of time) there is no easy method of solution.

        From https://en.wikipedia.org/wiki/NP-completeness

        "a problem is NP-complete when:

        1. it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by trying all possible solutions.

        2. the problem can be used to simulate every other problem for which we can verify quickly that a solution is correct. In this sense, NP-complete problems are the hardest of the problems to which solutions can be verified quickly. If we could find solutions of some NP-complete problem quickly, we could quickly find the solutions of every other problem to which a given solution can be easily verified.

        The name "NP-complete" is short for "nondeterministic polynomial-time complete". In this name, "nondeterministic" refers to nondeterministic Turing machines, a way of mathematically formalizing the idea of a brute-force search algorithm. Polynomial time refers to an amount of time that is considered "quick" for a deterministic algorithm to check a single solution, or for a nondeterministic Turing machine to perform the whole search. "Complete" refers to the property of being able to simulate everything in the same complexity class."

      2. mtrantalainen

        Re: Complexity Theory?

        You cannot brute-force the traveling salesman problem (TSP) even with today's computers once the number of cities gets high enough.

        This is because the brute-force algorithm for TSP has runtime O(n!). Computing the solution for 30 cities would require on the order of 30! operations, or longer than the total age of the universe using a single 4 GHz core. Scaling to multiple cores wouldn't help a lot.

        You can indeed calculate exact solutions using cleverer algorithms, but brute force doesn't cut it if you want an exact solution.

        For any practical use, you can use approximations (which may result in a route ~25% longer than the exact solution) that run in practically real time for any number of cities. If you need to visit 50 cities, it's better to spend the extra 25% on traveling than to wait for the exact solution before starting to travel.

        See https://en.wikipedia.org/wiki/Travelling_salesman_problem#Computing_a_solution for details.
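        A quick back-of-the-envelope check of the 30! claim above, assuming (very generously) one candidate tour evaluated per clock cycle on a 4 GHz core:

          import math

          CLOCK_HZ = 4e9                    # one candidate tour per cycle (generous)
          AGE_OF_UNIVERSE_YEARS = 1.38e10   # roughly

          for n in (10, 15, 20, 30):
              tours = math.factorial(n)     # brute force examines ~n! orderings
              years = tours / CLOCK_HZ / (3600 * 24 * 365)
              print(f"n={n}: {tours:.2e} tours, ~{years:.2e} years "
                    f"({years / AGE_OF_UNIVERSE_YEARS:.1e} x age of universe)")

        For n=30 that comes out at roughly 10^15 years, some 10^5 times the age of the universe, so the claim above holds up.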

      3. John Smith 19 Gold badge
        Unhappy

        NP-complete" is a scare-word. It's a class

        True.

        If the specific instance of a problem you were solving was small enough then it could be brute forced.

        If it's considerably larger, you'd be into various heuristics that more or less give you a fairly good result (depending on your exact problem and definition of "fairly good").

        The joker in the pack is whether the size of the problem being dealt with is likely to grow. This is where NP-complete problems turn nasty.

        I'd say you were both right. You had a solution at the current scale; they were saying that if the problem got bigger, that solution would not finish in a reasonable amount of time. Both are right, within a certain context.

  7. This post has been deleted by its author

    1. FILE_ID.DIZ

      Unless you operate an RFC 1072-compliant network, otherwise known as an LFP, or Long Fat Pipe. Real-world results on those are inherently subpar compared with the lab.

    2. emfiliane

      > Run fatter wires.

      Much like wider freeways, more bandwidth has diminishing returns. You can have infinite bandwidth and still have latency choking your real speed.
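      The classic arithmetic behind that point: a window-based sender can never exceed window/RTT, no matter how fat the pipe. A tiny sketch with illustrative numbers:

        # Throughput of a window-based transport is capped at window / RTT,
        # regardless of link capacity. The numbers below are illustrative.
        WINDOW_BYTES = 64 * 1024   # a classic 64 KiB receive window
        LINK_GBPS = 100.0          # effectively "infinite" bandwidth here

        for rtt_ms in (1, 10, 50, 200):
            cap_mbps = WINDOW_BYTES * 8 / (rtt_ms / 1000) / 1e6
            print(f"RTT {rtt_ms:>3} ms: at most {cap_mbps:8.1f} Mbit/s "
                  f"of the {LINK_GBPS:.0f} Gbit/s link is usable")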

  8. Mayday
    Angel

    Not needed if everyone implements RFC1149

    https://www.rfc-editor.org/rfc/rfc1149.txt

  9. Graham Cobb Silver badge

    Explicit congestion signalling

    Personally, my thought is that explicit congestion signalling (as mentioned in the article) is the best way to manage congestion. Unfortunately, it immediately creates incentives for "rogue" systems to ignore the signal and take advantage of everyone else reducing their offered load to steal unfair amounts of bandwidth.

    On the other hand, the world has changed a LOT since the 1980s and 90s. In many ways there is much more of a monoculture of transport layer implementations. In particular, if Microsoft and Linux implemented fair rules, that would deal with most end points counted by device (corporate networks, phones, etc).

    But it wouldn't prevent a new entrant with its own devices, such as an entertainment system (think Apple TV, or a games console) or a large-scale video network (think high-def surveillance cameras), from creating its own "rogue" implementation to get unfair transfer rates at everyone else's expense.

    1. mtrantalainen

      Re: Explicit congestion signalling

      The problem with ECN is exactly as you described: it gives an edge to any user who doesn't react to an ECN-marked packet as if it had been lost.

      At the same time, we already have systems on the internet that use congestion control algorithms such as Cubic and do not support ECN marking, so even if you had a system that supported ECN, and routers ECN-marked packets instead of dropping them when the queue starts to fill, honoring the ECN mark would only donate more of your bandwidth to the Cubic users!

      As such, there's zero incentive to honor ECN marking on the real internet, because that would only slow down your own connection; Cubic streams would still fill the queue of every router along the way, so you would end up with less bandwidth and latency just as bad as if you simply ignored every ECN mark.

      The only way to really fix this mess is to start punishing Cubic users at the routers. Luckily, only one router on each end-to-end connection is required to push back effectively against Cubic users. However, for best results this router must be the single bottleneck for the whole connection, and as we don't know which router will be the bottleneck for any random end-to-end connection over the internet, every router would need to implement this, gradually, to reach the optimal state.

      As long as the router's max queue length is limited to less than half the best possible RTT of the users transmitting packets over it, there will be an incentive to use better congestion control algorithms than Cubic, because in that case an algorithm that doesn't fill the buffer gets better throughput.

      With nearly unlimited buffering in routers, Cubic is a nearly optimal algorithm for max bandwidth, so any user who doesn't mind latency will use it today. And this will not change unless Cubic stops being the (nearly) optimal algorithm for bandwidth!
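      A sketch of that incentive problem, assuming a deliberately simplified per-ACK sender model in which an ECN-honoring flow halves its window on a CE-marked ACK (the standard reaction) while a "rogue" flow simply ignores the mark; the update rule and the marking rate are invented for illustration:

        def on_ack(cwnd: float, ce_marked: bool, honors_ecn: bool) -> float:
            """Toy per-ACK window update (illustrative, not a full CCA)."""
            if ce_marked and honors_ecn:
                return max(1.0, cwnd / 2)    # standard ECN reaction: act as if lost
            return cwnd + 1.0 / cwnd         # otherwise keep growing (AIMD-style)

        # Two flows behind the same marking router: the honest one backs off on
        # every CE mark, the rogue one keeps its window and so keeps the bandwidth.
        honest, rogue = 10.0, 10.0
        for ack in range(1000):
            ce = (ack % 50 == 0)             # router marks a packet now and then
            honest = on_ack(honest, ce, honors_ecn=True)
            rogue = on_ack(rogue, ce, honors_ecn=False)
        print(f"honest cwnd ~ {honest:.1f}, rogue cwnd ~ {rogue:.1f}")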

  10. Yes Me Silver badge

    Free lunch

    "CCAs have to choose at most two out of three properties: high throughput, convergence to a small and bounded delay range, and no starvation"

    Is this supposed to be news? The people who design and implement congestion control for a living have known this for many years. It's nice to have the maths, but the absence of a free lunch has long been known. The closer you get to a bounded delay target, the more traffic will be completely excluded, because admitting it would increase the delay.
