Network congestion algorithms have design flaw, says MIT

Trying to create a network that's fair, equitable, and starvation-free may simply be impossible – at least with current congestion control algorithms (CCAs), an MIT study has found. Published ahead of a conference presentation this week, the MIT paper [PDF] found that, regardless of approach, CCAs like Google's BBR, FAST, and …

  1. cjcox

    Not necessarily related

    MIT is also rebranding itself simply as "Karen"

  2. Version 1.0 Silver badge
    Unhappy

    Congestion algorithm solution

    We've seen congestion issues for years but the current solution is to privatize the organization, pay big bonuses to the people running the organization and then, when congestion becomes a problem, just flush the untreated issues into a river. Congestion issues are being solved everywhere in the UK nowadays in a very profitable manner but let's not go swimming.

    1. Korev Silver badge
      Coat

      Re: Congestion algorithm solution

      Too many people flushing their buffers?

  3. John Smith 19 Gold badge
    Unhappy

    Too many big buffers in too many nodes

    The internet was designed to run with packets (occasionally) being thrown away.

    And every time that happens now, to get a new packet for the data stream you want, all those buffers need to be worked through.

    Unless you split out the jitter- and delay-sensitive packets and give them priority (i.e. your VoIP packets and game control packets vs that big download of pron you've got going on), that's always going to be the case.
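    A rough sketch of that kind of split, assuming a toy two-class scheduler in Python (the class, the dscp field and the marking values are invented here for illustration, not taken from any real router):

      from collections import deque

      # Toy strict-priority scheduler: latency-sensitive traffic (VoIP, game
      # control) is always served before bulk traffic. Field names are made up.
      LATENCY_SENSITIVE_DSCP = {46, 40}   # e.g. EF-style markings (assumption)

      class TwoClassScheduler:
          def __init__(self):
              self.priority = deque()   # VoIP / game control packets
              self.bulk = deque()       # that big download

          def enqueue(self, packet):
              if packet["dscp"] in LATENCY_SENSITIVE_DSCP:
                  self.priority.append(packet)
              else:
                  self.bulk.append(packet)

          def dequeue(self):
              # Strict priority: drain the latency-sensitive queue first.
              if self.priority:
                  return self.priority.popleft()
              return self.bulk.popleft() if self.bulk else None

      sched = TwoClassScheduler()
      sched.enqueue({"dscp": 0, "payload": "bulk download chunk"})
      sched.enqueue({"dscp": 46, "payload": "VoIP frame"})
      print(sched.dequeue()["payload"])   # -> "VoIP frame" jumps the queue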

    1. runt row raggy

      Re: Too many big buffers in too many nodes

      DSL called. It wants 1990 back.

    2. mtrantalainen

      Re: Too many big buffers in too many nodes

      Yes, the internet is designed to route packets, and a congestion control algorithm tries to find the optimal transmit rate when the network behavior is unknown (symmetric or asymmetric, data loss caused by poor radio links vs packets dropped because of too-heavy traffic).

      Basically you have algorithms such as Vegas, which try to keep the RTT as low as possible at all times, and algorithms such as Cubic, which try to make sure the bandwidth is fully utilized at all times.

      Combined with the fact that some routers have insanely huge queue buffers (e.g. up to 2-5 seconds at the full transmission speed of their own interfaces), this ends up causing huge delays for everybody. Users of Cubic who only care about throughput (e.g. a user downloading install files for the latest game, or a Windows update) think everything is well, but gamers and teleconference users think the network is not usable at all.

      The problem with Cubic is that once it has filled a router's overly huge queue buffer, the only way to push it back out is to fill that buffer even faster, because such routers only start dropping packets when the buffer overflows. And Cubic will not reduce its transmit speed until the loss from those dropped packets signals that the transmit speed is too high.

      Unfortunately, as long as any commonly used operating system ships Cubic (or a similar bandwidth-maximizing congestion control algorithm), the only thing we can do to reduce latency is to make the queue buffers in routers smaller.

      However, reducing queue buffers would reduce the max bandwidth that can be (easily) measured in benchmarks, so router manufacturers actually have a negative incentive to do this in the real world.

      I think the only way to fix this in the long run is to start by fixing the benchmarks! If people benchmarking new routers always reported the high latency seen during the bandwidth testing, then manufacturers would do something.

      I think the optimal router queue would have a low timeout for packets in the queue: something like limiting the queue length to at most 2 ms at the router's current transmit speed. Any packets that stayed in the queue for longer would be dropped. That way any connection that tries to transmit too fast would be punished with packet drops, but clients using congestion control algorithms that can accurately measure the optimal transmit speed (e.g. BBR, Vegas, CDG) could use the connection at full speed. Note that this queue timeout doesn't need to be implemented by every router – you would just need at least one router on the full end-to-end connection to enforce the limit and Cubic would be forced to behave.

      Cubic is basically an optimal algorithm for maximizing the queue length of all routers without dropping packets!
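      A minimal sketch of the 2 ms queue-timeout idea above, assuming a software queue that timestamps packets on arrival and ages out anything that has waited longer than the budget (class and field names are invented for illustration; a real router would do this per interface, in hardware or in the kernel):

        import time
        from collections import deque

        QUEUE_BUDGET_S = 0.002   # ~2 ms of queueing allowed, as suggested above

        class TimeBoundedQueue:
            """FIFO that drops packets which have waited longer than the budget."""

            def __init__(self, budget_s=QUEUE_BUDGET_S):
                self.budget_s = budget_s
                self.q = deque()   # entries are (arrival_time, packet)

            def enqueue(self, packet):
                self.q.append((time.monotonic(), packet))

            def dequeue(self):
                # Age out anything that exceeded its time budget; a loss-based
                # sender like Cubic sees the drop and is forced to back off.
                now = time.monotonic()
                while self.q:
                    arrived, packet = self.q.popleft()
                    if now - arrived <= self.budget_s:
                        return packet
                    # else: packet sat too long in the queue -> dropped
                return None

      Delay-based senders such as BBR, Vegas or CDG, which keep the standing queue short, would rarely hit the timeout, while a Cubic flow that insists on keeping the buffer full would see steady drops, which is exactly the incentive described above.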

  4. Kevin McMurtrie Silver badge

    Duh

    Given the inputs available to the algorithm, not all scenarios can be uniquely detected. For some of those scenarios, you may choose greedy or cooperative solutions. This is why TCP is still around despite so many claims of amazingly better algorithms.

    1. Bitsminer Silver badge

      Re: Duh

      There are a variety of TCP congestion control algorithms....

      TCP Tahoe, TCP Reno, TCP New Reno and TCP Vegas...

      Three guesses as to why the developers tend to rely on, umm, gambling towns as motivation for their codes.

      1. runt row raggy

        Re: Duh

        Or maybe they were the names of the BSD distributions that introduced them. Also, there are tens of congestion control schemes with no obvious connection to places or gambling at all.

    2. mtrantalainen

      Re: Duh

      Congestion control algorithms try to solve an interesting social problem.

      If everybody else were running a congestion control algorithm that tried to be as nice as possible, you could improve your personal internet experience simply by using a more aggressive CCA and going "elbows out" on the internet – basically you could bully more bandwidth for yourself, because all the other users running nice algorithms would see that you "need" more bandwidth and slow down their own traffic to make more space for you.

      So there's obviously an incentive to run a more aggressive CCA, for the benefit of the user running it.

      However, at the same time, all users would benefit from low latency connections.

      On the current internet, with hugely oversized queue buffers in the routers, a couple of users optimizing purely for bandwidth (e.g. using Cubic or Westwood) will fill every queue buffer on the route they are using, and the latency for all users of those routes suffers as a result.

      The only real fix I can see is to change routers to start punishing flows that fill the queue with too many packets. The RED (Random Early Detection) algorithm in routers, tuned much more aggressively than usual, might be the correct answer. That would cause Cubic to reduce its transmit speed much sooner when links are congested, and better algorithms that avoid filling the queue could use more bandwidth.

      I think many better routers are already running RED, so we just need more aggressive settings for it to enforce a shorter queue. Luckily you only need one router on a single end-to-end connection to force Cubic et al to behave. If Cubic tries to send too much, that single router will start to build a queue and RED will then start to randomly drop packets. Since Cubic is the one pushing too many packets into the queue, the probability of dropping Cubic's packets is higher than that of dropping packets from better algorithms, which have only a few packets in the queue.
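      For reference, a small sketch of the RED idea being leaned on here: the drop probability ramps up between a minimum and a maximum average queue length, and tuning it "more aggressively" essentially means lowering those thresholds (the numbers below are purely illustrative, not recommended settings):

        import random

        # Illustrative RED-style parameters (assumptions, not recommendations).
        MIN_THRESHOLD = 5      # packets: below this average, never drop
        MAX_THRESHOLD = 15     # packets: above this average, always drop
        MAX_DROP_PROB = 0.10   # drop probability reached at MAX_THRESHOLD

        def red_drop(avg_queue_len: float) -> bool:
            """Return True if the arriving packet should be randomly dropped."""
            if avg_queue_len < MIN_THRESHOLD:
                return False
            if avg_queue_len >= MAX_THRESHOLD:
                return True
            # Linear ramp of drop probability between the two thresholds.
            frac = (avg_queue_len - MIN_THRESHOLD) / (MAX_THRESHOLD - MIN_THRESHOLD)
            return random.random() < frac * MAX_DROP_PROB

        # A flow that keeps the average queue long (Cubic-style) sees far more
        # random drops than one that keeps only a couple of packets queued.
        print(sum(red_drop(12) for _ in range(10_000)))   # dropped fairly often
        print(sum(red_drop(3) for _ in range(10_000)))    # never dropped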

  5. Warm Braw

    Arguably, there's too much reliance on good behaviour

    The interesting thing about these attempts at "fair" capacity sharing is that they're pretty much dependent on transport implementations, entirely under the control of end users, doing "the right thing". There are things a "greedy" system could do (such as unnecessary retransmissions) that might reduce the likelihood of its traffic being dropped compared with that of another user. I don't think we can forever rely on Internet users simply using the protocol stack that came with their operating system.

  6. Eclectic Man Silver badge
    Boffin

    Complexity Theory?

    I wonder if the mathematical complexity theorists have looked at this problem and determined whether or not it is, in general, NP-complete? It sounds as though designing an effective CCA is related to the 'travelling salesman problem', which is known to be a difficult problem in general, as are many problems concerning graphs and networks.

    1. david 12 Silver badge

      Re: Complexity Theory?

      When a comp-sci graduate took over my work, the first thing he told management was that the problem was "a difficult problem" (it was a traveling salesman problem).

      Actually, it was already solved: on a 1980's PC the brute-force solution took less time than the screen display.

      "NP-complete" is a scare-word. It's a class that includes small easy problems as well as large slow problems.

      1. Eclectic Man Silver badge

        Re: Complexity Theory?

        NP-completeness refers to the complexity class of a problem, not to specific instances of the problem itself. The travelling salesperson problem can be solved easily for a few nodes and edges, but for the general problem with a large number of nodes and edges (i.e. one where a brute-force attack is not viable in a reasonable amount of time) there is no easy method of solution.

        From https://en.wikipedia.org/wiki/NP-completeness

        "a problem is NP-complete when:

        1. it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by trying all possible solutions.

        2. the problem can be used to simulate every other problem for which we can verify quickly that a solution is correct. In this sense, NP-complete problems are the hardest of the problems to which solutions can be verified quickly. If we could find solutions of some NP-complete problem quickly, we could quickly find the solutions of every other problem to which a given solution can be easily verified.

        The name "NP-complete" is short for "nondeterministic polynomial-time complete". In this name, "nondeterministic" refers to nondeterministic Turing machines, a way of mathematically formalizing the idea of a brute-force search algorithm. Polynomial time refers to an amount of time that is considered "quick" for a deterministic algorithm to check a single solution, or for a nondeterministic Turing machine to perform the whole search. "Complete" refers to the property of being able to simulate everything in the same complexity class."

      2. mtrantalainen

        Re: Complexity Theory?

        You cannot brute-force the traveling salesman problem (TSP) even with today's computers once the number of cities gets high enough.

        This is because the brute-force algorithm for TSP has runtime O(n!). Computing the solution for 30 cities would require on the order of 30! operations, or longer than the total age of the universe using a single 4 GHz core. Scaling to multiple cores wouldn't help a lot.

        You can indeed calculate exact solutions using cleverer algorithms, but brute force doesn't cut it if you want an exact solution.

        For any practical use, you can use approximations (which may result in a route ~25% longer than the exact solution) that run in practically real time for any number of cities. If you need to visit 50 cities, it's better to spend the extra 25% on traveling than to wait for the exact solution before starting to travel.

        See https://en.wikipedia.org/wiki/Travelling_salesman_problem#Computing_a_solution for details.
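        A quick back-of-the-envelope check of the 30! claim above, assuming (very generously) one candidate tour evaluated per clock cycle on a 4 GHz core:

          import math

          CLOCK_HZ = 4e9                    # one candidate tour per cycle (generous)
          AGE_OF_UNIVERSE_YEARS = 1.38e10   # roughly

          for n in (10, 15, 20, 30):
              tours = math.factorial(n)     # brute force examines ~n! orderings
              years = tours / CLOCK_HZ / (3600 * 24 * 365)
              print(f"n={n}: {tours:.2e} tours, ~{years:.2e} years "
                    f"({years / AGE_OF_UNIVERSE_YEARS:.1e} x age of universe)")

        For n=30 that comes out at roughly 10^15 years, some 10^5 times the age of the universe, so the claim above holds up.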

      3. John Smith 19 Gold badge
        Unhappy

        NP-complete" is a scare-word. It's a class

        True.

        If the specific instance of a problem you were solving was small enough then it could be brute forced.

        If it's considerably larger, you'd be into various heuristics that more or less give you a fairly good result (depending on your exact problem and definition of "fairly good").

        The joker in the pack is whether the size of the problem being dealt with is likely to grow. This is where NP-complete problems turn nasty.

        I'd say you were both right. You had a solution at the current scale; they were saying that if the problem got bigger, that solution would not finish in a reasonable amount of time. Both are right, within a certain context.

  7. This post has been deleted by its author

    1. FILE_ID.DIZ

      Unless you operate an RFC 1072-compliant network, otherwise known as an LFP, or Long Fat Pipe. Real-world results on those are inherently subpar compared with the lab.

    2. emfiliane

      > Run fatter wires.

      Much like wider freeways, more bandwidth has diminishing returns. You can have infinite bandwidth and still have latency choking your real speed.
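      The classic arithmetic behind that point: a window-based sender can never exceed window/RTT, no matter how fat the pipe. A tiny sketch with illustrative numbers:

        # Throughput of a window-based transport is capped at window / RTT,
        # regardless of link capacity. The numbers below are illustrative.
        WINDOW_BYTES = 64 * 1024   # a classic 64 KiB receive window
        LINK_GBPS = 100.0          # effectively "infinite" bandwidth here

        for rtt_ms in (1, 10, 50, 200):
            cap_mbps = WINDOW_BYTES * 8 / (rtt_ms / 1000) / 1e6
            print(f"RTT {rtt_ms:>3} ms: at most {cap_mbps:8.1f} Mbit/s "
                  f"of the {LINK_GBPS:.0f} Gbit/s link is usable")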

  8. Mayday
    Angel

    Not needed if everyone implements RFC1149

    https://www.rfc-editor.org/rfc/rfc1149.txt

  9. Graham Cobb Silver badge

    Explicit congestion signalling

    Personally, my thought is that explicit congestion signalling (as mentioned in the article) is the best way to manage congestion. Unfortunately, it immediately creates incentives for "rogue" systems to ignore the signal and take advantage of everyone else reducing their offered load to steal unfair amounts of bandwidth.

    On the other hand, the world has changed a LOT since the 1980s and 90s. In many ways there is much more of a monoculture of transport layer implementations. In particular, if Microsoft and Linux implemented fair rules, that would deal with most end points counted by device (corporate networks, phones, etc).

    But it wouldn't prevent a new entrant with its own devices, such as an entertainment system (think Apple TV, or a games console) or a large-scale video network (think high-def surveillance cameras), from creating its own "rogue" implementation to get unfair transfer rates at everyone else's expense.

    1. mtrantalainen

      Re: Explicit congestion signalling

      The problem with ECN is exactly as you described: it gives an edge to any user who doesn't react to an ECN-marked packet as if it had been lost.

      At the same time, we already have systems on the internet that use congestion control algorithms such as Cubic and do not support ECN marking, so even if you had a system that supported ECN, and routers ECN-marked packets instead of dropping them when the queue starts to fill, honoring the ECN mark would only donate more of your bandwidth to the Cubic users!

      As such, there's zero incentive to honor ECN marking on the real internet, because that would only slow down your own connection; Cubic streams would still fill the queue of every router along the way, so you would end up with less bandwidth and latency just as bad as if you simply ignored every ECN mark.

      The only way to really fix this mess is to start punishing Cubic users at the routers. Luckily, only one router on each end-to-end connection is required to push back effectively against Cubic users. However, for best results this router must be the single bottleneck for the whole connection, and as we don't know which router will be the bottleneck for any random end-to-end connection over the internet, every router would need to implement this, gradually, to reach the optimal state.

      As long as the router's max queue length is limited to less than half the best possible RTT of the users transmitting packets over it, there will be an incentive to use better congestion control algorithms than Cubic, because in that case an algorithm that doesn't fill the buffer gets better throughput.

      With nearly unlimited buffering in routers, Cubic is a nearly optimal algorithm for max bandwidth, so any user who doesn't mind latency will use it today. And this will not change unless Cubic stops being the (nearly) optimal algorithm for bandwidth!
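      A sketch of that incentive problem, assuming a deliberately simplified per-ACK sender model in which an ECN-honoring flow halves its window on a CE-marked ACK (the standard reaction) while a "rogue" flow simply ignores the mark; the update rule and the marking rate are invented for illustration:

        def on_ack(cwnd: float, ce_marked: bool, honors_ecn: bool) -> float:
            """Toy per-ACK window update (illustrative, not a full CCA)."""
            if ce_marked and honors_ecn:
                return max(1.0, cwnd / 2)    # standard ECN reaction: act as if lost
            return cwnd + 1.0 / cwnd         # otherwise keep growing (AIMD-style)

        # Two flows behind the same marking router: the honest one backs off on
        # every CE mark, the rogue one keeps its window and so keeps the bandwidth.
        honest, rogue = 10.0, 10.0
        for ack in range(1000):
            ce = (ack % 50 == 0)             # router marks a packet now and then
            honest = on_ack(honest, ce, honors_ecn=True)
            rogue = on_ack(rogue, ce, honors_ecn=False)
        print(f"honest cwnd ~ {honest:.1f}, rogue cwnd ~ {rogue:.1f}")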

  10. Yes Me Silver badge

    Free lunch

    "CCAs have to choose at most two out of three properties: high throughput, convergence to a small and bounded delay range, and no starvation"

    Is this supposed to be news? The people who design and implement congestion control for a living have known this for many years. It's nice to have the maths, but the absence of a free lunch has long been known. The closer you get to a bounded delay target, the more traffic will be completely excluded, because admitting it would increase the delay.
