See comment above
Ok, so..
But TCP acknowledgement is implemented as a single stream with head-of-line blocking. One lost packet effectively pauses everything until the retransmission happens. (Well OK, it's a bit more complicated than that, but the simplification is close enough to reality).
Kind of, although as you say, reality gets a bit more complicated. So consider this example:
https://en.wikipedia.org/wiki/Head-of-line_blocking#/media/File:HOL_blocking.png
So yes, there's a risk of head-of-line issues if the receiver is busy. But TCP gives apps more feedback about the connection state, should the developer choose to use that data. With TCP, a lost packet won't pause 'everything', it should only pause that specific TCP connection. Which admittedly gets FUN if the app is attempting to use parallel TCP connections for a single transfer.
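To put some numbers on the 'only that connection' point, here's a toy sketch in Python. It isn't real TCP, just an in-order receiver; the function name, arrival times and sequence numbers are all made up for illustration.

```python
def deliver_in_order(arrivals):
    """Toy in-order receiver (hypothetical, not real TCP).

    arrivals: list of (arrival_time, seq) pairs for one connection.
    Returns {seq: time_handed_to_app}; a segment is only handed up
    once every lower-numbered segment has arrived.
    """
    buffered = set()
    delivered = {}
    next_seq = 0
    for t, seq in sorted(arrivals):
        buffered.add(seq)
        # Release everything that is now contiguous from next_seq.
        while next_seq in buffered:
            delivered[next_seq] = t
            next_seq += 1
    return delivered


# Connection A: segment 1 is lost; its retransmission arrives at t=10.
conn_a = [(0, 0), (10, 1), (2, 2), (3, 3)]
# Connection B: no loss at all.
conn_b = [(0, 0), (1, 1), (2, 2), (3, 3)]

print(deliver_in_order(conn_a))  # {0: 0, 1: 10, 2: 10, 3: 10}
print(deliver_in_order(conn_b))  # {0: 0, 1: 1, 2: 2, 3: 3}
```

Segments 2 and 3 on connection A sit in the buffer until the retransmitted segment 1 turns up, while connection B carries on regardless.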
The protocol designer on top of UDP is free to implement their own flow control and retry mechanisms just as TCP does over IP.
It's not really a case of 'free to implement', more essential to implement, especially if your app isn't loss-tolerant. But consider the wiki pic again, just with the 'switching fabric' being replaced with the Internet. Session 4 is people trying to get tickets from Ticketmaster, which is congested, so random UDP packets will be merrily filling the bit bucket.
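As a rough idea of the minimum an app on top of UDP would have to do for itself, here's a stop-and-wait retry loop in Python. It's a sketch under assumptions: no real sockets, and the `dropped` set is a made-up loss pattern standing in for whatever the network throws away.

```python
def send_reliably(num_packets, dropped, max_tries=5):
    """Stop-and-wait sketch: resend each packet until it gets through.

    dropped: set of (seq, attempt) pairs the network discards
             (a hypothetical, hand-written loss pattern).
    Returns the total number of transmissions needed.
    """
    transmissions = 0
    for seq in range(num_packets):
        for attempt in range(max_tries):
            transmissions += 1
            if (seq, attempt) not in dropped:
                break  # treated as 'ACK received', move to next packet
        else:
            # Every attempt was dropped; the app has to give up or escalate.
            raise TimeoutError(f"packet {seq} never got through")
    return transmissions


# Packet 1 is dropped twice, packet 3 once.
print(send_reliably(4, {(1, 0), (1, 1), (3, 0)}))  # 7
```

Four packets of payload cost seven transmissions here, and that's before timers, reordering or duplicate ACKs get involved.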
So there'd be multiple potential blocks. First the host, while the app tries to figure out which packets were lost and thus which packets to re-transmit or re-request. Then peering locations, where there's frequently congestion and packet loss, but the peering routers have no knowledge of the application or request state. And finally the server at the far end, which may be congested and drop packets, but has a better chance of being 'app aware'.
Then add in multiplexing to try and cram more data into a single 'connection', and dropped packets will result in apps having to figure out how that impacts the muxed transport, and trying to re-request the lost data. So computationally far more expensive, and potentially leading to higher latency while the app tries to figure out what the hell is going on.
Meanwhile, buffers are still filling up, LIFO, FIFO or WRED is merrily dropping more packets and goodput falls through the floor. Especially if retransmission is occurring end-to-end, i.e. both host and server are in the middle of this mess. Basically it'll create huge spikes in retransmission/recovery any time there's congestion.
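Some back-of-envelope arithmetic for the goodput point, as a Python sketch. It assumes each transmission is lost independently with probability p, which real congestion doesn't honour (losses come in bursts), so treat it as a floor on the pain rather than a model.

```python
def expected_transmissions(p):
    """Mean sends per packet if each send is lost independently
    with probability p and the sender retries until one gets through."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return 1 / (1 - p)


for p in (0.01, 0.05, 0.2):
    sends = expected_transmissions(p)
    # Share of transmitted packets that are retransmissions, not new data.
    overhead = 1 - 1 / sends
    print(f"loss {p:.0%}: {sends:.2f} sends per packet, {overhead:.0%} of traffic is retries")
```

Under that assumption the share of traffic that's pure retransmission equals the loss rate, and every retry is extra load on a link that's already dropping packets.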
It's pretty much why real-time stuff like voice and video tends to run with a TCP control session, so the apps have at least some chance of monitoring and managing link state. It's also probably not something the 'network' can fix. So an app may be able to prioritise sessions it thinks are important, but it won't be able to signal that to the network, i.e. routers. That implies prioritisation at the network level, which according to 'Net Neutrality fans is a very bad thing, even though prioritising real-time transmissions is arguably a good thing.
So it's a little strange. We know network connections are frequently congested, packet loss is common, so any new protocol that promises to improve performance by creating more congestion problems seems a bit pointless.