Not really, distributed computing has been around for a while. Deutsch et al’s eight fallacies of distributed computing were written in 1994-1997 and all of them apply to a cloud world:

“The network is reliable” is fallacy #1. Not considering failure cases properly for a service run by a different organisation at the other end of a WAN is pretty ridiculous. Experienced designers/architects don’t trust the network between two of their own services in neighbouring racks in a physical datacentre.

(Side note: Anyone running a service in a cloud should assume they’re eventually going to see some truly weird failure conditions, given the multiple levels of compute and network virtualisation stacked atop each other and run on heterogeneous no-name hardware. If your application’s designed and monitored correctly, it shouldn’t matter that much.)

I can understand why problems connecting to APNs would break messaging, and hence authentication, for iOS users. What is harder to understand is why a backlog formed and caused further problems. Keeping a long backlog of remote API requests (or doing unbounded retries etc.) which are irrelevant after a few tens of seconds, because they are feeding into an interactive system, is not a desirable property...
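
To make the backlog point concrete, here is a rough sketch of the alternative: a bounded queue where every request carries a deadline and stale ones are thrown away instead of being sent late. This is only an illustration, not how the service in question works. The class name ExpiringQueue, the max_len and ttl parameters, and the 30-second figure in the usage note are all made up for the example.

```python
import time
from collections import deque


class ExpiringQueue:
    """Bounded FIFO that discards requests older than ttl seconds.

    Hypothetical helper for illustration; names and limits are assumptions.
    """

    def __init__(self, max_len, ttl, clock=time.monotonic):
        self._items = deque()
        self._max_len = max_len
        self._ttl = ttl
        self._clock = clock  # injectable so the behaviour can be tested
        self.dropped = 0

    def put(self, request):
        # When full, shed the oldest entry rather than growing without bound.
        if len(self._items) >= self._max_len:
            self._items.popleft()
            self.dropped += 1
        self._items.append((self._clock() + self._ttl, request))

    def get(self):
        # Skip anything whose deadline has passed; None means nothing fresh is left.
        while self._items:
            deadline, request = self._items.popleft()
            if self._clock() <= deadline:
                return request
            self.dropped += 1
        return None

    def __len__(self):
        return len(self._items)
```

With something like ExpiringQueue(max_len=1000, ttl=30), an outage at the remote end costs you at most the requests made during the outage plus a bounded tail. Once the remote service recovers, the worker only sends requests a user might still be waiting on, and the dropped counter gives monitoring something to alert on.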

