What is it with cloud status pages not reflecting reality?

Internet services in the US on Thursday were far more stable than those in Ukraine and Russia, but even so reports of problems surfaced. DownDetector.com, which tracks service outages from individuals along with real-time data analysis, showed spikes reflecting connectivity issues for Amazon Web Services at about 1700 UTC for …

  1. ShadowSystems

    An easy way to prove it.

    Write a script to Ping & TraceRoute via multiple VPNs, dump the results to a log file, & repeat itself every ~10 minutes throughout the day.

    If you get crap/no connections from a direct connect attempt, but decent/any connection from the VPN endpoints, then you can show the target that what their status page claims about your supposedly available service is a lie.

    Example: if your local ISP is sending your entirely local signal on a ~300 mile round trip before coming back to the entirely local server, you can use the log to prove that fact, then to beat them with that fact to rip & rebuild a proper local circuit.

    AWS can claim what they want, but a simple Ping+TraceRoute script run every few minutes for a few (days/weeks/months) will show their claims to be utter hogwash.

    I did this to prove to my ISP that they were blowing smoke up my arse as far as signal degradation was concerned. If they didn't do an R&R to establish a proper *entirely local* connection to the servers located in my own damned town (rather than the 300+ mile round trip the script proved), I'd use the log as proof when complaining to the various government agencies about the ISP's incompetence/fraud.

    Two days later I had a proper local connection.

    If you're having problems with $Provider claiming to be available when you can't reach them, consider running a Ping+TraceRoute>LogFile.txt for a few days to generate the evidence you need to beat them with a ClueBy4. =-J
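
    The Ping+TraceRoute logger described above could be sketched roughly as follows. This is a minimal sketch, not the script the poster actually ran: the function names, the log format, and the 600-second default interval are assumptions, and it probes only from the machine it runs on, so covering the VPN endpoints would mean running one copy per vantage point.

```python
import datetime
import platform
import subprocess
import time


def build_commands(target, system=None):
    """Return the ping and traceroute command lines for this platform."""
    system = system or platform.system()
    if system == "Windows":
        return [["ping", "-n", "4", target], ["tracert", target]]
    return [["ping", "-c", "4", target], ["traceroute", target]]


def run_probe(cmd, timeout=120):
    """Run one command and return (exit_code, combined_output)."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return done.returncode, done.stdout + done.stderr
    except subprocess.TimeoutExpired:
        return -1, "timed out after %d s\n" % timeout
    except FileNotFoundError:
        return -2, "command not found: %s\n" % cmd[0]


def format_entry(when, cmd, code, output):
    """Format one timestamped log entry (hypothetical layout)."""
    header = "=== %s | %s | exit=%d ===" % (when.isoformat(), " ".join(cmd), code)
    return header + "\n" + output.rstrip("\n") + "\n\n"


def log_once(target, logfile, now=None):
    """Append one ping result and one traceroute result to the log file."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    with open(logfile, "a", encoding="utf-8") as fh:
        for cmd in build_commands(target):
            code, output = run_probe(cmd)
            fh.write(format_entry(now, cmd, code, output))


def run_forever(target, logfile, interval_s=600):
    """Repeat the probe every interval_s seconds until interrupted."""
    while True:
        log_once(target, logfile)
        time.sleep(interval_s)
```

    Calling run_forever("example.com", "LogFile.txt") would append a timestamped entry pair every ten minutes; the UTC timestamps are what make the log usable as evidence later.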

    1. Anonymous Coward

      Re: An easy way to prove it.

      "Two days later I had a proper local connection."

      Or, two days later you had a 300+ mile round trip connection implemented at layer 2 instead of layer 3.

    2. The Man Who Fell To Earth Silver badge
      FAIL

      Re: An easy way to prove it.

      Amazon lies about everything. I've been tracking their on-time delivery rate for years. Since the beginning of 2022, I've had 38 deliveries from Amazon and their on-time rate is only running 92.31%. On-time is defined as delivering on the date they promised at the time of the order.

  2. Nate Amsden

    Dyn

    Dyn had a pretty big outage, I think last Thursday; it was regional. Their status page indicated they were in maintenance doing some big migration. The maintenance was taking a month or more because they were taking their time to be careful. But for at least 30 minutes, DNS was unreachable in several areas. I emailed them and they finally acknowledged the issue and updated their status page (with a "partial outage" message).

    https://www.dynstatus.com/incidents/kc7plp9945ng

    Kind of surprised I didn't notice any news articles about it. Still waiting to see if they release a root cause, but from my perspective it was their biggest outage since the big DDoS attacks many years ago. Though it was regional, it took a long time for my external monitors to trip. The issue was related to their anycast system; BGP broke down somewhere. Dyn has a super strict SLA too (unlike Amazon): something like if you can't reach their DNS for more than 15 seconds then it's an outage. This was consistently unreachable for a very long period of time from certain regions (my home connection was unaffected).

    Two decent outages in ~13 years as a customer of Dyn (that I can recall anyway). Not perfect, but something I can live with. I think Amazon had more outages in Q4 2021 alone. I've had more outages from our CDN providers as well over the years, maybe half a dozen decent outages in the past decade.
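
    An external monitor applying the "unreachable for more than 15 seconds" rule mentioned above might look something like this. It is a sketch under stated assumptions, not Dyn's SLA measurement or the poster's monitoring setup: the function names, the 2-second probe timeout, and the 5-second sampling interval are all hypothetical, and the probe is a hand-built UDP query for an A record.

```python
import socket
import struct
import time


def build_query(name, query_id):
    """Build a minimal DNS query packet for an A record (recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def probe(server_ip, name, timeout_s=2.0, query_id=0x1234):
    """Return True if server_ip answers a query for name within timeout_s."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        try:
            sock.sendto(build_query(name, query_id), (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable-network errors
            return False
    return len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == query_id


def collect(server_ip, name, count, interval_s=5.0):
    """Take count samples as (monotonic_seconds, reachable) pairs."""
    samples = []
    for _ in range(count):
        samples.append((time.monotonic(), probe(server_ip, name)))
        time.sleep(interval_s)
    return samples


def outage_windows(samples, threshold_s=15.0):
    """Return (start, end) for every run of failures longer than threshold_s.

    A run starts at its first failed sample and ends at the next successful
    sample, or at the last failed sample if the data ends mid-run.
    """
    windows = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        else:
            if start is not None and ts - start > threshold_s:
                windows.append((start, ts))
            start = None
    if start is not None and last_fail - start > threshold_s:
        windows.append((start, last_fail))
    return windows
```

    Running the same collector from several regions matters here, since a monitor in an unaffected region would record no outage windows at all.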

  3. DS999 Silver badge

    Around the same time

    Apple supposedly had some brief outages with iCloud and iMessage around that same time. The former was maybe caused by AWS issues, but they use their own servers for iMessage (and the traffic usually goes direct unless both are behind NATs/firewalls) so there may have been something bigger happening at that time.

    Not much of a stretch to suggest it may be somehow related to the situation in Ukraine, i.e. Putin's henchmen doing some quick tests of their ability to knock down western infrastructure if given the order.

  4. GuildenNL

    Was MUCH larger than even reported here

    My ISP was down across the USA (Cox). Thinking major USA cyber attack, I checked 15-20 major ISPs and services; all had a hit at this time. Google, Azure and AWS took hits. The fact that nothing was reported in public makes me think my initial suspicions were not far off. Was a 5-10 minute hit.

    1. cyberdemon Silver badge
      Paris Hilton

      Re: Was MUCH larger than even reported here

      A cyber attack may be likely, but what makes you think the aggressor was the USA, rather than say, Russia?

      1. GuildenNL

        Re: Was MUCH larger than even reported here

        I didn’t say who. My thoughts were either Russia or Ukraine (let’s be honest and admit where many of these good live.) Perhaps Ukraine to anger America?

        Who knows?

        I tried to get comments from former L3 people and got no comment. Something was off on a backbone or two, but what and why?

  5. ronkee

    Status page is mostly marketing. If they were really serious about it being usable it would be a lot more usable.

    They host their status page on their own infrastructure. Hence it breaks whenever there's a really big outage. If your first architectural decision is that compromised then it sets a low bar for trust. They'd never let a spokesperson say that of course.

  6. Doctor Syntax Silver badge

    Meh

    File in the Rice-Davies folder.

  7. thejoelr

    Automated status page..

    During the wonderful 2021 winter AWS outages, one was so bad they couldn't update the status page as they did normally and had to find another way. So, automation would be prone to such a situation. That said, yes, status pages are usually useless and one of the last places I hear about an outage.

    1. bombastic bob Silver badge
      Unhappy

      Re: Automated status page..

      It does not help when the status page is hosted by the troubled network...

  8. Steve Davies 3 Silver badge
    Pirate

    Clouds are not permanent

    They aren't solid. They come and go. AWS is just like clouds in the sky. Here today, gone tomorrow.

    Sadly the beancounters out there rule the IT roost. They see 'putting anything and everything' in the cloud as a way of removing CAPEX from their bottom lines.

    One day, one of them will get royally hacked and be down for days, costing its customers billions.

    With what Putin is doing in the Ukraine, what is to stop him from getting his hackers to take down AWS or any of the other major cloud services? Nothing.

    Watch this space...

  9. Anonymous Coward

    It's not complete

    When I was at Amazon, EU-WEST-1 had a fibre bundle down to one data centre, which affected some production loads we were running. This affected one building, not the full AZ, and the status page showed no issues. But there were issues for our production load, which failed. So it depends on what you think the status page reflects.

  10. bombastic bob Silver badge
    Facepalm

    Rule #1 of Outage Club - do not talk about Outage Club!

    The comments by the AWS spokes-droid at the end of the article are SO typical of the kinds of "There ARE no problems" spin-meistering you seem to get from governments and large corporations. Everything burning down around you, with frequent explosions and sounds of collapsing buildings nearby, and it's a "minor problem" that you're "working on" and "everything's ok, no worries, la la la, in my own world now."

    that 'reality' thing... "My beautiful bubbles, stop bursting them!!!" (a quote from a P.A. in one of my favorite video games)

    icon, because, facepalm

  11. DJV Silver badge

    What I want to know is...

    ...if anyone out there has a status page that's monitoring whether or not the Status Page Status Page is up and running!

  12. OldSod
    Pint

    It can happen to the best of us

    Just this morning I could reach theregister.com on my mobile phone, but not on my laptop. The laptop showed a resolution for the name, and I could ping the name/IP address, but trying to make an http(s) connection to theregister.com resulted in a "server not found" error in both Safari and Firefox. I rebooted my laptop and could then get to The Register's web pages.

    I have suffered myself trying to figure out how to show whether a service that I was responsible for was up or down when the answer was almost always "it's mostly up but decidedly down for some folks". The problem with obtaining an "are they up or not" view of any major Internet service is compounded by the fact that we don't all see the same view of the Internet due to things like different routing paths, proxies, and (for media) content delivery networks.

  13. Anonymous Coward

    AWS’s Personal Health Dashboard lies too, it’s absolutely useless in my experience.

  14. Anonymous Coward

    Standard

    When using a standard service management system, a status page can be generated automatically. The statuses only change if a P1 or P2 incident is raised and there’s a process behind that.
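
    As a rough illustration of a status page derived from incident records, the sketch below colours each service from its open P1 and P2 incidents. The Incident type, the field names, and the mapping of P1 to red and P2 to yellow are assumptions made for the example; they are not taken from any particular service management product.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    service: str       # name of the affected service
    priority: int      # 1 = P1, 2 = P2, and so on
    open: bool = True  # closed incidents no longer affect the status


def service_status(incidents, services):
    """Return {service: colour} using only open P1 and P2 incidents."""
    status = {name: "green" for name in services}
    for inc in incidents:
        if not inc.open or inc.service not in status:
            continue
        if inc.priority == 1:
            status[inc.service] = "red"
        elif inc.priority == 2 and status[inc.service] != "red":
            status[inc.service] = "yellow"
    return status
```

    Note what this implies: a service with only a P3 ticket, or with a real problem for which nobody has raised an incident yet, stays green.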

    1. yetanotheraoc Silver badge

      Standard Operating Procedure aka Positive Feedback Loop

      Does the process involve the service desk looking at the status page, and only agreeing to open a ticket if it's not green?

  15. Henry Wertz 1 Gold badge

    Network problem?

    More likely it's just what the article suggests, that it's at a manager's discretion to flip the flag to yellow or red and they just didn't do that.

    But another possibility: Amazon could have had everything working, with all their peer internet links up, while some link between them and some users was down. I had a "Google outage" once: my ISP's links were up, I think Google's were up, and running "mtr" (a modern traceroute replacement) showed some link in between was down (... and the traffic was not being properly rerouted through some other path). It came back up some time later; I didn't re-check with mtr whether the link came back up or the traffic was rerouted.
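
    When reading per-hop results like those mtr reports, one common rule of thumb is that loss only counts as a real forwarding problem if it continues through to the final hop; loss at a single intermediate hop that later hops do not share is usually that router rate-limiting its own ICMP replies. The helper below is a sketch of that rule. The function name, the 5% cut-off, and the idea of passing in a plain list of loss percentages are assumptions for illustration.

```python
def first_persistent_loss(loss_by_hop, min_loss_pct=5.0):
    """Return the 1-based hop where loss starts and persists to the last hop.

    loss_by_hop is a list of packet-loss percentages, one per hop, in path
    order. Returns None when no loss reaches the destination.
    """
    candidate = None
    for hop_number, loss in enumerate(loss_by_hop, start=1):
        if loss >= min_loss_pct:
            if candidate is None:
                candidate = hop_number
        else:
            # A clean later hop means the earlier loss was not on the path.
            candidate = None
    return candidate
```

    With hypothetical readings of [0, 0, 0, 100, 100, 100] the helper points at hop 4, which is the kind of "some link in between was down" picture described above.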

  16. beekir

    Azure, Too

    I can't speak to AWS but Microsoft has a long-established track record with keeping silent about widespread outages. I frequently find headlines about Azure outages but they never seem to show up on the official status and/or history page.

  17. Anonymous Coward

    Threshold

    I was told (by AWS) they apply a threshold. So if the customer impact is under x % then the status is green. The magic threshold number is secret.
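
    The effect of such a threshold is easy to show with a little arithmetic. In the sketch below, the function name and the 5% figure used in the example are purely hypothetical, since the real threshold is not public.

```python
def reported_green(impacted, total, threshold_pct):
    """Return True if the share of impacted customers is under the threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    impact_pct = 100.0 * impacted / total
    return impact_pct < threshold_pct
```

    Under a hypothetical 5% threshold and 1,000,000 customers, reported_green(49_000, 1_000_000, 5.0) is True, so 49,000 customers could be down while the page stays green.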

  18. Disgusted Of Tunbridge Wells Silver badge
    Coffee/keyboard

    EA, Sony and Activision are all terrible for this too
