"Oops! There was a problem finding suggestions for you."
There's never a good time for these cutesy "oopsie!" error messages. Why do companies insist on using them?
The domain name server "OpenSRS" crashed and burned during the dead of night following a network failure, taking down scores of customers' portals with it. Normal service has yet to resume. OpenSRS is owned by the operating service Tucows, and according to builtwith.com, there are about 690,000 active websites using it. …
Almost 40 million residents of Japan spent the weekend in The Time Before Smartphones after local telco KDDI Corp. experienced its biggest outage to date – affecting both voice calls and data communications.
Luckily for the company and its customers, the outage began in the wee hours of Saturday morning – 1:35AM (1635 Friday UTC) to be exact – rather than peak time.
However, the disruptions did drag on, at least for some, until Monday morning.
Opinion Edge is terribly trendy. Move cloudy workloads as close to the user as possible, the thinking goes, and latency goes down, as do core network and data center pressures. It's true – until the routing sleight-of-hand breaks that diverts user requests from the site they think they're getting to the copies in the edge server.
If that happens, everything goes dark – as it did last week at Cloudflare, edge lords of large chunks of web content. It deployed a Border Gateway Protocol policy update, which promptly took against a new fancy-pants matrix routing system designed to improve reliability. Yeah. They know.
It took some time to fix, too, because in the words of those in the know, engineers "walked over each other's changes" as fresh frantic patches overwrote slightly staler frantic patches, taking out the good they'd done. You'd have thought Cloudflare of all people would be able to handle concepts of dirty data and cache consistency, but hey. They know that too.
Here's a novel cause for an internet outage: a beaver.
This story comes from Canada, where CTV News Vancouver yesterday reported that Canadian power company BC Hydro investigated the cause of a June 7 outage that "left many residents of north-western British Columbia without internet, landline and cellular service for more than eight hours."
That investigation found tooth marks at the base of a tree that fell across BC Hydro wires. A Canadian mobile network operator shares the poles BC Hydro uses, so its optical fibre came down with the electrical wires.
Infrastructure operators are struggling to reduce the rate of IT outages despite improving technology and strong investment in this area.
The Uptime Institute's 2022 Outage Analysis Report says that progress toward reducing downtime has been mixed. Investment in cloud technologies and distributed resiliency has helped to reduce the impact of site-level failures, for example, but has also added complexity. A growing number of incidents are being attributed to network, software or systems issues because of this intricacy.
The authors make it clear that critical IT systems are far more reliable than they once were, thanks to many decades of improvement. However, data covering 2021 and 2022 indicates that unscheduled downtime is continuing at a rate that is not significantly reduced from previous years.
Internet interruption-watcher NetBlocks reported internet outages across Pakistan on Wednesday, perhaps timed to coincide with large public protests over the ousting of Prime Minister Imran Khan.
The watchdog organisation asserted that outages started after 5:00PM and lasted for about two hours. NetBlocks referred to them as “consistent with an intentional disruption to service.”
The Atlassian outage that began on April 5 is likely to last a bit longer for the several hundred customers affected.
In a statement emailed to The Register, a company spokesperson said the reconstruction effort could last another two weeks.
The company's spokesperson explained that its engineers ran a script to delete legacy data as part of scheduled maintenance for unidentified cloud products. But the script went above and beyond its official remit by trashing everything.
Who, Us? Atlassian has published an account of what went wrong at the company to make the data of 400 customers vanish in a puff of cloudy vapor. And goodness, it makes for knuckle-chewing reading.
The restoration of customer data is still ongoing.
Atlassian CTO Sri Viswanath wrote that approximately 45 percent of those afflicted had had service restored but repeated the fortnight estimate it gave earlier this week for undoing the damage to the rest of the affected customers. As of the time of writing, the figure of customers with restored data had risen to 49 percent.
It appears that even users of Elon Musk's Starlink service are not immune to the odd bit of borkage as the broadband-from-orbit system suffered an outage at the weekend.
Starlink boasts of serving up high-speed, low-latency broadband via its constellation of satellites, claiming download speeds of between 100Mb/s and 200Mb/s and network latency as low as 20ms in some locations.
Users around the world reported issues on Saturday morning, around 04:20 Eastern (11:00 UTC), with dishes stuck hunting for satellites. Customers in Europe and the US were forced to face the horror of the real world for as long as 20 minutes as their Musk-provided service became unavailable.
Atlassian opened its Las Vegas Team '22 event this week with products aimed at giving management an insight into projects – and developers an insight into what is actually going on.
The timing of the launch is just a little awkward: Atlassian is suffering a significant ongoing outage, with Jira, Confluence and Opsgenie unavailable for some customers. Trello and Bitbucket remain operational.
What's also unfortunate is the stuff unveiled on Wednesday is all about communication and keeping an eye on projects. The three new products, in various stages of early access, are Atlas, Compass, and Analytics.
Updated DevOps outfit CircleCI is suffering from performance issues that have been ongoing for most of the day, with developers left waiting for the platform to come back online.
The service keeps an eye on the likes of GitHub and Bitbucket and fires off a build when commits are made. It then tests builds and pings out notifications if things go wrong. The platform also features a managed cloud service.
However, today things went a bit awry. CircleCI described the issue as "degraded performance," which had begun with "database read/write delays." Meanwhile, over on Twitter, the firm promoted a Lego raffle as users became increasingly antsy.
Biting the hand that feeds IT © 1998–2022