
Cheers to that guy who hit the ENTER key!
I've certainly done some eff-ups, but nothing that comes remotely close to this scale.
The Atlassian outage that began on April 5 is likely to last a bit longer for the several hundred customers affected. In a statement emailed to The Register, a company spokesperson said the reconstruction effort could last another two weeks. The company's spokesperson explained its engineers ran a script to delete legacy data …
"We maintain extensive backup and recovery systems"
Backup maybe but not recovery.
There does not seem to be a system in place to recover and restore this particular set of data, because if there had been a well-maintained and tested recovery system in place, the data would have been restored days ago.
It would not have taken a week to restore just 9% of their customers' data.
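To illustrate the "tested" part of that: a backup only really counts once it has been restored somewhere and checked. Below is a minimal, hypothetical sketch of such a restore drill in Python, using sqlite3 as a stand-in; the paths and the sites table are made up, and a real service would use its own vendor's dump/restore tooling.

import os
import sqlite3
import tempfile

# Everything here is illustrative: the point is only that a backup isn't proven
# until it has actually been restored into a scratch location and verified.
workdir = tempfile.mkdtemp()
live, backup, scratch = (os.path.join(workdir, n) for n in ("live.db", "backup.db", "scratch.db"))

# Pretend production database.
with sqlite3.connect(live) as db:
    db.execute("CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany("INSERT INTO sites (name) VALUES (?)", [("alpha",), ("beta",)])

def copy_db(src_path, dst_path):
    """Copy one SQLite database into another using the online backup API."""
    with sqlite3.connect(src_path) as src, sqlite3.connect(dst_path) as dst:
        src.backup(dst)

copy_db(live, backup)     # the backup everyone has
copy_db(backup, scratch)  # the restore drill that often gets skipped

restored = sqlite3.connect(scratch).execute("SELECT COUNT(*) FROM sites").fetchone()[0]
expected = sqlite3.connect(live).execute("SELECT COUNT(*) FROM sites").fetchone()[0]
assert restored == expected, f"restore drill failed: {restored} != {expected}"
print(f"restore drill OK: {restored} rows recovered")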
Or the Backup System is designed to recover from a System level failure - not deleted individual data sets? So they're having to rebuild a complete system from backups before they can pull the data sets?
Suspect this setup is common in cloud land, hence the need for customers to somehow keep backups of their own data sets (read the small print for MS365/Azure, for example).
Given the embarrassment caused by this outage, cloud providers need to look at their customer backup/recovery processes?
Probably this.
Working with Confluence and Jira a good number of years ago, the only way to give someone a copy of one workspace was to back up everything, then restore it all to a new instance, delete everything you didn't want and then export what was left over.
Caught in their own incompetence, it seems.
@thondwe “Or the Backup System is designed to recover from a System level failure - not deleted individual data sets? So they're having to rebuild a complete system from backups before they can pull the data sets?”
I think you are very much on the money there. Probably made all the more difficult and time-consuming by what this AC described:
“the only way to give someone a copy of one workspace was to back up everything, then restore it all to a new instance, delete everything you didn't want and then export what was left over.”
It looks like there is no way to export/backup and import/restore at the workspace or even the individual customer level. If that is the case, then Jira/Confluence etc. is missing a very important disaster recovery tool/feature, both for Atlassian's customers and for Atlassian themselves.
Any ransomware outfit would be delighted to come up with a script that does so much damage for so little effort - the destructive efficiency on show is unprecedented. I hope Atlassian are keeping the script under lock and key, because we would be well and truly f**ked if the Russians got hold of it!
But on a more serious note, I would like to offer heartfelt and genuine sympathy to the person or persons responsible. It could happen to the best of us at some point in our careers. I can only imagine how they have felt the last week or two, chin up guys and/or girls!
@thondwe
"Given the embarrassment caused by this outage, cloud providers need to look at their customer backup/recovery processes?"
They won't. It's just easier to go cheap and write it off. This is clearly a case of an incomplete business continuity plan (BCP), which is another area that businesses like to go cheap on. Maybe backup software, but no recovery plan at all.
>> It sounds like Atlassian haven't properly tested their backups.
That's a hugely simplistic and disingenuous statement to make!
Imagine your standard database from a mainstream database vendor. You can have in place all the fault tolerance, log shipping, full and incremental backups, offsite storage and the ability to restore that database (relatively) quickly.
Now tell me how any of that will allow you to effectively restore data after a script has selectively removed thousands of records from dozens of tables.
That sounds much more like what Atlassian are facing than a simple "restore the backup" scenario.
Not that this is intended to be a defence of Atlassian. When you have a process that may delete copious amounts of data, it's incumbent on you to take whatever time is necessary to ensure it does (only) the right thing. If that means taking three times as long so it can run in a "what if" mode first, then that's what you need to do. Yes, it will take longer to do that, but I would suggest not as long as it's taking Atlassian to recover from this.
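For what it's worth, a minimal sketch of what such a "what if" mode might look like for a clean-up script is below. Everything in it (the catalogue, the site IDs, the --execute flag) is hypothetical; it only illustrates the preview-then-confirm pattern.

import argparse

# Hypothetical catalogue of sites; a real script would query the production store.
SITES = {
    "site-101": {"legacy": True},
    "site-102": {"legacy": False},
    "site-103": {"legacy": True},
}

def find_targets():
    """Work out what the clean-up would touch, without touching anything."""
    return [site_id for site_id, meta in SITES.items() if meta["legacy"]]

def delete(site_id):
    print(f"deleting {site_id}")
    del SITES[site_id]

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="legacy data clean-up (sketch)")
    parser.add_argument("--execute", action="store_true",
                        help="actually delete; the default is a dry run that only reports")
    args = parser.parse_args()

    targets = find_targets()
    print(f"{len(targets)} site(s) would be affected: {targets}")

    if args.execute:
        for site_id in targets:
            delete(site_id)
    else:
        print("dry run only; re-run with --execute once someone has reviewed the list")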
Arguably paranoia is the first requirement for anyone in the computer industry. We're all fighting against Resistentialism.
Yet another reason why I loathe the drive to cloud: when it falls over (not if), you are fully at the mercy of a cost-optimised, likely understaffed company, hoping it has a working backup and recovery system to sort it out at its leisure. When you have control over the system, you are at the mercy of your own backup and recovery procedures, which you yourself can influence.
after someone "stops" my PHB, because his brain is permanently wired into accepting vendor wooing/outright kickbacks, and loving shiny-new-trendy things. He gets away with it because he's a "people-person" and he works it like a god. All his higher-ups think he's the greatest thing since toast.
Two incidents reflect the "benefits" of cloud.
1. I had to stand up a Citrix CVAD environment in the cloud service a couple of years back. If it had been on premises, I could have had the basic service set up in a day, then just spent time on tuning and optimising the VDAs. In cloudland it took a week to get the basic service up because, whenever I hit an issue, I could not do any troubleshooting and had to log a job and wait for them to get back to me. The first ticket I had to log was because the cloud service sign-up portal wasn't working, and it went downhill from there.
2. A client wanted to stand up some extra Azure VMs of a type they were already using. Unfortunately, due to capacity issues, these particular VM types were all reserved for large clients. Our client didn't fit into this category, so they had to build new servers using a different VM type. Basically: "We are only giving these VMs to people we give a shit about, and that doesn't include you, matey! Just pay your bill and move along."
"Instead of deleting the legacy data, the script erroneously deleted sites, and all associated products for those sites including connected products, users, and third-party applications."
With an SQL database any sufficiently paranoid DBA will run a SELECT with the same WHERE clause as the intended DELETE to check that what would be deleted is what's intended. I suppose this is all trendy NoSQL stuff. Doesn't that have the same facility?
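To illustrate the habit: the toy sketch below previews exactly the rows a DELETE would hit by running a SELECT with the same WHERE clause first, and only deletes after a human has looked at the preview. It uses Python's sqlite3 with made-up table and column names.

import sqlite3

# Made-up schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO sites (status) VALUES (?)",
                 [("legacy",), ("active",), ("legacy",)])

where_clause = "status = ?"
params = ("legacy",)

# Step 1: preview exactly what the DELETE would touch.
rows = conn.execute(f"SELECT id, status FROM sites WHERE {where_clause}", params).fetchall()
print(f"Would delete {len(rows)} row(s): {rows}")

# Step 2: only run the DELETE once the preview has been sanity-checked.
if input("Proceed with delete? [y/N] ").strip().lower() == "y":
    conn.execute(f"DELETE FROM sites WHERE {where_clause}", params)
    conn.commit()
    print("Deleted.")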
If that was my cloud thing, I'd ensure that any wholesale deletes go through a multistage process, something like the sketch after this list:
get a list of everything that will be affected by the delete process
determine a process to undo what you're about to do
mark each of those entries/files with a will-be-deleted-on-x-date flag
see what is actually still using those entries/files; if they are marked for deletion, no user processes should be accessing them
remove access to those identified entries/files & see what breaks
if no one has complained in 4 weeks then fully remove those entries/files.
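Something like this minimal sketch, where the catalogue, keys and grace period are all made up, and the undo and access-removal stages are only hinted at in the comments:

from datetime import date, timedelta

# Hypothetical in-memory catalogue standing in for whatever store the real
# service uses; each entry records when it may be purged and when it was last used.
catalogue = {
    "site-123/attachments": {"delete_on": None, "last_accessed": date(2022, 3, 1)},
    "site-456/pages":       {"delete_on": None, "last_accessed": date(2022, 4, 20)},
}

def list_affected(keys):
    """Report exactly what the delete would touch, without touching it."""
    return [k for k in keys if k in catalogue]

def mark_for_deletion(keys, grace_days=28):
    """Flag entries with a will-be-deleted-on date instead of deleting them.
    (Working out how to undo the whole thing would happen before this step.)"""
    deadline = date.today() + timedelta(days=grace_days)
    for key in keys:
        catalogue[key]["delete_on"] = deadline
    return deadline

def still_in_use(recent_days=28):
    """Anything flagged for deletion but recently accessed is a red flag."""
    cutoff = date.today() - timedelta(days=recent_days)
    return [k for k, v in catalogue.items()
            if v["delete_on"] is not None and v["last_accessed"] > cutoff]

def purge_expired():
    """Only remove entries whose grace period has passed with no complaints
    (access would already have been cut off to see what breaks)."""
    expired = [k for k, v in catalogue.items()
               if v["delete_on"] is not None and v["delete_on"] <= date.today()]
    for key in expired:
        del catalogue[key]
    return expired

targets = list_affected(["site-123/attachments"])
print("would affect:", targets)
print("flagged until:", mark_for_deletion(targets))
print("flagged but still in use:", still_in_use())
print("purged today:", purge_expired())  # stays empty until the grace period elapses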
It's expensive, time-consuming, and no one will know when it goes really well, which is actually the point.
You want to make absolutely certain mistakes won't happen, so you over-engineer things. That's why people buy cloud solutions: they expect the careful dedication to ensuring things go right is being done by the cloud provider and included in whatever the fee is. The cloud provider benefits from its scale by having teams manage all that boring stuff, instead of each customer having all the necessary teams to do it individually.
they expect the careful dedication to ensuring things go right is being done by the cloud provider
Give me a moment until I have finished laughing, sorry. The big not-so-secret of these cloudy providers is that they have the same bookkeepers that moved your own company to the cloud in the first place, so I'm afraid you may have to tone down your expectations to the "almost good enough" level, because that's what you're going to get. After all, the fog surrounding cloudy things ensures you have zero insight into how the vapour is actually produced.
They too have to satisfy shareholders that they have not left a cent/penny/dime on the table.
This is why I'm holding onto my Server instance of Jira as long as I can. At this point it's either selling Data Center to the PHB or finding another platform. As far as I'm concerned, they're pushing a less capable, less available product to try and make the execs more money.
These kinds of screw-ups in the cloud are why I won't trust core services to a cloud provider without multiple layers of redundancy I control. Inevitably someone will screw up, and I'll be the one paying for their "learning experience".