
The fallibility of cloud storage?
This is news to me. Everything we've been told up to now was: move to "The Cloud". "The Cloud" is more reliable and cheaper than the old-fashioned solution.
Rackspace is running out of, er, space. At least as far as a portion of its Cloud Files customers served by the LON (London) region are concerned. According to the company, problems began on November 17 at 1545 UTC, as some customers experienced 503 errors when attempting to access their files. On November 24 – yes, a week …
"However, the question remains – how did this happen?"
Well, there are two scenarios.
The first is that the needle on their storage gauge got stuck at a certain point; they tapped the glass and it suddenly spun round past the red.
The second is that they have the same sort of monitoring tools as they had at Chernobyl, and the meter only went so high.
Because GDPR effectively requires me to retain control of my data and for it never to be processed anywhere that isn't subject to GDPR or similar regulations.
Yes, storing or moving encrypted data around is "processing".
Though I'm a mathematician and computer scientist and agree with you on this (you should be able to publish all your traffic and your public key on the front page of every newspaper and it shouldn't matter), legal requirements mean that you just can't do that for anything that is considered "personal data".
Guess what? Much of the data any company, business, charity, school or organisation stores is "personal data".
And the scope of "processing" is so broad that, no, I can't transmit my data to the US, even temporarily.
Also, the 'cloud' is more than storage; it's processing of that data. Processing requires decrypting, so the keys have to be stored in the 'cloud' too, which means the data will be plain text at some point, and therefore those foreign entities have access to your data.
If it's only storage you intend to use, sure, except now you are at the mercy of that foreign entity: if NATO stored its data in Russia, it could lose access at critical points, which means keeping multiple copies (as you should anyway). Also, all that data would need to be encrypted with quantum-resistant algorithms to ensure it could not be decrypted within the data's life expectancy.
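For what it's worth, here's a rough Python sketch of the "storage only" case: encrypt on your own hardware and only ever ship ciphertext. It uses the `cryptography` package's Fernet recipe purely as an illustration (ordinary symmetric crypto, not a quantum-resistant scheme), and the file names and key-handling comments are assumptions, not anyone's actual setup.

```python
# Minimal sketch: encrypt locally before handing anything to a storage provider,
# so the provider only ever sees ciphertext. Uses the 'cryptography' package;
# the file names and the upload step are purely illustrative.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # keep this OUT of the cloud (HSM, vault, offline copy)
cipher = Fernet(key)

with open("customer_records.csv", "rb") as f:
    ciphertext = cipher.encrypt(f.read())

with open("customer_records.csv.enc", "wb") as f:
    f.write(ciphertext)              # only the .enc file ever leaves your premises

# The moment you want the provider to *process* the data rather than just hold it,
# the key has to travel too, and the plaintext exists on their hardware.
```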
That ignores that regions can usually be used for GDPR and other compliance/regulatory requirements. They also provide failure zones, so the whole infrastructure doesn't go down with a single screw-up, just the zone, and they allow you to plan for said failure, as that is your side of the deal with the 'cloud', not theirs. It's also about locality for performance reasons, so that the data is close to the users of said data.
It's a bold move. Encryption standards become obsolete over time. Who knows what novel techniques or processing power will be available in twenty years? How will quantum computers affect present-day encryption?
You'd be very trusting to assume that your encrypted data is safe forever.
If they are only provisioning storage when the disks are full, it means that they didn't place an order for a few million quid's worth of SSDs whilst the array was 80% full. Which means that either they could not afford it (get out now!), their budgetary control system prioritizes bonuses over reliability (get out now), or they actually had no idea that disk space was getting low, because it's too hard to compute or their monitoring is poor (get out now).
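As a rough illustration of the kind of check that should fire long before an array fills, here's a minimal Python sketch; the 80%/90% thresholds and the mount points are made up for the example and aren't anyone's real tooling.

```python
# Rough sketch of a capacity check that raises the alarm well before an array is full.
# Thresholds and paths are illustrative, not anyone's actual config.
import shutil

WARN, CRITICAL = 0.80, 0.90

def check_capacity(path: str) -> str:
    usage = shutil.disk_usage(path)
    used_frac = (usage.total - usage.free) / usage.total
    if used_frac >= CRITICAL:
        return f"CRITICAL: {path} at {used_frac:.0%} - stop taking writes, expand now"
    if used_frac >= WARN:
        return f"WARNING: {path} at {used_frac:.0%} - raise the purchase order"
    return f"OK: {path} at {used_frac:.0%}"

for mount in ("/srv/objects-a", "/srv/objects-b"):   # hypothetical mount points
    print(check_capacity(mount))
```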
I'm pretty sure this storage isn't using SSDs for anything beyond metadata; bulk storage like this is always on NL (nearline) drives. If only a few per cent of requests are failing, it sounds more like uneven distribution of data across the storage systems: there may be lots of space available overall, but it's distributed across several different silos (most likely), and some of those could be getting full, resulting in slower access times or allocation errors (just guessing, of course).
With some storage systems, at least in the past, if you were 80% full it was too late already; you had to start planning to add more at 50-60% full. It all depends on how well the storage stack handles such situations; I've never used OpenStack and have never managed an object storage system, so I can't say myself. Another factor could be fragmentation, and reclaiming deleted space for use by other things. The systems may be struggling to rebalance themselves under the load too, perhaps due to limitations in rack space and/or power, with Rackspace trying to eke too much capacity out of too little space (meaning maybe they need more racks/power and don't have enough available).
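To make the "plenty of space in aggregate, but individual silos full" failure mode concrete, here's a toy Python calculation; all the silo names and numbers are invented for the example.

```python
# Toy illustration of the "uneven distribution" failure mode described above:
# plenty of free space cluster-wide, but one silo past the point where
# allocations start failing. All names and numbers are invented.
silos = {                      # silo name -> (used_tb, total_tb)
    "lon-zone-1": (720, 800),
    "lon-zone-2": (790, 800),  # effectively full: new writes here will error
    "lon-zone-3": (310, 800),
}

total_used = sum(u for u, _ in silos.values())
total_cap = sum(t for _, t in silos.values())
print(f"cluster-wide: {total_used / total_cap:.0%} used")   # looks comfortable

for name, (used, total) in silos.items():
    frac = used / total
    flag = "  <-- rebalance or stop allocating here" if frac > 0.95 else ""
    print(f"{name}: {frac:.0%} used{flag}")
```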
Oftentimes the operator of the storage system doesn't know/expect such problems to exist until they actually encounter the situation for themselves. Such situations have happened to me on many occasions; the one I like to cite most is from back in 2010, when I deployed a new NAS cluster onto my 3PAR T400 array. The companies were partnered with each other, so they knew each other's abilities, and the NAS vendor assured me they were thin-provisioning friendly. After a month of operation that proved to be a lie.
Fortunately the 3PAR system had the ability to dynamically re-allocate storage to a different RAID level to give me more space without application impact. To add to that point, we were at the limit of the number of supported drives in our 3PAR array; if we wanted to add any more we'd have to purchase 2 more controllers (for a total of 4), which was not cheap. I spent 3-4 months running the data conversions (99% of it handled in the background on the array), and we got things converted in time so that we did not need to buy any more drives to support the system. We ended up buying more drives later anyway, because we wanted to store more data, so we did buy the extra 2 controllers and another hundred or so drives maybe a year later.
But in short, it is on them to know this kind of stuff in the end, regardless.
I didn't realise Rackspace was still a thing ... Oh well maybe not for much longer.
I always thought the writing was on the wall for them when I saw them advertising their expertise to help you with your AWS cloud journey or some such similar marketing speak. Quite the statement from the once mighty Rackspace that they will help you build on someone else's cloud.
I guess this sort of event here is what happens when you mess up your overcommitted storage..... Eeek
Seriously, the dude was a one-man fucking nightmare to work with... his cock-ups were legendary. He managed to populate an entire rack backwards and only noticed after wiring it all up and wheeling the rack back into place... he also set up a rack with the door swinging towards an emergency cut-off button that cut the power to the whole building, and he would regularly swing that door open and hit that button... it wasn't that common, but because of the organisation that was in the building, it was particularly fucking dumb... it was an MRI imaging clinic... so cutting power at the wrong time can cause seriously expensive damage... and it did.
My favourite one of all, though, was building a server to host a high-throughput database that required 16TB of space... he thought all his birthdays had come at once when he "found" a way to increase the margins on that server. He installed two 16TB disks as a RAID 0 and a single 64GB stick of RAM... and he couldn't work out why it was fucking slow. He refused to let anyone else work on the problem. He embarked on a quest to optimise the database... which naturally failed.
I'm willing to bet that this guy has some interesting hobbies. Like uphill ice skating, refilling tubes of toothpaste and bailing out waterfalls.
The guy was a complete and utter wanker.
I've been doubly triggered by your comment. First, as someone affected by this Rackspace fuckup (we've migrated to GCP, goodbye RS); second, as a former worker in the radiology industry:
Switching off the power to an MRI by accident is one of the biggest and dumbest things you can do there. You aren't only losing money while it's down; you're probably forcing a quench on the machine, venting all the helium, scaring the shit out of everybody, with a risk of freezing or suffocating somebody; you're probably going to break a bunch of parts of the MRI machine (that's very, very, very expensive) and will have to take the machine out of service for days or weeks. Depending on the size of the machine, just the helium fill could easily be tens of thousands of Euros/Dollars...
Even on simpler MRIs with no helium cooling the maintenance after that could be quite a mess.
I would have personally removed that door on the first occurrence of that incident, and fired the guy on the second.
Also affected by this fuckup. Luckily we'd done half the work in a previous sprint to start the migration over to S3, so we could frantically code the second part and push the upload of all new images into S3.
Interesting side effect: we've had a couple of users comment on how much faster and smoother the upload speed is now, so I'm guessing they haven't increased their upload bandwidth over the years either.
Rackspace's support has been shockingly bad over the last few days. It's pretty rare that we need to call support, so I hadn't realised how much it had deteriorated from the days when I could pick up a phone and speak directly to someone with knowledge.