who cares...
if there was no service in three stores... you could like go to the fourth one
Engineers at 3 are busy rebuilding the firm's pay-as-you-go top-up database after the server carrying it fell over on Sunday. Customers were left unable to add new credit for several hours, either over the phone, online or in 3 stores. A 3 spokesman said today that it took action to ensure that pay-as-you-go users were able to …
I realise that timing is of the essence [and I only top up my ChavAsYouGo phone once every couple of weeks] but I can't remember the last time I had the Voda top up thingy fail on me.
What really, really pisses me off, though, is that while you can interrupt the prompts for most of the menus to get to your top-up - that is, dial 1345, 1, 2, 1, [details], [amount] - when you get to the confirmation you have to wait right until the very end of the spiel before pressing 1 to confirm that yes, after pressing about 20 buttons, I DO know how much I want to top up, and from which card.
Fucking annoying.
Anyway, rant over, and am I right in thinking that 3 suck noodle for not having some sort of hot/immediate redundancy in place for their top-up systems??
Steven R
Anybody heard of this little thing called resilience nowadays!?
I was in awe of the first paragraph - ONE server for how many chavs I MEAN respectable customers!?
I mean in our lowly establishment, even we realise the importance of a simple failover or load balancing arrangement, so much so that we'd have our balls in a sling if we implemented a new system without it.
As an absolute minimum, if it were a system so badly designed that it couldn't be made resilient, we would recommend alternative products wherever possible, or put in place some kind of manual quick-restore procedure, such as standby hardware ready to accept one half of a RAID array for restoration.
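Just to show how little I'm asking for, here's a rough sketch (hostnames and ports made up, nothing to do with 3's actual kit) of the sort of try-the-primary-then-the-standby failover any client code could do:

import socket

# Hypothetical hosts for illustration only - not anyone's real topology.
PRIMARY = ("topup-db-primary.example.net", 5432)
STANDBY = ("topup-db-standby.example.net", 5432)

def get_connection(timeout=2.0):
    """Try the primary first; fall back to the standby if it's down."""
    for host, port in (PRIMARY, STANDBY):
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError:
            continue  # node unreachable, try the next one
    raise RuntimeError("both top-up database nodes are unreachable")

if __name__ == "__main__":
    try:
        conn = get_connection()
        print("connected to", conn.getpeername())
        conn.close()
    except RuntimeError as exc:
        print("outage:", exc)

Proper load balancing or database replication is obviously more involved than that, but even this much would have kept people topping up while someone fixed the dead box.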
I despair. Wasn't there mention of some kind of computing skills/systems design crisis back in the 80s? It's far worse now, given how ubiquitous computing has become.
I think some spokesman has over-simplified the reason behind the problems. As an ex-3 employee, I can say that all their databases have resilience, backups, etc... Even the lowliest database I was aware of was pretty well protected.
There are other problems though. For example:
With big 24x7 databases, there is a significant delay in recovering terabytes of data from tape, particularly when everyone else still needs to be backed up at the same time. Maybe a tape has already gone to offsite storage, which imposes a nasty delay of its own - and even though the tape management company say they can get it back in half a shake, it always takes three days to get the wrong tape out.
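To put a rough number on it (every figure here is plucked from the air, purely for illustration):

# Back-of-envelope restore time - all of these numbers are assumptions.
data_tb = 2               # size of the database in terabytes
tape_mb_per_s = 120       # sustained restore throughput from tape, MB/s
offsite_recall_hours = 4  # time to get the right tape back from offsite storage

restore_hours = (data_tb * 1_000_000) / tape_mb_per_s / 3600
total_hours = offsite_recall_hours + restore_hours
print(f"restore alone: {restore_hours:.1f} h, with tape recall: {total_hours:.1f} h")

Even with everything going right, that's the best part of a working day gone before the 'paperwork' even starts.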
There is probably a bigger delay in managing the "paperwork" behind such a massive restore, raising change reqs, filling in fault reports, keeping the problem management people off your back, responding to questions from popular websites and spokespeople, etc...
The biggest problem is: once you've got an HA database in live use, how do you make sure that any related changes don't break the HA bit, without risking breaking the live service? Managers are always much more prepared to accept the risk that the system might fail 'one day' than to say today, "go ahead, see if it fails on purpose".
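One way to make "see if it fails on purpose" less scary is to script it as a routine drill against a staging copy of the pair. A rough sketch, assuming systemd-managed services with made-up names (nothing 3-specific):

import subprocess
import time

# Run this against a STAGING replica pair only - never production.
PRIMARY_SVC = "topup-db-primary"   # assumed service names, for illustration
STANDBY_SVC = "topup-db-standby"

def service_is_up(name):
    # 'systemctl is-active --quiet' exits 0 when the unit is active.
    return subprocess.run(["systemctl", "is-active", "--quiet", name]).returncode == 0

def failover_drill():
    assert service_is_up(PRIMARY_SVC) and service_is_up(STANDBY_SVC), "start from a healthy pair"
    subprocess.run(["systemctl", "stop", PRIMARY_SVC], check=True)   # pull the plug on purpose
    time.sleep(30)                                                   # give the cluster time to promote
    assert service_is_up(STANDBY_SVC), "standby didn't take over - the HA bit is broken"
    subprocess.run(["systemctl", "start", PRIMARY_SVC], check=True)  # put the primary back

if __name__ == "__main__":
    failover_drill()
    print("failover drill passed")

If something like that runs after every related change, nobody has to gamble the live service to find out whether the HA bit still works.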