I'll get the popcorn...
Because the comments on this one should be more than worth the trouble to pop enough for everyone.
*Begins handing out shopping bags full of fresh, hot popcorn*
Grab a drink, take a bag, & enjoy! =-D
Welcome to another Who, Me? confession from the Register readership, and a reminder of the unexpected side effects of software updates. Today's story, from "Ralph" (not his name), takes us back nearly two decades to when he was responsible for a selection of servers, this being in the days before virtual machines were …
Ho boy.
I can well believe it. About 5 years ago, I worked for a company that used an extinct, 80-char-limit, terminal-based programming language / database system for their main business logic. A system that was probably relevant about the time I was born, and obsolete before I was out of school.
Yes and no
It had become a behemoth to maintain. The business logic wasn't one coherent program, it was whole piles of scripts daisy-chained together, triggered by cron tasks. In the first two weeks that I was there, it fell over three times, provoking P1 incidents and forcing the company to fail over to paper-based processes (at 1/4 the speed). Changing any one script or program could have unintended consequences down the line, though they did at least have a test environment to iron out most of these kinds of issues before any changes hit production.
Getting it to integrate with any other system was an exercise in time and patience. When hiring, they couldn't find anyone who knew how this particular language worked, so they had to train every new hire from scratch.
When I left, they were starting to look at migrating to more mainstream and current systems.
"It had become a behemoth to maintain. The business logic wasn't one coherent program, it was whole piles of scripts daisy-chained together, triggered by cron tasks"
The reality is that this is just how most things work, especially REST-based services.
Look up "microservice architecture" for an understanding of the importance of these things. Unix is also based on that philosophy.
It's not new. When I studied computing at uni we were taught to reuse code snippets from the library repository. It's standard for programs to call other programs or routines; it's the whole reason why APIs were such a hit, as that is literally programs calling other programs to do work.
The statement above was more for scene-setting, rather than a direct criticism. In line with your comments, this system was heavily unix based.
APIs and similar tend to be pretty well documented, whereas this system wasn't. There wasn't even a record of dependencies. A better criticism of it would perhaps not be targeted at the system architecture, but the management of it. Is that fairer?
"Yeah but it worked, didn't it ?"
Everything works if you throw enough money and staff at it, but that doesn't usually qualify. Did it sustain changes in conditions? Could it survive the loss of the tech staff? Could it run on a modern computer or OS? The answers to all three could easily have been no and technically it still counts as working.
I've worked with codebases that are like this. As long as everybody who ever wrote it is still there, it will work just fine. Try to understand it later without their help and you'll be very stuck. It takes determination to be clear enough that someone can figure out what it does without needing your help.
The more I work with modern technology and the seeming clusterf*ck mishmash of disparate stuff mushed together to form a somewhat functional whole that sometimes stops working because apparently Mars is in retrograde and Phobos is transiting or something, the more I think the old stuff might not be that bad. Sure, it's sometimes a bit obscure, but if it breaks you can probably actually trace exactly WHY it breaks, because it's simple enough to do so. Instead of layer upon layer upon layer of crud, turds, brainfarts, incompetence and "good enoughs".
And it's so very nice when the answer is "and the program was clearing this value from memory address $f00b before we were done with it" instead of "and this library was throwing a fit, so we downgraded it and everything works now hooray." It's so nice to actually know what's going on.
Possibly somewhere in academia / public sector.
I worked for a university 15 years ago that was still very much a Novell house - Netware / NDS / GroupWise / NPS.
To be fair, apart from GroupWise being awful, the rest of it was fairly stable and performant.
Ahem, I got to set up a greenfields AD to transition from Novell in the year of our Lord twenty-nineteen. The only reason they did it is they wanted free Office, so of course MSFT was dangling good-old O365 AND you can buy this edu license plan with Windows licenses bundled and yadda yadda. (Too bad you have to pay $$$ on top for the AV/security licensing to match up with the requirements for govt orgs).
To be honest, it was very, very fortunate we had it all up and running before La Covid came to town - being able to work in O365 and leverage remote access for staff (the student learning system was already remote-accessible, as they mostly are these days) was a life-saver. I can't imagine what would have happened if it'd been pure Novell still.
It still persists with some file and print, although the reason it hasn't shifted is because no-one wants to take it on with "these uncertain times". And because certain managers have completely drunk the MSFT Kool-Aid of "ALL files go to Sharepoint Online". Yeah, sure Adobe suite products and similar work great if they're having to pull huge project files down from the interwebs. Standing up an AD-joined NAS and print servers would have enabled us to have Novell completely out by now.
About that time ago (2002), I was installing some of our systems for government owned hospitals - Windows 95 clients on NetWare 4 networks. Our customers required "advanced audio" and plug and play. I received formal written permission from their head of IT to install Windows 98 on 9 PCs (3 sites each with 3). The staff were told to "not tell anyone", otherwise other users would expect Windows 98 too…
Hah! I can well imagine that was the case. Lucky is the techie who hasn't worked at an org that isn't so much behind the curve as using a telescope to see the curve accelerating away from them.
In 2001 at a certain public sector organisation I was still coding updates for a user interface running on Windows 3.11. I bet they thought of Netware still as newfangled technology.
In 2006 I was using a DOS laptop to talk to a PLC, as the software wouldn't run on anything newer. (Could have used DOSBox, but didn't find that out until later.) Right now I'm working with tape drives connected to HP-UX machines, which are production-critical for the biggest product of an international >$1B/year company.
Don't worry, they know about the epoch deadline and are planning on upgrading sometime before then.
(Anon for very obvious reasons.)
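For anyone wondering what that deadline looks like, here is a minimal Python sketch. It assumes the "epoch deadline" above means the 2038 rollover of a signed 32-bit Unix time counter, which the comment doesn't actually spell out; the helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Unix time counts seconds from this instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Largest value a signed 32-bit counter can hold.
INT32_MAX = 2**31 - 1


def last_32bit_second() -> datetime:
    """Last instant representable by a signed 32-bit seconds counter."""
    return EPOCH + timedelta(seconds=INT32_MAX)


def wrap_int32(seconds: int) -> int:
    """Simulate signed 32-bit wraparound (hypothetical helper)."""
    return (seconds + 2**31) % 2**32 - 2**31


# The last good second, and where the clock lands one tick later.
print(last_32bit_second().isoformat())
print((EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))).isoformat())
```

The first line printed is the deadline itself; the second shows the wrapped counter landing back in 1901, which is roughly what an unpatched 32-bit system would believe.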
“So goes the lot of IT manager.”
Depends. In a sufficiently large organization, the IT “manager” will be one that leaves mails unread but will be quick to shift the blame to the humble sysadmin. I wear both hats, but still report to someone with a bigger hat, who has limited skills as a sysadmin. But credit where it’s due: when all-nighters have to be pulled he does assist in whatever way he can (i.e. supplying food, and allowing the foot soldiers to expense a “job well done” dinner afterwards).
I have a horrible feeling we had exactly the same problem with a Novell loaded machine around the millennium.
I think it was the only outage on that machine for several years but finally pushed the users on it over to MS Exchange so they could enjoy monthly outages like the rest of us.
NT's demands for CPU performance, memory and disk space always reminded me of that scene in Smokey and the Bandit where Little Enos is giving the Bandit money.
"I'm going to need a new car..."
(doles out a few hundred dollar bills)
"A speedy car..."
(doles out more bills)
"..speedier than that..."
(with a look of disgust, hands over even more money)
Very long ago, when HP printers were starting to get network interfaces, I was using an HP tool that could configure printers remotely. The list of printers was for the entire organization. As I was scrolling through, I ran across a printer called The IT Gods. That seemed a little presumptuous to me, so I selected The IT Gods and clicked the factory reset button. The IT Gods vanished, never to reappear.
A couple of years ago (more like 5) a vulnerability was found in the ASA code that meant it was possible to crack IKEv1 VPNs open.
I happened to be on call that weekend.
The security manager found a “fix” that someone had published on the internet and asked whether I would install it.
I said I would provided the department director sent an email confirming the risks were understood - which he did.
The code change was applied to the firewall, at which point my VPN to the office dropped. After 10 minutes it still wasn’t back, so I made the trip to the office (fortunately only 3 miles away). I walked into the comms room and stuck a console lead into the back of the firewall to find it stuck in a boot loop: the system would restart, get as far as reading the fix code, and restart again about 2 minutes later.
A few hours later, still waiting for TAC to answer (it turned out the fix had been found by a load of people and it did the same to them), I started on extracting the config from the firewall.
This had to be done in blocks because, although there was a backup, it didn’t have the 60 or so VPN keys needed, and the thing wouldn’t stay up long enough to get all the keys out of the config before it crashed.
A few hours later I had the config so I could trash the bad config file and then reload the 5000 line config through the serial port.
After 24 hours over 2 days we had a working firewall again and the biggest claim for call-out payments in the department’s history (24x £25) for one weekend.
Never trust fixes from random people on the internet
Indeed.
If it was a Cisco problem, the only valid patch I would install would have to come from Cisco.
Anything else and you're just asking for trouble.
If pressed, before applying the patch I would have searched for any problems with the patch (aka wait a day or two). In this case, the problem would have been largely reported and I would have gone, printout proof in hand, to explain why I wouldn't install said patch.
But hey, armchair general is easy, isn't it ?
"the biggest claim for call-out payments in the department's history (24x £25) for one weekend"
Cheapskates!
Once upon a time, I was in a team of sysadmins for a very high profile 24x7 operation. I got bleeped at 3am on Jan 1st because the ops team got a message saying a critical server was dead. I went to the datacentre and found nothing wrong - apart from a database script that couldn't handle a year rollover and assumed any error meant the hot standby system had croaked. The on-call DBA got called in to fix it. Which meant I had to stay on site until she showed up and sorted the broken script.
I got a big wodge of cash for answering a bleep at stupid o'clock. And another wodge for making a trip to the DC in the middle of the night. These were doubled because the incident happened at a weekend. All of that got doubled again because it was on a public holiday. I also got treble time for 3-4 hours at the out-of-hours rate for "working" at the DC - drinking coffee and watching porn with the datacentre ops team.
That paid for a holiday in Barbados with the wife. The tax deducted for this call of duty was considerably more than 24x£25.
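The story doesn't say how that database script actually broke, so this is only a guess at the general class of bug: a rough Python sketch of a check that compares day-of-year numbers and falls over at New Year, next to one that compares full dates. The function names and the dates are made up for illustration.

```python
from datetime import date


def days_since_naive(last_day_of_year: int, today_day_of_year: int) -> int:
    """Buggy: compares day-of-year numbers and ignores the year entirely."""
    return today_day_of_year - last_day_of_year


def days_since(last: date, today: date) -> int:
    """Safer: subtract full dates so the year boundary is handled."""
    return (today - last).days


# Hypothetical timestamps either side of a year rollover.
last_sync = date(1999, 12, 31)
now = date(2000, 1, 1)

naive = days_since_naive(last_sync.timetuple().tm_yday, now.timetuple().tm_yday)
correct = days_since(last_sync, now)

# A negative age like the naive result is the kind of nonsense value that a
# script might misread as "the standby is dead".
print(naive, correct)
```

Treating "any error" as "the standby has croaked" is the second half of the problem: a check like this should really distinguish "I computed garbage" from "the other box is down".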
You should have replaced "porn" with "Britain's most stupid drivers". After watching them for more time than is good for me, as a German, I must say: British road planners must be insane, making their roads, especially roundabouts, in such a way as to actually FORCE accidents. Maybe that's the reason why so many British are shitty or overly aggressive drivers - total inconsistency of driveway rules makes people insane.
OR
if you do have to use them make sure that applying them does not make your firewall go TITSUP or even worse, totally open to the world.
You should have a backup bit of the firewall kit so you can test it on that first.
THEN you make sure that all your configs are triple backed up before doing anything other than scratching the itch in your nose.
If your system allows for it then you could swap in the 'patched' device for a short period and monitor it. Reverting to the old one is then just a matter of swapping a load of cables but that means getting into the DC which in some places is harder than getting into Fort Knox.
I'm so glad that I'm long past that part of IT work.
Yep, I would have had the backup prepped first, only after having argued up to the job-endangering level of "I do not want to install a non-vendor-supported change for critical infrastructure" before agreeing to do the work with sign-off by at least two managers - security and main IT boss. And after having reviewed the code in said change first to the best of my abilities and explicitly highlighting potential risks, in writing.
Or, yes, ideally, a standby system with the new config, swapped in and readily swapped out - with all the aforementioned risk sign-off.
Some things are really not worth the hassle. Especially for only 25 quid p.h. and all that stress - unless my hourly rate was eight, maybe.
This shows how old I am..
As a newbie support bod, I was enthusiastic. I had a degree and could do anything. Or so I thought.
Being the keen geek I was (and still am), I installed NT4 SP2 the second I could on my own work machine. I've always had two machines at work. One for serious day to day work, and a test machine I could afford to break if something went wrong. I installed SP2 on both, and used it quite heavily. After several days, I'd had no trouble, so when my boss asked if I thought it was reliable, without hesitating, I said yes.
What a naive fool I was.
We rolled out SP2 (which was actually quite an important upgrade IIRC) en masse. Within a day, we had users reporting their machines were blue screening. I did a quick survey of all the users (thankfully, we only had a couple of hundred PCs), and found just over half were blue screening.
We did resolve the problem, but IIRC, the resolution involved me going round re-installing a *lot* of machines.
Now, I am still involved in testing, but everything is tested on multiple machines, virtual and real, and tested by multiple users before rolling it out to the estate. It's only rolled out when ALL testers are happy to sign off that it is good.
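That "ALL testers sign off" rule is simple enough to write down. A minimal Python sketch, with made-up tester names and a hypothetical function, not anything from a real deployment tool:

```python
def ready_to_roll_out(signoffs: dict) -> bool:
    """Return True only if there is at least one tester and every one approved.

    `signoffs` maps a tester or test machine name to True (happy) or
    False (not happy yet). An empty dict means nobody tested, so no rollout.
    """
    return bool(signoffs) and all(signoffs.values())


# Hypothetical pilot group: one physical box, one VM, one real user.
pilot = {"physical-test-pc": True, "vm-test-image": True, "finance-user": False}
print(ready_to_roll_out(pilot))
```

The empty-dict check is the bit people forget: "no objections" from zero testers is exactly how a patch tested on one keen geek's machine ends up on a couple of hundred PCs.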
Oooh, even numbered service packs... I was in a Windows training course in the late 90s and the instructor pretty much said that even numbered service packs were of low quality. SP1 fixed the problems in initial release, SP2 was broken, SP3 fixed those problems before SP4 would break again. IIRC the same applied to Windows 2000.
I still remember that NT4 SP2 bug as I was just starting my life as an IT professional circa late 1996. Luckily my senior IT bod and I only installed SP2 on our devices initially and I think it bluescreened when you tried to access the floppy drive. (which was quite important back then)
Fortunately Microsoft quickly released a hotfix (which was the first one I can recall) but a good lesson on ensuring you always deploy to test devices first.
Can't remember the exact details - but one day someone needed to update something MS with a variant language. A flip through my MS Dev folder of CD/DVDs - and there it was.
It had taken some persuasion of my department's management to buy me the annual MS Dev subscription. Then the company centralised purchasing of everything "to be more efficient". The result was MS insisting they had sent all the company's renewal CD/DVD packs to the address that had paid the combined invoice. The latter had no idea who the subscriptions were for - or to where the packs had now disappeared.
It has been said that as we reach our dotage we find we cannot remember what we did five minutes ago. However currently dim memories from our distant past will become clear as yesterday.
Many of my early memories are barely snapshots of a moment. It will be nice if one day my memory also delivers the context with the before and after.
Remembering CP/M gets closer to showing how old you are
Not only do I still remember it, but my first *proper* home computer ran it. IIRC a 4MHz Z80 in a decent case, with a decent keyboard and two (2!) 5¼" floppy drives. Of course it had no software to speak of, but we had a couple of "Superbrains" at work, and a Z80 emulator which I used to download and modify xmodem. It then spent the next few days downloading Assemblers, editors and Wordstar etc. from the Superbrain through the serial port.
Wordstar was as slow as molasses, but it did work. I eventually sold the whole lot as a Z80 development system to my next boss, who also bought me my first *proper* work computer - an IBM XT compatible clone - a 4.77MHz 8086.
God, I'm old.
Yep 4MHz ! It was an Alphatronic PC
In fact, I had the Matmos PC which was a rebadged version of the above.
The kids today don't know how lucky they are!
But the 1620 did have a fast card reader.
Up to a point, Lord... err, Watson. I took a program I'd written in an attempt to optimise the spread of football pools selections from the 1620, which ran for 25 minutes and produced garbage, to a Prime 300. On that the program ran for 25 seconds and still produced much the same garbage.
Luckily (actually not luck) what we did as our day job on the Prime in 1975 is still in use worldwide today. Our new nuclear power stations are being designed with it.....
.... 1975! We had no idea how long it would be in use, never thought about it.
Oh no, the idea of rolling in any NT patch before three months had elapsed (yay the 90s) just fills me with horror. And it wasn't just the even-numbered ones. ISTR SP5 had a niggle or two. Great practice for Exchange CUs, which they STILL screw up. Not to mention Kerberos just before xmas.
At least working with NT made you very aware of using a test environment, piloting with specific user groups and so on... if you survived that without major outages, you were pretty well set up.
One that always springs to mind for me was a patch for NT4 Terminal Server SP6.
It was a fix for one of the more rampant viruses back then - might have been I Love You but I don't really recall.
Anyway it came out in a hurry but had a list of known issues so I could at least forewarn our customers.
Almost all agreed that it was worth the possible downsides to be protected.
The initial patch went on without a reboot. I tested it. I rolled it out to a couple of clients. All good.
A newer patch came out that mitigated some of the risks/issues of the original. Without thinking I downloaded it and rolled it to a customer with the infamous words "and it doesn't even require a reboot".
At which point their servers rebooted... wait, what?
Turns out the updated patch DID reboot. A sign of things to come, eh?
And a lesson learned. Thankfully it was one of my kinder customers.
Software has way more "undocumented interactions" and "side-effects" than any mere pharmaceutical ever did!
My favorite from the land of the bizarre lives in several versions of Windows, from NT 3.5 onwards...
Put in a CD. Hit play.
Debug a C++ program with MSVC or equivalent. Stop at a breakpoint, and wait for the CD player to switch tracks...
BSOD!!!
Every. Freaking. Time.
Easy developer solution: buy portable music players for work. Hmm. Come to think of it, I could have expensed it... I was contracting. :)
A friend had an audio server running on Debian some time back when he decided to do an upgrade. Suddenly nothing worked except various (unwanted) audio cues.
I talked him through deleting the PulseAudio {spit} server. A reboot and all was sweetness and li ^H^H sound.
I remember a decade ago. "Ah, my work laptop has started its daily virus scan. Guess it's time for a coffee."
Longer ago than that? "How do I disable the on-access scanning?" "You can't." "Ok, tell my boss she needs to hire another developer because you've just halved my productivity waiting for your virus scanning every time I hit compile."
I run sans AV at home. It's genuinely quicker and easier to rebuild from scratch if I do ever get an infection.
There are products that don't do old-fashioned "disk scans" or use virus signatures...
But you did say a decade ago.
During my evaluations of various AV products, I found there was up to a 4 fold difference in my compile times depending on which one I used. The best and lightest to date is Cylance, except when it decides some game file isn't understandable and it wants to upload it to the servers for evaluation. No surprise that the "big beast of Norton" was the worst performing.
I used Avast Free edition for years, but recently they started spamming the users with "informative popups" that won't go away, even for a paid edition, so I finally jumped ship. An AV product should just stay silent in the background unless it needs attention to deal with an update or a detection. I can understand freeware popping up a "buy me!" notice once a day, but every few hours? That is nagware in my books, and I'll have none of it.
One evening a few years ago - Norton decided it didn't like any of my various home-brewed VB programs** - and started to quarantine them. Unfortunately as soon as it saw the back-up hard disk mounted - it started munching through all the historical archived versions as well. Eventually Norton support told me how to get them back - and an incantation to stop it attacking them again. I can't have been the only person who suffered that problem.
** Yeah - I know - VB and Norton are both the Devil's spawn.
Being an IT guy I prefer to keep too-much-computer-stuff out of some places. My microwave has the classical "time" knob, which is turned mechanically by being connected to the turning plate. And the strength knob, which uses just a handful of parts.
Making things overly complicated is easy. Making it simple and still do the same job is genius. (the latter applies to programming too)
I can put something in my digital microwave and press Start, and it runs for precisely 30 seconds, then stops. Or press Start twice, it's 60 seconds, and so on, up to 5 minutes. Knobs are hit and miss. Admittedly you are supposed to supervise a microwave anyway so you don't need a timer at all, but I interpret it as "come back within a reasonable time just to make sure that it's not on fire or something".
SysAdmins are prone to delusions of grandeur, hiding away in the back rooms and dealing with the maddening hum and whoosh of servers all day. 'Tis CIA torture in the workplace, that incessant hum, designed to drive operations quite mad.
Claiming to be a "god" is only the start of their delusions, though. Wait until they start trying to tell you that you're in the Matrix and they've managed to contact A Programmer. :)
"Wait until they start trying to tell you that you're in the Matrix [...]"
It may be your delusion that we are not in the Matrix :-P
Steven Pinker has a 30 minute audio programme on BBC Sounds. It is about the human mind's perception of what is true -- and why people appear to believe things.
Not sure if the programmes are geographically restricted.
I had a problem at college where I needed root. The head of IT (who was properly paranoid) thought about it and gave me a root shell on my machine. Then I discovered I couldn't access my own files because the file server didn't trust root from my machine. Transferring what I needed to /tmp solved that problem and I was on my way to solve the real problem using strace on a setuid program.
I have to say that there were many more horror stories related to bad patches back then than there are nowadays. I won't pretend they don't exist anymore (KB5005613 comes to mind) but nonetheless, at one time it was risky to patch.
Now we have two choices: either we patch as soon as possible risking such nightmares, or we leave open vulnerabilities that can be exploited and destroy (part of) our network. Stake or pyre, make your choice.
Reminds me of that one time an update to NT4 didn't play well with the Emulex HBA driver and therefore caused the file-server (one of three, but the most critical of them) to drop the SAN drives. P1 logged with Microsoft and several other vendors as we tried to figure out what had happened. Took a week to get resolved.