One way to prevent accidents
#:(){ :|:& };:
Read twice and if I am sure, [home][delete][enter]
Welcome back to Who, Me?, The Register's weekly dip into the suspiciously bulging mailbag of reader confessions. Today's tale of mainframe madness comes roaring in from the 1970s, courtesy of a reader whom we shall call "Jed". Jed was a computer operator at what was then a large automotive and aerospace component manufacturer …
And if you set your shell prompt to start with a comment symbol (appropriate to your particular shell), it makes copy-and-paste a whole lot safer.
Ideally set it in a different colour (or a different background colour) so you realise that it's not part of the command too.
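For bash users, the trick might look something like this (a minimal sketch - the colour codes and prompt layout are just one choice):

```shell
# Sketch of a comment-leading bash prompt: if a paste accidentally
# includes the prompt, the leading "#" turns the whole line into a
# comment. The \[\e[41m\]...\[\e[0m\] wrapper puts the "#" on a red
# background so it visibly isn't part of the command.
PS1='\[\e[41m\]#\[\e[0m\] \u@\h:\w\$ '

# Demonstration of why it is safe: a line beginning with "#" is a
# comment, so nothing after the "#" runs.
bash -c '# rm -rf / ; echo oops' ; echo "still here"
```

Only "still here" gets printed - the pasted-with-prompt line is discarded whole.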
: Invalid function name
I know, bash ignores the restriction that function names are built from only specific characters (alphabetic to start, alphanumeric thereafter). This is one of many reasons why you should never write shell scripts to run in bash. Use a scripting shell.
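The difference is easy to demonstrate (a sketch - the function name is made up): POSIX restricts a function name to letters, digits and underscore, but bash accepts extra characters such as ":", which is exactly why the classic fork bomb ":(){ :|:& };:" parses at all - it defines a function named ":".

```shell
# bash accepts ":" inside a function name; a POSIX-strict shell
# such as dash rejects the same definition as a syntax error.
bash -c 'my:odd:name() { echo defined; }; my:odd:name'
```

Try the same line under dash and you get "Syntax error: Bad function name" instead.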
I worked at a major bank, in the area that supported the staff superannuation fund (billions of dollars under management). We had inherited a SCO Unix system from a takeover. One of the operators was a little less careful than he should have been, and on more than one occasion "inadvertently" entered an incorrect command. As this system was only used by a few staff, and had no access outside the operations room, the login management was a "little slack" - mostly the operators logged in as root. The instructions were to periodically clean out the temp folder using the "rm * -r" command (after, of course, changing the current working directory to the temp folder). One day the fat-fingered operator was logged in as root and entered the command (as he had done frequently in the past). The only problem was that he had forgotten to change the working directory and was in the root folder.
The only good thing to come out of this was that the complete system backup, taken just before he deleted everything, had been successful and we were able to restore the system (DR procedures tested - check. No DR test required that year).
We changed the root password after the system was restored and wouldn't give him the new password.
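That cleanup job could have been made fat-finger-proof fairly cheaply. A sketch of the idea (the scratch directory here stands in for the real temp folder):

```shell
#!/bin/sh
# Hardened version of the "clean out the temp folder" chore:
# name the target explicitly and refuse to run rm anywhere else.
SCRATCH=$(mktemp -d)          # stands in for the real temp folder
touch "$SCRATCH/a" "$SCRATCH/b"

cd "$SCRATCH" || exit 1       # bail out rather than rm from the wrong place
rm -rf ./*                    # ./* can never expand to anything outside $PWD
ls -A "$SCRATCH"              # prints nothing - the folder is now empty
```

Simpler still: "rm -rf /path/to/tempfolder/*" with an absolute path removes the dependence on the working directory altogether.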
I'm fairly certain that everyone who has ever used *nix in anger has an inadvertent recursive delete story of one sort or another. It's almost a rite of passage.
You DO have current backups, and an on-going backup plan, right? Yes, you ... I'm talkin' to you, the guy who just thought "he can't possibly mean me!".
That reminds me of the Unix/shell section in ye (verie) olde classic "How to shoot yourself in the foot using any programming language" :
% ls
foot.c foot.h foot.o toe.c toe.o
% rm * .o
rm: .o: No such file or directory
% ls
%
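The gag works because it is the shell, not rm, that expands the glob - so substituting echo for rm previews exactly what would have been deleted. A sketch recreating the listing above in a scratch directory:

```shell
cd "$(mktemp -d)"                     # scratch directory for the demo
touch foot.c foot.h foot.o toe.c toe.o

echo rm * .o   # prints: rm foot.c foot.h foot.o toe.c toe.o .o
echo rm *.o    # prints: rm foot.o toe.o
```

The stray space makes "*" match every file, while ".o" (no file by that name) is passed through literally - hence rm's complaint arriving only after everything else is gone.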
In my case it was mv rather than rm, but with much the same effect. The fly in the ointment (apart from the fact that it was my client's production box) was that the vendor had installed the SCO OS and included a non-standard driver. I can't remember whether it was for the multi-port serial card or the disks. Whatever it was, we didn't have a copy of it, couldn't reinstall without it, and spent much of the next day waiting for one to be emailed. Once we got that, it only took a short time to get up and running again.
I'm fairly certain that everyone who has ever used *nix in anger has an inadvertent recursive delete story of one sort or another. It's almost a rite of passage.
Where I worked for much of the '90s, our sysop knew better. He aliased various 'dangerous' system commands to protect users from ourselves. Hence "rm" became "rm -i".
Whether that saved anyone from a nasty accident is not recorded. My suspicion is it's more likely to have caused accidents, when someone who has learned on the job that rm asks for confirmation finds out the hard way that that was non-standard. But that wouldn't be on the BOFH-in-question's turf.
For those of us who already knew the standard rm, it was just infuriating. I just overrode all such aliases in my .rc. If I wanted an alias, I'd use something that wasn't a standard command name.
"someone who has learned on the job that rm asks for confirmation finds out the hard way that that was non-standard."
That's one of the reasons I don't overly customise my own home systems. I like to stay conversant with the "normal" way of doing things. It's too easy to get into habits on one system and find they don't work, work differently, or even do dangerous things on other people's systems.
I'll usually set the usual -i aliases (mv, cp, rm) for myself and try to make sure it's in our standard profile. Many of the people on our system are irregular users following someone else's instructions. And sure, backups, but it's not a great use of people's time to be restoring things and work lost = $people * ${time since last backup} can be large even for frequent backups if $people is big. Personally I use enough systems that don't have it that I don't rely on it, it's just a safety line, if I really want -i I'll do it explicitly and if I really don't want it I'll \rm.
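For reference, the escape hatches mentioned above look like this (a sketch; note that in bash, aliases only expand in interactive shells unless "shopt -s expand_aliases" is set):

```shell
cd "$(mktemp -d)"            # scratch directory for the demo
touch somefile otherfile

alias rm='rm -i'             # plain "rm" now prompts (in interactive shells)

\rm -- somefile              # leading backslash suppresses alias expansion
command rm -- otherfile      # "command" bypasses aliases and functions too
```

Both files are gone, no prompt asked - which is precisely why the -i alias is a safety line rather than a guarantee.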
I think the real hazard of -i aliases is that people who don't know what they're doing get into the habit of using -f to turn it off instead, usually -rf even when they don't need it, because that's what somebody else has shown them. Finding instructions that routinely tell people to use -rf and wildcards... it's best to find other ways for them to do it. The trick is to make sure you get to these people and rm -rf the -rf culture before it spreads.
This post has been deleted by its author
I don't know about that, jake. My wife and daughter are die-hard Mac fans, as were many of the academics I knew back in the day. I'm pretty sure I've heard each of them cussing out the machine once in a while.
Fact is, pretty much any non-trivial tool used often enough will eventually get on the user's nerves, deservedly or not. And fond though I am of UNIX,1 it certainly has its infelicities.
1Though not of MacOS. Whenever someone asks me to help them with something on a Mac, the first thing I do is open Terminal so I can use the OS the way God intended.
I was using a workstation with a *nix OS on it, and my somewhat accident-prone friend on the one next to me recursively deleted the root directory, entertaining me by screaming at it to stop for about 10 minutes before revealing what it was he wanted to stop. At running club at lunch he shot past me and round a corner; I came round the corner to see a large pine tree rocking backwards and forwards, having flung him into a ditch. After a shower and needle removal we retired to the canteen, where he managed to collect his large lunch (we ran a long way, even with me laughing) and somehow slid the tray under the cashier's till, depositing his lunch all over the floor. Once we'd finally finished and were returning to our offices, as we approached his I commented on his appalling luck that day. He turned to me to speak, smacked into a great big red fire extinguisher hanging on the wall just outside his office, and slid to the floor. As did I, crawling the last 10 yds to my own office on hands and knees, completely incapable of anything other than uncontrollable laughter.
dd if=/dev/sda1 of=/dev/sda
See the error?
I learned a lot that Halloween, on how one should never try to manually clone partitions to a new SSD...
In my defense, it was wrangling an NTFS that was "corrupt/dirty" according to all the linux tools I had access to (clonezilla, gparted live), and that windows utilities said "nah, wrong size mate" to.
It only cost me the one true working copy of that authentec fingerprint reader driver for Windows 7...
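For anyone still squinting at the command above: the output device is the whole disk, so the contents of partition 1 get written over the start of that same disk - partition table and all. The effect is easy to reproduce harmlessly with ordinary files standing in for the devices (a sketch; the filenames are made up):

```shell
cd "$(mktemp -d)"
printf 'PARTDATA'   > part.img   # stands in for /dev/sda1
printf 'TABLE+REST' > disk.img   # stands in for /dev/sda

# the fat-fingered dd: partition contents onto the "whole disk"
dd if=part.img of=disk.img conv=notrunc 2>/dev/null

head -c 8 disk.img   # prints PARTDATA - the "partition table" is gone
```

The intended clone would pair like with like - partition to partition, or disk to disk - ideally after eyeballing both ends with lsblk first.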
On the upside...
A colleague made a related dd error during cutover to a new drbd cluster, copying block-device mount points rather than file-system mount points. The increasingly peculiar behaviour that resulted, culminating in carnage, led to me twigging that drbd is just a tee - and hence to me cracking the entire installation & datacentres out of its 1x2 constraint into Nx2 infinitosity.
The IT version of accidentally dropping mouldy bread crumbs into a Petri dish...
In my experience, "rm * -r" was a standard upgrade command for the versions of SCO UNIX inflicted upon me.
Even when it wasn't awful, it managed to find hardware faults no other OS at the time cared about. Or maybe it was just terrible drivers.
And all this pre-dated SCO's legal shenanigans.
Someone (who lost access very quickly afterwards) managed to do a chown -R from /
Took quite a bit of work to undo all the damage - comparing ownerships against an equivalent system.
Unfortunately, it was a customer facing server so couldn't do a full restore from backup.
Fortunately, it wasn't me that had to do it (or was the one that caused it).
This post has been deleted by its author
There will be lots of stories like this from the *nix types.
Mine was on a solaris box in my early sysadmin days.
I was adding some disk capacity and, in the process of putting a file system on, typed newfs /dev/dsk/c0t0d0s3 rather than c1t0d0s3, and so watched my /usr go down the plughole. Fortunately it was not yet in production, but it was an interesting experience watching it slowly fall in a heap before I rebuilt it.
I learnt to use DiskSuite after that.
SDS didn't make you immune to stupid mistake syndrome.
I had beautifully defined all of my meta devices and was mirroring root. metattach d0 d2 d1 was my downfall. d1 was from the build and it was dutifully wiped clean with the end result being a rebuild.
Thankfully, preproduction and it was just an afternoon's work.
I worked for Sun at the time they died.
Shortly after, Oracle and I parted company, and I had to return my chunky Sun workstation. But first, remove all private ssh and pgp keys that had been used on it. Hack up a utility to zero a file before deleting it, and run with recursive find on sensitive directories. And on the whole of /home for good measure. Oh, yeah, better do /var/ as well. And ... did I ever put anything under /root/ ?
Of course it had been running zfs, so that wasn't enough. Ho, hum. Boot from another medium and zap the filesystem from low level with dd to the device; ship it back with a fresh bare-bones install on a repartition-and-newfs (which from memory was not OpenSolaris but FreeBSD - a minor exercise of the inner BOFH). Feel a low-level bereavement for the workstation. Now even if it falls into the hands of someone evil, I'm not a high-enough-value target to merit searching for the ghost of any residual data.
No one ever thought to have the rm command check the current working directory, and if it was / ask for confirmation before executing? Those few lines of code would have saved a lot of asses over the years.
Linux using /root as root's home directory probably did as well.
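Stock rm never grew that exact check (GNU rm did later gain --preserve-root, which refuses "rm -rf /" itself, but nothing standard looks at the current directory). The idea is only a few lines as a wrapper function - a sketch:

```shell
# Wrapper that asks before deleting anything while sitting in /.
# "command rm" calls the real rm, bypassing this function.
rm() {
    if [ "$PWD" = / ]; then
        printf 'rm: current directory is / - proceed? (y/n) ' >&2
        read -r answer
        [ "$answer" = y ] || return 1
    fi
    command rm "$@"
}
```

Anywhere else it behaves exactly like plain rm; in / it demands an explicit "y" before touching anything.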
Back in the early '80s I worked for a large company that ran two ICL 29xx mainframes.
Supervising these machines we had a Senior Operator who specialised in mixing up the George 2+ console commands "GO 25" and "GO 27".
One of these commands restarted a stopped job from the point where it was stopped, the other restarted the stopped job from the beginning.
The knock-on effects of doing this near the end of the two day payroll run, were spectacular to say the least.
From memories more than 40 years old (and thus subject to neuronic disintegration), there's a missing program name in the quoted commands.
"GO #ABCD 25" would cause:
- an external interrupt to be sent to the OS;
- program #ABCD to be suspended;
- the next instruction address to be overwritten by the address identified as 'entry point 25' in program 'ABCD';
- the suspended program would then be restarted.
Like I say, 40-year-old memory. I stand to be corrected.
GO #ABCD 20 - start the job with paper tape input
GO #ABCD 21 - Start the job with punch card input
GO #ABCD 25 & 27 ibid
GO #ABCD 29 - abort the job (usually) - yes, under George II+ you had to type GO when you wanted things to STOP.
The number after the program name was the address of the starting command in the compiled program.
Some programs had a "go type entry block" so you only ever typed GO #ABCD. I've never found out why.
On the 1900 series OU #ABCD 8 was useful too, as word 8 was the program counter. Good for finding forever loops (if the speaker warble wasn't indicative).
Though all those commands were actually EXEC commands as I remember them. George was a set of modules you could load (or not) to get a job done with minimal interference from the operators. The "steering lines" (aka JCL) were a drop-through action list. Dead simple. So your operator would type something like:
FI#XKYE#GEOG 1 (find the input spooler in file GEOG and assign it to unit number 1)
blahblahblah
INPA R WAS XKYE
XKYE HALTED TR 1 FIX (the spooler had renamed itself and stopped pending the loading of a paper tape and the operator pressing the green button on the reader)
GO#INPA 20
blahblahblah
INPA HALTED HH (the spooler has finished and is ready for the next job)
FI#GEMA#GEOG (load the central control module from file GEOG - ours was called GEMA after someone's girlfriend)
GO#GEMA
ABCD R WAS GEMA (your ABCD program is now running)
ABCD HALTED HH (and it just finished)
FI#XKZE#GEOG 3 (Load the output spooler and assign it to unit 3 - the line printer in this case)
blahblahblah
OUTA R WAS XKZE
OUTA HALTED LP3 FIX (operator loads paper and presses the green button on the printer)
GO OUTA 25 (I think, it has been over 40 years since I did this for real)
(EARSPLITTING BANGBANGBANG NOISES FROM THE BARREL PRINTER)
OUTA HALTED HH (the print run is finished and the operator, ears bleeding, can tear off the report)
If you needed more room for ABCD you would DE#INPA before you GO#GEMA'd. There was something of an art to juggling the memory. I remember one consultant we had who would lobotomize EXEC on the fly from the console so they could finish in only 3/4 the memory they needed.
Ah, the happy sound of the print drum of the console Westrex slamming into its lid so it snapped like an angry crocodile when the machine "went illegal" because someone had typed DE#ABCD when there was less than two kilowords of core memory unused. Ah, the happy sound of the cursing operator typing like mad when a stupid compilation mistake did the same from a program test run.
Worse days.
> three lefts making a right
A tad off-topic, but the opposite of this was true for the Sopwith Camel: three rights made a left.
Whilst on observation patrol, (many) pilots wanting to turn 90deg left would instead turn 270deg right. It took about the same amount of time, and they improved their horizon/surrounds scanning markedly.
"Torque effects from the propellor rotation?"
It's called "gyroscopic couple". It's not just the prop, the rotating mass included the cylinders/pistons, valves, etc. Essentially, the crank was fixed and the engine+prop rotated around it. Fascinating answer to a design problem, if a trifle on the bizarre side to today's eyes. They don't make machinists the way they used to ...
ANYway, yes the Camel rolls into a right easier than a left. But not to the extent in the urban legend, that is likely caricature. In reality, the effect is only around 20% different left to right ... To the experienced pilot, it is almost unconsciously compensated for after a few minutes of flight[0]. When you think about it, if the Camel had been known to not be able to roll left, do you really think it would have become known as one of the best dog fighters of WWI?
[0] That is directly from the horse's mouth ... Javier Arango, who owned B6291 (the last surviving flyable Sopwith-built, Gnome powered Camel that saw action in WW1) was a friend of mine. I believe he and a writer for one of the flying magazines (maybe Pete Garrison?) actually wired the plane and put numbers on the flight characteristics, but I don't have a reference handy.
In a previous incarnation I was a developer at an org where everything ran on a chunky AS400 system. We had two smaller ones for development and deployment/acceptance testing.
One day one of the devs submitted a request for a sysadmin to clear down and reinstate one of the business streams on the acceptance server so it was definitely identical to the live server before a significant code release (not an uncommon practice). Sysadmin dutifully logged in with the appropriate superpowers, typed the appropriate commands and hit enter (and confirmed anything that needed confirming). About 5 seconds later abject panic descended as they realised they'd done it to the live server and obliterated that entire strand from digital existence.
A couple of hours of checking, planning and flappy director wrangling, one restore from the previous evening's backup, plus 10 minutes of reapplying journalled db changes for the processing since the backup, and everything was back to working. Procedures were then changed, including the colours of the terminal sessions for live and pre-live boxes, along with which logins had the power to do those things to the live server - and ensuring the passwords for those accounts on acceptance and live were not the same!
Nobody lost their job over it, and the financial implications were effectively the cost of 4 mid-level folks for 3 hours, plus maybe a bit of downtime for 20 or so end users.
Works for Windows too. Apart from setting the BG-Color:
I use this cmd_admin_shell_color.cmd on some servers.
Save it anywhere, run it. All your future admin shells will be red.
-----------------------------------
reg add "HKCU\Software\Microsoft\Command Processor" /v "Autorun" /t REG_SZ /f /d "%~dpf0" > NUL
whoami /groups | find "S-1-16-12288" > nul
if not errorlevel 1 (
color 4f
)
Not quite so spectacular, but it certainly produced unseen results: Back in the early 70s I was still at college (same place now calls itself a University!), and our "hands-on" computer was a PDP-8 - with a whole 4K RAM expansion - running FOCAL. FOCAL was in some ways similar to BASIC, which itself was almost unknown back then. FOCAL enabled up to four teletypes to share the processor, with each having access to 1K of memory. Programs were stored on punch-tape.
Programs were started by running the "GO" command, and as we were all struggling with the basic concepts, frequently resulted in the program either hanging or getting stuck in a loop.
One of our lecturers introduced us to the "GO?" command, which produced a step-by-step trace of the program as it ran, accompanied by much clattering from the teletype. Note the use of the phrase "THE teletype". This was because running "GO?" caused all the other teletypes to come to a grinding halt until the trace was complete!
The first few times we ran the trace command, this led to the other users cursing and swearing, slapping the teletypes, and inspecting their own programs to try and see what they had done wrong. It wasn't long before the more intelligent users realised that ONE teletype was not only still working, but doing so at a frantic pace! This led to a full and frank exchange of views between them and the offending user, and the use of "GO?" was restricted to times when there was only one user on the system!
Happy days.....!
(same place now calls itself a University!)
You make this sound like a name change made on a whim for marketing purposes. A college has to offer a sufficient range and quality of courses, postgraduate and research as well as undergraduate, to qualify and it also has to earn degree-awarding powers. It's a rigorous process of qualification and assessment which usually takes five years or more to attain, even once the minimum academic provision is in place. A successful result culminates in the grant of a Royal Charter and the right to call itself a university
That is to say, it's an expensive and difficult change made for marketing purposes.
An awesome tool. 20-ish years old, and functionality has stayed pretty much identical bar a few tweaks here and there. It's still my go-to choice for transferring large amounts of data from volume to volume, and holding onto the log files has definitely saved my skin more times than I can remember. ("Some of my files are missing from that data you migrated 3 months ago. They must not have got copied properly...")
But yeah - I have got burned once or twice too...
++ for robocopy.
I always keep a formatted command ready to go just for the mirroring issue - too dangerous to not have it all thought out ahead of time. It's all wrapped up in a nice PowerShell script with variables for source/destination directories, logging, etc.
I really fell in love with it when I worked at Microsoft years ago as a contractor. One of my fellow workers didn't really understand the limits of Explorer's drag and drop. During a migration of a rather large file server (one that, no kidding, had financials and code from as far back as the '80s), the job kept failing. He kept cursing and trying again, but to no avail. I mentioned how he was doing it wrong and picked it up. One confirmed Robocopy script later and it was well on its way - along with the /Z switch. Pretty sure I made an enemy that day, but I got the job done.
Robocopy is still the only tool that can handle the recursive appdata "compatibility" links in a userprofile when deleting it. Method: robocopy c:\empty-dir c:\users\user-to-kill /s /e /purge /r:0 /w:0
Nowadays the explorer can do it most of the time, but there are still enough cases where only robocopy can kill, like path-too-long-to-delete directories.
I had been working on a COBOL program. I took a backup of the original code (file.cobol) and worked on the normal file.cob version. After 2 days of coding and testing, the thing was finished, so I went to delete the backup (file.cobol) with del file.cob;* - hmm, anyone see the problem? I had to redo the 2 days of work, although it went much quicker the second time, because I knew what I had to do - no more testing different approaches.
Another time, I was working on an OLAP system (Arbor/Hyperion Essbase). It had real problems recalculating a hypercube, it was much quicker to export the bottom rows, clear the cube, import the bottom rows and calculate (4 hours as opposed to 48 hours without clearing).
So, clear database... Whoops! ARRGH! Luckily we had the previous backup from 6 hours earlier. My colleague told me to just put in the old backup and recalculate and blame the missing data on the users, a real PFY!
I went to the head accountant, explained the problem, we then loaded the previous export, replayed the audit log and, in the end, we lost 2 transactions.
"Things worked more efficiently back then,"
There's definitely some rose-tinted glasses being applied here ;)
To be fair, I was barely a glint in my parents' eyes during the mainframe era - though I did grow up with a ZX Spectrum. And I appreciate that mainframes were mightily powerful within their context (and rightly so, for the amount of money they cost to maintain).
But also, by modern standards, they're crude, clanking machines which needed a host of Tech Adepts maintaining them, and which required arcane rituals that had to be precisely followed to get anything out of them, as this very story highlights. And I doubt they'd be able to handle the amount of data we routinely throw about today, or that they'd be able to usefully visualise any of it in anything like realtime.
Dipping into the cliche allegory bag, a mainframe is a bit like a 1930s sports car. In theory, it's a match for even a modern motorcar, with high BHP and the ability to bomb around a racetrack at 130mph or more. In practice, they were mechanically fragile, cost a fortune to build and maintain, and had little in the way of modern conveniences like windscreen wipers, power steering, synchromesh gearboxes, automatic chokes and the like...
Personally, I'll stick with my modern laptop and (not so) modern Honda Accord ;)
[Monday morning pre-caffeine grumble complete]
Mainframes are still around today for large TP loads.
On the other hand, think about that "paltry" mainframe back then, then ask yourself how many hundred users can work at the same time on a modern PC, which has hundreds of times the theoretical power of those ancient machines...
Modern servers are based around PC technologies, not high throughput technologies used in mainframes. Which is one reason a lot of banks etc. still use them. They are still faster and more cost effective for some scenarios.
Think of it more like a sports car or a big rig. The sports car can get you from A to B faster than the big rig, but if you have to transport several tonnes of data, the big rig will still get there first, whilst your sports car is zipping backwards and forwards transferring small amounts of data at a time.
Not just mainframes. In the 1980s I had one Unix box running several terminals on a Z8000 with 768K, and then moved to look after another which must have had a similar processor but more disk space. We ran out of space on that one and had to fit a second 168M disk.
Somehow back then a workload could run on H/W resources less than a boot loader would use today. Bloat!
I did some back of an envelope math: the Terminal emulator we use for connecting to the mainframe is 24*80 characters in size. If you assume each character has 4 bytes behind it (symbol, colour, etc), the space for "IO buffer" at the mainframe end is about a megabyte per hundred users.
The emulator itself is taking up about 5 megs of RAM on my desktop.
You really can't exaggerate the sheer degree of difference in focus. Yes, they're both computers, but yon AS400 is there to crunch numbers, with a sliver of overhead deciding whose numbers to crunch next, whereas my desktop is using a relative king's ransom worth of RAM making the task bar slightly translucent.
If you assume each character has 4 bytes behind it (symbol, colour, etc), the space for "IO buffer" at the mainframe end is about a megabyte per hundred users.
Make that no more than 2 bytes per character and that only for double byte characters. Normally it is one byte per character and the codes for colour, high light and underline are in the seemingly blank space before each string of characters. Add another 100 bytes max for indicators and that makes the total less than 2 KB per user, less than double that in case of double byte systems.
This number increases a bit when you set the screen size to 27*132 (my preference), but even that is less than 7 KB max per user for double byte systems.
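The estimates in this exchange are easy to sanity-check with shell arithmetic (using the screen sizes and bytes-per-character figures quoted above):

```shell
echo $(( 24 * 80 * 4 * 100 ))  # 4 bytes/char, 100 users: 768000 bytes (~0.75 MB)
echo $(( 24 * 80 ))            # 1 byte/char, one 24x80 screen: 1920 bytes (<2 KB)
echo $(( 27 * 132 * 2 ))       # 27x132 screen, double-byte: 7128 bytes (<7 KB)
```

So both posters' arithmetic holds; the difference is entirely in the assumed bytes per character.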
I'll happily agree that modern mainframes are still several orders of magnitude more powerful than a standard x86 server. OTOH, this article (or at least the person telling the story) was comparing a 1970s mainframe to a modern PC.
I think the big rig versus sports car is a decent analogy for a contemporary comparison, but I'm not sure it works as well for a historical versus modern analogy.
(I did debate whether a car analogy was best for this. Perhaps a better one would have been a steam engine versus a modern diesel generator. You could drive the steam engine onto a field and do anything from ploughing to threshing grain or running a dynamo to power a fairground ride. OTOH, you'd have to manually reconfigure it for each task - and you'd have to stoke it up to operating temperature and constantly monitor it to make sure the oil/coal/water levels are kept topped up.
Conversely, you'd just deliver a modern diesel generator to where it needs to be, press a button and plug whatever appliances are needed into those nice, modern, IP66 outdoor sockets...
)
> On the other hand, think about that "paltry" mainframe back then, then ask yourself, how many hundred users can work at the same time on a modern PC, which has hundreds of times the theoretical power of those ancient machines...
I don't think it'd be hundreds. I suspect it'd be thousands (possibly tens of thousands) *if* we were dealing with the workloads a 1970s mainframe dealt with :)
The key thing here is that workloads have changed. Where a legacy mainframe was a timesharing system which effectively focused on one task per timeslice/user (to vastly oversimplify), a modern server can simultaneously serve thousands of web connections, while dealing with encryption overheads and running other things like databases.
To pick a fairly arbitrary first-search-result example, here's someone benchmarking web server performance back in 2016, using a 12-core Xeon running two VMs with varying numbers of cores assigned to them.
https://www.rootusers.com/linux-web-server-performance-benchmark-2016-results/
The throughput scaled pretty linearly across 1, 2 and 4 cores, before topping out somewhere around the 6-8 core level, which makes me suspect the box was hitting some physical I/O limits.
Either way, it was pushing somewhere around 140,000 requests per second, with up to 1000 concurrent connections.
And while that 12-core Xeon is still a fairly beefy bit of kit, a quick rummage on Ebay suggests you can pick up something similar from around a thousand pounds.
Finally, I'd note that System 370 emulators exist. And to quote the FAQ...
http://www.hercules-390.org/hercfaq.html
"Classic IBM operating systems (OS/360, MVS 3.8, VM/370) are very light by today's standards and will run satisfactorily on a 300Mhz Pentium with as little as 32MB RAM.
Anything more up-to-date, such as Linux/390 or OS/390, requires much more processing power. Hercules is CPU intensive, so you will want to use the fastest processor you can get. A 2GHz Pentium, preferably with hyperthreading, will probably provide acceptable performance for a light workload"
Said FAQ was last updated in 2010; a decade later, it's not unreasonable to assume that a modern consumer-spec laptop running this emulator could handily outperform a real 370 despite the emulation overheads...
>I did debate whether a car analogy was best for this. Perhaps a better one would have been a steam engine versus a modern diesel generator.
Was at a tractor-pulling event some years ago, for whatever arcane reason. Lots of highly-tuned and impressively powerful trucks and tractors running on diesel. They pull a sled that has a moveable weight on it and so functionally becomes an increasingly heavy/difficult load, the idea being to pull it the longest distance. After watching awhile as each of these heavily modified machines pulled the sled varying distances and occasionally exploded in interesting fashion, someone decided to hook a late-1800s steam tractor to it.
While the trucks and tractors made a fair chunk of speed but not infrequently failed to go the full distance, the huffing and puffing steam engine simply ignored the load, putted to the end of the track slowly, and proceeded to turn around and return the sled to the starting point as if there was nothing behind it. I don't think it ever went much above walking pace, but it was a very interesting comparison, moving like God himself could not slow it down. Quite the display of using a cheetah vs an elephant!
When the Dartmouth time-sharing system was being developed, ~1963, there was a need for hardware to handle multiple teletype connections. Only the GE DN-30 had the ability to connect tens of teletypes with standard, off-the-shelf hardware. This was a communications processor and probably had a lot to do with the eventual success of the project.
"But also, by modern standards, they're crude, clanking machines which needed a host of Tech Adepts maintaining them"
There's nothing crude about modern mainframes - the technology under the skin is ahead of the x86 world. Modern mainframe CPUs process the vast bulk of COBOL and Java code natively in hardware; they also have hardware encryption and compression instructions performing these tasks way, way faster than x86 - and they run at 5+GHz out of the box. As for staff, at the site I currently work at there are fewer than 10 mainframe technical staff, and the mainframe provides the bulk of processing for the large organisation. There are over 2,000 x86 staff of one form or another. The mainframes consume about 60kW of energy - the combined x86 estate well over 5,000kW, just to provide pretty front-ends to the mainframe quietly getting on with delivering the core business with five-nines reliability.
As for "throwing data about": the I/O bandwidth of a single modern IBM zServer is over 800Gb/second, and like all IBM mainframes from the 1960s onwards, I/O is performed off the main CPUs - these days on 5+GHz assist processors. I/O on most x86 metal interrupts the actual CPUs, wasting cycles of the (already slow) CPUs and damaging cache hit rates by switching threads. With zHyperLink enabled, latency to read the disk subsystems is under 20 microseconds (yes, MICROseconds) - roughly 10x faster than good FICON response times, and smashing anything available on any other platform to atoms. Of course, IBM DS (disk) arrays can be all-flash with over a terabyte of cache - so even if you need to do actual I/O, it's mindbogglingly fast. If configured correctly, network I/O between z/OS and/or z/VM clusters (operating system images) is done in memory, bypassing the network altogether. In a nutshell: you don't know what you're talking about.
- Hardware memory compression
- Hardware memory encryption
- Memory protection Keys
- RDMA over Converged Ethernet (RoCE)
Every time I want to see what is coming in the x86 world I look at new features as each IBM z mainframe appears!
To support the same workload today, you'd "need at least an 8-core 64GB server", Jed observed.
So... any reasonably spec'd laptop or desktop. Or, if you want to go nuts, any high-spec engineering workstation. A couple of Xeons or Epycs, some RAM, and off you go.
Likely, to support a modern version of these workloads, you need a farm of servers, each handling mail, application loads, file services, etc. Data needs are, naturally, much larger these days. And performance is much faster. Also, one user doesn't have the ability to accidentally screw the rest of the office over due to a mistake.
The old mainframes were all about efficiency. Those 100 users would be using green screens in block mode - one network transfer of a few hundred bytes to generate a transaction - not a web interface with the massive overhead of HTTPS and all the crappy scripting the application GUI has been laden down with. Same with I/O: no ad-hoc query capability, and direct ISAM access is way faster (for the very specific task required) than an almost-always-badly-written generic database application.
Being involved in data entry, I have long wondered why I could get more performance out of a PDP11/70 than a Sun Enterprise server with 32 virtual CPUs.
After a careful investigation I discovered that, while a picture is allegedly worth 1,000 words, it probably consumes something closer to 64k words, and delivers the equivalent of one word. After removing all the icons from my data entry screens and replacing them with a single (bold) word, they now go "faster than a speeding PDP11".
Icon - no you can't! (It's behind you).
If you concentrate on the data and not the GUI/'marketing' then it's amazing what you can do with stuff. I used to help write and run a web site and back end for 30,000 customers and 300 in-house PCs on 3 hefty Windows Pentium boxes and a big IBM box for the ERP DB in the late 90s. Out of interest I pissed about on a Pi Zero W and found I could run the web and DB on it with similar latency to the Windows/IBM setup for similar user loads, with up to 100 customers using the system before memory started to run out. I don't think we ever had that many customers online at once on the old setup. Not a perfect comparison, but it did make me wonder WTF my own desktop is doing with all that CPU it seems to use!
Most of the time my home desktop is doing almost nothing, typical workloads being pretty trivial for today's hardware.
The last few applications I've written for home use have been severely IO limited, and not CPU or memory limited.
The CPU only really gets to stretch its legs when gaming or encoding video; the latter it does happily at 4K resolution in roughly real time.
The specs on that machine were pretty awesome for the time. Lucky lads for working on such a beast :).
and <pedant> "Flak" not "Flack". "Flak" is a German abbreviation for anti-aircraft weapons or fire. "Flack" is a shit UK television show.
Your subbie deserves some flak for missing that.
</pedant>
Actually flack is perfectly acceptable as an alternative spelling of flak these days according to both the Cambridge and Oxford dictionaries.
Dictionaries are descriptive, not prescriptive.
A dictionary may tell you that some word is used with a certain spelling and a certain meaning so that you will be able to understand that word when you encounter it. The dictionary does not thereby endorse that usage, and you should not assume that you can do the same without world+dog thinking you're an idiot ... maybe you can, but there's no guarantee.
There is no 'c' in "Flugzeugabwehrkanone" (or "Flugabwehrkanone", or "Fliegerabwehrkanone", or any of the other variants you may find); the conventional abbreviation for it is "Flak". A lot of English-speaking people don't know the etymology of Flak and think it should be spelt "flack". Most people understand them ... that does not in itself make the usage acceptable. "Flack" might be viewed as an Anglicisation of the German abbreviation, which might make it OK.
Ultimately, it's a matter of taste.
I remember working with someone on z/VM where they had carefully created synonyms: E for Edit, BR for Browse, and ER for erase.
So the sequence was: BR to browse the file; yes, it's the right file; retrieve the command and overtype E for edit - whoops, the file has gone, as the command was now ER ..
When I pointed this out to him, he said he had wondered why half his files kept going missing every day.
I helped him remove his ER synonym, and removed his ability to write to common disks.
I was working on my BSc thesis project at an Italian observatory in Switzerland, testing a new cryogenic IR spectrograph. First order every night was to refill the liquid nitrogen and helium reservoirs in the dewar container of the instrument before heading off to the observing control room. After that we needed to enter the coordinates of the object to observe (right ascension and declination, or RA and DEC for short). Most of the objects were northern hemisphere objects and were perfectly safe to observe, but one object was below the equator, and there were limits on how far we were allowed to point the telescope down towards the horizon - partly due to the mechanics of the instrument, but also because we might actually pour all the liquid nitrogen and helium out of the instrument, which wouldn't be a particularly good idea either. The engineer who built the instrument had worked out that this object, at DEC = -6 degrees and a bit, was just about at the limit of the specs of the instrument, but should be safe.
I duly entered the coordinates, and the system replied with the coordinates entered and the sensible question "Is this OK?" Until that fateful southern hemisphere object, I had always checked, found the data to be correct, and entered "Y", to which the system responded with a cheery "Then I go!" and pointed the scope at the desired object. This time I noticed I had accidentally entered DEC = -16 degrees and a bit. I therefore entered "N", and was horrified to get the response "Then I go!". I turned to one of the Italians on duty to ask how the hell I could stop this, and why the hell the program had ignored my "N". He replied that the system would accept essentially all input as "Yes", except Ctrl-D. The only option to stop it after the "Then I go!" was to enter new coordinates and press "Y" (or any key that took your fancy). We rushed upstairs to the telescope and found that the extra 10-degree rotation hadn't damaged anything, but I felt rather shaken that I might have trashed a several-million-guilder (at the time) instrument, let alone several years' work, all because of, let us say, substandard UI design.
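The fix is simple enough that it is worth spelling out: treat anything that is not an explicit yes as a no. A purely illustrative Python sketch (the telescope software was, of course, nothing like this):

```python
def parse_confirmation(answer: str) -> bool:
    """Treat only an explicit 'y' or 'yes' (any case) as confirmation;
    anything else - including an empty reply - means NO."""
    return answer.strip().lower() in ("y", "yes")

# Safe default: the dangerous action only proceeds on a clear "yes".
# An "N", a typo, or a blank line all abort the slew.
```

Defaulting the destructive path to "No" means a fat finger costs you a re-prompt rather than a telescope pointed at the floor.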
At work we've got a build system used to build the delivery artifacts - signed MSI files which are delivered to the clients. It's usually a long process, which needs to be supervised since it's notoriously flaky. And then the build process will ask you: 'File over 250 Mb, do you want to discard [yes/no]'. I think every team member has been burned at least once by answering yes, losing the build result and having to restart.
Been on the receiving end of a few such calls.
One was someone ringing the support line to say their database wasn't up. I asked them to check the database directory - it was empty. They had seen the disk was a bit full, saw some large files, and did an rm -rf. Poor lady was almost crying down the phone - no backups for 4 days.
The other was someone who managed to do an rm -rf from '/' but not as root. Not that it matters, because that can bork a machine just as much as completely deleting everything. Worse still, he did it twice! The first time we left the server (Linux) still running - we didn't dare turn it off or reboot it until we could work out what we were going to do. The second time.... :)
I do recall a colleague (who, shall we say, was not the most technically inclined of people) getting very confused about the fact that their code was causing a database table to mysteriously empty - after each run, he couldn't find any of his test data in there.
Thankfully, this was just in the dev environment, so we could restore his system back to a known good state.
Much head-scratching later, it transpired that all the records were still in the table. However, he'd accidentally terminated his SQL command early, so instead of running "UPDATE x SET y WHERE z", he'd just run "UPDATE x SET y".
The result? A few thousand records, all with the same ID....
I've seen it happen since, sometimes with far more of an impact (hello, production system!). At least this first lesson in the importance of sanity-checking and transactional integrity was a humorous one!
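One cheap guard against this class of mistake is to run the statement inside a transaction and inspect the affected-row count before committing. A sketch using Python's sqlite3 (the table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO accounts (status) VALUES (?)", [("open",)] * 1000)
conn.commit()

EXPECTED_MAX = 1  # we meant to touch a single record

# The WHERE clause has gone missing, just like in the anecdote.
cur = conn.execute("UPDATE accounts SET status = 'closed'")
if cur.rowcount > EXPECTED_MAX:
    conn.rollback()   # far more rows modified than intended - back out
else:
    conn.commit()

open_rows = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE status = 'open'").fetchone()[0]
# All 1000 rows survive, because the runaway UPDATE was rolled back.
```

The same pattern works in any database with transactions; the point is that the commit is conditional on the damage estimate, not automatic.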
"The answer was not good: "Even a reboot (IPL in those days) would not fix the problem."
The command had been recorded and would resume as soon the computer was restarted. There was no going back."
I suppose this kind of situation is why the Gods of Unix implemented the concept of single-user boot ...
In this case, booting single-user with no batch system would have allowed careful emergency maintenance of the batch system ...
But I don't think this is possible on iSeries ...
Single boot on iSeries is possible (with the appropriate login credentials at the terminal) but would not have been necessary in this case as an IPL terminates all jobs. Of course there are autostart jobs which may do a restart of some processing, but those can be killed by any qualified (and authorised) operator.
I was called out to a remote site, a 4-hour drive from HQ.
Super secret stuff being done on said site and no remote access etc. After faffing about - my security clearance for the building and the data being processed was OK, but it was inside a building site and I hadn't done an induction - I spent the night in a (rather shitty) hotel.
4-hour induction the following morning to walk 100 metres along a road with an escort - total overkill. The escort had instructions to stay with me while on site, but he didn't have clearance to enter the building, so he had to wait outside in the rain.
Anyway, the issue was that someone had stuck a pen in the reset button on the QNAP NAS because they 'couldn't log in to the Web interface and wanted to reset the password'. I just thank the gods that it had external USB HDD backup, plus backup to one of the several client PCs.
No data lost and the reset button was disabled. I logged it under hardware failure and recommended that all pointy objects were confiscated from our offices just in case.
About 3 or 4 years ago, millions of lines of current output disappeared off our JES2 spool (the commands in the article are JES2 purge commands). Upon investigation I tracked the command to an "operator" on a certain sub-continent (the staff who replaced our really good "expensive" operators). I asked what he was doing - apparently he was "practising his JES2 commands" - on a live system operated by a multi-billion-pound organisation. In particular he was practising the age-related purge - he was teaching himself how to purge everything over 2 days old, but he clearly couldn't understand the difference between <2 and >2. That said, what the hell was he doing taking it on himself to purge everything over 2 days old? We keep 120 days' worth of output on the spool (obviously crucial stuff is spooled off to an archiver) and the spool was only 45% full. He was just playing.
Of course, excuses were made and this particular idiot got off scot-free and is still at large, while many excellent, experienced operators sit on the dole or in forced retirement (I know of two suicides by dumped 50-somethings who were too old to retrain and couldn't support their households on minimum-wage jobs). It's only a matter of time before one of these doing-the-needful idiots brings a blue-chip company to its knees. When it happens, will the fat cats hand back their bonuses for "saving money"? Answers on the back of a P45...
"Of course, excuses were made and this particular idiot got off scot-free and is still at large"
Consider yourself lucky if he ever confessed his sins. I've worked with off-shore teams where, in similar cases, even when confronted with formal proof, most of the staff would vigorously deny everything and get away with it safely (HR being their country's HR).
I may even send a story about that in for On Call.
At the same way-up-in-northern-Canada data centre where we once blew up and roasted a ham hock with a 400 kilovolt jolt of electricity, we also ran an obscure operating system for a financial services server system that was a combination of DOS and *nix, with an INTERPRETED scripting language based upon BASIC which we used to perform numerous "Delete Old Batch Job" tasks on an automated basis. Normally, one should delete a job via its actual STRING NAME and NOT its process identifier, which was a number from 1025 to 65535 that was re-used when the process IDs eventually wrapped around.
AND normally, after you delete a job, you put in a pause or delay command, giving you enough time to do a Control-S (pause) or Control-C (break with prompt) if necessary to break out of the script. UNFORTUNATELY, back in the day, this particular OS was prone to race conditions where a SINGLE batch job could take over an entire system; process IDs got re-used, and it would never give up control unless a hard electrical shutdown was completed AND the offending script was deleted and/or the hard drive it lived on was REMOVED from the storage system.
SO IN SHORT .....
a QUARTER BILLION DOLLARS ($250 million US) of financial transactions went to hell for 10 hours because of some batch code much like this:
10 REM Set process delay time
20 FOR i = 1 TO 2
30 REM Loop through and set job ID number
40 FOR x = 1025 TO 65536
50 deleteJob(x)
60 REM Delay in seconds to allow enough reconciliation-data flush-to-disk time before deleting the next job
70 pause(i)
80 NEXT x
90 NEXT i
100 GOTO 10
Because instead of the pause being 1 to 20 seconds (or even milliseconds), on THIS SYSTEM the pause command took CLOCK TICKS as its parameter, and the outer for-next loop was SUPPOSED to be a simple ASSIGNMENT statement rather than the for-next loop that was simply cut and pasted over from another batch job script file. Someone FORGOT that a one-second delay required a command like Pause(1000000)!!! AND there was no final break out of the code, so it looped forever!
Now imagine 64,000+ financial stock-trade transaction reconciliations being deleted in a race condition AND looped around again and again WITH NO WAY to interrupt it, because NO keyboard or other input (i.e. a Control-C break or pause command) could be interpreted - the time delay was in clock ticks (i.e. waaaay too short a delay) AND the priority of the task was set at such a high level that the system became bogged down. Even a hardware reset would still cause a RELOAD of the script, which would run at OS system-level priority and continue and continue forever!
Because the delay between job deletes was way too short, and the set-process-ID job code looped forever, the data flush-to-disk functions were never able to run on the internal account reconciliation code, so MANY hundreds of millions of dollars of trades were never properly recorded and sent to New York, London, HK and Toronto for payout. Because the margins on such trades were time-sensitive, any delay in account reconciliation is profit-destroying!
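Putting the description above together, the intended logic amounts to a single pass over the job-ID range, a delay expressed in real seconds rather than raw clock ticks, and a definite end instead of the `GOTO 10`. A sketch of that intent in modern Python (the `delete_job` callable stands in for whatever the real OS provided):

```python
import time

def purge_old_jobs(delete_job, first_id=1025, last_id=65535,
                   delay_seconds=1.0):
    """One pass over the job-ID range: delete each job, then pause in
    real seconds so reconciliation data can flush to disk between
    deletes. No outer loop, no GOTO - the function simply ends."""
    for job_id in range(first_id, last_id + 1):
        delete_job(job_id)
        time.sleep(delay_seconds)
```

The crucial differences from the runaway script are that the delay is in units a human can reason about, and termination is guaranteed.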
In the end, we had to actually CUT (not unplug!) the ribbon cable between the system and the offending hard drive to get a proper system shutdown, then drive 4 hours into town from our remote wilderness data centre to a locked-up offsite safe to retrieve the backup SYSTEM OS IMAGE disk, drive all the way back another 4 hours, re-install, and turn the systems back on to re-run the job with the batch code fixed properly - only to find out the next morning that the ten-hour delay in reconciliations had cost the companies involved over a quarter of a billion US dollars.
I should note that NOBODY got fired in the end, even after a thorough investigation by external parties because it was deemed a HARDWARE and OPERATING SYSTEM failure point rather than the "System Operator Error" it actually was!
Still, to be part of a team that was kinda the CAUSE of a quarter of a billion US dollars of someone else's money being flushed down the toilet IS STILL AWE-INSPIRING!!!
I used to have a nifty little task management utility in Windows. It was also potentially dangerous, as I found out one day when I accidentally used it to kill winlogon.exe on my own NT box, and watched as the machine bluescreened without saving the document I'd been working on.
I've learned from that, and now have a machine dedicated to testing. It's handy to have a machine I can wipe without worrying about lost data.
Way back in the early 90s, when I'd just left school, I decided to follow in my father's footsteps and become a freight forwarding clerk. The company I worked for was small, and, TBH, it was a boring job.
HMRC had just switched to a then state-of-the-art electronic submission system for paperwork, which required us to use specially designed software on the one IBM AT in the office. Unfortunately, as we had no one with real IT experience working for the company (I was a keen hobbyist, but had only been out of school a couple of years), we had a backup system, but it had not been tested - probably ever.
Every Friday, my boss cleared old jobs off the system to free up space. The software required the user to enter the record numbers manually, but could work on a range of records - so you could enter the start record as 12000 and the end record as 12010. One day, my boss entered the start and end the wrong way round. The software didn't check for that, and happily deleted over a year's worth of records. HMRC required we keep the last 5 years of data on the system; they had dial-in access to the computer and would check it regularly.
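The missing check is a one-liner. A hypothetical sketch in Python (the HMRC software was nothing like this; the record store is a stand-in) of a range delete that refuses a reversed range instead of silently eating the archive:

```python
def delete_records(records: dict, start: int, end: int) -> int:
    """Delete records whose IDs fall in [start, end]. A reversed range
    is rejected outright rather than matching everything (or nothing).
    Returns the number of records deleted."""
    if start > end:
        raise ValueError(f"start record {start} is after end record {end}")
    doomed = [rid for rid in records if start <= rid <= end]
    for rid in doomed:
        del records[rid]
    return len(doomed)
```

An equally defensible design is to swap the bounds and carry on, but for a destructive operation, refusing and re-prompting is the safer default.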
So, my boss noticed all these records had gone missing, and, knowing I was a keen computer hobbyist, asked me to fix it. I tried the backup, and it didn't work. He phoned the supplier and spent a couple of hours shouting at some poor tech support guy on the other end of the line. Nothing they suggested worked.
The final solution? I had to go through every single shipment we had exported in the lost year and re-enter the data from our paper files. It took about 2 months. Thankfully for me, they then made me redundant (although sadly I hadn't been there long enough for a payout), and the company itself didn't last long after I left. Obviously, I am not really thankful for that - people I liked lost their jobs, and that is never good. It was good for me, though, because while I had various other jobs that I didn't really enjoy and that didn't last long, it ultimately got me thinking and persuaded me to go to university (something I had been resisting), which enabled me to start the career in tech support I have now, which is something I enjoy.
A long(ish) time ago, in the late 90s/early 00s, I was working for a company where I had spent a fair few years coding on a PC-based retail point-of-sale system.
We also provided servicedesk services for a variety of retail chains in the UK and used a fairly low-to-mid-range system, called Heat, to log calls for all the different customers.
I was given the task of seeing if we could make it multi-tenant so that calls could be easily logged and managed for each customer using categories, SLAs, and stuff like that tailored for each customer and to make it easy to switch between different customers within the system when taking calls.
After a bit of time playing with the out-of-the-box customisation options - which were mostly around tweaking fields on forms and setting up system-wide categories, SLAs and escalation routes - I ended up implementing a whole load of functionality using triggers and stored procedures on the back-end database, which was SQL Server 7.0.
I had never even clapped eyes on SQL Server before this - the PoS system I had worked on used a flat-file back end and I had no SQL Server training (of course!) - so I pretty much learned on the hoof, with no support from the people who had originally installed the system.
To cut a long story short, I accidentally truncated the core call-reference table that everything else hung off, pretty much at the busiest time of the day! There were no PK/FK relationships within the database to stop that happening - all the relationships were handled within the software itself.
I couldn't just restore the database without kicking everyone out of the system and causing a fair amount of TITSUP*, so, after about 30 seconds of wild panic, I ended up restoring everything up to the last transaction log backup to another database - I had thankfully enabled those backups a couple of weeks previously, after discovering the woeful backup strategy left by the original installers - and copied the restored data back over to the live database.
All done in 15 minutes flat! Hardly anyone even noticed there had been a problem and I found a dark corner to go and recover my sanity!
Needless to say, I was much more careful about using the truncate command after that....
On the plus side, I moved to a better paid DBA job in the finance sector a year or so later off the back of that project!
* Total Inability To Support Usual PoS Problems
Why, when I know it stands for "Point of Sale" do I read "PoS" as "Piece of Shit"?
Anyhoo, as I've said before, we used to maintain our own system that listed all the equipment we had. We designed the various interfaces (web and Java) and backend service, and database.
All of a sudden one day, the system refused to do anything. We couldn't loan out equipment, check it back in, run any reports or maintain the inventory - everything just generated errors. My colleague took the system offline and started going through the logs. Apparently, the system's central transactions table (which it used as an audit trail, as well as to store what was checked out to users) had vanished.
Rather worried, my colleague loaded up SQL Server Management Studio and started looking for the table. Somehow, the table had been renamed to a full stop. Now, there were four people who had access to that database at the time: me, my two colleagues who helped design the system (all three of us designed different aspects), and the department DBA. Everyone denied renaming the transaction table, but I suspect my other colleague. I didn't do it (I had been busy elsewhere all morning, so had an alibi). The DBA wouldn't have done it, because he had no reason to be doing anything other than looking at the table. The colleague who fixed it would have been the one called to fix it, so I doubt he'd do something that generates work for himself - which leaves my other colleague.