Automation is great. Until it breaks and nobody gets paid

With Friday upon us, and a weekend next on the schedule, The Register once again brings you an instalment of On Call, our weekly reader-contributed tales of being dragged out at all hours to fix failures inflicted by the foolish, flummoxed, or fatuous. This week, meet a reader we’ll Regomize as “Hugh”, who in the early 2000s …

  1. Anonymous Coward

    This is why we need code review

    Because someone who writes a script designed to "append itself to his crontab each time it runs, then execute his target script 16384 times, and copy itself again" needs to be stopped before it even gets to QA, who probably only ended up doing the same testing that the programmer did because the programmer told them to test it like that.
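One thing a reviewer might have asked for is a guard so that the scheduling step cannot pile up duplicate entries. The following is only a minimal sketch of that idea, not the script from the story: the function name `add_cron_entry`, the schedule, and the path `/usr/local/bin/collect_timesheets` are all hypothetical, and it works on the crontab as plain text rather than calling `crontab` itself.

```python
def add_cron_entry(crontab_text: str, entry: str) -> str:
    """Return crontab_text with entry added, unless an identical line already exists."""
    lines = crontab_text.splitlines()
    # Compare on stripped lines so trailing whitespace does not hide a duplicate.
    if entry.strip() in (line.strip() for line in lines):
        return crontab_text  # already scheduled: change nothing
    lines.append(entry.strip())
    return "\n".join(lines) + "\n"


# Hypothetical entry: run a timesheet collector at 06:00 every Friday.
ENTRY = "0 6 * * 5 /usr/local/bin/collect_timesheets"

once = add_cron_entry("", ENTRY)
twice = add_cron_entry(once, ENTRY)
print(once == twice)  # the second call is a no-op
```

Running the installer any number of times leaves exactly one entry, which is the property the self-appending script lacked.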

    1. big_D Silver badge

      Re: This is why we need code review

      I was on a training course at Digital in Reading for VAX Administration.

      For a laugh, I wrote a quick script which logged all users who weren't me off the system, then submitted itself to the batch queue to be re-run immediately.

      It was great fun, worked 100% correctly. Only...

      When you are logging in, you are given the temporary username <LOGIN> in the userlist, which the script was using to terminate user sessions. Then I made the mistake of logging myself out manually, before I killed the script.

      Problem, big, huge! I couldn't log back in, because although my username was excluded, I first had to get past the <LOGIN> stage. Didn't happen.

      The instructor took us into the computer room and then tried to log onto the console, same problem. In the end, he had to hard reset the thing!

      Luckily, he used it as a learning experience and I wasn't thrown off the course, but I did learn a valuable lesson!

      1. Roger Lipscombe

        Re: This is why we need code review

        "to be re-run immediately"

        See, what you *should* have done is schedule the script to run 5 minutes later, instead of immediately. This gives you just enough time to log in and undo the mess.

        At least, I *hope* that's the "valuable lesson" you learned...

        1. heyrick Silver badge

          Re: This is why we need code review

          "you just enough time to log in and undo the mess"

          Or enough time to be somewhere else and...nope, wasn't me.

      2. werdsmith Silver badge

        Re: This is why we need code review

        <deleted>

      3. Anonymous Coward

        Re: This is why we need code review

        A sysadmin for a school I attended as a youth had a script that ran away with itself that ended up exposing student skullduggery. As students, our personal storage areas were limited in size to something like 64MB...which wasn't as small then as it seems now, but still quite small...so I and an accomplice found a way to create a hidden folder inside the sysadmin's folder which we granted ourselves permission to access via some interesting traversal rules so that we had essentially unlimited storage which we "mapped" to our own folders as a hidden folder (we had to hide it, because the method we used to map it essentially gave the mapping to everyone, including the sysadmin)...where this fell apart was the sysadmin had a script that created a replica of a folder inside his folder as a sort of quasi-backup solution, it ensured that he had a "lagged" version of his files by about 1 hour in case he needed to recover something quickly...and he'd keep these 1 hour snapshots for a few days...catalogued using folder names with the date and time that the snapshot was "created"...this was achieved by the way, because all the admin functions were performed using tools that the admin carried around on a floppy disk...a disk we managed to clone during an IT lesson when he left it lying around when he left the room to do something else for a few minutes...that was a very tense few minutes for us because we didn't know when he would return, and at that point we had no idea the extent of the stuff we were cloning!

        The sysadmin's folder inside his folder is where our "mapped" hidden folder would show up, alongside our other hidden folder. Creating a kind of circular reference to itself...anyway, long story short, his backup script went bonkers one day because he'd adjusted it to use a different flag (we suspect he was using the access time on the files, and we think his script was scheduled to run once an hour). Anyway, this led to our hidden folder and the hidden mapping being copied every hour on the hour several times...by the time we'd noticed (because we closely tracked the free storage on the server to ensure we flew under the radar and there were no sudden spikes in usage, well as close as we could with 1 IT lesson a week)...we were shitting bricks because the disk was down to 2% free and our little clandestine storage area used up nearly 2% of the disk. We rationalised to ourselves that it would be fine, because there was no chance the OS was on the same drive, they will find the files, but they can't be linked to us specifically...so we had a moment of calm...which lasted around 49 minutes...because just as the bell was about to go for the next lesson, everything went to shit. Disk space messages started popping up all over the place, people couldn't log in, the internet went down...hell on earth...we found a day or so later that the storage area was indeed on the same drive as the OS and the full drive caused the server to implode and there wasn't even enough space to boot it up after the sysadmin restarted it.

        We were never caught for this, because our janky mapping applied to everyone and anything in that folder would automatically be "owned" by the sysadmin...we have no doubt though that our secret collection of Doom WADs, Bomberman clones etc was found out...because a whole host of new restrictions appeared on the system and necessitated a change in tactics if we wished to keep our stash of games on the network, the alternative was to rely on our hidden cache of floppy disks that was kept in a cavity behind the hand dryers in the boys' toilets, there is a good chance they are still there...this method led to even more chaos (which lasted for months), caused the sysadmin to have a full blown, sweary, meltdown in one of our IT lessons (he had to be dragged out by two teachers, whilst screaming that we were all bastards...not just me and my mate...because he didn't know it was us...literally all of us), did get us caught (I got a particularly scathing bollocking which involved my IT teacher telling me I'd never have a career in IT, and got me chucked off the school website team) and involved the SPOOL folder on the server, locking out the sysadmin entirely by mistake...but that is a story, that involves many reams of wasted paper, a sysadmin sleeping in his car for 2 days, lots of aggressive bollockings and eventually a truce between your heroic hacking duo and the sysadmin for another day.

        As a sysadmin now, looking back...yeah we were bastards.

        1. Jou (Mxyzptlk) Silver badge

          Re: This is why we need code review

          "Why did you do it?" "Because the system allows us to! How should we know that was forbidden? Aren't we supposed to learn the system?"

        2. Doctor Syntax Silver badge

          Re: This is why we need code review

          "my IT teacher telling me I'd never have a career in IT"

          Obviously someone who had no idea how things happen in the real world.

          1. Anonymous Coward

            Re: This is why we need code review

            Indeed. The funny thing is, at the time wifi was in its really early stages...rare enough that it was fairly unlikely to be configured correctly, but common enough that you could eventually find networks in the wild...I was actually making money at the time helping businesses password protect their wifi, used to lug a heavy (probably at least 5kg) thick bastard of an Olivetti laptop that had around 1 - 2 hours battery life if I was lucky, in a bulky leather satchel...in a time before people called the police for probing their wifi...in hindsight, I should have charged more than a fiver a time, but I was 14 and had no concept of what to charge and a tenner was a lot of money to me then...I'd find maybe 3 or 4 a month and it helped cover my occasional cinema tickets, ice skating entry etc. technically I had already started my career in IT.

            Charging £5 to set a password made me feel like king dick. In hindsight, I was a pillock charging so little.

            A year or so later, I was occasionally being smuggled into the IT department at a large, at the time, tech firm (they are now basically a dead meme) to help out a family friend with various things, mostly network...but occasionally software development...we're in the late 90's here, just pre-dotcom boom at a time where it was really trendy to have a "whizz kid" around doing tech stuff. It's amazing how times change and these days it's basically unheard of to have anyone that young anywhere near any advanced tech...it's a shame really, and is probably the reason we have shortages of good engineers.

            The engineers are out there, you just need the balls to hire them. Say what you like about boomers, but they took daft risks like letting kids loose on tech...it's Gen-X that doesn't take the risks. Boomers are very self aware and they know when to take a risk...Gen-X is risk averse...

            I think the next wave of proper risk taking will be when us millennials finally get a turn at the helm (if ever)...let's be honest, we're the generation that was raised by boomers...we are therefore, most likely, the next "boomers".

            1. BinkyTheMagicPaperclip Silver badge

              Re: This is why we need code review

              Risk taking is never coming back, the world has changed, it has nothing to do with generations.

              Millennials are already running IT, but they're not daft. Gen X did plenty of stupid things growing up including exploiting systems - that's why they're risk averse. Gen X grew up with the changes in security - from dialup exploits in the eighties, LAN exploits in the early to mid nineties, & widespread Internet exploits in the mid nineties onwards.

              Towards the end of the nineties you could (just about) get away with some unprofessional practices, patching wasn't integrated into Windows until Windows 2000, but even before then it was becoming a real hassle to harden NT 4 and Internet connected systems were still being exploited on a regular basis.

              Once the early 00s hit with fast, cheap, broadband available nearly everywhere it was a necessity to manage systems properly.

              It's only going to get worse - there will be more growing pains with software repositories, and widely used but not properly funded open source used by commercial companies.

              It's also easier than ever to get access to 'advanced tech'. Computers and software are cheaper than they've ever been. Specific technologies such as VMWare may be expensive for the novice to get into, but there's nothing stopping an engineer learning.

        3. Paul Hovnanian Silver badge

          Re: This is why we need code review

          "my IT teacher telling me I'd never have a career in IT"

          A more fitting punishment would be to spend the rest of eternity with a career in IT.

          -- Sisyphus

    2. Anonymous Coward

      Re: This is why we need code review

      I recall, many years ago, being asked to spend a few days with a supplier writing software for us. It was a supplier database to be shared by all major companies in a business sector. The first pass of the scheme was an Oracle database that was installed on each company network, with data updates sent each month on a floppy disk. This was almost 20 years ago, when high-speed internet access was a 56k modem. The new system was to be accessed via the internet, with a browser, and most of the client companies were being added to a higher speed private network that had a decent pipe to the main server.

      I turned up at our supplier's offices, sat down at a spare desk, fired up my laptop and started using the system to run the sort of queries users would be making. This was when IE was the main browser in business and that's the one I started with. Not too bad, I wasn't going to win popularity stakes as I ran queries the programmers had never considered. Their initial responses were to ask why anyone would want to run that query - my response usually was "because they can." "It doesn't have to make sense to us; if it makes sense to them, no matter how ridiculous it may seem then, as the client, they're entitled to run it. If the system can't handle it, either handle it nicely as an error, or update the system so it works." The latter option was the usual fix, as I had a fairly good idea what the clients might do (having worked with several of them for many years) which, as I pointed out, was why I tried those queries. I also tried a few illogical ones (knowing some bright spark somewhere, sometime, might try it) - to check error handling. (I should explain that I'm not a programmer, nor an expert in ICT systems - my expertise was in knowing the clients and their needs).

      Things took a turn when I then fired up Netscape to run tests. "You can't use that" was the plea, everyone works with IE. Not everyone, I replied, and the contract did not restrict users to IE. My stay in their offices was a bit longer than planned, but we got there, and actually had a working system when it went live.

      1. nintendoeats

        Re: This is why we need code review

        That bit about "because they can" is so important. Unless you explicitly state in your documentation that something is undefined behaviour, you must do something intelligent. I find myself using that phrase often; "it won't WORK, but it will do something intelligent".

        Even ignoring that you should expect users to do anything they are allowed to do, I feel that a system which does not handle all inputs is probably not well thought-through and generalized in its implementation. In general, a well made piece of software will have few edge cases because it will have systems which naturally handle most possible inputs.

        That last point may be a pipe dream, but it's one I'm going to stick with :p

        1. MrBanana

          Re: This is why we need code review

          There are a couple of cases where code review doesn't quite work, unless it can be done on the fly. Allowing users to write their own queries - "I didn't mess with the database, I simply want to know all customers called Fred with an order placed this year, so I just typed WHERE customer.fname = 'Fred' AND orders.date > '2023-01-01', what is this cartesian join thing you are talking about...". Or, allowing computers to write the queries with a drag and drop interface that has no limitations, or any kind of optimisation, that then generates queries with many, many thousands of lines of bad code.
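For anyone wondering what the "cartesian join thing" looks like in practice, here is a small sketch using Python's built-in `sqlite3` module. The table layout (`customer`, `orders`, a `customer_id` column) and the sample rows are assumptions made up for illustration; only the WHERE clause comes from the comment above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, date TEXT);
    INSERT INTO customer (id, fname) VALUES (1, 'Fred'), (2, 'Fred'), (3, 'Wilma');
    INSERT INTO orders (id, customer_id, date) VALUES
        (10, 1, '2023-02-01'), (11, 3, '2023-03-01'), (12, 3, '2023-04-01');
""")

where = "WHERE customer.fname = 'Fred' AND orders.date > '2023-01-01'"

# No join condition: every Fred is paired with every qualifying order.
cartesian = conn.execute(
    f"SELECT customer.id, orders.id FROM customer, orders {where}"
).fetchall()

# With the join condition, only a Fred's own orders come back.
joined = conn.execute(
    f"SELECT customer.id, orders.id FROM customer "
    f"JOIN orders ON orders.customer_id = customer.id {where}"
).fetchall()

print(len(cartesian), len(joined))  # prints: 6 1
```

Two Freds times three orders gives six rows where one was wanted. On real tables with thousands of rows on each side, the same mistake multiplies accordingly.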

      2. Anonymous Coward

        Re: This is why we need code review

        "You can't use that" was the plea, everyone works with IE.

        Ah yes, the bad old days of "IE or nothing".

        I once worked in a business that had a development team doing bespoke business software. We "ate our own dog food" and used the same (well almost) system in-house - including for things like our timesheets. Annoyingly, some bits either didn't work properly, or didn't work at all, unless you used IE6 on Windows. As a Mac user, this was ... irritating as I had to fire up a Windows VM every day just to do my timesheet - a colleague occasionally said "you're doing your timesheet aren't you ?", noting the cursing coming from my direction at the combination of Windows and this 'orrible system.

        When I mentioned this to the head dev, his response was that it was my fault for not using Windows and IE like everyone else - and no, he had no intention of even looking at what the problem might be as he only programs for IE on Windows as "that's what all businesses use".

        I noted, but somehow managed to avoid gloating at him, that not long afterwards - some of the senior managers at the biggest client were using Macs and iPads ...

        Postnote: as a parting gift, a dev just before he left had a look at the timesheet code, fixed an errant ";", and then it worked on Macs as well.

        1. Missing Semicolon Silver badge
          Facepalm

          Re: This is why we need code review

          Now it's "latest Google Chrome or nothing".

          And it must be the latest too! "Web site does not work" -> "Have you upgraded Chrome?"

        2. Doctor Syntax Silver badge

          Re: This is why we need code review

          Ah yes, the bad old days of "IE or nothing"

          The bad old days are still with us, just a bit updated from IE to whatever short, possibly very short, list the devs bothered about. And then there are those who take umbrage and won't display anything at all unless JavaScript is enabled for their site. Or won't display a close button on their cookie pop-up (looking at you, National Archives).

          1. Anonymous Coward

            Re: This is why we need code review

            I met this only this week when I was scheduled a video consultation with a healthcare professional. Followed the link to the site, clicked (as directed) on the "Test my equipment" button to find no obvious clue where the test was (but that's an aside, it took several clicks on non-obvious things) - only to reach a hard "Stop, you ain't getting any further unless you use Edge, Chrome, or Safari", at least they gave a bit of choice. I can understand them not wanting to support ${oddball_browser}, but having a hard block rather than a "continue at your own risk - it might not work and we won't help you" option is ... irritating.

            I was tempted to play with custom browser strings as I know Opera works fine with Teams (well as much as Teams is anything that can be called "fine") and Jitsi. But for the one consultation it came under CBA and I fired up Safari for the first time in ages.

            1. Doctor Syntax Silver badge

              Re: This is why we need code review

              "to find no obvious clue where the test was"

              I've often thought there should be a recognised test procedure requiring three people.

              One is the developer or a reasonably senior member of the team if there's more than one. The next is a user. The user should have no knowledge of the application that's been developed but a good knowledge of the application domain and of what the application should do.

              Ostensibly it's to allow the developer to learn how users use the application, in reality it's to learn just how unusable it is.

              The user is allowed to ask developer no questions other than "How does it tell me how to do X?" and the developer is allowed to answer no other questions.

              Ostensibly the third person is an invigilator to enforce these rules but is, in reality, there to stop the other two coming to blows.

              1. I could be a dog really Bronze badge

                Re: This is why we need code review

                And many organisations do this, in great depth. Apple famously did a lot of this when developing the original Mac OS - and things got changed as a result.

                One example is that they found a lot of users getting so far and then cancelling at the last opportunity. Seemed that originally they had buttons for "Cancel" and "Do it" - but many people misread the "Do it" as "Dolt" which I'm sure many of us recognise as something of an insult in US vernacular. So not considering themselves to be dolts, users were taking the other option. So "Do it" changed to "OK".

                On the other hand, I can't help thinking that some other vendors these days do it for a different reason. Do user testing, and if users find it too easy to use - re-engineer it until they find it harder. That's the most obvious explanation for some of the **** I get to use at times ! Same with cars and maintainability - that maintenance item (e.g. alternator or starter motor) is too easy to change, find something to stick in the way or move it down the back of the engine.

            2. Yes Me Silver badge
              Thumb Down

              Re: This is why we need code review

              "I can understand them not wanting to support ${oddball_browser}"

              I can't. This is why we have WebRTC standards.

              1. I could be a dog really Bronze badge

                Re: This is why we need code review

                Yes, in principle it should "just work" on any browser that properly* supports such standards. But I can fully understand any business that doesn't want to deal with someone getting in touch to say that it's not working on Furbleview version 0.1 - with them never having heard of it. They then have the task of either persuading the user to use a different browser (pick from any of the list we've painstakingly tested and know** will work), or find and install Furbleview and figure out why it's not working.

                Much simpler is to say that "we have only tested on, and support the use of, X, Y, and Z". I'm OK to carry on and click the "yes I know that, but I'll take my chances" button - but what's really annoying is sites like this where there is no such option.

                * We know just how well certain vendors do, or don't for commercial reasons, support all the standards in their entirety without any proprietary changes or extensions don't we ?

                ** As best it's possible to determine given the wide variety of configs people could come up with.

          2. Anonymous Coward

            Re: This is why we need code review

            Yup

            3 HR systems here at work.

            . Time sheets - needs Firefox

            . Expenses - needs Edge

            . Holiday - needs IE and modifications to system wide TLS support, and back again when finished

            plus corporate intranet needs seem to vary!

      3. Cian_

        Re: This is why we need code review

        The almost 20 years ago, high speed internet and Netscape refusal reminds me - pretty much exactly that amount of time ago, I was getting an 'engineer'-led install of a blisteringly fast 512/128k DSL line, which hadn't been provisioned properly. Said ISP 'engineer' didn't actually have a computer and tried to blame the line not working on my use of Mozilla; as if it had somehow poisoned the Cisco router - as clearly sites didn't work in Internet Exploder either afterwards.

        One argument and a trip to the exchange later, there was internet connectivity. In both browsers.

      4. John Brown (no body) Silver badge

        Re: This is why we need code review

        "my expertise was in knowing the clients and their needs)...and actually had a working system when it went live."

        Well done you! That is what needs to happen more often these days and the first statement is probably a major reason why the second statement is true.

        Too often systems are created with little to no input from the "coal face", let alone testing by those people.

    3. Robert Carnegie Silver badge

      Re: This is why we need code review

      I think the description is illustrative rather than being precisely accurate. Although 16384 is precise, obviously.

    4. Innominate Chicken
      Pint

      Re: This is why we need code review

      Is it really QA when they only run the exact tests they were told by the original programmer?

      Am I being excessive if I say that the only things the QA team are given are the code, its documentation, and the original specification? No contact with the programming team to minimise the risk of them picking up on the "proper" way to test it from the programmers.

      Icon: QA waved everything through based on the test the programmers already wrote and left early to have a few of them.

      1. Cheshire Cat
        Coat

        Re: This is why we need code review

        A project manager, programmer, and QA tech go into a bar...

        The PM orders a beer.

        The programmer orders a beer.

        The QA orders -1 beers. Then orders a fish.

    5. DS999 Silver badge

      I have consulted in many places over the years

      And have NEVER seen one with any sort of code review process for shell scripts.

      When I needed a script I just wrote it. In most places it was a grey area whether using that script in production required any sort of change control. I would file a CR if it was going to run as "part of the OS", i.e. as a scheduled job or as a startup script. Otherwise it was copied to /usr/local/bin or whatever when I considered it ready. Stuff like that would be run on-demand by admins for a specific purpose like auto-configuring top of rack switches or locating orphan storage, so in the unlikely event they created some chaos due to a bug someone should be able to put 2 and 2 together and realize "hey the chaos started right when I ran that script".

      1. MJB7

        Re: I have consulted in many places over the years

        Good grief, we have 100's of shell scripts in our git repos - and you can't change any of them without a code review. (I am trying to convert many of them to python scripts - but that's a _long_ term project).

        1. Doctor Syntax Silver badge

          Re: I have consulted in many places over the years

          "I am trying to convert many of them to python scripts"

          You're trying to fix what's working?

        2. Cheshire Cat

          Re: I have consulted in many places over the years

          "I am trying to convert many of them to python scripts "

          Yes, convert a simple, working shell script to something that is understood by fewer people, and now has a dependency on the whole of Python being installed, plus a few extra libraries...

          If you don't need to have Python (or Perl, Ruby, PHP, GCC, etc) installed on your server it's inherently more secure, less to keep patched, less disk space required, and so on. At least shellscript is a universal that everyone knows, and unless you need something really esoteric or high performance I'd go with the KISS principle.

          1. doublelayer Silver badge

            Re: I have consulted in many places over the years

            Not when you need to build ever more complex bits of shell, calling sed or awk frequently, for something that would be structured well in any programming language. At a certain point, shell scripts become too large to edit in a productive way, and programming languages that have some more useful concepts like type management and better ways of reusing functionality become better for the task.

            You could write almost all of the basic utilities as a shell script, but would anyone want to do that? The original writers didn't, which is why most of them are in C. There's also portability tradeoffs. Yes, using a Python script means you have a dependency on Python, but if you don't use a particularly ancient set or use a feature introduced last week (there aren't that many of those anyway), then it will work identically in a lot of environments. A shell script is dependent on lots of aspects of the system, from which interpreter is running the script which may be user-dependent to the subset of utilities installed, and some of those differences in functionality can be fatal. Depending on the environments in which the script is running, you can experience some portability deficiencies from shell scripts as well.

          2. Jou (Mxyzptlk) Silver badge

            Re: I have consulted in many places over the years

            Simple things true. I still use shell script, in my case .CMD, when it is good enough. For anything beyond a simple command-order I expect powershell 5.1, which is WAY beyond. Offering structured objects, being able to directly use .xml or .json as structured objects instead of insane text-parsing by hand and so on. And if the need rises I can use .net objects directly, or .net code inline, call c++ dll functions, or still call .cmd for things which work better there (like piping a videostream from ffmpeg-decode into the av1 encoder).

        3. Yes Me Silver badge
          Coat

          Re: I have consulted in many places over the years

          "I am trying to convert many of them to python scripts"

          I find that needs a lot of

          do_cmd = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

          _out, _err = do_cmd.communicate()

          1. doublelayer Silver badge

            Re: I have consulted in many places over the years

            That's true, although I find that subprocess.run looks cleaner. Still, that's just two lines to run a command and get its output. If your entire script looks like that, then you can probably use a shell script very easily. Most of mine end up calling a few programs like that, but then have several lines of output parsing or command construction which I'd much rather do in Python where string parsing isn't a write-only operation and where loops can use more complex conditions.
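For comparison with the `Popen`/`communicate` pair quoted above, this is roughly what the `subprocess.run` form looks like. It is a sketch only: the command is a stand-in (the current Python interpreter printing one line) chosen so it does not depend on any particular shell utility being installed.

```python
import subprocess
import sys

# Hypothetical command: ask the current Python interpreter to print a line.
cmd = [sys.executable, "-c", "print('hello from a child process')"]

# capture_output=True collects stdout and stderr; text=True decodes them to str.
# check=False means a non-zero exit status is reported, not raised.
result = subprocess.run(cmd, capture_output=True, text=True, check=False)

out = result.stdout.strip()
print(result.returncode, out)
```

Passing the command as a list rather than a string with `shell=True` also sidesteps shell quoting, which matters once the arguments come from parsed output.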

    6. Nematode

      Re: This is why we need code review

      At an old place of mine, doing real-time control systems, comms interface Vaxes, PLCs and few other goodies, we used to have a rule: "the software isn't finished until [regomised name - the boss] says it is." He had a knack of rocking up at a shop test, doing one or two things and immediately breaking it (when of course it should not have broken).

    7. steviebuk Silver badge

      Re: This is why we need code review

      But code review isn't*agile. Innit.

      *Reason agile is bollocks.

      1. TimMaher Silver badge
        Trollface

        Re: Bollocks

        Yup. Needs a good kicking.

  2. andro

    Sounds more like malware than a legit thing. I honestly didn't know where this story was going to go halfway through.

    1. Anonymous Coward

      "Sounds more like malware than a legit thing."

      Yes, why would you run the darn thing 16K times ? The culprit would have deserved a good punch in the nose !

      1. KittenHuffer Silver badge

        16384 happens to be 2^14. So I would guess that the script was looping until the variable used by the loop was overflowing at that particular value.

        1. Arthur the cat Silver badge
          Trollface

          16384 happens to be 2^14. So I would guess that the script was looping until the variable used by the loop was overflowing at that particular value.

          Ahh, whatever happened to all the old 15 bit computers?

          1. RM Myers
            Unhappy

            15 bit computers?

            Or maybe it was a signed variable - one bit for the sign and 15 bits for the number? And 1 + 15 = 16 bit computer?

            1. MJB7
              Windows

              Re: 15 bit computers?

              Good grief! Doesn't _everyone_ know that signed 16-bit integers overflow when you increment past 32767? Really?

              Icon, my age.
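The arithmetic being argued over in this sub-thread is quick to check. The sketch below uses `ctypes.c_int16` purely as a convenient stand-in for a signed 16-bit variable (ctypes does no overflow checking, so the value wraps); nothing here is known about the variable type in the original script.

```python
import ctypes

count = 2 ** 14               # the number of runs in the story
int16_max = 2 ** 15 - 1       # largest value a signed 16-bit integer can hold

# Incrementing past the maximum wraps round to the most negative value.
wrapped = ctypes.c_int16(int16_max + 1).value

print(count, int16_max, wrapped)  # prints: 16384 32767 -32768
```

So 16384 is half of where a signed 16-bit counter would wrap, which is why overflow on its own does not obviously explain the figure.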

              1. Paul Hovnanian Silver badge

                Re: 15 bit computers?

                https://xkcd.com/571/

        2. SCP

          One of the things that always set the old spidey senses tingling - when bad things happened at a multiple of 2^n (+/- 1).

      2. Spazturtle Silver badge

        Maybe each run of the script would collect the data for a single contractor and each time it was run it would increment a counter.

      3. AVR

        Spreadsheet lines? A few per contractor (four, say) to a maximum of Excel 97-2003's 65536 limit.

      4. Bebu Silver badge

        Needed more boot at coding bootcamp.

        "why would you run the darn thing 16K times?"

        At a guess

        MAX_CONTRACTORS = USHRT_MAX / sizeof (unsigned) // 64k / 4

        for 0 <= contractor < MAX_CONTRACTORS do
            if infosource [contractor] != 0
                collect from infosource [contractor] and send timesheet for contractor to HR etc
            fi
            resubmit job to cron !! thinking (if at all) "batch" command :(
        od

        resubmit job to cron !! give it more welly

        When you don't have separate prod and dev environments I guess you have to be agile to avoid the brown stuff when it hits the fan.

  3. Will Godfrey Silver badge
    Boffin

    Training

    I firmly believe that anyone writing software (yes, even a 'simple' script) professionally should first prove that they can produce stable code for schools. This will be 'tested' by malicious attacks from pupils, multiple wrong guesses by absent-minded professors, and random entries by new, dreamy primary school teachers.

    I have the scars!

    1. Anonymous Coward
      Anonymous Coward

      Re: Training

      Dreamy as in very attractive, or as in head in the clouds? Note: not mutually exclusive.

    2. Robert Carnegie Silver badge

      Re: Training

      I think if you make trainee programmers work in that environment, then you will have very few trainee programmers... except for recruiting the pupils to whom you referred, if only to punish them. And also, if those kids enjoy programming, then make it into work and put a stop to that. ;-)

  4. big_D Silver badge
    Big Brother

    Payroll, not automation...

    I was on a project at a Royal Naval dockyard, to replace their old personnel system with our new one.

    The problem is, being RN, you need positive vetting, which takes time. I was draughted onto the project at the last minute, given the vetting forms on a Friday afternoon & told to report to the dockyard on Monday... Hmm, 6 weeks of checking completed over a weekend, I don't think so.

    We drove down Sunday evening and booked into the hotel, then Monday morning, I presented myself at the guard hut. I managed to get a 3-day pass, but was told that was it, no ifs, no buts, without the vetting forms being approved, I wasn't coming back on site on Thursday.

    Thursday rolled around and the guard told me, sorry, no dice, you ain't coming on site!

    I pointed out that I was working on migrating the payroll data from the old system to the new one, and if I didn't come on site, they wouldn't be getting paid at the end of the month.

    A couple of minutes later, I had a 1 month temporary pass!

    Interestingly, the dockyard had paid extra for shielded terminals. The supplier of the terminals decided that shielding wasn't really needed and decided to pocket the difference between the normal terminals he delivered and the cost of shielded terminals... Everything went fine, until a US Navy ship tested its radar in the harbour. Queue a hundred or so dead terminals and a red-faced supplier, who suddenly lost all his "profit margin" on the terminals, as he had to replace them all with the shielded versions that had been ordered.

    1. J.G.Harston Silver badge

      Re: Payroll, not automation...

      That's a long queue.

      1. Anonymous Coward
        Anonymous Coward

        Re: That's a long queue.

        Longer than the cue to play snooker, that's for sure.

    2. Aladdin Sane

      Re: Payroll, not automation...

      Testing radar in harbour can lead to FLKs.

    3. Boris the Cockroach Silver badge
      Thumb Up

      Re: Payroll, not automation...

      You got a like for telling the truth, the whole truth and nothing but the truth.

      Although I had

      " 'ere... you're not cleared to be in this compartment....top secret etc etc etc"

      "Go boil your head... I'm the idiot who built the rig, and installed it in here 10 minutes ago"

    4. Anonymous Coward
      Anonymous Coward

      Re: Payroll, not automation...

      I knew what was coming when you mentioned the shielding. I went to a college (itself built on the site of a former naval training centre) sat on the side of a tidal creek; the art building had a direct line of sight down the creek and across the harbour to the North-West Wall of HM Dockyard Portsmouth, which was where the US ships usually berthed.

      When the fire alarm was updated, it was attempted to link this to the main building via a radio link. It kept failing at odd intervals. The supplier said that the boards looked like there had been a lightning strike nearby, but these failed during dry weather. Eventually someone spotted the failures coincided with American ships visiting, and a call was made to the dockyard asking advice.

      Apparently a call was received back from the staff of Flag Officer Portsmouth neither confirming nor denying any specific details around dates, but agreeing that the symptoms were consistent with those that would be experienced in the vicinity of a marine attack radar test. It was suggested that some substantial RF filters centred on X band would be beneficial in mitigating any issues that might be experienced under those circumstances. I think we had already come to the same conclusion: filters were installed, and there were no repeat failures until a real lightning storm took the link out a couple of years later.

    5. jake Silver badge

      Re: Payroll, not automation...

      "I was draughted onto the project at the last minute"

      Now you're just putting the wind up.

  5. This post has been deleted by its author

  6. Pascal Monett Silver badge
    FAIL

    "execute his target script 16384 times"

    Sounds like a beginner who couldn't be arsed to find out exactly how many loops were necessary so go for 16K, it'll surely be enough.

    Not to mention that auto-appending to the cron job sounds like it should be anathema to me. Not a Linux admin (or an admin of any other OS), but I'm convinced that a job is supposed to be put in the cron list by an actual human who knows what he's doing, not as an auto-insert by some coder who looks like he's half-assing his way to the next paycheck.

    1. Anonymous Anti-ANC South African Coward Bronze badge

      Re: "execute his target script 16384 times"

      Execute order 66.

    2. cookieMonster Silver badge

      Re: "execute his target script 16384 times"

      auto-appending to the cron job = a terrible accident in a stairwell || a fall out a faulty window, very high up a building

    3. MrBanana

      Re: "execute his target script 16384 times"

      Looping is important. I've seen this to find the next unique identifier in a table:

      start:
      ID = 1;
      INSERT ID INTO TABLE ...
      IF (ID EXISTS)
          ID=ID+1;
          GOTO start:
      ELSE
          UPDATE ... WHERE table.ID = ID;
      END IF

      Clunky enough, but used as above without bothering to save the last inserted ID and always beginning at start:, after a few hundred thousand update operations, the exponential impact was horrendous.

      1. MJB7
        Headmaster

        Re: "execute his target script 16384 times"

        Not exponential: quadratic.

        Quadratic is nasty - it won't bite in testing (like exponential usually will), but it bites with a vengeance in production!

        Yes, I am a pedant. Why do you ask?

      2. Joseba4242

        Re: "execute his target script 16384 times"

        "Exponential" does not mean "big".

        The impact in this case is that running time is quadratic instead of linear.
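
        To put rough numbers on the quadratic-versus-linear point, here is a small Python sketch. It assumes the intended behaviour (keep incrementing the candidate ID until a free one is found) and simply counts probes. Both function names are invented for illustration, and an in-memory set stands in for the table.

```python
def probes_restarting_from_one(rows: int) -> int:
    """Total ID probes when every insert restarts its search at ID 1."""
    used = set()
    probes = 0
    for _ in range(rows):
        candidate = 1
        probes += 1
        while candidate in used:   # walk past every ID already taken
            candidate += 1
            probes += 1
        used.add(candidate)
    return probes


def probes_remembering_last(rows: int) -> int:
    """Total ID probes when the last inserted ID is saved between inserts."""
    last_id = 0
    probes = 0
    for _ in range(rows):
        last_id += 1               # the next free ID is known immediately
        probes += 1
    return probes


for n in (100, 200, 400):
    print(n, probes_restarting_from_one(n), probes_remembering_last(n))
```

        Doubling the row count roughly quadruples the first figure (it is n(n+1)/2) and only doubles the second.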

        1. This post has been deleted by its author

      3. ilmari

        Re: "execute his target script 16384 times"

        I think you want to put the start label one line further down?

        1. Allan George Dyer
          Coat

          Re: "execute his target script 16384 times"

          I think they might know that... now.

    4. l8gravely

      Re: "execute his target script 16384 times"

      I bet he assumed that cron was just a fire-off-once type of batch system, and that old jobs would never be run again. He didn't know/realize it was a scheduled job, not a batch job.
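
      Since cron re-runs every entry on its schedule, anything that adds entries needs to be idempotent. A minimal Python sketch of that idea, working on the crontab text as a plain string: the function name, the schedule line and the script path are all hypothetical, and the real script isn't shown in the story.

```python
def add_entry_once(crontab_text: str, entry: str) -> str:
    """Append `entry` to the crontab text unless an identical line is already present."""
    lines = crontab_text.splitlines()
    if entry.strip() in (line.strip() for line in lines):
        return crontab_text            # already scheduled: change nothing
    lines.append(entry.strip())
    return "\n".join(lines) + "\n"


# Hypothetical schedule line and script path, for illustration only.
ENTRY = "0 2 * * * /home/someone/collect_timesheets.sh"

table = ""
for _ in range(3):                     # simulate the script running three times
    table = add_entry_once(table, ENTRY)

print(table.count(ENTRY))              # 1
```

      Run it as often as you like and the table still holds one entry, rather than one more per run.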

      1. doublelayer Silver badge

        Re: "execute his target script 16384 times"

        It reminded me of XKCD 1678, with similar effects.

  7. chivo243 Silver badge
    Go

    When Automations Collide

    Back up takes much longer than on Day 1, instead of running from Midnight to 2am, it now runs from Midnight and finishes when it's ready, 4 or 5:30. Group Policy for WindowsUpdate set to pop at 4am. Random Virus scan decides to kick in at 4:30am. Cue MaxHeadroom!

  8. Fabrizio
    Devil

    Root cause found and eliminated

    rounding out his tale with news that the chap who wrote the script had his cron privileges revoked.

    Excellent RCA and solution!

  9. Jou (Mxyzptlk) Silver badge

    16384 times is no problem

    The problem is missing throttling. Even my powershell scripts, at least those which need it, watch CPU usage and RAM usage to avoid running into a race condition.

    The solution would be hacking the script. The simplest would be to add if ([int]($counter/8)*8 -eq $counter) {sleep 5}, a five or ten second delay on every eighth run. Enough to keep the system alive, but it will still finish the job within three or six hours. Another variant: Only start jobs when the second digit of the local time is "5", but that is less controlled.

    If you are evil you insert a 30 second delay or only execute when the seconds are "30", so it will be noticed since it takes too long. And don't forget to NOT document your change, you need a chance to be the hero to fix the fix :D, and blame it on "race condition".
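
    The every-eighth-run delay and the "three or six hours" estimate can be checked with a short Python sketch. This is not the PowerShell from the comment, only the same idea; run_throttled and its parameters are invented for illustration, and the sleep is swapped for a no-op so the dry run finishes instantly.

```python
import time


def run_throttled(total_runs: int, job, every: int = 8, pause_s: float = 5.0,
                  sleep=time.sleep) -> int:
    """Call job(i) total_runs times, pausing after every `every`-th run.

    Returns the number of pauses taken.
    """
    pauses = 0
    for counter in range(1, total_runs + 1):
        job(counter)
        if counter % every == 0:       # true on every multiple of `every`
            sleep(pause_s)
            pauses += 1
    return pauses


# Dry run with a no-op job and a fake sleep so the example finishes instantly.
pauses = run_throttled(16384, job=lambda i: None, sleep=lambda s: None)
print(pauses)                          # 2048
print(pauses * 5 / 3600)               # hours added by 5-second pauses
print(pauses * 10 / 3600)              # hours added by 10-second pauses
```

    2048 pauses come to a little under three hours at five seconds each and a little under six at ten, before counting the time the jobs themselves take.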

  10. Mayday
    Terminator

    Automation is great, however

    Shit in (ie the meatbag input) = shit out.

  11. RJX

    Had automation delete the entire Accounting department. Twice. In two days. And...

    Reviewing my overnight security alerts before heading into work showed that every account in the Accounting Department was deleted at 7:15 AM. Odd. I went into work and asked my team members what happened. Blank stares as they scrambled to check the alerts they ignored. The security manager chuckled when he came in and said "Apparently we don't need an Accounting department."

    The sysadmins, being sysadmins, recovered the accounts from the AD Dumpster, pronounced it a "glitch" that had never happened before so it would not happen again and they would do nothing else to investigate.

    Yup, the next morning at 7:15 AM the entire Accounting Department got deleted again.

    Now everyone is taking it seriously. Now.

    At a meeting that afternoon they said the mass deletion was caused by a script they wrote to sync Oracle HR with AD, a script that ran at 7:15 AM. The script erroneously assumed that if a department did not have a manager then the department did not exist. So rather than just disabling the accounts it deleted them.

    Turned out the Accounting Manager had gone on medical leave so he was removed as the department manager, leaving that position blank in Oracle HR.

    Everyone in the meeting thought it was good that they had found the problem and prepared to leave because it really was no big deal, right?

    Until I said "Wow, it's a good thing the CEO didn't go on medical leave, right? That script might have deleted every account in the company!"

    Heads turned, faces got a shocked look, and that's when the sysadmin manager admitted that was exactly what would have happened.
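
    For what it's worth, a hedged Python sketch of the two guards that sync script was missing: a vacant manager post does not mean the department is gone, and a run that wants to touch a large share of accounts should stop and ask a human. The real script synced Oracle HR with AD and is not shown here; plan_sync, the dictionaries and the 5% threshold are all assumptions for illustration.

```python
def plan_sync(hr_departments: dict, ad_accounts: dict, max_fraction: float = 0.05):
    """Return the accounts a sync run would disable, or raise if too many.

    hr_departments maps department name -> manager name ("" if vacant).
    ad_accounts maps account name -> department name.
    """
    # A vacant manager post does not make the department disappear.
    known = set(hr_departments)
    to_disable = sorted(acct for acct, dept in ad_accounts.items()
                        if dept not in known)

    # Circuit breaker: refuse bulk changes that look like a data problem.
    if ad_accounts and len(to_disable) / len(ad_accounts) > max_fraction:
        raise RuntimeError(
            f"refusing to disable {len(to_disable)} of {len(ad_accounts)} accounts")
    return to_disable                  # a list to disable and review, not to delete


hr = {"Accounting": "", "IT": "Sam"}   # hypothetical data: Accounting manager on leave
ad = {"alice": "Accounting", "bob": "IT"}
print(plan_sync(hr, ad))               # []
```

    With the whole department missing from the HR data, the same call raises instead of quietly working through the list.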

    ---------------

    Then they did a similar thing years later. They wrote a script to delete ex-employee Exchange mailboxes 90 days after the employees left the company. It worked great, until it didn't.

    We were in a legal battle and there was a court order to preserve certain mailboxes for legal discovery. Some of the mailboxes were for departed senior managers. Their script dutifully deleted their mailboxes after 90 days because everyone forgot about it. Then the archive purge killed the backups some months later.

    Then we were ordered to produce those mailboxes and Whoops! Their little automation script caused a massive legal problem with the court, you know, destruction of evidence...
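
    The missing check is small. A minimal Python sketch, assuming the purge job can be handed a list of mailboxes under legal hold: mailboxes_to_purge, the mailbox names and the dates are hypothetical, and this is not how their Exchange script actually worked.

```python
from datetime import date, timedelta


def mailboxes_to_purge(departed: dict, legal_hold: set, today: date,
                       retention_days: int = 90) -> list:
    """Return departed users' mailboxes past retention and not on legal hold.

    departed maps mailbox name -> leaving date.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(name for name, left_on in departed.items()
                  if left_on <= cutoff and name not in legal_hold)


# Hypothetical mailboxes; both owners left on the same day.
departed = {"old.director": date(2020, 1, 1), "temp.clerk": date(2020, 1, 1)}
hold = {"old.director"}                # hypothetical court-ordered hold
print(mailboxes_to_purge(departed, hold, today=date(2020, 6, 1)))
# ['temp.clerk']
```

    The same hold list would also need to reach whatever purges the archives, since that is what finished off the backups.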

    Finally, after that disaster, senior managers ordered the sysadmins to put EVERY script they used into the Source Code system where they needed to check scripts out to use them or to modify them. The sysadmins whined and they got told that all code run on production systems was programming and now needed explicit manager approval via a Change Control ticket, so they made it worse by complaining.

  12. Plest Silver badge
    Pint

    Been doing it for decades

    I got interested in automated software when I was about 10 years old back in the 1980s playing on the family Dragon32 micro. I realised one day that I couldn't just accept that you had to run something once, wait, react and then proceed to the next thing manually. Around 1987 my Dad bought the first family PC and I learned how to code TSRs in assembly (for you whipper-snappers that's Terminate-and-Stay-Resident). It's like forcing DOS to run pseudo-multithreaded code; it's not actually, but it's the simplest way to explain it. You could put software in the background and call it to do things when you needed stuff doing, like a housekeeper on a PC! Wow! Then we got OS/2 with multi-threading and later on Windows that could multi-task.

    That's been my entire IT career, coding automation systems and it's incredible how many people think they can get it but don't even think of the most obvious things. Coding automation systems takes a lot of empathy and a shit load of trust in yourself!! You won't be there when X runs and X could run this, that, trigger 17 other things to happen none of which will ever be seen and it has to be constantly aware of all the resources it will consume while no one is watching it. While you're coding automation you also have to code in housekeeping routines that will always run alongside, they'll track what you're doing, report if they can fix it themselves. What happens when the alerting system is not there? How will it scream and shout for help if no one can see it or hear it screaming? How can you restart it? Anyone who loves automation and has worked it for years knows all the most stupid things you can do like the original story above, over-zealous logging has bit us all in the arse at some point!

    It's why I love coding automation systems, it's basically like having your young kids at work, you need to have faith you taught them well enough and you have to just trust they'll be OK and won't do something stupid!!
