OK brainiacs, we've got an IT cold case for you: Fatal disk errors on an Amiga 4000 with 600MB external SCSI unless the clock app is... just so

Welcome to an unusual entry in The Register's On Call, where an Amiga mystery is never fully explained after the call for help is issued. Can you solve the mystery? Today's tale takes us back nearly a quarter century to a small development house, working on programs for the Commodore Amiga in the wake of the former home …

  1. Dan 55 Silver badge
    Holmes

    I'm not sure why it happened

    But was the boss called Denise and was Agnus particularly overweight?

    1. Giles C Silver badge

      Re: I'm not sure why it happened

      It could have been Alice instead of Agnus and the boss would then be Lisa

      1. BebopWeBop

        Re: I'm not sure why it happened

        Bob was under the table?

      2. Wally Dug

        Re: I'm not sure why it happened

        No, Dan was right - it was definitely Agnus and when she put on weight she became Fat Agnus. But did that stop her? Oh no - she became Fatter Agnus after that.

        1. Anonymous Coward
          Anonymous Coward

          Re: I'm not sure why it happened

          "...she became Fatter Agnus after that."

          The more the merrier!

          (For those not getting these jokes, take a look at https://en.wikipedia.org/wiki/MOS_Technology_Agnus.)

          1. Wzrd1 Silver badge

            Re: I'm not sure why it happened

            (For those not getting these jokes, take a look at https://en.wikipedia.org/wiki/MOS_Technology_Agnus.)

            Yes, but Agnus had her resident awful replaced with Alice.

            I'll just get my coat...

  2. JimboSmith Silver badge

    Sounds like HEX not working if you move the FTB* to me. Are we sure Paula isn't really called Ponder?

    *Fluffy Teddy Bear

    1. Michael H.F. Wilkinson Silver badge

      +++ Out of cheese error +++

      +++ Reinstall Universe +++

      +++ Redo from Start +++

      1. JimboSmith Silver badge

        +++ Mine! Waah! +++

      2. Anonymous Coward
        Anonymous Coward

        I think we need to block the road from Start before Redo can get here.

  3. Kevin Fairhurst

    That reminds me of...

    I had an Amiga 500 with a trap-door memory card upgrade to bring it up to a whopping 1MB of RAM... I took the machine apart to cut a trace on the motherboard & solder two other pads together to enable this to be seen as "chip" memory rather than "fast" memory, and after putting it all back together was surprised that it seemed to be working! (I must have been 14yo and had my parents known I'd done something like this I'm sure I would have been in big trouble!)

    After a short time I started experiencing random crashes. I couldn't pin it down exactly but they became more frequent when my dad was doing his chair aerobics in the next room (which was the style at the time), or if the washing machine was on. I simply couldn't understand why the noise of something like this could cause the crashes...

    And then I realised that the memory card wasn't seated properly. You needed to push it in with so much force you thought you were going to snap it, and then a little bit harder than that, for it to really push home. Although it seemed to work without being fully pushed home, the vibrations coming through the floor were just enough to cause the card to disconnect at some point, causing the crash. Thankfully it wasn't as a result of my cack-handed attempts at soldering ;)

  4. Anonymous Coward
    Anonymous Coward

    The real mystery is how Paula discovered the clock work around ...

    ... given how sensitive it was to any variation.

    Also, it's 'Angus', not 'Agnus', isn't it?

    1. Graham Dawson Silver badge

      Re: The real mystery is how Paula discovered the clock work around ...

      No, Agnus. The OCS chips were all female.

      1. Degenerate Scumbag

        Re: The real mystery is how Paula discovered the clock work around ...

        Yep, just ask my aunt Gary.

        1. Alan W. Rateliff, II

          Re: The real mystery is how Paula discovered the clock work around ...

          Not certain Gary is considered part of the "custom chipset." You have the trinity of Paula, Agnus, and Denise.

        2. ThomH

          Re: The real mystery is how Paula discovered the clock work around ...

          Gary wasn't in the Amiga 1000, so I guess it depends how strict you want to be with the 'O' in OCS.

      2. Anonymous Coward
        Anonymous Coward

        Re: The real mystery is how Paula discovered the clock work around ...

        And "agnus" is the Latin for a lamb, whereas we're not talking about Angus and sheep.

        1. Graham Dawson Silver badge

          Re: The real mystery is how Paula discovered the clock work around ...

          Might as well be blitted for a sheep as for a lamb...

      3. elitejedimaster

        Re: The real mystery is how Paula discovered the clock work around ...

        Colonel Agnus?

        I’ll see myself out...

      4. Anonymous Coward
        Anonymous Coward

        Re: The real mystery is how Paula discovered the clock work around ...

        Belatedly I have to point out that the gender of agnus is masculine. I think someone couldn't spell Agnes.

    2. oiseau
      Happy

      Re: The real mystery is how Paula discovered the clock work around ...

      Also, it's 'Angus', not 'Agnus', isn't it?

      I thought so too but ...

      ElReg is always usually right.

      It's Agnus.

      Have a Happy Easter. =-)

      O.

      1. Frederic Bloggs
        Coat

        Re: The real mystery is how Paula discovered the clock work around ...

        And there was me thinking that the surname might be "Dei"...

        Yes, yes, I'm out the door already.

        1. Paul Herber Silver badge

          Re: The real mystery is how Paula discovered the clock work around ...

          Bah!

        2. BebopWeBop
          Coat

          Re: The real mystery is how Paula discovered the clock work around ...

          A good friday to be out (or so I am told).

        3. Doctor Syntax Silver badge

          Re: The real mystery is how Paula discovered the clock work around ...

          And suitably or subtly - seasonal.

        4. Hubert Cumberdale

          Re: The real mystery is how Paula discovered the clock work around ...

          For reasons obscure I was once temporarily the conductor of a small church choir (a semiconductor, if you will). On one occasion, one of the less experienced choristers misread "Agnus Dei" as "Angus Dei". I pointed out that would make it the Beef of God rather than the Lamb...

          1. TSM

            Re: The real mystery is how Paula discovered the clock work around ...

            We had a (recent) song called Agnus Dei on our church roster for a while. Except for some reason when our pastor at the time downloaded the PDF from SongSelect, it didn't put the title on like it normally does. So he helpfully wrote it in as Angus Dei for us, which annoyed me every time I saw it. Fortunately it's not in our current rotation any more :)

          2. David 132 Silver badge
            Happy

            Re: The real mystery is how Paula discovered the clock work around ...

            Replying reeeeeeeeeeeally late to this thread (because it was referenced from a July 2022 "On Call" story). Greetings to you all from the strange and futuristic world of 2022. But I digress.

            Your Agnus Dei story reminds me of my favourite classical music anecdote.

            There is, as some fules kno, a choral piece by the 16th-Century English composer Thomas Tallis named "Spem in Alium", which is Latin for "Hope in Any Other". It's a beautiful piece, and frequently performed by choral societies.

            ...Who, distressingly often, mis-spell it as "Spem in Allium".

            Which would translate as "Hope in the Onion"! So if you see any confused musicians wandering round worshipping shallots...

        5. Boork!

          Re: The real mystery is how Paula discovered the clock work around ...

          I'm not sure what Dei has to do with it, since the problem appears to be due to an Agnus tic.

          1. Anonymous Coward
            Thumb Up

            Re: The real mystery is how Paula discovered the clock work around ...

            That was brilliant, but may not be fully appreciated here.

    3. Anonymous Coward
      Holmes

      Re: The real mystery is how Paula discovered the clock work around ...

      Putting on my user hat (beret in this case) I suspect that Agnus always had the clock running there until the day she didn't and the drive stopped working. She noticed that the only thing different was the missing clock and using her user logic figured the scuzzy thingie needed it for some reason.

      Putting on my programmer hat (snap brim fedora) while I never worked with Amiga I have worked with SCSI and finicky doesn't even begin to describe them.

      1. Anonymous Coward
        Anonymous Coward

        Re: The real mystery is how Paula discovered the clock work around ...

        "I have worked with SCSI and finicky doesn't even begin to describe them."

        In my experience, ensuring a SCSI bus was correctly terminated addressed many finicky issues. The real surprise was the number of SCSI environments that worked while incorrectly terminated.

        1. J. Cook Silver badge
          Go

          Re: The real mystery is how Paula discovered the clock work around ...

          ... and a number of them would break when the termination error got fixed by someone trying to be helpful. :)

          Based on what was described in the article, I'm going to suppose it was a very, very specific timing issue with the various I/O buses and the SCSI card.

          SCSI was and still is considered voodoo if you are putting gear from different vendors on the same bus. Thankfully, we are largely past that in this modern day and age.

          1. A-nonCoward
            Coat

            Re: The real mystery is how Paula discovered the clock work around ...

            different vendors on the same bus

            reminds me of a lawyer's joke. Probably works also with vendors. What do you call a bus full of vendors falling off a bridge?

            1. Dvon of Edzore

              Re: The real mystery is how Paula discovered the clock work around ...

              A HAZMAT incident.

            2. Hero Protagonist
              Trollface

              Re: The real mystery is how Paula discovered the clock work around ...

              A good start?

            3. David McCoy

              Re: The real mystery is how Paula discovered the clock work around ...

              Simon's Revenge

          2. Dan 55 Silver badge
            Holmes

            Re: The real mystery is how Paula discovered the clock work around ...

            Based on what was described in the article, I'm going to suppose it was a very, very specific timing issue with the various I/O buses and the SCSI card.

            I think you might be on to something, especially with a Rev 9 A4000 which had a broken Buster chip which stopped many Zorro III external SCSI boards working, an old A2000 Zorro II external SCSI board which at least did work, and a software patch to improve transfer rates with old Zorro II boards.

            GVP A4008 SCSI Controller

            What should have been a reasonably simple case of just going out and buying a SCSI controller for my A4000 turned out to be a rather more complicated process than I had originally foreseen.

            The only two SCSI cards I was aware of for the Amiga 4000 were Commodore's (now DKB's) A4091 and the Fastlane Z3. I'd read good reviews of the Fastlane board, and knew it was already in the shops, however prices were in the range of US$599 or STG#399 --- slightly more than I was prepared to spend just to get a CDROM attached! The A4091 board seemed to be in very short supply, but was significantly cheaper. Both of these cards were Zorro-III (A3000/A4000 only) SCSI-II controller cards.

            Things started to get tricky when I discovered that early models of the A4000 were shipped with a broken Buster chip which prevented many Zorro-III boards from working correctly. The revision which had this problem was `Rev 9', and sure enough this was in my machine. To make matters worse, my Buster was surface mounted, so despite the fact that Commodore were aware of this problem, and were distributing new Busters with the A4091 card, my A4000 was too old and didn't have a socketed chip that could be easily replaced. The Fastlane card, on the other hand, was smart. It knew about the broken Busters and had a work-around to compensate. Performance wouldn't be quite as good as with a fully functional Buster, but the card would still work well.

            All of this meant that the A4091 just wasn't an option for me. The Fastlane Z3 card would work perfectly, but it was too expensive. I'd have to look for a Zorro-II SCSI-I card for the A2000 which would hopefully still work in an A4000.

            The problem with old Zorro-II cards for the A2000 is that the Amiga 4000's 32-bit RAM is outside of the 24-bit DMA-able address space which these controller cards can see, so data can't be transferred directly from the SCSI device into main memory. This means the CPU ends up dealing with requests and individually copying a few bytes at a time to and from 32-bit memory. The transfer rates are abysmal.

            One fix for this is a Shareware program by Barry McConnell called DMAfix which patches some DOS library calls to do the CPU copies with larger buffers. This improves performance significantly. It works fine with cards like the A2091, but this whole scenario seemed quite unappealing to me.

            I can imagine that such a software fix could be quite timing-sensitive and crashy.
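For anyone who wants the 24-bit DMA limit quoted above in concrete terms, here is a minimal Python sketch. It is not how DMAfix or any real Amiga driver is written; the function names, the example addresses and the buffer sizes are all assumptions made for illustration. It only shows why a controller that can address 16 MB needs a CPU copy to reach memory above that, and why larger copy buffers mean fewer round trips.

```python
ZORRO2_DMA_LIMIT = 1 << 24  # 24-bit address space = 16 MB


def dma_reachable(address: int, length: int) -> bool:
    """True if the whole buffer lies inside the 24-bit DMA window."""
    return 0 <= address and address + length <= ZORRO2_DMA_LIMIT


def cpu_copy_rounds(total_bytes: int, buffer_bytes: int) -> int:
    """Number of bounce-buffer round trips needed (ceiling division)."""
    if buffer_bytes <= 0:
        raise ValueError("buffer_bytes must be positive")
    return -(-total_bytes // buffer_bytes)


def transfer(data: bytes, dest_address: int, buffer_bytes: int = 16 * 1024):
    """Return (bytes delivered, path taken, CPU copy rounds).

    buffer_bytes is an illustrative default, not a value from any real driver.
    """
    if dma_reachable(dest_address, len(data)):
        return data, "direct DMA", 0
    delivered = bytearray()
    for offset in range(0, len(data), buffer_bytes):
        # The controller can only DMA into a buffer it can address...
        bounce = data[offset:offset + buffer_bytes]
        # ...so the CPU copies each chunk on to the 32-bit destination.
        delivered += bounce
    return bytes(delivered), "bounce buffer", cpu_copy_rounds(len(data), buffer_bytes)


if __name__ == "__main__":
    payload = bytes(range(256)) * 256       # 64 KB of test data
    print(transfer(payload, 0x00200000)[1:])  # hypothetical address below 16 MB
    print(transfer(payload, 0x07800000)[1:])  # hypothetical address above 16 MB
```

With a 64 KB transfer, a 512-byte buffer needs 128 CPU copy rounds where a 16 KB buffer needs 4, which is the kind of difference a larger-buffer patch would be after.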

            1. Anonymous Coward
              Anonymous Coward

              Re: The real mystery is how Paula discovered the clock work around ...

              Well the Super Buster chip looks like a potential guilty party:

              Buster

              The Rev 9 Buster has a few flaws. The primary flaw, and the main reason the part was revised, is that the Zorro-3 bus arbiter can jam under the right conditions. Some DMA cards, like FastLane Z3, use a workaround for this (they avoid the lockup condition), others don't, and will lock up when used with a Rev 9 Buster. There is also a potential problem with end-of-cycle synchronization in the Rev 9 part. Some Zorro-3 cards will demonstrate this problem, some won't. This is made worse by the STERM* sampling problem on the Rev 3.0 A3640. A final problem with Rev 9 Buster was introduced by the A4000 architecture. The integrated bus buffer, Bridgette, used in the A4000 can't quite guarantee the propagation times required by the Rev 9 Buster design (done before Bridgette was proposed). In the typical case it works fine, in the worst case some Zorro-3 cards will have a problem with this condition.

              We have:

              Rev 9: bus lockups during DMA

              Rev 11: resolves timing issues for single bus master for A4000's but requires 25MHz CPU for A3000's

              So that leaves the question of why the clock would potentially resolve the issue - maybe a read of the hardware clock followed by the delay in writing X bytes to certain positions in VRAM paused the DMA long enough to avoid bus issues? Or the clock accesses the bus and handles acquiring/releasing the bus cleanly and at regular intervals so as to address the limits with Super Buster?

              1. Dan 55 Silver badge
                Holmes

                Re: The real mystery is how Paula discovered the clock work around ...

                Right at the end of that page you've got this:

                A final Zorro-3 problem exists on some cards, including the A4091s from Commodore, though not necessarily DKB (eg, I don't know). Originally, there were a couple of ways for a Zorro-3 card to terminate a bus cycle. It could give the bus back during its last cycle or after its last cycle. This former mechanism can cause some problems, including bus lockups, when multiple masters are present. So I only recommend the latter mechanism -- the card runs its last cycle, then unregisters the bus. This takes longer, but it's safe. This is only an issue when multiple bus mastering Zorro cards are working together.

                The A4091 is Commodore's official external SCSI expansion card and even with a Rev 11 Buster there are problems with it.

                So if you wanted an external SCSI on your A4000 it seems your options are this:

                - Rev 9 irreplaceable Buster: Expensive Fastlane card out of many people's price range.

                - Rev 9 irreplaceable Buster: Zorro II card and a shareware software patch (problems?).

                - Rev 9 irreplaceable Buster: GVP A4008 which is a reworked Zorro II card with a built-in software patch (we assume it is reliable).

                - Rev 9 replaceable Buster: Official A4091 card supplied with Buster chip update to Rev 11 (still scope for problems as mentioned above).

                - Rev 11 Buster: Official A4091 card (still scope for problems as mentioned above).

                - Rev 11 Buster: GVP A4008 as mentioned above (we assume it is reliable).

                - Rev 11 Buster: Cheaper Zorro II card plus a software patch (solution may have problems?)

                - Rev 11 Buster: Other Zorro III SCSI cards (reliability unknown but let's assume they are reliable unless it's a first revision).

                So the chances are that the A4000 in this story had a solution which wasn't reliable and sticking the clock in the corner altered chipset DMA timing or the CPU usage of a software fix so it worked.

                1. knightperson

                  Re: The real mystery is how Paula discovered the clock work around ...

                  Or the option I tried back in the day: find a local shop that can do surface-mount soldering to replace the buggy Buster chip with a better one. Sadly, what I actually found was a local shop who only CLAIMED they knew how to surface-mount. Brought the machine (Amiga 4000 desktop) back home, and it almost booted once, but never got any farther than that. A friend of mine, far better with a soldering iron than I am, looked at it for all of a second and said "that solder job looks like crap. I could have done better, and I don't even know what I'm doing!" A few unsuccessful negotiations with the store about defending their work later, and I stopped payment on the check. Immediately after that, of course, they were actually able to get ahold of the people who were unavailable when I walked into the shop, accused me of civil and criminal fraud, sent the job to a collection agency, sent several letters threatening small claims or even criminal court (but only threatening rather than acting, as I made it clear I was perfectly willing to defend my conduct before a judge and they never filed anything), and that "resolution pending" stayed on my credit report for the next seven years. I shipped the machine out to another business out-of-state that I had been avoiding because of a few bad reviews, but it came back from them fully functional for less cost than what I had briefly paid the local shop.

                  I learned many years later that the technician, who I never actually met, was of the opinion that this was beyond his skill and tools and he wasn't comfortable even attempting a surface-mount job with a soldering iron (done properly, it's done with a hot air gun) but the boss, who was the source of most of the threats and arguments against me, ordered him to do it anyway.

            2. Anonymous Coward
              Anonymous Coward

              Re: The real mystery is how Paula discovered the clock work around ...

              The thing with the GVP board mentioned by Dan 55 is that, counter to the 'DMAFix' tool being mentioned, the 'DMA fix' was already designed into the GVP driver back in 1990 (for any RAM targets located outside the lowest 16MB address range). Any mucking with the DMA mask actually hurt performance even more because the filesystem was being tasked to do the buffering, not the driver. DMAFix was needed for the A2091 and the Microbotics HardFrame DMA controllers. However, DMA Mask - it was a C= hack. Anything trying to use SCSI_Direct and transfer via DMA ignores filesystems, and therefore ignores the DMA mask.

              The ideal config with a GVP Series II (or the 4008, as it was also known) was to put 2MB of 16-bit RAM on the card. Z2 DMA would go to that board's 24-bit FastRAM (in a 16K buffer the driver allocated), and the CPU would then move it where it needed to go. DMA transfers to that 16-bit RAM were hidden, and the copy out by a 68030/68040/68060 was about as efficient as one can get.

              The second little 'bug' in Buster was the Zorro II DMA into ChipRAM glitch that could hang the bus depending on the CPU card present in the system. This was another thing that Buster 11 would fix. The GVP driver would also not DMA into ChipRAM for this reason (but the other popular Z2 boards would). The GVP would drop to PIO (very slow) if there was no other choice. The later GuruROM (3rd party) v6 has this same behavior, but has an override option in one of its tools to let DMA to ChipRAM happen if the bug was not encountered (and to test).

              I suspect the wacky story at the top of this (if remotely true) was anchored somewhere in the Buster / Zorro DMA murk.

          3. Wzrd1 Silver badge

            Re: The real mystery is how Paula discovered the clock work around ...

            SCSI was and still is considered voodoo if you are putting gear from different vendors on the same bus. Thankfully, we are largely past that in this modern day and age.

            Do you mean like times when something like a tape drive or CD drive would take off to the pub for a pint, holding down the entire SCSI bus until they returned and the damnable machine would finally boot?

            Never saw anything at all like that - I kept my eyes closed and counted at the computer.

            1. phuzz Silver badge

              Re: The real mystery is how Paula discovered the clock work around ...

              holding down the entire SCSI bus until they returned and the damnable machine would finally boot?

              This still happens today. Last weekend my computer started hanging for minutes at a time, and after a bit of troubleshooting, I narrowed it down to any attempt to read or write to a particular SSD*.

              So, a few days later, replacement SSD in hand, I power off my machine to install it (it had been powered off several times in between). It was at this point that the bad SSD decided to fail completely, and the machine refused to boot until I'd removed it.

              I suspect that if I'd been more patient it would have eventually booted after timing out.

              * I'm using StorageSpaces with tiering on Windows 10, which is completely unsupported, and I only have myself to blame

              1. Jou (Mxyzptlk) Silver badge

                Re: The real mystery is how Paula discovered the clock work around ...

                A bit late to comment: But it works so well!

          4. hmv

            Re: The real mystery is how Paula discovered the clock work around ...

            "Thankfully, we are largely past that in this modern day and age."

            To fill some of us old-timers with horror and loathing, it is worth pointing out that your USB storage is just the SCSI command set with a fancy new paint job. It also lives on in SAS.

        2. Brad Ackerman

          Re: The real mystery is how Paula discovered the clock work around ...

          You get the strangest looks from procurement when you ask for a dozen black goats, a silver knife, and a Dho-Nha summoning grid.

          1. David 132 Silver badge

            Re: The real mystery is how Paula discovered the clock work around ...

            Well yes. And a comment along the lines of "Those are reserved for purchase by the Legal dept."

      2. Anonymous Coward
        Anonymous Coward

        Re: The real mystery is how Paula discovered the clock work around ...

        "Putting on my user hat (beret in this case) I suspect that Agnus always had the clock running there until the day she didn't and the drive stopped working. She noticed that the only thing different was the missing clock and using her user logic figured the scuzzy thingie needed it for some reason."

        Nope. But Paula might have.

      3. juice

        Re: The real mystery is how Paula discovered the clock work around ...

        > Putting on my programmer hat (snap brim fedora) while I never worked with Amiga I have worked with SCSI and finicky doesn't even begin to describe them.

        Way back when, I was fortunate enough to be working somewhere which was having a wholesale purge of obsolete hardware. Mostly generic beige PCs, but there were a few more obscure and/or esoteric bits of kit being dumped into the corridor outside the office I worked in.

        Including a number of bits of SCSI gear. Notable bits I scavenged from this were:

        1) An internal CD drive, which used a caddy for its disks

        2) An external CD writer, which wasn't far off the size of the PC controlling it. A whopping 2x recording speed IIRC, too...

        3) A SCSI hard drive. But not just any hard drive; this was a Full Height 5.25" beast of a drive - basically the same as duct-taping two CD drives together!

        4) An ISA SCSI card. Which was handy, as otherwise, the rest of the haul would have been little more than paperweights!

        Surprisingly, this motley collection of hand-me-downs mostly[*] worked, though I did end up having to ring a US phone number to get some tech support, as initially, the internal CD drive would only copy files if you held the space-bar down.

        Thankfully, the slightly bemused voice at the other end of the phone was able to diagnose the issue as being an IRQ conflict. And since this Ancient Technology was built long before Plug and Play became a thing, said conflict was resolved (with the aid of a pair of tweezers) by manually shifting a couple of jumpers on the ISA SCSI card.

        Those were the days. For a given value of "those"...

        [*] I can recall having some successes with the CD writer, but these quickly tailed off and this bit of hardware got relegated to the role of an empty mug corral...

  5. jonathan keith

    Witchcraft

    obviously. Or 'technomancy' if you want to be all modern about it.

    1. Uncle Slacky Silver badge

      Re: Witchcraft

      Should have waved a dead chicken over it:

      http://www.catb.org/jargon/html/W/wave-a-dead-chicken.html

      Edit: On reflection, maybe a rain dance instead:

      http://www.catb.org/jargon/html/R/rain-dance.html

      1. robert lindsay
        Stop

        Re: Witchcraft

        that only worked on SGI boxes

        1. jake Silver badge

          Re: Witchcraft

          We were doing it with DEC boxen before SGI existed.

  6. Jonathan Richards 1
    Coat

    Memory-mapped video?

    I'm going so far out on a limb that you'll mistake me for a leaf... did the Amiga 4000 have memory-mapped video? I'm thinking of a mistargeted JSR in the disk handler that leaps into video memory and crashes, unless the pixels at a point in the clock app window give you a nice clean RTS. Or something. I'll get my coat...

    1. Anonymous Coward
      Anonymous Coward

      Re: Memory-mapped video?

      Wasn't pretty much everything of that era memory-mapped I/O?

      Never used an Amiga myself, but I was in my 20's when they came out, and they were impressive for their time.

      According to Wikipedia...

      "The Amiga 4000 system design was generally similar to the A3000's, but introduced the Advanced Graphics Architecture (AGA) chipset with enhanced graphics. The SCSI system from previous Amigas was replaced by the lower-cost Parallel ATA."

      https://en.wikipedia.org/wiki/Amiga_4000

      1. JohnGrantNineTiles

        Re: Memory-mapped video?

        "Wasn't pretty much everything of that era memory-mapped I/O?"

        As I recall it, the 680xx didn't have a separate I/O space, so yes.

        1. Anonymous Coward
          Anonymous Coward

          Re: Memory-mapped video?

          This reminds me of a design I once saw using the 68000 series (i.e. in the days before "proper" memory management) which went to enormous lengths to ensure that a random software fault could not accidentally exercise certain mission critical I/O. It was obviously much easier if potentially slower to reduce this risk if you were using a processor with a separate I/O subsystem.

          It is interesting how Moore's Law and software development creates new and trickier challenges in every generation. We didn't have to worry about evil actors getting into our browsers, we had to worry about actually configuring Ethernet cards to the point that they connected to something.

      2. Doctor Syntax Silver badge

        Re: Memory-mapped video?

        AFAICR the entire Motorola line, from the 6800 upwards was memory mapped. It was Intel, and derivatives such as Z80, and upwards that had, and still have, a separate peripheral map.
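For readers who never met memory-mapped I/O, a toy Python model of the idea discussed above may help. Everything in it is hypothetical: the `Bus` and `Latch` classes, the addresses and the sizes are invented for the sketch and do not reflect the real Amiga or 680x0 memory map. The point is only that there is one address space, and whether a write lands in RAM or in a device register depends purely on the address.

```python
class Latch:
    """A made-up peripheral with a few byte-wide registers."""

    def __init__(self, register_count: int):
        self.registers = [0] * register_count

    def write(self, offset: int, value: int) -> None:
        self.registers[offset] = value & 0xFF

    def read(self, offset: int) -> int:
        return self.registers[offset]


class Bus:
    """One address space: devices claim ranges, everything else is RAM."""

    def __init__(self, ram_size: int):
        self.ram = bytearray(ram_size)
        self.devices = []  # list of (start, end, device)

    def map_device(self, start: int, size: int, device) -> None:
        self.devices.append((start, start + size, device))

    def _find(self, address: int):
        for start, end, device in self.devices:
            if start <= address < end:
                return device, address - start
        return None, address

    def write(self, address: int, value: int) -> None:
        device, offset = self._find(address)
        if device is not None:
            device.write(offset, value)  # same "store" instruction, different effect
        else:
            self.ram[address] = value & 0xFF

    def read(self, address: int) -> int:
        device, offset = self._find(address)
        return device.read(offset) if device is not None else self.ram[address]


if __name__ == "__main__":
    bus = Bus(ram_size=256)
    latch = Latch(register_count=4)
    bus.map_device(start=0x80, size=4, device=latch)  # hypothetical address
    bus.write(0x10, 7)      # plain RAM
    bus.write(0x81, 0x55)   # lands in the device, not RAM
    print(bus.read(0x10), latch.registers)
```

This is also why a stray pointer is more dangerous on such a design: an ordinary write to the wrong address can poke hardware, which is the risk the 68000 design mentioned above went to such lengths to contain.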

    2. Dan 55 Silver badge

      Re: Memory-mapped video?

      Yes, but the chipset could only access some of the memory, so that area was called chip RAM (as opposed to fast RAM). Only memory accessible by the chipset could be used for video memory and DMA, it was also slower due to memory contention.

  7. Totally not a Cylon
    WTF?

    Reminds me of an ME upgrade.

    Many, many years ago I was upgrading some no-name clones to Win ME.....

    Had to turn off processor cache and keep my hand on the mouse, not moving it just resting on it. Take hand off mouse and instant installer crash.....

    For 4 hours!!!!!!, pentiums get slow when you turn off the internal cache...

    But it worked, system ran fine afterwards with all caches on.

    1. Blackjack Silver badge

      Re: Reminds me of an ME upgrade.

      And then two weeks later you had to "downgrade" them to Windows 98 SE or "upgrade" them to Windows 2000?

  8. BebopWeBop

    I have to admit that it is one of the strangest, repeatable faults I have heard of.

  9. John H Woods Silver badge

    My favourite timing bug

    About 15 years ago, a screenscraper returning empty handed except when debugging was on. Then it was fine. Lowering the debug verbosity made it fail again.

    Dev had accidentally coded the timeout for waiting on the Mainframe as 0 milliseconds. One line of Java logging, at sufficient verbosity, between the request and the attempt to read the response took a couple of msec to execute - and by then the MF had responded.
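The zero-timeout bug described above is easy to reproduce in miniature. The following is a hedged Python sketch, not the original Java: the `scrape` function, the simulated host and all the delays are invented for illustration, and the delays are exaggerated (tens of milliseconds rather than a couple) so the effect is reliable on any machine.

```python
import queue
import threading
import time


def scrape(log_delay_s: float, read_timeout_s: float = 0.0,
           host_latency_s: float = 0.05):
    """Send a 'request' to a simulated host, then try to read the reply.

    log_delay_s stands in for the time a verbose logging line takes.
    Returns the reply, or None if nothing had arrived by the time we read.
    """
    replies = queue.Queue()

    def host():
        time.sleep(host_latency_s)   # the host takes a moment to answer
        replies.put("SCREEN DATA")

    worker = threading.Thread(target=host)
    worker.start()                   # the request goes out
    time.sleep(log_delay_s)          # "logging" between request and read
    try:
        if read_timeout_s > 0:
            result = replies.get(timeout=read_timeout_s)
        else:
            result = replies.get_nowait()   # the accidental zero timeout
    except queue.Empty:
        result = None
    worker.join()
    return result


if __name__ == "__main__":
    print("debug off:", scrape(log_delay_s=0.0))
    print("debug on: ", scrape(log_delay_s=0.5))
    print("fixed:    ", scrape(log_delay_s=0.0, read_timeout_s=1.0))
```

With no logging delay the read comes back empty; with the delay it succeeds by accident; with a real timeout it succeeds regardless of logging, which is the actual fix.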

    1. Tom 7

      Re: My favourite timing bug

      That's one of coding's laws - the code will run fine in the debugger.

      1. Anonymous Coward
        Anonymous Coward

        Re: My favourite timing bug

        I once had to spend nearly a year debugging and fixing a prototype military system which had been delivered "tested" by a company which promptly shut down. Naturally my boss was rather perturbed that the three of us were spending so long on a "simple" job.

        I was able to show that the system could not possibly have worked with the debug code in (as delivered) because it was a real time system and the debug code increased execution times to the point at which code couldn't possibly fit into the available timing. Essentially each bit of code exercising each individual bit of hardware - unit tests - worked fine in the debugger so long as you didn't notice that something that needed to happen every 10 milliseconds actually took 20 ms to execute. Put it all together and debugging was impossible, and when the debug code was taken out many things still took too long- as could only be found by scope probing every single external signal.

        It was the best education I ever had.

      2. MrNigel

        Re: My favourite timing bug

        Reminds me of when I used to fault find TXE4 telephone exchanges in the early 80's using a 4 channel Tektronix oscilloscope. The final part of acceptance testing with the PO was a call load test. You would program a run of say 50,000 calls (depending on the size of the exchange) and you were allowed a very small failure rate. The tester used to print out the routing info for the failed calls (in BUMCLK or was it MUKBUL format?) and I got pretty good at finding a link between them.

        One such fault was down to a batch of cards in the SPU (or was it the B-switch?) that had a transistor with a specific YY/MM manufacture date. It was flip-flopping 'too slowly' which I proved by having two traces side by side with a good/bad card. Out with the soldering iron, replace it, then fill in a Form 308 and claim the time back from the STC factory in New Southgate. Millennials have no idea what the term "Job Satisfaction" really means....

    2. Robert Sneddon

      Re: My favourite timing bug

      Hardware -- Long time back I was building a multiprocessor system based on Transputers. It worked perfectly if I connected a logic analyser to the memory module's address and data buses. Disconnect the analyser probes and the software test suite glitched. After scratching my head for a couple of days on this very repeatable issue I figured out the very-high impedance and negligible capacitance of the logic analyser probes was juuuuust enough to shift the edges of the data signals into spec (and probably the address lines too but I didn't prove that). Some extra Vcc decoupling caps here and there and 1megaohm pull-down resistors on the memory device lines and everything was golden, plus some red-pen changes to the data sheets.

      1. JohnGrantNineTiles

        Re: My favourite timing bug

        It used to be quite usual to find a 20pF capacitor on a signal, and you knew that was one that in testing only worked when there was a 'scope probe on it.

        1. Bob Carter

          Re: My favourite timing bug

          I admit it, distinctly remember soldering a scope probe to the back of a test fixture to get it to work so that I could go home...

        2. Anonymous Coward
          Anonymous Coward

          Re: My favourite timing bug

          It used to be quite usual to find a 20pF capacitor on a signal...

          I should have tried that*. My first big computer (OSI 8DFP?, dual 8" floppies) stopped booting. It lost the -9V power to the RAM chips. Probing the LM723 power regulator chip's pins fixed it for several months. And several times. I replaced the LM723 and re-soldered some of the other close by parts. But it would fail again later.

          * It would have only worked if I had remembered which pin I had just touched. I was never quite sure which pin it was. And once it started working, it would work for a long time.

          1. whitepines
            Boffin

            Re: My favourite timing bug

            Probing the LM723 power regulator chip's pins fixed it for several months.

            My best guess is that you found a cracked solder joint, probably in some feedback circuit where current was almost non-existent. Physically touching it re-made the connection enough for it to work, but it would have worked loose again over time.

            You could have used a non-conductive stick to do the same thing and it would have "fixed" it just as well. Maddening failure mode, I've had a few in my career. Easy to keep in mind that with normal circuits if it's a signal margin error, removing the probe always causes the issue to reappear almost immediately.

        3. razorfishsl

          Re: My favourite timing bug

          We worked on an electronic design,... ordered all the components

          Enough for 50k pieces, started the production run & everything off the line would not work...

          Took it to the engineers' room, got a scope on them & they worked perfectly....

          The design was by a major semiconductor company.....

          They brought in their engineers & no one could fix it....

          In the end we told them to F*** off & shipped all the components back to them...

      2. scarper

        Re: My favourite timing bug

        Back in the 70's, I helped write a signal processing system on a little-known "high speed" (10 MHz) box. When our big pile of Assembler seemed to be bug-free, we removed the debugger, and the drum code promptly malfunctioned. (For the young: a drum is a head-per-track disk.) It turned out that commands to the drum controller were not interlocked: if you issued a command "too soon" after the previous command, bad things happened. And of course removing the debugger made the software faster.

        1. jake Silver badge

          Re: My favourite timing bug

          A 10MHz CPU in the '70s? That would have been a rare beast indeed!

          1. Anonymous Coward
            Anonymous Coward

            Re: My favourite timing bug

            ecl.

          2. Anonymous Coward
            Anonymous Coward

            Re: My favourite timing bug

            Possibly an Amdahl mainframe?

            1. MrNigel

              Re: My favourite timing bug

              When I worked at AT&T Philips in the mid-80's we had the second biggest Amdahl in the UK. The biggest was with the Royal Navy. Taxiiiiii!

              1. Peter Gathercole Silver badge

                Re: My favourite timing bug

                Malmesbury.

                Interesting thing was we were running MDF, the Multiple Domain Facility, and one of the "domains" (read VMs for the younger readers here) was a full blown emulation of a 5ESS telephone exchange!

                Even though it was a really expensive mainframe, emulating one of the large telephone exchanges that AT&T Philips Telecommunications (APT) were selling was still cheaper than building and running one of the actual exchanges.

                The systems all ran R&D Unix 5.2.5 or 5.2.6 (based on Amdahl UTS), which even though it was SVR2, had many SVR3 features before they made it into commercial releases, such as a paging virtual memory system, STREAMS and RFS.

                Just after I left, the EE was ported to multiple Sun 3/280s and eventually SPARCs running across Ethernet, running R&D Unix 5.4, built on top of Sun OS 4.03.

    3. A-nonCoward
      Angel

      another Re: My favourite timing bug

      14 years ago I wrote a brilliant paper regarding some work with a MCU. Turns out my whole presentation depended on a particular bug in the compiler. Once they fixed it (of course, just a few weeks before publication) my code was useless...

      sigh. To be young! (young-ish icon I could find)

  10. Chewi

    I do recall one Amiga tips and tricks book that included a section on upgrading the 500's 68000 CPU to a 68010. It explained that everything on Workbench was entirely compatible, except for Calculator, which would crash. So yeah, things were a bit precarious back then!

    1. Tom 7

      The 68010 tightened up on a couple of things to make it Popek and Goldberg compliant to allow proper secure virtualisation. This involved making MOVE from SR a privileged instruction, so anything that read the status register in user mode to check the carry bit would stop working. Fairly easy to fix though. Intel didn't get there till 2005!

  11. Version 1.0 Silver badge

    Just a guess, TTL timing?

    The screen would have been mapped to memory in those days and the DMA disk transfer would have been software, controlling hardware. Memory hardware is significantly affected by timing, and all the memory would have been dynamic RAM, which is critically affected by timing, so if the refresh pulse lags a few microseconds then a "1" bit might occasionally become a "0" - potentially placing the clock in a certain location could subtly affect the memory refresh for the DMA, causing occasional refresh errors when the dynamic memory bus was being pushed very hard in the disk transfer.

    As for how she discovered how to make it work - if you are not a geek then you just observe computers, so you just notice that it always works when the screen is set one way - not knowing about the internals doesn't confuse your brain.

    1. Jens Goerke

      Re: Just a guess, TTL timing?

      I once built a printer interface for a CP/M machine, using an 8 bit latch for data, routing the write-signal through some unused inverters to get a properly timed strobe-signal. Worked like a charm and reduced those two BIOS routines to 12 bytes in total.

    2. NullNix

      Re: Just a guess, TTL timing?

      Similar example on the C64, which came directly down to driving the DRAM out of spec. This one wasn't diagnosed until a few years ago, and was of course first shown as a demoscene scroller with appropriate freshly-composed music.

      1. Nick Ryan Silver badge

        Re: Just a guess, TTL timing?

        Wow. That's a bit of detective work, and ever so slightly out of every single C64's warranty period!

        1. Anonymous Coward
          Anonymous Coward

          Re: Just a guess, TTL timing?

          It suggests to me that the problem would go away if you chilled the critical components on the motherboard, which tends to speed up gates. You would want to speed up the address multiplexer and delay ~RAS slightly or leave it the same.

          It takes me back to debugging random timing problems with a can of freezer spray.

          Peltier effect devices and water cooling on a C64 would certainly be an interesting mod.

    3. shedied

      Re: Just a guess, TTL timing?

      At least she didn't insist that you place your left foot at a certain spot after putting the clock in the corner

  12. IGotOut Silver badge

    Oh how I miss those days...

    No, not the hardware, but when users would a) work out "fixes" for themselves and b) not cry at the first tiny niggle.

  13. vir

    Reminds me of the magic/more magic switch story:

    http://catb.org/esr/jargon/html/magic-story.html

    1. jake Silver badge

      Which reminds me of an AI koan ...

      A novice was trying to fix a broken Lisp machine by turning the power off and on.

      Knight, seeing what the student was doing, spoke sternly: “You cannot fix a machine by just power-cycling it with no understanding of what is going wrong.”

      Knight turned the machine off and on.

      The machine worked.

      1. Anonymous Coward
        Anonymous Coward

        Re: Which reminds me of an AI koan ...

        When I was a kid my parents could never work out how I could make the TV work by fiddling with things, and they couldn't. It is of course entirely down to knowing how to fiddle with the old drum tuners till eventually the contacts worked.

        1. Jon Bar

          Re: Which reminds me of an AI koan ...

          My wife used to handle computer support for a large consulting firm (which has since been absorbed, and shredded, by a misguided management of what used to be a good computer manufacturer). One time when she was called to solve the luser's problem everything worked fine for her, and continued to. The user, of course, asked what she did, to which she replied "It knows Mommy's here."

          1. phuzz Silver badge

            Re: Which reminds me of an AI koan ...

            The number of times a user has called me over to fix a problem, which then magically fixed itself as soon as I was stood there, is enormous.

            Possibly the user is taking things slower and more carefully when I'm watching, but sometimes I think it's just electronic fear.

  14. Pascal Monett Silver badge
    Coat

    I'm far from an expert, but I declare this unsolvable

    No possibility to replicate the issue, an incomprehensible link between a clock application and disk access, completely obsolete hardware that can hardly be found any more, it would take a genius - or a time traveler - to unravel this mystery.

    1. Version 1.0 Silver badge

      Re: I'm far from an expert, but I declare this unsolvable

      It's certainly "unsolvable" but I think it opens a few interesting avenues for discussion - and that's why we're both sitting here (self-isolated of course) reading El Reg every day. As I said earlier, I think that the possibility of a dynamic RAM timing error "might" explain it. I read the story and was scratching my head for a while before remembering the old issues designing and building dynamic RAM boards.

    2. This post has been deleted by its author

    3. Allan George Dyer
      Boffin

      Re: I'm far from an expert, but I declare this unsolvable

      @Pascal Monett - "I declare this unsolvable"

      I came to the comments looking for a detailed post from someone with the same problem who:

      a) successfully investigated it

      b) lists the code for the fix

      c) still has the fix running

      It's what usually happens around here...

    4. jake Silver badge

      Re: I'm far from an expert, but I declare this unsolvable

      I wouldn't go so far as to say there is no possibility of replicating it. Folks around here admit to having metric butt-loads of working obsolete hardware stashed about the place ... I'm pretty certain with a little effort a similar box could be cobbled together, which in turn might exhibit the same behavior. Follow that with a little old-school hacking, and we'd have an answer.

      If I had any of the relevant Amiga kit I'd volunteer it just out of curiosity.

      So unlikely to be solved, yes. But hardly impossible.

  15. Doctor Syntax Silver badge

    This set me wondering how much memory those NCR Tower Unix boxes had. I think the low end ones started at 4Mb with a mere 16 ports. A single user in the crayons dept clearly needs much more.

  16. SVV

    "I made the clock slightly smaller. Disk errors."

    Easy. If it fails when the clock is too big or too small it's obviously a C coding error when checking sizeof pointers.

    (the real answer is probably some shared memory corruption, so not actually that far away)

    1. Anonymous Coward
      Anonymous Coward

      Re: the real answer is probably some shared memory corruption

      DEADBEEF?

      Am I remembering right? One of the Amiga GURUs made a nifty utility that wrote "DEADBEEF" in hex to all newly allocated memory. And a different pattern to the areas just past your allocated storage.

      It's been a bit so I may be off on which was which. But I do remember if you saw "DEADBEEF" while debugging it was a big hint as to what you did wrong.

      1. SimD

        Re: the real answer is probably some shared memory corruption

        I think you’re talking about Mungwall

      2. MarkSitkowski

        Re: the real answer is probably some shared memory corruption

        IBM's AIX also did the DEADBEEF trick all through the 90's

        1. Peter Gathercole Silver badge

          Re: the real answer is probably some shared memory corruption

          IBM did this for more of their OSs than just AIX.

      3. This post has been deleted by its author

      4. Proton Wrangler

        feed me dead beef

        My recollection from 80's and 90's work with Unix and embedded stuff is that many libraries would write 0xDEADBEEF to free'd memory blocks and that the other was 0xFEEDFACE in newly allocated regions, and padding between data elements (at the end of an array or stack frame).

        I don't remember if the C library malloc() routine was spec'd to deliver zero-filled regions or not. Certainly the underlying Unix brk() call did not.
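
        For what it's worth, standard C leaves the contents of a malloc() block indeterminate; calloc() is the call that zero-fills. Below is a toy Python model of the fill-pattern trick as I described it above. The ToyHeap class and its bump allocator are purely illustrative assumptions, not how any real library did it, and which pattern meant what varied between tools:

```python
import struct

# Big-endian 32-bit fill patterns, as recalled in the comment above.
FRESH = struct.pack(">I", 0xFEEDFACE)   # written into newly allocated blocks
FREED = struct.pack(">I", 0xDEADBEEF)   # written into blocks after free


class ToyHeap:
    """Hypothetical bump allocator over a bytearray, for illustration only."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.next_free = 0

    def alloc(self, nbytes):
        nbytes = (nbytes + 3) & ~3          # round up to whole 32-bit words
        if self.next_free + nbytes > len(self.mem):
            raise MemoryError("toy heap exhausted")
        start = self.next_free
        self.mem[start:start + nbytes] = FRESH * (nbytes // 4)
        self.next_free += nbytes
        return (start, nbytes)

    def free(self, block):
        start, nbytes = block
        self.mem[start:start + nbytes] = FREED * (nbytes // 4)

    def read_word(self, addr):
        return struct.unpack_from(">I", self.mem, addr)[0]


heap = ToyHeap(64)
block = heap.alloc(6)                        # rounded up to 8 bytes
print(hex(heap.read_word(block[0])))         # uninitialised read
heap.free(block)
print(hex(heap.read_word(block[0])))         # use after free
```

        Seeing either pattern in a register or a struct field while debugging tells you straight away whether you read memory you never initialised or memory you had already given back.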

  17. Snaggy

    Well my teenage self would know this much better having messed around with the 1541 drives. I never used the more advanced drive mentioned here. My theory is that the clock was a badly written app that used an NMI to draw itself on the screen. This would interfere with the drive timing in a consistent fashion, based on the size of the clock. It would almost act as encryption. So a file written with a particular size of the clock would cause a pattern of rotational delay that needed to be matched every time it was read in the future. You could even set up varying permissions on different files with the clock being a specific size when you wrote each of them. The user could have noted that it worked only when they had the clock up without understanding why it was necessary. Make sense?

  18. LenG

    Rotten Apple

    I had an Apple IIe which stopped reading the left-hand floppy (I had two, sitting side by side on the top of the case) after I bought a new monitor. To cut a long story short it turned out that the monitor generated a magnetic lobe from the LH corner sufficient to screw up the less-than-robust Apple disk drive. Stacking the drives away from the monitor fixed the problem but was a pain with my less than ample desk space.

  19. Fastitocalon

    I've upgraded my Amiga 1200 with an 68030 50MHz accelerator, 32Mb of RAM, a 4Gb flash card hard drive, a wireless network card, USB and a DVI-I video output. It's amazing how you can still run old Amiga technology like this.

    Plus it has games like SWIV, Moonstone, Super Cars II, SWOS, Speedball II, The Chaos Engine, Alien Breed, Turrican II, Lemmings, Gods, Flashback, Another World, Cannon Fodder, Walker, Monkey Island, The Settlers, Worms, The First Samurai, F1GP, Stunt Car Racer, Mega Lo Mania, Pinball Dreams, Frontier, Syndicate, Eye of the Beholder, Dungeon Master, Lotus Turbo Challenge II, Robocop 3, Dune II, Lionheart, Leander, Fire & Ice, Stardust, Sim City, Civilization, Banshee, F/A-18 Interceptor, Knights of the Sky, Populous, Xenon II, Hired Guns, Damocles, Wings, Qwak, No Second Prize, Arcade Pool, Exile, Gloom, Alien Breed 3D, Switchblade, Switchblade II, North & South, Carrier Command, R-Type, Bubble Bobble, Desert Strike, Jetstrike, Rod-Land and Zeewolf.

    1. David 132 Silver badge
      Thumb Up

      I cheat and run my old Workbench desktop environment under WinUAE. The Sysinfo benchmark reacts hilariously when it's running on top of a modern CPU (the "your performance" comment reads "Phone me NOW!!!!!", when it finds that you somehow have a system that runs at 94x the speed of an A4000/40). I have many of the games you mention and a few of my old apps (I made the cover disk of Amiga Format - by golly, that made me feel smug for a long time). Although I am embarrassed to admit that a clock I wrote in '94 isn't Y2K compliant and reports this year as "120". Darn it.

  20. STrRedWolf

    DMA and Timing

    From what I can tell, this *sounds* like the drive in question was built for a different Amiga with different DMA timing. It was built specifically for one Amiga, but put in another... and the clock app was built such that it chewed up enough timer ticks at that right size to let the drive work properly.

  21. Anonymous South African Coward Bronze badge

    Baaah. Clock helps the nannydemons to walk in synch with the hard drive.

    No clock, or it is too big or small, then the nannydemons lose sync.

  22. James Haley 2

    Except the article states: "You've got to open the clock application and put it in the corner of the screen."

    The app would have to be very badly written to ruin disk writes while not running.

    1. Terry 6 Silver badge

      It's that many years since I wrote code (in my amateur way), But the mapped memory idea sounds reasonably compatible with identifying the solution, in as much as a specific clock location would map to a specific memory location.

      There were, if my memory serves me correctly, some clever/dodgy routines around in the 80s and maybe later that would "borrow" a little bit of the display's memory to store a byte or two. Reading and writing the value from there would free up a drop of memory. And a small corner would be the location of choice because no one would notice anything amiss there.

  23. Anonymous Coward
    Anonymous Coward

    600MB?

    That's a pretty huge disk for that time. My Amiga only had a 10MB SCSI drive, and that was only because I was best mates with the sales director of a certain hard disk company.

    1. David 132 Silver badge
      Happy

      Re: 600MB?

      I had an A590 sidecar system on my A500 - with a 20MB MFM drive. And I thought I was a GOD as I watched the Workbench icons flash instantly onto the screen and surveyed my vast tracts of disk space. COWER BEFORE ME, YE PUNY MORTALS!

    2. jake Silver badge

      Re: 600MB?

      By 1992, when the 4000 was released, 1Gig drives were readily available and selling in the $2,000 range.

      1. dajames

        Re: 600MB?

        By 1992, when the 4000 was released, 1Gig drives were readily available ...

        Indeed. I bought a shiny new Dell '486 box to play with the Beta of Windows NT 3.1, and that came with a 1.4 GB drive. NT 3.1 was released in 1993, so that was about the same time.

      2. l8gravely

        Re: 600MB?

        Sometime in the late 80s I had an Amiga 2500 (upgrade from my A1000) and I splurged and spent $800 on an 80mg 3.5" Quantum hard drive. Was really awesome to see how fast stuff loaded.

    3. IGotOut Silver badge

      Re: 600MB?

      I had a 1gb SCSI drive with my Atari Falcon.....and a 2x CD Writer. Think that was about a grand and the blank discs were around £15 each, with a fairly high failure rate.

    4. The Central Scrutinizer

      Re: 600MB?

      I ordered my 68030 A4000 with the 80 meg drive cos the 120 pushed the price up too much. When the guy from the computer shop rang to tell me it had arrived, he said that it had been shipped with the 120 meg drive at no extra cost. Happy days.

  24. ricardian

    Commodore Pet and 6502 assembler programming required THE handbook by the imaginatively named Raeto West.

    https://www.amazon.co.uk/Programming-Pet-Cbm-Raeto-West/dp/0942386043.

    Thanks to that I got two Pets talking to each other via IEEE488, each Pet thinking that it was the master thanks to a cobbled-together token system

    1. Anonymous Coward
      Anonymous Coward

      The PET that I bought a couple of years ago came with the West book. It's a brilliant description of the hardware and how to interact with it, which is proving invaluable as I try to create a MIDI interface for the machine.

  25. Anonymous Coward
    Anonymous Coward

    I used to call such circumventions "druidic rituals". You discovered by accident that a certain set of steps - in a particular order - overcame some problem. There was often no resource for a follow-up of "trial and error" to find the salient steps/order - and then the root cause.

    We had a newly released comms Front End Processor which had to be loaded from a large roll of papertape - followed by a large reel of patches. This proved to be a fraught process with many abortive attempts - until a ritual was found to work every time. Was it powering off/on peripherals in a particular order? Was it judicious pauses? Was it stepping on that particular false floor tile? All I can remember (it was the 1970s) was explaining to the site support person that this "druidic ritual" was essential to follow.

    1. swm

      In the 1960's whenever we had a hardware failure on the Dartmouth Time Sharing System the field engineer would run hardware diagnostics to try to locate the problem. He was never successful but after he ran the diagnostics everything worked perfectly.

      So every morning he ran hardware diagnostics. If you asked him what he was doing he said he was warding off evil spirits.

      One day while he was warding off evil spirits the diagnostic stopped on a solid hardware failure. But time sharing worked perfectly. Further inspection showed that the failure was in the "bit change zero" instruction that changed Friden Flexowriter codes to ASCII. It was about the only instruction we didn't use.

  26. Jeremy Allison

    The SCSI implementation on the Amiga was badly broken.

    Back in the day, working on porting my GEM implementation to the Amiga to port Kuma K-Data over from the Atari ST to the Amiga, we noticed that (and I can't remember the exact details now) if you dragged an icon around on the screen at the same time it was doing a SCSI disk transfer your disk got lots of nice copies of the icon bitmap written all over the places where your data should have been.

    The problem described in the article sounds horribly familiar.. :-).

    1. Nick Ryan Silver badge

      Re: The SCSI implementation on the Amiga was badly broken.

      There were a few different scsi.device implementations around and one was very well advised to use the latest stable version.

      The next issue was the filesystem driver which also needed to be updated to a version that coped better with different, generally faster, speeds of CPU.

      In many ways the Amiga OS's implementation of such things was very elegant, flexible and extendable. From memory, at the time, there were no other systems capable of reading and writing so many different devices and formats. Generally the only one that caused a lot of trouble was the MAC diskette because it used a variable rotation rate.

    2. ecofeco Silver badge

      Re: The SCSI implementation on the Amiga was badly broken.

      SCSI was never all that reliable on any machine.

      1. jake Silver badge

        Re: The SCSI implementation on the Amiga was badly broken.

        My Sun 3/470 "Pegasus" has 5 CDC WREN IV SCSI drives that have been running flawlessly pretty much non-stop since 1988. Seems pretty reliable to me.

        1. Dan 55 Silver badge

          Re: The SCSI implementation on the Amiga was badly broken.

          Slightly different price range, build quality, and target market though and probably didn't suffer from Commodore's legendary ability to not fix hardware bugs in a timely fashion but treat them as platform features (even though they owned their own fab).

          1. jake Silver badge

            Re: The SCSI implementation on the Amiga was badly broken.

            I was responding to ecofeco's blanket statement "SCSI was never all that reliable on any machine.". Twice.

            1. Dan 55 Silver badge

              Re: The SCSI implementation on the Amiga was badly broken.

              SCSI was only ever as good as the worst device on the chain.

        2. MarkSitkowski

          Re: The SCSI implementation on the Amiga was badly broken.

          Oh, you modern people!

          I have a Sun 3/60 with two external SCSI drives that are still performing flawlessly.

          Sun, obviously, got it right.

      2. Anonymous Coward
        Anonymous Coward

        Re: The SCSI implementation on the Amiga was badly broken.

        I had an IBM 200 meg SCSI drive attached to my Atari Mega STE, via an ASCI to SCSI converter.

        It worked great 99% of the time, but every so often, the first byte of the boot sector (which gave the offset to the first instruction of the boot code) would flip a bit and the computer would refuse to boot. I had to start it without the drive attached, attach the drive, reset the first byte to 00 with a sector editor, reset and reinstall autoboot...

        1. jake Silver badge

          Re: The SCSI implementation on the Amiga was badly broken.

          "via an ASCI to SCSI converter."

          Should I file that next to my EBCDIC to ESDI converter?

          1. Jou (Mxyzptlk) Silver badge

            Re: The SCSI implementation on the Amiga was badly broken.

            Atari Computer System Interface - But you are right, the previous comment swapped C/S :D.

  27. This post has been deleted by its author

  28. Anonymous Coward
    Anonymous Coward

    A mainframe operating system had been successfully running for a few years. Then a new model was added to the range. After completing engineering tests a version of the operating system was generated for the theoretically compatible prototype.

    Every Wednesday it would fail.

    It was traced to an instruction that mistakenly tested the day of week memory location. The mystery was why it was even going down that path. The answer was that each model's version of the O/S was given a definitive file name. IIRC the "AAGJ1000" name of the original working model's version became "AAKJ1000" for the new model. Another instruction was mistakenly testing a bit in the memory location containing the letter "G" ...but that always sent it down an expected tested code path. In the new model the "K" had that bit set opposite and off it went down an untested branch.
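
    A rough reconstruction of how the two stray tests could chain together, as a Python sketch. I no longer know the machine's character code or which bit was tested, so the ASCII encoding, the 0x04 mask and the weekday numbering (Monday = 0) below are assumptions picked only so that "G" and "K" differ in the tested bit:

```python
WEDNESDAY = 2          # assumed numbering: Monday = 0
STRAY_BIT = 0x04       # assumed mask; set in ASCII 'G' (0x47), clear in 'K' (0x4B)


def fails_today(os_file_name, weekday):
    """Return True if the toy O/S would fall over on the given weekday."""
    # Stray test no. 1: the code meant to test a flag elsewhere, but actually
    # looks at the byte holding the third letter of the O/S file name.
    if ord(os_file_name[2]) & STRAY_BIT:
        return False                 # the well-trodden, tested path
    # Stray test no. 2: only reachable on the untested path, and it
    # mistakenly reads the day-of-week location.
    return weekday == WEDNESDAY


for name in ("AAGJ1000", "AAKJ1000"):
    print(name, [fails_today(name, day) for day in range(7)])
```

    The second bug had been sitting there for years; it only needed the first one to start pointing at a "K".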

    1. shedied

      Don't have recurring nightmares?

      Waking up in the middle of the night, yelling It's the letter K, FFS!

  29. wub
    Pint

    Printer prevents disk corruption

    One of the systems that we bought when we first upgraded to 80286 hardware developed an interesting problem after a year or two. Hard to know when it began, but I'm pretty sure this was acquired behaviour [side note: your website dissed my American spelling]. The dot matrix printers we were using with these machines were a bit touchy about how one chose to advance the paper, and it was fairly easy to cause damage by just pulling the paper out by grabbing the top of the sheet and pulling it out.

    On one of the occasions when we had removed the printer for some TLC, we got a message from the user that their system was having trouble opening an important file. I wish I could reconstruct the whole story, but it has been a long time. What we finally figured out was that this particular 286 was only able to write to its hard drive correctly if there was a printer connected and plugged in. Disk reads were always fine. It did not matter if the printer was configured, or even turned on. It just had to be connected to the parallel port and plugged in. Without a printer, any attempt to write to disk just spewed gibberish - even if the system was, oh, updating a log file.

    I'm sure there was something involving a ground somewhere, but it wasn't worth digging any deeper. I had always thought of that situation as a bizarre and interesting way of pointing out just how complex a modern PC is, but this story of the clock on the screen has it beaten all hollow. Cheers!

    1. Steve K

      Re: Printer prevents disk corruption

      [side note: your website dissed my American spelling]

      Yes - appropriately for this article - there are still a few unfixed bugs in the US spelling dictionary....

    2. Elfoad Regfoad

      Re: Printer prevents disk corruption

      If you hadn't said the printer was attached to a parallel port, I was going to suggest that if the disk drive and printer were both SCSI, that possibly the SCSI hard drive wasn't properly terminated if the printer was removed.

  30. zb42

    slightly low voltage

    About 20 years ago, an acquaintance of mine had a story about his Amiga harddrive ceasing to work. It turned out that the power supply was producing 4.8 volts instead of 5.0 volts, apparently enough for the computer to boot but not for the harddrive to work.

  31. el_oscuro
    Coffee/keyboard

    Oracle, Windows, and keyboards

    About the same time, a user was having problems with Oracle*Forms on Windows. It would GPF all the time. I opened a service request and troubleshooted for a while. Then they asked what type of keyboard the user had. He had one of those Microsoft "Natural" keyboards, which apparently had some sort of conflict with Oracle*Forms.

    Several years later, I told that story to another consultant. She was "OMFG - we had that same issue too. But we never solved it - we tried reinstalling Windows, a new PC, everything. But the one thing that was in common was the user's keyboard."

  32. _LC_
    Pint

    This can be easily explained

    Back in the days when the Amiga was designed, CPUs didn't have a "memory management unit".

    Today, every program gets its own virtual address space. Therefore, every program can pretend that it is located at a fixed address. It cannot access the address space of other programs (processes). Those programs can believe they are located at the exact same address, and it doesn't matter, as they don't interfere.

    In order for programs to work together in a multi-tasking environment, unlike DOS where only one program could run at a time, they had to be "pc-relative". "pc" stands for program counter. It's a register of the processor that holds the address of the current instruction. When your program got loaded and executed, you had to access your data relative to the program counter (that is, relative to where in the address space your program was loaded).

    Say your program's start address was 0x0000 and the data was located at 0x0100. Now, if your program was loaded at address 0x2000, then your data was located at 0x2100 (for simplicity I used 16-bit addresses, though the 68k used 32-bit).

    By accessing your data relative to the pc register, your programs became independent of the memory locations they were loaded into.
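
    A rough sketch of that arithmetic, as a toy model in Python rather than real 68k code. The 0x0100 offset and the 0x2000 load address are the ones from the example above; the function names are made up purely for illustration.

```python
# Toy model of position-independent data access (not real 68k code).
# DATA_OFFSET and the load addresses come from the example above;
# the function names are hypothetical.

DATA_OFFSET = 0x0100  # where the data sits relative to the program's start


def absolute_access(load_address: int) -> int:
    """A program hard-wired for load address 0x0000 ignores where it really is."""
    return 0x0000 + DATA_OFFSET


def pc_relative_access(load_address: int) -> int:
    """A position-independent program adds the offset to its actual location."""
    return load_address + DATA_OFFSET


for load_address in (0x0000, 0x2000):
    print(
        f"loaded at {load_address:#06x}: "
        f"absolute -> {absolute_access(load_address):#06x}, "
        f"relative -> {pc_relative_access(load_address):#06x}"
    )
```

    Loaded at 0x0000 both versions agree; loaded anywhere else, only the relative one still points at the program's own data, while the absolute one points at whatever happens to live at the fixed address.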

    Some idiots never managed to program properly. Their programs were always accessing fixed addresses (most demos, for instance). If there was something else at those addresses - boom!

    As has already been stated, the SCSI implementation was broken. Likely, some idiot left a fixed address in somewhere. By running the clock first, you allocated a certain address space, guaranteeing that whatever came afterwards was pushed further up in memory.

    ;-)

    1. RichardBarrell

      Re: This can be easily explained

      PC relative addressing came back into vogue in the last decade or so because of address space layout randomisation :)

      1. _LC_
        Happy

        Re: This can be easily explained

        I don't mind it. The code was pretty tight on the 68k.

    2. Peter Gathercole Silver badge

      Re: This can be easily explained

      Virtual Address Spaces are much older than the Amiga and 68040s (I'm pretty certain that the 68030s and later had fully functional MMUs as part of the CPU).

      I'm just trying to remember the computer architecture course I taught in the mid 1980s, where I talked about the first workable virtual memory system, which has to include a virtual address space. Ah. Atlas at the University of Manchester in 1959.

      1. jake Silver badge

        Re: This can be easily explained

        I believe the Burroughs MCP was released a year before the Atlas Supervisor. Perhaps surprisingly (to the PEE CEE crowd, anyway) it is still in use, and under development. 19.0 was released last June.

  33. TeeCee Gold badge
    Coat

    So it only worked when there was a clock up?

    I'm sure I've heard that the opposite is more usually true.

    1. TimMaher Silver badge
      Coat

      A clock up what?

      Did I misread that?

      I’ll get your coat. It’s right next to mine.

      1. bpfh

        Re: A clock up what?

        No, you did not misread. It had to be a certain size in a certain position. So it’s not the size of the clock that’s important. It’s what you do with it...

  34. eionmac

    Mechanical Druidic Magic

    In times long gone, we reground to 'flat' (a very precise specification, in microns over a 5 metre length) a large 'bed plate' for a very large bed grinder (about 3m x 3m bed area), as the grinder 'developed errors' about once per month. There was much to and fro with the manufacturers, who paid for this under warranty. By coincidence, one day I was looking at moon-earth positions relative to the calendar, and suddenly realised the errors always and only occurred when the moon was closest to earth. I had work stopped on that machine whenever the moon was close to earth (perigee). It worked: no errors when working, and of course no errors when not working! Later, after investigation, we found the foundation for the machine was a 600-odd ton concrete pillar. When the moon was close, its gravity distorted the machine relative to the overhead machining arm supports, which were fixed independently to the factory floor outside the grinder base so that the many motor rotations and vibrations did not affect the grinding fixtures! We called this work routine 'this machine does not work when druids walk'.

    Memory is fragile, but big foundations do distort with gravity relative to surrounding shallow machine beds.

    Link: the words 'Druidic Magic' took me back to that problem of my early working days as a 'gentleman apprentice'.

  35. tea junkie

    Buster

    I'm going to guess it was a dodgy Buster chip. That controlled the Zorro III bus in the A4000, and was quite buggy. The earlier revision 9 Buster was pretty bad; revision 11 fixed a lot of faults.

  36. ecofeco Silver badge

    The root of the problem

    SCSI

    It was never all that reliable on any system. The most common problem was termination issues in the chain. This would throw disk errors all day long. And there was never any reliable or logical fix for it.

    My best guess is a certain signal was being sent when the clock was set "just so" that temporarily resolved it. But that was SCSI for you. Not even joking when I say getting them to work reliably was damn near witchcraft.

    Don't even get me started on ZIP drives.

    1. Grunchy Silver badge

      Re: The root of the problem

      ...and I was just about to gripe about LS120 super disks, too...

    2. Stuart Castle Silver badge

      Re: The root of the problem

      Used to support a lot of SCSI devices. Thankfully they’ve all been replaced. I had hours of “fun” sorting SCSI problems where the entire chain of devices would vanish and, after a lot of testing, I’d find the terminator had failed (or even been nicked on occasion), someone had decided to rewire the chain, or even a cable had failed.

      USB is almost as bad. Not because we have great chains of devices (although power and bandwidth permitting you could build a sizeable tree of usb devices using hubs), but because some of the cables are, frankly, crap and some of the devices play a bit fast and loose with the standard. I have a usb powerpack that has high capacity and charges quickly. The problem is that although the provided cables look like micro usb, it won’t charge properly if you don’t use the provided cables.

  37. razorfishsl

    Chances are this one was that the programmers disabled the NMIs or other system flags so the clock would draw perfectly & not break up due to interruptions.

    1. jake Silver badge

      It's not much of an NMI if software can disable it ...

  38. Raffaele

    When formatting your hard disk you had to set MASK to 0x7ffffffe or 0xfffffffe to avoid files being corrupted and to stop receiving lots of fault errors. ^_^
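
    For anyone wondering what such a mask does: as far as I understand it, the value marks which address bits the controller can cope with for direct transfers, and a buffer whose address has bits outside the mask gets copied through an intermediate buffer instead. A minimal sketch of that check in Python, where the helper name and the sample addresses are made up for illustration:

```python
# Sketch of an address-mask check. fits_mask() and the sample addresses
# are hypothetical; only the two mask values come from the comment above.

def fits_mask(address: int, mask: int) -> bool:
    """True if every set bit of the address is permitted by the mask."""
    return (address & mask) == address


for mask in (0x7FFFFFFE, 0xFFFFFFFE):
    for address in (0x07C00000, 0x07C00001, 0x80000000):
        verdict = "direct transfer" if fits_mask(address, mask) else "needs a bounce buffer"
        print(f"mask {mask:#010x}, address {address:#010x}: {verdict}")
```

    Both masks clear bit 0, so odd addresses never qualify; the 0x7ffffffe variant additionally rules out anything with the top address bit set.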

  39. Alan Brown Silver badge

    Duct tape

    "usually, we found out later, because the marketing manager had decided to be creative with the network topology in his office, messing up not just for him but everyone else,"

    Noooo, not on the network devices. For the manager - to "affix" him in place so he can't fiddle with anything.

  40. FlavioStanchina

    Whoa, that was a nasty gremlin

    Whoa, I did my fair share of stuff on the Amiga at the time, although I didn't own an A4000 -- the A3000 was far better, as we all know -- but never saw or heard anything like this. OK, the flawed Buster, SCSI termination, no memory protection (I'm pretty sure Paula wasn't running the Enforcer)... but taming disk errors by opening the clock?? And only in a certain position?!?

    My best guess is that she, being the graphics designer, also had an add-on graphics card -- Picasso comes to mind, but memory isn't serving me well, I don't even remember the name of the one I had. Both the hardware and the drivers for those beasts were of mixed quality. Interactions between the graphics, the SCSI controller and/or system timings would not surprise me, although I'd still be at a loss to explain why the clock fixed things enough to work.

    1. Jolyon Ralph

      Re: Whoa, that was a nasty gremlin

      Ok, "Agnus" here!

      Some more details, although my memory is a little sketchy after so long.

      I *think* the machine had both a Commodore A4091 SCSI card *and* a GVP Zorro II SCSI card in it, because the GVP card was maxed out with the extra (slow fast) ram because we were too cheap to get the proper ram to expand the A4000 (the card had been recycled out of another machine that it replaced)

      I can't remember which the external drive was connected to (I do remember it was a Fujitsu drive) but I suspect it was the CBM card.

      It was one of the early production A4091 cards sold to developers which was, to put it mildly, a pile of crap. I know there was a ROM upgrade but I don't know if it had been installed or not.

      1. Wally Dug

        Re: Whoa, that was a nasty gremlin

        Jolyon Ralph? The Jolyon Ralph?

        We're not worthy! We're not worthy!

        1. Jolyon Ralph

          Re: Whoa, that was a nasty gremlin

          No, just 'a' Jolyon Ralph. We come in six-packs now.

      2. ridley

        Re: Whoa, that was a nasty gremlin

        Wow, now Agnus or should I say Jolyon I remember you well, we met a few times at shows and I believe a developer conference.

        Now that takes me back.

        1. Jolyon Ralph

          Re: Whoa, that was a nasty gremlin

          Those were the days!

  41. Anonymous Coward
    Anonymous Coward

    DMA issues with the A3000 as well

    The Amiga 3000 included an onboard Western Digital 33C93 SCSI controller chip as opposed to the Amiga 4000's onboard IDE controller. Problem was that the 33C93 included a DMA bug that could lock up the machine under the right circumstances. I rarely triggered it while running AmigaOS, but I hit it all the time running NetBSD.

    I worked around the problem by installing a Zorro II SCSI card, but it was terribly slow since ZII DMA was disabled on the ZIII bus. Eventually WD released a bugfixed 33C93A chip that resolved the corruption problem, but I had moved on to using an A4000 by that point.

    I was lucky in that my Amiga 4000 included an r11 Buster, so eventually I fitted it with a DKB 4091. I also tried adding an A3640 processor card, but apparently it was too much for the system and it completely fried after a couple of hours.

    So I pulled out my old A3000 and then transferred the r11 Buster, 4091, and Cybervision 64 to it. It lived on a few more years mostly running Shapeshifter (Mac emulator) running System 7.

  42. ridley

    My part in CBM's downfall.

    Having an Amiga A4000/030 and lusting after the power of those '040s, I decided to get the daughter board and source the CPU from elsewhere, as the daughter board with CPU IIRC was about £700, a lot in those days.

    So I ordered the daughter board off CPC? Farnell? for £29 but was told that they were on back order.

    Six months! Later I received my daughter board..complete with cpu and cooler all for £29. Now some people would have alerted Farnell to the mistake and others would have placed an order for all the daughter boards in stock. IIRC there were 20-30 in stock...

  43. budp

    problem with time of day

    I had a strange problem with Timex watch software for the PC back in a similar timeframe.

    The software would only work between midnight and noon EST, and got a non-indicative error between noon and midnight EST.

    So I used it in the morning for some time.

    I eventually did determine what the error was.

    My computer was running with a 24-hour clock option rather than a 12-hour clock option.

    The Timex software was written assuming a 12-hour clock, so hours 13-23 were not acceptable.

    I sent an email, but the bug was never fixed.

    After that discovery, if I needed to update the watch after noon, I temporarily changed the system clock to 12-hour.
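
    A hypothetical reconstruction of that kind of bug in Python. This is not the actual Timex code, which I never saw; the function names and the "HH:MM" input format are assumptions for the sake of the sketch.

```python
# Hypothetical reconstruction of the bug, not the actual Timex code.
# Both functions take a clock string such as "09:30" or "15:30".

def parse_hour_assuming_12h(clock_text: str) -> int:
    """Accepts the hour field only if it looks like a 12-hour clock value."""
    hour = int(clock_text.split(":")[0])
    if hour > 12:
        # The unhelpful failure: nothing tells the user about clock formats.
        raise ValueError(f"unexpected hour {hour}")
    return hour


def parse_hour_either_format(clock_text: str) -> int:
    """Accepts any hour a 24-hour clock can produce."""
    hour = int(clock_text.split(":")[0])
    if not 0 <= hour <= 23:
        raise ValueError(f"unexpected hour {hour}")
    return hour
```

    With the first version, "09:30" parses fine and "15:30" blows up, which matches the works-in-the-morning behaviour; the second version is the one-line fix that never shipped.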

  44. Anonymous Coward
    Anonymous Coward

    "external SCSI device"

    Well, there's your problem. Off to the pub!

  45. juice

    Merry Christmas!

    A friend used to have a little christmas-fairy-lights app running on his (windows 95?) machine all year round, as he swore it made it more stable.

    Which in turn has reminded me of all the other little toys which hooked into Windows and gave you little characters and critters running around the top of your window titlebars.

    Such as Sheep. And I'm not sure if I should be amused or terrified that someone's ported said beastie to Windows 10...

    https://www.microsoft.com/en-us/p/esheep-64bit/9mx2v0tqt6rm

    1. js.lanshark

      Re: Merry Christmas!

      Neko is back! Thank you ever so much! Now I can waste even more time on lockdown!

      1. John Brown (no body) Silver badge
        Coat

        Re: Merry Christmas!

        "Neko is back! Thank you ever so much! Now I can waste even more time on lockdown!"

        Hi, I was directed here from an El Reg article published in 2022. I just came here to let you know that lockdown never ends. Make the best of it you can, it only gets worse. Much worse.

        The hazmat suit --------->

  46. Proton Wrangler
    Happy

    feed me dead beef

    My recollection from 80's and 90's work with Unix and embedded stuff is that in many libraries freed memory blocks had 0xDEADBEEF written to them, and that the other pattern, 0xFEEDFACE, was used in newly allocated regions and in padding between data elements (at the end of an array or stack frame).

    I don't remember if the C library malloc() routine was spec'd to deliver zero-filled regions or not. Certainly the underlying Unix brk() call did not.
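
    A toy illustration of that poisoning idea in Python, assuming the scheme as I described it (one pattern for fresh blocks, another for freed ones). ToyHeap and its methods are invented for this sketch and do not correspond to any particular library's allocator.

```python
import struct

# Patterns from the comment above, stored big-endian so a hex dump reads naturally.
FREED_PATTERN = struct.pack(">I", 0xDEADBEEF)  # written over freed blocks
FRESH_PATTERN = struct.pack(">I", 0xFEEDFACE)  # written into newly allocated blocks


class ToyHeap:
    """Hypothetical allocator: fixed-size blocks in a bytearray, poisoned on alloc/free."""

    def __init__(self, block_count: int, block_size: int = 16) -> None:
        assert block_size % 4 == 0
        self.block_size = block_size
        self.memory = bytearray(block_count * block_size)
        self.free_blocks = list(range(block_count))

    def _fill(self, block: int, pattern: bytes) -> None:
        start = block * self.block_size
        self.memory[start:start + self.block_size] = pattern * (self.block_size // 4)

    def alloc(self) -> int:
        block = self.free_blocks.pop(0)
        self._fill(block, FRESH_PATTERN)
        return block

    def free(self, block: int) -> None:
        self._fill(block, FREED_PATTERN)
        self.free_blocks.append(block)

    def read_word(self, block: int, index: int = 0) -> int:
        start = block * self.block_size + 4 * index
        return struct.unpack(">I", self.memory[start:start + 4])[0]
```

    The point of the patterns is that a program reading a block it never initialised, or one it already freed, sees an unmistakable value in the debugger instead of plausible-looking leftovers.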
