Breaking Bad or just a bad breakpoint? That feeling when your predecessor is BASIC

That Friday feeling is upon us again after a week of dealing with IT issues and dodging the gimlet gaze of the boss. Hopefully yours didn't involve some impromptu debugging in production. Welcome to On Call. Today we are pleased to salute the return of Who, Me? contributor Susan, who previously regaled us with a tale from two …

  1. Anonymous Coward

    'Ever been called to fix a problem that was very much somebody else's work?'

    All the time. It's called 'Helpdesk'.

    1. A.P. Veening Silver badge

      All the time. It's called 'Helldesk'.

      Fixed your typo ;)

    2. Nunyabiznes

      To be fair, as a helpdesker I've had to spend a lot of time troubleshooting because the network team believes their systems are infallible. Even after I've proved it wasn't a desktop issue, I'm commonly sent on wild goose chases.

      I seldom get a "sorry about that" after a 5 minute troubleshoot and fix session by the network staff (after being pointed in the right direction by one of us) resolves an issue they've been loudly proclaiming was someone else's problem.

      Oh well, it's Friday and weekend plans are afoot. Y'all have a good one.

      1. Muscleguy

        According to Carlsberg on twatter it’s beer day, so what more do you need?

        1. Gene Cash Silver badge

          > According to Carlsberg on twatter it’s beer day, so what more do you need?

          Well it's ALWAYS beer day, or more exactly, whiskey day. Or perhaps whiskey hour, around here.

      2. J. Cook Silver badge

        A view from the network team...

        So, *my* problem is that our helpdesk will, at times, throw me tickets that are effectively "it's broken." with nothing else in the ticket - just the problem description provided by the customer, and no notes, no evidence of troubleshooting, nothing.

        Such tickets get tossed back down to the support team with "please clarify what the issue is"-type language in the notes, with my boss's blessing (and the support manager's encouragement), even.

        And generally (at least for me), when a ticket gets handed to me and they ping me on IM with "hey, can you look at X - customer is down" and nothing else, I'm usually kidney-deep in something else, which sometimes isn't as easy to pause or back out of as people might think.

        Despite people claiming otherwise, I am not a god, nor am I able to wave my [censored] around and magically fix things. :)

        1. Anonymous Coward

          Re: A view from the network team...

          Ugh. Had one of those kinds of tickets recently: "Fix [xxx] in upstairs conference room." It didn't get taken care of immediately, so somebody higher up got involved and demanded to know why - at which point the tech asked "Which building?" It could have been any of 4 different buildings, with several different upstairs conference rooms in each.

          1. Anonymous Coward

            Re: A view from the network team...

            Or mine - "scanner not working"

            Barcode scanner?

            Flatbed scanner?

            Multi-million pound MRI scanner?

            BTW - that last one isn't IT's problem!

            1. ShadowSystems

              Re: A view from the network team...

              Just pick up a different kitty to perform a cat scan. Then hand it off to a dog to do the lab work. Job done!

              *Runs away before you club me with a dead fish of pun-ishment*

      3. Anonymous Coward

        after a 5 minute troubleshoot and fix session by the network staff


        I get "this thing doesn’t work, it worked last week. It’s obviously a network issue."

        Q1: what’s the source, destination and port, so I can trace the path and check any controls en route?

        Answer typically just mentions the destination and that it doesn’t work.

        I have to ask ~ 3 times for just those 3 bits of detail.

        Throw in load balancers, NSX, ACI, ACLs, firewalls, app firewalls, IPS, anti-spoofing, AV, proxy PACs, proxies, certificates, etc., and it’s like finding a needle in a haystack.

        Throw access management in the mix and it’s not trivial.

        Fixing it after a 5 minute chat is a testament to the quality of your network team.

      4. js.lanshark

        With all the bits and pieces (and people's fingers) that go into a "network", it's sometimes amazing that the things work at all, let alone infallibly.

        1. Not Yb Bronze badge

          Worked at more than one company that called the internal network "the notwork".

  2. Mishak Silver badge

    Debug build on a live server?

    I've not really had a lot to do with VS, but in my domain (embedded systems), you wouldn't deploy a debug build (so no breakpoints to fire).

    Are VS breakpoints really attached to the application and not the debug session?

    1. Nick Ryan Silver badge

      Re: Debug build on a live server?

      Breakpoints are usually compiled into the application code and trigger a CPU-level exception. This exception is handled by the OS and passed on to whatever application is registered to handle it. If no application is registered then code execution should continue as usual.

      1. MarkET

        Re: Debug build on a live server?

        IIRC, VB6 was compiled to MSIL despite looking like an EXE. Run-time exceptions are ring 1 or higher on Windows x86 and wouldn't halt a server, even NT4 could cope.

        1. Nick Ryan Silver badge

          Re: Debug build on a live server?

          The server itself would/should continue fine - however the application in question would be halted waiting for the response from the debugger to proceed/step through, etc. Which is just what was reported in this application.

          I don't know exactly how VB6 implemented the breakpoints; however, I'd assume that it used standard CPU-level methods even if MSIL was used as an intermediary - nothing stops MSIL code using standard methods.

          1. This post has been deleted by its author

    2. Terry 6 Silver badge

      Re: Debug build on a live server?

      It's several decades since my semi-amateur coding days. Even then I used two copies of the s/w: one for doing the testing and one that I'd only amend after I'd got the test version working.* These, of course, were not on a server; they were deployed to standalone PCs.

      *OK yes there were a couple of times when the amended version didn't run properly. I think I always was able to find the typo with a quick look.

    3. Philip Stott

      Re: Debug build on a live server?

      This was VB6 in early 2002 - a few months before C# and Visual Studio .NET were released.

  3. Mishak Silver badge

    Visual Basic 6 and SQL Server contractor before her employer of four years abruptly went bust

    So, she was caught by IR35 then? Or should "employer" have been "client"? ;-)

  4. Nick Ryan Silver badge

    I got a crash course in CANBus (OpenCAN) communications.

    The application that I inherited did not work. OK, it exhibited some activity, but there were a few flaws in the implementation:

    The CANBus interface code was implemented as if CANBus were a serial link from the USB CANBus node attached to the PC to whatever remote device was being communicated with at the time. CANBus is a broadcast network. Because communications with a CANBus device were not "reliable" (a result of treating it as a serial-link device with only that device connected), every single damn command was sent three times to the device. Why? Because sometimes another device would send a broadcast packet, and this would appear on the bus between the command to the remote device and the remote device's response. So it wasn't working, and the "fix" was to send every command three times and assume it had worked. Yep, assumed to work...

    Because the CANBus interface code was implemented as if it were a serial link to a specific device, operating multiple devices simultaneously was impossible. This kind of worked when there was a single module being controlled; however, the intention of the project was to have up to four identical modules, each with their own set of motors and sensor modules, so the design could not scale.

    The diagnostic and test code for the connected devices used different interface code to that used in the operating environment. Getting something working and configured in the diagnostic and test interface did not mean anything would work in the operating environment.

    I/O modules were treated as polled devices rather than devices that broadcast state change. As a result, sensor triggers were only responded to when the I/O module was specifically polled for the current I/O state. The I/O state was maintained as a copy in the application itself and therefore was inevitably out of step with whatever was happening on the I/O device itself. The I/O device was, of course, asked three times in a row for the state... For those that don't know, most CANBus I/O modules are asynchronous and while they can be polled, their normal operation is to send a broadcast, or direct message to another device, on state change of their I/O signals. As a result, when an input on the I/O module was triggered, this would be missed until the control application happened to poll the device for the current status. This did not help operational efficiency or accuracy at all...

    After fighting the existing code I threw every single part of it away and started from scratch. The project was already late at this point, but it was the only way to proceed and there was absolutely nothing of use in the previous application. The replacement application was fully multi-threaded, the CANBus interface processed and dispatched all received CANBus messages to whatever internal handler was interested in it which allowed a responsive state machine to operate and for this to be scaled to multiple modules.
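    The dispatch approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual application: the `Frame` and `Dispatcher` names, the handler-per-ID registration and the CAN IDs in the usage lines are all hypothetical, and a real system would feed `dispatch()` from a receive thread reading the CANBus adapter.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Frame:
    # Hypothetical minimal CAN frame: identifier plus up to 8 data bytes.
    can_id: int
    data: bytes


Handler = Callable[[Frame], None]


class Dispatcher:
    """Routes every received frame to the handlers registered for its CAN ID.

    Because CAN is a broadcast bus, a frame from an unrelated node is simply
    delivered to whoever is interested (or counted as unhandled) instead of
    being mistaken for the reply to the last command sent.
    """

    def __init__(self) -> None:
        self._handlers: Dict[int, List[Handler]] = defaultdict(list)
        self.unhandled = 0  # frames that no handler subscribed to

    def subscribe(self, can_id: int, handler: Handler) -> None:
        self._handlers[can_id].append(handler)

    def dispatch(self, frame: Frame) -> None:
        handlers = self._handlers.get(frame.can_id)  # .get() does not create entries
        if not handlers:
            self.unhandled += 1
            return
        for handler in handlers:
            handler(frame)


# Two identical modules on one bus, told apart only by (made-up) CAN IDs.
events = []
d = Dispatcher()
d.subscribe(0x181, lambda f: events.append(("module1", f.data)))
d.subscribe(0x182, lambda f: events.append(("module2", f.data)))
for f in [Frame(0x181, b"\x01"), Frame(0x700, b"\x05"), Frame(0x182, b"\x00")]:
    d.dispatch(f)
print(events, d.unhandled)
```

Because each module's state machine only sees frames for its own IDs, adding a third or fourth module is just another `subscribe()` call rather than a rewrite.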

    The CANBus devices were proven to be reliable. Previously they had been reported as unreliable, when the only problem was the utterly inappropriate attempt at communicating with them as if each were a singly connected serial device.

    The project did continue to struggle; however, that was due to the systems we were interfacing with not adhering to the method of operation they were very loudly claimed to adhere to. This changed a little after I connected to the on-site PLCs, downloaded the code from them, effectively reverse-engineered it and then demonstrated that different sites had different, and incompatible, modes of operation...

    1. Nick Ryan Silver badge

      I forgot the other gem!

      The on-site PLCs, which a different organisation supplied and loudly insisted worked one way (until it was proven that they worked in different ways on different sites), had a further problem. Rather than operating on a state-machine basis with child states dependent on others, every single possible state was tracked "in parallel", regardless of whether many of them were impossible given other states. The processing cycle of these PLCs, like that of many PLCs, is often just an array of state-machine handlers that the local processor runs through in sequence. This did not always work as expected, because there is a finite amount of time for processing the array of state machines, and if the execution loop time is exceeded then later state machines may not be processed.

      That was a gem to track down. And, typically, to have the manufacturer deny. :(
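      A toy model of the scan-cycle overrun described above, assuming each state-machine handler has a fixed, known cost and the cycle simply stops when the time budget runs out. The handler names, costs and the 10 ms budget are invented for illustration, and real PLC runtimes handle overruns in their own ways (some raise a watchdog fault instead), so this only shows the failure mode, not the behaviour of any particular PLC.

```python
from typing import List, Tuple


def run_scan(handlers: List[Tuple[str, float]], budget_ms: float) -> List[str]:
    """Run state-machine handlers in order until the cycle's time budget is used up.

    Each handler is modelled as (name, cost_ms), a simplifying assumption.
    Returns the names of the handlers that actually ran this cycle; anything
    after the cut-off is silently skipped.
    """
    elapsed = 0.0
    ran = []
    for name, cost_ms in handlers:
        if elapsed + cost_ms > budget_ms:
            break  # out of time: later state machines are not processed
        elapsed += cost_ms
        ran.append(name)
    return ran


# Tracking every possible state "in parallel" lengthens this list and the total cost.
handlers = [("conveyor", 2.0), ("lift", 3.0), ("sorter", 4.0), ("gate", 2.5)]
print(run_scan(handlers, budget_ms=10.0))  # ['conveyor', 'lift', 'sorter']
```

The "gate" handler never runs because the first three already consume 9 ms of the 10 ms budget, which is why such a fault only shows up once enough states are being tracked.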

    2. Justthefacts Silver badge

      You’ve possibly made a mistake....

      Depending on the application, you’ve possibly gone badly wrong. CAN was developed for the automotive industry, where AUTOSAR is the software standard, virtually mandatory industry-wide.

      The reason the CAN controller was polling state rather than operating asynchronously is that polling is mandatory in AUTOSAR, as in MIL-STD-1553. That makes the software execution perfectly predictable in real time, which they value more than efficiency. I personally disagree with that philosophy, but those are the standards.

      And most industries that have inherited CAN have also inherited AUTOSAR or AUTOSAR-like coding standards.

      So, if and when your code needs to be certified, they will have to throw away the entire thing as a non-compliant architecture.

      But in fact the root cause was that other node being configured to broadcast, in a polled architecture, causing interference.

      You will be surprised to know that you explicitly aren’t supposed to error check as such. Three tries, and if fail then just use the out of date local status, is preferred methodology. If error, identify faulty node and fault recover the whole node, either power cycle, select redundant node, or select redundant bus. That’s it. You do *not* repeat *not* error check at transport level. Because the software timing would then vary at the microsecond level depending on whether the data was OK or not, and that is totally verboten.
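      The "three tries, then use the out-of-date local status" rule above might look something like this as a minimal sketch. `PolledInput` and the `read_once` callable (assumed to return `None` on a failed read) are hypothetical names; the class only counts failures and leaves node recovery or mode switching to a higher level, in line with the comment.

```python
from typing import Callable, Optional


class PolledInput:
    """Polls a remote value up to three times, then falls back to the last
    known (possibly stale) value and records the failure."""

    MAX_TRIES = 3

    def __init__(self, read_once: Callable[[], Optional[int]], initial: int = 0) -> None:
        self._read_once = read_once  # hypothetical: returns None on timeout or error
        self.last_good = initial
        self.error_count = 0

    def poll(self) -> int:
        for _ in range(self.MAX_TRIES):
            value = self._read_once()
            if value is not None:
                self.last_good = value
                return value
        # All tries failed: carry on with the stale value and count the failure.
        # Deciding whether to recover or swap the node is left to a higher level.
        self.error_count += 1
        return self.last_good


replies = iter([None, None, None, 7])
sensor = PolledInput(lambda: next(replies), initial=5)
print(sensor.poll(), sensor.error_count)  # 5 1  (three failures, stale value used)
print(sensor.poll(), sensor.error_count)  # 7 1
```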

      1. Anonymous Coward

        Re: You’ve possibly made a mistake....

        AUTOSAR is relatively new compared to CAN itself, and even in AUTOSAR environments, periodic and event-driven traffic isn't uncommon (source: the three different control modules on my lab bench that spray numerous periodic messages on a powertrain CAN bus every 50 msec or so).

        Transport-level error checks also aren't unheard of. Depending on the data integrity needs, I've seen counters and CRCs done here. Rebooting a controller isn't an option when you're driving down the highway (or if you're an airbag module in the middle of a crash event).

        Microsecond-level timing isn't achievable at the typical CAN rates of 250 kbit/s or 500 kbit/s found on modern cars.

        Async communication is ok even with the hard real time requirements of many CAN applications. The reason it works is due to CAN's inherent message priority and collision avoidance (but that does require proper system design, including selection of CAN IDs).
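        As a rough illustration of that priority mechanism, here is a simplified model of CAN bitwise arbitration. It assumes standard 11-bit identifiers, considers only the identifier field (no RTR/IDE bits, extended IDs or bit timing) and uses made-up IDs: the bus acts as a wired-AND where a dominant 0 beats a recessive 1, so the lowest identifier wins without any frame being destroyed.

```python
from typing import List


def arbitrate(ids: List[int], id_bits: int = 11) -> int:
    """Simulate CAN bitwise arbitration among nodes that start transmitting together.

    Identifiers go out MSB first. The bus is a wired-AND: if any node drives a
    dominant 0, every node reads 0. A node that sent a recessive 1 but reads
    back 0 has lost; it stops transmitting and retries after the frame ends.
    """
    if not ids or len(set(ids)) != len(ids):
        raise ValueError("need at least one node, and identifiers must be unique")
    contenders = list(ids)
    for bit in reversed(range(id_bits)):
        sent = [(i >> bit) & 1 for i in contenders]
        bus_level = min(sent)  # wired-AND: a single dominant 0 pulls the bus low
        contenders = [i for i, b in zip(contenders, sent) if b == bus_level]
    return contenders[0]


# Three nodes start a frame at the same instant.
winner = arbitrate([0x244, 0x0C1, 0x0C3])
print(hex(winner))  # 0xc1: lowest identifier = highest priority
```

This is why the choice of CAN IDs is part of the system design: the identifier doubles as the message priority.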

        1. Justthefacts Silver badge

          Re: You’ve possibly made a mistake....

          Those controllers on your desk - interesting and very worrying indeed.

          Certainly at the automotive prime I worked for, putting periodic messages onto a *powertrain* CAN would be considered grounds for instant dismissal. In a literal, not metaphorical sense.

          On a secondary CAN bus, e.g. infotainment or window motors, as I said, I personally think that’s safe, but it still would not have been considered remotely acceptable at that company.

          Error checks - yes, I think we differ in our definitions. One should count errors, and update statuses, at transport level, but not act upon them. Application level can perform mode switch based on error status. But within an operating mode, definitely no conditional operations.

          Microsecond timing - my statement was short for baud rate.

          “Rebooting a controller isn’t an option”....yes, obvs. So, you don’t. Critical nodes are run with hot redundancy, and hot-swapped on fail. Ditto controller. Errors trigger a mode change to whatever safe-mode is defined, so if it’s something critical then even if you have swapped to something working you have lost the backup node, so you need to be in safe mode.

          Once you are in safe mode, then you can cycle the redundant failed node, because the operations are running correctly on primary.

      2. Nick Ryan Silver badge

        Re: You’ve possibly made a mistake....

        This was in an industrial control environment, not sure how you got the impression otherwise.

        Polling is incredibly inefficient; however, it is a method of continuous assurance that a connected device is still responding. This requires a control application, rather than building the network of CAN devices to operate independently.

        For example, a CAN I/O module can be configured to send a specific message in response to an input transitioning from low to high (or the reverse). This message can be sent to a specific device or broadcast. Think of a simple example: an optical beam sensor that, when something crosses its path, sends a message to a motor to stop. This can easily be an expected input rather than an emergency situation. Attempting to do the reverse using polling is beyond most motor devices, as they'd have to maintain an internal record of the current I/O input on the remote device, repeatedly poll the remote I/O device for the current state, compare this to the recorded state and then execute a specific command. This is very inefficient, and having multiple devices doing this on the same network quickly floods the network with polling requests and responses.

        Most CAN messages are transactional: a message is sent, and there are three possible outcomes that need to be tracked in a control application: OK (including response data), Error (including error reason) and no response (which needs to be escalated and handled appropriately). A control application that doesn't do this processing is useless and entirely not fit for purpose.
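        A minimal sketch of tracking those three outcomes for one command, assuming replies are delivered to the waiting code through a `queue.Queue` (for example by a receive thread or dispatcher) as `("ok", data)` or `("error", reason)` tuples. The `Outcome`, `Result` and `await_reply` names and the reply format are hypothetical; the timeout is what turns silence into an explicit "no response" that can be escalated.

```python
import queue
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Outcome(Enum):
    OK = auto()           # reply received, carries response data
    ERROR = auto()        # reply received, carries an error reason
    NO_RESPONSE = auto()  # nothing arrived before the deadline: escalate


@dataclass
class Result:
    outcome: Outcome
    data: bytes = b""
    reason: Optional[str] = None


def await_reply(replies: queue.Queue, timeout_s: float) -> Result:
    """Wait for the reply to one command and classify it.

    `replies` is assumed to receive ("ok", data) or ("error", reason)
    tuples for this transaction only.
    """
    try:
        kind, payload = replies.get(timeout=timeout_s)
    except queue.Empty:
        return Result(Outcome.NO_RESPONSE)
    if kind == "ok":
        return Result(Outcome.OK, data=payload)
    return Result(Outcome.ERROR, reason=payload)


q = queue.Queue()
q.put(("error", "motor overtemp"))
print(await_reply(q, 0.05).outcome)  # Outcome.ERROR
print(await_reply(q, 0.05).outcome)  # Outcome.NO_RESPONSE (queue is now empty)
```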

        1. Justthefacts Silver badge

          Re: You’ve possibly made a mistake....

          That’s why I said “You *may* have made a mistake”, not that you definitely did. I don’t know what your application domain is.

          What I’m telling you is that the original architecture is *mandated by software certification requirements* within an important domain. Your rewrite may well seem more sensible to you. But you need to understand the good reason why the original was written that way, which was probably the technical background of the author, even if you decide that your system requirements are different. Do you absolutely know that the software you designed will never need to be certified, and to what standards and process - have you explicitly asked this question within your organisation? If it’s industrial control, could it ever end up in robotics or heavy machinery, with outsourced safety-certification that your organisation may not know in detail, let alone control?

          Google Chesterton's Fence.

          I’m well aware of the efficiency argument of Not Polling.

          Your example shows exactly why the opposite isn’t done in safety-critical applications. Polling imposes the worst-case analysed load, at all times, given the stated required response latency. *Your* scheme will suffer exactly the same, but in undefined and “almost infinitely unlikely” circumstances.

          If there are ten sensors, it is theoretically possible they all happen to see their event at exactly the same time. Then the instantaneous network load hits 10x. Oh, in that case, one of those sensor readings will be delayed, and that’s ok because the timing isn’t quite so micro-critical? Right then, original analysis requirement of what is the mandatory response time was wrong. Please re-do your requirements analysis to specify the true worst-case acceptable latency of system response. And then poll at that rate. The problem quickly reduces to a horrendously detailed requirements analysis of the actual worst case requirement, per sensor, and allocating worst-case resources to that sensor all the way through the stack from network up through the application software.

          “A control application that doesn’t do this processing is absolutely useless”.

          No. A control application that doesn’t do this *logging* is absolutely useless. Safety-critical domains usually require fail-safe, not fault-tolerance. Fail-safe typically doesn’t require much fault diagnosis, if any, in the software, because most responses are independent of root cause, all you want is a good Safe Mode.

          To be absolutely clear, I don’t agree with much of this argument from a technical standpoint. But it is the industry-standard argument. I’m temperamentally unsuited to safety-critical work, and its endless documents and reviews and appeals to following-the-process, and I don’t do it any more.

          1. Robert Carnegie Silver badge

            Re: You’ve possibly made a mistake....

            Jumping on: the project was late, it didn't need to match a standardisation which didn't apply, but it did need to work, which the existing project code didn't and couldn't do. Job done, move on.

            1. Nick Ryan Silver badge

              Re: You’ve possibly made a mistake....

              > Jumping on: the project was late, it didn't need to match a standardisation which didn't apply, but it did need to work, which the existing project code didn't and couldn't do. Job done, move on.

              Thanks - exactly this. It wasn't automotive in any way, and my original post didn't imply anything of the sort (in fact, it's clear that it was industrial).

              As a further point, safety circuits should be entirely independent of control systems and, in general, should be wired directly. For example, the ESTOP signal for a motor is a specific input wire, and the big red stop button is wired directly to all of these.

      3. Mishak Silver badge

        Re: You’ve possibly made a mistake....

        Whilst a lot of automotive is based on AUTOSAR, other protocols are also used - such as J1939 (mainly large trucks) and MilCAN (military vetronics; compatible with J1939).

        These operate mainly in a synchronous mode (but do also support asynchronous), with a high-priority sync frame being used to trigger the synchronous nodes to send their messages. Each sync frame also includes a counter so that messages can be assigned to specific slots, ensuring that the bus load is distributed and the system doesn't end up with bandwidth starvation that could arise if everything were sent at the same time.
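        A toy model of the slot idea, assuming each sync frame carries an incrementing counter and each message is given a period and offset in sync ticks. The message names, periods and offsets below are invented for illustration and are not taken from the J1939 or MilCAN specifications; the point is that staggering offsets keeps the number of messages per sync bounded instead of letting everything fire at once.

```python
from typing import Dict, List, Tuple

# Hypothetical schedule: message name -> (period in sync ticks, offset in ticks).
SCHEDULE: Dict[str, Tuple[int, int]] = {
    "wheel_speeds": (1, 0),    # every sync
    "steering_angle": (2, 0),  # every other sync, even counter values
    "yaw_rate": (2, 1),        # every other sync, odd counter values
    "diagnostics": (4, 3),     # once every four syncs
}


def due(counter: int, schedule: Dict[str, Tuple[int, int]] = SCHEDULE) -> List[str]:
    """Messages to send in response to the sync frame carrying `counter`."""
    return [name for name, (period, offset) in schedule.items()
            if counter % period == offset]


for tick in range(4):
    print(tick, due(tick))
```

With these offsets no sync ever triggers more than three of the four messages, whereas giving every message an offset of 0 would put all four on the bus after every fourth sync.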

        This really helps when you're running hard real-time closed loop control systems over the bus, as sample times and output generation occur at much more well-defined points in time, reducing jitter (and therefore loop gain errors).

        I used a similar system to this (before J1939 emerged) to implement vehicle dynamics controllers on prototype platforms.

      4. martinusher Silver badge

        Re: You’ve possibly made a mistake....

        CAN is a lot older than AUTOSAR. It effectively comes in two flavors -- CAN proper and CAN/Open. CAN proper -- the original CAN -- requires a bit of thought because the packet address (the COB-Id) isn't directed at a specific node but rather at a process. That is, all messages are effectively broadcast to all nodes. (This explains the peculiar architecture of a CAN controller and why attempts to attach CAN to a classic operating system network driver stack don't work very well.)

        Since most of us are used to networking we invariably use CAN/Open. This is effectively a subset of CAN that enables datagram like messaging between nodes. There are still broadcast messages, though. CAN/Open messaging is also used by industrial protocols that are not based on the physical CAN bus such as Ethercat.

        The actual CAN physical protocol is a serial bus; it's both an infernal nuisance to set up and a resilient, collision-proof protocol that automatically identifies bus errors (and, in theory, has software to support managing those errors). It works well, even if it is a bit slow and inefficient compared to networking. The CAN transport is used by protocols such as J1939, a protocol that's widely used in the commercial automotive industry.

        (This 'three polls' methodology described in the article suggests that whoever wrote the software didn't have a clue how CAN worked.) (Not unusual -- CAN is not only a bit 'different', it also gets very complex, so it isn't the sort of thing you want to mess with when on a tight development timescale.)

  5. Boris the Cockroach Silver badge

    Someone else's

    problem is now your problem

    "It doesn't work," wails the guy with 3 yrs' experience (claimed). "I've tried and tried, but it keeps losing the file when I hit compile."

    The mangler arrives

    "I want it up and running NOW!", followed by his more usual song "Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it Fix it "(until the PA is called upon to drag him away to do something more productive... like .. sit in this wood chipper until I switch it on..)

    An older wiser eye casts a look

    "Does help if you compile it to the shared program directory instead of your home directory.. especially since the networked communication PC can only see the shared directory"

    1. Zarno

      Re: Someone else's

      "until the PA is called upon to drag him away to do something more productive... like .. sit in this wood chipper until I switch it on.."

      Stealing that one.

      A pint of your favorite on a Friday!

      1. The Oncoming Scorn Silver badge

        Re: Someone else's woodchipper

        I've seen the one from Fargo (the movie) in the Fargo visitor centre.

  6. Sgt_Oddball

    This was...

    The only reason I ever learned to code ASP classic after getting asked by a local company to fix a badly borked site.

    Tons of near-identical code on multiple pages and not a single common function or sub between the lot of them (fixing that alone removed about 2/3rds of the errors).

    Otherwise, having a team of multiple other devs that I work with these days does mean that we communicate and give eyeballs on each others work to stop such giant acts of stupidity. Pretty sure I couldn't go back to working with only one other dev any more.

    1. My-Handle Silver badge

      Re: This was...

      I was brought in to work for my current employer to deal with an almost identical situation. An ASP webforms website, with each page just being one ASP literal control and then a whole bunch of string manipulation to generate HTML. Product pricing calculated four or five different ways, depending on where it was viewed. As you say, not a single shared or common function to be found anywhere.

      The sad thing is that I'm a "learned on the job" kind of programmer. I've probably left my fair share of howlers behind me, purely from being required by management to achieve results far outside of my qualification. The best I can say is that I damn well owned any of my mistakes whilst I worked for any employer and always made a good effort to document what I did for anyone following.

    2. DJV Silver badge

      Re: This was...

      Oh hell, that reminds me of a holiday cover temp job I did back in 2005 for a local web company. On my first day there they wanted me to start putting together a new site. I asked, "Where's your library of common functions/utilities?"

      The response was a blank look of confusion.

      I was horrified to learn that their method of building web sites was to have a graphic designer sit down with the customer and ask "What do you want the site to look like?" After the customer described it, the graphic designer would produce graphical mock-ups of several key screens (one image per screen). These, once the customer had approved them, would be sent out to India, where the images would be chopped up into smaller graphics and then have some HTML wrapped around them to produce individual static HTML pages. These were sent back to the web company, who would then hack some custom ASP code into each one individually to connect it through to a database. Common functional code? Not a chance.

      Despite them offering me a full-time job, I buggered off as quickly as I could! They went bust about 2 years later.

      1. MrBanana Silver badge

        Re: This was...

        Common functions/utilities are great when they work. If they are broken and you are still forced to use them, then that is a different world of pain.

  7. Anonymous Coward

    I've been in a situation where I was prevented from working on a project because I was deemed 'too expensive'.

    The work was passed on to a cheaper resource. Y'all know what I mean here.

    2 months later, the project had stalled, so I get pulled back in. I proceed to spend more time fixing the mess left behind than I would have spent delivering the project in the first place!

    Very frustrating and somewhat stressful, as I was then under the cosh to deliver a late project in record time to a less-than-happy customer (whose decision to cut costs caused all this mess).

    1. MiguelC Silver badge

      So... a regular Tuesday?

      1. Anonymous Coward Silver badge

        Only if the deadline was the previous Friday.

        For me it tends to be a regular Friday, and the deadline is the same day.

        1. A.P. Veening Silver badge

          > Only if the deadline was the previous Friday.

          Something like that caused me to redefine deadline:

          "Here is the line, drop dead."

          This caused some consternation with some of my colleagues as the one giving me the assignment with the impossible (already passed) deadline was the owner/director of the company. After his initial shock he couldn't disagree with my analysis.

    2. My-Handle Silver badge

      My current project was pitched to me with the phrase

      "It took the contracted UI expert five weeks to design it, hopefully it won't take that long to build"

      1. nintendoeats Silver badge

        They are effectively saying "hopefully it won't take as long to design as it took to design".

        1. My-Handle Silver badge

          Not sure of the nuance in your statement, but the man involved expected the project built and running in that timeframe. I may have paraphrased badly.

          1. doublelayer Silver badge

            I believe they meant that you would have to redesign large chunks of it in addition to building it.

            1. nintendoeats Silver badge

              What I meant was this: there is a philosophy (to which I subscribe) that software development is 100% a design exercise. When you write code, you are really specifying a program design to be executed by the computer. You must "design" a program that implements the requested UI.

              Thus is laid bare the logic of my original statement.

              1. Lil Endian Silver badge

                Yep. I differentiate programming and coding.

                Programming is platform/language independent (and can be done with pencil and paper). This is the design stage (the creative stage), after analysis, data modelling and HCI design.

                Coding is the conversion of the program design into the deliverable which has platform/language specific requirements.

                The statement "programming is platform/language independent" is a wee bit of a stretch, as IRL the target platform probably (almost certainly) does impact the design approach [1], but the programming/coding distinction is a good approach, especially for PFYs.

                [1] eg. embedded vs client/server

    3. Anonymous Coward

      Oh yes, even better when you're brought into the project to work UNDER the person who borked it so badly and you outshine them. Then it's a constant battle to stop the borker's politics and mates from stitching you up in front of the management! All you have is the contractor's battle cry, "Hey look, I just came in to work on this project and then I'm gone permie!".

      Some days you wonder why you signed up to be in IT....oh yeah I know, the "spondoolics"! Nice to know you're sailing to early retirement from fixing crap code year in and year out while those who called you a nerd at school in the 1980s are sweating out a crappy insurance clerk job somewhere!

      1. My-Handle Silver badge

        To turn that on its head, I almost wish that they'd bring in someone underneath me who outshone me. My projects might actually get finished quicker and I'd be damned if I couldn't learn something along the way.

  8. Anonymous Coward
    Anonymous Coward

    Re: The problem

    "the return of Who, Me? contributor Susan, who previously regaled us with a tale from two decades ago."

    "Susan's latest anecdote takes place in the months before Christmas 2001"

    Erm, guys, that might have been badly worded, but I might have some bad news as to which year was 2 decades ago if you think those are 2 different statements...

    1. Anonymous Coward
      Anonymous Coward

      Re: The problem

      Only a coder would be that pedantic about dates! Ha ha!

    2. Prst. V.Jeltz Silver badge

      Re: The problem

      Who said they were two different statements?

      both tales might be those months.

      just expressed two different ways... which is inconsistent and annoying

      stick to yyymmdd !

      1. Anonymous Coward Silver badge

        Re: The problem

        Three-digit years? Now THAT would be confusing!

        1. Stoneshop

          Re: The problem

          Way back I inherited an inventory/catalogue system for a record library (the black vinyl kind) that basically used 3-digit years. Due to space constraints, dates were squeezed into two bytes, coded as 5 bits day, 4 bits month and 7 bits year counting from 1900. As this only concerned the date of purchase and, in some cases, the date of removal (archived, lost, whatever), going back at best a dozen years or so from 1985, this was quite sufficient; the ones who built that version rightly expected that larger hard disks would come along before 31-12-2027, allowing a less restricted format. And indeed, two years later we upgraded from a 20MB MFM hardcard sitting in an Amstrad 1512 to an Olivetti 386 with a whopping 80MB ESDI disk. With that, dates were stored as 4 bytes, in Julian.
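A sketch of the two-byte date layout described above. The comment gives only the field widths (5 bits day, 4 bits month, 7 bits year counting from 1900); putting the year in the high bits, as below, is an assumption, and `pack_date`/`unpack_date` are hypothetical names. The 7-bit year field is what sets the horizon at 1900 + 127 = 2027.

```python
# Sketch of a 16-bit packed date: 7 bits year (offset from 1900),
# 4 bits month, 5 bits day. Field widths come from the comment;
# the field order (year in the high bits) is an assumption.

YEAR_BASE = 1900


def pack_date(year: int, month: int, day: int) -> int:
    """Pack a date into 16 bits laid out as yyyyyyy mmmm ddddd."""
    offset = year - YEAR_BASE
    if not 0 <= offset <= 0x7F:  # 7 bits: 0..127
        raise ValueError(f"year {year} does not fit in 7 bits from {YEAR_BASE}")
    if not 1 <= month <= 12:  # 4 bits would allow up to 15
        raise ValueError(f"month {month} out of range")
    if not 1 <= day <= 31:  # 5 bits: up to 31 (no per-month check here)
        raise ValueError(f"day {day} out of range")
    return (offset << 9) | (month << 5) | day


def unpack_date(packed: int) -> tuple[int, int, int]:
    """Inverse of pack_date: returns (year, month, day)."""
    return YEAR_BASE + (packed >> 9), (packed >> 5) & 0xF, packed & 0x1F


last = pack_date(2027, 12, 31)  # the last date the format can hold
print(hex(last), unpack_date(last))  # prints 0xff9f (2027, 12, 31)
```

Asking for 1 January 2028 raises `ValueError`: that is the 31-12-2027 wall the original developers were counting on outliving.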

          A bigger problem was that the data records for that initial version overlaid the word field that the database toolbox used for its (singly) linked list of deleted records. That should not have caused problems, but under some circumstances a changed record was deleted (and thus added to that linked list), then written back into that same record number instead of being re-added. Re-adding would likely have reused that record anyway, but would have kept the linked list intact. That took a few evenings to chase down and fix.
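A toy model of that failure mode: a free list of deleted records whose link lives in the record's own first word, so writing data straight back into a deleted slot overwrites the link. The class and method names are hypothetical and this is not the actual Turbo Pascal Database Toolbox code; it only reproduces the sequence described (delete, then write back to the same record number).

```python
# Toy model of a record file whose deleted records form a singly linked
# free list, with the link stored in the first word of the record itself.
# Class and method names are hypothetical; this is not the actual
# Turbo Pascal Database Toolbox code.

NIL = -1


class RecordFile:
    def __init__(self) -> None:
        self.records: list[list[int]] = []  # word 0 doubles as the free-list link
        self.free_head = NIL  # slot number of the most recently deleted record

    def add(self, record: list[int]) -> int:
        """Store a record, reusing the first free slot if there is one."""
        if self.free_head != NIL:
            slot = self.free_head
            self.free_head = self.records[slot][0]  # follow the link word
            self.records[slot] = list(record)
            return slot
        self.records.append(list(record))
        return len(self.records) - 1

    def delete(self, slot: int) -> None:
        """Push a slot onto the free list by overwriting its first word."""
        self.records[slot][0] = self.free_head
        self.free_head = slot

    def write_in_place(self, slot: int, record: list[int]) -> None:
        """Overwrite a slot directly, without telling the free list."""
        self.records[slot] = list(record)


f = RecordFile()
for rec in ([10, 1], [20, 2], [30, 3]):
    f.add(rec)

f.delete(1)                   # changed record is deleted: slot 1 joins the free list
f.write_in_place(1, [21, 2])  # ...then written straight back into slot 1
reused = f.add([40, 4])       # the free list still claims slot 1 is free
print(reused, f.records[1], f.free_head)  # prints 1 [40, 4] 21
```

The written-back record is silently clobbered by the next insert, and the free-list head now holds 21, a data value rather than a slot number. Re-adding the record with `add` instead of `write_in_place` would have taken slot 1 off the free list and kept the chain consistent.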

          All this was in Turbo Pascal 3 initially, later TP4.

        2. bpfh

          Re: The problem

          Year 1000 issues. I'm sure some mainframes & COBOL programmers had to deal with it back then...

      2. diguz

        Re: The problem

        yyyymmdd? this is why i'm confused by english-speaking countries, UK and US mainly (i'm italian)... why the hell use the year first? i want to know first which day, then which month and finally which year someone's talking about. the use of a different measuring system crashed some planes (like the Gimli glider) and made NASA overshoot Mars by a good margin a couple decades ago...

        *rant over*

        1. Shred

          Re: The problem

          Two reasons spring to mind:

          a) It avoids the ambiguity caused by Americans insisting on using the truly bizarre, nonsensical mmddyyyy format

          b) It's really easy to sort into chronological order if the dates are in yyyymmdd format.
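To illustrate point b): with fixed-width, zero-padded fields, yyyymmdd (the ISO 8601 basic calendar date format) sorts chronologically as plain text because the most significant field comes first. The dates below are arbitrary examples.

```python
# Compare sorting dates as yyyymmdd strings with sorting them as ddmmyyyy.
from datetime import date

days = [date(2001, 12, 24), date(1999, 3, 15), date(2001, 6, 1)]

ymd = sorted(d.strftime("%Y%m%d") for d in days)  # yyyymmdd
dmy = sorted(d.strftime("%d%m%Y") for d in days)  # ddmmyyyy

print(ymd)  # ['19990315', '20010601', '20011224'] : chronological
print(dmy)  # ['01062001', '15031999', '24122001'] : 2001 sorts before 1999
```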

        2. WonkoTheSane

          Re: The problem

          Because ISO 8601?

  9. Dan 55 Silver badge

    Wait, VB applications can act as web servers?

    My mind is boggled.

    1. DJV Silver badge

      Re: Wait, VB applications can act as web servers?

      <marvinmode>Yes, depressing isn't it.</marvinmode>

    2. DJO Silver badge

      Re: Wait, VB applications can act as web servers?

      You can do almost everything with VB, that's not to say you should.

    3. doublelayer Silver badge

      Re: Wait, VB applications can act as web servers?

      Nowadays, there's a framework for any language you want that makes it act as a webserver. That doesn't mean you should use them for that purpose. Often, it's a basic webserver designed for debugging before you attach your backend to the real webserver, but not everybody adheres to that.

  10. Anonymous Coward
    Anonymous Coward

    Ahh, the joys of debugging on a Production system...

    I earned my wings as a Senior AIX Admin after being called in to help on a different account from the one I was working on at the time, maybe 10 years ago. The main TSM server would consume all of its memory (256GB) in a matter of hours and the only way to clear the situation was to reboot the box. TSM support had been summoned, there was a Sev 1 case opened, and for the mother of $DEITY everything seemed to be OK with the OS/TSM part. Hell, we even updated the frame, HBA and NIC microcode to the latest level, as suggested by IBM Power support, just in case.

    Almost a week went by, with the customer and the account manager dumping all the blame on me and the TSM Top Gun (hey Reggie, I hope you are doing well these days, wherever you are!). I went through every single memory leak debug routine and procedure, and it simply kept happening: it was almost as if the dsmserv binary would engulf a chunk of RAM and never give it back until we rebooted the box.

    Then, early on the Friday morning, out of the blue, I decided to look at the directory where the TSM binaries were installed, and something struck me immediately: there was a fucking copy of libc sitting there, along with dsmserv. The goddamn application had been compiled with . (the current directory) way too early in the LIBPATH variable, and it was reporting its memory block releases to that local copy, so the kernel never learned about that memory being freed.

    I deleted the libc copy along with everything not strictly necessary for TSM to operate, rebooted the box and BINGO! We were back to regular memory consumption values.

    Turns out some of the TSM admins had been trying to debug an I/O error with some of the tape libraries, and a previous TSM support rep had told them to copy libc and other libraries to the TSM home dir and run some scripts, maybe two weeks earlier. They never mentioned that other support case to me or the TSM top gun, a fact I made sure both the customer and the upline manager learned about.

    Beers were virtually shared that very Friday over a webex session with the TSM top gun, my manager, and the customer's infrastructure leader, at 11AM. Maybe too early for a regular work day, but completely OK after a week of almost 24/7 work to solve the issues.

    1. MarkET

      Re: Ahh, the joys of debugging on a Production system...

      Just interested - what has the LIBPATH order got to do with memory management on AIX?

      1. Anonymous Coward
        Anonymous Coward

        Re: Ahh, the joys of debugging on a Production system...

        The TSM binaries were compiled using the native C library, which includes a lot of low-level system calls. They're used to interact with the kernel, and it seems the dsmserv binary was compiled to look for the libraries in its own location before looking at the default LIBPATH. Therefore, the binary was reporting its memory requests to the local libc, which in turn weren't even noticed by the kernel.
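The lookup-order part of this explanation can be modelled in a few lines. This is a simplified, hypothetical sketch of first-match search over a directory list, not the AIX loader, and it says nothing about how the stray libc affected memory accounting; it only shows why a directory placed early in the path shadows the system copy. The directory names are made up.

```python
# Simplified model of a first-match library search path (like AIX's
# LIBPATH): directories are tried in order and the first hit wins.
# This is not the real AIX loader; directory names are hypothetical.
import os
import tempfile
from typing import Optional


def resolve_library(name: str, search_path: list[str]) -> Optional[str]:
    """Return the first path in search_path that contains `name`, else None."""
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as root:
    system_lib = os.path.join(root, "usr", "lib")  # where the real libc lives
    app_home = os.path.join(root, "tsm")  # stands in for "." / the TSM home dir
    for directory in (system_lib, app_home):
        os.makedirs(directory)
        open(os.path.join(directory, "libc.a"), "w").close()  # empty placeholder

    local_first = resolve_library("libc.a", [app_home, system_lib])
    system_first = resolve_library("libc.a", [system_lib, app_home])

print(local_first == os.path.join(app_home, "libc.a"))     # True: stray copy wins
print(system_first == os.path.join(system_lib, "libc.a"))  # True
```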

        1. Anonymous Coward
          Anonymous Coward

          Re: Ahh, the joys of debugging on a Production system...

          Um, so the two different copies of 'standard' libc used two completely different methods of doing memory management?

          This story makes me so *glad* I never dealt with AIX.

          1. Anonymous Coward
            Anonymous Coward

            Re: Ahh, the joys of debugging on a Production system...

            Yup. I think it's actually a case of bad coding on the application side. It opens it up to any number of local spoofing attacks. The development method seems entrenched in the ole days of no internet/network connections.

    2. Old Used Programmer

      Re: Ahh, the joys of debugging on a Production system...

      On an IBM 360/30 running DOS I worked with (yes, it was a *long* time ago), every once in a while, the system would just stop. I wandered down to the machine room to look at the console. The Run Control switch was set to "Stop" (instead of "Run").

      I asked why.

      I was told that an IBM CE told them to set it that way when he was debugging a problem with the box. It had never been changed back.

      The reason it was being triggered was that they had plug-replacement 2314 disk drives that ran at 312 KB/s and tape drives at 200 KB/s. Both connected to the same--single--selector channel. Said channel being rated for a total data rate of 500 KB/s...
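Taking the figures in the comment at face value, the arithmetic if the disks and tapes were both streaming at their rated speeds at the same time:

```latex
\[
  312\ \text{KB/s}\ (\text{2314 disks}) + 200\ \text{KB/s}\ (\text{tapes})
  = 512\ \text{KB/s} > 500\ \text{KB/s}\ (\text{selector channel rating})
\]
```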

  11. theoa

    'Ever been called to fix a problem that was very much somebody else's work?

    "'Ever been called to fix a problem that was very much somebody else's work?"

    Sure, just like everyone who worked in IT longer than 2 days.

    And sometimes that "somebody else" was me from months or years ago.

    1. Anonymous Coward
      Anonymous Coward

      Re: 'Ever been called to fix a problem that was very much somebody else's work?

      We are all Taw-Jieh and I claim my 5 pounds.

      Particularly see the hovertext on the calendar-pages image.

      1. Lil Endian Silver badge

        Re: 'Ever been called to fix a problem that was very much somebody else's work?

        If I have one solid reason for documenting my code, it's so that on Monday I could understand what I was trying to do the previous Friday afternoon. Friday lunchtimes often went from 11am to 3pm when contracting at the local council!

    2. Anonymous Coward
      Anonymous Coward

      Re: 'Ever been called to fix a problem that was very much somebody else's work?

      Light on details because otherwise anonymous would be a bit pointless. Remind me to write the full thing as an on call in 10 years. Ah, right, anonymous, never mind then.

      Some time before event: had helped another employee, who had then moved on, to write a thing that did a thing. Said thing was moderately complicated and required a bit of domain knowledge.

      Slightly shorter time before event: someone else is brought in to take over thing and move it to a different framework. I am sidelined off the project.

      A few months before event: someone else moves on.

      Event: I return blissfully from a couple of weeks holiday to find that I have been assigned in-absentia to add a feature to thing. (Thing is no longer my responsibility.)

      Dig into thing, discover it does not operate as it should. There are even chunks of code in the original language that have just been lifted over as comments and not implemented. What thing is meant to be doing has not really been understood. Spend nearly a year fixing thing while also trying to do actual job.

  12. Kev99 Silver badge

    This story reminded me of when I was taking a Basic & RPG course at university. Another student forgot to insert a breakpoint in his sample code. So instead of the program printing out a simple page or two of greenbar the program went into an endless loop and cranked through a case of greenbar. If the course moderator had taken a couple minutes longer at lunch who knows what could have happened.

    1. mmlj4

      off-topic, but...

      I had a FORTRAN prof at LSU that told us the printer shed guys really didn't like it when you squelched the line feed and printed the same line a few times in succession ;-)

    2. Manolo

      You got courses in Rocket Propelled Grenades?

      Cool, where do I sign up?

      1. Andy A

        Nah. Role Playing Games.

        1. WonkoTheSane

          @Kev99 is in a maze of twisty, little passages, all alike.

  13. Prst. V.Jeltz Silver badge


    Checking the option would ensure the outputted binary was compatible with its predecessors.

    Stuff like that is exactly what they never teach you in any kind of coding class.

    You learn in a compiler or dev environment, see some output, learn the language, then .... class dismissed!

  14. MrBanana Silver badge

    Which binary?

    I was just the database guy, onsite trying to install something, but the customer was having a problem rebooting the Unix system (Pyramid I think, yes I'm that old). Single user mode struggled into existence, but multiuser mode fell apart. Lots of head scratching as to why so many init scripts were failing. Then I spotted a file called test.c containing the ever classic: 'main(int argc, char *argv[]){printf("Hello World\n");}'. It had its matching binary 'test', from when the root user had run 'make test'. This was in /bin. Also still linked to '/bin/[', before the days of shell aliases. Guaranteed to break pretty much any shell script. Easily fixed, but no one at the customer site held up their hand.

  15. DS999 Silver badge

    I was once called back to fix a "problem" with my own scripts

    At the tail end of a project, I had created some pretty extensive shell scripts, built up over time, to automate some stuff they'd been doing manually at the cost of many man hours (and more than a few mistakes, since they were tedious man hours).

    I documented the crap out of it all but told them that I couldn't guarantee I had accounted for everything, but the scripts would catch unexpected conditions and print an error message so you'd know why it had terminated and how far it had got so things could be fixed and/or completed manually. Just cautioned them that if anyone made any changes to make sure they knew what they were doing.

    They were humming along without needing any change for the last few weeks of my contract, so I didn't expect to hear of any problems, and assumed they would handle them if they ran into anything anyway. About a year later I got a call from the guy running the project I'd been consulting for, telling me that ever since they'd simply upgraded the OS, the scripts had been failing to run, with no error messages. They had a new consultant in place who had looked at it for a couple of weeks and couldn't figure out what was going on, and was having to refer to the scripts to do all the manual steps each day.

    So after a few days, when they got me access, I took a look. As he said, the logs were empty; it was as if the scripts were not running at all. I looked at them: the modification dates were unchanged, so no one had broken them. What could it be? I waited until the late hour when they were supposed to run, to see if I could find any clues, but nothing happened. I checked the last access time on the inodes (which would indicate they had been read/executed), and it hadn't changed. I manually ran them, and everything worked perfectly.

    So what was it? Turns out, when they upgraded the OS they reinstalled from scratch and copied everything over. Everything, that is, except for the crontab entries that triggered the scripts! They had a guy looking at this for a couple of weeks, and he hadn't thought to check that anything was even ATTEMPTING to run them, or to try running the scripts manually; he'd been performing all the steps the scripts did by hand (which takes 3-4 hours). He was either an idiot or a genius (getting 3-4 hours of extra billable time overnight for two weeks while feigning ignorance about the problem).

  16. Anonymous Coward
    Anonymous Coward

    I didn't do much "broken, please fix" work during my contracting career, but I had a lot of "unusably slow, please tune" contracts. Deployment plans were a given in my world, so that never really became an issue. There were often one or two people dedicated just to doing team builds and deployments.

    My best tuning effort was to take a C-code Oracle report that was taking over 24 hours to run (they didn't know how long it would take to finish because the servers rebooted at midnight every night.) I got it to run in under 20 minutes.

    It is amazing how slow you can make what was once a fast report by throwing copy-paste coders at the enhancements who copy-paste the entire query and loop of the report calculations for each value they want to calculate instead of coalescing the calculations into one pass.
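A toy illustration of pasting a whole loop per calculated value versus coalescing the calculations into one pass. The original was a C report over Oracle; this Python sketch uses made-up data and hypothetical names, and only counts rows read. In the report described, each pasted copy also re-ran the query, so the real gap would be far larger than the 3x shown here.

```python
# Hypothetical report data: (region, amount) rows. Table stands in for a
# query result and counts every row it hands out, so the cost of each
# approach is visible.


class Table:
    def __init__(self, rows: list[tuple[str, float]]) -> None:
        self.rows = rows
        self.rows_read = 0

    def __iter__(self):
        for row in self.rows:
            self.rows_read += 1
            yield row


def report_copy_paste(table: Table) -> tuple[float, int, float]:
    # One full pass per figure, as if the loop had been pasted per value.
    total = 0.0
    for _, amount in table:
        total += amount
    count = 0
    for _ in table:
        count += 1
    largest = float("-inf")
    for _, amount in table:
        largest = max(largest, amount)
    return total, count, largest


def report_single_pass(table: Table) -> tuple[float, int, float]:
    # All figures accumulated together in one pass.
    total, count, largest = 0.0, 0, float("-inf")
    for _, amount in table:
        total += amount
        count += 1
        largest = max(largest, amount)
    return total, count, largest


ROWS = [("north", 120.0), ("south", 80.0), ("north", 45.5), ("east", 10.0)]

slow, fast = Table(ROWS), Table(ROWS)
print(report_copy_paste(slow), slow.rows_read)   # (255.5, 4, 120.0) 12
print(report_single_pass(fast), fast.rows_read)  # (255.5, 4, 120.0) 4
```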

    Nowadays a similar peeve I have is over-abstraction of code, where the nesting often gets more than 15 layers deep to service what would normally be a simple method. I've seen programmers who write 2-3 line "methods" and think that makes them "good programmers." Unless it makes code more obvious in function and more readable, abstraction should be used where NECESSARY, not as a general coding philosophy.

    1. Lilolefrostback

      I do, on the whole, like the idea of shorter, simpler functions. However, out of curiosity, I recently wrote a program twice - once to just get it to work and once using the "shorter, simpler" paradigm. The former took maybe two hours (tested, documented, deployed). The latter took two days (tested, documented, deployed) and was more than twice as long.

      Sometimes it's better to be simple about things.

      NB: for a much larger program, shorter simpler functions would probably be desirable.

      1. Anonymous Coward
        Anonymous Coward

        About 15 years ago I was a student and spent a lot of time programming a TI graphing calculator. (Excellent training: it was space AND speed limited, so any nontrivial program HAD to be optimized, by hand since there was no other option.) I wrote a maze-generating program. It took forever to run, and frequently made mazes with no path through them. So I rewrote it, and it was much faster and generally solvable. So I rewrote it again, and it was faster still and generated a maze with exactly one solution and no loops every time. Rewriting it allowed me to not only improve it but learn better ways of doing things.

  17. xyz123 Bronze badge

    I worked for a fruitbased phone company. Hired to fix some iCloud security issues. Basically employees could easily access ANY MeCloud account and do whatever they wanted with it. Pointed out security flaws in system. Was promptly let go on full pay as my solution would have closed the hole AND logged whoever tried to access cloud accounts internally...managers wanted to be able to look at celebrity nudes, but block employees from same.

    Think the whole Jennifer Lawrence thing was external? There's every chance it was internal.....

    1. heyrick Silver badge

      I think the obvious question here ought to be - why aren't uploaded photos encrypted using the account password or somesuch?

      1. TimMaher Silver badge

        Fruitbased company

        Especially as they now appear to be scanning your phone for pervy pics.

        Mine’s the brown mac with the string belt.

  18. I Am Spartacus

    Got to love VB. Keeping contractors employed for decades

    My best one is when I inherited a complete trading and logistics system written in VBA with an Excel front end. Oh, and with approx 20 inter-linked spreadsheets.

    Nightmare. It was still running when I left, because the team that used it actually loved the fact that it was built for them, by a contractor they employed directly. The first IT knew of its existence was when the contractor left and they had to ask IT to take on support. My attempts to have it rewritten into something more supportable were not accepted.

  19. aldolo

    very common ...

    Done that many times, but without a breakpoint.

  20. Stuart Castle Silver badge

    Not, strictly speaking, fixing someone else’s problem, but a few years ago, when I was doing tech support in a computer lab, I got a user coming in asking about a problem with PowerPoint. I asked her to show me the problem, so she took me to the computer she was logged into. In a lab, over the other side of the building, supported by another team. She said they’d asked her to come see me (specifically).

    Not being happy that someone had assigned me some of their work, I walked into their office asking why this user had been sent to me. The team manager explained that the user had asked about “multimedia” (they counted PowerPoint as multimedia) and as my job had “media” in the title, they assumed it was my responsibility. I pointed out that PowerPoint is part of Microsoft Office, something *they* were responsible for. The manager apologised, and took charge of supporting the user. Apparently their reason for assuming I was responsible for supporting PowerPoint? The fact it could play audio and video. Something which, even then, most Windows applications could do.

  21. VE3ID

    Somebody else's disaster

    If the disaster is in hardware, it's much harder to fix! I was once doing tech support for a distributor of DEC-compatible add-ons. We sold an 8-port DZ11 to a government agency where all techs had to have a 3-year College Diploma in electronics. They installed it, and it didn't work. Getting them to scope the lines over the phone, nothing made sense. Site visit needed. The problem was obvious on arrival: the panel had male DB25s and the user needed female. Not understanding the difference between DTE and DCE, he took 16 connectors and soldered them back-to-back to make 8 adapters. Unfortunately, this connected pin 1 to 13, 2 to 12, 3 to 11 etc. Not sure if he unsoldered 16 times 25 wires or threw them out and ordered new ones. Not my problem, as I left after testing it with loop-backs to make sure he hadn't blown anything!
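The pin scramble can be written as a small mapping. The top-row rule (1 to 13, 2 to 12, 3 to 11, and so on) is from the comment; extending the same mirroring to the 12-pin bottom row (14 to 25, 15 to 24, ...) is an assumption, and `mirrored_pin` is a hypothetical name. The signal names are the standard RS-232 DB25 assignments for pins 2, 3 and 7.

```python
# Pin mapping for two DB25 connectors soldered back-to-back as described.
# The top-row rule (1->13, 2->12, 3->11, ...) is from the comment; applying
# the same mirroring to the bottom row (14->25, 15->24, ...) is an assumption.


def mirrored_pin(pin: int) -> int:
    """Pin reached on the far connector for a given near-side DB25 pin."""
    if 1 <= pin <= 13:  # top row, 13 pins
        return 14 - pin
    if 14 <= pin <= 25:  # bottom row, 12 pins (assumed same mirroring)
        return 39 - pin
    raise ValueError("DB25 pins are numbered 1 to 25")


# Standard RS-232 DB25 assignments for a few key signals.
SIGNALS = {2: "Transmitted Data", 3: "Received Data", 7: "Signal Ground"}

for pin, name in SIGNALS.items():
    print(f"{name} (pin {pin}) arrives on pin {mirrored_pin(pin)}")
```

Signal Ground on pin 7 happens to map onto itself, while the data lines land on pins 12 and 11, which would go some way to explaining why scoping the lines over the phone made no sense.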
