No change control? Without suitable planning, a change can be as good as an arrest

Anyone who has worked in medium or large organisations will know that there are three levels of change control when it comes to code: (a) the organisation doesn’t have any, (b) the organisation has change control but does it sub-optimally, and (c) change is managed well. Anyone who has worked under more than one of these three …

  1. Headley_Grange Silver badge

    Change vs Configuration

    Getting an organization to think in terms of configuration management instead of change management is key, in my experience. Change is just one aspect of configuration management and if you can get the whole team - including marketing, support and senior management - to understand the configuration needs of a product it certainly makes change management easier and will reduce change and support costs.

  2. Mike 137 Silver badge

    Not just about code

    I once worked in an organisation that had just implemented a shiny new computerised "change control system". It allowed submission of change requests in entirely free form except for a row of loosely specified compliance-specific tick boxes (e.g. "PCI-DSS relevant?") that relied on the understanding of the submitter and were not verified thereafter. The requests were then discussed for approval at weekly change control meetings. So the big problem was that the scope of discussion (and therefore the approval decision) was limited by the requester's terms of reference. This was exacerbated by introduction of "standard changes" - broad categories of change such as "firewall rule" that could be prioritised for fast implementation with almost zero discussion. So everyone started defining their requests in terms of one of the standard changes in order to expedite their requests.

    I was the first to raise the problem, when a "firewall rule standard change" turned out to be required to permit an otherwise unreported new POS terminal to connect to the card data processing core. I pointed out that this affected the PCI-DSS scope, and therefore could not be considered in isolation as a standard change. I made myself very unpopular and change control rolled on as before.

    So as well as Dave's three levels, there's another - believing you have change control but failing to recognise that it doesn't work.
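
    One way to picture the check that was missing: a request only gets the fast "standard change" route if none of the systems it touches sits inside a compliance scope. This is only a sketch. The category names, system names and both sets below are invented for illustration and do not come from the organisation described.

```python
# Hypothetical data: pre-approved categories and systems inside PCI-DSS scope.
# Both sets are assumptions made up for this example.
STANDARD_CATEGORIES = {"firewall rule", "dns record"}
PCI_SCOPE = {"card-processing-core", "pos-terminal"}


def route_change(category, affected_systems):
    """Return 'standard' or 'full review' for a change request."""
    touches_pci = bool(set(affected_systems) & PCI_SCOPE)
    if category in STANDARD_CATEGORIES and not touches_pci:
        return "standard"
    # Unknown categories, or anything touching the scope, get discussed.
    return "full review"
```

    It still depends on the affected systems being listed accurately, which was the weak point in the story above.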

    1. Anonymous Coward

      Re: Not just about code

      "So everyone started defining their requests in terms of one of the standard changes in order to expedite their requests."

      I do this as much as possible.

      It's funny how much can be done by linking enough different standard changes.

      CAB changes are a total pain: you have to find out who needs to authorise something and then get them to approve the change, all before it is even discussed at CAB.

      I'd rather write the change and submit it to CAB to discuss; they can then gain the approvals from the business and notify me when I can do the change for them.

  3. My other car WAS an IAV Stryker

    ALL engineering needs configuration management of the design, whether hardware or software/firmware.

    I've worked major projects with "full" but inefficient CM that tied everyone up in meetings and layers of problem report (PR) > change request (CR) > change notice (CN) with the Change Review Board (CRB) involved at each step. This was in parallel with the Problem Review Board (PRB) that handled Test Incident Reports (TIRs), assigning them for engineer analysis and discussing/approving the results, leading to issuing Failure Analysis & Corrective Action Reports (FACARs) to the customer. Often the FACAR's corrective action approved by PRB included the PR fed straight to the CRB for action. Most of 2014-2017 was tied up in one or both of the processes (lots of meetings) and it was all tracked in Siemens Teamcenter.

    I've done other tasks -- both at that same employer and my current one -- where engineers are getting hands-on doing the work of technicians and almost nothing is being documented. Wire harnesses, mechanical brackets, fluid fittings/hoses/pipes are being built and fit as needed and 3D models, 2D schematics, and software code/config lists are woefully incomplete. As such, when something goes wrong -- which it always does -- it's impossible for anyone from the outside to track down.

    I've seen proper CM performed yet ignored, when the "shop boys" grab a transmission control module (about the size of a large USB hard drive) to go with the new, upgraded powerpack, only to find during road test that it shifts funny (or not at all under full throttle). It had the config of the previous engine (different redline speed), and no one checked the vendor's part number on the TCM label, which literally included the configuration identifier. I identified the problem and all was resolved quickly, but I also learned I had to check it myself before each prototype left the high bay.

    Just in the last few months I have had a manager who demands changes quickly but doesn't allow for proper checks of the changes, then often enough sends the wrong (older) version out for customer review. It's quite embarrassing in review meetings to point out mistakes, especially his mistakes, and I'm certainly not earning any respect from him or the customer, but someone has to be focused on what's technically correct.

    Tech teams of all flavors -- engineers, technicians, architects, scientists, and management -- who aren't doing CM at all -- or poor CM, or proper CM that gets ignored during build -- are not doing their company any favors. Proper review boards, or for smaller teams a single final approver, are a good call that I rarely see.

  4. doublelayer Silver badge

    And also don't be simplistic

    "Anyone who has worked in medium or large organisations will know that there are three levels of change control when it comes to code: (a) the organisation doesn’t have any, (b) the organisation has change control but does it sub-optimally, and (c) change is managed well."

    And anyone who has worked in more than one knows that there are a lot more than three options and there's not a nicely compartmentalized right one. What the article lumps together as option B includes a lot of different ways to do change control wrong which have no similarity to one another. It's not three buckets. If we're being simple, it's a one-dimensional scale with the best points being somewhere in the middle.

    You can have no change control. You can have change control which doesn't require notification of others or thorough attention to the required steps. That's what the article mostly talks about when it's describing incorrect application. But you can also have change control which is too strong, either because nothing can get done because change control is too onerous (and if that happens, don't expect stagnation, expect circumvention), or change control which puts a lot of responsibility on people unrelated to the change requiring a lot of explanation of the change to people who won't understand it and certainly won't identify problems. Or you could have change control which is implemented correctly in the sense that changes have to be reviewed but is incorrect because the focus is on approval by committee and not the method of anticipating or responding to problems.

  5. IGotOut Silver badge

    You forgot the 4th

    So completely over the top that an incredibly minor change requires 4 hrs of admin, 6 reschedules (due to Dave being on holiday, the person who didn't read it in time, or the cleaner's cousin finding a puppy and needing to get a collar for it), 50 approvals, of which you will have to chase 49 at least 3 times, the endless "what does this bit mean?" (usually from some manager who wouldn't understand it if you smashed them over the head with it written in crayon on a baseball bat) and finally, 5 minutes before you go live, it gets cancelled because Mary from accounts forgot she has an urgent appointment to attend (the hairdresser's).

    Oh, did I mention you can't resubmit the thing again, for "security reasons", meaning you have to type everything all over again?

    /rantover

    1. Ken G Silver badge

      Re: You forgot the 4th

      This is sometimes appropriate, when the benefits for any change are small and the likely impact of any problem is large, such as when it's a heritage system that was tuned for decades and there are only a few people left in the company who know the technology.

      1. Tomato42
        Meh

        Re: You forgot the 4th

        You really think anybody would go through such an ordeal to fix a typo or make any other small change?!

  6. Dante Alighieri
    Thumb Up

    Not just IT

    I work in a service industry with some highly technical kit and procedures.

    At present we allow anyone to change the "recipe" of how we do common things or leave the technical staff to guess which recipe to use.

    Really helpful article for me. I've been considering how we do change management and this has given me some ideas.

    I will propose a change board which should lock out the randomness and at least give us a fighting chance of making and reverting change in multiple areas.

    Should reduce the human death rate. (not a joke)

    1. TDog

      Re: Not just IT

      Then if you have the authority: learn about change control and thoroughly understand it; lock down your change process till you understand it; and kick the arses of those twats who have not even thought about it. If you don't have the authority and you are serious about this putting lives at risk, either go directly to your chief executive officer and tell them what is happening, or become a whistleblower, which may cost you your job but save lives.

      Seriously, what sort of organisation is this fucked up after all of the other public examples? And I worked in the NHS and have seen total fuck ups.

    2. Anonymous Coward
      Go

      Re: Not just IT

      > At present we allow anyone to change the "recipe" of how we do common things or leave the technical staff to guess which recipe to use.

      Introducing change control will help you document what has been done but, for things like deciding which recipe to use in the first place, something as simple as a decision tree might be 90% of what you need.

      The technical staff follow the decision tree when deciding which recipe to use so that the choice can be justified later if needs be.
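
      As a rough illustration of how small such a decision tree can be, here is a sketch that walks yes/no questions and records the path taken, so the choice can be justified later. The questions and recipe names are placeholders invented for the example, not anything from the poster's workplace.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Recipe:
    name: str


@dataclass(frozen=True)
class Question:
    key: str          # identifier of the yes/no question being asked
    if_yes: "Node"
    if_no: "Node"


Node = Union[Recipe, Question]

# Hypothetical tree: both questions and all recipe names are placeholders.
TREE = Question(
    "urgent",
    if_yes=Question("specialist_available", Recipe("recipe_A"), Recipe("recipe_B")),
    if_no=Recipe("recipe_C"),
)


def choose_recipe(node, answers):
    """Walk the tree; return the chosen recipe name and the path taken."""
    path = []
    while isinstance(node, Question):
        answer = answers[node.key]
        path.append(f"{node.key}={'yes' if answer else 'no'}")
        node = node.if_yes if answer else node.if_no
    return node.name, path
```

      Calling `choose_recipe(TREE, {"urgent": True, "specialist_available": False})` returns the recipe together with the list of answers that led to it, which is the part worth keeping on record.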

  7. Do Not Fold Spindle Mutilate
    Flame

    Level 99: Management actively tries to stop controlling changes.

    I am retired because:

    1) Management said give all the passwords to production to the consultant. Your job is not to look over his shoulder. You will start supporting the application in production at 8 am tomorrow.

    2) Management threatened to have me fired when I was clearly stating that the disaster recovery site cannot recover our production systems.

    3) Management chose to not put quantity and quality into the testing procedures. I was told that if the change does not work in production it should be rolled back. Management did not understand that rolling back the change would require a service outage of more than 24 hours and did not know if the user's application could handle an outage of that length.

    (The manager responsible for disaster recovery was eventually fired. A whole bunch of professional DBAs left because of management wanting speed, not quality.)

    1. Anonymous Coward

      Re: Level 99: Management actively tries to stop controlling changes.

      @Do_Not_Fold_Spindle_Mutilate

      Quote: "....management wanting speed not quality...."

      1. PROCESS

      In a financial service company, long ago and far away, the Operations Director told me that "We don't need any f***in' process."

      2. MODERN PROCESS

      Ah.....now why is it that there are no comments here about "agile", "scrum", "devops".........

      .......and other "modern" practices?

  8. blah@blag.com

    CM is hard

    So hard in fact that we had to implement it twice, but we got there. Lessons were learned ...

    - Different parts of the IT function do things their own way, it's best to match their process as close as possible while introducing the extra controls.

    - Dealing with regulation (Japan SOX in our case) introduces extra bureaucratic burden. We decreased this burden by defining Pre-Approved Changes for regular defined operations, which could be applied by certain people and then post-approved/reviewed by the management team.

    - Writing changes, test documentation, roll-back plans is very hard for a lot of people. Having structured documentation and reference examples is essential.

    - Always be optimising your processes ... until they are good enough, then stop.

    - Really, seriously, examining your processes gives you massive insights into how your dept works and where changes are needed.

    - No company/dept/section/team is the same, there is no one size fits all set of processes, if your software can't match how you work then change your software*

    - User training is vital.

    - ... many others.

    When you get it right there are big benefits in overall operational efficiency and accuracy, and the extra reporting we got to implement gave us some very nice insights. There were fewer incidents and faster turnaround of changes, which let us all work more on interesting projects and play with shiny toys.

    * I spent 6 months reviewing over 60 different ITIL packages; the final 3 were ServiceNow, BMC and IBM's offering. ServiceNow was by far the best. It wasn't a painless transition but it was worth it: we dumped an old ITIL package which we had pushed to its limits and a vendor that couldn't deliver. Whichever software you go for, you'll only get out of it what you put in. Implementing something like this requires effort from the whole dept and, more importantly, their buy-in to get it to work, so that means you need good leadership from the start.

    1. Anonymous Coward

      Re: CM is hard

      ServiceNow was the best? Either the rest must be complete and utter garbage or we are using ServiceNow very badly!

      1. Anonymous Coward

        Re: CM is hard

        Years ago I worked at a client that used Remedy. It's one of those products that is tailored to the CxO procurement process: does it have this? Yes. Tick. Does it have the other? Yes. Tick.

        Consequently there was one screen with about 3 fields actually dedicated to taking info from the person with the problem and allowing the analysts to make notes. Reading other people's notes was, of course, incredibly tedious so no one ever did it when a punter phoned-up to chase a call.

        On the other hand there were at least 93 fields split over multiple screens dedicated to giving statistics on calls - time open, whether the SLA was being breached, etc etc.

        If only they'd put that much effort into the screens that supported the end user experience.

  9. TDog

    Different Places

    I recollect going to a CAB at a large insurance company and putting together a document following their standard model. It was actually quite a good model and forced me to consider some aspects I would not otherwise have done. At the end of the presentation (about 50 minutes) and subsequent grilling it was agreed to go ahead. As I left the room one of the CAB members wished me "good luck".

    I went mildly ballistic, pointing out that had I done my job properly there would be no luck required or involved (I was about 50 then and still cringe at the memory) and their job was not to wish me "good luck" but to bloody well ensure I didn't need it. That got a few startled looks.

    Of course, no one had informed me, or noticed, that the SQL "USE XYZ" statement was obsolete: they had altered their server naming convention about 3 months before and hadn't got round to updating their documentation. (In fairness, making changes to databases was seriously frowned upon and all of the hoops were about how, why and, fuck me, what will this do to us, so even creating another table and not altering existing processes was hard work, as it should be.)

    Fortunately the DBA who ran the script with me (and had been at the CAB) burst out laughing when it failed and a swift amendment in notepad saved my face.

    But any CAB that thinks its job is to wish owners of potential changes "good luck" has a totally inappropriate view of their role. They are there to ensure that luck is not needed.

    (Bit of a rant but true)

  10. Logiker72

    Aerospace, Medical, Automotive, Railways

    These industries manage to specify, design, test and produce rather reliable and safe machinery. Including lots of complex software such as ABS brakes and flight control computers with full authority to move control surfaces, engine valves etc.

    Boeing messed up MCAS, but thousands of other control unit types (and dozens of millions of instances) nicely work in cars, airplanes, patient monitors and so on. Most of us use an ABS brake every day.

    So we know almost perfect quality is possible. It is a matter of documentation, skilled engineers, sufficient and skilled testing. Sufficient project funding and time, of course. That is the good news.

    Bad news is that non-safety critical development operations are managed by cheapskates and idiots in most instances.

    Some further reading:

    https://en.wikipedia.org/wiki/V-Model

    https://en.wikipedia.org/wiki/ISO_26262

    1. Logiker72

      Re: Aerospace, Medical, Automotive, Railways

      Essentially, a safety-critical control unit has an automated test system (often called a HIL simulator) which tests each and every feature of the control unit, as specified in the requirements document. So for each piece of software there exists a mirroring piece of test case software in the HIL system.

      https://de.wikipedia.org/wiki/Hardware_in_the_Loop

      Each requirement is numbered and change controlled.

      New requirements must be assessed and documented, and tests written for them in the automated test.

      Each new SW release must be fully tested against the automated test battery before it is tested in vehicle/test patient/aircraft.

      Critical modules of software must have a sufficient battery of module/unit tests to check the module requirements, which are also nicely written down in a system like DOORS or Polarion.

      All test cases must be green before release into service with normal passengers/patients.

      If you do this for IT-class software, you can have similar quality levels.
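
      The release gate described above can be sketched in a few lines: every numbered requirement needs at least one test, and every test must be green. The requirement IDs, test names and tuple layout are assumptions for illustration; this is not how DOORS, Polarion or any HIL system actually exposes its data.

```python
def release_blockers(requirement_ids, test_results):
    """List the reasons a release must not go out; an empty list means go.

    requirement_ids: iterable of requirement IDs, e.g. "REQ-001" (hypothetical).
    test_results: list of (test_name, requirement_id, passed) tuples.
    """
    covered = {req for _, req, _ in test_results}
    # A requirement with no mirroring test case blocks the release.
    blockers = [f"{req}: no test case" for req in requirement_ids if req not in covered]
    # Any test that is not green blocks the release as well.
    blockers += [f"{req}: {name} failed" for name, req, passed in test_results if not passed]
    return blockers
```

      The point of returning reasons rather than a plain yes/no is that each blocker maps back to a numbered, change-controlled requirement.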

      1. Logiker72

        Re: Aerospace, Medical, Automotive, Railways

        https://en.wikipedia.org/wiki/Flight_control_modes

        http://peterklant.de/wp-content/uploads/2019/05/PF-2019-03-A380-Flightcontrol-System.pdf

        https://en.wikipedia.org/wiki/DO-178B

        https://en.wikipedia.org/wiki/Software_verification

  11. Lucy in the Sky (with Diamonds)

    Deep down, I seek out enablers…

    I have spent thirty years contracting to big corporates, and I would not have done it if there were no great enjoyment, which there was.

    The only suffering that I encountered was the change control process at some places, which was painful to say the least. On one occasion I spent six months trying to shepherd a retrospective change through the channels, regarding a server that I had restarted months earlier, but none of the stakeholders were willing to allow me to restart the server. I could not get through to them that it had already happened, and they flatly refused to sign off on my retrospective change.

    Other places, they looked at me and said, “you are the senior engineer, you want to change stuff, go ahead, it is your job, let us be…”

    Deep down, I seek out enablers…

  12. Joseba4242

    Risk Questions

    The two risk questions typically asked (and postulated in the article) are "how likely is it to go wrong" and "if it does, what's the impact".

    These questions have fundamental flaws. Firstly, there isn't a single dimension. A typical change has some low-likelihood chance of something going a little bit wrong, and a very-low-likelihood chance of going very seriously wrong. These can't be put into a single answer.

    Secondly, both of these questions are next to impossible to answer objectively and hence with a degree of repeatability. Different engineers will, perfectly legitimately and competently, come to different answers. For example, one engineer might consider a typo in applying a change, with impact to one particular service, as the impact to focus on, as it's the most common scenario. Another would instead consider triggering an unknown software defect that impacts a whole host of otherwise not directly related services, as this is the worst-case impact. In either case it's difficult to see how "likelihood" could be objectively described.

    Thirdly, these questions put the focus in the wrong place. They focus on the expected outcome of the change, which encourages not just wishful thinking but also a focus on the "expected unexpected" outcomes.

    I am advocating two different questions instead: "How quickly do you notice a problem?" and "How quickly and reliably can you roll back?". These two questions are considerably more objective, and they drive good behaviours such as focusing on monitoring, which might otherwise be overlooked. Crucially, they encourage thinking about worst-case service restoration, which is often most relevant for the business: think about 5 minutes of worst-case impact vs. an hour of worst-case impact. So these questions focus on dealing with the biggest issues in a mature change control environment: the "unexpected unexpected".
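
    To make the two questions concrete, here is a small sketch that records the answers for a change and flags it when the worst case looks too slow. Treating worst-case restoration as detection time plus rollback time, and using a "rehearsed" flag as a stand-in for "reliably", are my own simplifications; the field names and the limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRisk:
    name: str
    minutes_to_notice: int      # how quickly would a problem be noticed?
    minutes_to_roll_back: int   # how quickly can the change be rolled back?
    rollback_rehearsed: bool    # crude stand-in for "reliably"

    def worst_case_restoration(self):
        # Assumption: service is restored once the problem has been
        # noticed and the rollback has completed.
        return self.minutes_to_notice + self.minutes_to_roll_back


def needs_more_work(change, limit_minutes):
    """True if the rollback is unrehearsed or restoration exceeds the limit."""
    too_slow = change.worst_case_restoration() > limit_minutes
    return (not change.rollback_rehearsed) or too_slow
```

    A change that is noticed in 2 minutes and rolled back in 3 passes a 15-minute limit; one that takes 30 minutes to notice does not, however quick the rollback, which is where the push towards better monitoring comes from.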

  13. Giles C Silver badge

    My approach

    Having spent far too many changes trying to fix problems caused by said change I now go by the following for the documentation for a change.

    Now I am a network and firewall admin, so…..

    Removing firewall rules

    - make sure there is a document detailing all the rules you are removing so if one rule that was showing 0 hits in the last year is needed next week, then it is in the document to be copied and reinstated.

    Adding firewall rules

    - same applies: document everything.

    Routing changes

    - put in the output of the show route command, highlight the lines to be changed, list the commands and how to check and roll back if it doesn’t work.

    - if changing external bgp use something like looking glass to verify the change has gone global

    Change procedure

    - the most important one: make sure the procedure is written so that an idiot (you) who is half asleep (3am change start) can follow the instructions without needing another document or anything else to refer to. Full rollback plans are included, along with the tests that prove or disprove it is working and what to do (PANIC!,!,,!,,,!)

    Then I ensure that I have talked to the relevant people and got them to approve the change, so CAB should be a formality.

    And the most important bit is the change window, which is calculated on the simple formula.

    (Est time) x 2 is the change window.

    If you think the change will take an hour, allow 2; if you think it will take 3 hours, allow 6. If you do it quicker, that is good; if you overrun, that is bad.

    That extra time is for rolling back.
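
    The window rule is simple enough to write down. In this sketch the window is twice the estimate, as above; treating the end of the estimate as the point where you decide whether to roll back is my own reading of "that extra time is for rolling back", not something stated in the rule.

```python
from datetime import datetime, timedelta


def change_window(start, estimate):
    """Return (rollback_decision_time, window_end) for a change.

    start: datetime when the change begins.
    estimate: timedelta for how long the change is expected to take.
    """
    # Assumption: if the change is not done by the estimate, start rolling back.
    rollback_decision_time = start + estimate
    # The window itself is the estimate times two.
    window_end = start + 2 * estimate
    return rollback_decision_time, window_end
```

    For a 3am start and a one-hour estimate, this gives a decision point at 4am and a window that closes at 5am.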

  14. uccsoundman

    We don't do change control and you can't make us.

    A long story can be summed up by this: "We don't need to test, and we will not submit to change control. We're Agile, and agile can ignore all processes". Trouble is the PHB is only worried about "speed to market".
