back to article Updating in production, like a boss

Testing in production has always been a thing, sometimes by accident and sometimes because the powers that be cannot be bothered with multiple environments. And sometimes things go wrong. Welcome to Who, Me? Our tale, from a reader Regomised as Nicolás, takes us back through the decades and to the glory days of Microsoft SQL …

  1. Dave K

    No test environment and no backups - yet important enough to make national news if it went TITSUP? Assuming this story is indeed true, the management involved redefines the meaning of the word "incompetent".

    1. Anonymous Custard
      Headmaster

      It's the kind of scenario where I'd be tempted to wait for half an hour without doing anything affecting the db, then take it offline and ring back and tell them something disastrously bad had happened to it.

      Then pause and comment - "At least that's what may happen if we do this live on your production database without any backups. Now do you want to reconsider this scenario...?"

      Management seem to like their role playing training exercises, so maybe spring one on them to give them a proper taste of the risks they're taking?

      1. big_D Silver badge

        The problem is, you try that and the answer is usually, "fine, we'll find someone cheaper who will do it without making unreasonable demands!"

        1. DJV Silver badge

          Good idea

          It then becomes SEP.

      2. Bogbody

        You are much more likely to be offered

        G.O.N.

        or

        D.C.T.

        Or, if very lucky....

        D.C.M.

    2. big_D Silver badge

      Unfortunately, it happens all to often.

      I worked at a software company that sold turn-key systems to customers. But there was only the live environment, all the testing was done in-house on our development system. Updates were just applied to the customer systems.

      And, if something went wrong on the customer system, well, there was a SQL window with a hidden keystroke to open it, where us support people could really balls things up, if we weren't careful. There were lists of commands for common faults, but you still needed to enter the right transaction numbers and parameter values, mistype one of those, or miss one out altogether, and you'd have an unholy mess on your hands. A colleague and I used to always double and triple check each other's commands, before we would execute them.

      It happened regularly, that the users would make a mistake and we'd have to hand-craft SQL to undo the damage they'd done. No way to test, no backups (they cost money), just get on with it.

      Luckily, I've now moved on and I have the luxury of a test environment and direct access to the backup server to make an image, before doing anything that could have unforeseen consequences.

    3. Sgt_Oddball
      Mushroom

      Strangely enough...

      Everywhere I've worked that had no backups/test server usually got one quite shortly after I arrived.

      Must be a pure coincidence I'm sure.......

      (nuke icon, because I've yet to actually leave a smoking ruin. Yet....)

      1. codejunky Silver badge

        Re: Strangely enough...

        @Sgt_Oddball

        "Everywhere I've worked that had no backups/test server usually got one quite shortly after I arrived."

        One place I worked at had a test server right up until we got a new manager who didnt order one when we were upgrading to a new production server (no matter how many times I told him we needed it). Shockingly it went to hell quickly as it appeared this manager didnt actually do anything of any use.

        I left when they decided he would stay.

    4. anothercynic Silver badge

      Well, it took me a while in a previous job to manage to get a pre-production (staging) environment approved... Because the people in charge felt it was a waste of resources and since the service was new, people wouldn't mind it having (breaking) updates made to it. Lesson learned from that slow-motion car wreck: When you debut a new service, make sure it's stable, end of. When you make breaking changes, inform the users you do have to help them make the migration painless. And for God's sake, do NOT, I repeat, do NOT use the open source style "oh well, if it breaks, meh, they're not paying for it" mantra, for it will cut the service off at the knees while you're trying to get it adopted.

    5. Anonymous Coward
      Anonymous Coward

      Hi

      "Nicolas speaking" : You'd be surprised at the way Spanish IT firms worked those days. It has improved a little since those days. At least you can get at least get 1 backup these days.

      We didn't get a proper backup because the only storage disk dedicated to that very own purpose was failing and had been sent back for replacement. It took 1 month to get a new one. We were ORDERED to keep working and had it signed on paper. It wouldn't have mattered...we would still have been considered guilty.

      Event today, getting a "spare" environment or storage space for large backups can be tough as it seems easier for management to pay 1K € on a bowling event, and lots of drinks, than the same amount on temporary cloud storage. If you "don't play ball" they hire someone else.

      We inherited the database, it was CRAP, many missing indexes, foreign keys, duplicated data, not properly normalized, but it did it's job...in a way. Performance was so terrible that some annual legally mandatory reports took 4 hours to load, (it went down to 3 minutes after fixing).

      The fact that data was duplicated saved our day. Basically all daily operations, no matter which ones, were inserted into a large monster table were they were later processed and put into its own place, (acted like a gigantic log file).

      In case you still are in disbelief about current state of the Spanish IT art I suggest you check recent events and take a look at how hackers owned many public services systems in the last 24 months.

      IT managers in Spain are usually more interested in their own short term objectives, looking good on pictures, dating the occasional good looking guy/gal and taking credit for some others people's achievements, and saying the are "women/LGTBi friendly" if they can't they will say a friend of his/her is the one who did it.... (I know of a manager who got pregnant while tutoring an intern....)

      They can't differentiate a windows service/unix daemon from a web service, but who cares?

      If you complain, you are branded as a bad team player and will suffer the consequences.

      Even if you have the luxury of having a HR formal enquiry and are found not to be guilty....you will be punished, unofficially.

      (I was once punished for making the right numbers when asked about Azure storage costs...they wanted it cheaper...numbers were not important --> I was branded as not being a Team Player and not having the companies interest in my mind).

      So this little story while it may seem outlandish was not that outlandish in Spain in the late nineties

      Regards

      "Nicolas"

  2. Nick Ryan Silver badge

    I'd be almost rich if I had £1/$1 for every time I've seen database uses which perfectly demonstrate that the developer didn't have a clue about SQL whatsoever, let alone the specifics of MS-SQL.

    From horrors such as an absolute lack of referential integrity (no linked tables whatsoever), to company owners who insisted on browsing the SQL data directory to open up the "tables" directly and so often, the devopers who just do not understand that SQL operations are set based and not procedural.

    ...and the perpetual bugbear? No underlying fallback to a unique sort order in display results. Want to order by name? Fine, but make the last sort order column a unique record ID to ensure that the search results are consistent.

    1. My-Handle

      DING!

      Every one of your points there hit home, perfectly describing the mess I walked into 5 years ago. Admittedly, I didn't know a whole lot about SQL Server databases at the time, but even I boggled at the fact that the database in question had nearly 100 tables, none of which were linked. To top it off, the database designer had used an aggressively large varchar for every field, regardless of the actual data type of the contents (including yes / no). The code that went with it was worse.

      Excuse me, I need to go somewhere quiet until the PTSD flashbacks go away...

      1. Nick Ryan Silver badge

        Aggressively large varchar is not as bad as my current issue... a database where the clueless developers have used varchar(max) for almost every damn char based column. Gender recorded as a single char? Use a nvarchar(max)...

        nvarchar(max) has its places, but using is for everything is ridiculous for performance reasons. Fetching a column that is nvarchar(max) takes about 7-8 times longer compared to where the data is stored inline. And this is for a database where the database server has been given enough RAM to load the entire database into memory.

        1. Loyal Commenter Silver badge

          nvarchar(max) is not as bad as you might think in terms of performance. If the data is small (for example 'Y'/'N') then it is stored in row, not off-page.

          However, there are other fundamental reasons why it is the wrong field type if you know the maximum length of the data. In the case of a single character for gender, or a short identifying code (e.g. 8 character account number), storing it in the right field type (in these cases, char(1), or varchar(8)) means you can include it in an index*. nvarchar(max) cannot be indexed, so if you have a transactional table with a lot of records where you need to retrieve or update a single record quickly based on such criteria, or do some filtered reporting, you're looking at table scans if you don't have appropriate indexes.

          *Of course, there's very little point in indexing on a column that might only have two evenly distributed values, such as 'M' or 'F'. The query engine will look at your index, laugh, and do a clustered index scan anyway. If you have a small number of possible values, and they are not evenly distributed, an index might be useful in finding the rare values though.

          1. Nick Ryan Silver badge

            nvarchar(max) is not as bad as you might think in terms of performance. If the data is small (for example 'Y'/'N') then it is stored in row, not off-page.
            If only things were so simple... MS-SQL server does try to do some clever things when it comes to the (max) data type columns, however as soon as there are too many of them in a table definition then the MS-SQL optimiser seems to go "I give up, you made this mess, you deal with the consequences".

            There are various table options that can be used, either to force large data columns out of row storage or to ask MS-SQL to try and store large data column values in-row ("text in row" table option). The table option "text in row" is also set to be removed in a later version of MS-SQL so relying on it is very much not a good idea.

            Testing these options are a little tiresome as because the data storage method is only changed when the column data is updated, which is a simple enough query but is not quick and locks the entire table just in case anyone tries this and wants to continue with other testing while the update is applied.

            From my observations and testing, using Microsoft SQL 2016, as soon as a column is specified as nvarchar(max) query performance goes to hell. Changing the column datatype to something sensible like nvarchar(30) makes a considerable performance improvement.

            In short, nvarchar(max) and so on are not evil as such as there can be good and genuine uses, but performance wise it's like using variants all the time.

          2. Ken Moorhouse Silver badge

            VARCHAR(MAX)

            Another problem with fields bigger than they should be is that people start sticking extra things into the field. I wrote a stock control/order processing system where everything was properly normalised and linked relationally. There were sometimes situations where orders were split: my design coped with that by having a linked table, however because the order numbers weren't purely numeric (and were different lengths for different companies) I was having to use VARCHAR (not max, though, sensible sizes). "Someone" had the habit of putting two orders into one record segregating the order refs with a slash, and abbreviating the second reference. She then used to complain that she couldn't find the singular reference numbers when doing searches.

            I like to use Firebird SQL, which at the level I work at, is just one file. Shut the service down, copy the FDB and restart the service. Backup done.

            1. Nick Ryan Silver badge

              Re: VARCHAR(MAX)

              Oh no! Table data stuffing... shudders... it was only a couple of years ago that I came across a moron developer thought that using tables and rows and columns was far too inconvenient therefore stuffed data in an exported XML stream into a single column.

              I still see so much similar stupid done in databases, where individual columns are stuffed with multiple row data rather than using the database to store the rows. Master > Detail is such a basic concept but lost on some.

              As for users stuffing too much data into a field, the number of arguments I've had about the storing of data in free-type text fields. If they need to store something in a structured way, store it in the database properly, not as a random-ish mash up of inconsistent characters stuffed into free-type text fields. Which the inevitably want to report on and then get annoyed when data is missing due to their typos...

        2. My-Handle

          Ouch

          I will raise you one worse... a function in code entitled CheckIfColumnExistsAndIfNotCreateIt(columnName As String).

          And three different columns for the same bit of data, each with subtly different misspelled names. Which one got updated / read largely depended on what bit of code was being run.

          1. Scott 53

            How about working on a database where the person who defined the schema was dyslexic, and you can't assume anything is spelled as you would expect?

            1. Nick Ryan Silver badge

              Dyslexia seems to be rather common in technical fields... unfortunately SQL queries and data definition editors don't have spell checkers, unlike source code. OK, source code isn't exactly a spell checker but spelling mistakes are flagged up and it's often a simple matter in a modern IDE to rename a local variable of function. Doesn't help so much with anything that's published though...

            2. Anonymous Coward
              Anonymous Coward

              As a dyslexic I simply can't take offence at this point of view the misspelling makes it difficult for us dyslexics too, I do at least try to use a spell checker or have someone else check, the main problem is if you can't spell a word how does one know the spell checker has corrected it to the right one?

              1. Sgt_Oddball

                On that note...

                I don't really care in the code if somethings misspelt. So long as it's consistently misspelt.

                As the end of the day a server doesn't care what a function, stored procedure or column is called.

                It does however care about grammar... And encoding... And commands spellings....

            3. GroovyLama

              In one of our products, we have a database table that links to an address book table using the "Address ID". Unfortunately due to a spelling mistake not being noticed, we are instead referencing that Address using the "Adderss ID".

              We always look back and wonder if that particular dev was thinking of "Adders" or "Udders" at the time they did that!

              1. Sudosu Bronze badge

                Maybe they had been watching a Rowan Atkinson show.

          2. yetanotheraoc Silver badge

            Three card monte

            "three different columns for the same bit of data"

            It's called ABC-normal form, usually abbreviated as AB-normal.

            1. Nick Ryan Silver badge

              Re: Three card monte

              Oh noes... now I'm remembering the horrors of people trying to stuff multiple, independent status values into a single column. Kind of works when there are just two status values as they just multiply together and as long as there aren't too many it sometimes works out OK. As soon as they want to add a third status though it's just a mess. Pretty much a really bad idea all the time.

        3. Yet Another Anonymous coward Silver badge

          >Gender recorded as a single char?

          Leaving only 26 possibilities ? You fascist oppressor

          >Use a nvarchar(max)...

          May still not be enough

          1. Anonymous Coward
            Anonymous Coward

            Can't decide whether to upvote or downvote. If serious, downvote. If joking, upvote. Please remember to flag your jokes!

          2. A.P. Veening Silver badge
            Joke

            Leaving only 26 possibilities ?

            Using both upper and lower case and digits there are already 52 possibilities. And in some cases "?" will do nicely.

            I do agree with you though, not nearly enough ;)

            1. Yet Another Anonymous coward Silver badge

              Not going into the politics of what lower vs uppercase would mean.

              As a physicist I probably identify as 0x27

            2. W.S.Gosset

              >Using both upper and lower case and digits there are already 52

              *cough*

              62?

              Plus punctuation, I guess.

              1. A.P. Veening Silver badge

                You are correct with that 62, was a typo and I noticed it outside the edit window. As for some reason all my posts have to be approved by a moderator now, I wasn't going to waste effort posting a correction.

          3. claimed Bronze badge

            No, no, no!

            I'm not having this. A prime example of the worst sin of all in database creation: not using a unicode encoding.

            There is more to the world than American English.

          4. Trixr

            Late to the party - it's one of my pet peeves. Unless required for actual regulatory or health care purposes, who cares? And if it is regulatory, there are your definitions, set in stone (ok, a few jurisdictions have unbent enough to include 'X' - not too taxing).

            What end users normally care about is someone's title of address. I personally get disappointed when I don't see "Major General", "The Most Reverend" et al in the picker.

      2. Anonymous Coward
        Anonymous Coward

        Yeah this sounds like a piece of personal conveyance management software I used to work on roughly a similar number of years ago. The best thing about this was not just the sheer number tables which although had foreign keys they weren't normalised. Every instance was customised for each customer and they had an update tool which would add and remove table columns sometimes on a per customer basis. (shudders)

        Oh yeah and one of the classes was 20,000 lines long split into two partial classes!

      3. Anonymous Coward
        Anonymous Coward

        re: My-Handle

        Oh God! I know exactly how you feel. Its like the developers don't actually know what they're doing, understand what they're actually doing, or why they're doing it. And their boss...FFS, why is he in charge???

        1. Yet Another Anonymous coward Silver badge

          Re: re: My-Handle

          You new here ?

        2. dvd

          Re: re: My-Handle

          Never had this conversation?

          Boss: Know anything about <technology>?

          You: No.

          Boss: You're the expert now. There's no money for a course, online training or even a textbook. Read up online. You're giving a presentation on Monday.

          1. Nick Ryan Silver badge

            Re: re: My-Handle

            Not quite, but nearly! Similar situations though and before too long I usually wind up being the subject matter expert and often wind up knowing more about it than the developers (if software anyway).

          2. Anonymous Coward
            Anonymous Coward

            Re: re: My-Handle

            You were lucky!

            I worked for a company that had two offices - one technical, other sales etc. Sales office started selling company X graphics cards. I was very familiar in company Y cards, repairing and tech support.

            I only found out that we sold X cards from a couple of customers ringing me directly for tech support! Ran the MD to ask for a X card to play about with - even just for a few days, but was refused.

            This was in the early 90's - before the internet took off, so couldn't even download a manual. There was no way I could help customers and when they rang up asking for help told them the truth.

          3. Trixr

            Re: re: My-Handle

            That is literaly how I got into IT.

            I was a *copy editor*, then correcting SGML markup, then within a year, I'm running NT domains in a different country and getting to grips with token ring and IBM connectors.

      4. Doctor Syntax Silver badge

        "the database in question had nearly 100 tables, none of which were linked"

        I've dealt with a product of that scale although it had a decent schema and worked well. The only issue was that, running on Informix, it had been written in the pre-cost-based optimiser days and the vendor wished to stay there so they forbade customers from running UPDATE STATISTICS. The SQL was written to hint the query path to the engine. I recognised the technique because I also go back to the pre-cost-based optimiser days. Step outside the limits of what could be done with that approach and the performance was a disaster.

        When I arrived there were a few home-written reports that they couldn't run - if allowed to complete they'd have taken well over 24 hours. In my second - or maybe even late in my first week - I took a look at them and found the query plan generated took absolutely the worst route it could, index searching almost the entirety of small tables and joining large tables to them with a sequential search. Temp tables to the rescue.

        Eventually the vendor decided that as the cost-based optimiser had been around for over a decade and probably worked they'd allow it.

      5. Sudosu Bronze badge

        It almost sounds like someone converted their Excel inventory to MS-SQL

        1. jake Silver badge

          But ...

          ... Excel is a perfectly good database! Why spend money on MS-SQL?

          If you don't know what Poe's Law is, look it up before responding. Ta.

          1. W.S.Gosset

            Re: But ...

            Ying Ying Iddle I

            1. W.S.Gosset
              Facepalm

              Re: But ...

              ...

              That should have been:

              Ying TONG Iddle I

              ...

              1. Sgt_Oddball

                Re: But ...

                What a lovely lovely boy!

      6. yetanotheraoc Silver badge

        Clever

        "an aggressively large varchar for every field"

        One of the clever analysts wanted to design a production database this way, asked me what I thought. I told him there was a reason for defined types and his database would not perform well. When I realized he wanted it this way so he wouldn't have to do any value or type checking in the data entry GUI, I started avoiding him.

        1. My-Handle

          Re: Clever

          I fully expect that the database I refer to was built with the same idea in mind. Most of the fields were varchar(500), but some were 5000. After a while you saw the pattern... the original designer had used 500 up until he'd hit a field that needed more than 500 characters (a product description or something). Every field subsequent to that was 5000.

          There was also no sanity checking or validation in the code. Any typo made on the site would just be swallowed by the database. Don't get me started on date (varchar) fields and the myriad of formats used.

          Several points in the code just spat data at the database in the order that the fields appeared in the UI. This worked just fine, until they then changed the order of the fields and suddenly five address fields got rotated 2 places (house number in city field, city in postcode, postcode in address line etc). AND CODE HAD THEN BEEN WRITTEN TO TEST AND ACCOUNT FOR THIS DOWN THE LINE!!!

          <ok, breath... breath.... calm face>

          I sometimes question exactly how skilled a developer I am, wonder how many best practices I don't know about or don't follow. Then I remember how much worse it could be.

          1. Loyal Commenter Silver badge

            Re: Clever

            If you *must* use an arbitrary field length, make it a power of two, then at least it *looks* like you know what you're doing.

    2. big_D Silver badge

      Been there done that.

      I spent months specing up a system for food tracing integrity. Made a data model with referential integrity and traceability, definitions of all the fields and a bunch of classes and methods to go with it.

      Management took one look at it, threw it out and cobbled together a database without any referential integrity and no controls... For food tracking through production... What could possibly go wrong?

      1. Doctor Syntax Silver badge

        We can use Excel as a database so we don't need all that complexity....

        1. Anonymous Coward
          Anonymous Coward

          >We can use Excel as a database

          Up there with,

          "I'll only put it in a little bit" and "I didn't think anyone was using it" as prefix for a disaster

          1. Zarno
            Mushroom

            "I'll only put it in a little bit"

            I'm immediately thinking of an arc flash from a 1.5KV electrical cabinet when someone stuffs the ammeter probes across two legs.

            Icon because, well, that's the result usually.

            1. Ken Moorhouse Silver badge
              FAIL

              Re: someone stuffs the ammeter probes across two legs.

              London Underground signal engineering staff used AVO's for fault-finding. Never tested this myself but these were supplied with all the Current options disabled, as it is akin to causing a short-circuit, which could mean a "wrong-side" signal failure.

              (No grim reaper icon?)

    3. logicalextreme

      Of course in this particular case they were able to reconstruct the data from elsewhere in the database, which strongly implies that normalisation was, er, jodido.

      1. Loyal Commenter Silver badge

        Why have one version of the truth, when two or more, possibly conflicting versions will do?

        1. Yet Another Anonymous coward Silver badge

          More truthiness = better

    4. Diogenes

      he developers who just do not understand that SQL operations are set based and not procedural.

      Was on a maintenance project back in the days when DB2 was new. I got a program that overran its allotted slot in the nightly job queue by 6 hours which meant the queue didn't finish until around 10am, instead of 5am, and was told to "make it go faster- you have a week".

      Opened the code "Select everything from parent" for each row "select children" for each child "select grandchildren" for each row "if grandchild value = x then doY else if value = a then doB " where doB updated one column in grandchild, and doY updated another. When I stopped laughing, I rewrite this mess as two update queries which ran in 30 minutes, went to DBA & suggested an additional index, and it then ran in 5. It took longer to do the code review (how the original stinker ever passed is a mystery), and do the change control paperwork than it took to do the fix.

  3. Potemkine! Silver badge

    We all know never to test or develop in production. But sometimes management is reluctant to pay for all those extra environments

    In that case, ask them to write they refuse to pay for a test environment and will be accountable of any problem related to development made directly into production.

    Words are not enough. It has to be written.

    1. Anonymous Custard
      Headmaster

      And to personally name, date and sign it...

      1. John Young 1

        In blood...

        1. The Oncoming Scorn Silver badge
          Pint

          I wish to write them in blood — your blood, to be precise.

          In managements blood...

          FTFY

          Icon from the Blackadder episode the title was taken from.

          1. jake Silver badge

            Re: I wish to write them in blood — your blood, to be precise.

            Management has no blood.

    2. Doctor Syntax Silver badge

      "Because I say so"

      1. veti Silver badge

        In that case, you don't need them to write it. You can do it yourself.

        "Per your request, we will be making these changes in production with no rollback mechanism." If you can show you sent that email, that's as good as receiving it.

        1. imanidiot Silver badge

          At the very least sending such a mail shows that you thought about these problems before. However, depending on how things turn out someone might still give you the blame for "not refusing to do the work knowing it was dangerous to proceed".

          1. John Brown (no body) Silver badge

            So you finish the email with "...and so I strongly recommend not proceeding and will await your reply instructing me on what to do next"

    3. jake Silver badge

      "Words are not enough. It has to be written."

      Indeed. Back in my 9-5 career, I very much enjoyed thwacking manglement over their collective head with large piles of wet-ink paper-trail.

      It's amazing how heavy a signed post-it can feel when wielded appropriately.

      1. Mr Army

        *wielded appropriately

        *wielded appropriately

        As per BOFH

        Affix Post-it Note to a 3ft length of 2"x4" before wielding

        1. jake Silver badge

          Re: *wielded appropriately

          Any buffoon can use a club.

          It takes skill to know when and how to present a post-it for the most dramatic affect.

          1. Anonymous Custard
            Trollface

            Re: *wielded appropriately

            But there are times (and skull thicknesses) when only a clue-by-four will do...

            1. quxinot

              Re: *wielded appropriately

              Homework for the day:

              How many post-it notes, combined with an adequate amount of glue for creating a solid block, and perhaps a piece of rebar for stiffening, are required to create a proper LART?

              (A lathe may be used to create a more comforting handle for the wielder.)

              1. jake Silver badge

                Re: *wielded appropriately

                Why would one pad perfectly good rebar with paper?

                Besides, I maintain that a signed confession on a single post-it makes more impact than any number of clumsy clubs.

            2. jake Silver badge

              Re: *wielded appropriately

              A non-Buddhist philosopher said to the Buddha, "I do not ask for words; I do not ask for non-words."

              The Buddha just sat there.

              The philosopher said admiringly, "The World-honored One, with his great mercy, has blown away the clouds of my illusion and enabled me to enter the Way."

              And after making bows, he took his leave.

              Then Ananda asked the Buddha, "What did he realize, to admire you so much?"

              The World-honored One replied, "A horse runs even at the shadow of the whip."

              1. Anonymous Coward
                Anonymous Coward

                Re: *wielded appropriately

                I can imagine a more smart-alec reply - "He didn't ask anything, so I didn't answer." Or, more in the spirit of the original tale, "Ask a non-question, get a non-answer."

  4. blah@blag.com

    Back-End Issues

    A pretty common experience I suspect. But having a dev & test environments is no sure thing either. On one SQL/CR combo I used to re-up the dev server by droping the tables and then batch import new data, that was fairly routine until of course one day I failed to recognise I was on the Prod box. I think of these moments as a "Sphincter Loosener" when your heart misses several beats, disengages from your chest wall and drops into your stomach and attempts to push all contents out of the "back-end".

    This particular incident stress tested my rebuild scripts and took the best part of a day IIRC. I explained to my manager something like that I'd found a bad flaw in the database structure, that the flaw meant all the CR financial reporting was incorrect and that only my prompt intervention saved the day.

  5. Already?

    Surely anyone faced with doing updates on a live box will run the query first as a Select instead of an update to check that an expected quantity of rows are updated and repeat that a couple of times, and then run the Update inside a transaction with a Select immediately after on the expected updates to confirm which then finishes with a Rollback after the check to undo it? Then do that again, and again to check and then once more to make sure. Then step away, have a coffee and then check again. And get someone else to eyeball the script that you're about to run.

    We've all been there, I certainly have. The sense of trepidation as you're about to hit Execute is enough to want to be absolutely sure that you're happy with what you're about to inflict on the database.

    1. Doctor Syntax Silver badge

      "The sense of trepidation"

      That's because you have a suitable degree of paranoia, the first requirement of any DBA. Unfortunately it's not part of the certifications that HR check at recruitment.

      1. HandlesMessiah

        THANK YOU.

        I teach people how to be DBAs for a Very Large Software Company, and my boss once yelled at me for telling my students I was there to teach them the appropriate personality defects to be effective DBAs, including paranoia and control-freakism.

        1. yetanotheraoc Silver badge

          Nice username

          Of course your boss yelled at you. Control-freaking is a management-only function. And now he is watching you like a hawk.

      2. jake Silver badge

        As I tell people when I implement a belt & suspenders (belt & braces to you Brits) solution to avoid losing data during system migrations and the like, "I'm not quite paranoid yet .... but I'm getting there".

        1. W.S.Gosset

          :) When you find yourself saying:

          "I'm not quite paranoid yet .... but I know people think I am."

          , you'll know you're getting close.

          1. A.P. Veening Silver badge

            you'll know you're getting close.

            With "I know I am paranoid, but am I paranoid enough?" you are getting really close.

            1. W.S.Gosset

              Actually, a twist on that'd make a great graffiti to put up randomly in strange places:

              "You know you're paranoid, but are you paranoid enough?"

              .

              Or perhaps "We know..." ? Hmm

    2. veti Silver badge

      You absolutely should do all that, yes. But if you're sure of what you're doing, you've already made several changes without a hitch, you're anxious to get home early. .. It's possible to get sloppy.

      And even if you don't, all it takes is to miss a line in the part of your query you highlight before pressing F5. Been there, done that.

      1. Already?

        "...all it takes is to miss a line in the part of your query you highlight before pressing F5."

        I once did that, after doing all the careful checking etc. I was happy, everything was right, the delete affected only the required rows and then as I reached over to hit F5 I caught the mouse with my sleeve and in doing so highlighted just the delete clause leaving the all-important Where excluded. I could sit there and try to repeat that trick a million times and fail every time, but not this one.

        As soon as I realised I hit the Stop button and prayed for MSSMS to halt execution, hoping the much vaunted Atomic All or Nothing promise would in fact work. It did, but it was only due to the production db having many millions of rows that there was time to catch it before it completed.

    3. My-Handle

      I long ago got into the habit of typing the WHERE clause first in an update statement, before then going back and entering the SET clause. Even on local / inconsequential DBs. It's not fool-proof, but it's saved me a couple of times.

      1. Captain Scarlet Silver badge

        Actually yes I'll always do this as well, however I also tend to just a select * into tablename_date just in case as well.

    4. Loyal Commenter Silver badge

      And if you're extra paranoid, you'll do a

      SELECT * INTO BACKUP_TABLE FROM ... 
      as well before screwing with that data in the table you're doing the DELETE or UPDATE on, and keep hold of that backup table until you're certain you've not screwed up. There's nothing like a fallback from which to retrieve the records you accidentally deleted, although you might need to do a bit of jiggery-pokery to disable identity generation when putting them back where they came from...

    5. eionmac

      QUOTE

      We've all been there, I certainly have. The sense of trepidation as you're about to hit Execute is enough to want to be absolutely sure that you're happy with what you're about to inflict on the database.

      UNQUOTE

      This is a gross understatement.

  6. ColinPa

    The random expiry time

    A large bank used a messaging product to send information around its systems.

    Some of these "what is my bank balance" messages were not important, expired after 30 seconds and were thrown away - the requester could always resend.

    Some of these were "transfer $100 Million" messages - and these were not allowed to expire. They were logged to disk.

    Unfortunately a teeny-weeny application change put a random expiry time into the data.

    The first the bank knew there was a problem when there was a phone call "Where is my $100 Million?"

    I worked for the company that provided the messaging system. I got a call 10:00 from a stranger who explained the problem and asked if I could go on site to help.

    I said I was happy to, but the customer would have to pay for a plane ticket etc. (That usually puts people off). I strolled over to my boss to give him early warning - but he was away from the office. I got back to my desk and had an email " there is a ticket for you on the 1200 flight to ... if you can make this it would be great"

    This is when you think "They are serious". I left a note on my desk, booked a taxi - rushed home, picked up my passport, and change of underwear etc. I got the (business class wow!) flight with minutes to spare.

    At the far end I was met and taken to the customer site. I knew the confidential layout of the log records on disk, and between us we came up with some rules - if this value is .. and that value is - then print out this other data. Some people then worked through the nigh and re-entered the expired data.

    By 0900 the next morning, it was all fixed.

    They then told me the true scale of the problem. They had "recovered" billions of dollars which was good. The banking auditors were due to arrive that day at 1200 for the annual review. If they found there was money missing, the bank would have been closed down.

    1. John Robson Silver badge

      Re: The random expiry time

      "The banking auditors were due to arrive that day at 1200 for the annual review"

      That would explain business class flights.

      1. Strahd Ivarius Silver badge
        Trollface

        Re: The random expiry time

        and would have explained also flights going in the other direction

      2. John Brown (no body) Silver badge

        Re: The random expiry time

        "That would explain business class flights."

        Do they get there quicker than cattle class because they are nearer the front? Or just more likely to have a vacant seat?

        1. KittenHuffer Silver badge

          Re: The random expiry time

          I think it's more the case that the business class traveller is more able to step off the plane and get right to w**k.

          If you been corralled with the cattle class on a long (12+ hours) flight then you need 12-24 hours to get your sh*t together before you're capable of w**king again.

    2. Anonymous Coward
      Anonymous Coward

      Re: The random expiry time

      Nice of them (I think) to tell you the scale of the problem after you fixed it.

      No pressure and all that!

  7. Anonymous Coward
    Anonymous Coward

    Done it, learned from it

    Working on a meteorological database in the field. Database was in a data centre in Reading, I was sat on top of a truck chasing storms in Kansas. We had had some inconsistencies on one sensor. Quick SQL query to take them. Put for the sake of making the public facing graphs look better, eg no jumps from 25C to - 127C. Ran query, expected to loose 20 odd rows...

    Lost 45,000, in fact All of the data for that asset in the feild.

    Thankfully backups. And a week in the office after writing procedures and documenting stuff so it wouldn't happen again. At least us small businesses tend to learn...

    1. anothercynic Silver badge

      Re: Done it, learned from it

      Hi there ECMWF ;-)

      1. The Oncoming Scorn Silver badge
        Holmes

        Re: Done it, learned from it

        Well that narrows down the list of potential AC's.

      2. jake Silver badge

        Re: Done it, learned from it

        I suspect that's Reading, Kansas. US. Not E(CMWF).

  8. Anonymous Coward
    Anonymous Coward

    Experience saves nobody

    Having been around the block a few times now, on the odd occasion - only when timebound by ridiculous management - do I do exactly this error - have a plugin that checks my statement and warns if it's going to overly destructive.

    I've now made a point of sending an email, asking for confirmation that it's to be run on the prod system, with all the caveats it entails.

    Rarely do I get a committal at that point from management to actually run it. Then, it's a how many days to test.

    This isn't so much a CYA, but more a point of the driving the point...'Do It Properly'

    As a result, we are training management - who should know better - but you put their ass on the line..............

  9. Anonymous South African Coward Bronze badge
    Coat

    DBA needed to install Informix on a new server on site, and transfer data from old server.

    SSHd in to the server, set up Informix etc, did an oninit -iy to prepare and bring the DB online...

    ...only to find it was on the production server.

    Oninit -iy initializes the database. Like formatting a disk.

    And yup, everything was gone. Backups was a futtup.

    Took two software support techs a couple of days of SQLing to get redundant data from other sites into this site and get everything up and running again.

    Wasn't me... I was at the pub that day. Going there soon.

    1. Doctor Syntax Silver badge

      No backups? It should have been possible to restore everything except any transactions in the last logical logs.

      Perhaps it's safest to do these things through onmonitor. You can look around and see what's happening & maybe notice you're on the production instance.

  10. Anonymous Coward
    Anonymous Coward

    BOFH torpedoes the test environment

    At a long-bought-out outfit I worked for as the Testing Manager, I sensed that a test environment was necessary, and was told by my manager to submit a proposal. I costed it, and selected a certain then-on-top Big Blue company to provide it.

    We were on track to go, and had the cash ready to pay for it, when the BOFH told me that the kit, which weighed around 5 tons, couldn't be installed in the system room as the floor wouldn't handle the extra weight without massive reinforcements that would cost double the price of the kit.

    We went without the test environment. Another BOFH triumph!

    1. KittenHuffer Silver badge

      Re: BOFH torpedoes the test environment

      He didn't torpedo it .... he just you informed you that there was a lack of support for your new server!

  11. macjules

    There are two types of database programmer in the world

    3 when you add Oracle ones, Those who have missed a critical filter in the WHERE clause of an UPDATE, those that will do so at some later point in their career and those who will deny that they made a mistake and blame it on the crap DB solution.

    1. Anonymous Coward
      Anonymous Coward

      Re: There are two types of database programmer in the world

      There are also those who don't use transactions and use "autocommit" because that's how their old dBase/Paradox/Access/mySQL database worked....

      Not that SQL Server 6.5 helped much - it escalated locks so quickly any transaction opened for more than a few seconds had good chances to lock whole tables...

      1. Down not across

        Re: There are two types of database programmer in the world

        There are also those who don't use transactions and use "autocommit" because that's how their old dBase/Paradox/Access/mySQL database worked....

        It is not necessarily quite so "simple". For example Sybase, by default, runs in unchained mode so transactions (if not explicitly started) will commit automatically. However that does not prevent you from explicitly starting a transaction with BEGIN TRAN which you will then need to either commit or rollback.

        You can of course also set your session is chained mode if you so wish.

  12. GroovyLama
    Facepalm

    Racing to stop one of these from happening

    Had a slight reversal of the scenario at a previous employer.

    We had been struggling at our customer site with a problem with the reference data we had loaded. It always seemed like updates to some of the original data loaded was not taking effect. The changes were visible in the Product Catalogue Database, but once deployed into the rating databases it wasn't taking effect. The struggle went on for a week or so, until I found the problem. By this time I was back at our office in a different country, so I was supporting the team remotely.

    The product in question used a special table to track the "version" of reference data, it allowed the capability to be able to deploy reference data changes as their own release, but also be able to roll back to a stable version if required. I found there was a case where there were duplicate entries in the table for the original reference data loaded. When updates were being made, our applications were able to correctly track the change with one record, but the duplicate(s) were being left in situ with no update on them, which was then causing havoc.

    I wrote up a pretty long SQL script to identify the affected records and passed this on to PM's and the developers investigating the issue with me. The response I got back was a "thanks for the help, you've found a defect affecting multiple customer sites, and we have built a patch to fix the issue. It's being deployed to customer Production sites globally ASAP".

    my manager was a bit concerned about:

    a) how did they write a patch so quickly, they don't usually do that.

    b) they seem to be rushing this out a bit too quick for his liking.

    So he asked me to verify what they had built. I went trawling through the patch to find a tiny little SQL file in it, with nothing but this:

    DELETE FROM holy_grail_of_tables;

    COMMIT;

    I had to go chasing after him to tell him to make sure the patch wasn't deployed anywhere. Luckily he was quite well connected in the company and was also on the phone with someone with some influence, so managed to get that patch rollout stopped before it did any damage.

    After that the devs weren't allowed to apply any fix for this issue until it had been specifically reviewed and OK'd by me. I seem to remember that I had to write out the majority of the correct patch for them too - but that might be me remembering myself as the only hero of the story :p

    Icon as that is what the developers at that place used to make me do quite often.

    1. ColinPa

      Re: Racing to stop one of these from happening

      Reminded me of the customer when the database ran out of disk space, and had recovery problems, so they though they would delete the recovery logs to free up some space!

      Quote "It worked before". "In production?" "No on a test database"

  13. Anonymous Coward
    Anonymous Coward

    Please excuse my lack of knowledge

    Please excuse my lack of knowledge - but does COMMIT and ROLLBACK not exist on non Oracle databases? Or CTAS (Create Table AS <your where clause>)?

    All sounds a bit odd to me (Oracle Database Performance Consultant).

    1. A.P. Veening Silver badge

      Re: Please excuse my lack of knowledge

      Those commands do exist on DB2 (and DB2/400).

    2. W.S.Gosset

      Re: Please excuse my lack of knowledge

      (I should have read before posting -- see my post below)

      "Transactions" are part of the Relational standard, which all RDBMSs strive to meet. Rollback is trivial: all RDBMSs manage that.

      Oracle fails to usefully achieve full Isolation of its Transactions, though. (AKA full Predicate Locking.) For full Isolation, you have to restart the server with an option set to True, and then your locking Granularity is Table... Ie, touch a table: you lock that table. Ie, never used in real-world. So all Oracle dbs need programmatic handling of this and few people fully understand it, so all Oracle dbs are at risk of subtle but godawful semantic corruption in high concurrency environments. (This may have changed: I last checked a few years ago, no change after 20+yrs...)

      Re your Performance role: you might "like" (or not) to read my other post re Sybase's optimiser's path-dependent multi-codebase approach. Argh.

      1. W.S.Gosset

        Oh, and

        Create Table As (Select blah) typically hits permission problems in Prod: Resource authority is typically locked down.

        But yeah use it to dump a quick copy into TempDB of every table you're touching. And make damn sure you don't close that session's window ;)

  14. MarkET

    Excuse my ignorance

    But when you SNAFU a system in real life, e.g. in a factory, you know about it. You might survive to tell the tale....some induction courses aren't pretty.

    1. Boris the Cockroach Silver badge

      Re: Excuse my ignorance

      Especially when the boss tells you that there is'nt time to validate and slow mo the code to prove

      A. it works

      B. It wont destroy anything (or anyone in the process)

      Playing catch with a 100lb tool turret after its rammed a chuck spinning at 2000rpm is NOT fun. especially when it comes to changing your underwear

      1. A.P. Veening Silver badge

        Re: Excuse my ignorance

        Playing catch with a 100lb tool turret after its rammed a chuck spinning at 2000rpm is NOT fun. especially when it comes to changing your underwear

        It becomes fun when your boss, who told you there wasn't time to validate and slow mo the code, has to play catch, especially when it comes to him having to change his underwear. If you are truly lucky, he even will learn from it though it is more likely you will acquire a new boss, the old one having decided to try his luck elsewhere.

  15. roytrubshaw
    Headmaster

    "There is a happy ending to the story: he was able to reconstruct the borked data from the content of other tables ..."

    So the database wasn't in 3rd Normal Form then? :)

    1. spireite Silver badge

      That is what I refer to as 'First Abnormal Form'

    2. Anonymous Coward
      Anonymous Coward

      Nicolas here...

      Nope....normal forms were a distant legend there....the guy who had created the database knew little about them

  16. Tim99 Silver badge

    Logs

    Transaction log recovery to a point in time? Or "someone" truncated it after "Oh No"?

  17. W.S.Gosset
    Facepalm

    Transactions...

    God save us from 3GL database "experts".

    He _could_ have done this safely in Prod via BEGIN TRANSACTION. Even Sybase/SQL server could cope limitedly with that.

    Check your work on another terminal session using Dirty Reads.

    If good: COMMIT ; else ROLLBACK.

  18. W.S.Gosset
    Mushroom

    Sybase was unsafe, full stop

    I banned Sybase from use in our Australian office as soon as I tried it for something nontrivial.

    It could flip into Cartesian Products on nontrivial predicates if you re-ordered the table-list...

    !!

    Implication: (and matching its then-industry-leading performance) it had entirely different codebases for different query plans. Presumably each specially optimised.

    And some of them were unpredictably --and silently-- catastrophically buggy.

    1. W.S.Gosset

      Re: Sybase was unsafe, full stop

      I am intrigued by the downvotes. I never knew Sybase had fanboiz.

      1. jake Silver badge
        Pint

        Re: Sybase was unsafe, full stop

        Near as I can tell, EVERYTHING has fanboiz.

        Even the systemd-cancer. And apparently even Sybase! (Who knew‽) Wear your downvotes with pride for trollingdrawing both of them them out of the closet.

        This round's on me, fanboiz included :-)

  19. Jiggity
    Thumb Up

    Classic

    "There are two types of database programmer in the world. Those who have missed a critical filter in the WHERE clause of an UPDATE, and those that will do so at some later point in their career."

    This raised a rueful chuckle; I'm happy to admit that I am in the former camp (and even happier that it was not quite so mission critical...)

    That said, I have personally fixed at least one such bork that was mission critical, which always serves as a useful booster vaccination against such activities!

  20. Aitor 1

    His name was not Nicolás

    I think I know the pair,and how it was solved.

    Lets say the borked table was the in use representation but there was enough data in other tables to fix it.

    And not only "Nicolás" fixed it, but also solved the accumulated rounding errors. But what moments of terror!!

    I have seen the same happen in oracle, and I did the same on an old IBM host.. on production, on a gvnt database.

    In my case, I did have a backup, but the whole organism main task would have to stop for hours and data would be lost.. so I managed to fix it with sqls.

    1. Anonymous Coward
      Anonymous Coward

      Re: His name was not Nicolás

      Hahahaha Indeed you know me

      1. EnviableOne

        Re: His name was not Nicolás

        Yeah the regonimiser got some accents

POST COMMENT House rules

Not a member of The Register? Create a new account here.

  • Enter your comment

  • Add an icon

Anonymous cowards cannot choose their icon

Other stories you might like