Only appropriate icon
Have one of these, "Bert" -->
The days are drawing in and the mornings are getting darker – so why not take a dose of Who, Me? to help lighten up your day. In this week’s instalment of El Reg’s weekly column filled with embarrassing reader cock-ups, we meet “Bert”, who was working at a large brewing firm at the time in question. “I was a junior dev as …
Early 90s
We had a file system that held Samna/AmiPro documents that constituted all the specs and programming documents.
We were running out of space so I deleted what I thought were some unneeded docs.
Except I ran rm -r on the wrong dir.
I recovered from a 3-month-old backup.
No-one noticed
>Except I ran rm -r on the wrong dir.
>I recovered from a 3-month-old backup.
>No-one noticed
Sure? There have been a few instances where I had to redo an editing job I was pretty certain I had already completed, and the workflow logs indicated I had done so too. Strangely, when asking IT support, the answer was invariably "you must be mistaken." I am not so sure.
And, equally suspiciously, the last backup they could document was about as old as when the editing was supposed to have taken place.
I remember deleting the one and only tape copy of an old database.
Way back in the late 90s we had tons of physical servers, with a tape drive and backup software installed on five of them. I dutifully inserted the tape for that night's backup into one of the drives, right-clicked and chose Erase.
Nothing. No lights on the tape drive to show a job was in progress.
So I repeated the steps.
Nothing.
I did this at least three more times until I saw a tape drive below the one I was working on light up.
Good ol' Backup Exec allowed anyone to point the software at any tape drive on another server in the network, so long as that server was also running BE.
I must have erased that lone backup for the old DB at least four times. That tape was definitely erased!
Once I fixed the error and erased the correct tape I told my boss what I had done. His reaction?
"Good. That's what they get for messing around with the backups and not telling us."
Others do, Sabroni? Really? Care to show me who explained why Microsoft allowed a file vital to the database to be deleted while the system was running? Or is that reasoning, as I suggested, only known to Microsoft?
But thank you for your input. I'm sure it was intended in the spirit of diversity, inclusion, welcomeness, and openness.
"Why would database software be written such that deleting an ancillary file ( such as a log file of historic steps) cause it to fall over?"
Not familiar with that particular engine but one possibility is that it contains the log of current transactions, i.e. WORK that's STARTed but not yet COMMITted or ROLLed BACK. That puts the engine in a bit of a quandary as to what it's working on. The longer term need for the logs is to roll the database forward after a restore from backup and there ought to be a means of archiving logs for use as and when required instead of letting them grow indefinitely.
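For anyone who hasn't met the jargon, here's a minimal T-SQL sketch of WORK that's STARTed but not yet COMMITted or ROLLed BACK - the Accounts table and its columns are made up purely for illustration:

    BEGIN TRANSACTION;
    UPDATE Accounts SET Balance = Balance - 50 WHERE AccountId = 1;
    UPDATE Accounts SET Balance = Balance + 50 WHERE AccountId = 2;
    -- COMMIT;    -- makes the change permanent
    -- ROLLBACK;  -- or undoes it; until one of these runs, the engine needs the
    --               transaction log to know exactly what it is in the middle of doing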
>The longer term need for the logs is to roll the database forward after a restore from backup and there ought to be a means of archiving logs for use as and when required instead of letting them grow indefinitely.
Running a full backup will normally consolidate the transaction logs on a MS database (SQL, Exchange, etc) server. If you run incremental or differential backups, it doesn't.
"Running a full backup will normally consolidate the transaction logs on a MS database (SQL, Exchange, etc) server. If you run incremental or differential backups, it doesn't."
I'm more used to Informix where you back up the logs separately. Also the restore of a full backup plus incremental takes you to a fully check-pointed position. You only need the logs since the last incremental was made to roll forward from there. And you were doing your incrementals every night or even more frequently, weren't you?
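On the SQL Server side the equivalent restore-then-roll-forward sequence looks roughly like this - database name, backup paths and the number of log backups are all illustrative:

    RESTORE DATABASE Sales FROM DISK = 'D:\Backup\sales_full.bak' WITH NORECOVERY;
    RESTORE DATABASE Sales FROM DISK = 'D:\Backup\sales_diff.bak' WITH NORECOVERY;  -- latest differential
    RESTORE LOG Sales FROM DISK = 'D:\Backup\sales_log1.trn' WITH NORECOVERY;       -- roll forward
    RESTORE LOG Sales FROM DISK = 'D:\Backup\sales_log2.trn' WITH RECOVERY;         -- last log, bring it online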
This post has been deleted by its author
"Why would database software be written such that deleting an ancillary file ( such as a log file of historic steps) cause it to fall over?"
Because the transaction log of an SQL server is anything BUT an "ancillary file".
SQL Server transaction logs are actually split into virtual log files, which are used in rotation. All transactions are written to it sequentially, and eventually it wraps back to the start / first virtual file and starts over, overwriting old transactions.
Besides being where current, uncommitted transactions are serialized to allow for rollback, or for recovery in case of power loss / unexpected shutdown, it also serves several other purposes, and that's where "runaway transaction log files" tend to happen to people. Those are actually an indicator that something is not configured correctly or that something else is broken.
The issue is that those virtual log files can only be reused once they can be cleared, and a few things control how and when they can be cleared.
One thing is ongoing, uncommitted transactions. Put simply, if you have 10 GB of uncommitted transactions and 5 GB of transaction log files... well, the file will grow. That's normal, otherwise the transaction can't happen.
But SQL Server also has different recovery models (related to how you back up/restore a database).
A "full recovery" database is designed in such a way that, by combining (full / differential) backups and transaction log backups, you can do point-in-time restores down to an exact transaction.
To do that, however, all transactions must also be backed up! So until a specific virtual file is fully backed up, it can't be freed for reuse. It's typical for fully logged databases to have a very frequent transaction log backup regime (think every 5-15 minutes), because the transaction log must hold all transactions until they're backed up.
If someone sets a database to be fully logged, and then does not get a transaction log backup schedule going, well... Their log file will grow indefinitely, because... well, because that's what they asked it to do.
Another reason is replication. The various replication schemes that exist in SQL Server are based off the transaction log: transactions that are committed on the source server are transferred to the transaction log of the target server and then also committed to the data files of the target server. Any virtual log files that contain data not yet replicated to even just one of the target servers can't be cleared for reuse, so broken replication configurations / failed servers can be a reason for transaction log file bloat.
Oh well, I could keep blabbing about that, but yeah, confusing the transaction log with just "log files" is a classic mistake that bites every SQL dabbler in the ass eventually :)
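If you want to know which of those reasons is currently holding your log hostage, SQL Server (2005 onwards) will tell you directly. A quick diagnostic sketch, not a fix in itself:

    SELECT name, recovery_model_desc, log_reuse_wait_desc
    FROM sys.databases;
    -- Typical log_reuse_wait_desc values:
    --   LOG_BACKUP          -> waiting on a transaction log backup
    --   ACTIVE_TRANSACTION  -> a long-running, uncommitted transaction
    --   REPLICATION         -> transactions not yet delivered to all subscribers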
Not familiar with that particular engine but one possibility is that it contains the log of current transactions, i.e. WORK that's STARTed but not yet COMMITted or ROLLed BACK.
Pretty much spot on. Depending on your database recovery model, it either contains the data for uncommitted transactions, or a log of everything that has been done in the database since the last full backup (enabling a restore to any point in time since then, and fancy things like database mirroring / log shipping).
The former should mean small(ish) log files, which usually contain a fair amount of empty space, corresponding to the largest transaction in the file's history. That space can be freed up by using the database commands to shrink the file.
The latter means big log files, especially if you're not regularly backing up the database.
On a modern file system, the log file will be locked while the database is online, safe from errant presses of the Delete key in file explorer. I'm guessing that on Win2000, not so much...
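A quick way to see how much of that log space is actually in use, on any SQL Server box you can query:

    DBCC SQLPERF(LOGSPACE);   -- reports log file size and the percentage in use, per database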
I've worked with SQL Server since SQL Server 6.5 and never been able to delete a log file when the SQL Server service is running, so I find the entire story a bit unlikely to be honest.
As you say, a SQL Server log file isn't just an ancillary file of historic steps. It maintains the state of active transactions and is a critical part of the database.
This post has been deleted by its author
I might have something of an answer. As a programmer, say you have a log file for activity recording purposes. When the server comes up, you check that the log file is there. If it isn't, you create a new one and the server is good to go.
But when the server is up and running, why check for the _existence_ of the log file? You know it was created at start, so it should be there. You also have a tendency to foolishly assume that the people who will be administering the server have a clue and won't be deleting files willy-nilly.
Also, just recreating the log file and continuing operation is not really a good idea. If the log file disappears while the server is running, it means that there is a problem and it is actually better to crash the server to bring attention to it, rather than just trudging on and ignoring such an issue.
The transaction log file is NOT an "ancillary" part of the database. It's a critical file, which is just as important as the data files, since the state of the database is represented by the combination of data files AND transaction log file. The transaction log file contains changes to the data that have been committed, but that may not have been applied to the data files yet.
A database log file is very different from an OS or application log - despite the common name, it's not an ancillary record of "historic" steps, it's a live running record of ongoing activity, with all the data required either to make actions "done" (committed) or to cancel them by rolling them back while keeping full data coherency. It can also be used to roll forward - replaying the actions to restore the state of the database to a given point in time from a backup which may not contain the latest data. It is more like the "journal" of a journaled file system.
Usually, any change to the database is registered first in the log file, and only later in the data files. And it must be written quickly (that's why log files are often placed on faster storage) and committed to disk, so that if the instance crashes, the database can be put back into a coherent state by reading the log data and reapplying or rolling back whatever is needed - data blocks are often kept in memory for performance reasons and written to disk later. Log files are usually sequential files with a simpler structure that are much faster to write than the much more complex, random-access data files.
Remove it, and the database no longer knows where to write that critical data, and to avoid corruption it will usually refuse to perform any activity until a log file is available again. There may be ways to recreate an empty log file, but of course anything not yet applied to the database will have been lost - and backing it up *immediately* is needed, because there will be no way to roll forward from the last backup.
There are many reasons to protect log files even more than data files...
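For the record, SQL Server does have a last-resort way of recreating a missing log file, but it only works if the database was shut down cleanly, and it throws away the old log chain, so treat it as damage control rather than recovery. Names and paths below are illustrative:

    -- Attach the surviving data file and let the engine build a brand-new, empty log
    CREATE DATABASE MyDatabase
        ON (FILENAME = 'D:\SQLData\MyDatabase.mdf')
        FOR ATTACH_REBUILD_LOG;
    -- Take a full backup straight away: the old log backup chain is now broken
    BACKUP DATABASE MyDatabase TO DISK = 'D:\Backup\MyDatabase_full.bak';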
The reason is that many SQL Server instances are misconfigured and never properly backed up. In other databases, keeping the size of log files under control is easier - especially since you usually wanted to put them on very fast, albeit usually smaller, storage, particularly for systems doing a lot of transactions. But SQL Server was and is often used as a fire-and-forget database, and issues start to arise later.
Once, years ago, I was called to a manufacturing plant far away because their SQL Server, which stored a lot of telemetry data coming off the manufacturing machines, was crawling to a halt. They thought it was some arcane issue that would take days to solve. Actually it was just a few bad DB design decisions and a lack of proper maintenance - after a few hours spent putting it into a good and safe configuration it was fast again - and I'm quite sure the local acting DBA was later called for a word or two...
"I'm quite sure the local acting DBA was later called for a word or two"
The local acting DBA would probably have had a good response if he dared use it: "Why don't you send me on a training course?". That would have put his manglement in a real quandary - if they did that they might have to pay him the going rate for a trained DBA.
"The local acting DBA would probably have had a good response if he dared use it: "Why don't you send me on a training course?". "
A number of years ago we were supposed to be doing an install at a new client. The usual procedure for SQL Server was to send our database creation script to the DBA for comment, with a note explaining that we would work with them to ensure that the resulting database met all requirements. The database creation script was extensively commented.
The result was an extremely irate phone call from the DBA on Friday evening at 5 (so doubleplus ungood). He hadn't read the bit about "for comment". What did we mean sending him this to install? Who did we think we were? There was no way he would install stuff from such totally incompetent people and his MD would hear all about it on Monday.
He continued on transmit only for quite a while and eventually I managed to find out what he was objecting to. We specified the log file dimensions and parameters (with various "insert your preferred value here" comments but he hadn't bothered to read that). Our default maximum log file was 2Gbytes.
His view was that he refused to have log files on databases because they took up room.
So how were we supposed to roll back bad transactions etc.? Nothing to do with him, company policy.
We never did get the job, which was based in an East Anglian city well known for football and insurance. It was eventually leaked back to me that the DBA would be at a loss to display any qualifications on SQL Server. But it did cause me to cancel my car insurance policy and move it to another company.
>The local acting DBA would probably have had a good response if he dared use it: "Why don't you send me on a training course?".
That is a high-risk strategy that could backfire badly. Management might get the idea that they have got this far with incompetent DBAs, so they might just as well fire him and hire in our friend the Anonymous DBA whenever the need arises. And since slimming down the workforce could net them a bonus, I'd say the DBA would be heading for a role as the second part of a three-course meal with an apple in his mouth.
I can do that ... Ever lose control of a handtruck loaded with punchcards containing a post-grad's entire dataset from an overnight run at SLAC? I have. Down a flight of stairs (the elevator/lift was down for maintenance). I was a teenager, landed a position as a summer intern ... I thought for sure I would be out the door.
After giving me shit for about half an hour as we picked up the cards, Don (the post-grad) relented & demonstrated the proper care & feeding of a card sorter, and explained how columns 73-80 were used. I'm pretty sure that's where I started developing my belt-and-suspenders (belt-and-braces to you Brits) approach to data retention. I'm not paranoid about it, but I'm getting there ...
I once worked for a company which used a mainframe in the Midlands to run FEA. One day two engineers set off with a full set of program and data cards in the boot, in two boxes. On the way there, someone drove out in front of them and they had to do an emergency stop.
On arrival on site it proved necessary to turn round and go back again because, of course, the boxes had ruptured and the cards were distributed all around the boot.
After that more care was taken in ensuring data integrity.
The .LDF file is locked by the SQL Server process; it is not possible to simply right-click and delete it.
Correction - shouldn't be possible.
File systems, especially back then, were rather nasty things that took bytes quite literally - often biting chunks out of your data. What should've been impossible and what was in fact actually quite possible at the worst possible time were the same thing :(
In the 90s I was working on Hyperion Essbase, or Arbor Essbase as it was then known, one of the first OLAP systems.
It had a slight problem. Recalculating the Hypercube took <insert swear word here> ages if it was already full - recalculating a full cube would take, for example, 24 hours, whereas after just loading the bottom-row data it would take 1-2 hours.
As the corporate reporting database needed to be recalculated several times a day, obviously re-calculating the full database was out of the question (and this was on a "high end" server, with dual Pentium Pro processors and "bags" of RAM).
So, the "standard" procedure was:
1) Export bottom rows
2) Clear the database
3) Import bottom rows
4) Re-calculate
Being new on the project, yours truly promptly forgot step 1! Yikes!
A quick discussion with my colleague didn't help ease my mind. His advice was to use the backup from the previous run and blame the missing data on (l)user error, as opposed to administrator error! :-O
Instead, I went to the client and informed them that I had borked the export and that we would try and re-assemble the database from the previous exports and the transaction log. It took about half an hour longer than usual and only 2 transactions were missing.
I actually got brownie points from the Finance Manager for being honest...
Back when I was less old, and less bitter and twisted, I remember asking the boss how to shrink an LDF file on an older version of MS SQL.
His response was that I should stop the server instance and delete the LDF file, then restart the service, and it should create a new smaller one...
So I did...
Good thing I copied the LDF file to another location, 'cos when I tried restarting the service it wouldn't come back up, and it definitely didn't create a new fresh transaction log like he said it would!
This ^^^ is the answer, for those of you who cannot for whatever reason do a full backup.
Change the database mode to simple, then run the shrink command on the log file, then change the mode back.
What you should do is a full backup, but if we all had the resources to stick to best practices all the time, then we'd all be a lot happier, I expect.
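In T-SQL that procedure looks roughly like this - the database name, logical log file name, target size and backup path are all illustrative:

    ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;
    USE MyDatabase;
    DBCC SHRINKFILE (MyDatabase_log, 1024);        -- target size in MB
    ALTER DATABASE MyDatabase SET RECOVERY FULL;
    -- Switching models breaks the log backup chain, so take a full backup straight
    -- afterwards; point-in-time restore only works again from that backup onwards.
    BACKUP DATABASE MyDatabase TO DISK = 'D:\Backup\MyDatabase_full.bak';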
One problem is sites that insist on cold backups of database files, instead of going through the database backup utilities or specific backup agents that know about database backup APIs (because of lack of knowledge, and costs...).
So maybe they just stop the instance, back up (maybe just the data file...), and restart it, but many of the "cleanups" a proper backup would do are bypassed.
There are good reasons why SQL Server is by default set up in "full recovery" mode, or Oracle in "ARCHIVELOG" mode. They are safer, but they will also fill your disks if you don't know what you're doing,
and it is true that many applications that install an instance behind your back (e.g. VMware...) won't tell you what to do if you're not an experienced enough DBA with that kind of engine.
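For SQL Server, "going through the database backup utilities" means something along these lines (destination paths are made up):

    BACKUP DATABASE MyDatabase TO DISK = 'E:\Backup\MyDatabase_full.bak' WITH CHECKSUM;
    BACKUP LOG MyDatabase TO DISK = 'E:\Backup\MyDatabase_log.trn' WITH CHECKSUM;
    -- Unlike a cold file copy, this verifies page checksums where available and the
    -- log backup truncates the inactive part of the log, so it doesn't just keep growing.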
With SQL Server 6.5 I was diagnosing an issue that ultimately came down to SQL not having enough memory allocated to it. Back then you had to specify how much RAM you wanted it to use (I assume it's more clever now but I don't really know).
Anyway, it was running on NT4 and I set it to use half the available memory. I forget how much it had - 256 MB or something - but I completely forgot the setting was in 2 KB increments, so setting it to 128 MB really meant 256 MB.
Net result was the machine booted, SQL started and immediately took all the available RAM and the whole thing blue screened. Obviously, this was whilst I was at the client site, whilst remote controlling their server which was in a data center elsewhere.
We had to get a technician out to put more RAM in the server in the end. They weren't impressed...
Anyway, it was running on NT4 and I set it to use half the available memory. I forget how much it had - 256 MB or something - but I completely forgot the setting was in 2 KB increments, so setting it to 128 MB really meant 256 MB.
Yes. Because memory was specified in 2K pages just like with Sybase SQL Server/Adaptive Server Enterprise, that MS SQL Server has its roots in.
Net result was the machine booted, SQL started and immediately took all the available RAM and the whole thing blue screened. Obviously, this was whilst I was at the client site, whilst remote controlling their server which was in a data center elsewhere.
We had to get a technician out to put more RAM in the server in the end. They weren't impressed...
Or you could have just started it in minimal single-user mode with the -f command line option, and then used sp_configure to resize memory to a more acceptable size.
Sybase on UNIX of course would be even easier, as the configuration is always written to a text file on the filesystem, but I digress.
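For the curious, on those long-gone versions the fix looked something like this once the instance was up in minimal mode - the 'memory' value was a count of 2 KB pages, not megabytes, which is exactly the trap above (65536 pages x 2 KB = 128 MB; type 131072 thinking in kilobytes and you've asked for 256 MB):

    EXEC sp_configure 'memory', 65536   -- 65536 x 2 KB pages = 128 MB
    GO
    RECONFIGURE WITH OVERRIDE
    GO
    -- memory was a static option back then, so restart the service for it to take effect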
The great thing about SQL Server is how easy it is to set up and use with a little bit of knowledge. Unfortunately that's also its biggest problem. I've lost count of the number of databases I've seen where it's set to the full recovery model but has no transaction log backup. Back when I first started I was very helpful and would set up the backup. These days I just send a warning email to the DBA... and then re-send it when the server falls over as it's run out of space (I know I'm Evil, but in my defence, I find people learn far quicker from mistakes).
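These days you can spot those time bombs with a quick query against msdb - a sketch, with the one-day threshold picked arbitrarily:

    SELECT d.name, d.recovery_model_desc, MAX(b.backup_finish_date) AS last_log_backup
    FROM sys.databases AS d
    LEFT JOIN msdb.dbo.backupset AS b
        ON b.database_name = d.name AND b.type = 'L'   -- 'L' = transaction log backup
    WHERE d.recovery_model_desc = 'FULL'
    GROUP BY d.name, d.recovery_model_desc
    HAVING MAX(b.backup_finish_date) IS NULL           -- never had a log backup at all
        OR MAX(b.backup_finish_date) < DATEADD(day, -1, GETDATE());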
...especially if they match the amount of RAM.
Classic Who, Me? here...
Things like hiberfil.sys, pagefile.sys on Windows... some other OSes store the entire memory load of the OS in a ginormous file on root that is used when you reboot to commit all the changes... pretty much like Windows, but with much more cryptic names. And no .sys extension to help you.
I heard of such classic instances of someone deleting these key files... and that system refusing to reboot. At all. Or when it did, it looked like something freshly installed, without the millions of patches, updates, and configs tuned to that machine... systems that upon startup are asking for keyboard layout and Time Zones... telltale clues that Very Bad Things Happened™.
I may have done that myself... when Windows was not adamant about keeping those files...
A friend deleted /vmunix on a largish Sun system back in ~1988 ... fortunately on a Saturday afternoon. Also fortunately, he had enough sense to call me before he started "fixing it". All was well come Monday morning, and I had free beer for the rest of the month.
+1. In the classical period (1995-2005) being an enterprise DBA was pretty hard. You had to know your OS inside out, you had to understand how to do fast I/O and then you had to design the database, as in tune how tables should be laid out for maximal performance. Wasnae easy, cap'n.
These days, I'm not so sure (I'm out of the biz). Do Oracle admins still worry about tuning the SGA for example? Does anyone use raw block devices? Or is it all self-tuning and autonomous for all but the most esoteric setups?
"Or is it all self-tuning and autonomous for all but the most esoteric setups?"
More or less.
You can get cheap all-flash storage arrays that can do 100,000x more IOPS than you had access to during the classical period, and of course nobody runs databases against Pentium IIIs with only a few MB of RAM...
So unless you have top-tier requirements, it's easy to just throw hardware at a problem to make poor design go away.
"So unless you have top tier requirements It's easy to just throw hardware at a problem to make poor design go away."
This is so true. I can't count the number of places that have suggested we need bigger boxes to cope with all the data. Yet 10 minutes later you realise it is not just poor design, it's friggin' criminal.
System and user DBs on the same disk. Data and log files on the same disk. And then, of course, the clincher: it's all sitting on the OS drive.
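For reference, separating them takes no more than a couple of file specifications at creation time - drive letters, names and sizes below are purely illustrative:

    CREATE DATABASE Sales
        ON PRIMARY (NAME = Sales_data, FILENAME = 'D:\SQLData\Sales.mdf', SIZE = 10GB)
        LOG ON     (NAME = Sales_log,  FILENAME = 'L:\SQLLogs\Sales.ldf', SIZE = 2GB);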
+1. In the classical period (1995-2005) being an enterprise DBA was pretty hard. You had to know your OS inside out, you had to understand how to do fast I/O and then you had to design the database, as in tune how tables should be laid out for maximal performance. Wasnae easy, cap'n.
Hear hear. I miss those days. You really ended up knowing the systems, both hardware and OS, better than most of the sysadmins.
Oracle tries to be clever and decide things for you. It does acceptably most of the time. But not always. And, being Oracle, it is buggy as hell. And it can get things really horribly wrong. It is also an even worse piece of bloated crap than it used to be.
All-flash or tiered storage is cheap, so that takes care of I/O performance. Of course, raw blocks weren't just for performance: they also bypassed the filesystem cache (after all, you cache rows in memory anyway), and ensured that writes the database thinks have gone to disk really have been written to disk.
Having said all that, even if everything is faster and more automatic, the data volumes and business requirements have increased too so there is still very much need for skilled DBAs. Don't forget lusers have not improved even if hardware and software may have done.
When I was 5, I spent the day at Dad's work, a multi-national market research company.
I was tasked with taking a loose, and rather large handful of sorted punch-cards to the computer room on another floor. All on my own.
I made it OK. If I hadn't, I'm sure supermarkets across the country would have stocked the wrong brand of beans on their shelves.
I was working as an account manager/do-everything-guy at the local branch of an up and coming software multinational. I was tech guy at core, but a dev, not a sysadmin.
A livid customer's IT manager contacted my boss about our app's database crashing on them. He had never liked us - we were chosen by the biz side over his objections - so this was his moment to gloat. Yes, his "DBA" had been backing up faithfully every day, but now they had lost a full day's work since this was a late-afternoon crash. So he wanted our RDBMS vendor to examine his database core dump. IMMEDIATELY, and he had already notified his C-levels of the snafu.
Dragged in our local CTO - yes, we were that small, and we looked at the Gupta SQL database manual together.
- open at index
- Restore DB, page 35, yes.
- But, that's gonna lose them that full day. That's what they're bitching about.
- (5 minutes later) Wait, what's that Log Journal Replay thingy?
- OK, let's just call them and tell them to restore and then follow procedure at page 42 of manual to replay the day's transactions.
Never heard of Mr IT again, but the customer's business person told us he came off as a proper ass when it became obvious they had never attempted a recovery or even read up on how to do one. Remember how upper management was notified? It's amazing how some people are incompetent at even management games.
The biz guy was pretty happy, they only lost 15 minutes or so of data. First time I got an inkling of what an LDF was for ;-)
I am annoyed for no good reason when people refer to SQL Server, a Microsoft DBMS product, as "SQL", which is the name of a standard language for expressing relational database queries and other operations. I knew SQL. SQL was a friend of mine. And you, SQL Server, are not SQL.
I would like everyone to refrain forevermore from doing so, beginning immediately.
Thank you.
20+ years ago I was primary support for the state's ambulance dispatch system - 000/999/911 calls come in and are sent to the ambulance dispatcher to send a resource. For some reason we were doing an upgrade in the early morning (fewer emergency calls in case something went wrong) and, as a precaution, my off-sider was on-site as well. It was a home-grown Pick system running on Pr1me minis with a 3GL green-screen front end. The upgrade went fine, but while I was looking around the system I saw a file that I thought wasn't needed, so I deleted it. Immediate chaos - everything stopped. Mad scramble for paper and pencil so emergency calls could be recorded and ambulances dispatched. Fortunately we had a live backup system, so after a few minutes of panic we copied the missing file from the backup system, restarted, and everything was back to normal.
Always got my off-sider to look over my shoulder after that, and he didn't like playing with the system.