When a deleted primary device file only takes 20 mins out of your maintenance window, but a whole year off your lifespan
The weekend has been deleted. Pause a moment before you start your own workplace odyssey and enjoy another's trip to Oopsville courtesy of Who, Me? Today's story comes from "Jim", and concerns the time he and a colleague were performing an all-night hardware, OS, database and application upgrade of a daily newspaper's …
COMMENTS
-
-
Monday 6th July 2020 08:10 GMT Fruit and Nutcase
"and accidentally deleted the Sybase master device file while it was running."
I did something similar on a DB2 for OS/2 installation at a customer site, ironically whilst doing a roll-forward recovery after a database corruption. Some quick thinking got me out of a hole - connected to another of the customer's sites and copied the relevant file off that server, as the builds were identical. The connection... LAN NetView Management Utilities to a remote server at 9600 baud
-
Monday 6th July 2020 15:59 GMT chasil
Another way to do this
There might have been a less traumatic way of accomplishing this.
As I remember, Sybase was able to mirror device files, and the free version (11.0.3.3) was capable of doing this.
Assuming a mirror operation could be launched that could read the unlinked file, Sybase itself would copy the device file to a new location.
Oracle has the ability to "alter database rename file," and Sybase device file mirroring was the way to accomplish the same thing.
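For the curious, the mirror-then-unmirror dance being described would look something like the following. This is a hypothetical sketch only: the server name, login and file paths are invented, and it assumes an ASE server that can still reach the unlinked device through its open file handle.

```
# Hypothetical sketch (not from the article): re-materialise a still-open
# device with ASE's "disk mirror", then drop the broken primary side.
# Server name, password variable and paths are all invented.
isql -Usa -P"$SA_PASSWORD" -SSYBASE <<'SQL'
disk mirror name = "master", mirror = "/sybase/data/master_mirror.dat"
go
disk unmirror name = "master", side = "primary", mode = remove
go
SQL
```

It needs a live Sybase server, so treat it as an illustration of the idea rather than something to paste.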
-
-
Monday 6th July 2020 09:32 GMT druck
There are much better ways of recovering a deleted open file than crashing the system and hoping fsck recovers it. I did it the other day on Linux when I deleted an open log file; it wasn't very important, but I got it back anyway. I believe even on Solaris the file handle will be under /proc/<pid>/fd, and a quick google shows the fsdb command will help in this situation.
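For the Linux case, the whole rescue can be demonstrated end to end; a minimal sketch (file names invented, GNU userland assumed):

```shell
# Recover a deleted-but-still-open file via /proc (Linux).
# All paths here are invented for the demo.
echo "precious data" > /tmp/demo.log
tail -f /tmp/demo.log &          # a process holding the file open
TAILPID=$!
sleep 1
rm /tmp/demo.log                 # the name is gone, the inode is not
# Walk the holder's fd table looking for the deleted file...
for fd in /proc/$TAILPID/fd/*; do
    if readlink "$fd" | grep -q 'demo.log (deleted)'; then
        cp "$fd" /tmp/recovered.log   # ...and copy the contents back out
    fi
done
kill $TAILPID
cat /tmp/recovered.log
```

The `/proc/<pid>/fd/<n>` entries are magic symlinks that still dereference to the unlinked inode, which is why the `cp` works.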
-
Monday 6th July 2020 10:36 GMT big_D
This is assuming that a) there was such a thing as Google when you did this, and b) you'd think to look in /proc/<pid> etc.
In the middle of the night, in a time before Google or other major search engines, you were left to your own devices and what you could remember from reading the f'ing manual.
-
Monday 6th July 2020 12:46 GMT Chairman of the Bored
Speaking about the f*ing manual...
I think there is some sort of physical law that says something to the effect of, "If you're in deep ship when working on a production system, you will not find the required manual in your documentation wall. Nor will the requisite man pages be installed. After the disaster, you will find the resources with ease."
-
Monday 6th July 2020 13:59 GMT Anonymous Coward
Re: Speaking about the f*ing manual...
I had this issue many years ago, we knew where the manual was, in the locked filing cabinet in the tech support room. Luckily we had a hardware engineer on the team complete with toolkit. He removed the lock from the cabinet in about 30 seconds. The filing cabinet was not quite as secure once we replaced it but hey ho
-
Monday 6th July 2020 20:14 GMT NorthIowan
Re: Speaking about the f*ing manual...
Some locks are not very good.
In an emergency, see if you have a key that fits. I unlocked a minivan at church with my pickup's key from the same manufacturer.
In the US, most camper storage compartments use the same key. One of my camper locks froze up so I replaced them all to get a new key. Will keep the old key in case someone else at a campground loses theirs.
-
Monday 6th July 2020 22:38 GMT DougMac
Re: Speaking about the f*ing manual...
It makes me chuckle to see the colo people in other cages oh so carefully label and string up the server keys, paired with each server, to make sure they don't get them mixed up...
When Dell only changes the lock/key type every 5-10 years or so, and I have a bucket of keys that would fit any of their servers, depending on how deep you want to dig for it.
-
Tuesday 7th July 2020 11:20 GMT Luiz Abdala
Re: Speaking about the f*ing manual...
Taking this off-topic and running with it, my mom opened an old Ford car that looked identical to hers, *completely by accident*. It was the same make and model, parked right next to hers.
I happened to notice because that car was running on fumes, while we had just filled ours.
And the tires were bald. And it had 100,000 more miles on the odometer. And the radio was not set to her station.
The kicker: her keys could open both, but the guy arrived soon enough to catch us closing his car, and his set of keys could not open ours.
-
Tuesday 7th July 2020 14:52 GMT DCFusor
Re: Speaking about the f*ing manual...
I've had the same thing happen with a '66 Chevy station wagon, when it was around 4 years old (giving away my age) as a teenager.
Got half a mile in the wrong car, noticed some things weren't quite right, came back to the grocery store to see the other fellow trying to start my (dad's) car and failing....
At least back in the day in pastoral USA, it was only an occasion for some laughter.
Nowadays, it'd be charged as auto theft or something.
Fans of Deviant Ollam (he uses that name on YouTube and at conferences) know that virtually all Ford Crown Victoria police cars, and hence taxis, are keyed the same... and you can buy that key on eBay.
-
-
-
Tuesday 7th July 2020 14:58 GMT GBE
Re: Speaking about the f*ing manual...
I had this issue many years ago, we knew where the manual was, in the locked filing cabinet in the tech support room.
Typical office desk/file locks are almost always cheap wafer locks that are easy to pick — even if all you have to work with is a couple paper clips.
-
Tuesday 7th July 2020 06:40 GMT tony trolle
Re: Speaking about the f*ing manual...
We found the manual, but it had missing pages, as did the backup mini wall... lucky thing was I knew where a spare set was, thanks to my night-shift exploring practice.
A few weeks later I found the old erratum instructions that said to remove the pages... turn that erratum page over and it DID say to replace them with the new versions. And the difference? Two 50% bigger blank lines. OK...
-
-
-
Monday 6th July 2020 11:16 GMT AVee
I was thinking the same thing. However, if you need 45 minutes to figure all that out, it's risky. Anything can happen in that time, causing more permanent damage. If you know how to do it off the top of your head it surely is the better option, but if it's going to take time to figure out it quickly becomes scary...
-
Monday 6th July 2020 12:54 GMT ovation1357
I remember trying to use fsdb - not only did it come with a massive "here be dragons" warning, but it also needed the user to have an intimate knowledge of the internal workings of UFS. Certainly not for the faint-hearted! :-/
I certainly share the concern about the approach of deliberately crashing the system - it could very well have caused more problems than it solved. I'm going to guess that this was a fairly early version of Solaris, as the later versions (starting from Solaris 8, I think) had journalling enabled, which in essence meant that it would have logged the file deletion and automatically committed it on boot after the crash.
Later versions of Solaris included the 'pfiles' command, which could be used in a similar way to 'lsof' (which was considered a Linux tool and only available as an unsupported extra on the Sun Freeware CD).
A tricky problem here for sure, and one which I think we've all experienced in our careers.
-
-
Tuesday 7th July 2020 17:16 GMT Anonymous Coward
I remember the moment I realised that the way you check whether a ZFS thingy (pool? I forget) is good is to import it into the system which then checks it for consistency, in the kernel, where any kind of error is going to nuke the whole machine. Because, obviously, having any kind of userspace checker was beneath the people who wrote ZFS. So, here, I'm just going to run the equivalent of fsck on a pool I don't completely trust is not broken inside the kernel of the machine that's running <large financial institution>'s account database. Yes, of course I'm going to do that. Somewhere around then was when it became apparent that the reason to move to Linux was because some of the people involved in it had heard of the real world.
-
Tuesday 7th July 2020 17:48 GMT katrinab
zpool scrub pool
Then zpool status to see how the scrub is getting on.
While it is doing it, it will show something like
  pool: pool
 state: ONLINE
  scan: scrub in progress since Tue Jul 7 18:42:43 2020
        520G scanned at 1.10G/s, 43.4G issued at 436M/s, 15.25T total
        0 repaired, 0.81% done, 0 days 03:28:36 to go
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0

errors: No known data errors
When it is finished it will hopefully show something like
  pool: pool
 state: ONLINE
  scan: scrub repaired 0 in 3 days 22:55:48 with 0 errors on Fri Jul 11 06:32:59 2020
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0

errors: No known data errors
-
Wednesday 8th July 2020 10:54 GMT Anonymous Coward
zpool scrub pool
Then zpool status to see how the scrub is getting on.
Assuming that the system doesn't panic, or the scrub gets refused, that is (I've had both happen).
ZFS assumes that its internal copy-on-write mechanism means that corruption is always detectable and recoverable, simply by tracking checksums & pointers. That is true, IF (and it's a big IF) the problem is just a ZFS operation that didn't complete properly.
If the underlying problem is a disk going bad, or a driver that's corrupted data, there's no recovery mechanism. If the pool isn't consistent enough to be imported or scrubbed ZFS just reports "Nope, shan't.", and you've lost all access to all the data in the pool, even data which hasn't been damaged.
It has many nice features, but regular snapshots sent to a remote system are an absolute necessity.
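The snapshot-and-ship routine being recommended is short enough to live in cron; a sketch with invented pool, dataset and host names (it needs live ZFS pools on both ends, so it's an illustration rather than something to paste):

```
# Invented names throughout: pool "tank", dataset "data", host "backuphost".
zfs snapshot tank/data@nightly            # cheap, near-instant local snapshot
# Full send the first time round:
zfs send tank/data@nightly | ssh backuphost zfs receive backup/data
# Thereafter, incremental sends against the previous snapshot:
zfs send -i tank/data@lastnight tank/data@nightly | \
    ssh backuphost zfs receive backup/data
```

Because the received copy lives on a different machine, a pool that refuses to import doesn't take the only copy of the data with it.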
-
Wednesday 8th July 2020 11:29 GMT Anonymous Coward
And that's the whole problem: the pool being scrubbed is online. I don't want that: I want a userspace program which I can point at the devices which make up an exported pool (or a pool which has never been imported on this machine, I don't just mean a pool I've just exported...), and which will go and check it, either for basic sanity (are all the devices even there? etc) or in detail, in the same way that zpool scrub does. I want that because I want to minimise the chance that nasty problems will cause the machine to crash, and that is a lot less likely to happen with userspace code than with code that runs in the kernel. I also want to be as sure as I can be that the pool is good before I import the thing, so I can back out whatever change I'm making before being half-committed to it. There is absolutely no reason why such a utility should not exist: it could probably even share most of its source code with zpool scrub. Indeed I know for a fact (because some of the papers on ZFS said so) that a fair amount of the early development of ZFS happened with the whole bloody filesystem in userspace.
And I'm sure ZFS is more robust now than it was when I used it, but it certainly was the case a while ago that bad things could happen to pools which would cause awful results on the machines which imported them. I know this because I've watched it happen. That's why I want userspace checks: it's not academic nerdiness, it's because I've watched the smoke rising from the remains of some machine which ZFS bugs had just killed, and been in the meetings where we decided to go back to UFS (or VxFS, I forget) as a result. And I worked for Sun; this was a personally embarrassing recommendation to have to make!
Right up until the end Sun suffered from having too many very clever people with too little contact with reality.
-
-
-
-
Monday 6th July 2020 08:17 GMT ColinPa
Read and understand the instructions first
I got called out to help a "production down" problem during an upgrade. I was trying to help over the phone, and not being able to see what was going on was a major problem.
The instructions were clear.
1 Delete the following files config1,config2 etc
2 Recreate the system
3 Enter the config data when asked.
What could go wrong?
I got called at step 3. "Where is the config info we have to enter?"
"It is in config1"
"You mean the file we just deleted?"
They could not recover the file from the backups - because they were not authorised.
We eventually found the data because someone had copied all of the config files into one place for education.
We changed the instructions from "delete..." to "rename... " and added "step 0 - print out...".
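The revised runbook boils down to a habit that is easy to script; a minimal sketch with invented paths, not the actual system from the story:

```shell
# "Step 0: copy the config before touching it", plus rename-instead-of-delete.
# All paths and contents are invented for the demo.
mkdir -p /tmp/cfg /tmp/upgrade-backup
echo "port=5000" > /tmp/cfg/config1        # stand-in for the real config file
cp /tmp/cfg/config1 /tmp/upgrade-backup/   # step 0: keep a copy elsewhere
mv /tmp/cfg/config1 /tmp/cfg/config1.old   # "delete" by renaming instead
# When step 3 asks for the config data, it is still right there:
cat /tmp/cfg/config1.old
```

A rename costs nothing during the upgrade and can be turned into a real delete once everything is verified working.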
-
Monday 6th July 2020 10:57 GMT juice
Re: Read and understand the instructions first
It is mildly worrying how many things are resolved thanks to someone having a copy of the data sat in their personal filesystem.
At one point, I had a minion sheepishly come up to me (while I was talking a new minion through their first day on the job, entertainingly/ironically) and announce that they'd deleted something which Really Shouldn't Have Been Deleted.
And while there were daily backups, the data was being constantly updated, so it would have required a fair amount of work to rebuild from the "last known good" state.
Thankfully, someone had just that very morning cloned the data for testing purposes, so we were able to use that to restore 90% of the missing data, leaving my very subdued minion with just an hour or two of hard graft to finish the cleanup.
Fun, fun fun! Thankfully, it didn't scare the new minion off :)
-
Monday 6th July 2020 18:38 GMT Marshalltown
Re: Read and understand the instructions first
I never had any issues with any Unix or Linux system, but I worked for a company that was religiously faithful to Micro**** products. These would not infrequently eat their own young, including the servers. The Boss would every so often insist that we "clean up" our hard drives, specifically of old project files. I never did so, and oddly he would always ask me for help in recovering lost or strayed project files. Once, an entire year's worth of work vanished, including gigabytes worth of images which were critical (maps, property photos, ...). Unhappily, in that instance the only recoverable data was what I had cached on my system - and since these were "team" projects and other team members dutifully deleted everything they were told to, it was pretty ugly for a while. The bad part was that somehow the backups were corrupted as well. That really set some hearts beating. The boss never again advanced as company policy that closed project files be removed from the machines of the individuals who created them.
-
-
-
-
-
-
Monday 6th July 2020 17:23 GMT Doctor Syntax
Re: Seems like a proper who, me
I discovered a client's system was set up to do a backup from the live system to the hot standby overnight. I also discovered that the rcp, ftp or whatever it was would be terminated if not complete by start of business the next day. And I discovered that although it probably worked when first set up, by the time I came on the scene there was no way it could complete overnight, and probably hadn't for months, if not years.
Fortunately there were also tape backups.
-
Saturday 11th July 2020 08:51 GMT grumpy-old-person
Re: Seems like a proper who, me
Many years ago, when I was still young, I travelled from Johannesburg to Windhoek to upgrade the operating system on an ICL System 4, taking THREE copies of the necessary stuff on 9-track magnetic tapes.
The mainframe could not read ANY of them!
Eventually got a colleague to plead with a passenger about to board a Windhoek flight to nurse a removable disk pack (quite large in those days!) as hand luggage, which I very gratefully accepted from him on his arrival in Windhoek.
Seems that the 'skew tape' used to align the heads on the tape drives was not well - but simply remedying this with a 'valid' skew tape was not an option as many years of the site's tapes would have become unreadable!
-
-
Tuesday 7th July 2020 16:06 GMT Stuart Castle
Re: Seems like a proper who, me
"A backup isn't complete until it is also successfully restored."
I am not religious, but Amen to that.
Too many people have fallen foul of assuming their backups are OK.. They need to be tested regularly. I've had to explain to quite a few people that their backup media is fallible, and has failed. Of course, working with students, I've also had to explain why it's important not to leave work to the last minute, but hey ho. Students left things to the last minute before I was born, and I suspect they'll still be doing so when I die.
Even big corporations have fallen foul of bad backups. My old Computer Science lecturer at college liked to tell a story of a failure at a major bank (NatWest, IIRC). They had an issue with their computer systems. They lost a lot of transactions, and when they went to restore the backup, they couldn't read it. There was a 24-hour period they couldn't restore, and apparently they had to cancel any transactions during that period. A potentially costly problem.
-
-
-
-
-
Monday 6th July 2020 09:38 GMT Sgt_Oddball
Could be worse...
I managed to forget which drives I had my OS installed on for my home server and happily nuked the RAID array whilst finishing up adding all 12 of the 600GB drives I'd bought (they're getting pricey secondhand now, not impressed).
Thankfully, it was just Windows 2012 R2 and I hadn't finished playing with it enough to put anything sensitive on it yet.
So it's now running Ubuntu Server, and I couldn't be happier with only CLI commands.
-
Monday 6th July 2020 10:02 GMT Pascal Monett
Good article, because it outlines the two types of goofs
On the one hand, you've got the technician that knows the system, makes a mistake, analyzes the situation correctly, finds the loophole and re-establishes functionality without any major hiccup. Hair-raising to be sure, heavy implications for failure, but in the end his in-depth knowledge allowed him to gracefully recover from the error.
Then, on the other hand, you've got the blithering idiot that knows just enough to make himself dangerous, has no idea of the consequences of his actions, and will be totally incapable of recovering anything.
I know who I'd prefer working with.
Good article.
-
-
Monday 6th July 2020 10:47 GMT Anonymous Coward
Re: Oxymoron alert
Though this is true, I have been on the receiving end of a paranoid rant from the DBA at a certain (no longer existing) insurance company. In the course of this he accidentally gave away that his paranoia was due to the company having provided totally inadequate disk space and backup capability.
The next day I moved my car insurance to another company.
-
Monday 6th July 2020 16:06 GMT Bruce Ordway
Re: Oxymoron alert
>> first requirement of a DBA is paranoia
I learned how totally oblivious I was with my very first PC running MS-DOS.
While cleaning up a directory, I was annoyed by two files persistently listed: "." and "..".
I finally typed del .. & pressed enter...... nothing. It was a hard lesson.
Luckily, back then, PC manufacturers provided 24 hour phone support.
So I was able to recover the OS that night. Several days to reload everything else.
Possibly due to that rocky beginning, I am an obsessive planner for disaster now.
Even today, when VMs have rendered backups (mostly) obsolete, I continue to maintain multiple copies/locations.
-
Monday 6th July 2020 19:26 GMT Dog11
Re: Oxymoron alert
"While cleaning up a directory, I was annoyed by two files persistently listed, "." and "..".
I finally typed del .. & pressing enter...... nothing"
I had a client who was a neatnik that did that. I had no idea it was even possible (surely there would be klaxons blaring and dire warnings, no?... no). We did recover, but it was a long time ago and my memory is dropping bits. Probably with UNERASE and a good deal of puzzling over what the first letter of each filename should be (MS-DOS would replace that character with a placeholder to signify that the file was erased).
-
-
-
Tuesday 7th July 2020 14:09 GMT DemeterLast
Re: Oxymoron alert
Oracle DBAs know that, if a problem occurs, the SOP is to explain to the money people that a check must be cut to your Oracle sales rep. I'm pretty sure that's Chapter 1, page 1 of the How to Be an Unleashed Dummy Oracle DBA in 24 Hours handbook.
On the other hand, the first thing SQL Server does on installation is hurl a rock at your face and insult your mother. It will then proceed to convert all system DLLs to kiddy porn and notify the FBI.
-
-
-
Monday 6th July 2020 10:03 GMT DrBobK
/dev
A friend of mine who was doing his PhD and programming PCs in Prolog to do something to do with analysing people's understanding of skin diseases (shades of The Singing Detective) took it upon himself to 'tidy up' a SPARCstation 2 that was about to replace something ancient we'd been using as a fileserver and host for some early experiments in website design (this is 1990 or so). For some reason he decided (as root) to delete /dev, as it seemed to be full of lots of useless empty files. Not a good idea.
The machine was connected to a network and the console was running the SunOS 4.1 GUI with a terminal open. I did not know much more than my friend, but I did have my own Sun 3/60 and I'd been on a short course for scientists who had to deal with new-fangled workstation things. I have forgotten how I did it, but armed with my trusty SERC 'how to be a unix system admin' manual that came with the two-day course I'd done, I managed to retrieve everything. In the land of the blind, the one-eyed man is king.
-
Monday 6th July 2020 11:21 GMT Antron Argaiv
Re: /dev
'how to be a unix system admin'
Manual?
Why, when we got our SPARCstations, we could only *dream* of a manual. We were lucky to get all the pieces, and the mouse pad.
Data General, ca. 1990. The decision had been made that the in-house MV machines and their in-house written CAD system were an expensive dead end for the engineering staff, so Suns and Viewlogic it was. Got to name my own system, was my own sysadmin, and it was visible from The Internet, because -- no nasties. Mr Morris and his worm, Canter and Siegel were all far in the future. Learn UNIX or sink, and learn we did. Among other things, we found Usenet, and comp.os.minix
Thanks for the memories!
-
Monday 6th July 2020 12:51 GMT Stuart Castle
Re: /dev
Re: "'how to be a unix system admin'
Manual?
Why, when we got our SPARCstations, we could only *dream* of a manual. We were lucky to get all the pieces, and the mouse pad."
Mousepad? You were lucky.. We were given a sheet of tinfoil and we had to draw our own grid on it. For our younger readers, early optical mice (such as those used on early Sun workstations) required an aluminium mousepad that had a very fine grid of lines on it.
-
Monday 6th July 2020 14:10 GMT UCAP
Re: /dev
Manual?
Why, when we got our SPARCstations, we could only *dream* of a manual. We were lucky to get all the pieces, and the mouse pad.
Former company (same one as my message below) received a brand shiny new DEC VAXstation for a project. Thing arrived on a pallet with loads of boxes, quite a few of which were documentation. All of the document ring binders were put on a shelf still in their shrink-wrap (we could not be bothered to unwrap them - too knackered after getting the workstation out and set up).
A few weeks later the programmer using the system wanted to find out the VMS system call required to print a message on the console. Took four of us best part of an afternoon to solve that one, and we had to consult pretty much all of the manuals to do so (Manual X says "see Manual Y" which says "see Manual Z" which says "see Manual X" ...).
I've hated the thought of VMS ever since.
-
-
-
Monday 6th July 2020 10:14 GMT UCAP
/ tmp
Once, many decades ago, I was sysadmin for my company's three Sun-3 workstations (one diskless node, and two disked nodes). One morning, fairly early before my brain had fully booted, I decided to clean up the cruft in /tmp (it had got pretty full and the OS was complaining on occasion). So I logged in as root and typed the immortal command "rm -rf / tmp". I then realised the significance of the space in the command and hit "control-C" PDQ, but not quite Q enough to stop half of /usr from having been deleted. I subsequently 'fessed up to my manager and spent the rest of the morning rebuilding the workstation.
-
Monday 6th July 2020 12:54 GMT Chairman of the Bored
Re: / tmp
Aye! I've shot my toes off in much the same way.
Tip: put a zero-length file called -i in the root directory. It will force a rampaging rm -rf back to interactive mode. Then the trick is to not hit 'y' when prompted...
It's a zero-length file, but at least once it has covered my entire posterior.
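The guard works because the shell, not rm, expands the glob, so the stray -i lands in rm's argument list. A demo with invented names, assuming GNU rm (which accepts options after operands):

```shell
# Demo of the zero-length "-i" guard file. Directory and file names invented.
mkdir -p /tmp/guard && cd /tmp/guard
touch ./-i precious
# A rampaging "rm -rf *" expands to "rm -rf -i precious"; the stray -i
# flips rm back into interactive mode. Answer "n" and nothing dies:
printf 'n\n' | rm -rf *
ls
```

Note the guard only helps against glob-based deletions run inside the directory; `rm -rf /full/path` never sees the file.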
-
-
-
Monday 6th July 2020 17:50 GMT Throatwarbler Mangrove
Re: / tmp
"Modern rm doesn't allow you to do rm -rf / without confirmation."
Insert ranting here about how kids today can't do anything without training wheels and how back in my day, Unix commands ran without confirmation because if you didn't know what you were doing you shouldn't be let in front of a computer and now everyone thinks they can use one and deserve protection from their own idiocy and September 1996 was the death of the Internet and why do people even use this new World Wide Web thing when gopher was just as good and nothing is as good as it used to be like it was when only furry-toothed geeks ruled the computer labs and don't even get me started on systemd and GNOME and who even uses vim instead of vi, linking /bin/vi to vim is heresy . . .
Anyway, have some homebrew.
-
-
-
Monday 6th July 2020 22:49 GMT logicalextreme
Re: / tmp
Ah, this probably explains my quick test (in a non-root directory) failing. For some reason it had never even occurred to me that filenames were a perfectly cromulent vector for command arguments (presumably the risk inversely correlates with the prescriptiveness of the argument order).
-
-
-
-
Monday 6th July 2020 10:20 GMT Anonymous South African Coward
Way back in the 90's I did POS support for a couple of clients in Pretoria.
One difficult client had all the bells and whistles - shiny new Novell 3.12 plus a couple of DOS workstations and a Windows 3.1 workstation for himself. And 120Mb tape drive.
Laughably small in today's terms of Giga- and Terabytes. Anyway.
He made a backup for the day, put it in the safe with the other backup tapes, locked the safe and went home for the weekend.
Come Monday morning we received a frantic call from him - ne'er-do-wells had visited during the weekend and taken the file server, workstations and safe (including the backup tapes). So he had nothing to fall back on, and he was due to submit a SARS (income tax) return. Ouch.
-
Monday 6th July 2020 11:09 GMT Anonymous Coward
I am cursed with having to work with people who, when a filesystem goes full, go gzip happy.
Java file? Ohh, that's big. I'll gzip it. Cue me looking around trying to see what was done, because now the billing system is falling over with a Java error.
Log file? Ohh, that's big. I'll gzip it. Log rotation breaks.
LOG file? Ohh, there's a few. I'll gzip the lot. Who needs LDAP anyway?
Pesky files under /usr/lib64? I have people who can deal with that. Who needs a running system anyway.
Not as destructive as deleting, but still.....
-
Monday 6th July 2020 15:38 GMT juice
> I am cursed with having to work with people who, when a filesystem goes full, go gzip happy.
Nah. The fun one is when someone deletes a log file (or similar) which a process is still hanging onto.
At that point, you're left with a large chunk of space which can't be reallocated, and which will resolutely stay as-is until you find and kill/restart the parent process.
It's the dark-side equivalent of the little trick used to save the day in this article :)
-
Tuesday 7th July 2020 12:52 GMT RichardBarrell
> Java file? Ohh, that's big. I'll gzip it
Please tell me they didn't gzip a .jar file? Those are already zip files, compressed with deflate!
(You may get a little more compression out of them because zip compresses files one at a time, but rarely an interesting amount.)
> Log file? Ohh, that's big. I'll gzip it. Log rotation breaks.
Log rotation should already have been gzipping yesterday's rotated copy. ;)
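For reference, a logrotate stanza that does the gzipping properly; the path is invented, and delaycompress is the bit that keeps a still-writing process happy:

```
/var/log/myapp/*.log {
    daily
    rotate 14
    compress         # gzip rotated copies
    delaycompress    # but not the newest one, in case the app still has it open
    missingok
    notifempty
}
```

With this in place there is never a reason for a human to gzip a live log file by hand.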
-
Monday 6th July 2020 11:11 GMT Remy Redert
Recently had a fun time with backing data up. A semi-competent computer user asked me to help him get a Linux install going on one of his PCs. Having set up a previous machine of his with Linux in the past, I said sure, helped him decide what distro he wanted, and came over armed with a live USB.
I checked with him that he had everything important backed up, he had. So I plug the USB stick in and let it run the pre configured install while we talk.
I walk him through the steps so he can do it himself next time, including the part where the drive gets repartitioned.
At the end I said we were ready to put his backed up stuff on the machine and asked him for it. D:\backup he says. He'd created a new partition to put his back up on. It was of course nuked during the install.
I've learned to always ask for the backup before even starting the machine.
-
Monday 6th July 2020 11:25 GMT Antron Argaiv
My approach for this is to tell the aspiring Linux user to buy another disk off Amazon. They're well under $100 and will arrive the next day.
I then carefully disconnect and remove his existing OS disk, replacing it with the newly purchased one, on which I do a clean install of Linux.
We can (or not) copy his files off the old drive, which then remains "on the shelf", in case he has a change of heart and decides Linux is not for him. This also provides him with the comforting knowledge (he can see it right there on the shelf) that his decision to try Linux is completely reversible. Very handy when converting a less-than-knowledgeable friend or family member from Windows to Linux (for reduced incidence of service calls).
I do this for myself as well, when upgrading every few years. Handy to have a complete backup drive, which was perhaps getting long in the tooth, and a fresh new, (often faster and larger) drive for the new install.
-
Monday 6th July 2020 16:08 GMT The Oncoming Scorn
I For One Welcome This Strategy
Yep - did this for my massage therapist (the tales arising from that told elsewhere) & others in the past (along with myself), replacing the spinning rust with an SSD.
In the corporate world I always tried to ensure a re-image was done on a replacement system; other places, where your machine was assigned for life or at least for the term of your employment, I'd rob a drive from elsewhere & keep the originals back for a week.
-
-
-
Monday 6th July 2020 11:48 GMT Blackjack
Windows 95 was still new...
It was 1996, and one of those "fix Windows" programs deleted a file Windows needed to boot the GUI. I went into "DOS mode" and found a backup. I renamed the "*.bak" file to "*.bat", then rebooted. It worked, and saved us having to buy a Windows 95 install CD, since the machine didn't have one, as it came with Windows 95 pre-installed.
I was 13 at the time and a total noob with computers; who do you think had run that "fix Windows" program in the first place?
-
Monday 6th July 2020 16:25 GMT heyrick
Re: Windows 95 was still new...
"I was 13 at the time and a totally noob with computers"
And, yet, you likely knew more about what was necessary than the authors of those "fix Windows" programs.
Before letting anything near my real machine, I'd give it a whirl on an old box running a basic installation (era of 98SE ~ XP). My god, I don't think I found one single registry cleaner that didn't make things worse. One of them made such a mess Windows threw a BSOD on booting.
Accordingly, my machine's registry might be cluttered and suboptimal, but it hasn't been wrecked by some half-assed attempt to "repair" it.
-
Monday 6th July 2020 18:30 GMT J. Cook
Re: Windows 95 was still new...
Yep. I've had to whip out the voodoo beads and rubber chicken to recover from a few things (malware before it got stupidly complex, virii in the same manner, and the occasional HP driver* that decided to soil its underpants).
I generally leave the registry the hell alone unless the machine is broken or complaining about something that's in the registry, and even then only on the advice of the author/publisher of the program or the OS.
*and almost always for a crappy inkjet printer that had ZERO BUSINESS installing kernel level drivers. WTF HP?
-
Monday 6th July 2020 21:29 GMT A.P. Veening
Re: Windows 95 was still new...
I also mostly leave the registry alone except for adding a couple of small things when installing Windows:
To force Home systems to use <Ctrl><Alt><Del> to log on and not show the previous user:
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System]
"dontdisplaylastusername"=dword:00000001
"DisableCAD"=dword:00000000
"LogonType"=dword:00000000
To show seconds in the system clock (handy to detect a hanging system):
Windows Registry Editor Version 5.00
[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\Advanced]
"ShowSecondsInSystemClock"=dword:00000001
-
Monday 6th July 2020 22:56 GMT logicalextreme
Re: Windows 95 was still new...
I've got a PowerShell script full of 'em, and GP tweaks etc., to put Windows into "barely competent computer user" mode. Labels on taskbar buttons, file extensions shown, all that crap. I like the idea of seconds to easily check for hangs, but I worry that it would use an entire CPU core just to do so.
-
-
-
Tuesday 7th July 2020 01:33 GMT Blackjack
Re: Windows 95 was still new...
I actually didn't have that much problem with registry cleaners because I made backups. And because I mostly just used them to delete leftover register keys from uninstalled programs.
Nowadays I don't bother with the registry unless I absolutely have to. And I always backup first.
-
-
-
Monday 6th July 2020 13:09 GMT Stuart Castle
A few years back, my main PC was running Windows 7. The OS was actually running well, but over the years, had built up a lot of cruft. I tend to use my home PC for mucking around with different packages, so it builds up a lot of crap.
To combat this, at the time, I wiped and re-installed Windows every few months. While I do currently back up the important stuff online, I didn't at the time; I just relied on the fact that I had multiple drives in the machine, with the OS installed on an SSD and applications and data stored on separate HDDs, with my various network shares all on one dedicated drive.
Just as I had got to the disk management page of the Windows 7 setup, my housemate walked in and started talking to me. I happily hit the "delete" and "new partition" buttons, thinking I'd selected the SSD. Then I realised I suddenly had 1TB of unpartitioned space (the SSD is only 256GB). After a couple of seconds, I realised I'd just wiped one of the HDDs. Thankfully, it was just the network shares, which did contain some things I needed, but nothing I couldn't download again or otherwise re-create.
-
This post has been deleted by its author
-
Monday 6th July 2020 14:49 GMT Anonymous Coward
Oh, the joys of dd!
Many years ago, I was working for a small company. On this fateful day, as the last one scheduled to be out of the office, I was tasked with starting the backup job of our file server, to avoid disruptions to the users.
So at 6pm on a Friday evening, well after the last user had left, I fired up a terminal, ssh'ed in, ran cat on the cheat list of useful commands which I kept in my home directory, selected the line I needed with the mouse, and copied and pasted the command into the shell along with the following blank line I helpfully inserted after each command - because, you know, hitting return was so much unnecessary effort.
So the command started, and I took one last look to make sure it was running fine before shutting off my monitor. Then I realized, to my horror, that I had actually entered the wrong command:
dd if=/dev/zip of=/dev/hda5 bs=4M
I had mistakenly copied the restore command instead of the backup command. I panic-pressed Ctrl+C as fast and as often as I could - enough to make me, in retrospect, appreciate the durability of the keyboards we had back then. I'm happy to report that for whatever reason, dd hadn't actually started writing before I was able to stop it, and even more surprisingly, it actually responded to Ctrl+C - I seem to remember that not always being the case with dd. A satisfying "0 bytes read+written" message helped return my blood pressure and heart rate to a more sustainable level.
Sometimes It's better to be lucky than smart.
Obviously, I prefer to remain anonymous.
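For anyone who wants to feel that near-miss safely, here is a sketch of the mix-up using ordinary files as stand-ins for the real /dev/hda5 and /dev/zip (device names from the story; the stand-in paths are made up). The backup and the restore differ only in which side gets if= and which gets of=:

```shell
# Safe re-enactment: files stand in for the partition and the Zip drive.
printf 'disk data' > /tmp/hda5.img        # stand-in for the data partition
: > /tmp/zip.img                          # stand-in for the Zip device

# Backup direction: read the partition, write to the removable device.
dd if=/tmp/hda5.img of=/tmp/zip.img bs=4M 2>/dev/null

# The restore is the identical line with the two arguments swapped -
# one careless paste away from overwriting the live partition:
#   dd if=/tmp/zip.img of=/tmp/hda5.img bs=4M
```

One paste buffer, two opposite outcomes - which is exactly why keeping both commands in the same cheat file is asking for trouble.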
-
Wednesday 8th July 2020 00:30 GMT whitepines
Re: Oh, the joys of dd!
I'm happy to report that for whatever reason, dd hadn't actually started writing before I was able to stop it
Very likely because it was busy filling its buffers before starting the write process, and if your tapes are as slow to respond to the initial position request as some of ours, that could be several minutes before dd would even have enough of a buffer to start writing anything.
-
-
Monday 6th July 2020 16:52 GMT hugo tyson
Re: Serious question from a non Unix person
"Delete" only deletes an entry in a directory (folder) and decrements the reference count on the actual bag-o-bits file. When the file has no references and no open handles on it, *then* the file is deleted.
Normally, in the simple case, of course "rm foo" just removes folder entry foo, and also the bag-o-bits underlying it at the same time....
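That reference-count behaviour is what makes the /proc recovery trick mentioned above possible on Linux - as long as some process still holds the file open, the bag-o-bits is reachable through its descriptor. A minimal sketch (bash; file names are illustrative):

```shell
# Create a file and hold an open read handle on it (fd 3 in this shell).
echo "important data" > /tmp/demo.log
exec 3< /tmp/demo.log

# Remove the directory entry. The inode survives: fd 3 still references it.
rm /tmp/demo.log

# Recover the contents via the open descriptor of this shell ($$ is our PID).
cp /proc/$$/fd/3 /tmp/demo.recovered

# Close the handle - only now does the reference count hit zero and the
# inode actually get freed.
exec 3<&-
```

Once the last handle closes, the data really is gone, so the recovery has to happen before the holding process exits.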
-
Tuesday 7th July 2020 08:29 GMT Paul Crawford
Re: Serious question from a non Unix person
While it seems like a liability in some cases (i.e. you can remove the directory entry of an in-use file), it is also the reason that UNIX-like systems can do updates with far fewer reboots and less trouble compared to Windows (which will not allow this on an in-use file).
The typical approach in UNIX is to write out the new file to something like 'foo.tmp', sync the file system so it is fully committed to disk, then rename 'foo.tmp' to 'foo', which is an atomic operation (and works in the same way that removing an in-use file works - on the directory mapping to the inode, not on the actual file contents). Thus any process will only ever see the old file (via an already-open handle) or the new file - even if a system crash occurs around that time, never a half-modified file.
Of course any running process using the old 'foo' won't be updated, but many processes and background daemons can simply be restarted (or are short-lived), and the new version is then in use without disrupting anything else.
Updating the kernel is trickier as the system has to be rebooted for a new kernel image, but some Linux distros support in-use kernel patching by other means.
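The write-then-rename pattern described above boils down to three commands - a minimal sketch, with /tmp/foo standing in for whatever file is being replaced:

```shell
# The version readers currently see.
printf 'old contents\n' > /tmp/foo

# Write the replacement out in full under a temporary name.
printf 'new contents\n' > /tmp/foo.tmp

# Make sure the new file is fully committed to disk.
sync

# rename(2) is atomic: every reader sees either the old file or the
# new one, never a half-written mixture.
mv /tmp/foo.tmp /tmp/foo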
-
-
Monday 6th July 2020 16:37 GMT J.G.Harston
One thing I learned with early Unix: if you mkfs onto a mounted device, then sync before logging off, it writes the cached filesystem back onto the fresh blank disk you've just created. I have memories of tearing my hair out wondering why my clean new 720K disk kept claiming it had 2M free on it. I was in the habit of typing 'exit' to log off, and my exit script included sync to make sure everything was nice and tidy....
-
Monday 6th July 2020 19:29 GMT Luiz Abdala
This is pretty much like killing the engine in a 45-ton lorry by accident, dropping it into neutral, and then slamming it into 9th gear, hoping to $deity the clutch survives and the crankshaft doesn't leave the bowels of the truck in a hurry while it jump-starts itself back to life...
...while careening downhill at ever-increasing speed, knowing the brakes won't work alone, with a sharp turn ahead.
-
Tuesday 7th July 2020 01:03 GMT Anonymous Coward
Some two decades ago or so, most of our computing resources were maintained by researchers as a side job. I had the responsibility of keeping a couple of Solaris and HP-UX machines up and running - including the large Sun server that hosted almost all our home directories and lots of important research data at our department.
One time we had an in-house office party, and I thought the programme was quite boring, so I got a brilliant idea - as no one would be using any of our computing resources, I could shut down that Sun server and upgrade Solaris from 2.5.1 to 8 (if I remember correctly). All backups were up to date, the OS upgrade should touch only the file systems containing OS directories, and I could just sip wine while the upgrade ran - what could go wrong?
As soon as the system booted into Solaris 8, it was easy to find out what went wrong - all the home & data file systems were missing. That was one of the most terrifying moments I have ever had with computers. I was not able to find anything that could have helped in that situation - except the note, which I had not read before starting the upgrade, that one should disassemble all software-based RAID arrays before upgrading.
So, it looked like all the metadata etc. that was required to run those "RAID0" mirrors containing /home & /data went missing.
What next? Should I return to the party and tell everyone that most of them wouldn't be working for a week or so while I painstakingly restored all the file systems from backup tapes? My last idea was: what if I just tried to recreate the metadevices and disk arrays using the very same commands that I had used to create them in the first place? If the only other option was to restore everything from tapes, then why not try?
As if by some strange magic, the trick worked - all the missing file systems came back online and there was no corruption whatsoever. The feeling of being run over by a 40-ton truck vanished at the same moment...
-
Tuesday 7th July 2020 01:18 GMT Anonymous Coward
mySQL Slave - oh bugger
Relinking a broken mySQL slave to the master via SSH, windows open to both. Restore 3-hour-old version of the DB to the slave as a seed before I sync them (it's half a world away behind a slow connection) - then realise I'm in the "master" SSH session, and 80-odd staff have just lost 3 hours of notes and tracking data. Almost fainted.
Deep breath, blood flow restored - and hourly backups were working fine, only lost about 15 minutes' data in the end. Lessons were learned!
-
-
Wednesday 8th July 2020 22:55 GMT tfewster
The "Oh No" second - the time between hitting Enter and realising your mistake.
With experience, time slows down as your finger descends towards the Enter key, as a sixth sense that something is wrong kicks in, enabling you to stop yourself before the fateful commit.
(Yes, I do still make mistakes, just not as many these days.)
-
-
Tuesday 7th July 2020 08:36 GMT Anonymous Coward
When rm -rf * is not what you meant
I somehow managed to delete the entire Java folder of a running WebSphere server while doing some maintenance. Oh, this was going to hurt when I told management the server was dead. Except it kept merrily chugging away. Now what to do? Well, it was a node, and it had brethren, so why not just scp the java dir from one of them and "restore" this one? Worked a treat and I never had to tell anyone.
Anonymous, cause it's not that good a story.
-
Tuesday 7th July 2020 08:39 GMT aki009
Reminds me of that day...
I had an interesting experience deleting old user accounts some 30+ years ago as root (the only choice for the system).
For convenience I was using a system provided script that also removed the home directory for a deleted user at the same time.
Everything was going well until I hit that one user who had his home directory set to "/". I have no idea why a regular joe would have his home set that way, but the consequence was that the OS happily ran "rm -rf /" for as long as it could.
Fortunately the system was being repurposed and only required an OS reinstall from -- gasp -- floppies. I forget how many there were, but it was quite a massive pile.
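A sanity check in the deletion script would have caught that user. This sketch is hypothetical (the function name and path list are made up, not from the story), but shows the kind of guard that refuses to rm -rf a home directory that is really a system path:

```shell
# Refuse to remove a "home directory" that is actually a system path.
safe_remove_home() {
    case "$1" in
        /|/bin|/etc|/usr|/var|/home|"")
            echo "refusing to remove '$1'"
            return 1
            ;;
        *)
            rm -rf -- "$1"
            ;;
    esac
}

mkdir -p /tmp/olduser-home          # stand-in for the departing user's home
safe_remove_home /tmp/olduser-home  # a normal home: removed as expected
safe_remove_home / || true          # the killer entry: rejected, not run
```

A few lines of paranoia versus a pile of install floppies.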
-
Tuesday 7th July 2020 22:12 GMT Anonymous Coward
Feeling lucky yesterday
Yesterday, I had to upgrade a Linux 2 instance in AWS EC2. In the interest of time, and against usual procedures, I took a shortcut and just created an AMI, terminated the old instance and spun up a new one... and voilà! Of course I couldn't access it. Fortunately, everything could be restored easily enough using the attached volumes, but what could have been a 10-minute task ended up taking almost an hour.