Linus Torvalds issues early Linux Kernel update to fix swapfile SNAFU

Linux overlord Linus Torvalds has rushed out a new release candidate of Linux 5.12 after the first in the new series was found to include a ‘subtle and very nasty bug’ that was so serious he marked rc1 as unsuitable for use. “We had a very innocuous code cleanup and simplification that raised no red flags at all, but had a …

  1. JavaJester

    Double Ungood

    I love the reference to Orwell. I wonder if that is a subtle dig at those who complained about his use of language, or having to banish terms such as master from the kernel.

    1. the spectacularly refined chap

      Re: Double Ungood

      Isn't it a munged reference though? If you want emphasis over "ungood" it would be "plus ungood", at least from my recollection - my copy of Nineteen Eighty-Four is buried behind other books on the shelf and since I'm not completely sure where it is I can't be bothered digging it out. However, I don't recall any bar on plus and negated forms. It's only when you need even more emphasis that you add, rather than substitute, the double, i.e. double-plus ungood is correct NewSpeak.

      1. jake Silver badge

        Re: Double Ungood

        "double-plus ungood is correct NewSpeak."

        They are spelled "plusungood" and "doubleplusungood", but yes.

  2. JassMan

    The clue is in the name

    It is only a candidate for release.

    Thank Linus he found it before every man and his dog started using it.

    1. jake Silver badge

      Re: The clue is in the name

      "before every man and his dog started using it"

      One wonders how many of the absolutely brilliant teenage fanbois who think they are kewl running the latest and greatest on their one and only machine got bit ...

      1. keithpeter Silver badge

        Re: The clue is in the name

        Hopefully the number of such early adopters that use a swapfile as opposed to a swap partition will be quite small given the default partition setup in most of the installers that I am aware of.

        icon: we definitely need a 'waiting for when it is safe to go to the barbers' icon for the UK at any rate.

    2. Anonymous Coward
      Anonymous Coward

      Re: Windows 10 20H2: CHKDSK /F damages file system on SSDs with KB4592438 installed (08.12.2020)

      Let's not forget just before Christmas...

      Windows 10 20H2: CHKDSK /F damaged file systems on SSDs with Update KB4592438 installed

      In Windows 10 20H2 with installed cumulative update KB4592438, CHKDSK causes massive issues. It destroys the file system during a disk check on SSDs, so Windows 10 can’t start after a reboot. Here is some information about the problem and the affected Windows 10 build.

      Microsoft has fixed that bug, but not before it destroyed my own laptop.

      And no, I didn't get the data back from the laptop, but I did have a recent backup of the data. Worth stating, though, that the System Restore point and Recovery from Cloud/Local didn't work - they just exited with errors.

      It was truly trashed, and several days were wasted due to this.

      1. Steve K

        Re: Windows 10 20H2: CHKDSK /F damages file system on SSDs with KB4592438 installed (08.12.2020)

        That does not sound like fun at all - at least your backups/restore plan worked.

        Was it a BitLocker-protected volume or unencrypted?

      2. TeeCee Gold badge

        Re: Windows 10 20H2: CHKDSK /F damages file system on SSDs with KB4592438 installed (08.12.2020)

        What I found particularly unfunny at the time was (un)helpful souls chiming in that nobody should be running chkdsk on an SSD anyway.

        Conveniently forgetting the fact that, if Windows thinks there may be a disk error, it does it automatically on next boot. Unless you were aware of the issue and stood over every boot to abort it, you were screwed.

        I run an external backup disk with Acronis chucking incrementals and a full every five times for this sort of reason. It actually wasn't MS that made me instigate this policy. Step up Abit and take a bow for your fucked up BIOS update, which would irrevocably trash a RAID set faster than you could say "WTF?".

        Most recent visit from the Disk Corruption Fairies: an MSI board electing to have its on-board power go iffy in such a way as to drop the NVMe slot at inopportune moments, with detrimental effects to the filesystem thereon.

        Anyone not running regular, verifiable backups of any machine they give a shit about is asking for it.

        1. Mark #255

          Re: Windows 10 20H2: CHKDSK /F damages file system on SSDs with KB4592438 installed (08.12.2020)

          Anyone not running regular, verifiable backups of any machine they give a shit about is asking for it.

          Or, as I've seen it stated elsewhere, "Data you haven't backed up is data you didn't really want."

          1. nagyeger
            Thumb Up

            Re: Windows 10 20H2: CHKDSK /F damages file system on SSDs with KB4592438 installed (08.12.2020)

            That quote ought to be in every help-desk worker's signature. Any chance you could dig up the original reference?

        2. ovation1357

          Re: Windows 10 20H2: CHKDSK /F damages file system on SSDs with KB4592438 installed (08.12.2020)

          What if I helpfully chime in and suggest that nobody should be running Windows? :-P

          Here's an extra helpful meme for the situation!

  3. Anonymous Coward
    Anonymous Coward

    “And, as far as I know, all the normal distributions set things up with swap partitions, not files..."

    Does Ubuntu still default to swap files? I know that 17.10 and 18.04 did.

    1. Anonymous Coward

      The standard installation has root and swap partitions. If you use custom partitioning you can do away with swap altogether and have bucket loads of memory. This faux pas seems to corrupt the swapped memory: more swapping, more corruption.

    2. Joe W Silver badge

      Swap *files*? Never saw that, but then I did not install those Ubuntu versions from scratch. I mostly use mint or Debian, and the last time I installed these they wanted to set up a swap partition, which was fine by me. It is needed for suspend to disk anyway.

      The only time I saw a swap file was on some Windows system a decade or two ago *shudders*, and as far as I can recall, even the Linux distros I tried out twenty odd years ago all had swap partitions (Debian, SuSE, Mandrake, and a bunch I didn't use for long... Gentoo maybe? Too long ago...)

      1. T. F. M. Reader

        swap space

        Linux allows both swap partitions and swap files. A swap partition is used by default. Ever since memory became cheap and plentiful many people (me included) don't create swap partitions - if you need to swap you are in trouble already, though with swap you handle the trouble with more grace, granted. Better to avoid trouble altogether and provide enough memory, though. [Aside: your decision may depend on whether you like to hibernate.]

        There are situations where you decide you need swap space and it is too bothersome to carve out a partition. You can create a swap file then - "man mkswap" for details.
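A minimal sketch of that, assuming util-linux's dd and mkswap are to hand (the swapon step needs root, so it's left commented out):

```shell
# create a 64 MB file, lock down its permissions, and sign it as swap
dd if=/dev/zero of=/tmp/swapfile bs=1M count=64 status=none
chmod 600 /tmp/swapfile
mkswap /tmp/swapfile           # writes the SWAPSPACE2 signature
# sudo swapon /tmp/swapfile    # root-only: actually enable it
# sudo swapoff /tmp/swapfile   # root-only: disable it again
```

fallocate(1) is often suggested in place of dd, but on some filesystems it can yield a file that swapon rejects for having holes, so dd is the conservative choice.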

        1. ibmalone

          Re: swap space

          I've found in the past that regardless of how much memory you have you will run out, and with some tasks fast enough to completely lock the system. Sometimes just having swap there will slow things down enough that the OOM killer - or you - has a chance to kill things before they completely freeze the machine. This is why, even with quite a lot of RAM, I like having a couple of GB of swap as a sort of crumple zone.

          1. ovation1357

            Re: swap space

            This is an especially pertinent issue because it turns out that having a modern SSD in your machine creates a very interesting (and rather annoying) situation as per this Linux Kernel Mailing List thread

            The crux of the matter is that with or WITHOUT a swap file/partition, if you 'run out' of memory on a Linux system with an SSD then it's going to become almost completely unresponsive and thrash the living hell out of your SSD for an indeterminate (but long) amount of time!

            Despite having 16GB RAM in my laptop I occasionally failed to notice my memory running low due to having a tonne of stuff open, and ended up just having to Alt + SysRq + REISUB my system, losing any unsaved files in the process.

            N.B. There's an option to force the OOM killer (Alt + SysRq + f) but by default Ubuntu only allows a subset of Magic SysRq commands and I hadn't expanded that.

            N.N.B. Remember that the magic key is 'f'! Don't think of 'o' for 'OOM' because, as I found out once: 'o' is for 'Shutoff the system', which does exactly what it says!
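For reference, the SysRq policy is a bitmask in /proc/sys/kernel/sysrq, and bit value 64 is the one covering process signalling, which includes the Alt + SysRq + f OOM kill. A quick way to inspect the mask and compute the extended one (the write itself needs root, hence commented):

```shell
# read the current SysRq bitmask (1 = everything, 0 = nothing)
cur=$(cat /proc/sys/kernel/sysrq)
echo "current mask: $cur"
echo "mask with OOM-kill allowed: $((cur | 64))"
# apply it (root-only); set kernel.sysrq in /etc/sysctl.conf to persist:
# echo $((cur | 64)) | sudo tee /proc/sys/kernel/sysrq
```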

            After a couple of hangs due to OOM, I noticed that I had a swapfile which I thought I'd gotten rid of. So I booted with no swap at all and then, to my surprise, hit exactly the same thrashing issue! I then reinstated swap so as to keep the benefits of the larger virtual address space and the ability to lose pages which are completely unused, as per other comments here.

            I don't fully understand the reasons behind the 'without swap' situation, but I believe it's essentially to do with the kernel trying to write back pages of memory that are backed by files (as opposed to anonymous pages, which aren't). With a classic spinning rust hard drive the response times are slow enough that the kernel detects a series of 'page faults' trying to do this and then invokes the OOM killer to recoup some physical RAM and unlock the system.

            As per the LKML thread, it seems that SSDs are so fast that they break this mechanism by essentially being fast enough not to trigger OOM, while actually leaving the system going through a long period of thrashing and being slow. My basic understanding here is that this is generally less of a problem with server workloads but is really bad for interactive desktops.

            Anyhow, there are moves to improve this in the Kernel, hopefully it will become a bit smarter and more configurable.

            In the meantime I've installed Facebook's 'oomd' (just 'apt install oomd' will do), which monitors the memory 'pressure' PSI stats and theoretically makes it less likely for the system to reach an unresponsive state.
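The pressure figures oomd watches live under /proc/pressure/ on kernels built with PSI (4.20 or later); a guarded peek, since not every kernel exposes them:

```shell
# "some" = share of time at least one task stalled on memory;
# "full" = share of time every non-idle task stalled at once
cat /proc/pressure/memory 2>/dev/null || echo "PSI not available on this kernel"
```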

            1. ibmalone
              Thumb Up

              Re: swap space

              Glad not to have hit that yet.

        2. Ian Johnston Silver badge

          Re: swap space

          Linux - or Ubuntu anyway - implements hibernation by writing RAM contents to swap, so it's worth having a swap file/partition if you do that. I have no idea what happens if you try to hibernate while using some virtual memory ...

        3. Kobblestown

          Re: swap space

          Actually, as another reply to your post points out, it's always a good idea to have a couple of GBs as swap. Long gone are the days when the rule of thumb for swap size was twice the RAM. But Linux can swap out seldom used pages in order to free space for disk cache, if nothing else.

          Personally, I prefer block device swap rather than file swap. I use LVM and I'm usually in a position to create a logical volume for additional swap if the going gets tough.

          1. Dazed and Confused

            Re: swap space

            Yeah, there are always seldom used pages: all the bits of applications which are just used during initialization, the shit that just tends to run at boot time, etc. This gets loaded into memory, used once and is then never needed again; it can be kicked out of RAM at no cost. So long as no one is waiting to get pages back off the swap device it's not costing you anything performance wise.

        4. Anonymous Coward
          Anonymous Coward

          Re: swap space

          There has always been resistance to using swap amongst Linux users, due to its checkered history, but I'd hope Linux makes better use of swap these days.

          Anyone who has used non-Linux Unix systems knows the usefulness of swap. Indeed, just looking at one of my FreeBSD servers now, it has 10GB of free RAM, yet 327MB of used swap.

          That's 327MB extra for burst use, and file caching that would otherwise be wasted on console login daemons and other stuff that are likely to never be woken.

          That swapping out was presumably done during large file copies or similar, when RAM use was much higher.
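The Linux-side equivalent of that check is cheap; /proc/meminfo always carries the swap totals, even on a swapless box:

```shell
# kB figures; both read 0 on a system with no swap configured
grep -E '^Swap(Total|Free)' /proc/meminfo
# more readable alternatives: free -m, or swapon --show
```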

        5. teknopaul

          Re: swap space

          perhaps not a good moment to recommend swap files.

      2. Zolko Silver badge

        Swap files

        "Swap *files*? Never saw that"

        I've been using swap files exclusively for 10 years now. I prefer to have plenty of RAM, but sometimes I overextend myself and then a swapfile comes in handy. Also, it's easy to double the size of a swapfile, but very difficult to double the size of a swap partition.
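Doubling a swap file really is that mechanical: swapoff it, append zeros, re-run mkswap, swapon again. A sketch with a throwaway file (assuming mkswap is installed; the root-only swapoff/swapon steps are omitted):

```shell
# start with a 32 MB swap file...
dd if=/dev/zero of=/tmp/swapfile2 bs=1M count=32 status=none
mkswap /tmp/swapfile2 >/dev/null
# ...later, append another 32 MB and re-sign it
# (swapoff it first if it is live)
dd if=/dev/zero bs=1M count=32 status=none >> /tmp/swapfile2
mkswap /tmp/swapfile2 >/dev/null   # warns about wiping the old signature
```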

    3. Paul Crawford Silver badge

      Certainly the raspberry pi version based on Debian uses a swap file.

      I also thought Ubuntu had gone down the swap-file-by-default route. I never liked the file-based approach; I assumed it was to make swap adjustable in size later (with all of the performance issues from fragmentation that presumably brings).

      1. Anonymous Coward
        Anonymous Coward

        Certainly the raspberry pi version based on Debian uses a swap file.

        Only in the default SD-card based setup, where the card has two partitions, a DOS one for boot, and a Linux one for all the rest. Add a USB drive & you can (should) swap to that.

      2. Manolo

        Re: fragmentation

        Most (all?) filesystems used on Linux do not suffer from fragmentation issues.

        1. Anonymous Coward
          Anonymous Coward

          Re: fragmentation


          1. Gary Stewart

            Re: fragmentation

            Whenever I run e2fsck on hard drive partitions (ext4) it usually shows fragmentation at less than 1%. For some reason SD cards and USB drives always seem to be around 14%. So at least as far as ext4 is concerned the original poster is correct.
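Incidentally, that check needs neither a real disk nor root; e2fsprogs works happily on an image file, which makes it easy to see the figure being described:

```shell
# build a small ext4 image and ask e2fsck for its fragmentation line
dd if=/dev/zero of=/tmp/fs.img bs=1M count=8 status=none
mke2fs -q -F -t ext4 /tmp/fs.img
e2fsck -fn /tmp/fs.img | grep 'non-contiguous'
```

On a freshly made filesystem the non-contiguous percentage is, unsurprisingly, at or near zero.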

            1. Phil O'Sophical Silver badge

              Re: fragmentation

              That may depend on the relative sizes of the devices, and how full they are. If the HDD is big enough that the file system can still find free contiguous space for new files you won't see much fragmentation.

        2. Paul Crawford Silver badge

          Re: fragmentation

          For small files (relative to the disk size) they do not, but swap files can be large!

          Also it is another thing that has to be turned off to unmount the file system cleanly, etc. A swap partition is not bothered by a hard reset as it never expected persistent internal structure anyway.

    4. Anonymous Coward
      Anonymous Coward

      "Starting from 17.04 Zesty Zapus release, instead of creating swap partitions, swapfiles will be used by default for non-lvm based installations"

    5. Muppet Boss

      >Does Ubuntu still default to swap files? I know that 17.10 and 18.04 did.

      Correct, both 20.04 and 20.10 use /swapfile as default with auto partitioning.

    6. Anonymous Coward
      Anonymous Coward

      Yes, Ubuntu does default to using swap files when you don't choose to install on LVM. Only an idiot would not know that. Linux is dying. Brits don't understand basic things about how their Ubuntu works.

  4. Anonymous Coward
    Anonymous Coward

    Just an observation

    “We had a very innocuous code cleanup and simplification that raised no red flags at all, but had a subtle and very nasty bug in it ..."

    When it comes to something as critical as the Linux Kernel, I would suggest that there is no such thing as an 'innocuous' change. Code cleanup and simplification warrants the same level of verification as changes made for any other reason.

    Obvious points, I know, but Linus's wording suggested (at least to me) that someone was overconfident about the simplicity of the changes and didn't look closely enough at what was being cleaned up.

    1. Graham Cobb Silver badge

      Re: Just an observation

      someone was overconfident about the simplicity of the changes and didn't look closely enough at what was being cleaned up

      Yes, and no. Of course, someone made a mistake. But all changes are reviewed on subsystem or kernel mailing lists and require a "Reviewed-By" certification from at least one other developer. But human beings are involved: changes deemed more significant will normally mean the subsystem leader requires more review and/or more testing before allowing it to go forward (although the bar for a rc1 is relatively low).

      What this really indicates in this case is that the test suite needs improving. Robots run automated tests on all patches but they only catch the things there are tests for. Even so, they catch a surprisingly high number of buggy patches: sometimes developers and reviewers become overconfident and don't run all the tests!

      1. Mike 16

        Re: Just an observation

        "Someone made a mistake"? Yep. Been that someone. The particular mistake I am thinking of was a case where I was asked to sprinkle some "performance dust" on a bit of a bottleneck. Moderately simple bit of code, so what could possibly go wrong? My "improved" code suffered random crashes that I lost some sleep (and hair) trying to find.

        Cutting to the chase, the original code had an "off by one" error that I assumed (yeah, I know) was intended for some subtle reason I did not need to worry about. After all, the original code ran just fine, passed tests, etc. Trouble was, that "off by one" error trashed a bit of memory that did not "belong" to it. A bit of memory used by a DMA driver, but hey, the original code ran slowly enough that the trashing happened after the DMA was finished. Tightening up that performance bottleneck made the corruption happen before the DMA used it.

        So "the old code worked" has to take the particular definition of "worked" into consideration.

  5. HSC Linux Guy

    Linus is maybe a tad incorrect?

    “And, as far as I know, all the normal distributions set things up with swap partitions, not files..." Um... that isn't entirely correct. I am sitting here looking at an army of Ubuntu 20.04 and 18.04 server installs that by default set up swap on the swapfile /swap.img. Mint 19.3 and Raspbian 9 also use a swapfile and not a partition, conveniently named /swapfile. It seems they moved away from swap partitions on Ubuntu after 16.04; I can't speak for Mint or Raspbian since I don't have older versions running anywhere.
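Checking which flavour a given box uses takes one command; /proc/swaps reports a Type of "partition" or "file" for each active swap area:

```shell
# header line only, if no swap is active at all
cat /proc/swaps
# equivalently: swapon --show=NAME,TYPE,SIZE,USED
```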

    1. ForthIsNotDead

      Re: Linus is maybe a tad incorrect?

      I'm running Mint 19.3 (Cinnamon) and it has a swap _partition_. It just installed itself that way. I.e. I went with the default options at install time (as far as I remember!)

    2. Jonathan Richards 1

      Re: Linus is maybe a tad incorrect?

      Depends what you mean by incorrect :) I don't believe that you would ever install an .rc1 on even the lowliest footsoldier of your server army, am I right?

    3. Boothy

      Re: Linus is maybe a tad incorrect?

      Hmm, didn't even know this!

      I built a headless Ubuntu 20.04 server last summer on an old Intel NUC, minimal default install, then just added stuff afterwards. Just had a quick look, and sure enough, there's a ~3.3 gig /swap.img file dated June 2020, which is when the 'server' was built!

      Thanks for the info, TIL.

  6. FuzzyTheBear

    keyword rc

    Release candidate, not release. That's meant for testing. As long as it doesn't make it into a release to be used at large, it's done its job.

    Catch bugs, correct them, then make another rc to test... Nice catch, boys.

  7. Binraider Silver badge

    This might sound dumb and a bit self-entitled. The average Steam hardware survey PC now has 16GB RAM. Do you really need a swapfile anymore? Servers with TB-range RAM are possible if you really do need that much.

    Some smaller embedded systems, or things like RasPi might still need one I guess. Do they really need that much RAM to begin with? If they do, do you really want the overhead of swapping to disk interrupting your embedded task? Or wearing out the SD card?

    I can see the insurance claim now for that embedded device: sorry, your autonomous driving system had to swap data to disk because you skimped on a couple of RAM chips, so the vehicle went into the wall before it had finished processing the data it fell behind on.

    Swap had its place on my '90s PCs, but it's been something I've deliberately disabled for well over a decade now.

    1. BenM 29 Silver badge

      >> The average steam hardware survey PC

      Which are, by definition, gamers PCs.... by no means the majority of PCs in the world.

      Be careful how you select your population sample otherwise you will get a poor representation of the whole population.

      1. Binraider Silver badge

        I did specifically cite the Steam survey because it's obviously a biased sample. That said, my pair of corporate junkware machines have 16 and 64GB respectively. The 16GB one is hopeless by the time it's loaded down with bloatware.

        At the other end of the scale, if anyone asks me to fix a netbook with 4GB RAM, Windows 10 and 16GB of storage, the only thing I'll ever do is offer to put Linux on it. There is no other fix for bloatware being shovelled onto underpowered hardware. 4GB with Linux happens to be just fine without a swap file for netbook-type usage.

        If you're running leftover desktop gear with 4GB or less, seriously, how old is it now? 5, maybe 10 years? Upgrade already. The only cases where I can think RAM-limited hardware has a legit reason to be in place running Linux are embedded applications, in which case swapping to disk is probably undesirable for performance reasons.

    2. DrXym


      Open up perfmon.exe in Windows and you can see how much the pagefile is used. Chances are if you load a decent sized game you'll see page file usage increase as background stuff is swapped out. These days on Windows you let the operating system figure out how much virtual memory it needs and it is a very bad idea to disable it - low memory situations can fail in all kinds of spectacular ways.

      As for embedded devices, the usual rule of thumb is you *don't* use swap files. You design the software around the physical memory constraints and you try your best to ensure you don't leak or fragment the heap. It doesn't stop you from enabling swap but normally you wouldn't in the finished product. The only time I've done it with a Pi was to build & debug some source code, but I'd have been better off cross-compiling it from somewhere else.

    3. jtaylor

      Binraider asked "Do you really need a swapfile anymore? Servers with TB range RAM are possible if you really do need that much RAM?" then answered "Swap had its place on my '90s PCs, but it's been something deliberately disabled for well over a decade now."

      Swap is a tool for managing virtual memory beyond the limits of physical RAM. I agree that in modern PCs, one can usually install enough RAM to run everything desired. Swap is still a useful tool with servers, though.

      For example, we recently needed to test some database operations in the lab. Our Big Iron uses swap to eke out every last bit of pageable memory, but in the lab I had to also add a few hundred GB of VM just to keep the Out-Of-Memory Killer quiet.

  8. Steve Graham

    A swap partition saved my life

    Well, saved my Linux installation at any rate.

    You know how you can create a bootable USB stick with dd?

    # dd if=image.img of=/dev/sdc

    or whatever.

    Well, I always think to myself "Hah! Better not type /dev/sda, because that's the boot disk!" Well, one day, I thought it, and then I did it. Trashed the partition table, of course - that was rebuildable - but luckily I interrupted it before it got into the second partition, which was the root filesystem.
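One cheap guard against exactly this, sketched here as a hypothetical safe_dd wrapper (the name and the check are mine, not any standard tool): refuse to write if the target device, or any partition on it, appears in /proc/mounts.

```shell
safe_dd() {
  target="$1"; image="$2"
  # "^/dev/sda" at line start also matches /dev/sda1, /dev/sda2, ...
  if grep -q "^$target" /proc/mounts; then
    echo "refusing: $target (or a partition on it) is mounted" >&2
    return 1
  fi
  echo "would run: dd if=$image of=$target bs=4M status=progress"
}
safe_dd /dev/sdz image.img   # sdz: presumably not your boot disk
```

It won't catch active swap devices or LVM physical volumes, mind; checking lsblk output as well would be more thorough.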

    1. M. Poolman

      Re: A swap partition saved my life

      Yep, screw your system with dd and then recover it.

      Sorts out the BoFHs from the PFYs!

  9. Dazed and Confused

    What is it with swap files

    I never bothered to check the actual code but I'd naively assumed that the VM code wrote to the swap files using the filesystem's normal file access functions, via the vnode layer or some such. But it seems this isn't always the case. I remember a well-known Unix version had a similar issue with an update to a bundled third-party FS. With a new release you could create a swap file and then a while later the FS would be trashed. In that case it tended to overwrite the superblock and completely wreck the FS. It was easy to demonstrate: it's hard to control the order in which pages get paged out to backing store, but they're trivial to label - fill a page with the text "this is page 1" etc... Overflow the RAM and read the resulting mess off the disk. In that case it was picked up internally before any customer reported the issue; I doubt any of their customers were using swap files by then.

    I guess the swap files will have just been added to the test suite, thus is progress made.

    1. Missing Semicolon Silver badge

      Re: What is it with swap files

      Yes, that was news to me, too. So you create a swap file, add it to swap, and one would naively assume that the VM code uses the file system to write to it. Instead, it seems it actually sniffs out the raw sectors of the file, and writes to them directly. I mean, what? I'm sure it goes faster, but you do rely on the FS behaving in the way that the kernel expects.
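You can actually see the kind of mapping swapon derives: it asks the filesystem for the file's physical extents (newer kernels go through the filesystem's swap_activate hook rather than raw bmap()) and then submits I/O straight to those blocks. filefrag shows the same extent list, where the filesystem supports the ioctl; a guarded demo, since neither the tool nor the ioctl is guaranteed to be present:

```shell
# create a small file and list the physical extents backing it,
# i.e. the blocks swap I/O would hit directly
dd if=/dev/zero of=/tmp/demo.img bs=1M count=4 status=none
if command -v filefrag >/dev/null 2>&1; then
  filefrag -v /tmp/demo.img 2>&1 || echo "FIEMAP/FIBMAP not supported here"
else
  echo "filefrag (e2fsprogs) not installed"
fi
```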

      1. Dazed and Confused

        Re: What is it with swap files

        OK, it's been decades for the Unix variant's case. Thinking back I think it looked like it was writing "raw" to the device, so I guess someone had screwed up between the file's vnode and the file's device vnode. Having fed all the details of the problem and how to reproduce it back through the appropriate channels I never bothered to follow up the details of the fix, other than to note it was very quick :-)

    2. Vometia Munro Silver badge

      Re: What is it with swap files

      I Am Not A Kernel Developer, but I'm vaguely aware of problems that can occur with paging and caching tripping over each other. ISTR FreeBSD's unified paging & caching system was a big deal, and that e.g. ZFS can still have "issues" if paging is invoked on something it's trying to cache. I dare say in some operating systems, where they want as much stuff as possible to live in paged memory, even the filesystem code could theoretically be paged out! So it seems there are reasons to avoid going through the filesystem, though whether there's any overlap between the reasons I've given and the actual reasons remains to be seen.

  10. John Browne 1

    So rc1 can go TITSUP

    Terminally Invasive Trashing by Swapfile of User Partition

    Sorry, I couldn't resist.
