Re: the caching services ZFS requires
I’ll try to illustrate my position on Proxmox as best I can. It’s a useful exercise to lay out the position I find myself in, as we too are evaluating it as a potential alternative to a small VMware cluster setup of our own (3–4 hosts, 80 VMs), and for its potential to live out on client sites and be maintained by all our engineers (yes, spurred on by Broadcom). And I’m not dead against it, but it often doesn’t do itself any favours.
There was some hyperbole on my part in calling it ‘garbage’ – but it was a short and mildly humorous mention in the context of why I’m wary of ZFS on Linux. I run Linux distros on the five PCs that I use in and out of work and, potentially, some of them could present a ‘use case’ for ZFS (encryption, compression, volume management, cache vdevs), but I’ve never committed to it because it feels like more of an overhead to look after than it’s worth. And I do mean feel – it’s the position I’ve come to from my exposure to the tech.
We actually wouldn’t run local ZFS on our own hypervisors anyway.
I’ll try to elaborate with some examples I’ve found. I accept that, in most cases, the answer would be ‘but if you did x, y and z then it would be fine’ but, when it comes to rolling this out and maintaining it, that doesn’t fly for us. And this is my first problem with Proxmox: the attitude. If I call an (old) Land Rover Defender unreliable somewhere on the internet, there’s a horde of Nevilles and Jeffs ready to tell me “they’re perfectly reliable if you strip them down, rewire them, fit this extensive list of after-market parts and, whilst you’re at it, you may as well go all the way and galvanise the chassis”. Proxmox too has an army of zealots, and they create so much noise with “you’re doing it wrong!” that getting meaningful feedback (and possibly improving it as a product) is near impossible. I suspect the vast majority of them think Linus Tech Tips is a good source of information.
Having a bad community (or at least one ‘crowded’ with the aforementioned zealots – I’m sure many or most users are fine! The FreeNAS forum has/had a user with an astonishingly high post count who posted far too often with a tone of ‘authority’ that took some time to learn to ignore!) isn’t the end of the world if there’s decent documentation. But the Proxmox documentation itself is poor enough that it contains information copy/pasted from the Arch Linux wiki that hasn’t even been corrected to work in a typical Proxmox setup.
For example: I have systems with heavy memory overcommitment by design. Many, many people have managed to make Proxmox unstable by running the system memory low and/or forgetting ZFS’s behaviour when they first start to use it. They seek help and get met with ‘Well don’t do it! Stop being so cheap! What kind of provider sells more memory than they have!?’
If I’ve got 64 VMs that need 512MB of active RAM when idle and 8GB when in use, but they only get used every couple of months and rarely more than one or two at a time, having 512GB of RAM is ridiculous when I can sit comfortably in 64GB – it’s why we have page files (or vswap) and balloon drivers. This is a stable configuration on VMware and it can be stable on KVM, so it can definitely work on Proxmox. If you get this ‘wrong’ on VMware (e.g. the machines are more active than you planned, or you boot them all at once), you’ll slow it to a crawl. If you try it following a default Proxmox setup, you’ll just have Linux killing VM processes. Some people won’t be able to easily find out why.
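For what it’s worth, the floating memory setup I’m describing maps onto Proxmox’s `qm` CLI something like this – a sketch only, with a made-up VMID, and it assumes the guest actually has the virtio balloon driver loaded:

```shell
# Hypothetical VMID 101: let the guest float between 512MB (idle)
# and 8GB (active) via the virtio balloon driver.
qm set 101 --memory 8192 --balloon 512

# Ballooning only reclaims memory if the driver is running inside
# the guest (virtio_balloon on Linux guests; the virtio drivers
# package on Windows guests).
```

Of course, this only shifts the problem: the host still needs swap and a capped ARC to survive everything ballooning up at once, which is exactly where the defaults let you down.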
Un-tuned ZFS makes this worse.
In an ‘early’ setup, a client had a single host, local ZFS storage, 16GB of RAM and two VMs allocated 8GB each. This was installed by just following the setup and letting it do its thing. Of course, one of the VMs would just shut off the moment there was high IO. Out of the box, ZFS was using 8GB and there was no page file configured. We adjusted ZFS down to use 4GB (to maintain SOME storage speed – this was a simple file server setup using large/slow drives), allocated the VMs 6GB each (still plenty for the job) and ran like that. This reduced the frequency of the killed VMs from ‘constant’ to ‘occasional’ but was still sailing too close to the wind.
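Capping ARC like that is a one-liner, for anyone who hits the same wall – this is roughly what we did, using the standard OpenZFS `zfs_arc_max` module parameter (the 4GB value here is just this example’s figure):

```shell
# Cap the ZFS ARC at 4GB (value is in bytes: 4 * 1024^3).
# Persist the limit across module loads:
echo "options zfs zfs_arc_max=4294967296" > /etc/modprobe.d/zfs.conf

# Apply it to the running system without a reboot:
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max

# On Proxmox/Debian with root on ZFS, refresh the initramfs so the
# limit also applies at early boot:
update-initramfs -u
```

Which is fine once you know to do it – but the installer doesn’t do it for you, and half the RAM going to ARC by default is exactly what killed that first VM.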
So I pulled the host back in for a rebuild – set it back up again with ZRAM swap and some flash storage that could be used for additional swap and some ZIL/ARC, and gave the VMs 16GB each just to stress test them. It runs fine in most circumstances (and is how I run some KVM machines on Debian without the involvement of Proxmox), but ZFS could still rear its head under super heavy IO. It would grab RAM faster than the system would page out, which is where I was coming from in saying that ZFS doesn’t feel like an integrated solution on Linux. As it stands, giving ZFS 2GB and the VMs 6GB each, then turning the swappiness up, is working at a decent speed and is stable. But, for example, if someone needed to run up a quick VM for some other purpose and didn’t think it all through, we’d be back to knocking the system over.
This is an older issue, so may have changed, but we also logged an improvement suggestion for the UI, as the menu layout caught me and others out on occasion. There’s reset and restart for a VM. This is fairly usual for hypervisors: ‘restart’ has the guest OS restart itself, and ‘reset’ kills the VM and starts it again – like the reset switch. Fine. However, ‘reset’ was/is missing from one of the context menus (right click?) and it’s only on the button in one of the other panels. Every other option is in both menus, but NOT ‘reset’. It’s misleading. We logged this and the devs’ response was ‘but it’s in the other place, use that’. Crap.
Accidentally using the ‘restart’ button on a VM whose guest OS is hung then leads to the hypervisor waiting for it to respond – which it won’t do, because it’s hung. And then you can’t use the ‘reset’ option until that’s clear. To clear it on a cluster, you need to SSH into one of the hosts to locate and clear a lock file before you then kill off the process. It’s a silly little niggle in UI design that sends you right back to the command line for five minutes.
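For the record, the command-line dance it sends you back to looks roughly like this – hypothetical VMID again, and the lock file path is what I believe current Proxmox uses, so check your own host:

```shell
# Hypothetical VMID 101. Cancel the hung shutdown task in the UI
# (or let it time out), then clear the VM's lock and force-stop it:
qm unlock 101
qm stop 101

# If that still hangs, the per-VM lock file lives under
# /run/lock/qemu-server/ (e.g. lock-101.conf) and can be removed
# by hand before killing the kvm process for that VMID.
```

Trivial once you know it, but it’s exactly the sort of thing a generalised engineer shouldn’t need to know to recover from a mis-click.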
Most of the above can be sorted by someone with some Linux/hypervisor/computing background, but it prevents me from being confident handing an ISO to a generalised engineer and saying ‘go use this’, because the ‘default’ configuration you’ll end up with is poor enough that it’ll fall over. I still don’t feel like I’ve come to a truly ‘nice’ setup yet.
When I find myself just going straight back to the command-line or manually configuring things on it, I wonder how much I’m really getting from using it in the first place. When I can run the same set of VMs up on my generic Ubuntu desktop PC 3 times over without having to tweak anything, why have I made my life harder with PVE? For some menus and graphs? I love being lazy and using an out of the box product, but this isn’t it.
To me, it’s currently not convenient. That’s a fine place to start for a beta and some homelab tinkering, but they’re selling the thing and calling it version 8. The current attitude from the devs and the user base (again, the loud ones) makes me worry it won’t be driven towards a bit of polish for some time.
Maybe I’m just commenting on a common FOSS problem. If something isn’t polished, there’s still that ‘that’s because you don’t know what you’re doing’ attitude from some. At least the Windows people out there are used to ‘yeah, it’s a bit crap like that’.