Re: the caching services ZFS requires
I’ll try to illustrate my position on Proxmox as best I can. It’s a useful exercise to lay out the position I find myself in, as we too are evaluating it as a potential alternative to a small VMware cluster setup of our own (3–4 hosts, 80 VMs), and for its potential to live out on client sites and be maintained by all our engineers (yes, spurred on by Broadcom). And I’m not dead against it, but it often doesn’t do itself any favours.
There was some hyperbole on my part in calling it ‘garbage’ – but it was a short and mildly humorous mention in the context of why I’m wary of ZFS on Linux. I run Linux distros on the five PCs that I use in and out of work and, potentially, some of them could present a ‘use case’ for ZFS (encryption, compression, volume management, cache vdevs), but I’ve never committed to it because it feels like more of an overhead to look after than it’s worth. And I do mean feel – it’s the position I’ve come to from my exposure to the tech.
We actually wouldn’t run local ZFS on our own hypervisors anyway.
I’ll try to elaborate with some examples I’ve found. I accept that, in most cases, the answer would be ‘but if you did x, y and z then it would be fine’ but, when it comes to rolling this out and maintaining it, that doesn’t fly for us. And this is my first problem with Proxmox: the attitude. If I call an (old) Land Rover Defender unreliable somewhere on the internet, there’s a horde of Nevilles and Jeffs ready to tell me “they’re perfectly reliable if you strip them down, rewire them, fit this extensive list of after-market parts and, whilst you’re at it, you may as well go all the way and galvanise the chassis”. Proxmox too has an army of zealots, and they create so much noise with “you’re doing it wrong!” that getting meaningful feedback (and possibly improving it as a product) is near impossible. I suspect the vast majority of them think Linus Tech Tips is a good source of information.
Having a bad community (or at least one ‘crowded’ with the aforementioned zealots – I’m sure many or most users are fine! The FreeNAS forum has/had a user with an astonishingly high post count who posted far too often with a tone of ‘authority’ that took some time to learn to ignore!) isn’t the end of the world if there’s decent documentation. But the Proxmox documentation itself is poor enough that it contains information copy/pasted from the Arch Linux wiki that hasn’t even been corrected to work in a typical Proxmox setup.
For example: I have systems with heavy memory overcommitment by design. Many, many people have managed to make Proxmox unstable by running the system memory low and/or forgetting ZFS’s behaviour when they first start to use it. They seek help and get met with ‘Well don’t do it! Stop being so cheap! What kind of provider sells more memory than they have!?’
If I’ve got 64 VMs that need 512MB of active RAM when idle and 8GB when in use, but they only get used every couple of months and rarely more than one or two at a time, having 512GB of RAM is ridiculous when I can sit comfortably in 64GB – it’s why we have page files (or vswap) and balloon drivers. This is a stable configuration on VMware and it can be stable on KVM, so it can definitely work on Proxmox. If you get this ‘wrong’ on VMware (e.g. the machines are more active than you planned, or you boot them all at once), you’ll slow it to a crawl. If you try it following a default Proxmox setup, you’ll just have Linux killing VM processes. Some people won’t be able to easily find out why.
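For what it’s worth, the floating memory setup I’m describing maps onto Proxmox’s `qm` CLI something like this – a sketch only, with a made-up VMID, and it assumes the guest actually has the virtio balloon driver loaded:

```shell
# Hypothetical VMID 101: let the guest float between 512MB (idle)
# and 8GB (active) via the virtio balloon driver.
qm set 101 --memory 8192 --balloon 512

# Ballooning only reclaims memory if the driver is running inside
# the guest (virtio_balloon on Linux guests; the virtio drivers
# package on Windows guests).
```

Of course, this only shifts the problem: the host still needs swap and a capped ARC to survive everything ballooning up at once, which is exactly where the defaults let you down.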
Un-tuned ZFS makes this worse.
In an ‘early’ setup, a client had a single host, local ZFS storage, 16GB of RAM and two VMs allocated 8GB each. This was installed by just following the setup and letting it do its thing. Of course, one of the VMs would just shut off the moment there was high IO. Out of the box, ZFS was using 8GB and there was no page file configured. We adjusted ZFS down to use 4GB (to maintain SOME storage speed – this was a simple file server setup using large/slow drives), allocated the VMs 6GB each (still plenty for the job) and ran like that. This reduced the frequency of the killed VMs from ‘constant’ to ‘occasional’ but was still sailing too close to the wind.
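Capping ARC like that is a one-liner, for anyone who hits the same wall – this is roughly what we did, using the standard OpenZFS `zfs_arc_max` module parameter (the 4GB value here is just this example’s figure):

```shell
# Cap the ZFS ARC at 4GB (value is in bytes: 4 * 1024^3).
# Persist the limit across module loads:
echo "options zfs zfs_arc_max=4294967296" > /etc/modprobe.d/zfs.conf

# Apply it to the running system without a reboot:
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max

# On Proxmox/Debian with root on ZFS, refresh the initramfs so the
# limit also applies at early boot:
update-initramfs -u
```

Which is fine once you know to do it – but the installer doesn’t do it for you, and half the RAM going to ARC by default is exactly what killed that first VM.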
So I pulled the host back in for a rebuild – set it back up again with ZRAM swap and some flash storage that could be used for additional swap and some ZIL/ARC, and gave the VMs 16GB each just to stress test them. It runs fine in most circumstances (and is how I run some KVM machines on Debian without the involvement of Proxmox), but ZFS could still rear its head under super heavy IO. It would grab RAM faster than the system would page out, which is where I was coming from in saying that ZFS doesn’t feel like an integrated solution on Linux. As it stands, giving ZFS 2GB and the VMs 6GB each, then turning the swappiness up, is working at a decent speed and is stable. But, for example, if someone needed to run up a quick VM for some other purpose and didn’t think it all through, we’d be back to knocking the system over.
This is an older issue, so may have changed, but we also logged an improvement suggestion for the UI, as the menu layout caught me and others out on occasion. There’s reset and restart for a VM. This is fairly usual for hypervisors: ‘restart’ has the guest OS restart itself, and ‘reset’ kills the VM and starts it again – like the reset switch. Fine. However, ‘reset’ was/is missing from one of the context menus (right click?) and it’s only on the button in one of the other panels. Every other option is in both menus, but NOT ‘reset’. It’s misleading. We logged this and the devs’ response was ‘but it’s in the other place, use that’. Crap.
Accidentally using the ‘restart’ button on a VM whose guest OS is hung then leads to the hypervisor waiting for it to respond – which it won’t do, because it’s hung. And then you can’t use the ‘reset’ option until that’s clear. To clear it on a cluster, you need to SSH into one of the hosts to locate and clear a lock file before you then kill off the process. It’s a silly little niggle in UI design that sends you right back to the command line for five minutes.
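For the record, the command-line dance it sends you back to looks roughly like this – hypothetical VMID again, and the lock file path is what I believe current Proxmox uses, so check your own host:

```shell
# Hypothetical VMID 101. Cancel the hung shutdown task in the UI
# (or let it time out), then clear the VM's lock and force-stop it:
qm unlock 101
qm stop 101

# If that still hangs, the per-VM lock file lives under
# /run/lock/qemu-server/ (e.g. lock-101.conf) and can be removed
# by hand before killing the kvm process for that VMID.
```

Trivial once you know it, but it’s exactly the sort of thing a generalised engineer shouldn’t need to know to recover from a mis-click.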
Most of the above can be sorted by someone with some Linux/hypervisor/computing background, but it prevents me from being confident handing an ISO to a generalised engineer and saying ‘go use this’, because the ‘default’ configuration you’ll end up with is poor enough that it’ll fall over. I still don’t feel like I’ve come to a truly ‘nice’ setup yet.
When I find myself just going straight back to the command-line or manually configuring things on it, I wonder how much I’m really getting from using it in the first place. When I can run the same set of VMs up on my generic Ubuntu desktop PC 3 times over without having to tweak anything, why have I made my life harder with PVE? For some menus and graphs? I love being lazy and using an out of the box product, but this isn’t it.
To me, it’s currently not convenient. That’s a fine place to start for a beta and some homelab tinkering, but they’re selling the thing and calling it version 8. The current attitude from the devs and the user base (again, the loud ones) makes me worry it won’t be driven towards a bit of polish for some time.
Maybe I’m just commenting on a common FOSS problem. If something isn’t polished, there’s still that ‘that’s because you don’t know what you’re doing’ attitude from some. At least the Windows people out there are used to ‘yeah, it’s a bit crap like that’.