Re: the caching services zfs requires
Cool, that's all well reasoned and gives food for thought. Thanks. :)
With the setup I'm testing (a small hyperconverged cluster using Ceph for VM storage), there have definitely been some learning experiences.
It's still way too early in my testing/learning process to be comfortable rolling it out to production, as there are times I've gotten the underlying Ceph storage "stuck" or unresponsive... and ended up having to rebuild the cluster.
But I'm rapidly getting better at understanding how the underlying pieces operate, and at being able to unstick a cluster that's frozen (etc). So I'm hopeful it'll turn out to be workable. :)
Came across this situation of yours just yesterday, and found an easier solution:
<quote> Accidentally using the ‘restart’ button on a VM that’s guest OS is hung then leads to the hypervisor waiting for it to respond – which it won’t do because it’s hung. And then you can’t use the ‘reset’ option until that’s clear. To clear it on a cluster, you need to SSH into one of the hosts to locate and clear some lock file before you then kill off the process. It’s a silly little niggle in UI design that then sends you right back to the command line for 5 minutes.</quote>
I accidentally did the "restart" thing on a stuck VM yesterday too, which then promptly wedged and blocked subsequent operations.
(At least with Proxmox 8) double-clicking the log entry for that initial wedged "restart" operation at the bottom of the Proxmox GUI opens a progress dialog where you can see it doing nothing. There's a "stop" button in that dialog.
Clicking that stop button (and giving it a few seconds) seems to correctly cancel the wedged restart job, allowing new actions (like a hard power off or whatever) to function.
That being said, I'm still pretty new to Proxmox and have only used Proxmox 8. No idea if that stop button is a new thing, or was just unreliable previously, etc. :)