For the first time in … well, ever, I’m running into some instability with my home lab. There are two major differences between my current platform and the previous ones: it’s running an AMD processor, and the RAM is non-ECC. I know the processor isn’t the problem because I pulled it from my own machine, where it ran 24/7. I have a similar machine that ran multiple Linux distributions, with the most recent being EndeavourOS. This is obviously not Debian, which is what Proxmox is based on, but I had KVM and Docker running on it so the use case was similar.
The next step is to check the RAM. However, rather than troubleshooting the individual modules, I’m just going to chuck four 32GB ECC modules in there instead. The modules are going to show up in a couple of days, so I won’t get that instant gratification, but my hope is that the instability will be resolved.
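Once the ECC modules are in, one way to see whether they are actually catching anything is to read the kernel's EDAC error counters. The following is a minimal sketch, not something from my current setup: it assumes the platform's EDAC driver is loaded and exposes memory controllers under `/sys/devices/system/edac/mc`, and the function name `read_edac_counts` is made up for illustration. If the directory is missing, it simply returns nothing.

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int, "ue": int}} for each memory controller.

    "ce" is the corrected-error count and "ue" the uncorrected-error count,
    as reported by the kernel's EDAC subsystem. An empty dict means no
    controllers were found (for example, no EDAC driver is loaded).
    """
    counts = {}
    root = Path(edac_root)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = {"ce": ce, "ue": ue}
    return counts


if __name__ == "__main__":
    results = read_edac_counts()
    if not results:
        print("No EDAC memory controllers found")
    for name, c in results.items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

A corrected-error count that keeps climbing would point at a marginal module or slot, while zeros across the board alongside continued crashes would suggest the RAM was never the culprit.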
Interestingly enough, my main machine achieves longer uptimes than my home lab server despite being on a rolling release. I’m using ZFS on both, and the RAM is the same speed, but the server has 128GB instead of 64GB. The server’s host OS shouldn’t be attempting to load any proprietary drivers: it has five Intel NICs and uses the built-in SATA ports. The Quadro P2200 is passed through to a VM, the proprietary NVIDIA drivers are not loaded, and the nouveau driver is blacklisted. My main machine has a Radeon RX 6750 XT, so no extra drivers are required there.
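The claim that neither nouveau nor the proprietary NVIDIA modules are loaded on the host is easy to double-check by parsing `/proc/modules`. Here is a rough sketch of that check; the helper name `watched_drivers_loaded` and the default watch list are my own assumptions for illustration, so adjust the names to whatever you expect to be absent.

```python
from pathlib import Path


def watched_drivers_loaded(proc_modules_text, watched=("nouveau", "nvidia")):
    """Return the sorted names of loaded modules that match the watch list.

    proc_modules_text is the content of /proc/modules, where the first
    whitespace-separated field on each line is the module name. A module
    matches if it equals a watched name or starts with it followed by an
    underscore (e.g. "nvidia_drm" matches "nvidia").
    """
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return sorted(
        name
        for name in loaded
        if any(name == w or name.startswith(w + "_") for w in watched)
    )


if __name__ == "__main__":
    modules_file = Path("/proc/modules")
    if modules_file.exists():
        hits = watched_drivers_loaded(modules_file.read_text())
        print("Unexpected GPU modules:", hits if hits else "none")
    else:
        print("/proc/modules not available on this system")
```

An empty result on the Proxmox host would be consistent with the blacklist doing its job; anything listed means a GPU driver is grabbing the card before the VM gets it.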
Anyway, watch this space. Let’s see if the new modules address the issues.