Well that was quick

The replacement RAM arrived today, and swapping it in reminded me of one of the reasons I didn’t particularly enjoy building the server in the case I chose. Ultimately, though, I did replace the modules, so the server’s now running on 128GB of ECC memory. The previous 32GB modules are sitting in their packages, and I plan on testing each one individually with memtest86+ to see if they’re any good. Not entirely sold on the idea of sticking them into my main rig at the moment, though.

The case I used for this server is a Fractal Design Define 7. It’s the one with the solid side panels and has excellent noise reduction features. It’s also quite versatile, in that different accessories can be installed depending on the end-user’s needs. All of the required accessories come with the case, too — no need to go out and buy extra stuff to set it up just right.

The Define 7 has three mesh filters: one on the bottom, one behind the front panel, and one on top. In the stock layout the one on top is unused, as the top panel is solid; however, there’s an optional vented top panel that can be used if the end-user wants to use a top-mounted radiator. That’s the route I took for this build: because the motherboard I chose has its AM4 socket rotated 90 degrees compared to other B550-based boards, installing an air cooler would be awkward. A 280mm Arctic Cooling Liquid Freezer II all-in-one liquid cooler is mounted to the top of the case and, despite having that tiny fan on the block, is completely silent during normal operation.

This decision is what caused some annoyance. See, because the CPU socket is rotated 90 degrees, the RAM is also rotated 90 degrees. When you hold a standard AMD-based ATX motherboard with the CPU socket near the top edge, the RAM slots are typically vertically aligned to the right of the socket. The ASRock Rack B550D4-4L board has its four DIMM sockets above the CPU socket, aligned horizontally. An AIO cooler mounted to the top panel of the case ends up covering three of the four RAM slots.

One key fact I left out, though, is that the top of the Define 7 consists of three layers: the top panel (which, as you recall, is the vented one), the filter, and a mounting plate for AIO coolers or fans. This mounting plate is removable, so luckily, I was able to remove 3 screws, pull the AIO’s radiator and fans out, and swap the RAM.

There was, however, another snag: the server contains five 3.5″ hard disks. To accommodate these I had to configure the case in “storage mode”: I removed about a third of the motherboard tray, fitted five drive sleds, and reinstalled the removed section on the opposite side to hold the sleds in place. The AIO mounting plate bolts to the top of this panel, giving it the extra rigidity needed to hold those relatively heavy 3.5″ drives in place.

So … this time around, it wasn’t as simple as pulling the side panel off, pulling out the old DIMMs, and socketing the new ones in, but thanks to the excellent design decisions implemented in the Fractal Design Define 7 case, it wasn’t as difficult as it could have been.

As a side note, I had to replace a failed DIMM in a 1U server the other day as well, which made me appreciate the engineering and design work that goes into those machines. Cable management arms are spectacularly useful despite impeding exhaust airflow; server manufacturers provide diagrams on the inside of the removable top panel; and BMCs tell you exactly which part failed, so you know what to replace. So, despite being more complex to use than your average desktop PC, they’re a lot easier to troubleshoot.

Now, we wait. The issue will surface in the coming days, if at all, but until then all I can do is observe.

Minor Instability

For the first time in … well, ever, I’m running into some instability with my home lab. There are two major differences between my current platform and the previous ones: it’s running an AMD processor, and the RAM is non-ECC. I know the processor isn’t the problem because I pulled it from my own machine, where it ran 24/7. I have a similar machine that ran multiple Linux distributions, with the most recent being EndeavourOS. This is obviously not Debian, which is what Proxmox is based on, but I had KVM and Docker running on it so the use case was similar.

The next step is to check the RAM. However, rather than troubleshooting the individual modules, I’m just going to chuck four 32GB ECC modules in there instead. The modules should show up in a couple of days, so I won’t get that instant gratification, but my hope is that the instability will be resolved.

Interestingly enough, my main machine is able to achieve longer uptimes than my home lab server despite being on a rolling release. I’m using ZFS on both, and the RAM is the same speed, but the server has 128GB instead of 64GB. The server’s host OS shouldn’t be attempting to load any proprietary drivers, as it has five Intel NICs and uses the onboard SATA ports; the Quadro P2200 is passed through to a VM, the proprietary NVIDIA drivers are not loaded, and the nouveau driver is blacklisted. My main machine has a Radeon RX 6750 XT, so no extra drivers are required there.
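In case anyone’s wondering what that blacklisting amounts to, it’s just a short modprobe file plus an initramfs rebuild. Here’s a minimal sketch written as an Ansible play, since that’s my automation tool of choice; the host group is a placeholder and the paths are the stock Debian ones, nothing Proxmox-specific.

  ---
  # Minimal sketch: keep nouveau from grabbing the Quadro on the host.
  # "proxmox" is a placeholder group name.
  - hosts: proxmox
    become: true
    tasks:
      - name: Blacklist the nouveau driver
        ansible.builtin.copy:
          dest: /etc/modprobe.d/blacklist-nouveau.conf
          content: |
            blacklist nouveau
            options nouveau modeset=0
          mode: "0644"

      - name: Rebuild the initramfs so the blacklist applies at boot
        # Runs on every play; harmless, but not strictly idempotent.
        ansible.builtin.command: update-initramfs -u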

Anyway, watch this space. Let’s see if the new modules address the issues.

Minor Cleanup

I took the liberty of transferring some roles between VMs and have ended up with a slightly more streamlined homelab. Now that the 8GB Pi 4B is serving as the primary Pi-Hole instance, I was able to get rid of one of the LXC containers. I’ve also managed to get rid of the GitLab server, as I have another LXC container running Gitea, which is quite a bit lighter and fits my use case better.

The biggest change was the move away from that ancient version of Ansible AWX to the plain CLI. I had originally set it up on the machine that serves as my Tailscale endpoint, but realized I had too much stuff on there and decided to split it off. Interestingly, the version of Ansible in Rocky Linux 9’s repos is newer than the one in Ubuntu 22.04; by adding the Ansible PPA on the Ubuntu side, though, I was able to get a newer version still and make sure my playbooks keep working.
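Adding the PPA amounts to one repository entry and an upgrade. Here’s a rough sketch of it as a play run against the control node itself; targeting localhost is just one way to do it.

  ---
  # Rough sketch: pull in the Ansible PPA on the Ubuntu 22.04 control
  # node and upgrade to the newer build.
  - hosts: localhost
    connection: local
    become: true
    tasks:
      - name: Add the official Ansible PPA
        ansible.builtin.apt_repository:
          repo: "ppa:ansible/ansible"

      - name: Install or upgrade Ansible from the PPA
        ansible.builtin.apt:
          name: ansible
          state: latest
          update_cache: true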

Another role I’m working on splitting off is the renewal of my Let’s Encrypt certificates. By far the easiest way I’ve found to get that working is to use the certbot Snap; yes, it works on Rocky 9, but as it’s Canonical’s baby, I figured I’d just chuck it on a lightweight LXC running Ubuntu.
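Getting it onto the Ubuntu LXC is only a few tasks’ worth of work; below is a rough sketch. The inventory name is a placeholder, it assumes the community.general collection is available, and it stops short of the actual certificate request, since that part depends on how you handle the challenges.

  ---
  # Rough sketch: certbot via the snap on the Ubuntu LXC.
  # "certbot_lxc" is a placeholder inventory name.
  - hosts: certbot_lxc
    become: true
    tasks:
      - name: Make sure snapd is present
        ansible.builtin.apt:
          name: snapd
          state: present
          update_cache: true

      - name: Install certbot from the snap (classic confinement)
        community.general.snap:
          name: certbot
          classic: true

      - name: Put certbot on the PATH
        ansible.builtin.file:
          src: /snap/bin/certbot
          dest: /usr/bin/certbot
          state: link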

By the time I have that working, there probably won’t be a need for Rocky Linux in my homelab at all. That’s no knock against it: it’s quite stable and a great drop-in replacement for CentOS, but I don’t think my use case requires it. With just the changes above, my server’s memory utilization has dropped from 84GB to 37GB.

As for the playbooks, they’re pretty simple. One applies updates to all VMs/LXCs; one cleans the apt cache (because most of my VMs/LXCs are DEB-based); and one updates Pi-Hole weekly. I have another playbook that reboots VMs if they need it, but that one’s not on a schedule.
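For the curious, they all look roughly like the sketch below, which mashes the update and cleanup tasks into a single play for brevity; the group name is a placeholder for whatever’s in your inventory.

  ---
  # Roughly what the update/cleanup playbooks amount to. "debian_like" is a
  # placeholder group covering the Debian/Ubuntu VMs and LXCs.
  - hosts: debian_like
    become: true
    tasks:
      - name: Apply all pending updates
        ansible.builtin.apt:
          update_cache: true
          upgrade: dist

      - name: Clean the apt cache and remove leftover packages
        ansible.builtin.apt:
          autoclean: true
          autoremove: true

The Pi-Hole one isn’t much more exciting: a single command task wrapping pihole -up, run weekly.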

The Rest of Them

There are computers all over my house. Most of my time is spent between my laptop and desktop; however, there’s also an assortment of Raspberry Pis and Intel NUCs.

Since my last post I’ve reimaged an 8GB Raspberry Pi 4B with the 64-bit version of Raspberry Pi OS Lite and installed Pi-Hole on it. There’s also a 4GB Raspberry Pi 4B in a Seeed Studio DeskPi Pro that serves as a portable Jellyfin server for use during glamping, and a 2GB Pi 4B running OctoPrint. A Pi Zero W sits on my nightstand with a small E-Ink display, giving me the current weather and the day’s forecast, updated every 15 minutes, and a Pi Zero non-W with the HQ Camera strapped to it acts as my webcam. I built a PiGRRL 2.0 with a Pi 3B, and I have a 3A on standby that I want to use for a future project.

The NUCs are an i3-6100U that serves as a second portable Jellyfin glamping box, and an i5-4250U that was a TrueNAS Scale box. That one’s off now because there’s some weird hardware issue that I haven’t bothered to troubleshoot yet.

I’ve given my kids a couple of my previous gaming rigs. My son’s running an i7-4930k with 64GB of RAM, a 500GB SATA SSD, and a GeForce RTX 2070. That box is running Linux Mint Debian Edition. My daughter’s on an i7-2600k with 32GB of RAM, a 240GB SATA SSD, and a GeForce GTX 970, with Pop!_OS 22.04.

My wife’s on a 2nd-generation Lenovo ThinkPad X1 Yoga (i7-7600U, 16GB RAM, 512GB NVMe SSD) with Windows 11, which does decently well despite running on “unsupported” hardware. I’m pretty sure that’s the only physical machine running Windows, and that’s only because it has the license embedded in its firmware. My own laptop is a Lenovo ThinkPad T460 composed of a hodgepodge of eBay parts — it’s currently an i5-6300U with 16GB of RAM and a 960GB SATA SSD. I distro hop a lot on this laptop, so it has a 100GB btrfs boot volume, and the home partition that takes up the rest of the drive is ext4. The laptop’s currently on EndeavourOS, though it’s had Pop!_OS, Rocky Linux, Debian, openSUSE Leap, and NixOS on it before.

That brings us to the main rig, which is sort of a Ship of Theseus that’s been following me around since 2004 or so. I don’t believe it has any of the original parts left, though. Currently, it’s:

  • CPU: AMD Ryzen 9 5900X (cooled by Noctua NH-D15)
  • RAM: 64GB Corsair Vengeance LPX, DDR4-3200
  • Motherboard: Gigabyte X570 Aorus Elite w/discrete TPM (they were dirt cheap on Amazon)
  • Boot SSD: 1TB Corsair Force MP600, split into 100GB btrfs and the rest ext4
  • Data SSD 1: 1TB Samsung 970 Evo Plus
  • Data SSD 2: 1TB Crucial MX500 SATA
  • Low-Speed Data: 2x 6TB WD Red Pro (ZFS mirror)
  • GPU: AMD Radeon RX 6750 XT (the OEM one from amd.com, not a board partner’s)
  • Add-In Card: Blackmagic Intensity Pro 4K
  • Case: Fractal Design Torrent, gray with dark tempered glass, with RGB fans
  • Power Supply: Corsair HX850

The OS on this one is also EndeavourOS. That btrfs boot volume has bailed me out multiple times, and this thing runs games really well. I have KVM installed on the machine but use VMware Workstation when running Windows VMs, as 3D graphics perform better there. There are two Windows VMs: one running Windows 10 (for Fusion 360 and Garmin Express) and the other running the 32-bit version of Windows 7 (for an ancient photo scanner). I don’t currently have any Linux VMs, but when I do need them, they go on KVM.

So, there we go. Something of a menagerie, but it handles everything I can throw at it with zero trouble.

The Lab – VMs

No homelab running Proxmox is complete without a bunch of VMs. I’ve gone a bit heavier on LXC containerization this time, though, because I can cram more services onto the box.

I generally prefer to run Debian wherever I can. It’s very lightweight out of the box, and it can run anything I throw at it without any trouble. I don’t need a brand-new kernel, or the absolute latest packages — I can containerize whatever needs newer stuff anyway. Right now, LXC is responsible for Pi-Hole, Squid (on Alpine), Homebridge, a Minecraft server, and source control using Gitea.

I typically throw experimental or heavier stuff on VMs. The single heaviest service I’m currently running is Plex, as that one has a GPU passed through to it. It’s running Ubuntu and is one of the few things I migrated from the old server. Yeah, I could have just moved the media over to a fresh install, but redoing the metadata and playlists would have been more work than I wanted to do at the time.
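On the Proxmox side, the passthrough itself is a single line of VM config. It looks something like the sketch below, where the VM ID and PCI address are placeholders, pcie=1 assumes the VM uses the q35 machine type, and the usual IOMMU/vfio groundwork has already been done on the host.

  ---
  # Rough sketch: attach the GPU to the Plex VM from the Proxmox host.
  # VM ID (101) and PCI address (0000:0a:00.0) are placeholders.
  - hosts: proxmox
    become: true
    tasks:
      - name: Pass the Quadro through to the Plex VM
        ansible.builtin.command: qm set 101 --hostpci0 0000:0a:00.0,pcie=1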

I currently have two VMs running Docker: one on Debian and the other on Ubuntu. I’m migrating everything from the Ubuntu one to the Debian one because I just prefer Debian. My previous source control system, GitLab, is still hanging around on a Debian VM. My most recent addition is an Ubuntu VM for Nextcloud, which is going to take over some duties from the Ubuntu-based Docker VM; the Nextcloud Snap is trivially easy to set up. The last VM is running Rocky Linux 9 and is how I get into the house remotely, via Tailscale.

The old Ubuntu VM is running an ancient version of Ansible AWX that keeps everything except the Rocky Linux VM up-to-date and happy. The Rocky VM updates itself as needed, including live-patching its kernel.

Even with all of this, the server’s running at less than 50% memory utilization and less than 2% CPU. The Ryzen 9 3900X is overkill for all of this, as is the 128GB of RAM I threw into the box, but I’ve got plenty of room to play around.

So … pretty plain, really, but it does everything I need it to.

The Lab – Networking

I can’t say I’ve ever had a formal networking setup in the lab. I went through consumer routers pretty quickly, and invariably overwhelmed them with … something. I’m not sure what kept killing their ability to pass traffic, but the fact remains that after a few days, traffic would stop. The first consumer router that didn’t do that was the AmpliFi HD, which is Ubiquiti’s first consumer device, but there were practically no configuration options in the firewall, so that wasn’t going to help me all that much.

During this time I started using some additional Ubiquiti equipment: three bullet cams mounted on the house, the UniFi Cloud Key Gen2+ as a network video recorder, and an 8-port PoE switch to run all four of these devices. The UniFi Network Controller seemed really nice but I couldn’t use any of the functionality, as I effectively only had a switch.

Around that time I found the UniFi Dream Machine. It looks pretty fancy, like a big, white Dr. Mario pill, but it combines the functionality of several Ubiquiti devices into one. The Cloud Key Gen2+ was already acting as the controller, so I had to transition that role over, leaving the Cloud Key as a simple NVR and the UDM as the controller. This gave me the ability to implement some more sophisticated network configurations: all of a sudden I had IDS, some basic layer 7 inspection, VLANs, and the ability to restrict communications between VLANs. I also had to replace the AmpliFi HD’s mesh points, as you can’t use Ubiquiti’s consumer devices with their business gear; the UDM has a built-in access point, but I wanted to make sure my whole house was covered.

Unfortunately, the UDM wasn’t enough either. The device runs a lightweight Kubernetes distribution of some sort, which developed a memory leak; I had to run an Ansible playbook every night to restart its container, or the network controller would simply stop responding. Once that was fixed in a UniFi OS patch, something caused the CPU utilization and temperature to rise and stay high, resulting in high fan RPMs and lots of noise. Concurrent with that, the connection would drop for a few minutes at a time.
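For posterity, that nightly restart was about as simple as playbooks get: roughly the sketch below, which shells out to the unifi-os helper the UDM exposed over SSH at the time, and uses the raw module so Ansible doesn’t need Python on the device.

  ---
  # Rough sketch of the nightly band-aid. "udm" is a placeholder inventory
  # host; raw is used because the UDM itself doesn't carry Python.
  - hosts: udm
    gather_facts: false
    tasks:
      - name: Restart the UniFi OS container
        ansible.builtin.raw: unifi-os restart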

Clearly the UDM wasn’t beefy enough, even though my network wasn’t particularly busy. I’m not sure what happened, but in the end it was replaced by a UDM SE. That folded the NVR functionality and the 8-port PoE switch into one device, but it cost me an access point (the SE doesn’t have one built in), which I replaced with one of the dinner-plate-looking units from Ubiquiti’s lineup. The final head count looks like this:

  • UniFi Dream Machine SE
  • 1x USW-24-PoE
  • 1x US-8-150W
  • 2x US-8
  • 1x UAP-AC-Pro
  • 1x U6-Lite
  • 2x UAP-AC-M
  • 3x G3 Bullet

The layout is pretty simple:

  • UDM SE is connected to:
    • USW-24-PoE w/1Gb SFP (same shelf as UDM SE)
    • US-8-150W w/1Gb SFP (living room)
    • 3x G3 Bullet (outdoor-rated CAT5)
    • UAP-AC-Pro (family room)
    • UAP-AC-M (kitchen)
    • US-8 (office)
    • US-8 (living room)
  • US-8-150W is connected to:
    • 2x multimedia devices
    • U6-Lite
  • US-8 (office) is connected to:
    • UAP-AC-M (office – PoE passthrough from UDM SE)
    • Computers
  • US-8 (living room) is connected to:
    • 2x multimedia devices (standby, used for travel)
  • USW-24-PoE is connected to:
    • Server

There are four wireless networks, with two on one VLAN and the others on their own VLANs. One VLAN is for the kids, and the other is for IoT. The kids’ VLAN has heavier content restrictions on it, and the IoT network is not allowed to talk back to the default VLAN.

More to come next, because without the VMs, nothing works.

The Lab

Let’s get this out of the way really quickly: I’m going to try my best to avoid talking about specifics of work on this site. I am well aware of the volume of content I’m completely ignoring, but I suppose that’s a sacrifice I’m willing to make in the name of privacy.

The data center environments I’ve worked with have all been split into production and non-production. The number of non-production environments tends to vary, and they go by all sorts of names: staging, development, test, training, sandbox, lab, and so on. These are great for testing the technologies we’re going to be working with in those environments, but if I want to learn something new, I don’t think it’s all that appropriate to use those resources. To that end, I’ve built a limited home lab that I can use.

This lab has gone through a number of iterations, but it’s been around in some form since I lived in my parents’ basement. Yes, I’ve had a “server” that was capable of virtualization since at least 2004, when it ran Microsoft Virtual PC. Let’s not talk about that one, though.

Most of my home servers since then have been repurposed gaming rigs running Proxmox VE. The first one was based on an i7-2600k with 32GB of RAM — not very server-like at all. There has only been one proper server among the lot, which was based on the SuperMicro X8DTH-IF dual-socket motherboard. It started with a pair of Xeon X5650s, which were pretty beefy, but highly inefficient. The server tripped the breaker under heavy load, so I downgraded the X5650s to L5640s, and all was well.

As the complexity of my projects increased, I found that the SuperMicro couldn’t keep up, so I grabbed a SuperMicro X9SRL-F and a Xeon E5-2690v2. Yes, I took a hit in the core count department, but at least I didn’t have to buy new RAM, and power delivery constraints meant I couldn’t run both servers at once anyway, so the downgrade wasn’t a real loss. This server lived in a violently green Thermaltake case with a tempered glass window, which was originally used for the i7-2600k, and it lasted a couple of years. Not a single breaker trip, though, which is nice.

Somewhere along the way I upgraded my main computer from a Ryzen 9 3900X to a 5900X, which meant I had 12 cores just sitting around doing nothing. It took a while, but I eventually settled on socketing that 3900X into an ASRock Rack B550D4-4L motherboard. That replaced the E5-2690v2, and as expected, utterly destroys it in every possible performance metric.

The main reason I chose the B550D4-4L instead of a much cheaper B550-based board was the BMC. As far as I can tell, ASRock is the only company that makes an AM4-based board with IPMI. As the 3900X lacks an iGPU, and I had a dGPU I wanted to pass through to a VM, I needed a motherboard with a built-in display output, which the BMC provides. Also, I don’t want to go down into the crawlspace every time I want to watch the machine boot, as infrequent as that is.

Beyond that, the machine is a mix of parts from previous boxes and a few new purchases:

  • CPU: AMD Ryzen 9 3900X
  • Cooler: Arctic Cooling Liquid Freezer II 280mm
  • Motherboard: ASRock Rack B550D4-4L
  • RAM: 128GB DDR4-3200
  • Boot Volume: 240GB Samsung 883 DCT (basically an 860 Evo)
  • VM Boot Storage: 1TB Samsung 970 Evo Plus
  • VM Data Storage: 5x 16TB Seagate Exos, RAID-Z2
  • dGPU: PNY Quadro P2200
  • Case: Fractal Design Define 7

The Quadro started off in the X9SRL board and proved to be the perfect tool to transcode videos without any weird trickery. I’ve been using that 970 Evo Plus for a while as well; it started in a Silverstone M.2 card, as the X8 and X9 boards didn’t have any M.2 slots. The only new part of the bunch was the Liquid Freezer II — the CPU and case were gaming rig material, and the rest were from eBay, for better or for worse.

So … now you know the base on which I build everything. Later on I’ll talk about the networking and VMs.

The Reset Button

If there’s one thing I love about computers, it’s the ability to simply press a button and start all over again. Not all computers have such a button, but when they do, it’s comforting to know that it’s there.

I’ve pressed the metaphorical button on this website numerous times over the past couple of decades. Last time it was because I migrated to another domain. This time it’s because I’d like the content on this site to be more focused. The old posts were quite scatterbrained; I didn’t see a common thread until recently, and even then they meandered all over the place.

So, with that said, let’s begin again. I’m Andreas, and I like tech.