Longhorn, backups, and version control

**Update as of 1/05/24** I’ve moved away from Longhorn. When it works, it works well, but when it doesn’t, it’s insanely complex to troubleshoot. Plus, I don’t have a lot of storage on my nodes right now. Maybe when I do a node hardware refresh I’ll revisit.

I’ve been doing a bit of housekeeping on the home k8s cluster, and one of the things I’m doing is moving from microk8s to k3s. This isn’t really a post about that, but long story short, it’s because microk8s does a nonexistent job of updating addons, and you basically have to use its DNS (CoreDNS) addon, as I could never get CoreDNS to work as a normal Helm chart (even after updating the kubelet config).

Anyways, as part of that change, I need to create a new cluster, get Longhorn running, and restore the volumes that were running in the old cluster. Thankfully, I had tested most of this prior to becoming reliant on Longhorn, so I knew the backup and restore process worked well: just point the backupTarget setting for Longhorn on the new cluster to the same place as the old cluster and magic happens. Unfortunately, I ran into a snag.
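For the curious, here is a rough sketch of what "pointing at the same place" can look like. It assumes Longhorn is installed from its Helm chart and that the backup target is set through the chart's `defaultSettings.backupTarget` value. The release name, namespace, bucket URL, and chart version are placeholders for illustration, not my real setup.

```python
import shlex


def longhorn_install_cmd(backup_target: str, chart_version: str) -> list[str]:
    """Build a `helm install` command that reuses an existing backup target.

    Assumes the chart repo was added under the alias `longhorn`.
    """
    return [
        "helm", "install", "longhorn", "longhorn/longhorn",
        "--namespace", "longhorn-system", "--create-namespace",
        # Pin the chart version explicitly instead of taking the latest.
        "--version", chart_version,
        # Same target the old cluster wrote its backups to.
        "--set", f"defaultSettings.backupTarget={backup_target}",
    ]


# Hypothetical bucket and version, for illustration only.
cmd = longhorn_install_cmd("s3://example-bucket@us-east-1/", "1.4.0")
print(shlex.join(cmd))
```

This only builds and prints the command, it doesn't run anything. An S3-style target would also need a credentials secret, which I've left out here.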

The volume restored properly, and I was able to recreate the PVC with the same name, but the deployment kept complaining about it and my InfluxDB wouldn’t mount the drive. It kept throwing the error:

Attach failed for volume : CSINode does not contain driver driver.longhorn.io

This was super odd, though: I could create a new PVC with the same Longhorn StorageClass and it would mount. WTF?!

Well, lo and behold, it was because when I built the new cluster, I decided to use the newest version of Longhorn, 1.4.1, as you do. However, the old cluster was still on 1.4.0, as were the backups. After any Longhorn upgrade, you must also upgrade the engine on each volume. Needless to say, the backups were on engine 1.4.0 (driver), but I only had 1.4.1 (driver) running, as I was never prompted to upgrade the engine on the volume when restoring it. So yes, the error message was factual, if incredibly frustrating.

So, note to self (and others): when restoring a Longhorn volume from backup, make sure you are running the same version as when the backup was taken. Once the volume is successfully restored and running, you can then upgrade to the latest version via the update steps, and upgrade the engine on the volume. Sadly, there didn’t appear to be a way to upgrade the engine once the volume had been restored onto the newer version, and tbh I didn’t look to see what version was listed as the Engine Image after the restore. I’m just thankful it’s back up and running!
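To make that note to self a bit more concrete, here is a minimal sketch of the ordering as a pre-restore check. It is not a Longhorn API or anything Longhorn ships. The function names are made up, and the version strings are just the ones from this post.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.4.1' or 'v1.4.1' into (1, 4, 1) so versions compare correctly."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def restore_plan(backup_version: str, cluster_version: str) -> list[str]:
    """Return the order of operations for restoring a backup (hypothetical helper)."""
    backup = parse_version(backup_version)
    cluster = parse_version(cluster_version)
    if backup == cluster:
        return ["restore volumes from backup"]
    if backup < cluster:
        # Match the backup's version first, then upgrade once volumes are healthy.
        return [
            f"install Longhorn {backup_version} instead of {cluster_version}",
            "restore volumes from backup",
            f"upgrade Longhorn to {cluster_version}",
            "upgrade the engine image on each restored volume",
        ]
    # A backup newer than the cluster is outside anything I tried.
    raise ValueError("backup is newer than the cluster; not covered here")


for step in restore_plan("1.4.0", "1.4.1"):
    print(step)
```

The comparison is on parsed integers rather than raw strings so that something like 1.10.0 sorts after 1.4.1.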
