Server Screw-Up

Preamble

Yesterday, I decided to add a new 1TB drive to my home server as a backup for the photos stored in Immich. Since I have a couple of users there, I’d like to at least have a backup of the photos in case my Docker SSD dies.

After plugging in the new disk and rebooting, I did a quick df -h to see all the disks. Regrettably, I forgot that I should have been using lsblk -e7 instead, which gives a clearer view without all the loopback devices. df -h only lists mounted filesystems anyway, so a blank new disk wouldn’t even show up there.

I skimmed the output. Hmm, I had 3 disks and added a fourth, so it should be /dev/sdd, right? Right?

Well f**k me sideways, because /dev/sdd was the Docker SSD. I immediately fired up gdisk and typed in /dev/sdd, my mind already on the cron script I was gonna write.

Act II: Panic

Immediately, my NTFY server and all my notifications started going off. “Service Down” messages were coming in from everywhere. F**K! Something’s wrong, I thought.

I quickly loaded up Webmin and, lo and behold, the mounted drives on the homepage showed only 2 mount points: boot and the main OS LVM. My Docker mount points were missing.

Holy Batman! I quickly went back to my terminal to check. The docker folder where I keep all my compose files? Empty. What about /var/lib/docker? Empty.

Shit. I screwed up. My immediate reaction was to shut down the server right away and find some way to recover the data from the disk. Luckily, I noticed it quickly, before I had even mounted the disk somewhere and started copying files over. Granted, I wouldn’t have been able to do that anyway since the Docker SSD was now empty. But I count my blessings.

I felt my heart immediately drop lower than my balls. My blood went cold. Full-on panic mode. My girlfriend was panicking right along with me.

Act III: Disk Recovery

I unplugged the SSD from the server and ran into my room. Plugged it into my external disk dock and immediately started googling for disk recovery on an ext3 filesystem. Found a few Stack Overflow threads recommending TestDisk for recovery. Downloaded and ran it.

Started up the application while feeling every single beat of my heart. Lo and behold! Both partitions on the disk showed up. Followed the documentation to write the partition table back to the disk. Success! Mounted the disk in my Ubuntu VM and did a quick scan. All my files were back!

Happily skipped to the server, plugged everything in. Pressed the power button with my fingers crossed. Watched the HP logo appear. Good, good.

GOD DAMMIT. Maybe if I wait, it’ll work? 5 minutes passed. Same screen. Great, OS corrupted. Back to the standard power down, unplug disk, run to my room, plug into external disk dock.

Act IV: Operating System Operation

Back to Googling. Found a few Stack Overflow threads again with users facing the same issue. Solution: download an Ubuntu live disk and run boot-repair. Righto. Got down to it. I have an Ubuntu VM, so no need for a live disk, I figured. Booted the VM up once again. Downloaded and installed boot-repair. Met with an issue: can’t repair while Ubuntu is running. What?

Turns out you have to run it from a live disk to repair an attached disk. Fine, downloaded the live ISO and attached it to my VM’s virtual DVD drive. Didn’t work. At this point, I was prepared to not sleep at all to fix this.

Dug around and found a few USB thumb drives. Plugged one in and flashed the ISO onto it using balenaEtcher. Plugged it into the server and booted into the live session. Repeated the process and yay! It worked.

Unplugged the USB and started the server again. And finally, it booted into the system.

Act V: Mounting Confusion

Quickly SSH’d into the server and realized that, apart from /boot and the system, the drives were all jumbled up with the mount points. My /dev/sdd1 mounted to the right place at /var/lib/docker, but my Docker compose partition was not mounting to its original point.

Messed with it for about 10 minutes while scratching my head. Tried Googling, but to no avail. I tried mounting it to another folder and it worked; the files were all there. Just as I was about to give up, I thought to check Webmin to see how the mount points were set. And guess what: the disk’s mount points were still reserved at boot, but the system couldn’t find the drive they referred to. Found the issue.

Did what’s necessary and cleaned up the mount points. Now it’s mounting successfully. Everything’s doing well. Time to reboot.
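The post doesn't spell out what the stale entries looked like, so the following is only a minimal sketch under two assumptions: that the boot-time mounts live in /etc/fstab, and that they reference partitions by UUID=. The function name stale_fstab_entries and its parameters are made up for illustration. It takes the fstab text plus the set of filesystem UUIDs currently present on the machine, and reports every entry whose UUID no longer exists.

```python
def stale_fstab_entries(fstab_text: str, known_uuids: set[str]) -> list[tuple[str, str]]:
    """Return (spec, mount_point) pairs whose UUID= spec matches no known UUID.

    Assumption: entries use the UUID=<value> form. Entries that use device
    paths, LABEL= or PARTUUID= are skipped rather than checked.
    """
    stale = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Ignore blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        spec, mount_point = fields[0], fields[1]
        if spec.startswith("UUID="):
            uuid = spec[len("UUID="):].strip('"')
            if uuid not in known_uuids:
                stale.append((spec, mount_point))
    return stale
```

To try it on a real machine, read /etc/fstab into a string and build known_uuids from the output of blkid -s UUID -o value (one UUID per line, usually run as root). Anything the function returns is an entry worth looking at before the next reboot.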

Act VI: Finally at Ease

Server booted, managed to SSH in. Had to manually start Apache, MySQL and a few other services, but docker was working and started automatically with all my containers and compose files in place.

Finally breathed a sigh of relief and did a quick check on all my services to make sure there was no database corruption either. Took an hour to confirm everything seemed to be in order. Enough excitement for one day. I left the new disk in the server and didn’t want to touch it until the weekend.

Moral of the Story

Don’t do sensitive stuff to your server at night when you’re already chilled and relaxed. Not worth the panic. Also, always double-check the disks before you do anything stupid. From now on I’ll be preparing the disk on my desk instead of directly on the server. Minimizes screw-ups this way.
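One way to make the "double-check the disks" step less dependent on tired eyes is to let a script narrow down the candidates. Below is a minimal sketch, not a definitive tool. It parses the JSON that lsblk can emit with -J and lists only whole disks that have nothing mounted on them or on anything beneath them (partitions, LVM volumes). The function names unmounted_disks and read_lsblk are hypothetical. A disk appearing in the result is only a candidate: still compare its size, model and serial against the drive you actually bought.

```python
import json
import subprocess


def unmounted_disks(lsblk_json: str) -> list[str]:
    """Return /dev paths of whole disks with no mount point anywhere below them."""

    def has_mount(node: dict) -> bool:
        # Older lsblk reports "mountpoint" (string or null); newer versions
        # may report "mountpoints" (a list, [null] when nothing is mounted).
        if node.get("mountpoint"):
            return True
        if any(node.get("mountpoints") or []):
            return True
        # Recurse into partitions and LVM volumes.
        return any(has_mount(child) for child in node.get("children", []))

    devices = json.loads(lsblk_json)["blockdevices"]
    return [
        f"/dev/{dev['name']}"
        for dev in devices
        if dev.get("type") == "disk" and not has_mount(dev)
    ]


def read_lsblk() -> str:
    """Run lsblk with JSON output, excluding loop devices (major number 7)."""
    result = subprocess.run(
        ["lsblk", "-J", "-e7", "-o", "NAME,SIZE,TYPE,MOUNTPOINT,MODEL,SERIAL"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling unmounted_disks(read_lsblk()) on the server would have left the mounted Docker SSD out of the list, since its partitions were in use at the time. If the list comes back with more than one disk, or none, that is the cue to stop and look properly before touching gdisk.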
