Introduction
On Friday, the 2C2G server I purchased on the Bandwagon platform suddenly encountered a kernel error and could not be accessed via SSH or restarted. After various rescue attempts, I finally managed to recover over a thousand images from my image hosting service. It was a harrowing experience, so I decided to document the rescue process and explore a new image hosting solution.
Server Rescue
This server had been running stably for about a year and a half. It hosted many important services, along with over a thousand images from my blog's image hosting service, which were persisted on the host in a Docker volume.
Server Crash
To update the image version of my RSSHub instance running on the server, I decided to update all the services to the latest version. I ran the `docker pull` and `docker-compose` commands without any issues until the last service's container failed to start, throwing an error similar to "not enough space." I thought the disk might be full from downloading too many images, so I ran `docker image prune --all`, `docker volume prune`, and `docker system prune`, freeing up nearly 10GB of space. However, the issue persisted.
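A "not enough space" error can come from exhausted inodes as well as from exhausted disk blocks, and freeing blocks does not help with the former. The following is a minimal sketch of a check that could be run before pruning anything. It is not a command from the original incident, and the helper name `fs_usage` is made up for illustration.

```shell
# Print block and inode usage for the filesystem that holds a given path.
# Either value reaching 100% can cause "no space left on device" style errors.
fs_usage() {
  path="${1:-/}"
  # -P forces one line per filesystem, so column 5 is always the usage percentage
  blocks=$(df -P "$path" | awk 'NR==2 {print $5}')
  # -i switches the report from blocks to inodes (some filesystems report "-")
  inodes=$(df -Pi "$path" 2>/dev/null | awk 'NR==2 {print $5}')
  echo "path=$path blocks_used=$blocks inodes_used=$inodes"
}

fs_usage /
```

On a Docker host, `docker system df` additionally breaks the usage down by images, containers, and volumes, which shows what a prune would reclaim before anything is deleted.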
As a developer with limited server maintenance experience, my first instinct was to restart the server. Little did I know that this was the beginning of a nightmare.
To my surprise, after the restart, Uptime Kuma informed me that all services were offline, and I couldn't connect to the machine via SSH.
I quickly logged into the Bandwagon control panel and discovered a kernel error. Even after a forced restart, the issue persisted. I submitted a support ticket and sought help from my DevOps friends.
Data Recovery
STRRL mentioned that there might be an issue with the `rootfs`, but Bandwagon, being a small cloud provider, didn't offer advanced boot options or additional features, so I had to wait for their technical support to handle it. However, I was still worried about the data from my image hosting service, which I hadn't backed up for a year and a half. So, I started thinking of ways to rescue the data.
After exploring the Bandwagon control panel, I found that they provided backups approximately once a week, and these backups could be converted into snapshots with a single click. The most recent backup was on June 22nd, which was fortunate. My first thought was to restore the machine directly from the snapshot. If my recent actions caused any configuration issues, the snapshot from a week ago should be able to start successfully. So, I confidently waited for the snapshot restoration process, which took over ten minutes, only to encounter the same error. Undeterred, I also tried restoring the backup from June 15th, but it didn't work either.
Realizing the severity of the situation, I even prepared for the worst-case scenario of losing all the data. While waiting for a response to my support ticket, I started searching for similar cases and stumbled upon an article titled "Guide to Downloading and Extracting Bandwagon Backup Snapshot Images (.tar.gz)".
I downloaded the snapshot image and obtained a `.disk` file. This file seemed to be in a proprietary format, and according to the guide, it could be converted using the VirtualBox command-line tool `vboxmanage convertfromraw`. However, after downloading VirtualBox from the official website, I discovered that it didn't support Macs with M1 chips. So, I installed it on my old 2019 Intel Mac, performed the conversion, and obtained a `.vmdk` file.
After the conversion, I mounted this `.vmdk` file as a disk in a VirtualBox CentOS virtual machine, but I encountered the same error.
So, I took a different approach and found that 7-Zip software supported decompression of common virtual machine formats, but it was only available for Windows.
Although I could use the command-line version, p7zip, on macOS, it threw errors during decompression, so I had to find an alternative. I installed Windows 11 in a virtual machine and successfully decompressed the files using 7-Zip.
Another problem arose: the extraction produced Linux disk image files named `1.img`, `2.img`, etc., and these couldn't be loaded on macOS. I asked my operations friends and tried using FUSE, but I still couldn't load them.
During this process, I received some good news. While searching the web, I came across a data recovery software called UFS Explorer, which could load the images successfully. However, files larger than 768KB required payment, which I wasn't willing to do. Nevertheless, seeing that the files could be recognized gave me some peace of mind. At least the data was intact; the remaining issues were technical.
During this time, I received a response to my support ticket from Bandwagon, suggesting that I try restarting or reinstalling the server... 🤣
I gave up on communicating through the support ticket and continued trying to rescue the data from my `img` files. The resourceful STRRL told me about OrbStack, which could start a Linux machine and mount the `img` file as a Linux disk.
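# Attach the image to the first free loop device (-f) and scan it for partitions (-P)
sudo losetup -fP 1.img
# Create the mount point (root is needed to write under /mnt)
sudo mkdir -p /mnt/bwg
# Mount the device; /dev/loop0 assumes no other loop device was already in use
sudo mount /dev/loop0 /mnt/bwg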
With the above commands, I successfully mounted my `img` disk image on an OrbStack Ubuntu machine.
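For a rescue job like this, a slightly more defensive variant may be worth considering. It attaches the image read-only and asks `losetup` which device it picked instead of assuming `/dev/loop0`. This is a sketch under assumptions, not the commands that were actually run. The function name `mount_image_ro` is hypothetical, and it needs root privileges and the util-linux tools.

```shell
# Attach a disk image read-only and mount it without hard-coding the loop device.
mount_image_ro() {
  image="$1"
  mountpoint="$2"
  # -f: first free loop device, -P: scan for partitions,
  # -r: attach read-only, --show: print the device that was chosen
  loopdev=$(sudo losetup -fP -r --show "$image") || return 1
  sudo mkdir -p "$mountpoint"
  # ro keeps the rescued filesystem untouched while files are copied off;
  # on failure, detach the loop device again so nothing is left dangling
  if ! sudo mount -o ro "$loopdev" "$mountpoint"; then
    sudo losetup -d "$loopdev"
    return 1
  fi
  echo "$loopdev"
}

# Usage:   mount_image_ro 1.img /mnt/bwg
# Cleanup: sudo umount /mnt/bwg && sudo losetup -d /dev/loopN
```

If the image holds an ext4 filesystem whose journal was not cleanly closed, a read-only mount may be refused. In that case `-o ro,noload` skips journal replay.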
When I saw my images in the command-line output, I was moved to tears 😭.
tar -czvf cheverto_chevereto_images.tar.gz cheverto_chevereto_images/
rsync -acvP ./cheverto_chevereto_images.tar.gz pseudoyu@[yu-mac-studio]:~/Downloads/
I quickly created a `tar` archive and used `rsync` to transfer it to my local Mac. After extracting the files, I finally saw all my images.
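Before deleting anything on the source side, it can be reassuring to confirm that the archive really contains every file. The sketch below demonstrates the idea in a throwaway temporary directory. The file names are stand-ins, not the real image data, and `sha256sum` is assumed to be available (on macOS, `shasum -a 256` plays the same role).

```shell
# Self-contained demo in a temporary directory; all names here are hypothetical.
workdir=$(mktemp -d)
mkdir -p "$workdir/images"
printf 'a' > "$workdir/images/one.png"
printf 'b' > "$workdir/images/two.png"

# Create the archive (-C keeps the stored paths relative to the work directory)
tar -czf "$workdir/images.tar.gz" -C "$workdir" images

# Count regular files on disk and entries in the archive that are not directories
src_count=$(find "$workdir/images" -type f | wc -l | tr -d ' ')
tar_count=$(tar -tzf "$workdir/images.tar.gz" | grep -vc '/$')

# Record a checksum that can be recomputed on the receiving machine
checksum=$(sha256sum "$workdir/images.tar.gz" | awk '{print $1}')
echo "files=$src_count archived=$tar_count sha256=$checksum"

rm -rf "$workdir"
```

The `-c` flag in the `rsync` command above already compares checksums during the transfer. Recomputing the hash on the destination is an extra end-to-end check.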
Migrating the Image Hosting System to R2
After this experience, however, I no longer trusted the stability of a single-server deployment for my image hosting service. So, I spent half a day setting up a new free image hosting system, described in the article "Building Your Free Image Hosting System from Scratch (Cloudflare R2 + WebP Cloud + PicGo)".
To transfer the existing data to `r2`, I used `rclone` for uploading and completed the migration. Mission accomplished!
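The exact `rclone` invocation is not given, so the following is only a sketch of what such an upload could look like. It assumes an rclone remote named `r2` has already been set up with `rclone config`. The bucket name `my-images` and the function name `upload_images` are hypothetical.

```shell
# Upload a local directory to an R2 bucket and verify the result.
# Assumptions: remote "r2" is configured; "my-images" is a made-up bucket name.
upload_images() {
  src="$1"
  dest="r2:my-images/${2:-}"
  # Copy new or changed files to the bucket, showing transfer progress
  rclone copy "$src" "$dest" --progress || return 1
  # Compare source and destination and report any missing or differing files
  rclone check "$src" "$dest"
}

# Usage: upload_images ./cheverto_chevereto_images
```

Adding `--dry-run` to the `rclone copy` call previews the transfer without writing anything, which is a cheap sanity check before the first real upload.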
Conclusion
This experience made me reconsider service deployment and data security. I plan to move important data to the cloud and reduce reliance on single servers. I will also continue to migrate some services to serverless platforms like fly.io and Zeabur.