
My First Bash Script Deleted Everything: A Recovery Story
r5yn1r4143
5h ago
Okay, so picture this: it was my first real foray into scripting. I was fresh out of college, brimming with confidence (read: overconfidence), and landed my first IT job. My task? Automate a simple cleanup script for some temporary files. Easy peasy, right? Famous last words. This is the story of how a seemingly innocent Bash script nearly cost me my job and taught me a lifelong lesson about the power of a misplaced character.
The "Oops" Moment: rm -rf /tmp/my_app/ vs. rm -rf /my_app/
I was tasked with cleaning up a temporary directory used by a new application. The directory was something like /tmp/my_app_data. My brilliant idea was to use rm -rf to blast away all the contents. I spent a good hour crafting this little script, feeling like a digital wizard. I even tested it on a dummy folder I created, ~/test_dir.
#!/bin/bash
# This is supposed to clean up the temp directory
TEMP_DIR="/tmp/my_app_data"
echo "Cleaning up temporary files in $TEMP_DIR..."
rm -rf $TEMP_DIR/
echo "Cleanup complete!"
It worked flawlessly on my test directory. So, feeling like a seasoned pro, I decided to "optimize" it slightly. I thought, "Why have the trailing slash? It's redundant!" And that, my friends, is where the digital equivalent of tripping down the stairs happened. I changed the line to:
rm -rf $TEMP_DIR
And then, in my infinite wisdom, I decided to run it from the root directory /. Why? Because I was experimenting with absolute paths and wanted to see if it would behave differently. I was logged in as root, mind you. The command the script actually executed was rm -rf /tmp/my_app_data, just not from within the /tmp directory. It was executed from /, and I half expected the path to be interpreted relative to where I was.
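For what it's worth, a path that starts with / is absolute, so the shell resolves it the same way from any working directory. The sketch below checks that in a throwaway directory created with mktemp. resolve_from is a hypothetical helper invented for this demo, and my_app_data just mirrors the directory name above.

```shell
#!/bin/bash
# Demo: absolute paths ignore the current directory, relative paths do not.
# resolve_from is a hypothetical helper, not part of the original script.
set -eu

# Throwaway directory so nothing real is touched.
scratch="$(mktemp -d)"
mkdir "$scratch/my_app_data"

# resolve_from START PATH: cd to START, then print where PATH really points.
resolve_from() {
    ( cd -- "$1" && cd -- "$2" && pwd -P )
}

from_root="$(resolve_from / "$scratch/my_app_data")"
from_scratch="$(resolve_from "$scratch" "$scratch/my_app_data")"
relative="$(resolve_from "$scratch" my_app_data)"

echo "absolute, run from /:        $from_root"
echo "absolute, run from scratch:  $from_scratch"
echo "relative, run from scratch:  $relative"

# The same relative path only resolves from / if /my_app_data happens to exist.
if ! resolve_from / my_app_data >/dev/null 2>&1; then
    echo "relative, run from /:        does not resolve"
fi

# Clean up with rmdir, which refuses to remove non-empty directories.
rmdir "$scratch/my_app_data" "$scratch"
```

All three successful lookups should print the same directory. Only a relative path changes meaning with the working directory.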
The script, when run from the root directory, interpreted /tmp/my_app_data correctly. But in my earlier, flawed mental model, I had thought about running rm -rf /tmp/my_app/ from within /tmp. My mind conflated the two. The real disaster happened when I decided to try it on a different server, one where the target directory was not /tmp/my_app_data but simply /my_app. And due to a typo in my environment setup, the variable $TEMP_DIR accidentally pointed to / instead of /my_app.
So, instead of rm -rf /my_app, the script expanded the misconfigured variable. Because $TEMP_DIR was set to / on a production server and the script ran as root, the command it ended up executing was:
rm -rf /
Yes. That /. The root directory.
I remember the screen flashing a cascade of "Permission denied" errors, but mixed in were terrifyingly successful deletions. My heart did a triple backflip into my stomach. I saw critical system files start to vanish. The server became sluggish, then unresponsive.
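In hindsight, a few cheap guards would have stopped the script before it touched anything. Here is a minimal sketch of that idea, not the script I actually ran. safe_cleanup and its allowed_root parameter are hypothetical names, and the policy of only deleting directories strictly below /tmp is an assumption made for illustration.

```shell
#!/bin/bash
# Sketch of a guarded cleanup. safe_cleanup and allowed_root are hypothetical
# names for illustration; "only delete below /tmp" is an assumed policy.
set -eu

safe_cleanup() {
    # ${1:?} aborts with an error if the target is unset or empty.
    local target="${1:?target directory is required}"
    local allowed_root="${2:-/tmp}"
    local resolved root

    # Resolve symlinks and ".." so the check sees the real location.
    resolved="$(cd -- "$target" 2>/dev/null && pwd -P)" || {
        echo "refusing: '$target' is not an accessible directory" >&2
        return 1
    }
    root="$(cd -- "$allowed_root" 2>/dev/null && pwd -P)" || {
        echo "refusing: allowed root '$allowed_root' is not accessible" >&2
        return 1
    }

    # The target must sit strictly below the allowed root, so "/" and the
    # allowed root itself are both rejected.
    case "$resolved" in
        "$root"/?*) ;;
        *)
            echo "refusing to delete '$resolved': not inside '$root'" >&2
            return 1
            ;;
    esac

    echo "Cleaning up temporary files in $resolved..."
    # Quotes keep the path as one argument; "--" ends option parsing.
    rm -rf -- "$resolved"
    echo "Cleanup complete!"
}
```

With this in place the cleanup line becomes safe_cleanup "$TEMP_DIR". If the variable is empty, points at /, or resolves anywhere outside /tmp, the function prints a refusal and returns non-zero instead of deleting.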
The Panicked Recovery: Ctrl+C, System Logs, and a Deep Breath
My immediate reaction was pure, unadulterated panic. My fingers flew to the keyboard, hitting Ctrl+C with the force of a thousand suns. It was a miracle it responded at all. The script stopped, but the damage was done. The server was in a state of critical failure.
First, I had to assess the damage. The server was practically bricked. Services were crashing left and right. I couldn't even ls some directories without getting "No such file or directory". This is where my limited knowledge of system recovery kicked in.
I went to /var/log and looked for anything unusual. syslog, messages, auth.log – anything that could tell me what happened. I found entries related to rm processes and the files they were attempting to delete, confirming my worst fears.
The Recovery Process: Backups, Rebuilds, and Lessons Learned the Hard Way
This is where the real work began. The server was unusable. The only way forward was to rebuild and restore.
I kept hitting errors like fsck.ext4: Superblock has invalid arguments or mount: /mnt/recovery: special device /dev/sda1 does not exist. This indicated serious filesystem corruption.
The Fix (Partial): I ran fsck -y /dev/sda1 (or the appropriate device). This is a filesystem check and repair utility. It's aggressive and can sometimes cause more data loss, but in this state, it was my only hope. It managed to salvage some data, but it was far from perfect. Critical application data was gone.
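Because fsck -y can make things worse, a more cautious order is to image the partition first, run a read-only check, and only then repair. The sketch below shows that order under assumptions: recovery_step and run are hypothetical helpers, /mnt/backup/sda1.img is a made-up image path on some external disk, and nothing executes unless EXECUTE=1 is set. It is not what I did at the time.

```shell
#!/bin/bash
# Sketch only: prints each recovery command unless EXECUTE=1 is set.
# DEVICE and IMAGE are placeholders (assumptions), adjust before real use.
set -eu

DEVICE="${DEVICE:-/dev/sda1}"
IMAGE="${IMAGE:-/mnt/backup/sda1.img}"   # hypothetical path on an external disk
EXECUTE="${EXECUTE:-0}"

# run CMD...: execute the command only when EXECUTE=1, otherwise just print it.
run() {
    if [ "$EXECUTE" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# recovery_step STEP: one step at a time, so each result can be read first.
recovery_step() {
    case "${1:-}" in
        image)
            # Copy the damaged partition so a bad repair can be retried.
            # conv=noerror,sync keeps going past read errors and pads the gaps.
            run dd if="$DEVICE" of="$IMAGE" bs=4M conv=noerror,sync
            ;;
        check)
            # -n answers "no" to every prompt, so nothing is modified.
            run fsck -n "$DEVICE"
            ;;
        repair)
            # -y answers "yes" to every prompt; this is the step that can lose data.
            run fsck -y "$DEVICE"
            ;;
        *)
            echo "usage: recovery_step image|check|repair" >&2
            return 2
            ;;
    esac
}
```

Calling recovery_step image, then recovery_step check, then recovery_step repair prints the three commands in order. Keeping them as separate steps leaves room to read the read-only report before committing to a repair.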