
How `find` Broke My Linux: Recovery with `grep`
The air in the server room was thick with the smell of ozone and my own rising panic. It was my first week as a junior sysadmin, and I’d been tasked with a seemingly simple job: cleaning up old log files on a production server. “Just use find to locate files older than 30 days and rm to delete them,” my senior colleague had said, patting me on the shoulder. Easy enough, right? Famous last words. I typed out what I thought was the perfect command, feeling like a coding wizard. Then I hit Enter. The cursor blinked. Nothing happened. "Huh," I mumbled, "Maybe it's taking a while." A few minutes later, the server started acting… weird. Applications were crashing, users were complaining about being logged out, and the dreaded "kernel panic" message started flashing on the console. My stomach dropped. I hadn't just deleted log files; I'd managed to find and delete critical system configuration files. My first real "oops moment" had arrived, and it was a doozy.
TL;DR: My find Command Fiasco
So, what happened? I was trying to clean up old log files on a Linux server. I wanted to find files older than 30 days and delete them. My command looked something like this:
# The fateful command: search the ENTIRE filesystem from / and delete every match
find / -type f -mtime +30 -delete
The intention was good. The execution was catastrophic. The find / part meant "search the entire filesystem, starting from the root directory." The -type f specified I was looking for regular files. -mtime +30 meant files modified more than 30 days ago. And -delete… well, that’s the part that did the damage. Instead of targeting a specific log directory, I’d told it to go everywhere and delete anything that matched. It found and deleted things like /etc/passwd, /etc/shadow, and other vital bits of the operating system. The server, predictably, went kaput.
The Panic and the Rescue Plan
My initial reaction was pure, unadulterated panic. I wanted to crawl under my desk and pretend it never happened. But this was production, and people were relying on it. My senior colleague, bless his patient soul, walked over, saw the flashing kernel panic, and just sighed. "Okay," he said calmly, "Deep breaths. We can fix this. First, shut down the server gracefully if you can, or just power cycle it if it's completely frozen."
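For the record, "gracefully" just means letting the OS flush its buffers and stop services before the power goes. From a still-responsive root shell, that's a one-liner:
# Halt the system cleanly, syncing disks and stopping services first
shutdown -h now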
Once the server was off, we booted it from a rescue disk. This is essentially a bootable Linux environment on a USB drive or CD/DVD. It allows you to access the server's hard drive without booting the compromised operating system. Think of it as a doctor performing surgery on a patient who's asleep – you can get to the problem area without the patient (the OS) interfering.
Our next step was to mount the server's hard drive within the rescue environment. This made the server's file system accessible.
# Assuming your server's disk is /dev/sda1 and you want to mount it at /mnt/server
mkdir -p /mnt/server
mount /dev/sda1 /mnt/server
Now, the tricky part: how do we find out exactly what got deleted and try to recover it? This is where grep became my best friend. I remembered seeing logs of system commands executed, and if I was lucky, some of the find command's output might have been logged somewhere before everything went haywire.
I used grep to search through various log files on the mounted drive for any mention of find and rm related to system paths.
# Search system logs for 'find' and 'rm' commands that touched /etc
grep -r 'find.*/etc' /mnt/server/var/log/ > /tmp/find_commands.log
grep -r 'rm.*/etc' /mnt/server/var/log/ >> /tmp/find_commands.log
This was painstaking work. I was sifting through gigabytes of log data. I was looking for patterns that indicated which specific files might have been targeted and deleted. The goal wasn't to magically undelete files (that’s often impossible), but to identify what was missing so we could restore it.
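To turn those raw matches into an actionable checklist, something like the following helps (a sketch, assuming the logged lines contain full paths; /tmp/missing_files.txt is just a scratch file name):
# Extract only the /etc/... paths from the matches and deduplicate them
grep -oE '/etc/[^[:space:]]+' /tmp/find_commands.log | sort -u > /tmp/missing_files.txt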
Restoration and Lessons Learned
With the list of potentially deleted critical files from my grep search, we could start the restoration process. This usually involves restoring what you can from the most recent backup, then reinstalling the packages that provide anything still missing. For example, if /etc/passwd was gone, reinstalling the passwd package (or the base system package) would often recreate it with default settings.
The server was eventually brought back online, but it was a tense few hours. My senior colleague was incredibly understanding, and we spent time afterward going over the find command in detail.
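For reference, on an RPM-based distro that "which package owns this file, reinstall it" dance looks roughly like this (Debian-family systems use dpkg -S and apt-get install --reinstall instead):
# Ask the RPM database which package owns the missing file
rpm -qf /etc/passwd      # typically reports the 'setup' package on RHEL/CentOS
# Reinstall that package to get a stock copy of the file back
dnf reinstall setup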
What I Learned (The Hard Way)
This experience was a brutal but effective education. Here are the key takeaways:
Never run find with -delete on a live production system without extreme caution. Always, always, always run find without the -delete flag first. Just let it list the files it would delete. Review that list meticulously.
# First, LIST the files that would be deleted
find /path/to/target -type f -mtime +30
Only once you are 100% sure, add the -delete flag, or better yet, pipe the output to xargs rm.
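And if you do go the xargs route, make the pipeline null-delimited so filenames with spaces or newlines can't bite you (a general safety pattern, not the exact command from that day):
# -print0 emits NUL-separated paths; -0 splits on NUL; -r skips rm entirely if nothing matched (GNU xargs)
find /path/to/target -type f -mtime +30 -print0 | xargs -0 -r rm --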
Understand your commands inside and out. The find command is incredibly powerful, but that power comes with responsibility. I didn't fully grasp the implications of find / combined with -delete.
Backups are your lifeline. Seriously, if you don't have a robust backup strategy, start NOW. This incident wouldn't have been nearly as stressful if our backups weren't a bit out of date.
Rescue disks are your best friend for disaster recovery. Knowing how to boot into a rescue environment and mount your drives is a critical skill for any sysadmin.
grep is your detective tool. When things go wrong, grep can help you sift through logs and system files to piece together what happened.
Start small and specific. Instead of find /, I should have scoped the search to the one directory I actually needed to clean, something like find /var/log -type f -mtime +30.