Thanks to luxagen for getting me to actually set something up for this. Turned out to be mighty useful when I accidentally trashed half my home folder and wanted to know if syncthing had propagated any of the damage.
The use case is slightly different to mine, but RotKraken is worth a look. Its unique feature is storing file hashes in the extended attributes of the same file. This is very tidy but doesn’t help me with catching unwanted deletions, hence going back to md5deep.
You’d think that running md5deep aka hashdeep wouldn’t be worthy of a blog post, but what I found is that the primary use case for hashdeep is actually validating the integrity of an installation in order to detect rootkits etc. This is not the same as what I’m doing which is being able to spot if I’ve lost any files I care about in /home/tim
through carelessness or bit-rot. It turns out that md5deep does actually have what I needed, but the way the options are described means it’s not at all obvious that it would fulfill this need.
You can find my hash and verify scripts as a gist here
It turns out the magic is in the “audit” section of the docs.
The terminology of the verification output is more about what it did than why you care. The important one for spotting bit-rot when verifying is Known file not used
which means that you have a hash but you no longer have a matching file anywhere. Either you deleted it or modified it on purpose or you’ve just lost something you care about. Time to reach for the backups. I like back-in-time to usb disks for backup.
Hashing
hashdeep -c md5 -of -r -l Music Documents > hash_file.txt
https://explainshell.com/explain?cmd=hashdeep+-c+md5+-of+-r+-l+Music+Documents
Output:
%%%% HASHDEEP-1.0
%%%% size,md5,filename
## Invoked from: /home/tim
## $ hashdeep -c md5 -of -r -l Music Downloads Documents Pictures Phone Dropbox repo
##
3425,3ecc5852703f3846298b381bc2510a39,Music/checksums-verification-Music.txt
461,456e92277eaf9de695bd1229d80f059b,Music/checksums-verification-Music.txt.bak
100663,0d9d53e95e5d80fa43a64f5d02f25b1e,Music/checksums-Music.txt
794926,95c1558e7c97200140c37ffb0d12669d,Music/flac/Mordecai Smyth - Dial M For Mordecai/cover.jpg
17618302,cfb11490aacfdcbb79fd4310cf834e01,Music/flac/Mordecai Smyth - Dial M For Mordecai/Mordecai Smyth - Dial M For Mordecai - 02 Psychedelic Sarah.flac
...
Verifying
hashdeep -k hash_file.txt -rle -of Music Documents -avv
https://explainshell.com/explain?cmd=hashdeep+-k++hash_file.txt+-rle+-of+Music+Documents+-avv
Output:
Documents/hashdeep-checksums-verification.txt.bak: Moved from Documents/hashdeep-checksums-verification.txt
Documents/hashdeep-checksums-verification.txt: No match
Documents/hashdeep-checksums.txt: No match
repo/rust-kata/.idea/workspace.xml: No match
repo/rust-kata/.git/logs/HEAD: No match
...
Documents/hashdeep-checksums-verification.txt.bak: Known file not used
Documents/hashdeep-checksums.txt: Known file not used
repo/rust-kata/.idea/workspace.xml: Known file not used
...
Workflow
I have a monthly calendar reminder to run backups. When that goes off I:
- Plug a backup HDD in and run back-in-time to update the backup
- Run
verify-hashes.sh
and search the output for “Known file not used” to find any rot or churn. - Run
rehash.sh
to update the hashes. - Sleep easy.
I run one hash file for all folders. I started with one per top level folder but that meant the verify couldn’t spot things moved between folders and it reported them as missing.
It would be nice to iterate on this but it’s a good start.
Top comments (0)