Wednesday, May 6, 2009

Removing duplicate files using a shell script

Friends,

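The one-liner below finds duplicate files on the system by comparing MD5 checksums: files with the same checksum are treated as duplicates. To search a particular path, replace "/tmp" in the sample command with that path. The names of the duplicate files are appended to removal_list.txt, grouped by identical content with a blank line between groups. Each name is written as a commented-out rm command (#rm filename), so nothing is deleted yet. Uncomment the lines for the copies you want to delete, keeping at least one file from each group, and then run the list as a shell script (for example, sh removal_list.txt) to delete them.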

find /tmp "$@" -type f -print0 | xargs -0 -n1 md5sum | sort --key=1,32 | uniq -w 32 -d --all-repeated=separate | sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> removal_list.txt
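To make each stage of the one-liner easier to follow, the same pipeline can be wrapped in a shell function with a comment per stage. This is a sketch, not the original post's code: the function name find_dupes and its default of scanning the current directory are hypothetical, and it assumes the GNU versions of find, xargs, md5sum, sort, uniq and sed (uniq -w / --all-repeated and sed -r are GNU options).

```shell
# find_dupes (hypothetical name): print duplicate files under a directory as
# commented-out "#rm" lines, with one blank line between groups of identical
# files. Assumes GNU find, xargs, md5sum, sort, uniq and sed.
find_dupes() {
  # 1. List regular files NUL-separated (safe for names with spaces)
  #    and compute an MD5 checksum for each one.
  # 2. Sort so that files with the same checksum end up next to each other.
  # 3. Keep only lines whose first 32 characters (the MD5 hash) repeat,
  #    printing every member of each group, separated by a blank line.
  # 4. Strip the hash, backslash-escape every character outside
  #    [a-zA-Z0-9./_-], and prefix each name with "#rm " so nothing is
  #    deleted until the line is uncommented.
  find "${1:-.}" -type f -print0 |
    xargs -0 -n1 md5sum |
    sort --key=1,32 |
    uniq -w 32 -d --all-repeated=separate |
    sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/'
}
```

For example, find_dupes /tmp >> removal_list.txt writes the same list as the one-liner (minus the extra "$@" find arguments).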
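Since every line in removal_list.txt starts out commented, the deletion step is manual. If you simply want to keep the first file of each group and delete the other copies, a small awk filter can do the uncommenting. This is an addition to the original post, offered as a sketch: the function name uncomment_extra_copies and the output name removal_script.sh are hypothetical, and it assumes the list came from a single run (running the one-liner again appends a second copy of the list to the same file).

```shell
# uncomment_extra_copies (hypothetical): in each blank-line-separated group,
# leave the first "#rm" line commented (that copy is kept) and strip the
# leading "#" from the remaining lines so they become real rm commands.
uncomment_extra_copies() {
  awk 'BEGIN { keep = 1 }
       /^$/  { keep = 1; print; next }   # blank line: the next group starts
       keep  { print; keep = 0; next }   # first file in the group stays commented
             { sub("^#", ""); print }    # later copies: enable the rm command
  ' "$1"
}
```

Usage: uncomment_extra_copies removal_list.txt > removal_script.sh, then review removal_script.sh before running it with sh removal_script.sh. Because the one-liner already backslash-escapes special characters, file names containing spaces are passed to rm intact.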
