A quick Google search for a script to find duplicate files by name in a directory tree turned up two promising techniques:
a Ruby script posted to OnJava by Bill Siggelkow, and a
bash script using common Unix tools.
Here's my attempt to reproduce the bash results in Ruby:
#!/usr/bin/env ruby
require 'find'

files = {}
found = {}

# read root directory from command line
ARGV.each do |arg|
  Find.find(arg) do |f|
    if File.file?(f)
      # accumulate the file names
      files[f] = File.basename(f)
    end
  end
end

# count up the number of each file name
files.each_value do |base|
  # Ruby doesn't allow this Perl idiom: found[base]++
  found[base] = 0 if !found[base]
  found[base] += 1
end

# print the path of each file found more than once,
# prepended with rm command commented out
found.each do |name,count|
  if count > 1
    files.each do |path,filename|
      if name == filename
        puts "# rm #{path}"
      end
    end
  end
end
Given a directory structure containing files with duplicate names in different directories, the output looks something like this:
# rm /market/fruits/tomato.txt
# rm /market/vegetables/tomato.txt
# rm /market/fruits/pea.txt
# rm /market/vegetables/pea.txt
The output could be redirected to a file and used as a shell script, in which you'd uncomment the "rm" lines for the files that should be deleted (if that's what you want).
This is all a bit clunky. If you've found a better or more Rubyesque way to do this, let me know!
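One possibly more Rubyesque variant, offered only as a minimal sketch: collect the paths, group them by base name with Enumerable#group_by, and keep the groups that hold more than one path. That replaces both the separate counting hash and the nested loop. The helper name `duplicate_names` is made up for illustration, and the sketch assumes Ruby 1.9 or later, where Hash#select returns a Hash.

```ruby
#!/usr/bin/env ruby
require 'find'

# Hypothetical helper (not part of the original script): returns a hash
# mapping each base name that occurs more than once to the paths carrying it.
def duplicate_names(roots)
  paths = []
  roots.each do |root|
    # collect every regular file under this root
    Find.find(root) { |f| paths << f if File.file?(f) }
  end
  # group full paths by base name, then keep only names seen more than once
  paths.group_by { |f| File.basename(f) }
       .select { |_name, group| group.size > 1 }
end

if __FILE__ == $PROGRAM_NAME
  # same output format as above: each duplicate path behind a commented-out rm
  duplicate_names(ARGV).each_value do |group|
    group.each { |path| puts "# rm #{path}" }
  end
end
```

Run it the same way as the script above, passing one or more root directories on the command line. Because the paths are grouped once, each duplicate name's files come out together without rescanning the whole list for every name.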