Recursive Directory List with Ruby

by Bill Siggelkow

I was recently given the unenviable task of providing my manager with a spreadsheet listing all the code artifacts (.java files, XML files, JSPs, etc.) that my team was working on. The list should include, among other things, all the source files under my src directory, JSPs under the web directory, and XML configuration files under web/WEB-INF. But it should exclude things like generated source code, .class files, and jar files.

My desktop at the client site runs Windows, so I thought I would try out the dir command. That got me nowhere. Then I jumped into cygwin and tried ls -R. Of course, this listed everything under my project directory, including the CVS directories and their contents. Plus, the file listing didn't specify the full path. If I was going to use this output to create my spreadsheet it would take quite a lot of cutting, pasting, and deleting to make it suitable for the spreadsheet -- there had to be a better way.
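
The kind of script the comments below discuss can be sketched roughly as follows. This is a minimal illustration, not the original script: the helper name list_source_files and the example directory and exclude names are assumptions. It relies only on Ruby's standard Find module, using Find.prune to skip an excluded directory and everything beneath it.

```ruby
require 'find'

# Hypothetical helper: return the path of every file under each directory in
# `dirs`, skipping any directory or file whose basename is listed in `excludes`.
def list_source_files(dirs, excludes)
  files = []
  dirs.each do |dir|
    Find.find(dir) do |path|
      if File.directory?(path)
        # Find.prune skips this directory and everything beneath it.
        Find.prune if excludes.include?(File.basename(path))
      elsif !excludes.include?(File.basename(path))
        files << path
      end
    end
  end
  files.sort
end

# Example call (directory and exclude names are assumptions):
#   puts list_source_files(%w[src web], %w[CVS classes lib])
```

Each path is printed relative to the starting directory, one per line, which pastes directly into a spreadsheet column.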

31 Comments

phil
2006-04-01 07:13:07
The next time you need this, don't write any code at all:


ls -R | grep -v CVS | grep -v classes | grep -v images | grep -v lib | grep -v tlds


That will give you about 99% of what you want.

Jay
2006-04-01 07:49:14
Shouldn't this be on OnRuby.net?
k
2006-04-01 09:20:18
why not just use find(1) directly?
Bill Siggelkow
2006-04-01 11:12:08
Phil,
Yes, that comes close -- but all I want is a list of files, not the directories. And I wanted each file to include its path from the current directory. Granted, I am sure I could get that to work using the shell; I was just pleased that I was able to do it in Ruby with a simple script that is easily customized as needed.
Bill Siggelkow
2006-04-01 11:16:49
Jay,


I really don't think there's any problem with this post here at OnJava. My point was that we are not just Java programmers -- we are software engineers. We need to be open to using whatever tools best help us solve problems.

Bill Siggelkow
2006-04-01 11:26:35
k,


Did you mean use the Unix 'find' command? If so, yes, that certainly could have worked also. One thing nice about the Ruby script is it's not dependent on cygwin.


2006-04-01 20:13:23
why not just obtain a report from your SCM?
Tim O'Brien
2006-04-02 15:41:53
re: phil, Bill


Programmers need to learn the "find" command and xargs. Deciding to run this in Ruby is a better choice than doing it in Java no doubt. But, in general, Bash can beat Ruby in terms of terseness.


But, the thing I hate the most about Ruby posts is the idea that an author is required to do task X in the fewest number of lines of code. I say, three cheers to Bill for solving his problem (even if he didn't grok the powerful master known as "find")


Re: Jay "Shouldn't this be on OnRuby.net?"


The people who run these O'Reilly blogs decided that it would be better to segment up blog authors into focus areas. IMO, it was odd only because rarely does someone only write about technology X or technology Y. *shrug*


But Bill's post makes sense. Java programmers should be learning how to leverage Ruby. It makes perfect sense.

Katja Bergman
2006-04-03 01:31:36
What is wrong with a non-recursive method? I don't know anything about Ruby but I would think a non-recursive method would have a better performance than a recursive method.
I have done non-recursive directory listings with Borland Delphi quite a few times. I know, Delphi isn't free, but it has some free alternatives (including Linux and Mac alternatives). All you basically need is a dynamic array to store all the folders you've found; you walk through this list and, for each folder in it, add all its subfolders to the list.
Of course, any files you find are added to your output. All folders you find are appended to the list, so they will be next in line to search.
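
The work-list approach described above could look something like this in Ruby. It is a rough sketch only: the helper name list_files_iteratively is made up for illustration, and Dir.children assumes Ruby 2.5 or later.

```ruby
# Hypothetical helper: walk `root` without recursion, using an array as the
# list of folders that still have to be scanned.
def list_files_iteratively(root)
  pending = [root]   # folders found but not yet scanned
  files = []
  until pending.empty?
    dir = pending.shift
    Dir.children(dir).sort.each do |name|
      path = File.join(dir, name)
      if File.directory?(path) && !File.symlink?(path)
        pending << path          # scan this folder later
      elsif File.file?(path)
        files << path            # files go straight to the output
      end
    end
  end
  files
end
```

Because newly found folders are appended to the end of the list, the walk is breadth first: files near the top of the tree are reported before deeper ones.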


Richard Osbaldeston
2006-04-03 05:39:13
I'd probably have used the ant zip task for this which'll ignore the cvs/svn files by default.. and I'm free to add any extra excludes. Then I'd simply dump out the paths in the resulting zip.
greg
2006-04-03 06:27:13
i thought this was an april fool's joke
Bob
2006-04-08 17:53:59
Sorry, when I come to OnJava, I am looking for Java related stuff. I could do the same thing (and probably just as easily) with Tcl, Python or Perl.
Bill Siggelkow
2006-04-09 10:03:40
Wow -- I had no idea one little blog would cause such a stir. From now on I will wear an inverted coffee mug as a mind-control helmet and chant "All Hail Java", "All Hail Java".
Norbert Ehreke
2006-04-10 23:12:09
Bill,


I have had a very similar experience. Indeed, Ruby is a language worth exploring. A self-respecting developer should be a little more language agnostic. The comments here display a lack of composure and open-mindedness. Ruby is not the holy grail, but it certainly is fun.

Sveb
2006-04-13 13:15:42
org.apache.commons.io.FileUtils.listFiles
Kai Middleton
2006-06-30 18:42:11
How about:
dir /s /b
/S Displays files in specified directory and all subdirectories.
/B Uses bare format (no heading information or summary).
Example output:
c:\vircon>dir /s /b
c:\vircon\blog
c:\vircon\hours
c:\vircon\blog\a
c:\vircon\blog\b
c:\vircon\blog\c
c:\vircon\blog\d
c:\vircon\blog\a\29042006
c:\vircon\blog\a\First Delivery from Ravi.txt
c:\vircon\blog\a\29042006\aboutus.html
etc....
Leonardo M. Ramé
2006-08-17 11:14:45
Yes dir /s /b, and dir /s /b>output.txt will send all output to the file output.txt
Sig
2007-01-02 14:40:33
Lovely documentation, I am learning to use ruby as well to replace common tasks like this. However, I still would have used plain old GNU find for this task.
[3!]realit
2007-01-16 10:14:26
great site

2007-03-14 21:03:05
"I was impressed at the quality of the documentation, and just as importantly, the quality of the error messages spat out by the Ruby interpreter when my language guesses weren't quite right"


LOL!!

trama
2007-04-11 04:52:29
I'm telling my friends about this site. Interesting!
imparare
2007-04-14 23:02:59
Interesting comments.. :D
Ruby on Rails Examples
2007-05-13 20:55:44
I was wondering why you have this if statement:
if excludes.include?(File.basename(path))
aj
2007-05-14 13:42:20
You really need to learn basic bash shell usage, and commands such as find, grep, etc..


As was mentioned above, this sort of thing is a quick shell command. If you don't want directories, just give the "-type f" argument to find, e.g.


find ./ -type f | grep -v CVS | grep -v classes | ...

viz
2007-06-29 05:03:35
I am sure that can be done in Java just as easily.
WNight
2007-09-20 05:40:03
Hi Bill,


Have you tried JRuby? I'm not a Java user but I've heard great things about it. Not just for Ruby integration, but also for coding in Java interactively, etc. (see irb)


Here are a few shortcuts and/or style tips related to your code sample.


%w():

dirs = %w( src/java libs )
excludes = %w( CVS lib ~.*tmp ).collect {|e| Regexp.new e }


array.each:

dirs.each do |dir|
  code
end


It matches the structure of the other blocks (do |var,var2| .. end) and is clearer. (The object 'dirs' is being passed a code block (do .. end) and it runs it for 'each' element.)


command if condition:

if File.directory? path
  Find.prune if excludes.any? {|e| File.basename(path).match e }
  next
end


Also works like:

next unless x > 6
next if x < 12 unless y.nil?

etc.


Often this improves readability. Instead of



if really_long_tests then raise "error foo" end
if other_tests then raise "error bar" end
if !this_is_what_we_want then next end

you get

raise "error foo" if really_long_tests
raise "error bar" if ...
next unless this_is_what_we_want


which seems much more readable to me. The flow of the code becomes evident instead of the implementation details.


Also, give 'irb', the interactive ruby interpreter a try. It's the easiest way to test code. (With tab completion turned on it lets you tab through object methods, variables, etc.)


Thanks!

Ruby Newbie
2007-10-18 10:49:16
Hello Bill,


A Google search for a recursive directory tree in Ruby brought up this page. (I know it's a Java topic, but I was searching because of some contradictory information on this subject.) I have the Ruby Cookbook and tried out Recipe 6.12, "Walking a Directory Tree". The discussion that follows says "Note how all the files in the top-level directory are processed after the subdirectories", which is depth-first traversal. My observation of the running code showed that in general this is not true, and that only contrived directory trees, such as those created by create_tree.rb at the beginning of Chapter 6, have this characteristic.


Upon investigation, the Dir.open() method used in Find.find() under Linux does not return a sorted list of directories and files. Rather, the order in which directories and files are returned is based on the inode order of creation on disk. Since create_tree.rb creates the tree in the order you would expect for depth-first traversal, depth-first traversal appears to work in the recipe. This may not be true for operating systems other than Linux. In fact, you can mimic the Dir.open() behavior for directories using the Linux ls -U command for unsorted listings.


Having pondered this strange behavior of the recipe and dug deeper into the Ruby libraries, I was able to modify Find.find() so that the values returned by Dir.open() are sorted, using an extra sort call with a block, before continuing. After modifying Find.find(), Recipe 6.12 worked as expected: depth-first traversal for all possible directory trees.


Another book, "Programming Ruby - The Pragmatic Programmers Guide" shows an example usage of Find.find() and the results printed verify that there is indeed no sorting feature of the Dir.open() method in Find.find() and that Recipe 6.12 can not possibly yield Depth first traversal behavior using the Ruby library Find.find() as the basis.


I can't decide if this is a bug or a feature in the Find.find() method - using unsorted entries from Dir.open() is certainly faster but can mess up other algorithms relying on a Depth First Traversal result.


Here is my fix inside Find.find() to make the recipe work:


dd = Dir.open(file)
d = dd.entries.sort { |x, y|
  f1 = File.join(file, x)
  f2 = File.join(file, y)
  b1 = File.directory?(f1) ? "D#{f1}" : "F#{f1}"
  b2 = File.directory?(f2) ? "D#{f2}" : "F#{f2}"
  b2 <=> b1
}
# carry on with the rest of Find.find() using
# sorted variable d instead
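
An alternative that leaves Find.find() untouched is to write a small walker that sorts each directory's entries itself. The following is only a sketch under assumptions: sorted_walk is a hypothetical helper, not part of Ruby's library or of the Cookbook recipe, and Dir.children needs Ruby 2.5 or later.

```ruby
# Hypothetical helper: yield every file under `dir` in a predictable order.
# Within each directory, subdirectories are visited first (alphabetically),
# then the directory's own files (alphabetically).
# Note: symlinked directories are followed, so a link cycle would not terminate.
def sorted_walk(dir, &block)
  names = Dir.children(dir).sort
  subdirs, files = names.partition { |n| File.directory?(File.join(dir, n)) }
  subdirs.each { |n| sorted_walk(File.join(dir, n), &block) }
  files.each { |n| block.call(File.join(dir, n)) }
end

# Example call:
#   sorted_walk('.') { |path| puts path }
```

Sorting costs a little time per directory, but the output no longer depends on the order in which the file system happens to return entries.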

Raja
2007-11-06 22:29:33
Can anyone help me? I need to write code using trees to display a list of files and folders in an email inbox using Java. All the examples I keep finding use Swing, and I am not looking for those.
Homie G
2007-12-05 05:04:01
man tree
srboisvert
2008-08-06 01:58:41
If you are excluding based on file extensions, you might be better off checking against File.extname(path) instead of the basename; otherwise you may be excluding files that merely have the excluded extension as part of their basename.
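
To illustrate the difference, here is a small sketch. The constant and method names are invented for this example, and the extension list is an assumption based on the .class and jar files mentioned in the post.

```ruby
# Hypothetical exclude list, keyed on extension rather than basename.
EXCLUDED_EXTENSIONS = %w[.class .jar].freeze

# True when the path's extension (case-insensitive) is in the exclude list.
def excluded_by_extension?(path)
  EXCLUDED_EXTENSIONS.include?(File.extname(path).downcase)
end

# File.extname looks only at the final extension:
#   File.extname('build/Foo.class')   # => ".class"
#   File.extname('src/Subclass.java') # => ".java"
# whereas a pattern such as /class/ tested against the basename would match both.
```

With this check, build/Foo.class is excluded while src/Subclass.java is kept.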