Sep 30, 2011
benchmark - find & delete

I usually use find to search for files and analyze the output.

Reading the manual page, I learned about the -nouser and -nogroup test expressions.

So I ran some tests to find the quickest (or a better) way to remove files with find.

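First, let’s create a demo directory with a lot of files, and give them all a user and group ID (10101) that has no entry on the system, so that every file matches -nouser: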


# cp -ra /usr /usr.test
# chown -R 10101:10101 /usr.test
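The whole benchmark relies on -nouser (and -nogroup) matching every copied file, which only happens if ID 10101 has no entry in the user and group databases. Here is a small pre-flight check, assuming a Linux system with glibc’s getent; id_is_unassigned is a hypothetical helper name, not part of find:

```shell
#!/bin/sh
# Pre-flight sketch (assumes getent, as shipped with glibc on Linux).
# id_is_unassigned is a hypothetical helper, not part of find.
id_is_unassigned() {
    db=$1    # "passwd" for the -nouser test, "group" for -nogroup
    id=$2    # numeric ID, e.g. the 10101 used with chown above
    # getent exits non-zero when the key has no entry in that database
    ! getent "$db" "$id" > /dev/null
}

if id_is_unassigned passwd 10101 && id_is_unassigned group 10101; then
    echo "10101 is unassigned: files owned by it will match -nouser and -nogroup"
else
    echo "10101 is in use: pick another ID before running the benchmark" >&2
fi
```

If the ID turns out to be taken, -nouser matches nothing and every timing below would measure a walk that deletes nothing.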

How many files do we have?


# time find /usr.test/ -xdev | wc -l 
124298

real    0m0.575s
user    0m0.243s
sys 0m0.363s

OK, 124,298 files is a lot!

If I delete the entire directory with plain rm, the running time is:


# time rm -rf /usr.test/

real    0m5.883s
user    0m0.287s
sys 0m5.063s

5.88 seconds!

Now just a walk through the entire tree, testing every file with -nouser but deleting nothing:


# time find /usr.test/ -xdev -nouser > /dev/null

real    0m6.480s
user    0m2.763s
sys 0m3.660s

6.48 secs. Removing the files with rm is faster than merely walking the tree with find!

We now have a baseline to compare our results against.
We will try three methods:

a. find’s -delete action
b. find’s -exec action
c. piping find’s output to xargs

First Method


# time find /usr.test/ -xdev -nouser -delete 

real    0m12.739s
user    0m2.826s
sys 0m9.513s

12.74 secs. That’s about twice as long as plain rm!

Second Method


# time find /usr.test -xdev -nouser -exec rm -rf {} \;

real    0m6.307s
user    0m0.253s
sys 0m5.516s

6.3 secs, about the same as plain rm (which was expected, by the way: the first match is /usr.test itself, so a single rm -rf removes the whole tree).
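In general, -exec command {} \; starts one process per matched path, while the POSIX form -exec command {} + packs many paths into each invocation, much as xargs does. The sketch below counts invocations on a throwaway directory; count_runs is a hypothetical helper, and -type f stands in for -nouser so it runs without root. The + form was not part of the benchmark above, so no timing is implied.

```shell
#!/bin/sh
# Sketch: count how many processes find starts with "-exec ... \;" versus "-exec ... {} +".
# count_runs is a hypothetical helper; -type f stands in for -nouser (no root needed).
count_runs() {
    dir=$1
    terminator=$2    # either ';' (one run per path) or '+' (batched runs)
    # Each inner sh prints exactly one line, so wc -l counts the processes started
    find "$dir" -type f -exec sh -c 'echo run' sh {} "$terminator" | wc -l
}
```

For example, in a directory holding ten files, count_runs "$d" ';' reports 10 and count_runs "$d" '+' reports 1. Applied to the benchmark, the batched variant would read find /usr.test -xdev -nouser -exec rm -rf {} +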

Third Method


# time find /usr.test/ -xdev -nouser | xargs rm -rf

real    0m4.666s
user    0m1.117s
sys 0m3.426s

4.66 secs!

So xargs is the fastest of the three methods above.
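The third method feeds newline-separated paths to xargs, which also splits its input on blanks and interprets quotes, so a file named "a b" reaches rm as two separate arguments. GNU and BSD find and xargs support NUL-delimited paths instead. A sketch, with delete_files as a hypothetical helper and -type f standing in for -nouser so it runs as an unprivileged user:

```shell
#!/bin/sh
# Sketch: NUL-delimited "find | xargs", safe for file names containing blanks,
# quotes or newlines. delete_files is a hypothetical helper, and -type f stands
# in for -nouser so the sketch runs as an unprivileged user.
delete_files() {
    # -print0 ends every path with a NUL byte, the one byte a path cannot contain;
    # xargs -0 splits its input only on NUL and does no quote processing,
    # so each path reaches rm exactly as find produced it.
    find "$1" -type f -print0 | xargs -0 rm -f
}
```

Applied to the benchmark, that would read find /usr.test/ -xdev -nouser -print0 | xargs -0 rm -rf (not timed here).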

  1. Pantelis

    Friday, September 30, 2011 - 12:11:20

    Although my initial test showed otherwise, I redid it with more files and it proved your point above.

    I also did a bit of googling and found the following page, which explains why xargs is faster: -exec rm -rf forks a new process for every file, whereas xargs is like running rm -rf file1 file2 …
    http://www.gnu.org/software/findutils/manual/html_node/find_html/Deleting-Files.html

  2. ebal

    Friday, September 30, 2011 - 12:17:50

    That’s quite right, my friend, it’s all about threads ;)

  3. Pantelis

    Friday, September 30, 2011 - 14:09:31

    It’s funny: if find actually used threads instead of waiting for each command to finish, it would have been much faster.

    Another reference on the issue: http://www.sunmanagers.org/pipermail/summaries/2005-March/006255.html