Arg List Too Long

Last updated: Tue, 14 Jul 2009 09:58:44 GMT

Sometimes shit happens. The great thing about our never-ending race for bigger, faster computers and storage is that more and more often, shit happens real fast, and makes a lot of mess in the process.

Today, a certain core piece of infrastructure was all but crippled by some runaway horror. Investigation revealed a directory containing millions upon millions of files. The directory itself was hundreds of megabytes in size. Not the content of the directory: the directory itself was hundreds of megabytes in size, just to hold such an obscene number of files.

The situation was such that the service was barely hobbling along, and we needed relief as soon as possible. At times like this, with far more file names than the shell can hand to a single command, the arg list too long error is prone to occur.
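For the curious, the ceiling behind that error can be read off the system. This is a minimal sketch, assuming a POSIX-ish box with getconf available; the variable name is made up for illustration, and the number it reports will vary from system to system.

```shell
# ARG_MAX is the upper bound, in bytes, on the arguments plus environment
# that can be passed to a single exec call.
limit=$(getconf ARG_MAX)
echo "ARG_MAX is $limit bytes"
```

A glob that expands to more than that (rm * in a directory of millions of files, say) never gets as far as running the command.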

There are a number of traditional ways around the arg list too long problem. My own favourite is a simple find -exec, like so:

find . <conditions> -exec rm {} \;

which has always served me well. There are other variations, including piping listings to xargs, but I've been working under sufficiently crufty operating systems, for a sufficiently long time, that I never learned to rely on xargs.
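As a rough illustration of those variations, here is a sketch run against a scratch directory so nothing real gets touched. The file names and the -name patterns are hypothetical stand-ins for whatever conditions apply. The second form ends -exec with + rather than \;, which lets find batch names into as few rm invocations as will fit under the limit, much as xargs would.

```shell
# Scratch directory with hypothetical file names, so nothing real is touched.
scratch=$(mktemp -d)
i=0
while [ "$i" -lt 200 ]; do
    touch "$scratch/junk.$i"
    i=$((i + 1))
done
touch "$scratch/keep.txt"

# One rm per file: slow, but each rm only ever sees a single name.
find "$scratch" -type f -name 'junk.1*' -exec rm {} \;

# Batched: find packs as many names as will fit into each rm invocation.
find "$scratch" -type f -name 'junk.*' -exec rm {} +

# Only the file that matched neither pattern should be left.
left=$(ls "$scratch")
echo "$left"
```

Neither form helps if find itself is slow to produce its first result, which is where this story goes next.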

The directory was so large that find appeared to be hanging. Investigation with strace showed that the find was running, and sucking in 32k chunks of that directory, but outputting nothing. Would it read the whole thing before doing anything? Was it something as silly as sorting that was holding us up?

We became impatient and decided we couldn't wait.

We tried various things, including loops over globs, ls -1U, and all appeared stymied by the same attempt to read everything before starting, even that unsorted ls.

How do we read this directory an entry at a time, acting on each entry, without having to wait who knows how long for it to read every entry before it starts?

We use Perl's readdir, like so:

perl -e 'opendir D, "."; while ($e = readdir D){ print "$e\n"; }'

Obviously, you'll probably want to replace that print with whatever it is you actually want to do with these things; some conditions, perhaps, and a back-ticked remove. But that'll get things moving straight away.

It served us (boom-boom!) in a pinch.