URI::Fetch through a proxy

Last updated: Wed, 25 Jul 2007 20:45:00 GMT

It wasn't entirely obvious to me that it was possible to use URI::Fetch in a proxied environment, but it is, and it's pretty darned simple.

Recently, while I was off work ill, I was asked to script some of the checks that our front-line desk staff perform to ensure that various web services are up and running.

A fairly simple task, quickly achieved in Perl. Specifically, ActiveState's ActivePerl, because it's likely to be running on a Windows box and I know ActivePerl of old.

All it does is run over a list of URI/regex pairs, download the document each URI points to, check that the regex matches the document content and, if it doesn't, mail off a warning, including the document if the download was successful. There are probably a dozen publicly available scripts out there that do this already but, as I found the last time I assumed that someone had already invented the wheel I wanted to fit to my wagon, it's often harder to beat someone else's dog-egg code into shape than it is to start from scratch.
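The loop described above can be sketched roughly as follows. This is not the original script, just a minimal illustration: the URI, the regex and the helper names content_ok and warn_staff are made up for the example, and the mail step is replaced by a plain print so the sketch doesn't assume any particular mail module.

```perl
use strict;
use warnings;
use URI::Fetch;

# Hypothetical URI/regex pairs; swap in the services you need to watch.
my @checks = (
    [ 'http://www.example.com/', qr/Example Domain/ ],
);

# True only if there is a document and the regex matches it.
sub content_ok {
    my ($content, $regex) = @_;
    return defined $content && $content =~ $regex ? 1 : 0;
}

# Stand-in for the mail step: this sketch only prints the warning.
sub warn_staff {
    my ($uri, $reason, $content) = @_;
    print "WARNING: $uri: $reason\n";
    print "$content\n" if defined $content;
}

for my $check (@checks) {
    my ($uri, $regex) = @$check;

    # fetch() returns undef on failure and records the reason in errstr.
    my $res = URI::Fetch->fetch($uri);
    if (!$res) {
        warn_staff($uri, 'download failed: ' . (URI::Fetch->errstr || 'unknown error'));
        next;
    }

    my $content = $res->content;
    warn_staff($uri, "content did not match $regex", $content)
        unless content_ok($content, $regex);
}
```

Keeping the match in its own little sub means the check can be exercised without touching the network at all.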

So, script finished, I took it back to work when I returned and, lo, it didn't work. Proxied environment, innit?

I had a read over URI::Fetch's documentation again and found no mention of proxies. Disappointed, I Googled. No luck. As ever, Googling is only as good as the terms you search with. Results for "Perl", "URI", "Fetch" and "proxy" were as common, and as unrelated to my particular problem, as you'd imagine.

And so I decided that if the functionality wasn't there, I'd have a decent hack at implementing it myself and then submit a patch to URI::Fetch's maintainer. Always a better route than just complaining. Boy, am I glad I didn't just complain.

I started digging through the source of URI::Fetch, and so on to HTTP::Request, HTTP::Response and LWP::UserAgent. And, as I was reading the source for LWP::UserAgent, the light bulb came on.

use LWP::UserAgent;
use URI::Fetch;

my $userAgent = LWP::UserAgent->new;
$userAgent->proxy(['http', 'https', 'ftp'], 'http://proxy.crank.org.uk:3128/');
my $fetchResponse = URI::Fetch->fetch($uri, UserAgent => $userAgent);

It's that simple. The example above may be incomplete; I should probably make more effort setting up my UserAgent, but this appears to be the right road.
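For what a little more effort on the UserAgent might look like, here is one possible sketch. The helper name build_user_agent, the 30-second timeout, the agent string and the no_proxy hosts are all assumptions made for illustration. The calls themselves (timeout, agent, proxy, env_proxy and no_proxy) are standard LWP::UserAgent methods.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI::Fetch;

# Hypothetical helper: build a UserAgent that goes through a proxy.
# Pass a proxy URL explicitly, or pass nothing to fall back on the
# environment (http_proxy, no_proxy and friends) via env_proxy.
sub build_user_agent {
    my ($proxy_url) = @_;

    # Assumed value: give up on a dead service after 30 seconds.
    my $ua = LWP::UserAgent->new(timeout => 30);

    # Assumed name, so the checks are easy to spot in server logs.
    $ua->agent('service-check/0.1');

    if ($proxy_url) {
        $ua->proxy(['http', 'https', 'ftp'], $proxy_url);
    }
    else {
        $ua->env_proxy;
    }

    # Assumed exclusions: talk to local services directly.
    $ua->no_proxy('localhost', '127.0.0.1');

    return $ua;
}
```

It would then be handed to URI::Fetch exactly as before, for example URI::Fetch->fetch($uri, UserAgent => build_user_agent('http://proxy.crank.org.uk:3128/')).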

Not rocket science, I know. If I'd been using HTTP::Request and HTTP::Response directly instead of starting with URI::Fetch, it probably would have been blindingly obvious from the start. But if this saves someone who suffers from transient daftness, like me, from missing the clues...