Using the PHP cURL extension
It sometimes happens that you need to fetch something from another server in PHP.
It's tempting to just use file_get_contents('http://example.com/'), but if
you do that you won't have any control over what happens if that server's down,
or if it's redirecting. Using the PHP cURL extension, you get access to a
powerful library for making HTTP requests and handling the output. Here's an
example of how it works. There are far too many comments, and I may have done
something wrong. If so, please don't kill me!
Use one of these instead
If you're making a web services request, use GuzzlePHP instead of this. It's fully-featured and extensible, allowing for things like batching, logging, an automated backoff for repeat requests to busy servers, and offers a tidy, minimal interface to any remote web service. Seriously, I can't stress enough just how awesome Guzzle is.
Kris Wallsmith's Buzz HTTP library offers a clean interface to cURL, with support for cookies, browser history, and redirection. Unlike the example below though, you'll have to do your own caching; it doesn't appear to offer that yet.
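For a taste of why I recommend it, a Guzzle request looks roughly like this. This is only a sketch: it assumes a modern GuzzleHttp\Client installed via Composer, and the API has changed between major Guzzle versions, so check the docs for yours.

```php
<?php
// Minimal Guzzle GET - assumes a Composer install and a recent Guzzle.
require 'vendor/autoload.php';

$client   = new \GuzzleHttp\Client(['timeout' => 30]);
$response = $client->request('GET', 'http://example.com/');

if ($response->getStatusCode() == 200) {
    $body = (string) $response->getBody();
}
```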
Fill in the blanks
If you use this code yourself, be a responsible coder and change the CURL_USERAGENT
string! Make it yours.
Read the known issues and notes after the example, too!
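Here's a sketch of a cached cURL fetch along those lines. The CURL_USERAGENT value, URL, cache path, and cache lifetime are all placeholders; the conditional-request and fallback logic is the part that matters.

```php
<?php
// Fetch a URL with cURL, caching the result to a local file and using
// If-Modified-Since so we only re-download when the remote copy changes.
// CURL_USERAGENT, $url, $cache_file and $cache_time are placeholders.

define('CURL_USERAGENT', 'MySiteBot/1.0 (+http://example.com/; me@example.com)');

$url        = 'http://example.com/feed.xml';
$cache_file = '/tmp/feed-cache.xml';
$cache_time = 3600; // seconds before we bother the remote server again

if (file_exists($cache_file) && filemtime($cache_file) > time() - $cache_time) {
    // Cache is fresh enough - don't even make a request.
    $result = file_get_contents($cache_file);
} else {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, CURL_USERAGENT);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // get the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // treat redirects as errors
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);

    if (file_exists($cache_file)) {
        // Ask for the document only if it's newer than our cached copy.
        // Note: this compares the file's mtime against an HTTP (UTC) date.
        curl_setopt($ch, CURLOPT_TIMECONDITION, CURL_TIMECOND_IFMODSINCE);
        curl_setopt($ch, CURLOPT_TIMEVALUE, filemtime($cache_file));
    }

    $result = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($status == 200 && $result !== false && $result !== '') {
        // Fresh content - overwrite the cache.
        file_put_contents($cache_file, $result);
    } elseif (file_exists($cache_file)) {
        // 304 Not Modified, a redirect, or an error: fall back to the cache.
        $result = file_get_contents($cache_file);
    } else {
        // No cache to fall back on; $result stays false.
        $result = false;
    }
}
```

Note that a 3xx status lands in the fallback branch, so a redirecting server is treated as a failure here.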
Notes and known issues
The most common reason this code doesn't work is that your PHP installation doesn't have the cURL extension installed and enabled. If you're running a Linux distribution like Debian, CentOS, or Ubuntu, there are simple commands to fetch and install the cURL extension (like apt-get install php5-curl, or yum install php5-curl), but I can't help you with this. I'm not your sysadmin, sorry!

There are other ways to check if a URL has new data - the script above only checks file modification time. It also (kinda badly) assumes your server's timezone is UTC, just like HTTP header dates ought to be. Those are two assumptions that don't always turn out right. There's not much you can do about the remote server's clock, but if your own system isn't running on UTC, I suggest converting the file time before doing your check. There's another simple alternative, which is to check against the ETag header instead of the modification time, but you'll have to store that value when you download, and assume that the remote server supports ETags.
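The ETag approach looks something like this - a sketch, with the URL and the file used to store the ETag as placeholders:

```php
<?php
// Conditional GET using an ETag instead of a modification time.
// $url and $etag_file are placeholders.
$url       = 'http://example.com/feed.xml';
$etag_file = '/tmp/feed.etag';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $header) use ($etag_file) {
    // Stash the ETag whenever the server sends one.
    if (stripos($header, 'ETag:') === 0) {
        file_put_contents($etag_file, trim(substr($header, 5)));
    }
    return strlen($header); // header callbacks must return the bytes consumed
});
if (file_exists($etag_file)) {
    $etag = file_get_contents($etag_file);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('If-None-Match: ' . $etag));
}

$result = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

// $status == 304 means "not modified" - keep using your cached copy.
```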
I fail on redirection here because if you're hitting a web service like Last.FM, redirection usually means you're doing something wrong. If you're not hitting a web service, then you might want to be more flexible. Look up the CURLOPT_FOLLOWLOCATION setting in the PHP manual.

I use CURLOPT_RETURNTRANSFER here as a shortcut to retrieve the whole document as a string. This works fine if you're fetching something you know the size of, like a Last.FM top album chart. If you're fetching random web documents or images, you can very easily retrieve something that breaks PHP's memory_limit setting. If you're doing that kind of thing, turn off CURLOPT_RETURNTRANSFER, and open a file handle instead, passing it to CURLOPT_FILE so that curl_exec() saves the content to the file instead of holding it in memory. One nice option is to open a file handle to php://temp; it'll use up to two megabytes of memory, then switch to disk storage automatically. PHP's not the right thing to use for downloading large files, really, but if you have to, remember that PHP's script execution limit is usually 30 seconds. That timer is suspended while you download, and will instantly kick in when curl_exec() returns, which means your script might download a massive file and then die without doing anything or cleaning up after itself. Hardly ideal! If you're worried that this might happen, you can use the cURL extension to make a HEAD request instead of a GET, and check the file size before downloading.

I'm assuming that you're making web-service requests or grabbing RSS files with this, not fetching URLs supplied by user input. Needless to say, if it's the latter, filter the hell out of user input before using it. You're inviting all kinds of mischief if you allow people to specify which URLs your server fetches.
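Putting those two ideas together, here's a sketch of a HEAD-then-stream fetch. The URL and size limit are placeholders; CURLOPT_NOBODY is what turns the request into a HEAD.

```php
<?php
// Check the size of a remote file with a HEAD request before fetching it.
$url = 'http://example.com/big-file.zip'; // placeholder

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD: headers only, no body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
$length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD); // -1 if unknown
curl_close($ch);

if ($length > 0 && $length < 10 * 1024 * 1024) {
    // Small enough: stream it to php://temp rather than building a string,
    // so a surprise huge body can't blow past memory_limit.
    $fh = fopen('php://temp', 'w+');
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fh); // curl_exec() writes into the handle
    curl_exec($ch);
    curl_close($ch);
    rewind($fh);
    // ... read from $fh as needed, then fclose($fh);
}
```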
If you're fetching XML, you might want to check that the result actually parses before you overwrite the cache with broken data. Do something like $xml = simplexml_load_string($result); before the file_put_contents() at the end, and check that $xml isn't false. As mentioned above, Guzzle is all kinds of awesome.

This code is released without warranty or guarantee of any kind. It might not work, it might delete the internet. On balance I think it's fine, but if you choose to use it, you do so at your own risk. On the up side, if you want to use it, you can!
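That validation check is tiny - a sketch, where $result stands in for whatever curl_exec() returned and the cache path is a placeholder:

```php
<?php
// Validate fetched XML before overwriting the cache file with it.
// $result stands in for a curl_exec() response; $cache_file is a placeholder.
$result     = '<rss version="2.0"><channel><title>ok</title></channel></rss>';
$cache_file = '/tmp/feed-cache.xml';

$xml = simplexml_load_string($result);
if ($xml !== false) {
    file_put_contents($cache_file, $result); // parses fine - safe to cache
} else {
    // Broken or truncated response: keep the old cache and try again later.
}
```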
Disclaimer
It should go without saying, but any example code shown on this site is yours to use without obligation or warranty of any kind. As far as it's possible to do so, I release it into the public domain.