Safe PHP input filtering

Whatever your opinion of PHP, it's one of the most-used scripting languages on the web today, and because it's so easy to pick up the basics, many people put very unsafe code online. There are dozens of ways you can screw up your PHP code, but the biggest issues come from failing to filter input, or failing to escape output.

These aren't just beginner mistakes: many very large, very popular PHP scripts written by large communities of coders have regular security releases because of bad or faulty input filtering. Failing to filter input can expose your server and data to attack, and many of these attacks are now automated (because using hacked web servers to send spam is an industry now).

Filtering and escaping are two essential principles in web coding, and you need to apply them whatever language you use. This page is about how to filter input safely with PHP.

Most PHP scripts process content in some way, and then either store it in a database or output it to a web page. If that content comes from outside the script (like a form on a web page, or a part of the script's URL), then you need to check that it's what you're expecting, and nothing more.

What can go wrong?

In your (stupidly) basic CMS at http://example.com/index.php?page=about.html, the script gets the page variable from the URL query, then turns it into a path to a content file and displays it:

$page = $_GET['page'];
readfile(__DIR__.'/header.html');
echo file_get_contents('/var/www/mysite/content/'.$page);
readfile(__DIR__.'/footer.html');

Now what happens when someone decides to try ?page=../../../../home/yourname/.ssh/id_rsa, or maybe guess your script's database config file name?
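One way to close this particular hole, as a minimal sketch: treat the page value as a bare file name, resolve the full path, and refuse anything that doesn't end up as a real file inside the content directory. The helper name resolveContentPath() is hypothetical, made up for illustration.

```php
<?php
/**
 * Hypothetical helper: map an untrusted page name to a file inside
 * $contentDir, or return null if it does not resolve to one.
 */
function resolveContentPath(string $contentDir, string $page): ?string
{
    $base = realpath($contentDir);
    if ($base === false || strpos($page, "\0") !== false) {
        return null; // missing content directory, or a null byte in the input
    }

    // basename() drops any directory components, so "../../etc/passwd"
    // becomes just "passwd".
    $candidate = realpath($base . DIRECTORY_SEPARATOR . basename($page));

    // realpath() returns false for missing files and resolves symlinks;
    // only accept regular files that are still inside the content directory.
    $prefix = $base . DIRECTORY_SEPARATOR;
    if ($candidate === false
        || !is_file($candidate)
        || strncmp($candidate, $prefix, strlen($prefix)) !== 0
    ) {
        return null;
    }

    return $candidate;
}
```

In the CMS above you would call resolveContentPath('/var/www/mysite/content', $page), send a 404 when it returns null, and only pass a non-null result to readfile(). An explicit list of allowed page names would be stricter still.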

What not to do

Many PHP books and online tutorials will introduce you to the PHP "Superglobals", $_GET, $_POST, $_REQUEST, and $_SERVER. Don't use them. They'll make your code harder to test, and it's too easy to get content from them without filtering. Instead, use the HTTP Foundation component from the Symfony project, which uses PHP's filter mechanism. This will make it clear in your code what kind of input you're expecting, and will make testing, debugging and reusing your code much easier. PHP has a bunch of built-in filters and validators, saving you time and code (and the less code you have to write, the less you have to maintain).

You might be thinking this is overkill, but the HTTP Foundation is pretty lightweight, and the tools we'll use to set it up can be reused and extended to manage other third-party libraries and your own code, so you do get a lot for just a little cutting and pasting.

Installing the component

First you'll need to install the Composer PHP package manager. It'll download the component for you, and provide an autoloader that makes it easy to use in your code.

Now create the config file. You can run composer init (or php composer.phar init, if you downloaded Composer as a local composer.phar file) to interactively set up a full config, then run composer install to install everything, or you can just run this:

composer require symfony/http-foundation

Either way, Composer will download the HTTP Foundation to the vendor/ folder, and create an autoloader script. The autoloader is extremely useful, by the way. You can find a large number of other useful libraries on Composer's package site.

Create a Request object

<?php
require_once __DIR__.'/vendor/autoload.php';

use Symfony\Component\HttpFoundation\Request;

$request = Request::createFromGlobals();
?>

The $request object exposes the POST data as its request property and the GET data as its query property. Both provide several methods, like ->get() and ->has(), but the one we're interested in is ->filter(). The filter method wraps the PHP built-in filter_var() function, which provides several useful filters and the ability to write your own.

For testing, you can create a fake request instead of needing to run your script through a web server:

<?php
require_once __DIR__.'/vendor/autoload.php';

use Symfony\Component\HttpFoundation\Request;

// e.g. http://example.com/script.php?page=home
$request = Request::create('/script.php', 'GET', [
    'page' => 'home',
]);
?>

Using the Request object

You can see the list of built-in filters and validators on the PHP site. The "sanitize" filters will remove any unexpected content, and the "validate" filters will either return validated input or false.
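To make the difference concrete, here is a small sketch that calls filter_var() directly, so it runs without any libraries. The input strings are made up for illustration.

```php
<?php
// Validate filters: return the (typed) value on success, false on failure.
$validInt   = filter_var('42', FILTER_VALIDATE_INT);               // int 42
$invalidInt = filter_var('42abc', FILTER_VALIDATE_INT);            // false
$badEmail   = filter_var('not-an-email', FILTER_VALIDATE_EMAIL);   // false

// Sanitize filters: strip what doesn't belong and return what's left.
$digits = filter_var('12abc3', FILTER_SANITIZE_NUMBER_INT);        // string "123"

// A valid value can itself be falsy, so compare against false strictly.
$zero = filter_var('0', FILTER_VALIDATE_INT);                      // int 0
$zeroIsValid = ($zero !== false);                                  // true
```

Note the last case: a plain if ($zero) check would wrongly treat a valid 0 as a failed validation.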

Be warned, some of the validators are quite liberal; the URL validator will validate a javascript: URL (it's valid, after all, even if it's unsafe).
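If you only ever expect web links, one option is to check the scheme yourself after the validator has passed the value. This is a sketch under that assumption; isHttpUrl() is a hypothetical helper name, not part of PHP or Symfony.

```php
<?php
/**
 * Hypothetical helper: accept only syntactically valid URLs whose
 * scheme is http or https.
 */
function isHttpUrl(string $url): bool
{
    // First make sure it is a well-formed URL at all.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Then allow only the schemes we actually expect.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));

    return in_array($scheme, ['http', 'https'], true);
}
```

Allowing a short list of known-good schemes is safer than trying to block the bad ones, since you can't enumerate every scheme a browser might act on.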

Examples

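// e.g. script.php?var=grumpycat -- equivalent of $_GET['var']
// FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, so this encodes special characters instead.
// (Symfony 2.x took an extra $deep argument, usually false, before the filter; it was removed in 3.0.)
$var = $request->query->filter('var', 'default value', FILTER_SANITIZE_SPECIAL_CHARS);

// From a form with <input name="email"> -- equivalent of $_POST['email']
$email = $request->request->filter('email', 'default value', FILTER_VALIDATE_EMAIL);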

Custom validators

There are two ways to do custom validation with the filter method: with a regular expression, or with a callback. A regular expression compares the input against a pattern, while a callback can be any PHP function or method (including your own functions or anonymous functions).
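Both approaches can be sketched with filter_var() directly, which is what the filter method wraps. The function names, the username pattern and the colour list below are all made up for illustration; the same filter constant and options array should carry over to the filter method's filter and options arguments, but check that against the Symfony version you have installed.

```php
<?php
// 1. Regular expression: accept only 3-16 lowercase letters, digits, "_" or "-".
//    \z anchors at the very end, so a trailing newline is rejected too.
function validateUsername(string $input)
{
    return filter_var($input, FILTER_VALIDATE_REGEXP, [
        'options' => ['regexp' => '/^[a-z0-9_-]{3,16}\z/'],
    ]);
}

// 2. Callback: accept only values from a fixed list.
//    The anonymous function receives the raw input and returns the result.
function validateColour(string $input)
{
    return filter_var($input, FILTER_CALLBACK, [
        'options' => function ($value) {
            $allowed = ['red', 'green', 'blue'];

            return in_array($value, $allowed, true) ? $value : false;
        },
    ]);
}

$username = validateUsername('grumpy_cat');   // passes the pattern
$rejected = validateUsername('Grumpy Cat!');  // fails the pattern
$colour   = validateColour('red');            // in the allowed list
```

A regular expression is enough when the rule is about the shape of the input; a callback is the better fit when the rule needs a lookup or some other logic.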

Handling HTML input

There are two good ways to handle HTML input. In order of preference, they are:

  1. Don't.
  2. Use HTMLPurifier.

There are no other ways. There are other libraries, but most of them suck. Some of them suck in subtle ways. HTMLPurifier is big and slow, but that's what it takes. Everything else is cutting corners in the one place that corners should never be cut.

You can install HTMLPurifier into your project using Composer, just like the HTTP Foundation above:

composer require ezyang/htmlpurifier

Using it is pretty straightforward, and its defaults are stricter than you'll probably want, but that's a good thing. Read the docs if you want to know how to configure it.

Other gotchas

Handling uploaded files is also tricky business:


Disclaimer

It should go without saying, but any example code shown on this site is yours to use without obligation or warranty of any kind. As far as it's possible to do so, I release it into the public domain.