
Using type inference to make web templates robust against XSS

Mike Samuel <msamuel@google.com>, Prateek Saxena <prateeks@eecs.berkeley.edu>

Motivation

Scripting vulnerabilities plague web applications today. To streamline the output generation from application code, numerous web templating frameworks have recently emerged and are gaining widespread adoption. However, existing web frameworks fall short in providing mechanisms to automatically and context-sensitively sanitize untrusted data.

For example, a naive web template might look like

<div>{$name}</div>

but this template is vulnerable to cross-site scripting (XSS). An attacker who controls the value of name could pass in <script>document.location = 'http://phishing.com/';</script> to redirect users to a malicious site, steal the user's credentials or personal data, or initiate a download of malware.

The template author might manually encode name:

<div>{$name |escapeHTML}</div>

making sure that the user sees exactly the value of name as per spec, and defeating this particular attack. A better web templating system might automatically insert the appropriate |escape… directives, relieving the template author of the burden.

This paper argues that correct sanitization is too important to be left to manual effort and that manual sanitization is an unreasonable burden to place on template authors (and especially maintainers); it defines goals that any automatic approach should satisfy, and introduces an automatic approach that is particularly suitable for bolting onto existing web templating languages.

In particular, we introduce the new notion of "context" type qualifiers to represent the contexts in which untrusted data can be embedded. We propose a new type system that refines the base type system of a web templating language with the context type qualifier. Based on the new type system, we design and develop a context-sensitive auto-sanitization (CSAS) engine which runs during the compilation stage of a web templating framework to add proper sanitization and runtime checks that ensure correct sanitization. We implement our system in Google Closure Templates, a commercially used open-source templating framework used in Gmail, Google Docs, and other applications. We evaluate our type system on 1035 real-world Closure templates and demonstrate that our approach achieves both better security and better performance than previous approaches.

This system is in the process of being bolted onto jQuery Templates, but that work has not yet been evaluated on production code.

Glossary

Context
A parser state in the combined HTML, CSS, and JavaScript grammar used to determine the stack of sanitization routines that need to be applied to any untrusted data interpolated at that point to preserve the security properties outlined here.
Cross-Site Scripting
A quoting confusion attack whereby untrusted data naively interpolated into HTML, CSS, or JavaScript causes code to run with the privileges of an origin not owned by the attacker.
CSS
CSS 2 and 3 plus vendor-specific extensions such as expression(…), and the comment-parsing and error-recovery quirks needed so that our sanitization function definitions survive a worst-case analysis. This paper assumes a basic familiarity with CSS.
Escaper
A sanitization function that takes content in an input language (usually text/plain) and produces content in an output language. E.g. the function escapeHTML is an escaper that takes plain text, 'I <3 Ponies', and transforms that to semantically equivalent HTML by turning HTML special characters into entities: 'I &lt;3 Ponies'. (Escapers may, in the process, break hearts.) See also OWASP's definition.
Filter
A sanitization function that takes a string and either returns it, returns an innocuous string, or aborts template processing. E.g. an untrusted value at the start of a URL can specify a powerful protocol such as javascript:. A filter can ensure that an untrusted value at the beginning of a URL either contains no protocol or contains one in a whitelist (http, https, or mailto) and if it finds an untrusted value that violates this rule, might return an innocuous value such as '#' which defangs the URL.
HTML
HTML as parsed by browsers. Typically HTML5 but we need to deal with syntactic quirks from mainstream browser lines as old as IE5. This paper assumes a basic familiarity with HTML.
JavaScript
ECMAScript 5, but including vendor-specific extensions such as conditional compilation directives so that our sanitization function definitions survive a worst-case analysis. This paper assumes a basic familiarity with JavaScript.
Normalizer
A sanitization function that takes content in an input language and produces content in that same language that can be used in more contexts. E.g. the function normalizeURI might make sure that quotes and angle brackets are encoded so that a URI can be embedded in an HTML attribute unchanged: 'mailto:<Mohammed%20"The%20Greatest"%20Ali>%20ali@gmail.com' becomes 'mailto:%3cMohammed%20%22The%20Greatest%22%20Ali%3e%20ali@gmail.com'. A function that strips tags from valid HTML is also a normalizer: it allows the tagless HTML to be included in an HTML attribute context.
Quoting Confusion
A vulnerability (or an exploitation of such) due to a failure to encode data in one language (such as text/plain) before concatenating it with content in another language such as text/html in the case of XSS. Other examples of quoting confusion include SQL Injection, Shell Injection, and HTTP header splitting.
RTTI
RunTime Type Information. Reflective access to the type of a value at the time a program is running. RTTI APIs include typeid in C++; typeof in C# and JavaScript; instanceof in Java and JavaScript; instance_of? in Ruby; type() in Python; Object.getClass() in Java; and Object.GetType() in C#.
Sanitization Function
A function that takes untrusted data and returns a snippet of web content. There are several kinds of sanitization functions: escapers, normalizers, and filters.
Template
A function from (typically untrusted) data to a string, specified in a DSL that clearly separates static trusted snippets of content (usually appearing as literal text) from interpolations of untrusted data (usually appearing as expressions or variable names). In this paper the term is used synonymously with "Web Template", a type of template that produces a string of content in a web language: HTML, CSS, or JavaScript.
Trusted Path
The ability of an application or piece of code to establish a channel to the user that the user can be sure leads to that piece of code. E.g. browsers use an unspoofable dialog box for HTTP auth to gather passwords, Windows uses Ctrl-Alt-Delete for the same purpose, and browsers disallow spoofing of the URL bar so that informed users can reliably tell whether a page is secure and using valid certificates. See also Wikipedia.
XSS
See Cross-Site Scripting

Solution Sketch: A static approach with RTTI to avoid over-escaping.

The template below specifies a form whose action depends on two values, $name and $tgt, which may come from untrusted sources. The {if …}…{else}…{/if} construct branches to build a dynamic URL.

<form action="{if $tgt}/{$name}/handle?tgt={$tgt}{else}/{$name}/default{/if}">Hello {$name}…

First, we parse the template to find trusted static content, dynamic "data holes" that may be filled by untrusted data, and flow control constructs: if, for, etc. The solid black portions below are the data holes; the rest is trusted static content.

<form action="{if $tgt}/███████/handle?tgt=██████{else}/███████/default{/if}">Hello ███████

Next we do a flow-sensitive analysis, propagating types to determine the context in which each data hole appears.

<form action="{if $tgt}/███████/handle?tgt=██████{else}/███████/default{/if}">Hello ███████
The template starts in an HTML PCDATA context; the action attribute value begins in a URL-start context; the two $name holes inside the path are in a URL path context; the $tgt hole is in a URL query context; and the final $name hole after the tag is back in HTML PCDATA.

Based on those contexts, we determine the type of content that is expected for each hole.

<form action="{if $tgt}/  URL  /handle?tgt=Query {else}/  URL  /default{/if}">Hello  HTML  

Finally we insert calls to sanitizer functions into the template.

<form action="{if $tgt}
  /{escapeHTML($name)}/handle?tgt={encodeURIComponent($tgt)}
{else}
  /{escapeHTML($name)}/default
{/if}
">Hello {escapeHTML($name)}…

That is the gist of the solution, though the above example glosses over issues with re-entrant templates, templates that are invoked in multiple start contexts, and joining branches that end in different contexts; and the exact sanitization functions chosen are different than shown in this simplified example.

The example only shows HTML and URL encoding, but our solution deals with data holes that occur inside embedded JavaScript and CSS as any solution for AJAX applications must.

Problem Definition

In this section we present several metrics on which any competing sanitization scheme should be judged, and a definition of a safe template that can be used to prove or disprove the soundness of a sanitization scheme. We think this definition captures security properties that web applications commonly want to enforce.

Performance

A sanitization scheme should be judged on several performance metrics:

  1. Compile- or load-time overhead. The cost of any static analysis.
  2. Run-time analysis overhead. The cost of any dynamic analysis done when the template is run.
  3. Run-time sanitization overhead. The cost of sanitizing untrusted data.
  4. One-time development overhead. The burden placed on a developer to learn the system.
  5. Continual development overhead. The burden placed on a developer to add sanitization directives, review code to ensure they are used correctly, debug the resulting template code, and deal with any over- or mis-sanitization.

Run-time analysis overhead (proportional to overall template runtime) often differs substantially by platform. High-quality parser generators exist for C and Java, so the overhead may be much lower there than in the browser, since iterating character by character over a string is slow in JavaScript.

Our proposal has a modest compile-/load-time cost, taking slightly less than 1 second to do static inference for 1035 templates comprising 782kB of source code, or about 1ms per template. The run-time analysis overhead for our proposal is zero. The run-time sanitization overhead on a benchmark is between 3% and 10% of the total template execution time, and is indistinguishable from the overhead when non-contextual auto-sanitization is used (all data holes sanitized using HTML entity escaping).

Development overhead is hard to measure, but the 1035 templates were migrated by an application group in a matter of weeks, without stopping application development and with little coordination, so the one-time overhead (the overhead to learn the system) is lower than that of learning and adopting a new templating language. Since the system works by inserting function calls, we provided debugging tools that diffed templates before and after the inference was run, to show developers what the system was doing and to aid in debugging. Because templates written using any approach need debugging, the continual development overhead can never be zero, but tool support like diffing can make the system transparent and ease debugging.

Finally, once a bug has been identified, we try to make sure there are simple bugfixing recipes.

Ease of Adoption/Migration

What kind of changes, if any, do developers have to make to take an existing codebase of templates and have them properly sanitized? For example, adding sanitization functions manually is time-consuming and error-prone. Making sure that all static content is valid XHTML requires repetitive, time-consuming changes, but would not be as error-prone.

Our proposal allows contextual auto-sanitization to be turned on for some templates and not for others; most templating languages allow templates to be composed, i.e. templates can call other templates, and standard practice seems to be to have a few large templates that call out to many smaller templates. Since this can be done per template, a codebase can be migrated piecemeal, starting with complicated templates that have known problems.

Our proposal does not impose an element structure on template boundaries. Many top-level templates call a common header template, emit page-specific content, and then call a common footer template, where the common header opens elements that are closed in the common footer:

<html><head>
<!-- Common style and script definitions -->
...
</head><body>
<!-- Common menus -->

Approaches that require template code to be well-formed XML, such as XSLT, cannot support this idiom. Our proposal works for templating languages that allow this idiom because we propagate types as they flow across template calls rather than inferring the types of content from a DOM derived from a template.

Ease of Abandonment

If a development team adopts a sanitization scheme, and finds that it does not meet their needs, how easily can they switch it off, and how much of the effort they invested in deploying it can they recover?

Since our solution works by inserting calls to sanitization functions into templates, a development team having second thoughts can simply run the type inference engine to insert the calls, print out the resulting templates to generate a patch to their codebase, and then remove whatever directives turned on auto-sanitization. We argued above that the cost of adoption is low, and most of the work put into verifying that the sanitization functions chosen were reasonable is recoverable.

Security under Maintenance

Security measures tend to be removed from code under maintenance. Imagine a template that is not auto-sanitized:

<div>Your friend, {escapeHTML($name)}, thinks you'll like this.</div>

that is passed a plain text name. While merging two applications, developers add a call to this code, passing in a rich HTML signature that has been proven safe by a tag whitelister, e.g. "Alan <font color=green>Green</font>span". Eventually, Mr. Greenspan notices that his name is misrendered and files a bug. A developer might check that the rich text signature is sanitized properly before being passed in, but not notice the other caller that doesn't do any sanitization. They resolve the bug by removing the call to escapeHTML which fixes the bug but opens a vulnerability.

Over-encoding is more likely to be noticed by end-users than XSS vulnerabilities, so a project under maintenance is more likely to lose manual sanitization directives than to gain them.

Our proposal addresses this by introducing sanitized content types as a principled solution to over-encoding problems.

Structure Preservation Property

We define a safe template as one that has several properties: the structure preservation property described here, and the code effect and least surprise properties defined in later sections.

Intuitively, this property holds that when a template author writes an HTML tag in a safe templating language, the browser will interpret the corresponding portion of the output as a tag regardless of the values of untrusted data, and similarly for other structures such as attribute boundaries and JS and CSS string boundaries.

This property can be violated in a number of ways. E.g. in the following JavaScript, the author is composing a string that they expect will contain a single top-level bold element surrounded by text.

document.write(greeting + ', <b>' + planet + '</b>!');

and if greeting is "Hello" and planet is "World" then this holds, as the output written is "Hello, <b>World</b>!"; but if greeting is "<script>alert('pwned');//" and planet is "</script>" then this does not hold, since the structure has changed: the <b> should have started a bold element but the browser interprets it as part of a JavaScript comment in "<script>alert('pwned');//, <b></script></b>!".

Lower level encoding attacks, such as UTF-7 attacks, may also violate this property.

More formally, given any template, e.g.

<div id="{$id}" onclick="alert('{$message}')">{$message}</div>

we can derive an innocuous template by replacing every untrusted variable with an innocuous string, a string that is not empty, is not a keyword in any programming language and does not contain special characters in any of the languages we're dealing with. We choose our innocuous string so that it is not a substring of the concatenation of literal string parts. Using the innocuous string "zzz", an innocuous template derived from the above is:

<div id="zzz" onclick="alert('zzz')">zzz</div>

Parsing this, we can derive a tree structure where each inner node has a type and children, and each leaf has a type and a string value.

Element
 ╠Name : "div"
 ╠Attribute
 ║ ╠Name : "id"
 ║ ╚Text : "zzz"
 ╠Attribute
 ║ ╠Name : "onclick"
 ║ ╚JsProgram
 ║   ╚FunctionCall
 ║     ╠Identifier : "alert"
 ║     ╚String : "zzz"
 ╚Text : "zzz"

A template has the structure preservation property when for all possible branch decisions through a template, and for all possible data table inputs, a template either produces no output (fails with an exception) or produces an output that can be parsed to a tree that is structurally the same as that produced by the innocuous template derived from it for the same set of branch decisions.

∀ branch-decisions ∀ data, areEquivalent(
    parse(innocuousTemplate(T)(branch-decisions, data)),
    parse(T(branch-decisions, data)))

where parse parses using a combined HTML/JavaScript/CSS grammar to the tree structure described above, branch-decisions is a path through flow control constructs (the conditions in for loops and if conditions) and where areEquivalent is defined thus:

def areEquivalent(innocuous_tree, actual_tree):
  if innocuous_tree.is_leaf:
    # innocuous_string was 'zzz' in the example above.
    if innocuous_string in innocuous_tree.leaf_value:
      # Ignore the contents of actual since it was generated by
      # a hole.  We only care that it does not interfere with
      # the structure in which it was embedded.
      return True
    # Leaves structurally the same.
    # Assumes same node type implies actual is leafy.
    return (innocuous_tree.node_type is actual_tree.node_type
            and innocuous_tree.leaf_value == actual_tree.leaf_value)
  # Require type equivalence for inner nodes.
  if innocuous_tree.node_type is not actual_tree.node_type:
    return False
  # Zip below will silently drop extras.
  if len(innocuous_tree.children) != len(actual_tree.children):
    return False
  # Recurse to children.
  for innocuous_child, actual_child in zip(
      innocuous_tree.children, actual_tree.children):
    if not areEquivalent(innocuous_child, actual_child):
      return False
  return True  # All grounds on which they could be inequivalent disproven.

This definition is not computationally tractable, but it can be used as a basis for correctness proofs; and since, in practice, branch decisions that go through loops or recursion more than twice can be ignored, we can use fuzzers to generate bad data inputs and gain confidence in an implementation.

This property is essential to capturing developer intent. When the developer writes a tag, the browser should interpret that as a tag, and when the developer writes paired start and end tags, the browser should interpret those as a matched pair. It is also important to applications that want to embed sanitized data while preserving a trusted path since the structure preservation property is a prerequisite for visual containment.

Code Effect Property

Web clients may specify data values in code (strings, booleans, numbers, JSON), but only code specified by the template author should run as a result of injecting the template output into a page, and all code specified by the template author should run as a result of the same. There are a dizzyingly large number of ways this property can fail to hold for a template. A non-exhaustive sample of ways to cause extra code to run:

There are also many ways to cause security-critical code to not run. In general, it is not wise to rely on JavaScript running in a browser, but many developers, not unreasonably, rely on some code having run if other code is running at a later time. A non-exhaustive sample of ways to stop code running via XSS:

Our proposal enforces this property by filtering URLs to prevent any data hole from specifying an exotic protocol, by filtering CSS keywords, and by only allowing data holes in JavaScript contexts to specify simple boolean, numeric, and string values, or complex JSON values which cannot have free variables. We assume that the JavaScript interpreter will work on arbitrarily large inputs. "Defining Code-Injection Attacks" by Ray & Ligatti defines a similar property: a CIAO (code injection attack on outputs) occurs when an interpolation causes the parse tree to include an expression that is not in its normal form, one consequence of which is that it has no free variables.
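
As a sketch of the URL-filtering step (the whitelist and the innocuous replacement follow the filter definition in the glossary; the function name and regular expression are illustrative, not the production implementation):

import re

# Matches a scheme at the start of a URL, e.g. the "javascript" in "javascript:…".
_SCHEME = re.compile(r'^([a-zA-Z][a-zA-Z0-9+.-]*):')
_SAFE_SCHEMES = frozenset(['http', 'https', 'mailto'])

def filter_url_start(value):
  # Scheme-less values (paths, queries, relative URLs) are allowed through, and
  # so are whitelisted schemes.  Anything else is replaced with an innocuous
  # value that defangs the URL.
  match = _SCHEME.match(value)
  if match is None or match.group(1).lower() in _SAFE_SCHEMES:
    return value
  return '#'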

Finally, our escapers are designed to produce output that avoids grammatical pitfalls such as semicolon insertion, non-ASCII newline characters, and regular-expression/division-operator/line-comment confusion.

Identifying all the places in HTML (including MathML and SVG) where a URL might appear is relatively easy compared to CSS. In CSS, it is difficult. For example, in <div style="background: {$bg}">, $bg might specify a URL, a color name, a color value like #000, a function-like color value such as rgb(0,0,0), a keyword value like transparent, or a combination of the above. Given how hard it is to reliably blacklist URLs even when you know the content is a URL, we took the rather drastic approach of forbidding anything that might specify a colon in CSS data holes. This seems to affect very little in practice, and we could relax the constraint to allow colons preceded by a safe word like the name of an element, pseudo-element, or innocuous property. Even if we did, it is possible that existing code uses colons in data holes to specify list separators à la semantic HTML, and we would break that use case:

ul.inline li { list-style: none; display: inline }
ul.inline li:before { content: ': ' }  /* ', ' here would give a normal looking list. */
ul.inline li:first-child:before { content: '' }
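
A sketch of the kind of conservative filter the no-colons rule implies (illustrative only; the production CSS filter is more nuanced):

import re

# Conservative filter for a CSS data hole: identifiers, hex colors, and
# quantities separated by spaces, but no colons, quotes, parentheses, or
# slashes, so a value cannot smuggle in a URL, expression(...), or an extra
# property declaration.
_SAFE_CSS_VALUE = re.compile(r'^[-_a-zA-Z0-9#.% ]+$')

def filter_css_value(value):
  if _SAFE_CSS_VALUE.match(value):
    return value
  return 'zzz'   # innocuous: an unknown keyword whose declaration browsers drop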

This property is a prerequisite for many application privacy goals. If a third-party can cause script to run with the privileges of the origin, it can steal user data and phone home. Even if credentials are unavailable to JavaScript (HTTPOnly cookies), scripts with same-origin privileges can screen scrape (using DOM APIs) user names and identifiers and associated page content and phone home.

This property is also a prerequisite for many informed-consent goals. If a third-party script can install onsubmit handlers, it can rewrite form data before it is submitted along with the XSRF tokens that are meant to ensure that the submitted data was specified by the user.

Least Surprise Property

The last of the security properties that any auto-sanitization scheme should preserve is the property of least surprise. The authors do not know how to formalize this property.

Developer intuition is important. A developer (or code reviewer) who is familiar with HTML, CSS, and JavaScript and who knows that auto-sanitization is happening should be able to look at a template and correctly infer what happens to dynamic values without having to read a complex specification document. Simple rules of thumb should be sufficient to understand the system. E.g. if a mythical average developer sees <script>var msg = '{$msg}';</script> and their intuition is that $msg should be escaped using JavaScript-style \ escape sequences, and that is sufficient to preserve the other security properties, then that is what the system should do. Templates should be both easy to write and easy to code review.

Exceptions to the system should be easy to audit. SQL prepared statements are great, but there is no way to make an exception to the rule without giving up the whole safety net, so developers sometimes work around them by concatenating strings. It is hard to grep (or craft presubmit triggers) for all the places where concatenated strings are passed to SQL APIs, so it is hard for a more senior developer to find these after the fact and either explain how naive developers can achieve their goal within the system, notice a trend that points to a systemic problem with schemas, or agree that the exception to the rule is warranted and document it for future security auditors.

Our proposal was designed with this goal in mind, but we have not managed to quantify our success. We can note that 1035 templates were converted within a matter of weeks without a flood of questions to the mailing lists we monitor, so we infer that most of the heavily exercised parts of the system were non-controversial. Different communities of developers may have different expectations. We worked with a group of developers most of whom knew Java, C++, or both before starting web application development, and among whom a high proportion have at least a bachelor's degree in CS or a related field. They may differ, intuition-wise, from developers who came to web development from a Ruby, Perl, or PHP background.

Alternate Approaches

In this section we introduce a number of alternative proposals and explain why they perform worse on the metrics above. We cite real systems as examples of some of these alternatives. Many of these systems are well-thought-out, reasonable solutions to the particular problems their authors faced. We merely argue that they do not extend well to the criteria we outlined above, and explicitly label these sections "strawmen" to clarify the difference between our design criteria and the contexts in which these systems arose. We do claim, though, that any comprehensive tools-level solution to XSS should meet the criteria above.

Strawman 0: Manual sanitization

Manual sanitization is the current state of the art. Developers use a suite of functions, such as OWASP's open-source ESAPI encoders, and every developer must learn when and how to apply them correctly. They must apply sanitizers either before data reaches a template or within the template by inserting function calls into code.

This places a significant burden on developers and does not guarantee any of the security properties listed above. One lapse can undo all the work put into hardening a website because of the all-or-nothing nature of the same-origin policy.

There is a tradeoff between correctness and simplicity of API that works in the attacker's favor. Manual sanitization is particularly error-prone because developers learn the good parts of the languages they work in, but attackers have the bad parts available to them as well. The syntax of HTML, CSS, and JavaScript is much gnarlier than most developers imagine, and it is an unreasonable burden to expect them to learn and remember obscure syntactic corner cases. These corner cases mean that the typical suite of 4-6 escaping functions is the most that many developers can reliably choose from, yet such a suite is insufficient to handle corner cases or nested contexts.

Changes in language syntax or vendor-specific extensions (e.g. XML4J and embedded SVG) may invalidate developers' previously valid assumptions. Code that was safe before may no longer be safe. With an automated system, a security patch and recompile may suffice, but a patch to code that took a team of developers years to write will take a team of developers to fix.

XSS scanners (e.g. lemon) can mitigate some of the cons of manual sanitization (and they work with any of the other solutions here as well to provide defense-in-depth), but there are no good scanners for AJAX applications, and, with manual sanitization, scanners impose a continual burden on developers to respond to the reported errors.

Strawman I: Non-contextual auto-sanitization

Non-contextual auto-sanitization is a great improvement over manual sanitization. Django templates and others use it.

It works by assuming that every data hole should be sanitized the same way, usually by HTML entity encoding. As such, it is prone to over-escaping and mis-escaping. To understand mis-escaping, consider what happens when the following template is called with $name set to ', alert('XSS'), ':

<button onclick="setName('{$name}')">

The template produces <button onclick="setName('&apos;, alert(&apos;XSS&apos;), &apos;')">, which is exactly the same, to the browser, as <button onclick="setName('', alert('XSS'), '')">, because the browser HTML-entity-decodes the attribute value before invoking the JavaScript parser on it.

Non-contextual auto-sanitization cannot preserve the structure preservation property for JavaScript, CSS, or URLs because it is unaware of those languages. It also fails to preserve the code effect property.

Bolting filters on non-contextual auto-sanitization will not help it to preserve the code effect property. It is possible to write bizarre JavaScript that does not even need alphanumerics. Since JavaScript has no regular lexical grammar, regular expressions that are less than draconian are insufficient to filter out attacks.

Non-contextual auto-sanitization, with auditable exceptions like Django's, does preserve the least surprise property in a sense. With very little training, a developer can predict exactly what it will do, and empirically, 74% of the time it does what they want (our system chose some kind of HTML entity encoding for 992 out of 1348 data holes).

Strawman II: Strict structural containment

Examples of strict structural containment languages are XSLT, GXP, Yesod, and possibly XHP. For all of these, the input is (or is coercible via fancy tricks to) a tree structure like XML. So for every data hole, it is obvious to the system which element and attribute context the hole appears in†. A similar structural constraint could in principle be applied to embedded JS, CSS, and URIs.

Strict structural containment is a sound, principled approach to building safe templates and a great choice for anyone planning a new template language, but it cannot be bolted onto existing languages because it requires that every element and attribute start and end in the same template. This assumption is violated by several very common idioms, such as the header-footer idiom above, in ways that often require drastic changes to repair.

Since it cannot be bolted onto existing languages, limiting ourselves to it would doom to insecurity most of the template code existing today. Most project managers who know their teams have trouble writing XSS-free code, know this because they have existing code written in a language that does not have this property.

† - modulo mechanisms like <xsl:element name="..."> which can, in principle, be repaired using equivalence classes of elements and attributes. I.e. one could define an equivalence class of elements all of whose attributes have the same meaning and which have the same content type: (TBODY, THEAD, TFOOT), (OL, UL), (TD, TH), (SPAN, I, B, U), (H1, H2, H3, …) and allow a dynamic element mechanism to switch between element types within the same equivalence class. Similar approaches can allow selecting among equivalent dynamic attribute types : all event handlers are equivalent (modulo perhaps those that imply user interaction for some applications).

Strawman III: A runtime typing approach

Prior to this work, the best auto-sanitization scheme was a runtime scheme.

A runtime contextual auto-sanitizer plugs into a template runtime at a low level. Instead of writing content to an output buffer, the template runtime passes trusted and untrusted chunks to the auto-sanitizer. The template:

<ul>{for $item in $items}<li onclick="alert('{$item}')">{$item}{/for}</ul>

might produce the content chunks in the first column of the table below; by propagating context at runtime, the sanitizer infers the contexts in the third column and applies the sanitization functions in the last column before writing to the output buffer.

Content                      Trusted  Context    Sanitization function
<ul>                         Yes      PCDATA     none
<li onclick="alert('         Yes      PCDATA     none
foo                          No       JS string  escapeJSString
')">                         Yes      JS string  none
foo                          No       PCDATA     escapeHTML
<li onclick="alert('         Yes      PCDATA     none
<script>doEvil()</script>    No       JS string  escapeJSString
')">                         Yes      JS string  none
<script>doEvil()</script>    No       PCDATA     escapeHTML
</ul>                        Yes      PCDATA     none
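
A rough sketch of how such a runtime sanitizer hooks into the template runtime (the helper names are illustrative, not the CTemplate or ClearSilver API):

def write_chunk(chunk, trusted, context, output_buffer):
  # Called by the template runtime for every chunk instead of appending
  # directly to the output buffer.
  if trusted:
    # Static template text advances the parser state.
    new_context = propagate_context_over_text(chunk, context)
    output_buffer.append(chunk)
    return new_context
  # Untrusted data: choose a sanitizer from the current context.  The
  # sanitized result cannot change the context, so the state is unchanged.
  sanitize = sanitizer_for_context(context)
  output_buffer.append(sanitize(chunk))
  return context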

This works, and with a hand-tuned C parser it has been deployed successfully on CTemplate and ClearSilver.

Writing a highly tuned parser in JavaScript, though, is difficult, so implementing this scheme in the browser requires a hard trade-off between flexibility and correctness on one hand and download size and speed on the other.

Our proposal is a factor of 4 faster than a runtime scheme implemented in JavaScript and has no download size cost above and beyond the code for the sanitization functions and the calls to them.

Even in languages for which there are efficient parser generators, runtime approaches might suffer performance-wise. The overhead for the static approach is independent of the number of times a loop is re-entered, so templates that take large array inputs might perform worse with even a highly efficient runtime scheme.

Runtime sanitization does handle at least one area more elegantly, though. Dynamic tag and attribute names pose no problems to a runtime sanitizer, whereas our scheme has to filter attribute names so that $aname cannot be "onclick" in <button {$aname}=…>: a static approach must decide up front whether the attribute value that follows is a JavaScript context or some other context, while a runtime approach can take into account the actual value of $aname. This is not a common problem, and our approach does handle many dynamic attribute situations, including <button on{$handlerType}=…>.

Strawman IV: A purely static approach

We know of no purely static approaches, though they are possible. A purely static approach is one that, like our proposal, infers contexts at compile or load time, but does not take into account the runtime type of the values that fill the data holes.

This approach has problems with over-escaping. Existing systems often use a mix of sanitization in-template and sanitization outside the template in the front-end code that calls the template.

Our solution takes into account the runtime type of the values that fill a hole. If the runtime type marks the value as a known-safe string of HTML, then a sanitization function can choose not to re-escape, and instead normalize or do nothing.
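
For example, an HTML escaper can use an RTTI check to let pre-sanitized values pass through (a minimal sketch; the marker class here stands in for the SanitizedContent type described later):

import html

class SanitizedHtml(str):
  """Marker type: its creator asserts the value is already safe HTML."""

def escape_html(value):
  # RTTI check: known-safe HTML passes through unchanged rather than being
  # entity-encoded a second time, which is what causes over-escaping.
  if isinstance(value, SanitizedHtml):
    return str(value)
  return html.escape(str(value), quote=True)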

See the caveats below for other problems that apply equally to purely static systems and to our proposal.

Definitions and Algorithms

This section is only relevant to implementors, testers, and others who want to understand the implementation. Everyone else, including web application developers, can ignore it.

At a high level, the type system defines four things which are expanded upon below:

  1. An initial start context for a public template. Typically HTML_PCDATA.
  2. A context propagation algorithm which takes a chunk of literal text from the template and the context at its start and returns the context at its end. (context * string) → context.
  3. An algorithm that chooses a sanitization function for a data hole. It takes the context before the hole and returns a sanitization function and the context after the hole. context → ((α → string) * context). If data holes have statically available type info, then the type could be taken into account : (context * type) → ((α → string) * context).
  4. A context join operator that takes the contexts at the end of branches and yields the context after the branches have joined. This is used to determine the context at the end of a conditional {if} by joining the context at the end of the then-branch with the context at the end of the else-branch. It is also used with loops, where (unless proven otherwise) we have to join the context at the start (loop never entered) with the context after one pass and with a steady-state context for many repetitions. context list → context

By contrast, the runtime auto-sanitization scheme described in strawman III has the same initial context, the same context propagation operator, no context join operator, and a slightly differently shaped sanitization function chooser: context → (α → (string * context)).
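
Restated as code, the interface looks roughly like this (a sketch; the type aliases and names are illustrative rather than the production API, though apply_html_grammar and context_join reappear in the pseudocode below):

from typing import Callable, Tuple

Context = int                          # packed bit field; see "Contexts" below
Sanitizer = Callable[[object], str]    # α → string

HTML_PCDATA: Context = 0               # 1. start context for a public template

# 2. Context propagation over a chunk of literal template text:
#    (context * string) → context
def apply_html_grammar(safe_text: str, context: Context) -> Context: ...

# 3. Sanitizer choice for a data hole; returns the function to apply and the
#    context after the hole: context → ((α → string) * context)
def choose_sanitizer(context: Context) -> Tuple[Sanitizer, Context]: ...

# 4. Join of the contexts at the ends of branches: context list → context
def context_join(*contexts: Context) -> Context: ...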

Contexts

A context captures the state of the parser in a combined HTML/CSS/JS lexical grammar. It is composed of a number of fields which pack into 2 bytes with room to spare.
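
For illustration only, such a packing might look like this (the particular fields and bit widths are assumptions based on the surrounding description, not the exact production encoding):

# Hypothetical packing of a context into a 16-bit integer; 5+2+3+2+3 = 15 bits.
STATE_SHIFT,     STATE_BITS     = 0,  5   # HTML_PCDATA, JS, CSS, URI, ...
DELIM_SHIFT,     DELIM_BITS     = 5,  2   # NONE, DOUBLE_QUOTE, SINGLE_QUOTE, SPACE_OR_TAG_END
ATTR_TYPE_SHIFT, ATTR_TYPE_BITS = 7,  3   # plain text, JS, CSS, URI, ...
JS_SLASH_SHIFT,  JS_SLASH_BITS  = 10, 2   # would a '/' divide or start a regex?
URI_PART_SHIFT,  URI_PART_BITS  = 12, 3   # scheme, path, query, fragment, unknown

def pack_context(state, delim=0, attr_type=0, js_slash=0, uri_part=0):
  # Each field occupies its own group of bits, so the whole context fits in 2 bytes.
  return ((state << STATE_SHIFT) | (delim << DELIM_SHIFT)
          | (attr_type << ATTR_TYPE_SHIFT) | (js_slash << JS_SLASH_SHIFT)
          | (uri_part << URI_PART_SHIFT))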

Contexts support two operators: join and ε-commit.

The join operator produces the context at the end of a condition, loop, switch, or other flow control construct. This sometimes introduces an ambiguity. In the template:

<form action="{if $tgt}/{$name}/handle?tgt={$tgt}{else}/{$name}/default{/if}">Hello {$name}…

One branch ends in the query portion of a URI, and one ends outside it. If there were a data hole immediately after the {/if}, we would not be able to determine an appropriate sanitization function for it†. So context joining often introduces just enough ambiguity, by using do-not-know values for fields, and in the common case we later reach a point where we can discard that information. In the URI case, if a # character followed the {/if} we could reliably transition into a URI fragment context, and in any case the end of the attribute moots the question.

The ε-commit operator is used when we see a data hole. In some cases, we introduce parser states to delay decision making. In the template fragment, <a href=, we could see a quote character next, or space, or the start of an unquoted value, or the end of the tag (implying empty href), or a data hole specifying the start of an unquoted attribute value. If the next construct is a data hole we need to commit to it being an unquoted attribute. The ε-commit operator in this case goes from an HTML_BEFORE_ATTRIBUTE_VALUE state with an attribute end delimiter of NONE to a state appropriate to the value type (e.g. JS for an onclick attribute) with an attribute end delimiter of SPACE_OR_TAG_END.

The precise details of both these operators were determined empirically to come up with the simplest semantics that handles cases found in real code that web developers do not consider to be badly written or confusing.

† — This could be fixed by migrating the problematic data hole and the code leading up to it into each branch, but this is tricky to do across template boundaries and has not proven to be necessary for the codebase we migrated.
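
A rough sketch of how the two operators might behave (simplified and illustrative; for readability the context here is a named tuple of a few of its fields rather than the packed integer form):

from collections import namedtuple

Ctx = namedtuple('Ctx', 'state delimiter uri_part')
UNKNOWN = 'UNKNOWN'   # the "do-not-know" value for a field

def context_join(*contexts):
  # Fields on which all branches agree survive the join; fields on which they
  # disagree become UNKNOWN until later text (a '#' in a URI, the attribute's
  # end delimiter, ...) moots the question.
  return Ctx(*(fields[0] if len(set(fields)) == 1 else UNKNOWN
               for fields in zip(*contexts)))

def epsilon_commit(context):
  # A data hole forces a commitment, e.g. treat HTML_BEFORE_ATTRIBUTE_VALUE as
  # the start of an unquoted attribute value delimited by space or tag end.
  # (The real operator picks a state appropriate to the attribute's value
  # type, e.g. a JS state for an onclick attribute.)
  if context.state == 'HTML_BEFORE_ATTRIBUTE_VALUE':
    return context._replace(state='HTML_UNQUOTED_ATTRIBUTE_VALUE',
                            delimiter='SPACE_OR_TAG_END')
  return context

# E.g. joining the two branches of the form example keeps the shared fields
# and marks the URI part as unknown:
#   context_join(Ctx('URI', 'DOUBLE_QUOTE', 'QUERY'),
#                Ctx('URI', 'DOUBLE_QUOTE', 'PATH'))
#     -> Ctx(state='URI', delimiter='DOUBLE_QUOTE', uri_part='UNKNOWN')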

Grammar

The context propagation algorithm uses a combined HTML/CSS and JS lexical grammar whose sub-grammars are outlined below.

HTML

Attributes

JS

CSS

URI

Context Propagation

The context propagation algorithm uniquely determines the context at every data hole so that a later pass may choose a sanitization function for each hole. The algorithm operates at two levels: one on the graph of templates and another within individual templates. The first level identifies the minimal set of templates that need to be processed, and may clone templates that are called in multiple different contexts.

The template context propagation algorithm uses an inference object, implemented as a set of nested maps plus a pointer to a parent inference object. This allows us to speculatively type a template sub-graph and, once we have a consistent view of types, collapse our conclusions into the parent by simply copying maps from child to parent. The maps include a map from data holes to their contexts and a map from templates to end contexts, which is used to type calls.
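
The pseudocode below assumes an Inferences object along these lines (a simplified sketch; only the pieces the pseudocode touches are shown):

class Inferences(object):
  """A speculative typing scope whose conclusions can be folded into its parent."""

  def __init__(self, parent=None):
    self.parent = parent
    self.template_end_contexts = {}   # template -> context at its end
    self.context_for_data_hole = {}   # data-hole parse tree node -> its context

  def getEndContext(self, template):
    # Look in this scope first, then fall back to enclosing scopes.
    if template in self.template_end_contexts:
      return self.template_end_contexts[template]
    return self.parent.getEndContext(template) if self.parent else None

  def commit_into_parent(self):
    # The speculative assumption held, so copy the child maps into the parent.
    self.parent.template_end_contexts.update(self.template_end_contexts)
    self.parent.context_for_data_hole.update(self.context_for_data_hole)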

def autosanitize(templates):
  inferences = Inferences()
  for template in templates:
    if inferences.getEndContext(template) is not None: continue # already done
    if template.is_public() or template.is_contextually_autosanitized():
      # By exploring the call graph from only public templates, ones
      # that can be invoked by front-end code, we do not trigger error checks for
      # parts of the code-base that don't yet use contextual
      # auto-sanitization, easing migration.
      compute_end_context(template, inferences, start_context=HTML_PCDATA)  # †
  return inferences

That algorithm delegates all the hard work to another algorithm below that examines the template graph reachable from one particular top-level template.

def compute_end_context(template, inferences, start_context):
  # We need to choose an end context before typing the body to avoid
  # infinite regression for recursive templates.

  # Start with the optimistic assumption that the template ends in the
  # same context in which it starts.
  # Empirically, less than 0.2% of templates in our sample violate
  # this assumption.
  # The ones that do tend to be some of the gnarliest code that
  # template authors would rather not refactor.
  optimistic_assumption_1 = Inferences(parent=inferences)
  optimistic_assumption_1.template_end_contexts[template] = start_context
  end_context = propagate_context(
      template.body, start_context, optimistic_assumption_1)
  if start_context == end_context:  # Our optimistic assumption was warranted.
    optimistic_assumption_1.commit_into_parent()
    return end_context

  # Otherwise, assume that the end_context above is the end_context
  # and check that we have reached a fixed point.
  optimistic_assumption_2 = Inferences(parent=inferences)
  optimistic_assumption_2.template_end_contexts[template] = end_context
  end_context_fixed_point = propagate_context(
      template.body, start_context, optimistic_assumption_2)
  if end_context_fixed_point == end_context:  # We have a fixed point.  Phew!
    optimistic_assumption_2.commit_into_parent()
    return end_context_fixed_point

  # We could try other strategies to generate optimistic assumptions, but
  # we have not seen a need in real template code.
  raise Error(...)

Thus far, we have done nothing that is particular to the syntax templating language itself. Different languages have different semantics around parameter passing, and provide different flow control constructs. The algorithm below is an example for one that deals with a simple template language that provides calls, conditions, chunks of static template text, and expression interpolations which fill data holes. On a call, it may recurse to the compute end context algorithm above, which is how we lazily explore the portion of the template call graph needed.

def propagate_context(parse_tree_nodes, context, inferences):
  for parse_tree_node in parse_tree_nodes:
    if is_safe_text_node(parse_tree_node):
      context = apply_html_grammar(parse_tree_node.safe_text, context)
    elif is_data_hole(parse_tree_node):
      context = epsilon_commit(context)  # see the ε-commit definition above
      inferences.context_for_data_hole[parse_tree_node] = context
      context = …  # compute context after hole.
    elif is_conditional(parse_tree_node):
      if_context = propagate_context(parse_tree_node.if_branch, context, inferences)
      else_context = propagate_context(parse_tree_node.else_branch, context, inferences)
      context = context_join(if_context, else_context)
    elif is_call_node(parse_tree_node):
      output_context = None
      # possible_callees comes up with the templates this might be calling,
      # and may clone templates if they are called in multiple different contexts.
      # Most template languages have static call graphs, so in practice, there is
      # exactly one possible callee.
      for possible_callee in possible_callees_of(parse_tree_node, context):
        end_context = inferences.getEndContext(possible_callee)
        if end_context is None:
          context_after_call = compute_end_context(possible_callee, inferences, context)
        else:
          context_after_call = end_context
        if output_context is None:
          output_context = context_after_call
        else:
          # Since 99% of templates end in their start context, in practice,
          # this join does little.
          output_context = context_join(output_context, context_after_call)
      context = output_context
  return context

† — We make the simplifying assumption that the start context for all public templates is HTML_PCDATA. Some templating languages may be used in different contexts, and so this assumption might not prove valid. We could choose the starting context for public templates based on some kind of annotation or naming convention particular to the templating language.

Sanitization Functions

We define a suite of sanitization functions. The table below describes them briefly along with the contexts in which they are used. There are significantly more of them than in most manual escaping schemes. As noted above, most developers who don't work on parsers for HTML/CSS/JS have a simplified mental model of the grammar, which makes it difficult to choose between this many options. We have many sanitization functions because we want to minimize template output size to minimize network latency; having more sanitization functions lets us avoid escaping common characters like spaces when it is safe to do so. The naming convention for sanitization functions reflects the escaper, filter, and normalizer definitions from the glossary. By convention, sanitization functions fall into broad groups: escapers transform an input language (usually plain text) into the output language, filters either return their input or an innocuous string, and normalizers transform a string in the input language into the same language in a form that is easier to embed in other languages.

escapeHTML: HTML-entity escapes plain text, and allows pre-sanitized HTML content through unchanged.
normalizeHTML: Like escapeHTML, but does not encode ampersands.
{escape,normalize}HTMLRcdata: Like escapeHTML, but does not exempt pre-sanitized content, since RCDATA (<title> and <textarea>) cannot contain tags.
{escape,normalize}HTMLAttribute: Like escapeHTML, but strips tags from pre-sanitized content.
filterHtmlElementName: Rejects any invalid element name or non-PCDATA element.
filterHtmlAttribName: Rejects any invalid attribute name or any attribute name that takes JS, CSS, or URI content.
{escape,normalize}URI: Percent-encodes (assuming UTF-8) URI, HTML, JS, and CSS special characters to allow safe embedding. This means encoding parentheses and single quotes, which should not be normalized according to RFC 3986 and is not valid for all non-hierarchical URI schemes, but the only productions using single quotes or parentheses are obsolete marker productions, and normalizing these characters is essential to safely embedding URIs in unquoted CSS url(…) and to making sure that CSS error recovery does not jump into the middle of a quoted string.
filterNormalizeUri: Like normalizeURI, but rejects any input that has a protocol other than http, https, or mailto.
{escape,normalize}JSStringChars: Uses \uABCD-style escapes for code units that are special in HTML, JS, or conditional compilation.
{escape,normalize}JSRegexChars: Like {escape,normalize}JSStringChars, but also escapes regular expression specials such as '$'.
{escape,normalize}JSValue: Emits booleans and numbers wrapped in spaces; otherwise quotes and escapes.
escapeCSSStringChars: Uses \ABCD-style escapes to escape HTML and CSS special characters.
filterCssIdentOrValue: Allows classes, ids, property-name parts for bidi, CSS keyword values, colors, and quantities.
noAutoescape: Passes its input through unchanged. This is an auditable exception to auto-sanitization.

Sanitized Content Types

Sanitized content types allow template users to pre-sanitize some content and to pass through approved structured content.

new SanitizedContent('<b>Hello, World!</b>') specifies a chunk of HTML that the creator asserts is safe to embed in HTML PCDATA.

It is possible for misuse of this feature to violate all the safety properties contextual auto-sanitization provides. We assert that allowing this makes it easier to migrate code that has no XSS safety net to a better place, and satisfies some compelling use cases including HTML translated into foreign languages by trusted translators, and HTML from tag whitelisters, wiki-text-to-html converters, rich text editors. But it needs to be used carefully. Developers should:

Caveats

As noted above (in the runtime contextual auto-sanitization strawman), static approaches, including ours, cannot handle all possible uses of dynamic attribute and element names. These seem rare in real code and relatively easy to fix, but if necessary, a hybrid runtime/static approach could address this problem.

Static approaches get into corner cases around zero-length untrusted values. For example, to preserve the code effect property, we need to make sure that no untrusted value specifies a javascript: or similar URL protocol. In template code like <img src="{$x}{$y}"> we might naively decide that it is sufficient to filter $x to make sure that it specifies no protocol or an approved one. But if $x is the empty string, then $y might still specify a dangerous protocol. Alternatively, $x might specify "javascript" and $y start with a colon. This hole can be closed in a number of ways, e.g. java%73cript:alert(1337) is not a dangerous URL. Similar problems arise with JavaScript regular expressions: in var myPattern = /{$x}/, an empty $x could turn the regular expression literal into a line comment, and there are similar special-case fixes (/(?:)/ is not a comment). But a general solution to empty strings would be a source of considerable complexity. Simply making sanitizer functions variadic (replacing {$x}{$y} with {filterNormalizeUri($x, $y)}) will not suffice because the two interpolations might cross template boundaries.

Our JavaScript parser is unsound. JavaScript does not have a regular lexical grammar (even ignoring conditional compilation) because of the way it decides whether a / starts a regular expression or a division operator. We use a scheme, based on a draft JavaScript 1.9 grammar devised by Waldemar Horwat, that makes that decision based on the last non-comment token. This works well for all the code we have seen that people actually write, and it makes our approach feasible, but there is a known case where it fails: x++ /a/i vs. x = ++/a/i. The second snippet, while nonsensical, is valid JavaScript that our scheme fails to handle correctly.
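
A sketch of that last-token heuristic (the token sets are abbreviated, and the handling of '}' is more subtle in the real grammar):

# Punctuation after which a '/' is a division operator; '++' and '--' are the
# known-ambiguous case mentioned above, for which we pick one interpretation.
_PUNCTUATION_BEFORE_DIVISION = {')', ']', '}', '++', '--'}
# Keywords after which a '/' starts a regular expression, e.g. "return /x/".
_KEYWORDS = {'return', 'typeof', 'instanceof', 'in', 'of', 'new', 'delete',
             'void', 'do', 'else', 'case', 'throw'}

def slash_is_division(last_token):
  # True if a '/' following last_token should be lexed as division rather
  # than as the start of a regular expression literal.
  if last_token is None or last_token in _KEYWORDS:
    return False
  if last_token in _PUNCTUATION_BEFORE_DIVISION:
    return True
  # Identifiers and numeric or string literals end an expression, so a '/'
  # after them divides; any other punctuation means a regex follows.
  return last_token[0].isalnum() or last_token[0] in '$_"\''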

Our parser does not currently recognize HTML5 escaping text spans, the regions inside <script> and <style> bodies delimited by <!-- and --> that suppress end-tag processing. This can be fixed if a codebase turns out to use them. Our sanitization function choices are designed not to produce content containing escaping text span boundaries.

Our parser does not descend into HTML, CSS, or JS in data: URLs. We could but have not encountered the need in existing code.

Case Study

We studied 1035 templates that were migrated from an existing codebase to use contextually sanitized templates. Most of the templates were relatively small, but they totaled 21098 LOC and 783kB. The compile-/load-time cost for these 1035 templates was 998339279 ns (just under 1 second) on a platform with 2 GB of RAM and an Intel 2.6 GHz dual-core processor running Linux 2.6.31.

LOC        # templates
  1- 18    ######################################## (685)
 19- 36    ############ (210)
 37- 55    #### (78)
 56- 73    # (33)
 74- 91    (10)
 92-110    (7)
111-128    (4)
129-147    (3)
148-220    (1) x 4
221-294    (0) x 4
295-312    (1)

Most of the sanitization functions chosen were plain text→HTML. Non-contextual auto-sanitization is correct 63% of the time, assuming the auto-sanitizer is sufficient in Html, HtmlAttribute, and HtmlRcdata contexts.

If values were aggressively filtered to prevent dangerous URLs from appearing in the template input, then non-contextual auto-sanitization would be sufficient in 77% of cases. The rates might be higher for a codebase written for non-contextual sanitization by developers aware of its limitations.

|escapeHtml                                  602
|escapeHtmlAttribute                         380
|filterNormalizeUri, |escapeHtmlAttribute    231
|escapeJsValue                                39
|filterCssValue                               33
|escapeJsString                               27
|escapeUri                                    15
|escapeHtmlRcdata                             10
|escapeHtmlAttributeNospace                    7
|filterHtmlIdent                               3
|filterNormalizeUri                            1

268 out of 1348 interpolation sites (19.9%) require runtime filtering, mostly filterNormalizeUri.

The benchmark runs over a large template with dummy data that is meant to be representative of the application using it. The benchmark times range from 15.2 to 16.8 ms with a standard deviation of roughly 0.6 ms, which puts the runtime cost of the sanitization functions in the noise.

No sanitization                   : 50% Scenario 16709334.99 ns; σ= 615548.54 ns @ 10 trials
Non-contextual auto-sanitization  : 50% Scenario 16835324.39 ns; σ=6030836.03 ns @ 10 trials
Full contextual auto-sanitization : 50% Scenario 15227861.39 ns; σ= 616193.00 ns @ 10 trials

In JavaScript, a state-machine-based runtime contextual auto-sanitization approach shows a 3-4× slowdown over string concatenation.

# rows   string +=   Array.join   open(Template(…))   DOM       render time
1000     54 ms       68 ms        204 ms              508 ms    586 ms
5000     267 ms      332 ms       1159 ms             2528 ms   1458 ms

We ran the same benchmark against a runtime contextual auto-sanitizer we wrote for JavaScript. The "noEscape" case simply appends all the strings to a buffer; it does no context inference. The "parseOnly" case appends to a buffer and does context inference, but no escaping. The "dynEscape" case does context propagation and chooses one of three escaping methods by looking at the context from the parser. The cost of applying the escaping directive is about the same as a string copy, and the cost of parsing and propagating context at runtime is about 6 times that cost. This benchmark is a good comparison for templates where the logic that computes values to fill data holes is simple, so the cost of executing the template should approach that of string concatenation.

For 1000 runs    noEscape             parseOnly             dynEscape
                 491316000 ns (1.0)   2979672000 ns (6.1)   3531971000 ns (7.2)
