This is a guest post by Blake Loring, a PhD student at Royal Holloway, University of London. Blake worked at Cloudflare as an intern in the summer of 2017.
Compression is often considered an essential tool when reducing the bandwidth usage of internet services. The impact that the use of such compression schemes can have on security, however, has often been overlooked. The recently detailed CRIME, BREACH, TIME and HEIST attacks on TLS have shown that if an attacker can make requests on behalf of a user then secret information can be extracted from encrypted messages using only the length of the response. Deciding whether an element of a web-page should be secret often depends on the content of the page, however there are some common elements of web-pages which should always remain secret such as Cross-Site Request Forgery (CSRF) tokens. Such tokens are used to ensure that malicious webpages cannot forge requests from a user by enforcing that any request must contain a secret token included in a previous response.
I worked at Cloudflare last summer to investigate possible solutions to this problem. The result is a project called cf-nocompress. The aim of this project was to develop a tool which automatically mitigates instances of the attack, in particular CSRF extraction, on Cloudflare hosted services transparently without significantly impacting the effectiveness of compression. We have published a proof-of-concept implementation on GitHub, and provide a challenge site and tool which demonstrates the attack in action).
The proof-of-concept is implemented as a plugin for NGINX and requires a small patch to the gzip module. The plugin uses sregex to identify secrets within a page. The modified gzip functions as normal, however when a secret is processed compression is disabled. This ensures secrets do not get added to the compression dictionary, removing any on response size.
Additional security considerations
The regular expression matching engine we use in this proof-of-concept is not guaranteed to run in constant time. As such, matching a string against some regular expressions could introduce a timing based side-channel attack. This issue is compounded by the complexity of modern regular expressions as matching time can often be non-intuitive. Whilst in many cases the risk such an attack would pose is minimal, a limited matcher with constant runtime and restrictions on unbounded loops should be developed if our mitigation is adopted.
The Challenge Site
We have set up the challenge website compression.website with protection, and a clone of the site compression.website/unsafe without it. The page is a simple form with a per-client CSRF designed to emulate common CSRF protection. Using the example attack presented with the library we have shown that we are able to extract the CSRF from the size of request responses in the unprotected variant but we have not been able to extract it on the protected site. We welcome attempts to extract the CSRF without access to the unencrypted response.