If you're reading this blog post, you probably already know that caching Drupal pages with Varnish is pretty easy in Drupal 7, so long as the pages are anonymous. As soon as you're logged in, however, the game changes. In fact, as soon as you obtain a session with PHP, the game changes and you have to rely on block-level caching plus Views and Panels caching instead. For some sites that will be acceptable. But when you start to scale the number of users hitting your site, PHP just can't keep up and you'll either start to run out of connections or max out your memory.

Over the last 24 months I've been working on making a Drupal newspaper site go fast for its paid subscribers. That's right, they're authenticated. We managed to get Varnish serving cached Drupal pages to authenticated users, so I'm going to do my darnedest to explain how we did it.

Before you read on

You should already know how to set up Varnish with Drupal; this blog post does not cover that. It also assumes you have a basic understanding of Varnish's request workflow.

Understanding how authenticated page caching works conceptually

Before we get into the technical part, there are a few things we need to consider about authenticated caching.

  • Caching is not authenticated
    Authentication is held in the form of a PHP session between the application and the client, not in the Varnish cache. Since the whole point of serving a page from cache is to avoid hitting the backend, on a cache hit you can't say with 100% certainty that the request is authenticated. For the most part this is not an issue, as there are ways to cache against an authenticated user. But in scenarios where the application can terminate a session without the client's consent, the client will not know they are logged out until they trigger a cache miss. Likewise, the cache will continue to serve authenticated pages to the client until they trigger a cache miss.
  • User specific HTML is not cacheable
    At least from a page cache perspective this is true. Any information that is specific to the user, rather than to the user's role in Drupal, needs to be loaded a different way: with JavaScript, AJAX or Varnish ESIs. The authenticated caching I'm writing about lets you serve cached pages based on role groupings, so it assumes there is no user-specific HTML or that it is handled some other way.
     
  • Authenticated caching utilizes the HTTP Vary header
    Varnish listens to the Vary header in an application's response. If the client sends a header like "Accept-Encoding: gzip" and the application replies with a header like "Vary: Accept-Encoding", then Varnish can serve that gzipped cached page to all client requests with the same Accept-Encoding header. But when the header changes to something like "Accept-Encoding: deflate", Varnish will miss the cache and fetch a new response from the application, Drupal. For authenticated caching to work, all we have to do is add an HTTP header to the client request in Varnish that Drupal can vary on in its response. This tells Varnish to store and serve a different version of the same page for each value of this custom header.

    So in other words, if we add a header like "X-Drupal-Roles: 2,3,4", then Drupal's response could send "Vary: X-Drupal-Roles" and when the roles of the user changed, so would the page cache. It would vary.
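
To make the Vary idea concrete, here's a toy sketch in PHP. It is not how Varnish is implemented; toy_cache_key() and the sample role values are made up for illustration. It only shows that two requests for the same URL land on the same cached copy when the header Drupal varies on carries the same value, and on different copies when it doesn't.

```php
<?php
// Toy model only: builds a lookup key from the URL plus the value of every
// request header named in the response's Vary header.
// toy_cache_key() is a hypothetical helper, not a Varnish or Drupal function.
function toy_cache_key($url, array $request_headers, array $vary_on) {
  $parts = array($url);
  foreach ($vary_on as $name) {
    // A missing header counts as an empty value.
    $value = isset($request_headers[$name]) ? $request_headers[$name] : '';
    $parts[] = $name . '=' . $value;
  }
  return implode('|', $parts);
}

// Drupal's response said "Vary: X-Drupal-Roles".
$vary_on = array('X-Drupal-Roles');

// Two different users with the same roles, and one user with other roles.
$subscriber_a = toy_cache_key('/news', array('X-Drupal-Roles' => '2,3'), $vary_on);
$subscriber_b = toy_cache_key('/news', array('X-Drupal-Roles' => '2,3'), $vary_on);
$editor = toy_cache_key('/news', array('X-Drupal-Roles' => '2,4'), $vary_on);
```

Here $subscriber_a and $subscriber_b are identical, so both users would be served the same cached page, while $editor differs and would get its own copy.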

How do we add a header to a client's HTTP request?

We have to add a middle step to Varnish's cache lookup process. This requires manipulating Varnish's VCL a bit.

Let's assume the header we'll add is called X-Drupal-Roles. The first thing we need is a value to give X-Drupal-Roles. This value must come from the Drupal backend, as it's the one that deals with authentication, not Varnish. The value is also agnostic to the request URL but specific to the role groups the user belongs to. This is important to note: regardless of the URL, the header value will always be the same for a given user, which means it's cacheable too.

Set up a menu callback in Drupal that simply returns the user's role IDs in the X-Drupal-Roles header. Let's assume the path is "x-drupal-roles".

<?php
    function mymodule_authcache_header() {
        global $user;
        // Cache response for 3 minutes to reduce request load on authcache request
        // which happens on every request.
        header('Cache-Control:public, max-age=180');
        // Change the cache of this response on any criteria that may determine this user
        // is coming from a different device.
        header('Vary: Accept-Encoding,Cookie,User-Agent');
        // Set custom X-Drupal-Roles header.
        header('X-Drupal-Roles: ' . implode(',', array_keys($user->roles)));
        // No need to render any content. End the request the Drupal way so
        // exit hooks run and the session is committed.
        drupal_exit();
    }
?>
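
The callback above still needs to be registered at the "x-drupal-roles" path. A minimal sketch of that registration, assuming the module is called mymodule as in the snippet above, could look like the following. The define() fallback is only there so the sketch can be loaded outside Drupal; inside Drupal 7 the constant already exists.

```php
<?php
// MENU_CALLBACK is provided by Drupal 7. This fallback is an assumption
// added purely so the sketch parses and runs outside of Drupal.
if (!defined('MENU_CALLBACK')) {
  define('MENU_CALLBACK', 0x0000);
}

/**
 * Implements hook_menu().
 */
function mymodule_menu() {
  $items = array();
  $items['x-drupal-roles'] = array(
    // The function that sends the X-Drupal-Roles header.
    'page callback' => 'mymodule_authcache_header',
    // Varnish asks this path on behalf of every visitor, anonymous ones
    // included, so access is always granted.
    'access callback' => TRUE,
    // Register the path without creating a visible menu link.
    'type' => MENU_CALLBACK,
  );
  return $items;
}
```

Remember to clear the menu cache after adding the hook so Drupal picks up the new path.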

Now every time a request comes through Varnish that may be for an authenticated page, Varnish will first ask Drupal for the X-Drupal-Roles header via the "x-drupal-roles" path. Drupal will reply with a cacheable response that carries the X-Drupal-Roles header.

Next we need to configure the VCL in varnish so that the initial request to the backend is for "x-drupal-roles". We'll then add the X-Drupal-Roles header from the backend response to the original request in varnish and restart the VCL execution process. On the second run through, the request will go back to Drupal and Drupal will respond as per normal with a "Vary: X-Drupal-Roles" header that we've yet to add to the code base. But first the VCL:

sub vcl_recv {
  // We'll always restart once. Therefore, when restarts == 0 we can ensure
  // that the HTTP headers haven't been tampered with by the client.
  if (req.restarts == 0) {
    unset req.http.X-Drupal-Roles;

    // We're going to change the URL to x-drupal-roles so we'll need to save
    // the original one first.
    set req.http.X-Original-URL = req.url;
    set req.url = "/x-drupal-roles";
  }
}

sub vcl_deliver {
  // If the response contains the X-Drupal-Roles header and the request URL
  // is right, copy the X-Drupal-Roles header over to the request and restart.
  if (req.url == "/x-drupal-roles" && resp.http.X-Drupal-Roles) {
    set req.http.X-Drupal-Roles = resp.http.X-Drupal-Roles;
    set req.url = req.http.X-Original-URL;
    unset req.http.X-Original-URL;
    return (restart);
  }
}

Making Drupal tell Varnish to cache authenticated pages

Now that the request contains a header we can tell Varnish to vary on, we can continue to control caching from the application. When drupal_page_is_cacheable() evaluates to true, Drupal will naturally send the right headers in its response for Varnish to cache anonymous pages. But when a session is present, this function is set to return false. To get around this we need to implement a couple of hooks. The first re-initialises the page as cacheable in hook_init(). The second sends the right headers in hook_page_build().

<?php
/**
 * Implements hook_init().
 */
function mymodule_init() {
  drupal_page_is_cacheable(empty($_POST));
}

/**
 * Implements hook_page_build().
 */
function mymodule_page_build() {
  if (drupal_page_is_cacheable()) {
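    $ttl = variable_get('page_cache_maximum_age', '86400');

    $expires = strtotime('+' . $ttl . ' seconds');
    // Override core's expiration header. HTTP dates must be expressed in GMT,
    // so format with gmdate() rather than the local-time date('r').
    drupal_add_http_header('Expires', gmdate('D, d M Y H:i:s', $expires) . ' GMT');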
    drupal_add_http_header('Cache-Control', 'public, max-age=' . $ttl);
  }
  // Regardless, always vary on X-Drupal-Roles.
  drupal_add_http_header('Vary', 'X-Drupal-Roles');
}
?>

Gotchas

That's pretty much the gist of how authenticated caching works. There are, however, some gotchas that you'll need to be mindful of:

  • Cache hits instead of pages with Drupal messages
    Drupal's messaging system (drupal_set_message) is somewhat of a hack that only works reliably without authenticated caching. Why? Because Varnish caches GET requests. While Drupal won't tell Varnish to cache a page with messages on it, Varnish may serve a cached page in place of the page on which you'd normally expect to see a message after a GET request. For example, saving a node sends a POST request with all the node form data, which returns a 302 redirect to the node's page. The client then makes a GET request, which may end up being served from cache. In that case you wouldn't receive a notification about the submission of the form data.

    To get around this you'll need the Expire and Purge modules installed, as well as a hook_form_alter implementation that looks for destination parameters on form submissions and purges those paths in Varnish before the user is redirected to them.
     

  • Pages with form tokens should not be cached
    Form tokens are specific to the user and their session, so they cannot be shared by a role group. A hook_form_alter implementation can check for the use of form tokens and set drupal_page_is_cacheable() to false to prevent caching.

    Alternatively, you can load the value of the hidden form token input via an ESI. That's what we do.
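
For the first option in the form token gotcha above, a rough sketch of such a hook_form_alter implementation might look like this. It assumes the token shows up as the form_token element that Drupal 7 adds to forms for logged-in users, and the function_exists() stub is only a stand-in so the snippet runs outside Drupal. Treat it as a starting point rather than the exact code we run.

```php
<?php
// Stand-in for Drupal 7's drupal_page_is_cacheable(), included only so this
// sketch can run outside Drupal. Inside Drupal the real function is used.
if (!function_exists('drupal_page_is_cacheable')) {
  function drupal_page_is_cacheable($allow_caching = NULL) {
    static $cacheable = TRUE;
    if (isset($allow_caching)) {
      $cacheable = $allow_caching;
    }
    return $cacheable;
  }
}

/**
 * Implements hook_form_alter().
 */
function mymodule_form_alter(&$form, &$form_state, $form_id) {
  // A form_token element means the rendered page contains a value tied to
  // this user's session, so the page must not be shared with a role group.
  if (isset($form['form_token'])) {
    drupal_page_is_cacheable(FALSE);
  }
}
```

Because forms are built before hook_page_build() runs, the mymodule_page_build() check shown earlier would then skip the public Cache-Control and Expires headers for that page.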