Core caching and drupal_static to make cached drupal_http_requests

I was recently presented with a couple of performance problems while developing a rather sophisticated network of Drupal sites. For more context on what this system is for, please read Learning Tools Interoperability in Drupal! (and why you should care)

The short of it is:

  • There is a Drupal site that relies on accurate information from a central source
  • That central source needs to be (potentially) accessed in real time to ensure accuracy
  • For architectural reasons these need to be separate Drupal systems
  • Drupal has lots of built in functions that can help!

I learned how to use cache_set, cache_get and drupal_static from the expertly written, step-by-step examples in this great article by Lullabot about caching in Drupal 7:

A Beginner's Guide to Caching Data in Drupal 7

So, I basically have a menu callback that asks the other system for the required content. The page looks like this:

course screen shot

All areas outlined in red are presented contextually based on who the current student is, meaning they are actually loaded from the remote system via drupal_http_request() calls. Three pieces of remote data (in addition to the page load itself) make up this example page, and they take four requests because the instructor contact info is fetched twice:

  • Seeing what outline they should view based on section
  • Instructor contact info based on section (in block and on page)
  • Course Questions / reporting technical problems are generated by a remote call to central resources

The problem though is that in 98% of scenarios, this information won't change much for the individual once it's been generated.  I have to have authenticated users because of the nature of education but for everyone in the same section, this data will largely remain the same.  So, how best to get the most up to date information without damaging performance substantially?

No caching


This is a screenshot of a page's Devel output before any optimization. While 202 queries (a lot, I know) executed in 630 ms, it actually took 3,401 ms to deliver the page. That's because the page makes 4 HTTP requests to the other system and renders the information returned. Devel's query timer doesn't count those requests; they only show up in the overall page execution time. In this state the page is effectively unusable: this is one user, and a page is taking over 3 seconds to load!

Tracing things down, I was able to pin the problem on this offending function, _cis_connector_request(). Here it is without the enhancements in place:

function _cis_connector_request($url, $options = array(), $bucket = 'cis') {
  // default return value if no request could be made
  $data = FALSE;
  // allow for direct http requests
  if ($bucket == 'none') {
    $data = drupal_http_request($url, $options);
  }
  // look for settings for this bucket
  elseif ($settings = _cis_connector_build_registry($bucket)) {
    $address = _cis_connector_format_address($settings);
    // url passed in has everything for the request minus http location
    $data = drupal_http_request($address . '/' . $url, $options);
  }
  else {
    drupal_set_message(t("Educational service registry call missing, connection unavailable. Please see README.txt for details on setup."), 'error');
  }
  return $data;
}

Notice that this executes every request it's given, all 4 of them, even though two are identical. The first step is to apply drupal_static and reduce the requests to 3.

Using drupal_static effectively

Results with drupal_static function in use

The first line of defense is not making the same costly request more than once. This is where drupal_static helps: it drastically reduces load time by making sure a costly operation happens only once per page load. The call that grabs the instructor contact info is redundant (it's needed for both the block and the page), so we really only need to make it once while building a page. This only holds for the current page load, though; the value isn't kept beyond the current run of the page, so every refresh repeats the requests and yields similar timings.

You'll notice in the screenshot that execution time and query count both dropped. This is thanks to drupal_static preventing needless calls upstream. I applied this technique to other parts of the code base as well, but below we can see how it was used in the request function from earlier.

function _cis_connector_request($url, $options = array(), $bucket = 'cis') {
  // build a unique call signature from every argument, defaults included;
  // serialize() keeps nested option arrays (such as headers) and their keys
  // intact, where func_get_args() + implode() hit undefined offsets
  $call = __FUNCTION__ . ':' . serialize(array($url, $options, $bucket));
  // statically cache future calls for the rest of this page load
  $data = &drupal_static($call);
  if (!isset($data)) {
    // allow for direct http requests
    if ($bucket == 'none') {
      $data = drupal_http_request($url, $options);
    }
    // look for settings for this bucket
    elseif ($settings = _cis_connector_build_registry($bucket)) {
      $address = _cis_connector_format_address($settings);
      // url passed in has everything for the request minus http location
      $data = drupal_http_request($address . '/' . $url, $options);
    }
    else {
      drupal_set_message(t("Educational service registry call missing, connection unavailable. Please see README.txt for details on setup."), 'error');
    }
  }
  return $data;
}

The most important part above is the call to &drupal_static(). It receives a call signature built from the function name and its arguments, and it returns a reference to a static variable stored under that signature, so whatever is assigned to $data stays there for the rest of the page load. The next call with identical arguments finds $data already set and skips the HTTP request entirely. The same kind of per-request static caching shows up in common core functions like node_load() and in other menu areas that (potentially) load the same data many times.
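To see the by-reference mechanics in isolation, here is a minimal plain-PHP sketch that runs without Drupal. sketch_static() is a hypothetical, stripped-down stand-in for drupal_static() (the real function also takes a default value and a reset flag), and sketch_remote_fetch() is a hypothetical placeholder for drupal_http_request() that only counts how often it runs.

```php
<?php
// Hypothetical stand-in for drupal_static(): one static array per PHP
// process, handed out by reference so callers can write straight into it.
function &sketch_static($name) {
  static $data = array();
  if (!array_key_exists($name, $data)) {
    $data[$name] = NULL;
  }
  return $data[$name];
}

// Counts how many times the "expensive" remote call actually runs.
$GLOBALS['remote_calls'] = 0;

// Hypothetical placeholder for drupal_http_request().
function sketch_remote_fetch($url) {
  $GLOBALS['remote_calls']++;
  return 'response for ' . $url;
}

function sketch_request($url, $options = array()) {
  // Unique call signature, same idea as in _cis_connector_request().
  $call = __FUNCTION__ . ':' . serialize(array($url, $options));
  // Bind $data to the static slot; assigning to $data fills that slot.
  $data = &sketch_static($call);
  if (!isset($data)) {
    $data = sketch_remote_fetch($url);
  }
  return $data;
}

sketch_request('section/42/instructor'); // performs the fetch (block)
sketch_request('section/42/instructor'); // served from the static slot (page)
sketch_request('section/42/outline');    // different signature, new fetch
echo $GLOBALS['remote_calls'], "\n";     // prints 2
```

The two identical instructor lookups cost a single fetch, which mirrors the 4-to-3 reduction described above. Because a static variable lives only as long as the PHP process handling the request, the next page load starts with an empty array again.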

Page execution time was still around 1.5 seconds, which is much higher than we'd like for this basic page! So, now what?

Full Database Object Caching

The best way to optimize calls out for complex data is to never have to make them at all! Drupal's built-in cache system lets you set data and retrieve it in a fashion similar to drupal_static. It goes beyond drupal_static, though, because it writes the information to the database, so the data survives past the current page load and is shared across users instead of living for a single request. It also works in conjunction with drupal_static, so you'll often want to use both if you are going for database caching in the first place.
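The fully optimized version of _cis_connector_request() stores its entries in a custom bin, cache_cis_connector, which has to exist as a table before cache_set() can write to it. A common Drupal 7 pattern, sketched here under the assumption that cis_connector doesn't already define the bin elsewhere, is to copy core's {cache} table schema in hook_schema() and register the bin in hook_flush_caches() so that clearing all caches empties it too.

```php
<?php
/**
 * Implements hook_schema().
 *
 * Assumed to live in cis_connector.install: defines the cache_cis_connector
 * bin by copying the schema of core's {cache} table.
 */
function cis_connector_schema() {
  $schema['cache_cis_connector'] = drupal_get_schema_unprocessed('system', 'cache');
  $schema['cache_cis_connector']['description'] = 'Cache bin for remote responses fetched by _cis_connector_request().';
  return $schema;
}

/**
 * Implements hook_flush_caches().
 *
 * Assumed to live in cis_connector.module: tells drupal_flush_all_caches()
 * ("Clear all caches") to empty this bin as well.
 */
function cis_connector_flush_caches() {
  return array('cache_cis_connector');
}
```

hook_schema() tables are only created when the module is installed, so a site that already has cis_connector enabled would also need an update hook (for example one calling db_create_table()) to add the table.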

Here's what the example function now looks like fully optimized:

/**
 * Wrapper for drupal_http_request to enable cached requests
 */
function _cis_connector_request($url, $options = array(), $bucket = 'cis', $cached = TRUE) {
  // build a unique call signature from every request argument, defaults
  // included; serialize() keeps nested option arrays and their keys intact
  $call = __FUNCTION__ . ':' . serialize(array($url, $options, $bucket));
  // statically cache future calls for the rest of this page load
  $data = &drupal_static($call);
  if (!isset($data)) {
    // convert the signature to a fixed-length, db friendly cache id
    $salt = drupal_get_hash_salt();
    $cid = hash('sha512', $salt . $call);
    if ($cached && ($cache = cache_get($cid, 'cache_cis_connector'))) {
      $data = $cache->data;
    }
    else {
      // allow for direct http requests
      if ($bucket == 'none') {
        $data = drupal_http_request($url, $options);
      }
      // look for settings for this bucket
      elseif ($settings = _cis_connector_build_registry($bucket)) {
        $address = _cis_connector_format_address($settings);
        // url passed in has everything for the request minus http location
        $data = drupal_http_request($address . '/' . $url, $options);
      }
      else {
        drupal_set_message(t("Educational service registry call missing, connection unavailable. Please see README.txt for details on setup."), 'error');
      }
      // only persist successful responses; with no expire argument
      // cache_set() stores entries as CACHE_PERMANENT, so a failed request
      // or a missing registry entry would otherwise stick until a flush
      if (isset($data->code) && $data->code == 200) {
        cache_set($cid, $data, 'cache_cis_connector');
      }
    }
  }
  return $data;
}

Breaking down what's happening above

  • Check whether this function has already been called with exactly the same arguments during this page load (if so, return the copy from drupal_static)
  • Ensure that the function is being asked for a cached copy (default is TRUE)
  • Try to load a $cache object from the cache_cis_connector bin, using a salted SHA-512 hash of the call signature as the cache ID (hashing only produces a fixed-length key; nothing is encrypted)
  • If something is returned, set $data to what was in the cache

If the cached object isn't found then it finally does the costly operation.  This means that after this information is added to the cache table, the requests no longer have to be made at all!  It also means that redundant calls to the cache table can be avoided as well thanks to drupal_static.
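The payoff of combining the two tiers shows up across page loads. This is a plain-PHP model, not Drupal code: the TwoTierCache class and all of its members are hypothetical, with one array playing the role of drupal_static() and another playing the cache_cis_connector table. It simulates two page loads that each ask for the instructor info twice.

```php
<?php
// Plain-PHP model of the two cache tiers; every name here is hypothetical.
final class TwoTierCache {
  private array $staticTier = array();     // plays drupal_static()
  private array $persistentTier = array(); // plays the cache table
  public int $remoteCalls = 0;
  public int $persistentReads = 0;

  // Start of a new page load: static data is gone, the table survives.
  public function newPageLoad(): void {
    $this->staticTier = array();
  }

  public function get(string $cid): string {
    // Tier 1: already fetched during this page load.
    if (isset($this->staticTier[$cid])) {
      return $this->staticTier[$cid];
    }
    // Tier 2: ask the persistent store (counted whether it hits or misses).
    $this->persistentReads++;
    if (isset($this->persistentTier[$cid])) {
      return $this->staticTier[$cid] = $this->persistentTier[$cid];
    }
    // Miss everywhere: do the costly remote request and store it in both tiers.
    $data = $this->fetch($cid);
    $this->persistentTier[$cid] = $data;
    return $this->staticTier[$cid] = $data;
  }

  // Stand-in for drupal_http_request().
  private function fetch(string $cid): string {
    $this->remoteCalls++;
    return "remote data for $cid";
  }
}

$cache = new TwoTierCache();
// First page load: instructor info requested for the block and the page.
$cache->get('instructor');
$cache->get('instructor');
// Second page load, e.g. another student in the same section.
$cache->newPageLoad();
$cache->get('instructor');
$cache->get('instructor');
printf("remote calls: %d, persistent reads: %d\n", $cache->remoteCalls, $cache->persistentReads);
// prints "remote calls: 1, persistent reads: 2"
```

Four lookups cost one remote request and two trips to the persistent store; the repeat lookup within each page load never leaves the static tier.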

Drum roll please

Full caching implemented

About half a second! That's a savings of about 3 seconds per load of this page from improving a single function. To be fair, you should always go hunting for performance optimizations, but this one proved especially profitable! I also started applying drupal_static to other parts of the cis_connector code base, which may have helped reduce page processing time as well.

Bonus: More drupal_static!

I started to dig through the devel log a bit more and found two other projects that could benefit from drupal_static calls (or no call to costly operations at all).  Those two modules saw newer releases as a result of this sniffing: Node Reference Highlight and Regions module. Check out page load now after cleaning up those two modules!

More Drupal static tuning
This page started out needing 202 queries to build; it now needs 81 and builds roughly 10x faster than before (about 3,400 ms down to 335 ms). That was also without core page compression and CSS / JS aggregation turned on. After enabling those, I was able to squeeze the same page down to: "Executed 81 queries in 15.02 ms. Queries exceeding 5 ms are highlighted. Page execution time was 304.92 ms." That's roughly 11x faster than the original 3,401 ms.

Drupal 8 / future

This should start to be less of an issue in D8, since Guzzle is now part of core (see the issue "Adopt Guzzle library to replace drupal_http_request()"). Guzzle has plugins for caching requests, but I don't believe it caches out of the box. Something like the function presented above may or may not be needed as a wrapper around future Guzzle requests to allow cached responses stored locally in Drupal (time will tell).