HTTP requests and responses – Part 3

about a year ago by Miloslav Hůla  

In the first and second parts of this mini-series I described how to control the HTTP protocol from a presenter in a Nette application. In this part, I focus on the tools Nette offers for working with HTTP caches.

HTTP cache and Nette

If you have ever looked into the HTTP protocol, you have probably come across caching, that is, how to avoid transferring data that the client already has. You can watch it at work in the browser's “Developer tools”. Open the “Network” tab and load a website, for example GitHub. Then click refresh. The main page is likely to load with code 200, but many other files (styles, scripts, icons) will be grayed out with code 304. Status 304 means “Not Modified”: the browser took those files from its cache.

HTTP caching is a complex topic, so let's just peek under the lid. The HTTP headers Last-Modified, ETag, Pragma, Cache-Control, If-Modified-Since, and If-None-Match play a role here. These are the most important ones to start with, but there are more. Here is how a web client talks to a web server:

  • Client: Hi.
  • Server: Hi.
  • Client: I'm supposed to download the main.css file from you, If-Modified-Since 11.11.2018.
  • Server: 304, do not download anything, the file has not changed since.
  • Client: Thanks (and takes the file from its cache).

or:

  • Client: Hi.
  • Server: Hi.
  • Client: I'm supposed to download the selfie file from you, If-None-Match akjJ54sd.
  • Server: 200, yeah, it has a different hash, here it is, its ETag is now bfhd54se.
  • Client: Oh yeah, again? (and starts downloading the file and saves it in its cache with the new hash)
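The server's side of both conversations can be sketched in plain PHP. This is only an illustration of the decision between 200 and 304, not Nette's code: the function name conditionalStatus() and the lower-cased header array are assumptions made for the example, and ETags are compared as exact strings (weak tags are not handled).

```php
<?php
/**
 * Hypothetical helper (not part of Nette): decides between 200 and 304
 * from the validators sent by the client and the current state of the file.
 *
 * @param array<string, string> $requestHeaders request headers, keys in lower case
 */
function conditionalStatus(array $requestHeaders, ?int $lastModified, ?string $etag): int
{
	$ifNoneMatch = $requestHeaders['if-none-match'] ?? null;
	$ifModifiedSince = $requestHeaders['if-modified-since'] ?? null;

	// When the client sends If-None-Match, it decides alone and
	// If-Modified-Since is ignored.
	if ($ifNoneMatch !== null) {
		if ($etag === null) {
			return 200; // the server has no hash to compare with
		}
		$clientTags = array_map('trim', explode(',', $ifNoneMatch));
		return ($ifNoneMatch === '*' || in_array($etag, $clientTags, true)) ? 304 : 200;
	}

	if ($ifModifiedSince !== null && $lastModified !== null) {
		$since = strtotime($ifModifiedSince);
		// An unparseable date is ignored and the full response is sent.
		return ($since !== false && $lastModified <= $since) ? 304 : 200;
	}

	return 200; // no validators at all, e.g. the first visit
}
```

For the first conversation, the call would be conditionalStatus(['if-modified-since' => '...'], filemtime($file), null); for the second one, the timestamp is null and the current hash is passed as the third argument.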

So how can Nette help us with HTTP caching? With the Nette\Http\Context class. Its isModified() method decides, based on the HTTP request headers, whether the file has to be sent again. It also sets the necessary response headers and the response code.

Let's look at a somewhat simplified version of FileResponse that takes the cache into account.

final class FileCachedResponse implements Nette\Application\IResponse
{
	/** @var string path to the file being sent */
	private $file;

	public function __construct(string $file)
	{
		$this->file = $file;
	}

	public function send(Nette\Http\IRequest $request, Nette\Http\IResponse $response): void
	{
		$response->setContentType('...');
		$response->setHeader('Content-Description', '...');
		$response->setHeader('Content-Disposition', '...');
		// Header values are strings, so cast the size explicitly.
		$response->setHeader('Content-Length', (string) filesize($this->file));

		// Passing null removes the headers that PHP would otherwise send.
		$response->setHeader('Pragma', null);
		$response->setHeader('Cache-Control', null);

		$context = new Nette\Http\Context($request, $response);

		// Send the body only if the client's cached copy is out of date.
		$mTime = filemtime($this->file);
		if ($context->isModified($mTime)) {
			readfile($this->file);
		}
	}
}

The Pragma and Cache-Control headers are set by PHP. They have to be removed from the HTTP response, otherwise caching will not work.

We passed the file's modification timestamp as a parameter to the Nette\Http\Context::isModified() method. The method compares the timestamp with the If-Modified-Since header, if there is one, and returns true or false. The method has a second parameter, the hash of the content being sent. It is compared with the If-None-Match request header and sent to the client in the ETag header. A hash is useful if we do not know when the data being sent was last modified. It does not have to be a cryptographically secure hash: an MD5 sum is enough, and a simpler CRC32 or CRC64 would probably do too. If you are interested in validation based on the meaning of the content (for example XML, in which the order of the tags sometimes doesn't matter) rather than on its exact byte content, look for “weak ETag” validation.

Whether to use a timestamp, a hash or both is up to you. Both parameters of isModified() are optional and nullable.
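A minimal sketch of computing such a hash in plain PHP follows. The helper name contentHash() is made up for this example; hash() with the 'md5' and 'crc32b' algorithms is standard PHP. The resulting string is the kind of value that could be passed as the second argument of isModified().

```php
<?php
// Hypothetical helper, not part of Nette: a cheap, non-cryptographic
// hash of the content, good enough to detect that the content changed.
function contentHash(string $content, string $algo = 'crc32b'): string
{
	return hash($algo, $content);
}

$old = contentHash('body { color: red }');
$new = contentHash('body { color: blue }');

// The same content always gives the same hash, changed content a different one,
// so the server can compare $new with the hash the client remembered.
$changed = ($old !== $new);
```

For content that is already stored in a file, hash_file() computes the same value without loading the whole file into a string.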

Caching with the Last-Modified and ETag headers reduces the amount of data transferred between the HTTP client and the server, but it does not reduce the number of HTTP requests. The request and response headers are always transmitted; only the response body is saved. If you are sure that the data will not change for, let's say, the next 10 minutes, you can tell the client when it expires using Nette\Http\IResponse::setExpiration(). For example:

$response->setExpiration('10 minutes');

In this case, the client will not request the URL at all for the next 10 minutes. It will not even ask whether the content has changed and will always use its cache. The setExpiration() method sets the Cache-Control and Expires headers.
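To get an idea of what these two headers may look like, here is a plain-PHP sketch. It is not Nette's implementation: the function expirationHeaders() is hypothetical, and the use of the max-age directive is an assumption about the typical shape of such a response.

```php
<?php
/**
 * Hypothetical helper, not Nette's implementation: builds expiration
 * headers for a lifetime such as '10 minutes'.
 *
 * @return array<string, string>
 */
function expirationHeaders(string $lifetime, ?int $now = null): array
{
	$now = $now ?? time();
	$expires = strtotime('+' . $lifetime, $now);
	if ($expires === false || $expires <= $now) {
		throw new InvalidArgumentException("Invalid lifetime '$lifetime'.");
	}

	return [
		// Relative lifetime in seconds.
		'Cache-Control' => 'max-age=' . ($expires - $now),
		// Absolute expiration time in the HTTP date format, always in GMT.
		'Expires' => gmdate('D, d M Y H:i:s', $expires) . ' GMT',
	];
}

$headers = expirationHeaders('10 minutes');
// $headers['Cache-Control'] is 'max-age=600'
```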

Keep in mind the biggest problem of any cache: its invalidation. Personally, I prefer content that arrives a little later but is correct over content that arrives immediately but is no longer valid.

And that's all. With this, I'm closing the three-part mini-series about HTTP in Nette :o)

Addition

Adam Zemek reminded me on Twitter where the Pragma, Cache-Control and Expires headers that PHP sends automatically come from.

PHP starts sending these headers automatically as soon as you start a session. See the session_cache_limiter() function in the PHP manual for instructions on how to change them or turn them off.
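A minimal plain-PHP sketch of turning the automatic headers off is below. It assumes you start the session yourself; in a Nette application the session is usually started by the framework, so take it only as an illustration of the PHP function.

```php
<?php
// Must be called before session_start(), otherwise it has no effect
// on the current request.
session_cache_limiter(''); // empty string: PHP sends no cache headers at all
session_start();
```

Other values accepted by session_cache_limiter() are 'nocache', 'private', 'private_no_expire' and 'public'.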