[Trac] [RFC] HTTP Caching in Trac 0.9

Christopher Lenz cmlenz at gmx.de
Mon Dec 20 11:08:43 EST 2004


HTTP Caching in Trac 0.9
========================

Preliminary support for the HTTP protocol-level provisions for
caching has been added to the current development version of Trac in
trunk. The changeset that first introduced this support was [1108];
subsequent changesets improved and fixed the implementation.

I'm sending this mail to solicit feedback on the approach, and outline
some concerns that I would like to address before this functionality
goes into a release (which would be 0.9).

Note: what is described here is not related in any way to server-side
caching, but only to supporting the caching performed by user agents or
proxy servers.


Basic Concept
-------------

The HTTP feature used to support caching is known as "Conditional GET"
[cget]. For example, on a request to the timeline, the current code
will generate an "ETag" header that includes the date/time of the last
event in the timeline (note that we could also use the "Last-Modified"
header for this information, but that doesn't give us enough
flexibility as I'll explain shortly). The browser then keeps a local
copy of the timeline page and remembers the ETag.

On subsequent requests to the same resource, the browser will send this
ETag as part of the "If-None-Match" request header. The timeline module
will check the value of the ETag against the date/time of the last
event in the timeline, and if both match, will send a short "304 Not
Modified" response instead of the full content. This should help make
Trac more responsive and bandwidth-friendly. While it may not completely
avoid processing the request on the server, it is often possible to stop
the processing earlier than for unconditional requests.
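
To make the exchange concrete, here is a minimal sketch of the decision
the timeline would make. It is not the code from trunk: the function
name "timeline_response", the placeholder body and the time-only ETag
format are assumptions made for illustration.

```python
def timeline_response(last_event_time, if_none_match=None):
    """Return (status, headers, body) for a request to the timeline."""
    # The validator is derived only from the time of the newest event.
    etag = '"%d"' % last_event_time
    if if_none_match == etag:
        # The client's copy is still current, so rendering is skipped.
        return 304, {"ETag": etag}, ""
    # No validator was sent, or it is stale: render and attach the ETag.
    body = "<html>timeline as of %d</html>" % last_event_time
    return 200, {"ETag": etag}, body


# First visit: nothing is cached yet, so the full page is returned.
status1, headers1, body1 = timeline_response(1103538813)

# Revisit with the remembered ETag and no new events: short 304 response.
status2, _, body2 = timeline_response(
    1103538813, if_none_match=headers1["ETag"])

# Revisit after a new event was added: the remembered ETag is stale.
status3, _, _ = timeline_response(
    1103539000, if_none_match=headers1["ETag"])
```

Note that the server still has to look up the time of the last event
before it can answer, which is why the request is shortened rather than
avoided.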


Supporting Personalization
--------------------------

However, taking only the last-modified time into account isn't enough:
to support "personalization" (i.e. pages that are rendered differently
depending on authentication) we also need to differentiate by
login state and identity. For this reason, the current implementation
adds the remote user name to the ETag.

So if you look at the timeline as an anonymous user, and then log in and
view the timeline again, the ETag won't match and Trac will respond
with the complete content of the timeline.

Without that behaviour you wouldn't even see the "Login" / "Logged in
as xxx" navigation elements above the navigation bar reflect the login
state. But it's also important because different users may see a
different set of events in the timeline due to different permissions.
Adding the user name to the ETag enables this.

Note: "Vary: Authorization, Cookie" should also do this in theory AFAIK,
but it seems that in practice no browser implements the "Vary" header
completely.
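
The effect of the user name can be shown with a small sketch. The
helper "timeline_etag" and the name "anonymous" for users who are not
logged in are assumptions for illustration, not the actual code.

```python
def timeline_etag(last_event_time, user=None):
    # Hypothetical helper: with user=None the tag depends on time only.
    if user is None:
        return '"%d"' % last_event_time
    return '"%s/%d"' % (user, last_event_time)


last_event = 1103538813

# Time-only tags: the page cached before logging in still validates
# after logging in, so the browser would keep the anonymous version.
cached = timeline_etag(last_event)
after_login = timeline_etag(last_event)
stale_page_reused = (cached == after_login)

# Tags that include the user: logging in changes the tag, so the
# server sends the complete page again.
cached = timeline_etag(last_event, "anonymous")
after_login = timeline_etag(last_event, "cmlenz")
full_page_sent = (cached != after_login)
```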


Supporting Server-Side State
----------------------------

Trac 0.8 added server-side state (aka sessions), which of course makes
caching more complicated. With sessions, the representation of a
resource may vary not only based on the authenticated user, but also
based on the session variables (aka settings) of the user.

Currently, the only place this makes a difference is when viewing
diffs. We store the user's preferences for diff layout (such as inline
vs. side-by-side, number of context lines, etc.) in the session. Now, if
those settings aren't taken into account when generating the ETag for
e.g. a changeset, the user will need to force-reload the changeset page
to see the new diff options take effect.

To fix this, I have added code to the changeset module that will add a
short representation of the diff options to the ETag. Now the ETag
header for changesets would look something like this:

   ETag: W"cmlenz/1103538813/inline-U2"

The first part is the user name, the second is the changeset time (in
seconds since 1970, as usual), and the third segment represents the
diff options.
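
As a rough sketch of how the three segments fit together (only the
quoted value is built here, not the full header): the helper names are
hypothetical, and reading "inline-U2" as "inline style, 2 context
lines" is my assumption about the encoding.

```python
def diff_options_token(style, context_lines):
    # Assumed encoding, modelled on the "inline-U2" segment above.
    return "%s-U%d" % (style, context_lines)


def etag_value(user, changeset_time, extra):
    # user name / changeset time in seconds since 1970 / extra segment
    return "%s/%d/%s" % (user, changeset_time, extra)


inline = etag_value("cmlenz", 1103538813,
                    diff_options_token("inline", 2))

# Switching the diff style in the session yields a different value,
# so the cached changeset page no longer validates.
side_by_side = etag_value("cmlenz", 1103538813,
                          diff_options_token("sidebyside", 2))
```

Any change to one of the three inputs changes the value, which is all
the validator needs; the server never has to parse it back.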


API
---

Every request-processing module needs to explicitly add support for
conditional GET. However, the Request class provides a convenience
method "check_modified()" that should be used. This method expects an
int parameter for the last modification time of the requested resource
(seconds since 01-01-1970). In addition a module can pass in an extra
string as argument that will be appended to the ETag verbatim. This is
used by the changeset module to include the diff options.

The "req.check_modified()" method will generate the ETag value, and
compare it to the ETag supplied in the request header "If-None-Match".
If the request doesn't have that header, or the header doesn't match,
the method adds the ETag header and returns normally. Otherwise, it
will send a "304 Not Modified" response and throw a
NotModifedException, which will finally be caught and ignored by the
request dispatching code.
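
The control flow described above could be sketched roughly as follows.
This is a toy model, not Trac's Request class: the attributes, the
"dispatch" function and the exception name "NotModified" are
stand-ins, and the sketch writes the weak validator in the RFC 2616
form W/"...".

```python
class NotModified(Exception):
    """Hypothetical stand-in for the exception described above."""


class Request(object):
    """Toy request object for illustration only."""

    def __init__(self, remote_user, headers_in=None):
        self.remote_user = remote_user
        self.headers_in = headers_in or {}
        self.headers_out = {}
        self.status = 200

    def check_modified(self, timesecs, extra=""):
        etag = 'W/"%s/%d/%s"' % (self.remote_user, timesecs, extra)
        if self.headers_in.get("If-None-Match") == etag:
            # The cached copy is current: answer 304, stop processing.
            self.status = 304
            raise NotModified()
        # Header missing or different: add the ETag and carry on.
        self.headers_out["ETag"] = etag


def dispatch(req, handler):
    try:
        body = handler(req)
    except NotModified:
        body = ""  # caught and ignored; the 304 has no content
    return req.status, req.headers_out, body


def changeset_handler(req):
    # Called before any expensive rendering takes place.
    req.check_modified(1103538813, "inline-U2")
    return "<html>rendered changeset</html>"


first = dispatch(Request("cmlenz"), changeset_handler)
etag = first[1]["ETag"]
second = dispatch(Request("cmlenz", {"If-None-Match": etag}),
                  changeset_handler)
```

Raising an exception lets a module call the method near the top of its
handler without wrapping the remainder in an "if" block.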


Open Issues
-----------

While the mechanism described above works pretty well, there are some
"non-essential" aspects of Trac pages that are not taken into account
by the ETag generation. This means that some changes may not get picked
up by the browser if it has a cached copy of the pages.


1. Taking changes to "maybe-related" entities into account

The most important aspect that is currently ignored by the caching code
is what I'll call the "system state": this includes all changes,
additions or deletions made to any of the Trac objects such as tickets,
wiki pages or changesets.

This becomes important because Wiki text can link to any other object
in the system, and the link will in many cases be decorated with
information from the database. For example, a link to a wiki page that
doesn't exist will be decorated with a question mark, but as soon as
the page is created the question mark goes away. New tickets are
decorated with an asterisk character; the asterisk goes away when the
ticket is assigned, and the ticket link is struck through when the
ticket is closed. Basically this means that the representation of any
resource that includes Wiki text may depend on an unknown number of
completely independent objects.

So wherever Wiki text is involved we cannot guarantee that the
requested page hasn't changed in comparison to some cached version by
looking only at the requested resource itself. The actual changes might
be considered non-essential, but they'd still be changes. (That's also
why the ETags generated are "weak validators", as identified by the "W"
prefix. This means that the resource might have changed, but not so
much that it would be necessary to invalidate the cached version.)

But because Wiki formatting is such a central aspect of Trac, I think
that this problem should be fixed. I'd be grateful for any ideas
here.


2. Taking changes to templates into account

This only seems important for development machines where the templates
are actively being worked on. For this use case I would suggest that
we introduce a configuration option for disabling the caching support.


References:
   [1108]: http://projects.edgewall.com/trac/changeset/1108
   [cget]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.5

----

Cheers,
Chris
--
Christopher Lenz
/=/ cmlenz at gmx.de


