#11 new
Simon Rozet

Lazy ETag/Last-Modified generation

Reported by Simon Rozet | April 23rd, 2008 @ 08:35 AM | in 0.3.0 Sammy

I pushed a refactored version of lazy ETag generated, after Ryan's feedbacks. However, I am wondering if it is worth to extract the lazy last-modified generation code from ResponseHelpers#last_modified to #last_modified_for

Also, I tried to write some doc, but well it is far from being good. Editor anyone? Ryan? :)

I'll merge the branch with my master when those issues are resolved.

caching_helpers branch : http://github.com/sr/sinatra/com...

Comments and changes to this ticket

  • Ryan Tomayko

    Ryan Tomayko May 2nd, 2008 @ 11:49 PM

    Looks good, Simon. I've created a branch with some doc fixes and minor code tweaks: http://github.com/rtomayko/sinat...

    A few notes about the changes on that branch:

    • When embedding code samples in code comments, indent by two spaces. This causes rdoc to render the sample code in a fixed width font. You can run `rake doc` to generate local API doc and see how your doc comments look in HTML.
    • I made the object argument to entity_tag_for explicit (it was part of the varargs before) since that arg is always required.

    The only thing that strikes me as a little shaky in your changes is how entity_tag_for ignores attributes that don't exist on the object provided. It feels like we should raise a NoMethodError or ArgumentError when the object doesn't respond to one of the attributes specified. Otherwise, it's very easy to miss a typo. Is there a particular reason why you chose to ignore missing attributes?

  • Simon Rozet

    Simon Rozet May 10th, 2008 @ 08:59 AM

    The only thing that strikes me as a little shaky in your changes

    is how entity_tag_for ignores attributes that don't exist on the

    object provided. It feels like we should raise a NoMethodError or

    ArgumentError when the object doesn't respond to one of the

    attributes specified. Otherwise, it's very easy to miss a typo.

    Is there a particular reason why you chose to ignore missing

    attributes?

    I though it'd be better to just ignore when one of the attribute doesn't works and to raise when none of them works instead.

    But yeah, I agree that it is much more "safe" to raise when the object doesn't respond to one the specified attribute. Pushed.

    Also, I modified last_modified so that it now tries to get a time from updated_at and created_at.

    BTW, I deleted and recreated the caching_helpers. Also, some diff aren't perfect, which sucks. I am still trying to find a way to update the "diff of a commit" ala git add --patch and then recommit the unstaged content. I may also squash them all caching_helpers's commits into two commits. What do you think?

    Thanks a lot for taking the time to review my little branch!

  • Simon Rozet

    Simon Rozet May 19th, 2008 @ 08:07 AM

    Just to let you know that I git-rebase'd my caching_helpers branch into so called "logical commits".

  • Simon Rozet

    Simon Rozet July 8th, 2008 @ 11:25 AM

    • → Tag changed from “” to “caching”

    OK so I kind off rebased it on top of blake's next.

    Actually, a git-rebase didn't worked, so I created a local branch tracking blake's next and git-cherry-pick'd the two commit on it. Is it ok?

    The branch is still located at http://github.com/sr/sinatra/com...

  • Dan Kubb

    Dan Kubb August 14th, 2008 @ 07:19 PM

    One suggestion I'd have is to change the last_modified() method so that instead of comparing the httpdate string format for the passed-in time, I would parse the HTTP_IF_MODIFIED_SINCE using Time.httpdate (the class method) and compare that against the time object.

    The reason for this is that clients are permitted to use several date formats, and if you just do a direct string comparison, there's a chance things won't match up when they should.

    One other tip, since you're comparing two Time objects, always make sure the usec (microseconds) is set to 0 for both objects. A common mistake is to assume that since two Time objects look the same when stringified, that they will always be equal, but since the microseconds aren't part of the stringification, slight differences can go unnoticed.

    Given that you'd be parsing HTTP_IF_MODIFIED_SINCE with Time.httpdate, the usec will default to 0. However, the passed-in Time object may have a non-zero usec attribute and will need to be "truncated" before comparison.

  • Ryan Tomayko

    Ryan Tomayko August 14th, 2008 @ 11:51 PM

    The reason for this is that clients are permitted to use several date formats, and if you just do a direct string comparison, there's a chance things won't match up when they should.

    While true in theory (and specified as such in RFC 2616), using real time comparisons has proven almost impossible to get right in practice. Most user agents and caches (all that I'm aware of, in fact) treat the Last-Modified date produced by the origin server as an opaque identifier. Clients typically store the the Last-Modified value as a byte string and give it back in the If-Modified-Since response header byte-for-byte. String equality turns out to be the most reliable method of comparison.

    Seriously. Try this:

    http://gist.github.com/5547

    
    require 'rubygems'
    require 'sinatra'
    get '/' do
      response['Last-Modified'] = 'SpongeBobSquarePants'
      "If-Modified-Since: #{request.env['HTTP_IF_MODIFIED_SINCE']}"
    end
    

    Hit that in Firefox and then refresh. Safari checks that the time value is a valid RFC 2616 date but treats it as an opaque identifier otherwise.

    IIRC, this is one of the big reasons why the ETag header was introduced. The way RFC 2616 specifies weak etags is actually a much better specification for how Last-Modified is used in practice.

  • Dan Kubb

    Dan Kubb August 15th, 2008 @ 04:27 PM

    Ryan, that is very interesting. I didn't know that most browsers and caches treated it as an opaque identifier.

    However, I'm not sure specifically how parsing the Date according to the RFC and doing the comparison with Time objects would be a bad thing. What you're suggesting is more of a micro-optimization: skipping the parsing and doing a string comparison, just because most clients treat it as an opaque identifier.

    The reason I brought this up in the first place is that I've written a couple of bots that actually did parse the Last-Modified header into a Time object using Time.httpdate, and then used Time#httpdate in subsequent requests to set the If-Modified-Since header.

    In the case of browsers and caches I can totally see your point, but I wouldn't want to write server code that assumed every bot or HTTP client library would treat the Last-Modified header as an opaque string. A client could be completely compliant with RFC 2616, omit the date using of the three allowed formats, and yet not work with the currently proposed code.

  • Ryan Tomayko

    Ryan Tomayko September 7th, 2008 @ 08:54 AM

    Yeah. That's a good point, actually. If the code passes a Time/DateTime object to #last_modified, we should probably try to parse the If-Modified-Since header and do a true Time object comparison.

  • Ryan Tomayko

    Ryan Tomayko September 7th, 2008 @ 06:31 PM

    • → Assigned user changed from “Blake Mizerany” to “Ryan Tomayko”

Please Login or create a free account to add a new comment.

You can update this ticket by sending an email to from your email client. (help)

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

Tags