Internet

RSS and Conditional GETs

Trying to clear up some of the info on Drupal, RSS and Conditional GETs.
Gary has been having some problems with his Drupal RSS feeds over on Teledyn. He has blogged about it several times:

I get mentioned a bit in the comments, and there are a few things I feel I should clarify for Gary and others. Since I wrote the If-Modified-Since and If-None-Match logic in Drupal, this is a subject I hopefully know a little about.

The current implementation in Drupal simply matches the If-Modified-Since header against the timestamp saved in the database. Note that this applies not just to RSS feeds but to all content that Drupal serves. Most (all?) web browsers in common use store the value of the Last-Modified header and send the same value back in If-Modified-Since; apparently most RSS aggregators don't. Using other timestamps is valid according to RFC 2616, but it is problematic, so even the RFC recommends that the client use the timestamp specified by the server and not one generated on the client side. The quick reasoning for this is:

  1. Timezones and daylight savings don't need to be factored in.
  2. Clock desync isn't an issue. Is your PC synced to an accurate time server? Is the server?
  3. Less chance of malformed timestamps. The server should be well equipped to parse what it sent out itself.

RFC 2616 gives more fleshed-out reasons in section 14.25.

With those issues in mind, it was simply convenient to compare the If-Modified-Since header as a string rather than convert it back to a timestamp and perform a numeric comparison. No browser I have tested has any problems with this. I don't use any aggregators, however, and they don't seem to have the same level of robust support for old standards. I have since modified the logic in Drupal for Gary so that it converts the header to a numeric timestamp and does the proper checks. This will probably end up in Drupal; it just needs more error checking and tuning. Since RSS traffic is likely to rise, this will probably be necessary.
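To make the difference between the two checks concrete, here is a minimal sketch in Python. It is not the Drupal code (which is PHP); the function names are made up for illustration, and it assumes the server keeps the last-modified time as a Unix timestamp.

```python
from email.utils import formatdate, parsedate_to_datetime


def not_modified_string(if_modified_since, last_modified_ts):
    """String check: only an exact echo of our own Last-Modified value matches."""
    return if_modified_since == formatdate(last_modified_ts, usegmt=True)


def not_modified_numeric(if_modified_since, last_modified_ts):
    """Numeric check: parse the header and compare it with the stored timestamp."""
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable header: send the full response
    if since.tzinfo is None:
        return False  # no timezone in the header, so the time is ambiguous
    return int(since.timestamp()) >= int(last_modified_ts)
```

A client that echoes the server's value passes both checks. A client that sends the same instant in another form, say with a `+0200` offset, only passes the numeric one.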

I would still recommend that aggregators store the Last-Modified or ETag header values and send those back on the next GET instead of creating their own, as most server implementations I have seen in PHP and Python seem to do the string comparison check. Surely storing an extra 30 bytes per feed won't eat up all the disk space? Odds are that not doing so will eat up a lot more of the RSS providers' bandwidth.
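What that could look like on the aggregator side is sketched below in Python with the standard urllib module. The function names and the layout of the `saved` dictionary are assumptions for illustration and are not taken from any particular aggregator.

```python
import urllib.error
import urllib.request


def build_conditional_headers(saved):
    """Replay the validators saved from the previous response, unchanged."""
    headers = {}
    if saved.get("last_modified"):
        headers["If-Modified-Since"] = saved["last_modified"]
    if saved.get("etag"):
        headers["If-None-Match"] = saved["etag"]
    return headers


def fetch_feed(url, saved):
    """Return (body, saved); body is None when the server answers 304."""
    request = urllib.request.Request(url, headers=build_conditional_headers(saved))
    try:
        with urllib.request.urlopen(request) as response:
            # Keep the header values exactly as the server sent them.
            new_saved = {
                "last_modified": response.headers.get("Last-Modified"),
                "etag": response.headers.get("ETag"),
            }
            return response.read(), new_saved
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None, saved  # nothing new; keep the old validators
        raise
```

The point is that the header strings are stored and sent back untouched, so even a server that only does a string comparison will answer with a 304.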

The RSS distribution model will probably need to change in some way if it continues to grow. Even if all the conditional GET stuff is sorted out, the number of useless requests will grow. The bigger problem is that RSS feeds are usually collected on the hour, so traffic will spike sharply at those moments. Normal web browsing is smoother, as users generally don't wait until the clock reads 00 before going to your site. Maybe this will be less of a problem as the tools mature, but I don't have any real data on the matter, so time will tell.

Speaking of data, I wonder if any popular sites have analyzed the requests for their RSS feeds? Not just the number of requests, but how many result in a 304 response, when a feed is requested on average, HTTP 1.0 or 1.1, If-None-Match or If-Modified-Since, etc. Maybe I'll get around to making a Drupal patch that stores this type of data and ask nicely if some of the more popular Drupal sites will run it for a month's time to gather data from several places. Could be an interesting experiment.
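Some of this could be pulled from ordinary web server logs without a patch. The sketch below is a rough Python illustration that assumes Apache-style Common Log Format lines; the feed path `/rss.xml` and the function name are hypothetical. Which conditional header a client sent is not in that log format, so that part would still need extra logging.

```python
import re
from collections import Counter

# Assumption: Common Log Format, e.g.
# 1.2.3.4 - - [10/Oct/2003:13:00:02 +0000] "GET /rss.xml HTTP/1.1" 304 -
LOG_LINE = re.compile(
    r'\[\d{2}/\w{3}/\d{4}:\d{2}:(?P<minute>\d{2}):\d{2} [^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" (?P<status>\d{3}) '
)


def summarize_feed_requests(lines, feed_path="/rss.xml"):
    """Count feed requests by status, minute of the hour and HTTP version."""
    statuses, minutes, protocols = Counter(), Counter(), Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match["path"] != feed_path:
            continue  # not a parseable request for the feed
        statuses[match["status"]] += 1
        minutes[match["minute"]] += 1
        protocols[match["protocol"]] += 1
    total = sum(statuses.values())
    return {
        "total": total,
        "share_304": statuses["304"] / total if total else 0.0,
        "by_minute": minutes,
        "by_protocol": protocols,
    }
```

A large count for minute 00 in `by_minute` would be the on-the-hour spike, and `share_304` shows how many requests the conditional GET logic actually saved.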

EU and VAT on electronic sales

The EU has released an FAQ on the new VAT regulations.

I read a comment, I forget where, sorry, asking why European corporate lobbyists didn't prevent this from happening. Well, the story is more that they wanted it to happen, because the old rules gave an unfair advantage to non-EU companies selling into the EU. EU companies have had to charge VAT all along; this just levels the playing field.

Enforcing this will be interesting, but I don’t think it’s a major problem. From what I remember with VAT regulations it’s the consumer’s responsibility to make sure they pay the VAT. This means that if the authorities ask if you paid VAT on product X you need to provide the documentation. This might not be a big issue for individuals, but will be of bigger concern to larger companies.

XML is too hard for programmers

Tim Bray has some interesting comments on XML in his entry titled: XML Is Too Hard For Programmers.
There is even a Slashdot thread, which as usual provides some valuable comments in a sea of junk.

Feedster continues to make progress

The RSS search engine Feedster continues to advance and become more useful. I've started using it to search for Drupal to see what gets said in the blogosphere. It's pretty cool to find mentions that you never knew about.

What 10 odd Hours of Hacking Can Produce: An RSS Search Engine

R O O G L E
(yeah that's RSS google)
[The FuzzyBlog!]

I love the top 10 list. Somehow sex is always in the top 10 when it comes to searches... sigh.
