2009-04-18

Declaring Canonical URLs in HTTP Headers

Google recently posted about its new support for recognizing canonical URLs for page content.

A canonical URL is simply the “official” URL for accessing the item in question. Google explains it thus:

If your site has identical or vastly similar content that’s accessible through multiple URLs, [specifying a canonical URL] provides you with more control over the URL returned in search results. It also helps to make sure that properties such as link popularity are consolidated to your preferred version.

Canonical URLs are especially useful on sites like Amazon or YouTube, where the same product or video page may be reachable through a number of different URLs.

Consider, for example, the following two Amazon URLs:

Both point to the same Acer laptop product page on Amazon. I found the second URL from Amazon’s Acer laptops page, and you can bet that the URL contains information about where the user found the link, and more.

Thanks to canonical URLs, however, every link to this page that Google (as well as Ask, Microsoft Live Search, and Yahoo!) finds around the web will be consolidated under a single URL, which will, as the post linked above explains, bolster the page’s placement in search results.

Using our example above, all Amazon would need to do to take advantage of this is to add the following special tag in the <head> of the product page:

So canonical URLs are super simple to implement and clearly useful for SEO.
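To give a rough idea of how little work is involved, here is a minimal Python sketch that renders a canonical link tag for a page’s <head>. The function name and the example.com URL are placeholders of my own, not anything Amazon or Google actually uses.

```python
from html import escape


def canonical_link_tag(url):
    """Render a <link rel="canonical"> tag for the given URL.

    Hypothetical helper for illustration only. The URL is HTML-escaped
    so characters such as & and " are safe inside the href attribute.
    """
    return '<link rel="canonical" href="{}" />'.format(escape(url, quote=True))


if __name__ == "__main__":
    # Placeholder URL, not a real product page.
    print(canonical_link_tag("https://example.com/product/123"))
```

A template would call something like this once per page, passing in whichever URL the site considers the official one.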

Another context in which I believe they’d be useful is the URLs of asset files (images, videos, PDFs, etc.) and RESTful web services. Unfortunately, in all of these cases the server’s response will not be HTML, so the solution shown above simply cannot be used.

I believe, however, that I have stumbled upon a solution worth borrowing to solve this problem.

Earlier today I was reading Robert Spychala’s proposal for URL auto-discovery, which describes a means of including URL auto-discovery information in HTTP response headers. It struck me immediately as a great idea, one that translates directly into a way for non-HTML data to specify a canonical URL.

Specifically, the canonical URL header data for the Acer laptop example from above would look like this:

Link: ; rel=canonical

And the canonical URL of its primary photo:

Link: ; rel=canonical
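As a sketch of how this might look in practice, the Python below builds such a Link header on the server side and reads the canonical URL back out on the consumer side. The helper names and the example.com URL are hypothetical, and the parser is deliberately naive: it assumes a single link value whose only parameter is rel, which real-world Link headers do not guarantee.

```python
import re


def canonical_link_header(url):
    """Return a (name, value) pair declaring `url` as the canonical URL.

    Hypothetical helper: the target URL goes in angle brackets,
    followed by the rel parameter.
    """
    return ("Link", "<{}>; rel=canonical".format(url))


# Matches one link value of the form <URL>; rel=canonical
# (the rel value may optionally be double-quoted).
_CANONICAL = re.compile(r'^\s*<([^>]*)>\s*;\s*rel="?canonical"?\s*$')


def parse_canonical(value):
    """Extract the canonical URL from a Link header value, or None."""
    match = _CANONICAL.match(value)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Placeholder image URL, not a real asset.
    name, value = canonical_link_header("https://example.com/images/photo.jpg")
    print("{}: {}".format(name, value))
    print(parse_canonical(value))
```

A crawler that fetched an image or PDF could run something like parse_canonical over the response headers and treat the result exactly as it treats the tag in an HTML page.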

Since Google, Yahoo!, and the rest all index images, videos, and other non-HTML resources, I believe that supporting canonical URL declaration in the HTTP response headers is an idea worth seriously considering for all the same reasons it makes brilliant sense for HTML documents.