Tuesday, January 17, 2012

The Enterprise SEO Guide To Response Codes

Response codes impact every page, image and file on your website.

A visiting search engine bot figures out what to do based on those codes. Incorrect response codes can cause:

  • Indexation problems;
  • Duplicate content;
  • Site performance problems;
  • All manner of other site higgledy-piggledy.

Enterprise SEO is all about big, site-wide wins.

Response codes are just that: They’re easy to set up. They have a broad impact. Seems like a slam-dunk to me.

And yet, when I checked 1,000+ large sites—’large’ meaning ‘more than 5,000 pages’—only 30% got their response codes right.

Thirty. Percent.

With that, I dust off my response code tutorials, and write this quick guide to response codes for enterprise website developers, SEOs and anyone else who will listen:


The Big Three Response Codes


There are three response codes you want to know the most about:
  • 404. Page not found. If a file simply doesn’t exist, your server should deliver a 404 status. You can use a 410 response if you want Googlebot to retry the bad URL less frequently.
  • 301. Page permanently moved. If you’ve permanently removed one URL and replaced it with another, use a 301.
  • 302. Page temporarily moved. If you’ve removed something and will be putting it back, use a 302.

There are others: 200 means ‘OK’. Hopefully, you’ve got that one squared away.
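If you want the official names for these codes next to the plain-English ones above, Python's standard library carries them. A tiny sketch; the `PHRASES` name is just for illustration:

```python
from http import HTTPStatus

# Standard reason phrases for the codes discussed in this guide.
PHRASES = {code: HTTPStatus(code).phrase for code in (200, 301, 302, 404, 410)}

for code, phrase in PHRASES.items():
    print(code, phrase)
```

Note that the standard phrase for 302 is "Found", which is part of why it gets used by accident.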


Page Not Found Responses


Most important: If a browser or bot visits your site and attempts to load a file that does not exist, it should get a 404 response.

404 is how a server says “Uh, that file isn’t here.” It’s not a bad thing. It’s the right answer when someone clicks a broken link, or a page is just gone.

You can get tricky with redirection if you want to try to preserve link authority of a deleted page. But the default answer for a missing page should be 404.


The problem: Many sites deliver 302 temporary redirect, 301 permanent redirect or, even worse, 200 ‘OK’ response codes. This leads to massive site duplication and terrible crawl efficiency. Visiting bots spend their time crawling worthless content.
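One way to catch this is to request a URL that can't possibly exist and look at what comes back. The sketch below covers only the classification step. The function name and labels are hypothetical, and it assumes you've already captured the first status code and any Location header without following redirects:

```python
from typing import Optional


def classify_missing_page_response(status: int, location: Optional[str] = None) -> str:
    """Label the response a server gave for a URL known not to exist.

    `status` is the first status code returned (redirects not followed);
    `location` is the Location header, if any. Labels are illustrative.
    """
    if status in (404, 410):
        return "ok"  # the server correctly says the page is missing or gone
    if status in (301, 302):
        target = location or "unknown target"
        return f"redirect to {target}"  # missing pages should not redirect
    if status == 200:
        return "soft 404"  # an error page (or the home page) served as 'OK'
    return f"unexpected status {status}"
```

For example, `classify_missing_page_response(302, "/error.aspx")` returns `"redirect to /error.aspx"`, which is the .NET-style behavior described in the list that follows.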

Possible causes and solutions:
  • .NET loves to take over control of 40x errors, replacing them with a 302 redirect to a friendly error page. That’s nice. But totally wrong. Turn off .NET’s 404 handling and let IIS take over, instead. You can still have a friendly error page.
  • A misguided developer may have thought that redirecting all ‘not found’ errors to your home page helps users. It doesn’t. It’s totally confusing, like going into a revolving door and coming out at some random location. Provide a friendly 404 page that explains something went wrong and provides options.
  • Someone may have set up a redirect page that uses JavaScript or a meta refresh to reroute visitors to a ‘best guess’ page. See the previous item: same problem.
  • If your site’s on PHP, it may be using header('location: /'); die(); to send missing pages to the home page. Try something like header("HTTP/1.1 404 Not Found"); instead.
  • Your site just delivers a 200 ‘OK’ code no matter what. I have no idea why you’d do this, but I’ve seen 100-200 sites that do. Change it.

A 410 response is OK, too. It causes Googlebot to remove a URL from the index more quickly, and to retry the URL less often. You can read up on 4xx codes, and just about every other status code, on the W3C’s Status Codes definition page.
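
A friendly error page and a correct status code aren't mutually exclusive. Here's a minimal sketch as a Python WSGI app, not tied to any particular framework. The page table, the error markup and the `request` helper are all made up for illustration:

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical set of paths that exist on the site.
KNOWN_PAGES = {"/": b"<h1>Home</h1>", "/products": b"<h1>Products</h1>"}

FRIENDLY_404 = (
    b"<h1>Sorry, we can't find that page.</h1>"
    b'<p>Try the <a href="/">home page</a> or the site search.</p>'
)


def app(environ, start_response):
    """Serve known pages with 200; anything else gets a real 404 status."""
    path = environ.get("PATH_INFO", "/")
    body = KNOWN_PAGES.get(path)
    if body is None:
        # Friendly content, but the status line still says 404:
        # no redirect, no 200.
        status, body = "404 Not Found", FRIENDLY_404
    else:
        status = "200 OK"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def request(path):
    """Call the app in-process and return (status, body) for a path."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Calling `request("/no-such-page")` gives back the friendly HTML with a `404 Not Found` status, which is what both users and bots should see.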

What Kind Of Redirection?

Redirects are a powerful SEO tool. They let you consolidate authority in the right places. But you have to do ‘em right.

A 301 code tells a visiting bot or browser that the page it’s loading is gone, forever, and gives it the URL of the replacement page. Use this to consolidate authority and resolve basic canonicalization issues.

A visiting search bot will transfer some of the authority of the old URL to the new one. It will also eventually stop visiting the old URL, replacing it with the new one.

A 302 code tells a visiting bot or browser the page it’s loading is gone, but only temporarily. A visiting bot will keep returning to the old URL, checking to see if the page is back.


The problem: As near as I can tell, large sites randomly mix 302 and 301. They lose authority in some cases, and force bots to crawl permanently-removed content again and again.

Possible causes and solutions:
  • IIS 6 and earlier didn’t have a nice, clear button that said “Make this a 301 redirect”. Instead, you must check “A permanent redirect for this resource”. By default, that box is unchecked. So the default behavior is a 302, temporary redirect.
  • You’re writing redirection into your Web application, but you left out the status code. Some servers are configured to default to a 302, temporary redirect if you don’t set the status code to 301.
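
The safest habit in application code is to state the status code every time, rather than trusting the server's default. A rough sketch, again as a plain Python WSGI app. The redirect table and helper names are hypothetical:

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical redirect table: old path -> (new path, permanent?)
REDIRECTS = {
    "/old-catalog": ("/catalog", True),      # moved for good
    "/sale": ("/sale-coming-soon", False),   # will be back
}


def redirect_app(environ, start_response):
    """Issue redirects with the status code always set explicitly."""
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        target, permanent = REDIRECTS[path]
        status = "301 Moved Permanently" if permanent else "302 Found"
        start_response(status, [("Location", target), ("Content-Length", "0")])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]


def status_and_location(path):
    """Call redirect_app in-process; return (status, Location header or None)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["location"] = dict(headers).get("Location")

    redirect_app(environ, start_response)
    return captured["status"], captured["location"]
```

Because the permanent/temporary choice lives in the table, nobody has to remember which default a given server uses.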

It’s Not That Hard

No matter how complex the server infrastructure, getting the big three response codes (404, 301 and 302) right makes for site-wide wins. If you’re running a big site, look to your response codes. Get ‘em right. It’ll boost SEO, performance and user experience.
