12.10.2008

Your SEO Library Card - robots.txt

As I’m working on resetting a new site to be compliant with the latest SEO strategies, I’m reminded of an analogy I’ve used to describe the importance of the robots.txt file. Googlebot and other bots/spiders are like patrons visiting a library full of information contained in books on shelves: without a robots.txt file, they can’t navigate your domain effectively – much like a patron can’t make full use of a library without a library card. Sure, a patron can enter the building and read, but they can’t check anything out, they may wander to the wrong shelves a few times, and it’s difficult to come back to the same information later.

So how do you improve your site’s performance? Create a regular text file called robots.txt. This file must be uploaded to the root directory of your site, not a subdirectory (i.e., http://www.domain.com/ but NOT http://www.domain.com/folder/) -- anywhere else and it’s just another text file.
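A quick sanity check (using a placeholder domain here): once it’s uploaded, you should be able to pull the file up in any browser at

http://www.domain.com/robots.txt

If that URL shows your file, the spiders will find it too.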

Next? Unless you are a webmaster, I’d suggest you call yours :). They can help you format the file and write commands the search engines will actually honor (formally known as the "Robots Exclusion Protocol"). The format is simple enough for most intents and purposes: a User-agent: line to identify the crawler in question, followed by one or more Disallow: lines to block it from crawling certain parts of your site. But because it’s so simple, it’s just as easy to screw things up.

When finished, your file can be as simple as this, which allows all spiders to index everything:

User-agent: *
Disallow:

Or this, which prevents all spiders from indexing any part of your site:

User-agent: *
Disallow: /

A single keystroke separates the two files, but that one character determines who will have the higher page rank and, ultimately, how the traffic flows.
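In practice, most sites land somewhere between those two extremes. Here’s an illustrative sketch (the folder names are placeholders, and the Googlebot block is just an example of targeting one crawler by name):

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/

Each User-agent block stands on its own, and a crawler follows the most specific block that matches it. So Googlebot here would obey only the /drafts/ rule, while every other spider skips /cgi-bin/ and /private/.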

Good luck and may the spiders infest your site! Speak with you soon. DC