Tuesday, November 2, 2010

Robotstxt.org

How search engines crawl (spider) a website, and how a site owner can legitimately control that crawling, is set out in a publicly available /robots.txt file at the root of the site.

Sample robots.txt files

http://www.google.com/robots.txt
http://www.microsoft.com/robots.txt
http://www.amazon.com/robots.txt
http://www.facebook.com/robots.txt
http://www.wiley.com/robots.txt

The standard is described at

http://www.robotstxt.org/robotstxt.html
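
To see how a crawler might apply these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the crawler names "MyCrawler" and "BadBot", and the example.com URLs are all made up for illustration; they are not taken from any of the sites listed above. The rules are parsed from a string so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "MyCrawler" falls under the wildcard (*) rules: only /private/ is blocked.
print(parser.can_fetch("MyCrawler", "http://www.example.com/index.html"))
print(parser.can_fetch("MyCrawler", "http://www.example.com/private/data.html"))

# "BadBot" has its own section that disallows the whole site.
print(parser.can_fetch("BadBot", "http://www.example.com/index.html"))
```

Against a live site, the same class can load the file itself: call set_url() with the site's /robots.txt address and then read() before asking can_fetch(). Note that the file is only a request to crawlers; it is up to each crawler to check and honor it.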
