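How search engines crawl (spider) a website, and how site owners can control that crawling, is set out in a publicly available /robots.txt file at the root of the site. Compliance is a convention that well-behaved crawlers follow voluntarily; the file itself does not enforce anything.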
Sample robots.txt files:
http://www.google.com/robots.txt
http://www.microsoft.com/robots.txt
http://www.amazon.com/robots.txt
http://www.facebook.com/robots.txt
http://www.wiley.com/robots.txt
The standard is described at:
http://www.robotstxt.org/robotstxt.html
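
To see how a crawler might honor these rules, here is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt content, the crawler names (MyCrawler, BadBot), and the www.example.com URLs are all made up for illustration; they are not taken from any of the sites listed above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask whether a given user agent may fetch a given URL.
checks = [
    ("MyCrawler", "http://www.example.com/index.html"),
    ("MyCrawler", "http://www.example.com/private/data.html"),
    ("BadBot", "http://www.example.com/index.html"),
]
for agent, url in checks:
    print(agent, url, parser.can_fetch(agent, url))
```

With these assumed rules, MyCrawler falls under the catch-all `*` group and is kept out of /private/ only, while BadBot is excluded from the whole site. To check a live site instead, a real crawler would typically call set_url() with the site's /robots.txt address followed by read() before calling can_fetch().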
Tuesday, November 2, 2010