What is robots.txt?

The robots exclusion protocol (REP) is implemented through robots.txt, a text file that webmasters create to tell robots (typically search engine crawlers) which pages on their website they may crawl.
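
Crawlers look for the file at the root of the host, under the path /robots.txt. As a minimal sketch of that lookup, the helper below derives the robots.txt address from any page URL. The function name robots_txt_url and the example page path are assumptions made for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url.

    Hypothetical helper: keeps the scheme and host, and replaces the
    path, query, and fragment with the fixed path /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # The page path below is made up for the example.
    print(robots_txt_url("http://www.mywebapp.com/blog/post.html?page=2"))
```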

Cheat Sheet

Block all web crawlers from all content

User-agent: *
Disallow: /

Block a specific web crawler from a specific folder

User-agent: Googlebot
Disallow: /no-google/

Block a specific web crawler from a specific web page

User-agent: Googlebot
Disallow: /no-google/blocked-page.html

Sitemap Parameter

User-agent: *
Disallow:
Sitemap: http://www.mywebapp.com/none-standard-location/sitemap.xml
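
One way to check what these rules do is to feed them to a parser. The sketch below uses urllib.robotparser from the Python standard library and parses each cheat sheet entry from a string rather than fetching it over HTTP. The helper build_parser, the Bingbot user agent, and the page paths other than /no-google/ and /no-google/blocked-page.html are assumptions added for the example, and other crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

SITE = "http://www.mywebapp.com"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt text held in a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Block all web crawlers from all content.
block_all = build_parser("User-agent: *\nDisallow: /\n")

# Block a specific web crawler from a specific folder.
block_folder = build_parser("User-agent: Googlebot\nDisallow: /no-google/\n")

# Block a specific web crawler from a specific web page.
block_page = build_parser(
    "User-agent: Googlebot\nDisallow: /no-google/blocked-page.html\n"
)

# Allow everything (empty Disallow) and declare a sitemap.
with_sitemap = build_parser(
    "User-agent: *\n"
    "Disallow:\n"
    "Sitemap: http://www.mywebapp.com/none-standard-location/sitemap.xml\n"
)

if __name__ == "__main__":
    # Every crawler is refused everywhere.
    print(block_all.can_fetch("Googlebot", SITE + "/"))  # False

    # Only Googlebot is refused, and only inside the folder.
    print(block_folder.can_fetch("Googlebot", SITE + "/no-google/a.html"))  # False
    print(block_folder.can_fetch("Bingbot", SITE + "/no-google/a.html"))  # True

    # Only the single page is off limits; its siblings stay crawlable.
    print(block_page.can_fetch("Googlebot", SITE + "/no-google/blocked-page.html"))  # False
    print(block_page.can_fetch("Googlebot", SITE + "/no-google/other.html"))  # True

    # site_maps() requires Python 3.8 or newer.
    print(with_sitemap.site_maps())
```

Note that an empty Disallow: line permits everything, so the last entry restricts nothing and only advertises the sitemap location.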