What is robots.txt?

robots.txt is a plain-text file, defined by the Robots Exclusion Protocol (REP), that webmasters place at the root of their website to tell robots (typically search engine crawlers) which parts of the site they may crawl.

Cheat Sheet

Block all web crawlers from all content

User-agent: *
Disallow: /

Block a specific web crawler from a specific folder

User-agent: Googlebot
Disallow: /no-google/

Block a specific web crawler from a specific web page

User-agent: Googlebot
Disallow: /no-google/blocked-page.html

Sitemap Parameter

User-agent: *
Disallow:
Sitemap: http://www.mywebapp.com/none-standard-location/sitemap.xml
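
Checking the rules

One way to sanity-check the rules above is to feed them to Python's standard-library parser, urllib.robotparser, and ask it which URLs a given crawler may fetch. The sketch below is illustrative only: the crawler names "AnyBot" and "OtherBot" and the test paths such as /index.html are made up for the example, and real crawlers may interpret edge cases differently from this parser. Reading the Sitemap line with site_maps() assumes Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Host taken from the sitemap example; the page paths below are hypothetical.
SITE = "http://www.mywebapp.com"

block_all = parse_rules("User-agent: *\nDisallow: /\n")
block_folder = parse_rules("User-agent: Googlebot\nDisallow: /no-google/\n")
block_page = parse_rules(
    "User-agent: Googlebot\nDisallow: /no-google/blocked-page.html\n"
)
with_sitemap = parse_rules(
    "User-agent: *\n"
    "Disallow:\n"
    f"Sitemap: {SITE}/none-standard-location/sitemap.xml\n"
)

# "Disallow: /" under "User-agent: *" blocks every crawler from every URL.
print(block_all.can_fetch("AnyBot", f"{SITE}/index.html"))

# A folder rule for Googlebot blocks Googlebot only; other crawlers are unaffected.
print(block_folder.can_fetch("Googlebot", f"{SITE}/no-google/page.html"))
print(block_folder.can_fetch("OtherBot", f"{SITE}/no-google/page.html"))

# A page rule blocks that page but not its siblings in the same folder.
print(block_page.can_fetch("Googlebot", f"{SITE}/no-google/blocked-page.html"))
print(block_page.can_fetch("Googlebot", f"{SITE}/no-google/other-page.html"))

# An empty "Disallow:" allows everything; the Sitemap line is reported separately.
print(with_sitemap.can_fetch("AnyBot", f"{SITE}/index.html"))
print(with_sitemap.site_maps())
```

To check a live site instead of inline text, the same class offers set_url() followed by read(), which downloads the file before can_fetch() is called.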