Robots.txt
A text file at the root of your website that instructs search engine crawlers which pages or sections they are allowed or not allowed to crawl.
Simple Explanation
Robots.txt is a simple text file that lives at the root of your website (like yourdomain.com/robots.txt) and acts as a set of instructions for search engine bots. You can use it to say 'feel free to crawl everything' or 'don't go into these specific areas.' It's commonly used to block bots from admin pages, login areas, shopping carts, and other pages that shouldn't appear in Google. Think of it like a 'staff only' sign on a door: bots that follow the rules (and most major ones do) won't enter those areas.
Advanced SEO Explanation
Robots.txt follows the Robots Exclusion Protocol. It supports multiple user-agent directives (targeting specific bots like Googlebot, Bingbot, or all bots with *), Disallow rules (paths not to crawl), Allow rules (override Disallow for sub-paths), Crawl-delay (throttle crawl speed; not supported by Google), and a Sitemap directive. Robots.txt is a crawl directive only; it does NOT prevent indexing. Pages blocked by robots.txt can still appear in search results if external sites link to them (Google indexes the URL without crawling the content). To prevent indexing, use noindex meta tags (which requires the page to be crawlable). A critical operational risk: a robots.txt deployed with Disallow: / (or a broken cached copy) can cut off crawling of an entire site within hours and cause pages to start dropping from search. Google caches robots.txt for up to 24 hours.
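These directives can be tested programmatically. A minimal sketch using Python's standard-library urllib.robotparser; the rules and URLs below are illustrative, not from a real site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration. Note: Python's parser applies the
# FIRST matching rule, so the more specific Allow is listed before Disallow
# (Google instead prefers the most specific matching path regardless of order).
rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Disallow blocks the path and everything under it...
print(parser.can_fetch("Googlebot", "https://example.com/admin/settings"))    # False
# ...while Allow overrides Disallow for a sub-path.
print(parser.can_fetch("Googlebot", "https://example.com/admin/public/faq"))  # True
# Paths matched by no rule are crawlable by default.
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))         # True
```

Remember that this only answers "may this bot crawl this URL?"; as noted above, a blocked URL can still be indexed if other sites link to it.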
Why Robots.txt Matters for Rankings
Protects private areas from crawling
Admin dashboards, staging environments, checkout flows, and user account pages should never appear in search; robots.txt keeps bots out.
Preserves crawl budget
Blocking low-value pages (parameter URLs, infinite scroll pages, duplicate content) prevents Googlebot from wasting crawl budget on them.
Controls which bots access your server
Aggressive or unwanted bots (scrapers, AI crawlers) can be blocked by specifying their user-agent name in robots.txt.
Points crawlers to your sitemap
The Sitemap directive in robots.txt tells all compliant bots where to find your XML sitemap, accelerating content discovery.
Real-World SEO Examples
Standard robots.txt for most websites
Allows all bots to crawl the public site while blocking private areas and pointing to the sitemap.
Code Example
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Disallow: /wp-admin/
Sitemap: https://example.com/sitemap.xml
Blocking a specific bot (e.g., AI scraper)
Target individual crawlers by their user-agent name.
Code Example
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
The most dangerous robots.txt mistake
Disallow: / blocks every bot from every page โ effectively making your site invisible to search engines.
Code Example (dangerous)
User-agent: *
Disallow: /
Code Example (safe)
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
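The difference between the dangerous and safe files above can be verified with Python's standard-library urllib.robotparser; the URLs are illustrative:

```python
from urllib.robotparser import RobotFileParser

def parser_for(text: str) -> RobotFileParser:
    """Build a parser from an in-memory robots.txt string."""
    p = RobotFileParser()
    p.parse(text.splitlines())
    return p

# The dangerous file: Disallow: / blocks every path for every bot.
broken = parser_for("User-agent: *\nDisallow: /")
# The safe file: only /admin/ is blocked.
safe = parser_for("User-agent: *\nDisallow: /admin/")

print(broken.can_fetch("Googlebot", "https://example.com/"))      # False
print(safe.can_fetch("Googlebot", "https://example.com/"))        # True
print(safe.can_fetch("Googlebot", "https://example.com/admin/"))  # False
```

A check like this makes a cheap pre-deploy test: assert that your homepage and key landing pages are crawlable before the file ships.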
Common Robots.txt Mistakes
❌ Mistake
Using robots.txt to hide sensitive data
✅ The Fix
Robots.txt is publicly readable; anyone can see it. Never list sensitive URLs you want hidden. Use authentication or HTTP headers instead.
❌ Mistake
Thinking Disallow prevents indexing
✅ The Fix
Disallow only prevents crawling, not indexing. A page blocked in robots.txt can still appear in Google if external sites link to it. Use noindex for true indexation control.
❌ Mistake
Forgetting to list the sitemap
✅ The Fix
Add 'Sitemap: https://yourdomain.com/sitemap.xml' to robots.txt so every compliant bot can find your full URL list.
❌ Mistake
Blocking CSS and JavaScript files
✅ The Fix
Googlebot needs to access your CSS and JS to render your pages. Blocking them with robots.txt can make your site appear broken in Google's renderer.
❌ Mistake
Wrong path format (missing trailing slash)
✅ The Fix
Disallow: /admin/ blocks only the /admin/ directory and everything under it. Disallow: /admin (no trailing slash) is a prefix match, so it also blocks /administrator and /admin.html. Test paths carefully.
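The trailing-slash behavior is easy to confirm with Python's standard-library urllib.robotparser; the paths are illustrative:

```python
from urllib.robotparser import RobotFileParser

def blocked(rule: str, path: str) -> bool:
    """Return True if a single Disallow rule blocks the given path."""
    p = RobotFileParser()
    p.parse(["User-agent: *", f"Disallow: {rule}"])
    return not p.can_fetch("*", f"https://example.com{path}")

# Without a trailing slash, the rule is a plain prefix match:
print(blocked("/admin", "/administrator"))   # True  (also blocked)
print(blocked("/admin", "/admin/users"))     # True
# With a trailing slash, only the directory is matched:
print(blocked("/admin/", "/administrator"))  # False (still crawlable)
print(blocked("/admin/", "/admin/users"))    # True
```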
Continue Learning: Next Terms
Crawlability (Intermediate)
The ease with which search engine bots can discover, access, and crawl all the pages on your website.
Crawl Budget (Advanced)
The number of pages Googlebot will crawl and index on your site within a given timeframe, determined by crawl rate limit and crawl demand.
XML Sitemap (Beginner)
A structured XML file that lists all the important URLs on your website, helping search engines discover and prioritize your content for crawling.
Indexing (Beginner)
The process by which Google adds a crawled page to its searchable database, making it eligible to appear in search results.