โš™๏ธ Technical SEOBeginnerUpdated May 2026

Robots.txt

A text file at the root of your website that instructs search engine crawlers which pages or sections they are allowed or not allowed to crawl.

🌱

Simple Explanation

Robots.txt is a simple text file that lives at the root of your website (like yourdomain.com/robots.txt) and acts as a set of instructions for search engine bots. You can use it to say 'feel free to crawl everything' or 'don't go into these specific areas.' It's commonly used to block bots from admin pages, login areas, shopping carts, and other pages that shouldn't appear in Google. Think of it like a 'staff only' sign on a door: bots that follow the rules (and most major ones do) won't enter those areas.

⚙️

Advanced SEO Explanation

Robots.txt follows the Robots Exclusion Protocol. It supports multiple User-agent groups (targeting specific bots like Googlebot or Bingbot, or all bots with *), Disallow rules (paths not to crawl), Allow rules (which override Disallow for sub-paths), Crawl-delay (throttles crawl speed; ignored by Google), and a Sitemap directive. Robots.txt is a crawl directive only: it does NOT prevent indexing. Pages blocked by robots.txt can still appear in search results if external sites link to them (Google indexes the URL without crawling the content). To prevent indexing, use a noindex meta tag instead, which requires the page to remain crawlable. A critical operational risk: a robots.txt cached incorrectly or deployed with Disallow: / can de-index an entire site within hours, and Google caches robots.txt for up to 24 hours.

Why Robots.txt Matters for Rankings

Protects private areas from crawling

Admin dashboards, staging environments, checkout flows, and user account pages should never appear in search; robots.txt keeps bots out.

Preserves crawl budget

Blocking low-value pages (parameter URLs, infinite scroll pages, duplicate content) prevents Googlebot from wasting crawl budget on them.

Controls which bots access your server

Aggressive or unwanted bots (scrapers, AI crawlers) can be blocked by specifying their user-agent name in robots.txt.

Points crawlers to your sitemap

The Sitemap directive in robots.txt tells all compliant bots where to find your XML sitemap, accelerating content discovery.
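The Sitemap directive is machine-readable too. As a minimal sketch, Python's standard-library urllib.robotparser can extract every declared sitemap from a file (the example.com URLs are placeholders; site_maps() requires Python 3.8+):

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt body directly from lines (no network fetch needed)
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
    "Sitemap: https://example.com/sitemap.xml",
    "Sitemap: https://example.com/news-sitemap.xml",
])

# site_maps() returns the declared sitemap URLs, in file order
print(rp.site_maps())
# → ['https://example.com/sitemap.xml', 'https://example.com/news-sitemap.xml']
```

A file may declare multiple sitemaps, and the directive is independent of any User-agent group, so it can sit anywhere in the file.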

Real-World SEO Examples

Standard robots.txt for most websites

Allows all bots to crawl the public site while blocking private areas and pointing to the sitemap.

Code Example

User-agent: *
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Disallow: /wp-admin/

Sitemap: https://example.com/sitemap.xml
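The rules above can be checked programmatically before deploying. A minimal sketch using Python's standard-library urllib.robotparser, parsing a subset of the file above (note the stdlib parser implements the original Robots Exclusion Protocol and does not support Google's * and $ wildcard syntax):

```python
from urllib.robotparser import RobotFileParser

# A subset of the standard robots.txt from above
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/
Disallow: /wp-admin/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Public pages stay crawlable; listed areas are blocked for all bots
print(rp.can_fetch("*", "https://example.com/blog/my-post"))  # True
print(rp.can_fetch("*", "https://example.com/admin/users"))   # False
```

Running a check like this against your staging robots.txt catches typos in Disallow paths before they reach production.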

Blocking a specific bot (e.g., AI scraper)

Target individual crawlers by their user-agent name.

Code Example

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
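A quick way to confirm per-bot rules behave as intended, sketched with Python's urllib.robotparser (the URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Per-bot rules: AI crawlers blocked site-wide, other bots only
# kept out of /admin/
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/blog/"))      # False: blocked everywhere
print(rp.can_fetch("Googlebot", "https://example.com/blog/"))   # True: falls through to *
print(rp.can_fetch("Googlebot", "https://example.com/admin/"))  # False: blocked by * group
```

A bot that has no User-agent group of its own falls through to the * group, so Googlebot here is governed only by the Disallow: /admin/ rule.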

The most dangerous robots.txt mistake

Disallow: / blocks every bot from every page, effectively making your site invisible to search engines.

✗ Problematic
User-agent: *
Disallow: /
✓ Correct Approach
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
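One way to guard against shipping the problematic file is a pre-deploy sanity check. A hedged sketch in Python (robots_txt_is_safe is a hypothetical helper, not a standard API):

```python
from urllib.robotparser import RobotFileParser

def robots_txt_is_safe(robots_txt: str,
                       homepage: str = "https://example.com/") -> bool:
    """Return True if the given robots.txt leaves the homepage
    crawlable for all bots; False if it would block the whole site."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("*", homepage)

# The dangerous file fails the check; the safe file passes
print(robots_txt_is_safe("User-agent: *\nDisallow: /"))        # False
print(robots_txt_is_safe("User-agent: *\nDisallow: /admin/"))  # True
```

Wiring a check like this into CI means a stray Disallow: / fails the build instead of de-indexing the site.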

Common Robots.txt Mistakes

✗ Mistake

Using robots.txt to hide sensitive data

✓ The Fix

Robots.txt is publicly readable, so anyone can see it. Never list sensitive URLs you want hidden. Use authentication or HTTP headers instead.

✗ Mistake

Thinking Disallow prevents indexing

✓ The Fix

Disallow only prevents crawling, not indexing. A page blocked in robots.txt can still appear in Google if external sites link to it. Use noindex for true indexation control.
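For illustration, the noindex signal lives on the page itself (or in its HTTP response), which is why the page must stay crawlable for bots to see it:

```html
<!-- In the page's <head>. The page must NOT be blocked in robots.txt,
     or crawlers will never fetch it and never see this tag. -->
<meta name="robots" content="noindex">

<!-- Equivalent HTTP response header, useful for PDFs and other
     non-HTML files:
     X-Robots-Tag: noindex -->
```

Once the page has been recrawled and dropped from the index, you can optionally add a robots.txt block afterwards if you also want to stop crawling.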

✗ Mistake

Forgetting to list the sitemap

✓ The Fix

Add 'Sitemap: https://yourdomain.com/sitemap.xml' to robots.txt so every compliant bot can find your full URL list.

✗ Mistake

Blocking CSS and JavaScript files

✓ The Fix

Googlebot needs to access your CSS and JS to render your pages. Blocking them with robots.txt can cause your site to appear broken in Google's renderer.

✗ Mistake

Wrong path format (missing trailing slash)

✓ The Fix

Disallow: /admin (no trailing slash) is a prefix match: it blocks /admin, /admin/, and also unrelated paths like /administrator and /admin-panel. Disallow: /admin/ blocks only URLs under the /admin/ directory (and not /admin itself). Test paths carefully.
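The difference is easy to demonstrate with Python's urllib.robotparser (blocked is a hypothetical helper; the stdlib parser's plain prefix matching mirrors how Disallow paths without wildcards are treated):

```python
from urllib.robotparser import RobotFileParser

def blocked(disallow_path: str, url: str) -> bool:
    """Return True if `url` is disallowed for all bots by a
    single Disallow rule."""
    rp = RobotFileParser()
    rp.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return not rp.can_fetch("*", url)

# Without the trailing slash, the rule is a pure prefix match:
print(blocked("/admin", "https://example.com/administrator"))   # True
# With the trailing slash, only the directory contents are caught:
print(blocked("/admin/", "https://example.com/administrator"))  # False
print(blocked("/admin/", "https://example.com/admin/users"))    # True
```

Note that Disallow: /admin/ also leaves the bare /admin URL crawlable, which may or may not be what you want.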
