Ads Growth Tools

Robots.txt Tester

A free robots.txt rule tester that replicates the RFC 9309 matching algorithm used by search engines and major AI crawlers, including GPTBot, ClaudeBot, PerplexityBot, and Google-Extended, to show exactly which paths are allowed or blocked for any user-agent. Paste your file contents, choose a user-agent, enter test URL paths one per line, and instantly see which specific directive controls each result. The tool fills the gap left when Google retired its official testing tool in 2023.

Tags: Setup, robots.txt, crawl control, technical SEO, Googlebot, AI crawlers, RFC 9309

Robots.txt Tester

Validate robots.txt directives per RFC 9309

| Path | Result | Matched Rule |
| --- | --- | --- |
| / | ✓ Allowed | Allow: / [line 7] |
| /admin/dashboard | ✓ Allowed | Allow: / [line 7] |
| /admin/public/page | ✓ Allowed | Allow: / [line 7] |
| /docs/report.pdf | ✗ Blocked | Disallow: /*.pdf$ [line 6] |
| /docs/report.pdfx | ✓ Allowed | Allow: / [line 7] |
| /about | ✓ Allowed | Allow: / [line 7] |

What Is a Robots.txt Tester?

A robots.txt file is a plain-text file of directives that tells web crawlers which parts of your site they may or may not access. Crawlers from every major search engine, such as Google and Bing, and increasingly AI training crawlers like GPTBot and ClaudeBot, fetch and parse this file before crawling any other URL. A single misplaced rule can block your entire site from being crawled, or leave sensitive pages open to every bot on the internet.

Google operated an official robots.txt tester inside Google Search Console until it was shut down in 2023, leaving SEOs without a canonical tool for verifying rules. This free robots.txt tester fills that gap, implementing the RFC 9309 matching algorithm — the same specification that Google, Bing, and AI crawlers use to evaluate directives.

For a deeper understanding of how crawl control fits into your overall site architecture, see the guide to technical SEO.

Why Robots.txt Errors Are Costly

Robots.txt mistakes fall into two categories: over-blocking (accidentally preventing crawlers from reaching content you want indexed) and under-blocking (failing to restrict crawlers from pages that should remain private, like staging environments, admin panels, or duplicate content directories).

Over-blocking is more common and more damaging. A directive like Disallow: / under User-agent: * blocks every crawler from every URL — a configuration sometimes set on staging environments and accidentally promoted to production. Without a testing tool, this error can go undetected for days or weeks, causing an indexing collapse that takes months to recover from.

Under-blocking carries different risks. If you fail to block GPTBot or ClaudeBot, your original content may be used to train large language models without your consent. Many publishers made deliberate choices in 2023–2025 to allow or deny AI crawlers, and robots.txt remains the standard mechanism for communicating those preferences.

RFC 9309: The Matching Algorithm That Actually Matters

Not all robots.txt parsers behave identically. Google's implementation, now codified in RFC 9309, defines three critical rules that many simplified testers get wrong:

Longest match wins. When multiple rules match a URL path, the most specific one takes precedence — not the first one, and not the most permissive one. Given:

Disallow: /products/
Allow: /products/sale/

A request to /products/sale/item-1 matches both rules. The Allow: /products/sale/ rule wins because /products/sale/ (15 characters) is longer than /products/ (10 characters).

Equal-length ties go to Allow. If a Disallow and an Allow rule are exactly the same length and both match a path, the Allow directive wins. This is a tiebreaker that exists to protect against accidental blocking.
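
The two precedence rules above can be sketched in a few lines of Python. This is a minimal illustration, not the tester's actual implementation: the `Rule` type and `decide` function are hypothetical names, patterns are treated as literal prefixes (no wildcards), and length is counted in characters, which equals the octet count only for ASCII paths.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    allow: bool   # True for Allow, False for Disallow
    path: str     # literal path prefix; wildcards are out of scope here


def decide(rules: list[Rule], url_path: str) -> tuple[bool, Optional[Rule]]:
    """Return (allowed, winning_rule) for url_path."""
    best: Optional[Rule] = None
    for rule in rules:
        # An empty pattern (e.g. "Disallow:") matches nothing.
        if not rule.path or not url_path.startswith(rule.path):
            continue
        if best is None or len(rule.path) > len(best.path):
            best = rule                      # longer match wins
        elif len(rule.path) == len(best.path) and rule.allow:
            best = rule                      # equal length: Allow wins
    # No matching rule means the path is allowed by default.
    return (True, None) if best is None else (best.allow, best)


rules = [Rule(False, "/products/"), Rule(True, "/products/sale/")]
print(decide(rules, "/products/sale/item-1")[0])  # True
print(decide(rules, "/products/item-2")[0])       # False
```

Because the winner is chosen by length rather than by position, reordering the two lines in the file would not change either result.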

User-agent precedence. If a user-agent block exactly matches the crawler's name (e.g., User-agent: Googlebot), that block takes full precedence and the User-agent: * wildcard block is ignored for that crawler entirely.
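
Group selection can be sketched the same way. The snippet below assumes the file has already been split into groups keyed by User-agent token, and that the crawler is identified by its product token (for example `Googlebot`) rather than a full User-Agent header; `select_group` is a hypothetical helper name.

```python
def select_group(groups: dict[str, list[str]], crawler: str) -> list[str]:
    """Pick the rule lines that apply to `crawler`.

    `groups` maps a User-agent token to that group's rule lines.
    """
    # Product tokens are compared case-insensitively.
    by_token = {token.lower(): lines for token, lines in groups.items()}
    specific = by_token.get(crawler.lower())
    if specific is not None:
        return specific            # named group: the * group is ignored
    return by_token.get("*", [])   # otherwise fall back to *, else no rules


groups = {
    "*": ["Disallow: /private/"],
    "Googlebot": ["Allow: /"],
}
print(select_group(groups, "googlebot"))  # ['Allow: /']
print(select_group(groups, "Bingbot"))    # ['Disallow: /private/']
```

Note that the named group replaces the wildcard group rather than extending it, so any restriction that should also apply to Googlebot has to be repeated inside its own block.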

AI Crawlers: A 2026 Priority

The robots.txt landscape changed significantly when major AI companies began deploying training crawlers at scale. As of 2026, the most commonly blocked AI bots are:

| User-Agent | Operator | Common Use |
| --- | --- | --- |
| GPTBot | OpenAI | LLM training data collection |
| ClaudeBot | Anthropic | LLM training data collection |
| PerplexityBot | Perplexity AI | Real-time search index |
| CCBot | Common Crawl | Open dataset used by many LLMs |
| Google-Extended | Google | Gemini training (separate from Googlebot) |

Blocking these crawlers requires explicit User-agent blocks in your robots.txt. A blanket Disallow: / under User-agent: * will block them, but it will also block Googlebot, which is rarely the intent. This tester lets you simulate each user-agent individually to confirm your rules behave correctly for each crawler class.

The relationship between AI crawlers and organic search is explored further in the AI Overviews guide, which covers how Google's own AI-generated summaries interact with your content strategy.

How to Use the Robots.txt Tester

  1. Paste your robots.txt content — copy the raw text from https://yourdomain.com/robots.txt and paste it into the input field.
  2. Select a user-agent — choose from the dropdown: Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot, Google-Extended, or enter a custom user-agent string.
  3. Enter URL paths to test — add one or more paths (e.g., /products/, /admin/login, /blog/) in the path list. You can test multiple paths in a single run.
  4. Run the test — the tester applies RFC 9309 matching and returns a result for each path.
  5. Read the results — each path shows: Allowed or Blocked, the specific rule that matched, and the line number in your robots.txt where that rule appears.
  6. Iterate — edit your robots.txt in the input field and re-run to test proposed changes before deploying them.
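
Reporting the matched rule together with its line number (step 5) implies the parser records where each directive came from. A minimal sketch of that bookkeeping follows; `ParsedRule` and `parse_rules` are hypothetical names, and the sketch handles only User-agent, Allow, and Disallow lines, ignoring everything else.

```python
from typing import NamedTuple


class ParsedRule(NamedTuple):
    agents: tuple[str, ...]  # User-agent tokens of the enclosing group
    directive: str           # "allow" or "disallow"
    path: str
    line_no: int             # 1-based line number in the pasted text


def parse_rules(text: str) -> list[ParsedRule]:
    rules: list[ParsedRule] = []
    agents: list[str] = []
    in_rules = False  # True once the current group has at least one rule
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:                      # a new group starts here
                agents, in_rules = [], False
            agents.append(value)
        elif key in ("allow", "disallow") and agents:
            rules.append(ParsedRule(tuple(agents), key, value, line_no))
            in_rules = True
    return rules


sample = """User-agent: *
Disallow: /admin/   # keep bots out
Allow: /admin/public/

User-agent: GPTBot
Disallow: /
"""
for rule in parse_rules(sample):
    print(f"{rule.directive.title()}: {rule.path} [line {rule.line_no}]")
```

Keeping the line number on each rule is what lets a result row point back to the exact directive to edit before re-running the test.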

Common Robots.txt Mistakes

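Forgetting the trailing slash on directories. Rules are prefix matches, so Disallow: /admin blocks /admin and /admin/settings, but it also blocks unrelated paths such as /administrator and /admin-guide. Use Disallow: /admin/ to limit the rule to the directory's contents, and note that the bare /admin URL then stays crawlable.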

Blocking crawlers from your XML sitemap. Your sitemap should always be accessible. If you have a broad Disallow: / block for certain agents, make sure your sitemap URL is not caught by it. The XML Sitemap Generator produces sitemap files and explains the correct Sitemap: directive syntax for referencing them inside robots.txt.

Using wildcards incorrectly. Disallow: /*.pdf$ relies on the * and $ special characters. RFC 9309 requires compliant crawlers to support both, but the original 1994 robots.txt convention did not define them, so older or non-compliant parsers may treat the pattern as a literal path and never match it.
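
To make the wildcard semantics concrete, here is a small sketch that translates a robots.txt pattern into a regular expression: `*` matches any run of characters and a trailing `$` anchors the end of the path. The function names are hypothetical, and percent-encoding normalisation is deliberately left out.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape each literal piece, then join the pieces with ".*" for each *.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def matches(pattern: str, url_path: str) -> bool:
    # re.match anchors at the start, which gives prefix semantics.
    return pattern_to_regex(pattern).match(url_path) is not None


print(matches("/*.pdf$", "/docs/report.pdf"))   # True
print(matches("/*.pdf$", "/docs/report.pdfx"))  # False
print(matches("/admin", "/administrator"))      # True
```

A parser without wildcard support would instead compare the literal string `/*.pdf$` against the path and find no match, which is why the same file can behave differently across crawlers.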

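Multiple User-agent blocks for the same agent. If Googlebot appears in two separate blocks, RFC 9309 requires the crawler to combine their rules into a single group, and Google does so. Parsers that do not follow the RFC may read only the first block, so keep all rules for an agent in one block to avoid ambiguity.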

FAQ

Does Google always respect robots.txt?

For crawling purposes, yes — Google will not fetch a URL blocked by robots.txt. However, Google may still index a blocked URL if other sites link to it; it will appear in search results with a "No information is available for this page" snippet. To prevent indexing entirely, you need a noindex directive delivered via HTTP response header or meta tag — which requires the page to be crawlable.

Can I block AI crawlers without affecting Googlebot?

Yes. Use separate User-agent blocks:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Googlebot
Allow: /

This configuration blocks AI training crawlers while leaving Googlebot unrestricted.
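
A quick way to sanity-check this configuration locally is Python's standard-library `urllib.robotparser`. This is only a rough check: that module applies the first matching rule instead of the longest one and does not support wildcards, so it agrees with RFC 9309 on simple files like this one but not in general. The test path `/blog/post` and the extra `Bingbot` agent are illustrative choices.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory, no network fetch

for agent in ("GPTBot", "ClaudeBot", "Googlebot", "Bingbot"):
    verdict = "allowed" if parser.can_fetch(agent, "/blog/post") else "blocked"
    print(f"{agent}: {verdict}")
```

Bingbot has no group of its own and the file has no User-agent: * group, so it falls through to the default and is allowed.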

How often do crawlers re-read robots.txt?

Googlebot typically caches robots.txt for up to 24 hours. Changes you deploy may not take effect for existing crawl queues until the cache expires. For urgent changes (e.g., a staging environment accidentally exposed), use the robots.txt report in Google Search Console to request a recrawl of the file.


More tools: Free SEO Tools

Schema Generator

Generate valid JSON-LD structured data for any page without writing code. Choose from Article, FAQPage, Product, Organization, LocalBusiness, or BreadcrumbList schema types, fill out the form fields, and copy the finished markup into your page head to unlock Google rich results and rich snippets. Every output includes a one-click link to the Google Rich Results Test for immediate compliance validation.

Hreflang Generator / Validator

Create and validate hreflang link tags for multilingual and multi-regional websites. The generator mode outputs a complete set of `<link rel="alternate" hreflang="...">` tags including x-default, while the validator mode checks your existing tags for common errors like incorrect language codes (en-UK instead of en-GB), duplicate hreflang values, missing x-default declarations, and missing self-referencing tags that cause Google to ignore the entire hreflang cluster.

XML Sitemap Generator

Generate a standards-compliant XML sitemap from any list of URLs without creating an account or uploading files anywhere. Paste URLs one per line, configure optional lastmod, changefreq, and priority values, and enable multilingual mode to add hreflang alternate links before downloading the finished file directly. The tool enforces the Sitemaps protocol's hard limit of 50,000 URLs per file, highlights malformed or non-HTTP entries before export, and outputs valid XML that Google Search Console and Bing Webmaster Tools accept immediately.
