Free SEO Tools

Robots.txt Tester

Free

A free robots.txt rule tester that replicates the RFC 9309 matching algorithm used by search engines and major AI crawlers (including GPTBot, ClaudeBot, PerplexityBot, and Google-Extended) to show exactly which paths are allowed or blocked for any user-agent. Paste your file contents, choose a user-agent, enter test URL paths one per line, and instantly see which specific directive controls each result. The tool fills the gap left when Google deprecated its official testing tool in 2023.

Setup, robots.txt, crawl control, technical SEO, Googlebot, AI crawlers, RFC 9309

What it does

This tester recreates the RFC 9309 robots.txt matching algorithm that search engines and major AI crawlers follow, so you can confirm exactly which URLs a given user-agent may fetch. Paste your robots.txt contents, choose a crawler such as Googlebot, GPTBot, ClaudeBot, PerplexityBot, or Google-Extended, then list the URL paths you want to check. The tool reports allowed or blocked for each path and, crucially, names the specific directive responsible — including how longest-match and allow-over-disallow precedence rules resolve conflicts. It fills the gap left when Google retired its official robots.txt tester in 2023, and it is especially useful now that publishers want to control AI training crawlers as deliberately as they manage traditional search bots.

Where it fits

This tester sits in the technical SEO setup stage, verifying crawl directives before they ship and after every robots.txt edit.

Core features

RFC 9309-compliant matching that mirrors real crawler behavior
Support for Googlebot plus AI crawlers like GPTBot and ClaudeBot
Per-path allowed or blocked verdict for batch URL testing
Identification of the exact directive controlling each result
Correct handling of longest-match and allow-over-disallow precedence

Best for

Technical SEOs validating crawl rules before deploying robots.txt
Publishers deciding which AI training crawlers to allow or block
Developers debugging why a page is or is not being indexed

Beginner notes

Robots.txt controls crawling, not indexing; a blocked URL can still appear in results, so use noindex for true removal.
Test the exact user-agent string a crawler sends, because rules for Googlebot do not automatically apply to AI bots like GPTBot.
Remember that the most specific matching rule wins, so a broad Disallow can be overridden by a narrower Allow.

Robots.txt Tester

Validate robots.txt directives per RFC 9309

robots.txt Content

User-Agent

Test Paths (one per line)

Path	Result	Matched Rule
/	✓ Allowed	Allow: / [line 7]
/admin/dashboard	✓ Allowed	Allow: / [line 7]
/admin/public/page	✓ Allowed	Allow: / [line 7]
/docs/report.pdf	✗ Blocked	Disallow: /*.pdf$ [line 6]
/docs/report.pdfx	✓ Allowed	Allow: / [line 7]
/about	✓ Allowed	Allow: / [line 7]

What Is a Robots.txt Tester?

A robots.txt file is a plain-text file of directives that tells web crawlers which parts of your site they may or may not access. Search engine crawlers such as Googlebot and Bingbot, and increasingly AI training crawlers like GPTBot and ClaudeBot, fetch and parse this file before crawling any other URL. A single misplaced rule can accidentally block your entire site from being crawled, or unintentionally open sensitive pages to every bot on the internet.

Google operated an official robots.txt tester inside Google Search Console until it was shut down in 2023, leaving SEOs without a canonical tool for verifying rules. This free robots.txt tester fills that gap, implementing the RFC 9309 matching algorithm — the same specification that Google, Bing, and AI crawlers use to evaluate directives.

For a deeper understanding of how crawl control fits into your overall site architecture, see the guide to technical SEO.

Why Robots.txt Errors Are Costly

Robots.txt mistakes fall into two categories: over-blocking (accidentally preventing crawlers from reaching content you want indexed) and under-blocking (failing to restrict crawlers from pages that should remain private, like staging environments, admin panels, or duplicate content directories).

Over-blocking is more common and more damaging. A directive like Disallow: / under User-agent: * blocks every crawler from every URL — a configuration sometimes set on staging environments and accidentally promoted to production. Without a testing tool, this error can go undetected for days or weeks, causing an indexing collapse that takes months to recover from.
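A pre-deployment guard against this specific mistake can be sketched in a few lines of Python. This is a coarse heuristic under stated assumptions, not part of the tester: the function name `blocks_everything` is hypothetical, and it only looks for a bare `Disallow: /` inside a `User-agent: *` group without evaluating any competing `Allow` rules.

```python
def blocks_everything(robots_txt: str) -> bool:
    """Return True if a `User-agent: *` group contains a bare `Disallow: /`."""
    in_wildcard_group = False
    reading_agents = False  # True while consecutive User-agent lines are being read
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                in_wildcard_group = False  # a new group starts here
            reading_agents = True
            in_wildcard_group = in_wildcard_group or value == "*"
        else:
            reading_agents = False
            if field == "disallow" and value == "/" and in_wildcard_group:
                return True
    return False


staging = "User-agent: *\nDisallow: /\n"
production = "User-agent: *\nDisallow: /admin/\n"
print(blocks_everything(staging), blocks_everything(production))
```

Run as part of a deploy script, a check like this could fail the release when a staging file is about to be promoted.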

Under-blocking carries different risks. If you fail to block GPTBot or ClaudeBot, your original content may be used to train large language models without your consent. Many publishers made deliberate choices in 2023–2025 to allow or deny AI crawlers, and robots.txt remains the standard mechanism for communicating those preferences.

RFC 9309: The Matching Algorithm That Actually Matters

Not all robots.txt parsers behave identically. Google's implementation, now codified in RFC 9309, defines three critical rules that many simplified testers get wrong:

Longest match wins. When multiple rules match a URL path, the most specific one takes precedence — not the first one, and not the most permissive one. Given:

Disallow: /products/
Allow: /products/sale/

A request to /products/sale/item-1 matches both rules. The Allow: /products/sale/ rule wins because /products/sale/ (15 characters) is longer than /products/ (10 characters).

Equal-length ties go to Allow. If a Disallow and an Allow rule are exactly the same length and both match a path, the Allow directive wins. This is a tiebreaker that exists to protect against accidental blocking.

User-agent precedence. If a user-agent block exactly matches the crawler's name (e.g., User-agent: Googlebot), that block takes full precedence and the User-agent: * wildcard block is ignored for that crawler entirely.
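The three rules above can be sketched in Python as follows. This is a minimal illustration, not the tester's actual code: the names `Rule`, `parse_groups`, and `evaluate` are hypothetical, the user-agent is assumed to be passed as a bare product token (such as `Googlebot`) rather than a full header string, and `*` and `$` wildcards are deliberately left out so the precedence logic stays visible.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    allow: bool
    path: str
    line: int  # 1-based line number in the robots.txt text


def parse_groups(text: str) -> dict[str, list[Rule]]:
    """Map each lowercased user-agent token to its rules (repeated groups merge)."""
    groups: dict[str, list[Rule]] = {}
    agents: list[str] = []
    seen_rule = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                agents, seen_rule = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and agents:
            seen_rule = True
            if value:  # an empty value matches nothing, so it is skipped
                for agent in agents:
                    groups[agent].append(Rule(field == "allow", value, lineno))
    return groups


def evaluate(groups: dict[str, list[Rule]], user_agent: str,
             path: str) -> tuple[bool, Optional[Rule]]:
    """Return (allowed, winning rule); no matching rule means allowed."""
    rules = groups.get(user_agent.lower())
    if rules is None:  # only fall back to * when no named group exists
        rules = groups.get("*", [])
    matches = [rule for rule in rules if path.startswith(rule.path)]
    if not matches:
        return True, None
    # Longest path wins; on equal length, Allow (True) sorts above Disallow.
    winner = max(matches, key=lambda rule: (len(rule.path), rule.allow))
    return winner.allow, winner


ROBOTS = """\
User-agent: *
Disallow: /products/
Allow: /products/sale/

User-agent: Googlebot
Disallow: /private
Allow: /private
"""

groups = parse_groups(ROBOTS)
for agent, path in [("Bingbot", "/products/sale/item-1"),
                    ("Bingbot", "/products/item-2"),
                    ("Googlebot", "/products/item-2"),
                    ("Googlebot", "/private")]:
    print(agent, path, evaluate(groups, agent, path))
```

In this sample, Bingbot falls back to the `*` group and is allowed on the sale path by the longer `Allow` rule, while Googlebot ignores the `*` group entirely, so `/products/item-2` is allowed for it and the equal-length tie on `/private` resolves to `Allow`.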

AI Crawlers: A 2026 Priority

The robots.txt landscape changed significantly when major AI companies began deploying training crawlers at scale. As of 2026, the most commonly blocked AI bots are:

User-Agent	Operator	Common Use
GPTBot	OpenAI	LLM training data collection
ClaudeBot	Anthropic	LLM training data collection
PerplexityBot	Perplexity AI	Real-time search index
CCBot	Common Crawl	Open dataset used by many LLMs
Google-Extended	Google	Gemini training (separate from Googlebot)

Blocking only these crawlers requires explicit User-agent blocks that name each one in your robots.txt. A blanket Disallow: / under User-agent: * will block them, but it will also block Googlebot, which is rarely the intent. This tester lets you simulate each user-agent individually to confirm your rules behave correctly for each crawler class.

The relationship between AI crawlers and organic search is explored further in the AI Overviews guide, which covers how Google's own AI-generated summaries interact with your content strategy.

How to Use the Robots.txt Tester

Paste your robots.txt content — copy the raw text from https://yourdomain.com/robots.txt and paste it into the input field.
Select a user-agent — choose from the dropdown: Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot, Google-Extended, or enter a custom user-agent string.
Enter URL paths to test — add one or more paths (e.g., /products/, /admin/login, /blog/) in the path list. You can test multiple paths in a single run.
Run the test — the tester applies RFC 9309 matching and returns a result for each path.
Read the results — each path shows: Allowed or Blocked, the specific rule that matched, and the line number in your robots.txt where that rule appears.
Iterate — edit your robots.txt in the input field and re-run to test proposed changes before deploying them.

Common Robots.txt Mistakes

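Forgetting the trailing slash on directories. Rules are prefix matches, so Disallow: /admin blocks /admin and /admin/settings but also unrelated paths such as /administrator and /admin-guide.html. Use Disallow: /admin/ to limit the rule to the directory's contents, and note that it no longer matches the bare /admin path.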
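Under RFC 9309, a rule without wildcards matches any path that starts with it. The short Python sketch below shows what that means for the trailing slash; the sample paths and the helper name `blocked_by` are illustrative assumptions.

```python
PATHS = ["/admin", "/admin/settings", "/administrator", "/admin-guide.html"]


def blocked_by(rule_path: str, paths: list[str]) -> list[str]:
    """Return the paths a wildcard-free Disallow rule would match (prefix match)."""
    return [path for path in paths if path.startswith(rule_path)]


print(blocked_by("/admin", PATHS))   # matches all four sample paths
print(blocked_by("/admin/", PATHS))  # matches only /admin/settings
```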

Blocking crawlers from your XML sitemap. Your sitemap should always be accessible. If you have a broad Disallow: / block for certain agents, make sure your sitemap URL is not caught by it. The XML Sitemap Generator produces sitemap files and explains the correct Sitemap: directive syntax for referencing them inside robots.txt.

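Using wildcards incorrectly. Disallow: /*.pdf$ relies on the * (any sequence of characters) and $ (end of path) special characters. RFC 9309 requires compliant crawlers to support both, but the original 1994 robots exclusion convention did not define them, so older or non-compliant parsers may treat them as literal characters and never match the rule.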
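One way to read a wildcard pattern such as `/*.pdf$` is as a regular expression anchored at the start of the path. The following Python sketch shows that translation under stated assumptions: `compile_rule` and `matches` are hypothetical helper names, and percent-encoding normalization is ignored.

```python
import re


def compile_rule(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")  # a trailing $ pins the match to the end of the path
    body = pattern[:-1] if anchored else pattern
    # Each * becomes ".*"; every other character is matched literally.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start, which gives prefix semantics.
    return compile_rule(pattern).match(path) is not None


print(matches("/*.pdf$", "/docs/report.pdf"))   # True
print(matches("/*.pdf$", "/docs/report.pdfx"))  # False
print(matches("/*.pdf", "/docs/report.pdfx"))   # True: no $ anchor
```

A parser that lacks this translation would look for a literal `*` and `$` in the URL, which is why the rule silently never matches there.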

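Multiple User-agent blocks for the same agent. If Googlebot appears in two separate blocks, RFC 9309 requires the crawler to merge their rules into a single group, and Google does so. Rules you assumed were isolated in the second block therefore combine with the first, and simpler parsers that do not follow the RFC may read only one of the blocks. Keep each agent's rules in one block to avoid surprises.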

FAQ

Does Google always respect robots.txt?

For crawling purposes, yes — Google will not fetch a URL blocked by robots.txt. However, Google may still index a blocked URL if other sites link to it; it will appear in search results with a "No information is available for this page" snippet. To prevent indexing entirely, you need a noindex directive delivered via HTTP response header or meta tag — which requires the page to be crawlable.

Can I block AI crawlers without affecting Googlebot?

Yes. Use separate User-agent blocks:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Googlebot
Allow: /

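This configuration blocks GPTBot and ClaudeBot while leaving Googlebot unrestricted. Other AI crawlers, such as PerplexityBot or CCBot, stay allowed unless you add a block for each of them.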
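A configuration like this can be sanity-checked locally with Python's standard-library `urllib.robotparser`. Treat the sketch as a rough check for simple files only: that module returns the first rule that matches rather than applying RFC 9309 longest-match precedence, and the test path `/blog/post-1` is an arbitrary example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from text, no network request

agents = ("GPTBot", "ClaudeBot", "Googlebot", "PerplexityBot")
results = {agent: parser.can_fetch(agent, "/blog/post-1") for agent in agents}

for agent, allowed in results.items():
    print(f"{agent:15} {'Allowed' if allowed else 'Blocked'}")
```

PerplexityBot is included to show that an agent with no matching block, and no `User-agent: *` fallback, is allowed by default.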

How often do crawlers re-read robots.txt?

Google generally caches robots.txt for up to 24 hours, so changes you deploy may not affect crawling until the cached copy is refreshed. For urgent changes (e.g., a staging environment accidentally exposed), use the robots.txt report in Google Search Console to request a recrawl of the file.

Concepts behind this tool

Technical SEO

Technical SEO helps search engines crawl, understand, render, and index a website.

AI Overviews

AI Overviews are Google's AI-generated summaries at the top of search results that synthesize answers from multiple sources, appearing above the traditional blue links and changing how publishers receive traffic from informational queries.

More tools: Free SEO Tools

Schema Generator

Generate valid JSON-LD structured data for any page without writing code. Choose from Article, FAQPage, Product, Organization, LocalBusiness, or BreadcrumbList schema types, fill out the form fields, and copy the finished markup into your page head to unlock Google rich results and rich snippets. Every output includes a one-click link to the Google Rich Results Test for immediate compliance validation.

Hreflang Generator / Validator

Create and validate hreflang link tags for multilingual and multi-regional websites. The generator mode outputs a complete set of `<link rel="alternate" hreflang="...">` tags including x-default, while the validator mode checks your existing tags for common errors like incorrect language codes (en-UK instead of en-GB), duplicate hreflang values, missing x-default declarations, and missing self-referencing tags that cause Google to ignore the entire hreflang cluster.

XML Sitemap Generator

Generate a standards-compliant XML sitemap from any list of URLs without creating an account or uploading files anywhere. Paste URLs one per line, configure optional lastmod, changefreq, and priority values, and enable multilingual mode to add hreflang alternate links before downloading the finished file directly. The tool enforces the 50,000 URL Sitemaps protocol hard limit, highlights malformed or non-HTTP entries before export, and outputs valid XML that Google Search Console and Bing Webmaster Tools accept immediately.
