public tool

Robots.txt Analyzer

Inspect a site's published crawl preferences before you collect pages. This tool reads robots.txt and reports directives; it does not attempt to bypass them.

The API normalizes your input to the origin and fetches only /robots.txt.
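The normalization step can be pictured with a minimal sketch. The function name `robots_url` and the choice to default bare hostnames to https are assumptions made for illustration; the tool's actual implementation is not documented here.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(user_input: str) -> str:
    """Reduce a URL-like input to its origin and return the robots.txt URL."""
    text = user_input.strip()
    # Assumption: a bare hostname such as "example.com" defaults to https.
    if "://" not in text:
        text = "https://" + text
    parts = urlsplit(text)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {user_input!r}")
    # Keep scheme and host[:port]; drop the path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc.lower(), "/robots.txt", "", ""))


print(robots_url("https://example.com/shop/item?id=3#top"))
```

Whatever page URL goes in, only the origin survives, so the single request made is for `/robots.txt` on that origin.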

preview result

https://example.com

200 OK
robots URL: https://example.com/robots.txt
user-agent groups: 2 groups
allow rules: 1 rule
disallow rules: 3 rules
crawl-delay: 5
sitemaps: 1 URL
groups: 2 sections

Group 1

4 directives
user-agent: *
crawl-delay: 5 | 1 allow | 2 disallow
allow
  • /public
disallow
  • /admin
  • /checkout

Group 2

2 directives
user-agent: ExampleBot
crawl-delay: none | 0 allow | 1 disallow
allow
none declared
disallow
  • /

search notes

Read crawl policy before planning a monitor

The robots.txt analyzer fetches a public robots file, groups directives by user-agent, and highlights sitemaps, crawl-delay values, and notable warnings for compliance-aware discovery.
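Grouping directives by user-agent could look roughly like the sketch below. It is an illustrative parser, not the analyzer's own code: the function name `parse_robots`, the result layout, and the sitemap URL in `SAMPLE` are hypothetical, while the rules in `SAMPLE` mirror the preview above.

```python
SAMPLE = """\
User-agent: *
Crawl-delay: 5
Allow: /public
Disallow: /admin
Disallow: /checkout

User-agent: ExampleBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""


def parse_robots(text: str) -> dict:
    """Group allow/disallow/crawl-delay by user-agent; collect sitemaps."""
    groups, sitemaps = [], []
    current = None
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "sitemap":
            sitemaps.append(value)  # sitemap lines belong to no group
            continue
        if field == "user-agent":
            # Consecutive user-agent lines share one group.
            if not last_was_agent:
                current = {"agents": [], "allow": [], "disallow": [],
                           "crawl_delay": None}
                groups.append(current)
            current["agents"].append(value)
            last_was_agent = True
            continue
        last_was_agent = False
        if current is None:
            continue  # rule appears before any user-agent line
        if field in ("allow", "disallow"):
            current[field].append(value)
        elif field == "crawl-delay":
            current["crawl_delay"] = value
    return {"groups": groups, "sitemaps": sitemaps}


report = parse_robots(SAMPLE)
for group in report["groups"]:
    print(group["agents"], len(group["allow"]), "allow,",
          len(group["disallow"]), "disallow")
```

Keeping sitemaps outside the groups reflects that a sitemap line applies to the whole file rather than to one user-agent.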

What it checks

  • Robots URL, status code, final URL, and content type
  • User-agent groups with allow and disallow counts
  • Crawl-delay values and sitemap directives
  • Warnings for large files, unusual responses, and parser edge cases
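The response checks in this list might be expressed as a handful of simple heuristics. The sketch below is an assumption about what such warnings could look like: the function name `robots_warnings`, the message wording, and the 500 KiB default (borrowed from the minimum parsing limit in RFC 9309) are illustrative choices, not the analyzer's documented thresholds.

```python
def robots_warnings(status, content_type, body, requested_url, final_url,
                    max_bytes=500 * 1024):
    """Return human-readable warnings for a fetched robots.txt response."""
    warnings = []
    if status != 200:
        warnings.append(f"unexpected status code: {status}")
    # Robots files are normally text/plain; HTML often means an error page.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/plain":
        warnings.append(f"unexpected content type: {media_type or 'missing'}")
    if final_url != requested_url:
        warnings.append(f"redirected to {final_url}")
    # Assumed threshold: some parsers stop reading beyond roughly 500 KiB.
    if len(body) > max_bytes:
        warnings.append(f"large file: {len(body)} bytes exceeds {max_bytes}")
    return warnings


url = "https://example.com/robots.txt"
print(robots_warnings(200, "text/plain; charset=utf-8", b"User-agent: *\n", url, url))
```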

Use cases

  • Check crawl policy before monitoring catalog or listing pages
  • Find sitemap URLs for discovery without guessing paths
  • Compare allow and disallow rules across competitor sites
  • Document compliance considerations for a data extraction project

Limitations

  • Robots.txt is a public policy signal, not an access-control system.
  • Rules can differ by user-agent, so check the group that matches your crawler; ambiguous or conflicting rules may require human review.
  • This analyzer does not grant permission or override site terms.

faq

Common questions

What does robots.txt tell a scraper?

Robots.txt communicates crawl preferences such as allowed paths, disallowed paths, crawl-delay hints, and sitemap locations for different user agents.
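As a rough illustration, Python's standard `urllib.robotparser` module can answer these questions for a given user agent. The rules below mirror the preview result; the crawler name `MyCrawler` and the sitemap URL are hypothetical, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

SAMPLE = """\
User-agent: *
Crawl-delay: 5
Allow: /public
Disallow: /admin
Disallow: /checkout

User-agent: ExampleBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

rules = RobotFileParser()
rules.parse(SAMPLE.splitlines())  # parse text directly, no network request

# "MyCrawler" has no dedicated group, so the "*" group applies to it.
print(rules.can_fetch("MyCrawler", "https://example.com/public/page"))   # True
print(rules.can_fetch("MyCrawler", "https://example.com/admin/users"))   # False
# ExampleBot has its own group that disallows everything.
print(rules.can_fetch("ExampleBot", "https://example.com/public/page"))  # False

print(rules.crawl_delay("MyCrawler"))  # 5
print(rules.site_maps())               # ['https://example.com/sitemap.xml']
```

The same path can be allowed for one agent and disallowed for another, which is why the analyzer reports groups separately.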

Does robots.txt block access technically?

No. Robots.txt is a policy file. Responsible crawlers use it as an input when deciding what to crawl and how often.

Why are sitemap directives useful?

Sitemap directives reveal URL inventories that can help plan discovery, estimate coverage, and reduce unnecessary crawling.
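Once a sitemap URL is known, its entries can be listed without crawling the site. The following is a minimal sketch that assumes the sitemap follows the sitemaps.org XML format and has already been downloaded as text; the function name `sitemap_locs` and the sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/public/a</loc></url>
  <url><loc>https://example.com/public/b</loc></url>
</urlset>
"""


def sitemap_locs(xml_text: str) -> list:
    """Return every <loc> value found in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


urls = sitemap_locs(SAMPLE_SITEMAP)
print(len(urls), "URLs listed")
```

Counting the entries gives a quick coverage estimate. In a sitemap index file the `<loc>` values point to further sitemaps rather than pages, so a real workflow would need to follow those as well. The Python documentation notes that the standard XML modules are not hardened against maliciously constructed input, which matters when parsing files from sites you do not control.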