
Sitemap Extractor

Find a site's sitemap inventory, nested indexes, URL samples, and latest lastmod signals before planning a catalog crawl.

sitemap url, nested indexes, url count, lastmod, sample urls

Turn sitemap files into a clean URL discovery sample

The sitemap extractor normalizes a bare domain to its default sitemap.xml location, follows a capped number of nested sitemap indexes, and returns sample URLs along with their lastmod freshness signals.
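A minimal sketch of the normalization step, assuming the tool defaults to HTTPS when no scheme is given and leaves any input that already ends in .xml or .xml.gz untouched. The function name normalize_to_sitemap_url is hypothetical and is not the extractor's actual code.

```python
from urllib.parse import urlparse, urlunparse


def normalize_to_sitemap_url(target: str) -> str:
    """Map a bare domain or site URL to its default /sitemap.xml location.

    Inputs that already point at a sitemap file (e.g. a sitemap index)
    are returned as-is, apart from gaining a scheme if one was missing.
    """
    target = target.strip()
    if "://" not in target:
        target = "https://" + target  # assumption: default to HTTPS
    parts = urlparse(target)
    if parts.path.lower().endswith((".xml", ".xml.gz")):
        return urlunparse(parts)  # direct sitemap URL: keep it
    # Otherwise drop path, query, and fragment and use the default location.
    return urlunparse((parts.scheme, parts.netloc, "/sitemap.xml", "", "", ""))
```

For example, "example.com" and "https://example.com/products/" both map to "https://example.com/sitemap.xml", while a direct index URL passes through unchanged.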

What it checks

  • Sitemap status, type, nested sitemap count, and URL count
  • Most recent lastmod value across the sampled URLs
  • Nested sitemap index summaries
  • Copyable sample URL list for quick scoping
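The checks above map fairly directly onto the sitemaps.org 0.9 XML format. The sketch below summarizes a single sitemap document: its type (urlset or sitemapindex), counts, newest lastmod, and a short sample. It is a hypothetical illustration, not the tool's implementation; it assumes naive timestamps are UTC and skips year-only or year-month lastmod values.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace defined by the sitemaps.org 0.9 protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_lastmod(value):
    """Parse a W3C datetime lastmod value; return None if it can't be parsed."""
    text = value.strip()
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"  # older Pythons' fromisoformat rejects "Z"
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return None  # e.g. "2024" or "2024-05" are skipped in this sketch
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed


def summarize_sitemap(xml_text, sample_size=5):
    """Summarize one sitemap document: type, counts, latest lastmod, sample."""
    root = ET.fromstring(xml_text)
    if root.tag == NS + "sitemapindex":
        kind, child = "index", "sitemap"
    elif root.tag == NS + "urlset":
        kind, child = "urlset", "url"
    else:
        raise ValueError(f"not a sitemap document: {root.tag}")

    entries = []  # (loc, raw lastmod string or None)
    for node in root.findall(NS + child):
        loc = (node.findtext(NS + "loc") or "").strip()
        if loc:
            lastmod = node.findtext(NS + "lastmod")
            entries.append((loc, lastmod.strip() if lastmod else None))

    # Choose the newest lastmod by parsed time, but report its original text.
    dated = [(parse_lastmod(lm), lm) for _, lm in entries if lm]
    dated = [(dt, lm) for dt, lm in dated if dt is not None]
    latest = max(dated, key=lambda pair: pair[0])[1] if dated else None

    return {
        "type": kind,
        "nested_sitemaps": len(entries) if kind == "index" else 0,
        "url_count": len(entries) if kind == "urlset" else 0,
        "latest_lastmod": latest,
        "sample": [loc for loc, _ in entries[:sample_size]],
    }
```

Comparing parsed datetimes rather than raw strings avoids ordering mistakes when a sitemap mixes date-only values with full timestamps.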

Use cases

  • Estimate the size of a catalog before building a monitor
  • Find product, category, article, or location URLs from public sitemap files
  • Create seed URLs for a compliant discovery workflow
  • Compare freshness and coverage across content-heavy sites
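For catalog sizing and for spotting product, category, or article URLs, one rough approach is to bucket sampled URLs by their first path segment. This is a hypothetical helper, not a feature the extractor is documented to provide, and it assumes the site encodes sections in the first path segment (for example /products/...), which many sites do not.

```python
from collections import Counter
from urllib.parse import urlparse


def count_by_section(urls):
    """Count URLs per first path segment; URLs with an empty path go to "(root)"."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        counts[segments[0] if segments else "(root)"] += 1
    return counts
```

Because the input is a capped sample, the resulting counts indicate proportions rather than absolute catalog size.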

Limitations

  • The extractor caps nested sitemap fetches and URL samples for safety.
  • Some sites publish multiple sitemap indexes outside the default path.
  • Sitemaps can be stale, incomplete, or intentionally selective.
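The caps mentioned above can be sketched as a breadth-first walk over nested sitemap indexes that stops after a fixed number of fetches or collected URLs and reports whether it was cut short. The function crawl_sitemaps, the fetch callback, and the default limits are all hypothetical; the extractor's real limits are not documented here. Passing fetch in keeps the sketch network-free and leaves timeouts and politeness to the caller.

```python
import xml.etree.ElementTree as ET
from collections import deque

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def crawl_sitemaps(start_url, fetch, max_sitemaps=10, max_urls=500):
    """Walk sitemap indexes breadth-first under hard caps.

    fetch(url) -> str is a caller-supplied (hypothetical) hook returning XML.
    """
    queue = deque([start_url])
    seen = set()
    fetched = 0
    urls = []
    truncated = False  # True if either cap stopped the walk early

    while queue:
        sitemap_url = queue.popleft()
        if sitemap_url in seen:
            continue  # indexes sometimes list the same child twice
        if fetched >= max_sitemaps or len(urls) >= max_urls:
            truncated = True  # unvisited sitemaps remain
            break
        seen.add(sitemap_url)
        root = ET.fromstring(fetch(sitemap_url))
        fetched += 1

        if root.tag == NS + "sitemapindex":
            # Queue nested sitemaps for later visits.
            for node in root.findall(NS + "sitemap"):
                loc = (node.findtext(NS + "loc") or "").strip()
                if loc:
                    queue.append(loc)
        elif root.tag == NS + "urlset":
            for node in root.findall(NS + "url"):
                loc = (node.findtext(NS + "loc") or "").strip()
                if not loc:
                    continue
                if len(urls) >= max_urls:
                    truncated = True  # more URLs exist than the cap allows
                    break
                urls.append(loc)

    return {"sitemaps_fetched": fetched, "urls": urls, "truncated": truncated}
```

Exposing a truncated flag makes it explicit when a result is only a sample, which is the usual reason a reported URL count is lower than a site's real page count.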

Common questions

How do I find all URLs in a sitemap?

Start with the domain's sitemap.xml or a direct sitemap index URL. This extractor follows a small number of nested sitemaps and returns a capped sample for scoping rather than an exhaustive list of every URL.

Why does the URL count look lower than expected?

The tool caps returned URLs and nested sitemap fetches. Large sites often split indexes across many files, so use the result as a discovery sample.

Can sitemaps help scraping projects?

Yes. Sitemaps can provide seed URLs and freshness hints, reducing guesswork before building a more reliable extraction pipeline.
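One way to turn freshness hints into seeds is to keep only URLs whose lastmod falls on or after a cutoff date, newest first. This is a hypothetical helper (fresh_seeds is not part of the tool), and it deliberately compares only the leading YYYY-MM-DD part of each lastmod, ignoring time of day and timezone.

```python
from datetime import date


def fresh_seeds(entries, since, limit=100):
    """Return URLs whose lastmod date is on or after `since`, newest first.

    entries: iterable of (url, lastmod) pairs, where lastmod is a W3C datetime
    string such as "2024-05-01" or "2024-05-01T10:00:00+00:00", or None.
    """
    dated = []
    for url, lastmod in entries:
        if not lastmod:
            continue  # no freshness signal; a caller may still want these URLs
        try:
            day = date.fromisoformat(lastmod[:10])  # compare the date part only
        except ValueError:
            continue  # e.g. year-only "2024" values are skipped
        if day >= since:
            dated.append((day, url))
    dated.sort(reverse=True)  # newest first
    return [url for _, url in dated[:limit]]
```

Since sitemaps can be stale or selective, URLs without a lastmod are better handled separately than treated as unchanged.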