What Is an XML Sitemap?
An XML sitemap is a structured file that lists every URL on your website you want search engines to discover and crawl. It is not a navigation file for humans — it is a communication protocol between your site and search engines, submitted directly to Google Search Console, Bing Webmaster Tools, and other indexing services. Sitemaps tell crawlers where your content lives, how recently it was updated, and how important each page is relative to others on the site.
Sitemaps matter most in two situations: for new sites with few inbound links (where crawlers might not discover all pages through link-following alone) and for large sites where crawl budget management is a genuine concern. For established sites with strong backlink profiles and thorough internal linking, search engines will likely find most pages without a sitemap, but a sitemap still helps them discover new URLs and notice content updates sooner.
For a full picture of how sitemaps fit alongside robots.txt, canonicalization, and crawl budget, see the technical SEO guide.
Sitemap Protocol: What the Standard Requires
XML sitemaps follow the Sitemaps protocol, a standard supported by Google, Bing, Yahoo, and Ask. Key constraints:
| Limit | Value |
|---|---|
| Maximum URLs per sitemap file | 50,000 |
| Maximum uncompressed file size | 50 MB |
| Compression | Optional gzip; the 50 MB limit applies to the uncompressed file |
| Sitemap index files | Can reference up to 50,000 sitemaps |
If your site exceeds 50,000 URLs, or a single file would exceed 50 MB uncompressed, you must split the sitemap into multiple files. The usual approach is to reference them from a sitemap index file (<sitemapindex> root element), so you submit one URL instead of many. The generator handles splitting automatically and will flag when an index file is required.
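The splitting rule can be sketched in Python. This is a minimal illustration, not the generator's own code: the function names and the `https://example.com/sitemap-{n}.xml` naming pattern are hypothetical, and only `<loc>` is emitted to keep it short. It chunks the URL list at the 50,000-entry limit, refuses any file over 50 MB uncompressed, and writes an index only when more than one file results.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000              # protocol limit per sitemap file
MAX_BYTES = 50 * 1024 * 1024   # 50 MB uncompressed (52,428,800 bytes)
HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'

def build_urlset(urls):
    """Render one <urlset> file containing only <loc> entries."""
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return f'{HEADER}<urlset xmlns="{SITEMAP_NS}">{entries}</urlset>'

def build_index(sitemap_urls):
    """Render a <sitemapindex> that points at each child sitemap."""
    entries = "".join(f"<sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls)
    return f'{HEADER}<sitemapindex xmlns="{SITEMAP_NS}">{entries}</sitemapindex>'

def split_sitemap(urls, pattern="https://example.com/sitemap-{n}.xml", max_urls=MAX_URLS):
    """Return (index_xml or None, {child_sitemap_url: child_xml})."""
    files = {}
    for start in range(0, len(urls), max_urls):
        chunk = urls[start:start + max_urls]
        xml_text = build_urlset(chunk)
        # The size limit applies to the uncompressed bytes, even if you gzip later.
        if len(xml_text.encode("utf-8")) > MAX_BYTES:
            raise ValueError("sitemap file exceeds 50 MB; lower max_urls")
        files[pattern.format(n=len(files) + 1)] = xml_text
    # A single file needs no index; several files are tied together by one.
    index = build_index(list(files)) if len(files) > 1 else None
    return index, files
```

With `max_urls` left at the protocol default, a 120,000-URL list would yield three child files plus one index.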
A complete sitemap entry looks like this:
```xml
<url>
  <loc>https://example.com/page/</loc>
  <lastmod>2026-06-01</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.8</priority>
</url>
```
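In a full file, entries like the one above sit inside a `<urlset>` root element that declares the sitemaps.org namespace. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`, assuming each page is described by a plain dict (a hypothetical input shape). ElementTree escapes characters such as `&` in URLs automatically, which the protocol requires.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
FIELDS = ("loc", "lastmod", "changefreq", "priority")  # conventional order

def build_sitemap(entries):
    """Serialize a list of entry dicts into a complete sitemap document.

    Each dict must have 'loc'; 'lastmod', 'changefreq' and 'priority' are optional
    and are only written when present.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for field in FIELDS:
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    body = ET.tostring(urlset, encoding="unicode")  # escapes & < > in text
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```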
<loc> (required): the canonical URL of the page. It must be fully qualified, including the protocol (https://), and consistent with your canonical declarations: if your canonical tag points to the www version, the sitemap should too. The generator validates URL format and flags malformed entries before download.
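A format check along these lines can be sketched with `urllib.parse`. It is an illustration, not the generator's validator: `loc_problems` and its `canonical_host` parameter (the host your canonical tags use) are hypothetical, and the 2,048-character cap comes from the sitemaps.org protocol.

```python
from urllib.parse import urlsplit

MAX_LOC_LENGTH = 2048  # sitemaps.org: <loc> values must be under 2,048 characters

def loc_problems(url, canonical_host):
    """Return human-readable problems with a candidate <loc> value (empty list if none)."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append("missing or unsupported protocol")
    if not parts.netloc:
        problems.append("not fully qualified (no host)")
    elif parts.hostname != canonical_host.lower():
        # e.g. sitemap lists example.com while canonicals point to www.example.com
        problems.append(f"host {parts.hostname} does not match canonical host {canonical_host}")
    if len(url) >= MAX_LOC_LENGTH:
        problems.append("URL must be under 2,048 characters")
    return problems
```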
<lastmod> (optional, recommended): the date the page's content was last meaningfully changed, in W3C Datetime format (YYYY-MM-DD is sufficient; a full timestamp such as 2026-06-01T09:30:00+00:00 is also valid). Google has said it uses lastmod to prioritize recrawling of updated pages when the values are consistently accurate, which makes it the most useful of the three optional fields. Do not set lastmod to today's date on every page indiscriminately: if the dates do not track real content changes, Google learns to ignore the field for your site.
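One way to keep lastmod honest is to change it only when a page's content hash changes. The sketch below assumes a hypothetical record store carried over from the previous build and hashes whatever text you pass in. In practice you would hash the main content only, so that template changes or rotating widgets do not count as meaningful updates.

```python
import hashlib
from datetime import date

def update_lastmod(records, pages, today):
    """Bump lastmod only for pages whose content actually changed.

    records: {url: {"hash": str, "lastmod": "YYYY-MM-DD"}} from the previous build
    pages:   {url: current main content as str}
    today:   datetime.date applied to new or changed pages
    Returns a new records dict; unchanged pages keep their old date.
    """
    updated = {}
    for url, content in pages.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = records.get(url)
        if previous and previous["hash"] == digest:
            updated[url] = previous
        else:
            updated[url] = {"hash": digest, "lastmod": today.isoformat()}
    return updated
```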
<changefreq> (optional, advisory): hints at how often the page typically changes, using one of always, hourly, daily, weekly, monthly, yearly, or never. It is a hint, not a directive. Use daily for news pages, monthly for stable product pages, yearly for evergreen content, and never for archived content that will not change. Google's documentation states that it ignores changefreq and schedules recrawls from the change rates it observes itself, so the field has little effect on Google; other crawlers may still read it.
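The suggested values can be captured in a small lookup with validation. The content-type labels here (news, product, evergreen, archive) are hypothetical; the set of allowed values is the one the protocol defines. Returning None for unknown types means the optional field is simply omitted rather than guessed.

```python
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

# Hypothetical content-type labels mapped to the values suggested above.
DEFAULT_CHANGEFREQ = {
    "news": "daily",
    "product": "monthly",
    "evergreen": "yearly",
    "archive": "never",
}

def changefreq_for(content_type, override=None):
    """Pick a changefreq value, validating any manual override against the protocol."""
    value = override if override is not None else DEFAULT_CHANGEFREQ.get(content_type)
    if value is None:
        return None  # omit the optional field
    if value not in VALID_CHANGEFREQ:
        raise ValueError(f"invalid changefreq: {value!r}")
    return value
```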
<priority> (optional, advisory): a relative importance score between 0.0 and 1.0, default 0.5. It is site-relative, not absolute: a page with priority 0.8 signals that it is more important than your 0.5 pages but less important than a 1.0 page. Google ignores this field as well, so treat it as a hint for other crawlers rather than a ranking lever. The most common mistake is setting every page to 1.0, which eliminates the relative signal entirely. Use 1.0 for your homepage, 0.8 for top-level category pages, 0.6–0.7 for important content pages, and 0.3–0.5 for lower-priority pages.
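A rough tiering heuristic can be sketched from URL depth. The assumption that one path segment means a top-level category page is purely illustrative and will not fit every site, and `important_paths` is a hypothetical hand-maintained set. The helper `has_relative_signal` flags the everything-is-1.0 mistake.

```python
from urllib.parse import urlsplit

def priority_for(url, important_paths=frozenset()):
    """Assign a site-relative priority using the tiers suggested above.

    Depth heuristic (an assumption): "/" is the homepage, one path segment is a
    top-level category, anything deeper is ordinary content.
    """
    path = urlsplit(url).path or "/"
    segments = [s for s in path.split("/") if s]
    if not segments:
        return 1.0   # homepage
    if len(segments) == 1:
        return 0.8   # top-level category
    if path in important_paths:
        return 0.7   # hand-picked important content
    return 0.5       # everything else

def has_relative_signal(priorities):
    """False when every URL shares one value, which carries no relative information."""
    return len(set(priorities)) > 1
```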
Connecting Sitemaps and Robots.txt
The sitemap file location should be declared in your robots.txt file using the Sitemap: directive:
```text
Sitemap: https://example.com/sitemap.xml
```
The directive is independent of any User-agent: group, so it can appear anywhere in the file and is read by every crawler that supports it. Its value must be a fully qualified URL. Multiple Sitemap: directives are permitted if you have multiple sitemap files or a sitemap index. Test your robots.txt with the Robots.txt Tester to confirm the sitemap URL is accessible to Googlebot and other crawlers.
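Python's standard library can read these declarations back, which is a quick way to confirm that a deployed robots.txt lists the sitemaps you expect. Below is a minimal sketch using `urllib.robotparser.RobotFileParser.site_maps()` (available from Python 3.8); `relative_entries` is a hypothetical helper that flags values that are not fully qualified URLs.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def declared_sitemaps(robots_txt):
    """Return every Sitemap: URL in a robots.txt body, in file order."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none

def relative_entries(sitemap_urls):
    """Return declared values that lack a protocol or host."""
    return [u for u in sitemap_urls if not (urlsplit(u).scheme and urlsplit(u).netloc)]
```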
How to Submit a Sitemap to Google Search Console
- Log in to Google Search Console and select your property.
- Navigate to Indexing > Sitemaps in the left sidebar.
- Enter your sitemap URL in the "Add a new sitemap" field (e.g., https://example.com/sitemap.xml).
- Click Submit.
Google will attempt to fetch and parse the sitemap immediately. The Sitemaps report shows how many URLs were submitted, how many have been discovered, and any parsing errors. Resubmit the sitemap after major content updates — this signals to Google that new URLs are available without waiting for the regular crawl cycle.
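Before submitting, you can catch many of the parsing errors the report would show by parsing the file locally. This is a minimal sketch with `xml.etree.ElementTree`; `check_sitemap` is a hypothetical name, and it only understands `<urlset>` files (a sitemap index would need its own branch).

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SITEMAP_NS}

def check_sitemap(xml_text):
    """Parse a sitemap locally and return (url_count, problems)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return 0, [f"XML parse error: {exc}"]
    problems = []
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        # Also catches a <urlset> that forgot the namespace declaration.
        problems.append(f"unexpected root element {root.tag}")
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    if len(locs) != len(set(locs)):
        problems.append("duplicate <loc> values")
    return len(locs), problems
```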
For Bing, submit via Bing Webmaster Tools at the equivalent Sitemaps section, or include the Sitemap directive in robots.txt, which Bing reads automatically.
Multilingual Sitemaps with Hreflang
Sitemaps can declare hreflang alternate URLs using <xhtml:link> elements inside each <url> block; the <urlset> root element must declare the matching namespace with xmlns:xhtml="http://www.w3.org/1999/xhtml". This is an alternative to declaring hreflang in page <head> tags, useful for large multilingual sites where modifying individual page templates is impractical.
```xml
<url>
  <loc>https://example.com/page/</loc>
  <xhtml:link rel="alternate" hreflang="en" href="https://example.com/page/"/>
  <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/page/"/>
  <xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/page/"/>
</url>
```
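Because every page in a cluster needs its own `<url>` block with the full set of alternates, it is easier to generate the blocks from one mapping than to write them by hand. This sketch assumes each cluster is a dict from hreflang code to URL (a hypothetical input shape); `hreflang_urlset` writes the `xhtml` prefix and namespace declarations literally rather than through ElementTree's namespace handling, which keeps the prefix readable.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

def hreflang_urlset(clusters):
    """Build a <urlset> where every URL in a cluster lists all alternates, itself included.

    clusters: list of dicts mapping hreflang code -> URL.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:xhtml": XHTML_NS})
    for cluster in clusters:
        # One <url> block per distinct page; x-default usually repeats a language URL.
        for loc in dict.fromkeys(cluster.values()):
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = loc
            for lang, href in cluster.items():
                ET.SubElement(url, "xhtml:link",
                              {"rel": "alternate", "hreflang": lang, "href": href})
    return ET.tostring(urlset, encoding="unicode")
```

For the English/German pair above, this emits two `<url>` blocks, each carrying all three `<xhtml:link>` elements.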
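The same reciprocity requirements as in-page hreflang apply: each language version needs its own <url> entry, and every URL in the cluster must appear in every other URL's alternate list, including a reference to itself. In the example above, https://example.com/de/page/ needs its own <url> block carrying the same three <xhtml:link> elements. The generator validates this reciprocity and flags clusters with missing return tags before you download.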
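The reciprocity rule can be checked mechanically. This sketch assumes you have already extracted, for each `<loc>`, the set of hrefs its `<xhtml:link>` elements declare (a hypothetical input shape). For every cluster it requires each member to list the whole cluster, itself included, and reports `(page, missing_href)` pairs.

```python
def missing_return_tags(alternates):
    """Find hreflang links that are not reciprocated.

    alternates: {loc: set of hrefs declared by that <url> block's <xhtml:link> elements}
    Returns a sorted list of (page, missing_href) pairs; empty means the clusters are consistent.
    """
    problems = set()
    for page, hrefs in alternates.items():
        cluster = hrefs | {page}
        for member in cluster:
            # A member with no <url> block of its own is missing every link.
            declared = alternates.get(member, set())
            for href in cluster - declared:
                problems.add((member, href))
    return sorted(problems)
```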
How to Use the XML Sitemap Generator
- Paste your URLs: enter one URL per line in the input field, including only URLs you want indexed. Skip paginated URLs (unless they contain unique content), admin pages, filtered views, and duplicate parameterized URLs.
- Configure optional fields: toggle on lastmod, changefreq, and/or priority if you want to include them. If toggled on, set values per URL or apply a global default.
- Validate: the tool checks each URL for format validity (proper protocol, no disallowed characters) and flags issues before you proceed.
- Generate: click Generate Sitemap to produce the XML output.
- Download: save the completed sitemap.xml file to your computer.
- Upload to your server: place the file at https://yourdomain.com/sitemap.xml (the standard path) or another path of your choice, then declare that path in robots.txt. Under the sitemaps.org protocol, a sitemap placed in a subdirectory may only list URLs under that directory, so the site root is the safest location.
- Submit to Search Console: follow the steps above to submit the sitemap URL to GSC.
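The filtering advice in the first step can be approximated in code. This is a sketch, not the tool's behavior: `prepare_urls` and the excluded path prefixes are hypothetical, and dropping every URL with a query string assumes your indexable pages never rely on parameters.

```python
from urllib.parse import urlsplit

# Hypothetical exclusion rules; adjust to your own URL structure.
EXCLUDED_PREFIXES = ("/admin/", "/cart/", "/checkout/", "/account/")

def prepare_urls(raw_text, excluded_prefixes=EXCLUDED_PREFIXES):
    """Turn pasted text (one URL per line) into a deduplicated list worth listing.

    Drops blank lines, URLs with query strings (filtered or parameterized views)
    and URLs under excluded path prefixes, preserving input order.
    """
    kept, seen = [], set()
    for line in raw_text.splitlines():
        url = line.strip()
        if not url or url in seen:
            continue
        parts = urlsplit(url)
        if parts.query or parts.path.startswith(excluded_prefixes):
            continue
        seen.add(url)
        kept.append(url)
    return kept
```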
FAQ
Should I include every page in my sitemap?
No. Include only pages you want indexed and that have sufficient content to merit indexing. Exclude: paginated pages beyond page 2 (unless they surface unique products), filtered/faceted navigation URLs that produce near-duplicate content, thank-you and confirmation pages, account and checkout pages, search results pages, URLs that redirect, return errors, carry a noindex directive, or canonicalize to a different URL, and URLs already blocked by robots.txt (never list blocked URLs in a sitemap, since that creates a conflicting signal). A smaller, high-quality sitemap is more useful than a large one full of thin or duplicate content.
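Cross-checking a URL list against robots.txt catches the blocked-URL conflict before it ships. Below is a minimal sketch with `urllib.robotparser`; `drop_blocked` is a hypothetical name. Note that the standard-library parser applies the first matching rule by plain prefix and does not implement Google's wildcard or longest-match handling, so results can differ from Googlebot's for complex rule sets.

```python
from urllib.robotparser import RobotFileParser

def drop_blocked(urls, robots_txt, user_agent="Googlebot"):
    """Remove URLs that robots.txt disallows for the given crawler, preserving order."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]
```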
My sitemap was submitted but URLs are not getting indexed. What is wrong?
Sitemap submission does not guarantee indexing. Google decides independently whether a URL meets its quality bar for indexing. Common reasons submitted URLs are not indexed: thin or duplicate content, poor internal linking (a sitemap supplements internal links rather than replacing them), a canonical tag pointing to a different URL, or crawl budget constraints on large sites. Use the URL Inspection tool in Search Console to check the indexing status of specific URLs and run a live test of how Googlebot fetches them.
How frequently should I regenerate and resubmit my sitemap?
For active sites: regenerate and resubmit whenever you publish significant new content (more than 10–20 new URLs) or make major content updates to existing pages. For static or slow-moving sites: monthly resubmission is sufficient. Automated sitemap generation — where your CMS or build process produces a fresh sitemap on every publish event — eliminates this step entirely and is the recommended approach at scale.
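The rule of thumb for active sites can be written as a small check to run after each publish. The threshold of 10 new URLs is an assumption taken from the low end of the range above, and `major_update` is a hypothetical flag your CMS would set. The function returns its decision along with the new URLs so the change can be logged.

```python
def should_resubmit(previous_urls, current_urls, major_update=False, threshold=10):
    """Decide whether a regenerated sitemap is worth resubmitting.

    Returns (decision, sorted list of URLs that are new since the last build).
    """
    added = sorted(set(current_urls) - set(previous_urls))
    return (major_update or len(added) > threshold), added
```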