
XML Sitemap Generator


About XML Sitemap Generator

Before a search engine can rank your page, it has to find it. That sounds simple, but on any site larger than a few pages — or any site that is new and has few external links pointing to it — discovery is a genuine problem. Googlebot moves through the web by following links. If a page has no links pointing to it from other pages, Googlebot has no path to reach it, regardless of how valuable the content is. These pages are called orphan pages, and they are far more common than most site owners realise.

An XML sitemap solves this directly. It is a structured text file — formatted in a machine-readable language called XML — that lists every important page on your website. You place it at the root of your domain (at yoursite.com/sitemap.xml), submit it to Google Search Console and Bing Webmaster Tools, and from that point crawlers have a direct map of your content rather than having to discover it purely by following links from page to page.

Think of it this way: your website is a building. Internal links are the corridors connecting the rooms. A sitemap is the architect's blueprint — handed directly to Google so it can find every room, including those at the end of a corridor with no signage. Google still decides which rooms are worth entering based on its own judgment, but the blueprint removes the obstacle of not knowing where to look.

Google introduced the format in 2005, and in 2006 Google, Yahoo, and Microsoft jointly adopted it as the sitemaps.org protocol. Major search engines, including Google and Bing, read XML sitemaps today. The DigitalSub Pro XML Sitemap Generator creates a correctly formatted, submission-ready file in seconds, without requiring any coding knowledge or technical configuration.

Does Your Site Actually Need One?

This question deserves an honest answer rather than a reflexive yes. Google's own documentation states directly: "If your site's pages are properly linked, Google can usually discover most of your site." A small, well-structured website with clean internal navigation and a handful of external links probably does not depend on a sitemap for Google to discover its content.

That said, the situations where a sitemap provides clear, measurable value are more common than people expect:

New websites with few or no external links. When you launch a new site, Googlebot has nothing external to follow to your pages. Submitting a sitemap is often the fastest route to initial indexing. For example, a 40-page service site for a local business that submits a sitemap on launch day can appear in Google's index within days rather than weeks.

Sites over 500 pages. Google explicitly recommends sitemaps for large sites. As a site grows, the probability that some pages get missed by link-following alone increases — especially pages deep in the navigation hierarchy, recently published content that has not yet accumulated internal links, and product or service pages added during site updates.

Sites with orphan pages. Any page that exists but has no internal links pointing to it is effectively invisible to a crawler that discovers pages by following links. An entire section of a site can become orphaned when a menu item is accidentally deleted during a CMS update, when a category is restructured, or when content is migrated without updating the linking structure. One documented case: an eCommerce technical SEO team found an entire Clearance section had become orphaned after a developer pushed a code update that inadvertently removed the navigation link. The sitemap caught it before the pages dropped out of the index.

Frequently updated sites. News publishers, active blogs, and eCommerce stores with regular inventory changes benefit most from the <lastmod> timestamp in a sitemap. When accurate, this date tells Googlebot which pages have changed since the last crawl, allowing it to prioritise those URLs rather than re-crawling your entire site on a fixed schedule.

Sites with JavaScript-rendered content or complex navigation. If important pages are behind multiple clicks, filters, or require JavaScript execution to appear, crawlers face the same difficulty users would navigating without a map. A sitemap provides a direct path to those URLs.

Google is also clear about what a sitemap does not do: "Using a sitemap doesn't guarantee that all the items in your sitemap will be crawled and indexed, as Google processes rely on complex algorithms to schedule crawling." A sitemap is a discovery and prioritisation signal — a strong recommendation — not an indexing command. Submitting a sitemap does not override Google's quality assessment. Pages with thin content, duplicate content, or poor user signals will not be indexed simply because they appear in a sitemap file. The sitemap clears one obstacle. Content quality clears the rest.

How the Generator Works

Using the tool takes under two minutes. Enter your website URL, configure the options (crawl depth, change frequency if needed, last modification date), and click Generate. The tool crawls your site up to the configured page limit, discovers accessible URLs, and produces a correctly formatted XML file ready to download and upload to your server.

The generated file follows the sitemaps.org protocol exactly, uses UTF-8 encoding, escapes special characters correctly, and includes only the tags that search engines actually use — so you get a clean, valid file without any of the padding that some generators include.
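The crawl step can be pictured as repeatedly pulling same-site links out of each fetched page. The sketch below is not the generator's actual code. It is a minimal Python illustration that assumes the page HTML has already been downloaded as a string, and the names LinkCollector and same_site_links are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlsplit


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in one HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Return sorted absolute links on the same host, with #fragments removed."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlsplit(page_url).netloc
    found = set()
    for href in collector.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        # Keep only web URLs on the same host; this drops mailto: and external links.
        if parts.scheme in ("http", "https") and parts.netloc == host:
            found.add(absolute)
    return sorted(found)


sample_html = """
<a href="/services/seo-audit/">Audit</a>
<a href="blog/seo-tips/#intro">Tips</a>
<a href="https://example.org/elsewhere">External</a>
<a href="mailto:hello@yoursite.com">Mail</a>
"""
print(same_site_links("https://yoursite.com/", sample_html))
```

A real crawler would also stop at the configured page limit, skip URLs it has already visited, and honour robots.txt rules.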

After generating, the process is:

  1. Download the sitemap file
  2. Upload it to your domain root so it is accessible at https://yoursite.com/sitemap.xml
  3. Submit the URL in Google Search Console and Bing Webmaster Tools
  4. Add a Sitemap: directive to your robots.txt file pointing to it
  5. Monitor the indexed URL count in Search Console over the following week

What the Generated File Looks Like

The XML format is straightforward to read even without technical knowledge. Here is an example of a correctly structured sitemap with three pages — exactly the kind of output this generator produces:

Generated sitemap.xml — example output

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

  <!-- Homepage -->
  <url>
    <loc>https://yoursite.com/</loc>
    <lastmod>2025-05-01</lastmod>
  </url>

  <!-- Blog post -->
  <url>
    <loc>https://yoursite.com/blog/seo-tips/</loc>
    <lastmod>2025-05-10</lastmod>
  </url>

  <!-- Service page -->
  <url>
    <loc>https://yoursite.com/services/seo-audit/</loc>
    <lastmod>2025-04-28</lastmod>
  </url>

</urlset>

Each entry uses two tags. <loc> is the full canonical URL of the page: it must use HTTPS if your site runs on HTTPS, and it must match the canonical URL exactly. <lastmod> is the date the page was last meaningfully updated, in YYYY-MM-DD format. Only <loc> is strictly required by the protocol; <lastmod> is optional but worth including when the date is accurate.
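For readers who want to see how a file like this can be produced programmatically, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the generator's implementation: build_sitemap is a hypothetical helper, and the input is assumed to be a list of (URL, date) pairs you already have.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if lastmod:  # <lastmod> is optional, so leave it out when unknown
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    ("https://yoursite.com/", "2025-05-01"),
    ("https://yoursite.com/blog/seo-tips/", "2025-05-10"),
    ("https://yoursite.com/services/seo-audit/", None),
]))
```

ElementTree escapes special characters in element text when it serialises, so URLs containing an ampersand come out valid without extra work.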

The Four Tags — Which Ones Google Actually Uses

The sitemaps.org standard defines one required tag, <loc>, and three optional tags: <lastmod>, <changefreq>, and <priority>. Image and video entries are handled by separate extensions rather than by the core protocol. The difference between which tags search engines actually use and which they officially ignore is one of the most practically important details in sitemap configuration, and one of the most commonly misunderstood.

| Tag | Required? | Google uses it? | Verdict |
| --- | --- | --- | --- |
| <loc> | Yes | Yes (primary signal) | Always include. Full HTTPS canonical URL, no trailing variation. |
| <lastmod> | No | Yes (when accurate) | Include for real content updates only. Google ignores dates that do not correlate with actual changes. |
| <changefreq> | No | No (officially ignored) | Skip entirely. Google Search Central documentation confirms this is not used by Googlebot. |
| <priority> | No | No (officially ignored) | Skip entirely. Google's documentation states: "Google ignores <priority> and <changefreq> values." |

Source: Google Search Central — developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap

The <lastmod> tag deserves special attention because it is genuinely useful when used correctly and actively counterproductive when used incorrectly. Google has stated that it uses lastmod values when they are "consistently and verifiably accurate" — for example, when the date can be compared to the actual modification timestamp of the page's content. When sites update lastmod on every sitemap regeneration regardless of whether any content changed, or set every page to today's date to make their content appear fresh, Google learns to distrust those timestamps and stops giving them weight. Reserve lastmod for genuine content changes: a new blog post, a substantially updated product description, a refreshed service page. Pages that have not changed do not need an updated lastmod — leaving an older accurate date is better than padding the field with false freshness signals.
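One way to keep lastmod honest is to change it only when the page content itself changes. The Python sketch below illustrates that idea with a content hash. It is an assumption-laden example, not how Google or this generator decides: next_lastmod is a hypothetical helper, and it assumes you store the previous hash and date for each URL between runs.

```python
import hashlib


def next_lastmod(previous_hash, previous_lastmod, content, today):
    """Return (hash, lastmod); lastmod only advances when the content hash changes."""
    current_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if current_hash == previous_hash:
        # Nothing changed, so keep the older and still accurate date.
        return current_hash, previous_lastmod
    return current_hash, today


h1, d1 = next_lastmod(None, None, "<h1>SEO tips</h1>", "2025-05-10")
h2, d2 = next_lastmod(h1, d1, "<h1>SEO tips</h1>", "2025-06-01")           # unchanged
h3, d3 = next_lastmod(h2, d2, "<h1>SEO tips, updated</h1>", "2025-06-15")  # edited
print(d1, d2, d3)  # 2025-05-10 2025-05-10 2025-06-15
```

A hash flags any byte-level change, so in practice it should be computed over the main content only, not over templates, sidebars, or timestamps that change on every render.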

What to Include in Your Sitemap — and What to Leave Out

The content of your sitemap is a direct signal to Google about which pages you consider important enough to recommend for indexing. Including the wrong pages does not just waste space — it sends confusing signals about your site's quality and directs crawl capacity toward URLs you would rather search engines ignored. Quality matters more than completeness.

Pages to include

  • All pages you actively want to rank in search results — blog posts, product pages, service pages, landing pages, the homepage, and important category pages
  • Canonical URLs only — if a page has a canonical tag pointing to a different URL, list the canonical destination, not the source
  • HTTPS URLs throughout — if your site runs on HTTPS, every URL must begin with https://. Mixed HTTP and HTTPS URLs in a sitemap are a technical error
  • Clean URLs returning a 200 status code — no broken pages, no pages that redirect

Pages to exclude

  • Noindex pages. Including a page with a noindex meta tag in your sitemap is a direct contradiction — you are simultaneously asking Google to index it (sitemap) and not index it (noindex tag). Remove all noindex pages from the sitemap entirely. This includes thank-you pages, login pages, admin pages, and any page intentionally excluded from search results.
  • Redirect URLs. List only the final destination URL. A URL that redirects to another page should not appear in the sitemap — only the canonical destination should.
  • Duplicate and parameter-generated URLs. Filtered views of product listings, sorted variations of the same category page, search result pages, session ID URLs, UTM-tagged versions — all of these produce duplicate content and should stay out of the sitemap. Including them wastes crawl budget and can dilute the relevance signals of your canonical pages.
  • Pagination pages. Paginated archives and category pages beyond page 1 are generally better excluded unless each page has genuinely unique, valuable content worth indexing independently.
  • Low-quality and thin content pages. Tag archive pages, date archive pages, author pages with few posts, stub pages, and placeholder pages. If a page is not worth ranking, it is not worth recommending to Google.

One of the most important things to check after generating your sitemap: compare it against the pages you actually want indexed. Any URL in the sitemap that returns a non-200 status code, carries a noindex directive, or uses a non-canonical URL is a sitemap error that should be corrected before submission.
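The pre-submission check described above can be expressed as a small filter. This Python sketch assumes you already have crawl results for each URL (status code, noindex flag, canonical target); the Page record and sitemap_problems function are hypothetical names for illustration. Treating every query string as a reason to exclude is a simplifying assumption that will not suit every site.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Page:
    url: str
    status: int      # HTTP status returned for the URL
    noindex: bool    # True if the page carries a noindex directive
    canonical: str   # URL named by the page's canonical tag


def sitemap_problems(page):
    """Return the reasons this page should stay out of the sitemap (empty if none)."""
    problems = []
    if page.status != 200:
        problems.append(f"status {page.status}")
    if page.noindex:
        problems.append("noindex")
    if page.canonical != page.url:
        problems.append("non-canonical")
    if urlsplit(page.url).query:
        problems.append("has query parameters")
    return problems


pages = [
    Page("https://yoursite.com/", 200, False, "https://yoursite.com/"),
    Page("https://yoursite.com/shoes?sort=price", 200, False, "https://yoursite.com/shoes"),
    Page("https://yoursite.com/thank-you/", 200, True, "https://yoursite.com/thank-you/"),
    Page("https://yoursite.com/old/", 301, False, "https://yoursite.com/old/"),
]
keep = [p.url for p in pages if not sitemap_problems(p)]
print(keep)
```

Returning the list of reasons, rather than a plain yes or no, makes it easier to see why a URL was dropped when you audit the result.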

Sitemap Size Limits and When You Need a Sitemap Index

A single XML sitemap file can contain a maximum of 50,000 URLs and must not exceed 50 MB uncompressed. These limits are defined by the sitemaps.org standard and applied by all major search engines. For large eCommerce sites, enterprise content platforms, and news publishers, exceeding these limits is common — and the solution is a sitemap index file.

A sitemap index is a master XML file that points to multiple child sitemap files rather than individual page URLs. Search engines fetch the index first, then follow it to each child file and process all of them. The index file follows a similar XML structure, with <sitemapindex> and <sitemap> elements in place of <urlset> and <url>. A few practical points:

  • Large sites typically split by content type: one file for products, one for blog posts, one for category pages, one for static pages
  • This makes monitoring much cleaner in Google Search Console — you can see indexed vs submitted counts per sitemap file, making it immediately obvious if, say, your product pages have a lower indexing rate than your blog posts
  • Each child sitemap must still stay under 50,000 URLs and 50 MB
  • The index file itself has the same limits — it can reference up to 50,000 sitemap files

For most sites covered by this tool — those in the small-to-medium range — a single file is all you need. The generator handles this automatically based on the number of pages it discovers.
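To make the splitting rule concrete, here is a short Python sketch that divides a long URL list into child sitemaps of at most 50,000 entries and builds an index pointing at them. It is an illustration only: the helper names and the sitemap-products-N.xml file names are hypothetical, and the sketch checks the URL count but not the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit from the sitemaps.org protocol


def chunk(urls, size=MAX_URLS):
    """Split a list of URLs into consecutive lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(child_urls):
    """Build a sitemap index document that references each child sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for child in child_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = child
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


urls = [f"https://yoursite.com/product/{n}/" for n in range(120_000)]
parts = chunk(urls)
children = [
    f"https://yoursite.com/sitemap-products-{i}.xml"
    for i in range(1, len(parts) + 1)
]
index_xml = build_index(children)
print([len(p) for p in parts])  # [50000, 50000, 20000]
```

Each list in parts would then be written out as its own child sitemap file, and only the index URL needs to be submitted.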

How to Submit Your Sitemap and What to Do Next

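Generating the file is only the first step. For it to have any meaningful effect, search engines need to know where to find it. There are three ways this happens: submission in Google Search Console, submission in Bing Webmaster Tools, and a Sitemap: directive in robots.txt. For best results you should use all three.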

Submission workflow — complete all five steps after generating

  1. Upload sitemap.xml to your domain root. Verify it is live by visiting https://yoursite.com/sitemap.xml in your browser — you should see the raw XML of the file.
  2. Open Google Search Console → Indexing → Sitemaps → enter the sitemap URL and click Submit. Google will start processing it within hours to a few days.
  3. Open Bing Webmaster Tools → Sitemaps → Submit sitemap URL. Bing powers DuckDuckGo, Yahoo Search, and several AI assistants — this submission takes 30 seconds and extends your reach beyond Google.
  4. Add a Sitemap: directive to your robots.txt file: Sitemap: https://yoursite.com/sitemap.xml. This ensures any crawler that reads your robots.txt finds the sitemap without needing a direct submission. Generate or update that file using the Robots.txt Generator.
  5. Return to Google Search Console after 48–72 hours. Check the Sitemaps report for submitted vs discovered vs indexed URL counts. A gap between submitted and indexed is worth understanding — though not always fixing at the sitemap level. See below.
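Step 4 is easy to get wrong by adding the directive twice or forgetting it entirely. The Python sketch below checks an existing robots.txt for a Sitemap: line and appends one only if it is missing. It uses the standard library's urllib.robotparser (the site_maps() method requires Python 3.8 or later); declared_sitemaps and with_sitemap are hypothetical helper names, and the robots.txt content is passed in as a string rather than fetched.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs already declared in a robots.txt document."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none


def with_sitemap(robots_txt, sitemap_url):
    """Append a Sitemap: directive unless the same URL is already declared."""
    if sitemap_url in declared_sitemaps(robots_txt):
        return robots_txt
    return robots_txt.rstrip("\n") + f"\n\nSitemap: {sitemap_url}\n"


base = "User-agent: *\nDisallow: /admin/\n"
updated = with_sitemap(base, "https://yoursite.com/sitemap.xml")
print(updated)
```

The directive is independent of any User-agent group, so appending it at the end of the file is sufficient.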

After submission, the most useful monitoring happens in Search Console's Page indexing report (formerly called the Coverage report), not the Sitemaps report. It shows which submitted URLs are indexed, which were discovered but not indexed (and why), and which returned errors. The most important distinction: a submitted URL that shows as not indexed is almost never a sitemap problem. In most cases Google found the page, processed it, and chose not to index it, which is a content quality or relevance signal, not a technical configuration issue. Fixing that requires improving the page, not resubmitting the sitemap.

Understanding the "Submitted but Not Indexed" Problem

This is the most common confusion people encounter after submitting a sitemap for the first time. You submit 80 URLs, Search Console shows 80 submitted, but only 52 indexed. Where did the other 28 go?

The honest answer: Google saw those pages and decided not to include them in its index. The reasons usually lie with the pages themselves, not with the sitemap file:

  • Thin content — pages with very little original text, or content that closely duplicates other pages on your site or elsewhere on the web
  • Noindex tag present — the page appears in your sitemap but has a noindex directive in its HTML — this is a configuration conflict that should be resolved by removing the page from the sitemap
  • Soft 404 — a page that returns a 200 status code but has no meaningful content, such as an empty product page or a deleted blog post that shows a "no results" message
  • Low authority and no inbound links — on very new sites, Google may hold off on indexing pages with no external links pointing to them until it gathers more quality signals
  • Crawl budget prioritisation — on large sites, Google may simply not have processed those pages yet. Check back in two weeks before drawing conclusions.

The sitemap did its job in all of these cases — it got Google to look at the pages. Getting them indexed is a separate task that requires addressing the underlying content or authority issues.

Sitemaps and Crawl Budget — When It Actually Matters

Crawl budget is the number of pages Googlebot will crawl on your site within a given time window. For small sites under a few hundred pages, crawl budget is not a concern — Google crawls everything it can reach. For larger sites, it becomes a genuine constraint worth managing.

On eCommerce sites with thousands of product pages, crawl budget management is one of the highest-impact technical SEO activities available. When Googlebot spends its crawl time on parameter-generated duplicate URLs, filtered product listing variations, empty pagination pages, and old promotional landing pages, it crawls fewer of your valuable product and category pages. New products take longer to appear in search results. Price updates and stock changes take longer to be reflected in Google's index. Content that should rank does not reach its potential because it has not been crawled recently enough.

A well-configured sitemap that includes only your high-value canonical URLs helps redirect crawl capacity toward the pages that drive actual business results. Combined with a robots.txt file that blocks low-value URL patterns from being crawled at all, this is the foundation of effective crawl budget management for larger sites.

For smaller sites where crawl budget is not a concern, the benefit of a sitemap is primarily about discoverability — making sure newly published content gets found promptly rather than waiting for Googlebot to encounter it organically through link following.

Common Sitemap Mistakes and How to Avoid Them

Most sitemap errors fall into a small number of recurring patterns. These are the ones worth checking specifically:

Including noindex pages. This is the most common structural error. Any page carrying a noindex directive should be removed from the sitemap. If you tell Google to index it (sitemap) and not index it (noindex tag) at the same time, the noindex wins — but you are wasting crawl capacity on a page that will never appear in search results regardless.

Inflating lastmod dates. Setting every page's lastmod to today's date regardless of whether anything changed is counterproductive. Google verifies these dates against the actual content it finds when it crawls the page. When the date does not match a real content change, Google learns to distrust your lastmod values entirely and stops giving them any weight in crawl scheduling.

Using HTTP URLs instead of HTTPS. If your site serves on HTTPS, every URL in the sitemap must use HTTPS. A URL listed as HTTP in a sitemap on an HTTPS domain is a mismatch that forces an unnecessary redirect: the crawler requests the HTTP URL, gets redirected to HTTPS, and has to process an additional hop. Use canonical, final-destination URLs throughout.

Listing redirect source URLs. The sitemap should list where you want crawlers to go, not where they start. Only list final destination URLs — the canonical page that would return a 200 status code. URLs that redirect should be excluded.

Wrong file location. The sitemap must be accessible at your domain root. A file at yoursite.com/blog/sitemap.xml only covers URLs in the /blog/ directory scope. It needs to be at yoursite.com/sitemap.xml to cover the full domain.

Special characters not entity-escaped. Ampersands in URLs (common in parameter URLs) must be written as &amp; in XML. An unescaped ampersand breaks the entire sitemap file, causing Search Console to report an XML parsing error. This is one of the main advantages of using a generator rather than writing the file manually — the generator handles encoding automatically.
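If you do write or patch a sitemap by hand, the escaping can be done with one standard-library call. This Python sketch shows the idea; xml_escape_url is a hypothetical wrapper name, and the URL is an invented example.

```python
from xml.sax.saxutils import escape


def xml_escape_url(url):
    """Entity-escape the five characters the sitemap protocol requires escaping."""
    # escape() handles &, < and > by default; quotes are added through the extra map.
    return escape(url, {"'": "&apos;", '"': "&quot;"})


raw = "https://yoursite.com/search?category=shoes&sort=price"
print(xml_escape_url(raw))
# https://yoursite.com/search?category=shoes&amp;sort=price
```

Apply the escaping exactly once, at the point where the URL is written into the file; escaping an already escaped URL would turn &amp; into &amp;amp;.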

How to Update Your Sitemap When Content Changes

Regenerate and resubmit your sitemap when the change is significant enough that you want search engines to recrawl the affected pages promptly. There is no fixed schedule — the trigger is the change, not the calendar. Common situations that warrant an update:

  • Publishing a new page that you want indexed quickly
  • Substantially updating existing content, such as adding a new section, refreshing outdated statistics, or significantly rewriting a page
  • Removing pages. Deleted URLs should be removed from the sitemap rather than left pointing to 404 responses, which Search Console reports as errors.
  • Restructuring URL architecture after a site migration. A fresh sitemap with the new canonical URLs helps Google process the redirects and establish the new URLs in its index faster
  • Launching new site sections — new blog categories, new product lines, new service areas

After updating the file and uploading the new version to your server, resubmit the same sitemap URL in Google Search Console. You do not need to delete and re-add the URL — resubmitting signals that the file has changed and prompts a fresh processing cycle.

What This Generator Does vs What a CMS Plugin Does

It is worth being direct about the difference between a free online generator and a plugin-based solution, so you can choose the right tool for your situation.

This generator crawls your site at the moment you use it and produces a static sitemap file that is accurate at that point in time. It does not update automatically when you publish new content, change URLs, or remove pages. If you publish ten new blog posts next month, those posts will not appear in the sitemap until you regenerate and re-upload the file.

A CMS plugin like Yoast SEO or Rank Math on WordPress generates and updates the sitemap dynamically: every time you publish or update a page, the sitemap automatically reflects the change. Older setups also pinged Google on each update, but Google retired its sitemap ping endpoint in 2023 and now relies on accurate <lastmod> values and regular re-fetching of submitted sitemaps instead. A dynamic sitemap is clearly superior for actively managed sites publishing new content regularly.

The generator is the right tool in these specific situations:

  • You need an initial sitemap quickly for a site without a CMS plugin installed
  • Your site is built on a platform where plugins cannot be installed (static sites, certain custom-built platforms)
  • You want to inspect and verify the exact format and contents of your current sitemap
  • You are building or testing a new site and need a one-time file before configuring a longer-term solution
  • You manage a site on a platform with auto-generation (like Shopify) but want to verify the sitemap output independently

Tools to Use Alongside This One

  • Robots.txt Generator
  • Google Index Checker
  • Broken Links Finder
  • Keyword Position Checker
  • Meta Tags Analyzer
  • Redirect Checker
  • Page Speed Checker

Frequently Asked Questions

Does submitting a sitemap guarantee my pages get indexed?

No — and Google says so explicitly. A sitemap is a discovery and prioritisation signal. It tells Googlebot where to look and which pages you consider important. But Google still applies its own quality filters to decide what actually gets indexed. Pages that are thin, duplicated, low-authority, or carry a noindex tag will not be indexed regardless of sitemap submission. The most common error people make after submitting a sitemap is interpreting a "submitted but not indexed" status as a sitemap problem. In almost every case, it is a content quality problem. Fix the content on those pages first.

What is the difference between an XML sitemap and an HTML sitemap?

An XML sitemap is a machine-readable file intended for search engine crawlers. It uses XML markup, lives at your domain root, and is what you submit to Google Search Console. Users rarely see it directly. An HTML sitemap is a regular webpage — typically linked from the footer — that shows a human-readable, structured list of your site's content for visitors to navigate. They serve different purposes and are not substitutes for each other. This generator produces the XML format for search engine submission.

Should I set priority values for my most important pages?

No. Google's own documentation states: "Google ignores <priority> and <changefreq> values." There is no benefit to spending time assigning different priority levels to different pages — Googlebot does not use this signal to determine crawl order or indexing priority. The generator omits these tags by default, which is the correct approach based on Google's current behaviour.

My sitemap shows errors in Google Search Console. How do I fix them?

Click on the specific error in Search Console to see which URLs triggered it. The most common errors are: the file returning a 404 (check that it is correctly uploaded to the domain root and accessible at yoursite.com/sitemap.xml); invalid XML formatting caused by an unescaped special character in a URL (ampersands must be written as &amp;); and URLs in the file returning non-200 status codes (remove broken URLs, redirect sources, and pages that no longer exist). After fixing, resubmit the URL in Search Console. The generator handles encoding and formatting automatically, so if you use fresh output from this tool and upload it correctly, formatting errors should not occur.
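Several of these errors can be caught locally before uploading. The Python sketch below parses a sitemap and reports basic problems. It is a rough pre-flight check, not a replacement for Search Console: check_sitemap is a hypothetical helper, and the HTTPS rule assumes your site is served over HTTPS.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def check_sitemap(xml_text):
    """Return a list of problems found in a sitemap document (empty if none)."""
    try:
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as err:
        # An unescaped ampersand in a URL ends up here.
        return [f"XML parsing error: {err}"]
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    problems = []
    if not locs:
        problems.append("no <loc> entries found")
    if len(locs) > 50_000:
        problems.append("more than 50,000 URLs")
    problems += [f"not HTTPS: {u}" for u in locs if not u.startswith("https://")]
    return problems


good = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://yoursite.com/?a=1&amp;b=2</loc></url>"
    "</urlset>"
)
broken = good.replace("&amp;", "&")  # simulate a hand-written, unescaped URL
print(check_sitemap(good))
print(check_sitemap(broken))
```

Checking that each URL actually returns a 200 status needs live HTTP requests and is deliberately left out of this sketch.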

Should I include images and videos in my sitemap?

Google supports sitemap extensions for images and video using additional XML namespaces. Image sitemaps use the <image:image> tag nested within a URL entry and can help images appear in Google Images search results — valuable for photographers, eCommerce stores selling physical products, and visually-driven businesses. Video sitemaps are similarly useful for publishers with significant video content who want it discoverable in Google's video search. This generator produces a standard URL sitemap. For image and video sitemaps, the additional tags are best handled through a CMS plugin like Yoast SEO, which supports the image sitemap extension natively on WordPress.
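For reference, this is roughly what an image entry looks like when built in Python with the standard library. The sketch is illustrative and is not output from this generator: url_entry_with_images is a hypothetical helper, the URLs are invented, and the image namespace URI is the one Google documents for its image sitemap extension.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Map the namespaces to the prefixes used in image sitemaps.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def url_entry_with_images(page_url, image_urls):
    """Build a one-page urlset whose <url> entry lists the page's images."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = page_url
    for image_url in image_urls:
        image = ET.SubElement(entry, f"{{{IMAGE_NS}}}image")
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


image_xml = url_entry_with_images(
    "https://yoursite.com/products/lamp/",
    ["https://yoursite.com/img/lamp-front.jpg", "https://yoursite.com/img/lamp-side.jpg"],
)
print(image_xml)
```

Each <image:image> element sits inside the <url> entry of the page that displays the image, alongside that page's <loc>.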

Can I have more than one sitemap file?

Yes. If you exceed 50,000 URLs or 50 MB, you must split across multiple files referenced by a sitemap index. Even within those limits, splitting by content type — one file for blog posts, one for product pages, one for static pages — makes indexing monitoring much cleaner in Search Console. You can see at a glance whether your product pages are being indexed at a different rate to your blog content, for instance. You can submit multiple URLs in Search Console, or submit a single index file that references all of them.

How often should I regenerate and resubmit my sitemap?

Regenerate when your content changes significantly: new pages, deleted pages, or major URL structure changes. There is no benefit to resubmitting an unchanged file repeatedly. For sites publishing new content regularly, a CMS plugin that updates the file automatically on each change is a better long-term arrangement than manual regeneration. Use this generator for initial setup, for sites where plugins cannot be installed, or when you need to verify the exact format of the output independently of what your CMS produces.

Is the XML Sitemap Generator free to use?

Yes — completely free, no account required, no usage limits. All 47 tools on DigitalSub Pro work the same way.

Where your sitemap lives after upload — by platform

WordPress + Yoast / Rank Math

Plugins generate and update automatically. Typically at /sitemap.xml or /sitemap_index.xml. No manual upload needed. Submit that URL to Search Console.

WordPress (no plugin)

Upload sitemap.xml to /public_html/ via FTP or cPanel File Manager so it is accessible at the domain root.

Shopify

Shopify auto-generates at yourstore.com/sitemap.xml. No upload needed. Submit that URL directly to Google Search Console and Bing Webmaster Tools.

Static / HTML sites

Save the generated output as sitemap.xml and upload to the same folder as your index.html — the root directory of your domain.