How to Create a Custom XML Sitemap

Most CMS platforms generate sitemaps automatically, but automatic doesn't mean optimal. Default sitemaps include everything without discrimination: thin pages, duplicate content, out-of-stock products, and URLs you'd rather search engines ignore. A custom XML sitemap gives you control over exactly which URLs you submit for indexing, how they're organized, and how often the sitemap is updated.

What Is a Custom XML Sitemap?

A custom XML sitemap is one you build intentionally rather than accepting whatever your platform generates by default. Instead of a generic dump of every URL on your site, a custom sitemap reflects deliberate decisions about:

  • Which URLs deserve search engine attention
  • How URLs are grouped and prioritized
  • When sitemaps get updated
  • What metadata accompanies each URL

Custom doesn't necessarily mean hand-coded. It means configured with purpose—whether through specialized tools, platform settings, or custom development.

By contrast, automatic (or default) sitemaps are generated by a CMS, plugin, or website theme without any input from you.

Why Default Sitemaps Can Fall Short

Auto-generated sitemaps prioritize completeness over strategy. They include every URL the system knows about, which creates several problems.

Lack of Quality Filtering

Default sitemaps treat all pages equally. Your cornerstone content sits alongside:

  • Tag pages with three posts
  • Author archives for one-time contributors
  • Parameter-based duplicates
  • Placeholder pages with thin content
  • Out-of-stock products that aren't coming back

Search engines must crawl through everything to find what matters.

No Strategic Organization

Platform-generated sitemaps typically organize URLs alphabetically or by content type—not by business priority. Your best-selling products get the same treatment as discontinued items. New content that needs discovery sits alongside pages indexed years ago.

Stale or Inaccurate Metadata

Many default sitemaps set lastmod to the sitemap generation date rather than actual content modification dates. Some include changefreq and priority values that never change. These inaccurate signals reduce the sitemap's usefulness as a crawl guide.

Missing Content Types

Default generators often miss:

  • JavaScript-rendered content
  • PDF documents and downloadable resources
  • Images that deserve their own indexing
  • Video content
  • Subdomains or separate properties that should be unified

Who Needs a Custom Sitemap?

Large Ecommerce Stores

Catalogs with thousands of products need strategic sitemap organization. You want crawl budget focused on in-stock, high-margin products—not every variant and filtered view your platform generates.

Publishers and Media Sites

News sites need timely indexing of fresh content while managing archives that span years. A custom approach lets you prioritize recent articles without abandoning evergreen content.

Enterprise Sites with Multiple Properties

Organizations with subdomains, microsites, or international properties benefit from unified sitemap strategies that present a coherent picture to search engines.

Sites with Complex URL Structures

If your site has faceted navigation, multiple URL parameters, or complex routing, default sitemaps often include URLs that shouldn't be indexed. Custom sitemaps let you include only canonical, indexable URLs.

Anyone Doing Serious SEO

If organic search matters to your business, your sitemap shouldn't be an afterthought. Custom sitemaps are a foundational technical SEO practice.

How to Create a Custom Sitemap

Option 1: Configure Your Existing Platform

Many platforms offer sitemap customization if you dig into the settings.

WordPress with Yoast/Rank Math:

  • Exclude specific post types
  • Remove author archives
  • Exclude individual posts/pages
  • Adjust the number of URLs per sitemap file

Shopify:

  • Limited native options
  • Third-party apps add exclusion capabilities
  • robots.txt.liquid offers some control

Magento:

  • Extensive configuration options
  • Category and product inclusion rules
  • Scheduled generation settings

WordPress

Since version 5.5, WordPress core has generated a basic sitemap at /wp-sitemap.xml. Many site owners still use a plugin such as Yoast SEO to create their XML sitemap instead, and dozens of alternative plugins are available.

Platform configuration works for basic customization but has limits. You're constrained by what the platform developers anticipated.

Option 2: Use a Dedicated Sitemap Tool

Specialized sitemap tools offer more control than platform plugins. Sitemap.ai is purpose-built for creating optimized XML sitemaps with features that go beyond basic generation:

Intelligent URL Analysis: Rather than blindly including every URL, Sitemap.ai analyzes your site structure to identify which pages should be in your sitemap and which shouldn't, flagging thin content, duplicate pages, and redirect chains before they waste crawl budget.

Custom Segmentation: Organize URLs into logical sitemap groups based on your business priorities, not arbitrary technical limits. Separate product pages from blog content, prioritize high-value landing pages, and structure your sitemap index to reflect what matters.

Accurate Metadata: Generate lastmod values based on actual content changes, not sitemap generation timestamps. Set meaningful priority signals based on page importance rather than default values.

Ongoing Monitoring: Sitemap.ai doesn't just generate once and forget. It monitors your site for changes, identifies new URLs that should be added, flags pages that have become problematic, and keeps your sitemap current without manual intervention.

For sites where organic search drives revenue, a dedicated tool pays for itself in improved crawl efficiency and faster indexing of important content.

Option 3: Build a Custom Solution

For maximum control, build sitemap generation into your own infrastructure.

Database-driven generation: Query your product or content database directly, applying business logic to determine inclusion:

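from xml.sax.saxutils import escape

def build_sitemap_xml(rows):
    # rows: (url, updated_at) pairs; updated_at is assumed to be stored as an ISO 8601 string
    entries = "".join(
        f"  <url><loc>{escape(url)}</loc><lastmod>{updated_at}</lastmod></url>\n"
        for url, updated_at in rows
    )
    header = '<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    return header + entries + "</urlset>\n"

def generate_sitemap(conn):
    # conn: an open sqlite3 connection (an assumption; adapt to your own database driver)
    products = conn.execute("""
        SELECT url, updated_at
        FROM products
        WHERE status = 'published'
          AND stock_status = 'in_stock'
          AND page_quality_score > 0.7
        ORDER BY revenue_30d DESC
    """).fetchall()
    # Generate XML from filtered results
    return build_sitemap_xml(products)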

Crawl-based generation: Spider your own site, evaluating each page against inclusion criteria:

  • Returns 200 status
  • Has indexable robots meta
  • Canonical points to self
  • Meets content quality thresholds
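The four checks above can be sketched as a single function. This is a minimal illustration, assuming the page has already been fetched and you have its status code and HTML in hand. The names `_HeadSignals`, `is_sitemap_candidate`, and `min_chars` are hypothetical, and the length check is only a crude stand-in for whatever quality threshold you define.

```python
from html.parser import HTMLParser


class _HeadSignals(HTMLParser):
    """Collects the robots meta directive and canonical link from a page."""

    def __init__(self):
        super().__init__()
        self.robots = ""
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots = (attrs.get("content") or "").lower()
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def is_sitemap_candidate(url, status_code, html, min_chars=500):
    """Apply the four inclusion checks to one already-fetched page."""
    signals = _HeadSignals()
    signals.feed(html)
    return (
        status_code == 200                    # returns 200 status
        and "noindex" not in signals.robots   # indexable robots meta
        and signals.canonical == url          # canonical points to self (strict string match)
        and len(html) >= min_chars            # placeholder quality threshold (assumption)
    )
```

The canonical comparison is an exact string match, so a page with no canonical tag, or one that differs only by a trailing slash, is rejected. Loosen that rule if it doesn't fit your site.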

Hybrid approaches: Combine database queries for known content with crawling to discover pages that might be missed.

Custom development offers unlimited flexibility but requires ongoing maintenance. It makes sense for large organizations with dedicated technical SEO resources.

Building Your Custom Sitemap: Step by Step

Step 1: Audit Your Current Sitemap

Before building something new, understand what you have. Download your existing sitemap and analyze:

  • Total URL count — How many URLs are included?
  • URL types — What content types are represented?
  • Status codes — Do all URLs return 200?
  • Canonical alignment — Do URLs match their canonical tags?
  • Index status — How many sitemap URLs are actually indexed?

Tools like Screaming Frog can crawl your sitemap and surface these insights quickly.
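If you'd rather start with a quick script, here is a minimal sketch that covers the first two audit questions: it parses a downloaded sitemap, counts the URLs, and groups them by first path segment as a rough proxy for content type. The function name `summarize_sitemap` and the sample XML are illustrative assumptions. Status codes, canonicals, and index status still need a crawler or Search Console.

```python
from collections import Counter
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def summarize_sitemap(xml_text):
    """Return (total_urls, Counter of first path segments) for a urlset."""
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", SITEMAP_NS) if el.text]
    sections = Counter()
    for loc in locs:
        path = urlparse(loc).path.strip("/")
        # Use the first path segment as a rough proxy for content type.
        sections[path.split("/")[0] if path else "(root)"] += 1
    return len(locs), sections


SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products/widget/</loc></url>
  <url><loc>https://example.com/products/gadget/</loc></url>
  <url><loc>https://example.com/blog/hello/</loc></url>
</urlset>"""

total, by_section = summarize_sitemap(SAMPLE)
print(total, dict(by_section))
```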

Step 2: Define Your Inclusion Criteria

Decide what belongs in your sitemap. Good candidates:

  • Pages returning 200 status codes
  • Pages with self-referencing canonical tags
  • Pages not blocked by robots.txt or noindex
  • Pages with substantial, unique content
  • Pages you actually want ranking

Poor candidates:

  • Paginated URLs (often—depends on your site)
  • Filtered/faceted navigation URLs
  • Internal search results pages
  • User account and checkout pages
  • Thin tag or archive pages
  • Duplicate or near-duplicate content

Document your criteria explicitly. This becomes your sitemap policy.
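One way to keep the policy honest is to write the URL-pattern part of it as code. The sketch below is an illustration only: the excluded path prefixes and query parameters are hypothetical values that you would replace with your own, and `passes_url_policy` is a made-up name. It covers only what can be judged from the URL itself. Status codes, canonicals, and content quality need a separate check.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical policy values; replace with patterns from your own site.
EXCLUDED_PATH_PREFIXES = ("/search/", "/cart/", "/checkout/", "/account/", "/tag/")
EXCLUDED_QUERY_PARAMS = {"page", "sort", "filter", "color", "size"}


def passes_url_policy(url):
    """Return True if the URL passes the pattern-based part of the policy."""
    parts = urlparse(url)
    # Normalize so that "/search" and "/search/" are treated the same.
    path = parts.path.rstrip("/") + "/"
    if path.startswith(EXCLUDED_PATH_PREFIXES):
        return False
    # Reject URLs carrying pagination or facet parameters.
    params = parse_qs(parts.query, keep_blank_values=True)
    return not EXCLUDED_QUERY_PARAMS.intersection(params)


candidates = [
    "https://example.com/products/widget/",
    "https://example.com/products/?color=red",
    "https://example.com/search?q=widget",
]
print([u for u in candidates if passes_url_policy(u)])
```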

Step 3: Plan Your Sitemap Structure

For sites with more than a few hundred URLs, plan how you'll organize child sitemaps:

sitemap-index.xml
├── sitemap-products-bestsellers.xml
├── sitemap-products-standard.xml
├── sitemap-categories.xml
├── sitemap-blog.xml
├── sitemap-resources.xml
└── sitemap-pages.xml

Consider:

  • Content types — Products, posts, pages, videos
  • Update frequency — Daily-changing content vs. stable pages
  • Priority tiers — High-value pages vs. long-tail content
  • Size limits — Keep files under 50,000 URLs and 50MB
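To make the size limit and the index file concrete, here is a minimal sketch that splits a URL list into chunks of at most 50,000 and writes a sitemap index pointing at the child files. The function names and the example child sitemap URLs are assumptions for illustration. How you group URLs into children (by content type, priority tier, and so on) is up to your own plan.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # protocol limit per sitemap file

# Serialize the sitemap namespace as the default (unprefixed) namespace.
ET.register_namespace("", SITEMAP_NS)


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(child_sitemap_urls):
    """Return sitemap index XML (as UTF-8 bytes) listing the child sitemaps."""
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for child_url in child_sitemap_urls:
        sitemap = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = child_url
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


index_xml = build_sitemap_index([
    "https://example.com/sitemap-products-bestsellers.xml",
    "https://example.com/sitemap-blog.xml",
])
print(index_xml.decode("utf-8"))
```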

Step 4: Generate Your Sitemap

Using your chosen method (platform config, dedicated tool like Sitemap.ai, or custom code), generate your sitemap applying your inclusion criteria and structure.

Ensure each URL entry includes:

<url>
  <loc>https://example.com/products/widget/</loc>
  <lastmod>2025-01-20T14:30:00+00:00</lastmod>
</url>

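The <loc> tag is required. The <lastmod> tag is optional but valuable when accurate. The changefreq and priority tags are optional too, and Google's documentation says it ignores both, so the example above leaves them out.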
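If you're generating entries in code, a sketch along these lines produces the same shape of entry. It assumes you have each URL paired with a timezone-aware modification time, and `build_urlset` is a hypothetical name. Using an XML library rather than string concatenation means special characters such as `&` are escaped for you.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)


def build_urlset(entries):
    """entries: iterable of (url, last_modified) with timezone-aware datetimes."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, last_modified in entries:
        node = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(node, f"{{{SITEMAP_NS}}}loc").text = url
        # isoformat() on an aware datetime gives the W3C Datetime form used in sitemaps.
        lastmod = last_modified.isoformat(timespec="seconds")
        ET.SubElement(node, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


xml_bytes = build_urlset([
    ("https://example.com/products/widget/",
     datetime(2025, 1, 20, 14, 30, tzinfo=timezone.utc)),
])
print(xml_bytes.decode("utf-8"))
```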

Step 5: Validate Before Deploying

Check your sitemap for errors:

XML validity:

  • Proper encoding declaration
  • Correct namespace declarations
  • All tags properly closed
  • Special characters escaped

Content validity:

  • All URLs accessible (200 status)
  • All URLs match their canonicals
  • No URLs blocked by robots.txt
  • No redirect URLs included

Size compliance:

  • Under 50,000 URLs per file
  • Under 50MB uncompressed per file
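The XML and size checks are easy to automate. The following is a minimal sketch, not a full validator: `validate_sitemap` is a hypothetical name, and it checks only well-formedness, the root element and namespace, the URL count, the file size, and that each <loc> is an absolute URL. The content checks (status codes, canonicals, redirects) require fetching each page and are not covered here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB uncompressed


def validate_sitemap(xml_bytes):
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50MB uncompressed")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        # Covers unclosed tags and unescaped characters such as a bare '&'.
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append("root element is not <urlset> in the sitemap namespace")
    locs = root.findall(f"{{{SITEMAP_NS}}}url/{{{SITEMAP_NS}}}loc")
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs exceeds the {MAX_URLS} limit")
    for loc in locs:
        if not (loc.text or "").strip().startswith(("http://", "https://")):
            problems.append(f"loc is not an absolute URL: {loc.text!r}")
    return problems
```

Run it against the generated file's raw bytes before uploading, and treat any non-empty result as a reason to stop the deploy.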

Step 6: Deploy and Submit

Upload your sitemap to your site root or a /sitemaps/ directory. Reference it in robots.txt:

Sitemap: https://example.com/sitemap-index.xml

Submit through Google Search Console and Bing Webmaster Tools. Monitor processing status over the following days.
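After deploying, it's worth confirming that robots.txt really declares the sitemap. A small sketch using Python's standard `urllib.robotparser` module (its `site_maps()` method requires Python 3.8 or later) is shown below. The helper name `declared_sitemaps` is an assumption, and the robots.txt body is passed in as text so you can fetch it however you like.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


robots_txt = """User-agent: *
Disallow: /cart/

Sitemap: https://example.com/sitemap-index.xml
"""
print(declared_sitemaps(robots_txt))
```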

Step 7: Establish Update Processes

A sitemap generated once and forgotten loses value quickly. Establish processes to:

  • Regenerate when content changes
  • Add new URLs as content is published
  • Remove URLs when content is deleted or redirected
  • Update lastmod when pages are modified
  • Review and adjust inclusion criteria periodically

This is where tools like Sitemap.ai add significant value—automating the ongoing maintenance that custom sitemaps require.
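If you maintain the sitemap yourself, the add and remove steps come down to comparing the URLs in the last published sitemap with the URLs that currently meet your inclusion criteria. A minimal sketch, with a hypothetical `diff_sitemap` helper:

```python
def diff_sitemap(previous_urls, current_urls):
    """Compare two URL collections and report what to add and remove."""
    previous, current = set(previous_urls), set(current_urls)
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


changes = diff_sitemap(
    previous_urls=["https://example.com/a/", "https://example.com/b/"],
    current_urls=["https://example.com/b/", "https://example.com/c/"],
)
print(changes)
```

Logging this diff on every regeneration also gives you an audit trail when a large batch of URLs appears or disappears unexpectedly.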

Custom Sitemap Best Practices

Keep Sitemaps Focused

Resist the urge to include everything "just in case." A set of sitemaps with 50,000 carefully chosen URLs will generally serve you better than one with 500,000 URLs padded with junk. Search engines have finite crawl resources, so help them spend those resources on your best content.

Align with Canonicals

Every URL in your sitemap should be the canonical version. If you're unsure whether a URL should be in your sitemap, check its canonical tag. If the canonical points elsewhere, exclude it.

Use Accurate Lastmod Values

Only update lastmod when content actually changes meaningfully. A typo fix doesn't warrant a new timestamp. A price change or content refresh does. Artificially inflated lastmod values train search engines to ignore your freshness signals.
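One way to approximate "meaningful change" is to fingerprint only the fields you care about and bump lastmod when that fingerprint changes. The sketch below assumes a hypothetical record shape (a dict with `content_hash` and `lastmod` keys) and leaves the choice of significant fields to you. It cannot tell a typo fix from a real rewrite inside a field you include, so that judgment still sits in how you pick the fields.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(significant_fields):
    """Hash only the fields whose changes should count as a real update."""
    payload = json.dumps(significant_fields, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def refresh_lastmod(record, significant_fields, now=None):
    """Update record['lastmod'] only when the fingerprint has changed."""
    new_hash = fingerprint(significant_fields)
    if new_hash != record.get("content_hash"):
        record["content_hash"] = new_hash
        stamp = now or datetime.now(timezone.utc)
        record["lastmod"] = stamp.isoformat(timespec="seconds")
    return record
```

For a product page, the significant fields might be price and stock status, so a price change updates lastmod while an edit to an untracked field leaves it alone.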

Segment Strategically

Your sitemap structure should reflect your indexing priorities. Put your most important content in dedicated sitemaps that update frequently. Relegate stable, less critical content to sitemaps that update weekly or monthly.

Monitor Index Coverage

After implementing custom sitemaps, track the impact:

  • Are more of your sitemap URLs getting indexed?
  • Is important content being discovered faster?
  • Are low-value pages being excluded from the index?
  • Has crawl efficiency improved?

Google Search Console's Page indexing report (previously named Index Coverage) shows how your sitemap URLs are being processed.

Document Your Decisions

Maintain documentation of your sitemap strategy:

  • Inclusion/exclusion criteria
  • Sitemap structure rationale
  • Update schedules
  • Responsible parties

This prevents knowledge loss when team members change and ensures consistency over time.

Common Custom Sitemap Mistakes

Including Non-Canonical URLs

If page A canonicals to page B, only page B belongs in your sitemap. Including both wastes crawl budget and sends mixed signals.

Including Blocked URLs

URLs blocked by robots.txt or marked noindex shouldn't be in your sitemap. Listing them sends contradictory signals: the sitemap says "please index this" while robots.txt or the noindex tag says "stay away."
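The robots.txt half of this check can be scripted with Python's standard `urllib.robotparser`. This is a rough sketch with a hypothetical helper name. Note that the standard-library parser follows the original robots exclusion rules and does not implement the wildcard matching Google supports, so treat its answer as a first pass rather than the final word. It also says nothing about noindex tags, which require fetching each page.

```python
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt, sitemap_urls, user_agent="Googlebot"):
    """Return the sitemap URLs that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


robots_rules = "User-agent: *\nDisallow: /cart/\n"
blocked = blocked_sitemap_urls(robots_rules, [
    "https://example.com/products/widget/",
    "https://example.com/cart/view",
])
print(blocked)
```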

Forgetting to Update

A custom sitemap that's six months stale can be worse than a basic auto-generated one. If you can't commit to maintenance, use a tool that handles updates automatically and integrates directly with your website, or reach out to us for help setting one up.

Over-Segmentation

Splitting your sitemap into 50 files when 5 would suffice adds complexity without benefit. Segment enough to be useful, not so much that it's unmanageable.

Ignoring Errors

When Google Search Console reports sitemap errors, fix them promptly. Persistent errors—404s, redirect loops, blocked URLs—degrade your sitemap's credibility.

Measuring Custom Sitemap Success

Track these metrics to evaluate your custom sitemap's performance:

Indexed/Submitted ratio: What percentage of your sitemap URLs are actually indexed? Higher is generally better, indicating you're submitting quality URLs.

Crawl stats: Are Googlebot requests increasing for important content? Are crawl errors decreasing?

Time to index: How quickly does new content appear in search results after publication? Custom sitemaps should accelerate discovery.

Index coverage trends: Is the number of indexed pages growing appropriately as you publish new content?

Organic traffic: Ultimately, are more pages receiving organic traffic? Better indexing should translate to more entry points from search.
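The indexed/submitted ratio is simple arithmetic, but computing it per child sitemap makes weak segments easy to spot. In the sketch below, the sitemap names and counts are made-up illustrative numbers, not real data, and `indexed_ratio` is a hypothetical helper. You would fill in the counts from Search Console.

```python
def indexed_ratio(submitted, indexed):
    """Percentage of submitted sitemap URLs that are indexed."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return round(100 * indexed / submitted, 1)


# Illustrative (submitted, indexed) counts per child sitemap; not real data.
counts = {
    "sitemap-products-bestsellers.xml": (1200, 1140),
    "sitemap-blog.xml": (800, 520),
}
for name, (submitted, indexed) in counts.items():
    print(f"{name}: {indexed_ratio(submitted, indexed)}% indexed")
```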