What are the Main Parts of an XML Sitemap?

An XML sitemap is exactly what the name suggests: a map of your website. It tells search engines which pages on your website exist and provides useful metadata about each URL. While search engines can discover pages through crawling, a sitemap helps ensure nothing important gets missed, particularly on large sites, new sites, or pages with limited internal linking.

The Basic Structure of an XML Sitemap

Every XML sitemap follows a standardized format defined by the sitemaps.org protocol. Here's a simple example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page/</loc>
    <lastmod>2025-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

Let's examine each component.

The XML Declaration

<?xml version="1.0" encoding="UTF-8"?>

This line appears at the very top of the sitemap, before any other content. It declares the document as XML version 1.0 and specifies UTF-8 character encoding, which supports international characters and special symbols. The XML specification technically makes the declaration optional, but the sitemap protocol requires the file to be UTF-8 encoded, so you should always include it.

The Urlset Element

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

The <urlset> tag is the container element that wraps all URLs in your sitemap. The xmlns attribute declares the XML namespace of the sitemap protocol. The value is an identifier rather than a page that parsers fetch, so it must match exactly, including http:// rather than https://. This namespace declaration tells parsers (including search engine crawlers) that the document follows the sitemap standard.
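To see why the namespace matters in practice, here is a minimal sketch of reading a sitemap with Python's standard xml.etree.ElementTree module. The function name extract_locs and the SAMPLE document are made up for illustration. A namespace-aware parser only finds <url> and <loc> when they are looked up in the sitemap namespace.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def extract_locs(sitemap_xml: bytes) -> list[str]:
    """Return every <loc> value found inside <url> entries (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    # ElementTree stores namespaced tags as "{namespace}localname".
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        raise ValueError(f"unexpected root element: {root.tag}")
    ns = {"sm": SITEMAP_NS}  # "sm" is an arbitrary prefix used only for lookups
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", ns)
        if loc.text
    ]


# Illustrative input only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page/</loc>
  </url>
</urlset>"""

if __name__ == "__main__":
    print(extract_locs(SAMPLE))
```

If the xmlns attribute is missing or misspelled, the root tag no longer matches and this sketch raises an error, which is roughly how a strict consumer would treat the file.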

If you're using sitemap extensions for images, videos, or news, you'll add additional namespace declarations here:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">

The URL Element

<url>
  ...
</url>

Each <url> tag represents a single page on your website. A sitemap can contain up to 50,000 URL entries, and the uncompressed file size cannot exceed 50MB. If your site has more URLs or the file would be larger, you'll need to split it into multiple sitemaps and reference them through a sitemap index file.
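The URL limit translates into a simple batching step. The following is a rough sketch, assuming your URLs are already collected in a Python list. The name chunk_urls is hypothetical, and the sketch does not check the 50MB size limit, which you would need to verify separately after writing each file.

```python
from typing import Iterator

MAX_URLS_PER_SITEMAP = 50_000  # limit from the sitemap protocol


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` URLs, one slice per sitemap file."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


if __name__ == "__main__":
    # Made-up URLs purely to demonstrate the batching.
    all_urls = [f"https://example.com/item/{i}/" for i in range(120_001)]
    for number, batch in enumerate(chunk_urls(all_urls), start=1):
        print(f"sitemap-{number}.xml would hold {len(batch)} URLs")
```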

Required Tag: loc

<loc>https://example.com/products/widget/</loc>

The <loc> tag is the only required element within each URL entry. It contains the full, absolute URL of the page, including the protocol (https://). A few important considerations:

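URLs must be properly encoded, and there are two separate layers to get right. First, the URL itself must be percent-encoded: spaces become %20 and non-ASCII characters become their UTF-8 percent-encoded form. Second, because the URL sits inside an XML document, characters that are special in XML must be entity-escaped: an ampersand becomes &amp;, and the same applies to <, >, and quote characters. Most sitemap generators handle both steps automatically.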
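The two kinds of encoding involved here (percent-encoding for the URL, entity escaping for XML) can be sketched with Python's standard library. The helper name encode_loc is hypothetical. This version assumes a % already in the input belongs to an existing escape sequence, and it does not handle internationalized domain names.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from xml.sax.saxutils import escape


def encode_loc(url: str) -> str:
    """Percent-encode a URL, then escape it for use as XML text (hypothetical helper)."""
    parts = urlsplit(url)
    # Step 1: percent-encode path and query. "%" is kept as-is on the
    # assumption that existing escapes should not be encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    percent_encoded = urlunsplit(
        (parts.scheme, parts.netloc, path, query, parts.fragment)
    )
    # Step 2: XML-escape the result, turning & into &amp; (and < > likewise).
    return escape(percent_encoded)


if __name__ == "__main__":
    print(encode_loc("https://example.com/my page/?a=1&b=2"))
```

Note that step 2 is only needed when you write the XML by hand or with string templates. An XML library that builds the document for you will normally escape text content itself, and escaping twice would corrupt the URL.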

Be consistent with your URL format. If your site uses trailing slashes, include them. If it doesn't, leave them off. The URL in your sitemap should match your canonical URL exactly.

Only include URLs that return a 200 status code. Don't add redirects, 404 pages, or URLs blocked by robots.txt.

Optional Tag: lastmod

<lastmod>2025-01-15T14:30:00+00:00</lastmod>

The <lastmod> tag indicates when the page was last modified. Search engines use this as a hint for crawl prioritization—pages with recent modifications may get crawled sooner.

The date format follows the W3C Datetime standard. You can use varying levels of precision:

  • Year only: 2025
  • Year and month: 2025-01
  • Complete date: 2025-01-15
  • Date with time: 2025-01-15T14:30:00+00:00

A full timestamp gives search engines the most precise information, but a plain date is perfectly valid if your system doesn't track modification times. Whichever precision you choose, make sure the value is accurate.
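Producing these formats from Python date objects is straightforward. Here is a small sketch using only the standard library. The name format_lastmod is hypothetical, and treating timezone-naive values as UTC is an assumption you should adjust to match how your system stores times.

```python
from datetime import date, datetime, timezone
from typing import Union


def format_lastmod(value: Union[date, datetime]) -> str:
    """Format a date or datetime as a W3C Datetime string for <lastmod>."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive datetimes are stored in UTC.
            value = value.replace(tzinfo=timezone.utc)
        return value.isoformat(timespec="seconds")
    return value.isoformat()


if __name__ == "__main__":
    print(format_lastmod(date(2025, 1, 15)))
    print(format_lastmod(datetime(2025, 1, 15, 14, 30, tzinfo=timezone.utc)))
```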

One critical point: only update the lastmod value when the page content actually changes in a meaningful way. Some sites automatically update this timestamp whenever any minor change occurs, or worse, set it to the current date for all pages. This practice erodes trust in your lastmod data, and search engines may begin ignoring it entirely.
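One way to keep lastmod honest is to tie it to a fingerprint of the page's main content instead of to deploy time. The sketch below is one possible approach, not a standard technique from the sitemap protocol. It assumes you can extract the main content as text (without templates, sidebars, or ads) and that you store the previous hash and timestamp somewhere. The function names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional, Tuple


def content_fingerprint(main_content: str) -> str:
    """Hash the main content, ignoring differences in whitespace only."""
    normalized = " ".join(main_content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def updated_lastmod(
    stored_hash: Optional[str],
    stored_lastmod: Optional[str],
    main_content: str,
    now: Optional[datetime] = None,
) -> Tuple[str, Optional[str]]:
    """Return (hash, lastmod); lastmod changes only when the content hash changes."""
    new_hash = content_fingerprint(main_content)
    if new_hash == stored_hash:
        return new_hash, stored_lastmod  # nothing meaningful changed
    now = now or datetime.now(timezone.utc)
    return new_hash, now.isoformat(timespec="seconds")
```

What counts as a meaningful change is still a judgment call. A hash treats a one-word typo fix the same as a full rewrite, so you may want a coarser rule for your own site.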

Optional Tag: changefreq

<changefreq>weekly</changefreq>

The <changefreq> tag provides a hint about how often the page typically changes. Valid values include:

  • always — for pages that change every time they're accessed
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never — for archived pages that won't change

In practice, search engines largely ignore this tag. Google's sitemap documentation states that Google ignores <changefreq> values, preferring to determine crawl frequency from its own observations of how often your content actually changes. You can include the tag for completeness, but don't expect it to influence crawling behavior.

Optional Tag: priority

<priority>0.8</priority>

The <priority> tag suggests the relative importance of a page compared to other pages on your site. Values range from 0.0 (lowest importance) to 1.0 (highest importance), with 0.5 as the default.

This value only affects how search engines prioritize URLs within your own site—it doesn't influence how your pages rank against competitors. Setting all pages to 1.0 is meaningless because you're not actually differentiating anything.

Like changefreq, priority receives little attention from major search engines. Google has confirmed they don't use this signal. If you choose to include it, use it logically: your homepage and main category pages might warrant higher values, while pagination pages or less important content could be lower.
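Putting the four tags together, here is a minimal sketch of building a sitemap with Python's xml.etree.ElementTree. The function names build_url_entry and build_urlset are hypothetical. The sketch validates changefreq and priority against the rules described above and leaves out any optional tag you don't supply.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def build_url_entry(
    loc: str,
    lastmod: Optional[str] = None,
    changefreq: Optional[str] = None,
    priority: Optional[float] = None,
) -> ET.Element:
    """Build one <url> element; only <loc> is required."""
    url = ET.Element("url")
    # ElementTree escapes & and < in text itself, so pass a percent-encoded
    # URL here, not one that is already XML-escaped.
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:
        ET.SubElement(url, "lastmod").text = lastmod
    if changefreq is not None:
        if changefreq not in VALID_CHANGEFREQ:
            raise ValueError(f"invalid changefreq: {changefreq!r}")
        ET.SubElement(url, "changefreq").text = changefreq
    if priority is not None:
        if not 0.0 <= priority <= 1.0:
            raise ValueError(f"priority out of range: {priority!r}")
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return url


def build_urlset(entries: Iterable[ET.Element]) -> bytes:
    """Wrap <url> elements in a <urlset> and serialize with an XML declaration."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    urlset.extend(entries)
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    entry = build_url_entry(
        "https://example.com/page/",
        lastmod="2025-01-15",
        changefreq="weekly",
        priority=0.8,
    )
    print(build_urlset([entry]).decode("utf-8"))
```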

Sitemap Index Files

When your site exceeds 50,000 URLs or 50MB, you need a sitemap index file to organize multiple sitemaps:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2025-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2025-01-10</lastmod>
  </sitemap>
</sitemapindex>

The <sitemapindex> element replaces <urlset> as the root container. Each <sitemap> entry contains a <loc> pointing to an individual sitemap file and an optional <lastmod> indicating when that sitemap was last updated.

A sitemap index file can reference up to 50,000 individual sitemaps. At 50,000 URLs per sitemap, a single index has capacity for 2.5 billion URLs.
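A sitemap index can be generated much like a regular sitemap. The following sketch assumes you already know each child sitemap's URL and, optionally, its last-modified date. The name build_sitemap_index is hypothetical. The constant at the end spells out the capacity arithmetic.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_SITEMAPS_PER_INDEX = 50_000
MAX_URLS_PER_SITEMAP = 50_000


def build_sitemap_index(sitemaps: List[Tuple[str, Optional[str]]]) -> bytes:
    """Serialize (loc, lastmod) pairs as a <sitemapindex> document."""
    if len(sitemaps) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("a sitemap index may list at most 50,000 sitemaps")
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod:  # <lastmod> is optional
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# 50,000 sitemaps times 50,000 URLs each = 2,500,000,000 URLs per index.
MAX_URLS_PER_INDEX = MAX_SITEMAPS_PER_INDEX * MAX_URLS_PER_SITEMAP

if __name__ == "__main__":
    xml_bytes = build_sitemap_index([
        ("https://example.com/sitemap-products.xml", "2025-01-15"),
        ("https://example.com/sitemap-blog.xml", None),
    ])
    print(xml_bytes.decode("utf-8"))
```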

Image Sitemap Extension

For sites with important images, the image extension provides additional context:

<url>
  <loc>https://example.com/products/widget/</loc>
  <image:image>
    <image:loc>https://example.com/images/widget-main.jpg</image:loc>
    <image:title>Blue Widget - Front View</image:title>
    <image:caption>Our premium blue widget shown from the front angle</image:caption>
  </image:image>
</url>

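The <image:loc> tag is required and contains the image URL. Optional tags include <image:title> for the image title and <image:caption> for descriptive text, although Google announced in 2022 that it no longer uses these two tags, so they may have little effect there. You can include up to 1,000 images per page entry.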
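Extension tags live in their own namespace, which makes them slightly fiddlier to generate than plain sitemap tags. Here is a sketch with xml.etree.ElementTree that emits only the required <image:loc> tag. The function name build_image_url_entry is hypothetical, and the 1,000-image check mirrors the limit mentioned above.

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
MAX_IMAGES_PER_URL = 1_000

# Tell ElementTree which prefixes to use when serializing. This is global
# state for the module, so register once at import time.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_url_entry(page_url: str, image_urls: List[str]) -> ET.Element:
    """Build a <url> element with one <image:image> child per image URL."""
    if len(image_urls) > MAX_IMAGES_PER_URL:
        raise ValueError("too many images for a single <url> entry")
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    for image_url in image_urls:
        image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return url


if __name__ == "__main__":
    entry = build_image_url_entry(
        "https://example.com/products/widget/",
        ["https://example.com/images/widget-main.jpg"],
    )
    print(ET.tostring(entry, encoding="unicode"))
```

Serialized on its own, the entry carries its namespace declarations itself. In a full sitemap you would append entries like this to a namespaced <urlset> root, and the declarations would appear there instead.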

Video Sitemap Extension

Video sitemaps help search engines understand and index video content:

<url>
  <loc>https://example.com/videos/tutorial/</loc>
  <video:video>
    <video:thumbnail_loc>https://example.com/thumbs/tutorial.jpg</video:thumbnail_loc>
    <video:title>How to Install Your Widget</video:title>
    <video:description>Step-by-step installation guide for our widget product.</video:description>
    <video:content_loc>https://example.com/videos/tutorial.mp4</video:content_loc>
    <video:duration>324</video:duration>
  </video:video>
</url>

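For each video, Google requires a thumbnail, a title, a description, and either <video:content_loc> or <video:player_loc>. The optional <video:duration> tag gives the length in seconds, so 324 represents a video of 5 minutes and 24 seconds.

Sitemap Best Practices

Only include canonical, indexable URLs. If a page has a canonical tag pointing elsewhere, include the canonical URL, not the duplicate. Exclude pages with noindex tags, pages blocked by robots.txt, and redirect URLs.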

For large e-commerce sites, consider organizing sitemaps by content type—one for products, one for categories, one for blog posts. This makes it easier to identify crawling issues and understand which sections of your site are being indexed.
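As a rough illustration of splitting by content type, the sketch below groups URLs by their first path segment. That is only an assumption about how your site is organized: many sites would need to group by template or database type instead. The function name and the sitemap-{section}.xml naming scheme are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlsplit


def group_by_section(urls: List[str]) -> Dict[str, List[str]]:
    """Map a sitemap filename to the URLs whose first path segment matches it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        # Assumption: the first path segment identifies the content type.
        section = segments[0] if segments else "root"
        groups[f"sitemap-{section}.xml"].append(url)
    return dict(groups)


if __name__ == "__main__":
    example_urls = [
        "https://example.com/",
        "https://example.com/products/widget/",
        "https://example.com/products/gadget/",
        "https://example.com/blog/launch-announcement/",
    ]
    for filename, members in group_by_section(example_urls).items():
        print(filename, len(members))
```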

How to Validate Your Sitemap

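Before submitting your sitemap, validate it to catch any formatting errors. Google Search Console will flag issues when you submit, but you can also check the file beforehand with a standalone XML validator, ideally one that validates against the XSD schema published at sitemaps.org. A basic well-formedness check catches syntax errors such as unclosed tags, while schema validation also catches misplaced or misspelled elements.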

Common errors include missing namespace declarations, improperly encoded URLs, invalid date formats, and exceeding size or URL count limits. Most CMS platforms and SEO plugins generate valid sitemaps automatically, but custom implementations should always be tested.
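Several of these errors can be caught with a short script before you submit anything. The following is a sketch of such a check using Python's standard library, not a replacement for schema validation. The function name check_sitemap is hypothetical, and the date pattern only checks the shape of a W3C Datetime value, not whether the date exists on the calendar.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, uncompressed

# Accepts YYYY, YYYY-MM, YYYY-MM-DD, or a date with time and timezone.
W3C_DATETIME = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$"
)


def check_sitemap(data: bytes) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    if len(data) > MAX_BYTES:
        problems.append("file exceeds 50MB uncompressed")
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        # Unescaped ampersands and unclosed tags end up here.
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return problems + ["root element is not <urlset> in the sitemap namespace"]
    ns = {"sm": SITEMAP_NS}
    urls = root.findall("sm:url", ns)
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the 50,000 limit")
    for number, url in enumerate(urls, start=1):
        loc = url.findtext("sm:loc", default="", namespaces=ns).strip()
        if not loc.startswith(("http://", "https://")):
            problems.append(f"entry {number}: missing or non-absolute <loc>")
        lastmod = url.findtext("sm:lastmod", namespaces=ns)
        if lastmod is not None and not W3C_DATETIME.match(lastmod.strip()):
            problems.append(f"entry {number}: invalid <lastmod> {lastmod!r}")
    return problems
```

You could run this against the raw bytes of the file, for example check_sitemap(open("sitemap.xml", "rb").read()), where the filename is only a placeholder.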