An XML sitemap is a file that lists the URLs of a site. It can also carry secondary information for each URL, such as when the page was last updated, how often it changes, and its importance relative to other pages on the site. With all its attributes, the entry for a single URL in the XML sitemap would be:
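<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- illustrative URL and values -->
  <url>
    <loc>https://airline.com/en-us/flights-to-paris</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>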
Keep in mind that hreflang links could also be included in the XML sitemap, as we discussed in the multilingual SEO section.
The XML file is typically located at https://airline.com/sitemap.xml or https://airline.com/sitemap_index.xml. However, a sitemap can be placed anywhere on a site, keeping in mind that it will then cover only URLs in its own directory and the directories below it.
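The directory rule can be checked programmatically before a URL is written to a sitemap. The following is a minimal Python sketch, assuming the scope is simply "same scheme and host, and a path under the sitemap's own directory"; the function name in_sitemap_scope and the example paths are hypothetical.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url: str, page_url: str) -> bool:
    """Return True if page_url sits in the sitemap's directory or below it."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # A sitemap only covers URLs on the same scheme and host.
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    # "/en-us/sitemap.xml" -> "/en-us/", "/sitemap.xml" -> "/"
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    return page.path.startswith(base_dir)
```

With this check, a sitemap at https://airline.com/en-us/sitemap.xml would accept https://airline.com/en-us/flights-to-paris but reject https://airline.com/es-mx/vuelos-a-paris, while a sitemap at the site root would accept both.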
XML sitemaps are especially important for large websites because they help search engines discover, crawl, understand, and index their pages. In fact, Google has said that it receives XML sitemaps “in the form of an energy drink”.
As with everything in SEO, there are certain guidelines for handling XML sitemaps.
1. Create an XML sitemap for each localized site edition
Create a parent XML sitemap that lists a sub-sitemap for each localized site version. This will improve the discoverability and crawlability of the pages targeting each language/region. Additionally, you will be able to submit each sub-sitemap to the Google Search Console property associated with its language/region.
2. Break up large sitemaps into smaller sitemaps
A sitemap should be no larger than 50MB uncompressed and can contain a maximum of 50,000 URLs. Therefore, for very large airline sites, you will have to split any sitemap that exceeds these limits into smaller sitemaps. You should also list the smaller sitemaps in a parent XML sitemap, often named sitemap_index.xml.
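The split-and-index step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production generator: it enforces only the 50,000-URL limit (not the 50MB size limit), and the function names and child sitemap URLs are hypothetical.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of at most `size` URLs, one slice per sitemap."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing the child sitemaps."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    # The declaration is written by hand so the encoding label is always UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example: 120,000 flight pages end up in three child sitemaps.
    urls = [f"https://airline.com/flights/route-{i}" for i in range(120_000)]
    chunks = list(chunk_urls(urls))
    children = [f"https://airline.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)]
    print(build_sitemap_index(children))
```

The same index structure works for localized editions: each language/region sub-sitemap becomes one entry in the parent file.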
3. Only include pages with SEO value
Google will crawl the URLs exactly as listed in the sitemap. Therefore, the XML sitemap should only include pages that you want search engines to serve in search results. But not all pages on an airline’s website have SEO value. In fact, on a typical airline’s website, there are several types of pages that definitely should not be included in the XML sitemap. By including only relevant pages with SEO value, you are telling search engines to prioritize those pages over the excluded pages.
This advice is more than a rule of thumb. A messy sitemap can negatively impact the SEO performance of a large website. After all, everything listed in a sitemap will eventually be picked up by Google, so it had better be good!
Here are the pages that should be included in the XML sitemap:
- Canonical URLs. Canonicalized URLs (variants with parameters, session IDs, etc. that point to another URL as their canonical) should be excluded from the XML sitemap to reduce duplicate crawling.
- URLs with a consistent protocol. Do not mix HTTP and HTTPS pages in the sitemap. Be consistent!
- URLs with a 200 response code. Why would you want search engines to constantly crawl redirected or broken pages? That’s a waste of crawl budget, which on a very large website could become a critical issue. Therefore, avoid including 3XX, 4XX, and 5XX pages in the sitemap.
- Indexable and unblocked pages. If a page has a noindex meta robots tag or has been blocked in the robots.txt file, it’s because that page is not intended to be served in search. Accordingly, it should be excluded from the XML sitemap.
The airTRFX sitemaps, by default, exclude all these pages dynamically.
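For sites that build their own sitemaps, the four inclusion rules above can be expressed as a single filter. The Python sketch below is only an illustration and does not describe how airTRFX implements it; the Page record and its field names are hypothetical and would normally be filled from a crawl or from the CMS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    url: str
    status_code: int
    canonical_url: Optional[str] = None  # None: no canonical tag pointing elsewhere
    noindex: bool = False                # page carries a noindex meta robots tag
    blocked_by_robots: bool = False      # page is disallowed in robots.txt


def belongs_in_sitemap(page: Page, scheme: str = "https") -> bool:
    """Apply the inclusion rules: 200 status, canonical, indexable, one protocol."""
    if page.status_code != 200:
        return False  # skip 3XX, 4XX, and 5XX responses
    if page.canonical_url is not None and page.canonical_url != page.url:
        return False  # canonicalized to another URL
    if page.noindex or page.blocked_by_robots:
        return False  # not intended to be served in search
    return page.url.startswith(f"{scheme}://")  # keep a single, consistent protocol


pages = [
    Page("https://airline.com/en-us/flights-to-paris", 200),
    Page("https://airline.com/en-us/flights-to-paris?sessionid=123", 200,
         canonical_url="https://airline.com/en-us/flights-to-paris"),
    Page("https://airline.com/en-us/old-offer", 301),
    Page("https://airline.com/en-us/checkout", 200, noindex=True),
]
sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
```

In this example only the first page passes all four checks.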
4. Make the sitemap dynamic
Static sitemaps are not recommended for airlines’ websites. Airlines constantly add and remove flight pages and frequently update the content of existing ones, so maintaining static XML sitemaps by hand quickly becomes a nightmare. This is why we have never met anyone at an airline whose job is to update static XML sitemaps.
If the website CMS can’t generate dynamic XML sitemaps, make sure to involve the IT or Development teams. Alternatively, there are plenty of dynamic sitemap generators available that produce sitemaps in the standard format supported by the major search engines.
5. Use the <lastmod> tag
The <lastmod> tag indicates the date and time a URL was last modified. Google ignores the <changefreq> and <priority> tags, which makes <lastmod> (“last modified”) the most important piece of ancillary information. It’s recommended to update the last modified value dynamically whenever a meaningful change to a URL occurs.
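A dynamic sitemap can fill in <lastmod> from the timestamp of the last meaningful content change. Here is a minimal Python sketch, assuming the CMS exposes that timestamp as a timezone-aware datetime; the function name url_entry and the example URL are hypothetical.

```python
from datetime import datetime, timezone
from xml.etree import ElementTree as ET


def url_entry(loc: str, last_modified: datetime) -> str:
    """Return a <url> element with <loc> and a W3C datetime <lastmod> in UTC."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    # Expects a timezone-aware datetime; a naive one would be read as local time.
    utc_time = last_modified.astimezone(timezone.utc)
    ET.SubElement(url, "lastmod").text = utc_time.isoformat(timespec="seconds")
    return ET.tostring(url, encoding="unicode")


entry = url_entry(
    "https://airline.com/en-us/flights-to-paris",
    datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc),
)
```

The value should be refreshed only when the page content meaningfully changes, not on every sitemap build.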
6. Avoid URLs with non-ASCII characters
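An XML sitemap can contain only URLs with ASCII characters, so any non-ASCII characters in a URL must be percent-encoded first. If the URL of the sitemap itself contains non-ASCII characters, there will be an error when trying to add it.
There are also some characters that require entity escaping: ampersand (&amp;), single quote (&apos;), double quote (&quot;), greater than (&gt;), and less than (&lt;).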
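Both steps can be automated with Python’s standard library. The sketch below is illustrative only: the helper names are hypothetical, and the set of characters left untouched by percent-encoding is an assumption that keeps existing URL delimiters and already-encoded sequences intact.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# Entities that escape() does not handle by default (it covers &, <, and >).
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def entity_escape(text: str) -> str:
    """Escape the five characters that need XML entities."""
    return escape(text, QUOTE_ENTITIES)


def sitemap_safe_url(url: str) -> str:
    """Percent-encode non-ASCII characters, then entity-escape the result."""
    # Assumption: keep URL delimiters and existing %XX sequences untouched.
    ascii_url = quote(url, safe=":/?#&=%")
    return entity_escape(ascii_url)


safe_url = sitemap_safe_url("https://airline.com/vuelos-a-málaga?from=NYC&to=AGP")
```

Here the "á" becomes %C3%A1 and the ampersand becomes &amp;, so the value can be placed directly inside a <loc> tag.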
7. Submit the sitemaps to Google Search Console
Whenever an XML sitemap is new or has been significantly updated, submit it in Google Search Console. This will help Google catch up with the changes much faster.
You can use the Sitemaps report in Google Search Console for this purpose.
Keep in mind that you will need Owner permission in Google Search Console to use this tool.
If the XML sitemap is properly formatted, you should see the “Success” status almost immediately. Otherwise, the status will show “Has errors” or “Couldn’t fetch”. For a full list of potential XML errors, check out Google’s guide to the Sitemaps report in Google Search Console.
8. Specify the sitemap location in the robots.txt file
Doing so helps search engines discover the location of the XML sitemap. Add the following line to the robots.txt file, including the full sitemap URL:
Sitemap: https://airline.com/sitemap_index.xml
If there are multiple sitemaps, adding the parent XML sitemap is enough.
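To confirm that the directive is picked up, the robots.txt content can be parsed with Python’s standard urllib.robotparser module (the site_maps() method requires Python 3.8 or later). This is a minimal sketch; the robots.txt content and the /booking/ rule are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_declared(robots_txt: str) -> list:
    """Return the sitemap URLs declared in the given robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /booking/
Sitemap: https://airline.com/sitemap_index.xml
"""

declared = sitemaps_declared(ROBOTS_TXT)
```

An empty list from this check would mean the Sitemap line is missing or malformed.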