The canonical URL is the preferred URL in a set of variations. It’s the URL that you want Google to serve in search.
The same page can be reachable through multiple URL variations, depending on factors such as http vs. https, www vs. non-www, URL parameters, etc. Below are examples of possible variations of a homepage:
Suppose the preferred URL is A. B and C are variations of it, so both B and C should carry a canonical tag pointing to A.
A canonical tag is an HTML tag placed in the page’s <head>. Installed on C in the example above, it would look like this: <link rel="canonical" href="https://www.airline.com" />
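To make the idea concrete, here is a minimal Python sketch that maps URL variations of the homepage to a single preferred form and emits the matching canonical tag. It assumes, purely for illustration, that the preferred form uses HTTPS, the www host, and no query string; the function names and the `PREFERRED_*` settings are hypothetical. Dropping every query parameter is a simplification: parameters that actually change page content would need their own rule.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical policy: the preferred URL is HTTPS, on the www host,
# with query parameters and fragments dropped.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.airline.com"
HOST_ALIASES = {"airline.com", "www.airline.com"}


def canonical_for(url: str) -> str:
    """Map one URL variation to its preferred (canonical) form."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host in HOST_ALIASES:
        # www and non-www variations collapse to one host.
        host = PREFERRED_HOST
    # Treat "/" and "" as the same homepage path.
    path = "" if parts.path in ("", "/") else parts.path
    return urlunsplit((PREFERRED_SCHEME, host, path, "", ""))


def canonical_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the page at `url`."""
    href = escape(canonical_for(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


if __name__ == "__main__":
    variations = [
        "https://www.airline.com",
        "http://airline.com/",
        "https://airline.com/?utm_source=newsletter",
    ]
    for variation in variations:
        print(variation, "->", canonical_tag(variation))
```

Every variation in the usage list resolves to the same tag, which is the point of canonicalization: one preferred URL, no matter how the page was reached.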
By properly canonicalizing variations of the same URL, you consolidate duplicate content under one preferred URL. According to Google, “if you don’t explicitly tell Google which URL is canonical, Google will make the choice for you, or might consider them both of equal weight, which might lead to unwanted behavior.”
That “unwanted behavior” could be any or all of the following:
- Google showing the unpreferred URLs in search results.
- Dilution of link signals across the duplicate URLs, weakening the SEO value of the preferred URL.
- Google “wasting” crawl budget on duplicate pages instead of spending it on new or updated pages.
A proper site canonicalization structure is the absolute first step to prevent these “unwanted behaviors”. However, in our experience with more than 50 airlines, the sad truth is that it may not be enough. Even with all canonicals well implemented on an airline’s website, Google may still choose its own canonical, which could be particularly damaging to the airline’s SEO performance.
For example, Google Search Console’s Coverage report flags one such “unwanted behavior” under the status “Duplicate, Google chose different canonical than user”. We recently helped an airline fix massive deindexation of its flight pages caused by Google choosing its own canonical. For illustration purposes, let’s call the airline Avengers Airlines.
Here is what Google Search Console shows when you inspect a URL included in this report:
Basically, Google is saying that it chooses to serve https://avengersairlines.com/en/flights-from-abu-dhabi-to-hyderabad in search rather than the preferred URL https://avengersairlines.com/en-ae/flights-from-abu-dhabi-to-hyderabad.
Google’s own description of the issue is not especially helpful: “This page is marked as canonical for a set of pages, but Google thinks another URL makes a better canonical. Google has indexed the page that we consider canonical rather than this one.”
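If you want to find these pages at scale rather than one by one, one option is to compare the user-declared and Google-selected canonicals programmatically. The sketch below assumes each result is a dictionary shaped like the `indexStatusResult` object returned by the Search Console URL Inspection API, which exposes `userCanonical` and `googleCanonical` fields. Fetching the data (authentication, API client) is left out, and the helper names are hypothetical; the sample values simply reuse the Avengers Airlines URLs from this example.

```python
from typing import Dict, List


def chose_different_canonical(index_status: Dict[str, str]) -> bool:
    """True when Google's selected canonical differs from the declared one."""
    user = index_status.get("userCanonical")
    google = index_status.get("googleCanonical")
    # Only flag pages where both values are present and disagree.
    return bool(user and google and user != google)


def flagged_pages(results: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the inspected URLs whose declared canonical Google overrode."""
    return [url for url, status in results.items() if chose_different_canonical(status)]


if __name__ == "__main__":
    preferred = "https://avengersairlines.com/en-ae/flights-from-abu-dhabi-to-hyderabad"
    generic = "https://avengersairlines.com/en/flights-from-abu-dhabi-to-hyderabad"
    sample = {preferred: {"userCanonical": preferred, "googleCanonical": generic}}
    print(flagged_pages(sample))
```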
But why would Google think another URL makes a better canonical? It comes down to what Google calls “site preference signals,” which include:
- Canonical annotations (the user-declared canonical)
- Internal linking
- URL in the sitemap file
- HTTPS preference
- “Nicer” looking URLs
Because most airlines have country-market site editions (localized versions of the website), let’s add hreflang to the mix. Google’s advice on this? Be consistent. Stick to one canonical URL and align the site preference signals around it.
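As a rough illustration of that consistency, here is a Python sketch that builds the head tags for one page in a set of localized site editions: a self-referencing canonical plus one hreflang annotation per edition, all pointing at the same path in each locale folder. The `/<locale>/<path>` URL layout and the list of editions are assumptions modeled on the Avengers Airlines URLs, not a general rule, and the function names are hypothetical.

```python
from typing import List

# Hypothetical site layout: https://<domain>/<locale>/<path>
BASE_URL = "https://avengersairlines.com"
LOCALES = ["en", "en-ae", "en-us"]


def localized_url(locale: str, path: str) -> str:
    """URL of `path` in the given country-market site edition."""
    return f"{BASE_URL}/{locale}/{path}"


def head_tags(locale: str, path: str) -> List[str]:
    """Self-referencing canonical plus hreflang alternates for one page."""
    tags = [f'<link rel="canonical" href="{localized_url(locale, path)}" />']
    # Every edition lists the full set of alternates, itself included,
    # so the hreflang annotations are reciprocal across editions.
    for alt in LOCALES:
        tags.append(
            f'<link rel="alternate" hreflang="{alt}" href="{localized_url(alt, path)}" />'
        )
    return tags


if __name__ == "__main__":
    for tag in head_tags("en-ae", "flights-from-abu-dhabi-to-hyderabad"):
        print(tag)
```

Note that the canonical points at the page’s own edition (en-ae), not at the generic English page: each localized URL is its own preferred URL, and hreflang ties the editions together.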
Going back to the example of Avengers Airlines, we noticed that many site preference signals were already aligned around the preferred URL:
- Self-referencing canonicals in place.
- Properly implemented hreflang tags.
- Preferred URLs in the corresponding XML sitemap.
- “Nice” looking preferred URLs.
Then, why was Google picking its own canonical for the preferred URLs? In a nutshell: inconsistent redirects and interlinking!
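One way to catch this kind of inconsistency is to list, for each preferred URL, every URL that the various signals actually point to, and flag the ones that disagree. The Python sketch below does only that comparison. The signal names, the helper name, and the sample data (including the `example-page` URLs and the internal link and redirect values) are hypothetical; collecting the signals from a crawl, the sitemaps, and the server’s redirect rules is out of scope.

```python
from typing import Dict, List


def misaligned_signals(preferred: str, signals: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each signal type, list the URLs it points to other than `preferred`."""
    report: Dict[str, List[str]] = {}
    for name, targets in signals.items():
        stray = [url for url in targets if url != preferred]
        if stray:
            report[name] = stray
    return report


if __name__ == "__main__":
    preferred = "https://avengersairlines.com/en-ae/example-page"
    generic = "https://avengersairlines.com/en/example-page"
    # Hypothetical snapshot: on-page signals agree, but links and redirects do not.
    signals = {
        "canonical": [preferred],
        "sitemap": [preferred],
        "hreflang": [preferred],
        "internal_links": [preferred, generic, generic],
        "redirect_targets": [generic],
    }
    print(misaligned_signals(preferred, signals))
```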
It turned out that Avengers Airlines’ website had migrated from a subdomain to subfolders on the root domain and, in the process, redirected thousands of pages from multiple country-market site editions (en-ae, en-us, etc.) to the generic English site edition (en). Here are examples of pages that were wrongly redirected to the generic English version:
Interestingly enough, the subdomain version of the page inspected above (https://subdomain.avengersairlines.com/en-ae/flights-from-abu-dhabi-to-hyderabad) was properly redirected to the corresponding localized subfolder page (https://avengersairlines.com/en-ae/flights-from-abu-dhabi-to-hyderabad).
However, by redirecting thousands of localized URLs on the subdomain to their generic English equivalents in the root domain’s /en/ subfolder, Avengers Airlines sent the wrong site preference signals to Google. Google “decided” that the pages in the new generic English site edition were the preferred URLs and, as a result, displayed those unpreferred generic URLs in search instead of the localized URLs.
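The fix boils down to a locale-preserving redirect rule: each subdomain URL should map to the same locale folder and path on the root domain. Below is a minimal Python sketch of that mapping, plus a check that flags existing redirects which change the locale folder. The host names come from the example; treating the first path segment as the locale folder is an assumption about the URL structure, and the function names and the `example-page` redirect in the usage code are hypothetical.

```python
from typing import Dict, List
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "subdomain.avengersairlines.com"
NEW_HOST = "avengersairlines.com"


def migration_target(old_url: str) -> str:
    """Redirect target for an old subdomain URL, keeping its locale folder and path."""
    parts = urlsplit(old_url)
    if parts.hostname != OLD_HOST:
        raise ValueError(f"not an old subdomain URL: {old_url}")
    return urlunsplit(("https", NEW_HOST, parts.path, parts.query, ""))


def locale_folder(url: str) -> str:
    """First path segment, assumed to be the site edition (e.g. 'en-ae')."""
    return urlsplit(url).path.strip("/").split("/")[0]


def locale_changing_redirects(redirects: Dict[str, str]) -> List[str]:
    """Source URLs whose redirect lands in a different site edition."""
    return [src for src, dst in redirects.items() if locale_folder(src) != locale_folder(dst)]


if __name__ == "__main__":
    flight = "https://subdomain.avengersairlines.com/en-ae/flights-from-abu-dhabi-to-hyderabad"
    print(migration_target(flight))
    # Hypothetical entry from an existing redirect map that drops the locale.
    redirects = {
        "https://subdomain.avengersairlines.com/en-us/example-page":
            "https://avengersairlines.com/en/example-page",
    }
    print(locale_changing_redirects(redirects))
```

Running the locale check over the full redirect map, rather than spot-checking individual pages, is what surfaces the thousands of bad redirects at once.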
We also found thousands of internal links from the country-market site editions pointing to generic English URLs, even when the equivalent localized URL existed.
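For the internal links, one possible fix is to rewrite them at render time: when a page in a country-market edition links to a generic English URL, swap in the localized equivalent, but only if that localized URL actually exists. The Python sketch below assumes a `/<locale>/<path>` URL layout and a set of known live URLs (for example, taken from the XML sitemaps); the function name and the `example-page` URL are hypothetical.

```python
from typing import Set
from urllib.parse import urlsplit, urlunsplit

GENERIC_LOCALE = "en"


def localize_link(href: str, page_locale: str, live_urls: Set[str]) -> str:
    """Point a generic-English link at the page's own site edition when possible."""
    parts = urlsplit(href)
    segments = parts.path.lstrip("/").split("/", 1)
    # Leave the link alone unless it targets /en/<path> from a localized page.
    if page_locale == GENERIC_LOCALE or len(segments) < 2 or segments[0] != GENERIC_LOCALE:
        return href
    localized = urlunsplit(
        (parts.scheme, parts.netloc, f"/{page_locale}/{segments[1]}", parts.query, parts.fragment)
    )
    # Only rewrite when the localized page exists; otherwise keep the original link.
    return localized if localized in live_urls else href


if __name__ == "__main__":
    live = {"https://avengersairlines.com/en-ae/flights-from-abu-dhabi-to-hyderabad"}
    link = "https://avengersairlines.com/en/flights-from-abu-dhabi-to-hyderabad"
    print(localize_link(link, "en-ae", live))
```

Checking against the set of live URLs matters: blindly rewriting every /en/ link would create broken links wherever a localized version does not exist.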
After working with Avengers Airlines to fix the inconsistent redirects and internal linking network, the site immediately experienced a drop in the number of deindexed pages. Here is a snapshot of the immediate drop for one country-market site edition: