Google: Normal for Not All Pages to Be Indexed on a Website
Getting all pages indexed in Google can be a challenge for some site owners. But if Google isn’t indexing every page from your sitemap, is that a problem, or is that just how Google normally handles it?
The question came up in a recent Google Webmaster Office Hours, from a site owner who was wondering why Google wasn’t indexing all pages from a site submitted via sitemaps. Google’s response:
“Yes, that’s true. In Search Console we give you information on whether or not, on how many URLs within a sitemap are indexed but not which ones specifically. For the most part, that’s not something you need to worry about, it’s completely normal for us not to index all URLs that we find, and that’s not something you need to artificially inflate.
The one thing that I would watch out for, of course, that if something is really important for your website, that that is actually indexed, but you notice that fairly quickly because these are the pages that should be sending you traffic.”
That said, you should still look at the percentage of submitted pages that are indexed versus not. For example, suppose your site has a large number of WordPress tag pages and you are trying to get them all indexed, but many of them show the same or almost the same content. If WordPress created tag pages for both “book” and “books”, and the same posts carry both tags, Google would very likely filter out at least one of those pages as a duplicate.
Unless there are technical reasons, most pages that are filtered out are dropped because they are duplicates or near duplicates of something already indexed. In these cases, it is worth checking whether those pages would be better served with canonicals.
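As a rough illustration of the “book” versus “books” case, the Python sketch below groups tag pages that list exactly the same posts and builds the canonical link element for the page you choose to keep. The function names, the tag-to-posts mapping and the example.com URL are assumptions made for this example; they are not part of WordPress or of any Google tool, and exact matching will not catch near duplicates.

```python
from collections import defaultdict
from html import escape


def find_duplicate_tag_pages(tag_to_posts):
    """Group tag pages that list exactly the same set of posts.

    tag_to_posts is a hypothetical mapping of tag slug -> post identifiers.
    Returns a sorted list of groups, each a sorted list of tag slugs,
    keeping only groups that contain more than one tag.
    """
    groups = defaultdict(list)
    for tag, posts in tag_to_posts.items():
        # frozenset ignores ordering, so identical listings compare equal
        groups[frozenset(posts)].append(tag)
    return sorted(sorted(tags) for tags in groups.values() if len(tags) > 1)


def canonical_link(url):
    """Build the <link rel="canonical"> element for a page's <head>."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


if __name__ == "__main__":
    tags = {
        "book": ["post-1", "post-2"],
        "books": ["post-2", "post-1"],
        "film": ["post-3"],
    }
    for group in find_duplicate_tag_pages(tags):
        print("Same posts listed on:", ", ".join(group))
    # Place this element on the duplicate page, pointing at the page to keep.
    print(canonical_link("https://example.com/tag/books/"))
```

Which page in each group becomes the canonical one is an editorial decision; the sketch only surfaces the candidates.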
Also watch for new pages that aren’t being indexed after they are added to the sitemap, although many site owners will notice this problem without looking at the sitemap indexing numbers.
If you have a very large site and you are trying to figure out why Google isn’t indexing large parts of it, you can split your sitemap into several smaller sitemaps to isolate the issue. For example, you might split your sitemap by product type or page type. Because Search Console reports indexed counts per sitemap, it is often easier to identify which parts of the site Google is having problems indexing, so that you can fix them.
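One possible way to do the split, sketched in Python: group URLs by their first path segment and write one sitemap per group. Using the first path segment as the “page type” is an assumption that only fits sites whose URLs are organised that way, and the helper names are hypothetical. The indexed percentage is calculated from numbers you read off Search Console yourself; nothing here queries Google.

```python
from collections import defaultdict
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def group_urls_by_section(urls):
    """Group URLs by first path segment, e.g. 'products' or 'blog'."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # URLs with no path segment (the homepage) go into a 'root' group
        section = segments[0] if segments else "root"
        groups[section].append(url)
    return dict(groups)


def build_sitemap_xml(urls):
    """Serialize a list of URLs as a sitemap <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def indexed_percentage(submitted, indexed):
    """Share of submitted URLs reported as indexed, as a percentage."""
    return 100.0 * indexed / submitted if submitted else 0.0


if __name__ == "__main__":
    all_urls = [
        "https://example.com/products/widget",
        "https://example.com/products/gadget",
        "https://example.com/blog/launch",
        "https://example.com/",
    ]
    for section, urls in group_urls_by_section(all_urls).items():
        # One file per section, e.g. sitemap-products.xml
        with open(f"sitemap-{section}.xml", "w", encoding="utf-8") as fh:
            fh.write(build_sitemap_xml(urls))
```

After submitting the separate files, comparing the indexed percentage of each one shows which section is lagging.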