Large Site SEO Basics: Faceted Navigation

April 20, 2017
seounity

If you work on an enterprise site — particularly in e-commerce or listings (such as a job board site) — you probably use some sort of faceted navigation structure. Why wouldn’t you? It helps users filter down to their desired set of results fairly painlessly.

While helpful to users, it’s no secret that faceted navigation can be a nightmare for SEO. At Distilled, it’s not uncommon for us to get a client that has tens of millions of URLs that are live and indexable when they shouldn’t be. More often than not, this is due to their faceted nav setup.

There are a number of great posts out there that discuss what faceted navigation is and why it can be a problem for search engines, so I won’t go into much detail on this. A great place to start is this post from 2011.

What I want to focus on instead is narrowing this problem down to a simple question, and then provide the possible solutions to that question. The question we need to answer is, “What options do we have to decide what Google crawls/indexes, and what are their pros/cons?”

Brief overview of faceted navigation

As a quick refresher, we can define faceted navigation as any way to filter and/or sort results on a webpage by specific attributes that aren’t necessarily related. For example, the color, processor type, and screen resolution of a laptop. Here is an example:

Because every possible combination of facets is typically (at least one) unique URL, faceted navigation can create a few problems for SEO:

  1. It creates a lot of duplicate content, which is bad for various reasons.
  2. It eats up valuable crawl budget and can send Google incorrect signals.
  3. It dilutes link equity and passes equity to pages that we don’t even want indexed.

But first… some quick examples

It’s worth taking a few minutes and looking at some examples of faceted navigation that are probably hurting SEO. These are simple examples that illustrate how faceted navigation can (and usually does) become an issue.

Macy’s

First up, we have Macy’s. I’ve done a simple site:search for the domain and added “black dresses” as a keyword to see what would appear. At the time of writing this post, Macy’s has 1,991 products that fit under “black dresses” — so why are over 12,000 pages indexed for this keyword? The answer could have something to do with how their faceted navigation is set up. As SEOs, we can remedy this.

Home Depot

Let’s take Home Depot as another example. Again, doing a simple site:search we find 8,930 pages on left-hand/inswing front exterior doors. Is there a reason to have that many pages in the index targeting similar products? Probably not. The good news is this can be fixed with the proper combinations of tags (which we’ll explore below).

I’ll leave the examples at that. You can go on most large-scale e-commerce websites and find issues with their navigation. The points is, many large websites that use faceted navigation could be doing better for SEO purposes.

Faceted navigation solutions

When deciding a faceted navigation solution, you will have to decide what you want in the index, what can go, and then how to make that happen. Let’s take a look at what the options are.

“Noindex, follow”

Probably the first solution that comes to mind would be using noindex tags. A noindex tag is used for the sole purpose of letting bots know to not include a specific page in the index. So, if we just wanted to remove pages from the index, this solution would make a lot of sense.

The issue here is that while you can reduce the amount of duplicate content that’s in the index, you will still be wasting crawl budget on pages. Also, these pages are receiving link equity, which is a waste (since it doesn’t benefit any indexed page).

Example: If we wanted to include our page for “black dresses” in the index, but we didn’t want to have “black dresses under $100” in the index, adding a noindex tag to the latter would exclude it. However, bots would still be coming to the page (which wastes crawl budget), and the page(s) would still be receiving link equity (which would be a waste).

Canonicalization

Many sites approach this issue by using canonical tags. With a canonical tag, you can let Google know that in a collection of similar pages, you have a preferred version that should get credit. Since canonical tags were designed as a solution to duplicate content, it would seem that this is a reasonable solution. Additionally, link equity will be consolidated to the canonical page (the one you deem most important).

However, Google will still be wasting crawl budget on pages.

Example: /black-dresses?under-100/ would have the canonical URL set to /black-dresses/. In this instance, Google would give the canonical page the authority and link equity. Additionally, Google wouldn’t see the “under $100” page as a duplicate of the canonical version.

Disallow via robots.txt

Disallowing sections of the site (such as certain parameters) could be a great solution. It’s quick, easy, and is customizable. But, it does come with some downsides. Namely, link equity will be trapped and unable to move anywhere on your website (even if it’s coming from an external source). Another downside here is even if you tell Google not to visit a certain page (or section) on your site, Google can still index it.

Example: We could disallow *?under-100* in our robots.txt file. This would tell Google to not visit any page with that parameter. However, if there were any “follow” links pointing to any URL with that parameter in it, Google could still index it.

“Nofollow” internal links to undesirable facets

An option for solving the crawl budget issue is to “nofollow” all internal links to facets that aren’t important for bots to crawl. Unfortunately, “nofollow” tags don’t solve the issue entirely. Duplicate content can still be indexed, and link equity will still get trapped.

Example: If we didn’t want Google to visit any page that had two or more facets indexed, adding a “nofollow” tag to all internal links pointing to those pages would help us get there.

Avoiding the issue altogether

Obviously, if we could avoid this issue altogether, we should just do that. If you are currently in the process of building or rebuilding your navigation or website, I would highly recommend considering building your faceted navigation in a way that limits the URL being changed (this is commonly done with JavaScript). The reason is simple: it provides the ease of browsing and filtering products, while potentially only generating a single URL. However, this can go too far in the opposite direction — you will need to manually ensure that you have indexable landing pages for key facet combinations (e.g. black dresses).

Here’s a table outlining what I wrote above in a more digestible way.







Options:

Solves duplicate content?

Solves crawl budget?

Recycles link equity?

Passes equity from external links?

Allows internal link equity flow?

Other notes

“Noindex, follow”

Yes

No

No

Yes

Yes

Canonicalization

Yes

No

Yes

Yes

Yes

Can only be used on pages that are similar.

Robots.txt

Yes

Yes

No

No

No

Technically, pages that are blocked in robots.txt can still be indexed.

Nofollow internal links to undesirable facets

No

Yes

No

Yes

No

JavaScript setup

Yes

Yes

Yes

Yes

Yes

Requires more work to set up in most cases.

But what’s the ideal setup?

First off, it’s important to understand there is no “one-size-fits-all solution.” In order to get to your ideal setup, you will most likely need to use a combination of the above options. I’m going to highlight an example fix below that should work for most sites, but it’s important to understand that your solution might vary based on how your site is built, how your URLs are structured, etc.

Fortunately, we can break down how we get to an ideal solution by asking ourselves one question. “Do we care more about our crawl budget, or our link equity?” By answering this question, we’re able to get closer to an ideal solution.

Consider this: You have a website that has a faceted navigation that allows the indexation and discovery of every single facet and facet combination. You aren’t concerned about link equity, but clearly Google is spending valuable time crawling millions of pages that don’t need to be crawled. What we care about in this scenario is crawl budget.

In this specific scenario, I would recommend the following solution.

  1. Category, subcategory, and sub-subcategory pages should remain discoverable and indexable. (e.g. /clothing/, /clothing/womens/, /clothing/womens/dresses/)
  2. For each category page, only allow versions with 1 facet selected to be indexed.
    1. On pages that have one or more facets selected, all facet links become “nofollow” links (e.g. /clothing/womens/dresses?color=black/)
    2. On pages that have two or more facets selected, a “noindex” tag is added as well (e.g. /clothing/womens/dresses?color=black?brand=express?/)
  3. Determine which facets could have an SEO benefit (for example, “color” and “brand”) and whitelist them. Essentially, throw them back in the index for SEO purposes.
  4. Ensure your canonical tags and rel=prev/next tags are setup appropriately.

This solution will (in time) start to solve our issues with unnecessary pages being in the index due to the navigation of the site. Also, notice how in this scenario we used a combination of the possible…

Get more information about this news here: Read More