
Attacking Approximate Duplication

[Image: approximate duplication. Caption: "These are all quite similar!"]

Many websites are learning about approximate duplication the hard way. E-commerce sites and publishers of all sizes have had to add a phrase to their vocabulary: Panda Violation.

Approximate duplication occurs when a website publishes pages, posts or articles that closely resemble one another. Usually this content is thin, meaning it lacks depth, sourcing or other engaging factors. It can build up quickly, and companies often learn they have the problem only after receiving a Panda violation. So let's examine how to overcome approximate duplication.

But first, a little background. Since February 2011, Google has been applying overt violations to websites that have excessive "approximate duplication". Whatever your view of Google's intent, this action is a genuine attempt to raise the quality of the search engine results pages. I also like how it rewards websites that have sound information architecture, content strategy and strong publishing practices.

A colleague of mine, Brian Cosgrove, has said: "Writing unique content is the cost of entry for SEO."

He is right. I think most companies have accepted that, but pitfalls remain. Many companies start writing content without a plan. Many don't even know their audience. These are the makings of an SEO tragedy of Greek proportions. Let me walk you through some definitions and an example of how I dealt with this issue successfully.

How to Identify Approximate Duplication

It's important to know that duplication can happen within your own website and across other websites. Some websites syndicate their content without safeguards. Others produce unique content, but place it on several websites.

This can even happen with design.  If several of your owned websites have the same design code (templates), they could be viewed as having duplication.

Use this simple operator in Google to see all the pages associated with a keyword: keyword phrase
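A minimal sketch of building such a query in code, assuming the operator meant here is Google's `site:` operator limited to your own domain. The function name, domain and phrase are placeholders for illustration.

```python
from urllib.parse import urlencode


def site_query_url(domain: str, phrase: str) -> str:
    """Build a Google search URL restricted to one domain.

    Assumption: the operator in question is `site:`.
    """
    query = f"site:{domain} {phrase}"
    # urlencode percent-encodes the colon and turns spaces into "+"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_query_url("example.com", "keyword phrase"))
# https://www.google.com/search?q=site%3Aexample.com+keyword+phrase
```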

The results are listed in descending order of how relevant Google considers each page for that keyword phrase. From this list you can group your content into categories.
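Once you have the list, a rough similarity score can help with grouping. The sketch below is one possible approach, not something Google publishes: it compares pages by overlapping three-word shingles (Jaccard similarity). The sample URLs, the page text and the 0.5 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def shingles(text: str, size: int = 3) -> set:
    """Split text into overlapping word groups of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_pairs(pages: dict, threshold: float = 0.5) -> list:
    """Return (url_a, url_b, score) for page pairs at or above the threshold."""
    sets = {url: shingles(body) for url, body in pages.items()}
    pairs = []
    for u, v in combinations(sorted(sets), 2):
        score = jaccard(sets[u], sets[v])
        if score >= threshold:
            pairs.append((u, v, round(score, 2)))
    return pairs


# Hypothetical page bodies; real input would be the extracted text of each URL.
pages = {
    "/a": "cheap red widgets for sale online with free shipping in austin",
    "/b": "cheap red widgets for sale online with free shipping in dallas",
    "/c": "how to choose a content strategy for your audience",
}
print(similar_pairs(pages))  # [('/a', '/b', 0.8)]
```

Pairs that score high are candidates for the same category. The threshold is a judgment call and worth tuning against pages you already know are duplicates.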

How to Consolidate Approximate Duplication

Now that you have identified the pages, you can determine better methods for publishing this information. Sometimes it's better to update the same page with new information than to create a new post or article.

If you know your audience, you can consolidate content based on tasks, problems or goals by persona.  If you don't know your audience, you will have to consolidate by topic.

With the culprits all nicely grouped, you can apply the remedies. I have identified four methods for consolidating content and achieving proper canonicalization:

  • 301 redirect
  • Canonical tag
  • Pagination (Next / Prev attributes)
  • Robots meta tag (noindex)

Use the 301 redirect for pages that need to be grouped into a single, main page. If you need to keep several versions of these pages, posts or articles, the canonical tag is an efficient and clear method for mapping these duplicates to the main page or post.
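The two remedies above can be sketched as follows. This is an illustration only: the redirect map, paths and function names are hypothetical, and a real site would usually configure redirects in its web server or CMS rather than in application code.

```python
from html import escape

# Hypothetical mapping of retired URLs to the main page that replaces them.
REDIRECTS = {
    "/widgets-austin": "/widgets",
    "/widgets-dallas": "/widgets",
}


def respond(path: str):
    """Return (status, headers): a 301 for consolidated paths, else 200."""
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, {"Location": target}
    return 200, {}


def canonical_tag(main_url: str) -> str:
    """Build the canonical link element for the <head> of a duplicate page."""
    return f'<link rel="canonical" href="{escape(main_url, quote=True)}">'


print(respond("/widgets-austin"))  # (301, {'Location': '/widgets'})
print(canonical_tag("https://example.com/widgets"))
# <link rel="canonical" href="https://example.com/widgets">
```

The redirect removes the duplicate URL from circulation. The canonical tag leaves the duplicate reachable but points search engines at the main version.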

Pagination is essential for multi-page content. Use the Next/Prev attributes to indicate that these pages belong together as one series.
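A small sketch of generating those link elements for one page in a series. The `?page=N` URL scheme and the function names are assumptions for illustration; use whatever URL pattern your site already has.

```python
def page_url(base_url: str, page: int) -> str:
    """Assumed URL scheme: page 1 is the base URL, later pages add ?page=N."""
    return base_url if page == 1 else f"{base_url}?page={page}"


def pagination_tags(base_url: str, page: int, last_page: int) -> list:
    """Build the rel="prev" / rel="next" link elements for one page."""
    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{page_url(base_url, page - 1)}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{page_url(base_url, page + 1)}">')
    return tags


for tag in pagination_tags("https://example.com/guide", 2, 3):
    print(tag)
# <link rel="prev" href="https://example.com/guide">
# <link rel="next" href="https://example.com/guide?page=3">
```

The first page gets only a "next" link and the last page only a "prev" link. Google said in 2019 that it no longer uses these attributes as an indexing signal, so treat them as a hint rather than a guarantee.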

In some cases, you need to keep the page, but you don't want it to compete with other content on your site. This is where the robots meta tag (noindex) applies. Pages like this include:

  • HTML sitemaps
  • Category pages - Pages that list articles or other content with the headlines and snippets
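For pages like these, the robots meta tag can be produced with a helper along these lines. The function name is hypothetical. The sketch assumes you want search engines to skip indexing the page while still following its links, which is a common choice for sitemaps and category listings.

```python
def robots_meta(index: bool = True, follow: bool = True) -> str:
    """Build a robots meta element for the <head> of a page."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    content = ", ".join(directives)
    return f'<meta name="robots" content="{content}">'


# Keep the page out of the index, but let crawlers follow its links.
print(robots_meta(index=False))
# <meta name="robots" content="noindex, follow">
```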

In a corporate environment, you will need to include Product, Design and possibly Marketing folks in the decision on which pages become the canonical versions.

How to Avoid Approximate Duplication

Hopefully you don't have to learn the hard way.  The best way to make sure your website does not get a violation is to understand your audience and focus on making great pages.  It's the old saying: Quality over Quantity.  If you cover a narrow subject it is important that you identify the many unique problems, tasks or goals for several personae in your audience.  This creates a unique angle for you to discuss the topic.

Many publishers also forget to use different content types. Not everything has to be an article; in fact, it shouldn't. Consider finding some reputable data sources and making tables, charts or graphs to depict your topic.

Of course, make sure you stop any process that seems to be producing the duplication.

Another big area of concern can be your information architecture.  If you don't have a coherent hierarchy, you could have several pages competing against each other.

Those are the major concepts around this topic. It is certainly a concern for those who have not taken content publishing seriously. To learn more about this topic, contact me directly. We can help your organization defend against Panda updates.

Add me to your circles (+Rudy De La Garza on Google+) to stay in the conversation, or follow me on Twitter!
