
SEO Blog & Internet Marketing Blog


How Duplicate Content Occurs and How to Avoid Getting Penalized for It

April 22, 2011 by Dustin

Search engines take a large number of factors into consideration when assigning a rank to a website. Although some factors are obvious or have been specifically identified as ranking signals, the majority of the ranking criteria are kept private. Search engines avoid publicizing the exact pieces of the Search Engine Optimization puzzle they use to determine rank because doing so would most likely lead to a flood of irrelevant search results.

Search engines recognize that if every site on the Internet knew exactly which characteristics they measure, every site would try to exploit them to rank higher. Providing this information would make ranking too easy for websites and would go against their main goal, which is to provide users with the most relevant and useful search results. Clearly spelling out the measurements used to rank a website would prevent a relevant site from standing out among the other results and could give a higher rank to a page that is worse, or less relevant, for the user.
Search engines use both on-page and off-page elements to assign a rank. On-page factors focus on internal site structure, including keyword use and internal linking, whereas off-page factors consider who you link to, how you link to them, and the popularity or relevance of the sites linking to you. Some of the factors known or thought to be important for ranking well in search engines include:

  • Keyword in the page's title tag.
  • Appropriate and descriptive anchor text of inbound links.
  • Global link popularity of the site.
  • Age of the domain.
  • Link popularity, or equity, within the internal link structure.
  • Relevance of an inbound link to the site and the text surrounding that link.
  • Keyword use throughout the body copy.
  • The popularity or authority of the website providing an inbound link.

With that being said, one of the major factors that search engines do directly identify as harmful to a website's rank is Duplicate Content. Duplicate Content is best described as a portion, or a complete copy, of text that appears in multiple locations, either within your own site or shared between your site and another site entirely.

So, what is the harm in Duplicate Content? Duplicate Content can prevent your pages, or your entire site, from being indexed. When search engines visit a website for indexing, they usually have a set limit on the number of pages they will crawl. Duplicate Content can use up the majority of the pages the robots are willing to index and prevent other important pages from being logged. Search engines can also choose not to index your site at all, ban your site, reduce its ranking, or impose penalties. Duplicate Content dilutes link juice throughout your site as well, because it splits link equity between the duplicated pages. In the case of content scraping, a search engine can mistakenly index the website that scraped your content instead of yours. This can cost you traffic and revenue, and on top of that someone else is taking credit for and benefiting from your content without crediting you, which is bad enough all on its own.

Duplicate Content is created in a number of different ways, but these can be broken down into two primary categories. On one side, it can be created by structural errors or on-page elements such as session IDs, URL parameters, comment pagination, printer-friendly pages, and www vs. non-www versions of the same pages. On the other side, it can occur when you or another user purposefully scrapes the content of another website. Scraping can range from article syndication, where you might publish another author's article without linking to the original publication, to using portions or all of the content found on another site, to duplicating a site identically.
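As an illustration of the first category, the same page can often be reached at several addresses (the domain and paths here are hypothetical), each of which a search engine may treat as a separate document:

```
http://www.example.com/widgets.html
http://example.com/widgets.html
http://example.com/widgets.html?sessionid=12345
http://example.com/print/widgets.html
```

To a crawler these are four different URLs carrying one body of text, which is exactly the kind of accidental duplication described above.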

Although most Duplicate Content is frowned upon by search engines, there are a few exceptions. Quotations are commonly shared across several sites and will most likely not be treated as Duplicate Content. Additionally, if you do use another author's article, providing a link to its original location creates an acceptable form of duplication. Duplicate Content is also generally acceptable for product descriptions on e-commerce websites and when mirroring a website in another language.

A major problem with Duplicate Content is that most of the time it occurs without anyone knowing and continues to cause problems you are unaware of. It is extremely important to check for Duplicate Content and manage it as quickly as possible. In most cases you can identify it and deal with it swiftly by simply deleting it. There are many great tools for identifying repetitive content, such as CopyScape and Google's Webmaster Tools. CopyScape lets you search for Duplicate Content on your own site as well as on other sites, and it allows you to set up automatic searches and notifications that perform routine checks and report by email when Duplicate Content or scraping has occurred.
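The core idea behind this kind of check is simple enough to sketch in a few lines. The following is a minimal illustration, not any real tool's method: it normalizes each page's body text and hashes it, so pages that share the same copy (ignoring case and spacing) get flagged together. The URLs and page texts are assumed inputs you would supply.

```python
import hashlib
import re


def normalize(text):
    # Lowercase and collapse whitespace so trivial formatting
    # differences don't hide duplicated copy.
    return re.sub(r"\s+", " ", text.lower()).strip()


def content_fingerprint(text):
    # Hash the normalized body text; identical fingerprints
    # mean the pages carry the same copy.
    return hashlib.md5(normalize(text).encode("utf-8")).hexdigest()


def find_duplicates(pages):
    # pages: dict mapping URL -> body text.
    # Returns groups of URLs that share identical normalized content.
    seen = {}
    for url, body in pages.items():
        seen.setdefault(content_fingerprint(body), []).append(url)
    return [urls for urls in seen.values() if len(urls) > 1]
```

A check like this only catches exact (normalized) copies; partial duplication of the kind CopyScape reports requires fuzzier matching.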

Once you have found Duplicate Content, there are several options for resolving the issue. The following are some suggestions for both preventing and dealing with Duplicate Content.

  • To prevent duplicate content from occurring, plan and lay out each page of your site before you begin to create it.
  • Syndicate carefully: make sure other users link back to your site when they use your articles, and link back to the original site when you use an article from it.
  • Use robots.txt to prevent search engine bots from indexing pages that contain necessary duplicate content.
  • Use 301 redirects to send users, as well as Googlebot, away from duplicate content.
  • Avoid having both www and non-www versions by telling the search engine which domain you prefer. Be consistent with your URLs and use the canonical URL, which means using one best URL to point to the content.
  • Use top-level domains (TLDs) to help search engines serve the right country-specific version of your content.
  • Don't use the same Title and Description tags on every page.
  • When two or more pages contain the same text, consider linking them to a common page with more details instead of repeating the same body text on each page.
  • Be familiar with your website and check frequently for duplicate content.
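Two of the fixes above can be sketched as short configuration fragments. These are illustrations only; the domain and paths are hypothetical, and real rules should match your own site's structure. First, a robots.txt rule keeping crawlers out of printer-friendly duplicates:

```
# robots.txt - keep crawlers out of printer-friendly duplicates
User-agent: *
Disallow: /print/
```

Second, an Apache .htaccess rule that 301-redirects the www host to the bare domain, so only one version of each URL gets indexed:

```apache
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```

The preferred URL can also be declared directly in a page's head with a canonical link element, such as `<link rel="canonical" href="http://example.com/widgets.html">`.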

For more tips and solutions for Duplicate Content, visit the Google Webmaster Central Blog and Google's Webmaster Tools!


Thanks for reading!


Internet Beacon

See more SEO tips for small businesses.







