“Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include:
1. Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
2. Store items shown or linked via multiple distinct URLs
3. Printer-only versions of web pages”
Identical or substantially similar content.
Within your own domain or across others.
Most of it is normal and acceptable.
So far, so good.
Why Is Duplicate Content a Problem?
How would you like to search for the best pecan pie recipe, only to find that every single result on the first page is the exact same recipe?
Users don’t like seeing the same result over and over, and Google doesn’t like crawling the same content again and again.
For a search engine, duplication is also a matter of processing resources. If a site has substantial duplication, its crawl and indexation rates might be dampened. In short, the site can lose some ‘trust’.
Two Types of Duplicate Content
We all have our own ideas about duplicate content, and most of the time they boil down to “Don’t republish the same article to multiple directories. Instead, spend countless hours spinning that same article until it no longer makes sense and THEN publish it to a zillion and one directories. That will surely trick all the PhDs working for Google into ranking my site pretty highly.”
Now, in the spirit of “being informed”, let’s take a look at the two types of duplicate content you’ll come across, shall we?
1. Cross-domain type: the one most people think of, where the same content appears (often unintentionally) on several external sites.
2. Within-your-domain type: the one Google is actually most concerned about, where content appears (often unintentionally) in several different places within your own site.
Let’s explore each type a little further and see what Google really thinks about it.
Off-Site Content Syndication
There is absolutely nothing wrong with syndicating your content to different sites per se.
NOTHING WRONG WITH IT!
Here’s what happens when your content gets syndicated: Google will simply go through all the available versions and show the one it finds most appropriate for a specific search.
Mind you, the most appropriate version might not be the one you’d prefer to have ranked. That’s why it’s very important that each piece of syndicated content includes a link back to your original post (presumably on your own site). That way Google can trace the original version and will most likely (but not always) display it in its search results.
Per Matt Cutts:
I would be mindful that taking all your articles and submitting them for syndication all over the place can make it more difficult to determine how much the site wrote its own content vs. just used syndicated content. My advice would be 1) to avoid over-syndicating the articles that you write, and 2) if you do syndicate content, make sure that you include a link to the original content. That will help ensure that the original content has more PageRank, which will aid in picking the best documents in our index.
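If you do syndicate, it’s worth spot-checking that each partner site actually kept your link back. Here’s a minimal Python sketch of that check, assuming you’ve already downloaded the syndicated page’s HTML. The names `has_link_back` and `LinkCollector` and the example.com URLs are hypothetical, made up for illustration. It only looks for a plain `<a href>` pointing at your post (treating http/https, “www.” and a trailing slash as equivalent); it says nothing about whether Google will actually credit that link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag in a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the syndicated page's URL.
                    self.hrefs.append(urljoin(self.base_url, value))


def _normalize(url: str) -> tuple[str, str]:
    """Compare URLs by host (lowercased, no 'www.') and path (no trailing slash).

    The scheme and query string are ignored.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host, parts.path.rstrip("/") or "/"


def has_link_back(syndicated_html: str, syndicated_url: str, original_url: str) -> bool:
    """True if the syndicated copy contains an <a href> pointing at the original post."""
    collector = LinkCollector(syndicated_url)
    collector.feed(syndicated_html)
    target = _normalize(original_url)
    return any(_normalize(href) == target for href in collector.hrefs)


if __name__ == "__main__":
    page = '<p>Originally published on <a href="https://www.example.com/pecan-pie/">my blog</a>.</p>'
    print(has_link_back(page, "https://partner.example.org/recipes/1", "https://example.com/pecan-pie"))
```

Run against that sample snippet, it prints `True`; a copy with the link stripped out would return `False`, which is your cue to ask the partner site to put it back.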
Black Hat Syndication
However, here’s the other side of the content syndication coin: content deliberately duplicated across the web in an attempt to manipulate search engine rankings or to generate more traffic.
This floods the SERPs with repeated content, frustrates searchers, and forces Google to clean house.
“In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.”
On-Site Duplicate Content
On-site duplicate content problems are much more common and, guess what, they are entirely UNDER YOUR CONTROL, which makes them easy to fix.
The first step in identifying potential weak spots on your blog is learning how your content management system displays and links to your content.
For example, a single blog post can show up on your blog’s home page as well as on category pages, tag pages, archive pages, and so on. THAT’S the true definition of duplicate content.
We, the users, have the common sense to understand that it’s still the same post; we just reach it via different URLs. Search engines, however, see separate pages at unique URLs with exactly the same content, and that equals duplicate content.
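To see this from the crawler’s side, here’s a minimal Python sketch that groups URLs serving identical post text. It assumes you’ve already extracted the post body from each URL into a dict; the `pages` dict, the function names and the example.com URLs are all hypothetical. It only catches exact matches after collapsing whitespace and lowercasing, so treat it as a rough audit of your own blog, not a description of how Google detects duplicates (which also covers “appreciably similar” content).

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """SHA-256 of the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicate_urls(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose post text is identical after normalization.

    `pages` maps each URL to the text of the post body found at that URL.
    Only groups containing more than one URL are returned.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    post = "The full text of my pecan pie post."
    pages = {
        "https://example.com/": post,
        "https://example.com/category/desserts/": post,
        "https://example.com/tag/pie/": post,
        "https://example.com/about/": "About me",
    }
    print(find_duplicate_urls(pages))
```

With this sample data, the home page, category page and tag page come back as one group of three URLs, while the About page is left out. Each group is a candidate for consolidating to a single preferred URL.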