[BNM] [bnm] SEO, Tumblr, Posterous and whatnot
Kath Burke
kath at kathburke.com
Mon Nov 24 10:38:11 GMT 2008
Hi Nick
I think the duplicate content issue arises because Google finds the version
that came first and assumes that was the original... and if there's a verbatim
copy added to another site, Google thinks you may have plagiarised it.
Here's an excerpt from E-consultancy's report on SEO:
2.5.1 Duplicate content
Google has a strong dislike of repetition and replication.
Site owners have been known to copy content from other sites. This is often
done using 'screenscrapers' which then merge several sources in order to
increase keyword density and include copy with natural grammar on the
offending site.
However, the biggest duplicate content issue comes when your own site
accidentally or deliberately duplicates or triplicates its own content. This
can occur when DNS / domain mapping goes awry, or just through poor content
management / replication of pages.
Search engines test for the uniqueness of document text. If a page is
similar to another, your page can then effectively be subject to a
"duplicate content penalty", meaning that the search engines could remove
it from their index.
Alternatively, your pages may be placed in the supplemental results (see
next box), although this is more likely to be due to the page having a low
PageRank assigned to it. Although this effect is often known as a "dupe
content penalty", in effect it is better thought of as a duplicate content
filter applied to the index.
At the time of writing, Google's Matt Cutts has not blogged extensively about
this, although it is covered in some video postings. This transcript
gives a good summary:
"... We do a lot of duplicate content detection. It's not like there's one
stage where we say, OK, right here is where we detect the duplicates.
Rather, it's all the way from the crawl, through the indexing, through the
scoring, until finally just milliseconds before you answer things.
And there are different types of duplicate content. There's certainly exact
duplicate detection. So if one page looks exactly the same as another page,
that can be quite helpful, but at the same time it's not the case that pages
are always exactly the same. And so we also detect near duplicates, and we
use a lot of sophisticated logic to do that".
This note is significant since it shows that Google uses duplicate content
detection extensively.
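To make the exact / near duplicate distinction concrete, here is a minimal sketch of one common textbook approach: compare overlapping word sequences ("shingles") and score the overlap with Jaccard similarity. This is only an illustration; the function names and the shingle size of 3 are assumptions, and nothing here is claimed to be how Google actually does it.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences in text (lower-cased)."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


original = "Google has a strong dislike of repetition and replication"
near_copy = "Google has a strong dislike of repetition and duplication"
unrelated = "Brighton has a lively new media community by the sea"

print(similarity(original, original))   # exact duplicate
print(similarity(original, near_copy))  # near duplicate
print(similarity(original, unrelated))  # different content
```

An exact copy scores 1.0, a copy with a few words changed still scores high, and unrelated text scores at or near zero. That is why changing a word here and there is unlikely to make syndicated copy look unique.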
Best practice:
1. Be aware of the origin of the different duplicate content filters
described above.
2. Take steps to minimise similarities between documents. The main
document attributes that can be used to indicate to the search engines
that content is not similar are listed below. Find techniques for
making these document attributes unique:
- <title>
- <meta name="description">
- <meta name="keywords">
- Introduction copy - vary this for syndicated content
- Body copy in different sections
- Headings
- Number of backlinks into a document
- Links out from a document
- Document name (could be similar in different directories)
It is generally thought that the document meta data is important, but the
Google patent referenced below shows that extensive fingerprinting is
completed on different sections of the document.
3. For pages within your site which you know to be duplicates, and for which
you want to exclude one or more versions, use robots.txt to prevent the search
robots accessing them.
4. For malicious copying of content it is worth checking copyright
infringement on the homepage regularly. For example, Copyscape could be used
for a selection of pages.
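As a rough illustration of point 3, the sketch below uses Python's standard urllib.robotparser module to check that a robots.txt file really does block the duplicate versions of a page while leaving the main version crawlable. The robots.txt rules, the /print/ directory and the example.com URLs are all made up for the example; swap in your own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the printer-friendly duplicates only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /archive/duplicate-page.html
"""


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots_txt tells user_agent not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Duplicate version (should be blocked) and main version (should not be).
print(is_blocked(ROBOTS_TXT, "https://www.example.com/print/article.html"))
print(is_blocked(ROBOTS_TXT, "https://www.example.com/article.html"))
```

Running a check like this before uploading a new robots.txt is a cheap way to make sure you have not blocked the version you want indexed.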
Kath Burke
| Editor | Copywriter | Website revitaliser |
........................
T 0845 638 1477 (or 01273 728 897)
M 07708 342 446
F 0870 284 6257
E kath at kathburke.com
W www.kathburke.com
A: 11a Jew street, North Laine, Brighton BN1 1UT
..........................
Kath Burke Ltd, registered company: 636 4401