Technical Tuesday is a new series dedicated to fixing the coding and technical issues your site may be having. Technical SEO accounts for a large portion of on-site SEO, so check back every Tuesday for more. You can find every post in the series under the Technical Tuesday category in the right sidebar.
Use the “site:yoursite.com” operator to find on-site duplicate content
Type this into Google: site:highonseo.com
Before this post, I had 96 results on this domain (since the change from PhotoSEO.info).
According to my WordPress Dashboard, I have 57 pages and 14 posts.
The issue is simple: Google shows 96 results, but 57 pages + 14 posts = 71. That leaves 25 extra results to account for. Let’s find the problem links.
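Finding the problem links amounts to a set difference: the URLs Google returns for the site: search, minus the URLs WordPress says you have published. Here is a minimal Python sketch, assuming you have copied both lists out by hand. The example URLs are hypothetical placeholders, and treating http/https and www/non-www versions of a path as the same page is a simplifying assumption.

```python
from urllib.parse import urlparse


def normalize(url: str) -> str:
    """Reduce a URL to its path (plus any query string) so that http/https and
    www/non-www versions of the same page compare equal."""
    parts = urlparse(url)
    # Assumption: paths are compared case-insensitively, with a trailing slash.
    path = parts.path.lower() or "/"
    if not path.endswith("/"):
        path += "/"
    # Keep ?p=123-style query strings, which WordPress uses without pretty permalinks.
    return path + ("?" + parts.query if parts.query else "")


def extra_urls(indexed: list[str], published: list[str]) -> set[str]:
    """Paths that Google has indexed but that are not published pages or posts."""
    known = {normalize(u) for u in published}
    return {normalize(u) for u in indexed} - known


# Hypothetical data: in practice, paste in the URLs from the site: results
# and from the WordPress Pages and Posts screens.
published = [
    "http://highonseo.com/",
    "http://highonseo.com/example-post/",
]
indexed = [
    "http://highonseo.com/",
    "http://www.highonseo.com/example-post",
    "http://highonseo.com/2011/03/",
]

print(sorted(extra_urls(indexed, published)))  # ['/2011/03/']
```

Whatever is left over after the subtraction is your list of problem links to investigate.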
What do I know? I know that I eliminated duplicate content early in my blog’s history, so I’m going to assume some of the problems are with early content. One of my early posts on this blog is “Google, why do you vex us so?” Because “vex” is an uncommon word, searching the domain for it should turn up every indexed page that contains that post and reveal the pattern.
Bingo! We nailed it on the first try. It’s a date archive link (/2011/03). I’m betting a second search will turn up the rest of my issues.
Ready to check it? Use the same search, but this time look for 2011 content:
site:highonseo.com 2011
We did it! Archives are being indexed.
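If you repeat these checks often, you could generate the search links instead of retyping them. This is a small Python sketch: the three queries are the ones used above, SITE is a placeholder you would change to your own domain, and the link format is just Google’s ordinary search address, not an official API.

```python
from urllib.parse import quote_plus

SITE = "highonseo.com"  # placeholder: replace with your own domain

# The searches from this post: the whole domain, a rare word from an
# early post, and the year whose archives we suspect.
queries = [f"site:{SITE}", f"site:{SITE} vex", f"site:{SITE} 2011"]


def search_url(query: str) -> str:
    """Build a Google search link, URL-encoding the ':' and any spaces."""
    return "https://www.google.com/search?q=" + quote_plus(query)


for q in queries:
    print(search_url(q))
```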
Let’s stop that from happening.
With the All in One SEO plugin, it’s very simple.
The “Use noindex for archives” option was, in fact, unchecked. Checking that box tells Google not to index the archive pages. We can deal with the pages Google has already indexed in Google Webmaster Tools, but for now, let’s call this a step in the right direction.
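After ticking the box, it is worth confirming that the archive pages actually carry a noindex directive. SEO plugins typically do this with a robots meta tag in the page head, such as &lt;meta name="robots" content="noindex,follow" /&gt;, but the exact markup your plugin version emits is an assumption here, so view the source of an archive page to check. This sketch uses only Python’s standard library and parses an HTML string, so it runs offline; fetching a live page is left out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # tags like <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_noindexed(html: str) -> bool:
    """Return True if the page's robots meta tag asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical markup for an archive page after the fix, and for a normal post.
archive_page = '<html><head><meta name="robots" content="noindex,follow" /></head></html>'
post_page = "<html><head><title>A post</title></head></html>"

print(is_noindexed(archive_page))  # True
print(is_noindexed(post_page))     # False
```

A noindex can also be sent as an X-Robots-Tag HTTP header, which this sketch does not check.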
You can use the site:yourdomain.com operator to find duplicate content fairly easily. On blogs especially, most duplicate content comes from a few places: categories, tags, and archives. Search site:yoursite.com tags to find out whether you have hundreds or thousands of runaway links.
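The same idea scales to a whole list of indexed URLs: sort them by the kind of page they point to and see which bucket is overgrown. This is a minimal Python sketch, assuming WordPress’s usual pretty-permalink bases (/category/, /tag/, and /YYYY/ or /YYYY/MM/ date archives). If you have changed your category or tag base, adjust the patterns. The URL list is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed WordPress URL patterns; change them if your permalink settings differ.
PATTERNS = [
    ("category", re.compile(r"^/category/")),
    ("tag", re.compile(r"^/tag/")),
    # Date archives: /2011/, /2011/03/ or /2011/03/15/, optionally paginated.
    ("date archive", re.compile(r"^/\d{4}/(\d{2}/){0,2}(page/\d+/)?$")),
]


def classify(url: str) -> str:
    """Label a URL as category, tag, date archive, or other (posts and pages)."""
    path = urlparse(url).path
    if not path.endswith("/"):
        path += "/"
    for label, pattern in PATTERNS:
        if pattern.search(path):
            return label
    return "other"


# Hypothetical URLs copied from site: search results.
indexed = [
    "http://highonseo.com/2011/03/",
    "http://highonseo.com/2011/04/",
    "http://highonseo.com/tag/duplicate-content/",
    "http://highonseo.com/category/technical-tuesday/",
    "http://highonseo.com/example-post/",
]

counts = Counter(classify(u) for u in indexed)
for label, n in counts.most_common():
    print(label, n)
```

Note that a post under a /%year%/%monthnum%/%postname%/ permalink, such as /2011/03/my-post/, falls through to “other”, because the date-archive pattern only matches paths that end after the date.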
Find off-site duplicate content
What is off-site duplicate content? For today’s discussion, the main kind is “stolen content”: your text republished on someone else’s site.
We use a site called Copyscape to protect our work. The results of two Copyscape searches can be seen below. The first is a general search for my site; I don’t have a lot of text, so it always comes up with 0 results. The second shows what happens when content has been copied. We have a section of our vendor reviews called “in their own words” that summarizes a business, and we take this text directly from the vendor’s site as part of the review. You can see 4 results: pages that share the same text, in the same order, in a high enough percentage to be considered “copies.”
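Copyscape does not publish exactly how it decides that two pages are copies, so what follows is not its algorithm. It is a minimal Python sketch of one common way to measure “same text in the same order”: break each text into overlapping runs of words (shingles) and check what fraction of your text’s shingles also appear on the other page. The five-word shingle size and the sample sentences are arbitrary choices for illustration.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences of the given length."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def containment(original: str, suspect: str, size: int = 5) -> float:
    """Fraction of the original's shingles that also appear in the suspect text."""
    ours = shingles(original, size)
    if not ours:
        return 0.0  # text too short to compare
    return len(ours & shingles(suspect, size)) / len(ours)


original = "We take this directly from their site as part of their review."
copied = "Reviews: we take this directly from their site as part of their review!"
unrelated = "Duplicate content off your site does you no good."

print(containment(original, copied))     # 1.0
print(containment(original, unrelated))  # 0.0
```

Because shingles preserve word order, reshuffled words score low while a pasted paragraph scores high even when punctuation or surrounding text differs.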
Using Copyscape, we’ve found many, many sites infringing on our articles and copyright, and we’ve contacted their webmasters to have those duplicates removed or changed.
Duplicate content on your own domain may or may not hurt your SEO (that’s up for debate), but duplicate content off your site does you no good. Clean it up! We’ll explore other ways to find duplicate content later, but for now you have some homework.