on site optimization

Technical Tuesday: Find Duplicate Content

Posted on by Matt Antonino in All posts, SEO Tips, Wordpress Tips Comments Off

Technical Tuesday is a new series dedicated to fixing the coding or technical issues your site may be having.  Technical SEO accounts for a large portion of on-site SEO so make sure you check back every Tuesday for more.  Use the Technical Tuesday category on the right sidebar to find them all.

Use the “site:yoursite.com” operator to find on-site duplicate content

Type this into Google:  site:highonseo.com

Before this post, I had 96 results on this domain (since the change from PhotoSEO.info).

According to my WordPress Dashboard, I have

57 Posts
14 Pages
7 Categories
341 Tags

The issue is simple: 96 pages show up on Google, while 57 + 14 (posts + pages) = 71. Let’s find the problem links.
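To put a number on that gap, here is a minimal Python sketch of the same arithmetic. The function name `index_gap` is my own and purely illustrative. It assumes posts and pages are the only URLs you actually want indexed.

```python
def index_gap(indexed_results, posts, pages):
    """Return how many indexed URLs exceed the content you expect to be indexed."""
    expected = posts + pages  # assumption: only posts and pages should be indexed
    return indexed_results - expected


# Numbers from the WordPress Dashboard and the site: search above
print(index_gap(indexed_results=96, posts=57, pages=14))  # 25
```

A positive number means Google is indexing URLs beyond your real content, which is the hint to go hunting for archive, tag, or category pages.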

What do I know?  I know that I eliminated duplicate content early in my blog’s history so I’m going to assume some of the problems are with early content.  One of my early posts on this blog is “Google, why do you vex us so?”  Since “vex” is an unlikely word, if I search the domain with this word, I’m likely to stumble on the pattern.

finding duplicate content

Bingo! We nailed it first shot. It’s an archive link (/2011/03). I am betting a second search will nail the rest of my issues.

Ready to check it? Use the same search but check for 2011 content: site:highonseo.com 2011. We did it! Archives are being indexed.

finding duplicate content
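If you copy the URLs from a site: search into a list, a short script can sort them into buckets for you. This is only a sketch: the `/tag/` and `/category/` prefixes assume default WordPress permalink bases, and the sample URLs other than the /2011/03 archive are made up for illustration.

```python
import re
from urllib.parse import urlparse

# Assumed URL shapes: date archives like /2011/03, plus default WordPress bases
ARCHIVE_PATTERNS = {
    "date archive": re.compile(r"^/\d{4}(/\d{2}){0,2}/?$"),
    "tag": re.compile(r"^/tag/[^/]+/?$"),
    "category": re.compile(r"^/category/.+?/?$"),
}


def classify(url):
    """Label a URL as a date archive, tag, category, or regular content."""
    path = urlparse(url).path
    for label, pattern in ARCHIVE_PATTERNS.items():
        if pattern.match(path):
            return label
    return "content"


# Hypothetical sample of indexed URLs
for url in [
    "http://highonseo.com/2011/03",
    "http://highonseo.com/tag/seo/",
    "http://highonseo.com/some-post/",
]:
    print(classify(url), url)
```

Anything that is not labeled "content" is a candidate for noindex.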

Let’s stop that from happening.

Using the All in One SEO plugin, it’s very simple.

noindex

The “Use noindex for archives” option was, in fact, unchecked. By checking that box, we’ve told Google not to index archive pages. We can fix their current indexing in Google Webmaster Tools, but for now, let’s call that a step in the right direction.
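To confirm the setting took effect, you can look at the HTML of an archive page for a robots meta tag. The sketch below assumes the plugin emits a standard `<meta name="robots" content="noindex,...">` tag, and the HTML strings are invented samples rather than real output from this site.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_noindex(html):
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Invented sample markup for an archive page after checking the box
sample = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
print(is_noindex(sample))  # True
```

In practice you would fetch the archive URL and pass the response body to `is_noindex`. A `False` on an archive page means the box is still unchecked or a cache has not cleared.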

You can use the site:yourdomain.com operator to find duplicate content fairly easily. Most duplicate content, especially for blogs, comes from a few places: categories, tags, and archives. Search site:yoursite.com tags to find out whether you have hundreds or thousands of runaway links.
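If you run these checks regularly, it can help to generate the search URLs instead of typing them. A small sketch, assuming Google’s standard `/search?q=` URL format, with "archives" and "category" added as my own guesses for extra terms worth checking:

```python
from urllib.parse import urlencode


def site_query_url(domain, extra_terms=""):
    """Build a Google search URL for a site: query with optional extra terms."""
    query = f"site:{domain} {extra_terms}".strip()
    return "https://www.google.com/search?" + urlencode({"q": query})


# One URL per likely source of duplicate content
for term in ["", "tags", "category", "archives"]:
    print(site_query_url("yoursite.com", term))
```

Open each URL in a browser and compare the result counts against what your dashboard says you have.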

Find off-site duplicate content

What is off-site duplicate content? The main one for our discussion today is “stolen content.”

We use a site called Copyscape to protect our work. The results of two Copyscape searches can be seen below. The first is a general search for my site. I don’t have a lot of text, so it always comes up with 0 results. The second shows what happens if you have content stolen. We have a section of our vendor reviews called “in their own words” that summarizes a business. We take this text directly from the vendor’s site as part of their review. You can see 4 results: pages that have the same text, in the same order, in a high enough percentage to be considered “copies.”

Copyscape in use

vs.

Copyscape in use

Using Copyscape, we’ve found many, many sites infringing on our articles and copyright, and we’ve contacted their webmasters to have those duplicates removed or changed.
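I don’t know how Copyscape scores matches internally, but the idea of “same text in the same order in a high enough percentage” can be approximated with word shingles (overlapping runs of consecutive words). The sketch below is my own illustration, not Copyscape’s method. The function names, the shingle size of 5, and the sample strings are all assumptions.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word runs of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap_ratio(original, candidate, size=5):
    """Fraction of the original's shingles that also appear in the candidate."""
    source = shingles(original, size)
    if not source:
        return 0.0  # original is too short to compare
    return len(source & shingles(candidate, size)) / len(source)


mine = "We take this directly from their site as part of their review"
copy = "Intro added by someone else. " + mine
print(overlap_ratio(mine, copy))  # 1.0
```

A ratio near 1.0 means the other page reproduces your text almost word for word. Where you set the cutoff for calling something a “copy” is a judgment call.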

Duplicate info may not hurt your SEO on your own domain – that’s up for debate – but duplicate content off your site does you no good.  Clean it up!  We’ll explore other ways to find duplicate content later but you have some homework now.  


Google’s newest algorithm change – reducing black hat SEO

Posted on by Matt Antonino in All posts, SEO Tips 1 Comment

What Google algorithm update?

Yesterday, Google announced the complete release of a long-awaited over-optimization SEO penalty and algorithm update.

The main goal of this update is “to help searchers find sites that provide a great user experience and fulfill their information needs.” This update is relevant to our last post: Spams & Scams in SEO. Whenever Google updates its algorithm, if you read their announcements, the goal is always better sites with better content for end users. If you remember this mantra, you’ll understand SEO better:

Webmasters are not the consumers, they’re the product Google sells.

If you act as if Google will make changes FOR YOU, you’re wrong and will always be wrong.  When you realize that Google makes changes to ensure the best possible end-user (site visitor) experience, you’ve got it.


How does this affect my SEO through HighonSEO?

Fortunately, High on SEO shares the belief with Google that the best content should win.  Yes, you have to tell Google what’s on the page.  That will never change.    Our link building techniques and on-page SEO fixes are done correctly and for good reasons, however.  We believe, as Google does, that the best sites should show up first.

Today I analyzed data from over 20 clients, past and present.  I have determined that none of our sites were massively hit by the algorithm change.  We saw only typical slight movements in SERPs, as we always would after a month or two of not checking results.   The fact that not one site we’ve been working on for over 2 months has taken a dive shows us that we’re doing it “right” at High on SEO.

This is one client’s results, all updated this afternoon, post-algorithm change.  She lost one page 1 and gained five page 1 results.  I’ll take that for my clients any day.

seo updates after algorithm change for one client


What will Google do next to their search rankings?

We don’t know. Not really, anyway. The best way to keep up with Google algo changes and ensure your site is still ranking well after using an SEO service is monthly SEO updates.

What we do know about the future is that Google’s customers (search users) want the best, most relevant results at the top of the SERPs. Create pages that are worth visiting. Stop webspamming because it’s going to be virtually worthless anyway. SEO is here to stay – the question is how we’re going to do it “right” now and in the future. White hat SEO works. Use it properly and remember Google’s own mantra, “don’t be evil.”