On my personal finance blog, Gather Little by Little, I often find my copyrighted content being scraped by other sites. If you aren’t familiar with the concept of scraping, it’s basically when someone takes your blog’s content and posts it on their blog in order to use it to make money.
Typically, the scraper uses a WordPress plugin that reads your blog’s RSS feed and automatically places your content on their blog. Shockingly, there is even software that will scrape your entire blog (theme and all) and copy it to the scraper’s site so they have an exact replica of your site and content!
Combating scrapers
There are multiple ways of dealing with scrapers. Some are more effective than others, and some depend on where the hosting company is located. Here are the ones I typically use, in the order that I use them:
1 – Send them an email
Some (although not many) will stop scraping your site if you just take the time to ask them to stop. This is always the first thing I do, and then I give them a few days to reply. Some scraper sites don’t offer an email address or contact page, so this isn’t always an option. If you can’t contact them or they don’t respond, on to step 2.
2 – Report them to their host
Hosting companies in the US are required to respond to reports of copyright violations. Technically, US law requires a formal DMCA notification, but many hosting companies will act on a simple email that contains the URL of the violating site along with examples of the copyright violation, including links to the original content. Not sure who’s hosting the site? Use Who is Hosting This.
3 – Report them to Google Adsense
Most scrapers are using your content in order to run Google Adsense for revenue. One effective way to hit the scraper’s bottom line is to report them to Adsense, as using copyrighted content is against Google’s terms of service.
To report a blog to Google:
- Click on the “Ads by Google” text located on Adsense ads.
- Next, scroll down to the bottom of the window that opens and select “Send Google your thoughts on the site or the ads you just saw”.
- Scroll down a little further now and select “Also Report a Violation?” followed by “The ads”.
- This will allow you to check “The site is hosting/distributing my copyrighted content”.
I don’t recommend entering your email address, as this will cause Google to just send you an email requesting a formal DMCA notification, which is a pain. If you leave out your email, they will typically suspend the blog owner’s Adsense account right away.
Image replacement
The final weapon I use after I’ve done all of the above is image replacement. Every article I write on Gather Little by Little contains a 500×150 graphic just below the article title. This has become a “signature style” for me. The advantage of this style is that I can really hit the scraper. Here’s how.
First, create a replacement image. The one I use on Gather Little by Little is a “Stolen content” graphic.
Upload this graphic to a directory on your blog’s webserver. I put mine in a directory off the root named “wp-images” and named it stolen.jpg.
Next, you’ll need to FTP to your blog’s server and edit the .htaccess file located in the root directory. Mine happens to be \httpdocs (I’m on Media Temple). Using a text editor, add the following to your .htaccess file:
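# Turn on URL rewriting
RewriteEngine On
# Match image requests referred by any of these sites (or their subdomains), over http or https
RewriteCond %{HTTP_REFERER} ^https?://(.+\.)?site1\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(.+\.)?site2\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(.+\.)?site3\.com/ [NC]
# Serve the replacement graphic instead of the requested image
RewriteRule .*\.(jpe?g|gif|bmp|png)$ /wp-images/stolen.jpg [L]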
The rules above take any image request referred by site1.com, site2.com, or site3.com (that is, any image on your server that one of their pages loads directly) and reroute it to your stolen.jpg (or whatever you named it) file. To block additional sites, add more “RewriteCond” lines with the siteX.com part replaced by the additional site’s domain. Every “RewriteCond” line except the last one needs the OR flag.
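If your list of scrapers keeps growing, typing the conditions by hand gets error-prone (it is easy to forget the OR flag). Here is a minimal sketch in Python that prints the rules for a list of domains. The function name build_hotlink_rules and the example domains are my own placeholders, not part of the original technique, and the pattern uses https? so that both http and https referrers match.

```python
def build_hotlink_rules(domains, replacement="/wp-images/stolen.jpg"):
    """Return .htaccess lines that swap images requested from the given domains.

    Hypothetical helper for illustration; adjust the replacement path to
    wherever you uploaded your own graphic.
    """
    if not domains:
        raise ValueError("at least one domain is required")
    lines = ["RewriteEngine On"]
    for i, domain in enumerate(domains):
        # Every condition except the last one is chained with the OR flag.
        flags = "[NC]" if i == len(domains) - 1 else "[NC,OR]"
        # Escape the dots so they match literally in the regular expression.
        escaped = domain.replace(".", r"\.")
        lines.append(
            f"RewriteCond %{{HTTP_REFERER}} ^https?://(.+\\.)?{escaped}/ {flags}"
        )
    lines.append(f"RewriteRule .*\\.(jpe?g|gif|bmp|png)$ {replacement} [L]")
    return lines


# Example: print rules for two placeholder scraper domains.
print("\n".join(build_hotlink_rules(["site1.com", "site2.com"])))
```

Paste the printed lines into your .htaccess file in place of the hand-written ones.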
If you only want to block one site for now, use the following code:
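RewriteEngine On
# Match image requests referred by site1.com or its subdomains, over http or https
RewriteCond %{HTTP_REFERER} ^https?://(.+\.)?site1\.com/ [NC]
# Serve the replacement graphic instead of the requested image
RewriteRule .*\.(jpe?g|gif|bmp|png)$ /wp-images/stolen.jpg [L]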
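Before uploading, you may want to check which referrers the condition would actually catch. The sketch below tries the referrer pattern against a few sample URLs in Python. It is only an approximation: Apache uses its own regular expression engine, although for a pattern this simple I would expect the two to agree. The name is_blocked and the sample URLs are made up for illustration, re.IGNORECASE stands in for the [NC] flag, and the pattern uses https? so that https referrers are covered too.

```python
import re

# Referrer pattern for one scraper domain; IGNORECASE plays the role of [NC].
REFERER_PATTERN = re.compile(r"^https?://(.+\.)?site1\.com/", re.IGNORECASE)


def is_blocked(referer):
    """Return True if an image request with this referrer would be swapped."""
    return REFERER_PATTERN.search(referer) is not None


# Placeholder referrers to try the pattern against.
samples = [
    "http://site1.com/stolen-post/",  # the bare domain
    "https://www.site1.com/page",     # a subdomain over https
    "HTTP://WWW.SITE1.COM/",          # case differences are ignored
    "http://notsite1.com/",           # a different domain that ends the same way
    "http://example.com/",            # an unrelated site
]
for referer in samples:
    print(referer, "blocked" if is_blocked(referer) else "allowed")
```

The first three samples are caught and the last two are left alone, so regular visitors and other sites that link to your images are not affected.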
Wouldn’t you love to see the look on those scrapers’ faces when they see their blog full of “Stolen content” images?
I’d like to thank @capitalfellow on Twitter for making me aware of this technique and AltLab for providing the article with the .htaccess code.