On my personal finance blog, Gather Little by Little, I often find my copyrighted content being scraped by other sites. If you aren’t familiar with the concept of scraping, it’s basically when someone takes your blog’s content and posts it on their blog in order to use it to make money.
Typically, the scraper uses a WordPress plugin that reads your blog’s RSS feed and automatically places your content on their blog. Shockingly, there is even software that will scrape your entire blog (theme and all) and copy it to the scraper’s site so they have an exact replica of your site and content!
Combating scrapers
There are multiple ways of dealing with scrapers. Some are more effective than others, and some depend on where the hosting company is located. Here are the ones that I typically use, in the order that I use them:
1 – Send them an email
Some (although not many) will stop scraping your site if you just take the time to ask them to stop. This is always the first thing I do, and then I give them a few days to reply. Some scraper sites don’t offer an email address or contact page, so this isn’t always an option. If you can’t contact them or they don’t respond, on to step 2.
2 – Report them to their host
Hosting companies in the US are required to respond to reports of copyright violations. Technically, US law says that a formal DMCA notification is required, but many hosting companies will respond to a simple email that contains the URL of the violating site along with examples of the copyright violation, including links to the original content. Not sure who’s hosting the site? Use Who is Hosting This.
3 – Report them to Google Adsense
Most scrapers are using your content in order to run Google Adsense for revenue. One effective means of really hitting the scraper in the bottom line is reporting them to Adsense, as using copyrighted content is against Google’s terms of service.
To report a blog to Google:
- Click on the “Ads by Google” text located on Adsense ads.
- Next, scroll down to the bottom of the window that opens and select “Send Google your thoughts on the site or the ads you just saw”.
- Scroll down a little further and select “Also Report a Violation?” followed by “The ads”.
- This will allow you to check “The site is hosting/distributing my copyrighted content”.
I don’t recommend entering your email address, as Google will then just send you an email requesting a formal DMCA notification, which is a pain. If you leave out your email, they will typically suspend the blog owner’s Adsense account right away.
Image replacement
The final weapon I use after I’ve done all of the above is image replacement. Every article I write on Gather Little by Little contains a 500×150 graphic just below the article title. This has become a “signature style” for me. The advantage of this style is that I can really hit the scraper. Here’s how.
First, create a replacement image. The one I use on Gather Little by Little looks as follows:
Upload this graphic to a directory on your blog’s webserver. I put mine in a directory off the root named “wp-images” and named it stolen.jpg.
Next, you’ll need to FTP to your blog’s server and edit the .htaccess file located in the root directory. Mine happens to be \httpdocs (I’m on Media Temple). Using a text editor, add the following to your .htaccess file:
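RewriteEngine On
# Match image requests referred by one of the scraper sites (http or https).
RewriteCond %{HTTP_REFERER} ^https?://(.+\.)?site1\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(.+\.)?site2\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(.+\.)?site3\.com/ [NC]
# Serve the replacement image instead of the one that was requested.
RewriteRule .*\.(jpe?g|gif|bmp|png)$ /wp-images/stolen.jpg [L]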
The rules above take any image request referred by site1.com, site2.com, or site3.com and reroute it to your stolen.jpg (or whatever you named it) file. To add additional sites, just add additional “RewriteCond” lines and replace the siteX.com part with the additional site’s domain. Every “RewriteCond” line except the last one needs the [NC,OR] flags; the last one takes only [NC].
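If your list of scrapers keeps growing, it is easy to get the flags wrong when editing by hand. Here is a minimal Python sketch that prints the rules for a list of domains. The function name build_hotlink_rules is my own invention, the domains are placeholders, and the pattern is written with https? so that referers from both http and https pages match (an adjustment on my part). Treat it as a starting point, not a finished tool.

```python
def build_hotlink_rules(domains, replacement="/wp-images/stolen.jpg"):
    """Return .htaccess lines that swap images requested from the given domains.

    Hypothetical helper for illustration; the domains are placeholders.
    """
    if not domains:
        raise ValueError("need at least one domain")
    lines = ["RewriteEngine On"]
    for index, domain in enumerate(domains):
        # Escape the dots so "site1.com" only matches a literal dot.
        escaped = domain.replace(".", "\\.")
        # Every condition except the last carries OR; the last is plain NC.
        flags = "[NC,OR]" if index < len(domains) - 1 else "[NC]"
        lines.append(
            "RewriteCond %{HTTP_REFERER} ^https?://(.+\\.)?" + escaped + "/ " + flags
        )
    lines.append("RewriteRule .*\\.(jpe?g|gif|bmp|png)$ " + replacement + " [L]")
    return "\n".join(lines)


# Print rules for three placeholder scraper domains.
print(build_hotlink_rules(["site1.com", "site2.com", "site3.com"]))
```

Paste the printed lines into your .htaccess file in place of the hand-written ones, and rerun the script whenever you add a site to the list.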
If you only want to block one site for now, use the following code:
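RewriteEngine On
# Match image requests referred by the scraper site (http or https).
RewriteCond %{HTTP_REFERER} ^https?://(.+\.)?site1\.com/ [NC]
# Serve the replacement image instead of the one that was requested.
RewriteRule .*\.(jpe?g|gif|bmp|png)$ /wp-images/stolen.jpg [L]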
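Before uploading anything, you may want to check which referers the condition would actually catch. The sketch below runs a pattern of the same shape through Python’s re module, with re.IGNORECASE standing in for the [NC] flag. Two assumptions: the scheme is written as https? so that https pages are covered too, and Python’s regex engine behaves like Apache’s for a pattern this simple. The function name and sample URLs are made up for illustration.

```python
import re

# Pattern shaped like the RewriteCond, for the placeholder domain site1.com.
# re.IGNORECASE plays the role of the [NC] flag.
REFERER_PATTERN = re.compile(r"^https?://(.+\.)?site1\.com/", re.IGNORECASE)


def would_be_replaced(referer):
    """True if an image request carrying this Referer would get stolen.jpg."""
    return REFERER_PATTERN.search(referer) is not None


# Made-up sample referers: the bare domain, a subdomain, mixed case,
# a lookalike domain, a URL without a trailing slash, and no referer at all.
samples = [
    "http://site1.com/some-post/",
    "https://www.site1.com/",
    "HTTP://SITE1.COM/page",
    "http://notsite1.com/",
    "http://site1.com",
    "",
]
for referer in samples:
    print(repr(referer), would_be_replaced(referer))
```

Note that a request with no referer at all never matches, so the replacement only shows up for visitors whose browsers send one.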
Wouldn’t you love to see the look on those scrapers’ faces when they see their blog full of “Stolen content” images???
I’d like to thank @capitalfellow on Twitter for making me aware of this technique and AltLab for providing the article with the .htaccess code.

4 comments
If I have a Blogger-hosted blog, is there still a way to add some sort of image or note to my posts?
Nice work. If you’re feeling particularly nasty, you can also use this to force them to display images that are against Adsense TOS (like porn, for example) and then report them. Google will likely respond to something like this more quickly than a copyright violation.
Thanks for the information. I had no idea this practice was even happening. I will have to be more aware of my content from this point forward.
I may be a new blogger, but I’ve been online a long time and I thought the term was called scraping, so hopefully I understand you correctly.
Another tip or 2 is:
1) If you run your own web site, I would make the stolen.jpg file (I’ve been using the same name on my ecommerce site for about 10 years) BIG HUGE MASSIVE … over 1 megabyte or more if you can. I find the majority of scrapers are not North American and more than likely they are still paying for internet usage and/or have slower connections than we’re accustomed to … so a really big file will force many people to bail out on the site before it gets a chance to load … this, too, will hurt them in the proverbial pocketbook;
b) This shouldn’t have much effect on your own server’s bandwidth if the image is cached on the other end. But certainly it is worth monitoring if you find your content stolen 100s or 1000s of times.
2) If you think your site has no value outside of North America; meaning that someone in Malaysia will find it of little value, I also recommend using Google’s “no translate” meta tag — doing a Google search will get you enough results;
b) Also, many thieves in well known countries use the “translate this page” to find all their electronic goodies to steal … so if they can’t translate your site, they’ll move on to the next.
Good luck, and Happy Holidays!