It appears that there’s yet another web site content scraper being let loose on the web. If, like me, you’ve been around a few years, you’ll know that scrapers come and go. In my 16 years of web development and use I’ve seen a fair few.
The latest one on the marketplace is Summly. This application can be downloaded for free to use on Apple products. But if you read the web site you’ll see they have plans for delivering your content through the application on desktop computers and other outlets.
Summly app aims to take your hard work and original content.
Yep, Summly is planning on visiting your web site, scraping your copyrighted content and works, and delivering it through their application, presumably to make money off your hard work and investment.
Tackling standard scrapers is pretty easy: you fire off Digital Millennium Copyright Act (DMCA) requests to the culprits and the relevant search engines. While Summly runs on a different platform than other scrapers, it is still doing the same thing as a scraper: collecting your content for its own use.
The business is trying to pass itself off as organising search and making it easier and quicker than normal search engines, but that is not what it is doing. It is a content scraping platform and nothing more.
Summly fails in the fair use defense.
There is a big difference in the way that Summly plans to work compared to search engines. When a search engine such as Google puts together a snippet about your site to display under the search result, it stays within fair use of copyrighted material. The snippet is either taken from your content metadata or pulled from small sections of sentences on your web page surrounding the keyword searched for. The maximum snippet size is about 160 characters.
Summly, though, does much more than this. It takes all your content, re-organises it and then delivers it to the web searcher. This is much more extensive than a Google search snippet. It clearly breaks the fair use definition as it is purely producing content based on your work for its own ends.
Original content publishers will lose out because of Summly.
Web content publishers who spend time and money in producing original content will lose web site visitors through the actions of Summly and they will lose financially and commercially.
If you are a web site owner and concerned about the impact of Summly on publishers and the loss of visitors then please get in touch. I’m currently monitoring the traffic logs and stats of all my web sites to find out how Summly is affecting them.
Summly should pay a licensing fee to content creators instead of just taking it.
The more web site owners that do the same, the more evidence we can build up to take legal action against Summly for financial loss and to require them to pay a licensing fee to access and use original web site content, in much the same way that radio stations and TV stations pay a royalty fee.
I have no problems with Summly wanting to deliver a précis copy of my web content but I do expect them to pay for my time and work. If you want to ensure that your original content is not taken from you without payment then please contact me through the form below and we can start to set up a group to get fair compensation for our work.
Remember, this is just another in a long line of content scrapers. Few have survived in the long run because recycling other people’s work rarely provides the quality information that people are looking for.
Update: 29th December
Protect your original content.
You’ve done all the hard work and all the research for your piece of work. Whether you write articles, reviews or news stories, it’s cost you time and effort. You really do need to stop Summly from coming along and cherry-picking all your valuable information. It really will cost you money.
Could you afford to lose 50%, 60%, 70% or even 80% of your web visitors because they prefer to read the cherry-picked content from your work rather than coming to read the original version?
What would a 60% or 70% loss of traffic to your site do to your earnings? Will you still be able to pay your staff? Will a crash in advertising earnings mean you can no longer afford your hosting fees? Will that round-the-world trip of a lifetime, paid for by blogging about your experiences, come to a grinding halt as people stop clicking through to your site and keep up to date with your exploits through Summly instead?
How to beat Summly.
As with all content scrapers, it will be fairly easy to beat Summly. Once we know the footprint that a Summly scraper leaves behind, it’s simply a question of writing up some .htaccess rewrite rules so that the scraper gets the information we want it to have rather than cherry-picking the information it wants.
We can send the scraper off to another web page to give it totally different information. We could run the Summly scraper through a filter so all the important information is removed. We can also deliver a personal message to people who click through from a Summly link, reminding them that they are supporting copyright theft and stopping writers from being paid the right amount of money for their work.
Lots of things can be done to protect and defend our own works once we know the server footprint of Summly. Remember that as a webmaster you control what happens on your server.
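To make the three options above a little more concrete, here is a minimal sketch of the decision logic. It is written in Python rather than as .htaccess rewrite rules, purely so it can be run and tested on its own. The two marker strings are made up for illustration: the real Summly footprint is not known yet, so `examplescraperbot` and `scraper.example` are placeholders you would swap for whatever the logs eventually reveal.

```python
# Hypothetical markers: the real scraper footprint is not known yet,
# so these values are placeholders for illustration only.
SCRAPER_AGENT_MARKER = "examplescraperbot"   # assumed User-Agent fragment
SCRAPER_REFERER_MARKER = "scraper.example"   # assumed referring domain


def choose_response(user_agent, referer):
    """Decide what to serve for a request.

    Returns one of:
      "decoy"  - the request looks like the scraper itself, so send it
                 to another page or a filtered version of the content
      "notice" - a visitor clicked through from a scraper link, so show
                 them a personal message first
      "normal" - everyone else gets the ordinary page
    """
    agent = (user_agent or "").lower()
    ref = (referer or "").lower()

    if SCRAPER_AGENT_MARKER in agent:
        return "decoy"
    if SCRAPER_REFERER_MARKER in ref:
        return "notice"
    return "normal"
```

The same checks map naturally onto rewrite conditions on the User-Agent and Referer headers once the real values are known. Bear in mind that both headers are supplied by the client, so a scraper that changes or hides them would slip past a check like this.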
Help by sending us your server logs.
We can only start to advise webmasters on how to beat Summly once we have a full understanding of how their scraper interacts with the server. We are currently monitoring all our server logs for tell-tale signs of the scraper. If you think you may have had a visit from Summly then please let us know.
We’d greatly appreciate core server logs of the visit so we can see what it does and whether there are any consistent patterns that we can use to recognise the robot. Once we’ve discovered a pattern, we can start writing up and publishing some server scripting for you to use to control what Summly sees and delivers to its users.
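If you want to look through your own logs before sending them, a rough sketch like the one below may help. It assumes your server writes the common Apache "combined" log format (yours may differ), and it simply counts requests and distinct IP addresses per User-Agent string so that unusual robots stand out. The sample lines, the `ExampleScraperBot/0.1` agent and the function name `summarise_agents` are all invented for illustration; they are not real Summly data.

```python
import re
from collections import Counter, defaultdict

# Assumes the Apache "combined" log format; adjust if your server differs.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarise_agents(lines):
    """Return (user_agent, request_count, distinct_ip_count) tuples,
    busiest user agent first. Lines that do not parse are skipped."""
    hits = Counter()
    ips = defaultdict(set)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue
        agent = match.group("agent")
        hits[agent] += 1
        ips[agent].add(match.group("ip"))
    return [(agent, count, len(ips[agent])) for agent, count in hits.most_common()]


# Invented sample data, using reserved documentation IP addresses.
SAMPLE_LINES = [
    '203.0.113.5 - - [01/Jan/2000:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "ExampleScraperBot/0.1"',
    '203.0.113.5 - - [01/Jan/2000:10:00:01 +0000] "GET /b HTTP/1.1" 200 734 "-" "ExampleScraperBot/0.1"',
    '198.51.100.7 - - [01/Jan/2000:10:00:02 +0000] "GET /c HTTP/1.1" 200 291 "-" "ExampleScraperBot/0.1"',
    '192.0.2.44 - - [01/Jan/2000:10:05:00 +0000] "GET /a HTTP/1.1" 200 512 "http://www.example.org/" "Mozilla/5.0"',
    'this line is not a log entry',
]

if __name__ == "__main__":
    for agent, count, ip_count in summarise_agents(SAMPLE_LINES):
        print(f"{count:5d} requests from {ip_count} IP(s): {agent}")
```

A user agent that makes a lot of requests in a short time, from only a handful of addresses, is the kind of pattern worth a closer look. It is not proof of anything by itself, since ordinary search engine robots behave in much the same way.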
Join us to help content writers protect their works.
I’ll shortly be launching a newsletter and discussion group about how to tackle and campaign against scrapers of all sorts. You can be sure of one thing: if Summly is successful in its business model of using other people’s content then there will be a great deluge of other scraper sites coming onto the market, using your content for their profit.
You can help by spreading the word and linking back to this page, which will be the sign up page for our group once it’s launched. You can use the images below to provide graphic links here by right-clicking on the size you want and then copying and pasting the code shown.








Sour Grapes?
The logic of this article is that no one should write a review of an author or publisher’s original content without paying them. I am not aware that this has ever happened. Indeed reviews or summaries draw people to the original content. The thesis of Nick D’Aloisio is that people are too busy to get at this content. So summaries will actually help them to reach and read the original content rather than keep them away.
I think your article comes across as a little too strident. There are an increasing number of people with information overload. Anything that helps address the problem should be positively engaged with in my view.
Chris Graves
Hi
I don’t think you actually understand what Summly is doing.
It is not doing a review of a web site. Reviews generally mean someone looking at it and giving a critique and views. This is nothing but an automated script scraping a web site’s content to re-organise and distribute it under its own banner.
Neither is it a summary of a web site as such. Giving a summary of third party content generally means that it is part of a greater work. It does not mean using it as a standalone piece of work without putting in your own input.
Summly is nothing but a scraper of other people’s original content, and it will hurt financially those web sites that produce the original content and rely on advertising.
It contributes no new content to the internet and will reduce the amount of high quality original content available on the internet.
This is not sour grapes; I have seen dozens of these script-kiddie works in my time. What has not happened before, though, is big money chasing this sort of plagiarism of work. For that reason alone publishers need to move fast and put the company before the courts to justify its business model.
Kevin