A scraper site is a website that offers no original or useful content of its own. Such a site is usually automated, and its content is updated by bots that crawl the web. All the content shown on a scraper site is taken without permission from other open-content websites and their webmasters.

Unlike a search engine, a scraper site does not direct visitors to the original site the content came from. Scraper sites do not respect copyright, and they repost content without the original authors’ names or other identifying information. In fact, the main purpose of a scraper site is to spam search engines and so earn money through advertising programs such as AdSense. Because of this, sites that are rich in content and open to the public are the most vulnerable to this type of activity.

MFA

Some scraper sites are called MFAs, or Made for AdSense sites. The name comes from the fact that these websites are created for the sole purpose of generating clicks on the advertisements they carry. Scraper sites dilute search results: they fill the listings with redundant copies of the same content, so users get unsatisfactory answers to their queries. Because of this, search engines are implementing ways to remove scraper sites from their results.
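To make the idea of redundant results concrete, here is a minimal sketch of one well-known way to measure how much two pages overlap: compare their sets of word shingles with Jaccard similarity. This is an illustration only, not a description of what any particular search engine does. The function names, the shingle size of three words, and the sample texts are all assumptions made for the example.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles two sets have in common (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sample texts, invented for this example.
original = "scraper sites copy content from other websites without permission"
scraped = "scraper sites copy content from other websites without permission"
unrelated = "the weather today is sunny and mild across the region"

copied_score = jaccard(shingles(original), shingles(scraped))
unrelated_score = jaccard(shingles(original), shingles(unrelated))
print(copied_score, unrelated_score)
```

A page copied verbatim scores at the top of the range and an unrelated page at the bottom. Partially scraped pages fall in between, so any cut-off used to flag them would be a judgment call.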

Link Farms

Scraper sites are also sometimes referred to as link farms because their goals and tactics are similar. A link farm is basically a collection of web pages that all link to one another. The dense cross-linking spams the search engines and is meant to raise the ranking of the pages involved. Link farms are, however, different from ordinary websites that selectively swap links. As search engines have introduced new algorithms for ranking pages, link farming has evolved into other forms of spam indexing, that is, spamming the search engines.
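The difference between a link farm and a group of sites that swap links selectively can be sketched as a property of the link graph. The example below is an illustration under stated assumptions, not a real detection method: it counts what fraction of page pairs in a group link to each other in both directions. The function name and the `.example` host names are hypothetical.

```python
from itertools import combinations


def reciprocal_link_ratio(links, pages):
    """Fraction of page pairs in `pages` that link to each other both ways.

    `links` maps each page to the set of pages it links to.
    """
    pairs = list(combinations(list(pages), 2))
    if not pairs:
        return 0.0
    mutual = sum(
        1 for a, b in pairs
        if b in links.get(a, set()) and a in links.get(b, set())
    )
    return mutual / len(pairs)


# Hypothetical link graphs, invented for this example.
farm = {
    "a.example": {"b.example", "c.example"},
    "b.example": {"a.example", "c.example"},
    "c.example": {"a.example", "b.example"},
}
selective = {
    "x.example": {"y.example"},
    "y.example": {"x.example"},
    "z.example": set(),
}

farm_ratio = reciprocal_link_ratio(farm, farm)
selective_ratio = reciprocal_link_ratio(selective, selective)
print(farm_ratio, selective_ratio)
```

In the farm every pair of pages is mutually linked, while in the selective group only one pair out of three is. Real link graphs are far larger and noisier, so a single ratio like this would be at best one signal among many.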

Legal Concerns

The activities of sites of this kind have raised legal concerns within the Internet community, because scraper sites take content without the knowledge of the original author and without regard for copyright law. In general, copyright law requires anyone who republishes another person’s work from blogs, forums, websites and similar sources to obtain the original author’s permission. The terms of that permission commonly require the repost to carry pertinent information as well, such as the author’s name and the license.

Modus Operandi

Scraper sites usually target other websites or RSS feeds according to how they rank in the search results for specific keywords. They then copy bits of text containing the keyword from the original sites that ranked high in those results. Because scraper sites copy only bits and pieces of content, the advertisements are usually the only comprehensible content on the page. Visitors are thus induced to click on the ads, since these are the only words on the page that look like they can deliver the content the visitor needs.
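The effect of copying only the bits of text around a keyword can be shown with a short sketch. It is included to explain why scraped pages read as disconnected fragments, and it makes no claim about how any actual scraper is written. The function name, the `radius` parameter and the sample text are assumptions made for the example.

```python
import re


def keyword_snippets(text, keyword, radius=40):
    """Return the fragment of `text` around each occurrence of `keyword`.

    `radius` is the number of characters kept on each side of the match.
    """
    snippets = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(0, match.start() - radius)
        end = min(len(text), match.end() + radius)
        snippets.append(text[start:end].strip())
    return snippets


# Hypothetical source text, invented for this example.
source = (
    "Our guide compares hiking boots for wet trails. "
    "Waterproof hiking boots need a membrane and sealed seams. "
    "We also cover socks and gaiters."
)

fragments = keyword_snippets(source, "hiking boots", radius=15)
for fragment in fragments:
    print(fragment)
```

Each fragment is cut at a fixed distance from the keyword, so it starts and ends in the middle of a sentence. A page assembled from many such fragments contains the keyword often but says nothing coherent, which is the situation described above.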

Scraper sites are easy to set up: bots designed specifically for scraping can compile thousands of pages based on popular keywords. As a result, the owners of these sites have profited greatly from this type of operation.