Ins and Outs of Web Scraping
Mar 23, 2009 | Information Security
Introduction to Web Scraping
What is Web Scraping?
It is a programming technique for collecting data from any web page. It works like a hidden browser whose input and output are controlled by a program. The program receives the HTML returned by a web page and extracts the required data from it. Web scraping is usually used to collect data from a website that does not provide an RSS feed or an open API. It lets software use any data that is available on the web as HTML.
Web scraping also works with password-protected web pages, provided the program has the password needed to access them. Web scraping can do almost everything that a human can do on a website through a browser such as Internet Explorer (IE) or Mozilla.
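To make the "hidden browser" idea concrete, here is a minimal sketch in Python using only the standard library. It assumes the data you want sits inside one known tag (an `<h2>` headline in the sample); the names `TagTextCollector`, `fetch_html` and `extract_tag_text`, and the `example-scraper/0.1` user agent, are made up for this illustration and do not come from any particular tool.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TagTextCollector(HTMLParser):
    """Collects the text found inside every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # greater than 0 while inside the target tag
        self.buffer = []    # text pieces of the element being read
        self.results = []   # finished text, one entry per element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())
                self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def fetch_html(url):
    """Download a page as a browser would, but under program control."""
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_tag_text(html, tag):
    """Return the text of every <tag> element found in the HTML string."""
    parser = TagTextCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.results


# Offline sample standing in for the HTML a real page would return.
SAMPLE_HTML = (
    "<html><body><h2>First story</h2><p>Body text</p>"
    "<h2>Second <b>story</b></h2></body></html>"
)
headlines = extract_tag_text(SAMPLE_HTML, "h2")
```

Against a live site the same extraction would run on the result of `fetch_html(url)` instead of the sample string. Real pages are often messier than this sample, so a scraper usually needs rules tailored to each site's markup.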
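A password-protected page can be sketched the same way: the program submits the login form, keeps the session cookie it receives, and then requests the protected page. The example below is only an illustration under several assumptions: the site uses a plain HTML form login, the form fields are called `username` and `password`, and the function names are hypothetical. Real sites differ in field names and may add tokens or other checks.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def make_session():
    """Return an opener that remembers cookies between requests."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


def build_login_request(login_url, username, password):
    """Build the POST request a browser would send from the login form."""
    # Assumption: the form fields are named "username" and "password".
    form = urlencode({"username": username, "password": password})
    return Request(
        login_url,
        data=form.encode("utf-8"),  # supplying data makes this a POST
        headers={"User-Agent": "example-scraper/0.1"},
    )


def fetch_protected_page(login_url, page_url, username, password):
    """Log in, then download a page that requires the session cookie."""
    session = make_session()
    login = build_login_request(login_url, username, password)
    session.open(login, timeout=10).close()
    with session.open(page_url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

The cookie jar plays the role of the browser's session: without it, the second request would arrive as an anonymous visitor and the site would show the login page again.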
Why it is Important?
Web scraping is essential when someone needs to go through a huge number of websites and collect data from them. It can be used to automatically spider through thousands of pages and collect the required data in a fraction of the time it would take someone to gather it manually.
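Spidering can be sketched as a loop that fetches a page, collects its links, and queues the ones it has not seen yet. The Python example below is a simplified illustration, not a production crawler: `crawl` and `LinkCollector` are hypothetical names, the `fetch` argument is any function that returns HTML for a URL, and the sketch assumes you only want pages on the starting host and at most `max_pages` of them.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first visit of pages on the starting host.

    Returns a dict mapping each visited URL to its HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute = urldefrag(urljoin(url, href)).url
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# A tiny in-memory "site" so the sketch runs without network access.
FAKE_SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="http://other.test/x">X</a>',
    "http://example.test/a": '<a href="/">Home</a> <a href="b#top">B</a>',
    "http://example.test/b": "No links here",
}
visited = crawl("http://example.test/", FAKE_SITE.__getitem__)
```

Passing the fetch function in as a parameter keeps the crawling logic separate from the downloading, so the same loop can be pointed at real pages or at test data. A crawler aimed at real sites would normally also pause between requests to avoid overloading the server.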
Is it Legal?
The legality of web scraping is fairly questionable. In a way, it can be seen as stealing information owned by a website. The whole issue is complicated because it is unclear where copy/paste ends and scraping begins. Moreover, web scraping cannot reach any web content that the scraper is not allowed to access. It is okay for people to copy and save information from web pages, but it might not be legal to have software do this automatically. Scraping a page and then offering a service that leverages the information while hiding and not crediting the original source is unlikely to be legal.
But it does not seem that scraping is going to stop, because the main purpose of this technology is to turn time-consuming manual work into a quick, automated process. Moreover, web scraping has been common on the web for more than five years.


