Monday 6 May 2013

Scraping

Scraping, or "web scraping," is the process of extracting large amounts of information from a website. This may involve downloading several web pages or the entire site. The downloaded content may include just the text from the pages, the full HTML, or both the HTML and images from each page.

There are many different methods of scraping a website. The most basic is manually downloading web pages. This can be done by either copying and pasting the content from each page into a text editor or using your browser's File → Save As… command to save local copies of individual pages. Scraping can also be done automatically using web scraping software. This is the most common way to download a large number of pages from a website. In some cases, bots can be used to scrape a website at regular intervals.
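As a rough illustration of the automated approach, here is a minimal Python sketch that saves the HTML of a list of pages to a local folder, which is essentially a scripted version of File → Save As…. The function names (fetch_html, local_name, save_pages), the User-Agent string, and the one-second pause between requests are assumptions made for this example, not part of any particular scraping tool.

```python
import time
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_html(url, timeout=10):
    """Download one page and return its raw bytes."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-scraper/0.1"}  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def local_name(url):
    """Turn a URL path into a flat file name ('/docs/intro' -> 'docs_intro.html')."""
    path = urlparse(url).path.strip("/")
    name = path.replace("/", "_") or "index"
    return name if name.endswith((".html", ".htm")) else name + ".html"


def save_pages(urls, out_dir, fetch=fetch_html, delay=1.0):
    """Save each URL into out_dir, pausing between requests to go easy on the server."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # assumed politeness delay, adjust as needed
        target = out / local_name(url)
        target.write_bytes(fetch(url))
        saved.append(target)
    return saved
```

Calling save_pages with a list of page addresses and a folder name such as "archive" would write one .html file per page into that folder. Running the same script from a scheduler is one simple way to scrape a site at regular intervals. This sketch only saves the HTML; images and other assets would need to be downloaded separately.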

Web scraping may be done for several different purposes. For instance, you may want to archive a section of a website for offline access. By downloading several pages to your computer, you can read them at a later time without being connected to the Internet. Web developers sometimes scrape their own websites when testing for broken links and images within each page. Scraping can also be done for unlawful purposes, such as copying a website and republishing it under a different name. This type of scraping is viewed as a copyright violation and can lead to legal prosecution.
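The broken-link check mentioned above could look something like the following Python sketch. It collects the link and image addresses from one page's HTML and reports the ones that do not respond successfully. The names LinkCollector, extract_urls, is_reachable, and find_broken are made up for this example, and the use of HEAD requests is an assumption; some servers reject HEAD, so a real checker may need to fall back to GET.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> and <img src="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative addresses against the page's own URL.
                self.urls.append(urljoin(self.base_url, value))


def extract_urls(html, base_url):
    """Return every link and image URL found in the given HTML text."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.urls


def is_reachable(url, timeout=10):
    """Return True if the URL answers a HEAD request without an error status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # HTTP errors, network failures, timeouts, and malformed URLs land here.
        return False


def find_broken(html, base_url, check=is_reachable):
    """List the links and images on a page that fail the reachability check."""
    return [url for url in extract_urls(html, base_url) if not check(url)]
```

Passing a page's HTML and its address to find_broken returns the list of links and images that failed the check. The check parameter can be swapped for a different test, for example one that only looks at links within your own site.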

NOTE: While scraping a website for the purpose of republishing information is always wrong, scraping a site for other purposes may still violate the website's terms of use. Therefore, you should always read a website's terms of use before downloading content from the site.

Source: http://www.techterms.com/definition/scraping
