Screen Scraping
MIS 424
Professor Sandvig
Today
- What is Screen Scraping
- When to use it
- How
- Legal Issues
What is Screen Scraping
- Programmatically “scraping” information from a web page
- Two steps:
  1. Retrieve Page
  2. Scrape desired information
     - Regular Expressions
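The two steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the course handout's code (the handout is an ASP.NET page). The function names, the `TEMP_PATTERN` expression, and the sample markup are all hypothetical and were made up for this sketch.

```python
import re
from urllib.request import urlopen


def retrieve_page(url: str, timeout: float = 10.0) -> str:
    """Step 1: download the page and return its HTML as text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical pattern: assumes each value sits in <td class="temp">NN</td>.
TEMP_PATTERN = re.compile(r'<td class="temp">\s*(-?\d+)\s*</td>', re.IGNORECASE)


def scrape_temperatures(html: str) -> list[int]:
    """Step 2: pull the desired values out of the HTML with a regular expression."""
    return [int(value) for value in TEMP_PATTERN.findall(html)]


# Made-up markup standing in for a retrieved page, so the sketch runs offline.
SAMPLE_HTML = """
<table>
  <tr><td class="day">Mon</td><td class="temp">54</td></tr>
  <tr><td class="day">Tue</td><td class="temp"> 61 </td></tr>
</table>
"""

if __name__ == "__main__":
    print(scrape_temperatures(SAMPLE_HTML))
```

In real use, the string passed to `scrape_temperatures` would come from `retrieve_page(url)`. A regular expression like this depends on the exact markup, so it will likely need updating whenever the target page's HTML changes.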
When to Use
- Data not available via more direct methods
  - web services
  - database
  - RSS
  - AJAX
When to Use
- Examples
  - Search engines
  - Comparison shopping sites
    - PriceGrabber, BizRate, NexTag, FareChase, …
  - News sites
    - Google news, Yahoo news, …
  - PadMapper, MapCraigs
    - Scrape Craigslist
  - Interface with Legacy Systems
    - No support for web services, RSS, etc.
How
- Handout: ScreenScrape.aspx (source)
- NOAA weather forecast:
Legal Issues
- Provides ability to copy data from web pages
- Post to web forms
- Potential to violate copyright laws
- History of lawsuits
  - Meta shopping sites
  - Google
- Use cautiously