
1 MIS 324 -- Professor Sandvig
Screen Scraping
MIS 424 Professor Sandvig
12/31/2018

2 Today
What is screen scraping (also called web scraping)
When to use it
How
Legal issues

3 What is Screen Scraping
Programmatically "scraping" information from a web page
Two steps:
Retrieve the page
Scrape the desired information (regular expressions)
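The two steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the course handout's code: `retrieve_page` and `scrape_title` are hypothetical names, and pulling out the page's `<title>` text is just an assumed example of "desired information."

```python
import re
import urllib.request
from typing import Optional


def retrieve_page(url: str, timeout: float = 10.0) -> str:
    """Step 1 (hypothetical helper): download a page's raw HTML as text."""
    # Some sites reject requests that carry no User-Agent header.
    req = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def scrape_title(html: str) -> Optional[str]:
    """Step 2 (hypothetical helper): extract the <title> text with a regular expression."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    # Offline demo on a literal string; retrieve_page(url) would supply live HTML instead.
    sample = "<html><head><TITLE> Example Domain </TITLE></head><body></body></html>"
    print(scrape_title(sample))
```

Separating retrieval from extraction keeps the regex logic testable on saved HTML without hitting the site repeatedly.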

4 When to Use
Data is not available via more direct methods:
APIs (designed to expose data)
Structured web services
RSS
Database

5 When to Use: Examples
Search engines: Google, Bing, Yahoo, …
News sites: Google News, Yahoo News, …
PadMapper, MapCraigs: scrape Craigslist
Interfacing with legacy systems that lack support for web services, RSS, etc.

6 How
Handout: ScreenScrape
Example: scrape the CBE Faculty/Staff Directory
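A directory scrape follows the same retrieve-then-extract pattern. The slides do not show the CBE directory's HTML, so the markup below (one table row per person, with `name` and `email` cells) is hypothetical, as are the names `ROW_PATTERN`, `scrape_directory`, and `SAMPLE_HTML`; the real page would need a pattern written against its actual markup.

```python
import re
from typing import List, Tuple

# Hypothetical markup: assumes each person is one <tr> holding a name cell
# followed by an email cell containing a mailto: link.
ROW_PATTERN = re.compile(
    r'<tr>\s*<td class="name">(?P<name>.*?)</td>\s*'
    r'<td class="email"><a href="mailto:(?P<email>[^"]+)">',
    re.IGNORECASE | re.DOTALL,
)


def scrape_directory(html: str) -> List[Tuple[str, str]]:
    """Return (name, email) pairs for every row matching ROW_PATTERN."""
    return [(m.group("name").strip(), m.group("email")) for m in ROW_PATTERN.finditer(html)]


# Stand-in for HTML that a retrieval step would download.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="name">Jane Doe</td>
    <td class="email"><a href="mailto:jdoe@example.edu">jdoe@example.edu</a></td>
  </tr>
  <tr>
    <td class="name">John Roe</td>
    <td class="email"><a href="mailto:jroe@example.edu">jroe@example.edu</a></td>
  </tr>
</table>
"""

if __name__ == "__main__":
    for name, email in scrape_directory(SAMPLE_HTML):
        print(f"{name}: {email}")
```

Because the pattern is tied to the exact markup, a redesign of the page can silently break it, which is a practical cost of scraping compared with using an API.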

7 Legal Issues
Potential to violate copyright laws
Many lawsuits:
LinkedIn sues 100 individuals for scraping user data (Oct. 2016)
Europe battles Google News over 'snippet tax' proposal
Belgian newspapers claim retaliation by Google after copyright victory

8 Legal Issues
MapCraigs.com
Scraped Craigslist real estate listings and displayed them on Google Maps
Craigslist blocked its IP address
PadMapper vs. Craigslist lawsuit
Paid Craigslist $1,000,000
History: Is Web Scraping Legal?
Use cautiously

9 Summary
Screen scraping:
Useful tool for collecting data from web pages when an API is not available
Many legal uses: search engines, legacy systems
Can violate copyrights

