Published by Gunjan Anthony. Modified over 5 years ago.
1
CSCE 590 Web Scraping – Information Extraction HW
Topics
Yahoo sign-in
Information Retrieval
March 16, 2017
2
Yahoo – two-step login
Create a Yahoo email account with:
  Your USC login as the account name
  Your login as the password, so I can test your ability to log in
Write a utility function “dump_html(page, tag)” that uses the BeautifulSoup function prettify on “page” and writes the result to the file “output_” + tag
Use Selenium to log in to your Yahoo account and dump the page
Use Scrapy to scrape the table (see next slide) and export it to CSV
Use Scrapy to extract the same info for FB, XOM, STX, NFLX, AMZN (start_requests builds URLs from the company symbols and yields them)
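One possible shape for the dump_html utility, as a minimal sketch rather than the expected solution. It assumes “page” is an HTML string, that the built-in "html.parser" backend is acceptable, and that returning the output file name is useful; none of these details are specified on the slide.

```python
from bs4 import BeautifulSoup


def dump_html(page, tag):
    """Prettify the HTML in `page` and write it to the file "output_" + tag.

    Assumptions (not stated on the slide): `page` is an HTML string,
    the file is written as UTF-8, and the file name is returned.
    """
    soup = BeautifulSoup(page, "html.parser")
    filename = "output_" + tag
    with open(filename, "w", encoding="utf-8") as out:
        out.write(soup.prettify())
    return filename
```

With Selenium, the HTML of the current page is available as driver.page_source, so after logging in a call such as dump_html(driver.page_source, "yahoo") would write the page to output_yahoo (the tag "yahoo" is only an example).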
3
Facebook information from http://finance.yahoo.com/quote/FB?p=FB
4
IR project1 Scrapy - Where is Coach K?
Subject: Duke coach Mike Krzyzewski
By hand, use Google to find three starting URLs
Open the pages (parse) and verify “Krzyzewski” is on the page
Find date/year
Find location
Save in a CSV table with the URL
IR project2
Automate the first step, i.e., have start_requests call Google, using semantic comparison to page = … to rank the top three
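The verify, find, and save steps of project 1 could look roughly like this once a page's text has been extracted. This is a sketch under simple assumptions: the year is taken to be the first four-digit number starting with 19 or 20, and the location is matched against a caller-supplied list of candidate place names. The function names extract_record and save_records are hypothetical, and real pages will likely need smarter date and location extraction.

```python
import csv
import re

# Assumed heuristic: a year is any standalone 4-digit number 1900-2099.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_record(url, text, locations=(), name="Krzyzewski"):
    """Return a dict for one page, or None if `name` is not on the page."""
    if name not in text:
        return None
    years = YEAR_RE.findall(text)
    # Take the first candidate location that appears in the text, if any.
    location = next((loc for loc in locations if loc in text), "")
    return {"url": url, "year": years[0] if years else "", "location": location}


def save_records(records, path):
    """Write the records to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=["url", "year", "location"])
        writer.writeheader()
        writer.writerows(records)
```

In a Scrapy spider, the parse callback could pass response.url and the page text to extract_record and yield the result when it is not None.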