AFTERCOLLEGE SELF-SERVICE SCRAPE CONFIGURATION AND POSTING UTILITY Kai Hu Haiyan Wu March 17, Cowell 416 Midterm Presentation.

1 AFTERCOLLEGE SELF-SERVICE SCRAPE CONFIGURATION AND POSTING UTILITY
Kai Hu, Haiyan Wu
March 17, 2009 @ Cowell 416
Midterm Presentation

2 PRESENTATION OUTLINE
- Background and Motivation
- Goals
- Design
- Challenges
- Timeline and Milestones
- Current Progress
12/4/2015 AfterCollege Scrape Utility

3 AFTERCOLLEGE BACKGROUND
- Customized career network for colleges and professional organizations across the country
- Goal: create a better way for job-seeking students and alumni to connect with the right employer

5 WHAT’S ALREADY THERE?
- AfterCollege staff manually create configuration files
- A simple crawler runs periodically
- The crawler's output is posted on AfterCollege’s website

6 LIMITATIONS
- Scalability
- Unable to handle POST requests
- Unable to handle dynamic websites
- Expensive to maintain
- Requires technical knowledge

7 DESIGN OVERVIEW
- A new GUI tool guides staff through the configuration process
- Web Proxy captures user activities
- Crawler uses pattern matching based on the new configuration file

8 GOALS: GUI TOOL
- Guides users through the configuration process
- Deals with dynamic websites

9 GOALS: WEB PROXY
- Capture user activities
- Generate configuration files
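The slides do not show the configuration file format, so the following is only a minimal sketch of how captured requests might be turned into a configuration file. The element names (`scrapeConfig`, `steps`, `request`, `param`, `jobPattern`), the `build_config` helper, and the example URLs are all hypothetical and are not taken from the project.

```python
import xml.etree.ElementTree as ET


def build_config(captured_requests, job_pattern):
    """Serialize captured requests into a hypothetical scrape configuration."""
    root = ET.Element("scrapeConfig")  # element names are assumptions
    steps = ET.SubElement(root, "steps")
    for req in captured_requests:
        # One <request> per captured HTTP call, in the order the user made them.
        step = ET.SubElement(steps, "request", method=req["method"], url=req["url"])
        for name, value in req.get("params", {}).items():
            ET.SubElement(step, "param", name=name).text = value
    # The pattern the crawler would later apply to the job list page.
    ET.SubElement(root, "jobPattern").text = job_pattern
    return ET.tostring(root, encoding="unicode")


# Hypothetical activity recorded while a staff member browses a job site.
captured = [
    {"method": "GET", "url": "http://example.com/careers"},
    {
        "method": "POST",
        "url": "http://example.com/careers/search",
        "params": {"keyword": "engineer"},
    },
]
JOB_PATTERN = r'<a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>'
config_xml = build_config(captured, JOB_PATTERN)
```

Recording the POST step together with its form fields is what would let a crawler replay searches that the existing GET-only setup cannot reach.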

10 GOALS: CRAWLER
- Scrape job posts
- Check result integrity
Flow diagram (labels only): Crawl Job List page; Get Configuration file; Pattern-Match Application; Generate Job List Result
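As an illustration of the two crawler goals, here is a small sketch that applies a configured pattern to a job list page and then runs a basic integrity check on the result. The regular expression, the field names (`title`, `url`), and both helper functions are assumptions made for this example; the slides do not specify the actual matching technique or the checks performed.

```python
import re


def extract_jobs(html, pattern):
    """Apply the configured pattern and return one dict per matched job."""
    return [m.groupdict() for m in re.finditer(pattern, html)]


def check_integrity(jobs, required_fields=("title", "url")):
    """Return a list of problems; an empty list means the result looks sane."""
    problems = []
    if not jobs:
        problems.append("no jobs matched; the site layout may have changed")
    for i, job in enumerate(jobs):
        for field in required_fields:
            if not (job.get(field) or "").strip():
                problems.append(f"job {i}: missing {field}")
    return problems


# Hypothetical pattern, as it might be read from a configuration file.
JOB_PATTERN = r'<a class="job" href="(?P<url>[^"]*)">(?P<title>[^<]*)</a>'

# Hypothetical job list page; the second entry has a blank title.
page = """
<ul>
  <li><a class="job" href="/jobs/1">Software Engineer</a></li>
  <li><a class="job" href="/jobs/2"> </a></li>
</ul>
"""

jobs = extract_jobs(page, JOB_PATTERN)
problems = check_integrity(jobs)
```

An empty match list is reported as a problem rather than as an empty feed, since a pattern that suddenly matches nothing usually means the target site changed.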

11 DESIGN ISSUES
- Firefox Plugin vs. Web Proxy
  - Integration with back-end
  - Ability to add functionality
- Dojo vs. YUI
  - Fade-In/Out, Drag & Drop
  - Deals with different browsers
- XML vs. JSON
  - Simplicity & efficiency of parsing
  - Availability of wrapper methods in YUI
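To make the XML vs. JSON trade-off concrete, the sketch below parses the same record in both formats. The message content (a field name, an XPath, and a sample value) is invented for illustration, and the slides do not say which format the team chose or what data is actually exchanged.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical "user selected this element" record in both formats.
json_msg = (
    '{"field": "title", "xpath": "/html/body/div[2]/a", '
    '"sample": "Software Engineer"}'
)
xml_msg = (
    "<selection>"
    "<field>title</field>"
    "<xpath>/html/body/div[2]/a</xpath>"
    "<sample>Software Engineer</sample>"
    "</selection>"
)

# JSON maps directly onto a dictionary in one call.
from_json = json.loads(json_msg)

# XML needs a walk over the child elements to build the same dictionary.
from_xml = {child.tag: child.text for child in ET.fromstring(xml_msg)}
```

Both paths produce the same dictionary here; the difference is the extra traversal step for XML, which grows with nesting and attributes.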

12 DESIGN OVERVIEW
Architecture diagram (component labels only): Browser; Rendered HTML page; Injected YUI JavaScript; Web Proxy; Apache HTTP Client; Tomcat Web/App Server; HTML Parser; Job List Sites; Crawler; Loader/Scheduler; Parser; HTTP Client; Config.xml; JobFeed.xml; Feed Generator

13 CHALLENGES
- Runtime analysis of DOM objects for websites that use AJAX to generate DOM objects dynamically on the client side
- Dealing with tricky JavaScript
- Embedded HTML pages

14 MILESTONES
- GUI Tool (March 20)
  - Workflow support
  - Capture job information
- Web Proxy (March 20)
  - Render HTML pages
  - Capture HTTP communications
- Web Crawler (April 13)
  - Pattern matching ability given configuration file
  - Integrity check
- Integration Test (April 20)
- Testing (April 27)

15 CURRENT FOCUS
- Web Proxy
  - Ability to deal with JavaScript
  - Session/Cookie support
- GUI Tool
  - Embedded web pages
  - Allow user modifications
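Session/cookie support in a proxy essentially means remembering what the target site sets and sending it back on later requests. The sketch below shows that idea using only the Python standard library; the `SessionStore` class is hypothetical, it ignores domain, path, and expiry scoping, and the project itself (per the design slide) uses Apache HTTP Client rather than this code.

```python
from http.cookies import SimpleCookie


class SessionStore:
    """Hypothetical per-user cookie store kept by the proxy."""

    def __init__(self):
        self._cookies = {}

    def remember(self, set_cookie_header):
        """Record cookies from one Set-Cookie response header value."""
        parsed = SimpleCookie()
        parsed.load(set_cookie_header)
        for name, morsel in parsed.items():
            # Later values for the same cookie name overwrite earlier ones.
            self._cookies[name] = morsel.value

    def cookie_header(self):
        """Build the Cookie request header value for the next upstream call."""
        return "; ".join(f"{name}={value}" for name, value in self._cookies.items())


store = SessionStore()
store.remember("JSESSIONID=abc123; Path=/; HttpOnly")
store.remember("lang=en; Path=/")
header = store.cookie_header()
```

A production proxy would also need to honor cookie scope and expiry, which is the part a full HTTP client library normally handles.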

16 CURRENT PROGRESS
- Demo

17 RESOURCES
- Course Instructor: Dr. Jeff Buckwalter
- Sponsor: Steve Girolami, Perry Lee, & Saan Saeteurn
- Source code control system: Dasidae SVN from Perry
- Wiki site: knowledge share, work log, resource portal
- Google group: discussion and information exchange medium

18 Questions?

19 Thank You

