AFTERCOLLEGE SELF-SERVICE SCRAPE CONFIGURATION AND POSTING UTILITY
Kai Hu, Haiyan Wu
March 17, Cowell 416
Midterm Presentation



PRESENTATION OUTLINE
- Background and Motivation
- Goals
- Design
- Challenges
- Timeline and Milestones
- Current Progress

12/4/2015 AfterCollege Scrape Utility

AFTERCOLLEGE BACKGROUND
- Customized career network for colleges and professional organizations across the country
- Goal: create a better way for job-seeking students and alumni to connect with the right employer


WHAT’S ALREADY THERE?
- AfterCollege staff manually create configuration files
- A simple crawler runs periodically
- The crawler's output is posted on AfterCollege’s website

LIMITATIONS
- Scalability
- Unable to handle POST requests
- Unable to handle dynamic websites
- Expensive to maintain
- Requires technical knowledge
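To make the POST limitation concrete, here is a minimal sketch of what replaying a captured form submission could look like. It is written in Python for brevity; the URL, the form fields, and the `build_post_request` helper are hypothetical and do not come from the slides.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(url, form_fields):
    """Build (but do not send) a form-encoded POST request."""
    body = urlencode(form_fields).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical job-search form on an employer site.
request = build_post_request(
    "http://example.com/careers/search",
    {"keyword": "engineer", "page": "1"},
)
```

Passing `request` to `urllib.request.urlopen` would actually send it; the sketch stops short of that so it runs without network access.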

DESIGN OVERVIEW
- A new GUI tool assists staff through the configuration process
- Web proxy captures user activities
- Crawler uses pattern matching based on the new configuration file

GOALS: GUI TOOL
- Guide users through the configuration process
- Deal with dynamic websites

GOALS: WEB PROXY
- Capture user activities
- Generate configuration files
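The two proxy goals can be sketched as a function that turns captured HTTP requests into a configuration file. The slides do not show the real configuration format, so the `CapturedRequest` type and the `scrape-config`, `step`, and `param` names below are assumptions for illustration only. Python is used for brevity, although the deck's architecture (Tomcat, Apache HTTP Client) suggests the actual proxy runs on a Java stack.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class CapturedRequest:
    """One HTTP request observed by the proxy (hypothetical structure)."""
    method: str
    url: str
    params: dict = field(default_factory=dict)


def build_config(requests):
    """Serialize captured requests as an XML configuration string.

    Element and attribute names are assumptions, not the project's format.
    """
    root = ET.Element("scrape-config")
    for index, req in enumerate(requests, start=1):
        step = ET.SubElement(root, "step", order=str(index),
                             method=req.method, url=req.url)
        # Record form fields so that POST requests can be replayed later.
        for name, value in sorted(req.params.items()):
            ET.SubElement(step, "param", name=name, value=value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical browsing session: open the careers page, then submit a search.
captured = [
    CapturedRequest("GET", "http://example.com/careers"),
    CapturedRequest("POST", "http://example.com/careers/search",
                    {"keyword": "engineer", "page": "1"}),
]
config_xml = build_config(captured)
```

Keeping the steps in capture order means a crawler could later replay the same navigation path, including the form submission.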

GOALS: CRAWLER
- Scrape job posts
- Check result integrity

Diagram labels: Crawl Job List page; Get Configuration file; Pattern-Match Application; Generate Job List Result
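A minimal sketch of the crawler goals, assuming the configuration file stores one regular expression per job field. The configuration layout, the sample HTML, and the integrity rule (every field must match the same non-zero number of times, with no empty values) are all assumptions made for illustration; the slides do not specify them.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical configuration: one regex per job field.
CONFIG_XML = """
<scrape-config>
  <pattern field="title"><![CDATA[<h2 class="job">(.*?)</h2>]]></pattern>
  <pattern field="location"><![CDATA[<span class="loc">(.*?)</span>]]></pattern>
</scrape-config>
"""

# Hypothetical job list page.
SAMPLE_HTML = """
<h2 class="job">Software Engineer</h2><span class="loc">San Francisco, CA</span>
<h2 class="job">Data Analyst</h2><span class="loc">Oakland, CA</span>
"""


def load_patterns(config_xml):
    """Read field names and compiled regexes from the configuration."""
    root = ET.fromstring(config_xml.strip())
    return {p.get("field"): re.compile(p.text, re.DOTALL)
            for p in root.findall("pattern")}


def extract_columns(html, patterns):
    """Apply each pattern to the page and collect the matches per field."""
    return {name: [m.strip() for m in rx.findall(html)]
            for name, rx in patterns.items()}


def check_integrity(columns):
    """Accept a result only if all fields matched equally often and none is empty."""
    counts = {len(values) for values in columns.values()}
    if len(counts) != 1 or 0 in counts:
        return False
    return all(value for values in columns.values() for value in values)


def to_records(columns):
    """Combine the per-field columns into one dict per job post."""
    return [dict(zip(columns, row)) for row in zip(*columns.values())]


columns = extract_columns(SAMPLE_HTML, load_patterns(CONFIG_XML))
jobs = to_records(columns) if check_integrity(columns) else []
```

Running the integrity check before building records means a site redesign that breaks one pattern yields an empty result rather than misaligned job posts.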

DESIGN ISSUES
- Firefox Plugin vs. Web Proxy
  - Integration with back-end
  - Ability to add functionality
- Dojo vs. YUI
  - Fade-in/out, drag & drop
  - Deals with different browsers
- XML vs. JSON
  - Simplicity & efficiency of parsing
  - Availability of wrapper methods in YUI
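To illustrate the XML vs. JSON trade-off, the sketch below parses the same hypothetical job record from both formats. The record and its field names are invented for this example, and the slide's concern is parsing with YUI in the browser, so this Python version only approximates the comparison.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical job record in both candidate formats.
JOB_XML = ("<job><title>Software Engineer</title>"
           "<location>San Francisco, CA</location></job>")
JOB_JSON = '{"title": "Software Engineer", "location": "San Francisco, CA"}'


def job_from_xml(text):
    """Walk the element tree and copy each child element into a dict."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


def job_from_json(text):
    """JSON maps directly onto a dict, so no tree walk is needed."""
    return json.loads(text)


xml_job = job_from_xml(JOB_XML)
json_job = job_from_json(JOB_JSON)
```

Both calls produce the same dictionary; the difference is that the XML version needs an explicit traversal step, while JSON deserializes directly into native data structures.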

DESIGN OVERVIEW
Architecture diagram labels: Browser; Rendered HTML page; Injected YUI JavaScript; Web Proxy; Apache HTTP Client; Tomcat Web/App Server; HTML Parser; Job List Sites; Crawler; Loader/Scheduler; Parser; HTTP Client; Config.xml; JobFeed.xml; Feed Generator

CHALLENGES
- Runtime analysis of DOM objects for websites that use AJAX to generate DOM objects dynamically on the client side
- Dealing with tricky JavaScript
- Embedded HTML pages

MILESTONES
- GUI Tool (March 20)
  - Workflow support
  - Capture job information
- Web Proxy (March 20)
  - Render HTML pages
  - Capture HTTP communications
- Web Crawler (April 13)
  - Pattern-matching ability given a configuration file
  - Integrity check
- Integration Test (April 20)
- Testing (April 27)

CURRENT FOCUS
- Web Proxy
  - Ability to deal with JavaScript
  - Session/cookie support
- GUI Tool
  - Embedded web pages
  - Allow user modifications

CURRENT PROGRESS
- Demo

RESOURCES
- Course instructor: Dr. Jeff Buckwalter
- Sponsor: Steve Girolami, Perry Lee, & Saan Saeteurn
- Source code control system: Dasidae SVN from Perry
- Wiki site: knowledge share, work log, resource portal
- Google group: discussion and information exchange medium

Questions?

Thank You