Seattle Event Finder
Justin Meyer, Jessica Leung, Jennifer Hanson, Sophia Gansky, Chuan Yu

Goal
Create a specialized search engine for events in the Seattle area.
- Enable search by:
  - Location
  - Date and Time
  - Price
  - Category
- Enable full-text search
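The search criteria listed in the goal can be illustrated with a minimal in-memory sketch. This is not the project's implementation (the slides mention Nutch, Lucene, a database and a web service); the `Event` record, its field names, the sample events and the `search` function are all hypothetical and exist only to show how the filters combine.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    # Hypothetical record; field names are illustrative, not the project's schema.
    title: str
    description: str
    location: str
    start: datetime
    price: float
    category: str


# Made-up sample data for the sketch.
EVENTS = [
    Event("Jazz Night", "Live jazz quartet", "Fremont, Seattle",
          datetime(2009, 6, 12, 20, 0), 10.0, "Music"),
    Event("Book Reading", "Author reading and signing", "Capitol Hill, Seattle",
          datetime(2009, 5, 20, 18, 0), 0.0, "Literature"),
]


def search(events, text=None, location=None, after=None, before=None,
           max_price=None, category=None):
    """Return the events that satisfy every criterion that was supplied."""
    results = []
    for e in events:
        # Full-text search: naive substring match over title and description.
        if text and text.lower() not in (e.title + " " + e.description).lower():
            continue
        if location and location.lower() not in e.location.lower():
            continue
        if after is not None and e.start < after:
            continue
        if before is not None and e.start > before:
            continue
        if max_price is not None and e.price > max_price:
            continue
        if category and e.category.lower() != category.lower():
            continue
        results.append(e)
    return results
```

For example, `search(EVENTS, category="music", after=datetime(2009, 6, 1))` would keep only the music event in June. A real system would push these filters into the index or database rather than scan a list.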

System components (from the architecture diagram): The Internet, Nutch Crawler, Event Extraction, Event Indexing, Geocoding, Event Database, Web Service, Event Classification, Web Application

Reality
- Site-specific extractors used rather than a machine learning approach
- Classifier is tuned for the events from specifically chosen websites rather than the full web (overfitted, but in a good way)
- Email notifications not implemented
- Less dynamic web interface

Demo
http://amlia.cs.washington.edu:1987/eventfinder/search/

What We Found Surprising
- CRF++ is limited in extracting attributes
  - CRF extracts multiple values for one attribute from an event description
  - When extracting only from a paragraph of description, in most cases some attributes are not contained in it
- Use site-specific structures
  - Extracted attribute values from the corresponding HTML tags
  - No more ambiguity
  - Can extract all the attributes from each event page
- JavaScript's negative impact
  - Some sites do not fully load after the first HTTP request
  - URLs generated by JavaScript functions could not be retrieved
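The site-specific approach described above (reading each attribute from its corresponding HTML tag) might look roughly like the sketch below. It uses only Python's standard `html.parser`; the class names in `SITE_RULES`, the sample page and the `extract_event` helper are assumptions made for illustration, since the slides do not show the markup of the crawled sites or the extractor code.

```python
from html.parser import HTMLParser

# Hypothetical per-site rule set: CSS class on the page -> event attribute.
SITE_RULES = {
    "event-title": "title",
    "event-date": "date",
    "event-venue": "location",
    "event-price": "price",
}


class SiteExtractor(HTMLParser):
    """Collect the text inside elements whose class matches a site rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.fields = {}
        self._current = None  # attribute currently being captured
        self._tag = None      # tag name that opened the capture

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            return  # already capturing; ignore nested tags
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.rules:
                self._current = self.rules[cls]
                self._tag = tag
                self.fields.setdefault(self._current, "")
                break

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: capture ends at the first closing tag of the same name.
        if self._current is not None and tag == self._tag:
            self.fields[self._current] = self.fields[self._current].strip()
            self._current = None
            self._tag = None


def extract_event(html, rules=SITE_RULES):
    parser = SiteExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.fields


# Made-up event page in the assumed markup.
PAGE = """
<div class="event">
  <h1 class="event-title">Jazz Night</h1>
  <span class="event-date">2009-06-12 20:00</span>
  <span class="event-venue">Fremont, Seattle</span>
  <span class="event-price">$10</span>
</div>
"""
```

Because every attribute has exactly one designated tag, each one maps to a single value, which is the "no more ambiguity" point. The trade-off is one rule set per site, and content that JavaScript injects after the first request never reaches a parser like this.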

Classifier Experiments
Variables / Results
- Number of Categories: 8
- Training Data Source: Crawler Data
- Scaled vs. Non-scaled Training Data: Scaled
- Single Words vs. Word Pairs as attributes: Single Words Only
- Scaled LIBSVM attribute values? Yes
  - Lower Bound: 0
  - Upper Bound: 1
- Weights
  - URL Words: 50
  - Title Words: 15
  - Location Words: 10
Tests (32 Recorded Tests)
- Ablation: Turning On/Off features
- Tuning: Adjusting Variable Values
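To make the variables above concrete, here is a rough sketch of single-word features with the listed field weights, scaled per attribute to the range 0 to 1 and written in LIBSVM's sparse `index:value` format. How the project actually combined the weights with word counts is not stated in the slides, so the multiply-the-count scheme, the default weight of 1 for other fields, and all function names here are assumptions.

```python
import re
from collections import Counter

# Field weights from the slide; DEFAULT_WEIGHT for other fields is an assumption.
WEIGHTS = {"url": 50, "title": 15, "location": 10}
DEFAULT_WEIGHT = 1


def tokenize(text):
    # Single words only, as in the reported configuration.
    return re.findall(r"[a-z]+", text.lower())


def weighted_counts(event):
    """Sum word occurrences, each multiplied by the weight of its field."""
    counts = Counter()
    for field, text in event.items():
        weight = WEIGHTS.get(field, DEFAULT_WEIGHT)
        for word in tokenize(text):
            counts[word] += weight
    return counts


def scale_rows(rows, lower=0.0, upper=1.0):
    """Min-max scale every attribute to [lower, upper] across all rows."""
    vocab = sorted({word for row in rows for word in row})
    mins = {w: min(row.get(w, 0) for row in rows) for w in vocab}
    maxs = {w: max(row.get(w, 0) for row in rows) for w in vocab}
    scaled = []
    for row in rows:
        out = {}
        for index, word in enumerate(vocab, start=1):
            span = maxs[word] - mins[word]
            if span == 0:
                continue  # constant attribute carries no information
            value = lower + (upper - lower) * (row.get(word, 0) - mins[word]) / span
            if value != 0:
                out[index] = value  # sparse format: zeros are omitted
        scaled.append(out)
    return vocab, scaled


def to_libsvm(label, features):
    """Format one example as a LIBSVM line: '<label> <index>:<value> ...'."""
    pairs = [f"{i}:{v:g}" for i, v in sorted(features.items())]
    return " ".join([str(label)] + pairs)
```

The scaling step mirrors what LIBSVM's `svm-scale` tool does with a lower bound of 0 and an upper bound of 1. The scaling parameters should be computed on the training data and reused for test data.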

Usability Experiment
Survey
- Participants went to the website and completed 3 tasks
- Task completion and overall feedback
- Only three submissions due to server load
Results
- All found site navigable
- Participants used list view, map view and individual event pages
- Some results not relevant
Tasks
- Task 1: Find any event that is happening in Seattle
- Task 2: Find a music event that is happening more than a week away from now in Seattle
- Task 3: Find the celebration event of author Sandra Cisneros this past May (2009)
500/10,000
.5 task completion rate

What We Learned
- Information Retrieval / Extraction / Indexing: Gained more experience with regex, Java servlet technology, and working with open-source projects.
- Classifier: There are many methods of classification and many related variables. Deciding on a classification method and setting its variables can be as much an art as a science.
- Front End: Frameworks (Django) are very powerful but take a long time to learn. Watch your resources!

Breakdown of Work
- Justin: Classifier and Database
- Jessica and Jenn: Front end
- Sophia: Extraction, Nutch, Lucene, Database
- Chuan: Extraction

Questions?