Published by Elaine Scott. Modified over 6 years ago.
1
Seattle Event Finder
Justin Meyer, Jessica Leung, Jennifer Hanson, Sophia Gansky, Chuan Yu
2
Goal
Create a specialized search engine for events in the Seattle area.
Enable search by location, date and time, price, and category.
Enable full-text search.
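To make "search by location, date and time, price, and category" plus full-text search concrete, here is a minimal in-memory sketch. The Event fields, the radius-based meaning of "location", the default 5 km radius, and the substring-based text match are all assumptions for illustration. The project itself used an index (Lucene appears in the work breakdown), which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt
from typing import List, Optional, Tuple


@dataclass
class Event:
    # Hypothetical record layout; lat/lon would come from the geocoding step.
    title: str
    description: str
    category: str
    start: datetime
    price: float  # 0.0 for free events
    lat: float
    lon: float


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def search(events: List[Event],
           text: Optional[str] = None,
           near: Optional[Tuple[float, float]] = None,
           radius_km: float = 5.0,
           after: Optional[datetime] = None,
           before: Optional[datetime] = None,
           max_price: Optional[float] = None,
           category: Optional[str] = None) -> List[Event]:
    """Return events that pass every filter that is set, soonest first."""
    hits = []
    for e in events:
        if text and text.lower() not in f"{e.title} {e.description}".lower():
            continue  # naive full-text match
        if near is not None and haversine_km(near, (e.lat, e.lon)) > radius_km:
            continue  # location: outside the search radius
        if after is not None and e.start < after:
            continue
        if before is not None and e.start > before:
            continue
        if max_price is not None and e.price > max_price:
            continue
        if category is not None and e.category != category:
            continue
        hits.append(e)
    return sorted(hits, key=lambda e: e.start)
```

In this sketch, a query like usability Task 2 (a music event more than a week away) would be `search(events, category="music", after=datetime.now() + timedelta(days=7))`, with `timedelta` imported from `datetime`.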
3
[Architecture diagram. Components: The Internet, Nutch Crawler, Event Extraction, Event Indexing, Geocoding, Event Database, Web Service, Event Classification, Web Application]
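The diagram names the components but not how they connect. The sketch below assumes one plausible flow, in which events that Nutch has already crawled and the extractors have already parsed are geocoded, classified, and then stored. The venue table, the keyword rule, and all function names are hypothetical stand-ins, not the project's code.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Event = Dict[str, object]
Stage = Callable[[Event], Event]

# Hypothetical venue table standing in for a real geocoding service.
VENUE_COORDS: Dict[str, Tuple[float, float]] = {"Example Hall": (47.6, -122.3)}


def geocode(event: Event) -> Event:
    """Attach (lat, lon) for the venue, or None if it cannot be resolved."""
    event["latlon"] = VENUE_COORDS.get(str(event.get("venue", "")))
    return event


def classify(event: Event) -> Event:
    """Placeholder for the trained classifier: a single keyword rule."""
    title = str(event.get("title", "")).lower()
    event["category"] = "music" if "concert" in title else "other"
    return event


def run_pipeline(extracted: Iterable[Event], stages: List[Stage]) -> List[Event]:
    """Run each extracted event through every stage in order, then store it."""
    database: List[Event] = []  # stands in for the event database and index
    for event in extracted:
        for stage in stages:
            event = stage(event)
        database.append(event)
    return database
```

Keeping each stage as a plain function makes it easy to reorder stages or swap one out, for example replacing the keyword rule with a real classifier.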
4
Reality
Site-specific extractors were used rather than a machine-learning approach.
The classifier is tuned for events from specifically chosen websites rather than the full web (overfitted, but in a good way).
Notifications were not implemented.
The web interface is less dynamic than originally planned.
5
Demo u:1987/eventfinder/search/
6
What We Found Surprising
CRF++ was limited in extracting attributes.
The CRF extracted multiple values for a single attribute from one event description.
When extracting from only a paragraph of description, in most cases some attributes are not contained in it.
Use site-specific structures instead.
Attribute values are extracted from the corresponding HTML tags, so there is no more ambiguity.
All the attributes can be extracted from each event page.
JavaScript's negative impact
Some sites do not fully load after the first HTTP request.
URLs generated by JavaScript functions could not be retrieved.
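The site-specific approach can be sketched with Python's standard-library `html.parser`: for one site, a table maps a tag's `class` to an event attribute, and the parser collects the text inside each matching tag, including nested inline tags. The class names in `EXAMPLE_SITE_RULES` are hypothetical; each real site would need its own rules. Like the crawler described above, this only sees the HTML of the initial response, so content that JavaScript loads later is not available to it.

```python
from html.parser import HTMLParser
from typing import Dict, Optional

# Hypothetical rules for one site: CSS class of a tag -> event attribute.
EXAMPLE_SITE_RULES = {
    "event-title": "title",
    "event-date": "date",
    "event-venue": "venue",
    "event-price": "price",
}

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class SiteSpecificExtractor(HTMLParser):
    """Collects the text of tags whose class matches one of the site rules."""

    def __init__(self, rules: Dict[str, str]) -> None:
        super().__init__()
        self.rules = rules
        self.fields: Dict[str, str] = {}
        self._current: Optional[str] = None  # attribute currently captured
        self._depth = 0  # open tags inside the matched tag, itself included

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1  # nested tag such as <b> inside the match
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in self.rules:
                self._current = self.rules[cls]
                self._depth = 1
                self.fields.setdefault(self._current, "")
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:  # the matched tag itself just closed
            self.fields[self._current] = self.fields[self._current].strip()
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


def extract_event(html: str, rules: Dict[str, str]) -> Dict[str, str]:
    """Return {attribute: text} for one event page."""
    parser = SiteSpecificExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.fields
```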
7
Classifier Experiments
Variable / Result
Number of categories: 8
Training data source: Crawler data
Scaled vs. non-scaled training data: Scaled
Single words vs. word pairs as attributes: Single words only
Scaled LIBSVM attribute values: Yes (lower bound 0, upper bound 1)
Feature weights: URL words 50, title words 15, location words 10
Tests (32 recorded):
Ablation: turning features on/off
Tuning: adjusting variable values
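The weighting and scaling choices above can be illustrated with a short sketch that turns event pages into single-word features and writes them in LIBSVM's sparse `label index:value` format. The slide does not say how the weights were applied. This sketch assumes each weight multiplies the count of a word found in that field and that body words count 1. Scaling is per-feature min-max to [0, 1], similar in spirit to LIBSVM's svm-scale tool. A feature that is constant across all examples is dropped, and setting a field's weight to 0 mimics the on/off ablation tests. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List

# Weights from the slide; the body weight of 1 is an assumption.
FIELD_WEIGHTS = {"url": 50.0, "title": 15.0, "location": 10.0, "body": 1.0}


def tokenize(text: str) -> List[str]:
    """Single words only (no word pairs), lower-cased."""
    return re.findall(r"[a-z0-9]+", text.lower())


def raw_features(doc: Dict[str, str], weights: Dict[str, float]) -> Counter:
    """Weighted word counts; a weight of 0 switches a field off (ablation)."""
    feats: Counter = Counter()
    for field, weight in weights.items():
        for word in tokenize(doc.get(field, "")):
            feats[word] += weight
    return feats


def scale(rows: List[Counter], lower: float = 0.0,
          upper: float = 1.0) -> List[Dict[str, float]]:
    """Min-max scale every feature to [lower, upper] across all rows."""
    vocab = sorted(set().union(*rows))
    lo = {w: min(r.get(w, 0.0) for r in rows) for w in vocab}
    hi = {w: max(r.get(w, 0.0) for r in rows) for w in vocab}
    scaled = []
    for r in rows:
        out = {}
        for w in vocab:
            if hi[w] == lo[w]:
                continue  # constant feature carries no information
            out[w] = lower + (upper - lower) * (r.get(w, 0.0) - lo[w]) / (hi[w] - lo[w])
        scaled.append(out)
    return scaled


def build_index(rows: List[Counter]) -> Dict[str, int]:
    """Map each word to a 1-based LIBSVM feature index."""
    return {w: i + 1 for i, w in enumerate(sorted(set().union(*rows)))}


def to_libsvm(label: int, feats: Dict[str, float], index: Dict[str, int]) -> str:
    """Format one example as 'label idx:value ...', omitting zero values."""
    pairs = sorted((index[w], v) for w, v in feats.items() if v != 0.0)
    return " ".join([str(label)] + [f"{i}:{v:g}" for i, v in pairs])
```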
8
Usability Experiment Survey Results
Participants went to the website, completed 3 tasks, and gave feedback on task completion and on the site overall.
Only three submissions were received, due to server load.
Task 1: Find any event that is happening in Seattle.
Task 2: Find a music event in Seattle that is happening more than a week from now.
Task 3: Find the celebration event for author Sandra Cisneros this past May (2009).
Results
All participants found the site navigable.
Participants used the list view, the map view, and individual event pages.
Some results were not relevant.
500/10,000 .5 task completion rate
9
What We Learned
Information Retrieval / Extraction / Indexing: Gained more experience with regex, Java servlet technology, and working with open-source projects.
Classifier: There are many methods of classification and many related variables. Deciding on a classification method and setting its variables can be as much an art as a science.
Front End: Frameworks such as Django are very powerful, but they take a long time to learn. Watch your resources!
10
Breakdown of Work
Justin - Classifier and Database
Jessica and Jenn - Front End
Sophia - Extraction, Nutch, Lucene, Database
Chuan - Extraction
11
Questions?