Seattle Event Finder
Justin Meyer, Jessica Leung, Jennifer Hanson, Sophia Gansky, Chuan Yu
Goal
Create a specialized search engine for events in the Seattle area.
Enable search by location, date and time, price, and category.
Enable full-text search.
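Searching by several facets at once comes down to keeping only the events that pass every active filter. The following is a minimal sketch that assumes a hypothetical in-memory `Event` record whose field names are illustrative and not the project's actual schema. Location matching here uses the haversine great-circle distance around a chosen center point. That is one possible approach, not necessarily the one the project used.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


@dataclass
class Event:
    # Hypothetical event record; field names are illustrative, not the project's schema.
    title: str
    category: str
    start: datetime
    price: float  # US dollars; 0.0 means free
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def search(events, center=None, radius_km=None, after=None, before=None,
           max_price=None, category=None):
    """Return the events that satisfy every facet that is not None."""
    results = []
    for e in events:
        if category is not None and e.category != category:
            continue
        if after is not None and e.start < after:
            continue
        if before is not None and e.start > before:
            continue
        if max_price is not None and e.price > max_price:
            continue
        if center is not None and radius_km is not None:
            if haversine_km(center[0], center[1], e.lat, e.lon) > radius_km:
                continue
        results.append(e)
    return results


# Made-up sample data around downtown Seattle and Tacoma.
now = datetime(2009, 6, 1, 12, 0)
events = [
    Event("Jazz night", "music", now + timedelta(days=10), 15.0, 47.6097, -122.3331),
    Event("Book reading", "literature", now + timedelta(days=2), 0.0, 47.6062, -122.3321),
    Event("Tacoma concert", "music", now + timedelta(days=12), 20.0, 47.2529, -122.4443),
]
seattle = (47.6062, -122.3321)
# Music events within 10 km of Seattle that start at least a week from now.
hits = search(events, center=seattle, radius_km=10,
              after=now + timedelta(days=7), category="music")
print([e.title for e in hits])  # ['Jazz night']
```

The sample query resembles usability Task 2 (a music event in Seattle more than a week away). All dates, coordinates, and events are invented for illustration.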
System architecture (diagram). Components: The Internet, Nutch Crawler, Event Extraction, Event Indexing, Geocoding, Event Database, Web Service, Event Classification, Web Application.
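The Event Indexing component is what makes full-text search possible. The project lists Lucene in its work breakdown. The sketch below does not use Lucene's API. It only illustrates the underlying idea of an inverted index answering boolean AND queries, and the event text it indexes is hypothetical.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    """Maps each term to the set of event ids whose text contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, event_id, text):
        for term in tokenize(text):
            self.postings[term].add(event_id)

    def search(self, query):
        """Return ids of events containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Copy the first posting set so the index itself is never mutated.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


# Hypothetical event descriptions keyed by event id.
index = InvertedIndex()
index.add(1, "Author reading and celebration")
index.add(2, "Jazz concert on Capitol Hill")
index.add(3, "Seattle jazz festival")
print(sorted(index.search("seattle jazz")))  # [3]
```

A production engine such as Lucene adds relevance ranking, stemming, and on-disk storage, none of which this sketch attempts.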
Reality
Site-specific extractors were used rather than a machine-learning approach.
The classifier is tuned to events from specifically chosen websites rather than the full web (overfitted, but in a good way).
Email notifications were not implemented.
The web interface is less dynamic than originally planned.
Demo
http://amlia.cs.washington.edu:1987/eventfinder/search/
What We Found Surprising
CRF++ was limited in extracting attributes:
  The CRF extracted multiple values for a single attribute from one event description.
  When extracting from only a paragraph of description, in most cases some attributes were not contained in it.
Using site-specific structures instead:
  Attribute values were extracted from the corresponding HTML tags.
  No more ambiguity.
  All attributes can be extracted from each event page.
JavaScript's negative impact:
  Some sites did not fully load after the first HTTP request.
  URLs generated by JavaScript functions could not be retrieved.
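A site-specific extractor reads each attribute from a tag whose position in the page is known in advance. The following is a minimal sketch using Python's standard `html.parser` module. It assumes a hypothetical site whose pages mark attributes with class names such as `event-title` and `event-date`, because the real sites' markup is not given here. The rule table `SITE_RULES` and the site key `example-events-site` are illustrative names only.

```python
from html.parser import HTMLParser

# Hypothetical per-site rules: class attribute of a tag -> event attribute.
SITE_RULES = {
    "example-events-site": {
        "event-title": "title",
        "event-date": "date",
        "event-venue": "location",
        "event-price": "price",
    },
}

# Elements that never have an end tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class SiteSpecificExtractor(HTMLParser):
    """Collects the text inside tags whose class matches one of a site's rules."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.current = None  # attribute currently being captured, if any
        self.depth = 0       # nesting depth inside the captured tag
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.current is not None:
            self.depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            # First matching tag wins, so each attribute gets exactly one value.
            if cls in self.rules and self.rules[cls] not in self.fields:
                self.current = self.rules[cls]
                self.depth = 1
                self.fields[self.current] = ""
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.current is None:
            return
        self.depth -= 1
        if self.depth == 0:
            self.fields[self.current] = self.fields[self.current].strip()
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.fields[self.current] += data


def extract_event(html, site):
    parser = SiteSpecificExtractor(SITE_RULES[site])
    parser.feed(html)
    parser.close()
    return parser.fields


page = """
<div class="event">
  <h2 class="event-title">Jazz <em>Night</em></h2>
  <span class="event-date">June 12, 2009 8:00 PM</span>
  <span class="event-venue">Capitol Hill,<br> Seattle</span>
  <span class="event-price">$15</span>
</div>
"""
print(extract_event(page, "example-events-site"))
```

Each site gets its own rule table, so supporting a new site means writing a table rather than retraining a model. Like any static parser, this one only sees the HTML returned by the first HTTP request, so content produced by JavaScript stays invisible to it. It also assumes well-formed nesting, which real pages do not always provide.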
Classifier Experiments
Variables and results:
  Number of categories: 8
  Training data source: crawler data
  Scaled vs. non-scaled training data: scaled
  Single words vs. word pairs as attributes: single words only
  Scaled LIBSVM attribute values: yes (lower bound 0, upper bound 1)
  Feature weights: URL words 50, title words 15, location words 10
Tests (32 recorded):
  Ablation: turning features on and off
  Tuning: adjusting variable values
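The configuration above uses single-word features with per-field weights, scaled to [0, 1] before training with LIBSVM. The following is a minimal sketch of that preprocessing under two stated assumptions. First, each occurrence of a word adds its field's weight to that word's feature, since the slide gives the weights but not how they were applied. Second, description words get a weight of 1, which the slide does not specify. The scaling follows the idea behind LIBSVM's `svm-scale -l 0 -u 1`, but it is a reimplementation and not LIBSVM code.

```python
import re
from collections import Counter

# Field weights from the slide; the description weight of 1 is an assumption.
FIELD_WEIGHTS = {"url": 50, "title": 15, "location": 10, "description": 1}
TOKEN_RE = re.compile(r"[a-z0-9]+")


def weighted_features(event, vocab):
    """Single-word features: each occurrence of a word adds its field's weight."""
    counts = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        for word in TOKEN_RE.findall(event.get(field, "").lower()):
            # Assign feature indices on first sight; LIBSVM indices start at 1.
            index = vocab.setdefault(word, len(vocab) + 1)
            counts[index] += weight
    return counts


def scale(rows, lower=0.0, upper=1.0):
    """Min-max scale every feature column to [lower, upper]; absent features count as 0."""
    columns = {i for row in rows for i in row}
    lo = {i: min(row.get(i, 0) for row in rows) for i in columns}
    hi = {i: max(row.get(i, 0) for row in rows) for i in columns}
    scaled = []
    for row in rows:
        out = {}
        for i in columns:
            if hi[i] == lo[i]:
                continue  # a constant column carries no information, so drop it
            v = lower + (upper - lower) * (row.get(i, 0) - lo[i]) / (hi[i] - lo[i])
            if v != 0:
                out[i] = v  # keep the output sparse: zeros are omitted
        scaled.append(out)
    return scaled


def to_libsvm(label, features):
    """Format one example as '<label> <index>:<value> ...' with ascending indices."""
    pairs = " ".join(f"{i}:{features[i]:g}" for i in sorted(features))
    return f"{label} {pairs}".rstrip()


# Hypothetical training examples: (event fields, category label).
examples = [
    ({"url": "example.org/music/jazz", "title": "Jazz night"}, 1),
    ({"url": "example.org/books", "title": "Author reading", "location": "library"}, 2),
]
vocab = {}
rows = [weighted_features(event, vocab) for event, _ in examples]
lines = [to_libsvm(label, row) for (_, label), row in zip(examples, scale(rows))]
for line in lines:
    print(line)
```

In practice, the per-feature minimum and maximum computed on the training data would be saved and reused to scale test data (`svm-scale`'s `-s` and `-r` options serve that purpose). The labels 1 and 2 here are arbitrary category ids.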
Usability Experiment: Survey Results
Participants went to the website, completed 3 tasks, and reported task completion and overall feedback.
Only three submissions were received, due to server load.
Results:
  All participants found the site navigable.
  Participants used the list view, map view, and individual event pages.
  Some results were not relevant.
Task 1: Find any event happening in Seattle.
Task 2: Find a music event in Seattle happening more than a week from now.
Task 3: Find the celebration event for author Sandra Cisneros this past May (2009).
500/10,000 .5 task completion rate
What We Learned
Information Retrieval/Extraction/Indexing: Gained more experience with regex, Java servlet technology, and working with open-source projects.
Classifier: There are many classification methods and related variables. Choosing a method and setting its variables can be as much an art as a science.
Front End: Frameworks such as Django are very powerful, but they take a long time to learn. Watch your resources!
Breakdown of Work
Justin: Classifier and Database
Jessica and Jenn: Front end
Sophia: Extraction, Nutch, Lucene, Database
Chuan: Extraction
Questions?