1
Extracting Local Understandings from User-Generated Reviews on City Guide Websites
Andrea Moed
IS256 Applied Natural Language Processing
Professor Marti Hearst
December 6, 2006
2
June 28, 2015 Andrea Moed | IS56 ANLP
Overview
Motivations
Corpus
Processing
Nickname discovery
Ongoing experiments
–Attraction extraction
–Review classification
Future work
3
Motivations
Local knowledge of well-known places… for locals
–“Nobody goes there anymore, it’s too crowded”
–Major draws (views, dishes, people…)
–Best times/seasons/modes of transport?
–Places to combine in one excursion
“A good place for X” vs. a Great Good Place*
–*Ray Oldenburg, The Great Good Place: Cafes, Coffee Shops, Bookstores, Bars, Hair Salons, and Other Hangouts at the Heart of a Community, 1999
4
Corpus
Yelp San Francisco
–Social site organized around cities, launched 2004
–Thousands of SF places, reviews and reviewers
–Largely local interest (Mass Media, Pets)
–Some areas useful for visitors (Night Life, Shopping)
–Writerly culture: high structural and stylistic variation in the text
Categories: Restaurants, Night Life, Shopping, Active Life, Local Flavor
–Destinations
–Frequently reviewed places: 20+ reviews
5
Processing
Used Dappit to build page scrapers
Generated XML; parsed in Python
–Place objects consisting of location info + reviews
–Corpus collects place objects from various categories
Challenges of screen scraping
–Tradeoff between more places and places with most reviews (optimization requires exhaustive search)
–TripAdvisor proved too difficult
Analysis with Python and NLTK Lite
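The XML-to-place-object step might look roughly like the sketch below. The slides do not show the scraper output, so the element names (`place`, `name`, `address`, `review`), the `category` attribute, the `Place` class and the sample data are all assumptions made for illustration. The `min_reviews` parameter is likewise hypothetical; it stands in for keeping only frequently reviewed places.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical XML layout; the real scraper output is not shown in the slides.
SAMPLE_XML = """
<places>
  <place category="Local Flavor">
    <name>Golden Gate Park</name>
    <address>San Francisco, CA</address>
    <review>Great for a long Sunday walk.</review>
    <review>GG Park never gets old.</review>
  </place>
  <place category="Restaurants">
    <name>Example Cafe</name>
    <address>San Francisco, CA</address>
    <review>Decent coffee.</review>
  </place>
</places>
"""


@dataclass
class Place:
    """Location info plus the list of review texts for one place."""
    name: str
    category: str
    address: str
    reviews: list = field(default_factory=list)


def parse_places(xml_text, min_reviews=1):
    """Build Place objects, skipping places with fewer than min_reviews reviews."""
    root = ET.fromstring(xml_text.strip())
    places = []
    for node in root.iter("place"):
        reviews = [r.text.strip() for r in node.findall("review") if r.text]
        if len(reviews) < min_reviews:
            continue
        places.append(Place(
            name=node.findtext("name", default="").strip(),
            category=node.get("category", ""),
            address=node.findtext("address", default="").strip(),
            reviews=reviews,
        ))
    return places


corpus = parse_places(SAMPLE_XML, min_reviews=2)
for place in corpus:
    print(place.name, place.category, len(place.reviews))
```

Raising `min_reviews` trades corpus breadth for more text per place, which is the tension the screen-scraping bullet describes.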
6
Place Nickname Discovery
Goal: Discover alternate search terms to surface more diverse local results in web search
Method: Regular expression matching
7
Place Nickname Discovery
Steps
–Counted frequency of Yelp-given place name in reviews of that place
–Tokenized name on whitespace
–Rule-based generation of candidate nicknames: acronym, subsets of tokens
–Compared frequencies of given name and each nickname
–Potentially useful nicknames are those that occur at least half as often as the given name
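The steps above can be sketched as follows. This is a minimal illustration, not the code used in the project. The function names and sample reviews are hypothetical. Two details are assumptions the slides do not specify: matching is case-insensitive, and occurrences of the full name are removed before nicknames are counted, so that a token inside the full name is not also counted as a nickname.

```python
import re
from itertools import combinations


def phrase_pattern(phrase):
    """Regex matching the phrase as a whole word or phrase, ignoring case."""
    return re.compile(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", re.IGNORECASE)


def candidate_nicknames(name):
    """Rule-based candidates: the acronym plus every proper subset of tokens."""
    tokens = name.split()  # tokenize on whitespace
    candidates = set()
    if len(tokens) > 1:
        candidates.add("".join(t[0] for t in tokens).upper())
    for size in range(1, len(tokens)):
        for combo in combinations(tokens, size):  # keeps original token order
            candidates.add(" ".join(combo))
    candidates.discard(name)
    return candidates


def useful_nicknames(name, reviews, ratio=0.5):
    """Return the given-name count and nicknames used at least ratio times as often."""
    full = phrase_pattern(name)
    name_count = sum(len(full.findall(text)) for text in reviews)
    # Assumption: blank out full-name mentions before counting nicknames.
    remainder = [full.sub(" ", text) for text in reviews]
    useful = {}
    for nick in candidate_nicknames(name):
        pattern = phrase_pattern(nick)
        count = sum(len(pattern.findall(text)) for text in remainder)
        if count > 0 and count >= ratio * name_count:
            useful[nick] = count
    return name_count, useful


reviews = [
    "Golden Gate Park is huge.",
    "We biked through GGP on Sunday.",
    "GGP has the best museums.",
    "I love Golden Gate Park in the fall.",
    "The Park was foggy.",
]
name_count, nicknames = useful_nicknames("Golden Gate Park", reviews)
print(name_count, nicknames)
```

On this toy input both "GGP" and "Park" pass the threshold. "Park" passes only because it is an ordinary word, which is the kind of false positive a frequency threshold alone cannot rule out.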
8
Place Nickname Discovery
Results
–From 61 places (Restaurants, Active Life, Local Flavor), 38 reviews each
–23 of 61 places appeared to have frequently used nicknames
–BUT in 9 cases this was due to common words in names
–First word most commonly used nickname in remaining cases
–Hypothesis: Long tail of less predictable nicknames
9
Ongoing Work
Attraction extraction
–TF/IDF calculation to find the concepts most widely associated with a place
–Further text analysis to collect understandings of key concepts
  Specificity
  Sentiment
  Temporality
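One way the TF/IDF step could be set up is sketched below, treating all reviews of a place as a single document. The slides do not say which TF/IDF variant or tokenizer was used, so the weighting (relative term frequency times log inverse document frequency), the simple regex tokenizer, the function names and the sample reviews are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_by_place(place_reviews):
    """place_reviews maps place name -> list of review strings.

    Returns place name -> {term: tf-idf score}, with one document per place.
    """
    docs = {
        place: Counter(tokenize(" ".join(reviews)))
        for place, reviews in place_reviews.items()
    }
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in docs.values():
        doc_freq.update(counts.keys())  # each place counts once per term
    scores = {}
    for place, counts in docs.items():
        total = sum(counts.values())
        scores[place] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, place, k=3):
    """Highest-scoring terms for a place, ties broken alphabetically."""
    ranked = sorted(scores[place], key=lambda t: (-scores[place][t], t))
    return ranked[:k]


place_reviews = {
    "Ocean Beach": ["Cold water and great sunsets.",
                    "Sunsets here are worth the fog."],
    "Dolores Park": ["Great views and sunny afternoons.",
                     "Sunny, crowded, great people watching."],
    "Ferry Building": ["Great oysters and a farmers market.",
                       "The market is crowded on Saturday."],
}
scores = tfidf_by_place(place_reviews)
for place in place_reviews:
    print(place, top_terms(scores, place))
```

Terms that appear for every place, such as "great" in this toy data, score zero, so the ranking favors words that distinguish one place from the others.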
11
Ongoing Work
Classification of reviews: recommendation vs. narrative
–Recommendations help people “use” a city
–Narrative is associated with memorable and unique locations
Features for classification
–Verb tense distribution
–Paragraph breaks
–Opinion words at beginning and end (recommendation)
–Memory and relationship words (narrative)
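Three of the listed features could be extracted along the lines of the sketch below. The word lists are small hypothetical placeholders, not lexicons from the project, and the `edge` window size is an arbitrary assumption. Verb tense distribution is left out because it would need a part-of-speech tagger.

```python
import re

# Hypothetical word lists, for illustration only.
OPINION_WORDS = {"great", "best", "worst", "love", "hate",
                 "recommend", "avoid", "amazing", "terrible"}
MEMORY_WORDS = {"remember", "remembered", "childhood", "years", "ago",
                "grandmother", "friend", "husband", "wife", "first"}


def words(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def review_features(text, edge=5):
    """Count paragraph breaks, opinion words near the edges, and memory words."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    tokens = words(text)
    # First and last `edge` tokens; the whole review if it is too short to split.
    if len(tokens) > 2 * edge:
        edges = tokens[:edge] + tokens[-edge:]
    else:
        edges = tokens
    return {
        "paragraph_breaks": max(len(paragraphs) - 1, 0),
        "opinion_at_edges": sum(t in OPINION_WORDS for t in edges),
        "memory_words": sum(t in MEMORY_WORDS for t in tokens),
    }


recommendation = ("Best burritos in the Mission. Go early to avoid the line. "
                  "I recommend the carnitas.")
narrative = ("I remember coming here with my grandmother years ago.\n\n"
             "We would sit by the window and watch the fog roll in.\n\n"
             "It still feels the same.")

rec_features = review_features(recommendation)
nar_features = review_features(narrative)
print(rec_features)
print(nar_features)
```

Feature dictionaries in this shape could be passed to a classifier once a labeled training set exists.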
12
Future Work
Relating understanding about location features to external data (geocoding, weather)
Visualization of extracted concepts
Development of a training set for classification