SMS-Based Web Search for Low-end Mobile Devices
Jay Chen, New York University; Lakshmi Subramanian, New York University

SMS-Based Web Search for Low-end Mobile Devices
Jay Chen, New York University
Lakshmi Subramanian, New York University
Eric Brewer, University of California, Berkeley

2 Motivation (1)
- SMS-based web services are a rapidly growing market
- Over 12 million subscribers in July 2008
- A significant fraction of mobile devices in developing regions are still low-cost devices
- Existing SMS-based search services perform poorly:
  - Low accuracy: Google SMS 22.2%, Yahoo! oneSearch 27.8% (services built around vertical, pre-defined topics)
  - Long median response time: ChaCha, which hires humans to search the web and answer questions

3 Motivation (2)

4 Challenges
- SMS search suffers from the long-tail phenomenon: among ChaCha queries, 21% are verticals and 79% are long-tail
- None of the existing automated SMS search services is a complete solution for search queries across arbitrary topics
- Search queries are inherently ambiguous

5 Problem
Build an automated SMS search system that is:
- Fast (unlike ChaCha)
- Accurate (unlike Google SMS and Yahoo! oneSearch)
- Able to return a disambiguated result for queries across arbitrary topics

6 Related work
- Mobile search is different from conventional desktop search:
  - Click-through rates and search page views were significantly lower
  - Persistence of mobile users was very low
  - Diversity of search topics for low-end phone users was much lower
- SMS search differs from existing TREC tracks in at least one of three dimensions:
  - The nature of the input query
  - The document collection
  - The nature of the search result in the query response

7 System architecture
Run the algorithm and return a snippet

8 Definitions
- Vertical: topics that are pre-defined or popular
- Long tail: topics that are not popular
- Snippet: any continuous stream of text that fits within an SMS message (140 bytes)
- Hint: a term or a collection of consecutive terms that determines what kind of information the user is looking for
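
Because the snippet budget is stated in bytes rather than characters, non-ASCII text fits fewer characters into one message. A minimal Python sketch of a byte-budget check, assuming UTF-8 encoding (an assumption: actual SMS payloads use encodings such as GSM 7-bit or UCS-2); `fits_in_sms` and `truncate_to_bytes` are hypothetical helper names:

```python
SMS_BYTE_LIMIT = 140  # snippet budget from the definition above


def fits_in_sms(text: str, limit: int = SMS_BYTE_LIMIT) -> bool:
    """Return True if the UTF-8 encoding of text fits in the byte budget."""
    return len(text.encode("utf-8")) <= limit


def truncate_to_bytes(text: str, limit: int = SMS_BYTE_LIMIT) -> str:
    """Cut text to at most `limit` UTF-8 bytes without splitting a character."""
    clipped = text.encode("utf-8")[:limit]
    # errors="ignore" drops a trailing partial multi-byte sequence, if any.
    return clipped.decode("utf-8", errors="ignore")


print(fits_in_sms("a" * 140))            # True
print(fits_in_sms("é" * 71))             # False: 71 characters but 142 bytes
print(len(truncate_to_bytes("é" * 71)))  # 70
```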

9 SMSFind algorithm
The SMSFind search problem can be characterized as follows: given an unstructured SMS search query in the form of a (query, hint) pair and the top N pages returned by a search engine, extract a condensed set of text snippets from the response pages that provide an appropriate search response to the query.
This problem definition assumes that a hint is specified for every query. Google SMS has a similar explicit requirement: a keyword must be specified as the last term. In SMSFind, by contrast, the hint is arbitrary rather than fixed to the last term.

10 SMSFind algorithm
Consider a search query (Q, H), where Q is the search query containing the hint term H. Let P1, ..., PN represent the textual content of the top N search response pages to Q. Given (Q, H) and P1, ..., PN, the SMSFind snippet extraction algorithm consists of three main steps:
- Neighborhood extraction
- N-gram ranking
- Snippet ranking
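
The neighborhood-extraction step can be sketched in Python as follows. The slides do not give the neighborhood size or the tokenization, so this sketch assumes lowercase whitespace tokenization and a window of `radius` words on each side of every hint occurrence; `extract_neighborhoods` and `radius` are hypothetical names, and the default radius of 10 is arbitrary.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def extract_neighborhoods(page: str, hint: str, radius: int = 10) -> List[List[str]]:
    """Return the words within `radius` positions of each occurrence of the hint."""
    words = tokenize(page)
    hint_words = tokenize(hint)
    m = len(hint_words)
    if m == 0:
        return []
    neighborhoods = []
    for i in range(len(words) - m + 1):
        if words[i:i + m] == hint_words:
            start = max(0, i - radius)
            end = min(len(words), i + m + radius)
            neighborhoods.append(words[start:end])
    return neighborhoods


pages = ["The brown cow jumped over the moon", "Nothing relevant here"]
for page in pages:
    print(extract_neighborhoods(page, "over", radius=2))
# [['cow', 'jumped', 'over', 'the', 'moon']]
# []
```

Each page Pi contributes zero or more neighborhoods; only these word windows, not whole pages, feed the later n-gram steps.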

11 Process of SMSFind
- Neighborhood extraction
- Generate n-grams
- Filter the set of n-grams on three dimensions: frequency (3), mean rank (ignore n-grams with a low PageRank), and minimum distance (10)
- Rank n-grams: Rank(s) = freq(s) + meanrank(s) + mindist(s)
- Split the text into snippet tiles using a 140-byte sliding window
- Rank snippets by the cumulative rank of the top-k (k = 5) ranked n-grams within each snippet
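
The filtering step can be sketched as a threshold check. This assumes the values in parentheses are thresholds (keep n-grams with a frequency of at least 3 and a minimum distance of at most 10) and treats mean rank as an average search-result position where 1 is best; the mean-rank cutoff `MAX_MEAN_RANK`, the `NgramStats` type, and the example statistics are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict

MIN_FREQUENCY = 3      # assumed meaning of "frequency (3)"
MAX_MIN_DISTANCE = 10  # assumed meaning of "minimum distance (10)"
MAX_MEAN_RANK = 5.0    # hypothetical cutoff; the slides give no value


@dataclass
class NgramStats:
    frequency: int     # occurrences across all neighborhoods
    mean_rank: float   # average result position of containing pages (1 = best)
    min_distance: int  # closest approach to the hint, in words


def filter_ngrams(stats: Dict[str, NgramStats]) -> Dict[str, NgramStats]:
    """Keep only the n-grams that pass all three thresholds."""
    return {
        gram: s
        for gram, s in stats.items()
        if s.frequency >= MIN_FREQUENCY
        and s.min_distance <= MAX_MIN_DISTANCE
        and s.mean_rank <= MAX_MEAN_RANK
    }


# Made-up statistics for three candidate n-grams.
candidates = {
    "ernest hemingway": NgramStats(frequency=7, mean_rank=2.0, min_distance=1),
    "cookie policy": NgramStats(frequency=4, mean_rank=9.5, min_distance=3),
    "rare phrase": NgramStats(frequency=1, mean_rank=1.0, min_distance=2),
}
print(sorted(filter_ngrams(candidates)))  # ['ernest hemingway']
```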

12 Generate n-grams
n-grams are 1 to 5 words long.

N-gram               Frequency   Min. distance
"the"                2           1
"the brown"          1           3
"the brown cow"      1           2
"brown cow jumped"   1           1

Table 1: Slicing example for the text "the brown cow jumped over the moon", with hint = "over".
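
The slicing in Table 1 can be reproduced with a short Python sketch. It assumes whitespace tokenization and measures distance in words between the closest ends of an n-gram and the hint, so an n-gram that overlaps the hint gets distance 0 (an assumption); `ngram_table` and `span_distance` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, Tuple

MAX_N = 5  # n-grams of 1 to 5 words, as on the slide


def span_distance(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Word gap between two inclusive word spans; 0 if they overlap."""
    return max(0, b[0] - a[1], a[0] - b[1])


def ngram_table(text: str, hint: str, max_n: int = MAX_N) -> Dict[str, Tuple[int, float]]:
    """Map each n-gram to (frequency, minimum word distance to the hint)."""
    words = text.lower().split()
    hint_words = hint.lower().split()
    if not hint_words:
        raise ValueError("hint must contain at least one word")
    m = len(hint_words)
    hint_spans = [(i, i + m - 1) for i in range(len(words) - m + 1)
                  if words[i:i + m] == hint_words]
    freq = Counter()
    min_dist: Dict[str, float] = {}
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            freq[gram] += 1
            # math.inf stands in for "the hint never occurs in this text".
            d = min((span_distance((i, i + n - 1), s) for s in hint_spans), default=math.inf)
            min_dist[gram] = min(d, min_dist.get(gram, math.inf))
    return {gram: (freq[gram], min_dist[gram]) for gram in freq}


table = ngram_table("the brown cow jumped over the moon", hint="over")
for gram in ["the", "the brown", "the brown cow", "brown cow jumped"]:
    print(gram, table[gram])
# the (2, 1)
# the brown (1, 3)
# the brown cow (1, 2)
# brown cow jumped (1, 1)
```

"the" occurs twice; its second occurrence sits right after "over", which is why its minimum distance is 1.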

13 N-gram Ranking
Three metrics:
- Frequency: the number of times the n-gram occurs across all snippets
- Mean rank: the sum, across every occurrence of an n-gram, of the PageRank of the page in which it occurs, divided by the n-gram's raw frequency
- Minimum distance: the minimum distance between an n-gram and the hint across any occurrences of both
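
A Python sketch of the frequency and mean-rank metrics as defined above. The function accepts any per-page score; the usage example substitutes each page's 1-based position in the search results for its PageRank, which is an assumption made for illustration, as are the page texts. `frequency_and_mean_rank` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def ngrams(words: List[str], max_n: int = 5) -> Iterator[str]:
    """Yield every n-gram of 1..max_n words as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def frequency_and_mean_rank(pages: List[Tuple[float, str]],
                            max_n: int = 5) -> Dict[str, Tuple[int, float]]:
    """pages holds (page_rank, text) pairs; returns n-gram -> (frequency, mean rank).

    Mean rank = sum of the page rank over every occurrence / raw frequency.
    """
    freq: Dict[str, int] = defaultdict(int)
    rank_sum: Dict[str, float] = defaultdict(float)
    for page_rank, text in pages:
        for gram in ngrams(text.lower().split(), max_n):
            freq[gram] += 1
            rank_sum[gram] += page_rank
    return {gram: (freq[gram], rank_sum[gram] / freq[gram]) for gram in freq}


# Hypothetical input: page_rank is the page's 1-based search-result position.
pages = [(1, "Hemingway quote"), (2, "a Hemingway quote"), (3, "Hemingway biography")]
stats = frequency_and_mean_rank(pages, max_n=2)
print(stats["hemingway"])        # (3, 2.0)
print(stats["hemingway quote"])  # (2, 1.5)
```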

14 An example of the metrics used to rank n-grams
- If two n-grams s and t have the same frequency, but s has a much lower web frequency than t, then s should be ranked higher than t (the intuition behind TF-IDF)
- Rank(s) = freq(s) + meanrank(s) + mindist(s), a linear combination of three normalized ranks
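
One way to realize this linear combination is sketched below. The slides do not say how the ranks are normalized or where the TF-IDF adjustment enters, so this sketch assumes: each metric is converted to a rank position (ties share a position) divided by the number of candidates, the frequency term uses a TF-IDF-style score freq(s) * log(web_total / web_freq(s)), and a lower combined total is better. `web_freq`, `web_total`, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Per-n-gram statistics: (frequency, mean rank, minimum distance).
Stats = Tuple[int, float, int]


def normalized_ranks(values: Dict[str, float], higher_is_better: bool) -> Dict[str, float]:
    """Position of each item (0 = best, ties share a position) divided by the item count."""
    n = len(values)

    def beats(a: float, b: float) -> bool:
        return a > b if higher_is_better else a < b

    return {g: sum(beats(other, v) for other in values.values()) / n for g, v in values.items()}


def combined_rank(stats: Dict[str, Stats], web_freq: Dict[str, int], web_total: int) -> List[str]:
    """Order n-grams by the sum of three normalized ranks (lower sum = better)."""
    # TF-IDF-style frequency: n-grams that are rare on the web get a boost.
    # Assumes 1 <= web_freq[g] <= web_total for every n-gram.
    tfidf = {g: f * math.log(web_total / web_freq[g]) for g, (f, _, _) in stats.items()}
    r_freq = normalized_ranks(tfidf, higher_is_better=True)
    r_mean = normalized_ranks({g: s[1] for g, s in stats.items()}, higher_is_better=False)
    r_dist = normalized_ranks({g: s[2] for g, s in stats.items()}, higher_is_better=False)
    total = {g: r_freq[g] + r_mean[g] + r_dist[g] for g in stats}
    return sorted(total, key=total.get)


# The example above: s and t tie on every metric except web frequency.
stats = {"t": (4, 2.0, 1), "s": (4, 2.0, 1)}
web_freq = {"s": 10, "t": 10_000}
print(combined_rank(stats, web_freq, web_total=1_000_000))  # ['s', 't']
```

Letting ties share a position keeps the input order from deciding the outcome, so only the web-frequency difference separates s from t.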

15 Snippet Ranking
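
A Python sketch of snippet ranking. It assumes word-aligned tiles (each tile is the longest run of words starting at a given word that fits in 140 UTF-8 bytes), text without attached punctuation, and interprets "cumulative rank" as a weighted count in which the best of the top-k n-grams contributes k points, the next k - 1, and so on. These choices, along with the names `tiles` and `best_snippet`, are assumptions.

```python
from typing import List, Tuple

SMS_BYTES = 140  # sliding-window size in bytes
TOP_K = 5        # number of top-ranked n-grams counted per snippet


def tiles(text: str, limit: int = SMS_BYTES) -> List[str]:
    """Slide a word-aligned window over text; each tile is the longest run of
    words starting at a given word whose UTF-8 encoding fits in `limit` bytes."""
    words = text.split()
    result = []
    for start in range(len(words)):
        end = start
        # Grow the window one word at a time while it still fits.
        while end < len(words) and len(" ".join(words[start:end + 1]).encode("utf-8")) <= limit:
            end += 1
        if end > start:  # a single word longer than the limit yields no tile
            result.append(" ".join(words[start:end]))
    return result


def best_snippet(text: str, ranked_ngrams: List[str], k: int = TOP_K,
                 limit: int = SMS_BYTES) -> Tuple[str, int]:
    """Return the highest-scoring tile and its score ("", 0 if there are no tiles)."""
    top = [g.lower() for g in ranked_ngrams[:k]]

    def score(tile: str) -> int:
        padded = f" {tile.lower()} "  # pad so matches fall on word boundaries
        # The best n-gram is worth k points, the next k - 1, and so on.
        return sum(k - i for i, gram in enumerate(top) if f" {gram} " in padded)

    candidates = tiles(text, limit)
    if not candidates:
        return "", 0
    best = max(candidates, key=score)
    return best, score(best)


print(best_snippet("aaa bbb ccc ddd eee", ["ddd", "eee", "aaa"], limit=11))
# ('ccc ddd eee', 9)
```

The toy call shrinks the window to 11 bytes so the tiles are easy to check by hand; with the default 140 bytes every returned snippet fits in one SMS.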

16 How to extract a hint
- Data analysis of 100,000 queries from ChaCha:
  - 95% of the queries are 14 terms or less
  - Several common query structures can be observed, each with a corresponding transformation rule
  - For example, 45% of the queries begin with "what", and over 80% of those are in standard forms ("what is", "what was", "what are", "what do", "what does")
- Example: "what is a quote by Ernest Hemingway" satisfies the structure "what is X"; the stop word "a" is ignored, and the final result is …
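
The "what is X" rule in the example can be sketched in Python. This sketch only performs the structural match against the standard forms listed above and drops leading stop words; the stop-word list is an assumption, choosing a hint from X is left out, and `match_what_form` is a hypothetical name.

```python
from typing import Optional, Tuple

# Standard "what" forms listed on the slide.
WHAT_FORMS = ("what is", "what was", "what are", "what do", "what does")
# A tiny illustrative stop-word list (assumption; the slide only mentions "a").
STOP_WORDS = {"a", "an", "the"}


def match_what_form(query: str) -> Optional[Tuple[str, str]]:
    """If the query reads "<what-form> X", return (form, X) with leading stop
    words removed from X; otherwise return None. The case of X is preserved."""
    words = query.split()
    lowered = [w.lower() for w in words]
    for form in WHAT_FORMS:
        form_words = form.split()
        n = len(form_words)
        # Whole-word comparison keeps "what do" from matching "what does".
        if lowered[:n] == form_words and len(words) > n:
            rest = words[n:]
            while rest and rest[0].lower() in STOP_WORDS:
                rest = rest[1:]
            return form, " ".join(rest)
    return None


print(match_what_form("what is a quote by Ernest Hemingway"))
# ('what is', 'quote by Ernest Hemingway')
print(match_what_form("who wrote Hamlet"))  # None
```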

17 Implementation
- Language: 600 lines of Python using a publicly available parsing library
- Deployment: a front end that sends and receives SMS messages
- Setup: an SMS short code with a local telco in Kenya; all SMS requests and responses are routed to and from our server machine
- Verticals: interfaces to several basic verticals are part of the service, including weather, definitions, local business results, and news (each interface is under 150 lines of Python)
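
How incoming messages are routed between the verticals and SMSFind is not described here; the Python sketch below assumes that a hypothetical keyword prefix selects a vertical and that everything else falls through to SMSFind. All handler functions are stubs with hypothetical names, only two of the four verticals are shown, and replies are trimmed to 140 UTF-8 bytes.

```python
from typing import Callable, Dict

SMS_BYTES = 140


def weather(args: str) -> str:     # hypothetical stub for the weather vertical
    return f"Weather for {args}: (forecast lookup goes here)"


def definition(args: str) -> str:  # hypothetical stub for the definitions vertical
    return f"Definition of {args}: (dictionary lookup goes here)"


def smsfind(query: str) -> str:    # hypothetical stub for the long-tail SMSFind path
    return f"SMSFind snippet for: {query}"


# Hypothetical keyword prefixes that select a vertical.
VERTICALS: Dict[str, Callable[[str], str]] = {
    "weather": weather,
    "define": definition,
}


def handle_sms(message: str) -> str:
    """Route an incoming SMS to a vertical or to SMSFind; cap the reply at 140 bytes."""
    text = message.strip()
    keyword, _, rest = text.partition(" ")
    handler = VERTICALS.get(keyword.lower())
    reply = handler(rest) if handler and rest else smsfind(text)
    # Trim to the SMS byte budget without splitting a UTF-8 character.
    return reply.encode("utf-8")[:SMS_BYTES].decode("utf-8", errors="ignore")


print(handle_sms("weather Nairobi"))
print(handle_sms("what is a quote by Ernest Hemingway"))
```

Keeping each vertical behind a one-argument function mirrors the small, separate interfaces described above: adding news or local business results would mean registering one more handler.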

18 Evaluation

19 Using ChaCha's sub-topics to focus on long-tail topics

20 Variety of topics

21
- Using n-grams to rank snippets is important
- Returning a snippet rather than an n-gram is critical
- Modifying the queries has a significant effect

22 The readability of our snippets is poor

23 Conclusion
- A combination of simple information retrieval algorithms, used in conjunction with existing search engines, can provide reasonably accurate search responses for SMS queries
- On queries across arbitrary topics, SMSFind answers 57.3% of the queries in the test set
- This work represents a foray into an open and practical research domain