SMS-Based Web Search for Low-end Mobile Devices Jay Chen New York University Lakshmi Subramanian New York University Eric Brewer University of California.

Presentation transcript:

SMS-Based Web Search for Low-end Mobile Devices Jay Chen New York University Lakshmi Subramanian New York University Eric Brewer University of California XinMiao Wu

Outline: What the authors address; Introduction; Related Work; SMSFind Problems; SMSFind Search Algorithm; Implementation; Evaluation; Discussion; Conclusion

Explanation: SMS – Short Messaging Service – limited to 140 bytes per message. SMS-Based Web Search – not via XHTML/WAP – uses only the SMS service.

Conventional SMS-Based Web Search [Diagram: (1) the User sends a short message to the SMS Server; (2) the SMS Server invokes the Search Engine; (3) the Search Engine returns the top N search responses (response 1, response 2, response 3, response 4, ...); (4) the SMS Server sends several short messages back to the User.]

What the authors address [Diagram: (1) the User sends a short message to the SMS Server (SMSFind); (2) the SMS Server invokes the Search Engine, which produces the top N search responses; (3) extract snippet; (4) response; (5) the User receives one short message carrying 140 bytes of main content.]

Why meaningful? Growth of the mobile phone market – has motivated the design of new forms of mobile information services. Growth of Twitter and other social messaging networks – Short Messaging Service (SMS) based applications and services have become popular. Mobile devices in developing regions are still simple, low-cost devices – with limited processing and communication capabilities – voice and SMS will likely remain the primary communication channels.

Why SMS-Based Search? For any SMS-based web service, efficient SMS-based search is an essential building block. Existing services: vertical (Google SMS and Yahoo! oneSearch) and long tail (ChaCha, Just Dial), the latter needing human operators. None of the existing automated SMS search services is a complete solution for search queries across arbitrary topics; they rely on pre-defined topics, such as “define” or “movies” (e.g. Google SMS: “define boils”).

Difficulties of SMS-Based Search: 140-byte message limit. Search response time (10 seconds to several minutes). Small form factor and low bandwidth (even with XHTML/WAP). Long tail phenomenon. Users rarely have the luxury of several rounds of interaction (vs. desktop search). Ambiguous queries. Problem: How does a mobile user efficiently search the Web using one round of interaction where the search response is restricted to one SMS message? – SMSFind

Related Works Two surveys – First: Need a new mobile search model for low-end mobile devices. – Second: SMS is expected to continue its growth as it is popular, cheap, reliable and private. Two kinds of SMS search – Vertical: Google, Yahoo!, and Microsoft – Long tail: ChaCha and Just Dial Automatic Text Summarization – The goal is different

Related Works The problem that SMSFind seeks to address is similar to: – Question answering systems (developed for the Text Retrieval Conference) But distinct in: – Unstructured search-style queries (simple natural language style) – SMSFind is a snippet extraction and snippet ranking algorithm – The collection of documents being searched over

Known Verticals vs Long Tail

SMSFind Search Problem Characterized as follows: Given a query (with its hint) + the top N search response pages → extract a text snippet as an appropriate search response to the query. Note that: 1. What is a snippet? 2. What is the hint?

Disambiguate query A common technique: – use additional information about the context in which the search is being conducted. – here we use an explicit hint. Consider the query:

Most search result pages will contain: – “Michelle” or “Michelle Obama” or “Michelle Robinson” or “Michelle LaVaughn Robinson” within the neighborhood of the word “wife” in the text of the page. SMSFind will search the neighborhood of the word “wife” in every result page and look for commonly occurring n-grams, where 1 ≤ n ≤ 5. For example, “Michelle Obama” is a 2-gram.
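The neighborhood search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the tokenizer, and the neighborhood `radius` of 10 words are assumptions, since the slides do not state the neighborhood size.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def neighborhood_ngrams(text, hint, radius=10, max_n=5):
    """Count n-grams (1 <= n <= max_n) within `radius` words of each hint occurrence.

    `radius` is a hypothetical parameter; the slides do not give its value.
    """
    words = tokenize(text)
    counts = Counter()
    for i, word in enumerate(words):
        if word != hint:
            continue
        # Words on either side of this occurrence of the hint.
        window = words[max(0, i - radius):i + radius + 1]
        for n in range(1, max_n + 1):
            for j in range(len(window) - n + 1):
                counts[" ".join(window[j:j + n])] += 1
    return counts


def count_across_pages(pages, hint, radius=10, max_n=5):
    """Sum the neighborhood n-gram counts over all result pages."""
    total = Counter()
    for page in pages:
        total.update(neighborhood_ngrams(page, hint, radius, max_n))
    return total
```

Note that if the hint occurs twice within `radius` words of itself, the overlapping windows count the shared n-grams twice; a fuller version would merge overlapping neighborhoods first.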

n-grams and snippets Both represent contiguous sequences of words in a document. An n-gram is extremely short in length (1−5 words). A text snippet is a sequence of words that can fit in a single SMS message. n-grams are used as an intermediate unit. Snippets are used for the final ranking.
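To make the snippet definition concrete, the sketch below enumerates candidate snippets: for each starting word, the longest run of consecutive words that still fits in one 140-byte message. The function name and the choice of UTF-8 for measuring bytes are assumptions made for illustration; the slides only give the 140-byte limit.

```python
SMS_LIMIT_BYTES = 140  # single-message limit stated in the slides


def candidate_snippets(text, limit=SMS_LIMIT_BYTES):
    """Return, for each start word, the longest word run that fits in `limit` bytes.

    Byte length is measured in UTF-8, which is an assumption of this sketch.
    """
    words = text.split()
    snippets = []
    for start in range(len(words)):
        size = 0
        end = start
        while end < len(words):
            # Each word after the first also costs one separating space.
            extra = len(words[end].encode("utf-8")) + (1 if end > start else 0)
            if size + extra > limit:
                break
            size += extra
            end += 1
        if end > start:  # skip words that alone exceed the limit
            snippets.append(" ".join(words[start:end]))
    return snippets
```

Every returned snippet is a sequence of consecutive words from the page, so any n-gram it contains is also an n-gram of the original text.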

SMSFind Algorithm Consider a search query (Q,H) – Q is the search query containing the hint term(s) H. Let P1,... PN represent the textual content of the top N search response pages to Q. Three steps: Neighborhood Extraction; N-gram Ranking; Snippet Ranking

Neighborhood Extraction

N-gram Ranking

Basic rationale of n-gram ranking algorithm Any n-gram which satisfies the following three properties is potentially related to the appropriate response: 1. the n-gram appears very frequently around the hint. 2. the n-gram appears very close to the hint. 3. the n-gram is not a commonly used popular term or phrase. As an example, the n-gram “Michelle Obama”.

Three Metrics: Frequency – the number of times the n-gram occurs across all snippets. Mean rank – the sum of the PageRanks of every page in which the n-gram occurs, divided by the n-gram’s raw frequency. Minimum distance – the minimum distance from the n-gram to the hint.
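The three metrics can be computed roughly as shown below. This is a simplified sketch under stated assumptions: each page is passed as a hypothetical `(rank, text)` pair, where `rank` stands in for the page's rank value; occurrences are counted over the whole page text rather than over extracted neighborhoods; and distance is measured in words. None of these details are confirmed by the slides.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_metrics(ngram, hint, pages):
    """Return (frequency, mean_rank, min_distance) for one n-gram.

    pages: list of (rank, text) pairs; this input shape is an assumption.
    """
    target = tokenize(ngram)
    n = len(target)
    frequency = 0
    rank_sum = 0
    min_distance = None
    for rank, text in pages:
        words = tokenize(text)
        hint_positions = [i for i, w in enumerate(words) if w == hint]
        starts = [i for i in range(len(words) - n + 1) if words[i:i + n] == target]
        if not starts:
            continue
        frequency += len(starts)  # raw number of occurrences
        rank_sum += rank          # each page contributes its rank once
        for s in starts:
            for h in hint_positions:
                # Distance in words from the hint to the nearer end of the n-gram.
                d = min(abs(h - s), abs(h - (s + n - 1)))
                if min_distance is None or d < min_distance:
                    min_distance = d
    mean_rank = rank_sum / frequency if frequency else None
    return frequency, mean_rank, min_distance
```

An n-gram that never occurs yields `(0, None, None)`, so callers can filter it out before any normalization step.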

Should return the response “rainn wilson”. Here, freq(s), meanrank(s) and mindist(s) are the normalized scores of an n-gram s.

Snippet Ranking

Hint Extraction from the Query 45% of the queries began with the word “what”, and over 80% of those queries are in standard forms (e.g. “what is”, “what was”, “what are”, “what do”, “what does”). The “what is X” pattern: for example, the hint of “what is a quote by ernest hemingway” is “quote” (“a” is a stop word).
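A rule of this kind might look like the sketch below. It generalizes from the single example on the slide by taking the first non-stop word after a standard “what ...” prefix as the hint; that rule, the function name, and the stop-word list are assumptions rather than the authors' actual procedure.

```python
# Standard query prefixes listed on the slide.
PREFIXES = ("what is", "what was", "what are", "what do", "what does")

# A small illustrative stop-word list (hypothetical, not from the slides).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "by", "for", "to"}


def extract_hint(query):
    """Return the first non-stop word after a standard prefix, or None."""
    q = query.lower().strip().rstrip("?").strip()
    # Try longer prefixes first so "what does" is not mistaken for "what do".
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if q.startswith(prefix + " "):
            for word in q[len(prefix):].split():
                if word not in STOP_WORDS:
                    return word
            return None
    return None
```

Queries that do not match one of the standard forms return `None` here; the slides do not say how such queries are handled.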

IMPLEMENTATION: 600 lines of Python code; 1.8 GHz Duo Core Intel PC; 2 GB of RAM; 2 Mbps broadband; a front end; an SMS short code set up with a local telco in Kenya.

EVALUATION What is the query set? What are the correct answers? How is an answer judged correct or not? What percentage of the queries are verticals? Can the hint always be extracted correctly?

Result: SMSFind returns correct answers for 57.3% of the queries, while Google SMS does so for only 9.5% of these queries.

What do the snippet results actually look like?

What is more interesting? What if the vertical queries are removed? What if only the highest-ranked n-grams returned are considered rather than the entire snippet? Are n-grams necessary, or would ranking snippets alone perform just as well? How important is the hint term?

Summary of several results

Difficult Types of Queries: really ambiguous; explanations; enumerations; analysis; time sensitive. SMSFind cannot handle these kinds of queries yet!

CONCLUSION We have presented SMSFind, an automated SMS-based search response system. SMSFind can work across arbitrary topics. We find that a combination of simple Information Retrieval algorithms with existing search engines can provide reasonably accurate search responses for SMS queries. SMSFind is able to answer 57.3% of the queries in our test set.

Thank you!