Wong Cheuk Fun Presentation on Keyword Search. Head, Modifier, and Constraint Detection in Short Texts Zhongyuan Wang, Haixun Wang, Zhirui Hu.

Presentation transcript:

Wong Cheuk Fun Presentation on Keyword Search

Head, Modifier, and Constraint Detection in Short Texts Zhongyuan Wang, Haixun Wang, Zhirui Hu

Example: "popular iphone 5s smart cover"
- Modifier: "popular"
- Constraint: "iphone 5s"
- Head: "smart cover"

90% of distinct queries consist of 2 or more components

Detection Challenges
No grammar rules
- "Popular iphone 5s smart cover" vs "Popular smart cover iphone 5s"
Require external knowledge
- "Job search" vs "Job interview"
- Instance-level head-modifier knowledge
- Conceptual knowledge
- Concept-level head-modifier knowledge

Detection Approach
Concept patterns take the form (concept [head], concept [modifier], score), e.g. (accessory [head], device [modifier], 0.9)
Three major challenges:
- Knowledge coverage: handle all possible input
- Avoid deriving conflicting patterns
- Distinguish constraints from non-constraint modifiers

Mining Concept Patterns -- Probase
IsA taxonomy
- Entities vs concepts, e.g. "Barack Obama" (entity) vs "USA president" (concept)
- 2.7 million concepts
P(e|c) tells how popular e is as far as concept c is concerned; P(c|e) tells the reverse.
e.g. P(Fujitsu|Computer) > P(Acer|Computer)
n(e, c) denotes how often e and c occur together.
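As a rough illustration of the scores above, the sketch below estimates P(e|c) and P(c|e) from co-occurrence counts n(e, c) by plain normalization. The normalization, the function name `typicality`, and the toy counts are assumptions made for illustration; they are not figures or code from Probase.

```python
from collections import defaultdict

# Hypothetical co-occurrence counts n(e, c); the numbers are made up.
counts = {
    ("iphone 5s", "device"): 80,
    ("iphone 5s", "product"): 20,
    ("kindle", "device"): 20,
}

def typicality(counts):
    """Return two dicts keyed by (e, c): P(e|c) and P(c|e)."""
    per_concept = defaultdict(int)  # sum of n(e, c) over all e, for each c
    per_entity = defaultdict(int)   # sum of n(e, c) over all c, for each e
    for (e, c), n in counts.items():
        per_concept[c] += n
        per_entity[e] += n
    p_e_given_c = {(e, c): n / per_concept[c] for (e, c), n in counts.items()}
    p_c_given_e = {(e, c): n / per_entity[e] for (e, c), n in counts.items()}
    return p_e_given_c, p_c_given_e

p_e_given_c, p_c_given_e = typicality(counts)
```

With these toy counts, "iphone 5s" is a more typical "device" than "kindle" simply because it co-occurs with that concept more often.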

Mining Concept Patterns – Instance-level Head-Modifiers
Identify the head and modifiers regardless of their order, e.g. "smart cover for iphone 5s"
Other prepositions: 'of', 'with', 'in', 'on', 'at'
- When they are used (A for B, A of B, A with B), it is almost always true that A is the head and B is the constraint.
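The preposition heuristic can be sketched with a regular expression. This is a minimal illustration, not the authors' implementation: the function name `split_head_constraint` is hypothetical, and splitting at the first preposition is an assumed simplification.

```python
import re

# Prepositions listed on the slide.
PREPOSITIONS = ("for", "of", "with", "in", "on", "at")

# "A <prep> B": A is taken as the head, B as the constraint.
_PATTERN = re.compile(
    r"^(?P<head>.+?)\s+(?:%s)\s+(?P<constraint>.+)$" % "|".join(PREPOSITIONS)
)

def split_head_constraint(query):
    """Return (head, constraint) for an 'A prep B' query, else None."""
    match = _PATTERN.match(query.strip().lower())
    if match is None:
        return None
    return match.group("head"), match.group("constraint")

pair = split_head_constraint("smart cover for iphone 5s")
no_pair = split_head_constraint("iphone 5s smart cover")
```

Pairs harvested this way from a query log would serve as instance-level (head, modifier) evidence; queries without a preposition yield nothing at this stage.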

Mining Concept Patterns – Concept-level Head-Modifiers
Levels of conceptualization for (head, modifier, score):
- (smart cover, iphone 5s) is too specific; (obj, obj) is too general
- Conflicting rules: (company, device) vs (device, company)
Conceptualizing instances:
1. Map e to c if P(c|e) is among the top k;
2. Map e to c if P(e|c) is among the top k;
3. Map e to c if P(c|e)*P(e|c) is among the top k;
4. Map e to itself if e is itself a concept.
The first two are not desirable, as they are either too general or too specific.
For (3), a larger value is evidence of closeness between c and e.
For (4), we use entropy to identify popular instances.
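Strategy (3) can be sketched as a top-k ranking on the product P(c|e)*P(e|c). The probabilities below are made-up toy values and `top_k_concepts` is a hypothetical helper; the sketch only shows why the product favours a mid-level concept over a very general one.

```python
def top_k_concepts(entity, p_c_given_e, p_e_given_c, k=2):
    """Rank the concepts of `entity` by P(c|e) * P(e|c) and keep the top k."""
    scores = {}
    for (e, c), p in p_c_given_e.items():
        if e == entity:
            scores[c] = p * p_e_given_c.get((e, c), 0.0)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [c for c, _ in ranked[:k]]

# Toy probabilities (assumed, not taken from Probase).
p_c_given_e = {
    ("iphone 5s", "device"): 0.5,
    ("iphone 5s", "smartphone"): 0.3,
    ("iphone 5s", "object"): 0.2,
}
p_e_given_c = {
    ("iphone 5s", "device"): 0.05,
    ("iphone 5s", "smartphone"): 0.2,
    ("iphone 5s", "object"): 0.0001,
}

concepts = top_k_concepts("iphone 5s", p_c_given_e, p_e_given_c, k=2)
```

Here "object" has a non-trivial P(c|e) but a tiny P(e|c), so the product pushes it out of the top k.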

Mining Concept Patterns – Conceptualizing Pairs
The term "apple" conceptualizes to "fruit" or "company"
"CEO for apple" → (CEO, fruit), (CEO, company)
Obviously, (CEO, fruit) is wrong.
Wrong concept pairs that are introduced will be filtered out due to their low scores.
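One way to see why wrong pairs end up with low scores is to aggregate over many instance pairs. The sketch below assumes a pattern's score is the sum, over instance pairs, of the product of the conceptualization weights; that scoring rule, the weights, and the name `score_concept_pairs` are illustrative assumptions rather than the method from the paper.

```python
from collections import defaultdict
from itertools import product

def score_concept_pairs(instance_pairs, concepts_of):
    """Accumulate a score for every (head concept, modifier concept) pair."""
    scores = defaultdict(float)
    for head, modifier in instance_pairs:
        candidates = product(concepts_of[head].items(),
                             concepts_of[modifier].items())
        for (head_concept, hw), (mod_concept, mw) in candidates:
            scores[(head_concept, mod_concept)] += hw * mw
    return dict(scores)

# Toy conceptualization weights (assumed values).
concepts_of = {
    "CEO": {"CEO": 1.0},  # mapped to itself, as in strategy (4)
    "apple": {"fruit": 0.5, "company": 0.5},
    "microsoft": {"company": 1.0},
    "google": {"company": 1.0},
}
instance_pairs = [("CEO", "apple"), ("CEO", "microsoft"), ("CEO", "google")]

scores = score_concept_pairs(instance_pairs, concepts_of)
```

Only the ambiguous "apple" contributes to (CEO, fruit), while every instance supports (CEO, company), so a score threshold can discard the wrong pattern.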

Head and Modifier Detection – Parsing
1. Text is parsed using Probase
   e.g. "New York and New York Times"
2. Remove non-constraint modifiers
3. Cluster terms
   - Cluster short texts that have more than one head (e.g. "apple ipad microsoft surface")
   - Reduce the number of pairs for conceptualization

Head and Modifier Detection for 2 components

Head and Modifier Detection for > 2 components
Modifiers can thus be ranked by their closeness to the head.
For the query "college football player", we remove the likely weakest edge, college → player.

Mining non-constraint modifiers
Examples: "Top query Seattle", "good travelling hostel"
Non-constraint modifiers: "top", "good"
Non-constraint modifiers are more likely to appear on the left of the query, e.g. "cheap red shoe" instead of "red cheap shoe"
Mining non-constraint modifiers using Probase (2.7 million concepts)

Mining non-constraint modifiers – mining process
1. Construct modifier networks based on observations
2. Calculate the score of each node as a non-constraint modifier in the networks
A lower PMS makes a node a non-constraint modifier.

Framework for head, modifier and constraint detection

On Masking Topical Intent in Keyword Search Peng Wang and Chinya V. Ravishankar

Keyword-Based Obfuscation
Hide the real query in a mass of dummy queries generated by a Dummy Query Generation Algorithm (DGA).
Advantage: purely client-based
Disadvantage: not secure; cannot ensure that real and dummy queries are indistinguishable

Topical Intent Obfuscation
For a real user query q, dummy queries are created that match other topics.
* Topic relevance → ensures obfuscation
Given two thresholds α and β (β < α), a topic t, and a query q:
- Pr[t]: t's relevance based on the general interest pattern
- Pr[t|q]: t's relevance after taking q into account
- Pr[t|q] - Pr[t] > α → t is relevant to q
Aim: Pr[t|q] - Pr[t] < β, to create irrelevant dummy queries
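The two-threshold test can be written as a small function. This is a sketch under assumptions: `classify_topic` is a hypothetical name, the probabilities are toy values, and the "undecided" label for the band between β and α is added here for illustration only.

```python
def classify_topic(prior, posterior, alpha, beta):
    """Classify topic t for query q from Pr[t] (prior) and Pr[t|q] (posterior)."""
    if not beta < alpha:
        raise ValueError("thresholds must satisfy beta < alpha")
    gain = posterior - prior  # Pr[t|q] - Pr[t]
    if gain > alpha:
        return "relevant"
    if gain < beta:
        return "irrelevant"
    return "undecided"  # between the two thresholds (label assumed here)

# Toy values: alpha = 0.2, beta = 0.05.
strong = classify_topic(prior=0.1, posterior=0.5, alpha=0.2, beta=0.05)
weak = classify_topic(prior=0.1, posterior=0.11, alpha=0.2, beta=0.05)
middle = classify_topic(prior=0.1, posterior=0.2, alpha=0.2, beta=0.05)
```

Under this reading, the aim stated above corresponds to the "irrelevant" outcome, while a topic the real query actually targets falls in "relevant".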