Published by Madeleine Hodges. Modified over 9 years ago.
1
Presentation on Keyword Search
Wong Cheuk Fun
2
Head, Modifier, and Constraint Detection in Short Texts
Zhongyuan Wang, Haixun Wang, Zhirui Hu
3
Example query: "popular iphone 5s smart cover"
- Modifiers: "popular", "iphone 5s"
- Constraint: "iphone 5s"
- Head: "smart cover"
4
90% of distinct queries consist of 2 or more components
5
Detection Challenges
- No grammar rules: "popular iphone 5s smart cover" vs "popular smart cover iphone 5s"
- Requires external knowledge: "job search" vs "job interview"
  - Instance-level head-modifier knowledge
  - Conceptual knowledge
  - Concept-level head-modifier knowledge
6
Detection Approach
Concept patterns take the form (concept [head], concept [modifier], score), e.g. (accessory [head], device [modifier], 0.9)
Three major challenges:
- The knowledge must have enough coverage to handle all possible input
- Avoid deriving conflicting patterns
- Distinguish constraints from non-constraint modifiers
7
Mining Concept Patterns – Probase
- IsA taxonomy
- Entities vs concepts: "Barack Obama" (entity) vs "USA president" (concept)
- 2.7 million concepts
- P(e|c) tells how popular e is as far as concept c is concerned, and P(c|e) vice versa, e.g. P(Fujitsu|Computer) > P(Acer|Computer)
- n(e,c) denotes how frequently e and c occur together
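The slide does not spell out how P(e|c) and P(c|e) are obtained from n(e,c). A minimal sketch, assuming the usual normalisation of co-occurrence counts (P(e|c) = n(e,c) divided by the total count for c, and symmetrically for P(c|e)); the counts and the function name `typicality` are invented for illustration:

```python
from collections import defaultdict

# Toy co-occurrence counts n(e, c); the numbers are made up for illustration.
counts = {
    ("apple", "company"): 60,
    ("apple", "fruit"): 40,
    ("microsoft", "company"): 100,
    ("banana", "fruit"): 60,
}


def typicality(counts):
    """Return (P(e|c), P(c|e)) tables, assuming plain count normalisation."""
    total_by_concept = defaultdict(int)
    total_by_entity = defaultdict(int)
    for (e, c), n in counts.items():
        total_by_concept[c] += n
        total_by_entity[e] += n
    p_e_given_c = {(e, c): n / total_by_concept[c] for (e, c), n in counts.items()}
    p_c_given_e = {(e, c): n / total_by_entity[e] for (e, c), n in counts.items()}
    return p_e_given_c, p_c_given_e


p_e_given_c, p_c_given_e = typicality(counts)
print(p_e_given_c[("apple", "company")])  # share of "company" mentions that are "apple"
print(p_c_given_e[("apple", "company")])  # share of "apple" mentions that mean "company"
```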
8
Mining Concept Patterns – Instance-level Head-Modifiers
Identify the head and modifiers no matter what their order is, e.g. "smart cover for iphone 5s"
Other prepositions: 'of', 'with', 'in', 'on', 'at'
When they are used (A for B, A of B, A with B), it is almost always true that A is the head and B is the constraint.
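The preposition rule can be sketched as a small pattern match. This is an illustration only, not the authors' extraction code; the function name `split_head_constraint` is hypothetical, and splitting at the first preposition is an assumption:

```python
import re

# Prepositions listed on the slide.
PREPOSITIONS = ("for", "of", "with", "in", "on", "at")

# "A <prep> B": lazily capture A so the split happens at the first preposition.
_PATTERN = re.compile(r"^(.+?)\s+(?:" + "|".join(PREPOSITIONS) + r")\s+(.+)$")


def split_head_constraint(query):
    """Return (head, constraint) for 'A prep B' queries, or None if no preposition."""
    match = _PATTERN.match(query.strip().lower())
    if match is None:
        return None
    head, constraint = match.groups()
    return head, constraint


print(split_head_constraint("smart cover for iphone 5s"))
print(split_head_constraint("popular smart cover iphone 5s"))
```

Queries without a preposition return None; those are the cases where the mined concept patterns are needed.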
9
Mining Concept Patterns – Concept-level Head-Modifiers
Levels of conceptualization for (head, modifier, score):
- (smart cover, iphone 5s) is too specific; (obj, obj) is too general
- Conflicting rules: (company, device) vs (device, company)
Conceptualizing instances:
1. Map e to c if P(c|e) is among the top k;
2. Map e to c if P(e|c) is among the top k;
3. Map e to c if P(c|e)*P(e|c) is among the top k;
4. Map e to itself if e is itself a concept.
The first two are not desirable, as they are either too general or too specific.
For (3), a larger value is evidence of closeness between c and e.
For (4), we use entropy to identify popular instances.
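Rule (3) above can be sketched as a top-k ranking by the product P(c|e)*P(e|c). The probability tables below are invented toy values, and `conceptualize` is a hypothetical name:

```python
# Toy typicality tables keyed by (entity, concept); values are made up.
P_C_GIVEN_E = {
    ("apple", "company"): 0.5,
    ("apple", "fruit"): 0.4,
    ("apple", "object"): 0.1,
}
P_E_GIVEN_C = {
    ("apple", "company"): 0.3,
    ("apple", "fruit"): 0.4,
    ("apple", "object"): 0.001,  # very general concept: any single entity is rare in it
}


def conceptualize(entity, p_c_given_e, p_e_given_c, k=2):
    """Return the top-k (concept, score) pairs, scored by P(c|e) * P(e|c)."""
    scores = {}
    for (e, c), p in p_c_given_e.items():
        if e == entity:
            scores[c] = p * p_e_given_c.get((e, c), 0.0)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


top_concepts = conceptualize("apple", P_C_GIVEN_E, P_E_GIVEN_C, k=2)
print([concept for concept, _ in top_concepts])
```

With these toy numbers the overly general concept "object" drops out, because the product penalises concepts for which the entity is atypical.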
10
Mining Concept Patterns – Conceptualizing Pairs
The term "apple" conceptualizes to "fruit" or "company".
"CEO for apple" yields (CEO, fruit) and (CEO, company).
Obviously, (CEO, fruit) is wrong. Wrong concept pairs that are introduced will be filtered out due to their low score.
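The slide does not give the scoring formula, so the following is only a sketch of the filtering idea under a simple assumption: a concept pair is scored by the fraction of instance pairs that support it, and pairs below a threshold are dropped. The data, the threshold, and the name `score_concept_pairs` are all hypothetical:

```python
from collections import Counter
from itertools import product

# Toy (head, modifier) instance pairs and toy conceptualizations; invented data.
INSTANCE_PAIRS = [("ceo", "apple"), ("ceo", "microsoft"), ("ceo", "google")]
CONCEPTS = {
    "ceo": ["ceo"],  # maps to itself, since it is already a concept
    "apple": ["company", "fruit"],
    "microsoft": ["company"],
    "google": ["company"],
}


def score_concept_pairs(instance_pairs, concepts):
    """Score each concept pair by the share of instance pairs supporting it (assumed)."""
    support = Counter()
    for head, modifier in instance_pairs:
        for pair in product(concepts[head], concepts[modifier]):
            support[pair] += 1
    return {pair: n / len(instance_pairs) for pair, n in support.items()}


scores = score_concept_pairs(INSTANCE_PAIRS, CONCEPTS)
kept = {pair for pair, score in scores.items() if score >= 0.5}
print(kept)
```

Here (ceo, fruit) is supported by only one ambiguous instance, so it scores low and is filtered out, while (ceo, company) is supported by every instance pair.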
11
Head and Modifier Detection – Parsing
1. Text is parsed using Probase (e.g. "New York and New York Times")
2. Remove non-constraint modifiers
3. Cluster terms
   - Cluster short texts having more than one head (e.g. "apple ipad microsoft surface")
   - Reduce the number of pairs for conceptualization
12
Head and Modifier Detection for 2 components
13
Head and Modifier Detection for > 2 components
Modifiers can thus be ranked by their closeness to the head.
For the query "college football player", we remove the likely weakest edge, "college player".
14
Mining non-constraint modifiers
- Examples: "Top query Seattle", "good travelling hostel"; the non-constraint modifiers are "top" and "good"
- Non-constraint modifiers are more likely to appear on the left of the query, e.g. "cheap red shoe" instead of "red cheap shoe"
- Non-constraint modifiers are mined using Probase (2.7 million concepts)
15
Mining non-constraint modifiers – mining process
1. Construct modifier networks based on observations
2. Calculate the score of each node as a non-constraint modifier in the networks
A lower PMS makes a node a non-constraint modifier.
16
Framework for head, modifier and constraint detection
17
On Masking Topical Intent in Keyword Search
Peng Wang and Chinya V. Ravishankar
19
Keyword-Based Obfuscation
Hide the real query in a mass of dummy queries generated using a Dummy Query Generation Algorithm (DGA).
Advantage: purely client-based
Disadvantage: not secure; cannot ensure that real and dummy queries are indistinguishable
21
Topical Intent Obfuscation
For a real user query q, dummy queries are created matching other topics.
*Topic relevance ensures obfuscation
Given two thresholds α and β (β < α), a topic t, and a query q:
- Pr[t]: t's relevance based on the general interest pattern
- Pr[t|q]: t's relevance after taking q into account
- If Pr[t|q] - Pr[t] > α, t is relevant to q.
Aim: Pr[t|q] - Pr[t] < β, to create irrelevant dummy queries
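The two-threshold test can be sketched directly from the definitions above. The topic probabilities and threshold values are invented, the function name `classify_topic` is hypothetical, and the "undetermined" label for the band between β and α is an assumption, since the slide does not name that case:

```python
def classify_topic(prior, posterior, alpha, beta):
    """Classify a topic by the shift Pr[t|q] - Pr[t] against thresholds alpha > beta."""
    if not beta < alpha:
        raise ValueError("expected beta < alpha")
    shift = posterior - prior
    if shift > alpha:
        return "relevant"
    if shift < beta:
        return "irrelevant"  # candidate topic for dummy queries
    return "undetermined"  # assumed label for the band between beta and alpha


# Toy (Pr[t], Pr[t|q]) values per topic; made up for illustration.
topics = {
    "health": (0.10, 0.60),
    "sports": (0.20, 0.15),
    "finance": (0.10, 0.15),
}
labels = {
    name: classify_topic(prior, posterior, alpha=0.1, beta=0.01)
    for name, (prior, posterior) in topics.items()
}
print(labels)
```

Dummy queries would then be drawn from topics labelled "irrelevant", so that they do not reinforce the topic of the real query.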