Wong Cheuk Fun Presentation on Keyword Search. Head, Modifier, and Constraint Detection in Short Texts Zhongyuan Wang, Haixun Wang, Zhirui Hu.


1 Wong Cheuk Fun Presentation on Keyword Search

2 Head, Modifier, and Constraint Detection in Short Texts Zhongyuan Wang, Haixun Wang, Zhirui Hu

3 Example query: “popular iphone 5s smart cover”. Here “popular” is a non-constraint modifier, “iphone 5s” is the constraint, and “smart cover” is the head.

4 90% of distinct queries consist of 2 or more components

5 Detection Challenges
- No grammar rules: “popular iphone 5s smart cover” vs “popular smart cover iphone 5s”
- External knowledge is required: “job search” vs “job interview”
  - Instance-level head-modifier knowledge
  - Conceptual knowledge
  - Concept-level head-modifier knowledge

6 Detection Approach
Concept patterns take the form (concept [head], concept [modifier], score), e.g. (accessory [head], device [modifier], 0.9)
Three major challenges:
- Ensuring the knowledge has enough coverage to handle all possible input
- Avoiding the derivation of conflicting patterns
- Distinguishing constraints from non-constraint modifiers

7 Mining Concept Patterns – Probase
- IsA taxonomy of entities and concepts, e.g. the entity “Barack Obama” vs the concept “USA president”
- 2.7 million concepts
- P(e|c) tells how popular entity e is as far as concept c is concerned, and P(c|e) tells the reverse, e.g. P(Fujitsu|Computer) > P(Acer|Computer)
- n(e,c) denotes how frequently e and c occur together
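The slide does not state how P(e|c) is obtained from n(e,c). A minimal sketch, assuming the probabilities are estimated by normalising the co-occurrence counts (P(e|c) = n(e,c) / Σ n(e',c), and likewise for P(c|e)); the `counts` table and the function name `typicality` are hypothetical and the numbers are made up:

```python
from collections import defaultdict

# Hypothetical co-occurrence counts n(e, c); the numbers are made up for illustration.
counts = {
    ("iphone 5s", "device"): 80,
    ("ipad", "device"): 120,
    ("iphone 5s", "smartphone"): 60,
    ("apple", "company"): 150,
    ("apple", "fruit"): 50,
}

def typicality(counts):
    """Estimate P(e|c) and P(c|e) by normalising n(e, c) (an assumed estimator)."""
    per_concept = defaultdict(int)  # total count of each concept
    per_entity = defaultdict(int)   # total count of each entity
    for (e, c), n in counts.items():
        per_concept[c] += n
        per_entity[e] += n
    p_e_given_c = {(e, c): n / per_concept[c] for (e, c), n in counts.items()}
    p_c_given_e = {(e, c): n / per_entity[e] for (e, c), n in counts.items()}
    return p_e_given_c, p_c_given_e

p_e_given_c, p_c_given_e = typicality(counts)
```

With these made-up counts, "apple" leans towards "company" rather than "fruit" because most of its co-occurrences fall under that concept.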

8 Mining Concept Patterns – Instance-level Head-Modifiers
Identify heads and modifiers regardless of their order, using prepositional patterns such as “smart cover for iphone 5s”
Other prepositions: ‘of’, ‘with’, ‘in’, ‘on’, ‘at’
When they are used (A for B, A of B, A with B), it is almost always true that A is the head and B is the constraint.
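The "A for B" rule can be illustrated with a small pattern matcher. This is only a sketch: the preposition list comes from the slide, while the regular expression and the function name `extract_pair` are assumptions made for illustration.

```python
import re

# Prepositions listed on the slide; the regex itself is an illustrative assumption.
PREPOSITIONS = ("for", "of", "with", "in", "on", "at")
PATTERN = re.compile(
    r"^(?P<head>.+?)\s+(?:%s)\s+(?P<constraint>.+)$" % "|".join(PREPOSITIONS)
)

def extract_pair(query):
    """Return (head, constraint) for an 'A <prep> B' query, or None if no preposition is found."""
    match = PATTERN.match(query.strip().lower())
    if match is None:
        return None
    return match.group("head"), match.group("constraint")
```

For “smart cover for iphone 5s” this yields the head “smart cover” and the constraint “iphone 5s”; a query without a preposition, such as “iphone 5s smart cover”, yields nothing and has to be handled with the concept patterns instead.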

9 Mining Concept Patterns – Concept-level Head-Modifiers
Levels of conceptualization for (head, modifier, score):
- (smart cover, iphone 5s) is too specific; (obj, obj) is too general
- Conflicting rules: (company, device) vs (device, company)
Conceptualizing instances:
1. Map e to c if P(c|e) is among the top k;
2. Map e to c if P(e|c) is among the top k;
3. Map e to c if P(c|e)*P(e|c) is among the top k;
4. Map e to itself if e is itself a concept
The first two are not desirable, as they are either too general or too specific.
For (3), a larger value is evidence of the closeness between c and e.
For (4), we use entropy to identify popular instances.
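Option 3 of the conceptualization rules, ranking concepts by P(c|e)*P(e|c), can be sketched as follows. The probability tables and the function name `top_k_concepts` are hypothetical, and the values are made up for the entity “iphone 5s”:

```python
# Hypothetical typicality scores for the entity "iphone 5s"; values are made up.
p_c_given_e = {"device": 0.5, "smartphone": 0.3, "product": 0.2}
p_e_given_c = {"device": 0.05, "smartphone": 0.4, "product": 0.001}

def top_k_concepts(p_c_given_e, p_e_given_c, k):
    """Rank the candidate concepts of one entity by P(c|e) * P(e|c) and keep the top k."""
    scored = {c: p_c_given_e[c] * p_e_given_c.get(c, 0.0) for c in p_c_given_e}
    return sorted(scored, key=scored.get, reverse=True)[:k]

best = top_k_concepts(p_c_given_e, p_e_given_c, k=2)
```

In this toy data the very general concept “product” scores lowest because P(e|c) is tiny, which matches the slide's point that the product balances concepts that are too general against those that are too specific.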

10 Mining Concept Patterns – Conceptualizing Pairs
The term “apple” conceptualizes to “fruit” or “company”
“CEO for apple” → (CEO, fruit), (CEO, company)
Obviously, (CEO, fruit) is wrong. Wrongly introduced concept pairs will be filtered out because of their low scores.
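The slide does not say how concept-pair scores are computed. A minimal sketch, assuming each instance pair contributes the product of its two conceptualization weights and that pairs below a threshold are discarded; the data, the threshold, and the names `concept_pattern_scores` and `keep_strong` are all hypothetical:

```python
from collections import defaultdict
from itertools import product

# Hypothetical conceptualizations: term -> {concept: weight}; all values are made up.
concepts = {
    "ceo": {"position": 1.0},
    "founder": {"position": 1.0},
    "apple": {"company": 0.7, "fruit": 0.3},
    "microsoft": {"company": 1.0},
    "google": {"company": 1.0},
}
# Instance-level (head, constraint) pairs, e.g. mined from "A for B" queries.
instance_pairs = [("ceo", "apple"), ("ceo", "microsoft"), ("founder", "google")]

def concept_pattern_scores(instance_pairs, concepts):
    """Sum weight(head concept) * weight(modifier concept) over all instance pairs."""
    scores = defaultdict(float)
    for head, modifier in instance_pairs:
        pairs = product(concepts[head].items(), concepts[modifier].items())
        for (hc, hw), (mc, mw) in pairs:
            scores[(hc, mc)] += hw * mw
    return dict(scores)

def keep_strong(scores, threshold):
    """Drop concept pairs whose aggregated score falls below the (assumed) threshold."""
    return {pair: s for pair, s in scores.items() if s >= threshold}

scores = concept_pattern_scores(instance_pairs, concepts)
patterns = keep_strong(scores, threshold=1.0)
```

Because only “apple” supports (position, fruit) while every company name supports (position, company), the wrong pair accumulates little evidence and is dropped.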

11 Head and Modifier Detection – Parsing
1. Text is parsed using Probase, e.g. “New York and New York Times”
2. Remove non-constraint modifiers
3. Cluster terms
- Cluster short texts that have more than one head (e.g. apple ipad microsoft surface)
- Reduce the number of pairs for conceptualization

12 Head and Modifier Detection for 2 components

13 Head and Modifier Detection for > 2 components
Modifiers can thus be ranked by their closeness to the head.
For the query “college football player”, we remove the likely weakest edge, college → player.
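Removing the weakest edge can be sketched as below. The slide gives no scores, so the edge weights are made up, and the function name `drop_weakest_edge` is hypothetical; the sketch only shows the selection step, not how the scores are derived.

```python
# Hypothetical head-modifier scores for directed edges (modifier, head); values are made up.
edge_scores = {
    ("college", "football"): 0.8,
    ("football", "player"): 0.9,
    ("college", "player"): 0.3,
}

def drop_weakest_edge(edge_scores):
    """Return a copy of the edges without the lowest-scoring one, plus that edge."""
    weakest = min(edge_scores, key=edge_scores.get)
    remaining = {edge: s for edge, s in edge_scores.items() if edge != weakest}
    return remaining, weakest

remaining, weakest = drop_weakest_edge(edge_scores)
```

With these weights the edge college → player is removed, leaving the chain college → football → player.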

14 Mining non-constraint modifiers
Examples: “Top query Seattle”, “good travelling hostel”
Non-constraint modifiers: top, good
Non-constraint modifiers are more likely to appear on the left of the query, e.g. “cheap red shoe” rather than “red cheap shoe”
Mining non-constraint modifiers using Probase → 2.7 million concepts

15 Mining non-constraint modifiers – mining process
1. Construct modifier networks based on observations
2. Calculate the score of each node in the networks as a non-constraint modifier
A lower PMS makes the node a non-constraint modifier.

16 Framework for head, modifier and constraint detection

17 On Masking Topical Intent in Keyword Search Peng Wang and Chinya V. Ravishankar

18

19 Keyword-Based Obfuscation
Hide the real query in a mass of dummy queries generated by a Dummy Query Generation Algorithm (DGA).
Advantage: purely client-based
Disadvantage: not secure, since it cannot ensure that real and dummy queries are indistinguishable

20

21 Topical Intent Obfuscation
For a real user query q, dummy queries are created that match other topics.
Topic relevance → ensures obfuscation
Given two thresholds α and β (β < α), a topic t, and a query q:
- Pr[t]: t’s relevance based on the general interest pattern
- Pr[t|q]: t’s relevance after taking q into account
- Pr[t|q] - Pr[t] > α → t is relevant to q
Aim: Pr[t|q] - Pr[t] < β, to create irrelevant dummy queries
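The two-threshold test can be written as a small function. This is a sketch of the rule as stated on the slide; the function name `topic_relevance`, the label strings, and the "undecided" outcome for gains between β and α are assumptions, since the slide does not name that case.

```python
def topic_relevance(prior, posterior, alpha, beta):
    """Label a topic from Pr[t] (prior) and Pr[t|q] (posterior) using thresholds beta < alpha."""
    if not beta < alpha:
        raise ValueError("expected beta < alpha")
    gain = posterior - prior  # Pr[t|q] - Pr[t]
    if gain > alpha:
        return "relevant"     # the query raises belief in t by more than alpha
    if gain < beta:
        return "irrelevant"   # the target region for dummy-query topics
    return "undecided"        # assumed label for the gap between beta and alpha
```

For example, with α = 0.3 and β = 0.05, a topic whose probability rises from 0.1 to 0.6 after seeing q counts as relevant, while one that moves from 0.2 to 0.21 counts as irrelevant.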

