Wong Cheuk Fun Presentation on Keyword Search. Head, Modifier, and Constraint Detection in Short Texts Zhongyuan Wang, Haixun Wang, Zhirui Hu.

Wong Cheuk Fun Presentation on Keyword Search

Head, Modifier, and Constraint Detection in Short Texts Zhongyuan Wang, Haixun Wang, Zhirui Hu

Example query: "popular iphone 5s smart cover"
- popular: modifier
- iphone 5s: modifier (constraint)
- smart cover: head

90% of distinct queries consist of 2 or more components

Detection Challenges
No grammar rules: "popular iphone 5s smart cover" vs "popular smart cover iphone 5s"
Requires external knowledge: "job search" vs "job interview"
- Instance-level head-modifier knowledge
- Conceptual knowledge
- Concept-level head-modifier knowledge

Detection Approach
Concept patterns: (concept [head], concept [modifier], score), e.g. (accessory [head], device [modifier], 0.9)
Three major challenges:
- Knowledge coverage broad enough to handle all possible input
- Avoiding the derivation of conflicting patterns
- Distinguishing constraints from non-constraint modifiers

Mining Concept Patterns – Probase
IsA taxonomy
- Entities vs. concepts: "Barack Obama" (entity) vs. "USA president" (concept)
- 2.7 million concepts
- P(e|c) tells how popular e is as an instance of concept c, and P(c|e) the reverse, e.g. P(Fujitsu|Computer) > P(Acer|Computer)
- n(e,c) denotes how frequently e and c occur together
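
The slide does not spell out how P(e|c) and P(c|e) are obtained from n(e,c). A minimal sketch, assuming the plain relative-frequency estimate (n(e,c) divided by the total count for that concept or that entity); the counts below are invented for illustration and are not Probase data:

```python
from collections import defaultdict

def typicality(counts):
    """counts maps (entity, concept) -> co-occurrence frequency n(e, c).

    Returns two dicts keyed by (entity, concept): P(e|c) and P(c|e),
    both estimated as simple relative frequencies (an assumption).
    """
    concept_totals = defaultdict(int)
    entity_totals = defaultdict(int)
    for (e, c), n in counts.items():
        concept_totals[c] += n
        entity_totals[e] += n
    p_e_given_c = {(e, c): n / concept_totals[c] for (e, c), n in counts.items()}
    p_c_given_e = {(e, c): n / entity_totals[e] for (e, c), n in counts.items()}
    return p_e_given_c, p_c_given_e

# Hypothetical counts, for illustration only.
counts = {
    ("apple", "company"): 60,
    ("apple", "fruit"): 40,
    ("microsoft", "company"): 140,
    ("banana", "fruit"): 60,
}
p_e_c, p_c_e = typicality(counts)
print(p_e_c[("apple", "company")], p_c_e[("apple", "company")])
```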

Mining Concept Patterns – Instance-level Head-Modifiers
Identify the head and modifiers regardless of their order, e.g. from "smart cover for iphone 5s"
Other prepositions:
- 'of', 'with', 'in', 'on', 'at'
- When they are used (A for B, A of B, A with B), it is almost always true that A is the head and B is the constraint.
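
The preposition rule above can be sketched as a simple pattern match. This is an illustration only: the function name `extract_pair` and the choice to split at the first preposition are assumptions, not the authors' implementation.

```python
import re

# Prepositions listed on the slide.
PREPOSITIONS = ("for", "of", "with", "in", "on", "at")

# "A <prep> B": A is taken as the head, B as the constraint.
# The head is matched non-greedily, so the split happens at the first preposition.
_PATTERN = re.compile(
    r"^(?P<head>.+?)\s+(?:%s)\s+(?P<constraint>.+)$" % "|".join(PREPOSITIONS)
)

def extract_pair(query):
    """Return (head, constraint) if the query has the form 'A <prep> B', else None."""
    match = _PATTERN.match(query.strip().lower())
    if match is None:
        return None
    return match.group("head"), match.group("constraint")

pair = extract_pair("smart cover for iphone 5s")
no_pair = extract_pair("iphone 5s smart cover")
print(pair, no_pair)
```

Queries without a preposition yield no instance-level pair this way; those are the cases the concept-level patterns are meant to cover.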

Mining Concept Patterns – Concept-level Head-Modifiers
Levels of conceptualization for (head, modifier, score):
- (smart cover, iphone 5s) is too specific; (obj, obj) is too general
- Conflicting rules: (company, device) vs (device, company)
Conceptualizing instances:
1. Map e to c if P(c|e) is among the top k;
2. Map e to c if P(e|c) is among the top k;
3. Map e to c if P(c|e)*P(e|c) is among the top k;
4. Map e to itself if e is itself a concept.
The first two are not desirable, as they are either too general or too specific.
For (3), a larger value is evidence of closeness between c and e.
For (4), we use entropy to identify popular instances.
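
Strategy (3) can be sketched as a top-k selection over the product P(c|e)*P(e|c). The function name, the probability values, and the way rule (4) is applied as a shortcut are assumptions made for illustration; the entropy test for (4) is not reproduced here.

```python
def conceptualize(entity, p_c_given_e, p_e_given_c, known_concepts, k=2):
    """Map an entity to concepts.

    Rule (4): an entity that is itself a concept maps to itself (shortcut assumed here).
    Rule (3): otherwise keep the k concepts with the largest P(c|e) * P(e|c).
    Both probability tables are dicts keyed by (entity, concept).
    """
    if entity in known_concepts:
        return [entity]
    scored = []
    for (e, c), pce in p_c_given_e.items():
        if e == entity:
            scored.append((pce * p_e_given_c.get((e, c), 0.0), c))
    scored.sort(reverse=True)
    return [c for _, c in scored[:k]]

# Hypothetical probabilities, for illustration only.
p_c_given_e = {
    ("iphone 5s", "device"): 0.5,
    ("iphone 5s", "product"): 0.4,
    ("iphone 5s", "object"): 0.1,
}
p_e_given_c = {
    ("iphone 5s", "device"): 0.02,
    ("iphone 5s", "product"): 0.001,
    ("iphone 5s", "object"): 0.00001,
}
top = conceptualize("iphone 5s", p_c_given_e, p_e_given_c, known_concepts=set(), k=2)
print(top)
```

With these made-up numbers the very general concept "object" drops out, which is the effect the product score is meant to have.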

Mining Concept Patterns – Conceptualizing Pairs
The term "apple" conceptualizes to "fruit" or "company", so "CEO for apple" -> (CEO, fruit), (CEO, company).
(CEO, fruit) is obviously wrong; such wrong concept pairs are filtered out because of their low scores.
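
Why wrong pairs end up with low scores can be illustrated by aggregating over many instance pairs. The slides do not give the scoring formula; summing the product of concept weights, as below, is an assumption, and all weights are invented.

```python
from collections import defaultdict
from itertools import product

def mine_concept_patterns(instance_pairs, concepts_of):
    """Aggregate (head concept, modifier concept) scores over instance pairs.

    instance_pairs: iterable of (head, modifier) instances.
    concepts_of: dict mapping an instance to {concept: weight}.
    Scoring (sum of weight products) is an illustrative assumption.
    """
    scores = defaultdict(float)
    for head, modifier in instance_pairs:
        head_concepts = concepts_of.get(head, {})
        modifier_concepts = concepts_of.get(modifier, {})
        for (hc, hw), (mc, mw) in product(head_concepts.items(), modifier_concepts.items()):
            scores[(hc, mc)] += hw * mw
    return dict(scores)

# Hypothetical instance pairs and concept weights.
instance_pairs = [("ceo", "apple"), ("ceo", "microsoft"), ("ceo", "google")]
concepts_of = {
    "ceo": {"ceo": 1.0},
    "apple": {"company": 0.6, "fruit": 0.4},
    "microsoft": {"company": 1.0},
    "google": {"company": 1.0},
}
patterns = mine_concept_patterns(instance_pairs, concepts_of)
print(patterns)
```

Only "apple" contributes to (ceo, fruit), while every instance supports (ceo, company), so a score threshold would discard the former.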

Head and Modifier Detection – Parsing
1. Text is parsed using Probase (e.g. "New York and New York Times")
2. Remove non-constraint modifiers
3. Cluster terms
- Handles short texts having more than one head (e.g. apple ipad microsoft surface)
- Reduces the number of pairs for conceptualization

Head and Modifier Detection for 2 components

Head and Modifier Detection for > 2 components
Modifiers can thus be ranked by their closeness to the head.
For the query "college football player", we remove the likely weakest edge, college -> player.
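
A rough sketch of the weakest-edge step, under stated assumptions: edges are (modifier, modified term) pairs with invented scores, and "closeness to the head" is taken to mean hop count along the remaining edges. The slides do not confirm either choice.

```python
def drop_weakest_edge(edge_scores):
    """Remove the lowest-scoring edge; return (removed edge, remaining edges)."""
    weakest = min(edge_scores, key=edge_scores.get)
    kept = {edge: score for edge, score in edge_scores.items() if edge != weakest}
    return weakest, kept

def modifier_distances(head, edges):
    """Hop count from each modifier to the head, following (modifier, modified) edges."""
    dist = {head: 0}
    frontier = [head]
    while frontier:
        next_frontier = []
        for target in frontier:
            for modifier, modified in edges:
                if modified == target and modifier not in dist:
                    dist[modifier] = dist[target] + 1
                    next_frontier.append(modifier)
        frontier = next_frontier
    del dist[head]
    return dist

# Hypothetical edge scores for "college football player".
edge_scores = {
    ("college", "football"): 0.8,
    ("football", "player"): 0.9,
    ("college", "player"): 0.3,
}
removed, kept = drop_weakest_edge(edge_scores)
ranking = modifier_distances("player", kept)
print(removed, ranking)
```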

Mining non-constraint modifiers
Examples: "Top query Seattle", "good travelling hostel"
Non-constraint modifiers: Top, good
Non-constraint modifiers are more likely to appear on the left of the query, e.g. "cheap red shoe" rather than "red cheap shoe"
Mining non-constraint modifiers using Probase: 2.7 million concepts

Mining non-constraint modifiers – mining process
1. Construct modifier networks based on observations
2. Calculate the score of each node as a non-constraint modifier in the networks
A lower PMS marks a node as a non-constraint modifier.

Framework for head, modifier and constraint detection

On Masking Topical Intent in Keyword Search Peng Wang and Chinya V. Ravishankar

Keyword-Based Obfuscation
Hide the real query in a mass of dummy queries generated using a Dummy Query Generation Algorithm (DGA).
Advantage: purely client-based
Disadvantage: not secure; cannot ensure that real and dummy queries are indistinguishable

Topical Intent Obfuscation
For a real user query q, dummy queries are created matching other topics.
* Topic relevance -> ensures obfuscation
Under two thresholds α, β (β < α), with topic t and query q:
- Pr[t]: t's relevance based on the general interest pattern
- Pr[t|q]: t's relevance after taking q into account
- Pr[t|q] - Pr[t] > α -> t is relevant to q
Aim: Pr[t|q] - Pr[t] < β, to create irrelevant dummy queries
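
The two-threshold test can be written down directly. In this sketch the function name and the "undetermined" label for shifts between β and α are hypothetical; the probability values are invented.

```python
def classify_topic(prior, posterior, alpha, beta):
    """Classify topic t for query q from Pr[t] (prior) and Pr[t|q] (posterior).

    shift > alpha -> "relevant"
    shift < beta  -> "irrelevant"
    otherwise     -> "undetermined" (label assumed for the in-between band)
    """
    if not beta < alpha:
        raise ValueError("thresholds must satisfy beta < alpha")
    shift = posterior - prior
    if shift > alpha:
        return "relevant"
    if shift < beta:
        return "irrelevant"
    return "undetermined"

# Hypothetical values: the query moves belief in the topic from 0.05 to 0.40.
label_real = classify_topic(prior=0.05, posterior=0.40, alpha=0.2, beta=0.05)
label_dummy = classify_topic(prior=0.10, posterior=0.11, alpha=0.2, beta=0.05)
print(label_real, label_dummy)
```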
