1 Learning to Rank from Distant Supervision: Exploiting Noisy Redundancy for Relational Entity Search
Mianwei Zhou, Hongning Wang, Kevin Chen-Chuan Chang
University of Illinois at Urbana-Champaign

2 Limitation of Traditional Entity Search
Entity Search: in most cases, what we want are not pages but entities (Cheng 2007).
Question: Thalmic Lab founded by ?
Entity search query: Thalmic Lab founded by #person
Limitations:
1. Fails to model the relation implied by the query.
2. Makes it difficult for users to enumerate the different representations of a relation.

3 Our Task: Relational Entity Search
Relational Entity Search: relation-specific ranking functions.
Relational Entity Searcher: Founder-Of Ranker, Grad-From Ranker, …
Input: FounderOf(“Thalmic Lab”)
Output: 1. Stephen Lake 2. Aaron Grant 3. …
Input: GraduateFrom(“Barack Obama”)
Output: 1. Columbia Univ. 2. Harvard Law School
Advantages:
1. Train rankers from relational data to capture the relation semantics.
2. Relieve users of the burden of specifying keywords.

4 Proposal: Relational Entity Search Framework

5 Relational Entity Search Framework
Components: Relational Entity Searcher (Founder-Of Ranker, …); Entity-aware Searcher with Keyword Indexes and Entity Indexes (#person, #location, #color, #medicine, …).
Snippets:
s1: Microsoft was founded by Bill Gates
s2: Steven Ballmer is CEO of Microsoft
Entity index entries for #person: (“bill gates”, d1, 5), (“Steven Ballmer”, d2, 1), …
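
A minimal sketch of how entity index entries like the ones on this slide could be built. It assumes each entry is (entity, document id, 1-based token position), which is consistent with the two example entries but is not stated on the slide. The gazetteer and the function name `build_entity_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical gazetteer: entity type -> known surface forms (lowercased).
GAZETTEER = {"#person": ["bill gates", "steven ballmer"]}

def build_entity_index(snippets, gazetteer=GAZETTEER):
    """Map each entity type to postings (entity, doc_id, 1-based token position)."""
    index = defaultdict(list)
    for doc_id, text in snippets.items():
        tokens = text.lower().split()
        for entity_type, names in gazetteer.items():
            for name in names:
                name_tokens = name.split()
                n = len(name_tokens)
                # Slide a window over the tokens and record every exact match.
                for i in range(len(tokens) - n + 1):
                    if tokens[i:i + n] == name_tokens:
                        index[entity_type].append((name, doc_id, i + 1))
    return dict(index)

snippets = {
    "d1": "Microsoft was founded by Bill Gates",
    "d2": "Steven Ballmer is CEO of Microsoft",
}
entity_index = build_entity_index(snippets)
```

A real entity-aware searcher would use an entity tagger rather than exact string matching; the lookup table only keeps the sketch self-contained.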

6 Relational Entity Search Framework
Relational Query: FounderOf(“Thalmic Lab”)
Translated Query: Thalmic Lab; Founded by, Founder, Started, …; #Person
Flow: the Relational Entity Searcher passes the translated query to the Entity-aware Searcher (Keyword Indexes, Entity Index), and the Founder-Of Ranker ranks the returned entity-snippet pairs.
Entity-Snippet pairs:
Stephen Lake: Stephen Lake, co-founder of Thalmic Lab
Daniel Debow: Daniel Debow investigates Thalmic Lab …
Result: 1. Stephen Lake 2. Aaron Grant …
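
The query translation step can be sketched as a lookup from relation name to keywords and target entity type. The fixed table below is an assumption made for illustration; the keywords are the ones shown on this slide, and `TRANSLATIONS` and `translate` are hypothetical names.

```python
# Hypothetical translation table: relation -> (keywords, target entity type).
TRANSLATIONS = {
    "FounderOf": (["founded by", "founder", "started"], "#person"),
}

def translate(relation, argument, table=TRANSLATIONS):
    """Turn a relational query into a keyword-plus-type query for the entity-aware searcher."""
    keywords, entity_type = table[relation]
    return {
        "argument": argument,          # the query entity, e.g. a company name
        "keywords": list(keywords),    # relation-specific keywords
        "entity_type": entity_type,    # type of the entities to return
    }

query = translate("FounderOf", "Thalmic Lab")
```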

7 Training Relational Entity Ranker Offline
FounderOf Relation:
Company     Founder
Microsoft   Bill Gates, Paul Allen
IBM         Thomas Watson
…           …
Entity-Snippet pairs from the Entity-aware Searcher, used to train the Founder-Of Ranker:
Bill Gates: Microsoft was founded by Bill Gates …
Paul Allen
Steven Ballmer: Steven Ballmer is CEO of Microsoft
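
A small sketch of how distant supervision could label the training candidates: an entity is positive when the relation table lists it for the query, and snippets receive no labels of their own. The function name `label_entities` is hypothetical; the table contents come from this slide.

```python
# FounderOf relation table from the slide: company -> set of founders.
FOUNDER_OF = {
    "Microsoft": {"Bill Gates", "Paul Allen"},
    "IBM": {"Thomas Watson"},
}

def label_entities(query, candidates, relation=FOUNDER_OF):
    """Label each candidate 1 if the relation table lists it for the query, else 0.

    Labels attach to entities only; individual snippets stay unlabeled.
    """
    positives = relation.get(query, set())
    return {entity: int(entity in positives) for entity in candidates}

labels = label_entities("Microsoft", ["Bill Gates", "Paul Allen", "Steven Ballmer"])
```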

8 Challenges on Accuracy and Efficiency

9 Challenges on Accuracy and Efficiency: Distantly Supervised Ranking
Challenge 1 (Accuracy): How to avoid the negative effect of noisy snippets of positive entities?
FounderOf relation: Company = Microsoft, Founder = Bill Gates, …
Entity e_1: “Bill Gates”
Snippets:
s_11: Microsoft was founded by Bill Gates and …
s_12: Bill Gates met Microsoft CEO at his home … (noise)
s_13: Bill Gates dropped out of college and started Microsoft.

10 Challenges on Accuracy and Efficiency: Distantly Supervised Ranking
Challenge 1 (Accuracy): How to avoid the negative effect of noisy snippets of positive entities?
Challenge 2 (Efficiency): How to limit the number of keyword features without sacrificing too much accuracy?
Queries sent to the Entity-aware Searcher: (“started”, #person), (“founded by”, #person), (“founder”, #person), …
Retrieving the snippets for each query requires expensive index checking.

11 Challenges on Accuracy and Efficiency: Distantly Supervised Ranking
Distantly Supervised Ranking:
Distantly supervised: only entity labels, no snippet labels.
Ranking: efficiency is required.
Challenge 1 (Accuracy): How to avoid the negative effect of noisy snippets of positive entities?
Challenge 2 (Efficiency): How to limit the number of keyword features without sacrificing too much accuracy?

12 Insight: Redundancy Ranking Principle

13 Learn indicative patterns based on redundancy (Challenge 1: Accuracy)
Microsoft => Bill Gates: Microsoft was founded by Bill Gates.
IBM => Thomas Watson: Founded by Thomas Watson, IBM is …
Facebook => Mark Zuckerberg: Facebook was founded by Mark Zuckerberg …
Pattern shared by these snippets: BeforeEntity[“founded by”]
Indicative pattern: a pattern that is indicative of the relation, e.g., founded by, started, created, …

14 Filter Noisy Snippets by Indicative Patterns (Challenge 1: Accuracy)
Entity e: Bill Gates; indicative patterns P.
Evidence snippet: a snippet that contains at least one indicative pattern.
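
A minimal sketch of the evidence-snippet filter. The slides do not define the pattern types, so the matching rules below are assumptions: BeforeEntity[phrase] is taken to mean the phrase occurs somewhere before the entity mention, and Around[phrase] that it occurs anywhere in the snippet. The names `matches` and `evidence_snippets` are hypothetical.

```python
def matches(pattern, snippet, entity):
    """Check one indicative pattern against a snippet.

    Assumed semantics (not given on the slides):
      ("BeforeEntity", phrase): phrase occurs somewhere before the entity mention.
      ("Around", phrase): phrase occurs anywhere in the snippet.
    """
    kind, phrase = pattern
    text, phrase, entity = snippet.lower(), phrase.lower(), entity.lower()
    if kind == "Around":
        return phrase in text
    if kind == "BeforeEntity":
        pos = text.find(entity)
        return pos >= 0 and phrase in text[:pos]
    raise ValueError(f"unknown pattern kind: {kind}")

def evidence_snippets(snippets, entity, patterns):
    """Keep the snippets that contain at least one indicative pattern."""
    return [s for s in snippets if any(matches(p, s, entity) for p in patterns)]

patterns = [("BeforeEntity", "founded by"), ("Around", "founder")]
snippets = [
    "Microsoft was founded by Bill Gates and …",
    "Bill Gates met Microsoft CEO at his home …",
]
kept = evidence_snippets(snippets, "Bill Gates", patterns)
```

With these two patterns the first snippet is kept as evidence and the second is dropped as noise.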

15 A small number of indicative patterns are sufficient (Challenge 2: Efficiency)
Indicative patterns: BeforeEntity[“started”], BeforeEntity[“founded by”], Around[“founder”], BeforeEntity[“created”], …
Snippets:
Microsoft => Bill Gates: Microsoft was founded by Bill Gates. Bill Gates created Microsoft …
IBM => Thomas Watson: The founder of IBM is Thomas …
Facebook => Mark Zuckerberg: Facebook was founded by Mark Zuckerberg. Mark Zuckerberg created Facebook in 2006

16 Redundancy Ranking Principle
[Figure: web redundancy for entity e: Bill Gates.]

17 Redundancy Ranking Principle
Entity e: Bill Gates
Snippet feature f(s): Query-Entity Distance, Query Frequency, BeforeEntity[“founded by”], BeforeEntity[“started”], …

18 Solution: Pattern-based Filter Network

19 Objective Function
[Equation: objective with a “subject to” constraint.]
For efficiency, the number of indicative patterns should be small.

20 Model Redundancy Ranking Principle: Pattern-based Filter Network (PFNet)
1. Noise Filtering Layer: filter noisy snippets by indicative patterns.
2. Evidence Aggregation Layer: aggregate contributions from evidence snippets.
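
The two layers can be illustrated with a toy scoring function. This is only a sketch of the filter-then-aggregate idea, not the PFNet factor design, which this transcript does not preserve: the additive aggregation, the logistic squashing, the weights, and the name `pfnet_like_score` are all assumptions.

```python
import math

def pfnet_like_score(snippet_features, pattern_ids, weights):
    """Two-stage score for one entity.

    snippet_features: one dict per snippet, feature name -> value.
    pattern_ids: names of the features treated as indicative patterns.
    weights: feature name -> weight (hypothetical; would be learned).
    """
    # Noise filtering layer: keep snippets that fire at least one indicative pattern.
    evidence = [f for f in snippet_features
                if any(f.get(p, 0) > 0 for p in pattern_ids)]
    # Evidence aggregation layer (assumed form): sum the linear scores of the
    # evidence snippets, then squash the total into (0, 1).
    total = sum(sum(weights.get(name, 0.0) * value for name, value in f.items())
                for f in evidence)
    return 1.0 / (1.0 + math.exp(-total)), len(evidence)

features = [
    {"BeforeEntity[founded by]": 1, "QueryFrequency": 1},
    {"QueryFrequency": 1},  # no indicative pattern fires, so it is filtered out
    {"BeforeEntity[started]": 1, "QueryFrequency": 1},
]
patterns = ["BeforeEntity[founded by]", "BeforeEntity[started]"]
weights = {"BeforeEntity[founded by]": 1.0, "BeforeEntity[started]": 0.5,
           "QueryFrequency": 0.25}
score, n_evidence = pfnet_like_score(features, patterns, weights)
```

Because only evidence snippets are summed, an entity supported by more evidence snippets gets a higher score, which is the redundancy intuition from the earlier slides.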

21 Noise Filtering Layer in PFNet …

22 Evidence Aggregation Layer in PFNet …

23 Likelihood for PFNet
[Equation: likelihood with a “subject to” constraint.]

24 Factor Design

25 Factor Design
Aggregate contributions from evidence snippets.

26 Optimization: Maximizing Likelihood by Greedily Adding Indicative Patterns
Given the current P, calculate the maximized likelihood by gradient ascent on w.
Indicative Patterns P: (empty)
Candidate Snippet Feature     Log Likelihood
BeforeEntity[“founded by”]    -100.20

27 Optimization: Maximizing Likelihood by Greedily Adding Indicative Patterns
Candidate Snippet Feature     Log Likelihood
BeforeEntity[“founded by”]    -100.20
Around[“microsoft”]           -5600.21
Around[“founder”]             -76.13
BeforeEntity[“started”]       -200.43
Indicative Patterns P after this step: Around[“founder”]

28 Optimization: Maximizing Likelihood by Greedily Adding Indicative Patterns
Indicative Patterns P before this step: Around[“founder”]
Candidate Snippet Feature     Log Likelihood
BeforeEntity[“founded by”]    -60.20
Around[“microsoft”]           -3450.21
BeforeEntity[“started”]       -103.43
Indicative Patterns P after this step: Around[“founder”], BeforeEntity[“founded by”]
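
The greedy procedure on these three slides can be sketched as forward selection. The objective here is a toy lookup of the log-likelihood values shown on the slides; the pairing of values to candidates is read off the flattened slide text and should be treated as an assumption. In the slides' setting the objective would instead refit w by gradient ascent for each candidate set. The name `greedy_select` is hypothetical.

```python
def greedy_select(candidates, score, max_patterns):
    """Greedy forward selection: repeatedly add the candidate that maximizes score(P)."""
    selected, best_so_far = [], float("-inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_patterns:
        # Score every one-pattern extension of the current set P.
        gains = {c: score(frozenset(selected + [c])) for c in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= best_so_far:
            break  # no candidate improves the objective
        selected.append(best)
        best_so_far = gains[best]
        remaining.remove(best)
    return selected

F, FB = "Around[founder]", "BeforeEntity[founded by]"
M, S = "Around[microsoft]", "BeforeEntity[started]"
# Toy objective: only covers the two rounds shown on the slides.
TOY_LL = {
    frozenset([FB]): -100.20, frozenset([M]): -5600.21,
    frozenset([F]): -76.13, frozenset([S]): -200.43,
    frozenset([F, FB]): -60.20, frozenset([F, M]): -3450.21,
    frozenset([F, S]): -103.43,
}
chosen = greedy_select([F, FB, M, S], TOY_LL.__getitem__, max_patterns=2)
```

Capping `max_patterns` keeps the pattern set small, which is what bounds the number of index lookups at query time.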

29 Experiment Setting
6 sets of different relations:
Dataset        Base Type       Query Num   Positive / Total Entity Num   Snippet Num
FounderOf      #person         371         473 / 20061                   1033507
PublisherOf    #organization   323         329 / 19166                   1488347
WriterOf       #person         669         993 / 46683                   2111565
PlaceOfBirth   #location       350         350 / 24348                   1376995
PlaceOfDeath   #location       350         350 / 23246                   1105738
GraduateFrom   #organization   228         228 / 8559                    97916

30 Experiment Setting
Baselines: EntityRank (Cheng 2007), Multi Instance Learning (MIL, Riedel 2010), SVMRank (Joachims 2003)
Baseline   Filter Noise   Redundancy
MIL        Yes            No
SVMRank    No             Yes

31 Ranking Performance on 6 Different Relations.

32
1. Relation-specific ranking function performs better.
2. It is important to leverage redundancy.
3. It is necessary to filter noisy snippets.

33 Larger Improvement on More Noisy Relations
Relation       Runner-up Baseline (NDCG@5)   PFNet (NDCG@5)   Improvement   Percentage of Noise
FounderOf      0.690648                      0.753557         9.11%         62.7%
PublisherOf    0.5462088                     0.675435         23.7%         72%
WriterOf       0.566443                      0.691154         22.0%         76.7%
PlaceOfBirth   0.586241                      0.670384         14.3%         76.7%
PlaceOfDeath   0.518737                      0.593311         14.4%         90.3%
GraduateFrom   0.752152                      0.799303         6.3%          49.4%
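
The Improvement column can be checked from the two NDCG@5 columns, assuming it is the relative gain of PFNet over the runner-up baseline. Under that assumption the recomputed values agree with the slide to within 0.1 percentage points. The name `relative_improvement` is hypothetical.

```python
# relation -> (runner-up NDCG@5, PFNet NDCG@5, improvement % reported on the slide)
RESULTS = {
    "FounderOf": (0.690648, 0.753557, 9.11),
    "PublisherOf": (0.5462088, 0.675435, 23.7),
    "WriterOf": (0.566443, 0.691154, 22.0),
    "PlaceOfBirth": (0.586241, 0.670384, 14.3),
    "PlaceOfDeath": (0.518737, 0.593311, 14.4),
    "GraduateFrom": (0.752152, 0.799303, 6.3),
}

def relative_improvement(baseline, model):
    """Relative gain of model over baseline, in percent."""
    return 100.0 * (model - baseline) / baseline

computed = {rel: relative_improvement(b, m) for rel, (b, m, _) in RESULTS.items()}
```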

34 Around 10 indicative patterns are sufficient

35
1. Higher redundancy can achieve better results.
2. Filtering noise is helpful for queries of different redundancy.

36
1. Performance increases with more training examples.
2. Around 90 training examples are sufficient for most relations.

37 Thanks. Q & A

