
Mianwei Zhou, Hongning Wang, Kevin Chen-Chuan Chang
University of Illinois at Urbana-Champaign
Learning to Rank from Distant Supervision: Exploiting Noisy Redundancy for Relational Entity Search

Limitation of Traditional Entity Search
Entity search: in most cases, what we want are not pages but entities (Cheng 2007). For example, to answer "Thalmic Lab founded by ?", an entity search engine takes the keyword query "Thalmic Lab founded by #person".
Limitations:
1. It fails to model the relation implied by the query.
2. It is difficult for users to enumerate the different representations of a relation (e.g., "founded by", "founder", "started").

Our Task: Relational Entity Search
Relational entity search uses relation-specific ranking functions. The relational entity searcher holds one ranker per relation (Founder-Of ranker, Grad-From ranker, …).
Input: FounderOf("Thalmic Lab"). Output: 1. Stephen Lake 2. Aaron Grant 3. …
Input: GraduateFrom("Barack Obama"). Output: 1. Columbia Univ. 2. Harvard Law School
Advantages:
1. Rankers are trained from relational data, so they capture the relation semantics.
2. Users are relieved of the burden of specifying keywords.

Proposal: Relational Entity Search Framework

Relational Entity Search Framework
The relational entity searcher (with relation rankers such as the Founder-Of ranker, …) runs on top of an entity-aware searcher, which maintains keyword indexes and entity indexes.
Entity indexes are kept per entity type (#person, #location, #color, #medicine, …). For example, the #person index holds postings such as ("bill gates", d1, 5) and ("Steven Ballmer", d2, 1), …
Example snippets:
s1: Microsoft was founded by Bill Gates
s2: Steven Ballmer is CEO of Microsoft
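The #person postings look like (entity, document, position) triples: in s1, "Bill Gates" starts at the fifth word. Below is a minimal sketch of building such postings. It assumes whitespace tokenization, 1-based word positions, that d1 and d2 are the documents behind s1 and s2, and that entity mentions are already tagged with their type by some upstream recognizer. `build_entity_index` and its input format are hypothetical, not the framework's actual indexer.

```python
from collections import defaultdict

def build_entity_index(docs, mentions):
    """Build per-type posting lists of (entity, doc_id, position).

    docs: doc_id -> text; mentions: doc_id -> [(entity_type, entity_name), ...].
    Positions are 1-based word offsets of the mention's first word (an assumption).
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        tokens = [t.lower() for t in text.split()]
        for etype, name in mentions.get(doc_id, []):
            span = name.lower().split()
            for i in range(len(tokens) - len(span) + 1):
                if tokens[i:i + len(span)] == span:
                    index[etype].append((name, doc_id, i + 1))
                    break  # this sketch indexes only the first mention per document
    return dict(index)

docs = {
    "d1": "Microsoft was founded by Bill Gates",
    "d2": "Steven Ballmer is CEO of Microsoft",
}
mentions = {"d1": [("#person", "bill gates")], "d2": [("#person", "Steven Ballmer")]}
index = build_entity_index(docs, mentions)
print(index["#person"])  # [('bill gates', 'd1', 5), ('Steven Ballmer', 'd2', 1)]
```

A real entity-aware searcher would also index every mention and pair these postings with keyword postings. This sketch shows only the posting shape.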

Relational Entity Search Framework
1. Relational query: FounderOf("Thalmic Lab").
2. The query is translated into a keyword query for the entity-aware searcher: Thalmic Lab, founded by, founder, started, …, #person.
3. Using its keyword and entity indexes, the entity-aware searcher returns entity-snippet pairs, e.g., Stephen Lake: "Stephen Lake, co-founder of Thalmic Lab"; Daniel Debow: "Daniel Debow investigates Thalmic Lab"; …
4. The Founder-Of ranker ranks the entities. Result: 1. Stephen Lake 2. Aaron Grant …
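The translation step can be sketched as follows. The sketch assumes the translated query is simply the query argument, followed by the relation's keywords, followed by the target entity type. `RelationalQuery`, `RELATION_CONFIG`, and `translate` are hypothetical names. The keyword list is hand-written here only for illustration; in the framework it would come from the trained ranker.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RelationalQuery:
    relation: str  # e.g. "FounderOf"
    argument: str  # e.g. "Thalmic Lab"

# Hypothetical per-relation configuration: keywords plus the target entity type.
RELATION_CONFIG = {
    "FounderOf": {"keywords": ["founded by", "founder", "started"], "type": "#person"},
}

def translate(query):
    """Turn a relational query into a keyword query for the entity-aware searcher."""
    cfg = RELATION_CONFIG[query.relation]
    return [query.argument, *cfg["keywords"], cfg["type"]]

print(", ".join(translate(RelationalQuery("FounderOf", "Thalmic Lab"))))
```

Each keyword in the translated query costs an index lookup, which is why the size of this list matters for efficiency.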

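Training the Relational Entity Ranker Offline
Training data for the FounderOf relation comes from a relational table (Company: Founder):
Microsoft: Bill Gates, Paul Allen
IBM: Thomas Watson
…
For each company, the entity-aware searcher retrieves entity-snippet pairs, e.g., Bill Gates: "Microsoft was founded by Bill Gates …"; Paul Allen: …; Steven Ballmer: "Steven Ballmer is CEO of Microsoft". Entities listed in the table (Bill Gates, Paul Allen) serve as positives and other retrieved entities (Steven Ballmer) as negatives for training the Founder-Of ranker.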
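Below is a minimal sketch of how the table supplies distant labels. It assumes a retrieved entity is positive exactly when the table lists it for the query company. The function name and data layout are hypothetical.

```python
FOUNDER_OF = {  # company -> founders, as in the slide's Company/Founder table
    "Microsoft": {"Bill Gates", "Paul Allen"},
    "IBM": {"Thomas Watson"},
}

def distant_labels(table, query, retrieved):
    """Label each retrieved (entity, snippet) pair with its entity's label.

    Only entities are labeled: every snippet inherits its entity's label,
    so a noisy snippet of a positive entity is still labeled positive.
    """
    positives = table.get(query, set())
    return [(entity, snippet, int(entity in positives)) for entity, snippet in retrieved]

retrieved = [
    ("Bill Gates", "Microsoft was founded by Bill Gates ..."),
    ("Bill Gates", "Bill Gates met Microsoft CEO at his home ..."),
    ("Steven Ballmer", "Steven Ballmer is CEO of Microsoft"),
]
labeled = distant_labels(FOUNDER_OF, "Microsoft", retrieved)
for row in labeled:
    print(row)
```

The second row shows the problem distant supervision creates: an irrelevant snippet receives a positive label only because its entity is a founder.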

Challenges on Accuracy and Efficiency

Challenges on Accuracy and Efficiency: Distantly Supervised Ranking
Challenge 1 (Accuracy): How can we avoid the negative effect of noisy snippets of positive entities?
Example: the CompanyFounder table lists Microsoft: Bill Gates, so e1 = "Bill Gates" is a positive entity with snippets:
s11: Microsoft was founded by Bill Gates and …
s12: Bill Gates met Microsoft CEO at his home … (noise)
s13: Bill Gates dropped out of college and started Microsoft.
Because only entities are labeled, the noisy snippet s12 is treated as positive evidence along with s11 and s13.

Challenge 2 (Efficiency): How can we limit the number of keyword features without sacrificing too much accuracy? Each keyword feature is sent to the entity-aware searcher as its own query ("started", #person; "founded by", #person; "founder", #person; …), and retrieving its snippets requires expensive index checking.

Together, these challenges define distantly supervised ranking:
Distantly supervised: only entity labels are available, with no snippet labels.
Ranking: efficiency is required.

Insight: Redundancy Ranking Principle

Learn indicative patterns based on redundancy (Challenge 1: Accuracy)
Indicative pattern: a pattern that is indicative of the relation, e.g., "founded by", "started", "created", …
Because the same pattern recurs across many positive pairs, redundancy reveals it:
Microsoft => Bill Gates: "Microsoft was founded by Bill Gates."
IBM => Thomas Watson: "Founded by Thomas Watson, IBM is …"
Facebook => Mark Zuckerberg: "Facebook was founded by Mark Zuckerberg …"
All three snippets share the pattern BeforeEntity["founded by"], i.e., "founded by" appears before the entity mention.
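Below is a minimal sketch of spotting a candidate pattern by redundancy. It assumes BeforeEntity candidates are the two words immediately preceding the entity mention. Each candidate is scored by how many distinct positive (query, entity) pairs support it. The slides do not define these details. All function names are hypothetical, and the Paul Allen snippet is an invented example.

```python
import re
from collections import Counter

def words(text):
    # lowercase alphanumeric tokens; punctuation is dropped
    return re.findall(r"[a-z0-9]+", text.lower())

def before_entity_bigram(snippet, entity):
    """Two words immediately preceding the first mention of `entity`, or None."""
    pos = snippet.lower().find(entity.lower())
    if pos < 0:
        return None
    prefix = words(snippet[:pos])
    return " ".join(prefix[-2:]) if len(prefix) >= 2 else None

def redundant_patterns(examples):
    """For each candidate bigram, count the distinct positive (query, entity)
    pairs having at least one snippet where the bigram precedes the entity."""
    supporters = {}
    for query, entity, snippet in examples:
        gram = before_entity_bigram(snippet, entity)
        if gram is not None:
            supporters.setdefault(gram, set()).add((query, entity))
    return Counter({gram: len(pairs) for gram, pairs in supporters.items()})

EXAMPLES = [  # (query, positive entity, snippet)
    ("Microsoft", "Bill Gates", "Microsoft was founded by Bill Gates."),
    ("Microsoft", "Bill Gates", "Bill Gates met Microsoft CEO at his home"),
    ("IBM", "Thomas Watson", "Founded by Thomas Watson, IBM is ..."),
    ("Facebook", "Mark Zuckerberg", "Facebook was founded by Mark Zuckerberg ..."),
    ("Microsoft", "Paul Allen", "A profile of Paul Allen ..."),  # hypothetical snippet
]

support = redundant_patterns(EXAMPLES)
print(support.most_common(2))  # [('founded by', 3), ('profile of', 1)]
```

A pattern supported by many different pairs is likely to express the relation. A bigram seen for a single pair is more likely incidental.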

Filter Noisy Snippets by Indicative Patterns (Challenge 1: Accuracy)
Given a set of indicative patterns P, an evidence snippet of an entity (e.g., e: Bill Gates) is a snippet that contains at least one indicative pattern. Other snippets are filtered out as noise.
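The filtering rule can be sketched as below. The pattern semantics are assumptions: BeforeEntity[k] is taken to mean that keyword k occurs somewhere before the entity mention, and Around[k] that k occurs anywhere in the snippet. Around["started"] is an illustrative pattern, not one taken from the slides.

```python
import re

def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def contains(seq, sub):
    """True if `sub` occurs as a contiguous run inside `seq`."""
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))

def matches(pattern, snippet, entity):
    kind, keyword = pattern
    pos = snippet.lower().find(entity.lower())
    if pos < 0:
        return False
    kw = words(keyword)
    if kind == "BeforeEntity":  # keyword somewhere before the entity mention
        return contains(words(snippet[:pos]), kw)
    if kind == "Around":        # keyword anywhere in the snippet
        return contains(words(snippet), kw)
    raise ValueError(f"unknown pattern kind: {kind}")

def evidence_snippets(entity, snippets, patterns):
    """Keep only snippets that contain at least one indicative pattern."""
    return [s for s in snippets if any(matches(p, s, entity) for p in patterns)]

S11 = "Microsoft was founded by Bill Gates and ..."
S12 = "Bill Gates met Microsoft CEO at his home ..."
S13 = "Bill Gates dropped out of college and started Microsoft."
P = [("BeforeEntity", "founded by"), ("Around", "started")]  # Around["started"] is illustrative

print(evidence_snippets("Bill Gates", [S11, S12, S13], P))
```

Under this P, the noisy snippet s12 is dropped while s11 and s13 survive as evidence.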

A small number of indicative patterns are sufficient (Challenge 2: Efficiency)
Indicative patterns: BeforeEntity["started"], BeforeEntity["founded by"], Around["founder"], BeforeEntity["created"], …
Snippets:
Microsoft => Bill Gates: "Microsoft was founded by Bill Gates."; "Bill Gates created Microsoft …"
IBM => Thomas Watson: "The founder of IBM is Thomas …"
Facebook => Mark Zuckerberg: "Facebook was founded by Mark Zuckerberg"; "Mark Zuckerberg created Facebook in 2006"
Because each positive entity usually has several redundant snippets, a few patterns are enough to find at least one evidence snippet per entity.
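To see why a few patterns can suffice, the sketch below measures coverage: the fraction of positive entities with at least one snippet containing some selected keyword. The slide's position-aware pattern types are simplified to whole-word keyword matching. `coverage` is a hypothetical helper, and the snippets are the ones listed on the slide.

```python
import re

SNIPPETS = {  # (query, positive entity) -> retrieved snippets
    ("Microsoft", "Bill Gates"): ["Microsoft was founded by Bill Gates.", "Bill Gates created Microsoft ..."],
    ("IBM", "Thomas Watson"): ["The founder of IBM is Thomas ..."],
    ("Facebook", "Mark Zuckerberg"): ["Facebook was founded by Mark Zuckerberg",
                                      "Mark Zuckerberg created Facebook in 2006"],
}

def has_keyword(snippet, keyword):
    # whole-word, case-insensitive match: a simplification of BeforeEntity/Around
    return re.search(r"\b" + re.escape(keyword) + r"\b", snippet, re.IGNORECASE) is not None

def coverage(keywords, snippets=SNIPPETS):
    """Fraction of positive entities with at least one matching snippet."""
    covered = sum(
        any(has_keyword(s, k) for s in texts for k in keywords)
        for texts in snippets.values()
    )
    return covered / len(snippets)

for P in (["founded by"], ["founded by", "founder"], ["founded by", "founder", "created"]):
    print(len(P), "patterns ->", round(coverage(P), 3))
```

On these snippets, two keywords already cover every entity, and a third adds an index lookup without adding coverage.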

Redundancy Ranking Principle
[Figure: Web redundancy for the entity e: Bill Gates.]

Redundancy Ranking Principle
[Figure: snippet features f(s) for e: Bill Gates, including query-entity distance, query frequency, BeforeEntity["founded by"], BeforeEntity["started"], ….]
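A snippet feature vector f(s) with the kinds of features the slide names could be computed as below. The definitions are assumptions: query-entity distance is the word offset between the two mentions, query frequency counts occurrences of the query string, and BeforeEntity[k] is 1 if keyword k occurs before the entity mention. `snippet_features` is a hypothetical helper.

```python
import re

def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def find(seq, sub):
    """Index of the first contiguous occurrence of `sub` in `seq`, or -1."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1

def snippet_features(snippet, query, entity, before_keywords):
    toks, q, e = words(snippet), words(query), words(entity)
    qi, ei = find(toks, q), find(toks, e)
    feats = {
        # word distance between query and entity mentions (snippet length if either is missing)
        "query_entity_distance": abs(ei - qi) if qi >= 0 and ei >= 0 else len(toks),
        # number of occurrences of the query string in the snippet
        "query_frequency": sum(toks[i:i + len(q)] == q for i in range(len(toks) - len(q) + 1)),
    }
    for kw in before_keywords:
        # 1 if the keyword occurs somewhere before the entity mention
        feats[f'BeforeEntity["{kw}"]'] = int(ei >= 0 and find(toks[:ei], words(kw)) >= 0)
    return feats

f = snippet_features("Microsoft was founded by Bill Gates", "Microsoft", "Bill Gates",
                     ["founded by", "started"])
print(f)
```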

Solution: Pattern-based Filter Network

Objective Function
[Objective equation not preserved in transcript.]
Subject to: for efficiency, the number of indicative patterns should be small.
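The optimization slides fit the weights w by gradient ascent for a fixed pattern set P and add patterns greedily. Read together with them, one plausible form of the objective is the one below. It is an assumption, with a hypothetical pattern budget K, and L(w, P) denotes the likelihood of the distant entity labels under PFNet with weights w and indicative patterns P.

```latex
\max_{\mathbf{w},\,P} \; \log L(\mathbf{w}, P)
\quad \text{subject to} \quad |P| \le K
```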

Modeling the Redundancy Ranking Principle: Pattern-based Filter Network (PFNet)
PFNet has two layers:
1. Noise filtering layer: filter noisy snippets by indicative patterns.
2. Evidence aggregation layer: aggregate the contributions from evidence snippets.

Noise Filtering Layer in PFNet
…

Evidence Aggregation Layer in PFNet
…
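The slides name PFNet's two layers, but the factor definitions are not preserved. The following is only a hedged sketch. The filtering layer is represented by a boolean gate per snippet, meaning the snippet contains at least one indicative pattern. The surviving evidence is then aggregated with a noisy-OR over per-snippet scores sigmoid(w · f(s)). Noisy-OR is a swapped-in choice, picked because each additional evidence snippet can only raise the entity's score, in line with the redundancy principle. `pfnet_score` is hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pfnet_score(snippet_feats, gates, w):
    """Score one entity from its snippets.

    snippet_feats: feature vectors f(s), one per snippet of the entity.
    gates: True if the snippet contains >= 1 indicative pattern (filtering layer).
    w: weight vector shared across snippets.
    """
    # Noise filtering layer: keep only evidence snippets.
    evidence = [f for f, g in zip(snippet_feats, gates) if g]
    # Evidence aggregation layer (noisy-OR, an assumption): the entity is
    # relevant unless every evidence snippet independently "fails".
    p_none = 1.0
    for f in evidence:
        p_none *= 1.0 - sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
    return 1.0 - p_none

print(pfnet_score([[0.0]], [True], [1.0]))                                 # one evidence snippet
print(pfnet_score([[0.0], [0.0], [5.0]], [True, True, False], [1.0]))     # noise snippet gated off
```

The second call scores higher than the first because it has two evidence snippets. The strongly scored but filtered snippet contributes nothing.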

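Likelihood for PFNet
[Likelihood equation not preserved in transcript.]
Subject to: the constraint that the number of indicative patterns be small.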

Factor Design
Aggregate the contributions from evidence snippets.

Optimization: Maximizing Likelihood by Greedily Adding Indicative Patterns
Indicative patterns are added to P one at a time. Given the current P, the maximized likelihood is calculated by gradient ascent on w. Each candidate snippet feature is scored by the log likelihood it achieves when added to P.
Step 1: P = {BeforeEntity["founded by"]}.
Step 2: candidates such as Around["founder"], Around["microsoft"], and BeforeEntity["started"] are scored; Around["founder"] is added, giving P = {BeforeEntity["founded by"], Around["founder"]}.
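The greedy procedure can be sketched as forward selection. For each remaining candidate, the sketch refits the weights by gradient ascent with that candidate added to P, then keeps the candidate with the highest maximized log likelihood. To stay short, it swaps PFNet's factors for a plain logistic model over per-pattern redundancy features log(1 + number of matching snippets). The model, the toy data, the learning rate, the step count, and the budget k are all assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def features(counts, patterns):
    """Redundancy feature per selected pattern: log(1 + matching snippets)."""
    return [math.log1p(counts.get(p, 0)) for p in patterns]

def fit(entities, patterns, lr=0.1, steps=1000):
    """Gradient ascent on (w, b); returns the maximized log likelihood."""
    xs = [features(counts, patterns) for _, counts in entities]
    ys = [y for y, _ in entities]
    w, b = [0.0] * len(patterns), 0.0
    for _ in range(steps):
        gw, gb = [0.0] * len(patterns), 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            gb += err
            gw = [g + err * xi for g, xi in zip(gw, x)]
        w = [wi + lr * g for wi, g in zip(w, gw)]
        b += lr * gb
    ll = 0.0
    for x, y in zip(xs, ys):
        z = b + sum(wi * xi for wi, xi in zip(w, x))
        ll += math.log(sigmoid(z) if y == 1 else sigmoid(-z))
    return ll

def greedy_select(entities, candidates, k):
    """Add, k times, the candidate whose inclusion maximizes the fitted log likelihood."""
    selected, history = [], []
    for _ in range(k):
        scores = {c: fit(entities, selected + [c]) for c in candidates if c not in selected}
        best = max(scores, key=scores.get)
        selected.append(best)
        history.append(scores[best])
    return selected, history

ENTITIES = [  # (label, {candidate pattern: number of the entity's snippets matching it})
    (1, {"founded by": 2, "founder": 1, "microsoft": 1}),  # Bill Gates
    (1, {"founded by": 1, "microsoft": 1}),                 # Paul Allen
    (1, {"founded by": 1, "founder": 1}),                   # Thomas Watson
    (1, {"founded by": 1, "started": 1}),                   # Mark Zuckerberg
    (0, {"microsoft": 1}),                                  # Steven Ballmer
    (0, {"microsoft": 1, "started": 1}),                    # hypothetical negative
    (0, {}),                                                # hypothetical negative
]
CANDIDATES = ["founded by", "founder", "microsoft", "started"]

P, lls = greedy_select(ENTITIES, CANDIDATES, k=2)
print(P, [round(v, 3) for v in lls])
```

In this toy data, "founded by" matches every positive entity and no negative one, so it is selected first, mirroring step 1 on the slides. Later picks depend on the finite number of gradient steps.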

Experiment Setting
Six datasets, one per relation. Columns: Dataset, Base Type, Query Num, Positive / Total Entity Num, Snippet Num (counts not preserved).
FounderOf: #person
PublisherOf: #organization
WriterOf: #person
PlaceOfBirth: #location
PlaceOfDeath: #location
GraduateFrom: #organization

Experiment Setting
Baselines: EntityRank (Cheng 2007), Multi-Instance Learning (MIL, Riedel 2010), SVMRank (Joachims 2003).
Baseline: Filter Noise / Redundancy
MIL: Yes / No
SVMRank: No / Yes

Ranking Performance on 6 Different Relations

1. Relation-specific ranking functions perform better.
2. It is important to leverage redundancy.
3. It is necessary to filter noisy snippets.

Larger Improvement on More Noisy Relations
Columns: Relation, Runner-up Baseline, PFNet, Percentage of Noise (only the last column's values appear to survive):
FounderOf: 62.7%
PublisherOf: 72%
WriterOf: 76.7%
PlaceOfBirth: 76.7%
PlaceOfDeath: 90.3%
GraduateFrom: 49.4%

Around 10 indicative patterns are sufficient

1. Queries with higher redundancy achieve better results.
2. Filtering noise is helpful for queries with different levels of redundancy.

1. Performance increases with more training examples.
2. Around 90 training examples are sufficient for most relations.

Thanks. Q & A