Entity-Relationship Query over Wikipedia Xiaonan Li 1, Chengkai Li 1, Cong Yu 2 1 University of Texas at Arlington 2 Yahoo! Research International Workshop.

Slides:



Advertisements
Similar presentations
Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology,
Advertisements

Linking Entities in #Microposts ROMIL BANSAL, SANDEEP PANEM, PRIYA RADHAKRISHNAN, MANISH GUPTA, VASUDEVA VARMA INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY,
Entity Ranking and Relationship Queries Using an Extended Graph Model Ankur Agrawal S. Sudarshan Ajitav Sahoo Adil Sandalwala Prashant Jaiswal IIT Bombay.
Mining Web’s Link Structure Sushanth Rai University of Texas at Arlington
1 EntityRank: Searching Entities Directly and Holistically Tao Cheng Joint work with : Xifeng Yan, Kevin Chang VLDB 2007, Vienna, Austria.
EntityRank: Searching Entities Directly and Holistically - Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang CS Department, UIUC Presented By: Md. Abdus Salam.
Wei Shen †, Jianyong Wang †, Ping Luo ‡, Min Wang ‡ † Tsinghua University, Beijing, China ‡ HP Labs China, Beijing, China WWW 2012 Presented by Tom Chao.
1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.
A Web of Concepts Dalvi, et al. Presented by Andrew Zitzelberger.
Context-Aware Query Classification Huanhuan Cao 1, Derek Hao Hu 2, Dou Shen 3, Daxin Jiang 4, Jian-Tao Sun 4, Enhong Chen 1 and Qiang Yang 2 1 University.
Data-oriented Content Query System: Searching for Data into Text on the Web Mianwei Zhou, Kevin Chen-Chuan Chang Department of Computer Science UIUC 1.
Shared Ontology for Knowledge Management Atanas Kiryakov, Borislav Popov, Ilian Kitchukov, and Krasimir Angelov Meher Shaikh.
1 A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search 1 Jie Tang, 2 Ruoming Jin, and 1 Jing Zhang 1 Knowledge.
Saarbrucken / Germany ¨
The History of Computers By: Chandler Jones The History of Computers.
Temporal Event Map Construction For Event Search Qing Li Department of Computer Science City University of Hong Kong.
Personalization in Local Search Personalization of Content Ranking in the Context of Local Search Philip O’Brien, Xiao Luo, Tony Abou-Assaleh, Weizheng.
CSC 9010 Spring Paula Matuszek A Brief Overview of Watson.
SEEKING STATEMENT-SUPPORTING TOP-K WITNESSES Date: 2012/03/12 Source: Steffen Metzger (CIKM’11) Speaker: Er-gang Liu Advisor: Dr. Jia-ling Koh 1.
Attribute Extraction and Scoring: A Probabilistic Approach Taesung Lee, Zhongyuan Wang, Haixun Wang, Seung-won Hwang Microsoft Research Asia Speaker: Bo.
Estimating Importance Features for Fact Mining (With a Case Study in Biography Mining) Sisay Fissaha Adafre School of Computing Dublin City University.
1 Mining User Behavior Mining User Behavior Eugene Agichtein Mathematics & Computer Science Emory University.
Assigning Global Relevance Scores to DBpedia Facts Philipp Langer, Patrick Schulze, Stefan George, Tobias Metzke, Ziawasch Abedjan, Gjergji Kasneci DESWeb.
Annotating Words using WordNet Semantic Glosses Julian Szymański Department of Computer Systems Architecture, Faculty of Electronics, Telecommunications.
Structured Querying of Web Text A Technical Challenge Kulsawasd Jitkajornwanich University of Texas at Arlington CSE6339 Web Mining.
Similar Document Search and Recommendation Vidhya Govindaraju, Krishnan Ramanathan HP Labs, Bangalore, India JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE.
11 A Hybrid Phish Detection Approach by Identity Discovery and Keywords Retrieval Reporter: 林佳宜 /10/17.
Intent Subtopic Mining for Web Search Diversification Aymeric Damien, Min Zhang, Yiqun Liu, Shaoping Ma State Key Laboratory of Intelligent Technology.
EntityRank :Searching Entities Directly and Holistically Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang Computer Science Department, University of Illinois.
INTERESTING NUGGETS AND THEIR IMPACT ON DEFINITIONAL QUESTION ANSWERING Kian-Wei Kor, Tat-Seng Chua Department of Computer Science School of Computing.
80 million tiny images: a large dataset for non-parametric object and scene recognition CS 4763 Multimedia Systems Spring 2008.
2007. Software Engineering Laboratory, School of Computer Science S E Web-Harvest Web-Harvest: Open Source Web Data Extraction tool 이재정 Software Engineering.
Data Mining for Web Intelligence Presentation by Julia Erdman.
ISC340 Web Programming Internet Success Stories. Yahoo! The first popular Web search engine. Created by two graduate electrical engineering in Stanford.
Wikipedia as Sense Inventory to Improve Diversity in Web Search Results Celina SantamariaJulio GonzaloJavier Artiles nlp.uned.es UNED,c/Juan del Rosal,
Mianwei Zhou, Hongning Wang, Kevin Chen-Chuan Chang University of Illinois Urbana Champaign Learning to Rank from Distant Supervision: Exploiting Noisy.
Next Generation Search Engines Ehsun Daroodi 1 Feb, 2003.
Measuring Semantic Similarity between Words Using Web Search Engines WWW 07.
Using Domain Ontologies to Improve Information Retrieval in Scientific Publications Engineering Informatics Lab at Stanford.
1 A Web Search Engine-Based Approach to Measure Semantic Similarity between Words Presenter: Guan-Yu Chen IEEE Trans. on Knowledge & Data Engineering,
Using linked data to interpret tables Varish Mulwad September 14,
1 NAGA: Searching and Ranking Knowledge Gjergji Kasneci Joint work with: Fabian M. Suchanek, Georgiana Ifrim, Maya Ramanath, and Gerhard Weikum.
Exploiting Relevance Feedback in Knowledge Graph Search
Named Entity Recognition in Query Jiafeng Guo 1, Gu Xu 2, Xueqi Cheng 1,Hang Li 2 1 Institute of Computing Technology, CAS, China 2 Microsoft Research.
LINDEN : Linking Named Entities with Knowledge Base via Semantic Knowledge Date : 2013/03/25 Resource : WWW 2012 Advisor : Dr. Jia-Ling Koh Speaker : Wei.
1 Entity Search Engine: Towards Agile Best-Effort Information Integration over the Web Tao Cheng, Kevin Chang University Of Illinois, Urbana-Champaign.
A Unified Approach to Ranking in Probabilistic Databases Jian Li, Barna Saha, Amol Deshpande University of Maryland, College Park, USA VLDB
Enhanced hypertext categorization using hyperlinks Soumen Chakrabarti (IBM Almaden) Byron Dom (IBM Almaden) Piotr Indyk (Stanford)
An Integrated Approach for Relation Extraction from Wikipedia Texts Yulan Yan Yutaka Matsuo Mitsuru Ishizuka The University of Tokyo WWW 2009.
Personalized Ontology for Web Search Personalization S. Sendhilkumar, T.V. Geetha Anna University, Chennai India 1st ACM Bangalore annual Compute conference,
Where Should I Publish? Journal Ranking Tools
Queensland University of Technology
MINING DEEP KNOWLEDGE FROM SCIENTIFIC NETWORKS
An Empirical Study of Learning to Rank for Entity Search
Information Extraction from Wikipedia: Moving Down the Long Tail
Web IR: Recent Trends; Future of Web Search
Research at Open Systems Lab IIIT Bangalore
Applying Key Phrase Extraction to aid Invalidity Search
An Efficient method to recommend research papers and highly influential authors. VIRAJITHA KARNATAPU.
Wikitology Wikipedia as an Ontology
Global Enterprise Search
Information Retrieval
Data Integration for Relational Web
Information retrieval and PageRank
DBpedia 2014 Liang Zheng 9.22.
Effective Entity Recognition and Typing by Relation Phrase-Based Clustering
Probabilistic Databases
A Classification-based Approach to Question Routing in Community Question Answering Tom Chao Zhou 22, Feb, 2010 Department of Computer.
Web archives as a research subject
Topic: Semantic Text Mining
Presentation transcript:

Entity-Relationship Query over Wikipedia Xiaonan Li 1, Chengkai Li 1, Cong Yu 2 1 University of Texas at Arlington 2 Yahoo! Research International Workshop on Search and Mining User-Generated International Conference on Information and Knowledge Management Toronto, Ontario, Canada, 2010

2 Motivation A business analyst is investigating the development of Silicon Valley. She is looking for: Silicon Valley Companies founded by Stanford graduates Google Yahoo! HP Jerry Yang Larry Page David Packard Yahoo! David Filo

3 Larry Page Jerry Yang Dick Price David Filo... page reading Pain with Search Engine stanford graduate pages Search silicon valley company Search pages Paypal HP Yahoo! Googl e... confirmed as a right answer Search found pages

4 Entity-Relationship Query Entity Variable (with type constraints) ● Entity-Centric ● Declarative ● Structured ● Keyword-based SELECT x FROM PERSON AS x stanford graduate Search WHERE x:[“stanford” “graduate”]... p1 Relation Predicate (2 or more variables) Jerry Yang found Yahoo Search AND x,y:[“found”] p3 Selection Predicate (one variable) silicon valley company Search AND y:[“silicon valley”] p2, y, COMPANY AS y

5 x:[“Stanford” “graduate”]... p1 y:[“Silicon Valley”] p2 x,y:[“found”] p3 is a query answer, if Query Answer Stanford University graduates Jerry Yang and … … a senior manager at Yahoo! in Silicon Valley. Jerry Yang co-founded Yahoo!. co-occurrence contexts as evidence

6 Type in the query Browse the answers

7 Ranking The predicate score is aggregated from contexts / evidence 0.8 * 0.7 * 0.8 = 0.448

8 Predicate Scoring Baseline 1 [COUNT]: number of co-occurrence contexts However, contexts are different from each other. We exploit positional features for refined evaluation of contexts. ordering patternproximitymutual exclusion

9 Feature 1: Proximity s1: Stanford University graduates Jerry Yang and … prox(Jerry Yang, s1) = ( ) / 5 = 0.8 Higher proximity indicates more reliable evidence Baseline 2 [PROX]: weight each contexts by proximity x:[“Stanford” “graduate”]... p1

10 Feature 2: Ordering Pattern x~PERSON, s~Stanford, g~graduate 6 patterns: xsg,xgs, sxg, gxs, sgx, gsx A professor at Stanford University, Colin Marlow had a relationship with Cristina Yang before she graduated... (3 times) Stanford University graduates Jerry Yang and … (4 times) … Stanford graduates Larry Page … (2 times) f(sgx)=(4+2)/(4+2+3)=0.67 f(sxg)=(3)/(4+2+3)=0.33 Frequent patterns indicate reliable evidence.

11 Feature 3: Mutual Exclusion s4: After Ric Weiland graduated from Stanford University, Paul Allen and Bill Gates hired him in Assumption: only one of the colliding patterns is effective. xgs Ric Weiland gsx Paul Allen, Bill Gates colliding patterns Patten Entities The most proximate entity as representative xgs gsx Which one?

12 Feature 3: Mutual Exclusion (2) s4: After Ric Weiland graduated from Stanford University, Paul Allen and Bill Gates hired him in # context (Ric Weiland) = 4 # context (Paul Allen) = 2 credit (xgs, s4) = 4 / (4+2) = 0.67 credit (gsx, s4) = 2 / (4+2) = 0.33 The pattern represented by more prominent entity (thus higher credit) is more likely to be effective. Baseline 3 [MEX]: weight each contexts by credit (effectiveness) (for contexts without collision, the credit(o,s)=1)

13 Predication Scoring (cont.) Bounded Cumulative Model (BCM) : integrating all features ordering pattern proximitymutual exclusion

14 Experiment: Data Set Data Set 2 million Wikipedia articles 10 predefined types (PERSON, COMPANY, NOVEL, etc.) 0.75 million entities (a subset of articles) 100 million entity occurrences (links to entities) Two Query Sets INEX17 – adapted from topics in INEX09 Entity Ranking track Single-11, Multi-6 OWN28 – manually created Single-16, Multi-12 Pride and Prejudice Categories: 1913 novels | British novels Pride and Prejudice is a novel by Jane Austen...

15 State-of-the-art: EntityRank(ER) [Cheng et al. VLDB07] A probabilistic model Only uses proximity feature (in a different way) Only handle queries similar to our single-predicate querys SELECT x FROM PERSON AS x WHERE x:[“stanford” “graduate”] We use it for computing predicate scores in our structured query model

16 Experiment: Results (nDCG and MAP) ● BCM has the best performance ● The advantage even more clear for multi-predicate queries. single-predicate queries multi-predicate queries

17 Experiment: Results precition-at-k BCM EntityRank ● BCM is consistently the best ● EntityRank is close to BCM at top 2, but degrades quickly

18 Related Work (1) IE-Based Approach ● Pre-extract information into database for query ● Still a huge challenge to extract all information ● Un-extracted information is lost and unavailable for query [TextRunner] Etzioni et al. Open information extraction from the Web. Communication ofACM, 51(12):68–74, [DBPedia] Auer et al. DBpedia: A nucleus for a Web of open data. In Int.l Semantic Web Conf., [Yago] Suchanek. YAGO: a core of semantic knowledge unifying WordNet and Wikipedia. In WWW, 2007.

19 Related Work (2) IR-Based Approach (ERQ belongs to this approach) ● Search directly in corpus ● All information is pristine and is available for query ● ProxSearch and EntityRank only handle queries resembling our single-predicate query ● CSAW focuses on HTML tables [ProxSearch] Chakrabarti et al. Optimizing scoring functions and indexes for proximity search in type-annotated corpora. In WWW, [EntityRank] Cheng et al. EntityRank: searching entities directly and holistically. VLDB [CSAW] Limaye et al. Annotating and Searching Web Tables Using Entities, Types and Relationships. VLDB 2010.

20 Demo