Disambiguation Algorithm for People Search on the Web Dmitri V. Kalashnikov, Sharad Mehrotra, Zhaoqi Chen, Rabia Nuray-Turan, Naveen Ashish For questions.

Presentation transcript:

Disambiguation Algorithm for People Search on the Web Dmitri V. Kalashnikov, Sharad Mehrotra, Zhaoqi Chen, Rabia Nuray-Turan, Naveen Ashish For questions visit: Computer Science Department University of California, Irvine

2 Entity (People) Search. Figure: Top-K Webpages; Person1, Person2, Person3 (unknown beforehand).

3 Standard Approach to Entity Resolution

4 Key Observation: More Info is Available

5 RelDC Framework

6 Where is the Graph here? Use Extraction!

7 Overall Algorithm Overview
1. User Input. A user submits a query to the middleware via a web-based interface.
2. Web Page Retrieval. The middleware queries a search engine’s API and gets the top-K Web pages.
3. Preprocessing. The retrieved Web pages are preprocessed:
   a) TF/IDF. Preprocessing steps for computing TF/IDF are carried out.
   b) Ontology. Ontologies are used to enrich the Web page content.
   c) Extraction. Named entities and Web-related information are extracted from the Web pages.
4. Graph Creation. The Entity-Relationship Graph is generated.
5. Enhanced TF/IDF. Ontology-enhanced TF/IDF values are computed.
6. Clustering. Correlation clustering is applied.
7. Cluster Processing. Each resulting cluster is then processed as follows:
   a) Sketches. A set of keywords that represents the Web pages within a cluster is computed for each cluster. The goal is that the user should be able to find the person of interest by looking at the sketch.
   b) Cluster Ranking. All clusters are ranked by a chosen criterion so that they can be presented to the user in a certain order.
   c) Web Page Ranking. Once the user homes in on a particular cluster, the Web pages in this cluster are presented in a certain order, computed in this step.
8. Visualization of Results. The results are presented to the user in the form of clusters (and their sketches), which correspond to namesakes and can be explored further.
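The slides do not say how a sketch (step 7a) is computed. As a minimal illustration only, the snippet below assumes a sketch is the k terms that appear in the most pages of a cluster. The function name `cluster_sketch`, the `stopwords` parameter, and the whitespace tokenization are all hypothetical choices, not the authors' method.

```python
from collections import Counter


def cluster_sketch(pages, k=3, stopwords=frozenset()):
    """Return up to k keywords for a cluster of page texts.

    Assumption: a keyword is "representative" if it occurs in many pages
    of the cluster. Each term is counted once per page.
    """
    counts = Counter()
    for text in pages:
        # Whitespace tokenization is a simplification for illustration.
        counts.update(set(text.lower().split()) - stopwords)
    # Sort by descending page count, then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy cluster of three page texts (made-up data).
pages = [
    "irvine professor databases",
    "irvine databases research",
    "irvine professor databases",
]
sketch = cluster_sketch(pages, k=2)
```

Counting each term once per page keeps one long page from dominating the sketch. A real system would more likely weight terms by TF/IDF, which the pipeline already computes in steps 3a and 5.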

8 Correlation Clustering
In CC, each pair of nodes (u,v) is labeled
– with a “+” or “-” edge
– labeling is done according to a similarity function s(u,v)
Similarity function s(u,v)
– if s(u,v) indicates that u and v are similar, then label “+”
– else label “-”
– s(u,v) is typically trained from past data
Clustering
– looks at edges
– tries to minimize disagreement
– the disagreement for an element x placed in cluster C is the number of “-” edges that connect x to other elements in C
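The edge labeling and the disagreement count can be sketched in a few lines. This is an illustration under stated assumptions: the `threshold` used to turn s(u,v) into a "+" or "-" label is hypothetical (the slides only say s(u,v) is trained), and `total_disagreement` uses the standard correlation clustering objective, which also counts "+" edges that cross clusters, something the slide does not spell out.

```python
from itertools import combinations


def label_edges(nodes, s, threshold=0.5):
    """Label every unordered pair '+' if s(u, v) >= threshold, else '-'.

    The threshold is an assumption made for this sketch.
    """
    return {
        frozenset((u, v)): "+" if s(u, v) >= threshold else "-"
        for u, v in combinations(nodes, 2)
    }


def element_disagreement(x, cluster, labels):
    """Number of '-' edges connecting x to other elements of its cluster."""
    return sum(
        1 for y in cluster if y != x and labels[frozenset((x, y))] == "-"
    )


def total_disagreement(clusters, labels):
    """'-' edges inside clusters plus '+' edges between clusters."""
    cluster_of = {x: i for i, c in enumerate(clusters) for x in c}
    cost = 0
    for pair, sign in labels.items():
        u, v = tuple(pair)
        same = cluster_of[u] == cluster_of[v]
        if (same and sign == "-") or (not same and sign == "+"):
            cost += 1
    return cost


# Toy similarity values for four pages (made-up data).
sim = {
    frozenset("ab"): 0.9, frozenset("ac"): 0.8, frozenset("bc"): 0.2,
    frozenset("ad"): 0.1, frozenset("bd"): 0.1, frozenset("cd"): 0.7,
}
labels = label_edges("abcd", lambda u, v: sim[frozenset((u, v))])
clusters = [{"a", "b", "c"}, {"d"}]
cost = total_disagreement(clusters, labels)
```

In this toy example the clustering pays for the "-" edge between b and c inside the first cluster and for the "+" edge between c and d that crosses clusters. A clustering algorithm would search over partitions to keep this cost low.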

9 Similarity Function
Connection strength between u and v: $c(u,v) = \sum_k c_k w_k$
– where $c_k$ is the number of u-v paths of type k
– and $w_k$ is the weight of u-v paths of type k
Similarity s(u,v) is a combination
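Given the definitions of the path counts and path weights, the connection strength reads naturally as a weighted sum over path types. The sketch below assumes exactly that form. The path-type names and weight values are made up for illustration; in the slides the weights come from training on pre-labeled data.

```python
def connection_strength(path_counts, weights):
    """Weighted sum over path types k of count_k * weight_k.

    path_counts maps a path type to the number of u-v paths of that type.
    weights maps a path type to its weight; types without a weight
    contribute nothing (an assumption of this sketch).
    """
    return sum(
        count * weights.get(path_type, 0.0)
        for path_type, count in path_counts.items()
    )


# Hypothetical path types between two Web pages u and v.
path_counts = {"page-person-page": 2, "page-organization-page": 1}
weights = {"page-person-page": 0.5, "page-organization-page": 0.25}
strength = connection_strength(path_counts, weights)
```

Here two paths through a shared person and one path through a shared organization give a strength of 2 × 0.5 + 1 × 0.25.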

10 Training s(u,v) on pre-labeled data

11 Experiments: Quality of Disambiguation. By Artiles et al. in SIGIR’05; by Bekkerman & McCallum in WWW’05.

12 Experiments: Effect on Search