Entity Resolution with Evolving Rules


Entity Resolution with Evolving Rules Youzhong Ma 2010-9-25 Lab of WAMDM

Outline: Motivations, ER related concepts, ER properties, Conclusions

Entity Resolution background

Naïve ER Approach Vs. New Approach

ER Related concepts
Suppose market A merges with market B, so the two must combine their customer databases. The same person may appear in both databases, but with some attribute values that differ. How should such records be handled?

ER Rule
Boolean functions: determine whether two records represent the same entity (true or false).
Distance functions: measure how different (or similar) two records are.
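The two kinds of rule can be sketched in a few lines of Python. This is only an illustration: the record fields (`name`, `zip`), the function names, and the choice of `difflib.SequenceMatcher` as the similarity measure are assumptions made here, not something the slides specify.

```python
from difflib import SequenceMatcher


def same_name_and_zip(a: dict, b: dict) -> bool:
    """Boolean rule: True only when both records agree on name and ZIP code."""
    return (
        a["name"] == b["name"]
        and a["zip"] is not None
        and a["zip"] == b["zip"]
    )


def name_distance(a: dict, b: dict) -> float:
    """Distance rule: 0.0 for identical names, closer to 1.0 for unrelated ones."""
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return 1.0 - ratio


# Hypothetical customer records from the two merged markets.
r1 = {"name": "John Smith", "zip": "54321"}
r2 = {"name": "Jon Smith", "zip": "54321"}

is_match = same_name_and_zip(r1, r2)   # exact comparison, so the typo blocks a match
distance = name_distance(r1, r2)       # small but non-zero
```

A Boolean rule gives a hard yes/no answer, while a distance rule leaves the threshold decision to the ER algorithm that consumes it.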

ER Example

ER procedure
The evolving-rule approach works only if the ER algorithm satisfies certain properties and B2 is stricter than B1. One contribution of the paper is therefore to explore under what conditions, and for which ER algorithms, incremental approaches are feasible.
Original record set: S = {r1, r2, r3, r4}
ER input: Pi = {{r1}, {r2}, {r3}, {r4}}
B1: Pname → E1 = {{r1, r2, r3}, {r4}} (6 comparisons)
B2: Pname ∧ Pzip → E2 = {{r1, r2}, {r3}, {r4}}
Computing E2 takes 6 comparisons with the naïve approach (starting again from Pi) and 3 comparisons with the evolving-rule approach (starting from E1).
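The comparison counts on this slide can be reproduced with a small sketch. It assumes a pairwise transitive-closure clustering as the ER algorithm, and the attribute values in `RECORDS` are hypothetical, chosen only so that the partitions come out as on the slide (the slide's example table is not part of the transcript).

```python
from itertools import combinations

# Hypothetical attribute values, picked to reproduce E1 and E2 from the slide.
RECORDS = {
    "r1": {"name": "John", "zip": "54321"},
    "r2": {"name": "John", "zip": "54321"},
    "r3": {"name": "John", "zip": "11111"},
    "r4": {"name": "Bob", "zip": None},
}


def p_name(a, b):
    return a["name"] == b["name"]


def p_zip(a, b):
    return a["zip"] is not None and a["zip"] == b["zip"]


def b1(a, b):
    return p_name(a, b)


def b2(a, b):
    return p_name(a, b) and p_zip(a, b)


def resolve(ids, rule):
    """Cluster ids by the transitive closure of `rule`.

    Returns (partition, number of pairwise comparisons performed).
    """
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    comparisons = 0
    for x, y in combinations(sorted(ids), 2):
        comparisons += 1
        if rule(RECORDS[x], RECORDS[y]):
            parent[find(x)] = find(y)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return {frozenset(c) for c in clusters.values()}, comparisons


# Initial run with B1 over all records.
e1, comps_b1 = resolve(RECORDS, b1)

# Naive approach: rerun B2 from scratch over all records.
naive_e2, naive_comps = resolve(RECORDS, b2)

# Evolving-rule approach: B2 is stricter than B1, so only split E1's clusters.
evolved_e2, evolved_comps = set(), 0
for cluster in e1:
    part, n = resolve(cluster, b2)
    evolved_e2 |= part
    evolved_comps += n

print(comps_b1, naive_comps, evolved_comps)
```

Both routes reach the same E2 here; the saving comes from never comparing records that B1 already placed in different clusters.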

Materialization!
Original record set: S = {r1, r2, r3, r4}
ER input: Pi = {{r1}, {r2}, {r3}, {r4}}
B1: Pname ∧ Pzip → E1 = {{r1, r2}, {r3}, {r4}} (6 comparisons)
Materialized per-conjunct results: Pname → Ename = {{r1, r2, r3}, {r4}}; Pzip → Ezip = {{r1, r2}, {r3}, {r4}}
B2: Pname ∧ Pphone → E2 = {{r1}, {r2, r3}, {r4}} (3 comparisons)
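A minimal sketch of the materialization idea, under the same assumptions as before: a pairwise transitive-closure clustering stands in for the ER algorithm, and the attribute values (including the phone numbers) are hypothetical, chosen so that the partitions match the slide. The helper names `agree`, `conj`, and `resolve` are introduced here for illustration only.

```python
from itertools import combinations

# Hypothetical attribute values, picked to reproduce the slide's partitions.
RECORDS = {
    "r1": {"name": "John", "zip": "54321", "phone": "123-4567"},
    "r2": {"name": "John", "zip": "54321", "phone": "987-6543"},
    "r3": {"name": "John", "zip": "11111", "phone": "987-6543"},
    "r4": {"name": "Bob", "zip": None, "phone": "121-1212"},
}


def agree(field):
    """Build a predicate that is true when both records share a non-null `field`."""
    def predicate(a, b):
        return a[field] is not None and a[field] == b[field]
    return predicate


def conj(*predicates):
    """Conjunction of predicates, e.g. Pname AND Pphone."""
    return lambda a, b: all(p(a, b) for p in predicates)


def resolve(ids, rule):
    """Transitive-closure clustering; returns (partition, comparison count)."""
    clusters = {i: {i} for i in ids}  # record id -> the cluster it belongs to
    comparisons = 0
    for x, y in combinations(sorted(ids), 2):
        comparisons += 1
        if rule(RECORDS[x], RECORDS[y]) and clusters[x] is not clusters[y]:
            merged = clusters[x] | clusters[y]
            for member in merged:
                clusters[member] = merged
    return {frozenset(c) for c in clusters.values()}, comparisons


P = {f: agree(f) for f in ("name", "zip", "phone")}

# While resolving with B1 = Pname AND Pzip, also keep one partition per conjunct.
materialized = {f: resolve(RECORDS, P[f])[0] for f in ("name", "zip")}

# B2 = Pname AND Pphone is not stricter than B1, but it is stricter than Pname,
# so the materialized Ename can be reused as the starting point.
b2 = conj(P["name"], P["phone"])
e2, comps = set(), 0
for cluster in materialized["name"]:
    part, n = resolve(cluster, b2)
    e2 |= part
    comps += n

print(sorted(sorted(c) for c in e2), comps)
```

The design trade-off is extra storage for the per-conjunct partitions in exchange for being able to evolve to rules that are not simply stricter versions of the old one.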

Two important properties of ER algorithms enable efficient rule evolution for match-based clustering: Rule Monotonicity (RM) and Context Free (CF).

Rule strictness: Pname ∧ Pzip ≤ Pname (every pair of records that matches under Pname ∧ Pzip also matches under Pname).

Rule Monotonicity (RM)
B1: Pname ∧ Pzip → E1 = {{r1, r2}, {r3}, {r4}}
B2: Pname → E2 = {{r1, r2, r3}, {r4}}
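In this example the stricter rule B1 yields a partition E1 that refines E2: every cluster of E1 sits inside some cluster of E2. Reading rule monotonicity as "a stricter rule gives a finer result", the relationship between the two partitions can be checked with a short sketch (the function name `refines` is an assumption made here):

```python
def refines(finer, coarser):
    """True if every cluster of `finer` is contained in some cluster of `coarser`."""
    return all(any(c <= d for d in coarser) for c in finer)


# Partitions copied from the slide.
E1 = [{"r1", "r2"}, {"r3"}, {"r4"}]   # result of B1 = Pname AND Pzip
E2 = [{"r1", "r2", "r3"}, {"r4"}]     # result of B2 = Pname

e1_refines_e2 = refines(E1, E2)
e2_refines_e1 = refines(E2, E1)
```

This refinement is what lets an incremental approach restrict its work to the clusters of the coarser result.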

Context Free (CF)

Existing properties in literature
General Incremental vs. Context Free
Order Independent vs. Rule Monotonicity
An ER algorithm is order independent if its result is the same regardless of the order in which the records are processed.
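Order independence can be tested by brute force on small inputs: run the algorithm on every permutation of the records and check that only one result appears. The sketch below is an illustration under stated assumptions. The records are plain numbers, the match rule `close` is hypothetical, and the `greedy` algorithm is invented here purely as an order-dependent contrast to transitive closure.

```python
from itertools import permutations


def close(a, b):
    """Hypothetical match rule: two numeric records match if they differ by at most 1."""
    return abs(a - b) <= 1


def greedy(records):
    """Put each record in the first cluster whose first member matches it."""
    clusters = []
    for r in records:
        for c in clusters:
            if close(c[0], r):
                c.append(r)
                break
        else:
            clusters.append([r])
    return frozenset(frozenset(c) for c in clusters)


def transitive_closure(records):
    """Merge every cluster that contains a record matching the new one."""
    clusters = []
    for r in records:
        linked = [c for c in clusters if any(close(r, m) for m in c)]
        merged = {r}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return frozenset(frozenset(c) for c in clusters)


def is_order_independent(algorithm, records):
    """True if every processing order of `records` gives the same partition."""
    results = {algorithm(list(p)) for p in permutations(records)}
    return len(results) == 1


records = [1, 2, 3]
print(is_order_independent(transitive_closure, records))
print(is_order_independent(greedy, records))
```

With the order 1, 2, 3 the greedy variant leaves 3 in its own cluster, while the order 2, 1, 3 puts all three records together, so it fails the check; transitive closure groups all three in every order.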

Experiments

Conclusions
Proposes a new ER approach with evolving rules.
Exploits the properties (RM, CF) of ER algorithms that enable efficient rule evolution.
Provides guidance to ER algorithm designers.

Some problems
How are the comparison rules generated?
How can ER algorithms that satisfy the RM and CF properties be designed?
How can the ER algorithms be implemented in the MapReduce framework?

Sincere thanks to everyone in the Web Group.

Thank you!