Entity Resolution with Evolving Rules


1 Entity Resolution with Evolving Rules
Youzhong Ma Lab of WAMDM

2 Outline
Motivations
ER Related concepts
ER properties
Conclusions

3 Entity Resolution background

4 Entity Resolution background

5 Naïve ER Approach Vs. New Approach

6 Outline
Motivations
ER Related concepts
ER properties
Conclusions

7 ER Related concepts
Suppose market A merges with market B.
They have to combine their customer databases.
The same person may appear in both markets' customer DBs, but with some attributes that differ.
How should this be handled?

8 ER Rule
Boolean functions: determine whether two records represent the same entity (true or false).
Distance functions: measure how different (or similar) two records are.
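The two kinds of rule can be sketched in a few lines of Python. This is only an illustration: the record fields, the sample values, and the names `p_name` and `name_distance` are assumptions, and the distance is computed with the standard library's `difflib.SequenceMatcher` rather than any measure named in the slides.

```python
from difflib import SequenceMatcher

def p_name(a, b):
    """Boolean rule: True if the two records have the same (normalized) name."""
    return a["name"].strip().lower() == b["name"].strip().lower()

def name_distance(a, b):
    """Distance rule: 0.0 for identical names, approaching 1.0 as they diverge."""
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return 1.0 - ratio

# Hypothetical customer records, not taken from the slides.
r1 = {"name": "John Doe", "zip": "54321"}
r2 = {"name": "john doe", "zip": "11111"}
r3 = {"name": "Jon Doe", "zip": "54321"}

print(p_name(r1, r2), p_name(r1, r3))
print(name_distance(r1, r2), name_distance(r1, r3))
```

A distance function can be turned into a Boolean rule by comparing it against a threshold, which is one reason the two forms are often discussed together.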

9 ER Example

10 ER procedure
Original record set: S = {r1, r2, r3, r4}
ER input: Pi = {{r1}, {r2}, {r3}, {r4}}
B1: Pname → E1 = {{r1, r2, r3}, {r4}} (6 comparisons)
B2: Pname ∧ Pzip → E2 = {{r1, r2}, {r3}, {r4}}
Naïve approach: 6 comparisons. Evolving-rule approach: 3 comparisons.
The evolving-rule approach only works if the ER algorithm satisfies certain properties and B2 is stricter than B1.
One contribution of this paper is therefore to explore under what conditions, and for which ER algorithms, incremental approaches are feasible.
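The comparison counts on this slide can be reproduced with a small sketch. It assumes a simple match-based ER algorithm (transitive closure over all pairwise comparisons), which the slides do not specify. The record values and the names `resolve` and `evolve` are hypothetical, chosen so that the partitions match E1 and E2 above.

```python
from itertools import combinations

# Hypothetical attribute values that reproduce the slide's partitions.
RECORDS = {
    "r1": {"name": "John", "zip": "54321"},
    "r2": {"name": "John", "zip": "54321"},
    "r3": {"name": "John", "zip": "11111"},
    "r4": {"name": "Bob", "zip": "99999"},
}

def p_name(a, b):
    return a["name"] == b["name"]

def p_zip(a, b):
    return a["zip"] == b["zip"]

def resolve(ids, match):
    """Cluster ids by the transitive closure of `match`.

    Returns (sorted clusters, number of pairwise comparisons)."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    comps = 0
    for a, b in combinations(ids, 2):
        comps += 1
        if match(RECORDS[a], RECORDS[b]):
            parent[find(a)] = find(b)
    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values()), comps

def evolve(previous_clusters, match):
    """Re-resolve each previous cluster separately under a stricter rule."""
    clusters, comps = [], 0
    for cluster in previous_clusters:
        sub, n = resolve(cluster, match)
        clusters.extend(sub)
        comps += n
    return sorted(clusters), comps

def b2(a, b):
    return p_name(a, b) and p_zip(a, b)

all_ids = sorted(RECORDS)
e1, comps_b1 = resolve(all_ids, p_name)  # B1 from scratch
naive = resolve(all_ids, b2)             # B2 from scratch
evolved = evolve(e1, b2)                 # B2 starting from E1
print(e1, comps_b1)
print(naive)
print(evolved)
```

Because B2 is stricter than B1, records in different E1 clusters cannot match under B2, so only pairs inside each E1 cluster need to be compared.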

11 Materialization!
Original record set: S = {r1, r2, r3, r4}
ER input: Pi = {{r1}, {r2}, {r3}, {r4}}
B1: Pname ∧ Pzip → E1 = {{r1, r2}, {r3}, {r4}} (6 comparisons)
Materialized: Pname → Ename = {{r1, r2, r3}, {r4}}
Materialized: Pzip → Ezip = {{r1, r2}, {r3}, {r4}}
B2: Pname ∧ Pphone → E2 = {{r1}, {r2, r3}, {r4}} (3 comparisons)
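A minimal sketch of the materialization idea follows, under the same assumptions as before: a transitive-closure ER algorithm, hypothetical record values, and hypothetical helper names (`resolve`, `evolve`, `materialized`). The per-predicate results are stored when B1 is evaluated, and the new rule B2 starts from the stored Ename instead of from the individual records.

```python
from itertools import combinations

# Hypothetical values chosen to reproduce the partitions on the slide.
RECORDS = {
    "r1": {"name": "John", "zip": "54321", "phone": "555-0001"},
    "r2": {"name": "John", "zip": "54321", "phone": "555-0002"},
    "r3": {"name": "John", "zip": "11111", "phone": "555-0002"},
    "r4": {"name": "Bob", "zip": "99999", "phone": "555-0003"},
}

def p_name(a, b):
    return a["name"] == b["name"]

def p_zip(a, b):
    return a["zip"] == b["zip"]

def p_phone(a, b):
    return a["phone"] == b["phone"]

def resolve(ids, match):
    """Transitive closure of `match`; returns (clusters, comparisons)."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    comps = 0
    for a, b in combinations(ids, 2):
        comps += 1
        if match(RECORDS[a], RECORDS[b]):
            parent[find(a)] = find(b)
    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values()), comps

def evolve(previous_clusters, match):
    """Resolve each previously computed cluster on its own."""
    clusters, comps = [], 0
    for cluster in previous_clusters:
        sub, n = resolve(cluster, match)
        clusters.extend(sub)
        comps += n
    return sorted(clusters), comps

all_ids = sorted(RECORDS)

# Materialize one partition per conjunct of B1 = Pname AND Pzip.
materialized = {
    "name": resolve(all_ids, p_name)[0],
    "zip": resolve(all_ids, p_zip)[0],
}

# B2 = Pname AND Pphone shares the Pname conjunct, so start from Ename.
def b2(a, b):
    return p_name(a, b) and p_phone(a, b)

e2, comps = evolve(materialized["name"], b2)
print(materialized)
print(e2, comps)
```

The design choice here is to trade storage for comparisons: keeping Ename around costs space, but it lets any later rule that contains Pname skip all cross-cluster pairs.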

12 Outline
Motivations
ER Related concepts
ER properties
Conclusions

13 Two important properties of ER algorithms that enable efficient rule evolution for match-based clustering:
Rule Monotonicity (RM)
Context Free (CF)

14 Pname ∧ Pzip ≤ Pname

15 Rule Monotonicity (RM)
B1: Pname ∧ Pzip → E1 = {{r1, r2}, {r3}, {r4}}
B2: Pname → E2 = {{r1, r2, r3}, {r4}}
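In this example the stricter rule B1 produces a partition E1 whose clusters each fit inside a cluster of E2. Reading rule monotonicity as "a stricter rule yields a partition that refines the result of the looser rule" (an interpretation of the slide, not a quoted definition), the relationship can be checked with a small sketch. The function name `refines` is hypothetical.

```python
def refines(finer, coarser):
    """True if every cluster of `finer` is contained in some cluster of `coarser`."""
    coarse_sets = [set(c) for c in coarser]
    return all(any(set(c) <= big for big in coarse_sets) for c in finer)

# Partitions copied from the slide.
E1 = [["r1", "r2"], ["r3"], ["r4"]]   # result of B1 = Pname AND Pzip
E2 = [["r1", "r2", "r3"], ["r4"]]     # result of B2 = Pname

print(refines(E1, E2))
print(refines(E2, E1))
```

A check like this only confirms the relationship for one input. It is evidence about an algorithm's behavior, not a proof that the algorithm is rule monotonic.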

16 Context Free (CF)

17 Existing properties in literature
General Incremental vs. Context Free
Order Independent vs. Rule Monotonicity
An ER algorithm is order independent if the ER result is the same regardless of the order in which the records are processed.
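Order independence can be probed empirically by running an ER algorithm on every ordering of a small input and comparing the results. The sketch below assumes a transitive-closure algorithm and hypothetical record values. The names `resolve` and `is_order_independent_on` are illustrative only.

```python
from itertools import combinations, permutations

# Hypothetical records, not taken from the slides.
RECORDS = {
    "r1": {"name": "John", "zip": "54321"},
    "r2": {"name": "John", "zip": "54321"},
    "r3": {"name": "John", "zip": "11111"},
    "r4": {"name": "Bob", "zip": "99999"},
}

def p_name(a, b):
    return a["name"] == b["name"]

def resolve(ids, match):
    """Transitive closure of `match`, returned as an order-insensitive partition."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for a, b in combinations(ids, 2):
        if match(RECORDS[a], RECORDS[b]):
            parent[find(a)] = find(b)
    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return frozenset(frozenset(c) for c in clusters.values())

def is_order_independent_on(ids, match):
    """True if every processing order of `ids` gives the same partition."""
    results = {resolve(list(order), match) for order in permutations(ids)}
    return len(results) == 1

print(is_order_independent_on(sorted(RECORDS), p_name))
```

Passing this check on one dataset does not establish order independence in general. It is mainly useful for finding counterexamples in algorithms that merge records greedily.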

18

19

20

21 Experiments

22 Outline
Motivations
ER Related concepts
ER properties
Conclusions

23 Conclusions
Proposes a new ER approach with evolving rules.
Exploits the properties (RM, CF) of ER algorithms that enable efficient rule evolution.
Provides guidance to ER algorithm designers.

24 Some problems
How are the comparison rules generated?
How can ER algorithms that satisfy the RM and CF properties be designed?
How can the ER algorithms be implemented in the MapReduce framework?

25 Sincere thanks to everyone in the Web Group

26 Thank You !

