Uncertainty in Data Integration Ai Jing 2007-11-10.


1 Uncertainty in Data Integration Ai Jing 2007-11-10

2 Outline: Data Integration with Uncertainty; Overview of Workshop on Management of Uncertain Data; Uncertainty in Deep Web

3 Outline: Data Integration with Uncertainty; Overview of Workshop on Management of Uncertain Data; Uncertainty in Deep Web

4 Data Integration with Uncertainty: Motivation and overview; Definition of probabilistic mappings; Query answering w.r.t. p-mappings; Complexity of query answering; Contributions

5 Data Integration with Uncertainty: Motivation and overview; Definition of probabilistic mappings; Query answering w.r.t. p-mappings; Complexity of query answering; Contributions

6 Traditional Data Integration Systems Q: SELECT P.title AS title, P.year AS year, A.name AS author FROM Author A, Paper P, AuthoredBy AB WHERE A.aid = AB.aid AND P.pid = AB.pid (diagram: query Q and queries Q1 to Q5)

7 Uncertainty Can Occur at Three Levels in Data Integration Applications: I. Data level; II. Mapping level; III. Query level. Focus of the paper: probabilistic schema mappings

8 Example Probabilistic Mappings Target: T(name, email, mailing-addr, home-addr, office-addr). Source: S(pname, email-addr, current-addr, permanent-addr). Three candidate mappings between S and T (the correspondence arrows of the diagram are not preserved in this transcript): m1 with probability 0.5, m2 with probability 0.4, m3 with probability 0.1

9 Top-k Query Answering w.r.t. Probabilistic Mappings Query over the mediated schema, Q: SELECT mailing-addr FROM T. Reformulations over the source, with mapping probabilities 0.5, 0.4, 0.1: Q1: SELECT current-addr FROM S; Q2: SELECT permanent-addr FROM S; Q3: SELECT email-addr FROM S

10 Data Integration with Uncertainty: Motivation and overview; Definition of probabilistic mappings; Query answering w.r.t. p-mappings; Complexity of query answering; Contributions

11 Definition of probabilistic mappings Schema mapping: S=(pname, email-addr, home-addr, office-addr), T=(name, mailing-addr); one-to-one schema matching; have exact knowledge of the mapping (probability 1.0). Probabilistic mapping: S=(pname, email-addr, home-addr, office-addr), T=(name, mailing-addr); candidate mappings with probabilities 0.1, 0.5, 0.4
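
A minimal sketch of how such a probabilistic mapping could be represented and checked. It assumes a p-mapping is a list of distinct one-to-one mappings, each with a probability in [0, 1], with the probabilities summing to 1. The function name `is_valid_p_mapping` and the pairing of 0.1, 0.5, 0.4 with particular source attributes are assumptions made for illustration; the transcript does not preserve which arrow carried which probability.

```python
from math import isclose

def is_valid_p_mapping(p_mapping):
    """Check a list of (mapping, probability) pairs.

    Each mapping is a dict from source attribute to target attribute.
    Hypothetical helper, not taken from the slides.
    """
    seen = set()
    total = 0.0
    for mapping, prob in p_mapping:
        if not 0.0 <= prob <= 1.0:
            return False
        targets = list(mapping.values())
        if len(targets) != len(set(targets)):
            return False  # two source attributes share a target: not one-to-one
        key = frozenset(mapping.items())
        if key in seen:
            return False  # the same mapping is listed twice
        seen.add(key)
        total += prob
    return isclose(total, 1.0)

# Assumed pairing of probabilities with correspondences (illustrative only).
EXAMPLE = [
    ({"pname": "name", "email-addr": "mailing-addr"}, 0.1),
    ({"pname": "name", "home-addr": "mailing-addr"}, 0.5),
    ({"pname": "name", "office-addr": "mailing-addr"}, 0.4),
]

print(is_valid_p_mapping(EXAMPLE))
```

An ordinary schema mapping is then the special case of a single mapping with probability 1.0.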

12 By-Table Semantics (figure: target instance DT under one mapping m, with probability 0.5)

13 By-Tuple Semantics (figure: target instance DT, with Pr( ) = 0.05, …)

14 Data Integration with Uncertainty: Motivation and overview; Definition of probabilistic mappings; Query answering w.r.t. p-mappings; Complexity of query answering; Contributions

15 By-Table Query Answering
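
The slide body is not preserved, so the following is only a sketch of by-table query answering for the earlier query SELECT mailing-addr FROM T. It assumes that under by-table semantics one mapping applies to the whole source table, so the query is reformulated once per mapping and an answer's probability is the sum of the probabilities of the mappings that produce it. The source rows, the names `SOURCE_ROWS`, `P_MAPPING`, `by_table_answers` and `top_k`, and the reduction of each mapping to the source attribute it maps to mailing-addr are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical instance of S(pname, email-addr, current-addr, permanent-addr).
SOURCE_ROWS = [
    {"pname": "Alice", "email-addr": "alice@",
     "current-addr": "Mountain View", "permanent-addr": "Sunnyvale"},
    {"pname": "Bob", "email-addr": "bob@",
     "current-addr": "Sunnyvale", "permanent-addr": "Sunnyvale"},
]

# Source attribute mapped to T.mailing-addr under m1, m2, m3, with probability.
P_MAPPING = [("current-addr", 0.5), ("permanent-addr", 0.4), ("email-addr", 0.1)]

def by_table_answers(rows, p_mapping):
    """Probability of each answer when one mapping applies to the whole table."""
    probs = defaultdict(float)
    for source_attr, prob in p_mapping:
        # Reformulated query: SELECT source_attr FROM S (set semantics).
        for value in {row[source_attr] for row in rows}:
            probs[value] += prob
    return dict(probs)

def top_k(answers, k):
    """Return the k most probable answers, ties broken alphabetically."""
    return sorted(answers.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

for value, prob in top_k(by_table_answers(SOURCE_ROWS, P_MAPPING), 2):
    print(value, round(prob, 2))
```

With this data, 'Sunnyvale' is returned by both Q1 and Q2, so it ranks first with probability 0.5 + 0.4.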

16 By-Tuple Query Answering
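
The slide body is not preserved here either. As a sketch, assume that under by-tuple semantics each source tuple may follow a different mapping, that a mapping sequence assigns one mapping to every tuple, and that the probability of a sequence is the product of the probabilities of its mappings (consistent with 0.5 × 0.1 = 0.05 on the by-tuple semantics slide). The brute-force enumeration below uses hypothetical data and names (`SOURCE_ROWS`, `P_MAPPING`, `by_tuple_answers`) and is meant to illustrate the semantics, not to be efficient.

```python
from collections import defaultdict
from itertools import product

# Hypothetical instance of S(pname, email-addr, current-addr, permanent-addr).
SOURCE_ROWS = [
    {"pname": "Alice", "email-addr": "alice@",
     "current-addr": "Mountain View", "permanent-addr": "Sunnyvale"},
    {"pname": "Bob", "email-addr": "bob@",
     "current-addr": "Sunnyvale", "permanent-addr": "Sunnyvale"},
]

# Source attribute mapped to T.mailing-addr under m1, m2, m3, with probability.
P_MAPPING = [("current-addr", 0.5), ("permanent-addr", 0.4), ("email-addr", 0.1)]

def by_tuple_answers(rows, p_mapping):
    """Answer SELECT mailing-addr FROM T by enumerating all mapping sequences."""
    probs = defaultdict(float)
    # One mapping per row: len(p_mapping) ** len(rows) sequences in total.
    for sequence in product(p_mapping, repeat=len(rows)):
        seq_prob = 1.0
        answers = set()
        for row, (source_attr, prob) in zip(rows, sequence):
            seq_prob *= prob
            answers.add(row[source_attr])
        # Every answer produced under this sequence gains its probability.
        for value in answers:
            probs[value] += seq_prob
    return dict(probs)

for value, prob in sorted(by_tuple_answers(SOURCE_ROWS, P_MAPPING).items()):
    print(value, round(prob, 2))
```

On this data 'Sunnyvale' gets probability 0.94, higher than the 0.9 obtained when a single mapping must apply to the whole table, which shows that the two semantics can rank answers differently.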

17 Data Integration with Uncertainty: Motivation and overview; Definition of probabilistic mappings; Query answering w.r.t. p-mappings; Complexity of query answering; Contributions

18 Complexity of query answering

19 More on By-Tuple Query Answering The high complexity comes from computing probabilities: the number of mapping sequences is exponential in the size of the input data (n tuples and m mappings give m^n mapping sequences). There are two subsets of queries that can be answered in PTIME by query rewriting, for example: SELECT mailing-addr FROM T; and SELECT mailing-addr FROM T,V WHERE T.mailing-addr = V.hightech. In general, query answering cannot be done by query rewriting. One of Dt
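
To make the contrast concrete, here is a sketch for the simple projection query SELECT mailing-addr FROM T. It assumes that mappings are chosen independently per tuple, so the probability that a value appears in the answer is 1 minus the product, over source tuples, of the probability that the tuple does not produce it. This closed form is an illustration of why such a query needs no enumeration of the m^n sequences; it is not necessarily the rewriting algorithm of the paper. The data and the names `row_value_probs`, `by_tuple_projection` and `count_mapping_sequences` are hypothetical.

```python
from math import prod

# Hypothetical instance of S(pname, email-addr, current-addr, permanent-addr).
SOURCE_ROWS = [
    {"pname": "Alice", "email-addr": "alice@",
     "current-addr": "Mountain View", "permanent-addr": "Sunnyvale"},
    {"pname": "Bob", "email-addr": "bob@",
     "current-addr": "Sunnyvale", "permanent-addr": "Sunnyvale"},
]

# Source attribute mapped to T.mailing-addr under m1, m2, m3, with probability.
P_MAPPING = [("current-addr", 0.5), ("permanent-addr", 0.4), ("email-addr", 0.1)]

def count_mapping_sequences(n_tuples, m_mappings):
    """Number of mapping sequences: m^n."""
    return m_mappings ** n_tuples

def row_value_probs(row, p_mapping):
    """Probability that this single row yields each mailing-addr value."""
    probs = {}
    for source_attr, p in p_mapping:
        value = row[source_attr]
        probs[value] = probs.get(value, 0.0) + p
    return probs

def by_tuple_projection(rows, p_mapping):
    """Answer probabilities in time linear in rows x mappings x values."""
    per_row = [row_value_probs(row, p_mapping) for row in rows]
    values = set().union(*per_row)
    return {
        v: 1.0 - prod(1.0 - probs.get(v, 0.0) for probs in per_row)
        for v in values
    }

print(count_mapping_sequences(20, 3))  # sequences a naive method would visit
print(by_tuple_projection(SOURCE_ROWS, P_MAPPING))
```

For 'Sunnyvale' this gives 1 - (1 - 0.4)(1 - 0.9) = 0.94 without visiting any mapping sequence.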

20 Extensions to More Expressive Mappings The complexity results for query answering carry over to three extensions to more expressive mappings: complex mappings, GLAV mappings, and conditional mappings

21 Data Integration with Uncertainty: Motivation and overview; Definition of probabilistic mappings; Query answering w.r.t. p-mappings; Complexity of query answering; Contributions

22 Contributions: Definition of probabilistic mappings; Semantics: by-table vs. by-tuple; Complexity of query answering

23 Outline: Data Integration with Uncertainty; Overview of Workshop on Management of Uncertain Data; Uncertainty in Deep Web

24 Overview of MUD 2007 (Theory and Application): A New Language and Architecture to Obtain Fuzzy Global Dependencies; About the Processing of Division Queries Addressed to Possibilistic Databases; Making Aggregation Work in Uncertain and Probabilistic Databases; Materialized Views in Probabilistic Databases; Flexible Matching of Ear Biometrics; Consistent Joins Under Primary Key Constraints

25 A New Language and Architecture to Obtain Fuzzy Global Dependencies SQL does not satisfy the minimum requirements to be a true data mining (DM) language. A new language: dmFSQL (data mining Fuzzy Structured Query Language). Keywords: fuzzy database, data mining

26 About the Processing of Division Queries Addressed to Possibilistic Databases The authors devised a data model that is a strong representation system for operations in possibilistic databases. A possibilistic database D can be interpreted as a weighted disjunctive set of regular databases. Topic: division queries

27 Making Aggregation Work in Uncertain and Probabilistic Databases Trio is a prototype database management system for storing and querying data with uncertainty and lineage. Topics: Trio's query language, TriQL; the Trio data model and query semantics; aggregation functions in the Trio system for uncertain and probabilistic data

28 Materialized Views in Probabilistic Databases Materialized views for probabilistic databases may not define a unique probability distribution. Topics: view representation; answering queries on large probabilistic data sets more efficiently with materialized views

29 Flexible Matching of Ear Biometrics Research area: image recognition (or identification). Scenario: identifying found bodies in a large-scale disaster. Challenge: fast and cheap identification when no DNA databases or fingerprint databases are at hand

30 Consistent Joins Under Primary Key Constraints Setting: an inconsistent database that violates primary key constraints. Question: will the natural join of the repaired relations always be nonempty, no matter which tuples are selected? Approach: game theory, winning strategy

31 Outline: Data Integration with Uncertainty; Overview of Workshop on Management of Uncertain Data; Uncertainty in Deep Web

32 No perfect data: noise, dirty data, redundancy, … No perfect solution: Web data extraction, interface integration, …

33 Uncertainty in Deep Web Data Integration (1): robust, evaluable

34 Uncertainty in Deep Web Data Integration (2): tuning, feedback, evaluable

35 Uncertainty in Jobtong (1): data level

36 Uncertainty in Jobtong (2): query level. How can we give every result a probability to show its importance?

37 Uncertainty in Jobtong (3): the automatic maintenance of configuration files. Configuration A: /html/body//table/tr[@class='nob']; 2; title: td[2]/a/span; company: td[3]/a/span. Configuration B: /html/body//table/tr[@class='list2' or @class='list3']; 2; title: td[2]/a; company: td[3]/a
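
A rough sketch of how an extraction configuration of this kind might be applied, assuming the first path selects result rows and the remaining paths are evaluated relative to each row. The slides do not show Jobtong's actual code or file format, so the `CONFIG` dictionary, the `extract` function and the sample page are all hypothetical. The row path is written in relative form (`.//table/...`) because Python's `xml.etree.ElementTree` supports only a subset of XPath and does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page standing in for a job listing.
PAGE = """<html><body><div><table>
<tr class="head"><td>No.</td><td>Title</td><td>Company</td></tr>
<tr class="nob"><td>1</td><td><a><span>Engineer</span></a></td>
  <td><a><span>Acme</span></a></td></tr>
<tr class="nob"><td>2</td><td><a><span>Analyst</span></a></td>
  <td><a><span>Globex</span></a></td></tr>
</table></div></body></html>"""

# Assumed in-memory form of the first configuration on the slide.
CONFIG = {
    "rows": ".//table/tr[@class='nob']",
    "fields": {"title": "td[2]/a/span", "company": "td[3]/a/span"},
}

def extract(page, config):
    """Return one record per matched row; a missing field becomes None."""
    root = ET.fromstring(page)
    records = []
    for row in root.findall(config["rows"]):
        record = {}
        for name, path in config["fields"].items():
            node = row.find(path)
            record[name] = node.text if node is not None else None
        records.append(record)
    return records

print(extract(PAGE, CONFIG))
```

If the site changes its markup, the same function returns no rows or `None` fields, which is one signal an automatic maintenance process could use to decide that a configuration file needs updating.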

38 Q&A Thank you!

