Published by Helena Walton. Modified over 8 years ago.
1
Data Exchange with Data-Metadata Translations: MAD Algorithm
Paolo Papotti, Mauricio A. Hernández, Wang-Chiew Tan
2
Data Exchange
Data may be stored in different databases, and each database has its own schema. We are interested in representing the data of one schema in terms of another schema.
Figure: source schema S and target schema T.
3
Data Exchange Problem
The major data exchange problem is the translation between schemas, given:
– the source schema S and the target schema T, known a priori;
– the dependencies Σ specified between the two schemas.
A translation algorithm generates a good mapping in a low-level language (XQuery, XSLT). A good mapping preserves constraints, dependencies, and data.
4
The Data Exchange Problem
How do we restructure data from a source schema to a target schema, according to a given visual specification?
Figure: a mapping algorithm takes the high-level mapping (dependencies) between source schema S and target schema T and produces a low-level mapping (queries).
5
Data Exchange Model
A visual specification is a schema-level representation of table content and relational structure (root, constraints). Schema formats: XML, DTD, relational.
Nested Relational (NR) model:
Source: Rcd
  Sales: SetOf Rcd
    country
    region
    style
    shipdate
    units
    price
Source.Sales:
country  region  style  shipdate  units  price
USA      East    Tee    12-07     11     1200
USA      East    Elec.  12-07     12     3600
USA      West    Tee    01-08     10     1600
UK       West    Tee    02-08     12     2000
6
Nested Relational (NR) model
Includes atomic types α, set types SetOf[α], and record types Rcd[α1, …, αk], plus dynamic elements and placeholders in the case of metadata.
Constraints conformation.
expenseDB: Rcd
  companies: Set of Rcd
    company: Rcd
      cid
      name
      city
  grants: Set of Rcd
    grant: Rcd
      grantee
      pi
      amount
      proj
  projects: Set of Rcd
    project: Rcd
      name
      year
7
Mapping Problem - Example
To perform a translation, we must understand how the two schemas correspond to each other. The simplest form of correspondence is a value correspondence: a pair consisting of a source element and a target element.
Source: Rcd
  Sales: SetOf Rcd
    country
    region
    style
    shipdate
    units
    price
Target: Rcd
  CountrySales: SetOf Rcd
    country
    Sales: SetOf Rcd
      style
      shipdate
      units
      id
8
Mapping Example
Mapping Generation Algorithm:
– Input: source and target schemas, and correspondences.
– Output: a declarative schema mapping.
For example, for the Source and Target schemas of the mapping problem example:
for $s in Source.Sales
exists $t in Target.CountrySales, $c in $t.Sales
where $t.country = $s.country and $c.style = $s.style
  and $c.shipdate = $s.shipdate and $c.units = $s.units
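The declarative mapping above only states which target records must exist. A minimal Python sketch of one way to materialize a target instance that satisfies it follows. Grouping the nested Sales set by country and leaving the unmapped id as None are assumptions of this sketch, not something the slides prescribe; the function name exchange_country_sales is hypothetical.

```python
from collections import defaultdict

# Source.Sales rows, copied from the Source.Sales table in the slides.
source_sales = [
    {"country": "USA", "region": "East", "style": "Tee", "shipdate": "12-07", "units": 11, "price": 1200},
    {"country": "USA", "region": "East", "style": "Elec.", "shipdate": "12-07", "units": 12, "price": 3600},
    {"country": "USA", "region": "West", "style": "Tee", "shipdate": "01-08", "units": 10, "price": 1600},
    {"country": "UK", "region": "West", "style": "Tee", "shipdate": "02-08", "units": 12, "price": 2000},
]


def exchange_country_sales(sales):
    """Build Target.CountrySales so that every source row has a matching
    ($t, $c) pair, as required by the for/exists/where mapping."""
    grouped = defaultdict(list)
    for s in sales:
        # Assumption: one CountrySales record per distinct country.
        grouped[s["country"]].append({
            "style": s["style"],
            "shipdate": s["shipdate"],
            "units": s["units"],
            "id": None,  # Assumption: no correspondence reaches id, so it stays unset.
        })
    return [{"country": country, "Sales": rows} for country, rows in grouped.items()]


target = exchange_country_sales(source_sales)
```

The region and price attributes are dropped because no value correspondence reaches them.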
9
Nested Relational (NR) model
expenseDB: Rcd
  companies: Set of Rcd
    company: Rcd
      cid
      name
      city
  grants: Set of Rcd
    grant: Rcd
      grantee
      pi
      amount
      proj
  projects: Set of Rcd
    project: Rcd
      name
      year
statDB: Set of Rcd
  cityStat: Rcd
    orgs: Set of Rcd
      org: Rcd
        cid
        name
        fundings: Set of Rcd
          funding: Rcd
            pi
            aid
    financials: Set of Rcd
      financial: Rcd
        aid
        amount
        proj
        year
10
Linked Translations
Figure: the expenseDB and statDB schemas.
11
Semantic Translation Model
Before performing schema mapping, the correspondences should be interpreted semantically for the source and the target: primary paths, constraints, logical relations.
Semantic associations are represented in two ways:
– Attributes are organized into tables.
– Attributes in different tables are associated through foreign key dependencies.
Figure: the companies and grants parts of the expenseDB schema.
12
Primary Path Translation
Figure: the expenseDB and statDB schemas.
13
Constraints Translation
Each constraint is a Nested Referential Integrity (NRI) constraint of the form
for P1 exists P2 where B
where P1 and P2 are bodies of primary paths and B is an equality condition relating the two paths.
Figure: the expenseDB schema.
14
Constraints Translation
Figure: the statDB schema.
Target schema constraints translation:
15
Logical Relation
A logical relation is the result of chasing a primary path of a schema using its NRIs. The chase is a relational method that enumerates logical joins based on the dependencies of a schema.
Chasing the primary path S2 with the constraint r1 can be represented by:
S2: select * from g in expenseDB.grants
r1: for g in expenseDB.grants exists c in expenseDB.companies where c.company.cid = g.grant.grantee
S2': select * from g in expenseDB.grants, c in expenseDB.companies where c.company.cid = g.grant.grantee
16
Logical Relation
The chase ensures that all related attributes are linked according to the constraints.
r2: for g in expenseDB.grants exists p in expenseDB.projects where p.project.name = g.grant.proj
S2': select * from g in expenseDB.grants, c in expenseDB.companies where c.company.cid = g.grant.grantee
S2'': select * from g in expenseDB.grants, c in expenseDB.companies, p in expenseDB.projects where c.company.cid = g.grant.grantee and g.grant.proj = p.project.name
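The two chase steps can be sketched symbolically in Python. This is a simplified illustration, not the algorithm from the slides: it assumes each constraint already names its variables (g, c, p) consistently with the tableau, and it skips the variable renaming and homomorphism checks that a full chase needs. The names NRI and chase are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NRI:
    """Constraint 'for <var> in for_set exists ex_var in ex_set where cond'."""
    for_set: str
    ex_var: str
    ex_set: str
    cond: str


def chase(generators, conditions, constraints):
    """Extend a tableau (generators plus conditions) with every NRI whose
    'for' set is already generated, until no constraint adds anything new."""
    generators = dict(generators)  # variable -> set path
    conditions = list(conditions)
    changed = True
    while changed:
        changed = False
        for nri in constraints:
            applies = nri.for_set in generators.values()
            if applies and nri.cond not in conditions:
                generators[nri.ex_var] = nri.ex_set
                conditions.append(nri.cond)
                changed = True
    return generators, conditions


r1 = NRI("expenseDB.grants", "c", "expenseDB.companies", "c.company.cid = g.grant.grantee")
r2 = NRI("expenseDB.grants", "p", "expenseDB.projects", "p.project.name = g.grant.proj")

# Chasing the primary path over grants adds the companies and projects joins.
gens, conds = chase({"g": "expenseDB.grants"}, [], [r1, r2])
```

Each constraint is applied at most once because its condition is recorded after the first application, so the loop terminates for this input.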
17
Logical Relations in our Example
All logical relations for the source and target schemas:
S1: select * from c in expenseDB.companies
S2: select * from g in expenseDB.grants, c in expenseDB.companies, p in expenseDB.projects where c.company.cid = g.grant.grantee and p.project.name = g.grant.proj
S3: select * from p in expenseDB.projects
T1: select * from s in statDB
T2: select * from s in statDB, o in s.cityStat.orgs
T3: select * from s in statDB, o in s.cityStat.orgs, f in o.org.fundings, i in s.cityStat.financials where i.financial.aid = f.funding.aid
T4: select * from s in statDB, f in s.cityStat.financials
S2 is chased with r1 and r2. T3 is chased with r3.
18
Mapping Algorithm
Value correspondences between the source and target schemas can be interpreted as simple referential constraints. v1 uses the primary paths S1 from the source and T2 from the target.
v1: for c in expenseDB.companies exists s in statDB, o in s.cityStat.orgs where c.company.name = o.org.name
v2: for g in expenseDB.grants exists s in statDB, o in s.cityStat.orgs, f in o.org.fundings where g.grant.pi = f.funding.pi
Figure: the expenseDB and statDB schemas connected by the value correspondences v1, v2, and v3.
19
Clio Mapping
20
Basic Data Exchange Mapping
21
Data-Metadata Translations
Data exchange scenarios may involve metadata transformations. Mapping systems support data-to-data transformations with fixed schemas.
Goal: extend mapping systems to support data-metadata translations.
22
Metadata-to-Data
How can we transform the following source data into the corresponding target?
Source: Rcd
  Sales: SetOf Rcd
    month
    USA
    UK
    Italy
Target: Rcd
  Sales: SetOf Rcd
    month
    country
    units
Source.Sales:
month  USA  UK   Italy
Jan    120  223  89
Feb    83   168  56
Target.Sales:
month  country  units
Jan    USA      120
Jan    UK       223
Jan    Italy    89
Feb    USA      83
Feb    UK       168
Feb    Italy    56
Schema mapping m1 ("USA"):
m1: for $s in Source.Sales exists $t in Target.Sales
    where $t.month = $s.month and $t.country = "USA" and $t.units = $s.USA
23
Metadata-to-Data
Same source and target as on the previous slide. A second schema mapping, m2 ("UK"), is added to m1:
m2: for $s in Source.Sales exists $t in Target.Sales
    where $t.month = $s.month and $t.country = "UK" and $t.units = $s.UK
24
Metadata-to-Data
Same source and target as on the previous slides. A third schema mapping, m3 ("Italy"), completes the set m1, m2, m3, so one mapping is needed per country column:
m3: for $s in Source.Sales exists $t in Target.Sales
    where $t.month = $s.month and $t.country = "Italy" and $t.units = $s.Italy
25
Metadata-to-Data: Our solution
Source: Rcd
  Sales: SetOf Rcd
    month
    USA
    UK
    Italy
    <<countries>>
      label
      value
Target: Rcd
  Sales: SetOf Rcd
    month
    country
    units
The placeholder <<countries>> selects the elements to group (USA, UK, Italy). Its label copies the elements' labels and its value copies the elements' values. Source.Sales and Target.Sales hold the same instances as in the preceding example.
MetadatA-Data (MAD) mapping:
for $s in Source.Sales, $c in {"USA", "UK", "Italy"}
exists $t in Target.Sales
where $t.month = $s.month and $t.country = $c and $t.units = $s.($c)
– {"USA", "UK", "Italy"} is a set of labels (strings).
– $c is a label value.
– $s.($c) is a dynamic selection of the source element.
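The MAD mapping iterates over a set of labels and uses each label both as a data value and as a field selector. A minimal Python sketch of that idea, using the Source.Sales instance from the slides (the function name metadata_to_data is hypothetical):

```python
# Source.Sales instance from the slides.
source_sales = [
    {"month": "Jan", "USA": 120, "UK": 223, "Italy": 89},
    {"month": "Feb", "USA": 83, "UK": 168, "Italy": 56},
]

# The placeholder: the set of labels to iterate over.
countries = ["USA", "UK", "Italy"]


def metadata_to_data(rows, labels):
    """For each source row $s and label $c, emit a target row whose country
    is the label itself and whose units is the dynamic selection $s.($c)."""
    return [
        {"month": s["month"], "country": c, "units": s[c]}
        for s in rows
        for c in labels
    ]


target_sales = metadata_to_data(source_sales, countries)
```

A single comprehension replaces the three per-country mappings m1, m2, and m3, which is the point of the placeholder construct.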
26
Data-to-Metadata
Now we want to support the opposite operation: the target schema depends on the source data.
We define a target template: Nested Dynamic Output Schemas (ndos).
Source: Rcd
  StockTicker: SetOf Rcd
    time
    symbol
    price
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    symbols (dynamic element)
      label
      value
At run time, the dynamic element defines the target instance and the target schema.
27
Data-to-Metadata: Heterogeneous records
Consider the mapping from StockTicker to Stockquotes (with the dynamic element symbols) and this source instance:
StockTicker (time: 0900, symbol: MSFT, price: 27.20)
StockTicker (time: 0900, symbol: IBM, price: 120.00)
StockTicker (time: 0905, symbol: MSFT, price: 27.30)
There are two possible interpretations for the target ndos.
First alternative: heterogeneous target records.
Computed target schema:
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    symbols: Choice
      MSFT
      IBM
Computed target instance:
Stockquotes (time: 0900, MSFT: 27.20)
Stockquotes (time: 0900, IBM: 120.00)
Stockquotes (time: 0905, MSFT: 27.30)
28
Data-to-Metadata: Homogeneous records
Consider the mapping from StockTicker to Stockquotes (with the dynamic element symbols) and this source instance:
StockTicker (time: 0900, symbol: MSFT, price: 27.20)
StockTicker (time: 0900, symbol: IBM, price: 120.00)
StockTicker (time: 0905, symbol: MSFT, price: 27.30)
There are two possible interpretations for the target ndos.
Second alternative: homogeneous target records.
Computed target schema:
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    MSFT
    IBM
Computed target instance:
Stockquotes (time: 0900, MSFT: 27.20, IBM: null)
Stockquotes (time: 0900, MSFT: null, IBM: 120.00)
Stockquotes (time: 0905, MSFT: 27.30, IBM: null)
29
Data-to-Metadata: Homogeneous records
Heterogeneous records are the natural solution for semi-structured data models (XSD, DTD, JSON):
Stockquotes (time: 0900, MSFT: 27.20)
Stockquotes (time: 0900, IBM: 120.00)
Stockquotes (time: 0905, MSFT: 27.30)
Homogeneous records are the natural solution for the relational data model:
Stockquotes (time: 0900, MSFT: 27.20, IBM: null)
Stockquotes (time: 0900, MSFT: null, IBM: 120.00)
Stockquotes (time: 0905, MSFT: 27.30, IBM: null)
Homogeneity constraint: "For every pair of tuples t1 and t2, if a is a label in t1, then a is a label in t2."
for $t1 in Target.Stockquotes, $t2 in Target.Stockquotes, $a in dom($t1)
exists $a' in dom($t2)
where $a = $a'
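Both interpretations, and the homogeneity constraint that separates them, can be sketched in Python on the StockTicker instance from the slides. The function names are hypothetical, and Python's None stands in for null.

```python
# Source instance from the slides.
ticker = [
    {"time": "0900", "symbol": "MSFT", "price": 27.20},
    {"time": "0900", "symbol": "IBM", "price": 120.00},
    {"time": "0905", "symbol": "MSFT", "price": 27.30},
]


def heterogeneous(rows):
    """Each target record carries only the label taken from its own source row."""
    return [{"time": r["time"], r["symbol"]: r["price"]} for r in rows]


def homogeneous(rows):
    """Each target record carries every symbol label; missing values are None."""
    labels = []
    for r in rows:
        if r["symbol"] not in labels:
            labels.append(r["symbol"])  # keep first-seen order of the labels
    records = []
    for r in rows:
        record = {"time": r["time"]}
        for label in labels:
            record[label] = r["price"] if label == r["symbol"] else None
        records.append(record)
    return records


def is_homogeneous(records):
    """Homogeneity constraint: every label of one tuple is a label of every other."""
    return all(set(t1) <= set(t2) for t1 in records for t2 in records)


hetero = heterogeneous(ticker)
homo = homogeneous(ticker)
```

The check is quadratic in the number of records, mirroring the pairwise form of the constraint rather than aiming for efficiency.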
30
Data-to-Metadata Mapping
Source: Rcd
  Sales: SetOf Rcd
    country
    region
    style
    shipdate
    units
    price
Target: Rcd
  ByShipdateCountry: SetOf Choice
    dates (dynamic element)
      label1
      value1: Rcd
        countries (dynamic element)
          label2
          value2: SetOf Rcd
            style
            units
            price
Source.Sales:
country  region  style  shipdate  units  price
USA      East    Tee    12-07     11     1200
USA      East    Elec.  12-07     12     3600
USA      West    Tee    01-08     10     1600
UK       West    Tee    02-08     12     2000
Figure: the computed target instance, in which the (style, units, price) rows (Tee 11 1200; Elec. 12 3600; Tee 10 1600; Tee 12 2000) are grouped by shipdate and country.
31
MAD Mapping
A MetadatA-Data (MAD) mapping is generated in three steps.
1. Tableaux: the set of logical relations for the source and target schemas, extended with expressions for placeholders and dynamic elements. The tableau for a placeholder such as <<countries>> includes both its metadata label and its value.
Source: Rcd
  SalesByCountries: SetOf Rcd
    month
    USA
    UK
    Italy
    <<countries>>
      label
      value
Target: Rcd
  Sales: SetOf Rcd
    month
    country
    units
Source tableau: { $x1 ∈ Source.SalesByCountries, $x2 ∈ <<countries>>; $x3 := $x1.($x2) }
32
MAD Mapping
2. Skeletons: an n x m matrix of skeletons is constructed for the n source tableaux and the m target tableaux. Each entry (i, j) is a potential mapping.
3. Creating MAD mappings: at this stage, the value correspondences are matched against the tableaux in order to factor them into the appropriate skeletons.
Source.Sales.country → Target.CountrySales.country
The source path is matched against one or more source tableaux, and the target path against one or more target tableaux.
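A minimal Python sketch of the skeleton matrix, assuming tableaux are identified only by name (S1 to S3 and T1 to T4, as in the expenseDB/statDB example). The dictionary layout and the name build_skeletons are illustrative choices, not the data structure used by the authors.

```python
def build_skeletons(source_tableaux, target_tableaux):
    """Return an n x m matrix whose entry [i][j] pairs source tableau i with
    target tableau j; conditions are filled in later by the correspondences."""
    return [
        [{"source": s, "target": t, "conditions": []} for t in target_tableaux]
        for s in source_tableaux
    ]


skeletons = build_skeletons(["S1", "S2", "S3"], ["T1", "T2", "T3", "T4"])

# Factoring a value correspondence into a skeleton means appending its
# condition to the matching entry, for example (S1, T2) for v1.
skeletons[0][1]["conditions"].append("c.company.name = o.org.name")
```

Only skeletons that end up with at least one condition would be candidates for actual mappings in this sketch.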
33
MAD Mapping
A correspondence path p1 is said to match an absolute path p2 of a tableau if p2 is a prefix of p1. After a match has been found, the longest matching prefix of the correspondence path is replaced with the corresponding tableau variable, so only the remaining suffix is kept.
Source.Sales.style → Target.CountrySales.Sales.style
Source: Rcd
  Sales: SetOf Rcd
    country
    region
    style
    shipdate
    units
    price
Target: Rcd
  CountrySales: SetOf Rcd
    country
    Sales: SetOf Rcd
      style
      shipdate
      units
      id
Source tableau: { $x ∈ Source.Sales }
Target tableau: { $y0 ∈ Target.CountrySales, $y1 ∈ $y0.Sales }
Resulting condition: $x.style = $y1.style
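The matching step can be sketched in Python as a longest-prefix lookup, consistent with the example where Source.Sales.style becomes $x.style. The sketch assumes each tableau variable has been expanded to its absolute path (so $y1 stands for Target.CountrySales.Sales); the name rewrite_path is hypothetical.

```python
def rewrite_path(correspondence_path, tableau):
    """Replace the longest tableau path that is a prefix of the correspondence
    path with its variable. Returns None if no tableau path matches."""
    steps = correspondence_path.split(".")
    best_var, best_len = None, 0
    for var, absolute_path in tableau.items():
        prefix = absolute_path.split(".")
        if steps[:len(prefix)] == prefix and len(prefix) > best_len:
            best_var, best_len = var, len(prefix)
    if best_var is None:
        return None
    return ".".join([best_var] + steps[best_len:])


source_tableau = {"$x": "Source.Sales"}
target_tableau = {
    "$y0": "Target.CountrySales",
    "$y1": "Target.CountrySales.Sales",  # $y1 in $y0.Sales, expanded
}

lhs = rewrite_path("Source.Sales.style", source_tableau)
rhs = rewrite_path("Target.CountrySales.Sales.style", target_tableau)
condition = f"{lhs} = {rhs}"
```

Comparing whole path steps rather than raw strings avoids false matches between names that merely share leading characters.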
34
MAD Mapping Generation Example
Source: Rcd
  SalesByCountry: SetOf Rcd
    month
    USA
    UK
    Italy
    <<countries>>
      label
      value
Target: Rcd
  Sales: SetOf Rcd
    month
    country
    units
Value correspondences:
Source.SalesByCountry.<<countries>> → Target.Sales.country
Source.SalesByCountry.&<<countries>> → Target.Sales.units
Source tableau: { $x1 ∈ Source.SalesByCountry, $x2 ∈ <<countries>>; $x3 := $x1.($x2) }
Target tableau: { $y1 ∈ Target.Sales }
Generated mapping:
for $x1 in Source.SalesByCountry, $x2 in <<countries>>
exists $y1 in Target.Sales
where $y1.month = $x1.month and $y1.country = $x2 and $y1.units = $x1.($x2)
35
MAD Mapping Generation
1. Modify schemas with dynamic placeholders.
2. Compile mappings and match correspondences.
This is what we get from Clio [PVMHF 02]:
for $s in Source.Sales
exists $t in Target.ByShipdateCountry, $y in dates, $u in case $t of $y, $z in countries, $v in $u.($z)
where $y = $s.shipdate and $z = $s.country and $v.style = $s.style
  and $v.units = $s.units and $v.price = $s.price
Figure: the Source (Sales) and Target (ByShipdateCountry) schemas, with the dynamic elements dates and countries.
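A minimal Python sketch of what this mapping computes on the Source.Sales instance shown earlier. Representing the dynamic elements dates and countries as dictionary keys is an assumption of the sketch, and by_shipdate_country is a hypothetical name.

```python
# Source.Sales rows from the slides.
source_sales = [
    {"country": "USA", "region": "East", "style": "Tee", "shipdate": "12-07", "units": 11, "price": 1200},
    {"country": "USA", "region": "East", "style": "Elec.", "shipdate": "12-07", "units": 12, "price": 3600},
    {"country": "USA", "region": "West", "style": "Tee", "shipdate": "01-08", "units": 10, "price": 1600},
    {"country": "UK", "region": "West", "style": "Tee", "shipdate": "02-08", "units": 12, "price": 2000},
]


def by_shipdate_country(sales):
    """Turn shipdate and country values into nested labels ($y and $z in the
    mapping) and collect the (style, units, price) records $v under them."""
    target = {}
    for s in sales:
        countries = target.setdefault(s["shipdate"], {})  # label1 = shipdate
        rows = countries.setdefault(s["country"], [])     # label2 = country
        rows.append({"style": s["style"], "units": s["units"], "price": s["price"]})
    return target


by_date = by_shipdate_country(source_sales)
```

The shape of the result depends on the data: a new shipdate or country in the source adds a new label to the target, which is why the target schema is only a template.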
36
Formal MAD Algorithm
37
Formal MAD Algorithm
38
MAD vs Clio
Data exchange with data-metadata support (data-to-data is a special case):
– New construct to iterate over elements' labels: placeholder.
– The target schema can be incomplete: nested dynamic output schema (ndos).
– New constructs for the mapping language.
– New mapping and query generation algorithms, including a query to generate the target schema.
Figure: a GUI over source schema S and target schema T, a declarative (internal) representation, and executable code (XSLT, XQuery, Java).
39
Some Related Work
Lots of related work in the relational setting:
– FIRA/FISQL [Wyss, Robertson 2005] has an excellent survey.
– SchemaSQL [Lakshmanan, Sadri, Subramanian 1996] and FIRA/FISQL [Wyss, Robertson 2005]: extensions to SQL to handle metadata as data; only relational dynamic output schemas; language and semantics, but no transformations from a GUI.
– Work on checking whether the chase terminates or loops infinitely.
40
Thank you.