Data Exchange with Data-Metadata Translations: MAD Algorithm. Paolo Papotti, Mauricio A. Hernández, Wang-Chiew Tan
Data Exchange: “Scientia potentia est”. What is data exchange? The process of taking data built under a source schema and transforming it into data built under a target schema. Data exchange is the restructuring of data.
Data Exchange – why? 1. Today, when companies merge, they also merge their information sources.
Data Exchange – why? 2. When several institutions work on a joint venture, a combined database is needed.
Data Exchange – why? 3. Refreshing and updating a database schema.
A few problems with data exchange
1. The labels in the source schema and the target schema could be very different.
2. Data could be kept in a plethora of ways. For instance, a car price could be stored in shekels or in U.S. dollars.
3. Data could be lost in the exchange process if the source schema and the target schema don’t correspond well.
Data Exchange: In the past, data exchange was done manually, consuming considerable time and money. Many researchers are working on ways to improve data exchange.
Antique Car Dealership
Schema Clunkers-R-Us
Clunker table:
Location | List-price | Automobile | Seniority | Agent-name
Belfast, NR | 650000 | Morris 8 | 2 | Gerry Adams
Newry, NR | 500000 | Bentley Mark V | 1 | Martin McGuiness
Schema Buy-A-Wreck
Car AGENTS:
Id | Name | Carmodel | Commission
48 | Nigel Dodds | Vauxhall | 
 | Ian Paisley | FordT | 0.04
cars:
Car | Model | price | Agent-id
Vauxhall | 14 | 360,000 | 48
Ford | Model T | 430,000 | 66
Matching Examples (Schema Clunkers-R-Us, Schema Buy-A-Wreck): Name (Nigel Dodds, Ian Paisley) matches Agent-name (Nigel Dodds, Ian Paisley).
Matching Examples (Schema Clunkers-R-Us, Schema Buy-A-Wreck): Carmodel (Vauxhall14, FordT) matches Automobile (Vauxhall 14, Ford T).
Schema Buy-A-Wreck
cars:
Car | type | price | Agent-id
Vauxhall | 14 | 360,000 | 48
Ford | Model T | 430,000 | 66
Car AGENTS: Id | Commission
Schema Clunkers-R-Us: List-price | Car model (Vauxhall, Ford Model T)
Creating mappings:
1. Schema matching: find matches.
2. Create query expressions: for automated data translation or exchange.
How do we match? Schema Matching, Create Query Expressions
Data Exchange
1. There may be no way to transform an instance given all of our constraints.
2. There may be numerous ways to transform the instance (possibly infinitely many).
3. We must identify and justify the solution best suited to our needs.
Data Exchange - Summary
Source schema S, target schema T.
To conclude:
1. Data exchange means moving data from a source schema to a target schema.
2. It is a widely studied problem in computing.
3. Some data exchange scenarios deal with metadata.
What is Metadata? Metadata: data about data. Metadata can come as: video, audio, image, text.
Why do we need metadata? Metadata helps us understand data. Can anyone tell what these numbers mean? (Rows: Jan, Feb)
Why do we need metadata? After adding metadata: Umbrella Sales. Columns: Month, USA, UK, Italy. Rows: Jan, Feb.
Why do we need metadata? We all know this picture…
Why do we need metadata? What is this picture all about?
Why do we need metadata? Sir Edward Carson signing the Ulster Covenant
Why do we need metadata?
Wall Street, New York City, New York.
Data-Metadata Translations: Data exchange scenarios may involve metadata transformations. Transforming the data in the Stock Ticker table to metadata in the Stock Quotes table is vital in the stock exchange world.
Data-Metadata Translations Mapping systems support Data-to-Data transformations with fixed schemas (Clio). Goal: Extend mapping systems to support Data-Metadata Translations.
Data Exchange: Clio. One software tool developed for simple graphical data exchange is “Clio”. Clio establishes value correspondences between the source schema and the target schema. However, Clio did not provide answers for data exchange scenarios that involve metadata. The solution involving metadata is based on Clio.
Clio interface
Metadata-to-Data
How can we transform the following source data into the corresponding target?
Source: Rcd; Sales: SetOf Rcd (month, USA, UK, Italy)
Target: Rcd; Sales: SetOf Rcd (month, country, units)
Source.Sales:
month | USA | UK | Italy
Jan | 120 | 223 | 89
Feb | 83 | 168 | 56
Target.Sales:
month | country | units
Jan | USA | 120
Jan | UK | 223
Jan | Italy | 89
Feb | USA | 83
Feb | UK | 168
Feb | Italy | 56
Schema mappings, one per country label:
m1 (“USA”): for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “USA” and $t.units = $s.USA
m2 (“UK”): for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “UK” and $t.units = $s.UK
m3 (“Italy”): for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “Italy” and $t.units = $s.Italy
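The three mappings above can be read as three separate queries, one per country label. A minimal Python sketch under stated assumptions: rows are plain dictionaries, the function names `m1`, `m2`, `m3` simply mirror the mapping names, and the source values are read back from the Target.Sales table.

```python
# Source.Sales rows; the values are derived from the Target.Sales table on the slide.
source_sales = [
    {"month": "Jan", "USA": 120, "UK": 223, "Italy": 89},
    {"month": "Feb", "USA": 83, "UK": 168, "Italy": 56},
]

def m1(rows):
    # for $s in Source.Sales exists $t in Target.Sales with $t.country = "USA"
    return [{"month": s["month"], "country": "USA", "units": s["USA"]} for s in rows]

def m2(rows):
    # Same shape as m1, with the label "UK" hard-coded.
    return [{"month": s["month"], "country": "UK", "units": s["UK"]} for s in rows]

def m3(rows):
    # Same shape as m1, with the label "Italy" hard-coded.
    return [{"month": s["month"], "country": "Italy", "units": s["Italy"]} for s in rows]

# The target instance is the union of what the three mappings produce.
target_sales = m1(source_sales) + m2(source_sales) + m3(source_sales)
```

Each function hard-codes one label, so every additional country column in the source would need one more mapping.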
Metadata-to-Data: Our solution
Source: Rcd; Sales: SetOf Rcd (month, USA, UK, Italy), with placeholder <<countries>> (label, value)
Target: Rcd; Sales: SetOf Rcd (month, country, units)
MetadatA-Data (MAD) mapping:
for $s in Source.Sales, $c in {“USA”, “UK”, “Italy”}
exists $t in Target.Sales
where $t.month = $s.month and $t.country = $c and $t.units = $s.($c)
Placeholder <<countries>>: select the elements to group (USA, UK, Italy).
{“USA”, “UK”, “Italy”}: set of labels (strings).
$c: is a label value.
$s.($c): dynamic selection of the source element.
Copy elements’ labels (to country). Copy elements’ values (to units).
Source.Sales: rows Jan, Feb
Target.Sales:
month | country | units
Jan | USA | 120
Jan | UK | 223
Jan | Italy | 89
Feb | USA | 83
Feb | UK | 168
Feb | Italy | 56
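The MAD mapping replaces the per-country mappings with one rule that also iterates over labels. A minimal Python sketch, assuming rows are dictionaries and the placeholder's label set is passed in explicitly; the function name `mad_mapping` and the variable `countries` are illustrative, and the source values are read back from the Target.Sales table.

```python
# Source.Sales rows; the values are derived from the Target.Sales table on the slide.
source_sales = [
    {"month": "Jan", "USA": 120, "UK": 223, "Italy": 89},
    {"month": "Feb", "USA": 83, "UK": 168, "Italy": 56},
]

# The set of labels (strings) that the placeholder ranges over.
countries = ["USA", "UK", "Italy"]

def mad_mapping(rows, labels):
    target = []
    for s in rows:          # for $s in Source.Sales
        for c in labels:    # $c in {"USA", "UK", "Italy"}
            target.append({
                "month": s["month"],  # $t.month = $s.month
                "country": c,         # $t.country = $c  (label copied as data)
                "units": s[c],        # $t.units = $s.($c)  (dynamic element selection)
            })
    return target

target_sales = mad_mapping(source_sales, countries)
```

Adding a country then only changes the label set, not the mapping itself.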
Data-to-Metadata
Now we want to support the opposite operation. The target schema depends on the source data.
We define a target template: Nested Dynamic Output Schemas (ndos).
Run-time: the dynamic element defines the target instance and the target schema.
Source: Rcd; StockTicker: SetOf Rcd (time, symbol, price)
Target: Rcd; Stockquotes: SetOf Rcd (time, symbols), where symbols is the dynamic element (label, value)
Data-to-Metadata: Heterogeneous records
Consider this mapping and this source instance. There are two possible interpretations for the target ndos.
Source: Rcd; StockTicker: SetOf Rcd (time, symbol, price)
Target: Rcd; Stockquotes: SetOf Rcd (time, symbols), where symbols is the dynamic element (label, value)
Source Instance:
StockTicker (time: 0900, symbol: MSFT, price: 27.20)
StockTicker (time: 0900, symbol: IBM, price: )
StockTicker (time: 0905, symbol: MSFT, price: 27.30)
First alternative: Heterogeneous target records
Computed Target Schema:
Target: Rcd; Stockquotes: SetOf Rcd (time, symbols: Choice (MSFT, IBM))
Computed Target Instance:
Stockquotes (time: 0900, MSFT: 27.20)
Stockquotes (time: 0900, IBM: )
Stockquotes (time: 0905, MSFT: 27.30)
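The heterogeneous interpretation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' algorithm: records are dictionaries, the function names are hypothetical, and all price values are illustrative placeholders.

```python
# Source instance; every price here is an illustrative placeholder.
source_ticker = [
    {"time": "0900", "symbol": "MSFT", "price": 27.20},
    {"time": "0900", "symbol": "IBM", "price": 100.00},
    {"time": "0905", "symbol": "MSFT", "price": 27.30},
]

def to_heterogeneous(rows):
    # Each source tuple yields one target record whose only dynamic
    # label is that tuple's symbol value (data becomes metadata).
    return [{"time": r["time"], r["symbol"]: r["price"]} for r in rows]

def computed_choice_labels(records):
    # The computed target schema: the labels under the Choice,
    # collected from the records in first-seen order.
    labels = []
    for rec in records:
        for label in rec:
            if label != "time" and label not in labels:
                labels.append(label)
    return labels

stock_quotes = to_heterogeneous(source_ticker)
choice_labels = computed_choice_labels(stock_quotes)
```

Records produced this way do not share the same label set, which is what makes them heterogeneous.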
Data-to-Metadata: Homogeneous records
Consider this mapping and this source instance. There are two possible interpretations for the target ndos.
Source: Rcd; StockTicker: SetOf Rcd (time, symbol, price)
Target: Rcd; Stockquotes: SetOf Rcd (time, symbols), where symbols is the dynamic element (label, value)
Source Instance:
StockTicker (time: 0900, symbol: MSFT, price: 27.20)
StockTicker (time: 0900, symbol: IBM, price: )
StockTicker (time: 0905, symbol: MSFT, price: 27.30)
Second alternative: Homogeneous target records
Computed Target Schema:
Target: Rcd; Stockquotes: SetOf Rcd (time, MSFT, IBM)
Computed Target Instance:
Stockquotes (time: 0900, MSFT: 27.20, IBM: null)
Stockquotes (time: 0900, MSFT: null, IBM: )
Stockquotes (time: 0905, MSFT: 27.30, IBM: null)
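The homogeneous interpretation first computes the full set of labels and then pads every record with nulls. A minimal Python sketch under the same assumptions as before: dictionary records, a hypothetical function name `to_homogeneous`, and illustrative placeholder prices; `None` stands in for null.

```python
# Source instance; every price here is an illustrative placeholder.
source_ticker = [
    {"time": "0900", "symbol": "MSFT", "price": 27.20},
    {"time": "0900", "symbol": "IBM", "price": 100.00},
    {"time": "0905", "symbol": "MSFT", "price": 27.30},
]

def to_homogeneous(rows):
    # Pass 1: the computed target schema is the list of distinct
    # symbol values, in first-seen order.
    labels = []
    for r in rows:
        if r["symbol"] not in labels:
            labels.append(r["symbol"])
    # Pass 2: every record carries every label; labels that do not
    # match the source tuple's symbol are filled with None (null).
    records = []
    for r in rows:
        rec = {"time": r["time"]}
        for label in labels:
            rec[label] = r["price"] if r["symbol"] == label else None
        records.append(rec)
    return labels, records

schema_labels, stock_quotes = to_homogeneous(source_ticker)
```

The two passes reflect that the target schema is only known after the whole source instance has been seen.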
Data-to-Metadata: Homogeneous records
The homogeneous approach is a MAD improvement.
Homogeneity Constraint: “For every pair of tuples t1 and t2, if a is a label in t1, then a is a label in t2”
Homogeneous records:
Stockquotes (time: 0900, MSFT: 27.20, IBM: null)
Stockquotes (time: 0900, MSFT: null, IBM: )
Stockquotes (time: 0905, MSFT: 27.30, IBM: null)
Heterogeneous records:
Stockquotes (time: 0900, MSFT: )
Stockquotes (time: 0900, IBM: )
Stockquotes (time: 0905, MSFT: )
Natural solution for semi-structured data models (XSD, DTD, JSON)
Source: Rcd; StockTicker: SetOf Rcd (time, symbol, price)
Target: Rcd; Stockquotes: SetOf Rcd (time, symbols), where symbols is the dynamic element (label, value)
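The homogeneity constraint can be checked mechanically: if the condition holds for every ordered pair of tuples, all tuples must have exactly the same label set. A small Python sketch, assuming dictionary records; the function name `is_homogeneous` and the price values are illustrative.

```python
def is_homogeneous(records):
    # "For every pair of tuples t1 and t2, if a is a label in t1,
    # then a is a label in t2" holds for all ordered pairs exactly
    # when every record has the same set of labels.
    if not records:
        return True
    first_labels = set(records[0])
    return all(set(rec) == first_labels for rec in records)

# Illustrative instances; prices are placeholders.
heterogeneous = [
    {"time": "0900", "MSFT": 27.20},
    {"time": "0900", "IBM": 100.00},
]
homogeneous = [
    {"time": "0900", "MSFT": 27.20, "IBM": None},
    {"time": "0900", "MSFT": None, "IBM": 100.00},
]

heterogeneous_ok = is_homogeneous(heterogeneous)
homogeneous_ok = is_homogeneous(homogeneous)
```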
MAD Mapping
MetadatA-Data (MAD) mapping, three steps:
1. Preliminary mapping: how do we map the source schema to the target schema? The preliminary mapping for <<countries>> includes the metadata label and the value of <<countries>>.
Preliminary Mapping
Source: Rcd; SalesByCountries: SetOf Rcd (month, USA, UK, Italy), with placeholder <<countries>> (label, value)
Target: Rcd; Sales: SetOf Rcd (month, country, units)
{ $x1 ∈ Source.SalesByCountries, $x2 ∈ <<countries>>; $x3 := $x1.($x2) }
Source.SalesByCountries:
month | USA | UK | Italy
Jan | 120 | 223 | 89
Feb | 83 | 168 | 56
Target.Sales:
month | country | units
Jan | USA | 120
Jan | UK | 223
Jan | Italy | 89
Feb | USA | 83
Feb | UK | 168
Feb | Italy | 56
Label Value Transfer
MAD Mapping
2. Skeletons: an n x m matrix of skeletons is constructed from the n source preliminary mappings and the m target preliminary mappings; each entry (i, j) is a potential mapping.
3. Creating MAD mappings: at this stage, the value correspondences are matched against the preliminary mappings in order to factor them into the appropriate skeletons. Each correspondence is matched against one or more source mappings and against one or more target mappings.
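Steps 2 and 3 can be pictured with a small Python sketch. It is only an illustration of the idea of a skeleton matrix, under loud assumptions: a preliminary mapping is reduced to the set of schema elements it covers, a value correspondence is a (source element, target element) pair, and the names `build_skeletons`, `factor`, `S1`, `S2`, `T1` and the second source mapping over `Source.Regions` are all hypothetical.

```python
# Hypothetical preliminary mappings, reduced to the elements they cover.
source_prelims = [
    {"name": "S1", "covers": {"month", "label(countries)", "value(countries)"}},
    {"name": "S2", "covers": {"Source.Regions.name"}},  # invented, to get n = 2
]
target_prelims = [
    {"name": "T1", "covers": {"Target.Sales.month", "Target.Sales.country",
                              "Target.Sales.units"}},
]

# Hypothetical value correspondences: (source element, target element).
correspondences = [
    ("month", "Target.Sales.month"),
    ("label(countries)", "Target.Sales.country"),
    ("value(countries)", "Target.Sales.units"),
]

def build_skeletons(sources, targets):
    # n x m matrix: entry (i, j) pairs source mapping i with target mapping j.
    return [[{"source": s, "target": t, "correspondences": []} for t in targets]
            for s in sources]

def factor(skeletons, corrs):
    # A correspondence goes into every skeleton whose source mapping covers
    # its source element and whose target mapping covers its target element.
    for row in skeletons:
        for sk in row:
            for src, tgt in corrs:
                if src in sk["source"]["covers"] and tgt in sk["target"]["covers"]:
                    sk["correspondences"].append((src, tgt))
    return skeletons

skeletons = factor(build_skeletons(source_prelims, target_prelims), correspondences)
# Skeletons that received at least one correspondence are the candidate mappings.
active = [sk for row in skeletons for sk in row if sk["correspondences"]]
```

In this toy run only the (S1, T1) entry receives correspondences; the (S2, T1) skeleton stays empty and would yield no mapping.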
MAD Mapping Generation Example
Source: Rcd; SalesByCountry: SetOf Rcd (month, USA, UK, Italy), with placeholder <<countries>> (label, value)
Target: Rcd; Sales: SetOf Rcd (month, country, units)
Source.SalesByCountries.<<countries>>
Source.SalesByCountries.&<<countries>>
Target.Sales.units
Source: { $x1 ∈ Source.SalesByCountry, $x2 ∈ <<countries>>; $x3 := $x1.($x2) }
Target: { $y1 ∈ Target.Sales }
MAD vs Clio
Data exchange with data-metadata support: Data to Data is a special case.
New construct to iterate over elements’ labels: placeholder.
Target schema can be incomplete: nested dynamic output schema (ndos).
New mapping & query generation algorithms.
Source schema S, target schema T; GUI; declarative (internal) representation; executable code (XSLT, XQuery, Java).
Fin.