Mapping Data to Queries
Martin Hentschel, Systems Group, ETH Zurich

“…, but the real advantage of XML is precisely that it allows you to go from Point A to destinations unknown.” -- Larry O’Brien, Microsoft

Goals
Integrate data from various data feeds
- Light-weight: mapping rules
- Easy to use: based on a common language (XQuery)
- Fast: implements research ideas (YFilter)

Targets
- Health care: electronic health records (Health Level 7)
- Finance: exchange of financial data (XBRL)
- Web services: news feeds, weather
- Every domain that uses several data sources

Example: find the most powerful car

Input data (elements daten, auto, name, ps):
    Ford, 130
    VW Golf, 150

Mapping rules:
    daten is-a db;
    auto is-a car;
    ps is-a hp;

Apply standard XQuery:
    let $max := max(//hp)
    for $car in //car
    where $car/hp = $max
    return $car

Result:
    VW Golf, 150

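The example can be traced outside an XQuery engine. The following is a minimal Python sketch, not the MDQ implementation: it assumes the feed markup looks like the reconstructed `FEED` string below, and the helper names (`names_for`, `descendants`, `children`, `most_powerful`) are hypothetical. Each query name is resolved through the is-a rules, and the matching elements are returned under their original names.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the example feed (the slide markup is not preserved).
FEED = (
    "<daten>"
    "<auto><name>Ford</name><ps>130</ps></auto>"
    "<auto><name>VW Golf</name><ps>150</ps></auto>"
    "</daten>"
)

# Mapping rules from the slide, as source name -> name used in the query.
RULES = {"daten": "db", "auto": "car", "ps": "hp"}


def names_for(query_name, rules):
    """Names that answer to query_name: the name itself plus every rule source mapped to it."""
    return {query_name} | {src for src, dst in rules.items() if dst == query_name}


def descendants(root, query_name, rules):
    """Rough counterpart of //query_name under the rules."""
    wanted = names_for(query_name, rules)
    return [elem for elem in root.iter() if elem.tag in wanted]


def children(elem, query_name, rules):
    """Rough counterpart of the child step elem/query_name under the rules."""
    wanted = names_for(query_name, rules)
    return [child for child in elem if child.tag in wanted]


def most_powerful(root, rules):
    """Mirrors the XQuery: let $max := max(//hp) for $car in //car where $car/hp = $max."""
    max_hp = max(float(e.text) for e in descendants(root, "hp", rules))
    return [
        car
        for car in descendants(root, "car", rules)
        if any(float(hp.text) == max_hp for hp in children(car, "hp", rules))
    ]


root = ET.fromstring(FEED)
result = most_powerful(root, RULES)
print([(car.tag, car.findtext("name")) for car in result])
```

The selected element keeps its source name `auto`; only the lookup goes through the rules.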
Usage Scenarios: continuous query processing
[Figure: DSMS holding queries and rules; streaming input events in, streaming output events out]

Usage Scenarios: publish/subscribe systems
[Figure: publishers, enhanced broker holding rules, subscribers; arrows labeled data and subscriptions]

Usage Scenarios: data integration
[Figure: Source 1, Source 2, Source x; data handlers holding rules; homogeneous data in the company's data store]

The Is-A Rule: map XML elements
    car is-a vehicle;
- Expresses a substitutability relationship, as in object-oriented design
- Use the car wherever vehicles are expected
- It follows that //vehicle also returns car elements
  - Returned as car
  - Not transformed into vehicle
  - Consistent with the OO approach

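The substitutability reading can be illustrated with a small Python sketch. The document `<fleet>…</fleet>` and the function `select_descendants` are made up for illustration, and the sketch assumes the rule applies in one direction only (a car is a vehicle, but a vehicle is not a car).

```python
import xml.etree.ElementTree as ET

# car is-a vehicle;  stored as (substitute, expected) pairs.
IS_A = [("car", "vehicle")]


def select_descendants(root, name, is_a):
    """Rough counterpart of //name: accept the name itself and anything that is-a name."""
    accepted = {name} | {sub for sub, expected in is_a if expected == name}
    return [elem for elem in root.iter() if elem.tag in accepted]


# Hypothetical input document.
doc = ET.fromstring("<fleet><vehicle>tram</vehicle><car>Golf</car></fleet>")

vehicles = select_descendants(doc, "vehicle", IS_A)
cars = select_descendants(doc, "car", IS_A)

print([e.tag for e in vehicles])  # both elements, each under its own name
print([e.tag for e in cars])      # only the car element
```

The car element comes back as `car`, not rewritten into `vehicle`, which matches the "returned as car" point above.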
The Is-A Rule: map path expressions
- XPath path expressions
    german/car is-a auto;
    auto is-a german/car;
- Left-hand side may include predicates
    … < 100] is-a slow/vehicle;

The Is-A Rule: specify contexts
    car in … is-a auto;
- Element names could be used differently in different contexts
- Scope the applicability of rules
- Further refinement

The Is-A Rule: element construction
    auto as $a is-a … {$a/ps * 0.74} … ;
- Map elements
- Transform data, e.g. for …
- Integration of very diverse data
Example data: Ford, 100; VW Golf, 150

Implementation
Several possibilities:
- MDQ approach
  - Native approach, novel MDQ data model
  - Allows lazy execution
- Query rewrite
  - E.g. //(car | auto | vehicle | ...)
  - Does not scale
- Data translation
  - Translate input data
  - Big overhead

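To see why query rewrite grows with the rule set, here is a rough Python sketch of the rewrite alternative (not the MDQ approach). It handles only simple paths made of name tests separated by `/` and `//`; the functions `rewrite_name` and `rewrite_path` are hypothetical.

```python
import re

# Rules as (source name, query name) pairs, taken from the running example.
RULES = [("auto", "car"), ("ps", "hp")]


def rewrite_name(name, rules):
    """Replace one name test by the union of all names that are-a name."""
    alternatives = [name] + sorted({src for src, dst in rules if dst == name and src != name})
    if len(alternatives) == 1:
        return name
    return "(" + " | ".join(alternatives) + ")"


def rewrite_path(path, rules):
    """Rewrite every name test of a simple path; separators are kept unchanged."""
    tokens = re.split(r"(//|/)", path)  # the capture group keeps the separators
    return "".join(
        token if token in ("", "/", "//") else rewrite_name(token, rules)
        for token in tokens
    )


rewritten = rewrite_path("//car/hp", RULES)
print(rewritten)  # //(car | auto)/(hp | ps)
```

Every additional rule that targets a name used in the query adds one more branch to a union, and the rewrite has to be repeated for every step of every query.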
MDQ Data Model: classical XML tree model
[Figure: XML tree with nodes daten, auto, name ("Golf"), ps ("150")]

MDQ Data Model: move names from nodes to edges
[Figure: the same tree with daten, auto, name, ps as edge labels; values "Golf" and "150" at the leaves]

MDQ Data Model: application of mapping rules
    daten is-a db;
    auto is-a car;
    ps is-a hp;
[Figure: the edges daten, auto, ps additionally carry the names db, car, hp]

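One way to picture the data model is a tree whose edges carry a set of names. The sketch below is an assumption about the shape of the model, not the actual MDQ data structure; the classes `Node` and `Edge` and the functions `apply_rules` and `follow` are hypothetical. It applies the rules eagerly, which is the simplest variant to read.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    value: Optional[str] = None                      # text value for leaves
    edges: List["Edge"] = field(default_factory=list)


@dataclass
class Edge:
    names: Set[str]                                  # an edge may answer to several names
    target: Node


def build_example():
    """daten / auto / (name = "Golf", ps = "150"), with the names stored on the edges."""
    auto = Node(edges=[Edge({"name"}, Node("Golf")), Edge({"ps"}, Node("150"))])
    daten = Node(edges=[Edge({"auto"}, auto)])
    return Node(edges=[Edge({"daten"}, daten)])      # document root


def apply_rules(node, rules):
    """Add the mapped name to every edge whose names contain a rule's left-hand side."""
    for edge in node.edges:
        edge.names |= {dst for src, dst in rules if src in edge.names}
        apply_rules(edge.target, rules)


def follow(node, name):
    """Child step: all targets reachable over an edge that carries the given name."""
    return [edge.target for edge in node.edges if name in edge.names]


RULES = [("daten", "db"), ("auto", "car"), ("ps", "hp")]
root = build_example()
apply_rules(root, RULES)

hp = follow(follow(follow(root, "db")[0], "car")[0], "hp")[0].value
ps = follow(follow(follow(root, "daten")[0], "auto")[0], "ps")[0].value
print(hp, ps)
```

Both the mapped path db/car/hp and the original path daten/auto/ps reach the same leaf, because a rule only adds a name to an edge and never replaces one.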
Lazy Evaluation, YFilter
YFilter:
- Built from the left-hand sides of the rules
- Non-deterministic finite state machine
Main idea:
- Evaluate the XQuery program
- Iterate through the data model
- Report to YFilter
- Apply rules only when reaching an accepting state
Rules:
    R1: daten is-a db;
    R2: auto is-a car;
    R3: ps is-a hp;
[Figure: state machine with transitions *, daten, auto, ps and accepting states R1, R2, R3]

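The lazy scheme can be sketched in Python with a much simplified automaton in the spirit of YFilter; it is not YFilter itself and not the MDQ code. The class `RuleNFA`, the function `children_named`, the wrapper element `document`, and the `log` list are assumptions made for illustration. The automaton is built from the rules' left-hand sides, the query reports each element it inspects, and a rule is applied only when an accepting state is reached.

```python
import xml.etree.ElementTree as ET


class RuleNFA:
    """State 0 is the start state; its '*' self-loop lets a rule match at any depth."""

    def __init__(self, rules):
        self.transitions = {}   # (state, element name) -> next state
        self.accepting = {}     # state -> [(rule id, mapped name)]
        self.state_count = 1
        for rule_id, lhs, rhs in rules:
            state = 0
            for step in lhs.split("/"):          # left-hand sides sharing a prefix share states
                key = (state, step)
                if key not in self.transitions:
                    self.transitions[key] = self.state_count
                    self.state_count += 1
                state = self.transitions[key]
            self.accepting.setdefault(state, []).append((rule_id, rhs))

    def step(self, states, name):
        following = {0}                          # the '*' self-loop keeps the start state active
        for state in states:
            target = self.transitions.get((state, name))
            if target is not None:
                following.add(target)
        return following

    def mapped_names(self, states):
        return {rhs for s in states for _rule_id, rhs in self.accepting.get(s, [])}


def children_named(nfa, elem, states, name, log):
    """Resolve one child step lazily; only the children of elem are reported to the automaton."""
    for child in elem:
        child_states = nfa.step(states, child.tag)
        extra = nfa.mapped_names(child_states)   # non-empty only in accepting states
        if extra:
            log.append((child.tag, sorted(extra)))
        if child.tag == name or name in extra:
            yield child, child_states


RULES = [("R1", "daten", "db"), ("R2", "auto", "car"), ("R3", "ps", "hp")]
nfa = RuleNFA(RULES)

document = ET.Element("document")                # stand-in for the document node
document.append(ET.fromstring("<daten><auto><name>Golf</name><ps>150</ps></auto></daten>"))

log = []
dbs = list(children_named(nfa, document, {0}, "db", log))
log_after_db = list(log)
cars = [pair for db, st in dbs for pair in children_named(nfa, db, st, "car", log)]
print(log_after_db, log)
```

After the step to db, only R1 has been applied; R2 is applied once the query moves on to car, and R3 is never applied because the query never inspects the ps element.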
Experiment: Throughput, complex query (multiple scans, joins)
- QR (query rewrite): too many unions
- DT (data translation): overhead of translation

Experiment: Throughput, simple query
- QR: fewer unions
- DT: still overhead of translation

Experiment: Throughput, one input message, bundle of queries evaluated at once
- QR: even more unions
- DT: less overhead, transforms the input message only once

Again: Advantages
- Performance: novel data model, lazy execution
- Light-weight: mapping rules are small units
- Extensibility: add more rules as new sources are adopted
- Flexibility: complex mappings through element constructors

The End
Visit our website, LIVE DEMO!
Write us, please!