1
Querying Heterogeneous Information Sources Using Source Descriptions
Authors: Alon Y. Levy, Anand Rajaraman, Joann J. Ordille
Presenter: Yihong Ding
2
Challenges for Information Integration
- Interrelated data is spread over multiple information sources
- The number of sources is large
- Many sources hold only a limited amount of data
- The details of interacting with each source vary greatly
3
IM (Information Manifold) Architecture
[Figure: IM architecture diagram with steps numbered 1, 2, 3, including the bucket algorithm]
4
IM World View
Virtual relations:
- Product(Model)
- Automobile(Model, Year, Category)
- Motorcycle(Model, Year)
- Car(Model, Year, Category)
- NewCar(Model, Year, Category)
- UsedCar(Model, Year, Category)
- CarForSale(Model, Year, Category, Price, SellerContact)
Classes:
[Figure: class hierarchy over Product, Automobile, Car, Motorcycle, NewCar, UsedCar and CarForSale]
5
Source Descriptions
For each source:
- Content record
- Capability record
[Table: Web sources for the automobile application]
6
Content Records of Auto Sources
7
Capability Records of Auto Sources
[Table columns: desired input set, possible output set, capable selection set]
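A capability record can be sketched as a small Python data structure. This is a minimal sketch, assuming that every attribute in the input set must be bound before a source can be queried; the class `CapabilityRecord` and its methods are hypothetical, and the two example records are modeled on the V1 and V5 annotations in the containment slide (V1 takes a category and returns price, model and year; V5 takes model and year and returns a review).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityRecord:
    """Hypothetical capability record: what a source needs and what it can do."""
    inputs: frozenset      # attributes that must be bound to query the source
    outputs: frozenset     # attributes the source can return
    selections: frozenset  # attributes the source can filter on by itself

    def can_execute(self, bound):
        # Simplifying assumption: all input attributes must already be bound.
        return self.inputs <= set(bound)

    def split_selections(self, selection_attrs):
        # Selections the source can evaluate vs. those the mediator must apply.
        wanted = set(selection_attrs)
        return wanted & self.selections, wanted - self.selections


# Records modeled on the V1 and V5 annotations in the deck.
car_source = CapabilityRecord(
    inputs=frozenset({"Category"}),
    outputs=frozenset({"Price", "Model", "Year"}),
    selections=frozenset({"Year", "Category"}),
)
review_source = CapabilityRecord(
    inputs=frozenset({"Model", "Year"}),
    outputs=frozenset({"Review"}),
    selections=frozenset(),
)

print(review_source.can_execute({"Model"}))               # False
print(review_source.can_execute({"Model", "Year"}))       # True
print(car_source.split_selections({"Year", "Price"}))     # ({'Year'}, {'Price'})
```

Splitting selections this way lets a planner push the filters a source supports down to it and keep the rest for local evaluation.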
8
Query Reformulation
- Seek plans that are contained in the query instead of equivalent to it
  - Sources are incomplete
  - A contained plan still returns a useful subset of the answers
- Uses the plan generator to:
  - Prune irrelevant sources
  - Split the query into subgoals
  - Generate conjunctive query plans
  - Find an executable ordering of the subgoals
9
The Bucket Algorithm
Given: user query q, source descriptions {Vi}
1. Find relevant sources (fill the buckets). For each relation g in query q:
   - Find each Vj that contains relation g
   - Check that the constraints in Vj are compatible with q
2. Combine source relations {Vj}, one from each bucket, into a conjunctive query q' and check for containment (q' ⊆ q)
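The two steps above can be sketched in Python. This is a minimal sketch under several assumptions: subgoals are identified by relation name only, source constraints are limited to fixed constants and inclusive numeric ranges, and the containment test of step 2 is passed in as a placeholder callback (`is_contained`). The names `Source`, `compatible`, `fill_buckets`, `candidate_plans` and the example sources `SportsCars`, `ClassicCars`, `Reviews` are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Source:
    """Hypothetical source description, reduced to what bucket filling needs."""
    name: str
    relations: set                               # world-view relations it covers
    equals: dict = field(default_factory=dict)   # attribute -> fixed constant
    ranges: dict = field(default_factory=dict)   # attribute -> (lo, hi), inclusive


def compatible(src, q_equals, q_ranges):
    """True unless a source constraint contradicts a query constraint of the same kind."""
    for attr, value in src.equals.items():
        if attr in q_equals and q_equals[attr] != value:
            return False
    for attr, (lo, hi) in src.ranges.items():
        q_lo, q_hi = q_ranges.get(attr, (-math.inf, math.inf))
        if max(lo, q_lo) > min(hi, q_hi):   # the two intervals do not overlap
            return False
    return True


def fill_buckets(subgoals, sources, q_equals, q_ranges):
    """Step 1: one bucket per subgoal, holding the compatible sources for it."""
    return {
        g: [s.name for s in sources
            if g in s.relations and compatible(s, q_equals, q_ranges)]
        for g in subgoals
    }


def candidate_plans(buckets, subgoals, is_contained):
    """Step 2: pick one source per bucket; keep combinations the check accepts."""
    for combo in product(*(buckets[g] for g in subgoals)):
        if is_contained(combo):
            yield combo


subgoals = ["CarForSale", "Category", "Year", "Model", "Price", "ProductReview"]
car_rels = {"CarForSale", "Category", "Year", "Model", "Price"}
sources = [
    Source("SportsCars", car_rels, equals={"Category": "sportscar"}),
    Source("ClassicCars", car_rels, ranges={"Year": (1950, 1979)}),
    Source("Reviews", {"ProductReview"}),
]
buckets = fill_buckets(subgoals, sources,
                       q_equals={"Category": "sportscar"},
                       q_ranges={"Year": (1992, math.inf)})
# Placeholder containment check that accepts every combination.
plans = list(candidate_plans(buckets, subgoals, lambda combo: True))
print(plans)
```

Here `ClassicCars` never enters a bucket because its year range cannot overlap y ≥ 1992, which is exactly the pruning that step 1 is meant to achieve before any plan is built.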
10
The Bucket Algorithm: Example
q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
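To make the Datalog-style notation concrete, here is a minimal Python sketch that evaluates q directly over a tiny, entirely hypothetical instance of the world-view relations (the car IDs, models, prices and reviews are invented). Each comprehension clause corresponds to one subgoal, shared variables become equality tests, and the comparison is read as y ≥ 1992.

```python
# Hypothetical toy instance of the world-view relations (illustrative only).
db = {
    "CarForSale": {("c1",), ("c2",), ("c3",)},
    "Category": {("c1", "sportscar"), ("c2", "sportscar"), ("c3", "sedan")},
    "Year": {("c1", 1995), ("c2", 1988), ("c3", 1996)},
    "Model": {("c1", "ModelA"), ("c2", "ModelB"), ("c3", "ModelC")},
    "Price": {("c1", 9000), ("c2", 4000), ("c3", 7000)},
    "ProductReview": {("ModelA", 1995, "fast"), ("ModelB", 1988, "old"),
                      ("ModelC", 1996, "roomy")},
}


def answer_q(db, min_year=1992):
    """Evaluate q(m,p,r) by joining the subgoals on their shared variables."""
    return {
        (m, p, r)
        for (c,) in db["CarForSale"]                          # CarForSale(c)
        if (c, "sportscar") in db["Category"]                 # Category(c, sportscar)
        for (c_y, y) in db["Year"]
        if c_y == c and y >= min_year                         # Year(c, y), y >= 1992
        for (c_m, m) in db["Model"] if c_m == c               # Model(c, m)
        for (c_p, p) in db["Price"] if c_p == c               # Price(c, p)
        for (m_r, y_r, r) in db["ProductReview"]
        if m_r == m and y_r == y                              # ProductReview(m, y, r)
    }


print(answer_q(db))  # {('ModelA', 9000, 'fast')}
```

Car c2 is a sports car but fails the year constraint, and c3 is not a sports car, so only c1 contributes an answer.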
11
1. Filling the Buckets
q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
Buckets:
- CarForSale(c): V1(c1), V2(c2), V3(c3)
- Category(c,t): V1(c1,t1), V2(c2,t2), V3(c3,t3)
- Year(c,y): V1(c1,y1), V2(c2,y2), V3(c3,y3)
- Model(c,m): V1(c1,m1), V2(c2,m2), V3(c3,m3)
- Price(c,p): V1(c1,p1), V2(c2,p2), V3(c3,p3)
- ProductReview(m,y,r): V5(m5,y5,r5)
Constraints: y ≥ 1992, t = sportscar
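These buckets fix how many candidate conjunctive plans step 2 has to examine: one source per bucket gives 3 × 3 × 3 × 3 × 3 × 1 = 243 combinations. The short Python check below enumerates them with `itertools.product`; the bucket contents are copied from the slide, and no containment test is performed here.

```python
from itertools import product

# Buckets from the slide: subgoal -> sources that can supply it.
buckets = {
    "CarForSale(c)": ["V1", "V2", "V3"],
    "Category(c,t)": ["V1", "V2", "V3"],
    "Year(c,y)": ["V1", "V2", "V3"],
    "Model(c,m)": ["V1", "V2", "V3"],
    "Price(c,p)": ["V1", "V2", "V3"],
    "ProductReview(m,y,r)": ["V5"],
}

# Every way of picking one source per bucket is a candidate plan.
candidates = list(product(*buckets.values()))
print(len(candidates))  # 243

# Candidates that take all five car subgoals from a single source.
single_source = [c for c in candidates if len(set(c[:5])) == 1]
print(len(single_source))  # 3
```

The first candidate, V1 for every car subgoal plus V5, is the combination used in the containment slide.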
12
2. Checking Containment
User query:
q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
Result query:
q'(m,p,r) :- V1(c)({Category(c): sportscar}, {Price(c), Model(c), Year(c)}, {Year(c) ≥ 1992, Category(c) = sportscar}), V5(m,y,r)({m: Model(c), y: Year(c)}, {r}, {})
Expanded query (is q' ⊆ q?):
q'(m,p,r) :- CarForSale(c), UsedCar(c), Category(c,t), t = sportscar, Model(c,m), Year(c,y), Price(c,p), ProductReview(m,y,r), y ≥ 1992
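One standard way to test q' ⊆ q is to search for a containment mapping: a substitution of q's variables that maps q's head onto the head of q' and every subgoal of q onto some subgoal of q'. The Python sketch below is a minimal backtracking version under stated assumptions: variables are strings starting with "?", the equality t = sportscar has already been substituted into q', and the comparison y ≥ 1992 is treated as an ordinary atom named ">=". Matching comparisons syntactically makes this a sufficient but not a complete test for queries with comparisons. The function names are hypothetical.

```python
def is_var(term):
    # Convention for this sketch: variables start with "?", all else is constant.
    return isinstance(term, str) and term.startswith("?")


def extend(mapping, src_args, dst_args):
    """Extend a variable mapping so src_args maps onto dst_args, or return None."""
    if len(src_args) != len(dst_args):
        return None
    new = dict(mapping)
    for s, d in zip(src_args, dst_args):
        if is_var(s):
            if new.setdefault(s, d) != d:   # variable already mapped elsewhere
                return None
        elif s != d:                        # constants must match exactly
            return None
    return new


def containment_mapping(sup, sub):
    """Search for a mapping from query `sup` into query `sub`."""
    (sup_head, sup_body), (sub_head, sub_body) = sup, sub

    def search(i, mapping):
        if mapping is None:
            return None
        if i == len(sup_body):
            return mapping
        pred, args = sup_body[i]
        for pred2, args2 in sub_body:
            if pred2 == pred:
                found = search(i + 1, extend(mapping, args, args2))
                if found is not None:
                    return found
        return None

    return search(0, extend({}, sup_head, sub_head))


def contained_in(sub, sup):
    """sub is contained in sup if a containment mapping from sup into sub exists."""
    return containment_mapping(sup, sub) is not None


q = (("?m", "?p", "?r"), [
    ("CarForSale", ("?c",)), ("Category", ("?c", "sportscar")),
    ("Year", ("?c", "?y")), (">=", ("?y", 1992)), ("Model", ("?c", "?m")),
    ("Price", ("?c", "?p")), ("ProductReview", ("?m", "?y", "?r")),
])
# Expanded q', with t = sportscar already substituted into Category(c, t).
q_expanded = (("?m", "?p", "?r"), [
    ("CarForSale", ("?c",)), ("UsedCar", ("?c",)),
    ("Category", ("?c", "sportscar")), ("Model", ("?c", "?m")),
    ("Year", ("?c", "?y")), ("Price", ("?c", "?p")),
    ("ProductReview", ("?m", "?y", "?r")), (">=", ("?y", 1992)),
])

print(contained_in(q_expanded, q))  # True
print(contained_in(q, q_expanded))  # False
```

The reverse check fails because q has no UsedCar subgoal for UsedCar(c) to map onto, which matches the intuition that the plan returns only a subset of q's answers.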
13
Finding an Executable Ordering
Subgoals: CarForSale(c), Category(c,t), Year(c,y), Model(c,m), Price(c,p), ProductReview(m,y,r), with constraints y ≥ 1992 and t = sportscar
Source relations: V1(c), V1(c,t), V1(c,y), V1(c,m), V1(c,p), V5(m,y,r)
BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s)}
BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r)}
BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r), y ≥ 1992}
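Finding an ordering can be sketched as a greedy loop: repeatedly pick any source call whose required inputs are already bound, then add its outputs to the available bindings. Because executing a call only ever adds bindings, this greedy choice finds an ordering whenever one exists. The `SourceCall` type and the variable-level inputs and outputs below are assumptions for illustration, modeled on V1 (input: the category constant t) and V5 (inputs: model m and year y).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceCall:
    """Hypothetical source call: variables it needs bound and variables it binds."""
    name: str
    inputs: frozenset
    outputs: frozenset


def executable_order(calls, initially_bound=()):
    """Return call names in an executable order, or None if no order exists."""
    bound = set(initially_bound)
    remaining = list(calls)
    order = []
    while remaining:
        ready = next((c for c in remaining if c.inputs <= bound), None)
        if ready is None:
            return None            # every remaining call still lacks an input
        order.append(ready.name)
        bound |= ready.outputs     # its outputs become available bindings
        remaining.remove(ready)
    return order


v1 = SourceCall("V1", frozenset({"t"}), frozenset({"c", "m", "y", "p"}))
v5 = SourceCall("V5", frozenset({"m", "y"}), frozenset({"r"}))

# t is bound from the start by the constraint t = sportscar.
print(executable_order([v5, v1], initially_bound={"t"}))  # ['V1', 'V5']
print(executable_order([v5]))                               # None
```

Listing V5 first does not matter: it is skipped until V1 has bound m and y.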
14
Experimental Results
- Query 1: Find titles and years of movies featuring Tom Hanks
- Query 2: Find titles and reviews of movies featuring Tom Hanks
- Query 3: Find telephone number(s) for Alaska Airlines
15
Conclusions
- Source descriptions consist of a content record and a capability record
- The bucket algorithm performs query reformulation