Published by Darcy Moore. Modified over 9 years ago.
1
Supporting Join Queries
Talk by: Andy Cooke
Collaborators: Alasdair Gray, Lisha Ma, and Werner Nutt
Heriot-Watt University
2
What queries would users like to ask? (1)
- A continuously executing query that might involve matching tuples across several streams:
  “stream to me average net traffic passing between two ComputingElements (CEs)”
  - The query needs to specify the age of tuples that can be matched (a “sliding window”),
    e.g. “consider only tuples no older than 5 min. from now”.
  Possibly interesting?
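The sliding-window matching described on this slide can be sketched as follows. This is only an illustration, not R-GMA code: the tuple layout `(timestamp, key, value)`, the function name `window_join`, and the sample data are all assumptions made for the example.

```python
def window_join(left, right, now, max_age_s):
    """Join two streams of (timestamp, key, value) tuples on key.

    Only tuples no older than max_age_s seconds before `now` are matched
    (the "sliding window"). All names here are hypothetical.
    """
    fresh_left = [t for t in left if 0 <= now - t[0] <= max_age_s]
    fresh_right = [t for t in right if 0 <= now - t[0] <= max_age_s]

    # Index the right-hand stream by key so each left tuple finds its matches.
    by_key = {}
    for _ts, key, value in fresh_right:
        by_key.setdefault(key, []).append(value)

    return [(key, lv, rv)
            for _ts, key, lv in fresh_left
            for rv in by_key.get(key, [])]


# Hypothetical traffic samples: (timestamp in seconds, CE pair, bytes)
sent = [(0, "ce1->ce2", 100), (400, "ce1->ce2", 300)]
received = [(390, "ce1->ce2", 280), (50, "ce1->ce2", 90)]

# "consider only tuples no older than 5 min. from now"
matches = window_join(sent, received, now=600, max_age_s=300)
```

With `now=600` and a 300-second window, only the tuples stamped 400 and 390 are young enough to be matched; the older ones are ignored.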
3
What queries would users like to ask? (2)
- A “latest snapshot” query that joins the latest values of keys:
  “return all CEs that Steve is allowed to use” (Resource Broker)
  This query would involve joining tuples from CE tables, VO tables and denied users tables.
  Probably interesting!
- A “history” query involving self-joins and aggregation:
  “what was the growth in net traffic since last week?”
  Possibly interesting?
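A “latest snapshot” join might look roughly like the sketch below: first reduce each stream to the newest value per key, then join the reduced tables. The table contents, column meanings, and function names are assumptions for illustration; the slides do not give the actual schemas.

```python
def latest_by_key(tuples):
    """Keep the newest value per key from (timestamp, key, value) tuples."""
    latest = {}
    for ts, key, value in tuples:
        if key not in latest or ts >= latest[key][0]:
            latest[key] = (ts, value)
    return {key: value for key, (_ts, value) in latest.items()}


def allowed_ces(ce_vo_stream, user_vo_stream, denied_stream, user):
    """Join the latest CE, VO and denied-user values (hypothetical schemas)."""
    ce_vo = latest_by_key(ce_vo_stream)      # CE -> VO it currently serves
    user_vo = latest_by_key(user_vo_stream)  # user -> VO the user belongs to
    denied = latest_by_key(denied_stream)    # (CE, user) -> currently denied?
    vo = user_vo.get(user)
    if vo is None:
        return []
    return sorted(ce for ce, ce_vo_name in ce_vo.items()
                  if ce_vo_name == vo and not denied.get((ce, user), False))


# Hypothetical data: ce2 moved from "atlas" to "cms" at time 4.
ce_vo_stream = [(1, "ce1", "atlas"), (2, "ce2", "atlas"),
                (3, "ce3", "cms"), (4, "ce2", "cms")]
user_vo_stream = [(1, "steve", "cms")]
denied_stream = [(1, ("ce3", "steve"), True), (5, ("ce3", "steve"), False),
                 (2, ("ce2", "steve"), True)]

steve_ces = allowed_ces(ce_vo_stream, user_vo_stream, denied_stream, "steve")
```

Here only the latest state matters: the earlier ban on `ce3` has been lifted, while the ban on `ce2` is still the most recent value for that key.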
4
How can R-GMA answer such queries?
Observation:
- If all the relevant tuples are inside one DBMS, then we can pass the query on to that DBMS's query engine. EASY!
- If there is more than one relevant producer, then our mediator probably needs an execution engine! HARD!
In any case, we know that some R-GMA users are defining Archivers and querying these directly. However:
- the local answer may only be a subset of the global answer;
- they may get a wrong answer (if the query involves max, avg, count, etc.).
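A small worked example of why an aggregate over a local subset can be wrong. The values and the two-site split are made up for illustration.

```python
# Hypothetical cpuLoad values held by two different sites.
site_a = [10, 20, 30]
site_b = [100]

# Querying only the local archive at site A sees a subset of the tuples.
local_avg = sum(site_a) / len(site_a)

# The global answer needs every relevant tuple.
all_values = site_a + site_b
global_avg = sum(all_values) / len(all_values)

# Averaging the per-site averages is also wrong in general, because the
# sites hold different numbers of tuples.
avg_of_avgs = (sum(site_a) / len(site_a) + sum(site_b) / len(site_b)) / 2

# Combining (sum, count) pairs from each site gives the correct result.
partials = [(sum(s), len(s)) for s in (site_a, site_b)]
combined_avg = sum(p[0] for p in partials) / sum(p[1] for p in partials)
```

The local average is 20, the average of averages is 60, and the true global average is 40; only the (sum, count) combination reproduces it.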
5
Answering Joins using Archivers
tables: cpuLoad, discspace; condition: country = ‘britain’
Requirements:
- Complete views (“I publish everything!”)
- “Latest” or “History” query type (so the data is in a database, not a buffer).
- A smart registry (“hmm.. just need to go to 1 Archiver”).
- Tuple matching always needs to take place in the same database, and never across databases.
e.g. “SELECT * FROM cpuload c, discspace s WHERE c.site = s.site” can easily be answered using site archivers.
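One way a “smart registry” could decide that a single Archiver is enough is sketched below. The data model (each Archiver declares a set of tables plus equality conditions) and the function name `find_single_archiver` are assumptions for illustration, not the R-GMA registry interface.

```python
def find_single_archiver(archivers, query_tables, query_condition):
    """Return the name of one Archiver that can answer the whole query, or None.

    archivers: name -> (set of archived tables, dict of column -> value
    restricting which tuples it archives). Hypothetical model.
    """
    for name, (tables, condition) in archivers.items():
        covers_tables = set(query_tables) <= set(tables)
        # The Archiver holds every tuple the query needs only if each
        # restriction it applies is also required by the query.
        covers_rows = all(query_condition.get(col) == val
                          for col, val in condition.items())
        if covers_tables and covers_rows:
            return name
    return None


archivers = {
    "uk_archiver": ({"cpuLoad", "discspace"}, {"country": "britain"}),
    "cpu_only": ({"cpuLoad"}, {}),
}

chosen = find_single_archiver(
    archivers, ["cpuLoad", "discspace"], {"country": "britain"})
unrestricted = find_single_archiver(archivers, ["cpuLoad", "discspace"], {})
```

The second call finds nothing: `uk_archiver` only holds British tuples, and `cpu_only` lacks `discspace`, so the join would have to be mediated across sources.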
6
Problems with Answering Joins using Archivers (1)
- Archivers can’t access the tuples introduced by LatestProducers and DatabaseProducers.
- If a new LatestProducer registers, the Archiver can’t stream from it.
- The mediator then needs to mediate between two producers, but doesn’t have a query engine!
7
Answering Joins using Archivers (2)
Problems:
- Archivers can’t access the tuples introduced by LatestProducers and DatabaseProducers.
  - If a new LatestProducer is registered, the Archiver cannot access these tuples because LatestProducers can’t answer stream queries.
  - Consider an Archiver at some site that pours the tuples from several StreamProducers into a LatestProducer.
  - Therefore the Mediator can’t rely on the Archiver’s query engine to return a complete answer, and so must mediate (hard!).
- What if one Archiver isn’t enough?
  - Consumer.canAnswer()? Consumer.getPlan()? (“you need an Archiver with these declarations”) (“I can’t answer your query, but could answer this sub-query”)
- What if the Archiver disappears before the consumer calls start()?
- Would a “Latest Archiver” be up-to-date enough?
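The slide only floats Consumer.canAnswer() and Consumer.getPlan() as questions, so the following is a guess at what such a call might return, not a description of an existing API. The function `get_plan`, its return shape, and the archiver model (name mapped to a set of archived tables) are all hypothetical.

```python
def get_plan(archivers, query_tables):
    """Sketch of a getPlan()-style answer (hypothetical return shape)."""
    needed = set(query_tables)

    # Case 1: a single Archiver covers every table in the query.
    for name, tables in archivers.items():
        if needed <= tables:
            return {"can_answer": True, "archiver": name,
                    "needed_declaration": [], "sub_queries": {}}

    # Case 2: no single Archiver is enough. Suggest the declaration an
    # Archiver would need, and list which sub-queries could be answered now.
    sub_queries = {name: sorted(needed & tables)
                   for name, tables in archivers.items()
                   if needed & tables}
    return {"can_answer": False, "archiver": None,
            "needed_declaration": sorted(needed),
            "sub_queries": sub_queries}


split_archivers = {"a1": {"cpuLoad"}, "a2": {"discspace"}}
plan = get_plan(split_archivers, ["cpuLoad", "discspace"])
```

The `needed_declaration` field corresponds to “you need an Archiver with these declarations”, and `sub_queries` to “I can’t answer your query, but could answer this sub-query”.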
9
Answering Joins without Archivers
Query Planning and Execution:
- What are the relevant Producers?
- What sub-queries should we send them?
- How should results be combined and operated on? (need a query engine!) Where?
Possible Query Engines:
- MySQL: dump all the data into MySQL … easy!
- Polar Star (Manchester)? … compatibility?
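The “dump all the data into a database and let it do the join” option can be sketched as below. To keep the example self-contained it uses Python's built-in `sqlite3` as a stand-in for MySQL; the column names and sample rows are assumptions, and the query is the cpuload/discspace join quoted earlier in the talk.

```python
import sqlite3


def join_via_local_db(cpuload_rows, discspace_rows):
    """Load tuples from two producers into one database, then join there."""
    conn = sqlite3.connect(":memory:")  # stand-in for a MySQL instance
    conn.execute("CREATE TABLE cpuload (site TEXT, cpu_load REAL)")
    conn.execute("CREATE TABLE discspace (site TEXT, free_gb REAL)")
    conn.executemany("INSERT INTO cpuload VALUES (?, ?)", cpuload_rows)
    conn.executemany("INSERT INTO discspace VALUES (?, ?)", discspace_rows)
    rows = conn.execute(
        "SELECT c.site, c.cpu_load, s.free_gb "
        "FROM cpuload c, discspace s "
        "WHERE c.site = s.site ORDER BY c.site"
    ).fetchall()
    conn.close()
    return rows


# Hypothetical tuples fetched from two different producers.
from_producer_a = [("hw", 0.5), ("ral", 0.9)]
from_producer_b = [("hw", 120.0), ("cern", 300.0)]

joined = join_via_local_db(from_producer_a, from_producer_b)
```

The planning work (finding the producers and fetching their tuples) still has to happen first; the database only takes over the matching step, which is why this route is simple but potentially inefficient.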
10
Conclusions
We could support some “global” join queries quite easily:
- when just one Archiver is enough (needs a smarter Registry);
- suggestions could be given when there isn’t one Archiver available (consumer.getPlan());
- and/or ad hoc joins could be answered (inefficiently) by first loading data into MySQL.
But:
- what queries do users want to pose?
- shouldn’t we restrict users to using only StreamProducers?