
1 1-1 CMPE 259 Sensor Networks Katia Obraczka Winter 2005 Storage and Querying II

2 1-2 Announcements
- HW3 is up.
- Exams.
- Sign-up for project presentations.
- Schedule.
- Course evaluation: Mon, 03.14.
  - Need a volunteer.

3 1-3 Today
- Storage.
- Querying.

4 1-4 Data-Centric Storage

5 1-5 DCS
- Data dissemination for sensor networks.
- Naming-based storage.

6 1-6 Background
- Sensornet
  - A distributed sensing network composed of a large number of small sensing devices, each equipped with a processor, memory, and a radio.
  - Generates a large volume of data.
- Data dissemination algorithm should be:
  - Scalable.
  - Self-organizing.
  - Energy efficient.

7 1-7 Some definitions
- Observation
  - Low-level output from sensors.
  - E.g., detailed temperature and pressure readings.
- Event
  - Constellations of low-level observations.
  - E.g., elephant sighting, fire, intruder.
- Query
  - Used to elicit event information from the sensornet.
  - E.g., Is there an intruder? Where is the fire?

8 1-8 Data dissemination schemes
- External Storage (ES)
- Local Storage (LS)
- Data-Centric Storage (DCS)

9 1-9 External Storage (ES)

10 1-10 Local Storage (LS)

11 1-11 Local Storage (LS)

12 1-12 Data-Centric Storage (DCS)
- Events are named with keys.
- DCS stores (key, value) pairs.
- DCS supports two operations:
  - Put(k, v) stores v (the observed data) according to the key k (the name of the data).
  - Get(k) retrieves whatever value is stored associated with the key k.
- Hash function (see the sketch below)
  - Hashes a key k into geographic coordinates.
  - Put() and Get() operations on the same key k hash k to the same location.
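A minimal sketch of the Put/Get idea, assuming a SHA-1-based geographic hash, a flat list of nodes, and direct nearest-node selection standing in for GPSR routing; the field dimensions and all names here are illustrative, not the GHT implementation:

```python
import hashlib
from dataclasses import dataclass, field

FIELD_X, FIELD_Y = 100.0, 100.0   # assumed sensor-field dimensions

def geo_hash(key):
    """Hash an event name to (x, y) coordinates inside the field."""
    d = hashlib.sha1(key.encode()).digest()
    x = int.from_bytes(d[:4], "big") / 2**32 * FIELD_X
    y = int.from_bytes(d[4:8], "big") / 2**32 * FIELD_Y
    return (x, y)

@dataclass
class Node:
    pos: tuple
    store: dict = field(default_factory=dict)

def home_node(nodes, point):
    # GPSR would route greedily toward `point`; here we simply pick the
    # node geographically closest to it.
    return min(nodes, key=lambda n: (n.pos[0] - point[0]) ** 2 + (n.pos[1] - point[1]) ** 2)

def put(nodes, key, value):
    home_node(nodes, geo_hash(key)).store.setdefault(key, []).append(value)

def get(nodes, key):
    return home_node(nodes, geo_hash(key)).store.get(key, [])

# Put and Get on the same key hash to the same location, so a query can find
# the data without flooding the network.
nodes = [Node((x, y)) for x in range(0, 100, 20) for y in range(0, 100, 20)]
put(nodes, "elephant", {"time": 17, "reading": 42})
print(get(nodes, "elephant"))
```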

13 1-13 DCS – Example (figure: Put("elephant", data) is routed to (11, 28) = Hash("elephant"))

14 1-14 DCS – Example (figure: Get("elephant") is routed to the same location, (11, 28) = Hash("elephant"))

15 1-15 DCS – Example – cont'd (figure: "elephant" and "fire" events hash to different storage locations)

16 1-16 Geographic Hash Table (GHT)
- Builds on:
  - Peer-to-peer lookup systems.
  - Greedy Perimeter Stateless Routing (GPSR).
(figure: GHT built on top of GPSR and a peer-to-peer lookup system)

17 1-17 Comparison study
- Metrics:
  - Total messages: total packets sent in the sensor network.
  - Hotspot messages: maximal number of packets sent by any particular node.

18 1-18 Comparison study – cont'd
- Assume:
  - n is the number of nodes.
  - Asymptotic costs of O(n) for floods and O(n^1/2) for point-to-point routing.

                       ES         LS         DCS
  Cost for Storage     O(n^1/2)   0          O(n^1/2)
  Cost for Query       0          O(n)       O(n^1/2)
  Cost for Response    0          O(n^1/2)   O(n^1/2)

19 1-19 Comparison study – cont'd
- D_total: the total number of events detected.
- Q: the number of event types queried for.
- D_q: the number of detected events of the event types queried for.
- No more than one query per event type, so there are Q queries in total.
- Assume the hotspot occurs on packets sent to the access point.

20 1-20 Comparison study – cont'd

             ES                 LS                   DCS
  Total      D_total * n^1/2    Q*n + D_q * n^1/2    (Q + D_total + D_q) * n^1/2
  Hotspot    D_total            Q + D_q              Q + D_q

- DCS is preferable if:
  - the sensor network is large;
  - D_total >> max[D_q, Q].
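As a back-of-the-envelope check of that conclusion, a small calculator assuming the asymptotic expressions in the table above; all inputs are made-up illustrative values, not measurements:

```python
from math import sqrt

def costs(n, d_total, q, d_q):
    """Approximate total and hotspot message counts for ES, LS, and DCS."""
    r = sqrt(n)
    es  = {"total": d_total * r,             "hotspot": d_total}
    ls  = {"total": q * n + d_q * r,         "hotspot": q + d_q}
    dcs = {"total": (q + d_total + d_q) * r, "hotspot": q + d_q}
    return es, ls, dcs

# Large network, many detected events, few event types queried:
es, ls, dcs = costs(n=10_000, d_total=500, q=10, d_q=50)
print(es)    # {'total': 50000.0, 'hotspot': 500}
print(ls)    # {'total': 105000.0, 'hotspot': 60}
print(dcs)   # {'total': 56000.0, 'hotspot': 60}
# DCS avoids both the flooding cost of LS and the access-point hotspot of ES.
```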

21 1-21 Summary
- In DCS, relevant data are stored by name at nodes within the sensornet.
- GHT hashes a key k into geographic coordinates; the key-value pair is stored at a node in the vicinity of the location to which the key hashes.
- To ensure robustness and scalability, DCS uses the Perimeter Refresh Protocol (PRP) and Structured Replication (SR).
- Compared with ES and LS, DCS is preferable in large sensornets.

22 1-22 Multi-Resolution Storage

23 1-23 Goals
- Provide storage and search for raw sensor data in data-intensive scientific operations.
- Previous work:
  - Aggregation and querying.
  - Focus on applications whose interests are known a priori.

24 1-24 Approach
- Lossy, progressively degrading storage.

25 1-25 Constructing the hierarchy
- Initially, nodes fill up their own storage with raw sampled data.

26 1-26 Constructing the hierarchy
- Organize the network into grids, and hash within each grid to determine the location of the clusterhead (cf. DCS).
- Send compressed local time series to the clusterhead.

27 1-27 Processing at each level (figure: summaries over x, y, and time)
- Get compressed summaries from children.
- Decode.
- Re-encode at lower resolution and forward to parent (see the sketch below).
- Store incoming summaries locally for future search.
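A minimal sketch of this per-level processing, with simple block averaging standing in for the paper's 3-D wavelet codec; that substitution and all names here are illustrative assumptions, not the actual implementation:

```python
import numpy as np

def summarize(data, factor=2):
    """Re-encode a series at lower resolution by averaging factor-sized blocks."""
    trimmed = data[: len(data) // factor * factor]
    return trimmed.reshape(-1, factor).mean(axis=1)

class Clusterhead:
    def __init__(self):
        self.stored = []   # finer summaries kept locally for later drill-down queries

    def process(self, child_summaries, factor=2):
        # 1. Store the incoming (finer) summaries for future search.
        self.stored.extend(child_summaries)
        # 2. Combine and re-encode at lower resolution for the parent level.
        combined = np.concatenate(child_summaries)
        return summarize(combined, factor)

# One level of the hierarchy: two children forward their compressed local series.
child_a = np.array([20.0, 21.0, 19.5, 20.5])
child_b = np.array([30.0, 29.0, 31.0, 30.0])
coarser = Clusterhead().process([child_a, child_b])
print(coarser)   # shorter, coarser summary forwarded to the next level up
```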

28 1-28 Constructing the hierarchy
- Recursively send data to higher levels of the hierarchy.

29 1-29 Distributing storage load
- Hash to different locations over time to distribute load among nodes in the network.

30 1-30 What happens when storage fills up?
- Eventually, all available storage gets filled, and we have to decide when and how to drop summaries.
- Allocate storage to each resolution and use each allocated storage block as a circular buffer (see the sketch below).
(figure: local storage capacity partitioned into blocks for resolutions 1-4)
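A minimal sketch of that allocation, assuming Python's bounded deque as the circular buffer and made-up block counts per resolution:

```python
from collections import deque

class LocalStore:
    def __init__(self, blocks_per_resolution):
        # e.g. {0: 4, 1: 2, 2: 1}: the finest resolution gets the most slots
        self.buffers = {res: deque(maxlen=size)
                        for res, size in blocks_per_resolution.items()}

    def insert(self, resolution, summary):
        # a deque with maxlen silently evicts the oldest entry:
        # circular-buffer aging within each resolution's block
        self.buffers[resolution].append(summary)

    def summaries(self, resolution):
        return list(self.buffers[resolution])

store = LocalStore({0: 4, 1: 2, 2: 1})
for epoch in range(6):
    store.insert(0, f"fine-summary-{epoch}")
print(store.summaries(0))   # only the 4 most recent fine summaries survive
```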

31 1-31 Tradeoff between storage requirements and query quality
- Graceful query degradation: provide more accurate responses to queries on recent data and less accurate responses to queries on older data.
- How to allocate storage at each node to summaries at different resolutions to provide gracefully degrading storage and search capability?
(figure: levels 0-2 trade query accuracy against compactness; fine levels give high accuracy but low compactness, coarse levels the reverse)

32 1-32 Match system performance to user requirements
- The user provides a function, Q_user, that represents the desired query quality degradation over time (e.g., 95% on recent data down to 50% on old data).
- The system provides a step function, Q_system, with steps at the times when summaries are aged.
- Objective: minimize the worst-case difference between the user-desired query quality (Q_user) and the query quality that the system can provide (Q_system).
(figure: query accuracy vs. time; the gap between the two curves is the quality difference)

33 1-33 For how long should summaries be stored?
- Goal: achieve the desired query quality given the system's constraints (see the sketch below).
- Given:
  - N sensor nodes.
  - Each node has storage capacity S.
  - Users ask a set of typical queries, T.
  - Data is generated at resolution i at rate R_i.
  - D(q, k): query error when the drill-down for query q terminates at level k.
  - Q_user: user-desired quality degradation.
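A minimal sketch of evaluating that objective, assuming an illustrative Q_user curve, aging times, and per-level drill-down accuracies; none of these values come from the paper:

```python
def q_system(t, aging, accuracy):
    """Quality at data age t: the accuracy of the finest resolution still stored."""
    for level, keep_until in enumerate(aging):   # levels ordered finest -> coarsest
        if t <= keep_until:
            return accuracy[level]
    return 0.0   # nothing stored for data this old

def worst_case_gap(q_user, aging, accuracy, horizon):
    """Worst-case shortfall of provided quality versus desired quality."""
    return max(q_user(t) - q_system(t, aging, accuracy) for t in range(horizon))

q_user = lambda t: max(0.95 - 0.01 * t, 0.50)    # desired degradation over time
aging = [10, 25, 60]                             # keep level-0/1/2 summaries this long
accuracy = [0.95, 0.80, 0.60]                    # drill-down accuracy per level
print(worst_case_gap(q_user, aging, accuracy, horizon=60))
```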

34 1-34 Aging strategies with limited information (solve a constraint optimization)
- Omniscient strategy (baseline): the entire dataset is available (full a priori information).
- Training strategy: a small training dataset from the initial deployment is available.
- Greedy strategy: no data is available; use a simple weighted allocation to summaries, e.g., coarse : finer : finest = 1 : 2 : 4 (no a priori information). A sketch of this weighted split follows below.
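A minimal sketch of the greedy weighted split, assuming an illustrative block count and the 1:2:4 weights from the slide:

```python
def greedy_allocation(capacity_blocks, weights):
    """Split storage blocks among resolutions in proportion to fixed weights."""
    total = sum(weights)
    alloc = [capacity_blocks * w // total for w in weights]
    # hand any remainder from integer division to the finest resolution
    alloc[-1] += capacity_blocks - sum(alloc)
    return alloc

# 14 storage blocks split 1:2:4 across coarse, finer, and finest summaries
print(greedy_allocation(14, [1, 2, 4]))   # -> [2, 4, 8]
```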

35 1-35 Distributed trace-driven implementation
- Linux implementation.
  - Uses EmStar (J. Elson et al.), a Linux-based emulator/simulator for sensor networks.
  - 3D wavelet codec.
  - Query processing.
- Geo-spatial precipitation dataset.
  - 15x12 grid (50 km edge) of precipitation data from 1949-1994, from the Pacific Northwest.
- System parameters:
  - Compression ratios: 6:12:24:48.
  - Training set: 6% of total dataset.

36 1-36 How efficient is search?
- Search is very efficient (<5% of the network queried) and accurate for the different queries studied.

37 1-37 Comparing aging strategies
- Training performs within 1% of optimal.
- Careful selection of parameters for the greedy algorithm can provide surprisingly good results (within 2-5% of optimal).

38 1-38 Summary
- Progressive aging of summaries can be used to support long-term spatio-temporal queries in resource-constrained sensor network deployments.
- We describe two algorithms: a training-based algorithm that relies on the availability of training datasets, and a greedy algorithm that can be used in the absence of such data.
- Our results show that:
  - training performs close to optimal for the dataset that we study;
  - the greedy algorithm performs well for a well-chosen summary weighting parameter.

39 1-39 Continuously Adaptive Continuous Queries (CACQ)

40 1-40 CACQ Introduction
- Proposed continuous query (CQ) systems are based on static plans.
  - But CQs are long-running.
  - Initially valid assumptions become less so over time.
- CACQ insight: apply continuous adaptivity.
  - Dynamic operator ordering.
  - Process multiple queries simultaneously.
  - Enables sharing of work and storage.

41 1-41 Outline
- Background
  - Motivation
  - Continuous queries
  - Eddies
- CACQ
  - Contributions: example-driven explanation
- Results & experiments

42 1-42 Motivating applications
- Building monitoring.
- Variety of sensors (e.g., light, temperature, vibration, strain, etc.).
- Variety of users with different interests (e.g., structural engineers, building managers, building users, etc.).

43 1-43 Continuous queries
- Long-running, "standing" queries.
  - From various users.
  - Over a number of sensor streams.
- Installed; continuously produce results until removed.
- Lots of queries over the same data sources.
  - Opportunity for work sharing.

44 1-44 Eddies & adaptivity
- Eddies (Avnur & Hellerstein, SIGMOD 2000): continuous adaptivity.
- No static ordering of operators.
- The routing policy dynamically orders operators on a per-tuple basis.
- "Done" and "ready" bits encode where a tuple has been and where it can go next.

45 1-45 CACQ contributions
- Adaptivity.
- Tuple lineage.
  - In addition to the ready and done bits, encode the path a tuple takes through the operators.
  - Enables sharing of work and state across queries.
- Grouped filter (see the sketch below).
  - Efficiently computes selections over multiple queries.
- Join sharing through State Modules (SteMs).
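A minimal sketch of the grouped-filter idea, assuming sorted lists of predicate constants probed with bisect; this illustrates the concept only and is not the Telegraph/CACQ code:

```python
import bisect

class GroupedFilter:
    """Index many queries' range predicates over one attribute so that a single
    probe per tuple decides all of them at once."""
    def __init__(self):
        self.gt_consts, self.gt_qids = [], []   # predicates "attr > c", sorted by c
        self.lt_consts, self.lt_qids = [], []   # predicates "attr < c", sorted by c

    def add_gt(self, c, qid):
        i = bisect.bisect_left(self.gt_consts, c)
        self.gt_consts.insert(i, c); self.gt_qids.insert(i, qid)

    def add_lt(self, c, qid):
        i = bisect.bisect_left(self.lt_consts, c)
        self.lt_consts.insert(i, c); self.lt_qids.insert(i, qid)

    def matching_queries(self, value):
        # "attr > c" matches when c < value; "attr < c" matches when c > value
        gt_hits = set(self.gt_qids[: bisect.bisect_left(self.gt_consts, value)])
        lt_hits = set(self.lt_qids[bisect.bisect_right(self.lt_consts, value):])
        return gt_hits | lt_hits

f = GroupedFilter()
f.add_gt(10, "Q1")   # Q1: R.a > 10
f.add_gt(20, "Q2")   # Q2: R.a > 20
f.add_lt(15, "Q3")   # Q3: R.a < 15
print(f.matching_queries(12))   # {'Q1', 'Q3'}: both accept a tuple with R.a = 12
```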

46 1-46 Eddies & CACQ: single query, single source
- Example query: SELECT * FROM R WHERE R.a > 10 AND R.b < 15, i.e., an eddy with two selection operators, sigma(R.a > 10) and sigma(R.b < 15).
- Use ready bits to track what to do next (all 1's with a single source).
- Use done bits to track what has been done; a tuple can be output when all done bits are set (see the sketch below).
- The routing policy dynamically orders tuples through the operators.
(figure: tuples R1 (a=5, b=25) and R2 (a=15, b=0) routed through the eddy with their ready/done bits)
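A minimal sketch of per-tuple routing with done bits in the spirit of this example; the routing policy here just picks the first unapplied operator, whereas a real eddy adapts the order based on observed selectivities (all names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Tuple:
    values: dict
    done: set = field(default_factory=set)   # operators this tuple has passed

operators = {
    "sigma_a": lambda v: v["a"] > 10,   # sigma(R.a > 10)
    "sigma_b": lambda v: v["b"] < 15,   # sigma(R.b < 15)
}

def eddy(tuples, routing_policy):
    for t in tuples:
        alive = True
        while alive and t.done != set(operators):
            op_name = routing_policy(t)          # pick the next operator for THIS tuple
            if operators[op_name](t.values):
                t.done.add(op_name)              # mark done, route onward
            else:
                alive = False                    # predicate failed: drop the tuple
        if alive:
            yield t.values                       # all done bits set: output

first_unapplied = lambda t: next(op for op in operators if op not in t.done)
stream = [Tuple({"a": 5, "b": 25}), Tuple({"a": 15, "b": 0})]
print(list(eddy(stream, first_unapplied)))   # only the second tuple satisfies both
```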

47 1-47 Evaluation
- Real Java implementation on top of the Telegraph query processor.
  - 4,000 new lines of code in a 75,000-line codebase.
- Server platform:
  - Linux 2.4.10.
  - Pentium III 733, 756 MB RAM.
- Queries posed from a separate workstation.
  - Output suppressed.
- Lots of experiments in the paper; just a few here.

48 1-48 CACQ vs. NiagaraCQ (graph)

49 1-49 Conclusion
- CACQ: sharing and adaptivity for high-performance monitoring queries over data streams.
- Features:
  - Adaptivity: adapt to a changing query workload without costly multi-query reoptimization.
  - Work sharing via tuple lineage, without constraining the available plans.
  - Computation sharing via grouped filters.
  - Storage sharing via SteMs.

