Sensor Data Management Egemen Tanin Department of Computer Science and Software Engineering University of Melbourne.

Goals & Fundamental Observations  Goal: Improve sensor network lifetime AND:  Maintain the current DBMS abstraction and facilities while introducing algorithms to run queries efficiently  Add new capabilities for emerging applications, such as data summarization to discard irrelevant data  Observations:  Communication in sensors is much more energy hungry than computation  Sensor networks are made up of simple devices with no extensive data storage

Additional challenges  The overall system is very volatile  Changes in environmental conditions can render readings inaccessible  Failure of nodes cannot be easily fixed  Nodes can run low on power over time  Data is dynamic  New data is being appended all the time  Serving multiple queries concurrently is problematic  Sensors are physically limited in what they can observe at a given time

Fundamental Approaches  Collect all the sensor data to one or more data centers  Use a classical DBMS: energy inefficient due to redundant data collection, a central point of failure, and hot spots near the root; data has to be collected at the highest frequency for all potential queries, all the time; current DBMSs are not fast enough for high-update applications; many facilities are redundant (RDBMSs were built 25+ years ago); certain convenient operations are lacking, e.g., continuous queries  Rebuild a DBMS for sensor networks and fix some of the problems in a central setting? Still energy inefficient due to centralization

Approaches Contd.  In-network storage and processing, along with a capability to inject and collect data from anywhere in the network for any number of centers  Already implied by communication costs dominating the computation costs in the network  But storage limitations require eliminating some data  Fundamentally different from current commercial RDBMSs

Query Classification for Sensor Networks  Continuous queries: commonly span a long period of time  Snapshot queries: collect data about now or some other point in time  Historical queries: collect summary data about the past

Additional Operators  Use of only some of the sensors  Aggregation of data from multiple sensors  Correlation of data from sensors

Example Query
SELECT min(humidity), town
FROM sensors
WHERE state = 'Queensland'
GROUP BY town
HAVING max(temperature) > 30
DURATION [now, now + 600 min]
SAMPLING PERIOD 30 min
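A minimal Python sketch of what the example query computes for a single sampling period. The record layout, the town names, and the sample values are hypothetical; the DURATION and SAMPLING PERIOD clauses would simply repeat this evaluation every 30 minutes over the 600-minute window.

```python
from collections import defaultdict

def evaluate(readings, state="Queensland", threshold=30):
    """Return {town: min humidity} for towns in `state` whose max temperature exceeds `threshold`."""
    by_town = defaultdict(list)
    for r in readings:
        if r["state"] == state:               # WHERE state = 'Queensland'
            by_town[r["town"]].append(r)      # GROUP BY town
    result = {}
    for town, rows in by_town.items():
        if max(r["temperature"] for r in rows) > threshold:    # HAVING max(temperature) > 30
            result[town] = min(r["humidity"] for r in rows)    # SELECT min(humidity), town
    return result

# Hypothetical readings from one sampling period.
readings = [
    {"state": "Queensland", "town": "Cairns", "temperature": 33, "humidity": 70},
    {"state": "Queensland", "town": "Cairns", "temperature": 29, "humidity": 64},
    {"state": "Queensland", "town": "Toowoomba", "temperature": 24, "humidity": 50},
    {"state": "Victoria", "town": "Melbourne", "temperature": 35, "humidity": 30},
]
result = evaluate(readings)
```

Only Cairns passes both the WHERE and HAVING filters here, so it is the only town reported.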

Extending SQL  Example: Cougar Sensor Network Database System (by Yong Yao and Johannes Gehrke)  Uses an SQL-like interface  After in-network processing, data is fed to a center  Optimizes for both resource usage and reaction time  Assumes that sensors are time synchronized  Each type of sensor is represented as an Abstract Data Type (ADT)  Each sensor is then an object of that ADT  Relations are virtual and append-only

Cougar Contd.  Has SELECT, FROM, WHERE, GROUP BY, HAVING, DURATION, and EVERY clauses  Now extended with Gaussian ADTs (GADTs) to run probabilistic queries, since sensors collect data with noise from physical phenomena: SELECT * FROM sensors WHERE sensor.temp.prob([10,20]) >= 0.6 ‘Get the temperature data from sensors if it is within ±5 of 15 degrees with at least 60 percent probability’
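The probabilistic predicate can be illustrated with a short Python sketch. It assumes each sensor's temperature is modelled as a Gaussian with a known mean and standard deviation; the function names and the sensor records are hypothetical and are not Cougar's actual interface.

```python
import math

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def prob_in_range(mean, std, low, high):
    """Probability that a Gaussian reading falls inside [low, high]."""
    return normal_cdf(high, mean, std) - normal_cdf(low, mean, std)

def select_sensors(sensors, low=10.0, high=20.0, min_prob=0.6):
    """Keep sensors whose temperature lies in [low, high] with probability >= min_prob."""
    return [s["id"] for s in sensors
            if prob_in_range(s["mean"], s["std"], low, high) >= min_prob]

# Hypothetical sensors: same target range, different noise levels.
sensors = [
    {"id": "a", "mean": 15.0, "std": 2.0},    # tight distribution, probability about 0.99
    {"id": "b", "mean": 15.0, "std": 10.0},   # very noisy, probability about 0.38
    {"id": "c", "mean": 19.0, "std": 1.0},    # near the upper edge, probability about 0.84
]
selected = select_sensors(sensors)
```

Sensor b is centred on 15 degrees yet is rejected, because its noise spreads too much probability mass outside the range.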

Execution Steps  Broadcast the query to the network  Collect data back  Not all data may be relevant and summarization of data may be utilized  Further analysis on a central system can be done if needed later  Note: Either a human or an automated system can be the origin of the query

Data Collection  Energy x Delay is the main composite metric  Methods:  Direct Independent Transmission  PEGASIS (Power-Efficient Gathering for Sensor Information Systems)  Binary Chain-based Scheme  Chain-based Three-level Scheme  Directed Diffusion  Tree-based Schemes  Multi-path Schemes  Hybrids

Direct Independent Transmission  Each node transmits to a center independently  Very energy inefficient  Nodes must watch out for collisions and take turns  Hence the last message may be transmitted only after a significant delay  The first response may be very fast

PEGASIS  By Stephanie Lindsey and Cauligi Raghavendra  Assumes all nodes know the location of every other node  All nodes should be able to transmit data to the center in one hop  A greedy algorithm is used to construct a chain of sensor nodes starting farthest from the center  The chain is formed a priori  After every hop, data aggregation can be done  Leadership is transferred sequentially  May be energy efficient but delay is O(n)
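The greedy chain construction can be sketched in Python as follows. This is an illustrative reading of the slide, assuming 2-D coordinates and round-robin leader rotation; the function names and the sample positions are hypothetical.

```python
import math

def build_chain(nodes, center):
    """Greedy chain: start at the node farthest from the center,
    then repeatedly append the closest node not yet in the chain."""
    remaining = list(nodes)
    current = max(remaining, key=lambda p: math.dist(p, center))
    remaining.remove(current)
    chain = [current]
    while remaining:
        current = min(remaining, key=lambda p: math.dist(p, current))
        remaining.remove(current)
        chain.append(current)
    return chain

def leader_for_round(chain, round_no):
    """Assumed round-robin rotation: leadership moves one position along the chain per round."""
    return chain[round_no % len(chain)]

# Hypothetical node positions on a line, with the center at the origin.
nodes = [(1, 0), (4, 0), (2, 0), (8, 0)]
chain = build_chain(nodes, center=(0, 0))
```

Data travels hop by hop along the chain toward the current leader, which is why the delay grows linearly with the number of nodes.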

PEGASIS (figure labels: To Center, Leader, End, Start, Sensors)

Binary Chain-based Scheme  By the same authors as PEGASIS  It is a chain-based scheme like PEGASIS  Nodes are classified into levels  All nodes receiving a message at one level rise to the next level  At each level, the number of nodes is halved  This is a CDMA-only scheme (to prevent collisions)  Delay is O(log n)
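A small Python sketch of the halving behaviour, assuming neighbours on the chain are paired and the receiver of each pair rises to the next level. The pairing rule for an unpaired last node is an assumption made for illustration.

```python
def binary_levels(chain):
    """Return the nodes active at each level; each level keeps one node per pair."""
    levels = [list(chain)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        # In each pair (sender, receiver) the receiver rises to the next level.
        # An unpaired last node rises unchanged (assumption).
        next_level = [current[i + 1] if i + 1 < len(current) else current[i]
                      for i in range(0, len(current), 2)]
        levels.append(next_level)
    return levels

levels = binary_levels(list(range(8)))
steps = len(levels) - 1   # number of parallel transmission steps inside the network
```

With 8 nodes the data reaches a single node after 3 parallel steps, which matches the O(log n) delay.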

Binary Chain-based Scheme Contd. (figure labels: To Center, Step 1, Step 2, Step 3, Step 4)

Chain-based Three-level Scheme  By the same authors as PEGASIS  For non-CDMA settings the binary scheme does not work  Again, a chain, like PEGASIS, is formed but the network is partitioned into groups that are far away from each other for simultaneous transmissions  Within a group, nodes transmit at the same time  One node of the group aggregates and goes to the next level  In the next level, all nodes are divided into two groups  Finally, all send to one node which sends to a center

Directed Diffusion  By Chalermek Intanagonwiwat, Ramesh Govindan and Deborah Estrin  Consists of:  Interest propagation, e.g., location=[(100,100),(10, 200)], temperature=[10,20]  Gradient setup  Data delivery along the reinforced path
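The three phases can be sketched in Python in a heavily simplified form. This sketch floods the interest from the sink and keeps, for each node, only a gradient toward the neighbour it first heard the interest from, which stands in for the reinforced path; real directed diffusion keeps gradients toward several neighbours and reinforces one of them later. The topology and all names are hypothetical.

```python
from collections import deque

def propagate_interest(neighbors, sink):
    """Flood the interest from the sink. Each node records a gradient
    pointing at the neighbour from which it first received the interest."""
    gradient = {sink: None}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for nb in neighbors[node]:
            if nb not in gradient:
                gradient[nb] = node
                queue.append(nb)
    return gradient

def delivery_path(gradient, source):
    """Follow the gradients from a data source back to the sink."""
    path = [source]
    while gradient[path[-1]] is not None:
        path.append(gradient[path[-1]])
    return path

# Hypothetical topology: two alternative routes between the source and the sink.
neighbors = {
    "sink": ["a", "b"],
    "a": ["sink", "c"],
    "b": ["sink", "c"],
    "c": ["a", "b", "src"],
    "src": ["c"],
}
gradient = propagate_interest(neighbors, "sink")
path = delivery_path(gradient, "src")
```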

Directed Diffusion Contd.

Tree-based Schemes  A routing tree rooted at a base station is used  The tree, that is utilized to distribute the query, is also utilized to collect the data  Example, TinyDB (by Samuel Madden and Michael Franklin and Joseph Hellerstein and Wei Hong)

TinyDB Contd.  Uses an epoch-based mechanism  Main disadvantage is that it can lose large subtrees/data due to a central point of failure
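A minimal Python sketch of in-network aggregation over a routing tree, assuming each node forwards a partial state (sum, count) to its parent once per epoch. It is not TinyDB's code; the tree, the readings, and the function name are hypothetical. The second call shows how the failure of a single interior node drops its whole subtree from the result.

```python
def partial_state(tree, readings, node):
    """Return (sum, count) for the subtree rooted at `node`.
    Each node merges its own reading with its children's partial states."""
    total, count = readings[node], 1
    for child in tree.get(node, []):
        child_total, child_count = partial_state(tree, readings, child)
        total += child_total
        count += child_count
    return total, count

# Hypothetical readings and routing tree (parent -> children).
readings = {"root": 20, "n1": 22, "n2": 18, "n3": 24, "n4": 26}
tree = {"root": ["n1", "n2"], "n1": ["n3", "n4"]}

total, count = partial_state(tree, readings, "root")
average = total / count

# If n1 fails, its children n3 and n4 are cut off as well.
tree_after_failure = {"root": ["n2"]}
lost_total, lost_count = partial_state(tree_after_failure, readings, "root")
```

Sending (sum, count) pairs rather than raw readings keeps every message the same small size regardless of subtree size.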

Extensions  Report data only if it has changed from the previous report, or consider whether a re-report would affect the final aggregation at all  Adapt to changing conditions in the network
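The report-only-on-change idea can be sketched in Python as below. The tolerance parameter is an assumption added for illustration; a tolerance of zero reports every change.

```python
def reports(samples, tolerance=0.5):
    """Return (index, value) pairs for samples that differ from the
    last reported value by more than `tolerance`."""
    sent = []
    last_reported = None
    for i, value in enumerate(samples):
        if last_reported is None or abs(value - last_reported) > tolerance:
            sent.append((i, value))
            last_reported = value
    return sent

# Hypothetical temperature samples from one sensor.
samples = [20.0, 20.1, 20.3, 21.0, 21.2, 19.9]
sent = reports(samples)
```

Here only three of the six samples are transmitted, which saves the radio energy that dominates a sensor's budget.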

Multi-path Schemes  To tolerate failures, the same sensor value can be sent along multiple paths  The main disadvantage is that the final value may now be an approximation rather than an exact value  E.g., the scheme by Suman Nath, Philip Gibbons, Srinivasan Seshan and Zachary Anderson
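A short Python illustration of why multi-path delivery complicates aggregation, using hypothetical readings that each arrive at the base station twice. Duplicate-insensitive aggregates such as MAX are unaffected, while a naive SUM over-counts; handling duplicate-sensitive aggregates typically requires approximate, duplicate-insensitive summaries, which are not shown here.

```python
# Hypothetical readings, each delivered along two paths.
readings = {"a": 3, "b": 5, "c": 4}
paths_per_reading = 2
arrived = [value for value in readings.values() for _ in range(paths_per_reading)]

exact_max = max(readings.values())
multipath_max = max(arrived)          # duplicates do not change the maximum

exact_sum = sum(readings.values())
naive_multipath_sum = sum(arrived)    # every reading is counted twice
```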

Hybrids  E.g., by Amit Manjhi, Suman Nath and Philip Gibbons  Benefits of both a tree mechanism and a multi-path mechanism (figure labels: Base Station, Tree, Multi-path)

Storing Data versus Data Collection  Rather than collecting data from individual sensors for every given query, sensors can be made to store their data in the network for point retrieval at a later time  Similar to creating rendezvous points

Geographic Hash Tables (GHTs)  By Sylvia Ratnasamy, Brad Karp, Li Yin, Fang Yu, Deborah Estrin, Ramesh Govindan and Scott Shenker  Assumes each node knows its location  Limited to point queries  Hashes keys to geographic locations  Stores a key-value pair on the sensor closest to the location  Geographic routing is used to access this data with a key later on  Replication on nearby nodes can be used for load sharing and failure resistance  Regions of data, rather than individual sensor readings, can also be hashed as an extension  The idea is, in general, similar to publish-subscribe
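The put/get behaviour of a geographic hash table can be sketched in Python as follows. This is a toy model under stated assumptions: the class and function names, the SHA-256 hash, and the node layout are all hypothetical, and the closest node is found by a global scan rather than by geographic routing. Replication is omitted.

```python
import hashlib
import math

def key_to_location(key, width, height):
    """Hash a key to a point inside a width x height field."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32 * width
    y = int.from_bytes(digest[4:8], "big") / 2**32 * height
    return (x, y)

class GeographicHashTable:
    def __init__(self, node_positions, width, height):
        self.node_positions = node_positions
        self.width = width
        self.height = height
        self.storage = {node: {} for node in node_positions}

    def home_node(self, key):
        """Node closest to the hashed location (global scan stands in for geographic routing)."""
        target = key_to_location(key, self.width, self.height)
        return min(self.node_positions,
                   key=lambda n: math.dist(self.node_positions[n], target))

    def put(self, key, value):
        node = self.home_node(key)
        self.storage[node].setdefault(key, []).append(value)
        return node

    def get(self, key):
        return self.storage[self.home_node(key)].get(key, [])

# Hypothetical four-node network in a 100 x 100 field.
node_positions = {"n1": (10, 10), "n2": (90, 10), "n3": (10, 90), "n4": (90, 90)}
ght = GeographicHashTable(node_positions, width=100, height=100)
stored_at = ght.put("elephant-sighting", {"sensor": "n2", "time": 17})
found = ght.get("elephant-sighting")
```

Because the producer and the consumer hash the same key to the same location, they meet at the same node without knowing about each other.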

GHTs Contd. (figure labels: Storage, Query Source, Data Source)

Range Queries  GHTs do not work for range queries  An approach similar to the Binary Chain-based Scheme can be used in one-dimensional settings, but with storage rather than collection as the goal (figure: road sensors labelled a to j)

Multidimensional Indexing  For multidimensional indexing, we can use:  Grid files with multidimensional range hashing  Quadtrees with block hashing  It is less clear how to map R-trees or k-d trees using hashing  In general, research on this front is in its infancy  Load balancing and minimizing communication overhead are critical issues

DIMENSIONS System  By Deepak Ganesan and Ben Greenstein and Denis Perelyubskiy and Deborah Estrin and John Heidemann

DIFS System  By Benjamin Greenstein and Deborah Estrin and Ramesh Govindan and Sylvia Ratnasamy and Scott Shenker  A multi-rooted method  Nodes hold histograms  Even load distribution  I.e., we have many roots

Fractional Cascading  By Jie Gao, Leonidas Guibas, John Hershberger and Li Zhang  Requests are commonly local, i.e., from a given node  GHTs can store data afar  Hence: Keep a fraction of distant data and keep detailed local data (use exponential decay)

Locality Preserving Hashing: DIM system  By Xin Li and Young Jin Kim and Ramesh Govindan and Wei Hong

Additional Issues: Data Aging  Algorithms are needed for summarizing aging data on sensors:  E.g., DIMENSIONS uses a monotonically decreasing function to discard data over time by creating new summaries

Summary and Future Directions  In-network processing is gaining momentum  Either collect data in an efficient manner  Or store data by creating good rendezvous-based mechanisms  Complex data aggregation mechanisms for sophisticated data analysis are commonly cited as a good research direction  Subquery generation and subquery trading are also a good research direction  Indexing with complex query processing is also in its infancy
