Declarative Support for Sensor Data Cleaning Shawn Jeffery Gustavo Alonso Michael Franklin Wei Hong Jennifer Widom UC Berkeley ETH Zurich UC Berkeley Arch.

Presentation transcript:

Declarative Support for Sensor Data Cleaning
Shawn Jeffery (UC Berkeley), Gustavo Alonso (ETH Zurich), Michael Franklin (UC Berkeley), Wei Hong (Arch Rock Corporation; Intel Research Berkeley), Jennifer Widom (Stanford University)
Presented by: Venkatesh (venky) Raghavan and Abhishek Mukherji
Disclaimer: Slides adapted or taken from the talk given by S. Jeffery at Pervasive '06

Current Approach
[Diagram: sensor devices deliver raw, dirty data to several applications, each with its own data cleaning component]
- Each application implements its own data cleaning
- Multiple accesses to a shared resource

Data Cleaning - Infrastructure Approach
[Diagram: raw, dirty data enters the Cleaning Infrastructure, which delivers cleaned data to the applications]
- Data cleaning built, tested, and deployed once
- One point of access to sensor devices
The Cleaning Infrastructure translates raw sensor data to cleaned data; applications are unaffected by the unreliable devices over which they are deployed.

Challenges
How to build an infrastructure that supports:
- Many types of sensors
- Multiple applications
- Different environments
Two facets to our solution:
1) Pipeline of sensor cleaning tasks
2) Declarative query processing

Temporal and Spatial Granules
ESP (Extensible Sensor stream Processing) uses high-level abstractions:
- Temporal granules
- Spatial granules
Granules define units of time and space inside which the data are expected to be homogeneous.
This exploits the fact that many applications are not interested in individual readings or devices, but in higher-level data in time and space.

Temporal Granules
- Sensor devices produce data at a high rate
- Applications are concerned with data from a larger time period
Example: an environment monitoring application that models the microclimate of a redwood tree
- One reading is required every 5 minutes
- Solution: windowed processing to group readings
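
The windowed grouping above can be sketched in Python. This is a minimal illustration, not ESP's implementation: the function name `temporal_granules`, the `(timestamp_seconds, value)` tuple layout, and the use of non-overlapping (tumbling) windows with a mean per window are all assumptions made for the example.

```python
from collections import defaultdict

def temporal_granules(readings, granule_seconds=300):
    """Group (timestamp_seconds, value) readings into fixed windows.

    Returns {window_start_seconds: mean value in that window}.
    Tumbling windows and the mean are illustrative assumptions.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        # Integer division maps every timestamp to its window index.
        buckets[int(ts // granule_seconds)].append(value)
    return {
        idx * granule_seconds: sum(vals) / len(vals)
        for idx, vals in sorted(buckets.items())
    }

# Hypothetical temperature readings, one 5-minute granule per output value.
readings = [(0, 20.0), (60, 21.0), (299, 22.0), (300, 30.0), (420, 32.0)]
granules = temporal_granules(readings)
```

Here the first three readings fall into the granule starting at 0 seconds and the last two into the granule starting at 300 seconds, so the application sees two values instead of five.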

Spatial Granules
- Readings from devices physically close to each other are expected to be homogeneous
- A spatial granule defines the unit of space in which this homogeneity is expected to hold

Sensor Cleaning Pipeline
Stages: Point, Smooth, Merge, Arbitrate, Virtualize
Cleaning data involves:
- A set of logically distinct operations
- Each operation targets a different aspect of the data, from finest (single readings) to coarsest (multiple sensors and various sources)
- Uses temporal and spatial characteristics of sensor data

Declarative Query Processing
- Program stages with declarative queries
- CQL: continuous query extension to SQL
- Data stream system as processing engine
- Real-time cleaning
Example query (the bracketed RANGE expression is the window clause):
    SELECT S.city, AVG(temp)
    FROM SOME_STREAM S [RANGE '5 seconds']
    WHERE S.state = 'California'
    GROUP BY S.city
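
As a rough illustration of what the window clause computes, the following Python sketch evaluates the same query at a single instant. The tuple layout `(timestamp, state, city, temp)`, the function name `avg_temp_by_city`, and the choice to include tuples with `now - 5 < timestamp <= now` are assumptions for the example; a real data stream system re-evaluates the query continuously as tuples arrive.

```python
from collections import defaultdict

def avg_temp_by_city(stream, now, range_seconds=5, state="California"):
    """Average temperature per city over the last `range_seconds` seconds.

    stream: iterable of (timestamp, state, city, temp) tuples (assumed layout).
    """
    sums = defaultdict(lambda: [0.0, 0])  # city -> [running total, count]
    for ts, st, city, temp in stream:
        # Window clause and WHERE predicate applied together.
        if now - range_seconds < ts <= now and st == state:
            acc = sums[city]
            acc[0] += temp
            acc[1] += 1
    # GROUP BY city with AVG(temp).
    return {city: total / n for city, (total, n) in sums.items()}

# Hypothetical stream contents.
stream = [
    (1, "California", "Berkeley", 18.0),   # too old for the window at now=10
    (7, "California", "Berkeley", 20.0),
    (8, "California", "Berkeley", 22.0),
    (9, "Oregon", "Portland", 15.0),       # filtered out by the state predicate
    (10, "California", "Fresno", 30.0),
]
result = avg_temp_by_city(stream, now=10)
```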

Step 1: Point
Operates on: a single value of a sensor stream
Purpose: Filter individual values
- Errant (dirty / faulty) RFID tags
- Obvious outliers
- Conversion of raw data into tuples
Example: heat sensors output data as voltages. The raw voltage has to be converted into a temperature using the calibration of that sensor.
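
A Point stage for the heat sensor example might look like the sketch below. The linear calibration (`gain`, `offset`), the plausible temperature range, and the `(sensor_id, timestamp, volts)` tuple layout are hypothetical; real calibration curves come from the sensor's data sheet.

```python
def point_stage(raw, gain=100.0, offset=-50.0, valid_range=(-20.0, 60.0)):
    """Convert raw voltages to temperatures and drop obvious outliers.

    raw: iterable of (sensor_id, timestamp, volts) tuples (assumed layout).
    gain/offset: hypothetical linear calibration, temp = gain * volts + offset.
    """
    low, high = valid_range
    cleaned = []
    for sensor_id, ts, volts in raw:
        temp = gain * volts + offset
        # Each reading is judged on its own; no window or neighbor is consulted.
        if low <= temp <= high:
            cleaned.append((sensor_id, ts, temp))
    return cleaned

# Hypothetical raw readings; 2.5 V maps to 200 degrees and is filtered out.
raw = [("m1", 0, 0.625), ("m1", 1, 2.5), ("m2", 0, 0.75)]
point_out = point_stage(raw)
```

Because the stage looks at one value at a time, it can run close to the device, before any windowed processing.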

Step 1: Point
[Diagram: each raw sensor stream passes through its own Point (P) operator]

Step 2: Smoothing
Purpose: Interpolate (insert) lost readings
- Temporal interpolation
- Outlier detection
Method: Window-based queries
[Diagram: Point (P) outputs, grouped by temporal granule, feed Smooth (S) operators]
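
One simple form of temporal interpolation for RFID data can be sketched as follows. The rule used here (a tag counts as present at an epoch if it was read at least once in the last `window` epochs) is an assumption chosen for illustration, as are the function name and the boolean-per-epoch input.

```python
def smooth_presence(observed, window=3):
    """Fill in dropped RFID readings with a sliding window.

    observed: list of booleans, one per epoch (True if the tag was read).
    Returns a list where epoch t is True if the tag was read in any of the
    last `window` epochs ending at t.
    """
    smoothed = []
    for t in range(len(observed)):
        start = max(0, t - window + 1)
        smoothed.append(any(observed[start:t + 1]))
    return smoothed

# Hypothetical trace: two dropped readings between real reads, then the tag leaves.
observed = [True, False, False, True, False, False, False, False]
smoothed = smooth_presence(observed)
```

With a window of 3 the two-epoch gap is filled, but the tag is also reported for two extra epochs after its last real read, which is the cost of choosing a larger window.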

Step 3: Merge
Purpose: Spatial interpolation
- Example: within a spatial granule, compute the average of the readings from different motes, omitting individual readings that are more than two standard deviations from the mean
[Diagram: Smooth (S) outputs, grouped by spatial granule, feed a Merge (M) operator]
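
The averaging example can be sketched in Python. The sketch takes the deviation to be the population standard deviation of the readings in the granule, which is an assumption; the function name and sample values are also hypothetical.

```python
from statistics import mean, pstdev

def merge_granule(readings, k=2.0):
    """Average the readings of one spatial granule, ignoring outlier motes.

    A reading is dropped if it lies more than k standard deviations from
    the mean of all readings in the granule. Raises StatisticsError on an
    empty list.
    """
    mu = mean(readings)
    sigma = pstdev(readings)
    kept = [r for r in readings if abs(r - mu) <= k * sigma]
    return mean(kept)

# Hypothetical granule: seven functioning motes and one outlier mote at 60.0.
granule = [20.0, 21.0, 19.0, 20.0, 21.0, 19.0, 20.0, 60.0]
merged = merge_granule(granule)
```

The mean of all eight readings is 25.0 and the standard deviation is about 13.2, so only the 60.0 reading falls outside two deviations; the merged value is the average of the remaining seven. Note that with very few motes a single outlier inflates the deviation enough that it may not be rejected.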

Step 3: Merge
[Plot: readings from the functioning motes, an outlier mote, and the merged average]

Step 4: Arbitrate
Purpose: Remove
- Conflicting readings
- Duplicates (de-duplication)
[Diagram: Merge (M) outputs feed an Arbitrate (A) operator]

Step 5: Virtualize
Purpose: Multi-source integration
[Diagram: Arbitrate (A) output feeds the Virtualize (V) operator]

RFID Scenario
[Diagram: the pipeline Point, Smooth, Merge, Arbitrate, Virtualize feeding the Application; labels: rfid_data, smooth_input, arbitrate_input, Query 2, Query 3, Query 4, On Sensor]
Each domain needs to be modeled.

RFID Scenario
Fig: Expected output
Fig: Query 2 result using raw RFID data

Smoothing
The difference between Shelf 0 and Shelf 1 is likely due to issues with antenna ports on these particular RFID readers.

Arbitration

Moving Average (window w = 3 timestamps)
[Diagram: readings of RFID tag r1 at timestamps t, t+1, t+2]
At t+2:
- Shelf 0: count(r1) = 2
- Shelf 1: count(r1) = 3
NOTE: The window size must be larger than the longest period of dropped readings, but not too large.
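
The counts above can be reproduced with a short Python sketch of arbitration. It assumes that a tag is attributed to the shelf whose reader saw it most often within the window; that rule, the function name, the `(timestamp, shelf, tag)` layout, and the individual read events are illustrative assumptions, chosen only so that the counts match the slide.

```python
from collections import Counter

def arbitrate(reads, now, window=3):
    """Assign each tag to the shelf that read it most often in the window.

    reads: iterable of (timestamp, shelf, tag) tuples (assumed layout).
    Only reads with now - window < timestamp <= now are counted.
    Ties go to the shelf that reached the top count first.
    """
    counts = {}
    for ts, shelf, tag in reads:
        if now - window < ts <= now:
            counts.setdefault(tag, Counter())[shelf] += 1
    return {tag: c.most_common(1)[0][0] for tag, c in counts.items()}

t = 0
# Hypothetical reads: Shelf 0 sees r1 twice, Shelf 1 sees it three times.
reads = [
    (t, "shelf0", "r1"), (t, "shelf1", "r1"),
    (t + 1, "shelf1", "r1"),
    (t + 2, "shelf0", "r1"), (t + 2, "shelf1", "r1"),
]
assignment = arbitrate(reads, now=t + 2)
```

Under this rule r1 is attributed to Shelf 1 at t+2, and the duplicate report from Shelf 0 is dropped.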

Conclusion
- An infrastructural approach to sensor data cleaning is necessary
- ESP: a pipelined declarative framework for building such infrastructure
[Diagram: raw, dirty data enters ESP, which delivers cleaned data to the Application]