Slide 1: Abstractions for Shared Sensor Networks
Michael J. Franklin, UC Berkeley EECS
DMSN, September 2006
2
Mike Franklin UC Berkeley EECS Outline Perspective on shared infrastructure Scientific Applications Business Environments Data Cleaning as a Shared Service Other Core Services What’s core and what isn’t? Conclusions
Slide 3: Scientific Instruments
- Cost: moderate
- Users: one
- Use: defense/navigation
- Scheduling: ad hoc
- Data cleaning: cloth
Slide 4: Scientific Instruments
- Cost: more
- Users: one
- Use: science
- Scheduling: ad hoc
- Data cleaning: religious
Slide 5: Scientific Instruments
- Cost: 100s of K$ (1880s dollars)
- Users: 100s
- Use: science
- Scheduling: by committee
- Data cleaning: grad students
Slide 6: Scientific Instruments
- Cost: 100s of M$ (2010s dollars)
- Users: 1000s to millions
- Use: science and education
- Scheduling: mostly static (survey)
- Data cleaning: mostly algorithmic
Key point: enabled by modern (future) data management!
Slide 7: Shared Infrastructure
- Sharing dictated by costs: hardware, deployment, maintenance
- Pooled resource management: competitively scheduled, or statically scheduled (surveys)
- Data cleaning: at the instrument, or by the applications (or end users)
- Other services
Slide 8: Shared Sensor Nets
- Macroscopes are expensive: to design, to build, to deploy, to operate and maintain
- They will be shared resources:
  - across organizations
  - across apps within organizations
Q: What are the right abstractions to support them?
Slide 9: Traditional Shared Data Management
[Diagram: operational systems (point of sale, inventory, data feeds) feed an Extract-Transform-Load stage (cleaning, auditing, ...) into a data warehouse and data marts, which serve business intelligence, reports, dashboards, and ad hoc queries for users.]
All users/apps see only cleaned data, a.k.a. "TRUTH".
Slide 10: Shared SensorNet Services
[Diagram: candidate shared services: data cleaning, data collection, query & reporting, quality estimation, scheduling, monitoring, actuation, tasking/programming, provisioning, evolution.]
We will need to understand the shared/custom tradeoffs for all of these.
Slide 11: Data Cleaning as a Shared Service
Slide 12: Some Data Quality Problems with Sensors
1. (Cheap) sensors are failure- and error-prone (and people want their sensors to be really cheap).
2. The device interface is too low-level for applications.
3. They produce too much (uninteresting) data.
4. They produce some interesting data, and it's hard to tell case #3 from case #4.
5. They are sensitive to environmental conditions.
Slide 13: Problem 1a: Sensors Are Noisy
A simple RFID experiment:
- 2 adjacent shelves, 6 ft. wide
- 10 EPC-tagged items each, plus 5 moved between them
- RFID antenna on each shelf
Slide 14: Shelf RFID Test: Ground Truth
Slide 15: Actual RFID Readings
"Restock every time inventory goes below 5"
Slide 16: Problem 1b: Sensors "Fail Dirty"
- 3 temperature-sensing motes in the same room
[Chart: per-mote temperature readings, with one outlier mote plotted against the average.]
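The deck does not spell out how the outlier mote is identified; a minimal sketch of one common rule, flagging a mote whose reading strays too far from the median of its co-located peers, is below. The threshold and function name are illustrative assumptions, not the talk's method.

```python
from statistics import median

def find_outlier_motes(readings, max_dev=3.0):
    """Flag motes whose latest reading deviates from the median of
    co-located motes by more than max_dev degrees (illustrative threshold).

    readings: dict mapping mote_id -> latest temperature reading.
    """
    m = median(readings.values())
    return {mote for mote, value in readings.items() if abs(value - m) > max_dev}

# Example: three motes in the same room; one has "failed dirty".
print(find_outlier_motes({"mote1": 22.1, "mote2": 22.4, "mote3": 41.7}))
# -> {'mote3'}
```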
Slide 17: Problem 2: Low-Level Interface
Lack of good support for devices increases the complexity of sensor-based applications.
Slide 18: Problems 3 and 4: The Wheat from the Chaff
Shelf RFID reports (50 times/sec):
- "there are 100 items on the shelf"
- "the 100 items are still on the shelf"
- "there are 99 items on the shelf"
- "the 99 items are still on the shelf"
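The slide does not show code; as a minimal sketch of one way to separate the interesting reports from the repetitive ones, emit a shelf count only when it changes (the function and stream names are illustrative):

```python
def report_changes(count_stream):
    """Yield a shelf count only when it differs from the previous report,
    suppressing the 50-per-second "nothing changed" messages."""
    previous = None
    for count in count_stream:
        if count != previous:
            yield count
            previous = count

# Example: the count holds at 100, drops to 99, then recovers.
print(list(report_changes([100, 100, 100, 99, 99, 100])))
# -> [100, 99, 100]
```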
Slide 19: Problem 5: Environment
[Charts: read rate vs. distance for an Alien I2 tag in a room on the 4th floor of Soda Hall, and read rate vs. distance for the same reader and tag in the room next door.]
Slide 20: VICE: Virtual Device Interface [Jeffery et al., Pervasive 2006]
Goal: hide the messy details of underlying physical devices:
- Error characteristics
- Failure
- Calibration
- Sampling issues
- Device management (physical vs. virtual)
Fundamental abstractions: spatial and temporal granules ("Metaphysical Data Independence")
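The deck names the abstraction but not its API; the sketch below is a hypothetical rendering of a VICE-style virtual device whose read interface is parameterized by spatial and temporal granules (all class and function names here are assumptions):

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Granule:
    """Spatial/temporal unit that applications read at, instead of raw samples."""
    region: str        # spatial granule, e.g. a shelf or a room
    window_sec: float  # temporal granule, e.g. 5.0 seconds

class VirtualDevice:
    """Hypothetical VICE-style wrapper: applications see one cleaned value per
    granule; calibration, failures, and sampling details stay behind the API."""

    def __init__(self, sample_fn):
        # sample_fn(region, window_sec) -> list of raw readings from the
        # physical devices covering that region during that window.
        self.sample_fn = sample_fn

    def read(self, granule: Granule) -> float:
        raw = self.sample_fn(granule.region, granule.window_sec)
        return mean(raw)  # stand-in for the real cleaning/smoothing stages

# Usage: the application never touches the physical reader interface.
device = VirtualDevice(lambda region, win: [21.9, 22.1, 22.0])
print(device.read(Granule(region="room-405", window_sec=5.0)))
```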
Slide 21: VICE: A Virtual Device Layer
The Virtual Device (VICE) API is a natural place to hide much of the complexity arising from physical devices.
Slide 22: The VICE Query Pipeline
[Pipeline diagram: VICE stages (Clean, Smooth, Arbitrate, Validate, Analyze) spanning single tuples, windows, and multiple receptors, generalizing toward joins with stored data and on-line data mining.]
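The deck only names the stages; a minimal sketch of wiring such stages into a pipeline of stream transformations is below. The per-stage behaviors are placeholders, not the published VICE operators:

```python
def clean(stream):
    """Per-reading stage: drop obviously invalid samples (placeholder rule)."""
    return (r for r in stream if r is not None)

def smooth(stream, window=5):
    """Windowed stage: average each non-overlapping window of readings."""
    buf = []
    for r in stream:
        buf.append(r)
        if len(buf) == window:
            yield sum(buf) / len(buf)
            buf = []

def pipeline(stream, stages):
    """Chain stages so each consumes the previous stage's output."""
    for stage in stages:
        stream = stage(stream)
    return stream

readings = [22.0, None, 22.2, 21.9, 22.1, 22.0, 22.3, 22.1, 22.0, 21.8, 22.2]
print(list(pipeline(readings, [clean, smooth])))
# -> approximately [22.04, 22.08]
```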
Slide 23: RFID Smoothing with Queries
- RFID data has many dropped readings
- Typically, a smoothing filter is used to interpolate
[Chart: raw readings vs. smoothed output over time.]
Smoothing filter as a windowed query:
SELECT distinct tag_id
FROM RFID_stream [RANGE '5 sec']
GROUP BY tag_id
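The query above declares the filter; a rough procedural sketch of the same idea, reporting a tag as present if it was read at least once in the trailing 5-second window, might look like this (names and data layout are illustrative):

```python
def smoothed_tags(readings, window_sec=5.0):
    """readings: time-ordered list of (timestamp, tag_id) raw read events.
    For each event time, return the set of distinct tags seen in the trailing
    window, mirroring the effect of the RANGE '5 sec' query."""
    out = []
    for t, _ in readings:
        window = {tag for ts, tag in readings if t - window_sec < ts <= t}
        out.append((t, window))
    return out

raw = [(0.0, "A"), (1.0, "B"), (3.0, "A"), (7.0, "B")]
for t, tags in smoothed_tags(raw):
    print(t, sorted(tags))
# At t=7.0, tag "A" (last read at t=3.0) is still reported present because
# its read falls inside the trailing 5-second window.
```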
Slide 24: After VICE Processing
"Restock every time inventory goes below 5"
Slide 25: Adaptive Smoothing [Jeffery et al., VLDB 2006]
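The slide gives only the citation; the sketch below illustrates only the intuition behind adaptive smoothing (widen the window when reads are unreliable, shrink it when a genuine transition is detected). It is not the algorithm published by Jeffery et al., and the thresholds and names are assumptions:

```python
def adapt_window(window_sec, reads_in_window, expected_reads,
                 transition_detected, min_w=1.0, max_w=30.0):
    """Illustrative window-size controller, not the published algorithm:
    just the grow/shrink intuition behind adaptive smoothing."""
    if transition_detected:
        # A tag genuinely appeared or disappeared: shrink to react faster.
        return max(min_w, window_sec / 2)
    if reads_in_window < expected_reads:
        # Reads are unreliable: widen the window to avoid false dropouts.
        return min(max_w, window_sec * 2)
    return window_sec

# Only 2 of an expected 10 reads arrived, and no transition was detected.
print(adapt_window(5.0, reads_in_window=2, expected_reads=10,
                   transition_detected=False))
# -> 10.0
```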
Slide 26: Ongoing Work: Spatial Smoothing
- With multiple readers, smoothing is more complicated
[Diagram: two rooms, two readers per room (A, B and C, D).]
- Reinforcement: A? B? A ∪ B? A ∩ B?
- Arbitration: A? C?
- All are addressed by a statistical framework!
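The statistical framework itself is not shown; as a toy illustration of arbitration, one could assign each tag to whichever reader observed it most often in a window (the rule and names are assumptions, not the talk's method):

```python
from collections import Counter

def arbitrate(read_events):
    """read_events: list of (tag_id, reader_id) observations in one window.
    Assign each tag to the reader that saw it most often: a toy stand-in for
    probabilistic arbitration between overlapping readers."""
    counts = Counter(read_events)
    assignment = {}
    for (tag, reader), n in counts.items():
        if tag not in assignment or n > counts[(tag, assignment[tag])]:
            assignment[tag] = reader
    return assignment

# Readers A and C both see tag t1; A sees it more often and wins arbitration.
events = [("t1", "A"), ("t1", "A"), ("t1", "C"), ("t2", "C")]
print(arbitrate(events))  # -> {'t1': 'A', 't2': 'C'}
```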
Slide 27: Problems with a Single "Truth"
- If you knew what was going to happen, you wouldn't need sensors (upside-down airplane, ozone-layer hole)
- Monitoring vs. needle-in-a-haystack
- Probability-based smoothing may remove unlikely, but real, events!
Slide 28: Risks of Too Little Cleaning
- GIGO (garbage in, garbage out)
- Complexity: burden on app developers
- Efficiency (repeated work)
- Too much opportunity for error
Slide 29: Risks of Too Much Cleaning
"The appearance of a hole in the earth's ozone layer over Antarctica, first detected in 1976, was so unexpected that scientists didn't pay attention to what their instruments were telling them; they thought their instruments were malfunctioning." (National Center for Atmospheric Research)
In fact, the data were rejected as unreasonable by data quality control algorithms.
Slide 30: One Truth for Sensor Nets?
- How clean is "clean enough"? How much cleaning is too much?
- Answers are likely to be: domain-specific, sensor-specific, application-specific, user-specific, all of the above?
- How to split between shared and application-specific cleaning?
Slide 31: Fuzzy Truth
- One solution is to make the shared interface richer.
- Probabilistic data management is also the key to "Calm Computing".
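The deck does not define what the richer interface returns; one hedged sketch is to expose readings annotated with a confidence instead of a single hard-cleaned value. The tuple shape and the probability rule below are assumptions for illustration only:

```python
def tag_presence_with_confidence(read_counts, expected_reads):
    """Return {tag_id: probability} instead of a hard present/absent decision.
    The probability here is simply observed reads / expected reads, capped at
    1.0: a stand-in for a real probabilistic model."""
    return {tag: min(1.0, n / expected_reads) for tag, n in read_counts.items()}

# Tag "A" was read 9 of an expected 10 times in the window, "B" only twice;
# the application decides how to act on the two confidence levels.
print(tag_presence_with_confidence({"A": 9, "B": 2}, expected_reads=10))
# -> {'A': 0.9, 'B': 0.2}
```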
Slide 32: Adding Quality Assessment
A. Das Sarma, S. Jeffery, M. Franklin, J. Widom, "Estimating Data Stream Quality for Object-Detection Applications", 3rd Intl. ACM SIGMOD Workshop on Information Quality in Information Systems, 2006.
Slide 33: "Data Furnace" Architecture
[Diagram: layered architecture with a service layer; probabilistic reasoning and uncertainty management; data model learning; complex event processing; and data archiving and streaming.]
(Garofalakis et al., IEEE Data Engineering Bulletin, March 2006)
Slide 34: Rethinking Service Abstractions
[Diagram: the service diagram from slide 10 revisited: data cleaning, query/data collection, quality estimation, scheduling, monitoring, actuation, tasking/programming, provisioning, evolution.]
We will need to understand the shared/custom tradeoffs for all of these.
Slide 35: Conclusions
- Much current sensor research is focused on the "single user" or "single app" model.
- Sensor networks will be shared resources.
- We can leverage some ideas from current shared data management infrastructures.
- But new solutions, abstractions, and architectures will be required.