MicroHash: An Efficient Index Structure for Flash-Based Sensor Devices. Demetris Zeinalipour [ ], School of Pure and Applied Sciences, Open University of Cyprus. IBM Research, Zurich, Switzerland, Dec. 12th, 2008

2 Presentation Goals To provide an overview of Wireless Sensor Networks and related Data Acquisition Frameworks. To highlight some important storage and retrieval challenges that arise in this context.

3 Acknowledgements. This is joint work with my collaborators at the University of California, Riverside. Our results were presented in the following papers: –"MicroHash: An Efficient Index Structure for Flash-Based Sensor Devices", D. Zeinalipour-Yazti, S. Lin, V. Kalogeraki, D. Gunopulos and W. Najjar, The 4th USENIX Conference on File and Storage Technologies (FAST'05), San Francisco, USA, December 2005. –"Efficient Indexing Data Structures for Flash-Based Sensor Devices", S. Lin, D. Zeinalipour-Yazti, V. Kalogeraki, D. Gunopulos, W. Najjar, ACM Transactions on Storage (TOS), Vol. 2, No. 4, November 2006.

4 Presentation Outline 1. Overview of Wireless Sensor Networks 2. Overview of Data Acquisition Frameworks 3. The MicroHash Index Structure 4. MicroHash Experimental Evaluation 5. Conclusions and Future Work

5 Wireless Sensor Devices. Resource-constrained devices utilized for monitoring and researching the physical world at high fidelity. Examples: Xbow's i-mote2, UC-Riverside's RISE, Xbow's TelosB, UC-Berkeley's Mica2dot, Xbow's Mica.

6 Wireless Sensor Device. Radio: used for transmitting the acquired data to some storage site (SINK) at 9.6Kbps-250Kbps. Storage. Sensors: numeric readings in a limited range (e.g., temperature -40F..+250F with one decimal point precision) at a high frequency (2-2000Hz).

7 Wireless Sensor Network

8 Wireless Sensor Networks. Applications have already emerged in: –Environmental and habitat monitoring –Seismic and structural monitoring –Understanding animal migrations and interactions between species –Automation, tracking, hazard monitoring scenarios, urban monitoring, etc. Examples: Great Duck Island, Maine (temperature, humidity, etc.); Golden Gate Bridge, SF (vibration and displacement of the bridge structure); ZebraNet, Kenya (GPS trajectories).

9 Wireless Sensor Networks: The Great Duck Island Study (Maine, USA). Large-scale deployment by Intel Research, Berkeley (Maine, USA). Focuses on monitoring the microclimate in and around the nests of endangered species which are sensitive to disturbance. They deployed more than 166 motes installed in remote locations (such as 1,000 feet into the forest).

10 Wireless Sensor Networks WebServer

11 Wireless Sensor Networks The James Reserve Project, CA, USA Available at:

12 Wireless Sensor Networks: Microsoft's SenseWeb/SensorMap Technology. SenseWeb: a peer-produced sensor network that consists of sensors deployed by contributors across the globe. SensorMap: a mashup of SenseWeb's data on a map interface. Examples: Swiss Experiment (SwissEx) (6 sites in the Swiss Alps); Chicago (traffic, CCTV cameras, temperature, etc.). Available at:

13 Characteristics 1. The energy source is limited. Energy source: AA batteries, solar panels. 2. Local processing is cheaper than transmitting over the radio: transmitting 1 byte over the radio consumes as much energy as ~1200 CPU instructions. 3. Local storage is cheaper than transmitting over the radio: transmitting 512B over a single-hop 9.6Kbps (915MHz) radio requires 82,000μJ, while writing to local flash requires only 760μJ.
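A quick back-of-the-envelope check of the numbers above: 82,000 μJ / 760 μJ ≈ 108, so shipping a single 512B page over one radio hop costs roughly two orders of magnitude more energy than writing the same page to local flash, which is exactly what motivates storing and indexing data locally instead of transmitting it.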

14 Presentation Outline 1. Overview of Wireless Sensor Networks (WSN) 2. Overview of Data Acquisition Frameworks 3. The MicroHash Index Structure 4. MicroHash Experimental Evaluation 5. Conclusions and Future Work

15 Centralized Storage A Database that collects readings from many Sensors. Centralized: Storage, Indexing, Query Processing, Triggers, etc.

16 Centralized Storage I: Crossbow's MoteView software. NO in-network aggregation/filtering. NO in-network storage. Available at:

17 Centralized Storage II: TinyDB - A Declarative Interface for Data Acquisition in Sensor Networks. In-network aggregation/filtering. Limited in-network storage (no indexes). e.g., SELECT MAX(temp) FROM sensors. Available at:

18 Centralized Storage: Conclusions. Frameworks such as TinyDB: -Are suitable for continuous queries. -Push aggregation into the network but keep much of the processing at the sink. New challenges: -Many applications DON'T require the continuous evaluation of user queries (e.g., historic query: find the average temperature for the last 6 months). -In many applications there is no sink (e.g., remote deployments and mobile sensor networks). -Local storage on sensor devices keeps growing: RISE supports a 1GB external SD Card; the i-mote2 supports 32MB Flash / 32MB SRAM.

19 Our Model: In-Situ Data Storage. 1. Data remains in-situ (at the generating site) in a sliding window fashion. 2. When required, users conduct on-demand queries to retrieve information of interest. A network of sensor databases.

20 In-Situ Data Storage: Motivation. Center for Conservation, UCR: research on soil organisms. –A set of sensors monitor the CO2 levels in the soil over a large window of time. –Not a real-time application. –Most acquired values are not of particular interest. D. Zeinalipour-Yazti, S. Neema, D. Gunopulos, V. Kalogeraki and W. Najjar, "Data Acquisition in Sensor Networks with Large Memories", IEEE Intl. Workshop on Networking Meets Databases NetDB (ICDE'2005), Tokyo, Japan, 2005.

21 Presentation Outline 1. Overview of Wireless Sensor Networks 2. Overview of Data Acquisition Frameworks 3. The MicroHash Index Structure 4. MicroHash Experimental Evaluation 5. Conclusions and Future Work

22 Flash Memory at a Glance. The most prevalent storage medium used for sensor devices is flash memory (NAND flash). Fastest growing memory market ('05: $8.7B, '06: $11B). (NAND) Flash advantages: Simple cell architecture (high capacity in a small surface area) => economical reproduction. Fast random access (50-80 μs) compared to 10-20ms in disks. Shock resistant. Power efficient. (Pictures: surface-mount NAND flash; removable NAND devices.)

23 Flash Memory at a Glance. Asymmetric read/write energy cost! Measurements using RISE (page size = 512 B): Read = 24 μJ, Write = 763 μJ, Block Erase = 425 μJ. 1. Write-Constraint: writing can only be performed at a page granularity (256B~512B) to an empty page (if occupied we need to delete its content). 2. Delete-Constraint: erasure of a page can only be performed at a block granularity (i.e., 8KB~64KB). 3. Wear-Constraint: each page can only be written a limited number of times (typically 10,000-100,000).
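To make the first two constraints concrete, the rough C sketch below (not from the MicroHash code; the driver calls, page size and block size are assumptions for illustration) shows what an "in-place" page update would have to do on NAND flash, and why an index should write out-of-place instead:

    #include <stdint.h>
    #include <string.h>

    #define PAGE_SIZE        512              /* one NAND flash page            */
    #define PAGES_PER_BLOCK  16               /* e.g., an 8KB erase block       */

    /* hypothetical low-level driver calls */
    void flash_read_raw(uint32_t page, void *buf);
    void flash_write_raw(uint32_t page, const void *buf);
    void flash_erase_block(uint32_t block);

    /* Updating one page "in place" forces a read-erase-rewrite of its block. */
    void update_page_in_place(uint32_t page, const void *new_data)
    {
        static uint8_t copy[PAGES_PER_BLOCK][PAGE_SIZE];
        uint32_t block = page / PAGES_PER_BLOCK;
        uint32_t base  = block * PAGES_PER_BLOCK;

        for (uint32_t i = 0; i < PAGES_PER_BLOCK; i++)   /* save the whole block */
            flash_read_raw(base + i, copy[i]);
        flash_erase_block(block);                        /* erase 8KB~64KB       */
        memcpy(copy[page - base], new_data, PAGE_SIZE);  /* patch the one page   */
        for (uint32_t i = 0; i < PAGES_PER_BLOCK; i++)   /* rewrite everything   */
            flash_write_raw(base + i, copy[i]);
        /* This also burns one erase cycle for every page in the block, which is
           why MicroHash only appends to empty pages and recycles whole blocks. */
    }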

24 MicroHash Index Objectives. General objectives: Provide efficient access to any record stored on flash by timestamp or value. Execute a wide spectrum of queries based on our index, similarly to generic DB indexes. Design objectives (adhere to flash constraints): Avoid wearing out specific pages. Minimize random-access deletions of pages. Minimize main memory (SRAM) structures; SRAM is extremely limited (8KB-64KB). Small memory footprint => quick initialization.

25 Main Structures 4 Page Types: a) Root Page, b) Directory Page, c) Index Page and d) Data Page 4 Phases of Operation: a) Initialization, b) Growing, c) Repartition and d) Garbage Collect.
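As a rough illustration of these structures, the C sketch below mirrors the page types described in this talk; the field names and sizes are assumptions made for exposition, not the exact on-flash layout from the paper:

    #include <stdint.h>

    #define PAGE_SIZE 512                     /* one flash page                   */

    typedef struct {                          /* data record: timestamp + value   */
        uint32_t timestamp;
        uint16_t value;                       /* e.g., a temperature reading      */
    } data_record;

    typedef struct {                          /* index record (NoOffset layout)   */
        uint32_t data_page_id;                /* data page holding the record     */
    } index_record;

    typedef struct {                          /* directory record (hash bucket)   */
        uint16_t lo, hi;                      /* value range mapped to the bucket */
        uint32_t last_index_page;             /* last known index page address    */
        uint16_t entries_since_split;         /* "C" on the repartition slide     */
        uint32_t last_add_ts;                 /* "S" on the repartition slide     */
    } dir_bucket;

    typedef struct {                          /* root page: state of the media    */
        uint32_t idx;                         /* position of the last write       */
        uint32_t cycle;                       /* current write cycle              */
    } root_page;

    typedef struct {                          /* index page                       */
        uint32_t anchor;                      /* last known data-record timestamp */
        uint16_t count;                       /* index records actually stored    */
        uint16_t prev;                        /* previous index page of bucket    */
        index_record recs[(PAGE_SIZE - 8) / sizeof(index_record)];
    } index_page;

    typedef struct {                          /* data page                        */
        data_record recs[PAGE_SIZE / sizeof(data_record)];
    } data_page;

On the real device these structures would be serialized without compiler padding; the sketch only fixes the vocabulary used in the rest of the talk.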

26 Growing the MicroHash Index. Collect data in an SRAM buffer page Pwrite. When Pwrite is full, flush it out to the flash media. Next, create index records for each data record in Pwrite. If SRAM gets full, index pages are forced out to the flash media by an LRU policy. (Figure: a data record (ts, 74F) entering the buffer Pwrite, the directory, and the buffered index pages.)
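A simplified C sketch of this insert path follows (it reuses the page-type definitions sketched above; the write-buffer layout and the helpers flash_write_page(), hash_lookup(), buffered_index_page(), lru_evict_and_allocate() and index_append() are hypothetical names, not the actual MicroHash API):

    #define DATA_RECS_PER_PAGE (PAGE_SIZE / sizeof(data_record))

    typedef struct {                          /* the in-SRAM write buffer Pwrite  */
        data_record recs[DATA_RECS_PER_PAGE];
        uint16_t    count;
    } write_buffer;

    static write_buffer p_write;

    /* hypothetical helpers, implemented elsewhere */
    uint32_t    flash_write_page(const write_buffer *p);    /* returns page id    */
    dir_bucket *hash_lookup(uint16_t value);                 /* directory lookup   */
    index_page *buffered_index_page(dir_bucket *b);          /* NULL if not in SRAM*/
    index_page *lru_evict_and_allocate(dir_bucket *b);       /* flush LRU page     */
    void        index_append(index_page *ip, uint32_t data_page_id);

    void microhash_add(data_record r)
    {
        p_write.recs[p_write.count++] = r;                   /* 1. buffer in SRAM  */
        if (p_write.count < DATA_RECS_PER_PAGE)
            return;

        uint32_t page_id = flash_write_page(&p_write);       /* 2. flush data page */
        for (uint16_t i = 0; i < DATA_RECS_PER_PAGE; i++) {  /* 3. index records   */
            dir_bucket *b  = hash_lookup(p_write.recs[i].value);
            index_page *ip = buffered_index_page(b);
            if (ip == NULL)                                   /* SRAM full: evict   */
                ip = lru_evict_and_allocate(b);
            index_append(ip, page_id);
        }
        p_write.count = 0;                                    /* 4. reset Pwrite    */
    }

Note that the data page is flushed before its index records are created, so an index record always points to a page that is already on flash.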

27 Growing the MicroHash Index. (Figure: a populated flash media; idx points to the next empty page.)

28 Garbage Collection in MicroHash. When the media gets full, some pages need to be deleted => delete the oldest pages. Oldest block? The next block following the idx pointer. Note: this might create invalid index records, which will be handled by our search algorithm.
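A C sketch of this policy, using the constants and driver stubs from the earlier sketches (the block arithmetic is an assumption for illustration):

    /* When the media is full, erase the block immediately after the write
       pointer: because pages are written sequentially and wrap around, that
       block always holds the oldest pages. Index records pointing into it
       become invalid and are detected later by the search algorithm. */
    void garbage_collect(root_page *root, uint32_t total_pages)
    {
        uint32_t total_blocks = total_pages / PAGES_PER_BLOCK;
        uint32_t cur_block    = root->idx / PAGES_PER_BLOCK;
        uint32_t oldest_block = (cur_block + 1) % total_blocks;
        flash_erase_block(oldest_block);                 /* frees a whole block */
    }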

29 Directory Repartition in MicroHash. MicroHash starts out with a directory that is segmented into equiwidth buckets (e.g., divide the temperature range [0,100] into c buckets). This is not efficient, as certain buckets will never be utilized – consider the first few or last few buckets below.

30 Directory Repartition in MicroHash. If bucket A links to more than τ index pages, evict the least used bucket B and segment the full bucket A into A and A'. We want to avoid bucket reassignments of old records, as this would be very expensive. (Example with τ=3; per bucket, C = #entries since last split and S = timestamp of last addition.)
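The C sketch below illustrates the split rule with the threshold τ (the bookkeeping fields, the least_used_bucket() helper, and the way A' inherits A's chain are assumptions for exposition; old records are deliberately not re-hashed, matching the slide):

    #define TAU 3                                   /* split threshold (example)  */

    int least_used_bucket(dir_bucket dir[], int nbuckets);    /* hypothetical     */

    void maybe_repartition(dir_bucket dir[], int nbuckets, int a,
                           uint16_t index_pages_of_a)
    {
        if (index_pages_of_a <= TAU)
            return;                                 /* bucket A is not "full" yet */

        int b = least_used_bucket(dir, nbuckets);   /* evict the least used one   */
        uint16_t mid = (uint16_t)((dir[a].lo + dir[a].hi) / 2);

        dir[b].lo = (uint16_t)(mid + 1);            /* B now becomes A'           */
        dir[b].hi = dir[a].hi;
        dir[b].last_index_page     = dir[a].last_index_page;
        dir[b].entries_since_split = 0;
        dir[a].hi = mid;                            /* A keeps the lower half     */
        dir[a].entries_since_split = 0;

        /* Records indexed under the old, wider range of A stay where they are;
           re-hashing them on flash would be very expensive, so searches simply
           follow the existing index-page chains. */
    }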

31 Searching in MicroHash. Searching by value: "Find the timestamp(s) on which the temperature was 100F" –A simple operation in MicroHash –We simply find the right directory bucket, from there the respective index page, and then the data record (page-by-page). Searching by timestamp: "Find the temperature of some sensor at a given timestamp tq" –Problem: index pages are mixed together with data pages. –Solutions: 1. Binary Search (O(log n), 18 pages for a 128MB media) 2. LBSearch (less than 10 pages for a 128MB media) 3. ScaleSearch (better than LBSearch, ~4.5 pages for a 128MB media)
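A sketch of the search-by-value path in C, reusing the types and hash_lookup() from the earlier sketches (INVALID_PAGE, the flash accessors and emit_matches() are hypothetical helpers; the chain of index pages is followed via each page's link to its predecessor):

    #define INVALID_PAGE 0xFFFFFFFFu

    void     flash_read_page(uint32_t page, void *buf);        /* hypothetical */
    uint32_t index_page_prev(const index_page *ip);            /* hypothetical */
    void     emit_matches(const data_page *dp, uint16_t value);/* hypothetical */

    void search_by_value(uint16_t value)
    {
        dir_bucket *b = hash_lookup(value);            /* 1. find the bucket      */
        uint32_t ip_addr = b->last_index_page;
        index_page ip;
        data_page  dp;

        while (ip_addr != INVALID_PAGE) {              /* 2. walk the index pages */
            flash_read_page(ip_addr, &ip);
            for (uint16_t i = 0; i < ip.count; i++) {  /* 3. fetch data pages     */
                flash_read_page(ip.recs[i].data_page_id, &dp);
                emit_matches(&dp, value);              /*    scan page for value  */
            }
            ip_addr = index_page_prev(&ip);            /* follow chain backwards  */
        }
    }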

32 LBSearch and ScaleSearch. Solutions to the search-by-timestamp problem: A) LBSearch: we recursively compute a lower bound on the position of tq until the given timestamp is located. B) ScaleSearch: quite similar to LBSearch, however in the first step we proceed more aggressively (by exploiting the data distribution). (Figure: for the query tq=500, the search successively lands on timestamps 300, 350, 420, 490 and finally 500.)
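A simplified LBSearch sketch in C (the value of R and the page accessors are assumptions). The key point is the lower-bound jump: since a data page stores at most R records with increasing timestamps, the page holding tq lies at least (tq - t) / R pages ahead of any page whose smallest timestamp is t:

    #define R 60   /* assumed max data records per page */

    uint32_t page_min_timestamp(uint32_t page);          /* hypothetical accessor */
    int      page_contains(uint32_t page, uint32_t tq);  /* hypothetical accessor */

    uint32_t lb_search(uint32_t start_page, uint32_t tq)
    {
        uint32_t p = start_page;
        for (;;) {
            if (page_contains(p, tq))              /* found the data page          */
                return p;
            uint32_t t = page_min_timestamp(p);    /* smallest timestamp in page   */
            uint32_t jump = (tq - t) / R;          /* lower bound on pages ahead   */
            p += (jump > 0) ? jump : 1;            /* always make forward progress */
        }
    }

ScaleSearch makes the first jump more aggressive by scaling it with the observed record density rather than the worst-case R, which is why it needs fewer page reads than LBSearch in the evaluation.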

33 Searching Bottlenecks. Index pages written on flash might not be fully occupied. When we access these pages we transfer a lot of empty bytes (padding) between the flash media and SRAM. Proposed solutions: –Solution 1: Two-Phase Page Reads –Solution 2: ELF-like Chaining of Index Pages

34 Improving Search Performance. Solution 1: Utilize Two-Phase Page Reads. –Read the 8B header from the flash media. –Then read the correct payload in the next phase.
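A sketch of a two-phase read in C (flash_read() with a byte offset is a hypothetical driver call, and the exact layout of the 8-byte header is an assumption):

    typedef struct {                       /* assumed 8-byte index page header   */
        uint32_t anchor;                   /* last known data-record timestamp   */
        uint16_t count;                    /* index records actually present     */
        uint16_t prev;                     /* link to the previous index page    */
    } index_page_header;

    /* hypothetical: read `len` bytes starting at `offset` within a flash page */
    void flash_read(uint32_t page, uint32_t offset, void *buf, uint32_t len);

    uint16_t two_phase_read(uint32_t page, index_record *out)
    {
        index_page_header hdr;
        flash_read(page, 0, &hdr, sizeof hdr);                 /* phase 1: 8B header  */
        flash_read(page, sizeof hdr, out,                      /* phase 2: only the   */
                   hdr.count * (uint32_t)sizeof(index_record));/* occupied payload    */
        return hdr.count;                                      /* padding never moves */
    }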

35 Improving Search Performance. Solution 2: Avoid non-full index pages using ELF*. –ELF: a linked list in which each page, other than the last page, is completely full. –Keeps copying the last non-full page into a newer page when new records are requested to be added. *Dai et al., ELF: An Efficient Log-Structured Flash File System, SenSys 2004
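A sketch of the ELF-style append in C, reusing the earlier types and accessors (flash_write_new_page() is a hypothetical helper): because a flash page cannot be rewritten in place, appending to the last, non-full page of a chain means copying its contents plus the new record into a freshly written page, so every page in the chain except the last stays completely full.

    uint32_t flash_write_new_page(const index_page *ip);    /* hypothetical */

    /* Append one index record ELF-style and return the address of the new
       last page of the chain (the stale copy is reclaimed later by GC). */
    uint32_t elf_append(uint32_t last_page, uint32_t data_page_id)
    {
        index_page ip;
        flash_read_page(last_page, &ip);        /* load the last, non-full page */
        ip.recs[ip.count++].data_page_id = data_page_id;
        return flash_write_new_page(&ip);       /* rewrite it as a new page     */
    }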

36 Presentation Outline 1. Overview of Wireless Sensor Networks 2. Overview of Data Acquisition Frameworks 3. The MicroHash Index Structure 4. MicroHash Experimental Evaluation 5. Conclusions and Future Work

37 Experimental Evaluation. Implemented MicroHash in nesC. We tested it using TinyOS along with a trace-driven experimental methodology. Datasets: –Washington State Climate: a 268MB dataset of readings. –Great Duck Island: 97,000 readings between October and November. Evaluation parameters: i) Space Overhead, ii) Energy Overhead, iii) Search Performance.

38 Space Overhead of Index. Measure: IndexPages / (DataPages + IndexPages). Two index page layouts: –Offset: an index record has the form {datapageid, offset}. –NoOffset: an index record has the form {datapageid}. Setup: 128 MB flash media (256,000 pages). Conclusions: space overhead is minimized with a) a larger buffer, b) NoOffset, and c) pressure data.

39 Space Overhead of Index. Bitmap representations of the flash media (black denotes the index pages), showing index occupancy and index vs. data pages for a 2.5KB and a 10KB buffer: increasing the buffer decreases the index overhead.

40 Search Performance. Measure: # of page reads to find a record by timestamp. Two index page layouts (128MB flash, varying SRAM): –Anchor: index pages store the last known timestamp. –No Anchor: the timestamp is only stored in data pages. Conclusions: search performance improves with a) a larger write buffer during indexing, b) anchors, and c) ScaleSearch.

41 Indexing the Great Duck Island Trace. Used a 3KB index buffer and a 4MB flash card to store all 97,000 data readings. –The index never requires more than 28% additional space. –Indexing the records adds only a small energy overhead (the energy cost of storing the records on flash without an index is 3042mJ). –We were able to find any record by its timestamp with 4.75 page reads on average.

42 Presentation Outline 1. Overview of Wireless Sensor Networks 2. Overview of Data Acquisition Frameworks 3. The MicroHash Index Structure 4. MicroHash Experimental Evaluation 5. Conclusions and Future Work

43 Conclusions & Future Work We proposed the MicroHash index, which is an efficient external memory hash index for sensor devices that addresses the distinct characteristics of flash memory Our experimental evaluation shows that the structure we propose is both efficient and practical Future work: –Develop a complete library of indexes and data structures (stacks, queues, b+trees, etc.) –Buffer optimizations and Online Compression –Support Range Queries

MicroHash: An Efficient Index Structure for Flash-Based Sensor Devices. Demetris Zeinalipour. Thank you! Questions? Related publications: "MicroHash: An Efficient Index Structure for Flash-Based Sensor Devices", D. Zeinalipour, S. Lin, V. Kalogeraki, D. Gunopulos, W. Najjar, In USENIX FAST'05. "Efficient Indexing Data Structures for Flash-Based Sensor Devices", ACM Transactions on Storage (TOS), November 2006. Presentation and publications available at:

Backup Slides

46 The Anatomy of a Sensor Device. Processor: in various (sleep, idle, active) modes. Power source: AA or coin batteries, solar panels. SRAM: used for the program code and for in-memory buffering. LEDs: used for debugging. Radio: used for transmitting the acquired data to some storage site (SINK) (9.6Kbps-250Kbps). Sensors: numeric readings in a limited range (e.g., temperature -40F..+250F with one decimal point precision) at a high frequency (2-2000Hz). Storage.

47 Sensor Devices & Capabilities Sensing Capabilities Light Temperature Humidity Pressure Tone Detection Wind Speed Soil Moisture Location (GPS) etc…

48 In-Network Storage: Data Centric Storage Outline 1.Data is stored on specific nodes in the network (e.g., humidity on node A and temperature on node B) 2. Locating Data can be performed without flooding (e.g., temperature-related data is stored on node B). Temperature Store Q: SELECT nodeid where temp=100F

The Programming Cycle: The Operating System. TinyOS (UC-Berkeley): a component-based architecture that allows programmers to wire together the minimum required components in order to minimize code size and energy consumption. (The operating system is really a number of libraries that can be statically linked to the sensor binary at compile time.) The Programming Language: nesC (Intel Research, Berkeley): an event-based C-variant optimized for programming sensor devices. "Hello World": blinking the red LED!

    event result_t Clock.fire() {
      state = !state;                 // toggle the stored LED state
      if (state) call Leds.redOn();   // drive the red LED accordingly
      else call Leds.redOff();
      return SUCCESS;                 // TinyOS 1.x event handlers return result_t
    }

The Programming Cycle: The Testing Environment. Debugging code directly on a sensor device is a tedious procedure. nesC allows programmers to compile their code to: (a) a binary file that is burnt to the sensor, or (b) a binary file that runs on a PC. TOSSIM (TinyOS Simulation) is the environment which allows programmers to run the PC binary directly on a PC. This enables accurate simulations, fine-grained energy modeling (with PowerTOSSIM) and visualization (TinyViz).

The Programming Cycle: The Pre-deployment Environment. Once you have created and debugged your code, you can perform a deployment in a laboratory environment. Harvard's MoteLab uses 190 sensors, powered from wall power and interconnected over Ethernet. The Ethernet is used only for debugging and reprogramming, while the radio is used for actual communication between motes. Motes can be reprogrammed through a web interface. Available at:

52 Page Types in MicroHash. Root Page –contains information related to the state of the flash media, e.g., the position of the last write (idx), the current write cycle (cycle) and meta-information about the various indexes stored on the flash media. Directory Page (the hash table) –contains a number of directory records (buckets), each of which contains the address of the last known index page mapped to this bucket. Index Page –contains a fixed number of index records and the 8-byte timestamp of the last known data record. The latter field, denoted as the anchor, is exploited by timestamp searches. Data Page –contains a fixed number of data records.

53 Search Performance. We compared MicroHash vs. ELF index page chaining by searching all values in the range [20,100]. Keeping full index pages in ELF increases search performance (about 10% fewer page reads) but decreases insertion performance (about 15% more page writes).