An Effective Coreset Compression Algorithm for Large Scale Sensor Networks
Dan Feldman, Andrew Sugaya, Daniela Rus
MIT


[Image] = Data

How much data?

1 GPS packet = 100 bytes (latitude, longitude, time), sent every 10 seconds

~40 MB / hour or ~1 GB / day per device

~300 million smart phones sold in

For 100 million devices: ~100 petabytes per day (~100 thousand terabytes per day)

Storage units of 2 terabytes each: ×50,000 / day

A lot of data.
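The slide's figures can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 GB = 10^9 bytes) and reads the per-hour and per-day figures as megabytes and gigabytes. It also shows that one 100-byte packet every 10 seconds yields only about 36 KB per hour (under 1 MB per day); the quoted ~40 MB/hour and ~1 GB/day correspond to roughly one packet every 10 milliseconds, so the per-device figures imply a much higher sampling rate than the one stated. The fleet-level totals follow from the rounded ~1 GB/day.

```python
# Decimal (SI) units, assumed here: 1 MB = 10**6 bytes, 1 GB = 10**9 bytes, etc.
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15


def bytes_per_day(packet_bytes: int, interval_ms: int) -> int:
    """Bytes one device produces per day when sending one packet every interval_ms."""
    return packet_bytes * 86_400_000 // interval_ms  # 86,400,000 ms per day


every_10_s = bytes_per_day(100, 10_000)   # rate as stated on the slide
every_10_ms = bytes_per_day(100, 10)      # rate matching ~40 MB/h and ~1 GB/day
per_hour_10_ms = 100 * 3_600_000 // 10    # 36,000,000 bytes, i.e. ~36 MB/hour

# Fleet-level totals using the slide's rounded ~1 GB/day per device.
fleet_bytes = 1 * GB * 100_000_000        # 100 million devices
units_per_day = fleet_bytes // (2 * TB)   # 2-terabyte storage units

print(every_10_s / MB, every_10_ms / GB)  # 0.864 0.864
print(fleet_bytes // PB, units_per_day)   # 100 50000
```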

GPS-points Data
iPhones can collect high-frequency GPS traces
GPS-point = (latitude, longitude, time)
[Table: sample GPS-points with columns latitude, longitude, time]

Example

3-D Visualization

Challenges
Storing data on the iPhone is expensive
Transmitting data is expensive
Raw data is hard to interpret
Data is dynamic, real-time, and streaming

Key Insight: Identify Critical Points
Approximate the n points by k << n semantically meaningful connected segments
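To make the key insight concrete, here is a minimal sketch that scores one candidate approximation: the connected polyline through a hypothetical list `keep` of critical-point indices. It assumes points are (time, latitude, longitude) tuples with distinct timestamps, treats latitude/longitude as planar coordinates, and measures squared error by interpolating the polyline at each original timestamp. The function name and the error measure are illustrative assumptions, not the paper's cost function.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (time, latitude, longitude)


def polyline_error(points: Sequence[Point], keep: Sequence[int]) -> float:
    """Sum of squared errors between each GPS point and the connected polyline
    through points[keep[0]], points[keep[1]], ..., evaluated at the point's time.

    Latitude/longitude are treated as planar coordinates (an assumption that
    is only reasonable over short distances).
    """
    if keep[0] != 0 or keep[-1] != len(points) - 1:
        raise ValueError("keep must start at the first point and end at the last")
    err = 0.0
    for a, b in zip(keep, keep[1:]):
        t0, lat0, lon0 = points[a]
        t1, lat1, lon1 = points[b]
        for t, lat, lon in points[a:b + 1]:
            w = (t - t0) / (t1 - t0)  # fraction of the segment's time elapsed
            err += (lat - (lat0 + w * (lat1 - lat0))) ** 2
            err += (lon - (lon0 + w * (lon1 - lon0))) ** 2
    return err


# An L-shaped trace: 4 steps east, then 4 steps north, one point per second.
trace = ([(float(t), 0.0, float(t)) for t in range(5)]
         + [(float(t), float(t - 4), 4.0) for t in range(5, 9)])
print(polyline_error(trace, [0, 4, 8]))   # 0.0: the corner is a critical point
print(polyline_error(trace, [0, 8]) > 0)  # True: one segment cuts the corner
```

Here 9 points are represented by 3 critical points (2 segments) with zero error, while dropping the corner point makes the error positive.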

Our Approach
[Figure: trajectory annotated with reverse-geocoded street names, all in Singapore: Central Expy, Ayer Rajah Expy, Chin Swee Rd, 261 Outram Rd, St Andrew's Rd, A Havelock Rd, A Raffles Ave, Raffles Blvd, N Buona Vista Rd, 5 Lower Kent Ridge Rd, 4 Medical Dr, Leonie Hill, 113 Devonshire Rd, Devonshire Rd, Grange Rd, 27 Grange Rd, Natl Youth Council, 25K Paterson Rd, Orchard Rd; table with columns time, latitude, longitude, starting around 8:44]

Solution overview
Semantically compress data points
– Use coresets
Fit lines to the semantic points
– Use splines on the coreset
Reverse geocode to get directions

Problem Statement
Input: a set P of n data points in $\mathbb{R}^d$ and an integer k
Output: an optimal k-spline for P that provides semantic compression for the large data set P
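For small inputs, an optimal k-segment approximation can be computed exactly with the classical dynamic program for segmented least squares, the kind of quadratic-time baseline that a coreset is meant to avoid running on all n points. The sketch below is that textbook DP, not the paper's coreset algorithm. It assumes each segment is an independent least-squares line in time for each coordinate (segments need not connect), and the names `segment_cost` and `optimal_k_segments` are hypothetical.

```python
from itertools import accumulate
from typing import Callable, List, Sequence, Tuple


def segment_cost(ts: Sequence[float],
                 ys: Sequence[Sequence[float]]) -> Callable[[int, int], float]:
    """Return cost(l, r): squared error of the best line y = a + b*t, fitted
    separately in each coordinate, to points l..r-1 (half-open range)."""
    d = len(ys[0])

    def prefix(values):
        # prefix(v)[i] = v[0] + ... + v[i-1], so segment sums take O(1) time.
        return [0.0] + list(accumulate(values))

    p1 = prefix([1] * len(ts))
    pt = prefix(ts)
    ptt = prefix([t * t for t in ts])
    py = [prefix([y[c] for y in ys]) for c in range(d)]
    pyy = [prefix([y[c] * y[c] for y in ys]) for c in range(d)]
    pty = [prefix([t * y[c] for t, y in zip(ts, ys)]) for c in range(d)]

    def cost(l: int, r: int) -> float:
        m = p1[r] - p1[l]
        st = pt[r] - pt[l]
        stt = (ptt[r] - ptt[l]) - st * st / m  # centered sum of t^2
        total = 0.0
        for c in range(d):
            sy = py[c][r] - py[c][l]
            syy = (pyy[c][r] - pyy[c][l]) - sy * sy / m
            sty = (pty[c][r] - pty[c][l]) - st * sy / m
            # Residual sum of squares of simple linear regression of y on t.
            sse = syy - sty * sty / stt if stt > 1e-12 else syy
            total += max(sse, 0.0)  # clamp tiny negative rounding noise
        return total

    return cost


def optimal_k_segments(ts, ys, k: int) -> Tuple[float, List[int]]:
    """Exact O(k n^2 d) DP: minimum total error and start index of each segment."""
    n = len(ts)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    cost = segment_cost(ts, ys)
    inf = float("inf")
    # best[j][i] = minimum error covering the first i points with j segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for l in range(j - 1, i):  # last segment covers points l..i-1
                c = best[j - 1][l] + cost(l, i)
                if c < best[j][i]:
                    best[j][i], start[j][i] = c, l
    starts, i = [], n
    for j in range(k, 0, -1):  # walk back through the table
        i = start[j][i]
        starts.append(i)
    return best[k][n], starts[::-1]


ts = list(range(10))
ys = [(2 * t,) for t in range(5)] + [(100,) for _ in range(5)]
print(optimal_k_segments(ts, ys, 2))  # (0.0, [0, 5])
```

The triple loop makes the cost quadratic in n, which is why running it only on a small compressed set of points is attractive.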

Related Work

Our Main Compression Theorem
Example application

Streaming and Parallel Computation

Previous Work for streaming

Streaming Compression using Merge & Reduce
[Figure: merge-and-reduce tree over the points p1, ..., p16]
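Merge & reduce can be sketched in a few lines: points are buffered into leaves of a fixed size and, as in binary addition, two summaries at the same level are merged (concatenated) and reduced back to a fixed size before being carried to the next level, so only O(log n) summaries are stored at any time. In this sketch the reduce step `thin` (keep evenly spaced points, including the first and last) is a hypothetical stand-in for a coreset construction, which it is not; the class and parameter names are also assumptions.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (time, value)


def thin(points: List[Point], size: int) -> List[Point]:
    """Hypothetical reduce step: keep `size` evenly spaced points (size >= 2),
    always including the first and last. A real coreset would go here."""
    if len(points) <= size:
        return list(points)
    step = (len(points) - 1) / (size - 1)
    return [points[round(i * step)] for i in range(size)]


class MergeReduce:
    def __init__(self, leaf_size: int,
                 reduce: Callable[[List[Point], int], List[Point]] = thin):
        self.leaf_size = leaf_size
        self.reduce = reduce
        self.buffer: List[Point] = []
        # levels[i] summarizes 2**i leaves, or is None (like bits of a counter).
        self.levels: List[Optional[List[Point]]] = []

    def add(self, p: Point) -> None:
        self.buffer.append(p)
        if len(self.buffer) < self.leaf_size:
            return
        carry = self.reduce(self.buffer, self.leaf_size)
        self.buffer = []
        level = 0
        while True:
            if level == len(self.levels):
                self.levels.append(None)
            if self.levels[level] is None:
                self.levels[level] = carry
                return
            # Merge with the older summary at this level, reduce, carry up.
            carry = self.reduce(self.levels[level] + carry, self.leaf_size)
            self.levels[level] = None
            level += 1

    def summary(self) -> List[Point]:
        """Stored points in time order: highest (oldest) level first."""
        out: List[Point] = []
        for s in reversed(self.levels):
            if s is not None:
                out += s
        return out + self.buffer


mr = MergeReduce(leaf_size=2)
for t in range(16):
    mr.add((float(t), 0.0))
print(mr.summary())  # [(0.0, 0.0), (15.0, 0.0)]
```

With 16 points and leaves of 2, the 8 leaves collapse into a single level-3 summary, mirroring the tree over p1, ..., p16.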

Our Main Streaming Theorem

Parallel Computation
[Figure: computation tree over the points p1, ..., p16]
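Because reductions in different subtrees of a merge-and-reduce tree are independent, the tree can be evaluated level by level with a worker pool. A minimal sketch, assuming a hypothetical reduce step `thin` that keeps evenly spaced points (a stand-in for a coreset construction) and Python's standard `concurrent.futures.ThreadPoolExecutor`; threads only illustrate the structure, and a real deployment would distribute leaves across processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Point = Tuple[float, float]  # (time, value)


def thin(points: List[Point], size: int) -> List[Point]:
    """Hypothetical reduce step: keep `size` evenly spaced points (size >= 2)."""
    if len(points) <= size:
        return list(points)
    step = (len(points) - 1) / (size - 1)
    return [points[round(i * step)] for i in range(size)]


def parallel_merge_reduce(points: List[Point], leaf_size: int,
                          workers: int = 4) -> List[Point]:
    if not points:
        return []
    level = [points[i:i + leaf_size] for i in range(0, len(points), leaf_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Reduce every leaf independently (pool.map preserves input order).
        level = list(pool.map(lambda s: thin(s, leaf_size), level))
        while len(level) > 1:
            # Merge neighbouring summaries pairwise; an odd one out passes through.
            merged = [sum(level[i:i + 2], []) for i in range(0, len(level), 2)]
            level = list(pool.map(lambda s: thin(s, leaf_size), merged))
    return level[0]


pts = [(float(t), 0.0) for t in range(16)]
print(parallel_merge_reduce(pts, leaf_size=2))  # [(0.0, 0.0), (15.0, 0.0)]
```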

Summary
[Figure: the annotated Singapore trajectory and time/latitude/longitude table shown under Our Approach]

[Figure: 5000 points → 300 points]

Running time

Space

Tested Data Sets

Name | No. of Users | Time Extent | Data Size (approx.) | Source
Subject in Singapore | 1 | 2 days | 300k | Probe device and iPhone application
Taxi-Cabs in San Francisco | 500 | 4 months | 300 MB | Public data (CRAWDAD)
Taxi-Cabs in Boston | 25 | 4 years | 15 GB | MIT

The Experiment

Experiments: Subject in Singapore
[Figure panels: Compression Ratio; Error Ratio]

Experiments: 500 San Francisco Taxi-Cabs

Website
[Figure panels: Coreset Display; Data Display]
Visualization of the result of the algorithm: a coreset

Contribution
Semantic compression of data from sensors
Line simplification with
– One pass over the data
– Logarithmic space (for massive data sets)
– Linear time
– Provable bounded error