10/5/2015 Geometric Approach. Geometric Interpretation: Each node holds a statistics vector.

Presentation transcript:

Geometric Approach. Geometric interpretation: each node holds a statistics vector. Coloring the vector space: grey where function > threshold, white where function <= threshold. Goal: determine the color of the global data vector (the average).
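The coloring above can be sketched in a few lines. This is only an illustration: the monitored function `squared_norm`, the threshold value, and the sample vectors are hypothetical choices, not taken from the slides.

```python
def average(vectors):
    """Component-wise average of equally weighted local statistics vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / n for k in range(dim)]

def color(f, v, threshold):
    """Grey where f(v) > threshold, white where f(v) <= threshold."""
    return "grey" if f(v) > threshold else "white"

def squared_norm(v):
    # Hypothetical monitored function, chosen only for illustration.
    return sum(x * x for x in v)

# Three nodes, each holding a 2-D local statistics vector (made-up values).
local_vectors = [[1.0, 0.0], [0.0, 3.0], [2.0, 3.0]]
global_vector = average(local_vectors)

# The monitoring goal: the color of the global vector.
print(global_vector, color(squared_norm, global_vector, threshold=4.0))
```

In a real deployment no single node can evaluate `global_vector` directly, which is why the following slides look for conditions each node can check locally.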

Bounding the Convex Hull. Observation: the average is in the convex hull, so if the convex hull is monochromatic then the average is too. But the convex hull may become large.

Drift Vectors. Periodically calculate an estimate vector: the current global average. Each node maintains a drift vector: the change in the local statistics vector since the last time the estimate vector was calculated. The global average statistics vector is also the average of the drift vectors.
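A small numeric check of the last statement. It assumes that each drift vector is anchored at the estimate vector, i.e. drift = estimate + (current local vector - local vector at the last synchronization); the slide text does not spell this out, but the averaging identity needs it. All values and names are made up for illustration.

```python
def average(vectors):
    """Component-wise average of equally weighted vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / n for k in range(dim)]

def drift_vector(estimate, at_sync, current):
    # Assumption: the drift vector is the estimate shifted by the local change.
    return [e + (c - s) for e, s, c in zip(estimate, at_sync, current)]

# Local statistics vectors at the last synchronization and now.
at_sync = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
current = [[2.0, 2.0], [3.0, 7.0], [4.0, 0.0]]

estimate = average(at_sync)  # global average at synchronization time
drifts = [drift_vector(estimate, s, c) for s, c in zip(at_sync, current)]

global_now = average(current)
average_drift = average(drifts)
print(global_now, average_drift)
```

Under this assumption the two printed vectors coincide, so reasoning about the drift vectors is enough to reason about the current global average.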

The Bounding Theorem [SIGMOD’06]. A reference point is known to all nodes. Each vertex constructs a sphere. Theorem: the convex hull is bounded by the union of spheres. This yields local constraints!
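The theorem can be checked numerically on a toy instance. The sketch assumes that the reference point is the estimate vector and that each sphere has the segment from the estimate to the node's drift vector as its diameter; the sample points, the sampling scheme, and the helper names are illustrative assumptions.

```python
import math
import random

def ball(estimate, drift):
    """Ball whose diameter is the segment from the estimate to a drift vector."""
    center = [(e + u) / 2 for e, u in zip(estimate, drift)]
    radius = math.dist(estimate, drift) / 2
    return center, radius

def in_some_ball(point, balls, eps=1e-9):
    """True if the point lies in at least one ball (eps absorbs rounding)."""
    return any(math.dist(point, c) <= r + eps for c, r in balls)

def random_convex_combination(points, rng):
    """Random point of the convex hull of the given points."""
    weights = [rng.random() for _ in points]
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)]

estimate = [3.0, 2.0]
drifts = [[4.0, 2.0], [3.0, 5.0], [2.0, 2.0]]
balls = [ball(estimate, u) for u in drifts]

rng = random.Random(0)  # fixed seed so the check is repeatable
samples = [random_convex_combination([estimate] + drifts, rng)
           for _ in range(1000)]
all_covered = all(in_some_ball(p, balls) for p in samples)
print(all_covered)
```

Sampling is of course not a proof; it only illustrates that every point of the hull falls inside a sphere that some single node can construct on its own.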

Basic Algorithm. An initial estimate vector is calculated. Nodes check the color of their drift spheres; the drift vector is the diameter of the drift sphere. If any sphere is not monochromatic, its node triggers re-calculation of the estimate vector.
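A minimal sketch of the local check, under two stated assumptions: the drift sphere spans from the estimate vector to the drift vector, and the monitored function is linear, f(x) = w·x, so its range over a ball has a closed form. General functions need a function-specific monochromaticity test. The weights, threshold, and vectors are hypothetical.

```python
import math

def drift_ball(estimate, drift):
    """Ball whose diameter runs from the estimate to the drift vector."""
    center = [(e + u) / 2 for e, u in zip(estimate, drift)]
    radius = math.dist(estimate, drift) / 2
    return center, radius

def linear_range_on_ball(w, center, radius):
    """Minimum and maximum of f(x) = w.x over the ball."""
    mid = sum(wi * ci for wi, ci in zip(w, center))
    spread = radius * math.sqrt(sum(wi * wi for wi in w))
    return mid - spread, mid + spread

def is_monochromatic(w, threshold, estimate, drift):
    center, radius = drift_ball(estimate, drift)
    lo, hi = linear_range_on_ball(w, center, radius)
    # All white (f <= threshold everywhere) or all grey (f > threshold everywhere).
    return hi <= threshold or lo > threshold

def nodes_requesting_sync(w, threshold, estimate, drifts):
    """Indices of nodes whose sphere crosses the threshold surface."""
    return [i for i, u in enumerate(drifts)
            if not is_monochromatic(w, threshold, estimate, u)]

w = [1.0, 1.0]
estimate = [3.0, 2.0]
drifts = [[4.0, 2.0], [3.0, 5.0], [2.0, 2.0]]
violators = nodes_requesting_sync(w, 8.0, estimate, drifts)
print(violators)
```

A non-empty result means at least one node would trigger re-calculation of the estimate vector; an empty result means no communication is needed.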

Reuters Corpus (RCV1-v2). 800,000+ news stories. Aug Aug. Corporate/Industrial tagging. n=10: 10 nodes, random data distribution.

Trade-off: Accuracy vs. Performance. Inefficiency: the value of the function on the average is close to the threshold. Performance can be enhanced at the cost of a less accurate result: set an error margin around the threshold value.
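One possible reading of the error margin, sketched below: a node compares its sphere against a threshold shifted away from the side the estimate currently lies on, so it stays silent longer, and the reported color can only be wrong when the true value is within the margin of the threshold. The function names and the exact relaxation rule are assumptions, not taken from the slides.

```python
def effective_threshold(threshold, margin, estimate_value):
    # Assumption: relax the threshold away from the estimate's current side.
    if estimate_value <= threshold:   # estimate is white
        return threshold + margin
    return threshold - margin         # estimate is grey

def interval_is_safe(lo, hi, threshold, margin, estimate_value):
    """lo and hi bound the function over the node's drift sphere."""
    t = effective_threshold(threshold, margin, estimate_value)
    if estimate_value <= threshold:
        return hi <= t   # still acceptably white
    return lo > t        # still acceptably grey

# A sphere whose function values range over [4.38, 8.62], threshold 8,
# estimate value 5 (white).
strict = interval_is_safe(4.38, 8.62, 8.0, 0.0, 5.0)
relaxed = interval_is_safe(4.38, 8.62, 8.0, 1.0, 5.0)
print(strict, relaxed)
```

With a zero margin the node must report; with a margin of 1 it stays silent, which is the accuracy-for-performance trade described above.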

Performance Analysis

Performance Analysis (cont.)

Balancing. Globally calculating the average is costly. It is often possible to average only some of the data vectors.

Shape Sensitivity [PODS’08]. Fitting the cover to the data. Fitting the cover to the threshold surface. Specific function classes.

Fitting Cover to Data (using the covariance matrix)

Fitting Cover to Threshold Surface -- Reference Vector Selection

Distance Fields: Skeleton, Medial Axis

Results – Shape Sensitivity

Prediction-Based Geometric Monitoring [SIGMOD’12]. Stricter local constraints if local predictions remain accurate. Keeping up with v(t) movement. [Figure: estimate e with drift vectors ΔV1 to ΔV5, and predicted estimate e_p with drift vectors ΔVp1 to ΔVp5, shown with v(t) and the threshold surface relating f(v(t)) to T.]

SRDC 2013. Local Constraints. Let the nodes communicate only when “something happens”. “Tell me only if your measurement is larger than 50!” “Send me your current measurements!” Safe Zones!
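The difference between the two requests can be made concrete with a toy message count. The readings and the function name are made up; only the local threshold of 50 comes from the slide.

```python
def messages_sent(measurements, local_threshold=50.0):
    """Return (time step, value) pairs a node reports under the local constraint."""
    return [(t, m) for t, m in enumerate(measurements) if m > local_threshold]

readings = [42.0, 47.5, 51.2, 49.9, 63.0]  # hypothetical sensor readings

sent = messages_sent(readings)
polled = len(readings)  # "send me your current measurements" at every step

print(len(sent), "messages instead of", polled)
```

The node stays silent while its measurement remains inside its allowed region, which is the idea that Safe Zones generalize to arbitrary regions.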

Local Distributions. It is reasonable to assume future data will behave similarly… These Safe Zones save more communication!

Optimal Safe Zones. 1. Legal / Safe. 2. Large: minimize communication.

Example: Air quality monitoring. What are the optimal Safe Zones…?

The Optimization Problem. Is this convex? Is this linear? How many constraints are these? BAD NEWS: This problem is NP-hard.

The Optimization Problem. Step 3: Use non-convex optimization toolboxes (e.g. Matlab’s “fmincon”). These toolboxes use sophisticated gradient descent algorithms and return close-to-optimal results.

Data Set: what the data looks like

Ratio Queries. Example of triangular Safe Zones.

Improvement over the convex-hull cover method. Why do we improve so much? Up to 200 nodes were involved in the experiment. The average improvement was by a factor of 17.5. ’000 hours.

Higher Dimensions

Chi-Square Monitoring (5D). Examples of axis-aligned boxes as Safe Zones.

Improvement over GM. The improvement over the Geometric Method gets more substantial in higher dimensions. 1’000 hours, 90 nodes.

Safe Zones - Example

Biclique: Non-Convex Safe Zones. Safe Zone algorithm (for 2 nodes): take the data points, build a bipartite graph (how?), find the maximal biclique; these are your Safe Zones!
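A brute-force sketch of the two-node idea. The slide leaves the graph construction open, so the following is an assumption: node 1's data points form one side, node 2's the other, and an edge joins a pair whose average is legal (here, at most a hypothetical threshold). "Maximal" is read as the largest number of edges, which is also an assumption. The data are one-dimensional and invented.

```python
from itertools import combinations

def legal(a, b, threshold):
    # Assumed legality test: the average of the two local values is below the threshold.
    return (a + b) / 2 <= threshold

def best_biclique(left, right, threshold):
    """Exhaustive search; exponential in len(left), for tiny inputs only."""
    best = ([], [])
    for size in range(1, len(left) + 1):
        for subset in combinations(left, size):
            # Right-side points compatible with every chosen left-side point.
            common = [b for b in right
                      if all(legal(a, b, threshold) for a in subset)]
            if len(subset) * len(common) > len(best[0]) * len(best[1]):
                best = (list(subset), common)
    return best

node1 = [1.0, 2.0, 9.0]
node2 = [3.0, 4.0, 8.0]
zone1, zone2 = best_biclique(node1, node2, threshold=4.9)
print(zone1, zone2)
```

Every pair drawn from the two returned sets is legal, so as long as each node's data stays inside its own set, no communication is required; the sets need not be convex.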

Conclusions. Local filtering for large-scale distributed data systems. Saving in communication is unlimited, bounded only by the aggregate over system lifetime. Saving bandwidth, central resources, power. Not necessary to sacrifice precision and latency. Less communication means more privacy.
