Distributed Regression: an Efficient Framework for Modeling Sensor Network Data
Carlos Guestrin, Peter Bodik, Romain Thibaux, Mark Paskin, Samuel Madden

1 Distributed Regression: an Efficient Framework for Modeling Sensor Network Data
Carlos Guestrin, Peter Bodik, Romain Thibaux, Mark Paskin, Samuel Madden

2 Data collection paradigm
- Base station issues an SQL-style query
- Distribute query
- Collect data
- New query: redo process
- Goal: push beyond the paradigm of sensors as simple data-gathering devices

3 Example: temperature data from 10 nearby sensors
- Slow changes over time
- Measurements correlated
- 4 hours of data
- Collect all measurements: send 500 numbers
- VS using regression: approximate the measurements with a fitted function (formula not preserved in the transcript) and send 5 numbers!! (yet very good approximation)
- Data is highly correlated → Redundancy & Structure → Build lower dimensional representation
  - Compression for data transmission
  - Provide nodes with local view of global state
  - …
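The 500-versus-5 comparison can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are synthetic (not the authors' measurements), the split into 50 readings per sensor is one way to reach 500 numbers, and a degree-4 polynomial in time is just one model with exactly 5 coefficients; the slide's own formula is not preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the slide's scenario (NOT the authors' data):
# 10 nearby sensors, 50 readings each over 4 hours = 500 numbers.
n_sensors, n_steps = 10, 50
t = np.linspace(0.0, 4.0, n_steps)                  # time in hours
trend = 20.0 + 1.5 * t - 0.2 * t**2                 # slow shared drift (assumed)
readings = trend + 0.05 * rng.standard_normal((n_sensors, n_steps))

# Fit a single degree-4 polynomial in time to all readings: 5 coefficients.
t_all = np.tile(t, n_sensors)
coeffs = np.polyfit(t_all, readings.ravel(), deg=4)

# Reconstruct every reading from the 5 coefficients and measure the error.
approx = np.polyval(coeffs, t)
rms_error = np.sqrt(np.mean((readings - approx) ** 2))
print(readings.size, "numbers ->", coeffs.size, "numbers")
```

Because the synthetic sensors share one slow trend, the reconstruction error stays close to the noise level, which is the sense in which correlated data compresses well.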

4 The regression problem
- Given basis functions, find coeffs w = {w_1, …, w_k}
- Precisely, minimize the residual error
- Figure labels: a matrix of N sensors × K basis functions, a vector of K weights, and a vector of N sensor measurements
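Written in the usual least-squares form, the residual error on this slide would read as follows. This is a sketch under assumptions: the symbols h_i (basis functions), x_j (location of sensor j) and f_j (its measurement) are introduced here for illustration, since the slide's own formula is not preserved in the transcript.

```latex
w^{*} = \arg\min_{w} \sum_{j=1}^{N} \Bigl( f_j - \sum_{i=1}^{K} w_i \, h_i(x_j) \Bigr)^{2}
```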

5 Regression solution
- The solution is expressed through A, a k×k matrix for k basis functions, and b, a k×1 vector
- Problems:
  - Invert A: too expensive in one mote
  - "Gather" matrix A: NK^2 messages
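A minimal centralized sketch of this solution, assuming A and b are the standard normal-equation quantities A = HᵀH and b = Hᵀf, where H holds the basis-function values at each sensor. The sizes, sensor positions, basis functions and data below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

N, K = 40, 3                                  # hypothetical: N sensors, K basis functions
x = rng.uniform(0.0, 1.0, N)                  # hypothetical 1-D sensor positions
H = np.column_stack([np.ones(N), x, x**2])    # H[j, i] = value of basis function i at sensor j
f = 18.0 + 4.0 * x - 3.0 * x**2 + 0.01 * rng.standard_normal(N)  # synthetic readings

A = H.T @ H                                   # K x K matrix
b = H.T @ f                                   # K x 1 vector
w = np.linalg.solve(A, b)                     # coefficients that minimize the residual

# Cross-check against NumPy's least-squares solver.
w_ref, *_ = np.linalg.lstsq(H, f, rcond=None)
print(np.allclose(w, w_ref))
```

Solving this at a single node requires all of A and b in one place, which is the cost the slide's "Problems" bullets point at.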

6 Global temperature is complex
- Temperature surface is complex → Need complex basis functions? Lots of communication?

7 What are we missing?
- Temperature surface is complex, but there is lots of local structure!
- Local temperature regions
- Do the right thing in the overlaps

8 Kernel regression
- Local basis functions for each region
- Kernels average between regions
- Distributed algorithm for obtaining coefficients
  - Simple communication along a spanning tree
  - Robust to lost messages
- Need global optimization to find optimal coefficients

9 Kernel regression → Sparse matrices
- Kernel basis functions have local support → sparse basis
- Figure: sensors × basis functions matrix (sparse, with zero blocks); example basis function h_1
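The link between local support and sparsity can be sketched as follows. The triangular kernel shape, the four regions on a line, and the per-region basis (a constant and a linear term) are assumptions chosen for illustration, not the kernels used in the talk.

```python
import numpy as np

def kernel_weights(x, centers, width):
    """Normalized triangular kernels; each is nonzero only within `width` of its center."""
    raw = np.maximum(0.0, 1.0 - np.abs(x[:, None] - centers[None, :]) / width)
    return raw / raw.sum(axis=1, keepdims=True)

x = np.linspace(0.0, 3.0, 31)                  # hypothetical sensor positions on a line
centers = np.array([0.0, 1.0, 2.0, 3.0])       # hypothetical region centers
kappa = kernel_weights(x, centers, width=0.8)  # shape (31 sensors, 4 regions)

# Local basis per region: a constant and a linear term, weighted by the kernel.
local = np.column_stack([np.ones_like(x), x])  # shape (31, 2)
H = (kappa[:, :, None] * local[:, None, :]).reshape(len(x), -1)  # shape (31, 8)

A = H.T @ H
print(np.count_nonzero(H, axis=1).max(), "nonzeros at most per sensor row, out of", H.shape[1])
```

Columns are ordered region by region, so blocks of A that pair non-overlapping regions (for example the first and third) are exactly zero, while neighboring regions share a nonzero block.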

10 Gaussian Elimination
- A is sparse ⇒ Efficient Gaussian elimination
- Complete system [A|b] (figure: one row is subtracted from another)
- After Gaussian elimination, solve linear system by k simple divisions
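A minimal sketch of elimination on the augmented system [A|b], assuming A is symmetric positive definite so that no pivoting is needed. The tridiagonal matrix is a hypothetical example of a sparse A; the routine uses forward elimination followed by back substitution, which is one common way to finish the solve.

```python
import numpy as np

def solve_by_elimination(A, b):
    """Solve A w = b by forward elimination and back substitution (no pivoting)."""
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])  # [A|b]
    k = len(b)
    for i in range(k):
        for r in range(i + 1, k):
            if M[r, i] != 0.0:          # with sparse A, most rows need no work
                M[r, i:] -= (M[r, i] / M[i, i]) * M[i, i:]
    w = np.zeros(k)
    for i in reversed(range(k)):        # one division per unknown
        w[i] = (M[i, -1] - M[i, i + 1:k] @ w[i + 1:]) / M[i, i]
    return w

# Hypothetical sparse (tridiagonal) system.
A = np.array([[4.0, 1.0, 0.0, 0.0],
              [1.0, 4.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [0.0, 0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])
w = solve_by_elimination(A, b)
print(np.allclose(A @ w, b))
```

For this tridiagonal example each elimination step touches only the next row, which illustrates why a sparse A makes elimination cheap.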

11 Distributed regression
- Same matrices: complete system [A|b]
- Add message from node 1: one step of Gaussian elimination
- This subsystem is enough to compute w_2, w_3
- Sensor 2 can locally compute w_2, w_3
- Figure: nodes 1 and 2 with message M_12
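One way to read this slide is that node 1 eliminates the coefficient only it knows about and sends the remainder to node 2. The sketch below follows that reading; the two local systems are hypothetical numbers, and the names `M12_A` and `M12_b` are illustrative labels for the two parts of the message.

```python
import numpy as np

# Hypothetical local systems [A|b] in the global variable order (w1, w2, w3).
# Node 1's measurements involve only w1, w2; node 2's involve only w2, w3.
A1 = np.array([[2.0, 1.0, 0.0],
               [1.0, 3.0, 0.0],
               [0.0, 0.0, 0.0]])
b1 = np.array([1.0, 2.0, 0.0])
A2 = np.array([[0.0, 0.0, 0.0],
               [0.0, 2.0, 1.0],
               [0.0, 1.0, 4.0]])
b2 = np.array([0.0, 3.0, 5.0])

# Node 1 eliminates w1 (one Gaussian-elimination step on its own block);
# what remains over w2 is the message sent to node 2.
M12_A = A1[1, 1] - A1[1, 0] * A1[0, 1] / A1[0, 0]
M12_b = b1[1] - A1[1, 0] * b1[0] / A1[0, 0]

# Node 2 adds the message to its own (w2, w3) block and solves locally.
A_local = A2[1:, 1:].copy()
b_local = b2[1:].copy()
A_local[0, 0] += M12_A
b_local[0] += M12_b
w23 = np.linalg.solve(A_local, b_local)

# Reference: solve the complete system centrally.
w_full = np.linalg.solve(A1 + A2, b1 + b2)
print(np.allclose(w23, w_full[1:]))
```

The local answer matches the central one because eliminating w1 from the complete system changes only the w2 entry, and that change depends on node 1's data alone.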

12 Distributed Regression: solve global kernel regression problem with simple local communication
  1. Specify regions
  2. Sensors compute small matrices that add up to [A|b]
  3. Message passing
  4. Solve local systems
- Figure: example network with nodes 1 to 5
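The "small matrices that add up to [A|b]" step can be sketched as below, assuming the normal-equation form A = HᵀH and b = Hᵀf, so that sensor j contributes the outer product of its own row of basis values and that row scaled by its measurement. The basis values and measurements are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

N, K = 6, 3
H = rng.standard_normal((N, K))   # hypothetical basis values, one row per sensor
f = rng.standard_normal(N)        # hypothetical measurements

# Each sensor j computes its own small K x K matrix and K-vector from local data only.
local = [(np.outer(H[j], H[j]), H[j] * f[j]) for j in range(N)]

# Summing the local pieces reproduces the complete system.
A = sum(A_j for A_j, _ in local)
b = sum(b_j for _, b_j in local)
print(np.allclose(A, H.T @ H), np.allclose(b, H.T @ f))
```

With kernel basis functions most entries of each local matrix are zero, so a sensor only needs to handle the coefficients of the regions it belongs to.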

13 Communication pattern
- If kernels form a tree structure → communication along a spanning tree
- But kernels may not form a tree structure, and high-quality links may not align with the kernel topology
- In general: communication along a spanning tree using the junction tree data structure
- Figure: example network with nodes 1 to 7

14 Distributed junction trees
- Figure (labels only partly recoverable): a six-node network (nodes 1 to 6) whose nodes carry kernel sets such as {K1, K3}, {K1, K2}, {K3, K4}, {K4, K6}, {K3, K5} and {K5, K6}; the junction trees shown contain cliques ranging from pairs such as {K1, K2} up to the full set {K1, …, K6}
- Any spanning tree can be transformed into a junction tree
- Communication along the junction tree is guaranteed to obtain optimal parameters
- Different spanning trees lead to different junction trees with different computation and communication complexity
- See Paskin and Guestrin ’04 for spanning tree optimization
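A centralized sketch of how a spanning tree can be turned into a junction tree: whenever two nodes hold the same kernel, every node on the tree path between them must carry it too (the running intersection property). This is a textbook-style construction under stated assumptions, not the distributed algorithm of Paskin and Guestrin; the four-node chain and the kernel assignments are hypothetical.

```python
from itertools import combinations

def tree_path(tree, src, dst):
    """Return the unique path from src to dst in an undirected tree."""
    stack = [(src, [src])]
    seen = {src}
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        for nxt in tree[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [nxt]))
    raise ValueError("nodes are not connected")

def to_junction_tree(tree, local_vars):
    """Grow each node's clique until every kernel occupies a connected subtree."""
    cliques = {n: set(vs) for n, vs in local_vars.items()}
    all_vars = set().union(*local_vars.values())
    for v in all_vars:
        holders = [n for n, vs in local_vars.items() if v in vs]
        for a, b in combinations(holders, 2):
            for n in tree_path(tree, a, b):
                cliques[n].add(v)
    return cliques

# Hypothetical spanning tree (a chain 1-2-3-4) and the kernels each node belongs to.
tree = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
local_vars = {1: {"K1", "K2"}, 2: {"K2", "K3"}, 3: {"K3"}, 4: {"K1", "K4"}}

cliques = to_junction_tree(tree, local_vars)
print(cliques)
```

Here K1 is held by nodes 1 and 4, so it is added to nodes 2 and 3 in between. A different spanning tree over the same nodes would produce different cliques, which is why the choice of tree affects computation and communication cost.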

15 Robustness
- Robustness is key in sensor networks
  - Nodes may be added to the network or fail
  - Communication is unreliable
  - Link qualities change over time
- Distributed regression messages are robust: lost messages correspond to lost measurements
- Must make spanning tree and junction tree algorithms robust
- See Paskin and Guestrin ’04 for details
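The claim that lost messages correspond to lost measurements can be sketched in a simplified setting, assuming each sensor's contribution to [A|b] is delivered or lost as a unit. In the actual protocol a lost message may carry several sensors' contributions at once; the data, basis and the choice of which sensors are lost are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

N, K = 12, 2
x = rng.uniform(0.0, 1.0, N)                   # hypothetical sensor positions
H = np.column_stack([np.ones(N), x])           # basis values: constant and linear term
f = 20.0 + 2.0 * x + 0.1 * rng.standard_normal(N)  # synthetic readings

delivered = np.ones(N, dtype=bool)
delivered[[3, 7]] = False                      # hypothetical: two contributions are lost

# Sum only the contributions that arrived, then solve.
A = sum(np.outer(H[j], H[j]) for j in range(N) if delivered[j])
b = sum(H[j] * f[j] for j in range(N) if delivered[j])
w_lossy = np.linalg.solve(A, b)

# Same answer as ordinary regression on the sensors that got through.
w_subset, *_ = np.linalg.lstsq(H[delivered], f[delivered], rcond=None)
print(np.allclose(w_lossy, w_subset))
```

The result with losses is still a valid least-squares fit, just on fewer measurements, rather than a corrupted solution.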

16 Locally, nodes obtain global view
- Figure panels: View from node 1; View from node 17; View from node 46; Global solution

17 Temperature model for lab data

18 Convergence and robustness
- Plot legend: Distributed regression, reliable communication; Distributed regression, 50% packets lost; Offline solution

19 Incremental changes
- Plot legend: Distributed regression, reliable communication; Distributed regression, 50% packets lost; Offline solution
- Initializing with noon temperatures
- At 6pm, initializing from noon results

20 Residual error varies over time
- Regression with linear spatial components:
- Plot labels: Average over regions; Quadratic in time; Linear in time; Constant in time

21 Effect of time window

22 Communication complexity

23 Extensions and applications
- Adaptive sampling
- Outlier and faulty sensor detection
- Contour finding
- Adaptive data modeling
- Basis function selection
- Model-based bit compression
- Bounds on bit precision for Gaussian elimination applicable
- Hierarchical models
- Unifying with wavelet-based approaches
- Currently applying similar ideas to probabilistic inference, actuator control, …
- See Paskin and Guestrin ’04 for details

24 Conclusions
- General distributed regression algorithm for sensor networks
- Robust to node and message losses
- Kernel regression is an effective model for a wide range of sensor network data
- Provides a basis for new, more complex sensor network applications

25 Distributed regression
- Same matrices: complete system [A|b]
- Add message from node 1: one step of Gaussian elimination
- This subsystem is enough to compute w_2, w_3
- Sensor 2 can locally compute w_2, w_3
- Figure: nodes 1 and 2 with message M_12

