

GPS Sensor Web Time Series Analysis Using SensorGrid Technology

Robert Granat (1), Galip Aydin (2), Zhigang Qi (2), Marlon Pierce (2)
(1) Science Data Understanding Group, Jet Propulsion Laboratory
(2) Community Grids Laboratory, Indiana University

National Aeronautics and Space Administration, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA

Introduction

Modern earth sensor networks are producing large volumes of data. This demands three things:
1. Automated methods to search, analyze, and mine the data.
2. Infrastructure to connect sensors collecting data with users and methods.
3. Interfaces through which users can access data and employ methods.

Here we address these demands in a GPS sensor web context, but most of this work can be generalized to other contexts. We use RDAHMM, a hidden Markov model-based time series analysis method, and SensorGrid, a web infrastructure technology.

Hidden Markov Models

Hidden Markov models (HMMs) are statistical models for time series data.
They can be used with continuous or discrete valued data.
Fitting an HMM allows us to ascribe discrete modes of behavior to the system.
They can be trained with labeled examples (supervised learning) or without labeled examples (unsupervised learning).
They have been successful in many fields (e.g., speech processing, protein sequence analysis).

Hidden Markov Model Mechanics

[Figure: state sequence Q1, Q2, Q3, ..., QT; each state emits an observation O1, O2, O3, ..., OT, subject to noise.]

The HMM is a stochastic state machine: the state at each point in time is a probabilistic function of the previous state; likewise, the observed output at that time is a probabilistic function of the current state.
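The state-machine description above can be made concrete by drawing samples from a small model. The following is a minimal sketch, not the RDAHMM code: the two-state model, its probabilities, and the names `sample_hmm` and `gaussian_emit` are all hypothetical values chosen for illustration.

```python
import random


def sample_hmm(initial, transition, emit, length, rng):
    """Draw a state sequence and matching observations from an HMM."""
    state_ids = range(len(initial))
    states, observations = [], []
    # The first state is drawn from the initial distribution.
    state = rng.choices(state_ids, weights=initial)[0]
    for _ in range(length):
        states.append(state)
        # The observation depends only on the current state.
        observations.append(emit(state, rng))
        # The next state depends only on the current state.
        state = rng.choices(state_ids, weights=transition[state])[0]
    return states, observations


# Hypothetical two-state model: a quiet mode and a noisier, offset mode.
INITIAL = [0.9, 0.1]
TRANSITION = [[0.95, 0.05], [0.10, 0.90]]
MEANS, STDDEVS = [0.0, 3.0], [0.5, 1.0]


def gaussian_emit(state, rng):
    """Gaussian output distribution attached to each state."""
    return rng.gauss(MEANS[state], STDDEVS[state])


states, observations = sample_hmm(
    INITIAL, TRANSITION, gaussian_emit, 200, random.Random(0)
)
```

Only `observations` would be visible to an analyst; `states` is the hidden sequence that fitting and decoding try to recover.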

Hidden Markov Models for Geophysical Sensor Webs

The goal is to classify the observations into system or operational modes.
Fitting an HMM automatically provides classification; the solution inherently implies an underlying sequence of discrete states.
Observations are classified according to the state to which they belong.

[Figure: a time series (above) and its HMM state sequence (below).]
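One common way to obtain the state sequence from a fitted model is Viterbi decoding. The slides do not say which decoder RDAHMM uses, so the following is only a sketch of the general technique, assuming Gaussian output distributions and a hypothetical two-state model.

```python
import math


def gaussian_logpdf(x, mean, stddev):
    """Log density of a univariate Gaussian."""
    return (-0.5 * math.log(2 * math.pi * stddev ** 2)
            - (x - mean) ** 2 / (2 * stddev ** 2))


def viterbi(observations, initial, transition, means, stddevs):
    """Return the most probable state sequence for the observations."""
    n = len(initial)
    # score[i] is the best log probability of any path ending in state i.
    score = [math.log(initial[i])
             + gaussian_logpdf(observations[0], means[i], stddevs[i])
             for i in range(n)]
    backpointers = []
    for x in observations[1:]:
        previous, score, pointers = score, [], []
        for j in range(n):
            best = max(range(n),
                       key=lambda i: previous[i] + math.log(transition[i][j]))
            pointers.append(best)
            score.append(previous[best] + math.log(transition[best][j])
                         + gaussian_logpdf(x, means[j], stddevs[j]))
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical series that sits near 0, jumps to about 3, then returns.
series = [0.1, -0.2, 0.0, 3.1, 2.8, 3.3, 0.2, -0.1]
labels = viterbi(series, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
                 [0.0, 3.0], [0.5, 0.5])
```

Each entry of `labels` is the class assigned to the corresponding observation, which is the sense in which fitting an HMM yields a classification.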

Example of HMM Classification

Seismograph data collected at 1 Hz from a station in Pasadena, California. HMM states are color-coded. Classification was performed without guidance or labeled training examples.

Challenges of Geophysical Data

Large volumes of data are collected by sensor webs (e.g., GPS/seismic networks, ocean buoys).
There is little or no labeled training data, so we are almost always in an unsupervised learning mode.
A priori system information is often unavailable or unreliable.
The data are complicated enough to induce large numbers of local maxima in the likelihood function.
The standard Expectation-Maximization fitting method is vulnerable to local maxima in the absence of constraints based on a priori information.

Regularized Deterministic Annealing Expectation-Maximization

RDAEM is a method for overcoming the problems inherent in basic EM.
Deterministic annealing modifies the objective function based on a computational temperature that flattens or accentuates features.
The annealing method greatly reduces the sensitivity of the method to initial conditions, but gets stuck in certain structural local maxima with duplicate states.
We overcome this problem by adding regularization terms that bias the solution away from those local maxima.
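The effect of the computational temperature can be illustrated with a small sketch. It assumes a common formulation of deterministic annealing in which each joint probability is raised to the power 1/temperature before normalizing; the slides do not give the exact form used in RDAEM, and the function name and log-probability values below are hypothetical.

```python
import math


def tempered_posterior(log_joint, temperature):
    """Posterior over states with each joint probability raised to 1/temperature."""
    scaled = [value / temperature for value in log_joint]
    peak = max(scaled)  # subtract the peak for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical log joint probabilities of one observation under three states.
log_joint = [-1.0, -3.0, -6.0]
hot = tempered_posterior(log_joint, 100.0)   # nearly uniform: features flattened
cold = tempered_posterior(log_joint, 1.0)    # the ordinary EM posterior
```

At high temperature every state looks almost equally plausible, so the initial guess matters little; lowering the temperature toward 1 gradually restores the ordinary posterior.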

Comparison of EM and RDAEM

We compare the methods with two metrics:
1. Quality: the log likelihood of the solutions.
2. Stability: the number of maxima found in repeated tests.

Conclusion: RDAEM has equal quality and greater stability.
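The stability metric can be computed from the final log likelihoods of repeated fits. This is a minimal sketch under the assumption that two runs reached the same maximum when their log likelihoods agree within a tolerance; the function name, the tolerance, and the input values are hypothetical and are not results from the comparison.

```python
def count_distinct_maxima(log_likelihoods, tolerance=1e-3):
    """Count final log likelihoods that differ by more than the tolerance."""
    distinct = []
    for value in sorted(log_likelihoods):
        if not distinct or value - distinct[-1] > tolerance:
            distinct.append(value)
    return len(distinct)


# Made-up final log likelihoods from five repeated fits (illustration only).
example_runs = [-10.0, -10.0002, -12.5, -12.5001, -15.0]
maxima_found = count_distinct_maxima(example_runs)
best_quality = max(example_runs)
```

A method that always converges to the same solution gives a count of one; a larger count indicates more sensitivity to initial conditions.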

SensorGrid Architecture

Major components: real-time filters, a Grid messaging substrate, and an information service.
Filters can be run as Web Services to create workflows.
Filter chains can be deployed for complex processing.
Streaming messaging provides high-performance transfer options.
NaradaBrokering provides a robust message-passing infrastructure.

Real-Time Filters

Real-time data processing is supported by filters built around a publish/subscribe messaging system.
The filters are extended from a generic class to inherit publish and subscribe capabilities.
They can be connected in parallel or in series as chains to solve complex problems.
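The filter pattern can be sketched with an in-memory stand-in for the messaging system. None of the names below come from SensorGrid or NaradaBrokering: `Broker`, `Filter`, `AsciiDecoder`, `StationSelector`, the topic names, the station name, and the message layout are all hypothetical.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for a topic-based publish/subscribe substrate."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in list(self._subscribers[topic]):
            callback(message)


class Filter:
    """Generic filter: subscribes to one topic, publishes results to another."""

    def __init__(self, broker, in_topic, out_topic):
        self._broker = broker
        self._out_topic = out_topic
        broker.subscribe(in_topic, self._on_message)

    def _on_message(self, message):
        result = self.process(message)
        if result is not None:  # returning None drops the message
            self._broker.publish(self._out_topic, result)

    def process(self, message):
        raise NotImplementedError


class AsciiDecoder(Filter):
    """Format conversion: bytes in, text out."""

    def process(self, message):
        return message.decode("ascii")


class StationSelector(Filter):
    """Passes on only the messages that belong to one station."""

    def __init__(self, broker, in_topic, out_topic, station):
        super().__init__(broker, in_topic, out_topic)
        self._station = station

    def process(self, message):
        return message if message.startswith(self._station + " ") else None


# Chain two filters in series: raw bytes -> text -> one station's stream.
broker = Broker()
AsciiDecoder(broker, "raw", "ascii")
StationSelector(broker, "ascii", "station/P001", "P001")
received = []
broker.subscribe("station/P001", received.append)
broker.publish("raw", b"P001 1.0 2.0 3.0")
broker.publish("raw", b"P002 4.0 5.0 6.0")
```

Because every filter only knows its input and output topics, a parallel branch is added by subscribing another filter to the same input topic.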

SOPAC GPS Network

Eight networks comprising 80 stations produce 1 Hz high-resolution data.
Socket-based real-time access in binary RYO format is available, but not utilized!
We developed filters to provide real-time streaming access in multiple formats (RYO, ASCII, GML).

Integration with SCIGN and SOPAC GPS

Step 1: Raw GPS data (1 Hz) is converted to RYO format and made available through a data server.
Step 2: Data is passed through a series of filters that perform format conversion and station separation. Message passing is handled through NaradaBrokering.
Step 3: Data is passed to the RDAHMM analysis application. In this context, analysis applications such as RDAHMM are viewed as just another filter.

RDAHMM GPS Results via SensorGrid

A Google Maps interface allows a user to select GPS stations.
Models are fit to a large initial body of data from each station (this assumes the body of data is representative).
Trained models are applied to incoming data from each station.
Currently data are held in 10-minute buffers, analyzed, and then presented to the user. This is near-real time; the 10-minute buffer length is arbitrarily chosen.
Additional interfaces exist for exploration of archived data.
Segmented time series can be used to perform exploratory science, search data catalogs, and detect anomalies.
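The buffering scheme can be sketched as follows. At 1 Hz, a 10-minute buffer holds 600 samples. The class name `BufferedAnalyzer` is hypothetical, and `nearest_mean` is only a placeholder for a trained model, not the RDAHMM classifier.

```python
class BufferedAnalyzer:
    """Collects samples and classifies each full buffer with a trained model."""

    def __init__(self, classify, buffer_size):
        self._classify = classify
        self._buffer_size = buffer_size
        self._buffer = []

    def push(self, sample):
        """Add one sample; return labels when a buffer completes, else None."""
        self._buffer.append(sample)
        if len(self._buffer) < self._buffer_size:
            return None
        window, self._buffer = self._buffer, []
        return self._classify(window)


SAMPLE_RATE_HZ = 1
BUFFER_SECONDS = 10 * 60  # the arbitrarily chosen 10-minute buffer


def nearest_mean(window, means=(0.0, 3.0)):
    """Placeholder classifier: label each sample by the closest state mean."""
    return [min(range(len(means)), key=lambda i: abs(x - means[i]))
            for x in window]


analyzer = BufferedAnalyzer(nearest_mean, SAMPLE_RATE_HZ * BUFFER_SECONDS)
# Hypothetical 20-minute stream that changes level after 15 minutes.
results = [analyzer.push(0.1 if t < 900 else 2.9) for t in range(1200)]
segments = [labels for labels in results if labels is not None]
```

Shrinking `BUFFER_SECONDS` moves the same design toward full real-time operation, at the cost of classifying shorter windows.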

RDAHMM Integration and Visualization with Real-Time Filters

Real-Time Positions on Google Maps

Recording and Replaying Sensor Streams

Filters can be used to record and replay scenarios, such as earthquakes in the GPS case.
We developed RYO Recorder and RYO Publisher filters.
The RYO Recorder creates daily archives of the GPS streams.
The RYO Publisher can be used to play back whole days or selected segments of the records.
We replayed the 2004 Southern California Earthquake using the Parkfield GPS network archive.
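The record-and-replay idea can be sketched with an in-memory archive keyed by calendar day. This is an illustration of the pattern only; `StreamRecorder`, `replay`, the timestamps, and the message strings are hypothetical and do not reflect the RYO Recorder or RYO Publisher interfaces.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class StreamRecorder:
    """Stores each message under the calendar day on which it arrived."""

    def __init__(self):
        self.daily_archives = defaultdict(list)

    def record(self, timestamp, message):
        self.daily_archives[timestamp.date()].append((timestamp, message))


def replay(recorder, start, end, publish):
    """Publish archived messages with start <= timestamp < end, in time order."""
    for day in sorted(recorder.daily_archives):
        for timestamp, message in sorted(recorder.daily_archives[day]):
            if start <= timestamp < end:
                publish(message)


# Record four hypothetical 1 Hz samples that straddle midnight.
recorder = StreamRecorder()
first_sample = datetime(2000, 1, 1, 23, 59, 58)
for second in range(4):
    recorder.record(first_sample + timedelta(seconds=second), f"sample-{second}")

# Replay only a two-second segment of the record.
replayed = []
replay(recorder,
       first_sample + timedelta(seconds=1),
       first_sample + timedelta(seconds=3),
       replayed.append)
```

Passing a broker's publish function instead of `replayed.append` would feed the archived segment back into a filter chain as if it were live data.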

Conclusions

We have developed analysis and infrastructure methods for GPS sensor web data.
These methods are not network or data specific and can be extended to other sensor networks and data types.
A hidden Markov model-based time series analysis method provides robust segmentation and classification results that can be applied in near-real time (next step: full real time).
SensorGrid infrastructure allows robust and flexible connections between data sources, applications, and users.
Demo of the user interface (with Scripps collaborators) at Tue. afternoon poster session G23B-1289.

Hidden Markov Model Parameters

A hidden Markov model with a fixed number of states consists of:
1. Initial probabilities
2. State-to-state transition probabilities
3. Output distributions
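For reference, the three parameter sets can be written in standard textbook notation. The symbols below are an assumption, since the slide's own notation is not reproduced in this transcript; N denotes the number of states, and Q_t and O_t are the state and observation at time t, as in the mechanics diagram.

```latex
\lambda = (\pi, A, B), \qquad 1 \le i, j \le N
\pi_i = P(Q_1 = i)                          % initial probabilities
a_{ij} = P(Q_{t+1} = j \mid Q_t = i)        % state-to-state transition probabilities
b_i(O_t) = p(O_t \mid Q_t = i)              % output distributions
```

The initial probabilities sum to one, as does each row of the transition matrix A.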

Hidden Markov Model Expectation-Maximization

EM is the standard method for fitting HMMs to data.
It is iterative and starts with an initial model guess.
"E"-step: Calculate the expectation of the log likelihood of the model given an estimate of the unknown parameters.
"M"-step: Maximize the expected value of the log likelihood in the unknown parameters.
The so-called Q-function optimized in the "M"-step is written in terms of two quantities computed in the "E"-step: an estimate of the state assignment and an estimate of the state transitions.
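In the standard Baum-Welch formulation, the Q-function takes the following form. This is the textbook expression under assumed notation (the current model estimate is written \lambda', and \pi, a, and b are the initial probabilities, transition probabilities, and output distributions); it is not a reproduction of the slide's own equation.

```latex
Q(\lambda, \lambda') =
    \sum_{i=1}^{N} \gamma_1(i) \log \pi_i
  + \sum_{t=1}^{T-1} \sum_{i=1}^{N} \sum_{j=1}^{N} \xi_t(i, j) \log a_{ij}
  + \sum_{t=1}^{T} \sum_{i=1}^{N} \gamma_t(i) \log b_i(O_t)

\gamma_t(i) = P(Q_t = i \mid O_1, \dots, O_T, \lambda')            % state assignment estimate
\xi_t(i, j) = P(Q_t = i, Q_{t+1} = j \mid O_1, \dots, O_T, \lambda')  % state transition estimate
```

The "E"-step computes \gamma and \xi under the current model, and the "M"-step maximizes Q over the new parameters with \gamma and \xi held fixed.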

Regularization Terms: Gaussian Output Distributions

We modify the likelihood objective function with an improper prior on the means of the output distributions.
This prior is smallest when the means are identical.
It manifests as a regularization term added to the Q-function.
To maintain concavity of the Q-function, the regularization weight must be constrained.
