From Grid Data to Patterns and Structures
Padhraic Smyth
Information and Computer Science, University of California, Irvine
July 2000



Monday's talk: An introduction to data mining
– general concepts
– focus on the current practice of data mining: the main message is to be aware of the "hype factor"
Today's talk: Modeling structure and patterns

Further Reading on Data Mining
Review paper:
– P. Smyth, "Data mining: data analysis on a grand scale?", preprint of a review paper to appear in Statistical Methods in Medical Research
Text (forthcoming):
– Principles of Data Mining, D. J. Hand, H. Mannila, P. Smyth, MIT Press, late 2000

"Hot Topics"
– Nonlinear Regression
– Pattern Finding
– Computer Vision, Signal Recognition
– Flexible Classification Models
– Scalable Algorithms
– Graphical Models
– Hidden Variable Models
– Hidden Markov Models
– Belief Networks
– Support Vector Machines
– Mixture/Factor Models
– Classification Trees
– Association Rules
– Deformable Templates
– Model Combining
– Wavelets

Theme: From Grid Points to Patterns
A mismatch:
– earth science is concerned with structures and objects and their dynamic behavior
  - global: EOF patterns, trends, anomalies
  - local: storms, eddies, currents, etc.
– but much of earth science modeling is at the grid level
  - models are typically defined at the lowest level of the "object hierarchy"

Theme: From Grid Points to Patterns
[Figure: the object hierarchy; structures of scientific interest (e.g., local: storms, eddies; global: EOFs, trends) sit near the top, while models are often defined down at the grid level]

Examples of Grid Models
Analysis: Markov random fields (e.g., Besag; Geman and Geman)
– p(x1 | all other pixels) = p(x1 | neighbors of x1); see the Gibbs-sampling sketch below
– p(x1, ..., xN) = product of clique functions
– Problem:
  - only models "low-level" pixel constraints
  - no systematic way to include information about shape
Simulation: GCM models
– grid model in 4d, first-principles equations
– produces vast amounts of data
– no systematic way to extract structure from GCM output
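To make the local-conditional structure concrete, here is a minimal Gibbs-sampling sketch for a binary Ising-style MRF (not from the original slides; the grid size, coupling strength beta, and toroidal boundary are illustrative assumptions):

```python
import numpy as np

def gibbs_mrf(shape=(64, 64), beta=0.7, n_sweeps=100, seed=None):
    """Gibbs sampler for a binary (+1/-1) Ising-style Markov random field.

    Each pixel is resampled from p(x1 | neighbors of x1), which by the
    Markov property equals p(x1 | all other pixels).
    """
    rng = np.random.default_rng(seed)
    x = rng.choice(np.array([-1, 1]), size=shape)
    H, W = shape
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                # Sum of the four neighboring pixels (the local cliques).
                s = (x[(i - 1) % H, j] + x[(i + 1) % H, j]
                     + x[i, (j - 1) % W] + x[i, (j + 1) % W])
                p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * s))
                x[i, j] = 1 if rng.random() < p_plus else -1
    return x
```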

The Impact of Massive Data Sets
Traditional spatio-temporal data analysis:
– visualization, EDA: look at the maps, spot synoptic patterns
But with massive data sets...
– e.g., GCM: multivariate fields, high resolution, many years
– impossible to visualize manually
Proposal:
– pattern analysis and modeling can play an important role in data abstraction
– many new ideas and techniques for pattern modeling are now available

Data Abstraction Methods
– Simple Aggregation
– Basis Functions/Dimension Reduction
  - EOFs/PCA, wavelets, kernels
– Latent Variable Models
  - mixture models
  - hidden variable models
– Local Spatial and/or Temporal Patterns
  - e.g., trajectories, eddies, El Niño, etc.
[Slide annotation: methods range from relatively widely used to less widely used]

A Modeling Language: Graphical Models
[Figure: directed graph on nodes A, B, C, D]
p(A, B, C, D) = ∏ p(x | parents(x)) = p(D|C) p(C|A) p(B|A) p(A)
joint distribution = product of local factors (a small numerical example follows)
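A small worked example of evaluating the factored joint for the four-node graph above; the conditional probability tables are made up for illustration:

```python
# Joint p(A,B,C,D) = p(A) p(B|A) p(C|A) p(D|C) for binary variables,
# with illustrative (made-up) conditional probability tables.
p_A = {0: 0.6, 1: 0.4}
p_B_given_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_C_given_A = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}
p_D_given_C = {0: {0: 0.4, 1: 0.6}, 1: {0: 0.1, 1: 0.9}}

def joint(a, b, c, d):
    # Product of local factors, one per node given its parents.
    return p_A[a] * p_B_given_A[a][b] * p_C_given_A[a][c] * p_D_given_C[c][d]

# Sanity check: the factorization defines a proper distribution.
total = sum(joint(a, b, c, d)
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
assert abs(total - 1.0) < 1e-12
```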

Two Advantages of Graphical Models
Communication:
– clarifies independence relations in the multivariate model
Computation (inference):
– posterior probabilities can be calculated efficiently
  - tree structure: linear in the number of variables
  - graph with loops: depends on clique structure
– completely general algorithms for inference exist
  - e.g., see Lauritzen and Spiegelhalter, JRSS, 1988
  - for more recent work see Learning in Graphical Models, M. I. Jordan (ed.), MIT Press, 1999

The Hidden Markov Model
[Figure: chain of hidden states X_1, X_2, X_3, ..., X_T over time, each emitting an observed Y_t]

The Hidden Markov Model
[Figure: same hidden chain as above]
P(X, Y) = ∏ p(x_t | x_{t-1}) p(y_t | x_t)
where p(x_t | x_{t-1}) is the Markov chain and p(y_t | x_t) is the conditional density

The Hidden Markov Model
Standard model:
– discrete X with m values; multivariate Y
Inference and estimation:
– Estimation: Baum-Welch algorithm (uses EM)
– Inference: scales as O(m^2 T), linear in the length of the chain; see the forward-algorithm sketch below
  - same as graphical models (Smyth, Heckerman, Jordan, 1997)
What it is useful for:
– "compresses" high-dimensional Y dependence into lower-dimensional X
– models dependence at the X level rather than at the Y level
– learned states can be viewed as dynamic clusters
– widely used in speech recognition
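A minimal sketch of the forward recursion behind the O(m^2 T) claim, for a discrete-observation HMM; the parameter names pi, A, B are conventions assumed here, not from the slides:

```python
import numpy as np

def hmm_loglik(pi, A, B, obs):
    """Forward algorithm for a discrete HMM; O(m^2 T) in chain length T.

    pi : (m,) initial state distribution
    A  : (m, m) transition matrix, A[i, j] = p(x_t = j | x_{t-1} = i)
    B  : (m, k) emission matrix,  B[i, y] = p(y_t = y | x_t = i)
    obs: length-T sequence of symbols in {0, ..., k-1}
    Returns log p(obs), using per-step scaling for numerical stability.
    """
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for y in obs[1:]:
        alpha = (alpha @ A) * B[:, y]   # one O(m^2) step per time point
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik
```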

Kalman Filter Models, etc.
[Figure: same hidden-chain structure as above]
– If the X's are real-valued and Gaussian => Kalman filter model
– If p(Y|X) is tree-structured => spatio-temporal tree structure

Application: Coupling Precipitation and Atmospheric Models
Problem:
– separate models for precipitation and atmosphere over time
– how to couple both together into a single model? ("downscaling")
Hidden Markov approach (Hughes, Guttorp, Charles, Applied Statistics, 1999):
– coupled data recorded on different time and space scales
– dependence is "compressed" into hidden Markov state dependence
– nonhomogeneous in time: atmospheric measurements modulate the Markov transitions

[Figure: hidden weather states X_1, ..., X_T over time, with atmospheric measurements A_1, ..., A_T modulating the transitions and precipitation P_1, ..., P_T emitted from the states]

Precipitation Measurements
– spatially irregular
– daily totals (binarized)

Atmospheric Measurements
– interpolated to a regular grid
– SLP, temperature, geopotential height (GH)
– twice per day

Joint Data

"Weather-State" Model
"Weather states":
– small discrete set of distinct weather states
– assumed to be Markov over time
– unobserved => hidden Markov model
– represent the atmosphere by locally derived variables
Spatial precipitation:
– relatively simple autologistic model
– dependent only on the weather state
The algorithm "discovered" 6 physically plausible weather states
– validated out of sample
An example of automated structure discovery?

Finite Mixture Models
[Figure: hidden class variable C with observed Y]
P(Y) = Σ_c p(Y|c) p(c)
where the p(Y|c) are component densities and the p(c) are class probabilities

Finite Mixture Models
Estimation:
– direct application of EM (see the sketch below)
Uses:
– density estimation: approximate p(Y) as a linear combination of simple components
– model-based clustering:
  - interpret component models as clusters
  - probabilistic membership of data points, overlap
  - can use Bayesian methods or cross-validation to find K
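A minimal EM sketch for a Gaussian mixture, matching the E-step (probabilistic membership) and M-step (reestimation) described above; initialization, the lack of a convergence test, and the light covariance regularization are simplifying assumptions:

```python
import numpy as np

def gmm_em(Y, K, n_iter=100, rng=None):
    """EM for a Gaussian mixture P(Y) = sum_c p(Y|c) p(c).

    Y : (N, d) data array.
    """
    rng = np.random.default_rng(rng)
    N, d = Y.shape
    w = np.full(K, 1.0 / K)                          # class probabilities p(c)
    mu = Y[rng.choice(N, size=K, replace=False)]     # component means
    cov = np.stack([np.atleast_2d(np.cov(Y.T)) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: probabilistic membership of each point in each component.
        R = np.empty((N, K))
        for k in range(K):
            diff = Y - mu[k]
            inv = np.linalg.inv(cov[k])
            norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov[k]))
            R[:, k] = w[k] * np.exp(-0.5 * np.sum(diff @ inv * diff, axis=1)) / norm
        R /= R.sum(axis=1, keepdims=True)
        # M-step: reestimate weights, means, and covariances.
        Nk = R.sum(axis=0)
        w = Nk / N
        mu = (R.T @ Y) / Nk[:, None]
        for k in range(K):
            diff = Y - mu[k]
            cov[k] = (R[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return w, mu, cov
```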

Application: Clustering of Geopotential Height
[Figure: data projected onto the EOF1-EOF2 plane]

Application: Clustering of Geopotential Height
[Figure: 3-component Gaussian clusters in the EOF1-EOF2 plane]
– a 3-Gaussian solution is consistently chosen by cross-validation
– clusters agree with the analysis of Cheng and Wallace (1995)
– Smyth, Ide, Ghil, JAS 1999

Conditional Independence Mixture Models
[Figure: hidden class variable C with observed Y_1, Y_2, ..., Y_d]
P(Y) = Σ_c p(Y|c) p(c) = Σ_c ( ∏_i p(Y_i|c) ) p(c)
where the p(Y_i|c) are component densities and the p(c) are class probabilities
Note: the Y's are marginally dependent; dependence is modeled via C

Mixtures of PCA bases

– formulate a probabilistic model for PCA
– learn a mixture of PCAs using EM (Tipping and Bishop, 1999)

Multiple Cause/Factor Models
[Figure: two hidden variables C and D with observed Y_1, Y_2, ..., Y_d]
p(Y) = Σ_{c,d} p(Y|c,d) p(c) p(d) = Σ_{c,d} ( ∏_i p(Y_i|c,d) ) p(c) p(d)
Intuition: the Y's are the result of multiple (hidden) factors
See Dunmur and Titterington (1999)

Summary So Far on Mixture Models
Mixture model/latent variable models:
– the key idea is that the hidden state is an abstraction/categorization
– probabilistic modeling allows a systematic approach
  - many models can be expressed in graphical form
  - parameters can be learned via EM
  - model structure can be chosen automatically
– many exotic variations of these models are being proposed in machine learning/neural network research
– learning of hidden variables => discovery of structure

Clustering Objects from Sequential Observations
Say we want to cluster eddies as a function of the time evolution of:
– shape
– intensity
– position
– velocity, etc.
Two problems here:
1. "extract" eddy features (shape, etc.) from the raw grid data
2. how can we cluster these "objects"?
   - different durations: how do we define distance?

Probabilistic Model-Based Approach
P(Y) = Σ_c p(Y|c) p(c)
Y could be a time series, curve, sequence, etc.
=> p(Y|c) is a density function on time series, curves, etc.
=> mixtures of density models for time series, curves, etc.
EM generalizes nicely => a general framework for clustering objects (Cadez, Gaffney, Smyth, KDD 2000)

Clusters of Markov Behavior
[Figure: three clusters, each a Markov transition graph on states A, B, C, D with a different transition structure; a sketch of fitting such a model follows]
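A hedged sketch of the idea in this figure: clustering whole sequences with a mixture of first-order Markov chains, fit by EM. The function and parameter names are illustrative, and ignoring initial-state probabilities and using a small smoothing constant are simplifications, not details from the slides:

```python
import numpy as np

def markov_mixture_em(seqs, K, n_states, n_iter=50, rng=None):
    """Cluster symbol sequences with a mixture of first-order Markov chains.

    seqs : list of integer sequences, symbols in {0, ..., n_states - 1}
    Each cluster k has its own transition matrix T[k]; EM alternates soft
    assignment of whole sequences with reestimation of the matrices.
    """
    rng = np.random.default_rng(rng)
    w = np.full(K, 1.0 / K)
    T = rng.dirichlet(np.ones(n_states), size=(K, n_states))
    # Precompute per-sequence transition counts C[n, i, j].
    C = np.zeros((len(seqs), n_states, n_states))
    for n, s in enumerate(seqs):
        for a, b in zip(s[:-1], s[1:]):
            C[n, a, b] += 1
    for _ in range(n_iter):
        # E-step: log p(seq_n | cluster k) = sum_ij C[n,i,j] log T[k,i,j].
        logR = np.log(w) + np.tensordot(C, np.log(T), axes=([1, 2], [1, 2]))
        logR -= logR.max(axis=1, keepdims=True)
        R = np.exp(logR)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted counts, smoothed and renormalized.
        w = R.mean(axis=0)
        T = np.einsum('nk,nij->kij', R, C) + 1e-3
        T /= T.sum(axis=2, keepdims=True)
    return w, T, R
```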

Mixtures of Curves
Regression clustering:
– model each curve as a regression function f(y|x)
– hypothesize a generative model:
  - probability p_k of being chosen for cluster k
  - given cluster k, a noisy version of f_k(y|x) is generated
– mixture model: the K noisy functions can be learned using the EM algorithm (Gaffney and Smyth, KDD99); see the sketch below
– significant improvement over k-means: variable-length trajectories, multi-dimensional trajectories
– can use non-parametric kernel regression for the component models
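A minimal sketch of regression clustering, assuming linear component functions f_k(y|x) and a fixed noise variance; the framework above also covers kernel regressions and multi-dimensional trajectories, which are omitted here:

```python
import numpy as np

def curve_mixture_em(curves, K, n_iter=100, sigma2=1.0, rng=None):
    """EM for clustering whole curves with linear regression components.

    curves : list of (x, y) pairs of 1-d float arrays, lengths may vary.
    Returns mixing weights, per-cluster (intercept, slope), memberships.
    """
    rng = np.random.default_rng(rng)
    N = len(curves)
    w = np.full(K, 1.0 / K)
    beta = rng.normal(size=(K, 2))            # (intercept, slope) per cluster
    for _ in range(n_iter):
        # E-step: log-likelihood of each whole curve under each component.
        logR = np.empty((N, K))
        for n, (x, y) in enumerate(curves):
            X = np.column_stack([np.ones_like(x), x])
            resid = y[:, None] - X @ beta.T   # (len(curve), K) residuals
            logR[n] = np.log(w) - 0.5 * np.sum(resid**2, axis=0) / sigma2
        logR -= logR.max(axis=1, keepdims=True)
        R = np.exp(logR)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted least squares per component.
        w = R.mean(axis=0)
        for k in range(K):
            A = np.zeros((2, 2))
            b = np.zeros(2)
            for n, (x, y) in enumerate(curves):
                X = np.column_stack([np.ones_like(x), x])
                A += R[n, k] * X.T @ X
                b += R[n, k] * X.T @ y
            beta[k] = np.linalg.solve(A, b)
    return w, beta, R
```

Because each curve is assigned as a whole, trajectories of different lengths pose no problem; there is no need to force them into fixed-length vectors as in the k-means approach criticized below.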

Detecting and Clustering Cyclone Trajectories
Background:
– extra-tropical cyclone-center trajectories detected as (x, y) functions of time
– North Atlantic data clustered into 3 distinct groups by Blender et al. (QJRMS, 1997)
– clusters have a distinct physical interpretation and allow for "higher-level" analysis of the data, e.g., state transitions
Limitations:
– (x, y) trajectories are treated as fixed-length vectors so that vector-based clustering (k-means) can be used
– this forces all trajectories to be of the same length and ignores smoothness

Modeling an Object's Shape
Parametric template model for the shape of interest:
– e.g., a boundary template modeled as a smooth parametric function

Deformable Templates
Probabilistic interpretation:
– mean shape
– spatial variability about the mean shape
– defines a density in shape space (Dryden and Mardia, 1998); see the sketch below
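A hedged sketch of such a density in shape space: a Gaussian over landmark coordinates, with a mean shape and a covariance capturing spatial variability. Procrustes alignment and the finer points of Dryden and Mardia's shape spaces are omitted, and the array layout is an assumption:

```python
import numpy as np

def fit_shape_density(shapes):
    """Fit a Gaussian density in shape space from landmark shapes.

    shapes : (N, 2L) array, N training shapes each with L (x, y) landmarks,
             assumed already aligned (Procrustes alignment is omitted here).
    Returns the mean shape and a covariance for variability about it.
    """
    mean_shape = shapes.mean(axis=0)
    cov = np.cov(shapes, rowvar=False) + 1e-6 * np.eye(shapes.shape[1])
    return mean_shape, cov

def shape_loglik(shape, mean_shape, cov):
    """Log-density of a new shape under the fitted model: the quantity
    behind 'what is the probability under the model?' in recognition."""
    d = len(mean_shape)
    diff = shape - mean_shape
    sign, logdet = np.linalg.slogdet(cov)
    return -0.5 * (diff @ np.linalg.solve(cov, diff)
                   + logdet + d * np.log(2 * np.pi))
```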

Deformable Templates
A probabilistic model enables many applications:
– object recognition: what is the probability under the model?
– spatial segmentation based on both shape and intensity
– matching/registration
– principal component directions
– estimation of shape parameters from data
– evolution of shape parameters over time
– clusters, mixtures of shapes
– compositional hierarchies of shapes
It provides a sound statistical foundation for shape analysis
– applications: automated analysis of medical images
The probabilistic approach means it can be coupled to other spatial and temporal models

Example: Pattern-Matching in Time Series Problem: “Find similar patterns to this one in a time-series archive”

Example: Pattern-Matching in Time Series
Problem: "Find similar patterns to this one in a time-series archive"
Is this similar?

Model-Based Approach: 1d Deformable Templates
– segmental hidden semi-Markov model (Ge and Smyth, KDD 2000)
– detection via "maximum likelihood parsing"
[Figure: states S_1, S_2, ..., S_T generating segments of the time series]

Pattern-Based End-Point Detection
[Figure: end-point detection in semiconductor manufacturing; original pattern and detected pattern plotted against time (seconds)]

Heterogeneity among Objects

Form of Population Density

Mixture Models, No Variability
This is in effect the model we used for clustering sequences, curves, etc., earlier

Potential Application: Storm Tracking
[Figure: observed data (past storms) -> parameters for individual storms -> population density in parameter space, applied to new data]

Software Tools for Model Building
There is a bewildering number of possible models:
– the concept of a "data analysis strategy"
– the branching factor is very high
– all the modeling comes to naught unless scientists can use it!
It is desirable to have "toolkits" that scientists can use on their own:
– graphical models are a start, although perhaps not ideal
  - use graphical models as a language for model representation
  - details of estimation (EM, Bayes, etc.) are hidden from the user
  - a probabilistic representation language allows "plug and play"
  - see BUGS for a Bayesian version of this idea
  - see Buntine et al., KDD99, for "algorithm compilers"

Conclusions
Motivation: grid level -> structures, patterns
Patterns can be described, modeled, and analyzed statistically:
– latent variable models
– hidden Markov models
– deformable templates
– hierarchical models
Significant recent work in pattern recognition, neural networks, and machine learning on these topics:
– recent emphasis on probabilistic formalisms
More effort is needed in transferring these methods to science applications:
– systematic model-building framework/tools
– education