
1 Learning Dynamic Models from Unsequenced Data Jeff Schneider School of Computer Science Carnegie Mellon University joint work with Tzu-Kuo Huang, Le Song

2 Learning Dynamic Models
Hidden Markov Models, e.g. for speech recognition
Dynamic Bayesian Networks, e.g. for protein/gene interaction
System Identification, e.g. for control [Bagnell & Schneider, 2001]
[figures: Hubble Ultra Deep Field (Wikimedia Commons), SISL ARLUT, UAV ETHZ]
Key Assumption: SEQUENCED observations
What if observations are NOT SEQUENCED?

3 When are Observations not Sequenced?
Galaxy evolution: dynamics are too slow to watch
Slow-developing diseases: Alzheimer's, Parkinson's
Biological processes: measurements are often destructive
[figures: STAGES, Getty Images, Bryan Neff Lab, UWO]
How can we learn dynamic models for these?

4 Outline
Linear Models [Huang and Schneider, ICML 2009]
Nonlinear Models [Huang, Song, Schneider, AISTATS 2010]
Combining Sequenced and Unsequenced Data [Huang and Schneider, NIPS 2011]

5 Problem Description
Linear dynamic model: x_{t+1} = A x_t + ε_t, with noise ε_t ~ N(0, σ² I)
Estimate A from an unsequenced sample of x_i's
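To make the setting concrete, here is a minimal sketch (my addition, not from the talk; the matrix A, noise level, and sample size are made-up values) of how an unsequenced sample arises: simulate a linear system, then discard the time order.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 2, 200
A = np.array([[0.9, -0.2], [0.2, 0.9]])    # hypothetical stable dynamics
sigma = 0.05

x = np.zeros((T, d))
x[0] = rng.normal(size=d)
for t in range(T - 1):                      # x_{t+1} = A x_t + noise
    x[t + 1] = A @ x[t] + sigma * rng.normal(size=d)

samples = rng.permutation(x)                # the learner sees only this shuffled set
```

The learning problem is to recover A from `samples` alone.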

6 Doesn't seem impossible …

7 Identifiability Issues

8

9 A Maximum Likelihood Approach
Suppose we knew the dynamic model and the predecessor of each point …

10 Likelihood (continued)

11 Likelihood (continued)
We don't know the time either, so also integrate out over time; then use the empirical density as an estimate for the resulting marginal distribution.
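The equations on this slide did not survive extraction. A plausible reconstruction, consistent with the Unordered Method of Huang and Schneider (ICML 2009) in which the unknown predecessor of each point is marginalized against the empirical distribution, is

p(x_i | A, σ) ≈ (1/N) Σ_j N(x_i ; A x_j, σ² I)

so the objective becomes maximizing Σ_i log p(x_i | A, σ) over A and σ.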

12 Unordered Method (UM): Estimation

13 Expectation Maximization
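A minimal EM sketch for the mixture-over-predecessors likelihood above (my own illustration, not code from the talk; sigma2 and the iteration count are arbitrary): the E-step computes the probability that point j is the predecessor of point i, and the M-step solves a weighted least-squares problem for A.

```python
import numpy as np

def em_unordered(X, n_iter=50, sigma2=0.01):
    """X: (N, d) unsequenced sample. Returns an estimate of A."""
    N, d = X.shape
    A = np.eye(d)
    for _ in range(n_iter):
        # E-step: r[i, j] = P(x_j is the predecessor of x_i | A, sigma)
        pred = X @ A.T                            # row j holds A x_j
        d2 = ((X[:, None, :] - pred[None, :, :]) ** 2).sum(-1)
        logp = -d2 / (2 * sigma2)
        logp -= logp.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(logp)
        np.fill_diagonal(r, 0.0)                  # a point cannot precede itself
        r /= r.sum(axis=1, keepdims=True)
        # M-step: min_A sum_ij r_ij ||x_i - A x_j||^2  (weighted regression)
        S1 = np.einsum('ij,id,je->de', r, X, X)   # sum_ij r_ij x_i x_j^T
        S2 = np.einsum('ij,jd,je->de', r, X, X)   # sum_ij r_ij x_j x_j^T
        A = S1 @ np.linalg.pinv(S2)
    return A
```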

14 Sample Synthetic Result
(figure: input and output point sets)

15 Partial-order Method (PM)

16 Partial Order Approximation (PM)
Perform estimation by alternating maximization.
Replace UM's E-step with a maximum spanning tree on the complete graph over data points:
- the weight on each edge is the probability of one point being generated from the other, given A and σ
- this enforces a global consistency on the solution
M-step is unchanged: weighted regression. (A sketch of the tree-based E-step follows.)
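A hedged sketch of that tree-based E-step (my own illustration; the edge scores, symmetrization, and use of scipy are assumptions, not the paper's code): score each edge by the model's log-probability, then obtain a maximum spanning tree by negating the weights.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def pm_estep(X, A, sigma2=0.01):
    """Return the (i, j) edges of a maximum spanning tree under the model."""
    pred = X @ A.T
    d2 = ((X[:, None, :] - pred[None, :, :]) ** 2).sum(-1)
    logp = -d2 / (2 * sigma2)          # log P(x_i generated from x_j), up to a constant
    w = np.maximum(logp, logp.T)       # edge score: best direction between the pair
    cost = -(w - w.min() + 1.0)        # strictly negative, so no edge is dropped;
    np.fill_diagonal(cost, 0.0)        # zero entries = no edge (drop self-loops)
    mst = minimum_spanning_tree(cost)  # minimizing -w maximizes the original weights
    return np.transpose(mst.nonzero())
```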

17 Learning Nonlinear Dynamic Models [Huang, Song, Schneider, AISTATS, 2010]

18 Learning Nonlinear Dynamic Models
An important issue:
Linear model provides a severely restricted space of models
- we know a model is wrong because the regression yields large residuals and low likelihoods
The nonlinear models are too powerful; they can fit anything!
Solution: restrict the space of nonlinear models
1. form the full kernel matrix
2. use a low-rank approximation of the kernel matrix
(a sketch of one such approximation follows)
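One standard low-rank construction is the Nyström approximation; the talk does not say which approximation was used, so treat this sketch as an assumption. It approximates the full N×N Gaussian kernel matrix by a rank-m factor built from m landmark points.

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(X, m=50, gamma=1.0, seed=0):
    """Return a factor L with K ~= L @ L.T, built from m landmark rows of X."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf(X, X[idx], gamma)            # N x m cross-kernel
    W = C[idx]                           # m x m landmark kernel
    evals, evecs = np.linalg.eigh(W)
    evals = np.maximum(evals, 1e-12)     # guard against tiny negative eigenvalues
    return (C @ evecs) / np.sqrt(evals)  # K ~= C W^{-1} C^T = L L^T
```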

19 Synthetic Nonlinear Data: Lorenz Attractor
(figure: estimated gradients by kernel UM)

20–22 Ordering by Temporal Smoothing (figure animation repeated across three slides)
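These slides are figures; the actual ordering objective is given in Huang, Song, and Schneider (AISTATS 2010). As a loose illustration of the task only (this greedy nearest-neighbor ordering is not the paper's algorithm, just a simple proxy for recovering an order in which the trajectory moves smoothly):

```python
import numpy as np

def greedy_order(X, start=0):
    """Order points so each step moves to the nearest unvisited point."""
    left = set(range(len(X)))
    order = [start]
    left.remove(start)
    while left:
        cur = X[order[-1]]
        nxt = min(left, key=lambda j: float(np.sum((X[j] - cur) ** 2)))
        order.append(nxt)
        left.remove(nxt)
    return order
```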

23 Evaluation Criteria

24 Results: 3D-1

25 Results: 3D-2

26 3D-1: Algorithm Comparison

27 3D-2: Algorithm Comparison

28 Methods for Real Data
1. Run k-means to cluster the data
2. Find an ordering of the cluster centers: TSP on pairwise L1 distances (TSP+L1), OR the Temporal Smoothing Method (TSM)
3. Learn a dynamic model for the cluster centers
4. Initialize UM/PM with the learned model
(a sketch of this pipeline follows)
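A rough end-to-end sketch of this pipeline under stated assumptions (the value of k, the greedy L1-TSP heuristic, and plain least squares are my illustrative choices; the talk does not give these details):

```python
import numpy as np
from sklearn.cluster import KMeans

def pipeline_init(X, k=20, seed=0):
    # 1. cluster the unsequenced data
    km = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(X)
    centers = km.cluster_centers_
    # 2. order the centers with a greedy L1-TSP heuristic
    order, left = [0], set(range(1, k))
    while left:
        cur = centers[order[-1]]
        nxt = min(left, key=lambda j: float(np.abs(centers[j] - cur).sum()))
        order.append(nxt)
        left.remove(nxt)
    seq = centers[order]
    # 3. fit a linear model  center_{t+1} ~= A center_t  by least squares
    A = np.linalg.lstsq(seq[:-1], seq[1:], rcond=None)[0].T
    return A   # 4. use this as the initial model for UM/PM
```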

29–30 Gene Expression in Yeast Metabolic Cycle (two figure slides)

31 Results on Individual Genes

32 Results over the whole space

33 Cosine Score in High Dimensions
(figure: probability of a random direction achieving a cosine score > 0.5, as a function of dimension)
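This probability can be computed in closed form: for a uniformly random direction u in R^d and any fixed direction v, (1 + cos θ)/2 follows a Beta((d−1)/2, (d−1)/2) distribution. A small sketch (my addition, not from the talk) reproducing the curve's values:

```python
from scipy.stats import beta

def p_cosine_above(thresh=0.5, dims=(2, 5, 10, 50, 100, 200)):
    # (1 + cos(theta)) / 2 ~ Beta((d-1)/2, (d-1)/2) for a random direction in R^d
    for d in dims:
        a = (d - 1) / 2.0
        p = beta.sf((1 + thresh) / 2.0, a, a)
        print(f"d={d:4d}  P(cosine > {thresh}) = {p:.2e}")

p_cosine_above()
```

The rapid decay explains why even modest cosine scores are strong evidence in high dimensions.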

34 Suppose we have some sequenced data
Linear dynamic model: x_{t+1} = A x_t + ε_t
Perform a standard regression: Â = argmin_A Σ_t ||x_{t+1} − A x_t||²
What if the amount of data is not enough to regress reliably?

35 Regularization for Regression
Add regularization to the regression:
ridge regression: Â = argmin_A Σ_t ||x_{t+1} − A x_t||² + λ ||A||_F²
lasso: Â = argmin_A Σ_t ||x_{t+1} − A x_t||² + λ ||A||_1
Can the unsequenced data be used in regularization?
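The ridge case has a closed-form solution; a minimal sketch (my addition; the λ value is an arbitrary placeholder):

```python
import numpy as np

def ridge_dynamics(seq, lam=1e-2):
    """Closed-form ridge estimate of A for x_{t+1} = A x_t (seq: (T, d))."""
    X, Y = seq[:-1], seq[1:]
    d = X.shape[1]
    # solves (X^T X + lam I) A^T = X^T Y
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y).T
```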

36 Lyapunov Regularization
The Lyapunov equation relates the dynamic model to the steady-state distribution:
A Q Aᵀ + σ² I = Q, where Q is the covariance of the steady-state distribution
1. Estimate Q from the unsequenced data!
2. Optimize via gradient descent, using the unpenalized or the ridge regression solution as the initial point
(a sketch follows)
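A gradient-descent sketch of this idea under stated assumptions (penalizing the squared Frobenius norm of the Lyapunov residual matches the slide's description but may not be the paper's exact formulation; lam, lr, sigma2, and the iteration count are placeholders):

```python
import numpy as np

def lyapunov_regularized(seq, unseq, lam=1.0, sigma2=1.0, lr=1e-3, n_iter=2000):
    """Fit A to scarce sequenced data, regularized by unsequenced data via Q."""
    X, Y = seq[:-1], seq[1:]
    Q = np.cov(unseq.T)                          # steady-state covariance estimate
    d = X.shape[1]
    A = np.linalg.lstsq(X, Y, rcond=None)[0].T   # unpenalized initial point
    for _ in range(n_iter):
        R = A @ Q @ A.T + sigma2 * np.eye(d) - Q     # Lyapunov residual
        # gradient of ||Y - X A^T||_F^2 + lam * ||R||_F^2
        grad = -2 * (Y - X @ A.T).T @ X + lam * 4 * R @ A @ Q
        A -= lr * grad
    return A
```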

37 Lyapunov Regularization: Toy Example
2-d linear system
2nd column of A fixed at the correct value
given 4 sequenced points
given 20 unsequenced points

38 Lyapunov Regularization: Toy Example

39 Results on Synthetic Data
Random 200-dimensional sparse (density 1/8) stable system

40 Work in Progress …
Cell cycle data from [Zhou, Li, Yan, Wong, IEEE Trans. on Inf. Tech. in Biomedicine, 2009]
- a set of 100 sequenced images; a tracking algorithm identified 34 sequences
- 49 features on protein subcellular location
- the 34 sequences having a full cycle and length at least 30 were identified; another 11,556 points are unsequenced
- use the 34 sequences as ground truth and train on the unsequenced data

41 Preliminary Results: Protein Subcellular Location Dynamics
(figure panels: cosine score, normalized error)

42 Conclusions and Future Work
Demonstrated ability to learn (non)linear dynamic models from unsequenced data
Demonstrated method to use sequenced and unsequenced data together
Continuing efforts on real scientific data
Can we do this with hidden states?

43 EXTRA SLIDES

44 Real Data: Swinging Pendulum Video

45 Results: Swinging Pendulum Video

46

47