CSCI 5822 Probabilistic Models of Human and Machine Learning

CSCI 5822 Probabilistic Models of Human and Machine Learning
Mike Mozer
Department of Computer Science and Institute of Cognitive Science
University of Colorado at Boulder

Changepoint Detection

Changepoints
- Observed time series: stock market prices, factory accidents, medical monitoring
- Dynamics over an interval are produced by some stationary generative process (stationary = parameters constant)
- Changepoint = time at which the dynamics of the generative process switch
- Changepoints are latent variables

Examples I generated: (1) the mean of a Gaussian changes, (2) the variance of a Gaussian changes, (3) the drift rate of a drift-diffusion process changes. (A sketch of generating example (1) follows below.)
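A minimal sketch of how example (1) might be generated: a piecewise-stationary Gaussian whose mean jumps at a few changepoints. The boundaries, means, and variance here are arbitrary illustrative choices, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative segment boundaries and per-segment means (not from the slides).
boundaries = [0, 100, 180, 300]   # changepoints at t = 100 and t = 180
means = [0.0, 3.0, -2.0]          # one mean per segment
sigma = 1.0                       # variance held constant across segments

# Concatenate one stationary Gaussian segment per (boundary pair, mean).
x = np.concatenate([
    rng.normal(mu, sigma, size=b - a)
    for (a, b), mu in zip(zip(boundaries, boundaries[1:]), means)
])
```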

Generative Model
[Graphical model figure: changepoint indicators C_2, …, C_T; latent process states Y_1, …, Y_T; observations X_1, …, X_T]

Two Tasks
- Online detection: P(C_t | X_1, …, X_t)
- Offline detection: P(C_t | X_1, …, X_T)
- Both assume a parametric model of P(Y_t | Y_{t-1}, C_t)
- Both require marginalizing over {Y_t}
(Same graphical model as above.)

Alternative Representation of Changepoints
- Run: number of time steps during which the generative process is stationary
- E.g., the sequence Q Q Q Q R R R R R R R R R S S S has runs of length 4, 9, 3
- R_t: how long the run is, up to but not including time t (a small helper computing this appears below)
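A tiny helper (hypothetical, just to make the indexing concrete) that recovers R_t from a sequence of process labels such as the Q/R/S example on the slide:

```python
def run_lengths(labels):
    """R_t = number of steps since the most recent changepoint, counting
    up to but not including time t (so R_t resets to 0 at a changepoint)."""
    r, out = 0, []
    for t, lab in enumerate(labels):
        if t > 0 and lab != labels[t - 1]:
            r = 0                  # a changepoint occurred at time t
        out.append(r)
        r += 1
    return out

# Slide example: segments of lengths 4, 9, and 3.
print(run_lengths(list("QQQQRRRRRRRRRSSS")))
# [0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2]
```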

Bayesian Online Changepoint Detection (Adams & MacKay, 2007)
- R_t is a random variable, R_t ∈ {0, 1, 2, 3, …, t-1}
- Compute the posterior P(R_t | X_1, …, X_t)
- Transitions: either R_t = R_{t-1} + 1 (the run continues) or R_t = 0 (a changepoint); otherwise, P(R_t | R_{t-1}) = 0

Bayesian Online Changepoint Detection
Upper figure: nuclear magnetic response during drilling of a well; the mean switches while the variance stays constant (light grey line). The solid line is the predictive mean E(X_{t+1}); the dotted lines are predictive error bars (on the estimate of the mean).

Run Length: Memoryless Transition
- P(R_t | R_{t-1}) doesn't depend on how long the run is: P(R > r+s | R ≥ s) = P(R > r)
- Geometric distribution:
  P(R_t = 0 | R_{t-1} = r) = 1/λ
  P(R_t = r+1 | R_{t-1} = r) = 1 − 1/λ

Run Length: Memoried Transition
- P(R_t | R_{t-1}) depends on the run length
- H(τ): hazard function = probability that the run length equals τ, given that the run length is ≥ τ
- Memoryless case: H(τ) = 1/λ (both cases are sketched in code below)
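Two hazard functions as a sketch: the constant one is the memoryless/geometric case from the previous slide; the increasing one is just an illustrative way to make the transition memoried, not something specified in the lecture.

```python
import numpy as np

def constant_hazard(tau, lam=250.0):
    """Memoryless case: H(tau) = 1/lambda, regardless of run length."""
    return np.full_like(np.asarray(tau, dtype=float), 1.0 / lam)

def weibull_hazard(tau, shape=2.0, scale=100.0):
    """A memoried alternative: hazard grows with run length tau.
    H(tau) = P(run ends at tau | run >= tau) = 1 - S(tau + 1) / S(tau),
    with S(t) = exp(-(t / scale)**shape) a discretized Weibull survivor."""
    tau = np.asarray(tau, dtype=float)
    S = lambda t: np.exp(-((t / scale) ** shape))
    return 1.0 - S(tau + 1.0) / S(tau)
```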

Representation Shift
The cool thing about the run-length representation is that even when the transition is memoried, exact inference can be performed in the model.
- Whenever R_t = 0, the link from Y_{t-1} to Y_t is severed
- Whenever R_t > 0, the link from R_t to Y_t is severed
→ the messy dependence among changepoints disappears
[Figures: the original graphical model with changepoint indicators C_2, …, C_T, and the equivalent model with run-length variables R_2, …, R_T, each with latent states Y_1, …, Y_T and observations X_1, …, X_T]

Inference
- Marginal predictive distribution
- Posterior on run length
- P(x_{t+1} | r_t, x_t^{(r)}) is on the next slide
- The same recursion step used for HMM updates (the recursions are written out below)
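Written out, these are the standard recursions from Adams & MacKay (2007), which the slide names without displaying:

```latex
% Marginal predictive distribution: average the run-length-conditional
% predictive over the run-length posterior.
P(x_{t+1} \mid x_{1:t}) = \sum_{r_t} P(x_{t+1} \mid r_t, x_t^{(r)}) \, P(r_t \mid x_{1:t})

% Posterior on run length, via a forward recursion over the joint
% (the same message-passing step used for HMM filtering).
P(r_t, x_{1:t}) = \sum_{r_{t-1}} P(r_t \mid r_{t-1}) \, P(x_t \mid r_{t-1}, x_t^{(r)}) \, P(r_{t-1}, x_{1:t-1})

P(r_t \mid x_{1:t}) = \frac{P(r_t, x_{1:t})}{\sum_{r'_t} P(r'_t, x_{1:t})}
```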

Posteriors On Generative Parameters Given Run Length
- Prediction requires inferring the generative parameters
- Only the past observations within the current run matter
  E.g., if r_8 = 3, then x_5, x_6, x_7 are relevant for predicting x_8
- Suppose x is Gaussian with a nonstationary mean: start with a prior, compute the likelihood
- Because the x's are assumed conditionally independent, the computation can be done incrementally
- With conjugate priors, it's all simple bookkeeping of sufficient statistics (see the sketch below)
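A minimal sketch of that bookkeeping for the Gaussian-mean example, assuming the observation variance sigma2 is known; the prior hyperparameters mu0 and kappa0 are illustrative choices, not values from the lecture.

```python
import numpy as np

def gaussian_mean_posterior(xs, mu0=0.0, kappa0=1.0, sigma2=1.0):
    """Conjugate Normal prior on the mean: mu ~ N(mu0, sigma2 / kappa0),
    with known observation variance sigma2. The only sufficient statistics
    needed are the count and the sum of the observations in the current run,
    so the update can be done incrementally as each x arrives."""
    n, s = len(xs), float(np.sum(xs))
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + s) / kappa_n
    return mu_n, sigma2 / kappa_n   # posterior mean and variance of mu
```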

Algorithm
- Sufficient statistics: ν_1^(0) and ξ_1^(0), where the subscript is the current time and the superscript is the run length. The prior ν is like an imaginary count (how heavily to weight the prior); ξ is whatever statistics we're keeping (e.g., counts of heads and tails). Suppose we're assuming that the MEAN changes with each changepoint.
- Space complexity is…? Time complexity is…?
- Answer: both are linear in the number of data points.
(A sketch of the full recursion follows below.)
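Putting the pieces together, here is a sketch of the full online recursion, assuming Gaussian observations with a changing mean, known variance, and a constant hazard 1/λ (all hyperparameter values are illustrative). Keeping one hypothesis per possible run length is what makes both memory and per-step work linear in the number of data points.

```python
import numpy as np
from scipy.stats import norm

def bocpd(x, hazard=1.0 / 250, mu0=0.0, kappa0=1.0, sigma2=1.0):
    """Sketch of the Adams & MacKay (2007) recursion for Gaussian data whose
    mean changes at changepoints (variance sigma2 known, constant hazard).
    Returns R, where R[t, r] = P(run length = r | x_1..t)."""
    T = len(x)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0                       # before any data, the run length is 0
    # One set of conjugate sufficient statistics per candidate run length.
    mu = np.array([mu0])
    kappa = np.array([kappa0])
    for t, xt in enumerate(x, start=1):
        # Predictive probability of x_t under each run-length hypothesis.
        pred = norm.pdf(xt, loc=mu, scale=np.sqrt(sigma2 * (1.0 + 1.0 / kappa)))
        R[t, 1:t + 1] = R[t - 1, :t] * pred * (1.0 - hazard)   # run grows
        R[t, 0] = np.sum(R[t - 1, :t] * pred * hazard)         # changepoint
        R[t] /= R[t].sum()                                     # normalize
        # Update statistics; a fresh prior is prepended for run length 0.
        mu = np.concatenate(([mu0], (kappa * mu + xt) / (kappa + 1.0)))
        kappa = np.concatenate(([kappa0], kappa + 1.0))
    return R
```

Run on the synthetic series from the earlier sketch, the posterior mass over run length should collapse back toward 0 near the planted changepoints (t = 100 and t = 180).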

Dow Jones volatility

Coal Mine Disasters