Published by Jeffery Holt. Modified over 9 years ago.
1
Julian Center on Regression for Proportion Data July 10, 2007 (68)
2
MaxEnt2007 Regression For Proportion Data Julian Center Creative Research Corp. Andover, MA, USA
3
Overview
- Introduction
  - What is proportion data?
  - What do we mean by regression?
  - Examples
  - Why should you care?
- Coordinate Transformation to Facilitate Regression
- Measurement Models
  - Multinomial
  - Laplace Approximation to Multinomial
  - Log-Normal
- Regression Models
  - Kernel Regression (Nadaraya-Watson Model)
  - Gaussian Process Regression
    - With Log-Normal Measurements
    - With Multinomial Measurements: Expectation Propagation
- Conclusion
4
What is Proportion Data?
5
What is Regression?
- Regression = Smoothing + Calibration + Interpolation.
- Relates data gathered under one set of conditions to data gathered under similar, but different, conditions.
- Accounts for measurement "noise".
- Determines p(r|x).
6
Examples
- Geostatistics: Composition of rock samples at different locations.
- Medicine: Response to different levels of treatment.
- Political Science: Opinion polls across different demographic groups.
- Climate Research:
  - Infer climate history from fossil pollen samples.
  - Calibrate model using present-day samples from known climates.
  - Typically, examine 400 pollen grains and sort into 14 categories.
7
Why Should You Care?
- Either you have proportion data to analyze,
- or you want to do pattern classification,
- or you want to use a similar approach for your own problem:
  - Transform constrained variables so that a Laplace approximation makes sense.
  - Two different regression techniques.
  - Expectation Propagation for improving model fit.
8
Coordinate Transformation
- Well-known regression methods can't deal with the pesky constraints of the simplex.
- We need a one-to-one mapping between the d-simplex and d-dimensional real vectors.
- Then we can model probability distributions on real vectors and relate them to distributions on the simplex.
9
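Coordinate Transformation
- The rows of T span the orthogonal complement of 1_(d+1), the (d+1)-vector of ones.
- Symmetric softmax activation function: maps a real d-vector f to a point y on the simplex.
- Centered log-ratio linkage function: maps a point y on the simplex back to a real d-vector f.
- We can always find T by the Gram-Schmidt process.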
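The slide's equations did not survive extraction, so the following is only a minimal sketch of one way to build such a mapping. It assumes the linkage is f = T ln(y) and the activation is y = softmax(T^T f), which is consistent with the slide's description but not confirmed by it. The function names (`complement_basis`, `to_real`, `to_simplex`) are hypothetical, and QR factorization stands in for an explicit Gram-Schmidt loop.

```python
import numpy as np

def complement_basis(d):
    """Return a d x (d+1) matrix T with orthonormal rows, each orthogonal to the ones vector."""
    n = d + 1
    A = np.eye(n)
    A[:, 0] = 1.0               # columns: ones vector, e_2, ..., e_n (linearly independent)
    Q, _ = np.linalg.qr(A)      # QR orthonormalizes the columns in order, like Gram-Schmidt
    return Q[:, 1:].T           # drop the column parallel to the ones vector

def to_real(y, T):
    """Centered log-ratio linkage (assumed form): simplex point y -> real d-vector f."""
    return T @ np.log(y)        # requires every component of y to be strictly positive

def to_simplex(f, T):
    """Symmetric softmax (assumed form): real d-vector f -> simplex point y."""
    z = T.T @ f
    z = z - z.max()             # shift for numerical stability; softmax ignores this shift
    e = np.exp(z)
    return e / e.sum()

T = complement_basis(2)
y = np.array([0.2, 0.3, 0.5])
f = to_real(y, T)               # 2-dimensional unconstrained coordinates
y_back = to_simplex(f, T)       # round trip back to the simplex
```

Because T annihilates the ones vector, T ln(y) is unchanged when a constant is added to every ln(y_i), which is why the round trip recovers y.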
10
Coordinate Transformation
[Figure: the simplex in (y1, y2) coordinates and its image under ln in (ln(y1), ln(y2)) coordinates. Labels: the line ln(y1) = -ln(y2), the coordinate f, and the note "Softmax is insensitive to this direction."]
11
Measurement Models
- Multinomial
- Log-Normal
12
Measurement Model - Multinomial -
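The equation on this slide was not preserved. For reference only, the standard multinomial likelihood can be written as below, with assumed symbols: r for the vector of counts, S for the total count, y for the proportions on the simplex, and (as an assumption matching the coordinate transformation slides) y = softmax(T^T f).

```latex
p(r \mid y) = \frac{S!}{\prod_{i=1}^{d+1} r_i!} \prod_{i=1}^{d+1} y_i^{\,r_i},
\qquad S = \sum_{i=1}^{d+1} r_i ,
\qquad
\ln p(r \mid f) = \text{const} + r^{\top} T^{\top} f
  - S \ln \sum_{i=1}^{d+1} \exp\!\big[(T^{\top} f)_i\big].
```

The second expression follows by substituting y_i = exp[(T^T f)_i] / sum_j exp[(T^T f)_j] into the first and taking logarithms.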
13
Multinomial Measurement Model
[Figure: example with S = 400; the value shown for R1 was not preserved.]
14
Measurement Model - Laplace Approximation -
- Some regression methods assume a Gaussian measurement model.
- Therefore, we are tempted to approximate each Multinomial measurement with a Gaussian measurement.
- Let's try a Laplace approximation to each measurement.
- Laplace Approximation:
  - Find the peak of the log-likelihood function.
  - Pick a Gaussian centered at the peak with a covariance matrix that matches the negative second derivative of the log-likelihood function at the peak.
  - Pick an amplitude factor to match the height of the peak.
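The three steps above can be sketched for a single multinomial measurement as follows. This is an illustration under stated assumptions, not the author's derivation: it assumes y = softmax(T^T f), strictly positive counts, and the hypothetical helper names `complement_basis`, `log_likelihood`, and `laplace_multinomial`. Under those assumptions the peak is at y = r/S, and the negative second derivative with respect to f is S T (diag(y) - y y^T) T^T.

```python
import numpy as np

def complement_basis(d):
    """d x (d+1) matrix T with orthonormal rows orthogonal to the ones vector."""
    A = np.eye(d + 1)
    A[:, 0] = 1.0
    Q, _ = np.linalg.qr(A)
    return Q[:, 1:].T

def log_likelihood(f, counts, T):
    """Multinomial log-likelihood in f, omitting the constant multinomial coefficient."""
    z = T.T @ f
    zmax = z.max()
    log_sum_exp = zmax + np.log(np.exp(z - zmax).sum())
    return counts @ z - counts.sum() * log_sum_exp

def laplace_multinomial(counts, T):
    """Return (peak location, Gaussian covariance, log peak height) in f coordinates."""
    r = np.asarray(counts, dtype=float)
    S = r.sum()
    y_hat = r / S                               # step 1: peak, valid only if all counts > 0
    f_hat = T @ np.log(y_hat)
    # step 2: negative Hessian of the log-likelihood at the peak, inverted to a covariance
    precision = S * T @ (np.diag(y_hat) - np.outer(y_hat, y_hat)) @ T.T
    cov = np.linalg.inv(precision)
    # step 3: amplitude given by the (log) height of the peak
    return f_hat, cov, log_likelihood(f_hat, r, T)

counts = np.array([80.0, 120.0, 200.0])         # hypothetical counts with S = 400
T = complement_basis(2)
f_hat, cov, log_peak = laplace_multinomial(counts, T)
```

Zero counts push the peak to the boundary of the simplex, where ln(y) diverges, so a real application would need to handle that case separately.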
15
Measurement Model - Laplace Approximation -
16
Laplace Approximation to Multinomial
17
Laplace Approximation to Multinomial
18
Laplace Approximation to Multinomial
19
Laplace Approximation to Multinomial
20
Laplace Approximation to Multinomial
21
Laplace Approximation to Multinomial
22
Measurement Model - Log-Normal -
e.g. over-dispersion or under-dispersion
23
Regression Models
- A way of relating data taken under different conditions.
- Intuition: Similar conditions should produce similar data.
- The best method to use depends on the problem.
- Two methods considered here:
  - Nadaraya-Watson model.
  - Gaussian Process model.
24
Nadaraya-Watson Model
- Based on applying Parzen density estimation to the joint distribution of f and x.
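As a rough illustration of the idea, the conditional mean of a Parzen (kernel) estimate of the joint density of f and x reduces to a kernel-weighted average of the training values. The sketch below assumes an isotropic Gaussian kernel with a single hypothetical `bandwidth` parameter; the slides do not specify the kernel, and the function name `nadaraya_watson` is illustrative.

```python
import numpy as np

def nadaraya_watson(x_new, x_train, f_train, bandwidth):
    """Kernel-weighted average of f_train, evaluated at each row of x_new.

    x_new: (m, p), x_train: (n, p), f_train: (n, d). Returns an (m, d) array.
    """
    diff = x_new[:, None, :] - x_train[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)                  # (m, n) squared distances
    log_w = -0.5 * sq_dist / bandwidth ** 2             # Gaussian kernel, log scale
    log_w -= log_w.max(axis=1, keepdims=True)           # avoid underflow in exp
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)                   # weights sum to 1 per new point
    return w @ f_train

# Hypothetical data: scalar condition x, two-dimensional transformed response f
x_train = np.array([[0.0], [1.0], [2.0]])
f_train = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, -1.0]])
f_pred = nadaraya_watson(np.array([[1.0], [1.5]]), x_train, f_train, bandwidth=0.5)
```

Every prediction touches every training point, which is the cost that the sparse "knots" variant later in the talk is meant to reduce.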
25
[Figure: all data points, f versus x.]
26
Nadaraya-Watson Model
[Figure: Nadaraya-Watson model, f versus x.]
27
Nadaraya-Watson Model
28
Nadaraya-Watson Model
29
Nadaraya-Watson Model
- Problem: We must compare a new point to every training point.
- Solution:
  - Choose a sparse set of "knots", and center density components only on knots.
  - Adjust weights and covariances by "diagnostic training".
  - Mixture model training tools apply.
30
Sparse Nadaraya-Watson Model
[Figure: sparse Nadaraya-Watson model, f versus x.]
31
Gaussian Process Model
- Probability distribution on functions.
- Specified by mean function m(x) and covariance kernel k(x1, x2).
- For any finite collection of points, the corresponding function values are jointly Gaussian.
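The last bullet can be made concrete: on a finite grid of x values, a Gaussian process is just a multivariate Gaussian whose covariance matrix is the kernel evaluated at all pairs of grid points. The sketch below assumes a squared-exponential kernel and a zero mean function, neither of which is specified on the slide; `sq_exp_kernel` and `sample_gp_prior` are hypothetical names.

```python
import numpy as np

def sq_exp_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance k(x1, x2) for 1-D inputs (an assumed choice)."""
    diff = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def sample_gp_prior(x, n_samples, rng, jitter=1e-8):
    """Draw function values at the points x from a zero-mean GP prior."""
    K = sq_exp_kernel(x, x) + jitter * np.eye(len(x))   # jitter keeps Cholesky stable
    L = np.linalg.cholesky(K)
    white = rng.standard_normal((len(x), n_samples))
    return (L @ white).T                                # each row is one sampled function

x = np.linspace(0.0, 4.0, 5)
K = sq_exp_kernel(x, x)
samples = sample_gp_prior(x, n_samples=3, rng=np.random.default_rng(0))
```

Each row of `samples` is one draw of the function at the five grid points; nearby points have correlated values because the kernel decays with distance.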
32
Gaussian Process Model
[Figure: Gaussian Process model, f versus x.]
33
Applying Gaussian Process Regression to Proportion Data
- Prior: Model each component of f(x) as a zero-mean Gaussian process with covariance kernel k(x1, x2). Assume that the components of f are independent of each other.
- Posterior: Use the Laplace approximations to the measurements and apply Kalman filter methods.
- Use Expectation Propagation to improve fit.
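Once each measurement has been replaced by a Gaussian, the posterior for one component of f has a closed form. The sketch below uses the standard batch formulas rather than the Kalman filter recursion mentioned on the slide, and it simplifies further by assuming a scalar noise variance per measurement (ignoring cross-component covariance from the Laplace approximation) and a squared-exponential kernel. The names `sq_exp_kernel` and `gp_posterior` and all data values are hypothetical.

```python
import numpy as np

def sq_exp_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance for 1-D inputs (an assumed kernel choice)."""
    diff = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, f_obs, noise_var, x_new):
    """Posterior mean and covariance of one zero-mean GP component at x_new.

    f_obs holds the Gaussian-approximated measurement peaks for this component,
    and noise_var the corresponding measurement variances.
    """
    K = sq_exp_kernel(x_train, x_train) + np.diag(noise_var)
    K_star = sq_exp_kernel(x_train, x_new)
    K_new = sq_exp_kernel(x_new, x_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f_obs))   # (K + N)^-1 f_obs
    v = np.linalg.solve(L, K_star)
    mean = K_star.T @ alpha
    cov = K_new - v.T @ v
    return mean, cov

x_train = np.array([0.0, 1.0, 2.0, 3.0])
f_obs = np.array([0.1, 0.4, 0.2, -0.3])
noise_var = np.full(4, 1e-6)            # nearly noise-free, for illustration only
mean, cov = gp_posterior(x_train, f_obs, noise_var, x_train)
```

With independent components, the same call would be repeated once per component of f; the resulting posterior means can then be mapped back to proportions through the softmax.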
34
Sparse Gaussian Process Model
35
Sparse Gaussian Process Model
36
Sparse Gaussian Process Model
37
Sparse Gaussian Process Model
38
GP Log-Normal Model
39
GP Log-Normal Model
40
GP Log-Normal Model
41
GP Multinomial Model
42
Expectation Propagation Method
43
Expectation Propagation Method
44
Expectation Propagation Method
45
Expectation Propagation Method
46
Expectation Propagation Method
47
Expectation Propagation Method
48
Choosing the Regression Model
If you have two samplings taken under the same conditions, do you want to treat them as coming from a bimodal distribution (NW Model) or combine them into one big sampling (GP Model)?
49
Conclusion
- A coordinate transformation makes it possible to analyze proportion data with known regression methods.
- The Multinomial distribution can be well approximated by a Gaussian on the transformed variable.
- The choice of regression model depends on the effect that you want: multimodal vs. unimodal fit.