Julian Center on Regression for Proportion Data July 10, 2007 (68)

MaxEnt2007 Regression For Proportion Data Julian Center Creative Research Corp. Andover, MA, USA

Overview
- Introduction
  - What is proportion data?
  - What do we mean by regression?
  - Examples
  - Why should you care?
- Coordinate Transformation to Facilitate Regression
- Measurement Models
  - Multinomial
  - Laplace Approximation to Multinomial
  - Log-Normal
- Regression Models
  - Kernel Regression (Nadaraya-Watson Model)
  - Gaussian Process Regression
    - With Log-Normal Measurements
    - With Multinomial Measurements: Expectation Propagation
- Conclusion

What is Proportion Data?

What is Regression?
- Regression = Smoothing + Calibration + Interpolation.
- Relates data gathered under one set of conditions to data gathered under similar, but different, conditions.
- Accounts for measurement “noise”.
- Determines p(r|x).

Examples
- Geostatistics: Composition of rock samples at different locations.
- Medicine: Response to different levels of treatment.
- Political Science: Opinion polls across different demographic groups.
- Climate Research:
  - Infer climate history from fossil pollen samples.
  - Calibrate model using present-day samples from known climates.
  - Typically, examine 400 pollen grains and sort into 14 categories.

Why Should You Care?
- Either you have proportion data to analyze.
- Or you want to do pattern classification.
- Or you want to use a similar approach to your problem:
  - Transform constrained variables so that a Laplace approximation makes sense.
  - Two different regression techniques.
  - Expectation Propagation for improving model fit.

Coordinate Transformation
- Well-known regression methods can’t deal with the pesky constraints of the simplex.
- We need a one-to-one mapping between the d-simplex and d-dimensional real vectors.
- Then we can model probability distributions on real vectors and relate them to distributions on the simplex.

Coordinate Transformation
- The rows of T span the orthogonal complement of 1_(d+1), the vector of d+1 ones.
- Symmetric softmax activation function (formula not preserved in the transcript).
- Centered log-ratio linkage function (formula not preserved in the transcript).
- We can always find T by the Gram-Schmidt process.
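The formulas on this slide did not survive extraction. The following is a minimal sketch of one mapping consistent with the slide's labels, assuming the symmetric softmax has the form y = exp(T^T f) / sum(exp(T^T f)) and the centered log-ratio link is f = T ln(y). The function names are hypothetical, and the Gram-Schmidt starting vectors (the ones vector followed by the standard basis vectors) are one choice among many.

```python
import math

def gram_schmidt_T(d):
    """Return a d x (d+1) matrix T whose rows are orthonormal and orthogonal to the ones vector."""
    n = d + 1
    basis = [[1.0 / math.sqrt(n)] * n]          # unit vector along the ones direction
    for k in range(d):
        v = [1.0 if i == k else 0.0 for i in range(n)]
        for b in basis:                         # remove components along earlier vectors
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis[1:]                            # drop the ones direction itself

def softmax_link(f, T):
    """Map a real d-vector f to a point y on the simplex (assumed form of the softmax)."""
    n = len(T[0])
    z = [sum(T[j][i] * f[j] for j in range(len(T))) for i in range(n)]  # z = T^T f
    m = max(z)                                  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def clr_link(y, T):
    """Map a point y in the interior of the simplex to a real d-vector f = T ln(y)."""
    log_y = [math.log(yi) for yi in y]
    return [sum(row[i] * log_y[i] for i in range(len(y))) for row in T]

# Round trip for three categories (d = 2): simplex -> real vector -> simplex.
T = gram_schmidt_T(2)
f = clr_link([0.2, 0.3, 0.5], T)
y_back = softmax_link(f, T)
```

Because every row of T sums to zero, T ln(y) is unchanged when a constant is added to ln(y), which is why the two maps invert each other on the interior of the simplex.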

Coordinate Transformation
[Figure: the simplex in the (y1, y2) plane and the image of the simplex under ln in the (ln(y1), ln(y2)) plane. Labels: the line ln(y1) = -ln(y2), the coordinate f, and a direction marked “Softmax is insensitive to this direction.”]

Measurement Models
- Multinomial
- Log-Normal

Measurement Model - Multinomial -
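The formula on this slide is missing from the transcript. As a hedged reconstruction, the standard multinomial likelihood for counts r = (r_1, ..., r_{d+1}) with total S and proportions y is shown below. The second line assumes, rather than quotes, that y depends on the real vector f through a softmax of T^T f, where T is the matrix from the coordinate-transformation slides.

```latex
p(r \mid y) = \frac{S!}{\prod_{i=1}^{d+1} r_i!} \prod_{i=1}^{d+1} y_i^{\,r_i},
\qquad S = \sum_{i=1}^{d+1} r_i
\\[6pt]
\ln p(r \mid f) = \text{const} + r^{\top} T^{\top} f
  - S \ln\!\left( \mathbf{1}^{\top} \exp\!\left( T^{\top} f \right) \right),
\qquad y = \frac{\exp(T^{\top} f)}{\mathbf{1}^{\top} \exp(T^{\top} f)}
```

Under that assumption the log-likelihood is concave in f, which is what makes a single-peak Gaussian approximation plausible.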

Multinomial Measurement Model
[Figure labels: R1= (value not preserved in the transcript), S=400]

Measurement Model - Laplace Approximation -
- Some regression methods assume a Gaussian measurement model.
- Therefore, we are tempted to approximate each Multinomial measurement with a Gaussian measurement.
- Let’s try a Laplace approximation to each measurement.
- Laplace Approximation:
  - Find the peak of the log-likelihood function.
  - Pick a Gaussian centered at the peak whose inverse covariance matrix matches the negative second derivative of the log-likelihood function at the peak.
  - Pick an amplitude factor to match the height of the peak.
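The three steps can be sketched for a single multinomial measurement as follows. This is an illustration under stated assumptions, not the author's code. It assumes the proportions are y = softmax(T^T f) with T having orthonormal rows that each sum to zero, and that all counts are positive so the peak is finite. The basis T for three categories, the function name, and the example counts are hypothetical.

```python
import math

# Assumed basis for d = 2 (three categories): orthonormal rows, each summing to zero.
T = [
    [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0],
    [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)],
]

def laplace_multinomial(r, T):
    """Laplace approximation, in f coordinates, to the multinomial likelihood of counts r.

    Returns (f_hat, precision, log_peak): the peak location, the negative Hessian of the
    log-likelihood at the peak (the inverse covariance), and the log of the peak height.
    """
    S = sum(r)
    n = len(r)
    # Step 1: the likelihood peaks where the proportions equal the observed fractions.
    y = [ri / S for ri in r]
    log_y = [math.log(yi) for yi in y]
    f_hat = [sum(row[i] * log_y[i] for i in range(n)) for row in T]
    # Step 2: negative Hessian with respect to f is S * T (diag(y) - y y^T) T^T.
    Ty = [sum(row[i] * y[i] for i in range(n)) for row in T]
    precision = [
        [S * (sum(Ta[i] * y[i] * Tb[i] for i in range(n)) - Ty[a] * Ty[b])
         for b, Tb in enumerate(T)]
        for a, Ta in enumerate(T)
    ]
    # Step 3: amplitude is the multinomial probability evaluated at the peak.
    log_peak = (math.lgamma(S + 1) - sum(math.lgamma(ri + 1) for ri in r)
                + sum(ri * ly for ri, ly in zip(r, log_y)))
    return f_hat, precision, log_peak

# Hypothetical measurement: 400 grains sorted into three categories.
f_hat, precision, log_peak = laplace_multinomial([200, 120, 80], T)
```

The Gaussian approximation to the likelihood is then exp(log_peak - 0.5 (f - f_hat)^T precision (f - f_hat)); the covariance is the inverse of `precision`.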

Measurement Model - Laplace Approximation -

Laplace Approximation to Multinomial

Measurement Model - Log-Normal -
- e.g., over-dispersion or under-dispersion

Regression Models
- Way of relating data taken under different conditions.
- Intuition: Similar conditions should produce similar data.
- The best method to use depends on the problem.
- Two methods considered here:
  - Nadaraya-Watson model.
  - Gaussian Process model.

Nadaraya-Watson Model
- Based on applying Parzen density estimation to the joint distribution of f and x.
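With Gaussian Parzen kernels on the joint density of f and x, the conditional density of f given x is a mixture whose weights are the normalized kernel values in x, and its mean is the familiar kernel-weighted average. A minimal sketch, assuming a scalar condition x, a Gaussian kernel with a hand-picked bandwidth h, and hypothetical function names:

```python
import math

def nw_weights(x, xs, h):
    """Normalized Gaussian kernel weights of training conditions xs at query x."""
    logs = [-0.5 * ((x - xi) / h) ** 2 for xi in xs]
    m = max(logs)                               # subtract the max to avoid underflow
    k = [math.exp(l - m) for l in logs]
    s = sum(k)
    return [ki / s for ki in k]

def nw_mean(x, xs, fs, h):
    """Kernel-weighted average of the training vectors fs, i.e. the estimate of E[f | x]."""
    w = nw_weights(x, xs, h)
    dim = len(fs[0])
    return [sum(wi * fi[c] for wi, fi in zip(w, fs)) for c in range(dim)]

# Hypothetical training set: three conditions, each with a 2-dimensional f.
xs = [0.0, 1.0, 2.0]
fs = [[0.0, 1.0], [1.0, 0.0], [2.0, -1.0]]
estimate = nw_mean(1.0, xs, fs, h=0.5)
```

Every query compares x against all training points, so the cost of one prediction grows linearly with the size of the training set.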

[Figure: all data points, f versus x.]

[Figure: Nadaraya-Watson model, f versus x.]

Nadaraya-Watson Model

Nadaraya-Watson Model
- Problem: We must compare a new point to every training point.
- Solution:
  - Choose a sparse set of “knots”, and center density components only on knots.
  - Adjust weights and covariances by “diagnostic training”.
  - Mixture model training tools apply.

[Figure: sparse Nadaraya-Watson model, f versus x.]

Gaussian Process Model
- Probability distribution on functions.
- Specified by mean function m(x) and covariance kernel k(x1, x2).
- For any finite collection of points, the corresponding function values are jointly Gaussian.
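Because any finite set of function values is jointly Gaussian, predicting at a new point reduces to conditioning a multivariate Gaussian. The sketch below does this for one scalar component with a zero mean function. The squared-exponential kernel, the common noise variance, and all function names are assumptions for illustration; the linear solver is a plain Gaussian elimination kept here only to make the example self-contained.

```python
import math

def sq_exp_kernel(x1, x2, length=1.0, var=1.0):
    """Squared-exponential covariance kernel k(x1, x2) (an assumed choice)."""
    return var * math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            factor = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= factor * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gp_posterior(x_new, xs, fs, noise_var, kernel=sq_exp_kernel):
    """Posterior mean and variance of a zero-mean GP at x_new given noisy values fs at xs."""
    K = [[kernel(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [kernel(x_new, a) for a in xs]
    alpha = solve(K, fs)                        # (K + noise I)^-1 f
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)                        # (K + noise I)^-1 k_star
    var = kernel(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, var

# Hypothetical data for one component of f.
mean, var = gp_posterior(1.5, [0.0, 1.0, 2.0], [0.0, 1.0, 0.5], noise_var=0.01)
```

Far from the training points the prediction falls back to the prior (mean zero, variance k(x, x)); near them it is pulled toward the observed values.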

[Figure: Gaussian Process model, f versus x.]

Applying Gaussian Process Regression to Proportion Data
- Prior: Model each component of f(x) as a zero-mean Gaussian process with covariance kernel k(x1, x2). Assume that the components of f are independent of each other.
- Posterior: Use the Laplace approximations to the measurements and apply Kalman filter methods.
- Use Expectation Propagation to improve fit.

Sparse Gaussian Process Model

GP - Log-Normal Model

GP - Multinomial Model

Expectation Propagation Method

Choosing the Regression Model
- If you have two samplings taken under the same conditions, do you want to treat them as coming from a bimodal distribution (NW Model) or combine them into one big sampling (GP Model)?

Conclusion
- A coordinate transformation makes it possible to analyze proportion data with known regression methods.
- The Multinomial distribution can be well approximated by a Gaussian on the transformed variable.
- The choice of regression model depends on the effect that you want: multimodal vs. unimodal fit.
