Mathematical Analysis of MaxEnt for Mixed Pixel Decomposition

Mathematical Analysis of MaxEnt for Mixed Pixel Decomposition
Lidan Miao, AICIP Group Meeting, Feb. 23, 2006

Motivation
Why do we choose maximum entropy? What role does maximum entropy play in the algorithm? In what sense does the algorithm converge to the optimal solution? (Is there anything else beyond the maximum entropy?)

Decomposition Problem
Decompose a mixture into its constituent materials (assumed to be known in this presentation) and their proportions.
Mixing model: x = A s; given A and x, find s. This is a linear regression problem.
Physical constraints:
- Non-negativity of s
- The components of s sum to one
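
To make the setup concrete, here is a minimal sketch of the linear mixing model and the two physical constraints. The endmember matrix, abundance values, and noise level below are illustrative, not from the presentation.

```python
import numpy as np

# Minimal sketch of the linear mixing model x = A s + n.
# Columns of A are endmember signatures; s holds the proportions.
# All numerical values here are illustrative.
rng = np.random.default_rng(0)

A = np.array([[0.8, 0.2],
              [0.5, 0.6],
              [0.1, 0.9]])                      # 3 bands, 2 endmembers
s_true = np.array([0.3, 0.7])                   # nonnegative, sums to one
x = A @ s_true + 0.01 * rng.standard_normal(3)  # noisy mixed pixel

# Physical constraints on any estimate s_hat:
#   s_hat >= 0 componentwise   (non-negativity)
#   s_hat.sum() == 1           (components of s sum to one)
```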

QP & FCLS [1]
Quadratic programming (QP): nonlinear optimization, computationally expensive.
Fully constrained least squares (FCLS): integrates SCLS and NCLS; NCLS is based on the standard NNLS algorithm.
Is the least-squares estimate the best in all cases?
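
As a hedged illustration of the problem QP and FCLS solve, the fully constrained least-squares fit can be computed with a general-purpose solver. This is only a sketch of the optimization problem, not the FCLS iteration of [1].

```python
import numpy as np
from scipy.optimize import minimize

def fully_constrained_ls(A, x):
    """Solve min ||A s - x||^2  s.t.  s >= 0 and sum(s) = 1.

    Uses a generic SLSQP solver to show the optimization problem being
    solved; it is not the FCLS algorithm of [1]."""
    m = A.shape[1]
    s0 = np.full(m, 1.0 / m)                          # equal-abundance start
    cons = ({'type': 'eq', 'fun': lambda s: s.sum() - 1.0},)
    bounds = [(0.0, None)] * m
    res = minimize(lambda s: np.sum((A @ s - x) ** 2), s0,
                   method='SLSQP', bounds=bounds, constraints=cons)
    return res.x
```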

Geometric Illustration
The mixing model is a convex combination of the endmembers.
Unconstrained least squares solves a linear combination problem.
Constrained least squares (QP and FCLS):
- Sum-to-one: the solution lies on the line connecting a1 and a2
- Non-negativity: the solution lies in the cone C determined by a1 and a2
- Feasible set of the convex combination: the line segment a1a2
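
A small sketch of why the constraints matter geometrically: an unconstrained least-squares fit can land outside the segment a1a2, giving abundances that are negative or do not sum to one. The endmembers and pixel values are made up for illustration.

```python
import numpy as np

# Unconstrained least squares ignores the geometry: its solution can leave
# the segment a1-a2, i.e. have negative components or not sum to one.
A = np.array([[0.8, 0.2],
              [0.5, 0.6],
              [0.1, 0.9]])                  # columns are a1 and a2
x = np.array([0.95, 0.40, -0.05])           # pixel perturbed off the segment
s_ls, *_ = np.linalg.lstsq(A, x, rcond=None)
print(s_ls, s_ls.sum())                     # roughly [1.2, -0.23]: not a convex combination
```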

MaxEnt [2]
Objective function: maximize entropy (negative relative entropy).
Optimization method: penalty function method.
Limitations:
- Low convergence rate: theoretically the penalty weight Kk needs to go to infinity, and for each Kk, s has no closed-form solution, so a numerical method (gradient descent) is needed.
- Low performance when the SNR is high: it can never fit the measurement model exactly because Kk cannot be infinite.
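
A rough sketch of the penalty-function formulation described above, assuming the objective is the negative entropy plus Kk times the squared data-fit error over an increasing penalty schedule; the exact objective and schedule used in [2] may differ.

```python
import numpy as np
from scipy.optimize import minimize

def maxent_penalty(A, x, penalties=(1.0, 10.0, 100.0, 1000.0)):
    """Penalty-function MaxEnt sketch: for an increasing sequence Kk,
    minimize  -H(s) + Kk * ||A s - x||^2  over the probability simplex.
    The exact objective and penalty schedule of [2] may differ."""
    m = A.shape[1]
    s = np.full(m, 1.0 / m)                           # maximum-entropy start
    cons = ({'type': 'eq', 'fun': lambda s: s.sum() - 1.0},)
    bounds = [(1e-12, 1.0)] * m                       # keep log(s) finite
    for K in penalties:                               # Kk should grow without bound
        def obj(s, K=K):
            return np.sum(s * np.log(s)) + K * np.sum((A @ s - x) ** 2)
        s = minimize(obj, s, method='SLSQP', bounds=bounds, constraints=cons).x
    return s
```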

Gradient Descent MaxEnt (GDME)
Optimization formulation: minimize the negative entropy.
Optimization method: Lagrange multiplier method with gradient descent learning.
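
The slides do not spell out the update equations, so the following is a hedged reconstruction of GDME from the description on the next slides: the abundances come from an exponential of -A^T lambda normalized to sum to one, and lambda is driven by the scaled data-fit error. The learning rate, iteration limit, and tolerance are assumed values.

```python
import numpy as np

def gdme(A, x, lr=0.001, max_iter=5000, tol=1e-8):
    """Gradient-descent MaxEnt (GDME) sketch, reconstructed from the slides.

    s is obtained from exp(-A^T lambda) normalized to sum to one (which gives
    non-negativity and sum-to-one for free), and the multiplier lambda is
    driven by the scaled data-fit error A s - x.  Learning rate, iteration
    count, and stopping tolerance are illustrative choices."""
    lam = np.zeros(A.shape[0])
    s_prev = np.full(A.shape[1], 1.0 / A.shape[1])   # start at equal abundances
    for _ in range(max_iter):
        z = -A.T @ lam
        z -= z.max()                                 # numerical stability
        s = np.exp(z)                                # exponential -> nonnegative
        s /= s.sum()                                 # denominator -> normalization
        lam += lr * (A @ s - x)                      # multiplier ~ scaled error vector
        if np.max(np.abs(s - s_prev)) < tol:         # "s is relatively stable"
            break
        s_prev = s
    return s
```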

Convergence Analysis of GDME
Initialization: the negative-entropy objective. The multiplier lambda warps the original objective function so that it fits the data measurement model, while the search stays inside the feasible set.
Example from the slide: true s1 = 0.1619, estimated s1 = 0.1638.
Like MaxEnt [2], the solution is obtained by warping the objective function. Unlike MaxEnt [2], the warping force is a vector rather than a scalar, so it does not need to go to infinity to fit the measurement model.

Convergence Analysis (cont)
Take the first iteration as an example:
- The multiplier is the scaled error vector.
- s depends on the inner product of A and lambda.
- The denominator of s serves only for normalization.
- The exponential function is used to generate a nonnegative number.
- The key quantity is the inner product.
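
Written out for the first iteration under the same assumed update rule as the sketch above (equal initial abundances, zero initial multiplier, step size eta; a reconstruction, not taken verbatim from the slide):

```latex
% Assumed GDME first iteration: the multiplier becomes the scaled error
% vector, and s^{(1)} is nonnegative via the exponential, normalized by the
% denominator, and driven by the inner products a_i^T lambda^{(1)}.
s^{(0)}_i = \frac{1}{M}, \qquad
\lambda^{(1)} = \eta\,\bigl(A s^{(0)} - x\bigr) \quad \text{(scaled error vector)}, \qquad
s^{(1)}_i = \frac{\exp\!\bigl(-a_i^{\top}\lambda^{(1)}\bigr)}
                 {\sum_{j=1}^{M}\exp\!\bigl(-a_j^{\top}\lambda^{(1)}\bigr)}
```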

Convergence Analysis (cont)
The 2D case is simple to visualize: in which direction does the estimate move, and toward what objective? (Proof given on the slide.)

Convergence Rate

Stopping Conditions
FCLS: stops when all components of s are non-negative. It generates the least-squares solution, minimizing ||As - x||; when the SNR is low, the algorithm overfits the noise.
MaxEnt [2]: the same stopping condition as FCLS, but the solution is not least squares and can never fit the data perfectly.
GDME: stops when s is relatively stable. It is able to give the least-squares solution; the solution lies somewhere between equal abundances and the least-squares estimate, as determined by the stopping condition.
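
As a small sketch, the GDME stopping rule ("s is relatively stable") amounts to a check like the following; the tolerance is an assumed, tunable choice that controls where between equal abundances and the least-squares fit the iteration stops.

```python
import numpy as np

def s_is_stable(s_new, s_old, tol=1e-6):
    """GDME-style stopping check: stop once the abundance vector barely
    changes between iterations.  The tolerance is an assumed choice; a looser
    tolerance stops nearer the equal-abundance start, a tighter one nearer
    the least-squares fit."""
    return np.max(np.abs(s_new - s_old)) < tol
```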

Experimental Results
MaxEnt [2] is too slow to be applied to a large image, so the simple data from ref. [2] are used.

Two groups of test data (class means and 3x3 covariance matrices):

Class  Mean               Covariance (rows)
A      73    33    29     2.5 1.6 1.3;   1.6 2.5 1.6;   1.3 1.6 2.1
B      89    46    74     11.3 7.3 12.2; 7.3 6.0 9.4;   12.2 9.4 17.3
C      109   61    100    9.6 6.1 9.0;   6.1 5.2 7.7;   9.0 7.7 14.9
D      92.6  48.1  79.9   5.4 3.1 4.7;   3.1 2.9 4.0;   4.7 4.0 7.1
E      96.3  50.5  82.0   12.2 6.3 5.7;  6.3 4.4 5.0;   5.7 5.0 13.2
F      100.2 53.6  89.8   6.0 4.2 7.1;   4.2 4.2 6.5;   7.1 6.5 12.0

Results:

Mix  Method        Err
ABC  Conventional  0.0994 [2]
ABC  MaxEnt [2]    0.0832 [2]
ABC  FCLS          0.1163
ABC  GDME          0.0789
DEF  Conventional  0.3304 [2]
DEF  MaxEnt [2]    0.1591 [2]
DEF  FCLS          0.2819
DEF  GDME          0.1363

Setup: 250 mixed pixels with randomly generated abundances; errors are averaged over 50 runs. Par: 500, 4.

Experimental Results (cont)
Applied to synthetic hyperspectral images.
Metrics: ARMSE, AAD, AID.
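
For reference, the three abundance metrics can be computed as below; the exact definitions used in the presentation are not given, so these follow common usage (abundance RMSE, abundance angle distance, and a symmetric information divergence) and should be treated as assumptions.

```python
import numpy as np

# Hedged implementations of the abundance metrics (assumed definitions).
# S_true and S_est have shape (num_endmembers, num_pixels).

def armse(S_true, S_est):
    """Average root-mean-square abundance error over all pixels."""
    return np.mean(np.sqrt(np.mean((S_true - S_est) ** 2, axis=0)))

def aad(S_true, S_est):
    """Average abundance angle distance: angle between the true and estimated
    abundance vectors of each pixel, averaged over pixels."""
    cos = np.sum(S_true * S_est, axis=0) / (
        np.linalg.norm(S_true, axis=0) * np.linalg.norm(S_est, axis=0))
    return np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))

def aid(S_true, S_est, eps=1e-12):
    """Average abundance information divergence: symmetric KL divergence
    between abundance vectors treated as probability distributions."""
    P = (S_true + eps) / (S_true + eps).sum(axis=0, keepdims=True)
    Q = (S_est + eps) / (S_est + eps).sum(axis=0, keepdims=True)
    return np.mean(np.sum(P * np.log(P / Q) + Q * np.log(Q / P), axis=0))
```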

Summary of GDME
- The same target as QP and FCLS, i.e., min ||As - x||.
- The maximum entropy formulation is used to incorporate the two constraints through the exponential function and the normalization. Does maximum entropy really play a role?
- By carefully selecting the stopping condition, GDME on average achieves better abundance estimation performance.
- The convergence rate is faster than QP and MaxEnt [2] and similar to FCLS (based on experiments).
- GDME is more flexible and shows strong robustness in low-SNR cases.
- GDME performs better when the source vectors are close to each other.

Future Work
- Speed up the learning algorithm.
- Investigate optimal stopping conditions (what is the relationship between the SNR and the stopping condition?).
- Study the performance w.r.t. the number of constituent materials.

References
[1] D. C. Heinz and C.-I Chang, "Fully constrained least squares linear spectral mixture analysis method for material quantification in hyperspectral imagery," IEEE Trans. Geosci. Remote Sensing, vol. 39, no. 3, pp. 529-545, 2001.
[2] S. Chettri and N. Netanyahu, "Spectral unmixing of remotely sensed imagery using maximum entropy," in Proc. of SPIE, vol. 2962, pp. 55-62, 1997.