Lecture 13 L1, L∞ Norm Problems and Linear Programming

Presentation transcript:

Lecture 13 L1, L∞ Norm Problems and Linear Programming In this lecture, we show how to solve inverse problems involving the L1 and L∞ norms.

Syllabus
Lecture 01 Describing Inverse Problems
Lecture 02 Probability and Measurement Error, Part 1
Lecture 03 Probability and Measurement Error, Part 2
Lecture 04 The L2 Norm and Simple Least Squares
Lecture 05 A Priori Information and Weighted Least Squares
Lecture 06 Resolution and Generalized Inverses
Lecture 07 Backus-Gilbert Inverse and the Trade Off of Resolution and Variance
Lecture 08 The Principle of Maximum Likelihood
Lecture 09 Inexact Theories
Lecture 10 Nonuniqueness and Localized Averages
Lecture 11 Vector Spaces and Singular Value Decomposition
Lecture 12 Equality and Inequality Constraints
Lecture 13 L1, L∞ Norm Problems and Linear Programming
Lecture 14 Nonlinear Problems: Grid and Monte Carlo Searches
Lecture 15 Nonlinear Problems: Newton’s Method
Lecture 16 Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Lecture 17 Factor Analysis
Lecture 18 Varimax Factors, Empirical Orthogonal Functions
Lecture 19 Backus-Gilbert Theory for Continuous Problems; Radon’s Problem
Lecture 20 Linear Operators and Their Adjoints
Lecture 21 Fréchet Derivatives
Lecture 22 Exemplary Inverse Problems, incl. Filter Design
Lecture 23 Exemplary Inverse Problems, incl. Earthquake Location
Lecture 24 Exemplary Inverse Problems, incl. Vibrational Problems

Purpose of the Lecture
Review material on outliers and long-tailed distributions
Derive the L1 estimate of the mean and variance of an exponential distribution
Solve the linear inverse problem under the L1 norm by transformation to a Linear Programming problem
Do the same for the L∞ problem
The first section emphasizes the link between p.d.f.’s and measures of error. The second applies the same likelihood method as was previously applied to the Gaussian p.d.f. The last two show how to solve general L1 and L∞ problems.

Part 1 Review Material on Outliers and Long-Tailed Distributions If the error has a long-tailed p.d.f., the data will have lots of outliers; because outliers are then common and expected, they are unimportant and should be given little weight.

Review of the Ln family of norms Just a review of the formulas for the norms
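For reference, a restatement of the standard definition: the Ln norm of the prediction error e = dobs − Gm is

$$ \|\mathbf{e}\|_n = \Big[\, \sum_{i=1}^{N} |e_i|^n \Big]^{1/n} $$

and the corresponding total error minimized under that norm is $E = \sum_i |e_i|^n$.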

higher norms give increasing weight to the largest element of e. The higher the norm, the more important an outlier is; the higher norms give preferential weight to the largest of the errors. Fig ?. Hypothetical prediction error, ei(zi), and its absolute value raised to the powers 1, 2 and 10. While most elements of |ei| are numerically significant, only a few elements of |ei|^10 are. MatLab script gda03_??.

limiting case The limiting case, the L∞ norm, says the norm of e is the absolute value of its largest element.
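In symbols:

$$ \|\mathbf{e}\|_{\infty} = \lim_{n \to \infty} \Big[\, \sum_{i=1}^{N} |e_i|^n \Big]^{1/n} = \max_i |e_i| $$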

but which norm to use? it makes a difference! Different estimate of the model parameters for each norm.

Example of fitting a straight line to data; note the outlier. The higher the norm, the more the outlier “pulls down” the line. Straight line problem solved under three different norms (L1, L2, L∞). This is a worst-case example, since the data have a horrible outlier (be sure to point it out). Fig ?. Straight line fits to (z,d) pairs where the error is measured under the L1, L2 and L∞ norms. The L1 norm gives the least weight to the one outlier. MatLab script gda03_??.

Answer is related to the distribution of the error. Are outliers common or rare? Comparison of a long-tailed and a short-tailed p.d.f. Use low norms for long-tailed p.d.f.’s and high norms for short-tailed p.d.f.’s. Fig ?. A) Long-tailed probability density function. B) Short-tailed probability density function. MatLab script gda03_??.
short tails: outliers uncommon, so outliers are important; use a high norm, which gives high weight to outliers.
long tails: outliers common, so outliers are unimportant; use a low norm, which gives low weight to outliers.

as we showed previously … use L2 norm when data has Gaussian-distributed error as we will show in a moment … use L1 norm when data has Exponentially-distributed error

comparison of p.d.f.’s Gaussian Exponential Emphasize the “square” in the Gaussian and the “absolute value” in the exponential Fig. 8.1. Comparison of exponential probability density distribution (red) with a Normal probability density distribution (blue) of equal variance, σ2=1. The exponential probability density distribution is longer tailed. MatLab script gda08_01.
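For reference, the two p.d.f.’s, each with mean <d> and variance σ² (the √2 factors in the exponential p.d.f. follow from requiring its variance to equal σ²):

$$ p_{G}(d) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\Big[-\frac{(d-\langle d\rangle)^2}{2\sigma^2}\Big], \qquad p_{E}(d) = \frac{1}{\sqrt{2}\,\sigma}\exp\!\Big[-\frac{\sqrt{2}\,|d-\langle d\rangle|}{\sigma}\Big] $$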

to make realizations of an exponentially-distributed random variable in MatLab

mu = sd/sqrt(2);                          % scale parameter, so that the variance is sd^2
rsign = (2*(random('unid',2,Nr,1)-1)-1);  % random signs, -1 or +1
dr = dbar + rsign .* ...
     random('exponential',mu,Nr,1);       % two-sided exponential noise about dbar

Note that MatLab’s “exponential” distribution is one-sided, so we must fudge by multiplying it by a randomly chosen sign.
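Equivalently, on MatLab versions where exprnd and randi are available, a minimal sketch of the same fudge (assuming sd, dbar and Nr are defined as above):

mu = sd/sqrt(2);                 % scale parameter, so that the variance is sd^2
rsign = 2*randi(2,Nr,1) - 3;     % random signs, -1 or +1
dr = dbar + rsign .* exprnd(mu,Nr,1);   % two-sided exponential noise about dbar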

Part 2 Derive the L1 estimate of the mean and variance of an exponential distribution Scenario: we measure the temperature in the room (presumed spatially uniform) N times. The thermometer has exponentially-distributed error. What’s the best estimate of the mean (the temperature in the room) and of the variance of the measurements?

use of Principle of Maximum Likelihood maximize L = log p(dobs), the log-probability that the observed data were in fact observed, with respect to the unknown parameters in the p.d.f., e.g. its mean m1 and variance σ². We applied this principle many times before when discussing Gaussian-distributed errors.

Previous Example: Gaussian p.d.f. We did this a few lectures ago. Note that taking the log gets rid of the exponential and changes products of variables into sums of variables. The maximization is accomplished by setting derivatives to zero, as usual.
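Restated here (the standard result for N independent, Gaussian-distributed data di with mean m1 and variance σ²), the log-likelihood and its derivatives are

$$ L = -\tfrac{N}{2}\log(2\pi) - N\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(d_i - m_1)^2 $$

$$ \frac{\partial L}{\partial m_1} = \frac{1}{\sigma^2}\sum_{i=1}^{N}(d_i - m_1) = 0, \qquad \frac{\partial L}{\partial \sigma} = -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{N}(d_i - m_1)^2 = 0 $$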

solving the two equations The two equations are easy to solve. The first can only be zero when the summation is zero, which yields the formula for m1. Once m1 is known, solving the second is trivial.
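Setting the two derivatives to zero gives

$$ m_1^{est} = \frac{1}{N}\sum_{i=1}^{N} d_i, \qquad (\sigma^2)^{est} = \frac{1}{N}\sum_{i=1}^{N}\big(d_i - m_1^{est}\big)^2 $$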

solving the two equations: the usual formula for the sample mean, and almost the usual formula for the sample standard deviation. I use the word almost, since the usual formula for standard deviation has a factor of (N-1) in it, not the factor of N that appears above. However, this difference is insignificant when N is large.

New Example: Exponential p.d.f. Note that taking the log gets rid of the exponential and changes products of variables into sums of variables. The maximization is accomplished by setting derivatives to zero, as usual. Note that the derivative of |x| is sign(x): if x>0 then d|x|/dx = 1, and if x<0 then d|x|/dx = -1; so the derivative is +1 if x>0 and -1 if x<0 (and 0 if x=0).
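Restated (for N independent data with the exponential p.d.f. above), the log-likelihood and its derivatives are

$$ L = -N\log(\sqrt{2}\,\sigma) - \frac{\sqrt{2}}{\sigma}\sum_{i=1}^{N}|d_i - m_1| $$

$$ \frac{\partial L}{\partial m_1} = \frac{\sqrt{2}}{\sigma}\sum_{i=1}^{N}\mathrm{sign}(d_i - m_1) = 0, \qquad \frac{\partial L}{\partial \sigma} = -\frac{N}{\sigma} + \frac{\sqrt{2}}{\sigma^2}\sum_{i=1}^{N}|d_i - m_1| = 0 $$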

solving the two equations: m1est = median(di) and, from the second equation, σest = (√2/N) Σi |di − m1est|. The first equation is satisfied if Σi sign(di − <d>) = 0. So you must pick <d> so that half of the di are greater than <d> and half are smaller; then half of the signs are +1 and half are −1, and their sum is zero. But that’s just the formula for the median.

solving the two equations: m1est = median(di). The median is robust, in the sense that adding one datum, such as an outlier, can only shift the median from one centrally-located di to a neighboring, and thus still centrally-located, di. It is more robust than the sample mean, since an outlier moves it by only “one data point”.

L1 error, E(m), as a function of m when N is even: note that it’s flat between the two central data. L1 error as a function of m when N is odd: note that the estimate IS the central datum. The L2 error is a smooth quadratic curve. Fig. 8.2. Inverse problem to estimate the mean, mest, of N observations. (A) L1 error, E(m), (red curve) with even N: the error has a flat minimum, bounded by two observations (circles), and the solution is non-unique. (B) L1 error with odd N: the error is minimum at one of the observation points, and the solution is unique. (C) The L2 error, both odd and even N: the error is minimum at a single point, which may not correspond to an observation point, and the solution is unique. MatLab script gda08_02.

these properties carry over to the general linear problem. Observations: when the number of data is even, the solution is non-unique but bounded, and the solution exactly satisfies one of the data. In the general linear problem: in certain cases, the solution can be non-unique but bounded, and the solution exactly satisfies M of the data equations. The notion of finite bounds is new; there is no analog in the L2 world.

Part 3 Solve the Linear Inverse Problem under the L1 norm by Transformation to a Linear Programming Problem Now we develop the tools needed to solve practical problems.

review of the Linear Programming problem. Here’s the formal statement of the Linear Programming problem ...
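A standard statement of the problem, written here in the form accepted by MatLab’s linprog function (used later in this lecture): find the vector x that solves

$$ \min_{\mathbf{x}} \; \mathbf{f}^{T}\mathbf{x} \quad \text{subject to} \quad \mathbf{A}\mathbf{x} \le \mathbf{b}, \quad \mathbf{A}_{eq}\mathbf{x} = \mathbf{b}_{eq}, \quad \mathbf{l} \le \mathbf{x} \le \mathbf{u} $$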

Case A The Minimum L1 Length Solution Analogous to the minimum L2 length problem that we applied to Gm=d when it was underdetermined.

minimize the weighted L1 solution length, Σi |mi − <mi>| / σmi, subject to the constraint Gm = d. The formal statement of the problem. Emphasize the absolute value ...

minimize Σi |mi − <mi>| / σmi, the weighted L1 solution length (weighted by σm^-1), subject to the constraint Gm = d, the usual data equations.

transformation to an equivalent linear programming problem In the course, we encounter many cases where a “harder” problem can be transformed into an “easier” one.

Here’s the transformation. We’ll go through it step by step.
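A sketch of the transformation, reconstructed from the step-by-step discussion on the following slides (the constraint signs follow from those slides; m = m′ − m″):

$$ \text{minimize } z = \sum_{i=1}^{M} \frac{\alpha_i}{\sigma_{m_i}} \quad \text{subject to} \quad \mathbf{G}(\mathbf{m}' - \mathbf{m}'') = \mathbf{d}, $$
$$ \alpha_i - x_i = (m_i - \langle m_i \rangle), \qquad \alpha_i - x_i' = -(m_i - \langle m_i \rangle), \qquad \mathbf{m}', \mathbf{m}'', \boldsymbol{\alpha}, \mathbf{x}, \mathbf{x}' \ge 0 $$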

all variables are required to be positive First, all the variables are required to be positive. Notice that there are 5 variables; we’ll define them in a minute.

usual data equations with m=m’-m’’ Notice that one of the equality constraints is Gm=d, written with the substitution m=m’-m’’.

“slack variables”: a standard trick in linear programming to allow m to have any sign while m’ and m’’ are non-negative. This substitution relaxes the positivity constraint on m: while m’ and m’’ are constrained to be positive, their difference is not required to be positive.

same as α − x = (m − <m>) and α − x’ = −(m − <m>). The other two equality constraints can be rewritten as shown ...

if + (m − <m> positive): then α ≥ (m − <m>), since x ≥ 0; the second equation can always be satisfied by choosing an appropriate x’. Suppose that m − <m> is positive. In the first equation, since x is non-negative, alpha must be greater than m − <m>. In the second equation, you can always choose x’ to satisfy the equation, regardless of the value of alpha, so the second equation does not constrain alpha.

if − (m − <m> negative): then α ≥ −(m − <m>), since x’ ≥ 0; the first equation can always be satisfied by choosing an appropriate x. Suppose that m − <m> is negative. In the second equation, since x’ is non-negative, alpha must be greater than −(m − <m>). In the first equation, you can always choose x to satisfy the equation, regardless of the value of alpha, so the first equation does not constrain alpha.

taken together, then α ≥ |m − <m>|. Taken together, the two equations imply that alpha is greater than or equal to the absolute value of m − <m>.

minimizing z is the same as minimizing the weighted solution length: minimizing the weighted sum of the alphas is the same as minimizing the weighted sum of |m − <m>|. Clever!

Case B Least L1 error solution (analogous to least squares)
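The problem statement (weighted by the data σd, consistent with the MatLab implementation below):

$$ \text{minimize } E = \sum_{i=1}^{N} \frac{|e_i|}{\sigma_{d_i}} = \sum_{i=1}^{N} \frac{|d_i^{obs} - (\mathbf{Gm})_i|}{\sigma_{d_i}} $$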

transformation to an equivalent linear programming problem

This transformation is similar to the minimum-length case. Once again, 5 variables, all positive.
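A sketch of the transformation, reconstructed from the equality constraints noted in the MatLab code below (m = m′ − m″):

$$ \text{minimize } z = \sum_{i=1}^{N} \frac{\alpha_i}{\sigma_{d_i}} \quad \text{subject to} \quad \mathbf{G}(\mathbf{m}'-\mathbf{m}'') + \mathbf{x} - \boldsymbol{\alpha} = \mathbf{d}, \quad \mathbf{G}(\mathbf{m}'-\mathbf{m}'') - \mathbf{x}' + \boldsymbol{\alpha} = \mathbf{d}, $$
$$ \mathbf{m}', \mathbf{m}'', \boldsymbol{\alpha}, \mathbf{x}, \mathbf{x}' \ge 0 $$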

same as α − x = Gm − d and α − x’ = −(Gm − d), so the previous argument applies. There are two complementary equality constraints, rigged to constrain that α ≥ |Gm − d|, so minimizing Σα is the same as minimizing Σ|d − Gm|.

MatLab
% variables
%  m = mp - mpp
%  x = [mp', mpp', alpha', x', xp']'
%  mp, mpp of length M; alpha, x, xp of length N
L = 2*M+3*N;
x = zeros(L,1);
f = zeros(L,1);
f(2*M+1:2*M+N) = 1./sd;   % objective: minimize sum of alpha_i / sigma_di
MatLab implementation. Note that there are lots of variables. This is a limitation of the method – problems can easily be too big to handle.

% equality constraints
Aeq = zeros(2*N,L);
beq = zeros(2*N,1);
% first equation G(mp-mpp)+x-alpha=d
Aeq(1:N,1:M) = G;
Aeq(1:N,M+1:2*M) = -G;
Aeq(1:N,2*M+1:2*M+N) = -eye(N,N);
Aeq(1:N,2*M+N+1:2*M+2*N) = eye(N,N);
beq(1:N) = dobs;
% second equation G(mp-mpp)-xp+alpha=d
Aeq(N+1:2*N,1:M) = G;
Aeq(N+1:2*N,M+1:2*M) = -G;
Aeq(N+1:2*N,2*M+1:2*M+N) = eye(N,N);
Aeq(N+1:2*N,2*M+2*N+1:2*M+3*N) = -eye(N,N);
beq(N+1:2*N) = dobs;
Equality constraints

% inequality constraints A x <= b
% part 1: everything positive
A = zeros(L+2*M,L);
b = zeros(L+2*M,1);
A(1:L,:) = -eye(L,L);
b(1:L) = zeros(L,1);
% part 2: mp and mpp have an upper bound
A(L+1:L+2*M,:) = eye(2*M,L);
mls = (G'*G)\(G'*dobs); % L2
mupperbound = 10*max(abs(mls));
b(L+1:L+2*M) = mupperbound;
Inequality constraints are simple; everything positive.

% solve linear programming problem
[x, fmin] = linprog(f,A,b,Aeq,beq);   % fmin = f'*x is the weighted L1 error
mest = x(1:M) - x(M+1:2*M);
Call to MatLab’s linear programming function and then compute m. An added benefit is that fmin is the L1 error.

Example: Red: true; Green: L1; Blue: L2. L2 is badly affected by the outlier; L1 is not. Fig. 8.3. Curve fitting using the L1 norm. The true data (red curve) follow a cubic polynomial in an auxiliary variable, z. Observations (red circles) have additive noise with zero mean and variance, σ2=1, drawn from an exponential probability density function. Note the outlier at z≈1. The L1 fit (green curve) is not as affected by the outlier as a standard L2 (least-squares) fit (blue curve). MatLab script gda08_03.

the mixed-determined problem of minimizing L+E can also be solved via transformation but we omit it here In other words, add prior information that m is close to <m>.

Part 4 Solve the Linear Inverse Problem under the L∞ norm by Transformation to a Linear Programming Problem Nice to know, but actually, one rarely (if ever) wants to solve this problem.

we’re going to skip all the details and just show the transformation for the overdetermined case. It’s explained in the text.

minimize E = maxi (|ei| / σdi), where e = dobs − Gm. Emphasize that the L∞ problem is to minimize the largest prediction error. The transformation is similar to the ones that we’ve seen previously ...

note α is a scalar Except that alpha is a scalar, not a vector ...
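A sketch of one such transformation, written by analogy with Case B (not necessarily the exact form in the text); here α is a single scalar, and the data equations have been scaled by 1/σdi:

$$ \text{minimize } z = \alpha \quad \text{subject to, for each } i: \quad \alpha - x_i = \frac{(\mathbf{Gm}-\mathbf{d})_i}{\sigma_{d_i}}, \qquad \alpha - x_i' = -\frac{(\mathbf{Gm}-\mathbf{d})_i}{\sigma_{d_i}}, $$
$$ \mathbf{m} = \mathbf{m}' - \mathbf{m}'', \qquad \mathbf{m}', \mathbf{m}'', \mathbf{x}, \mathbf{x}' \ge 0 $$

(α ≥ 0 follows automatically, since α is bounded below by both +ei/σdi and −ei/σdi.)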

Example: Red: true; Green: L∞; Blue: L2. L∞ is very badly affected by the outlier. Fig. 8.4. Curve fitting using the L∞ norm. The true data (red curve) follow a cubic polynomial in an auxiliary variable, z. Observations (red circles) have additive noise with zero mean and variance, σ2=1, drawn from an exponential probability density function. Note the outlier at z≈0.37. The L∞ fit (green curve) is more affected by the outlier than is a standard L2 (least-squares) fit (blue curve). MatLab script gda08_04.