Zeroing in on the Implicit Function Theorem The Implicit Function Theorem for several equations in several unknowns.
So where do we stand? Solving a system of m equations in n unknowns is equivalent to finding the “zeros” of a vector-valued function from $\mathbb{R}^n \to \mathbb{R}^m$. When n > m, such a system will “typically” have infinitely many solutions. In “nice” cases, the solution will be a function from $\mathbb{R}^{n-m} \to \mathbb{R}^m$.
So where do we stand? Solving linear systems is easy; we are interested in the non-linear case. We will ordinarily not be able to solve a system “globally.” But under reasonable conditions, there will be a solution function in a (possibly small!) region around a single “known” solution.
For a Single Equation in Two Unknowns [Figure: the contour line y = g(x) through the point (a,b) in the xy-plane, with a small box around (a,b).] In a small box around (a,b), we hope to find g(x). And we can, provided that the partial derivative with respect to y is continuous near (a,b) and non-zero at (a,b).
[Figure: the contour line y = g(x) through the point (a,b) in the xy-plane, with a small box around (a,b).] Start with a point (a,b) on the contour line, where the partial with respect to y is not 0: $D = \frac{\partial f}{\partial y}(a,b) \neq 0$. Make the box around (a,b) small enough so that all of the y-partials in this box are “close” to D.
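[Figure: the contour line y = g(x) through the point (a,b) in the xy-plane, with a fixed x marked on the x-axis.] Start with a point (a,b) on the contour line, where the partial with respect to y is not 0: $D = \frac{\partial f}{\partial y}(a,b) \neq 0$. Fix x. For this x, construct the function $\varphi_x(y) = y - \frac{f(x,y)}{D}$. Iterate $\varphi_x(y)$. What happens?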
A Whole Family of Quasi-Newton Functions Remember that (a,b) is a “known” solution to f(x,y) = 0, and $D = \frac{\partial f}{\partial y}(a,b)$. There are a whole bunch of these functions $\varphi_x(y) = y - \frac{f(x,y)}{D}$: there is one for each x value.
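A Whole Family of Quasi-Newton Functions Remember that (a,b) is a “known” solution to f(x,y) = 0, and $D = \frac{\partial f}{\partial y}(a,b)$. If x = a, then we have $\varphi_a(y) = y - \frac{f(a,y)}{D}$, where D is exactly the derivative at the root b: the “best of all possible worlds” Leibniz method! If x ≠ a, then we have $\varphi_x(y) = y - \frac{f(x,y)}{D}$, where D is only close to the derivative at the root: the “pretty good” quasi-Newton method!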
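To make the family of quasi-Newton functions concrete, here is a minimal sketch under stated assumptions. The equation (the unit circle), the known solution (a,b) = (0,1), and the names `phi` and `solve_y` are all chosen for illustration and do not come from the slides; `phi(x, y)` plays the role of the function that is iterated for a fixed x, with the slope D frozen at (a,b).

```python
import math

def f(x, y):
    # Hypothetical example equation: the unit circle x^2 + y^2 - 1 = 0.
    return x * x + y * y - 1.0

a, b = 0.0, 1.0   # known solution: f(a, b) = 0
D = 2.0 * b       # y-partial of f at (a, b): df/dy = 2y, so D = 2

def phi(x, y):
    # Quasi-Newton map for a fixed x; the slope D is frozen at (a, b).
    return y - f(x, y) / D

def solve_y(x, steps=60):
    # Iterate phi(x, .) starting from the known y-value b.
    y = b
    for _ in range(steps):
        y = phi(x, y)
    return y

if __name__ == "__main__":
    for x in (0.0, 0.1, 0.3):
        # Compare the iterated value with the explicit solution sqrt(1 - x^2).
        print(x, solve_y(x), math.sqrt(1.0 - x * x))
```

For x = a the iteration never moves off b; for nearby x it settles on the point of the circle directly above x, which is the value g(x) we are after.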
What are the issues? We have to make sure the iterated maps converge. How do we do this? “Pretty good” quasi-Newton’s method: if we choose D near enough to f’(p) so that |Q’(p)| < ½, iterated maps will converge in a neighborhood of p. How does that work in our case? If we make sure that the partials of f with respect to y are all near enough to D to guarantee that $|\varphi_x'(y)| < \tfrac12$ for all (x,y) in the square, then the iterated maps will converge.
The Role of the Derivative If we have a function $f: [a,b] \to \mathbb{R}$ which is differentiable on (a,b) and continuous on [a,b], the Mean Value Theorem says that there is some c in (a,b) such that $f(b) - f(a) = f'(c)\,(b - a)$. If $|f'(x)| < k < 1$ for all x in [a,b], then $|f(x_1) - f(x_2)| \le k\,|x_1 - x_2|$ for all $x_1, x_2$ in [a,b]: distances contract by a factor of k! Likewise, if $|f'(x)| > k > 1$ for all x in [a,b], then $|f(x_1) - f(x_2)| \ge k\,|x_1 - x_2|$: distances expand by a factor of k!
The Role of the Derivative If p is a fixed point of f in [a,b], and $|f'(x)| < k < 1$ for all x in [a,b], then $|f(x) - f(p)| \le k\,|x - p|$. But f(p) = p, so $|f(x) - p| \le k\,|x - p|$: f moves other points closer to p by a factor of k! Each time we re-apply f, the next iterate is even closer to p! (Attracting fixed point!) Likewise, if $|f'(x)| > k > 1$ for all x in [a,b], f moves other points farther and farther away from p. (Repelling fixed point!)
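The contraction estimate can be checked numerically. The sketch below uses an example that is not in the slides: cos on the interval [0, 1], which maps that interval into itself and satisfies |cos'(x)| = |sin x| ≤ sin(1) < 1 there. The helper name `iterate` is hypothetical.

```python
import math

def iterate(f, x0, steps):
    # Return the list x0, f(x0), f(f(x0)), ... with `steps` applications of f.
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs

# On [0, 1], cos maps the interval into itself and |cos'(x)| <= sin(1) < 1.
k = math.sin(1.0)

# Numerical fixed point of cos, found by iterating for a long time.
p = iterate(math.cos, 0.5, 200)[-1]

# Distances to the fixed point along a short orbit starting at x0 = 1.
orbit = iterate(math.cos, 1.0, 10)
errors = [abs(x - p) for x in orbit]
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]

if __name__ == "__main__":
    print("fixed point ~", p)
    print("contraction bound k =", k)
    print("observed ratios:", ratios)
```

Every observed ratio stays below the bound k, so each application of the map brings the iterate closer to p by at least that factor.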
What are the issues? Not so obvious... we have to work a bit to make sure we get the right fixed point. (We don’t leave the box!) [Figure: the box around (a,b) in the xy-plane, with a fixed x marked on the x-axis.]
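A small experiment shows why staying in the box matters. This is a sketch under assumptions: the unit-circle equation, the known solution (0,1), and the box half-width `r = 0.5` are illustrative choices, and `iterate_in_box` is a hypothetical helper that stops as soon as an iterate leaves the interval [b - r, b + r].

```python
def f(x, y):
    # Hypothetical example equation: the unit circle x^2 + y^2 - 1 = 0.
    return x * x + y * y - 1.0

b = 1.0   # y-coordinate of the known solution (a, b) = (0, 1)
D = 2.0   # y-partial of f at (0, 1)
r = 0.5   # illustrative half-width of the box in the y-direction

def phi(x, y):
    # Quasi-Newton map for a fixed x, with the slope frozen at D.
    return y - f(x, y) / D

def iterate_in_box(x, steps=50):
    # Iterate from y = b; report (last y, stayed_in_box).
    y = b
    for _ in range(steps):
        y = phi(x, y)
        if abs(y - b) > r:
            return y, False   # an iterate left the box: no guarantee any more
    return y, True

if __name__ == "__main__":
    print(iterate_in_box(0.3))   # x close to a: iterates stay in the box
    print(iterate_in_box(1.2))   # x far from a: the first iterate leaves the box
```

For x = 1.2 the circle has no point above x at all, so there is nothing in the box to converge to; restricting x to a small interval around a is what rules this out.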
Systems of Equations and Differentiability A vector valued function f of several variables is differentiable at a vector p if in some small neighborhood of p, the graph of f “looks a lot like” an affine function. That is, there is a linear transformation Df(p) so that for all z “close” to p, $f(z) \approx f(p) + Df(p)\,(z - p)$, where Df(p) is the Jacobian matrix made up of all the partial derivatives of f. Suppose that f(p) = 0. When can we solve f(z) = 0 in some neighborhood of p?
Systems of Equations and Differentiability For z “close” to p, $f(z) \approx f(p) + Df(p)\,(z - p)$. Since f(p) = 0, $f(z) \approx Df(p)\,(z - p)$. When can we solve f(z) = 0 in some neighborhood of p? Answer: Whenever we can solve $Df(p)\,(z - p) = 0$. This works because the existence of a solution depends on the geometry of the function.
When can we do it? $Df(p)\,(z - p) = 0$ is a linear system. We understand linear systems extremely well. We can solve for the variables $y_1, y_2, y_3$ in terms of $x_1$ and $x_2$ if and only if the sub-matrix of Df(p) made up of the partial derivatives with respect to $y_1, y_2, y_3$ is invertible.
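The invertibility condition can be read off by splitting the linear system into blocks. In the sketch below, the names $A_y$ (the 3 × 3 block of y-partials) and $A_x$ (the 3 × 2 block of x-partials) are notation introduced here for illustration, as are $\Delta y$ and $\Delta x$ for the y- and x-parts of z − p.

```latex
\[
Df(p)\,(z - p)
  = \begin{bmatrix} A_y & A_x \end{bmatrix}
    \begin{bmatrix} \Delta y \\ \Delta x \end{bmatrix}
  = A_y\,\Delta y + A_x\,\Delta x = 0
\quad\Longleftrightarrow\quad
A_y\,\Delta y = -A_x\,\Delta x .
\]
\[
\text{If } A_y \text{ is invertible, then } \Delta y = -A_y^{-1} A_x\,\Delta x ,
\]
```

so every choice of $\Delta x$ determines exactly one $\Delta y$. If $A_y$ is not invertible, some $\Delta x$ either gives no $\Delta y$ or gives more than one.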
A bit of notation To simplify things, we will write our vector valued function $F : \mathbb{R}^{n+m} \to \mathbb{R}^n$. We will write our “input” variables as concatenations of n-vectors and m-vectors, e.g. $(y,x) = (y_1, y_2, \dots, y_n, x_1, x_2, \dots, x_m)$. So when we solve F(y,x) = 0 we will be solving for the y-variables in terms of the x-variables.
The System We can solve for the variables $y_1, y_2, \dots, y_n$ in terms of $x_1, x_2, \dots, x_m$ if and only if the n × n sub-matrix of y-partials, $\left[\frac{\partial F_i}{\partial y_j}\right]_{i,j=1}^{n}$, is invertible.
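So the Condition We Need is the invertibility of the matrix $D = \left[\frac{\partial F_i}{\partial y_j}\right]_{i,j=1}^{n}$, evaluated at the known solution. We will refer to the inverse of the matrix D as $D^{-1}$.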
The Implicit Function Theorem Suppose $F : \mathbb{R}^{n+m} \to \mathbb{R}^n$ has continuous partials, and suppose $b \in \mathbb{R}^n$ and $a \in \mathbb{R}^m$ with F(b,a) = 0. Suppose also that the n × n matrix that corresponds to the y-partials of F at (b,a) (denoted by D) is invertible. Then “near” a there exists a unique function g(x) with g(a) = b such that F(g(x),x) = 0; moreover g(x) is continuous.
What Function Do We Iterate? The 2-variable case. Fix x. For this x, construct the function $\varphi_x(y) = y - \frac{f(x,y)}{D}$, where $D = \frac{\partial f}{\partial y}(a,b)$. Iterate $\varphi_x(y)$.
What Function Do We Iterate? Multi-variable parallel? The n-variable case. Fix x. For this x, construct the function $\varphi_x(y) = y - D^{-1} F(y,x)$, where D is the n × n matrix of y-partials of F at (b,a). Iterate $\varphi_x(y)$.
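The n-variable iteration can be tried on a small system. The sketch below assumes n = 2 and m = 1 with a made-up system F, a known solution (b,a) = ((1,1), 0), and hypothetical helper names (`inverse_2x2`, `phi`, `g`); none of these come from the slides. The matrix D is frozen at the known solution and its inverse is reused at every step.

```python
def F(y, x):
    # Hypothetical system of two equations in (y1, y2), with one parameter x.
    y1, y2 = y
    return (y1 * y1 + y2 * y2 - 2.0 - x, y1 - y2 + x)

b, a = (1.0, 1.0), 0.0   # known solution: F(b, a) = (0, 0)

# y-partials of F are [[2*y1, 2*y2], [1, -1]]; evaluated at b = (1, 1):
D = ((2.0, 2.0), (1.0, -1.0))

def inverse_2x2(M):
    # Explicit inverse of a 2x2 matrix (assumes a non-zero determinant).
    (p, q), (r, s) = M
    det = p * s - q * r
    return ((s / det, -q / det), (-r / det, p / det))

D_inv = inverse_2x2(D)

def phi(x, y):
    # One quasi-Newton step: y - D^{-1} F(y, x), with D frozen at (b, a).
    F1, F2 = F(y, x)
    return (y[0] - (D_inv[0][0] * F1 + D_inv[0][1] * F2),
            y[1] - (D_inv[1][0] * F1 + D_inv[1][1] * F2))

def g(x, steps=50):
    # Iterate phi(x, .) starting from the known y-value b.
    y = b
    for _ in range(steps):
        y = phi(x, y)
    return y

if __name__ == "__main__":
    y = g(0.1)
    print("g(0.1) ~", y, "residual:", F(y, 0.1))
```

The only change from the 2-variable case is that division by the number D becomes multiplication by the matrix $D^{-1}$.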
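[Figure: the ball of radius r around (b,a) in $\mathbb{R}^{n+m}$, drawn with a ball of radius t around a in $\mathbb{R}^m$ and a ball of radius r around b in $\mathbb{R}^n$; a point x near a is sent to g(x).] We want g continuous and unique, with F(g(x),x) = 0. Partials of F in the ball are all close to the partials at (b,a).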
Notation: Let dF(y,x) denote the n × n submatrix made up of all the y partial derivatives of F at (y,x). Step I: Since D is invertible, $\|D^{-1}\| \neq 0$. We choose r as follows: use continuity of the partials of F to choose r small enough so that for all $(y,x) \in B_r(b,a)$, $\|dF(y,x) - D\| < \frac{1}{2\|D^{-1}\|}$. Then the y-derivative of $\varphi_x(y) = y - D^{-1}F(y,x)$ satisfies $\|I - D^{-1}\,dF(y,x)\| = \|D^{-1}(D - dF(y,x))\| \le \|D^{-1}\|\,\|D - dF(y,x)\| < \tfrac12$. So, by the Multivariable Mean Value Theorem, $\|\varphi_x(y_1) - \varphi_x(y_2)\| \le \tfrac12\,\|y_1 - y_2\|$!
There are two (highly non-trivial) steps in the proof: In Step 1 we choose the radius r of the ball in $\mathbb{R}^{n+m}$ so that the partials in the ball are all “close” to the (known) partials at (b,a). This makes $\varphi_x$ contract distances on the ball, forcing convergence to a fixed point. (Uses continuity of the partial derivatives.) In Step 2 we choose the radius t around a “small” so as to guarantee that our iterates stay in the ball. In other words, our “initial guess” for the quasi-Newton’s method is good enough to guarantee convergence to the “correct” root. The two together guarantee that we can find a value g(x) which solves the equation. We then map x to g(x).
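The two steps can be carried out by hand on a small example. The following is a sketch under assumptions, not the slides' own computation: the equation $f(x,y) = x^2 + y^2 - 1$, the known solution (a,b) = (0,1), and the specific radii are illustrative choices, and closed intervals with non-strict inequalities are used for simplicity.

```latex
\begin{align*}
&\text{Setup:} && D = \frac{\partial f}{\partial y}(0,1) = 2,
   \qquad \varphi_x(y) = y - \frac{x^2 + y^2 - 1}{2}. \\[4pt]
&\text{Step 1:} && \varphi_x'(y) = 1 - y, \text{ so }
   |\varphi_x'(y)| \le \tfrac12 \iff |y - 1| \le \tfrac12 .
   \text{ Take } r = \tfrac12 . \\[4pt]
&\text{Step 2:} && |\varphi_x(1) - 1| = \frac{x^2}{2} \le \frac{r}{2} = \frac14
   \iff |x| \le \frac{1}{\sqrt2} .
   \text{ Take } t = \frac{1}{\sqrt2} . \\[4pt]
&\text{Together:} && \text{for } |x| \le t \text{ and } |y - 1| \le r: \\
& && |\varphi_x(y) - 1|
     \le |\varphi_x(y) - \varphi_x(1)| + |\varphi_x(1) - 1|
     \le \tfrac12\,|y - 1| + \tfrac{r}{2} \le r .
\end{align*}
```

So for each x with $|x| \le t$ the map $\varphi_x$ sends the interval $[\tfrac12, \tfrac32]$ into itself and contracts distances by a factor of ½, and its unique fixed point there is $g(x) = \sqrt{1 - x^2}$.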
Since this is the central idea of the proof, I will reiterate it: In Step 1 we make sure that iterated maps on our “pretty good” quasi-Newton function actually converge. In Step 2, by making sure that they don’t “leave the box,” we make sure the iterated maps converge to the fixed point we are aiming for. That is, they don’t march off to some other fixed point outside the region. [Figure: the box around (a,b) in the xy-plane, with a fixed x marked on the x-axis.]
Final thoughts The Implicit Function Theorem is a DEEP theorem of Mathematics. Fixed point methods show up in many other contexts. For instance, they underlie many numerical approximation techniques. They are also used in other theoretical contexts, such as in Picard’s Iteration proof of the fundamental theorem of differential equations. The iteration functions frequently take the form of a “quasi-Newton’s method.”