A Hilbert space $H$ is a real or complex inner product space that is also a complete metric space with respect to the distance function induced by the inner product. To say that $H$ is a complex inner product space means that $H$ is a complex vector space on which there is an inner product $\langle x, y \rangle$ associating a complex number to each pair of elements $x, y$ of $H$ and satisfying the following properties:
- $\langle y, x \rangle$ is the complex conjugate of $\langle x, y \rangle$: $\langle y, x \rangle = \overline{\langle x, y \rangle}$.
- $\langle x, y \rangle$ is linear in its first argument: for all complex numbers $a$ and $b$, $\langle a x_1 + b x_2, y \rangle = a \langle x_1, y \rangle + b \langle x_2, y \rangle$.
- The inner product is positive definite: $\langle x, x \rangle \ge 0$, where the case of equality holds precisely when $x = 0$.
The norm defined by the inner product is the real-valued function $\|x\| = \sqrt{\langle x, x \rangle}$, and the distance between two points $x, y$ in $H$ is defined in terms of the norm by $d(x, y) = \|x - y\|$. That this function is a distance function means that it is symmetric in $x$ and $y$, that the distance between $x$ and itself is zero (and otherwise the distance between $x$ and $y$ is positive), and that the triangle inequality holds, meaning that the length of one leg of a triangle $xyz$ cannot exceed the sum of the lengths of the other two legs: $d(x, z) \le d(x, y) + d(y, z)$. This last property is ultimately a consequence of the more fundamental Cauchy–Schwarz inequality, which asserts $|\langle x, y \rangle| \le \|x\| \, \|y\|$, with equality if and only if $x$ and $y$ are parallel.
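As a concrete illustration, consider $\mathbb{C}^n$ with the standard inner product, the simplest complex Hilbert space. A minimal numpy sketch checking conjugate symmetry and the Cauchy–Schwarz inequality (the test vectors are arbitrary random data):

```python
import numpy as np

# C^n with <x, y> = sum_i x_i * conj(y_i) is the simplest complex Hilbert space.
rng = np.random.default_rng(0)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
y = rng.standard_normal(5) + 1j * rng.standard_normal(5)

inner = lambda a, b: np.sum(a * np.conj(b))      # linear in the first argument
norm = lambda a: np.sqrt(inner(a, a).real)

# Conjugate symmetry: <y, x> = conj(<x, y>)
print(np.isclose(inner(y, x), np.conj(inner(x, y))))
# Cauchy-Schwarz: |<x, y>| <= ||x|| ||y||
print(abs(inner(x, y)) <= norm(x) * norm(y))
```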
A problem is well-posed if its solution:
- exists,
- is unique,
- depends continuously on the data (i.e., it is stable).
A problem is ill-posed if it is not well-posed. In the context of this class, well-posedness is mainly used to mean stability of the solution.
Eidetic generalization: the process of imagining possible cases rather than observing actual ones. Eidos: properties, kinds, or types of ideal species that entities may exemplify. Eidetic variation: the possible changes an individual can undergo while remaining an instance of a given type of essence.
Popper claimed that empirical data alone are not sufficient for obtaining any pattern: in addition to empirical data, one needs some conceptual data expressing prior knowledge about properties of the desired function. In the 1990s, Poggio and Girosi proposed modifying the empirical error functional by adding a penalty term, minimizing $E_z(f) + \gamma \Psi(f)$ instead of $E_z(f)$ alone, where $\Psi$ is a functional expressing some global property (such as smoothness) of the function $f$.
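A minimal numpy sketch of this idea (the grid, the noise level, the squared-second-difference smoothness functional, and the value of $\gamma$ are illustrative assumptions, not the slides' specific choices):

```python
import numpy as np

# Minimize E_z(f) + gamma * Psi(f) over fitted values f on a grid, where
# Psi(f) = ||D2 f||^2 penalizes large second differences (non-smoothness).
n = 50
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.3 * np.random.default_rng(1).standard_normal(n)

D2 = np.diff(np.eye(n), n=2, axis=0)   # (n-2) x n second-difference matrix
gamma = 10.0

# Setting the gradient to zero gives the linear system (I + gamma D2^T D2) f = y.
f = np.linalg.solve(np.eye(n) + gamma * D2.T @ D2, y)   # smooth fit to noisy y
```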
For such an operator $A : X \to Y$ between two Hilbert spaces, an inverse problem determined by $A$ is the task of finding, for $g \in Y$ (called data), some $f \in X$ (called a solution) such that $A(f) = g$. When $X$ and $Y$ are finite dimensional, linear operators can be represented by matrices; when they are infinite dimensional, typical operators are integral ones, such as Fredholm integral equations of the first kind, $\int_a^b K(x, t) f(t)\, dt = g(x)$, and of the second kind, $f(x) - \lambda \int_a^b K(x, t) f(t)\, dt = g(x)$.
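To make the connection concrete, a first-kind Fredholm equation can be discretized on a grid, turning it into a matrix equation $g = Af$; a sketch with an assumed Gaussian smoothing kernel:

```python
import numpy as np

# Discretize the first-kind equation g(x) = \int_0^1 K(x, t) f(t) dt on an
# n-point grid, giving g = A f with A[i, j] = K(x_i, t_j) * dt.
n = 100
t = np.linspace(0, 1, n)
dt = t[1] - t[0]
A = np.exp(-(t[:, None] - t[None, :])**2 / (2 * 0.05**2)) * dt  # smoothing kernel

f_true = np.sin(3 * np.pi * t)
g = A @ f_true                 # direct problem: stable
print(np.linalg.cond(A))       # huge: the inverse problem (f from g) is ill-posed
```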
Hadamard introduced the definition of ill-posedness; ill-posed problems are typically inverse problems. As an example, assume $g$ is a function in $Y$ and $u$ is a function in $X$, with $Y$ and $X$ Hilbert spaces. Given a linear, continuous operator $L$, consider the equation $g = Lu$. The direct problem is to compute $g$ given $u$; the inverse problem is to compute $u$ given the data $g$. In the learning case, $L$ is somewhat similar to a "sampling" operation, and the inverse problem becomes the problem of finding a function that takes the values $f(x_i) = y_i$, $i = 1, \ldots, n$. The inverse problem of finding $u$ is well-posed when the solution exists, is unique, and is stable, that is, depends continuously on the initial data $g$.
When there is no solution: for an operator $A : X \to Y$, let $R(A) = \{g \in Y \mid (\exists f \in X)(A(f) = g)\}$ denote its range and $\pi_{\mathrm{cl}\,R(A)} : Y \to \mathrm{cl}\,R(A)$ the projection of $Y$ onto the closure of $R(A)$ in $Y$. Every continuous operator $A$ between two Hilbert spaces has an adjoint $A^*$ satisfying, for all $f \in X$ and all $g \in Y$, $\langle A(f), g \rangle_Y = \langle f, A^*(g) \rangle_X$.
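In the finite-dimensional real case the adjoint is just the transpose; a quick numerical check of the adjoint identity with random test data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # operator A : R^3 -> R^4; adjoint A* = A^T
f = rng.standard_normal(3)
g = rng.standard_normal(4)

# <A(f), g>_Y = <f, A*(g)>_X
print(np.isclose((A @ f) @ g, f @ (A.T @ g)))   # True
```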
If the range of $A$ is closed, then there exists a unique continuous linear pseudoinverse operator $A^+ : Y \to X$ such that for every $g \in Y$: $A A^+(g) = \pi_{\mathrm{cl}\,R(A)}(g)$, and $A^+ = (A^* A)^+ A^* = A^* (A A^*)^+$.
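These identities can be checked numerically with numpy's Moore–Penrose pseudoinverse; a sketch with a random full-column-rank matrix, so the range is closed:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # full column rank, so the range is closed
A_pinv = np.linalg.pinv(A)

# Moore-Penrose identities: A+ = (A* A)+ A* = A* (A A*)+
print(np.allclose(A_pinv, np.linalg.pinv(A.T @ A) @ A.T))   # True
print(np.allclose(A_pinv, A.T @ np.linalg.pinv(A @ A.T)))   # True

# A A+ is the orthogonal projection onto the range of A (symmetric, idempotent).
P = A @ A_pinv
print(np.allclose(P, P.T), np.allclose(P @ P, P))           # True True
```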
To solve more general least-squares problems, Moore–Penrose pseudoinverses are defined for all continuous linear operators $A : X \to Y$ between two Hilbert spaces $X$ and $Y$. Not every continuous linear operator has a continuous linear pseudoinverse, though; just the ones whose range is closed in $Y$. If the range is not closed, then $A^+$ is only defined for those $g \in Y$ for which $\pi_{\mathrm{cl}\,R(A)}(g) \in R(A)$.
Using the pseudoinverse and a matrix norm, one can define a condition number for any matrix: $\mathrm{cond}(A) = \|A\| \, \|A^+\|$. A large condition number implies that the problem of finding least-squares solutions to the corresponding system of linear equations is ill-conditioned, in the sense that small errors in the entries of $A$ can lead to huge errors in the entries of the solution.
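A small numpy demonstration of ill-conditioning (the matrix and the perturbation are illustrative):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])     # nearly parallel rows
print(np.linalg.cond(A))          # ~4e4: large condition number

b = np.array([2.0, 2.0001])
print(np.linalg.solve(A, b))                            # [1. 1.]
# A perturbation of size 1e-4 in the data changes the solution by order 1:
print(np.linalg.solve(A, b + np.array([0.0, 1e-4])))    # [0. 2.]
```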
Regularization is a method of improving the stability of solutions of ill-conditioned inverse problems. The basic idea in the treatment of ill-conditioned problems is to use some a priori knowledge about solutions to disqualify meaningless ones. Such knowledge can be:
- some regularity condition on the solution, expressed as the existence of derivatives up to a certain order, with bounds on the magnitudes of these derivatives, or
- some localization condition, such as a bound on the support of the solution or on its behavior at infinity.
Tikhonov's regularization penalizes undesired solutions by adding a term called a stabilizer.
$\Psi$ is a functional called a stabilizer. The regularization parameter $\gamma$ plays the role of a trade-off between the least-squares solution and the penalization expressed by $\Psi$. A typical choice of stabilizer is the square of the norm on $X$, for which the original problem is replaced with the minimization of the functional $\|A(f) - g\|^2 + \gamma \|f\|^2$.
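In the finite-dimensional case this minimization has the closed-form solution $f_\gamma = (A^*A + \gamma I)^{-1} A^* g$; a minimal sketch with random test data and an illustrative $\gamma$:

```python
import numpy as np

def tikhonov(A, g, gamma):
    """Minimizer of ||A f - g||^2 + gamma ||f||^2: f = (A^T A + gamma I)^{-1} A^T g."""
    return np.linalg.solve(A.T @ A + gamma * np.eye(A.shape[1]), A.T @ g)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
g = rng.standard_normal(20)
f_gamma = tikhonov(A, g, gamma=0.1)
```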
For this stabilizer, regularized solutions always exist, unlike pseudosolutions, which in the infinite-dimensional case do not exist for those data $g$ for which $\pi_{\mathrm{cl}\,R(A)}(g) \notin R(A)$. For every continuous operator $A : X \to Y$ between two Hilbert spaces and for every $\gamma > 0$, there exists a unique operator $A_\gamma : Y \to X$ mapping each $g$ to the unique minimizer of $\|A(f) - g\|^2 + \gamma \|f\|^2$.
Even when the original inverse problem does not have a unique solution, for every $\gamma > 0$ the regularized problem has a unique solution, due to the uniform convexity of the functional. As $\gamma$ decreases to zero, the solutions $A_\gamma(g)$ of the regularized problems converge to the normal pseudosolution $A^+(g)$.
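This convergence is easy to observe numerically (a sketch with random test data; the sequence of $\gamma$ values is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
g = rng.standard_normal(8)

f_plus = np.linalg.pinv(A) @ g    # normal pseudosolution A+(g)
for gamma in [1.0, 1e-2, 1e-4, 1e-6]:
    f_gamma = np.linalg.solve(A.T @ A + gamma * np.eye(5), A.T @ g)
    print(gamma, np.linalg.norm(f_gamma - f_plus))   # distance shrinks with gamma
```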
Examples of stabilizers: localization conditions and smoothness conditions.
Learning of neural networks from examples is also an inverse problem: for a given training set, find an unknown input-output function. The operator performs the evaluations of an input-output function at the input data from the training set.
For a training set $z = (u, v)$ with $u = (u_1, \ldots, u_m)$ and output data vector $v = (v_1, \ldots, v_m)$, the empirical error functional can be represented as $E_z(f) = \frac{1}{m}\|L_u(f) - v\|^2$, where $L_u(f) = (f(u_1), \ldots, f(u_m))$. So minimization of the empirical error functional is an inverse problem: finding a pseudosolution of the inverse problem defined by $L_u$ with data $v$ is equivalent to the minimization of the empirical error functional $E_z$ over $X$.
To take advantage of characterizations of pseudosolutions and regularized solutions from the theory of inverse problems, solutions of the inverse problem defined by the operator $L_u$ should be searched for in suitable Hilbert spaces on which all evaluation functionals are continuous, and whose norms can express some undesired properties of input-output functions.
A reproducing kernel Hilbert space (RKHS) is a Hilbert space of pointwise defined real-valued functions on a nonempty set $\Omega$ such that all evaluation functionals are continuous; i.e., for every $x \in \Omega$, the evaluation functional $F_x$, defined for every $f \in X$ as $F_x(f) = f(x)$, is continuous (bounded).
Every RKHS is uniquely determined by a symmetric positive semidefinite kernel $K : \Omega \times \Omega \to \mathbb{R}$, i.e., a symmetric function of two variables satisfying, for all $m$, all $(w_1, \ldots, w_m) \in \mathbb{R}^m$, and all $(x_1, \ldots, x_m) \in \Omega^m$, $\sum_{i=1}^m \sum_{j=1}^m w_i w_j K(x_i, x_j) \ge 0$. $K$ is symmetric: $K(x, y) = K(y, x)$; $K$ is positive semidefinite: the above sum is nonnegative.
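Both properties can be checked numerically on a Gram matrix; a sketch using the Gaussian kernel (the kernel width and sample points are assumptions):

```python
import numpy as np

def gaussian_kernel(x, y, width=0.2):
    return np.exp(-(x - y)**2 / (2 * width**2))

x = np.random.default_rng(0).uniform(0, 1, 10)
K = gaussian_kernel(x[:, None], x[None, :])       # Gram matrix

print(np.allclose(K, K.T))                        # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-8)       # PSD, up to rounding error
```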
As all evaluation functionals on every RKHS $H_K(\Omega)$ are continuous, for every sample of input data $u = (u_1, \ldots, u_m)$ the evaluation operator $L_u : H_K(\Omega) \to \mathbb{R}^m$, $L_u(f) = (f(u_1), \ldots, f(u_m))$, is continuous. Moreover, its range is closed because it is finite dimensional. So one can apply results from the theory of inverse problems.
The pseudosolution is $f^+ = \sum_{i=1}^m c_i K_{u_i}$ with coefficients $c = K[u]^+ v$, where $K_{u_i} = K(u_i, \cdot)$ is the representer of $u_i$ and $K[u]$ is the Gram matrix of the kernel $K$ with respect to the vector $u$, defined as $K[u]_{i,j} = K(u_i, u_j)$. $f^+$ minimizes the empirical error; it can be interpreted as the input-output function of a neural network with one hidden layer of kernel units and a single linear output unit.
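A minimal numpy sketch of this kernel network (the Gaussian kernel, its width, and the synthetic training data are assumptions):

```python
import numpy as np

def gaussian_kernel(x, y, width=0.1):
    return np.exp(-(x - y)**2 / (2 * width**2))

u = np.linspace(0, 1, 10)                        # input data
v = np.sin(2 * np.pi * u)                        # output data
K_u = gaussian_kernel(u[:, None], u[None, :])    # Gram matrix K[u]
c = np.linalg.pinv(K_u) @ v                      # coefficients c = K[u]+ v

def f_plus(x):
    """f+(x) = sum_i c_i K(x, u_i): kernel units in the hidden layer, linear output."""
    return gaussian_kernel(np.asarray(x)[:, None], u[None, :]) @ c

print(np.allclose(f_plus(u), v, atol=1e-6))      # f+ fits the training data
```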
$f^+$ and $f_\gamma$, minimizing $E_z$ and $E_z + \gamma \|\cdot\|_K^2$, respectively, are both linear combinations of the representers $K_{u_1}, \ldots, K_{u_m}$ of the input data $u_1, \ldots, u_m$, but the coefficients of the two linear combinations are different.
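A sketch comparing the two coefficient vectors; for $f_\gamma$, the standard kernel ridge formula $c_\gamma = (K[u] + \gamma m I)^{-1} v$ is assumed, which corresponds to the scaling $E_z(f) = \frac{1}{m}\sum_i (f(u_i) - v_i)^2$:

```python
import numpy as np

def gaussian_kernel(x, y, width=0.1):
    return np.exp(-(x - y)**2 / (2 * width**2))

u = np.linspace(0, 1, 10)
v = np.sin(2 * np.pi * u)
m = len(u)
K_u = gaussian_kernel(u[:, None], u[None, :])

c_plus = np.linalg.pinv(K_u) @ v                           # coefficients of f+
c_gamma = np.linalg.solve(K_u + 1e-3 * m * np.eye(m), v)   # coefficients of f_gamma
print(np.linalg.norm(c_plus - c_gamma))                    # nonzero: they differ
```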
When $K$ is positive definite, the row vectors of the matrix $K[u]$ are linearly independent. But when the distances between the data $u_1, \ldots, u_m$ are small, the row vectors might be nearly parallel, and the small eigenvalues of $K[u]$ might cluster near zero; then small changes of $v$ can cause large changes of $f^+$.
Two cases can occur:
- the matrix can be rank-deficient: it has a cluster of small eigenvalues and a gap between the large and small eigenvalues;
- the matrix can represent a discrete ill-posed problem: its eigenvalues gradually decay to zero without any gap in the spectrum.
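A sketch showing the gradual eigenvalue decay for closely spaced data and a Gaussian kernel (the width and spacing are assumptions):

```python
import numpy as np

def gaussian_kernel(x, y, width=0.3):
    return np.exp(-(x - y)**2 / (2 * width**2))

u = np.linspace(0, 1, 20)                        # closely spaced data
K_u = gaussian_kernel(u[:, None], u[None, :])
print(np.sort(np.linalg.eigvalsh(K_u))[::-1])    # eigenvalues decay toward zero
```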
1. Linear separation simplifies classification. In some cases, even data that are not linearly separable can be transformed into linearly separable ones.
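A classic illustration (concentric circles, an assumed example rather than the slides' own): adding a squared-radius feature makes the two classes separable by a hyperplane.

```python
import numpy as np

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 100)
r = np.where(np.arange(100) < 50, 1.0, 3.0)         # class 0: radius 1; class 1: radius 3
X = np.c_[r * np.cos(angles), r * np.sin(angles)]   # not linearly separable in R^2

phi = np.c_[X, (X**2).sum(axis=1)]            # feature map (x1, x2, x1^2 + x2^2)
labels = np.arange(100) >= 50
print(np.all((phi[:, 2] > 4.0) == labels))    # True: a plane in R^3 separates them
```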
2. Stabilizers of the form $\Psi(f) = \int \frac{|\hat{f}(\omega)|^2}{\hat{k}(\omega)}\, d\omega$ are special cases of squares of norms on RKHSs generated by convolution kernels: for kernels of the form $K(x, y) = k(x - y)$ with a positive Fourier transform $\hat{k}$, the value of the stabilizer at any $f$ is expressed as $\|f\|_K^2 = \int \frac{|\hat{f}(\omega)|^2}{\hat{k}(\omega)}\, d\omega$. The Gaussian kernel is an example of a convolution kernel with a positive Fourier transform.
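As a worked instance (standard Fourier analysis, not from the slides; the convention $\hat{k}(\omega) = \int e^{-i\omega x} k(x)\, dx$ is assumed): for the one-dimensional Gaussian kernel $k(x) = e^{-x^2/(2\sigma^2)}$,
$$\hat{k}(\omega) = \sigma\sqrt{2\pi}\, e^{-\sigma^2 \omega^2 / 2} > 0, \qquad \Psi(f) = \frac{1}{\sigma\sqrt{2\pi}} \int |\hat{f}(\omega)|^2\, e^{\sigma^2 \omega^2 / 2}\, d\omega,$$
so high-frequency (non-smooth) components of $f$ are penalized exponentially, which is exactly the sense in which this stabilizer enforces smoothness.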
3. Minimization of the empirical error functional can be reformulated as an inverse problem. In an RKHS, all evaluation functionals are continuous, which is necessary for applying the tools from the theory of inverse problems.