1. Machine learning, pattern recognition and statistical data modelling
Lecture 2: Data exploration
Coryn Bailer-Jones
2. Last week...
● supervised vs. unsupervised learning
● generalization and regularization
● regression vs. classification
● linear regression (fit via least squares)
  – assumes a global linear fit; stable (low variance) but biased
● k nearest neighbours
  – assumes a local constant fit; less stable (high variance) but less biased
● more complex models permit lower errors on training data
  – but we want models to generalize
  – need to control complexity / nonlinearity (regularization) ⇒ assume some degree of smoothness. But how much?
3. 2-class classification: k-nn and linear regression
Figure © Hastie, Tibshirani, Friedman (2001)
With enough training data, wouldn't k-nn be best?
4. The curse of dimensionality
● for p = 10, to capture 1% of the data we must cover 63% of the range of each input variable (95% for p = 100)
● as p increases
  – distance to neighbours increases
  – most neighbours are near the boundary
● to maintain density (i.e. properly sample the variance), the number of templates must increase as N^p
Data uniformly distributed in the unit hypercube: define a neighbour volume with edge length e (e < 1), so the neighbour volume is e^p, where p = no. of dimensions. To capture a fraction r of the unit data volume, e = r^(1/p).
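As a quick check on these numbers, here is a minimal R sketch (not from the lecture scripts) of the edge length e = r^(1/p) needed to capture a fraction r of data uniformly distributed in the unit hypercube:

  # edge length of a hypercube capturing a fraction r of data
  # uniformly distributed in the unit hypercube in p dimensions
  edge.length <- function(r, p) r^(1/p)

  edge.length(0.01, 10)    # ~0.63: 1% of the data needs 63% of each axis
  edge.length(0.01, 100)   # ~0.95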
5. Overcoming the curse
● Avoid it by dimensionality reduction
  – throw away less relevant inputs
  – combine inputs
  – use domain knowledge to select/define features
● Make assumptions about the data
  – structured regression
    ● this is essential: an infinite number of functions pass through a finite number of data points
  – complexity control
    ● e.g. smoothness in a local region
6. Data exploration
● density modelling
  – smoothing
● visualization
  – identify structure, esp. nonlinear
● dimensionality reduction
  – overcome 'the curse'
  – stabler, simpler, more easily understood models
  – identify relevant variables (or combinations thereof)
7. Density estimation (non-parametric)
8. Density estimation: histograms
Figure: Bishop (1995)
9. Kernel density estimation
The general estimate is p(x) ≈ K / (N V), where K = no. of points falling within the kernel volume, N = total no. of points, and V = the kernel volume. K() is a fixed kernel function with bandwidth h. The simple (Parzen) kernel is a hypercube of side h centred on x: p(x) = (1/(N h^p)) Σ_n H((x - x_n)/h), with H(u) = 1 if |u_j| ≤ 1/2 for every component j, and 0 otherwise.
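A minimal R sketch of a 1D Parzen-window estimate, assuming a box kernel of width h (the data and bandwidth below are illustrative, not from the lecture scripts):

  # 1D Parzen-window density estimate with a box kernel of bandwidth h:
  # count the fraction of the N points within h/2 of x, divided by the
  # window volume h, so the estimate integrates to 1
  parzen <- function(x, data, h) {
    sapply(x, function(x0) sum(abs(data - x0) < h/2) / (length(data) * h))
  }

  set.seed(1)
  d <- rnorm(200)
  parzen(c(-1, 0, 1), d, h = 0.5)   # density estimates at three points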
10. Gaussian kernel
Replacing the hypercube with a Gaussian of width h gives p(x) = (1/N) Σ_n (2π h^2)^(-p/2) exp(-||x - x_n||^2 / (2 h^2)), where the sum runs over the entire data set of N points (Bishop 1995).
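The same idea in a minimal 1D sketch: the estimate is the average of N Gaussians of width h, one centred on each data point (illustrative data and bandwidth):

  # 1D Gaussian-kernel density estimate of width h
  gauss.kde <- function(x, data, h) {
    sapply(x, function(x0) mean(dnorm(x0, mean = data, sd = h)))
  }

  set.seed(1)
  d <- rnorm(200)
  gauss.kde(c(-1, 0, 1), d, h = 0.3)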
11. K-NN density estimation
To overcome the fixed kernel size, vary the search volume V until it contains K neighbours; the estimate is again p(x) ≈ K / (N V), with K = no. of neighbours, N = total no. of points, and V = the volume occupied by the K neighbours (Bishop 1995).
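A minimal 1D sketch in R, assuming Euclidean distance so that V = 2 r_K, where r_K is the distance to the K-th nearest neighbour (illustrative data and K):

  # 1D k-NN density estimate: grow the interval around x until it holds K points
  knn.density <- function(x, data, K) {
    sapply(x, function(x0) {
      rK <- sort(abs(data - x0))[K]    # distance to the K-th nearest neighbour
      K / (length(data) * 2 * rK)      # p(x) ~ K / (N V), with V = 2 rK in 1D
    })
  }

  set.seed(1)
  d <- rnorm(200)
  knn.density(c(-1, 0, 1), d, K = 10)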
12. Histograms and 1D kernel density estimation
From MASS4 section 5.6. See the R scripts on the web.
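The original R scripts are on the course web page; the following is a minimal sketch along the lines of MASS section 5.6, using the geyser data shipped with MASS (the bin count and bandwidth are illustrative):

  library(MASS)                       # provides truehist() and the geyser data
  data(geyser)
  truehist(geyser$duration, nbins = 25, xlab = "duration")   # histogram on a density scale
  lines(density(geyser$duration, bw = 0.2))                  # overlay a Gaussian kernel density estimate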
13. 2D kernel density estimation
From MASS4 section 5.6. See the R scripts on the web.
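Again a minimal sketch in the spirit of MASS section 5.6, using kde2d on the geyser data (the grid size is illustrative):

  library(MASS)
  data(geyser)
  f <- kde2d(geyser$duration, geyser$waiting, n = 50)   # 2D Gaussian-kernel estimate on a 50 x 50 grid
  image(f, xlab = "duration", ylab = "waiting")         # density shown as an image
  contour(f, add = TRUE)                                # with contours overlaid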
14. Classification via (parametric) density modelling
15. Maximum likelihood estimate of parameters
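For a Gaussian, the maximum-likelihood estimates are the sample mean, (1/N) Σ_n x_n, and the sample covariance with 1/N (rather than 1/(N-1)) normalization. A minimal R sketch on simulated data (the data are illustrative, not from the lecture scripts):

  # ML estimates of a multivariate Gaussian from an N x p data matrix
  X <- cbind(rnorm(500, 0, 0.5), rnorm(500, 0, 0.5))   # simulated 2D data
  mu.hat    <- colMeans(X)                             # ML mean
  Sigma.hat <- cov(X) * (nrow(X) - 1) / nrow(X)        # cov() uses 1/(N-1); rescale to the 1/N ML form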
16. Example: modelling the PDF with two Gaussians
class 1: μ = (0.0, 0.0), σ = (0.5, 0.5)
class 2: μ = (1.0, 1.0), σ = (0.7, 0.3)
See the R scripts on the web page.
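The original R scripts are on the course web page; the sketch below is my own minimal version, reading the slide's σ values as per-axis standard deviations (diagonal covariances) and assuming equal class priors:

  # class-conditional Gaussian densities with diagonal covariances;
  # assign x to the class with the larger density (equal priors assumed)
  dgauss.diag <- function(x, mu, sigma) prod(dnorm(x, mean = mu, sd = sigma))

  classify <- function(x) {
    p1 <- dgauss.diag(x, mu = c(0, 0), sigma = c(0.5, 0.5))
    p2 <- dgauss.diag(x, mu = c(1, 1), sigma = c(0.7, 0.3))
    if (p1 > p2) 1 else 2
  }

  classify(c(0.6, 0.6))   # a point lying between the two class centres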
17. Capturing variance: Principal Components Analysis (PCA)
18. Principal Components Analysis
For a given data vector, minimizing the squared error of its reconstruction from the projection is equivalent to maximizing the variance captured by the projection.
19. Principal Components Analysis: the equations
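The equations themselves are on the slide images; the key result (restated on the summary slide) is that the principal components are the orthonormal eigenvectors of the data covariance matrix, ordered by eigenvalue. A minimal R sketch on a stand-in data matrix:

  # principal components as eigenvectors of the sample covariance matrix
  X   <- matrix(rnorm(100 * 10), ncol = 10)       # stand-in N x p data matrix
  Xc  <- scale(X, center = TRUE, scale = FALSE)   # mean-centre the data
  eig <- eigen(cov(Xc))                           # eigenvectors = PCs, eigenvalues = their variances
  A   <- Xc %*% eig$vectors                       # admixture coefficients (projections onto the PCs)

  # the same decomposition via the built-in routine
  pca <- prcomp(X)   # pca$rotation matches eig$vectors (up to sign), pca$x matches A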
20. PCA example: MHD stellar spectra
● N = 5144 optical spectra
● 380–520 nm in p = 820 bins
● area normalized
● show variance in spectral type (SpT)
(Bailer-Jones et al. 1998)
21. MHD stellar spectra: average spectrum
22. MHD stellar spectra: first 20 eigenvectors
23. MHD stellar spectra: admixture coefficients vs. SpT
24. MHD stellar spectra
25. PCA reduced reconstruction
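A reduced reconstruction keeps only the first R admixture coefficients and adds the mean back. A minimal sketch with prcomp on a stand-in data matrix (R = 2 is illustrative, not a value from the slides):

  X    <- matrix(rnorm(100 * 10), ncol = 10)    # stand-in N x p data matrix
  pca  <- prcomp(X)
  R    <- 2                                     # number of PCs retained
  Xhat <- pca$x[, 1:R] %*% t(pca$rotation[, 1:R])
  Xhat <- sweep(Xhat, 2, pca$center, "+")       # add the mean vector back to each row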
26. Reconstruction quality for the MHD spectra
The shape of the curve also depends on the signal-to-noise level.
27. Reconstruction of an M star
Key: R = no. of PCs used; E = normalized reconstruction error.
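The exact definition used on the slide is in the figure; one common choice of normalized reconstruction error is the residual norm relative to the norm of the original spectrum, e.g.:

  # normalized reconstruction error: ||x - xhat|| / ||x||
  recon.error <- function(x, xhat) sqrt(sum((x - xhat)^2) / sum(x^2))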
28. PCA: Explanation is not discrimination
PCA has no class information, so it cannot provide optimal discrimination.
29. PCA summary
● Linear projection of the data which captures and orders variance
  – the PCs are linear combinations of the data which are uncorrelated and of highest variance
  – equivalent to a rotation of the coordinate system
● Data compression via a reduced reconstruction
● New data can be projected onto the PCs (see the sketch after this list)
● The reduced reconstruction acts as a filter
  – removes rare features (low variance measured across the whole data set)
  – poorly reconstructs non-typical objects
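As a sketch of projecting new data onto PCs learned from a training set (stand-in data, using prcomp and its predict method):

  X    <- matrix(rnorm(100 * 10), ncol = 10)   # training data (N x p)
  pca  <- prcomp(X)
  Xnew <- matrix(rnorm(5 * 10), ncol = 10)     # new data in the same p dimensions
  Anew <- predict(pca, newdata = Xnew)         # admixture coefficients of the new data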
30. PCA as filter
Figure: the original spectrum, its reconstructed spectrum (R = 25, E = 5.4%), and the residual.
31. PCA
● What happens if there are fewer vectors than dimensions, i.e. N < p?
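One way to see the answer empirically (not stated on the slide): the sample covariance matrix of N mean-centred vectors has rank at most N - 1, so at most N - 1 principal components carry nonzero variance. For example:

  # N = 5 vectors in p = 20 dimensions
  X  <- matrix(rnorm(5 * 20), nrow = 5)
  ev <- eigen(cov(X))$values
  sum(ev > 1e-10)    # 4 nonzero eigenvalues, i.e. N - 1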
32. Summary
● curse of dimensionality
● density estimation
  – non-parametric: histograms, kernel method, k-nn
    ● trade-off between number of neighbours and volume size
  – parametric: Gaussian; fitting via maximum likelihood
● Principal Components Analysis
  – Principal Components
    ● are the eigenvectors of the covariance matrix
    ● are orthonormal
    ● form an ordered set describing the directions of maximum variance
  – reduced reconstruction: data compression
  – a linear transformation (coordinate rotation)