Published by Audra Turner. Modified over 8 years ago.
1
Columbia University
Advanced Machine Learning & Perception – Fall 2006
Term Project
Nonlinear Dimensionality Reduction and K-Nearest Neighbor Classification Applied to Global Climate Data
Carlos Henrique Ribeiro Lima
New York – Dec/2006
2
Outline
1. Goals
2. Motivation and Dataset
3. Methodology
4. Results
   1. Low-Dimensional Manifold
   2. KNN on Low-Dimensional Manifold
5. Conclusion
3
1. Goals
1. Use kernel PCA based on Semidefinite Embedding (SDE) to identify the low-dimensional, nonlinear manifold of climate datasets, and from it the main modes of spatial variability;
2. Classify in the feature space to make predictions in the original space (KNN method).
4
2. Motivation
Dataset of monthly Sea Surface Temperature (SST)
Extreme El Niño events (e.g. 1997) have huge economic and social impacts
Hence the need for forecasting models!
5
2. Dataset
Monthly Sea Surface Temperature (SST) data from Jan/1856 to Dec/2005
1. Latitudinal band: 25°S–25°N;
2. Grid with 599 cells;
3. Training set: Jan/1856 to Dec/1975 = 120 years;
4. Testing set: Jan/1976 to Dec/2005 = 30 years;
5. Input matrix (training set): n = 1440 points (120 years × 12 months), m = 599 dimensions (one per grid cell).
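A minimal sketch of how the training/testing split and the n = 1440 count follow from the monthly record, assuming the SST field is held as a NumPy array with one row per month (Jan/1856 first) and one column per grid cell. The array layout, the constant names and the function `split_train_test` are hypothetical, and the random numbers are placeholders, not the real SST data:

```python
import numpy as np

# Assumed layout: rows = months from Jan/1856 to Dec/2005, columns = 599 grid cells.
N_CELLS = 599
FIRST_YEAR, LAST_YEAR = 1856, 2005
TRAIN_END_YEAR = 1975  # last training year, inclusive


def split_train_test(sst: np.ndarray):
    """Split a (months x cells) SST matrix into training and testing sets."""
    n_months = (LAST_YEAR - FIRST_YEAR + 1) * 12  # 150 years * 12 = 1800
    if sst.shape != (n_months, N_CELLS):
        raise ValueError(f"expected shape {(n_months, N_CELLS)}, got {sst.shape}")
    n_train = (TRAIN_END_YEAR - FIRST_YEAR + 1) * 12  # 120 years * 12 = 1440
    return sst[:n_train], sst[n_train:]


# Placeholder data only: random values standing in for the real SST record.
rng = np.random.default_rng(0)
sst = rng.standard_normal(((LAST_YEAR - FIRST_YEAR + 1) * 12, N_CELLS))
train, test = split_train_test(sst)
print(train.shape, test.shape)  # (1440, 599) (360, 599)

# Same-calendar-month subset (row 2 = March/1856, then every 12th row).
march_train = train[2::12]  # 120 March states, one per training year
```

Selecting one calendar month, as in the last line, gives the 120 candidate years that a month-by-month KNN search would compare against.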
6
3. Methodology
1) Semidefinite Embedding (code from K. Q. Weinberger), which learns a kernel (Gram) matrix subject to:
   - positive semidefiniteness;
   - centering: the embedded points are centered on the origin;
   - isometry: local distances in the input space are preserved in the feature space.
2) KNN with Euclidean distance.
3) Probabilistic forecasting, evaluated with a skill score based on the Ranked Probability Score (RPS).
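For reference, the three SDE constraints listed above can be written as a semidefinite program over the Gram matrix K of the embedded points. This is a sketch of the standard SDE formulation, not taken from the slides: the objective (maximizing the trace, i.e. the total variance of the embedding) and the notation are assumptions here. Each x_i is a 599-dimensional monthly SST map, and η_ij = 1 when x_i and x_j are local neighbors on a k-nearest-neighbor graph of the inputs:

```latex
\begin{aligned}
\max_{K}\quad & \operatorname{tr}(K) \\
\text{s.t.}\quad & K \succeq 0 && \text{(positive semidefiniteness)} \\
& \textstyle\sum_{i,j} K_{ij} = 0 && \text{(centering on the origin)} \\
& K_{ii} - 2K_{ij} + K_{jj} = \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2 \quad \forall\, i,j : \eta_{ij} = 1 && \text{(local isometry)}
\end{aligned}
```

The low-dimensional coordinates are then read off the leading eigenvectors of the optimal K, which is the kernel-PCA step.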
7
4. Results: Low-Dimensional Manifold
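The manifold coordinates come from the kernel-PCA step: eigendecompose the learned Gram matrix and scale the leading eigenvectors by the square roots of their eigenvalues. Below is a minimal NumPy sketch, assuming the SDE Gram matrix K has already been computed (for example by Weinberger's code); the function name `embed_from_kernel` is hypothetical, and the toy K is a plain linear kernel of centered 3-D points, not an SDE output:

```python
import numpy as np


def embed_from_kernel(K: np.ndarray, n_components: int):
    """Kernel-PCA step: low-dimensional coordinates from a centered PSD Gram matrix.

    Returns (Y, explained): Y is (n, n_components); `explained` is the fraction
    of the total variance (the trace of K) carried by each retained component.
    """
    eigvals, eigvecs = np.linalg.eigh(K)  # symmetric K: eigenvalues ascending
    order = np.argsort(eigvals)[::-1]  # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative round-off
    Y = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    explained = eigvals[:n_components] / eigvals.sum()
    return Y, explained


# Toy check: a linear kernel of centered data stands in for the learned K.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3)) * np.array([5.0, 2.0, 0.5])
X -= X.mean(axis=0)  # centering makes the entries of K sum to ~0
K = X @ X.T
Y, explained = embed_from_kernel(K, n_components=3)
print(explained.sum())  # close to 1.0: this toy K has rank 3
```

Because this toy K has rank 3, the three components reproduce all pairwise distances of X. On a real SDE kernel, `explained[:3].sum()` is one natural reading of the "~90% of explained variance" reported in the conclusions (an interpretation, not confirmed by the slides).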
8
4. Results: Labeling in the Feature Space
9
4. Results: Forecasts on the Testing Set
KNN method and skill score, e.g. March 1997:
1) Goal: predict the class of the Niño3 index in Dec/1997 (lead time = 9 months);
2) Apply KNN in the feature space, using the March states from 1856 to 1975;
3) Take the classes and weights of the k nearest neighbors;
4) Compute the skill score.
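Steps 2 and 3 above can be sketched as a weighted KNN vote that turns the classes of the k nearest training years into a probability forecast. This is a minimal sketch under stated assumptions: classes are integers 0..C-1, `train_classes[i]` is the December Niño3 class that followed the i-th training March, and the weights w_j ∝ 1/j for the j-th nearest neighbor are a hypothetical choice (one common rank-based kernel), since the slides do not say which weights were used. The function name `knn_class_probabilities` is also hypothetical:

```python
import numpy as np


def knn_class_probabilities(query, train_points, train_classes, k, n_classes):
    """Probabilistic KNN forecast in the feature space.

    query:         (d,) embedded state of the forecast month (e.g. March/1997)
    train_points:  (N, d) embedded states of the same calendar month, 1856-1975
    train_classes: (N,) integer class of the target (e.g. December Niño3) per year
    Returns a length-n_classes probability vector.
    """
    dists = np.linalg.norm(train_points - query, axis=1)  # Euclidean distance
    nearest = np.argsort(dists)[:k]  # indices of the k closest years
    # Hypothetical rank-based weights: w_j proportional to 1/j, normalized to sum to 1.
    weights = 1.0 / np.arange(1, k + 1)
    weights /= weights.sum()
    probs = np.zeros(n_classes)
    np.add.at(probs, train_classes[nearest], weights)  # accumulates repeated classes
    return probs


# Toy example in a 2-D feature space (made-up points, not results from the slides).
train_points = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 0.5]])
train_classes = np.array([2, 1, 0, 2])
probs = knn_class_probabilities(np.array([0.1, 0.1]), train_points, train_classes,
                                k=3, n_classes=3)
```

Here the two nearest neighbors both belong to class 2, so that class receives weight 6/11 + 3/11 = 9/11 and class 1 receives the remaining 2/11. `np.add.at` is used instead of fancy-index assignment so that repeated class indices accumulate rather than overwrite.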
10
4. Results: Forecasts on the Testing Set
KNN method and skill score for the El Niño events of 1982 and 1997
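The skill score compares the probability forecast with what was observed over ordered classes. A minimal sketch, assuming the unnormalized Ranked Probability Score (sum of squared differences between cumulative forecast and cumulative observed probabilities) and a skill score relative to a reference forecast. The three ordered classes (e.g. La Niña / neutral / El Niño) and the uniform climatological reference of 1/3 each are assumptions, since the slides define neither; the function names are hypothetical:

```python
import numpy as np


def rps(probs, observed_class):
    """Ranked Probability Score of one forecast over ordered classes (0 is perfect).

    Unnormalized form: sum over classes of (F_m - O_m)^2, where F and O are the
    cumulative forecast and observed probabilities.
    """
    probs = np.asarray(probs, dtype=float)
    obs = np.zeros_like(probs)
    obs[observed_class] = 1.0  # observation as a one-hot vector
    return float(np.sum((np.cumsum(probs) - np.cumsum(obs)) ** 2))


def rpss(forecast_probs, reference_probs, observed_class):
    """Skill relative to a reference forecast: 1 is perfect, 0 matches the reference.

    Undefined if the reference forecast itself is perfect (its RPS is 0).
    """
    return 1.0 - rps(forecast_probs, observed_class) / rps(reference_probs, observed_class)


# Hypothetical example: a forecast favoring the top class, which is then observed.
forecast = [0.0, 2 / 11, 9 / 11]
climatology = [1 / 3, 1 / 3, 1 / 3]  # assumed uniform reference for three classes
score = rpss(forecast, climatology, observed_class=2)
```

Using cumulative probabilities is what makes the score "ranked": probability placed in a class adjacent to the observed one is penalized less than probability placed two classes away.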
11
5. Conclusions
1. Semidefinite Embedding performs well on the SST data: the high-dimensional data reduce to just 3 dimensions, which capture ~90% of the explained variance;
2. The KNN method provides very good classification and forecasts;
3. Sensitivity to changes in some parameters (number of local neighbors, number of KNN neighbors) still needs to be checked;
4. Plan to extend the approach to other climate datasets;
5. Try other metrics, multivariate data, etc.