Chandrika Kamath and Imola K. Fodor Center for Applied Scientific Computing Lawrence Livermore National Laboratory Gatlinburg, TN March 26-27, 2002 Dimension.

Presentation transcript:

Dimension Reduction and Sampling
Chandrika Kamath and Imola K. Fodor
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
First SDM ISIC All-Hands Meeting, Gatlinburg, TN, March 26-27, 2002
UCRL. This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract W-7405-Eng-48.

Dimension Reduction and Sampling at LLNL-2 CASC
The SDM ISIC aims to minimize the effort researchers spend in managing their data
- LLNL is participating in several of the tasks, including
  - data mining to improve the management of data
- Problem: data from simulations and experiments is high dimensional (i.e. has many features)
- Querying the features can help in understanding the data
  - but searching in a high-dimensional space is difficult
- May want to cluster similar objects for efficient access
  - but clustering is expensive in high dimensions
=> We plan to address the problem of high dimensionality using techniques for dimension reduction and sampling originally developed in data mining.

Our work on dimension reduction will help both data management and mining
- Reducing the dimensions will improve
  - searching (task 3.1, LBNL)
  - clustering (task 2.1, ORNL)
- Dimension reduction is expensive if there are many data items
  - use a sample of the data items
  - techniques for sampling in the presence of rare events
- We will focus on climate and high-energy-physics data
  - complements work at ORNL (climate), LBNL (HEP)
  - but the techniques are applicable to other data as well
=> We only report the 0.8 FTE work funded under SciDAC; however, our data mining research is more extensive. See
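The slides do not name a sampling technique for rare events. As one simple possibility, offered only as an illustration, the sketch below keeps every item flagged as rare and draws a random subset of the remaining items, attaching a weight to each kept item so that totals can still be estimated. The function name `sample_keep_rare`, the `is_rare` predicate, and the `fraction` parameter are hypothetical.

```python
import random


def sample_keep_rare(items, is_rare, fraction, seed=0):
    """Keep all rare items; subsample the common ones (illustrative only).

    Returns (item, weight) pairs. Rare items get weight 1.0; each sampled
    common item stands in for len(common) / k common items.
    """
    rng = random.Random(seed)
    rare = [x for x in items if is_rare(x)]
    common = [x for x in items if not is_rare(x)]
    if not common:
        return [(x, 1.0) for x in rare]
    # Number of common items to keep: at least one, at most all of them.
    k = min(len(common), max(1, round(fraction * len(common))))
    picked = rng.sample(common, k)
    weight = len(common) / k
    return [(x, 1.0) for x in rare] + [(x, weight) for x in picked]
```

With such a sample, the expensive dimension reduction step runs on far fewer items while the rare events of interest are never dropped.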

There are two different ways in which we can view dimension reduction
- Reduce the number of features representing a data item
- Reduce the number of basis vectors used to describe the data: if some of the [...] are small, they can be ignored
[Figure: data matrix with axes labeled Features and Data items]
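One way to write the second view, assuming the quantity that may be small is the expansion coefficient of each basis vector. The symbols $x$ (a data item), $b_k$ (basis vectors), $c_k$ (coefficients), $p$ and $q$ are introduced here for illustration and do not appear in the slides.

```latex
x \;=\; \sum_{k=1}^{p} c_k\, b_k
  \;\approx\; \sum_{k=1}^{q} c_k\, b_k ,
\qquad q < p, \quad \text{when } |c_k| \text{ is small for } k > q .
% For an orthonormal basis, the squared error of the truncation is
% \| x - \sum_{k=1}^{q} c_k b_k \|^2 = \sum_{k=q+1}^{p} c_k^2 .
```

PCA and ICA both fit this form; they differ in how the basis vectors are chosen.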

Our work on climate data focuses on reducing the number of basis vectors
- Domain expert: Dr. Benjamin Santer (LLNL climate)
- Climate scientists are interested in understanding the change in the earth’s surface temperature
- Simulated and observed data are mixtures of volcano, El Niño, and other effects
- Our goal is to separate the signals corresponding to different effects
  - traditional approaches such as principal component analysis (PCA) have not worked
  - separation is difficult because the El Chichón and Pinatubo volcanic eruptions coincided with El Niño events
  - our approach is to use independent component analysis (ICA)
=> Dimension reduction supporting scientific discovery

The raw data are monthly temperatures on a 144x73 spatial grid at 17 vertical levels
[Diagram: ICA separating the data into volcano, El Niño, and other effects]
[Figure: January 1979 raw temperatures (Kelvin) on the 144x73 longitude-by-latitude grid at the 1000 hPa pressure level. Data from NCEP.]

Initially, we applied ICA to global monthly mean temperature anomalies
[Figure: time series of global monthly mean anomalies, Jan ... to Dec ..., at 17 vertical levels; level 1: 1000 hPa (lowest altitude), level 17: 10 hPa (highest altitude)]
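The slides do not say how the global monthly mean anomalies were computed. A minimal sketch of the usual recipe, under two assumptions that are ours: the global mean is area-weighted by the cosine of latitude, and an anomaly is the value minus the long-term mean for the same calendar month. The function names are hypothetical.

```python
import math


def global_mean(grid, lats_deg):
    """Area-weighted mean of a grid given as one row of values per latitude.

    Each row is weighted by cos(latitude), since grid cells shrink
    toward the poles on a regular latitude-longitude grid.
    """
    num = 0.0
    den = 0.0
    for row, lat in zip(grid, lats_deg):
        w = math.cos(math.radians(lat))
        num += w * sum(row)
        den += w * len(row)
    return num / den


def monthly_anomalies(series):
    """Subtract from each value the mean of its calendar month.

    Assumes series[0] is a January value and the data are monthly.
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for i, v in enumerate(series):
        sums[i % 12] += v
        counts[i % 12] += 1
    clim = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [v - clim[i % 12] for i, v in enumerate(series)]
```

Applying `global_mean` to each monthly grid at one pressure level and then `monthly_anomalies` to the resulting series gives one time series per level, which is the kind of input the slide describes.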

Next, we ran experiments with simulated data to understand the behavior of ICA
[Figure: (i) two original sources, mixed to give (ii) two mixed signals; ICA applied to (ii) gives (iii) the recovered sources (ICs)]
ICA correctly estimates the shapes of the two independent components (ICs). With additional processing, we can also estimate the relative contributions of the two ICs in the two mixed signals.
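To make the experiment concrete, here is a toy two-signal version. It is not the authors' ICA software: it whitens the two mixed signals and then searches rotation angles for the one that maximizes the summed squared excess kurtosis, a standard non-Gaussianity contrast. The source shapes, the mixing matrix, and all parameter values are made up for illustration.

```python
import math


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    a, b = center(a), center(b)
    num = sum(p * q for p, q in zip(a, b))
    return num / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))


def make_sources(n=960, period=24, t0=390, tau=40.0):
    """Made-up sources: a sine and a volcano-like cooling pulse starting at t0.

    t0 sits at a sine peak so the two sources are nearly uncorrelated
    in this finite sample.
    """
    sine = [math.sin(2.0 * math.pi * t / period) for t in range(n)]
    volcano = [-math.exp(-(t - t0) / tau) if t >= t0 else 0.0 for t in range(n)]
    return sine, volcano


def mix(s1, s2, a=((1.0, 2.0), (0.5, -1.5))):
    """Linear mixing x = A s with an arbitrary non-singular 2x2 matrix A."""
    x1 = [a[0][0] * p + a[0][1] * q for p, q in zip(s1, s2)]
    x2 = [a[1][0] * p + a[1][1] * q for p, q in zip(s1, s2)]
    return x1, x2


def whiten(x1, x2):
    """Center and decorrelate two signals and scale them to unit variance."""
    x1, x2 = center(x1), center(x2)
    n = len(x1)
    c11 = sum(a * a for a in x1) / n
    c22 = sum(b * b for b in x2) / n
    c12 = sum(a * b for a, b in zip(x1, x2)) / n
    # Closed-form eigen-decomposition of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    c, s = math.cos(theta), math.sin(theta)
    lam1 = c11 * c * c + 2.0 * c12 * c * s + c22 * s * s
    lam2 = c11 * s * s - 2.0 * c12 * c * s + c22 * c * c
    z1 = [(c * a + s * b) / math.sqrt(lam1) for a, b in zip(x1, x2)]
    z2 = [(-s * a + c * b) / math.sqrt(lam2) for a, b in zip(x1, x2)]
    return z1, z2


def excess_kurtosis(y):
    """Excess kurtosis of a zero-mean, unit-variance signal."""
    return sum(v ** 4 for v in y) / len(y) - 3.0


def ica_2d(x1, x2, steps=180):
    """Return two estimated ICs (unit variance, arbitrary order and sign)."""
    z1, z2 = whiten(x1, x2)
    best = None
    for k in range(steps):
        phi = 0.5 * math.pi * k / steps  # the contrast repeats every 90 degrees
        c, s = math.cos(phi), math.sin(phi)
        y1 = [c * a + s * b for a, b in zip(z1, z2)]
        y2 = [-s * a + c * b for a, b in zip(z1, z2)]
        score = excess_kurtosis(y1) ** 2 + excess_kurtosis(y2) ** 2
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]
```

Calling `ica_2d(*mix(*make_sources()))` returns two signals whose shapes can be compared with the original sources using `corr`. As with any ICA, the order, sign, and scale of the recovered components are not determined.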

Original decomposition of the two mixed signals (-): sine (--) and volcano (-.)
[Figure panels: (i) Signal 1, (ii) Signal 2]

After proper post-processing, ICA estimates the underlying independent components and their respective contributions to the mixed signals remarkably well
[Figure: ICA decomposition of the two mixed signals (-): sine (--) and volcano (-.); panels (i) Signal 1, (ii) Signal 2]
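The slides do not describe the post-processing. One plausible reading, stated here as an assumption: ICA recovers each component only up to scale and sign, so each mixed signal is regressed on the estimated ICs, and the product of coefficient and IC gives that component's contribution. This product does not change if an IC is rescaled or flipped. The function name `contributions` is hypothetical.

```python
def contributions(x, ic1, ic2):
    """Least-squares split of signal x into a * ic1 + b * ic2 (plus a mean).

    Returns (a, b, part1, part2), where part1 and part2 are the
    contributions of the two centered ICs to x.
    """
    n = len(x)
    mx = sum(x) / n
    m1 = sum(ic1) / n
    m2 = sum(ic2) / n
    xc = [v - mx for v in x]
    y1 = [v - m1 for v in ic1]
    y2 = [v - m2 for v in ic2]
    # Normal equations for two regressors, solved with Cramer's rule.
    g11 = sum(p * p for p in y1)
    g22 = sum(q * q for q in y2)
    g12 = sum(p * q for p, q in zip(y1, y2))
    b1 = sum(v * p for v, p in zip(xc, y1))
    b2 = sum(v * q for v, q in zip(xc, y2))
    det = g11 * g22 - g12 * g12
    a = (b1 * g22 - b2 * g12) / det
    b = (g11 * b2 - g12 * b1) / det
    return a, b, [a * p for p in y1], [b * q for q in y2]
```

Plotting `part1` and `part2` against the mixed signal gives a decomposition of the kind shown in the panels, under the assumption above.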

ICA can also separate “noise” used as an extra component in the mixing
[Figure: 3 original sources, mixed to give 3 mixed signals; ICA applied to these gives 3 estimated ICs]

Original decomposition of 3 mixed signals (-): El Niño (--), volcano (-.), and noise (..)
Cooling in the global series at the arrow is in fact a combination of an ENSO warming and a volcano cooling. Without the volcanic eruption, the El Niño warming would dominate, resulting in warmer global temperatures.
[Figure panels: (i) Signal 1, (ii) Signal 2, (iii) Signal 3]

ICA decomposition of 3 mixed signals (-): El Niño (--), volcano (-.), and noise (..)
Although not perfect in terms of the exact amplitudes, ICA clearly separates the cooling effect of the volcano from the warming effect of El Niño.
[Figure panels: (i) Signal 1, (ii) Signal 2, (iii) Signal 3]

Our future plans include work with HEP data and collaborators at ORNL and LBNL
- Complete the work on the climate problem
  - our results with artificial data are encouraging
  - identify an appropriate ICA model for climate data
- Make the ICA software accessible to SciDAC scientists
- Try ICA and other dimension reduction techniques in the context of the STAR high-energy-physics data
  - reduce the number of features
  - investigate sampling to reduce computation
  - collaborate with LBNL (data, searching)
- Investigate incremental PCA
  - monitor climate simulations using indices based on the principal components
  - collaborate with ORNL (data, clustering)
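The slides do not say which incremental PCA method is planned. As a rough illustration of the idea only, the sketch below updates a mean and covariance one data item at a time (a Welford-style update) and extracts the leading principal component by power iteration, so that a simulation could be monitored without storing all past output. The names `RunningCovariance` and `leading_component` are hypothetical.

```python
import math


class RunningCovariance:
    """Mean and covariance updated one data item at a time."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [[0.0] * dim for _ in range(dim)]  # sum of co-deviations

    def update(self, x):
        self.n += 1
        delta = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + di / self.n for mi, di in zip(self.mean, delta)]
        delta2 = [xi - mi for xi, mi in zip(x, self.mean)]
        for i, di in enumerate(delta):
            for j, dj in enumerate(delta2):
                self.m2[i][j] += di * dj

    def covariance(self):
        """Sample covariance; requires at least two items."""
        return [[v / (self.n - 1) for v in row] for row in self.m2]


def leading_component(cov, iters=200):
    """Unit-length leading eigenvector of cov by power iteration.

    Starts from the all-ones direction, so it stalls if that direction
    happens to be orthogonal to the leading component.
    """
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:
            break
        v = [a / norm for a in w]
    return v
```

Projecting each new data item onto the current leading component would give one index of the kind mentioned in the last bullet.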