Scientific Data Analysis via Statistical Learning Raquel Romano romano at hpcrd dot lbl dot gov November 2006.

Slides:



Advertisements
Similar presentations
Applications of one-class classification
Advertisements

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Text mining Gergely Kótyuk Laboratory of Cryptography and System Security (CrySyS) Budapest University of Technology and Economics
Machine learning continued Image source:
Motion Patterns Alla Petrakova & Steve Mussmann. Trajectory Clustering Trajectory clustering is a well-established field of research in Data Mining area.
Dimension reduction (1)
Data Visualization STAT 890, STAT 442, CM 462
Principal Component Analysis
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
A Web service for Distributed Covariance Computation on Astronomy Catalogs Presented by Haimonti Dutta CMSC 691D.
Data Mining – Intro.
KDD for Science Data Analysis Issues and Examples.
Face Recognition and Retrieval in Video Basic concept of Face Recog. & retrieval And their basic methods. C.S.E. Kwon Min Hyuk.
Computer Science Universiteit Maastricht Institute for Knowledge and Agent Technology Data mining and the knowledge discovery process Summer Course 2005.
LLNL-PRES This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Data Mining Techniques
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
Data Mining Chun-Hung Chou
Summarized by Soo-Jin Kim
N. Laskaris. [ IEEE SP Magazine, May 2004 ] N. Laskaris, S. Fotopoulos, A. Ioannides ENTER-2001.
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
Self organizing maps 1 iCSC2014, Juan López González, University of Oviedo Self organizing maps A visualization technique with data dimension reduction.
Machine Learning in Spoken Language Processing Lecture 21 Spoken Language Processing Prof. Andrew Rosenberg.
Chandrika Kamath and Imola K. Fodor Center for Applied Scientific Computing Lawrence Livermore National Laboratory Gatlinburg, TN March 26-27, 2002 Dimension.
Bayesian networks Classification, segmentation, time series prediction and more. Website: Twitter:
Wide Field Imagers in Space and the Cluster Forbidden Zone Megan Donahue Space Telescope Science Institute Acknowledgements to: Greg Aldering (LBL) and.
Presented by ORNL Statistics and Data Sciences Understanding Variability and Bringing Rigor to Scientific Investigation George Ostrouchov Statistics and.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
Astro / Geo / Eco - Sciences Illustrative examples of success stories: Sloan digital sky survey: data portal for astronomy data, 1M+ users and nearly 1B.
1 Part II: Practical Implementations.. 2 Modeling the Classes Stochastic Discrimination.
MURI: Integrated Fusion, Performance Prediction, and Sensor Management for Automatic Target Exploitation 1 Dynamic Sensor Resource Management for ATE MURI.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
Pattern Recognition April 19, 2007 Suggested Reading: Horn Chapter 14.
Copyright © 2012, SAS Institute Inc. All rights reserved. ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY,
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence Wednesday, March 29, 2000.
1 What is Data Mining? l Data mining is the process of automatically discovering useful information in large data repositories. l There are many other.
“Very high resolution global ocean and Arctic ocean-ice models being developed for climate study” by Albert Semtner Extremely high resolution is required.
Raquel A. Romano 1 Scientific Computing Seminar May 12, 2004 Projective Geometry for Computer Vision Projective Geometry for Computer Vision Raquel A.
Data Visualization Michel Bruley Teradata Aster EMEA Marketing Director April 2013 Michel Bruley Teradata Aster EMEA Marketing Director.
Data Mining: Knowledge Discovery in Databases Peter van der Putten ALP Group, LIACS Pre-University College LAPP-Top Computer Science February 2005.
DDM Kirk. LSST-VAO discussion: Distributed Data Mining (DDM) Kirk Borne George Mason University March 24, 2011.
V. Clustering 인공지능 연구실 이승희 Text: Text mining Page:82-93.
Visualization and Exploration of Temporal Trend Relationships in Multivariate Time-Varying Data Teng-Yok Lee & Han-Wei Shen.
Lucent Technologies - Proprietary 1 Interactive Pattern Discovery with Mirage Mirage uses exploratory visualization, intuitive graphical operations to.
CLUSTERING HIGH-DIMENSIONAL DATA Elsayed Hemayed Data Mining Course.
CSC 478 Programming Data Mining Applications Course Summary Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
WHAT IS DATA MINING?  The process of automatically extracting useful information from large amounts of data.  Uses traditional data analysis techniques.
WHAT IS DATA MINING?  The process of automatically extracting useful information from large amounts of data.  Uses traditional data analysis techniques.
NAME SWG th Annual NOAA Climate Diagnostics and Prediction Workshop State College, Pennsylvania Oct. 28, 2005.
Pattern recognition – basic concepts. Sample input attribute, attribute, feature, input variable, independent variable (atribut, rys, příznak, vstupní.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 28 Data Mining Concepts.
Machine Learning Artificial Neural Networks MPλ ∀ Stergiou Theodoros 1.
Machine Learning Supervised Learning Classification and Regression K-Nearest Neighbor Classification Fisher’s Criteria & Linear Discriminant Analysis Perceptron:
Principal Component Analysis (PCA)
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Data Mining – Intro.
School of Computer Science & Engineering
Statistics 202: Statistical Aspects of Data Mining
Transfer Learning in Astronomy: A New Machine Learning Paradigm
The Elements of Statistical Learning
Machine Learning Ali Ghodsi Department of Statistics
Bringing Organism Observations Into Bioinformatics Networks
Special Topics in Data Mining Applications Focus on: Text Mining
Data Mining: Concepts and Techniques Course Outline
Data Warehousing and Data Mining
3.1.1 Introduction to Machine Learning
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Machine Learning – a Probabilistic Perspective
Presentation transcript:

Scientific Data Analysis via Statistical Learning Raquel Romano romano at hpcrd dot lbl dot gov November 2006

Abstract The emerging field of scientific data mining addresses the need for scientists to search for, quantify, and visualize information from massive data sets generated by large scale observations and simulations. Statistical machine learning algorithms have enormous potential to provide data-driven solutions to fundamental scientific questions. We present results from ongoing projects in astrophysics and climate modeling at LBNL, including the search for Type Ia supernovae from astronomical observations, and the analysis of hurricanes and tropical storms in climate simulations.

Supervised Learning for Supernova Recognition Supervised learning techniques, also known as classification methods, offer powerful, adaptive methods for the automatic detection and recognition of rare objects from large scale digital sky surveys. We have integrated state-of-the-art classifiers (Support Vector Machines, boosted decision trees) into the nightly pipeline of the Nearby Supernova Factory at Lawrence Berkeley Laboratory, to quickly and automatically rank more than 500,000 subimages found nightly in order of confidence that they contain supernovae.

potential supernova on galaxy measured features (geometry, brightness, signal-to-noise, …) large sets of astronomical imagery learned decision boundary identifies supernovae new supernova candidates ranked by confidence

Dimensionality Reduction of Supernova Spectra Dimensionality reduction techniques decompose data into a small number of basis elements that capture useful attributes of the original signals. Spectra measured from observed supernovae are projected onto a lower dimensional subspace or manifold whose dimensions are associated with physical properties such as luminosity and phase. These decompositions are useful for validating synthetic spectra generated from simulations, classifying observed spectra by luminosity and phase, and discovering new classes of supernovae.

synthetic time-varying spectraobserved time-varying spectra low-dimensional projection encodes parameters such as phase or luminosity basis spectra clustering and matching in low-dimensional space

Spatiotemporal Analysis of Climate Data Statistical modeling of high-dimensional, time-varying, non-stationary time series enable the discovery of temporal, spatial, and multivariate statistical dependencies in large-scale climate simulations. Using latent variable models of tropical storm trajectories, we can infer the statistical relationships between space- time-varying atmospheric variables along a storm trajectory and the likelihood of the storm evolving into an intense hurricane. Hierarchical, stochastic models can predict the influence of global-scale, long-term climate patterns on such short-term, local events.

collection of tropical storm tracks total precipitable water multivariate time series for each track sea level pressure surface friction velocity large-scale, high-resolution simulations yield many examples for probabilistic modeling