Data Mining and the OptIPuter Padhraic Smyth University of California, Irvine.

Data Mining of Spatio-Temporal Scientific Data
  –Modern scientific data analysis is increasingly data-driven
      data often consist of massive spatio-temporal streams
  –Research focus
      characterizing spatio-temporal structure in data
      statistical models for object shapes, trajectories, patterns...
      data mining from scientific data streams (NSF, OptIPuter)
      recognition of waveforms in time-series archives (JPL, NASA)
      inference of dynamic gene-regulation networks from data (NIH)
      Markov models for spatio-temporal weather patterns (DOE)
      clustering and modeling of storm trajectories (LLNL)

Image-voxel Data (“slices” of olfactory bulb in rats)
  Automatic segmentation of cellular structures of interest (glomerular layer)
  Thematic maps
  Data mining
  Scientific discovery

Image-voxel Data (Remote sensing AVIRIS spectral data)
  Focus of attention on wavelengths of interest
  Thematic maps
  Data mining
  Scientific discovery

What’s wrong with this information flow?
“One-way”
  –Flow of information is from data to scientist
Real scientific investigation is “two-way”
  –Scientist interacts with, explores, and queries the data
Most current data mining/analysis tools are relatively poor at handling interaction
  –Algorithms are “black-box” and do not allow scientists to be “in the loop”
  –Algorithms have no representation of the scientist’s prior knowledge or goals (no user models)
  –OptIPuter project: “next generation” data mining tools for effective exploration of massive 2d/3d data sets

OptIPuter focus in Data Mining
Data
  –2d (or multi-d) spatio-temporal image/voxel data
Goals
  –Allow scientists to explore these massive data sets in an efficient and flexible manner, leveraging the OptIPuter architecture
  –Produce software tools that let scientists explore massive data interactively: automated segmentation, thematic maps, focus of interest
Technical Challenges
  –Scaling statistical algorithms to massive data streams
  –Providing mechanisms for effective scientific interaction
  –Developing algorithms for automated “focus-of-attention”

Analysis of Extra-Tropical Cyclones
Extra-tropical cyclone = mid-latitude storm
Practical Importance
  –Highly damaging weather over Europe
  –Important water source in the United States
Scientific Importance
  –Influence of climate on cyclone frequency, strength, etc.
  –Impact of cyclones on local weather patterns
[with Scott Gaffney (UCI), Andy Robertson (IRI/Columbia), Michael Ghil (UCLA)]

Sea-Level Pressure Data
  –Mean sea-level pressure (SLP) on a 2.5° by 2.5° grid
  –Four times a day, every 6 hours, over 20 years
Blue indicates low pressure
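To give a feel for the size of such an archive, the short Python sketch below counts the values in a 2.5° SLP record sampled every 6 hours for 20 years. The slide does not state the spatial domain, so the global grid (144 by 73 points), the 4-byte float storage, and the function name `slp_archive_size` are all assumptions made for illustration.

```python
def slp_archive_size(resolution_deg=2.5, samples_per_day=4, years=20,
                     bytes_per_value=4):
    """Rough size of a gridded SLP archive.

    Assumes a GLOBAL grid (hypothetical: the slide does not give the domain)
    and 4-byte floats; leap days are averaged in via 365.25 days per year.
    """
    n_lon = int(360 / resolution_deg)        # 144 longitudes
    n_lat = int(180 / resolution_deg) + 1    # 73 latitudes, both poles included
    n_steps = int(samples_per_day * 365.25 * years)  # number of 6-hourly fields
    n_values = n_lon * n_lat * n_steps
    return {
        "grid_points": n_lon * n_lat,
        "time_steps": n_steps,
        "n_values": n_values,
        "size_gb": n_values * bytes_per_value / 1e9,
    }


if __name__ == "__main__":
    for name, value in slp_archive_size().items():
        print(f"{name}: {value}")
```

Under these assumptions a single variable comes to roughly 307 million values (about 1.2 GB); a regional domain such as the North Atlantic would be proportionally smaller.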

Winter Cyclone Trajectories

Clustering Methodology
Mixtures of curves
  –model as mixtures of noisy linear/quadratic curves
      note: true paths are not linear
      use the model as a first-order approximation for clustering
Advantages
  –allows for variable-length trajectories
  –allows coupling of other “features” (e.g., intensity)
  –provides a quantitative (e.g., predictive) model
  –[contrast with k-means for example]
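The mixture-of-curves idea can be sketched in Python as a mixture of polynomial regressions fitted with EM. This is a minimal illustration, not the authors' implementation: the isotropic Gaussian noise per cluster, the random soft initialisation, and the names `design` and `fit_curve_mixture` are assumptions. Each trajectory is a pair `(t, Y)` of time stamps and positions, and a whole trajectory is assigned to one cluster, which is what lets trajectories of different lengths share a model.

```python
import numpy as np


def design(t, order):
    # Polynomial design matrix with columns 1, t, t^2, ..., t^order.
    return np.vander(t, order + 1, increasing=True)


def fit_curve_mixture(trajs, n_clusters, order=2, n_iter=50, seed=0):
    """EM for a mixture of noisy polynomial curves (illustrative sketch).

    trajs: list of (t, Y), t of shape (n_i,), Y of shape (n_i, d);
    the lengths n_i may differ between trajectories.
    Returns memberships R, coefficients B, noise variances, mixing weights.
    """
    rng = np.random.default_rng(seed)
    X = [design(t, order) for t, _ in trajs]
    Y = [y for _, y in trajs]
    n, d = len(trajs), Y[0].shape[1]
    R = rng.dirichlet(np.ones(n_clusters), size=n)  # random soft start
    for _ in range(n_iter):
        # M-step: membership-weighted least squares for each cluster.
        pi = np.clip(R.mean(axis=0), 1e-12, None)
        B, var = [], []
        for k in range(n_clusters):
            r = R[:, k]
            A = sum(ri * x.T @ x for ri, x in zip(r, X))
            b = sum(ri * x.T @ y for ri, x, y in zip(r, X, Y))
            Bk = np.linalg.solve(A + 1e-8 * np.eye(order + 1), b)
            sse = sum(ri * np.sum((y - x @ Bk) ** 2)
                      for ri, x, y in zip(r, X, Y))
            n_pts = sum(ri * len(y) for ri, y in zip(r, Y)) + 1e-12
            B.append(Bk)
            var.append(max(sse / (d * n_pts), 1e-6))  # floor avoids collapse
        # E-step: log-likelihood of each whole trajectory under each cluster.
        logR = np.empty((n, n_clusters))
        for k in range(n_clusters):
            for i in range(n):
                res = Y[i] - X[i] @ B[k]
                logR[i, k] = (np.log(pi[k])
                              - 0.5 * res.size * np.log(2 * np.pi * var[k])
                              - 0.5 * np.sum(res ** 2) / var[k])
        logR -= logR.max(axis=1, keepdims=True)  # stabilise before exp
        R = np.exp(logR)
        R /= R.sum(axis=1, keepdims=True)
    return R, B, np.array(var), pi
```

Calling `R, B, var, pi = fit_curve_mixture(trajs, n_clusters=3)` returns soft memberships in `R`; `R.argmax(axis=1)` gives a hard cluster label per trajectory. Unlike k-means on fixed-length vectors, nothing here requires resampling the tracks to a common length, and extra features such as intensity could be appended as additional columns of `Y`.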

Clusters of Trajectories

Applications
Visualization and Exploration
  –improved understanding of cyclone dynamics
Change Detection
  –can quantitatively compare cyclone statistics over different eras or from different models
Linking cyclones with climate and weather
  –correlation of clusters with NAO index
  –correlation with wind speeds in Northern Europe