Extreme-Scale Distribution-Based Data Analysis

Slides:



Advertisements
Similar presentations
Cyberinfrastructure for Coastal Forecasting and Change Analysis
Advertisements

Office of SA to CNS GeoIntelligence Introduction Data Mining vs Image Mining Image Mining - Issues and Challenges CBIR Image Mining Process Ontology.
The 7 th Ultrascale Visualization Workshop November 12, 2012 Salt Lake City.
Visual Analytics Research at WPI Dr. Matthew Ward and Dr. Elke Rundensteiner Computer Science Department.
IS 466 ADVANCED TOPICS IN INFORMATION SYSTEMS LECTURER : NOUF ALMUJALLY 20 – 11 – 2011 College Of Computer Science and Information, Information Systems.
Computational Vision: Object Recognition Object Recognition Jeremy Wyatt.
Multi-Scale Analysis for Network Traffic Prediction and Anomaly Detection Ling Huang Joint work with Anthony Joseph and Nina Taft January, 2005.
Information Technology, Informatics, & Information Science How do they relate to each other? to each other?
Advanced Database Applications Database Indexing and Data Mining CS591-G1 -- Fall 2001 George Kollios Boston University.
19 April, 2017 Knowledge and image processing algorithms for real-life applications. Dr. Maria Athelogou Principal Scientist & Scientific Liaison Manager.
Big Data Course Plans at Purdue Ananth Iyer. Big Data/Analytics Coursera course on Big Data by Bill Howe claims that Big Data involves issues of
Large-Scale Content-Based Image Retrieval Project Presentation CMPT 880: Large Scale Multimedia Systems and Cloud Computing Under supervision of Dr. Mohamed.
SKA-cba-ase NSF and Science of Design Avogadro Scale Engineering Center for Bits & Atoms November 18-19, 2003 Kamal Abdali Computing & Communication.
ASCR Scientific Data Management Analysis & Visualization PI Meeting Exploration of Exascale In Situ Visualization and Analysis Approaches LANL: James Ahrens,
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
In Situ Sampling of a Large-Scale Particle Simulation Jon Woodring Los Alamos National Laboratory DOE CGF
Alok 1Northwestern University Access Patterns, Metadata, and Performance Alok Choudhary and Wei-Keng Liao Department of ECE,
Last Words COSC Big Data (frameworks and environments to analyze big datasets) has become a hot topic; it is a mixture of data analysis, data mining,
1 SciCSM: Novel Contrast Set Mining over Scientific Datasets Using Bitmap Indices Gangyi Zhu, Yi Wang, Gagan Agrawal The Ohio State University.
1 Enabling Webscale Research in Europe Julien Masanès European Archive Foundation Consultation Workshop, Brussels, 19/1/2010.
IST 210 Introduction to Spatial Databases. IST 210 Evolution of acronym “GIS” Fig 1.1 Geographic Information Systems (1980s) Geographic Information Science.
DOE BER Climate Modeling PI Meeting, Potomac, Maryland, May 12-14, 2014 Funding for this study was provided by the US Department of Energy, BER Program.
David S. Ebert David S. Ebert Visual Analytics to Enable Discovery and Decision Making: Potential, Challenges, and.
HPDC 2014 Supporting Correlation Analysis on Scientific Datasets in Parallel and Distributed Settings Yu Su*, Gagan Agrawal*, Jonathan Woodring # Ayan.
Mathematics and Computer Science & Environmental Research Divisions ARGONNE NATIONAL LABORATORY Regional Climate Simulation Analysis & Vizualization John.
Presented by ORNL Statistics and Data Sciences Understanding Variability and Bringing Rigor to Scientific Investigation George Ostrouchov Statistics and.
Simulated vascular reconstruction in a virtual operating theatre Robert G. Belleman, Peter M.A. Sloot Section Computational Science, University of Amsterdam,
Science Problem: Cognitive capacity (human/scientist understanding), storage and I/O have not kept up with our capacity to generate massive amounts physics-based.
Light-Weight Data Management Solutions for Scientific Datasets Gagan Agrawal, Yu Su Ohio State Jonathan Woodring, LANL.
ICPP 2012 Indexing and Parallel Query Processing Support for Visualizing Climate Datasets Yu Su*, Gagan Agrawal*, Jonathan Woodring † *The Ohio State University.
Wavelet-based Coding And its application in JPEG2000 Monia Ghobadi CSC561 final project
Parallel I/O Performance: From Events to Ensembles Andrew Uselton National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory.
The Simigle Image Search Engine Wei Dong
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
HPDC 2013 Taming Massive Distributed Datasets: Data Sampling Using Bitmap Indices Yu Su*, Gagan Agrawal*, Jonathan Woodring # Kary Myers #, Joanne Wendelberger.
Last Words DM 1. Mining Data Steams / Incremental Data Mining / Mining sensor data (e.g. modify a decision tree assuming that new examples arrive continuously,
FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.
Three New Ideas in SDP-based Manifold Learning Alexander Gray Georgia Institute of Technology College of Computing FASTlab: Fundamental Algorithmic and.
GEON2 and OpenEarth Framework (OEF) Bradley Wallet School of Geology and Geophysics, University of Oklahoma
Efficient Local Statistical Analysis via Integral Histograms with Discrete Wavelet Transform Teng-Yok Lee & Han-Wei Shen IEEE SciVis ’13Uncertainty & Multivariate.
SUPPORTING SQL QUERIES FOR SUBSETTING LARGE- SCALE DATASETS IN PARAVIEW SC’11 UltraVis Workshop, November 13, 2011 Yu Su*, Gagan Agrawal*, Jon Woodring†
Large Scale Time-Varying Data Visualization Han-Wei Shen Department of Computer and Information Science The Ohio State University.
CHAPTER 4 THE VISUALIZATION PIPELINE. CONTENTS The focus is on presenting the structure of a complete visualization application, both from a conceptual.
Xolotl: A New Plasma Facing Component Simulator Scott Forest Hull II Jr. Software Developer Oak Ridge National Laboratory
Xin Tong, Teng-Yok Lee, Han-Wei Shen The Ohio State University
System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Gagan Agrawal Department of Computer and Information Sciences Ohio.
Scientific Data Analysis via Statistical Learning Raquel Romano romano at hpcrd dot lbl dot gov November 2006.
System Support for High Performance Scientific Data Mining Gagan Agrawal Ruoming Jin Raghu Machiraju S. Parthasarathy Department of Computer and Information.
CSCAPES Mission Research and development Provide load balancing and parallelization toolkits for petascale computation Develop advanced automatic differentiation.
Evaluating Climate Visualization: An Information Visualization Approach -By Mridul Sen 1.
Efficient Evaluation of XQuery over Streaming Data
Extreme-Scale Distribution Based Data Analysis
On the Death of SciVis Han-Wei Shen
(Multivariate) Extremes and Impacts on the Land Surface
Ling Qiu1,2,3, Carlos M. Carrillo3, and Francisco Munoz-Arriola,3,4,5
CSCE 990: Advanced Distributed Systems
Yu Su, Yi Wang, Gagan Agrawal The Ohio State University
Tutorial Overview February 2017
Scientific Achievement
Homogeneity Guided Probabilistic Data Summaries for Analysis and Visualization of Large-Scale Data Sets Ohio State University (Shen) Problem: Existing.
Data Warehousing and Data Mining
CSc4730/6730 Scientific Visualization
Efficient Distribution-based Feature Search in Multi-field Datasets Ohio State University (Shen) Problem: How to efficiently search for distribution-based.
Gagan Agrawal The Ohio State University
Range Likelihood Tree: A Compact and Effective Representation for Visual Exploration of Uncertain Data Sets Ohio State University (Shen) Problem: Uncertainty.
Emulator of Cosmological Simulation for Initial Parameters Study
Time-varying volume visualization and compression
Gaurab KCa,b, Zachary Mitchella,c and Sarat Sreepathia
LCPC02 Wei Du Renato Ferreira Gagan Agrawal
Process Wind Tunnel for Improving Business Processes
Presentation transcript:

Extreme-Scale Distribution-Based Data Analysis Han-Wei Shen (Lead PI), Gagan Agrawa, Huamin Wang The Ohio State University Tom Peterka Argonne National Laboratory Jonathan Woodring, Joanne Wendelberger Los Alamos National Laboratory Research Goals Develop distribution-based data analysis and visualization techniques to support DOE’s exascale applications Efficient computation, representation, and query of data distributions In situ data reduction, summarization, and triage Distribution-based data analytics and visualization Climate: POP and MPAS-O (LANL) Superconductivity: SOScon (ANL) Cosmology: HACC (ANL) Science Applications Distribution-Based Visual Analytics Why Distributions? A compact representation of data Many statistics of the data can be derived Information flow across the data analytics pipeline can be analyzed Regions of high information content can be identified Parameters of various visualization algorithms can be optimized Allow detailed data analysis and inferences Support many needs of in situ data analysis Data reduction Data summarization Data triage Feature extraction and indexing \ Research Tasks Computation and Representation of Distributions Computing distributions from bitmap indices Supporting efficient range distribution query Statistics-preserving block decomposition Data Summarization, Reduction, and Triage Spatial domain data summarization, reduction, and triage Value domain summarization, reduction, and triage Temporal domain summarization, reduction, and triage Distribution-based Visual Analytics Distribution-based multivariate data analysis Feature-driven view seleciton and control Query-driven visual analysis with distributions Software Library (EDDA) A C/C++ library for entropy and distribution computation for large scale datasets Different information-theoretic measurement Distributed computation via MPI Support of various data types In Situ Data Analysis/Visualization Pipeline Acknowledgement This work was supported by DOE Office of Science, Advanced Scientific Computing Research, under award number DE-DC0012495, program manager Lucy Nowell.