DOE Network PI Meeting 2005 Runtime Data Management for Data-Intensive Scientific Applications Xiaosong Ma NC State University Joint Faculty: Oak Ridge.

Slides:

Advertisements

Similar presentations

Technology Drivers Traditional HPC application drivers – OS noise, resource monitoring and management, memory footprint – Complexity of resources to be.

Advertisements

Dynamic Optimization using ADORE Framework 10/22/2003 Wei Hsu Computer Science and Engineering Department University of Minnesota.

Scalable Multi-Cache Simulation Using GPUs Michael Moeng Sangyeun Cho Rami Melhem University of Pittsburgh.

28 April, 2005ISGC 2005, Taiwan The Efficient Handling of BLAST Applications on the GRID Hurng-Chun Lee 1 and Jakub Moscicki 2 1 Academia Sinica Computing.

Building a Distributed Full-Text Index for the Web S. Melnik, S. Raghavan, B.Yang, H. Garcia-Molina.

NUMA Tuning for Java Server Applications Mustafa M. Tikir.

1 Dr. Frederica Darema Senior Science and Technology Advisor NSF Future Parallel Computing Systems – what to remember from the past RAMP Workshop FCRC.

A Parallel Structured Ecological Model for High End Shared Memory Computers Dali Wang Department of Computer Science, University of Tennessee, Knoxville.

Astrophysics, Biology, Climate, Combustion, Fusion, Nanoscience Working Group on Simulation-Driven Applications 10 CS, 10 Sim, 1 VR.

07/14/08. 2 Points Introduction. Cluster and Supercomputers. Cluster Types and Advantages. Our Cluster. Cluster Performance. Cluster Computer for Basic.

Applying Twister to Scientific Applications CloudCom 2010 Indianapolis, Indiana, USA Nov 30 – Dec 3, 2010.

PROOF: the Parallel ROOT Facility Scheduling and Load-balancing ACAT 2007 Jan Iwaszkiewicz ¹ ² Gerardo Ganis ¹ Fons Rademakers ¹ ¹ CERN PH/SFT ² University.

Designing Efficient Systems Services and Primitives for Next-Generation Data-Centers K. Vaidyanathan, S. Narravula, P. Balaji and D. K. Panda Network Based.

Alok 1Northwestern University Access Patterns, Metadata, and Performance Alok Choudhary and Wei-Keng Liao Department of ECE,

GePSeA: A General Purpose Software Acceleration Framework for Lightweight Task Offloading Ajeet SinghPavan BalajiWu-chun Feng Dept. of Computer Science,

PAGE: A Framework for Easy Parallelization of Genomic Applications 1 Mucahid Kutlu Gagan Agrawal Department of Computer Science and Engineering The Ohio.

Storage in Big Data Systems

Center for Programming Models for Scalable Parallel Computing: Project Meeting Report Libraries, Languages, and Execution Models for Terascale Applications.

High Performance I/O and Data Management System Group Seminar Xiaosong Ma Department of Computer Science North Carolina State University September 12,

A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.

Cluster-based SNP Calling on Large Scale Genome Sequencing Data Mucahid KutluGagan Agrawal Department of Computer Science and Engineering The Ohio State.

1 Parallel Genomic Sequence-Searching on an Ad-Hoc Grid: Experiences, Lessons Learned, and Implications Mark K. Gardner (Virginia Tech) Wu-chun Feng (Virginia.

Performance Issues in Parallelizing Data-Intensive applications on a Multi-core Cluster Vignesh Ravi and Gagan Agrawal

Architectural Support for Fine-Grained Parallelism on Multi-core Architectures Sanjeev Kumar, Corporate Technology Group, Intel Corporation Christopher.

CCGrid 2014 Improving I/O Throughput of Scientific Applications using Transparent Parallel Compression Tekin Bicer, Jian Yin and Gagan Agrawal Ohio State.

Building a Parallel File System Simulator E Molina-Estolano, C Maltzahn, etc. UCSC Lab, UC Santa Cruz. Published in Journal of Physics, 2009.

Porting Irregular Reductions on Heterogeneous CPU-GPU Configurations Xin Huo, Vignesh T. Ravi, Gagan Agrawal Department of Computer Science and Engineering.

SAN FRANCISCO, CA, USA Adaptive Energy-efficient Resource Sharing for Multi-threaded Workloads in Virtualized Systems Can HankendiAyse K. Coskun Boston.

The Limitation of MapReduce: A Probing Case and a Lightweight Solution Zhiqiang Ma Lin Gu Department of Computer Science and Engineering The Hong Kong.

CSIU Submission of BLAST jobs via the Galaxy Interface Rob Quick Open Science Grid – Operations Area Coordinator Indiana University.

Computer Science Department of 1 Massively Parallel Genomic Sequence Search on Blue Gene/P Heshan Lin (NCSU) Pavan Balaji.

Supercomputing Cross-Platform Performance Prediction Using Partial Execution Leo T. Yang Xiaosong Ma* Frank Mueller Department of Computer Science.

SciDAC All Hands Meeting, March 2-3, 2005 Northwestern University PIs:Alok Choudhary, Wei-keng Liao Graduate Students:Avery Ching, Kenin Coloma, Jianwei.

The Globus Project: A Status Report Ian Foster Carl Kesselman

Efficient Data Accesses for Parallel Sequence Searches Heshan Lin (NCSU) Xiaosong Ma (NCSU & ORNL) Praveen Chandramohan (ORNL) Al Geist (ORNL) Nagiza Samatova.

Opportunities in Parallel I/O for Scientific Data Management Rajeev Thakur and Rob Ross Mathematics and Computer Science Division Argonne National Laboratory.

Comparison of Distributed Operating Systems. Systems Discussed ◦Plan 9 ◦AgentOS ◦Clouds ◦E1 ◦MOSIX.

Hiding Periodic I/O Costs in Parallel Applications Xiaosong Ma Department of Computer Science University of Illinois at Urbana-Champaign Spring 2003.

1 Efficient Data Handling in Large-Scale Sequence Database Searches Heshan Lin (NCSU) Xiaosong Ma (NCSU and ORNL) Wu-chun Feng (LANL  VT) Al Geist (ORNL)

Parallelization and Characterization of Pattern Matching using GPUs Author: Giorgos Vasiliadis 、 Michalis Polychronakis 、 Sotiris Ioannidis Publisher:

Computer Science and Engineering Parallelizing Defect Detection and Categorization Using FREERIDE Leonid Glimcher P. 1 ipdps’05 Scaling and Parallelizing.

CCGrid 2014 Improving I/O Throughput of Scientific Applications using Transparent Parallel Compression Tekin Bicer, Jian Yin and Gagan Agrawal Ohio State.

CCGrid, 2012 Supporting User Defined Subsetting and Aggregation over Parallel NetCDF Datasets Yu Su and Gagan Agrawal Department of Computer Science and.

Running BLAST on the cluster system over the Pacific Rim.

Department of Computer Science MapReduce for the Cell B. E. Architecture Marc de Kruijf University of Wisconsin−Madison Advised by Professor Sankaralingam.

Connections to Other Packages The Cactus Team Albert Einstein Institute

GEM: A Framework for Developing Shared- Memory Parallel GEnomic Applications on Memory Constrained Architectures Mucahid Kutlu Gagan Agrawal Department.

RE-PAGE: Domain-Specific REplication and PArallel Processing of GEnomic Applications 1 Mucahid Kutlu Gagan Agrawal Department of Computer Science and Engineering.

Generic GUI – Thoughts to Share Jinping Gwo EMSGi.org.

Massive Semantic Web data compression with MapReduce Jacopo Urbani, Jason Maassen, Henri Bal Vrije Universiteit, Amsterdam HPDC ( High Performance Distributed.

A Memory-hierarchy Conscious and Self-tunable Sorting Library To appear in 2004 International Symposium on Code Generation and Optimization (CGO ’ 04)

PDAC-10 Middleware Solutions for Data- Intensive (Scientific) Computing on Clouds Gagan Agrawal Ohio State University (Joint Work with Tekin Bicer, David.

COMP381 by M. Hamdi 1 Clusters: Networks of WS/PC.

Hybrid Multi-Core Architecture for Boosting Single-Threaded Performance Presented by: Peyman Nov 2007.

Page 1 A Platform for Scalable One-pass Analytics using MapReduce Boduo Li, E. Mazur, Y. Diao, A. McGregor, P. Shenoy SIGMOD 2011 IDS Fall Seminar 2011.

SDM Center High-Performance Parallel I/O Libraries (PI) Alok Choudhary, (Co-I) Wei-Keng Liao Northwestern University In Collaboration with the SEA Group.

Research Overview Gagan Agrawal Associate Professor.

Parallel IO for Cluster Computing Tran, Van Hoai.

Bioinformatics Computation in the Cloud A Joint Collaboration Between Microsoft’s External Research and eXtreme Computing Groups

Synergy.cs.vt.edu VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units Shucai Xiao 1, Pavan Balaji 2, Qian Zhu 3,

Database Allocation Strategies for Parallel BLAST Evaluation on Clusters Louis Woods.

Applying Control Theory to Stream Processing Systems

Department of Computer Science University of California, Santa Barbara

Parallel I/O System for Massively Parallel Processors

Adaptive Data Refinement for Parallel Dynamic Programming Applications

Database System Architectures

Department of Computer Science University of California, Santa Barbara

Relax and Adapt: Computing Top-k Matches to XPath Queries

Parallel Exact Stochastic Simulation in Biochemical Systems

L. Glimcher, R. Jin, G. Agrawal Presented by: Leo Glimcher

Presentation transcript:

DOE Network PI Meeting 2005 Runtime Data Management for Data-Intensive Scientific Applications Xiaosong Ma NC State University Joint Faculty: Oak Ridge National Lab ECPI: 2005 – 2008

DOE Network PI Meeting 2005 Data-Intensive Applications on NLCF  Data-processing applications Bio sequence DB queries, simulation data analysis, visualization  Challenges Rapid data growth (data avalanche)  Computation requirement  I/O requirement  Needs ultra-scale machines Less studied than numerical simulations  Scalability on large machines Complexity and heterogeneity  Case-by-case static optimization costly

DOE Network PI Meeting 2005 Run-Time Data Management  Parallel execution plan optimization Example: genome vs. database sequence comparison on 1000s of processors Data placement crucial for performance/scalability Issues  Data partitioning/replication  Load balancing  Efficient parallel I/O w. scientific data formats I/O subsystem performance lagging behind Scientific data formats widely used (HDF, netCDF)  Further limits applications’ I/O performance Issues  Library overhead  Metadata management and accesses

DOE Network PI Meeting 2005 Proposed Approach  Adaptive run-time optimization For parallel execution plan optimization  Connect scientific data processing to relational databases  Runtime cost modeling and evaluation For parallel I/O w. scientific data formats  Library-level memory management  Hiding I/O costs  Caching, prefetching, buffering

DOE Network PI Meeting 2005 Prelim Result 1: Efficient Data Accesses for Parallel Sequence Searches  BLAST Widely used bio sequence search tool NCBI BLAST Toolkit  mpiBLAST Developed at LANL Open source parallelization of BLAST using database partitioning Increasingly popular: more than 10,000 downloads since early 2003 Directly utilizing NCBI BLAST Super linear speedup with small number of processors

DOE Network PI Meeting 2005 Data Handling in mpiBLAST Not Efficient  Databases partitioned statically before search Inflexible: re-partitioning required to use different No. of procs Management overhead: generating large number of small files, hard to manage, migrate and share  Results processing and output serialized by the master node  Result: rapidly growing non-search overhead as No. of procs grows Output data size grows - Search 150k queries against NR database

DOE Network PI Meeting 2005 pioBLAST  Efficient, highly scalable parallel BLAST implementation [IPDPS ’05] Improves mpiBLAST Focus on data handling Up to order of magnitude improvement on overall performance Currently being merged with mpiBLAST  Major contributions Applying collective I/O techniques to bioinformatics, enabling  Dynamic database partitioning  Parallel database input and result output Efficient result data processing  Removing master bottleneck

DOE Network PI Meeting 2005 pioBLAST Sample Performance Results  Platform: SGI Altix at ORNL GHz Itanium2 processors 8GB memory per processor  Database: NCBI nr (1GB)  Node scalability tests (top figure) Queries – 150k queries randomly sampled from nr Varied no. of processors

DOE Network PI Meeting 2005 Prelim Result 2: Active Buffering  Hides periodic I/O costs behind computation phases [IPDPS ’02, ICS ’02, IPDPS ’03, IEEE TPDS (to appear)]  Organizes idle memory resources into buffer hierarchy  Masks costs of scientific data formats  Panda Parallel I/O Library University of Illinois Client-server architecture  ROMIO Parallel I/O Library Argonne National Lab Popular MPI-IO implementation, included in MPICH Server-less architecture ABT (Active Buffering with Threads)

DOE Network PI Meeting 2005 Write Throughput w. Active Buffering w/o buffer overflow w. buffer overflow

DOE Network PI Meeting 2005 Prelim Result 3: Application-level Prefetching  GODIVA Framework: hides periodic input costs behind computation phases [ICDE ’04] General Object Data Interface for Visualization Applications In-memory database managing data buffer locations Relational database-like interfaces Developer controllable prefetching and caching Developer-supplied read functions

DOE Network PI Meeting 2005 On-going Research  Parallel execution plan optimization Explore optimization space of bio sequence processing tools on large-scale machines Develop algorithm-independent cost models  Efficient parallel I/O w. scientific data formats Investigate unified caching, prefetching and buffering  DOE Collaborators Team led by Al Geist and Nagiza Samatova (ORNL) Team leb by Wu-Chun Feng (LANL)