High Performance I/O and Data Management System Group Seminar Xiaosong Ma Department of Computer Science North Carolina State University September 12,


High Performance I/O and Data Management System Group Seminar Xiaosong Ma Department of Computer Science North Carolina State University September 12, 2003

2 Roadmap
Introduction
Research area description
Past research
Future research directions

3 About Myself
Xiaosong Ma
  –Pronunciation: Shiao-song
  –Homepage through the faculty directory
Brief bio
  –B.S., Peking University, China
  –Ph.D., UIUC
Hobbies
  –Traveling
  –Food
  –Photography, movies, tennis …

4 High-Performance Computing
Enabled by increasing computational power
  –Scientific computation
  –Parallel data mining
  –Web data processing
High-performance computing in daily life
  –Weather forecasting
  –Web crawling and web search
  –Games, movie graphics, virtual reality

5 Past Research
I/O performance optimization for parallel applications
  –High-level buffering and prefetching techniques
  –Hiding the I/O cost
  –Utilizing idle resources to maximize inter-task parallelism
  –Lightweight database support for visualization applications
  –Making optimizations portable and adaptive

6 Parallel I/O in Scientific Simulations
Write-intensive
Collective and periodic
Bottleneck-prone
“Poor stepchild”
Traditional collective I/O focused on data transfer
[Timeline: Computation … I/O, Computation, I/O, Computation, I/O, Computation …]

7 Active Buffering
Hides periodic I/O costs behind computation phases [IPDPS ’02, ICS ’02, IPDPS ’03]
Organizes idle memory resources into a buffer hierarchy
Controlled by state machines
  –Flexible regarding buffer space availability
  –Adapts to applications’ output pattern
  –Flexible software architecture
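As a rough illustration of the buffering decision on this slide, the following C++ sketch models a single buffer level. The class name `ActiveBuffer`, its methods, and the flush-on-overflow policy are assumptions made for this example and are not the Panda or ROMIO interfaces. The vector `storage_` stands in for a file, and `drain()` stands in for work that a background thread or I/O server would perform during a computation phase.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-level model of active buffering (illustrative only).
class ActiveBuffer {
public:
    explicit ActiveBuffer(std::size_t capacity_bytes) : capacity_(capacity_bytes) {
        assert(capacity_bytes > 0);
    }

    // Called in an I/O phase. Returns true if the data fit in idle memory
    // (a cheap copy), false if it had to be written out immediately.
    bool write(const std::string& data) {
        if (used_ + data.size() <= capacity_) {
            pending_.push_back(data);
            used_ += data.size();
            return true;
        }
        // Not enough buffer space: flush older data first so that output
        // order is preserved, then fall back to a direct write.
        drain();
        storage_.push_back(data);
        return false;
    }

    // Called while the application computes: move buffered data to storage.
    void drain() {
        for (auto& chunk : pending_) storage_.push_back(std::move(chunk));
        pending_.clear();
        used_ = 0;
    }

    std::size_t buffered_bytes() const { return used_; }
    const std::vector<std::string>& storage() const { return storage_; }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
    std::vector<std::string> pending_;   // data waiting in idle memory
    std::vector<std::string> storage_;   // stand-in for the output file
};
```

In this model the application sees only the cost of `write()`; how much of the real I/O time is hidden depends on whether `drain()` can finish before the next output phase, which this single-threaded sketch does not measure.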

8 AB vs. Asynchronous I/O

9 Deployment of Active Buffering
Panda Parallel I/O Library
  –University of Illinois
  –Client-server architecture
ROMIO Parallel I/O Library
  –Argonne National Lab
  –Popular MPI-IO implementation, included in MPICH
  –Server-less architecture
  –ABT (Active Buffering with Threads)

10 Sample Execution with ABT
[Figure: timeline showing computation phases 1 to 4, data reorganization and buffering, and I/O phases 1 to 3]
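The benefit suggested by this timeline can be estimated with a simplified pipeline model. The sketch below is an assumption-laden illustration, not a result from the talk: the `Phase` struct and both functions are hypothetical, buffer space is assumed to be unlimited, and the background writer is assumed to process one output at a time in order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One simulation step: time spent computing, time to reorganize and copy
// the output into buffers, and time to actually write it to storage.
struct Phase {
    double compute;
    double buffer;
    double io;
};

// Blocking I/O: every step pays its compute time plus the full write time.
double blocking_time(const std::vector<Phase>& steps) {
    double t = 0.0;
    for (const Phase& p : steps) {
        assert(p.compute >= 0 && p.buffer >= 0 && p.io >= 0);
        t += p.compute + p.io;
    }
    return t;
}

// Overlapped I/O: the main thread pays compute + buffer per step, while a
// background writer starts each write once the data is buffered and the
// previous write has finished.
double overlapped_time(const std::vector<Phase>& steps) {
    double main_t = 0.0;  // clock of the computing thread
    double io_t = 0.0;    // time at which the background writer becomes idle
    for (const Phase& p : steps) {
        assert(p.compute >= 0 && p.buffer >= 0 && p.io >= 0);
        main_t += p.compute + p.buffer;
        io_t = std::max(io_t, main_t) + p.io;
    }
    return std::max(main_t, io_t);
}
```

Under this model, when each write is shorter than the following computation phase, only the buffering cost and the final write remain visible to the application.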

11 I/O in Visualization
Periodic reads
Dual modes of operation
  –Interactive
  –Batch-mode
Harder to overlap I/O with computation
[Timeline: Computation … I/O, Computation, I/O, Computation, I/O, Computation]

12 Lightweight Data Management
Processing a large number of datasets
  –Scientific data are structured
  –Conventional DBMSs are rarely used in parallel scientific codes
GODIVA framework [ICDE ’04]
  –General Object Data Interface for Visualization Applications
  –In-memory database managing data buffer locations
  –Relational database-like interfaces
  –Developer-controllable prefetching and caching
  –Developer-supplied read functions

13 GODIVA Architecture

14 Sample Record Instance Sample query –Where is the temperature array holding block_0003 at time-step in a fluid record?
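A query of this kind amounts to a keyed lookup that returns the location of an in-memory buffer. The C++ sketch below shows one way such an index could be organized; it is not GODIVA's actual interface. The names `RecordIndex`, `BufferRef`, `insert`, and `find` are hypothetical, and the key layout (record type, block id, time step) is an assumption based on the sample query.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Location of a field's data in memory. The index does not own the data,
// so the underlying buffer must outlive any BufferRef that points to it.
struct BufferRef {
    const double* data;
    std::size_t count;
};

// Hypothetical in-memory index mapping record keys to field buffers.
class RecordIndex {
    using Key = std::tuple<std::string, std::string, int>;

public:
    void insert(const std::string& record_type, const std::string& block_id,
                int time_step, const std::string& field,
                const std::vector<double>& buffer) {
        assert(!buffer.empty());
        records_[Key{record_type, block_id, time_step}][field] =
            BufferRef{buffer.data(), buffer.size()};
    }

    // Returns {nullptr, 0} when the record or the field is not present.
    BufferRef find(const std::string& record_type, const std::string& block_id,
                   int time_step, const std::string& field) const {
        auto rec = records_.find(Key{record_type, block_id, time_step});
        if (rec == records_.end()) return BufferRef{nullptr, 0};
        auto fld = rec->second.find(field);
        if (fld == rec->second.end()) return BufferRef{nullptr, 0};
        return fld->second;
    }

private:
    std::map<Key, std::map<std::string, BufferRef>> records_;
};
```

With this layout, asking where the temperature array of a given block and time step lives is a call such as `find("fluid", "block_0003", step, "temperature")`, where `step` is whatever time step the application is processing.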

15 Prefetching and Caching
process unit
  –readUnit
  –addUnit and waitUnit
  –finishUnit and deleteUnit

    // add all units.
    addUnit("fluid_file1", read_file);
    addUnit("fluid_file2", read_file);

    // process array records in fluid_file1
    waitUnit("fluid_file1");
    do_visualization_computation("fluid_file1");
    deleteUnit("fluid_file1");

    // process array records in fluid_file2
    waitUnit("fluid_file2");
    do_visualization_computation("fluid_file2");
    deleteUnit("fluid_file2");

16 Voyager on a Single-processor Workstation

17 Voyager on a Dual-processor Cluster node

18 Future work: I/O Performance Prediction
Objective: to predict the I/O time for high-performance applications
Challenge: lack of information in the Grid environment
  –Knowledge of applications or systems not available
  –Hard to simulate real applications in real environments
  –Hard to predict scalability
  –How do we parameterize an application?

19 Future work: Sci. Data Management
Objective: to manage data in scientific applications effectively and efficiently
Challenge: two research worlds not well connected
  –Conventional databases not suitable for HPC
  –Scientific databases designed for specific applications
  –General approach?
Need to handle storage and I/O for different types of datasets and their distribution

20 Summary
Wide area of potential research
  –Parallel computing
  –Databases
  –Operating systems/storage systems
Many open problems and new challenges

21 References
[ICDE ’04] Xiaosong Ma, Marianne Winslett, John Norris, Xiangmin Jiao and Robert Fiedler, GODIVA: Lightweight Data Management for Scientific Visualization, the 20th International Conference on Data Engineering, 2004
[PhD Thesis] Xiaosong Ma, Hiding Periodic I/O Costs for Parallel Applications, PhD thesis, University of Illinois, 2003
[IPDPS ’03] Xiaosong Ma, Marianne Winslett, Jonghyun Lee and Shengke Yu, Improving MPI-IO Output Performance with Active Buffering Plus Threads, 2003 International Parallel and Distributed Processing Symposium
[PDSECA ’03] Xiaosong Ma, Xiangmin Jiao, Michael Campbell and Marianne Winslett, Flexible and Efficient Parallel I/O for Large-Scale Multi-component Simulations, the 4th Workshop on Parallel and Distributed Scientific and Engineering Computing with Applications
[ICS ’02] Jonghyun Lee, Xiaosong Ma, Marianne Winslett and Shengke Yu, Active Buffering Plus Compressed Migration: An Integrated Solution to Parallel Simulations' Data Transport Needs, the 16th ACM International Conference on Supercomputing
[IPDPS ’02] Xiaosong Ma, Marianne Winslett, Jonghyun Lee and Shengke Yu, Faster Collective Output through Active Buffering, 2002 International Parallel and Distributed Processing Symposium