N Tropy: A Framework for Knowledge Discovery in a Virtual Universe Harnessing the Power of Parallel Grid Resources for Astronomical Data Analysis Jeffrey P. Gardner, Andy Connolly Pittsburgh Supercomputing Center University of Pittsburgh Carnegie Mellon University

Mining the Universe can be Computationally Expensive
Paradigm shift in astronomy: sky surveys. Astronomy now generates ~1 TB of data per night, and with Virtual Observatories one can pool data from multiple catalogs. Computational requirements are becoming far more extreme relative to the current state of the art: many problems will be impossible without parallel machines, and many more will see substantially higher throughput on them.

Tightly-Coupled Parallelism (what this talk is about)
Data and computational domains overlap, so computational elements must communicate with one another. Examples: N-point correlation functions, new object classification, density estimation, and intersections in parameter space. Solution(?): N Tropy.

N-Point Correlation Function
Example: the N-point correlation function. The 2-, 3-, and 4-point correlation functions measure a variety of cosmological parameters and effects, such as the baryon fraction, biased galaxy formation, weak lensing, and non-Gaussian early-Universe perturbations. For the Sloan Digital Sky Survey (currently one of the largest sky catalogs):
- 2-pt, O(N^2): CPU hours
- 3-pt, O(N^3): CPU weeks
- 4-pt, O(N^4): 100 CPU years!
For future catalogs, this will only get worse.
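To make the scaling concrete, here is a minimal brute-force 2-point pair-counting sketch (illustrative Python, not survey code): binning all pairwise separations costs O(N^2), which is why the estimates above explode for the 3- and 4-point statistics.

```python
# Brute-force 2-point correlation pair counts: every pair is examined,
# so the cost grows as O(N^2). Real codes use tree-based pruning instead.
import itertools, math

def pair_counts(points, bin_edges):
    """Count pairs whose separation falls in each [edge_i, edge_i+1) bin."""
    counts = [0] * (len(bin_edges) - 1)
    for p, q in itertools.combinations(points, 2):
        r = math.dist(p, q)
        for i in range(len(counts)):
            if bin_edges[i] <= r < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts

points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (3, 0, 0)]
print(pair_counts(points, [0.0, 1.5, 2.5, 4.0]))   # -> [1, 3, 2]
```

With N^2/2 distance evaluations per call, a 10^8-object catalog is already out of reach for this approach, which motivates the tree methods on the next slides.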

The Challenge of Parallel Data Mining
Parallel programs are hard to write: there is a steep learning curve to parallel programming and development times are long. The parallel world is dominated by simulations, where code is often reused for many years by many people, so you can afford to spend a long time writing it. Data mining does not work this way: scientific inquiries change rapidly and there is less code reuse (even the simulation community rarely does data mining in parallel). The data-mining paradigm mandates rapid software development, yet observational astronomy has almost no parallel programming experience.

The Goal
- Minimize development time for parallel applications.
- Enable scientists with no parallel programming background (or time to learn) to still implement their algorithms in parallel.
- Provide seamless scalability from single-processor machines to Grid-distributed MPPs.
- Do not restrict the inquiry space.

Methodology
Limited data structures: most (all?) efficient data analysis methods use trees, because most analyses perform searches through a multidimensional parameter space. Limited methods: analysis methods perform a limited number of fundamental operations on these data structures.
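As a sketch of why trees fit this model, here is a minimal k-d tree with a box range search (illustrative Python, not N Tropy's API). The search prunes whole subtrees along the split axis, which is the kind of fundamental multidimensional-search operation the slide describes.

```python
# Minimal k-d tree: split alternately on each axis, then answer box queries
# by descending only into subtrees that can intersect the query box.
class KDNode:
    def __init__(self, points, depth=0):
        axis = depth % len(points[0])
        points = sorted(points, key=lambda p: p[axis])
        mid = len(points) // 2
        self.point, self.axis = points[mid], axis
        self.left = KDNode(points[:mid], depth + 1) if points[:mid] else None
        self.right = KDNode(points[mid + 1:], depth + 1) if points[mid + 1:] else None

def range_search(node, lo, hi, out):
    """Collect every point p with lo[d] <= p[d] <= hi[d] in all dimensions."""
    if node is None:
        return out
    if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
        out.append(node.point)
    # Prune subtrees that cannot intersect the query box along the split axis.
    if lo[node.axis] <= node.point[node.axis]:
        range_search(node.left, lo, hi, out)
    if hi[node.axis] >= node.point[node.axis]:
        range_search(node.right, lo, hi, out)
    return out

tree = KDNode([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(sorted(range_search(tree, (3, 1), (8, 5), [])))   # -> [(5, 4), (7, 2), (8, 1)]
```

Because a small set of tree operations (build, traverse, prune) underlies so many analyses, a framework can own them once and let each analysis supply only its per-node logic.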

Vision: A Parallel Framework
[Architecture diagram] Framework components ("black box", C++ or Charm++): domain decomposition, tree traversal, tree services, parallel I/O, result tracking, and workload scheduling. User-supplied pieces: serial compute routines, serial I/O routines, and traversal/decision routines. A computational steering layer (Python? C? Fortran?) sits on top, with a Web Service layer (VO; XML? SOAP? WSDL?) exposed at least from Python.

Proof of Concept: PHASE 1 (complete)
Convert the parallel N-body code PKDGRAV* into a 3-point correlation function calculator, modifying the existing code as little as possible. (*PKDGRAV was developed by Tom Quinn, Joachim Stadel, and others at the University of Washington.)
PKDGRAV (aka GASOLINE) benefits:
- Highly portable: MPI, POSIX threads, SHMEM, Quadrics, and more.
- Highly scalable: 92% linear speedup on 512 processors, accomplished through sophisticated interprocessor data caching (fewer than 1 in 100,000 off-PE requests actually result in communication).
Development time:
- Writing PKDGRAV: ~10 FTE years (could be rewritten in ~2)
- PKDGRAV -> 2-point: 2 FTE weeks
- 2-point -> 3-point: >3 FTE months

PHASE 1 Performance
[Performance plot] Spatial 3-point correlation, 3 to 4 Mpc, on 10 million particles; with perfect load balancing, SDSS DR1 takes less than 1 minute.

N Tropy Proof of Concept: PHASE 2 (currently in progress)
Use only the Parallel Management Layer and the Interprocessor Communication Layer of PKDGRAV; rewrite everything else from scratch.
PKDGRAV functional layout:
- Computational Steering Layer: executes on the master processor.
- Parallel Management Layer: coordinates execution and data distribution among processors.
- Serial Layer (gravity calculator, hydro calculator): executes on all processors.
- Interprocessor Communication Layer: passes data between processors.

N Tropy Proof of Concept: PHASE 2 (currently in progress)
PKDGRAV benefits to keep:
- Flexible client-server scheduling architecture: threads respond to service requests issued by the master, so to support a new task you simply add a new service.
- Portability: interprocessor communication occurs through high-level requests to a "Machine-Dependent Layer" (MDL), which is rewritten to take advantage of each parallel architecture.
- Advanced interprocessor data caching: fewer than 1 in 100,000 off-PE requests actually result in communication.
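The client-server scheduling idea can be sketched as follows (invented names, not PKDGRAV's real API): worker threads dispatch on a service table, so supporting a new task only means registering one more handler.

```python
# Sketch of master/worker service dispatch: the master issues (service, args)
# requests; each worker looks up the handler by name and returns the result.
import queue, threading

class Worker(threading.Thread):
    def __init__(self, requests, results):
        super().__init__()
        self.requests, self.results = requests, results
        self.services = {}                      # service name -> handler

    def register(self, name, handler):
        """Adding a new task = registering one more service."""
        self.services[name] = handler

    def run(self):
        while True:
            name, arg = self.requests.get()
            if name == "shutdown":
                break
            self.results.put(self.services[name](arg))

requests, results = queue.Queue(), queue.Queue()
w = Worker(requests, results)
w.register("density", lambda particles: sum(particles) / len(particles))
w.start()
requests.put(("density", [1.0, 2.0, 3.0]))      # master issues a request
answer = results.get()
print(answer)                                    # -> 2.0
requests.put(("shutdown", None))
w.join()
```

The point of the pattern is extensibility: the traversal, scheduling, and communication machinery never changes when a new analysis ("service") is added.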

N Tropy Design
[Architecture diagram] Layers retained from PKDGRAV: the Parallel Management Layer and the interprocessor communication layer. Layers completely rewritten: general-purpose tree services (tree building and tree traversal), domain decomposition, parallel I/O, result tracking, the Computational Steering Layer, and the Web Service layer. The "user"-supplied layer consists of callbacks such as UserTestCells, UserTestParticles, UserCellAccumulate, UserParticleAccumulate, UserCellSubsume, and UserParticleSubsume. The 2-point and 3-point algorithms are now complete!
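The division of labor can be sketched generically (callback names loosely mirror the slide's UserTestCells / UserCellAccumulate family, but the signatures are invented): the framework owns the tree walk, and the user supplies only test and accumulate routines.

```python
# Framework-owned tree traversal with user-supplied callbacks.
class Cell:
    def __init__(self, lo, hi, particles=None, children=()):
        self.lo, self.hi = lo, hi           # 1-D bounding interval of the cell
        self.particles = particles or []
        self.children = children

def traverse(cell, test_cell, accumulate, state):
    """The framework walks the tree; the user never touches it directly."""
    if not test_cell(cell, state):          # user decides whether to open the cell
        return state
    for p in cell.particles:
        accumulate(p, state)                # user accumulates per-particle results
    for child in cell.children:
        traverse(child, test_cell, accumulate, state)
    return state

# User routines for a toy analysis: count particles with x < 5.
def user_test_cell(cell, state):
    return cell.lo < 5                      # prune cells entirely right of x = 5

def user_accumulate(p, state):
    if p < 5:
        state["count"] += 1

leaf1 = Cell(0, 4, particles=[1, 2, 3])
leaf2 = Cell(4, 9, particles=[4, 6, 8])
root = Cell(0, 9, children=(leaf1, leaf2))
result = traverse(root, user_test_cell, user_accumulate, {"count": 0})
print(result["count"])   # -> 4
```

Swapping in different test/accumulate pairs turns the same walk into a pair counter, a density estimator, or a classifier, without the user writing any parallel code.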

N Tropy "Meaningful" Benchmarks
The purpose of this framework is to minimize development time! Rewriting the user and scheduling layers to do an N-body gravity calculation: 3 hours.

N Tropy New Features (PHASE 3: Coming Soon)
- Dynamic load balancing: workload and processor domain boundaries can be dynamically reallocated as the computation progresses.
- Data pre-fetching: predict and request the off-PE data that will be needed for upcoming tree nodes. Work with the CMU Auton Lab to investigate active-learning algorithms for prefetching off-PE data.
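The pre-fetching idea can be sketched as a speculative cache (hypothetical code, not the planned N Tropy implementation): on a remote fetch, also pull the node's children, betting that a depth-first walk will visit them next. Prefetching does not reduce the number of remote fetches; it hides their latency by issuing them early.

```python
# Speculative cache for remote tree nodes: fetching a node also prefetches
# its children, so a subsequent depth-first descent hits the local cache.
class PrefetchingCache:
    def __init__(self, remote_store, children_of):
        self.remote = remote_store         # stands in for off-PE node storage
        self.children_of = children_of     # node id -> child ids (the "prediction")
        self.cache = {}
        self.remote_fetches = 0

    def get(self, node_id):
        if node_id not in self.cache:
            self._fetch(node_id)
            for child in self.children_of.get(node_id, []):
                if child not in self.cache:
                    self._fetch(child)     # speculative prefetch
        return self.cache[node_id]

    def _fetch(self, node_id):
        self.remote_fetches += 1
        self.cache[node_id] = self.remote[node_id]

store = {"root": 0, "a": 1, "b": 2}
cache = PrefetchingCache(store, {"root": ["a", "b"]})
cache.get("root")                          # fetches root, prefetches a and b
fetches_before = cache.remote_fetches
cache.get("a")                             # served locally: no new remote fetch
print(cache.remote_fetches == fetches_before)   # -> True
```

An active-learning predictor, as proposed above, would replace the static `children_of` table with a learned model of which nodes a traversal is likely to touch next.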

N Tropy New Features (PHASE 4)
Computing across grid nodes is much more difficult than computing between nodes of a tightly-coupled parallel machine: network latencies between grid resources are roughly 1000 times higher than between nodes of a single parallel machine. Nodes on a far grid resource must therefore be treated differently from the processor next door, requiring data mirroring or aggressive prefetching, along with sophisticated workload management and synchronization.