Published by Ferdinand Evans. Modified over 8 years ago.
1
Steve Gallo Center for Computational Research SUNY Buffalo Technology Audit for TG:XD March 10, 2010
2
Outline
  Center for Computational Research
  Technology audit at CCR
  Our vision for Technology Audit
  Proposed Technology Audit
  The Team
  User Surveys – Finholt
  Application Kernels
  TGMoD Portal
3
Center for Computational Research
  More than 10 years of experience delivering HPC in an academic setting
  Mission: enabling and facilitating research within the university
  Staff: 15 FTE
  Provides cycles, software engineering, scientific computing/modeling, medical informatics, visualization
  Computational cycles delivered in 2009:
    360,000 jobs run (about 1,000 per day)
    720,000 CPU days delivered
  $10M infrastructure upgrades in 2010 (6,000 cores, 800 TB storage):
    NSF MRI – Netezza Data Intensive HPC Cluster
    NSF CDI – GPU Cluster
    NYSERDA Green IT Cluster
    NIH S10 – HPC Cluster
  Portal/tool development:
    WebMO (Chemistry)
    iNquiry (Bioinformatics)
    UBMoD (Metrics on Demand)
  NYSTAR HPC 2 Consortium
    UB, RPI, Stony Brook, Brookhaven, NYSERNet
    Bringing HPC to NYS industry
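The 2009 delivery figures can be cross-checked with a little arithmetic. The sketch below uses only the two totals quoted on the slide; the function names are illustrative, and the derived values (average busy CPUs, CPU days per job) are simple ratios rather than figures reported by CCR.

```python
# Totals quoted on the slide for calendar year 2009.
JOBS_RUN = 360_000
CPU_DAYS_DELIVERED = 720_000
DAYS_IN_YEAR = 365


def jobs_per_day(jobs: int, days: int = DAYS_IN_YEAR) -> float:
    """Average number of jobs completed per calendar day."""
    return jobs / days


def average_busy_cpus(cpu_days: int, days: int = DAYS_IN_YEAR) -> float:
    """CPU days delivered per calendar day, i.e. CPUs busy on average."""
    return cpu_days / days


def cpu_days_per_job(cpu_days: int, jobs: int) -> float:
    """Mean CPU days consumed by one job."""
    return cpu_days / jobs


if __name__ == "__main__":
    print(f"jobs per day:      {jobs_per_day(JOBS_RUN):.0f}")
    print(f"average busy CPUs: {average_busy_cpus(CPU_DAYS_DELIVERED):.0f}")
    print(f"CPU days per job:  {cpu_days_per_job(CPU_DAYS_DELIVERED, JOBS_RUN):.1f}")
```

The first ratio is consistent with the "1000 per day" rounding on the slide.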
4
UB Metrics on Demand has been good for CCR UBMoD: Web-based Interface for On-demand Metrics CPU cycles delivered, Storage, Queue Statistics, etc Customized interface (Provost, Dean, Chairs, Faculty)
5
Technology Audit – Our Vision
  TA will help TG:XD and resource providers maximize and measure the impact of the infrastructure investment on science and engineering research
  TA will help TG/RPs provide certifiably good service by providing independent, quantitative metrics and analysis of related policies
  TA will help TG/RPs identify issues early
  TA will help ensure that users’ needs are met, improving the effectiveness of minor as well as major users
  Suite of lightweight application kernels to facilitate low-level monitoring and performance analysis
    Users/centers can contribute/deposit their own kernels
    Provides a means for on-demand testing as well as setting performance expectations for end users
  Role-based system allows RP support personnel to view and customize their interface to prioritize local/individual information
  Broader impact
    Auditing framework will be disseminated to non-TG:XD entities such as university-based HPC centers
6
TA Should Benefit Everyone Organized access to data metrics, from system administrators to program managers Role-based to assure privacy as well as priority Comprehensive view of deliverables At the end of the day, it is about how much science/engineering research is getting done (a good story that benefits from quantitative measures) Inform decisions about future platforms and goals
7
TG:XD Technology Audit
  TG User Needs and Usability Analysis
    A successful TA effort depends on the quality of the tools and services developed and their fit to users’ needs
    Emphasize continuous improvement in deployed technologies
  Application Kernel Framework
    On-demand and passive monitoring of TG:XD infrastructure using a lightweight application kernel framework
    Diagnostic set of tools
    Application Kernel Toolkit
    Users can contribute/deposit their own kernels
  TG Metrics on Demand (TGMoD) GUI
    Displays results of all metrics
    Role-based display: User, PI, Sys Admin, Center Director, NSF Program Officer, etc.
8
TA Architecture
9
Technology Audit - The Team
  UB – Application Kernels and TGMoD Interface
    Dr. Thomas Furlani (PI): Director, CCR (computational chemist)
    Dr. Matt Jones (coPI): Associate Director, CCR (computational physicist)
    Mr. Steve Gallo: Lead Software Engineer, CCR
    Mr. Andrew Bruno: Software Engineer, CCR
    Mr. Jonathan Bednasz: Senior Programmer/Analyst, CCR
    Dr. Vipin Chaudhary (coPI): Computer Science Architect/Professor
    Computational Scientist (TBD), Scientific Programmer (TBD), Technical Project Manager (TBD)
  University of Michigan – User Surveys
    Dr. Thomas Finholt (SP): Professor and Associate Dean, School of Information
    Post Doctoral Associate (TBD)
  Indiana – Middleware
    Dr. Gregor von Laszewski (coPI): Indiana’s FutureGrid Project
    Scientific Programmer (TBD)
  Tech-X – Middleware
    Dr. Mark Green
10
Organizational Chart
11
Application Kernels
  Every RP likely has some already
    Typically based on local experience and application expertise
  Computationally lightweight
    Run continuously (and on demand) on TG:XD to actively measure performance
    Custom built and derived from open source codes such as GAMESS, NWChem, NAMD, OpenFOAM, etc.
  Measure system performance
    Local scratch, global filesystem performance, local processor-memory bandwidth, allocatable shared memory, processing speed, network latency and bandwidth
  Application kernel toolbox
    Kernels available to community
    Provides framework for creating new application kernels
    Software developers can deposit additional kernels
    Potentially leverage existing INCA tools to facilitate kernel reporting
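As a rough illustration of what a computationally lightweight kernel for one of the listed measurements (local scratch performance) might look like, here is a minimal Python sketch. It is not taken from the TA project: the function name, the result field names, and the default sizes are all assumptions, and a production kernel would more likely be derived from a real application code as the slide describes.

```python
import os
import tempfile
import time


def scratch_write_kernel(directory: str, size_mb: int = 16, block_kb: int = 1024) -> dict:
    """Time a sequential write of about size_mb MiB to a temporary file in `directory`.

    Hypothetical kernel: field names in the returned dict are illustrative only.
    """
    block = b"\0" * (block_kb * 1024)
    n_blocks = (size_mb * 1024) // block_kb
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        start = time.perf_counter()
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force data to the device before stopping the clock
        elapsed = time.perf_counter() - start
    # The temporary file is deleted automatically when the with-block exits.
    elapsed = max(elapsed, 1e-9)  # guard against a zero reading on coarse clocks
    bytes_written = n_blocks * len(block)
    return {
        "kernel": "scratch_write",
        "bytes_written": bytes_written,
        "elapsed_s": elapsed,
        "mb_per_s": bytes_written / (1024 * 1024) / elapsed,
    }


if __name__ == "__main__":
    print(scratch_write_kernel(tempfile.gettempdir()))
```

Run periodically and on demand, a small kernel like this yields a time series that can flag a degrading filesystem without consuming significant cycles.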
12
TG Metrics on Demand Portal
  Based on open source UBMoD tool
  Display metrics
    System performance (application kernel performance)
    Help desk tickets, science gateway utilization, publications, etc.
  Opportunity for data mining/analysis of statistics
    RESTful API for data mining research or custom external interfaces
    For example, Google gadget
  Custom report builder
  Multiple file export capability: Excel, PDF, XML, RSS, etc.
  Role based: view tailored to role of user and customizable
    End user and PI: improve throughput and facilitate utilization
    System administrators: monitor system performance, identify problems before they impact users
    Center Directors, NSF Program Officer
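The role-based view can be pictured as a filter applied to the full metric set before display or export. The sketch below is an assumption about how such a filter could work, not the TGMoD or UBMoD implementation; the role names follow the slide, while the metric names, the `ROLE_METRICS` table, and the sample values are hypothetical placeholders.

```python
import json

# Hypothetical mapping from role to the metric names that role may see.
ROLE_METRICS = {
    "user": {"jobs_run", "avg_wait_hours"},
    "pi": {"jobs_run", "avg_wait_hours", "cpu_hours"},
    "sys_admin": {"jobs_run", "avg_wait_hours", "cpu_hours", "kernel_failures"},
    "center_director": {"jobs_run", "cpu_hours", "kernel_failures"},
}


def view_for_role(role: str, metrics: dict) -> dict:
    """Return only the metrics that the given role is allowed to see."""
    allowed = ROLE_METRICS.get(role)
    if allowed is None:
        raise ValueError(f"unknown role: {role}")
    return {name: value for name, value in metrics.items() if name in allowed}


def export_json(view: dict) -> str:
    """Serialize a role-specific view, e.g. as the body of a REST response."""
    return json.dumps(view, sort_keys=True)


if __name__ == "__main__":
    # Placeholder values for illustration only, not real measurements.
    sample = {"jobs_run": 10, "avg_wait_hours": 1.5, "cpu_hours": 200.0, "kernel_failures": 2}
    for role in ROLE_METRICS:
        print(role, export_json(view_for_role(role, sample)))
```

Keeping the filter separate from the serializer means the same role check can sit in front of each export format the slide lists.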
13
TGMoD Interface Prototype
14
Application Benchmarks
  WRF: Mesoscale atmospheric modeling code
  OOCORE: Out-of-core solver
  GAMESS: Quantum chemistry code
  MILC: Particle physics lattice QCD code
  PARATEC: Parallel Total Energy Code (electronic structure)
  HOMME: Global atmospheric model
  NWChem: Quantum chemistry code
  NAMD: Molecular dynamics
  OpenFOAM: CFD
  PETSc: PDE solver
15
Technology Audit – Our Vision The Technology Audit (TA) will provide both quality assurance and quality control for TG (XD) to help maximize the positive impact of the infrastructure investment on science and engineering research. Coupled with a strong user needs and usability analysis program, TA will help ensure that users’ needs are met, with particular emphasis on improving the effectiveness of major users, bringing novice users up to speed rapidly, and attracting non-traditional users. TA will also provide necessary inputs to the other stakeholders interested in the optimal usage of TG (XD) resources, namely NSF, resource providers (RP), and the TG user community represented by the Science Advisory Board. TA will strive to deliver quantitative and qualitative performance metrics rapidly to all of these stakeholders.
16
BioACE (bioinformatics) Sun V880 (3), Sun 6800 Sun 280R (2) Intel P4 Servers Sun 3960: 7 TB Disk Storage EMC SAN 25 TB Disk; 190 TB Tape 32 GB RAM; 400 GB Disk Faculty Clusters Physics Chemistry Environmental Engineering Nuclear Medicine School of Management CCR Compute Resources (13 Tflops) Linux Cluster (10TF) 1600 P4 Processors (3.2 GHz) Myrinet Interconnect 2000 GB RAM 30 TB SAN; 60 TB Local Disk Linux Cluster (3.0TF) 512 P4 Processors (3.0 GHz) GigE Interconnect Itanium Altix3700 (0.4TF) Shared Memory 64 Processors (1.3GHz ITF2) 256 GB RAM, 2.5 TB Disk
17
Benchmarks
  HPCC (HPC Challenge Benchmarks):
    HPL: Linpack TPP benchmark, measures floating-point rate
    STREAM: sustainable memory bandwidth (GB/s)
    PTRANS (parallel matrix transpose): communication capacity of interprocessor network
    RandomAccess: rate of random updates of memory
    FFTE: floating-point rate of 1D (double-precision) DFT
    b_eff: communication bandwidth and latency
  NAS Parallel Benchmarks (NPB)
    Serial, MPI, and hybrid MPI-OpenMP
    Derived from CFD: 5 kernels + 3 pseudo-applications, many potential problem sizes
    Fortran, Java, and grid (GridNPB) versions available
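Two of the rates listed above are simple functions of problem size and wall-clock time. The sketch below assumes the operation counts conventionally used when reporting these benchmarks: 2/3·N³ + 3/2·N² floating-point operations for an HPL solve of order N, and 24 bytes moved per element for the STREAM triad on 8-byte doubles (two reads, one write). The function names are illustrative, and the conventions should be checked against the benchmark documentation before results are compared.

```python
def hpl_flop_count(n: int) -> float:
    """Floating-point operations credited to an HPL solve of order n (assumed convention)."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_gflops(n: int, seconds: float) -> float:
    """HPL floating-point rate in GFLOP/s for a solve of order n taking `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return hpl_flop_count(n) / seconds / 1e9


def stream_triad_gb_per_s(n_elements: int, seconds: float) -> float:
    """STREAM triad bandwidth in GB/s for a[i] = b[i] + s * c[i] on 8-byte doubles."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    bytes_moved = 3 * 8 * n_elements  # read b, read c, write a
    return bytes_moved / seconds / 1e9


if __name__ == "__main__":
    print(f"HPL, N=1000 in 1 s: {hpl_gflops(1000, 1.0):.3f} GFLOP/s")
    print(f"Triad, 1e6 elements in 10 ms: {stream_triad_gb_per_s(1_000_000, 0.01):.2f} GB/s")
```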
18
Technology Audit Details Application Kernels Every RP likely has some already Typically based on local experience and application expertise Computationally lightweight Run continuously (and on demand) on TG:XD to actively measure performance Custom built and derived from open source codes such as GAMESS, NWChem, NAMD, OpenFOAM, etc. Measure system performance Local scratch, global filesystem performance, local processor-memory bandwidth, allocatable shared memory, processing speed, network latency and bandwidth Application kernel toolbox Kernels available to community Provides framework for creating new application kernels Software developers can deposit additional kernels TG Metrics on Demand (TGMoD) Portal Based on open source UBMoD tool Portal usage, help desk tickets, publications, etc. Role based: View tailored to role of user Sys Admin, PI, Center Director, NSF Program Officer Opportunity for data mining/analysis of statistics RESTful API for data mining research or custom built external interfaces For example, Google gadget RSS feeds Focus on managing/improving access to a very large amount of information, customized to individual needs
19
Technology Audit Details TG Metrics on Demand (TGMoD) Portal Based on open source UBMoD tool Portal displays usage, metrics, help desk tickets, publications, etc. Role based: View tailored to role of user Sys Admin, PI, Center Director, NSF Program Officer Customizable reporting facilities Opportunity for data mining/analysis of statistics RESTful API for data mining research or custom built external interfaces For example, Google gadget Role-based RSS feeds Focus on managing/facilitating access to a very large amount of information, customized to individual needs
20
Tom Furlani, PhD Director, Center for Computational Research SUNY Buffalo Technology Audit for TG:XD March 3, 2010
21
Application Kernel Toolbox
  Provides scripts and descriptive framework for creating new application kernels in varied application areas
  Scripts likely Perl/XML based for automating recording and reporting of output results, designed to be easily injected into TGMoD database
  Self-describing output fields for each performance metric, potential early termination conditions
  Cross-platform generic run tools/scripts to simplify resource requests, minimize number of dependencies on specific platform resources
  TGMoD interface allows customized scheduling requests for specific kernel groups
  Potentially leverage existing INCA tools to facilitate kernel reporting
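To make "self-describing output fields" concrete, the sketch below emits a small XML report in which every metric carries its own name and unit, plus a flag recording early termination. The slide anticipates Perl/XML scripts; Python is used here only for illustration, and the element names, attribute names, and the time-limit rule are assumptions rather than a TGMoD schema.

```python
import xml.etree.ElementTree as ET


def should_terminate_early(elapsed_s: float, limit_s: float) -> bool:
    """Hypothetical early-termination rule: stop once the time limit is exceeded."""
    return elapsed_s > limit_s


def build_kernel_report(kernel: str, resource: str, metrics: dict,
                        terminated_early: bool = False) -> str:
    """Build an XML report; `metrics` maps a metric name to a (value, unit) pair."""
    root = ET.Element("kernel_report", {"kernel": kernel, "resource": resource})
    ET.SubElement(root, "terminated_early").text = "true" if terminated_early else "false"
    for name, (value, unit) in metrics.items():
        # Each metric element describes itself through its name and unit attributes.
        metric = ET.SubElement(root, "metric", {"name": name, "unit": unit})
        metric.text = repr(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values and resource name for illustration only.
    report = build_kernel_report(
        "scratch_write",
        "example-cluster",
        {"mb_per_s": (123.4, "MB/s"), "elapsed_s": (0.13, "s")},
        terminated_early=should_terminate_early(0.13, 60.0),
    )
    print(report)
```

Because each field names its own unit, a loader on the database side can ingest reports from newly deposited kernels without per-kernel parsing code.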