
Technology Audit for TG:XD. Steve Gallo, Center for Computational Research, SUNY Buffalo. March 10, 2010.




1 Steve Gallo, Center for Computational Research, SUNY Buffalo. Technology Audit for TG:XD, March 10, 2010

2 Outline
 Center for Computational Research
 Technology audit at CCR
 Our vision for Technology Audit
 Proposed Technology Audit
 The Team
 User Surveys – Finholt
 Application Kernels
 TGMoD Portal

3 Center for Computational Research
 More than 10 years of experience delivering HPC in an academic setting
 Mission: enabling and facilitating research within the university
 Staff: 15 FTE
 Provide
   Cycles, software engineering, scientific computing/modeling, medical informatics, visualization
 Computational cycles delivered in 2009:
   360,000 jobs run (1,000 per day)
   720,000 CPU days delivered
 $10M infrastructure upgrades in 2010 (6,000 cores, 800 TB storage):
   NSF MRI – Netezza data-intensive HPC cluster
   NSF CDI – GPU cluster
   NYSERDA Green IT cluster
   NIH S10 – HPC cluster
 Portal/tool development
   WebMO (chemistry)
   iNquiry (bioinformatics)
   UBMoD (Metrics on Demand)
 NYSTAR HPC2 Consortium
   UB, RPI, Stony Brook, Brookhaven, NYSERNet
   Bringing HPC to NYS industry

4 UB Metrics on Demand has been good for CCR
 UBMoD: web-based interface for on-demand metrics
   CPU cycles delivered, storage, queue statistics, etc.
 Customized interface (Provost, Dean, Chairs, Faculty)

5 Technology Audit – Our Vision
 TA will help TG:XD and resource providers maximize and measure the impact of the infrastructure investment on science and engineering research
 TA will help TG/RPs provide certifiably good service by providing independent, quantitative metrics and analysis of related policies
 TA will help TG/RPs identify issues early
 TA will help ensure that users' needs are met, improving the effectiveness of minor as well as major users
 Suite of lightweight application kernels to facilitate low-level monitoring and performance analysis
   Users/centers can contribute/deposit their own kernels
   Provides a means for on-demand testing as well as setting performance expectations for end users
 Role-based system allows RP support personnel to view and customize their interface to prioritize local/individual information
 Broader impact
   The auditing framework will be disseminated to non-TG:XD entities such as university-based HPC centers

6 TA Should Benefit Everyone
 Organized access to data metrics, from system administrators to program managers
 Role-based to assure privacy as well as priority
 Comprehensive view of deliverables
 At the end of the day, it is about how much science/engineering research is getting done (a good story that benefits from quantitative measures)
 Inform decisions about future platforms and goals

7 TG:XD Technology Audit
 TG user needs and usability analysis
   A successful TA effort depends on the quality of the tools and services developed and their fit to users' needs
   Emphasize continuous improvement in deployed technologies
 Application Kernel Framework
   On-demand and passive monitoring of TG:XD infrastructure using a lightweight application kernel framework
   Diagnostic set of tools
   Application Kernel Toolkit: users can contribute/deposit their own kernels
 TG Metrics on Demand (TGMoD) GUI interface
   Displays results of all metrics
   Role-based display: user, PI, sys admin, center director, NSF program officer, etc.

8 TA Architecture

9 Technology Audit – The Team
 UB – Application Kernels and TGMoD Interface
   Dr. Thomas Furlani (PI): Director, CCR (computational chemist)
   Dr. Matt Jones (co-PI): Associate Director, CCR (computational physicist)
   Mr. Steve Gallo: Lead Software Engineer, CCR
   Mr. Andrew Bruno: Software Engineer, CCR
   Mr. Jonathan Bednasz: Senior Programmer/Analyst, CCR
   Dr. Vipin Chaudhary (co-PI): Computer Science Architect/Professor
   Computational Scientist (TBD), Scientific Programmer (TBD), Technical Project Manager (TBD)
 University of Michigan – User Surveys
   Dr. Thomas Finholt (SP): Professor and Associate Dean, School of Information
   Postdoctoral Associate (TBD)
 Indiana – Middleware
   Dr. Gregor von Laszewski (co-PI): Indiana's FutureGrid project
   Scientific Programmer (TBD)
 Tech-X – Middleware
   Dr. Mark Green

10 Organizational Chart

11 Application Kernels
 Every RP likely has some already
   Typically based on local experience and application expertise
 Computationally lightweight
   Run continuously (and on demand) on TG:XD to actively measure performance
   Custom built and derived from open source codes such as GAMESS, NWChem, NAMD, OpenFOAM, etc.
 Measure system performance
   Local scratch and global filesystem performance, local processor-memory bandwidth, allocatable shared memory, processing speed, network latency and bandwidth
 Application kernel toolbox
   Kernels available to the community
   Provides a framework for creating new application kernels
   Software developers can deposit additional kernels
 Potentially leverage existing INCA tools to facilitate kernel reporting
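The wrapper around such a kernel can stay very small: launch the kernel, time it, and emit a structured record for the metrics database. The sketch below is illustrative only; the kernel name, command, and record layout are assumptions, since the slides do not specify the actual interface.

```python
import json
import subprocess
import sys
import time

def run_kernel(name, command):
    """Run one lightweight application kernel and record its wall time.

    The record fields (kernel, exit_code, metric, timestamp) are a
    hypothetical layout for injection into a metrics store.
    """
    start = time.time()
    result = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.time() - start
    return {
        "kernel": name,
        "exit_code": result.returncode,
        "metric": {"value": round(elapsed, 3), "units": "seconds"},
        "timestamp": int(start),
    }

# Stand-in for a real kernel binary such as a reduced NAMD or GAMESS input.
record = run_kernel("demo-kernel", [sys.executable, "-c", "pass"])
print(json.dumps(record))
```

Because the kernel is computationally lightweight, a scheduler can run this wrapper continuously without displacing user jobs, which is what makes passive monitoring feasible.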

12 TG Metrics on Demand Portal
 Based on the open source UBMoD tool
 Display metrics
   System performance (application kernel performance)
   Help desk tickets, science gateway utilization, publications, etc.
 Opportunity for data mining/analysis of statistics
   RESTful API for data mining research or custom external interfaces (for example, a Google gadget)
 Custom report builder
   Multiple file export formats: Excel, PDF, XML, RSS, etc.
 Role based: view tailored to the role of the user and customizable
   End user and PI: improve throughput and facilitate utilization
   System administrators: monitor system performance and identify problems before they impact users
   Center directors, NSF program officers
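A RESTful API makes these metrics consumable by external tools without going through the portal UI. The endpoint, parameter names, and response shape below are invented for illustration (the slides name a RESTful API but not its routes); the example builds a query URL and parses a sample JSON payload offline.

```python
import json
from urllib.parse import urlencode

# Hypothetical TGMoD endpoint; not a real service.
BASE_URL = "https://tgmod.example.org/api/metrics"

def build_query(resource, metric, start, end, fmt="json"):
    """Construct a metrics query URL for a given resource and date range."""
    params = {"resource": resource, "metric": metric,
              "start": start, "end": end, "format": fmt}
    return f"{BASE_URL}?{urlencode(params)}"

url = build_query("ranger", "cpu_hours", "2010-01-01", "2010-03-01")

# A response shaped like this could feed a custom dashboard or Google gadget:
sample_response = json.loads(
    '{"resource": "ranger", "metric": "cpu_hours",'
    ' "values": [[1262304000, 41250.5]]}'
)
first_timestamp, first_value = sample_response["values"][0]
```

Serving the same data in JSON, XML, or RSS from one API is what lets the portal support both the custom report builder and third-party gadgets.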

13 TGMoD Interface Prototype

14 Application Benchmarks
 WRF: mesoscale atmospheric modeling code
 OOCORE: out-of-core solver
 GAMESS: quantum chemistry code
 MILC: particle physics lattice QCD code
 PARATEC: Parallel Total Energy Code (electronic structure)
 HOMME: global atmospheric model
 NWChem: quantum chemistry code
 NAMD: molecular dynamics
 OpenFOAM: CFD
 PETSc: PDE solver toolkit

15 Technology Audit – Our Vision
 The Technology Audit (TA) will provide quality assurance and quality control for TG (XD) to help maximize the positive impact of the infrastructure investment on science and engineering research. Coupled with a strong user needs and usability analysis program, TA will be leveraged to help ensure that users' needs are met, with particular emphasis on improving the effectiveness of major users, bringing novice users up to speed rapidly, and attracting non-traditional users.
 TA will also provide necessary inputs to other stakeholders interested in the optimal usage of TG (XD) resources, namely NSF, resource providers (RPs), and the TG user community represented by the Science Advisory Board.
 TA will also strive to provide quantitative and qualitative metrics of performance rapidly to all of these stakeholders.

16 CCR Compute Resources (13 Tflops)
 Linux cluster (10 TF)
   1,600 P4 processors (3.2 GHz)
   Myrinet interconnect
   2,000 GB RAM
   30 TB SAN; 60 TB local disk
 Linux cluster (3.0 TF)
   512 P4 processors (3.0 GHz)
   GigE interconnect
 Itanium Altix 3700 (0.4 TF)
   Shared memory
   64 processors (1.3 GHz Itanium 2)
   256 GB RAM; 2.5 TB disk
 BioACE (bioinformatics)
   Sun V880 (3), Sun 6800
   Sun 280R (2)
   Intel P4 servers
   Sun 3960: 7 TB disk storage
 EMC SAN
   25 TB disk; 190 TB tape
 32 GB RAM; 400 GB disk
 Faculty clusters
   Physics
   Chemistry
   Environmental Engineering
   Nuclear Medicine
   School of Management

17 Benchmarks
 HPCC (HPC Challenge benchmarks):
   HPL: Linpack TPP benchmark; measures floating-point rate
   STREAM: sustainable memory bandwidth (GB/s)
   PTRANS (parallel matrix transpose): communication capacity of the interprocessor network
   RandomAccess: rate of random updates of memory
   FFTE: floating-point rate of a 1D double-precision DFT
   b_eff: communication bandwidth and latency
 NAS Parallel Benchmarks (NPB)
   Serial, MPI, and hybrid MPI-OpenMP versions
   Derived from CFD; 5 kernels + 3 pseudo-applications; many potential problem sizes
   Fortran, Java, and grid (GridNPB) versions available
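To make the STREAM idea concrete, the core of its triad kernel is a single fused multiply-add over three arrays, with bandwidth derived from bytes moved per second. This is a pure-Python timing sketch only; real STREAM figures come from the compiled benchmark with compiler and memory tuning.

```python
import time

def stream_triad(n=100_000, scalar=3.0):
    """STREAM-style triad a[i] = b[i] + scalar * c[i].

    Returns the result array and an approximate bandwidth in MB/s.
    Illustrative only; not a substitute for the real STREAM benchmark.
    """
    b = [1.0] * n
    c = [2.0] * n
    start = time.perf_counter()
    a = [b[i] + scalar * c[i] for i in range(n)]
    elapsed = time.perf_counter() - start
    # Triad touches three arrays of 8-byte doubles (two reads, one write).
    mbytes = 3 * 8 * n / 1e6
    return a, mbytes / elapsed

a, rate_mb_s = stream_triad()
```

The same pattern (fixed arithmetic, measured wall time, derived rate) underlies most of the HPCC kernels; only the operation and the bytes-moved accounting change.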

18 Technology Audit Details
 Application kernels
   Every RP likely has some already; typically based on local experience and application expertise
   Computationally lightweight; run continuously (and on demand) on TG:XD to actively measure performance
   Custom built and derived from open source codes such as GAMESS, NWChem, NAMD, OpenFOAM, etc.
   Measure system performance: local scratch and global filesystem performance, local processor-memory bandwidth, allocatable shared memory, processing speed, network latency and bandwidth
   Application kernel toolbox: kernels available to the community; provides a framework for creating new application kernels; software developers can deposit additional kernels
 TG Metrics on Demand (TGMoD) portal
   Based on the open source UBMoD tool
   Portal usage, help desk tickets, publications, etc.
   Role based: view tailored to the role of the user (sys admin, PI, center director, NSF program officer)
   Opportunity for data mining/analysis of statistics: RESTful API for data mining research or custom-built external interfaces (for example, a Google gadget)
   RSS feeds
 Focus on managing/improving access to a very large amount of information, customized to individual needs

19 Technology Audit Details – TG Metrics on Demand (TGMoD) Portal
 Based on the open source UBMoD tool
 Portal displays usage, metrics, help desk tickets, publications, etc.
 Role based: view tailored to the role of the user
   Sys admin, PI, center director, NSF program officer
   Customizable reporting facilities
 Opportunity for data mining/analysis of statistics
   RESTful API for data mining research or custom-built external interfaces (for example, a Google gadget)
 Role-based RSS feeds
 Focus on managing/facilitating access to a very large amount of information, customized to individual needs

20 Tom Furlani, PhD, Director, Center for Computational Research, SUNY Buffalo. Technology Audit for TG:XD, March 3, 2010

21 Application Kernel Toolbox
 Provides scripts and a descriptive framework for creating new application kernels in varied application areas
 Scripts likely Perl/XML based, for automating recording and reporting of output results; designed to be easily injected into the TGMoD database
 Self-describing output fields for each performance metric; potential early termination conditions
 Cross-platform generic run tools/scripts to simplify resource requests and minimize the number of dependencies on platform-specific resources
 TGMoD interface allows customized scheduling requests for specific kernel groups
 Potentially leverage existing INCA tools to facilitate kernel reporting
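A "self-describing" output field carries its own name and units alongside the value, so the TGMoD loader can ingest new kernels without schema changes. The element and attribute names below are invented for illustration (the slides say only that the output is self-describing and Perl/XML based); the sketch uses Python's standard XML module rather than Perl for consistency with the other examples.

```python
import xml.etree.ElementTree as ET

def metric_record(kernel, metric_name, value, units):
    """Emit a self-describing XML record for one kernel performance metric.

    Each <metric> names itself and its units, so a generic loader can
    insert it into the metrics database without per-kernel code.
    """
    root = ET.Element("kernel_result", kernel=kernel)
    metric = ET.SubElement(root, "metric", name=metric_name, units=units)
    metric.text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical kernel name and timing value.
xml_out = metric_record("namd-apoa1", "wall_time", 128.4, "seconds")
print(xml_out)
```

Early-termination conditions could be expressed the same way, as an extra self-describing element that the run scripts check between kernel stages.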




