
Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY Solving the “last mile of computing problem” – developing portals to enable simulation-based science and engineering The Role of High Performance Computation in Economic Development Rensselaer Polytechnic Institute October 2008

Outline  How Did Computation Become so Important  Bringing HPC to the Researcher’s Desktop  Portals  Grid Computing  Example Portals  Research  Center for Computational Research Overview  Understanding Protein Chemistry Photoactive Yellow Protein  Toward Petascale level calculations

How did computation become critical?  Revolutions in:  Computing  Storage  Networking/Communication (timeline spans the 1940’s through the 1980’s to today; 1 TB of storage now costs about $120)

Computing Revolution – the Microprocessor Revolution (slide courtesy of Dan Reed, RENCI)  Mechanical, relay: 7-year performance doubling  Tube, transistor: 2.3-year doubling  Microprocessor: 1–1.5-year doubling  Exponentials:  Transistor density: 2X in ~18 months (Moore’s Law)  Graphics: 100X in 3 years  WAN bandwidth: 64X in 2 years  Storage: 7X in 2 years  How long would a 1-hour calculation today take on a PC from 1984? 24 years!

The Storage Revolution  Megabyte  5 MB: complete works of Shakespeare  Terabyte: 1,000,000 MB – ~$120 today  The text in 1 million books  Entire U.S. Library of Congress is 10TB of text  50,000 trees made into paper and printed  Large Hadron Collider Experiment– 15 TB/day  Petabyte: 1000 terabytes  20 million four-drawer filing cabinets full of text  The Data Tsunami - Many sources  Agricultural, Medical, Environmental, Engineering, Financial  Why so much data?  More sensors – higher resolution  Faster/cheaper storage capability  Faster processors – generate more data!  The challenge: extracting insight!  Without being overwhelmed

Advanced Networking  Networks are the 21st century interstate highway system  Expertise and information - the real product  Removes the barriers of time and space (images: Eisenhower Interstate System, National Lambda Rail Network)

Enabling SBES for Non-Experts  Bringing HPC to the desktop  Analogous to the impact of Windows vs DOS for PC’s, which brought computing/internet to the home  Many users need periodic, but infrequent access  Experiment driven  Ease of use is key  Shouldn’t need to know about OS, compilers, queuing system, etc.  GUI interface, web-based, accessible anywhere  How do we get there?  Focus on development of portals, custom software and tools, data models, GUI’s, etc.  Provide training on the use of these tools  Ex: nanoHUB – one-stop resource for nanotechnology

“Old School” Computing  Use VPN to access the network (VPN software)  Secure login to front-end machine (Secure Shell software)  Create subdirectory (Unix commands)  Upload input data file (secure file transfer)  Identify keywords for model and add keywords to the input file (edit input file)  Create PBS script file: set number of processors, set path and variables, set run time and queue (application command line, PBS format and syntax)  Submit job to queue (PBS commands)  Monitor job
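The command-line workflow above can be sketched as a script. A minimal Python sketch that generates a PBS batch file — the `myapp` executable, queue name, and resource settings are placeholders for illustration, not CCR’s actual configuration:

```python
def make_pbs_script(input_file, nprocs=8, walltime="12:00:00", queue="debug"):
    """Build a PBS batch script for a hypothetical application run.

    The queue name and 'myapp' executable are placeholders; a real site
    supplies its own queues, module setup, and application launch line.
    """
    return f"""#!/bin/bash
#PBS -q {queue}
#PBS -l nodes=1:ppn={nprocs}
#PBS -l walltime={walltime}
cd $PBS_O_WORKDIR
# set paths and environment variables, then launch the application
myapp -np {nprocs} {input_file}
"""

# write the script; on a real cluster the user would then run:
#   qsub job.pbs     (submit to the queue)
#   qstat            (monitor the job)
with open("job.pbs", "w") as fh:
    fh.write(make_pbs_script("molecule.inp"))
```

Every step in the slide — editing the input file, setting processors, paths, run time, and queue — ends up encoded in this one generated file, which is exactly the complexity a portal hides.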

Portal Driven Computing  Open browser  Secure login to web portal  Upload input data file  Select model and run job  Monitor job  View output in browser

What is an Application Portal?  No consistent definition  Web-based  On-line simulation from your browser  Simulation typically doesn’t run on your PC  Doesn’t have to be grid enabled  WebMO  Computational Chemistry Portal  nanoHUB  Web-based resource for research, education and collaboration in nanotechnology  Includes application portals (tools)

Portal Basics  Remote access to simulations and compute power (diagram: the user authenticates over the Internet to an application server at ccr.buffalo.edu; a remote desktop session runs the simulation and exports the display)

Application Portals  Benefits  Scientists able to focus on research rather than details of computing environment  Underlying infrastructure complexities are hidden  Transparently integrate compute and data resources  Moving application to a web-based interface provides ubiquitous access  Single sign-on – Don’t have to maintain accounts on many machines  Challenges  Requires close collaboration between domain experts and developers  Developers must be aware of and hide underlying complexity  Must be easy to use (web-based, GUI)  Must provide full application functionality

Grid Enabling Applications  Why Needed  Scientists require an ever growing amount of compute and storage resources  Experiments may have requirements beyond the capabilities of a single data center  Datasets are growing at a tremendous rate  Grid Computing  Provides infrastructure for data and job management  Handles authentication of users across administrative and political domains  Provides monitoring of resources and user jobs  Allows researchers to harness the power of multiple datacenters for large experiments  Provides a reusable interface to commonly used functions: job status, job submission, file management
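A sketch of what such a reusable job-management interface might look like — a hypothetical in-memory API invented for illustration, not the interface of any specific grid middleware such as Globus:

```python
import uuid

class GridJobManager:
    """Minimal sketch of a reusable grid job-management interface,
    covering the three functions named on the slide: job submission,
    job status, and file management."""

    def __init__(self):
        self._jobs = {}

    def submit(self, site, executable, inputs):
        """Submit a job to a named compute site; returns an opaque job id."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {"site": site, "executable": executable,
                              "inputs": list(inputs), "state": "QUEUED"}
        return job_id

    def status(self, job_id):
        """Report the job's current state (QUEUED, RUNNING, DONE, ...)."""
        return self._jobs[job_id]["state"]

    def stage_file(self, job_id, path):
        """File management: record a file to transfer to the job's site."""
        self._jobs[job_id]["inputs"].append(path)
```

A portal built on an interface like this can present the same submit/status/stage operations for every back-end datacenter, which is what hides the per-site differences from the researcher.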

Example Portals  WebMO – Computational Chemistry  REDfly – Bioinformatics  iNquiry: Common web interface to many command-line tools  GenePattern: Scientific workflow and genomic analysis tools

CCR Computational Chemistry Portal  Based on WebMO  CCR portal: webmo.ccr.buffalo.edu  Extensive QC support  Gaussian, GAMESS, NWChem, Q-Chem, Mopac, Molpro, Tinker  Interfaces with batch queues on U2 and several faculty clusters

Computational Chemistry Portal  Browser based login  Menu driven

Computational Chemistry Portal  Choose level of theory

Computational Chemistry Portal  View output

Computational Chemistry Portal  ……including vibrational modes

Database/Portal Development  REDfly (Regulatory Element Database for Fly) Database of transcriptional regulatory elements  Aggregates data from multiple offline & online sources  Over 2100 entries  Most comprehensive resource of curated animal regulatory elements  Fully searchable, includes DNA sequence, gene expression data, link-outs to other databases  Extensive collaboration with other online data sources using web services

CCR Bioinformatics Portal  Based on iNquiry  Web portal: inquiry.ccr.buffalo.edu  Extensive application support  Includes popular open-source bioinformatics packages  EMBOSS, *PHYLIP, HMMer, BLAST, MPI-BLAST, NCBI Toolkit, Glimmer, Wise2, *ClustalW, *BLAT, *FASTA  Extensible for customized application interfaces  Uses U2 Compute Cluster as computational engine

National Library Statistics Portal  Association of Academic Health Science Libraries (AAHSL)  Online custom survey tool with custom features not found in general purpose web surveys  Online creation and review of electronic surveys by AAHSL editor Volumes, gate count, services offered, salaries  Support for role-based access restrictions (AAHSL editor, committee members, library directors, staff)  Tools for tracking library surveys and survey results  Automatic notifications  Custom retrospective data analysis and charting tools for peer library groups

TITAN - Modeling Geohazards  Modeling of volcanic flows, mud flows (flash flooding), and avalanches  Benefits for developers  Developers were spending too much time supporting user installations  A single web-based portal is supported instead  CCR supports the back-end infrastructure  Frees developers to focus on improving the models and science  Integrates information from several sources  Simulation results  Remote sensing  GIS data  Web-enabled for remote access

Metrics on Demand Portal  UBMoD: web-based interface for on-demand metrics  CPU cycles delivered, storage, queue statistics, etc.  Role-based interface (User, Faculty, Staff, Admin)  Available as open source:
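A toy illustration of the kind of aggregation a metrics portal like UBMoD performs — the job-accounting records and the group filter standing in for role-based views are invented for the example:

```python
from collections import defaultdict

# Hypothetical job-accounting records: (user, research group, CPU hours)
jobs = [
    ("alice", "chem", 120.0),
    ("bob",   "chem",  30.5),
    ("alice", "chem",  49.5),
    ("carol", "bio",  200.0),
]

def cpu_hours_by_user(records, group=None):
    """Aggregate delivered CPU hours per user.

    A faculty-role view restricts the report to one research group;
    an admin view (group=None) sees every user.
    """
    totals = defaultdict(float)
    for user, grp, hours in records:
        if group is None or grp == group:
            totals[user] += hours
    return dict(totals)

print(cpu_hours_by_user(jobs, group="chem"))
```

The same pattern — filter by the viewer’s role, then aggregate — extends to storage and queue-wait statistics.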

Center for Computational Research  Under NYS Center for Excellence in Bioinformatics & Life Sciences  Moved to New Buffalo Life Sciences Complex Building  Leading Academic Supercomputing Site  Mission: “Enabling and facilitating research within the University community”  Enable Research by Providing  high-end computing and visualization resources, software engineering, scientific computing/modeling, bioinformatics/computational biology, scientific and urban visualization, advanced computing systems  Industrial Outreach/Technology Transfer to WNY  Education, Outreach and Training in WNY

2007 Highlights  Computational Cycles Delivered in 2007:  224 different users submitted jobs (88 research groups)  354,447 jobs run (almost 1000 per day)  700,000 CPU days delivered  200 new user accounts created  CIT/CCR Collaboration to Improve Research Computing  Condor deployment  Portal/Tool Development  Make machines easier to use WebMO (Chemistry) iNquiry (Bioinformatics) UBMoD (Metrics on Demand)  Accountability  On-line real-time metrics  UB 2020 Campus Master Planning  3D models of all 3 campuses  NYSGrid

CCR Research & Projects  Urban Simulation and Visualization  Accident Reconstruction  Risk Mitigation (GIS)  Medical Imaging  High School Workshops  Cluster Computing  Data Fusion  Groundwater Flow Modeling  Turbulence and Combustion Modeling  Molecular Structure Determination  Protein Folding Prediction  Data Mining – Digital Gov, Library  Grid Computing  Computational Chemistry  Biomedical Engineering  Bioinformatics

Photoactive Yellow Protein  Simple prototype of the Rhodopsin family of proteins  Chromophore is located completely inside the protein pocket  Protein environment causes an absorption shift from 2.70 eV (gas phase) to 2.78 eV (protein), yielding the yellow color

Chromophore Spectra Measured  Experimental spectra of the protein active site in vacuum, in a protein and in water solution  Provides insight into environmental effects on electronic spectra, large shift of absorption maximum  Can gauge accuracy of theory

Modeling the System  Combined Quantum Mechanical / Molecular Mechanical (QM/MM) Method  System is divided into a QM part and an MM part  QM is used to model the “important” part of the system; MM is used to model the remainder  The QM part includes the active site of the protein  The MM part includes the rest of the protein, as well as the surrounding water molecules
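The partitioning can be expressed as an additive energy scheme. A minimal sketch, where the electronic-structure and force-field evaluations are stand-in function arguments (a real calculation would call codes such as Q-Chem and AMBER, as in this talk) and the coupling term is left abstract:

```python
def qmmm_energy(atoms, qm_region, e_qm, e_mm, e_coupling):
    """Additive QM/MM energy sketch.

    Total energy = QM energy of the active-site atoms
                 + MM energy of the environment atoms
                 + a QM-MM coupling term (electrostatic/steric interaction).
    e_qm, e_mm, e_coupling are stand-ins for real electronic-structure
    and force-field calls.
    """
    qm_atoms = [a for a in atoms if a["id"] in qm_region]
    mm_atoms = [a for a in atoms if a["id"] not in qm_region]
    return e_qm(qm_atoms) + e_mm(mm_atoms) + e_coupling(qm_atoms, mm_atoms)
```

The point of the split is cost: the expensive `e_qm` call sees only the ~100-atom active site, while the cheap `e_mm` call covers the remaining ~100,000 atoms of protein and water.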

QM versus MM based Methods  QM Calculations  Advantages: very accurate, based on first principles (ab initio, DFT – no empirical parameters involved), can treat bond breaking and formation  Disadvantages: time consuming, limited to small molecular systems (~100 atoms)  MM Calculations  Advantages: very fast, capable of calculating entire proteins or solutions (~100,000 atoms)  Disadvantages: less accurate, based on empirical parameters, not capable of describing chemical reactions (electrons are not involved)  QM/MM combines the two

Why use the QM/MM Method?  Improved accuracy (from QM) with greater speed (from MM)  Model active site of proteins  Drug-receptor binding  Electrostatic effects  Steric effects  Interpretation of experimental data  Vibrational spectra  Electronic spectra  Mechanism of enzymatic activity  Reaction profiles  Thermal motion effects on reactivity

Modeling Protein Dynamics  Goal: understand how protein thermal dynamics affects function 1. Run MM-based molecular dynamics (MD) simulation 2. From the MD simulation, randomly select protein conformations (snapshots) 3. Run a QM/MM simulation for each snapshot 4. Generate results based on averages taken over the snapshots
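The four steps above can be sketched as follows — the MD trajectory and per-snapshot excitation energies here are mock stand-ins invented for the example, not real chemistry:

```python
import random
import statistics

def run_md(n_steps):
    """Step 1 stand-in: an MM molecular-dynamics run yielding conformations."""
    return [{"step": i} for i in range(n_steps)]

def qmmm_excitation(snapshot):
    """Step 3 stand-in: one QM/MM excitation-energy calculation (eV).
    A real run would invoke the QM/MM codes for this conformation;
    here a seeded mock returns a value near 2.9 eV."""
    rng = random.Random(snapshot["step"])
    return 2.9 + rng.uniform(-0.1, 0.1)

trajectory = run_md(10_000)
snapshots = random.sample(trajectory, 100)           # step 2: random snapshots
energies = [qmmm_excitation(s) for s in snapshots]   # step 3: QM/MM per snapshot
mean_ev = statistics.mean(energies)                  # step 4: ensemble average
stdev_ev = statistics.stdev(energies)                # thermal spread
```

The ensemble average and standard deviation are what appear in the results table later in the talk; sampling many conformations is what captures the thermal-motion effect.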

Getting Results Faster  Carry out QM/MM calculations simultaneously for many snapshots (protein conformations)
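Because each snapshot is an independent calculation, the ensemble parallelizes trivially. A minimal process-pool sketch, with a mock energy function standing in for the real QM/MM codes:

```python
from multiprocessing import Pool

def excitation_energy(snapshot_id):
    """Stand-in for one independent QM/MM calculation; in the talk each
    real snapshot costs ~54 CPU hours, which is why they are run
    simultaneously rather than one after another."""
    return 2.9 + (snapshot_id % 10) * 0.01

if __name__ == "__main__":
    # no data dependencies between snapshots, so a plain map distributes
    # the work across worker processes
    with Pool(processes=4) as pool:
        energies = pool.map(excitation_energy, range(100))
    print(sum(energies) / len(energies))
```

On a cluster the same structure appears at a larger scale: each snapshot becomes one batch job, and the scheduler plays the role of the pool.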

QM/MM Calc for Each Snapshot  After MD, protein snapshots are randomly selected (1000)  Full geometry optimization of the ligand inside the fixed protein matrix (Q-Chem)  QM: DFT/B3LYP/6-31+G* (ligand)  MM: AMBER (protein + ~4500 water molecules)  Electronic excitations (Q-Chem):  QM: TDDFT/B3LYP/aug-cc-pVTZ (ligand)  MM: AMBER (protein + water)

Active Site (chromophore)  The active site of yellow protein – the chromophore is 4-hydroxycinnamic acid  Shown as both the real chromophore and a model chromophore

CPU Demand - Current Calculation  MD Simulation  1600 CPU hours  Select 1000 Snapshots  Each Snapshot (54 CPU Hours)  Combined QM/MM Geometry Optimization 24 CPU hours (3 hours on 8 processors)  Electronic Excitation Calc 30 CPU Hours  Total for all 1000 snapshots + MD Simulation  55,600 CPU Hours (2300 CPU Days)
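The totals quoted above can be checked directly (the slide rounds 2,317 to roughly 2,300 CPU days):

```python
# Worked check of the CPU-hour budget on the slide
md_hours = 1_600                 # MD simulation
n_snapshots = 1_000
per_snapshot = 24 + 30           # geometry optimization + excitation calc
total_hours = md_hours + n_snapshots * per_snapshot
total_days = total_hours / 24
print(total_hours, round(total_days))   # 55600 2317
```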

Results  Electronic excitations of the chromophore; ( ) = standard deviation, Δ = change relative to the gas phase
Electronic Excitation | Gas-Phase (eV) | Protein (eV) | Solution (eV)
Calculated | (0.06) | Δ= (0.04) | Δ=0.45
Experiment | | Δ= | Δ=0.40

Toward Petascale Level Calc  More accurate MD simulation  Larger water sphere (50 Å radius) – ~12,000 water molecules  500 hours on 32 processors – 16,000 CPU hours  More accurate QM/MM simulations  Larger basis set  350 hours on 16 processors – 5,600 CPU hours  Better statistics  100,000 MD snapshots (560,000,000 CPU hours)  2 MD simulations – 1,120,000,000 CPU hours!

Power of Parallel Processing  Assume a modest 4X increase in processor performance/computational efficiency over the next few years  Reduce requirement to about 10,000,000 CPU days  Translates to 100 CPU days on 100,000 cores  Combined QM/MM simulations of this scale possible on petascale level hardware
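The scaling argument, worked through (the slide rounds the results to ~10,000,000 CPU days and ~100 days on 100,000 cores):

```python
total_cpu_hours = 1_120_000_000   # two MD ensembles, from the previous slide
speedup = 4                       # assumed gain in per-core performance
cores = 100_000                   # petascale-class core count

cpu_days = total_cpu_hours / speedup / 24   # ~11.7 million CPU days
wall_days = cpu_days / cores                # ~117 days of wall-clock time
print(f"{cpu_days:,.0f} CPU days, {wall_days:.0f} days of wall time")
```

The point is that no speedup alone makes the problem tractable — only spreading the reduced requirement across a hundred thousand cores brings the wall time down to months.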

Acknowledgements  Portal Development  Steve Gallo, Dr. Matt Jones, Jon Bednasz, Rob Leach  Combined QM/MM Calculations  Dr. Marek Freindorf  Funding  NIH