Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY Solving the “last mile of computing problem” – developing portals to enable.


1 Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY Solving the “last mile of computing problem” – developing portals to enable simulation-based science and engineering The Role of High Performance Computation in Economic Development Rensselaer Polytechnic Institute October 22 - 24, 2008

2 Outline  How Did Computation Become so Important?  Bringing HPC to the Researcher’s Desktop  Portals  Grid Computing  Example Portals  Research  Center for Computational Research Overview  Understanding Protein Chemistry: Photoactive Yellow Protein  Toward Petascale level calculations

3 How did computation become critical?  Revolution in  Computing  Storage  Networking/Communication  Timeline: 1940’s, 1980’s, Today (1 TB - $120)

4 Computing Revolution Microprocessor Revolution How long would 1 hr calc today take on a PC from 1984? Slide courtesy – Dan Reed, RENCI  1890-1945  Mechanical, relay  7 year doubling  1945-1985  Tube, transistor  2.3 year doubling  1985-2005  Microprocessor  1 – 1.5 year doubling  Exponentials  Transistor density 2X in ~18 months (Moore’s Law)  Graphics: 100X in 3 years  WAN bandwidth: 64X in 2 years  Storage: 7X in 2 years 24 Years!
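The question on this slide is a doubling-time calculation. The sketch below works it through under the assumption of steady exponential growth; the function names are illustrative and not from the talk, and the answer depends strongly on the doubling time chosen from the slide's 1 to 1.5 year range.

```python
HOURS_PER_YEAR = 24 * 365.25


def speedup(years_elapsed: float, doubling_time_years: float) -> float:
    """Performance ratio after `years_elapsed` of steady doubling."""
    return 2.0 ** (years_elapsed / doubling_time_years)


def old_machine_runtime_years(hours_today: float,
                              years_elapsed: float,
                              doubling_time_years: float) -> float:
    """Years an older machine would need for a job taking `hours_today` now."""
    old_hours = hours_today * speedup(years_elapsed, doubling_time_years)
    return old_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    # 1984 to 2008 is 24 years; try both ends of the slide's 1 - 1.5 year range.
    for doubling in (1.0, 1.5):
        years = old_machine_runtime_years(1.0, 24.0, doubling)
        print(f"doubling every {doubling} yr: about {years:,.1f} years")
```

With a 1.5-year doubling time the factor is 2^16 = 65,536, so a one-hour job would take roughly 7.5 years; shorter doubling times give much larger answers.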

5 The Storage Revolution  Megabyte  5 MB: complete works of Shakespeare  Terabyte: 1,000,000 MB – ~$120 today  The text in 1 million books  Entire U.S. Library of Congress is 10TB of text  50,000 trees made into paper and printed  Large Hadron Collider Experiment– 15 TB/day  Petabyte: 1000 terabytes  20 million four-drawer filing cabinets full of text  The Data Tsunami - Many sources  Agricultural, Medical, Environmental, Engineering, Financial  Why so much data?  More sensors – higher resolution  Faster/cheaper storage capability  Faster processors – generate more data!  The challenge: extracting insight!  Without being overwhelmed

6 Advanced Networking  Networks are the 21st century interstate highway system  expertise and information - the real product  Removes the barriers of time and space  Eisenhower Interstate System  National Lambda Rail Network

7 Enabling SBES for Non-Experts  Bringing HPC to the desktop  Analogous to impact of Windows vs DOS for PC’s  Brought computing/internet to the home  Many users need periodic, but infrequent access  Experiment driven  Ease of use is key  Shouldn’t need to know about OS, compilers, queuing system, etc  GUI Interface, Web-based, Access anywhere  How do we get there?  Focus on development of portals, custom software and tools, data models, GUI’s, etc.  Provide training on the use of these tools  Ex: nanoHUB – one stop resource for nanotechnology

8 “Old School” Computing  Needed: input file, VPN software, Secure Shell software, Unix commands, PBS format, syntax and commands  Steps: use VPN to access network; secure login to front-end machine; create subdirectory; upload input data file (secure file transfer); identify keywords for model and add them to the input file (edit input file); create PBS script file (edit file: application command line, number of processors, path and variables, run time and queue); submit job to queue; monitor job
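To make the PBS step concrete, here is a minimal sketch of the kind of batch script a user has to write by hand in this workflow, and that a portal could generate instead. The `#PBS` directives are standard PBS/Torque options; the function name, job name, queue name and application command are hypothetical placeholders, not CCR's actual settings.

```python
def build_pbs_script(job_name: str, queue: str, nodes: int,
                     procs_per_node: int, walltime: str, command: str) -> str:
    """Return the text of a simple PBS batch script."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                         # job name
        f"#PBS -q {queue}",                            # queue to submit to
        f"#PBS -l nodes={nodes}:ppn={procs_per_node}",  # processors requested
        f"#PBS -l walltime={walltime}",                # run time limit
        "cd $PBS_O_WORKDIR",                           # directory the job was submitted from
        command,                                       # application command line
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are made up for illustration.
    print(build_pbs_script("example_job", "example_queue", 1, 8,
                           "03:00:00", "./run_application input.dat"))
```

The resulting file would be submitted with `qsub` and monitored with `qstat`; these are the commands the portal hides from the user.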

9 Portal Driven Computing  Needed: input file, browser  Steps: open browser; secure login to web portal; upload input data file; select model and run job; monitor job; view output in browser

10 What is an Application Portal?  No consistent definition  Web-based  On-line simulation from your browser  Simulation typically doesn’t run on your PC  Doesn’t have to be grid enabled  WebMO  Computational Chemistry Portal  nanoHUB  Web-based resource for research, education and collaboration in nanotechnology  Includes application portals (tools)

11 Portal Basics  Remote Access to simulations and compute power  Diagram labels: Remote Desktop, Internet, Authentication, Application Server, ccr.buffalo.edu, Run Simulation, Export Display

12 Application Portals  Benefits  Scientists able to focus on research rather than details of computing environment  Underlying infrastructure complexities are hidden  Transparently integrate compute and data resources  Moving application to a web-based interface provides ubiquitous access  Single sign-on – Don’t have to maintain accounts on many machines  Challenges  Requires close collaboration between domain experts and developers  Developers must be aware of and hide underlying complexity  Must be easy to use (web-based, GUI)  Must provide full application functionality

13 Grid Enabling Applications  Why Needed  Scientists require an ever growing amount of compute and storage resources  Experiments may have requirements beyond the capabilities of a single data center  Datasets are growing at a tremendous rate  Grid Computing  Provides infrastructure for data and job management  Handles authentication of users across administrative and political domains  Provides monitoring of resources and user jobs  Allows researchers to harness the power of multiple datacenters for large experiments  Provides a reusable interface to commonly used functions: job status, job submission, file management
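The "reusable interface" idea in the last bullet can be sketched as a small abstract class covering file management, job submission and job status. This is an illustration only: the class and method names are hypothetical and do not correspond to any particular grid middleware, and the in-memory implementation stands in for a real back end.

```python
from abc import ABC, abstractmethod


class GridClient(ABC):
    """Hypothetical common interface a portal could code against."""

    @abstractmethod
    def upload_file(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def submit_job(self, application: str, input_file: str) -> str: ...

    @abstractmethod
    def job_status(self, job_id: str) -> str: ...


class InMemoryGridClient(GridClient):
    """Toy back end that only records files and jobs in dictionaries."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}
        self.jobs: dict[str, str] = {}

    def upload_file(self, name: str, data: bytes) -> None:
        self.files[name] = data

    def submit_job(self, application: str, input_file: str) -> str:
        if input_file not in self.files:
            raise FileNotFoundError(input_file)
        job_id = f"job-{len(self.jobs) + 1}"
        self.jobs[job_id] = "QUEUED"  # a real back end would hand off to a scheduler
        return job_id

    def job_status(self, job_id: str) -> str:
        return self.jobs[job_id]


if __name__ == "__main__":
    client = InMemoryGridClient()
    client.upload_file("input.dat", b"example input")
    job = client.submit_job("example_app", "input.dat")
    print(job, client.job_status(job))
```

A portal written against `GridClient` would not need to change if the toy back end were replaced by one that talks to a real cluster or grid service.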

14 Example Portals  WebMO – Computational Chemistry  REDfly – Bioinformatics  iNquiry: Common web interface to many command-line tools  GenePattern: Scientific workflow and genomic analysis tools

15 CCR Computational Chemistry Portal  Based on WebMO:  www.webmo.net  CCR portal: webmo.ccr.buffalo.edu  Extensive QC Support  Gaussian, GAMESS, NWChem, Q-Chem, Mopac, Molpro, Tinker  Interfaces with batch queues on U2 and several faculty clusters  (Image caption: CCR iNquiry Bioinformatics Portal, Glimmer page)

16 Computational Chemistry Portal  Browser based login  Menu driven

17 Computational Chemistry Portal  Choose level of theory

18 Computational Chemistry Portal  View output

19 Computational Chemistry Portal  ……including vibrational modes

20 Database/Portal Development  REDfly (Regulatory Element Database for Fly) Database of transcriptional regulatory elements  Aggregates data from multiple offline & online sources  Over 2100 entries  Most comprehensive resource of curated animal regulatory elements  Fully searchable, includes DNA sequence, gene expression data, link-outs to other databases  Extensive collaboration with other online data sources using web services

21 CCR Bioinformatics Portal  Based on iNquiry:  www.bioteam.net  Web portal: inquiry.ccr.buffalo.edu  Extensive Application Support  Includes popular open-source bioinformatics packages  EMBOSS, *PHYLIP, HMMer, BLAST, MPI-BLAST, NCBI Toolkit, Glimmer, Wise2, *ClustalW, *BLAT, *FASTA  Extensible for customized application interfaces  Uses U2 Compute Cluster as Computational Engine

22 National Library Statistics Portal  Association of Academic Health Science Libraries (AAHSL)  Online custom survey tool with custom features not found in general purpose web surveys  Online creation and review of electronic surveys by AAHSL editor  Volumes, gate count, services offered, salaries  Support for role-based access restrictions (AAHSL editor, committee members, library directors, staff)  Tools for tracking library surveys and survey results  Automatic notifications  Custom retrospective data analysis and charting tools for peer library groups

23 TITAN - Modeling Geohazards  Modeling of Volcanic Flows, Mud flows (flash flooding), and Avalanches  Benefits for Developers  Developers – too much time supporting user installations  Support single web-based portal  CCR supports back-end infrastructure  Frees developers to focus on improving the models, science  Integrate information from several sources  Simulation results  Remote sensing  GIS data  Web enable for remote access

24 Metrics on Demand Portal  UBMoD: Web-based Interface for On-demand Metrics  CPU cycles delivered, Storage, Queue Statistics, etc  Role based interface (User, Faculty, Staff, Admin)  Available in open source :

25 Center for Computational Research  Under NYS Center for Excellence in Bioinformatics & Life Sciences  Moved to New Buffalo Life Sciences Complex Building  Leading Academic Supercomputing Site  Mission: “Enabling and facilitating research within the University community”  Enable Research by Providing  high-end computing and visualization resources, software engineering, scientific computing/modeling, bioinformatics/computational biology, scientific and urban visualization, advanced computing systems  Industrial Outreach/Technology Transfer to WNY  Education, Outreach and Training in WNY

26 2007 Highlights  Computational Cycles Delivered in 2007:  224 different users submitted jobs (88 research groups)  354,447 jobs run (almost 1000 per day)  700,000 CPU days delivered  200 new user accounts created  CIT/CCR Collaboration to Improve Research Computing  Condor deployment  Portal/Tool Development  Make machines easier to use: WebMO (Chemistry), iNquiry (Bioinformatics), UBMoD (Metrics on Demand)  Accountability  On-line real-time metrics  UB 2020 Campus Master Planning  3D models of all 3 campuses  NYSGrid

27 CCR Research & Projects  Urban Simulation and Visualization  Accident Reconstruction  Risk Mitigation (GIS)  Medical Imaging  High School Workshops  Cluster Computing  Data Fusion  Groundwater Flow Modeling  Turbulence and Combustion Modeling  Molecular Structure Determination  Protein Folding Prediction  Data Mining – Digital Gov, Library  Grid Computing  Computational Chemistry  Biomedical Engineering  Bioinformatics

28 Photoactive Yellow Protein  Simple prototype of the Rhodopsin family of proteins  Chromophore is located completely inside the protein pocket  Protein environment causes absorption shift from 2.70 eV (gas phase) to 2.78 eV (protein) yielding the yellow color

29 Chromophore Spectra Measured  Experimental spectra of the protein active site in vacuum, in a protein and in water solution  Provides insight into environmental effects on electronic spectra, large shift of absorption maximum  Can gauge accuracy of theory

30 Modeling the System  Combined Quantum Mechanical / Molecular Mechanical Method  System is divided into a QM part and a MM part  QM is used to model the “important” part of the system; MM is used to model the remainder  The QM part includes the active site of the protein  The MM part includes the rest of the protein, as well as surrounding water molecules

31 QM versus MM based Methods  QM Calculations  Advantages: very accurate, based on first principles (ab initio, DFT; no empirical parameters are involved), can treat bond breaking and formation  Disadvantages: time consuming, limited to small molecular systems (~100 atoms)  MM Calculations  Advantages: very fast, can handle entire proteins or solutions (~100,000 atoms)  Disadvantages: less accurate, based on empirical parameters, cannot describe chemical reactions (electrons are not treated)  QM/MM

32 Why use the QM/MM Method?  Improved accuracy (QM) and speed (MM)  Model active site of proteins  Drug-receptor binding  Electrostatic effects  Steric effects  Interpretation of experimental data  Vibrational spectra  Electronic spectra  Mechanism of enzymatic activity  Reaction profiles  Thermal motion effects on reactivity

33 Modeling Protein Dynamics  1. Run MM based Molecular Dynamics simulation  2. From MD simulation, randomly select protein conformations (snapshots)  3. Run QM/MM simulation for each snapshot  4. Generate results based on averages taken from snapshots  Goal: Understand how protein thermal dynamics affects function  (Figure: protein dynamics over time)
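Steps 2 to 4 above amount to random sampling followed by averaging. The sketch below shows that pattern only: the trajectory is a list of made-up numbers and `fake_qmmm_energy` is a placeholder for a real QM/MM calculation; neither is from the talk.

```python
import random
import statistics


def select_snapshots(trajectory, count, seed=None):
    """Step 2: pick `count` distinct frames at random from the MD trajectory."""
    rng = random.Random(seed)
    return rng.sample(trajectory, count)


def average_property(snapshots, compute):
    """Steps 3 and 4: evaluate each snapshot, then report mean and std. dev."""
    values = [compute(snapshot) for snapshot in snapshots]
    return statistics.mean(values), statistics.stdev(values)


def fake_qmmm_energy(snapshot):
    """Placeholder for a QM/MM excitation energy calculation (hypothetical)."""
    return 3.0 + 0.1 * snapshot


if __name__ == "__main__":
    # Stand-in trajectory: each "frame" is just a number here.
    frame_source = random.Random(0)
    trajectory = [frame_source.uniform(-1.0, 1.0) for _ in range(10_000)]
    snapshots = select_snapshots(trajectory, 1000, seed=1)
    mean, spread = average_property(snapshots, fake_qmmm_energy)
    print(f"mean = {mean:.3f}, standard deviation = {spread:.3f}")
```

Fixing the seed makes the snapshot selection reproducible, which helps when results from different runs need to be compared.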

34 Getting Results Faster  Carry out QM/MM calcs simultaneously for many snapshots (protein conformations)
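Because each snapshot is independent of the others, the calculations can run side by side. The sketch below illustrates this with Python's standard thread pool; on a cluster each snapshot would more likely be its own batch job, and `run_one` here is a stand-in for the real QM/MM calculation.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all_snapshots(snapshots, run_one, max_workers=8):
    """Run `run_one` on every snapshot concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, snapshots))


def placeholder_calculation(snapshot_id):
    """Stand-in for a QM/MM run; returns a label instead of real results."""
    return f"snapshot {snapshot_id} done"


if __name__ == "__main__":
    for line in run_all_snapshots(range(5), placeholder_calculation, max_workers=4):
        print(line)
```

Since no snapshot needs data from another, the elapsed time shrinks roughly in proportion to the number of workers until the hardware is saturated.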

35 QM/MM Calc for Each Snapshot  After MD, 1000 protein snapshots are randomly selected  Full geometry optimization of the ligand inside the fixed protein matrix (Q-Chem)  QM: DFT/B3LYP/6-31+G* (ligand)  MM: AMBER (protein + water)  Electronic excitations (Q-Chem):  QM: TDDFT/B3LYP/aug-cc-pVTZ (ligand)  MM: AMBER (protein + water)  System includes 4500 water molecules

36 Active Site (chromophore)  The active site of the yellow protein chromophore - 4-hydroxy-cinnamic acid  Real Chromophore  Model Chromophore

37 CPU Demand - Current Calculation  MD Simulation: 1600 CPU hours  Select 1000 Snapshots  Each Snapshot: 54 CPU hours  Combined QM/MM Geometry Optimization: 24 CPU hours (3 hours on 8 processors)  Electronic Excitation Calc: 30 CPU hours  Total for all 1000 snapshots + MD Simulation: 55,600 CPU hours (2300 CPU days)
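The total on this slide can be checked with a few lines of arithmetic. The numbers are the slide's own; the function and variable names are illustrative.

```python
def total_cpu_hours(md_cpu_hours, snapshots, cpu_hours_per_snapshot):
    """MD simulation cost plus the cost of all QM/MM snapshots."""
    return md_cpu_hours + snapshots * cpu_hours_per_snapshot


GEOMETRY_OPT_CPU_HOURS = 3 * 8   # 3 hours on 8 processors
EXCITATION_CPU_HOURS = 30
PER_SNAPSHOT_CPU_HOURS = GEOMETRY_OPT_CPU_HOURS + EXCITATION_CPU_HOURS  # 54

if __name__ == "__main__":
    hours = total_cpu_hours(1600, 1000, PER_SNAPSHOT_CPU_HOURS)
    print(f"{hours:,} CPU hours = {hours / 24:,.0f} CPU days")
```

The result is 55,600 CPU hours, or about 2317 CPU days, which the slide rounds to 2300.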

38 Results  Electronic excitations of the chromophore (eV)
                Gas phase   Protein               Solution
Calculated      3.07        3.31 (0.06), Δ=0.24   3.52 (0.04), Δ=0.45
Experiment      2.70        2.78, Δ=0.08          3.10, Δ=0.40
( ) - standard deviation;  Δ - change relative to the gas phase
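The shifts relative to the gas phase follow directly from the excitation energies on this slide. The sketch below recomputes them; the dictionary layout and function name are illustrative, and the values are copied from the slide.

```python
EXCITATION_EV = {
    "calculated": {"gas": 3.07, "protein": 3.31, "solution": 3.52},
    "experiment": {"gas": 2.70, "protein": 2.78, "solution": 3.10},
}


def shifts_from_gas_phase(energies):
    """Change in excitation energy (eV) of each environment versus gas phase."""
    gas = energies["gas"]
    return {env: round(value - gas, 2)
            for env, value in energies.items() if env != "gas"}


if __name__ == "__main__":
    for source, energies in EXCITATION_EV.items():
        print(source, shifts_from_gas_phase(energies))
```

Both theory and experiment show a blue shift on going from gas phase to protein to solution, although the calculated protein shift (0.24 eV) is larger than the measured one (0.08 eV).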

39 Toward Petascale Level Calc  More accurate MD simulation  Larger water sphere (50 Å radius), ~12,000 water molecules  500 hours on 32 processors - 16,000 CPU hours  More accurate QM/MM simulations  Larger basis set  350 hours on 16 processors - 5600 CPU hours  Better statistics  100,000 MD snapshots (560,000,000 CPU hours)  2 MD simulations - 1,120,000,000 CPU hours!

40 Power of Parallel Processing  Assume a modest 4X increase in processor performance/computational efficiency over the next few years  Reduces the requirement to about 10,000,000 CPU days  Translates to about 100 days of run time on 100,000 cores  Combined QM/MM simulations of this scale are possible on petascale level hardware
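The petascale estimate on the last two slides is a chain of multiplications and divisions, sketched below with the slides' numbers. The function name and result keys are illustrative; as on the slides, the MD cost itself (16,000 CPU hours per run) is small enough to be left out.

```python
def petascale_estimate(snapshots=100_000,
                       cpu_hours_per_snapshot=350 * 16,  # 350 hours on 16 processors
                       md_runs=2,
                       speedup=4.0,
                       cores=100_000):
    """Follow the slides' arithmetic from snapshot cost to run time in days."""
    per_run_hours = snapshots * cpu_hours_per_snapshot
    total_hours = per_run_hours * md_runs
    reduced_cpu_days = total_hours / speedup / 24
    return {
        "per_run_cpu_hours": per_run_hours,
        "total_cpu_hours": total_hours,
        "reduced_cpu_days": reduced_cpu_days,
        "days_on_all_cores": reduced_cpu_days / cores,
    }


if __name__ == "__main__":
    for name, value in petascale_estimate().items():
        print(f"{name}: {value:,.0f}")
```

Exact arithmetic gives roughly 11.7 million CPU days after the 4X improvement, or about 117 days on 100,000 cores; the slide rounds these to 10,000,000 and 100.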

41 Acknowledgements  Portal Development  Steve Gallo, Dr. Matt Jones, Jon Bednasz, Rob Leach  Combined QM/MM Calculations  Dr. Marek Friendorf  Funding  NIH

