Published by Cecily Spencer. Modified over 9 years ago.
1
NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE
Molecular Science in NPACI
Russ B. Altman
NPACI Molecular Science Thrust
Stanford Medical Informatics, Stanford Computer Science
NPACI Site Visit, July 21-22, 1999
2
Overview
    Molecular Science vision and roadmap
    Molecular Science project accomplishments
    Alpha project: Bioinformatics Infrastructure for Large-Scale Analyses
    Overview of plans
3
Molecular Science Is Changing...
    The genome sequencing project gives us unprecedented access to biological molecular information
    New experimental technologies (gene arrays) are giving new access to functional information
    Experiment and theory are refining structural data
    Combinatorial chemistry allows design of molecules
    New paradigm: collect the data first, then mine it later with hypotheses
4
Vision for Molecular Science Thrust
    Understand how fundamental molecular properties contribute to macroscopic phenomena in chemistry and biology.
    Simulate molecular dynamics for large systems (e.g., biological molecules).
    Port existing codes to parallel machines, test them, and apply to problems not currently within reach (CR, MS, PTE).
    Create databases for molecular systems to support exploratory analysis, hypothesis generation, communication, dissemination.
    Create and populate data schema for critical areas: biological macromolecules, MD trajectories, quantum computations (DICE, PTE).
    Create visualization technologies for communication/analysis (IE, MS).
    Provide hardened tools to scientific community for use.
    Identify critical algorithms requiring HPC, implement on NPACI hardware.
    Conduct education, outreach, and training of scientists/students (EOT, IE, CR).
5
Molecular Science: Advancing understanding of biochemical structure and function
    [Roadmap diagram, timeline 1999-2002. Labels recovered from the figure:]
    IE; META
    Bioinformatics infrastructure; Large-scale molecular dynamics
    GenBank; Molecular Trajectory DB; PDB
    CHARMM; AMBER; Molecular dynamics
    Algorithms: Comparison, Phylogeny, Alignment, Scanning
    DICE; Federated data collections; Remote database analysis
    Protein Folding
    Enhanced molecular chemistry; Molecular chemistry; Quantum chemistry; Transition states
    Imaging; Algorithms; Bioinformatics
6
Projects and Accomplishments
    Biological Data Representation & Query (SDSC, Rutgers, Stanford, Washington U, U Texas)
        All-vs.-all comparison of 3-D protein structures (SDSC)
        Site-scanning code for 3-D features (Stanford)
        Genetic algorithm code for large phylogenetic trees (U Texas)
        CORBA for distributed access to ligand DB (Rutgers)
    Enhanced Biological Imaging (U Chicago, U Houston)
        Port of “optimal line” code for EM reconstruction
7
Projects and Accomplishments
    Transition States in Complex Systems (UC Berkeley)
        Wrapped CHARMM, AMBER, CPMD to oversample rare events
    Quantum Reaction Dynamics (Caltech)
        Ported code for multi-atom reactions to HP Exemplar
    Management (Stanford)
        New thrust management
        Thrust meeting in September 1998
        Two high-profile alpha projects (CHARMM, Analyses)
        One strategic application collaboration (AMBER)
8
Alpha Projects in Molecular Science
    Rationale
        Molecular science computing is, for the most part, workstation-based, and the uses of HPC are limited but critical:
            Long-time-scale, accurate simulations
            Large scans over data collections, both O(N) and O(N^2)
            Global optimizations of structures, alignments, networks
        The requirements for technology support for all are significant
            Grid computing = metasystems
            Movement of large amounts of data = data-intensive computing
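To make the O(N) versus O(N^2) distinction concrete: a linear scan over N entries performs N evaluations, while an all-vs.-all comparison over unordered pairs performs N(N-1)/2. The collection size N = 10,000 used below is an illustrative assumption, not a figure from the slides. The block assumes the amsmath package.

```latex
\begin{align*}
W_{\text{scan}}(N) &= N, \\
W_{\text{all-vs-all}}(N) &= \binom{N}{2} = \frac{N(N-1)}{2}, \\
W_{\text{all-vs-all}}(10^{4}) &= \frac{10^{4} \cdot 9999}{2} = 49{,}995{,}000 .
\end{align*}
```

Under that assumption, the all-vs.-all analysis does roughly 5,000 times the work of a single linear scan over the same collection, which is the kind of gap that moves a problem from a workstation to HPC.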
9
Bioinformatics Infrastructure for Large-Scale Analyses
    Need to construct prototype analyses
        Establish feasibility of doing analyses routinely
        Debug infrastructure for supporting analyses
        Provide templates for “copy-and-edit” duplication
10
Databases and Analyses
    PDB (SDSC, Stanford)
        Linear scan searching for active sites
        All-by-all comparisons for clustering
    GenBank (Washington U)
        All-by-all comparison of sequences over set of alignment parameters, followed by clustering
        Linear scan through results to find new relations
    Molecular Dynamics Trajectory DB (U Houston)
        Linear scan through time cuts of trajectory to look for features of interest (e.g., form/unform active site)
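The two analysis patterns named on this slide, a linear scan and an all-by-all comparison followed by clustering, can be sketched generically. This is a minimal illustration on toy 2-D points, not the PDB, GenBank, or trajectory codes themselves: the function names, the Euclidean distance, the single-linkage rule, and the threshold are all assumptions made for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def linear_scan(entries, predicate):
    """O(N): return the keys of all entries that satisfy the predicate."""
    return [key for key, value in entries.items() if predicate(value)]


def all_vs_all(entries, distance):
    """O(N^2): compute a distance for every unordered pair of entries."""
    return {(a, b): distance(entries[a], entries[b])
            for a, b in combinations(sorted(entries), 2)}


def cluster(keys, pair_distances, threshold):
    """Single-linkage clustering: merge any pair closer than threshold."""
    parent = {k: k for k in keys}

    def find(k):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for (a, b), d in pair_distances.items():
        if d < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for k in keys:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())


# Toy stand-ins for database entries (hypothetical data).
toy = {"s1": (0.0, 0.0), "s2": (0.1, 0.0), "s3": (5.0, 5.0), "s4": (5.0, 5.2)}
hits = linear_scan(toy, lambda v: v[0] > 1.0)
clusters = cluster(toy, all_vs_all(toy, dist), threshold=1.0)
```

Swapping in a real structure or sequence comparison only changes the `distance` and `predicate` arguments, which is the sense in which such a script can serve as a "copy-and-edit" template.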
11
Required Technologies
    Data-intensive Computing
        Robust connection to computational grid (Legion)
        Language for describing data schema to SRB
        Strategies for moving large amounts of data to NPACI CPUs
    Metasystems
        Registration of key algorithms within Legion for platforms
        Robust connection to large data stores (SRB)
        Reusable scripts for running analyses
12
Bioinformatics Infrastructure for Large Analyses
    Goal: Create reusable templates and demonstrate value
    [Roadmap diagram, timeline 1999-2002. Labels recovered from the figure:]
    Critical Databases Enabled for Grid Computing
    PDB in SRB; GenBank in SRB; MDTDB in SRB; GeneArray DB in SRB
    Protein Analysis in Legion, O(N)
    Sequence Analysis in Legion, O(N^2)
    Phylogeny programs in Legion, O(N^2)
    Full Scale Runs of Algorithms on Databases
    Templates for large scale O(N) and O(N^2) Analyses
    Report & Evangelize to Scientific Community
13
FY00 Milestones
    Connect SRB data model to PDB schema (XML)
    Connect SRB data model to GenBank (XML)
    Register linear PDB algorithms in Legion
    Register sequence algorithms for GenBank
    Analyze scheduling challenges for linear scans and all-vs.-all analyses
    Run linear scans on PDB and all-by-all on subset of GenBank
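One way to see the scheduling difference between linear scans and all-vs.-all analyses is to look at how each decomposes into independent tasks. The sketch below is an assumption-laden illustration, not the Legion scheduler: it splits a scan into contiguous index ranges and tiles the upper triangle of the pair matrix into blocks. The function names and the chunk size are hypothetical.

```python
def chunk_ranges(n, chunk):
    """Split range(n) into contiguous (start, stop) tasks for a linear scan."""
    return [(start, min(start + chunk, n)) for start in range(0, n, chunk)]


def all_vs_all_tiles(n, chunk):
    """Tile the upper triangle of the n x n pair matrix into block tasks."""
    ranges = chunk_ranges(n, chunk)
    return [(rows, cols)
            for i, rows in enumerate(ranges)
            for cols in ranges[i:]]


def pairs_in_tile(rows, cols):
    """List the unordered pairs (i, j), i < j, covered by one tile."""
    return [(i, j)
            for i in range(*rows)
            for j in range(*cols)
            if i < j]


# Hypothetical sizes: 10 entries, 4 entries per chunk.
scan_tasks = chunk_ranges(10, 4)
tile_tasks = all_vs_all_tiles(10, 4)
tile_sizes = [len(pairs_in_tile(rows, cols)) for rows, cols in tile_tasks]
```

Scan tasks are nearly uniform, whereas tiles on the diagonal contain fewer pairs than off-diagonal tiles of the same dimensions, so an all-vs.-all run needs some form of load balancing rather than a simple even split.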
14
FY01 Milestones
    Connect SRB model to MDTDB
    Run full GenBank all-vs.-all analyses and analysis of MD trajectories
    Register phylogenetic algorithms with Legion
    Optimize analyses with improved scheduling
    Report results to computational science community
    Evangelize capabilities to computational science community
16
Benefits
    Novel science enabled
        Comprehensive scans of 3-D structure for functional sites
        Bird’s-eye understanding of sequence space
        Improved understanding of protein dynamics
        Most comprehensive phylogenetic trees ever constructed
    Capabilities made routine and widely available
        Templates for experiments made available
        Time, space estimates for computations for those making allocation requests
        In-house expertise at making these work
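A time and space estimate of the kind mentioned above could be as simple as the following back-of-the-envelope calculator. Everything numeric here is a placeholder assumption (collection size, seconds per comparison, bytes per stored result, CPU count), and the wall-clock figure assumes perfect parallel scaling, which real runs will not achieve.

```python
def work_units(n_items, kind):
    """Number of evaluations for a linear scan or an all-vs.-all analysis."""
    if kind == "scan":
        return n_items
    if kind == "all_vs_all":
        return n_items * (n_items - 1) // 2
    raise ValueError(f"unknown analysis kind: {kind!r}")


def estimate(n_items, kind, seconds_per_unit, bytes_per_result, n_cpus=1):
    """Return (cpu_hours, wall_hours, result_bytes) for one analysis.

    wall_hours assumes ideal scaling across n_cpus (an optimistic bound).
    """
    units = work_units(n_items, kind)
    cpu_hours = units * seconds_per_unit / 3600.0
    return cpu_hours, cpu_hours / n_cpus, units * bytes_per_result


# Hypothetical inputs: 10,000 entries, 0.01 s per comparison,
# 8 bytes stored per pairwise score, 64 CPUs.
cpu_hours, wall_hours, result_bytes = estimate(
    10_000, "all_vs_all", seconds_per_unit=0.01, bytes_per_result=8, n_cpus=64)
```

Measuring `seconds_per_unit` on a small sample of the real data before filling in the other inputs gives a first-order figure to put in an allocation request.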