NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE
Molecular Science in NPACI
Russ B. Altman
NPACI Molecular Science Thrust
Stanford Medical Informatics
Stanford Computer Science
NPACI Site Visit, July 21-22, 1999

Overview
- Molecular Science vision and roadmap
- Molecular Science project accomplishments
- Alpha project: Bioinformatics Infrastructure for Large-Scale Analyses
- Overview of plans

Molecular Science Is Changing...
- The genome sequencing project gives us unprecedented access to biological molecular information
- New experimental technologies (gene arrays) give new access to functional information
- Experiment and theory are refining structural data
- Combinatorial chemistry allows design of molecules
- New paradigm: collect the data first, then mine it later with hypotheses

Vision for Molecular Science Thrust
Understand how fundamental molecular properties contribute to macroscopic phenomena in chemistry and biology.
- Simulate molecular dynamics for large systems (e.g., biological molecules).
- Port existing codes to parallel machines, test them, and apply them to problems not currently within reach (CR, MS, PTE).
- Create databases for molecular systems to support exploratory analysis, hypothesis generation, communication, and dissemination.
- Create and populate data schemas for critical areas: biological macromolecules, MD trajectories, quantum computations (DICE, PTE).
- Create visualization technologies for communication and analysis (IE, MS).
- Provide hardened tools to the scientific community.
- Identify critical algorithms requiring HPC and implement them on NPACI hardware.
- Conduct education, outreach, and training of scientists and students (EOT, IE, CR).

[Diagram: Molecular Science: Advancing understanding of biochemical structure and function. Labeled elements: IE, META, Bioinformatics infrastructure, Large-scale molecular dynamics, Enhanced molecular chemistry, GenBank, PDB, Molecular Trajectory DB, CHARMM, AMBER, molecular dynamics, algorithms (comparison, phylogeny, alignment, scanning), DICE, federated data collections, remote database analysis, protein folding, molecular chemistry, quantum chemistry, transition states, imaging algorithms, bioinformatics.]

Projects and Accomplishments
Biological Data Representation & Query (SDSC, Rutgers, Stanford, Washington U, U Texas)
- All-vs.-all comparison of 3-D protein structures (SDSC)
- Site-scanning code for 3-D features (Stanford)
- Genetic algorithm code for large phylogenetic trees (U Texas)
- CORBA for distributed access to ligand DB (Rutgers)
Enhanced Biological Imaging (U Chicago, U Houston)
- Port of “optimal line” code for EM reconstruction

Projects and Accomplishments
Transition States in Complex Systems (UC Berkeley)
- Wrapped CHARMM, AMBER, and CPMD to oversample rare events
Quantum Reaction Dynamics (Caltech)
- Ported code for multi-atom reactions to HP Exemplar
Management (Stanford)
- New thrust management
- Thrust meeting in September 1998
- Two high-profile alpha projects (CHARMM, Analyses)
- One strategic application collaboration (AMBER)

Alpha Projects in Molecular Science
Rationale: Molecular science computing is, for the most part, workstation-based, and the uses of HPC are limited but critical:
- Long-time-scale, accurate simulations
- Large scans over data collections, both O(N) and O(N^2)
- Global optimizations of structures, alignments, networks
The requirements for technology support for all of these are significant:
- Grid computing = metasystems
- Movement of large amounts of data = data-intensive computing
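The following sketch makes the O(N) versus O(N^2) distinction concrete by counting the comparisons each kind of scan performs. It assumes that an all-vs-all analysis compares every unordered pair of records exactly once, which gives N(N-1)/2 comparisons. The function names and collection sizes are illustrative only and are not taken from any NPACI code.

```python
from itertools import combinations

def linear_scan_work(n: int) -> int:
    """An O(N) scan touches each of the n records once."""
    return n

def all_vs_all_work(n: int) -> int:
    """An all-vs-all run compares every unordered pair once: n(n-1)/2 comparisons."""
    return n * (n - 1) // 2

# Cross-check the closed form against explicit pair enumeration on a tiny collection.
records = ["r0", "r1", "r2", "r3", "r4"]
assert len(list(combinations(records, 2))) == all_vs_all_work(len(records))

# Illustrative collection sizes (not figures from the slides).
for n in (1_000, 10_000, 100_000):
    print(f"N={n:>7}: linear={linear_scan_work(n):>7} comparisons, "
          f"all-vs-all={all_vs_all_work(n):>13} comparisons")
```

Growing the collection from 1,000 to 100,000 records multiplies the linear work by 100. The pairwise work grows by a factor of roughly 10,000, which is why the all-vs-all analyses are the ones that call for HPC resources.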

Bioinformatics Infrastructure for Large-Scale Analyses
Need to construct prototype analyses to:
- Establish the feasibility of doing analyses routinely
- Debug the infrastructure for supporting analyses
- Provide templates for “copy-and-edit” duplication
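A “copy-and-edit” template could be structured roughly as in this minimal Python sketch. It has two parts: a generic O(N) scan that stays untouched, and a small per-record analysis function, which is the only part a user copies and edits. The `Frame` record, its fields, and the 4.0-angstrom cutoff are hypothetical stand-ins for a trajectory snapshot. Nothing here reflects the actual NPACI templates or the Legion and SRB interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, TypeVar

R = TypeVar("R")  # record type (e.g., a structure, a sequence, a trajectory frame)
H = TypeVar("H")  # hit type returned by the analysis

def linear_scan(records: Iterable[R], analyze: Callable[[R], Optional[H]]) -> Iterator[H]:
    """O(N) template: visit each record once and yield every non-None result of `analyze`."""
    for record in records:
        hit = analyze(record)
        if hit is not None:
            yield hit

# --- Copy and edit below this line for a new analysis. ---

@dataclass
class Frame:
    time_ps: float        # hypothetical: simulation time of the snapshot, in picoseconds
    site_distance: float  # hypothetical: distance (angstroms) between two active-site atoms

def site_formed(frame: Frame, cutoff: float = 4.0) -> Optional[float]:
    """Hypothetical per-frame test: report the time of frames in which the site is closed."""
    return frame.time_ps if frame.site_distance < cutoff else None

frames = [Frame(0.0, 6.1), Frame(1.0, 4.5), Frame(2.0, 3.8), Frame(3.0, 3.2), Frame(4.0, 5.0)]
print(list(linear_scan(frames, site_formed)))  # [2.0, 3.0]
```

Keeping the scan loop separate from the analysis means that a change to how records are fetched (for example, from a remote store) touches one function, while each new analysis touches only its own.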

Databases and Analyses
PDB (SDSC, Stanford)
- Linear scan searching for active sites
- All-by-all comparisons for clustering
GenBank (Washington U)
- All-by-all comparison of sequences over a set of alignment parameters, followed by clustering
- Linear scan through results to find new relations
Molecular Dynamics Trajectory DB (U Houston)
- Linear scan through time slices of the trajectory to look for features of interest (e.g., formation or disappearance of an active site)
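The “all-by-all comparison followed by clustering” pattern can be sketched as follows. For illustration only, a simple k-mer Jaccard similarity stands in for the alignment scores that a real GenBank run would compute over a set of alignment parameters. Single-linkage clustering, in which any one pair above a threshold joins two clusters, is one plausible clustering choice and not necessarily the one the project used. The sequences and the 0.5 threshold are made up.

```python
from __future__ import annotations

from itertools import combinations

def kmer_set(seq: str, k: int = 3) -> set[str]:
    """All length-k substrings of a sequence (a stand-in for a real alignment score)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by total distinct k-mers; 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def single_linkage_clusters(seqs: dict[str, str], threshold: float, k: int = 3) -> list[list[str]]:
    """All-by-all comparison followed by single-linkage clustering via union-find."""
    names = list(seqs)
    kmers = {name: kmer_set(seqs[name], k) for name in names}
    parent = {name: name for name in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # O(N^2) step: every unordered pair is compared exactly once.
    for a, b in combinations(names, 2):
        if jaccard(kmers[a], kmers[b]) >= threshold:
            parent[find(a)] = find(b)  # link the two clusters

    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

seqs = {"s1": "MKTAYIAKQR", "s2": "MKTAYIAKQL", "s3": "GGSWWHHPPE", "s4": "GGSWWHHPPD"}
print(single_linkage_clusters(seqs, threshold=0.5))  # [['s1', 's2'], ['s3', 's4']]
```

Single linkage merges clusters through any one qualifying pair, so the result depends only on which pairs pass the threshold. The pairwise step can therefore be split across machines, with the union-find merge done afterward.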

Required Technologies
Data-intensive Computing
- Robust connection to computational grid (Legion)
- Language for describing data schema to SRB
- Strategies for moving large amounts of data to NPACI CPUs
Metasystems
- Registration of key algorithms within Legion for platforms
- Robust connection to large data stores (SRB)
- Reusable scripts for running analyses

Bioinformatics Infrastructure for Large Analyses
Goal: Create reusable templates and demonstrate value
- Critical Databases Enabled for Grid Computing: PDB in SRB, GenBank in SRB, MDTDB in SRB, GeneArray DB in SRB
- Protein Analysis in Legion (O(N)); Sequence Analysis in Legion (O(N^2)); Phylogeny programs in Legion (O(N^2))
- Full Scale Runs of Algorithms on Databases
- Templates for large-scale O(N) and O(N^2) Analyses
- Report & Evangelize to Scientific Community

FY00 Milestones
- Connect SRB data model to PDB schema (XML)
- Connect SRB data model to GenBank (XML)
- Register linear PDB algorithms in Legion
- Register sequence algorithms for GenBank
- Analyze scheduling challenges for linear scans and all-vs.-all analyses
- Run linear scans on PDB and all-vs.-all on a subset of GenBank
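One scheduling question for all-vs-all analyses is how to divide the N(N-1)/2 comparisons among processors. The sketch below assumes a simple static strategy, chosen here only for illustration. It numbers the pairs row by row and gives each worker a contiguous range of indices, with range sizes differing by at most one. It does not model Legion's scheduler, data placement, or uneven per-comparison costs, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Iterator, List, Tuple

def pair_at(index: int, n: int) -> Tuple[int, int]:
    """Map a linear index over the upper triangle (i < j, row by row) to its (i, j) pair."""
    for i in range(n - 1):
        row_len = n - 1 - i  # row i holds pairs (i, i+1) .. (i, n-1)
        if index < row_len:
            return i, i + 1 + index
        index -= row_len
    raise IndexError("pair index out of range")

def chunk_bounds(total: int, workers: int) -> List[Tuple[int, int]]:
    """Split [0, total) into `workers` contiguous ranges whose sizes differ by at most one."""
    base, extra = divmod(total, workers)
    bounds, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

def pairs_in_range(n: int, start: int, stop: int) -> Iterator[Tuple[int, int]]:
    """Yield the pairs with linear indices start..stop-1, walking forward from pair_at(start)."""
    if start >= stop:
        return
    i, j = pair_at(start, n)
    for _ in range(stop - start):
        yield i, j
        j += 1
        if j == n:  # end of row i: move to the first pair of row i + 1
            i += 1
            j = i + 1

n_records, workers = 5, 3
total = n_records * (n_records - 1) // 2
tasks = [list(pairs_in_range(n_records, lo, hi)) for lo, hi in chunk_bounds(total, workers)]
for w, task in enumerate(tasks):
    print(f"worker {w}: {task}")

# Together the tasks cover every unordered pair exactly once.
assert [p for task in tasks for p in task] == list(combinations(range(n_records), 2))
```

Contiguous ranges keep each task's pairs within a few rows, which tends to limit how many distinct records a worker must fetch. When comparison costs vary widely (for example, with sequence length), a dynamic work queue would likely balance the load better.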

FY01 Milestones
- Connect SRB model to MDTDB
- Run full GenBank all-vs.-all analyses and analysis of MD trajectories
- Register phylogenetic algorithms with Legion
- Optimize analyses with improved scheduling
- Report results to computational science community
- Evangelize capabilities to computational science community

Benefits
Novel science enabled
- Comprehensive scans of 3-D structure for functional sites
- Bird’s-eye understanding of sequence space
- Improved understanding of protein dynamics
- Most comprehensive phylogenetic trees ever constructed
Capabilities made routine and widely available
- Templates for experiments made available
- Time and space estimates of computations for those making allocation requests
- In-house expertise at making these work
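A back-of-the-envelope calculation may be enough for the time and space estimates that the Benefits slide lists for allocation requests. This sketch assumes two things:
- Cost scales with the number of comparisons: N for a linear scan, N(N-1)/2 for all-vs-all.
- Each comparison has a fixed cost.

The record count, 10 ms per comparison, and 16 bytes per result are placeholders, not measurements. They would be replaced with timings from a small benchmark run. The function name is hypothetical.

```python
def estimate_allocation(n_records: int, seconds_per_comparison: float,
                        bytes_per_result: int, all_vs_all: bool = True) -> dict:
    """Rough CPU-time and output-size estimate for an O(N) scan or an O(N^2) all-vs-all run."""
    comparisons = n_records * (n_records - 1) // 2 if all_vs_all else n_records
    cpu_hours = comparisons * seconds_per_comparison / 3600.0
    result_gb = comparisons * bytes_per_result / 1e9
    return {"comparisons": comparisons, "cpu_hours": cpu_hours, "result_gb": result_gb}

# Placeholder inputs (not measured values): 10,000 records, 10 ms per comparison,
# 16 bytes stored per pairwise result.
est = estimate_allocation(10_000, seconds_per_comparison=0.01, bytes_per_result=16)
print(f"{est['comparisons']:,} comparisons, "
      f"{est['cpu_hours']:.1f} CPU-hours, {est['result_gb']:.2f} GB of results")
```

With these placeholder inputs the all-vs-all run needs about 139 CPU-hours. Setting `all_vs_all=False` for the same collection gives a linear scan that finishes in well under a minute, which shows why the pairwise analyses dominate an allocation request.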