Computational Biology and Bioinformatics in Computer Science
Lenwood S. Heath
Department of Computer Science, 2160J Torgersen Hall, Virginia Tech
Department Seminar Series, September 9, 2005
9/9/2005 Computational Biology and Bioinformatics
Overview
Computational biology and bioinformatics (CBB)
  What is it?
  History at VT
  Some biological terminology
CBB faculty and projects
Education in CBB
  Bioinformatics option
  GBCB
Conclusion
Computational Biology and Bioinformatics (CBB)
Computational biology: computational research inspired by biology
Bioinformatics: application of computational research (computer science, mathematics, statistics) to advance basic and applied research in the life sciences
  Agriculture
  Basic biological science
  Medicine
Both ideally done within multidisciplinary collaborations
CBB History (Part I)
Biological modeling (Tyson, Watson): more than 20 years
Computational biology, genome rearrangements (Heath): more than 10 years
Fralin Biotechnology sponsored faculty advisory committee centered on bioinformatics: 1998-2000
  Biochemistry; biology; CALS; computer science (Heath, Watson); statistics; VetMed
  Provost provided $1 million seed money
  First VT bioinformatics hire (Gibas, biology, 1999)
CBB History (Part II)
Outside initiative submitted to VT for a campus bioinformatics center: 1998
Discussions of the bioinformatics advisory committee contributed to a proposal to the Gilmore administration: 1999
Governor Gilmore put plans and money for a bioinformatics center in the budget: 1999-2000
Virginia Bioinformatics Institute (VBI) established July 2000; housed in CRC
Virginia Bioinformatics Institute (VBI)
Established by the state in July 2000; high visibility
Applies computational and information technology in biological research
Research faculty (currently about 18); expertise includes:
  Biochemistry
  Comparative Genomics
  Computer Science
  Drug Discovery
  Human and Plant Pathogens
  Mathematics
  Physics
  Simulation
  Statistics
More than $43 million in funded research
CBB History (Part III)
Bioinformatics course and curriculum development began with a faculty subcommittee: 1999
Courses supporting bioinformatics are now offered in many life science and computational science departments, including:
  Biology
  Biochemistry
  Computer Science
  Plant Pathology, Physiology, and Weed Science (PPWS)
  Mathematics
  Statistics
Some Molecular Biology
The encoded instruction set for an organism is kept in DNA molecules.
Each DNA molecule contains hundreds or thousands of genes.
A gene is transcribed to an mRNA molecule.
An mRNA molecule is translated to a protein (molecule).
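The two steps named above, transcription and translation, can be sketched in a few lines of Python. This is only an illustration and not part of the talk: the function names `transcribe` and `translate` are hypothetical, the input is assumed to be the coding strand of a gene containing only the letters A, C, G, and T, and the lookup table is the standard genetic code.

```python
from itertools import product

# Standard genetic code with codons enumerated in U, C, A, G order.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Return the mRNA for a DNA coding strand (assumption: T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG start codon up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no protein in this simplified model
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("CCATGAAATGGTGA")
    print(mrna, translate(mrna))
```

The sketch ignores splicing, regulation, and the non-coding strand; it only shows that the mapping from gene to protein is a table lookup over triplets.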
Elaborating Cellular Function
[Diagram: DNA is transcribed to mRNA, and mRNA is translated (genetic code) to protein; the diagram also labels reverse transcription, regulation, and degradation.]
Protein functions:
  Structure
  Catalyze chemical reactions
  Regulate transcription
Thousands of genes!
Chromosomes
Large molecules of DNA: 10^4 to 10^8 base pairs.
Human chromosomes: 22 matched pairs plus X and Y.
A gene is a subsequence of a chromosome that encodes a protein.
Proteins associated with regulation are present in chromosomes.
Every gene is present in every cell.
Only a fraction of the genes are in use (“expressed”) at any time.
Genomics
Genomics: Discovery of genetic sequences and the ordering of those sequences into individual genes, into gene families, and into chromosomes. Identification of sequences that code for gene products/proteins and sequences that act as regulatory elements.
Functional Genomics
Functional Genomics: The biological role of individual genes, mechanisms underlying the regulation of their expression, and regulatory interactions among them.
Challenges for Computer Science
Analyzing and synthesizing complex experimental data
Representing and accessing vast quantities of information
Pattern matching
Data mining
Gene discovery
Function discovery
Modeling the dynamics of cell function
CBB Faculty in CS
Chris Barrett (VBI, CS)
Vicky Choi
Roger Ehrich
Edward A. Fox
Lenny Heath
Madhav Marathe (VBI, CS)
T. M. Murali
Chris North
Alexey Onufriev
Naren Ramakrishnan
Adrian Sandu
Eunice Santos
João Setubal (VBI, CS)
Cliff Shaffer
Anil Vullikanti (VBI, CS)
Layne Watson
Liqing Zhang
Established CBB Faculty
Layne Watson
Lenny Heath
Cliff Shaffer
Naren Ramakrishnan
Eunice Santos
Layne Watson
Professor of Computer Science and Mathematics
Expertise: algorithms; image processing; high performance computing; optimization; scientific computing
Computational biology: has worked with John Tyson (biology) for over 20 years
JigCell: cell-cycle modeling environment; with Tyson, Shaffer, Ramakrishnan, Pedro Mendes of VBI
Expresso: microarray experimentation; with Heath, Ramakrishnan
Lenny Heath
Professor of Computer Science
Expertise: algorithms; theoretical computer science; graph theory
Computational biology: worked in genome rearrangements 10 years ago
Bioinformatics: concentration in past 5 years
  Expresso: microarray experimentation; with Ramakrishnan, Watson
  Multimodal networks
  Computational models of gene silencing
Cliff Shaffer
Associate Professor of Computer Science
Expertise: algorithms; problem solving environments; spatial data structures
JigCell: cell-cycle modeling environment; with Ramakrishnan, Tyson, Watson
Naren Ramakrishnan
Associate Professor of Computer Science
Expertise: data mining; machine learning; problem solving environments
JigCell: cell-cycle modeling problem solving environment; with Shaffer, Watson
Expresso: microarray experimentation; with Heath, Watson
Proteus: inductive logic programming system for biological applications
Computational models of gene silencing
Eunice Santos
Associate Professor of Computer Science
Expertise: algorithms; computational biology; computational complexity; parallel and distributed processing; scientific computing
Relevant bioinformatics project: modeling progress of breast cancer
New CBB Faculty
T. M. Murali (2003): CS bioinformatics hire
Alexey Onufriev (2003): CS bioinformatics hire
Adrian Sandu (2004): CS hire
João Setubal (early 2004): VBI and CS
Vicky Choi (2004): CS bioinformatics hire
Liqing Zhang (2004): CS bioinformatics hire
Chris Barrett, Madhav Marathe (Fall 2004): VBI and CS
Anil Vullikanti (Fall 2004): VBI and CS
Yang Cao (January 2006): CS bioinformatics hire
T. M. Murali
Assistant Professor of Computer Science
Hired in 2003 for bioinformatics group
Expertise: algorithms; computational geometry; computational systems biology
Projects:
  Functional gene annotation
  xMotif: find patterns of coexpression among subsets of genes
  RankGene: rank genes according to predictive power for disease
Alexey Onufriev
Assistant Professor of Computer Science
Hired in 2003 for bioinformatics group
Expertise: computational and theoretical biophysics and chemistry; structural bioinformatics; numerical methods; scientific programming
Projects:
  Biomolecular electrostatics
  Theory of cooperative ligand binding
  Protein folding
  Protein dynamics: how does myoglobin take up oxygen?
  Computational models of gene silencing
Adrian Sandu
Associate Professor of Computer Science
Hired in 2003
Expertise: computational science; numerical methods; parallel computing; scientific and engineering applications
Computational science:
  New generation of air quality models
  Computational tools for assimilation of atmospheric chemical and optical measurements into atmospheric chemical transport models
João Setubal
Research Associate Professor at VBI
Associate Professor of Computer Science
Joined in early 2004
Expertise: algorithms; computational biology; bacterial genomes
Comparative genomics
Vicky Choi
Assistant Professor of Computer Science
Hired in 2004 for bioinformatics group
Expertise: computational biology; algorithms
Projects:
  Algorithms for genome assembly
  Protein docking
  Biological pathways
Liqing Zhang
Assistant Professor of Computer Science
Hired in 2004 for bioinformatics group
Expertise: evolutionary biology; bioinformatics
Research interests:
  Comparative evolutionary genomics
  Functional genomics
  Multi-scale models of bacterial evolution
Selected CBB Research Projects
JigCell
Expresso
Multimodal Networks
Computational Modeling of Gene Silencing
JigCell: A PSE for Eukaryotic Cell Cycle Controls
Marc Vass, Nick Allen, Jason Zwolak, Dan Moisa, Clifford A. Shaffer, Layne T. Watson, Naren Ramakrishnan, and John J. Tyson
Departments of Computer Science and Biology
Cell Cycle of Budding Yeast
[Figure: wiring diagram of the budding yeast cell cycle. Labeled components include Cln2, Cln3, Clb2, Clb5, Bck2, SBF, MBF, Mcm1, Swi5, Sic1, SCF, Cdh1, Cdc14, Cdc15, Cdc20, APC, Mad2, Bub2, Tem1, Lte1, Net1, RENT, PPX, Pds1, Esp1, and CDKs. Labeled events include growth, budding, DNA synthesis, unaligned chromosomes, and sister chromatid separation.]
JigCell Problem-Solving Environment
[Diagram components: Experimental Database; Wiring Diagram; Differential Equations; Parameter Values; Analysis; Simulation; Visualization; Automatic Parameter Estimation]
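To make the path from a wiring diagram to differential equations, parameter values, and simulation concrete, here is a minimal sketch for a single hypothetical species X that is synthesized at a constant rate and degraded in proportion to its concentration, dX/dt = k_syn - k_deg * X. The model, the parameter names `k_syn` and `k_deg`, and the forward Euler integrator are all assumptions chosen for illustration; none of this is taken from JigCell or from the budding yeast model.

```python
def simulate(k_syn, k_deg, x0=0.0, dt=0.01, t_end=50.0):
    """Integrate dX/dt = k_syn - k_deg * X with forward Euler.

    Returns the list of X values at t = 0, dt, 2*dt, ..., t_end.
    """
    steps = int(round(t_end / dt))
    x = x0
    trajectory = [x]
    for _ in range(steps):
        # One Euler step: rate of change times the step size.
        x += dt * (k_syn - k_deg * x)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    traj = simulate(k_syn=1.0, k_deg=0.5)
    # The analytic steady state is k_syn / k_deg.
    print("final X:", traj[-1], "steady state:", 1.0 / 0.5)
```

Comparing the simulated trajectory with measured values is the kind of check that parameter estimation automates: the rate constants are adjusted until simulation and experiment agree.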
Why do these calculations?
Is the model “yeast-shaped”?
Bioinformatics role: the model organizes experimental information.
New science: prediction, insight
JigCell is part of the DARPA BioSPICE suite of software tools for computational cell biology.
Expresso: A Next Generation Software System for Microarray Experiment Management and Data Analysis
Scenarios for Effects of Abiotic Stress on Gene Expression in Plants
The Expresso Pipeline
Proteus: Data Mining with ILP
ILP (inductive logic programming): a data mining algorithm for inferring relationships or rules
Proteus: efficient system for ILP in bioinformatics context
Flexibly incorporates a priori biological knowledge (e.g., gene function) and experimental data (e.g., gene expression)
Infers rules without explicit direction
Fusion (Chris North)
“Snap together” visualization environment
Interactively linked data from multiple sources
Data mining in the background
Sequence Analysis
Evolution implies changes in genomic sequence through mutations and other mechanisms.
Genomic or protein sequences that are similar because they share a common ancestor are called homologous.
Algorithms that detect homology from genomic data provide access to evolutionary relationships and perhaps to conservation of function.
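One common way to quantify sequence similarity is a global alignment score computed by dynamic programming (the Needleman-Wunsch recurrence). The sketch below is offered only as an illustration of that general technique; the talk does not name a specific algorithm, and the function name `global_alignment_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions. Practical tools use substitution matrices and more refined gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b. Initially the prefix of a is empty,
    # so every character of b must be aligned against a gap.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first i characters of a against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # a[i-1] aligned with a gap
                curr[j - 1] + gap,           # b[j-1] aligned with a gap
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("AC", "AGC"))
```

A high score indicates similarity, which is evidence for homology rather than proof of it; judging significance requires a statistical model of scores between unrelated sequences.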
Networks in Bioinformatics
Mathematical Model(s) for Biological Networks
Representation: What biological entities and parameters to represent and at what level of granularity?
Operations and Computations: What manipulations and transformations are supported?
Presentation: How can biologists visualize and explore networks?
Reconciling Networks
Munnik and Meijer, FEBS Letters, 2001
Shinozaki and Yamaguchi-Shinozaki, Current Opinion in Plant Biology, 2000
Multimodal Networks
Nodes and edges have flexible semantics to represent:
  Time
  Uncertainty
  Cellular decision making; process regulation
  Cell topology and compartmentalization
  Rate constants
  Phylogeny
Hierarchical
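The idea of nodes and edges with flexible semantics can be sketched as a graph whose nodes and edges each carry an open-ended dictionary of attributes. The class names `Edge` and `AttributedNetwork`, the attribute keys, and the example entities below are all hypothetical; this is not the multimodal network formalism from the talk, only one simple way to picture it.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    source: str
    target: str
    attrs: dict = field(default_factory=dict)  # e.g. kind, confidence, rate


class AttributedNetwork:
    """Directed graph whose nodes and edges carry arbitrary attributes."""

    def __init__(self):
        self.nodes = {}  # node name -> attribute dictionary
        self.edges = []  # list of Edge objects

    def add_node(self, name, **attrs):
        self.nodes.setdefault(name, {}).update(attrs)

    def add_edge(self, source, target, **attrs):
        # Endpoints are created on demand so edges never dangle.
        self.add_node(source)
        self.add_node(target)
        self.edges.append(Edge(source, target, attrs))

    def edges_where(self, **conditions):
        """Return edges whose attributes equal every given condition."""
        return [
            e for e in self.edges
            if all(e.attrs.get(k) == v for k, v in conditions.items())
        ]


if __name__ == "__main__":
    net = AttributedNetwork()
    net.add_node("geneA", compartment="nucleus")
    net.add_edge("proteinB", "geneA", kind="regulates", confidence=0.8)
    net.add_edge("geneA", "proteinA", kind="encodes", rate_constant=0.1)
    print([(e.source, e.target) for e in net.edges_where(kind="regulates")])
```

Keeping attributes open-ended lets the same structure record time, uncertainty, compartment, or rate constants without changing the graph code, at the cost of having no fixed schema to validate against.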
Using Multimodal Networks
Help biologists find new biological knowledge
Visualize and explore
Generate hypotheses and experiments
Predict regulatory phenomena
Predict responses to stress
Incorporate into Expresso as part of closing the loop
Computational Modeling of Gene Silencing (CMGS)
Lenwood S. Heath, Richard Helm, Alexey Onufriev, Naren Ramakrishnan, and Malcolm Potts
Departments of Computer Science and Biochemistry
RNA Interference (RNAi)
CMGS System
Other CBB Research Projects
Bacterial genomics: Setubal
xMotif: Murali
Plant Orthologs and Paralogs (POPS): Heath, Murali, Setubal, Zhang, Ruth Grene (plant physiology)
Protein structure and docking: Choi
Whole-genome functional annotation: Murali
Modeling biomolecular systems: Onufriev
CBB Education at VT
CS has been training CS graduate students in CBB since 2000
Graduate bioinformatics option established in a number of participating departments: 2003
Ph.D. program in Genetics, Bioinformatics, and Computational Biology (GBCB): 2003
First GBCB students arrived Fall 2003; now in third year
CBB Education in CS
A key department of the Ph.D. program in Genetics, Bioinformatics, and Computational Biology (GBCB)
Computation for the Life Sciences I, II
Algorithms in Bioinformatics
Systems Biology
Structural Bioinformatics and Computational Biophysics
Databases for Bioinformatics
Conclusions
Important research area in the department
Close collaboration between life scientists and computational scientists from the beginning of CBB research at VT
Educational approach insists on adequate multidisciplinary background
Multidisciplinary collaborators work closely on a regular basis
Contributions to biology or medicine are essential outcomes
Supported by:
Next Generation Software
Information Technology Research
NSF