CS 7010: Computational Methods in Bioinformatics (course review)
Dong Xu
Computer Science Department
271C Life Sciences Center, 1201 East Rollins Road
University of Missouri-Columbia, Columbia, MO

Technical Definitions (NIH)
• Bioinformatics: “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, represent, describe, store, analyze, or visualize such data”.
• Computational Biology: “the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems”.

Course Topics
• Data interpretation in analytical technologies
• Data management and computational infrastructure
• Discovery from data mining
• Modeling, prediction and design
• Theoretical in silico biology
The course covers classical/mainstream bioinformatics problems from a computer science perspective.

Discovery from Data Mining (I)
Discovery from Data Mining (II)
• Data source
  - Genomic / protein sequence
  - Microarray data
  - Protein interaction
• Complicated data
  - Large-scale, high-dimension
  - Noisy (false positives and false negatives)

Discovery from Data Mining (III)
Pattern/knowledge discovery from data
  - many biological data are generated by biological processes which are not well understood
  - interpretation of such data requires discovery of convoluted relationships hidden in the data
    - which segment of a DNA sequence represents a gene or a regulatory region
    - which genes are possibly responsible for a particular disease

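As a toy illustration of the question "which segment of a DNA sequence represents a gene", the sketch below scans the forward strand for open reading frames (an ATG followed by an in-frame stop codon). This is a deliberately naive stand-in: practical gene finders rely on statistical models rather than a plain codon scan, and the function name `find_orfs` and the `min_codons` threshold are hypothetical choices made for this example.

```python
def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs.

    An ORF here is an ATG followed by an in-frame stop codon, with at
    least `min_codons` codons before the stop (the ATG counts as one).
    The reverse strand is ignored to keep the sketch short.
    """
    stops = {"TAA", "TAG", "TGA"}
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stops:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end index is exclusive
                start = None
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATAGCC")` reports the single candidate spanning `ATGAAATAG`. Noisy data of the kind mentioned above would produce both false positives and false negatives with such a simple rule.
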
Modeling, Prediction and Design (I)
• Modeling and prediction of biological objects/processes
  - Sequence comparison
  - Secondary structure prediction
  - Gene finding
  - Regulatory sequence identification

Modeling, Prediction and Design (II)
• Prediction of outcomes of biological processes
  - computing will become an integral part of modern biology through an iterative process of model formulation, computational prediction, and experimental validation
• From prediction to engineering design
  - Drug design
  - Protein structure prediction to protein engineering
  - Design of genetically modified species

Scope of Bioinformatics
• bioinformatics: an indispensable part of biological science
• engineering aspect; scientific aspect
• data management; data mining; modeling; prediction; theory formulation
• genes, proteins, protein complexes, pathways, cells, organisms, ecosystem
• computer science, biology, statistics, mathematics, physics, chemistry, engineering, …

Bioinformatics Foundations
• Technology
• Biology/medicine
• Computer Science
• Statistics
• From interdisciplinary field to a distinct discipline

Course Coverage
• A general introduction to the field of bioinformatics
  - problem definitions: from biological problem to computable problem
  - key computational techniques
• A way of thinking: tackling a “biological problem” computationally
  - how to look at a biological problem from a computational point of view
  - how to formulate a computational problem to address a biological issue
  - how to collect statistics from biological data
  - how to build a computational model
  - how to design algorithms for the model
  - how to test and evaluate a computational algorithm
  - how to assess the confidence of a prediction result

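One common way to put a number on the confidence of a prediction is an empirical p-value: score the real sequence, score many shuffled versions of it, and ask how often chance alone does as well. The sketch below is a minimal illustration under that assumption; the helper names `empirical_p_value` and `shuffled_null`, and the idea of passing an arbitrary `score_fn`, are hypothetical and not taken from the course material.

```python
import random


def empirical_p_value(observed, null_scores):
    """Fraction of null scores at least as large as the observed score.

    The +1 in numerator and denominator keeps the estimate above zero
    when no null score reaches the observed value.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def shuffled_null(sequence, score_fn, n=1000, seed=0):
    """Score n random shuffles of `sequence` (composition is preserved)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    letters = list(sequence)
    scores = []
    for _ in range(n):
        rng.shuffle(letters)
        scores.append(score_fn("".join(letters)))
    return scores
```

A typical use would be `empirical_p_value(score_fn(seq), shuffled_null(seq, score_fn))`, where `score_fn` is whatever prediction score is being evaluated. Shuffling preserves letter composition only, so it is a weak null model for sequences with strong local structure.
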
Dong’s top 10 list for computational methods in BI
1. Dynamic programming
2. Neural network
3. Hidden Markov Model
4. Hypothesis test
5. Bayesian statistics
6. Clustering
7. Information theory
8. Support Vector Machine
9. Maximum likelihood
10. Sampling search (Gibbs, Monte Carlo, etc.)

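To make the first item concrete, here is a minimal sketch of dynamic programming applied to pairwise sequence comparison: a global alignment score in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name `global_alignment_score` are illustrative assumptions; real applications typically use substitution matrices and affine gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b.

    Only the previous row of the DP table is kept, so memory use is
    proportional to len(b); the alignment itself is not recovered.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
        prev = curr
    return prev[-1]
```

With these parameters, aligning `"AC"` against `"AGC"` scores 0 (two matches and one gap). Each cell depends only on three neighbors, which is what lets the table be filled in time proportional to the product of the two sequence lengths.
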
Research Areas
1. “Solved” problems
2. “Developed” areas with remaining challenges hard to solve
3. Developing areas
4. Emergent areas
5. Future directions

“Solved” Problems
• DNA sequence base calling and assembly
• Pairwise sequence comparison
• Protein secondary structure prediction
• Disordered region in proteins
• Transmembrane segment prediction
• Subcellular localization
• Signal peptide prediction
• Protein geometry
• Homology modeling
• Physical/genetic mapping informatics

“Developed” areas with remaining challenges
• Gene finding
• Phylogenetic tree construction and evolution
• Protein docking
• Drug design
• Protein design
• Linkage analysis and quantitative traits (QTL)
• Microarray data collection
• Gene expression clustering

Developing Areas
• Multiple sequence comparison and remote homolog search
• Repetitive sequence analysis
• Protein structure comparison
• Protein tertiary structure prediction
• RNA secondary structure prediction
• Regulatory sequence analysis
• Computational proteomics
• Protein interaction networks
• Gene ontology and function prediction
• Computational neuroscience and applications in various species and systems (e.g., cancer)

Emergent Areas
• Pathway (regulatory network) prediction
• ChIP-chip analysis
• Tiling array analysis
• Haplotype/SNP analysis
• Computational comparative genomics
• Text (literature) mining
• Small RNA and anti-sense regulation
• Alternative splicing prediction
• Computational metabolomics

Possible future directions
• Genome semantics
• Membrane protein structure prediction
• RNA tertiary structure prediction
• Post-translational modification
• Dynamics of regulatory networks
• Virtual cell/organism modeling
• Phenotype-genotype relationship
• … (nobody knows)

Where is the science going? (1)
• Bioinformatics has been a “technology” for biological research: interpretation of data generated by bench biologists
• We are starting to see a trend in which computational predictions guide experimental design
• As more high-throughput technologies become available, discovery-driven science will play an increasingly important role in biological research
• As computational techniques continue to mature for biological applications, we will see more and more computational applications with powerful prediction capabilities

Where is the science going? (2)
• Like physics, where general rules and laws are taught at the start, biology will surely be presented to future generations of students as a set of basic systems duplicated and adapted to a very wide range of cellular and organismic functions, following basic evolutionary principles constrained by Earth’s geological history.
  --Temple Smith, Current Topics in Computational Molecular Biology

Major research centers (1)
• National Center for Biotechnology Information (NCBI) of NIH
  - the home of many important databases including GenBank
  - the home of many important bioinformatics tools including BLAST

Major research centers (2)
• European Molecular Biology Laboratory (EMBL)
  - has some of the most powerful research groups in bioinformatics
  - has numerous tools and databases

Major research centers (3)
• Sanger Institute
• The Institute for Genomic Research (TIGR)
• Swiss-Prot

Major Universities in the US
• University of California at Santa Cruz
• University of California at San Diego
• Washington University
• University of Southern California
• Stanford University
• Columbia University
• Boston University
• Harvard University
• MIT
• Virginia Tech

Major journals
  - Bioinformatics
  - Nucleic Acids Research
  - Genome Research
  - Journal of Computational Biology
  - Journal of Bioinformatics and Computational Biology
  - In Silico Biology
  - Briefings in Bioinformatics
  - Applied Bioinformatics
  - IEEE/ACM Transactions on Computational Biology and Bioinformatics
  - Proteins: Structure, Function, and Bioinformatics
  - Journal of Computer Science and Technology
  - Genomics, Proteomics and Bioinformatics
  - …

Major conferences
  - Intelligent Systems for Molecular Biology (ISMB)
  - Annual International Conference on Research in Computational Molecular Biology (RECOMB)
  - IEEE/Computational Systems Bioinformatics Conference (CSB)
  - Pacific Symposium on Biocomputing (PSB)
  - European Conference on Computational Biology (ECCB)
  - IEEE International Conference on Bioinformatics and Bioengineering (BIBE)
  - International Workshop on Genome Informatics (GIW)
  - Asia-Pacific Bioinformatics Conference (APBC)
  - …

Academicians
• Michael Waterman
• Phil Green
• Gene Myers
• Barry Honig
• No Nobel Prize winner yet…

Discussions
• Scope of the new biology (large-scale)
• Technology (tool development) vs. science (biological application)
• Knowledge vs. prediction
• Experimental vs. computational/theoretical
• First principle vs. empirical / statistical
• Automated vs. curated
“One machine can do the work of fifty ordinary men. No machine can do the work of one extraordinary man.”

Choosing Bioinformatics as a Career - 1
• Field outlook
• Must be a believer in bioinformatics (for its value to science)
• Must have strong motivation and be willing to go the extra mile (learn more disciplines)
• Technologist vs. technician

Choosing Bioinformatics as a Career - 2
• Molecular & cellular and evolutionary biology
  - understanding the science
• Computational, mathematical, and statistical sciences
  - mastering the techniques
• High-throughput measurement technologies
  - knowing what biological data are obtainable