1
Algorithms in Computational Biology
Tanya Berger-Wolf
Compbio.cs.uic.edu/~tanya/teaching/CompBio
January 17, 2017
2
1D, 2D, 3D representation of DNA
3
The Central Dogma of Molecular Biology
DNA -> RNA -> Protein
4
The Central Dogma of Molecular Biology
DNA-RNA-Protein [photo credit (three slides): Bis2A (
5
The Central Dogma of Molecular Biology
(1) This happens in the cell nucleus.
(2) Not every letter is transcribed.
(3) Some are promoter regions involved in regulation of transcription and DNA replication.
(4) Some are “introns,” or untranslated regions of DNA, which are transcribed to RNA but then spliced out before translation.
(5) Others are called “junk DNA” for their apparent lack of understood function; some are retroviral DNA, and some may really do little but space other DNA stretches, or ... possibly nothing.
6
The Central Dogma of Molecular Biology
Process of translation: mRNA (messenger RNA) is read by a ribosome, which uses tRNA (transfer RNA) to bring in the matching amino acids.
7
The Central Dogma of Molecular Biology
4^3 = 64 possible nucleotide triplets (codons) encode 20 amino acids plus stop signals. More or less universal! Some organisms have minor departures in which triplet encodes a particular amino acid.
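The counting argument on this slide can be checked with a short sketch. It assumes the standard genetic code written in the common TCAG ordering; the names `CODON_TABLE` and `translate` are illustrative and not from the lecture. The test fragment is taken from the DNA shown on the "Topics in Bioinformatics" slide of this deck, starting at its first ATG.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (standard genetic code).
# '*' marks a stop codon. Organisms with variant codes would need a different string.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a DNA coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

if __name__ == "__main__":
    amino_acids = set(CODON_TABLE.values()) - {"*"}
    print(len(CODON_TABLE))                          # 64
    print(len(amino_acids))                          # 20
    print(translate("ATGAAAATCGTATACTGGTCTGGTACCGGC"))  # MKIVYWSGTG
```

Because 64 codons map onto only 20 amino acids and a stop signal, most amino acids are encoded by more than one triplet.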
8
The Central Dogma of Molecular Biology
9
DNA Sequencing
12
What is Computational Biology?
No standard definition!
Our definition: computational techniques for biological problems
  Data acquisition, management and representation (bioinformatics)
  Pattern analysis and data mining (bioinformatics)
  Data analysis and optimization
  Using bio data to solve other problems (medicine, public policy, etc.)
Computational biology touches all parts of computer science
  Databases
  Data streaming
  HPC and systems
  Networking
  Algorithms
  Privacy and security
  Image processing
  Visualization
13
Why is CompBio Important?
Biology perspective
  More and more biological information is available => need for effectively accessing and using the information
  As more detailed information is available, different questions can be asked (models of evolution) => requires new math
Computer science perspective
  Excellent application domain
  Poses special computational challenges
  Brings computer science closer to scientific discovery
  Currently growing …
14
CompBio and Other Fields
[Diagram: Bioinformatics/CompBio shown among related fields. Labels: Computer Science, Biology, Information Management, Biochemistry, Molecular Biology, Theoretical CS, Machine Learning, Data Mining, Biophysics, Numerical Computing, Applied Mathematics & Statistics]
15
CompBio and Bioinformatics
From Chris Burge’s MIT OpenCourseWare 7.91/20.490/6.874/HST.506
17
1980s: Sequence Alignment/Search
Which specific residues/positions in a pair of proteins are homologous?
  Smith-Waterman alignment algorithm
What RNA secondary structure has minimum folding free energy?
  Nussinov algorithm
  Zuker algorithm
How to rapidly and reliably find homologs to a query sequence in a sequence database?
  FastA and BLAST algorithms and associated statistics
Photos: Temple F. Smith and Michael S. Waterman (copyright Michael Waterman), Ruth Nussinov, Michael Zuker
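As an illustration of the local alignment idea behind Smith-Waterman, here is a minimal sketch. It assumes a simple linear gap penalty and made-up scores (match +2, mismatch -1, gap -2); real protein alignment would typically use a substitution matrix and affine gaps, which this sketch does not attempt. The function name `smith_waterman` and its parameters are illustrative only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one best local alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh local alignment here
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

if __name__ == "__main__":
    print(smith_waterman("TTTACGTAAA", "GGGACGTCCC"))  # (8, 'ACGT', 'ACGT')
```

The `max(0, ...)` term is what makes the alignment local: a poorly matching prefix is dropped rather than carried along as a penalty.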
18
Al Gore Learns to Search PubMed
NCBI Director David Lipman (far left) coaches Vice President Gore (seated) as he searches PubMed. NIH Director Harold Varmus (center) and NLM Director Donald Lindberg (far right) look on. June 26, Photograph by the National Center for Biotechnology Information; in the public domain.
19
1990s: HMMs, Ab Initio Protein Structure Prediction, Genomics, Comparative Genomics
How to identify domains in a protein? How to identify genes in a genome?
  Hidden Markov Models as a framework for such problems
How to study gene expression globally, infer gene function from expression?
  Microarrays and clustering
How to predict protein function by comparing genomes?
  Gene fusions, phylogenetic profiling, etc.
How to predict protein structure directly from primary sequence?
  Rosetta algorithm
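To give a feel for how a Hidden Markov Model labels a sequence, the sketch below runs the Viterbi algorithm on a toy two-state model. The states ("AT" for AT-rich, "GC" for GC-rich) and all probabilities are invented for illustration; this is not a gene finder or domain model from the lecture, only the decoding step that such models share.

```python
import math

# Hypothetical toy model: two hidden states with different base composition.
STATES = ["AT", "GC"]
START_P = {"AT": 0.5, "GC": 0.5}
TRANS_P = {
    "AT": {"AT": 0.9, "GC": 0.1},
    "GC": {"AT": 0.1, "GC": 0.9},
}
EMIT_P = {
    "AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path for obs (log-space Viterbi)."""
    # V[t][s] = log-probability of the best path that ends in state s at position t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []  # back[t-1][s] = best predecessor of state s at position t
    for x in obs[1:]:
        scores, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: V[-1][r] + math.log(trans_p[r][s]))
            ptr[s] = prev
            scores[s] = (V[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][x]))
        V.append(scores)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

if __name__ == "__main__":
    print(viterbi("AAAAAAGGGGGG", STATES, START_P, TRANS_P, EMIT_P))
```

On this input the decoded path labels the six A's as "AT" and the six G's as "GC", since switching state once is far cheaper than emitting six unlikely bases.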
20
2000s Part 1: The human genome is sequenced, assembled, annotated; genomics becomes fashionable
Craig Venter, public domain image
Photo of the Human Genome Project pioneers © Mayo Foundation for Medical Education and Research. All rights reserved.
Ewan Birney, public domain image
Jim Kent, public domain image
21
2000s Part 2: Biological Experiments Become High-Throughput, Computational Biology Becomes More Biological
Massively parallel data collection: transcriptomics, proteomics, interactomics, metagenomics
Using sequence and array data to address fundamental questions about transcription, splicing, microRNAs, translation, epigenetics, protein structure/function, development, evolution, disease, etc.
Integrated computational/experimental approaches
Rise of bioimage informatics
Image credits: Courtesy of Marc Vidal. Used with permission. Courtesy of Donald G. Moerman and Benjamin D. Williams. License: CC-BY. Source: Moerman, D. G. and Williams, B. D. "Sarcomere Assembly in C. elegans Muscle" (January 16, 2006), WormBook, ed. The C. elegans Research Community, WormBook.
22
Topics in Bioinformatics
Genomics
Proteomics
Transcriptomics
Text Mining
Biology Literature: "… In this paper, we report the discovery of a new gene that affects DNA reproduction in …"
Gene expression & regulation
Genes
Proteins (Function)
Microarray data
DNA Sequences:
  AATTCATGAAAATCGTATACTGGTCTGGTACCGGC
  TGAGAAAATGGCAGAGCTCATCGCTAAAGGTA
  TCTGGTAAAGACGTCAACACCATCAACGTGTC
  ACATCGATGAACTGCTGAACGAAGATATCCTG
  TTGCTCTGCCATGGGCGATGAAGTTCTCGAGG
Protein Sequences:
  MKIVYWSGTGNTEKMAELIAKGIIESGKDV
  DELLNEDILILGCSAMGDEVLEESEFEPFIE
  KVALFGSYGWGDGKWMRDFEERMNGYG
  PDEAEQDCIEFGKKIANI