Welcome to CS374 Algorithms in Biology

Overview: Administrivia. Molecular Biology and Computation: DNA, proteins, cells, evolution; some examples of CS in biology. Computer Scientists vs Biologists.

CS374: Algorithms in Biology (cs374.stanford.edu)
Attendance: at most 2 classes may be missed without affecting the grade.
Lectures (most important requirement): Select an available topic and a day, and send email to Serafim. Read the papers and meet with Serafim (1 hr) 1-2 weeks before the lecture. Schedule a long (2 hr) meeting the day before the lecture. Slides are due at noon before the lecture.

CS374: Algorithms in Biology (cs374.stanford.edu)
Scribing: Please sign up on a first-come, first-served basis. Notes are due 1 week after the lecture, and are edited & distributed 2 weeks after the lecture. Relly will help you edit.
Summaries: Select 1 lecture among the first 10 and 1 lecture among the rest. Find one relevant paper and write a 1-page summary of it: paper reference, abstract, discussion. Ask Relly for questions/feedback.
Have fun!

Structure of DNA: a double helix. Each nucleotide consists of a phosphate group, a sugar, and a nitrogenous base (A, C, G, or T). [Figure: DNA double helix; labels: Physicist, Ornithologist]

DNA to RNA, and genes. DNA: ~3×10^9 long in humans; contains ~22,000 genes. RNA (bases G, A, U, C): carries the “message” for “translating”, or “expressing”, one gene. [Figure labels: transcription, translation, folding]
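
The transcription and translation steps named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `transcribe` and `translate` are hypothetical, the input is assumed to be the coding strand (so transcription reduces to replacing T with U), and the table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA -> mRNA (simplification: just replace T with U)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna)             # AUGGCCUAA
print(translate(mrna))  # MA
```

The third step on the slide, folding, has no comparably short sketch.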

Structure of proteins. Composed of a chain of amino acids, each with the general form H2N-CH(R)-COOH, where R is one of 20 possible groups. The sequence of amino acids folds to form a complex 3-D structure. The structure of a protein is intimately connected to its function.

All living organisms are composed of cells

Genetics in the 20th Century

21st Century Technology drives an information revolution AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTCTCTCTAGTCTACGTGCTGTATGCGTTAGTGTCGTCGTCTAGTAGTCGCGATGCTCTGATGTTAGAGGATGCACGATGCTGCTGCTACTAGCGTGCTGCTGCGATGTAGCTGTCGTACGTGTAGTGTGCTGTAAGTCGAGTGTAGCTGGCGATGTATCGTGGT

DNA to RNA, and genes (recap). DNA: ~3×10^9 long in humans; contains ~22,000 genes. RNA: carries the “message” for “translating”, or “expressing”, one gene. [Figure labels: transcription, translation, folding; marker 1]

Some examples of the central role of CS. 1. Sequencing: AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT [Figure labels: 3×10^9 nucleotides; ~500 nucleotides]

Some examples of the central role of CS. 1. Sequencing: AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT (3×10^9 nucleotides). A big puzzle with ~60 million pieces. Computational fragment assembly: introduced ~1980. 1995: assemble DNA pieces up to 1,000,000 long. 2000: assemble the whole human genome.
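
To make the "puzzle" concrete, here is a toy sketch of greedy fragment assembly: repeatedly merge the two fragments with the longest suffix-prefix overlap. It is an assumption-laden illustration, not the method used by any real assembler. The function names (`overlap`, `greedy_assemble`) and the `min_len` parameter are hypothetical, the reads are error-free, and all reads come from the same strand.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining fragments do not overlap
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three overlapping reads taken from the start of the sequence on the slide.
reads = ["GACTACG", "AGTAGCACAG", "CACAGACTA"]
print(greedy_assemble(reads))  # ['AGTAGCACAGACTACG']
```

The all-pairs comparison is quadratic in the number of fragments, so this only illustrates the idea; at ~60 million pieces, indexing and handling of repeats and sequencing errors become the real problem.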

Complete genomes today More than 300 complete genomes have been sequenced

DNA to RNA, and genes (recap). DNA: ~3×10^9 long in humans; contains ~22,000 genes. RNA: carries the “message” for “translating”, or “expressing”, one gene. [Figure labels: transcription, translation, folding; markers 1 and 2]

2. Gene Finding. Where are the genes? In humans: ~22,000 genes; ~1.5% of human DNA.

2. Gene Finding. [Figure: gene structure from 5’ to 3’: start codon ATG; Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, separated by splice sites; stop codon TAG/TGA/TAA]
The problem of predicting genes means giving coordinates for the exon boundaries. The first kind of information that prediction algorithms use is the regular structure of a gene. Every gene starts with an ATG codon, and then exons alternate with introns; at the exon-intron boundaries, the splice sites, there are short words that are approximately preserved.
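
As a deliberately simplified illustration of using this regular structure, the sketch below scans the three forward reading frames for an ATG followed in-frame by one of the stop codons TAG, TGA, or TAA. It ignores introns, splice sites, and the reverse strand, so it is closer to open-reading-frame scanning than to real eukaryotic gene prediction. The name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) coordinates of ATG...stop stretches.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    min_codons is the minimum number of codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next start codon
    return orfs


dna = "CCATGAAATGGTAGCC"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])  # 2 14 ATGAAATGGTAG
```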

Topic in CS374: Finding genes by comparing genomes of different species. [Figure: example gene sequence, shown twice on the slide: atg caggtg ggtgag cagatg ggtgag cagttg ggtgag caggcc ggtgag tga]

DNA to RNA, and genes (recap). DNA: ~3×10^9 long in humans; contains ~22,000 genes. RNA: carries the “message” for “translating”, or “expressing”, one gene. [Figure labels: transcription, translation, folding; markers 1, 2, and 3; two steps labelled “easy”]

3. Protein Folding. The amino-acid sequence of a protein determines its 3D fold, and the 3D fold of a protein determines its function. Can we predict the 3D fold of a protein given its amino-acid sequence? Holy grail of compbio: a 35-year-old problem. Molecular dynamics, robotics, machine learning, computational geometry.
Topics on Proteins in CS374:
1. Protein Structure: finding the -helix motif; protein domains; molecular dynamics & drug targets
2. Protein Classification: machine learning; graph flow techniques
3. Protein Comparison: latest multiple alignment tools

More than 200 complete genomes have been sequenced

Evolution

Evolution at the DNA level. [Figure: DNA changes passed to the next generation; labels: OK, OK, OK, X, X, Still OK?]

4. Sequence Comparison. Sequence conservation implies function. Sequence comparison is key to finding genes, determining function, and uncovering the evolutionary processes.

Sequence Comparison: Alignment
Input sequences:
    AGGCTATCACCTGACCTCCAGGCCGATGCCC
    TAGCTATCACGACCGCGGTCGATTTGCCCGAC
Alignment:
    -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
     || |||||||  ||||x|  || |||  |||||
    TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
[Figure labels: query, DB, BLAST]
Sequence alignment: introduced ~1970. BLAST: 1990, most cited paper in history. Still a very active area of research.
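
A global alignment like the one on this slide can be computed by dynamic programming. The sketch below is a minimal Needleman-Wunsch implementation with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration, so it will not necessarily reproduce the exact alignment shown on the slide, and it is not how BLAST works (BLAST is a heuristic local search).

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


s, x, y = needleman_wunsch(
    "AGGCTATCACCTGACCTCCAGGCCGATGCCC",
    "TAGCTATCACGACCGCGGTCGATTTGCCCGAC",
)
print(s)
print(x)
print(y)
```

The table has (n+1)(m+1) cells, so time and memory grow with the product of the sequence lengths.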

Comparison of Human, Mouse, and Rat

More DNA is coming…
Topics on Genomics in CS374:
1. Indexing Large Databases: newest encoding techniques
2. Genomic Rearrangements: finding the order of shuffles between two genomes
3. Repeat Detection: identifying selfish sequences that replicate across DNA
4. Gene Finding: finding genes by comparing DNA of different mammals
5. Finding conserved elements: how do we quantify how much evolution “likes” a given region?

5. Clustering of Microarrays. Clinical prediction of leukemia type. 2 types: acute lymphoid (ALL) and acute myeloid (AML), with different treatment & outcomes. Predict type before treatment? Bone marrow samples: ALL vs AML. Measure amount of each gene.
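
One simple way to turn per-gene measurements into a predicted type is a nearest-centroid rule: average the training samples of each type, then assign a new sample to the closest average. The slide does not say which method was used, so this is only an assumed illustration. The function names are hypothetical and the numbers are made-up toy values, not real expression data.

```python
import math


def centroid(samples):
    """Per-gene mean over a list of equal-length expression vectors."""
    return [sum(column) / len(samples) for column in zip(*samples)]


def classify(sample, centroids):
    """Label of the centroid closest to sample (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))


# Toy training data: 3 hypothetical genes measured in 2 samples per type.
training = {
    "ALL": [[5.0, 1.0, 0.5], [4.5, 1.2, 0.7]],
    "AML": [[1.0, 4.8, 3.9], [0.8, 5.1, 4.2]],
}
centroids = {label: centroid(samples) for label, samples in training.items()}

print(classify([4.8, 0.9, 0.6], centroids))  # ALL
print(classify([1.1, 5.0, 4.0], centroids))  # AML
```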

6. Protein networks. Fresh research area. Construct networks from multiple data sources; navigate networks; compare networks across organisms. Draws on statistics, machine learning, graph algorithms, and databases.
Topics on Protein Networks in CS374:
1. Integration: build networks from multiple sources
2. Alignment: compare networks across species
3. Mathematical properties: modular, scale free
4. Systems Biology: the cell as a dynamic system
5. Graph Algorithms

Some goals of biology for the next 50 years: List all molecular parts that build an organism (genes, proteins, other functional parts). Understand the function of each part. Understand how parts interact. Study how function has evolved across all species. Find genetic defects that cause diseases. Design drugs rationally. Sequence the genome of every human and use it for personalized medicine.

Computer Scientists vs Biologists

Computer scientists vs Biologists: (Almost) nothing is ever true or false in biology. Everything is true or false in computer science.

Computer scientists vs Biologists: Biologists strive to understand the complicated, messy natural world. Computer scientists seek to build their own clean and organized virtual worlds.

Computer scientists vs Biologists: Biologists are obsessed with being the first to discover something. Computer scientists are obsessed with being the first to invent or prove something.

Computer scientists vs Biologists: Biologists are comfortable with the idea that all data have errors. Computer scientists are not.

Computer scientists vs Biologists: Computer scientists get high-paid jobs after graduation. Biologists typically have to complete one or more 5-year post-docs...

Computer Science is to Biology what Mathematics is to Physics. “Antedisciplinary” Science: What is computational biology? http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pcbi.0010006