Welcome to CS262: Computational Genomics. Instructor: Serafim Batzoglou. TA: Eugene Fratkin. Tuesday & Thursday, 2:45-4:00, Skilling Auditorium.



Goals of this course
Introduction to Computational Biology & Genomics
  • Basic concepts and scientific questions
  • Why does it matter?
  • Basic biology for computer scientists
  • In-depth coverage of algorithmic techniques
  • Current active areas of research
Useful algorithms
  • Dynamic programming
  • String algorithms
  • HMMs and other graphical models for sequence analysis

Topics in CS262
Part 1: Basic Algorithms
  • Sequence Alignment & Dynamic Programming
  • Hidden Markov models, Context Free Grammars, Conditional Random Fields
Part 2: Topics in computational genomics and areas of active research
  • DNA sequencing
  • Comparative genomics
  • Genes: finding genes, gene regulation
  • Proteins, families, and evolution
  • Networks of protein interactions

Course responsibilities
Homeworks
  • 4 challenging problem sets, 4-5 problems/pset
      Due at beginning of class
      Up to 3 late days (24-hr periods) for the quarter
  • Collaboration allowed; please give credit
      Teams of 2 or 3 students
      Individual writeups
      If individual (no team), then drop score of worst problem per problem set
(Optional) Scribing
  • Due one week after the lecture, except with special permission
  • Scribing grade replaces 2 lowest problems from all problem sets
      First-come, first-served; staff list to sign up

Reading material
Books
  • “Biological sequence analysis” by Durbin, Eddy, Krogh, Mitchison
      Chapters 1-4, 6, 7-8, 9-10
  • “Algorithms on strings, trees, and sequences” by Gusfield
      Chapters 5-7, 11-12, 13, 14, 17
Papers
Lecture notes

Birth of Molecular Biology
[Figure: DNA nucleotide structure: phosphate group, sugar, nitrogenous base (A, C, G, T). Photo captions: Physicist, Ornithologist]

DNA and RNA
[Figure: DNA sequences T C A C T G G C and G A G T C A G C; RNA sequence G A G U C A G C]
Base pairing: A - T, G - C
In RNA: T → U

DNA
DNA is written 5’ to 3’ by convention.
AGACC = GGTCT (the same double-stranded molecule, read 5’ to 3’ on either strand)
5’ AGACC 3’
3’ TCTGG 5’
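The equivalence AGACC = GGTCT on the slide above is the reverse complement: complement every base, then reverse so the result again reads 5’ to 3’. A minimal sketch follows; the function name `reverse_complement` is an illustrative choice, not something from the lecture.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    # Walk the input backwards so the output is also in 5' -> 3' order.
    return "".join(COMPLEMENT[base] for base in reversed(dna))


if __name__ == "__main__":
    print(reverse_complement("AGACC"))
```

Applying the function twice returns the original string, which is a quick sanity check that both strands describe the same molecule.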

Chromosomes
[Figure labels: DNA; histones H1, H2A, H2B, H3, H4; nucleosome (~146bp); chromatin; telomere; centromere]
In humans: 2x22 autosomes; X, Y sex chromosomes

The Genetic Dogma
Double-stranded DNA:
5’ TAGGATCGACTATATGGGATTACAAAGCATTTAGGGA...TCACCCTCTCTAGACTAGCATCTATATAAAACAGAA 3’
3’ ATCCTAGCTGATATACCCTAATGTTTCGTAAATCCCT...AGTGGGAGAGATCTGATCGTAGATATATTTTGTCTT 5’
(transcription)
Single-stranded RNA:
AUGGGAUUACAAAGCAUUUAGGGA...UCACCCUCUCUAGACUAGCAUCUAUAUAA
(translation)
protein

DNA to RNA to Protein to Cell
DNA: ~3x10^9 nucleotides long in humans; contains ~22,000 genes
messenger-RNA: G A G U C A G C
Steps: transcription, translation, folding

Gene Transcription
5’ G A T T A C A... 3’
3’ C T A A T G T... 5’

Gene Transcription
The promoter lies upstream of a gene.
Transcription factors recognize transcription factor binding sites and bind to them, forming a complex.
RNA polymerase binds the complex.

Gene Transcription
The two strands are separated.

Gene Transcription
An RNA copy of the 5’ → 3’ sequence is created from the 3’ → 5’ template.
5’ G A T T A C A... 3’ (DNA)
3’ C T A A T G T... 5’ (DNA, template)
5’ G A U U A C A 3’ (RNA)

Gene Transcription
pre-mRNA: 5’ G A U U A C A... 3’
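The transcription step above can be sketched in a few lines: each base of the 3’ → 5’ template is paired with its RNA complement, which reproduces the 5’ → 3’ strand with U in place of T. The function name `transcribe` and the convention that the template is passed in 3’ → 5’ order are assumptions made for this illustration.

```python
# RNA base that pairs with each DNA template base (A pairs with U in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5: str) -> str:
    """Build the RNA (written 5' to 3') from a template given 3' to 5'."""
    # Because the template is read 3' -> 5', pairing base by base already
    # yields the RNA in 5' -> 3' order; no reversal is needed.
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


if __name__ == "__main__":
    print(transcribe("CTAATGT"))
```

The result is the same as taking the 5’ → 3’ DNA strand GATTACA and replacing every T with U.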

RNA Processing
[Figure labels: pre-mRNA with exons and introns; mRNA with 5’ cap, 5’ UTR, 3’ UTR, poly(A) tail]

Gene Structure
[Figure labels, 5’ to 3’: promoter, 5’ UTR, exons, introns, 3’ UTR; regions marked coding and non-coding]

How many?
Genes: ~22,000 in the human genome
Exons per gene: ~8 on average (max: 148)
Nucleotides per exon: 170 on average (max: 12k)
Nucleotides per intron: 5,500 on average (max: 500k)
Nucleotides per gene: 45k on average (max: 2.2M)

Proteins
Composed of a chain of amino acids.

     R
     |
H2N--C--COOH
     |
     H

20 possible groups (R): Alanine, Arginine, Asparagine, Aspartate, Cysteine, Glutamate, Glutamine, Glycine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Proline, Serine, Threonine, Tryptophan, Tyrosine, Valine

Dipeptide

     R  O      R
     |  ||     |
H2N--C--C--NH--C--COOH
     |         |
     H         H

The C--NH link between the two amino acids is a peptide bond.

Protein structure
Linear sequence of amino acids folds to form a complex 3-D structure.
The structure of a protein is intimately connected to its function.

Translation
The ribosome synthesizes a protein by reading the mRNA in triplets (codons). Each codon is translated to an amino acid.
[Figure labels: mRNA, P site, A site]

The Genetic Code
1st base | 2nd base: U | 2nd base: C | 2nd base: A | 2nd base: G | 3rd base
U | UUU Phenylalanine (Phe) | UCU Serine (Ser) | UAU Tyrosine (Tyr) | UGU Cysteine (Cys) | U
U | UUC Phe | UCC Ser | UAC Tyr | UGC Cys | C
U | UUA Leucine (Leu) | UCA Ser | UAA STOP | UGA STOP | A
U | UUG Leu | UCG Ser | UAG STOP | UGG Tryptophan (Trp) | G
C | CUU Leucine (Leu) | CCU Proline (Pro) | CAU Histidine (His) | CGU Arginine (Arg) | U
C | CUC Leu | CCC Pro | CAC His | CGC Arg | C
C | CUA Leu | CCA Pro | CAA Glutamine (Gln) | CGA Arg | A
C | CUG Leu | CCG Pro | CAG Gln | CGG Arg | G
A | AUU Isoleucine (Ile) | ACU Threonine (Thr) | AAU Asparagine (Asn) | AGU Serine (Ser) | U
A | AUC Ile | ACC Thr | AAC Asn | AGC Ser | C
A | AUA Ile | ACA Thr | AAA Lysine (Lys) | AGA Arginine (Arg) | A
A | AUG Methionine (Met) or START | ACG Thr | AAG Lys | AGG Arg | G
G | GUU Valine (Val) | GCU Alanine (Ala) | GAU Aspartic acid (Asp) | GGU Glycine (Gly) | U
G | GUC Val | GCC Ala | GAC Asp | GGC Gly | C
G | GUA Val | GCA Ala | GAA Glutamic acid (Glu) | GGA Gly | A
G | GUG Val | GCG Ala | GAG Glu | GGG Gly | G

Translation (tRNA)
[Figure: tRNA carrying Tryptophan, with anticodon C C A]

Translation
5’... A U U  A U G  G C C  U G G  A C U  U G A ... 3’
[Figure labels: UTR; Start Codon (A U G); Met, Ala, Trp, Thr]

Translation
5’... A U U  A U G  G C C  U G G  A C U  U G A ... 3’

Translation
5’... A U U  A U G  G C C  U G G  A C U  U G A ... 3’
[Figure: growing chain Met-Ala, with Trp arriving next]
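The translation slides above can be reproduced with a small codon lookup. This is a sketch only: it encodes the standard genetic code from the table as a 64-letter string (one-letter amino acid codes, `*` for STOP), scans for the first AUG, and reads triplets until a stop codon. The names `CODON_TABLE` and `translate` are illustrative, and real translation initiation is more involved than "first AUG wins".

```python
BASES = "UCAG"
# One-letter amino acid codes in UCAG order: first base, then second, then third.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    # Step through complete codons only.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUUAUGGCCUGGACUUGA"))
```

For the slide's mRNA the codons read are AUG, GCC, UGG, ACU, then the stop codon UGA, giving Met-Ala-Trp-Thr in one-letter form.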

Errors? What if the transcription / translation machinery makes mistakes? What is the effect of mutations in coding regions?

Reading Frames
G C U U G U U U A C G A A U U A G

Synonymous Mutation
Original: G C U  U G U  U U A  C G A  A U U  A G → Ala Cys Leu Arg Ile
Mutated (A → G): G C U  U G U  U U G  C G A  A U U  A G → Ala Cys Leu Arg Ile

Missense Mutation
Original: G C U  U G U  U U A  C G A  A U U  A G → Ala Cys Leu Arg Ile
Mutated (U → G): G C U  U G G  U U A  C G A  A U U  A G → Ala Trp Leu Arg Ile

Nonsense Mutation
Original: G C U  U G U  U U A  C G A  A U U  A G → Ala Cys Leu Arg Ile
Mutated (U → A): G C U  U G A  U U A  C G A  A U U  A G → Ala STOP

Frameshift
Original: G C U  U G U  U U A  C G A  A U U  A G → Ala Cys Leu Arg Ile
Mutated (one U deleted): G C U  U G U  U A C  G A A  U U A  G → Ala Cys Tyr Glu Leu
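The four mutation types above can be told apart by translating the original and mutated sequences and comparing the results. The sketch below is a simplified classifier, not a standard tool: the names `translate_frame` and `classify_mutation` are hypothetical, reading always starts at the first base, and any length change that is not a multiple of 3 is called a frameshift.

```python
BASES = "UCAG"
# Standard genetic code as one-letter amino acids ('*' = STOP), UCAG order.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_frame(rna: str) -> str:
    """Translate complete codons from position 0; keep '*' if a stop is hit."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


def classify_mutation(original: str, mutated: str) -> str:
    """Label a mutation as frameshift, synonymous, nonsense, or missense."""
    # An insertion or deletion that is not a multiple of 3 shifts the frame.
    if (len(original) - len(mutated)) % 3 != 0:
        return "frameshift"
    before, after = translate_frame(original), translate_frame(mutated)
    if after == before:
        return "synonymous"
    # A stop codon that appears earlier than in the original truncates the protein.
    if after.endswith("*") and (not before.endswith("*") or len(after) < len(before)):
        return "nonsense"
    return "missense"


if __name__ == "__main__":
    original = "GCUUGUUUACGAAUUAG"
    for mutated in ("GCUUGUUUGCGAAUUAG", "GCUUGGUUACGAAUUAG",
                    "GCUUGAUUACGAAUUAG", "GCUUGUUACGAAUUAG"):
        print(mutated, classify_mutation(original, mutated))
```

Run on the slide sequences, the four mutated strings are labeled synonymous, missense, nonsense, and frameshift in that order.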

Noncoding RNA
5’ G A T T A C A... 3’ (DNA)
3’ C T A A T G T... 5’ (DNA)
5’ G A U U A C A... 3’ (RNA)

Genetics in the 20th Century

21st Century
AGTAGCACAGACTACGACGAGA CGATCGTGCGAGCGACGGCGTA GTGTGCTGTACTGTCGTGTGTG TGTACTCTCCTCTCTCTAGTCT ACGTGCTGTATGCGTTAGTGTC GTCGTCTAGTAGTCGCGATGCT CTGATGTTAGAGGATGCACGAT GCTGCTGCTACTAGCGTGCTGC TGCGATGTAGCTGTCGTACGTG TAGTGTGCTGTAAGTCGAGTGT AGCTGGCGATGTATCGTGGT
AGTAGGACAGACTACGACGAGACGAT CGTGCGAGCGACGGCGTAGTGTGCTG TACTGTCGTGTGTGTGTACTCTCCTC TCTCTAGTCTACGTGCTGTATGCGTT AGTGTCGTCGTCTAGTAGTCGCGATG CTCTGATGTTAGAGGATGCACGATGC TGCTGCTACTAGCGTGCTGCTGCGAT GTAGCTGTCGTACGTGTAGTGTGCTG TAAGTCGAGTGTAGCTGGCGATGTAT CGTGGT

Computational Biology
Organize & analyze massive amounts of biological data
  • Enable biologists to use data
  • Form testable hypotheses
  • Discover new biology
AGTAGCACAGACTACGACGAGA CGATCGTGCGAGCGACGGCGTA GTGTGCTGTACTGTCGTGTGTG TGTACTCTCCTCTCTCTAGTCT ACGTGCTGTATGCGTTAGTGTC GTCGTCTAGTAGTCGCGATGCT CTGATGTTAGAGGATGCACGAT GCTGCTGCTACTAGCGTGCTGC TGCGATGTAGCTGTCGTACGTG TAGTGTGCTGTAAGTCGAGTGT AGCTGGCGATGTATCGTGGT

DNA to RNA to Protein to Cell
DNA: ~3x10^9 nucleotides long in humans; contains ~22,000 genes
messenger-RNA: G A G U C A G C
Steps: transcription, translation, folding

Some Topics in CS262: 1. Sequencing
AGTAGCACAGA CTACGACGAGA CGATCGTGCGA GCGACGGCGTA GTGTGCTGTAC TGTCGTGTGTG TGTACTCTCCT
Genome: 3x10^9 nucleotides
Each read: ~500 nucleotides

Some Topics in CS262: 1. Sequencing
AGTAGCACAGA CTACGACGAGA CGATCGTGCGA GCGACGGCGTA GTGTGCTGTAC TGTCGTGTGTG TGTACTCTCCT
Genome: 3x10^9 nucleotides
Computational Fragment Assembly
Introduced ~ : assemble up to 1,000,000 long DNA pieces
2000: assemble whole human genome
A big puzzle: ~60 million pieces
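The sequencing numbers above fit together as simple arithmetic. Treating each puzzle piece as a read of ~500 nucleotides (an assumption connecting the two slides), the average number of reads covering each genome position is the number of reads times the read length, divided by the genome length. The function name `coverage` is illustrative.

```python
def coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average number of reads covering each position of the genome."""
    return num_reads * read_length / genome_length


if __name__ == "__main__":
    # Figures from the slides: ~60 million pieces, ~500 nucleotides each,
    # genome of 3x10^9 nucleotides.
    print(coverage(60 * 10**6, 500, 3 * 10**9))
```

With these figures each position is covered about ten times on average, which is the redundancy that lets overlapping pieces be matched to one another.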

Complete genomes today More than 300 complete genomes have been sequenced

2. Gene Finding
Where are the genes?
In humans: ~22,000 genes, ~1.5% of human DNA

[Figure: gene model annotated with sequence signals: atg, tga, ggtgag, caggtg, cagatg, cagttg, caggcc, ggtgag]

3. Molecular Evolution

Evolution at the DNA level
[Figure labels: OK; X, X; Still OK?; next generation]

4. Sequence Comparison
Sequence conservation implies function.
Sequence comparison is key to:
  • Finding genes
  • Determining function
  • Uncovering the evolutionary processes

Sequence Comparison: Alignment
Input sequences:
    AGGCTATCACCTGACCTCCAGGCCGATGCCC
    TAGCTATCACGACCGCGGTCGATTTGCCCGAC
Alignment (| = match, x = mismatch, - = gap):
    -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
     || |||||||  ||||x|  ||x|||  |||||
    TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Sequence Alignment
Introduced ~1970
BLAST: 1990, most cited paper in history
Still very active area of research
[Figure labels: query, DB, BLAST]
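An alignment like the one above can be computed with dynamic programming, one of the algorithmic techniques the course lists. The sketch below is a plain global alignment in the Needleman-Wunsch style; the scoring values (match +1, mismatch -1, gap -1) and the function name `global_align` are assumptions for illustration, so it will not necessarily reproduce the exact alignment drawn on the slide.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    s, top, bottom = global_align(
        "AGGCTATCACCTGACCTCCAGGCCGATGCCC", "TAGCTATCACGACCGCGGTCGATTTGCCCGAC"
    )
    print(s)
    print(top)
    print(bottom)
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the two sequence lengths. Different scoring choices can produce different, equally valid alignments of the same pair.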

5. RNA Structure
Given:
  AGCAGAGUGG … (an unfolded RNA sequence)
  + aligned homologs:
  AGCACAGUGA …
  ACUAGACAGG …
  CGCCGAGUCG …
  AGCAGUGUGG …
Predict: which nucleotides base pair?
[Figure labels: bulge loop, helix (stem), hairpin loop, internal loop, multi-branch loop]

6. Protein networks
Fresh research area
  • Construct networks from multiple data sources
  • Navigate networks
  • Compare networks across organisms

Computer Scientists vs Biologists

Computer scientists vs Biologists
Nothing is ever true or false in Biology.
Everything is true or false in computer science.

Computer scientists vs Biologists
Biologists strive to understand the complicated, messy natural world.
Computer scientists seek to build their own clean and organized virtual worlds.

Computer scientists vs Biologists
Biologists are obsessed with being the first to discover something.
Computer scientists are obsessed with being the first to invent or prove something.

Computer scientists vs Biologists
Biologists are comfortable with the idea that all data have errors.
Computer scientists are not.

Computer scientists vs Biologists
Computer scientists get high-paid jobs after graduation.
Biologists typically have to complete one or more 5-year post-docs...

Computer Science is to Biology what Mathematics is to Physics