Lecture 2 Gene discovery

Slides:



Advertisements
Similar presentations
CH 11.4 & 11.5 “DNA to Polypeptide”.
Advertisements

Transcription and Translation
2.7 DNA Replication, transcription and translation
Transcription and Translation
VII RNA and Protein Synthesis
Transcription and Translation
Chapter 13.1 and 13.2 RNA, Ribosomes, and Protein Synthesis
RNA and Protein Synthesis
RNA & Protein Synthesis.
SC.912.L.16.5 Protein Synthesis: Transcription and Translation.
Protein Synthesis 6C transcription & translation.
8.4 Transcription KEY CONCEPT Transcription converts a gene into a single-stranded RNA molecule.
12-3 RNA and Protein Synthesis
Peptide Bond Formation Walk the Dogma RECALL: The 4 types of organic molecules… CARBOHYDRATES LIPIDS PROTEINS (amino acid chains) NUCLEIC ACIDS (DNA.
12-3 RNA AND PROTEIN SYNTHESIS. 1. THE STRUCTURE OF RNA.
What is central dogma? From DNA to Protein
Transcription and Translation How genes are expressed (a.k.a. How proteins are made) Biology.
Chapter 15: Protein Synthesis
PROTEIN SYNTHESIS TRANSCRIPTION AND TRANSLATION. TRANSLATING THE GENETIC CODE ■GENES: CODED DNA INSTRUCTIONS THAT CONTROL THE PRODUCTION OF PROTEINS WITHIN.
GENOME: an organism’s complete set of genetic material In humans, ~3 billion base pairs CHROMOSOME: Part of the genome; structure that holds tightly wound.
Transcription and Translation. Central Dogma of Molecular Biology  The flow of information in the cell starts at DNA, which replicates to form more DNA.
Introduction to Molecular Biology and Genomics BMI/CS 776 Mark Craven January 2002.
Composed of 4 nucleotides, that always pair the same.
Protein Synthesis Traits are determined by proteins (often enzymes) *Protein – 1 or more polypeptide chains *Polypeptide – chain of amino acids linked.
RNA, Transcription, and the Genetic Code. RNA = ribonucleic acid -Nucleic acid similar to DNA but with several differences DNARNA Number of strands21.
Question of the DAY Jan 14 During DNA Replication, a template strand is also known as a During DNA Replication, a template strand is also known as a A.
Placed on the same page as your notes Warm-up pg. 48 Complete the complementary strand of DNA A T G A C G A C T Diagram 1 A T G A C G A C T T A A C T G.
Protein Synthesis. RNA vs. DNA Both nucleic acids – Chains of nucleotides Different: – Sugar – Types of bases – Numbers of bases – Number of chains –
CH 12.3 RNA & Protein Synthesis. Genes are coded DNA instructions that control the production of proteins within the cell…
Genetic Code and Interrupted Gene Chapter 4. Genetic Code and Interrupted Gene Aala A. Abulfaraj.
DNA to RNA to Protein. RNA Made up of 1. Phosphate 2. Ribose (a sugar) 3. Four bases RNA bases are: Adenine Guanine Cytosine Uracil (instead of thymine)
PROTEIN SYNTHESIS.
CH 12.3 RNA & Protein Synthesis.
The Central Dogma Transcription & Translation
RNA & Protein synthesis
Transcription and Translation
Transcription: DNA  mRNA
Transcription Part of the message encoded within the sequence of bases in DNA must be transcribed into a sequence of bases in RNA before translation can.
From DNA to Proteins Transcription.
Transcription and Translation
Protein Synthesis in Detail
Transcription and Translation
RNA.
Outline What is an amino acid / protein
Transcription and Translation
12-3 RNA and Protein Synthesis
Transcription and Translation
Overview of Protein Synthesis And RNA Processing
Transcription and Translation
Daily Warm-Up Dec. 11th -What are the three enzymes involved with replication? What is the function of each? Homework: -Read 13.1 Turn in: -Nothing.
Transcription and Translation
Central Dogma Central Dogma categorized by: DNA Replication Transcription Translation From that, we find the flow of.
Transcription and Translation
Copyright Pearson Prentice Hall
How genes on a chromosome determine what proteins to make
Copyright Pearson Prentice Hall
Transcription and Translation
12-3 RNA and Protein Synthesis
Reading Frames and ORF’s
Copyright Pearson Prentice Hall
Copyright Pearson Prentice Hall
Replication, Transcription, Translation
Copyright Pearson Prentice Hall
DNA Transcription and Translation
RNA.
Gene Discovery.
Transcription and Translation
12-3: RNA and Protein Synthesis (part 1)
Gene Discovery.
DNA Deoxyribonucleic Acid.
Presentation transcript:

Lecture 2 Gene discovery

The Central Dogma

Transcription RNA polymerase is the enzyme that builds an RNA strand from a gene RNA that is transcribed from a gene is called messenger RNA (mRNA)

RNA RNA is like DNA except: backbone is a little different usually single stranded the base uracil (U) is used in place of thymine (T) A strand of RNA can be thought of as a string composed of the four letters: A, C, G, U

The Genetic Code 64 combinations: 20 amino acids + stop codon

Genes include both coding regions as well as control regions

Fasta format >YAH1 sacCer1.chr16:73363-73881 ATGCTGAAAATTGTTACTCGGGCTGGACACACAGCTAGAATATCGAACAT CGCAGCACATCTTTTACGCACCTCTCCATCTCTGCTCACACGCACCACCA CAACCACAAGATTTCTGCCCTTCTCTACGTCTTCGTTCTTAAACCATGGC CATTTGAAAAAACCGAAACCAGGCGAAGAACTGAAGATAACTTTTATTCT GAAGGATGGCTCCCAGAAGACGTACGAAGTCTGTGAGGGCGAAACCATCC TGGACATCGCTCAAGGTCACAACCTGGACATGGAGGGCGCATGCGGCGGT TCTTGTGCCTGCTCCACCTGTCACGTCATCGTTGATCCAGACTACTACGA TGCCCTGCCGGAACCTGAAGATGATGAAAACGATATGCTCGATCTTGCTT ACGGGCTAACAGAGACAAGCAGGCTTGGGTGCCAGATTAAGATGTCAAAA GATATCGATGGGATTAGAGTCGCTCTGCCCCAGATGACAAGAAACGTTAA TAACAACGATTTTAGTTAA >GAL4 sacCer1.chr16:79711-82356 ATGAAGCTACTGTCTTCTATCGAACAAGCATGCGATATTTGCCGACTTAA AAAGCTCAAGTGCTCCAAAGAAAAACCGAAGTGCGCCAAGTGTCTGAAGA ACAACTGGGAGTGTCGCTACTCTCCCAAAACCAAAAGGTCTCCGCTGACT AGGGCACATCTGACAGAAGTGGAATCAAGGCTAGAAAGACTGGAACAGCT ATTTCTACTGATTTTTCCTCGAGAAGACCTTGACATGATTTTGAAAATGG ATTCTTTACAGGATATAAAAGCATTGTTAACAGGATTATTTGTACAAGAT AATGTGAATAAAGATGCCGTCACAGATAGATTGGCTTCAGTGGAGACTGA TATGCCTCTAACATTGAGACAGCATAGAATAAGTGCGACATCATCATCGG AAGAGAGTAGTAACAAAGGTCAAAGACAGTTGACTGTATCGATTGACTCG GCAGCTCATCATGATAACTCCACAATTCCGTTGGATTTTATGCCCAGGGA TGCTCTTCATGGATTTGATTGGTCTGAAGAGGATGACATGTCGGATGGCT TGCCCTTCCTGAAAACGGACCCCAACAATAATGGGTTCTTTGGCGACGGT TCTCTCTTATGTATTCTTCGATCTATTGGCTTTAAACCGGAAAATTACAC

Translation Codon: triplet of nucleotides Start codon: ATG >YAH1 sacCer1.chr16:73363-73881 ATGCTGAAAATTGTTACTCGGGCTGGACACACAGCTAGAATATCGAACATCGCAGCACATCTTTTACGCACCTCTCCATCTCTGCTCACACGCACCACCACAACCACAAGATTTCTGCCCTTCTCTACGTCTTCGTTCTTAAACCATGGCCATTTGAAAAAACCGAAACCAGGCGAAGAACTGAAGATAACTTTTATTCTGAAGGATGGCTCCCAGAAGACGTACGAAGTCTGTGAGGGCGAAACCATCCTGGACATCGCTCAAGGTCACAACCTGGACATGGAGGGCGCATGCGGCGGTTCTTGTGCCTGCTCCACCTGTCACGTCATCGTTGATCCAGACTACTACGATGCCCTGCCGGAACCTGAAGATGATGAAAACGATATGCTCGATCTTGCTTACGGGCTAACAGAGACAAGCAGGCTTGGGTGCCAGATTAAGATGTCAAAAGATATCGATGGGATTAGAGTCGCTCTGCCCCAGATGACAAGAAACGTTAATAACAACGATTTTAGTTAA Codon: triplet of nucleotides Start codon: ATG Stop codon: TAA

Translation >YAH1 sacCer1.chr16:73363-73881 ATGCTGAAAATTGTTACTCGGGCTGGACACACAGCTAGAATATCGAACATCGCAGCACATCTTTTACGCACCTCTCCATCTCTGCTCACACGCACCACCACAACCACAAGATTTCTGCCCTTCTCTACGTCTTCGTTCTTAAACCATGGCCATTTGAAAAAACCGAAACCAGGCGAAGAACTGAAGATAACTTTTATTCTGAAGGATGGCTCCCAGAAGACGTACGAAGTCTGTGAGGGCGAAACCATCCTGGACATCGCTCAAGGTCACAACCTGGACATGGAGGGCGCATGCGGCGGTTCTTGTGCCTGCTCCACCTGTCACGTCATCGTTGATCCAGACTACTACGATGCCCTGCCGGAACCTGAAGATGATGAAAACGATATGCTCGATCTTGCTTACGGGCTAACAGAGACAAGCAGGCTTGGGTGCCAGATTAAGATGTCAAAAGATATCGATGGGATTAGAGTCGCTCTGCCCCAGATGACAAGAAACGTTAATAACAACGATTTTAGTTAA M--L--K--I--V--T--R--A--G--H--T--A--R--I--S--N--I--A--A--H--L--L--R--T--S--P--S--L--L--T--R--T--T--T--T--T--R--F--L--P--F--S--T--S--S--F--L--N--H--G--H--L--K--K--P--K--P--G--E--E--L--K--I--T--F--I--L--K--D--G--S--Q--K--T--Y--E--V--C--E--G--E--T--I--L--D--I--A--Q--G--H--N--L--D--M--E--G--A--C--G--G--S--C--A--C--S--T--C--H--V--I--V--D--P--D--Y--Y--D--A--L--P--E--P--E--D--D--E--N--D--M--L--D--L--A--Y--G--L--T--E--T--S--R--L--G--C--Q--I--K--M--S--K--D--I--D--G--I--R--V--A--L--P--Q--M--T--R--N--V--N--N--N--D--F--S--* MLKIVTRAGHTARISNIAAHLLRTSPSLLTRTTTTTRFLPFSTSSFLNHGHLKKPKPGEELKITFILKDGSQKTYEVCEGETILDIAQGHNLDMEGACGGSCACSTCHVIVDPDYYDALPEPEDDENDMLDLAYGLTETSRLGCQIKMSKDIDGIRVALPQMTRNVNNNDFS*

If reading frame is unknown TCTCTACGATGCTGAAAATTGTTACTCGGGCTGGACACACAGCTAGAATATCGAACATCGCAGCACATCTTTTACGCACCTCTCCATCTCTGCTCACACGCACCACCACAACCACAAGATTTCTGCCCTTCTCTACGTCTTCGTTCTTAAACCATGGCCATTTGAAAAAACCGAAACCAGGCGAAGAACTGAAGATAACTTTTATTCTGAAGGATGGCTCCCAGAAGACGTACGAAGTCTGTGAGGGCGAAACCATCCTGGACATCGCTCAAGGTCACAACCTGGACATGGAGGGCGCATGCGGCGGTTCTTGTGCCTGCTCCACCTGTCACGTCATCGTTGATCCAGACTACTACGATGCCCTGCCGGAACCTGAAGATGATGAAAACGATATGCTCGATCTTGCTTACGGGCTAACAGAGACAAGCAGGCTTGGGTGCCAGATTAAGATGTCAAAAGATATCGATGGGATTAGAGTCGCTCTGCCCCAGATGACAAGAAACGTTAATAACAACGATTTTAGTTAATGCCCTGC

Open reading frame (ORF) One can represent a genome of length n as a sequence of n/3 codons The three ‘stop’ codons (TAA,TAG, and TGA) break this sequence into segments, one between every two consecutive stop codons The subsegments of these that start from a start codon (ATG) are ORFs

Six reading frames TCTCTACGATGCTGAAAATTGTTACTCGGGCTGGACACACAGCTAGAATATCGTGAA reading frame 1 S--L--R--C--*--K--L--L--L--G--L--D--T--Q--L--E--Y--R--E-- reading frame 2 CTCTACGATGCTGAAAATTGTTACTCGGGCTGGACACACAGCTAGAATATCGTGAA L--Y--D--A--E--N--C--Y--S--G--W--T--H--S--*--N--I--V-- reading frame 3 TCTACGATGCTGAAAATTGTTACTCGGGCTGGACACACAGCTAGAATATCGTGAA S--T--M--L--K--I--V--T--R--A--G--H--T--A--R--I--S--* reading frame 4 (reverse complement frame 1) TTCACGATATTCTAGCTGTGTGTCCAGCCCGAGTAACAATTTTCAGCATCGTAGAGA F--T--I--F--*--L--C--V--Q--P--E--*--Q--F--S--A--S--*--R reading frame 5 (reverse complement frame 2) TCACGATATTCTAGCTGTGTGTCCAGCCCGAGTAACAATTTTCAGCATCGTAGAGA S--R--Y--S--S--C--V--S--S--P--S--N--N--F--Q--H--R--R reading frame 6 (reverse complement frame 3) CACGATATTCTAGCTGTGTGTCCAGCCCGAGTAACAATTTTCAGCATCGTAGAGA H--D--I--L--A--V--C--P--A--R--V--T--I--F--S--I--V--E

Size of ORF Total number of codons: 43 = 64 Assuming random occurrences of A,C,G,Ts with equal probability: The probability of a codon being start codon is: 1/64 The probability of a codon being stop codon is: 3/64 Define S to be the length of an ORF (the number of codons, excluding the stop-codon) Question: what is the probability distribution of S ?

Distribution of randomly occurred ORF length P(S=s) = (1-p)s-1 p where p = 3/64, s>0

Significance measure Suppose you discovered an ORF with length s. How surprised is this, if assuming A,C,G,Ts are randomly distributed? Statistics: Null model: A,C,G,Ts are randomly distributed with equal probability -> P(s)=(1-p)s-1p P-value: The probability of observing an ORF with S≥s under the null model. P-value = P(S≥s) =x=s (1-p)x-1p = 1 - x=1s (1-p)x-1p

Gene discovery in higher order organisms More complicated than ORF discovery due to more complex gene structure: multiple exons separated by introns. Methods: Statistical models of codon usage Markov models of gene structure Comparing across different species

RNA Splicing: pre mRNA --> mRNA

Genes include both coding regions as well as control regions