1 Annotation of the bacteriophage 933W genome: an in- class interactive web-based exercise.

Slides:



Advertisements
Similar presentations
The DNA Story Germs, Genes, and Genomics 4. Heredity Genes DNA Manipulating DNA.
Advertisements

© Wiley Publishing All Rights Reserved. Using Nucleotide Sequence Databases.
Genomics: READING genome sequences ASSEMBLY of the sequence ANNOTATION of the sequence carry out dideoxy sequencing connect seqs. to make whole chromosomes.
Today’s Lecture Topics
Human Genome Project What did they do? Why did they do it? What will it mean for humankind? Animation OverviewAnimation Overview - Click.
Unit 1: DNA and the Genome Key area 8: Genomic sequencing.
living organisms According to Presence of cell The non- cellular organism The cellular organisms According to Type the Eukaryotes the prokaryotes human.
BIO513: Lecture 1. Central dogma “The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information.
Bioinformatics Lecture 2. Bioinformatics: is the computational branch of molecular biology Using the computer software to analyze biological data The.
CHAPTER 15 Microbial Genomics Genomic Cloning Techniques Vectors for Genomic Cloning and Sequencing MS2, RNA virus nt sequenced in 1976 X17, ssDNA.
Bioinformatics Student host Chris Johnston Speaker Dr Kate McCain.
Model Organisms and Databases. Model Organisms Characteristics of model organisms in genetics studies –Genetic history well known –Short life cycle; large.
Goals of the Human Genome Project determine the entire sequence of human DNA identify all the genes in human DNA store this information in databases improve.
Human Genome Project Seminal achievement. Scientific milestone. Scientific implications. Social implications.
Microbial Genomes Features Analysis Role of high-throughput sequencing Yeast - the eukaryotic model microbe Databases –TIGR CMR –NCBI Microbial Genomes.
Making Sense of DNA and protein sequence analysis tools (course #2) Dave Baumler Genome Center of Wisconsin,
Synthetic biology Genome engineering Chris Yellman, U. Texas CSSB.
Genome projects and model organisms Level 3 Molecular Evolution and Bioinformatics Jim Provan.
Elements of Molecular Biology All living things are made of cells All living things are made of cells Prokaryote, Eukaryote Prokaryote, Eukaryote.
Lesson 10 Bioinformatics
Chapter 14 Genomes and Genomics. Sequencing DNA dideoxy (Sanger) method ddGTP ddATP ddTTP ddCTP 5’TAATGTACG TAATGTAC TAATGTA TAATGT TAATG TAAT TAA TA.
Arabidopsis Genome Annotation TAIR7 Release. Arabidopsis Genome Annotation  Overview of releases  Current release (TAIR7)  Where to find TAIR7 release.
AP Biology A Lot More Advanced Biotechnology Tools Sequencing.
Introduction to Bioinformatics CPSC 265. Interface of biology and computer science Analysis of proteins, genes and genomes using computer algorithms and.
Genomes and Their Evolution. GenomicsThe study of whole sets of genes and their interactions. Bioinformatics The use of computer modeling and computational.
GenomesGenomes Chapter 21 Genomes Sequencing of DNA Human Genome Project countries 20 research centers.
Genome Organization & Evolution. Chromosomes Genes are always in genomic structures (chromosomes) – never ‘free floating’ Bacterial genomes are circular.
Genome databases and webtools for genome analysis Become familiar with microbial genome databases Use some of the tools useful for analyzing genome Visit.
Ch. 21 Genomes and their Evolution. New approaches have accelerated the pace of genome sequencing The human genome project began in 1990, using a three-stage.
Organizing information in the post-genomic era The rise of bioinformatics.
Chapter 21 Eukaryotic Genome Sequences
Ontologies GO Workshop 3-6 August Ontologies  What are ontologies?  Why use ontologies?  Open Biological Ontologies (OBO), National Center for.
Recombinant DNA Technology and Genomics A.Overview: B.Creating a DNA Library C.Recover the clone of interest D.Analyzing/characterizing the DNA - create.
Chap. 1 basic concepts of Molecular Biology Introduction to Computational Molecular Biology Chapter 1.
MCB 7200: Molecular Biology Biotechnology terminology Common hosts and experimental organisms Transcription and translation Prokaryotic gene organization.
Genome annotation and search for homologs. Genome of the week Discuss the diversity and features of selected microbial genomes. Link to the paper describing.
Bioinformatics and Computational Biology
Chapter 1 Introduction.
Bailee Ludwig Quality Management. Before we get started…. ….Let’s see what you know about Genomics.
1 From Mendel to Genomics Historically –Identify or create mutations, follow inheritance –Determine linkage, create maps Now: Genomics –Not just a gene,
Johnson - The Living World: 3rd Ed. - All Rights Reserved - McGraw Hill Companies Genomics Chapter 10 Copyright © McGraw-Hill Companies Permission required.
Genomics Part 1. Human Genome Project  G oal is to identify the DNA sequence of every gene in humans Genome  all the DNA in one cell of an organism.
BIOL 433 Plant Genetics Term 2, Instructors: Dr. George Haughn Dr. Ljerka Kunst BioSciences 2239BioSciences Tel
Starter What do you know about DNA and gene expression?
MICROBIOLOGIA GENERALE Prokaryotic genomes. The Escherichia coli nucleoid.
Visualizing Biosciences Genomics & Proteomics. “Scientists Complete Rough Draft of Human Genome” - New York Times, June 26, 2000 The problem: –3 billion.
Looking Within Human Genome King abdulaziz university Dr. Nisreen R Tashkandy GENOMICS ; THE PIG PICTURE.
Microbial genomics.
Chapter 14 Genomes and Genomics
Ch 12: Genomes.
MCB 7200: Molecular Biology
Annotating with GO: an overview
BIOL 433 Plant Genetics Term 2,
Today’s Lecture Topics
PBIO 4500/5500: Biotechnology and Genetic Engineering
Genomes and Their Evolution
EL: To find out what a genome is and how gene expression is regulated
Department of Genetics • Stanford University School of Medicine
Genomes and Their Evolution
Prediction of Regulatory Elements for Non-Model Organisms Rachita Sharma, Patricia.
Genome Center of Wisconsin, UW-Madison
Today… Review a few items from last class
Genomes and Their Evolution
CHMI 2227E Biochemistry I Gene expression
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
BIOL 433 Plant Genetics Term 2,
BSC1010: Intro to Biology I K. Maltz Chapter 21.
From Mendel to Genomics
Evolution of Genomes Chapter 21.
A Lot More Advanced Biotechnology Tools
Presentation transcript:

1 Annotation of the bacteriophage 933W genome: an in- class interactive web-based exercise

2 Genome: The entire collection of genetic information of an organism Everybody has a genome –Viruses, bacteria, archaea, eukaryotes (fungi, plants & animals) –They can range from thousands to billions of base pairs (bp) of DNA (some viruses have RNA genomes) –Genomes can consist of one or many chromosomes –Chromosomes can be linear or circular

3 History of DNA sequencing and genome research -Methods for determining the sequence of DNA were developed in the early 1970’s. -Frederick Sanger and colleagues determine the first complete genome sequence of all 5,375 nucleotides of bacteriophage  X174 (sequence completed in 1977, Nobel prize awarded in 1980). [Sanger F. et al., Nature 265, (1977)] 10 genes

4 Sanger developed a method called "shotgun" sequencing and completed the 48,502 bp genome of bacteriophage lambda in 1982 A map of the lambda genome This method allows sequencing projects to proceed much faster and is still commonly used. History of DNA sequencing and genome research (cont.) [Sanger F. et al. J. Mol. Biol. 162, (1982)] 46 genes Animation

Haemophilus influenza sequenced Craig Venter and colleagues at The Institute for Genomic Research (TIGR) reported the first complete genome sequence of a (nonviral) organism, Haemophilus influenza. used shotgun sequencing assembled ~24,000 DNA fragments into the whole genome using the “TIGR assembler” software 1,830,138 bp genome 13 months to sequence 1,709 genes

Yeast Genome Sequenced Saccharomyces cerevisiae (ale yeast) The yeast genome sequence was completed by an international consortium (74 labs) started in chromosomes, 12,070,900 bp. ~6,269 genes Cells of S. cerevisiae by David Baumler

7 Other eukaryotic genomes Caenorhabditis elegans Nematode ~19,000 genes, completed 1998 Drosophila melanogaster, Fruit fly ~13,000 genes, completed 2000 Arabidopsis thaliana (plant) ~26,000 genes, completed 2000 Humans??? Increase in genome size

First human chromosome sequenced 2001 Human genome “completed” 23 chromosomes (haploid genome), 3,038,000,000 bp Francis Collins and Craig Venter September 2007, Venter publishes the sequence of his own diploid genome Venter announces plan to sequence 10,000 human genomes in 10 years in the future $100 human genomes ~30-40,000 genes The Human genome

9 Part of a genome sequence TCAGCGAAGATGAGATAGTTTTTAAAGGTGGGATTTCCCCACCTTTAAAAAGCGAGAAGTCCCGGTTTTAA AGAGGAGTAAAATCCTCTTTTTCTAGCCCACTCAGGTGGTTTTTTTGGTTTTCGCTCCTTGCCGCATCTTC TGTGCCTTTGATGGCGGCTGGTTGGGGTGAAAGGCTGCATATTCCAGAATTTCAGACAGTAGATTGTTTTT GAAATCTTCCGTTTTATCGTTGACGAACTTAACCATCCTGTTGAAATCATCTTCCTTTGATACACCTTCAG GAAATGCCTTAGGAACTGATGTTTGGCTATCCAAGGCATCTTGCAATATCTGCACGATCTCCGAATTCATT GATCGCCCATTGGCCTTTGCTCTGGCGGCAACTGCGTCACGCATACCGTCAGGCATCCTAACTGTAAATCT CTCAATGAAAGCTGGATCTTCTTTTTCAGTCATCATCTTAAACCATAAAAATTTATACAAAACACACTAGC ATCATATTGACATTACCCACAATGACATCATAATGGTGTCAGGCATCAAAATGATGTCATCATGACAAGGG GAAAGTAAATGCAAGATGTTCTCTATACAGGTCGTAAGAACGACAGCTTTCAGCTTCGTCTGCCTGAGCGA ATGAAAGAAGAGATCCGTCGCATGGCAGAGATGGACGGCATTTCGATTAATTCTGCAATCGTGCAGCGCCT TGCTAAAAGCTTGCGTGAGGAAAGAGTTAATGGGCAGTAAAAACAGCGAAGCCCGGAAGTGTGGGGACACT AACCGGGCTTCTAATGTCAGTTACCTAGCGGGAAACCAACAATGACCAGTATAGCAATCTTTGAAGCAGTA AACACTATCTCTCTTCCATTCCACGGACAGAAGATCATAACTGCGATGGTGGCGGGTGTGGCGTATGTGGC AATGAAGCCCATCGTGGAAAACATCGGTTTAGACTGGAAGAGCCAGTATGCCAAGCTCGTTAGTCAGCGTG AAAAGTTCGGGTGTGGTGATATCACCATACCTACCAAAGGTGGTGTTCAGCAGATGCTTTGCATCCCTTTG AAGAAACTGAATGGATGGCTCTTCAGCATTAACCCAGCAAAAGTACGTGATGCAGTTCGTGAAGGTTTAAT TCGCTATCAAGAAGAGTGTTTTACAGCTTTGCACGATTACTGGAGCAAAGGTGTTGCAACGAATCCCCGGA CACCGAAGAAACAGGAAGACAAAAAGTCACGCTATCACGTTCGCGTTATTGTCTATGACAACCTGTTTGGT GGATGCGTTGAATTTCAGGGGCGTGCGGATACGTTTCGGGGGATTGCATCGGGTGTAGCAACCGATATGGG ATTTAAGCCAACAGGATTTATCGAGCAGCCTTACGCTGTTGAAAAAATGAGGAAGGTCTACTGATTGGCGT ATTGGAAGGCGCAAAAAGAAAAGCCAGCAGATGGGCTGCTGGCATTCATTGGGTATATGAACTTTCGGAGA ACATATGAAGTCAATTATCAAGCATTTTGAGTTTAAGTCAAGTGAAGGGCATGTAGTGAGCCTTGAGGCTG CAAGCTTTAAAGGCAAGCCAGTTTTTTTAGCAATTGATTTGGCTAAGGCTCTCGGGTACTCAAATCCGTCA

10 Genome annotation is the process of attaching biological information to sequences. It consists of two main steps: 1.-identifying elements on the genome, a process called “structural annotation” or “gene finding”. Today much of this is automated with computers, yet ~50-90% of the actual genes can be predicted, still requires a person(s) to finish predicting them all. 2.-attaching information to these elements such as their molecular and biological functions. What exactly are annotations?

11 Structural annotation consists of the identification of genomic elements (e.g. genes). Open Reading Frames (ORFs) also called coding sequences (CDSs) must have a start codon and a stop codon location of regulatory motifs (such as promoters and ribosome binding sites) This step is typically automated using gene prediction software Annotation step #1: Structural Annotation Example of a gene - the start codon is green and the stop codon is red The genetic code (Courtesy of the National Institutes of Health)

12 Functional annotation: consists in attaching biological information to genomic elements. biochemical function involved regulation and interactions expression cellular location Three examples of annotations for one gene: Name/synonym: a short “word” used to refer to the gene (Ex. ureC) Product: a descriptive protein name (Ex. Urease gamma subunit) Function : Describes what the protein does (Ex. Catalyzes the hydrolysis of urea to form ammonia and carbon dioxide) Annotation step #2

13 When is the gene product a “Hypothetical Protein” When a gene is identified, but the predicted protein sequence doesn’t have an analog in protein database(s) (for example the search in Interproscan returns no result) A protein whose existence is predicted, but there is no evidence that it is expressed in vivo These are called hypothetical protein, and are added as product annotations Sometimes the function of a hypothetical protein can be predicted by searching for domains in a protein database, often though they are annotated as function “unknown” Even in the genome of “the most studied microorganism”, non- pathogenic E. coli K-12, ~30% of the genes are annotated as hypothetical proteins.

14 A genbank file Name of the gene (ureC) Product Function Organism from which the sequence was characterized List of annotated features Structural annotation

15 How is all of the sequence data stored and what do we do with it? National Center for Biotechnology Information (NCBI) ( Where are all of the genome sequences? What tools are there to use? #1 BLAST – search for similar sequences #2 PubMed – search for related literature

16 We are going to annotate a phage genome today What type of genes should we anticipate finding in the phage genome? Structural components of a phage Phage replication proteins Machinery for integration into the host genome You are going to annotate the bacteriophage 933W genome. This phage was found in the genome of E. coli O157:H7. The phage genome contains the genes stx2A and stx2B that encode the shiga toxin 2 protein, that contributes to disease in humans. Animation Courtesy of Microbelibrary.org

17 Tools you will use to annotate 933W #1 ERIC database: this is where you will get the sequences and record your functional annotations. #2 BLAST: this is a tool you will use to find similar sequences in the NCBI database of all publicly available known and predicted proteins #3 InterproScan: this is a tool you will use to find similar sequences in a database of protein families (groups of related proteins) and domains (functionally significant subregions of proteins)

18 Links to additional resources ERIC – Enteropathogen Resource Integration Center Home page: Annotation guide: NCBI – National Center for Biotechnology Information Home page: BLAST home: BLAST guide: Interpro – a database of protein families and domains Home page: Manual: InterproScan: For additional information on using Blast and Interproscan, we recommend the book “Bioinformatics for Dummies”