Genome of Drosophila species Olga Dolgova UAB Barcelona, 2008.

Contents:
- Introduction
- Methods
- Genomic Structure
- Annotation
- Genomic Content
- Concluding Remarks

"All animals are equal, but some animals are more equal than others." George Orwell, Animal Farm

Introduction
- Popular models for genetic exploration: the house mouse, yeast, Escherichia coli, corn, Caenorhabditis elegans, Arabidopsis, and zebrafish
- Drosophila is among the most popular of these models

Why is so much attention paid to the Drosophila genome?

- 61% of human disease genes have a recognizable counterpart in the fruit fly genome
- about 50% of fly protein sequences have mammalian homologs
- Drosophila is used as a genetic model of several human diseases, including Parkinson's disease, Alzheimer's disease, sclerosis, and Huntington's disease
- it is used to explore the mechanisms underlying immunity, diabetes, cancer, and drug addiction
- it is a model system for investigating many developmental and cellular processes common to higher eukaryotes, including humans

Drosophila and human development are homologous processes. Unlike humans, Drosophila is easy to manipulate genetically. As a result, most of what we know about the molecular basis of animal development has come from studies of model systems such as Drosophila.

The Object of Investigation

The genomes of 12 Drosophila species have been sequenced, and the results were published in three papers:
- Adams, M. D. et al. (2000). The genome sequence of Drosophila melanogaster. Science.
- Richards, S. et al. (2005). Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome Research.
- Drosophila 12 Genomes Consortium (2007). Evolution of genes and genomes on the Drosophila phylogeny. Nature.

Phylogram of the 12 sequenced species of Drosophila.

Methods:
- Whole-genome shotgun (WGS) sequencing
- Clone-based sequencing
- Bacterial artificial chromosome (BAC) physical mapping

Mitotic chromosomes of D. melanogaster, showing euchromatic regions, heterochromatic regions, and centromeres. The arms of the autosomes are designated 2L, 2R, 3L, 3R, and 4. The euchromatic lengths in megabases are derived from the sequence analysis. The heterochromatic lengths are estimated from direct measurements of mitotic chromosome lengths. The heterochromatic block of the X chromosome is polymorphic among stocks and varies from one-third to one-half of the length of the mitotic chromosome. The Y chromosome is almost entirely heterochromatic.

Goals of WGS sequencing:
- to test the strategy on a large and complex eukaryotic genome as a prelude to sequencing the human genome
- to provide a complete, high-quality genomic sequence to the Drosophila research community, so as to advance research in this important model organism

Steps of WGS sequencing:
- all the DNA of an organism is sheared into segments a few thousand base pairs (bp) in length
- the segments are cloned directly into a plasmid vector suitable for DNA sequencing
- the cloned fragments are sequenced, and the overlapping sequences are assembled to reconstruct the complete genome sequence
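
The shearing-and-overlap idea can be illustrated with a toy sketch. It assumes error-free reads, cuts fixed-length overlapping fragments instead of shearing at random, and merges them with a simple greedy overlap rule; this is a far simpler strategy than the assembler used in the real project, and the function names (shear, overlap, greedy_assemble) and the example sequence are hypothetical.

```python
import random


def shear(genome: str, read_len: int, step: int) -> list[str]:
    """Toy stand-in for shearing: fixed-length fragments starting every `step` bp."""
    return [genome[i:i + read_len] for i in range(0, len(genome) - read_len + 1, step)]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest suffix-prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best = (0, -1, -1)  # (overlap length, index of left contig, index of right contig)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # no remaining overlaps: these are the final contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


genome = "GATTACACCGTAGGCTTCAAGTCGATCCTG"  # hypothetical sequence with no repeated 5-mer
reads = shear(genome, read_len=10, step=5)
random.Random(1).shuffle(reads)  # clone order carries no positional information
print(greedy_assemble(reads, min_len=5))  # ['GATTACACCGTAGGCTTCAAGTCGATCCTG']
```

The example sequence was chosen so that no 5-bp word occurs twice; with repeats longer than the overlap, a greedy merge like this can join the wrong fragments, which is one reason repeat-rich regions are hard to assemble.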

Genomic Structure
Source of data for assembly: whole-genome shotgun sequencing

Vector              Insert size (kbp)   Paired sequences   Total sequences   Clone coverage   Sequence coverage
High-copy plasmid   2                   732,380            1,903,…           …                …
Low-copy plasmid    10                  548,974            1,278,…           …                …
BAC                 130                 9,869              19,…              …                …
Total                                   1,290,823          3,201,…           …                …
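
Clone coverage and sequence coverage are fold-coverage figures. The sketch below assumes the standard definitions, coverage c = N × L / G (N clones or reads, L the insert size or read length, G the genome size), plus the Lander-Waterman (Poisson) estimate that a fraction of about e^(-c) of the genome is left uncovered at c-fold coverage. The function names and example numbers are hypothetical, not values from the Drosophila project.

```python
import math


def fold_coverage(n_items: int, item_len_bp: float, genome_bp: float) -> float:
    """Fold coverage c = N * L / G for N reads (or clones) of length L over a genome of size G."""
    if genome_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_items * item_len_bp / genome_bp


def expected_uncovered_fraction(c: float) -> float:
    """Lander-Waterman estimate of the fraction of bases hit by no read: P(0) = exp(-c)."""
    return math.exp(-c)


# Hypothetical example: 1,000 reads of 500 bp over a 100 kb genome.
c = fold_coverage(1_000, 500, 100_000)
print(c)                                         # 5.0
print(round(expected_uncovered_fraction(c), 4))  # 0.0067
```

Because L is the insert size for clone coverage but the read length for sequence coverage, end-sequenced clones span much more of the genome than their reads cover, which is what allows mate pairs to link contigs across sequence gaps.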

BAC and P1 clone-based sequencing (table: chromosomal region; size; finished sequence (Mb); total sequenced BACs (P1s) in the joint assembly; rows for X chromosome divisions 1-3, 4-11 and 12-20, the autosome arms, and the total)

- A "scaffold" is a set of contiguous sequences (contigs) that are ordered and oriented with respect to one another by mate pairs.
- Gaps within scaffolds are called "sequence gaps".
- Gaps between scaffolds are called "physical gaps" because no clones spanning them have been identified.
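
The scaffold and gap vocabulary maps onto a small data structure. In this sketch, which is a hypothetical illustration (Scaffold and count_gaps are invented names), a scaffold holds its ordered contigs and an estimated size for each sequence gap between them; padding gaps with runs of N follows a common FASTA convention and is an assumption here. Physical gaps are counted assuming the scaffolds are already ordered along one chromosome arm.

```python
from dataclasses import dataclass, field


@dataclass
class Scaffold:
    """Contigs ordered and oriented by mate pairs; the gaps between them are sequence gaps."""
    contigs: list[str]
    gap_sizes: list[int] = field(default_factory=list)  # estimated length of each sequence gap

    def __post_init__(self) -> None:
        if len(self.gap_sizes) != len(self.contigs) - 1:
            raise ValueError("a scaffold with k contigs needs k - 1 gap estimates")

    def to_sequence(self) -> str:
        """Join the contigs, padding each sequence gap with N characters."""
        parts = [self.contigs[0]]
        for gap, contig in zip(self.gap_sizes, self.contigs[1:]):
            parts.append("N" * gap)
            parts.append(contig)
        return "".join(parts)


def count_gaps(arm: list[Scaffold]) -> tuple[int, int]:
    """Return (sequence gaps, physical gaps) for scaffolds laid out along one chromosome arm."""
    sequence_gaps = sum(len(s.contigs) - 1 for s in arm)  # gaps inside scaffolds
    physical_gaps = max(len(arm) - 1, 0)                  # gaps between neighbouring scaffolds
    return sequence_gaps, physical_gaps


arm = [Scaffold(["ACGT", "GGCA", "TTAG"], [3, 5]), Scaffold(["CCAT"])]
print(arm[0].to_sequence())  # ACGTNNNGGCANNNNNTTAG
print(count_gaps(arm))       # (2, 1)
```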

Assembly status of the Drosophila genome. Each chromosome arm is depicted with information on content and assembly status: (A) transposable elements, (B) gene density, (C) scaffolds from the joint assembly, (D) scaffolds from the WGS-only assembly, (E) polytene chromosome divisions, and (F) clone-based tiling path. Gene density is plotted in 50-kb windows; the scale runs from 0 to 30 genes per 50 kb. Gaps between scaffolds are represented by vertical bars in (C) and (D). Clones colored red in the tiling path have been completely sequenced; clones colored blue have been draft-sequenced. A gap in the tiling path does not necessarily mean that no clone exists at that position, only that none has been sequenced. Each chromosome arm is oriented left to right, so that the centromere is at the right side of X, 2L, and 3L and at the left side of 2R and 3R.
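
Panel (B) plots gene density in 50-kb windows. A minimal sketch of that binning, assuming each gene is counted once in the window that contains its start coordinate (the caption does not say how genes spanning a window boundary were handled); the function name and the coordinates are hypothetical.

```python
def gene_density(gene_starts: list[int], arm_length_bp: int, window_bp: int = 50_000) -> list[int]:
    """Count genes per fixed-size window, assigning each gene to the window holding its start."""
    n_windows = -(-arm_length_bp // window_bp)  # ceiling division keeps a partial last window
    counts = [0] * n_windows
    for start in gene_starts:
        if not 0 <= start < arm_length_bp:
            raise ValueError(f"gene start {start} lies outside the arm")
        counts[start // window_bp] += 1
    return counts


# Hypothetical 0-based gene start coordinates on a 170 kb arm.
print(gene_density([1_000, 20_000, 49_999, 50_000, 160_000], 170_000))  # [3, 1, 0, 1]
```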

Annotation
Tasks:
- prediction of transcript and protein sequences
- prediction of the function of each predicted protein

There are 13,601 predicted genes, encoding 14,113 transcripts through alternative splicing of some genes. The Gene Ontology (GO) project is a collaboration among FlyBase, the Saccharomyces Genome Database, and Mouse Genome Informatics.

Annotation
- The largest predicted protein is Kakapo (… amino acids).
- The smallest is the 21-amino-acid ribosomal protein L38.
- There are 56,673 predicted exons, an average of four per gene, together making up 24.1 Mb of the euchromatic sequence.
- The average predicted transcript is 3,058 bp long.
- 292 transfer RNA (tRNA) genes and 26 genes for spliceosomal small nuclear RNAs (snRNAs) were identified.
- The total number of protein-coding genes, 13,601, is far lower than the 27,000 of the plant Arabidopsis thaliana.
- The average gene density in Drosophila is one gene per 9 kb.
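
The averages above follow from the raw counts. This worked check uses the slide's numbers; the euchromatic genome size of about 120 Mb is an assumption, since it is not stated on this slide.

```python
GENES = 13_601
TRANSCRIPTS = 14_113
EXONS = 56_673
EXON_BP = 24.1e6
EUCHROMATIN_BP = 120e6  # assumption: approximate euchromatic genome size, not given on the slide

exons_per_gene = EXONS / GENES
transcripts_per_gene = TRANSCRIPTS / GENES
mean_exon_bp = EXON_BP / EXONS
kb_per_gene = EUCHROMATIN_BP / GENES / 1_000
exonic_fraction = EXON_BP / EUCHROMATIN_BP

print(f"{exons_per_gene:.2f} exons per gene")              # 4.17 exons per gene
print(f"{transcripts_per_gene:.2f} transcripts per gene")  # 1.04 transcripts per gene
print(f"{mean_exon_bp:.0f} bp per exon on average")        # 425 bp per exon on average
print(f"one gene per {kb_per_gene:.1f} kb")                # one gene per 8.8 kb
print(f"{exonic_fraction:.0%} of euchromatin is exonic")   # 20% of euchromatin is exonic
```

Under that assumption the result, about 8.8 kb per gene, is consistent with the rounded figure of one gene per 9 kb.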

Remarks on Genomic Content
- The genomic sequence has shed light on some of the processes common to all cells, such as replication, chromosome segregation, and iron metabolism.
- There are new findings about important classes of chromosomal proteins that give insight into gene regulation and the cell cycle.
- The correspondence between Drosophila proteins involved in gene expression and metabolism and their human counterparts reaffirms that the fly is a suitable experimental platform for examining human disease networks involved in replication, repair, translation, and the metabolism of drugs and toxins.

Remarks on Genomic Content
- The large diversity of transcription factors is likely related to the substantial regulatory complexity of the fly.
- Many of the genes involved in core processes are single-copy genes and thus provide starting points for detailed studies of phenotype, free of the complications of genetically redundant relatives.

Concluding Remarks
- There is no clear boundary between euchromatin and heterochromatin.
- Over a region of about 1 Mb, the density of transposable elements and other repeats increases gradually, until the sequence is almost entirely repetitive.
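
A gradual transition like this can be seen by measuring repeat content in consecutive windows. The sketch assumes repeats are soft-masked as lowercase letters, a convention used in many genome FASTA releases; the function name and the toy sequence are hypothetical.

```python
def repeat_fraction(seq: str, window: int) -> list[float]:
    """Fraction of soft-masked (lowercase) bases in consecutive non-overlapping windows."""
    fractions = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        fractions.append(sum(base.islower() for base in chunk) / len(chunk))
    return fractions


# Toy sequence whose repeat content rises towards the right-hand end.
toy = "ACGTACGTAC" + "ACGTacgtAC" + "acgtacGTac" + "acgtacgtac"
print(repeat_fraction(toy, 10))  # [0.0, 0.4, 0.8, 1.0]
```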

Concluding Remarks
- There are clearly genes within heterochromatin, and most of the 3.8 Mb of unmapped scaffolds are suspected to contain such genes, both near the centromeres and on the Y chromosome.
- The diversity of predicted genes and gene products will serve as raw material for continued experimental work aimed at unraveling the molecular mechanisms underlying development, behavior, aging, and many other processes common to metazoans, for which Drosophila is an excellent model.