1 Genome Composition Dan Graur 2 Genome Composition in Bacteria.

Presentation transcript:

1 Genome Composition Dan Graur

2 Genome Composition in Bacteria

Carsonella ruddii has a very low GC content.

4

5 The selectionist explanation views GC content as an adaptation. G:C pairs are more stable than A:T pairs. Preferential usage of amino acids encoded by GC-rich codons (e.g., Ala and Arg) and avoidance of amino acids encoded by GC-poor codons (e.g., Ser and Lys). T-T dimers are sensitive to UV radiation. No empirical evidence.

6 The mutationist explanation (Noboru Sueoka, University of Colorado). Rate of substitution G/C → T/A is u. Rate of substitution T/A → G/C is v.
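The equilibrium implied by a two-rate model of this kind can be written out as a short derivation. This is a sketch, not the slide's own formula, which did not survive extraction. The symbols are assumed notation: u is the G/C → T/A substitution rate, v is the T/A → G/C substitution rate, and P_GC is the genomic GC fraction.

```latex
\frac{dP_{GC}}{dt} = v\,(1 - P_{GC}) - u\,P_{GC}
\qquad\Longrightarrow\qquad
\hat{P}_{GC} = \frac{v}{u + v}
```

Under these assumptions a genome drifts toward high GC content when v > u and toward low GC content when u > v, without invoking selection.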

7

8

9 Mycoplasma capricolum Escherichia coli Micrococcus luteus

10

11 Differences in the way the leading and lagging strands of DNA are replicated can result in strand-dependent mutation patterns. The expectation under no-strand-bias conditions is f_A = f_T and f_C = f_G.

12 Deviations from equal mutation rates between the two strands are quantified by the skew.

13 The skew is a measure of inequality between the frequencies of nucleotides X and Y on a strand.
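A commonly used normalized form of the skew is sketched below. The slide's own formula did not survive extraction, so the symbols S_XY, f_X and f_Y are assumed notation rather than a quotation.

```latex
S_{XY} = \frac{f_X - f_Y}{f_X + f_Y}, \qquad -1 \le S_{XY} \le 1
```

With this form, S_XY > 0 means that X is more frequent than Y on the strand examined, and S_XY < 0 means the opposite.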

14 If there are no violations of the no-strand-bias conditions, the AT skew and the GC skew are both zero.

15 Skew values are calculated for sliding windows of predetermined lengths, and are plotted on a skew diagram.
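The sliding-window calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the normalized skew (f_X - f_Y) / (f_X + f_Y). The function names, the default window and step sizes, and the choice to report each window by its midpoint are all assumptions made for the example.

```python
def skew(seq, x, y):
    """Return (n_x - n_y) / (n_x + n_y) for one window; 0.0 if neither base occurs."""
    nx, ny = seq.count(x), seq.count(y)
    total = nx + ny
    return (nx - ny) / total if total else 0.0


def sliding_skew(seq, x="G", y="C", window=1000, step=500):
    """Return a list of (window midpoint, skew) pairs along the sequence."""
    seq = seq.upper()
    points = []
    # Only full-length windows are used, so a short tail of the sequence is ignored.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        points.append((start + window // 2, skew(chunk, x, y)))
    return points
```

Plotting the second element of each pair against the first gives a skew diagram. In bacterial genomes the sign of the GC skew often changes near the origin and terminus of replication.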

16 Bacillus subtilis: chirochore, chirochore

17

18 Chlamydia trachomatis

19 Compositional Properties of Eukaryotic Genomes

20 Intergenomic variability. GC content of bacterial genomes ranges from ~24% to ~74%. GC content of vertebrate genomes ranges from ~40% to ~45%.

21 TTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCGTCTCCGAACTT AACGGCGACCCTAAGGTTGACGACGGACCCAGCAGTGATGCTAATCTCAGCGCTCCGCTG ACCCCTCAGCAAAGGGCTTGGCTCAATCTCGTCCAGCCATTGACCATCGTCGAGGGGTTT GCTCTGTTATCCGTGCCGAGCAGCTTTGTCCAAAACGAAATCGAGCGCCATCTGCGGGCC CCGATTACCGACGCTCTCAGCCGCCGACTCGGACATCAGATCCAACTCGGGGTCCGCATC GCTCCGCCGGCGACCGACGAAGCCGACGACACTACCGTGCCGCCTTCCGAGAGATTGATG ACAGCGCTGCGGCACGGGGCGATAACCAGCACAGTTGGCCAAGTTACTTCACCGAGCGCC CGCACAATACCGATTCCGCTACCGCTGGCGTAACCAGCCTTAACCGTCGCTACACCTTTG ATACGTTCGTTATCGGCGCCTCCAACCGGTTCGCGCACGCCGCCGCCTTGGCGATCGCAG AAGCACCCGCCCGCGCTTACAACCCCCTGTTCATCTGGGGCGAGTCCGGTCTCGGCAAGA CACACCTGCTACACGCGGCAGGCAACTATGCCCAACGGTTGTTCCCGGGAATGCGGGTCA AATATGTCTCCACCGAGGAATTCACCAACGACTTCATTAACTCGCTCCGCGATGACCGCA AGGTCGCATTCAAACGCAGCTACCGCGACGTAGACGTGCTGTTGGTCGACGACATCCAAT TCATTGAAGGCAAAGAGGGTATTCAAGAGGAGTTCTTCCACACCTTCAACACCTTGCACA ATGCCAACAAGCAAATCGTCATCTCATCTGACCGCCCACCCAAGCAGCTCGCCACCCTCG AGGACCGGCTGAGAACCCGCTTTGAGTGGGGGCTGATCACTGACGTACAACCACCCGAGC TGGAGACCCGCATCGCCATCTTGCGCAAGAAAGCACAGATGGAACGGCTCGCGGTCCCCG ACGATGTCCTCGAACTCATCGCCAGCAGTATCGAACGCAATATCCGTGAACTCGAGGCCG AGGAATTCACCAACGACTTCATTAACTCGCTCCGCGATGACCGCAAGGTCGCATTCAAAC GCAGCTACCGCGACGTAGACGTGCTGTTGGTCGACGACATCCAATTCATTGAAGGCAAAG Interspecific variation among vertebrate genomes is low. However, vertebrates seem to have a much more complex intragenomic compositional organization (internal structure) than prokaryotic genomes.

22 How are nucleotides distributed along the genome? Uniform? Patchy? Clines?

23 “When vertebrate genomic DNA is randomly sheared into fragments kb in size and the fragments are separated by base composition, the fragments cluster into a small number of classes distinguished from each other by their GC content. Each class is characterized by bands of similar, but not identical, base compositions.” (Macaya et al. 1976; Thiery et al. 1976; Bernardi et al. 1985) Method: equilibrium centrifugation in a Cs₂SO₄ density gradient.

24 carp

25 The Isochore Theory - Giorgio Bernardi carp

26

27 Isochores do not merit the prefix “iso.” Lander et al. (2001)

28 Post-genomic era (2001). Objections against the isochore theory: “We can rule out a strict notion of isochores as compositionally homogeneous.” Lander et al. (2001). “There are no isochores in chromosomes 21 and 22.” Häring and Kypr (2001). Defense of the isochore theory: “The conclusion of the authors that ‘isochores’ are not ‘strict isochores’ is correct, however isochores are fairly homogeneous regions.” Bernardi (2001).

29

30 In search of isochores… Questions: (1) Do isochores exist? (2) Is the isochore theory a useful (or practical) concept?

31 Segmentation Models Assumption: Sequences can be partitioned into a number of segments each with a characteristic GC content. Each segment has a certain degree of internal homogeneity (or similarity).
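One common way to realize a segmentation model is to look for the cut point that makes the two sides of a sequence as compositionally different as possible, then repeat on each side. The slides do not say which algorithm was used, so the sketch below is an assumption. It performs a single split step using the Jensen-Shannon divergence between the GC/AT compositions of the left and right parts. The function names and the `min_len` parameter are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a two-state (GC vs. AT) composition with GC fraction p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split(seq, min_len=1):
    """Return (position, divergence) of the cut that maximizes Jensen-Shannon divergence.

    Returns (None, 0.0) when no cut improves on a homogeneous sequence.
    """
    gc = [1 if base in "GC" else 0 for base in seq.upper()]
    n, total = len(gc), sum(gc)
    if n < 2:
        return None, 0.0
    h_all = entropy(total / n)
    best_pos, best_d = None, 0.0
    left = 0  # running count of G/C bases to the left of the cut
    for i in range(1, n):
        left += gc[i - 1]
        if i < min_len or n - i < min_len:
            continue
        h_left = entropy(left / i)
        h_right = entropy((total - left) / (n - i))
        # Length-weighted Jensen-Shannon divergence between the two sides.
        d = h_all - (i / n) * h_left - ((n - i) / n) * h_right
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d
```

A full segmentation would apply `best_split` recursively to each resulting segment and stop when the divergence is no longer statistically significant. That stopping rule is deliberately left out of this sketch.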

32 In search of isochores… Methodology: (1) Rigorously define 6 attributes of isochores and of the isochore theory as applied to humans. (2) Test the attributes against the human genome data.

33 Attributes of isochores A1. Distinguishability: An isochore is a DNA segment that has a characteristic GC content that differs significantly from the GC content of adjacent isochores. A2. Homogeneity: An isochore is more homogeneous in its composition than the chromosome on which it resides. A3. Minimum length: The length of an isochore exceeds a certain cutoff value. In the literature, the most commonly mentioned value is 300 Kb.

34 Attributes of the isochore theory in humans. A4. Genome coverage: The overwhelming majority of the human genome consists of segments abiding by A1-A3. Non-isochoric DNA takes up only a small fraction of the genome.

35 Attributes of the isochore theory in humans. A5. Isochore families: The human genome comprises five isochore families, each described by a particular Gaussian distribution of GC content.

36 Practicality of the isochore theory. A6. Isochore assignment into families: It is possible to classify each isochore into its isochore family based solely on its compositional properties.

37 Segment length distribution. The fitted regression line (solid line) indicates that the tail of the distribution exhibits power-law decay with an exponent of -2.38: P ∝ L^(-2.38).
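On log-log axes a power law P ∝ L^a appears as a straight line with slope a, which is how an exponent such as -2.38 can be read off a fitted regression line. The sketch below shows that calculation, assuming the segment lengths have already been binned into (length, frequency) pairs. The function name is hypothetical. Ordinary least squares on log-transformed data is a simple estimator and not necessarily the one used for the slide.

```python
import math


def fit_power_law(lengths, freqs):
    """Return (slope, intercept) of the least-squares line of log10(freq) on log10(length)."""
    xs = [math.log10(length) for length in lengths]
    ys = [math.log10(freq) for freq in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # estimated power-law exponent
    return slope, mean_y - slope * mean_x
```

Only the tail of the distribution should be passed in, since the slide's claim concerns the tail rather than the whole range of segment lengths.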

38 Power laws everywhere!

39 Isochore families Most parsimonious Gaussian fit to putative isochores

40 Homogeneous “isochores” in vertebrates

41 Assignment into families Classification errors reach values of 70%. Only a minute fraction of segments can be classified with an expected error under 5%.
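When families are modelled as overlapping Gaussian distributions of GC content, the expected classification error for a segment is one minus the posterior probability of its most likely family. The sketch below illustrates that calculation. The function names are hypothetical, and the two toy families use made-up weights, means and standard deviations that are not the fitted values behind the slide.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def classify(gc, families):
    """Assign a GC fraction to a family.

    families maps a name to (weight, mean, sd).
    Returns (most likely family, expected classification error).
    """
    scores = {name: w * normal_pdf(gc, m, s) for name, (w, m, s) in families.items()}
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, 1.0 - scores[best] / total


# Hypothetical parameters for illustration only; NOT values taken from the slides.
TOY_FAMILIES = {"low": (0.5, 0.38, 0.02), "high": (0.5, 0.46, 0.02)}
```

In this toy setup a segment with a GC fraction midway between the two means has an expected error near 50%, while one sitting on a family mean has an error well under 5%. The more the family distributions overlap, the fewer segments fall in the low-error regime.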

42 Summary (A1) Distinguishability  (A2) Homogeneity  (A3) Minimum length X (A4) Genome coverage  (A5) Isochore families  families (A6) Isochore assignment into families X

43 Conclusion: The isochore theory may have reached the limits of its usefulness as a description of genomic compositional structures.

44

45

46 As of December genetic codes: 11 mitochondrial, 5 nuclear, 1 nuclear + mitochondrial

47 Lock & Key Hypothesis

48 Frozen accidents Evolutionary Dead Ends

49

50 The codon-capture hypothesis (Thomas Jukes)

51 AAA = lysine Universal genetic code

52 AAA = asparagine Echinodermata

53 Hemichordata AAA = unassigned
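The three AAA assignments above can be represented as a base table plus per-lineage overrides, which is one simple way to model variant genetic codes. This is a sketch covering only the single codon named on the slides. The dictionary and function names are hypothetical, and a real table would hold all 64 codons.

```python
# Only the codon discussed on the slides is modelled; a full table has 64 entries.
UNIVERSAL = {"AAA": "Lys"}

# Variant codes are stored as overrides on top of the universal assignments.
# None marks a codon with no assigned meaning.
VARIANTS = {
    "universal": {},
    "echinoderm": {"AAA": "Asn"},
    "hemichordate": {"AAA": None},
}


def decode(codon, code="universal"):
    """Return the amino acid for a codon under the named code, or None if unassigned."""
    table = {**UNIVERSAL, **VARIANTS[code]}
    return table[codon.upper()]
```

Storing each variant as a small set of differences makes the reassignments themselves easy to inspect, since the codes differ from the universal one at only a few codons.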

54