Genome evolution: a sequence-centric approach Lecture 7: Brief evolutionary history of everything.


Probabilistic models: Simple Tree Models; HMMs and variants; PhyloHMM, DBN; Context-aware MM; Factor Graphs
Inference: DP; Sampling; Variational apx.; LBP
Parameter estimation: EM; Generalized EM (optimize free energy)
Genome structure; Mutations; Population; Inferring Selection
(Probability, Calculus/Matrix theory, some graph theory, some statistics)

Genome Structure, Genome Information
Genome structure; Genomic information; Selection; Mutation

Diversity: Brief description of the tree of life
Genome structure: Size, Key features, Mobile elements
Genome information: Proteins/RNA genes, regulatory elements
Today: A lot of terminology, basic overview

RNA Based Genomes; Ribosome; Proteins; Genetic Code; DNA Based Genomes; Membranes; Diversity! ? ?
3.4 - 3.8 BYA: fossils??
3.2 BYA: good fossils
3 BYA: methanogenesis
2.8 BYA: photosynthesis
BYA: eukaryotes
BYA: Cambrian explosion
0.44 BYA: jawed vertebrates
0.4: land plants
0.14: flowering plants
mammals

Curated set of universal proteins
Eliminating lateral transfer
Multiple alignment and removal of bad domains
Maximum likelihood inference, with 4 classes of rate and a fixed matrix
Bootstrap validation
Ciccarelli et al 2005
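The bootstrap validation step listed above can be illustrated with a minimal sketch. It only shows the resampling of alignment columns with replacement; the maximum likelihood tree inference that would be rerun on each replicate is not shown. The function name `bootstrap_alignment`, the taxon names, and the toy sequences are hypothetical and are not taken from Ciccarelli et al.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

# Toy multiple alignment (hypothetical data, all rows the same length).
alignment = {
    "taxonA": "MKVLAAGIV",
    "taxonB": "MKVLSAGIV",
    "taxonC": "MRVLSAGLV",
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
```

In a real analysis a tree would be inferred from every replicate, and the support for a clade would be the fraction of replicate trees that contain it.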

PROKARYOTES | EUKARYOTES
(Also present in the Planctomycetes) | Presence of a nuclear membrane
(Also in β-proteobacteria) | Organelles derived from endosymbionts
Tubulin-related protein, no microtubules | Cytoskeleton and vesicle transport
- | Trans-splicing
Rare, almost never in coding | Introns in protein coding genes, spliceosome
Short UTRs | Expansion of untranslated regions of transcripts
Ribosome binds directly to a Shine-Dalgarno sequence | Translation initiation by scanning for start
Nonsense mediated decay pathway is absent | mRNA surveillance
Single linear chromosomes in a few eubacteria | Multiple linear chromosomes, telomeres
Absent | Mitosis, Meiosis
- | Gene number expansion
Some exceptions, but cells are small | Expansion of cell size

Eukaryotes: Bikonts, Unikonts

Unikonts: one flagellum at some developmental stage
- Fungi
- Animals
- Animal parasites
- Amoebas
Bikonts: ancestrally two flagella
- Green plants
- Red algae
- Ciliates, Plasmodium
- Brown algae
- More amoebas
Strange biology!
A big bang phylogeny: speciations across a short time span?
Ambiguity, and not much hope for really resolving it

Vertebrates
Sequenced genomes phylogeny
Fossil based, large scale phylogeny

Primates
Species shown: Marmoset, Macaque, Orangutan, Chimp, Human, Baboon, Gibbon, Gorilla
Percentage labels on the tree: 0.5%, 0.8%, 1.5%, 3%, 9%, 1.2%

Flies

Yeasts

Genome Size

Why larger genomes?
Selfish DNA
- Larger genomes are a result of the proliferation of selfish DNA
- Proliferation stops only when it becomes too deleterious
Bulk DNA
- Genome content is a consequence of natural selection
- A larger genome is needed to allow larger cell size, larger nuclear membrane, etc.

Why smaller genomes?
Metabolic cost: maybe cells lose excess DNA for energetic efficiency
- But DNA is only 2-5% of the dry mass
- No correlation between genome size and replication time in prokaryotes
- Replication is much faster than transcription (10-20 times in E. coli)

Mutational balance
Balance between deletions and insertions
- May be different between species
- Different balances may have evolved
In flies, yeast laboratory evolution
- 4-fold more 4kb spontaneous insertions
In mammals
- More small deletions than insertions
Mutational hazard
No loss of function for inert DNA
- But is it truly not functional?
Gain of function mutations are still possible:
- Transcription
- Regulation
Differences in population size may make DNA purging more effective for prokaryotes and small eukaryotes
Differences in regulatory sophistication may make the DNA mutational hazard less of a problem for metazoans
Can we model genome size evolution in a quantitative way?
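One very simple way to make the insertion/deletion balance quantitative is sketched below. It is a toy model chosen for illustration, not a model from the lecture: it assumes that a fixed number of base pairs is inserted per generation and that a fixed fraction of the genome is deleted per generation, so the size G follows G <- G + i - d*G and settles at G* = i/d. The function name and every parameter value are hypothetical.

```python
def simulate_genome_size(g0, insert_bp_per_gen, delete_frac_per_gen, generations):
    """Iterate the toy update G <- G + i - d * G and return the trajectory."""
    g = float(g0)
    trajectory = [g]
    for _ in range(generations):
        g = g + insert_bp_per_gen - delete_frac_per_gen * g
        trajectory.append(g)
    return trajectory

# Arbitrary illustrative parameters (assumptions, not measured rates).
insert_rate = 1e4   # bp inserted per generation
delete_frac = 1e-3  # fraction of the genome deleted per generation
traj = simulate_genome_size(1e6, insert_rate, delete_frac, 20_000)

equilibrium = insert_rate / delete_frac  # predicted balance point G* = i / d
print(f"final size {traj[-1]:.0f} bp, predicted equilibrium {equilibrium:.0f} bp")
```

Under these assumptions, species with different insertion or deletion rates settle at different genome sizes, which is the qualitative point of the mutational balance argument. The sketch ignores selection and drift entirely.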

Genome Structural features: centromeres/telomeres
Rat: partly acrocentric
Human
Centromeres are essential and universally important for proper cell division, but are highly divergent among species
Satellites and repeats
Pericentromeric regions: more repeats
Telomeres are critical for genome maintenance
Subtelomeric regions: also repetitive
May be key to nuclear structure?

Genome Structural features: nuclear organization
The nucleus must be organized to allow functional transcription and replication
Incredibly dense mesh of chromosomes, cytoskeleton, membranes
Transcription factories / chromosomal territories
“Spacer DNA” may affect physical organization in unexpected ways
Inter- and intra-chromosomal interactions
Entire genome may participate in regulating interactions

Genomic information: Protein coding genes

Modeling protein coding genes
Modeling protein structure/function
Structure is complex
Dependencies are not confined by the gene's linear coding

Genomic information: the gene repertoire is evolving by duplication and loss

Genome information: Introns/Exons

Genome information: RNA genes
mRNA: messenger RNA. Mature gene transcripts after introns have been processed out of the mRNA precursor
miRNA: micro-RNA. bp in length, processed from transcribed “hairpin” precursor RNAs. Regulate gene expression by binding nearly perfect matches in the 3’ UTR of transcripts
siRNA: small interfering RNAs. bp in length, processed from double stranded RNA by the RNAi machinery. Used for posttranscriptional silencing
rRNA: ribosomal RNA, part of the ribosome machine (with proteins)
snRNA: small nuclear RNAs. Heterogeneous set with function confined to the nucleus, including RNAs involved in the spliceosome machinery
snoRNA: small nucleolar RNA. Involved in the chemical modifications made in the construction of ribosomes. Often encoded within the introns of ribosomal protein genes
tRNA: transfer RNA. Delivers amino acids to the ribosome
piRNA: ???
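The idea of a miRNA binding "nearly perfect matches" in a 3’ UTR can be sketched as a search for sites complementary to the miRNA with a small number of mismatches. This is a deliberate simplification: it requires complementarity along the full miRNA length and ignores G:U wobble pairing and any seed-region weighting. The sequences, the function names, and the mismatch threshold are all hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string over the alphabet A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_target_sites(mirna, utr, max_mismatches=2):
    """Return (position, mismatches) for UTR windows that pair with the miRNA."""
    target = reverse_complement(mirna)
    hits = []
    for i in range(len(utr) - len(target) + 1):
        window = utr[i:i + len(target)]
        mismatches = sum(a != b for a, b in zip(window, target))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits

# Hypothetical short miRNA and a toy UTR built to contain two sites:
# a perfect one and one carrying a single mismatch.
mirna = "UGAGGUAGUAGG"
perfect_site = reverse_complement(mirna)
one_mismatch_site = perfect_site[:5] + "A" + perfect_site[6:]
utr = "AAAA" + perfect_site + "GGGG" + one_mismatch_site + "UU"

hits = find_target_sites(mirna, utr)
```

Lowering `max_mismatches` to 0 keeps only perfectly complementary sites, the kind of match more typical of siRNA-style silencing.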

miRNA clusters

snRNA works by binding other RNAs
RNA structure affects function

Computational perspective: finding and understanding RNAs and their evolution

Ultra-high throughput sequencing is transforming all aspects of biology

Genome information: regulatory elements
Computational perspective: finding and understanding TFBSs
Specialized proteins can bind DNA in a sequence specific fashion
Genomes can therefore control the level of affinity of each region to a large set of DNA binding proteins
DNA binding sites are typically short (<20bp)
Multiple binding sites at different affinities participate in regulation
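A common way to represent sequence-specific binding with graded affinities is a position weight matrix (PWM). The sketch below assumes a log-odds PWM built from per-position base counts with a pseudocount and a uniform background, and scores every window of a sequence. The counts, the motif, the sequence, and the function names are hypothetical and serve only to illustrate the technique.

```python
import math

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((col.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def scan(seq, pwm):
    """Score every window of len(pwm) bases; return a list of (position, score)."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][seq[i + j]] for j in range(width)))
        for i in range(len(seq) - width + 1)
    ]

# Hypothetical counts for a 4 bp site whose consensus is TGAC.
counts = [
    {"T": 9, "A": 1},
    {"G": 9, "C": 1},
    {"A": 9, "T": 1},
    {"C": 9, "G": 1},
]
pwm = log_odds_pwm(counts)
scores = scan("AATGACGG", pwm)
best_position, best_score = max(scores, key=lambda hit: hit[1])
```

Because every window receives a score rather than a yes/no call, weak sites show up as intermediate values, which matches the point that multiple sites at different affinities take part in regulation.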

The regulatory process is likely to be less deterministic and discrete than this beautiful idealized sea urchin regulatory network
Each regulatory interaction is parameterized, and many additional weak interactions participate in the process
Evolution of regulatory regions involves more than a small set of discrete 20bp sites

Chromatin Immunoprecipitation is mapping DNA binding sites

Structure meets information: packaging and chromosomal interactions are critical for proper genome function

Structure meets information: HOX clusters as an example
Hox genes are important developmental regulators
Present in linear clusters, preserving order
Their expression is frequently coordinated with the gene order
4 HOX clusters are present in the human genome
Additional gene clusters: Protocadherins, Olfactory receptors, MAGE genes, Zinc fingers
Additional smaller groups of related regulators are co-located

Mapping chromosomal interactions: 4C

Repeats: selfish DNA
Repetitive elements in the human genome
Genome Fraction | Copies | Class
20.4% | 868,000 (only ~100 active!!) | LINEs
13.1% | 1,558,000 (70% Alu) | SINEs
8.3% | 443,000 | LTR elements
2.8% | 294,000 | Transposons
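Adding up the table gives a sense of scale. The short sketch below uses only the values listed in the table above; the variable names are arbitrary.

```python
# Values copied from the repeat table: (class, copies, genome fraction in %).
repeat_classes = [
    ("LINEs", 868_000, 20.4),
    ("SINEs", 1_558_000, 13.1),
    ("LTR elements", 443_000, 8.3),
    ("Transposons", 294_000, 2.8),
]

total_copies = sum(copies for _, copies, _ in repeat_classes)
total_fraction = round(sum(fraction for _, _, fraction in repeat_classes), 1)
print(f"{total_copies:,} copies covering {total_fraction}% of the genome")
```

The four classes together come to 3,163,000 copies and 44.6% of the genome, so the listed repeat classes account for a little under half of the human sequence.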

Retrotransposition via RNA

Repeats: short tandems, satellites
DNA-based transposons do not involve an RNA intermediate, and are quite rare.
Satellite DNA duplicates by replication slippage, which is enhanced for specific sequences. Abundant near telomeres and centromeres. Some of these are still a mystery.
Retrotransposition is generally sloppy and noisy, so elements die out quickly.
Element proliferation appears in evolutionary bursts.
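Short tandem repeats of the kind discussed above can be located with a back-referencing regular expression. The sketch below is a simple detector for exact repeats only; it does not model replication slippage and it misses imperfect copies. The function name, the unit-length and copy-number limits, and the toy sequence are hypothetical.

```python
import re

def find_tandem_repeats(seq, max_unit=6, min_copies=4):
    """Return (start, unit, copies) for exact tandem repeats in a DNA string."""
    # A lazily matched unit of 1..max_unit bases followed by at least
    # (min_copies - 1) further exact copies of that same unit.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq)
    ]

# Toy sequence containing six tandem copies of the dinucleotide CA.
seq = "TTGG" + "CA" * 6 + "TTAGC"
repeats = find_tandem_repeats(seq)
```

Trying the shortest unit first means a run such as CACACACACACA is reported with the unit CA rather than CACA.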

Pseudogenes
Genes that are becoming inactive due to mutations are called pseudogenes
mRNAs that jump back into the genome are called processed pseudogenes (they therefore lack introns)

Summary
History/Phylogeny:
- Early phylogenetic relationships can be inferred using genome sequences, but conclusions are not always reliable
- Maximum likelihood models sometimes depend on the gene/genomic region analyzed; the genome is highly heterogeneous at all levels
- The major clades, phylogeny of model organisms and sequenced genomes
Genome structure:
- Size and its consequences
- Packaging and nuclear organization
- Mutational effects and differences
- Selfish DNA
Genome information:
- Protein coding genes
- RNA genes
- Transcription factor binding sites
- Chromosomal organization and DNA codes that affect it