Genome Evolution. Amos Tanay 2009 Genome evolution Lecture 10: Comparative genomics, non coding sequences.



Why larger genomes?
Amoeba dubia: 670Gb! S. cerevisiae is 0.3% of the human genome size, D. melanogaster is 3%.
Selfish DNA
– Larger genomes result from the proliferation of selfish DNA
– Proliferation stops only when it becomes too deleterious
Bulk DNA
– Genome content is a consequence of natural selection
– A larger genome is needed to allow a larger cell size, a larger nuclear membrane, etc.

Why smaller genomes?
Metabolic cost: maybe cells lose excess DNA for energetic efficiency
– But DNA is only 2-5% of the dry mass
– No correlation between genome size and replication time in prokaryotes
– Replication is much faster than transcription (10-20 times in E. coli)

Mutational balance
Balance between deletions and insertions
– May differ between species
– Different balances may have evolved
In flies and yeast (laboratory evolution)
– 4-fold more 4kb spontaneous insertions
In mammals
– More small deletions than insertions

Mutational hazard
No loss of function for inert DNA
– But is it truly not functional?
Gain-of-function mutations are still possible:
– Transcription
– Regulation
Differences in population size may make DNA purging more effective in prokaryotes and small eukaryotes
Differences in regulatory sophistication may make the mutational hazard of DNA less of a problem for metazoans

Repeats: selfish DNA

Repetitive elements in the human genome

Class          Copies                          Genome fraction
LINEs          868,000 (only ~100 active!!)    20.4%
SINEs          1,558,000 (70% Alu)             13.1%
LTR elements   443,000                         8.3%
Transposons    294,000                         2.8%

LINEs, SINEs and LTR elements spread by retrotransposition via RNA.

Bursts of repeat activity (Han et al. 2005)

Age of repeats in the human genome

DNA and gene distribution in the isochore families of the human genome (Bernardi G. PNAS 2007;104:)
These trends are quite clear, but the existence of distinct isochore classes can be questioned.

The selection hypotheses on the origin of G+C content heterogeneity (Bernardi G. PNAS 2007;104:)

Genomic information: Protein coding genes

Genome information: RNA genes
mRNA – messenger RNA. Mature gene transcripts after introns have been processed out of the mRNA precursor.
miRNA – micro-RNA. Short RNAs processed from transcribed "hairpin" precursor RNAs. Regulate gene expression by binding nearly perfect matches in the 3' UTR of transcripts.
siRNA – small interfering RNA. Short RNAs processed from double-stranded RNA by the RNAi machinery. Used for posttranscriptional silencing.
rRNA – ribosomal RNA. Part of the ribosome machine (with proteins).
snRNA – small nuclear RNA. Heterogeneous set with function confined to the nucleus, including RNAs involved in the spliceosome machinery.
snoRNA – small nucleolar RNA. Involved in the chemical modifications made in the construction of ribosomes. Often encoded within the introns of ribosomal protein genes.
tRNA – transfer RNA. Delivers amino acids to the ribosome.
piRNA – silences repeats in the germline.

Gene content in the genome (M. Lynch)

Genome information: Introns/Exons

Pseudogenes (M. Lynch)
Genes that are becoming inactive due to mutations are called pseudogenes.
mRNAs that are reverse-transcribed and jump back into the genome are called processed pseudogenes (they therefore lack introns).

Adaptive evolution of non-coding DNA in Drosophila (P. Andolfatto, 2005)
– 12 D. melanogaster collected in Zimbabwe
– 188 regions of ~800bp, surveyed for polymorphisms
– Compared to sequences of D. simulans to measure divergence
– Loci classified according to genomic context

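Estimating $\theta$
Theorem: Let $u$ be the mutation rate for a locus under consideration, and set $\theta = 4Nu$. Under the infinite sites model, the expected number of segregating sites $S_n$ in a sample of $n$ sequences is:
$$E[S_n] = \theta \sum_{i=1}^{n-1} \frac{1}{i}$$
The Watterson estimator for $\theta$ is:
$$\hat{\theta}_W = \frac{S_n}{\sum_{i=1}^{n-1} \frac{1}{i}}$$
Definition: Let $\Delta_{ij}$ count the number of differences between sequences $i$ and $j$. The average number of pairwise differences in a sample of $n$ individuals is:
$$\Delta_n = \binom{n}{2}^{-1} \sum_{i<j} \Delta_{ij}$$
Theorem: as always, $\theta = 4Nu$. We have:
$$E[\Delta_n] = \theta$$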
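The two estimators of θ on this slide can be computed directly from a small alignment. The following is a minimal sketch, assuming aligned sequences of equal length with no gaps and an infinite sites model (so every variable column counts as one segregating site). The function names and the toy sample are illustrative and not taken from the lecture.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that show more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson estimator: S_n divided by the harmonic sum 1 + 1/2 + ... + 1/(n-1)."""
    n = len(seqs)
    harmonic = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / harmonic


def mean_pairwise_differences(seqs):
    """Average number of differences over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


# Toy sample of n = 4 sequences (made up for illustration).
sample = ["AAGTC", "AAGTT", "ACGTC", "AAGTC"]
s_n = segregating_sites(sample)
theta_w = watterson_theta(sample)
delta_n = mean_pairwise_differences(sample)
```

Both `theta_w` and `delta_n` estimate the same quantity under neutrality, which is what makes their difference informative.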

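Tajima's D
Theorem: as always, $\theta = 4Nu$. For the average number of pairwise differences $\Delta_n$ we have:
$$E[\Delta_n] = \theta$$
Proof: Going backwards in time, two lineages coalesce before either of them mutates with probability:
$$\frac{1/(2N)}{1/(2N) + 2u} = \frac{1}{1+\theta}$$
After one mutation has occurred, we again face the same probabilities, so overall the number of differences $\Delta_{ij}$ between two sequences is distributed as:
$$P(\Delta_{ij} = k) = \left(\frac{\theta}{1+\theta}\right)^{k} \frac{1}{1+\theta}$$
The expected value of this geometric distribution is $\theta$, and so is the average over all pairs.
Definition: Tajima's D is the difference between two estimators of $\theta$, with $S_n$ the number of segregating sites, normalized by its estimated standard deviation:
$$d = \Delta_n - \frac{S_n}{\sum_{i=1}^{n-1} \frac{1}{i}}, \qquad D = \frac{d}{\sqrt{\widehat{\mathrm{Var}}(d)}}$$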
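The claim that the geometric distribution in the proof has mean θ can be checked numerically. The sketch below assumes the distribution P(k) = (θ/(1+θ))^k · 1/(1+θ) for the number of differences between two sequences and truncates the infinite sum at a large `k_max`. Both function names are illustrative.

```python
def pairwise_difference_pmf(k, theta):
    """Probability of exactly k mutations before two lineages coalesce."""
    p_coalesce = 1.0 / (1.0 + theta)
    return (1.0 - p_coalesce) ** k * p_coalesce


def expected_differences(theta, k_max=10_000):
    """Truncated expectation of the geometric distribution above."""
    return sum(k * pairwise_difference_pmf(k, theta) for k in range(k_max))


mean_for_theta_2 = expected_differences(2.0)
mean_for_theta_half = expected_differences(0.5)
```

For moderate θ the truncated tail is negligible, so the sums agree with θ to within floating-point error.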

Tajima's D for classes of Drosophila sequence
Definition: Tajima's D is the difference between two estimators of $\theta$: the average number of pairwise differences and the Watterson estimator.
High D values: allele multiplicities are spread more evenly than expected (why?)
Low D values: more rare alleles are present (why?)
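To make the sign of D concrete, here is a minimal sketch of the statistic using the standard normalization constants from Tajima (1989). The slides only state the difference of the two estimators, so the variance term is an addition here. The function name and its argument names are illustrative.

```python
import math


def tajimas_d(n, num_segregating, mean_pairwise_diff):
    """Tajima's D from sample size, segregating sites S and mean pairwise differences."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = num_segregating
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        raise ValueError("D is undefined for this sample (no usable variation)")
    # Numerator: pairwise-difference estimator minus Watterson estimator.
    return (mean_pairwise_diff - s / a1) / math.sqrt(variance)


# Excess of rare alleles lowers the pairwise estimator, giving D < 0.
d_rare = tajimas_d(4, 2, 1.0)
# Evenly spread alleles raise the pairwise estimator, giving D > 0.
d_even = tajimas_d(4, 2, 1.5)
```

Rare alleles add segregating sites but contribute little to pairwise differences, which is why they push D negative. Intermediate-frequency alleles do the opposite.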

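Adaptive evolution of non-coding DNA in Drosophila (P. Andolfatto)
The proportion of divergence driven by positive selection:
$$\alpha = 1 - \frac{D_S P_X}{D_X P_S}$$
Here $D$ counts divergent sites and $P$ polymorphic sites; $S$ denotes synonymous sites (the neutral reference) and $X$ the sequence class being tested.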
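As a worked instance of the formula for α, the sketch below plugs in counts of divergent (D) and polymorphic (P) sites for a neutral reference class S and a tested class X. The counts are made up for illustration and are not data from the study.

```python
def alpha(d_s, d_x, p_s, p_x):
    """Proportion of divergence in class X attributed to positive selection."""
    return 1.0 - (d_s * p_x) / (d_x * p_s)


# Hypothetical counts: the tested class shows more divergence relative to
# polymorphism than the neutral reference does.
example_alpha = alpha(d_s=40, d_x=60, p_s=20, p_x=15)
```

With these numbers the neutral expectation for divergence in X is 40 · 15 / 20 = 30 sites, so half of the 60 observed divergent sites are attributed to positive selection.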

Phastcons (A. Siepel)
Siepel A. et al. Genome Res. 2005;15:
– Each model is context-less
– Transition parameters are kept fixed; this determines the fraction of conserved sequence
– Inference on the phylo-HMM yields posteriors for the conserved model
– Use a threshold to detect contiguous regions of high conservation posterior
– Learning the branch lengths
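The inference and thresholding steps listed above can be sketched with a two-state HMM. This is a heavily simplified stand-in, not the phastCons implementation: a real phylo-HMM emits whole alignment columns with likelihoods computed on a phylogenetic tree, whereas here each column is reduced to a single boolean (identical across species or not) with assumed emission probabilities. All names and parameter values are hypothetical.

```python
def conserved_posteriors(identical, mu=0.1, nu=0.05,
                         p_ident_cons=0.9, p_ident_neut=0.4):
    """Posterior probability of the conserved state per column (forward-backward).

    State 0 is conserved, state 1 is non-conserved. mu is the fixed
    conserved -> non-conserved transition probability, nu the reverse.
    """
    trans = [[1 - mu, mu], [nu, 1 - nu]]
    # Stationary distribution; nu / (mu + nu) is the expected conserved fraction.
    start = [nu / (mu + nu), mu / (mu + nu)]

    def emit(state, obs):
        p = p_ident_cons if state == 0 else p_ident_neut
        return p if obs else 1 - p

    n = len(identical)
    fwd = []
    for t, obs in enumerate(identical):
        if t == 0:
            row = [start[s] * emit(s, obs) for s in (0, 1)]
        else:
            row = [emit(s, obs) * sum(fwd[-1][r] * trans[r][s] for r in (0, 1))
                   for s in (0, 1)]
        z = sum(row)
        fwd.append([x / z for x in row])  # rescale to avoid underflow

    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        row = [sum(trans[s][r] * emit(r, identical[t + 1]) * bwd[t + 1][r]
                   for r in (0, 1)) for s in (0, 1)]
        z = sum(row)
        bwd[t] = [x / z for x in row]

    return [f[0] * b[0] / (f[0] * b[0] + f[1] * b[1]) for f, b in zip(fwd, bwd)]


def conserved_regions(posteriors, threshold=0.5):
    """Half-open (start, end) runs of columns with posterior >= threshold."""
    regions, start = [], None
    for i, p in enumerate(posteriors):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(posteriors)))
    return regions


columns = [False, False, True, True, True, True, True, True, False, False]
posteriors = conserved_posteriors(columns)
```

Keeping `mu` and `nu` fixed pins the prior fraction of conserved sequence at nu / (mu + nu), which mirrors the role the slide gives to the transition parameters.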

Phastcons parameters (Siepel A. et al. Genome Res. 2005;15:)

Fixation probabilities and population size: what selection coefficient can drive a 70% decrease in substitution rate (if N_e = 10,000)?
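One way to approach this question numerically is with Kimura's diffusion approximation for the fixation probability of a new mutation. The sketch below assumes a diploid population with census size equal to N_e, additive selection, and a new mutation starting at frequency 1/(2N). It then finds by bisection the selection coefficient at which the substitution rate drops to 30% of the neutral rate. The function names are illustrative.

```python
import math


def relative_substitution_rate(s, n_e):
    """Substitution rate relative to neutral: 2N * P_fix(s), with P_fix from Kimura."""
    if s == 0:
        return 1.0
    # P_fix = (1 - exp(-2s)) / (1 - exp(-4 N s)); expm1 keeps small s accurate.
    return 2.0 * n_e * math.expm1(-2.0 * s) / math.expm1(-4.0 * n_e * s)


def selection_for_rate(target, n_e, lo=-1e-3, hi=-1e-9, iterations=200):
    """Bisection for the negative s giving the target relative rate.

    Assumes the rate increases monotonically with s on [lo, hi].
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if relative_substitution_rate(mid, n_e) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# A 70% decrease means the rate is 30% of neutral.
s_needed = selection_for_rate(0.3, 10_000)
```

Under these assumptions the answer is a very weak coefficient, on the order of 5 × 10⁻⁵ against the mutation, corresponding to 4·N_e·s of roughly −2.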

ENCODE

Ultra-conserved elements
Segments longer than 200bp that are absolutely conserved between human, mouse and rat (Bejerano et al. 2004)
What are these elements doing? Why are they completely conserved?
4 knockouts do not reveal significant phenotypes (Ahituv et al. PLoS Biol. 2007)

Ultra-conserved elements
Population genetics does suggest ultraconserved elements are under selection (Katzman et al., Science 2007)
Separating mutational effects from selective effects is still a challenge…