Presenting: On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. AKA: why the ENCODE project is full of it. By Matthew Oberhardt

What is ENCODE?
An attempt to find all functional elements of the human genome, run by a huge international consortium, 10 years running.
The exome is 1.5% of human DNA. How much of the rest is garbage, versus being useful ‘junk’ or fully functional?
The pilot phase ended in 2007. The production phase ran 2007-2012 (with first major results published in 2012) and was funded by $80 million in grants over 4 years.
It attempts to answer questions like: why are 88% of disease-associated SNPs in non-coding DNA regions?

What did ENCODE do?
Mapped: RNA transcribed regions, protein-coding regions, transcription factor binding sites, chromatin structure, and DNA methylation sites.
Performed assays on all of these biological areas in “tier 1,” “tier 2,” and “tier 3” cells (different standard cell types).
Provided 1640 ‘datasets’ designed to annotate functional elements in the human genome.

ENCODE datatypes:

Major findings:
(1) 80.4% of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type (i.e., is ‘functional’ according to ENCODE).
(2) Primate-specific elements are in general negatively selected (fig 1).
(3) Chromatin states were classified into groups with different promoter functionalities, and RNA sequence production and processing were correlated to these chromatin states (showing that “most” variation in RNA expression can be explained by chromatin states).
(4) Most disease-related SNPs lie outside of coding regions (found, or just repeated known information?).

But: there are some problems with ENCODE...

On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE “Unless a genomic functionality is actively protected by selection, it will... cease to be functional. The absurd alternative, which unfortunately was adopted by ENCODE, is to assume that no deleterious mutations can ever occur in the regions they have deemed to be functional. Such an assumption is akin to claiming that a television set left on and unattended will still be in working condition after a million years because no natural events... can affect it.”

But let’s back up...

Major criticisms of ENCODE:
(1) using the ‘causal role’ definition of biological function
(2) committing the logical fallacy of ‘affirming the consequent’
(3) using analytical estimates that yield biased errors and inflate functionality estimates
(4) favoring statistical sensitivity over specificity
(5) emphasizing statistical significance rather than the magnitude of an effect

Criticism 1: using the ‘causal role’ definition of biological function
Two biological concepts of function:
(1) The ‘causal role’ definition: a functional element is a genome segment producing a protein or an RNA, or displaying a reproducible biochemical signature (e.g., protein binding).
(2) The ‘selected effect’ definition: for a trait, T, to have a biological function F, (a) T must originate as a ‘reproduction’ of some prior trait that performed F (or some similar function) in the past, and (b) T must exist because of F.
Example: a sequence similar to TATAAA can easily arise by chance, and will certainly bind transcription factors (being similar to the TATA box). It is therefore functional in the ‘causal role’ sense but not in the ‘selected effect’ sense. Similarly, the human heart has the ‘causal role’ of producing sounds, but its selected effect is pumping blood...

Criticism 1: using the ‘causal role’ definition of biological function Bottom line: If a sequence doesn’t show signs of selection, it cannot be functional in the ‘selected effect’ manner, which is the only one that really counts. (this is a very strong statement...)

Criticism 1: using the ‘causal role’ definition of biological function
How, then, to detect selection? There can be positive selection, purifying selection, or recently evolved species-specific elements, and some of these can be subtle and hard to detect.
SO: it is likely that more than 9% of the human genome (the current estimate) is functional.
BUT: 80% is too high. Comparative genomics suggests that <15% of the genome is under evolutionary selection, so the percentage of functional elements should be below that...
The “ENCODE Incongruity” is the claim that a biological function can be maintained without selection.
AND: just because it’s hard to detect selection, you shouldn’t discard it.

Criticism 1: using the ‘causal role’ definition of biological function
Why single out transcription as a function? You could also say ‘acted on by DNA polymerase’ is a function, in which case 100% of the genome is functional!

ENCODE also uses this wrong definition of functionality wrongly...

Criticism 2: committing the logical fallacy of ‘affirming the consequent’
The fallacy: 1. If P then Q. 2. Q. 3. Therefore, P.
Example: a random sequence binds a transcription factor; this does not necessarily result in transcription. However, the ‘binding’ property would be enough for ENCODE.
In ENCODE, a DNA segment is ascribed ‘functionality’ if it: (1) is transcribed, (2) is associated with a modified histone, (3) is located in an open chromatin area, (4) binds a transcription factor, or (5) contains a methylated CpG dinucleotide.
All of these are examples of affirming the consequent...
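
To make the fallacy concrete, here is a minimal Python sketch that checks an argument form by brute force over every truth assignment. The helper names (`implies`, `is_valid`) are made up for this illustration and are not from the talk or from ENCODE.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: 'if p then q'."""
    return (not p) or q

def is_valid(premises, conclusion) -> bool:
    """An argument form is valid if no truth assignment makes every
    premise true while the conclusion is false."""
    return all(
        conclusion(p, q)
        for p, q in product([True, False], repeat=2)
        if all(premise(p, q) for premise in premises)
    )

# Affirming the consequent: (if P then Q), Q, therefore P.
affirming_consequent = is_valid(
    premises=[implies, lambda p, q: q],
    conclusion=lambda p, q: p,
)

# Modus ponens, for contrast: (if P then Q), P, therefore Q.
modus_ponens = is_valid(
    premises=[implies, lambda p, q: p],
    conclusion=lambda p, q: q,
)

print(affirming_consequent)  # False
print(modus_ponens)          # True
```

The counterexample the check finds is P false, Q true: a segment can bind a transcription factor (Q) without being functional (P).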

And continuing on this theme: Criticism 3: using analytical estimates that yield biased errors and inflate functionality estimates

According to ENCODE, all of the below are (wrongly) considered functional:
(1) The 74.7% of the genome that is transcribed, ALL OF WHICH IS CONSIDERED FUNCTIONAL. ENCODE used stem cells and cancer cells, both very transcriptionally active. What about pseudogenes, introns, and mobile elements (non-functional)? Also, RNA transcripts were mapped to DNA using a tool with a 10% rejection rate.
(2) The 56.1% that is associated with modified histones. A recent study showed that 2% of histone modifications affect function, yet ENCODE assigned functions to all histone modifications it analyzed.
(3) The 15.2% that is found in open chromatin areas. ENCODE claims most open chromatin regions are functional transcription start sites. In fact, only 30% of open regions are even in the neighborhood of start sites.
(4) The 8.5% that binds transcription factors. Transcription factor binding sites are short, so many can occur by chance; a better estimate is 0.28%, taking selection into account. Mean lengths of ENCODE ‘transcription factor binding sites’ are 824, 457, and 535 nucleotides, while most binding sites are 6-14 bp!
(5) The 4.6% that is methylated CpG dinucleotides. ENCODE claims that 96% of CpG sites are methylated: not a sign of function, but merely that all CpG sites can be methylated!
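
The point that short binding sites can occur by chance can be sketched with a back-of-the-envelope calculation. This is a rough illustration only: it assumes a random sequence with equal base frequencies, a round genome length of 3 billion bp, and exact matches on a single strand, none of which comes from the talk or from ENCODE.

```python
GENOME_LENGTH_BP = 3_000_000_000  # assumed round figure, not an ENCODE number

def expected_chance_matches(motif_length: int,
                            genome_length: int = GENOME_LENGTH_BP) -> float:
    """Expected number of exact matches of one specific motif on one strand
    of a random sequence in which each base has probability 1/4."""
    positions = genome_length - motif_length + 1
    return positions * 0.25 ** motif_length

# Binding-site lengths spanning the 6-14 bp range mentioned above.
for k in (6, 10, 14):
    print(f"{k:2d} bp motif: ~{expected_chance_matches(k):,.0f} chance matches")
```

Under these assumptions a specific 6 bp motif is expected several hundred thousand times by chance alone, while a 14 bp motif is expected only about a dozen times, so short motifs matching somewhere in the genome say little about function.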

Evidence for purifying selection in ENCODE
And the errors...:
Instead of using all SNPs, ENCODE used only the 1.3 million primate-specific ones of >=200bp. ***By doing this, they removed everything that is of interest functionally!!!
Then, more processing left 82% of segments smaller than 100bp, with a median of 15bp, so inferences were made in part using ~85,000 alignment blocks of 1bp and ~76,000 of 2bp...
There were other problems with the controls (they were longer, etc.).
But in the end, the ENCODE-containing samples had a frequency 0.20% lower than control (hence negative selection!!). The p-value was strong (4e-37) because there were so many data points. IS THIS BIOLOGICALLY MEANINGFUL???
(The statistical test also probably didn’t take into account dependence of variables, and other possible causes of the 0.20% difference are laid out.)
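
Why a tiny difference can come with a tiny p-value: a minimal sketch using a standard two-proportion z-test. The frequencies and sample sizes below are made up for illustration; they are not ENCODE's data, and this is not necessarily the test ENCODE used.

```python
import math

def two_proportion_z_test(p1: float, p2: float, n1: int, n2: int):
    """Two-sided z-test for a difference between two proportions.
    Returns (z statistic, p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical frequencies differing by 0.2 percentage points.
P_CONTROL, P_ELEMENT = 0.150, 0.148

for n in (1_000, 10_000_000):
    z, p_value = two_proportion_z_test(P_CONTROL, P_ELEMENT, n, n)
    print(f"n per group = {n:>10,}  z = {z:6.2f}  p = {p_value:.2e}")
```

The effect size is identical in both runs; only the sample size changes. With a thousand observations per group the difference is nowhere near significant, while with ten million it is overwhelmingly so, which is why the p-value alone cannot answer whether the effect is biologically meaningful.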

Evidence for purifying selection in ENCODE (CODING). [Figure: derived allele frequency for primate-specific elements. This is the evidence for negative selection.]

Criticism 4: favoring statistical sensitivity over specificity (Just covered as well...)
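
A rough sketch of what favoring sensitivity over specificity does to a set of ‘functional’ calls. All three input numbers are hypothetical, chosen only to illustrate the trade-off; none is an estimate from ENCODE or from the talk.

```python
def flagged_fraction(prevalence: float, sensitivity: float,
                     specificity: float) -> float:
    """Fraction of all sites that the assay calls 'functional'."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos + false_pos

def positive_predictive_value(prevalence: float, sensitivity: float,
                              specificity: float) -> float:
    """Fraction of 'functional' calls that are truly functional."""
    true_pos = prevalence * sensitivity
    return true_pos / flagged_fraction(prevalence, sensitivity, specificity)

# Hypothetical inputs: 10% of sites truly functional, a very sensitive
# assay (99%) with poor specificity (30%).
PREVALENCE, SENSITIVITY, SPECIFICITY = 0.10, 0.99, 0.30

flagged = flagged_fraction(PREVALENCE, SENSITIVITY, SPECIFICITY)
ppv = positive_predictive_value(PREVALENCE, SENSITIVITY, SPECIFICITY)
print(f"called functional: {flagged:.1%}, of which truly functional: {ppv:.1%}")
```

With these made-up inputs the assay calls most of the sites functional, yet only a small minority of those calls are correct: almost nothing real is missed, at the cost of a flood of false positives.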

Criticism 5: emphasizing statistical significance rather than the magnitude of an effect

Junk DNA
ENCODE would have us think that “Junk DNA is Dead”. A few distinctions:
(1) Having a potential future function does NOT mean that a DNA segment is functional (hence ‘junk’, not ‘garbage’).
(2) Evolution will drive towards a mostly functional genome only if genome size is a significant negative selector and the population size is huge. In humans neither is true (in bacteria they are), hence we expect a lot of junk.
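
Why population size matters for purging junk can be sketched with Kimura's diffusion approximation for the fixation probability of a new mutation. The selection coefficient and the two population sizes below are illustrative assumptions (a slightly deleterious insertion, a small ‘human-like’ and a large ‘bacteria-like’ effective population), not values from the talk, and the sketch assumes a diploid population whose effective size equals its census size.

```python
import math

def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation for a single new mutation with selection
    coefficient s in a diploid population of size n (Ne assumed equal to n)."""
    if s == 0:
        return 1 / (2 * n)  # neutral case: pure drift
    # (1 - e^(-2s)) / (1 - e^(-4ns)), written with expm1 for accuracy
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)

def relative_to_neutral(s: float, n: int) -> float:
    """Fixation probability as a multiple of the neutral expectation."""
    return fixation_probability(s, n) * 2 * n

S_SLIGHTLY_DELETERIOUS = -1e-5  # hypothetical cost of a bit of extra DNA

for label, n in (("small population", 10_000), ("large population", 1_000_000)):
    ratio = relative_to_neutral(S_SLIGHTLY_DELETERIOUS, n)
    print(f"{label:16s} N = {n:>9,}  fixation vs neutral = {ratio:.2e}")
```

Under these assumptions the slightly deleterious mutation fixes almost as often as a neutral one in the small population, but essentially never in the large one. That is the sense in which selection against junk only works when populations are huge.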

Big vs. Small science
What is the function of ‘big science’? To generate massive amounts of reliable and easily accessible data.
BUT: wisdom is best gained from small science...

Take-home messages:
(1) Selection is a *must* in ascribing a function to a gene (is this strictly true?).
(2) Don’t affirm the consequent.
(3) Don’t believe everything you read, even in prestigious journals...

Resistance is growing, as are multiply resistant strains.
There is a reverse incentive for drug companies to produce antibiotics, especially narrow-spectrum ones:
Drugs today are very safe, which is a high hurdle! Penicillin wouldn’t have passed current standards!
Current antibiotics are off-patent and thus cheap, so doctors don’t want to use expensive new antibiotics.
Infections usually present with vague symptoms, so broad-spectrum antibiotics are the best bet.
Antibiotics actually cure disease after a short run: not so good for $$.
Closing pipelines mean the intellectual base is scattering. We can’t just turn on the tap again!!