1 A high-resolution map of human evolutionary constraint using 29 mammals
Kerstin Lindblad-Toh, Manuel Garber, Or Zuk, Michael F. Lin, Brian J. Parker, Stefan Washietl, Pouya Kheradpour, Jason Ernst, Gregory Jordan, Evan Mauceli, Lucas D. Ward, Craig B. Lowe, Alisha K. Holloway, Michele Clamp, Sante Gnerre, Jessica Alföldi, Kathryn Beal, Jean Chang, Hiram Clawson, James Cuff, Federica Di Palma, Stephen Fitzgerald, Paul Flicek, Mitchell Guttman, Melissa J. Hubisz, David B. Jaffe, Irwin Jungreis, W. James Kent, Dennis Kostka, Marcia Lara, Andre L. Martins, Tim Massingham, Ida Moltke, Brian J. Raney, Matthew D. Rasmussen, Jim Robinson, Alexander Stark, Albert J. Vilella, Jiayu Wen, Xiaohui Xie, Michael C. Zody, Broad Institute Sequencing Platform and Whole Genome Assembly Team, Kim C. Worley, Christie L. Kovar, Donna M. Muzny, Richard A. Gibbs, Baylor College of Medicine Human Genome Sequencing Center Sequencing Team, Wesley C. Warren, Elaine R. Mardis, George M. Weinstock, Richard K. Wilson, Genome Institute at Washington University, Ewan Birney, Elliott H. Margulies, Javier Herrero, Eric D. Green, David Haussler, Adam Siepel, Nick Goldman, Katherine S. Pollard, Jakob S. Pedersen, Eric S. Lander & Manolis Kellis
Presentation by: Tu Nguyen & Yazmin Rodriguez

2 Goal with human genome
Discover and interpret all functional elements within the human genome, for studies in human biology, health and disease.
Functional elements are found in exons, introns and intergenic regions, and have protein-coding, RNA, regulatory and chromatin roles.
HMRD: a comparative analysis of the human, mouse, rat and dog genome sequences.
The HMRD comparison revealed similarities between the genomes. It showed that at least 5% of the human genome is under purifying selection and therefore most likely functional, consisting largely of non-coding elements with regulatory roles. In other words, evolution selected for these sequences.

3 Coverage, branch length and false discovery rate (FDR)
Coverage: the average depth of sequencing over a nucleotide, i.e. how many times that position was sequenced.
coverage = (number of reads * read length) / length of genome
Why is coverage important? The higher the coverage, the more accurate the sequencing, and the easier it is to determine whether a deviation in the sequence is a sequencing error or a SNP.
Branch length: measured in nucleotide substitutions per site, so a branch length of 1 equals one substitution per site. The tree in this paper reports substitutions per 100 bp.
False discovery rate (FDR): FDR procedures are designed to control the expected proportion of incorrectly rejected null hypotheses ('false discoveries'). For example, an FDR of 10% means that at most 10% of the discoveries you made are expected to be false positives.
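
As a minimal sketch of the coverage formula above, the Python helper below computes average depth from read count, read length and genome length. A second helper converts an FDR into the expected number of false positives among a set of discoveries. The function names and the example numbers are illustrative assumptions, not values from the paper.

```python
def coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average depth: (number of reads * read length) / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


def expected_false_discoveries(num_discoveries: int, fdr: float) -> float:
    """Expected number of false positives if the FDR bound is met exactly."""
    if not 0.0 <= fdr <= 1.0:
        raise ValueError("fdr must be between 0 and 1")
    return num_discoveries * fdr


if __name__ == "__main__":
    # Hypothetical run: 1,000,000 reads of 100 bp over a 50 Mb genome.
    print(coverage(1_000_000, 100, 50_000_000))
    # Hypothetical result set: 1,000 discoveries called at a 10% FDR.
    print(expected_false_discoveries(1_000, 0.10))
```

With these made-up inputs the genome is covered twice over on average, which is the same order as the low-coverage drafts discussed in the talk.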

4 Why is this important according to the authors?
Genetic constraints: portions of DNA that have remained the same over time, suggesting that evolution has preserved them. They are found in exons, introns and intergenic elements.
Why is this important according to the authors? Sequences that are conserved through evolution must be important enough to have been selected for. Even something as simple and effective as DNA polymerase is prone to mutation, yet it has remained more or less unchanged. There must therefore be selection acting on DNA polymerase: species that had mutations in it were not as 'fit' as the others.
Why is this important in terms of this article? The constrained sequences were not only exons, which go on to code for proteins. They also included introns (spliced out of the transcript during RNA processing) and intergenic regions (regions between genes, often referred to as 'junk DNA'). For these regions to be conserved and selected for, they must have an important function, whether protein-coding or regulatory.
SCE (synonymous constraint elements): short stretches within protein-coding ORFs that also encode additional, overlapping functional elements. They show a reduced apparent rate of synonymous substitutions in cross-species alignments.
Yaz's notes: The initial mammalian comparisons only estimated the overall proportion of the genome under evolutionary constraint (sequence that has not changed over the years, suggesting evolution has selected for it) and gave a general view of where it is located. They could not detect everything, especially the smaller constrained elements. The authors wanted to identify the constrained elements and interpret their function.

5 Process - Sequencing, assembly, alignment
29 mammalian genomes were shotgun sequenced: 20 at ~2-fold coverage and the rest at ~7-fold coverage, in order to maximize the number of species sequenced.
MultiZ was used to combine many local alignments into a multiple local alignment.
The largest fraction of constraint was found in exons, introns and intergenic regions. About 40% of what they found was intronic.
Unique counts were made for 5' UTRs, 3' UTRs, promoters, pseudogenes, non-coding RNAs, introns and intergenic regions.
The power to detect constrained elements depends on the total branch length of the phylogenetic tree connecting the species.
Image: organisms with finished genome sequences are in blue, high-quality drafts (7x coverage) are in green and drafts (2x coverage) are in black. Red branches indicate more than 10 substitutions per 100 bp, while blue branches indicate fewer than 10 substitutions per 100 bp.
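
To illustrate why detection power grows with total branch length, here is a small sketch under a simplifying assumption that is ours, not the slides': substitutions at each neutral site follow a Poisson process and sites evolve independently. Under that model, the chance that a neutral stretch stays unchanged across the whole tree shrinks exponentially with both its length and the total branch length. The branch lengths used in the example are hypothetical.

```python
import math


def chance_conservation_probability(length_bp: int, total_branch_length: float) -> float:
    """Probability that `length_bp` neutral sites all stay unchanged across a tree.

    Assumes a Poisson substitution process with mean `total_branch_length`
    (substitutions per site, summed over all branches) and independent sites.
    """
    if length_bp < 0 or total_branch_length < 0:
        raise ValueError("length_bp and total_branch_length must be non-negative")
    return math.exp(-length_bp * total_branch_length)


if __name__ == "__main__":
    # Hypothetical trees: a short one (few species) and a long one (many species).
    for tree_length in (0.5, 4.0):
        p = chance_conservation_probability(12, tree_length)
        print(f"total branch length {tree_length}: P(12 bp unchanged by chance) = {p:.3g}")
```

The longer the tree, the less likely it is that a short conserved element is conserved by chance, so smaller elements can be called with confidence.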

6 Image: at 10% FDR, 3.6 million constrained elements can be detected, encompassing 4.2% of the genome. The area shaded in blue is the fraction of bases newly detected with the 29 mammals, compared with the union of the HMRD 50-bp and Siepel vertebrate elements. The largest fraction of constraint can be seen in coding exons, introns and intergenic regions.
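
The two figures on this slide (3.6 million elements, 4.2% of the genome) imply a typical element size. The sketch below works it out; the genome length of about 3.1 billion bp is an assumption added here for illustration and does not come from the slides.

```python
from typing import Tuple

# Assumption for illustration only: approximate human genome length in bp.
ASSUMED_GENOME_LENGTH_BP = 3_100_000_000


def constrained_summary(
    genome_length_bp: int, constrained_fraction: float, num_elements: int
) -> Tuple[float, float]:
    """Return (constrained bases, mean element length in bp)."""
    if num_elements <= 0:
        raise ValueError("num_elements must be positive")
    constrained_bp = genome_length_bp * constrained_fraction
    return constrained_bp, constrained_bp / num_elements


if __name__ == "__main__":
    total_bp, mean_len = constrained_summary(ASSUMED_GENOME_LENGTH_BP, 0.042, 3_600_000)
    print(f"constrained bases: {total_bp:.3g}")
    print(f"mean element length: {mean_len:.1f} bp")
```

Under that assumed genome length, the average constrained element is a few tens of base pairs long, which is why fine resolution matters for finding them.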

7 AR - neutrally evolving repeats
HMRD - human, mouse, rat, dog
Masked genomic - the whole genome
The number of aligned species increases with the functional importance of each feature, suggesting that detection power is highest over functional elements.

8 Process - Detection of constrained sequence
4.2% of the human genome (3.6 million elements) was pinpointed at a resolution of 12 bp, fine enough to detect individual binding sites, such as those for NRSF in a promoter.
SiPhy-π: uses substitution rate and substitution pattern.
Constrained regions show a decrease in SNP count. When polymorphisms do occur in these constrained regions, they tend to match alleles of non-human mammals.

9 The neurological gene NPAS4 has many constrained elements overlapping its introns and the upstream intergenic region. Transcription factor sites on the gene were found overlapping constrained elements in the introns, which, on looking further into them, are known to regulate lineages.

10 Biased nucleotide substitution patterns identify positions where two bases appear equally constrained, and these positions correlate with SNPs in the human population. The example is an intergenic SiPhy-π element (HG18 chr12:1,916,342-1,916,380) detected based on the presence of three 2-fold degenerate constrained bases. Note how these positions (in grey boxes) alternate between two bases across the evolutionary tree. One of the degenerate bases matches a SNP present in several human populations: European CEPH (CEU), Yoruban Africans (YRI) and Japanese (JPT).
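
The idea of a position that alternates between exactly two bases across species can be shown with a toy scan over a multiple alignment. This is only an illustrative sketch, not the SiPhy-π statistic: the function name, the `min_count` parameter and the example alignment are all hypothetical.

```python
from collections import Counter
from typing import List


def two_base_columns(alignment: List[str], min_count: int = 2) -> List[int]:
    """Return 0-based columns where exactly two bases occur,
    each in at least `min_count` sequences. Columns with gaps or
    ambiguous characters are skipped."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for col in range(width):
        counts = Counter(seq[col].upper() for seq in alignment)
        only_bases = set(counts) <= set("ACGT")
        if only_bases and len(counts) == 2 and min(counts.values()) >= min_count:
            hits.append(col)
    return hits


if __name__ == "__main__":
    # Hypothetical 4-species alignment; the third column alternates G/A.
    toy_alignment = ["ACGT", "ACAT", "ACGT", "ACAT"]
    print(two_base_columns(toy_alignment))
```

A real method would weigh each column by the phylogeny instead of simply counting bases, since closely related species do not provide independent evidence.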

11 Process - Functional annotation of constraint
Detection of constrained sequence

12 You don’t need to look at the figure...
The top figure points out that a new protein-coding gene (exon) was predicted using the 29-mammal comparison. It is then supported by two independent multi-exon transcripts predicted by Scripture based on the Illumina HiSeq Body Map 2.
The bottom figure points out that there might be a stop codon readthrough. The region between the first stop codon and the second stop codon is highly conserved. Conservation breaks down shortly after the second stop codon, leading the authors to believe that the second stop codon is the 'true' stop codon. In this case TGA, normally a stop codon, also codes for selenocysteine (Sec), an amino acid that may act in an active site.

13 Evolutionary signatures characteristic of conserved RNA secondary structures were used to reveal 37,381 candidate structural elements covering ~1% of constrained regions. This technique helped predict a new structure for the 3' end of the XIST large intergenic non-coding RNA. This genomic region is constrained throughout the 29 species, leading the authors to believe that it is crucial for forming the stem and the 14 bp loop of the hairpins.
Image: (b) The RNA structure (green) is predicted on the XIST strand (purple) and overlaps short RNAs (blue) observed at high abundance in the chromatin cellular compartment. (d) The human sequences of all six hairpins were aligned using hairpin D as the reference. Insertions relative to D are shown with orange bars and numbers. Fully conserved positions (*) between the human sequences reveal the same loop region motif. (e) Multiple alignment across vertebrates for hairpin D. (f) Secondary structure drawing of the XIST structure with color-coding of substitution evidence.

14 As different types of conservation in promoters may imply distinct biological functions, we classified the patterns of conservation within core promoters into three categories: (1) uniformly 'high' constraint, (2) uniformly 'low' constraint and (3) 'intermittent' constraint, consisting of alternating peaks and troughs of conservation.
High and intermittent constraint are associated with CpG islands, while low-constraint promoters show little overlap with them. All three classes overlap at TATA boxes.
Image: Analysis of promoters for 47,945 transcripts identified three patterns of high (red), intermittent (blue) and low (green) constraint. The genes with intermittent constraint had between 1 and 9 peaks of constraint within the 200 bp core promoter. This means that the promoter region is generally conserved.
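
The three promoter categories can be pictured with a toy classifier over per-base constraint scores. This is a sketch under our own assumptions: the score scale, the 0.5 threshold and the 80%/20% cut-offs are hypothetical and are not the criteria used in the paper.

```python
from typing import List, Tuple


def classify_promoter(
    scores: List[float],
    threshold: float = 0.5,
    high_fraction: float = 0.8,
    low_fraction: float = 0.2,
) -> Tuple[str, int]:
    """Label a promoter 'high', 'low' or 'intermittent' and count its peaks.

    A base is 'constrained' if its score is >= threshold. A peak is a
    maximal run of consecutive constrained bases.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    constrained = [score >= threshold for score in scores]
    fraction = sum(constrained) / len(constrained)
    # A peak starts wherever a constrained base is not preceded by one.
    peaks = sum(
        1
        for i, flag in enumerate(constrained)
        if flag and (i == 0 or not constrained[i - 1])
    )
    if fraction >= high_fraction:
        label = "high"
    elif fraction <= low_fraction:
        label = "low"
    else:
        label = "intermittent"
    return label, peaks


if __name__ == "__main__":
    # Hypothetical 10-base profile with alternating peaks and troughs.
    profile = [0.9, 0.9, 0.1, 0.1, 0.9, 0.9, 0.1, 0.1, 0.9, 0.1]
    print(classify_promoter(profile))
```

Counting runs rather than single bases is what separates an intermittent profile from a uniformly high one that happens to dip once.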

15 There was enough data from four species (HMRD) to produce known and novel motifs with many conserved instances across the genomes. However, that data does not allow individual motif instances to be detected. Using the 29 mammalian genomes improves this, allowing us to detect individual motif instances and predict specific target sites for 688 regulatory motifs corresponding to 345 transcription factors.
A 60% FDR was used, representing a reasonable compromise between specificity and sensitivity given the available discovery power, and matching the experimental specificity of chromatin immunoprecipitation (ChIP).
Image: (a) Enrichment of motifs in published experimental data sets. Known motifs for each factor show an enrichment in experimental data sets, which increases with conservation. (b) Enrichment further increases for regions that are bound both in human and in the orthologous positions in mouse.

16 Scaling of motif instances using different species subsets
Comparison of high- and low-coverage species demonstrates the value of including low-coverage species.

17 Examination of evolutionary signatures identified synonymous constraint elements (SCEs) and evidence of positive selection for certain sequences. Here we see two SCE regions within the HOXA2 reading frame (HOXA2 is a protein present in embryonic development that regulates gene expression). These two regions have been characterized as enhancers within exons that drive expression of HOXA2.
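
Synonymous constraint shows up as fewer synonymous changes than expected between species. As a rough sketch of the counting step only, the code below compares two aligned, ungapped coding sequences codon by codon using the standard genetic code. It counts differing codons, not per-site rates, and it is not the statistical test used in the paper; the function name and the example sequences are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both sequences must be in frame, ungapped, the same length and
    contain only A, C, G and T.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


if __name__ == "__main__":
    # Hypothetical pair: CTT->CTC keeps Leu, ATG->ATA changes Met to Ile.
    print(count_codon_changes("CTTATGGGA", "CTCATAGGA"))
```

A window of a gene where the synonymous count stays unusually low across many species is the kind of signal an SCE search looks for.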

18 As mentioned in the previous slide, examination of evolutionary signatures has brought up evidence of positive selection throughout lineages. Blue sites are under purifying selection (the selective removal of alleles that cause damage). Gray sites are evolving neutrally (changes in the gene pool that result from neutral occurrences, which neither hurt nor give an advantage to the species). Red sites are under positive selection (selection for an allele that increases fitness).
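
The three classes of sites are commonly summarized with the ratio of nonsynonymous to synonymous substitution rates, usually written ω (dN/dS). The sketch below assumes that convention, which the slides do not spell out, and uses a hypothetical tolerance band around 1; real analyses use statistical tests instead of a hard cut-off.

```python
def classify_site(omega: float, tolerance: float = 0.1) -> str:
    """Classify a site from its dN/dS ratio (omega).

    omega well below 1 -> 'purifying', near 1 -> 'neutral',
    well above 1 -> 'positive'. The tolerance is an illustrative choice.
    """
    if omega < 0:
        raise ValueError("omega must be non-negative")
    if omega < 1 - tolerance:
        return "purifying"
    if omega > 1 + tolerance:
        return "positive"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical per-site estimates.
    for omega in (0.2, 1.0, 2.5):
        print(omega, classify_site(omega))
```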

19 Why is this important
Detecting and interpreting these elements is relevant to medicine, as loci identified in genome-wide association studies frequently lie in non-coding regions.
Gives us more of an understanding of a gene
Epigenetics?

20 Take Home Message
Found many constrained sequences in the 29 mammalian genomes, specifically in non-coding sequence.
Potential functional classes were found for ~60% of constrained bases. Similar constraints were found across the genomes, which calls for further studies of the actual function and a better understanding of these 'non-coding genes'.

21 Where do we go from here?
Functional elements relevant to this clade, including recent eutherian innovations.
Discovering regulatory elements: more genomes enable discovery of lineage-specific elements within mammalian clades and increased resolution for shared mammalian constraint (single-nucleotide resolution). The Laurasiatherian and Euarchontoglire branches contain multiple model organisms.
Human-specific selection should be detectable by combining data across genomic regions and by comparing thousands of humans.
Experimental studies require prior knowledge of the biochemical activity sought and reveal regions active in specific cell types and conditions. Comparative approaches provide an unbiased catalogue of shared functional regions, independent of biochemical activity or condition.
With increasing branch length, comparative approaches can provide information on ancestral and recent selective pressures across clades and within the human population.
The combination of disease genetics, comparative and population genomics and biochemical studies has important implications for understanding human biology, health and disease.

22 Critiques
Overview of what they found
Statistical findings, no process
Straightforward, understandable

23 Questions?

