Phylogenetic footprinting and shadowing

Slides:



Advertisements
Similar presentations
Periodic clusters. Non periodic clusters That was only the beginning…
Advertisements

Finding regulatory modules from local alignment - Department of Computer Science & Helsinki Institute of Information Technology HIIT University of Helsinki.
EVOLUTIONARY CHANGE IN DNA SEQUENCES - usually too slow to monitor directly… … so use comparative analysis of 2 sequences which share a common ancestor.
Regulatory Motifs. Contents Biology of regulatory motifs Experimental discovery Computational discovery PSSM MEME.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 2: “Homology” Searches and Sequence Alignments.
1 Computational Molecular Biology MPI for Molecular Genetics DNA sequence analysis Gene prediction Gene prediction methods Gene indices Mapping cDNA on.
The role of variation in finding functional genetic elements Andy Clark – Cornell Dave Begun – UC Davis.
[Bejerano Aut08/09] 1 MW 11:00-12:15 in Beckman B302 Profs: Serafim Batzoglou, Gill Bejerano TA: Cory McLean.
Comparative Motif Finding
Finding genes in human using the mouse Finding genes in mouse using the human Lior Pachter Department of Mathematics U.C. Berkeley.
An analysis of “Alignments anchored on genomic landmarks can aid in the identification of regulatory elements” by Kannan Tharakaraman et al. Sarah Aerni.
Comparative Genome Analysis. Comparative yeast genomics Kellis et al (2003) Nature 423,
1 Predicting Gene Expression from Sequence Michael A. Beer and Saeed Tavazoie Cell 117, (16 April 2004)
Promoter Analysis using Bioinformatics, Putting the Predictions to the Test Amy Creekmore Ansci 490M November 19, 2002.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Sequencing a genome and Basic Sequence Alignment
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Todd J. Treangen, Steven L. Salzberg
Ultraconserved Elements in the Human Genome Bejerano, G., et.al. Katie Allen & Megan Mosher.
What is comparative genomics? Analyzing & comparing genetic material from different species to study evolution, gene function, and inherited disease Understand.
Genome Annotation BBSI July 14, 2005 Rita Shiang.
NCBI Review Concepts Chuong Huynh. NCBI Pairwise Sequence Alignments Purpose: identification of sequences with significant similarity to (a)
* only 17% of SNPs implicated in freshwater adaptation map to coding sequences Many, many mapping studies find prevalent noncoding QTLs.
Yeast genome sequencing: the power of comparative genomics MEDG 505, 03/02/04, Han Hao Molecular Microbiology (2004)53(2), 381 – 389.
Common Errors in Student Annotation Submissions contributions from Paul Lee, David Xiong, Thomas Quisenberry Annotating multiple genes at the same locus.
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
CISC667, F05, Lec9, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Sequence Database search Heuristic algorithms –FASTA –BLAST –PSI-BLAST.
PreDetector : Prokaryotic Regulatory Element Detector Samuel Hiard 1, Sébastien Rigali 2, Séverine Colson 2, Raphaël Marée 1 and Louis Wehenkel 1 1 Department.
Comparative genomics analysis of NtcA regulons in cyanobacteria: Regulation of nitrogen assimilation and its coupling to photosynthesis Wen-Ting Huang.
Computational Genomics and Proteomics Lecture 8 Motif Discovery C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E.
Pattern Matching Rhys Price Jones Anne R. Haake. What is pattern matching? Pattern matching is the procedure of scanning a nucleic acid or protein sequence.
Localising regulatory elements using statistical analysis and shortest unique substrings of DNA Nora Pierstorff 1, Rodrigo Nunes de Fonseca 2, Thomas Wiehe.
1 Global expression analysis Monday 10/1: Intro* 1 page Project Overview Due Intro to R lab Wednesday 10/3: Stats & FDR - * read the paper! Monday 10/8:
Mark D. Adams Dept. of Genetics 9/10/04
MEME homework: probability of finding GAGTCA at a given position in the yeast genome, based on a background model of A = 0.3, T = 0.3, G = 0.2, C = 0.2.
Comparative Genomics.
Human Genomics. Writing in RED indicates the SQA outcomes. Writing in BLACK explains these outcomes in depth.
Significance Tests for Max-Gap Gene Clusters Rose Hoberman joint work with Dannie Durand and David Sankoff.
Pattern Discovery and Recognition for Genetic Regulation Tim Bailey UQ Maths and IMB.
A Non-EST-Based Method for Exon-Skipping Prediction Rotem Sorek, Ronen Shemesh, Yuval Cohen, Ortal Basechess, Gil Ast and Ron Shamir Genome Research August.
Genes and Genomes. Genome On Line Database (GOLD) 243 Published complete genomes 536 Prokaryotic ongoing genomes 434 Eukaryotic ongoing genomes December.
1 From Mendel to Genomics Historically –Identify or create mutations, follow inheritance –Determine linkage, create maps Now: Genomics –Not just a gene,
MPL The DNA Sequence of chimpanzee chromosome 22 and comparative analysis with its human ortholog, chromosome 21 Bioinformatics Dae-Soo Kim.
Chapter 3 The Interrupted Gene.
Finding genes in the genome
Pattern Discovery and Recognition for Understanding Genetic Regulation Timothy L. Bailey Institute for Molecular Bioscience University of Queensland.
Transcription factor binding motifs (part II) 10/22/07.
1 Repeats!. 2 Introduction  A repeat family is a collection of repeats which appear multiple times in a genome.  Our objective is to identify all families.
Comparative genomics. Comparing the DNA sequences from several species makes it possible to eliminate spurious gene predictions, sometimes find new ones,
A high-resolution map of human evolutionary constraints using 29 mammals Kerstin Lindblad-Toh et al Presentation by Robert Lewis and Kaylee Wells.
Looking Within Human Genome King abdulaziz university Dr. Nisreen R Tashkandy GENOMICS ; THE PIG PICTURE.
1 Gene Finding. 2 “The Central Dogma” TranscriptionTranslation RNA Protein.
Introduction to Bioinformatics Resources for DNA Barcoding
Detection of genome regulation sequences
Genetics and Evolutionary Biology
Very important to know the difference between the trees!
A genomic code for nucleosome positioning
Genome Projects Maps Human Genome Mapping Human Genome Sequencing
Recitation 7 2/4/09 PSSMs+Gene finding
Eukaryotic Comparative Genomics
Cis-regulatory evolution of duplicate genes in yeasts
Volume 38, Issue 4, Pages (May 2010)
Gene Structure and Identification
From Mendel to Genomics
Volume 128, Issue 6, Pages (March 2007)
Nora Pierstorff Dept. of Genetics University of Cologne
Basic Local Alignment Search Tool
Modelling heterogeneity in multi-gene data sets
Common Errors in Student Annotation Submissions contributions from Paul Lee, David Xiong, Thomas Quisenberry Annotating multiple genes at the same locus.
Sequence variation of 16S rRNA gene primer-binding sites.
Presentation transcript:

Phylogenetic footprinting and shadowing Comparative Genomics Phylogenetic footprinting and shadowing

Understanding genomes Yeast genome structure ~10Mbp 70% coding, 15% of intergenic region is regulatory Elements to identify ~60 motifs 4800-6400 Results Clustering / common motif search de novo / other organisms Systematic mutations upstream cDNA Regulatory elements Genes

Patterns of bullet holes Bullet holes in planes after combat Aim: protect vulnerable areas Abraham Wald was asked to analyse patterns of bullet holes

Patterns of bullet holes Phylogenetic footprinting: Airplane – genome Bullet hole – mutation Combat – natural selection mutations genome sequence functional region!

Comparative genomics: intro ...CGATGACTATTA... ...CGATGACTA-TA... ...C---GAGTATTA... More than one genome in hand no additional information Look at strongly conserved regions “Because evolution relentlessly tinkers with genome sequence and tests the results by natural selection, such [functional] elements should stand out by virtue of having a greater degree of conservation”

We start with… Input: 4 Saccharomyces S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus 5 to 20 mln years since the divergence of species Divergent enough to introduce noise where needed Related enough for orthologues to be easily detectable Sequencing and comparison of yeast species to identify genes and regulatory elements, Manolis Kellis, Nick Patterson, Matthew Endrizzi, Bruce Birren & Eric S. Lander,Nature 423, 241 - 254 (15 May 2003)

Genome alignment Anchoring with ORFs Aligning region in between ...CGATGACTATTA... ...CGATGACTA-TA... ...C---GAGTATTA... Anchoring with ORFs Aligning region in between 50kb segment; arrow – direction of ORF, red – 1-1 match; blue – multiple match

Genome evolution - nucleotides S. cerevisiae ...CGTTGACTATTA... Other yeast ...CGATGGCTA-TA... Nucleotide identity: 60% 70% 80% intergenic 85% 90% coding S. bayanus S. mikatae S. paradoxus 2x faster!

Genome evolution - nucleotides Measures of variation for all species (multiple alignment) S. cerevisiae ...CGATGCCTATTC... S. paradoxus ...CGATGGCTA-TA... S. mikatae ...C---GAGTATTA... S. bayanus ...CGATTACTATGA... +stop codons 0.14% 1.3% 60% coding 75x 10x 2x difference 10.2% 14% 30% intergenic frame shift gap identity

Genes identification Solution: Expected quality: Dummy method: long region without stop codon becomes “suspected” ORF Reject such ORF if they don't behave like genes Expected quality: High sensitivity (no “false negatives”) of dummy method High specificity (no “false positives”) thanks to divergence

Genes identification Results: gene catalogue revision From ~4000 named genes only 15 rejected. One mistake. 5538 genes (before: 6062) Different start/stop codons for 5% of genes ~60 new introns

Regulatory elements identification Regulatory motifs Short 6-15bp Hard to identify A small investigation of Gal4 motif shows conservation rates: 4x Difference 3% 12.5% Gal4 ½x 7% Random Coding Intergenic Motif:

Detection of regulatory elements Make up random motif of form: CGAxxxxxTGG Check validity with the test Results any nucleotides 40 20 30 Discovered Known

Summary Results Human Quantity of data -> quality of results Exact count of 5538 genes Better gene boundaries New introns identified New motifs found Human Coding part of genome 2% (70% in yeast), regulatory motifs 3% (15% in yeast) of intergenic sequences Repetitions

An Example: ORF rejected by the test

Motif recognition ? Gal4 Difference Convergent Divergent a motif genes

Genome evolution at large scale

Genome evolution at large scale Telomeres - evolution’s workshop Clusters of ambiguity 7-52kb from each end Translocations between telomeres Rapid evolution Observed in Plasmodium falciparum (antigenic variation)

Phylogenetic shadowing A few closely related species Chimp Baboon monkeys Credits to Bofelli et al. Science, Vol 299, Issue 5611, 1391-1394 , 28 February 2003

Phylogenetic shadowing Additive divergence Credits to Bofelli et al. Science, Vol 299, Issue 5611, 1391-1394 , 28 February 2003