Defining the Regulatory Potential of Highly Conserved Vertebrate Non-Exonic Elements Rachel Harte BME230.



Genomic Non-coding Regions
Significant number of vertebrate genomes available
Comparative genomics aids identification of functional regions
–~97% of the human genome is non-coding
Ultraconserved regions identified (Gill Bejerano, 2004)
–481 elements >200 bp, 100% conserved between human, mouse and rat
–average of 99% and 95% identity with dog and chicken, respectively
–Non-exonic ultraconserved elements in gene deserts of >1 Mb
–156 genes flanking intergenic ultras tend to be involved in development
–Distal enhancers for these genes?
Non-exonic conserved regions (NECs)

Investigating Enhancers in NECs
Alignment nets were used for human-mouse comparisons and for zebrafish (Danio rerio) and Tetraodon nigroviridis comparisons
Sliding window of 50 bp, with the threshold set such that 5% of the human genome is conserved and about 73% of exons are tagged as conserved
NECs obtained by filtering out conserved regions that overlap annotated exons, pseudogenes, RNA genes, gene predictions, mRNAs, ESTs from any species, annotated repeats, and regions mapping to 10 or more genomic locations in mouse
Blastz of human to fish NECs
Blat these regions to the human genome (hg17, May 2004), merge and remove duplicates
–4551 regions found
–compare to experimentally verified putative enhancers
–search for known transcription factor binding sites
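The sliding-window step above can be sketched roughly as follows. This is only an illustration under stated assumptions: it works on a pairwise alignment given as two equal-length strings, scores each 50 bp window by its fraction of identical columns, and picks the cutoff as the score reached by the top 5% of windows. The function names and the per-window identity score are assumptions for illustration; the slides do not say how the actual conservation score was computed.

```python
def window_identities(seq_a, seq_b, window=50):
    """Fraction of identical, non-gap columns in each sliding window.

    seq_a and seq_b are two rows of a pairwise alignment (equal length,
    '-' for gaps). Hypothetical helper, not the method used in the slides.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [int(a == b and a != "-")
               for a, b in zip(seq_a.upper(), seq_b.upper())]
    if len(matches) < window:
        return []
    total = sum(matches[:window])
    scores = [total / window]
    # Slide one column at a time, updating the running match count.
    for i in range(window, len(matches)):
        total += matches[i] - matches[i - window]
        scores.append(total / window)
    return scores


def threshold_for_fraction(scores, fraction=0.05):
    """Score cutoff such that about `fraction` of windows score at or above it."""
    ranked = sorted(scores, reverse=True)
    k = max(1, int(round(fraction * len(ranked))))
    return ranked[k - 1]
```

With a cutoff chosen this way, windows scoring at or above it would be called conserved; adjacent conserved windows would then be merged before the exon and repeat filters are applied.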

Methods – Enhancer Comparison
Enhancers from literature and provided by researchers
–Dach1 enhancer regions (Science, 2003, Nobrega et al.)
–chr13 CFTR region elements (Rick Myers group, Stanford)
–Vista chr16 elements (Eddy Rubin from Enhancer Browser)
–HLXB9, SOX21, PAX6, SHH, KIAA0010 (Woolfe et al., 2005, PLOS) conserved non-coding regions, Fugu vs. human
Map these to human (hg17) if not already
Results:
–Dach1: 8/9 elements intersect with NECs
–PLOS elements: 31/32 mapped to hg17; 4 PAX6, 2 SHH, KIAA0010 and HLXB9 regions mapped
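A count such as "8/9 elements intersect with NECs" comes down to an interval-overlap test. A minimal sketch, assuming both sets are lists of (chrom, start, end) tuples in half-open, BED-style coordinates; the function names are hypothetical and not taken from the tools named in the slides:

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_intersecting(elements, necs):
    """Number of elements that overlap at least one NEC."""
    return sum(any(overlaps(e, n) for n in necs) for e in elements)
```

The quadratic scan is fine for a few dozen enhancers against a few thousand NECs; for genome-scale sets a sorted sweep or an interval tree would be the usual choice.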

Search for TFBS in NECs
Use the Jaspar database of PSSMs (position-specific scoring matrices) for TFBS
Bin NECs according to GC content for generation of background sequences
GC content is approximately normally distributed, with a mean of around 35% and values ranging up to 88.06%
PST program used to create a Markov model and emit 50 sequences of 300 bp each for background
dnaMotifFind (Jim Kent): 2nd order Markov model, with PSSMs from Jaspar vs NECs
–Jaspar has 49 human TFBS
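The background-generation idea (bin by GC content, fit a Markov model, emit sequences) can be sketched as below. Note the substitution: the PST program builds a probabilistic suffix tree, whereas this sketch swaps in a plain fixed-order Markov chain as a stand-in. All function names, the 5% bin width, and the fallback for unseen contexts are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_bin(seq, width_pct=5):
    """Index of the GC-content bin (integer arithmetic, bins of width_pct percent)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return (gc * 100) // (len(seq) * width_pct)


def train_markov(seqs, order=2):
    """Count next-base occurrences for every context of length `order` (order >= 1)."""
    counts = defaultdict(Counter)
    for s in seqs:
        s = s.upper()
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1
    return counts


def emit(counts, length, order=2, rng=None):
    """Sample one background sequence of the given length from the trained counts."""
    rng = rng or random.Random()
    contexts = sorted(counts)
    out = list(rng.choice(contexts))
    while len(out) < length:
        nxt = counts.get("".join(out[-order:]))
        if not nxt:
            # Unseen context: fall back to a randomly chosen seen context.
            nxt = counts[rng.choice(contexts)]
        bases, weights = zip(*sorted(nxt.items()))
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out[:length])
```

Training one model per GC bin and calling `emit(model, 300)` fifty times would mirror the "50 sequences of 300 bp each" described above.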

TFBS Distribution in NECs
dnaMotif score:
–log odds: ln [ P(motif | PSSM) / P(motif | background) ]
Distribution of TFBS with length of NEC:
–skewed to left with highest counts in TFBS / base
–1040 elements have TFBS and 316 have > 1
Improbizer:
–search for consensus sequences in the sets of putative enhancers
–control runs of 100 to compare score
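The log-odds score above can be illustrated with a small sketch. It assumes the PSSM is a list of per-position base-probability dictionaries (with pseudocounts so that no probability is zero) and, for simplicity, a zero-order background of independent base frequencies; dnaMotifFind is described above as using a 2nd order Markov background, so this is a simplification rather than a reimplementation. The function names are hypothetical.

```python
import math


def log_odds(site, pssm, background):
    """ln [ P(site | PSSM) / P(site | background) ] for one candidate site.

    pssm: list of dicts mapping base -> probability, one dict per position.
    background: dict mapping base -> probability (zero-order assumption).
    """
    if len(site) != len(pssm):
        raise ValueError("site length must equal PSSM width")
    return sum(math.log(col[b] / background[b])
               for b, col in zip(site.upper(), pssm))


def best_hit(seq, pssm, background):
    """(score, offset) of the highest-scoring window in seq, or None if seq is too short."""
    w = len(pssm)
    scored = [(log_odds(seq[i:i + w], pssm, background), i)
              for i in range(len(seq) - w + 1)]
    return max(scored) if scored else None
```

A positive score means the site is more likely under the motif model than under the background; scoring the emitted background sequences the same way gives a reference distribution for choosing a cutoff.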

Consensus Sequence Identification
Using a control set of sequences used by Stanford in enhancer experiments
–ACGTCGC, GCATTTGT
–these are barely significant compared to control runs
–do not intersect with TFBS found by rVista2
–Motif 1 in Dc3, 6 and 7, motif 2 in Dc6, 7 and 8
Control set
–similar GC content (30-35%)
–TTTCCTATTTCGCTT, motif 1, Dc6 and Dc7
–GCTCCACGCTTCCACCT, motif 2, Dc6, 7 and 8
–score higher than control runs. Dc7 overlap - MEIS1

Future Work
Search for TFBS for NECs using Transfac
Take the NECs and create a model using PST, emit sequences and use in TFBS search as comparison
Use this to help with fine tuning dnaMotifFind
Search in NECs for consensus motifs found by Improbizer
More investigation of TFs from rVista and how these map to NECs
Map NECs and find nearest genes
–GO annotation - function

Thank You
Gill Bejerano (PST)
Jim Kent (Improbizer, dnaMotifFind, MotifMatcher)
UCSC Browser Staff