Understanding Gene Regulation Through Integrated Analysis of Genomic Data Guo-Cheng Yuan Department of Biostatistics and Computational Biology Dana-Farber.

Presentation transcript:

Understanding Gene Regulation Through Integrated Analysis of Genomic Data. Guo-Cheng Yuan, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health. Faculty Workshop, July 23rd, 2014

Biology used to be about memorizing terms and facts

Category   Human        Zebrafish
Domain     Eukarya      Eukarya
Kingdom    Animalia     Animalia
Phylum     Chordata     Chordata
Class      Mammalia     Actinopterygii
Order      Primates     Cypriniformes
Family     Hominidae    Cyprinidae
Genus      Homo         Danio
Species    H. sapiens   D. rerio

Genome sequencing has digitized biology aggcctttgttgttggcagattgctagggtctgaatgtttatgcccctg tgaaatttctttgttgaaatcttcacccctaaggtaatgctattagaagg tgggaaccttagaataattaggtgatggggacagagccctcatgaagggg atcagtgcccttataaaagaaatctgagagagaccctttgccacttctgc catgtgggttagagtgagaagaaggttatttacgagaaagtagcccttac tagacgctgaatcttctggtgccttgatcttagactcaccagctttcaga actgtaagaaataaatttctagtgtttacaagccacccagcctatggtat tttgttatagcatctggaatggactaagacacagaacaagataatgggtg gatatgctaaactttgtatatacacatgtccatttatatttccatatgtc tccatctgttatctatatcaagctaaacatgagttcatattgatgtttcc aattccaattgttacaaaatggatcatcaccttgtttttctgtaatcctc tattcagtgaaaaaccttgctcccatactatgacatccatttatttaatt gttcaatttcattatatatgtacagcaatatccaaattaataacatgtac ccctgtggacatgattatgtgaactagagtatagggcttatAAATTAAAA AAATTTAtttttattttggaaaatgcatataacaaaatgtggcattttaa tgatttttaagggtaaaatttagtgacattaattatattactaacgttgt acagctatcattactatctactttgaaaatacttttaagaacccaaacag aaaatccatacccactaagcaataaccctattgccccctcctttcagccc ttggcaatgaccattgtacttttagtctgtatgagtttgccttttctgga tatttcattttagtgaaatcatagaatatttgctcttttgtgtgtggatt atttcacttatttttaaagtttattcatttgtaacatgtattaaaacttt attcctttttttggttgaataatattctattatgtgtatataacacattt tgtttattcattcatttgttggtgaatacttgggttatttccaccttcta gaaattgtgagtcatgctgcagtggacataggcatacaattatctgagtt tctactttctattgttttggatatataatcagaattttaattgctggtgc atatggtaattttatgtatactaatttgaggagaatccatactgtttttc tcaatggctacaccattttacattcccaccagcaatgcattatggggcaa tttatccacaccaacagcaacacttattattttctaggtttttttatctt tttattttattaatgtttatcctaacagatatgaaataatatttcattgt gattttgatttacatgctaatgattagtgatgttgaacagtatttcatgt gcttatgggctatcttgtatcttttttagataaatgtctatttaaatcct ttgtttatttttgagctgaaatgtttagtttttgtggagttgtgggaatt

Variation in genetic information may predict disease risk (image source: Wikipedia)

Most DNA is not transcribed. Most transcripts are noncoding. Most proteins have unknown functions. (Courtesy of National Health Museum)

The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has … These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes … The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation…

Courtesy of Broad Institute

Quantifying cross cell-type plasticity. [Figure: H3K27me3 signal across cell types; axes: mean, variance.] Highly Plastic Regions (HPR): the top 1% of regions with the highest plasticity score. Lowly Plastic Regions (LPR): the bottom 1% of regions with the lowest plasticity score.
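The selection of HPRs and LPRs can be illustrated with a minimal sketch. The slide does not define the score, so the variance-to-mean ratio used here is an assumption suggested by the "mean" and "variance" axis labels. The function names and the toy input are hypothetical.

```python
from statistics import mean, pvariance


def plasticity_scores(signal_by_region):
    """Return {region: variance / mean} of the signal across cell types.

    Assumption: the score is the variance-to-mean ratio. The slide only
    shows "mean" and "variance" and does not give the formula.
    """
    scores = {}
    for region, values in signal_by_region.items():
        m = mean(values)
        # Regions with no signal get a score of 0 to avoid dividing by zero.
        scores[region] = pvariance(values) / m if m > 0 else 0.0
    return scores


def split_hpr_lpr(scores, fraction=0.01):
    """Return (HPR, LPR): the top and bottom `fraction` of regions by score."""
    ranked = sorted(scores, key=scores.get)  # ascending by score
    k = max(1, int(len(ranked) * fraction))  # always keep at least one region
    return ranked[-k:], ranked[:k]


# Toy input: 200 regions, each with H3K27me3 signal in three cell types.
toy_signal = {f"r{i}": [1.0, 1.0 + 0.1 * i, 1.0] for i in range(200)}
hpr, lpr = split_hpr_lpr(plasticity_scores(toy_signal))
print("HPR:", hpr)
print("LPR:", lpr)
```

With 200 regions and a 1% cutoff, each list holds two regions. A real analysis would use genome-wide bins and normalized signal tracks.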

HPRs are associated with regulatory regions

Chromatin plasticity is related to DNA sequence

A pipeline to identify regulatory TFs. Pinello et al., PNAS 2014 Jan 21;111(3):E344-53

Example: PAX5 in GM12878
1. Motif Enrichment: PAX5 is one of the most enriched motifs in GM12878-specific MPRs.
2. Coordinated Expression (z-score): PAX5 and PAX5-targeted HPR genes in GM12878.
3. Centralization: enrichment score from -2 kb to +2 kb around the MPR center.
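Two of the criteria on this slide, motif enrichment and the expression z-score, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the pipeline's actual statistics. The hypergeometric test is a common choice for enrichment but is swapped in here, and all function names and toy numbers are hypothetical.

```python
from math import comb
from statistics import mean, pstdev


def hypergeom_enrichment_p(hits_target, n_target, hits_background, n_background):
    """One-sided hypergeometric p-value for motif enrichment.

    Probability of seeing at least `hits_target` motif-containing regions
    when `n_target` regions are drawn without replacement from the pooled
    target + background set. Assumption: this test stands in for whatever
    enrichment statistic the pipeline really uses.
    """
    total = n_target + n_background
    total_hits = hits_target + hits_background
    upper = min(n_target, total_hits)
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_target - k)
        for k in range(hits_target, upper + 1)
    )
    return tail / comb(total, n_target)


def expression_z_score(expression_by_cell_type, cell_type):
    """z-score of a gene's expression in one cell type versus all cell types."""
    values = list(expression_by_cell_type.values())
    sd = pstdev(values)
    if sd == 0:
        return 0.0  # no variation across cell types
    return (expression_by_cell_type[cell_type] - mean(values)) / sd


# Toy numbers (hypothetical): a motif found in 40 of 100 target regions
# and in 50 of 1000 background regions.
print(hypergeom_enrichment_p(40, 100, 50, 1000))

# Toy expression of one TF in three hypothetical cell types.
print(expression_z_score({"cell_a": 1.0, "cell_b": 1.0, "cell_c": 4.0}, "cell_c"))
```

A small p-value together with a high z-score in the same cell type would flag a TF as a candidate regulator. The centralization criterion would additionally require motif positions relative to each region center.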

ChIP-seq confirms colocalization between PAX5 and H3K27me3 in GM12878. [Figure: signal from -2 kb to +2 kb around the HPR center.]

Haystack is (almost) available!
INPUT: Aligned reads from ChIP-seq (.bam files)
ONE COMMAND ONLY: haystack_pipeline my_bam_folder hg19
OUTPUTS: Highly plastic regions; tracks normalized for IGV or Genome Browser; list of candidate regulatory TFs.
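For scripted use, the command on this slide could be wrapped as in the sketch below. It only checks that the input folder contains .bam files and assembles the argument list exactly as shown on the slide. The wrapper function is hypothetical, and because the tool was described as "(almost) available", the released command line may differ.

```python
from pathlib import Path


def build_haystack_command(bam_folder, genome="hg19"):
    """Build the argument list for the command shown on the slide.

    Hypothetical helper: it does not run anything, it only validates the
    input folder and returns ["haystack_pipeline", <folder>, <genome>].
    """
    folder = Path(bam_folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    if not any(folder.glob("*.bam")):
        raise FileNotFoundError(f"no .bam files found in {folder}")
    return ["haystack_pipeline", str(folder), genome]


# Usage (assumes haystack_pipeline is installed and on PATH):
#   import subprocess
#   subprocess.run(build_haystack_command("my_bam_folder"), check=True)
```

Returning a list rather than a shell string avoids quoting problems when folder names contain spaces.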

Take-home message: We shouldn't just focus on a snapshot of the histone patterns and try to interpret what they all mean. Dynamic change is the key to understanding biological function.

Conclusions: Biology has entered a data-rich era. “All models are wrong; but some are useful.” (George E. P. Box)

Acknowledgement
Our group: Luca Pinello, Kimberly Glass, Eugenio Marco, Jialiang Huang
NIH, Barr Award, Milton Foundation, HSPH CIF
Stuart Orkin: Jian Xu, Zhen Shao, Dan Bauer