
Starting Monday:
- M Oct 29: Back to BLAST and Orthology (readings posted). We will focus on the BLAST algorithm and the different types and applications of BLAST; in lab we will predict orthologs using reciprocal genome-scale BLAST searches.
- W Oct 31: Phylogenetic Profiles (an example of unsupervised machine learning) and supervised machine learning approaches and applications
- M Nov 5: Phylogeny (Phylogeny Lab)
- W Nov 7: Metabolic reconstruction and modeling ***2-3 pg paper on preliminary results due***
Today: ChIP-chip and ChIP-seq analysis

Chromatin immunoprecipitation (ChIP)
1. Add a chemical or light-based crosslinker to living cells
2. Shear DNA by sonication or digestion
3. IP with a specific antibody (Ab) or with an Ab against a protein tag

ChIP-on-chip (tiled genomic microarrays)
[Figure: signal intensity vs. array probes]
Peak resolution is a function of:
- shearing size
- probe resolution
- ChIP enrichment

ChIP-seq
[Figure: read counts]


1. Map reads to the reference genome.
2. Convert to ‘tag’ counts: sequence coverage at each base pair in the genome.
3. Find peaks of high tag count (using a fixed or sliding window with a count threshold) or based on the bimodal peak distribution.
4. Convert bimodal peaks into summits (by shifting tag positions toward their 3′ ends, OR by extending the tag signal to the estimated fragment size).
5. Identify summits that represent fragment enrichment relative to control.
6. Assign a confidence score (p-value, enrichment score, and/or FDR).
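Steps 2-4 above can be sketched in Python as follows. This is a minimal illustration, not the method of any particular peak caller: reads are assumed to be hypothetical (5′ position, strand) pairs on a single chromosome, the fragment size is assumed to be already estimated, and the window size and count threshold are fixed by hand. All function names are illustrative.

```python
from typing import List, Tuple

def shift_tags(reads: List[Tuple[int, str]], fragment_size: int) -> List[int]:
    """Shift each read's 5' position toward its 3' end by half the fragment size.

    Assumes '+' reads report their leftmost base and '-' reads their rightmost
    base (the 5' end on that strand), so '-' reads shift to lower coordinates.
    """
    half = fragment_size // 2
    return [pos + half if strand == "+" else pos - half for pos, strand in reads]

def coverage(positions: List[int], genome_length: int) -> List[int]:
    """Tag count per base pair: each shifted tag adds one count at its position."""
    cov = [0] * genome_length
    for p in positions:
        if 0 <= p < genome_length:
            cov[p] += 1
    return cov

def window_peaks(cov: List[int], window: int, threshold: int) -> List[Tuple[int, int]]:
    """Slide a fixed window along the coverage; merge windows whose total
    count reaches the threshold into half-open (start, end) peaks."""
    peaks: List[Tuple[int, int]] = []
    running = sum(cov[:window])
    for start in range(len(cov) - window + 1):
        if start > 0:
            # Update the window sum: add the base entering, drop the base leaving
            running += cov[start + window - 1] - cov[start - 1]
        if running >= threshold:
            end = start + window
            if peaks and start <= peaks[-1][1]:
                peaks[-1] = (peaks[-1][0], end)  # extend the current peak
            else:
                peaks.append((start, end))
    return peaks
```

With a fragment size of 200, '+' reads at 100-105 and '-' reads at 295-300 all shift to roughly position 200, so the two strand-specific piles merge into a single peak over the presumed binding site.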

Types of ‘control’ data for ChIP experiments
1. ‘Input’ DNA: sheared but not immunoprecipitated
2. No-antibody mock IP
3. Untagged strain
There is almost always some background in a mock IP; the hope is that the IP material is enriched over this background.
* Certain artifacts can give the appearance of real peaks in control experiments.
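One common way to score enrichment relative to a control (steps 5-6 of the pipeline) is a Poisson test of the ChIP read count in a window against the count expected from the control. The sketch below assumes that model plus simple library-size scaling; the function names and the pseudocount of 1 are illustrative choices, and real peak callers use more elaborate background estimates.

```python
import math

def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Adequate for a sketch; very small p-values lose precision this way.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def enrichment_pvalue(chip_count: int, control_count: int,
                      chip_total: int, control_total: int,
                      pseudocount: float = 1.0) -> float:
    """Poisson p-value for the ChIP reads in one window, with the control as background.

    The control count is scaled to the ChIP library size; the pseudocount keeps
    the expected background above zero when the control window is empty.
    """
    scale = chip_total / control_total
    lam = max(control_count, pseudocount) * scale
    return poisson_sf(chip_count, lam)
```

For equal library sizes, 20 ChIP reads against 5 control reads gives a very small p-value, while 5 against 5 does not, which is the "enrichment over background" the slide hopes for.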

Pepke et al.: The read-count (tag) profile is generally smoothed before peak calling (e.g., with a running average), and the ‘summit’ is then inferred from the pair of strand-specific read peaks.
* Using a method that incorporates a measured background model is probably very important.
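The smoothing and summit inference described above can be sketched as a centered running average on each strand, followed by taking the midpoint between the forward- and reverse-strand maxima. This assumes per-strand tag-count arrays covering one region that contains a single binding event; the window width and function names are hypothetical.

```python
from typing import List

def running_average(values: List[float], width: int) -> List[float]:
    """Centered moving average; windows are truncated at the array edges."""
    half = width // 2
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def summit(forward_cov: List[int], reverse_cov: List[int], width: int) -> int:
    """Estimate the binding site as the midpoint between the smoothed
    forward-strand maximum and the smoothed reverse-strand maximum."""
    f = running_average(forward_cov, width)
    r = running_average(reverse_cov, width)
    f_peak = max(range(len(f)), key=f.__getitem__)  # first index of the maximum
    r_peak = max(range(len(r)), key=r.__getitem__)
    return (f_peak + r_peak) // 2
```

Forward-strand reads pile up on the upstream side of a bound site and reverse-strand reads on the downstream side, so the midpoint of the two smoothed maxima approximates the site itself.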

3 types of peaks
1. Sharp & narrow (100s of bp), e.g. site-specific TFs
2. Broader but defined (kb), e.g. RNA polymerase
3. Very broad (regional, 1000s of kb), e.g. heterochromatin histone marks
Methods that use bimodal peak profiles to identify summits work less well for biologically wider peaks/loci.

Hidden Markov Models for Identifying Bound Fragments
HMMs are trained on known data to recognize different states (e.g., bound vs. unbound fragments) and the probabilities of moving between those states.
Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50 bp binding sequence. You expect a bound fragment to have high signal on the array and to span 2-3 probes.
Once trained, an HMM can be used to identify the ‘hidden’ states in an unknown dataset, based on the known characteristics of each state (‘emission probabilities’) and the probability of moving between states (‘transition probabilities’).
Example: “A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences” Li, Meyer, Liu

[Figure: HMM for the tiling-array example, with ‘Unbound 25mer’ and ‘Bound 25mer’ probe states]
Emission probabilities (I = intensity > 10,000 units; i = intensity < 10,000 units):
- Unbound 25mer: P(I) = 0.2, P(i) = 0.8
- Bound 25mer (each of three bound-probe states): P(I) = 0.8, P(i) = 0.2
Transition probabilities shown in the figure: 0.5, 1.0, 0, 0.7, 0.3, 1.0
Given the data, the HMM considers many possible assignments of hidden states and returns the most probable one.
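The tiling-array example can be decoded with the Viterbi algorithm, which finds the single most probable sequence of hidden states for a series of probe intensities. The emission probabilities below are the ones given for the example. The figure does not make clear which transition each value (0.5, 1.0, 0, 0.7, 0.3, 1.0) belongs to, so the arrangement used here (a 2-3 probe bound segment modeled as B1 → B2 → optional B3) is an assumption, as is always starting in the unbound state. Training the parameters (e.g., with Baum-Welch) is not shown.

```python
import math
from typing import List

STATES = ["U", "B1", "B2", "B3"]  # unbound probe, then bound probes 1-3

# Emission probabilities from the example: "I" = high intensity, "i" = low
EMIT = {
    "U":  {"I": 0.2, "i": 0.8},
    "B1": {"I": 0.8, "i": 0.2},
    "B2": {"I": 0.8, "i": 0.2},
    "B3": {"I": 0.8, "i": 0.2},
}

# ASSUMED arrangement of the slide's transition values (see lead-in):
# a bound run must last at least 2 probes and at most 3.
TRANS = {
    "U":  {"U": 0.5, "B1": 0.5},
    "B1": {"B2": 1.0, "U": 0.0},
    "B2": {"B3": 0.7, "U": 0.3},
    "B3": {"U": 1.0},
}
START = {"U": 1.0}  # ASSUMED: sequences begin in the unbound state

def _log(p: float) -> float:
    """Log probability, with log(0) mapped to -infinity."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs: str) -> List[str]:
    """Most probable hidden-state path for a string of 'I'/'i' observations."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: _log(START.get(s, 0.0)) + _log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev, best = STATES[0], float("-inf")
            for p in STATES:
                cand = score[p] + _log(TRANS[p].get(s, 0.0))
                if cand > best:
                    best_prev, best = p, cand
            new_score[s] = best + _log(EMIT[s][o])
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state
    path = [max(STATES, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))
```

Decoding the probe series "iiIIIii" labels the three high-intensity probes as one bound fragment, matching the expectation that a bound fragment spans 2-3 probes.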

Evaluation of 11 different peak-calling algorithms using 3 real datasets and default parameters (mimicking “non-expert users”):
- Methods with smaller (more stringent) peak lists often return peaks that are also identified by other methods.
- “Many programs call similar peaks, though default parameters are tuned to different levels of stringency.”
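The comparison above comes down to asking how many peaks in one program's list are also found in another's. A minimal sketch, assuming peaks are hypothetical half-open (chrom, start, end) tuples and that any overlap of at least one base counts as agreement; published comparisons often require a minimum overlap or summit distance instead.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), half-open

def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_recovered(smaller: List[Peak], larger: List[Peak]) -> float:
    """Fraction of peaks in `smaller` that overlap at least one peak in `larger`.

    Brute-force O(n * m) comparison; fine for illustration, but a sorted
    sweep is preferable for genome-scale peak lists.
    """
    if not smaller:
        return 0.0
    hits = sum(1 for p in smaller if any(overlaps(p, q) for q in larger))
    return hits / len(smaller)
```

A value near 1 for a stringent list against a permissive one would be consistent with the observation that smaller lists mostly contain peaks other methods also call.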


Output: a list of peak locations (start & stop) and p-values.
- Challenge: peaks do not show precisely where the protein binds, and different programs vary in the width of the identified peaks.
- The same type of motif finding can be applied to a set of IP’d regions to identify motifs shared by those regions.
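Because peak widths differ between programs, a common preprocessing step before motif finding is to recenter each peak on its summit and take a fixed-width window. A minimal sketch, assuming peaks are hypothetical (chrom, start, end, summit) tuples with an absolute summit coordinate and the genome is held as a dict of sequence strings; the half-width is the analyst's choice.

```python
from typing import Dict, List, Tuple

def recenter(peaks: List[Tuple[str, int, int, int]], half_width: int,
             chrom_sizes: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Replace each peak by a window of +/- half_width around its summit,
    clipped to the chromosome boundaries (half-open coordinates)."""
    out = []
    for chrom, _start, _end, summit in peaks:
        lo = max(0, summit - half_width)
        hi = min(chrom_sizes[chrom], summit + half_width)
        out.append((chrom, lo, hi))
    return out

def extract(genome: Dict[str, str], regions: List[Tuple[str, int, int]]) -> List[str]:
    """Pull out the sequence of each region, e.g. as input for a motif finder."""
    return [genome[chrom][s:e] for chrom, s, e in regions]
```

Using equal-width windows keeps one program's broad peaks from diluting the motif signal with flanking sequence.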

Other approaches
- ChIP-exo
- DNase I hypersensitive sites
- Micrococcal nuclease sensitive sites (nucleosome mapping)

What can you do with the data?
1. Motif finding: look for a motif shared by the bound regions (e.g. XX)
2. Association of bound loci with neighboring genes and elements
   - functional enrichment of neighboring genes
   - other non-random associations among neighboring genes, e.g. shared expression profiles or expression dependency on the factor in question
3. Locus distribution across the genome
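Item 2 usually starts by linking each bound locus to a nearby gene. The sketch below assigns each peak to the gene with the nearest transcription start site (TSS), ignoring strand; the input format and function name are hypothetical, and nearest-TSS is only one of several possible assignment rules (others use fixed windows or regulatory domains). The resulting gene list could then be tested for functional enrichment.

```python
import bisect
from typing import Dict, List, Tuple

def nearest_tss(peaks: List[Tuple[str, int, int]],
                tss: Dict[str, List[Tuple[int, str]]]):
    """Assign each peak to the gene whose TSS is closest to the peak center.

    tss maps chromosome -> list of (TSS position, gene name), sorted by position.
    Returns (chrom, start, end, gene, TSS position - peak center) per peak,
    with gene and distance set to None when the chromosome has no TSS.
    """
    positions = {c: [p for p, _ in sites] for c, sites in tss.items()}
    out = []
    for chrom, start, end in peaks:
        center = (start + end) // 2
        pos = positions.get(chrom, [])
        if not pos:
            out.append((chrom, start, end, None, None))
            continue
        i = bisect.bisect_left(pos, center)
        # The closest TSS is either the last one before the center or the first at/after it
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pos)]
        j = min(candidates, key=lambda k: abs(pos[k] - center))
        out.append((chrom, start, end, tss[chrom][j][1], pos[j] - center))
    return out
```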