Alistair Chalk, Elisabet Andersson Stem Cell Biology and Bioinformatic Tools, DBRM, Karolinska Institutet, 18-24 September 2007. Day 5-2 What bioinformatics.

Presentation transcript:

Day 5-2: What bioinformatics tools can be used for analysing ChIP data?

After this seminar
You should be able to:
- Understand the differences between ChIP-chip and ChIP-Seq and identify key decision-making steps for choosing a platform
- Identify bioinformatics steps needed for handling ChIP-chip and ChIP-Seq datasets
- Understand underlying data from genome tiling arrays
- Understand how to search for binding sites in genomic data
- Understand the need for skills in handling large datasets

General problem
Find accessible regions of DNA that are bound to your protein.
- What method is best?
- What sort of bioinformatics skills are required?
- What is real signal and what is noise?
- What do you do with the regions once you have identified them?
Zheng, M. et al. (2007) ChIP-chip: data, model, and analysis. Biometrics, Vol 63,

Experimental methods give different types of data
ChIP-chip
- microarray data defining genomic regions
- probe (with position usually defined) + expression
ChIP-Seq
- high-throughput DNA sequence
- ACGATGTCA sequence fragments (from Solexa/SOLiD/454)
- sequence position undefined (search required)
The same issues exist for microarray vs. deep sequencing in gene expression experiments:
- coverage
- cost
- practicality

Raw (sequence) data
- Flat files, processed from base calls to FASTA format
- Solexa: ~25-30 bp reads
- A barcode is used to pool samples in one sequencing run
  ACGT = Expt1
  TGAC = Expt2
  ACGT|Sequence
  TGAC|Sequence
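The barcode scheme on this slide can be sketched in a few lines of Python. This is only an illustration: the 4-base barcode length, the `demultiplex` helper and the example reads are assumptions, and real pipelines usually also tolerate sequencing errors inside the barcode.

```python
from collections import defaultdict

# Barcode-to-experiment table from the slide; a fixed 4-base barcode is assumed.
BARCODES = {"ACGT": "Expt1", "TGAC": "Expt2"}
BARCODE_LEN = 4

def demultiplex(reads):
    """Group reads by their leading barcode and strip the barcode off."""
    by_experiment = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        # Reads with an unrecognised barcode are kept apart for inspection.
        experiment = BARCODES.get(barcode, "unassigned")
        by_experiment[experiment].append(insert)
    return dict(by_experiment)

# Made-up reads for illustration only.
reads = ["ACGTGGATTACA", "TGACCCGGTTAA", "ACGTTTTTGGGG", "NNNNACGTACGT"]
pooled = demultiplex(reads)
for experiment, inserts in sorted(pooled.items()):
    print(experiment, inserts)
```

Keeping the unassigned reads, rather than dropping them silently, makes it easier to spot a failed run or a wrong barcode table.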

Choice of experiment
The choice of experiment depends on the focus you require:
- whole-genome broad coverage (of a known genome)
- or a focused genomic region?
- or discovery based (known or unknown genome)
How much coverage do you need?
- Fewer broad experiments vs. many focused experiments?
Custom chips can be easily designed for focused regions and custom applications.

ChIP workflow
ChIP-chip
1. Select antibody
2. Select chip, or design and select probes
3. Map array probes to genomic position (BLAST/BLAT or lookup table from chip supplier)
4. Identify peaks from data and minimise false positives
5. Analyse peaks to predict binding sites
ChIP-Seq
1. Select antibody
2. Decide how deep to sequence ($$$ vs. coverage)
3. Sequence fragments
4. Map sequence to genomic position (BLAST/BLAT)
5. Identify peaks from data and minimise false positives
6. Analyse peaks to predict binding sites

ChIP-chip: Ringo workflow example

ChIP output
- Peaks on the genome
- "Score" for each genomic position
Source: BMC Bioinformatics 2007, 8:219

Antibody selection
Success depends on your antibody.
Select antibodies that are suitable for ChIP-chip experiments
- Only a small number so far!
- List available from

Microarray companies
DNA microarrays suitable for ChIP-chip assays:
- Affymetrix
  - Human Chr21 & 22 tiling microarrays (oligonucleotide arrays)
  - Human ENCODE tiling arrays (oligonucleotide arrays)
- Agilent
  - Custom oligonucleotide arrays
- Nimblegen Systems, Inc.
  - Human promoter microarrays
  - Human ENCODE microarrays
  - Custom oligonucleotide arrays
- Aviva Systems Biology
  - Hu5K promoter arrays (PCR product arrays)
  - Hu20K promoter arrays (oligo arrays)

Probe design
Tiling
- high-resolution arrays
- target genomic regions of interest
- whole genome or specific targeted regions?
Agilent eArray probe database
- >21 million tiled CGH and ChIP-on-chip probes
Do it yourself
- unassembled genomes, etc.

Mapping to genome
Genome assemblies are still changing, especially for many organisms.
You must map the probe/sequence to its genomic location using
- standard alignment software (BLAST/BLAT/vmatch/...)
- or data files from the vendor (recommended for most cases)
R packages exist for annotating probes with their genomic location.

Mapping to genome (continued)
For sequence-based methods this step is critical (and slow)
- need a Unix server to run (or VMware)
- choice of parameters for short sequences
- must be aware of speed vs. specificity for large datasets
Questions to ask:
- Do I need access to a computing cluster?
- Do I need to pre-filter data? (Some sequences will account for most of the compute time.)
Slide diagram: filter raw sequences -> representative sequence set -> genome
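One way to read the "raw sequences -> representative sequence set" step is as collapsing identical reads before alignment, so each distinct sequence is mapped once and its count is carried along. The sketch below assumes that reading; the `collapse_reads` helper, the length cut-off and the rule of dropping reads containing N are illustrative choices, not something the slides specify.

```python
from collections import Counter

def collapse_reads(reads, min_length=20):
    """Return {sequence: count} for reads that pass two simple filters."""
    counts = Counter()
    for read in reads:
        seq = read.strip().upper()
        # Skip reads that are too short or contain ambiguous bases.
        if len(seq) < min_length or "N" in seq:
            continue
        counts[seq] += 1
    return counts

# Made-up reads; min_length is lowered so the toy data passes the filter.
reads = ["ACGTACGTAC", "acgtacgtac", "ACGTNCGTAC", "ACG", "TTTTGGGGCC"]
representative = collapse_reads(reads, min_length=5)
for seq, n in representative.most_common():
    print(seq, n)
```

Only the keys of `representative` would then be sent to the aligner, which can reduce compute time noticeably when a few sequences dominate the run.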

Normalisation
A normalization procedure:
- (a) The MA plot before normalization shows a need for rotation to correct dye bias.
- (b) To determine the correct angle of rotation, the σ(M) vs σ(A) plot of the differences between probes is generated. This circumvents the effect of binding signal in determining the rotation angle for the original MA plot in (a).
- (c) The MA plot after rotation by the angle determined in (b). The green line is the fitting line after rotation.
- (d) The MA plot after normalization.
An MA plot is a scatterplot with transformed axes. The X-axis represents the average log intensity from the 2 channels, while the Y-axis represents the log-ratios.
Source: BMC Bioinformatics. 2007; 8: 219.
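The MA axes described on this slide can be computed directly from the two channel intensities. The sketch below only derives M and A; it does not implement the rotation-based normalization from the cited paper. Base-2 logarithms, the channel names `red` and `green`, and the example intensities are assumptions.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired two-channel probe intensities."""
    m_vals, a_vals = [], []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)
        m_vals.append(log_r - log_g)          # M: log-ratio (Y-axis)
        a_vals.append(0.5 * (log_r + log_g))  # A: average log intensity (X-axis)
    return m_vals, a_vals

# Made-up intensities for three probes.
red = [1024.0, 512.0, 256.0]
green = [256.0, 512.0, 1024.0]
m, a = ma_values(red, green)
print(list(zip(a, m)))
```

Intensities must be strictly positive before taking logs, so background-corrected values of zero or below need handling first.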

Peak detection
- What regions of DNA contain signal peaks?
- How to define a statistically significant peak?
Zheng, M. et al. (2007) ChIP-chip: data, model, and analysis. Biometrics, Vol 63,
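To make the question concrete, here is a deliberately naive peak caller: smooth the per-probe scores with a running mean, then report runs of consecutive probes above a threshold. It is not the model from Zheng et al. and says nothing about statistical significance; the function name, window size, threshold and data are all assumptions for illustration.

```python
def call_peaks(positions, scores, threshold, window=3, min_probes=2):
    """Return (start, end) regions where the running-mean score exceeds threshold."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))

    peaks, run = [], []
    for pos, value in zip(positions, smoothed):
        if value > threshold:
            run.append(pos)
        else:
            # A run ends here; keep it only if it spans enough probes.
            if len(run) >= min_probes:
                peaks.append((run[0], run[-1]))
            run = []
    if len(run) >= min_probes:
        peaks.append((run[0], run[-1]))
    return peaks

# Made-up probe positions (sorted, one chromosome) and enrichment scores.
positions = [100, 200, 300, 400, 500, 600, 700, 800]
scores = [0.1, 0.2, 2.0, 2.4, 2.2, 0.3, 0.1, 1.5]
peaks = call_peaks(positions, scores, threshold=1.0)
print(peaks)
```

Requiring several neighbouring probes above the threshold is a cheap guard against single noisy probes; a real analysis would replace the fixed threshold with a null model or control data.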

Normalisation: before and after
Before normalization
- the mock control appears to show the same differential enrichment between genic and intergenic regions as the histone occupancy, suggesting that the differential enrichment may be an artifact.
After normalization
- the mock control no longer shows significant differential enrichment, while the H3 and H4 profiles still do.
Source: Peng et al. BMC Bioinformatics 2007, 8:219

Noise
- Contamination: Do sequences match the expected genome?
- Sequencing errors: Can you determine where a sequencing error is?
- Multiple-mapping sequences: Many sequences do not have unique genome matches
- Dye-specific bias
ChIP-chip data for chromatin-associated proteins and histone modifications present additional challenges, as they often display broad regions of enrichment. This is in contrast to the isolated and sharp peaks that are typical for the binding of transcription factors.
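A common, simple response to multiple-mapping sequences is to keep only reads that align to exactly one place. The sketch below assumes alignments are already available as `(read_id, chromosome, position)` tuples; that input shape, the `unique_mappers` helper and the data are hypothetical, and discarding multi-mappers is only one of several possible policies.

```python
from collections import defaultdict

def unique_mappers(alignments):
    """Keep reads that have exactly one alignment; return {read_id: (chrom, pos)}."""
    hits = defaultdict(list)
    for read_id, chrom, pos in alignments:
        hits[read_id].append((chrom, pos))
    return {rid: locs[0] for rid, locs in hits.items() if len(locs) == 1}

# Made-up alignments: read r2 maps to two places and is therefore dropped.
alignments = [
    ("r1", "chr1", 1200),
    ("r2", "chr1", 5300),
    ("r2", "chr7", 880),
    ("r3", "chr2", 42),
]
unique = unique_mappers(alignments)
print(unique)
```

Dropping multi-mappers reduces false peaks in repeats but also hides any true binding in repetitive regions, so the trade-off should be a conscious choice.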

Peak detection - replicates
Use replicates to improve detection
- Peaks that are consistent between replicates are more likely to be true
Zheng, M. et al. (2007) ChIP-chip: data, model, and analysis. Biometrics, Vol 63,
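A minimal way to act on this is to keep only peaks that overlap a peak in the other replicate. The sketch assumes peaks are `(start, end)` intervals on the same chromosome with inclusive coordinates; the helper names and intervals are illustrative, and this is not the replicate model used by Zheng et al.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

def reproducible_peaks(rep1, rep2):
    """Keep peaks from rep1 that overlap at least one peak in rep2."""
    return [p for p in rep1 if any(overlaps(p, q) for q in rep2)]

# Made-up peak calls from two replicate experiments.
rep1 = [(100, 200), (500, 650), (900, 950)]
rep2 = [(150, 260), (640, 700), (2000, 2100)]
shared = reproducible_peaks(rep1, rep2)
print(shared)
```

The all-against-all comparison is fine for small peak lists; for genome-wide data a sorted sweep or an interval tree would scale better.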

What next?
Given that you've identified accessible regions in the genome
- What information can be gathered from this sequence?
Use discovery methods to look for common patterns in the regions
- MEME, etc.
Use TFBS databases to look for known transcription factor binding sites in the sequence
- TRANSFAC: high coverage, noisy database
- JASPAR: low coverage, higher quality
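Searching a region for a known binding site usually means sliding a position weight matrix along the sequence and scoring each window. The sketch below shows the idea with a hypothetical 4-column count matrix; it is not taken from TRANSFAC or JASPAR, and the pseudocount, uniform background and score cut-off are assumptions. Only the forward strand is scanned.

```python
import math

# Hypothetical count matrix (one list per base, one entry per motif column).
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 9, 1],
    "G": [1, 10, 0, 0],
    "T": [1, 0, 1, 8],
}

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to per-column log2 odds against a uniform background."""
    width = len(next(iter(counts.values())))
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        matrix.append({
            b: math.log2((counts[b][i] + pseudocount) / total / background)
            for b in "ACGT"
        })
    return matrix

def scan(sequence, matrix, min_score):
    """Return (start, window, score) for forward-strand windows scoring >= min_score."""
    hits = []
    width = len(matrix)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous bases
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= min_score:
            hits.append((start, window, round(score, 2)))
    return hits

pwm = log_odds_matrix(COUNTS)
hits = scan("TTAGCTGGAGCTAA", pwm, min_score=4.0)
print(hits)
```

With a real matrix the reverse complement should be scanned as well, and the cut-off chosen with the database's noise level in mind.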

R packages for ChIP-chip
Ringo
- Well-documented workflow and good tutorial
BAC
- Bayesian Analysis of ChIP-chip data
- Perfect example of minimal documentation

Summary
You should be able to:
- Understand the differences between ChIP-chip and ChIP-Seq and identify key decision-making steps for choosing a platform
- Identify bioinformatics requirements for handling ChIP-chip and ChIP-Seq datasets
- Find transcription factor binding sites in genomic data
- Understand the need for skills in handling large datasets