Localization Analysis


Localization Analysis 11/07/07

Tiling arrays
Microarray probes are oligonucleotide sequences, regularly spaced, covering a whole genomic region of a chromosome.

Tiling Arrays http://en.wikipedia.org/

Typical applications:
Comparative Genomic Hybridization (aCGH): copy number variation
RNA analysis: transcript structure, transcript discovery, etc.
Location analysis: nuclease sensitivity
Location analysis: chromatin immunoprecipitation (ChIP)
NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end

Spike-in experiments: we can find linkers as short as 7 bp
[Figure: measured red/green ratio plotted against location of labeled PCR product]

Experimental Determination of Cross-Hybridization
Spike in a PCR product: (1+1)/1 > (1+n)/n for n > 1, so cross-hybridizing probes will detect less enrichment experimentally
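The inequality on this slide can be checked numerically. The sketch below assumes that n counts the genomic loci a probe hybridizes to equally well, and that the spike-in adds one extra copy at a single locus; the function name and parameters are illustrative, not from the original slides.

```python
def expected_ratio(n_targets, spike_copies=1.0):
    """Expected test/reference ratio for a probe hitting n_targets loci.

    Assumption: each locus contributes one unit of signal to both channels,
    and the spike-in adds spike_copies units to the test channel only.
    """
    if n_targets < 1:
        raise ValueError("a probe must have at least one genomic target")
    return (n_targets + spike_copies) / n_targets


# A unique probe sees a 2-fold ratio; a probe with 4 targets sees only 1.25.
for n in (1, 2, 4, 10):
    print(n, expected_ratio(n))
```

Under these assumptions the measured enrichment shrinks toward 1 as the number of cross-hybridizing targets grows, which is what makes the spike-in a usable test for cross-hybridization.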

Spike-in data

Array CGH Technology

Genome-wide measurement of DNA copy number alteration by array CGH. (a) DNA copy number profiles are illustrated for cell lines containing different numbers of X chromosomes, for breast cancer cell lines, and for breast tumors. Each row represents a different cell line or tumor, and each column represents one of 6,691 different mapped human genes present on the microarray, ordered by genome map position from 1pter through Xqter. Moving average (symmetric 5-nearest neighbors) fluorescence ratios (test/reference) are depicted using a log2-based pseudocolor scale (indicated), such that red luminescence reflects fold-amplification, green luminescence reflects fold-deletion, and black indicates no change (gray indicates poorly measured data). (b) Enlarged view of DNA copy number profiles across the X chromosome, shown for cell lines containing different numbers of X chromosomes. Pollack J R et al. PNAS 2002;99:12963-12968 ©2002 by The National Academy of Sciences
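The smoothing mentioned in the caption can be sketched as follows. This is a minimal illustration, assuming that "symmetric 5-nearest neighbors" means a centered window of two probes on each side (truncated at the ends); the function names are hypothetical and this is not the published pipeline.

```python
import math


def log2_ratios(test, reference):
    """Per-probe log2(test/reference) fluorescence ratios."""
    return [math.log2(t / r) for t, r in zip(test, reference)]


def moving_average(values, half_window=2):
    """Centered moving average; the window is truncated at the array ends."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Probes ordered by genome position; the middle three are amplified 4-fold.
test = [100, 110, 400, 420, 390, 100, 95]
ref = [100, 100, 100, 100, 100, 100, 100]
print(moving_average(log2_ratios(test, ref)))
```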

DNA copy number alteration across chromosome 8 by array CGH. (a) DNA copy number profiles are illustrated for cell lines containing different numbers of X chromosomes, for breast cancer cell lines, and for breast tumors. Breast cancer cell lines and tumors are separately ordered by hierarchical clustering to highlight recurrent copy number changes. The 241 genes present on the microarrays and mapping to chromosome 8 are ordered by position along the chromosome. Fluorescence ratios (test/reference) are depicted by a log2 pseudocolor scale (indicated). Selected genes are indicated with color-coded text (red, increased; green, decreased; black, no change; gray, not well measured) to reflect correspondingly altered mRNA levels (observed in the majority of the subset of samples displaying the DNA copy number change). The map positions for genes of interest that are not represented on the microarray are indicated in the row above those genes represented on the array. (b) Graphical display of DNA copy number profile for breast cancer cell line SKBR3. Fluorescence ratios (tumor/normal) are plotted on a log2 scale for chromosome 8 genes, ordered along the chromosome. Pollack J R et al. PNAS 2002;99:12963-12968 ©2002 by The National Academy of Sciences

Typical applications:
Comparative Genomic Hybridization (aCGH): copy number variation
RNA analysis: transcript structure, transcript discovery, etc.
Location analysis: nuclease sensitivity
Location analysis: chromatin immunoprecipitation (ChIP)
NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end

RNA vs. genomic
[Figure labels: 3’ UTR, 5’ UTR]

Tiling of the Hox loci – mRNA vs. genomic

Transcript maps. ZY Xu et al. Nature 000, 1-5 (2009) doi:10.1038/nature07728

Typical applications:
Comparative Genomic Hybridization (aCGH): copy number variation
RNA analysis: transcript structure, transcript discovery, etc.
Location analysis: nuclease sensitivity
Location analysis: chromatin immunoprecipitation (ChIP)
NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end

DNaseI HS profiling

DHS profiling identifies promoters, enhancers, and insulators

Isolation of nucleosomal DNA Cut in half

Typical applications:
Comparative Genomic Hybridization (aCGH): copy number variation
RNA analysis: transcript structure, transcript discovery, etc.
Location analysis: nuclease sensitivity
Location analysis: chromatin immunoprecipitation (ChIP)
NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end

Experimental Protocol (Kim and Ren 2007)
Step 1: Crosslinking. Fix protein to DNA.
Step 2: Sonication. Break the DNA into fragments.
Step 3: Immunoprecipitation. Pull down the target protein with a specific antibody.
Step 4: Hybridization. Hybridize input and pulled-down DNA on a microarray.

Chromatin Immuno-precipitation

Tiling Array Data Each TF binding signal is represented by multiple probes. Need more sophisticated statistical tools. Kim and Ren 2007

Tiling arrays provide high resolution for identifying bound fragments Overlapping 25-mer fragments Boyer et al. 2005

Mapping histone modifications

Chromatin’s primary structure

OK, now what?
The analysis method depends strongly on how widespread the thing being examined is, and on whether you have a guess regarding its localization.
CGH: Just look!
TF ChIP-chip, DHS: peak-finding algorithms (BUT BUT BUT).
RNA, chromatin marks: Hidden Markov Models, aggregation plots

CGH Array Segmentation
Key idea: Most probe targets have the same copy number as their nearest neighbors, so we can average over neighbors.
Key issue: When is a difference real?
Recommended programs:
DNAcopy: solid statistical basis; slow
StepGram: heuristic; fast
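To make the key idea concrete, here is a deliberately simplified sketch that finds a single breakpoint by least squares. It is not the algorithm used by DNAcopy or StepGram; it only illustrates "average over neighbors, then ask where the averages differ". The function name and the min_size parameter are assumptions for illustration.

```python
def best_breakpoint(values, min_size=2):
    """Return (index, cost) of the split that minimizes within-segment
    squared error, or None if the array is too short to split.

    values[:index] and values[index:] are the two segments.
    """
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    best = None
    for k in range(min_size, len(values) - min_size + 1):
        cost = sse(values[:k]) + sse(values[k:])
        if best is None or cost < best[1]:
            best = (k, cost)
    return best


# log2 ratios: four probes near 0 (two copies), then four near 1 (gain)
profile = [0.0, 0.1, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0]
print(best_breakpoint(profile))
```

Deciding whether the split is "real" still needs a significance test, for example by comparing the cost reduction against permuted probe orders; that step is left out here.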

Methods
Moving average t-test (Keles et al. 2004)
HMM (Li et al. 2005; Yuan et al. 2005)
TileMap (Ji and Wong 2005)
MAT (Johnson et al. 2006)

Keles’ method (Keles et al. 2004)
Calculate a two-sample t-statistic at each probe i, comparing the ChIP signal (Y2) with the input signal (Y1).
Then compute a moving-average scan statistic across neighboring probes.
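A minimal sketch of the two steps on this slide, assuming replicate measurements per probe for both ChIP and input, a Welch (unequal variance) form of the t-statistic, and a plain unweighted average over a fixed window. The function names are illustrative; consult Keles et al. 2004 for the exact statistic.

```python
import math
from statistics import mean, variance


def welch_t(chip, control):
    """Two-sample t-statistic for one probe (needs >= 2 replicates each,
    and non-zero variance in at least one group)."""
    se = math.sqrt(variance(chip) / len(chip) + variance(control) / len(control))
    return (mean(chip) - mean(control)) / se


def scan_statistic(t_values, window=3):
    """Average the per-probe t-statistics over each run of `window` probes."""
    return [mean(t_values[i:i + window])
            for i in range(len(t_values) - window + 1)]


# Replicate log intensities for five consecutive probes (made-up numbers)
chip = [[1.0, 1.2], [1.1, 0.9], [3.0, 3.4], [3.2, 2.8], [1.0, 1.1]]
inp = [[1.0, 0.9], [1.0, 1.1], [1.0, 1.2], [0.9, 1.1], [1.1, 0.9]]
t_values = [welch_t(c, x) for c, x in zip(chip, inp)]
print(scan_statistic(t_values, window=2))
```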

Multiple hypothesis testing Multiple hypothesis testing needs to be considered to control false positive error rates. What is the null distribution of this statistic?

Multiple hypothesis testing
Assume the statistic has a t-distribution, and approximate it by a normal distribution.
Alternatively, a resampling method can be used to estimate the null distribution.
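The resampling alternative can be sketched with a label permutation test. This is one common way to estimate a null distribution, shown here for a simple difference of means on a single probe; it is an assumption that this matches the resampling scheme the slide had in mind, and the function name is hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(chip, control, n_perm=2000, seed=0):
    """One-sided p-value for mean(chip) > mean(control), obtained by
    shuffling the ChIP/input labels to build an empirical null."""
    rng = random.Random(seed)
    observed = mean(chip) - mean(control)
    pooled = list(chip) + list(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(chip)]) - mean(pooled[len(chip):])
        if diff >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


print(permutation_p_value([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]))
```

With only a few replicates the smallest achievable p-value is limited by the number of distinct label assignments, which matters once a multiple-testing correction is applied across thousands of probes.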

ChIPOTle: a simple method for identifying ‘bound’ genomic fragments (Buck et al. 2005)
Assumption: a real binding site will have a distribution of bound fragments encapsulating it. Therefore, true positives will likely have multiple, contiguous fragments with high signal.
Walk across tiled genomic probes with a user-defined window size
Calculate mean signal intensity within each window
Estimate p-value of binding (Bonferroni-corrected) based on a standard error model or by permuting the dataset
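The sliding-window procedure can be sketched as below. This is not ChIPOTle's code: it assumes a Gaussian null centered on zero with a user-supplied per-probe standard deviation (sigma), treats windows as independent for the Bonferroni correction, and uses hypothetical names throughout.

```python
import math


def window_p_values(signal, window=3, sigma=1.0):
    """Return (start, window_mean, bonferroni_p) for every window.

    Assumption: under the null, each probe's log ratio is Gaussian with
    mean 0 and standard deviation sigma, so a window mean has standard
    error sigma / sqrt(window).
    """
    n_windows = len(signal) - window + 1
    results = []
    for start in range(n_windows):
        window_mean = sum(signal[start:start + window]) / window
        z = window_mean / (sigma / math.sqrt(window))
        p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
        results.append((start, window_mean, min(1.0, p * n_windows)))
    return results


signal = [0.0, 0.0, 0.0, 3.0, 3.0, 3.0, 0.0, 0.0, 0.0]
for start, m, p in window_p_values(signal):
    print(start, round(m, 2), p)
```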

BUT: Extensive low-affinity transcriptional interactions in the yeast genome Amos Tanay Genome Research 2006

OK, what about more continuous data like RNA or chromatin marks?

Inferring nucleosomes: HMM

A Hidden Markov Model objectively identifies nucleosome positions

Hidden Markov Models for Identifying Bound Fragments
HMMs are trained on known data to recognize different states (e.g., bound vs. unbound fragments) and the probability of moving between those states.
Once trained, an HMM can be used to identify the ‘hidden’ states in an unknown dataset, based on the known characteristics of each state (‘emission probabilities’) and the probability of moving between states (‘transition probabilities’).
Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50bp binding sequence. You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long.
Example: “A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences” 2005. Li, Meyer, Liu

Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50bp binding sequence. You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long.
[Figure: four-state HMM with one unbound 25mer state followed by three bound 25mer states]
Transition probabilities (labels from the figure; the arrows they belong to are not recoverable here): P = 1.0, 0.5, 0.3, 0, 0.5, 1.0, 0.7
Emission probabilities:
Unbound 25mer: P( I ) = 0.2, P( i ) = 0.8
Bound 25mer (each of the three bound states): P( I ) = 0.8, P( i ) = 0.2
I = Intensity units > 10,000; i = Intensity units < 10,000
Given the data, an HMM will consider many different models and give back the optimal model
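The decoding step can be sketched with the Viterbi algorithm. The emission probabilities below are the ones on the slide; the assignment of transition probabilities to arrows is an assumption (chosen so that a bound run is 2-3 probes long), as are the state names U, B1, B2, B3 and the requirement that the path starts unbound.

```python
import math

STATES = ["U", "B1", "B2", "B3"]
START = {"U": 1.0, "B1": 0.0, "B2": 0.0, "B3": 0.0}   # assumed: start unbound
TRANS = {                                              # assumed arrow layout
    "U": {"U": 0.7, "B1": 0.3},
    "B1": {"B2": 1.0},
    "B2": {"B3": 0.5, "U": 0.5},
    "B3": {"U": 1.0},
}
EMIT = {s: ({"I": 0.2, "i": 0.8} if s == "U" else {"I": 0.8, "i": 0.2})
        for s in STATES}


def safe_log(p):
    """log(p), with log(0) mapped to -infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations):
    """Most probable hidden-state path for a sequence of 'I'/'i' symbols."""
    scores = {s: safe_log(START[s]) + safe_log(EMIT[s][observations[0]])
              for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES,
                       key=lambda r: scores[r] + safe_log(TRANS[r].get(s, 0.0)))
            new_scores[s] = (scores[prev] + safe_log(TRANS[prev].get(s, 0.0))
                             + safe_log(EMIT[s][obs]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi(list("iiIIIii")))
```

Working in log space avoids numerical underflow on long probe sequences; training the probabilities from data (for example with Baum-Welch) is a separate step not shown here.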

Other types and uses of microarrays: aCGH
CGH (comparative genomic hybridization) looks at cytogenetic abnormalities
Genomic DNA is hybridized to the array
Often uses large clones (e.g., BACs) as array features

Validation of data There’s no way that all of your microarray data can be validated. It’s strongly recommended that any key findings be verified by independent means. Northern blots and quantitative RT-PCR are the typical ways of doing this; real-time, quantitative RT-PCR is generally the method of choice.

Chromatin’s primary structure

One way to turn this 1D trace into 2D is via “averageogram”
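An aggregation plot of this kind can be sketched as follows: cut a fixed-size window around each anchor position (for example, each NFR) and average the windows position by position. The function name, the flank parameter, and the choice to skip anchors too close to the array ends are assumptions for illustration.

```python
def aggregate_profile(signal, anchors, flank=2):
    """Average signal in windows of +/- flank probes around each anchor.

    Anchors whose window would run off either end of the signal are skipped.
    """
    width = 2 * flank + 1
    sums = [0.0] * width
    count = 0
    for anchor in anchors:
        if anchor - flank < 0 or anchor + flank >= len(signal):
            continue
        for offset in range(width):
            sums[offset] += signal[anchor - flank + offset]
        count += 1
    if count == 0:
        raise ValueError("no anchor has a complete window")
    return [s / count for s in sums]


trace = [0, 1, 5, 1, 0, 0, 2, 6, 2, 0]
print(aggregate_profile(trace, anchors=[2, 7], flank=1))
```

Stacking the individual windows as rows instead of averaging them gives the heat-map style view, with the average profile as its column means.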

H4 K16 Acetyl, aligned by NFR

Beyond Transcription
[Figure labels: % exchange events (Printed Arrays), % nucleosomes]

Multiple visualizations of tiling data

RNA-Seq Lockhart and Winzeler 2000 Wang et al. 2009

RNA-Seq
Whole Transcriptome Shotgun Sequencing
Sequencing cDNA using next-generation sequencing technology
Revolutionary tool for transcriptomics
More precise measurements
Ability to do large-scale experiments with little starting material

RNA-Seq Experiment Wang et al. 2009

Mapping
Create unique scaffolds
Algorithms are harder with such short reads

Unbiased sequencing of the yeast transcriptome. (A) Distribution of reads mapped to the PAP1 locus. Shown are SGD annotations (downloaded at November 2007) (8), and mapped reads (red, W strand; blue, C strand). Additional tracks plot the cumulative number of reads covering each base position (yellow, YPD; light blue, HS). Full data can be accessed at http://compbio.cs.huji.ac.il/RNASeq, and is visualized using the University of California, Santa Cruz, genome browser (22). (B) Distribution of reads matched to the genome. Of the 26,050,414 reads sequenced in YPD (Left), 13,424,957 (52%, blue) were uniquely mapped to a single genomic locus, 6,144,595 (23%, green) were mapped to several locations, and 6,480,862 (25%, yellow) could not have been aligned, and were later used to detect splice junctions. Similar numbers were found after a HS (Right). Yassour M et al. PNAS 2009;106:3264-3269 ©2009 by National Academy of Sciences

Mapping
Place reads onto a known genomic scaffold
Requires a known genome and depends on the accuracy of the reference
http://en.wikipedia.org/
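A toy illustration of reference-based mapping, assuming exact matches on the forward strand only. Real aligners use indexed references and tolerate mismatches and gaps; the function names here are hypothetical.

```python
def map_reads(reads, genome):
    """Return {read: [start positions]} for exact forward-strand matches."""
    placements = {}
    for read in reads:
        hits = []
        start = genome.find(read)
        while start != -1:
            hits.append(start)
            start = genome.find(read, start + 1)  # allow overlapping hits
        placements[read] = hits
    return placements


def classify(hits):
    """Label a read as unique, multi-mapped, or unmapped."""
    if not hits:
        return "unmapped"
    return "unique" if len(hits) == 1 else "multi"


genome = "ACGTACGTTTGA"
for read, hits in map_reads(["ACGT", "TTGA", "GGGG"], genome).items():
    print(read, hits, classify(hits))
```

The three categories mirror the unique, multiply mapped, and unaligned read fractions reported in the Yassour et al. figure above.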

Ab initio assembly of a transcript catalog. (A) Outline of steps in the catalog construction pipeline. (B) Segmentation of a contiguously transcribed region into 2 regions of distinct expression levels corresponding to the genes YBR287W and APM3. When using YPD reads alone, both genes exhibit similar coverage and thus cannot be segmented. However, in HS, they are differentially expressed, and hence by combining observations from both conditions the automatic segmentation procedure (see Materials and Methods) correctly separates them to 2 units. Tracks from top to bottom: SGD annotations (blue), our catalog (green), read coverage at YPD (yellow), and read coverage at HS (blue). (C) Detection of splice junctions. Full and gapped reads mapped to the RIM1 genomic locus. Tracks are as in B, together with gapped reads (connected segments), our putative splice junctions (in red and blue), including the junction orientations as estimated by donor and acceptor sequence motifs (arrows). As shown, our procedure identifies the exact coordinates and orientation of the known splice site. Yassour M et al. PNAS 2009;106:3264-3269 ©2009 by National Academy of Sciences

Biases Wang et al. 2009

What the data look like

Superimposing channels Giresi et al, Genome Res. 10

Experimental Design for Microarrays
There are a number of important experimental design considerations for a microarray experiment:
technical vs biological replicates
amplification of RNA
dye swaps
reference samples

Experimental Design for Microarrays
Technical vs biological replicates
Technical replicates are repeat hybridizations using the same RNA isolate
Biological replicates use RNA isolated from separate experiments/experimental organisms
Although technical replicates can be useful for reducing variation due to hybridization, imaging, etc., biological replicates are necessary for a properly controlled experiment

Experimental Design for Microarrays
Amplification of RNA
Linear amplification methods can be used to increase the amount of RNA so that microarray experiments can be performed using very small numbers of cells. It’s not clear to what degree this affects results, especially with respect to rare transcripts, but it seems to be generally OK if done correctly

Experimental Design for Microarrays
Dye swaps
When using 2-color arrays, it’s important to hybridize replicates using a dye-swap strategy in which the colors (labels) are reversed between the two replicates. This is because there can be biases in hybridization intensity due to which dye is used (even when the sequence is the same).
[Figure: samples S1 and S2 hybridized twice, with the dye labels swapped on the second array]
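Why the swap helps can be shown with a small calculation. The sketch assumes the dye bias is a constant multiplicative factor on the red channel, so it adds the same offset to the log ratio on both arrays and cancels when the two are combined; the function and argument names are illustrative.

```python
import math


def dye_swap_log_ratio(forward_red, forward_green, swap_red, swap_green):
    """Estimate log2(S1/S2) from a dye-swap pair for one feature.

    Forward array: S1 labeled red, S2 labeled green.
    Swapped array: S2 labeled red, S1 labeled green.
    """
    m_forward = math.log2(forward_red / forward_green)  # log2(S1/S2) + bias
    m_swap = math.log2(swap_red / swap_green)           # log2(S2/S1) + bias
    return (m_forward - m_swap) / 2                     # bias cancels


# True S1/S2 = 4, and the red dye reads 2x too bright on both arrays.
print(dye_swap_log_ratio(forward_red=8, forward_green=1,
                         swap_red=2, swap_green=4))
```

In this example the forward array alone would suggest an 8-fold difference, while the dye-swap estimate recovers the 4-fold (log2 = 2) difference.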

Experimental Design for Microarrays
Reference samples
One common strategy is to use a reference sample in one channel on each array. This is usually something that will hybridize to most of the features (e.g., a complex RNA mixture). Using a reference sample allows comparisons to be made between different experimental conditions, as each is compared to the common reference.
[Figure: S1, S2, and S3 each hybridized against the common reference R; compare S1/R vs. S2/R vs. S3/R]
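The indirect comparison works because the reference cancels on the log scale: log2(S1/S2) = log2(S1/R) - log2(S2/R). A minimal sketch, assuming the same reference material behaves consistently on both arrays; the function name is hypothetical.

```python
import math


def compare_via_reference(sample_a, ref_a, sample_b, ref_b):
    """log2(A/B) for one feature, from two arrays sharing a reference.

    sample_a and ref_a come from array 1; sample_b and ref_b from array 2.
    """
    return math.log2(sample_a / ref_a) - math.log2(sample_b / ref_b)


# Array 1 reads S1 = 8 against R = 2; array 2 reads S2 = 4 against R = 4.
print(compare_via_reference(8, 2, 4, 4))
```

Note that the two arrays may scan at different overall brightness (R reads 2 on one and 4 on the other); taking each sample relative to its own reference reading is what absorbs that difference.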

Experimental Design for Microarrays The bottom line is that you should discuss your experimental design with a statistician before going ahead and beginning your experiments. It’s usually too late and too expensive to change the design once you’ve begun!

MIAME (Minimum Information About a Microarray Experiment)
When you publish a microarray experiment, you are expected to make available the following minimal information. This allows others to evaluate your data and compare it to other experimental results:
• EXPERIMENT DESIGN: type, factors, number of arrays, reference sample, QC, database accession (ArrayExpress, GEO)
• SAMPLES USED, PREPARATION AND LABELING
• HYBRIDIZATION PROCEDURES AND PARAMETERS
• MEASUREMENT DATA AND SPECIFICATIONS: quantitations, hardware & software used for scanning and analysis, raw measurements, data selection and transformation procedures, final expression data
• ARRAY DESIGN: platform type, features and locations, manufacturing protocols or commercial part number