Analysis of Exon Arrays Slides provided by Dr. Yi Xing.



Outline
– Design of exon arrays
– Background correction
– Probe selection, expression index computation
– Evaluation of gene level index
– Exon level analysis
– Conclusion

1. Basic design of Exon Array
3’ Arrays                            Exon Arrays
1 gene --- 1 or 2 probesets          1 gene --- many probesets
Probes from 600 bps near 3’ end      Probes from each putative exon
Probeset has 11 PM, 11 MM probes     Probeset has 4 PM probes
54,000 probesets                     1.4 million probesets, 6 M features
Average 16 probes per RefSeq gene    Average 147 probes per RefSeq gene

Exon Array Probesets Classified by Annotational Confidence
Core probesets target exons supported by RefSeq mRNAs.
Extended probesets target exons supported by ESTs or partial mRNAs.
Full probesets target exons supported purely by computational predictions.

2. Background modeling: predict non-specific hybridization from probe sequence
Wu and Irizarry (2005) use probe effect modeling to obtain a more accurate expression index on 3’ arrays
Johnson et al. (2006) use probe effect modeling to detect ChIP peaks for tiling arrays
Kapur et al. (2007) use probe effect modeling to correct background for exon arrays

Background modeling in Exon Arrays
$$ \log B_i = \alpha \, n_{iT} + \sum_{j,k} \beta_{jk} I_{ijk} + \sum_{k} \gamma_k n_{ik}^2 + \varepsilon_i $$
Estimate parameters from either
– Background probes (n = 37,687)
– Full probes (n = 400,000)
Test on a different array (with a single scaling constant)
Full probes are useful for modeling background
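The background model above can be fit by ordinary least squares once each probe sequence is turned into a feature vector. The sketch below is illustrative only and is not the authors' code. It assumes that j runs over the 25 probe positions, that the indicators I_ijk cover bases A, C and G (with T absorbed by the n_iT term), and that the squared counts cover all four bases. The slide does not state these ranges, and the function names are hypothetical.

```python
import numpy as np

BASES = "ACGT"
PROBE_LEN = 25  # exon array probes are 25-mers

def probe_features(seq):
    """Feature vector [n_T, position-by-base indicators for A/C/G, squared base counts]."""
    assert len(seq) == PROBE_LEN
    feats = [float(seq.count("T"))]                        # n_iT
    for base in "ACG":                                     # I_ijk (assumed: k in A, C, G)
        feats.extend(1.0 if s == base else 0.0 for s in seq)
    feats.extend(float(seq.count(b)) ** 2 for b in BASES)  # n_ik^2
    return np.array(feats)

def fit_background(seqs, log_intensity):
    """Least-squares estimate of (alpha, beta, gamma) from probes that measure background."""
    X = np.vstack([probe_features(s) for s in seqs])
    y = np.asarray(log_intensity, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_background(coef, seqs):
    """Predicted log background intensity for each probe sequence."""
    X = np.vstack([probe_features(s) for s in seqs])
    return X @ coef
```

Training on one array and predicting on another, as the slide describes, would additionally need the single scaling constant between arrays, which this sketch leaves out.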

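[Table: background model R² per array (stem cell samples), with columns for promoter array R² and exon array R². Arrays listed: H9-38-3B, H9-38-3C, H9-38-3CM_, H9-38-7B, H9-39-7B, H9-41-7B, H9-43-3B, H9-43-7B. Numeric values not preserved.]
Promoter array may be used to train exon array background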

Preliminary conclusions
Background correction based on background probe effect modeling can greatly reduce background noise.
Model parameters are similar for different ChIP-DNA samples, or for different RNA samples, but not across DNA and RNA.
The data may be rich enough to support learning of more complex models with even better predictive power.

3. Probe selection and expression index computation

Gene-level visualization: Heatmap of Intensities
major histocompatibility complex, class II, DM beta
[Figure: heatmap of core probe intensities; axes: probes, samples]

Heatmap of Pairwise Correlations
[Figure: probe-by-probe correlation heatmap for HLA_DMB]

First observations
Heatmap of correlations is a useful complement to heatmap of intensities
Core probes have higher intensity than extended and full probes
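The matrix behind a heatmap of pairwise correlations can be computed directly from a probes-by-samples intensity table. This is a minimal sketch, assuming log-scale, background-corrected intensities; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def probe_correlations(intensity):
    """Pearson correlation between every pair of probes.

    intensity: array of shape (n_probes, n_samples).
    Returns an (n_probes, n_probes) symmetric matrix with ones on the diagonal.
    """
    return np.corrcoef(np.asarray(intensity, dtype=float))

# Toy data: probes 0 and 1 track each other across samples, probe 2 does not.
toy = [[1.0, 2.0, 3.0, 4.0],
       [2.0, 3.0, 4.0, 5.0],
       [4.0, 3.0, 2.0, 1.0]]
corr = probe_correlations(toy)
```

Probes that measure the same transcript should rise and fall together across samples even when their absolute intensities differ, which is why the correlation view complements the intensity view.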

Probe selection for gene-level expression
Most full and extended probes are not suitable for estimating gene-level expression
– Probes may target false exon predictions
Even some core probes may not be suitable
– Bad probes that have low affinity or cross-hybridize
– Probes targeting differentially spliced exons
Probe selection
– Select a suitably large subset of good probes targeting constitutively spliced regions of the gene
– Use only the selected probes to estimate gene expression

Heatmap of CD44 core probes (ordered by genomic location)
[Figure: probe regions labeled, in genomic order, constitutive, alternatively spliced, constitutive]

ataxin 2-binding protein 1

These examples motivated our Probe Selection Strategy
Probe selection procedure (on core probes)
– Hierarchical clustering of the probe intensities across 11 tissues (33 samples), cutting the tree at various heights (0.1, 0.2, …, 1.0).
– Choose a height cutoff to strike a balance between the size of the largest sub-group and the correlation within the sub-group.
– Iteratively remove probes if they do not correlate well with the current expression index.
– At least 11 core probes need to be chosen.
– If the total number of core probes is less than 11 for the entire transcript cluster, we skip probe selection.
(Xing Y, Kapur K, Wong WH. PLoS ONE 2006;1:e88)
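The clustering part of the procedure above can be sketched as follows. This is not the GeneBASE implementation. The balancing rule used here (take the smallest height at which the largest cluster has at least 11 probes) is an assumption standing in for the trade-off described on the slide, the iterative removal step is omitted, and the function names are hypothetical. Average linkage is implemented naively in NumPy, which is adequate for the small number of probes in one gene.

```python
import numpy as np

def average_linkage_clusters(dist, height):
    """Merge clusters by average linkage until the closest pair is farther apart than `height`."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > height:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def select_probes(intensity, min_probes=11,
                  heights=tuple(round(0.1 * i, 1) for i in range(1, 11))):
    """Return indices of the selected core probes (rows of `intensity`)."""
    x = np.asarray(intensity, dtype=float)
    everything = list(range(x.shape[0]))
    if x.shape[0] < min_probes:
        return everything                    # too few core probes: skip selection
    dist = 1.0 - np.corrcoef(x)              # distance = 1 - correlation
    for h in heights:                        # tightest cut first
        largest = max(average_linkage_clusters(dist, h), key=len)
        if len(largest) >= min_probes:       # assumed balancing rule
            return sorted(largest)
    return everything
```

Because average linkage never produces inversions, stopping the merges once the closest pair exceeds a height gives the same clusters as building the full tree and cutting it at that height.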

Hierarchical Clustering of CD44 Core Probes (distance=1-corr, average linkage) h= (42%) probes

Computation of gene level expression index
1. Background correction
2. Normalization (linear scaling or none)
3. Probe selection
4. Computation of overall gene expression indexes (dChip type model)
Gene level quantile normalization optional
GeneBASE: Gene-level Background Adjusted Selected probe Expression
Xing, Kapur, Wong, PLoS ONE, 1:e88, 2006
Kapur, Xing, Wong, Genome Biology, 8:R82, 2007
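A "dChip type model" is usually taken to mean a multiplicative probe-affinity model, in which the intensity of probe j in sample i is approximately theta_i * phi_j. The sketch below fits such a model by alternating least squares. It is a generic illustration under that assumption, not the GeneBASE code; it has no outlier handling, and the function name is hypothetical.

```python
import numpy as np

def fit_multiplicative_model(y, n_iter=50):
    """Fit y[i, j] ~ theta[i] * phi[j] by alternating least squares.

    y: (n_samples, n_probes) background-corrected intensities of the selected probes.
    Returns (theta, phi): per-sample expression indexes and per-probe affinities,
    with phi scaled so that sum(phi**2) equals the number of probes.
    """
    y = np.asarray(y, dtype=float)
    n_probes = y.shape[1]
    phi = np.ones(n_probes)
    for _ in range(n_iter):
        theta = y @ phi / (phi @ phi)            # best theta for the current phi
        phi = theta @ y / (theta @ theta)        # best phi for the current theta
        phi *= np.sqrt(n_probes / (phi @ phi))   # identifiability constraint
    theta = y @ phi / (phi @ phi)
    return theta, phi
```

The scaling constraint is needed because theta and phi are only determined up to a reciprocal factor; fixing the scale of phi makes theta comparable across genes with different numbers of probes.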

In most cases selection does not affect fold changes

spectrin, beta, non-erythrocytic 4 (SPTBN4)
Sometimes, selection changes the fold-change significantly
BetaIV spectrins are essential for membrane stability and the molecular organization of nodes of Ranvier along neuronal axons

4. Evaluations of gene level index

1st evaluation: tissue fold change
Fold-change of liver over muscle, in 438 genes with high fold-change in 3’ expression array data
[Figure panels: before selection, after selection]

Probe selection allows more sensitive detection of fold-changes
[Figure (zoom-in) panels: before selection, after selection]

FC of muscle over liver, in 500 genes detected to be overexpressed in muscle over liver by 3’ array
[Figure panels: before selection, after selection]

FC of muscle over liver (zoom-in)
[Figure panels: before selection, after selection]

2nd evaluation: Presence/Absence calls
Use SAGE data to construct gold-standard
– Presence in tissue if 100 tags per million
– Absence if no tags in given tissue but >100 tpm in at least another tissue
Exon array A/P calls: use sum of z-scores for core probes (z-score is computed based on background model)
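The gold-standard rule and the z-score call can be written down in a few lines. This sketch makes two assumptions that the slide does not settle: the presence threshold is treated as inclusive (at least 100 tags per million), and a probe's z-score is taken to be its observed log intensity minus the background model's prediction, divided by the model's residual standard deviation. Function and tissue names are hypothetical.

```python
def sage_gold_standard(tpm_by_tissue, tissue, threshold=100):
    """Classify one gene in one tissue as 'present', 'absent', or None (not in the gold standard).

    tpm_by_tissue: dict mapping tissue name to SAGE tags per million for the gene.
    """
    here = tpm_by_tissue[tissue]
    if here >= threshold:                      # assumption: threshold is inclusive
        return "present"
    others = [v for t, v in tpm_by_tissue.items() if t != tissue]
    if here == 0 and any(v > threshold for v in others):
        return "absent"
    return None

def presence_score(log_intensity, predicted_bg, bg_sd):
    """Sum of per-probe z-scores against the predicted background (assumed definition)."""
    return sum((x - m) / s for x, m, s in zip(log_intensity, predicted_bg, bg_sd))

gene = {"heart": 150, "kidney": 0, "cerebellum": 20}
calls = {t: sage_gold_standard(gene, t) for t in gene}
```

Genes that fall between the two rules, such as the cerebellum entry here, are left out of the gold standard rather than forced into a class.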

ROC curves show that background correction improves A/P calls.
Panels: (a) Cerebellum, (b) Heart, (c) Kidney
Red: Exon, Z-score call
Blue: Exon, Affy call
Brown: 3’ Affy call, max probeset
Purple: 3’ Affy call, min probeset
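An ROC curve of this kind is obtained by sweeping a threshold over the call scores and counting how many gold-standard present and absent genes lie above it. The sketch below is a generic, standard-library illustration; the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, from strictest to loosest threshold.

    scores: higher means more confidently present.
    labels: True for gold-standard present, False for gold-standard absent.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, is_present) in enumerate(ranked):
        if is_present:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, scores `[0.9, 0.8, 0.7, 0.6]` with labels `[True, False, True, False]` give an area of 0.75, since three of the four present/absent pairs are ranked correctly.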

3rd evaluation: Cross-species conservation
3’ and Exon array data for six adult tissues in both human and mouse
Expression computed for about 10,000 human-mouse ortholog pairs

Similarity of gene expression profiles in six human tissues and six corresponding mouse tissues.
For each ortholog pair we calculated the Pearson correlation coefficient (PCC) of expression indexes across six tissues (solid line). We also permuted ortholog relationships and calculated the PCC for random human-mouse gene pairs (dashed line).
[Figure panels: 3’ arrays, Exon arrays]
(Xing Y, Ouyang Z, Kapur K, Scott MP, Wong WH. Mol Biol Evol. April 2007)
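The comparison between true and permuted ortholog pairs can be sketched as below. The sketch assumes two matrices with one row per ortholog pair and one column per tissue, in matching row and column order; the function names are hypothetical and this is not the authors' code.

```python
import numpy as np

def rowwise_pearson(a, b):
    """Pearson correlation between corresponding rows of a and b."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a * a).sum(axis=1) * (b * b).sum(axis=1))

def ortholog_vs_permuted(human, mouse, seed=0):
    """Return (observed, null) PCC arrays.

    observed: PCC across tissues for each true human-mouse ortholog pair.
    null: PCC after shuffling which mouse gene is paired with each human gene.
    """
    human = np.asarray(human, dtype=float)
    mouse = np.asarray(mouse, dtype=float)
    observed = rowwise_pearson(human, mouse)
    shuffled = np.random.default_rng(seed).permutation(len(mouse))
    null = rowwise_pearson(human, mouse[shuffled])
    return observed, null
```

Plotting the two distributions side by side reproduces the solid-line and dashed-line comparison described on the slide.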

Exon arrays also reveal conservation of absolute abundance of transcripts in individual tissues!
[Figure panels: 3’ arrays correlations, Exon arrays correlations, 3’ arrays scatter plot, Exon arrays scatter plot]

4th evaluation: qPCR
On log scale, exon array fold change estimate is correlated with qPCR fold change (corr = 0.9)

5. Issues in exon level analysis

Challenges
The experimental validation rates in several published exon array studies are highly variable.
– Gardina et al. BMC Genomics 7:325, 21%
– Kwan et al. Genome Res 17:1210, 45%
– Hung et al. RNA 14:284, 22%-56%
– Clark et al. Genome Biol 8:R64, 84%
Most exons are targeted by no more than four probes.
No probes for splice junctions.
Noise in observed probe intensities (due to background, cross-hybridization) can make the inferred splicing pattern unreliable.

MADS: Microarray Analysis of Differential Splicing
1. Correction for background (non-specific hybridization)
2. Probe selection and expression index calculation
3. Correction for cross-hybridization
4. Detection of differential splicing
References
1. Kapur, Xing, Wong, Genome Biology, 8:R82, 2007
2. Xing, Kapur, Wong, PLoS ONE, 1:e88, 2006
3. Xing et al., RNA, 2008, 14(8):

Splicing Index:
$$ \text{Splicing Index} = \frac{\text{Corrected Probe Intensity}}{\text{Estimated Gene Expression Level}} $$
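The splicing index is a simple ratio, and a change between two conditions is naturally expressed on the log scale. The sketch below is illustrative: the comparison of two conditions by a log2 ratio is an assumed use of the index and not the MADS test itself, and the function names are hypothetical.

```python
import math

def splicing_index(corrected_probe_intensity, gene_expression):
    """Corrected probe intensity divided by the estimated gene expression level."""
    return corrected_probe_intensity / gene_expression

def log2_si_change(probe_a, gene_a, probe_b, gene_b):
    """log2 ratio of the splicing index in condition A over condition B.

    Negative values suggest lower relative exon inclusion in condition A.
    """
    return math.log2(splicing_index(probe_a, gene_a) / splicing_index(probe_b, gene_b))

# Same gene level in both conditions, but the exon probe drops fourfold in A.
change = log2_si_change(probe_a=50.0, gene_a=200.0, probe_b=200.0, gene_b=200.0)
```

Dividing by the gene-level estimate separates a change in exon usage from a change in overall expression of the gene, which is why the accuracy of the gene-level index matters for exon-level analysis.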

Analysis of “gold-standard” alternative splicing data via PTB knockdown experiments
Our “gold-standard”: a list of exons with pre-determined inclusion/exclusion profiles in response to PTB depletion (Boutz P, et al. Genes Dev. 2007, 21(13): )
We used shRNA to knock down PTB, generated Exon array data, and analyzed data on “gold-standard” exons.
MADS detected all exons with large changes (>25%) in transcript inclusion levels, and offered improvement over Affymetrix’s analysis procedure.
Collaboration with Douglas Black (UCLA)

MADS sensitivity correlates with the magnitude of change in exon inclusion levels of “gold-standard” exons
Xing et al., RNA, 2008, 14(8):

Exon array detection of novel PTB-dependent splicing events
[Figure panels: control, shRNA knockdown of splicing repressor PTB]

Detection of alternative 3’-UTR and Poly-A sites of Ncam1
30 differentially spliced exons were tested; 27 were validated.
Validation rate: 27/30 = 90%

Cross-Hybridization
Probes are designed to hybridize to their target transcripts
Often probes have 0, 1, 2, or 3 base pair mismatches to non-target transcripts
Cross-hyb seriously complicates exon-level analysis.

Mapping mismatches to probes
6,000,000 probes
Each 25bp long
3,000,000,000bp genome sequence
For 1-bp mismatch, a naïve search needs O(6M x 3G x 25) ~ years of CPU time
Fast matching algorithm (by Hui Jiang) makes this feasible in hours
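The naïve bound on the slide multiplies out to 6,000,000 x 3,000,000,000 x 25 = 4.5 x 10^17 character comparisons. One standard way to avoid scanning every genome position for every probe is a pigeonhole seed search: if a probe matches with at most one mismatch, at least one of its two halves must match exactly, so only positions found by an exact lookup of a half need full verification. The sketch below illustrates that generic idea. It is not the fast matching algorithm credited on the slide, whose details are not given, and the function names are hypothetical.

```python
from collections import defaultdict

def build_index(genome, k):
    """Map every k-mer in the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index

def find_near_matches(probes, genome, probe_len=25):
    """For each probe, list (start, mismatches) for genome sites with at most one mismatch."""
    half = probe_len // 2
    seed_specs = [(0, half), (half, probe_len - half)]    # (offset in probe, seed length)
    indexes = {length: build_index(genome, length) for _, length in seed_specs}
    results = {}
    for probe in probes:
        hits = set()
        for offset, length in seed_specs:
            seed = probe[offset:offset + length]
            for pos in indexes[length].get(seed, ()):     # exact seed hits only
                start = pos - offset
                if start < 0 or start + probe_len > len(genome):
                    continue
                window = genome[start:start + probe_len]
                mismatches = sum(a != b for a, b in zip(probe, window))
                if mismatches <= 1:
                    hits.add((start, mismatches))
        results[probe] = sorted(hits)
    return results
```

The genome index is built once and shared by all probes, so the per-probe cost depends on how often its seeds occur and not on the length of the genome. A genome-scale version would need a far more compact index than a Python dictionary.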

Distribution of Number of Cross-hyb Transcripts
[Two tables, one for full probes and one for core probes. Rows: number of mismatched bp, starting at 0 bp. Columns: 0, 1, 2, 3, ≥ 4 cross-hybridizing transcripts. Cell values not preserved.]

Correction of sequence-specific cross-hybridization to off-target transcripts
[Figure for PAN3: estimated expression levels of off-target transcripts of EEF1A1; intensities of four probes of the target exon of PAN3]

Conclusion
Gene level index is accurate and reflects absolute abundance.
We show that sequence-specific modeling of microarray noise (background and cross-hybridization) improves the precision of exon-level analysis of exon array data.
Overall, our data demonstrate that exon array design is an effective approach to study gene expression and differential splicing.
Development of future “probe rich” exon arrays, with increased probe density on exons and inclusion of splice junction probes, will offer more powerful tools for global or targeted analysis of alternative splicing.