Gene Level Expression Profiling Using Affymetrix Exon Arrays Alan Williams, Ph.D. Director Chip Design Affymetrix, Inc.


Exon Array Design Strategy: GeneChip® Human Exon 1.0 ST
• All content is projected onto the genome
• Content has hard edges and soft edges:
  – Hard edges partition regions into multiple probe selection regions
  – Soft edges infer a probe selection region, but can be extended into a larger region by other content
• Hard Edges
  – Internal splice site boundaries
  – PolyA sites
  – CDS start and stop positions
• Soft Edges
  – Transcript start and stop positions (except when there is evidence of a PolyA site)
  – Internal splice site boundaries for aligned cDNAs when there are unaligned cDNA bases
  – All splice site boundaries from syntenic cDNA content
• Introducing some new concepts:
  – Probe Selection Region (PSR)
  – Exon cluster
  – Transcript cluster (gene locus)
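The effect of hard edges can be illustrated with a small sketch. It assumes a region is a half-open genomic interval and that hard edges are plain coordinates; the function name `split_into_psrs` and the example coordinates are hypothetical and not part of the Affymetrix design pipeline. Soft edges are not modelled here.

```python
def split_into_psrs(start, end, hard_edges):
    """Split the half-open region [start, end) at every hard edge that
    falls strictly inside it, returning candidate probe selection regions."""
    # Edges outside the region (or on its boundary) do not split anything.
    cuts = sorted({e for e in hard_edges if start < e < end})
    bounds = [start] + cuts + [end]
    return list(zip(bounds[:-1], bounds[1:]))


# Hypothetical region with two internal hard edges and one edge outside it.
psrs = split_into_psrs(100, 500, hard_edges=[250, 400, 600])
print(psrs)
```

Each returned pair is one candidate PSR; a region with no internal hard edge comes back unchanged as a single interval.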

Probe Coverage: Exon vs 3’ Array
[Figure: gene coverage of a RefSeq transcript by HG-U Plus and Human Genome 1.0 ST probes]

Content Sources: GeneChip® Human Exon 1.0 ST
• Core Gene Annotations
  – RefSeq alignments
  – GenBank annotated full length alignments
• Extended Gene Annotations
  – cDNA alignments
  – Ensembl annotations (Hubbard, T. et al.)
  – Mapped syntenic mRNA from rat and mouse
  – microRNA annotations
  – MitoMAP annotations
  – Vegagene (The HAVANA group, Hillier et al., Heilig et al.)
  – VegaPseudogene (The HAVANA group, Hillier et al., Heilig et al.)
• Full Gene Annotations
  – Geneid (Grup de Recerca en Informàtica Biomèdica)
  – Genscan (Burge, C. et al.)
  – GenscanSubopt (Burge, C. et al.)
  – Exoniphy (Siepel et al.)
  – RNAgene (Sean Eddy Lab)
  – SgpGene (Grup de Recerca en Informàtica Biomèdica)
  – Twinscan (Korf, I. et al.)

Probes per RefSeq Transcript
[Table: percentage of RefSeq transcripts with >= 10, >= 20, >= 30, >= 40, and >= 50 probes; platform listed: HG-U133 Plus 2.0]

Gene Level Summaries
• With exon arrays we can combine exon-level probesets to obtain better gene-level estimates.
  – More probes for greater sensitivity
  – Gene level signal estimates based on expression throughout the locus rather than at a single point
  – Simplified bioinformatics
  – More flexibility in restructuring probe groupings based on expert knowledge
• There is a variety of well-established tools (including R/Bioconductor) and methods for secondary analysis of gene level array data
• Challenges
  – Non-constitutive exons
  – Discovery/speculative content

Gene Level Analysis on Exon Arrays
• Sketch Normalization (Quantile-like)
• PM-GCBG
• IterPLIER, using Extended Meta Probeset File groupings
• Users may want to do post-summarization operations:
  – Normalization
  – Log transform
  – Variance stabilization by adding a positive bias (i.e., PLIER+16)
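The normalization and post-summarization steps can be sketched in a few lines. This is an illustration under stated assumptions: plain quantile normalization stands in for sketch normalization (which the slide describes only as quantile-like), and the function names `quantile_normalize` and `stabilized_log2` are hypothetical, not APT functions. Ties are broken by input order.

```python
import math


def quantile_normalize(arrays):
    """Force every array onto one shared distribution: the mean of the
    sorted values at each rank. All arrays must have the same length."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        # Indices of a, ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def stabilized_log2(signal, bias=16.0):
    """Log transform with a positive bias, as in log2(PLIER + 16)."""
    return math.log2(signal + bias)


# Two hypothetical arrays with three signals each.
normalized = quantile_normalize([[1.0, 5.0, 3.0], [4.0, 2.0, 6.0]])
log_signals = [[stabilized_log2(x) for x in a] for a in normalized]
```

The bias keeps low signals from producing large negative log values, which is the variance-stabilizing effect the slide refers to.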

Different Meta Probeset Lists Core-Constitutive

IterPLIER
• Start by generating a PLIER signal estimate using all the probes
• Pick the 22 probes that are best correlated with the PLIER signal
• Run PLIER on just those 22 probes
• Pick the 11 probes that are best correlated with the new PLIER signal
• Generate a final PLIER estimate with the 11 probes
• Corollary:
  – If the meta probeset has 11 or fewer probes, then only 1 run of PLIER is performed and the result is equal to a regular PLIER result
  – If the meta probeset has more than 11 but 22 or fewer probes, then PLIER is run twice: once on the full set of probes and once on the best 11
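The iteration above can be sketched as follows. This is not the Affymetrix implementation: PLIER itself is replaced by a simple per-sample median across probes (the `summarize` function is a labeled stand-in), and `iter_summarize` and `pearson` are hypothetical helper names. Only the probe-selection loop follows the slide.

```python
from statistics import median


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def summarize(probes):
    """Stand-in for PLIER (assumption): per-sample median across probes."""
    n_samples = len(probes[0])
    return [median(p[s] for p in probes) for s in range(n_samples)]


def iter_summarize(probes, sizes=(22, 11)):
    """Summarize, keep the probes best correlated with the summary,
    and re-summarize, shrinking to 22 and then 11 probes."""
    signal = summarize(probes)
    for size in sizes:
        if len(probes) <= size:
            continue  # nothing to drop at this stage, so no extra run
        ranked = sorted(probes, key=lambda p: pearson(p, signal), reverse=True)
        probes = ranked[:size]
        signal = summarize(probes)
    return signal, probes


# Hypothetical meta probeset: 11 probes tracking one pattern across
# 5 samples, plus one anti-correlated decoy probe.
pattern = [1, 2, 3, 4, 5]
good = [[k * v for v in pattern] for k in range(1, 12)]
decoy = [5, 4, 3, 2, 1]
signal, kept = iter_summarize(good + [decoy])
```

With 12 probes the 22-probe stage is skipped, so the summary is computed twice, matching the corollary for meta probesets with more than 11 but 22 or fewer probes.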

Correlation of Different Gene Level Estimates

Adding Low-signal Decoys
[Figure, two panels comparing Regular PLIER and Iterative PLIER:
  – Correlation with original estimates as Genscan Subopt probesets are added (996 loci with 4-11 probesets)
  – Correlation with original estimates as mRNA probesets are added (996 loci with 4-11 probesets)]

Gene Level Performance HuEx 1.0 ST vs HG-U133 Plus 2.0

Platform Concordance
[Figure: % probe set pairs vs. correlation coefficient (1-way ANOVA p <= )]
~60% of matched probe sets have correlation ≥ 0.8

High Correlation: GLYAT: r= Log2(sig+16)

Moderate Correlation: TSN: r=0.6575

Poor Correlation: SREBF1: r=0.0482

Platform Gene Level Sensitivity
[Figure: % significant probesets vs. # exons]
  – HG-U133 Plus 2.0 (21% overall)
  – Human Exon 1.0 ST (23% overall)

One Array, Two Functions: Gene Level Expression and Transcript Diversity

TPM2 Heart Muscle

Data Courtesy of Millennium

“Splicing Index” defined
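A commonly used form of the splicing index, assumed here because the slide text gives only the title, is the log2 ratio of gene-normalized exon signals between two conditions. The function name and the example values are hypothetical.

```python
import math


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """log2 ratio of gene-normalized exon signal in condition A vs. B.
    Assumes all signals are strictly positive."""
    normalized_a = exon_a / gene_a  # exon signal relative to its gene, condition A
    normalized_b = exon_b / gene_b  # same exon and gene, condition B
    return math.log2(normalized_a / normalized_b)


# Hypothetical exon whose relative inclusion is twice as high in A as in B,
# while the gene-level signal is unchanged.
si = splicing_index(exon_a=200.0, gene_a=400.0, exon_b=100.0, gene_b=400.0)
```

A value near 0 means the exon tracks its gene equally in both conditions; values far from 0 flag the exon as a candidate for differential inclusion.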

Splicing Index Examples

Alternative Splicing Detection
• PAttern based Correlation (PAC)
  – Test whether exons correlate with each other
• ANOVA based (MiDAS)
  – Test a log-linear model
• For more information see the Alternative Transcript Analysis Methods for Exon Arrays whitepaper:
  – aper.pdf
Notation:
  e_{i,j,k} = exon signal for the ith probeset of gene j in tissue k
  g_{j,k} = gene signal for gene j in tissue k
  a_{i,k} = log coupling for exon and gene signals
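An ANOVA-style test in the spirit of the second bullet can be sketched as follows. This is not the MiDAS implementation: it assumes the quantity tested is the log ratio of exon signal e to gene signal g, grouped by tissue, and it returns only the one-way ANOVA F statistic, not a p-value. The bias of 16 and the names `log_ratio` and `anova_f` are assumptions for illustration.

```python
import math


def log_ratio(exon, gene, bias=16.0):
    """log2 of exon signal over gene signal, each with a stabilizing bias."""
    return math.log2((exon + bias) / (gene + bias))


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations.
    Needs at least two groups and more observations than groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical (exon, gene) signal pairs for one probeset in three tissues,
# three replicates each; the exon is relatively depleted in the third tissue.
tissues = [
    [(480.0, 500.0), (510.0, 520.0), (450.0, 470.0)],
    [(300.0, 310.0), (280.0, 300.0), (320.0, 330.0)],
    [(60.0, 400.0), (55.0, 380.0), (70.0, 420.0)],
]
f_stat = anova_f([[log_ratio(e, g) for e, g in t] for t in tissues])
```

A large F statistic indicates that the exon-to-gene ratio differs between tissues, which is the signature of tissue-specific exon usage.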

ROC Curves
• PAC method not suitable for a two group data set
• No filter on input data
• Synthetic Data
  – Tissues: mix exons across genes
  – Cancer: mix in low expression exons

Alternative Splicing Detection: Active Area of Research
• Exon Array Workshop
  – 45 attendees
  – 11 presentations
  – New alternative splicing algorithms
  – New confidence in using Exon Arrays for Gene-Level expression profiling
  – New directions for filtering data for more robust results
• nts/2006_exon_tiling_workshop.affx

Resources
• Human, Mouse, & Rat array content and annotation information
  – Array Support Page on Affymetrix.com
• Various Analysis Whitepapers
  – Array Support Page on Affymetrix.com
• Sample Data Sets
  – Sample Data section under Support
  – Colon cancer data set with 10 paired samples
  – Tissue data set
    • 11 tissues in triplicate
    • 4 different mixture levels for 3 tissues
    • Includes HG-U133 Plus 2.0 and Human Exon 1.0 ST
• Analysis Software
  – Affymetrix Power Tools (APT)
  – ExACT