Gene Level Expression Profiling Using Affymetrix Exon Arrays Alan Williams, Ph.D. Director Chip Design Affymetrix, Inc.

1 Gene Level Expression Profiling Using Affymetrix Exon Arrays Alan Williams, Ph.D. Director Chip Design Affymetrix, Inc.

2 Exon Array Design Strategy GeneChip® Human Exon 1.0 ST
• All content is projected onto the genome
• Content has hard edges and soft edges:
  – Hard edges partition regions into multiple probe selection regions
  – Soft edges infer a probe selection region, but can be extended into a larger region by other content
• Hard Edges
  – Internal splice site boundaries
  – PolyA sites
  – CDS Start and Stop Positions
• Soft Edges
  – Transcript start and stop positions (except when there is evidence of a PolyA site)
  – Internal splice site boundaries for aligned cDNAs when there are unaligned cDNA bases
  – All splice site boundaries from syntenic cDNA content
• Introducing some new concepts:
  – Probe Selection Region (PSR)
  – Exon cluster
  – Transcript cluster (gene locus)

3 Probe Coverage Exon vs 3’ Array Gene Coverage RefSeq HG-U133 Plus 2.0 Human Genome 1.0 ST

4 Content Sources GeneChip® Human Exon 1.0 ST
• Core Gene Annotations
  – RefSeq alignments
  – GenBank annotated full length alignments
• Extended Gene Annotations
  – cDNA alignments
  – Ensembl annotations (Hubbard, T. et al.)
  – Mapped syntenic mRNA from rat and mouse
  – microRNA annotations
  – MitoMAP annotations
  – Vegagene (The HAVANA group, Hillier et al., Heilig et al.)
  – VegaPseudogene (The HAVANA group, Hillier et al., Heilig et al.)
• Full Gene Annotations
  – Geneid (Grup de Recerca en Informàtica Biomèdica)
  – Genscan (Burge, C. et al.)
  – GenscanSubopt (Burge, C. et al.)
  – Exoniphy (Siepel et al.)
  – RNAgene (Sean Eddy Lab)
  – SgpGene (Grup de Recerca en Informàtica Biomèdica)
  – Twinscan (Korf, I. et al.)

5 Probes per RefSeq Transcript
Probes per transcript    RefSeq transcripts    Percent
>= 10 Probes             19849                 98.40%
>= 20 Probes             18541                 91.9%
>= 30 Probes             15645                 77.60%
>= 40 Probes             12789                 63.40%
>= 50 Probes             9868                  48.90%
HG-U133 Plus 2.0

6 Gene Level Summaries
• With exon arrays we can combine exon-level probesets to obtain better gene-level estimates.
  – More probes for greater sensitivity
  – Gene level signal estimates based on expression throughout the locus rather than a single point
  – Simplified bioinformatics
  – More flexibility in restructuring probe groupings based on expert knowledge
• There is a variety of well-established tools (including R/BioConductor) and methods for secondary analysis of gene level array data
• Challenge
  – Non-constitutive exons
  – Discovery/Speculative content

7 Gene Level Analysis on Exon Arrays
• Sketch Normalization (Quantile-like)
• PM-GCBG
• IterPLIER
  – using Extended Meta Probeset File groupings
• Users may want to do post-summarization operations:
  – Normalization
  – Log transform
  – Variance stabilization by adding positive bias (i.e., PLIER+16)
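
The "PLIER+16" post-summarization step can be illustrated with a minimal sketch: add a positive bias to each gene-level signal and then take log2. The function name `stabilized_log2` and the example values are assumptions made for illustration; they are not part of APT or any Affymetrix tool.

```python
import math

def stabilized_log2(signals, bias=16.0):
    """Return log2(signal + bias) for each gene-level signal estimate.

    Adding a positive bias before the log transform compresses the
    variance of low signals (hypothetical helper, not an APT function).
    """
    return [math.log2(s + bias) for s in signals]

# Illustrative signal values only.
raw_signals = [0.0, 16.0, 48.0, 1008.0]
print(stabilized_log2(raw_signals))  # [4.0, 5.0, 6.0, 10.0]
```

With a bias of 16, a signal of zero maps to 4.0 rather than negative infinity, so near-zero estimates no longer dominate the spread on the log scale.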

8 Different Meta Probeset Lists Core-Constitutive

9 IterPLIER
• Start by generating PLIER signal estimate using all the probes
• Pick 22 probes which are best correlated to the PLIER signal
• Run PLIER on just the 22 probes
• Pick 11 probes which are best correlated to the PLIER signal
• Generate a final PLIER estimate with the 11 probes
• Corollary:
  – If the meta probeset has 11 or fewer probes, then only 1 run of PLIER is performed and the result is equal to a regular PLIER result
  – If the meta probeset has more than 11 but 22 or fewer probes, then PLIER is run twice: once on the full set of probes and once on the best 11
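
The iteration described on this slide can be sketched as follows. This is only an outline of the probe selection loop under stated assumptions: PLIER itself is not implemented, and a per-sample median (`median_summary`) is used as a hypothetical stand-in summarizer. The names `iterative_summary`, `median_summary`, and `pearson` are illustrative and do not come from APT.

```python
import math
from statistics import median

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def median_summary(probes):
    """Hypothetical stand-in for PLIER: per-sample median across probes."""
    return [median(column) for column in zip(*probes)]

def iterative_summary(probes, summarize=median_summary, sizes=(22, 11)):
    """Iteratively keep the probes best correlated with the current signal.

    probes: list of per-probe signal lists, one value per sample.
    Returns (signal, kept_probe_indices, number_of_summarization_runs).
    """
    kept = list(range(len(probes)))
    signal = summarize([probes[i] for i in kept])  # first run uses all probes
    runs = 1
    for size in sizes:
        if len(kept) <= size:
            continue  # already at or below this size, no extra run
        ranked = sorted(kept, key=lambda i: pearson(probes[i], signal), reverse=True)
        kept = sorted(ranked[:size])
        signal = summarize([probes[i] for i in kept])
        runs += 1
    return signal, kept, runs
```

The skip inside the loop reproduces the corollary: a meta probeset with 11 or fewer probes is summarized once, one with 12 to 22 probes is summarized twice, and larger ones go through all three runs.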

10 Correlation of Different Gene Level Estimates

11 Adding Low-signal Decoys
Series: Regular PLIER, Iterative PLIER
Panel: Correlation with original estimates as Genscan Subopt probesets are added (996 loci with 4-11 probesets)
Panel: Correlation with original estimates as mRNA probesets are added (996 loci with 4-11 probesets)

12 Gene Level Performance HuEx 1.0 ST vs HG-U133 Plus 2.0

13 Platform Concordance
% Probe Set Pairs vs. Correlation Coefficient (1-way ANOVA p <= 10^-8)
~60% of matched probe sets have correlation ≥ 0.8
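
A concordance figure of this kind can be computed with a short sketch: correlate the signals of each matched probe set pair across samples, then report the fraction of pairs at or above a threshold. The function names and the toy data are assumptions for illustration; the slide does not specify how its correlations were calculated beyond the axis label.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def fraction_concordant(pairs, threshold=0.8):
    """Fraction of matched pairs whose cross-platform correlation meets the threshold.

    pairs: list of (signals_on_platform_a, signals_on_platform_b),
    each a list with one value per sample (hypothetical input layout).
    """
    correlations = [pearson(a, b) for a, b in pairs]
    return sum(r >= threshold for r in correlations) / len(correlations)

# Toy data: one concordant pair, one anti-correlated pair, one weakly correlated pair.
toy_pairs = [
    ([1, 2, 3, 4], [2, 4, 6, 8]),
    ([1, 2, 3, 4], [4, 3, 2, 1]),
    ([1, 2, 3, 4], [2, 1, 4, 3]),
]
print(fraction_concordant(toy_pairs))
```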

14 High Correlation: GLYAT: r=0.9902 Log2(sig+16)

15 Moderate Correlation: TSN: r=0.6575

16 Poor Correlation: SREBF1: r=0.0482

17 Platform Gene Level Sensitivity
Axes: # Exons vs. % Significant Probesets
Series: HG-U133 Plus 2.0 (21% overall), Human Exon 1.0 ST (23% overall)

18 One Array, Two functions Gene Level Expression and Transcript Diversity

19 TPM2 Heart Muscle

20 Data Courtesy of Millennium

21

22 “Splicing Index” defined
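
The definition shown on this slide is not preserved in the transcript. As a hedged illustration, the sketch below uses the commonly cited form of the splicing index: normalize each exon signal by its gene signal, then take the log2 ratio of the normalized values between two sample groups. Treat this form, and the name `splicing_index`, as assumptions rather than the slide's exact definition.

```python
import math

def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """log2 ratio of gene-normalized exon signal between groups A and B.

    Assumed form (not taken from the slide):
        NI = exon signal / gene signal
        SI = log2(NI_A / NI_B)
    """
    normalized_a = exon_a / gene_a
    normalized_b = exon_b / gene_b
    return math.log2(normalized_a / normalized_b)

# Exon at half the gene level in A, at the gene level in B (illustrative values).
print(splicing_index(exon_a=200.0, gene_a=400.0, exon_b=400.0, gene_b=400.0))  # -1.0
```

A value near zero suggests the exon tracks the gene in both groups; a large magnitude suggests differential inclusion of the exon.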

23 Splicing Index Examples

24 Alternative Splicing Detection
• PAttern based Correlation (PAC)
  – Test whether exons correlate with each other
• ANOVA based (MiDAS)
  – Test a log-linear model
• For more information see the Alternative Transcript Analysis Methods for Exon Arrays whitepaper:
  – http://www.affymetrix.com/support/technical/whitepapers/exon_alt_transcript_analysis_whitepaper.pdf
Model terms:
e_{i,j,k} = exon signal for the i-th probeset of gene j in tissue k
g_{j,k} = gene signal for gene j in tissue k
a_{i,k} = log coupling for exon and gene signals
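
An ANOVA style test on gene-normalized exon signals can be sketched as below. This is in the spirit of the ANOVA based approach named on the slide and is not the MiDAS implementation; see the whitepaper for the actual model. The sketch computes log2(exon/gene) per replicate, groups the values by tissue, and returns a one-way ANOVA F statistic. All function names and the input layout are assumptions.

```python
import math

def log_ratio_groups(exon, gene):
    """Group log2(exon/gene) by tissue.

    exon, gene: dicts mapping tissue name -> list of replicate signals,
    with replicates in matching order (hypothetical input layout).
    """
    return [
        [math.log2(e / g) for e, g in zip(exon[tissue], gene[tissue])]
        for tissue in exon
    ]

def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return math.inf  # no within-group variation
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Illustrative signals: the exon is relatively depleted in liver.
exon = {"heart": [90.0, 110.0, 100.0], "liver": [20.0, 30.0, 25.0]}
gene = {"heart": [100.0, 100.0, 100.0], "liver": [100.0, 100.0, 100.0]}
print(one_way_f(log_ratio_groups(exon, gene)))
```

A large F statistic indicates that the exon-to-gene ratio differs between tissues more than replicate noise would explain, which is the pattern expected for a tissue-specific exon.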

25 ROC Curves
• PAC method not suitable for a two group data set
• No filter on input data
• Synthetic Data
  – Tissues – mix exons across genes
  – Cancer – mix in low expression exons

26 Alternative Splicing Detection Active Area of Research
• Exon Array Workshop
  – 45 attendees
  – 11 presentations
  – New alternative splicing algorithms
  – New confidence in using Exon Arrays for Gene-Level expression profiling
  – New directions for filtering data for more robust results
• http://www.affymetrix.com/corporate/events/2006_exon_tiling_workshop.affx

27 Resources
• Human, Mouse, & Rat array content and annotation information
  – Array Support Page on Affymetrix.com
• Various Analysis Whitepapers
  – Array Support Page on Affymetrix.com
• Sample Data Sets
  – Sample Data section under Support
  – Colon cancer data set with 10 paired samples
  – Tissue data set
    • 11 tissues in triplicate
    • 4 different mixture levels for 3 tissues
    • Includes HG-U133 Plus 2.0 and Human Exon 1.0 ST
• Analysis Software
  – Affymetrix Power Tools (APT)
  – ExACT

