Microarray Analysis with a Small Number of Replicates By Kung-Hua Chang & Dhondup Pemba Mentors: Cecilie Boysen, Ph.D.


Microarray Analysis with a Small Number of Replicates
By Kung-Hua Chang & Dhondup Pemba
Mentors: Cecilie Boysen, Ph.D. & Jim Breaux, Ph.D.
Southern California Bioinformatics Institute, Summer 2005
Funded by NSF/NIH

Outline
Background
• Affymetrix GeneChip® Microarrays
• VMAxS
• Steps in Microarray Data Analysis
Our Task
• Statistical Analysis with a Small Number of Replicates
• Functional Analysis
• Additional Projects

Affymetrix GeneChip® Microarrays
• Probes define one gene.
• Signal detection: fluorescence detection of hybridization between RNA target and oligonucleotide probe.

Each gene on an Affy chip is represented by a probe set
• Perfect Match (PM) probe represents a short segment of the gene of interest.
• Mismatch (MM) probe measures background signal.
• Data for a probe set is summarized into a single number (“gene-level” data).
FOR MORE INFO... “Processing Affy chip Data: GCOS/MAS 5.0, RMA, and gcRMA” (Roger Bumgarner, University of Washington)

VMAxS
• ViaLogy’s data analysis service for DNA microarray chip data.
• Employs Quantum Resonance Interferometry technology to detect signals below background noise.
FOR MORE INFO... Visit Vialogy.com.

Steps in Microarray Data Analysis
• Raw data image
• Image analysis with VMAxS (extract cell-level data)
• Gene-level summarization
• Normalization (remove non-biological variation)
• Statistical analysis (select differentially expressed genes)
• Functional analysis (identify affected processes and pathways)

Statistical Analysis with a Small Number of Replicates: Overview
• Overall objective: Perform end-to-end analysis on a client’s microarray data set (from raw image to pathway analysis).
• Problem: The data set contained a small number of replicates.

Problem with a small number of replicates
• A small number of replicates yields unreliable estimates of gene variances.
• With seven replicates, we are more confident that gene 1 is upregulated.
FOR MORE INFO... Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays (Jain et al.)

Approach to dealing with a small number of replicates
• Analyze a larger data set that has a good number of replicates (n = 8x8).
  – Assume this is the “truth”.
• Analyze a randomly selected subset of this data set (n = 3x3) using three different algorithms.
• Compare output from the 8x8 analysis to the 3x3 analysis.
  – Decide how to analyze the client’s data set based on the results.
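
The subsampling step can be sketched as follows. This is a minimal illustration, assuming that "8x8" means eight replicate arrays in each of two conditions and "3x3" means three drawn from each; the array labels, the `subsample_replicates` helper, and the fixed seed are hypothetical and not taken from the slides.

```python
import random


def subsample_replicates(columns, k, rng):
    """Pick k replicate labels at random, without replacement."""
    return sorted(rng.sample(columns, k))


# Hypothetical array labels for the full 8x8 data set.
control = [f"control_{i}" for i in range(1, 9)]
treated = [f"treated_{i}" for i in range(1, 9)]

# A seeded generator keeps the chosen 3x3 subset reproducible.
rng = random.Random(0)
control_subset = subsample_replicates(control, 3, rng)
treated_subset = subsample_replicates(treated, 3, rng)
```

Each 3x3 subset would then be passed to the three algorithms, and the resulting gene lists compared with the 8x8 result.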

Statistical Analysis Algorithms
• SAM: Significance Analysis of Microarrays (Tusher, Tibshirani & Chu)
• J-Score (Jim Breaux)
• Cyber-T (Baldi & Long)

SAM
• Each gene receives a score based on the difference in average gene expression relative to the standard deviation of the repeated measurements.
• Genes with scores greater than a threshold are considered significant.
• This threshold is determined by the false discovery rate the user desires.
FOR MORE INFO... Significance analysis of microarrays applied to the ionizing radiation response (Tusher et al.)
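
The per-gene score described above can be sketched as a difference in group means divided by a pooled standard error plus a small constant. This is a hedged illustration: the `sam_score` name is hypothetical, the constant `s0` is passed in by hand rather than tuned, and the permutation step that turns scores into a false discovery rate is not shown.

```python
import math
from statistics import mean


def sam_score(group_a, group_b, s0=0.0):
    """Relative difference: (mean_b - mean_a) / (s + s0).

    s is the pooled standard error of the two groups; s0 is a small
    positive constant that damps scores for genes with tiny variance.
    """
    n1, n2 = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    # Pooled sum of squared deviations across both groups.
    ss = sum((x - mean_a) ** 2 for x in group_a)
    ss += sum((x - mean_b) ** 2 for x in group_b)
    s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (mean_b - mean_a) / (s + s0)
```

With `s0 = 0` the score reduces to an ordinary two-sample t statistic; a positive `s0` shrinks the scores of genes whose measured variance is very small, which matters most when there are few replicates.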

J-Score
• Each gene receives a score based on average fold-change in gene expression relative to the standard deviation of the repeated measurements.
• Cut-off for selection of “significant” genes is arbitrary.

Cyber-T (Baldi & Long): ‘Regularized t-test’
• “Assumes genes of similar expression levels have similar measurement errors.
• The variance of any single gene can be estimated from the variance from a number of genes of similar expression level.
• The variance of any gene within any given treatment can be estimated by the weighted average of a prior estimate of variance for that gene.”
FOR MORE INFO... Improved statistical inference from DNA microarray data using analysis of variance and a Bayesian statistical framework (Long et al.)
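
The idea of borrowing variance from genes of similar expression can be sketched as below. This is an assumption-laden illustration, not the Cyber-T implementation: the function names and the `half_window` parameter are hypothetical, and the weighting formula (prior weight `v0`, denominator `v0 + n - 2`) is the form commonly cited for the regularized variance and should be checked against Long et al.

```python
def local_prior_variances(means, variances, half_window):
    """Prior variance per gene: average variance of neighbouring genes
    when genes are ordered by mean expression level."""
    order = sorted(range(len(means)), key=lambda i: means[i])
    priors = [0.0] * len(means)
    for pos, gene in enumerate(order):
        lo = max(0, pos - half_window)
        hi = min(len(order), pos + half_window + 1)
        neighbours = [variances[order[j]] for j in range(lo, hi)]
        priors[gene] = sum(neighbours) / len(neighbours)
    return priors


def regularized_variance(sample_var, n, prior_var, v0):
    """Weighted average of the prior variance (weight v0) and the
    gene's own sample variance from n replicates (assumed formula)."""
    return (v0 * prior_var + (n - 1) * sample_var) / (v0 + n - 2)
```

With three replicates and `v0 = 10`, a gene with sample variance 4.0 and prior variance 1.0 gets a regularized variance of 18/11, so the prior dominates when replicates are few.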

Results: Comparison between SAM 8x8 and 3x3 methods
• At a 1% False Discovery Rate (FDR), SAM 8x8 picked up 762 significant genes (estimated number of false significant genes = 8).
• Agreement between SAM 8x8 and the top 1000 genes from the 3x3 methods:

Results: Comparison between 3x3 methods
• Venn diagram: union of all three methods = 433 unique genes

Results: Comparison between 3x3 methods
• Agreement between any two methods:
• These findings are consistent with a previous study by a group at NIH (Hosack et al.):
  – Found that agreement between various methods tested ranged from 7% to 60%.
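
Pairwise agreement between gene lists can be computed as the shared fraction of the shorter list. The slides do not define "agreement" precisely, so this definition, the `pairwise_agreement` helper, and the toy gene names are assumptions for illustration only.

```python
from itertools import combinations


def pairwise_agreement(gene_lists):
    """Map each pair of method names to the fraction of genes shared,
    relative to the smaller of the two lists."""
    result = {}
    for (name_a, a), (name_b, b) in combinations(gene_lists.items(), 2):
        set_a, set_b = set(a), set(b)
        shared = len(set_a & set_b)
        result[(name_a, name_b)] = shared / min(len(set_a), len(set_b))
    return result


# Toy lists standing in for the top-ranked genes of each method.
toy_lists = {
    "SAM": ["g1", "g2", "g3", "g4"],
    "J-Score": ["g2", "g3", "g4", "g5"],
    "Cyber-T": ["g1", "g2", "g6", "g7"],
}
agreement = pairwise_agreement(toy_lists)
```

When every method contributes a list of the same length, as with top-1000 lists, the choice of denominator does not matter.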

Possible Approaches for Final Analysis
• Method 1: Final set of significant genes is derived from the method that had the most overlap with SAM 8x8 (J-Score).
• Final result:
  – 1000 total significant genes
  – At most 356 true positives
  – At most 652 false positives
• Pro:
  – Decent number of true positives
• Con:
  – Large number of false positives
  – Might be missing important genes found by the other two methods

Possible Approaches for Final Analysis
• Method 2: Final set of significant genes is the intersection of the three methods.
• Final result:
  – 174 total significant genes
  – At most 174 true positives
  – At most 8 false positives
• Pro:
  – Lowest number of false positives
• Con:
  – Lowest number of true positives

Possible Approaches for Final Analysis
• Method 3: Final set of significant genes is the union of the three methods.
• Final result:
  – 1631 total significant genes
  – At most 433 true positives
  – At most 1206 false positives
• Pro:
  – Highest number of true positives
• Con:
  – Highest number of false positives
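
The slides do not state how the false-positive bounds were derived. The figures for all three methods are consistent with one reading, offered here as an assumption: every selected gene that does not overlap SAM 8x8 could be false, and up to 8 of the overlapping genes could also be false because SAM 8x8 itself has an estimated 8 false calls. The function name is hypothetical.

```python
def false_positive_bound(total_selected, overlap_with_reference,
                         reference_false=8):
    """Upper bound on false positives under the assumed reading:
    non-overlapping genes plus the reference list's own false calls."""
    return total_selected - overlap_with_reference + reference_false


# (total significant genes, overlap with SAM 8x8) for each method.
method_1 = false_positive_bound(1000, 356)   # J-Score only
method_2 = false_positive_bound(174, 174)    # intersection
method_3 = false_positive_bound(1631, 433)   # union
```

Under this reading the three calls reproduce the bounds quoted for Methods 1, 2, and 3.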

Final Approach
• Return the largest number of true positives to the client (Method 3).
• To deal with the large number of potential false positives in the results, we rank each gene based on its ranking from the Cyber-T, J-Score, and SAM methods.
  – For example, if “Gene 02” is ranked number 2 in Cyber-T, number 3 in J-Score, and number 4 in SAM, then the overall ranking is (2 + 3 + 4) / 3 = 3.
  – Higher ranking = more likely to be a true positive
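
The rank-averaging step can be sketched as below. The helper name and the toy rank tables are hypothetical; the sketch also assumes every gene has a rank from all three methods, which the slides do not confirm for genes that only one method selected.

```python
def mean_rank_order(rank_tables):
    """Average each gene's rank across methods and sort ascending,
    so the best (numerically smallest) mean rank comes first."""
    genes = set(rank_tables[0])
    for table in rank_tables[1:]:
        genes &= set(table)  # keep genes ranked by every method
    scores = {
        gene: sum(table[gene] for table in rank_tables) / len(rank_tables)
        for gene in genes
    }
    # Ties are broken by gene name to keep the order deterministic.
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


# Toy ranks; "Gene 02" matches the example on the slide.
cyber_t = {"Gene 01": 1, "Gene 02": 2, "Gene 03": 5}
j_score = {"Gene 01": 1, "Gene 02": 3, "Gene 03": 6}
sam = {"Gene 01": 2, "Gene 02": 4, "Gene 03": 4}
ranked = mean_rank_order([cyber_t, j_score, sam])
```

Here "Gene 02" receives a mean rank of 3.0, and genes nearer the top of `ranked` are the ones the three methods agree on most strongly.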

Example Output of Our Approach

Functional Analysis
• Mapping to biological processes: EASE, the Expression Analysis Systematic Explorer, from the National Institute of Allergy and Infectious Diseases at the National Institutes of Health.
• Mapping to pathways: PathwayAssist software from Ariadne Genomics.

Mapping to biological processes
• The lists of up- and down-regulated genes were entered into EASE.
• The lower the EASE score, the more highly the process is ranked.
• Example of the top 14 processes, locations and functions found from our significant genes.
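
As background on why a lower EASE score ranks a process higher: the EASE score is usually described as a conservative variant of the one-tailed Fisher exact probability, obtained by removing one gene from the list's hits in a category before computing the probability. The sketch below follows that description as an assumption; the function names and the 2x2 table layout are illustrative and this is not code from the EASE software.

```python
from math import comb


def fisher_upper_tail(a, b, c, d):
    """One-tailed Fisher exact probability of seeing at least `a` hits.

    2x2 table: a = list genes in category, b = list genes not in it,
    c = background genes in category, d = background genes not in it.
    """
    row, col, n = a + b, a + c, a + b + c + d
    upper = min(row, col)
    favourable = sum(
        comb(col, i) * comb(n - col, row - i) for i in range(a, upper + 1)
    )
    return favourable / comb(n, row)


def ease_score(a, b, c, d):
    """Fisher probability after removing one hit from the list (a - 1)."""
    return fisher_upper_tail(max(a - 1, 0), b, c, d)
```

Because one hit is removed, categories supported by only one or two genes are penalized, so a low score requires consistent over-representation.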

Mapping to pathways
• Genes 1, 2 and 3 are significantly up- or down-regulated genes by our combination method.
• Investigation of gene 1 reveals that genes 2 and 3 are involved in gene 1’s pathway.

Conclusion
• Three algorithms for selecting differentially expressed genes produced different lists of genes with ~60% to 70% agreement.
• Taking the union of the results from the three algorithms yielded the most true positives for our client.
• Biological processes and pathways found through functional analysis correspond to what we expected based on the samples studied.
  – Helps to make microarray results more believable.

Additional Projects: Chris’s GUI
• Automation of the previously discussed analyses with a GUI.

Chris’ GUI project

Chris’ GUI project screen 2

Additional Projects: Dhonam’s GUI
• ViaLogy has individual scripts that are used to test the quality of VMAxS output.
• Current implementation requires working knowledge of R scripting.
• Project: implement a user-friendly GUI program to execute multiple QC tests.

Dhonam’s GUI Project Screen 1

Dhonam’s GUI Screen 2

Dhonam’s GUI Screen 3
• Optional window pops up if default parameters are not desired.

Acknowledgements
SoCalBSI
• Dr. Sandra Sharp
• Dr. Wendie Johnston
• Dr. Jamil Momand
• Dr. Nancy Warter-Perez
• Other SoCalBSI Staff and Faculty
• SoCalBSI 2005 Participants
• Lien Chung (SoCalBSI Participant 2004)
ViaLogy
• Dr. Cecilie Boysen
• Dr. Jim Breaux
• Other ViaLogy Employees

References
• Hosack DA, Dennis GJ, Sherman BT, Lane HC, Lempicki RA: Identifying biological themes within lists of genes with EASE. Genome Biol 2003, 4:R70.
• Cope LM, Irizarry RA, Jaffee HA, Wu J, Speed TP: A benchmark for Affymetrix GeneChip expression measures. Bioinformatics 2004, 20:323–331.
• Long AD, Mangalam HJ, Chan BYP, Tolleri L, Hatfield GW, Baldi P: Improved statistical inference from DNA microarray data using analysis of variance and a Bayesian statistical framework. The Journal of Biological Chemistry 2001, 276(23).
• Jain N, Thatte J, Braciale T, Ley K, O'Connell M, Lee JK: Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays. Bioinformatics 2003, 19(15).
• Bumgarner R: Processing Affy chip Data: GCOS/MAS 5.0, RMA, and gcRMA.
• Saviozzi S, Calogero RA: Microarray probe expression measures, data normalization and statistical validation. Comp Funct Genom 2003, 4:442–446. Conference review.
• Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. PNAS 2001, 98.