

A New Statistical Method for Analyzing Longitudinal Multifactor Expression Data and Its Application to Time Course Burn Data
Baiyu Zhou
Department of Statistics, Stanford University
10/06/2008

Outline
- Data description
- Brief review of current statistical methods
- Proposed statistical method
- Application to the burn data

Data Description
Two data sets:
(1) Burn + gender: burn patients and controls, each split into male and female, used to study the gender effect in burn patients. (Table: numbers of burn patients, controls, and total, by male/female.)
(2) Burn + age: burn patients and controls, each split into adults and children, used to study the age effect in burn patients. (Table: numbers of burn patients, controls, and total, by adult/child.)
Gene expression in blood was measured for each patient at different time points after the burn. The data sets are therefore longitudinal (time course) and involve multiple factors (burn vs. control; gender or age).

Brief Review: Current Methods (1)
Time course microarray data analysis:
- Time course clustering, to identify co-expressed genes (Ma et al., Nucleic Acids Res, Mar 1;34(4)).
- Fitting a smooth function and using a gene-specific summary statistic to characterize the significance of change over time or between biological conditions (Storey et al., Proc Natl Acad Sci USA, Sep 6;102(36)).
- An empirical Bayes method to rank genes that are differentially expressed between biological conditions (Tai et al., Annals of Statistics 34(5), 2387–2412).
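To make the "fit a smooth function, then summarize" idea concrete, here is a deliberately simplified Python sketch. It fits a straight line in time (the simplest smooth function) to one gene's expression values and reports an F-type statistic comparing that fit with a flat, no-change profile. The helper name `trend_f_statistic` is hypothetical, and this is not the procedure of Storey et al., which uses a richer smooth basis and its own way of assessing significance.

```python
def trend_f_statistic(times, values):
    """F-type statistic for a linear time trend versus a flat profile.

    Hypothetical helper: a straight line stands in for a general smooth curve.
    """
    n = len(times)
    if n < 3:
        raise ValueError("need at least 3 time points")
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    # Residual sum of squares under the flat model (null) and the line (alternative).
    rss_flat = sum((y - y_bar) ** 2 for y in values)
    rss_line = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times, values))
    if rss_line == 0:
        return float("inf")
    # One extra parameter (the slope); n - 2 residual degrees of freedom.
    return (rss_flat - rss_line) / (rss_line / (n - 2))


print(trend_f_statistic([0, 1, 2, 3], [1.0, 2.0, 2.0, 3.0]))  # approximately 18.0
```

A large value means the time trend explains much more variation than noise alone; genes would then be ranked by this per-gene statistic.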

Brief Review: Current Methods (2)
Multifactor microarray data analysis:
- ANOVA for gene selection (Pavlidis et al., Methods 31:282–289, 2003).
- Nonparametric ANOVA, which has restrictions on the number of replicates and on the noise distribution (Gao et al., Bioinformatics (12)).
- Our nonparametric ANOVA (NANOVA) method and gene classification algorithm for microarray data analysis (Zhou et al., in manuscript), which:
  - easily handles balanced and unbalanced experimental designs;
  - is free of distributional assumptions;
  - estimates the FDR;
  - is robust to outliers.
There is no existing method for analyzing longitudinal multifactor expression data.
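For contrast with NANOVA, the classical per-gene two-way ANOVA used for gene selection tests the interaction between two factors with an F statistic. The sketch below assumes a balanced design (equal replicates in every cell) and a hypothetical nested-list layout; its reliance on balance and on normal-theory F tests is exactly what NANOVA is described as avoiding.

```python
def interaction_f_statistic(cells):
    """Interaction F statistic for a balanced two-way ANOVA of one gene.

    cells[i][j] holds the replicate expression values for level i of the
    first factor (e.g. burn/control) and level j of the second (e.g.
    male/female). Every cell must have the same number of replicates n >= 2.
    """
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    if any(len(cells[i][j]) != n for i in range(a) for j in range(b)):
        raise ValueError("design must be balanced")
    cell_mean = [[sum(c) / n for c in row] for row in cells]
    row_mean = [sum(r) / b for r in cell_mean]
    col_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row_mean) / a
    # Interaction sum of squares: what remains after removing both main effects.
    ss_ab = n * sum(
        (cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    # Within-cell (error) sum of squares.
    ss_e = sum(
        (y - cell_mean[i][j]) ** 2
        for i in range(a) for j in range(b) for y in cells[i][j]
    )
    df_ab = (a - 1) * (b - 1)
    df_e = a * b * (n - 1)
    return (ss_ab / df_ab) / (ss_e / df_e)


# 2 x 2 design, two replicates per cell; the last cell breaks additivity.
print(interaction_f_statistic([[[1, 3], [3, 5]], [[4, 6], [10, 12]]]))  # 4.0
```

When the cell means are additive (for example 2, 4, 5, 7), the interaction sum of squares is zero and the statistic is 0.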

Methodology
Let $x = (x_1, \ldots, x_p)$ be the expression of a gene in one individual over $p$ time points $t_1, \ldots, t_p$. Each individual is associated with two factors (e.g., gender and burn). We want to identify genes that:
(1) respond differently in male and female burn patients;
(2) respond to burn.
Some genes might respond to burn at an early stage, others at a late stage. Which time point should we use: $t_1, t_2, \ldots, t_p$, or their average?
We call (1), (2), ... ANOVA structures (an interaction effect and a main effect, respectively). In the $p$-dimensional space there is a direction $w$ along which the ANOVA structure of interest is most prominent. We first estimate this direction, then project each profile onto it, $z = w^\top x$ with $\|w\| = 1$, and finally apply NANOVA and the gene classification algorithm to the projected values $z$.
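The slides do not say how the direction is estimated. One natural choice, assumed here and not taken from the talk, is a Fisher-discriminant-style direction. For a single-degree-of-freedom contrast (a main effect, or the 2 x 2 interaction), form the contrast of group mean vectors $d = \sum_g c_g \bar{x}_g$ and set $w \propto S^{-1} d$, where $S$ is the pooled within-group covariance. By the Cauchy–Schwarz inequality, this $w$ maximizes $(w^\top d)^2 / (w^\top S w)$. All names in this pure-Python sketch are hypothetical. It also assumes that $S$ is invertible, which requires more individuals than time points.

```python
from math import sqrt


def mean_vec(rows):
    """Component-wise mean of a list of p-dimensional profiles."""
    p = len(rows[0])
    return [sum(r[k] for r in rows) / len(rows) for k in range(p)]


def pooled_within_cov(groups):
    """Pooled within-group covariance matrix (p x p) across all groups."""
    p = len(next(iter(groups.values()))[0])
    s = [[0.0] * p for _ in range(p)]
    dof = 0
    for rows in groups.values():
        m = mean_vec(rows)
        for r in rows:
            dev = [r[k] - m[k] for k in range(p)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += dev[i] * dev[j]
        dof += len(rows) - 1
    return [[v / dof for v in row] for row in s]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def projection_direction(groups, contrast):
    """Unit vector w proportional to S^{-1} d for a single-df contrast."""
    means = {g: mean_vec(rows) for g, rows in groups.items()}
    p = len(next(iter(means.values())))
    d = [sum(c * means[g][k] for g, c in contrast.items()) for k in range(p)]
    w = solve(pooled_within_cov(groups), d)
    norm = sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


def project(groups, w):
    """Scalar score z = w . x for every individual, kept by group."""
    return {g: [sum(wk * xk for wk, xk in zip(w, x)) for x in rows]
            for g, rows in groups.items()}


# Toy data: two time points (early, middle); burn raises only the early value.
spread = [(1, 0), (-1, 0), (0, 1), (0, -1)]
centers = {"burn_male": (3, 0), "burn_female": (3, 0),
           "ctrl_male": (0, 0), "ctrl_female": (0, 0)}
groups = {g: [(cx + dx, cy + dy) for dx, dy in spread]
          for g, (cx, cy) in centers.items()}
burn_main = {"burn_male": 0.5, "burn_female": 0.5,
             "ctrl_male": -0.5, "ctrl_female": -0.5}
w = projection_direction(groups, burn_main)
print(w)  # approximately [1.0, 0.0]: the burn signal sits at the early stage
```

Under this assumption, the projected scores returned by `project` are the one-dimensional values that a NANOVA-style analysis would then work on. Using the interaction contrast (+1, -1, -1, +1 over the four groups) targets structure (1) instead.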

Gene Classification
We use NANOVA to classify genes into five classes by factor effects:
- C1 (interaction): the factor effects are dependent.
- C2 (additive): both factor effects are present, but the factors act independently.
- C3 (burn effect): only the burn effect is present.
- C4 (age/gender effect): only the age or gender effect is present.
- C5: no factor effects.
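The five classes can be read as a decision rule over three tests per gene: interaction, burn main effect, and age/gender main effect. The sketch below is a minimal version that assumes per-gene p-values and a single cutoff. The NANOVA classification algorithm estimates the FDR and is not reproduced here, and `classify_gene` is a hypothetical name.

```python
def classify_gene(p_interaction, p_burn, p_other, alpha=0.05):
    """Assign a gene to C1-C5 from three p-values (hypothetical rule).

    p_other is the p-value for the second factor (age or gender).
    """
    if p_interaction <= alpha:
        return "C1"  # factor effects depend on each other
    burn, other = p_burn <= alpha, p_other <= alpha
    if burn and other:
        return "C2"  # additive: both effects, no interaction
    if burn:
        return "C3"  # burn effect only
    if other:
        return "C4"  # age/gender effect only
    return "C5"      # no factor effect


print(classify_gene(0.20, 0.001, 0.60))  # C3
```

The interaction is checked first because, when it is present, main effects cannot be interpreted on their own.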

Burn Data Analysis: Data Preprocessing
In our analysis we used two time points, the early stage and the middle stage, and included only patients with data at both time points. (Table: post-burn day, minimum / median / maximum, for the early and middle stages.)
Probe sets were filtered to keep those with CV (coefficient of variation) > 0.5 and median expression > 50. (Table: number of probe sets, number of arrays from patients, and number of arrays from controls, for the burn + gender and burn + age data sets.)
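The probe-set filter is simple to state in code. In this sketch, the CV is taken as the sample standard deviation divided by the mean across all arrays of a probe set. The slides do not say which standard deviation or which arrays were used, so treat both choices, along with the helper names, as assumptions.

```python
from statistics import mean, median, stdev


def passes_filter(values, cv_min=0.5, median_min=50.0):
    """True if a probe set's expression values pass both filters.

    Assumption: CV = sample standard deviation / mean over all arrays.
    """
    m = mean(values)
    if m <= 0:
        return False  # CV is undefined for a non-positive mean
    cv = stdev(values) / m
    return cv > cv_min and median(values) > median_min


def filter_probes(expr):
    """Keep probe sets (id -> list of expression values) that pass the filter."""
    return {probe: v for probe, v in expr.items() if passes_filter(v)}


expr = {
    "probe_a": [10, 100, 200, 60],   # variable and bright: kept
    "probe_b": [100, 101, 99, 100],  # bright but nearly constant: dropped
    "probe_c": [1, 2, 3, 4],         # variable but too dim: dropped
}
print(sorted(filter_probes(expr)))  # ['probe_a']
```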

Burn Data Analysis: Gene Classes
After applying the proposed method, we classified genes (probes) into gene sets at FDR = 0.05. (Table: number of probes in C1, C2, C3, and C4 for the burn + gender and burn + age data sets.)
The burn effect dominates. For a large set of genes, the burn effect depends on age. Gender has a smaller effect than age in burn patients.
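The slides report gene sets at FDR = 0.05 but do not detail how NANOVA estimates the FDR. As a generic stand-in, not the talk's estimator, the Benjamini–Hochberg step-up procedure turns a list of per-gene p-values into a set of discoveries at level q:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    # Largest rank k with p_(k) <= k * q / m; everything up to it is rejected.
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


print(benjamini_hochberg([0.001, 0.8, 0.015, 0.035, 0.5]))  # [0, 2]
```

Because the rule is step-up, a p-value can be rejected even when it exceeds its own threshold, provided a larger p-value further down the sorted list meets its threshold.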

C1 Genes
These genes have both a burn effect and an age/gender effect, and the burn effect depends on age/gender.
Red: burn; green: control; circle: adult; triangle: children. Each point is a group mean (e.g., burn children).

Top-ranking C1 genes: Burn + Gender

Top-ranking C1 genes: Burn + Age

C2 Genes
These genes have both a burn effect and an age/gender effect, but the burn effect is independent of age/gender.
Red: burn; green: control; circle: adult; triangle: children.

Top-ranking C2 genes: Burn + Gender

Top-ranking C2 genes: Burn + Age

C3 Genes
These genes have only a burn effect and no age/gender effect.
Red: burn; green: control; circle: adult; triangle: children.

Top-ranking C3 genes: Burn + Gender

Top-ranking C3 genes: Burn + Age

C4 Genes
These genes have only an age/gender effect and no burn effect.
Red: burn; green: control; circle: adult; triangle: children.

Top-ranking C4 genes: Burn + Gender

Top-ranking C4 genes: Burn + Age

GO Enrichment Analysis
Top-ranking pathways in C3 (Burn + Gender)

GO Enrichment Analysis
Top-ranking pathways in C3 (Burn + Age)

GO Enrichment Analysis
Top-ranking pathways in C2 (Burn + Gender)
Top-ranking pathways in C2 (Burn + Age)

GO Enrichment Analysis
Top-ranking pathways in C1 (Burn + Age)

A Few Interesting Pathways
Some pathways are important for burn patients. Although they show no gender difference, they differ markedly between adult and child patients.

Interpretation of the Projection Direction
The projection direction is gene-specific. The following four genes are from C3 (Burn + Gender). For each gene, the burn effect is most prominent:
(1) at the early stage;
(2) at the middle stage;
(3) on the average of the two stages;
(4) on the change in gene expression between the early and middle stages.
The projection direction therefore carries temporal information about gene expression: (1) which time points are important, and (2) which kinds of patterns (e.g., average or change) are important.
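With two time points, the four cases above correspond to four reference directions in the (early, middle) plane: $(1, 0)$, $(0, 1)$, $(1, 1)/\sqrt{2}$ and $(-1, 1)/\sqrt{2}$. The hypothetical helper below labels an estimated direction by its closest reference direction. It compares absolute cosines because the sign of a projection direction is arbitrary. This is an illustrative reading aid, not part of the described method.

```python
from math import sqrt

# Reference directions in the (early, middle) plane for two time points.
PATTERNS = {
    "early stage": (1.0, 0.0),
    "middle stage": (0.0, 1.0),
    "average of stages": (1 / sqrt(2), 1 / sqrt(2)),
    "change between stages": (-1 / sqrt(2), 1 / sqrt(2)),
}


def interpret_direction(w):
    """Name the reference pattern closest to the projection direction w."""
    norm = sqrt(w[0] ** 2 + w[1] ** 2)
    u = (w[0] / norm, w[1] / norm)

    def abs_cosine(name):
        ref = PATTERNS[name]
        # The sign of a projection direction is arbitrary, so use |cos|.
        return abs(u[0] * ref[0] + u[1] * ref[1])

    return max(PATTERNS, key=abs_cosine)


print(interpret_direction((-2.0, 2.1)))  # change between stages
```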

Temporal Information in the Projection Direction
We performed GO analysis on 200 probe sets from C3 (Burn + Gender) that have either (1) strong early-stage signals or (2) strong middle-stage signals.
(1) Early-stage probe sets are enriched in acute response genes: kinase cascade, immune response, ...
(2) Middle-stage probe sets are enriched in DNA repair, metabolism, and cell cycle genes, ...
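The slides do not name the enrichment test. A common choice for GO analysis, assumed here, is a one-sided hypergeometric test. Given N annotated probe sets in the background, of which K carry a GO term, the test asks how surprising it is to see k or more of them among the n selected probe sets (here n = 200). The helper name in this standard-library sketch is hypothetical.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: annotated probe sets in the background, K: those carrying the GO term,
    n: probe sets selected (e.g. 200), k: selected probe sets carrying the term.
    """
    upper = min(K, n)
    # math.comb returns 0 when asked to choose more items than exist.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


print(enrichment_p_value(10, 5, 5, 5))  # 1/252, about 0.00397
```

With many GO terms tested, these p-values would themselves need a multiple-testing correction before pathways are reported as enriched.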

Temporal Information of Pathways
The projection direction also carries temporal information about pathways.
Example 1: T cell receptor signaling pathway (C3 of Burn + Gender). Most genes cluster together, and their projection directions indicate importance at both the early and middle stages.

Temporal Information of Pathways
Example 2: Hematopoietic cell lineage (C3 of Burn + Gender). Most genes form two subclusters; it might be interesting to analyze these two subclusters separately.

Summary
A new approach to analyzing longitudinal multifactor expression data:
(1) It classifies genes into gene sets based on factor effects and is suited for exploratory studies.
(2) The projection direction carries temporal information.
Application to the burn data pointed out important genes and pathways and their roles in male/female and adult/child burn patients.

References
- Ma et al., Nucleic Acids Res, Mar 1;34(4).
- Storey et al., Proc Natl Acad Sci USA, Sep 6;102(36).
- Tai et al., Annals of Statistics 34(5), 2387–2412.
- Pavlidis et al., Methods 31:282–289, 2003.
- Gao et al., Bioinformatics (12).
- Anderson et al., Ann. Statist. 13(2), 1985.
- Dennis et al., Genome Biology 4(5):P3, 2003.

Acknowledgement
Wing Wong, Weihong Xu, Wenzhong Xiao, Ted Anderson