1 A New Statistical Method for Analyzing Longitudinal Multifactor Expression Data and Its Application to Time Course Burn Data
Baiyu Zhou, Department of Statistics, Stanford University, 10/06/2008

2 Outline
- Data description
- Brief review: current statistical methods
- Proposed statistical method
- Application to burn data

3 Data Description
Two data sets: (1) burn + gender and (2) burn + age.

(1) Burn + gender (gender effect on burn patients)
              Burn patients   Controls   Total
  Male              66            17        83
  Female            20            21        41
  Total             86            38       124

(2) Burn + age (age effect on burn patients)
              Burn patients   Controls   Total
  Adult             86            38       124
  Children          33            31        64
  Total            119            69       188

Gene expression was measured in blood from each patient at different time points after the burn. The data sets are therefore longitudinal (time course) and involve multiple factors (burn/control; gender or age).

4 Brief Review: Current Methods (1)
Time course microarray data analysis:
- Time course clustering to identify co-expressed genes. Ma et al., Nucleic Acids Res. 2006 Mar 1;34(4):1261-9.
- Fitting a smooth function and using a gene-specific summary statistic to characterize the significance of change over time or between biological conditions. Storey et al., Proc Natl Acad Sci U S A. 2005 Sep 6;102(36):12837-42.
- An empirical Bayes method to rank genes differentially expressed between biological conditions. Tai et al., Annals of Statistics 34(5):2387-2412.
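
To illustrate the smooth-function idea, the sketch below fits one time curve shared by two conditions (null) and a separate curve per condition (alternative), then summarizes a gene by the relative drop in residual sum of squares. Assumptions: cubic polynomials stand in for the spline bases used by Storey et al., `curve_difference_stat` is a hypothetical name, and significance (e.g. by permutation or bootstrap) is not computed here.

```python
import numpy as np

def curve_difference_stat(t, y, group, degree=3):
    """Relative RSS reduction when each group gets its own polynomial time curve.

    t: time points (n,), y: expression values (n,), group: 0/1 labels (n,).
    Larger values suggest that the time profiles differ between the groups.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)

    # Polynomial basis in time: columns 1, t, t^2, ..., t^degree
    basis = np.vander(t, degree + 1, increasing=True)

    # Null model: a single curve shared by both groups
    beta0, *_ = np.linalg.lstsq(basis, y, rcond=None)
    rss0 = np.sum((y - basis @ beta0) ** 2)

    # Alternative model: a separate curve fitted within each group
    rss1 = 0.0
    for g in np.unique(group):
        mask = group == g
        beta_g, *_ = np.linalg.lstsq(basis[mask], y[mask], rcond=None)
        rss1 += np.sum((y[mask] - basis[mask] @ beta_g) ** 2)

    # Guard against a perfect fit in the alternative model
    return (rss0 - rss1) / max(rss1, 1e-12)
```

A gene whose two conditions share the same profile gives a small value; a gene whose profiles diverge over time gives a large one.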

5 Brief Review: Current Methods (2)
Multifactor microarray data analysis:
- ANOVA for gene selection. Pavlidis et al., Methods. 2003;31:282-289.
- Nonparametric ANOVA, but with restrictions on the number of replicates and on the noise distribution. Gao et al., Bioinformatics 2006;22(12):1486-1494.
- We have developed a nonparametric ANOVA (NANOVA) method and a gene classification algorithm for microarray data analysis (Zhou et al., in manuscript) that:
  - easily handles balanced and unbalanced experimental designs
  - is free of distributional assumptions
  - estimates the FDR
  - is robust to outliers
There is no existing method for analyzing longitudinal multifactor expression data!
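
The NANOVA procedure itself is not spelled out on these slides. As a distribution-free stand-in, the sketch below applies the classical rank transform (replace observations by ranks, then run a two-way ANOVA) to one gene in a possibly unbalanced 2 x 2 design, returning F statistics for both main effects and the interaction from nested least-squares fits. This is an illustration under assumptions, not the NANOVA algorithm: `rank_two_way_anova` is a hypothetical name, p-values are omitted, and the plain rank transform is known to be unreliable for interaction tests in some designs.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks of x, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average rank of the tie block
        i = j + 1
    return ranks

def _rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def rank_two_way_anova(y, a, b):
    """F statistics for factor A, factor B and their interaction, computed on ranks.

    y: expression values of one gene (one per individual).
    a, b: 0/1 factor codes (e.g. burn/control, male/female); cells may be unbalanced.
    Main effects use Type II comparisons (each adjusted for the other factor).
    """
    r = average_ranks(y)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    one = np.ones_like(r)
    rss_full = _rss(np.column_stack([one, a, b, a * b]), r)
    rss_add = _rss(np.column_stack([one, a, b]), r)
    mse = rss_full / (len(r) - 4)  # error mean square of the full model
    # Each effect has 1 degree of freedom, so F = (drop in RSS) / MSE
    return {
        "A": (_rss(np.column_stack([one, b]), r) - rss_add) / mse,
        "B": (_rss(np.column_stack([one, a]), r) - rss_add) / mse,
        "AB": (rss_add - rss_full) / mse,
    }
```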

6 Methodology
Let $\mathbf{x} = (x_1, \ldots, x_p)^\top$ be the expression of a gene in one individual over $p$ time points $t_1, \ldots, t_p$. Each individual is associated with two factors (e.g., gender and burn). We want to identify genes that:
(1) respond differently to burn in male and female patients;
(2) respond to burn;
...
Some genes might respond to burn at an early stage, others at a late stage. Which time point should be used: $t_1, t_2, \ldots, t_p$, or their average?
We call (1), (2), ... ANOVA structures (interaction effect, main effect). For each ANOVA structure of interest, there is a direction in the p-dimensional space along which that structure is most prominent. We first estimate this direction, then project the data onto it, and finally apply NANOVA and the gene classification algorithm to the projected values.
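
The slides do not give the estimator for the projection direction. For a two-level main effect such as burn versus control, one natural choice, assumed here purely for illustration, is the Fisher discriminant direction w ∝ S_W^{-1}(m_1 - m_0), which maximizes the separation of the group means of the projected values relative to their pooled within-group variance. The sketch estimates w for one gene from an n x p matrix (rows are individuals, columns are time points) and projects onto it; `fisher_direction`, `project`, and the small `ridge` term are hypothetical, and the authors' construction, especially for interaction structures, may differ.

```python
import numpy as np

def fisher_direction(X, labels, ridge=1e-6):
    """Unit direction in R^p that best separates two groups (Fisher discriminant).

    X: (n, p) expression of one gene; rows = individuals, columns = time points.
    labels: (n,) array of 0/1 (e.g. control/burn).
    ridge: small diagonal term that keeps the pooled scatter matrix invertible.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    X0, X1 = X[labels == 0], X[labels == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group scatter matrix
    S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    S_w += ridge * np.eye(X.shape[1])
    w = np.linalg.solve(S_w, m1 - m0)
    return w / np.linalg.norm(w)

def project(X, w):
    """Collapse each individual's p-dimensional profile to a single score."""
    return np.asarray(X, dtype=float) @ w
```

If the burn effect sits only at the first time point, the estimated w points almost entirely along that coordinate, and the projected scores can then be passed to a (nonparametric) ANOVA.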

7 Gene Classification
We use NANOVA to classify genes into five classes according to their factor effects:
- C1 (interaction): the effects of the two factors depend on each other.
- C2 (additive): both factors have effects, but the effects are independent (no interaction).
- C3 (burn effect): only the burn factor has an effect.
- C4 (gender/age effect): only the gender or age factor has an effect.
- C5: no factor effects.
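
A minimal sketch of turning per-gene test decisions into the five classes. Assumptions: each gene already has three yes/no calls (significant interaction, significant burn effect, significant gender/age effect), for example after FDR control; the precedence shown, with interaction checked first, is an illustrative rule rather than the authors' published algorithm, and `classify_gene` is a hypothetical name.

```python
def classify_gene(interaction: bool, burn: bool, other: bool) -> str:
    """Map significance calls for one gene to classes C1-C5 (illustrative rule)."""
    if interaction:
        return "C1"  # factor effects depend on each other
    if burn and other:
        return "C2"  # both effects present, acting additively
    if burn:
        return "C3"  # burn effect only
    if other:
        return "C4"  # gender/age effect only
    return "C5"      # no factor effect
```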

8 Burn Data Analysis
Data preprocessing
In our analysis, we used two time points, early stage and middle stage, and included only patients with data at both.

                 Post-burn day (min)   Post-burn day (median)   Post-burn day (max)
  Early stage            0.1                   2.2                     10.2
  Middle stage          10.5                  19.9                     48.6

Probe sets were filtered to keep those with CV (coefficient of variation) > 0.5 and median expression > 50.

                  # of probe sets   # of arrays (patients)   # of arrays (controls)
  Burn + gender        6060                 172                      38
  Burn + age           6491                 238                      69
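
The preprocessing step can be sketched as follows. Only the two thresholds (CV > 0.5, median > 50) and the "both time points" rule come from the slide; the rest is assumed for illustration: expression is a probe-by-array NumPy matrix on the raw (unlogged) scale, CV is the sample standard deviation divided by the mean across arrays, and arrays are described by hypothetical (patient_id, stage) pairs.

```python
import numpy as np

def filter_probes(expr, cv_min=0.5, median_min=50.0):
    """Boolean mask of probe sets passing the CV and median-expression filters.

    expr: (n_probes, n_arrays) expression matrix on the raw (unlogged) scale.
    """
    expr = np.asarray(expr, dtype=float)
    cv = expr.std(axis=1, ddof=1) / expr.mean(axis=1)
    median = np.median(expr, axis=1)
    return (cv > cv_min) & (median > median_min)

def patients_with_both_stages(samples):
    """Sorted IDs of patients that have both an 'early' and a 'middle' array.

    samples: iterable of (patient_id, stage) pairs, one per array.
    """
    stages = {}
    for pid, stage in samples:
        stages.setdefault(pid, set()).add(stage)
    return sorted(pid for pid, s in stages.items() if {"early", "middle"} <= s)
```

For example, a probe with values 1, 2, 3, 4 passes the CV filter (CV ≈ 0.52) but fails the median filter, so it is dropped.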

9 Burn Data Analysis
After applying the proposed method, we classified genes (probe sets) into gene sets at FDR = 0.05.
C1 (# of probes) C2 (# of probes) C3 (# of probes) C4 (# of probes)
Burn + gender 517554110180
Burn + age 218111832562151
- The burn effect is dominant.
- For a large set of genes, the burn effect depends on age.
- Gender has a smaller effect than age in burn patients.
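
The slides report gene sets at FDR = 0.05 and list FDR estimation as a NANOVA feature, but do not give the estimator. For reference, the widely used Benjamini-Hochberg step-up procedure is sketched below; it is a generic stand-in, not necessarily the procedure used in this analysis.

```python
import numpy as np

def benjamini_hochberg(pvals, fdr=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Threshold for the i-th smallest p-value is i * fdr / m
    thresholds = fdr * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest i with p_(i) <= i * fdr / m
        reject[order[: k + 1]] = True   # reject every hypothesis up to that rank
    return reject
```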

10 C1 Genes
These genes have both burn and age/gender effects, and the burn effect depends on age/gender.
Red: burn; green: control; circle: adult; triangle: children. Each point is a group mean (e.g., burn children).
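
The plotted group means, and the difference of differences that distinguishes C1 (interaction) from C2 (additive) genes, can be computed as below. Assumptions: one projected score per individual, and two hypothetical 0/1 codes, burn (1 = burn, 0 = control) and child (1 = child, 0 = adult); `cell_means` and `interaction_contrast` are illustrative helpers.

```python
import numpy as np

def cell_means(score, burn, child):
    """Mean projected score in each of the four burn x age cells."""
    score = np.asarray(score, dtype=float)
    burn = np.asarray(burn)
    child = np.asarray(child)
    return {
        (b, c): score[(burn == b) & (child == c)].mean()
        for b in (0, 1)
        for c in (0, 1)
    }

def interaction_contrast(means):
    """Burn effect in children minus burn effect in adults (0 when additive)."""
    burn_effect_child = means[(1, 1)] - means[(0, 1)]
    burn_effect_adult = means[(1, 0)] - means[(0, 0)]
    return burn_effect_child - burn_effect_adult
```

A contrast near zero corresponds to parallel lines in the plot (C2-like); a large contrast corresponds to a burn effect that changes with age (C1-like).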

11 Top-ranking C1 genes: Burn + Gender

12 Top-ranking C1 genes: Burn + Age

13 C2 Genes
These genes have both burn and age/gender effects, but the burn effect is independent of age/gender.
Red: burn; green: control; circle: adult; triangle: children.

14 Top-ranking C2 genes: Burn + Gender

15 Top-ranking C2 genes: Burn + Age

16 C3 Genes
These genes have only a burn effect; there is no age/gender effect.
Red: burn; green: control; circle: adult; triangle: children.

17 Top-ranking C3 genes: Burn + Gender

18 Top-ranking C3 genes: Burn + Age

19 C4 Genes
These genes have only an age/gender effect; there is no burn effect.
Red: burn; green: control; circle: adult; triangle: children.

20 Top-ranking C4 genes: Burn + Gender

21 Top-ranking C4 genes: Burn + Age

22 GO Enrichment Analysis
Top-ranking pathways in C3 (Burn + Gender)
http://david.abcc.ncifcrf.gov/

23 GO Enrichment Analysis
Top-ranking pathways in C3 (Burn + Age)
http://david.abcc.ncifcrf.gov/

24 GO Enrichment Analysis
Top-ranking pathways in C2 (Burn + Gender)
http://david.abcc.ncifcrf.gov/
Top-ranking pathways in C2 (Burn + Age)

25 GO Enrichment Analysis
Top-ranking pathways in C1 (Burn + Age)
http://david.abcc.ncifcrf.gov/
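
GO enrichment of a gene set is commonly scored with a one-sided hypergeometric (Fisher exact) test: with N annotated genes in the background, K of them in a GO term, and a gene list of size n containing k genes from that term, the p-value is P(X >= k). The sketch below computes this with the standard library only; `hypergeom_enrichment_p` is a hypothetical name, and DAVID, used on these slides, reports a modified version of this test (the EASE score), so its values will not match exactly.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    upper = min(n, K)
    # Sum the exact probabilities of observing k or more term genes in the list
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```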

26 A Few Interesting Pathways
Some pathways are important for burn patients. Although they show no gender difference, they differ markedly between adult and child patients.

27 Interpretation of Projection Direction
The projection direction is gene specific. The following four genes are from C3 (Burn + Gender); their burn effect is most prominent:
(1) at the early stage;
(2) at the middle stage;
(3) on the average of the two stages;
(4) on the change in gene expression between the early and middle stages.
The projection direction therefore contains temporal information about gene expression: (1) which time points are important, and (2) which patterns (e.g., average or change) are important.
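
With two stages (p = 2), a unit direction w = (w_early, w_middle) can be read geometrically: w ≈ (1, 0) weights the early stage, (0, 1) the middle stage, (1, 1)/√2 the average of the two, and (1, -1)/√2 the change between them. The sketch assigns a direction to the nearest of these four reference patterns by absolute cosine similarity, since w and -w define the same projection axis; the pattern labels and the nearest-pattern rule are illustrative assumptions.

```python
import numpy as np

# Reference directions for the four patterns on the slide (early, middle)
PATTERNS = {
    "early stage": np.array([1.0, 0.0]),
    "middle stage": np.array([0.0, 1.0]),
    "average of stages": np.array([1.0, 1.0]) / np.sqrt(2),
    "change between stages": np.array([1.0, -1.0]) / np.sqrt(2),
}

def nearest_pattern(w):
    """Label of the reference pattern closest in angle to the direction w."""
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    # abs(): w and -w span the same projection axis
    return max(PATTERNS, key=lambda name: abs(PATTERNS[name] @ w))
```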

28 Temporal Information in Projection Direction
We performed GO analysis on 200 probe sets from C3 (Burn + Gender) that have (1) strong early-stage signals or (2) strong middle-stage signals.
(1) Enriched in acute-response genes: kinase cascade, immune response, ...
(2) Enriched in DNA repair, metabolism, and cell cycle genes, ...

29 Temporal Information of Pathways
The projection direction contains temporal information about pathways.
Example 1: T cell receptor signaling pathway (C3 of Burn + Gender). Most genes cluster together, and the projection direction indicates importance at both the early and middle stages.

30 Temporal Information of Pathways
Example 2: Hematopoietic cell lineage (C3 of Burn + Gender). Most genes form two subclusters, which might be worth analyzing separately.
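
One way to look for such subclusters within a pathway is to cluster the genes' projection directions. Because w and -w define the same axis, the sketch maps each two-stage direction at angle θ to the doubled-angle point (cos 2θ, sin 2θ) and runs a small k-means there. The angle doubling, k-means, its deterministic initialization, and the name `cluster_directions` are all assumptions made for illustration.

```python
import numpy as np

def cluster_directions(W, k=2, n_iter=50):
    """Cluster 2-D projection directions (rows of W) into k axial groups."""
    W = np.asarray(W, dtype=float)
    theta = np.arctan2(W[:, 1], W[:, 0])
    # Doubling the angle identifies w with -w
    pts = np.column_stack([np.cos(2 * theta), np.sin(2 * theta)])
    # Deterministic start: k points spread evenly through the input order
    centers = pts[np.linspace(0, len(pts) - 1, k).astype(int)].copy()
    labels = np.zeros(len(pts), dtype=int)
    for _ in range(n_iter):
        # Assign each direction to its nearest center, then recompute centers
        dists = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pts[labels == j].mean(axis=0)
    return labels
```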

31 Summary
A new approach to analyzing longitudinal multifactor expression data:
(1) It classifies genes into gene sets based on factor effects, which suits exploratory studies.
(2) The projection direction contains temporal information.
Application to the burn data pointed out some important genes and pathways and their roles in male/female and adult/child burn patients.

32 References
- Ma et al., Nucleic Acids Res. 2006 Mar 1;34(4):1261-9.
- Storey et al., Proc Natl Acad Sci U S A. 2005 Sep 6;102(36):12837-42.
- Tai et al., Annals of Statistics 34(5):2387-2412.
- Pavlidis et al., Methods. 2003;31:282-289.
- Gao et al., Bioinformatics. 2006;22(12):1486-1494.
- Anderson et al., Ann. Statist. 1985;13(2).
- Dennis et al., Genome Biology. 2003;4(5):P3.

33 Acknowledgement
Wing Wong, Weihong Xu, Wenzhong Xiao, Ted Anderson

