Multivariate Analysis of Pathways. Multivariate Approaches to Gene Set Selection.


1 Multivariate Analysis of Pathways

2 Multivariate Approaches to Gene Set Selection

3 Key Multivariate Ideas PCA (Principal Components Analysis); SVD (Singular Value Decomposition); MDS (Multi-dimensional Scaling); Hotelling’s T²

4 PCA Three correlated variables. PC1 lies along the direction of maximal variation; PC2 lies at right angles to PC1, along the direction of next-highest variation.
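
As a compact restatement of the slide above, the principal directions can be written as variance-maximizing unit vectors of the sample covariance matrix. The symbols S (sample covariance), w_k (k-th principal direction) and λ_k (variance along it) are notation assumed here, not taken from the slides.

```latex
w_1 = \arg\max_{\|w\| = 1} w^{\top} S\, w,
\qquad
w_2 = \arg\max_{\|w\| = 1,\; w \perp w_1} w^{\top} S\, w,
\qquad
S\, w_k = \lambda_k w_k, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge 0
```

Each w_k is an eigenvector of S, and the variance captured along it is the matching eigenvalue λ_k.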

5 Multivariate Representation of Pathways BAD pathway Normal IBC Other BC Clear separation between groups Variation differences

6 Hotelling’s T² Compute the distance between sample means using a (common) metric of covariation. Multidimensional analog of the t (actually F) statistic.
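
The slide's own equation did not survive in the transcript. For reference, the textbook two-sample form of Hotelling's T² with a pooled covariance is sketched below; the notation (group sizes n_1 and n_2, group means, group covariances S_1 and S_2, number of genes p) is assumed here and may differ from what the slide showed.

```latex
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,
      (\bar{x}_1 - \bar{x}_2)^{\top} S^{-1} (\bar{x}_1 - \bar{x}_2),
\qquad
S = \frac{(n_1 - 1)\, S_1 + (n_2 - 1)\, S_2}{n_1 + n_2 - 2}

\frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)\, p}\; T^2 \;\sim\; F_{p,\; n_1 + n_2 - p - 1}
\quad \text{(under the null, with normal data and equal covariances)}
```

With p = 1 the statistic reduces to the square of the ordinary pooled two-sample t statistic, which is the sense in which it is a multidimensional analog of t.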

7 Principles of Kong et al. Method Normal covariation generally acts to preserve homeostasis. The transcription of genes that participate in many processes will be changed. The joint changes in genes will be most distinctive for those genes active in pathways that are working differently.

8 Critiques of Hotelling’s T² Small samples: unreliable Σ estimates – N < p. Estimates of μ_x and Σ not robust to outliers. Assumes same covariance in each sample – Σ₁ = Σ₂? Usually not in disease – Kong et al. propose analog of Welch t-test – Permutation of samples for significance

9 Making it Stable 1. Insufficient information to capture all relationships – too much correlation! – Power of Hotelling’s method comes from identifying directions of rare variation – Many (spurious) directions of 0 variation 2. Random variation in data leads to random variation in PCA. Regularization strategy: force covariance to be more like IID
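
One common way to make a covariance estimate "more like IID" is linear shrinkage toward a scaled identity matrix. The form below is an illustrative assumption, not necessarily the regularization used in the talk; α is a hypothetical tuning weight and p is the number of genes in the set.

```latex
S_{\alpha} = (1 - \alpha)\, S + \alpha\, \frac{\operatorname{tr}(S)}{p}\, I_p,
\qquad 0 \le \alpha \le 1
```

For any α > 0 every eigenvalue of S_α is at least α·tr(S)/p, so the spurious directions of zero variation disappear and S_α can be inverted even when N < p.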

10 Making it Robust Microarray data has many outliers Multivariate methods are very much distorted by outliers Robust estimates of covariance could give robust PCA Simple approach: trim outliers

11 Handling Changes of Covariance Power of Hotelling’s method comes from identifying directions of rare variation. If one group shows little covariation in a direction but the other shows a lot, how do we test for changes? If one group is the control, then its directions of rare variation should be taken as the standard – Robust measure of means in both groups

12 Detecting changes of covariance

13 Meaning of Covariance Change Meaning of covariance across individuals –Homeostasis in face of individual variation –e.g. BAD pathway: largest loadings of PC1 on PRKARB & ADCY1 –PRKARB represses CREB1; ADCY activates CREB1 Gene sets whose covariance diminishes may –be responding to different inputs –have escaped their usual regulatory control Characteristic of cancers

14 Testing Covariance Changes Idea: directions of small variation in one group should match directions of small variation in the other. Mathematical approach – Find solutions λ of |S₁ – λS₂| = 0 – Solutions should all be near 1 if there is no change – Test statistic: easily computed. Computational approach – Ratio of largest to smallest: λ_max / λ_min
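
A minimal sketch of the eigenvalue comparison, assuming the slide refers to the roots λ of det(S1 − λ·S2) = 0, where S1 and S2 are the two group covariance matrices. To stay dependency-free it handles only symmetric 2×2 matrices, where the roots come from a quadratic; the function names are hypothetical.

```python
import math

def generalized_eigenvalues_2x2(s1, s2):
    """Roots of det(S1 - lam * S2) = 0 for symmetric 2x2 matrices (nested lists)."""
    a11, a12, a22 = s1[0][0], s1[0][1], s1[1][1]
    b11, b12, b22 = s2[0][0], s2[0][1], s2[1][1]
    det_a = a11 * a22 - a12 * a12
    det_b = b11 * b22 - b12 * b12
    if b11 <= 0 or det_b <= 0:
        raise ValueError("S2 must be positive definite")
    # det(S1 - lam*S2) = det_a - lam * mid + lam^2 * det_b
    mid = a11 * b22 + a22 * b11 - 2 * a12 * b12
    disc = max(mid * mid - 4 * det_a * det_b, 0.0)  # guard against round-off
    root = math.sqrt(disc)
    return ((mid - root) / (2 * det_b), (mid + root) / (2 * det_b))

def eigenvalue_ratio(s1, s2):
    """Largest root divided by smallest root; equals 1 when S1 == S2."""
    lo, hi = generalized_eigenvalues_2x2(s1, s2)
    if lo <= 0:
        raise ValueError("S1 must be positive definite for the ratio to be defined")
    return hi / lo
```

For realistic gene sets one would use a general symmetric eigen-solver instead, and calibrate the ratio by permuting sample labels.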

15 Network Connectivity Methods

16 Network Topology Connections represent interactions: –Regulatory (one-way) –Protein interaction (two-way) Hubs are genes with many connections Bottlenecks are single genes that connect two parts of a functional network

17 Devising Tests Based on Topology Issues: how to weight more heavily the genes that are hubs How to assess directionality of change How to measure co-operativity (activation or repression changes in appropriate ways)

18 Draghici et al. Approach Overall measure. Effective contribution (perturbation factor)
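
The slide's equations are missing from the transcript. The block below sketches the impact factor and perturbation factor as they are usually stated for the pathway impact analysis of Draghici et al.; the notation is an assumption here and should be checked against the published paper.

```latex
IF(P_i) = \log\frac{1}{p_i}
        + \frac{\sum_{g \in P_i} \lvert PF(g) \rvert}
               {\lvert \overline{\Delta E} \rvert \cdot N_{de}(P_i)}

PF(g) = \Delta E(g) + \sum_{u \in US_g} \beta_{ug}\, \frac{PF(u)}{N_{ds}(u)}
```

Here p_i is the over-representation p-value for pathway P_i, ΔE(g) is the measured expression change of gene g, US_g is the set of genes directly upstream of g, β_ug encodes the type of interaction (for example +1 for activation, −1 for repression), N_ds(u) is the number of genes downstream of u, and N_de(P_i) is the number of differentially expressed genes on the pathway.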

19 Analysis of Outliers

20 Outliers: Clues to Disease Process? Outliers usually reflect idiosyncratic events Recurrent outliers reflect rare events that are selected If a particular pathway is disrupted in disease, but by many different mechanisms, then the expression profiles should –Lose healthy covariance –Show recurrent outliers How to test for ‘consistent’ outliers? COPA: a method for flagging recurrent outliers in expression data –Finds consistent fusion gene

21 A Test Statistic for Consistent Outliers Ratio of quantile differences to normal variation: (q_0.90 – q_0.10)_tumor / max((q_0.90 – q_0.10)_normal, 0.4). Compare to null distribution by permutation. Many genes show much higher ratios
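
The ratio on this slide can be sketched per gene as follows. The function names are hypothetical, and the quantile interpolation rule (Python's "inclusive" method) is an assumption, since the slide does not say how quantiles were computed.

```python
from statistics import quantiles

def decile_range(values):
    """Difference between the 90th and 10th percentiles of a list of numbers."""
    deciles = quantiles(values, n=10, method="inclusive")  # 9 cut points
    return deciles[8] - deciles[0]

def outlier_ratio(tumor, normal, floor=0.4):
    """(q90 - q10) in tumors divided by (q90 - q10) in normals.

    The denominator is floored (0.4 on the slide) so that genes with almost
    no variation in normals do not produce huge ratios.
    """
    return decile_range(tumor) / max(decile_range(normal), floor)
```

A gene whose tumor values spread far more widely than its normal values receives a large ratio, whether the spread comes from over- or under-expression.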

22 Statistical Significance Find confidence limits on the number of false positives by permutation. Several hundred genes appear significant at 10-20% FDR – Actual scores: 267 scores are greater than 5, whereas 90% of permutations have fewer than 34 scores over 5
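
A minimal sketch of the permutation count described on this slide, assuming the null is built by shuffling the tumor/normal labels and recounting how many genes exceed the score threshold. All names are hypothetical; `score_fn` stands in for whatever per-gene statistic is used (for example the quantile ratio).

```python
import random

def permutation_null_counts(expr, labels, score_fn, threshold, n_perm=200, seed=0):
    """Sorted counts of genes scoring above `threshold` under shuffled labels.

    expr     : list of per-gene lists, one value per sample
    labels   : list of booleans, True for tumor samples
    score_fn : function (tumor_values, normal_values) -> score
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        n_over = 0
        for row in expr:
            tumor = [v for v, is_tumor in zip(row, shuffled) if is_tumor]
            normal = [v for v, is_tumor in zip(row, shuffled) if not is_tumor]
            if score_fn(tumor, normal) > threshold:
                n_over += 1
        counts.append(n_over)
    return sorted(counts)

def upper_quantile(sorted_counts, q=0.9):
    """Value at roughly the q-th quantile of the sorted null counts."""
    index = min(len(sorted_counts) - 1, int(q * len(sorted_counts)))
    return sorted_counts[index]
```

Comparing the observed count with `upper_quantile(counts, 0.9)` mirrors the slide's comparison of 267 observed scores over 5 against fewer than 34 in 90% of permutations.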

23 A Test for Functional Groups For each group G of genes: s_G <- sum(scores[G]) / sqrt(length(G)). Scores: t-scores or range ratios. PAGE (BMC Bioinformatics, 2005)
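
The group score on this slide translates directly to other languages. A minimal Python sketch follows, assuming `scores` maps gene names to per-gene scores and that genes missing from `scores` are simply skipped (that handling is an assumption, not stated on the slide).

```python
import math

def group_score(scores, group):
    """Sum of per-gene scores in the group, divided by sqrt(group size)."""
    present = [g for g in group if g in scores]
    if not present:
        raise ValueError("no genes from the group have scores")
    return sum(scores[g] for g in present) / math.sqrt(len(present))
```

Dividing by the square root of the group size keeps the variance of the score comparable across groups of different sizes when the per-gene scores are independent with equal variance.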

24 Do Genes Make Sense? Quantile Ratio [1] "DNA replication" [2] "response to pathogenic fungi" [6] "cleavage of lamin" [7] "spindle organization and biogenesis" [15] "response to osmotic stress" [16] "nutrient import" [22] "response to mercury ion" T-test [2] "sodium ion homeostasis" [3] "leukocyte adhesive activation" [4] "positive regulation of calcium-independent cell-cell adhesion" [5] "oxytocin receptor activity" [6] "ADP biosynthesis" [7] "dADP biosynthesis" [10] "regulation of muscle contraction" [11] "caveolar membrane" [12] "response to cold" [16] "stress fiber formation" [18] "positive regulation of complement activation" [19] "astrocyte activation" [22] "regulation of long-term neuronal synaptic plasticity" [24] "positive regulation of endocytosis" [25] "embryonic hemopoiesis"

25 Cancer Functional Groups Do very probable cancer genes show high discrepancy in few samples? Program: identify genes that might contribute to cancer processes: growth signaling, loss of cell-matrix adhesion, apoptosis. 1. Do most samples from these categories show at least one gross mis-regulation? 2. Are they the same genes in most samples?

26 Example: Cell Growth Select genes in GO:0001558 ‘regulation of cell growth’. Expect most samples to have at least one very seriously mis-regulated gene from this category. Compute maximum aberration score across category

27 Aberrations Aberration score indicated by color: vanilla: 0; red: 4 Nine normals at left No gene misregulated in even 50% of samples BUT: Only a few genes commonly misregulated

28 Simplest Summary Maximum aberration score for samples

29 Testing the Pathway for Outliers Many genes show aberrations in tumor group. Null distribution: medians of maxima from randomly selected gene groups of size 37. P < .01. NB: the results for cell-matrix interaction are very similar; angiogenesis is not so strong.
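
The pathway-level outlier test can be sketched as follows, assuming `scores` maps each gene to a list of per-sample aberration scores and that the pathway statistic is the median, over samples, of the per-sample maximum score. Function names, the number of random draws, and the add-one p-value correction are assumptions of this sketch.

```python
import random
from statistics import median

def max_aberration_per_sample(scores, genes):
    """For each sample, the largest aberration score among the given genes."""
    n_samples = len(next(iter(scores.values())))
    return [max(scores[g][j] for g in genes) for j in range(n_samples)]

def median_of_maxima(scores, genes):
    """Pathway statistic: median over samples of the per-sample maximum."""
    return median(max_aberration_per_sample(scores, genes))

def random_group_p_value(scores, genes, n_draws=1000, seed=0):
    """Compare the pathway statistic with random gene groups of the same size."""
    rng = random.Random(seed)
    observed = median_of_maxima(scores, genes)
    all_genes = sorted(scores)
    exceed = 0
    for _ in range(n_draws):
        random_group = rng.sample(all_genes, len(genes))
        if median_of_maxima(scores, random_group) >= observed:
            exceed += 1
    return (exceed + 1) / (n_draws + 1)  # add-one correction avoids p = 0
```

On the slide the random groups have size 37, matching the size of the gene category being tested.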

