1
Using Gene Ontology Models and Tests Mark Reimers, NCI
2
Outline
- What we might gain by using annotations
- Models for group effects
- Enrichment of selected genes
- Chi-square and Fisher test
- Group scores
- Overlap in hierarchical annotations
3
Why Use Annotations
- Goal: identify biological processes or biochemical pathways that are changed by treatment
- Common procedure: select ‘changed’ genes, and look among them for genes of known function
- Problem: moderate changes in many genes simultaneously will escape detection
- New approach: start with a vocabulary of known GO categories or pathways, and look for coherent changes
- Variations: look for chromosome locations, or protein domains, that are common among many of the changed genes
4
Statistical Methods
- How likely is it that the set of ‘significant’ genes would include as many genes from the category as observed?
- Two-way table:

                   Category    Others
    On list               8       112
    Not on list          42    12,500

- Fisher exact test: handles small categories better than the chi-square test
- How to deal with multiple categories?
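A minimal sketch of both tests on this slide's two-way table, assuming the flattened counts read as 8 and 112 genes on the list (category, others) and 42 and 12,500 off it. The one-sided Fisher p-value is the hypergeometric upper tail with all margins fixed; the chi-square helper also reports the smallest expected cell count, which here is well below 5, the usual reason to prefer the exact test for small categories. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for enrichment of a category on the list.

    a = category genes on list,     b = other genes on list
    c = category genes not on list, d = other genes not on list
    Returns P(X >= a) under the hypergeometric null with all margins fixed.
    """
    n_list = a + b            # size of the selected ('significant') list
    k_cat = a + c             # size of the category
    total = a + b + c + d     # all genes on the array
    upper = min(n_list, k_cat)
    # Exact integer arithmetic avoids rounding drift in the tail sum.
    tail = sum(comb(k_cat, x) * comb(total - k_cat, n_list - x)
               for x in range(a, upper + 1))
    return float(Fraction(tail, comb(total, n_list)))


def chi_square_stat(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) and the smallest expected count."""
    total = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    min_expected = float("inf")
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            min_expected = min(min_expected, expected)
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat, min_expected


if __name__ == "__main__":
    p = fisher_enrichment_p(8, 112, 42, 12500)
    chi2, min_exp = chi_square_stat(8, 112, 42, 12500)
    print(f"Fisher one-sided p = {p:.3g}")
    print(f"chi-square = {chi2:.1f}, smallest expected count = {min_exp:.2f}")
```

With about 0.47 category genes expected on a list of 120, the chi-square approximation is unreliable, while the exact tail sum needs no large-sample assumption.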
5
GoMiner: Leverages the Gene Ontology (Zeeberg et al., Genome Biology 4: R28, 2003)
6
P-values for Tests
- About 3,000 GO biological process categories
- Most overlap with some others
- p-values for categories are not independent
- Permutation test of all categories simultaneously, in parallel
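A sketch of testing all categories in parallel, under stated assumptions: each round permutes the sample labels once and rescores every category on that same relabelled data, so the dependence between overlapping categories carries into the null. The per-category score (absolute mean of per-gene group-mean differences) and the max-statistic adjustment for multiple categories are choices made for this illustration, not taken from the slide; all names are hypothetical.

```python
import random
from statistics import mean


def category_scores(data, labels, categories):
    """Score every category on one labelling of the samples.

    data: per-gene expression rows (one value per sample)
    labels: 0/1 group label per sample
    categories: dict name -> list of gene (row) indices; categories may overlap
    """
    # Per-gene difference of group means (a simple stand-in for a t-statistic).
    diffs = []
    for row in data:
        g1 = [x for x, lab in zip(row, labels) if lab == 1]
        g0 = [x for x, lab in zip(row, labels) if lab == 0]
        diffs.append(mean(g1) - mean(g0))
    return {name: abs(mean(diffs[i] for i in idx)) for name, idx in categories.items()}


def permutation_pvalues(data, labels, categories, n_perm=1000, seed=0):
    """Raw and max-statistic adjusted p-values for all categories at once."""
    rng = random.Random(seed)
    observed = category_scores(data, labels, categories)
    raw_hits = dict.fromkeys(categories, 0)
    adj_hits = dict.fromkeys(categories, 0)
    perm_labels = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm_labels)
        # Same permutation for every category preserves their correlations.
        null = category_scores(data, perm_labels, categories)
        null_max = max(null.values())
        for name in categories:
            raw_hits[name] += null[name] >= observed[name]
            adj_hits[name] += null_max >= observed[name]
    # The +1 counts the observed labelling as one member of the permutation set.
    raw = {k: (v + 1) / (n_perm + 1) for k, v in raw_hits.items()}
    adj = {k: (v + 1) / (n_perm + 1) for k, v in adj_hits.items()}
    return raw, adj
```

Comparing each observed score with the maximum over all categories in the same permutation gives p-values that account for testing about 3,000 correlated categories together.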
7
Gene Set Enrichment Analysis
- Ignore for the moment the ‘meaning’ of the p-value: consider it just as a ranking of S/N (signal-to-noise) ratios
  - between-group difference relative to within-group variation
- If we select a set of genes ‘at random’, then the ranks of their S/N ratios should be random
  - i.e., a sample from a uniform distribution
- Adapt the standard Kolmogorov-Smirnov (K-S) test of distribution
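A minimal sketch of the unweighted K-S idea on this slide: map each set member's position in the S/N-ordered gene list to (0, 1), then measure the largest gap between their empirical distribution and the uniform one. The mid-rank scaling and the function names are assumptions for illustration; published GSEA uses a weighted running-sum variant, which this does not implement. Because genes are correlated, the statistic would still need a permutation null rather than the textbook K-S table.

```python
def normalized_positions(ranked_genes, gene_set):
    """Map each set member's rank in the S/N-ordered list into (0, 1).

    ranked_genes: gene ids ordered from largest to smallest S/N ratio
    gene_set: ids in the category (members missing from the list are ignored)
    """
    n = len(ranked_genes)
    pos = {g: r for r, g in enumerate(ranked_genes)}
    # Mid-rank scaling: 0-based rank r maps to (r + 0.5) / n.
    return sorted((pos[g] + 0.5) / n for g in gene_set if g in pos)


def ks_uniform(u):
    """One-sample K-S statistic D for sorted values u against Uniform(0, 1)."""
    if not u:
        raise ValueError("empty gene set")
    m = len(u)
    return max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(u))
```

In a list of 100 genes, a set occupying the top four ranks gives D = 0.965, while four evenly spread members give D = 0.125.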
8
Continuous Tests
- Model: all genes in the group contribute roughly equally to the effect
- Test: for each group G, compute a group score z
  - Compare z to its permutation distribution
- More sensitive under the model assumptions
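One common form of such a group score, assumed here rather than taken from the slide, is $z_G = \frac{1}{\sqrt{|G|}} \sum_{g \in G} z_g$, where $z_g$ is a per-gene standardized difference. If genes were independent, the $\sqrt{|G|}$ scaling would put groups of different sizes on a comparable scale; because they are not, the comparison with a sample-label permutation distribution is what calibrates it. The Welch-style per-gene score and all names below are hypothetical choices for this sketch.

```python
import math
import random
from statistics import mean, stdev


def gene_z(row, labels):
    """Welch-style standardized difference of group means for one gene."""
    g1 = [x for x, lab in zip(row, labels) if lab == 1]
    g0 = [x for x, lab in zip(row, labels) if lab == 0]
    se = math.sqrt(stdev(g1) ** 2 / len(g1) + stdev(g0) ** 2 / len(g0))
    return 0.0 if se == 0 else (mean(g1) - mean(g0)) / se


def group_z(data, labels, group):
    """Assumed group score: sum of per-gene z over G, divided by sqrt(|G|)."""
    return sum(gene_z(data[i], labels) for i in group) / math.sqrt(len(group))


def group_test(data, labels, group, n_perm=1000, seed=0):
    """Two-sided permutation p-value for z_G, permuting sample labels."""
    rng = random.Random(seed)
    observed = group_z(data, labels, group)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        hits += abs(group_z(data, perm, group)) >= abs(observed)
    return observed, (hits + 1) / (n_perm + 1)
```

Summing many moderate per-gene scores lets a coherent shift across the group reach significance even when no single gene would, which is the sensitivity gain the slide refers to when all genes contribute roughly equally.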