Published by Joan Shields. Modified over 8 years ago.
1
Title: Assign Pathways to Gene Set June 21, 2007 Guanming Wu
2
Contents: Recap of previous work; Introduction to the dataset from Scott Powers; Statistical model used; Results; Summary and future directions
3
Results from Previous Talks. Notes: 1. Values in parentheses are numbers of proteins in SwissProt (Column 2) or coverage in SwissProt (Column 3). 2. Coverages in Column 3 were calculated by dividing the numbers in Column 2 by the total number of HPRD entries (25205) or SwissProt entries (14446). 3. Citations: 1). Joshi-Tope G. et al. Nucleic Acids Res. 33: D428-32 (2005) 2). Huaiyu Mi et al. Nucleic Acids Res. 35: D247-D252 (2007) 3). http://cancer.cellmap.org 4). http://www.inoh.org 5). http://pid.nci.nih.org 6). http://www.genome.jp/kegg
4
Results from Previous Talks: Naïve Bayes Classifier
5
Dataset from Scott Powers: Lung cancer samples or cell lines: 135. Amplified fragments: 365. Genes contained by fragments: 3900. Question: How to find statistically significant pathways for these genes?
6
A Simple Model: Binomial Test; Bonferroni Correction?
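The slide names a binomial test without giving its setup, so the following is only a minimal sketch under stated assumptions: each sampled gene is treated as an independent draw that lands in a pathway with probability equal to the pathway's share of a background gene set, and the p-value is the upper tail of the binomial distribution. The function names, the `background_size` parameter, and the one-sided tail are illustrative choices, not taken from the talk.

```python
import math

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), assuming 0 < p < 1.

    Terms are computed in log space so that large n (e.g. 3900 genes)
    does not overflow.
    """
    log_p, log_q = math.log(p), math.log(1.0 - p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(1.0, total)

def pathway_p_value(sample_genes, pathway_genes, background_size):
    """Binomial p-value for seeing at least the observed number of hits.

    Assumption: the null hit probability is |pathway| / background_size.
    Genes listed more than once in sample_genes are counted each time.
    """
    members = set(pathway_genes)
    hits = sum(1 for g in sample_genes if g in members)
    p = len(members) / background_size
    return binom_upper_tail(hits, len(sample_genes), p)
```

For example, `pathway_p_value(["A", "B"], ["A", "B"], 4)` tests two hits out of two draws against a null probability of 0.5.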
7
Results from Simple Model
8
Bonferroni Correction: p-value × 536, where 536 is the number of pathways
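A Bonferroni correction multiplies each raw p-value by the number of tests, here the 536 pathways, and caps the result at 1. The short sketch below shows this; the pathway names in the usage note are placeholders, not results from the talk.

```python
def bonferroni(p_values, n_tests=None):
    """Return Bonferroni-corrected p-values.

    p_values: dict mapping pathway name -> raw p-value.
    n_tests:  number of tests; defaults to len(p_values).
    """
    m = len(p_values) if n_tests is None else n_tests
    return {name: min(1.0, p * m) for name, p in p_values.items()}
```

With `n_tests=536`, a raw p-value of 1e-4 becomes 0.0536, and a raw p-value of 0.01 is capped at 1.0, e.g. `bonferroni({"pathway_a": 1e-4, "pathway_b": 0.01}, n_tests=536)`.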
9
How to consider frequencies? To consider frequencies, a new list of genes was generated in which genes were counted multiple times based on their frequencies, e.g. OR2T29 14, MYC 9, etc. Total number: 5717. Redundant Set vs. Non-redundant Set
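Building the redundant set can be sketched as repeating each gene according to its frequency. The two counts below are the ones given on the slide; the function name and the dict input format are assumptions made for illustration.

```python
def expand_by_frequency(frequencies):
    """Repeat each gene as many times as its frequency.

    frequencies: dict mapping gene symbol -> number of occurrences.
    Returns a list in which a gene with frequency f appears f times.
    """
    genes = []
    for gene, count in frequencies.items():
        genes.extend([gene] * count)
    return genes

# Frequencies quoted on the slide; the full table is not reproduced here.
redundant = expand_by_frequency({"OR2T29": 14, "MYC": 9})
non_redundant = sorted(set(redundant))
```

Here `redundant` has 23 entries and `non_redundant` has 2; applied to the full frequency table, the redundant list would reach the total reported on the slide.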
10
Results from Simple Model - Redundant Set: Bonferroni correction cannot make any difference!
11
Permutation Based Model: sampling genes; binomial test; filtering out hit pathways based on cut-off value (1000 runs); counting occurrences of pathways; generating a mapping file; binomial test of actual sample; correcting sample p values using mapping file; choosing cut-off p-value.
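The steps above can be sketched as follows. The slide does not define the mapping file or the correction rule, so this is one plausible reading, stated as an assumption: the mapping records in how many random runs each pathway passes the cut-off, and the corrected p-value of a pathway hit by the actual sample is that count divided by the number of runs. All function names, the `sampler` callback, and the one-sided binomial tail are hypothetical.

```python
import math
import random

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), assuming 0 < p < 1."""
    log_p, log_q = math.log(p), math.log(1.0 - p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(1.0, total)

def hit_pathways(genes, pathways, background_size, cutoff):
    """Names of pathways whose binomial p-value for `genes` is <= cutoff.

    pathways: dict mapping pathway name -> set of member genes.
    """
    hits = set()
    for name, members in pathways.items():
        k = sum(1 for g in genes if g in members)
        p = len(members) / background_size
        if binom_upper_tail(k, len(genes), p) <= cutoff:
            hits.add(name)
    return hits

def build_mapping(sampler, pathways, background_size, cutoff, runs=1000, seed=0):
    """Count, per pathway, the number of random runs in which it is hit.

    sampler(rng) must return one randomly sampled gene list per call.
    """
    rng = random.Random(seed)
    counts = {name: 0 for name in pathways}
    for _ in range(runs):
        sampled = sampler(rng)
        for name in hit_pathways(sampled, pathways, background_size, cutoff):
            counts[name] += 1
    return counts

def corrected_p_values(sample_genes, pathways, background_size, cutoff, mapping, runs):
    """Empirical p-value: fraction of random runs in which the pathway was hit."""
    actual = hit_pathways(sample_genes, pathways, background_size, cutoff)
    return {name: mapping[name] / runs for name in actual}
```

An estimate of 0.0 only means that no random run hit the pathway, so with 1000 runs it would be reported as "< 0.001". The `sampler` argument is left open so that any sampling scheme, uniform or chromosome segment based, can be plugged in.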
12
Sampling Genes: Chromosome segment based: a fixed segment length is used to sample a chromosome based on CNV information. Example: Chromosome 1
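The slide gives no detail on how segments are placed, so the sketch below shows only the simplest reading: pick a window of fixed length at a uniformly random position on one chromosome and collect the genes that overlap it. The coordinate format, the gene names, and the uniform placement are assumptions; how the CNV information guides the choice of length is not modeled here.

```python
import random

def sample_segment_genes(gene_positions, chrom_length, segment_length, rng):
    """Return genes overlapping one randomly placed fixed-length segment.

    gene_positions: dict mapping gene -> (start, end) on a single chromosome,
                    with half-open coordinates [start, end).
    """
    start = rng.randrange(0, chrom_length - segment_length + 1)
    end = start + segment_length
    return [gene for gene, (g_start, g_end) in gene_positions.items()
            if g_start < end and g_end > start]

# Hypothetical coordinates, for illustration only.
toy_positions = {"geneA": (100, 200), "geneB": (500, 650), "geneC": (900, 950)}
picked = sample_segment_genes(toy_positions, 1000, 300, random.Random(0))
```

Repeating the call until the number of collected genes matches the size of the real gene set would give one random sample that preserves the tendency of amplified genes to sit next to each other.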
13
One Run [figure: B cell receptor signaling pathway(I); values shown: 2.9E-07, 3.0E-07]
14
B cell receptor signaling pathway(I) p value: < 0.001
15
Another Run [figure: TGFBR(C); values shown: 2.7E-06, 3.0E-06]
16
TGFBR(C): 2. TGFBR(C) p value: 0.002
17
Significantly Hit Pathways - Non-redundant Set
18
Significantly Hit Pathways - Redundant Set
19
Results from a Simple Sampling. Sampling: randomly pick 3900 genes from all human genes
20
Summary: A framework has been built to look for statistically significant pathways for a list of genes. Using this framework, we found several pathways linked to the gene set from lung cancer CNVs. However, the relationships among these hit pathways, and among the genes in them, need further investigation.
21
Future Directions: Validate the predicted results: pick disease-related gene sets with known pathways (e.g. Type 1 diabetes). Develop a web-based application to deploy the combined network to end users. Develop methods based on graph theory to explore relationships among genes in hit pathways: protein interaction data will be used as bridges to traverse different pathways.
22
Reference: Osier MV, Zhao H, Cheung KH. Handling multiple testing while interpreting microarrays with the Gene Ontology Database. BMC Bioinformatics 2004, 5:124.
23
Thanks!!!