1
Developing Semantic Pathway Alignment Algorithms for Systems Biology
Jonas Gamalielsson
2006-09-06
2
Systems Biology
- Study the behaviour of complex biological systems
- Consider the interaction of all cellular/molecular parts
- Often studied over time
- Goal: develop models for system understanding
- Powerful computational tools are required
- Organisation: "Life's complexity pyramid" (Oltvai & Barabasi, 2002)
  - genes, mRNA, proteins & metabolites
  - regulatory motifs & metabolic pathways
  - higher order networks, revealing functional modules
  - functional modules, which are hierarchically clustered
3
Thesis aim
- To develop semantic pathway alignment algorithms for systems biology
- Three related algorithms:
  - GOTEM (GO-based regulatory TEMplates)
  - GOSAP (GO-based Semantic Alignment of biological Pathways)
  - EGOSAP (Evolutionary GO-based Semantic Alignment of biological Pathways)
4
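Gene Ontology (GO)
[Figure: the Gene_Ontology root with its three sub-graphs]
- Function sub-graph: molecular_function, with terms such as catalytic activity and transporter activity
- Process sub-graph: biological_process, with terms such as cellular process and development
- Component sub-graph: cellular_component, with terms such as cell and extracellular
- Gene products G1, G2, G3 and G4 are annotated to terms in each sub-graph (Gx = gene product)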
5
Semantic similarity
EXAMPLE: Resnik (1995)
[Figure: is-a graph with nodes D (p=0.10), B (p=0.03), C (p=0.07) and A (p=0.01). Note: all nodes not shown in graph]
- ms(A,B) = D (ms = minimum subsumer)
- p_ms(A,B) = p_D = 0.10
- SS(A,B) = -log2(0.10) = 3.32
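The Resnik example can be reproduced with a few lines of Python. This is a minimal sketch, not the thesis implementation: the term probabilities are the ones on the slide, but the is-a edges in `PARENTS` are an assumption (the slide notes that not all nodes are shown), and the function names are illustrative.

```python
import math

# Hypothetical toy is-a graph (child -> parents); the edge layout is an assumption.
PARENTS = {"A": ["C"], "B": ["D"], "C": ["D"], "D": []}
# Term probabilities as given in the slide example.
PROB = {"A": 0.01, "B": 0.03, "C": 0.07, "D": 0.10}

def ancestors(term):
    """Return the term itself plus every term reachable via is-a edges."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def minimum_subsumer(a, b):
    """Shared ancestor of a and b with the lowest probability."""
    common = ancestors(a) & ancestors(b)
    return min(common, key=PROB.get)

def resnik_similarity(a, b):
    """Information content (-log2 p) of the minimum subsumer."""
    return -math.log2(PROB[minimum_subsumer(a, b)])

print(minimum_subsumer("A", "B"), round(resnik_similarity("A", "B"), 2))
```

Under these assumed edges the minimum subsumer of A and B is D, which gives the slide's value of 3.32.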
6
GOTEM: background 1(2)
- Highly desirable to derive gene regulatory networks using gene expression data
- Reverse engineering (RE) algorithms derive a model (set of rules) that fits the data
  - Examples: Boolean networks, neural networks, Bayesian networks
- Limitations of RE algorithms
  - Many derived model networks can fit the same data
  - Few derived networks are actually biologically feasible
  - RE algorithms do not distinguish between biologically plausible and implausible networks
- Reduce search space?
7
GOTEM: background 2(2)
- We propose GOTEM: GO-based regulatory TEMplates [1,2]
- Contribution
  - Means to distinguish between plausible and implausible networks
  - GOTEM generalises knowledge about gene products using the molecular function part of Gene Ontology
  - Binary semantic templates encoding general knowledge of regulation are derived from documented pathways and used to assess the biological plausibility of regulatory hypotheses

[1] Gamalielsson, J., Olsson, B., Nilsson, P. (2005). A Gene Ontology based Method for Assessing the Biological Plausibility of Regulatory Hypotheses. Technical report, HS-IKI-TR-05-004, University of Skövde, Sweden.
[2] Gamalielsson, J., Nilsson, P., Olsson, B. (2006). A GO-based Method for Assessing the Biological Plausibility of Regulatory Hypotheses. In Proceedings of the 2nd International Workshop on Bioinformatics Research and Applications (IWBRA 2006), Reading, Great Britain (May 2006).
8
GOTEM
[Flow diagram; legend: Method/algorithm vs. Data/information]
- GO + Annotation databases -> GO term probability calculation -> Enriched GO graph
- Model pathway databases -> Extract binary pathway relations -> Binary relations
- Binary relations + Enriched GO graph -> Template generation -> Templates
- Templates + Regulatory hypotheses -> Hypothesis assessment -> Scored & ranked hypotheses
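The slide does not spell out the "GO term probability calculation" step. One common choice, assumed here, is the annotation-frequency estimate used with Resnik-style measures: the probability of a term is the fraction of annotated gene products that are annotated to that term or to any of its descendants. The sketch below uses a hypothetical four-term graph and four hypothetical gene products.

```python
# Hypothetical toy ontology (child -> parents); Z has two parents, as GO terms may.
PARENTS = {"R": [], "X": ["R"], "Y": ["R"], "Z": ["X", "Y"]}
# Hypothetical direct annotations: gene product -> set of terms.
ANNOTATIONS = {"G1": {"Z"}, "G2": {"X"}, "G3": {"Y"}, "G4": {"R"}}

def term_probabilities(parents, annotations):
    """Estimate p(term) as the share of gene products annotated at or below the term."""
    covered = {term: set() for term in parents}
    for gene, terms in annotations.items():
        stack = list(terms)
        seen = set()
        while stack:
            term = stack.pop()
            if term in seen:
                continue
            seen.add(term)
            # Using a set per term avoids double counting along multiple paths.
            covered[term].add(gene)
            stack.extend(parents[term])
    total = len(annotations)
    return {term: len(genes) / total for term, genes in covered.items()}

probabilities = term_probabilities(PARENTS, ANNOTATIONS)
print(probabilities)
```

With this estimate the root always has probability 1, so its information content is 0, and probabilities never increase when moving from a term to one of its descendants.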
9
GOTEM: example
Generation
- Binary relations:
  RAD24 [act] MEC3
  SWI4 [expr] CLN1
  SWI4 [expr] CLN2
  SWI6 [expr] CLN1
  SWI6 [expr] CLN2
  CLN1 [phos] SIC1
  CLN2 [phos] SIC1
  CDC28 [phos] SIC1
  ...
- Templates:
  T1: GO:0003689 [act] GO:0003677
  T2: GO:0003689 [act] GO:0003676
  T3: GO:0003689 [act] GO:0005488
  T4: GO:0003689 [act] GO:0003674
  T5: GO:0003677 [act] GO:0003677
  T6: GO:0003677 [act] GO:0003676
  T7: GO:0003677 [act] GO:0005488
  ...
- GO-score(Tx) = -log2((p(GOID_LHS) + p(GOID_RHS)) / 2)
Assessment
- Hypotheses and their top matching template (TM1):
  RAD24 [?] MEC3: TM1: GO:0003689 [act] GO:0003677 (GO-score=6.80)
  CLN1 [?] SWI4: TM1: GO:0003674 [exp] GO:0003674 (GO-score=0)
  MBP1 [?] CLN2: TM1: GO:0003700 [exp] GO:0016538 (GO-score=5.88)
  ...
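The GO-score formula is a direct computation once the term probabilities are known. The following is a minimal sketch; the function name `go_score` and the example probabilities for the "specific" template are hypothetical and are not taken from the slides.

```python
import math

def go_score(p_lhs, p_rhs):
    """GO-score of a template: -log2 of the mean probability of its two GO terms."""
    return -math.log2((p_lhs + p_rhs) / 2)

# The root term molecular_function (GO:0003674) has probability 1, so a
# root-to-root template scores 0, as for the second hypothesis on the slide.
root_score = go_score(1.0, 1.0)

# Hypothetical probabilities for two more specific terms (illustration only).
specific_score = go_score(0.25, 0.25)

print(root_score, specific_score)
```

The score rewards templates built from rare, specific terms: a template that only matches at the ontology root says nothing about plausibility, which is why it receives a score of 0.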
10
GOTEM: results
- Test
  - Templates created from the KEGG S. cerevisiae cell cycle
  - Reverse engineered hypotheses from microarray gene expression data
  - Assess how well templates can separate true positive interactions from false positive ones
- Results
  - Method can filter out a large proportion of implausible hypotheses
  - Hence, it improves the specificity of network reconstruction
11
GOSAP: background 1(2)
- Large base of biological pathways
- Need for pathway analysis methods:
  - Inter-species comparisons
  - Intra-species comparisons
  - Assess hypothetical pathways
- Limitations of related work
  - Previous efforts on metabolic pathways
  - Little work on approximate matching by semantic similarity
  - EC hierarchy used before, which only covers the molecular function of enzymes
12
GOSAP: background 2(2)
- We propose GOSAP: GO-based Semantic Alignment of biological Pathways [3,4]
- Contribution
  - GO has not been used before for semantic pathway alignment
  - GOSAP generalises about any kind of gene product using GO, not only enzymes
  - Richer semantic description of gene products by combining the function, process and component ontologies of GO in similarity calculations

[3] Gamalielsson, J., Olsson, B. (2005). GOSAP: Gene Ontology Based Semantic Alignment of Biological Pathways. Technical report, HS-IKI-TR-05-005, University of Skövde, Sweden.
[4] Gamalielsson, J., Olsson, B. (200x). GOSAP: GO-based Semantic Alignment of Biological Pathways. Manuscript in preparation.
13
GOSAP
[Flow diagram; legend: Procedure/algorithm vs. Data/information]
- GO graph + Organism annotation databases -> GO term probability calculation -> Enriched GO graph
- Model pathway database -> Extraction of super-paths -> Model paths
- Query pathway database -> Query paths
- Model paths + Query paths + Enriched GO graph + Parameter settings -> Path alignment -> Scored & ranked path alignments
14
GOSAP: example
Path extraction
- Only super-paths, extracted by a depth-first based algorithm, e.g.
  1. SWI4 [e]> CLN1
  2. SWI4 [e]> CLN2 [p]> SIC1
  3. MBP1 [e]> CLB5 [p]> CDC6
  ...
Path alignment
- Query paths:
  1. SWI4 [?]> CLN2
  2. MBP1 [?]> CLN1 [?]> CDC6
- Model paths:
  1. SWI4 [e]> CLN1
  2. SWI4 [e]> CLN2 [p]> SIC1
  3. MBP1 [e]> CLB5 [p]> CDC6
- Query paths are aligned against model paths
Example alignment
  Q: FAR1 ?> SIC1 (GAP) CLN2 ?> SIC1
  M: FAR1 i> CLN1 p> SWI6 e> CLN2 p> SIC1
  F: GO:0004861 > GO:0019207 (GAP) GO:0016538 > GO:0019210
  P: GO:0007050|GO:0045786 > GO:0000079 (GAP) GO:0000320|GO:0000321 > GO:0000079
  C: GO:0005634 > GO:0005634 (GAP) GO:0005634 > GO:0005634
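The path extraction step can be sketched as a depth-first traversal over binary relations. This is an illustration under stated assumptions, not the GOSAP implementation: a super-path is taken here to be a path that starts at a node with no incoming relation and is extended until it cannot grow further, and the relation list is chosen so that it yields the three model paths shown on the slide. The names `extract_super_paths` and `render` are hypothetical.

```python
from collections import defaultdict

# Binary relations (source, interaction, target) chosen to match the slide's paths.
RELATIONS = [
    ("SWI4", "e", "CLN1"),
    ("SWI4", "e", "CLN2"),
    ("CLN2", "p", "SIC1"),
    ("MBP1", "e", "CLB5"),
    ("CLB5", "p", "CDC6"),
]

def extract_super_paths(relations):
    """Depth-first enumeration of maximal paths from nodes without incoming edges."""
    graph = defaultdict(list)
    targets = set()
    for source, interaction, target in relations:
        graph[source].append((interaction, target))
        targets.add(target)
    starts = [node for node in graph if node not in targets]

    paths = []

    def dfs(node, path, visited):
        extended = False
        for interaction, target in graph.get(node, []):
            if target in visited:  # skip nodes already on the path (cycle guard)
                continue
            extended = True
            dfs(target, path + [interaction, target], visited | {target})
        if not extended:  # the path cannot grow, so it is maximal
            paths.append(path)

    for start in starts:
        dfs(start, [start], {start})
    return paths

def render(path):
    """Format a path in the slide's notation, e.g. 'SWI4 [e]> CLN1'."""
    text = path[0]
    for i in range(1, len(path), 2):
        text += f" [{path[i]}]> {path[i + 1]}"
    return text

super_paths = [render(p) for p in extract_super_paths(RELATIONS)]
print(super_paths)
```

Keeping only maximal paths avoids aligning a query against every prefix or infix of a longer model path, which would inflate the number of near-duplicate alignments.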
15
GOSAP: results
- Test
  - Model pathways: KEGG S. cerevisiae cell cycle, metabolic pathways
  - Query pathways: reverse engineered (RE) regulatory pathways, KEGG MAPK, metabolic pathways
  - Assess if GOSAP can find significant alignments of biological interest
- Results
  - Method is able to:
    - detect significant alignments between RE paths and model paths, and between different metabolic pathways
    - suggest missing gene products in query paths
  - Combined ontologies resulted in significant alignments when molecular function alone did not
16
EGOSAP: background 1(2)
- Large base of biological pathways and microarray gene expression data
- Sometimes only hypothetical sets of gene products are known
- Highly desirable to derive interactions between gene products
- Limitations of related work
  - Previous efforts merely map genes onto known pathways by identity
  - No work on approximate matching by semantic similarity
  - Related methods do not attempt to assemble hypothetical paths using a query set of gene products
17
EGOSAP: background 2(2)
- We propose EGOSAP: Evolutionary GO-based Semantic Alignment of biological Pathways [5]
- Contribution
  - GO has not been used before for semantic pathway alignment
  - GOSAP generalises about any kind of gene product using GO, not only enzymes
  - Richer semantic description of gene products by combining the function, process and component ontologies of GO in similarity calculations
  - Hypothetical paths are assembled using an evolutionary algorithm and a query set of gene products

[5] Gamalielsson, J., Corne, D. W., Olsson, B. (200x). EGOSAP: Evolutionary Gene Ontology Based Semantic Alignment of Biological Pathways. Manuscript in preparation.
18
EGOSAP
[Flow diagram; legend: Procedure/algorithm vs. Data/information]
- GO graph + Organism annotation databases -> GO term probability calculation -> Enriched GO graph
- Model pathway database -> Path extraction -> Model paths
- Model paths + Query set of gene products + Enriched GO graph + Parameter settings -> Evolution of path alignments -> Path alignments
19
EGOSAP: example
Evolutionary algorithm
    t ← 0
    initialise P(t)
    evaluate P(t)
    while (not term-cond) do
    begin
        t ← t+1
        select P(t) from P(t-1)
        alter P(t)
        evaluate P(t)
        apply elitism to P(t)
    end
- P(t): a set of gene product permutations initialised from the query alphabet
- evaluate: calculate fitness, i.e. the semantic similarity score between the model path and each evolved path in P(t)
- select: tournament selection
- alter: partially mapped crossover, mutation
Example alignment (fitness=0.73, p=0.01):
  Query (mouse): MEF2C > NR2F6 > NRBF2 > AFG3L2 > TRIM28
  Model (yeast): SWI6 > SWI4 > NDD1 > ACE2 > SFL1
  Function: GO:0003713 > GO:0003700 > GO:0016563 > GO:0008237 > GO:0016564
  Process: GO:0006366 > GO:0007049 > GO:0006357 > GO:0006508 > GO:0000122
  Component: GO:0005634 > GO:0005634 > GO:0005634 > GO:0016021 > GO:0005694
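The loop above (tournament selection, partially mapped crossover, mutation, elitism) can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the placeholder gene names, the `SIMILARITY` table that stands in for the GO-based semantic similarity score, the swap mutation, the parameter values, and the convention that the evolved path is the first len(MODEL_PATH) genes of a permutation.

```python
import random

MODEL_PATH = ["M1", "M2", "M3"]                   # hypothetical model path
QUERY_ALPHABET = ["q1", "q2", "q3", "q4", "q5"]   # hypothetical query set
# Hypothetical similarity scores in [0, 1]; unlisted pairs score 0.
SIMILARITY = {("q3", "M1"): 0.9, ("q1", "M2"): 0.8, ("q5", "M3"): 0.7,
              ("q2", "M1"): 0.4, ("q4", "M2"): 0.3}

def fitness(individual):
    """Mean position-wise similarity between the evolved path and the model path."""
    path = individual[:len(MODEL_PATH)]
    scores = [SIMILARITY.get(pair, 0.0) for pair in zip(path, MODEL_PATH)]
    return sum(scores) / len(scores)

def tournament(population, rng, size=3):
    """Tournament selection: best of a random sample."""
    return max(rng.sample(population, size), key=fitness)

def pmx(parent_a, parent_b, rng):
    """Partially mapped crossover: keep a slice of parent_a, fill the rest from parent_b."""
    size = len(parent_a)
    lo, hi = sorted(rng.sample(range(size + 1), 2))
    child = [None] * size
    child[lo:hi] = parent_a[lo:hi]
    segment = set(parent_a[lo:hi])
    mapping = {parent_a[i]: parent_b[i] for i in range(lo, hi)}
    for i in list(range(lo)) + list(range(hi, size)):
        gene = parent_b[i]
        while gene in segment:   # resolve conflicts through the slice mapping
            gene = mapping[gene]
        child[i] = gene
    return child

def mutate(individual, rng, rate=0.2):
    """Swap mutation (an assumed operator): exchange two positions."""
    child = list(individual)
    if rng.random() < rate:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    return child

def evolve(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [rng.sample(QUERY_ALPHABET, len(QUERY_ALPHABET))
                  for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=fitness)
        offspring = [mutate(pmx(tournament(population, rng),
                                tournament(population, rng), rng), rng)
                     for _ in range(pop_size - 1)]
        population = [elite] + offspring   # elitism: the best individual survives
    return max(population, key=fitness)

best = evolve()
print(best[:len(MODEL_PATH)], fitness(best))
```

Partially mapped crossover is used because the individuals are permutations: a plain one-point crossover could duplicate or drop gene products, whereas PMX always returns a valid permutation of the query alphabet.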
20
EGOSAP: results
- Test
  - Model pathway: S. cerevisiae regulatory chain motifs
  - Query set: differentially expressed genes for transgenic and knock-out mice
  - Assess if EGOSAP can evolve significant alignments (of biological interest)
- Results
  - Method is able to detect significant alignments between evolved paths and model paths
  - As for GOSAP, combined ontologies resulted in significant alignments when molecular function alone did not
21
Conclusions
- Three methods for semantic analysis of biological pathways have been developed
- The methods:
  - assess the biological plausibility of derived pathways
  - compare different pathways for semantic similarities
  - evolve hypothetical pathways similar to model pathways
- Methods are novel
- Methods are believed to be useful to biologists
22
Write-up schedule
- September 2006: Thesis contributions, thesis skeleton, set of chapters, draft of GOSAP paper
- October 2006: Submission of GOSAP paper, redrafts of earlier material, set of new chapters, draft of EGOSAP paper
- November 2006: Submission of EGOSAP paper, nearly complete thesis draft
- December 2006 - February 2007: Continual refinement of thesis
- March 2007: Submission of thesis