1
Learning rule-based models from gene expression time profiles annotated with Gene Ontology terms Jan Komorowski and Astrid Lägreid
2
J. Komorowski and A. Lägreid Joint work with Torgeir R. Hvidsten, Herman Midelfart, Astrid Lægreid and Arne K. Sandvik
3
Selected Challenges in Gene-expression Analysis
Function similarity corresponds to expression similarity, but:
–Functionally correlated genes may be expression-wise dissimilar (e.g. anti-coregulated)
–Genes usually have multiple functions
–Measurements may be approximate and contradictory
Can we obtain clusters of biologically related genes?
Can we build models that classify unknown genes into functional classes, that are human-legible, and that handle approximate and often contradictory data?
How can we re-use biological knowledge?
4
Data
Data material
–Serum-starved fibroblasts, 8,613 genes
  Added serum to medium at time = 0
  Used starved fibroblasts as reference
  Measured gene activity at various time points
–493 genes found to be differentially expressed
Results
–278 genes known (3 repeats)
–212 genes unknown (uncharacterized)
–211 genes given hypothetical function with 88% quality
5
Fibroblast - serum response
[Figure: timeline from quiescent (non-proliferating) to proliferating cells; labels: serum, samples for microarray analysis]
6
Processes
[Figure: timeline from quiescent (non-proliferating) to proliferating cells; labels: protein synthesis; lipid synthesis; stress response; cell motility; re-entry cell cycle; organelle biogenesis; transcription]
7
Dynamic processes
[Figure: timeline from quiescent (non-proliferating) to proliferating cells; labels: immediate early; delayed immediate early; intermediate; late; primary; secondary; tertiary]
8
Protein appears after the transcript
[Figure: timeline from quiescent (non-proliferating) to proliferating cells; labels: primary; secondary; tertiary]
9
Protein dynamics are not always similar to transcript dynamics
[Figure: time courses labeled gene, transcript, protein]
10
Molecular mechanisms of transcriptional response
[Figure labels: immediate early response genes; delayed immediate early response genes; intermediate/late response genes; effectors = cellular response; serum = signal; immediate early response factors; secondary transcription factors]
11
The dynamics of cellular processes
[Figure: timeline from quiescent (non-proliferating) to proliferating cells; labels: protein synthesis; DNA synthesis; energy metabolism; cell motility; stress response; cell motility; cell adhesion; DNA synthesis; lipid synthesis; cell cycle regulation; cell proliferation, negative regulation]
12
Methodology
1. Mining functional classes from an ontology
2. Extracting features for learning
3. Inducing minimal decision rules using rough sets
4. The function of unknown genes is predicted using the rules
Example rule: 0 - 4(Increasing) AND 6 - 10(Decreasing) AND 14 - 18(Constant) => GO(cell proliferation)
13
Gene Ontology
14
Biological processes from GO
–Energy pathways
–DNA metabolism
–Amino acid and derivative metabolism
–Protein targeting
–Lipid metabolism
–Transport
–Ion homeostasis
–Intracellular traffic
–Cell death
–Cell motility
–Stress response
–Organelle organization and biogenesis
–Oncogenesis
–Cell cycle
–Cell adhesion
–Cell surface receptor linked signal transduction
–Intracellular signaling cascade
–Developmental processes
–Blood coagulation
–Circulation
15
Hierarchical Clustering of the Fibroblast Data
It’s not a cluster!
16
Gene Ontology vs. Clusters found by Iyer et al.
17
Template-based feature synthesis
12 measurement points, 55 possible intervals of length >2
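The count of 55 can be checked with a short sketch. It assumes that "length >2" means an interval must contain more than two measurement points, that is, its start and end indices are at least two steps apart; this is the reading that reproduces the slide's number. The function name `candidate_intervals` is illustrative and not taken from the presentation.

```python
from itertools import combinations

def candidate_intervals(n_points, min_steps=2):
    """Return (start, end) index pairs whose endpoints are at least min_steps apart."""
    return [(i, j) for i, j in combinations(range(n_points), 2) if j - i >= min_steps]

intervals = candidate_intervals(12)
print(len(intervals))  # 55: all 66 pairs of points minus the 11 adjacent pairs
```

Each interval is a candidate sub-profile to which a template such as Increasing, Decreasing or Constant may be matched.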
18
Examples of template definitions
19
Rule example 1
Rule: 0 - 4(Constant) AND 0 - 10(Increasing) => GO(protein metabolism and modification) OR GO(mesoderm development) OR GO(protein biosynthesis)
Covered genes: M35296, J02783, D13748, X05130, X60957, D13748, U90918 (unknown)
20
Rule example 2
Rule: 0 - 4(Increasing) AND 6 - 10(Decreasing) AND 14 - 18(Constant) => GO(cell proliferation) OR GO(cell-cell signaling) OR GO(intracellular signaling cascade) OR GO(oncogenesis)
Covered genes: Y07909, X58377, U66468, X58377, X85106, Y07909
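A minimal sketch of how the conditions of such a rule could be checked against one expression profile. The template definitions used in the presentation are not preserved in this transcript, so `template_label`, its threshold of 0.6 and the example profile are all assumptions made for illustration, not the authors' definitions or data.

```python
def template_label(values, threshold=0.6):
    """Label a sub-profile. The definitions and threshold are illustrative assumptions."""
    if max(values) - min(values) < threshold:
        return "Constant"
    change = values[-1] - values[0]
    if change >= threshold:
        return "Increasing"
    if change <= -threshold:
        return "Decreasing"
    return None  # no template matches this sub-profile

def rule_matches(profile, conditions):
    """profile: dict time -> expression value; conditions: list of (start, end, label)."""
    times = sorted(profile)
    for start, end, label in conditions:
        values = [profile[t] for t in times if start <= t <= end]
        if len(values) < 2 or template_label(values) != label:
            return False
    return True

# Conditions of rule example 2: 0 - 4(Increasing) AND 6 - 10(Decreasing) AND 14 - 18(Constant)
rule = [(0, 4, "Increasing"), (6, 10, "Decreasing"), (14, 18, "Constant")]

# Made-up profile (not data from the study)
profile = {0: 0.0, 2: 0.5, 4: 1.2, 6: 1.1, 8: 0.4, 10: 0.1,
           12: 0.1, 14: 0.0, 16: 0.1, 18: 0.0}
print(rule_matches(profile, rule))  # True
```

A gene whose profile satisfies every condition is covered by the rule and receives the GO classes on its right-hand side as candidate annotations.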
21
Classification using template-based rules
[Figure: list of rules of the form IF … THEN …, including "IF 0 - 4(Constant) AND 0 - 10(Increasing) THEN GO(prot. met. and mod.) OR …", annotated "+4"]
Votes are normalized and processes with vote fractions higher than a selection-threshold are chosen as predictions
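The voting step can be sketched as follows. The slide does not state how many votes each rule contributes, so the vote counts, the threshold value of 0.3 and the name `predict_processes` are hypothetical choices for illustration only.

```python
from collections import Counter

def predict_processes(fired_rules, selection_threshold=0.3):
    """fired_rules: list of (go_classes, votes) for rules whose conditions matched the gene.

    Returns the processes whose normalized vote fraction exceeds the threshold.
    """
    tally = Counter()
    for go_classes, votes in fired_rules:
        for go in go_classes:
            tally[go] += votes
    total = sum(tally.values())
    if total == 0:
        return {}
    fractions = {go: v / total for go, v in tally.items()}
    return {go: f for go, f in fractions.items() if f > selection_threshold}

# Made-up example: three matching rules with hypothetical vote counts
fired = [
    (["protein metabolism and modification", "protein biosynthesis"], 4),
    (["protein metabolism and modification"], 3),
    (["cell adhesion"], 1),
]
print(sorted(predict_processes(fired)))
# ['protein biosynthesis', 'protein metabolism and modification']
```

Raising the selection threshold keeps only the processes with the strongest support, which trades coverage for precision.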
22
Cross validation estimates: Iyer et al.
A: Coverage: 84%, Precision: 50%
B: Coverage: 71%, Precision: 60%
C: Coverage: 39%, Precision: 90%
Coverage = TP/(TP+FN)
Precision = TP/(TP+FP)
23
Cross validation estimates: Cho et al.
Coverage: 58%, Precision: 61%
Coverage = TP/(TP+FN)
Precision = TP/(TP+FP)
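The two measures can be computed from sets of (gene, process) pairs, as in this small sketch. Counting at the level of (gene, process) pairs is an assumption, and the example pairs are made up rather than taken from the study.

```python
def coverage_and_precision(predicted, annotated):
    """Compute Coverage = TP/(TP+FN) and Precision = TP/(TP+FP) from sets of pairs."""
    tp = len(predicted & annotated)   # predicted and annotated
    fp = len(predicted - annotated)   # predicted but not annotated
    fn = len(annotated - predicted)   # annotated but not predicted
    coverage = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return coverage, precision

# Made-up (gene, process) pairs
annotated = {("g1", "cell cycle"), ("g1", "oncogenesis"),
             ("g2", "transport"), ("g3", "cell death")}
predicted = {("g1", "cell cycle"), ("g2", "transport"),
             ("g2", "cell adhesion"), ("g3", "cell death")}

coverage, precision = coverage_and_precision(predicted, annotated)
print(f"coverage={coverage:.2f} precision={precision:.2f}")  # coverage=0.75 precision=0.75
```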
24
Protein Metabolism and Modification
[Figure: expression profiles in panels A to E]
A – annotations
B – false negatives
C – false positives
D – true positives
E – pred. unknown gene
25
Re-classification of the Known Genes
26
Co-classifications for the Unknown Genes
27
Conclusions
Our methodology
–Incorporates background biological knowledge
–Handles noise and incompleteness in the microarray data well
–Can be objectively evaluated
–Predicts multiple functions per gene
–Can reclassify known genes and suggest possible new functions for them
–Can provide hypotheses about the function of unknown genes
Experimental work needs to be done to confirm our predictions
28
Genomic ROSETTA: http://www.idi.ntnu.no/~aleks/rosetta