Automated Explanation of Gene-Gene Relationships
Wacek Kuśnierczyk
slide 2 The Motivation
In various biological studies, researchers often come up with a list of (possibly related) genes.
If the relations between these genes are unknown or hypothetical, they have to be confirmed experimentally, through a database search, or both.
Manual browsing and searching is a very tedious task, and any interpretation of the results requires expert knowledge.
slide 3 The Goal
To automate the search in order to
– assist a biologist in forming explanations of actual and hypothetical relationships between sets of genes
– using various types and sources of data, various similarity assessment tools, and background (domain) knowledge
slide 4 The Field
The most important participating disciplines:
– Biology
– Computer Science
– Bioinformatics
slide 5 The Biologist’s Problem
Given a collection of genes, how can we explain the relationships between them using the available data and knowledge?
– How does gene g1 regulate (activate, inhibit) gene g2?
– What is the functional similarity of gene g3 to gene g4?
– What metabolic (signalling) pathway do genes g5 and g6 have in common in the context of disease d1?
slide 6 The Bioinformatician’s Problem
Given a collection of (biological) objects, which of their properties can we compare, how, and where can we find their values?
– Where do we find the gene sequence (protein structure) data?
– How do we assess the similarity between two gene sequences (protein structures)?
– Where do we find suitable tools, how do we use them, and how do we interpret the results?
slide 7 The Computer Scientist’s Problem
Given a collection of distributed data and the tools to link them, how do we build an explanatory path between the objects in a query?
A search problem over:
– separate, partially overlapping graphs
– coloured nodes
– coloured, weighted, dynamic edges
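The search space sketched on this slide can be held in a single typed adjacency structure. The following is a minimal illustration, not the system's actual data model. It assumes that each source contributes a node-type ("colour") map and a list of (source, target, relation type, weight) edges, and that nodes sharing an identifier across sources denote the same object. The function name `merge_graphs` and all example identifiers are hypothetical. Dynamic edges (weights that change between queries) are not modelled.

```python
from collections import defaultdict


def merge_graphs(*sources):
    """Merge partially overlapping graphs into one coloured, weighted adjacency map.

    Each source is (node_colours, edges): node_colours maps node id -> node type,
    edges is a list of (u, v, edge_colour, weight). A node id appearing in several
    sources is treated as one shared node, which is where the graphs overlap.
    """
    node_colours = {}
    adjacency = defaultdict(list)
    for colours, edges in sources:
        for node, colour in colours.items():
            node_colours.setdefault(node, colour)  # first source wins on conflicts
        for u, v, colour, weight in edges:
            # Store each relation in both directions so a search can traverse either way.
            adjacency[u].append((v, colour, weight))
            adjacency[v].append((u, colour, weight))
    return node_colours, dict(adjacency)


# Two hypothetical sources that overlap only in gene "g2".
interactions = ({"g1": "gene", "g2": "gene"}, [("g1", "g2", "interacts_with", 0.8)])
annotations = ({"g2": "gene", "GO:0006915": "go_term"},
               [("g2", "GO:0006915", "annotated_with", 1.0)])

colours, graph = merge_graphs(interactions, annotations)
print(graph["g2"])
# [('g1', 'interacts_with', 0.8), ('GO:0006915', 'annotated_with', 1.0)]
```

Because "g2" occurs in both sources, it becomes the bridge through which a search can move from interaction data to annotation data.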
slide 8 Simplified Search Space
A graph with homogeneous vertices and edges
Task: find (shortest) paths
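In this simplified setting, a shortest path between two genes is found by plain breadth-first search. The minimal sketch below assumes the graph is an undirected adjacency dictionary. The gene identifiers and the function name `shortest_path` are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search: return a path with the fewest edges, or None."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


graph = {
    "g1": ["g2", "g3"],
    "g2": ["g1", "g4"],
    "g3": ["g1", "g4"],
    "g4": ["g2", "g3", "g5"],
    "g5": ["g4"],
}
print(shortest_path(graph, "g1", "g5"))  # ['g1', 'g2', 'g4', 'g5']
```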
slide 9 More Realistic Search Space
A graph with qualitatively different vertices and qualitatively different edges, weighted with qualitatively different weights
Task: find (plausible) paths
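One way to make "plausible" concrete is to give each edge a plausibility in (0, 1] and treat a path's plausibility as the product of its edges' plausibilities. Taking −log of each edge turns that product into a sum of non-negative costs, so Dijkstra's algorithm applies. This is a sketch under that assumption, not the author's method. The per-relation trust factors, relation names, and node identifiers are all hypothetical, and edges are directed for brevity.

```python
import heapq
import math

# Hypothetical trust factor per relation type (edge "colour"); a real system would
# derive such values from domain knowledge about each kind of evidence.
RELATION_TRUST = {
    "encodes": 1.0,
    "regulates": 0.8,
    "coexpression": 0.6,
}


def most_plausible_path(graph, start, goal):
    """Return (path, plausibility) maximising the product of edge plausibilities.

    graph maps node -> list of (neighbour, relation, confidence), confidence in (0, 1].
    Each edge's plausibility is RELATION_TRUST[relation] * confidence; its cost is the
    negative log of that value, so the cheapest path is the most plausible one.
    """
    best = {start: 0.0}
    previous = {start: None}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node]:
            continue  # stale heap entry
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1], math.exp(-cost)
        for neighbour, relation, confidence in graph.get(node, []):
            new_cost = cost - math.log(RELATION_TRUST[relation] * confidence)
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    return None, 0.0


# Genes g1 and g3 are linked via gene g2 (coexpression) or via protein P1.
graph = {
    "g1": [("g2", "coexpression", 0.9), ("P1", "encodes", 1.0)],
    "g2": [("g3", "coexpression", 0.9)],
    "P1": [("g3", "regulates", 0.7)],
}
path, plausibility = most_plausible_path(graph, "g1", "g3")
print(path, round(plausibility, 4))  # ['g1', 'P1', 'g3'] 0.56
```

Both candidate paths have two edges, so a shortest-path search could not choose between them. The typed weights prefer the route through the protein (0.56) over the coexpression route (0.54 × 0.54 ≈ 0.29).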
slide 10 Even More Realistic Search Space
Each node is connected to a multitude of other nodes; the resulting combinatorial explosion makes an exhaustive search infeasible
Task: find heuristics (generic and specific) to guide the search
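A common way to tame such an explosion is beam search. After each expansion step, only a fixed number of the best partial paths, as ranked by a heuristic, are kept. The sketch below pairs it with one hypothetical generic heuristic that penalises high-degree "hub" nodes. Both the heuristic and the names `beam_search` and `avoid_hubs` are illustrative assumptions, not the system's actual search strategy.

```python
import heapq


def beam_search(graph, start, goal, score, beam_width=2, max_depth=4):
    """Heuristic beam search for a path from start to goal.

    Partial paths are extended one edge at a time; after each level only the
    `beam_width` highest-scoring ones are kept. This bounds the work per level
    at the price of completeness: a path pruned early is never reconsidered.
    """
    beam = [[start]]
    for _ in range(max_depth):
        candidates = []
        for path in beam:
            for neighbour in graph.get(path[-1], []):
                if neighbour in path:
                    continue  # skip cycles
                new_path = path + [neighbour]
                if neighbour == goal:
                    return new_path
                candidates.append(new_path)
        if not candidates:
            return None
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return None


def avoid_hubs(graph):
    """Hypothetical generic heuristic: prefer paths through low-degree nodes,
    on the assumption that highly connected hubs make weak explanations."""
    return lambda path: -sum(len(graph.get(node, [])) for node in path)


graph = {
    "g5": ["hub", "a"],
    "hub": ["g5", "x1", "x2", "x3", "g6"],
    "a": ["g5", "b"],
    "b": ["a", "g6"],
}
print(beam_search(graph, "g5", "g6", avoid_hubs(graph), beam_width=1))
# ['g5', 'a', 'b', 'g6']
```

With `beam_width=1`, the hub branch is pruned after the first step, so the search returns a longer path that avoids the hub. Domain-specific heuristics would replace or complement `avoid_hubs` with knowledge about, for example, the disease context of the query.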
slide 11 A Trivial Example
Input query
slide 12 A Trivial Example
Initial mapping
slide 13 A Trivial Example
Activation spreading
slide 14 A Trivial Example
Plausible inheritance (inference)
slide 15 A Trivial Example
Activation spreading
slide 16 A Trivial Example
Data retrieval and mapping
slide 17 A Trivial Example
Induction
slide 18 A Trivial Example
Activation spreading
slide 19 A Trivial Example
Plausible inheritance
slide 20 A Trivial Example
Data retrieval and mapping
Formulation of an explanation
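The example above alternates activation spreading with inference and data retrieval. As a rough illustration of the spreading step only, here is one common variant of spreading activation. It assumes a directed graph from query genes to concept nodes. The decay, threshold, node names, and the function name `spread_activation` are hypothetical, and this is not necessarily the mechanism used in the system described here. Concepts that receive activation from several query genes become natural candidates for linking them.

```python
def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """Spreading activation over a directed graph (one common variant).

    Starting from the seed nodes, every node activated in the previous step passes
    activation * decay, split evenly, to its neighbours. Shares below `threshold`
    are dropped, so activation dies out with distance from the seeds.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = {}
        for node, energy in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            share = energy * decay / len(neighbours)
            if share < threshold:
                continue
            for neighbour in neighbours:
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + share
        if not next_frontier:
            break
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


# Hypothetical query genes g1 and g2 linked to concept nodes.
graph = {
    "g1": ["apoptosis", "kinase activity"],
    "g2": ["apoptosis", "ion transport"],
}
activation = spread_activation(graph, {"g1": 1.0, "g2": 1.0})
print(max((n for n in activation if n not in ("g1", "g2")), key=activation.get))
# apoptosis (0.5, reached from both seeds; the other concepts get 0.25 each)
```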
slide 21 Explanation Schema
slide 22 System Architecture
slide 23 Related Work
– Basic research in gastric cancer
– Genomic & proteomic data warehouse
– Syntactic & semantic database integration
– Natural language understanding
– Knowledge representation & modelling
– Knowledge-intensive reasoning and learning
slide 24 Concerns
Is it reasonable? (What do biologists say?)
Is it possible? (What do bioinformaticians say?)
Is it feasible? (What do computer scientists say?)
Isn’t it too ambitious (for a PhD study)?
slide 25 Disclaimer
An in silico solution is actually a hypothesis that requires physical (experimental) confirmation.
slide 26 Acknowledgments
Agnar Aamodt, IDI.IME (AI, ML, CBR)
Astrid Lægreid, IKM.DMF (biology, bioinformatics)
Arne Sandvik, IKM.DMF (medicine)
Frode Sørmo, IDI.IME (Creek)