C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases
Fan Guo, Lei Li, Eric Xing, Christos Faloutsos
Carnegie Mellon University
{fanguo, leili, epxing,
Background
Fruit-fly development in genetic study:
– Genes controlling the body plan and organ patterning are similar to those in higher animals, including humans.
Objective: a framework for applying data mining techniques to assist biological research.
The Graph Representation
– Layers: Images, Genes, Keywords (example keyword: embryonic hindgut)
– Image-layer edges: nearest neighbors in feature space
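
A minimal sketch of how such a mixed graph could be assembled, assuming each image has a numeric feature vector and that image-gene and image-keyword links come from annotation records. The function name `build_graph`, the `(layer, name)` node scheme, the Euclidean distance, the parameter `k`, and the toy data are all illustrative assumptions, not details taken from the slides.

```python
import math
from collections import defaultdict


def build_graph(features, image_genes, image_keywords, k=2):
    """Return an undirected adjacency map {node: set(neighbors)}.

    features:       {image_id: [float, ...]}   hypothetical feature vectors
    image_genes:    {image_id: gene_name}      hypothetical annotations
    image_keywords: {image_id: [keyword, ...]} hypothetical annotations
    k:              nearest neighbors linked per image (assumed parameter)
    """
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    # Cross-layer edges: image-gene and image-keyword.
    for img, gene in image_genes.items():
        link(("image", img), ("gene", gene))
    for img, words in image_keywords.items():
        for word in words:
            link(("image", img), ("keyword", word))

    # Image-layer edges: each image links to its k nearest neighbors
    # in feature space (Euclidean distance is an assumption here).
    for img, vec in features.items():
        others = sorted(
            (o for o in features if o != img),
            key=lambda o: math.dist(vec, features[o]),
        )
        for other in others[:k]:
            link(("image", img), ("image", other))
    return graph


# Toy data, invented purely for illustration.
features = {"img1": [0.0, 0.0], "img2": [0.1, 0.0], "img3": [5.0, 5.0]}
image_genes = {"img1": "geneA", "img2": "geneA", "img3": "geneB"}
image_keywords = {"img1": ["embryonic hindgut"], "img3": ["embryonic hindgut"]}
g = build_graph(features, image_genes, image_keywords, k=1)
```

Because edges are stored symmetrically, an image can end up with more than `k` image-layer neighbors when other images select it as their nearest neighbor.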
Proximity Measure
Random Walk with Restart
– Start from a node s;
– Randomly walk to a neighbor, with probability 1-c;
– Restart at s, with probability c;
– Compute the steady-state probability vector.
– Complexity: O(E), but faster methods exist (Tong et al., ICDM’06)
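
The steps above can be sketched as a plain power iteration. This is an illustrative version under stated assumptions, not the system's actual implementation: the function name `rwr`, the default restart probability `c=0.15`, the fixed iteration count, and the three-node toy graph are all hypothetical choices.

```python
def rwr(graph, start, c=0.15, iters=100):
    """Random walk with restart, approximated by power iteration.

    graph: {node: [neighbor, ...]}; every node needs at least one neighbor
    start: the restart node s
    c:     restart probability (the walker moves to a neighbor with 1 - c)
    iters: number of iterations (assumed to be enough for convergence)
    """
    p = {v: 0.0 for v in graph}
    p[start] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        for v, mass in p.items():
            # With probability 1 - c, spread the mass evenly over neighbors.
            share = (1.0 - c) * mass / len(graph[v])
            for u in graph[v]:
                nxt[u] += share
        # With probability c, jump back to the start node.
        nxt[start] += c
        p = nxt
    return p


# Toy path graph s - u - v, invented for illustration.
toy = {"s": ["u"], "u": ["s", "v"], "v": ["u"]}
scores = rwr(toy, "s")
```

Each pass visits every edge a constant number of times, so one iteration costs time proportional to the number of edges. Nodes closer to the start node end up with higher steady-state scores than nodes at the same degree farther away, which is what makes the vector usable as a proximity measure.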
Proximity Measure
Computing the Steady-State Probability
– Labeled terms of the equation: desired probability vector; adjacency matrix; vector w/ non-zero entry for restart nodes
– Complexity: O(E), but faster methods exist (Tong et al., ICDM’06)
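
The slide's equation did not survive as text. Under the usual random-walk-with-restart formulation (an assumption), it can be written as below. The symbols are chosen here for illustration: $\mathbf{p}$ is the desired probability vector, $\mathbf{W}$ is the column-normalized adjacency matrix, $\mathbf{e}_s$ is the vector with non-zero entries for the restart nodes, and $c$ is the restart probability.

```latex
\mathbf{p} = (1 - c)\,\mathbf{W}\,\mathbf{p} + c\,\mathbf{e}_s
\quad\Longrightarrow\quad
\mathbf{p} = c\,\bigl(\mathbf{I} - (1 - c)\,\mathbf{W}\bigr)^{-1}\mathbf{e}_s
```

Iterating the left-hand form needs one sparse matrix-vector product per step, which touches each edge once. That matches the quoted O(E) cost if the figure is read as per-iteration.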
Multi-Modal Query Results
– 2D Expression Images
– Genes
– Annotation Terms
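
One way such per-modality result lists could be produced from a single proximity vector is sketched below. It assumes nodes are keyed as `(modality, name)` pairs and that scores have already been computed. The function name `top_k_by_modality` and all numbers in the toy input are made up for illustration and are not results from the system.

```python
def top_k_by_modality(scores, query, k=3):
    """Group proximity scores by node type and keep the k best per type.

    scores: {(modality, name): proximity}, e.g. from a random walk with restart
    query:  the query node, excluded from its own result list
    k:      number of results kept per modality (assumed parameter)
    """
    results = {}
    for node, score in scores.items():
        if node == query:
            continue
        modality, name = node
        results.setdefault(modality, []).append((name, score))
    for ranked in results.values():
        # Highest proximity first, then truncate to the top k.
        ranked.sort(key=lambda item: item[1], reverse=True)
        del ranked[k:]
    return results


# Invented scores for a hypothetical gene query.
toy_scores = {
    ("gene", "geneA"): 0.30,
    ("image", "img1"): 0.20,
    ("image", "img2"): 0.15,
    ("image", "img3"): 0.05,
    ("keyword", "embryonic hindgut"): 0.10,
    ("gene", "geneB"): 0.02,
}
results = top_k_by_modality(toy_scores, ("gene", "geneA"), k=2)
```

The same ranking, restricted to keyword nodes for an image query, would give a simple form of caption suggestion.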
More Mining Tasks
– Image auto-caption
– Gene function identification
Related Work
– Berkeley Drosophila Genome Project
– FlyExpress
– Berkeley Drosophila Transcription Network Project (bdtnp.lbl.gov)
System Architecture
– Browser-based UI <-> Tomcat Web Server (JSP Application), over HTTP: queries in, result pages out
– Tomcat Web Server <-> Computing Engine, over RMI: remote function calls in, results out