Answering Cross-Source Keyword Queries Over Biological Data Sources
  Fan Wang, Gagan Agrawal
  Ohio State University
Motivation
  Many biological data sources are on the deep web
    Only the query interface, not the contents, is on the surface web
  An easy mechanism is needed for finding this information
  Approach: a simple keyword interface
    Needs an ontology and query planning
Our Contribution: SEEDEEP System
  Discover data source metadata
  Generate query plans for search
  Fault tolerance mechanism
  Query caching mechanism
Query Planning Problem
  Ordinary query format
    Entity keywords, attribute keywords, comparison predicates
    Standard select-project-join SQL query style
  Formulation
    Sub-graph set cover problem, NP-hard
    Target: a data source subgraph, which can have disconnected components
    Which nodes cover which terms
    The size should be minimal (our cost model)
    Node and edge ranking functions
    Starting data source
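To make the covering part of the formulation concrete, here is a minimal Python sketch of the textbook greedy heuristic for set cover. It is not SEEDEEP's planning algorithm: it ignores edges between data sources, the ranking functions, and the starting data source, and only shows how a small set of sources can be chosen so that every query term is covered. The source names, the terms, and the function `greedy_cover` are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_cover(terms: Set[str], coverage: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick data sources until every query term is covered.

    `coverage` maps a (hypothetical) data source name to the terms it can
    answer. This is the standard greedy approximation for set cover, used
    here purely as an illustration.
    """
    uncovered = set(terms)
    chosen: List[str] = []
    while uncovered:
        # Pick the source that covers the most still-uncovered terms.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda s: len(coverage[s] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            # No remaining source helps, so the query cannot be answered.
            raise ValueError(f"no source covers: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical sources and the query terms each one can answer.
coverage = {
    "source_a": {"gene", "protein"},
    "source_b": {"protein", "snp"},
    "source_c": {"snp", "frequency", "gene"},
}
plan = greedy_cover({"gene", "protein", "snp", "frequency"}, coverage)
print(plan)
```

A real planner for the sub-graph formulation would also have to account for how the chosen sources connect to one another, which this sketch leaves out.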