1 Integrating Databases into the Semantic Web through an Ontology-based Framework Dejing Dou, Paea LePendu, Shiwoong Kim Computer and Information Science, University of Oregon, USA Peishen Qi Computer Science Department, Yale University, USA April, SWDB’06
2 Outline Introduction – The status of the Semantic Web – Realizing SW needs existing databases OntoGrate: An Ontology-based Information Integration Framework – Some previous work – Modules in OntoGrate Architecture Case Study for integrating Databases into SW – Without an existing domain ontology – With an existing domain ontology Conclusion and Future Work
3 The Semantic Web One major goal of the Semantic Web is that web-based agents can process and “understand” data [Berners-Lee et al. 01]. Ontologies formally describe the semantics of data, and web-based agents can take SW documents (e.g., in RDF/OWL) as a set of assertions (true statements) and draw inferences from them. [Figure: human, SW, web-based agents]
4 What do we have now? DAML+OIL, OWL (Web Ontology Language). More and more domain ontologies are defined in DAML+OIL/OWL, even for some specific domains (e.g., GO). We are developing some tools, agents, and services. See
5 Two things are important Real Data for sharing – relational databases (may be the biggest resource) – Other kinds of databases – WWW/XML data – Some knowledge bases Better Semantic Web Services/Agents
6 Semantic Annotation for Data? It works well for small data resources. It works less well for large data resources (relational databases): – “Redundant” copies – Time-consuming query answering. E.g., it currently works by loading OWL data into a knowledge base and then answering queries with DL ABox reasoning. (Can it compete with existing DBMSs, which have well-developed indexing and query optimization techniques?) It is better if relational databases can be accessed/queried directly by SW agents/services.
7 The difficulties The Semantic Web: ontologies define the semantics of data. The Relational DBs: schemas define the structure and integrity constraints.
8 A more general question How can we make databases, SW resources, WWW/XML data, KBs work together? The problem is similar – SW resources and KBs are defined by ontologies, which are more expressive and focus on semantics – Databases and XML documents are defined by schemas, which focus on structure – Syntax difference (e.g., OWL vs. SQL)
9 OntoGrate: An Ontology-based Information Integration System
10 Some Previous Work Schemas (e.g., stores7 DB in IBM Informix)
11 Some Previous Work Schemas, Ontologies and Web-PDDL
Schema element | Ontology / Web-PDDL element
Relation | Type/Class
Attribute | Predicate/Property
Integrity Constraint | Axiom/Rule
Primary Key | Fact/Instance
12 Some Previous Work Merging Ontologies with Bridging Axioms
13 Some Previous Work The Bridge Axiom/mapping on customerfname/customerlname vs. customercontactname : (forall (c f l (if (and c f) c l)) c f l))))
14 Some Previous Work The Bridge Axiom/mapping on customerregion vs. customerstatecode/statename/statecode : (forall (x y (if x y) (exists (z t (and x t) z y) z t)))))
15 Some Previous Work Inferential Data Integration with OntoEngine – Data Translation: View data as true statements, e.g., (statecode S#28 “OR”) (M s_t ; s ) D t only if (M s_t ; s ) ╞ t (M s_t ; s ) D t (M s_t ; s ) ├ t (M s_t ; s ) ╞ t – Query Translation: (M s_t ; s ) Q t only if (M s_t ; ( t )) ╞ ( s ) (M s_t ; s ) Q t (M s_t ; ( t )) ├ ( s ) (M s_t ; ( t )) ╞ ( s )
16 OntoGrate Architecture Revisited
17 Modules in OntoGrate Architecture The Syntax Translators (Wrappers) – e.g., PDDSQL (SQL ↔ Web-PDDL), PDDOWL (OWL ↔ Web-PDDL) The Matching (correspondence) Generation – e.g., name and structure (tree, graph) similarity, synonyms and is-a (part-of) relationships using thesauri and dictionaries such as WordNet The Data Mining Module The Machine Learning Module The Inference Engine (OntoEngine) The User Interface
18 Learning the mappings from domain experts (forall (x (if x) (and x 6) x 3))))
19 Mining the mappings from large datasets For example, consider two medical databases in the same hospital: DB1 lists the blood pressure of patients with nominal values, such as low, normal, at risk, and high, while DB2 records the exact numerical values for systolic and diastolic pressure. By association rule mining, we may get a rule/mapping that relates the thresholds 140 (systolic) and 90 (diastolic) to 'High' (support = 40%, confidence = 90%)
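The support and confidence attached to a mined mapping can be illustrated with a small sketch. The patient records, the field names and the exact threshold test (systolic >= 140 and diastolic >= 90) are assumptions made for illustration only; the slide gives neither the data nor the comparison operators, and this is not the OntoGrate data mining module.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def rule_stats(
    records: List[Record],
    antecedent: Callable[[Record], bool],
    consequent: Callable[[Record], bool],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent => consequent."""
    total = len(records)
    matched = [r for r in records if antecedent(r)]
    both = [r for r in matched if consequent(r)]
    # support: fraction of all records satisfying both sides of the rule
    support = len(both) / total if total else 0.0
    # confidence: fraction of antecedent matches that also satisfy the consequent
    confidence = len(both) / len(matched) if matched else 0.0
    return support, confidence


# Hypothetical joined view of DB1 (nominal label) and DB2 (numeric readings).
patients = [
    {"systolic": 150, "diastolic": 95, "bp_label": "high"},
    {"systolic": 145, "diastolic": 92, "bp_label": "high"},
    {"systolic": 142, "diastolic": 91, "bp_label": "at risk"},
    {"systolic": 118, "diastolic": 76, "bp_label": "normal"},
    {"systolic": 95, "diastolic": 60, "bp_label": "low"},
]

support, confidence = rule_stats(
    patients,
    # Assumed form of the rule's left-hand side.
    lambda r: r["systolic"] >= 140 and r["diastolic"] >= 90,
    lambda r: r["bp_label"] == "high",
)
print(f"support={support:.0%} confidence={confidence:.0%}")
```

A rule is usually kept as a candidate mapping only when both values exceed chosen minimum thresholds, which a domain expert can then confirm or reject.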
20 Case Study in Two Scenarios Integrating DBs into SW without an existing domain ontology Integrating DBs into SW with an existing domain ontology
21 Without an existing domain ontology
22 Generating OWL ontologies from DB Schemas SQL schema → Web-PDDL (by using PDDSQL) Web-PDDL → OWL (by using PDDOWL) – E.g., Stores7.sql → Stores7.pddl → Stores7.owl ... <rdfs:subClassOf rdf:resource=“
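The idea of deriving an ontology from a schema can be sketched as follows, using the correspondence relation → class and attribute → property. This is a simplified illustration, not PDDSQL or PDDOWL: it skips the intermediate Web-PDDL step, the function name `schema_to_owl` and the dictionary-based schema description are hypothetical, and the choice of `owl:DatatypeProperty` with XSD ranges is an assumption.

```python
from typing import Dict
from xml.sax.saxutils import quoteattr

XSD = "http://www.w3.org/2001/XMLSchema#"


def schema_to_owl(schema: Dict[str, Dict[str, str]]) -> str:
    """Emit OWL (RDF/XML) declarations for a schema.

    schema maps each relation name to {attribute name: XSD datatype name}.
    Each relation becomes an owl:Class; each attribute becomes an
    owl:DatatypeProperty whose domain is that class.
    """
    lines = []
    for relation, attributes in schema.items():
        lines.append(f"<owl:Class rdf:ID={quoteattr(relation)}/>")
        for attribute, xsd_type in attributes.items():
            lines.append(f"<owl:DatatypeProperty rdf:ID={quoteattr(attribute)}>")
            lines.append(f"  <rdfs:domain rdf:resource={quoteattr('#' + relation)}/>")
            lines.append(f"  <rdfs:range rdf:resource={quoteattr(XSD + xsd_type)}/>")
            lines.append("</owl:DatatypeProperty>")
    return "\n".join(lines)


# A hypothetical fragment of the Stores7 schema (attribute types are assumed).
stores7_fragment = {
    "Customer": {
        "customerfname": "string",
        "customerlname": "string",
        "customercity": "string",
    }
}

owl_fragment = schema_to_owl(stores7_fragment)
print(owl_fragment)
```

The output is only the body of an ontology; a complete OWL document would also need the enclosing `rdf:RDF` element with namespace declarations.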
23 An OWL-QL query based on Stores7.owl …
24 The corresponding Web-PDDL and SQL queries Web-PDDL query (from the OWL-QL query, via PDDOWL): (and (customercity ?C - Customer "Eugene") (customerfname ?C - Customer ?x - String) (customerlname ?C - Customer ?y - String)) SQL query (via PDDSQL): SELECT C.customerfname, C.customerlname FROM Customer C WHERE C.customercity = "Eugene"
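A rough sketch of how a conjunctive query of this shape could be turned into SQL is given below. It is not PDDSQL: it assumes every atom refers to a column of one table, the function name `atoms_to_sql` and the tuple encoding of atoms are hypothetical, SQLite stands in for the Informix database, and the Portland row is invented test data.

```python
import sqlite3
from typing import List, Tuple

Atom = Tuple[str, str, str]  # (predicate, subject variable, variable or constant)


def atoms_to_sql(table: str, atoms: List[Atom]) -> Tuple[str, List[str]]:
    """Translate single-table atoms into a parameterized SELECT statement.

    Atoms whose last argument is a variable ("?x") become selected columns;
    atoms whose last argument is a constant become WHERE conditions.
    Table and column names are trusted identifiers in this sketch.
    """
    select_cols, conditions, params = [], [], []
    for predicate, _subject, obj in atoms:
        if obj.startswith("?"):
            select_cols.append(predicate)
        else:
            conditions.append(f"{predicate} = ?")
            params.append(obj)
    sql = f"SELECT {', '.join(select_cols) or '*'} FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


query = [
    ("customercity", "?C", "Eugene"),
    ("customerfname", "?C", "?x"),
    ("customerlname", "?C", "?y"),
]
sql, params = atoms_to_sql("Customer", query)

# Run the generated SQL against a tiny in-memory stand-in for Stores7.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (customerfname TEXT, customerlname TEXT, customercity TEXT)"
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [
        ("Paea", "LePendu", "Eugene"),
        ("Dejing", "Dou", "Eugene"),
        ("Ann", "Example", "Portland"),  # invented row that must be filtered out
    ],
)
rows = conn.execute(sql, params).fetchall()
print(sql, rows)
```

Passing the constant as a query parameter rather than splicing it into the SQL string avoids quoting problems with values such as "Eugene".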
25 Getting Answers from Stores7 DB SQL result (via PDDSQL):
customerfname | customerlname
Paea | LePendu
Dejing | Dou
Shiwoong | Kim
Web-PDDL bindings: {?x/Paea, ?y/LePendu} {?x/Dejing, ?y/Dou} {?x/Shiwoong, ?y/Kim} OWL-QL answer (via PDDOWL): <owl-ql:answerBundle xmlns:owl-ql=" owl-ql-syntax#"...> (1000 bindings/3 secs) (1000/100,000/3secs)
26 With an existing domain ontology Order ontology:
27 An OWL-QL query based on order.owl …
28 The Bridging Axioms/Mappings between Stores7.pddl and (forall (P A z - String) (if (and P A) A z)) P z))) (forall (C z - String) (if P z) (exists (A (and P A) A z)))))
29 The Bridging Axioms/Mappings between Stores7.pddl and (forall (C x - String) (iff C x) C x))) (forall (C y - String) (iff C y) C y)))
30 The Query Translation between Stores7 and Order OWL-QL query in order.owl, translated by PDDOWL into Web-PDDL: (and (hasAddress ?C - Person ?A - Address) (City ?A "Eugene") (FirstName ?C - Person ?x - String) (LastName ?C - Person ?y - String)) Translated by OntoEngine using the bridging axioms (< 1 sec): (and (customercity ?C - Customer "Eugene") (customerfname ?C - Customer ?x - String) (customerlname ?C - Customer ?y - String))
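The translation shown on this slide can be mimicked with a few hard-coded rewrite rules. The sketch below is an assumption-laden illustration, not OntoEngine's inference procedure: the three rules are read off the two queries on this slide, the tuple encoding and the function name `translate_query` are hypothetical, and dropping the `hasAddress` atom is only safe when the address variable is used nowhere except in `City` atoms.

```python
from typing import List, Tuple

Atom = Tuple[str, str, str]  # (predicate, first argument, second argument)

# Assumed one-to-one predicate correspondences (order ontology -> Stores7).
RENAMES = {"FirstName": "customerfname", "LastName": "customerlname"}


def translate_query(atoms: List[Atom]) -> List[Atom]:
    """Rewrite a query over the order ontology into Stores7 predicates."""
    # Map each address variable to the person variable that owns it.
    owner = {addr: person for pred, person, addr in atoms if pred == "hasAddress"}
    translated = []
    for pred, first, second in atoms:
        if pred == "hasAddress":
            continue  # absorbed into the customercity atom below
        if pred == "City" and first in owner:
            # hasAddress(P, A) and City(A, z) together become customercity(P, z)
            translated.append(("customercity", owner[first], second))
        elif pred in RENAMES:
            translated.append((RENAMES[pred], first, second))
        else:
            raise ValueError(f"no rewrite rule for predicate {pred!r}")
    return translated


order_query = [
    ("hasAddress", "?C", "?A"),
    ("City", "?A", "Eugene"),
    ("FirstName", "?C", "?x"),
    ("LastName", "?C", "?y"),
]
stores7_query = translate_query(order_query)
print(stores7_query)
```

A general inference engine would derive such rewrites from the axioms themselves instead of hard-coding them, which is what allows new mappings to be added without changing the translator.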
31 Final Answers in the order ontology SQL result (via PDDSQL):
customerfname | customerlname
Paea | LePendu
Dejing | Dou
Shiwoong | Kim
…
Facts in the Stores7 vocabulary: (customerfname C1 Paea) (customerlname C2 LePendu) (customerfname C1 Dejing) … Translated by OntoEngine using the bridging axioms (40,000 facts/30 secs): (FirstName C1 Paea) (LastName C2 LePendu) (FirstName C1 Dejing) … PDDOWL (10,000 facts/11 secs)
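Translating answers back into the order ontology is the reverse direction: facts in the Stores7 vocabulary are rewritten into order-ontology facts. The following sketch assumes the same predicate correspondences as the query on the previous slide and invents a fresh constant for each address (the `AddrN` names are hypothetical placeholders for an existentially quantified address). It is a hand-written forward rewrite, not OntoEngine.

```python
from itertools import count
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str]  # (predicate, subject, value)

# Assumed one-to-one predicate correspondences (Stores7 -> order ontology).
RENAMES = {"customerfname": "FirstName", "customerlname": "LastName"}


def translate_facts(facts: List[Fact]) -> List[Fact]:
    """Rewrite Stores7 facts into order-ontology facts."""
    fresh = count(1)
    address_of: Dict[str, str] = {}  # customer -> generated address constant
    translated = []
    for pred, subject, value in facts:
        if pred in RENAMES:
            translated.append((RENAMES[pred], subject, value))
        elif pred == "customercity":
            # customercity(P, z) implies some address A with
            # hasAddress(P, A) and City(A, z); name that address once per customer.
            if subject not in address_of:
                address_of[subject] = f"Addr{next(fresh)}"
                translated.append(("hasAddress", subject, address_of[subject]))
            translated.append(("City", address_of[subject], value))
        # Facts with no known correspondence are skipped in this sketch.
    return translated


stores7_facts = [
    ("customerfname", "C1", "Paea"),
    ("customerlname", "C1", "LePendu"),
    ("customercity", "C1", "Eugene"),
]
order_facts = translate_facts(stores7_facts)
print(order_facts)
```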
32 Some related work Semantic Annotation – [Stojanovic maps relational model to frame logic/RDF. – DOGMA[Verheyden translates an ontology query to SQL Schema and Ontology mapping – Similarity matching, machine learning… useful for generating candidate matchings – Semi-automatic tool (Clio) Data integration and query answering – Federated databases[Sheth&Larson 90], data warehouse, peer-to-peer management [Halevy MiniCon uses query rewriting at GLV Logic and Databases – Reiter’s reconstruction of relational model in FOL. – Carnot, SIMS, Information Manifold by using a global ontology, DL or Datalog
33 Conclusion and Future work We applied OntoGrate, an ontology-based information integration framework, to integrate relational databases with the Semantic Web. The test results from the two scenarios are promising. We are developing other modules (e.g., learning/mapping/UI) in OntoGrate. Scalability and efficiency need to be investigated on larger-size data resources. We plan to extend the current work to integrate XML (with/without XML schemas or DTD) and the Semantic Web.
34 Thank you for your attention!