ASWC08 Semantically Conceptualizing and Annotating Tables
  Stephen Lynn & David W. Embley
  Data Extraction Research Group
  Department of Computer Science
  Brigham Young University
  Supported by the
Overview
  Context
    WoK: Web of Knowledge
    TANGO: Table ANalysis for Generating Ontologies
    MOGO: Mini-Ontology GeneratOr
  Semantic Enrichment via MOGO
    Implementation
    Experimentation
    Enhancements
  Challenges & Opportunities
WoK: a Web of Knowledge
TANGO
  TANGO repeatedly turns raw tables into conceptual mini-ontologies and integrates them into a growing ontology.
  [Figure: sample raw table and the Growing Ontology]
MOGO
  MOGO generates mini-ontologies from interpreted tables.
MOGO Overview
  Table Interpretation: yields a canonical table
  Concept/Value Recognition, Relationship Discovery, Constraint Discovery: yield a semantically enriched conceptual model (a mini-ontology)
  Integration into a growing ontology
Sample Input
  Region and State Information
  Location    Population (2000)  Latitude  Longitude
  Northeast   2,122,869
  Delaware    817,…
  Maine       1,305,…
  Northwest   9,690,665
  Oregon      3,559,…
  Washington  6,131,…
Sample Output
Concept/Value Recognition
  Lexical Clues
    Labels as data values
    Data value assignment
  Data Frame Clues
    Labels as data values
    Data value assignment
  Default
    Recognize concepts and values by syntax and layout
Concept/Value Recognition
  Concepts and Value Assignments
    Concepts: Location, Region, State
    Values: Northeast, Northwest, Delaware, Maine, Oregon, Washington
Concept/Value Recognition
  Concepts and Value Assignments
    Concepts: Population, Latitude, Longitude, Year
    Values: 2,122,…; …,376; 1,305,493; 9,690,665; 3,559,547; 6,131,…
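The "labels as data values" idea can be illustrated with a small sketch. It is not MOGO's implementation: the two regular-expression recognizers, the names `DATA_FRAMES`, `recognize`, and `split_label`, and the "Count" concept are all assumptions made for illustration. The sketch splits a column label such as "Population (2000)" into a concept name and an embedded value, then asks a toy data-frame recognizer what kind of value it is.

```python
import re

# Hypothetical mini "data frames": concept name -> pattern for its instances.
# Both patterns are illustrative assumptions, not taken from MOGO.
DATA_FRAMES = {
    "Year": re.compile(r"(?:1[5-9]|20)\d{2}"),
    "Count": re.compile(r"\d{1,3}(?:,\d{3})+"),
}


def recognize(text):
    """Return the first concept whose pattern matches the whole string, else None."""
    for concept, pattern in DATA_FRAMES.items():
        if pattern.fullmatch(text.strip()):
            return concept
    return None


def split_label(label):
    """Split 'Name (value)' into (name, value, concept recognized for value)."""
    match = re.fullmatch(r"\s*(.+?)\s*\((.+)\)\s*", label)
    if not match:
        # No parenthesized part: the label is only a concept name.
        return label.strip(), None, None
    name, inner = match.group(1), match.group(2)
    return name, inner, recognize(inner)


print(split_label("Population (2000)"))
print(recognize("9,690,665"))
print(recognize("Oregon"))
```

Under these assumptions the label yields the concept "Population" plus the value "2000" recognized as a Year, while a string that no recognizer matches falls through to the default syntax-and-layout handling.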
Relationship Discovery
  Dimension Tree Mappings
  Lexical Clues
    Generalization/Specialization
    Aggregation
  Data Frames
  Ontology Fragment Merge
Constraint Discovery
  Generalization/Specialization
  Computed Values
  Functional Relationships
  Optional Participation

  Region and State Information
  Location    Population (2000)  Latitude  Longitude
  Northeast   2,122,869
  Delaware    817,…
  Maine       1,305,…
  Northwest   9,690,665
  Oregon      3,559,…
  Washington  6,131,…
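Three of the constraint kinds listed above can be sketched as simple checks over a canonical table. This is a rough illustration under stated assumptions, not MOGO's algorithm: the row layout, the toy values in `ROWS`, and the function names are hypothetical, and the computed-value check only tests for sums.

```python
# Hypothetical canonical rows: (location, parent location, population, latitude).
# The values are toy numbers, not the figures from the sample table.
ROWS = [
    ("RegionA", None, 300, None),
    ("StateA1", "RegionA", 100, 45.0),
    ("StateA2", "RegionA", 200, 44.0),
]


def is_functional(pairs):
    """Functional relationship: every left value maps to exactly one right value."""
    seen = {}
    for left, right in pairs:
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_optional(values):
    """Optional participation: at least one object has no value."""
    return any(value is None for value in values)


def parents_are_sums(rows):
    """Computed value: each parent's population equals the sum of its children's."""
    totals = {}
    for _, parent, population, _ in rows:
        if parent is not None:
            totals[parent] = totals.get(parent, 0) + population
    own = {name: population for name, _, population, _ in rows}
    return bool(totals) and all(own.get(p) == t for p, t in totals.items())


print(is_functional([(name, pop) for name, _, pop, _ in ROWS]))
print(is_optional([row[3] for row in ROWS]))
print(parents_are_sums(ROWS))
```

In this toy data, Location determines Population functionally, Latitude participation is optional because one row has no value, and the region total is computable from its states.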
Validation
  Concept/Value Recognition
    Correctly identified concepts
    Missed concepts
    False positives
    Data values assignment
  Relationship Discovery
    Valid relationship sets
    Invalid relationship sets
    Missed relationship sets
  Constraint Discovery
    Valid constraints
    Invalid constraints
    Missed constraints

                          Precision  Recall  F-measure
  Concept Recognition     87%        94%     90%
  Relationship Discovery  73%        81%     77%
  Constraint Discovery    89%        91%     90%
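For readers who want to see how the three measures relate, here is a minimal sketch. It assumes the standard definitions (precision = correct / (correct + incorrect), recall = correct / (correct + missing), F-measure = harmonic mean of the two); the slides do not spell out the formulas, and the counts in the first call are hypothetical. The reported F-measures are consistent with the harmonic mean of the reported precision and recall after rounding.

```python
def precision(correct, incorrect):
    """Fraction of generated items that are correct."""
    return correct / (correct + incorrect)


def recall(correct, missing):
    """Fraction of expected items that were found."""
    return correct / (correct + missing)


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts, for illustration only.
print(precision(correct=9, incorrect=1), recall(correct=9, missing=3))

# Precision and recall as reported in the table above.
REPORTED = {
    "Concept Recognition": (0.87, 0.94),
    "Relationship Discovery": (0.73, 0.81),
    "Constraint Discovery": (0.89, 0.91),
}
for task, (p, r) in REPORTED.items():
    print(f"{task}: F = {f_measure(p, r):.2f}")
```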
Concept Recognition
  Counted:
    Correct/Incorrect/Missing Concepts
    Correct/Incorrect/Missing Labels
    Data value assignments
Relationship Discovery
  Counted:
    Correct/incorrect/missing relationship sets
    Correct/incorrect/missing aggregations and generalization/specializations
Constraint Discovery
  Counted: Correct/Incorrect/Missing:
    Generalization/Specialization constraints
    Computed value constraints
    Functional constraints
    Optional constraints
Concept Recognition
  Successes
    98% of concepts identified
    Missing label identification
    97% of values assigned to correct concept
  Common problems
    Finding an appropriate label
    Duplicate concepts
Relationship Discovery
  Recall of 92% for relationship sets
  Missing aggregations and generalization/specializations (only found in label nesting)
  Unnecessary relationship sets generated (they are computable)
Constraint Discovery
  F-measure of 98% for functional relationship sets
  Computed value discovery
  Functional/non-functional lists in cells
MOGO Contributions
  Tool to generate mini-ontologies
  Accuracy encouraging

                          Precision  Recall  F-measure
  Concept Recognition     87%        94%     90%
  Relationship Discovery  73%        81%     77%
  Constraint Discovery    89%        91%     90%
Opportunities & Challenges: MOGO
  Enhancements
    Check for inter-label relationships
    Check for more complex computations
    Check for lists in cells
    …
  Wish List
    Data-frame library
    Atomic knowledge components
    Instance recognizers
    Library of molecular components
    Semi-automatic construction of a WordNet-like resource for knowledge components
Summary
  MOGO
    Semantic Enrichment
    Encouraging Results
    But More Possible
  Broader Implications ~ Vision & Challenges
    TANGO
    WoK
    Web of Data
    Semantic Annotation
    User-friendly Query Answering
Opportunities & Challenges: TANGO
  Table Interpretation
    Transforming tables to F-logic [Pivk07]
    Layout-independent table representation [Jha08]
    Table interpretation by sibling tables [Tao07]
  Semantic Enhancement / Ontology Generation
    Naming unnamed table concepts [Pivk07]
    MOGO [Lynn09]
  Semi-automatic Ontology Integration
    Ontology Matching [Euzenat07]
    Ontology-mapping tools [Falconer07]
    Direct and indirect schema mappings for TANGO [Xu06]
Opportunities & Challenges: WoK
  Web of Data
    "The Semantic Web is a web of data." [W3C]
    Upcoming special issue of Journal of Web Semantics
    Enabling a Web of Knowledge [Tao09]
  Information Extraction
    Domain-independent IE from web tables [Gatterbauer07]
    Open IE [Banko07]
  …
Opportunities & Challenges: WoK
  …
  Semantic Annotation wrt Ontologies
    Linking Data to Ontologies [Poggi08]
    TISP [Tao07]
    FOCIH [Tao09]
  Reasoning & Query Answering
    Description Logics [Baader03]
    NLIDB Community
    AskOntos [Ding06]
    SerFR [Al-Muhammed07]
References
  [Al-Muhammed07] Al-Muhammed and Embley, Ontology-Based Constraint Recognition for Free-Form Service Requests, Proceedings of the 23rd International Conference on Data Engineering.
  [Baader03] Baader, Calvanese, McGuinness, Nardi and Patel-Schneider, The Description Logic Handbook, Cambridge University Press.
  [Banko07] Banko, Cafarella, Soderland, Broadhead and Etzioni, Open Information Extraction from the Web, Proceedings of the International Joint Conference on Artificial Intelligence.
  [Ding06] Ding, Embley and Liddle, Automatic Creation and Simplified Querying of Semantic Web Content: An Approach Based on Information-Extraction Ontologies, Proceedings of the First Asian Semantic Web Conference.
  [Euzenat07] Euzenat and Shvaiko, Ontology Matching, Springer Verlag.
  [Falconer07] Falconer, Noy and Storey, Ontology Mapping - A User Survey, Proceedings of the Second International Workshop on Ontology Mapping.
  [Gatterbauer07] Gatterbauer, Bohunsky, Herzog and Pollak, Towards Domain-Independent Information Extraction from Web Tables, Proceedings of the Sixteenth International World Wide Web Conference.
  [Jha08] Jha and Nagy, Wang Notation Tool: Layout Independent Representation of Tables, Proceedings of the 19th International Conference on Pattern Recognition.
  [Pivk07] Pivk, Sure, Cimiano, Gams, Rajkovič and Studer, Transforming Arbitrary Tables into Logical Form with TARTAR, Data & Knowledge Engineering.
  [Poggi08] Poggi, Lembo, Calvanese, De Giacomo, Lenzerini and Rosati, Linking Data to Ontologies, Journal on Data Semantics.
  [Tao07] Tao and Embley, Automatic Hidden-Web Table Interpretation by Sibling Page Comparison, Proceedings of the 26th International Conference on Conceptual Modeling.
  [Tao09] Tao, Embley and Liddle, Enabling a Web of Knowledge, Technical Report: tango.byu.edu/papers.
  [Xu06] Xu and Embley, A Composite Approach to Automating Direct and Indirect Schema Mappings, Information Systems, 2006.