Ontologies for multilingual extraction
Deryle W. Lonsdale, David W. Embley, Stephen W. Liddle
www.deg.byu.edu
Supported by the

Overview
- Background
  - OSM ontologies
  - OntoES and related tools
- Multilingual extraction
  - Vision
  - Implementation
- Current status, conclusions

Conceptual modeling and ontologies
- Concepts, relationships, and constraints with a formal foundation

Ontology components
- Object sets: lexical, non-lexical, primary object set
- Relationship sets
- Participation constraints
- Aggregation
- Generalization/specialization
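As a rough illustration of how some of these components fit together, the sketch below models object sets and relationship sets as Python dataclasses and checks participation constraints against extracted tuples. The class names, the `(min, max)` encoding with `None` standing for "*", and the Car/Price fragment are assumptions made for illustration, not the OSM or OntoES data model; aggregation and generalization/specialization are left out.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ObjectSet:
    name: str
    lexical: bool          # lexical: instances are values (e.g. prices); non-lexical: object ids
    primary: bool = False  # the object set an extracted record is about


@dataclass
class RelationshipSet:
    name: str
    object_sets: Tuple[str, ...]
    # Hypothetical encoding of participation constraints:
    # object-set name -> (min, max); max=None stands for "*" (unbounded)
    participation: Dict[str, Tuple[int, Optional[int]]]

    def violations(self, objects: Dict[str, List[str]], tuples: List[tuple]) -> List[tuple]:
        """Return (relationship, object set, object, count) for every object whose
        number of participations falls outside its (min, max) constraint."""
        problems = []
        for pos, os_name in enumerate(self.object_sets):
            if os_name not in self.participation:
                continue
            low, high = self.participation[os_name]
            counts = Counter(t[pos] for t in tuples)
            for obj in objects.get(os_name, []):
                n = counts[obj]
                if n < low or (high is not None and n > high):
                    problems.append((self.name, os_name, obj, n))
        return problems


# Illustrative fragment: each Car participates at most once in "Car has Price" (0:1).
car = ObjectSet("Car", lexical=False, primary=True)
price = ObjectSet("Price", lexical=True)
car_has_price = RelationshipSet("Car has Price", (car.name, price.name), {"Car": (0, 1)})

print(car_has_price.violations({"Car": ["C123", "C124"]},
                               [("C123", "$11,500"), ("C123", "$9,000")]))
# [('Car has Price', 'Car', 'C123', 2)]
```

A car with two extracted prices violates the 0:1 constraint, which is one way declared constraints can flag doubtful extractions.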

Ontologies and data extraction
- Recovering knowledge: "What is knowledge?" and "Where is knowledge found?"
- Populated conceptual model

Data frames
Data frame:
- Internal representation: float
- Values
  - External representation: \s*[$]\s*(\d{1,3})*(\.\d{2})?
  - Left context: $
  - Key words: ([Pp]rice)|([Cc]ost)| …
- Operators
  - Operator: >
  - Key words: (more\s*than)|(more\s*costly)| …
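A minimal sketch of how a price data frame like this one might be applied to text. The `DataFrame` class and its method names are hypothetical. The value pattern is our own variant of the slide's external representation, changed so that it also accepts thousands separators such as "$11,500"; the slide's pattern `(\d{1,3})*` would stop at the comma and yield only "$11".

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataFrame:
    """Hypothetical, simplified data frame for one lexical object set."""
    value_pattern: str      # external representation ("$" acts as left context)
    keyword_pattern: str    # key words that signal the object set
    operator: str           # a single operator, e.g. ">"
    operator_keywords: str  # key words that signal that operator

    def to_internal(self, text: str) -> float:
        # Internal representation is float: drop "$", commas, and spaces.
        return float(re.sub(r"[$,\s]", "", text))

    def extract_values(self, text: str) -> List[float]:
        return [self.to_internal(m.group(0)) for m in re.finditer(self.value_pattern, text)]

    def has_keyword(self, text: str) -> bool:
        return re.search(self.keyword_pattern, text) is not None

    def find_operator(self, text: str) -> Optional[str]:
        return self.operator if re.search(self.operator_keywords, text) else None


price = DataFrame(
    # Assumed variant: grouped digits ("11,500") or a plain digit run ("12000")
    value_pattern=r"\$\s*(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?",
    keyword_pattern=r"([Pp]rice)|([Cc]ost)",
    operator=">",
    operator_keywords=r"(more\s*than)|(more\s*costly)",
)

print(price.extract_values("Price: $11,500 or best offer"))  # [11500.0]
print(price.find_operator("cars that cost more than $5,000"))  # >
```

Keeping the recognizer declarative (patterns as data, not code) is what lets the same machinery run over unseen pages in the domain.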

Extraction ontologies: generality & resiliency
- Generality: assumptions about web pages
  - Data rich
  - Narrow domain
  - Document types
    - Single-record documents (hard, but doable)
    - Multiple-record documents (harder)
    - Records with scattered components (even harder)
- Resiliency: declarative
  - Still works when web pages change
  - Works for new, unseen pages in the same domain
  - Scalable, but takes work to declare the extraction ontology

From symbols to knowledge
- Symbols: $11,500 117K Nissan CD AC
- Data: price(11,500) mileage(117K) make(Nissan)
- Conceptualized data:
  - Car(C123) has Price($11,500)
  - Car(C123) has Mileage(117,000)
  - Car(C123) has Make(Nissan)
  - Car(C123) has Feature(AC)
- Knowledge
  - "Correct" facts
  - Provenance
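The step from raw symbols to conceptualized data can be sketched as follows. The tiny make and feature lexicons, the token patterns, and the car identifier are assumptions; provenance is approximated here by the character span each fact came from.

```python
import re
from typing import List, Tuple

MAKES = {"Nissan", "Toyota", "Honda"}  # hypothetical make lexicon
FEATURES = {"AC", "CD"}                 # hypothetical feature lexicon

Fact = Tuple[str, str, object, Tuple[int, int]]


def conceptualize(car_id: str, text: str) -> List[Fact]:
    """Turn raw symbols into (object, relationship, value, provenance) facts."""
    facts: List[Fact] = []
    # Try a "$" amount first, then a "117K"-style mileage, then any word.
    for m in re.finditer(r"\$\s*[\d,]+|\d+K\b|\w+", text):
        token, span = m.group(0), (m.start(), m.end())  # span serves as provenance
        if token.startswith("$"):
            facts.append((car_id, "Price", int(re.sub(r"[$,\s]", "", token)), span))
        elif re.fullmatch(r"\d+K", token):
            facts.append((car_id, "Mileage", int(token[:-1]) * 1000, span))
        elif token in MAKES:
            facts.append((car_id, "Make", token, span))
        elif token in FEATURES:
            facts.append((car_id, "Feature", token, span))
    return facts


for fact in conceptualize("C123", "$11,500 117K Nissan CD AC"):
    print(fact)
# ('C123', 'Price', 11500, (0, 7))
# ('C123', 'Mileage', 117000, (8, 12))
# ('C123', 'Make', 'Nissan', (13, 19))
# ('C123', 'Feature', 'CD', (20, 22))
# ('C123', 'Feature', 'AC', (23, 25))
```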

OntoES data extraction system

OntoES semantic annotation

Annotation results

Query-based extraction
"Find me the price and mileage of all red Nissans – I want a 1990 or newer."

Query semantically annotated data

Extraction recall/precision
- High precision and recall when documents are data-rich and domain-specific
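For reference, extraction precision and recall are conventionally computed per object set as below, where TP counts correctly extracted values, FP counts extracted values that are wrong, and FN counts values present in the documents but missed. The numbers in the example are hypothetical.

```latex
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}

% Hypothetical example: 100 prices extracted, 90 of them correct,
% 120 prices actually present in the documents (so FN = 30):
P = \frac{90}{90 + 10} = 0.90, \qquad R = \frac{90}{90 + 30} = 0.75
```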

Issue: ontology construction
- Several dozen person-hours per ontology
- Scalability: thousands (?) of extraction ontologies needed
- Automate the process as much as possible
  - Forms-based interaction
  - Instance recognizers
    - Some pre-existing instance recognizers
    - Lexicons

Ontology editor

Building ontologies manually

- Library of instance recognizers
- Library of lexicons
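One way to picture a shared library of recognizers is a mapping from object-set names to compiled patterns, some built from lexicons and some written as regular expressions, that any ontology can reuse. The `LIBRARY` contents and the helper names below are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple


def lexicon_recognizer(entries: Iterable[str]) -> re.Pattern:
    """Compile a lexicon into one case-insensitive, whole-word pattern.
    Longer entries come first so "Land Rover" is preferred over "Rover"."""
    alternatives = sorted(entries, key=len, reverse=True)
    body = "|".join(re.escape(e) for e in alternatives)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


# Hypothetical shared library: object-set name -> reusable instance recognizer
LIBRARY: Dict[str, re.Pattern] = {
    "Make": lexicon_recognizer(["Nissan", "Toyota", "Land Rover", "Rover"]),
    "Color": lexicon_recognizer(["red", "white", "blue", "dark blue"]),
    "Year": re.compile(r"\b(?:19|20)\d{2}\b"),  # a pattern-based recognizer
}


def recognize(text: str, object_sets: List[str]) -> List[Tuple[str, str]]:
    """Run the recognizers for the requested object sets over the text."""
    return [(name, m.group(0)) for name in object_sets for m in LIBRARY[name].finditer(text)]


print(recognize("1998 dark blue Land Rover, runs great", ["Year", "Color", "Make"]))
# [('Year', '1998'), ('Color', 'dark blue'), ('Make', 'Land Rover')]
```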

Ontology workbench

Workbench functions
- Ontology editor (hand-construct ontologies)
- Semantic annotation
- GUI for creating user-specified forms
- Form-driven creation of ontologies
- Generating ontologies from tabular data
- Merging and mapping ontologies
- Transforming results between various data formats
- Supporting queries over extracted data

Beyond English
- The English-language Web is increasingly being overshadowed by content in other languages
- We are investigating the viability of our approach for other languages
- Goal: develop a multilingual ontology-based semantic web application

How different is this?

Current state of the art
- Some multilingual/crosslinguistic extraction efforts exist
  - Norwegian drilling, VerbMobil, EU trains
  - CLEF, NTCIR
- Variety of technologies used: alignment, cognate matching, various translation strategies, IR techniques, machine learning
- Few use ontologies

Our solution(s)
1. Enhance ontologies:
   - Compound recognizers
   - Pattern discovery
   - Discover and extract relationships among objects
2. Demonstrate viability of ontologies beyond English
   - Declare narrow-domain ontologies in other languages
   - Develop lexicons, value recognizers, data frames for multilingual processing
   - Create crosslinguistic mappings
3. Develop working prototype showing multilingual capabilities

Multilingual adaptation
- OntoES and the workbench are already largely multilingual-capable
  - UTF-8, Java
  - Some prototyping work remains
- Knowledge sources
  - Many exist; we don't have the resources to reinvent the wheel
  - NLP resources: lexical databases, WordNet, …
  - Termbases, multilingual lexicons, …
  - Aligned bitext

Expected results
- Monolingual queries possible in languages where components have been developed
- Ontological content and lexical primitives can provide some degree of mediation between languages
- Crosslinguistic queries: query in English, retrieve data in another language, map back
  - Reminiscent of the conceptual "pivot" or "interlingua" in MT
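A minimal sketch of the "pivot" idea, assuming object sets act as the language-neutral concepts and each language contributes its own key words and values. The `KEYWORDS` and `VALUES` tables and the function names are hypothetical; a real system would draw these mappings from the multilingual lexicons and termbases mentioned earlier.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pivot: object sets, each with per-language key words
KEYWORDS: Dict[str, Dict[str, List[str]]] = {
    "Price": {"en": ["price", "cost"], "ja": ["価格", "値段"]},
    "Color": {"en": ["color"], "ja": ["色"]},
    "Make": {"en": ["make"], "ja": ["メーカー"]},
}
# Hypothetical value lexicon entries aligned across languages
VALUES: Dict[str, List[Dict[str, str]]] = {
    "Color": [{"en": "red", "ja": "赤"}, {"en": "white", "ja": "白"}],
    "Make": [{"en": "Nissan", "ja": "日産"}, {"en": "Toyota", "ja": "トヨタ"}],
}


def concept_for(word: str, lang: str) -> Optional[str]:
    """Map a key word in some language to its language-neutral object set."""
    for concept, by_lang in KEYWORDS.items():
        if word.lower() in (w.lower() for w in by_lang.get(lang, [])):
            return concept
    return None


def map_value(concept: str, value: str, src: str, tgt: str) -> Optional[str]:
    """Translate a value through the pivot; None if no aligned entry exists."""
    for entry in VALUES.get(concept, []):
        if entry.get(src) == value:
            return entry.get(tgt)
    return None


def translate_conditions(conds: List[Tuple[str, str]], src: str, tgt: str) -> List[Tuple[str, Optional[str]]]:
    """Rewrite (object set, value) selection conditions from src into tgt."""
    return [(concept, map_value(concept, value, src, tgt)) for concept, value in conds]


print(translate_conditions([("Color", "red"), ("Make", "Nissan")], "en", "ja"))
# [('Color', '赤'), ('Make', '日産')]
print(map_value("Make", "日産", "ja", "en"))  # Nissan (mapping a result back)
```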

Basic premises
- Analogous data-rich documents should not differ substantially across languages
- Ontological content should involve only minimal conceptual variation across languages/cultures
  - Obituaries: "tenth-day kriya", "obsequies"
- Existing technologies can provide large-scale mapping between languages

Car ontology (English)

Car ontology (Japanese)

English price data frame

Japanese price data frame
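The contents of the Japanese price data frame are not reproduced in this transcript. As a hypothetical sketch of what its value recognizer has to handle, the code below normalizes three common ways of writing a yen amount (the 万 "ten thousand" unit as in 150万円, digits followed by 円, and a ¥ prefix) to an integer number of yen. The patterns, key words, and function name are our assumptions; converting yen to dollars for crosslinguistic comparison is not shown.

```python
import re
from typing import List

# Hypothetical recognizer: 150万円 | 1,200,000円 | ¥1,200,000
JA_PRICE = re.compile(r"(\d+(?:\.\d+)?)\s*万円|(\d[\d,]*)\s*円|[¥￥]\s*(\d[\d,]*)")
JA_PRICE_KEYWORDS = re.compile(r"価格|値段")  # key words meaning "price"


def ja_prices_in_yen(text: str) -> List[int]:
    """Internal representation: integer yen."""
    values = []
    for m in JA_PRICE.finditer(text):
        if m.group(1):                     # 万 = 10,000
            values.append(round(float(m.group(1)) * 10_000))
        else:
            digits = m.group(2) or m.group(3)
            values.append(int(digits.replace(",", "")))
    return values


print(ja_prices_in_yen("日産 マーチ 価格 98.5万円"))            # [985000]
print(JA_PRICE_KEYWORDS.search("車両本体価格 150万円") is not None)  # True
```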

Current status
- Successful proof-of-concept, prototype implementations beyond English
  - Japanese car ads
  - Spanish obituaries
  - French obituaries
- Knowledge sources need further development
- Formal evaluations needed

Conclusions
- Ontologies and tools provide a flexible, tractable framework for monolingual data extraction
  - English well explored and documented
  - Preliminary work on other languages
- Mappings at the conceptual/lexical levels might enable crosslinguistic functionality
- Implications for the larger context: a multilingual semantic web

Questions?

GUI for creating extraction forms
- Basic form-construction facilities:
  - single-entry field
  - multiple-entry field
  - nested form
  - …

Creating ontologies from forms

Source-to-form mapping

Forms-driven ontology creation
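One plausible way to turn a form into ontology structure, sketched under our own assumptions: the form becomes an object set, each field becomes an object set related to it, a single-entry field yields a relationship in which each form instance participates at most once (written 0:1 here), a multiple-entry field yields 0:*, and a nested form recurses. The `Field` class and the cardinality strings are hypothetical, not the workbench's internal representation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Field:
    name: str
    multiple: bool = False                               # multiple-entry vs. single-entry field
    nested: List["Field"] = field(default_factory=list)  # non-empty: this field is a nested form


def form_to_ontology(form: str, fields: List[Field]) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    """Return object sets and (form, field, participation) relationship sets."""
    object_sets = [form]
    relationship_sets = []
    for f in fields:
        object_sets.append(f.name)
        # Single-entry: at most one value per form instance; multiple-entry: any number
        relationship_sets.append((form, f.name, "0:*" if f.multiple else "0:1"))
        if f.nested:
            sub_objects, sub_rels = form_to_ontology(f.name, f.nested)
            object_sets.extend(sub_objects[1:])  # f.name was already added
            relationship_sets.extend(sub_rels)
    return object_sets, relationship_sets


car_form = [Field("Year"), Field("Feature", multiple=True),
            Field("Dealer", nested=[Field("Name"), Field("Phone", multiple=True)])]
objects, rels = form_to_ontology("Car", car_form)
print(objects)  # ['Car', 'Year', 'Feature', 'Dealer', 'Name', 'Phone']
```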

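Inferring ontologies from tables
(Example source table; the six percentage columns sit under a spanning "Religion" header.)

| Country | Population (July 2001 est.) | Albanian Orthodox | Muslim | Roman Catholic | Shi’a Muslim | Sunni Muslim | other |
|---|---|---|---|---|---|---|---|
| Afghanistan | 26,813,057 | | | | 15% | 84% | 1% |
| Albania | 3,510,484 | 20% | 70% | 10% | | | |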
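A much-simplified sketch of what inference from the country/religion table on this slide could produce. The rules are our assumptions: the key column becomes the primary object set, each plain column a binary relationship set, and a header spanning several columns a ternary relationship set (Country, Religion, Percent). The function name and the "Percent" object set are hypothetical.

```python
from typing import Dict, List


def infer_from_table(key: str, single: List[str], group: str, group_cols: List[str],
                     value_name: str, rows: List[Dict[str, str]]):
    """Hypothetical table-to-ontology inference plus the facts it populates."""
    object_sets = [key, *single, group, value_name]
    relationship_sets = [(key, c) for c in single] + [(key, group, value_name)]
    facts = []
    for row in rows:
        k = row[key]
        for c in single:
            facts.append((k, c, row[c]))
        for g in group_cols:
            if row.get(g):                 # empty cell: no fact
                facts.append((k, group, g, row[g]))
    return object_sets, relationship_sets, facts


religions = ["Albanian Orthodox", "Muslim", "Roman Catholic", "Shi’a Muslim", "Sunni Muslim", "other"]
rows = [
    {"Country": "Afghanistan", "Population": "26,813,057",
     "Shi’a Muslim": "15%", "Sunni Muslim": "84%", "other": "1%"},
    {"Country": "Albania", "Population": "3,510,484",
     "Albanian Orthodox": "20%", "Muslim": "70%", "Roman Catholic": "10%"},
]
objs, rels, facts = infer_from_table("Country", ["Population"], "Religion", religions, "Percent", rows)
print(rels)  # [('Country', 'Population'), ('Country', 'Religion', 'Percent')]
```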

Merging and mapping ontologies

Interpret tables from sibling pages
[Figure labels: "Different", "Same"]

Interpret tables from sibling pages
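The idea of comparing sibling pages can be sketched like this, assuming two pages generated from the same template have already been reduced to aligned grids of cell strings: cells identical on both pages are treated as template labels, and cells that differ are treated as data values paired with the nearest label to their left in the same row. The grid representation and the pairing rule are simplifying assumptions.

```python
from typing import Dict, List


def interpret(page: List[List[str]], sibling: List[List[str]]) -> Dict[str, str]:
    """Label/value pairs for `page`, using a sibling page built from the same template."""
    record = {}
    for row, sib_row in zip(page, sibling):
        label = None
        for cell, sib_cell in zip(row, sib_row):
            if cell == sib_cell:
                label = cell              # same on both pages: template label
            elif label is not None:
                record[label] = cell      # differs: data value for the label to its left
    return record


page_a = [["Make", "Honda"], ["Model", "Civic"], ["Price", "$9,500"]]
page_b = [["Make", "Toyota"], ["Model", "Camry"], ["Price", "$12,000"]]
print(interpret(page_a, page_b))  # {'Make': 'Honda', 'Model': 'Civic', 'Price': '$9,500'}
```

One weakness of this naive rule: a value that happens to be identical on both pages (two Hondas, say) is mistaken for a label, which is one reason to compare against more than one sibling.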

C-XML: Conceptual XML
[Figure labels: XML Schema, C-XML]

Free-form query

Parse free-form query
"Find me the price and mileage of all red Nissans – I want a 1996 or newer"
- Recognized: price, mileage, red, Nissan, 1996, "or newer" (operator >=)

Select appropriate ontology
"Find me the price and mileage of all red Nissans – I want a 1996 or newer"

Formulate query expression
- Conjunctive queries and aggregate queries
- Projection on mentioned object sets
- Selection via values and operator keywords
  - Color = "red"
  - Make = "Nissan"
  - Year >= 1996
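A minimal sketch of how the sample query "Find me the price and mileage of all red Nissans – I want a 1996 or newer" could be split into a projection and selections. The recognizer tables stand in for a car extraction ontology and are hypothetical; in particular, attaching an operator keyword only when it immediately follows a value is our simplification.

```python
import re
from typing import List, Tuple

# Hypothetical pieces of a car extraction ontology
OBJECT_SET_KEYWORDS = {"Price": r"\bprices?\b", "Mileage": r"\bmileage\b"}
VALUE_RECOGNIZERS = {
    "Color": r"\b(red|white|blue)\b",
    "Make": r"\b(Nissan|Toyota|Honda)s?\b",
    "Year": r"\b((?:19|20)\d{2})\b",
}
OPERATOR_KEYWORDS = [(r"or\s+newer", ">="), (r"or\s+older", "<=")]


def parse_query(query: str) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    # Projection: object sets whose key words are mentioned
    projection = [o for o, p in OBJECT_SET_KEYWORDS.items() if re.search(p, query, re.IGNORECASE)]
    selections = []
    for obj, pattern in VALUE_RECOGNIZERS.items():
        m = re.search(pattern, query)
        if m is None:
            continue
        op = "="                                  # default: equality on the value
        after = query[m.end():]
        for kw, symbol in OPERATOR_KEYWORDS:      # operator key words right after the value
            if re.match(r"\s*" + kw, after, re.IGNORECASE):
                op = symbol
        selections.append((obj, op, m.group(1)))
    return projection, selections


print(parse_query("Find me the price and mileage of all red Nissans – I want a 1996 or newer"))
# (['Price', 'Mileage'], [('Color', '=', 'red'), ('Make', '=', 'Nissan'), ('Year', '>=', '1996')])
```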

Formulate query expression: For, Let, Where, Return
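A sketch of turning a projection and selections into an XQuery FLWOR expression: `for` iterates over records, `let` binds the selected object sets, `where` applies the selections, and `return` projects the requested object sets. The document name `cars.xml` and the element layout (`Car` elements with child elements named after object sets) are assumptions.

```python
from typing import List, Tuple


def to_flwor(projection: List[str], selections: List[Tuple[str, str, str]],
             doc: str = "cars.xml") -> str:
    """Build a FLWOR expression over a hypothetical XML rendering of extracted cars."""
    lines = [f'for $car in doc("{doc}")//Car']
    for obj, _, _ in selections:
        lines.append(f"let ${obj.lower()} := $car/{obj}")
    conditions = []
    for obj, op, value in selections:
        literal = value if value.isdigit() else f'"{value}"'  # numbers stay unquoted
        conditions.append(f"${obj.lower()} {op} {literal}")
    if conditions:
        lines.append("where " + " and ".join(conditions))
    items = ", ".join(f"$car/{name}" for name in projection)
    lines.append(f"return <Car>{{ {items} }}</Car>")
    return "\n".join(lines)


print(to_flwor(["Price", "Mileage"],
               [("Color", "=", "red"), ("Make", "=", "Nissan"), ("Year", ">=", "1996")]))
# for $car in doc("cars.xml")//Car
# let $color := $car/Color
# let $make := $car/Make
# let $year := $car/Year
# where $color = "red" and $make = "Nissan" and $year >= 1996
# return <Car>{ $car/Price, $car/Mileage }</Car>
```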

Ontology transformations
- Transformations to and from all

Generated RDF
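A sketch of serializing conceptualized car facts as RDF in Turtle syntax, written by hand rather than with an RDF library so the example stays self-contained. The `car:` namespace URI, the `car:Car` class, and the `has<ObjectSet>` property naming are hypothetical; the RDF actually generated by the tools is not reproduced here.

```python
from typing import List, Tuple, Union

Value = Union[int, str]


def turtle_literal(value: Value) -> str:
    if isinstance(value, int):
        return str(value)                                   # bare integer literal
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'                                   # quoted string literal


def to_turtle(facts: List[Tuple[str, str, Value]], ns: str = "http://example.org/car#") -> str:
    lines = [f"@prefix car: <{ns}> .", ""]
    seen: List[str] = []
    for subject, _, _ in facts:                             # type each subject once
        if subject not in seen:
            seen.append(subject)
            lines.append(f"car:{subject} a car:Car .")      # assumes every subject is a Car
    for subject, prop, value in facts:
        lines.append(f"car:{subject} car:has{prop} {turtle_literal(value)} .")
    return "\n".join(lines)


facts = [("C123", "Price", 11500), ("C123", "Mileage", 117000),
         ("C123", "Make", "Nissan"), ("C123", "Feature", "AC")]
print(to_turtle(facts))
```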