AAAI 2002 WS1 Peppering knowledge sources with SALT Deryle Lonsdale, Yihong Ding, David W. Embley, Alan Melby Brigham Young University

AAAI 2002 WS 1 Peppering knowledge sources with SALT Deryle Lonsdale, Yihong Ding, David W. Embley, Alan Melby Brigham Young University (Boosting conceptual content for ontology generation)

AAAI 2002 WS 2 Acknowledgements Co-authors (Embley, Ding) EU Fifth Framework IST/HLT NSF Information and Intelligent Systems grant IIS Gerhard Budin (Eurodicautom data) Sergei Nirenburg (Mikrokosmos ontology)

AAAI 2002 WS 3 Outline Termbases and lexicons: (re)use(s) The SALT and TIDIE projects Data modeling and data resources Termbase conversion Ontology generation Results and evaluation Conclusions

AAAI 2002 WS 4 Termbases Terminology databases for humans in multilingual documentation industry Several models, formats; often concept-oriented in nature Termium, Eurodicautom, etc.

AAAI 2002 WS 5 Lexicons NLP applications: IR, MT, NLU, speech understanding Widely varying data formats Description at various levels of linguistic theory

AAAI 2002 WS 6 Sharing resources Integration is the trend Lexicons (OLIF for MT system lexicons) Termbases (MARTIF for human termbases) Lexicons and termbases Needed: principled data-modeling approach Wide variety of information to be treated Wide range of formats currently in use

AAAI 2002 WS 7 The SALT project SALT: Standards-based Access service to multilingual Lexicons and Terminologies International cooperation, standards for coding and interchange of linguistic data, and the combining of technologies Several partners (BYU TRG, KSU, etc.) Data modeling approach that addresses the problem of interchange among diverse collections of such data, including their ontological substructure

AAAI 2002 WS 8 The SALT approach Goal: provide 1) Modularity: differentiate core structure vs. data category specification 2) Coherence: use a meta-model 3) Flexibility: support interoperable alternative representations Modular meta-model approach Implemented in various settings Ongoing refinement: model’s coverage

AAAI 2002 WS 9 The TIDIE project TIDIE: Target-based Independent-of-Document Information Extraction Ontology-based data extraction Conceptual modeling of real-world applications Narrow, data-rich domains Leverage (or build) custom ontologies for target-based extraction

AAAI 2002 WS 10 Information exchange Source Target Information Extraction Schema Matching Leverage this … … to do this

AAAI 2002 WS 11 Information Extraction Examine/retrieve information from documents to fill in a user-supplied template Requires some user-oriented specification of information Our approach: finding, extracting, structuring, and synthesizing information is easier given a conceptual-model-based ontology

AAAI 2002 WS 12 Extracting pertinent information from documents

AAAI 2002 WS 13 A Conceptual Modeling Solution [Diagram: Car has Year, Price, Make, Mileage, Model, Feature; PhoneNr is for Car; PhoneNr has Extension; participation constraints 1..* and * on the relationship edges]

AAAI 2002 WS 14 Car-Ads Ontology
Car [->object];
Car [0..1] has Year [1..*];
Car [0..1] has Make [1..*];
Car [0..1] has Model [1..*];
Car [0..1] has Mileage [1..*];
Car [0..*] has Feature [1..*];
Car [0..1] has Price [1..*];
PhoneNr [1..*] is for Car [0..*];
PhoneNr [0..1] has Extension [1..*];
Year matches [4] constant { extract "\d{2}"; context "([^\$\d]|^)[4-9]\d,[^\d]"; substitute "^" -> "19"; }, …
End;
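The Year data frame above pairs a context pattern with an extract pattern and a substitution. A minimal Python sketch of how such a rule might be applied follows; the function name `extract_years` and the way the three parts are combined are assumptions made for illustration, not the project's actual extraction engine.

```python
import re

# Patterns copied from the Year data frame on the slide.
CONTEXT = re.compile(r"([^\$\d]|^)[4-9]\d,[^\d]")
EXTRACT = re.compile(r"\d{2}")


def extract_years(text):
    """Return four-digit years found by the Year data frame (assumed semantics).

    1. Find each span that satisfies the context pattern.
    2. Inside that span, pull out the two digits named by the extract pattern.
    3. Apply the substitution "^" -> "19", i.e. prefix the value with "19".
    """
    years = []
    for context_match in CONTEXT.finditer(text):
        value = EXTRACT.search(context_match.group(0))
        if value:
            years.append("19" + value.group(0))
    return years


if __name__ == "__main__":
    print(extract_years("Honda Accord 95, red"))  # the context accepts "95,"
    print(extract_years("only $95, cheap"))       # "$" before the digits blocks the match
```

The context pattern is what keeps a price such as "$95," from being read as a model year, which is why the rule separates context from extraction.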

AAAI 2002 WS 15 Recognition and Extraction Car Year Make Model Mileage Price PhoneNr Subaru SW $1900 (363) Elandra (336) HONDA ACCORD EX 100K (336) Car Feature 0001 Auto 0001 AC 0002 Black door 0002 tinted windows 0002 Auto 0002 pb 0002 ps 0002 cruise 0002 am/fm 0002 cassette stero 0002 a/c 0003 Auto 0003 jade green 0003 gold

AAAI 2002 WS 16 Lexical resources for data modeling Information extraction also requires knowledge representations with terminological and conceptual content. Extraction ontology knowledge sources must: be of a general nature contain meaningful relationships already exist in machine-readable form have a straightforward conversion into XML. This paper: create, leverage large-scale termbase some ontological structure reformatted according to the SALT standard converted into μK-compliant XML for use by the ontology generator

AAAI 2002 WS 17 Eurodicautom Well-known, widely-used termbase > 1 million concept entries Wide range of topics Entries are multilingual Entry information: sources cited, input/approval dates, … Single-word terms (e.g. “generator”) or multi-word expressions (e.g. “black humus”) Entries each have Lenoch subject-area code Hierarchical representation for classifying terms (and by extension their related concepts)

AAAI 2002 WS 18 Partial Eurodicautom entry
%CM AG4 CH6 GO6
%DA
%VE lavmosetørv
%RF A.Klougart
%EN
%VE black humus
%RF CILF,Dict.Agriculture,ACCT,1977
%IT
%VE humus nero
%RF BTB
%ES
%VE humus negro
%RF CILF,Dict.Agriculture,ACCT,1977
%SV
%VE sumpjord
%RF Mats Olsson,SLU(1997)

AAAI 2002 WS 19 Sample Lenoch codes
AD Public Administration - Private Administration - Offices
AD1 general aspects of the subject field
AD2 public and private organisations
AD3 publications & documentary search
AD31 documentation and information systems
AD4 administrative staff
AD5 public procurement
AD51 expropriation in the public interest
TEH testing methods
TEH1 general aspects of testing methods
TEH2 non-destructive testing
TEH21 chemical tests
TEH22 photometrical testing
TEH221 X-ray spectrometrical testing

AAAI 2002 WS 20 Converting the termbase Use several thousand English terms and their subject codes %CM line lists three Lenoch codes: AG4 (representing the subclass AGRONOMY), CH6 (representing ANALYTICAL-CHEMISTRY), and GO6 (representing GEOMORPHOLOGY). Convert termbase entries via the SALT-developed TBX termbase exchange framework (an XML-based refinement of MARTIF) Convert to μK XML format used by ontology engine Result: TBX-mediated conversion from native Eurodicautom terms to the final XML-specified ontology (μK) Lenoch codes re-interpreted as typical hierarchical relations (e.g. IS-A and SUBCLASSES)
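The re-interpretation of Lenoch codes as hierarchical relations can be sketched in Python. The rule used here (dropping a trailing digit from a code gives its parent code) is an assumption read off the sample codes, and the names `parent_code` and `hierarchical_relations` are hypothetical; the actual TBX-mediated conversion may work differently.

```python
# A few of the sample Lenoch codes shown on the earlier slide.
LENOCH = {
    "AD": "Public Administration - Private Administration - Offices",
    "AD3": "publications & documentary search",
    "AD31": "documentation and information systems",
    "TEH": "testing methods",
    "TEH2": "non-destructive testing",
    "TEH22": "photometrical testing",
    "TEH221": "X-ray spectrometrical testing",
}


def parent_code(code):
    """Assumed rule: stripping one trailing digit yields the parent code."""
    return code[:-1] if code[-1:].isdigit() else None


def hierarchical_relations(codes):
    """Emit IS-A and inverse SUBCLASSES triples for every code whose parent is known."""
    relations = []
    for code, label in codes.items():
        parent = parent_code(code)
        if parent in codes:
            relations.append((label, "IS-A", codes[parent]))
            relations.append((codes[parent], "SUBCLASSES", label))
    return relations


if __name__ == "__main__":
    for triple in hierarchical_relations(LENOCH):
        print(triple)
```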

AAAI 2002 WS 21 Conversion process Eurodicautom (native) Lenoch Eurodicautom (TBX) SALT Eurodicautom (μK)

AAAI 2002 WS 22 Eurodicautom-TBX encoding sample Eurodicautom entry DXLTdv04.xml BTB DAG77 4 souto fullForm BTB-DAG77-63 V.Correia,Engº Agrónomo,PDR Vale do Lima minifúndio fullForm BTB-DAG77-63 V.Correia,Engº Agrónomo,PDR Vale do Lima

AAAI 2002 WS 23 Derived XML ontology xenobiotic substances SUBCLASSES VALUE/FACET> hazardous raw materials 0 physical nuisances SUBCLASSES VALUE/FACET> ambient light 0 financial statistics IS-A VALUE/FACET> economic statistics 0 ….

AAAI 2002 WS 24 Ontology generation Goal: specify an ontology for information extraction purposes Problem: complex, tedious, costly Ideally: automatically generate schemas, ontologies Source: natural-language text, tables, etc.

AAAI 2002 WS 25 Ontology generation overview

AAAI 2002 WS 26 Knowledge sources Mikrokosmos (μK) ontology About 5,000 hierarchically-arranged concepts Fairly high connectivity (about 14 inter-concept links per node) Fairly general content, inheritance of properties Data frame library regular-expression templates for matching structured low-level lexical items (e.g. measurements, dates, currency expressions, and phone numbers) provide information for conceptual matching via inheritance Lexicons (e.g. onomastica, WordNet synsets) Domain-specific training documents

AAAI 2002 WS 27 Knowledge integration

AAAI 2002 WS 28 Methodology Preprocess input knowledge sources: Integrate: map lexicon content and data frame templates to nodes in the merged ontology Extract: match information from training documents collection Parse, tokenize, regularize lexical content Generate the ontology: four-stage generation process concept selection relationship retrieval constraint discovery refinement of the output ontology

AAAI 2002 WS 29 Processing input documents

AAAI 2002 WS 30 Concept selection Finding which subset of the ontology’s concepts is of interest to a user Concepts are selected via string matches between textual content and the ontological data. Three different selection heuristics concept-name matching concept-value matching data-frame pattern matching String matching plus: word synonym matching: WordNet synonym sets multi-word term matching: bag-of-words (CAPITAL-CITY is considered a synonym of capital and city)
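Concept-name matching with the bag-of-words treatment of multi-word terms can be illustrated with a short Python sketch. It assumes that a concept is selected when any word of its name (or any supplied synonym) occurs in the text; the function names and the `synonyms` dictionary are hypothetical stand-ins for the WordNet lookup mentioned above, and no stemming is attempted.

```python
import re


def concept_words(concept_name):
    """Split a concept name such as CAPITAL-CITY into a lowercase bag of words."""
    return {word.lower() for word in concept_name.split("-")}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_concepts(text, concept_names, synonyms=None):
    """Return the concepts whose name words or synonyms appear in the text.

    `synonyms` maps a concept name to extra lowercase words (a hypothetical
    stand-in for WordNet synonym sets).
    """
    synonyms = synonyms or {}
    tokens = tokenize(text)
    selected = []
    for name in concept_names:
        candidates = concept_words(name) | set(synonyms.get(name, ()))
        if tokens & candidates:
            selected.append(name)
    return selected


if __name__ == "__main__":
    sample = "Capital--Kabul, Other cities--Kandahar"
    print(select_concepts(sample, ["CAPITAL-CITY", "POPULATION"]))
```

Because the sketch does not stem, "cities" does not match "city" here; CAPITAL-CITY is selected only through the word "capital".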

AAAI 2002 WS 31 Concept selection algorithm
PROCEDURE ConceptSelection(Tdoc, Kbase)
  SourceDoc = Parse(Tdoc);
  PrimarySelectedConceptsList = MikroSelection(M-Ontology);
  SecondarySelectedConceptsList = DataFrameSelection(DF-Library);
  ConflictHandling();
  SelectedSubgraphGeneration();

AAAI 2002 WS 32 Basic Selection Strategy Select from Mikrokosmos Ontology Afghanistan smaller than Texas. Area: 648,000 sq. km. Capital--Kabul, Other cities--Kandahar Mazar-e-Sharif Konduz Terrain: Landlocked; mostly mountains and desert. Climate: Dry, with cold winters and hot summers. Population: 17.7 million. Agriculture: Wheat, corn, barley, rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

AAAI 2002 WS 33 Basic Selection Strategy Select from Mikrokosmos Ontology concept names and their synonyms Afghanistan smaller than Texas. Area : 648,000 sq. km. Capital --Kabul, Other cities--Kandahar Mazar-e-Sharif Konduz Terrain: Landlocked; mostly mountains and desert. Climate: Dry, with cold winters and hot summers. Population :17.7 million. Agriculture:Wheat, corn, barley,rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

AAAI 2002 WS 34 Basic Selection Strategy Select from Mikrokosmos Ontology concept names and their synonyms concept values and their synonyms Afghanistan smaller than Texas. Area: 648,000 sq. km. Capital--Kabul, Other cities--Kandahar Mazar-e-Sharif Konduz Terrain: Landlocked; mostly mountains and desert. Climate: Dry, with cold winters and hot summers. Population: 17.7 million. Agriculture: Wheat, corn, barley, rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

AAAI 2002 WS 35 Basic Selection Strategy Select from Mikrokosmos Ontology concept names and their synonyms concept values and their synonyms Select from Data Frame Libraries Afghanistan smaller than Texas. Area: 648,000 sq. km. Capital--Kabul, Other cities--Kandahar Mazar-e-Sharif Konduz Terrain: Landlocked; mostly mountains and desert. Climate: Dry, with cold winters and hot summers. Population: 17.7 million. Agriculture: Wheat, corn, barley, rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

AAAI 2002 WS 36 Basic Selection Strategy Select from Mikrokosmos Ontology concept names and their synonyms concept values and their synonyms Select from Data Frame Libraries extract result based on the data frames Afghanistan smaller than Texas. Area: 648,000 sq. km. Capital--Kabul, Other cities--Kandahar Mazar-e-Sharif Konduz Terrain: Landlocked; mostly mountains and desert. Climate: Dry, with cold winters and hot summers. Population: 17.7 million. Agriculture: Wheat, corn, barley, rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

AAAI 2002 WS 37 Concept conflict resolution Arrive at an internally consistent set of selected concepts. Two levels of resolution Document-level resolution Knowledge-source resolution Criteria: lexical occurrence, proximity, length and distribution of words and terms Preferences from among knowledge sources specifying matches Other default strategies

AAAI 2002 WS 38 Document-Level Conflict Afghanistan smaller than Texas. Area: 648,000 sq. km. Capital --Kabul, Other cities--Kandahar Mazar-e-Sharif Konduz Terrain: Landlocked; mostly mountains and desert. Climate: Dry, with cold winters and hot summers. Population:17.7 million. Agriculture: Wheat, corn, barley,rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

AAAI 2002 WS 39 Concept-Level Conflict Afghanistan smaller than Texas. Area : 648,000 sq. km. Capital--Kabul, Other cities--Kandahar Mazar-e-Sharif Konduz Terrain: Landlocked; mostly mountains and desert. Climate: Dry, with cold winters and hot summers. Population : 17.7 million. Agriculture: Wheat, corn, barley,rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

AAAI 2002 WS 40 Relationship retrieval Ontology structure: directed graph, nodes are concepts Conceptual relationship: all paths connecting concepts generated at given stage Theoretical solution: find all the paths in the graph (NP-complete) When multiple paths do exist, take the shortest path between 2 concepts (cf. μK Onto-Search algorithm) Dijkstra’s (polynomial) algorithm to compute the most salient relationships between concepts Distance threshold on path length to prune weak relationships Construct schemas, or linked conceptual configurations, from the relationships posited in the previous step. Primary concept selected (or posited): highest connectivity Cardinalities inferred from observed relationships
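The shortest-path step with a distance threshold can be sketched in Python using Dijkstra's algorithm. The toy graph, its unit edge weights, and the function name `shortest_paths` are assumptions for illustration; the real system runs over the merged ontology and its own link weights.

```python
import heapq


def shortest_paths(graph, source, max_distance):
    """Dijkstra's algorithm, keeping only concepts within max_distance of source.

    graph maps a concept to {neighbor: non-negative edge weight}.
    Returns {concept: shortest distance from source}.
    """
    distances = {source: 0}
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > distances.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            # Threshold prunes weak (long-distance) relationships.
            if candidate <= max_distance and candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return distances


if __name__ == "__main__":
    # Hypothetical fragment of a concept graph with unit-weight links.
    toy_graph = {
        "CAPITAL-CITY": {"CITY": 1},
        "CITY": {"NATION": 1, "PLACE": 1},
        "PLACE": {"PHYSICAL-OBJECT": 1},
    }
    print(shortest_paths(toy_graph, "CAPITAL-CITY", max_distance=2))
```

With a threshold of 2, PHYSICAL-OBJECT (three links away) is pruned, while CITY, NATION, and PLACE remain as candidate relationships.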

AAAI 2002 WS 41 Participation Constraints Afghanistan smaller than Texas. Area: 648,000 sq. km. Capital—Kabul, Other cities--Kandahar Mazar-e-Sharif Konduz Terrain: Landlocked; mostly mountains and desert. Climate: Dry, with cold winters and hot summers. Population: 17.7 million. Agriculture: Wheat, corn, barley,rice, cotton, fruit, nuts, karakul pelts, wool, mutton. CapitalCity [1:1] IsA.CITY.PartOf Nation [1:1]

AAAI 2002 WS 42 Participation Constraints (2) Afghanistan smaller than Texas. Area: 648,000 sq. km. Capital--Kabul, Other cities --Kandahar Mazar-e-Sharif Konduz Terrain: Landlocked; mostly mountains and desert. Climate: Dry, with cold winters and hot summers. Population: 17.7 million. Agriculture: Wheat, corn, barley,rice, cotton, fruit, nuts, karakul pelts, wool, mutton. City [1:1] PartOf Nation [1:*]

AAAI 2002 WS 43 Refining results Output ontology: may require hand-crafting can be done in a text editor (flat ASCII ontology) Considerable expertise required: markup syntax, specification of conceptual relations, familiarity with regular-expression writing Possible solution: ontology editors for typical end-users With rich enough knowledge sources and a good set of training documents, however, we believe that the generation of extraction ontologies can be fully automatic.

AAAI 2002 WS 44 Testing the system Input: various U.S. Department of Energy abstracts Knowledge base: μK ontology Energy sub-hierarchy of Eurodicautom terms (300)

AAAI 2002 WS 45 Sample application document The trend in supply and demand of fuel and the fuels for electric power generation, iron manufacturing and transportation were reviewed from the literature published in Japan and abroad in FY 1986 was a turning point in the supply and demand of energy and also a serious year for them because the world crude oil price dropped drastically and the exchange rate of yen rose rapidly since the end of 1985 in Japan as well. The fuel consumption for steam power generation in FY 1986 shows the negative growth for two successive years as much as 98.1%, or 65,730,000 kl in heavy oil equivalent, to that in the previous year. The total energy consumption in the iron and steel industry in 1986 was 586 trillion kcal (626 trillion kcal in the previous year). The total sales amount of fuel in 1986 was 184,040,000 kl showing a 1.5% increase from that in the previous year. The concept Best Mix was proposed as the ideal way in the energy industry. (21 figs, 2 tabs, 29 refs)

AAAI 2002 WS 46 Sample output -- energy2 Information Ontology
energy2 [-> object];
energy2 [0:*] has Alloy [1:*];
energy2 [0:*] has Consumption [1:*];
energy2 [0:*] has CrudeOil [1:*];
energy2 [0:*] has ForProfitCorporation [1:*];
energy2 [0:*] has FossilRawMaterials [1:*];
energy2 [0:*] has Gas [1:*];
energy2 [0:*] has Increase [1:*];
energy2 [0:*] has LinseedOil [1:*];
energy2 [0:*] has MetallicSolidElement [1:*];
energy2 [0:*] has Ores [1:*];
energy2 [0:*] has Produce [1:*];
energy2 [0:*] has RawMaterials [1:*];
energy2 [0:*] has RawMaterialsSupply [1:*];
Alloy [0:*] MadeOf.SOLIDELEMENT.Subclasses MetallicSolidElement [0:*];
Alloy [0:*] IsA.METAL.StateOfMatter.SOLID.Subclasses CrudeOil [0:*];
Alloy [0:*] IsA.PHYSICALOBJECT.ThemeOf.PHYSICALEVENT.Subclasses Produce [0:*];
AmountAttribute [0:*] IsA.SCALARATTRIBUTE.MeasuredIn.MEASURINGUNIT Consumption [0:*] IsA.FINANCIALEVENT.Agent Human [0:*];
ControlEvent [0:*] IsA.SOCIALEVENT.Agent Human [0:*];
ControlEvent [0:*] IsA.SOCIALEVENT.Location.PLACE.Subclasses Nation [0:*];
CountryName [0:*] NameOf Nation [0:*];
CountryName [0:*] IsA.REPRESENTATIONALOBJECT.OwnedBy Human [0:*];
CrudeOil [0:*] IsA.PHYSICALOBJECT.Location.PLACE.Subclasses Nation [0:*];
CrudeOil [0:*] IsA.PHYSICALOBJECT.OwnedBy Human [0:*];
CrudeOil [0:*] IsA.PHYSICALOBJECT.ThemeOf.GROW.Subclasses GrowAnimate [0:*];
CrudeOil [0:*] IsA.PHYSICALOBJECT.ThemeOf.PHYSICALEVENT.Subclasses Increase [0:*];
CrudeOil [0:*] IsA.PHYSICALOBJECT.ThemeOf.PHYSICALEVENT.Subclasses Combine [0:*];
CrudeOil [0:*] IsA.PHYSICALOBJECT.ThemeOf.PHYSICALEVENT.Subclasses Display [0:*];
CrudeOil [0:*] IsA.PHYSICALOBJECT.ThemeOf.PHYSICALEVENT.Subclasses Produce [0:*];
Custom [0:*] IsA.ABSTRACTOBJECT.ThemeOf.MENTALEVENT.Subclasses AddUp [0:*];
Display [0:*] IsA.PHYSICALEVENT.Theme.PHYSICALOBJECT.Subclasses Gas [0:*];
Display [0:*] IsA.PHYSICALEVENT.Theme.PHYSICALOBJECT.OwnedBy Human [0:*];
ForProfitCorporation [0:*] OwnedBy Human [0:*];
ForProfitCorporation [0:*] IsA.CORPORATION.HasNationality Nation [0:*];
Gas [0:*] IsA.PHYSICALOBJECT.Location.PLACE.Subclasses Nation [0:*];
Gas [0:*] IsA.PHYSICALOBJECT.ThemeOf.GROW.Subclasses GrowAnimate [0:*];
LinseedOil [0:*] IsA.PHYSICALOBJECT.ThemeOf.PHYSICALEVENT.Subclasses Increase [0:*];

AAAI 2002 WS 47 Evaluation Several dozen relationships are generated Correct: relationship is posited between the concept CRUDE-OIL and the action PRODUCE; the role is Theme, meaning that one can PRODUCE CRUDE-OIL Incorrect: relationship between GAS and GROW Precision: relatively low (around 75%) due to high number of matches Recall: better (around 90%) Note: it’s easier for a human to refine the system’s output by rejecting spurious relationships (i.e. deleting false positives) than to specify relationships that the system has missed.
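For reference, precision and recall over generated relationships follow the standard definitions, sketched below in Python. The counts in the usage example are hypothetical values chosen only to land near the reported figures; they are not taken from the evaluation.

```python
def precision(true_positives, false_positives):
    """Share of generated relationships that are correct."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Share of correct relationships that the system actually generated."""
    return true_positives / (true_positives + false_negatives)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    correct, spurious, missed = 27, 9, 3
    print(f"precision = {precision(correct, spurious):.2f}")
    print(f"recall    = {recall(correct, missed):.2f}")
```

A spurious relationship such as GAS and GROW counts against precision; a relationship the system never posits counts against recall, which is the costlier error for a human reviewer to repair.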

AAAI 2002 WS 48 How to improve results Less general, more focused ontologies Richer ontological structure More types of hierarchical relationships (beyond IS-A and its inverse, SUBCLASSES) Deeper hierarchies (maximum 4 in Lenoch) Note: TBX supports several data types for conceptual encoding

AAAI 2002 WS 49 Related work Lexical chaining in NLP extracting and associating chains of word-based relationships from text relating words and terms to resources like WordNet Widely used in text categorization, automatic summarization, and topic detection and tracking Our contributions: integrating disparate knowledge sources for similar tasks Discovering and generating a compatible set of ontological relationships

AAAI 2002 WS 50 Conclusions The knowledge acquisition bottleneck impacts ontology construction for information extraction. Terminographers and lexicographers codify information that can be advantageous for work in semantic-based processing. Integrating these two disparate areas, it is possible to leverage large-scale terminological and conceptual information with relationship-rich semantic resources in order to reformulate, match, and merge retrieved information of interest to a user. Possible future applications: Knowledge-focused personal agents Customized search, filtering, and extraction tools Individually tailored views of data via integration, organization, and summarization Lots of work still to be done…