Omega Ontology: Supporting Annotation Eduard Hovy with Andrew Philpot, Jerry Hobbs, Michael Fleischman, and Patrick Pantel USC/ISI.

Omega content and framework
Concepts: 120,604 concept/term entries [76 MB]:
– WordNet (Princeton; Miller & Fellbaum)
– Mikrokosmos (NMSU; Nirenburg et al.)
– Penman Upper Model (ISI; Bateman et al.)
– 25,000+ noun-noun compounds (ISI; Pantel)
– Upper model: ResearchCYC, DOLCE, SUMO (not aligned)
Lexicon / sense space:
– 156,142 English words; 33,822 Spanish words
– 271,243 word senses
13,000 frames of verb argument structure with case roles:
– LCS case roles (Dorr) [6.3 MB]
– PropBank roleframes (Palmer et al.) [5.3 MB]
– FrameNet roleframes (Fillmore et al.) [2.8 MB]
– WordNet verb frames (Fellbaum) [1.8 MB]
Associated information (not all complete):
– WordNet subject domains (Magnini & Cavaglia) [1.2 MB]
– Various relations learned from text (ISI; Pantel)
– TAP domain groupings (Stanford; Guha)
– SemCor term frequencies [7.5 MB]
– Topic signatures (Basque U; Agirre et al.) [2.7 GB]
Instances [10.1 GB]:
– 1.1 million persons harvested from text
– 765,000 facts harvested from text
– 5.7 million locations from USGS and NGA
Framework (over 28 million statements of concepts, relations, and instances):
– Available in PowerLoom
– Instances in RDF
– With database/MySQL
– Online browser
– Clustering software
– Term and ontology alignment software
Goal: one environment for various ontologies and resources

Omega in annotation
Used as the sense inventory for the IAMTC and OntoBank projects:
– IAMTC: annotators choose directly from Omega; they choose both Mikrokosmos and WordNet source tokens
– OntoBank: follows the PropBank procedure: an expert creates a set of senses; annotators choose sense(s) from the set; we anchor the senses into Omega
Lessons:
– WordNet is too fine-grained, Mikrokosmos too coarse → need a middle group (approx. 60K concepts in the Middle Model)
– PropBank sense groups are based on frame roles, so they may group together ontologically distinct meanings (same word, same frameset, different meanings) → need a procedure to recluster sense groups

From words to concepts
Lexical space:
– Words
– Monolingual
– Examples: “drive”, “steer”, “fahren”, “steuern”, “besturen”, “rijden”, “drijven”, …
Sense space:
– Word senses
– Multilingual
– Examples: Drive1, Drive2, Drive3, …, Manage
Concept space:
– Concepts
– Interlingual (?)
– ?
How many concepts? How are they related to senses?

From senses to concepts: Graduated refinement
1. Initialization: Given a term (word), collect several dozen sentences containing it. Also collect definitions from various dictionaries.
2. Cluster the word’s senses into preliminary, loosely similar groups.
3. Differentiation process: Begin a tree structure with all the groups at the root.
4. Considering all the groups, identify the group most different from the others:
   1. If you can find one clearly most different group, write down its most important distinction explicitly; this will later become the differentia and be formalized axiomatically.
   2. If you cannot find any distinctions by which to further subdivide the group, stop elaborating this branch and continue with some other branch.
   3. If you can find several distinctions that subdivide the group in different, but equally valid, ways, also stop elaborating this branch and continue with some other branch.
5. Create two new branches in the evolving tree structure, putting the new group under one and leaving the other groups under the other.
6. Repeat from step 4, considering separately the group(s) under each branch.
7. Concept formation: When all branches have stopped, the ultimate result is a tree of increasingly fine-grained distinctions, which are explicitly listed at each branch point. Each leaf becomes a single concept, not further differentiable in the current task/application/domain. Each distinction must be formalized as an axiom that holds for the branch it is associated with.
8. Insertion into ontology: Starting from the top, visit each branch point. Do the two branches have approximately the same meaning?
   1. If so, insert them into the ontology at the appropriate point and stop traversing this branch.
   2. If not, split the tree and repeat step 8 separately for each branch. Repeat until done.
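The differentiation loop (steps 3 to 7) can be sketched in code. This is only an illustrative sketch, not the Omega tooling: the names `Node`, `refine`, `leaves`, and `scripted_judge` are hypothetical, the human judgment in step 4 is replaced by a caller-supplied function, and the distinction text is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Group = Tuple[int, ...]  # a sense group, identified by its sense numbers


@dataclass
class Node:
    groups: List[Group]
    distinction: Optional[str] = None  # recorded only at branch points
    children: List["Node"] = field(default_factory=list)


# The judge stands in for the human annotator (assumption): given the groups
# at a node it returns (most_different_group, distinction), or None when no
# single clear distinction exists (steps 4.2 and 4.3).
Judge = Callable[[List[Group]], Optional[Tuple[Group, str]]]


def refine(groups: List[Group], judge: Judge) -> Node:
    """Build the binary tree of distinctions (steps 3 to 6)."""
    node = Node(groups=list(groups))
    if len(groups) < 2:
        return node  # a single group cannot be split further
    verdict = judge(groups)
    if verdict is None:
        return node  # stop elaborating this branch
    odd, distinction = verdict
    rest = [g for g in groups if g != odd]
    node.distinction = distinction
    # Step 5: the most different group goes under one branch, the rest under the other.
    node.children = [refine([odd], judge), refine(rest, judge)]
    return node


def leaves(node: Node) -> List[List[Group]]:
    """Step 7: each leaf is treated as one concept."""
    if not node.children:
        return [node.groups]
    found: List[List[Group]] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def scripted_judge(groups: List[Group]) -> Optional[Tuple[Group, str]]:
    # Hypothetical decisions, keyed by the sorted groups present at a node.
    script = {
        ((1,), (2, 3), (5, 6)): ((2, 3), "involves operating a vehicle"),
    }
    return script.get(tuple(sorted(groups)))


tree = refine([(2, 3), (5, 6), (1,)], scripted_judge)
concepts = leaves(tree)
```

Here the scripted judge splits off group (2, 3) once and then declines to split further, so the tree ends with two leaves. In practice the judge would be an annotator, and each recorded distinction would still need to be formalized as an axiom.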

An exercise: “drive”
“drive” (1,2…12)
  (1,2,3,4,5,6,8,9,10)
    (2,3,4,8,9,10)
      2,3: car
      10: bulldozer
      4: legs
      9: hammer
      8: cattle
    (1,5,6)
      (5,6)
        5: business
        6: dollar
      1: demons
  (7,11,12)
    7: “drive mad”
    11: taxi
    12: “drive a hard bargain”

Deeper semantic “drive”
Root: “drive” (1,2…12)
Groups: (1,2,3,4,5,6,8,9,10); (2,3,4,8,9,10); (1,5,6); (5,6); (7,11,12)
Senses: 2,3: car; 10: bulldozer; 4: legs; 9: hammer; 8: cattle; 11: taxi; 12: “drive a hard bargain”; 5: business; 6: dollar; 1: demons; 7: “drive mad”

Ontologizing “drive”
Groups: (1,2,3,4,5,6,8,9,10); (2,3,4,8,9,10); (1,5,6); (5,6)
Senses: 2,3: car; 10: bulldozer; 4: legs; 9: hammer; 8: cattle; 12: “drive a hard bargain”; 11: taxi; 5: business; 6: dollar; 1: demons; 7: “drive mad”

From words to concepts
Lexical space:
– Words
– Monolingual
– Examples: “drive”, “steer”, “fahren”, “rijden”, …
Sense space:
– Word senses
– Multilingual
– Examples: Drive1, Drive2, Drive3, …
Concept space:
– Concepts
– Interlingual (?)
– Groups for “drive”: (1,2,3,4,5,6,8,9); (2,3,4,8,9,10); (1,5,6); (5,6)
– Senses for “drive”: 2,3: car; 10: bulldozer; 4: legs; 9: hammer; 8: cattle; 12: “drive a hard bargain”; 11: taxi; 5: business; 6: dollar; 1: demons; 7: “drive mad”
Granularity: choose level
Generally fewer concepts than senses
Complex sense-concept mappings
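The three spaces can be pictured as two layered mappings: words to senses, and senses to concepts. The sketch below is illustrative only; the sense assignments and the concept names (`OPERATE-VEHICLE`, `GUIDE-DIRECTION`) are invented for the example and are not taken from Omega.

```python
from typing import Dict, List, Set

# Lexical space -> sense space (hypothetical assignments).
word_senses: Dict[str, List[str]] = {
    "drive": ["Drive1", "Drive2", "Drive3"],
    "fahren": ["Drive1", "Drive2"],
    "steer": ["Drive3"],
}

# Sense space -> concept space (hypothetical concept names).
# A sense may map to several concepts, and several senses may share one.
sense_concepts: Dict[str, List[str]] = {
    "Drive1": ["OPERATE-VEHICLE"],
    "Drive2": ["OPERATE-VEHICLE"],
    "Drive3": ["GUIDE-DIRECTION"],
}


def concepts_for(word: str) -> Set[str]:
    """Collect every concept reachable from a word through its senses."""
    found: Set[str] = set()
    for sense in word_senses.get(word, []):
        found.update(sense_concepts.get(sense, []))
    return found


all_concepts = {c for cs in sense_concepts.values() for c in cs}
drive_concepts = concepts_for("drive")
```

In this toy data three senses collapse into two concepts, matching the observation that there are generally fewer concepts than senses. Choosing a coarser or finer granularity amounts to changing the second mapping.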

Statistics
Not yet done on a large scale
Tests show (when the set of senses is given):
– Approx. 20 mins needed to cluster senses and create major families (incl. difference axiom specification)
– Approx. 20 mins to insert into the ontology and verify axioms
– Comparable to the speed of sense creation in PropBank
– Inter-human agreement increases with training (for simple adjective clusters, over 80% agreement with little training)
Can apply CBC (high-quality mutual-information-based clustering) to speed up the process
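The slides do not say how inter-human agreement was measured. One common option for comparing two annotators' sense clusterings, shown here purely as an assumption, is pairwise agreement: the fraction of sense pairs on which both annotators make the same same-cluster or different-cluster decision. The function name and the cluster labels below are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable


def pairwise_agreement(a: Dict[Hashable, str], b: Dict[Hashable, str]) -> float:
    """Fraction of item pairs on which two clusterings agree.

    Both dicts map the same items (e.g. sense numbers) to cluster labels.
    A pair counts as agreement when both annotators put the two items
    together, or both keep them apart.
    """
    if set(a) != set(b):
        raise ValueError("both annotators must label the same items")
    pairs = list(combinations(sorted(a), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = sum((a[x] == a[y]) == (b[x] == b[y]) for x, y in pairs)
    return agree / len(pairs)


# Hypothetical clusterings of four senses by two annotators.
annotator_1 = {1: "A", 2: "A", 3: "B", 4: "B"}
annotator_2 = {1: "A", 2: "A", 3: "B", 4: "A"}
score = pairwise_agreement(annotator_1, annotator_2)
```

For these toy clusterings the annotators agree on three of the six sense pairs. Chance-corrected measures may be preferable when cluster sizes are very uneven.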