An Ontology for Linguistic Representation Scott Farrar, Terry Langendoen, William Lewis University of Arizona.

Presentation transcript:

An Ontology for Linguistic Representation Scott Farrar, Terry Langendoen, William Lewis University of Arizona

Overview: A proposal for how linguistic data can be shared over the Semantic Web. Endangered language data (EMELD). Special focus on a linguistic ontology (conceptual modeling in linguistics).

Ontology and Linguistics How is the lexicon reflected in top-level distinctions? (Gangemi, Guarino, Masolo, and Oltramari 2001) How are functional/grammatical categories reflected in the ontology?

EMELD: Electronic Metastructure for Endangered Languages Data. As many as half of the world’s languages (3000 out of 6000) are in danger of disappearing (LaPolla 1998). The purpose is to preserve endangered language data by creating a community of practice via the Semantic Web.

Linguistic Data: Field linguists collect data. Hopi: sivu-’ikwiw-ta-qa [vessel-carry:on:back-DUR-REL] ‘kachina’. Linguistic data includes grammars, dictionaries, text, sound and video recordings, and glossed corpora. [Figure labels: subject language; analysis (markup); gloss (markup)]

Grammatical Descriptions English has noun-verb agreement in number and person. Warumungu is ergative-absolutive. Archi has extensive spatial cases. Spanish is SVO.

Facts about Language: Nouns represent objects. Tense relates an event to a point in time. Case is a relation between a predicate and its argument, e.g., He knows him. SOV, SVO, OVS are possible natural language word orders. [Figure labels: linguistic concepts; general concepts]

Challenges to Creating a Community of Practice: Language data should be searchable and comparable, with broad access. There are few standardized methods for encoding language data (cf. EAGLES and TEI). Authors or communities want control over their data. Local control should be balanced with data interoperability (Semantic Web).

Example of the Problem: Google query: [“past tense” Australian languages]. Results from the Web: Warumungu [PAST], Dyirbal [PST], Umpila [Prehodiernal Tense].

Other Examples of the Problem homonymous terminology: A search for PA intended to mean ‘Partitive’ might return PA meaning Past or PerfectiveAspect. “covert” markup: Hopi CAUS really means Causative combined with PerfectiveAspect

Some Further Complications: A search for present tense forms in English returns forms with future reference, e.g., “Tomorrow, John goes to Holland.” Some languages have past, present, and habitual analyzed as grammatical tenses (e.g., Hopi). Searching for habitual aspect in Hopi does not guarantee results that have anything to do with aspect.

Solution Strategy for Providing Broad Access to Language Data Integrate the data using metastructure—a linguistic ontology. Make the data available in standard format without imposing a standard for markup. Build tools to access and process the data—query engines, expert systems…
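
As a rough illustration of the integration strategy above, the following Python sketch maps language-specific gloss tags to shared concept names, so that one query reaches data marked up under different local conventions. The tags come from the Australian-language query example; the concept names (`PastTense`, `PrehodiernalPastTense`), the `SUBCLASS_OF` table, and the function names are hypothetical and are not taken from the ontology itself.

```python
# Hypothetical mapping from (language, local tag) to a shared concept name.
TAG_TO_CONCEPT = {
    ("Warumungu", "PAST"): "PastTense",
    ("Dyirbal", "PST"): "PastTense",
    ("Umpila", "Prehodiernal Tense"): "PrehodiernalPastTense",
}

# Hypothetical subclass links between concepts (child -> parent).
SUBCLASS_OF = {
    "PrehodiernalPastTense": "PastTense",
    "PastTense": "Tense",
}


def is_a(concept, ancestor):
    """Return True if concept equals ancestor or lies below it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def find_tags(ancestor):
    """List the (language, tag) pairs whose concept falls under ancestor."""
    return sorted(
        key for key, concept in TAG_TO_CONCEPT.items() if is_a(concept, ancestor)
    )
```

With this arrangement each project keeps its own markup, and only the mapping table has to be shared; a query for `PastTense` reaches all three languages even though none of them uses the same tag.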

Challenges to Conceptual Modeling of the Linguistic Domain: the domain is large; most concepts are abstract; linguistic objects are hierarchical; language is symbolic; the field is fragmented, with few standards (cf. chemistry, computer science).

Linguistic Ontology: Our starting point is morpho-syntax (word parts, tense, case, aspect, inflection). The ontology covers linguistic segments, grammatical concepts, and data structures. It is built on top of the Suggested Upper Merged Ontology (SUMO).

Suggested Upper Merged Ontology (SUMO): an extensible resource; already includes a number of concepts related to semiotics and linguistics; has connections with the NLP community (WordNet; B. Levin’s verb classes; Allen’s tense logic); developed by an IEEE working group and freely available (Niles and Pease 2001).

Physical Entities What entities are physical, i.e., exist in space-time? The word and its parts (written or spoken): Stems, Affixes, Roots, Phonetic segments Larger constituents: Phrases, Clauses, Texts

Taxonomy for LinguisticExpression Entity Physical Object SelfConnectedObject ContentBearingObject Icon SymbolicString LinguisticExpression WrittenLinguisticExpression SpokenLinguisticExpression

Taxonomy for LinguisticExpression WrittenLinguisticExpression WordPart SimpleWordPart Root Affix Prefix Infix Suffix Clitic Stem Word SimpleWord ComplexWord Compound Phrase instance-of ‘un-’ instance-of ‘dog’
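
The taxonomy on this slide can be sketched as a small child-to-parent table in Python. The class names are those listed on the slide, but the exact nesting is an assumed reading of the flattened diagram, and attaching ‘un-’ to `Prefix` and ‘dog’ to `SimpleWord` is likewise an assumption; the helper names are hypothetical.

```python
# Child -> parent links. The nesting is an assumed reading of the slide.
PARENT = {
    "WordPart": "WrittenLinguisticExpression",
    "SimpleWordPart": "WordPart",
    "Root": "SimpleWordPart",
    "Affix": "SimpleWordPart",
    "Prefix": "Affix",
    "Infix": "Affix",
    "Suffix": "Affix",
    "Clitic": "SimpleWordPart",
    "Stem": "WordPart",
    "Word": "WrittenLinguisticExpression",
    "SimpleWord": "Word",
    "ComplexWord": "Word",
    "Compound": "Word",
    "Phrase": "WrittenLinguisticExpression",
}

# Assumed targets for the two instance-of arrows on the slide.
INSTANCE_OF = {"un-": "Prefix", "dog": "SimpleWord"}


def ancestors(cls):
    """Return the chain of superclasses of cls, nearest first."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def is_instance(form, cls):
    """True if form is an instance of cls directly or through a subclass."""
    direct = INSTANCE_OF[form]
    return direct == cls or cls in ancestors(direct)
```

Under these assumptions, ‘un-’ counts as a `WordPart` by inheritance while ‘dog’ does not, which is the kind of inference the class hierarchy is meant to support.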

Other Possibilities. Recall: Word SimpleWord ComplexWord Compound. Alternative: Word Noun SimpleNoun ComplexNoun Verb SimpleVerb ComplexVerb Adjective … instance-of ‘dog’ instance-of ‘jump’ instance-of ‘dog’ instance-of ‘jump’

Part of Speech as Property: Language exhibits categorial ambiguity (e.g., “fish”). Some languages have no noun/verb distinction, so the distinction is language specific (e.g., Lummi). Related to the notions of “rigid” and “anti-rigid” properties in OntoClean (Guarino and Welty 2002).

Mereological Relations for LinguisticExpression A Stem has-part Root. (=> (instance ?STEM Stem) (exists (?PART) (and (part ?PART ?STEM) (instance ?PART Root))))

Mereological Relations for LinguisticExpression A Word has-part Stem. (=> (instance ?WORD Word) (exists (?PART) (and (part ?PART ?WORD) (instance ?PART WordPart))))
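
The two has-part axioms can be read as integrity checks on analyzed data. The Python sketch below follows the axioms as written (a Stem needs a Root among its parts; a Word needs some WordPart among its parts) and treats `part` as covering direct and indirect parts. The `Expression` class, the `WORD_PART_KINDS` set, and the example forms are illustrative assumptions, not part of the ontology.

```python
from dataclasses import dataclass, field

# Kinds treated as WordPart here; an assumption based on the taxonomy slide.
WORD_PART_KINDS = {"Root", "Stem", "Affix", "Prefix", "Infix", "Suffix", "Clitic"}


@dataclass
class Expression:
    form: str
    kind: str
    parts: list = field(default_factory=list)


def all_parts(expr):
    """Yield every direct or indirect part of expr."""
    for p in expr.parts:
        yield p
        yield from all_parts(p)


def satisfies_axioms(expr):
    """Check the two has-part axioms for expr and, recursively, its parts."""
    kinds = {p.kind for p in all_parts(expr)}
    if expr.kind == "Stem" and "Root" not in kinds:
        return False
    if expr.kind == "Word" and not (kinds & WORD_PART_KINDS):
        return False
    return all(satisfies_axioms(p) for p in expr.parts)


# Illustrative data: a well-formed word and a stem with no root.
root = Expression("dog", "Root")
stem = Expression("dog", "Stem", [root])
word = Expression("dogs", "Word", [stem, Expression("-s", "Suffix")])
bad_stem = Expression("dog", "Stem")
```

Here `satisfies_axioms(word)` holds, while `satisfies_axioms(bad_stem)` fails because the stem has no Root among its parts.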

Other Axioms for LinguisticExpression: None for WrittenLinguisticExpression without referring to the abstract section of the ontology. Phonetic overlay (intonation contour).

Abstract Entities What entities are abstract, i.e., qualities or attributes? Grammatical attributes: Tense, Aspect, Case Linguistic data structures: Paradigms, Feature Structures, PhonemeTables, Derivations (as in “Minimalism”) Mental entities: Morpheme, Phoneme, Lexeme

Taxonomy of GrammaticalProperty Abstract Relation Proposition Attribute InternalAttribute RelationalAttribute GrammaticalAttribute (?) PartOfSpeech Tense Aspect Case …

Relational: SyntacticRole (subject, object); CaseRole (agent, patient). These are relational in much the same way as PositionalAttribute, e.g., above.

Internal: PartOfSpeech (noun, verb, determiner). Within a grammar, these do not appear to be relational; cf. ShapeAttribute.

Problematic Cases: Tense, Aspect, Case. As grammatical attributes, these do not appear to be relational. But what about meaning? After all, that’s the goal of the Semantic Web.

Grammar-Meaning Distinction GrammaticalAttribute should be tied to a language (e.g., to construct paradigms). Semantic notions do not have to be tied to a specific language. Tendency in language is to match these up, e.g., verbs pick out predicates, nouns pick out terms and functions. Salishan languages—no verb-noun distinction.

Tense. AbsolutePastTense: (=> (AbsolutePastTense ?SENTENCE) (and (exists ?INTERVAL TimeInterval) (during ?INTERVAL (WhenFn (ProcessFn ?SENTENCE))) (before ?INTERVAL (WhenFn ?SENTENCE)))) HodiernalPastTense: ?INTERVAL has to be during ‘yesterday’. RemotePastTense: ?INTERVAL has to be before a certain time point (language-specific).

Tense RelativePastTense: (=> (RelativePastTense ?SENTENCE) (and (exists ?INTERVAL TimeInterval) (exists ?POINT TimePoint) (during ?INTERVAL (WhenFn (ProcessFn ?SENTENCE))) (before ?INTERVAL ?POINT)))
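
The two tense axioms can be paraphrased with a simple numeric model of time. The Python sketch below assumes time is a dense number line and that `during` means a proper sub-interval; under those assumptions, some sub-interval of the event precedes a given time exactly when the event starts before that time. The `Interval` class and both function names are hypothetical; for the relative case the reference point is supplied by the caller instead of being existentially quantified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float
    end: float


def absolute_past(event, utterance):
    """Some sub-interval of the event lies before the utterance time."""
    return event.start < event.end and event.start < utterance.start


def relative_past(event, reference_point):
    """Some sub-interval of the event lies before the reference point."""
    return event.start < event.end and event.start < reference_point
```

For example, an event spanning `Interval(0, 1)` is absolute past with respect to an utterance at `Interval(5, 6)`, and relative past with respect to any reference point later than 0.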

Taxonomy of Case Case EventiveCase AffectorCase AffecteeCase NoneventiveCase xxxxCase SpatialCase DirectionalCase PositionalCase instance-of ErgativeCase instance-ofDativeCase instance-of IllativeCase instance-of ExistentialCase

Predicates in the SUMO Linguistic/Semiotic Predicates refers represents containsInformation realization representsInLanguage (containsInformation ?SENT ?PROP) (representsInLanguage ?THING ?ENTITY ?LANGUAGE)

Additional Predicates Need a more linguistically-centered predicate that acts deictically and ‘picks out’ or points to a particular instance. (designates ?LinguisticExpression ?Entity)

Future directions: Extend the ontology into the domains of phonology and syntax. Recommend a markup approach (XML). Explore applications of the ontology beyond the immediate EMELD project.

Contact Info Terry Langendoen Scott Farrar See our website: