ISO TC37/SC4 – Tilburg 2007 Data categories for (lexical semantics and) reference annotation Susanne Alt ATILF-CNRS, Nancy, France & BBAW, Berlin, Germany.

Presentation transcript:

ISO TC37/SC4 – Tilburg 2007
Data categories for (lexical semantics and) reference annotation
Susanne Alt, ATILF-CNRS, Nancy, France & BBAW, Berlin, Germany
Laurent Romary, INRIA, France & MPDL, Berlin, Germany

Reference annotation
Links between markables
Example text: Prendre une poire et la couper. Enlever la peau. Laver une pomme. Éplucher le fruit. Les faire cuire. Servir l’une et l’autre avec de la glace.
(Take a pear and cut it. Remove the skin. Wash an apple. Peel the fruit. Cook them. Serve one and the other with ice cream.)
Various views
– Coreference: identity of the referent
  – {une poire, la, l’une}
  – {une pomme, le fruit, l’autre}
  – {(une poire, une pomme), les}
– Anaphora: interpretational dependency
  – une poire <= la peau
  – l’une <= l’autre
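The two views on this slide can be contrasted in a small Python sketch. It is only an illustration: markables are represented by their surface strings (a real annotation would use identifiers), the names `COREFERENCE_SETS`, `ANAPHORA_LINKS`, `corefer` and `depends_on` are hypothetical, and `x <= y` is read as "y depends on x for its interpretation", which is an assumption about the slide's notation.

```python
# Coreference: groups of markables that share one referent (symmetric relation).
# The tuple models the complex markable (une poire, une pomme) from the slide.
COREFERENCE_SETS = [
    ["une poire", "la", "l'une"],
    ["une pomme", "le fruit", "l'autre"],
    [("une poire", "une pomme"), "les"],
]

# Anaphora: directed dependencies, stored as (antecedent, anaphor) pairs.
# Assumption: "x <= y" on the slide means that y is interpreted via x.
ANAPHORA_LINKS = [
    ("une poire", "la peau"),
    ("l'une", "l'autre"),
]


def corefer(a, b):
    """True if a and b are listed in the same coreference set."""
    return any(a in group and b in group for group in COREFERENCE_SETS)


def depends_on(anaphor):
    """Return the antecedents that the given anaphor depends on."""
    return [ante for ante, ana in ANAPHORA_LINKS if ana == anaphor]
```

Note that "la peau" depends on "une poire" without being coreferent with it, which is the distinction the slide draws between the two views.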

RAF: Reference Annotation Framework
[Diagram: a Referential Data Collection with Global Meta-data, containing Markables (n) and Referential Links (n); the diagram also carries a 1..1 cardinality label]

Important issues
Markables as autonomous units
– No isomorphism to source data
  – complex markables, zero pronouns, discourse deixis, disfluencies
– Necessity to identify non-referring units in a homogeneous way
  – Cf. Byron & Gegg-Harrison (2004)
– Possible overwriting of inherited features
  – Gender, POS refinement
– Markable specific data categories
Links as autonomous units
– Specific annotation mechanisms
  – e.g. ambiguity, same source markable involved in different links
– Link specific data categories
=> Markables and links may be annotated in different phases and by different annotators (cf. alignment...)
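One way to picture markables and links as autonomous units is a stand-off layout in which each is its own record with its own annotator. The sketch below is a minimal illustration under stated assumptions, not the RAF serialization: the class and field names (`Markable`, `Link`, `inherited`, `overrides`), the feature value `commonNoun`, and the unit `m0` without a surface form are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Markable:
    """A markable stored apart from the source text (stand-off)."""
    id: str
    span: Optional[Tuple[int, int]] = None  # None: no surface form, e.g. a zero pronoun
    inherited: Dict[str, str] = field(default_factory=dict)  # from lower annotation levels
    overrides: Dict[str, str] = field(default_factory=dict)  # markable-level refinements
    annotator: str = "unknown"

    def feature(self, name: str) -> Optional[str]:
        # A markable-level value overwrites the inherited one.
        return self.overrides.get(name, self.inherited.get(name))


@dataclass
class Link:
    """A link stored apart from the markables it connects."""
    id: str
    source: str
    target: str
    annotator: str = "unknown"


def find_span(text: str, surface: str) -> Tuple[int, int]:
    """Character offsets of the first occurrence of surface in text."""
    start = text.index(surface)
    return (start, start + len(surface))


def links_from(links: List[Link], markable_id: str) -> List[Link]:
    """All links whose source is the given markable."""
    return [link for link in links if link.source == markable_id]


text = "Prendre une pomme. Éplucher le fruit."
markables = [
    Markable("m1", find_span(text, "une pomme"), inherited={"pos": "noun"}, annotator="A"),
    Markable("m2", find_span(text, "le fruit"), inherited={"pos": "noun"},
             overrides={"pos": "commonNoun"}, annotator="A"),  # hypothetical POS refinement
    Markable("m0", None, annotator="A"),  # hypothetical unit with no surface form
]
# Annotator B adds links in a later phase; m2 is the source of two competing links.
links = [
    Link("l1", "m2", "m1", annotator="B"),
    Link("l2", "m2", "m0", annotator="B"),
]
```

Because links refer to markables only by identifier, an ambiguous source markable simply appears in more than one `Link` record, and neither layer has to be edited when the other is revised.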

Example: Prendre une pomme. Éplucher le fruit. (Take an apple. Peel the fruit.)
– Markable: une pomme (nounPhrase)
– Markable: le fruit (nounPhrase)
– Link: coreference, hypernymy

The current jungle of "attributes"
direct anaphor, identity, coreference, identity of reference, bridging, part-whole, associative, reference to part of landmark, indirect anaphor, larger situation, unfamiliar, designation, conceptual bridging, set-subset, miscellaneous, cause, inferable-of-complement, propositional, possessive, implicit argument, ellipsis, plural NP, numerical pronoun, substitution form, identity of reference with two landmarks, NP predication, member, general relation, event relation, argument, proper name, bound anaphor, function-value, instantiation, agent, patient, attribute, partitive, strict possession, cause, other-anaphor…
=> classification and description of data categories

Data categories for RAF
Markables
– Lexical Semantics Data Categories
  “... are related to properties of semantic entities. Dependent on the underlying theory, semantic entities might be instantiated as concepts or referents. The following features are primarily considered as being lexicalized features. A strong indicator in favour of lexicalization is specific grammatical mark-up in some languages, as for example for animacy, alienability or collectiveness. However, in many cases, the value of a lexicalized or default semantic feature might be overwritten in discourse.”
– Miscellaneous Semantic Data Categories
  “... groups other properties of semantic entities, useful for reference annotation. They might not be considered as lexicalized, but as discourse dependent features.”
– Definiteness Data Categories
  “... are properties of linguistic units, mainly noun phrases, concerned with the identifiability and non-identifiability of their referents on the part of a speaker or addressee.”

Overview
Lexical Semantics Data Categories
– /abstractness/
– /animacy/
– /alienability/
– /collectiveness/
– /countability/
Miscellaneous Semantic Data Category
– /entityCategorization/
– /naturalGender/
– /cardinality/
Definiteness Data Categories
– /definiteIdentifiableTerm/
– /genericTerm/
– /indefiniteTerm/
– /nonSpecificTerm/
– /specificTerm/
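The overview above can serve as a simple selection of markable data categories against which an annotation is checked. The following Python sketch assumes that reading; the group keys (`lexicalSemantics`, `miscellaneousSemantic`, `definiteness`), the function names and the example values are hypothetical, and only the category names come from the slide.

```python
# Markable data categories as listed on the slide, grouped under hypothetical keys.
MARKABLE_DATA_CATEGORIES = {
    "lexicalSemantics": {
        "abstractness", "animacy", "alienability", "collectiveness", "countability",
    },
    "miscellaneousSemantic": {
        "entityCategorization", "naturalGender", "cardinality",
    },
    "definiteness": {
        "definiteIdentifiableTerm", "genericTerm", "indefiniteTerm",
        "nonSpecificTerm", "specificTerm",
    },
}


def group_of(category):
    """Return the group a data category belongs to, or None if it is not selected."""
    for group, members in MARKABLE_DATA_CATEGORIES.items():
        if category in members:
            return group
    return None


def unknown_categories(annotation):
    """Return, sorted, the keys of an annotation that are not selected categories."""
    return sorted(name for name in annotation if group_of(name) is None)
```

For instance, `unknown_categories({"animacy": "animate", "colour": "red"})` flags `colour`, since it is not among the categories listed; the values `animate` and `red` are placeholders and are not checked.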

Referential, lexical or syntactic property?
Not always syntactically marked.
– Die Möbel waren zu verkaufen. (The furniture was for sale.)
– Das Gefieder war schwarz. (The plumage was black.)
Not predictable from the referent.
– Die Federn waren schwarz. (The feathers were black.)
– Das Gefieder war schwarz. (The plumage was black.)
Therefore
– considered as lexicalized
– sources, notes, explanations
– possible overriding in discourse
  – Le vin est bon. Les vins sont bons. (The wine is good. The wines are good.)

Data categories from MAF, SynAF
Relevant information percolated from lower levels
– /part of speech/
– /grammatical gender (number, person, etc.)/
– /syntactic category/
  – { /noun phrase/, … }
  – Consensus hardly achievable on the possible values…
– /syntactic function/
  – { /subject/, /object/, … }
  – Consensus…

Data categories for RAF
Links
– Lexical Relation Data Categories
  “... are relations between lexical items. For reference annotation, they might be extended to larger linguistic units, such as noun phrases.”
– Coreference Relation Data Category
  “...an equivalence relation between linguistic expressions referring to the same extra-linguistic entity.”
– Objectal Relation Data Categories
  “... are a generalisation of van Deemter and Kibble’s (2000) extensional approach to the definition of coreference in terms of relations holding between referents of linguistic expressions: an objectal relation holds between extra-linguistic entities, defines relations from a referential viewpoint.”
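Since coreference is defined here as an equivalence relation, pairwise coreference links can be collapsed into equivalence classes. The sketch below does this with a union-find structure; that choice of technique is an assumption made for illustration and is not taken from the slides, and the markable identifiers are hypothetical.

```python
def coreference_classes(pairs):
    """Group markable ids into equivalence classes from pairwise coreference links."""
    parent = {}

    def find(x):
        # Register unseen ids as their own representative, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), []).append(x)
    # Sort for a deterministic result.
    return sorted(sorted(members) for members in classes.values())


# Hypothetical ids: m1..m3 might stand for one chain, m4..m5 for another.
pairs = [("m1", "m2"), ("m2", "m3"), ("m4", "m5")]
classes = coreference_classes(pairs)
```

Symmetry and transitivity come for free in this representation: linking m1 to m2 and m2 to m3 places m1 and m3 in the same class without an explicit link between them.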

Overview
Lexical Relation Data Categories
– /synonymy/
– /hyponymy/
– /hypernymy/
– /compatibility/
– /incompatibility/
– /meronymy/
– /lexicalIdentity/
Coreference Relation Data Category
– /coreference/
Objectal Relation Data Categories
– /objectalIdentity/
– /partOf/
– /subset/

Example: Prendre une pomme. Éplucher le fruit. (Take an apple. Peel the fruit.)
– Markable: une pomme (nounPhrase)
– Markable: le fruit (nounPhrase)

Example: Prendre une pomme. Éplucher le fruit. (Take an apple. Peel the fruit.)
– Markable: une pomme (nounPhrase)
– Markable: le fruit (nounPhrase)
– Link: objectalIdentity, hypernymy
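The link in this example carries two data categories of different kinds, one objectal and one lexical. A minimal Python sketch of sorting a link's categories into their kinds follows; the group keys, the `classify` function and the dictionary layout of `link` are hypothetical, and treating "le fruit" as the source of the link is an assumption about its direction.

```python
# Link data categories as listed in the overview, grouped under hypothetical keys.
LINK_DATA_CATEGORIES = {
    "lexicalRelation": {
        "synonymy", "hyponymy", "hypernymy", "compatibility",
        "incompatibility", "meronymy", "lexicalIdentity",
    },
    "coreferenceRelation": {"coreference"},
    "objectalRelation": {"objectalIdentity", "partOf", "subset"},
}


def classify(relations):
    """Map each data category on a link to its group; reject unknown names."""
    result = {}
    for relation in relations:
        for group, members in LINK_DATA_CATEGORIES.items():
            if relation in members:
                result.setdefault(group, []).append(relation)
                break
        else:
            # No group contained the relation.
            raise ValueError(f"unknown link data category: {relation}")
    return result


# Assumption: the link runs from the later expression to the earlier one.
link = {
    "source": "le fruit",
    "target": "une pomme",
    "relations": ["objectalIdentity", "hypernymy"],
}
kinds = classify(link["relations"])
```

A label from the older "jungle" of attributes, such as `bridging`, is rejected by `classify` because it is not among the categories listed for links.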

Metadata for annotation schemes
A general issue in annotation schema design
– Global information
  – Annotator(s), tool, date
  – Pointer to scheme specification = DCS (Data Category Selection)
  – Inter-annotator agreement
  – Revision information
– Local information: markables, links
  – Annotator (markable ≠ links)
  – Confidence level (cf. tools)
  – Update, correction
Sources:
– OLAC (Open Language Archives Community), IMDI (ISLE Metadata Initiative), TEI (Text Encoding Initiative)
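The split between global and local information could be modelled as two record types, as in the Python sketch below. All class names, field names and example values (tool name, date, DCS path, annotator labels, agreement and confidence figures) are hypothetical and chosen only to mirror the items on the slide; they are not drawn from OLAC, IMDI or TEI.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GlobalMetadata:
    """Information that holds for the whole annotated resource."""
    annotators: List[str]
    tool: str
    date: str
    dcs_pointer: str  # pointer to the Data Category Selection
    inter_annotator_agreement: Optional[float] = None
    revision: Optional[str] = None


@dataclass
class LocalMetadata:
    """Information attached to a single markable or link."""
    unit_id: str
    unit_type: str  # "markable" or "link"
    annotator: str
    confidence: Optional[float] = None
    corrected: bool = False


def undeclared_annotators(global_md, local_mds):
    """Annotators used locally but missing from the global record."""
    return sorted({m.annotator for m in local_mds} - set(global_md.annotators))


# Hypothetical example values.
global_md = GlobalMetadata(
    annotators=["A", "B"],
    tool="example-tool",
    date="2007-01-01",
    dcs_pointer="dcs/example.xml",
    inter_annotator_agreement=0.8,
)
local_mds = [
    LocalMetadata("m1", "markable", "A", confidence=0.9),
    LocalMetadata("l1", "link", "B", confidence=0.6),
    LocalMetadata("l2", "link", "C", corrected=True),
]
```

Recording the annotator per unit keeps the "markable ≠ links" case visible: here markable m1 and link l1 come from different annotators, and `undeclared_annotators` reports C as missing from the global record.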