Published by Blanche Harrington. Modified over 9 years ago.
1
Tips and Tricks … with INTEX/NooJ
Tamás Váradi, Institute for Linguistics Research, Hungarian Academy of Sciences, varadi@nytud.hu
Max Silberztein, University of Franche-Comté, max.silberztein@univ-fcomte.fr
2
Outline ● Why should INTEX/NooJ be a tool of choice? ● raising language awareness ● studying linguistics – lexical analysis ● morphology – paradigms – word formation ● automatic lexical acquisition – syntax ● local grammars – semantic tagging
3
List of useful features ● instant lexical lookup ● linguistically sophisticated lexicon ● intuitive graphical interface ● fast, robust, finite-state technology ● corpus, lexicon, grammar handled uniformly ● instant confirmation from corpus ● can be used at different levels of competence ● simple corpus query tool ● grammar development environment ● research tool for NLP projects
4
Morphology I - Inflection ● paradigms handled in the form of finite-state transducers (FSTs)
5
Morphology I - Inflection ● stem variants processed with operations on strings ● L = move left, erasing a character
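The string operations on this slide can be illustrated with a minimal Python sketch. It is not INTEX/NooJ code: the function name `apply_inflection` and the command notation (an uppercase `L` erases the last character, any other character is appended) are assumptions made for illustration, and only the `L` operator itself comes from the slide.

```python
def apply_inflection(stem: str, command: str) -> str:
    """Apply a toy inflection command to a stem.

    Assumed notation (hypothetical, for illustration only):
      'L'            -> move left, erasing the last character
      any other char -> append that character to the form
    """
    form = stem
    for ch in command:
        if ch == "L":
            form = form[:-1]   # erase one character from the right
        else:
            form += ch         # add the suffix material
    return form


if __name__ == "__main__":
    for stem, command in [("knife", "LLves"), ("city", "Lies"), ("table", "s")]:
        print(stem, command, apply_inflection(stem, command))
```

Under these assumptions, one command string per paradigm is enough to derive a stem variant and attach the ending in a single pass.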
6
Morphology II derivation ● All the forms derived from the root ‘fran-’ ● Ideal to learn and experiment with morphological segmentation
7
Automatic lexical extraction ● Store any sequence of letters that is followed by -ize or -ify in the variable $Root ● Produce the lexical entry: wordform: $Root+$Suf, lemma: $Root, part of speech: V, synsem: +V
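The extraction rule above can be approximated outside INTEX/NooJ with a regular expression. The sketch below is an assumption-laden illustration, not the graph shown on the slide: `PATTERN`, `extract_entry`, and the dictionary keys are hypothetical names, and the named groups `root` and `suf` stand in for the variables $Root and $Suf.

```python
import re

# Any sequence of letters (root) followed by -ize or -ify (suf).
PATTERN = re.compile(r"^(?P<root>[a-z]+)(?P<suf>ize|ify)$")


def extract_entry(word: str):
    """Return a toy lexical entry for a word ending in -ize/-ify, else None."""
    match = PATTERN.match(word.lower())
    if match is None:
        return None
    root, suf = match.group("root"), match.group("suf")
    return {
        "wordform": root + suf,  # $Root+$Suf
        "lemma": root,           # $Root
        "pos": "V",
        "synsem": "+V",
    }


if __name__ == "__main__":
    for w in ["americanize", "simplify", "table", "size"]:
        print(w, extract_entry(w))
```

Note that a purely orthographic rule like this also accepts words such as "size" (root "s"), which is the kind of over-generation a lexical check on the root is meant to filter out.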
8
Lexical constraints ● Check whether the string stored in $Root is in the lexicon as an A with the feature +Nation ● Produce the lexical entry: wordform: $Root+$Suf, lemma: $Root, part of speech: V, synsem: +V
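As a rough sketch of the constraint described here, the following Python code only produces an entry when the root is listed in a lexicon as an adjective carrying the feature Nation. The tiny `LEXICON`, the function name `extract_constrained`, and the entry layout are all invented for illustration; a real INTEX/NooJ dictionary is far richer and is queried from within the grammar.

```python
import re

PATTERN = re.compile(r"^(?P<root>[a-z]+)(?P<suf>ize|ify)$")

# Toy lexicon (hypothetical data): lemma -> part of speech and feature set.
LEXICON = {
    "american": {"pos": "A", "features": {"Nation"}},
    "french": {"pos": "A", "features": {"Nation"}},
    "modern": {"pos": "A", "features": set()},
    "colon": {"pos": "N", "features": set()},
}


def extract_constrained(word: str, lexicon=LEXICON):
    """Build an entry only if the root is an A with feature +Nation."""
    match = PATTERN.match(word.lower())
    if match is None:
        return None
    root, suf = match.group("root"), match.group("suf")
    info = lexicon.get(root)
    if info is None or info["pos"] != "A" or "Nation" not in info["features"]:
        return None  # lexical constraint not satisfied
    return {"wordform": root + suf, "lemma": root, "pos": "V", "synsem": "+V"}


if __name__ == "__main__":
    for w in ["americanize", "modernize", "colonize", "size"]:
        print(w, extract_constrained(w))
```

With this toy lexicon, only "americanize" yields an entry; the other candidates are rejected because their roots are missing, are not adjectives, or lack the required feature.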
9
Syntax ● grammars defined in graphs relying on info stored in the lexicon (minimally lemma and POS)
10
Instant feedback from corpus
11
Labelled bracketing ● hit strings may be tagged (merge mode) ● [NP a soft, slow step NP] ● or replaced with bracketing ● [NP NP]
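The difference between the two output modes can be sketched in a few lines of Python. This is an illustration only: the function `annotate`, its parameters, and the sample sentence are assumptions, and the hit span is given by hand here, whereas in INTEX/NooJ it would come from applying a grammar to the corpus.

```python
def annotate(text: str, start: int, end: int, label: str = "NP", mode: str = "merge") -> str:
    """Tag or replace the hit text[start:end] with labelled brackets.

    mode="merge":   keep the hit string inside the brackets
    mode="replace": substitute the hit string with the bare brackets
    """
    hit = text[start:end]
    if mode == "merge":
        tagged = f"[{label} {hit} {label}]"
    elif mode == "replace":
        tagged = f"[{label} {label}]"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return text[:start] + tagged + text[end:]


if __name__ == "__main__":
    sentence = "She heard a soft, slow step outside."
    hit = "a soft, slow step"
    start = sentence.index(hit)
    end = start + len(hit)
    print(annotate(sentence, start, end, mode="merge"))
    print(annotate(sentence, start, end, mode="replace"))
```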
12
Disambiguation ● "very": adjective or adverb?
13
Recursion – embedded graphs
14
An exercise in semantic tagging ● Expressions of time
15
An exercise in semantic tagging ● Expressions of time
16
Finally, not for the faint-hearted … ● the big picture
17
Conclusions ● Teaching linguistic analysis by doing it ● INTEX/NooJ is [det THE] technology to use, honestly … ● All welcome to have a go at it ● Thank you for your attention!