1 Tips and Tricks … with INTEX/NOOJ Tamás Váradi, Institute for Linguistics Research, Hungarian Academy of Sciences, varadi@nytud.hu; Max Silberztein, University of Franche-Comté, max.silberztein@univ-fcomte.fr

2 Outline ● Why should INTEX/NOOJ be a tool of choice? ● raising language awareness ● studying linguistics – lexical analysis ● morphology – paradigms – word formation ● automatic lexical acquisition – syntax ● local grammars – semantic tagging

3 List of useful features ● instant lexical lookup ● linguistically sophisticated lexicon ● intuitive graphical interface ● fast, robust, finite-state technology ● corpus, lexicon, grammar handled uniformly ● instant confirmation from corpus ● can be used at different levels of competence ● simple corpus query tool ● grammar development environment ● research tool for NLP projects

4 Morphology I - Inflection ● paradigms handled in the form of FSTs (finite-state transducers)

5 Morphology I - Inflection ● stem variants processed with operations on strings ● L = move left, erasing a character
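The string-operation idea on this slide can be sketched in a few lines of Python. This is only an illustration, not INTEX/NooJ code: the function name `apply_ops` and the convention that every character other than `L` is simply appended are assumptions made for the example. Only the `L` operator (step left, erasing a character) comes from the slide.

```python
def apply_ops(stem: str, ops: str) -> str:
    """Apply a toy inflection command string to a stem.

    Assumed notation (hypothetical, for illustration only):
      'L'        -> erase the last character of the current form
      any other  -> append that character to the current form
    """
    chars = list(stem)
    for op in ops:
        if op == "L":
            if chars:          # nothing to erase on an empty form
                chars.pop()
        else:
            chars.append(op)
    return "".join(chars)


if __name__ == "__main__":
    print(apply_ops("try", "Lied"))   # stem variant: y is erased before -ied
    print(apply_ops("walk", "ed"))    # regular stem: suffix is just appended
```

A single command string such as `Lied` can then be shared by every verb that follows the same pattern, which is the point of describing stem variants as operations rather than listing forms.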

6 Morphology II derivation ● All the forms derived from the root ‘fran-’ ● Ideal to learn and experiment with morphological segmentation

7 Automatic lexical extraction ● Store any sequence of letters that is followed by -ize or -ify in the variable $Root ● Produce the lexical entry: wordform: $Root+$Suf, lemma: $Root, part of speech: V, synsem: +V
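As a rough analogue of this extraction rule, the sketch below uses a Python regular expression in place of a NooJ graph. The named groups `Root` and `Suf` mirror the variables on the slide; the function name `extract_entry` and the dictionary layout of the result are assumptions made for the example, and NooJ's own dictionary format is not reproduced here.

```python
import re

# One or more letters captured as Root, followed by -ize or -ify captured as Suf.
PATTERN = re.compile(r"^(?P<Root>[a-z]+)(?P<Suf>ize|ify)$")


def extract_entry(wordform: str):
    """Return a toy lexical entry for an -ize/-ify word form, or None."""
    m = PATTERN.match(wordform.lower())
    if m is None:
        return None
    root, suf = m.group("Root"), m.group("Suf")
    return {
        "wordform": root + suf,
        "lemma": root,
        "pos": "V",
        "synsem": "+V",
    }


if __name__ == "__main__":
    for w in ["americanize", "simplify", "size", "walked"]:
        print(w, "->", extract_entry(w))
```

Because the pattern looks only at letters, it also accepts forms such as "size" (root "s"), so a purely string-based rule overgenerates unless the root is checked against the lexicon.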

8 Lexical constraints ● Check whether the string stored in $Root is in the lexicon as an A with the feature +Nation ● Produce the lexical entry: wordform: $Root+$Suf, lemma: $Root, part of speech: V, synsem: +V
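The lexical constraint can be imitated by looking the captured root up in a dictionary before an entry is produced. The sketch below is an assumption-laden illustration: `TOY_LEXICON`, its three entries, and the helper names are invented for the example and do not come from any INTEX/NooJ resource.

```python
import re

# Hypothetical mini-lexicon: word -> list of (part of speech, set of features).
TOY_LEXICON = {
    "american": [("A", {"Nation"})],
    "hungarian": [("A", {"Nation"})],
    "modern": [("A", set())],
}

PATTERN = re.compile(r"^(?P<Root>[a-z]+)(?P<Suf>ize|ify)$")


def has_reading(word: str, pos: str, feature: str) -> bool:
    """True if the toy lexicon lists `word` with this POS and feature."""
    return any(p == pos and feature in feats
               for p, feats in TOY_LEXICON.get(word, []))


def extract_constrained(wordform: str):
    """Produce an entry only if the root is an A with feature Nation."""
    m = PATTERN.match(wordform.lower())
    if m is None:
        return None
    root, suf = m.group("Root"), m.group("Suf")
    if not has_reading(root, "A", "Nation"):
        return None
    return {"wordform": root + suf, "lemma": root, "pos": "V", "synsem": "+V"}


if __name__ == "__main__":
    for w in ["americanize", "modernize", "size"]:
        print(w, "->", extract_constrained(w))
```

With the constraint in place, "modernize" (an adjective without the feature) and "size" (a root that is not in the lexicon) are both rejected, while "americanize" still yields an entry.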

9 Syntax ● grammars defined in graphs relying on info stored in the lexicon (minimally lemma and POS)

10 Instant feedback from corpus

11 Labelled bracketing ● hit strings may be tagged (merge mode) ● [NP a soft, slow step NP] ● or replaced with bracketing ● [NP NP]
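The two output modes on this slide can be mimicked with a regular-expression substitution. This is a minimal sketch, not the INTEX/NooJ mechanism: the function `bracket`, its `mode` argument, and the toy pattern `NP_PATTERN` are assumptions made for the example. A real noun-phrase grammar would be a graph relying on lexical information rather than a hand-written pattern.

```python
import re

# Toy stand-in for an NP grammar: "a", any number of words, then "step".
NP_PATTERN = r"\ba(?: \w+,?)* step\b"


def bracket(text: str, pattern: str, label: str, mode: str = "merge") -> str:
    """Tag or replace every match of `pattern` with labelled brackets.

    mode="merge":   keep the matched string inside the brackets
    mode="replace": keep only the brackets
    """
    def repl(m: re.Match) -> str:
        if mode == "merge":
            return f"[{label} {m.group(0)} {label}]"
        return f"[{label} {label}]"

    return re.sub(pattern, repl, text)


if __name__ == "__main__":
    sentence = "He heard a soft, slow step outside."
    print(bracket(sentence, NP_PATTERN, "NP", mode="merge"))
    print(bracket(sentence, NP_PATTERN, "NP", mode="replace"))
```

Merge mode keeps the text readable while exposing the analysis; replace mode abstracts the phrase away, which is useful when a higher-level grammar should only see the category.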

12 Disambiguation ● Very – adjective or adverb?

13 Recursion – embedded graphs

14 An exercise in semantic tagging ● Expressions of time

15 An exercise in semantic tagging ● Expressions of time
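A very small version of such an exercise could look like the sketch below. The slides do not show the actual time grammar, so everything here is an assumption: the three patterns (weekday, clock time, year), the label `TIME`, and the function name `tag_time` are invented for illustration and cover only a tiny fraction of real time expressions.

```python
import re

DAYS = r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"

# Three toy patterns: "on <weekday>", "at <h>[:mm] am/pm", "in <year>".
TIME_PATTERN = re.compile(
    rf"\b(?:on {DAYS}"
    rf"|at \d{{1,2}}(?::\d{{2}})? ?(?:am|pm)"
    rf"|in (?:19|20)\d{{2}})\b"
)


def tag_time(text: str) -> str:
    """Wrap every matched time expression in [TIME ... TIME] brackets."""
    return TIME_PATTERN.sub(lambda m: f"[TIME {m.group(0)} TIME]", text)


if __name__ == "__main__":
    print(tag_time("We met on Monday at 5 pm."))
    print(tag_time("The project started in 1999."))
```

Extending the alternatives one at a time and rechecking the output against a corpus is the kind of incremental workflow the exercise is meant to practise.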

16 Finally, not for the faint hearted … ● the big picture

17 Conclusions ● Teaching linguistic analysis by doing it ● INTEX/NooJ is [det THE] technology to use, honestly … ● All are welcome to have a go at it. Thank you for your attention!

