Software Applications for Processing Romanian Texts. Demonstration and Comparison
Sanda Cherata
Babeş-Bolyai University, Faculty of Letters
2 Software Applications
- The Romanian Morphological Dictionary (DMR) – Software ITC SA – RoLingva
- LEXICON – for updating attributes in lexical entries
- SIASTRO-AM – phrase analysis of noun, adjective, adverb, verb and prepositional phrases
- ETR – term extractor for Romanian specialised texts
3 DMR
- Paradigm of a given lemma
  - classic form
  - stem + termination
- Accents
- Syllabification
- Morphological analysis of a given word
5 LEXICON
- Specifying attributes for lexico-morphological classes
- Designed to collect data from multiple users
- Friendly interface
7 SIASTRO-AM
- Lexico-morphological analysis
- Parsing of noun, adjective, adverb, verb and prepositional phrases
- Uses a lexicon based on DMR, enriched with new lexical and syntactic attributes added with the LEXICON application
- Outputs an annotated text
8 SIASTRO-AM – Tags for text elements

Element            Start tag   End tag   Pattern
sentence           {F          F}        {F sentence F}
word               {C          C}        {C word C}
unknown word       {N          N}        {N unknown word N}
number             {D          D}        {D number D}
punctuation sign   {S          S}        {S punctuation sign S}
hyphen             {L          L}        {L - L}
ignored sequence   {I          I}        {I ignored sequence I}
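The element tags above can be read back with a few regular expressions. The following is only a minimal sketch, not SIASTRO-AM's own code: the function name `parse_annotated`, the sample string, and the assumption that every tag is written as `{X content X}` with whitespace after the opening marker are all illustrative and not taken from the slides.

```python
import re

# Tag letters and element names as listed on the slide.
ELEMENT_KINDS = {
    "C": "word",
    "N": "unknown word",
    "D": "number",
    "S": "punctuation sign",
    "L": "hyphen",
    "I": "ignored sequence",
}

# Assumption: a tag looks like "{X content X}", with whitespace after "{X".
SENTENCE = re.compile(r"\{F\s+(.*?)\s*F\}", re.DOTALL)
ELEMENT = re.compile(r"\{([CNDSLI])\s+(.*?)\s*\1\}", re.DOTALL)


def parse_annotated(text):
    """Return one list of (element kind, content) pairs per sentence."""
    sentences = []
    for sentence in SENTENCE.finditer(text):
        elements = [
            (ELEMENT_KINDS[m.group(1)], m.group(2))
            for m in ELEMENT.finditer(sentence.group(1))
        ]
        sentences.append(elements)
    return sentences


# Invented sample; word annotations are omitted to keep it short.
sample = "{F {C Ana C} {C are C} {D 2 D} {N xyzt N} {S . S} F}"
for kind, content in parse_annotated(sample)[0]:
    print(f"{kind}: {content}")
```

The sketch treats sentences as the only nesting level, which matches the tag list but may not cover every output the real application produces.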
9 SIASTRO-AM – Tags for words

Pattern:
{C word (part of speech+grammatical category, part of speech+grammatical category) syllabification+accent position:+lemma+: C}
- "," separates parts of speech + grammatical category
- "( ), (......)" separates homographs

Example:
{C date (vrb+p_fp+, sbt+fdpn+fisn+fipn+fvpa+, adj+fdpn+fisn+fipn+fvpa+) da-te+2:+da+:+dată+:+dat+:C}
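To make the word tag layout concrete, the "date" example can be split into its fields as follows. This is a hedged sketch rather than the application's actual parser: the names `Reading`, `WordAnnotation` and `parse_word_tag` are hypothetical, and pairing the i-th lemma with the i-th part-of-speech reading is an assumption inferred from the single example on the slide.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reading:
    pos: str                # part of speech, e.g. "vrb"
    categories: List[str]   # grammatical category codes, e.g. ["p_fp"]
    lemma: Optional[str]    # assumed to follow the order of the readings


@dataclass
class WordAnnotation:
    form: str
    syllabification: str
    accent_position: int
    readings: List[Reading]


# {C form (readings) syllabification+accent:+lemma+:...C}
WORD_TAG = re.compile(
    r"^\{C\s*(?P<form>[^\s(]+)\s*\((?P<readings>[^)]*)\)\s*(?P<tail>.*?)\s*C\}$"
)


def parse_word_tag(tag: str) -> WordAnnotation:
    match = WORD_TAG.match(tag.strip())
    if match is None:
        raise ValueError(f"not a word tag: {tag!r}")
    # Commas separate the part-of-speech readings inside the parentheses.
    raw_readings = [r.strip() for r in match.group("readings").split(",") if r.strip()]
    # Colons close each field of the tail: syllabification+accent, then lemmas.
    fields = [f.strip("+ ") for f in match.group("tail").split(":")]
    fields = [f for f in fields if f]
    syllabification, _, accent = fields[0].partition("+")
    lemmas = fields[1:]
    readings = []
    for i, raw in enumerate(raw_readings):
        parts = [p for p in raw.split("+") if p]
        lemma = lemmas[i] if i < len(lemmas) else None
        readings.append(Reading(parts[0], parts[1:], lemma))
    return WordAnnotation(match.group("form"), syllabification, int(accent), readings)


example = ("{C date (vrb+p_fp+, sbt+fdpn+fisn+fipn+fvpa+, "
           "adj+fdpn+fisn+fipn+fvpa+) da-te+2:+da+:+dată+:+dat+:C}")
word = parse_word_tag(example)
for reading in word.readings:
    print(reading.pos, reading.categories, reading.lemma)
```

The sketch handles a single parenthesised group only; tags with several homograph groups would need the readings pattern extended.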
11 ETR Desktop
12 ETR Menu bar
13 ETR Files menu
14 Files Menu – New Project
15 Files Menu – New Project – Files
16 Files Menu – New Project – Files
17 Files Menu – New Project – Subject Fields
18 Files Menu – New Project – Abbreviations
19 Files Menu – New Project – Initialisms
20 File Menu – Open Project
21 File menu – Contexts
22 File menu – Terms
23 File menu – Terminological forms
24 View menu
25 View menu
26 Export menu
27 ETR – Term Extraction
28 ETR – Contexts
29 ETR – Move term in Terminological form
30 ETR – Terminological Forms – contexts
31 Source text
32 ETR – Terminological Form
33 ETR – Future Developments
- Syntactical analysis
- Enriching the terminological form by adding new terminological features