Published by Alvin Booth. Modified over 8 years ago.
1
Applying Word Sketches to Russian
Máša Khokhlova
St. Petersburg State University
khokhlova.marie@gmail.com
2
Word Sketches for Russian
Grammatical rules describe syntactic constructions of Russian and operate on a morphologically tagged corpus.
The rules are written with regular expressions in the query language of the IMS Corpus Workbench.
The system searches for tags that correspond to word forms. For example, the tag Ncfpnn denotes a common noun (Nc), feminine gender (f), plural (p), nominative case (n).
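As a rough illustration of how such positional tags can be read and matched, here is a minimal Python sketch. Only the values N, c, f, p and n are stated on the slide; the remaining value tables, and the function names `decode_noun_tag` and `tag_matches`, are assumptions made for the example, and positions beyond the fifth character are left undecoded.

```python
import re

# Value tables for the leading positions of a noun tag.
# Only N, c, f, p and n (case) are named on the slide; every other
# entry below is an assumption added for illustration.
NOUN_POSITIONS = [
    ("category", {"N": "noun"}),
    ("type", {"c": "common", "p": "proper"}),
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("number", {"s": "singular", "p": "plural"}),
    ("case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "i": "instrumental", "l": "locative"}),
]


def decode_noun_tag(tag):
    """Decode the first five positions of a noun tag into feature names."""
    features = {}
    for (name, values), char in zip(NOUN_POSITIONS, tag):
        features[name] = values.get(char, "unknown (" + char + ")")
    return features


def tag_matches(pattern, tag):
    """Check a tag pattern such as N...n. against a whole tag.

    Each dot is an ordinary regex wildcard standing for one tag position.
    """
    return re.fullmatch(pattern, tag) is not None


print(decode_noun_tag("Ncfpnn"))
print(tag_matches("N...n.", "Ncfpnn"))  # nominative noun pattern
print(tag_matches("N...g.", "Ncfpnn"))  # genitive noun pattern
```

Because each feature sits at a fixed position, a pattern such as N...n. selects every nominative noun regardless of type, gender and number.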
3
Word Sketch Rules
Below is an example of the grammatical rules for «adjective + noun» phrases:
*DUAL
=a_modifier/modifies
2:"A....n." (([word=","]|[word="и"]|[word="или"]){0,1} [tag="A....n."]){0,7} 1:"N...n."
2:"A....g." (([word=","]|[word="и"]|[word="или"]){0,1} [tag="A....g."]){0,3} 1:"N...g."
2:"A....d." (([word=","]|[word="и"]|[word="или"]){0,1} [tag="A....d."]){0,3} 1:"N...d."
2:"A....a." (([word=","]|[word="и"]|[word="или"]){0,1} [tag="A....a."]){0,3} 1:"N...a."
2:"A....i." (([word=","]|[word="и"]|[word="или"]){0,1} [tag="A....i."]){0,3} 1:"N...i."
2:"A....l." (([word=","]|[word="и"]|[word="или"]){0,1} [tag="A....l."]){0,3} 1:"N...l."
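The six rules follow one template and differ mainly in the case letter shared by the adjective and the noun, which is how case agreement is enforced. The Python sketch below generates such rules from that template. It is an illustration, not the author's tooling: the helper names `adjective_noun_rule` and `build_gramrel` are hypothetical, and the sketch applies one repetition limit to every case, whereas the slide uses {0,7} for the nominative rule.

```python
# Case letters as they appear in the tag patterns on the slide.
CASES = ["n", "g", "d", "a", "i", "l"]

# Optional connector between coordinated adjectives: comma, "и" or "или".
CONNECTOR = '([word=","]|[word="и"]|[word="или"]){0,1}'


def adjective_noun_rule(case, max_extra=3):
    """Build one adjective+noun rule whose slots all agree in `case`."""
    adjective = "A...." + case + "."
    noun = "N..." + case + "."
    return (
        '2:"' + adjective + '" '
        + "(" + CONNECTOR + ' [tag="' + adjective + '"])'
        + "{0," + str(max_extra) + "} "
        + '1:"' + noun + '"'
    )


def build_gramrel(name="a_modifier/modifies", max_extra=3):
    """Assemble the relation header and one rule per case."""
    lines = ["*DUAL", "=" + name]
    lines += [adjective_noun_rule(case, max_extra) for case in CASES]
    return "\n".join(lines)


print(build_gramrel())
```

Generating the rules from a single template keeps the adjective and noun patterns in step, so the case letter cannot drift between the slots of one rule.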
4
Word Sketch Rules (2)
=Verb X/X Verb
2:[tag="V.*"] 1:[tag!="SENT"&tag!=","&tag!="-"]
1:[tag!="SENT"&tag!=","&tag!="-"] [lemma="не"]? 2:[tag="V.*"]
=Noun X
2:[tag="N.*"&lemma!=")."] 1:[tag!="SENT"&tag!=","&tag!="-"&lemma!=")."]
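To make the Verb X/X Verb rules concrete, the following Python sketch applies their logic to a list of tagged tokens. It is a simplified stand-in for the corpus query engine, not its implementation: the token dictionaries, the function names, and the sample tags "P", "Q", "Vmip3s" and "Ncmsan" are assumptions made for the example.

```python
import re

# Tags excluded from the X slot by the rules: sentence end, comma, dash.
EXCLUDED_TAGS = {"SENT", ",", "-"}


def is_verb(token):
    """True when the tag matches the pattern V.* from the rules."""
    return re.fullmatch(r"V.*", token["tag"]) is not None


def is_collocate(token):
    """True when the token may fill the X slot."""
    return token["tag"] not in EXCLUDED_TAGS


def verb_x_pairs(tokens):
    """Verb immediately followed by X: returns (verb lemma, X lemma) pairs."""
    pairs = []
    for left, right in zip(tokens, tokens[1:]):
        if is_verb(left) and is_collocate(right):
            pairs.append((left["lemma"], right["lemma"]))
    return pairs


def x_verb_pairs(tokens):
    """X, then an optional "не", then a verb: (verb lemma, X lemma) pairs."""
    pairs = []
    for i, token in enumerate(tokens):
        if not is_collocate(token):
            continue
        j = i + 1
        # Variant without the optional negation particle.
        if j < len(tokens) and is_verb(tokens[j]):
            pairs.append((tokens[j]["lemma"], token["lemma"]))
        # Variant with "не" between X and the verb.
        if (j + 1 < len(tokens) and tokens[j]["lemma"] == "не"
                and is_verb(tokens[j + 1])):
            pairs.append((tokens[j + 1]["lemma"], token["lemma"]))
    return pairs


# Hypothetical tagged sentence "он не пьёт чай ." (tags are illustrative).
SAMPLE = [
    {"word": "он", "lemma": "он", "tag": "P"},
    {"word": "не", "lemma": "не", "tag": "Q"},
    {"word": "пьёт", "lemma": "пить", "tag": "Vmip3s"},
    {"word": "чай", "lemma": "чай", "tag": "Ncmsan"},
    {"word": ".", "lemma": ".", "tag": "SENT"},
]

print(verb_x_pairs(SAMPLE))
print(x_verb_pairs(SAMPLE))
```

In this sample the particle "не" is itself reported as an X collocate of the verb, because the rule as written excludes only sentence-final punctuation, commas and dashes from the X slot.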
5
Text Corpora
Russian Web Corpus – 190 million tokens
RBC (РосБизнесКонсалтинг, RosBusinessConsulting) – 22.5 million tokens
ROMIP (Российский семинар по Оценке Методов Информационного Поиска, the Russian Information Retrieval Evaluation Seminar) – 2.7 million tokens
Corpus Linguistics – 2.7 million tokens
6
Word sketches for the word “čaj” (Russian Web Corpus)
7
Word sketches for the word “čaj” (news)
8
Word sketches for the word “zelenyj” (Russian Web Corpus)
9
Word sketches for the word “imet’” (Russian Web Corpus)
10
Word sketches for the word “korpus” (texts on corpus linguistics)
11
Word sketches for the word “korpus” (news)
12
Word sketches for the word “korpus” (Russian Web Corpus)
13
Word sketches for the word “polučit’” (texts on corpus linguistics)
14
Word sketches for the word “polučit’” (news)
15
Word sketches for the word “polučit’” (Russian Web Corpus)
16
Word sketches for the word “dat’” (Russian Web Corpus)
17
Word sketches for the word “dat’” (texts on corpus linguistics)
18
Word sketches for the word “dat’” (news)
19
Thank you!