Applying Word Sketches to Russian
Máša Khokhlova
St. Petersburg State University
Word Sketches for Russian
Grammatical rules that capture syntactic constructions of Russian, written against a morphologically tagged corpus;
Regular expressions and the query language of the IMS Corpus Workbench;
The system searches for tags that correspond to word forms. For example, the tag Ncfpnn denotes a common noun (Nc), feminine gender (f), plural (p), nominative case (n).
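The positional reading of a tag such as Ncfpnn can be sketched in Python. This is a minimal illustration, not part of the system itself: the function names and the value tables are assumptions, and the tables cover only the letters that appear on these slides (plus "p" for proper nouns, which is also assumed). The helper tag_matches imitates how a rule pattern such as "N...n." is compared against the whole tag.

```python
import re

# Assumed value tables; only the letters used on the slides are listed.
NOUN_TYPE = {"c": "common", "p": "proper"}
GENDER = {"m": "masculine", "f": "feminine", "n": "neuter"}
NUMBER = {"s": "singular", "p": "plural"}
CASE = {
    "n": "nominative", "g": "genitive", "d": "dative",
    "a": "accusative", "i": "instrumental", "l": "locative",
}


def decode_noun_tag(tag):
    """Read type, gender, number and case from positions 2-5 of a noun tag."""
    if not re.fullmatch(r"N.{4,}", tag):
        raise ValueError(f"not a noun tag: {tag!r}")
    return {
        "type": NOUN_TYPE.get(tag[1], "unknown"),
        "gender": GENDER.get(tag[2], "unknown"),
        "number": NUMBER.get(tag[3], "unknown"),
        "case": CASE.get(tag[4], "unknown"),
    }


def tag_matches(pattern, tag):
    """True if the regular expression covers the entire tag."""
    return re.fullmatch(pattern, tag) is not None


print(decode_noun_tag("Ncfpnn"))
print(tag_matches("N...n.", "Ncfpnn"))
```

Because the case always sits in the same position, a pattern with dots for the other positions selects all nouns in one case regardless of gender or number.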
Word Sketch Rules
An example of the grammatical rules for «adjective + noun» phrases, one rule per case:
*DUAL
=a_modifier/modifies
2:"A....n." (([word=","]|[word="и"]|[word="или"]){0,1} [tag="A....n."]){0,7} 1:"N...n."
2:"A....g." (([word=","]|[word="и"]|[word="или"]){0,1} [tag="A....g."]){0,3} 1:"N...g."
2:"A....d." (([word=","]|[word="и"]|[word="или"]){0,1} [tag="A....d."]){0,3} 1:"N...d."
2:"A....a." (([word=","]|[word="и"]|[word="или"]){0,1} [tag="A....a."]){0,3} 1:"N...a."
2:"A....i." (([word=","]|[word="и"]|[word="или"]){0,1} [tag="A....i."]){0,3} 1:"N...i."
2:"A....l." (([word=","]|[word="и"]|[word="или"]){0,1} [tag="A....l."]){0,3} 1:"N...l."
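To make the shape of one such rule concrete, the following Python sketch imitates it on a list of (word, tag) tokens: an adjective in a given case, then a limited number of further adjectives in the same case, each optionally preceded by a comma, "и" or "или", then a noun in that case. It is an approximation for illustration only, not how the query engine evaluates rules. The function name, the max_groups parameter and the example tags are assumptions.

```python
import re

CONJUNCTIONS = {",", "и", "или"}


def adj_noun_pairs(tokens, case, max_groups=3):
    """Return (noun, adjective) pairs found by an adjective+noun rule.

    tokens: list of (word, tag) tuples; case: one-letter case code.
    """
    adj = re.compile(f"A....{case}.")
    noun = re.compile(f"N...{case}.")
    pairs = []
    for i, (word, tag) in enumerate(tokens):
        if not adj.fullmatch(tag):
            continue  # the labelled adjective (2:) must start the match
        j, groups = i + 1, 0
        while j < len(tokens):
            if noun.fullmatch(tokens[j][1]):
                pairs.append((tokens[j][0], word))  # labelled noun (1:)
                break
            if groups == max_groups:
                break  # repetition limit reached without finding a noun
            if tokens[j][0] in CONJUNCTIONS:
                j += 1  # optional comma or conjunction
            if j < len(tokens) and adj.fullmatch(tokens[j][1]):
                j += 1
                groups += 1
            else:
                break
    return pairs


# Illustrative tags only; real corpus tags may differ.
tokens = [
    ("зелёный", "Afpmsnf"),
    ("и", "C"),
    ("чёрный", "Afpmsnf"),
    ("чай", "Ncmsnn"),
]
print(adj_noun_pairs(tokens, "n"))
```

Each adjective in a coordinated chain is paired with the noun separately, which is why the rule allows intervening adjectives between the labelled adjective and the noun.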
Word Sketch Rules (2)
=Verb X/X Verb
2:[tag="V.*"] 1:[tag!="SENT"&tag!=","&tag!="-"]
1:[tag!="SENT"&tag!=","&tag!="-"] [lemma="не"]? 2:[tag="V.*"]
=Noun X
2:[tag="N.*"&lemma!=")."] 1:[tag!="SENT"&tag!=","&tag!="-"&lemma!=")."]
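The two "Verb X/X Verb" patterns can be approximated in Python as follows. This is a rough sketch under stated assumptions: tokens are (lemma, tag) tuples, the function name and example tags are hypothetical, and each result is an (X, verb) pair. The first pattern takes the token right after a verb; the second takes a token right before a verb, optionally with the particle "не" in between. Sentence ends, commas and dashes are excluded as X.

```python
import re

EXCLUDED_TAGS = {"SENT", ",", "-"}


def is_verb(tag):
    return re.fullmatch(r"V.*", tag) is not None


def verb_x_pairs(tokens):
    """Return (x, verb) lemma pairs for the Verb X and X Verb patterns."""
    pairs = []
    n = len(tokens)
    for i, (lemma, tag) in enumerate(tokens):
        # Verb X: a verb immediately followed by a non-punctuation token.
        if is_verb(tag) and i + 1 < n and tokens[i + 1][1] not in EXCLUDED_TAGS:
            pairs.append((tokens[i + 1][0], lemma))
        # X Verb: a non-punctuation token, an optional "не", then a verb.
        if tag not in EXCLUDED_TAGS:
            j = i + 1
            if j + 1 < n and tokens[j][0] == "не" and is_verb(tokens[j + 1][1]):
                pairs.append((lemma, tokens[j + 1][0]))
            if j < n and is_verb(tokens[j][1]):
                pairs.append((lemma, tokens[j][0]))
    return pairs


# Illustrative lemmas and tags for "он не пил чай ."
tokens = [
    ("он", "P"),
    ("не", "Q"),
    ("пить", "Vmis"),
    ("чай", "Ncmsan"),
    (".", "SENT"),
]
print(verb_x_pairs(tokens))
```

Note that the particle "не" itself also qualifies as X when it directly precedes a verb, since the pattern only excludes punctuation tags.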
Text Corpora
Russian Web Corpus: 190 million tokens
RBC (РосБизнесКонсалтинг, RosBusinessConsulting): 22.5 million tokens
ROMIP (Российский семинар по Оценке Методов Информационного Поиска, Russian Information Retrieval Evaluation Seminar): 2.7 million tokens
Corpus Linguistics: 2.7 million tokens
Word sketches for the word “čaj” (Russian Web Corpus)
Word sketches for the word “čaj” (news)
Word sketches for the word “zelenyj” (Russian Web Corpus)
Word sketches for the word “imet’” (Russian Web Corpus)
Word sketches for the word “korpus” (texts on corpus linguistics)
Word sketches for the word “korpus” (news)
Word sketches for the word “korpus” (Russian Web Corpus)
Word sketches for the word “polučit’” (texts on corpus linguistics)
Word sketches for the word “polučit’” (news)
Word sketches for the word “polučit’” (Russian Web Corpus)
Word sketches for the word “dat’” (Russian Web Corpus)
Word sketches for the word “dat’” (texts on corpus linguistics)
Word sketches for the word “dat’” (news)
Thank you!