

1 Sketch Engine for Chinese: discussion notes

2 Wordsketch, subsequently Sketch Engine
Was developed by Kilgarriff et al. at Brighton
Gives automatic, corpus-based summaries of a word's grammatical and collocational behaviour
Captures information in a more accessible way than hundreds of KWIC lines
Uses an MI-based salience algorithm
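
The MI-based salience idea can be sketched as follows. This is a minimal Python sketch, not Sketch Engine's own code: it assumes corpus counts of (word, relation, collocate) triples, uses a Lin-style mutual information over those triples, and multiplies it by log frequency as an assumed weighting. The function name mi_salience and the example counts are hypothetical.

```python
import math
from collections import Counter


def mi_salience(triples):
    """Score each (word, relation, collocate) triple.

    MI follows a Lin-style formula over relation triples:
        log2(f(w,r,c) * f(*,r,*) / (f(w,r,*) * f(*,r,c)))
    and is then multiplied by log2(f(w,r,c) + 1), an assumed weighting.
    """
    counts = Counter(triples)   # f(w, r, c)
    word_rel = Counter()        # f(w, r, *)
    rel_coll = Counter()        # f(*, r, c)
    rel_total = Counter()       # f(*, r, *)
    for (w, r, c), n in counts.items():
        word_rel[(w, r)] += n
        rel_coll[(r, c)] += n
        rel_total[r] += n

    scores = {}
    for (w, r, c), n in counts.items():
        mi = math.log2(n * rel_total[r] / (word_rel[(w, r)] * rel_coll[(r, c)]))
        scores[(w, r, c)] = mi * math.log2(n + 1)
    return scores


triples = ([("eat", "obj", "apple")] * 3
           + [("eat", "obj", "thing")]
           + [("see", "obj", "thing")] * 4)
scores = mi_salience(triples)
print(scores[("eat", "obj", "apple")])  # 2.0
```

Here apple is twice as likely after eat as chance predicts (MI = 1), while the single (eat, obj, thing) scores below zero because thing is mostly the object of see.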

3 Other corpus query tools compute collocational salience too, but…
Sketch Engine uses lemmata, not word-forms
–So that "eat" and "eats" are treated the same
And it takes account of grammatical relations
–So that "The plane banks" and "The investment banks" are treated separately
–And (if the corpus is appropriately parsed) "He robs banks" and "He robbed the bank" would be accorded similar treatment
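
The lemma and grammatical-relation points can be illustrated with a small Python sketch. It assumes the corpus has already been parsed into (head, head POS, relation, dependent, dependent POS) tuples; the lemma table, the coarse POS labels "V"/"N" and the relation names are hypothetical stand-ins for a real lemmatizer, tagger and parser.

```python
from collections import Counter

# Hypothetical lemma table keyed by (word-form, coarse POS); a real system
# would use a lemmatizer. Keying on POS keeps verbal and nominal "banks" apart.
LEMMAS = {
    ("eat", "V"): "eat", ("eats", "V"): "eat",
    ("robs", "V"): "rob", ("robbed", "V"): "rob",
    ("bank", "N"): "bank", ("banks", "N"): "bank",
    ("banks", "V"): "bank",
}


def lemma(form, pos):
    # Fall back to the lower-cased form for words missing from the table.
    return LEMMAS.get((form.lower(), pos), form.lower())


def count_relations(parsed):
    """Count (head lemma, head POS, relation, dependent lemma, dependent POS).

    `parsed` holds (head, head_pos, relation, dependent, dependent_pos)
    tuples, i.e. the assumed output of a dependency parser.
    """
    counts = Counter()
    for head, hpos, rel, dep, dpos in parsed:
        counts[(lemma(head, hpos), hpos, rel, lemma(dep, dpos), dpos)] += 1
    return counts


parsed = [
    ("banks", "V", "subject", "plane", "N"),        # The plane banks
    ("banks", "N", "modifier", "investment", "N"),  # The investment banks
    ("robs", "V", "object", "banks", "N"),          # He robs banks
    ("robbed", "V", "object", "bank", "N"),         # He robbed the bank
]
counts = count_relations(parsed)
print(counts[("rob", "V", "object", "bank", "N")])  # 2
```

Both robbery sentences fall into one (rob, object, bank) cell, while verbal and nominal "banks" stay in separate cells because POS is part of the key.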

4 Grammatical relations example
Unary relations: Word2 and Prep are not specified
Binary relations: Prep not specified
Binary relations: Word2 not specified
Trinary relations
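
The four relation types on this slide can be represented as one record whose optional slots are left unset when they are "not specified". This is a hypothetical Python data structure, not Sketch Engine's internal format, and the relation names in the examples are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GramRel:
    """A grammatical-relation instance for headword `word1`.

    Optional slots left as None count as 'not specified'.
    """
    name: str
    word1: str
    word2: Optional[str] = None
    prep: Optional[str] = None

    def kind(self) -> str:
        if self.word2 is None and self.prep is None:
            return "unary"      # Word2 and Prep not specified
        if self.word2 is None or self.prep is None:
            return "binary"     # exactly one of Word2 / Prep specified
        return "trinary"        # both specified


# Hypothetical relation names, for illustration only.
examples = [
    GramRel("passive", "eat"),
    GramRel("object", "eat", word2="apple"),
    GramRel("pp_verb", "go", prep="to"),
    GramRel("pp_obj", "go", word2="school", prep="to"),
]
print([r.kind() for r in examples])
# ['unary', 'binary', 'binary', 'trinary']
```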

5 Sketch Engine modules
Concordance
–KWIC or sentence context
Thesaurus
–A list of "similar" words
Sketch differences, for distinguishing near-synonyms
–If lemmata x and y both have strong collocational salience with the same collocate a, they are candidate near-synonyms
Wordsketch
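
The thesaurus and sketch-difference ideas can be sketched over word sketches held as {collocate: salience} dictionaries. The similarity below is a weighted Jaccard over shared collocates, chosen for simplicity; it is an assumption, not Sketch Engine's actual thesaurus measure. The function names, the threshold parameter and the salience values are hypothetical.

```python
def sketch_similarity(sk_x, sk_y):
    """Weighted Jaccard over two {collocate: salience} word sketches:
    shared salience divided by total salience. Assumes non-negative scores."""
    shared = sk_x.keys() & sk_y.keys()
    overlap = sum(min(sk_x[c], sk_y[c]) for c in shared)
    total = sum(sk_x.values()) + sum(sk_y.values()) - overlap
    return overlap / total if total else 0.0


def sketch_diff(sk_x, sk_y, threshold=1.0):
    """Collocates clearly more salient with x than with y, and vice versa."""
    collocates = sk_x.keys() | sk_y.keys()
    x_side = sorted(c for c in collocates
                    if sk_x.get(c, 0.0) - sk_y.get(c, 0.0) > threshold)
    y_side = sorted(c for c in collocates
                    if sk_y.get(c, 0.0) - sk_x.get(c, 0.0) > threshold)
    return x_side, y_side


# Invented salience values for two near-synonyms.
strong = {"tea": 8.0, "support": 6.0, "argument": 5.0}
powerful = {"computer": 7.0, "support": 4.0, "argument": 5.0}
print(sketch_similarity(strong, powerful))  # 9/26, about 0.346
print(sketch_diff(strong, powerful))        # (['support', 'tea'], ['computer'])
```

The shared collocates (support, argument) make the pair look similar, while the difference lists show where their behaviour diverges.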

6

7 Sample of a grammatical relation definitions script (m4 macro language)
define(`wh_word',`[tag="AVQ"|tag="DTQ"|tag="PNQ"]')
define(`whether_if',`[tag="PNQ" & word="if" |word="whether"]')
define(`determiner',`[tag="AT."|tag="DT."|tag=poss_pro]')
define(`conjunction',`"CJC"')
define(`simple_neg',`"XX."')
define(`rel_start',`[tag="DTQ"|tag="PNQ"|tag=that_comp]')
define(`adv_neg',`[tag=any_adv|tag=simple_neg]')
define(`number',`"[OC]RD"')
define(`goal_adv',`[word="back"|word="over"|word="home"|word="away"|word="out"]')
define(`long_np',`[tag="AT."|tag="DT."|tag=poss_pro|tag=number|tag=any_adv|tag=any_adj|tag=genitive]{0,3} any_noun{0,2} 2:any_noun [tag!=any_noun & tag != genitive]')
define(`np_start',`[tag="AT."|tag="DT."|tag=poss_pro|tag=number|tag=any_adj|tag=any_noun]')
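
After m4 expands the macros, long_np becomes a token-level pattern: up to three pre-modifiers, up to two nouns, a head noun labelled 2:, then a token that is neither a noun nor a genitive. A rough Python approximation can run a regular expression over a string of tags. This is a sketch, not Sketch Engine's query engine: it assumes fixed-width, three-character C5-style tags, the tag classes PREMOD and NOUN are hypothetical simplifications of macros whose definitions are not shown on the slide, and unlike the original pattern the final token is checked with a lookahead rather than included in the match.

```python
import re

# Hypothetical tag classes, assuming fixed-width 3-character C5-style tags.
# They stand in for any_adj, any_adv, poss_pro, number, genitive and
# any_noun, whose real definitions are not shown on the slide.
PREMOD = r"(?:AT.|DT.|DPS|AJ.|AV.|[OC]RD|POS) "
NOUN = r"N.. "
LONG_NP = re.compile(
    rf"(?:{PREMOD}){{0,3}}"      # up to three pre-modifiers
    rf"(?:{NOUN}){{0,2}}"        # up to two nouns
    rf"(?P<head>{NOUN})"         # the head noun (2: in the macro)
    r"(?=(?!N..|POS)\S{3} )"     # next tag: neither noun nor genitive
)


def find_long_nps(tokens):
    """Return (head_word, start, end) for each match over (word, tag) tokens.

    `end` is exclusive; the following non-noun token is checked but not
    included.
    """
    if any(len(tag) != 3 for _, tag in tokens):
        raise ValueError("this sketch assumes 3-character tags")
    # Each token becomes "TAG ", exactly 4 characters, so character
    # offsets divide by 4 into token indices.
    tag_string = "".join(f"{tag} " for _, tag in tokens)
    results = []
    for m in LONG_NP.finditer(tag_string):
        head = m.start("head") // 4
        results.append((tokens[head][0], m.start() // 4, m.end() // 4))
    return results


tokens = [("the", "AT0"), ("old", "AJ0"), ("bank", "NN1"),
          ("manager", "NN1"), ("smiled", "VVD")]
print(find_long_nps(tokens))  # [('manager', 0, 4)]
```

Backtracking does the work the {0,2} quantifier implies: "bank" is absorbed as a noun modifier so that "manager" can serve as the head before the verb.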

8 Applications
Intended as an aid to lexicographers
At least one paper on an MT application
Could be used in pedagogical applications
–An earlier NSF grant aimed at a complete Chinese learning platform, with Wordsketch as a module
–Comparison of similar lexemes cross-linguistically
Yiching is publishing on English express vs. Chinese biaoshi, and this work may use Wordsketch

9 Chinese Wordsketch
Kilgarriff et al. report that Wordsketch can be ported to any language
–Pavel Rychlý in the Czech Republic has implemented concordancing at the Chinese character level only
AS has acquired Chinese Gigaword and POS-tagged it automatically
–No parsing has been attempted so far
A grammatical relations ruleset for Chinese is needed
I would plan to
–contribute to the writing of this ruleset
–collaborate on cross-linguistic lexical analyses, using Wordsketch where possible
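
Character-level concordancing, as in the implementation mentioned on this slide, needs no word segmentation: a KWIC line is just a fixed number of characters either side of each hit. A minimal Python sketch follows; kwic_chars and its width parameter are hypothetical, and the example sentence is invented.

```python
def kwic_chars(text, node, width=5):
    """Character-level KWIC over unsegmented text.

    Returns one (left, node, right) tuple per occurrence of `node`, with up
    to `width` characters of context on each side.
    """
    if not node:
        raise ValueError("node must be non-empty")
    lines = []
    start = text.find(node)
    while start != -1:
        end = start + len(node)
        lines.append((text[max(0, start - width):start], node,
                      text[end:end + width]))
        start = text.find(node, start + 1)   # allows overlapping hits
    return lines


for left, node, right in kwic_chars("我们表示感谢。他表示同意。", "表示", width=2):
    print(f"{left} [{node}] {right}")
# 我们 [表示] 感谢
# 。他 [表示] 同意
```

Because the context window counts characters rather than words, it cannot tell grammatical relations apart, which is why a parsed corpus and a Chinese relations ruleset are still needed.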

10 Links
http://nlp.fi.muni.cz/projects/bonito2/chinese/
–test chin
http://www.sketchengine.co.uk/sampler/
–ssmith ssmith

