PowerConc: An R-gram Based Corpus Analysis Tool Jiajin Xu & Yunlong Jia Beijing Foreign Studies University.

1 PowerConc: An R-gram Based Corpus Analysis Tool Jiajin Xu & Yunlong Jia Beijing Foreign Studies University

2 PowerConc
National Research Centre for Foreign Language Education, Beijing Foreign Studies University
A general-purpose tool for corpus analysis
Developed in Delphi
Can deal with any ANSI-encoded texts
–E.g. on a Simplified Chinese OS
–Works well with Simplified/Traditional Chinese texts, (un)tokenised or raw/POS-tagged, as well as raw/POS-tagged English texts

3 PowerConc
Size: 1.5MB, compressed package less than 1MB
Installation: Doesn’t require any installation.
OS: Works only on Windows now.

4 Design principles for PowerConc

5 Ideally
Most powerful: can do anything a concordancer can do, and things it cannot
Involves the least effort in learning to use it
Doing MORE with less
Reductionism in software design

6 Fewer buttons and/or tabs
Frequency count Search List

9 Freq. Count
N-gram list
Key n-gram list
Concordance
Collocation & Colligation

10 More possibilities in tool development
Corpus-informed/related ‘grammars’
–Pattern grammar (local grammar)
–Collostruction
–Lexical grammar (natural grammar, real grammar)
–Lexical priming (textual colligation)
–Longman grammar: Biber et al. grammar register variation
Tool development lags behind

11 From phraseology to R-gram
Many of these ‘grammars’ can be seen as some sort of phraseology
We coined a technical term, ‘R-gram’
–An operational parallel to phraseology
–The unit of language can be a word, lemma, phrase, POS tag, POS sequence, or any combination of these
–Can be linguistic structures with uncertain words or categories (e.g. be passive/get passive)

12 a * of: collocational framework
It be ADJ that: evaluative construction
Noun noun compounds
Bi-nominal constructions
Passive constructions: be/get ADV. V-EN
All these could be matched with Regular Expressions.
But Regex is too difficult for lay users.
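To illustrate the point that such patterns can be matched with regular expressions, here is a minimal Python sketch for the collocational framework "a * of" on raw, untagged English text. The function name `framework_fillers` and the sample sentence are made up for illustration; this is not PowerConc's own code (the tool is written in Delphi).

```python
import re
from collections import Counter

# Collocational framework "a * of": the word "a", any single word, then "of".
A_X_OF = re.compile(r"\ba\s+(\w+)\s+of\b", re.IGNORECASE)

def framework_fillers(text):
    """Count the words that fill the open slot in 'a * of'."""
    return Counter(m.group(1).lower() for m in A_X_OF.finditer(text))

# Made-up sample sentence, for illustration only.
sample = "A lot of people bought a number of books and a couple of pens."
fillers = framework_fillers(sample)
print(fillers.most_common())
```

Even this simple case needs word boundaries, whitespace classes and a capture group, which gives a sense of why raw regex is a hurdle for lay users.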

13 Easy search with enhanced hits
Smart Input
Three meta-characters in Smart Input syntax, the simplest grammar ever.
@be returns all inflectional forms of ‘be’
#n returns all nouns
* refers to any single word

14 a * of => a * of
It be ADJ that => It @be #adj that
Noun noun compound => #n #n
Bi-nominal => #n and #n
Passive => \S+_VB\S+\s(\S+_[RXPJDN]\S+\s)*\S+_V\S*N
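One way to picture how Smart Input queries like those above could be turned into regular expressions is the following Python sketch. Everything in it is an assumption made for illustration: the `word_TAG` token format, the `LEMMA_FORMS` table, the `TAG_PREFIXES` mapping, and the function name `smart_to_regex`. The slides do not describe PowerConc's actual translation logic.

```python
import re

# Hypothetical lemma table; a real tool would ship a full lemma list.
LEMMA_FORMS = {"be": ["be", "am", "is", "are", "was", "were", "been", "being"]}

# Hypothetical mapping from Smart Input category names to POS-tag prefixes.
TAG_PREFIXES = {"n": "N", "adj": "J", "v": "V", "adv": "R"}

def smart_to_regex(query):
    """Translate a Smart Input query into a regex over word_TAG tokens."""
    parts = []
    for item in query.split():
        if item == "*":
            # "*" stands for any single token.
            parts.append(r"\S+")
        elif item.startswith("@"):
            # "@lemma" expands to every listed inflectional form.
            forms = LEMMA_FORMS[item[1:].lower()]
            alt = "|".join(re.escape(f) for f in forms)
            parts.append(rf"(?:{alt})_\S+")
        elif item.startswith("#"):
            # "#category" matches any word whose tag starts with the prefix.
            prefix = TAG_PREFIXES[item[1:].lower()]
            parts.append(rf"\S+_{prefix}\S*")
        else:
            # A literal word with whatever tag it carries.
            parts.append(re.escape(item) + r"_\S+")
    # Anchor the pattern on whole tokens at both ends.
    body = r"\s+".join(parts)
    return re.compile(r"(?<!\S)" + body + r"(?!\S)", re.IGNORECASE)

# Made-up tagged sentence with illustrative tags.
tagged = "It_PPH1 is_VBZ clear_JJ that_CST he_PPHS1 left_VVD"
match = smart_to_regex("It @be #adj that").search(tagged)
print(match.group(0) if match else None)
```

The user types four short items, and the generated pattern carries the tag and whitespace details that would otherwise have to be written by hand.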

15 Limitation: speed
A concordancer that does not apply indexing cannot process texts larger than a few million words anyway.
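For readers unfamiliar with indexing, the Python sketch below shows the general idea of a positional index: word positions are recorded once, so a lookup does not have to rescan the whole text. It illustrates the technique that, according to the slide, this kind of concordancer does not apply. The names `build_index` and `concordance` and the toy sentence are assumptions for illustration.

```python
from collections import defaultdict

def build_index(tokens):
    """Map each lower-cased word to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok.lower()].append(pos)
    return index

def concordance(tokens, index, word, span=3):
    """Return (left context, node word, right context) for each hit."""
    lines = []
    for pos in index.get(word.lower(), []):
        left = " ".join(tokens[max(0, pos - span):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + span])
        lines.append((left, tokens[pos], right))
    return lines

# Toy text, for illustration only.
tokens = "the cat sat on the mat and the dog sat too".split()
index = build_index(tokens)
kwic = concordance(tokens, index, "sat", span=2)
for left, node, right in kwic:
    print(f"{left:>10} [{node}] {right}")
```

The trade-off is that the index must be built and stored before the first query, whereas an unindexed tool can open any text file directly but has to scan it on every search.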

16 Download PowerConc
www.fleric.org.cn/powerconc/
http://www.bfsu-corpus.org/channels/tools

17 Thank you!

