Published by Adam Benson. Modified over 9 years ago.
PowerConc: An R-gram Based Corpus Analysis Tool
Jiajin Xu & Yunlong Jia
Beijing Foreign Studies University
PowerConc
National Research Centre for Foreign Language Education, Beijing Foreign Studies University
A general-purpose tool for corpus analysis
Developed in Delphi
Can deal with any ANSI-encoded texts
–E.g. on a Simplified Chinese OS
–Works well with Simplified/Traditional Chinese texts, (un)tokenised or raw/POS-tagged, as well as raw/POS-tagged English texts
PowerConc
Size: 1.5 MB; the compressed package is less than 1 MB.
Installation: does not require any installation.
OS: currently works only on Windows.
Design principles for PowerConc
Ideally
Most powerful: can do whatever a conventional concordancer can do, and things it cannot.
Involves the least effort in learning to use it
Doing MORE with less
Reductionism in software design
Fewer buttons and/or tabs
Frequency count Search List
Freq. Count
N-gram list
Key n-gram list
Concordance
Collocation & Colligation
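As a rough illustration of what an n-gram list involves, here is a minimal Python sketch. It is not PowerConc's code (the tool is written in Delphi); the function name `ngram_counts` and the sample sentence are made up for this example, and tokenisation is assumed to be a plain whitespace split.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count contiguous n-grams in a list of tokens (hypothetical helper)."""
    # zip over n shifted copies of the token list yields every n-token window
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)

# Toy corpus, whitespace-tokenised (an assumption made for the sketch)
tokens = "a lot of people and a lot of time".split()
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)
print(bigrams.most_common(2))
print(trigrams.most_common(1))
```

A key n-gram list would additionally compare such counts against a reference corpus; that step is not shown here.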
More possibilities in tool development
Corpus-informed/related ‘grammars’
–Pattern grammar (local grammar)
–Collostruction
–Lexical grammar (natural grammar, real grammar)
–Lexical priming (textual colligation)
–Longman grammar: Biber et al. grammar register variation
Tool development lags behind
From phraseology to R-gram
Many of the ‘grammars’ can be seen as some sort of phraseology
We coined a technical term, ‘R-gram’.
–An operational parallel to phraseology
–The unit of language can be words, lemmata, phrases, POS tags, POS sequences, or a combination of all these.
–Can be linguistic structures with uncertain words or categories (e.g. be passive/get passive).
a * of: collocational framework
It be ADJ that: evaluative construction
Noun noun compounds
Bi-nominal constructions
Passive constructions: be/get ADV. V-EN
All these could be matched with Regular Expressions.
But Regex is too difficult for lay users.
Easy search with enhanced hits
Smart Input
Three meta-characters in Smart Input syntax, the simplest grammar ever.
@be returns all inflectional forms of ‘be’
#n returns all nouns
* refers to any single word
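One way to picture how the three meta-characters could be expanded into a regular expression is the Python sketch below. It is an illustration under stated assumptions, not PowerConc's actual implementation: the text is assumed to be in `word_TAG` format with CLAWS-style tags (nouns starting with N, adjectives with J), and the names `LEMMA_FORMS`, `TAG_PREFIX` and `smart_to_regex` are hypothetical.

```python
import re

# Hypothetical lemma table; a real tool would need a full lemma list.
LEMMA_FORMS = {
    "be": ["be", "am", "is", "are", "was", "were", "been", "being"],
}

# Hypothetical mapping from category names to tag prefixes,
# assuming CLAWS-style tags (nouns start with N, adjectives with J).
TAG_PREFIX = {"n": "N", "adj": "J"}

def smart_to_regex(query):
    """Translate a Smart Input style query into a regex over word_TAG text."""
    parts = []
    for item in query.split():
        if item == "*":
            # any single token, whatever its tag
            parts.append(r"\S+")
        elif item.startswith("@"):
            # all inflectional forms of a lemma
            forms = "|".join(map(re.escape, LEMMA_FORMS[item[1:].lower()]))
            parts.append(rf"(?:{forms})_\S+")
        elif item.startswith("#"):
            # any word whose tag starts with the category prefix
            parts.append(rf"\S+_{TAG_PREFIX[item[1:].lower()]}\S*")
        else:
            # a literal word with any tag
            parts.append(rf"{re.escape(item)}_\S+")
    # the lookarounds keep the match aligned to whole tokens
    pattern = r"(?<!\S)" + r"\s".join(parts) + r"(?!\S)"
    return re.compile(pattern, re.IGNORECASE)

text = ("It_PPH1 is_VBZ clear_JJ that_CST a_AT1 lot_NN1 "
        "of_IO people_NN agree_VV0")
evaluative = smart_to_regex("It @be #adj that").search(text)
framework = smart_to_regex("a * of").search(text)
print(evaluative.group(0))
print(framework.group(0))
```

The point of the design is that the user types only words and three symbols, while the tag-aware regular expression is generated behind the scenes.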
a * of => a * of
It be ADJ that => It @be #adj that
Noun noun compound => #n #n
Bi-nominal => #n and #n
Passive => \S+_VB\S+\s(\S+_[RXPJDN]\S+\s)*\S+_V\S*N
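The passive pattern above can be tried out directly in Python. The sketch below uses the regular expression exactly as given on the slide; the sample sentences and their CLAWS-style tags are assumptions made for illustration. The pattern looks for a token tagged VB plus at least one more character, then any number of tokens whose tags begin with R, X, P, J, D or N, then a token whose tag begins with V and contains a later N.

```python
import re

# The passive pattern as given on the slide, applied to word_TAG text
PASSIVE = re.compile(r"\S+_VB\S+\s(\S+_[RXPJDN]\S+\s)*\S+_V\S*N")

# Sample sentences with assumed CLAWS-style tags
tagged = "The_AT tool_NN1 was_VBDZ quickly_RR developed_VVN in_II Delphi_NP1"
match = PASSIVE.search(tagged)
print(match.group(0) if match else None)

# A copular sentence with no participle should not match
no_passive = PASSIVE.search("She_PPHS1 is_VBZ happy_JJ")
print(no_passive)
```

Because the first token must carry a VB tag, this expression as written covers be-passives only.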
Limitation: speed
A concordancer that does not use indexing cannot practically process texts larger than a few million words.
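To show what indexing buys, here is a minimal Python sketch of an inverted index with a keyword-in-context lookup. It illustrates the general technique only and does not describe PowerConc, which the slide presents as working without an index; `build_index` and `kwic` are hypothetical names, and tokenisation is a plain whitespace split.

```python
from collections import defaultdict

def build_index(tokens):
    """Map each lower-cased word to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok.lower()].append(pos)
    return index

def kwic(tokens, index, word, span=2):
    """Return (left context, node word, right context) for each hit."""
    lines = []
    for pos in index.get(word.lower(), []):
        left = " ".join(tokens[max(0, pos - span):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + span])
        lines.append((left, tokens[pos], right))
    return lines

tokens = "the tool reads the corpus and the tool counts words".split()
index = build_index(tokens)
hits = kwic(tokens, index, "tool")
for left, node, right in hits:
    print(f"{left:>10}  {node}  {right}")
```

With an index, the lookup goes straight to the stored positions instead of rescanning the whole text for every query; the cost is the time and memory needed to build and store the index.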
Download PowerConc
www.fleric.org.cn/powerconc/
http://www.bfsu-corpus.org/channels/tools
Thank you!