1
The learner as corpus designer
Guy Aston
SSLMIT, University of Bologna
guy@sslmit.unibo.it
2
… or the art of fruit salads
3
Learner uses of corpora
– Form-focussed (data-driven learning)
– Meaning-focussed (learning the culture)
– Skill-focussed (reading practice)
– Browsing environment (serendipity)
– Reference tool for other tasks (reading/writing aid)
4
Why make your own corpus?
– You can devise your own recipe
– You know what’s in it
– You learn how to do it
– Can be fun
– Can provide practice in language use
5
The raw ingredients
6
Devising your own recipe
– Only the text-type(s) you want
– Only the texts you want
– The quantity you want
… small and specialised is beautiful
7
You know what’s in it
– Top-down knowledge of corpus
– Top-down knowledge of texts
8
You learn how to do it
– Can be a useful skill for many language workers
  – technical writers
  – translators
  – teachers
– Can make you a more critical corpus user
9
It can be fun
– Provides a challenge
– Gives sense of achievement/satisfaction
Practice in language use
– Design/construction/evaluation of corpora can be communicative activities
10
Why use standard corpora?
– Less effort
– More reliable
– Better packaging
– You don’t want to learn to make your own
11
Less effort
12
More reliable
– if it’s well designed
– if it fits your needs
13
Better packaging
– Metatextual information
– Annotation
– Corpus-specific software
14
You don’t want to learn to make your own?
15
A compromise strategy: make your own subcorpus
– assemble using the pre-prepared ingredients of a larger corpus
– or in other words … go to a (fruit) salad bar
16
(Pick ’n’ mix with the BNC)
17
You have a choice of
– text-types
– individual texts
– selection by pre-determined criteria
– selection by hand
– … or both
18
You know what went in, so top-down processing is easier
Little effort in comparison with making your own
19
Good packaging
– Metatextual information
– Linguistic annotation
– Can use software designed for full corpus
– Indexed
20
You get to learn
– what are(n’t) useful subcorpora
– what are(n’t) useful design criteria
– how to do it
21
It can be fun
– challenge / achievement / satisfaction
– You can talk about its design / construction / evaluation
22
Talking about fruit salad BNC Sampler: KC2
24
And now to details … the Sampler awaits!
25
You can create subcorpora of
– specific corpus texts
– texts containing solutions to a query
– encoded categories of texts
– your own categories of texts
and compare them with
– other subcorpora
– the full corpus
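The selection routes listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the miniature `CORPUS` dictionary, its text ids and category labels, and the helper names `by_ids`, `by_query` and `by_category` are all hypothetical, and nothing here reproduces how the BNC Sampler software itself builds subcorpora.

```python
import re

# Hypothetical miniature corpus: text id -> (category, text).
CORPUS = {
    "t1": ("dialogue", "oh yes that fruit salad is lovely"),
    "t2": ("monologue", "the committee will be elected next year"),
    "t3": ("dialogue", "yeah all right I'll have the salad"),
}

def by_ids(corpus, ids):
    """Subcorpus of specific, hand-picked texts (unknown ids are ignored)."""
    return {tid: corpus[tid] for tid in ids if tid in corpus}

def by_query(corpus, pattern):
    """Subcorpus of texts containing at least one solution to a regex query."""
    rx = re.compile(pattern, re.IGNORECASE)
    return {tid: item for tid, item in corpus.items() if rx.search(item[1])}

def by_category(corpus, category):
    """Subcorpus of texts carrying a given category label."""
    return {tid: item for tid, item in corpus.items() if item[0] == category}

salad_texts = by_query(CORPUS, r"\bsalad\b")
dialogue_texts = by_category(CORPUS, "dialogue")
```

Each helper returns a dictionary of the same shape as the full corpus, so a subcorpus can be compared with another subcorpus or with the whole using the same code.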
26
Text analysis: selecting Choosing specific texts
27
Viewing the index
28
Party policies (will/shall be + VVN)
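A query of the form will/shall be + VVN can be approximated over part-of-speech tagged text. The sketch below assumes tokens are available as (word, tag) pairs using BNC-style tags, where VVN marks the past participle of a lexical verb; the function name `passive_future` and the sample sentence are invented for illustration.

```python
def passive_future(tagged):
    """Find 'will/shall be + past participle' in a list of (word, tag) pairs."""
    hits = []
    # Slide a three-token window over the text.
    for (w1, _), (w2, _), (w3, t3) in zip(tagged, tagged[1:], tagged[2:]):
        if w1.lower() in {"will", "shall"} and w2.lower() == "be" and t3 == "VVN":
            hits.append((w1, w2, w3))
    return hits

# Invented example sentence with BNC-style tags.
sample = [
    ("Taxes", "NN2"), ("will", "VM0"), ("be", "VBI"), ("reduced", "VVN"),
    ("and", "CJC"), ("we", "PNP"), ("shall", "VM0"), ("be", "VBI"),
    ("fair", "AJ0"),
]
hits = passive_future(sample)
```

Only strictly adjacent sequences are matched here; allowing an intervening adverb (will soon be reduced) would need a wider window.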
29
Or, to return to our fruit salad text …
30
Most frequent adjectives (KC2)
31
Appreciating food (KC2)
32
A bad language subcorpus: texts containing solutions to a query
33
Choosing the bad language texts
34
collocates of f.*k.*
35
oh fuck.* with oh as collocate
36
collocates of oh
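Collocate lists like the ones on these slides come from counting the words that occur within a fixed span of each hit for the node query. A minimal sketch, assuming a plain token list and a symmetric window; the function name `collocates`, the default span of 3 and the sample tokens are assumptions, and real corpus tools usually rank collocates with an association score rather than raw counts.

```python
import re
from collections import Counter

def collocates(tokens, node_pattern, span=3):
    """Count words within `span` tokens either side of each node match."""
    rx = re.compile(node_pattern)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if rx.fullmatch(tok):
            # Left and right context, excluding the node itself.
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

tokens = "oh yes oh dear oh yes well".split()
oh_collocates = collocates(tokens, r"oh", span=1)
```

Because the node is given as a regular expression, the same function handles wildcard queries of the kind shown on the earlier slides.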
37
Making subcorpora using encoded categories
‘context-governed’ spoken texts
– monologue: 17 texts
– dialogue: 29 texts
38
Monologue vs Dialogue
More frequent in M*
– could, had, he, know, their, were, when, who, your
More frequent in D*
– 'll, 'm, any, no, pounds, right, yeah, yes
*ranked 20+ positions higher in first 100 words
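The criterion in the footnote (ranked 20+ positions higher in the first 100 words) can be sketched as a comparison of two frequency rankings. The function names below are hypothetical, and one detail is an assumption because the slide does not say: a word missing from the other subcorpus's top list is treated here as if it sat just below it, at rank `top_n + 1`.

```python
from collections import Counter

def top_ranks(tokens, top_n=100):
    """Map each of the top_n most frequent words to its rank (1 = most frequent)."""
    ranked = Counter(tokens).most_common(top_n)
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}

def ranked_higher(tokens_a, tokens_b, top_n=100, min_gap=20):
    """Words in A's top_n that rank at least min_gap places higher than in B."""
    ranks_a = top_ranks(tokens_a, top_n)
    ranks_b = top_ranks(tokens_b, top_n)
    result = []
    for word, rank_a in ranks_a.items():
        # Assumption: absent from B's top_n counts as rank top_n + 1.
        rank_b = ranks_b.get(word, top_n + 1)
        if rank_b - rank_a >= min_gap:
            result.append(word)
    return result

# Toy data with a small top_n and gap so the effect is visible.
dialogue = ["yeah"] * 5 + ["the"] * 4 + ["right"] * 3
monologue = ["the"] * 5 + ["he"] * 4 + ["had"] * 3 + ["yeah"]
more_in_dialogue = ranked_higher(dialogue, monologue, top_n=3, min_gap=2)
```

Comparing ranks rather than raw counts sidesteps the fact that the two subcorpora differ in size.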
39
Investigating the differences
– no occurrences of all right in monologue
– when you’re / you’ll / you’d / you’ve is more common in monologue than when we’re / we’ll / we’d / we’ve; vice-versa in dialogue
40
Pronoun (+ contraction)
            you    we     you’*   we’*
Monologue   4253   2014   685     535
Dialogue    6635   4949   939     1253
we/we’* much more frequent in dialogue
41
you and we
            you    we
Monologue   4253   2014
Dialogue    6635   4949
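Since the slide gives raw counts and the two subcorpora are not the same size, one way to read the figures is as the ratio of we to you within each subcorpus. The short calculation below uses only the four counts shown on the slide; the variable names are illustrative.

```python
# Counts of "you" and "we" as given on the slide.
counts = {
    "Monologue": {"you": 4253, "we": 2014},
    "Dialogue": {"you": 6635, "we": 4949},
}

# Occurrences of "we" per occurrence of "you" in each subcorpus.
we_per_you = {name: c["we"] / c["you"] for name, c in counts.items()}

for name, ratio in we_per_you.items():
    print(f"{name}: {ratio:.2f} we per you")
```

The ratio comes out at roughly 0.47 for monologue and 0.75 for dialogue, so we is relatively more frequent in dialogue whatever the subcorpus sizes are.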
42
Subcorpora using your own categories
David Lee’s book genres
– academic non-fiction (13 texts)
– non-academic non-fiction (15 texts)
– prose fiction (13 texts)
43
Distinctive -ly adverbs of:
academic non-fiction
– accordingly, essentially, eventually, largely, namely, notably, respectively, surprisingly
non-academic non-fiction
– effectively, merely, normally, obviously, possibly, specially
prose fiction
– carefully, quietly, slightly, slowly, softly, surely, truly
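The slide does not say how distinctive was measured. One simple possibility, offered purely as an assumption, is to call an -ly adverb distinctive of a genre when its relative frequency there is at least some multiple of its relative frequency in each of the other genres. The sketch works on (word, tag) pairs with the BNC-style tag AV0 for general adverbs; the function names, the threshold of 2.0 and the toy data are hypothetical.

```python
from collections import Counter

def ly_adverb_rates(tagged):
    """Relative frequency of each -ly adverb in a list of (word, tag) pairs."""
    total = len(tagged)
    counts = Counter(
        word.lower() for word, tag in tagged
        if tag == "AV0" and word.lower().endswith("ly")
    )
    return {word: n / total for word, n in counts.items()}

def distinctive(target, others, ratio=2.0):
    """-ly adverbs at least `ratio` times as frequent in target as in every other subcorpus."""
    target_rates = ly_adverb_rates(target)
    other_rates = [ly_adverb_rates(o) for o in others]
    return sorted(
        word for word, rate in target_rates.items()
        if all(rate >= ratio * o.get(word, 0.0) for o in other_rates)
    )

# Invented toy subcorpora.
academic = [("largely", "AV0"), ("largely", "AV0"), ("it", "PNP"), ("slowly", "AV0")]
fiction = [("slowly", "AV0"), ("slowly", "AV0"), ("he", "PNP"), ("largely", "AV0")]
academic_adverbs = distinctive(academic, [fiction])
```

Filtering on the adverb tag rather than on spelling alone keeps out words such as family or only when they are not tagged as adverbs.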
44
largely (academic non-fiction)
45
it (academic non-fiction)
46
To conclude …
47
Working with subcorpora can allow
– study/comparison of forms/meanings in particular texts/text-types
– better-focussed reading practice
– more appropriate reference tools for particular tasks
– more focussed browsing
48
Subcorpora
– may not be representative (but nor is most language learning data)
– are good for forming hypotheses to be tested more widely
– will allow more interesting uses when extracted from a larger corpus
49
Making your own provides
– better preparation and motivation for corpus use
– more critical awareness
– lots to talk about
50
Enjoy!