Profiling French Vocabulary: The shape of lexicons by frequency & coverage
10.45-11.15, Monday, March 23, Session K Nfld., Room 13, Mezzanine
Tom Cobb

Abstract
Lexical frequency profiling (LFP; Laufer & Nation, 1995), which has been highly influential in ESL vocabulary research and instruction, has had a slower beginning in French. This has been due to lack of access to large corpora of French from which pedagogically relevant frequency information could be derived. Pioneering efforts in the 1990s (Goodfellow & Lamy, 2002) had facilitated promising comparisons of the lexical coverage of French and English texts (Author & Horst, 2004), which had pedagogical implications that were both interesting and practical (Ovtcharov, Author & Halter, 2006) but inconclusive owing to incompleteness of the frequency information. Now, however, work behind the Frequency Dictionary of French by Lonsdale and Le Bras (Routledge, 2009) has produced and made available complete and lemmatized corpus-based frequency information for French. This means that both researchers and teachers can now in principle use the LFP methodology to explore thoroughly the lexical composition, sophistication, and ‘richness’ of French texts. To be discussed will be the method of incorporating the frequency information within an LFP methodology, examples of the sort of research such profiling makes possible, and the means by which researchers can access the tools of this analysis and use them for their own purposes. Representative initial findings from the application of this methodology to French will be offered, including a suggestion that French deploys its lexical resources rather differently from how English does and may present unique and previously undefined lexical challenges to its learners.

Recent corpus work in French has made lexical frequency profiling (LFP) methodology available to French researchers and teachers. Initial findings suggest that French may present unique lexical challenges to its learners. Attendees will be shown how to access the tools of this analysis for use in their own work.

The main new idea of the “vocab revolution” (1990-) in ESL/FL
is Zipf’s old idea that some words get far more use than others in any language,
made recently usable by computer technology
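
To make Zipf's point concrete, here is a minimal sketch (not part of the original talk) that measures what share of a text's running words is accounted for by its n most frequent word types. The function name `top_n_coverage` and the toy sentence are illustrative assumptions.

```python
from collections import Counter


def top_n_coverage(tokens, n):
    """Percentage of running words covered by the n most frequent word types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(n))
    return 100 * covered / len(tokens)


# Toy text (hypothetical): 8 running words, 5 word types.
sample_tokens = "the cat and the dog and the bird".split()
top1 = top_n_coverage(sample_tokens, 1)  # "the" alone
top2 = top_n_coverage(sample_tokens, 2)  # "the" + "and"
```

Even in this tiny sample, two of the five word types account for more than half of the running words; real corpora show the same skew on a much larger scale.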

1. Consistency, 2. Where to look

The AWL effect

So it was a reasonable question to ask, “Is there an AWL in French?”
An interesting question for several reasons…
This gradually became a question that could be answered

FRENCH – v.1 (zoom)

English v. French: ENG 1k+2k = 80%, FR 1k+2k = 90%

So French is getting the AWL effect for free
And for fewer words

So the question had to be reformulated, from “Is there an AWL in French?” to “Is there room for an AWL in French?”

The answer seemed to be “No”
1k+2k is already giving 90% coverage
And the remaining 10% is presumably needed for technical, archaic, & oddball items
With the implication that acquiring a functional second lexicon was easier in French

Back to English
1995-2005, a happy picture in ESL vocab: 2k + AWL = 90% (+ technical = 95%)
BUT SHORT-LIVED
1. The goal of vocab development was recalculated (Nation, 2006). The comprehension bar got raised: 95% coverage → 98% coverage
2. The how-to of building technical lists became less clear
3. Bigger, better frequency lists put the existence of an AWL in question
– BNC lists (2005)
– BNC-COCA lists (2012)
But the notion of 2000 words = 80% has pretty much survived

VP-BNC-COCA (zoom)

So the new question about French is no longer “Is there room for an AWL in French?” but
“How are the medium- and low-frequency lexical resources of French deployed in the remaining 10% of space available?”
What does this imply for learning French?
This question gradually became answerable

25 lemmatized French k-lists
– From the Lonsdale & Le Bras dictionary project at BYU
– Based on a 23-million-word corpus
– Continental + international French, 50/50
– Spoken and written, 50/50
– Literary 40%, expository 60%
– List-crunched for RANGE + FREQ
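
How a ranked lemma list becomes a set of k-lists can be sketched as follows. This is an illustration only: the scoring rule (frequency multiplied by range), the function name `build_k_lists`, and the sample entries are all assumptions, not the procedure actually used in the Lonsdale & Le Bras project or in VP-Fr.

```python
def build_k_lists(entries, band_size=1000):
    """Map each lemma to a 1-based band (k-level) after ranking.

    entries: iterable of (lemma, frequency, range_count) tuples, where
    range_count is the number of sub-corpora the lemma occurs in.
    The score frequency * range_count is a hypothetical stand-in for
    whatever RANGE + FREQ weighting the real lists used.
    """
    ranked = sorted(entries, key=lambda e: e[1] * e[2], reverse=True)
    return {lemma: rank // band_size + 1
            for rank, (lemma, _freq, _rng) in enumerate(ranked)}


# Invented counts, with a band size of 2 so the toy list spans 3 bands.
sample_entries = [
    ("de", 900, 10),
    ("chat", 40, 8),
    ("être", 700, 10),
    ("zèbre", 2, 1),
    ("maison", 60, 9),
]
sample_bands = build_k_lists(sample_entries, band_size=2)
```

With the default `band_size=1000`, the first thousand ranked lemmas form the 1k list, the next thousand the 2k list, and so on up to the 25th list.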

FRENCH – v.5

So now we can investigate the shape of the mid-frequency French lexicon
And make plausible comparisons with English
What lies between 90% and 95% coverage in French texts?
– Or between 90% and 98%?
Is there “less to learn” in French than in English?
(Remembering that lemmas ≠ families)

3 tests

Test 1: Translated popular texts
– 20 translated Reader's Digest texts → 20 Fr, 20 Eng
– Half translated E → F, half F → E
– Total 2939 words Eng, 3650 words Fr
– Run through VP-Fr as a mini-corpus (as a single file)
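
The kind of profile each test produces can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the VP-Fr implementation: the function names `cumulative_coverage` and `band_reaching`, the toy tokens, and the toy band assignments are all hypothetical, and real profiling would lemmatize the text first.

```python
from collections import Counter


def cumulative_coverage(tokens, band_of):
    """Cumulative % of running words covered at each frequency band.

    band_of maps a lemma to its k-level (1 = first thousand, ...).
    Tokens missing from band_of count as off-list: they are included
    in the total but never in the covered count.
    """
    per_band = Counter(band_of[t] for t in tokens if t in band_of)
    total = len(tokens)
    coverage, running = {}, 0
    for band in sorted(per_band):
        running += per_band[band]
        coverage[band] = 100 * running / total
    return coverage


def band_reaching(coverage, threshold):
    """Lowest band at which cumulative coverage meets the threshold, else None."""
    for band in sorted(coverage):
        if coverage[band] >= threshold:
            return band
    return None


# Toy data (invented): 10 running words, one of them off-list.
toy_bands = {"le": 1, "chat": 1, "chien": 2, "mange": 2, "zèbre": 5}
toy_tokens = ["le", "chat", "le", "chien", "mange",
              "le", "zèbre", "le", "chat", "quark"]
toy_profile = cumulative_coverage(toy_tokens, toy_bands)
```

Calling `band_reaching(toy_profile, 95)` returns `None` here because the off-list word keeps coverage at 90%; on real texts the same call with 95 or 98 gives the k-level marked on the profile charts.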

ENGLISH (figure: 95% and 98% coverage points marked)

FRENCH (figure: 95% and 98% coverage points marked)

Side by side, using the 98% criterion: Eng (fams) v. Fr (lemmas)

Fr (lemmas)
A lot of words in that blue circle!
The difference between k8 and k16 is only 100 word types
But these 100 words are drawn from a pool of 8,000 lemmas

Test 2: Translated extended literary work
Samuel Beckett’s idea: French as “an impoverished lexicon”?
Actually he never said this
But he did write in French, and “use stark language to convey a stark world”
How stark is Beckett’s French?

«En attendant Godot» / “Waiting for Godot”
Proper nouns-<1k has changed the 1k-2k thing

Test 3
Maybe Tests 1+2 were something about translated texts?
OK, then let’s compare 4 random original editorial texts
Chosen 14-15 March, 2015, from
(1) Le Monde – Paris
(2) Le Devoir – Montreal
(3) The Globe & Mail – Toronto
(4) The NY Times – New York

Conclusion (1) Comparing languages:
– French may make slightly more use of its common words than English does
– But it makes far more use of its mid- and low-frequency lexical resources (3k to 20k+)
– Cobb & Horst (2004) was right as far as it went, but incomplete for lack of resources

Conclusion (2) Comparing learning tasks:
Learning enough vocab for 90% coverage looks slightly easier in French than in English
But learning enough words for 98% or even 95% coverage looks far more difficult
How many FL2 students ever get there?

(3) The shapes of the two lexicons seem to be like this:
English (figure: 95% and 98% coverage points marked)

French (figure: 95% and 98% coverage points marked)

But notice that the French early advantage persists to about 4k
(So 3k words in French give better coverage than in English)
(figure: curves labelled F and E)

Discussion
Is the greater ease of acquiring a 90% lexicon in French a reason for the traditional FL2 emphasis on phonology and syntax?
Is it that French is a more “academic/elitist” lexicon…
Or just that English is less so?
– Maybe the shape of English reflects the lingua franca role the language has come to play
– Such that its writers use *circumlocution* for complex ideas, rather than seeking « le mot juste » (Flaubert)?

ENGLISH AS A LINGUA FRANCA? BUT SURELY NOT IN THE 19th CENTURY

Further work
As ever in corpus work, this needs empirical validation
– Do L2 readers with 10k lexicons actually experience a comprehension deficit?
As ever in list work, new lists are probably just around the next corner
– Any picture is strictly provisional

Pedagogical implications
Are there manageable zones within the French lexicon, like “technical lists”?
– … that could be found through work with specialist corpora?
Till then, the message seems to be
– Get out your flashcards! At least now we know what to put on them
OR →

All chapters + papers + /list_learn/ available at www.lextutor.ca
Thank you!
cobb.tom@sympatico.ca

A method note
But wait! We are comparing lemmas v. families
(cat, cats v. cat, cats, catty)
1000 families give more coverage than 1000 lemmas
– How much more?
Some recent work by Charles Browne suggests an answer

http://www.newgeneralservicelist.org/
2368 / 2818 × 100 = 84%
1000 lemmas have ~16% less coverage than 1000 families in English
At the high-frequency NGSL zone (1k+2k)
(probably less at lower-frequency zones)

But even assuming (1) a 16% difference that (2) was maintained at lower-frequency zones
About every six lemma lists (6 × 16% = 96%) we would lose a k-level to maintain lemma-family equivalence
– So in 18 levels we would lose 3
The picture would not change greatly
– Even in an exaggerated worst-case scenario
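
The worst-case adjustment above can be checked with a few lines of arithmetic. This is a sketch under the slide's own assumptions (a constant 16% shortfall at every level); the constant and function names are hypothetical, and rounding the lost levels to the nearest whole level is a choice made here to match the slide's "in 18 levels we would lose 3".

```python
# The two counts shown on the NGSL slide (2368 / 2818).
SLIDE_NUMERATOR = 2368
SLIDE_DENOMINATOR = 2818


def coverage_ratio():
    """Share of family coverage that lemmas retain, per the slide's division."""
    return SLIDE_NUMERATOR / SLIDE_DENOMINATOR


def family_equivalent_levels(lemma_levels, ratio):
    """Lemma k-levels minus the levels 'lost' to the lemma/family shortfall.

    Assumes the same shortfall applies at every frequency level, which
    the talk itself flags as an exaggerated worst case.
    """
    lost = round(lemma_levels * (1 - ratio))
    return lemma_levels - lost


ratio = coverage_ratio()                         # about 0.84
adjusted_18 = family_equivalent_levels(18, ratio)
adjusted_16 = family_equivalent_levels(16, ratio)
```

Under these assumptions 18 lemma levels shrink to 15 and 16 lemma levels to 13, so the k16 French figure becomes roughly k13 while the English figure stays at k8.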

Eng (fams) v. Fr (lemmas)
K8 E-fams = k16 F-lems for 98%? → K8 E-fams = k13 F-lems for 98%
Pattern is the same
