What kind of vocabulary is in course books and graded readers?
Rob Waring, Notre Dame Seishin University
JALT Vocab SIG Symposium, June 29, 2013
Steps
  Decide a scale to use (ERF Scale)
  Make a base wordlist based on the scale
  Scan in the texts and remove proper nouns
  Run the analysis in AntWord
  Count the running words in each text at each of the wordlist levels
  Identify a typical average frequency profile (by baseword level) at each reading level for the GRs and course books
  Decide the number of average texts to be ‘read’ (30)
  Decide how many times a word has to be met before it’s learnt (20)
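The counting step above can be sketched in a few lines of Python. This is only an illustration of the idea, not how AntWord works internally: the function names `profile` and `percentages`, the toy wordlist and the sample sentence are all assumptions made up for the example, and the sketch matches word forms directly rather than grouping them into families.

```python
import re
from collections import Counter


def profile(text, wordlist_levels, proper_nouns=frozenset()):
    """Count running words (tokens) in `text` at each wordlist level.

    `wordlist_levels` maps a lower-cased word form to a level label such
    as "ERF1". Forms missing from the map are counted as "Not in lists".
    """
    counts = Counter()
    for token in re.findall(r"[A-Za-z']+", text):
        if token in proper_nouns:
            continue  # proper nouns are removed before profiling
        counts[wordlist_levels.get(token.lower(), "Not in lists")] += 1
    return counts


def percentages(counts):
    """Turn raw counts into the percentage of running words per level."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: 100 * n / total for level, n in counts.items()}


# Toy wordlist and text (hypothetical, for illustration only).
levels = {"the": "ERF1", "on": "ERF1", "cat": "ERF2", "sat": "ERF3", "mat": "ERF4"}
sample = "The cat sat on the mat. Tom sat on the zarf."

counts = profile(sample, levels, proper_nouns={"Tom"})
shares = percentages(counts)
```

Averaging such profiles over all the texts at one reading level would give the kind of "typical" profile the later slides report.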
Graded Reader Corpus
Wordlist Level     # Titles   Total running words   Average length
ERF1 (50)          64         4,[…]                 […]
ERF2 (100)         39         9,[…]                 […]
ERF3 (200)         68         42,[…]                […]
ERF4 (300)         57         82,962                1,455
ERF5 (400)         […]        […],697               3,215
ERF6 (600)         […]        […],740               2,111
ERF7 (800)         […]        […],641               3,336
ERF8 (1000)        […]        […],319               8,431
ERF9 (1250)        […]        […],368               12,654
ERF10 (1500)       […]        […],035               10,075
ERF11 (1800)       65         1,113,465             17,130
ERF12 (2100)       14         90,870                6,491
ERF13 (2400)       […]        […],979               18,090
ERF14 (3000)       […]        […],038               13,089
ERF15 (3600)       -
ERF16 (4500)       2          78,455                39,[…]
Total              […]        […],984,896
Percentage of words at each ERF Reading level by Wordlist level
Rows: Wordlist level. Columns: ERF Reading level.
ERF1 ERF2 ERF3 ERF4 ERF5 ERF6 ERF7 ERF8 ERF9 ERF10 ERF11 ERF12 ERF13 ERF14 ERF15 ERF16
ERF1 (50): 34.4% 37.6% 33.8% 30.8% 30.0% 27.6% […] 27.4% 27.7% 27.3% 27.8% 26.7% 26.4% 26.3% 27.6%
ERF2 (100): 8.2% 13.3% 12.1% 11.6% 11.2% 10.0% 9.3% 9.5% 9.8% 9.4% 9.2% 8.9% 9.0% 8.6% 8.9%
ERF3 (200): 17.6% 13.1% 18.9% 20.3% 22.0% 21.8% 21.4% 21.2% […] 20.9% 19.6% 19.5% 19.4% 18.8% 17.4%
ERF4 (300): 5.3% 2.4% 4.7% 6.4% 6.9% 7.9% 7.8% 8.5% 8.2% 8.7% 8.3% 7.9% 8.2% 7.8% 6.8%
ERF5 (400): 1.4% 1.7% 1.8% […] 2.3% 2.5% 2.9% 3.2% 3.1% […] 3.0% 3.2% 3.1% 2.7%
ERF6 (600): 1.9% 1.1% 2.1% 2.2% […] 2.7% 3.0% 3.5% 3.4% 4.0% 3.8% 3.9% 4.0% […] 3.4%
ERF7 (800): 1.5% 0.7% 0.8% 0.7% […] 1.1% 1.4% 1.6% 1.5% 1.9% 2.1% […] 2.4% 2.7% 2.2%
ERF8 (1000): 1.3% 0.5% 1.0% 0.9% 1.0% 1.5% 1.3% 1.6% […] 1.9% 2.0% 2.5% […] 2.9% 2.4%
ERF9 (1250): 1.6% 0.5% 0.6% 0.4% 0.6% 0.7% 0.8% 0.7% […] 0.8% 1.0% 1.1% 1.3% 1.4% 1.3%
ERF10 (1500): 0.8% 0.5% […] 0.6% 0.7% 0.8% […] 1.0% 1.3% 1.2% 1.5% 1.7%
ERF11 (1800): 0.9% 0.3% […] 0.6% 0.5% 0.7% 0.8% 0.7% 0.6% 0.7% 0.9% 0.8% 1.0% 1.1% 1.3%
ERF12 (2100): 1.4% 0.2% 0.5% […] 0.4% 0.7% […] 0.5% […] 0.6% 0.8% 0.9% 1.0% 1.5%
ERF13 (2400): 0.8% 0.6% 0.5% 0.2% 0.4% […] 0.3% […] 0.4% 0.2% 0.5% 0.6% 0.7%
ERF14 (3000): 0.6% 0.1% 0.8% 0.7% 0.5% 0.8% 0.6% 0.5% […] 0.4% 0.5% 0.3% 0.4% 0.6% 0.7%
ERF15 (3600): 1.3% 0.2% 1.1% 0.9% 0.7% 1.1% 1.0% 0.9% 0.8% […] 0.9% […] 1.3% 1.4%
ERF16 (4500): 0.8% 0.5% 0.3% 0.2% 0.1% 0.3% 0.2% […] 0.3% […] 0.7%
ERF17 (6000): 0.5% 0.4% 0.3% 0.2% 0.3% 0.4% […] 0.3% […] 0.4% […] 0.3% 0.4% […] 1.0%
ERF18 (8000): 0.9% 0.2% 0.4% 0.2% 0.6% 0.9% 0.7% 0.3% 0.4% 0.3% 0.4% 0.3% […] 0.4% 1.1%
ERF19 (12000): 0.7% 0.6% 0.8% 0.3% […] 0.4% 0.5% 0.3% 0.2% 0.3% […] 0.2%
ERF20 (18000): 2.2% 1.1% 1.5% 1.4% 1.7% 1.5% 1.6% 1.5% 1.6% […] 1.7% 1.5% 1.8% 1.9% 1.7%
Out of level: 0.2% 2.2% 0.2% 2.5% 0.9% 1.8% 1.9% 2.0% 1.0% 0.7% 0.5% 3.7% 0.8% 0.9% 0.2%
Proper nouns: 15.7% 22.3% 16.9% 16.5% 16.0% 14.7% […] 14.3% 15.4% 14.7% 15.0% 14.0% 14.7% 13.6% 14.5%
Not in lists: 0.1% 0.0% 0.1% […] 0.2% […] 0.3% 0.2% […] 0.4% 0.5%
% of families at each level which occur more than 20 times (minus proper nouns)
Rows: Wordlist level. Columns: ERF Reading level.
ERF1 ERF2 ERF3 ERF4 ERF5 ERF6 ERF7 ERF8 ERF9 ERF10 ERF11 ERF12 ERF13 ERF14 ERF15 ERF16
ERF1 (50): 41% 48% 41% 37% 36% 32% […] 33% 32% 33% 31% […] 30% […] 32%
ERF2 (100): 10% 17% 15% 14% 13% 12% 11% […] 12% 11% […] 10% 11% 10%
ERF3 (200): 21% 17% 23% 24% 26% […] 25% […] 24% 23% […] 22% 20%
ERF4 (300): 6% 3% 6% 8% […] 9% […] 10% […] 9% 10% 9% 8%
ERF5 (400): 2% […] 3% […] 4% […] 3% 4% […] 3%
ERF6 (600): 2% 1% 3% […] 4% […] 5% […] 4% 5% […] 4%
ERF7 (800): 2% 1% […] 2% […] 3%
ERF8 (1000): 2% 1% […] 2% […] 3%
ERF9 (1250): 2% 1% […] 2%
ERF10 (1500): 1% […] 2%
ERF11 (1800): 1% 0% […] 1% […] 2%
ERF12 (2100): 2% 0% 1% […] 2%
ERF13 (2400): 1% […] 0% […] 1%
ERF14 (3000): 1% 0% 1% […] 0% 1%
ERF15 (3600): 2% 0% 1% […] 2%
ERF16 (4500): 1% […] 0% […] 1%
ERF17 (6000): 1% 0% […] 1% […] 0% […] 1%
ERF18 (8000): 1% 0% […] 1% […] 0% […] 1%
ERF19 (12000): 1% […] 0% […] 1% 0%
ERF20 (18000): 3% 1% 2%
Out of level: 0% 3% 0% 3% 1% 2% […] 1% […] 4% 1% […] 0%
Proper nouns: 19% 29% 20% […] 19% 17% […] 18% 17% 18% 16% 17% 16% 17%
% of each book in level
ERF1 ERF2 ERF3 ERF4 ERF5 ERF6 ERF7 ERF8 ERF9 ERF10 ERF11 ERF12 ERF13 ERF14 ERF15 ERF16
[…]% 65% 78% 83% 86% 85% 86% 89% 91% 92% 93% 91% 94% 93% 94%
Average book length without proper nouns
ERF1 ERF2 ERF3 ERF4 ERF5 ERF6 ERF7 ERF8 ERF9 ERF10 ERF11 ERF12 ERF13 ERF14 ERF15 ERF16
[…],215 2,700 1,801 2,846 7,225 10,699 8,593 14,556 5,583 15,428 11,306 33,531
How many words do you meet if you read 30 books at each level?
Columns: ERF1 only, ERF 1-2, ERF 1-3, ERF 1-4, ERF 1-5, ERF 1-6, ERF 1-7, ERF 1-8, ERF 1-9, ERF 1-10, ERF 1-11, ERF 1-12, ERF 1-13, ERF 1-14, ERF 1-15, ERF 1-16
Number of books read / level: 30
Accumulated books read: […]
Running words met / level: […],455 3,215 2,111 3,336 8,431 12,654 10,075 17,130 6,491 18,090 13,089 39,228
Running words for 30 books / level: 1,940 7,187 18,903 43,664 96,456 63,[…] […] […] […] […] […] […] […] […] […],176,825
Accumulated having read all 450 books: 1,940 9,127 28,029 71,[…] […] […] […] […] […],120 1,266,362 1,780,269 1,974,991 2,517,699 2,910,379 4,087,204
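The arithmetic behind this slide is a running total: words met at a level are roughly 30 times the average book length, and the accumulated figure adds up the levels read so far. A minimal sketch, using only the first four per-level figures that survive on the slide (the variable and function names are made up for the example):

```python
from itertools import accumulate

BOOKS_PER_LEVEL = 30

# Running words for 30 books at ERF1 to ERF4, as given on the slide.
words_per_level = [1_940, 7_187, 18_903, 43_664]

# Accumulated running words after finishing each level in turn.
cumulative = list(accumulate(words_per_level))


def words_for_books(average_length, books=BOOKS_PER_LEVEL):
    """Approximate running words met when reading `books` average texts."""
    return average_length * books


erf4_estimate = words_for_books(1_455)  # ERF4 average length from the slide
```

The third accumulated value comes out as 28,030 here against 28,029 on the slide, and `words_for_books(1_455)` gives 43,650 against the slide's 43,664. Both differences presumably come from the slide's averages being rounded before display.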
Accumulated coverage for 30 books per level to 95% coverage of the families at 20 meetings for each type at each level
Rows: Wordlist level. Columns: Accumulated reading amount.
ERF1 only, ERF1-2, ERF1-3, ERF1-4, ERF1-5, ERF1-6, ERF1-7, ERF1-8, ERF1-9, ERF1-10, ERF1-11, ERF1-12, ERF1-13, ERF1-14, ERF1-15, ERF1-16
# books read: […]
ERF1 (50): 16.7% 28.2% 35.5% 53.2% 71.5% 70.3% 70.1% 73.0% 73.3% 73.6% 77.8% 76.1% 78.2% 78.1% 76.7%
ERF2 (100): 1.0% 14.3% 41.9% 60.0% 77.1% 81.9% 82.9% 87.6% 95.2% 96.2% 98.1% 99.0%
ERF3 (200): 0.8% 4.6% 23.2% 52.9% 76.8% 79.8% 83.7% 86.3% 89.0% 90.5% 92.8% 93.2% 95.1% 95.8% 96.2%
ERF4 (300): 0.8% 2.3% 9.8% 34.6% 67.7% 77.4% 86.5% 89.5% 91.0% 91.7% 94.0% […] 94.7% 95.5% 96.2%
ERF5 (400): 0.0% 1.9% 3.7% 15.7% 49.1% 64.8% 74.1% 88.9% 91.7% […] 92.6% […] 95.4% […] 96.3%
ERF6 (600): 0.0% 0.6% […] 9.4% 28.1% 38.1% 58.1% 79.4% 88.1% 91.3% 93.8% 94.4% 95.6% 96.3% 96.9%
ERF7 (800): 0.0% 0.7% 2.6% […] 7.9% 15.8% 26.3% 48.7% 59.2% 68.4% 82.2% 87.5% 94.7% 96.1% 97.4%
ERF8 (1000): 0.0% […] 0.9% 3.7% 11.6% 15.3% 25.5% 39.4% 50.9% 61.1% 78.7% 86.6% 91.7% 94.9% […] 97.2%
ERF9 (1250): 0.0% […] 0.5% […] 6.3% 8.2% 14.4% 26.0% 39.4% 52.4% 63.0% 72.1% 80.3% 86.5% 89.9%
ERF10 (1500): 0.0% 0.4% 0.8% 1.2% 5.8% 9.1% 12.8% 23.5% 36.2% 43.2% 58.0% 65.0% 73.3% 83.1% 89.7%
ERF11 (1800): 0.0% […] 1.5% 3.3% 4.4% 9.5% 18.2% 32.1% 39.8% 54.4% 58.4% 70.4% 79.9% 88.3%
ERF12 (2100): 0.0% […] 0.3% 1.2% 3.4% 4.3% 7.4% 13.3% 20.4% 27.6% 40.6% 43.0% 53.6% 63.8% 76.8%
ERF13 (2400): 0.0% […] 0.4% […] 2.7% 4.2% 6.2% 9.7% 16.2% 22.0% 30.1% 32.8% 43.2% 51.4% 68.7%
ERF14 (3000): 0.0% […] 0.3% 0.9% 2.3% 4.1% 5.2% 10.5% 15.2% 19.0% 25.9% 27.1% 32.4% 39.4% 54.2%
ERF15 (3600): 0.0%
ERF16 (4500): 0.0% 0.1% 0.5% 0.9% 1.6% 2.7% 3.2% 6.5% 9.4% 12.4% 18.5% 19.4% 25.4% 30.6% 45.4%
How many words are you likely to ‘know’ (20 meetings) after reading all that?
[Table: rows are wordlist levels ERF1 (50) to ERF16 (4500); columns are accumulated reading from ERF1 only to ERF 1-16]
Summary
  450 books = 2,894 ‘known’ words (20 meetings)
  Many words at each level won’t be met enough times to ‘learn’ them, even after having read 30 titles at each level
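The ‘known’ count rests on a simple threshold rule: a word family counts as learnt once it has been met often enough. A minimal sketch of that rule follows. The function name `known_families`, the family mapping and the token counts are invented for the example, and treating "20 meetings" as "at least 20" is an assumption, since the slides say both "20 meetings" and "more than 20 times".

```python
from collections import Counter


def known_families(tokens, family_of, threshold=20):
    """Return the set of word families met at least `threshold` times.

    `family_of` maps an inflected or derived form to its family headword.
    Forms not in the map are treated as their own family.
    """
    meetings = Counter(family_of.get(token, token) for token in tokens)
    return {family for family, n in meetings.items() if n >= threshold}


# Hypothetical data: the family "walk" is met 20 times in total
# across three forms, while "river" is met only 19 times.
family_of = {"walks": "walk", "walked": "walk", "walking": "walk"}
tokens = ["walk"] * 8 + ["walked"] * 7 + ["walks"] * 5 + ["river"] * 19

learnt_at_20 = known_families(tokens, family_of)
learnt_at_12 = known_families(tokens, family_of, threshold=12)
```

Keeping the threshold as a parameter makes it easy to re-run the same count with a different number of meetings.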
Course books
  6 Japanese Junior High texts
  21 Japanese High School texts
  18 Korean Middle School texts
  15 Korean High School texts
  5 Mexican Middle and Senior High texts
How many words will a learner meet on average in these texts in middle or high school?
                     Middle School   High School   Total
Mexico (Sequences)   126,043         106,493       232,536
Korea (averaged)     23,483          37,950        61,433
Japan (averaged)     14,066          20,977        35,043
How many words are in each book by ERF level?
Columns: Japanese JH, Japanese SH, Korea Middle, Korea HS, Mexico Middle, Mexico HS
ERF1 (50): 2,906 4,278 4,770 6,756 41,038 30,059
ERF2 (100): 1,012 1,638 1,757 2,369 10,755 8,911
ERF3 (200): 1,906 3,450 3,479 5,358 20,963 18,664
ERF4 (300): 616 1,339 1,233 2,056 7,550 7,472
ERF5 (400): […],107 3,949 3,113
ERF6 (600): […],312 4,059 3,787
ERF7 (800): […],059 2,457 2,941
ERF8 (1000): […],353 2,686 3,238
ERF9 (1250): […],653 1,815
ERF10 (1500): […],002 1,091 1,868
ERF11 (1800): […],569 1,320
ERF12 (2100): […],413
ERF13 (2400): […]
ERF14 (3000): […]
ERF15 (3600): […],949 2,045
ERF16 (4500): […]
ERF17 (6000): […]
ERF18 (8000): […]
ERF19 (12000): […]
ERF20 (18000): […],657 2,040
Out of level: […]
Proper nouns: 2,637 2,920 3,812 4,474 16,992 11,964
Not in lists: 1,[…] […],329 2,426 2,492 1,829
Total: 14,066 20,977 23,483 37,950 126,043 106,493
Words met vs number of words probably learnt (>20 meetings) in various course books
Columns: Japanese JH, Japanese SH, Japanese Both, Korean JH, Korean SH, Korean Both, Mexico JH, Mexico SH, Mexico Both
Rows: # meetings
[…]% 3.1% 4.1% 2.4% 2.7% 4.8% 18.9% 14.2% 21.8%
[…]% 3.1% 4.4% 2.5% 2.3% 3.6% 7.7% 6.9% 8.5%
[…]% 4.5% 4.6% 3.4% 3.0% 6.6% 8.4% 7.3% 8.8%
[…]% 12.5% 15.8% 13.3% 11.8% 20.6% 17.0% 18.2% 17.5%
[…]% 20.9% 23.1% 27.1% 28.8% 26.3% 17.7% 20.6% 18.2%
[…]% 56.0% 48.1% 51.4% […] 38.2% 30.3% 32.8% 25.2%
Total: 100%
Course book plus a book a week = ?
Columns:
  Japan: JH course book plus ERF1-3 (90 books); JH & SH course books plus ERF 1-6 (180 books)
  Korea: Middle course books plus ERF1-3 (90 books); Middle & SH course books plus ERF 1-6 (180 books)
  Mexico: Middle course books plus ERF1-3 (90 books); Middle and SH course books plus ERF 1-6 (180 books)
Rows: # meetings
[…]% 22.4% 7.0% 19.7% 19.2% 29.0%
[…]% 7.8% 5.2% 7.9% 8.9% 9.6%
[…]% 7.8% 5.9% 7.1% 8.1% 8.5%
[…]% 12.9% 16.6% 16.2% 15.8% 15.3%
[…]% 16.0% 21.2% 20.6% 17.5% 15.0%
[…]% 33.1% 44.2% 28.6% 30.5% 22.5%
Total: 100.0%
Number of words met
                             Japan      Korea      Mexico
Course books only
  JH                         14,066     23,483     126,043
  JH & SH                    35,043     61,433     232,536
Course books plus reading
  JH                         35,989     45,405     147,966
  JH & SH                    219,242    245,632    416,735
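Subtracting the course-book-only figures from the combined figures shows how much of the exposure comes from the graded reading itself. A small sketch using only the numbers on this slide (the dictionary layout and the name `reading_gain` are made up for the example):

```python
# (course books only, course books plus reading), taken from the slide.
words_met = {
    "Japan": {"JH": (14_066, 35_989), "JH & SH": (35_043, 219_242)},
    "Korea": {"JH": (23_483, 45_405), "JH & SH": (61_433, 245_632)},
    "Mexico": {"JH": (126_043, 147_966), "JH & SH": (232_536, 416_735)},
}


def reading_gain(country, stage):
    """Running words added by graded reading on top of the course books."""
    course_only, with_reading = words_met[country][stage]
    return with_reading - course_only


gains = {
    (country, stage): reading_gain(country, stage)
    for country, stages in words_met.items()
    for stage in stages
}
```

The reading adds about 21,900 running words at the JH stage and 184,199 at the JH & SH stage in every country, which is consistent with the same reading amount being added to each set of course books. For Japan this takes total exposure from 35,043 to 219,242 running words, roughly a sixfold increase.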
Likely uptake (words met more than 20 times from reading 30 texts at each level)
Columns: Japan, Korea, Mexico
Course books only
  JH: […]
  JH & SH: […],276
Course books plus reading
  JH: […]
  JH & SH: 1,187 1,468 1,677
Summary
  Course books only: leads to low gains, most words forgotten
  Course books plus reading: doubles vocabulary
  BUT these data
    underestimate learning, because they do not include partially known words (probably double that), collocations, colligations, multi-word phrases, etc.
    are unfair to the Mexico group, who were restricted to low-level reading (so we could compare)
It’s a work in progress …
Some levels in my wordlist need redoing
  level 3 has lots of past forms and irregular verbs -> bump in data
  levels 6, 8, 15 & 16 are short of families
Some levels are short of texts
  level 12 and level 15
Next I’ll …
  add higher level texts when they become available
  replicate Paul’s study on how many words you need to meet to learn X,000 words with this corpus of SL texts
  analyze which GR series best represents their stated levels
  find out how many texts are needed before learners have covered, say, 95% of the words at a set level
  re-do the stats for 12, 30 meetings
Phew! Yes Paul, I’ll publish it!