Tatsuhiko Matsushita (University of Tokyo) 2013 Victoria University of Wellington 1.

1 Tatsuhiko Matsushita (University of Tokyo), Vocab@Vic 2013, Victoria University of Wellington

3 1. Motives for the Study 2. Research Questions and Goals 3. Proposal of a New Index: Text Covering Efficiency (TCE) 4. Method of Validating TCE 5. Results and Discussion 6. Conclusion

4 How efficiently can we learn vocabulary? What words should learners learn first, second and next? Domain-specific words such as academic words (Coxhead, 2000) are often extracted for efficient vocabulary learning in a genre. Text coverage has been used for evaluating these groups of words (Coxhead, 2000; Hyland & Tse, 2007).

5 However, text coverage is not appropriate for comparing the efficiency of word groups when the groups contain different numbers of words. How can we compare the efficiency of a group of domain-specific words with that of other words?
e.g. 1 How can we compare the efficiency of learning the AWL (Coxhead, 2000) with that of learning the UWL (Xue & Nation, 1984)?
e.g. 2 How can we compare the efficiency of learning technical term lists in different genres?
e.g. 3 How can we compare lists at different frequency levels within a genre, e.g. the sublists of the AWL? How many times more efficient is learning sublist 1 than learning sublist 2 for gaining text coverage in different genres?
e.g. 4 To gain higher text coverage, at which stage should learners move from learning general words to learning domain-specific words?

6 For example, the table below (Hyland & Tse, 2007) does not show the difference in efficiency in gaining text coverage, because the AWL and the GSL (General Service List) contain different numbers of words.

7 Research Questions 1. What index is appropriate for comparing the efficiency of word groups in gaining text coverage when the groups contain different numbers of words? 2. Are there any advantages to comparing the efficiency of word groups in gaining text coverage, other than deciding the most efficient order in which to learn words?

8 Goals 1. To propose an index: Text Covering Efficiency (TCE). 2. To show the validity and usefulness of TCE for (a) deciding the most efficient order in which to learn words and (b) analyzing the lexical features of text genres, by applying TCE to several groups of Japanese domain-specific words and other types of grouped words.

9 Problem: The numbers of words differ between the groups to be compared.
Solution: Standardization.
1. Divide the text coverage (tokens) of a group of words by the number of words in the group.
2. Divide the quotient by the total number of tokens in the target text (domain), to adjust for differences in text size and make figures from differently sized texts comparable.
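Written out, the two division steps above amount to the following expression. The symbols are labels introduced here for illustration, not notation taken from the slides: C_G is the number of tokens in the target text covered by the word group G, N_G is the number of words in G, and T is the total number of tokens in the target text.

```latex
\[
\text{standardized coverage per word}
  = \frac{C_G / N_G}{T}
  = \frac{C_G}{N_G \, T}
\]
```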

10 For the user's convenience, the figure is multiplied by 1,000,000. The result is the expected number of tokens of one word from the group in a one-million-token text in the target domain, so it is comparable with a standardized frequency per million. In other words, TCE is the expected standardized frequency of a word in the group. Text Covering Efficiency (TCE) = the mean text coverage, per one million tokens of the target text, by a word from the grouped words.
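As a minimal sketch of the definition above, the calculation can be expressed in a few lines of Python. The function name, parameter names, and all the numbers below are hypothetical and chosen only for illustration; they are not figures from the study.

```python
def text_covering_efficiency(group_tokens, group_size, text_tokens):
    """Return TCE: the expected number of tokens per million contributed
    by one word from the group.

    group_tokens: tokens in the target text covered by the word group
    group_size:   number of words in the group (the whole list)
    text_tokens:  total number of tokens in the target text
    """
    if group_size <= 0 or text_tokens <= 0:
        raise ValueError("group_size and text_tokens must be positive")
    # (group_tokens / group_size / text_tokens) * 1,000,000,
    # rearranged so that integer inputs are divided only once.
    return group_tokens * 1_000_000 / (group_size * text_tokens)


# Hypothetical example: two lists cover the same 10% of a
# one-million-token text, but list B has four times as many words.
tce_list_a = text_covering_efficiency(100_000, 500, 1_000_000)
tce_list_b = text_covering_efficiency(100_000, 2_000, 1_000_000)
print(tce_list_a, tce_list_b)
```

Raw text coverage is identical for the two hypothetical lists, while the per-word figure differs by a factor of four, which is the kind of difference the index is meant to expose.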

12 How can we validate an index? By applying the index to actual data to check whether: 1. the results do not conflict with the findings of previous studies; 2. the results show something that would not be clearly shown without the index. TCE was applied to several groups of Japanese words in different text genres.

13 (Japanese) Common Academic Words (CAW) (Matsushita, 2011)
(Japanese) Limited-Academic-Domain Words (LAD)
(Japanese) Literary Words (LW) (Matsushita, 2012)
These word lists can be downloaded from “Matsushita Laboratory for Language Learning”: http://www17408ui.sakura.ne.jp/tatsum/English_top_Tatsu.html

14 Target Corpora: Technical texts in four genres: Humanities, Social sciences, Technological natural sciences, and Biological natural sciences.
Reference Corpus: Balanced Contemporary Corpus of Written Japanese (BCCWJ), 2009 monitor version, excluding the target corpora part.
Index: Log-likelihood Ratio (LLR)
Criteria for extraction: 4-domain words and 3-domain words: CAW; 2-domain words and 1-domain words: LAD
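The slide names the index (LLR) and the domain-count criteria but not the exact formula or cut-off. The sketch below therefore assumes the common two-corpus log-likelihood statistic used in keyword analysis. The threshold value and the function names are hypothetical and may differ from those used in the study.

```python
import math


def log_likelihood_ratio(freq_target, size_target, freq_ref, size_ref):
    """Two-corpus log-likelihood for one word (assumed variant).

    freq_*: occurrences of the word in the target / reference corpus
    size_*: total tokens in the target / reference corpus
    """
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    llr = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            llr += observed * math.log(observed / expected)
    return 2.0 * llr


def count_key_domains(llr_by_domain, threshold=3.84):
    """Count the domains in which the word's LLR exceeds a cut-off.
    The default threshold is a hypothetical choice, not the study's."""
    return sum(1 for value in llr_by_domain.values() if value > threshold)


def classify_by_domain_count(n_domains):
    """Apply the slide's criteria: 3-4 domains -> CAW, 1-2 domains -> LAD."""
    if n_domains >= 3:
        return "CAW"
    if n_domains >= 1:
        return "LAD"
    return None


# Hypothetical LLR values for one word in the four target genres.
example = {"Humanities": 12.0, "Social sciences": 30.5,
           "Technological natural sciences": 8.1,
           "Biological natural sciences": 1.2}
print(classify_by_domain_count(count_key_domains(example)))
```

This sketch does not check the direction of the difference. A fuller version would also require the word to be relatively more frequent in the target corpus than in the reference corpus.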

15 JS-Bn: Journal articles on biological natural sciences. 0.72 million tokens.
MTT-Bn: Technical texts in biological natural sciences. 0.01 million tokens.
JS-Tn: Journal articles on technological natural sciences. 2.71 million tokens.
MTT-Tn: Technical texts in technological natural sciences. 0.07 million tokens.
MTT-Ss: Technical texts in social sciences. 0.05 million tokens.
TB: Texts in social sciences for intermediate and advanced learners of Japanese. 0.19 million tokens.
TIS: Texts in a textbook in international studies. Mainly social science texts. 0.04 million tokens.
UYN: Newspaper texts. 5.68 million tokens.
BSB: Texts from best-seller books. Mainly composed of literary works. 2.10 million tokens.
UPC: Literary texts. 2.30 million tokens.
MC: Conversation texts. 1.13 million tokens.

25 The results show that TCE clearly indicates the efficiency of word groups in gaining text coverage, and thus it is useful for deciding a more efficient order for learning and teaching words. These findings do not seem to conflict with previous studies. The lexical features of texts in different genres can also be examined by checking the TCE figures. For example, Japanese newspaper texts have lexical features similar to those of academic texts in the social sciences. The index also reveals things that cannot be seen without it. For example, such an analysis allows you to say things like, “Learning the intermediate Japanese Common Academic Words is 6.2 times more efficient in covering Japanese social science texts than learning other words at the same level, and 8.3 times more efficient than learning the advanced common academic words”.
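Statements of the form “X times more efficient” are ratios of two TCE figures, and a learning order follows from sorting groups by TCE. A minimal sketch is given below. The group labels and TCE values are invented for illustration and are not the figures behind the 6.2 and 8.3 reported above.

```python
def rank_by_tce(tce_by_group):
    """Return (group, TCE) pairs sorted from most to least efficient."""
    return sorted(tce_by_group.items(), key=lambda item: item[1], reverse=True)


def efficiency_ratio(tce_a, tce_b):
    """How many times more efficient group A is than group B."""
    if tce_b <= 0:
        raise ValueError("tce_b must be positive")
    return tce_a / tce_b


# Hypothetical TCE values for one target genre (not data from the study).
hypothetical_tce = {
    "other intermediate words": 20.0,
    "intermediate CAW": 120.0,
    "advanced CAW": 15.0,
}

for group, tce in rank_by_tce(hypothetical_tce):
    print(group, tce)

print(efficiency_ratio(hypothetical_tce["intermediate CAW"],
                       hypothetical_tce["other intermediate words"]))
```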

26 TCE: Text Covering Efficiency = the mean text coverage, per one million tokens of the target text, by a word from the grouped words. TCE enables us to compare many different types of grouped words in many different genres. Therefore, it makes it easier to decide which words should be learned first in order to read texts in a genre. TCE also enables us to examine the lexical features of texts in different genres.

27 Thank you.

28 Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–238.
Hyland, K., & Tse, P. (2007). Is there an “academic vocabulary”? TESOL Quarterly, 41(2), 235–253.
Matsushita, T. (松下達彦). (2011). 日本語の学術共通語彙(アカデミック・ワード)の抽出と妥当性の検証 [Extracting and validating the Japanese Academic Word List]. 2011年度日本語教育学会春季大会予稿集 [Proceedings of the Conference for Teaching Japanese as a Foreign Language, Spring 2011] (pp. 244–249).
Matsushita, T. (松下達彦). (2012). 日本語文芸語彙の抽出と検証―コーパスに基づくアプローチ― [Extracting and validating the Japanese Literary Word List: A corpus-based approach]. 第九回国際日本語教育・日本研究シンポジウム [The Ninth Symposium for Japanese Language Education and Japanese Studies], City University of Hong Kong, November 24, 2012.
Richards, B. J., & Malvern, D. D. (1997). Quantifying lexical diversity in the study of language development. Reading: University of Reading.
Xue, G., & Nation, I. S. P. (1984). A university word list. Language Learning and Communication, 3(2), 215–229.

29 In addition, TCE is a robust index that can also clarify the different lexical features of different genres. As has been argued for the type-token ratio (TTR) (Richards & Malvern, 1997), the relationship between the numbers of tokens and lexemes varies with text size. This is not a problem for TCE, however, because the formula uses the number of lexemes in the target group of words, not the number of lexemes occurring in the text. This is reasonable because learners generally do not know which words will occur in a particular text. For example, to evaluate the value of the intermediate literary words as a source of text coverage, it is reasonable to divide the tokens by the number of lexemes among the intermediate literary words, which a learner will learn before reading the text.
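The choice of denominator can be illustrated with a small sketch that computes TCE directly from a tokenized text. It assumes the tokens have already been reduced to lexemes. The function name, toy text, and word list are hypothetical.

```python
from collections import Counter


def tce_from_tokens(tokens, word_list):
    """TCE computed from a list of lexeme tokens and a word list.

    The divisor is the size of the whole word list, including listed
    words that never occur in this text.
    """
    words = set(word_list)
    if not words or not tokens:
        raise ValueError("tokens and word_list must be non-empty")
    counts = Counter(tokens)
    covered = sum(counts[word] for word in words)  # absent words count 0
    return covered * 1_000_000 / (len(words) * len(tokens))


# Toy data: "x" and "y" are on the list but do not occur in the text.
toy_text = ["a", "b", "a", "c", "d", "a", "b", "e"]
toy_list = ["a", "b", "x", "y"]

print(tce_from_tokens(toy_text, toy_list))

# For contrast only: dividing by the listed words that happen to occur
# (2 here) would make the figure depend on the size of the text.
occurring = [w for w in toy_list if w in set(toy_text)]
print(tce_from_tokens(toy_text, occurring))
```

Because the learner has to study all four listed words before reading, the first figure reflects the actual learning cost. The second figure ignores the two words that were learned but never encountered.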

