Tatsuhiko Matsushita (University of Tokyo) 2013 Victoria University of Wellington 1.

Presentation transcript:


1. Motives for the Study
2. Research Questions and Goals
3. Proposal of a New Index: Text Covering Efficiency (TCE)
4. Method of Validating TCE
5. Results and Discussion
6. Conclusion

How efficiently can we learn vocabulary? Which words should learners learn first, second, and so on?
Domain-specific words such as academic words (Coxhead, 2000) are often extracted to make vocabulary learning in a genre more efficient.
Text coverage has been used to evaluate these groups of words (Coxhead, 2000; Hyland & Tse, 2007).

However, text coverage is not appropriate for comparing efficiency between groups of words when the groups contain different numbers of words.
How can we compare the efficiency of a group of domain-specific words with that of other words?
e.g. 1 How can we compare the efficiency of learning the AWL (Coxhead, 2000) with that of learning the UWL (Xue & Nation, 1984)?
e.g. 2 How can we compare the efficiency of learning technical term lists in different genres?
e.g. 3 How can we compare lists at different frequency levels within a genre, e.g. the sublists of the AWL? How many times more efficient is learning sublist 1 than learning sublist 2 for gaining text coverage in different genres?
e.g. 4 To gain higher text coverage, at which stage should learners move from learning general words to learning domain-specific words?

For example, the table from Hyland & Tse (2007) does not show the difference in efficiency in gaining text coverage, because the AWL and the GSL contain different numbers of words.

Research Questions
1. What index is appropriate for comparing the efficiency of grouped words in gaining text coverage when the groups contain different numbers of words?
2. Are there any advantages to comparing the efficiency of grouped words in gaining text coverage, other than deciding the most efficient order in which to learn words?

Goals
1. To propose an index: Text Covering Efficiency (TCE)
2. To show the validity and usefulness of TCE for
a. deciding the most efficient order in which to learn words
b. analyzing lexical features of text genres
by applying TCE to some groups of Japanese domain-specific words and other types of grouped words.

Problem: the groups to be compared contain different numbers of words.
Solution: standardization
1. Divide the text coverage (tokens) of a group of words by the number of words in the group.
2. Divide the quotient by the total number of tokens in the target text (domain) to adjust for differences in text size and make figures from differently sized texts comparable.

For the user’s convenience, the figure is multiplied by 1,000,000.
The resulting figure is the expected number of tokens of a word from the group in a one-million-token text in the target domain. It is therefore comparable with a standardized frequency per million. In other words, TCE is the expected standardized frequency of a word in the group.
Text Covering Efficiency (TCE) = the mean text coverage per one million tokens of the target text by a word from the grouped words.
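The standardization described above, TCE = (tokens covered by the group / number of words in the group) / total tokens in the target text × 1,000,000, can be sketched in Python. This is a minimal illustration rather than the author's implementation: the function names `text_covering_efficiency` and `tce_from_tokens` and the toy token list are assumptions introduced here, and the text is assumed to be already tokenized.

```python
def text_covering_efficiency(group_tokens, group_size, text_tokens):
    """TCE = (group tokens / group size) / text tokens * 1,000,000."""
    if group_size <= 0 or text_tokens <= 0:
        raise ValueError("group_size and text_tokens must be positive")
    # Multiply first so that simple integer inputs divide cleanly.
    return group_tokens * 1_000_000 / (group_size * text_tokens)


def tce_from_tokens(tokens, word_group):
    """Compute TCE for a word group over an already tokenized text."""
    group = set(word_group)
    # Text coverage in tokens: running words that belong to the group.
    covered = sum(1 for token in tokens if token in group)
    # The divisor is the size of the list, not the number of listed
    # words that happen to occur in this text.
    return text_covering_efficiency(covered, len(group), len(tokens))


# Toy data (hypothetical, for illustration only).
tokens = ["a", "b", "a", "c", "d", "a", "b", "e"]
small_group = ["a", "b"]            # 2 words covering 5 of 8 tokens
large_group = ["c", "d", "e", "z"]  # 4 words covering 3 of 8 tokens

print(tce_from_tokens(tokens, small_group))  # 312500.0
print(tce_from_tokens(tokens, large_group))  # 93750.0
```

In this toy case a word from the small group is expected to cover about 3.3 times as many tokens as a word from the large group. A ratio of two TCE values is what supports a "times more efficient" comparison between lists of different sizes.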


How can we validate an index?
By applying the index to actual data to check whether:
1. the results do not conflict with the findings of previous studies
2. the results show something that would not be clearly shown without the index
TCE was applied to some grouped Japanese words in different text genres.

(Japanese) Common Academic Words (CAW) (Matsushita, 2011)
(Japanese) Limited-Academic-Domain Words (LAD)
(Japanese) Literary Words (LW) (Matsushita, 2012)
These word lists can be downloaded from “Matsushita Laboratory for Language Learning” p_Tatsu.html

Target Corpora: Technical texts in the four genres of Humanities, Social sciences, Technological natural sciences and Biological natural sciences
Reference Corpus: Balanced Corpus of Contemporary Written Japanese (BCCWJ), 2009 monitor version, excluding the part that overlaps with the target corpora
Index: Log-likelihood Ratio (LLR)
Criteria for extraction:
4-domain words and 3-domain words: CAW
2-domain words and 1-domain words: LAD
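The slide names the log-likelihood ratio as the extraction index but does not give the formula. The sketch below uses the two-term form that is common in corpus keyword analysis. Treating this as the form used in the study is an assumption, and the function name `log_likelihood_ratio` and its parameters are introduced here for illustration only.

```python
import math


def log_likelihood_ratio(freq_target, freq_reference, size_target, size_reference):
    """Two-term log-likelihood keyness score for one word.

    Assumed formulation; the slides do not specify which variant was used.
    """
    total_freq = freq_target + freq_reference
    total_size = size_target + size_reference
    # Expected frequencies if the word were equally common in both corpora.
    expected_target = size_target * total_freq / total_size
    expected_reference = size_reference * total_freq / total_size

    def term(observed, expected):
        # By convention 0 * ln(0) is treated as 0.
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    return 2.0 * (term(freq_target, expected_target)
                  + term(freq_reference, expected_reference))


# Same relative frequency in both corpora: no keyness.
print(log_likelihood_ratio(10, 10, 100, 100))  # 0.0
# Word occurs only in the target corpus: positive score (about 13.86).
print(log_likelihood_ratio(10, 0, 100, 100))
```

The score itself does not say which corpus overuses the word, so an extraction procedure built on it would also need to check that the relative frequency is higher in the target corpus.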

JS-Bn: Journal articles on biological natural sciences million tokens.
MTT-Bn: Technical texts in biological natural sciences million tokens.
JS-Tn: Journal articles on technological natural sciences million tokens.
MTT-Tn: Technical texts in technological natural sciences million tokens.
MTT-Ss: Technical texts in social sciences million tokens.
TB: Texts in social sciences for intermediate and advanced learners of Japanese million tokens.
TIS: Texts in a textbook in international studies. Mainly social science texts million thousand tokens.
UYN: Newspaper texts of 5.68 million tokens.
BSB: Texts from best seller books. Mainly composed of literary works million tokens.
UPC: Literary texts million tokens.
MC: Conversation texts million tokens.


The results show that TCE clearly indicates efficiency in gaining text coverage, and thus it is useful for deciding a more efficient learning/teaching order of words. These findings do not seem to conflict with previous studies.
Lexical features of texts in different genres can also be examined by checking the TCE figures. E.g. Japanese newspaper texts have lexical features similar to those of academic texts in social sciences.
TCE reveals things that cannot be seen without the index. For example, such an analysis allows statements like, “Learning the intermediate Japanese Common Academic Words is 6.2 times more efficient in covering Japanese social science texts than learning other words at the same level, and 8.3 times more efficient than learning the advanced common academic words”.

TCE: Text Covering Efficiency = the mean text coverage per one million tokens of the target text by a word from the grouped words
TCE enables us to compare many different types of grouped words in many different genres. Therefore, it makes it easier to decide which words should be learned first in order to read texts in a genre.
TCE enables us to examine the lexical features of texts in different genres.

Thank you.

Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–238.
Hyland, K., & Tse, P. (2007). Is there an “Academic Vocabulary”? TESOL Quarterly, 41(2), 235–253.
Matsushita, T. (松下達彦). (2011). 日本語の学術共通語彙(アカデミック・ワード)の抽出と妥当性の検証 [Extracting and validating the Japanese Academic Word List]. 2011年度日本語教育学会春季大会予稿集 [Proceedings of the Conference for Teaching Japanese as a Foreign Language, Spring 2011] (pp. 244–249).
Matsushita, T. (松下達彦). (2012). 日本語文芸語彙の抽出と検証 ― コーパスに基づくアプローチ ― [Extracting and validating the Japanese Literary Word List: A corpus-based approach]. 第九回国際日本語教育・日本研究シンポジウム (The Ninth Symposium for Japanese Language Education and Japanese Studies), City University of Hong Kong, November 24, 2012.
Richards, B. J., & Malvern, D. D. (1997). Quantifying lexical diversity in the study of language development. Reading: University of Reading.
Xue, G., & Nation, I. S. P. (1984). A university word list. Language Learning and Communication, 3(2), 215–

In addition, TCE is a robust index that can also clarify different lexical features in different genres. As has been argued for the type-token ratio (TTR) (Richards & Malvern, 1997), the relationship between the number of tokens and the number of lexemes varies with text size. This is not a problem for TCE, however, because the formula uses the number of lexemes in the target group of words, not the number of lexemes occurring in the text. This is reasonable because learners generally do not know which words will occur in a particular text. For example, to evaluate the intermediate literary words as a source of text coverage, it is reasonable to divide the tokens by the number of lexemes among the intermediate literary words, which a learner will learn before reading the text.
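The contrast with TTR can be illustrated with a small Python sketch. The token list below is a hypothetical toy text built so that the group words are spread evenly through it; real texts will fluctuate, and the helper names `type_token_ratio` and `tce` are assumptions introduced here. The point is only that the TTR numerator depends on which lexemes occur in the sample, while the TCE divisor is the fixed size of the word list.

```python
def type_token_ratio(tokens):
    """Distinct word forms divided by running words in the sample."""
    return len(set(tokens)) / len(tokens)


def tce(tokens, word_group):
    """TCE with the word-list size, not the observed lexemes, as divisor."""
    group = set(word_group)
    covered = sum(1 for token in tokens if token in group)
    return covered * 1_000_000 / (len(group) * len(tokens))


# Hypothetical toy text and word group, for illustration only.
tokens = ["the", "cat", "sat", "the", "dog", "sat",
          "the", "cat", "ran", "the", "dog", "ran"]
group = ["cat", "dog"]
first_half = tokens[:6]

# TTR falls as the sample grows, because new types appear more slowly
# than new tokens.
print(type_token_ratio(first_half), type_token_ratio(tokens))

# TCE is the same for both sample sizes here, because coverage scales
# with text length and the divisor (2 listed words) never changes.
print(tce(first_half, group), tce(tokens, group))
```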