A Study of Lexical Chunk Competence in Chinese Learners' English Writing (Lexical Chunks in Chinese Learners' Writing, WECCL). Xu Jiajin, National Research Centre for Foreign Language Education, Beijing Foreign Studies University.

Contents: chunks; how to extract chunks in WordSmith 4 (WS4); comparisons of chunks; interpretations; other areas of collocation.

Chunk (ujz/chunks.ppt): also termed lexical bundle, n-gram, multi-word unit (expression), formulaic sequence, or prefab. WordSmith calls it a cluster.

Why are lexical chunks important? They break language into units on a probabilistic basis, they are recurrent, and they are psychologically real.
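The slides define a chunk (a WordSmith "cluster") as a recurrent word sequence. The Python sketch that follows shows that idea under a few stated assumptions. Tokens are lower-cased ASCII letters with internal apostrophes. A hypothetical minimum frequency of 2 counts as "recurrent". The function names are illustrative and do not describe how WordSmith is implemented. The sketch counts every contiguous n-word sequence and keeps only the sequences that recur.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and return word tokens (internal apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def clusters(tokens, n, min_freq=2):
    """Return contiguous n-word sequences occurring at least min_freq times."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {" ".join(gram): freq for gram, freq in counts.items() if freq >= min_freq}

if __name__ == "__main__":
    toks = tokenize("I think that it is important. I think that it is true.")
    print(clusters(toks, 3))
    # {'i think that': 2, 'think that it': 2, 'that it is': 2}
```

Punctuation is discarded, so clusters in this sketch can span sentence boundaries. A real tool may offer a setting that prevents this.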

Why are lexical chunks important? Chunks are form-function composites. Can we build the entire edifice of language solely on lexical chunks? If not, what are the bricks and mortar of language?

Lexical grammar, lexico-grammar, pattern grammar, construction grammar, collostruction, etc.

Research questions: Are there differences in chunk use between Chinese learners of English and native speakers (NS)? What might be the underlying reasons?

Lexical chunk extraction in WS4 (automatic extraction of chunks in WS4), Step 01: Clusters in [Concord], focusing on individual words.
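In Step 01, Concord builds clusters from the concordance lines of a single search word. The hypothetical node_clusters function below is a rough approximation of this. It is our own assumption and does not reproduce Concord's algorithm or its span settings. It counts every n-word window in the text that contains the node word. If the node occurs twice in the same window, that window is still counted only once.

```python
from collections import Counter

def node_clusters(tokens, node, n=3, min_freq=2):
    """Count n-word clusters that include at least one occurrence of `node`.

    Each window start is counted once, even if the node occurs twice in it.
    """
    starts = set()
    for i, tok in enumerate(tokens):
        if tok == node:
            # window starts s such that [s, s + n) covers position i and fits in the text
            lo = max(0, i - n + 1)
            hi = min(i, len(tokens) - n)
            starts.update(range(lo, hi + 1))
    counts = Counter(" ".join(tokens[s:s + n]) for s in starts)
    return {c: f for c, f in counts.items() if f >= min_freq}
```

Focusing on a single word keeps the output small and easy to interpret. However, it only finds chunks that contain the chosen word. Clusters across the whole corpus require the WordList route in the next step.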

Lexical chunk extraction in WS4, Step 02: Clusters in [WordList]. The corpus data must be indexed before clusters can be computed.

Lexical chunk extraction in WS4, Step 03: How to index a corpus
1. [Settings-Index]: first assign a name to the text(s) to be indexed.
2. [Make/Add to Index]: then index the selected text(s). This produces a pair of files, *.tokens and *.types.
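WS4 saves the index as a *.tokens/*.types pair. The Python sketch below only illustrates the general idea, based on our own assumption suggested by the file names: the index splits the text into a list of distinct word types and a running sequence of type IDs. It does not reproduce WordSmith's file format. build_index and clusters_from_index are hypothetical names. Once the text is encoded as integer IDs, clusters of any length can be counted directly from the index.

```python
from collections import Counter

def build_index(tokens):
    """Split a token list into a type list and a stream of type IDs.

    types:     distinct word forms, in order of first occurrence
    token_ids: the running text re-encoded as integer type IDs
    positions: word form -> list of token positions where it occurs
    """
    type_id = {}
    types, token_ids, positions = [], [], {}
    for pos, tok in enumerate(tokens):
        if tok not in type_id:
            type_id[tok] = len(types)
            types.append(tok)
        token_ids.append(type_id[tok])
        positions.setdefault(tok, []).append(pos)
    return types, token_ids, positions

def clusters_from_index(types, token_ids, n):
    """Count n-word clusters on the ID stream, then decode them to words."""
    counts = Counter(tuple(token_ids[i:i + n]) for i in range(len(token_ids) - n + 1))
    return {" ".join(types[t] for t in gram): freq for gram, freq in counts.items()}
```

For example, build_index("the cat saw the dog".split()) gives the types ['the', 'cat', 'saw', 'dog'], the ID stream [0, 1, 2, 0, 3], and positions['the'] == [0, 3]. Building the index once lets every cluster size reuse it.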

Step 04: Extracting clusters in WordList. Use [WordList-Open] to open the *.tokens file, run [Compute-Clusters], then use [File-Save]/[File-Save As].

Word cluster lists (multi-word lists) of WECCL and LOCNESS:
weccl index: 2-word clusters … 6-word clusters
locness index: 2-word clusters … 6-word clusters
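The lists on this slide come from computing clusters of each size from 2 to 6 for each corpus. The Python sketch below shows that batch step and assumes the corpora are already tokenized. write_cluster_lists and its file-naming pattern are hypothetical. min_freq is an illustrative cutoff, not WordSmith's default.

```python
from collections import Counter
from pathlib import Path

def ngram_counts(tokens, n):
    """Frequency of every contiguous n-word sequence."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def write_cluster_lists(corpora, out_dir, sizes=range(2, 7), min_freq=1):
    """Write one tab-separated cluster list per corpus and cluster size.

    corpora: dict mapping a corpus name (e.g. "weccl") to its token list.
    Returns the list of paths written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, tokens in corpora.items():
        for n in sizes:
            counts = ngram_counts(tokens, n)
            path = out / f"{name}_{n}-word_clusters.tsv"
            with path.open("w", encoding="utf-8") as fh:
                # most frequent first, ties broken alphabetically
                for cluster, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
                    if freq >= min_freq:
                        fh.write(f"{cluster}\t{freq}\n")
            written.append(path)
    return written
```

Sorting by descending frequency with an alphabetical tie-break makes the files stable across runs. This keeps the later comparison of the two corpora reproducible.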

Step 05: Generate a keyword list to test whether chunk overuse and underuse are statistically significant. Use [Keyword list] - [New] with a reference corpus and a chi-square or log-likelihood test. The keyness threshold is p ≤ .05.
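The following Python sketch shows the two keyness statistics for a single chunk. It uses the common textbook formulation, which may differ in detail from WordSmith's own implementation (for example in Yates correction or minimum-frequency filters). Let a and b be the chunk's frequencies in the study corpus and the reference corpus, and let c and d be the corpus sizes in tokens. The expected frequencies are E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), and the log-likelihood is G² = 2 Σ O ln(O/E). With 1 degree of freedom, a statistic of about 3.84 corresponds to p = .05. The function names are hypothetical.

```python
import math

def log_likelihood(a, b, c, d):
    """G2 for frequency a in a study corpus of c tokens vs. b in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:  # 0 * ln(0) is taken as 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

def chi_square(a, b, c, d):
    """Pearson chi-square (no Yates correction) on the 2x2 table [[a, b], [c - a, d - b]]."""
    if a + b == 0:
        return 0.0
    observed = [[a, b], [c - a, d - b]]
    total = c + d
    row = [a + b, total - a - b]
    col = [c, d]
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            x2 += (observed[i][j] - expected) ** 2 / expected
    return x2

def p_value_df1(stat):
    """Upper-tail p for a chi-square-distributed statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

def keyness(a, b, c, d, alpha=0.05):
    """Return (G2, p, verdict); verdict is 'overuse', 'underuse' or 'n.s.'."""
    g2 = log_likelihood(a, b, c, d)
    p = p_value_df1(g2)
    # direction is decided by relative (per-token) frequency
    direction = "overuse" if a / c > b / d else "underuse"
    return g2, p, (direction if p <= alpha else "n.s.")
```

A chunk that occurs 20 times in one 10,000-token corpus and 5 times in another gives G² ≈ 9.64 (p < .05), so it counts as overused. The direction is taken from relative frequency because G² and chi-square are both non-negative and do not indicate which corpus uses the chunk more.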

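Similar study: Guan Bo (管博) & Zheng Shutang (郑树堂). 2005. A study of small words in Chinese university students' spoken English. Foreign Language Teaching and Research, (6).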

Other tools for extracting lexical chunks:
kfNgram (William Fletcher) √
Collocation Extraction
C-ngram
N-gram PhraseExtract (Tom Cobb's page)
PIE: Phrases in English (Fletcher)

A quick summary: chunks defined; how to extract chunks from a corpus with WS4; how to compare chunks using keyword lists; interpretation.

A bibliography of chunk research. Aspects of collocation study.

Thank you!

Lexical_JJ Chunks_NN2 in_II Chinese_JJ Learners’_NN2 Writing_NN1 Xu Jiajin