751 - 1.

Course admin
Canvas info; class rep; email and announcements; readings; resources (computer and USB stick).

Corpus
A corpus is a sample of naturally occurring language, systematically collected for linguistic analysis. Thus the web is not a corpus, and neither is a collection of examples of conditional sentences. Hunston provides some examples of different types of corpus. (We will work with a variety of corpora.)

Corpus
It is useful to distinguish a balanced corpus (e.g. the Brown corpus) from a corpus consisting of a single genre. A balanced corpus facilitates comparisons because the amount of text in each category and sub-category (written versus spoken, fiction versus nonfiction) is controlled. However, the texts are samples rather than complete texts, and genre distinctions can get washed out. (Think about the purpose for which you are using a corpus.)
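When categories or sub-corpora are not the same size, raw counts cannot be compared directly. A common remedy is to normalise each count to a frequency per million words. The short Python sketch below illustrates this. The category names and counts are invented, and the function name `per_million` is hypothetical.

```python
def per_million(freq: int, corpus_size: int) -> float:
    """Normalise a raw frequency to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return freq * 1_000_000 / corpus_size


# Hypothetical counts of one word in two sub-corpora of different sizes:
# (raw frequency, sub-corpus size in tokens)
subcorpora = {
    "fiction": (150, 300_000),
    "nonfiction": (400, 1_000_000),
}

for name, (freq, size) in subcorpora.items():
    print(f"{name}: {per_million(freq, size):.1f} per million")
```

The raw count is higher in the nonfiction sub-corpus (400 vs 150). Relative to sub-corpus size, however, the word is more frequent in fiction (500 vs 400 per million).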

Frequency lists
We start with wordlists, since they are well known in language teaching. Topics: the structure of frequency lists, creating frequency lists, and keywords. This seems straightforward, but we will examine some issues related to frequency lists in language teaching.

Wordlists
Wordlists are familiar: a vocabulary list for a reading, or a wordlist for a course or a textbook. In these cases the words are taken from the teaching materials. The wordlist can indicate what the student might be expected to know after taking a course, or which words the student will encounter in a particular reading.

Wordlists
For our purposes, we are interested in wordlists derived from large bodies of text:
What words are in a corpus? This indicates the nature of the corpus (and of language use).
What words occur in a language or genre, and with what frequency? (Alternatively, what words are distinctive for a particular genre, such as Business English?)
What words does a language learner need to know (for academic study, etc.)?

A frequency wordlist for a short text (handout)
What type of information can be obtained? What can you say about the form of the frequency list? Consider the word distribution and the frequency distribution.

Larger frequency list

Structure of a word frequency list
The structure is the same for a single text as for a large corpus. Function words are the most frequent; "the" typically ranks first in written English texts. Content words appear lower in the list. Many words occur only once (hapax legomena). Zipf's law: the frequency of a word is approximately inversely proportional to its rank.
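The short Python sketch below builds a frequency list from a text, extracts the hapax legomena, and tabulates rank × frequency as a rough Zipf check. It is not the software used in the lab. The tokenisation rule is an assumption: a word is a run of ASCII letters, optionally joined by an apostrophe or hyphen. All function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and return its word tokens.

    Assumption: a word is a run of ASCII letters, optionally joined by
    an apostrophe or hyphen, so "let's" and "mid-day" count as one word.
    """
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def frequency_list(text: str) -> list[tuple[str, int]]:
    """Return (word, frequency) pairs, most frequent first.

    Ties keep the order in which the words first appear in the text.
    """
    return Counter(tokenize(text)).most_common()


def hapax_legomena(freq_list: list[tuple[str, int]]) -> list[str]:
    """Words that occur exactly once."""
    return [word for word, freq in freq_list if freq == 1]


def zipf_table(freq_list, top=10):
    """(rank, word, frequency, rank * frequency) for the top words.

    If Zipf's law holds approximately, rank * frequency stays roughly
    constant; on a very short text the pattern will be rough at best.
    """
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(freq_list[:top], start=1)
    ]


text = "The cat sat on the mat. The dog sat on the log."
fl = frequency_list(text)
print(fl)                  # ('the', 4) comes first
print(hapax_legomena(fl))  # ['cat', 'mat', 'dog', 'log']
```

Even in this toy text, the function word "the" tops the list and most words are hapaxes. On a real corpus, the rank × frequency column is where the Zipf pattern becomes visible.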

Frequency list
Types and tokens: "the the the the the" contains one type (the) and five tokens. The type-token ratio of a text is the number of types divided by the number of tokens.
What is a word? Consider let's, mid-day, he's.
Lemma: analyse, analyses, analysed (verb forms).
Lemma: analysis, analyses (noun forms).
Word family: analyse, analyses, analysed, analysis, analytical, analytically, …
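To make the counting units concrete, the Python sketch below computes a type-token ratio and counts word families. The word-family table is hypothetical and contains only the "analyse" family. Forms missing from the table are treated as their own family. The simple letters-only tokeniser is also an assumption.

```python
import re


def tokenize(text: str) -> list[str]:
    # Simple tokeniser assumption: letters, with internal ' or -
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by running words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical word-family table for illustration only: every member
# maps to its headword. Real word-family lists are far larger.
WORD_FAMILY = {
    form: "analyse"
    for form in ["analyse", "analyses", "analysed", "analysis",
                 "analytical", "analytically"]
}


def count_families(text: str) -> int:
    """Distinct word families; forms not in the table count as their own family."""
    return len({WORD_FAMILY.get(tok, tok) for tok in tokenize(text)})


print(type_token_ratio("the the the the the"))  # 1 type / 5 tokens = 0.2

sample = "We analyse the data. She analysed it; the analysis and analyses were analytical."
print(len(tokenize(sample)), len(set(tokenize(sample))), count_families(sample))
```

The sample sentence has 13 tokens, 12 types and 8 word families. The count depends on which unit you choose. Type-token ratio also falls as texts get longer, so compare TTRs only between texts of similar length.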

Wordlists for language teaching
Sampling the language as a whole is difficult, so we create as large a corpus as possible. We can then obtain frequency bands: the top 1,000 words, the next 1,000, and so on.
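Frequency bands follow directly from a ranked frequency list. The Python sketch below assigns each word to a band by its rank. The counts are invented, and the toy band size of 2 stands in for 1,000. The function name `frequency_bands` is hypothetical.

```python
from collections import Counter


def frequency_bands(counts: Counter, band_size: int = 1000) -> dict[str, int]:
    """Assign each word to a frequency band by its rank in the corpus.

    Band 1 = ranks 1..band_size, band 2 = the next band_size words, etc.
    Ties are ordered as Counter.most_common() returns them (first seen).
    """
    ranked = [word for word, _ in counts.most_common()]
    return {word: i // band_size + 1 for i, word in enumerate(ranked)}


# Toy example with a band size of 2 instead of 1000 (illustration only).
counts = Counter({"the": 60, "of": 30, "and": 25, "to": 20, "corpus": 2})
bands = frequency_bands(counts, band_size=2)
print(bands)  # {'the': 1, 'of': 1, 'and': 2, 'to': 2, 'corpus': 3}
```

With a large reference corpus and `band_size=1000`, this gives the "first 1000", "second 1000" and later bands.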

Wordlists – general and specialised
Wordlists have been around since before the invention of computers. General wordlists are used for curriculum development, textbook writing, etc.

Wordlists – general
Thorndike (1921) created a frequency list from a corpus of 4.5 million words.
West's (1953) General Service List (GSL).
Coxhead's Academic Word List (AWL).
Mark Davies' academic vocabulary lists.

Academic Word List


Academic Word List
A receptive list, organised into word families (base words plus their morphological derivations).
The list excludes words in West's General Service List, even if they occur frequently in academic texts.
Do we need subject- or genre-specific wordlists? (Hyland)

Wordlists
If we can produce a wordlist for English (etc.), then:
we have some idea of what words to teach (the more frequent first);
we can estimate the difficulty of texts;
we can determine what is special about academic English, business English, etc.

Wordlists
An important threshold is 2,000 words (Laufer 1994; Nation 2001). Learners who have control over 6,000 words should be able to understand around 90% of a typical text. McCarthy (2002) estimates that reaching higher levels of understanding requires aiming for a 10,000-word receptive vocabulary. Corpus studies can help to identify different frequency bands: the top 2,000-word band, etc.

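Frequency and coverage (percentage of running words in each text type covered by each level)

| Level | Conversation | Fiction | Newspapers | Academic text |
|---|---|---|---|---|
| 1st 1000 | 84.3% | 82.3% | 75.6% | 73.5% |
| 2nd 1000 | 6% | 5.1% | 4.7% | 4.6% |
| Academic | 1.9% | 1.7% | 3.9% | 8.5% |
| Other | 7.8% | 10.9% | 15.7% | 13.3% |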
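Coverage is the share of running words (tokens) in a text that belong to a given word list. The Python sketch below computes it for a supplied list. The "known" list and the function name are hypothetical. Published figures of this kind are usually computed over word families or lemmas, whereas this sketch matches exact word forms only, so it will not reproduce such figures as-is.

```python
import re


def tokenize(text: str) -> list[str]:
    # Simple tokeniser assumption: letters, with internal ' or -
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def coverage(text: str, known_words: set[str]) -> float:
    """Percentage of running words (tokens) in `text` found in `known_words`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    covered = sum(1 for tok in tokens if tok in known_words)
    return 100 * covered / len(tokens)


# Hypothetical "known" list; in practice this might be the first 1000
# words of a frequency list, expanded to include their inflected forms.
known = {"the", "on", "sat"}
print(round(coverage("The cat sat on the mat", known), 1))  # 66.7
```

Four of the six tokens are covered. A learner who knows only these three words would therefore meet an unknown word roughly every third word in this sentence.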

Vocab Profile
Applying the language frequency bands to a particular text produces a lexical (vocabulary) profile of that text. See Tom Cobb's Vocab Profile site: http://www.lextutor.ca/vp/eng/

Vocab Profile

Lextutor colour coding: blue – first 1000, green – second 1000, yellow – AWL
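The Python sketch below builds a vocab profile in the spirit of tools like Vocab Profile. It is not how Lextutor itself is implemented. Each token is assigned to the first band list that contains it, and the remainder is reported as off-list. The band lists here are tiny made-up stand-ins for the real first-1000, second-1000 and AWL lists.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simple tokeniser assumption: letters, with internal ' or -
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def vocab_profile(text: str, band_lists: dict[str, set[str]]) -> dict[str, float]:
    """Percentage of tokens in each band, plus an 'Off-list' remainder.

    Bands are checked in the order given; a token goes into the first
    band whose word set contains it.
    """
    tokens = tokenize(text)
    counts = Counter()
    for tok in tokens:
        for name, words in band_lists.items():
            if tok in words:
                counts[name] += 1
                break
        else:  # no band matched this token
            counts["Off-list"] += 1
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {name: 100 * counts[name] / total for name in [*band_lists, "Off-list"]}


# Tiny made-up lists standing in for the real K1, K2 and AWL lists.
bands = {
    "K1": {"we", "the"},
    "K2": {"survey"},
    "AWL": {"analyse", "data"},
}
profile = vocab_profile("We analyse the survey data carefully", bands)
for name, pct in profile.items():
    print(f"{name}: {pct:.1f}%")
```

Here two tokens fall in K1 and two in the AWL list. "Survey" is K2 and "carefully" is off-list, which gives 33.3%, 16.7%, 33.3% and 16.7%.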

Keyword list
Which words are special (unusually frequent) in a particular corpus? To find out, compare its frequency list with that of a reference corpus.
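One common way to score keyness is the log-likelihood (G²) statistic. For each word, it compares the observed frequencies in the study corpus and the reference corpus with the frequencies expected if the word were equally common in both. The Python sketch below uses that statistic. It is one of several possible keyness measures and not necessarily the one the lab software uses. The counts and function names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word seen a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study: Counter, reference: Counter, top: int = 20):
    """Words relatively more frequent in `study`, ranked by G2."""
    c, d = sum(study.values()), sum(reference.values())
    scored = [
        (word, log_likelihood(a, reference.get(word, 0), c, d))
        for word, a in study.items()
        if a / c > reference.get(word, 0) / d  # keep positive keywords only
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


# Invented counts: a tiny "business" corpus against a general reference corpus.
study = Counter({"the": 50, "profit": 10, "market": 5})
reference = Counter({"the": 500, "profit": 1, "market": 50, "cat": 449})
for word, score in keywords(study, reference):
    print(f"{word}: {score:.2f}")
```

"Profit" comes out on top because it is far more frequent in the study corpus than the reference corpus predicts. "The" is only slightly over-represented, so it scores much lower.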

Specialised Word List
Create a wordlist from a corpus (using a concordancer or other utilities). You may need to create your own corpus, e.g. with BootCaT. In the lab, we will create a business keyword list.

Some general thoughts
We will be using some simple software in the computer lab. Try not to get too involved in the details of using the software, at least not to the exclusion of broader, conceptual issues. It is important to know the corpus you are using. What does it consist of? Does it have special features, such as being all lower case? Does it contain annotations?