
751 - 1.


1

2 Course admin
Canvas info
Class rep
Email/Announcements
Readings
Resources: computer and USB stick

3 Corpus
A sample of naturally occurring language, systematically collected for linguistic analysis.
Thus the web is not a corpus, and neither is a collection of examples of conditional sentences.
Hunston provides some examples of different types of corpus. (We will work with a variety of corpora.)

4 Corpus
It is useful to distinguish a balanced corpus (e.g., the Brown corpus) from a corpus consisting of a single genre.
A balanced corpus facilitates comparisons because there are the same number of words in each category and sub-category: written versus spoken, fiction versus nonfiction.
However, texts are not complete and genre distinctions can get washed out. (Think about the purpose for using a corpus.)

5 Frequency lists
Start with wordlists, since they are well known in language teaching.
Structure of frequency lists
Creating frequency lists
Keywords
Seemingly straightforward: examine some issues related to frequency lists in language teaching.

6 Wordlists
Wordlists are familiar: a vocab list for a reading, or a wordlist for a course or a textbook.
In these cases the words are taken from the teaching materials.
The wordlist can indicate what the student might be expected to know after taking a course, or it can indicate which words the student will encounter in a particular reading.

7 Wordlists
For our purposes, we are interested in wordlists associated with large texts.
What words are in a corpus? This indicates the nature of the corpus (and language use).
What words occur in a language/genre, and with what frequency? (Alternatively, what words are distinctive for a particular genre such as Business English?)
What words does a language learner need to know (for academic study etc.)?

8 A frequency wordlist for a short text
Handout
What type of info can be obtained?
What can you say about the form of the frequency list?
Word distribution
Frequency distribution

9 Larger frequency list

10 Structure of a word frequency list
The structure is the same for a single text and for a large corpus.
Function words are most frequent: "the" typically ranks first in written English texts.
Content words appear lower in the list.
Many words occur only once (hapax legomena).
Zipf's law: the frequency of a word is inversely proportional to its rank.
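The structure described on this slide can be explored with a few lines of Python. This is a minimal sketch, not the tool used in the course: the function name `frequency_list`, the sample sentence, and the tokenisation rule (a word is a run of letters, optionally joined by an apostrophe or hyphen) are all assumptions made for illustration. If Zipf's law holds, rank multiplied by frequency should stay roughly constant, although a text this short is far too small to show the pattern reliably.

```python
import re
from collections import Counter

def frequency_list(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    # Assumed tokenisation: lower-case runs of letters, allowing an
    # internal apostrophe or hyphen (so "mid-day" stays one word).
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    return Counter(tokens).most_common()

# Hypothetical sample text, for illustration only.
sample = "the cat sat on the mat and the dog sat on the rug"
freq = frequency_list(sample)

# Hapax legomena: words that occur exactly once.
hapaxes = [word for word, count in freq if count == 1]

# Print rank, word, frequency and rank * frequency (the Zipf check).
for rank, (word, count) in enumerate(freq, start=1):
    print(rank, word, count, rank * count)
print("hapax legomena:", hapaxes)
```

Running the same function on a full corpus file instead of `sample` gives the kind of list shown on the previous slide, with function words at the top and a long tail of words that occur once.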

11 Frequency list
Types and tokens: one type (the), five tokens (the the the the the)
Type-token ratio for a text
What is a word? let’s, mid-day, he’s
Lemma (verb): analyse, analyses, analysed
Lemma (noun): analysis, analyses
Word family: analyse, analyses, analysed, analysis, analytical, analytically, …
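The type-token ratio can be worked out directly from a token list. The sketch below is illustrative only: the function name `type_token_ratio`, the sample sentence, and the tiny hand-made lemma table are assumptions, and real lemmatisation needs part-of-speech information (the form "analyses" belongs to both the verb and the noun lemma, as the slide shows).

```python
def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by running words (tokens)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Hypothetical sample: 11 tokens, 5 types (the, cat, saw, dog, and).
tokens = "the cat saw the dog and the dog saw the cat".split()
ttr = type_token_ratio(tokens)
print(len(tokens), len(set(tokens)), round(ttr, 3))

# Counting by lemma instead of by word form lowers the number of types.
# This mapping is a hand-made assumption covering the verb forms only.
lemma_of = {"analyses": "analyse", "analysed": "analyse"}
forms = ["analyse", "analyses", "analysed", "analyse"]
lemmas = [lemma_of.get(form, form) for form in forms]
print(len(set(forms)), "word-form types,", len(set(lemmas)), "lemma type")
```

The answer to "what is a word?" changes the result: splitting let’s into two tokens, or treating mid-day as two words, alters both the token count and the type count.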

12 Wordlists for language teaching
Sampling the language as a whole is difficult.
We can create as large a corpus as possible.
We can then obtain frequency bands: the top 1000 words etc.

13 Wordlists – general and specialised
Wordlists have been around since before the invention of computers. General wordlists are used for curriculum development, textbook writing etc.

14 Wordlists: general
Thorndike (1921) created a frequency list from a corpus of 4.5 million words
West's (1953) General Service List
Coxhead's Academic Word List (AWL)
Mark Davies’ Academic Vocabulary List

15

16 Academic Word List

17 Academic Word List

18 Academic Word List
A receptive list (based on morphological derivations).
The list excludes words found in non-academic texts (even if they occur in academic texts).
Do we need subject- or genre-specific wordlists? (Hyland)

19 Wordlists
If we can produce a wordlist for English (etc.), then
we have some idea of what words to teach (the more frequent first)
we can estimate the difficulty of texts
we can determine what is special about academic English, business English etc.

20 Wordlists
An important threshold is 2000 words (Laufer 1994, Nation 2001).
Learners who have control over 6000 words should be able to understand around 90% of a typical text.
McCarthy (2002) estimates that to reach higher levels of understanding it is necessary to aim for a receptive vocabulary of 10,000 words.
Corpus studies can help to identify different frequency bands: the top 2000 word band, etc.

21 Frequency and coverage
Levels      Conversation   Fiction   Newspapers   Academic text
1st 1000    84.3%          82.3%     75.6%        73.5%
2nd 1000    6%             5.1%      4.7%         4.6%
Academic    1.9%           1.7%      3.9%         8.5%
Other       7.8%           10.9%     15.7%        13.3%

22 Vocab Profile
Applying the language frequency bands to a particular text results in a lexical or vocab profile.
Tom Cobb's Vocab Profile site
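The idea of a vocab profile can be sketched in Python: each token in a text is assigned to the first band whose wordlist contains it, and the result is the percentage of tokens per band. This is an illustration under stated assumptions, not a description of how the Vocab Profile site works: the function name `vocab_profile`, the three tiny band sets, and the sample sentence are all made up, and real profilers use full lists of about 1000 word families per band rather than individual word forms.

```python
import re

def vocab_profile(text, bands):
    """Return the percentage of tokens in `text` that fall in each band.

    `bands` maps a band label to a set of words; a token is counted in the
    first band that contains it, otherwise under "Other".
    """
    # Assumed tokenisation: lower-case runs of letters, allowing an
    # internal apostrophe or hyphen.
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    counts = {label: 0 for label in bands}
    counts["Other"] = 0
    for token in tokens:
        for label, words in bands.items():
            if token in words:
                counts[label] += 1
                break
        else:
            # No band contained the token.
            counts["Other"] += 1
    total = len(tokens)
    return {label: (100 * n / total if total else 0.0)
            for label, n in counts.items()}

# Hypothetical toy bands, standing in for the real frequency lists.
toy_bands = {
    "1st 1000": {"the", "of", "in", "is", "a"},
    "2nd 1000": {"sample"},
    "Academic": {"analysis", "data"},
}
profile = vocab_profile("The analysis of data in the corpus is a sample", toy_bands)
for label, percent in profile.items():
    print(label, percent)
```

The output has the same shape as one column of the frequency and coverage table: a percentage for the 1st 1000, the 2nd 1000, the academic list, and everything else.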

23 Vocab Profile

24 Lextutor: Blue = 1000, Green = 2000, Yellow = AWL

25 Keyword list
What words are special for a particular corpus?
Compare with a reference corpus.
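One way to make the comparison with a reference corpus concrete is to score each word by how much more frequent it is in the study corpus than expected. The slide does not name a statistic, so the choice here is an assumption: the sketch uses the log-likelihood (G2) measure that is common in corpus tools. The function names, the two toy token lists, and the decision to keep only words that are relatively more frequent in the study corpus are likewise illustrative.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """G2 keyness score for one word.

    a, b: frequency of the word in the study and reference corpus.
    c, d: total number of tokens in the study and reference corpus.
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    score = 0.0
    # A zero frequency contributes nothing (0 * log 0 is taken as 0).
    if a > 0:
        score += a * math.log(a / expected_study)
    if b > 0:
        score += b * math.log(b / expected_ref)
    return 2 * score

def keywords(study_tokens, reference_tokens, top_n=10):
    """Return the top_n (word, score) pairs that are key in the study corpus."""
    if not study_tokens or not reference_tokens:
        raise ValueError("both corpora must contain at least one token")
    study = Counter(study_tokens)
    reference = Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = reference[word]  # Counter returns 0 for unseen words
        # Keep only words relatively more frequent in the study corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Hypothetical toy corpora: a "business" text and a general reference text.
business = "the market profit the shares profit market the".split()
general = "the cat sat on the mat the dog".split()
key_list = keywords(business, general)
for word, score in key_list:
    print(word, round(score, 2))
```

In this toy example "the" is dropped because it is equally frequent in both corpora, while "market" and "profit" come out on top. With real data the reference corpus should be much larger than the study corpus.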

26 Specialised Word List
Create a wordlist from a corpus (using a concordancer or other utilities).
You may need to create your own corpus: BootCaT.
Create a business keyword list in the lab.

27 Some general thoughts
We will be using some simple software in the computer lab.
Try not to get too involved in the details of using the software, at least not to the exclusion of broader, conceptual issues.
It is important to know the corpus you are using. What does it consist of? Are there any special features, such as all lower case? Are there any annotations?

