751-3.


Distinctive words
We compared frequency lists to obtain positive and negative keywords.
Alternative: look for content words in a single frequency list.
Alternative: use a stop word list to filter out grammatical words.
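A minimal sketch of frequency list comparison, assuming log likelihood (G2) as the keyness statistic, as many keyword tools use. A word is a positive keyword when its relative frequency is higher in the study corpus than in the reference corpus, and a negative keyword when it is lower. The function names and toy counts are illustrative only.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness score.
    a, b: frequency of the word in the study and reference corpus
    c, d: total number of tokens in the study and reference corpus"""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:  # a zero count contributes nothing (0 * log 0 is taken as 0)
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study: Counter, reference: Counter):
    """Return (word, LL, 'positive' or 'negative') tuples, highest LL first."""
    c, d = sum(study.values()), sum(reference.values())
    results = []
    for word in set(study) | set(reference):
        a, b = study[word], reference[word]  # Counter gives 0 for unseen words
        direction = "positive" if a / c > b / d else "negative"
        results.append((word, log_likelihood(a, b, c, d), direction))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Hypothetical frequency lists for a study corpus and a reference corpus.
study = Counter({"energy": 20, "the": 50, "said": 1})
reference = Counter({"energy": 2, "the": 60, "said": 30})
for word, ll, direction in keywords(study, reference):
    print(f"{word:10} {ll:6.2f} {direction}")
```

In this toy example "the" has a similar relative frequency in both lists, so it scores near zero and ends up at the bottom of the keyword list.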

Assignment

Stop list
Example of a stop list:
a about above across after again against all almost alone along already also although always among an and another any anybody anyone anything anywhere are area areas around as ask asked asking asks at away b back backed backing backs be became because become becomes been before began behind being beings best better between big both but by c came can cannot case cases certain certainly clear clearly come could d did differ different differently do does done down downed downing downs during e each early either end ended ending ends enough even evenly ever every everybody everyone everything everywhere f face faces fact facts far felt few find finds first for four from full fully further furthered furthering furthers g gave general generally get gets give given gives go going good goods got great greater greatest group grouped grouping groups h had has have having he her here herself high higher highest him himself his how however i if important in interest interested interesting interests into is it its itself j just k keep keeps kind knew know known knows l large largely last later latest least less let lets like likely long longer longest m made make making man many may me member members men might more most mostly mr mrs much must my myself n necessary need needed needing needs never new newer newest next no nobody non noone not nothing now nowhere number numbers o of off often old older oldest on once one only open opened opening opens or order ordered ordering orders other others our out over p part parted parting parts per perhaps place places point pointed pointing points possible present presented presenting presents problem problems put puts q quite r rather really right room rooms s said same saw say says second seconds see seem seemed seeming seems sees several shall she should show showed showing shows side sides since small smaller smallest so some somebody someone something somewhere state states still such sure t take taken than that the their them then there therefore these they thing things think thinks this those though thought thoughts three through thus to today together too took toward turn turned turning turns two u under until up upon us use used uses v very w want wanted wanting wants was way ways we well wells went were what when where whether which while who whole whose why will with within without work worked working works would x y year years yet you young younger youngest your yours z
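A minimal sketch of the stop list alternative: build a frequency list from a text, but skip any token on the stop list so that only content words remain. Only a handful of entries from the list above are included here, and the simple regular-expression tokenizer is an assumption; real tools handle hyphens, numbers and punctuation more carefully.

```python
import re
from collections import Counter

# A few entries from the stop list above; in practice load the full list.
STOP_WORDS = {"a", "about", "and", "are", "be", "can", "in", "is", "it",
              "of", "that", "the", "this", "to", "we"}

def tokenize(text):
    """Lower-case the text and split it into word tokens (deliberately simple)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def content_word_frequencies(text, stop_words=STOP_WORDS):
    """Frequency list with grammatical (stop) words filtered out."""
    return Counter(tok for tok in tokenize(text) if tok not in stop_words)

sample = "The corpus is a corpus of texts and the texts are academic."
print(content_word_frequencies(sample).most_common())
# [('corpus', 2), ('texts', 2), ('academic', 1)]
```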

Word cloud
Tools: wordle.net, jasondavies.com/wordlist

From my notes

Another view

Analysing concordance lines
Worksheet
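Concordance lines are usually shown in Key Word In Context (KWIC) format: every occurrence of the node word is centred, with a few words of context on each side. A minimal sketch follows; the function names and the context width are illustrative assumptions. Sorting by the right-hand context groups repeated patterns together, which makes them easier to spot.

```python
def kwic(tokens, node, width=4):
    """Key Word In Context: one (left, node, right) triple per occurrence of `node`,
    with up to `width` words of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def format_kwic(lines, sort_by_right=True):
    """Align the node words in one column, optionally sorted by right-hand context."""
    if sort_by_right:
        lines = sorted(lines, key=lambda line: line[2].lower())
    return [f"{left:>30} [{node}] {right}" for left, node, right in lines]

tokens = "on the other hand the results show that the other group".split()
for line in format_kwic(kwic(tokens, "other", width=2)):
    print(line)
```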

Teaching Academic English
What to teach: what is academic English?
How do we go about teaching it?

General experience
Find an “academic writing textbook”.
It solves the problems: What is academic English? How do we teach academic English?
But still… how does the author know what academic English is?

Find/create a suitable corpus
Now we can have a corpus of academic English.
From that we can get, for example, a wordlist or other linguistic patterns.

Academic corpus
What is academic English? The English that academics use.
Which academics? What language?
Lectures / PowerPoint slides / tutorials / morning tea / conference presentations / abstracts / articles

Academic English corpus: representativeness
How well does an academic English corpus represent academic English?
If you were going to create an academic corpus, what would you include (and in what proportions)?

Academic corpus
Typically we don’t have access to our preferred corpus.
Existing corpora: MICASE, MICUSP, BASE, BAWE
Some people create a corpus from journal articles.

MICASE
Transcripts from lectures in a variety of disciplines
The assignment is based on the Physical Science files:
A keyword analysis with the Times files as the reference corpus
Select 15 keywords
Use the keywords to select 15 collocations/phrases
Some reflections on the process
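For the step from keywords to phrases, one simple approach (an assumption, not a method prescribed by the assignment) is to list the most frequent n-grams that contain a given keyword and then judge the candidates by hand. The function name and sample text below are hypothetical.

```python
from collections import Counter

def phrases_with(tokens, keyword, n=3, top=5):
    """Most frequent n-grams containing `keyword`: candidate phrases to check by hand."""
    grams = Counter(
        tuple(tokens[i:i + n])
        for i in range(len(tokens) - n + 1)
        if keyword in tokens[i:i + n]
    )
    return grams.most_common(top)

tokens = "the energy of the system the energy of a photon the energy".split()
print(phrases_with(tokens, "energy", n=3, top=1))  # [(('the', 'energy', 'of'), 2)]
```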

Moving on from wordlists
In week 1 we looked at single words and frequency lists.
How do we move from single words to larger units? What are the larger units?
Grammatical units: verb phrase, sentence, … (requires part-of-speech tags or parsing)
Collocations: recurrent word combinations (tea time, class schedule, point of view, take place, …)

ngrams
One way to get sequences rather than single words is to make a frequency list of word pairs, word triples, etc.
These are called bigrams, trigrams, etc.
How many bigrams are there in a 1000-word corpus?
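A minimal sketch of n-gram extraction. A text of N tokens contains N - n + 1 n-gram tokens, so a single continuous 1000-word text has 999 bigrams (fewer if counting stops at file or sentence boundaries). The sample sentence is hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-word sequences in a token list, in text order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "on the other hand it is on the other side".split()
print(len(tokens), len(ngrams(tokens, 2)))  # 10 9
trigram_freq = Counter(ngrams(tokens, 3))
print(trigram_freq.most_common(1))  # [(('on', 'the', 'other'), 2)]
```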

What are lexical bundles?
A term from Doug Biber.
Similar to n-grams, with a minimum frequency and a minimum range (e.g., must occur in 15 out of the 20 files in the corpus).
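A minimal sketch of the lexical bundle filter: count n-grams across all files, record how many files each n-gram occurs in (its range), and keep only those that meet both thresholds. Biber's studies normalise the frequency cut-off per million words; raw counts are used here for simplicity, and the thresholds in the example are illustrative.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def lexical_bundles(files, n, min_freq, min_range):
    """files: one token list per corpus file.
    Returns {ngram: (frequency, range)} for n-grams meeting both thresholds."""
    freq = Counter()
    rng = Counter()
    for tokens in files:
        grams = ngrams(tokens, n)
        freq.update(grams)       # total occurrences across the corpus
        rng.update(set(grams))   # each file counts at most once towards range
    return {g: (freq[g], rng[g]) for g in freq
            if freq[g] >= min_freq and rng[g] >= min_range}

files = [
    "it is important to note".split(),
    "it is important to see".split(),
    "we see it is clear".split(),
]
print(lexical_bundles(files, n=3, min_freq=2, min_range=2))
```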

Lexical bundles in academic registers
Academic registers vs. non-academic registers
Across academic registers
Disciplinary variation in the same register

Corpus used: Biber et al.

Examples of lexical bundles in academic registers
Referential expressions: at the bottom of; is one of the
Discourse organizers: on the other hand; in addition to the
Stance expressions: it is difficult to; it is important to

Disciplinary variation

Collocations
More or less recognisable, but not definable
No computer program can produce a definitive list of collocations, only a list of candidates.

Computer identification of chunks
Frequency alone cannot be used, because a highly frequent sequence may not be a unit, e.g., “and of the”, “that we will”.
We need to check word sequences manually.
We can also use a statistic such as Mutual Information or Log Likelihood to give different views of multiword lists.
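A minimal sketch of ranking bigram candidates by Mutual Information, assuming the commonly used pointwise formula MI = log2(f(xy) × N / (f(x) × f(y))), where N is the number of tokens. Because MI gives high scores to rare pairs, a minimum frequency is applied first; the threshold here is arbitrary. The result is a ranked list of candidates, not a list of collocations.

```python
import math
from collections import Counter

def bigram_mi(tokens, min_freq=2):
    """Rank bigrams by pointwise Mutual Information:
    MI(x, y) = log2(f(xy) * N / (f(x) * f(y))), N = number of tokens.
    Bigrams below min_freq are dropped first, because MI inflates rare pairs."""
    n = len(tokens)
    word_freq = Counter(tokens)
    bigram_freq = Counter(zip(tokens, tokens[1:]))
    scores = [
        ((x, y), math.log2(fxy * n / (word_freq[x] * word_freq[y])))
        for (x, y), fxy in bigram_freq.items()
        if fxy >= min_freq
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)

tokens = "take place and the end of the day take place and the".split()
for bigram, score in bigram_mi(tokens):
    print(" ".join(bigram), round(score, 2))
```

Here "and the" ranks lowest because "the" is frequent on its own, but "place and" scores as high as "take place", which is why the candidates still need checking by hand.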

Expanded lexicon
The idea that L2 learners need to be familiar with collocations as well as individual words, e.g., change of heart, coffee cup, coffee beans, drip coffee, coffee shop.

Collocation lists
Phrasal Expressions List: Ron Martinez and Norbert Schmitt, Applied Linguistics, Aug 2012
Academic Formulas List: Rita Simpson and Nick Ellis, Applied Linguistics, 2010

AFL: Simpson & Ellis (Nick)
Frequent recurrent patterns
Distinctive of academic texts (like keywords)
Occurring in a range of academic genres (referred to as “range” or “dispersion”)

AFL: Simpson & Ellis (Nick)
Extracted 3-, 4- and 5-word sequences
Compared them with non-academic texts (using Log Likelihood, the same statistic as in keyword analysis)
Kept sequences occurring in a range of academic genres: 4 out of 5 academic divisions
Used teachers’ judgements of the coherence of sequences to find the most reliable statistic
Ranked the sequences using a measure based on frequency and MI

Simpson and Ellis
Phrases organised by FTW (Formula Teaching Worth)

A Phrasal Expressions List: Martinez and Schmitt
Note that they used a different methodology:
Highlighted non-compositional sequences (e.g., at all), those likely to cause difficulties for learners
N-gram analysis plus manual selection
Consider the relation of phrasal lists to word lists (and coverage)
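The slide only asks us to consider coverage. One simple way to measure it, assumed here rather than taken from Martinez and Schmitt, is the share of running words a list accounts for: for a word list, the tokens that are on the list; for a phrasal list, the tokens that fall inside an occurrence of a listed phrase. The function names and sample text are hypothetical.

```python
def word_coverage(tokens, word_list):
    """Proportion of running words (tokens) that appear in a single-word list."""
    listed = {w.lower() for w in word_list}
    return sum(t.lower() in listed for t in tokens) / len(tokens) if tokens else 0.0

def phrase_coverage(tokens, phrases):
    """Proportion of tokens that fall inside at least one occurrence of a listed phrase."""
    lowered = [t.lower() for t in tokens]
    covered = [False] * len(lowered)
    for phrase in phrases:
        words = phrase.lower().split()
        for i in range(len(lowered) - len(words) + 1):
            if lowered[i:i + len(words)] == words:
                for j in range(i, i + len(words)):
                    covered[j] = True  # mark every token inside the match
    return sum(covered) / len(lowered) if lowered else 0.0

tokens = "at all times we took place at all".split()
print(word_coverage(tokens, ["at", "all", "we"]))         # 0.625
print(phrase_coverage(tokens, ["at all", "take place"]))  # 0.5
```

Note that "take place" does not match "took place"; a real phrasal list would need lemmatisation or listed variants.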

Statistics
Based on probability: how likely is it that some event or outcome is due to chance (e.g., tossing coins)?
Applied to experimental data: drug trials, teaching methods
Statements about the outcome: the probability that the outcome occurred by chance is less than 1 in 100 (p < 0.01).
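To make the coin-tossing idea concrete, a small sketch that computes the exact chance of getting k or more heads in n tosses of a fair coin (a one-sided binomial probability). Nine or more heads in ten tosses happens by chance slightly more than 1 time in 100; ten heads out of ten happens less than 1 time in 100. The function name is illustrative.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more heads in n tosses."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(prob_at_least(9, 10))   # 0.0107421875 (11/1024): just above 1 in 100
print(prob_at_least(10, 10))  # 0.0009765625 (1/1024): p < 0.01
```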

Statistics and texts
I view statistics as a way of ranking (presenting) data for you to examine.
We cannot make statements such as “there is a 1 in 100 probability that this text data occurred by chance”.
We can note that a word pair has a high Mutual Information score.
Not an experiment: text data is not random.