A Brief Intro to Corpus Techniques in ELT Research

By Erkan Karabacak

Several Important Buzzwords
• corpus: a collection of texts
• corpora: the plural of corpus
• concordancer: a search engine for a corpus
• keyword in context (KWIC): a list of the occurrences of a search word, each shown with its surrounding context

Examples of Corpora
• The Oxford Text Archive: www.ota.ox.ac.uk
• Warwick Centre for Applied Linguistics: http://www2.warwick.ac.uk/fac/soc/al/
• Open American National Corpus: http://americannationalcorpus.org/OANC/OANC-1.0.1-UTF8.zip

Examples of Concordancers
• MonoConc
• WordSmith Tools
• Concordance
• Simple Concordance Program
• WConcord
• TextStat
• AntConc

KWIC (Keyword in Context)
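To make the idea of a KWIC display concrete, here is a minimal Python sketch. It is not how AntConc works internally; the function name `kwic`, the `width` parameter, and the sample sentence (taken from the excerpt used later in these slides) are illustrative assumptions.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per hit, with the keyword aligned in a central column."""
    lines = []
    # Whole-word, case-insensitive match (an assumption; real tools offer more options).
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines

sample = "only a joke man he said with mocking reassurance only a joke i sat on the bus"
for line in kwic(sample, "joke", width=20):
    print(line)
```

Each hit is printed on its own line with the search word in brackets, which is the layout a concordancer shows in its concordance window.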

Today
• AntConc is our concordancer.
• We will use BASE and texts from our students as our corpora.
• We will do some simple analyses to answer some language-related questions.

How to use AntConc?
Open the read-me file online and read it:
http://www.antlab.sci.waseda.ac.jp/software/README_antconc3.2.1.txt

How to install AntConc?
AntConc is by Laurence Anthony, Waseda University, Tokyo.
• Download it from http://www.antlab.sci.waseda.ac.jp/software/antconc3.2.1w.exe
• Or open Google and search for “download AntConc”.

What will we analyze? We need a collection of texts (corpus) of an adequate size.

The British Academic Spoken English (BASE) Corpus
• developed at the Universities of Warwick and Reading
• a collection of transcripts of lectures and seminars recorded at two universities in the UK during the period 1998-2005
• recorded in a variety of university departments
• organized into four broad disciplinary groups, each represented by 40 lectures and 10 seminars

These groups are:
• Arts and Humanities
• Life and Medical Sciences
• Physical Sciences
• Social Studies and Sciences

Today we will use:
• Arts and Humanities, 40 text files, untagged
• Life and Medical Sciences, 40 text files, untagged
• Physical Sciences
• Social Studies and Sciences

An excerpt: …now what are you reading now he asked as i put down the book and reached for my jacket i was labouring over Troilus and Criseyde reading an essay on Criseyde's character you love this rubbish eh he laughed you'll end up an old professor wanking by the fireside putting aside your pipe and warming up your hand first i should say this is not a autobiographical work [laughter] in any s-, in any way right [laughter] er sm0003: you've said that before [laughter] nm0001: warming up your hand first [laughter] i looked at him sternly only a joke man he said with mocking reassurance only a joke i sat on the bus deep in thought trying to work out why she should have betrayed him so easily why after all those pure shy exchanges the secret glances

How will we analyze this corpus?
• Open AntConc.
• Choose File → Open Files.
• Select the files you would like to analyze (use Ctrl or Shift together with a left mouse click to select several), then click Open.
• You will see the selected files in the left window (titled “corpus files”).

Word List
Let’s get an idea of our corpus.
• What is the size of the corpus? (How many words (tokens) are there?)
• How many different words (types) are there?
Click “Word List” → make your selections → Start.
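The token and type counts can also be sketched in a few lines of Python. This is only an illustration: the tokenization rule (lowercased runs of letters, with an optional internal apostrophe) is an assumption, and AntConc's own token definition and settings may give different numbers.

```python
import re
from collections import Counter

def word_list(text):
    """Count how often each word form occurs (a simple frequency list)."""
    # Assumed token definition: letters, optionally joined by one apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)

sample = "only a joke man he said with mocking reassurance only a joke"
freq = word_list(sample)

token_count = sum(freq.values())  # running words in the text
type_count = len(freq)            # distinct word forms

print("tokens:", token_count)
print("types:", type_count)
for word, count in freq.most_common(3):
    print(word, count)
```

Tokens are every running word; types are the distinct forms, so a repeated word adds to the token count but not to the type count.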

Concordance Let’s search for a single word.

Activity 1: Some fun questions
• Which lectures are the most fun?
• Which lectures did not have a lesson plan?
• What part of speech mostly follows a pause?
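One possible way to approach the first question is to count the [laughter] marker seen in the excerpt above, per file. The sketch below assumes that laughter is a reasonable proxy for "fun" and that each transcript is already loaded as a string; the file names and texts are made up for illustration.

```python
def laughter_counts(texts):
    """Rank transcripts by how often the [laughter] marker occurs in them."""
    counts = {name: text.count("[laughter]") for name, text in texts.items()}
    # Highest count first.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Hypothetical file names and contents, not real BASE data.
transcripts = {
    "lecture_a.txt": "this is not an autobiographical work [laughter] right [laughter] er",
    "lecture_b.txt": "today we will look at cell division and then move on",
    "lecture_c.txt": "you've said that before [laughter]",
}

for name, count in laughter_counts(transcripts):
    print(name, count)
```

In AntConc the same question can be explored by searching for the marker in the Concordance tool and looking at which files the hits come from.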

How to analyze tagged corpora
<struct type="tok" from="29" to="34">
  <feat name="base" value="right" />
  <feat name="msd" value="NN" />
</struct>
<struct type="tok" from="34" to="35">
  <feat name="base" value="," />
  <feat name="msd" value="," />
</struct>
<struct type="tok" from="36" to="40">
  <feat name="msd" value="DT" />
  <feat name="base" value="this" />
  <feat name="affix" value="" />
</struct>
<struct type="tok" from="41" to="43">
  <feat name="msd" value="VBZ" />
  <feat name="base" value="be" />
  <feat name="affix" value="s" />
</struct>
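Stand-off annotation like this can be read with a short script. The sketch below uses Python's standard XML parser; wrapping the fragment in a `<root>` element and closing every `<struct>` by hand are assumptions made so the sample parses, and the function name `read_tokens` is illustrative.

```python
import xml.etree.ElementTree as ET

# Two tokens from the slide, closed by hand so the fragment is well-formed.
fragment = """
<struct type="tok" from="36" to="40">
  <feat name="msd" value="DT" />
  <feat name="base" value="this" />
  <feat name="affix" value="" />
</struct>
<struct type="tok" from="41" to="43">
  <feat name="msd" value="VBZ" />
  <feat name="base" value="be" />
  <feat name="affix" value="s" />
</struct>
"""

def read_tokens(xml_fragment):
    """Return (from, to, base form, part-of-speech tag) for each token."""
    # The fragment has no single root element, so we add one (an assumption).
    root = ET.fromstring("<root>" + xml_fragment + "</root>")
    tokens = []
    for struct in root.iter("struct"):
        feats = {f.get("name"): f.get("value") for f in struct.findall("feat")}
        tokens.append((int(struct.get("from")), int(struct.get("to")),
                       feats.get("base"), feats.get("msd")))
    return tokens

for start, end, base, tag in read_tokens(fragment):
    print(start, end, base, tag)
```

The `from` and `to` attributes are character offsets into the untagged text, `base` is the base form, and `msd` holds the part-of-speech tag.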

Activity 2: Keyword Analysis
Let’s say we want to create a dictionary of medical terms.
Our analysis corpus is BAWE Life and Medical Sciences.
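A keyword analysis compares word frequencies in the analysis corpus against a reference corpus. The sketch below scores words with the log-likelihood statistic, one common keyness measure; whether it matches the settings used in this activity is an assumption, and the two tiny "corpora" are invented for illustration.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Keyness of a word seen a times in c study tokens and b times in d reference tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study_tokens, reference_tokens, top=10):
    """Words that are relatively more frequent in the study corpus, by keyness."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref[word]
        if a / c > b / d:  # keep only words over-represented in the study corpus
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]

# Invented toy data standing in for a medical corpus and a general reference corpus.
study = "the patient has a tumour and the tumour is benign".split()
reference = "the book is on the table and the lecture is long".split()

for word, score in keywords(study, reference, top=3):
    print(word, round(score, 2))
```

Words with high scores are candidates for the dictionary; with real corpora, function words such as "the" drop out because they are about equally frequent in both.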

Activity 3: Action Research
• What are the 10 lexical bundles most frequently used by American students?
• What are the 10 lexical bundles most frequently used by Chinese students?
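Lexical bundles are recurrent word sequences, so a first approximation is to count n-grams. The sketch below counts four-word sequences; the choice of n = 4, the function name `lexical_bundles`, and the sample text are assumptions, and published bundle studies usually add minimum-frequency and dispersion thresholds that are left out here.

```python
from collections import Counter

def lexical_bundles(tokens, n=4, top=10):
    """Return the most frequent n-word sequences with their counts."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(" ".join(gram), count) for gram, count in grams.most_common(top)]

# Invented sample standing in for a collection of student essays.
essay = "on the other hand it is clear that on the other hand we see".split()

for bundle, count in lexical_bundles(essay, n=4, top=3):
    print(count, bundle)
```

Running the same function on the American students' texts and on the Chinese students' texts gives two ranked lists that can be compared side by side.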

What else can we do?
Of course, AntConc is not enough for every type of analysis. An applied linguist who wishes to analyze large amounts of language data should not only know several application programs but also learn a programming language, such as Perl.

• We can create a diachronic corpus from our students’ papers and observe their development.
• We can tag texts for part of speech or for other information.
• We can automatically compile corpora from online sources.

• We can do all of the above for other languages (Turkish, Chinese, Russian, and so on).
• We can do EVERYTHING a linguist might need to do with texts.