Corpus Linguistics: Counting words, texts or features Mike Scott, University of Liverpool Corpus Linguistics Summer Institute June-July 2008.

Aims: to identify what is in principle countable using CL techniques; to consider what it is in principle desirable to count, and why.

No, not that kind of sentence

What have we got, anyway? Electronic texts. Is anything missing?

What is a text, anyway?

What we’re looking at
Words in Texts: sentences, paragraphs, sections, key words, etc.
Words in the Brain: memory (e.g. tip-of-the-tongue), word associations, enjoyment, priming
Words in the Language: lexicography; terminology, phraseology, etc.; patterns of “standard English”
Words in Culture: cultural key words; indicators of class and stance, bias, etc.

What is countable? Characters; word-forms; parts of speech; sentences; headings? paragraphs? lines? pages?; other divisions (section, chapter) if marked up; utterances; turns; grammatical sequences.
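
As a rough illustration of how some of these units might be counted in a plain electronic text, here is a minimal Python sketch. The function name count_units, the sample text and the splitting rules are all assumptions made for this example: sentences are split naively on ., ! or ?, and paragraphs on blank lines. It is not how WordSmith or any other tool does it, and real texts (abbreviations, headings, markup) need more careful rules.

```python
import re
from collections import Counter

def count_units(text: str) -> dict:
    """Count a few surface units in a plain-text string (illustrative heuristics only)."""
    # Paragraphs: blocks of text separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Sentences: naive split on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Word-forms: lower-cased runs of letters, allowing internal ' or -
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    return {
        "characters": len(text),
        "lines": len(text.splitlines()),
        "paragraphs": len(paragraphs),
        "sentences": len(sentences),
        "tokens": len(tokens),        # running word-forms
        "types": len(set(tokens)),    # distinct word-forms
        "top_word_forms": Counter(tokens).most_common(3),
    }

# Hypothetical sample text, invented for this example.
sample = "The cat sat. The cat slept!\n\nWas the dog there? It wasn't."
counts = count_units(sample)
for unit, value in counts.items():
    print(unit, value)
```

Units such as parts of speech, utterances or turns are left out here because they depend on annotation or mark-up being present in the text.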

What isn’t countable? Metaphors, semantic prosody and patterns, because these are abstractions.

Though we have to try … by seeking various markers, frames signalling these abstractions, recognising, however, that 1 form ≠ 1 function. Corpus Linguistics is all about pattern-seeking!
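
One way to picture marker-seeking is a small concordance-style search. The Python sketch below is an assumption-laden illustration, not a method from the slides: the marker list MARKERS, the function find_marker_candidates and the sample sentences are all invented. It returns each hit with some left and right context so that a human reader can decide whether the form really signals, say, a figurative use.

```python
import re

# Hypothetical marker list, chosen for illustration only; not taken from the slides.
MARKERS = ["like a", "as if", "so to speak", "a kind of"]

def find_marker_candidates(text: str, markers=MARKERS, width: int = 25):
    """Return (marker, left context, right context) for every marker hit."""
    hits = []
    for marker in markers:
        # Whole-phrase, case-insensitive match of the literal marker string.
        pattern = re.compile(r"\b" + re.escape(marker) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            hits.append((marker, left, right))
    return hits

# Hypothetical sample text, invented for this example.
sample = ("Her voice was like a knife. I would like a coffee. "
          "He is, so to speak, a kind of lighthouse.")
hits = find_marker_candidates(sample)
for marker, left, right in hits:
    print(f"{left:>25} [{marker}] {right}")
```

Both "like a knife" and "like a coffee" are returned, although only the first is figurative. That is the 1 form ≠ 1 function problem in miniature: the count of hits is a count of candidates, and the interpretation still has to be done by reading the lines.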

Why counting, anyway? Search for interpretations; understanding; re-defining categories via patterns; WordSmith.

What should we count? The question of focus; the question of scope; pointfulness: the search for patterns; the POS-trap: metadata are used to forget the data (François Rastier).

Reference Scott, M. & C. Tribble, Textual Patterns: keyword and corpus analysis in language education, Amsterdam: Benjamins. Chapters 1 & 2.