Stylistics and stylometry


What is “style”?
- Term not much loved by linguists
  - Too vague
  - Has connotations in neighbouring fields (“style” = good style, ie a value judgment)
- Many books/articles make reference to the etymology of the word (Lat. stilus = ‘pen’), so it follows that style is mainly about written language
- Various definitions, some very close to things already seen (especially “register”)
- Two main aspects widely supposed:
  - style is choice
  - style is described by reference to something else

Style as choice
- For any intended meaning there is a range of alternative ways of expressing that meaning
- Different choices express nuances of meaning or other things (style?), eg buy vs purchase
- Example:
  - Visitors are respectfully informed that the coin required for the meter is 50p; no other coin is acceptable
  - 50p pieces only
- Propositional meaning is the same; the difference in expression conveys something else (register etc)

Style as choice
- Style is a choice, but often the “choice” is somewhat predetermined, ie a choice between appropriate and inappropriate style
- So maybe “style” is just another word for register?

Style and the norm
- Some writers define style as
  - “individual characteristics of a text”
  - “total sum of deviations from a norm”
- But what is the “norm”?
  - Is there some form of the language that is neutral as regards style/register?
  - Note also that the norm shifts: eg the Bible (AV) was written in the vernacular of its time
- Literary stylistics focuses on the exceptional

- Even if there is no norm, we can describe style comparatively
- Stylistics mainly involves comparing and contrasting texts and associating linguistic variance with contextual explanation
- Some authors see style as being what is added to the text

Stylistic analysis
- Gulf between literary and linguistic stylistics
- Lit crit focuses on the effect on the reader, intended or otherwise, so it is largely intuitive and subjective
- Linguistic stylistics looks for characterisations of style (including literary style) in terms of linguistic phenomena at the various levels of linguistic description

Stylistic analysis
- Inventory of linguistic devices and their effect, usually in a contrastive way:
  - in contrast with other writers in a similar genre
  - in contrast with other genres
- Linguistic devices described in terms of the usual linguistic levels of description: phonology, morphology, lexis, grammar, etc.
- Effects can be directly expressive, or indirect, by association
  - example: onomatopoeia vs alliteration as a phonological device

Stylistic analysis
- Crystal & Davy (1969) Investigating English Style
  - Informally identify stylistic features felt to be significant
  - Devise a method of analysis which facilitates comparison between usages
  - Identify the stylistic function of the features so identified

Types of features
- “Invariable” features due to the individual or the time: usually of little interest
- Discourse features
  - medium (= Halliday’s mode): what features distinguish written language from spoken language
  - participation: eg monologue vs dialogue
- Province (= field): lexis and syntax
- Status (= tenor): features relating to the relative social standing of writer/speaker and reader/listener
- Modality (= text type): eg message delivered as a letter, postcard, text message, email, etc
- Singularity: deliberate occasional idiosyncrasies

Method and function
- Methods and features determine each other
  - you can only measure features that you can extract
  - simple counting features are easy to extract
  - more complex features can be extracted thanks to NLP techniques of corpus annotation (tagging, parsing, etc)
- Describing the function of observed differences
  - could be based on intuition
  - or (see later) partially automated (factor analysis)

What to count
- Simple things may characterise different styles
  - average sentence length
  - average word length
  - type:token ratio (vocabulary richness)
    - number of types = number of different words
    - number of tokens = total number of words
  - vocabulary growth (homogeneity of text)
    - number of new types in 1st, 2nd, …, nth 1000 words
    - in rich varied text, number will climb steadily
- Especially when used comparatively
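The simple counts listed above can be sketched in a few lines of Python. This is only an illustration: the regex tokenizer, the naive sentence splitter, and the function names are assumptions made for the example, and the text is assumed to be non-empty.

```python
import re

def tokenize(text):
    """Lowercased word tokens via a crude regex (an assumption of this sketch)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def split_sentences(text):
    """Naive split on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def basic_counts(text):
    """Return the simple stylistic measures for a non-empty text."""
    tokens = tokenize(text)
    sentences = split_sentences(text)
    types = set(tokens)  # types = different words
    return {
        "tokens": len(tokens),
        "types": len(types),
        "type_token_ratio": len(types) / len(tokens),
        "avg_word_length": sum(len(t) for t in tokens) / len(tokens),
        "avg_sentence_length": len(tokens) / len(sentences),
    }

def vocabulary_growth(tokens, block=1000):
    """Number of previously unseen types in each successive block of tokens."""
    seen = set()
    growth = []
    for start in range(0, len(tokens), block):
        chunk = set(tokens[start:start + block])
        growth.append(len(chunk - seen))
        seen |= chunk
    return growth

sample = "The cat sat on the mat. The dog sat on the log."
stats = basic_counts(sample)
growth = vocabulary_growth(tokenize(sample), block=6)
```

The type:token ratio falls as a text gets longer, so these figures are only comparable between samples of similar length.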

What to count
- More complex analyses can give a more interesting picture
  - specific syntactic structures
  - degree of modification in NPs
  - types of verbs (eg verbs of persuasion, speech verbs, action verbs, descriptive verbs)
  - distribution of pronouns (1st/2nd/3rd person)
  - etc … (anything you can think of)
- Quite sophisticated mathematical techniques can give an overall picture
  - eg factor analysis: identifies from a (big) range of variables which ones best identify/characterize differences
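As one illustration of these richer counts, the distribution of pronouns by person can be approximated with plain word lists. The lists and the function name below are assumptions made for this sketch. A tagger would be needed to handle ambiguous forms properly, eg "mine" as a noun or "her" as a determiner.

```python
import re
from collections import Counter

# Hypothetical word lists for English personal, possessive and reflexive pronouns.
PRONOUNS = {
    "1st": {"i", "me", "my", "mine", "myself",
            "we", "us", "our", "ours", "ourselves"},
    "2nd": {"you", "your", "yours", "yourself", "yourselves"},
    "3rd": {"he", "him", "his", "himself", "she", "her", "hers", "herself",
            "it", "its", "itself", "they", "them", "their", "theirs",
            "themselves"},
}

def pronoun_distribution(text):
    """Count pronoun tokens by grammatical person using simple word matching."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for person, forms in PRONOUNS.items():
            if token in forms:
                counts[person] += 1
    # Report every person, including those with a zero count.
    return {person: counts[person] for person in PRONOUNS}

dist = pronoun_distribution(
    "I told you that they would bring their own food to us."
)
```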

Normalization and significance
- Always important to compare like with like
- It is usual when counting things to “normalize” over the length of the text
  - If one text is longer than the other, of course you would expect higher frequencies of everything
- Issue of statistical significance
  - Small differences may not really tell you anything
  - Various measures can confirm whether a difference is statistically significant or due to random fluctuation
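A minimal sketch of both ideas follows. The slides do not name a particular significance measure. The log-likelihood (G2) statistic used here is one common choice in corpus work and is an assumption of this example, as are the function names and the invented counts.

```python
import math

def per_thousand(count, text_length):
    """Normalize a raw count to a rate per 1000 words."""
    return 1000 * count / text_length

def log_likelihood(count_a, size_a, count_b, size_b):
    """G2 statistic comparing one item's frequency in two texts.

    With 1 degree of freedom, values above 3.84 correspond to p < 0.05.
    """
    total = count_a + count_b
    # Expected counts if the item were equally frequent in both texts.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Invented figures: 30 hits in a 10,000-word text vs 60 hits in a 40,000-word text.
rate_a = per_thousand(30, 10_000)   # 3.0 per thousand
rate_b = per_thousand(60, 40_000)   # 1.5 per thousand
g2 = log_likelihood(30, 10_000, 60, 40_000)
significant = g2 > 3.84
```

The raw count is higher in the second text, but the normalized rate is higher in the first, which is why the comparison has to be made on rates and not on raw counts.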

How to count
- How to recognize paragraph breaks?
- How to recognize sentence breaks?
  - Headlines don’t end in a full stop
  - Not all sentences end in a full stop
  - Not all full stops are sentence-ending (abbreviations)
- How to count words
  - Hyphenated words, contractions, e.g. don’t
- How to measure word length/complexity
  - length only roughly corresponds to complexity
  - number of characters vs number of syllables, cf. through vs idea
  - counting syllables implies either a dictionary or an algorithm
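The decisions above can be made concrete in code. The sketch below is a rough illustration under stated assumptions: the abbreviation list is a small hypothetical sample, the function names are invented, and the syllable counter is a simple vowel-group heuristic that stands in for the dictionary or algorithm the slide mentions.

```python
import re

# Hypothetical, deliberately short abbreviation list.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "prof.", "e.g.", "i.e.", "etc.", "cf."}

def split_sentences(text):
    """Split on '.', '!' or '?' unless the word is a known abbreviation.

    Limitation: an abbreviation that really does end a sentence is missed.
    """
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        if word.endswith((".", "!", "?")) and word.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text with no final punctuation, eg a headline
        sentences.append(" ".join(current))
    return sentences

def count_words(text, split_hyphens=False):
    """Whitespace word count; optionally treat hyphenated parts as separate words."""
    if split_hyphens:
        text = text.replace("-", " ")
    return len(text.split())

def count_syllables(word):
    """Count vowel groups, discounting a final silent 'e' (but not final 'le')."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

sentences = split_sentences("Dr. Smith arrived late. He didn't apologise!")
```

With this heuristic "through" (7 characters) counts as 1 syllable, while "idea" (4 characters) counts as 2, one short of its actual 3. That shows both why length in characters is a poor proxy for complexity and why a simple algorithm is only approximate.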

More sophisticated counting
- Tagging and parsing allow you to look at grammatical and lexical issues
  - Use of particular POSs (conjunctions, pronouns, auxiliaries, modals)
  - Use of particular features (tenses, …)
  - Use of particular constructions (passives, interrogatives)

Quantifying register differences
- Much work based on corpora trying to quantify and characterize register differences
- Work pioneered by Douglas Biber
  - Simple counts like the ones suggested
  - Also, more complex computations

Example
- From D. Biber, S. Conrad & R. Reppen, Corpus Linguistics: Investigating Language Structure and Use, Cambridge University Press, 1998. Ch 5: The study of discourse characteristics

Multidimensional analysis
- Collect a huge range of measures of a wide variety
  - some simple word counts
  - syntactic features
  - classes and subclasses of N, V, Adj, Adv
- Factor analysis

~150 features in all

Factor analysis
- Statistical method to take a large number of apparently random variables and group them together into “factors”
- Factors will be groups of (+ve and –ve) features
- Linguist might then try to characterize the factors in terms of some psycholinguistic feature
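To give a feel for how features fall into positive and negative groups, here is a small pure-Python sketch. It is not factor analysis proper, which estimates latent factors with unique variances and usually rotates them. As a simpler stand-in, it extracts the first principal component of the feature correlation matrix by power iteration. The feature names and all figures are invented for illustration.

```python
import math

def standardize(rows):
    """Convert each column to z-scores (sample standard deviation)."""
    n = len(rows)
    z_columns = []
    for column in zip(*rows):
        mean = sum(column) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
        z_columns.append([(x - mean) / sd for x in column])
    return [list(row) for row in zip(*z_columns)]

def correlation_matrix(rows):
    """Pearson correlations between the columns of `rows`."""
    z = standardize(rows)
    n, k = len(z), len(z[0])
    return [[sum(z[r][i] * z[r][j] for r in range(n)) / (n - 1)
             for j in range(k)] for i in range(k)]

def first_component(matrix, iterations=200):
    """Dominant eigenvector by power iteration, starting from the first axis."""
    k = len(matrix)
    vector = [1.0] + [0.0] * (k - 1)
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(k))
                   for i in range(k)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    return vector

# Invented rates per 1000 words for six hypothetical texts.
FEATURES = ["first_person_pronouns", "contractions", "nouns", "passives"]
texts = [
    [55, 30, 150, 2], [60, 28, 160, 3], [50, 35, 140, 1],   # conversational
    [5, 1, 300, 15], [8, 2, 310, 18], [3, 0, 290, 12],      # academic
]
loadings = first_component(correlation_matrix(texts))
positive = [f for f, w in zip(FEATURES, loadings) if w > 0]
negative = [f for f, w in zip(FEATURES, loadings) if w < 0]
```

In this toy data the pronoun and contraction rates load with one sign and the noun and passive rates with the other. Interpreting that opposition as a dimension of variation is the step left to the linguist.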

Example
- Biber took two Google classifications of text types: “Home” and “Science”
- Harvested ~1500 webpages in each category (3.74m words)
  - originally got ~2500 webpages, but some were not suitable
- http://jan.ucc.nau.edu/biber/Web text types.ppt

Summary of analysis