Stylistics and Stylometry

Stylistics and Stylometry CSC 4598 Machine Translation Dr. Tom Way

What is “style”?
- Term not much loved by linguists
  - Too vague
  - Has connotations in similar fields (“style” = good style, a value judgment)
- Many books/articles refer to the etymology of the word (Lat. stilus = ‘pen’), which suggests that style is mainly about written language
- Various definitions, some very close to concepts already seen (especially “register”)
- Two main aspects widely assumed:
  - style is choice
  - style is described by reference to something else

Style as choice
- For any intended meaning there is a range of alternative ways of expressing that meaning
- Different choices express nuances of meaning, or of other things (style?), e.g. buy vs purchase
- Example:
  - Visitors are respectfully informed that the coin required for the meter is a quarter; no other coin is acceptable
  - Quarters only
- Propositional meaning is the same; the difference in expression conveys something else

Style as choice (2)
- Style is a choice, but often the “choice” is somewhat predetermined
- For example: a choice between appropriate and inappropriate style
- So perhaps style does not connote “good” or “bad” but merely the way in which the author expresses or conveys things

Style and the norm
- Some writers define style as
  - “individual characteristics of a text”
  - “total sum of deviations from a norm”
- But what is the “norm”?
- Is there some form of the language that is neutral as regards style?
- Note also that the norm shifts: for example, many works are written in the vernacular of their time
- Literary stylistics focuses on the exceptional

Style and the norm (2)
- Even if there is no norm, we can describe style comparatively
- Stylistics mainly involves comparing and contrasting texts and associating linguistic variance with contextual explanation
- Some authors see style as being what is added to the text

Stylistic analysis
- Informally identify stylistic features felt to be significant
- Devise a method of analysis which facilitates comparison between usages
- Identify the stylistic function of the features so identified

Types of features
- “Invariable” features due to the individual or the time (usually of little interest)
- Discourse features
  - medium: what features distinguish written language from spoken language
  - participation: e.g., monologue vs dialogue
- Province (= field): lexis and syntax
- Status (= tenor): features relating to relative social standing of writer/speaker and reader/listener
- Modality (= text type): e.g., message delivered as a letter, postcard, text message, email, etc.
- Singularity: deliberate occasional idiosyncrasies

Method and function
- Methods and features determine each other
  - you can only measure features that you can extract
  - simple counting features are easy to extract
  - more complex features can be extracted thanks to NLP techniques of corpus annotation (tagging, parsing, etc.)
- Describing the function of observed differences can be based on intuition or on more advanced techniques (factor analysis)

What to count
- Simple things may characterize different styles
  - average sentence length
  - average word length
  - type:token ratio (vocabulary richness)
    - number of types = number of different words
    - number of tokens = total number of words
  - vocabulary growth (homogeneity of text)
    - number of new types in 1st, 2nd, …, nth 1000 words
    - in rich, varied text, the number will climb steadily
- Especially when used comparatively
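
The simple counts listed above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `tokenize`, `basic_counts` and `vocabulary_growth` are hypothetical, the sentence splitter is deliberately naive, and the text is assumed to be non-empty English prose.

```python
import re

def tokenize(text):
    """Lowercased word tokens; apostrophes and hyphens stay inside a word."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def basic_counts(text):
    """Average sentence length, average word length and type:token ratio.

    Assumes the text contains at least one word.
    """
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokens = tokenize(text)
    types = set(tokens)  # types = distinct words, tokens = all words
    return {
        "avg_sentence_length": len(tokens) / len(sentences),
        "avg_word_length": sum(len(t) for t in tokens) / len(tokens),
        "type_token_ratio": len(types) / len(tokens),
    }

def vocabulary_growth(tokens, window=1000):
    """Number of previously unseen types in each successive window of tokens."""
    seen = set()
    growth = []
    for start in range(0, len(tokens), window):
        chunk = set(tokens[start:start + window])
        growth.append(len(chunk - seen))
        seen |= chunk
    return growth
```

Because the type:token ratio falls as a text gets longer, these figures are most informative when the texts being compared are of similar length.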

What to count (2)
- More complex analyses can give a more interesting picture
  - specific syntactic structures
  - degree of modification in Noun Phrases (NPs)
  - types of verbs (e.g., verbs of persuasion, speech verbs, action verbs, descriptive verbs)
  - distribution of pronouns (1st/2nd/3rd person)
  - etc. (anything you can think of)
- Quite sophisticated mathematical techniques can give an overall picture
  - e.g., factor analysis: identifies from a (big) range of variables which ones best identify/characterize differences
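
As one example of the more complex counts, the distribution of pronouns by person can be approximated with a plain word-list lookup. The sketch below is an assumption-laden illustration: the `PRONOUNS` table and the function `pronoun_distribution` are hypothetical names, the list covers only common English personal, possessive and reflexive forms, and no tagger is used, so ambiguous forms such as "her" or "his" used as determiners are counted as pronouns.

```python
import re
from collections import Counter

# Hypothetical lookup table; not exhaustive.
PRONOUNS = {
    "1st": {"i", "me", "my", "mine", "myself",
            "we", "us", "our", "ours", "ourselves"},
    "2nd": {"you", "your", "yours", "yourself", "yourselves"},
    "3rd": {"he", "him", "his", "himself", "she", "her", "hers", "herself",
            "it", "its", "itself",
            "they", "them", "their", "theirs", "themselves"},
}

def pronoun_distribution(text):
    """Share of 1st/2nd/3rd person forms among all pronouns found."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for tok in tokens:
        for person, forms in PRONOUNS.items():
            if tok in forms:
                counts[person] += 1
    total = sum(counts.values())
    # Return 0.0 for every person when the text has no pronouns at all.
    return {p: (counts[p] / total if total else 0.0) for p in PRONOUNS}
```

For instance, `pronoun_distribution("I told you that she saw us.")` finds two first-person forms, one second-person form and one third-person form.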

Normalization and significance
- Always important to compare like with like
- It is usual when counting things to “normalize” over the length of the text
  - If one text is longer than the other, of course you would expect higher frequencies of everything
- Issue of statistical significance
  - Small differences may not really tell you anything
  - Various measures can confirm whether a difference is statistically significant or due to random fluctuation
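
Both ideas can be illustrated with a short sketch. The slides do not say which significance measure to use; the Pearson chi-square test on a 2x2 table is chosen here purely as one common option, and the function names `per_thousand` and `chi_square_2x2` are hypothetical. The code assumes the feature occurs at least once overall and does not account for every token.

```python
def per_thousand(count, text_length):
    """Frequency normalized to occurrences per 1000 words."""
    return 1000 * count / text_length

def chi_square_2x2(count_a, length_a, count_b, length_b):
    """Pearson chi-square for feature vs. non-feature tokens in two texts.

    With 1 degree of freedom, a value above 3.84 corresponds to p < 0.05.
    """
    observed = [
        [count_a, length_a - count_a],
        [count_b, length_b - count_b],
    ]
    total = length_a + length_b
    row_totals = [length_a, length_b]
    col_totals = [count_a + count_b, total - count_a - count_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the feature were equally frequent in both texts.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2
```

A feature seen 10 times in 1000 words and 20 times in 2000 words has the same normalized rate in both texts, so the statistic is zero; 50 versus 20 occurrences in two 1000-word texts gives a value well above the 3.84 threshold.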

How to count
- How to recognize paragraph breaks?
- How to recognize sentence breaks?
  - Headlines don’t end in a full stop
  - Not all sentences end in a full stop
  - Not all full stops are sentence ending (abbreviations)
- How to count words
  - Hyphenated words, contractions e.g. don’t
- How to measure word-length/complexity
  - length only roughly corresponds to complexity
  - number of characters vs. number of syllables
  - counting syllables implies either a dictionary or an algorithm
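
The pitfalls above show up even in a small sketch. Everything here is an illustrative assumption: the `ABBREVIATIONS` set is a tiny hypothetical sample, hyphenated words and contractions are counted as single words by choice, and `estimate_syllables` is a rough vowel-group heuristic standing in for the "algorithm" option, so it will miscount some words.

```python
import re

# Hypothetical sample; a real list would be much longer.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "prof.", "e.g.", "i.e.", "etc.", "vs."}

def split_sentences(text):
    """Split on sentence-final punctuation, skipping known abbreviations."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Trailing material with no full stop, such as a headline.
        sentences.append(" ".join(current))
    return sentences

def count_words(text):
    """Count words, treating "don't" and "well-known" as one word each."""
    return len(re.findall(r"[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*", text))

def estimate_syllables(word):
    """Rough estimate: count vowel groups, discounting a silent final e."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)
```

One known limitation of this approach: an abbreviation that really does end a sentence (a final "etc.") is not treated as a sentence break.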

More sophisticated counting
- Tagging and parsing allow you to look at grammatical and lexical issues
  - Use of particular POSs (conjunctions, pronouns, auxiliaries, modals)
  - Use of particular features (tenses, …)
  - Use of particular constructions (passives, interrogatives)

Quantifying register differences
- Much work based on corpora trying to quantify and characterize register differences
- Work pioneered by Douglas Biber
  - Simple counts like the ones suggested
  - Also, more complex computations

Example
- anaphoric noun: refers back to previous object
- anaphoric pronoun: refers back to previous object
- exophoric pronoun: refers to something outside text
From D. Biber, S. Conrad & R. Reppen, Corpus Linguistics: Investigating Language Structure and Use, Cambridge University Press, 1998. Ch. 5: The study of discourse characteristics

Features (1)

Features (2)
- ~150 features in all