Mike Scott Aston University INWWCT, Trondheim October 3rd, 2011

Presentation transcript:

Networks of Key Words

Abstract: The notion of keyness is important for document retrieval, for language learning and for the study of the nature of text. Keyness, a textual rather than a linguistic quality, may be shared by certain words and phrases in one text, but its patterning is further distributed across text sets of various dimensions, in associates (Scott, 1997) and in clustering. This presentation considers the network patterns of keyness which can be investigated using quite simple software procedures, and the extent to which these patternings may relate to a user's needs and interests.

Reference: Scott, M., 1997, "PC Analysis of Key Words -- and Key Key Words", System, Vol. 25, No. 1, pp. 1-13.

Key words (KWs). Issues: keyness; aboutness; distribution patterns of KWs … in texts and across corpora

complex pattern

or simple

fractal?

Fractal: A fractal is "a rough or fragmented geometric shape that can be split into parts, each of which is (at least approximately) a reduced-size copy of the whole,"[1] a property called self-similarity (Wikipedia).

[1] Mandelbrot, B.B. (1982). The Fractal Geometry of Nature. W.H. Freeman and Company.

Keyness: aboutness; importance; a textual category

KWs

frequencies

Aboutness: what the text is about; what the message is; what it all means. (Picture from mindreadersdictionary.com)

Importance: centrality

PC Identification of KWs: simple verbatim repetition; no allowance for anaphora, synonymy, antonymy etc.; simple frequency threshold; one word, or more than one?
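As a rough illustration of machine identification by verbatim repetition plus a frequency threshold, here is a minimal Python sketch. The slide does not say which statistic is used, so the comparison against a reference word list with a log-likelihood score is an assumption, and the names `tokenize`, `log_likelihood`, `key_words`, `min_freq` and `min_ll` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word forms only: verbatim repetition, no lemmas or synonyms."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """Log-likelihood for a word seen a times in c tokens and b times in d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the text
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def key_words(text, reference, min_freq=2, min_ll=3.84):
    """Return (word, score) pairs that are unusually frequent in text.

    Assumption: keyness is scored against a reference text; min_freq is the
    simple frequency threshold and min_ll is an illustrative cut-off.
    """
    t = Counter(tokenize(text))
    r = Counter(tokenize(reference))
    c, d = sum(t.values()), sum(r.values())
    if c == 0 or d == 0:
        raise ValueError("text and reference must both contain words")
    scored = {}
    for word, a in t.items():
        if a < min_freq:
            continue
        b = r.get(word, 0)
        if a / c <= b / d:
            continue  # keep only words over-represented in the text
        score = log_likelihood(a, b, c, d)
        if score >= min_ll:
            scored[word] = score
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

Because matching is on the exact word form, "climate" and "climatic" count as different items, which is the "no allowance for synonymy etc." limitation noted on the slide.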

Corpus-based or corpus-driven? Machine-identified keyness is ideal for corpus-driven research. The researcher lets the PC suggest areas that need following up. See recent work by McEnery, Baker, etc.

Dispersion within the text

Global KWs

Local KWs

Middling burstiness: verbs such as appears, begins, puts, observes, replies, continues, says, considers, etc.
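One simple way to look at dispersion within a text is to divide the text into equal segments and see which segments a KW occurs in. The sketch below is only an illustration of that idea, not the measure used in the talk; `segment_hits`, `spread` and the default of 8 segments are hypothetical choices.

```python
def segment_hits(tokens, word, n_segments=8):
    """Count occurrences of word in each of n_segments equal slices of the text."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    hits = [0] * n_segments
    for i, tok in enumerate(tokens):
        if tok == word:
            # Map the token position onto a segment index 0..n_segments-1.
            hits[i * n_segments // len(tokens)] += 1
    return hits


def spread(tokens, word, n_segments=8):
    """Proportion of segments in which word occurs at least once."""
    hits = segment_hits(tokens, word, n_segments)
    return sum(1 for h in hits if h) / n_segments
```

On this rough measure a global KW would score near 1.0 and a local, bursty KW would score low, with its hits bunched in a few adjacent segments.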

Distribution patterns across the corpus

Key Key Words: A "key key-word" is one which is "key" in more than one of a number of related texts. The more texts it is "key" in, the more "key key" it is.
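The definition above amounts to counting, for each word, the number of texts in which it is key. A minimal Python sketch, assuming the KWs of each text have already been identified and are supplied as one collection per text (`kw_sets`, `key_key_words` and `min_texts` are hypothetical names):

```python
from collections import Counter


def key_key_words(kw_sets, min_texts=2):
    """Return (word, number_of_texts) for words key in at least min_texts texts."""
    counts = Counter()
    for kws in kw_sets:
        counts.update(set(kws))  # each text contributes at most once per word
    return [(w, n) for w, n in counts.most_common() if n >= min_texts]
```

For example, with three texts whose KWs are `["climate", "carbon"]`, `["climate", "waste"]` and `["climate", "carbon", "university"]`, "climate" is key in three texts and "carbon" in two, so both qualify as key key words and "climate" is the more "key key" of the two.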

Associates: An "associate" of key-word X is another key-word (Y) which co-occurs with X in a number of texts. (It may or may not co-occur in proximity to key-word X.) Association strength is measured using a standard collocation statistic, here MI3.
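The associate relation can be sketched in Python as follows. MI3 is taken here in its usual cubed form, log2 of (joint frequency cubed, divided by the expected joint frequency); counting co-occurrence at the level of whole texts, and the names `mi3`, `associates` and `min_joint`, are assumptions made for this illustration rather than details given in the talk.

```python
import math
from collections import Counter


def mi3(joint, fx, fy, n):
    """Cubed mutual information: log2(joint**3 / expected), expected = fx*fy/n."""
    return math.log2(joint ** 3 * n / (fx * fy))


def associates(kw_sets, node, min_joint=2):
    """Rank KWs that are key in the same texts as node, strongest first.

    kw_sets holds one collection of KWs per text; the unit of co-occurrence
    is the text, so proximity within the text plays no part.
    """
    sets = [set(kws) for kws in kw_sets]
    n = len(sets)
    freq = Counter(w for s in sets for w in s)  # texts in which each word is key
    joint = Counter(w for s in sets if node in s for w in s if w != node)
    scores = {
        w: mi3(j, freq[node], freq[w], n)
        for w, j in joint.items()
        if j >= min_joint
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Cubing the joint frequency gives more weight to pairs that co-occur often, which reduces the tendency of plain MI to favour rare pairings.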

Climate change: LexisNexis database; 9,444 stories; UK press, 2010

KKWs

Associates

waste

university

Conclusions: KW patterns within individual texts; within the corpus or sub-corpus. But early days, lots of questions: are any KW patternings fractal? do specialised corpora have specialised KKWs?