Blueprints or conduits? Using an automated tool for text analysis Stuart G. Towns and Richard Watson Todd King Mongkut’s University of Technology Thonburi.

Slides:



Advertisements
Similar presentations
GSSR Research Methodology and Methods of Social Inquiry January 10, 2012 Research Using Available Data.
Advertisements

Academic Writing Writing an Abstract.
Chapter 11 user support. Issues –different types of support at different times –implementation and presentation both important –all need careful design.
Chapter 4 Key Concepts.
Introduction to: Automated Essay Scoring (AES) Anat Ben-Simon Introduction to: Automated Essay Scoring (AES) Anat Ben-Simon National Institute for Testing.
Critical Thinking Course Introduction and Lesson 1
ENGL  Unity  Topic Sentence  Adequate Development  Organization  Coherence.
Coh-Metrix: An Automated Measure of Text Cohesion Danielle S. McNamara, Yasuhiro Ozuru, Max Louwerse, Art Graesser.
Leaders Critique Text Selection Day 1 Session 3. Objectives 2  Leaders will be able to lead professional conversations with teachers and coaches around.
Readability Assessment for Text Simplification Sandra Aluisio 1, Lucia Specia 2, Caroline Gasperin 1, Carolina Scarton 1 1 University of São Paulo, Brazil.
Help and Documentation zUser support issues ydifferent types of support at different times yimplementation and presentation both important yall need careful.
Corpus Linguistics and Second Language Acquisition – The use of ACORN in the teaching of Spanish Grammar Guadalupe Ruiz Yepes.
Corpus 06 Discourse Characteristics. Reasons why discourse studies are not corpus-based: 1. Many discourse features cannot be identified automatically.
Natural Language Processing AI - Weeks 19 & 20 Natural Language Processing Lee McCluskey, room 2/07
Language, Mind, and Brain by Ewa Dabrowska Chapter 2: Language processing: speed and flexibility.
PSY 369: Psycholinguistics Some basic linguistic theory part3.
Help and Documentation CSCI324, IACT403, IACT 931, MCS9324 Human Computer Interfaces.
Measuring Linguistic Complexity Kristopher Kyle
Lecture 1 Introduction: Linguistic Theory and Theories
Introduction.  Classification based on function role in classroom instruction  Placement assessment: administered at the beginning of instruction 
Using Rhetorical Grammar in the English 90 Classroom.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
Automated Essay Evaluation Martin Angert Rachel Drossman.
Key stage 3 English Writing Presentation 1: Overview and implications for teaching and learning Analysis of pupil performance 2004.
Coh-Metrix: Analysis of text on cohesion and language
What is Readability?  A characteristic of text documents..  “the sum total of all those elements within a given piece of printed material that affect.
Discourse Topics, Linguistics, and Language Teaching Richard Watson Todd King Mongkut’s University of Technology Thonburi arts.kmutt.ac.th/crs/research/
Scott Duvall, Brett South, Stéphane Meystre A Hands-on Introduction to Natural Language Processing in Healthcare Annotation as a Central Task for Development.
Requirement Review & Verification CS 330 – Fall 2002 Rania Elnaggar.
CRESST ONR/NETC Meetings, July 2003, v1 ONR Advanced Distributed Learning Linguistic Modification of Test Items Jamal Abedi University of California,
SOCIO-COGNITIVE APPROACHES TO TESTING AND ASSESSMENT
Overview of Text Complexity Text complexity is defined by: Qualitative 2.Qualitative measures – levels of meaning, structure, language conventionality.
Chapter 6. Semantics is the study of the meaning of words, phrases and sentences. In semantic analysis, there is always an attempt to focus on what the.
CCR Anchor Standard 10 – Read and comprehend complex literary and informational texts independently and proficiently 10/18/2015MSDE2.
Summary-Response Essay Responding to Reading. Reading Critically Not about finding fault with author Rather engaging author in a discussion by asking.
Business Communication Workshop Course Coordinator:Ayyaz Qadeer Lecture # 9.
SAC 1 Informal Discourse Comparative Analysis. Analytical Commentary SAC 1: Analytical Commentary What is it? Linguistic analysis. Articulate your understanding.
10/7/14 Do Now: Take one of each of the handouts from the front and read the directions on the top of the page. Homework: - Finish reading chapters 9 &
Quantitative Analysis Flesch-Kincaid Grade Equivalent Flesch-Kincaid Grade Equivalent Lexile Measures: Sentence Length Number of Words in Text Average.
Classroom Research Workshop at Darunsikkhalai, 2 November 2012 Richard Watson Todd King Mongkut’s University of Technology Thonburi
Introduction to Scientific Research. Science Vs. Belief Belief is knowing something without needing evidence. Eg. The Jewish, Islamic and Christian belief.
Intermediate 2 Computing Unit 2 - Software Development.
WHAT MAKES A TEXT A SLOW OR FAST READ? SPEED BUMPS AND PASSING LANES LAURA S. TORTORELLI MICHIGAN STATE UNIVERSITY LITERACY RESEARCH ASSOCIATION CONFERENCE.
◦ Process of describing the structure of phrases and sentences Chapter 8 - Phrases and sentences: grammar1.
© 2011 Pearson Education, Inc., publishing as Longman Publishers. 1 Chapter 11 Editing for a Professional Style and Tone Technical Communication, 12 th.
Differences between Spoken and Written Discourse
Abstracting.  An abstract is a concise and accurate representation of the contents of a document, in a style similar to that of the original document.
NATURAL LANGUAGE PROCESSING
Welcome Opening Prayer. Content Objectives: 1.I will review the definition of texts and the teacher’s responsibility in choosing classroom materials.
Depth of Knowledge: Elementary ELA Smarter Balanced Professional Development for Washington High-need Schools University of Washington Tacoma Belinda Louie,
Evaluating NLP Features for Automatic Prediction of Language Impairment Using Child Speech Transcripts Khairun-nisa Hassanali 1, Yang Liu 1 and Thamar.
Chapter 11 Language. Some Questions to Consider How do we understand individual words, and how are words combined to create sentences? How can we understand.
Discourse analysis May 2012 Carina Jahani
2. The standards of textuality: cohesion Traditional approach to the study of lannguage: sentence as conventional object of study Structuralism (Bloofield,
Differences between Spoken and Written Discourse Source: Paltridge, p.p
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.
Running Records Feedback… What is a running record?
Chapter 5 The Oral Approach.
Using Parallel Corpora for Contrastive Studies Michael Barlow.
Automatic Writing Evaluation
Specialized texts How do we identify them?
Dr Anie Attan 26 April 2017 Language Academy UTMJB
Criterial features If you have examples of language use by learners (differentiated by L1 etc.) at different levels, you can use that to find the criterial.
Digging In Deep: Literacy in Career and Technical Education Courses
Anik Wulyani, PhD candidate
Dr. A .K. Bhattacharyya Professor EEI(NE Region), AAU, Jorhat
Thinking & Language.
HCC class lecture 13 comments
Chapter 11 user support.
Applied Linguistics Chapter Four: Corpus Linguistics
Presentation transcript:

Blueprints or conduits? Using an automated tool for text analysis Stuart G. Towns and Richard Watson Todd King Mongkut’s University of Technology Thonburi

Perspectives on Analyzing Discourse ❖ Conduit Model: the text (the words) of the discourse is a conduit (container) that implicitly holds all of the meaning of the text to be sent to the user ❖ Blueprint Model: the text is merely a blueprint that guides the reader in the active construction of the meaning of the text (Reddy, 1979)

Conduit/Blueprint Metaphor AB CD (Reddy, 1979)

Automated Text Analysis Tools ❖ Examples: AntConc, WordSmith Tools, Coh-Metrix ❖ Automated Text Analysis tools can be very useful ❖ Fast speed, reliable results, lots of data ❖ But they also have their drawbacks ❖ Mostly a conduit model approach

Example Tool: Coh-Metrix Original purposes of Coh-Metrix: ❖ Update the idea behind “readability” to include newer theories on text and discourse ❖ First automated tool to measure text cohesion ❖ Offer many automated analyses using several different knowledge sources (Blueprint Model) (McNamara & Graesser, 2010)

Example Coh-Metrix Indicies Coh-Metrix Output Category Example Coh-Matrix Indices for each category DescriptiveWord, sentence, and paragraph count and length Text Easability Principal Component Scores Text Easability PC Syntactic simplicity, Text Easability PC Deep cohesion Referential CohesionNoun overlap, argument overlap, anaphor overlap Latent Semantic AnalysisLSA overlap in adjacent sentences, LSA given/new Lexical DiversityType-token ratio for content words and for all words ConnectivesCausal, logical, temporal, and additive connectives Situation ModelCausal and intentional verbs, WordNet overlap Syntactic ComplexityLeft-embeddedness, number of modifiers per noun phrase Syntactic Pattern DensityNoun, verb, adverbial, prepositional phrase densities Word InformationAge of acquisition, familiarity, concreteness, polysemy, hypernymy ReadabilityFlesch Reading Ease, Flesch-Kincaid Reading Level

Coh-Metrix: Conduit or Blueprint? Counting Word Frequenc y and Type/Tok en Ratio Counting Parts of Speech (uses POS Tagger) Counting Words related to a mental model (e.g., spatial and temporal words) Computing psycho-linguistic features of words (e.g., imageability and concreteness) (Coh-Metrix can not consider the socio-cultural aspects of the text) ConduitBlueprint

Cognitive or Social Construct is construed by Linguistic Characteristic Evidenced by a Linguistic Feature is processed (identified, counted, computed) by Automated Algorithm is interpreted (validated, checked) by Human Intuition/Heuristics Methodology for Using Automated Text Analysis Tools

Methodology using Coh-Metrix: Step 1 Linguistic Features that increase cognitive load on the reader: Longer length words Passive voice Logic words Dissimilar sentence structures (syntax) The number of words in a sentence before the main verb (Graesser & McNamara, 2011) Cognitive or Social Construct is construed by Linguistic Characteristic Evidenced by a Linguistic Feature is processed (identified, counted, computed) by Automated Algorithm is interpreted (validated, checked) by Human Intuition/Heuristics

Methodology using Coh-Metrix: Step 2 Coh-Metrix Available online Can analyze up to 15,000 characters 108 quantitative algorithms from simple (counting words) to complex (latent semantic analysis) Some Algorithms use outside sources (e.g., WordNet, POS taggers, psycholinguistic DBs) Output is 108 numbers Cognitive or Social Construct is construed by Linguistic Characteristic Evidenced by a Linguistic Feature is processed (identified, counted, computed) by Automated Algorithm is interpreted (validated, checked) by Human Intuition/Heuristics

Methodology using Coh-Metrix: Step 3 What do the 108 numbers actually mean? There must be a qualitative step where the researcher analyzes the results using human intuition. (Baker, 2006; Biber, 1998) Cognitive or Social Construct is construed by Linguistic Characteristic Evidenced by a Linguistic Feature is processed (identified, counted, computed) by Automated Algorithm is interpreted (validated, checked) by Human Intuition/Heuristics

Methodology: Potential Issues: Step 1 Construct Validity Researchers must choose the correct analysis tool based on the research questions Example: Readability indices use pure conduit model of analysis. But we now know that readability should be viewed using the Blueprint model. Cognitive or Social Construct is construed by Linguistic Characteristic Evidenced by a Linguistic Feature is processed (identified, counted, computed) by Automated Algorithm is interpreted (validated, checked) by Human Intuition/Heuristics

Methodology: Potential Issues: Step 2 Algorithms are a “black box” 1.Lack of Clarity (might do something unexpected) 2.Incorrect Results (bugs) 3.Technical Limitations (Language Complexity) Cognitive or Social Construct is construed by Linguistic Characteristic Evidenced by a Linguistic Feature is processed (identified, counted, computed) by Automated Algorithm is interpreted (validated, checked) by Human Intuition/Heuristics

Step 2 Issues: Lack of Clarity ❖ Conduit: What is a word? Coh-Metrix documentation doesn't say, but we determined that: ❖ “Marvel’s" is two words ❖ “comic-book-movie“ is one word ❖ Blueprint: Coh-Metrix uses outside sources (e.g. WordNet), but the researcher doesn't see how it is used or how this might affect the results.

Step 2 Issues: Incorrect Results “There were ripples of anticipation — and some anxiety — when Marvel Enterprises announced that Joss Whedon would direct “Marvel’s The Avengers,” the comic-book-movie to end all comic-book-movies, featuring Iron Man, Captain America, the Hulk, Black Widow, Hawkeye and (did I miss anyone?), oh yes, Thor.” (Hornaday, 2013). How many words? How many sentences? Coh-Metrix: 4 sentences with an average of words “Sentence” is an important variable in many Coh-Metrix indices, which brings their validity into question.

Step 2 Issues: Technological Limitations ❖ As an automated text analysis moves more towards a Blueprint Model, the results will be less accurate. ❖ Example: Anaphor Overlap - matching pronouns to the correct noun in the sentence before. ❖ These types of issues may not be obvious to the researcher

Methodology: Potential Issues Step 3 What do the 108 numbers actually mean? A purely quantitative analysis does not provide much insight into the socio-cultural context in which the text was written Cognitive or Social Construct is construed by Linguistic Characteristic Evidenced by a Linguistic Feature is processed (identified, counted, computed) by Automated Algorithm is interpreted (validated, checked) by Human Intuition/Heuristics

Summary ❖ Conduit-Blueprint Model ❖ Coh-Metrix is mostly Conduit, but starts to move towards Blueprint with the use of outside sources ❖ 3-step methodology for automated text analysis ❖ Many issues that a researcher should be aware of

Thank You!