What Sociolinguistics and Machine Learning Have to Say to One Another about Interaction Analysis Carolyn Penstein Rosé Language Technologies Institute.

Presentation transcript:

What Sociolinguistics and Machine Learning Have to Say to One Another about Interaction Analysis Carolyn Penstein Rosé Language Technologies Institute and Human-Computer Interaction Institute Carnegie Mellon University

No easy answers….

Outline Why should we care about Interaction Analysis? Caveats from Applied Machine Learning Sociolinguistic view of Interaction analysis Modeling Sociolinguistics with Machine Learning Remaining Tension: Communities in Conversation

Social Media Analysis Personalization Sentiment Analysis/Opinion Mining Sarcasm detection Bias detection Lie detection Analysis of Bullying Analysis of social support

Impression Management
o Whereas some information is given intentionally (i.e., communicated by the speaker), other information is given off (i.e., expressed) unintentionally (Goffman, 1979)
o What details about a person's communication give us an impression?

Typical Social Media Analysis Approach: Non-Conversational
o Typical paradigm for sentiment analysis of product reviews:
  o Make a prediction based on the text of single reviews taken out of context
o Some evidence of group effects in product review blogs based on numerical ratings (Wu et al., 2008)
o KEY ASSUMPTION: language is a reflection of the speaker's perspective … but is it only the speaker?

Are product reviews conversational?

Are product reviews conversational?
After many MANY weeks of research, gathering information from several sites, reviews etc I decided that the Britax Boulevard was definitely the safest bet available on the market. The things that sold me : All the safety gadgets that other seats don't have like the side impact wings, the HUGS system, the LATCH system and 5 point harness and also the fact that it lasts up to 29Kg.

Are product reviews conversational?
I did most of my research on the net, picking my top 3 choices I went and had a look at them in the shops. I looked at one the Graco Comfort Sport, the Britax Boulevard and the Decathlon and Marathon seats. By far it seems that Britax have the upper hand safely wise on the market, many professional reviews and crash tests agree on this so Britax was the clear choice for us.

Are product reviews conversational?

All Language is Conversational

Outline Why should we care about Interaction Analysis? Caveats from Applied Machine Learning Sociolinguistic view of Interaction analysis Modeling Sociolinguistics with Machine Learning Remaining Tension: Communities in Conversation

Machine Learning Myth

Credo of Applied Machine Learning
o Machine learning isn't magic
o But it can be useful for identifying meaningful patterns in your data when used properly
o Proper use requires insight into your data

What information are we throwing away or ignoring that would allow us to distinguish meaningful variation from meaningless variation?

What can't you conclude from bag of words representations?
o Causality: X caused Y versus Y caused X
o Roles and Mood:
  o Which person ate the food that I prepared this morning and drives the big car in front of my cat
  o The person, which prepared food that my cat and I ate this morning, drives in front of the big car.
o Who's driving, who's eating, and who's preparing food?
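
The point about roles can be checked directly. The following is a minimal sketch, assuming a simple tokenizer (lowercasing and keeping only alphabetic tokens) that is not taken from the slides; the helper name `bag_of_words` is hypothetical. It shows that the two example sentences collapse to the same bag of words, so no model built on that representation alone can tell them apart.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, keep alphabetic tokens, and count them.

    Word order is discarded, which is exactly the information loss
    the slide is pointing at. (Tokenizer choice is an assumption.)
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


s1 = ("Which person ate the food that I prepared this morning "
      "and drives the big car in front of my cat")
s2 = ("The person, which prepared food that my cat and I ate "
      "this morning, drives in front of the big car.")

# Compare the two unordered word-count representations.
same_bag = bag_of_words(s1) == bag_of_words(s2)
print(same_bag)
```

Any representation that keeps order or syntactic structure (n-grams, parses, or sequence patterns) would separate the two sentences, at the cost of a larger feature space.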

Example related to sentiment: The function of frankly…
o A: I tell you frankly you're a swine.
o B: Frankly, you're a swine.
o C: John told Bill frankly that he was a swine.
(Levinson, 1983)
o Same propositional content, but "frankly" is not functioning the same way in all of these examples.
o In A and C it modifies the telling event, but in B it's a warning that something negative is coming.
o What does this tell us about using words as evidence in pragmatics-oriented interpretation?

Understand Your Data

Are we missing something? Sociolinguists and Discourse Analysts have been studying social aspects of language since the 20s and 30s!!!

Displayed Bias as a Reflection of Both Projected Speaker and Assumed Hearer
Dong Nguyen, Elijah Mayfield, & Carolyn Rosé (2010). An analysis of perspectives in interactive settings. Proceedings of the KDD Workshop on Social Media Analytics.

Perspective from Rhetoric
o Projected author: Communication style is a projection of identity
  o Impression management, not necessarily the ground truth
o Assumed reader: What we assume about who is listening
  o Real assumptions, possibly incorrect
  o What we want recipients or overhearers to think are our assumptions
o Actual reader: may or may not understand the text the way it was intended
[Diagram labels: Author, Implied Author, Implied Reader, Text, Effect, Reader]

Bias Estimation
o Start with an LDA model (with 15 topics) of a politics discussion forum dataset
o Separate texts into two collections, one left affiliated and one right affiliated
o We then have a Left model and a Right model
o Compute a rank for each word w in each topic t in each model
o Intuition: a word is more distinguishing for a particular point of view if it has a high probability within the associated model and a low probability in the opposite model
o Bias(w,t) = log(rank_right(w,t) + 1) - log(rank_left(w,t) + 1)
o The bias of a text is the average bias over the terms within the text
o Left scores positive, right scores negative
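
The scoring step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word probabilities are invented toy values for a single topic, ranks are assumed to start at 0 for the most probable word, words missing from either model are skipped, and the helper names (`ranks`, `word_bias`, `text_bias`) are hypothetical. The slide does not say how a term's topic is chosen when averaging over a text, so the sketch uses one topic throughout.

```python
import math


def ranks(word_probs):
    """Map each word to its rank within one topic (0 = most probable).

    Starting ranks at 0 is an assumption; the slide only says 'rank'.
    """
    ordered = sorted(word_probs, key=word_probs.get, reverse=True)
    return {word: i for i, word in enumerate(ordered)}


def word_bias(word, rank_left, rank_right):
    """Bias(w,t) = log(rank_right + 1) - log(rank_left + 1).

    A word ranked high in the Left model and low in the Right model
    gets a positive score, matching 'Left scores positive'.
    """
    return math.log(rank_right[word] + 1) - math.log(rank_left[word] + 1)


def text_bias(tokens, rank_left, rank_right):
    """Average word bias over the tokens known to both models."""
    scored = [word_bias(w, rank_left, rank_right)
              for w in tokens if w in rank_left and w in rank_right]
    return sum(scored) / len(scored) if scored else 0.0


# Toy per-topic word probabilities (invented for illustration only).
left_topic = {"civilians": 0.50, "war": 0.30, "threat": 0.15, "evil": 0.05}
right_topic = {"threat": 0.50, "evil": 0.30, "war": 0.15, "civilians": 0.05}

rank_left, rank_right = ranks(left_topic), ranks(right_topic)
print(word_bias("civilians", rank_left, rank_right))        # positive: left-leaning word
print(text_bias(["threat", "evil"], rank_left, rank_right))  # negative: right-leaning text
```

Using ranks rather than raw probabilities makes the two models comparable even when their probability mass is distributed differently, and the logarithm dampens differences far down the ranking.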

Qualitative Analysis
o Terror Language (Right): evokes emotional response to threat of attack. Defines target as evil and as a threat. Provokes a defensive posture.
o Imperialist rhetoric (Right): racial prejudice, attitude of superiority.
o Web of concern (Left): focus on opposition as individuals with a culture and history, concern for wellbeing of all people, focus on potential negative effects of war

Quantitative Analysis
[Figure labels: Right Bias, Left Bias, Quoted Message, Post]
o Score of poster
o Score of quoted message
o Score of full post
o Score of words that appear in both messages
o Score of words that appear only in quoted message
o Score of words that appear only in the post

Investigation of Quoting behavior

Investigation of Quoting behavior: Which words are quoted?
by pointing out the inflation of Saddam's body count by neocons in an effort to further vilify him and thus further justify our invasion we are not DEFENDING saddam....just pointing out how neocons rarely let facts get in the way of a good war.
So wait, how many do you think Saddam killed or oppressed? You're trying to make him look better than he actually was. You're the one inflating the casualties we've caused! Seriously, what estimates (with a link) are there that we've killed over 100,000 civilians. Not some crack pot geocities page either.

o Negative correlation between the score of words only in the quoted message and the score of words only in the post (r=-0.1, p < 0.05)
o Positive correlation between the score of quoted words and the score of the whole post (r=0.18, p < 0.02)
o Scores of words only in the post are significantly more reflective of the affiliation of the poster than that of the author of the quoted message
  o Similar result for the score of words only in the quote with the affiliation of the author of the quoted message

Overview of Findings
Evidence that both projected author and assumed hearer are reflected in our lexical choices:
o Quotes from opposite point of view include the words that are less strongly associated with the opposite perspective
o Because of quotes, displayed bias shifts towards the bias of the person to whom the message is directed
o Personal bias of the speaker is most strongly represented by non-quoted portions of text

Outline Why should we care about Interaction Analysis? Caveats from Applied Machine Learning Sociolinguistic view of Interaction analysis Modeling Sociolinguistics with Machine Learning Remaining Tension: Communities in Conversation

Discourse and Identity
o Identity is reflected in the way we present ourselves in conversational interactions
  o Reflects who we are, how we think, and where we belong
  o Also reflects how we think of our audience
o Examples
  o Regional dialect: shows my identification with where I am from, but also shows I am comfortable letting you identify me that way
  o Jargon and technical terms: shows my identification with a work community, but also shows I expect you to be able to relate to that part of my life
  o Level of formality: shows where we stand in relation to one another
  o Explicitness in reference: shows whether I am treating you like an insider or an outsider

Systemic Functional Linguistics Discourse analysis employs the tools of grammarians to identify the roles of wordings in passages of text, and employs the tools of social theorists to explain why they make the meanings they do. (Martin & White, 2005) What do form-function correspondences look like?

Engagement: Social positioning in conversational style
o The message: Most contributions express some content
o Projected author: How I phrase it says something about my stance with respect to that content
o Assumed reader: Also says something about what I assume is your stance and my stance in relation to you
o Actual reader: The hearer may respond either to the message or its positioning
[Diagram labels: Author, Implied Author, Implied Reader, Text, Effect, Reader]

The Future of Computing?

Heteroglossia (Martin & White, 2005, p. 117)
o System of Engagement
  o Showing openness to the existence of other perspectives
  o Less final / Invites more discussion
o Example:
  o [M] Iron Man is a good movie
  o [HE] I consider Iron Man to be a good movie
  o [HC] There's no denying that Iron Man is a good movie
  o [NA] Is Iron Man a good movie?

Line | Text                                                            | Authority | Heterog.
1    | Stark: Give me an exploded view.                                | A2        | M
2    | Jarvis: The compression in cylinder three appears to be low.    | K1        | HE
3    | Stark: Log that.                                                | A2        | M
4    | Stark: I'm gonna try again, right now.                          | A1        | M
5    | Stark: Hey, Butterfingers, come here.                           | A2        | M
6    | Stark: What's all this stuff doing on top of my desk?           | K2        | NA

Towards evaluating the quality of futuristic human-computer interaction paradigms…
Jarvis: Test complete. Preparing to power down and begin diagnostics.
Stark: Yeah. Tell you what. Do a weather and ATC check.
Stark: Start listening in on ground control.
Jarvis: Sir, there are still terabytes of calculations needed before an actual flight is...
Stark: Jarvis! Sometimes you got to run before you can walk. [HC]
(Iron Man Film Script, 59:10)
o Usability Heuristic: Good feedback
o Usability Heuristic: Avoiding errors???

Outline Why should we care about Interaction Analysis? Caveats from Applied Machine Learning Sociolinguistic view of Interaction analysis Modeling Sociolinguistics with Machine Learning Remaining Tension: Communities in Conversation

[Diagram labels: Theory, Interpretation, Research Questions, Patterns, Data, Methodology]

Blogging!

Blog Authorship: Male or Female?

Stretchy Patterns (Gianfortoni, Adamson, & Rosé, 2011)
o A sequence of 1 to 6 categories
o May include GAPs
  o GAP can cover any symbol
  o GAP+ may cover any number of symbols
o Must not begin or end with a GAP
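
The definition above can be illustrated with a small matcher. This is a sketch under stated assumptions rather than the published implementation: the category labels are invented, GAP is taken to cover exactly one symbol, GAP+ is assumed to cover one or more symbols with no upper bound, and the function names are hypothetical.

```python
GAP, GAP_PLUS = "GAP", "GAP+"


def is_valid_pattern(pattern):
    """A pattern has 1 to 6 elements and neither begins nor ends with a gap."""
    if not 1 <= len(pattern) <= 6:
        return False
    gaps = (GAP, GAP_PLUS)
    return pattern[0] not in gaps and pattern[-1] not in gaps


def matches_at(pattern, symbols, i=0, j=0):
    """True if pattern[j:] matches the symbols starting exactly at position i."""
    if j == len(pattern):
        return True
    if i >= len(symbols):
        return False
    element = pattern[j]
    if element == GAP:
        # GAP skips exactly one symbol, whatever it is.
        return matches_at(pattern, symbols, i + 1, j + 1)
    if element == GAP_PLUS:
        # Assumption: GAP+ skips one or more symbols.
        return any(matches_at(pattern, symbols, k, j + 1)
                   for k in range(i + 1, len(symbols) + 1))
    return symbols[i] == element and matches_at(pattern, symbols, i + 1, j + 1)


def contains(pattern, symbols):
    """True if a valid pattern matches anywhere in the symbol sequence."""
    return is_valid_pattern(pattern) and any(
        matches_at(pattern, symbols, start) for start in range(len(symbols)))


# Hypothetical category sequence for one sentence.
symbols = ["PRONOUN", "VERB", "ADV", "ADJ", "NOUN"]
print(contains(["PRONOUN", "GAP", "ADV"], symbols))   # one symbol skipped
print(contains(["VERB", "GAP+", "NOUN"], symbols))    # two symbols skipped
print(contains(["VERB", "GAP", "NOUN"], symbols))     # one skip is not enough
```

Each pattern that occurs in a text can then serve as a binary feature, which is how gapped patterns generalize beyond fixed n-grams: the gap absorbs topic-specific words while the surrounding categories capture style.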

Evaluation of Domain Generality
o Contrast random CV and leave-one-occupation-out CV
o All feature space representations show a significant drop between random CV and leave-one-occupation-out CV
o Only stretchy patterns remain significantly above random performance
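
The contrast between the two evaluation schemes comes down to how folds are formed. The sketch below shows leave-one-occupation-out splitting in plain Python; the occupation labels are invented, and the helper name `leave_one_group_out` is hypothetical. In random CV, blogs from the same occupation land in both train and test sets, so a model can score well by learning occupation vocabulary; here each test fold contains an occupation never seen in training.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices), one fold per group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# One occupation label per blog (toy data for illustration only).
occupations = ["teacher", "engineer", "teacher", "artist", "engineer", "artist"]

folds = list(leave_one_group_out(occupations))
for held_out, train, test in folds:
    print(held_out, train, test)
```

A large drop from random CV to this scheme suggests the features encode the domain (occupation) rather than the target (gender), which is the pattern the slide reports for most feature spaces.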

Evaluation of Learning Efficiency
o Train and test on sampling across all occupations
o Always test on the same set
o Training sets vary by size
o No significant differences in performance with smallest training set
o Significant advantage for Stretchy Patterns at all other training set sizes

Does that mean we succeeded in modeling gender?

[Diagram labels: Theory, Interpretation, Research Questions, Patterns, Data, Methodology]

What did we learn about gender and blogging?
[Figure labels: Female Patterns, Male Patterns]

Outline Why should we care about Interaction Analysis? Caveats from Applied Machine Learning Sociolinguistic view of Interaction analysis Modeling Sociolinguistics with Machine Learning Remaining Tension: Communities in Conversation

Controversy over the nature of identity
Variationist sociolinguistics (Positivism):
o Identity is a function of social categories like gender, ethnicity, etc.
o Makes sense to study with a quantitative methodology
Interactional sociolinguistics (Constructivism):
o Identity is highly individual and constructed in the moment
o Makes sense to study with a constructivist/qualitative methodology
Methodology reflects our assumptions about the nature of what we are studying.
Is a machine learning approach inherently variationist?

Conclusions
o All language analysis is interaction analysis
o The fields of Discourse Analysis and Sociolinguistics challenge the assumptions behind our approaches
o Machine learning is only part of the process of understanding interaction
o We're left with difficult tensions between competing research paradigms
o What can we do: Strive to Understand our Data!!!

Interest in Collaboration?

Questions? Carolyn Penstein Rosé,