PIER Research Methods Protocol Analysis Module Hua Ai Language Technologies Institute/ PSLC.

Questions? "There was a significant negative correlation between the first and third metrics used to compute a Power score for the partner's conversational contributions and this question's numeric value, and a marginal negative correlation in the case of the second metric.“  What are we supposed to interpret/learn from this statement?

Rosé et al.: Automatic Analysis by machine learning.

What is machine learning? Machine learning is about automatically finding meaningful patterns in data. An example from medical data: a rule predicts who is more likely to have problems with their teeth as they get older.

Why Machine Learning? We use machine-learning products every day: weather forecasts, spelling checkers, automated voice response systems, … It has been successfully applied to many research areas: natural language processing, market analysis, bioinformatics, … Note: search engines use machine learning to personalize search results and suggest related sites or queries.

How does machine learning work? The simplest rule learner learns to predict whatever the most frequent result class is; this is called the majority class. What will the rule be in this case? It will always predict Yes. A slightly more sophisticated rule learner finds the single feature that gives the most information about the result class. What do you think that would be in this case? Outlook: Sunny -> No, Overcast -> Yes, Rainy -> Yes, …
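
A minimal sketch of these two rule learners in Python, on toy play-tennis-style weather data. The data values below are assumed for illustration; this is not the actual tool used in the module.

```python
from collections import Counter

# Toy weather data: each instance is (Outlook, Play?). Values are illustrative,
# chosen to match the rules shown on the slide.
data = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "Yes"), ("Sunny", "No"),
    ("Overcast", "Yes"), ("Rainy", "Yes"),
]

# Majority-class rule: always predict the most frequent class in the data.
majority_class = Counter(label for _, label in data).most_common(1)[0][0]
print("Majority-class rule predicts:", majority_class)  # -> Yes

# One-feature rule: for each value of Outlook, predict the most common
# class observed with that value.
rule = {}
for value in set(outlook for outlook, _ in data):
    labels = [label for outlook, label in data if outlook == value]
    rule[value] = Counter(labels).most_common(1)[0][0]
print("Outlook rule:", rule)  # e.g. {'Sunny': 'No', 'Overcast': 'Yes', 'Rainy': 'Yes'}
```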

What is machine learning? Automatically or semi-automatically: inducing concepts (i.e., rules) from data, finding patterns in data, explaining data, making predictions. (Diagram: Data -> Learning Algorithm -> Model; Model + New Data -> Classification Engine -> Prediction.)

What will be the prediction? Model: Outlook: Sunny -> No, Overcast -> Yes, Rainy -> Yes. (Diagram: the model is applied to a new data instance; the prediction shown is Yes.)
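
Continuing the toy sketch above: applying the learned Outlook rule to a new instance gives the prediction. The particular new instance on the slide is not specified, so the value below is assumed.

```python
# Apply the learned rule (from the sketch above) to a new, hypothetical instance.
new_instance = {"Outlook": "Overcast"}        # assumed value for illustration
prediction = rule[new_instance["Outlook"]]    # the "classification engine" step
print(prediction)                             # -> Yes
```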

Terminology. Concept: the rule you want to learn. Instance: one data point from your training or testing data (a row in the table). Attribute: one of the features that an instance is composed of (a column in the table). The class is the attribute whose predicted value we compute.
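
In terms of the toy weather data above, the terminology maps onto code roughly as follows; the attribute names here are assumed for illustration.

```python
# One instance (a row in the table); each key is an attribute (a column).
instance = {
    "Outlook": "Sunny",     # attribute
    "Temperature": "Hot",   # attribute
    "Humidity": "High",     # attribute
    "Play": "No",           # class attribute: the value we try to predict
}
# The concept is the rule the learner induces, e.g. Outlook=Sunny -> Play=No.
```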

Task: assign labels to a collaborative learning corpus using Weinberger and Fischer's coding scheme, i.e., a text classification task.

Two approach categories. The feature-based approach: basic features, thread-structure features (e.g., depth), FSM features, and LIWC (Linguistic Inquiry and Word Count) features. The algorithm-based approach: cascaded binary classification, confidence-restricted cascaded binary classification, and supervised/unsupervised variants.
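
The slides do not spell out how the cascade is built; below is a minimal, assumed sketch of one way a (confidence-restricted) cascaded binary classification setup can look for text coding, using scikit-learn as a stand-in for the actual tooling. The label names, stage ordering, and threshold are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def train_cascade(texts, labels, ordered_codes):
    """Train one binary classifier per code, in a fixed order."""
    vectorizer = CountVectorizer()          # basic unigram features
    X = vectorizer.fit_transform(texts)
    stages = []
    for code in ordered_codes:
        y = [1 if lab == code else 0 for lab in labels]
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        stages.append((code, clf))
    return vectorizer, stages

def predict_cascade(text, vectorizer, stages, threshold=0.5):
    """Walk the cascade; the first sufficiently confident stage assigns its code."""
    x = vectorizer.transform([text])
    for code, clf in stages:
        if clf.predict_proba(x)[0, 1] >= threshold:   # confidence restriction
            return code
    return "other"                                    # fallback code
```

In a confidence-restricted cascade of this kind, an utterance only receives a code when some stage is sufficiently confident; otherwise it falls through to a default.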

Methodology issues related to automatic corpus analysis. Validity: whether the automatic coding accomplished by the computer captures the human analysts' intention. Reliability: how faithfully the automatic codes match those of the human experts. Efficiency: are we saving time by using the automatic classifier?
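
Reliability between automatic and human codes is commonly reported with an agreement statistic such as Cohen's kappa; the slides do not name the statistic used here, so the example below is an assumed illustration with hypothetical labels.

```python
from sklearn.metrics import cohen_kappa_score

human_codes     = ["A", "B", "B", "C", "A", "B"]   # hypothetical expert labels
automatic_codes = ["A", "B", "C", "C", "A", "B"]   # hypothetical classifier output

# kappa = (observed agreement - chance agreement) / (1 - chance agreement)
kappa = cohen_kappa_score(human_codes, automatic_codes)
print(f"Cohen's kappa: {kappa:.2f}")
```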

Ai et al.: Manual Analysis by statistical tests. Interaction between the tutor's social behaviors and the tutor's bias towards one student's stance or the other. Topic modeling at the sub-dialog level instead of the utterance level.

Study design: 3×3. Social: high, low, none. Goal match: yes, no, neutral.

The ccLDA model. (Diagram: a Green collection and a Power collection, each with Topic 1, Topic 2, and Topic 3.) A topic is a distribution over words. We computed a score for each utterance based on the words it contains; this score indicates to what extent the utterance is biased towards a given topic.
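
The scoring formula is not given on the slide; one minimal, assumed way to compute such an utterance-level score is to average the log-probabilities that a topic's word distribution assigns to the utterance's words.

```python
import math

def topic_score(utterance, topic_word_probs, floor=1e-6):
    """Average log-probability of the utterance's words under one topic.

    topic_word_probs: dict mapping word -> P(word | topic), e.g. taken from
    a trained (cc)LDA model. Higher scores mean the utterance is more
    biased towards this topic.
    """
    words = utterance.lower().split()
    if not words:
        return float("-inf")
    logs = [math.log(topic_word_probs.get(w, floor)) for w in words]
    return sum(logs) / len(words)

# Hypothetical topic distribution and utterance, for illustration only:
power_topic = {"power": 0.05, "plant": 0.03, "energy": 0.04, "green": 0.001}
print(topic_score("green energy is the future", power_topic))
```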

Results. Learning gain: students learned most in the Low-social + Yes-matched condition. Perception (questionnaire & topic detection): an effect of the Goal-match manipulation. Conversational data: more social turns in the social conditions, more off-task turns in the non-social condition, and more jokes about the tutor in the high-social condition.

Important message: Understanding the data is important. Design good features, both in automatic and in manual analysis.