ACM corpus annotation analysis Andrew Rosenberg 2/26/2004
2 Overview Motivation Corpus Description Kappa Shortcomings Kappa Augmentation Classification of messages Corpus annotation analysis Next step: Sharpening method Summary
3 Motivation The ACM corpus annotation raises two problems. –By allowing annotators to assign a message one or two labels, there is no clear way to calculate an annotation statistic. An augmentation to the kappa statistic is proposed. –Interannotator reliability is low (K < 0.3). Annotator reeducation and/or redesign of the annotation materials are most likely necessary. Available annotated data can, hypothetically, be used to improve category assignment.
4 Corpus Description 312 messages exchanged among members of the Columbia chapter of the ACM. Annotated by 2 annotators with one or two of the following 10 labels: –question, answer, broadcast, attachment transmission, planning, planning scheduling, planning-meeting scheduling, action item, technical discussion, social chat
5 Kappa Shortcomings Before running ML procedures, we need confidence in assigning labels to the messages. In order to compute kappa, K = (p(A) - p(E)) / (1 - p(E)), we need to count up the number of agreements (p(A) is the observed agreement, p(E) the agreement expected by chance). How do you determine agreement with an optional secondary label? –Ignore the secondary label?
6 Kappa Shortcomings (ctd.) Ignoring the secondary label isn't acceptable for two reasons. –It is inconsistent with the annotation guidelines. –It ignores partial agreements: {a} vs. {b,a}: singleton matches secondary; {a,b} vs. {c,a}: primary matches secondary; {a,b} vs. {c,b}: secondary matches secondary; {a,b} vs. {b,a}: secondary matches primary, and vice versa. Note: The purpose is not to inflate the kappa value, but to accurately assess the data.
7 Kappa Augmentation When a labeler employs a secondary label, consider it a single annotation divided between two categories. Select a value of p, where 0.5 ≤ p ≤ 1.0, based on how heavily to weight the secondary label: –Singleton annotations are assigned a score of 1.0 –Primary label: p –Secondary label: 1-p
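A minimal sketch of this weighting rule in Python; the function name and data layout are illustrative, not part of the original annotation tooling:

    def weighted_vector(labels, categories, p=0.6):
        # Convert one annotator's label(s) for one message into a weighted
        # category vector: singleton -> 1.0, primary -> p, secondary -> 1 - p.
        vec = {c: 0.0 for c in categories}
        if len(labels) == 1:
            vec[labels[0]] = 1.0
        else:
            primary, secondary = labels
            vec[primary] = p
            vec[secondary] = 1.0 - p
        return vec

    # e.g. weighted_vector(["a", "b"], ["a", "b", "c", "d"])
    #   -> {"a": 0.6, "b": 0.4, "c": 0.0, "d": 0.0}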
8 Kappa Augmentation example
Annotator labels:
Message | Judge A | Judge B
1 | a,b | b,d
2 | b,a | a,b
3 | b | b
4 | c | a,d
5 | b,c | c
Annotation matrices for Judge A and Judge B over categories a, b, c, d (with totals), computed with p = 0.6
9 Kappa Augmentation example (ctd.)
Agreement matrix and annotation matrices for Judge A and Judge B over categories a, b, c, d (with totals).
10 Kappa Augmentation example (ctd.) To calculate p(E), use the relative frequencies of each annotator's label usage: for each category (a, b, c, d), multiply Judge A's relative frequency by Judge B's, then sum the products. Here p(E) = 0.312. Kappa is then computed as originally: K' = (p(A) - p(E)) / (1 - p(E)).
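As a sanity check, a short Python sketch (all names are illustrative) that reproduces p(E) = 0.312 for the five-message example above with p = 0.6 and applies the usual kappa formula once an observed agreement p(A) is supplied:

    # Labels from the five-message example above (Judge A, Judge B), categories a-d.
    A = [("a", "b"), ("b", "a"), ("b",), ("c",), ("b", "c")]
    B = [("b", "d"), ("a", "b"), ("b",), ("a", "d"), ("c",)]
    CATS = ["a", "b", "c", "d"]

    def weights(labels, p=0.6):
        # Singleton -> 1.0; otherwise primary -> p, secondary -> 1 - p.
        return {labels[0]: 1.0} if len(labels) == 1 else {labels[0]: p, labels[1]: 1.0 - p}

    def category_totals(annotations, p=0.6):
        # Column totals of one judge's weighted annotation matrix.
        totals = {c: 0.0 for c in CATS}
        for labels in annotations:
            for c, w in weights(labels, p).items():
                totals[c] += w
        return totals

    def expected_agreement(a_ann, b_ann, p=0.6):
        # p(E): sum over categories of the product of the judges' relative label frequencies.
        n = len(a_ann)
        ta, tb = category_totals(a_ann, p), category_totals(b_ann, p)
        return sum((ta[c] / n) * (tb[c] / n) for c in CATS)

    def kappa(p_obs, p_exp):
        # Kappa in its usual form: (p(A) - p(E)) / (1 - p(E)).
        return (p_obs - p_exp) / (1.0 - p_exp)

    print(round(expected_agreement(A, B, p=0.6), 3))  # -> 0.312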
11 Classification of messages This augmentation allows us to classify messages based on their individual kappa' values at different values of p. –Class 1: high kappa' at all values of p. –Class 2: low kappa' at all values of p. –Class 3: high kappa' only at p = 1.0. –Class 4: high kappa' only at p = 0.5. Note: mathematically kappa' needn't be monotonic w.r.t. p, but with 2 annotators it is.
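One possible way to code this four-way classification, assuming per-message kappa' values at p = 0.5 and p = 1.0 are already available; the 0.67 cutoff for "high" is purely illustrative:

    def classify_message(kappa_at_half, kappa_at_one, high=0.67):
        # Class 1: high at both p values; Class 2: low at both;
        # Class 3: low at p=0.5 but high at p=1.0; Class 4: the reverse.
        high_half = kappa_at_half >= high
        high_one = kappa_at_one >= high
        if high_half and high_one:
            return 1
        if not high_half and not high_one:
            return 2
        return 3 if high_one else 4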
12 Corpus Annotation Analysis Agreement is low at all values of p –K’(p=1.0) = –K’(p=0.5) = Other views of the data will provide some insight into how to revise the annotation scheme. –Category distribution –Category co-occurrence –Category confusion –Class distribution –Category by class distribution
13 Corpus Annotation Analysis: Category Distribution
Category | total | gr | db
Question | | |
Answer | | |
Broadcast | | |
Attachment Transmission | 3 | 1 | 2
Planning Meeting Scheduling | | |
Planning Scheduling | 27 | 22 | 5
Planning | | |
Action Item | 19 | 10 | 9
Technical Discussion | 31 | 22 | 9
Social Chat | 36 | 29 | 7
14 Corpus Annotation Analysis: Category Co-occurrence
Category | Q | A | B | A.T. | P.M.S. | P.S. | P. | A.I. | T.D. | S.C.
Question | x | | | | | | | | |
Answer | x | x | | | | | | | |
Broadcast | x | x | x | | | | | | |
Attachment Transmission | x | x | x | x | | | | | |
Planning Meeting Scheduling | x | x | x | x | x | 2 | 1 | 0 | 0 | 0
Planning Scheduling | x | x | x | x | x | x | 0 | 0 | 0 | 0
Planning | x | x | x | x | x | x | x | 3 | 2 | 0
Action Item | x | x | x | x | x | x | x | x | 1 | 0
Technical Discussion | x | x | x | x | x | x | x | x | x | 1
Social Chat | x | x | x | x | x | x | x | x | x | x
15 Corpus Annotation Analysis: Category Confusion
Category | Q | A | B | A.T. | P.M.S. | P.S. | P. | A.I. | T.D. | S.C.
Question | | | | | | | | | |
Answer | x | | | | | | | | |
Broadcast | x | x | | | | | | | |
Attachment Transmission | x | x | x | | | | | | |
Planning Meeting Scheduling | x | x | x | x | | | | | |
Planning Scheduling | x | x | x | x | x | 2 | 4 | 1 | 1 | 0
Planning | x | x | x | x | x | x | 7 | 5 | 5 | 0
Action Item | x | x | x | x | x | x | x | 1 | 2 | 1
Technical Discussion | x | x | x | x | x | x | x | x | 2 | 1
Social Chat | x | x | x | x | x | x | x | x | x | 4
16 Corpus Annotation Analysis: Class Distribution
Constant High (Class 1):
Constant Low (Class 2):
Low to High (Class 3):
High to Low (Class 4):
Total Messages: 312
17 Corpus Annotation Analysis: Category by Class Distribution (1/2)
Number of messages in each category, by class: Class 1 (constant high) and Class 2 (constant low).
18 Corpus Annotation Analysis: Category by Class Distribution (2/2)
Number of messages in each category, by class: Class 3 (low to high) and Class 4 (high to low).
19 Next step: Sharpening method In determining interannotator agreement with kappa, etc., two available pieces of information are overlooked: –Some annotators are “better” than others –Some messages are “easier to label” than others By limiting the contribution of known poor annotators and difficult messages, we gain confidence in the final category assignment of each message. How do we rank annotators? Messages?
20 Sharpening Method (ctd.) Ranking Annotators –Calculate kappa between each annotator and the rest of the group. –"Better" annotators have higher agreement with the group. Ranking messages –Variance (or entropy, -p*log(p)) of the label-count vector summed over annotators. –Messages with high variance (equivalently, low entropy) are more consistently annotated.
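A rough sketch of the message-ranking half in Python, assuming each message is represented as the list of label tuples the annotators assigned it; the helper names are illustrative:

    import math

    def label_counts(per_annotator_labels, categories):
        # Sum each annotator's labels for one message into a single count vector.
        counts = {c: 0.0 for c in categories}
        for labels in per_annotator_labels:
            for c in labels:
                counts[c] += 1.0
        return counts

    def label_entropy(counts):
        # -sum p*log(p) over the normalized label-count vector; low entropy
        # (equivalently, high variance of the counts) means the annotators mostly agree.
        total = sum(counts.values())
        probs = [v / total for v in counts.values() if v > 0]
        return -sum(p * math.log(p) for p in probs)

    def rank_messages(messages, categories):
        # Most consistently annotated messages first (lowest entropy).
        return sorted(messages, key=lambda m: label_entropy(label_counts(m, categories)))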
21 Sharpening Method (ctd.) How do we use these ranks? –Weight the annotators based on their rank. –Recompute the message matrix with weighted annotator contributions. –Weight the messages based on their rank. –Recompute the kappa values with weighted message contributions. –Repeat these steps until the change in the weights falls below a threshold.
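A skeleton of this loop; the recomputation and weighting steps are left as placeholder callables since they are not specified here, and all names are illustrative:

    def sharpen(messages, annotators, recompute_kappas, weight_annotators, weight_messages,
                tol=1e-3, max_iter=50):
        # Start with uniform weights, then alternate: recompute the kappa values with the
        # current weights, derive new annotator and message weights from the resulting
        # ranks, and stop once the largest change in any weight falls below the threshold.
        ann_w = {a: 1.0 for a in annotators}
        msg_w = {m: 1.0 for m in messages}
        kappas = None
        for _ in range(max_iter):
            kappas = recompute_kappas(messages, ann_w, msg_w)
            new_ann_w = weight_annotators(kappas)
            new_msg_w = weight_messages(kappas)
            change = max(
                [abs(new_ann_w[a] - ann_w[a]) for a in annotators]
                + [abs(new_msg_w[m] - msg_w[m]) for m in messages]
            )
            ann_w, msg_w = new_ann_w, new_msg_w
            if change < tol:
                break
        return ann_w, msg_w, kappas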
22 Summary The ACM corpus annotation raises two problems. –By allowing annotators to assign a message one or two labels, there is no clear way to calculate an annotation statistic. An augmentation to the kappa statistic is proposed. –Interannotator reliability is low (K < 0.3). Annotator reeducation and/or redesign of the annotation materials are most likely necessary. Available annotated data can, hypothetically, be used to improve category assignment.