Cultural Factors in the Regression of Non-verbal Communication Perception
Tim Sheerman-Chase, Eng-Jon Ong and Richard Bowden
CVSSP, University of Surrey, UK
Workshop on Human Interaction in Computer Vision
ICCV 2011, 12 November 2011, Barcelona
Introduction
- State of NVC and data annotation
- TwoTalk Corpus
- Data annotation using crowdsourcing
- Automatic regression of NVC
- Feature extraction
- Testing and performance
- Future work
Background
- Non-verbal communication (NVC) in HCI: useful for novel interfaces
- Most emotion/NVC datasets are acted
- Difficulty in processing naturalistic data
- Annotation is time consuming and tedious
- A single or limited number of annotators
- Single cultural viewpoint
Background
- Difference between acted/posed and naturalistic behaviour
- Cultural differences in expressing and perceiving NVC/emotions
- (Diagram: cultural encoding and decoding rules)
TwoTalk Corpus
- Aim: minimum constraints on natural conversation
- Two PAL cameras, two participants seated opposite each other across a table
- 4 conversations, 12 minutes each
- Selected 527 clips, 37 min total
Questionnaire
- Categorical vs. continuous; exemplars vs. abstraction
- How it is presented; cultural impact
- Commonly seen NVC signals: agreeing, thinking, questioning, understanding
Crowdsourcing
- Suitable for large tasks that can be split into simple steps
- Web based, usually run in the browser
- Motivation by money, altruism or challenge
- Quality control
- Platforms: Crowdflower, Mechanical Turk (Amazon), Samasource; different demographics
Annotation Results
- 711 annotators, 79,130 questions answered
- Annotations are sparse
- Three main cultures identified by IP address: India, Kenya and UK
- Many annotators did not cooperate; random results need to be removed
Annotation Quality
- Uncooperative workers: erroneous work may be rejected
- Prevention: no way to pre-screen workers; workers are almost anonymous (apart from IP address and timing)
- Sanity questions
- Filtering results, during work or afterwards
Annotation Filtering
- Cooperative annotators: Pearson's correlation with some ideal standard
- Correlation: 1 (or -1) is perfect correlation, 0 is uncorrelated
- Use the mode within a culture to find a robust consensus
- Remove annotators below 0.2 correlation, then take the mean of the remaining annotators

ρ(X, Y) = cov(X, Y) / (σ_X σ_Y)

where cov(X, Y) is the covariance of X and Y, and σ_X and σ_Y are the population standard deviations.
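A minimal sketch of this filtering step in Python (the data layout and function names are illustrative assumptions, not the authors' code): ratings are held as clip-to-score mappings, the consensus is the per-clip mode within a culture, and annotators correlating below 0.2 with that consensus are dropped.

```python
import numpy as np
from scipy import stats

def culture_mode_consensus(ratings_by_clip):
    """ratings_by_clip: dict mapping clip id -> list of ratings given by
    annotators of one culture. Returns the per-clip mode consensus."""
    return {clip: stats.mode(np.asarray(r), keepdims=False).mode
            for clip, r in ratings_by_clip.items()}

def annotator_correlation(annotator_ratings, consensus):
    """Pearson correlation of one annotator with the consensus, computed
    over the clips that annotator actually rated (annotations are sparse)."""
    clips = [c for c in annotator_ratings if c in consensus]
    if len(clips) < 2:
        return float("nan")
    x = np.asarray([annotator_ratings[c] for c in clips], dtype=float)
    y = np.asarray([consensus[c] for c in clips], dtype=float)
    return stats.pearsonr(x, y)[0]

def filter_annotators(all_annotators, consensus, threshold=0.2):
    """Keep only annotators whose correlation with their own-culture mode
    consensus reaches the 0.2 threshold used in the talk; the final label
    per clip is then the mean over the remaining annotators."""
    return {name: r for name, r in all_annotators.items()
            if annotator_correlation(r, consensus) >= threshold}
```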
Frequency of Correlation
- Correlation with own-culture mode consensus (figure)
- Discard annotators with correlation < 0.2
Cultural Patterns in Annotation
- Check for cultural differences in annotation
- For each culture, for each clip: concatenate the culture consensus into one 4D vector
- Flatten the space to 2D using a Sammon mapping, which attempts to preserve inter-point distances
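Sammon mapping has no standard implementation in the common Python libraries, so the sketch below minimises the Sammon stress directly with scipy; it is illustrative only, not the projection code used for the paper. X is assumed to hold one concatenated 4D consensus vector per clip and culture.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def sammon_2d(X, seed=0, max_iter=500):
    """Project X (one row per clip-and-culture consensus vector) to 2D by
    minimising Sammon stress, which weights errors on small distances more
    heavily and so tends to preserve local structure."""
    d_star = pdist(X)                      # input-space distances
    d_star[d_star == 0] = 1e-12            # guard against identical rows
    scale = d_star.sum()
    rng = np.random.default_rng(seed)
    y0 = rng.normal(size=X.shape[0] * 2)   # random 2D initialisation

    def stress(y):
        d = pdist(y.reshape(-1, 2))        # 2D output-space distances
        return np.sum((d_star - d) ** 2 / d_star) / scale

    res = minimize(stress, y0, method="L-BFGS-B",
                   options={"maxiter": max_iter})
    return res.x.reshape(-1, 2)
```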
Cultural Patterns in Annotation
- Cultures occupy different areas of the space, caused either by differences in perceiving NVC or by differences in using the questionnaire.
Compare Annotators to Consensus
- Compare filtered annotators with their culture's mean consensus
- Better to use a specialised culture model than to ignore culture (global mean)
- Correlation of annotators with mean consensus (figure)
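A hedged sketch of this comparison, reusing the assumed data layout of the earlier filtering sketch: each filtered annotator is correlated once against their own culture's mean consensus and once against a culture-blind global mean.

```python
import numpy as np
from scipy import stats

def mean_consensus(ratings_by_clip):
    """Per-clip mean rating (taken after the mode-based filtering step)."""
    return {clip: float(np.mean(r)) for clip, r in ratings_by_clip.items()}

def culture_vs_global(annotator_ratings, culture_consensus, global_consensus):
    """Correlate one filtered annotator with the mean consensus of their
    own culture and with a culture-blind global mean; the talk finds the
    culture-specific consensus agrees better on average."""
    clips = [c for c in annotator_ratings
             if c in culture_consensus and c in global_consensus]
    a = np.asarray([annotator_ratings[c] for c in clips], dtype=float)
    own = stats.pearsonr(a, np.asarray([culture_consensus[c] for c in clips]))[0]
    glb = stats.pearsonr(a, np.asarray([global_consensus[c] for c in clips]))[0]
    return own, glb
```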
Overview of System
- Track facial features: LP flock trackers (Ong et al. 2009)
- Extract features: distances between pairs of trackers
- Train regressor: ν-SVR (Schölkopf et al. 2009)
- 8-fold person-independent testing
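A sketch of the regression stage just outlined, under stated assumptions: sklearn's NuSVR and GroupKFold stand in for the ν-SVR and the 8-fold person-independent protocol, and the hyperparameters shown are placeholders, not the values used in the paper.

```python
import numpy as np
from scipy import stats
from sklearn.svm import NuSVR
from sklearn.model_selection import GroupKFold

def person_independent_eval(X, y, person_ids, n_folds=8):
    """Train nu-SVR on clip features X against consensus labels y, with
    folds grouped by participant so no person appears in both train and
    test. Returns the per-fold Pearson correlations between predicted
    and annotated NVC intensity."""
    corrs = []
    for tr, te in GroupKFold(n_splits=n_folds).split(X, y, groups=person_ids):
        model = NuSVR(nu=0.5, C=1.0, kernel="rbf")   # placeholder settings
        model.fit(X[tr], y[tr])
        corrs.append(stats.pearsonr(model.predict(X[te]), y[te])[0])
    return np.asarray(corrs)
```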
Position of Trackers
- 46 tracked positions on the face (figure)
Feature Extraction
- Euclidean distance between pairs of trackers: 1035 pairs from 46 trackers
- Features are whitened and centred, which removes face shape information
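A sketch of the feature pipeline above (shapes and names are assumptions): 46 tracked points give C(46, 2) = 1035 pairwise distances per frame; per-feature standardisation is used here as a simplified stand-in for the whitening step; and, as noted later under Future Work, each clip is summarised by the mean and variance of its frame features.

```python
import numpy as np
from scipy.spatial.distance import pdist

def frame_features(tracker_xy):
    """tracker_xy: (46, 2) array of tracked facial points for one frame.
    Returns the C(46, 2) = 1035 pairwise Euclidean distances."""
    return pdist(tracker_xy)

def standardise(features):
    """Centre each distance feature and scale it to unit variance over the
    dataset (a per-feature stand-in for full whitening); this removes
    static face-shape information, leaving the motion-related variation."""
    mu = features.mean(axis=0)
    sd = features.std(axis=0) + 1e-12
    return (features - mu) / sd

def clip_features(frame_feats):
    """Aggregate a clip's per-frame features into the per-clip mean and
    variance used as the regressor input (the aggregation the Future Work
    slide notes discards temporal information)."""
    return np.concatenate([frame_feats.mean(axis=0),
                           frame_feats.var(axis=0)])
```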
System Overview (figure: tracking, feature extraction and regression pipeline)
Results
- Correlation performances are relatively low
- Reflects the extreme difficulty of the task and the low inter-annotator agreement
- Questioning is lowest, reflecting its strong verbal component
Results
- Training and testing on the same culture is optimal
- Performance is worse if the test data comes from a different culture to the training data
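The cross-culture effect can be probed with a small variant of the earlier evaluation sketch (again with hypothetical names; the train and test indices come from a person-independent split): train on one culture's consensus labels and correlate the predictions with another's.

```python
from scipy import stats
from sklearn.svm import NuSVR

def cross_culture_corr(X, labels_by_culture, train_culture, test_culture,
                       train_idx, test_idx):
    """Train on one culture's consensus labels and score against another's,
    using a person-independent train/test split. labels_by_culture maps a
    culture name to a per-clip label array aligned with X."""
    model = NuSVR(nu=0.5, C=1.0, kernel="rbf")   # placeholder settings
    model.fit(X[train_idx], labels_by_culture[train_culture][train_idx])
    pred = model.predict(X[test_idx])
    return stats.pearsonr(pred, labels_by_culture[test_culture][test_idx])[0]
```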
Results
- Typical result for a single NVC category (figure): thinking, UK annotation, correlation 0.46
Conclusions
- Crowdsourcing annotation data is effective if quality problems are managed
- Regression of naturalistic NVC is possible, but challenging
- Specialising the regressor for cultural annotations is better than ignoring culture
Future Work
- Using the mean and variance of a clip discards temporal information; some frames are more important than others: Multiple Instance Learning
- Record participants from multiple cultures, then do multi-annotation; social factors
- Applications
Summary
- NVC and data annotation
- TwoTalk Corpus
- Data annotation using crowdsourcing
- Automatic regression of NVC
- Feature extraction
- Testing and performance
- Future work