Predictive Semantic Social Media Analysis David A. Ostrowski System Analytics and Environmental Sciences Research and Advanced Engineering Ford Motor Company
Social media Influential Sample of the web –News driven CRM –Real-time –Less biased Unique opportunities for analytics
Opportunities Old Model –Reactive Damage control Inquiries Confirm positive reaction New Model –Preemptive Focused engagement –Promotions –Events –Media Anticipatory
Social Dimensions Describes affiliations across a network Values / Community Reinforced by relationships Utilize to predict purchase behavior
Relational Learning ‘Birds of a Feather’ Leverage each local network for semantic understanding Relational Learning => Social dimensions
Framework Overview Relational learning –Strengthen representation –Support knowledge Unsupervised classification –Generation of dimensions Supervised classification –Dimensions => behavior
Framework Overview (diagram; labels: local network, taxonomy labels, RN classification, features, k-means cluster, social dimension, higher-level features, supervised classification, behaviors)
Case Study One 4,000 Facebook identifiers Associations to two vehicle lines Question: –What can we extract to distinguish between these two purchase behaviors?
Relational Learning Step Extracted data from FB Consolidated interests Applied the RN algorithm Guided by taxonomy
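The relational learning step can be illustrated with a minimal sketch of a weighted-vote relational neighbor (RN) classifier. It assumes a simple setup that the slides do not specify: nodes joined by positively weighted edges, a few nodes with known taxonomy labels, and unlabeled nodes scored by the weighted average of their neighbors' class probabilities. The function name, node names, and labels are hypothetical, and this is not necessarily the exact RN variant used in the study.

```python
from collections import defaultdict


def relational_neighbor(edges, known, classes, iterations=10):
    """Estimate class probabilities for unlabeled nodes from their neighbors.

    edges:   iterable of (node_a, node_b, weight), weights assumed positive
    known:   dict mapping some nodes to a class label (e.g. a taxonomy label)
    classes: list of all class labels
    """
    neighbors = defaultdict(dict)
    for a, b, w in edges:
        neighbors[a][b] = w
        neighbors[b][a] = w

    # Known nodes get a one-hot distribution, unknown nodes start uniform.
    probs = {}
    for node in neighbors:
        if node in known:
            probs[node] = {c: 1.0 if c == known[node] else 0.0 for c in classes}
        else:
            probs[node] = {c: 1.0 / len(classes) for c in classes}

    # Repeatedly replace each unknown node's estimate with the
    # weighted average of its neighbors' current estimates.
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in known:
                updated[node] = probs[node]
                continue
            total = sum(nbrs.values())
            updated[node] = {
                c: sum(w * probs[n][c] for n, w in nbrs.items()) / total
                for c in classes
            }
        probs = updated
    return probs


# Hypothetical example: user_c shares more interests with user_a than user_b.
edges = [("user_a", "user_c", 2.0), ("user_b", "user_c", 1.0)]
known = {"user_a": "outdoors", "user_b": "technology"}
probs = relational_neighbor(edges, known, classes=["outdoors", "technology"])
print(probs["user_c"])
```

The 'birds of a feather' assumption appears in the update rule: a node is pulled toward the labels of the nodes it is most strongly connected to.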
Preliminary cluster statistics normalized differences between vehicle lines
Extracted social dimensions Applied feature sets to k-means (k = 3-6) Each classification attempts to characterize the relationship between vehicle line and a social dimension (value / interest...) All classifications to be considered towards behavioral training Also considered community detection –Via maximization of modularity using the leading eigenvector of the modularity matrix
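The community detection option can be sketched as a single two-way split using the leading eigenvector of the modularity matrix, B[i][j] = A[i][j] - k_i * k_j / (2m), where A is the adjacency matrix, k_i the degree of node i, and m the number of edges. This sketch assumes an undirected, unweighted graph given as a dense adjacency matrix and uses plain power iteration. The function name and the toy graph are hypothetical, and a production analysis would more likely use a library routine such as the one in igraph.

```python
def leading_eigenvector_split(adjacency, iterations=500):
    """Split a graph into two communities by the sign of the
    leading eigenvector of its modularity matrix."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)  # twice the number of edges

    # Modularity matrix: observed edges minus edges expected by degree.
    B = [
        [adjacency[i][j] - degrees[i] * degrees[j] / two_m for j in range(n)]
        for i in range(n)
    ]

    # Shift the spectrum by a Gershgorin bound so every eigenvalue is
    # non-negative; power iteration then converges to the eigenvector
    # of the algebraically largest eigenvalue of B.
    shift = max(sum(abs(x) for x in row) for row in B)

    # Deterministic, non-uniform start vector (an arbitrary choice).
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [
            sum(B[i][j] * v[j] for j in range(n)) + shift * v[i]
            for i in range(n)
        ]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]

    # Community membership is the sign of each eigenvector entry.
    return [0 if x >= 0 else 1 for x in v]


# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels = leading_eigenvector_split(adjacency)
print(labels)
```

Which community receives label 0 depends on the sign of the converged vector, so only the grouping is meaningful, not the label values.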
Applied Supervised Classification for Behavior Prediction Applied the feature sets to three machine learning algorithms –Simple Bayes: precision .7, recall .69 –Weightily Averaged One-Dependence Estimators (WAODE): precision .69, recall .70 –J48: precision .69, recall .70
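For reference, precision and recall for one class can be computed from actual and predicted labels as in the sketch below. The labels and predictions are made-up illustrations, not data from the study, and the slides do not say whether the reported figures are per-class or averaged across classes.

```python
def precision_recall(actual, predicted, positive):
    """Return (precision, recall) for the class named by `positive`."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for two vehicle lines.
actual = ["line_a", "line_a", "line_a", "line_a",
          "line_b", "line_b", "line_b", "line_b"]
predicted = ["line_a", "line_a", "line_a", "line_b",
             "line_a", "line_b", "line_b", "line_b"]
precision, recall = precision_recall(actual, predicted, positive="line_a")
print(precision, recall)
```

Precision is the share of predicted "line_a" cases that were correct; recall is the share of true "line_a" cases that were found.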
Case Study Two Facebook IDs across four vehicle lines Relational modeling –Similar performance as first case study Social Dimensions generated for k=(3-7) –Not as much separation after k=6 clustering Precision / recall (among Simple Bayes, WAODE, J48): .469, , , .536
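One possible way to compare separation across k is to run k-means for each k in the range and record the within-cluster sum of squares. The sketch below assumes numeric feature vectors and random initialization with a fixed seed. The data points and function names are hypothetical, and the slides do not state which separation measure was actually used.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means; returns (centers, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [
            mean(cl) if cl else centers[i] for i, cl in enumerate(clusters)
        ]
    wcss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, wcss


# Hypothetical 2-D feature vectors forming three loose groups.
points = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.3),
    (9.0, 0.2), (9.3, 0.0), (8.8, 0.4),
]
wcss_by_k = {k: kmeans(points, k)[1] for k in range(3, 8)}
for k, wcss in sorted(wcss_by_k.items()):
    print(k, round(wcss, 3))
```

If the sum of squares stops dropping noticeably beyond some k, additional clusters are adding little separation. Results can vary with the initialization seed.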
Next Steps Institutionalization –Extract / define exactly what our dimensions are explaining in our data sets. Relate to specific association –Values –community
Q/A See me for friends and neighbors discount….
Appendix (software) ‘R’ igraph ‘R’ km module Weka Ruby -Watir