Crowdsourcing Insights with Opinion Space
Ken Goldberg, IEOR, School of Information, EECS, UC Berkeley
“We’re moving from an Information Age to an Opinion Age.” - Warren Sack, UCSC
Motivation
Goals
Engage community
Understand community
– Solicit input
– Understand the distribution of viewpoints
– Discover insightful comments
Goals of Community Members
Understand relationship to other community members
Participate, express ideas, and be heard
Encounter a diversity of viewpoints
Motivation
Classic approach: surveys, polls
Drawbacks: limited samples, slow, doesn’t increase engagement
Modern approach: online forums, comment lists
Drawbacks: data deluge, cyberpolarization, hard to discover insights
Approach: Visualization
Approach: Level the Playing Field
Approach: Wisdom of Crowds
Related Work: Visualization Clockwise, starting from top left: Morningside Analytics, MusicBox, Starry Night
Related Work: Politics Clockwise, starting from top left: EU Profiler, Poligraph, The How Progressive Are You? quiz
Related Work: Opinion Sharing
Polling & Opinion Mining
– Fishkin, 1991: deliberative polling
– Dahlgren, 2005: Internet & the public sphere
– Berinsky, 1999: understanding public opinion
– Pang & Lee, 2008: sentiment analysis
Increasing Participation
– Bishop, 2007: theoretical framework
– Brandtzaeg & Heim: user study
– Ludford et al., 2004: uniqueness & group dissimilarity
Related Work: Info Filtering
K. Goldberg et al., 2001: Eigentaste
E. Bitton, 2009: spatial model
Polikar, 2006: ensemble learning
Opinion Space: Live Demonstration
Six 50-minute Learning Object Modules, preparation materials, slides for in-class lectures, discussion ideas, hands-on activities, and homework assignments.
To try it: google: “opinion space” contact us:
Dimensionality Reduction
[Figure: low-variance projection vs. maximal-variance projection]
Dimensionality Reduction
Principal Component Analysis (PCA)
Assumes independence and linearity
Minimizes squared error
Scalable: compute position of new user in constant time
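The PCA step can be sketched as follows. This is a minimal illustration, not the Opinion Space implementation: the rating matrix is random placeholder data, and the names `fit_pca` and `project` are hypothetical. It shows why placing a new user is cheap: once the mean and the two principal directions are fitted, projecting one rating vector costs the same no matter how many users are already on the map.

```python
import numpy as np

def fit_pca(ratings, n_components=2):
    """Fit PCA on an (n_users, n_propositions) rating matrix.

    Returns the per-proposition mean and the top principal directions.
    """
    mean = ratings.mean(axis=0)
    centered = ratings - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(user_ratings, mean, components):
    """Map one rating vector to 2D; cost does not depend on the user count."""
    return (np.asarray(user_ratings, dtype=float) - mean) @ components.T

# Placeholder data (assumption): 200 users rating 5 propositions in [0, 1].
rng = np.random.default_rng(0)
ratings = rng.uniform(0.0, 1.0, size=(200, 5))

mean, components = fit_pca(ratings)
new_user = [0.9, 0.1, 0.5, 0.7, 0.3]
position = project(new_user, mean, components)
all_positions = (ratings - mean) @ components.T
```

The first coordinate of `all_positions` carries the most variance, which corresponds to the maximal-variance projection.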
Canonical Correlation Analysis
2-view PCA
Assume:
– Each data point has a latent low-dim canonical representation $z$
– Observe two different representations of each data point (e.g. numerical ratings and text)
Learn MLEs for low-rank projections $A$ and $B$
Equivalently, pick projection that maximizes correlation between views
[Figure: graphical model for CCA with nodes $z$, $x$, $y$]
$x = Az + \varepsilon$, $y = Bz + \varepsilon$, $z = A^{-1}x = B^{-1}y$
CCA on Opinion Space
Each user is a data point
– $x_i$ = user $i$’s responses to propositions
– $y_i$ = vector representation of textual comment
Run CCA to find $A$ and $B$, use $A^{-1}$ to find 2D representation
Position of users reflects rating vector and textual response
Ignores ratings that are not correlated with text, and vice versa
Given text, can predict ratings (using $B$)
[Figure: graphical model for CCA with nodes $z$, $x$, $y$]
$x = Az + \varepsilon$, $y = Bz + \varepsilon$, $z = A^{-1}x = B^{-1}y$
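A minimal sketch of the correlation-maximizing form of CCA, assuming synthetic data in place of real ratings and comment vectors. The function names, the small ridge term `reg`, and the generated matrices are all illustrative assumptions, not part of Opinion Space. The matrix `wx` plays the role the slides assign to $A^{-1}$: it maps a rating vector to a 2D position.

```python
import numpy as np

def inv_sqrt(matrix):
    """Inverse square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

def fit_cca(x, y, n_components=2, reg=1e-8):
    """Classical CCA: whiten each view, then take the SVD of the cross-covariance."""
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - x_mean, y - y_mean
    n = x.shape[0]
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])  # ridge for stability
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    kx, ky = inv_sqrt(cxx), inv_sqrt(cyy)
    u, corrs, vt = np.linalg.svd(kx @ cxy @ ky)
    wx = kx @ u[:, :n_components]    # maps ratings to the shared space
    wy = ky @ vt[:n_components].T    # maps text vectors to the shared space
    return x_mean, y_mean, wx, wy, corrs[:n_components]

# Synthetic stand-in (assumption): a 2D latent z observed through two noisy views.
rng = np.random.default_rng(1)
z = rng.normal(size=(500, 2))
ratings = z @ rng.normal(size=(5, 2)).T + 0.1 * rng.normal(size=(500, 5))
text_vecs = z @ rng.normal(size=(8, 2)).T + 0.1 * rng.normal(size=(500, 8))

x_mean, y_mean, wx, wy, corrs = fit_cca(ratings, text_vecs)
positions = (ratings - x_mean) @ wx          # 2D position from ratings
text_positions = (text_vecs - y_mean) @ wy   # 2D position from text
```

Directions in the ratings that have no counterpart in the text receive low canonical correlation and drop out of the 2D map, and the same holds in the other direction.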
Multidimensional Scaling
Goal: rearrange objects in a low-dim space so as to reproduce their distances in the higher-dim space
Strategy: rearrange and compare solutions, maximizing goodness of fit
Can use any kind of similarity function
Pros
– Data need not be normal, relationships need not be linear
– Tends to yield fewer factors than factor analysis (FA)
Con: slow, not scalable
[Figure: dissimilarity $\delta_{ij}$ between objects $i$ and $j$ in the original space; distance $d_{ij}$ between $i$ and $j$ in the low-dim map]
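A sketch of how goodness of fit can be measured for an MDS map. Two assumptions are made here: the code uses classical (Torgerson) MDS, a closed-form variant, rather than the iterative rearrangement the slide describes, and the `stress` normalization shown is one common convention among several. The data are random placeholders.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix for the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(delta, n_components=2):
    """Embed objects from a dissimilarity matrix by double centering."""
    n = delta.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (delta ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

def stress(delta, embedding):
    """Normalized mismatch between original dissimilarities and map distances."""
    d = pairwise_distances(embedding)
    return float(np.sqrt(((delta - d) ** 2).sum() / (delta ** 2).sum()))

# Placeholder data (assumption): 30 objects described by 5 features.
rng = np.random.default_rng(2)
objects = rng.normal(size=(30, 5))
delta = pairwise_distances(objects)

map_2d = classical_mds(delta, n_components=2)
stress_2d = stress(delta, map_2d)                              # lossy 2D map
stress_5d = stress(delta, classical_mds(delta, n_components=5))  # near zero
```

Lower stress means the map distances track the original dissimilarities more closely. The n-by-n dissimilarity matrix is also where the scalability cost comes from.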
Kernel-based Nonlinear PCA
Intuition: in general, can’t linearly separate n points in d < n dim, but can almost always do so in d ≥ n dim
Method: compute covariance matrix after transforming data into a higher-dim space via a map Φ
Kernel trick avoids computing Φ explicitly, which reduces complexity
If Φ is the identity, Kernel PCA = PCA
Kernel-based Nonlinear PCA
Pro: Good for finding clusters with arbitrary shape
Cons: Need to choose appropriate kernel (no unique solution); does not preserve distance relationships
[Figure: input data; KPCA output with Gaussian kernel]
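A minimal kernel PCA sketch, assuming random placeholder data and a hand-picked `gamma` for the Gaussian kernel. Only the kernel matrix is ever formed, so the feature map is never computed explicitly. With a linear kernel (the identity map) the result matches ordinary PCA up to the sign of each axis.

```python
import numpy as np

def gaussian_kernel(x, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2) for all pairs of rows."""
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def linear_kernel(x):
    """Inner products; corresponds to the identity feature map."""
    return x @ x.T

def kernel_pca(kernel_matrix, n_components=2):
    """Project the training points onto the top kernel principal components."""
    n = kernel_matrix.shape[0]
    one_n = np.ones((n, n)) / n
    # Center the data in feature space using only kernel values.
    centered = (kernel_matrix - one_n @ kernel_matrix
                - kernel_matrix @ one_n + one_n @ kernel_matrix @ one_n)
    eigvals, eigvecs = np.linalg.eigh(centered)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Placeholder data (assumption): 100 points with 4 features.
rng = np.random.default_rng(3)
data = rng.normal(size=(100, 4))

kpca_gaussian = kernel_pca(gaussian_kernel(data, gamma=0.5))
kpca_linear = kernel_pca(linear_kernel(data))

# Ordinary PCA scores for comparison with the linear-kernel result.
centered_data = data - data.mean(axis=0)
u, s, _ = np.linalg.svd(centered_data, full_matrices=False)
pca_scores = u[:, :2] * s[:2]
```

Changing `gamma` or swapping the kernel gives a different map, which is the "no unique solution" drawback in practice.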
Stochastic Neighbor Embedding
Converts Euclidean distances to conditional probabilities
$p_{j|i} = \Pr(x_i \text{ would pick } x_j \text{ as its neighbor})$, with neighbors picked in proportion to their density under a Gaussian centered at $x_i$
Compute a similar probability $q_{j|i}$ in the lower-dim space
Goal: minimize the mismatch between $p_{j|i}$ and $q_{j|i}$
Cons: tends to crowd points in center of map; difficult to optimize
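The mismatch can be made concrete with the sum of Kullback-Leibler divergences between the two sets of conditional probabilities, which is the cost used in the original SNE formulation. The sketch below only evaluates that cost; it does not run the optimization. It also assumes a single fixed `sigma` for every point, whereas SNE normally tunes a bandwidth per point. The data and the candidate map are random placeholders.

```python
import numpy as np

def conditional_probabilities(points, sigma=1.0):
    """Row i holds p_{j|i}: Gaussian neighbor probabilities centered at point i."""
    sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    logits = -sq / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)            # a point never picks itself
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def sne_cost(p, q, eps=1e-12):
    """Sum over i of KL(P_i || Q_i), skipping the diagonal."""
    mask = ~np.eye(p.shape[0], dtype=bool)
    return float(np.sum(p[mask] * np.log((p[mask] + eps) / (q[mask] + eps))))

# Placeholder data (assumption): 50 points in 5D and a random 2D candidate map.
rng = np.random.default_rng(4)
high_dim = rng.uniform(0.0, 1.0, size=(50, 5))
candidate_map = rng.normal(size=(50, 2))

p = conditional_probabilities(high_dim)
q = conditional_probabilities(candidate_map)
cost = sne_cost(p, q)        # positive for a random map
self_cost = sne_cost(p, p)   # zero when the probabilities match exactly
```

An optimizer would move the rows of `candidate_map` to drive `cost` down. The cost is non-convex in those positions, which is one reason the method is hard to optimize.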
Opinion Space: Crowdsourcing Insights
Scalability: $n$ Participants, $n$ Viewpoints
$n^2$ Peer-to-Peer Reviews
Viewpoints are $k$-Dimensional
Dim. Reduction: 2D Map of Affinity/Similarity
Insight vs. Agreement: Nonlinear Scoring
Ken Goldberg, UC Berkeley
Alec Ross, U.S. State Dept
Opinion Space
Wisdom of Crowds: Insights are Rare
Scalable, Self-Organizing, Spatial Interface
Visualize Diversity of Viewpoints
Incorporate Position into Scoring Metrics
Ken Goldberg, UC Berkeley