Influence in Classification via Cooperative Game Theory Amit Datta, Anupam Datta, Ariel D. Procaccia and Yair Zick (to appear in IJCAI’15)


Big Data Analysis and Transparency
Big data is big business.
It is “good”: able to identify trends, produce accurate results, impartial (algorithms are not inherently discriminatory).
It is not transparent! As a user (or even as a data scientist!) it is hard to tell what factors determine classification outcomes.

Motivation
We are given a classified dataset (e.g. flagged clients in a bank). The classifier is unknown. What is the importance of a given feature to the classification outcome?
(F,25-35,English,PA) (M,18-25,English,CO) (F,35-55,Spanish,NY)
(M,25-35,English,PA) (F,18-25,Spanish,PA) (M,18-25,Spanish,PA)
(M,25-35,Spanish,PA) (F,35-55,Spanish,PA) (M,18-25,Spanish,PA)

Methodology
Feature selection: learn a classifier, see what features add the most information.
▫ Are we choosing the right classifier to learn? Can be very complex.
▫ Some classifiers have no intuitive notion of feature importance (e.g. decision trees).
▫ Requires a lot of knowledge about the dataset (what happens when features are removed).

Methodology

Notation

Ideas from Game Theory

Causality

Axiomatic Approach
A measure is state symmetric if relabeling of states does not change its value.
A measure is feature symmetric if relabeling of features does not change their value.
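The two symmetry properties can be checked mechanically on a toy example. The sketch below is an illustration only: `pair_influence` is an assumed counting measure (ordered pairs of data points that differ only in one feature and receive different labels), not necessarily the measure defined in the talk, and the four-point dataset with its labels is hypothetical.

```python
def pair_influence(points, labels, i):
    """Assumed measure: count ordered pairs of points that differ only
    in feature i and are labeled differently."""
    total = 0
    for a, label_a in zip(points, labels):
        for b, label_b in zip(points, labels):
            differs_only_in_i = a[i] != b[i] and all(
                a[j] == b[j] for j in range(len(a)) if j != i
            )
            if differs_only_in_i and label_a != label_b:
                total += 1
    return total


# Hypothetical dataset: (gender, language); the label depends on language only.
points = [("F", "English"), ("M", "English"), ("F", "Spanish"), ("M", "Spanish")]
labels = [0, 0, 1, 1]

# State symmetry: rename the values of each feature, keep the labels.
rename = {"F": "M", "M": "F", "English": "Spanish", "Spanish": "English"}
renamed = [(rename[g], rename[l]) for g, l in points]

# Feature symmetry: swap the order of the two features.
swapped = [(l, g) for g, l in points]

original = [pair_influence(points, labels, i) for i in range(2)]
after_rename = [pair_influence(renamed, labels, i) for i in range(2)]
after_swap = [pair_influence(swapped, labels, i) for i in range(2)]
```

In this toy case the influence vector is unchanged by renaming states, and swapping the features simply swaps the two entries.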

Axiomatic Approach
Bad news… Standard notions will not immediately work.

Axiomatic Approach

Relation to Linear Classifiers
High weight translates to high influence!
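A small experiment can make the link between weight and influence concrete. This is a sketch under stated assumptions, not the talk's definition: influence is taken here to be the number of points in the full binary feature space whose classification flips when a single feature is toggled, and the weights and threshold are hypothetical.

```python
from itertools import product


def classify(x, weights, threshold):
    """Linear threshold classifier: 1 if the weighted sum reaches the threshold."""
    return int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)


def flip_influence(i, weights, threshold):
    """Assumed measure: number of points in {0,1}^n whose label changes
    when feature i alone is toggled."""
    flips = 0
    for x in product([0, 1], repeat=len(weights)):
        toggled = list(x)
        toggled[i] = 1 - toggled[i]
        if classify(x, weights, threshold) != classify(toggled, weights, threshold):
            flips += 1
    return flips


# Hypothetical classifier with weights in decreasing order.
weights = [3, 2, 1]
threshold = 3.5

influences = [flip_influence(i, weights, threshold) for i in range(len(weights))]
```

With these particular numbers the feature with the largest weight flips the most outcomes, while the two smaller weights tie, so under this assumed measure a higher weight gives at least as much influence rather than strictly more.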

Extensions

Implementation
To test our measure’s behavior, we measure influence on a generated dataset. We employ the AdFisher framework [Datta et al. 2014] to create fake Google user profiles and observe the ads that these profiles are shown.

Implementation

Top Ads for Age
Title / Ad Description | Influence
Buy Home For Taxes Owed / Or Get 18-36% Interest! Watch 8min Video That Explains All |
Jim Rickards Project 2015 / Economist, Jim Rickards explains the coming economic crash |
“My Insomnia Trick” / Naturally Fall Asleep Fast, Stay Asleep All Night – Wake Up Refreshed |
Get In Now With Graphene / Money-Making Mineral Set To Launch Can Shape The World And Your Wealth |
Sciatica Exercises? / Stop: What You MUST know Before attempting to Treat your Sciatica: |
Statistic | Value
Mean |
Median | 0.031
StdDev | 0.0144

Top Ads for Gender
Title / Ad Description | Influence
Jim Rickards Project 2015 / Economist, Jim Rickards explains the coming economic crash |
Buy Home For Taxes Owed / Or Get 18-36% Interest! Watch 8min Video That Explains All |
Tech Gadgets / Daily Deals on Modern Gadgets. Exclusive Pricing - Up To 70% Off |
Get In Now With Graphene / Money-Making Mineral Set To Launch Can Shape The World And Your Wealth |
Elabore su Presupuesto / Nuestros Consejeros Certificados Están listos para ayudarlo |
Statistic | Value
Mean |
Median |
StdDev | 0.0161

Top Ads for Language
Title / Ad Description | Influence
Elabore su Presupuesto / Nuestros Consejeros Certificados Están listos para ayudarlo |
The Greatest Penny Stocks / Get free daily penny stock alerts. Join now. New pick out soon |
Business Leads CRM / Business Lead Manager, Dialer, CRM. 400% Boost in Conversion Rates |
Get In Now With Graphene / Money-Making Mineral Set To Launch Can Shape The World And Your Wealth |
Buy Home For Taxes Owed / Or Get 18-36% Interest! Watch 8min Video That Explains All |
Statistic | Value
Mean | 0.033
Median |
StdDev | 0.024

Findings
Overall influence of specific features over ads is somewhat limited (except for language).
Ads seem to be targeted at specific subsets (e.g. young men and elderly women).
Further (more refined) measurements on a larger dataset are needed.

Future Work
Beyond single state changes: what is the minimal number of changes to others’ states that we need in order to effect a change in value? This is necessary if we want to use our measure on datasets where we cannot control the features.
What happens when there are priors on data?
White box vs. black box analysis.
Thank you! Questions?