Published by Chrystal Cook. Modified over 9 years ago.
1
Influence in Classification via Cooperative Game Theory
Amit Datta, Anupam Datta, Ariel D. Procaccia and Yair Zick (to appear in IJCAI’15)
2
Big Data Analysis and Transparency
Big data is big business. It is “good”: it can identify trends and produce accurate results, and it is impartial (algorithms are not inherently discriminatory). But it is not transparent! As a user (or even as a data scientist!) it is hard to tell what factors determine classification outcomes.
3
Motivation
We are given a classified dataset (e.g. flagged clients in a bank). The classifier is unknown. What is the importance of a given feature to the classification outcome?
(F,25-35,English,PA) (M,18-25,English,CO) (F,35-55,Spanish,NY) (M,25-35,English,PA) (F,18-25,Spanish,PA) (M,18-25,Spanish,PA) (M,25-35,Spanish,PA) (F,35-55,Spanish,PA) (M,18-25,Spanish,PA)
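One simple way to make the question concrete is to look for pairs of profiles that differ in exactly one feature and check whether their labels differ. The sketch below does this for the profiles listed on this slide. The 0/1 flags are hypothetical (the slide does not show them), and the counting rule is an illustrative assumption, not necessarily the measure defined in the talk.

```python
from itertools import combinations

FEATURES = ("gender", "age", "language", "state")

# Profiles from the slide (duplicate removed); the 0/1 flags are
# hypothetical values added only for illustration.
labeled = {
    ("F", "25-35", "English", "PA"): 0,
    ("M", "18-25", "English", "CO"): 0,
    ("F", "35-55", "Spanish", "NY"): 1,
    ("M", "25-35", "English", "PA"): 0,
    ("F", "18-25", "Spanish", "PA"): 1,
    ("M", "18-25", "Spanish", "PA"): 1,
    ("M", "25-35", "Spanish", "PA"): 1,
    ("F", "35-55", "Spanish", "PA"): 1,
}


def single_feature_flips(data):
    """Count, per feature, the pairs of profiles that differ only in that
    feature and carry different flags."""
    counts = {name: 0 for name in FEATURES}
    for (p, flag_p), (q, flag_q) in combinations(data.items(), 2):
        differing = [i for i in range(len(FEATURES)) if p[i] != q[i]]
        if len(differing) == 1 and flag_p != flag_q:
            counts[FEATURES[differing[0]]] += 1
    return counts


flips = single_feature_flips(labeled)
print(flips)
```

With so few profiles, most single-feature changes lead to a profile that is not in the dataset at all, which is one reason a small observed dataset gives only a coarse picture.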
4
Methodology
Feature selection: learn a classifier, see what features add the most information.
▫ Are we choosing the right classifier to learn? Can be very complex.
▫ Some classifiers have no intuitive notion of feature importance (e.g. decision trees).
▫ Requires a lot of knowledge about the dataset (what happens when features are removed).
5
Methodology
6
Notation
8
Ideas from Game Theory
9
Causality
10
Axiomatic Approach
A measure is state symmetric if relabeling of states does not change its value. A measure is feature symmetric if relabeling of features does not change their value.
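The two symmetry properties can be checked mechanically on a small example. The sketch below uses a hypothetical classifier `v` on three binary features and a stand-in measure that counts, for each feature, the points where flipping that feature changes the label. Both the classifier and the measure are assumptions made for illustration; the slides do not give the talk's own definition.

```python
from itertools import product


def flip_counts(classify, n):
    """For each feature i, count the points of {0,1}^n where flipping
    feature i changes the label (an illustrative stand-in measure)."""
    counts = [0] * n
    for x in product((0, 1), repeat=n):
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            if classify(x) != classify(y):
                counts[i] += 1
    return counts


def v(x):
    # Hypothetical classifier: feature 0 AND (feature 1 OR feature 2).
    return int(x[0] == 1 and (x[1] == 1 or x[2] == 1))


def v_states_relabeled(x):
    # Same classifier after swapping the names of feature 0's two states.
    return v((1 - x[0], x[1], x[2]))


def v_features_relabeled(x):
    # Same classifier after features 0 and 2 exchange names.
    return v((x[2], x[1], x[0]))


base = flip_counts(v, 3)
after_states = flip_counts(v_states_relabeled, 3)
after_features = flip_counts(v_features_relabeled, 3)

print(base, after_states, after_features)
```

Relabeling states leaves every count unchanged, while relabeling features moves each count along with its feature's new name; this is the behavior the two axioms ask for.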
11
Axiomatic Approach
Bad news… Standard notions will not immediately work.
12
Axiomatic Approach
15
Relation to Linear Classifiers
High weight translates to high influence!
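A small enumeration illustrates the claim. The sketch below assumes a hypothetical linear threshold classifier over four binary features (weights 4, 3, 2, 1 and threshold 5) and uses a stand-in notion of influence: the number of points where flipping a single feature changes the label. Neither the weights nor this counting rule come from the slides.

```python
from itertools import product

WEIGHTS = (4, 3, 2, 1)   # hypothetical weights, chosen for illustration
THRESHOLD = 5            # hypothetical threshold


def linear_classifier(x):
    """Label 1 when the weighted sum of the features reaches the threshold."""
    return int(sum(w * xi for w, xi in zip(WEIGHTS, x)) >= THRESHOLD)


def flip_counts(classify, n):
    """For each feature i, count the points of {0,1}^n where flipping
    feature i changes the label."""
    counts = [0] * n
    for x in product((0, 1), repeat=n):
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            if classify(x) != classify(y):
                counts[i] += 1
    return counts


influence = flip_counts(linear_classifier, len(WEIGHTS))
print(influence)
```

In this example a feature with a larger weight never has a smaller count than one with a smaller weight, although two different weights can produce the same count.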
16
Extensions
17
Implementation
To test our measure’s behavior, we measure influence on a generated dataset. We employ the AdFisher framework [Datta et al. 2014] to create fake Google user profiles and observe the ads presented to them.
18
Implementation
19
Top Ads for Age
Title/Ad Description | Influence
Buy Home For Taxes Owed/Or Get 18-36% Interest! Watch 8min Video That Explains All. | 0.07
Jim Rickards Project 2015/Economist, Jim Rickards explains the coming economic crash. | 0.0663
“My Insomnia Trick”/Naturally Fall Asleep Fast, Stay Asleep All Night – Wake Up Refreshed | 0.0661
Get In Now With Graphene/Money-Making Mineral Set To Launch Can Shape The World And Your Wealth | 0.0611
Sciatica Exercises?/Stop: What You MUST know Before attempting to Treat your Sciatica: | 0.0606

Statistic | Value
Mean | 0.0318
Median | 0.031
StdDev | 0.0144
20
Top Ads for Gender
Title/Ad Description | Influence
Jim Rickards Project 2015/Economist, Jim Rickards explains the coming economic crash. | 0.07
Buy Home For Taxes Owed/Or Get 18-36% Interest! Watch 8min Video That Explains All. | 0.0583
Tech Gadgets/Daily Deals on Modern Gadgets. Exclusive Pricing - Up To 70% Off. | 0.0564
Get In Now With Graphene/Money-Making Mineral Set To Launch Can Shape The World And Your Wealth | 0.0561
Elabore su Presupuesto/Nuestros Consejeros Certificados Están listos para ayudarlo | 0.0534

Statistic | Value
Mean | 0.0324
Median | 0.0299
StdDev | 0.0161
21
Top Ads for Language
Title/Ad Description | Influence
Elabore su Presupuesto/Nuestros Consejeros Certificados Están listos para ayudarlo | 0.1667
The Greatest Penny Stocks/Get free daily penny stock alerts. Join now. New pick out soon. | 0.0755
Business Leads CRM/Business Lead Manager, Dialer, CRM. 400% Boost in Conversion Rates. | 0.0683
Get In Now With Graphene/Money-Making Mineral Set To Launch Can Shape The World And Your Wealth | 0.0644
Buy Home For Taxes Owed/Or Get 18-36% Interest! Watch 8min Video That Explains All. | 0.06

Statistic | Value
Mean | 0.033
Median | 0.0291
StdDev | 0.024
22
Findings
The overall influence of specific features over ads is somewhat limited (except for language). Ads seem to be targeted at specific subsets (e.g. young men and elderly women). Further (more refined) measurements on a larger dataset are needed.
23
Future Work
Beyond single state changes (what is the minimal number of changes to others’ states that we need in order to effect a change in value?); necessary if we want to use our measure in datasets where we cannot control the features.
What happens when there are priors on data?
White box vs. black box analysis.
Thank you! Questions?