Categorization Ethics: Questions about Truth, Privacy and Big Data
Joseph Busch
Categorization overview
- Classification goals: make sense, clear perception, trust
- Classification bias: likes/dislikes, comfort/fear, culture, family, education
Statistical Bias: is bias inherent?
- Sampling error and measurement error
- Epidemiology: selection bias
- Media: source omission
- Machine learning: unsupervised analysis
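Selection bias of the kind named above can be made concrete with a small simulation. This is a minimal sketch with invented numbers: a population of two groups with different rates of some attribute, where the sampling frame rarely reaches group B (analogous to source omission), so the estimated rate drifts toward group A's rate.

```python
import random

random.seed(42)

# Hypothetical population: the attribute occurs in 50% of group A
# but only 10% of group B; each group is half the population.
population = [("A", random.random() < 0.5) for _ in range(5000)] + \
             [("B", random.random() < 0.1) for _ in range(5000)]

true_rate = sum(flag for _, flag in population) / len(population)

# Biased sampling frame: group B members are reached only 5% of the time
# (selection bias / source omission).
biased_sample = [flag for group, flag in population
                 if group == "A" or random.random() < 0.05]
biased_rate = sum(biased_sample) / len(biased_sample)

print(f"true rate:   {true_rate:.2f}")    # around 0.30
print(f"biased rate: {biased_rate:.2f}")  # around 0.48 -- skewed toward group A
```

The point of the sketch is that nothing in the biased estimate looks wrong from inside the sample; the error is only visible if you know who was left out.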
ProPublica, “Breaking the Black Box: How Machines Learn to Be Racist.” Jeff Larson, Julia Angwin and Terry Parris Jr. (October 19, 2016) https://www.propublica.org/article/breaking-the-black-box-how-machines-learn-to-be-racist?word=Trump
Inherent bias
How does automated categorization work?
- Boolean combinations of category terms: CHAIR OR CAT, CHAIR AND CAT, CHAIR NOT CAT, CAT NOT CHAIR
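The Boolean combinations above can be sketched as simple predicates over a document's token set. This is a minimal illustration, not a production categorizer; the rule table and the sample sentences are invented for the example.

```python
def tokens(text):
    """Lowercase bag of words for a document."""
    return set(text.lower().split())

# Boolean categorization rules: each rule is a predicate over the token set.
rules = {
    "CHAIR OR CAT":  lambda t: "chair" in t or "cat" in t,
    "CHAIR AND CAT": lambda t: "chair" in t and "cat" in t,
    "CHAIR NOT CAT": lambda t: "chair" in t and "cat" not in t,
}

def categorize(text):
    """Return the names of all rules that match the document."""
    t = tokens(text)
    return [name for name, rule in rules.items() if rule(t)]

print(categorize("the cat sat on the chair"))  # ['CHAIR OR CAT', 'CHAIR AND CAT']
print(categorize("a wooden chair"))            # ['CHAIR OR CAT', 'CHAIR NOT CAT']
```

Rule-based categorizers like this are transparent (each decision is traceable to a rule), which is exactly what the "black box" machine-learned systems discussed in the ProPublica series lack.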
Natural language processing enables automated categorization
- Feature extraction: identify nouns and noun phrases in the text collection
- Tokenization: count occurrences and co-occurrences
- Weighting: weight term-frequency counts by document length
- Output: tag documents in the collection, support queries, and deliver IR services
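The pipeline stages above can be sketched end to end in a few lines. This is a toy version under stated assumptions: the two documents and the vocabulary are invented, tokenization is a crude word split rather than real noun-phrase extraction, and the weighting is plain term frequency normalized by document length.

```python
import re
from collections import Counter

# Toy text collection (invented documents for illustration)
collection = {
    "doc1": "The cat chased the cat toy across the room.",
    "doc2": "A chair and a table stood in the room. The chair was old.",
}

def tokenize(text):
    # Feature extraction + tokenization: crude word split; a real pipeline
    # would identify nouns and noun phrases with a POS tagger.
    return re.findall(r"[a-z]+", text.lower())

def weights(text):
    # Weighting: term-frequency counts normalized by document length.
    toks = tokenize(text)
    counts = Counter(toks)
    return {term: n / len(toks) for term, n in counts.items()}

def tag(text, vocabulary, threshold=0.1):
    # Output: tag the document with vocabulary terms whose weight
    # clears the threshold.
    w = weights(text)
    return [term for term in vocabulary if w.get(term, 0.0) >= threshold]

vocabulary = ["cat", "chair", "table"]
for name, text in collection.items():
    print(name, tag(text, vocabulary))  # doc1 -> ['cat'], doc2 -> ['chair']
```

Note how the threshold choice silently drops "table" from doc2's tags: even in this tiny sketch, a tuning parameter decides what a document is "about," which is where categorization bias can enter.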
GDPR Article 5
Article 5 imposes important restrictions on commercial uses of personally identifying information (PII), even aggregated personal information that has not been explicitly collected for a particular, personally approved purpose. It:
- Restricts the nature of collections used for machine learning by excluding anything that might be PII
- Permits PII to be collected for specified, explicit and legitimate purposes
- Does not permit further processing beyond those purposes except “for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes”
- Does not apply to processing of public or published content collections such as news stories or Wikipedia articles
Does GDPR have an impact on classification bias?
- GDPR requires that personally identifying information be accurate and, if requested by an individual, that it be corrected or deleted.
- GDPR could have an unintended impact on selection bias: allowing deletion of PII may leave a selection class incompletely or inadequately represented.
Summary
- GDPR provides some guidelines for aggregation of personally identifying information, but not on categorization bias itself.
- For information aggregators and information analyzers, the guidelines for appropriate behavior are not always clear.
- When errors and bias are commonly held, they can be reflected in the information ecology.
- The responsibility for outcomes that result from errors and bias is not clear.
Discussion
Truth
- Are morals subjective (like ice cream preference) or objective (like insulin)?
- Do we create moral truth or discover it?
- When morality is reduced to personal taste, people exchange the question “What is good?” for the pleasure question “What feels good?”
- Why is lying wrong? What harm do lies do? When is it OK to lie?
Big Data
- Whose data, and what data, is being collected?
- Who is being left out of that kind of data collection?
- Who makes the decisions about what is done with that data?
- How much can we rely on it?
Resources
- Jeff Catlin. “The Role of Artificial Intelligence in Ethical Decision Making.” Forbes Technology Council (December 21, 2017). https://www.forbes.com/sites/forbestechcouncil/2017/12/21/the-role-of-artificial-intelligence-in-ethical-decision-making/#7d94a54f21dc
- ProPublica, “Breaking the Black Box” series:
  - Julia Angwin, Terry Parris Jr. and Surya Mattu. “What Facebook Knows About You.” (September 28, 2016) https://www.propublica.org/article/breaking-the-black-box-what-facebook-knows-about-you
  - Julia Angwin, Terry Parris Jr. and Surya Mattu. “When Algorithms Decide What You Pay.” (October 5, 2016) https://www.propublica.org/article/breaking-the-black-box-when-algorithms-decide-what-you-pay
  - Julia Angwin, Terry Parris Jr., Surya Mattu and Seongtaek Lim. “When Machines Learn by Experimenting on Us.” (October 12, 2016) https://www.propublica.org/article/breaking-the-black-box-when-machines-learn-by-experimenting-on-us
  - Jeff Larson, Julia Angwin and Terry Parris Jr. “How Machines Learn to Be Racist.” (October 19, 2016) https://www.propublica.org/article/breaking-the-black-box-how-machines-learn-to-be-racist?word=Trump
- Seth Earley. “The Problem with AI.” IT Professional 19(4) (July-August 2017), pp. 63-67. https://www.computer.org/csdl/mags/it/2017/04/mit2017040063.html
More Resources
- Olivia Solon. “The Rise of ‘Pseudo-AI’: How Tech Firms Quietly Use Humans to Do Bots’ Work.” The Guardian (July 6, 2018) https://www.theguardian.com/technology/2018/jul/06/artifial-intelligence-ai-humans-bots-tech-companies/
- Jelani Harper. “The Global Expansion of Master Data Management.” Information Management (January 31, 2018) https://www.information-management.com/opinion/the-global-expansion-of-master-data-management
Questions
Joseph Busch
jbusch@taxonomystrategies.com
joseph@semanticstaffing.com
(m) 415-377-7912