Interactive Visual Analytics for Discovering Simpson’s Paradox

Slides:



Advertisements
Similar presentations
The Robert Gordon University School of Engineering Dr. Mohamed Amish
Advertisements

World of work How is science used in the world of work? Science in the world of work.
Multiple Analysis of Variance – MANOVA
1 Health Warning! All may not be what it seems! These examples demonstrate both the importance of graphing data before analysing it and the effect of outliers.
Budapest May 27, 2008 Unifying mixed linear models and the MASH algorithm for breakpoint detection and correction Anders Grimvall, Sackmone Sirisack, Agne.
CAVEAT 1 MICROARRAY EXPERIMENTS ARE EXPENSIVE AND COMPLICATED. MICROARRAY EXPERIMENTS ARE THE STARTING POINT FOR RESEARCH. MICROARRAY EXPERIMENTS CANNOT.
Statistics 350 Lecture 21. Today Last Day: Tests and partial R 2 Today: Multicollinearity.
Discrim Continued Psy 524 Andrew Ainsworth. Types of Discriminant Function Analysis They are the same as the types of multiple regression Direct Discrim.
Statistics 350 Lecture 16. Today Last Day: Introduction to Multiple Linear Regression Model Today: More Chapter 6.
Statistics for the Social Sciences Psychology 340 Fall 2006 Putting it all together.
DESIGNING, CONDUCTING, ANALYZING & INTERPRETING DESCRIPTIVE RESEARCH CHAPTERS 7 & 11 Kristina Feldner.
Tse-Chuan Yang, Ph.D The Geographic Information Analysis Core Population Research Institute Social Science Research Institute Pennsylvania State University.
April 11, 2008 Data Mining Competition 2008 The 4 th Annual Business Intelligence Symposium Hualin Wang Manager of Advanced.
1 Numerical Data Masking Techniques for Maintaining Sub-Domain Characteristics Krish Muralidhar University of Kentucky Rathindra Sarathy Oklahoma State.
Data Mining Chun-Hung Chou
Week 6: Model selection Overview Questions from last week Model selection in multivariable analysis -bivariate significance -interaction and confounding.
A “Dose-Response” Strategy for Assessing Program Impact in Naturalistic Contexts Megan PhillipsGeorge Tremblay Antioch University Antioch University New.
An Introduction to Multivariate Multilevel GLMs Hello and welcome.
Tile-based parallel coordinates and its application in financial visualization Jamal Alsakran, Ye Zhao Kent State University, Department of Computer Science,
Computational Diagnostics: A Novel Approach to Viewing Medical Data 5th International Conference on Coordinated & Multiple Views in Exploratory Visualization.
Geovisualization and Spatial Analysis of Cancer Data: Developing Visual-Computational Spatial Tools for Cancer Data Research Challenges for Spatial Data.
PROCESSING OF DATA The collected data in research is processed and analyzed to come to some conclusions or to verify the hypothesis made. Processing of.
SIMPSON’S PARADOX Any statistical relationship between two variables may be reversed by including additional factors in the analysis. Application: The.
The Self-Compassion Scale (SCS) is the primary measure of self- compassion in both social/personality psychology and clinical research (Neff, 2003). It.
© Tan,Steinbach, Kumar Introduction to Data Mining 8/05/ Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan,
Data Mining Consultant GlaxoSmithKline: US Pharma IT
Chapter 6 Conducting & Reading Research Baumgartner et al Chapter 6 Selection of Research Participants: Sampling Procedures.
BPS - 3rd Ed. Chapter 61 Two-Way Tables. BPS - 3rd Ed. Chapter 62 u In prior chapters we studied the relationship between two quantitative variables with.
Lucent Technologies - Proprietary 1 Interactive Pattern Discovery with Mirage Mirage uses exploratory visualization, intuitive graphical operations to.
Data Analysis: Statistics for Item Interactions. Purpose To provide a broad overview of statistical analyses appropriate for exploring interactions and.
Bar Graphs Used to compare amounts, or quantities The bars provide a visual display for a quick comparison between different categories.
Practical Steps for Increasing Openness and Reproducibility Courtney Soderberg Statistical and Methodological Consultant Center for Open Science.
Classification Tree Interaction Detection. Use of decision trees Segmentation Stratification Prediction Data reduction and variable screening Interaction.
OSCAR MCKNIGHT PH.D. ASSISTANT DEAN FOR STUDENT AFFAIRS DIRECTOR OF PSYCHOLOGICAL COUNSELING SERVICES ASHLAND UNIVERSITY Student Affairs Program Evaluation:
Qualitative Research Quantitative Research. These are the two forms of research paradigms (Leedy, 1997) which are qualitative and quantitative These paradigms.
APPLICATIONS FOR STRATEGIC ASSESSMENT,
Participants & Procedure
Analysis and Interpretation: Multiple Variables Simultaneously
JMP Discovery Summit 2016 Janet Alvarado
LINEAR REGRESSION 1.
Chapter 12: The Nuts and Bolts of Multi-factor experiments.
Analysis of Variance and Covariance
Parental Alcoholism and Adolescent Depression?
Chapter 10 Causal Inference and Correlational Designs
Table 1. Advantages and Disadvantages of Traditional DM/ML Methods
Statistical Data Analysis
Qualitative Research Quantitative Research.
Introduction to the General Linear Model (GLM)
NView Overview We developed this tool as part of a team of visualization and biomedical researchers to better understand the physiology of DBS and patient.
CSE Social Media & Text Analytics
BMTRY 747: Introduction Jeffrey E. Korte, PhD
AP Exam Review Chapters 1-10
Teaching Analytics with Case Studies: Finding Love in a Classification Tree Ruth Hummel, PhD JMP Academic Ambassador.
Data Mining: Exploring Data
Grant Number: IIS Institution of PI: WPI PIs: Matthew O
Practice- How to Present the Evidence
Multivariate Statistics
Inference of Environmental Factor-Microbe and Microbe-Microbe Associations from Metagenomic Data Using a Hierarchical Bayesian Statistical Model  Yuqing.
Statistical Data Analysis
Intrinsic and Task-Evoked Network Architectures of the Human Brain
An Introduction to Correlational Research
Theme 4 Elementary Analysis
Jeffrey A. Hussmann, William H. Press  Cell Reports 
Regression and Categorical Predictors
Association between 2 variables
Wellington Cabrera Advisor: Carlos Ordonez
Machine Learning – a Probabilistic Perspective
Chapter 4: More on Two-Variable Data
Data exploration and visualization
MGS 3100 Business Analysis Regression Feb 18, 2016
Presentation transcript:

Interactive Visual Analytics for Discovering Simpson’s Paradox Chenguang Xu, Sarah M. Brown, Chris Weaver, Christan Grant Email: {chguxu, cweaver, cgrant}@ou.edu, smb@sarahmbrown.org https://oudalab.github.io, https://fairnessforensics.github.io ABSTRACT BIVARIATE COLOR FOR CORRELATION MATRICES Simpson’s Paradox, when subgroups of a dataset exhibit the opposite trend of the whole dataset, is a well-known and widely studied statistical phenomenon. However, as the name suggests, it is generally a surprising observation and often counter-intuitive to non-statisticians. It is increasingly important for individuals without statistical expertise to conduct exploratory analyses of data and as such, interpretable visual analytics for discovering Simpson’s Paradox are also important. This paper provides a novel visual analytics approach to facilitate the exploration of the data to detect Simpson’s paradox. We use a bivariate color scheme to illustrate the relationship between trends over the full population and within subgroups. The detection and analysis are tightly coupled by multiple coordinated views that pro-vide users with an overview of the entire data set and details as desired. The interactive features empower users to effectively and efficiently conduct exploratory analyses. For a data set with a binary target variable, Y and for G total grouping variables, x1, . . . xG. First, we compute the G overall comparison vectors: Next, we partition the data set by further conditioning on each of then values of the explanatory grouping variable and compute a m × n rate comparison matrix: SIMPSON’S PARADOX Rate based Simpson’s paradox is a study when the trend is the relative rates of a binary outcome in two groups for example admissions by gender. An overview of interactive visualization for rate based SP detection: https://en.wikipedia.org/wiki Regression type Simpson’s paradox occurs when the trend is based on the sign of a correlation between two variables. BIVARIATE COLOR FOR RATE COMPARISON MATRICES For a data set with d continuous variables we compute the d× d correlation matrix for the whole population. After the data set is partitioned by the C group-by variables, we compute a d× d correlation matrix for each of the kc values of group-by variable c. Correlation matrix for the whole population Kievit, Rogier A., et al. "Simpson's paradox in psychological science: a practical guide."Frontiers in psychology4 (2013). BIVARIATE COLOR SCHEME A bivariate color scheme is a two dimensional array of colors that combines two sets of univariate color schemes. This configuration allows detailed comparisons of two variables and illustrates the relationship between those variables. We present a 3×3 sequential/sequential scheme (A), a 3×3 diverging/diverging scheme (B) and a 5×5 diverging/diverging scheme (C) as shown in figure below. An overview of interactive visualization for regression based SP detection: