Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting.

Slides:



Advertisements
Similar presentations
Quality control tools
Advertisements

Chapter 3 – Data Visualization © Galit Shmueli and Peter Bruce 2010 Data Mining for Business Intelligence Shmueli, Patel & Bruce.
BSoft’s Customer Comparison Report The information pipeline for your company’s customers’ spending Bringing you what you need.. when you need it!
Copyright © 2014 by the American Academy of Actuaries. All Rights Reserved. Michael E. Angelina, MAAA, ACAS Price Optimization Casualty Actuarial & Statistical.
Reading Graphs and Charts are more attractive and easy to understand than tables enable the reader to ‘see’ patterns in the data are easy to use for comparisons.
PowerPoint Authors: Susan Coomer Galbreath, Ph.D., CPA Charles W. Caldwell, D.B.A., CMA Jon A. Booker, Ph.D., CPA, CIA Cynthia J. Rooney, Ph.D., CPA Copyright.
1 1 Slide © 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
Copyright (c) Bani Mallick1 Lecture 2 Stat 651. Copyright (c) Bani Mallick2 Topics in Lecture #2 Population and sample parameters More on populations.
Business Intelligence Michael Gross Tina Larsell Chad Anderson.
Slides by JOHN LOUCKS St. Edward’s University.
COMP 578 Data Warehousing And OLAP Technology Keith C.C. Chan Department of Computing The Hong Kong Polytechnic University.
Managerial Accounting
EFFECTIVE PREDICTIVE MODELING- DATA,ANALYTICS AND PRACTICE MANAGEMENT Richard A. Derrig Ph.D. OPAL Consulting LLC Karthik Balakrishnan Ph.D. ISO Innovative.
1 Data Preparation Part 1: Exploratory Data Analysis & Data Cleaning, Missing Data CAS 2007 Ratemaking Seminar Louise Francis, FCAS Francis Analytics and.
Comparisons across normal distributions Z -Scores.
McGraw-Hill/Irwin Copyright © 2010 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 13 Linear Regression and Correlation.
Correlation and Linear Regression Chapter 13 Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin.
What is Business Intelligence? Business intelligence (BI) –Range of applications, practices, and technologies for the extraction, translation, integration,
1//hw Cherniak Software Development Corporation ARM Features Presentation Alacrity Results Management (ARM) Major Feature Description.
STAT 211 – 019 Dan Piett West Virginia University Lecture 2.
© 2005 The McGraw-Hill Companies, Inc., All Rights Reserved. Chapter 12 Describing Data.
Exploratory Data Analysis. Height and Weight 1.Data checking, identifying problems and characteristics Data exploration and Statistical analysis.
Copyright © 2006, SAS Institute Inc. All rights reserved. Enterprise Guide 4.2 : A Primer SHRUG : Spring 2010 Presented by: Josée Ranger-Lacroix SAS Institute.
McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.
1 1 Slide © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
1 1 Slide © 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
1 Software Quality Engineering CS410 Class 5 Seven Basic Quality Tools.
Slide Slide 1 Baby Leo’s 4-month “Healthy Baby” check-up reported the following: 1)He is in the 90 th percentile for weight 2)He is in the 95 th percentile.
1 1 Slide © 2009 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS St. Edward’s University.
The next step in performance monitoring – Stochastic monitoring (and reserving!) NZ Actuarial Conference November 2010.
Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 22 Regression Diagnostics.
1Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall. Exploring Microsoft Office Access 2010 by Robert Grauer, Keith Mast, and Mary Anne.
Describe the Program Development Cycle. Program Development Cycle The program development cycle is a series of steps programmers use to build computer.
BIA 2610 – Statistical Methods Chapter 1 – Data and Statistics.
Slide 6-1 Copyright © 2004 Pearson Education, Inc.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 6 The Standard Deviation as a Ruler and the Normal Model.
Importing and Exporting Data - QuickBooks Simon Hutchinson – Reckon Product Management.
An Internet of Things: People, Processes, and Products in the Spotfire Cloud Library Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist.
Copyright © 2009 Pearson Education, Inc. Chapter 6 The Standard Deviation as a Ruler and the Normal Model.
Copyright © 1994 Carnegie Mellon University Disciplined Software Engineering - Lecture 3 1 Software Size Estimation I Material adapted from: Disciplined.
1. 2 Traditional Income Statement LO1: Prepare a contribution margin income statement.
Objectives 2.1Scatterplots  Scatterplots  Explanatory and response variables  Interpreting scatterplots  Outliers Adapted from authors’ slides © 2012.
Disciplined Software Engineering Lecture #3 Software Engineering Institute Carnegie Mellon University Pittsburgh, PA Sponsored by the U.S. Department.
Central Tendency & Dispersion
Chapter 3, Part B Descriptive Statistics: Numerical Measures n Measures of Distribution Shape, Relative Location, and Detecting Outliers n Exploratory.
CANE 2007 Spring Meeting Visualizing Predictive Modeling Results Chuck Boucek (312)
Capital Adequacy and Allocation John M. Mulvey Princeton University Michael J. Belfatti & Chris K. Madsen American Re-Insurance Company June 8th, 1999.
Copyright © 2011 Pearson Education, Inc. Regression Diagnostics Chapter 22.
Microsoft Excel 2013 Chapter 8 Working with Trendlines, PivotTable Reports, PivotChart Reports, and Slicers.
1 Data Preparation Part 1: Exploratory Data Analysis & Data Cleaning, Missing Data CAS Predictive Modeling Seminar Louise Francis Francis Analytics and.
Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 6 The Standard Deviation as a Ruler and the Normal Model.
HRP Copyright © Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and.
If you have a transaction processing system, John Meisenbacher
1 Programming and problem solving in C, Maxima, and Excel.
Atlantic Coast Operations Business Intelligence Mobility Project.
Scatterplots & Correlations Chapter 4. What we are going to cover Explanatory (Independent) and Response (Dependent) variables Displaying relationships.
Correlation and Linear Regression
Analyzing One-Variable Data
CHAPTER 3 Describing Relationships
Exploring, Displaying, and Examining Data
Presentation of Flowchart
How could data be used in an EPQ?
Using statistics to evaluate your test Gerard Seinhorst
Business Intelligence
Lesson 13 - Cleaning Data Lesson 14 - Creating Summary Tables
DSS-ESTIMATING COSTS Cost estimation is the process of estimating the relationship between costs and cost driver activities. We estimate costs for three.
Honors Statistics Review Chapters 4 - 5
Displaying data Seminar 2.
Atlantic Coast Operations
Presentation transcript:

Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

1 Copyright © 2011 Deloitte Development LLC. All rights reserved. Agenda Overview Reasons to Use Data Visualization in Data Cleansing Examples Tools Wrap-Up

2 Copyright © 2011 Deloitte Development LLC. All rights reserved. Checking data for reasonability is an important part of any modeling process Can’t be avoided – no such thing as “perfect data” Can be time intensive – the “preprocess” portion of any project takes 60-80% of the total project time QC often relies on individual queries that test for specific conditions, or filters in Excel How QC has worked up to now SELECT policy_no, pol_eff_date, tot_incurred_loss FROM claim_data WHERE tot_incurred_loss) > 0 ORDER BY DESCENDING tot_incurred_loss; proc summary data=pol_info; by company state; var pol_ct earned_prem written_prem; output out=pol_info_sum (drop=_type_ _freq_) sum=; run;

3 Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization is the communication of information using graphical representations Introducing data visualization “A picture is worth a thousand words.” “The greatest value of a picture is when it forces us to notice what we never expected to see.” – John Tukey Data visualization is an alternative and a supplement to the more traditional querying methods Use is becoming more prevalent in presentation Not yet widely used by actuaries in QC process – more used to seeing numbers

4 Copyright © 2011 Deloitte Development LLC. All rights reserved. Foundations of data visualization

5 Copyright © 2011 Deloitte Development LLC. All rights reserved. Amount of data is growing Information harder to be expressed numerically in simple spreadsheets Improved computing power Improved tools and visualization methods, which are easier to use Increased demand from clients / management Evolution in data visualization

6 Copyright © 2011 Deloitte Development LLC. All rights reserved. Benefits of data visualization Which one is easiest to convey the message?

7 Copyright © 2011 Deloitte Development LLC. All rights reserved. Data visualization allows users see several different perspectives of the data. Data visualization makes it possible to interpret vast amounts of data Data visualization offers the ability to note exceptions in the data Data visualization allows the user to analyze visual patterns in the data Data visualization equips users with the ability to see influences that would otherwise be difficult to find Packages allow drag and drop functionality, limited need for coding Also allows for easy drill down, filtering, grouping Data visualization allows a natural progression for QC to insight in modeling Benefits of data visualization in QC

8 Copyright © 2011 Deloitte Development LLC. All rights reserved. QC, Data Mining, Exploratory Data Analysis (EDA) Model interpretation Results / Presentation Uses of data visualization

9 Copyright © 2011 Deloitte Development LLC. All rights reserved. Examples - Histogram Mass points Extreme values Unreasonable values Histogram after correction – but is it fixed? Still some outliers For extremes, higher loss ratio indicates potential problems with adjusted premium

10 Copyright © 2011 Deloitte Development LLC. All rights reserved. Examples - Histogram Incorrectly calculated exposure is one of the causes

11 Copyright © 2011 Deloitte Development LLC. All rights reserved. Descriptive of median, standard deviation, outliers Particularly useful in comparing groups within a variable Examples - Boxplots

12 Copyright © 2011 Deloitte Development LLC. All rights reserved. Split into subgroups as well Examples - Boxplots

13 Copyright © 2011 Deloitte Development LLC. All rights reserved. Examples – Line Charts Unusual spikes by month? Seasonal activity consistent with business knowledge? Consistent behavior over time?

14 Copyright © 2011 Deloitte Development LLC. All rights reserved. Examples – Line Charts Back to insurance: Total premium average premium, policy counts by month Is peak at January renewal (both counts and average premium) consistent with business?

15 Copyright © 2011 Deloitte Development LLC. All rights reserved. Examples – Scatterplots Modeled points outside reasonable bounds (1-10)

16 Copyright © 2011 Deloitte Development LLC. All rights reserved. Examples – Scatterplots Adding dimension of business age (size of circles) shows that age incorrectly included in model score

17 Copyright © 2011 Deloitte Development LLC. All rights reserved. Correlations between variables Look for: Very high correlations (near +1 or -1 Low correlations for variables that should be related Examples - Heatmaps

18 Copyright © 2011 Deloitte Development LLC. All rights reserved. Examples - Geospatial Customers in a five state area compared to sales locations Customers in area that are not within driving distance Customers mapped well outside area

19 Copyright © 2011 Deloitte Development LLC. All rights reserved. Examples - Geospatial Sales and gross margin rate by zip code Large mass point in sales, also large negative GMR

20 Copyright © 2011 Deloitte Development LLC. All rights reserved. Software (not an exhaustive list!) ExcelThird party add-ins available PowerPivotMicrosoft add-in to Excel ROpen source, multitudes of packages Tableau SASSAS/GRAPH module Spotfire Microstrategy Geospatial mappingESRI, Alteryx, Google API, etc.

21 Copyright © 2011 Deloitte Development LLC. All rights reserved. Wrapping Up Data Visualization allows analysts to communicate with clarity, precision, and efficiency However, there are benefits in utilizing visualizations early in the process, as they may uncover data issues more easily than individual queries Standard tools, like histograms, scatterplots, maps, are becoming more available and easier to use