1 Data Quality: Opportunities, Data, and Examples
2
3 Better and More Data – Level of analysis; Take a quick look at what/why use data; Linking data from disparate and third party sources – Explore data types – Typical issues & Tricks; Cross validation and sourcing; Reverse Look-up; GIS layering; Backfill from text correlated to codes – Information from operations; Text analytics
4 General Organizational Overview An information business focused on risk taking. Make. Sell. Serve. Sales and Distribution Producer Segmentation Market Planning Revenue Forecasting Cross sell and Up sell Retention and Profitability Underwriting Risk Selection and Pricing Portfolio Management Premium Adequacy Billing and Collections Management Claims Payment Accuracy Claim Collaboration > Fraud Detection > Subrogation > Risk Transfer > 3rd Party Deductible > Reinsurance Recoverable
5 Same Problems – Different Lines of Business Personal – Auto, HO, Umbrella; Small Commercial – BOP, CPP; Middle Market Commercial – CPP w/GL, CP, Crime, CIM, B&M, WC, Auto; Large Commercial Accounts; Commercial Auto; Workers Comp; Umbrella/Excess; Specialty Lines – D&O, EPL, E&O, Farm, FI
6 Data Types and Forms Structured data; Semi-structured data; Unstructured data; Text; Spatial; Pictographic; Graphic; Voice; Video
7 Multiple Data Systems which must be pulled together for analysis. Great opportunity for cross-validation and sourcing. Data Archive, Legacy Systems Current System Claim Multiple States Billing Systems Finance Systems CRM Systems, other data Policy Multiple Underwriting Systems Medical Data - Bill Review - PPO - Case Management - Paradigm Vendors/Partners External Data ACTIONS: Identify Data Systems; Get right data from right systems; Overcome internal Organizational Barriers; Bridge to legacy systems and archived data; Augment to create rich data mining environment; Expect the need to negotiate for resources
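Cross-validation between systems can be as simple as comparing a shared field on records that carry the same key. The sketch below is only an illustration: the record layout, the policy-number keys, and the `zip` field are hypothetical and do not come from any of the systems named on the slide.

```python
def cross_validate(system_a, system_b, field):
    """Return {key: (value_a, value_b)} for keys found in both systems
    whose values for `field` disagree."""
    mismatches = {}
    for key in system_a.keys() & system_b.keys():
        a = system_a[key].get(field)
        b = system_b[key].get(field)
        if a != b:
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical extracts from a policy system and a claim system.
policy_system = {"P1": {"zip": "06101"}, "P2": {"zip": "10001"}}
claim_system = {"P1": {"zip": "06101"}, "P2": {"zip": "10010"}, "P3": {"zip": "60601"}}

disagreements = cross_validate(policy_system, claim_system, "zip")
```

Records that disagree are candidates for review; records found in only one system (here `P3`) are a separate sourcing question.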
8 Some typical external data sources and vendors: Dun & Bradstreet; Experian; Bureau of Labor Statistics; Market Stance; AM Best; Equifax; US Census; Claritas; Melissa Data; ISO; GIS vendors; U&C data sets; Code sets for ICDs and CPTs; …
9 Data Glitches – historical and on-going Systemic changes to data not process related – Changes in data layout / data types – Changes in scale / format – Temporary reversion to defaults – Missing and default values – Gaps in time series
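Two of the glitches listed above, gaps in time series and default values, are easy to screen for. The following is a minimal sketch, assuming daily observations and a hypothetical set of default markers (`None`, empty string, `"UNKNOWN"`); real systems will have their own defaults.

```python
from datetime import date, timedelta


def find_gaps(dates, step=timedelta(days=1)):
    """Return (first_missing, last_missing) pairs for each gap in the series."""
    ordered = sorted(set(dates))
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > step:
            gaps.append((prev + step, cur - step))
    return gaps


def default_share(values, defaults=(None, "", "UNKNOWN")):
    """Return the fraction of values that are missing or set to a default."""
    if not values:
        return 0.0
    return sum(1 for v in values if v in defaults) / len(values)


observed = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 5)]
gaps = find_gaps(observed)
share = default_share(["A", "", "UNKNOWN", "B"])
```

A sudden rise in `default_share` over a period can point to a temporary reversion to defaults.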
10 Process Reasons for poor data entry
11 Defining Issues-sample Source Data 1-Define Issues
12 MORE ISSUES… Mapping across sources: Same Fact, Different Terms
Data Element Concept: Name: Country Identifiers; Context: ; Definition: ; Unique ID: 5769; Conceptual Domain: ; Maintenance Org.: ; Steward: ; Classification: ; Registration Authority: ; Others
Conceptual domain: Algeria, Belgium, China, Denmark, Egypt, France, ... Zimbabwe
Data Elements: Name: ; Context: ; Definition: ; Unique ID: 4572; Value Domain: ; Maintenance Org.: ; Steward: ; Classification: ; Registration Authority: ; Others
ISO 3166 English Name: Algeria, Belgium, China, Denmark, Egypt, France, ... Zimbabwe
ISO 3166 French Name: L'Algérie, Belgique, Chine, Danemark, Egypte, La France, ... Zimbabwe
ISO Alpha Code: DZ, BE, CN, DK, EG, FR, ... ZW
ISO Alpha Code: DZA, BEL, CHN, DNK, EGY, FRA, ... ZWE
ISO Numeric Code
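One way to handle "same fact, different terms" is to map every term a source might use onto a single canonical key. The sketch below uses only the country values shown on the slide; the choice of the two-letter code as the canonical key and the function names are assumptions made for illustration.

```python
# Values taken from the slide, keyed by the two-letter ISO 3166 code.
COUNTRIES = {
    "DZ": {"alpha3": "DZA", "en": "Algeria", "fr": "L'Algérie"},
    "BE": {"alpha3": "BEL", "en": "Belgium", "fr": "Belgique"},
    "CN": {"alpha3": "CHN", "en": "China", "fr": "Chine"},
    "DK": {"alpha3": "DNK", "en": "Denmark", "fr": "Danemark"},
    "EG": {"alpha3": "EGY", "en": "Egypt", "fr": "Egypte"},
    "FR": {"alpha3": "FRA", "en": "France", "fr": "La France"},
    "ZW": {"alpha3": "ZWE", "en": "Zimbabwe", "fr": "Zimbabwe"},
}


def build_index(countries):
    """Map every known term (code or name, case-insensitive) to its key."""
    index = {}
    for alpha2, rec in countries.items():
        for term in (alpha2, rec["alpha3"], rec["en"], rec["fr"]):
            index[term.casefold()] = alpha2
    return index


INDEX = build_index(COUNTRIES)


def canonical_country(term):
    """Return the two-letter code for a term, or None if it is unknown."""
    return INDEX.get(term.strip().casefold())
```

For example, `canonical_country("Belgique")` and `canonical_country("BEL")` both resolve to the same key, so records from sources using different terms can be joined.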
13 Data Filling Manual; Statistical Imputation; Temporal; Spatial; Spatial-temporal
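Two of the filling approaches named above can be sketched in a few lines. This is an illustration only: mean substitution stands in for statistical imputation and carrying the last observation forward stands in for temporal filling; the slide does not say which specific methods were used.

```python
from statistics import mean


def fill_mean(values):
    """Statistical imputation: replace missing values with the observed mean."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    m = mean(observed)
    return [m if v is None else v for v in values]


def fill_forward(values):
    """Temporal filling: carry the last observed value forward in time."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


series = [None, 10, None, None, 12]
forward = fill_forward(series)
averaged = fill_mean([10, None, 20])
```

Note that forward filling leaves a leading gap unfilled, since there is no earlier observation to carry.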
14 Geographic Hierarchy
15 Deriving Data = Power Totals: Household Income Trends: Rate of Medical Bill Increases Ratios: Claims/Premium, Target/Median Friction: Level of inconvenience, ratio of rental to damage Sequences: Lawyer-Doctor, Auto-Life Policy Circumstances: Minimal Impact Severe Trauma Temporal: Loss shortly after adding collision Spatial: Distance to Service, proximity of stakeholders Logged: Progress Notes, Diaries, Who did it, When, “Why”
16 Deriving Data = Power (Cont’d) Behavioral: Deviation from past usage, spike buying Experience Profiles: Vendor, Doctor, Premium Audit Channel: How applied, How reported, Service Chain Legal Jurisdiction: Venue Disposition, Rules Demographics: Working, Weekly wage, lost income Firmographics: Industry Class Code Vs Injuries Claimed Inflation: Wage, Medical, Goods, Auto, COLA Gov’t Statistics: Crime Rate, Employment, Traffic Other Stats: Rents, Occupancy, Zoning, Mgd Care
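A few of the derived fields listed on these two slides can be sketched directly. The example below covers a ratio (claims/premium) and a temporal flag (loss shortly after adding collision). The 30-day window and all function names are hypothetical choices for illustration, not values from the source.

```python
from datetime import date


def loss_ratio(claims_paid, premium):
    """Ratio: claims paid divided by premium; None when premium is zero."""
    if premium == 0:
        return None
    return claims_paid / premium


def days_since_coverage_added(coverage_added, loss_date):
    """Temporal: number of days between adding coverage and the loss."""
    return (loss_date - coverage_added).days


def loss_shortly_after_coverage(coverage_added, loss_date, window_days=30):
    """Flag losses that fall within `window_days` of adding coverage."""
    elapsed = days_since_coverage_added(coverage_added, loss_date)
    return 0 <= elapsed <= window_days


ratio = loss_ratio(600.0, 1000.0)
elapsed = days_since_coverage_added(date(2024, 1, 10), date(2024, 1, 25))
flagged = loss_shortly_after_coverage(date(2024, 1, 10), date(2024, 1, 25))
```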
17 “Search” versus “Discover”
                            Search (goal-oriented)    Discover (opportunistic)
Structured Data             Data Retrieval            Data Mining
Unstructured Data (Text)    Information Retrieval     Text Mining
18 Word Replacement Lists Input Value [Jim] is transformed to Transformed Input Value [JAMES]. Searching Returns “Similar Matches”. All Records Found: Jimmy, Jim, James (each transformed to JAMES)
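The word replacement idea can be sketched as a lookup applied to both the search input and the stored records. The replacement list below contains only the names from the slide; the function names are hypothetical.

```python
# Replacement list: variant -> standard form (entries from the slide).
REPLACEMENTS = {"JIM": "JAMES", "JIMMY": "JAMES"}


def transform(value):
    """Upper-case the value and swap in its standard form if one is listed."""
    key = value.strip().upper()
    return REPLACEMENTS.get(key, key)


def similar_matches(query, records):
    """Return the records whose transformed value equals the transformed query."""
    target = transform(query)
    return [r for r in records if transform(r) == target]


found = similar_matches("Jim", ["Jimmy", "Jim", "James", "John"])
```

Because both sides are transformed before comparison, a search for any listed variant returns all of them.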
19 Motivation for Text Mining Approximately 90% of the world’s data is held in unstructured formats (source: Oracle Corporation). Information-intensive business processes demand that we move beyond simple document retrieval to “knowledge” discovery. 10% Structured Numerical or Coded Information; 90% Unstructured or Semi-structured Information
20 Convergence of Disciplines Example
21 Text is like a data iceberg Techniques for attacking text data: Rules-based; Statistical Text Analysis and Clustering; Linguistic and Semantic Clustering; Support Vector Machines; Pattern Matching or other statistical algorithms; Neural Networks; Combination of methods from above
22 Claims processing – Progress notes and Diaries CLAIMS ADJUSTER Medical Management Staff Special Investigation Unit NICB Vendor Management Consulting Engineers Hearing Representative Structured Settlement Unit Recovery Staff Legal Staff Home Office Staff Field Office Claim Staff Insured Risk Manager Agent or Broker Diary forward – “call Dr Jones next week” Business Rule – large loss review System Reminder – update case reserves Correspondence Tracking – legal letter sent Service
23 Semantic processing: Named Entity Extraction Identify and type language features Examples: People names; Company names; Geographic location names; Dates; Monetary amount; Phone #, zipcodes, SSN, FEIN; Others… (domain specific)
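For the pattern-like entity types above (dates, monetary amounts, phone numbers, zip codes, SSN, FEIN), a rules-based pass with regular expressions is a common starting point. The patterns below are simplified assumptions covering US-style formats only; people, company, and location names generally need dictionaries or trained models and are not attempted here.

```python
import re

# Simplified, US-style patterns; each is an assumption, not a complete rule.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "FEIN": re.compile(r"\b\d{2}-\d{7}\b"),
    "ZIP": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}


def extract_entities(text):
    """Return {entity_type: [matched strings]} for every pattern."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


note = ("Call Dr Jones at 555-867-5309 on 3/14/2024 re: $12,500.00 bill; "
        "FEIN 12-3456789, ZIP 06101.")
entities = extract_entities(note)
```

The ZIP pattern will also match any standalone five-digit number, so in practice the surrounding context (for example a preceding state abbreviation) would be needed to type it reliably.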
24 Feedback to UW