1 Data Quality: Opportunities, Data, and Examples.

Presentation transcript:

1 Data Quality: Opportunities, Data, and Examples

2

3 Better and More Data
– Level of analysis: take a quick look at what data to use and why; link data from disparate and third-party sources
– Explore data types
– Typical issues & tricks: cross-validation and sourcing; reverse look-up; GIS layering; backfill from text correlated to codes
– Information from operations: text analytics

4 General Organizational Overview
An information business focused on risk taking. Make. Sell. Serve.
Sales and Distribution; Producer Segmentation; Market Planning; Revenue Forecasting; Cross-sell and Up-sell; Retention and Profitability; Underwriting; Risk Selection and Pricing; Portfolio Management; Premium Adequacy; Billing and Collections Management; Claims Payment Accuracy; Claim Collaboration > Fraud Detection > Subrogation > Risk Transfer > 3rd Party Deductible > Reinsurance Recoverable

5 Same Problems – Different Lines of Business
– Personal: Auto, HO, Umbrella
– Small Commercial: BOP, CPP
– Middle Market Commercial: CPP w/GL, CP, Crime, CIM, B&M, WC, Auto
– Large Commercial Accounts
– Commercial Auto
– Workers Comp
– Umbrella/Excess
– Specialty Lines: D&O, EPL, E&O, Farm, FI

6 Data Types and Forms
– Structured data
– Semi-structured data
– Unstructured data
– Text
– Spatial
– Pictographic
– Graphic
– Voice
– Video

7 Multiple data systems which must be pulled together for analysis. Great opportunity for cross-validation and sourcing.
Systems: Data Archive, Legacy Systems; Current System; Claim, Multiple States; Billing Systems; Finance Systems; CRM Systems, other data; Policy, Multiple Underwriting Systems; Medical Data (Bill Review, PPO, Case Management, Paradigm); Vendors/Partners; External Data
ACTIONS
– Identify data systems
– Get the right data from the right systems
– Overcome internal organizational barriers
– Bridge to legacy systems and archived data
– Augment to create a rich data mining environment
– Expect the need to negotiate for resources
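
The cross-validation opportunity on this slide can be sketched in a few lines. This is a minimal illustration only: the two extracts, the `policy_id` and `zip` field names, and the `cross_validate` helper are hypothetical and do not come from the presentation.

```python
def cross_validate(policy_rows, claim_rows, key="policy_id", fields=("zip",)):
    """Compare shared fields between two systems, matching records on `key`.

    Returns (mismatches, orphans):
      mismatches: (key, field, policy_value, claim_value) tuples
      orphans:    claim keys that have no matching policy record
    """
    policies = {row[key]: row for row in policy_rows}
    mismatches, orphans = [], []
    for claim in claim_rows:
        policy = policies.get(claim[key])
        if policy is None:
            orphans.append(claim[key])
            continue
        for field in fields:
            if policy.get(field) != claim.get(field):
                mismatches.append(
                    (claim[key], field, policy.get(field), claim.get(field))
                )
    return mismatches, orphans


# Hypothetical extracts from a policy system and a claim system.
policy_system = [
    {"policy_id": "P1", "zip": "92501"},
    {"policy_id": "P2", "zip": "02110"},
]
claim_system = [
    {"policy_id": "P1", "zip": "92501"},
    {"policy_id": "P2", "zip": "02111"},  # disagrees with the policy system
    {"policy_id": "P9", "zip": "10001"},  # no policy record at all
]

mismatches, orphans = cross_validate(policy_system, claim_system)
```

Disagreements point to a field that needs a sourcing decision (which system is authoritative), while orphans suggest a system that has not yet been pulled into the analysis.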

8 Some typical external data sources and vendors
Dun & Bradstreet; Experian; Bureau of Labor Statistics; Market Stance; AM Best; Equifax; US Census; Claritas; Melissa Data; ISO; GIS vendors; U&C data sets; code sets for ICDs and CPTs; …

9 Data Glitches – historical and on-going
Systemic changes to data, not process related:
– Changes in data layout / data types
– Changes in scale / format
– Temporary reversion to defaults
– Missing and default values
– Gaps in time series
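
Two of the glitches listed above, gaps in time series and missing or default values, lend themselves to simple automated checks. The sketch below is illustrative: the daily step, the set of placeholder values treated as defaults, and the sample data are all assumptions.

```python
from datetime import date, timedelta


def find_gaps(dates, step=timedelta(days=1)):
    """Return (before, after) pairs where consecutive dates are more than `step` apart."""
    ordered = sorted(set(dates))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > step]


def default_rate(values, defaults=(None, "", 0, "UNKNOWN")):
    """Share of values that are missing or equal to an assumed default placeholder."""
    if not values:
        return 0.0
    return sum(1 for v in values if v in defaults) / len(values)


# Hypothetical daily load dates with a hole after January 2.
load_dates = [date(2007, 1, 1), date(2007, 1, 2), date(2007, 1, 5), date(2007, 1, 6)]
gaps = find_gaps(load_dates)

# Hypothetical weekly wage field where 0 and None act as defaults.
weekly_wage = [520, 0, 610, None, 0, 480]
rate = default_rate(weekly_wage)
```

Tracking the default rate per load period makes a temporary reversion to defaults visible as a spike.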

10 Process Reasons for poor data entry

11 Defining Issues-sample Source Data 1-Define Issues

12 MORE ISSUES… Mapping across sources: Same Fact, Different Terms
Data Element Concept – Name: Country Identifiers; Context: ; Definition: ; Unique ID: 5769; Conceptual Domain: ; Maintenance Org.: ; Steward: ; Classification: ; Registration Authority: ; Others
Data Elements – Name: ; Context: ; Definition: ; Unique ID: 4572; Value Domain: ; Maintenance Org.: ; Steward: ; Classification: ; Registration Authority: ; Others
ISO 3166 English Name: Algeria, Belgium, China, Denmark, Egypt, France, ... Zimbabwe
ISO 3166 French Name: L'Algérie, Belgique, Chine, Danemark, Egypte, La France, ... Zimbabwe
ISO Alpha Code (two-letter): DZ, BE, CN, DK, EG, FR, ... ZW
ISO Alpha Code (three-letter): DZA, BEL, CHN, DNK, EGY, FRA, ... ZWE
ISO Numeric Code
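
One way to handle "same fact, different terms" is to map every known spelling onto a single canonical code. The sketch below uses only the countries shown on the slide; the choice of the two-letter code as the canonical key and the helper names are assumptions made for illustration.

```python
# Two-letter code -> (three-letter code, English name), limited to the slide's examples.
ISO_3166 = {
    "DZ": ("DZA", "Algeria"),
    "BE": ("BEL", "Belgium"),
    "CN": ("CHN", "China"),
    "DK": ("DNK", "Denmark"),
    "EG": ("EGY", "Egypt"),
    "FR": ("FRA", "France"),
    "ZW": ("ZWE", "Zimbabwe"),
}

# French names as listed on the slide, mapped to the same canonical key.
FRENCH_NAMES = {
    "L'Algérie": "DZ",
    "Belgique": "BE",
    "Chine": "CN",
    "Danemark": "DK",
    "Egypte": "EG",
    "La France": "FR",
    "Zimbabwe": "ZW",
}


def build_index():
    """Index every known term (codes and names) by its case-folded form."""
    index = {}
    for alpha2, (alpha3, english) in ISO_3166.items():
        for term in (alpha2, alpha3, english):
            index[term.casefold()] = alpha2
    for french, alpha2 in FRENCH_NAMES.items():
        index[french.casefold()] = alpha2
    return index


INDEX = build_index()


def normalize_country(term):
    """Map any known spelling to its two-letter code; None if unrecognized."""
    return INDEX.get(term.strip().casefold())
```

Returning `None` for unknown terms, rather than guessing, keeps unmapped values visible so they can be reviewed and added to the list.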

13 Data Filling
– Manual
– Statistical imputation
– Temporal
– Spatial
– Spatial-temporal
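
As an illustration of temporal filling, the sketch below interpolates linearly between the nearest known values in an ordered series. Linear interpolation is one assumed choice among many; the slide does not prescribe a method, and the sample series is hypothetical.

```python
def fill_temporal(series):
    """Fill None gaps by linear interpolation between the nearest known neighbours.

    Leading or trailing gaps have only one known neighbour, so they stay None.
    """
    filled = list(series)
    known = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


# Hypothetical monthly values with three missing months and a trailing gap.
monthly = [100.0, None, None, None, 140.0, None]
filled = fill_temporal(monthly)
```

Leaving the trailing gap unfilled is deliberate: extrapolating past the last observation is a different decision from filling between two observed points.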

14 Geographic Hierarchy

15 Deriving Data = Power
– Totals: Household Income
– Trends: Rate of Medical Bill Increases
– Ratios: Claims/Premium, Target/Median
– Friction: Level of inconvenience, ratio of rental to damage
– Sequences: Lawyer-Doctor, Auto-Life Policy
– Circumstances: Minimal Impact Severe Trauma
– Temporal: Loss shortly after adding collision
– Spatial: Distance to Service, proximity of stakeholders
– Logged: Progress Notes, Diaries
– Who did it, When, “Why”
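
Two of the derived fields above, the claims-to-premium ratio and the "loss shortly after adding collision" flag, can be sketched as follows. The record layout, field names, and the 30-day threshold are assumptions for illustration only.

```python
from datetime import date


def loss_ratio(claims_paid, premium):
    """Ratio: claims / premium; None when premium is zero or missing."""
    if not premium:
        return None
    return claims_paid / premium


def days_since_coverage_added(coverage_added, loss_date):
    """Temporal: days between adding a coverage and the loss date."""
    return (loss_date - coverage_added).days


def derive(record, early_loss_days=30):
    """Return a copy of a raw record with derived fields attached."""
    derived = dict(record)
    derived["loss_ratio"] = loss_ratio(record["claims_paid"], record["premium"])
    gap = days_since_coverage_added(record["collision_added"], record["loss_date"])
    derived["days_to_loss"] = gap
    # Assumed rule: a loss within `early_loss_days` of adding coverage is flagged.
    derived["early_loss"] = 0 <= gap <= early_loss_days
    return derived


# Hypothetical record combining policy and claim fields.
record = {
    "claims_paid": 4500.0,
    "premium": 1500.0,
    "collision_added": date(2007, 3, 1),
    "loss_date": date(2007, 3, 10),
}
derived = derive(record)
```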

16 Deriving Data = Power (Cont’d)
– Behavioral: Deviation from past usage, spike buying
– Experience Profiles: Vendor, Doctor, Premium Audit
– Channel: How applied, How reported, Service Chain
– Legal Jurisdiction: Venue Disposition, Rules
– Demographics: Working, Weekly wage, lost income
– Firmographics: Industry Class Code vs. Injuries Claimed
– Inflation: Wage, Medical, Goods, Auto, COLA
– Gov’t Statistics: Crime Rate, Employment, Traffic
– Other Stats: Rents, Occupancy, Zoning, Mgd Care
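
The behavioral item, deviation from past usage, could be expressed as a standard score against an entity's own history. This is a sketch under assumptions: the z-score is one possible measure, and the sample counts are hypothetical.

```python
from statistics import mean, stdev


def deviation_from_past(history, current):
    """Z-score of `current` against past observations; None if it cannot be computed."""
    if len(history) < 2:
        return None  # sample standard deviation needs at least two points
    spread = stdev(history)
    if spread == 0:
        return None  # no variation in the past, so the score is undefined
    return (current - mean(history)) / spread


# Hypothetical monthly visit counts for one provider, then a sudden spike.
past_visits = [4, 5, 6, 5, 5]
score = deviation_from_past(past_visits, 12)
```

A large positive score marks the kind of spike the slide describes; the cut-off that counts as "large" is a business decision rather than something the calculation supplies.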

17 “Search” versus “Discover”
Structured Data: Data Retrieval (search, goal-oriented); Data Mining (discover, opportunistic)
Unstructured Data (Text): Information Retrieval (search, goal-oriented); Text Mining (discover, opportunistic)

18 Word Replacement Lists
Input Value [Jim] is transformed to Input Value [JAMES].
Searching returns “Similar Matches”.
All records found: Jimmy, Jim, James (each transformed to JAMES).
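
The word replacement idea can be sketched as a lookup applied to both the search input and the stored values, so that variants of a name collapse to one form before comparison. The replacement list below contains only the slide's example; the function names are hypothetical.

```python
# Hypothetical replacement list: variant -> standard form (upper case).
REPLACEMENTS = {"JIM": "JAMES", "JIMMY": "JAMES"}


def transform(value):
    """Upper-case a value and swap it for its standard form when one is listed."""
    key = value.strip().upper()
    return REPLACEMENTS.get(key, key)


def search(records, query):
    """Return every record whose transformed value equals the transformed query."""
    target = transform(query)
    return [record for record in records if transform(record) == target]


records = ["Jimmy", "Jim", "James", "Janet"]
matches = search(records, "Jim")
```

Because the transformation is applied on both sides, the original stored values are returned unchanged; only the comparison uses the standardized form.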

19 Motivation for Text Mining
Approximately 90% of the world’s data is held in unstructured formats (source: Oracle Corporation).
Information-intensive business processes demand that we move beyond simple document retrieval to “knowledge” discovery.
Chart: 10% structured numerical or coded information; 90% unstructured or semi-structured information.

20 Convergence of Disciplines Example

21 Text is like a data iceberg
Techniques for attacking text data:
– Rules-based
– Statistical text analysis and clustering
– Linguistic and semantic clustering
– Support Vector Machines
– Pattern matching or other statistical algorithms
– Neural networks
– Combination of methods from above
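
Of the techniques listed, the rules-based approach is the simplest to illustrate: each rule is a pattern, and a note receives the label of every rule it matches. The labels, keywords, and sample notes below are assumptions chosen for illustration, not rules from the presentation.

```python
import re

# Hypothetical rules: label -> pattern that triggers it.
RULES = {
    "legal": re.compile(r"\b(attorney|lawyer|legal|suit)\b", re.IGNORECASE),
    "medical": re.compile(r"\b(dr|doctor|surgery|mri)\b", re.IGNORECASE),
    "subrogation": re.compile(r"\b(subro\w*|at[- ]fault)\b", re.IGNORECASE),
}


def tag_note(note):
    """Return the sorted labels of every rule whose pattern appears in the note."""
    return sorted(label for label, pattern in RULES.items() if pattern.search(note))


notes = [
    "Call Dr Jones next week",
    "Legal letter sent to claimant attorney",
    "Other driver at-fault, refer to subro unit",
    "Update case reserves",
]
tags = [tag_note(note) for note in notes]
```

The word-boundary anchors matter here: without them, "driver" would trigger the medical rule through its leading "dr".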

22 Claims processing – Progress notes and Diaries
CLAIMS ADJUSTER works with: Medical Management Staff; Special Investigation Unit; NICB; Vendor Management; Consulting Engineers; Hearing Representative; Structured Settlement Unit; Recovery Staff; Legal Staff; Home Office Staff; Field Office Claim Staff; Insured Risk Manager; Agent or Broker; Service
– Diary forward: “call Dr Jones next week”
– Business rule: large loss review
– System reminder: update case reserves
– Correspondence tracking: legal letter sent

23 Semantic processing: Named Entity Extraction
Identify and type language features. Examples:
– People names
– Company names
– Geographic location names
– Dates
– Monetary amounts
– Phone #, zip codes, SSN, FEIN
– Others… (domain specific)
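
The more regular entity types on this slide (dates, monetary amounts, phone numbers, zip codes, SSNs) can be approximated with pattern matching, as sketched below. The patterns assume US-style formats and are deliberately narrow; people, company, and location names generally need dictionaries or trained models and are not attempted here. The sample note and its values are fictitious.

```python
import re

# Assumed US-style formats; real notes will contain many more variants.
PATTERNS = {
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "money": re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),
    "phone": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "zip": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}


def extract_entities(text):
    """Return a dict mapping each entity type to the matches found in `text`."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


note = (
    "Spoke with insured on 4/14/2004 at (555) 010-0199; paid $1,250.00 "
    "to shop in Riverside, CA 92501. SSN on file 123-45-6789."
)
entities = extract_entities(note)
```

Typing each match (date, money, and so on) is what turns free text into fields that can be joined back to structured claim data.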

24 Feedback to UW

25 Data Quality: Opportunities, Data, and Examples