Data Mining to Predict and Prevent Errors in Health Insurance Claims Processing Mohit Kumar, Rayid Ghani and Zhu-Song Mei Copyright © 2010 Accenture All Rights Reserved. Accenture, its logo, and High Performance Delivered are trademarks of Accenture.

Accenture Technology Labs
- R&D groups in 4 locations: Chicago, Silicon Valley, Sophia Antipolis, and Bangalore
- Part of a consulting & services company with 180,000 people in over 50 countries
- Applied research motivated by real business problems
- Focus areas include: machine learning and data mining, software engineering, collaboration and knowledge management, cloud/distributed computing, green computing, biometrics

Motivation
- Inefficiencies in the health insurance process result in large monetary losses affecting corporations and consumers
- $91 billion is over-spent in the US every year on health administration and insurance (McKinsey study, Nov 2008), part of more than $650 billion over-spent overall
- Insurance premiums have increased 131 percent over the past 10 years
- For perspective against other research areas: the plenary invited talk noted that the online ad market is projected to be $48 billion in 2011 and $67 billion in 2013

Health Insurance Claim Process

Motivation (continued)
- Inefficiencies in the healthcare process result in large monetary losses affecting corporations and consumers: $91 billion over-spent in the US every year on health administration and insurance (McKinsey study, Nov 2008); 131 percent increase in insurance premiums over the past 10 years
- Claim payment errors drive a significant portion of these inefficiencies:
  - Increased administrative costs and service issues for health plans
  - Overpayment of claims: a direct loss
  - Underpayment of claims: loss of interest payments for the insurer, loss of revenue for the provider
- Some statistics:
  - 33% of the claims workforce is involved in handling these errors
  - For a 6-million-member insurance plan, $400 million in identified overpayments (source: [Anand and Khots, 2008])
  - For a large (10 million+ member) insurance plan, an estimated $1 billion in lost revenue (source: discussions with domain experts)

Early Rework Detection: How It's Done Today
- Random audits for quality control: random samples are drawn from the claims database and manually audited by auditors
- Extremely low hit rates
- Long audit times due to fully manual audits

Early Rework Detection: Hypothesis- and Rule-Based Audits
- Experts generate hypotheses, which become database queries against the claims database; auditors then perform hypothesis-based audits
- Better hit rates, but still a lot of manual effort in discovering, building, updating, executing, and maintaining the hypotheses
Problem setup/characteristics:
- Identify rework claims before payment
- Identify a wide variety of rework types, not limited to manual rules
- Flag suspect claims with enough accuracy, and explain the reason for the error, so that a secondary pre-pay audit becomes practical
- Adapt to changes in the environment

Problem Formulation
- Classification problem: payment error or not
- The classifier's confidence score is used for ranking, to prioritize the claims to be reviewed (see the sketch below)
- Alternate formulations: a ranking problem; multi-class classification to predict the error category; multi-instance modeling
- Characteristics: skewed class distribution (rare events), biased sampling of labeled data, concept drift, expensive domain experts
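A minimal sketch of the score-and-rank formulation, assuming any classifier that exposes a confidence score; the logistic regression model and synthetic data are illustrative stand-ins, not the paper's implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for claim feature vectors; label 1 = payment error (rare).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (rng.random(1000) < 0.05).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Confidence that each claim is a payment error, used to build the audit queue.
error_prob = clf.predict_proba(X)[:, 1]
audit_queue = np.argsort(-error_prob)  # most suspicious claims first
```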

Feature Design
- Raw features plus derived features
- Derived statistical features (average, min, max, standard deviation), e.g. the standard deviation of charged/paid amounts
- Most features are client-independent and hence generalizable
- Interesting observation: the standard-deviation features turn out to be significant (see the sketch below)
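A sketch of the derived statistical features described above, computed with pandas over a hypothetical claims table; the column names (provider_id, charge_amount, paid_amount) are illustrative assumptions:

```python
import pandas as pd

# Hypothetical raw claim lines.
claims = pd.DataFrame({
    "provider_id":   ["P1", "P1", "P2", "P2", "P2"],
    "charge_amount": [120.0, 80.0, 300.0, 310.0, 295.0],
    "paid_amount":   [100.0, 60.0, 250.0, 260.0, 240.0],
})

# Derived features: per-provider avg/min/max/std of charged and paid amounts.
stats = (claims.groupby("provider_id")[["charge_amount", "paid_amount"]]
               .agg(["mean", "min", "max", "std"]))
stats.columns = ["_".join(col) for col in stats.columns]

# Join the aggregates back so each claim line carries its provider's profile.
features = claims.join(stats, on="provider_id")
```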

Classification Algorithms
Domain characteristics:
- High-dimensional data (100k to 1 million features in typical insurance data)
- Sparse data
- Fast training, updating, and scoring required
- Must be able to generate explanations for domain experts
Selected approach: fast linear SVMs (svm_perf, Pegasos, sofia-ml)
- Distance from the margin is used as the ranking score
- Since SVMs cannot handle categorical features directly, categorical features are converted to boolean indicator features (see the sketch below)
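Since a linear SVM operates on numeric vectors, categorical claim fields are expanded into boolean indicator features. A minimal sketch with scikit-learn, where LinearSVC stands in for the fast SVM solvers named above and the field names are hypothetical:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

# Hypothetical claims with categorical fields; one-hot encoding yields the
# high-dimensional, sparse boolean representation the slide describes.
raw = [
    {"procedure_code": "99213", "provider_type": "clinic",   "paid": 80.0},
    {"procedure_code": "99215", "provider_type": "hospital", "paid": 210.0},
    {"procedure_code": "99213", "provider_type": "hospital", "paid": 95.0},
]
labels = [0, 1, 0]  # 1 = payment error

vec = DictVectorizer(sparse=True)  # categorical values -> boolean indicators
X = vec.fit_transform(raw)

clf = LinearSVC().fit(X, labels)

# Signed distance from the margin doubles as the ranking score.
scores = clf.decision_function(X)
```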

Data
- Insurance company 1: 2 years of data; 3.5 million claims; 121k labeled claims (49k errors); 110,000 features
- Insurance company 2: 23 million claims; 380k labeled claims (247k errors); ~1 million features
- Labeled data comes from several systems (QA, provider); in real deployments only 5-10% of claims are labeled, and the system needs to take that biased distribution into account

Offline Evaluation Metric: Precision @ Percentile 10
- Audit capacity: in production, only 5-10% of all claims (~3.5 million) can be manually reviewed
- Offline experiments are critical for model selection, tuning, and calibration before deployment
- Also useful for giving evidence that the system works before deployment can be considered
- Precision at percentile 10 (rather than full precision/recall curves) maps directly onto hit rate, catch rate, and audit rate (a sketch of the metric follows)
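A minimal sketch of the Precision @ Percentile 10 metric, where scores are model confidences and labels mark known error claims; the function name and synthetic example are illustrative:

```python
import numpy as np

def precision_at_percentile(scores, labels, pct=10):
    """Precision among the top pct% highest-scoring claims."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    k = max(1, int(len(scores) * pct / 100))
    top_k = np.argsort(-scores)[:k]  # indices of the top pct% of claims
    return labels[top_k].mean()      # fraction of those that are true errors

# Example: 1000 claims, ~5% true error rate, scores noisily track the labels.
rng = np.random.default_rng(0)
labels = (rng.random(1000) < 0.05).astype(int)
scores = labels + rng.normal(scale=0.5, size=1000)
print(precision_at_percentile(scores, labels, pct=10))
```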

Experimental Setup & Infrastructure
- Run experiments to pick optimal parameters, varying: SVM parameters, feature selection, temporal sample selection, and claim-level vs. line-level classification
- 1000s of experiments: fast training time enables automatic model selection instead of relying on intuition
- Goals: adapt to a changing environment and generalize across clients (a parameter-sweep sketch follows)
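A sketch of an automated parameter sweep in the spirit of the setup above, combining feature selection and SVM regularization in one grid; the authors' actual infrastructure is not public, so this scikit-learn pipeline is a hypothetical stand-in:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic non-negative features (chi2 requires non-negative values).
rng = np.random.default_rng(0)
X = rng.random((500, 2000))
y = (rng.random(500) < 0.05).astype(int)

pipe = Pipeline([
    ("select", SelectKBest(chi2)),  # feature-selection stage
    ("svm", LinearSVC()),           # fast linear SVM
])

# Sweep the number of retained features and the SVM regularization strength.
grid = {"select__k": [100, 500, "all"],
        "svm__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, grid, scoring="average_precision", cv=3)
search.fit(X, y)
print(search.best_params_)
```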

Results
- 93% precision in the top 10% of examined claims
- 23% of error claims are found in the top 10% of examined claims

Estimating Performance on Unlabeled Data
- More offline experiments to check the scalability of the system and to simulate live deployment, giving more realistic numbers than labeled data alone
- Protocol: mix the labeled test data with unlabeled data, rank the entire set, and measure the recall of the labeled test set (a sketch follows)
- ~40% of known error claims are discovered in the top 10% of examined claims
- ~25% of known correct claims fall in the bottom 10%
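A sketch of that protocol under the stated assumptions: score the labeled test claims and a large unlabeled pool with the same model, rank them together, and measure how many known errors land in the top 10%. The arrays and thresholds here are illustrative:

```python
import numpy as np

def recall_in_top_decile(test_scores, test_labels, unlabeled_scores):
    """Fraction of known error claims ranked in the top 10% of the mixed set."""
    all_scores = np.concatenate([test_scores, unlabeled_scores])
    cutoff = np.quantile(all_scores, 0.90)  # score threshold for the top 10%
    in_top = test_scores >= cutoff
    errors = test_labels == 1
    return (in_top & errors).sum() / max(1, errors.sum())

# Illustrative data: a labeled test set mixed into a much larger unlabeled pool.
rng = np.random.default_rng(0)
test_labels = (rng.random(500) < 0.3).astype(int)
test_scores = test_labels + rng.normal(scale=0.7, size=500)
unlabeled_scores = rng.normal(scale=0.7, size=5000)
print(recall_in_top_decile(test_scores, test_labels, unlabeled_scores))
```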

Live Evaluation (with Auditors)
Insurance company 1 (first evaluation, a sanity check):
- Audit time: 20 minutes to 1 hour per claim
- Gave a sample of 100 claims to an auditor
- Precision: 65%
Insurance company 2 (4-week pilot deployment):
- Audit time: 4-10 minutes per claim
- Total claims audited: 1000
- Precision: 29-55% depending on audit strategy
Notes:
- Precision is lower than in the offline results, but much higher than the current 5-10% hit rate
- These numbers translate to $10-$25 million in savings per year
- The results so far are for a static system that disregards timing information; a real deployment has to 'look ahead'
- Motivating the concept-drift work: splitting the data temporally degrades performance significantly compared to a random 70/30 split

Concept Drift
- Diagnostics for determining whether concept drift is present
- What is the optimal time window to train the system?
- The most recent 2-3 months of data give the best performance (a sliding-window sketch follows)
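One way to run the window diagnostic implied above: train on the most recent w months, test on the following month, and compare precision across window sizes. The data layout, column names, and model are hypothetical assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.svm import LinearSVC

def window_precision(df, train_months, test_month, feature_cols):
    """Train on the given months; report precision@10% on the test month."""
    train = df[df.month.isin(train_months)]
    test = df[df.month == test_month]
    clf = LinearSVC().fit(train[feature_cols], train.is_error)
    scores = clf.decision_function(test[feature_cols])
    k = max(1, len(test) // 10)
    top = np.argsort(-scores)[:k]
    return test.is_error.to_numpy()[top].mean()

# Synthetic stand-in: 12 months of claims with two features and rare errors.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "month": rng.integers(1, 13, size=3000),
    "f1": rng.normal(size=3000),
    "f2": rng.normal(size=3000),
})
df["is_error"] = (rng.random(3000) < 0.05).astype(int)

# Compare training windows of 1, 2, 3, and 6 months ending before month 12.
for w in (1, 2, 3, 6):
    print(w, window_precision(df, range(12 - w, 12), 12, ["f1", "f2"]))
```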

System Demo
Example flagged claim: the EOB attached to the claim shows a patient responsibility of $2,625.00; the insurer incorrectly paid as primary, overpaying the claim by $805.00.

Challenges / Recent Work
- Concept drift
- Active learning: interactive, cost-sensitive approaches
- Alternative formulations: ranking, multi-instance, multi-class

Summary
- Payment errors result in large monetary losses in the healthcare system
- Our approach is able to accurately detect these errors and help fix them efficiently
- Current estimates suggest $10-$25 million/year in savings for typical insurers in the US
- The system is currently being industrialized for deployment
- We are currently looking for people to join the team

Questions