Published by Meryl Joseph. Modified over 6 years ago.
1
Knowledge Discovery From Massive Healthcare Claims Data
Varun Chandola, Sreenivas Sukumar, Jack Schryver
Presented by Anatoli Shein
2
Motivation: US health care
2008: 15.2% of GDP
2017: 19.5% of GDP (projected)
Anatoli Shein 5/25/2018
3
Goal: Improve cost-care ratio
Improve healthcare operations.
Reduce fraud, waste, and abuse.
4
Big Data Analytics in HealthCare
5
Big Data in HealthCare Categorized
6
Data quality and availability
Clinical data, behavior data, and pharmaceutical data: useful but unavailable
7
Data quality and availability
Health insurance data: available but needs preparation
8
State of the Art Analytics for Massive HealthCare Data:
Network analysis
Text mining
Temporal analysis
Higher-order feature construction
9
Health Insurance
85% of Americans have it.
Its data is stored to:
Track payments
Address fraud
Address economic challenges
It offers strong analytic insight into healthcare.
10
Health Insurance Data Model
Fee-for-service model
Provider -> Service -> Patient -> Cost -> Justification -> Payor
11
Data Maintained for Operation
Claims information
Patient enrollment and eligibility
Provider enrollment
12
Challenges and Opportunities
Fraud
Waste
Abuse
13
Fraud
Billing for services not provided
Large-scale fraud
14
Waste
Improper payments
Double payments
Duplicate claims
Outdated fee schedule
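Duplicate claims, one of the waste categories listed on this slide, lend themselves to a simple illustration. The sketch below is not taken from the presentation: it assumes a claim is a dictionary with hypothetical fields `claim_id`, `provider`, `patient`, `procedure`, and `date`, and it treats two claims as duplicates when all four descriptive fields match. Real claims data would likely need fuzzier matching.

```python
from collections import defaultdict


def find_duplicate_claims(claims):
    """Return lists of claim IDs that share provider, patient, procedure, and date.

    The field names are hypothetical; they are not from the presentation.
    """
    groups = defaultdict(list)
    for claim in claims:
        key = (claim["provider"], claim["patient"], claim["procedure"], claim["date"])
        groups[key].append(claim["claim_id"])
    # Keep only keys that were billed more than once.
    return [ids for ids in groups.values() if len(ids) > 1]


# Made-up sample records for illustration only.
claims = [
    {"claim_id": "c1", "provider": "P1", "patient": "B1", "procedure": "X1", "date": "2012-03-01"},
    {"claim_id": "c2", "provider": "P1", "patient": "B1", "procedure": "X1", "date": "2012-03-01"},
    {"claim_id": "c3", "provider": "P2", "patient": "B1", "procedure": "X1", "date": "2012-03-01"},
]
duplicates = find_duplicate_claims(claims)
```

Here `c1` and `c2` end up in the same group because every descriptive field matches, while `c3` differs in provider.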
15
Abuse
Prospective payment system
Upcoding
16
Data Used
Claims data (48 million beneficiaries in the US) from transactional data warehouses
Provider enrollment data (from private organizations)
Fraudulent providers (from the Office of Inspector General's exclusion list)
All other providers are treated as non-fraudulent
17
Claims Data
18
Analysis
Identification of typical treatment profiles
Identification of costly areas
19
Text Analysis, profile building
Apache Mahout
Hadoop-based technology
MapReduce
20
Entities as Documents
Document-term matrices
P (providers), B (beneficiaries), C (procedures), G (diagnoses), D (drugs)
Example: PG (providers/diagnoses)
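To make the "entities as documents" idea concrete, the sketch below builds a small PG matrix: each provider is a document and each diagnosis code is a term, with cell values counting how often the provider billed that diagnosis. This is a plain-Python illustration under assumed field names (`provider`, `diagnosis`) and made-up codes; the presentation itself names Apache Mahout on Hadoop for the large-scale version.

```python
from collections import Counter, defaultdict


def build_document_term_matrix(claims, entity_field, term_field):
    """Count how often each term occurs for each entity.

    Returns (entities, terms, matrix), where matrix[i][j] is the number of
    claims in which entities[i] appears together with terms[j].
    """
    counts = defaultdict(Counter)
    for claim in claims:
        counts[claim[entity_field]][claim[term_field]] += 1
    entities = sorted(counts)
    terms = sorted({term for counter in counts.values() for term in counter})
    # Counter returns 0 for terms an entity never used.
    matrix = [[counts[entity][term] for term in terms] for entity in entities]
    return entities, terms, matrix


# Made-up claims; "G1".."G3" are placeholder diagnosis codes.
claims = [
    {"provider": "P1", "diagnosis": "G1"},
    {"provider": "P1", "diagnosis": "G1"},
    {"provider": "P1", "diagnosis": "G3"},
    {"provider": "P2", "diagnosis": "G2"},
]
providers, diagnoses, pg_matrix = build_document_term_matrix(
    claims, entity_field="provider", term_field="diagnosis"
)
```

Swapping the two field arguments gives the other matrices in the same way, for example beneficiaries against drugs, assuming the claims carry those fields.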
22
Interesting finding
Some seemingly unrelated diagnosis codes were grouped into the same topics.
Example: diabetes and dermatoses
23
Social Network Analysis
Estimate a provider's fraud risk before the provider makes any claims, by constructing a social network of providers
24
Provider Network
25
Texas Provider Network
26
Extracting Features from Provider Network
27
Information complexity measure
The most distinguishing features proved to be:
Node degree
Number of fraudulent providers in the 2-hop network
Eigenvector centrality
Current-flow closeness centrality
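Two of the features listed on this slide, node degree and the number of fraudulent providers within two hops, are simple enough to sketch in plain Python. The example below is illustrative only: the edge list is made up, how providers are actually linked in the network is an assumption, and the two centrality measures are left out because they are usually computed with a graph library.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from (provider, provider) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def two_hop_neighbors(adjacency, node):
    """Return every node reachable in one or two steps, excluding the node itself."""
    one_hop = adjacency.get(node, set())
    reachable = set(one_hop)
    for neighbor in one_hop:
        reachable |= adjacency.get(neighbor, set())
    reachable.discard(node)
    return reachable


def provider_features(adjacency, node, known_fraudulent):
    """Compute node degree and the count of known-fraudulent providers within 2 hops."""
    return {
        "degree": len(adjacency.get(node, set())),
        "fraud_in_2_hops": len(two_hop_neighbors(adjacency, node) & known_fraudulent),
    }


# Hypothetical provider links and fraud labels.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
adjacency = build_adjacency(edges)
features_a = provider_features(adjacency, "A", known_fraudulent={"C", "D"})
```

Provider `A` has two direct neighbors (`B` and `E`) and reaches `C` through `B`, so only one of the two flagged providers falls inside its 2-hop network.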
29
Temporal Feature Construction
Looking at provider data over time reveals anomalies:
An increase in the number of patients
Taking on patients whose conditions differ from the provider's past profile
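The first anomaly type, a jump in patient volume, can be sketched as a comparison against a trailing average. This is a minimal illustration, not the method from the presentation: the `window` and `factor` parameters and the monthly counts are assumptions chosen for the example.

```python
def flag_patient_count_spikes(monthly_counts, window=3, factor=2.0):
    """Return indices of months whose patient count exceeds `factor` times
    the mean of the preceding `window` months.

    Both parameters are illustrative defaults, not values from the source.
    """
    flagged = []
    for i in range(window, len(monthly_counts)):
        baseline = sum(monthly_counts[i - window:i]) / window
        if baseline > 0 and monthly_counts[i] > factor * baseline:
            flagged.append(i)
    return flagged


# Made-up monthly patient counts for one provider.
counts = [40, 42, 38, 41, 95, 100]
spikes = flag_patient_count_spikes(counts)
```

Only the month at index 4 is flagged: once the jump enters the trailing window, the baseline rises and the following month no longer exceeds twice the average.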
30
Fraudulent Provider Detection
31
Conclusions
Introduced the domain of “big” healthcare claims data
Analyzed healthcare claims data at a country level using state-of-the-art analytics for massive data
Transformed the problem into well-known analysis problems in the data mining community
Presented several approaches for identifying fraud, waste, and abuse
32
Thank you. Questions?