1 Data Quality Models for High Volume Transaction Streams: A Case Study
Robert Grossman, Chris Curry, David Locke & Steve Vejcik, Open Data Group
Joseph Bugajski, Visa International

2 The Problem: Detect Significant Changes in Visa’s Payments Network
[Diagram: transaction flow among Account, Merchant, Issuing Bank, and Acquiring Bank]

3 Visa Payment Network
- Over 1.59 billion Visa cards in circulation
- 6,800 transactions per second (peak)
- Over 20,000 member banks
- Millions of merchants

4 The Challenge: Payments Data is Highly Heterogeneous
- Variation from cardholder to cardholder
- Variation from merchant to merchant
- Variation from bank to bank

5 Observation: If the Data Were Homogeneous, We Could Use a Change Detection Model
- Sequence of events x[1], x[2], x[3], …
- Question: is the observed distribution different from the baseline distribution?
- Use simple CUSUM and Generalized Likelihood Ratio (GLR) tests (a CUSUM sketch follows below)
[Diagram: observed model compared against baseline model]
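
As a concrete illustration of the homogeneous case, here is a minimal one-sided CUSUM sketch in Python. The Gaussian baseline and the tuning constants k (allowance) and h (decision threshold) are illustrative assumptions; the slide does not specify the test parameters used in the Visa deployment.

```python
# Minimal one-sided CUSUM for detecting an upward shift in a metric's
# mean. baseline_mean/baseline_std, k, and h are illustrative values,
# not parameters from the actual system.
import random

def cusum_alerts(stream, baseline_mean, baseline_std, k=0.5, h=5.0):
    """Yield the indices at which the CUSUM statistic crosses h."""
    s = 0.0
    for i, x in enumerate(stream):
        z = (x - baseline_mean) / baseline_std  # standardize against the baseline
        s = max(0.0, s + z - k)                 # accumulate evidence of upward drift
        if s > h:
            yield i
            s = 0.0                             # reset after raising an alert

# Example: a stream whose mean shifts upward halfway through.
random.seed(0)
stream = [random.gauss(0, 1) for _ in range(200)] + \
         [random.gauss(2, 1) for _ in range(200)]
print(list(cusum_alerts(stream, baseline_mean=0.0, baseline_std=1.0)))
```

A GLR test would replace the fixed allowance k with a likelihood ratio maximized over candidate change points, at a higher computational cost per event.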

6 Key Idea: Build 10^4+ Models, One for Each Cell in a Data Cube
- Build a separate model for each bank (1,000+)
- Build a separate model for each geographical region (6 regions)
- Build a separate model for each type of merchant (c. 800 merchant types)
- For each distinct cell of the cube, establish separate baselines for each metric of interest (declines, etc.)
- Detect changes from the baselines
[Diagram: data cube with axes Bank, Geospatial Region, and Type of Transaction, giving 20,000+ separate baselines. The approach is called Modeling using Cubes of Models (MCM).]
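
The following Python sketch shows one way to key an independent baseline to each cube cell. The record field names ("bank", "region", "mcc") and the running mean/variance baseline are hypothetical illustrations, not the production design.

```python
# Sketch of the MCM idea: one independently maintained baseline per
# (bank, region, merchant-type) cell of the data cube. Field names and
# the running mean/variance baseline are assumptions for illustration.
from collections import defaultdict

class RunningBaseline:
    """Incremental mean and variance via Welford's algorithm."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

cube = defaultdict(RunningBaseline)  # one baseline per cube cell

def train(record):
    key = (record["bank"], record["region"], record["mcc"])
    cube[key].update(record["decline_rate"])

train({"bank": "B001", "region": "EU", "mcc": 5411, "decline_rate": 0.03})
print(cube[("B001", "EU", 5411)].mean)
```

Because each cell's baseline is independent, a shift confined to one bank or merchant type is not averaged away by the rest of the network.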

7 Greedy Meaningful/Manageable Balancing (GMMB) Algorithm
- Start with one model for each cell in the data cube; cells are delimited by breakpoints
- Fewer alerts make alerts more manageable: to decrease alerts, consider removing a breakpoint, order candidates by the number of alerts removed, and select one or more breakpoints to remove
- More alerts make alerts more meaningful: to increase alerts, consider adding a breakpoint to split cells, order candidates by the number of new alerts, and select one or more breakpoints to add
(A greedy sketch of this loop follows below.)
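
The slide gives only the outline of GMMB, so the loop below is a hypothetical reconstruction. The alert_count callback (which reruns alerting for a given breakpoint set) and the one-breakpoint-per-step policy are assumptions.

```python
# Hypothetical greedy reconstruction of the GMMB loop described on the
# slide. alert_count(bps) is an assumed callback returning how many
# alerts the system would raise with breakpoint set bps.

def gmmb(breakpoints, candidates, alert_count, target_alerts, max_steps=10):
    """Greedily add/remove breakpoints to steer alert volume toward a target."""
    bps = set(breakpoints)
    for _ in range(max_steps):
        current = alert_count(bps)
        if current > target_alerts and bps:
            # Too many alerts: remove the breakpoint whose removal
            # suppresses the most alerts (merging cells coarsens models).
            best = min(bps, key=lambda b: alert_count(bps - {b}))
            bps.discard(best)
        elif current < target_alerts:
            # Too few alerts: add the breakpoint that splits cells so as
            # to surface the most new alerts (finer cells, more meaning).
            unused = set(candidates) - bps
            if not unused:
                break
            best = max(unused, key=lambda b: alert_count(bps | {b}))
            bps.add(best)
        else:
            break
    return bps
```

The one-change-at-a-time structure mirrors the slide's "order by number of alerts, select one or more breakpoints" rule while keeping each iteration cheap.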

8 Augustus
- The open source Augustus data mining platform was used to:
  - Estimate baselines for over 15,000 separate segmented models
  - Score high volume operational data and issue alerts for follow-up investigations
- Augustus is PMML compliant
- Augustus scales with:
  - Volume of data (terabytes)
  - Real time transaction streams (15,000/sec+)
  - Number of segmented models (10,000+)
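
In Augustus the segmented baselines would be expressed in PMML; as a language-neutral illustration, here is a plain-Python sketch of the scoring side, routing each record to its cell's stored baseline and alerting on large deviations. The record fields, the stored (mean, std) pairs, and the 4-sigma threshold are assumptions, not Augustus's actual scoring engine.

```python
# Hedged sketch of scoring: route each record to its cube cell's
# baseline and flag large deviations. The stored (mean, std) baselines,
# field names, and 4-sigma threshold are illustrative assumptions.

baselines = {
    ("B001", "EU", 5411): (0.030, 0.005),  # (mean decline rate, std)
    ("B002", "US", 5812): (0.045, 0.008),
}

def score(record, threshold=4.0):
    key = (record["bank"], record["region"], record["mcc"])
    if key not in baselines:
        return None                         # no baseline for this cell yet
    mean, std = baselines[key]
    z = (record["decline_rate"] - mean) / std
    return {"cell": key, "z": z} if abs(z) > threshold else None

alert = score({"bank": "B001", "region": "EU", "mcc": 5411,
               "decline_rate": 0.061})
print(alert)  # {'cell': ('B001', 'EU', 5411), 'z': 6.2...}
```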

9 Some Results to Date
- The system has been operational for 2.5 years
- ROI:
  - 5.1x in Year 1 (over 6 months)
  - 7.3x in Year 2 (12 months)
  - 10.0x in Year 3 (12 months)
- Currently estimating over 15,000 individual baseline models
- The system has issued alerts for:
  - Merchants using an incorrect Merchant Category Code (MCC) - account testing
  - Sales channel variations
  - Incorrect use of the merchant city name field
  - Incorrect coding of recurring payments

10 Summary
- Used a new methodology, Modeling using Cubes of Models (MCM), for modeling large, highly heterogeneous data sets
- This project contributed in part to the development of the Baseline Model in the open source Augustus system
- Integrated the system that generates alerts from Baseline Models with a manual investigation process
- The project is generating over 10x ROI
- Poster #20 in the Tuesday night Poster Session

