Machine Learning at Intuit: 5 Delightful Use Cases

Presentation transcript:

Machine Learning at Intuit: 5 Delightful Use Cases
Calum Murray, Chief Data Architect, Consumer Group, Intuit
May 24th, 2018

Machine learning at Intuit
This talk is:
- An overview of how Intuit thinks of ML
- A high-level view of some of Intuit’s ML use cases
This talk is not:
- A detailed examination of models

Who we serve: Small Businesses, Self-Employed, Consumers

Our mission: Powering prosperity around the world
Unlock the power of many for the prosperity of one

Intuit has access to very rich data
Small business transactions, financial transactions, tax returns, etc.
Categories of data and the machine learning dimensions they power:
- BE, Business events: Solve very complex tasks
- BD, Behavioral data: Automate time-consuming activities
- TD, Third-party data: Enable new insights
3 categories of data power 3 dimensions of ML
Notes: Speak to the number of users (~ population of the UK), types of data (SMB. Combine with 6. Introduce types of data in slide 6. Do same thing for categories.

Data pipeline (diagram)
Sources: Transactional systems (Small Business, Tax, Consumer); back office systems (Enterprise, Marketing, Customer Care); Clickstream
3 types of data: 1. Business events (BE), 2. Behavioral data (BD), 3. Third-party data (TD)
Stages: Publish; Ingest (real-time); Ingest (batch); ETL; Data lake; MPP
Consumers: Machine learning; Analyst tools

Our implementation (diagram)
Sources: Transactional systems (Small Business, Tax, Consumer); back office systems (Enterprise, Marketing, Customer Care); Clickstream
3 types of data: 1. Business events, 2. Behavioral data, 3. Third-party data
Components: Kafka; Sqoop; ETL; S3; Vertica
Consumers: AWS SageMaker; Tableau; QlikView
Notes: Journey: hand coded -> models deployed to Yhat -> SageMaker, working closely with Amazon etc.

Machine learning environment (diagram)
Online (real-time): Business events, Behavioral data; Features; Score; Insights
Offline (batch): Business events, Behavioral data, Third-party data; Data lake; Develop; Train; Score
Both online and offline ML environments

Use case 1: Managing transactional risk (Payments)
Description: Judge the risk of a single financial transaction in real-time.
Model basics:
- Features: Merchant, customer, transaction
- Training: Batch against business event & third-party data
- Scoring: Real-time scoring against business event data
Benefit: Looking at risk at the transaction level allows us to better protect the merchant from fraudulent transactions. Using ML gets you to a much better loss profile than using rules alone.
Data: BE, TD
Notes: Organizational principles; segments; difficulty timeline. Complex task using business event data and third-party data. Merchant features; counterparty features; transaction features.

Managing transactional risk (diagram)
Online (real-time): Business events; Features; Score
Offline (batch): Business events, Third-party data; Data lake; Develop; Train
Batch training, run-time scoring
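The "batch training, run-time scoring" split can be illustrated with a minimal sketch. The talk does not describe the actual model, so everything here is an assumption made for illustration: the logistic regression, the three feature columns, and the tiny synthetic history are all hypothetical.

```python
import math


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme scores.
    z = max(-35.0, min(35.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_batch(rows, labels, epochs=2000, lr=0.1):
    """Offline step: fit logistic-regression weights by batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def score_transaction(model, features):
    """Online step: score one incoming transaction with the pre-trained model."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, features)) + bias)


# Hypothetical, synthetic history. Columns (all assumed for this sketch):
# [scaled amount, merchant chargeback rate, customer is new (0/1)], label 1 = fraud.
HISTORY = [
    ([0.05, 0.01, 0.0], 0),
    ([0.10, 0.02, 0.0], 0),
    ([0.20, 0.01, 1.0], 0),
    ([0.15, 0.03, 0.0], 0),
    ([0.90, 0.20, 1.0], 1),
    ([0.80, 0.25, 1.0], 1),
    ([0.95, 0.15, 0.0], 1),
    ([0.70, 0.30, 1.0], 1),
]

# Training runs offline in batch; only the fitted weights are needed online.
MODEL = train_batch([x for x, _ in HISTORY], [y for _, y in HISTORY])

# Each new transaction is scored individually as it arrives.
low_risk = score_transaction(MODEL, [0.08, 0.01, 0.0])
high_risk = score_transaction(MODEL, [0.85, 0.22, 1.0])
```

The design point is that the expensive step (training over the data lake) happens ahead of time, while the online path only evaluates a small function on the features of one transaction.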

Use case 2: Automating financial transaction categorization (QuickBooks Online)
Description: Small businesses and the self-employed have to categorize financial transactions to an account.
Model basics:
- Features: 235 distinct features including amount, merchant, institution type
- Training: Batch against business event & third-party data
- Scoring: Real-time scoring against business event data
Benefit: Categorizing transactions is time-consuming, tedious and error-prone. Using ML to automate it, we’ve gotten to a 70-80% success rate.
Data: BE, TD
Notes: Training data spans billions of unique words and word pairs. 100M reviewed transactions marked as business or personal are used to train the model. Scoring is done when data is imported from the financial institution.

Automating financial transaction categorization: run-time (diagram)
Online (real-time): Business events; Features; Score
Offline (batch): Business events, Third-party data; Data lake; Develop; Train
Batch training, run-time scoring
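As a rough illustration of categorizing from words and word pairs, the sketch below trains a small multinomial naive Bayes classifier offline and applies it when a transaction is imported. Naive Bayes is a stand-in chosen for brevity, not the model described in the talk; the descriptions, category names, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokens(description):
    """Turn a bank description into word and adjacent word-pair features."""
    words = description.lower().split()
    pairs = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + pairs


def train(examples):
    """Offline step: count features per category from reviewed transactions."""
    feature_counts = defaultdict(Counter)
    category_counts = Counter()
    vocabulary = set()
    for description, category in examples:
        category_counts[category] += 1
        for tok in tokens(description):
            feature_counts[category][tok] += 1
            vocabulary.add(tok)
    return feature_counts, category_counts, vocabulary


def categorize(model, description):
    """Online step: pick the category with the highest log-probability."""
    feature_counts, category_counts, vocabulary = model
    total = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, n in category_counts.items():
        score = math.log(n / total)
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(feature_counts[category].values()) + len(vocabulary)
        for tok in tokens(description):
            if tok in vocabulary:  # ignore features never seen in training
                score += math.log((feature_counts[category][tok] + 1) / denom)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical reviewed transactions (description, category).
REVIEWED = [
    ("shell oil station 1234", "Fuel"),
    ("chevron gas station", "Fuel"),
    ("office depot supplies", "Office Supplies"),
    ("staples office supplies", "Office Supplies"),
    ("delta air lines ticket", "Travel"),
    ("united air lines", "Travel"),
]

MODEL = train(REVIEWED)
fuel_guess = categorize(MODEL, "bp gas station")
travel_guess = categorize(MODEL, "american air lines")
```

Word pairs such as `gas_station` or `air_lines` carry more signal than either word alone, which is one plausible reason to include them as features.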

Use case 3: Personalized experiences (TurboTax)
Description: Provide better and more contextual in-product help. Predict relevant and popular FAQs based on specific customer tax profile info and screen help accessed.
Model basics:
- Features: Current year and prior year, product usage, e-file status
- Training: Batch against business event & behavioral data
- Scoring: Real-time scoring against business event & behavioral data
Benefit: Helps users navigate the product, reducing care contact rate by 2 points and increasing customer engagement by 3.5%.
Data: BE, BD

Personalized experiences: run-time (diagram)
Online (real-time): Behavioral data, Business events; Features; Score
Offline (batch): Behavioral data; Data lake; Develop; Train
Batch training, run-time scoring
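One very simple way to picture "relevant and popular FAQs" is a popularity index built offline from help clickstream and looked up at request time. This is only a baseline sketch, not the model from the talk; the screen names, profile segments, FAQ identifiers, and function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_faq_index(help_events):
    """Offline step: count FAQ opens per (screen, profile) and per screen."""
    by_context = defaultdict(Counter)
    by_screen = defaultdict(Counter)
    for screen, profile, faq_id in help_events:
        by_context[(screen, profile)][faq_id] += 1
        by_screen[screen][faq_id] += 1
    return by_context, by_screen


def recommend_faqs(index, screen, profile, k=2):
    """Online step: rank FAQs for the current screen and customer profile.

    FAQs popular with the same profile come first; if there are fewer
    than k, the list is topped up with FAQs popular on the screen overall.
    """
    by_context, by_screen = index
    context_counts = by_context.get((screen, profile), Counter())
    ranked = [faq for faq, _ in context_counts.most_common(k)]
    for faq, _ in by_screen.get(screen, Counter()).most_common():
        if len(ranked) >= k:
            break
        if faq not in ranked:
            ranked.append(faq)
    return ranked


# Hypothetical clickstream: (screen, profile segment, FAQ opened).
EVENTS = (
    [("deductions", "self_employed", "home_office_faq")] * 3
    + [("deductions", "self_employed", "mileage_faq")] * 2
    + [("deductions", "w2_only", "standard_deduction_faq")] * 5
    + [("deductions", "w2_only", "home_office_faq")] * 1
    + [("income", "w2_only", "w2_import_faq")] * 2
)

INDEX = build_faq_index(EVENTS)
known_profile = recommend_faqs(INDEX, "deductions", "self_employed")
unseen_profile = recommend_faqs(INDEX, "deductions", "student")
```

A learned model would replace the raw counts with predicted relevance, but the offline-build, online-lookup shape stays the same.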

Use case 4: Matchmaking (experimenting with ML in QuickBooks for Accountants)
Description: Find the right match between an accountant and a small business.
Model basics:
- Features: Accounting firm, small business, historical data
- Training: Batch against business event & third-party data
- Scoring: Offline against business event & third-party data
Benefit: By year 5, half of small businesses fail. They’re 50% more likely to survive if they get help from an accountant. Having the right accountant increases that likelihood.
Data: BE, TD

Matchmaking: run-time (diagram)
Online (real-time): Insights
Offline (batch): Business events, Third-party data; Data lake; Develop; Train; Score
Trained and scored offline

Use case 5: Cash flow projection
Description: Given the history of a business’s transactions and similar small businesses, predict the cash flow of a small business.
Model basics (predict and then forecast):
- Features: Financial transactions
- Training: Batch against business event & third-party data
- Scoring: Offline against business event & third-party data for an individual
Benefit: Small business owners can manage their cash flow proactively, making adjustments before they run out of money.
Data: BE, TD

Notes:
The purpose of the prediction engine is to predict new transactions for a given user. This primarily includes transactions that haven’t occurred; it might also include transactions that have occurred but are not yet in the system (e.g. haven’t been entered by the user). Predictions are made primarily based on each user’s historic transactions.

The cash flow engine deals primarily with creating and updating predictions and forecasts based on an individual user’s data. Although data and information from other users may go into the predictions, these do not need to be updated for real-time or interactive predictions. Hence, any logic depending on other users’ data will be built ahead of time (in batch mode) and made available to the cash flow engine. At least two key modules must be supported:
- Bayesian probability distributions. These will represent distributions for the overall population and perhaps additionally for certain segments. They may be updated in the cash flow engine based on data for the individual user.
- Machine learning models. These will be trained on a larger set of user data and made available to the cash flow engine for use when predicting for an individual user.

Prediction engine: predict transactions. The prediction engine must support a variety of methods (or algorithms) for predicting transactions. We will internally develop algorithms based on data (historic and cross-company) in order to predict transactions. The goal of these algorithms is solely to make the most accurate predictions of transactions for any user given all available relevant information. Additionally, we will want to allow users themselves to specify various algorithms for prediction or scenario planning. These could include algorithms based on budgets, formulas, or even just the user’s own innate knowledge of their business. These algorithms must also be expressed and executed in the prediction engine. In all cases, the output is a set of predicted transactions.

Forecast engine: aggregate transactions. The forecast engine applies the appropriate aggregation to the appropriate set of transactions (historic, predicted, recurring/scheduled, scenario planning, etc.) to produce statistical measures associated with a specified cash flow quantity. For example, a quantity of interest could be the amount of cash on hand on a given day, and the statistical measures used to quantify it could be an estimated value along with lower and upper bounds. The user or client must directly or indirectly specify the quantities and statistical measures of interest. In order to support a full range of statistical measures on the result, aggregation of predicted transactions must propagate the associated measures of uncertainty to produce a result with its own measure of uncertainty.
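The forecast engine's job of aggregating predicted transactions while propagating uncertainty can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine described in the talk: each predicted transaction is assumed to carry a mean and a standard deviation, transactions are assumed independent (so variances add), and the bounds are an approximate 95% normal interval. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedTransaction:
    day: int      # days from today
    mean: float   # expected amount; negative for money going out
    std: float    # standard deviation of the predicted amount


def forecast_cash_on_hand(balance, transactions, day, z=1.96):
    """Estimate cash on hand at the end of `day` with lower and upper bounds.

    Assumes independent transactions, so the variance of the sum is the
    sum of the variances; z=1.96 gives an approximate 95% interval.
    """
    due = [t for t in transactions if t.day <= day]
    estimate = balance + sum(t.mean for t in due)
    sigma = math.sqrt(sum(t.std ** 2 for t in due))
    return estimate, estimate - z * sigma, estimate + z * sigma


# Hypothetical output of the prediction engine.
PREDICTED = [
    PredictedTransaction(day=2, mean=500.0, std=30.0),    # customer payment
    PredictedTransaction(day=5, mean=-300.0, std=40.0),   # supplier invoice
    PredictedTransaction(day=10, mean=-200.0, std=50.0),  # utilities
]

estimate, lower, upper = forecast_cash_on_hand(1000.0, PREDICTED, day=7)
print(f"day 7: {estimate:.0f} (between {lower:.0f} and {upper:.0f})")
```

Only the first two transactions fall on or before day 7, so the combined standard deviation is sqrt(30² + 40²) = 50. If predicted transactions were correlated, the independence assumption would understate or overstate the bounds and covariance terms would be needed.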

Predictions and insights: run-time (diagram)
Online (real-time): Insights
Offline (batch): Business events, Third-party data; Data lake; Develop; Train; Score
Trained and scored offline
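The cash flow notes mention population-level Bayesian distributions that are built in batch and then updated with an individual user's data. One textbook way to do this is a normal prior with known observation noise; the talk does not say which distributions are used, so the model choice, the numbers, and the function name below are assumptions.

```python
def update_normal_prior(prior_mean, prior_var, observations, noise_var):
    """Conjugate normal update with known observation variance.

    prior_mean and prior_var describe the population (built offline in batch);
    observations are the individual user's own values; noise_var is the
    assumed variance of a single observation around the user's true mean.
    Returns the posterior mean and variance for that user.
    """
    n = len(observations)
    if n == 0:
        return prior_mean, prior_var
    posterior_precision = 1.0 / prior_var + n / noise_var
    posterior_mean = (
        prior_mean / prior_var + sum(observations) / noise_var
    ) / posterior_precision
    return posterior_mean, 1.0 / posterior_precision


# Hypothetical population prior for monthly revenue in some segment.
POPULATION_MEAN = 2000.0
POPULATION_VAR = 500.0 ** 2

# Hypothetical individual history: three months of revenue for one user.
user_history = [3000.0, 3200.0, 2800.0]

posterior_mean, posterior_var = update_normal_prior(
    POPULATION_MEAN, POPULATION_VAR, user_history, noise_var=500.0 ** 2
)
print(posterior_mean, posterior_var)
```

With equal prior and noise variances, the posterior mean is the average of the prior mean and the three observations, so the estimate moves from the population value toward the user's own history as more data arrives.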

Key takeaways
- Machine learning is changing the way we think about products
- Machine learning can be used to solve a number of types of problems
- Different categories of data (BE, BD, TD) can be combined and used online and offline
Unlock the power of many for the prosperity of one

Q&A Calum_Murray@intuit.com