Fraud Detection in Banking using Big Data By Madhu Malapaka For ISACA, Hyderabad Chapter Date: 14th Dec 2014 Wilshire Software.

Presentation transcript:

Fraud Detection in Banking using Big Data
By Madhu Malapaka
For ISACA, Hyderabad Chapter
Date: 14th Dec 2014
Wilshire Software Technologies Revised: 14th Dec

Agenda
- Common Banking Frauds
- Fraud Fighting Activities
- Enterprise Fraud Systems Diagnostic Anatomy
- Big Data
- Hadoop Ecosystem
- Banks Data Source
- Social Network Data Providers
- Big Data Integration – Technology Stack
- Reporting Tools

Fraud
A deception deliberately practiced in order to secure unfair or unlawful gain, or to cause loss to another party.

Common Banking Frauds
A bank is typically exposed to different types of fraud.

Fraud Fighting Activities
Fraud fighting activities can be grouped into three primary categories:
- Fraud Prevention - Proactive
- Fraud Detection - Reactive
- Fraud Investigation - Action

Enterprise Fraud Systems Diagnostic Anatomy
Source:

Diagram labels: Users; ATMs; Online; Credit; External Data Feeds; Policy; Data Collection; Data Logs; Banking Servers; Data Analysis; Fraud Detection; Compliance; Legal Action; Business Process Change; Adopt New Technologies; Report Management. Stages: FRAUD PREVENTION; FRAUD DETECTION; FRAUD ACTIONS.

Diagram labels: Users; ATMs; Online; Credit; External Data Feeds; Policy; Data Collection; Data Logs; Banking Servers; Data Analysis; Fraud Detection; Compliance; Legal Action; Business Process Change; Adopt New Technologies; Report Management. Stages: FRAUD PREVENTION (FraudMAP™, Reputation Manager 360); FRAUD DETECTION; FRAUD ACTIONS.

FRAUD PREVENTION
Monitoring Account Holder Behavior
It is organized around different phases or aspects of the online banking process.

FRAUD PREVENTION

Diagram labels: Users; ATMs; Online; Credit; External Data Feeds; Policy; Data Collection; Data Logs; Banking Servers; Data Analysis; Fraud Detection; Compliance; Legal Action; Business Process Change; Adopt New Technologies; Report Management. Stages: FRAUD PREVENTION; FRAUD DETECTION; FRAUD ACTIONS.

How Banks Can Leverage the Data Mining Capabilities of Big Data for Fraud Detection

BIG DATA
Velocity
- Moves at very high rates (think sensor-driven systems).
- Valuable in its temporal, high-velocity state.
Volume
- Fast-moving data creates massive historical archives.
- Valuable for mining patterns, trends and relationships.
Variety
- Structured (logs, business transactions).
- Semi-structured and unstructured.

BIG DATA BY HADOOP
- Hadoop is a combination of:
  HDFS: Storage
  MapReduce: Computation
- Hadoop Distributed File System (HDFS)
  Distributed file system for redundant storage.
  Designed to reliably store data on commodity hardware.
- MapReduce
  A programming model for distributed data processing.
  Its data processing primitives are functions: Mappers and Reducers.
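The Mapper and Reducer primitives named on this slide can be illustrated with a small single-process sketch. This is plain Python that only imitates the programming model; it does not use Hadoop or any Hadoop API. The transaction records, the account identifiers and the function names (`mapper`, `shuffle`, `reducer`, `run_job`) are assumptions made for illustration and do not come from the presentation.

```python
from collections import defaultdict

# Hypothetical transaction records as (account_id, amount); not from the slides.
TRANSACTIONS = [
    ("acct-1", 120.0),
    ("acct-2", 75.5),
    ("acct-1", 9800.0),
    ("acct-3", 42.0),
    ("acct-2", 10.0),
]


def mapper(record):
    """Map step: emit one (key, value) pair per input record."""
    account_id, amount = record
    yield account_id, amount


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Reduce step: collapse all values for one key into a single summary."""
    return key, {"count": len(values), "total": sum(values)}


def run_job(records):
    """Run map, shuffle and reduce in sequence inside one process."""
    mapped = (pair for record in records for pair in mapper(record))
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


summary = run_job(TRANSACTIONS)
```

In a real cluster the map and reduce calls would run in parallel on many machines over data stored in HDFS; here they run sequentially only to show how the two functions divide the work. A per-account summary like this is one possible input to a later fraud check.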

Hadoop Ecosystem
- Pig
  High-level data flow language. Made of two components:
  - Data processing language Pig Latin (Pig Scripts).
  - Compiler to translate Pig Latin to MapReduce.
- Hive
  Data Warehousing Layer on top of Hadoop.
  Allows analysis and queries using SQL-like language.
- Mahout
  Scalable machine learning algorithms on top of Hadoop.
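As a rough idea of the SQL-style analysis that the Hive bullet describes, the sketch below runs an aggregate query with Python's built-in `sqlite3` module standing in for Hive. It is not HiveQL and does not touch Hadoop. The `transactions` table, its columns, the sample rows and the 5000.0 threshold are all hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (account_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("acct-1", 120.0),
        ("acct-2", 75.5),
        ("acct-1", 9800.0),
        ("acct-3", 42.0),
    ],
)

# List accounts whose largest single transaction exceeds a hypothetical threshold.
rows = conn.execute(
    """
    SELECT account_id, COUNT(*) AS n_transactions, MAX(amount) AS largest
    FROM transactions
    GROUP BY account_id
    HAVING MAX(amount) > ?
    ORDER BY account_id
    """,
    (5000.0,),
).fetchall()
conn.close()
```

The point is only the shape of the query: an analyst writes a declarative GROUP BY over transaction data instead of hand-coding Mappers and Reducers.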

Hadoop Ecosystem
- Sqoop
  A tool to automate data transfer between structured datastores and Hadoop.
- Flume
  Distributed data/log collection service.
  Collects data and logs from their sources and puts them in a centralized location for storage and processing.

Hadoop Ecosystem

Banks Data Source
Identify Data Sources
Consider what data sources you’ll need to take advantage of.
- Existing data sources
  This includes a wide variety of data, such as transactional data, survey data, web logs, etc.
- Purchased data sources
  Does your organization use supplemental data, such as demographics? If not, consider whether social media and news streams would complement your current data to create additional project value.

Social Network Data Providers
This data serves as input for building the big data set and can be integrated with the bank’s customer data.

BIG DATA
Banks Internal and Purchased Data
- CRM/customer support
- POS/purchases
- /documents/collab.
- BI & data warehouse
- system & network logs
- web logs/clickstream
- google analytics/omniture
- facebook/twitter/yelp/foursquare/google
- experian/epsilon/acxiom
- mobile devices
- sensors
- product reviews
- google search results
- + more
Many terabytes of data, sometimes many PETABYTES.

Big Data Integration – Technology Stack

Diagram labels: Data Logs; RDBMS; Analytics.

Reporting Tools

81% of global banks say Big Data is a top priority in 2015.
Are You Ready?

Thank You! Questions?
Wilshire Software Technologies, based in Hyderabad, India, is engaged in consulting and training for Big Data analytics.
Contact Information:
Madhu Malapaka
Managing Director
Wilshire Software Technologies
Hyderabad, India
Cell: