1
© Hortonworks Inc. 2013 Speaker: Jamie Engesser, Hortonworks Big Data: Making Sense of it All! Big Data is everywhere. We see it in commercials. We hear it in conversations over coffee. It is an expanding topic in the boardroom. The hype is palpable, but what is real and, better yet, how does it affect the status quo? At the center of the big data discussion is Apache Hadoop, a next-generation enterprise data platform that allows you to capture, process and share the enormous amounts of new, multi-structured data that doesn't fit into traditional systems. In this session we will discuss:
– The evolution of Apache Hadoop and its role within enterprise data architectures
– The relationship between Hadoop and existing data infrastructures such as the enterprise data warehouse
– Use cases and best practices for incorporating Apache Hadoop into your big data strategy
– Common patterns of Hadoop use – Refine, Explore and Enrich – and how they impact the Financial Services industry
Page 1
2
© Hortonworks Inc. 2013 Big Data: Making Sense of it All! Jamie Engesser, VP, Solutions Engineering E-mail: jamie@hortonworks.com Page 2
3
© Hortonworks Inc. 2013 Data Driven Business? Facts, not Intuition! Page 3
"Data-driven decisions are better decisions – it's as simple as that. Using big data enables managers to decide on the basis of evidence rather than intuition. For that reason it has the potential to revolutionize management."
– Harvard Business Review, October 2012
4
© Hortonworks Inc. 2013 Web giants proved the ROI of data products by applying data science to large amounts of data. Page 4
– Amazon: 35% of product sales come from product recommendations
– Netflix: 75% of streaming video viewing results from recommendations
– Prediction of click-through rates
5
© Hortonworks Inc. 2013 Page 5
6
© Hortonworks Inc. 2013 Page 6
7
© Hortonworks Inc. 2013 Enterprise Data Trends @ Scale Page 7 Organizations are redefining their data strategies due to the requirements of the evolving Enterprise Data Warehouse (EDW). Hadoop sits at the center of those strategies.
8
The Need for Hadoop
– Allows semi-structured, unstructured and structured data to be processed in a way that creates new insights of significant business value.
– Instead of looking at samples or small sections of data, organizations can look at large volumes of data to gain new perspective and make business decisions with a higher degree of accuracy.
– Reducing latency in business is critical for success. The massive scalability of big data systems allows organizations to process massive amounts of data in a fraction of the time required by traditional systems.
– Self-healing, extremely scalable, highly available environment built on cost-effective commodity hardware.
[Diagram: platforms positioned by scale of storage & processing – Traditional Database, EDW, MPP Analytics, NoSQL, Hadoop Platform]
9
© Hortonworks Inc. 2013 Page 9 [Bubble chart: cost per terabyte vs. adoption; bubble size indicates cost-effectiveness of the solution. Source: Think Big Analytics] Hadoop has the capability to process extremely large volumes of data much faster than traditional data systems, and at a fraction of the cost.
10
© Hortonworks Inc. 2013 A little history… it’s 2005
11
© Hortonworks Inc. 2013 A Brief History of Apache Hadoop Page 11
[Timeline, 2004–2013: Apache project established; Yahoo! begins to operate at scale; Enterprise Hadoop; Hortonworks Data Platform]
– 2005 – Focus on INNOVATION: Yahoo! creates a team under E14 to work on Hadoop
– 2007 – Focus on OPERATIONS: the Yahoo! team extends its focus to operations to support multiple projects & growing clusters
– 2011 – Focus on STABILITY: Hortonworks created to focus on "Enterprise Hadoop"; starts with 24 key Hadoop engineers from Yahoo!
12
© Hortonworks Inc. 2013 Apache Hadoop: Center of Big Data Strategy Page 12
Open source data management with scale-out storage & distributed processing.
Storage (HDFS)
– Distributed across "nodes"
– Natively redundant
– Name node tracks block locations
Processing (MapReduce)
– Splits a task across processors "near" the data & assembles the results (a minimal example follows below)
Self-healing, high-bandwidth clustered storage.
Key Characteristics
– Scalable: efficiently store and process petabytes of data; linear scale driven by additional processing and storage
– Reliable: redundant storage; failover across nodes and racks
– Flexible: store all types of data in any format; apply schema on analysis and sharing of the data
– Economical: use commodity hardware; open source software guards against vendor lock-in
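To make the storage-plus-processing split concrete, here is the canonical word-count job written against the Hadoop MapReduce Java API (a generic sketch, not code from this presentation): HDFS distributes the input blocks across nodes, map tasks run near their blocks and emit (word, 1) pairs, and reduce tasks assemble the totals. The input and output paths are placeholders for HDFS directories of your choosing.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs on each block of the input, close to where HDFS stores it.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);        // emit (word, 1)
      }
    }
  }

  // Reduce phase: assembles the partial counts shuffled in from every mapper.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);        // emit (word, total count)
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, a job like this is typically launched with something like `hadoop jar wordcount.jar WordCount /data/in /data/out`; the framework handles block placement, task scheduling and retries.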
13
© Hortonworks Inc. 2013 Enterprise Hadoop Distribution Page 13
Hortonworks Data Platform (HDP) – Enterprise Hadoop: the 100% open source and complete distribution; enterprise grade, proven and tested at scale; ecosystem endorsed to ensure interoperability.
– PLATFORM SERVICES: enterprise readiness – HA, DR, snapshots, security, …
– HADOOP CORE: distributed storage & processing – HDFS, YARN (in 2.0), MapReduce, WebHDFS (a WebHDFS access sketch follows below)
– DATA SERVICES: store, process and access data – HCatalog, Hive, Pig, HBase, Sqoop, Flume
– OPERATIONAL SERVICES: manage & operate at scale – Oozie, Ambari
– Deployment options: OS, Cloud, VM, Appliance
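As a small illustration of the access layer named above, the sketch below reads a file through WebHDFS, the REST interface to HDFS included in HDP. The host name, port, file path and user name are placeholder assumptions; a secured cluster would require Kerberos/SPNEGO rather than the simple user.name parameter.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WebHdfsRead {
  public static void main(String[] args) throws Exception {
    // OPEN first answers with a redirect to the datanode that serves the block,
    // then streams the file contents. Host, path and user are illustrative.
    String url = "http://namenode.example.com:50070/webhdfs/v1/data/raw/trades.csv"
               + "?op=OPEN&user.name=hdfs";

    HttpClient client = HttpClient.newBuilder()
        .followRedirects(HttpClient.Redirect.ALWAYS)   // follow the datanode redirect
        .build();

    HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
    HttpResponse<String> response =
        client.send(request, HttpResponse.BodyHandlers.ofString());

    System.out.println("HTTP " + response.statusCode());
    System.out.println(response.body());               // the file's contents
  }
}
```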
14
© Hortonworks Inc. 2013 Where does it fit in the enterprise? Page 14
15
© Hortonworks Inc. 2013 Existing Data Architecture Page 15
[Diagram]
– APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
– DATA SYSTEMS: traditional repositories (RDBMS, EDW, MPP)
– DATA SOURCES: traditional sources (RDBMS, OLTP, OLAP), OLTP & POS systems
– Supporting tooling: operational tools (manage & monitor), dev & data tools (build & test)
16
© Hortonworks Inc. 2013 Existing Data Architecture Page 16
[Same diagram as the previous slide, with one addition]
– APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
– DATA SYSTEMS: traditional repositories (RDBMS, EDW, MPP)
– DATA SOURCES: traditional sources (RDBMS, OLTP, OLAP), OLTP & POS systems
– Supporting tooling: operational tools (manage & monitor), dev & data tools (build & test)
– New sources (web logs, email, sensor data, social media) – where do they fit?
17
© Hortonworks Inc. 2013 An Emerging Data Architecture Page 17
[Diagram]
– APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
– DATA SYSTEMS: traditional repositories (RDBMS, EDW, MPP)
– DATA SOURCES: traditional sources (RDBMS, OLTP, OLAP), new sources (web logs, email, sensor data, social media), mobile data, OLTP & POS systems
– Supporting tooling: operational tools (manage & monitor), dev & data tools (build & test)
18
© Hortonworks Inc. 2013 Interoperating With Your Tools Page 18
[Diagram: the same architecture, showing HDP interoperating with existing applications, data systems, traditional repositories, dev & data tools and operational tools (e.g. Viewpoint, Microsoft applications), across traditional sources (RDBMS, OLTP, OLAP), new sources (web logs, email, sensor data, social media), mobile data and OLTP/POS systems]
19
© Hortonworks Inc. 2013 Patterns of Use Page 19
20
© Hortonworks Inc. 2013 Operational Data Refinery Page 20
Refine – Explore – Enrich: collect data and apply a known algorithm to it in a trusted operational process (a worked sketch follows below).
1. Capture: capture all data – new and traditional sources (RDBMS, OLTP, OLAP; web logs, email, sensor data, social media)
2. Process: parse, cleanse, apply structure & transform
3. Exchange: push to the existing data warehouse (RDBMS, EDW, MPP) for use with existing analytic tools (business analytics, custom applications, enterprise applications)
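A minimal sketch of what the refinery flow can look like in practice, assuming HiveServer2 is reachable over JDBC: raw web logs already landed in HDFS (for example by Flume) are exposed as an external table, cleansed and aggregated into a refined table, and left ready for a Sqoop export to the warehouse. The table names, columns and paths are invented for illustration; they are not from the presentation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RefineryJob {
  public static void main(String[] args) throws Exception {
    // Older Hive JDBC drivers need an explicit registration.
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://hiveserver.example.com:10000/default", "etl_user", "");
         Statement stmt = conn.createStatement()) {

      // 1. Capture: expose the raw files landed by Flume as a table (schema-on-read).
      stmt.execute(
          "CREATE EXTERNAL TABLE IF NOT EXISTS raw_weblogs ("
          + " ts STRING, user_id STRING, url STRING, status INT)"
          + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
          + " LOCATION '/data/raw/weblogs'");

      // 2. Process: parse, cleanse and aggregate into a refined table.
      stmt.execute(
          "CREATE TABLE IF NOT EXISTS daily_page_views ("
          + " view_date STRING, url STRING, views BIGINT)");
      stmt.execute(
          "INSERT OVERWRITE TABLE daily_page_views"
          + " SELECT to_date(ts), url, COUNT(*)"
          + " FROM raw_weblogs WHERE status = 200"
          + " GROUP BY to_date(ts), url");

      // 3. Exchange: a separate Sqoop export (not shown here) can now push
      //    daily_page_views into the existing data warehouse.
    }
  }
}
```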
21
© Hortonworks Inc. 2013 Big Data Exploration & Visualization Page 21
Refine – Explore – Enrich: collect data and perform iterative investigation for value.
1. Capture: capture all data (traditional and new sources)
2. Process: parse, cleanse, apply structure & transform
3. Exchange: explore and visualize with analytics tools that support Hadoop
22
© Hortonworks Inc. 2013 Application Enrichment Page 22
Refine – Explore – Enrich: collect data, analyze it, and present salient results to online applications (a sketch follows below).
1. Capture: capture all data (traditional and new sources)
2. Process: parse, cleanse, apply structure & transform
3. Exchange: incorporate results directly into applications (custom and enterprise), typically via a NoSQL store
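For the exchange step of this pattern, results computed offline in Hadoop are often written into a low-latency NoSQL store such as HBase (part of HDP) so the online application can fetch them per request. The sketch below uses the current (post-1.0) HBase Java client, which post-dates this 2013 deck; the table name, column family and row-key scheme are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RecommendationStore {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("recommendations"))) {

      // Batch-job side: store a pre-computed result, keyed by customer id.
      Put put = new Put(Bytes.toBytes("customer-42"));
      put.addColumn(Bytes.toBytes("rec"), Bytes.toBytes("products"),
                    Bytes.toBytes("sku-101,sku-205,sku-330"));
      table.put(put);

      // Application side: low-latency lookup while serving a page.
      Result result = table.get(new Get(Bytes.toBytes("customer-42")));
      String recs = Bytes.toString(
          result.getValue(Bytes.toBytes("rec"), Bytes.toBytes("products")));
      System.out.println("Recommended products: " + recs);
    }
  }
}
```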
23
© Hortonworks Inc. 2013 Use Cases Page 23
24
© Hortonworks Inc. 2013 Hadoop Handles Five New Data Types Not Suited for Relational Databases
– Sentiment: data on opinions, emotions, and attitudes contained in social media, blogs, news, product reviews, and customer support interactions. Examples: Twitter, Facebook, LinkedIn, website comments.
– Clickstream: a virtual trail that a user leaves behind while visiting a website – a record of a user's activity on the Internet. For users not logged in to the site, this data may be captured using cookies. Examples: pages visited, length of visit per page, flows between pages, interaction with web forms, bounce rates.
– Sensor / machine-generated: data that is automatically created by a computer process, application, or other machine without human intervention. Examples: smart electric meters, network event logs, manufacturing QA.
– Geo-tracking: location data from connected devices whose position is determined using GPS or by triangulation from cell towers. Examples: oil and gas exploration, first responders, defense.
– Free-form text: information that does not have a pre-defined or predictable data model or does not fit well into relational tables. Unstructured information is typically text-heavy, but may also contain dates, numbers, and facts. Examples: CRM records, books, documents, metadata, health records, video, e-mail messages, web page content.
25
© Hortonworks Inc. 2013 Key use-cases in Finance/Insurance Page 25
Trading analysis:
– How do I predict trading trends based on market sentiment?
– How do I test algorithms against years rather than days of data?
Fraud detection:
– Detect illegal credit card activity and alert the bank/consumer
– Detect fraudulent insurance claims
Customer risk profiling:
– How likely is this customer to pay back their mortgage?
– How likely is this customer to get sick?
Internal fraud detection (compliance):
– Is this employee accessing financial information they are not allowed to access?
26
© Hortonworks Inc. 2013 Twenty Hadoop Enterprise Use Cases Page 26
Financial Services:
– New Account Risk Screens (Text)
– Fraud Prevention (System Logs)
– Horizontal View of Trading Risk (System Logs)
– 360° View of the Customer (Clickstream, Machine, Text)
– Data Combination for Insurance Claims (Geographic, Text)
Telecom:
– Ingestion and Storage of Call Detail Records (CDRs) (Machine)
– Throttling Network Bandwidth Intelligently (System Logs)
– Next Product to Buy (NPTB) (Clickstream, Geographic)
– Network and Product Adoption Patterns (System Logs)
– Preventative Maintenance and Repairs (Machine)
Retail:
– 360° View of the Customer (Clickstream, Sensor)
– Analyze Brand Sentiment (Sentiment)
– Supply Chain and Logistics (Geographic)
– Website and eCommerce Optimization (Clickstream)
– Optimize Store Layout Using Sensor Data (Sensor)
Manufacturing:
– Supply Chain and Logistics (Sensor)
– Sensor Data for Assembly Line Quality Assurance (Sensor)
– Ingest and Storage of Call Detail Records (CDRs) (Machine)
– Preventative Maintenance and Repairs (Machine)
– Identify Product Defects in the Social Media Stream (Sentiment)
27
© Hortonworks Inc. 2012 Financial Services Solutions Page
28
© Hortonworks Inc. 2012 Enterprise Financial Services Vision Page 28
Data Reservoir fed by: (1) internal data, (2) LOB-specific data (private), (3) external data (public), (4) third-party data (private)
Classic data integration & ETL: data movement, ETL, security (access/encryption), de-identification (masking/tokenization), aggregation, materialized views
Distillery: data movement; security (access/encryption); de-identification; aggregation; data inventory/catalog (encrypt, firewall)
(5) Explore & Harvest: visualization, MPP/SQL, MapReduce, analytics
Output: data, models, algorithms, dashboards, reports, apps
29
© Hortonworks Inc. 2013 “Data Reservoir” Architecture – Financial Services Page 29
Sources: check systems, trading, EWS, banker notes, account tables
LOAD: Sqoop, Flume, WebHDFS
Table metadata repository: HCatalog
REFINE: data transformation with Pig & Hive
VISUALIZE: EDW, ML, BI/visualization tools
30
Data Refinery
– Primary use case is to provide Capital Markets a common landing zone for structured and unstructured data
– Reduce ETL operational complexity by providing data sources and applications a single point of ingest and export for batch-driven information
– Offload EDW processing as desired: process and analyze data on HDP and transfer only what is required to the EDW
– Schema-on-read (HCatalog) instead of schema-on-write (RDBMS) allows retention of raw data and export in multiple, convenient schemas
– The storage and processing cluster performs double duty as a centralized repository for audit and other regulatory requirements (OSC, SEC-CAT, etc.)
31
Trade Analytics
– Holding trade data in an EDW is not a feasible long-term strategy
– Trades have thin margins; long-term analysis of trade performance can yield value
– Trade algorithms generate a great deal of exhaust data (order trails, logs) that can be analyzed to choose and improve algorithms
– Models used for real-time analytics or actions can be derived and maintained through ongoing large-dataset analytics made possible by HDP
– Analyze trade exception management in one location
32
Risk Assessment
– Risk assessment is focused more on events and activities than on assets; effective risk analysis needs to be based on activity modeling from as much data as possible
– Risk assessment algorithms can be run on full event data rather than aggregate totals
– Computationally expensive "what if" predictive modeling scenarios can be executed frequently over data sets from multiple business lines (a simplified sketch follows below)
– Store all client and trade information in a single location; linking related data becomes a data processing task rather than a data standardization task
– Execute more risk assessment algorithms, more often
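To give a feel for the arithmetic behind a single "what if" scenario, here is a deliberately simplified, single-machine Monte Carlo value-at-risk sketch under an assumed volatility shock. On HDP the same calculation would be fanned out across the cluster, one task per scenario or per portfolio slice, and driven by the full position and trade history rather than an assumed normal return; every number below is illustrative.

```java
import java.util.Arrays;
import java.util.Random;

public class WhatIfVar {
  public static void main(String[] args) {
    double portfolioValue = 10_000_000.0;  // current mark-to-market (illustrative)
    double dailyVol = 0.02;                // baseline daily volatility of 2%
    double shock = 1.5;                    // scenario: "what if volatility rises 50%?"
    int trials = 100_000;

    double[] losses = new double[trials];
    Random rng = new Random(42);
    for (int i = 0; i < trials; i++) {
      // Simulate one day's return under the shocked volatility (normal assumption).
      double dailyReturn = rng.nextGaussian() * dailyVol * shock;
      losses[i] = -portfolioValue * dailyReturn;   // positive value = loss
    }

    Arrays.sort(losses);
    double var99 = losses[(int) (trials * 0.99)];  // ~99th percentile loss
    System.out.printf("1-day 99%% VaR under the shock scenario: %,.0f%n", var99);
  }
}
```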
33
Analytical Use Cases Build on the Simple Store Use Case (Data Refinery on HDP)
Trade Analytics
– Improve trade algorithm performance
– Assess trade performance for individuals and suggest actions
– Retain data for analysis and audit in a single location
– Data: trades, settlements, exceptions, client information, client transaction history, other market APIs
– Key architecture: HDP cluster, CEP/HBase, Flume ingestion framework
– Analytics: Mahout/R, external BI via Hive/ODBC
Risk Management
– Calculate risk based on "what-if" scenarios by portfolio
– Create models for real-time usage on entire trade data sets
– Execute computationally intensive optimization scenarios to mitigate risk and provide cut-off parameters to trade algorithms
– Data: trades, settlements, exceptions, client information, client transaction history, other market APIs
– Key architecture: HDP cluster, CEP/HBase, Flume ingestion framework
– Analytics: Mahout/R, external BI via Hive/ODBC
34
Increasingly complex use cases can be added on top of earlier ones: Data Refinery on HDP, then Trade Analytics and Risk Management, then Predictive Portfolio Models and Trade Algorithm Optimization.
35
© Hortonworks Inc. 2013 Fraud Prevention Business Problem Financial institutions are always at risk of fraud. Today’s bank robbers use data more than masks and guns. Many are sophisticated criminals that test the company’s risk controls, looking for vulnerabilities to exploit. This testing leaves a trail of activity that is often too subtle for any individual to notice and “fraudsters” know to keep their illegal activity below a level where the institution, its algorithms, or law enforcement would take notice. Hadoop can detect those subtle patterns. Solution Financial services companies can use Hortonworks Data Platform (HDP) to reduce the cost to detect fraudulent behavior, and keep more fraudsters from stealing their money. Fraud-detection algorithms on top of Hadoop can examine more data for a longer period of time. Risk teams can pull from the HDP data lake to examine years of detailed financial transactions, IP addresses, and account records. This makes their fraud applications smarter than the many thieves who elude them today. Financial Services System Logs Page 35
36
© Hortonworks Inc. 2013 Rogue Trading Financial traders make large authorized transactions on behalf of their companies. In a few high-publicity events, certain traders learned those systems so well that they were able to make unauthorized trades that went undetected until the institution faced catastrophic losses. In 1995, Nick Leeson lost more than £800 million and drove Barings Bank out of business. More recently, rogue traders at Société Générale and JPMorgan Chase created billions of dollars of losses. Hortonworks Data Platform (HDP) can help identify trading patterns like Leeson's: those too subtle or complex for any human minder to easily identify. Irregularities between data managed by different systems can raise red flags if found early on. Often, retaining more data for longer can expose previously hidden patterns. HDP satisfies the need to analyze data from application logs, emails and other exhaust data in order to conduct an investigation and prevent massive losses. Financial Services Data Refinery Page 36
37
© Hortonworks Inc. 2013 Combine Structured & Unstructured Data for Insurance Claims Much of the valuable information stored in an insurance company's claim systems is text-based and unstructured. The ability to extract that data and combine it with structured data gives the insurance provider new tools to more accurately measure risk. They can use that insight to protect themselves from moral hazard, offer rate discounts to low-risk customers and even offer new insurance products that they could not underwrite previously, when data was scarcer. For example, sensor data now helps insurance companies more accurately assess the risk incurred by a driver based on actual driving patterns, rather than the average miles driven by someone in that demographic. But all of these new capabilities require the insurance company to store and process more data, from more sources, for longer. Hortonworks Data Platform (HDP) is ideally suited to deliver the greater insight and confidence that come from managing more data. Financial Services Data Refinery
38
© Hortonworks Inc. 2013 Set Interest Rates to Maximize Spread on Deposits Retail banks make revenue by attracting deposits and then lending those deposits out at an interest rate higher than the rate they pay to depositors. This is the "spread." Banks sometimes find that the profitability of particular deposit accounts is declining. Deposit pricing (the interest rate that the bank offers to customers) may be out of line with the market, leading to a risk of customer attrition or declining margins. A pricing strategy known as "retire and replace" manages this risk: the bank launches a new deposit product and ceases to offer the retired product. Some customers may switch from the retired account to the replacement product, but this occurs at a manageable rate. Hortonworks Data Platform (HDP) helps bank product managers set the profit-maximizing interest rate for the replacement product and respond more intelligently to their competitors' price responses. Financial Services Data Refinery
39
© Hortonworks Inc. 2013 Accelerate Processing of Loan Documentation Banks originate, process and settle loan transactions with large volumes of unstructured data in paper documents, faxes and emails. Multiple teams may handle a particular loan file, contributing to errors. Lenders have responded to the changing regulatory environment by instituting more audits and internal controls. Unfortunately, auditors are often forced to manually search paper and electronic files to locate documents. This causes additional delay, which reduces both servicing margins on a loan portfolio and customer satisfaction. Hortonworks Data Platform (HDP) brings much of that unstructured data into view. By facilitating the entire process of finding and analyzing data, HDP reduces errors and improves operational efficiency within the loan processing team. Financial Services Data Refinery Page 39
40
© Hortonworks Inc. 2013 Protect the Brand with Sentiment Analysis Sentiment analysis tells companies what people are feeling about their brands. The availability of social media sources creates a relatively new way to gauge public sentiment. Sentiment analysis is also a comparatively straightforward place to start, as the data resides outside the firm and is therefore not bound by organizational boundaries. Hortonworks Data Platform (HDP) can store and process the social media feeds that make sentiment analysis possible. The resulting insight can help the company understand its customers better and make sure that its brand and service levels match what those customers expect. Financial Services New Analytic Applications
41
© Hortonworks Inc. 2013 Becoming Data Driven Page 41
42
© Hortonworks Inc. 2013 Path to Becoming Big Data Driven
"Simply put, because of big data, managers can measure, and hence know, radically more about their businesses, and directly translate that knowledge into improved decision making & performance." – Erik Brynjolfsson and Andrew McAfee
Four key considerations for a data driven business:
1. Large web properties were born this way; you will have to adapt a strategy to get there
2. Start with a project tied to a key objective or KPI – don't over-engineer
3. Make sure your big data strategy "fits" your organization and grow it over time
4. Don't do big data just to do big data – you can get lost in all that data
43
© Hortonworks Inc. 2013 Thank You! Questions & Answers Page 43