Migrating Master Data to a Data Lake

Migrating Master Data to a Data Lake
DAMA Chicago – December 2017 Chapter Meeting

My Background
  Employed by Protective Insurance (started in October of this year): Senior Enterprise Data Architect
  Previous employer: CNO Financial Group (Director – Data Strategy & Architecture)
  Experience (IT, over 25 yrs; Data focus, nearly 20 yrs):
    Disciplines: Enterprise Data Strategy, Data Architecture, Data Design, Data Integration, Reference & Master Data, Data Warehousing, Business Intelligence, Metadata, Data Quality, Data Governance
    Industries: Insurance & Financial Services, Pharmaceutical, State Government, Manufacturing
  Other Items:
    Founding member (since 2009) and current President (2016) of DAMA Indiana chapter
    Hold CDMP certification (Master level since 2010)
    Contributing author to DMBOK2 (Reference & Master Data), released June of this year

Discussion Topics
  Current State Review
    Data Model
    Data Architecture
  Future State Proposed
    Overall Architecture
    Data Lake Specific
  Big Data POC (Proof of Concept)
    Environment Setup
    Use Case Review
    POC Results
  Items on Deck
    Data Access / Presentation Layer
    Information Governance Implications
  Wrap up and Questions

Current State – Enterprise Data Model (High-level Conceptual)
  Main Business Entities (9 in total):
    Product (Coverage Master)
    Client (Consolidated Level View)
    Party (Source Level View)
    Point of Contact (Communication Method)
    Agent (Producer Contracts & Licenses)
    Application (for Policy Coverage)
    Policy (Pending, Active, or Terminated)
    Claim (Submitted against Policy)
    Event (Type and Timestamp)
  Subject Area Relationships:
    Identify Relationship Type / Role
  Enterprise Data Glossary:
    Business Terms & Attributes
    Vetted by Data Governance Council

Current State – Data Sharing Model (High-level Logical)
  Current Data Design:
    Relational Model
    Abstract Design
    Source Linkage and Lineage
    Lends Itself to Columnar
  Reference Entities:
    Static Reference Data
    Environment Metadata
  Subject Area Entities:
    Domain Specific (by Business Entity)
    Key-value Pairs (Simulate Columnar)
  Model instantiated for each Subject Area identified (9 in total)
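
As an illustration of how key-value pairs can simulate a columnar layout, here is a minimal Python sketch. The row layout, attribute names, and source system codes are hypothetical and do not come from the presented model; the sketch only shows the general idea of grouping key-value rows by entity and then reading one attribute across all entities.

```python
from collections import defaultdict

# Hypothetical key-value rows for a "Policy" subject area:
# (entity_id, attribute_name, attribute_value, source_system)
rows = [
    ("P-001", "status", "Active", "ADMIN_A"),
    ("P-001", "issue_date", "2015-03-01", "ADMIN_A"),
    ("P-002", "status", "Pending", "ADMIN_B"),
]

def pivot_key_values(kv_rows):
    """Group key-value rows into one wide record per entity id."""
    records = defaultdict(dict)
    for entity_id, attribute, value, _source in kv_rows:
        records[entity_id][attribute] = value
    return dict(records)

def column(records, attribute):
    """Read one attribute across all entities, column-style.

    Entities that lack the attribute yield None.
    """
    return {eid: rec.get(attribute) for eid, rec in records.items()}

policies = pivot_key_values(rows)
status_column = column(policies, "status")
print(status_column)
```

Because every attribute is a row rather than a fixed table column, new attributes can be added without a schema change, which is one reason such a design might map naturally onto a columnar store.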

Current State – Data Sharing Architecture
  Current Data Stores (all Oracle):
    Landing Zone
    Master Data Hub
    Enterprise Data Warehouse
  Current Data Flows:
    Traditional ETL (Informatica)
    Custom Extracts (COBOL, PL/SQL)
  Current Reporting & Analytics:
    Static (Business Objects)
    Visualization (Tableau)
    Predictive / Statistical (SAS)
  Current Data Profiling:
    Informatica IDQ and Traditional SQL

Future State – Proposed Architecture
  Data Layer Components:
    Operational Zone
    Presentation Zone + DV
    Data Lake (BDE)
    Ad-Hoc Zone
  Data Flows:
    Batch (solid black lines)
    Service (solid red lines) proxied via ESB
    RT Query (dashed black lines)
  All Data Layer components expected to be on-prem with exception of Ad-Hoc Zone (to enable variable use and cost models)

Future State – Proposed Architecture
  Architecture Approach:
    Assure Data Centric Design as Hub-n-Spoke
    Reduce Point-to-Point
    Enable Data Accessibility
    Implement Data Services
  Data Layer as Hub:
    Manage Client Identities
    Proxy Transactions
    Implement EDW
    Provide Data Domain Perspective Views
    Curate Master Data
    Link Transactional Data
    Enable Data Archiving
    Establish Enterprise LZ
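
To make the hub-and-spoke versus point-to-point contrast concrete, the sketch below counts integration links for a given number of systems. It assumes the worst case in which every system exchanges data with every other system; the system count of 10 is an arbitrary example, not a figure from the presentation.

```python
def point_to_point_links(systems):
    """Links needed when every system connects directly to every other."""
    return systems * (systems - 1) // 2

def hub_and_spoke_links(systems):
    """Links needed when every system connects only to a central hub."""
    return systems

# Example with an assumed 10 systems (illustrative only).
n = 10
print(point_to_point_links(n), hub_and_spoke_links(n))
```

The point-to-point count grows quadratically with the number of systems while the hub count grows linearly, which is the usual argument for routing integrations through a shared data layer.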

Future State – Proposed Data Lake
  Data Lake Environment:
    Cloudera distribution of Hadoop
    14 Node cluster (10 data, 4 name/edge)
  Technical Considerations:
    Enterprise Landing Zone (HDFS + Hive)
    Archive Zone (HDFS)
    Curation Zone (Hive + Impala + Kudu)
    Insights Zone (Hive + Impala + HBase)
    Sandbox Zone (Hive + HBase + SAS)
    Ingestion (Sqoop + Syncsort)
    Transformation (M/R + Hive + Python + SAS)
  Existing MDS Hub to be migrated from relational Oracle data store to columnar Kudu data store
  Existing ETL to be migrated from Informatica to Hive + Impala
  Utilize Security Toolset from Cloudera to ensure data is encrypted at rest
  Note that Informatica BDM (Big Data Management) suite was reviewed / considered
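
The zone-to-component layout above can be captured as a simple lookup, which may help when reasoning about where a given tool is expected to run. This is only a sketch: the zone and component names are taken from the slide, while the dictionary structure and the helper function are illustrative assumptions.

```python
# Zones and components as listed on the slide; the structure is illustrative.
ZONES = {
    "Enterprise Landing Zone": ["HDFS", "Hive"],
    "Archive Zone": ["HDFS"],
    "Curation Zone": ["Hive", "Impala", "Kudu"],
    "Insights Zone": ["Hive", "Impala", "HBase"],
    "Sandbox Zone": ["Hive", "HBase", "SAS"],
}

def zones_using(component):
    """Return the sorted names of zones that include the given component."""
    return sorted(zone for zone, parts in ZONES.items() if component in parts)

print(zones_using("Kudu"))
print(zones_using("Hive"))
```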

Data Lake POC (Proof of Concept)
  POC Environment:
    MS Azure (IaaS set up)
    Cloudera distribution of Hadoop
    4 Node cluster (3 data, 1 name/edge)
  Focused on Three (3) Use Cases:
    Actuarial Valuation Analysis (Single Product Type)
    Ingestion of Relational and Mainframe Data
    Data Service Query (Performance Goal <= 300ms)
  Results:
    Condensed Valuation Process (From Two Weeks to Twenty Hours)
    Ingestion of Relational Data (via Sqoop) and Mainframe Data (via Syncsort) Successful
    Mirrored 1000 simultaneous executions (Average Response Time Obtained of 150ms)
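
A latency check of the kind described in the third use case could be sketched as follows. This is not the harness used in the POC: run_query is a hypothetical stand-in that simply sleeps, and the worker count of 50 is an assumption, so the 1000 executions here overlap only partially rather than running truly simultaneously. Only the 300 ms goal and the 1000-execution count come from the slide.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

GOAL_MS = 300  # performance goal stated for the data service query use case

def run_query(request_id):
    """Hypothetical stand-in for a real data service call; sleeps briefly."""
    time.sleep(0.001)
    return request_id

def timed_call(request_id):
    """Return the elapsed time of one query in milliseconds."""
    start = time.perf_counter()
    run_query(request_id)
    return (time.perf_counter() - start) * 1000.0

def measure(executions=1000, workers=50):
    """Run the query `executions` times across a thread pool and summarize."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed_call, range(executions)))
    avg = mean(latencies)
    return {"count": len(latencies), "avg_ms": avg, "meets_goal": avg <= GOAL_MS}

summary = measure()
print(summary)
```

In a real test, run_query would issue the actual service request, and percentile latencies would usually be reported alongside the average.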

Next Steps – Items on Deck
  Data Access / Presentation Layer:
    Perform POC on Data Virtualization Product (Denodo)
    Determine How to Package Conformed Dimensions from EDW to Present ‘Perspective Views’
    Establish Integration Patterns within ESB Environment (Semantic / Taxonomic Messaging Approach)
    Execute Performance Testing of Data Service Queries from Presentation Zone
  Information Governance Implications:
    Establish Governance Policies
    Determine Data Classification Approach
    Define Security Architecture for Data Lake
    Identify Access Roles and Security Controls
    Certify Security of Data Lake Environment

Next Steps – Plans for 2018
  Funding Secured for POC Environment until June:
    But Establish a Larger Cluster (10 data, 4 name/edge)
    Along with Security Set-up and Data Encryption
  Collaborate with Business Areas on new / expanded prospective Use Cases:
    Expand Actuarial Valuation to Other Product Types
    Additional Actuarial Items outside of Valuation
    Agent Recruiting and Retention
    Claims Fraud (although this one has a long tail…)
    Customer Experience (Journey Map and/or Retention)
  Go on the Road…
    Presentations to Business Partners and IT folks
    Extol the Value of BD and Future State Architecture
    Troll for Funding…$$$ (Sad but true…)

Recap
  In the end it is all about…
  Current State Review
    Data Model (Conceptual and Logical)
    Data Architecture
  Future State Proposed
    Overall Architecture (Layout and Approach)
    Data Layer Components
    Data Lake Environment
  Big Data POC (Proof of Concept)
    Environment Setup
    Use Case Review
    POC Results
  Items on Deck
    Data Access / Presentation Layer
    Information Governance Implications
  Next Steps

Thank You For Your Time and Interest…!!!
  Contact Information:
    Gene Boomer
    Protective Insurance
    gboomer@protectiveinsurance.com