©2014 Experian Information Solutions, Inc. All rights reserved. Experian Confidential.



Data Hub
Enabling easy and safe access to Experian’s data
Greg Bonin
Principal Scientist | Experian DataLabs

©2014 Experian Information Solutions, Inc. All rights reserved. Experian and the marks used herein are service marks or registered trademarks of Experian Information Solutions, Inc. Other product and company names mentioned herein are the trademarks of their respective owners. No part of this copyrighted work may be reproduced, modified, or distributed in any form or manner without the prior written permission of Experian. Experian Confidential.

Introduction and overview

How do we cost-effectively and safely provide simple access to Experian’s internal data to clients and ourselves?

From Experian Analytical Sandbox™ to data hub
• Extending the Experian Analytical Sandbox™ to other parts of Experian
• Making the Experian Analytical Sandbox™ into a delivery platform

The Experian Analytical Sandbox™ – a case study
• What is it?
• How did we build it?

What is the Experian Analytical Sandbox™?

An ad-hoc environment where clients and internal users can access something like MAD (Monthly Analytic Dataset) and perform statistical analysis.

Key design goals
• Dataset will be shared across many users (should be scalable)
• Underlying data will be anonymized (but real data)
• Dataset should contain all records (not a sample)
• Clients have their own environment, where they may bring in data
• Clients should not be able to pull data out of the system
• Clients must be able to access data through SAS

Experian Analytical Sandbox™ – Data requirements

What is the MAD data?
• Raw tradeline data (one record per trade per consumer)
• Various scores and attributes (one record per consumer)
• MAD data is a 10% sample of U.S. consumers and is typically produced monthly

How much storage do we need?
• We want to store 100% of the raw files
  ► One month of 100% file is approximately 10TB (uncompressed)
• Five years of monthly history needed for analytical use
• Our total storage needs are around 700TB!
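
As a rough check on the storage figures above, the sketch below multiplies the monthly file size by the number of months of history. It uses only the two numbers stated on the slide; the function name is made up for illustration. The raw monthly files alone come to 600TB, somewhat below the slide's total of around 700TB, and the transcript does not say what accounts for the remainder.

```python
# Back-of-the-envelope sizing using only the figures quoted on the slide.
TB_PER_MONTH_UNCOMPRESSED = 10  # "approximately 10TB (uncompressed)"
YEARS_OF_HISTORY = 5            # "Five years of monthly history"


def raw_history_tb(tb_per_month: float, years: int) -> float:
    """Uncompressed size of the full monthly history, in TB."""
    return tb_per_month * 12 * years


total_tb = raw_history_tb(TB_PER_MONTH_UNCOMPRESSED, YEARS_OF_HISTORY)
print(f"{total_tb:.0f} TB of raw monthly files")  # prints "600 TB of raw monthly files"
```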

Experian Analytical Sandbox™ – Design overview
• Utilize Hadoop as a cost-efficient, scalable data store
• Access data through Hive
• Strong authentication via Kerberos
• Leverage Citrix to ensure all data stays within Experian

What Do We Have Now

Cluster Specs
• 30 node Hadoop cluster running CDH
• 128GB and 16 cores per data node
• 700TB total disk (usable ~230TB)

Cost
• ~$700,000 for hardware
• Funded by CIS

Usage
• Currently have one client (AMEX). Current contract recovers most of initial cost
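
The gap between 700TB of total disk and roughly 230TB usable is consistent with HDFS block replication. The sketch below assumes the HDFS default replication factor of 3; the transcript does not state the cluster's actual setting, and the function name is illustrative only.

```python
def usable_tb(raw_disk_tb: float, replication_factor: int = 3) -> float:
    """Approximate usable HDFS capacity when every block is stored
    `replication_factor` times. Ignores non-HDFS overhead such as
    operating system and temporary space."""
    return raw_disk_tb / replication_factor


# Assumption: replication factor 3 (the HDFS default), not confirmed by the slides.
print(round(usable_tb(700)))  # prints 233, close to the slide's "usable ~230TB"
```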

Shared data store – Why Hadoop?

We need to store and access large amounts of data in a cost-effective way
• Works well with off-the-shelf hardware
• Can meet performance needs by adding servers
• Limited licensing costs

We want to make the data access easy and flexible
• Hadoop supports several SQL-like languages (Hive, Impala, etc.)
• We needed to integrate with SAS, which works with Hive

Usage pattern fits well with Hadoop

Technical challenges

Hadoop does not have strong authentication by default
• We used Kerberos to handle the authentication … which was painful to set up
• Complicates client applications as they need to support Kerberos

SAS and Hadoop are not ideal bedfellows
• Pulling large quantities of data down through SAS is slow
• It is hard to force SAS to utilize the cluster efficiently
• Managing DB permissions with SAS is annoying
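
One concrete way Kerberos shows up in client applications is the connection string: a JDBC client of HiveServer2 on a Kerberized cluster has to name the server's Kerberos principal in the URL, and the user needs a valid ticket (for example from `kinit`) before connecting. The sketch below only builds such a URL. The host name, realm, and function name are hypothetical, and the transcript does not say which client settings the Sandbox actually used.

```python
def hive_jdbc_url(host: str, port: int = 10000, database: str = "default",
                  realm: str = "EXAMPLE.COM", service: str = "hive") -> str:
    """Build a HiveServer2 JDBC URL for a Kerberos-secured cluster.

    The `principal` parameter carries the HiveServer2 service principal,
    in the usual service/host@REALM form.
    """
    principal = f"{service}/{host}@{realm}"
    return f"jdbc:hive2://{host}:{port}/{database};principal={principal}"


# Hypothetical host and realm, for illustration only.
url = hive_jdbc_url("hive.sandbox.example.com")
print(url)
```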

Case study – Using the Experian Analytical Sandbox™ to answer questions

“What is the trend of VantageScore® for people who recently obtained an auto loan?”

[Chart legend: Auto opened recently, Has auto, Overall]

• A simple SQL query was able to answer this question in 2.5 minutes
  ► Process involved joining a 2TB file with a 250GB file
• Similar analysis using SAS on a single server could take x longer
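
The query in the case study is not shown in the slides. The sketch below illustrates the general shape such a join might take: a trade-level table is filtered to recently opened auto loans and joined to a consumer-level score table, then averaged by month. Everything here is an assumption made for illustration: the table and column names, the three-month definition of "recently", and the toy rows. An in-memory SQLite database stands in for Hive so that the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one row per trade, and one score row per consumer per month.
    CREATE TABLE trades (consumer_id INT, snapshot_month TEXT,
                         trade_type TEXT, months_since_open INT);
    CREATE TABLE scores (consumer_id INT, snapshot_month TEXT, vantage_score INT);
""")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", [
    (1, "2014-01", "AUTO", 2), (1, "2014-01", "CARD", 40),
    (2, "2014-01", "AUTO", 30), (3, "2014-02", "AUTO", 1),
    (1, "2014-02", "AUTO", 3),
])
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", [
    (1, "2014-01", 700), (2, "2014-01", 650), (3, "2014-01", 600),
    (1, "2014-02", 710), (3, "2014-02", 620),
])

# Average score per month for consumers with an auto trade opened
# within the last 3 months (an assumed definition of "recently").
QUERY = """
    SELECT s.snapshot_month, AVG(s.vantage_score) AS avg_score
    FROM scores s
    JOIN (SELECT DISTINCT consumer_id, snapshot_month
          FROM trades
          WHERE trade_type = 'AUTO' AND months_since_open <= 3) t
      ON t.consumer_id = s.consumer_id
     AND t.snapshot_month = s.snapshot_month
    GROUP BY s.snapshot_month
    ORDER BY s.snapshot_month
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # prints [('2014-01', 700.0), ('2014-02', 665.0)]
```

The inner `SELECT DISTINCT` keeps a consumer with several recent auto trades from being counted more than once in the average.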

Building Experian Analytical Sandboxes™ for other parts of Experian

Experian Analytical Sandbox™ name | Type of data | Potential use
Business Information Services | Raw trade line data | Similar to use case for Experian Analytical Sandbox™ except less regulatory sensitivity
Healthcare | Claims and eligibility checks from Experian Healthcare | Provide researchers or private parties a rich data set to analyze
Digital Advertising | IP impression information (Audience IQ℠); Device IDs (41st Parameter®) | Allow third parties to use this data for model-building or reporting
ConsumerView℠ | Monthly-trended ConsumerView℠ data | Provide insight into changes in demographic data over time

• Opportunities exist to build more sandboxes
• Building more sandboxes across Experian’s data assets will allow broad, safe access to data
• We believe this would lead to increased opportunities for innovation

The “Cloud” landscape

[Diagram labels: Data, Tools; each marked Client-driven and Proprietary]

From Experian Analytical Sandbox™ to Data Hub
• Extending our design will allow solutions developed in the Experian Analytical Sandbox™ to be deployed
• Using Experian tools will allow quick deployment of models
  ► Example: Model outputs written in PMML would allow quick deployment
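
To make the PMML point concrete: PMML is an XML format that describes a fitted model, so a deployment platform can score records without the tool that trained the model. The sketch below parses a small PMML 4.2 regression model and applies it to one record. The model, its field names, and its coefficients are invented for illustration and are not taken from the presentation; a production scorer would use a full PMML engine rather than this hand-rolled reader.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-variable linear model expressed as PMML 4.2.
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="Hypothetical example model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="utilization" optype="continuous" dataType="double"/>
    <DataField name="inquiries" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="utilization"/>
      <MiningField name="inquiries"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="utilization" coefficient="2.0"/>
      <NumericPredictor name="inquiries" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_2"}


def score(pmml_text: str, record: dict) -> float:
    """Evaluate intercept + sum(coefficient * value) from a PMML RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", NS):
        result += float(predictor.get("coefficient")) * record[predictor.get("name")]
    return result


print(score(PMML_DOC, {"utilization": 0.5, "inquiries": 2}))  # prints 2.0
```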

Conclusion
• The Experian Analytical Sandbox™ is one way to make Experian’s internal data easier to access and use
  ► Making access to data easier reduces barriers to innovation
• Extending the functionality of the Experian Analytical Sandbox™ could lead to a new way of using Experian data
  ► Easy and safe access to raw data can allow clients to understand their customers better
  ► Streamlined deployment can make those insights actionable

#FOIC2014

Greg Bonin
Principal Scientist
Experian DataLabs
e:
t: (858)