MetaScale is a subsidiary of Sears Holdings Corporation The 3 Ts of Hadoop Wuheng Luo Ankur Gupta 06.2013.

Presentation transcript:

MetaScale is a subsidiary of Sears Holdings Corporation The 3 Ts of Hadoop Wuheng Luo Ankur Gupta

The 3 Ts of Hadoop 3-Stage Circular Process of Enterprise Big Data

What are the 3Ts? 3Ts = Transfer, Transform, and Translate. A new enterprise big data pattern:
- To bring disruptive change to conventional ETL
- To leverage Hadoop for streamlining data processes
- To move toward real-time analytics

The 3Ts Goal: To simplify enterprise data processing and reduce the latency of turning enterprise data from raw form into products of discovery, so as to better support business decisions.

The 3Ts One-Liners
Transfer: Once the Hadoop system is in place, a mandate is needed to immediately and continuously capture and deliver all enterprise data, from all data sources, through all data systems, to Hadoop, and store the data under HDFS.
Transform: When source data is in, clean, standardize, and convert the data through dimensional modeling. Data transformation should be performed in-place within Hadoop, without moving the data out again for integration reasons.
Translate: Finish the data flow cycle by turning analytical data aggregated in Hadoop into data products of business wisdom. Use batch and streaming tools built on top of Hadoop to interact with data scientists and end users.

Hadoop as Enterprise Data Hub
“Data Hub” is not a new concept, but:
Conventional Data Hub | Hadoop Enterprise Data Hub
RDBMS or EDW based | Hadoop ecosystem based
No consistent architectural style: ODS, MDM, messaging or publish-subscribe, etc. | 3-phased architecture to cover the full enterprise data flow cycle from data source to data products
Relies heavily on ETL | 3Ts-driven
Intermediate, partial, siloed | True center of enterprise data
… | …

TRANSFER: Sourcing Data into Hadoop
Intent: Continuously capture all enterprise data at the earliest possible touch points, deliver the data from all sources, through all source data systems, to Hadoop, and store the data under HDFS.

TRANSFER Motivation: To gain a distinctive competitive capability, enterprises need to build an integrated data infrastructure as the foundation for big data analytics. Use Hadoop as THE centralized enterprise data repository, and make it the grand destination for all enterprise source data.

TRANSFER: (3 Ts’) Transfer vs. (ETL’s) Extract
Traditional ETL - Extract | Hadoop - Transfer
Bottom-up | Top-down
Task/project specific | Enterprise-wide mandate
Passive | Proactive
Data is not available when needed | Data is ready when needed
Same datasets are moved around again and again, with no value added | Move the data once, and use it many times, each time with value increased

TRANSFER Consequences
Before | After
Isolated, disconnected in various siloed data/file systems | Consolidated and centralized in Hadoop
Monolithically segmented | Heterogeneous, diverse, huge
Separated and partial | Federated and holistic

TRANSFER Implementation
- Always do a data gap analysis first
- Fork the ingestion into both batch and streaming paths if needed
- Have a delivery plan for the data feed
- Synchronize data changes between the source system and Hadoop
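The first and last implementation points can be illustrated with a minimal sketch in plain Python. It is not taken from the presentation: the names `data_gap`, `Change`, and `apply_changes` are hypothetical, the dataset names are made up, and an in-memory dictionary stands in for data stored in Hadoop.

```python
from dataclasses import dataclass
from typing import Optional


def data_gap(source_datasets, landed_datasets):
    """Return the source datasets that have not yet landed in the hub, sorted."""
    return sorted(set(source_datasets) - set(landed_datasets))


@dataclass(frozen=True)
class Change:
    """One change record captured from a source system (hypothetical shape)."""
    op: str                      # "upsert" or "delete"
    key: str                     # primary key of the affected row
    row: Optional[dict] = None   # new row content for upserts


def apply_changes(snapshot, changes):
    """Apply ordered change records to a keyed snapshot and return a new copy."""
    result = dict(snapshot)
    for change in changes:
        if change.op == "upsert":
            result[change.key] = change.row
        elif change.op == "delete":
            result.pop(change.key, None)
        else:
            raise ValueError(f"unknown op: {change.op}")
    return result


if __name__ == "__main__":
    # Gap analysis: which source datasets are still missing from the hub?
    print(data_gap(["orders", "customers", "inventory"], ["orders"]))

    # Change synchronization: replay source changes against the landed copy.
    landed = {"c1": {"name": "Ann"}}
    changes = [Change("upsert", "c2", {"name": "Bob"}), Change("delete", "c1")]
    print(apply_changes(landed, changes))
```

In a real deployment the change records would come from whatever capture mechanism the source system offers, and the snapshot would be files or tables under HDFS rather than a dictionary.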

TRANSFORM: Integrating Data within Hadoop
Intent: Keep the data flowing beyond the ingest phase by transforming the data from dirty to clean, from raw to standardized, and from transactional to analytical, all within Hadoop.

TRANSFORM Motivation: As the latency from raw data to business insight becomes the focal point of enterprise data analytics, use Hadoop as the data integration platform to perform in-place data transformation.

TRANSFORM Implementation
- Partition enterprise-wide standardized data and job-specific analytical data in HDFS, and retain history.
- Use dimensional modeling to transform and standardize; make dimensional data the atomic unit of enterprise data.
- Identify all enterprise data entities, and add the finest-grain attributes to each entity as dimensional data.
- Take a bottom-up approach, but also think about data usage across the enterprise rather than for one specific task.
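As a rough illustration of the dimensional-modeling step, the following sketch standardizes flat transactional rows and splits them into a customer dimension and a sales fact table. It is an assumption-laden example, not the presenters' implementation: the function `build_star`, the field names, and the surrogate-key scheme are all hypothetical, and plain Python lists stand in for data partitions under HDFS.

```python
def build_star(transactions):
    """Split flat transaction rows into a customer dimension and a fact table."""
    customer_dim = {}  # natural key -> dimension row carrying a surrogate key
    facts = []
    for tx in transactions:
        # Standardize the natural key so " c1 " and "C1" map to one entity.
        natural_key = tx["customer_id"].strip().upper()
        if natural_key not in customer_dim:
            customer_dim[natural_key] = {
                "customer_sk": len(customer_dim) + 1,  # surrogate key
                "customer_id": natural_key,
                "city": tx["city"].strip().title(),
            }
        # The fact row keeps measures plus a reference to the dimension row.
        facts.append({
            "customer_sk": customer_dim[natural_key]["customer_sk"],
            "order_date": tx["order_date"],
            "amount": round(float(tx["amount"]), 2),
        })
    return list(customer_dim.values()), facts


if __name__ == "__main__":
    raw = [
        {"customer_id": " c1 ", "city": "chicago",
         "order_date": "2013-06-01", "amount": "10.50"},
        {"customer_id": "C1", "city": "Chicago",
         "order_date": "2013-06-02", "amount": "4"},
    ]
    dims, facts = build_star(raw)
    print(dims)
    print(facts)
```

The first occurrence of a customer determines its dimension attributes here; a production design would also need a policy for attribute changes over time, in line with the slide's point about retaining history.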

TRANSFORM: (3 Ts’) Transform vs. (ETL’s) Transform
Transform in ETL / ELT | Transform in 3 Ts
in vitro, outside Hadoop | in vivo, within Hadoop
Use Hadoop as rental space | Use Hadoop as integration platform
Non-value adding data movement in between data storage and transformation | Data is transformed while flowing from one partition to another under HDFS
High latency | Low latency
Network bottleneck | Data locality

TRANSLATE: Making Data Products out of Hadoop
Intent: Turn analytical data into data products of business wisdom using home-made or commercial analytics tools built on top of Hadoop. Business decisions supported by data products will help generate more new data, thus starting a new round of the enterprise data flow cycle…

TRANSLATE Motivation
- Low-latency big data analytics requires the right platform and tools
- Use Hadoop as the platform of choice for enterprise data analytics because of its openness and flexibility
- Choose analytical tools that are flexible, agile, interactive, and user-friendly

TRANSLATE Implementation
- Big data analytics takes a team effort
- Include statisticians, data scientists, and developers
- Utilize both generic and Hadoop-specific technologies
- Consider both batch and streaming based approaches
- Provide access to pre-computed views and on-the-fly queries
- Use both home-made and Hadoop-based commercial tools
- Use a web-based, mobile-friendly UI
- Visualize
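One way to read the points about combining batch with streaming, and pre-computed views with on-the-fly queries, is sketched below. This is only an interpretation under stated assumptions: the functions `precompute_view` and `query_total` and the event fields are hypothetical, and in-memory lists stand in for batch data in Hadoop and for events arriving on a stream.

```python
from collections import Counter


def precompute_view(batch_events):
    """Batch step: aggregate historical events into total quantity per product."""
    view = Counter()
    for event in batch_events:
        view[event["product"]] += event["qty"]
    return view


def query_total(view, recent_events, product):
    """Serve a query by adding not-yet-batched events to the pre-computed total."""
    recent = sum(e["qty"] for e in recent_events if e["product"] == product)
    return view.get(product, 0) + recent


if __name__ == "__main__":
    history = [
        {"product": "tv", "qty": 2},
        {"product": "tv", "qty": 1},
        {"product": "fridge", "qty": 5},
    ]
    view = precompute_view(history)          # refreshed by a periodic batch job
    streaming = [{"product": "tv", "qty": 4}]  # arrived since the last batch run
    print(query_total(view, streaming, "tv"))
```

The pre-computed view keeps common queries cheap, while the small scan over recent events keeps answers current between batch runs.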

The 3 Ts of Hadoop Continuous Iteration of Enterprise Data Flow

Thank You! For further information visit: MetaScale is a subsidiary of Sears Holdings Corporation