1
Hybrid Cloud Strategies for Big Data
Oracle Cloud and Engineered Systems
Marcos Arancibia, Product Management, Big Data and Data Science
Jean-Pierre Dijcks, Product Management, Big Data (jpdijcks)
2
cloudcustomerconnect.oracle.com: start a conversation with the community and Oracle, and vote on existing ideas or submit your own new ideas to the Big Data Idea Lab.
3
Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
4
Product Update & Roadmap
5
Oracle Offers you Three Big Data Deployment Choices
Exact equivalents: same architecture, same standards, unified management
- On-Premises
- Cloud at Customer
- Public Cloud
Deliver Big Data results and speed time to value with Oracle
6
3 Pillars of Oracle Big Data
- Open Platform
- Operational Simplicity
- Enterprise Grade Security
7
Big Data Configuration Optimizations for Apache Spark
A development project to tune Spark jobs by adjusting configuration parameter settings:
- Reduce the number of execution issues when running Spark jobs
- Increase performance out of the box
Tested a large number of settings:
- At Oracle, on synthetic workloads
- At customers, on real workloads
Performance increases from 50% to 70x (a minimal configuration sketch follows below).
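The slide does not list the specific parameters Oracle changed; purely as an illustration of the kind of tuning involved, here is a minimal PySpark sketch that sets a few common Spark configuration knobs. The property names are standard Spark settings, but the values and the input path are hypothetical and are not Oracle's shipped defaults.

```python
# Illustrative only: generic Spark tuning knobs, not the specific settings
# delivered with the BDA Spark optimizations.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuned-spark-job")
    # Executor sizing: fewer, larger executors often reduce shuffle overhead.
    .config("spark.executor.memory", "8g")
    .config("spark.executor.cores", "4")
    # Shuffle partitions: size to the data volume instead of the default of 200.
    .config("spark.sql.shuffle.partitions", "400")
    # Dynamic allocation lets YARN grow and shrink the executor pool per job.
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

df = spark.read.parquet("hdfs:///data/events")  # hypothetical input path
df.groupBy("event_type").count().show()
```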
8
“The new Oracle R&D project, BDA optimized performance and reliability for Spark and YARN, improved our Spark Jobs performance by as much as 5 times in some cases. We were happy to be among the early adopters of this new software technology.”
Debbie Patterson, Voya Financial (Oracle Global Leaders Program, early adopter, USA)
9
“As part of Global Data warehouse Leaders customer we got chance to experience new feature “BDA optimized for Spark” which really boosted our BDA cluster to 3X times with processing time and memory utilization”
Laxman Damodar, CIMB Bank
10
Roadmap: Big Data Cloud Service
- Support for Oracle Cloud Infrastructure
- Supported by Oracle
- Small to large clusters
- Secure by Default
- Full Hadoop services stack available, based on Cloudera CDH
Current service:
- UI-based, simplified stack upgrades
- IDCS integration: deep integration of cloud identities with Kerberos and encryption inside Hadoop
- Choose your Cloudera version; Cloudera 6 support ✔️
11
Roadmap: Big Data Cloud at Customer
Elasticity:
- Ability to start with a limited set of OCPUs enabled and either burst or permanently add to the cluster
- Ability to expand storage within the nodes
Updated pricing models:
- Mimic the Exadata Cloud at Customer models
Updated control plane
Plus: all relevant features from the Big Data Cloud Service roadmap
12
Roadmap: Big Data Appliance
Migration from Oracle Linux 6 to Oracle Linux 7:
- Tested and automated process and utilities to move cluster infrastructure to OL7
Uptake of Cloudera 6:
- Initial version will be Cloudera 6.0.1, including migration from CDH 5.x to 6.0.1 (note that versions are TBD, so check in on the cloudcustomerconnect forums)
- Subsequent support for 6.1, etc.
- CDH 5.16 (the final 5.x release from Cloudera) is currently scheduled for BDA and planned for December 2018
BDA X8-2 hardware: mid calendar year 2019
13
Hybrid Cloud Architectures
14
Enterprise Data & Reporting
[Conceptual architecture diagram. Components: Input Events, Streaming Engine, Data Lake, Structured Enterprise Data, Enterprise Data & Reporting, Discovery Lab. Labels: Actionable Events, Actionable Data Sets, Actionable Metrics, Execution, Innovation, Data Discovery Output.]
15
Enterprise Data & Reporting
[Practical version of the same diagram, adding the implementing technologies: Object Store and Hadoop/HDFS for the Data Lake, Notebooks/Analytic Services for the Discovery Lab. Other components and labels as on the previous slide: Input Events, Streaming Engine, Structured Enterprise Data, Enterprise Data & Reporting, Actionable Events, Actionable Data Sets, Actionable Metrics, Execution, Innovation, Data Discovery Output.]
16
Where does Big Data Live - and why?
Scale-out pub-sub platform: messaging with data retention and play-back; an ingest and hosting system for your events/messages; more and more a standard ingest point for all data.
Hadoop: a scale-out processing platform on top of a clustered file system (HDFS); store and analyze data of unknown value and analyze new data; more and more a standard for large data sets in the organization.
Object Store: a replicated low-cost storage platform for any data structure; share data across computing engines; often used to archive or park data.
DBMS: a high-performance, massively concurrent and secure database system; handles transactions, complex SQL at speed, and mixed/ad-hoc queries; the standard for data warehouse workloads across organizations.
17
Where does Big Data Get Deployed - and why?
Public Cloud:
- Elasticity: scale up, out and in when needed, at the click of a button
- Pay for what you use: switch off when not needed
- Try and throw away: experiment with new technology on the cheap
On-Prem:
- High performance: optimized for the hardware, tuned for performance
- Secure: out of the box and standard, tested and maintained
- Versatile: install what you need, where you need it, for optimal business needs
18
The Big Data “What goes Where” List
- Move the Discovery Lab to Oracle Cloud (our focus today)
- Start your Big Data / Data Lake environment in Oracle Cloud
- Move your backups to Oracle Cloud
- Move your DR to Oracle Cloud
19
2 Ways to Implement the Discovery Lab in Cloud
[Diagram: two ways to implement the Discovery Lab in the cloud, built from Hadoop/HDFS, Autonomous Data Warehouse and Object Store, alongside the Data Lake and Enterprise Data & Reporting.]
20
Moving data is a key item
[Diagram: the Discovery Lab (Notebooks/Analytic Services) alongside the Data Lake (Hadoop/HDFS) and Enterprise Data & Reporting; labeled flows include Create Test Data Sets, Discovery Output, Execution and Innovation.]
21
Moving Data to Discovery Lab with Big Data Manager
Included with all Big Data offerings (BDA, BDCC and BDCS)
- Enables massive parallel copy of data leveraging Apache Spark: HDFS <--> HDFS, HDFS <--> Object Storage (cloud); a sketch of the idea follows below
- File diff-ing and checking after copies
- Build and manage pipelines
- Embedded Zeppelin notebook: analyze data instantly, or at scale with Oracle R Advanced Analytics for Hadoop
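Big Data Manager generates and runs these copy jobs for you; the sketch below is only meant to illustrate the underlying idea of using Spark to move a data set between HDFS and an object store in parallel, followed by a cheap row-count check in the spirit of the post-copy verification mentioned above. The endpoint property value, bucket and paths are hypothetical placeholders, not Big Data Manager APIs.

```python
# Illustration only: a hand-written stand-in for the kind of Spark copy job
# that a tool like Big Data Manager would generate and schedule for you.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hdfs-to-object-store-copy")
    # Object-store connector settings depend on your cloud tenancy/credentials.
    .config("spark.hadoop.fs.s3a.endpoint", "https://objectstorage.example.com")
    .getOrCreate()
)

src = "hdfs:///datalake/clickstream/2018/"       # hypothetical source
dst = "s3a://discovery-lab-bucket/clickstream/"  # hypothetical destination

df = spark.read.parquet(src)          # executors read slices of the files in parallel
df.write.mode("overwrite").parquet(dst)

# Cheap post-copy check (the "checking after copies" idea): compare row counts.
assert spark.read.parquet(dst).count() == df.count()
```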
22
Big Data Manager – Moving Data to Discovery Lab
See live demos of Big Data Manager: Autonomous Hub – Big Data Cloud Service, Moscone South (Monday – Wednesday)
24
The file browser enables self-service data movement, for example from HDFS to Object Storage.
27
Importantly, this drag-and-drop or copy action is turned into a Spark program, which is executed or scheduled.
28
Jobs are reusable and editable, and can be re-run manually or added to a pipeline.
29
Build simple pipelines to orchestrate data movement between your environments.
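The slides do not show Big Data Manager's pipeline definition format; purely as a generic illustration of the idea (ordered steps with a simple success check between them), here is a small Python sketch in which both step functions are hypothetical placeholders.

```python
# Hypothetical sketch of a two-step data-movement pipeline: copy data from the
# data lake to the lab, then build a derived test data set. The step functions
# are placeholders, not Big Data Manager APIs.
from typing import Callable, List

def copy_to_lab() -> bool:
    print("copying /datalake/clickstream -> object store ...")
    return True  # return False to simulate a failed copy

def build_test_set() -> bool:
    print("sampling copied data into a smaller test data set ...")
    return True

def run_pipeline(steps: List[Callable[[], bool]]) -> None:
    # Run steps in order and stop the pipeline on the first failure.
    for step in steps:
        if not step():
            raise RuntimeError(f"pipeline stopped: step {step.__name__} failed")

run_pipeline([copy_to_lab, build_test_set])
```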
32
Your Discovery Lab in Today’s Cloud World
- Pick your favorite notebook environment and start coding against your analytics libraries
- Use libraries like R, TensorFlow and Caffe for your analytics and ML, in parallel where possible
- The easiest way to build out a lab is to leverage some known basics and a cluster (Hadoop/HDFS) to run jobs in parallel (a minimal example follows below)
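As one possible starting point for such a lab, here is a minimal notebook-style PySpark cell that reads a hypothetical feature table from HDFS and fits a model in parallel across the cluster. Spark MLlib stands in for whichever analytics library you prefer; the slide equally mentions R, TensorFlow and Caffe. The path and column names are assumptions for illustration.

```python
# Minimal Discovery Lab notebook cell, assuming a PySpark interpreter and a
# hypothetical feature table already landed in HDFS.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("discovery-lab-experiment").getOrCreate()

df = spark.read.parquet("hdfs:///lab/customer_features")   # hypothetical path

# Assemble a feature vector from a few (hypothetical) numeric columns.
features = VectorAssembler(
    inputCols=["recency", "frequency", "monetary"],
    outputCol="features",
).transform(df)

# Fit a simple segmentation model; MLlib distributes the work across executors.
model = KMeans(k=5, seed=42, featuresCol="features").fit(features)
model.transform(features).groupBy("prediction").count().show()
```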
35
The Big Data “What goes Where” List
- Move the Discovery Lab to Oracle Cloud
- Start your Big Data / Data Lake environment in Oracle Cloud
- Move your backups to Oracle Cloud
- Move your DR to Oracle Cloud
Blogs.oracle.com/datawarehousing
36
Questions?
37
Hybrid Cloud Strategies for Big Data
Oracle Cloud and Engineered Systems
Marcos Arancibia, Product Management, Big Data and Data Science
Jean-Pierre Dijcks, Product Management, Big Data (jpdijcks)