Confidential – Oracle Internal/Restricted/Highly Restricted
Big Data Made Simple with Oracle Data Flow
Just Add Code
Carter Shanklin, Product Management, Oracle
The Challenge of Big Data
Flexible frameworks for ETL, SQL, ML and more. Process huge datasets. Retain data longer than ever.
Extreme complexity. Lack of skilled operations teams. Difficult to align cost and value.

Over the past five years or so, Big Data has led to amazing advancements across all industries. My personal favorite Big Data story is British Airways' Know Me program, which, by combining data across disparate inputs, can do things like predict when a customer is going to miss a flight and automatically offer to re-book them on another one. There are hundreds of stories of how Big Data has changed the business landscape by bringing in data from many fronts, combining it and analyzing it. Everyone understands the potential of Big Data, but most people have not realized it. Why?
Big Data Is Just Too Complicated!
"70% of Hadoop deployments will fail to meet cost savings and revenue generation objectives due to skills and integration challenges." - Merv Adrian, VP Research, Gartner

"For three successive years the number of deployments for (Hadoop) projects for our clients has been about 15%. It's hardly moved." - Merv Adrian, VP Research, Gartner

"...technologies such as Hadoop and Spark are new, they are difficult, and they are complicated." - Merv Adrian, VP Research, Gartner

Gartner has looked at this question in great detail and reports that only about 15% of Big Data projects make it into production. The reason is complexity. Big Data infrastructures require deep operational expertise that is hard to find, hard to hire and hard to retain, so as much as companies want to use Big Data, they don't have the skills and ability to do it. We need a better way to consume Big Data.

Source: get-slow-start-in-enterprise/d/d-id/ ?print=yes, March 20, 2018
Oracle Data Flow Makes Big Data Truly Easy
1 Fast: Launch Apache Spark® Jobs in Just Seconds.
2 Serverless: No Infrastructure to Deploy or Manage.
3 Complete: Process data in the Cloud or in your Datacenter.

Realizing that complexity is the main challenge, we have begun development on Oracle Data Flow, which lets you run Apache Spark® jobs in a true serverless model. Spark, as you may know, has emerged as the leading Big Data processing framework, supporting SQL, ETL, Graph, ML and more, and is at the center of modern Big Data. Data Flow focuses squarely on solving the main complexity challenge we all face with Big Data. With no infrastructure to manage, no OS to patch or tune, and nothing to upgrade, the most complex aspects of Big Data are handled for you transparently and automatically, allowing you to focus on the analytics that make it so your customers just can't live without you. On top of this, Data Flow launches almost instantly, which helps you stay agile whether you're in dev or test. Data Flow supports any type of Spark job, including SQL, Python, Java, Scala and R, and can be run via UI or API for easy integration with your production pipelines or your favorite scheduler.

Oracle Data Flow: the easiest way to run Apache Spark® in the cloud. Serverless, so you never worry about any infrastructure. Any type of Spark job, no re-coding required. Access data from any source, wherever it lives. Availability targeted for 2019.
Oracle Data Flow: Fast Launch
1 Fast: Launch Apache Spark® Jobs in Just Seconds.
2 Serverless: No Infrastructure to Deploy or Manage.
3 Complete: Process data in the Cloud or in your Datacenter.

Let's drill down into some of these points before we jump into a demo.

Oracle Data Flow: launch in seconds, simple UI and API.
Hadoop on-prem: weeks to months.
Hadoop in the cloud: 20 minutes or more.
Oracle Data Flow: Serverless
1 Fast: Launch Apache Spark® Jobs in Just Seconds.
2 Serverless: No Infrastructure to Deploy or Manage.
3 Complete: Process data in the Cloud or in your Datacenter.

Upgrading and re-certifying Hadoop can take weeks to months. HDFS upgrades touch all data and can lose or corrupt it. Hadoop APIs are not backward compatible and frequently break apps. YARN and HDFS performance characteristics differ substantially between versions, affecting post-upgrade SLAs.

Serverless lets you choose a compatible version for each job. No data upgrade. Roll development forward independently of production. Application performance is isolated. The net result is that you can upgrade in hours or days rather than weeks or months.

Hadoop: 30% or more of your time spent keeping the lights on. Oracle Data Flow: no patches, easy upgrades, so you spend 100% on innovation.

[Calendar graphic: on Hadoop, a month is consumed by downtime to upgrade the OS, downtime to upgrade Hadoop, certifying the new Hadoop, and finally switching to Spark 2.4; with Data Flow, Dev runs Spark 2.4 while Prod stays on Spark 2.3.]
Oracle Data Flow: Complete
1 Fast: Launch Apache Spark® Jobs in Just Seconds.
2 Serverless: No Infrastructure to Deploy or Manage.
3 Complete: Process data in the Cloud or in your Datacenter.

[Architecture diagram showing three deployment patterns:]
Oracle Cloud Data Lake: Oracle Data Flow compute co-located with Oracle Object Storage, SaaS data, Oracle Autonomous DB and the Oracle Catalog.
Multi Cloud: Oracle Data Flow compute accessing S3 / WASB, NoSQL and RDBMS sources.
Hybrid Cloud: Oracle Data Flow compute reaching Object Storage and an EDW over a private VPN.
How Does it Work? Run Any Spark Job in 3 Steps
1 Sign up at cloud.oracle.com. Includes $300 of free trial credits.
2 Upload your Spark app (.py, .jar or SQL) and your data to Oracle Object Storage.
3 Configure your app, select the number of VMs, and run!
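A minimal sketch of the kind of Spark application that might be uploaded as the ".py" app in step 2. The `oci://bucket@namespace/...` paths, bucket names and namespace are hypothetical placeholders, not real resources.

```python
def tokenize(line):
    """Pure helper: split one line of text into lowercase words."""
    return line.lower().split()


def main():
    # Imported here so tokenize() stays usable without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wordcount-example").getOrCreate()

    # Input and output locations are illustrative Object Storage paths.
    lines = spark.read.text("oci://input-bucket@my-namespace/data.txt")
    counts = (lines.rdd
              .flatMap(lambda row: tokenize(row.value))
              .map(lambda word: (word, 1))
              .reduceByKey(lambda a, b: a + b))
    counts.toDF(["word", "count"]).write.mode("overwrite").parquet(
        "oci://output-bucket@my-namespace/wordcount")
    spark.stop()


# Entry point when submitted as a job (left commented here for illustration):
# if __name__ == "__main__":
#     main()
```

The transformation logic lives in a pure helper (`tokenize`) so it can be unit-tested locally before the job is ever uploaded.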
Oracle Data Flow: Access Data Wherever it Lives
1 In Oracle Cloud: includes all the connectors you need to talk to Oracle Autonomous Database, MySQL or Oracle Object Storage.
2 From other clouds: connect to any cloud (Azure, AWS, Oracle) to offload analytics.
3 From your datacenter: peer a private VCN, secured via VPN, to access on-premises data, such as an EDW, that doesn't live in the cloud.
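As a hedged sketch of pattern 1 or 3, a Spark job could reach a relational database over JDBC. The helper below only builds the option map; the host, port, service name and credentials are illustrative, and the Oracle JDBC driver would need to be on the job's classpath.

```python
def oracle_jdbc_options(host, port, service, user, password):
    """Build the option map for spark.read.format("jdbc")."""
    return {
        # Standard Oracle thin-driver URL form: jdbc:oracle:thin:@//host:port/service
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service}",
        "user": user,
        "password": password,
        "driver": "oracle.jdbc.OracleDriver",
    }


# Usage inside a Spark job (names are placeholders):
#   df = (spark.read.format("jdbc")
#         .options(**oracle_jdbc_options("db.example.com", 1521, "orcl",
#                                        "scott", "tiger"))
#         .option("dbtable", "SALES")
#         .load())
```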
Oracle Data Flow: Secure Execution
Your Spark jobs are isolated in private VMs. Data is always encrypted, at rest and in motion.

[Diagram: the service controller at dfcs.oraclecloud.com runs each customer's job with its own Spark driver and executors, reading and writing only that customer's storage containers and objects.]
Oracle Data Flow: No Complex Capacity Planning
Traditional model: noisy neighbors mean you provision to maximum expected load. Data Flow: jobs run on VMs that come and go on demand. No noisy neighbors, no expensive max-load provisioning. Hit SLAs while paying for only what you need.

One of the most complex problems is sizing shared clusters to ensure predictable, consistent SLAs when workloads are bursty and come and go. In traditional Big Data architectures you have to size your cluster to handle the busiest possible load, meaning you massively overspend on your Big Data architecture.

[Diagram: the service controller at dfcs.oraclecloud.com gives each customer's job its own Spark driver, executors and storage containers, sized per job.]
Oracle Data Flow is API-Driven.
The service is driven entirely by REST APIs. Integrate with your applications or workflow engine of choice. Spark jobs submitted through the REST APIs run on Oracle Data Flow with secure read/write access to Oracle Object Storage.
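To make "driven entirely by REST APIs" concrete, here is an illustrative sketch of assembling a run request. The endpoint, resource path and field names below are assumptions for illustration only; the real request/response contract and signing scheme would come from the service's API reference.

```python
import json


def build_run_request(application_id, display_name, num_executors):
    """Assemble a JSON body for a hypothetical "create run" call."""
    return json.dumps({
        "applicationId": application_id,   # hypothetical field names
        "displayName": display_name,
        "numExecutors": num_executors,
    })


body = build_run_request("ocid1.app.example", "nightly-etl", 4)

# A real client or scheduler would POST `body` to the service endpoint,
# e.g. with the `requests` library (authentication headers omitted):
#   requests.post("https://dfcs.oraclecloud.com/runs", data=body,
#                 headers={"Content-Type": "application/json"})
```

Because the whole surface is REST, the same call works from a cron job, an Airflow operator, or any in-house workflow engine.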
Big Data Made Really Simple
DEMO
Oracle Data Flow Demo: Convert XML Mess into a SQL Table for Reporting
[Pipeline: XML documents in Oracle Object Storage are processed by Apache Spark on Oracle Compute, fully serverless via Oracle Data Flow, and written back to Oracle Object Storage as a Parquet table.]
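The core of the demo is flattening messy XML records into table rows. At Data Flow scale this would run inside Spark (for example with an XML data source); the stdlib sketch below shows just the record extraction, using a made-up `<orders><order>` schema for illustration.

```python
import xml.etree.ElementTree as ET


def xml_to_rows(xml_text):
    """Flatten <order> elements into (id, amount) tuples ready for a table."""
    root = ET.fromstring(xml_text)
    return [(o.get("id"), float(o.findtext("amount")))
            for o in root.iter("order")]


sample = """
<orders>
  <order id="1"><amount>19.99</amount></order>
  <order id="2"><amount>5.00</amount></order>
</orders>
"""
rows = xml_to_rows(sample)   # [("1", 19.99), ("2", 5.0)]

# In a Spark job, something like
#   spark.createDataFrame(rows, ["id", "amount"]).write.parquet(output_path)
# would then produce the Parquet table used for reporting.
```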
Summary: Oracle Data Flow is the Simplest Way to Run Spark in the Cloud
Any type of Spark job, any Spark version. Infinitely scalable compute and storage. Pay for only what you use.