Oracle Big Data eSeminar Series


1 Oracle Big Data eSeminar Series

2 Oracle Big Data eSeminar Series
TITLE | DATE, TIME
Oracle Big Data Appliance | April 10, 2012, 10AM PST
Oracle NoSQL Solutions | April 17, 2012, 10AM PST
Oracle Data Integrator Application Adapter for Hadoop | April 24, 2012, 10AM PST
Oracle Analytics for Big Data | May 3, 2012, 10AM PST
Integrating Big Data with the Enterprise | May 10, 2012, 10AM PST

3 Announcing Oracle Big Data 3 Day Hands-On Technical Workshop
When: May 8-10, 2012. Registration details coming soon.
Where: Oracle Redwood Shores, CA, USA
Agenda:
  Oracle Big Data Appliance technical architecture and its hardware and software features
  Oracle NoSQL Database
  Oracle R Connector for Hadoop and Oracle Analytics for Big Data
  Oracle Loader for Hadoop and Oracle Direct Connector for HDFS

4 Oracle Big Data Platform
Introduce the 4 key phases: Acquire, Organize, Analyze & Decide

5 Oracle Integrated Solution Stack for Big Data
ACQUIRE: Oracle NoSQL Database, HDFS, Enterprise Applications
ORGANIZE: Hadoop (MapReduce), Oracle Big Data Connectors, Oracle Data Integrator
ANALYZE: In-Database Analytics, Data Warehouse
DECIDE: Analytic Applications

6 Oracle Big Data Platform
Marty Gubar BI/DW Product Management

7 Agenda
Use Cases
Demonstrate Sentiment Analysis of Tweets
Introduce the Oracle Big Data Platform
Review Steps to Building Application

8 Big Data Use Cases

Industry: Banking & Finance
Big Data Use Cases: Analysis of data sets across lines of business (loans, insurance, on-line banking, card products) for market assessment; Risk analysis & revenue lift for new & existing products; Analysis of stock portfolio trends & risk
Potential Benefits: Increased share of customer; Increased customer loyalty; Increased overall revenue; Decreased financial risk

Industry: Healthcare
Big Data Use Cases: Analysis of unexpected health condition associations using electronic health records and visualization
Potential Benefits: Improved quality of care; Reduced cost of care

Industry: High Tech / Manufacturing / Mobile Devices
Big Data Use Cases: Product failure analysis; Patent records research; Analysis of mobile device usage by location
Potential Benefits: Optimized manufacturing; Lower cost of warranty claims; Faster problem resolution

Industry: Retail
Big Data Use Cases: Location based targeted programs & promotions; Social network buying analysis
Potential Benefits: Just-in-time promotions raising spend; Understanding of customer sentiments

9 Sentiment Analysis Using Twitter Feeds
Demonstration

10 Social Media Sentiment Analysis
Airlines actively monitoring and responding to Tweets
Identify opportunities
  “It’s really cold. I wish I were going to…”
  Customer conversion
Identify customer service issues
  Keep customers happy. Nothing is private!
  Avoid negative “buzz”
Millions of tweets – which ones are important?

11 Oracle Big Data Platform

12 Oracle Big Data Platform
[Diagram: Big Data Appliance, Exadata and Exalytics linked by InfiniBand, laid out across the ACQUIRE, ORGANIZE, ANALYZE and DECIDE phases. Components shown: Hadoop, Oracle NoSQL Database, Open Source R, Oracle Big Data Connectors, Oracle Data Integrator, Applications, Oracle Database, Data Warehouse, In-Database Analytics, Oracle Advanced Analytics, Analytic Applications, BI Abstraction, Alerts, Dashboards, MD-Analysis, Reports, Query, Web Services.]

13 Acquire all available data
Big Data in Action
[Cycle diagram: ACQUIRE, ORGANIZE, ANALYZE, DECIDE]
Let’s start with the data acquisition phase. Let’s run the video.

14 Two Sets of Characteristics
Batch-Oriented | Real-Time
Process data to use | Deliver a service
Bulk storage | Fast access to specific record
Write once, read all | Read, write, delete, update
How you want to use the data is going to drive how
Copyright 2011 Oracle Corporation

15 Best Choices
Hadoop Distributed File System (HDFS) | Oracle NoSQL Database
File System | Database
Parallel scanning | Indexed storage
No inherent structure | Simple data structure
High volume writes | High volume random reads and writes
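The contrast between the two access patterns can be sketched in a few lines. This is only an illustration with plain Python stand-ins: a list plays the role of a bulk file that is scanned end to end, and a dictionary plays the role of a keyed store. The names `log_records`, `profiles`, `scan_count`, `get_profile` and `update_followers` are hypothetical and are not HDFS or Oracle NoSQL Database APIs.

```python
# Illustrative only: plain Python stand-ins, not HDFS or Oracle NoSQL Database APIs.

# Batch-oriented storage: an append-only sequence with no index (file-like).
log_records = [
    {"user": "alice", "text": "flight delayed again"},
    {"user": "bob", "text": "great service today"},
    {"user": "alice", "text": "finally boarding"},
]


def scan_count(records, word):
    """Read every record once and count those whose text contains `word`."""
    return sum(1 for record in records if word in record["text"])


# Real-time storage: records addressed by key (database-like).
profiles = {
    "alice": {"followers": 1200},
    "bob": {"followers": 35},
}


def get_profile(store, key):
    """Fetch one record by key without touching the others."""
    return store.get(key)


def update_followers(store, key, followers):
    """Replace a single record in place."""
    store[key] = {"followers": followers}
```

The scan touches all records to answer one question, while the keyed store answers a question about one record and supports in-place updates, which mirrors the batch versus real-time split on the slide.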

16 Acquiring Twitter Data
Bulk Collect vs. Streaming

Bulk collect using Search & User/Lookup
  Search returns: Tweet, User Name, Time
  User/Lookup returns: User Name, # Followers, # Friends
  Use search terms to acquire relevant tweets
  Search does not return social importance metrics
  Pair with User/Lookup for complete user details

Continuous data collection using Streaming
  Streaming returns: Tweet, User Name, Time, # Followers, # Friends
  Use search terms to acquire relevant tweets
  Streaming returns all the key user metrics

Save to HDFS (Oracle BDA Hadoop File System)
  Hadoop delivers FileSystem API over HDFS
  Use standard Java file I/O classes and methods for reading/writing to that file system
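The bulk-collect pairing step amounts to a join on user name. A minimal sketch follows, assuming the search results and the user lookup results have already been fetched into lists of dictionaries. The field names (`user_name`, `followers`, `friends`) and the function `pair_with_user_details` are hypothetical and do not reflect the actual Twitter API response format.

```python
def pair_with_user_details(search_results, user_lookup):
    """Attach follower and friend counts to each tweet by matching on user name.

    Both inputs are lists of dicts with hypothetical field names; tweets whose
    user is missing from the lookup keep None for the missing metrics.
    """
    by_name = {user["user_name"]: user for user in user_lookup}
    enriched = []
    for tweet in search_results:
        details = by_name.get(tweet["user_name"], {})
        enriched.append({
            **tweet,
            "followers": details.get("followers"),
            "friends": details.get("friends"),
        })
    return enriched


# Hypothetical sample data shaped like the two result sets on the slide.
search_results = [
    {"tweet": "It's really cold here", "user_name": "alice", "time": "10:01"},
    {"tweet": "Lost my bag again", "user_name": "bob", "time": "10:02"},
]
user_lookup = [
    {"user_name": "alice", "followers": 1200, "friends": 300},
]

paired = pair_with_user_details(search_results, user_lookup)
```

With streaming collection this join would not be needed, since the slide notes that streaming already returns the user metrics with each tweet.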

17 Tweets Stored in HDFS
Tweet stream captured to XML file in HDFS

18 Organize and distill big data using massive parallelism
Big Data in Action
[Cycle diagram: ACQUIRE, ORGANIZE, ANALYZE, DECIDE]

19 Organize: Derive Meaning from Source Data
Map: Filters and interprets the source – producing key/value pairs
Reduce: Summarizes the sorted map results – producing the final key/value output
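The map and reduce roles described above can be sketched in plain Python, run in a single process rather than on a Hadoop cluster. The word lists and the function names (`map_tweet`, `shuffle_sort`, `reduce_counts`, `run_job`) are assumptions made for illustration; the presentation does not show the actual sentiment logic used in the demonstration.

```python
from collections import defaultdict

# Hypothetical sentiment vocabularies, chosen only for this example.
POSITIVE = {"great", "love"}
NEGATIVE = {"delayed", "lost"}


def map_tweet(tweet):
    """Map: filter and interpret one tweet, emitting (key, value) pairs."""
    for word in tweet.lower().split():
        if word in POSITIVE:
            yield ("positive", 1)
        elif word in NEGATIVE:
            yield ("negative", 1)


def shuffle_sort(pairs):
    """Group values by key and return the groups in sorted key order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_counts(key, values):
    """Reduce: summarize all values for one key into a final (key, value) pair."""
    return (key, sum(values))


def run_job(tweets):
    """Run map, shuffle/sort and reduce in sequence over a list of tweets."""
    pairs = [pair for tweet in tweets for pair in map_tweet(tweet)]
    return dict(reduce_counts(key, values) for key, values in shuffle_sort(pairs))
```

On a real cluster the map calls run in parallel across nodes and the framework performs the shuffle and sort; here the three stages are simply called one after another to show the data flow.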

20 Organize: Load Results into Oracle Database at 12TB/hour
Oracle Loader for Hadoop (OLH): A MapReduce utility to optimize data loading from HDFS into Oracle Database
Oracle Direct Connector for HDFS: Access data directly in HDFS using external tables
ODI Application Adapter for Hadoop: ODI Knowledge Modules optimized for Hive and OLH

21 Oracle Loader for Hadoop
Use the Cluster
[Diagram: Map, Shuffle/Sort, Reduce, Oracle Loader for Hadoop]
Last stage in MapReduce workflow
Partitioned and non-partitioned tables
Online and offline loads

22 Oracle Direct Connector for HDFS
Direct Access from Oracle Database
SQL access to HDFS
External table view
Data query or import
[Diagram: SQL Query, External Table, Oracle Database, DCH, HDFS Client, InfiniBand, HDFS]

23 Analyze all your data, at once
Big Data in Action
[Cycle diagram: ACQUIRE, ORGANIZE, ANALYZE, DECIDE]

24 Oracle In-Database Analytics Platform
Analytics: Oracle R Enterprise, Spatial Analytics, Oracle Data Mining, Text and Search, SQL Analytics
Parallel Processing Engine
Data Layer: XML, Relational, OLAP, Spatial, RDF, Media
All integrated, secure, leverages all key database capabilities.

25 R Statistical Programming Language
Open source language and environment
Used for statistical computing and graphics
Strength in easily producing publication-quality graphs
Highly extensible

26 Why R Wasn’t Ready for the Enterprise
Small data only; models are stored and run on the user’s laptop

27 Oracle R Enterprise Approach
Models run in-database
Processes large data sets
Uses the power of Oracle Database 11g and Exadata
Same code, much faster
Faster, Highly Secure, Scalable

28 Oracle R Hadoop Connector
Native R Access to Hadoop
Native R MapReduce
Native R HDFS access
[Diagram: Client Host, Oracle Big Data Appliance, R Engine, ORE, ORHC, Hadoop Cluster Software, MapReduce Nodes, HDFS]

29 Oracle Exalytics In-Memory Machine
Speed of Thought Interactive Analysis
Interactive Analysis
Free Exploration
Dense Visualizations
Fully Mobile

30 Big Data Technologies
Time to Build? Required Optimizations? Cost and Difficulty Maintaining?
Considerations for production system:
  Research, Design and Planning
  Hardware & Cloudera acquisition costs
  Installation & Configuration
  Management/Support

31 Oracle Big Data Appliance Hardware
18 Sun X4270 M2 Servers per BDA: 864 GB memory, 216 cores, 648 TB storage
40 Gb/s InfiniBand Fabric: Inter-rack Connectivity, Inter-node Connectivity
10 Gb/s Ethernet Connectivity: Data center connectivity
Full Rack Configuration Only
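The rack totals on this slide imply the per-server figures. A small sketch of that arithmetic follows, assuming the resources are split evenly across the 18 servers, which the slide does not state explicitly. The function name `per_server` is hypothetical.

```python
# Rack-level totals as stated on the slide.
SERVERS_PER_RACK = 18
RACK_MEMORY_GB = 864
RACK_CORES = 216
RACK_STORAGE_TB = 648


def per_server(total, servers=SERVERS_PER_RACK):
    """Divide a rack-level total evenly across servers (an assumption)."""
    return total / servers


memory_gb_per_server = per_server(RACK_MEMORY_GB)    # 864 / 18
cores_per_server = per_server(RACK_CORES)            # 216 / 18
storage_tb_per_server = per_server(RACK_STORAGE_TB)  # 648 / 18
```

Under the even-split assumption this works out to 48 GB of memory, 12 cores and 36 TB of storage per server.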

32 Horizontal Scale Out Model
Scale out by connecting racks to each other using InfiniBand
InfiniBand Top of Rack
8 node cluster = over 5 petabytes
Same way to connect Exadata machines in your configuration
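The "over 5 petabytes" figure can be checked with simple arithmetic. The sketch below assumes that "8 node cluster" refers to eight full racks, each with the 648 TB of storage quoted on the hardware slide, and that the figure is raw capacity; none of this is spelled out in the presentation. The function names are hypothetical.

```python
RACK_STORAGE_TB = 648  # raw storage per full rack, from the hardware slide


def cluster_storage_tb(racks):
    """Total raw storage in TB for a given number of full racks."""
    return racks * RACK_STORAGE_TB


def tb_to_pb(terabytes, tb_per_pb=1000):
    """Convert TB to PB; pass tb_per_pb=1024 for binary units."""
    return terabytes / tb_per_pb


eight_rack_tb = cluster_storage_tb(8)
eight_rack_pb_decimal = tb_to_pb(eight_rack_tb)
eight_rack_pb_binary = tb_to_pb(eight_rack_tb, tb_per_pb=1024)
```

Eight racks give 5,184 TB, which exceeds 5 PB whether a petabyte is counted as 1,000 TB or 1,024 TB, so the reading of "node" as "rack" is at least consistent with the slide's claim.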

33 Cloudera Distribution Including Apache Hadoop
Fast evolution in critical features
Built by the Hadoop experts in the community
Practical instead of esoteric
Focus on what is needed for large clusters
Proven at very large scale
In production at all the large consumers of Hadoop
Extremely stable in those environments
Managed and Tested by Cloudera
Managed Open Source components
Contains a rich management GUI tool

34 Cloudera CDH3 Distribution Details
Apache Hadoop
Apache Hive
Apache Pig
Apache HBase
Apache Zookeeper
Apache Flume
Apache Sqoop
Apache Mahout
Apache Whirr
Apache Oozie
Fuse-DFS
Hue
Plus Cloudera Manager

35 Big Data Platform Summary
Big Data for the Enterprise
Optimized and Complete: Everything you need to store and integrate your lower information density data
Integrated with Oracle Exadata: Analyze all your data
Easy to Deploy: Risk Free, Quick Installation and Setup
Single Vendor Support: Full Oracle support for the entire system and software set
