Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 1.


1 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 1

2 Finding Gold in Your Data Warehouse: Oracle Advanced Analytics
Lei Yang (杨雷), Principal Consultant, Big Data Solution Center of Excellence
Lei.L.Yang@oracle.com

3 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

4 Program Agenda
• Big Data & Big Data Analytics
• Oracle Advanced Analytics
– Overview
– SQL in-Database Data Mining
– Integration with R
• Predictive Applications and Business Analytics
• Customer Success and Benchmarks
• Pointers & Summary

5 “Big Data” → “Big Data Analytics”
[Chart: gigabytes of data created (in billions), structured vs. unstructured data, 2005 to 2015. Source: IDC 2011. Content provided by Cloudera.]
• 1.8 trillion gigabytes of data was created in 2011
• More than 90% is unstructured data
• Approx. 500 quadrillion files
• Quantity doubles every 2 years
“There was 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing.” - Google CEO Eric Schmidt
Requires capability to rapidly:
• Collect and integrate data
• Understand data & their relationships
• Respond and take action

6 Advanced Analytics introduction

7 Oracle Big Data Platform
[Diagram: Stream → Acquire → Organize → Discover & Analyze]
• Oracle Big Data Appliance: optimized for Hadoop, R, and NoSQL processing
• Oracle Exadata: “System of Record”, optimized for DW/OLTP
• Oracle Exalytics: optimized for analytics & in-memory workloads
• Oracle Big Data Connectors
Component labels in the diagram: Hadoop, Oracle R Distribution, Oracle NoSQL Database, Applications, Oracle Big Data Connectors, Oracle Data Integrator, Oracle Database, Oracle Advanced Analytics, Data Warehouse, In-Database Analytics, ROracle for TimesTen, Oracle Enterprise Performance Management, Oracle Business Intelligence Applications, Oracle Business Intelligence Tools, Oracle Endeca Information Discovery

8 What is Advanced Analytics and Data Mining?
Automatically sifting through large amounts of data to find previously hidden patterns, discover valuable new insights and make predictions
• Identify most important factor (Attribute Importance)
• Predict customer behavior (Classification)
• Predict or estimate a value (Regression)
• Find profiles of targeted people or items (Decision Trees)
• Segment a population (Clustering)
• Find fraudulent or “rare events” (Anomaly Detection)
• Determine co-occurring items in “baskets” (Associations)

9 How does Advanced Analytics work?
[Scatter plot: transactions from a bank, # Transactions vs. $ Amount, with two groups marked Fraud(1) and Fraud(2)]
Typical fraud case: fraudsters are getting smarter and mimic normal usage:
(1) normal # of transactions with a higher $
(2) higher # of smaller $ transactions

10 How does Advanced Analytics work? Traditional Analytics Slicing
[Scatter plot: # Transactions vs. $ Amount, with bands of “Normal” Behavior on each axis, an Outlier, and two groups marked Fraud(1) and Fraud(2)]
Typical fraud case: fraudsters are getting smarter and mimic normal usage:
(1) normal # of transactions with a higher $
(2) higher # of smaller $ transactions
Traditional business sense and BI tools: looking at one variable at a time, unable to differentiate fraud transactions from non-fraud.

11 How does Advanced Analytics work? Applying Predictive Models
[Scatter plot: # Transactions vs. $ Amount, with 90%, 95% and 99% confidence levels drawn around non-fraud transactions and two groups marked Fraud(1) and Fraud(2). Example models: Linear Regression, Neural Net, Decision Trees]
Typical fraud case: fraudsters are getting smarter and mimic normal usage:
(1) normal # of transactions with a higher $
(2) higher # of smaller $ transactions
Traditional business sense and BI tools: looking at one variable at a time, unable to differentiate fraud transactions from non-fraud.
Advanced Analytics, multivariate analysis: a simple visualization of 2 variables and confidence intervals lets us identify transactions that are out of the “norm” in the bivariate space. Enterprise scenarios require thousands of variables to be analyzed at the same time, with automated selection done through powerful in-Database/in-HDFS algorithms for future scoring.
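The bivariate idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the algorithm Oracle Advanced Analytics uses: the account data are made up, the function name `mahalanobis_2d` is hypothetical, and the Mahalanobis distance is swapped in as one simple way to measure how far a point lies from the “norm” when both variables are considered together.

```python
import math
from statistics import mean

def mahalanobis_2d(points, query):
    """Distance of `query` from the centre of `points`, using both variables jointly."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points)
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = query[0] - mx, query[1] - my
    # d^2 = [dx dy] * inverse(covariance) * [dx dy]^T, written out for the 2x2 case.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Made-up "normal" accounts: (number of transactions, total $ amount).
normal = [(10, 100), (12, 125), (14, 135), (16, 165),
          (18, 175), (20, 205), (22, 215), (24, 245)]

typical = (18, 175)      # follows the usual count/amount relationship
suspicious = (12, 240)   # each value alone is within the normal range

d_typical = mahalanobis_2d(normal, typical)
d_suspicious = mahalanobis_2d(normal, suspicious)
print(f"typical: {d_typical:.2f}  suspicious: {d_suspicious:.2f}")
```

Looking at either variable alone, the suspicious account is unremarkable (12 transactions and $240 both fall inside the observed ranges). Only the joint distance separates it from the typical account, which is the point the slide makes about one-variable-at-a-time slicing.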

12 Oracle Advanced Analytics Overview

13 What is Oracle Advanced Analytics Option?
Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics
1. Powerful
– Combination of in-database data mining algorithms and open source R algorithms
– Accessible via SQL, PL/SQL, R and database APIs
– Scalable, parallel in-database execution
2. Easy to Use
– Range of GUI and IDE options for business users to data scientists
3. Enterprise-wide
– Integrated feature of the Oracle Database
– Seamless support for enterprise analytical applications and BI environments

14 Value Proposition: 1. Powerful
Fastest and most flexible analytic deployment options
• Scalable implementation of R programming language in-database
• Data preparation for analytics is automated
• Scalable distributed-parallel implementation of machine learning techniques in the database
• Data remains in the database
• Flexible interface options – SQL, R, IDE, GUI
Value Proposition
• Fastest time from “Data” to “Actionable Information”
• Fastest analytical development
• Fastest in-database scoring engine on the planet
• Flexible deployment options for analytics
• Lowest TCO by eliminating data duplication
• Secure, scalable and manageable
• Can import 3rd party models
[Diagram: Traditional Analytics (hours, days or weeks) moves data from Source Data through Data Extraction, Data Import, Data Prep & Transformation, Data Mining Model Building and Data Mining Model “Scoring” across a Dataset, Work Area, Analytic Process, Output and Target. Oracle Advanced Analytics (secs, mins or hours) performs Data Preparation, Model Building and Model “Scoring” with embedded data prep. The difference is marked “Savings”.]

15 Value Proposition: 2. Easy to Use
• Range of GUI and IDE options for business users to data scientists
• Oracle Data Miner/SQL Developer “Work flow” GUI
– Drag and drop paradigm
– Competitive with SAS/Enterprise Miner and SPSS Predictive Modeler
– Generates SQL Script code for immediate deployment
• R IDEs support
– Direct access to Oracle data
– Write R scripts and run them inside the database

16 Value Proposition: 3. Enterprise-wide
• Integrated feature of the Oracle Database
• Seamless support for enterprise analytical applications and BI environments
• Enables Pervasive Predictive Analytics

17 Oracle Advanced Analytics: Target Audiences
• “Information Consumers”
– CEOs, CMOs, CFOs, CIOs, VPs, Directors/Mgrs of lines of operations, etc.
• “Information Producers”
– Data analysts, Marketing analysts, Business Managers, Statisticians, Data Scientists, DBAs, Application Developers

18 Oracle Advanced Analytics: SQL in-Database Data Mining

19 Oracle Advanced Analytics: Oracle Data Miner GUI
SQL Developer 3.2 Extension, free OTN download
• Easy to Use
– Oracle Data Miner GUI for data analysts
– Explore data: discover new insights
– “Work flow” paradigm for analytical methodologies
• Powerful
– Multiple algorithms & data transformations
– Runs 100% in-DB
– Build, evaluate and apply data mining models
• Automate and Deploy
– Generate and deploy SQL scripts for automation
– Share analytical workflows

20 Oracle Advanced Analytics: SQL Data Mining Algorithms
Problem | Algorithms | Applicability
Classification | Logistic Regression (GLM); Decision Trees; Naïve Bayes; Support Vector Machine | Classical statistical technique; Popular / Rules / transparency; Embedded app; Wide / narrow data / text
Regression | Multiple Regression (GLM); Support Vector Machine | Classical statistical technique; Wide / narrow data / text
Anomaly Detection | One Class SVM | Lack examples of target field
Attribute Importance | Minimum Description Length (MDL) | Attribute reduction; Identify useful data; Reduce data noise
Association Rules | Apriori | Market basket analysis; Link analysis
Clustering | Hierarchical K-Means; Hierarchical O-Cluster | Product grouping; Text mining; Gene and protein analysis
Feature Extraction | Nonnegative Matrix Factorization | Text analysis; Feature reduction

21 Oracle Advanced Analytics: SQL Statistics and SQL Analytics
• Descriptive Statistics
– DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean, median, stats_mode, variance, standard deviation, quantile values, +/- n sigma values, top/bottom 5 values
• Correlations
– Pearson’s correlation coefficients, Spearman's and Kendall's (both nonparametric)
• Cross Tabs
– Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
• Hypothesis Testing
– Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
• Distribution Fitting
– Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential
• Ranking functions
– rank, dense_rank, cume_dist, percent_rank, ntile
• Window Aggregate functions (moving & cumulative)
– Avg, sum, min, max, count, variance, stddev, first_value, last_value
• LAG/LEAD functions
– Direct inter-row reference using offsets
• Reporting Aggregate functions
– Sum, avg, min, max, variance, stddev, count, ratio_to_report
• Statistical Aggregates
– Correlation, linear regression family, covariance
• Linear regression
– Fitting of an ordinary-least-squares regression line to a set of number pairs
– Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions
Note: Statistics and SQL Analytics are included in Oracle Database Standard Edition and Enterprise Edition
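To make the linear regression item concrete, here is a minimal Python sketch of how an ordinary-least-squares line over number pairs relates to population covariance and correlation. The helper names (`covar_pop`, `regr_line`, `corr`) are hypothetical Python functions that only mirror the arithmetic behind the SQL aggregates of similar name; they are not Oracle APIs, and the five data pairs are invented.

```python
import math
from statistics import mean

def covar_pop(ys, xs):
    """Population covariance of paired values."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def regr_line(ys, xs):
    """Ordinary-least-squares line y = intercept + slope * x."""
    slope = covar_pop(ys, xs) / covar_pop(xs, xs)   # covariance / variance of x
    intercept = mean(ys) - slope * mean(xs)
    return intercept, slope

def corr(ys, xs):
    """Pearson correlation coefficient."""
    return covar_pop(ys, xs) / math.sqrt(covar_pop(xs, xs) * covar_pop(ys, ys))

# Invented number pairs lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

intercept, slope = regr_line(ys, xs)
r = corr(ys, xs)
print(intercept, slope, r)
```

The slope is the covariance divided by the variance of x, which is why the regression functions are so often used alongside covariance and correlation.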

22 Oracle Advanced Analytics: Oracle Data Miner GUI Nodes (Partial List)
[Screenshot: node palettes for Tables and Views, Transformations, Explore Data, Modeling, and Text]

23 Insurance Example
Identify “Likely Insurance Buyers” and their profiles
OAA work flows capture the analytical process and generate SQL code for deployment

24 Oracle Advanced Analytics: Mining Unstructured Data – Text Mining
• Mines unstructured, i.e. “text”, data
• Include text and comments in models
• Cluster and classify documents
• Oracle Text used to preprocess unstructured text

25 Oracle Advanced Analytics: SQL Scoring on Data Mining Models
• On-the-fly, single record apply with new data (e.g. from call center)
    select prediction_probability(CLAS_DT_1_1, 'Yes'
             USING 7800 as bank_funds,
                   125 as checking_amount,
                   20 as credit_balance,
                   55 as age,
                   'Married' as marital_status,
                   250 as MONEY_MONTLY_OVERDRAWN,
                   1 as house_ownership)
    from dual;
[Screenshot: query result labeled “Likelihood to respond”]

26 Exadata + Data Mining 11g Release 2: Data Mining Model “Scoring” Pushed to Storage
• SQL predicates and OAA models are pushed to storage level for execution
• Faster: Exadata “smart scan” SQL function scoring
For example, find the US customers likely to churn:
    select cust_id
    from customers
    where region = 'US'
    and prediction_probability(churnmod, 'Y' using *) > 0.8;

27 Oracle Advanced Analytics: Integration with R

28 Oracle Advanced Analytics overview: What is R?
R is a statistical programming and analysis environment
• Powerful
• Extensible
• Graphical
• Extensive statistics
• OOTB functionality with many ‘knobs’ but smart defaults
• Ease of installation and use
Statisticians typically are…
• Not SQL literate
• Not familiar with DBA tasks

29 Oracle Advanced Analytics overview: Analysts need advanced graphical capabilities

30 Traditional R and Database Interaction
[Diagram: an R script with {CRAN packages}, run ad hoc or as a cron job, exchanges data with the database either through SQL over RODBC / RJDBC / ROracle or through flat files (extract / export, read, export, load)]
• Paradigm shift: R → SQL → R
• R memory limitation: data size, call-by-value
• R single threaded
• Access latency, backup, recovery, security…?
• Ad hoc script execution

31 Oracle R Enterprise
[Diagram: a client R script with {CRAN packages} connects through ROracle to Oracle Database, which holds an R Script Repository and its own {CRAN packages}]
• Transparency: R → SQL translation
• Statistics engine: in-database statistical functions
• Embedded R script execution: exec user R function via DB, with user-controlled data parallelism and XML output for R objects and graphs
• SQL script: exec R function from SQL, XML graph representation for web dashboards
• Eliminate client memory limitation
• Achieve database computation performance
• Leverage database parallelism

32 R IDE environments
Customers can choose the most comfortable R interface

33 Transparency through function overloading
Invoke in-database aggregation function
[Diagram: Client R Engine (Oracle R package, Transparency Layer) → Oracle Database (user tables, in-db stats)]
R code in the client R engine:
    aggdata <- aggregate(ONTIME_S$DEST, by = list(ONTIME_S$DEST), FUN = length)
    class(aggdata)
    head(aggdata)
SQL executed in Oracle Database:
    select DEST, count(*) from ONTIME_S group by DEST
R output:
    > class(aggdata)
    [1] "ore.frame"
    attr(,"package")
    [1] "OREbase"
    > head(aggdata)
      Group.1    x
    1     ABE  237
    2     ABI   34
    3     ABQ 1357
    4     ABY   10
    5     ACK    3
    6     ACT   33
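The effect of that translation, where the aggregation runs as SQL in the database and only the summary comes back to the client, can be imitated with Python's built-in `sqlite3` module. This is a rough sketch under loud assumptions: SQLite stands in for Oracle Database, the six rows are invented, and nothing here reproduces the Oracle R Enterprise transparency layer itself.

```python
import sqlite3
from collections import Counter

# An in-memory SQLite table standing in for the ONTIME_S table (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ONTIME_S (DEST TEXT, ARRDELAY REAL)")
rows = [("ABE", 5.0), ("ABE", -2.0), ("ABQ", 10.0),
        ("BOS", 0.0), ("BOS", 3.0), ("BOS", 7.5)]
conn.executemany("INSERT INTO ONTIME_S VALUES (?, ?)", rows)

# In-database aggregation: only one row per destination leaves the database.
in_db = dict(conn.execute(
    "SELECT DEST, COUNT(*) FROM ONTIME_S GROUP BY DEST").fetchall())

# Traditional approach: pull every row to the client, then count there.
client_side = Counter(dest for dest, _ in conn.execute(
    "SELECT DEST, ARRDELAY FROM ONTIME_S"))

print(in_db)
```

Both paths give the same counts. The difference is how much data crosses the connection, which is what matters once the table no longer fits in client memory.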

34 Embedded R Script Execution: ore.groupApply() – data parallel execution
[Diagram: (1) Client R Engine with Oracle R package, Transparency Layer and other R packages, (2) rq*Apply() interface, (3) Oracle Database, (4) user tables, (5) multiple DB R Engines on the Database Server Machine, started via extproc, each with Oracle R package, Transparency Layer and other R packages]
    modList <- ore.groupApply(
      X = ONTIME_S,
      INDEX = ONTIME_S$DEST,
      function(dat) {
        library(biglm)
        biglm(ARRDELAY ~ DISTANCE + DEPDELAY, dat)
      });
    modList_local <- ore.pull(modList)
    summary(modList_local$BOS)   ## return model BOS
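The group-apply pattern on this slide (partition the rows by a key, run the same function on each partition independently, collect one result per key) can be sketched in plain Python. The names `group_apply` and `fit_line` are hypothetical, the flight rows are invented, a thread pool stands in for the database-managed R engines, and the model is simplified to one predictor (arrival delay against departure delay) rather than the two-predictor `biglm` call above.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def fit_line(rows):
    """OLS fit of arrdelay ~ depdelay for one group; returns (intercept, slope)."""
    n = len(rows)
    mx = sum(r["depdelay"] for r in rows) / n
    my = sum(r["arrdelay"] for r in rows) / n
    sxx = sum((r["depdelay"] - mx) ** 2 for r in rows)
    sxy = sum((r["depdelay"] - mx) * (r["arrdelay"] - my) for r in rows)
    slope = sxy / sxx
    return my - slope * mx, slope

def group_apply(rows, key, func):
    """Partition rows by `key`, apply `func` to each partition, return {key: result}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # Each partition is independent, so the calls can run side by side.
    with ThreadPoolExecutor() as pool:
        results = pool.map(func, groups.values())
        return dict(zip(groups.keys(), results))

# Invented flights: BOS follows arrdelay = 5 + depdelay, SFO follows -2 + 2 * depdelay.
flights = [
    {"dest": "BOS", "depdelay": 0, "arrdelay": 5},
    {"dest": "BOS", "depdelay": 10, "arrdelay": 15},
    {"dest": "BOS", "depdelay": 20, "arrdelay": 25},
    {"dest": "SFO", "depdelay": 0, "arrdelay": -2},
    {"dest": "SFO", "depdelay": 10, "arrdelay": 18},
    {"dest": "SFO", "depdelay": 20, "arrdelay": 38},
]

models = group_apply(flights, "dest", fit_line)
print(models["BOS"])
```

As in the slide, the caller gets back one model per destination and can inspect a single group's model by key.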

35 Oracle R Enterprise: Embedded R Execution, SQL interface
rqEval – generate XML string for graphic output
• R script output is often dynamic, not conforming to a pre-defined structure
• R apps generate stats, new data, graphics
• Example
– Plot 100 random numbers
– Return a vector with values 1 to 10
– Return the results as XML
    begin
      sys.rqScriptCreate('Example6',
        'function(){
           res <- 1:10
           plot( 1:100, rnorm(100), pch = 21, bg = "red", cex = 2 )
           res
         }');
    end;
    /
    select value from table(rqEval(NULL,'XML','Example6'));

36 Oracle’s R Strategic Offerings
Deliver enterprise-level advanced analytics based on R environment
• Oracle R Distribution
– Free download, pre-installed on Oracle Big Data Appliance, bundled with Oracle Linux
– Enhanced linear algebra performance: Intel’s Math Kernel Library, AMD’s Core Math Library, SUN Solaris and IBM AIX
– Enterprise support for customers of Oracle R Enterprise, Big Data Appliance, and Oracle Linux
– Contribute bug fixes and enhancements to open source R
• Oracle R Enterprise
– Transparent access to database-resident data from R
– Embedded R script execution through database managed R engines
– Statistics engine
• Oracle R Connector for Hadoop (part of Oracle Big Data Connectors)
– R interface to Oracle Hadoop Cluster on BDA and non-Oracle Hadoop clusters
– Access and manipulate data in HDFS, database, and file system
– Write MapReduce functions using R and execute through natural R interface
• ROracle
– Open source Oracle database interface driver for R based on OCI, now also available for TimesTen data
– Maintainer is Oracle: rebuilt from the ground up, with many bug fixes and optimizations

37 New Features: Support Big Data Analytics
• Exadata storage tier scoring for R models (ORE)
– glm, glm.nb, hclust, kmeans, lm, multinom, nnet, randomForest, rpart
• Comprehensive sampling techniques (ORE)
– Simple, Systematic, Stratified, Cluster, Quota, Accidental
• A new ORE package for high performance in-database predictive algorithms (ORE)
• Support for HIVE as a first class SQL source (ORCH)
• A new native Hadoop analytics library (ORCH)
– Collaborative filtering techniques, Non negative matrix factorization for feature extraction, Linear regression, Neural networks, Mahout Analytics Library
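Three of the sampling techniques named above (simple, systematic, stratified) are easy to contrast in a short Python sketch. The function names, the 100 invented records and the 10% sampling rate are all assumptions made for illustration; this is not how ORE implements sampling.

```python
import random
from collections import defaultdict

def simple_sample(items, k, rng):
    """Simple random sample: every record has the same chance of selection."""
    return rng.sample(items, k)

def systematic_sample(items, step, start=0):
    """Systematic sample: every `step`-th record, beginning at `start`."""
    return items[start::step]

def stratified_sample(items, key, fraction, rng):
    """Stratified sample: draw the same fraction from each stratum."""
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Invented records: 80 US customers followed by 20 EU customers.
records = [{"id": i, "region": "US" if i < 80 else "EU"} for i in range(100)]
rng = random.Random(42)  # fixed seed so repeated runs draw the same sample

simple = simple_sample(records, 10, rng)
systematic = systematic_sample(records, step=10)
stratified = stratified_sample(records, lambda r: r["region"], 0.10, rng)
print(len(simple), len(systematic), len(stratified))
```

The stratified draw always contains 8 US and 2 EU records, preserving the 80/20 mix, whereas a simple random sample of 10 can over- or under-represent the smaller region.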

38 Oracle Advanced Analytics: Predictive Applications and Integration with Business Analytics

39 Oracle Advanced Analytics: Integration with Oracle Business Analytics solutions
• Interactive Dashboards
• Parameter Selection
• Streaming of graphical results
• Endeca integration through database scoring

40 Oracle Financial Services Analytic Applications architecture
[Diagram labels: Expanded Data Store; Expanded Compute; Real time query and response; Unstructured analytics; EXPLORE, powered by Oracle R Enterprise]

41 Business Analytics: Real Time Decisions, Oracle R Enterprise Integration
1- Dynamic ORE Script Injection
2- ORE Based Filtering Rules
3- R Model Real-Time Scoring

42 Enabling Predictive Applications: Example Applications Using Oracle Advanced Analytics
• Human Capital Management
– Predictive Workforce: employee turnover and performance prediction and “What if?” analysis
• CRM
– Sales Prediction Engine: prediction of sales opportunities, what to sell, amount, timing, etc.
• Supply Chain Management
– Spend Classification: real-time flagging of noncompliance and anomalies in expense submissions
• Identity Management
– Oracle Adaptive Access Manager: real-time security and fraud analytics
• Retail Analytics
– Oracle Retail Customer Analytics: “shopping cart analysis” and next best offers
• Customer Support
– Predictive Incident Monitoring (PIM) Customer Service offering for Database customers
• Manufacturing
– Response surface modeling in chip design
• Predictive capabilities in Oracle Industry Data Models
– Communications Data Model implements churn prediction, segmentation, profiling, etc.
– Retail Data Model implements loyalty and market basket analysis
– Airline Data Model implements analysis of frequent flyers, loyalty, etc.

43 Oracle Communications Industry Data Model
Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics
OAA’s clustering and predictions available in-DB for OBIEE

44 Integrated Business Intelligence
Integrate a range of in-DB SQL & R predictive analytics & graphics
• In-database construction of predictive models that predict customer behavior
• OBIEE’s integrated spatial mapping shows where the customers “most likely” to be HIGH and VERY HIGH value customers in the future are located

45 Fusion HCM Predictive Analytics
Built-in Predictive Analytics
Oracle Advanced Analytics factory-installed predictive analytics show employees likely to leave, top reasons, expected performance and real-time "What if?" analysis

46 Retail GBU Market Basket Analysis
Market Basket Analysis to identify co-occurring items found in “baskets” and potential product bundles
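What “co-occurring items” means in practice can be shown with a small Python sketch that counts item pairs across baskets and derives support and confidence. The five baskets are invented and the helper names are hypothetical. This is plain pair counting, not the Apriori algorithm that the deck lists for association rules.

```python
from collections import Counter
from itertools import combinations

# Invented shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer", "diapers"},
]

# How often each item, and each unordered pair of items, appears.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def support(a, b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)

def confidence(antecedent, consequent):
    """Fraction of baskets with `antecedent` that also contain `consequent`."""
    return pair_counts[tuple(sorted((antecedent, consequent)))] / item_counts[antecedent]

print(support("beer", "diapers"), confidence("beer", "diapers"))
```

In this toy data every basket with beer also contains diapers, so the rule "beer implies diapers" has confidence 1.0, while the reverse rule only reaches 0.75. That asymmetry is what makes such pairs candidates for product bundles.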

47 Customer Success and Benchmarks

48 Oracle Micro-processor: ORE vs. Open Source R
Bilinear model
method | R^2 | Number of Regressors | mean(rel_error) | Elapsed Time (seconds)
step | 0.966 | 86 | 3.52E-02 | 2,110
ore.stepwise | 0.997 | 124 | 3.50E-02 | 32.1
Performance difference: ore.stepwise is approx. 65X faster than step at similar R^2 and relative error.
Quadratic model
method | R^2 | Number of Regressors | mean(rel_error) | Elapsed Time (seconds)
step | 0.996 | 154 | 1.05E-02 | 12,600
ore.stepwise | 0.996 | 210 | 1.04E-02 | 69.5
Performance difference: ore.stepwise is approx. 180X faster than step at similar R^2 and relative error.
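The quoted speedups follow directly from the elapsed times on this slide. A quick check in Python, using only those four numbers (the variable names are just for illustration):

```python
# Elapsed times in seconds, taken from the slide.
bilinear_step, bilinear_ore = 2110.0, 32.1
quadratic_step, quadratic_ore = 12600.0, 69.5

# Speedup = time of open source step / time of ore.stepwise.
bilinear_speedup = bilinear_step / bilinear_ore
quadratic_speedup = quadratic_step / quadratic_ore

print(f"bilinear: {bilinear_speedup:.1f}x, quadratic: {quadratic_speedup:.1f}x")
```

The ratios come out a little above 65 and 181, consistent with the slide's rounded “approx. 65X” and “approx. 180X”.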

49 Oracle Advanced Analytics on an Exadata X2
From impossible to practical: instantaneous actions on customer interactions
• Find all households that bought a set of products from 5 billion POS records. OAA: 80 seconds
• Find influential factors that correlate with specific product purchase on 100 million households, then model testing across SVM, GLM and DT. OAA: 20 minutes
• Cluster 100 million households into 50 segments. OAA: 5 minutes
• Correlation analysis of 100 million households based on 50 measures. OAA: 12 minutes
• General Linear Model on 500 million observations. OAA: 2 minutes
Testing executed on the Exadata platform. Single workload performance on Exadata. Parallel models do not affect performance. Scoring is “blink of an eye” on billion-row datasets.

50 Learn More
Marcos.Arancibia@oracle.com
• Oracle Learning Library: www.oracle.com/oll/
– Oracle Data Mining 11g R2 OBE Series
– Oracle R Enterprise Tutorial Series
• Oracle Advanced Analytics: http://www.oracle.com/us/products/database/options/advanced-analytics/overview/index.html

51 Learn More
Free online education: www.coursera.org, www.udacity.com
• Machine Learning
• Artificial Intelligence
• Model Thinking
• Social Network Analysis
• Probabilistic Graphical Models
