

Copyright © 2012, SAS Institute Inc. All rights reserved.
ANALYTICS IN BIG DATA ERA
ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNTS OF DATA
MAURIZIO SALUSTI, SAS

Copyright © 2013, SAS Institute Inc. All rights reserved.
NEW QUESTIONS WITH BIG DATA
- Data are not always held in a structured data model.
- We often need to join data that do not share the same keys.
- Data often arrive as a periodic flow in real time.
- We often need to recognize patterns in data that change frequently.
New ways are needed to manage data that are distributed and not structured in the classical way: we need a different paradigm to organize data and, above all, to query them. Collecting several sources and managing them together opens several new problems:
- Relational data (graph data) can be useful for understanding how events spread in a population.
- Data in motion, coming from several tools in the field (sensor devices), provide dynamic patterns, often with no history of their form.

ANALYSIS
- Sampling cannot always be applied to extract data.
- Data cannot always be joined to define the ABT (analytical base table).
- Often you need to know how the environment can influence changes in events.
- Often we need to merge information collected in different time windows.
SQL queries are often of no use for reaching these data:
- Information is not organized into database structures.
- Data convey information in very different ways; text, for example, is not easy to query with traditional query languages.
- Merges are driven by fuzzy keys, where group information can be assigned according to statistical relationships.
- Events can be driven by relationships with other data rather than by a specific behavior.
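The idea of merging on fuzzy keys can be illustrated with a minimal Python sketch. It is not taken from the presentation: the similarity measure (the standard library's `difflib.SequenceMatcher`), the 0.8 threshold, and the sample names are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two lightly normalized keys."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def fuzzy_join(left, right, threshold=0.8):
    """Map each left key to its most similar right key, or None if below threshold."""
    matches = {}
    for lkey in left:
        best_key, best_score = None, 0.0
        for rkey in right:
            score = similarity(lkey, rkey)
            if score > best_score:
                best_key, best_score = rkey, score
        matches[lkey] = best_key if best_score >= threshold else None
    return matches


# Hypothetical key lists from two sources that share no exact keys.
customers = ["Acme Corp.", "Globex Inc", "Initech"]
suppliers = ["ACME Corp", "Globex Incorporated", "Umbrella"]
matches = fuzzy_join(customers, suppliers, threshold=0.8)
```

A lower threshold accepts more pairs at the cost of more false matches; in practice the cut-off would be tuned on records whose true match is known.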

BIG DATA ALSO REQUIRES SEVERAL METHODOLOGICAL STRATEGIES: SAS PROCEDURES
- Methods for pattern recognition from statistical inference analysis, using the SEMMA paradigm for supervised and unsupervised data patterns.
- Methods from stochastic process analysis, for both continuous time and discrete events, such as diffusion processes or Markov chain processes.
- Time series forecasting: stochastic processes in continuous time with continuous space.
- Multivariate analysis applied to semantic rules to discover text patterns.
- Graph analysis.
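As an illustration of the discrete-event case, the sketch below estimates the transition probabilities of a first-order Markov chain from an observed state sequence. It is plain Python rather than a SAS procedure, and the state names and the sequence are invented for the example.

```python
from collections import Counter, defaultdict


def transition_matrix(states):
    """Estimate P(next | current) from the consecutive pairs of a state sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


# Hypothetical sequence of device states observed over time.
observed = ["ok", "ok", "warn", "ok", "ok", "fail", "ok", "warn", "warn", "ok"]
P = transition_matrix(observed)
```

Each row of `P` sums to 1, so `P["ok"]` can be read as the estimated distribution of the next state given that the device is currently in state `ok`.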

ANALYTICAL CATEGORIES AND TARGET USAGE
- Text Mining: parsing large-scale text collections; extracting entities; automatic stemming and synonym detection.
- Data Mining: complex relationships; tree-based classification; variable selection.
- Optimization: local search optimization; large-scale linear and mixed integer problems; graph theory.
- Econometrics: probability of events; severity of random events.
- Forecasting: large-scale, multiple hierarchy problems.
- Statistics: binary target and continuous number predictions; linear, non-linear, and mixed linear modeling.

- Data coming from different sources can be tied together using methods such as linear or non-linear canonical decomposition.
- Pattern variability in data in motion, such as data coming from devices, can be sampled, or the pattern distribution can be simulated.
- Sparse vector data with missing values can be simulated using particular regression methods.
- Discrete choice among different events can be defined using multinomial discrete models.
- Automatic time series forecasting can consider many series at the same time.
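One common multinomial discrete model is the multinomial logit, in which the probability of each alternative is proportional to the exponential of its utility. The sketch below assumes that model and uses made-up utilities; the presentation does not say which multinomial model is meant.

```python
import math


def choice_probabilities(utilities):
    """Multinomial logit: P(k) = exp(u_k) / sum_j exp(u_j), computed stably."""
    shift = max(utilities.values())  # subtract the maximum to avoid overflow
    exp_u = {k: math.exp(u - shift) for k, u in utilities.items()}
    total = sum(exp_u.values())
    return {k: e / total for k, e in exp_u.items()}


# Hypothetical utilities for three possible events.
utilities = {"normal": 2.0, "degraded": 1.0, "failure": 0.0}
probs = choice_probabilities(utilities)
```

In a fitted model the utilities would be linear functions of explanatory variables, with coefficients estimated from data; here they are fixed numbers to keep the example short.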

GRAPH ANALYSIS
Graph analysis can be used to:
- Measure the importance of nodes and the relationships among them.
- Measure changes in a network over time.
- Identify how events spread through the network using a particular diffusion process.
Figure: network diagram showing nodes and links.
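Two of these uses can be sketched in a few lines of Python. The example measures node importance with degree centrality and models spreading with a deliberately simple deterministic process in which every reached node reaches all of its neighbors at the next step. Both choices, and the small edge list, are assumptions for illustration; the slide does not specify a centrality measure or a diffusion model.

```python
from collections import deque

# Hypothetical undirected network given as a list of links.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {node: (len(nbrs) / (n - 1) if n > 1 else 0.0) for node, nbrs in adj.items()}


def spread_steps(adj, seed):
    """Step at which each node is reached when an event spreads from `seed`."""
    steps = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in steps:
                steps[nbr] = steps[node] + 1
                queue.append(nbr)
    return steps


adj = build_adjacency(edges)
centrality = degree_centrality(adj)
steps = spread_steps(adj, "A")
```

A probabilistic diffusion model would replace the "reach every neighbor" rule with a per-link transmission probability; the breadth-first version above gives the fastest possible spread as a baseline.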

Scenario
REAL TIME MONITORING SYSTEM:
- Build and manage the behavioral patterns of the measures for each type of sensor, to detect abnormal processes through alarm rules (offline process).
- Build scenarios of how events spread and influence different parts of the system.
- Monitor measures to detect anomalies and to check the validity of the rules over time (online process).
- Produce models to predict abnormalities in the medium term.

Scenario
INTEGRATED PROCESS CONTROL:
- Shewhart-type control charts, with identification of the role of the history of the measures and of trend-cycle components according to the Box-Jenkins methodology.
- Multivariate analysis of processes: the main tool for statistical process control of measures in relation to each other, considering Markov chain or diffusion processes.
- Classification of system components: machines can be classified according to their behavior and to information about their specific characteristics.
- Identifying alarm patterns: diagnostic threshold rules identified by the control charts to minimize false alarms, depending on the history of the event to be monitored in real time.
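The basic Shewhart idea can be sketched as follows: control limits are set at the center line plus or minus three standard deviations, estimated from an in-control baseline, and new measures outside the limits raise an alarm. This sketch assumes the sample standard deviation of the baseline as the estimate of sigma and uses invented readings; it does not model the trend-cycle components or the Box-Jenkins step mentioned above.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower limit, center line, upper limit) from in-control data."""
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation (an assumption)
    return center - k * sigma, center, center + k * sigma


def out_of_control(values, lcl, ucl):
    """Indices of the values that fall outside the control limits."""
    return [i for i, x in enumerate(values) if x < lcl or x > ucl]


# Hypothetical baseline measures collected offline, then new measures to check.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
lcl, center, ucl = control_limits(baseline)
new_measures = [10.0, 10.3, 11.5, 9.9]
alarms = out_of_control(new_measures, lcl, ucl)
```

Widening `k` lowers the false-alarm rate but delays detection of small shifts, which is the trade-off the diagnostic threshold rules are meant to manage.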

ADMINISTRATION SYSTEM: EXAMPLE
Diagram components:
- Historical process data storage
- Measures metadata and classification
- Event process threshold management for the alert process
- Extraction rules DABT
- System interface
- Pattern recognition and event handling module

REAL TIME MONITORING SYSTEM: EXAMPLE
Diagram components:
- Real-time modelling
- Alert rules and pattern thresholds module, checked in real time
- Data streaming analysis and update of historical data
- Real-time feedback
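The loop of checking each incoming measure against a threshold and then updating the history can be sketched as below. The `StreamMonitor` class, the three-sigma rule, the warm-up length, and the readings are all hypothetical; the running mean and variance use Welford's online algorithm so that no past values need to be stored.

```python
import math


class StreamMonitor:
    """Checks each reading against a k-sigma threshold, then updates history."""

    def __init__(self, k=3.0, warmup=5):
        self.k = k            # alert threshold in standard deviations
        self.warmup = warmup  # readings to collect before alerting starts
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0         # running sum of squared deviations from the mean

    def _std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def observe(self, x):
        """Return True if x raises an alert; otherwise fold x into the history."""
        alert = False
        if self.n >= self.warmup:
            alert = abs(x - self.mean) > self.k * self._std()
        if not alert:
            # Welford's update of the running mean and squared deviations.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return alert


# Hypothetical sensor readings arriving one at a time.
monitor = StreamMonitor()
readings = [20.0, 20.5, 19.5, 20.2, 19.8, 20.1, 35.0, 20.3]
alerts = [monitor.observe(x) for x in readings]
```

Readings that trigger an alert are left out of the history here so that one anomaly does not widen the thresholds; whether to exclude them is a design choice that depends on how the alert rules are validated over time.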