Development Overview Authors: Eric Graubins Fermi National Accelerator Laboratory Batavia, Illinois.



Contents
Problems:
1. Price Time Series Display
2. Virtual Machine Monitoring
3. Price Time Series Prediction
Conclusion

Time Series Display
Problem: Display the price time series graphically.
Methodology: Input files contain records of the form DateTime, Price, InstanceType, Zone, for example:
T18:44: c3.2xlarge us-east-1b
T18:44: c3.2xlarge us-east-1b

Time Series Display (cont)
Approach:
Step 1: Transform the date and time to a Unix timestamp, for example: T18:44: transformed to
Step 2: Extract the namespace. The namespace is set to c3.2xlarge us-east-1b
Step 3: Extract the data field. For the data line T18:44: c3.2xlarge us-east-1b 999, the data field is 999
Results Failure Analysis Conclusion

Time Series Display (cont)
Step 4: Create the message. message ::=
Step 5: Transmit the message to the Graphite server. The netcat utility may be used.
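Steps 1 through 5 can be sketched in Python as follows. This is a minimal illustration, not the script used in the talk. It assumes a whitespace-separated record in the field order of the Step 3 example (date-time, instance type, zone, value), a full ISO-style date-time string interpreted as UTC, and Graphite's plaintext protocol (one "path value timestamp" line per metric, sent over TCP to port 2003 by default). The function names, the host default, and the choice to replace the dot in the instance type with an underscore are hypothetical.

```python
import socket
from datetime import datetime, timezone


def to_unix(stamp: str) -> int:
    """Step 1: convert 'YYYY-MM-DDTHH:MM:SS' (assumed to be UTC) to Unix seconds."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def build_message(line: str) -> str:
    """Steps 2-4: build one Graphite plaintext line, 'path value timestamp' plus newline."""
    # Assumed field order, following the Step 3 example: datetime, type, zone, value.
    stamp, instance_type, zone, value = line.split()
    # Graphite splits metric paths on dots, so the dot inside the instance
    # type is replaced here; this naming choice is an assumption.
    namespace = f"{instance_type.replace('.', '_')}.{zone}"
    return f"{namespace} {float(value)} {to_unix(stamp)}\n"


def send(message: str, host: str = "localhost", port: int = 2003) -> None:
    """Step 5: send the message over TCP, much as piping it to netcat would."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(message.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical input record; the real records are not preserved in the slides.
    print(build_message("1970-01-02T00:00:00 c3.2xlarge us-east-1b 999"), end="")
```

Keeping the message construction separate from the network call makes it easy to check the formatted lines before anything is sent to a server.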

Results: Time Series Display (Results)

Cloud Node VM Monitoring
Problem: A method to monitor CPU utilization was required.
Solution: A monitoring script, authored by Shiv.

Cloud Node VM Monitoring (cont)
Python based. Displays information for:
1. Minimum CPU utilization
2. Maximum CPU utilization
3. Average CPU utilization

Cloud Node VM Monitoring (cont)
Results: Output
u'Unit': 'Percent' u'Average': , u'Maximum': , u'Minimum': 0.0, u'Timestamp': datetime.datetime(2015, 7, 26, 23, 58, tzinfo=tz
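The monitoring script itself is not shown in the slides. As a rough sketch of how per-interval records like the one above could be reduced to the three reported figures, the following assumes each record is a dictionary with Minimum, Maximum, Average, and Timestamp keys (the keys visible in the output). The function name and the sample values are hypothetical.

```python
from datetime import datetime
from statistics import mean


def summarize(datapoints):
    """Collapse per-interval datapoints into overall minimum, maximum, and average CPU %."""
    if not datapoints:
        raise ValueError("no datapoints to summarize")
    return {
        "Minimum": min(p["Minimum"] for p in datapoints),
        "Maximum": max(p["Maximum"] for p in datapoints),
        # Unweighted mean of the interval averages; assumes equal-length intervals.
        "Average": mean(p["Average"] for p in datapoints),
    }


# Hypothetical values for illustration only; the real values are missing from the slides.
sample = [
    {"Timestamp": datetime(2015, 7, 26, 23, 58), "Minimum": 0.0,
     "Maximum": 40.0, "Average": 10.0, "Unit": "Percent"},
    {"Timestamp": datetime(2015, 7, 26, 23, 59), "Minimum": 5.0,
     "Maximum": 60.0, "Average": 20.0, "Unit": "Percent"},
]

if __name__ == "__main__":
    print(summarize(sample))
```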

Cloud Node VM Monitoring (cont)
Future Work
1. The data output will be used to feed graph displays in Graphite.

Price Prediction
Problem: Use predictive models to forecast price values. Take lessons from stock price prediction. Additionally, some believe that this is a non-deterministic system that cannot be predicted (e.g., like the weather).
"I can calculate the movement of the stars, but not the madness of men." Isaac Newton

Prior Work
In most cases, general approaches are discussed, but without measures of effectiveness.
Approaches to price prediction: Physics, Chaos Theory, Stored Patterns, Machine Learning

Data
Price data consists of a series of vertical price points. Example: :12:24, :22:07, :31:52, :41:37, ……
The algorithm selected was a neural network.

Methodology (cont)
The neural network algorithm was selected based on: Eric Graubins: Hybrid Voting Algorithms Using Selected Models for Categorical Data. IKE 2006. Documented success rate of 84.5%.
Data was restructured in horizontal format. For example:

Methodology
Data was formatted with horizontal orientation, as: , , , …,
Actual data file example: [values not preserved in the transcript]
Data contains 20 price data points.
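The restructuring from a vertical series to horizontal rows can be sketched as a sliding window. The slides do not say whether the rows overlap or which column, if any, serves as the prediction target, so the window width of 20 per row, the step of 1, and the function names below are assumptions made for illustration.

```python
def to_rows(prices, width=20, step=1):
    """Turn a vertical price series into rows of `width` consecutive prices."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    # Each row starts `step` positions after the previous one; a series
    # shorter than `width` yields no rows.
    return [prices[i:i + width] for i in range(0, len(prices) - width + 1, step)]


def to_csv_line(row):
    """Format one row as a comma-separated line with four decimal places."""
    return ",".join(f"{price:.4f}" for price in row)


if __name__ == "__main__":
    # Hypothetical price series for illustration only.
    series = [0.10 + 0.001 * i for i in range(22)]
    for row in to_rows(series):
        print(to_csv_line(row))
```

With step=1 a series of n prices yields n - 19 overlapping rows; setting step=20 would give non-overlapping rows instead.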

Methodology (cont)
Data: Training data consists of 500 rows. Out-of-sample test data consists of 30% of the training data size. Price prediction was performed on 174 rows.

Methodology (cont)
Evaluation was made by inserting the data into Excel. Out-of-sample test data consists of 30% of the training data size. The column titled Actual Price is the stored price. The difference is: ABS(Actual Price - Predicted Price). The difference value is used to measure success.

Methodology (Cont)

Price Prediction Results
Data rounded to hundredths, i.e.: $
From 174 predictions, 167 instances of difference == 0, or 96%.
Largest difference: .06
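The evaluation was done in Excel; the same measure can be sketched in Python. The example rounds both prices to hundredths, takes the absolute difference, and counts the rows where the difference is zero. The rounding mode (half up), the function names, and the sample prices are assumptions. The final line checks that 167 exact matches out of 174 rounds to the reported 96%.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(price):
    """Round a price to hundredths (half-up rounding is an assumption)."""
    return Decimal(str(price)).quantize(CENT, rounding=ROUND_HALF_UP)


def evaluate(actual, predicted):
    """Return (exact matches, total rows, largest absolute difference)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty price lists of equal length")
    # Same measure as the spreadsheet: ABS(Actual Price - Predicted Price).
    diffs = [abs(to_cents(a) - to_cents(p)) for a, p in zip(actual, predicted)]
    exact = sum(1 for d in diffs if d == 0)
    return exact, len(diffs), max(diffs)


if __name__ == "__main__":
    # Hypothetical prices for illustration only.
    exact, total, worst = evaluate([0.10, 0.25, 0.31], [0.101, 0.25, 0.37])
    print(f"{exact}/{total} exact, largest difference {worst}")
    # Success rate reported in the slides: 167 of 174 predictions.
    print(round(100 * 167 / 174))
```

Decimal arithmetic is used so that a comparison such as difference == 0 is not affected by binary floating-point error.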

Failure Analysis
Algorithms:
Neural Net: non-linear classification
C5.0 Decision Trees: built by multiple splitting and information gain; gives mediocre results: 71% success
C&R Decision Tree: built by binary splitting
Logistic Regression: based on probabilities; gives 75%
Data:
Training/actual data differences, measured by correlations
Actual data has greater entropy
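The slides do not state how entropy was measured. One common choice, shown here purely as an assumption, is the Shannon entropy of the distribution of distinct price values (for example, prices rounded to hundredths). The function name and sample data are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    if not values:
        raise ValueError("need at least one value")
    total = len(values)
    # Sum p * log2(1/p) over the relative frequency p of each distinct value.
    return sum((count / total) * math.log2(total / count)
               for count in Counter(values).values())


if __name__ == "__main__":
    # Hypothetical series: a constant series carries no entropy, a varied one more.
    print(shannon_entropy([0.10, 0.10, 0.10, 0.10]))
    print(shannon_entropy([0.10, 0.11, 0.12, 0.13]))
```

Under this measure, a data set whose prices take more distinct values with more even frequencies has higher entropy, which is one way actual data could differ from training data.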

Conclusions and Future Work
Data mining techniques appear promising.
Success rate surpasses price analysis results: 96% success in predicting prices.
Work to use specialized voting algorithms to improve effectiveness.
Test with bagging variants.
Possible to perform time series analysis.