Development Overview
Author: Eric Graubins
Fermi National Accelerator Laboratory, Batavia, Illinois
Contents
Problems:
1. Price Time Series Display
2. Virtual Machine Monitoring
3. Price Time Series Prediction
Conclusion
Time Series Display
Problem: Display price time series graphically.
Methodology: For input files of the form
    DateTime, Price, InstanceType Zone
for example:
    T18:44: c3.2xlarge us-east-1b
    T18:44: c3.2xlarge us-east-1b
Time Series Display (cont)
Approach:
Step 1: Transform the date-time to a Unix timestamp, for example: T18:44: transformed to
Step 2: Extract the namespace. Here the namespace is set to c3.2xlarge us-east-1b
Step 3: Extract the data field. For the data line T18:44: c3.2xlarge us-east-1b 999, the data field is 999
Time Series Display (cont)
Step 4: Create the message. message ::=
Step 5: Transmit the message to the Graphite server. The netcat utility may be used.
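Steps 1 through 5 can be sketched in Python as follows. This is a minimal sketch, not the script used in this work. It assumes a whitespace-separated input line of the form DateTime Price InstanceType Zone with an ISO 8601 date-time that carries a UTC offset, and it assumes the message follows Graphite's plaintext protocol (metric path, value, timestamp, newline; port 2003 by default). The function names, the sample line, and the replacement of "." with "_" inside the instance type (so that Graphite does not treat it as a path separator) are illustrative assumptions.

```python
import socket
from datetime import datetime


def to_unix_timestamp(date_time):
    """Step 1: convert an ISO 8601 date-time with a UTC offset to Unix time."""
    return int(datetime.fromisoformat(date_time).timestamp())


def build_message(line):
    """Steps 1-4: turn one input line into a Graphite plaintext message.

    Assumed (hypothetical) line layout: DateTime Price InstanceType Zone.
    """
    date_time, price, instance_type, zone = line.split()
    timestamp = to_unix_timestamp(date_time)                   # Step 1
    # Step 2: the namespace becomes the metric path. Dots separate path
    # levels in Graphite, so the dot inside "c3.2xlarge" is replaced.
    namespace = f"{instance_type.replace('.', '_')}.{zone}"
    value = float(price)                                       # Step 3
    return f"{namespace} {value} {timestamp}\n"                # Step 4


def send_message(message, host, port=2003):
    """Step 5: transmit the message over TCP, as netcat would."""
    with socket.create_connection((host, port)) as connection:
        connection.sendall(message.encode("ascii"))


# Illustrative input line; the price value is made up for the example.
example = build_message("2015-07-26T18:44:00+00:00 0.25 c3.2xlarge us-east-1b")
```

Calling `send_message(example, "graphite.example.org")` would deliver the line to a server at that placeholder host name. Piping the same text into netcat, pointed at the server's plaintext port, has the same effect.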
Time Series Display (Results)
Results: (figure not reproduced)
Cloud Node VM Monitoring
Problem: A method to monitor CPU utilization was required.
Solution: A monitoring script, authored by Shiv.
Cloud Node VM Monitoring (cont)
Python based
Displays information for:
1. Minimum CPU utilization
2. Maximum CPU utilization
3. Average CPU utilization
Cloud Node VM Monitoring (cont)
Results
Output:
    u'Unit': 'Percent' u'Average': , u'Maximum': , u'Minimum': 0.0, u'Timestamp': datetime.datetime(2015, 7, 26, 23, 58, tzinfo=tz
Cloud Node VM Monitoring (cont)
Future Work
1. The data output will be used to feed graph displays in Graphite.
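One way the monitoring output could be fed to Graphite is sketched below. It assumes each datapoint is a dictionary with the keys seen in the output above (Minimum, Maximum, Average, Timestamp) and that the timestamp is a timezone-aware datetime. The metric prefix `vm.cpu`, the function name, and the sample values are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone

STATISTICS = ("Minimum", "Maximum", "Average")


def datapoint_to_messages(datapoint, prefix="vm.cpu"):
    """Turn one monitoring datapoint into Graphite plaintext lines.

    One line is produced per statistic present in the datapoint, in the
    form "<prefix>.<statistic> <value> <unix timestamp>".
    """
    timestamp = int(datapoint["Timestamp"].timestamp())
    return [
        f"{prefix}.{name.lower()} {datapoint[name]} {timestamp}\n"
        for name in STATISTICS
        if name in datapoint
    ]


# Hypothetical datapoint: the values are illustrative, not measured results,
# and UTC is an assumption about the time zone.
sample = {
    "Unit": "Percent",
    "Minimum": 0.0,
    "Maximum": 12.5,
    "Average": 3.2,
    "Timestamp": datetime(2015, 7, 26, 23, 58, tzinfo=timezone.utc),
}
messages = datapoint_to_messages(sample)
```

Each resulting line can be transmitted to the Graphite server in the same way as the price messages.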
Price Prediction
Problem: Use predictive models to forecast price values.
Take lessons from stock price prediction.
Additionally, some believe that this is a non-deterministic system that cannot be predicted (e.g., like the weather).
"I can calculate the movement of the stars, but not the madness of men." (Isaac Newton)
Prior Work
In most cases, general approaches are discussed, but without measures of effectiveness.
Approaches to price prediction:
1. Physics
2. Chaos Theory
3. Stored patterns
4. Machine Learning
Data
Price data consists of a series of vertical price points (one per line).
Example: :12:24, :22:07, :31:52, :41:37, ……
The algorithm selected was a Neural Network.
Methodology (cont)
The algorithm selected was a Neural Network, based on:
Eric Graubins: Hybrid Voting Algorithms Using Selected Models for Categorical Data. IKE 2006
Documented success rate of 84.5%.
Data was restructured in horizontal format. For example:
Methodology
Data was formatted with horizontal orientation, as: , , , …,
Actual data file example: , , , …
Data contains 20 price data points.
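The restructuring from vertical to horizontal format can be illustrated with a sliding window. The slides do not say how rows were formed, so the sketch below assumes overlapping windows of 20 consecutive prices that advance one price at a time. The function name and the window step are assumptions, not the procedure used in this work.

```python
def to_horizontal(prices, width=20):
    """Reshape a vertical price series into rows of `width` consecutive prices.

    Row i holds prices[i] through prices[i + width - 1], so neighbouring
    rows overlap by width - 1 values. A series shorter than `width`
    yields no rows.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return [prices[i:i + width] for i in range(len(prices) - width + 1)]


# Illustrative series of 22 made-up values: 1, 2, ..., 22.
rows = to_horizontal(list(range(1, 23)), width=20)
```

With 22 input values and a width of 20, this produces three rows: values 1 to 20, 2 to 21, and 3 to 22.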
Methodology (cont)
Data
Training data consists of 500 rows.
Out-of-sample test data consists of 30% of the training data size.
Price prediction was performed on 174 rows.
Methodology (cont)
Evaluation was made by inserting the data into Excel.
The column titled Actual Price is the stored price.
The difference is: ABS(Actual Price - Predicted Price)
The difference value is used to measure success.
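The same evaluation can be expressed outside Excel. The sketch below assumes both prices are rounded to hundredths before comparison and that a difference of zero counts as a success. The function name, the dictionary keys, and the sample prices are illustrative only.

```python
def evaluate(actual, predicted):
    """Compare actual and predicted prices after rounding to hundredths.

    Prices are converted to whole cents so that the comparison is exact;
    the absolute difference mirrors ABS(Actual Price - Predicted Price).
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    differences = [
        abs(round(a * 100) - round(p * 100))  # difference in cents
        for a, p in zip(actual, predicted)
    ]
    exact = sum(1 for d in differences if d == 0)
    return {
        "predictions": len(differences),
        "exact_matches": exact,
        "success_rate": exact / len(differences),
        "largest_difference": max(differences) / 100,  # back to dollars
    }


# Made-up prices, for illustration only.
result = evaluate([0.25, 0.30, 0.41], [0.251, 0.30, 0.47])
```

For these sample values two of the three predictions match after rounding, and the largest difference is 0.06.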
Price Prediction Results
Data rounded to hundredths, i.e.: $
From 174 predictions, 167 instances had difference == 0, or 96%.
Largest difference: .06
Failure Analysis
Algorithms
1. Neural Net: non-linear classification
2. C5.0 Decision Trees: built by multiple splitting and information gain; gives mediocre results: 71% success
3. C&R Decision Tree: built by binary splitting
4. Logistic Regression: based on probabilities; gives 75%
Data
Training/actual data differences
Measured by correlations
Actual data has greater entropy
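The entropy comparison between training data and actual data could be made along the following lines. The slides do not state how entropy was measured, so this sketch assumes Shannon entropy over the distinct price values after rounding to hundredths. The function name and the sample series are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(prices):
    """Shannon entropy, in bits, of the distribution of rounded prices."""
    counts = Counter(round(price, 2) for price in prices)
    total = sum(counts.values())
    return sum(
        -(count / total) * math.log2(count / total)
        for count in counts.values()
    )


# Made-up series: a constant one and one with two equally likely values.
constant_entropy = shannon_entropy([0.25, 0.25, 0.25, 0.25])
mixed_entropy = shannon_entropy([0.25, 0.30, 0.25, 0.30])
```

A series that never changes has zero entropy, while one that splits evenly between two prices has one bit. A higher value for the actual data than for the training data would be consistent with the observation above.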
Conclusions and Future Work
Data mining techniques appear promising.
Success rate surpasses price analysis results.
96% success in predicting prices.
Future work:
1. Use specialized voting algorithms to improve effectiveness
2. Test with bagging variants
3. Possibly perform time series analysis