Development Overview Authors: Eric Graubins Fermi National Accelerator Laboratory Batavia, Illinois.


1 Development Overview Authors: Eric Graubins Fermi National Accelerator Laboratory Batavia, Illinois

2 Contents Problems: Price Time Series Display, Virtual Machine Monitoring, Price Time Series Prediction. Conclusion.

3 Time Series Display Problem: Display price time series graphically. Methodology: For input files of the form DateTime, Price, InstanceType, Zone:
2015-04-03T18:44:39.000000 0.064400 c3.2xlarge us-east-1b
2015-04-03T18:44:40.000000 0.064500 c3.2xlarge us-east-1b

4 Time Series Display (cont) Approach: Step 1: Transform the date and time to a Unix timestamp, for example: 2015-04-03T18:44:39.000000 transformed to 1437936645. Step 2: Extract the namespace. Namespace set to c3.2xlarge us-east-1b. Step 3: Extract the data field. For data line 2015-04-03T18:44:37.000000 0.064300 c3.2xlarge us-east-1b 999 the data field is 999.
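Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration rather than the author's script: the function name `parse_price_line` is hypothetical, the timestamps are assumed to be UTC (the slides do not say), and the data field is assumed to be the fifth column, as in the slide's sample line. Under the UTC assumption, 2015-04-03T18:44:39 maps to 1428086679, which differs from the value shown on the slide.

```python
from datetime import datetime, timezone


def parse_price_line(line):
    """Split one input line into (unix_timestamp, namespace, data_field)."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("expected DateTime, Price, InstanceType, Zone, Data")

    # Step 1: convert the date/time string to a Unix timestamp.
    # Assumption: the input time is UTC.
    moment = datetime.strptime(fields[0], "%Y-%m-%dT%H:%M:%S.%f")
    timestamp = int(moment.replace(tzinfo=timezone.utc).timestamp())

    # Step 2: the namespace is the instance type followed by the zone.
    namespace = f"{fields[2]} {fields[3]}"

    # Step 3: the data field is the fifth column (assumption from the sample).
    data_field = fields[4]
    return timestamp, namespace, data_field


sample = "2015-04-03T18:44:37.000000 0.064300 c3.2xlarge us-east-1b 999"
timestamp, namespace, data_field = parse_price_line(sample)
print(timestamp, namespace, data_field)
```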

5 Time Series Display (cont) Step 4: Create message. message::= Step 5: Transmit message to Graphite server. The netcat utility may be used.
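The message definition did not survive in the transcript. The sketch below assumes the message follows Graphite's plaintext protocol, which expects one `metric_path value timestamp` line per data point, terminated by a newline, normally on TCP port 2003. The function names, the default host, and the choice to replace the space in the namespace with a dot (metric paths cannot contain spaces) are illustrative assumptions. A plain socket is used here in place of netcat.

```python
import socket


def build_message(namespace, value, timestamp):
    """Format one data point as a Graphite plaintext-protocol line."""
    # Assumption: spaces in the namespace become dots so the path is valid.
    path = namespace.replace(" ", ".")
    return f"{path} {value} {timestamp}\n"


def send_to_graphite(message, host="localhost", port=2003):
    """Send one message to a Carbon plaintext listener (host is hypothetical)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(message.encode("ascii"))


message = build_message("c3.2xlarge us-east-1b", "999", 1428086677)
print(message, end="")
# send_to_graphite(message)  # uncomment when a Graphite server is reachable
```

The equivalent netcat call pipes the same line to the server, for example `echo "c3.2xlarge.us-east-1b 999 1428086677" | nc localhost 2003`, where the host name is again an assumption.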

6 Time Series Display (Results) Results:

7 Cloud Node VM Monitoring Problem: A method to monitor CPU utilization was required. Solution: A monitoring script, authored by Shiv.

8 Cloud Node VM Monitoring (cont) Python based. Displays information for: 1. Minimum CPU utilization 2. Maximum CPU utilization 3. Average CPU utilization

9 Cloud Node VM Monitoring (cont) Results Output: u'Unit': 'Percent', u'Average': 0.025499999999999998, u'Maximum': 0.17000000000000001, u'Minimum': 0.0, u'Timestamp': datetime.datetime(2015, 7, 26, 23, 58, tzinfo=tz

10 Cloud Node VM Monitoring (cont) Future Work 1. The data output will be used to feed graph displays in Graphite
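One way the monitoring output could feed Graphite is sketched below. It is not part of the monitoring script: the function name `datapoint_to_lines` and the metric prefix `cloud.vm.cpu` are hypothetical, and the sample timestamp is assumed to be UTC because the `tzinfo` value is cut off in the transcript.

```python
from datetime import datetime, timezone


def datapoint_to_lines(datapoint, prefix="cloud.vm.cpu"):
    """Turn one CPU datapoint dict into three Graphite plaintext lines."""
    timestamp = int(datapoint["Timestamp"].timestamp())
    lines = []
    for stat in ("Minimum", "Maximum", "Average"):
        # One metric per statistic, e.g. cloud.vm.cpu.minimum
        lines.append(f"{prefix}.{stat.lower()} {datapoint[stat]} {timestamp}")
    return lines


# Sample shaped like the output shown on slide 9 (time zone is an assumption).
sample = {
    "Unit": "Percent",
    "Average": 0.0255,
    "Maximum": 0.17,
    "Minimum": 0.0,
    "Timestamp": datetime(2015, 7, 26, 23, 58, tzinfo=timezone.utc),
}

for line in datapoint_to_lines(sample):
    print(line)
```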

11 Price Prediction Problem: Use predictive models to forecast price values. Take lessons from stock price prediction. Some believe that this is a non-deterministic system and cannot be predicted (like the weather). "I can calculate the movement of the stars, but not the madness of men." Isaac Newton, 1721

12 Prior Work In most cases, general approaches are discussed, but without measures of effectiveness. Approaches to price prediction: Physics, Chaos Theory, Stored Patterns, Machine Learning.

13 Data Price data consists of a series of price points in vertical format, one per line. Example:
2015-04-05 09:12:24, 0.040800
2015-04-05 09:22:07, 0.040900
2015-04-05 09:31:52, 0.040800
2015-04-05 09:41:37, 0.040900
…
Algorithm selected was a neural network.

14 Methodology (cont) Algorithm selected was a neural network, based on: Eric Graubins: Hybrid Voting Algorithms Using Selected Models for Categorical Data. IKE 2006: 343-348. Documented success rate of 84.5%. Data restructured in horizontal format. For example:

15 Methodology Data was formatted with horizontal orientation, as:,,, …, Actual data file example:
0.032900,0.032900,0.032700,0.032700,0.032800,0.032800,0.032700,0.032700,0.033000,0.033000,0.032700,0.032700,0.032900,0.032900,0.032800,0.032800,0.032900,0.032900,0.033000,0.033000
0.032900,0.032900,0.033200,0.033200,0.033100,0.033100,0.032800,0.032800,0.032900,0.032900,0.033000,0.033000,0.032900,0.032900,0.032800,0.032800,0.032900,0.032900,0.032700,0.032700
0.032800,0.032800,0.032600,0.032600,0.032700,0.032700,0.032600,0.032600,0.032800,0.032800,0.032700,0.032700,0.080000,0.080000,0.065700,0.065700,0.060000,0.060000,0.043700,0.043700
Each row contains 20 price data points.
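The vertical-to-horizontal restructuring can be sketched as below. The slides do not say how consecutive rows were cut from the series, so the sliding window and its `step` parameter are assumptions, as is the function name `to_horizontal`. Setting `step` equal to `width` gives non-overlapping rows.

```python
def to_horizontal(prices, width=20, step=1):
    """Turn a vertical price series into comma-separated rows of `width` prices."""
    rows = []
    # Slide a window over the series; the windowing scheme is an assumption.
    for start in range(0, len(prices) - width + 1, step):
        window = prices[start:start + width]
        rows.append(",".join(f"{price:.6f}" for price in window))
    return rows


# Small demonstration with width=3 instead of the 20 points used in the slides.
series = [0.0408, 0.0409, 0.0408, 0.0409, 0.0410]
for row in to_horizontal(series, width=3):
    print(row)
```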

16 Methodology (cont) Data: Training data consists of 500 rows. Out-of-sample test data consists of 30% of training data size. Price prediction was performed on 174 rows.

17 Methodology (cont) Evaluation was made by inserting data into Excel. Out-of-sample test data consists of 30% of training data size. The column titled Actual Price is the stored price. The difference is: ABS(Actual Price - Predicted Price). The difference value is used to measure success.

18 Methodology (Cont)

19 Price Prediction Results Data rounded to hundredths, e.g.: $123.45. From 174 predictions, 167 instances of difference == 0, or 96%. Largest difference: 0.06.
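The evaluation was done in Excel. The sketch below reproduces the same arithmetic in Python under the assumption that both prices are rounded to hundredths before the absolute difference is taken. The function name `evaluate` and the sample prices are illustrative and are not the study's data.

```python
def evaluate(actual, predicted):
    """Return (success_rate, largest_difference) for paired price lists."""
    hits = 0
    largest_cents = 0
    for a, p in zip(actual, predicted):
        # Round each price to hundredths, working in whole cents to avoid
        # floating-point noise, then take ABS(Actual Price - Predicted Price).
        diff_cents = abs(round(a * 100) - round(p * 100))
        if diff_cents == 0:
            hits += 1
        largest_cents = max(largest_cents, diff_cents)
    return hits / len(actual), largest_cents / 100


rate, largest = evaluate([0.0329, 0.0327, 0.0800], [0.0330, 0.0328, 0.0329])
print(f"success rate: {rate:.0%}, largest difference: {largest}")

# The reported figure: 167 exact matches out of 174 predictions.
print(f"{167 / 174:.1%}")
```

The last line confirms the rounding on the slide: 167 / 174 is about 95.98%, reported as 96%.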

20 Failure Analysis Algorithms: Neural net: non-linear classification. C5.0 decision trees: built by multiple splitting and information gain; gives mediocre results: 71% success. C&R decision tree: built by binary splitting. Logistic regression: based on probabilities; gives 75%. Data: training/actual data differences, measured by correlations. Actual data has greater entropy.
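The slides do not state how entropy was measured. One possible reading, sketched below, is the Shannon entropy of the distribution of distinct price values in a data set; the function name `shannon_entropy` and the rounding to four decimal places are assumptions. A series that visits more distinct prices, more evenly, scores higher.

```python
import math
from collections import Counter


def shannon_entropy(prices, digits=4):
    """Shannon entropy, in bits, of the distinct price values in a series."""
    # Treat prices that agree to `digits` decimal places as the same symbol.
    counts = Counter(round(price, digits) for price in prices)
    total = len(prices)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


steady = [0.0329, 0.0329, 0.0329, 0.0329]
varied = [0.0327, 0.0327, 0.0329, 0.0329]
print(shannon_entropy(steady), shannon_entropy(varied))
```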

21 Conclusions and Future Work Data mining techniques appear promising. Success rate surpasses price analysis results: 96% success in predicting prices. Work to use specialized voting algorithms to improve effectiveness. Test with bagging variants. Possible to perform time series analysis.

