Xiao Liu, Jinjun Chen, Ke Liu, Yun Yang CS3: Centre for Complex Software Systems and Services Swinburne University of Technology, Melbourne, Australia.

Presentation transcript:

Xiao Liu, Jinjun Chen, Ke Liu, Yun Yang
CS3: Centre for Complex Software Systems and Services
Swinburne University of Technology, Melbourne, Australia
{xliu, jchen, kliu,
Forecasting Duration Intervals of Scientific Workflow Activities based on Time-Series Patterns

X. Liu, J. Chen, K. Liu and Y. Yang, Time-Series Patterns (eScience08), Indianapolis USA, 12 December 2008

Content
- Introduction
  - Time-Series Forecasting
  - Time-Series Patterns
- Forecasting Duration Intervals of Scientific Workflow Activities based on Time-Series Patterns
  - Motivation
  - The Pattern Game
  - Evaluation
- Conclusion

Time Series Forecasting
A time series is a set of observations made sequentially through time.
- Marketing time series
- Temperature time series
- System performance time series
Time-series forecasting predicts the likely outcome of the time series in the near future, given knowledge of the most recent outcomes.
- CPU load, network load, activity durations
What is this time series about? Care to take a guess?
AUD/USD (1 day in 1 year): from

Time Series Forecasting
It was on the rise, but who knows the crises #%#&…
Homer Simpson’s forecasting line

Time Series Pattern
A pattern is a type of theme of recurring events or objects which repeats in a predictable manner.
Time-series patterns can be regarded as a set of time-series segments which recur in a statistical sense.

Where Are We
- Introduction
  - Time-Series Forecasting
  - Time-Series Patterns
- Forecasting Duration Intervals of Scientific Workflow Activities based on Time-Series Patterns
  - Motivation
  - Pattern Based Time-Series Forecasting Strategy
  - Evaluation
- Conclusion

Motivation
Scientific workflow activity durations are important for scientific workflow scheduling, temporal verification and many other time-related QoS functionalities.
- A duration runs from the initial job submission to the final completion, comprising the execution time and vast scientific workflow overheads: data transfer overheads, middleware overheads, loss of parallelism overheads, etc.*
- Dynamic performance of underlying infrastructures, e.g. grid computing, peer-to-peer, cloud computing…
* R. Prodan and T. Fahringer, Analysis of Scientific Workflow Overheads in Grid Environments, TPDS, 2008

Problems
Current work mainly utilises linear time-series models, such as MA (Moving Average), AR (Autoregressive), Box-Jenkins…
- Focusing on CPU load prediction for the execution time of computation-intensive activities
  - Data-intensive activities? Many other overheads?
- Forecasting point values
  - Duration intervals are more applicable in practice
- Requiring large sample size
  - Difficult for scientific workflow activities with constrained concurrent instances and long-term durations
- Frequent turning points
  - Significantly deteriorate the effectiveness of linear time-series models


Duration-Series Patterns

Strategy Overview
Pattern based time-series forecasting strategy:
- Duration series building
  - A periodical sampling plan to increase the sample size
- Duration pattern recognition
  - A non-linear time-series segmentation algorithm to identify the potential pattern set
  - Checking validity to obtain the final pattern set
- Duration pattern matching
  - Similarity search for the closest pattern given the latest duration sequence
- Duration interval forecasting
  - Duration interval forecasting based on the statistics of the matched duration pattern

Step 1: Duration Series Building
Periodical sampling: to address the problem of limited sample size, samples whose submission times belong to the same observation time unit of each period are joined together. A representative duration series is then built from the sample mean of each unit.
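The periodical sampling idea can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_duration_series`, the representation of samples as `(submission_time, duration)` pairs in seconds, and the use of `None` for units without samples are all hypothetical and do not come from the slides.

```python
from collections import defaultdict
from statistics import mean

def build_duration_series(samples, period_seconds, unit_seconds):
    """Pool samples by observation unit across periods and average each unit.

    samples: iterable of (submission_time_seconds, duration) pairs (assumed format).
    Returns one value per observation unit of a period; None if a unit is empty.
    """
    units_per_period = period_seconds // unit_seconds
    pooled = defaultdict(list)
    for submitted_at, duration in samples:
        offset = submitted_at % period_seconds          # position inside its own period
        pooled[int(offset // unit_seconds)].append(duration)
    # The representative series uses the sample mean of each unit.
    return [mean(pooled[u]) if pooled[u] else None for u in range(units_per_period)]

# Hourly period split into four 15-minute units; three samples land in unit 0.
series = build_duration_series(
    [(0, 10.0), (100, 20.0), (3650, 30.0), (1000, 5.0)],
    period_seconds=3600,
    unit_seconds=900,
)
```

Pooling across periods is what enlarges the effective sample size of each unit, even when a single period contains only a few activity instances.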

Step 2: Pattern Recognition
Discovering the potential pattern set
- K-MaxSDev time-series segmentation algorithm
K-MaxSDev: a hybrid time-series segmentation algorithm based on Bottom-Up, Sliding Windows and Top-Down
- K: the initial value for equal segmentation
- MaxSDev (Maximum Standard Deviation): the testing criterion for time-series segmentation
- K and MaxSDev can be specified with empirical functions provided in the paper (Formula 1 and Formula 2)

K-MaxSDev: Bottom-Up Process
Initial K equal segmentation
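A minimal sketch of the initial equal segmentation, assuming the series is a plain Python list. The helper name `equal_segments` and the way leftover points are spread when the length is not divisible by K are assumptions, not details given in the slides.

```python
def equal_segments(series, k):
    """Split a series into k contiguous segments of (nearly) equal length."""
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(series)")
    # Integer boundaries i*n//k are strictly increasing when k <= n,
    # so every segment is non-empty.
    bounds = [i * n // k for i in range(k + 1)]
    return [series[bounds[i]:bounds[i + 1]] for i in range(k)]

segments = equal_segments(list(range(12)), 3)
```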

K-MaxSDev: Sliding Window Process
Sliding Window to merge neighbouring segments

K-MaxSDev: Sliding Window Process
Testing the standard deviation of the new segment (SDev) against MaxSDev

K-MaxSDev: Sliding Window Process
If SDev ≥ MaxSDev, the test fails and the segments stay separated.

K-MaxSDev: Sliding Window Process
If SDev < MaxSDev, the test succeeds and the segments merge to form a larger segment.
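The merge test of the Sliding Window process can be illustrated with one greedy left-to-right pass. This sketch is not the paper's full K-MaxSDev algorithm: the function name `merge_neighbours`, the greedy pass order, and the use of the population standard deviation (`statistics.pstdev`) are assumptions made for illustration.

```python
from statistics import pstdev

def merge_neighbours(segments, max_sdev):
    """Merge each segment into its left neighbour when the merged SDev < max_sdev."""
    if not segments:
        return []
    merged = [list(segments[0])]
    for seg in segments[1:]:
        candidate = merged[-1] + list(seg)
        if pstdev(candidate) < max_sdev:
            merged[-1] = candidate          # test successful: form a larger segment
        else:
            merged.append(list(seg))        # test failed: stay separated
    return merged

result = merge_neighbours([[10, 11], [10, 12], [30, 31], [29, 30]], max_sdev=2.24)
```

In this example the first two segments merge, the jump from about 10 to about 30 fails the test, and the last two segments merge with each other.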

K-MaxSDev: Top-Down Process
After the Sliding Window process, split those segments which cannot be merged with any neighbours.

K-MaxSDev: Iteration
Repeat the Sliding Window and Top-Down processes until no segment can be merged with its neighbouring segments.

Pattern Validation
Validating the final segments with Min_pattern_length to ensure their statistical effectiveness. A segment that fails is marked ‘invalid’; otherwise it is marked ‘valid’.
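A small sketch of the validation step. It assumes that a segment passes when its length is at least Min_pattern_length; the slides name the parameter but do not spell out the comparison, so that reading, like the function name `validate_patterns`, is an assumption.

```python
def validate_patterns(segments, min_pattern_length):
    """Label each final segment 'valid' or 'invalid' by its length (assumed criterion)."""
    return [
        (seg, "valid" if len(seg) >= min_pattern_length else "invalid")
        for seg in segments
    ]

labels = validate_patterns([[10, 11, 10, 12], [20], [30, 31, 29]], min_pattern_length=3)
```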

Turning Points Discovery
Turning points are specified as either the mean of an invalid pattern or the first value of the next valid pattern.
K-MaxSDev ensures that violations of MaxSDev only occur on the edge of two adjacent segments, where the deviations exceed the MaxSDev threshold.

Step 3: Pattern Matching
The latest duration sequence, with standard deviation SDev and mean Mean, can be classified into three types:
- Type 1: SDev > MaxSDev
  - Cannot match any valid pattern and must contain at least one turning point
  - First locate the turning points and then conduct pattern matching
- If SDev < MaxSDev, search for the matched pattern based on Mean. The matched pattern has standard deviation PSDev and mean PMean.
  - Type 2: SDev ≥ PSDev
  - Type 3: SDev < PSDev
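The three-way classification can be sketched as below. Several details are assumptions rather than statements from the slides: patterns are represented as `(PMean, PSDev)` tuples, "matched based on Mean" is read as choosing the pattern whose mean is closest to the sequence mean, the boundary case SDev = MaxSDev is treated as Type 1, and the population standard deviation is used.

```python
from statistics import mean, pstdev

def classify_sequence(sequence, patterns, max_sdev):
    """Return (type, matched_pattern) for the latest duration sequence.

    patterns: list of (p_mean, p_sdev) tuples for the valid patterns (assumed format).
    """
    if not patterns:
        raise ValueError("at least one valid pattern is required")
    sdev = pstdev(sequence)
    if sdev >= max_sdev:
        return 1, None                      # contains a turning point, no direct match
    seq_mean = mean(sequence)
    # Assumed similarity search: nearest pattern mean.
    matched = min(patterns, key=lambda p: abs(p[0] - seq_mean))
    return (2 if sdev >= matched[1] else 3), matched

patterns = [(10.0, 1.0), (30.0, 0.5)]
type_1 = classify_sequence([10, 30, 12], patterns, 2.24)
type_2 = classify_sequence([9, 12, 10], patterns, 2.24)
type_3 = classify_sequence([29.8, 30.1, 30.0], patterns, 2.24)
```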

Step 4: Interval Forecasting
The user-specified confidence value is α% with λ probability percentile, the predicted mean of the next value is M and its standard deviation is S. The interval of the next value is predicted to be (M - λS, M + λS).
- For Type 2: PSDev ≤ SDev < MaxSDev
  - The next value of the sequence will probably be a turning point since it is on the edge of two different patterns. The value of the turning point is TP.
  - M = TP, S = MaxSDev
- For Type 3: SDev < PSDev
  - The next value of the sequence can be predicted with the statistical features of the matched pattern
  - M = PMean, S = PSDev
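The interval rule above translates directly into a few lines of code. In this sketch the function name `forecast_interval` and its parameter names are hypothetical; λ is passed in as `lam` rather than derived from α, since the slides do not state how the percentile is obtained.

```python
def forecast_interval(seq_type, lam, max_sdev, turning_point=None, pattern=None):
    """Return (M - lam*S, M + lam*S) for a Type 2 or Type 3 sequence.

    pattern: (PMean, PSDev) of the matched pattern, needed for Type 3.
    turning_point: TP, the expected turning point value, needed for Type 2.
    """
    if seq_type == 2:
        m, s = turning_point, max_sdev      # M = TP, S = MaxSDev
    elif seq_type == 3:
        m, s = pattern                      # M = PMean, S = PSDev
    else:
        raise ValueError("Type 1 sequences need turning points located first")
    return (m - lam * s, m + lam * s)

type_3_interval = forecast_interval(3, lam=2.0, max_sdev=2.24, pattern=(30.0, 0.5))
type_2_interval = forecast_interval(2, lam=1.5, max_sdev=2.0, turning_point=20.0)
```

Type 2 intervals are wider than Type 3 intervals for the same λ because MaxSDev bounds PSDev from above, which reflects the extra uncertainty near a turning point.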


Simulation Environment
SwinDeW-G: a peer-to-peer based grid workflow system running on the SwinGrid (Swinburne service Grid) platform

Duration Series Building
Sample: 15 duration-series, length 8 hours (8:00am~8:00pm), one observation unit every 15 mins.
Parameters: K=12, MaxSDev=2.24, Min_Pattern_Length=3

Duration Series Building

Pattern Recognition

Pattern Validation and Turning Points Discovery

Forecasting Performance
Testing on 30 duration sequences with random lengths of 3 to 5.
Figure: Predicted Duration Intervals

Comparison of Prediction Errors
- MEAN: use the mean value of the duration sequence as the prediction
- LAST: use the last value of the duration sequence as the prediction
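The two baseline predictors are simple enough to write down directly. The slides do not say which error measure was used in the comparison, so the `absolute_error` helper below is only an assumed stand-in, and all function names are hypothetical.

```python
from statistics import mean

def predict_mean(sequence):
    """MEAN baseline: predict the mean of the latest duration sequence."""
    return mean(sequence)

def predict_last(sequence):
    """LAST baseline: predict the last observed value."""
    return sequence[-1]

def absolute_error(predicted, actual):
    """Assumed error measure for illustration only."""
    return abs(predicted - actual)

sequence, actual_next = [3, 4, 5], 6
mean_error = absolute_error(predict_mean(sequence), actual_next)
last_error = absolute_error(predict_last(sequence), actual_next)
```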


Conclusion
- Scientific workflow activity durations are much more complicated than those of conventional computation tasks
- Conventional linear time-series forecasting models suffer from limited sample size and frequent turning points
- Time-series pattern based forecasting strategy
  - Duration series building
  - Duration pattern recognition and turning point discovery
  - Duration pattern matching
  - Duration interval forecasting
- Our strategy is more scalable with sample size and robust with turning points

The End
Thanks! Any Questions?