Outline: time series prediction, finding the k-nearest neighbors, lag selection, and the weighted LS-SVM.

Time series prediction Suppose we have a univariate time series x(t) for t = 1, 2, …, N, and we want to predict the value of x(N + p). If p = 1, this is called one-step prediction; if p > 1, it is called multi-step prediction.

Flowchart

Find k-nearest neighbors Assume the current time index is 20. First we reconstruct the query vector from the most recent observations; then we compute the distance between this query and each delay vector built from the historical data.

Find k-nearest neighbors If k = 3 and the three closest neighbors end at t = 14, 15, and 16, we can construct a smaller data set from these neighbors.
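As an illustration of this neighbor search, here is a minimal sketch; the function name knn_neighbors, the Euclidean distance, and the embedding dimension are assumptions, not the slides' own code:

```python
# A minimal sketch of the neighbor search: build delay vectors from a
# univariate series, take the most recent one as the query, and keep the
# k historical vectors closest to it.
import numpy as np

def knn_neighbors(x, d=3, k=3):
    """Return the time indices (1-based) of the k delay vectors closest to the query.

    x : 1-D array with the time series x(1), ..., x(N)
    d : embedding dimension (number of lags in each delay vector)
    k : number of neighbors to keep
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Delay vectors [x(t-d+1), ..., x(t)] for t = d, ..., N (0-based slicing below).
    vectors = np.array([x[t - d:t] for t in range(d, N + 1)])
    query = vectors[-1]                    # the most recent delay vector
    history = vectors[:-1]                 # candidate neighbors from the past
    dist = np.linalg.norm(history - query, axis=1)   # Euclidean distances
    nearest = np.argsort(dist)[:k]         # positions of the k smallest distances
    return nearest + d                     # convert positions back to time indices

# Example: with 20 observations, the query is built from the latest values and
# the function returns the ending times of the 3 closest past windows.
series = np.sin(0.3 * np.arange(1, 21))
print(knn_neighbors(series, d=3, k=3))
```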

Flowchart

Lag selection Lag selection is the process of selecting a subset of relevant lagged values for use in model construction. Why do we need lag selection? It is analogous to feature selection, not feature extraction.

Lag selection Lag selection methods can usually be divided into two broad classes: filter methods and wrapper methods. The lag subset is chosen by an evaluation criterion that measures the relationship of each subset of lags with the target output.

Wrapper method The best lag subset is selected according to the prediction model itself, so lag selection becomes part of the learning.

Filter method This method needs a criterion that measures correlation or dependence, for example the correlation coefficient or mutual information.

Lag selection Which is better? The wrapper method solves the real problem but needs more time; the filter method is cheaper but may provide a lag subset that performs worse. We use the filter method because of the architecture of our approach.

Entropy The entropy is a measure of the uncertainty of a random variable. The entropy of a discrete random variable X is defined by H(X) = -Σ_x p(x) log p(x), with the convention 0 log 0 = 0.

Entropy Example.
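As an illustration of the definition (the original example's distribution is assumed here to take four values with probabilities 1/2, 1/4, 1/8 and 1/8):

```python
# Entropy of an assumed four-outcome distribution; the 0 log 0 = 0 convention
# simply means that outcomes with zero probability contribute nothing.
import numpy as np

p = np.array([1/2, 1/4, 1/8, 1/8])
H = -np.sum(p * np.log2(p))
print(H)   # 1.75 bits
```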

Joint entropy Definition: The joint entropy of a pair of discrete random variables (X, Y) is defined as H(X, Y) = -Σ_x Σ_y p(x, y) log p(x, y).

Conditional entropy Definition: The conditional entropy is defined as H(Y | X) = -Σ_x Σ_y p(x, y) log p(y | x), and the chain rule gives H(X, Y) = H(X) + H(Y | X).

Proof
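A standard derivation of the chain rule stated above (assumed to follow the slide's argument):

```latex
\begin{aligned}
H(X,Y) &= -\sum_{x}\sum_{y} p(x,y)\log p(x,y)
        = -\sum_{x}\sum_{y} p(x,y)\log\bigl(p(x)\,p(y\mid x)\bigr) \\
       &= -\sum_{x}\sum_{y} p(x,y)\log p(x) - \sum_{x}\sum_{y} p(x,y)\log p(y\mid x)
        = H(X) + H(Y\mid X).
\end{aligned}
```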

Mutual information The mutual information is a measure of the amount of information one random variable contains about another; it extends the notion of entropy. Definition: The mutual information of two discrete random variables X and Y is I(X; Y) = Σ_x Σ_y p(x, y) log [ p(x, y) / (p(x) p(y)) ].

Proof
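Likewise, a standard derivation of the identities linking mutual information and entropy (assumed to match the slide):

```latex
\begin{aligned}
I(X;Y) &= \sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}
        = \sum_{x}\sum_{y} p(x,y)\log\frac{p(x\mid y)}{p(x)} \\
       &= -\sum_{x}\sum_{y} p(x,y)\log p(x) + \sum_{x}\sum_{y} p(x,y)\log p(x\mid y) \\
       &= H(X) - H(X\mid Y) = H(X) + H(Y) - H(X,Y).
\end{aligned}
```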

The relationship between entropy and mutual information

Mutual information Definition: The mutual information of two continuous random variables is I(X; Y) = ∫∫ p(x, y) log [ p(x, y) / (p(x) p(y)) ] dx dy. The problem is that the joint probability density function of X and Y is hard to estimate.

Binned Mutual information The most straightforward and widespread approach for estimating MI consists in partitioning the supports of X and Y into bins of finite size and approximating the probabilities by the fraction of points falling into each bin.

Binned Mutual information For example, consider a set of 5 bivariate measurements z_i = (x_i, y_i), where i = 1, 2, …, 5, whose values are (0, 1), (0.5, 5), (1, 3), (3, 4), and (4, 1).

Binned Mutual information
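A minimal sketch of the binned estimator applied to these five points; the number and placement of bins are assumptions, since the original slide's binning is not specified:

```python
# Binned mutual information: partition X and Y into equal-width bins, estimate
# probabilities by relative frequencies, and plug them into the discrete MI formula.
import numpy as np

def binned_mi(x, y, bins=3):
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y
    nz = pxy > 0                          # skip empty bins (0 log 0 = 0)
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

# The five measurements from the example.
x = np.array([0, 0.5, 1, 3, 4])
y = np.array([1, 5, 3, 4, 1])
print(binned_mi(x, y, bins=3))
```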

Estimating Mutual information Another approach estimates mutual information from nearest-neighbor distances. Consider the case with two variables: the 2-dimensional space Z is spanned by X and Y, and we can compute the distance between every pair of points.

Estimating Mutual information Let us denote by ε(i)/2 the distance from z_i to its k-th nearest neighbor, and by ε_x(i)/2 and ε_y(i)/2 the distances between the same points projected onto the X and Y subspaces. Then we can count the number n_x(i) of points x_j whose distance from x_i is strictly less than ε(i)/2, and similarly for y instead of x.

Estimating Mutual information

The estimate for MI is then I^(1)(X, Y) = ψ(k) - ⟨ψ(n_x + 1) + ψ(n_y + 1)⟩ + ψ(N), where ψ is the digamma function and ⟨·⟩ denotes the average over all points i. Alternatively, in the second algorithm, we replace n_x(i) and n_y(i) by the numbers of points with ||x_i - x_j|| ≤ ε_x(i)/2 and ||y_i - y_j|| ≤ ε_y(i)/2, and then I^(2)(X, Y) = ψ(k) - 1/k - ⟨ψ(n_x) + ψ(n_y)⟩ + ψ(N).

Estimating Mutual information For the same example with k = 2, we compute ε(i), n_x(i) and n_y(i) for each of the points p1 = (0, 1), p2 = (0.5, 5), p3 = (1, 3), p4 = (3, 4) and p5 = (4, 1), and then average the digamma terms to obtain the estimate.
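A sketch of the first k-NN estimator applied to the same five points; the maximum norm and the function name ksg_mi are assumptions:

```python
# k-NN mutual information estimate (first algorithm), using the maximum norm
# in the joint space and strictly smaller distances for the marginal counts.
import numpy as np
from scipy.special import digamma

def ksg_mi(x, y, k=2):
    x, y = np.asarray(x, float), np.asarray(y, float)
    N = len(x)
    total = 0.0
    for i in range(N):
        dx = np.abs(x - x[i])
        dy = np.abs(y - y[i])
        dz = np.maximum(dx, dy)          # max-norm distance in the joint space
        eps = np.sort(dz)[k]             # distance to the k-th nearest neighbor
        n_x = np.sum(dx < eps) - 1       # points strictly closer in X (excluding i)
        n_y = np.sum(dy < eps) - 1       # points strictly closer in Y (excluding i)
        total += digamma(n_x + 1) + digamma(n_y + 1)
    return digamma(k) + digamma(N) - total / N

# The five points from the example, with k = 2.
x = [0, 0.5, 1, 3, 4]
y = [1, 5, 3, 4, 1]
print(ksg_mi(x, y, k=2))
```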

Estimating Mutual information Example: a = rand(1,100), b = rand(1,100), c = a*2. Then the estimated MI between a and b is close to zero (they are independent), while the MI between a and c is large (c is a deterministic function of a).

Estimating Mutual information Example: a = rand(1,100), b = rand(1,100), d = 2*a + 3*b. Then the estimated MI between d and each of a and b is clearly positive, since d depends on both.
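The two experiments above can be reproduced, for instance, with scikit-learn's k-NN based MI estimator as a stand-in for the estimator described here:

```python
# Reproducing the two random experiments; mutual_info_regression uses a k-NN
# based MI estimate, serving here as a stand-in for the slides' implementation.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
a = rng.random(100)
b = rng.random(100)
c = 2 * a                # deterministic function of a -> large MI with a
d = 2 * a + 3 * b        # depends on both a and b -> positive MI with each

X = np.column_stack([a, b])
print(mutual_info_regression(X, c, n_neighbors=2))  # MI(a,c) large, MI(b,c) near 0
print(mutual_info_regression(X, d, n_neighbors=2))  # both entries positive
```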

Flowchart

Model Now we have a training data set that contains the k selected records, and we need a model to make the prediction.

Instance-based learning The points that are close to the query get large weights, and the points far from the query get small weights. Examples are locally weighted regression and the General Regression Neural Network (GRNN).
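A minimal GRNN-style (Nadaraya-Watson) sketch of this idea; the Gaussian weighting and the bandwidth sigma are assumptions:

```python
# GRNN-style prediction: each stored pattern votes for its target with a
# Gaussian weight that shrinks with its distance to the query.
import numpy as np

def grnn_predict(X_train, y_train, query, sigma=1.0):
    d2 = np.sum((X_train - query) ** 2, axis=1)   # squared distances to the query
    w = np.exp(-d2 / (2 * sigma ** 2))            # closer points get larger weights
    return np.sum(w * y_train) / np.sum(w)        # weighted average of the targets

# Toy usage: three stored delay vectors and their next values.
X_train = np.array([[0.1, 0.2], [0.2, 0.3], [0.9, 1.0]])
y_train = np.array([0.25, 0.35, 1.05])
print(grnn_predict(X_train, y_train, query=np.array([0.15, 0.25]), sigma=0.2))
```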

Property of the local frame

Weighted LS-SVM The goal of the standard LS-SVM is to minimize the risk function J(w, e) = (1/2) wᵀw + (γ/2) Σ_i e_i², subject to y_i = wᵀφ(x_i) + b + e_i, where γ is the regularization parameter.

Weighted LS-SVM The modified risk function of the weighted LS-SVM is J(w, e) = (1/2) wᵀw + (γ/2) Σ_i v_i e_i², with the same constraints, where v_i is the weight of the i-th sample.

Weighted LS-SVM The weights v_i are designed so that neighbors close to the query receive large weights and neighbors far from the query receive small weights.
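A sketch of a weighted LS-SVM regressor in its dual form, with an RBF kernel and Gaussian distance-based weights assumed for illustration (the slides' exact weighting formula is an assumption here):

```python
# Weighted LS-SVM regression in dual form: solve the linear system
# [0, 1^T; 1, K + diag(1/(gamma*v))] [b; alpha] = [0; y], then predict with
# f(x) = sum_i alpha_i K(x, x_i) + b.
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def weighted_lssvm_fit(X, y, v, gamma=10.0, sigma=1.0):
    """Solve the dual system; v holds the per-sample weights v_i."""
    N = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.diag(1.0 / (gamma * v))
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                       # alpha, b

def weighted_lssvm_predict(X_train, alpha, b, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy usage: neighbors close to the query get larger weights v_i (assumed Gaussian).
X = np.array([[0.0], [0.5], [1.0], [3.0]])
y = np.array([0.1, 0.6, 1.1, 3.2])
v = np.exp(-np.abs(X[:, 0] - 0.6) ** 2)
alpha, b = weighted_lssvm_fit(X, y, v, gamma=10.0, sigma=1.0)
print(weighted_lssvm_predict(X, alpha, b, np.array([[0.6]]), sigma=1.0))
```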