Estimating Structured Vector Autoregressive Models. Igor Melnyk and Arindam Banerjee.


1 Estimating Structured Vector Autoregressive Models Igor Melnyk and Arindam Banerjee

2 Introduction
Consider the vector autoregressive model (VAR)
x_t = A_1 x_{t-1} + A_2 x_{t-2} + ... + A_d x_{t-d} + e_t, where
- x_t: multivariate time series
- A_1, ..., A_d: model parameters, d: order of the model
- e_t: Gaussian noise, e_t ~ N(0, Sigma)
Widely used model:
- Financial time series [Tsay '05]
- Dynamic systems [Ljung '98]
- Brain function connectivity [Valdes-Sosa et al. '05]
- Anomaly detection in aviation data
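The model on this slide can be sketched numerically. Below is a minimal simulation; the dimensions, the coefficient scaling, and the identity noise covariance are illustrative assumptions, not values from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, T = 5, 2, 500  # series dimension, VAR order, number of time steps

# Random coefficient matrices, scaled down so the process stays stable
A = [0.2 * rng.standard_normal((p, p)) / np.sqrt(p) for _ in range(d)]

# x_t = A_1 x_{t-1} + ... + A_d x_{t-d} + eps_t, with eps_t ~ N(0, I)
x = np.zeros((T, p))
for t in range(d, T):
    x[t] = sum(A[i] @ x[t - 1 - i] for i in range(d)) + rng.standard_normal(p)
```

The small coefficient scale keeps the companion matrix stable, so the simulated series neither explodes nor degenerates.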

3 Estimation Problem
Objective: estimate the model parameters A_1, ..., A_d.
Stacking the VAR output across N time steps gives Y = X A + E, where the rows of Y are the outputs x_t, the rows of X are the lagged histories (x_{t-1}, ..., x_{t-d}), and A stacks the parameter matrices.

4 (figure-only slide; no transcript text)

5 Estimation Problem
Objective: estimate the parameters. Vectorize the stacked model, vec(Y) = (I kron X) vec(A) + vec(E), written as y = Z beta* + e.
Regularized estimator:
beta_hat = argmin_beta (1/N) ||y - Z beta||_2^2 + lambda_N R(beta)
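The vectorization step can be verified with a small numerical identity check; the sizes below are illustrative, and vec(.) denotes column-major stacking.

```python
import numpy as np

rng = np.random.default_rng(1)
N, dp, p = 20, 6, 3  # assumed sizes: N samples, dp = d*p lagged regressors
X = rng.standard_normal((N, dp))
A = rng.standard_normal((dp, p))

Y = X @ A                        # stacked matrix form (noise omitted)
Z = np.kron(np.eye(p), X)        # design matrix of the vectorized regression
beta = A.reshape(-1, order='F')  # vec(A): column-major stacking

# vec(Y) = (I kron X) vec(A), the identity behind the "vectorize" step
assert np.allclose(Y.reshape(-1, order='F'), Z @ beta)
```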

6 Regularized Estimator
- R(.): regularization norm, lambda_N: regularization parameter
Examples:
- L1 norm: sparsity
- L1/L2 norm: group sparsity
- K-support norm
- Overlapping group sparsity
Main properties:
- Samples are correlated
- R can be any norm, separable along the rows of A, arbitrary within a row
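The two most common of these regularizers can be evaluated directly. A toy sketch, where the vector and the non-overlapping group structure are assumptions for illustration:

```python
import numpy as np

beta = np.array([1.0, -2.0, 0.0, 0.0, 3.0, 0.0])

# L1 norm: sum of absolute values, promotes entrywise sparsity
l1 = np.abs(beta).sum()

# L1/L2 group norm: sum of per-group Euclidean norms, promotes group sparsity
groups = [[0, 1], [2, 3], [4, 5]]  # assumed non-overlapping groups
group_l2 = sum(np.linalg.norm(beta[g]) for g in groups)

print(l1, group_l2)  # 6.0 and sqrt(5) + 3
```

The group norm zeroes out whole groups at once (here the middle group contributes nothing), which is the behavior "group sparsity" refers to.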

7 Regularized Estimator
Properties of the VAR model:
- Samples are correlated
- R is separable along the rows of A, arbitrary within a row

8 Related Work
Constrained VAR (Dantzig selector form):
- [Han & Liu '13]: Dantzig-selector-type formulation under Gaussian noise
- [Qiu et al. '15]: extension of the above to heavy-tailed noise
Regularized VAR:
- [Han & Liu '13]: R is L1 and group L1/L2; assumptions on data dependency
- [Kock & Callot '15]: R is L1; exploited the martingale property of the data
- [Loh et al. '11]: R is L1; considered only first-order VAR
- [Basu et al. '15]: R is L1; any-order VAR; spectral analysis of data correlation
Regularized linear regression:
- [Meinshausen et al. '09, Wainwright '09, Rudelson '13]: R is L1
- [Negahban et al. '09]: R is any decomposable norm
- [Banerjee et al. '14]: R is any norm
Our approach:
- Use the framework of [Banerjee et al. '14] and the generic chaining of [Talagrand '06]
- Use the spectral analysis of [Basu et al. '15]; exploit the martingale property of the data

9 Data Matrix X
The analysis examines two objects:
- Any single row of X
- Multiple rows of X projected onto a fixed direction u, i.e., Xu

10 Matrix X: Any Row
Any row of X collects d consecutive outputs of the VAR process, (x_{t-1}, ..., x_{t-d}).

11 Matrix X: Any Row
To characterize the distribution of a row, consider, for example, its covariance matrix. The row can be viewed as composed of d consecutive outputs of the VAR model, so the eigenvalues of its covariance can be bounded using the spectral density of the VAR process.

12 Matrix X: Any Row
The covariance of a row is a block-Toeplitz matrix, so its eigenvalues can be bounded by the extremes of the spectral density rho(omega), which can be viewed as the Fourier transform of the autocovariance of the process.
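This Toeplitz eigenvalue bound can be checked numerically in the simplest scalar AR(1) case (a simplifying assumption relative to the general VAR setting of these slides): the eigenvalues of the Toeplitz autocovariance matrix fall between the minimum and maximum of the spectral density, here written without the 2*pi normalization.

```python
import numpy as np

a, sigma2, n = 0.5, 1.0, 100
h = np.arange(n)
gamma = sigma2 * a**h / (1 - a**2)              # AR(1) autocovariances
Gamma = gamma[np.abs(np.subtract.outer(h, h))]  # Toeplitz covariance matrix

eig = np.linalg.eigvalsh(Gamma)
lo = sigma2 / (1 + a) ** 2  # min over omega of the symbol sigma2/|1 - a e^{i w}|^2
hi = sigma2 / (1 - a) ** 2  # max over omega of the symbol

# All eigenvalues lie within the range of the spectral density
assert lo - 1e-9 <= eig.min() and eig.max() <= hi + 1e-9
```

This is the classical Grenander-Szego relationship between Toeplitz eigenvalues and their generating symbol, which the slides extend to the block-Toeplitz VAR case.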

13 Matrix X: Any Row
Therefore each row of X is a Gaussian vector whose covariance is controlled by the VAR spectral density: the smallest eigenvalue of the row covariance is bounded below in terms of the minimum of the spectral density.

14 Matrix X: Multiple Projected Rows
Now consider the vector Xu of multiple rows projected onto a fixed unit direction u. Since the rows are correlated across time, Xu is a correlated Gaussian vector whose covariance again inherits the VAR structure.

15 Matrix X: Multiple Projected Rows
To characterize the covariance of Xu, observe that its entries can be viewed as composed of consecutive outputs of a VAR model of order 1 (the companion form of the original model).

16 Matrix X: Multiple Projected Rows
The eigenvalues of the resulting block-Toeplitz covariance matrix can be bounded by the spectral density of the order-1 VAR model; in particular, the largest eigenvalue of the covariance of Xu is bounded above in terms of the maximum of that spectral density.

17 Matrix X: Multiple Projected Rows
Therefore, for multiple projected rows of X, the eigenvalues of the covariance matrix are controlled, with the largest eigenvalue bounded above by the maximum of the spectral density of the order-1 model.

18 Error Analysis Framework
Return to our estimator. Denote the error between the true and estimated parameters by Delta = beta_hat - beta*. Our task: establish conditions on the regularization parameter and the design under which the error norm is bounded.

19 Error Analysis Framework
Bound on the regularization parameter: assume lambda_N dominates the dual norm of the gradient of the loss at the true parameter, lambda_N >= c R*(Z^T e / N). Then the error Delta belongs to a restricted set (a cone) C.
Restricted eigenvalue condition: moreover, assume that for all Delta in C, ||Z Delta||_2^2 / N >= kappa ||Delta||_2^2 for some kappa > 0. Then the error is bounded (deterministically) as ||Delta||_2 <= c' lambda_N Psi(C) / kappa, where Psi(C) is the compatibility constant of the norm R on the cone C.

20 Result: Bound on Regularization Parameter
Define the unit set of the norm, Omega_R = {u : R(u) <= 1}, and its Gaussian width w(Omega_R) = E sup_{u in Omega_R} <g, u> for g ~ N(0, I). For lambda_N >= c1 w(Omega_R) / sqrt(N), the bound on the regularization parameter holds with probability at least 1 - c2 exp(-c3 w(Omega_R)^2), where c1, c2, c3 are absolute constants.
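The Gaussian width w(A) = E sup_{u in A} <g, u> can be estimated by Monte Carlo. The sketch below uses two standard sets, the unit L2 sphere and the unit L1 ball, with illustrative sizes; for the sphere the supremum is ||g||_2, and for the L1 ball it is ||g||_inf.

```python
import numpy as np

rng = np.random.default_rng(2)
p, trials = 200, 2000
g = rng.standard_normal((trials, p))

w_sphere = np.linalg.norm(g, axis=1).mean()  # unit L2 sphere: ~ sqrt(p)
w_l1ball = np.abs(g).max(axis=1).mean()      # unit L1 ball: ~ sqrt(2 log p)

print(w_sphere, w_l1ball)
```

The L1 ball's width grows only logarithmically in the dimension, which is the geometric reason sparsity-inducing norms need so few samples.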

21 Result: Restricted Eigenvalue Condition
Let C be the error cone and A = C intersected with the unit sphere. Define the Gaussian width w(A) = E sup_{u in A} <g, u>. For N >= c1 w(A)^2, the RE condition holds with probability at least 1 - c2 exp(-c3 N). As a consequence, the required sample size is governed by the squared Gaussian width of the error cone.
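The RE condition can be probed numerically. The sketch below is a simplifying assumption relative to the slides: it uses an i.i.d. Gaussian design rather than the correlated VAR design, and samples random sparse unit directions instead of taking the infimum over the whole cone.

```python
import numpy as np

rng = np.random.default_rng(4)
N, p, s = 200, 100, 5
X = rng.standard_normal((N, p))

vals = []
for _ in range(500):
    u = np.zeros(p)
    idx = rng.choice(p, s, replace=False)  # random sparse support
    u[idx] = rng.standard_normal(s)
    u /= np.linalg.norm(u)                 # unit direction in the sparse set
    vals.append(np.linalg.norm(X @ u) ** 2 / N)

kappa = min(vals)
print(kappa)  # stays bounded away from zero, close to 1 for this design
```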

22 Interpretation of Results
Select the number of data samples such that N >= c w(A)^2. Then choose the regularization parameter such that lambda_N >= c' w(Omega_R) / sqrt(N). The restricted eigenvalue condition will hold w.h.p., and the norm of the estimation error will be bounded w.h.p.
- Lasso: w(A)^2 ~ s log p, where s is the sparsity; the error therefore scales as sqrt(s log p / N).
- Group Lasso: w(A)^2 ~ s_G (m + log K), where s_G is the group sparsity, K the number of groups, and m the size of the largest group; the error scales as sqrt(s_G (m + log K) / N).

23 Proof Strategy for Bound on lambda_N
Need to find lambda_N such that lambda_N >= c R*(Z^T e / N) holds w.h.p., where R* is the dual norm. Establish a high-probability bound on this dual norm of the gradient term.

24 Proof Strategy for Bound on lambda_N
- Establish sub-exponential tails, via summation over a martingale difference sequence
- Use the generic chaining argument of Talagrand (majorizing measure inequality) to control the supremum over the unit set of the norm

25 Proof Strategy for RE
Need to show that the RE condition holds w.h.p. for all directions in the error cone. Separate the problem into parts and establish a lower bound on ||Z u||_2 uniformly over unit directions u in the cone.

26 Proof Strategy for RE
- Establish concentration of ||Z u||_2, a Lipschitz function of a Gaussian random vector
- Use the generic chaining argument of Talagrand to make the bound uniform over all directions

27 Lasso Experiments
Investigate the scaling of the estimation errors and of lambda.
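A scaling study of this kind can be sketched as follows. Two simplifying assumptions relative to the slides: an i.i.d. Gaussian design replaces the VAR design, and a plain ISTA loop stands in for whatever solver the authors used. The error should shrink as N grows with s log(p) fixed.

```python
import numpy as np

rng = np.random.default_rng(3)
p, s = 100, 5
beta = np.zeros(p)
beta[:s] = 1.0  # s-sparse ground truth

def lasso_ista(X, y, lam, steps=500):
    """ISTA for 0.5*||y - Xb||^2 + lam*||b||_1."""
    b = np.zeros(X.shape[1])
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    for _ in range(steps):
        b = b - X.T @ (X @ b - y) / L                        # gradient step
        b = np.sign(b) * np.maximum(np.abs(b) - lam / L, 0)  # soft threshold
    return b

errs = []
for N in (50, 200, 800):
    X = rng.standard_normal((N, p))
    y = X @ beta + 0.5 * rng.standard_normal(N)
    lam = np.sqrt(N * np.log(p))  # lambda ~ sqrt(N log p), per the theory
    errs.append(np.linalg.norm(lasso_ista(X, y, lam) - beta))

print(errs)  # estimation error decreases as N grows
```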

28 Group Lasso Experiments
Investigate the scaling of the estimation errors and of lambda.