CSIC 5011 Project 1: Finance Data PCA, Parallel Analysis

Slides:



Advertisements
Similar presentations
PCA for analysis of complex multivariate data. Interpretation of large data tables by PCA In industry, research and finance the amount of data is often.
Advertisements

Gil Dafnai, Jonathan Sidi Research Department, Bank of Israel.
Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
1 CPC group SeminarThursday, June 1, 2006 Classification techniques for Hand-Written Digit Recognition Venkat Raghavan N. S., Saneej B. C., and Karteek.
Dimension reduction (1)
Bank of Greece, 4 February Assessing the predictive power of measures of financial conditions for macroeconomic variables Kostas Tsatsaronis Head.
Principal Components Analysis Babak Rasolzadeh Tuesday, 5th December 2006.
Multivariate Methods Pattern Recognition and Hypothesis Testing.
Genomic Signal Processing: Ensemble Dependence Model for Classification and Prediction of Cancer Based on Gene Expression Data Joseph DePasquale Engineering.
O EQUITY=OWNERSHIP OF PART OF CORPORATION o STOCKHOLDER=PERSON WHO BUYS STOCK o PRICE OF STOCK =DETERMINED BY HOW MUCH INVESTORS WANT TO PAY VIDEO: WHAT.
FACE RECOGNITION, EXPERIMENTS WITH RANDOM PROJECTION
What Explains the Stock Market’s Reaction to Federal Reserve Policy? Bernanke and Kuttner.
Exploring Microarray data Javier Cabrera. Outline 1.Exploratory Analysis Steps. 2.Microarray Data as Multivariate Data. 3.Dimension Reduction 4.Correlation.
PATTERN RECOGNITION : PRINCIPAL COMPONENTS ANALYSIS Prof.Dr.Cevdet Demir
Duan Wang Center for Polymer Studies, Boston University Advisor: H. Eugene Stanley.
Dimensionality Reduction. Multimedia DBs Many multimedia applications require efficient indexing in high-dimensions (time-series, images and videos, etc)
Ratha Chea & Sovan Lek Symposium on Biodiversity and Health th November 2014, Phnom Penh Cambodia Spatial analysis of water quality variability in.
Electronically Traded Funds (Exchange Traded Funds)
Principal Component Analysis. Philosophy of PCA Introduced by Pearson (1901) and Hotelling (1933) to describe the variation in a set of multivariate data.
Eigenfaces for Recognition Student: Yikun Jiang Professor: Brendan Morris.
1 Statistical Tools for Multivariate Six Sigma Dr. Neil W. Polhemus CTO & Director of Development StatPoint, Inc.
What Explains the Stock Market’s Reaction to Federal Reserve Policy? Bernanke and Kuttner.
Statistical Arbitrage Ying Chen, Leonardo Bachega Yandong Guo, Xing Liu March, 2010.
The Tutorial of Principal Component Analysis, Hierarchical Clustering, and Multidimensional Scaling Wenshan Wang.
Syntec Informatique October 25, 2002 WITSA Steering Committee M. François DUFAUX, Syntec Informatique Chairman.
Review of Statistics and Linear Algebra Mean: Variance:
A hybrid SOFM-SVR with a filter-based feature selection for stock market forecasting Huang, C. L. & Tsai, C. Y. Expert Systems with Applications 2008.
Statistical Arbitrage Ying Chen, Leonardo Bachega Yandong Guo, Xing Liu February, 2010.
Reduces time complexity: Less computation Reduces space complexity: Less parameters Simpler models are more robust on small datasets More interpretable;
Speech Lab, ECE, State University of New York at Binghamton  Classification accuracies of neural network (left) and MXL (right) classifiers with various.
PATTERN RECOGNITION : PRINCIPAL COMPONENTS ANALYSIS Richard Brereton
Principal Component Analysis (PCA)
Factor & Cluster Analyses. Factor Analysis Goals Data Process Results.
Workload Design: Selecting Representative Program-Input Pairs Lieven Eeckhout Hans Vandierendonck Koen De Bosschere Ghent University, Belgium PACT 2002,
Market analysis for the S&P500 Giulio Genovese Tuesday, December
Instructor: Chris Bemis Random Matrix in Finance Understanding and improving Optimal Portfolios Mantao Wang, Ruixin Yang, Yingjie Ma, Yuxiang Zhou, Wei.
3 “Products” of Principle Component Analysis
Multivariate statistical methods. Multivariate methods multivariate dataset – group of n objects, m variables (as a rule n>m, if possible). confirmation.
Results H. S. Zhang, J. R. Wei, and J. P. Huang Department of Physics, Fudan University, Shanghai , China Method Abstract Results Conclusion The.
2011 Data Mining Industrial & Information Systems Engineering Pilsung Kang Industrial & Information Systems Engineering Seoul National University of Science.
Experience Report: System Log Analysis for Anomaly Detection
Principal Component Analysis
Exploratory Factor Analysis
AS Business Studies Size of a business.
PREDICT 422: Practical Machine Learning
Course Review Questions will not be all on one topic, i.e. questions may have parts covering more than one area.
Information Management course
Stock Market 101 What are stocks? What is the stock market?
Exploring Microarray data
Dimension Reduction in Workers Compensation
CS548 Fall 2017 Anomaly Detection
Reasons for not attending to present at UKSim 2018
Warm Up What does it mean when a person has stock in a company?
Quality Control at a Local Brewery
Overview : What are the relevant factors? What are the
Example of PCR, interpretation of calibration equations
Economics of International Finance Econ. 315
Motivation Do you know of any stock exchanges?.
Techniques for Data Analysis Event Study
Introduction to Statistical Methods for Measuring “Omics” and Field Data PCA, PcoA, distance measure, AMOVA.
Kostas Tsatsaronis Head of Financial Institutions
ALL the following plots are subject to the filtering :
Multivariate Analysis of a Carbonate Chemistry Time-Series Study
PCA of Waimea Wave Climate
MATH 6380J Mini-Project 1: Realization of Recent Trends in Machine Learning Community in Recent Years by Pattern Mining of NIPS Words Chan Lok Chun
Principal Component Analysis
Stock Selection Instructions
Introduction to the Stock Market
Chaug‐Ing Hsu & Yuh‐Horng Wen (1998)
930 percent 142 percent 50 percent
Presentation transcript:

CSIC 5011 Project 1: Finance Data PCA, Parallel Analysis Meilan WANG1 and Di LIU2 {mwangau, dliuah}@connect.ust.hk 1: Department of Civil and Environmental Engineering, HKUST 2: Department of Mechanical and Aerospace Engineering, HKUST 1. Introduction The SNP’ 500 is an American stock market index based on the market capitalizations of about 500 large companies having common stock listed on the NYSE, NASDAQ, or the Cboe BZX Exchange. Those stocks are representative of the industries in the United States economy. We carried out PCA and Parallel Analysis over a time series of stock closed prices in SNP’500. Through the analysis, we find out some interesting interpretation of principal components in the real world. 2. Overview Dataset The dataset contains 1258-by-452 matrix with closed prices of 452 stocks in SNP’500 for workdays in 4 years Methodology PCA, Parallel Analysis Why? This is a dataset with hundreds of companies or thousands of days as features. In order to know more about the dataset, or modelling with the dataset, dimension reduction is necessary. With further projection the original company time series on to the principal component constructed space, the sector distribution on the overall SNP’500 stock distribution has explainable meaning. With parallel analysis, we can confirm if the number of eigenvalues we chose is in the reasonable range. How? Normalize price  PCA  Project in PC space  Analysis Fg1. 452 Stock Time Series Overview Fg3. PCA explain variance ratio (except outliers) Fg5. PCA explain variance ratio (except outlier companies) Fg2. Normalize before fit PCA Fg4. Project original data on the PC1 x PC2 space 3. Analysis By calculating the “log(current day price – previous day price)” from the original dataset, we normalized the original dataset (Fg2). Then we conducted the PCA to find out the components have most significant impact. While first two component contributes over 8.5% of explained variance ratio, we project the original data on the space of PC1 x PC2. By coloring out different industry sectors on the overall SNP’500 companies distribution. The Energy sector and IT sector is more distant than downstream and traditional industry sectors. The principal components are possible to explain with the sector-wise distribution figure. 4.Conclusion and Further Work The Energy and IT sector is more distant than downstream and daily-life related industry sectors. The principal component 1 could be upstream and downstream of business chain, while the principal component 2 could be the distance to daily-life. The upstream and down stream is in different clusters. If we have exact time information, or more information about the companies, we could conducted PCA using time as feature, or fit PCA within sector companies; With PCA result, we could build a predicting model with less features. PCA on subgroup data with Anaconda-Jupyter, poster Meilan WANG PCA and Parallel Analysis on overall SNP500 with Matlab Di LIU 5. Contribution