Applied Machine Learning For Quant Finance

Presentation transcript:

Applied Machine Learning For Quant Finance
Strata Data Conference, March 27, 2019
Chakri Cherukuri, Senior Researcher, Quantitative Financial Research Group

Outline
ML use cases in finance
Case studies promoting reproducible research
  Jupyter notebooks
  Interactive plots
Conclusion

Quantitative Finance
Sell side
  Institutions: banks (Goldman, JPM, etc.)
  Tasks: market making, derivatives pricing/risk management
  Mathematical tools: stochastic calculus, Monte Carlo, PDEs
Buy side
  Institutions: hedge funds, asset managers
  Tasks: asset allocation, portfolio management
  Mathematical tools: multivariate statistics, regression models, convex optimization

ML In Finance: Structured Datasets
Tasks and machine learning techniques:
  Time series prediction: LSTM
  Illiquid asset pricing: boosted trees/random forests
  Trading strategies: dimensionality reduction (PCA/autoencoder)
  Exotic option pricing: neural nets

ML In Finance: Unstructured Datasets
Tasks and deep learning techniques:
  Object detection from satellite images: conv nets
  Summarization of news articles: RNN, attention-based models
  News/Twitter sentiment: NLP models (word embeddings + nets)
  Named entity recognition: LSTM

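ML In Finance: Challenges
Obtaining labeled datasets: cheap for structured data sets, expensive for unstructured/alt data sets
Labeled dataset QA: minimal for structured data sets, high for unstructured/alt data sets
Predictive power: low/moderate for structured data sets, moderate/high for unstructured/alt data sets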

Yield Curve Dimensionality Reduction

Yield Curve Primer
Bonds have a fixed maturity (e.g., 1M, 3M, 10Y) and pay coupons
Examples of bonds: treasury bonds, corporates, munis, etc.
Yield curve: plot of bond yields against maturities
Adjacent points on the yield curve move together (correlated)

U.S. Treasury Yield Curve
11 tenors/maturities
Different shapes: pre-crisis, post-crisis, current

Yield Curve Dynamics
The yield for each tenor (point on the yield curve) changes every day
Problem: how do we model changes in a yield curve driven by 11 correlated variables?
Is a parsimonious representation possible?

Principal Component Analysis (PCA)
PCA can be used to:
  Reduce dimensionality
  Retain as much variance in the dataset as possible
PCA factors: linear combinations of features
Typically 3-5 PCA factors are enough to explain almost all the variance
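
A minimal sketch of the PCA step described above, using NumPy on synthetic data. The level/slope/curvature factor shapes, the volatilities, and the helper name `pca_explained_variance` are illustrative assumptions, not the presenter's data or code; only the 11-tenor grid comes from the slides.

```python
import numpy as np


def pca_explained_variance(changes):
    """Return (explained-variance ratios, principal axes) via SVD."""
    centered = changes - changes.mean(axis=0)
    # Rows of vt are the principal axes (linear combinations of tenors).
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / (len(changes) - 1)
    return variances / variances.sum(), vt


# Synthetic daily yield changes for 11 tenors (assumption: made-up data
# driven by three factors plus small idiosyncratic noise).
rng = np.random.default_rng(0)
n_days, n_tenors = 1000, 11
t = np.linspace(0.0, 1.0, n_tenors)
level = np.ones(n_tenors)
slope = t - 0.5
curvature = (t - 0.5) ** 2 - np.mean((t - 0.5) ** 2)
factors = rng.normal(size=(n_days, 3)) * np.array([0.10, 0.04, 0.02])
changes = factors @ np.vstack([level, slope, curvature])
changes += rng.normal(scale=0.005, size=(n_days, n_tenors))

ratios, axes = pca_explained_variance(changes)
print("cumulative variance explained:", np.cumsum(ratios)[:5].round(4))
```

Because the synthetic curve is built from three factors, the first three components carry nearly all the variance here; real yield data would need to be checked rather than assumed to behave the same way.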

PCA Over Different Time Periods
PCA factors vary with time periods
“Interval Selector” can be used to:
  Quickly select different time periods
  Perform statistical analysis on the selected time interval
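
The idea of re-running PCA on a selected interval can be sketched without any plotting library. In the sketch below a plain date range stands in for the interactive selector; the function name `first_pc_on_interval`, the dates, and the two synthetic regimes are assumptions made for illustration.

```python
import numpy as np


def first_pc_on_interval(dates, changes, start, end):
    """Run PCA on rows with start <= date < end; return the first axis."""
    mask = (dates >= np.datetime64(start)) & (dates < np.datetime64(end))
    window = changes[mask]
    centered = window - window.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


# Synthetic data (assumption): parallel shifts dominate the first year,
# slope moves dominate the second.
rng = np.random.default_rng(1)
dates = np.arange("2018-01-01", "2020-01-01", dtype="datetime64[D]")
t = np.linspace(0.0, 1.0, 11)
is_first_year = dates < np.datetime64("2019-01-01")
shape = np.where(is_first_year[:, None], np.ones(11), t - 0.5)
changes = rng.normal(scale=0.05, size=(len(dates), 1)) * shape
changes += rng.normal(scale=0.001, size=changes.shape)

pc_2018 = first_pc_on_interval(dates, changes, "2018-01-01", "2019-01-01")
pc_2019 = first_pc_on_interval(dates, changes, "2019-01-01", "2020-01-01")
print("first factor, 2018:", pc_2018.round(2))
print("first factor, 2019:", pc_2019.round(2))
```

In an interactive notebook, the `start` and `end` arguments would come from the selector's current range instead of hard-coded strings.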

Yield curve PCA: Crisis

Yield curve PCA: After Crisis

Yield curve PCA: Current

Dimensionality Reduction: Autoencoder
[Figure: autoencoder diagram; labels: linear, relu, compressed feature vector]
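
The slide only shows a diagram, so the following is a rough sketch of one possible autoencoder rather than the presenter's network. It assumes a single relu encoder layer producing a 3-dimensional code, a linear decoder, plain gradient descent on mean-squared reconstruction error, and synthetic standardized inputs; `train_autoencoder` and all hyperparameters are hypothetical.

```python
import numpy as np


def train_autoencoder(x, code_dim=3, epochs=1000, lr=0.1, seed=0):
    """Fit x -> relu(x W_enc + b_enc) -> linear decoder by gradient descent."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    w_enc = rng.normal(scale=0.1, size=(n_features, code_dim))
    b_enc = np.zeros(code_dim)
    w_dec = rng.normal(scale=0.1, size=(code_dim, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        pre = x @ w_enc + b_enc
        code = np.maximum(pre, 0.0)   # relu output: compressed feature vector
        recon = code @ w_dec + b_dec  # linear reconstruction of the input
        err = recon - x
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean-squared reconstruction error.
        g_recon = 2.0 * err / err.size
        g_pre = (g_recon @ w_dec.T) * (pre > 0)
        w_dec -= lr * (code.T @ g_recon)
        b_dec -= lr * g_recon.sum(axis=0)
        w_enc -= lr * (x.T @ g_pre)
        b_enc -= lr * g_pre.sum(axis=0)
    return (w_enc, b_enc, w_dec, b_dec), losses


# Synthetic 11-dimensional inputs (assumption: made-up three-factor data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 11)
basis = np.vstack([np.ones(11), t - 0.5, (t - 0.5) ** 2])
x = rng.normal(size=(500, 3)) @ basis + rng.normal(scale=0.05, size=(500, 11))
x = (x - x.mean(axis=0)) / x.std(axis=0)

params, losses = train_autoencoder(x)
code = np.maximum(x @ params[0] + params[1], 0.0)
print("reconstruction MSE: first epoch", losses[0], "last epoch", losses[-1])
```

A practical version would normally use a deep learning framework with mini-batches and a held-out set; the hand-written gradients here are only meant to make the encode/decode structure explicit.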

PCA vs. Autoencoder

Dimension Reduction: AE vs. PCA

Twitter Sentiment Analysis

News/Twitter Sentiment
News & social sentiment from raw news stories or tweets
  Unstructured
  Highly time-sensitive
Story-level sentiment
Company-level sentiment
Sentiment score can be used as a trading signal
  Buy stocks with positive sentiment
  Short stocks with negative sentiment
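
The step from story-level scores to a trading signal could look roughly like the sketch below. Averaging as the aggregation rule, the score range of [-1, 1], the 0.2 threshold, and the tickers are all assumptions chosen for illustration; the slides do not specify any of them.

```python
from collections import defaultdict
from statistics import mean


def company_sentiment(story_scores):
    """Average story-level (ticker, score) pairs into company-level scores."""
    by_ticker = defaultdict(list)
    for ticker, score in story_scores:
        by_ticker[ticker].append(score)
    return {ticker: mean(scores) for ticker, scores in by_ticker.items()}


def trading_signal(sentiment, threshold=0.2):
    """Map scores to +1 (buy), -1 (short), or 0 (no position).

    The threshold is an assumed cutoff, not a value from the talk.
    """
    signals = {}
    for ticker, score in sentiment.items():
        if score > threshold:
            signals[ticker] = 1
        elif score < -threshold:
            signals[ticker] = -1
        else:
            signals[ticker] = 0
    return signals


# Hypothetical tickers and scores, for illustration only.
stories = [("AAA", 0.8), ("AAA", 0.4), ("BBB", -0.6), ("CCC", 0.1)]
signals = trading_signal(company_sentiment(stories))
print(signals)
```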

Russell 2000 Stocks

Twitter Sentiment Classification
Task: Predict the sentiment (negative, neutral, positive) of a tweet for a company
Ex: “$CTIC Rated strong buy by three WS analysts. Increased target from $5 to $8.” = Positive
Three-way classification problem
Input: raw tweets
Output: sentiment label ∈ {negative, neutral, positive}

Methodology
We are given labeled training and test data sets
Train classifier on training data set
Predict labels on test data and evaluate performance

One vs. Rest Logistic Regression
Features: Bag of words (uni/bi grams) + custom features
Train three binary classifiers, one for each label
  Model 1: Negative vs. Not Negative
  Model 2: Positive vs. Not Positive
  Model 3: Neutral vs. Not Neutral
Get probabilities (measures of confidence) for each label
Output the label associated with the highest probability
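
The one-vs-rest scheme above can be sketched end to end in NumPy. Everything below is a toy under stated assumptions: the six training tweets are invented, tokenization is a bare whitespace split, the custom features are omitted, and the helper names, learning rate, and regularization strength are hypothetical. The three binary scores are normalized to sum to 1 before taking the most probable label.

```python
import numpy as np

LABELS = ["negative", "neutral", "positive"]


def ngrams(text):
    """Unigrams and bigrams from a lowercase whitespace split."""
    tokens = text.lower().split()
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def build_vocab(texts):
    vocab = {}
    for text in texts:
        for gram in ngrams(text):
            vocab.setdefault(gram, len(vocab))
    return vocab


def vectorize(texts, vocab):
    """Bag-of-words counts; n-grams outside the vocabulary are ignored."""
    x = np.zeros((len(texts), len(vocab)))
    for row, text in enumerate(texts):
        for gram in ngrams(text):
            if gram in vocab:
                x[row, vocab[gram]] += 1.0
    return x


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_binary_logreg(x, y, lr=0.2, epochs=1000, l2=0.01):
    """L2-regularized logistic regression fitted by gradient descent."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        residual = sigmoid(x @ w + b) - y
        w -= lr * (x.T @ residual / len(y) + l2 * w)
        b -= lr * residual.mean()
    return w, b


def fit_one_vs_rest(x, labels):
    """One binary model per label: that label vs. everything else."""
    return {
        label: fit_binary_logreg(x, np.array([float(l == label) for l in labels]))
        for label in LABELS
    }


def predict_proba(models, x):
    scores = np.column_stack(
        [sigmoid(x @ models[label][0] + models[label][1]) for label in LABELS]
    )
    return scores / scores.sum(axis=1, keepdims=True)


# Invented training tweets, for illustration only.
train_texts = [
    "rated strong buy target raised",
    "earnings beat strong guidance",
    "downgrade to sell target cut",
    "weak earnings miss guidance cut",
    "earnings call scheduled for tuesday",
    "company announces annual meeting date",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

vocab = build_vocab(train_texts)
models = fit_one_vs_rest(vectorize(train_texts, vocab), train_labels)

tweet = "$CTIC Rated strong buy by three WS analysts. Increased target from $5 to $8."
probs = predict_proba(models, vectorize([tweet], vocab))[0]
print(dict(zip(LABELS, probs.round(3))), "->", LABELS[int(probs.argmax())])
```

With real data, a library implementation with proper tokenization and a held-out test set would be the more sensible route; the hand-rolled version is only meant to show the three binary models and the final argmax.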

Classifier Performance Analysis
Look at misclassifications
  Confusion matrix
Understand model predicted probabilities
  Triangle visualization
Fix data issues
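
As a small illustration of the first item, a confusion matrix for the three sentiment labels can be tallied in a few lines. The label lists below are hypothetical, and the rows-are-true, columns-are-predicted orientation is a convention chosen for this sketch.

```python
LABELS = ["negative", "neutral", "positive"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Count matrix: rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def misclassified(y_true, y_pred):
    """Positions where the prediction disagrees with the true label."""
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]


# Hypothetical labels, for illustration only.
y_true = ["positive", "negative", "neutral", "positive", "neutral"]
y_pred = ["positive", "neutral", "neutral", "negative", "neutral"]

matrix = confusion_matrix(y_true, y_pred)
for label, row in zip(LABELS, matrix):
    print(f"{label:>8}: {row}")
print("misclassified rows:", misclassified(y_true, y_pred))
```

The off-diagonal cells point at which label pairs get confused, and the returned row positions are the tweets worth reading individually.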

Triangle Visualization
Model returns 3 probabilities (which sum to 1)
How can we visualize these 3 numbers?
Points inside an equilateral triangle
[Figure: triangle plot; region labels: Not sure, Very positive, Negative / Neutral]
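
One way to place three probabilities inside an equilateral triangle is to treat them as barycentric weights on the triangle's corners. The corner placement and the function name `to_triangle` below are assumptions; the slides do not say which corner represents which label.

```python
import math

# Corner placement is an assumption; any equilateral triangle works.
VERTICES = {
    "negative": (0.0, 0.0),
    "positive": (1.0, 0.0),
    "neutral": (0.5, math.sqrt(3) / 2),
}


def to_triangle(p_negative, p_neutral, p_positive):
    """Map three probabilities summing to 1 to a point in the triangle."""
    weights = {
        "negative": p_negative,
        "neutral": p_neutral,
        "positive": p_positive,
    }
    x = sum(weights[k] * VERTICES[k][0] for k in VERTICES)
    y = sum(weights[k] * VERTICES[k][1] for k in VERTICES)
    return x, y


print(to_triangle(0.0, 0.0, 1.0))        # confident positive: a corner
print(to_triangle(1 / 3, 1 / 3, 1 / 3))  # undecided model: the centroid
print(to_triangle(0.5, 0.5, 0.0))        # torn between negative and neutral
```

Under this mapping, a confident prediction lands near a corner, an undecided one near the center, and a prediction split between two labels along the edge joining their corners, which appears to be what the figure's region labels describe.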

Performance Analysis Dashboard
Use the dashboard to:
  Analyze misclassifications (using the confusion matrix)
  Improve the model by adding more features (by looking at model coefficients)
  Fix data issues (using the triangle plot and lasso selection)

Analyze Misclassifications

Use Lasso To Find Data Issues

Conclusion
Abundance of financial data
Abundance of already existing quant models
ML techniques can supplement existing models
Deep learning techniques useful for ‘alternative’ datasets
Interactive plots/diagnostic tools promote reproducible research