Published by Holly Palmer. Modified over 9 years ago.
© 2014 CY Lin, Columbia University
E6893 Big Data Analytics – Lecture 4: Big Data Analytics Algorithms

E6893 Big Data Analytics: Financial Market Volatility
Team Members: John Terzis, Tim Wu, Jimmy Zhong, Oliver Zhou
Nov 20, 2014

Motivation

Understanding volatility in financial markets has long been of interest to hedgers and speculators. Empirical evidence shows that volatility is a highly nonlinear, evolving process. Modeling this process with the Hadoop ecosystem can offer substantial advantages over traditional econometric models, which are limited to datasets that fit in main memory.

Dataset, Algorithm, Tools

Dataset: We have procured a massive dataset of price quotes on equities, exchange-traded futures, futures, and market indices spanning the last ten to fifteen years at one-minute granularity. In addition to price quotes on specific instruments, the dataset includes derived indicators of price and volume activity.

Algorithm: We propose to train and test several scalable, machine-learning-based regression models on our dataset. The goal is to produce a functional form of future realized volatility at the symbol level that minimizes bias and variance and generalizes well to unseen data. Feature selection will be integral to the task, since many of our input variables are likely to be highly correlated. We intend to build a framework on top of Apache Spark that can, at a minimum, perform an n-fold cross-validation of a training model and use beam search or other established methods to calibrate the hyperparameters of our SVM, random forest, or regularized regression model in a reasonable time frame, given the algorithmic complexity of the underlying routines.

Tools: Hadoop, Apache Spark, Mahout, AWS, Git, R, Java, Python
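
The cross-validation and hyperparameter calibration described above can be illustrated with a minimal single-machine sketch in plain Python. It is not the team's Spark framework. Everything in it is an assumption made for illustration: the function names are hypothetical, realized volatility is taken to be the square root of the sum of squared one-minute log returns (one common definition), the model is a one-feature ridge regression without intercept, and an exhaustive grid search stands in for the beam search mentioned in the slide.

```python
import math


def realized_volatility(prices):
    """Square root of the sum of squared log returns over a price window.

    Assumed definition; the slides do not specify the exact estimator.
    """
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return math.sqrt(sum(r * r for r in returns))


def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge fit for y ~ w * x with no intercept.

    Minimizes sum((y - w*x)^2) + lam * w^2, so w = sum(x*y) / (sum(x^2) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def k_fold_mse(xs, ys, lam, k):
    """Mean squared error of the ridge model under k-fold cross-validation.

    Folds are interleaved (every k-th sample). This ignores temporal order,
    which is a simplification for time-series data.
    """
    n = len(xs)
    total_sq_err = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        w = fit_ridge_1d(train_x, train_y, lam)
        total_sq_err += sum((ys[i] - w * xs[i]) ** 2 for i in test_idx)
    return total_sq_err / n


def select_lambda(xs, ys, grid, k=5):
    """Pick the penalty from `grid` with the lowest cross-validated MSE.

    Grid search is a stand-in here for beam search or other methods.
    """
    return min(grid, key=lambda lam: k_fold_mse(xs, ys, lam, k))
```

In this sketch, `xs` could hold a lagged feature such as the previous window's realized volatility and `ys` the next window's value; `select_lambda(xs, ys, [0.0, 0.1, 1.0, 10.0])` would then return the penalty with the lowest cross-validated error. On a cluster, the per-fold and per-candidate fits are independent of one another, which is what makes this loop a natural fit for parallel execution.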