© 2014 CY Lin, Columbia University. E6893 Big Data Analytics – Lecture 4: Big Data Analytics Algorithms. E6893 Big Data Analytics: Financial Market Volatility.


E6893 Big Data Analytics: Financial Market Volatility

Team Members: John Terzis, Tim Wu, Jimmy Zhong, Oliver Zhou

Nov 20, 2014

Motivation

Understanding volatility in financial markets has long been of interest to hedgers and speculators. Empirical evidence shows that volatility is a highly nonlinear, evolving process. Modeling this process with the Hadoop ecosystem lets us work with datasets far larger than main memory, a limit that constrains traditional econometric models.

Dataset, Algorithm, Tools

Dataset: We have procured a massive dataset of price quotes on equities, exchange-traded futures, and market indices covering the last ten to fifteen years at one-minute granularity. In addition to price quotes on specific instruments, the dataset includes derived indicators of price and volume activity.

Algorithm: We propose to train and test several scalable machine-learning regression models on this dataset, with the goal of producing a functional form for future realized volatility at the symbol level that keeps both bias and variance low and generalizes well to unseen data. Feature selection will be integral to the task, since many of our input variables are likely to be highly correlated. We intend to build a framework on top of Apache Spark that can, at a minimum, perform n-fold cross-validation of a model and use beam search or other established methods to calibrate the hyperparameters of our SVM, random forest, or regularized regression model in a reasonable time, given the algorithmic complexity of the underlying routines.

Tools: Hadoop, Apache Spark, Mahout, AWS, Git, R, Java, Python
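To make the plan above concrete, here is a minimal single-machine sketch in plain Python, not the team's Spark framework. It assumes realized volatility is measured as the square root of the sum of squared one-minute log returns, uses a one-feature ridge regression as a stand-in for the regularized model, and replaces beam search with a simple exhaustive search over candidate penalties. All function names (`realized_volatility`, `k_fold_splits`, `fit_ridge_1d`, `select_penalty`) are hypothetical and do not come from the slides.

```python
import math


def realized_volatility(prices):
    """Square root of the sum of squared log returns over the window.

    Assumption: this common definition is the target; the slides do not
    say which realized-volatility estimator the team uses.
    """
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return math.sqrt(sum(r * r for r in log_returns))


def k_fold_splits(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs using contiguous folds.

    The last fold absorbs any remainder when n_samples is not divisible
    by n_folds.
    """
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        start = k * fold_size
        stop = n_samples if k == n_folds - 1 else start + fold_size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx


def fit_ridge_1d(xs, ys, penalty):
    """Closed-form ridge slope for one feature and no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


def cross_validated_mse(xs, ys, penalty, n_folds):
    """Average held-out mean squared error across the folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_splits(len(xs), n_folds):
        slope = fit_ridge_1d(
            [xs[i] for i in train_idx],
            [ys[i] for i in train_idx],
            penalty,
        )
        squared = [(ys[i] - slope * xs[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / len(fold_errors)


def select_penalty(xs, ys, candidates, n_folds=5):
    """Exhaustive search over candidate penalties (not beam search)."""
    scores = {p: cross_validated_mse(xs, ys, p, n_folds) for p in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```

In use, `xs` might hold a lagged volatility measure and `ys` the next window's `realized_volatility`, with `select_penalty(xs, ys, [0.0, 0.1, 1.0])` returning the penalty with the lowest cross-validated error. One caveat with this sketch: plain k-fold splits let a model train on observations that come after its test fold, so for time-ordered market data a walk-forward split would be a reasonable alternative.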