Public Opinion and Industry Effects on Financial & Tech Markets

Presentation transcript:

Public Opinion and Industry Effects on Financial & Tech Markets
Daniel Swanson, Josh Baker, Zak Meyer, Alex Fuller
(Josh)

Overview
- Methodology
- Open, Close and Volume
- Treatment
- Data collection
- Tech industry results
- Financial industry results
- Public Opinion
- Sentiment analysis
- Tech industry results
- Conclusion
(Josh)

Methodology
Background: Stock prices are determined by a company's financial performance and by how the public interprets that performance as a signal of future returns. Theoretically, individual stocks should have no relationship with each other.
Question: Which S&P 500 industries affect the daily open, close and volume of PNC, Netflix, M&T Bank and FiserV? What is public opinion's effect on the daily open, close and volume values of PNC, Netflix, M&T Bank and FiserV?
(Zack)

Methodology (cont.)
Hypothesis: We believe there are identifiable relationships between our chosen companies. The industry relationships are strongly driven by the stock movement of companies in the same industry. These relationships are strongest because companies in similar industries face the same external economic pressures. The relationship between stock movement and public opinion of those stocks is minimal, because Twitter is not instrumental in how investors choose to value stocks.
(Zack)

Treatment
1. Data collection
   - Historical stock data for all S&P 500 companies
   - Data migration and cleaning
   - Sentiment analysis
2. Run a lasso model regressing PNC Open against the Open of the remaining S&P 500 companies
   - Find the variables that maximize the goodness-of-fit measure (highest R^2)
3. Repeat step 2 for Open, Close and Volume
4. Repeat steps 2-3 for PNC, Netflix, M&T Bank and FiserV
5. Extract variables into Excel
6. Match companies with their respective industries
(Daniel)

Treatment (cont.)
7. Group explanatory variables by industry
8. Determine the ratio of industry presence in determining the Open, Close and Volume measurements of the four companies examined
9. Draw conclusions
(Daniel)
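Steps 6 to 8 can be sketched in base R as follows. This is only an illustration: the company labels, coefficients and industry assignments are made up, and the slides do not say whether "ratio of industry presence" was computed by counting selected companies or by weighting them by coefficient size, so both variants are shown.

```r
# Hypothetical lasso output: companies with nonzero coefficients.
# Labels, coefficients and industries are illustrative, not the study's data.
selected <- data.frame(
  Company = c("CO1", "CO2", "CO3", "CO4", "CO5"),
  Coefficient = c(0.82, 0.10, -0.05, 0.03, 0.02),
  stringsAsFactors = FALSE
)
sectors <- data.frame(
  Company = c("CO1", "CO2", "CO3", "CO4", "CO5"),
  Industry = c("Health Care", "Health Care", "Energy",
               "Information Technology", "Utilities"),
  stringsAsFactors = FALSE
)

# Step 6: match each selected company with its industry
matched <- merge(selected, sectors, by = "Company")

# Steps 7-8, variant A: share of selected companies per industry
count_share <- prop.table(table(matched$Industry))

# Steps 7-8, variant B: share of total absolute coefficient per industry
abs_by_industry <- tapply(abs(matched$Coefficient), matched$Industry, sum)
weight_share <- abs_by_industry / sum(abs_by_industry)

print(round(count_share, 3))
print(round(weight_share, 3))
```

The two variants can disagree sharply when one company carries a very large coefficient, so it is worth stating which one a reported percentage refers to.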

Data Migration and Cleaning
We started with S&P 500 daily time series data covering 5 years. It needed to be processed to fit a 6-month window with companies as attributes.
- SSMS: The CSV needed to be formatted into a table that was useful for our comparison.
- SSIS: The data from the CSVs needed to be imported.
- Excel VBA: The data points from each company needed to be integrated into the template.
(Alex)
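The team did this reshaping with SSMS, SSIS and Excel VBA. As a rough equivalent, the same "6-month window with companies as attributes" layout can be sketched in base R. The column names (date, ticker, open), the window dates and the prices below are all assumptions for illustration.

```r
# Hypothetical long-format input: one row per (date, ticker).
# Column names, dates and prices are made up for illustration.
long <- data.frame(
  date = as.Date(rep(c("2017-01-03", "2017-08-01", "2017-08-02"), each = 2)),
  ticker = rep(c("PNC", "NFLX"), times = 3),
  open = c(117.5, 124.9, 128.6, 182.4, 129.1, 181.0),
  stringsAsFactors = FALSE
)

# Keep only the 6-month window of interest
in_window <- long$date >= as.Date("2017-07-01") & long$date <= as.Date("2017-12-31")
windowed <- long[in_window, ]

# Pivot so that each company becomes its own column (attribute)
wide <- reshape(windowed, idvar = "date", timevar = "ticker", direction = "wide")
names(wide) <- sub("^open\\.", "", names(wide))

print(wide)
```

The result has one row per trading day and one column per company, which is the shape a formula such as PNC ~ . expects.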

Sentiment Analysis
- Used RapidMiner and Aylien/Twitter analysis.
- Created composite numerical scores based on positive/negative/neutral classifications and number of retweets.
(Alex)
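The slides do not give the formula behind the composite score. One plausible construction, shown here purely as an assumption, maps each classification to +1, 0 or -1 and weights each tweet by one plus its retweet count, then averages per day. The data are made up.

```r
# Hypothetical classified tweets; the scoring rule below is an assumption,
# not the formula used in the presentation.
tweets <- data.frame(
  day = as.Date(c("2018-03-01", "2018-03-01", "2018-03-01",
                  "2018-03-02", "2018-03-02")),
  sentiment = c("positive", "negative", "neutral", "positive", "positive"),
  retweets = c(3, 0, 1, 0, 9),
  stringsAsFactors = FALSE
)

# Map each class to a numeric polarity
polarity <- c(positive = 1, neutral = 0, negative = -1)
tweets$value <- unname(polarity[tweets$sentiment])

# A tweet counts once for itself plus once per retweet
tweets$weight <- 1 + tweets$retweets

# Retweet-weighted mean polarity per day, in [-1, 1]
daily <- sapply(split(tweets, tweets$day),
                function(d) sum(d$value * d$weight) / sum(d$weight))

print(daily)
```

A score built this way stays between -1 and 1, which keeps days with very different tweet counts comparable.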

Lasso Code
library(glmnet)
# Predictors: all other companies. Stocks.Vol[, -1] drops the first column of the data,
# and the trailing [, -1] drops the intercept column that model.matrix adds.
x = model.matrix(PNC ~ ., Stocks.Vol[, -1])[, -1]
y = Stocks.Vol$PNC
set.seed(20)
train = sample(1:nrow(x), nrow(x) / 2)   # half of the rows for training
grid = 10^seq(10, -5, length = 100)      # lambda path for the coefficient plot
# Choose lambda by cross-validation on the training half (alpha = 1 is the lasso)
cv.out = cv.glmnet(x[train, ], y[train], alpha = 1)
# plot(cv.out)
bestlam = cv.out$lambda.min
bestlam
# Coefficient paths over the whole grid
lasso.mod = glmnet(x[train, ], y[train], alpha = 1, lambda = grid)
plot(lasso.mod)
# Refit at the selected lambda and report the fraction of deviance explained
lasso.mod = glmnet(x[train, ], y[train], alpha = 1, lambda = bestlam)
lasso.mod$dev.ratio
# Test MSE on the held-out half
vol.pred = predict(lasso.mod, newx = x[-train, ], s = bestlam)
mean((y[-train] - vol.pred)^2)
# Export the nonzero coefficients, taking names from the coefficient rows so they line up
lasso.coef = coef(lasso.mod, s = bestlam)
nonzero = which(lasso.coef[, 1] != 0)
CompanyNames = rownames(lasso.coef)[nonzero]
Coefficents = lasso.coef[nonzero, 1]
Vol.output = data.frame(CompanyNames, Coefficents)
write.table(Vol.output, file = "VolumeCoefficents.csv", sep = ",", row.names = FALSE, col.names = TRUE)
(Zack)

Test MSE
# Predict on the held-out rows and average the squared errors
lasso.pred = predict(lasso.mod, s = bestlam, newx = x[-train, ])
testMSE = mean((lasso.pred - y[-train])^2)
testMSE
(Zack)
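For reference, the two figures reported for each lasso model can be written out as below. This assumes the default Gaussian family, which the glmnet calls in the code use since no family is specified. In that case the dev.ratio value (labelled "Deviation Ratio" on the results slides) is the R^2 on the training half, while the test MSE is computed on the held-out half.

```latex
\begin{align}
\text{dev.ratio} &= 1 - \frac{\sum_{i \in \text{train}} (y_i - \hat{y}_i)^2}
                            {\sum_{i \in \text{train}} (y_i - \bar{y}_{\text{train}})^2} \\
\text{Test MSE}  &= \frac{1}{|\text{test}|} \sum_{i \in \text{test}} (y_i - \hat{y}_i)^2
\end{align}
```

Because the test MSE is in squared units of the response, values for Volume (shares squared) are not comparable with values for Open or Close (dollars squared).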

[Figure panels: PNC, NFLX, FISV, MAT]
(Daniel)

PNC (Finance)
Open: Deviance Ratio = 0.9965462, Test MSE = 68.01663
Close: Deviance Ratio = 0.984508, Test MSE = 2.731872
Volume: Deviance Ratio = 0.8225319, Test MSE = 1229754
- Healthcare has the largest pull, by a significant margin
- The next largest is materials
- The third largest is utilities
(Josh)

M&T Bank (Finance)
Open: Deviance Ratio = 0.869518, Test MSE = 60.93874
Close: Deviance Ratio = 0.9832232, Test MSE = 41.59011
Volume: Deviance Ratio = 0.4312332, Test MSE = 496638.5
- The strongest pull is again healthcare, although not as significant as with PNC
- The 2nd largest pull was energy
- The 3rd largest was materials
(Josh)

Netflix (Technology)
Open: Deviance Ratio = 0.9979236, Test MSE = 40.83112
Close: Deviance Ratio = 0.9984096, Test MSE = 4193.4451
Volume: Deviance Ratio = 0.8196637, Test MSE = 836245.1
- The largest pull was the healthcare industry
- The 2nd largest pull was the industrials industry
- The 3rd largest was IT
(Josh)

FiserV (Technology)
Open: Deviance Ratio = 0.9880575, Test MSE = 1.341478
Close: Deviance Ratio = 0.9844057, Test MSE = 0.9257086
Volume: Deviance Ratio = 0.7527897, Test MSE = 479136.3
- The largest pull is again healthcare, but it seems to have the smallest effect compared to the other companies we looked at
- The 2nd largest was the IT industry
- The 3rd largest was energy
(Josh)

Technology Industry Analysis
- Influenced by 8 different industries
- Heavily influenced by health care
- The IT industry makes up only ~0% of the influence
(Alex)

Financial Industry Analysis
- Influenced by 10 industries
- Heavily influenced by health care
- Finance makes up only ~1% of the influence
(Alex)

Discussion
- Our models seem to be poor predictors of the actual relationship between stocks. They are designed for inference and should be developed further for accurate predictive power.
- Our hypothesis is false: our companies' stocks seem to be independent of each other and so cannot predict one another. However, there is a strong relationship between healthcare and IT for both.
- It is interesting to note that Agilent had the largest coefficient in all the models. We are not sure why this relationship exists, and it requires further investigation.
(Daniel)

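Public Opinion
PNC (Close):
set.seed(10)
train = sample(nrow(PNC.PO), nrow(PNC.PO) / 2)
# Fit on the training half only
lm.fit = lm(Close ~ Public_Opinion, data = PNC.PO[train, ])
summary(lm.fit)$r.squared                               # value reported on the slide: 0.02180934
# Predict on the held-out rows (note the comma: rows, not columns)
lm.pred = predict(lm.fit, newdata = PNC.PO[-train, ])
mean((PNC.PO$Close[-train] - lm.pred)^2)                # value reported on the slide: 15.83823
(Zack)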
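The same train/test procedure can be run end to end on simulated data, as sketched below. The data frame PO is a made-up stand-in for PNC.PO, and the mean-only baseline is an addition for illustration that the presentation does not include: it gives a reference point for judging whether a given test MSE is actually small.

```r
set.seed(10)

# Simulated stand-in for PNC.PO; values are made up for illustration
n <- 120
PO <- data.frame(Public_Opinion = rnorm(n))
PO$Close <- 140 + 0.3 * PO$Public_Opinion + rnorm(n, sd = 4)

# Split rows in half: fit on one half, evaluate on the other
train <- sample(nrow(PO), nrow(PO) / 2)
lm.fit <- lm(Close ~ Public_Opinion, data = PO[train, ])
r2 <- summary(lm.fit)$r.squared

# Test MSE of the regression on the held-out rows
lm.pred <- predict(lm.fit, newdata = PO[-train, ])
test.mse <- mean((PO$Close[-train] - lm.pred)^2)

# Baseline: always predict the training mean of Close
baseline.mse <- mean((PO$Close[-train] - mean(PO$Close[train]))^2)

print(c(r.squared = r2, test.mse = test.mse, baseline.mse = baseline.mse))
```

If the regression's test MSE is no better than the baseline's, the predictor adds little, however small the MSE looks in absolute terms.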

Public Opinion
Company  Measure  R^2          Test MSE
PNC      Close    0.02180934   15.83823
PNC      Open     (not shown)  (not shown)
PNC      Volume   0.01302961   11.120887e+12
NFLX     Close    0.003654646  180.734
NFLX     Open     0.004199047  180.3568
NFLX     Volume   0.003109667  2.017642e+13
FISV     Close    0.05058      180.734
FISV     Open     0.04241      18.40573
FISV     Volume   0.004542     2.017642e+13
MAT      Close    0.02853486   2.387104
MAT      Open     0.0228784    2.434558
MAT      Volume   0.03343536   1.764246e+13
(Zack)

Discussion
- Tech companies have a much larger presence on Twitter.
- The models exhibited low R^2 scores for all companies, showing that, for our data set, there was little relation between the Twitter perception of companies in the tech and financial industries and their overall stock performance.
  - High bias
- MSE scores were very low, indicating a predictive potential of the model.
  - Low variance
(Alex)

Conclusion
- Based on our findings, we built a model that is adept at finding the variables influential in determining the stocks' open, close and volume values for PNC, NFLX, FISV, and MAT.
- There is no identifiable relationship between a company's Twitter perception and its open, close and volume values. Due to this lack of relationship, there is no actual predictive potential.
(Alex)

Going Forward
- Build a prediction model
- Build a more complete inference model
- Determine what is happening with Agilent Technologies Inc