Public Opinion and Industry Effects on Financial & Tech Markets Daniel Swanson, Josh Baker, Zak Meyer, Alex Fuller Josh
Overview: Methodology; Open, Close and Volume; Treatment; Data collection; Tech industry results; Financial industry results; Public Opinion; Sentiment analysis; Tech industry results; Conclusion Josh
Methodology Background: Stock prices are determined by a company's financial performance and by how the public interprets that performance as a signal of future returns. In theory, individual stocks should have no relationship with one another. Questions: Which S&P 500 industries affect the daily open, close and volume of PNC, Netflix, M&T Bank and FiserV? What effect does public opinion have on the daily open, close and volume of PNC, Netflix, M&T Bank and FiserV? Zack
Methodology (cont.) Hypothesis: We believe there are identifiable relationships between our chosen companies. The industry relationships are driven mainly by the stock movement of companies in the same industry. These relationships are the strongest because companies in similar industries face the same external economic pressures. The relationship between stock movement and public opinion of those stocks is minimal, because Twitter is not instrumental in how investors choose to value stocks. Zack
Treatment 1. Data collection: historical stock data for all S&P 500 companies; data migration and cleaning; sentiment analysis 2. Run a lasso model regressing PNC Open against the Open of the remaining S&P 500 companies, and find the variables that maximize the goodness-of-fit measure (highest R^2) 3. Repeat step 2 for Open, Close and Volume 4. Repeat steps 2-3 for PNC, Netflix, M&T Bank and FiserV 5. Extract the variables into Excel 6. Match companies with their respective industries Daniel
Treatment (cont.) 7. Group the explanatory variables by industry 8. Determine the ratio of industry presence in determining the Open, Close and Volume measurements of the four companies examined 9. Draw conclusions Daniel
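Steps 7 and 8 can be sketched in base R as follows. The company names, coefficients and industry labels are made-up placeholders, not values from our models, and measuring "industry presence" as each industry's share of total absolute coefficient size is an assumption; counting the selected companies per industry would be another reasonable reading.

```r
# Hypothetical lasso output: selected companies and their coefficients.
coefs <- data.frame(
  Company     = c("CompanyA", "CompanyB", "CompanyC", "CompanyD", "CompanyE"),
  Coefficient = c(0.50, -0.10, 0.20, 0.15, 0.05),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table matching each company to its industry (step 6).
industries <- data.frame(
  Company  = c("CompanyA", "CompanyB", "CompanyC", "CompanyD", "CompanyE"),
  Industry = c("Health Care", "IT", "Energy", "Health Care", "Utilities"),
  stringsAsFactors = FALSE
)

# Step 7: attach an industry to every explanatory variable.
grouped <- merge(coefs, industries, by = "Company")

# Step 8: share of total absolute coefficient size per industry.
abs_by_industry <- vapply(
  split(abs(grouped$Coefficient), grouped$Industry), sum, numeric(1)
)
industry_share <- sort(abs_by_industry / sum(abs_by_industry), decreasing = TRUE)
print(round(industry_share, 3))
```

Absolute values are used so that positive and negative coefficients do not cancel within an industry. Coefficient sizes are only comparable across companies when the predictors are on a common scale.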
Data Migration and Cleaning We started with S&P 500 daily time series data covering 5 years. It had to be reduced to a 6-month window with companies as attributes (columns). SSMS: the CSV needed to be formatted into a table that was useful for our comparison. SSIS: the data from the CSVs needed to be imported. Excel VBA: the data points from each company needed to be integrated into the template. Alex
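We did this reshaping with SSMS, SSIS and Excel VBA. For readers who prefer to stay in R, the same idea (filter to a recent window, then turn each company into a column) can be sketched with base R. The column names, dates, prices and cut-off date below are illustrative assumptions, not our actual data.

```r
# Hypothetical long-format data: one row per company per trading day.
long <- data.frame(
  Date = as.Date(rep(c("2017-06-01", "2018-01-02", "2018-01-03"), times = 2)),
  Name = rep(c("PNC", "NFLX"), each = 3),
  Open = c(120.1, 144.5, 145.2, 163.0, 196.1, 202.1),
  stringsAsFactors = FALSE
)

# Keep only the most recent window (the cut-off date is an assumption).
recent <- long[long$Date >= as.Date("2017-12-01"), ]

# Pivot so that each company becomes its own column (attribute).
wide <- reshape(recent, idvar = "Date", timevar = "Name", direction = "wide")
names(wide) <- sub("^Open\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```

The resulting table has one row per trading day and one column per company, which is the shape a regression of one company against all the others needs.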
Sentiment Analysis Used RapidMiner and Aylien/Twitter analysis. Created composite numerical scores based on positive/negative/neutral classifications and the number of retweets. Alex
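The slide does not spell out the scoring formula, so the following is only one possible way to build such a composite score, written in base R. The mapping of classes to +1/0/-1, the weighting of each tweet by one plus its retweet count, and all of the data are assumptions made for illustration.

```r
# Hypothetical classified tweets for one company.
tweets <- data.frame(
  Date     = as.Date(c("2018-01-02", "2018-01-02", "2018-01-03", "2018-01-03")),
  Polarity = c("positive", "negative", "neutral", "positive"),
  Retweets = c(3, 0, 5, 1),
  stringsAsFactors = FALSE
)

# Assumed mapping from class label to a numeric value.
polarity_value <- c(positive = 1, neutral = 0, negative = -1)

# Assumed weighting: every tweet counts once, plus once per retweet.
tweets$Score <- unname(polarity_value[tweets$Polarity]) * (1 + tweets$Retweets)

# Daily composite score, ready to join to the daily Open/Close/Volume values.
public_opinion <- aggregate(Score ~ Date, data = tweets, FUN = sum)
names(public_opinion)[2] <- "Public_Opinion"
print(public_opinion)
```

Under this weighting a neutral tweet contributes nothing regardless of how often it is retweeted, which may or may not be desirable.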
Lasso Code
library(glmnet)
# Predictors: every other company's volume. Drop the first column of Stocks.Vol
# (as in the original) and the intercept column that model.matrix adds.
x = model.matrix(PNC ~ ., Stocks.Vol[, -1])[, -1]
y = Stocks.Vol$PNC
set.seed(20)
train = sample(1:nrow(x), nrow(x) / 2)
grid = 10^seq(10, -5, length = 100)
# Cross-validate on the training half to choose lambda
cv.out = cv.glmnet(x[train, ], y[train], alpha = 1)
# plot(cv.out)
bestlam = cv.out$lambda.min
bestlam
# Coefficient paths over the whole lambda grid
lasso.mod = glmnet(x[train, ], y[train], alpha = 1, lambda = grid)
plot(lasso.mod)
# Refit at the selected lambda and report the fraction of deviance explained
lasso.mod = glmnet(x[train, ], y[train], alpha = 1, lambda = bestlam)
lasso.mod$dev.ratio
# Test MSE on the held-out half
vol.pred = predict(lasso.mod, newx = x[-train, ], s = bestlam)
mean((y[-train] - vol.pred)^2)
# Export the terms with non-zero coefficients, taking the names from the
# coefficient matrix itself so that names and values stay aligned
lasso.coef = coef(lasso.mod, s = bestlam)
keep = which(lasso.coef[, 1] != 0)
Vol.output = data.frame(CompanyNames = rownames(lasso.coef)[keep], Coefficents = lasso.coef[keep, 1])
write.table(Vol.output, file = "VolumeCoefficents.csv", sep = ",", row.names = FALSE, col.names = TRUE)
Zack
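For reference, this is what the code is fitting and reporting. With alpha = 1 and glmnet's default Gaussian family, the coefficients solve the lasso problem below, and dev.ratio is the fraction of deviance explained on the training half, which for this family coincides with the training R^2. The symbols n, p, x_i, y_i, beta and lambda are notation introduced here, not names from the slides.

```latex
\hat{\beta}(\lambda) \;=\; \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

\text{dev.ratio} \;=\; 1 - \frac{\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2}
                               {\sum_{i=1}^{n}\bigl(y_i - \bar{y}\bigr)^2}
```

Because the penalty is the sum of absolute coefficients, many coefficients are driven exactly to zero, which is why the surviving non-zero terms can be read as the selected companies.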
Test MSE
lasso.pred = predict(lasso.mod, s = bestlam, newx = x[-train, ])
testMSE = mean((lasso.pred - y[-train])^2)
testMSE
Zack
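Raw test MSE is expressed in the squared units of the response, so a price model and a volume model cannot be compared directly. One possible complement, not something we computed, is a hold-out R^2 that divides the test MSE by the MSE of always predicting the held-out mean. The helper name `holdout_r2` and the numbers below are illustrative assumptions.

```r
# Hold-out R^2: 1 - test MSE / MSE of predicting the held-out mean.
# It is unit-free, so models for price and for volume can be compared.
holdout_r2 <- function(actual, predicted) {
  mse      <- mean((actual - predicted)^2)
  baseline <- mean((actual - mean(actual))^2)
  1 - mse / baseline
}

# Made-up held-out values and predictions, for illustration only.
actual    <- c(10, 12, 14, 16)
predicted <- c(11, 12, 13, 17)
print(holdout_r2(actual, predicted))
```

With the lasso code above, the call would be `holdout_r2(y[-train], lasso.pred)`. A value near 1 means the model predicts the held-out half well, while a value near or below 0 means it does no better than the mean.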
[Figure: panels for PNC, NFLX, FISV and MAT] Daniel
PNC (Finance) Open: Deviance Ratio = 0.9965462, Test MSE = 68.01663. Close: Deviance Ratio = 0.984508, Test MSE = 2.731872. Volume: Deviance Ratio = 0.8225319, Test MSE = 1229754. Healthcare has the largest pull, by a significant margin; materials is second; utilities is third. Josh
M&T Bank (Finance) Open: Deviance Ratio = 0.869518, Test MSE = 60.93874. Close: Deviance Ratio = 0.9832232, Test MSE = 41.59011. Volume: Deviance Ratio = 0.4312332, Test MSE = 496638.5. Healthcare again has the strongest pull, although not as significantly as with PNC; energy is second; materials is third. Josh
Netflix (Technology) Open: Deviance Ratio = 0.9979236, Test MSE = 40.83112. Close: Deviance Ratio = 0.9984096, Test MSE = 4193.4451. Volume: Deviance Ratio = 0.8196637, Test MSE = 836245.1. Healthcare has the largest pull; industrials is second; IT is third. Josh
FiserV (Technology) Open: Deviance Ratio = 0.9880575, Test MSE = 1.341478. Close: Deviance Ratio = 0.9844057, Test MSE = 0.9257086. Volume: Deviance Ratio = 0.7527897, Test MSE = 479136.3. Healthcare again has the largest pull, but its effect seems to be the smallest among the companies we looked at; IT is second; energy is third. Josh
Technology Industry Analysis Influenced by 8 different industries. Heavily influenced by health care. The IT industry itself makes up only ~0% of the influence. Alex
Financial Industry Analysis Influenced by 10 industries. Heavily influenced by health care. Finance itself makes up only ~1% of the influence. Alex
Discussion Our models seem to be poor predictors of the actual relationship between stocks. They are designed for inference and need further development before they can predict accurately. Our hypothesis is false: our companies' stocks seem to be independent of each other and so cannot predict one another. However, there is a strong relationship between healthcare and IT for both. Interestingly, Agilent had the largest coefficient in all the models; we are not sure why this relationship exists, and it requires further investigation. Daniel
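Public Opinion PNC (Close):
set.seed(10)
train = sample(nrow(PNC.PO), nrow(PNC.PO) / 2)
# Fit on the training half only
lm.fit = lm(Close ~ Public_Opinion, data = PNC.PO[train, ])
summary(lm.fit)$r.squared
# reported on the slide: 0.02180934
# Predict on the held-out rows only (note the comma in PNC.PO[-train, ])
lm.pred = predict(lm.fit, newdata = PNC.PO[-train, ])
mean((PNC.PO$Close[-train] - lm.pred)^2)
# reported on the slide: 15.83823
Zack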
Public Opinion
Company  Measure  R^2          Test MSE
PNC      Close    0.02180934   15.83823
PNC      Open     (not given)  (not given)
PNC      Volume   0.01302961   11.120887e+12
NFLX     Close    0.003654646  180.734
NFLX     Open     0.004199047  180.3568
NFLX     Volume   0.003109667  2.017642e+13
FISV     Close    0.05058      180.734
FISV     Open     0.04241      18.40573
FISV     Volume   0.004542     2.017642e+13
MAT      Close    0.02853486   2.387104
MAT      Open     0.0228784    2.434558
MAT      Volume   0.03343536   1.764246e+13
Zack
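The same regression has to be repeated for Open, Close and Volume for each company. A minimal sketch of that loop in base R is shown below. The data frame `po` is simulated (its response columns are generated independently of `Public_Opinion`), and the helper name `fit_one` is hypothetical, so the printed numbers illustrate the procedure only and are not our results.

```r
set.seed(10)

# Simulated stand-in for a per-company table such as PNC.PO.
n  <- 120
po <- data.frame(
  Public_Opinion = rnorm(n),
  Open   = 140 + rnorm(n, sd = 4),
  Close  = 140 + rnorm(n, sd = 4),
  Volume = 2e6 + rnorm(n, sd = 5e5)
)

# Same 50/50 split for every response.
train <- sample(nrow(po), nrow(po) / 2)

# Fit response ~ Public_Opinion on the training half,
# then score on the held-out half.
fit_one <- function(response) {
  fit  <- lm(reformulate("Public_Opinion", response), data = po[train, ])
  pred <- predict(fit, newdata = po[-train, ])
  c(R2      = summary(fit)$r.squared,
    TestMSE = mean((po[-train, response] - pred)^2))
}

results <- t(sapply(c("Open", "Close", "Volume"), fit_one))
print(results)
```

Because Volume is measured in shares and Open/Close in dollars, the test MSE values in the last column differ by many orders of magnitude and should only be compared within the same measure.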
Discussion Tech companies have a much larger presence on Twitter. The models exhibited low R^2 scores for all companies, showing that for our data set there was little relation between how companies in the tech and financial industries are perceived on Twitter and their overall stock performance (high bias). MSE scores were very low, indicating predictive potential for the model (low variance). Alex
Conclusion Based on our findings, we built a model that is adept at finding the variables that influence the open, close and volume values of PNC, NFLX, FISV and MAT. There is no identifiable relationship between a company's Twitter perception and its open, close and volume values. Because of this lack of relationship, there is no actual predictive potential. Alex
Going Forward Build a prediction model. Build a more complete inference model. Determine what is happening with Agilent Technologies Inc.