Published by Verawati Budiaman. Modified over 6 years ago.
1
Public Opinion and Industry Effects on Financial & Tech Markets
Daniel Swanson, Josh Baker, Zak Meyer, Alex Fuller
Josh
2
Overview
- Methodology
- Open, Close and Volume
  - Treatment
  - Data collection
  - Tech industry results
  - Financial industry results
- Public Opinion
  - Sentiment analysis
  - Tech industry results
- Conclusion
Josh
3
Methodology
Background:
- Stock prices are determined by a company's financial performance and by how the public interprets that performance as a signal of future returns.
- Theoretically, individual stocks should have no relationship with each other.
Question:
- Which S&P 500 industries affect the daily open, close and volume of PNC, Netflix, M&T Bank and FiserV?
- What effect does public opinion have on the daily open, close and volume of PNC, Netflix, M&T Bank and FiserV?
Zack
4
Methodology (cont.)
Hypothesis:
- We believe there are identifiable relationships between our chosen companies.
- The industry relationships are strongly driven by the stock movement of companies in the same industry.
- These relationships are strongest because companies in similar industries face the same external economic pressures.
- The relationship between stock movement and public opinion of those stocks is minimal, because Twitter is not instrumental in how investors choose to value stocks.
Zack
5
Treatment
1. Data collection
  - Historical stock data for all S&P 500 companies
  - Data migration and cleaning
  - Sentiment analysis
2. Run a lasso model regressing PNC Open against the Open values of the rest of the S&P 500 companies
3. Find the variables that maximize the goodness-of-fit measure
  - Highest R^2
4. Repeat step 2 for Open, Close and Volume
5. Repeat steps 2-3 for PNC, Netflix, M&T Bank and FiserV
6. Extract variables into Excel
  - Match companies with their respective industries
Daniel
6
Treatment
7. Group explanatory variables based on their industry
8. Determine the ratio of industry presence in determining the Open, Close and Volume measurements of the four companies examined
9. Draw conclusions
Daniel
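Steps 7 and 8 could be sketched in base R roughly as follows. The company codes, coefficients and industry lookup are made-up placeholders, not the project's lasso output. The "ratio of industry presence" is assumed here to mean each industry's share of the selected companies, with a coefficient-weighted share shown as an alternative.

```r
# Hypothetical lasso output: companies with nonzero coefficients (placeholder values)
selected <- data.frame(
  Company = c("HC1", "HC2", "EN1", "UT1", "IT1"),
  Coefficient = c(0.80, 0.35, -0.12, 0.05, 0.20),
  stringsAsFactors = FALSE
)

# Hypothetical company-to-industry lookup table
industries <- data.frame(
  Company = c("HC1", "HC2", "EN1", "UT1", "IT1"),
  Industry = c("Health Care", "Health Care", "Energy", "Utilities",
               "Information Technology"),
  stringsAsFactors = FALSE
)

# Step 7: attach an industry to each explanatory variable
grouped <- merge(selected, industries, by = "Company")

# Step 8 (assumed definition): share of selected companies per industry
industry_share <- prop.table(table(grouped$Industry))

# Alternative: share of total absolute coefficient size per industry
abs_by_industry <- tapply(abs(grouped$Coefficient), grouped$Industry, sum)
coef_share <- abs_by_industry / sum(abs_by_industry)

print(round(industry_share, 2))
print(round(coef_share, 2))
```

The two shares can differ noticeably: a count-based share treats every selected company equally, while the coefficient-weighted share lets a single large coefficient dominate, and it is only meaningful if the predictors are on comparable scales.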
7
Data Migration and Cleaning
- Initially used S&P 500 daily time series data over 5 years.
- The data needed to be processed to fit 6 months, with companies as attributes.
- SSMS: The CSV needed to be formatted into a table that was useful for our comparison.
- SSIS: The data from the CSVs needed to be imported.
- Excel VBA: The data points from each company needed to be integrated into the template.
Alex
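The reshaping described above (long daily records turned into one row per day with companies as attributes) was done with SSMS, SSIS and Excel VBA. As an illustration only, the same idea can be sketched in base R. The column names, dates and prices below are invented placeholders, and the 6-month window is an assumed example range.

```r
# Hypothetical long-format data: one row per (date, ticker); values are placeholders
long <- data.frame(
  Date = as.Date(rep(c("2017-01-03", "2017-01-04", "2017-08-01"), each = 2)),
  Ticker = rep(c("PNC", "NFLX"), times = 3),
  Open = c(100, 200, 101, 201, 102, 202),
  stringsAsFactors = FALSE
)

# Keep an assumed 6-month window
in_window <- long$Date >= as.Date("2017-01-01") & long$Date < as.Date("2017-07-01")
window <- long[in_window, ]

# Pivot so each company becomes a column and each row is one trading day
wide <- reshape(window, idvar = "Date", timevar = "Ticker", direction = "wide")

# reshape() names the new columns Open.PNC, Open.NFLX; strip the prefix
names(wide) <- sub("^Open\\.", "", names(wide))
print(wide)
```

A table in this shape can be passed directly to a regression of one company's column on all the others.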
8
Sentiment Analysis
- Used RapidMiner and Aylien/Twitter analysis.
- Created composite numerical scores based on positive/negative/neutral classifications and the number of retweets.
Alex
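The slides do not give the formula behind the composite score, so the following is only one plausible construction, written in base R. It assumes that polarity maps to +1/0/-1, that each tweet is weighted by one plus its retweet count, and that scores are averaged per day. The tweets and the column names (other than Public_Opinion, which appears in the regression code) are hypothetical.

```r
# Hypothetical classified tweets; labels mimic a three-class sentiment classifier
tweets <- data.frame(
  Date = as.Date(c("2017-03-01", "2017-03-01", "2017-03-01",
                   "2017-03-02", "2017-03-02")),
  Polarity = c("positive", "negative", "neutral", "positive", "positive"),
  Retweets = c(3, 0, 10, 1, 0),
  stringsAsFactors = FALSE
)

# Assumed mapping from class label to a numeric polarity
polarity_value <- c(positive = 1, neutral = 0, negative = -1)

# Assumed weighting: each tweet counts once, plus once per retweet
tweets$Weight <- 1 + tweets$Retweets
tweets$Signed <- unname(polarity_value[tweets$Polarity]) * tweets$Weight

# Daily composite: retweet-weighted mean polarity, bounded in [-1, 1]
daily <- aggregate(cbind(Signed, Weight) ~ Date, data = tweets, FUN = sum)
daily$Public_Opinion <- daily$Signed / daily$Weight
print(daily[, c("Date", "Public_Opinion")])
```

Normalizing by total weight keeps the score comparable between companies that are tweeted about heavily and those that are not.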
9
Lasso Code
library(glmnet)
# Predictors: every other company's volume; Stocks.Vol[,-1] drops the first column
x = model.matrix(PNC~., Stocks.Vol[,-1])
y = Stocks.Vol$PNC
set.seed(20)
train = sample(1:nrow(x), nrow(x)/2)
grid = 10^seq(10,-5,length=100)
# Choose lambda by cross-validation on the training half
cv.out = cv.glmnet(x[train,], y[train], alpha=1)
#plot(cv.out)
bestlam = cv.out$lambda.min
bestlam
# Coefficient paths over the whole lambda grid
lasso.mod = glmnet(x[train,], y[train], alpha=1, lambda=grid)
plot(lasso.mod)
# Refit at the selected lambda and report the fraction of deviance explained
lasso.mod = glmnet(x[train,], y[train], alpha=1, lambda=bestlam)
lasso.mod$dev.ratio
# Test MSE on the held-out half
vol.pred = predict(lasso.mod, newx=x[-train,], s=bestlam)
mean((y[-train]-vol.pred)^2)
# Keep the nonzero coefficients; names are taken from the coefficient rows
lasso.coef = coef(lasso.mod, s=bestlam)
nonzero = which(lasso.coef[,1] != 0)
CompanyNames = rownames(lasso.coef)[nonzero]
Coefficents = lasso.coef[nonzero, 1]
Vol.output = data.frame(CompanyNames, Coefficents)
write.table(Vol.output, file = "VolumeCoefficents.csv", sep =",", row.names=FALSE, col.names=TRUE)
Zack
10
Test MSE
> lasso.pred = predict(lasso.mod, s = bestlam, newx = x[-train,])
> testMSE = mean((lasso.pred - y[-train])^2)
> testMSE
Zack
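For reference, the two fit measures used on the results slides can be written out. This assumes glmnet's default Gaussian family, where lasso.mod$dev.ratio is the fraction of null deviance explained on the training rows, while the test MSE is computed on the held-out rows. The symbols are notation introduced here, not taken from the slides.

```latex
\text{dev.ratio} = 1 - \frac{\sum_{i \in \text{train}} (y_i - \hat{y}_i)^2}{\sum_{i \in \text{train}} (y_i - \bar{y}_{\text{train}})^2},
\qquad
\text{Test MSE} = \frac{1}{|\text{test}|} \sum_{i \in \text{test}} (y_i - \hat{y}_i)^2
```

Because the ratio is unitless and measured on training data while the MSE is in squared units of the response and measured on test data, a ratio near 1 can coexist with a large test MSE.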
11
PNC NFLX FISV MAT Daniel
12
PNC (Finance)
Open: Deviance Ratio = 0.9965462, Test MSE = 68.01663
Close: Deviance Ratio = , Test MSE =
Volume: Deviance Ratio = , Test MSE =
- Healthcare has the largest pull, by a significant margin
- The next largest is materials
- The third largest is utilities
Josh
13
M&T Bank (Finance)
Open: Deviance Ratio = 0.869518, Test MSE = 60.93874
Close: Deviance Ratio = , Test MSE =
Volume: Deviance Ratio = , Test MSE =
- The strongest pull is again healthcare, although not as significant as with PNC
- The 2nd largest pull was energy
- The 3rd largest was materials
Josh
14
Netflix (Technology)
Open: Deviance Ratio = 0.9979236, Test MSE =
Close: Deviance Ratio = , Test MSE =
Volume: Deviance Ratio = , Test MSE =
- The largest pull was the healthcare industry
- The 2nd largest pull was the industrials industry
- The 3rd largest was IT
Josh
15
FiserV (Technology)
Open: Deviance Ratio = 0.9880575, Test MSE =
Close: Deviance Ratio = , Test MSE =
Volume: Deviance Ratio = , Test MSE =
- The largest pull is again healthcare, but it seems to have the smallest effect compared to the other companies we looked at
- The 2nd largest was the IT industry
- The 3rd largest was energy
Josh
16
Technology Industry Analysis
- Influenced by 8 different industries
- Heavily influenced by health care
- The IT industry makes up only ~0%
Alex
17
Financial Industry Analysis
- Influenced by 10 industries
- Heavily influenced by health care
- Finance makes up only ~1% of influence
Alex
18
Discussion
- Our models seem to be poor predictors of the actual relationship between stocks. They are designed for inference and should be developed further to achieve accurate predictive power.
- Our hypothesis is false: our companies' stocks seem to be independent of each other and so cannot predict one another.
- However, there is a strong relationship between healthcare and IT for both.
- It is interesting to note that Agilent had the largest coefficient in all the models. We are not sure why this relationship exists, and it requires further investigation.
Daniel
19
Public Opinion
PNC (Close):
> attach(PNC.PO)
> set.seed(10)
> train = sample(nrow(PNC.PO), nrow(PNC.PO)/2)
> lm.fit = lm(Close~Public_Opinion, PNC.PO[train,])
> summary(lm.fit)$r.squared
> lm.pred = predict(lm.fit, PNC.PO[-train,])
> mean((PNC.PO$Close[-train] - lm.pred)^2)
Zack
20
Public Opinion
PNC
- Close: R^2: 0.02180934, Test MSE: 15.83823
- Open:
- Volume: R^2: , Test MSE: e+12
NFLX
- Close: R^2: , Test MSE:
- Open: R^2: , Test MSE:
- Volume: R^2: , Test MSE: e+13
FISV
- Close: R^2: , Test MSE:
- Open: R^2: , Test MSE:
- Volume: R^2: , Test MSE: e+13
MAT
- Close: R^2: , Test MSE:
- Open: R^2: , Test MSE:
- Volume: R^2: , Test MSE: e+13
Zack
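The per-measure results above come from repeating one simple regression for Open, Close and Volume. A minimal sketch of that loop, in base R, is shown here on randomly generated stand-in data. The data frame po, the helper fit_one and every number in it are hypothetical, so the printed values say nothing about the real stocks.

```r
set.seed(10)

# Synthetic stand-in for one company's table (PNC.PO in the slides); random values
n <- 120
po <- data.frame(Public_Opinion = runif(n, -1, 1))
po$Open <- 100 + rnorm(n, sd = 4)
po$Close <- 100 + rnorm(n, sd = 4)
po$Volume <- 2e6 + rnorm(n, sd = 5e5)

# Half of the rows for training, the rest held out for testing
train <- sample(nrow(po), nrow(po) / 2)

# Fit response ~ Public_Opinion on the training rows, score on the held-out rows
fit_one <- function(response) {
  fit <- lm(reformulate("Public_Opinion", response = response), data = po[train, ])
  pred <- predict(fit, newdata = po[-train, ])
  data.frame(
    Response = response,
    R2 = summary(fit)$r.squared,
    TestMSE = mean((po[[response]][-train] - pred)^2)
  )
}

results <- do.call(rbind, lapply(c("Open", "Close", "Volume"), fit_one))
print(results)
```

Test MSE is expressed in squared units of the response, so volume MSEs on the order of e+12 or e+13 are not directly comparable with price MSEs. Dividing by the variance of the held-out response would put all three measures on the same scale.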
21
Discussion
- Tech companies have a much larger presence on Twitter.
- The models exhibited low R^2 scores for all companies, showing that, for our data set, there was little relation between the Twitter perception of companies in the tech and financial industries and their overall stock performance.
  - High bias
- MSE scores were very low, indicating a predictive potential of the model.
  - Low variance
Alex
22
Conclusion
- Based on our findings, we built a model that is adept at finding the variables influential in determining the stocks' open, close and volume values for PNC, NFLX, FISV, and MAT.
- There is no identifiable relationship between a company's Twitter perception and its open, close and volume values.
- Due to this lack of relationship, there is no actual predictive potential.
Alex
23
Going Forward
- Build prediction model
- Build more complete inferencing model
- Determine what is happening with Agilent Technologies Inc