Stock Volatility Prediction using Earnings Calls Transcripts and their Summaries Naveed Ahmad Aram Zinzalian
Setup – SVM Text Regression Output: Future Log Return Volatility, where log returns = ln(P(t+1)/P(t)) Baseline: Historical Volatility – i.e. volatility from previous quarter Predictors/Transcript + Summary Features: N-gram TF/IDF scores, POS tags, Target word frequency
Setup TF/IDF Scores: give weight to transcript specific words - need to ignore analyst names POS Tags: Is higher sentence adjective frequency indicative of risk/uncertainty? Target Words: Handpick words we give more weight to, e.g. risk, uncertain, variable, decline, etc. Summaries: Do we really need the whole text? Why not just extract most informative sentences
Experiments Vary: Prediction time frame i.e. look-back and look- forward periods by 20, 40, and 60 days. Train set size from 25 to 386 transcripts. NLP Models between Unigram with TF/IDF Bigram and Unigram with TF/IDF Unigram and Bigram with Sentence Summaries
Results ……………………………………………………….
Results Bigram and Unigram Models Improve over Historical Baseline POS Tags (JJ) and Target Words also slightly lower MSE Regression on summarized text does not help but the generated summaries were informative
Future Work More Data! Different regression model (CART?) Q/A Based Summarization Feature Selection Better heuristic for choosing bigrams SO-PMI scores with target words