CRCV REU 2019 Week 4
Literature Review this week
- Attention Model / Transformer Network
- ELMo
- OpenAI Transformer: decoder only; word prediction objective; trained on massive unlabeled data (books)
- BERT: encoder only; conditioned on both left and right context
- Sequence to Sequence
- Neural-Symbolic Visual Question Answering (NS-VQA)
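The core operation shared by the Transformer papers above is scaled dot-product attention. As a minimal sketch (plain NumPy, single head, no masking; the dimensions are illustrative, not from any of the reviewed papers):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V, weights                   # (n_q, d_v), (n_q, n_k)

# Toy example with random queries/keys/values.
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

Each output row is a weighted mix of the value rows, with weights given by query-key similarity; multi-head attention simply runs several of these in parallel on projected inputs.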
Baseline Models
- LSTM
- BiLSTM
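A question-only BiLSTM baseline of the kind scored in the tables below can be sketched as follows. This is a hedged illustration in PyTorch, not the exact TVQA configuration: the vocabulary size, embedding/hidden dimensions, and the 5-way answer classifier head are assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMBaseline(nn.Module):
    """Question-only baseline: encode the question with a BiLSTM,
    then score a fixed set of candidate answers.

    All dimensions below are illustrative placeholders.
    """
    def __init__(self, vocab_size=1000, embed_dim=300,
                 hidden_dim=128, num_answers=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_answers)

    def forward(self, question_ids):
        x = self.embed(question_ids)         # (batch, seq, embed_dim)
        _, (h, _) = self.lstm(x)             # h: (2, batch, hidden_dim)
        h = torch.cat([h[0], h[1]], dim=-1)  # concat fwd/bwd final states
        return self.classifier(h)            # (batch, num_answers) logits

model = BiLSTMBaseline()
logits = model(torch.randint(0, 1000, (4, 12)))  # batch of 4 questions
```

The unidirectional LSTM baseline is the same model with `bidirectional=False` and a `hidden_dim`-wide classifier input.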
Results

TVQA replication (accuracy, %):

              TVQA + S   TVQA + V   TVQA + IMG   TVQA + V + IMG
Reported       65.15      45.03      43.78        N/A
Replication    65.74      45.25      44.42        45.52

Baseline models (accuracy, %):

          Q       S + Q   V + Q   S + V + Q
LSTM      42.74   42.71   42.61   42.39
BiLSTM    42.48   42.67   42.84
Next Steps
- Baseline CNN + LSTM (in progress)
- Video Action Transformer Network
- Neuro-Symbolic Concept Learner (NS-CL)