Multimodal Alignment of Scholarly Documents and Their Presentations
Bamdad Bahrani
JCDL 2013 Submission, Feb 2013
Motivation
- How many papers do you read every week?
- How many do you read deeply?
- How many do you just skim?
- Are the title, abstract and conclusion enough?
- A summary of the paper: the most important issues
Introduction | Analysis | Method | Experiment & Result | Conclusion
Motivation
- Slide presentation as a summary
  - It includes important content from the paper
  - It is made by the same author
- But
  - Not detailed enough
  - Misses some technical parts of the paper
Introduction
- The paper
- and its slide presentation
- Alignment map
Previous Works
- Hayama et al.: Japanese technical papers and presentation sheets
  - Using HMM
- Kan: SlideSeer
  - Crawling of paper-presentation pairs, aligning them, and a GUI
- Beamer and Girju: detailed analysis of different similarity measures
Common limitation: only textual content
Slide Analysis
Error Analysis

Slide type | Incorrectly aligned in baseline | Common reason
Nil        | 64%                             | Doesn't know where to align; aligns to best fit
Outline    | 36%                             | Names of some sections appear in it; aligns to the longest one
Image      | 81%                             | Very little text available
Drawing    | 53%                             | Noisy data: lots of shapes and text boxes
Table      | 50%                             | Little text, noisy data

Around 70% are showing "Evaluation and Result"
Alignment Modalities
- Text similarity
  - Between each slide and each section
  - The core aligner unit
  - The baseline
  - A cosine similarity measure over TF-IDF vectors
- Linear ordering
  - The ordering between slides and sections is monotonic
- Visual appearance of slides
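The text-similarity modality can be illustrated with a minimal sketch, assuming a whitespace tokenizer and a smoothed IDF formula (the deck specifies neither). The function names `tokenize`, `tfidf_vectors`, `cosine`, and `align_by_text`, as well as the sample strings, are hypothetical and not taken from the original system.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_by_text(slides, sections):
    """For each slide, return the index of its most similar section."""
    vectors = tfidf_vectors(slides + sections)
    slide_vecs = vectors[:len(slides)]
    section_vecs = vectors[len(slides):]
    return [
        max(range(len(sections)), key=lambda j: cosine(sv, section_vecs[j]))
        for sv in slide_vecs
    ]


# Hypothetical sample data for illustration only.
sections = [
    "we introduce the alignment problem and motivation",
    "the classifier uses gradient features and a linear svm",
    "experiments report alignment accuracy on the dataset",
]
slides = [
    "motivation for the alignment problem",
    "alignment accuracy experiments",
]
alignment = align_by_text(slides, sections)
```

Here `alignment[i]` holds the section index chosen for slide `i`. Picking the single best section per slide ignores ordering and visual cues, which is why this only stands in for the text-only baseline.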
Text Extraction Unit
- Presentation: MS PowerPoint -> VB compiler -> Slides
  1. Slide title text
  2. Slide body text
  3. Slide number
- Paper: PDFx PDF parser (via Python) -> XML
  1. Section title
  2. Section body
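The paper side of the extraction step, turning parser XML into (section title, section body) pairs, might look like the sketch below. The XML layout and the tag names `paper`, `section`, `title`, and `p` are hypothetical simplifications; the actual PDFx output schema is not shown in the deck and will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML. Real parser output uses its own schema.
SAMPLE_XML = """
<paper>
  <section>
    <title>Introduction</title>
    <p>Slides summarize papers.</p>
    <p>We align both.</p>
  </section>
  <section>
    <title>Method</title>
    <p>We combine text and images.</p>
  </section>
</paper>
"""


def extract_sections(xml_text):
    """Return a list of (section title, section body) tuples."""
    root = ET.fromstring(xml_text.strip())
    sections = []
    for section in root.iter("section"):
        title = (section.findtext("title") or "").strip()
        # Join all paragraph texts of the section into one body string.
        body = " ".join((p.text or "").strip() for p in section.findall("p"))
        sections.append((title, body))
    return sections


paper_sections = extract_sections(SAMPLE_XML)
```

Keeping title and body separate at this stage leaves room to weight them differently later.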
Slide Image Classifier Unit
Slides -> Take snapshot -> Image -> Image classifier -> one of:
1. Text
2. Outline
3. Drawing
4. Results
Image Class Instructions
- 1. Text
  - Text similarity alignment weight: increase 2/
- 2. Outline
  - Text similarity alignment weight: decrease 1/3
  - Linear ordering alignment weight: decrease 1/
- 3. Drawing
  - Uniform probability for all weights
- 4. Result
  - Exceptional rule: align directly to the "Experiment and Result" section
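One way to read these per-class rules is as a small rule table applied before combining the text and ordering scores. The sketch below assumes a weighted sum as the fusion rule and uses placeholder multipliers, since the deck's fractions are only partly legible. `CLASS_WEIGHTS`, `choose_section`, and all sample scores are hypothetical.

```python
# Hypothetical (text_weight, order_weight) multipliers per slide class.
# These are illustrative placeholders, not the values used by the author.
CLASS_WEIGHTS = {
    "text": (1.5, 1.0),      # trust text similarity more
    "outline": (0.5, 0.75),  # trust both signals less
    "drawing": (1.0, 1.0),   # treat both signals equally
}


def choose_section(slide_class, text_scores, order_scores, section_titles):
    """Return the index of the section chosen for one slide."""
    if slide_class == "result":
        # Exceptional rule: go straight to the results section if one exists.
        for index, title in enumerate(section_titles):
            if "result" in title.lower():
                return index
    # Unknown classes (or a missing results section) fall back to equal weights.
    text_w, order_w = CLASS_WEIGHTS.get(slide_class, (1.0, 1.0))
    combined = [
        text_w * t + order_w * o for t, o in zip(text_scores, order_scores)
    ]
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical per-section scores for a single slide.
titles = ["Introduction", "Method", "Experiment and Result", "Conclusion"]
text_scores = [0.2, 0.6, 0.1, 0.1]
order_scores = [0.1, 0.2, 0.6, 0.1]

as_text = choose_section("text", text_scores, order_scores, titles)
as_outline = choose_section("outline", text_scores, order_scores, titles)
as_result = choose_section("result", text_scores, order_scores, titles)
```

With these sample scores the same slide lands on "Method" when treated as a text slide, but on "Experiment and Result" when treated as an outline or result slide, which shows how the image class can change the final alignment.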
Image Classifier: Experiment and Result
- Manually annotated slides
- Linear SVM
- Feature extraction: Histogram of Oriented Gradients
  - Blurring filters
  - Normalization
- 10-fold cross-validation

Image class          | Text | Outline | Drawing | Result | Average
Correctly classified | 86%  | 95%     | 83%     | 84%    | 87.2%
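To make the feature-extraction step concrete, here is a heavily simplified sketch of a gradient-orientation histogram for a single cell of a grayscale image. It omits the cell grid, block normalization, and blurring that a full Histogram of Oriented Gradients pipeline uses. The function name, the 9-bin unsigned-orientation setup, and the toy images are assumptions, not details from the deck. In practice such a descriptor would be computed with an image library and passed to a linear SVM.

```python
import math


def orientation_histogram(image, n_bins=9):
    """Gradient-orientation histogram for one cell of a grayscale image.

    `image` is a list of rows of floats. Gradients use central differences
    on interior pixels; each pixel votes with its gradient magnitude into
    an unsigned-orientation bin covering [0, 180) degrees.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue  # flat region: no orientation to vote for
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle // bin_width), n_bins - 1)] += magnitude
    # L2-normalize so the descriptor is insensitive to overall contrast.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# Toy 4x4 images: a vertical edge and a horizontal edge.
vertical_edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
horizontal_edge = [[0.0] * 4, [0.0] * 4, [1.0] * 4, [1.0] * 4]

vertical_hist = orientation_histogram(vertical_edge)
horizontal_hist = orientation_histogram(horizontal_edge)
```

The vertical edge puts all of its weight in the bin around 0 degrees and the horizontal edge in the bin around 90 degrees, which is the kind of shape cue that separates text-heavy slides from drawings and plots.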
Experiments
- Experiment 1:
  - Baseline
  - Paragraph-to-slide alignment
  - Only textual data
- Experiment 2:
  - Section-to-slide alignment
  - Only textual data
Experiments
- Experiment 3:
  - Adds the effect of linear ordering alignment
  - Textual data and ordering information
- Experiment 4:
  - Adds the effect of image classification
  - Textual data, ordering information, and visual content
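The ordering information added in Experiment 3 rests on the idea that slides follow the paper's section order. A minimal sketch of that idea, assuming a hard monotonicity constraint solved by dynamic programming, is shown below. The deck describes ordering as a weighted signal, so this is a stricter swapped-in variant rather than the author's method. `monotonic_alignment` and the score matrix are hypothetical.

```python
def monotonic_alignment(scores):
    """Pick one section per slide, maximizing the total score, such that
    section indices never decrease from one slide to the next.

    `scores[i][j]` is the similarity between slide i and section j.
    """
    n_slides, n_sections = len(scores), len(scores[0])
    # best[i][j]: best total for slides 0..i with slide i aligned to section j.
    best = [[0.0] * n_sections for _ in range(n_slides)]
    back = [[0] * n_sections for _ in range(n_slides)]
    best[0] = list(scores[0])
    for i in range(1, n_slides):
        prev_j = 0  # argmax of best[i - 1][0..j], updated as j grows
        for j in range(n_sections):
            if best[i - 1][j] > best[i - 1][prev_j]:
                prev_j = j
            best[i][j] = best[i - 1][prev_j] + scores[i][j]
            back[i][j] = prev_j
    # Trace back from the best section for the last slide.
    j = max(range(n_sections), key=lambda k: best[-1][k])
    path = [j]
    for i in range(n_slides - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]


# Hypothetical similarity matrix: 4 slides x 3 sections.
scores = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.6],
    [0.1, 0.7, 0.3],
    [0.0, 0.2, 0.8],
]
greedy = [max(range(3), key=row.__getitem__) for row in scores]
ordered = monotonic_alignment(scores)
```

Choosing the best section per slide independently gives `greedy`, which jumps back from section 2 to section 1, while `ordered` stays non-decreasing. A soft version would add an ordering score to the text score instead of forbidding backward jumps.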
Results
[Chart comparing the four experiments: Baseline, Section, Ordering, Image Class; annotation: 25%]
Conclusion
- Many slides with images and drawings
- Textual data is not enough
- Taking advantage of graphical features of slides
Future Tasks
- Bigger dataset
- More efficient text similarity measures
- Differentiate between title and body text weights
- Support more input file formats
- A GUI to view aligned documents
Thank you…!
System Architecture
- Input: Presentation
- Input: Document
- Text Extraction
- Textual Similarity
- Linear Ordering
- Slide Image Classifier: 1. Text, 2. Index, 3. Drawing, 4. Results
- Multimodal Fusion
- Output: Alignment (or nil)