Hierarchical Segmentation: Finding Changes in a Text Signal
Malcolm Slaney and Dulce Ponceleon
IBM Almaden Research Center
Problem Statement
Problem: How do we browse video?
Goal: Create a table of contents.
Solution: Look for topic changes in the text.
Table-of-Contents Example
[Figure: example table of contents with Chapter 1 and Chapter 2]
Overview of This Talk
Goal and approach
Latent semantic indexing (LSI)
Scale space
Combination
Results
[Figure: processing pipeline, LSI -> scale-space filter -> segment]
Approach
Map sentences -> semantic space
Filter at multiple scales
Look for large jumps
[Figure: path through the semantic space showing three subjects (loops). Loop 1: Polychromaticity Artifacts. Loop 2: Emission Tomography. Loop 3: Ultrasound Tomography.]
Building on Previous Work
LSI and clustering
Text tiling
Change-point analysis
Segmentation
Scale space
[Figure courtesy of Jianbo Shi (CMU)]
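Latent Semantic Indexing
Collect a histogram of word frequencies for each document
Use the SVD (an orthogonal decomposition) to capture frequently co-occurring words
Represent each document in a low-dimensional space
[Figure: words × documents matrix reduced to a 10-D representation]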
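A minimal sketch of the LSI step, assuming whitespace tokenization and raw counts with no term weighting. The function names `term_document_matrix` and `lsi_project` are hypothetical, the toy documents are invented, and NumPy's SVD stands in for whatever solver the authors used. The default `k=10` mirrors the 10-D space on the slide.

```python
import numpy as np

def term_document_matrix(docs):
    """Build a words x docs matrix of raw word counts (whitespace tokenization)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.lower().split():
            counts[index[w], j] += 1
    return counts, vocab

def lsi_project(counts, k=10):
    """Return one k-dimensional row per document (column of counts) via a truncated SVD."""
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    # Document j in the reduced space is U_k^T a_j, which equals S_k times column j of V_k^T.
    return (s[:k, None] * vt[:k]).T

docs = ["x-ray beam attenuation", "beam attenuation artifacts", "ultrasound speed of sound"]
counts, vocab = term_document_matrix(docs)
vectors = lsi_project(counts, k=2)
```

When `k` keeps every singular value, inner products between the projected documents equal those between the original count columns; truncating to a small `k` keeps only the dominant word co-occurrence patterns.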
LSI Within a Document
Split the document into chunks (fixed-size word blocks or sentences)
Compute a word histogram for each chunk
Perform the SVD
Examine the results
Sources: “Principles of Computerized Tomographic Imaging”; PBS News Hour
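The chunking and histogram steps can be sketched as below. This assumes a naive sentence splitter (break after `.`, `?` or `!` followed by whitespace) and lowercase word counts; real transcripts need more careful sentence detection. All function names are hypothetical.

```python
import re
from collections import Counter

def split_sentences(text):
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]

def fixed_size_chunks(text, size):
    """Group the words of text into consecutive chunks of `size` words (the last may be shorter)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def histogram(chunk):
    """Word-frequency histogram for one chunk, lowercased, punctuation dropped."""
    return Counter(re.findall(r"[a-z0-9']+", chunk.lower()))

text = "The beam passes through the object. The detector measures it! Why?"
sentences = split_sentences(text)
histograms = [histogram(s) for s in sentences]
```

Each histogram then becomes one column of the words × chunks matrix that is passed to the SVD.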
LSI – 2-D Projection
[Figure: 2-D projection of the LSI vectors for Chapter 4 of “Principles of Computerized Tomographic Imaging”]
LSI – Self-Similarity
Measure similarity as the cosine of the angle between “documents” (chunks or sentences)
Plot the similarity of all pairs of chunks/sentences
Look for block-diagonal structure
[Figure: self-similarity matrix for Chapter 4 of “Principles of Computerized Tomographic Imaging”]
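A short sketch of the cosine self-similarity matrix, assuming each row of the input is the LSI vector of one chunk or sentence. The name `self_similarity` and the toy vectors are hypothetical.

```python
import numpy as np

def self_similarity(vectors):
    """Cosine of the angle between every pair of rows (chunks or sentences)."""
    v = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    # Leave all-zero rows as zero rather than dividing by zero.
    unit = v / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

# Two hypothetical "topics": rows 0-1 point one way, rows 2-3 another.
vectors = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
sim = self_similarity(vectors)
```

Displaying `sim` as an image (for example with an image-plotting routine) shows two bright blocks on the diagonal, one per topic.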
Scale-Space Filtering
What size are the features? Look at different scales!
Scale is a continuous parameter
Used for object recognition and feature detection
Scale-Space Movie
[Animation: 10-D semantic space, with the scale varying from 1 to 400 sentences; the green line marks the best high-level segmentation]
Scale-Space Segmentation
Low-pass filter the signal at each scale
Form an image of scale vs. time
Look for changes (peaks in the magnitude of the vector derivative)
Track those peaks across scale
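A sketch of building the scale vs. time image, assuming a Gaussian low-pass filter (the usual scale-space kernel) applied independently to each dimension of the semantic signal, with edge replication at the borders. The scales, padding mode, and function names are assumptions, and the toy step signal stands in for real LSI vectors.

```python
import numpy as np

def gaussian_smooth(signal, sigma):
    """Low-pass filter each column of a (time x dims) signal with a Gaussian of width sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    # Replicate the end values so the borders are not pulled toward zero.
    padded = np.pad(signal, ((radius, radius), (0, 0)), mode="edge")
    return np.stack(
        [np.convolve(padded[:, d], kernel, mode="valid") for d in range(signal.shape[1])],
        axis=1,
    )

def scale_space_image(signal, sigmas):
    """Row i: magnitude of the vector derivative after smoothing at scale sigmas[i]."""
    rows = []
    for sigma in sigmas:
        smooth = gaussian_smooth(signal, sigma)
        # Difference between consecutive time steps, then its Euclidean length.
        rows.append(np.linalg.norm(np.diff(smooth, axis=0), axis=1))
    return np.array(rows)

# A toy 2-D "semantic" signal with one topic change between sentences 19 and 20.
signal = np.vstack([np.tile([1.0, 0.0], (20, 1)), np.tile([0.0, 1.0], (20, 1))])
image = scale_space_image(signal, sigmas=[1, 2, 4, 8])
```

At every scale the derivative peaks at the topic change, but the peak is lower and wider at coarse scales; small, local changes vanish as the scale grows, which is what lets the coarse scales pick out high-level boundaries.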
Scale-Space Example
[Figure: magnitude of the derivative as a function of scale and sentence number]
LSI and Scale Space: Putting It All Together
Split the document or transcript into chunks
Perform LSI analysis
Look at the change in angle between successive vectors
Perform scale-space segmentation
Show the resulting segmentation tree
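The change in angle between successive LSI vectors can be computed as below. This is a single-scale sketch that assumes nonzero vectors; at one scale, the largest angles give candidate segment boundaries. The names `angle_change` and `strongest_boundaries` are hypothetical, and picking the top `n` angles is an illustrative rule, not necessarily the authors' criterion.

```python
import numpy as np

def angle_change(vectors):
    """Angle (radians) between each pair of consecutive nonzero vectors."""
    v = np.asarray(vectors, dtype=float)
    unit = v / np.linalg.norm(v, axis=1, keepdims=True)
    cos = np.sum(unit[:-1] * unit[1:], axis=1)
    # Clipping guards arccos against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cos, -1.0, 1.0))

def strongest_boundaries(angles, n):
    """Indices where a new segment starts, at the n largest angle changes."""
    top = np.argsort(angles)[-n:]
    return sorted(int(i) + 1 for i in top)

vectors = [[1, 0], [1, 0.1], [0, 1], [0.1, 1], [1, 1]]
angles = angle_change(vectors)
starts = strongest_boundaries(angles, 2)
```

Using the angle rather than the raw vector difference makes the measure insensitive to chunk length, since longer chunks produce longer histogram vectors that point in the same direction.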
Scale-Space Image
[Figure: peaks in the scale-space derivative, each traced back to its origin]
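Tracing peaks back to their origin can be sketched as follows, assuming the scale-space image has one row per scale (row 0 finest) and that each coarse-scale peak moves to the nearest local maximum at the next finer scale. That nearest-peak rule and the function names are assumptions for illustration, and the tiny image is invented.

```python
import numpy as np

def local_maxima(row):
    """Interior indices that rise above the left neighbour and are not below the right one."""
    r = np.asarray(row, dtype=float)
    return [i for i in range(1, len(r) - 1) if r[i] > r[i - 1] and r[i] >= r[i + 1]]

def trace_peaks(image):
    """Follow each coarsest-scale peak down to the finest scale.

    image[s, t] is the derivative magnitude at scale s (row 0 = finest) and position t.
    Returns (coarse_position, fine_position) pairs.
    """
    traces = []
    for start in local_maxima(image[-1]):
        pos = start
        for s in range(image.shape[0] - 2, -1, -1):
            candidates = local_maxima(image[s])
            if not candidates:
                break
            # Assumed rule: step to the nearest peak at the next finer scale.
            pos = min(candidates, key=lambda c: abs(c - pos))
        traces.append((start, pos))
    return traces

image = np.array([
    [0, 1, 0, 0, 0, 2, 0, 0],  # finest scale: peaks at 1 and 5
    [0, 0, 1, 0, 0, 0, 2, 0],  # peaks at 2 and 6
    [0, 0, 0, 1, 0, 0, 0, 0],  # coarsest scale: one peak at 3
])
print(trace_peaks(image))  # [(3, 1)]
```

Tracing to the finest scale recovers a precise boundary location, while the coarsest scale at which a peak survives indicates how important that boundary is in the hierarchy.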
Results – CT
[Figure: comparison of the scale-space segmentation with the book headings]
Results – News
[Figure: comparison of the scale-space segmentation with the ground truth]
Results – Autocorrelation
Group sentences into blocks
Measure the correlation
[Figure: autocorrelation showing a positive peak and anti-correlation]
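One plausible reading of this slide, sketched below as an assumption rather than the authors' exact procedure: average the sentence vectors over fixed-size blocks, remove the mean, and correlate the block sequence with shifted copies of itself. A positive peak at some lag suggests recurring content; negative values indicate anti-correlation. Function names and toy inputs are hypothetical.

```python
import numpy as np

def block_average(vectors, block):
    """Average consecutive groups of `block` sentence vectors (a trailing partial block is dropped)."""
    v = np.asarray(vectors, dtype=float)
    n = len(v) // block
    return v[: n * block].reshape(n, block, v.shape[1]).mean(axis=1)

def autocorrelation(vectors, max_lag):
    """Normalized autocorrelation of a mean-removed vector sequence for lags 0..max_lag."""
    x = np.asarray(vectors, dtype=float)
    x = x - x.mean(axis=0)
    energy = np.sum(x * x)
    return np.array([np.sum(x[: len(x) - lag] * x[lag:]) / energy for lag in range(max_lag + 1)])

blocks = block_average(np.arange(6.0).reshape(6, 1), 2)  # [[0.5], [2.5], [4.5]]
ac = autocorrelation([[1.0], [-1.0], [1.0], [-1.0]], 2)  # [1.0, -0.75, 0.5]
```

The alternating toy sequence shows both features: neighbouring blocks are anti-correlated (lag 1), while blocks two apart return to the same topic and correlate positively (lag 2).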
Discussion: Issues
Evaluation (and ground truth)
Lafferty’s measure
Temporal properties
Chunk size for the histogram/SVD
Autocorrelation
Computational Effort
Histogram: $O(N)$
SVD: $O(N^3)$
Scale space: $O(N^2)$
$N < 1000$: the number of sentences in a video or document is not large
LSI Document Lookup
Compute a word histogram for each document
Apply entropy term weighting
Compute the SVD
Use the first singular vectors to model the semantic space
Encode the query as a histogram
Look for documents pointing in a similar direction
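A sketch of LSI lookup, assuming log-entropy weighting (one common form of entropy term weighting; the slide does not give the exact formula). Documents and the query are both weighted the same way, projected onto the first `k` left singular vectors, and compared by cosine. The function names and the toy word/title matrix are hypothetical.

```python
import numpy as np

def log_entropy_global(counts):
    """Global entropy weight per word: 1 + sum_j p_ij log p_ij / log(n_docs). Needs n_docs >= 2."""
    n_docs = counts.shape[1]
    gf = counts.sum(axis=1, keepdims=True)
    p = counts / np.maximum(gf, 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    return 1.0 + plogp.sum(axis=1) / np.log(n_docs)

def build_index(counts, k):
    """Weight the words x docs matrix, take its SVD, and project documents onto U_k."""
    g = log_entropy_global(counts)
    weighted = g[:, None] * np.log1p(counts)
    u, _, _ = np.linalg.svd(weighted, full_matrices=False)
    uk = u[:, :k]
    return g, uk, weighted.T @ uk  # one row per document

def lookup(query_counts, g, uk, doc_vectors):
    """Rank documents by cosine similarity to the query in the k-dimensional space."""
    q = (g * np.log1p(query_counts)) @ uk
    norms = np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    sims = doc_vectors @ q / np.maximum(norms, 1e-12)
    return np.argsort(-sims), sims

# Hypothetical toy collection; rows are words, columns are four short "titles".
words = ["differential", "equations", "algorithms", "graphs", "applications"]
counts = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
], dtype=float)
g, uk, doc_vectors = build_index(counts, k=2)
order, sims = lookup(np.array([1.0, 0, 0, 0, 0]), g, uk, doc_vectors)
```

A query for "differential" ranks the two differential-equations titles first. Because comparison happens in the reduced space, a title can match through correlated words even if it shares few words with the query.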
LSI Example
Collection of book titles
Differential equations vs. algorithms and applications