Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center.

Presentation transcript:

Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Problem Statement
- Problem: How do we browse video?
- Goal: Create a table of contents
- Solution: Look for topic changes in text

TOC Example Chapter 1 Chapter 2

Overview of This Talk
- Goal and approach
- Latent semantic indexing (LSI)
- Scale space
- Combination
- Results
[Diagram labels: LSI, Scale Space, Filter, Segment]

Approach
- Sentences -> semantic space
- Filter at multiple scales
- Look for large jumps
[Figure: three subjects (loops) shown. Loop 1: Polychromaticity Artifacts; Loop 2: Emission Tomography; Loop 3: Ultrasound Tomography]

Building on Previous Work
- LSI and clustering
- Text tiling
- Change point analysis
- Segmentation
- Scale space
[Image courtesy of Jianbo Shi (CMU)]

Latent Semantic Indexing
- Collect histogram of word frequencies
- Use SVD to capture frequent combinations
- Orthogonal decomposition
- Represent in low-dimensional space
[Diagram labels: Words, Docs, 10D]
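The steps on this slide can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the function name `lsi_project`, the toy count matrix, and the choice of k = 2 are made up for the example (the slide mentions a 10-dimensional space).

```python
import numpy as np

def lsi_project(counts, k):
    """Project the columns (documents) of a word-by-document count
    matrix into a k-dimensional space spanned by singular vectors."""
    # Thin SVD: counts = U @ diag(s) @ Vt, with orthonormal columns in U.
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    # Singular values are returned largest first, so the first k
    # columns of U capture the strongest word co-occurrence patterns.
    Uk = U[:, :k]
    # Coordinates of every document in the reduced space: shape (k, n_docs).
    return Uk, Uk.T @ counts

# Hypothetical toy histogram: rows are words, columns are documents.
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
], dtype=float)

Uk, docs_2d = lsi_project(counts, k=2)
```

In this toy matrix the first two documents share one vocabulary and the last two share another, so the two groups end up along different axes of the reduced space.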

LSI Within a Document
- Split into chunks: fixed size or sentences
- Compute histograms
- Perform SVD
- Look at results
- Sources: “Principles of Computerized Tomographic Imaging”; PBS News Hour
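A minimal sketch of the chunking and histogram steps, assuming sentence-sized chunks and a naive regular-expression tokenizer. The function name `sentence_histograms` and the sample text are hypothetical; the slides do not say how the text was tokenized.

```python
import re
from collections import Counter

import numpy as np

def sentence_histograms(text):
    """Split text into sentences and count words per sentence.

    Returns the vocabulary, the sentences, and a word-by-sentence
    count matrix (rows: words, columns: sentences)."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Naive tokenizer: lowercase runs of letters and apostrophes.
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    vocab = sorted({w for words in tokenized for w in words})
    row = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(sentences)))
    for j, words in enumerate(tokenized):
        for w, c in Counter(words).items():
            counts[row[w], j] = c
    return vocab, sentences, counts

text = "The scanner rotates. The detector counts photons. Sound waves echo."
vocab, sentences, counts = sentence_histograms(text)
```

The resulting matrix is the input to the SVD step; fixed-size chunks would only change how `sentences` is built.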

LSI – 2D Projection Chapter 4 of Principles of Computerized Tomographic Imaging

LSI – Self-similarity
- Measure similarity: cosine of angle between “documents”
- Plot all pairs of chunks/sentences
- Look for block diagonal
[Figure: Chapter 4 of Principles of Computerized Tomographic Imaging]
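The self-similarity matrix described here can be sketched as follows. The function name `self_similarity` and the four toy vectors are assumptions for illustration; each row stands for one chunk or sentence in the reduced space.

```python
import numpy as np

def self_similarity(vectors):
    """Cosine of the angle between every pair of rows."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Guard against all-zero rows so the division is always defined.
    unit = vectors / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

# Hypothetical chunk vectors: two chunks on one topic, then two on another.
chunks = np.array([
    [1.0, 0.1],
    [0.9, 0.2],
    [0.1, 1.0],
    [0.2, 0.9],
])
sim = self_similarity(chunks)
```

Plotted as an image, `sim` shows high values in two square blocks along the diagonal, one per topic, which is the block-diagonal pattern the slide refers to.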

Scale-space Filtering
- What size are the features?
- Look at different scales!
- Continuous scale
- Used for object recognition and feature detection

Scale-space Movie
- Green line marks best high-level segmentation
- 10-D semantic space
- Scale varies from 1 to 400 sentences

Scale-space Segmentation
- Low-pass filter the signal
- Form image of scale vs. time
- Look for changes
- Track peaks of vector derivative across scale
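The first three steps can be sketched as below. Several choices are assumptions, not taken from the slides: a Gaussian kernel as the low-pass filter, edge padding at the ends of the signal, and the names `gaussian_kernel` and `scale_space_derivative`. Tracking peaks from coarse to fine scale is omitted.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized Gaussian low-pass kernel, truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def scale_space_derivative(signal, sigmas):
    """signal has shape (n_sentences, n_dims).

    Returns an image of shape (n_scales, n_sentences - 1) holding the
    magnitude of the vector derivative at each scale and position."""
    rows = []
    for sigma in sigmas:
        kernel = gaussian_kernel(sigma)
        radius = len(kernel) // 2
        # Repeat the end values so the smoothed signal keeps its length.
        padded = np.pad(signal, ((radius, radius), (0, 0)), mode="edge")
        smooth = np.stack(
            [np.convolve(padded[:, d], kernel, mode="valid")
             for d in range(signal.shape[1])],
            axis=1,
        )
        # Length of the difference between consecutive smoothed vectors.
        rows.append(np.linalg.norm(np.diff(smooth, axis=0), axis=1))
    return np.array(rows)

# Hypothetical signal: 20 sentences on one topic, then 20 on another.
signal = np.vstack([np.tile([1.0, 0.0], (20, 1)), np.tile([0.0, 1.0], (20, 1))])
image = scale_space_derivative(signal, sigmas=[1, 2, 4])
boundaries = image.argmax(axis=1)  # strongest change at each scale
```

For this synthetic signal the largest derivative sits at the topic boundary at every scale; in real text, a peak that persists up to coarse scales marks a high-level boundary.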

Scale-space Example: derivative as a function of scale and sentence

LSI and Scale Space: Putting it all together
- Split document/transcript
- Perform LSI analysis
- Look at change in angle
- Perform scale-space segmentation
- Show tree
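The "change in angle" step can be illustrated as the angle between consecutive sentence vectors in the semantic space. This is a sketch under assumptions: the function name `angle_changes` is hypothetical, and every vector is assumed to be non-zero.

```python
import numpy as np

def angle_changes(vectors):
    """Angle in radians between each pair of consecutive rows.

    Assumes every row has non-zero length."""
    a, b = vectors[:-1], vectors[1:]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Hypothetical path through a 2-D semantic space with one topic change.
path = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
angles = angle_changes(path)
```

Only direction matters here: the last two vectors differ in length but not in angle, so they register no change.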

Scale-Space Image
- Peaks in scale-space derivative
- Peaks traced to their origin

Results – CT Comparison Scale-Space Book Headings

Results – News Comparison Scale-Space Ground Truth

Results – Autocorrelation
- Block sentences
- Measure correlation
- Positive peak
- Anti-correlation

Discussion
- Issues
- Evaluation (and ground truth)
- Lafferty’s measure
- Temporal properties
- Histogram/SVD chunking size
- Autocorrelation

Computational Effort
- Histogram: O(N)
- SVD: O(N^3)
- Scale space: O(N^2)
- N < 1000: the number of sentences in a video or document is not large

LSI Document Lookup
- Histogram documents
- Entropy term weighting
- Compute SVD
- Use first vectors to model space
- Encode query as histogram
- Look for documents in similar direction
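The lookup procedure can be sketched end to end as below. The slides do not say which entropy weighting was used; this example assumes the common log-entropy scheme (local weight log(1 + tf), global weight 1 plus the normalized sum of p log p for each word). The function names, the toy matrix, and k = 2 are also assumptions.

```python
import numpy as np

def log_entropy_weights(counts):
    """Global entropy weight for each word (row); needs at least 2 documents."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Share of each word's occurrences that falls in each document.
    p = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    safe = np.where(p > 0, p, 1.0)  # log(1) = 0, so zero entries add nothing
    return 1.0 + (p * np.log(safe)).sum(axis=1) / np.log(counts.shape[1])

def build_index(counts, k):
    """Weight the histogram, run the SVD, keep the first k vectors."""
    g = log_entropy_weights(counts)
    weighted = g[:, None] * np.log1p(counts)
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    Uk = U[:, :k]
    return g, Uk, Uk.T @ weighted  # document coordinates, shape (k, n_docs)

def rank_documents(query_counts, g, Uk, doc_coords):
    """Order documents by cosine similarity to the query, best first."""
    q = Uk.T @ (g * np.log1p(query_counts))
    denom = np.linalg.norm(doc_coords, axis=0) * np.linalg.norm(q) + 1e-12
    return np.argsort(-(doc_coords.T @ q) / denom)

# Hypothetical histogram: rows are words, columns are documents.
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
], dtype=float)
g, Uk, doc_coords = build_index(counts, k=2)
ranking = rank_documents(np.array([1.0, 0.0, 0.0, 0.0]), g, Uk, doc_coords)
```

A query containing only the first word ranks the first two documents highest, since they are the only ones that use that part of the vocabulary.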

LSI Example
- Collection of book titles
- Differential equations vs. algorithms and applications