Published by Silvester Grant. Modified over 8 years ago.
1
APPLICATIONS OF TOPIC MODELS
Daphna Weinshall, B-530
Slides credit: Joseph Sivic, Li Fei-Fei, Brian Russel and others
2
[Figure labels: Object; Bag of ‘words’]
3
BAG OF WORDS REPRESENTATION
● Independent local feature extraction
4
DEFINITION OF “BOW”
● Independent local feature extraction
● Histogram representation
5
TYPES OF FEATURES
● Regular grid
6
TYPES OF FEATURES
● Regular grid
● Interest point detector
7
EXAMPLE: INTEREST POINT + SIFT
● Detect patches
● Normalize patch
● Compute SIFT descriptor [Lowe’99]
8
CODEWORDS DICTIONARY FORMATION
9
Vector quantization
10
Codewords dictionary formation
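Dictionary formation can be sketched as plain k-means over local descriptors (the K-means choice comes from the later slide on analyzing images with topic models). This is a minimal sketch under assumptions: the function name `build_codebook`, the toy 4-dimensional data standing in for SIFT descriptors, and the initialisation scheme are all illustrative and not taken from the slides.

```python
import numpy as np

def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Cluster descriptors (n, d) into k codewords with plain Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    # Initialise the codewords from k distinct descriptors chosen at random.
    start = rng.choice(len(descriptors), size=k, replace=False)
    codebook = descriptors[start].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every descriptor to every codeword.
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # an empty cluster keeps its previous codeword
                codebook[j] = members.mean(axis=0)
    return codebook

# Toy data (assumption): two well-separated blobs standing in for descriptors.
rng = np.random.default_rng(1)
descriptors = np.vstack([
    rng.normal(0.0, 0.1, (50, 4)),
    rng.normal(5.0, 0.1, (50, 4)),
])
codebook = build_codebook(descriptors, k=2)
print(codebook.round(1))
```

Each row of `codebook` plays the role of one visual word; real dictionaries use hundreds or thousands of codewords.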
11
Image patch examples of codewords
12
Image representation
[Figure: histogram with axes “codewords” and “frequency”]
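The image representation above amounts to quantizing every descriptor to its nearest codeword and counting how often each codeword occurs. A minimal sketch, assuming Euclidean nearest-neighbour quantization; the function name `bow_histogram`, the 3-word dictionary, and the 2-dimensional descriptors are hypothetical toy values.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Map each descriptor to its nearest codeword and count occurrences."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # vector quantization: index of nearest codeword
    return np.bincount(words, minlength=len(codebook))

# Hypothetical 3-word dictionary and five descriptors from one image.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
descriptors = np.array([
    [0.1, -0.1],
    [0.9, 1.2],
    [1.1, 0.8],
    [3.8, 0.2],
    [0.0, 0.2],
])
hist = bow_histogram(descriptors, codebook)
print(hist)  # [2 2 1]
```

The histogram discards where each patch was found, which is exactly the "bag" in bag of words.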
13
ANOTHER EXAMPLE
[Figure: interest regions, visual words, dictionary, histogram; numeric labels not recovered]
14
First try: pLSA
[Figure: pLSA graphical model over d, z, w with plates N and D; example topic “face”]
15
ANALYZING IMAGES WITH TOPIC MODELS
Given a corpus:
● Extract features, then generate a dictionary with K-means clustering
● Determine the topic vectors that are common to all images using pLSA or LDA
● Learn the mixture coefficients (of topics) for each document (pLSA) or estimate the hyperparameters α and β (LDA)
Classification of unseen images with pLSA:
● Topics learned on one set of images are used to determine the topics in the novel set
● An image is classified according to the topic with maximal probability (or weight)
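The pLSA part of this pipeline can be sketched with a few lines of EM over a document-by-word count matrix. This is an illustrative sketch, not the cited authors' implementation: the function name `plsa`, the random initialisation, the iteration count, and the toy 4-image, 6-word corpus are assumptions.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on a (documents, words) count matrix.

    Returns p_w_z with shape (topics, words) and p_z_d with shape
    (documents, topics)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z | d, w), shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both distributions from expected counts.
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d

# Toy corpus (assumption): images 0-1 use words 0-2, images 2-3 use words 3-5.
counts = np.array([
    [5, 4, 6, 0, 0, 0],
    [6, 5, 5, 0, 0, 0],
    [0, 0, 0, 4, 6, 5],
    [0, 0, 0, 5, 5, 6],
])
p_w_z, p_z_d = plsa(counts, n_topics=2)
labels = p_z_d.argmax(axis=1)  # classify each image by its dominant topic
print(labels)
```

Topic indices are arbitrary, so only the grouping of images matters, not which topic number each group receives.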
16
EXAMPLE 1: SCENE RECOGNITION WITH PLSA (SIVIC ET AL., 2005)
4 categories: faces, motorbikes, airplanes, cars
[Figure: most common visual words]
17
WORD AND TOPIC DISTRIBUTION
18
Categories | pLSA | LDA | K-means
UB: Motorbikes(349) + Airplanes(263) | 100% | 99% | 91%
UB: Faces(435) + Motorbikes(349) + Airplanes(263) | 100% | 96% | 94%
Faces(435) + Motorbikes(800) + Airplanes(800) | 97% | 96% | 91%
Faces(435) + Motorbikes(800) + Airplanes(800) + Car Rears(1155) | 98% | 87% | 72%
Faces(435) + Motorbikes(800) + Airplanes(800) + Car Rears(1155) + Background(1370) | 78% | 77% | 73%
Faces(435) + Motorbikes(800) + Airplanes(800) + Car Rears(1155) + Background(1370), K = 6 | 76% | - | -
Faces(435) + Motorbikes(800) + Airplanes(800) + Car Rears(1155) + Background(1370), K = 7 | 83% | - | -
Faces + Motorbikes + Airplanes + Car Rears + Leopards + Watch(241) + Ketch(114) + Background(1370) | 59% | 64% | 47%
19
Example 2: scene recognition with LDA (Fei-Fei & Perona, 2005)
20
METHOD AND RESULTS
● Compute an LDA model for each category of images
● Label new images based on maximum likelihood
21
DYNAMIC TOPIC MODEL (SHALIT ET AL., 2013)
22
DETECTING UNUSUAL EVENTS
Video surveillance: millions of cameras, endless streaming, but who’s watching?
23
METHOD
● Track high-level activities (cars, people)
● Represent activities as ‘bags of words’
● Model typical behaviours using LDA
● Identify atypical events with low probability
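The last step, flagging events with low probability, can be sketched by scoring each bag of words under a set of learned topics. The sketch below swaps in a simpler technique than LDA variational inference: it estimates the topic mixture by EM with the topics held fixed (pLSA-style fold-in). The topic matrix, the counts, and the threshold are hypothetical toy values, not numbers from the slides.

```python
import numpy as np

def fold_in(counts, topic_word, n_iter=50):
    """Estimate one document's topic mixture with the topics held fixed."""
    n_topics = topic_word.shape[0]
    theta = np.full(n_topics, 1.0 / n_topics)
    for _ in range(n_iter):
        post = theta[:, None] * topic_word            # (topics, words)
        post /= post.sum(axis=0, keepdims=True) + 1e-12
        theta = (post * counts[None, :]).sum(axis=1)
        theta /= theta.sum()
    return theta

def per_word_log_likelihood(counts, topic_word):
    """Average log-probability of the document's words under its best mixture."""
    theta = fold_in(counts, topic_word)
    p_w = theta @ topic_word
    return float((counts * np.log(p_w + 1e-12)).sum() / counts.sum())

# Hypothetical learned topics over a 5-word vocabulary; word 4 is rare in both.
topic_word = np.array([
    [0.48, 0.48, 0.01, 0.01, 0.02],
    [0.01, 0.01, 0.48, 0.48, 0.02],
])
typical = np.array([10, 10, 0, 0, 0])
unusual = np.array([2, 2, 2, 2, 12])
THRESHOLD = -1.5  # hypothetical cut-off; in practice chosen on held-out data

for name, doc in [("typical", typical), ("unusual", unusual)]:
    score = per_word_log_likelihood(doc, topic_word)
    print(name, round(score, 2), "ATYPICAL" if score < THRESHOLD else "ok")
```

Normalising by the number of words keeps long and short trajectories comparable.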
24
TUBE TRAJECTORY REPRESENTATION
● Compute displacement vector
● Bin into one of 25 quantization bins
● Consider the transition from one bin to another as a word (25 × 25 = 625 vocabulary words)
● ‘Bag of words’ representation
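The trajectory encoding can be sketched as follows. The slide gives only the bin count (25) and the vocabulary size (625); the 5 × 5 grid over (dx, dy), the clipping range `max_step`, and the function names are assumptions made for illustration.

```python
import numpy as np

N_BINS = 25  # from the slide; the 5 x 5 layout below is an assumption

def displacement_bins(track, max_step=10.0):
    """Quantize consecutive displacements of an (n, 2) track into 25 bins."""
    steps = np.diff(track, axis=0)
    clipped = np.clip(steps, -max_step, max_step)
    # Map each component from [-max_step, max_step] onto cells 0..4.
    cells = np.floor((clipped + max_step) / (2 * max_step) * 5)
    cells = np.minimum(cells, 4).astype(int)
    return cells[:, 0] * 5 + cells[:, 1]

def tube_to_bag(track):
    """Count bin-to-bin transitions; each transition is one of 625 words."""
    bins = displacement_bins(track)
    words = bins[:-1] * N_BINS + bins[1:]
    return np.bincount(words, minlength=N_BINS * N_BINS)

# Toy track: three steps to the right, then one step along the other axis.
track = np.array([[0, 0], [3, 0], [6, 0], [9, 0], [9, 3]], dtype=float)
bag = tube_to_bag(track)
print(bag.shape, bag.sum(), np.flatnonzero(bag))  # (625,) 3 [438 442]
```

A track with n points yields n - 1 displacements and therefore n - 2 transition words.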
25
26
TRACKING RESULTS
27
EXPERIMENTAL RESULTS
● Training and test videos are each an hour long, showing an urban street intersection
● Each hour contributed ~1000 tubes
● We set k, the number of latent topics, to 8
28
EXPERIMENTAL RESULTS
Data:
● Training and test videos are each an hour long, showing an urban street intersection
● Each hour contributed ~1000 tubes
● We set k, the number of latent topics, to 8
Learned topics:
● Cars going left to right
● Cars going right to left
● People going left to right
● Complex dynamics: turning into top street
29
RESULTS – LEARNED TOPICS
Cars going left to right, or right to left
30
RESULTS – LEARNED TOPICS
People walking left to right, or right to left
31
RESULTS – LOW PROBABILITY EVENTS
32
NEXT STEP – BEYOND BAG OF WORDS
● Tracking is not always so simple…
● Representation with a bag of discrete words is not always appropriate for images
33
BAG OF WORDS IN COMPUTER VISION: GENERAL APPROACH
[Slides 33 to 35 share this title; the figures were not recovered. Slide 35 carries the numbers 1 2 1 1 0.]
36
EXPERIMENT – UCSD PED2, HARD ASSIGNMENT
● 16 training videos containing only pedestrians, and 12 test videos that also contain abnormal events
● Each event’s support is of size 24x24x21, composed of 4x4x3 patches of size 9x9x10
● Events and patches with no movement are filtered out
● Dictionary contains 100 words
37
GENERATIVE DICTIONARY OF DYNAMIC TEXTURE
G. Doretto, A. Chiuso, Y. N. Wu, and S. Soatto. Dynamic textures. International Journal of Computer Vision, 2003.
38
GENERATIVE DICTIONARY OF DYNAMIC TEXTURE
39
EXPERIMENT – UCSD PED2, HARD ASSIGNMENT
● 16 training videos containing only pedestrians, and 12 test videos that also contain abnormal events
● Each event’s support is of size 24x24x21, containing 4x4x3 patches of size 9x9x10
● Events and patches with no movement are filtered out
● Dictionary contains 100 words
40
LDA WITH SOFT ASSIGNMENT OF DESCRIPTORS
[Slides 40 to 45 share this title; their content was not recovered.]
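Soft assignment replaces the single nearest codeword with a weight for every codeword, so each descriptor contributes fractional counts to the histogram. The slides' own formulation was not recovered, so the sketch below is an assumption: it uses a Gaussian kernel on Euclidean distance as a stand-in for the likelihoods a generative dictionary would supply, and `soft_assign` and `sigma` are hypothetical names.

```python
import numpy as np

def soft_assign(descriptors, codebook, sigma=1.0):
    """Return an (n, k) matrix of assignment weights; each row sums to 1."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy example: two codewords and three descriptors lying between them.
codebook = np.array([[0.0, 0.0], [2.0, 0.0]])
descriptors = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
weights = soft_assign(descriptors, codebook)
soft_hist = weights.sum(axis=0)  # fractional counts per codeword
print(weights.round(2))
print(soft_hist.round(2))
```

The descriptor midway between the two codewords splits its weight evenly, whereas hard assignment would force an arbitrary choice.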
46
VARIATIONAL INFERENCE UNDER LDA WITH SOFT ASSIGNMENT
47
PARAMETER ESTIMATION UNDER LDA WITH SOFT ASSIGNMENT
Obtained while maximizing a lower bound over […] (itself a lower bound)
48
VARIATIONAL INFERENCE AND PARAMETER ESTIMATION UNDER LDA
49
VARIATIONAL INFERENCE AND PARAMETER ESTIMATION UNDER LDA WITH SOFT ASSIGNMENTS
50
EXPERIMENT – UCSD PED2, SOFT ASSIGNMENT
● 16 training videos containing only pedestrians, and 12 test videos that also contain abnormal events
● Each event’s support is of size 24x24x21, containing 4x4x3 patches of size 9x9x10
● Events and patches with no movement are filtered out
● Dictionary contains 100 generative words (dynamic texture)
51
EXPERIMENT – UCSD PED2
52
EXPERIMENT – UCSD PED2
53
SUMMARY
● We discussed a few methods for organizing data in an unsupervised manner
● These methods are used to analyze and obtain insights about all kinds of data, from text to images to collections of music
● Many extensions to the basic LDA model exist:
  ● Hierarchical LDA (topics are organized in a hierarchy)
  ● Hierarchical Dirichlet Process mixture model, where the number of topics is not pre-determined
  ● Dynamic topic model, which allows topics to change with time
  ● and more…