Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Richard Socher, Cliff Chiung-Yu Lin, Andrew Y. Ng, Christopher D. Manning
Slides & Speech: Rui Zhang
Outline
- Motivation & Contribution
- Recursive Neural Network
- Scene Segmentation using RNN
- Learning and Optimization
- Language Parsing using RNN
- Experiments
Motivation
Data naturally contains recursive structures:
- Images: scenes split into objects, and objects split into parts
- Language: a noun phrase can contain a clause, which in turn contains noun phrases of its own
Motivation
The recursive structure helps to:
- Identify the components of the data
- Understand how the components interact to form the whole
Contribution
- First deep learning method to achieve state-of-the-art performance on scene segmentation and annotation
- Learned deep features outperform hand-crafted ones (e.g., Gist)
- The method generalizes to other tasks, e.g., language parsing
Recursive Neural Network
Similar to a one-layer fully-connected network:
- Models the transformation from child nodes to their parent node
- Applied recursively over the tree structure: the parent at one layer becomes a child at the layer above
- Parameters are shared across all layers
Concretely, two children $c_1, c_2$ are composed into a parent $p = f(W[c_1; c_2] + b)$, where $W$ and $b$ are the shared parameters and $f$ is an element-wise nonlinearity.
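A minimal NumPy sketch of this shared composition step; the tanh nonlinearity, the dimensions, and the initialization scale are assumptions made for illustration:

```python
import numpy as np

n = 100                                      # node feature dimension (illustrative)
rng = np.random.default_rng(0)
W = rng.standard_normal((n, 2 * n)) * 0.1    # shared weights, reused at every node
b = np.zeros(n)

def compose(c1, c2):
    """Parent vector p = f(W [c1; c2] + b)."""
    return np.tanh(W @ np.concatenate([c1, c2]) + b)

c1, c2 = rng.standard_normal(n), rng.standard_normal(n)
p = compose(c1, c2)    # parent of this layer...
q = compose(p, c1)     # ...becomes a child at the layer above
```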
Recursive vs. Recurrent NN
There are two models called RNN: Recursive and Recurrent.
- Similar: both have shared parameters that are applied recursively
- Different: a Recursive NN applies to trees, while a Recurrent NN applies to sequences
- A Recurrent NN can be viewed as a Recursive NN over a degenerate, chain-shaped tree
Scene Segmentation Pipeline
1. Over-segment the image into superpixels
2. Extract features from each superpixel
3. Map the features onto the semantic space
4. Enumerate the possible merges and compute a score for each with the RNN
5. Merge the pair of nodes with the highest score
6. Repeat until only one node is left
Input Data Representation
The image is over-segmented into superpixels. A hand-crafted feature is extracted from each superpixel and mapped onto the semantic space by one fully-connected layer, giving each superpixel a feature vector. Each superpixel also has a class label.
Tree Construction
Scene parse trees are constructed bottom-up:
- Leaf nodes are the over-segmented superpixels: a hand-crafted feature is extracted for each and mapped onto the semantic space by one fully-connected layer, so every leaf has a feature vector
- An adjacency matrix records the neighboring relations:
$$A_{ij} = \begin{cases} 1, & \text{segments } i \text{ and } j \text{ are neighbors} \\ 0, & \text{otherwise} \end{cases}$$
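A small sketch of the leaf representation and adjacency matrix; the dimensions and neighbor pairs below are stand-ins for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, num_segments = 64, 100, 5              # illustrative dimensions
F = rng.standard_normal((num_segments, d))   # hand-crafted feature per superpixel
W_sem = rng.standard_normal((n, d)) * 0.1    # the single fully-connected layer
b_sem = np.zeros(n)
X = np.tanh(F @ W_sem.T + b_sem)             # semantic-space vector per leaf

# A[i, j] = 1 iff superpixels i and j are neighbors (pairs made up here).
A = np.zeros((num_segments, num_segments), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]:
    A[i, j] = A[j, i] = 1
```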
Greedy Merging
Nodes are merged greedily. In each iteration:
- Enumerate all possible merges (pairs of adjacent nodes)
- Compute a score for each candidate: the children are composed into $h_{12} = f(W[c_1; c_2] + b)$, and a linear layer on $h_{12}$ gives the merge score $s = W^{score} h_{12}$
- Merge the pair with the highest score: $c_1$ and $c_2$ are replaced by a new node $c_{12}$, $h_{12}$ becomes the feature of $c_{12}$, and the union of the neighbors of $c_1$ and $c_2$ becomes the neighbors of $c_{12}$
- Repeat until only one node is left
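The greedy loop could look like the sketch below, reusing `compose`, `X`, and `A` from the earlier snippets; `W_score` and the dictionary bookkeeping are illustrative choices, not the paper's implementation:

```python
import numpy as np

def greedy_parse(X, A, compose, W_score):
    """Greedily merge adjacent nodes until a single root remains.

    Assumes the adjacency graph is connected."""
    nodes = {i: X[i] for i in range(len(X))}
    neighbors = {i: {j for j in range(len(X)) if A[i, j]} for i in range(len(X))}
    next_id = len(X)
    while len(nodes) > 1:
        # enumerate all candidate merges: pairs of adjacent nodes
        pairs = [(a, b) for a in nodes for b in neighbors[a] if a < b]
        # score each candidate parent with a linear layer on h, keep the best
        s, i, j = max((float(W_score @ compose(nodes[a], nodes[b])), a, b)
                      for a, b in pairs)
        h = compose(nodes[i], nodes[j])                   # feature of the new node
        merged = (neighbors[i] | neighbors[j]) - {i, j}   # union of the neighbors
        for k in (i, j):                                  # remove the two children
            del nodes[k]
            for m in neighbors.pop(k):
                if m in neighbors:
                    neighbors[m].discard(k)
        nodes[next_id], neighbors[next_id] = h, merged
        for m in merged:
            neighbors[m].add(next_id)
        next_id += 1
    return nodes   # {root_id: root feature vector}

# Example, with X, A, compose from the sketches above:
# W_score = np.random.default_rng(1).standard_normal(100) * 0.1
# root = greedy_parse(X, A, compose, W_score)
```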
Training (1): Max-Margin Estimation
- Structured margin loss Δ: penalizes merging a segment with a segment of a different label before it has been merged with all of its neighbors of the same label; Δ counts the number of subtrees that do not appear in any correct tree
- Tree score $s$: the sum of the merge scores over all non-leaf nodes
- Class labels: a softmax on top of each node's feature vector
- Correct trees: trees in which adjacent nodes with the same label are merged first; one image may have more than one correct tree
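A sketch of how Δ might be computed, assuming segments are represented as hashable sets of superpixel ids and κ is a scaling hyperparameter (both are assumptions of this example):

```python
def margin_loss(proposed_subtrees, correct_subtrees, kappa=0.1):
    """Delta: kappa times the number of subtrees absent from every correct tree."""
    return kappa * sum(1 for d in proposed_subtrees if d not in correct_subtrees)

# Hypothetical example: one of the two proposed subtrees is incorrect.
proposed = [frozenset({0, 1}), frozenset({0, 1, 2})]
correct = {frozenset({0, 1}), frozenset({2, 3}), frozenset({0, 1, 2, 3})}
print(margin_loss(proposed, correct))   # 0.1
```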
Training (2)
Intuition: the score of the highest-scoring correct tree should be larger than the score of any other tree by a margin Δ.
Formulation: minimize the per-image risk
$$r_i(\theta) = \max_{\hat{y} \in T(x_i)} \left[ s(x_i, \hat{y}) + \Delta(x_i, l_i, \hat{y}) \right] - \max_{y \in Y(x_i, l_i)} s(x_i, y)$$
with the margin loss
$$\Delta(x_i, l_i, \hat{y}) = \kappa \sum_{d \in N(\hat{y})} \mathbf{1}\{\mathrm{subTree}(d) \notin Y(x_i, l_i)\}$$
Notation: $\theta$ denotes all model parameters; $i$ indexes the training images; $x_i$ is a training image and $l_i$ its labels; $d$ is a node in a parse tree and $N(\cdot)$ the set of nodes of a tree; $Y(x_i, l_i)$ is the set of correct trees of $x_i$; $T(x_i)$ is the set of all possible trees of $x_i$; $s(\cdot)$ is the tree score function.
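Written as code, the per-image risk is a direct transcription of the formula; `tree_score`, `delta`, `all_trees`, and `correct_trees` are placeholders for the (in practice approximate) tree search and scoring:

```python
def risk(x_i, l_i, tree_score, delta, all_trees, correct_trees):
    """r_i = best margin-augmented score of any tree minus best correct score."""
    worst_violator = max(tree_score(y) + delta(y, l_i) for y in all_trees(x_i))
    best_correct = max(tree_score(y) for y in correct_trees(x_i, l_i))
    return worst_violator - best_correct   # minimizing this enforces the margin
```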
Training (3)
- The label of each node is predicted by a softmax on its feature vector, $\mathrm{softmax}(W^{label} h)$
- The margin Δ is not differentiable, so only a subgradient of the objective is computed
- $\partial J / \partial \theta$ is obtained by backpropagation; the gradient of the label prediction is obtained by backpropagation as well
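A sketch of the per-node label prediction (the name `W_label` is illustrative):

```python
import numpy as np

def predict_label(h, W_label):
    """Softmax over W_label @ h: a class distribution for one node."""
    z = W_label @ h
    z = z - z.max()            # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()
```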
Language Parsing
Language parsing is similar to scene parsing, with some differences:
- The input is a natural-language sentence
- Adjacency is strictly left and right: each word neighbors only its immediate predecessor and successor
- Class labels are syntactic categories at the word, phrase, and clause level
- Each sentence has only one correct tree
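Under this restriction the adjacency matrix of a sentence reduces to a chain, e.g.:

```python
import numpy as np

# Only immediate left/right words are adjacent in a sentence.
n_words = 5
A = np.zeros((n_words, n_words), dtype=int)
for i in range(n_words - 1):
    A[i, i + 1] = A[i + 1, i] = 1
```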
Experiments Overview
- Image: Scene Segmentation and Annotation; Scene Classification; Nearest Neighbor Scene Subtrees
- Language: Supervised Language Parsing; Nearest Neighbor Phrases
Scene Segmentation and Annotation
- Dataset: Stanford Background Dataset
- Task: segment and label, pixel-wise, the foreground and the different types of background
- Result: 78.1% pixel-wise accuracy, 0.6% above the previous state of the art
Scene Classification
- Dataset: Stanford Background Dataset
- Task: three classes (city, countryside, sea-side)
- Method: the feature is either the average of all node features or the top-node feature only; the classifier is a linear SVM
- Result: 88.1% accuracy with the averaged feature, 4.1% above Gist, the state-of-the-art hand-crafted feature; 71.0% accuracy with the top-node feature only
- Discussion: the learned RNN feature better captures the semantic content of a scene; the top-node feature alone loses some lower-level information
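A sketch of this setup with stand-in data, using scikit-learn's LinearSVC as an off-the-shelf linear SVM (in the actual experiment the features come from the trained RNN):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_images, n_nodes, n_dim = 20, 9, 100             # stand-in sizes
node_feats = rng.standard_normal((n_images, n_nodes, n_dim))
X_avg = node_feats.mean(axis=1)                   # average of all node features
y = rng.integers(0, 3, size=n_images)             # city / countryside / sea-side
clf = LinearSVC().fit(X_avg, y)                   # linear SVM on averaged features
```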
Nearest Neighbor Scene Subtrees
- Dataset: Stanford Background Dataset
- Task: retrieve similar segments from all images; a subtree whose nodes all carry the same label corresponds to a segment
- Method: the feature is the top-node feature of the subtree; the metric is Euclidean distance
- Result: similar segments are retrieved
- Discussion: the RNN feature captures segment-level characteristics
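The retrieval itself is plain nearest-neighbor search in feature space; a sketch with stand-in vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
query = rng.standard_normal(100)              # top-node feature of a query subtree
database = rng.standard_normal((500, 100))    # top-node features of candidates
dists = np.linalg.norm(database - query, axis=1)  # Euclidean distance to each
nearest = np.argsort(dists)[:5]               # indices of the 5 closest segments
```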
Supervised Language Parsing
- Dataset: Penn Treebank, Wall Street Journal section
- Task: generate parse trees with labeled nodes
- Result: unlabeled bracketing F-measure of 90.29%, comparable to the 91.63% of the Berkeley Parser
Nearest Neighbor Phrases
- Dataset: Penn Treebank, Wall Street Journal section
- Task: retrieve the nearest neighbors of a given sentence
- Method: the feature is the top-node feature; the metric is Euclidean distance
- Result: similar sentences are retrieved
Discussion
- Understanding the semantic structure of data is essential for applications like fine-grained search or captioning
- The Recursive NN predicts the tree structure along with the node labels in an elegant way
- The Recursive NN could be combined with a CNN if the Recursive NN can be learned jointly with CNN-extracted features instead of hand-crafted ones