Presentation transcript:

Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Richard Socher, Cliff Chiung-Yu Lin, Andrew Y. Ng, Christopher D. Manning
Slides & Speech: Rui Zhang

Outline
- Motivation & Contribution
- Recursive Neural Network
- Scene Segmentation using RNN
- Learning and Optimization
- Language Parsing using RNN
- Experiments

Motivation
Data naturally contains recursive structures.
- Image: scenes split into objects, and objects split into parts.
- Language: a noun phrase can contain a clause, which in turn contains noun phrases of its own.

Motivation
The recursive structure helps to:
- identify the components of the data;
- understand how the components interact to form the whole.

Contribution
- First deep learning method to achieve state-of-the-art performance on scene segmentation and annotation.
- Learned deep features outperform hand-crafted ones (e.g., Gist).
- The approach generalizes to other tasks, e.g., language parsing.

Recursive Neural Network
- Similar to a one-layer fully-connected network.
- Models the transformation from child nodes to their parent node.
- Applied recursively over a tree structure: the parent at one level becomes a child at the level above.
- Parameters are shared across all levels.
[Figure: children c1 and c2 are concatenated into x and combined by W_recur to produce the parent vector h, which can then be merged with a further node c3.]
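A minimal sketch of the composition step, assuming the usual form h = f(W_recur [c1; c2] + b) with tanh as the nonlinearity (the exact choice of f and the bias term are assumptions, not stated on the slide):

```python
import numpy as np

def rnn_compose(c1, c2, W_recur, b):
    # Parent vector from two n-dim children: h = tanh(W_recur [c1; c2] + b).
    # W_recur has shape (n, 2n), so the parent lives in the same n-dim
    # space and the same weights can be reapplied at every tree level.
    return np.tanh(W_recur @ np.concatenate([c1, c2]) + b)

# Toy usage: the parent of one merge becomes a child of the next merge.
n = 4
rng = np.random.default_rng(0)
W_recur, b = rng.standard_normal((n, 2 * n)), np.zeros(n)
c1, c2, c3 = rng.standard_normal((3, n))
h = rnn_compose(c1, c2, W_recur, b)      # merge c1 and c2 into h
h_up = rnn_compose(h, c3, W_recur, b)    # reuse the same W_recur one level up
```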

Recursive vs. Recurrent NN
Two different models are called "RNN": Recursive and Recurrent.
- Similar: both apply shared parameters recursively.
- Different: a Recursive NN operates on trees, while a Recurrent NN operates on sequences.
- A Recurrent NN can be viewed as a Recursive NN applied to a degenerate, chain-shaped tree, as sketched below.
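To make the chain-shaped-tree remark concrete, a small sketch (shapes and names are illustrative, not from the paper):

```python
import numpy as np

def compose(prev, x, W, b):
    # The same shared-weight composition used by the recursive NN.
    return np.tanh(W @ np.concatenate([prev, x]) + b)

def recurrent_forward(xs, h0, W, b):
    # A recurrent step h_t = f(W [h_{t-1}; x_t] + b) is the recursive
    # composition applied along a left-branching chain of merges.
    h = h0
    for x in xs:
        h = compose(h, x, W, b)
    return h

n = 4
rng = np.random.default_rng(1)
W, b = rng.standard_normal((n, 2 * n)), np.zeros(n)
xs = rng.standard_normal((5, n))         # a length-5 input sequence
h_final = recurrent_forward(xs, np.zeros(n), W, b)
```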

Scene Segmentation Pipeline
1. Over-segment the image into superpixels.
2. Extract hand-crafted features for each superpixel.
3. Map the features onto the semantic space.
4. Enumerate the possible merges (pairs of adjacent nodes).
5. Compute a score for each candidate merge with the RNN.
6. Merge the pair of nodes with the highest score.
7. Repeat steps 4-6 until only one node is left.

Input Data Representation
- The image is over-segmented into superpixels.
- Hand-crafted features are extracted for each superpixel.
- One fully-connected layer maps each feature onto the semantic space, yielding a feature vector per superpixel (see the sketch below).
- Each superpixel has a class label.
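A sketch of the mapping step; the names W_sem and b_sem and the sizes are assumptions for illustration:

```python
import numpy as np

def to_semantic_space(F, W_sem, b_sem):
    # One fully-connected layer mapping a raw superpixel descriptor F
    # into the n-dim semantic space in which the RNN operates.
    return np.tanh(W_sem @ F + b_sem)

d, n = 119, 50                    # assumed raw-feature / embedding sizes
rng = np.random.default_rng(2)
W_sem, b_sem = rng.standard_normal((n, d)), np.zeros(n)
F = rng.standard_normal(d)        # hand-crafted feature of one superpixel
x_leaf = to_semantic_space(F, W_sem, b_sem)   # leaf vector for the parse tree
```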

Tree Construction
- Scene parse trees are constructed bottom-up.
- Leaf nodes are the over-segmented superpixels: hand-crafted features are extracted and mapped onto the semantic space by one fully-connected layer, so each leaf has a feature vector.
- An adjacency matrix A records the neighboring relations (see the sketch below):

  A_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are neighbors} \\ 0 & \text{otherwise} \end{cases}
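A small sketch of building A from a (hypothetical) list of neighboring superpixel pairs:

```python
import numpy as np

def adjacency_matrix(num_nodes, neighbor_pairs):
    # Symmetric 0/1 matrix: A[i, j] = 1 iff superpixels i and j share
    # a boundary. neighbor_pairs would come from the over-segmentation.
    A = np.zeros((num_nodes, num_nodes), dtype=int)
    for i, j in neighbor_pairs:
        A[i, j] = A[j, i] = 1
    return A

A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3), (0, 2)])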

Greedy Merging
Nodes are merged greedily. In each iteration:
- Enumerate all possible merges (pairs of adjacent nodes).
- Compute a score for each candidate merge: a fully-connected transformation applied on top of h.
- Merge the pair with the highest score:
  - c1 and c2 are replaced by a new node c12;
  - h12 becomes the feature vector of c12;
  - the union of the neighbors of c1 and c2 becomes the neighbor set of c12.
- Repeat until only one node is left (see the sketch below).
[Figure: c1 and c2 are combined by W_recur into h12; W_score maps h12 to the merge score.]
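A sketch of the greedy loop; the weight names W_recur, b, w_score follow the figure labels, while the data layout (dicts of node features and neighbor sets, a connected region graph) is invented for illustration:

```python
import numpy as np

def greedy_parse(leaves, neighbors, W_recur, b, w_score):
    # leaves: {node_id: feature vector}; neighbors: {node_id: set of ids}.
    # Returns the merge order (i, j, parent_id, score).
    feats = dict(leaves)
    nbrs = {k: set(v) for k, v in neighbors.items()}
    merges = []
    next_id = max(feats) + 1
    while len(feats) > 1:
        best = None
        # Enumerate all candidate merges (pairs of adjacent live nodes)
        # and score each with the RNN: score = w_score . h.
        for i in feats:
            for j in nbrs[i]:
                if j <= i or j not in feats:
                    continue
                h = np.tanh(W_recur @ np.concatenate([feats[i], feats[j]]) + b)
                s = float(w_score @ h)
                if best is None or s > best[0]:
                    best = (s, i, j, h)
        s, i, j, h = best
        # Replace c_i and c_j by the new parent node; its neighbor set
        # is the union of the children's neighbors.
        p = next_id
        next_id += 1
        feats[p] = h
        nbrs[p] = (nbrs[i] | nbrs[j]) - {i, j}
        for k in nbrs[p]:
            nbrs[k] = (nbrs[k] - {i, j}) | {p}
        del feats[i], feats[j]
        merges.append((i, j, p, s))
    return merges

rng = np.random.default_rng(3)
n = 4
W_recur = rng.standard_normal((n, 2 * n))
b, w_score = np.zeros(n), rng.standard_normal(n)
leaves = {i: rng.standard_normal(n) for i in range(3)}
print(greedy_parse(leaves, {0: {1}, 1: {0, 2}, 2: {1}}, W_recur, b, w_score))
```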

Training (1)
Max-margin estimation.
- Structured margin loss Δ: penalizes merging a segment with a segment of a different label before it has merged with all of its neighbors of the same label; computed as the number of subtrees that do not appear in any correct tree.
- Tree score s: sum of the merge scores over all non-leaf nodes.
- Class label: softmax over each node's feature vector.
- Correct trees: adjacent nodes with the same label are merged first; one image may have more than one correct tree.

Training (2)
Intuition: the score of the highest-scoring correct tree should exceed the score of any other tree by a margin Δ.
Formulation: the per-image risk r_i(θ) is minimized:

  r_i(\theta) = \max_{y \in T(x_i)} \left[ s(x_i, y) + \Delta(x_i, l_i, y) \right] - \max_{y \in Y(x_i, l_i)} s(x_i, y)

  \Delta(x_i, l_i, y) = \kappa \sum_{d \in N(y)} \mathbf{1}\{\text{subtree}(d) \notin Y(x_i, l_i)\}

where
- d is a node in a parse tree and N(·) is the set of nodes of a tree;
- θ is the set of all model parameters;
- i indexes training images, x_i is training image i, and l_i is its labeling;
- Y(x_i, l_i) is the set of correct trees of x_i, and T(x_i) is the set of all possible trees of x_i;
- s(·) is the tree score function and κ scales the margin.
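A sketch of how the loss terms could be computed once both trees are available; the subtree representation (frozensets of leaf ids) and the κ value are assumptions, and the max over all trees is approximated by the single greedily parsed tree:

```python
def margin(correct_subtrees, predicted_subtrees, kappa=0.05):
    # Delta(x, l, y): kappa times the number of subtrees of the predicted
    # tree that appear in no correct tree. kappa = 0.05 is a placeholder
    # for the margin-scaling hyperparameter.
    return kappa * sum(t not in correct_subtrees for t in predicted_subtrees)

def risk(s_predicted, delta_predicted, s_best_correct):
    # r_i(theta) for one image: loss-augmented score of the predicted
    # tree minus the score of the best correct tree.
    return (s_predicted + delta_predicted) - s_best_correct

# Example: the predicted tree merged {1, 2} where a correct tree has {0, 1}.
correct = {frozenset({0}), frozenset({1}), frozenset({2}),
           frozenset({0, 1}), frozenset({0, 1, 2})}
predicted = {frozenset({0}), frozenset({1}), frozenset({2}),
             frozenset({1, 2}), frozenset({0, 1, 2})}
delta = margin(correct, predicted)   # 0.05: exactly one incorrect subtree
```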

Training (3)
- The label of each node is predicted by a softmax layer (see the sketch below).
- The margin Δ makes the objective non-differentiable, so only a subgradient is computed.
- ∂s/∂θ is obtained by back-propagation.
- The gradient of the label prediction is also obtained by back-propagation.
[Figure: as before, c1 and c2 are combined by W_recur into h12 and scored by W_score; additionally, W_label maps h12 to a predicted label.]
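A minimal sketch of the node-label prediction (W_label as in the figure; the sizes are assumed):

```python
import numpy as np

def predict_label(h, W_label):
    # Class posterior for a node: softmax(W_label @ h).
    z = W_label @ h
    z -= z.max()                       # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

rng = np.random.default_rng(4)
n, num_classes = 4, 8                  # assumed sizes
W_label = rng.standard_normal((num_classes, n))
h = rng.standard_normal(n)
probs = predict_label(h, W_label)      # non-negative, sums to 1
```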

Language Parsing
Language parsing is similar to scene parsing. Differences:
- The input is a natural-language sentence.
- Adjacency is strictly left-and-right (words in sequence).
- Class labels are syntactic categories, at the word, phrase, and clause levels.
- Each sentence has exactly one correct tree.

Experiments Overview
- Image: scene segmentation and annotation; scene classification; nearest-neighbor scene subtrees.
- Language: supervised language parsing; nearest-neighbor phrases.

Scene Segmentation and Annotation
- Dataset: Stanford Background Dataset.
- Task: segment and label, pixelwise, the foreground and different types of background.
- Result: 78.1% pixelwise accuracy, 0.6% above the previous state of the art.

Scene Classification
- Dataset: Stanford Background Dataset.
- Task: three classes (city, countryside, sea-side).
- Method: features are either the average of all node features or the top node feature only; the classifier is a linear SVM.
- Result: 88.1% accuracy with the averaged features, 4.1% above Gist, the state-of-the-art feature; 71.0% accuracy with the top feature only.
- Discussion: the learned RNN features better capture the semantic content of a scene; the top feature alone loses some lower-level information.

Nearest Neighbor Scene Subtrees
- Dataset: Stanford Background Dataset.
- Task: retrieve similar segments from all images; a subtree whose nodes all have the same label corresponds to a segment.
- Method: feature is the top node feature of the subtree; metric is Euclidean distance.
- Result: similar segments are retrieved.
- Discussion: the RNN features capture segment-level characteristics.

Supervised Language Parsing
- Dataset: Penn Treebank, Wall Street Journal section.
- Task: generate parse trees with labeled nodes.
- Result: unlabeled bracketing F-measure of 90.29%, comparable to the 91.63% of the Berkeley Parser.

Nearest Neighbor Phrases
- Dataset: Penn Treebank, Wall Street Journal section.
- Task: retrieve the nearest neighbors of a given sentence.
- Method: feature is the top node feature; metric is Euclidean distance.
- Result: similar sentences are retrieved.

Discussion
- Understanding the semantic structure of data is essential for applications such as fine-grained search or captioning.
- The Recursive NN predicts tree structure along with node labels in an elegant way.
- The Recursive NN could be incorporated with a CNN: if the two were learned jointly, the pipeline would no longer depend on hand-crafted input features.