Visual Grounding: Topic Report. Lejian Ren, 4.23.

Presentation transcript:

Visual Grounding: Topic Report. Lejian Ren, 4.23

Tasks: Grounding Phrases. Plummer, Bryan A., et al. "Phrase localization and visual relationship detection with comprehensive image-language cues." Proceedings of the IEEE International Conference on Computer Vision. 2017.

Tasks: Grounding Phrases; Grounding Referring Expressions (detection-based or segmentation-based; the symmetric task is Referring Expression Generation, i.e. region captioning); Grounding Referring Relationships.

Grounding: attention in VQA. Andreas, J., Rohrbach, M., Darrell, T., Klein, D.: Learning to compose neural networks for question answering. arXiv preprint arXiv:1601.01705 (2016)

Grounding Referring Expression(BBOX): bag-of-words; image-text feature matching; scoring model; generating model.

Grounding Referring Expression(BBOX): bag-of-words; image-text feature matching (mostly implicit); scoring model; generating model (mostly used for segmentation).

Grounding Referring Expression(BBOX): Challenges. The task requires localizing objects and multimodal comprehension of context: visual attributes (e.g., “largest”, “baby”) and relationships (e.g., “behind”) that help to distinguish the referent from other objects, especially those of the same category.

Grounding Referring Expression(BBOX): Treats grounding as an object retrieval problem. S. Guadarrama, E. Rodner, K. Saenko, N. Zhang, R. Farrell, J. Donahue, and T. Darrell. Open-vocabulary object retrieval. In Robotics: Science and Systems, 2014.

Grounding Referring Expression(BBOX): Predicts, box by box, how well each candidate box matches the expression. Hu, Ronghang, et al. "Natural language object retrieval." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
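The box-by-box matching idea on this slide can be illustrated with a minimal sketch. It assumes each candidate box and the expression have already been embedded as vectors of the same length, and it uses cosine similarity as the matching score. Both choices, and the names `cosine` and `ground_expression`, are illustrative assumptions, not the scoring function of the cited paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def ground_expression(box_features, expression_feature):
    """Score every candidate box against the expression; return (best index, all scores)."""
    scores = [cosine(f, expression_feature) for f in box_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy, hand-made features (hypothetical): three candidate boxes and one expression.
boxes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
expression = [0.0, 1.0, 0.0]
best_index, match_scores = ground_expression(boxes, expression)
```

In a real system the features would come from a visual backbone and a language encoder; only the final "score each box, take the best" step is shown here.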

Grounding Referring Expression(BBOX): J. Mao, J. Huang, A. Toshev, O. Camburu, A. Yuille, and K. Murphy. Generation and comprehension of unambiguous object descriptions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

Grounding Referring Expression(BBOX) Liu, J., Wang, L., & Yang, M. (2017). Referring Expression Generation and Comprehension via Attributes. 4866–4874.

Grounding Referring Expression(BBOX): Image captioning and grounding can reuse the same features. Shridhar, Mohit, and David Hsu. "Grounding spatio-semantic referring expressions for human-robot interaction." arXiv preprint arXiv:1707.05720 (2017).

Grounding Referring Expression(BBOX): Explicitly attends to [related] objects. Shridhar, Mohit, and David Hsu. "Grounding spatio-semantic referring expressions for human-robot interaction." arXiv preprint arXiv:1707.05720 (2017).

Grounding Referring Expression(BBOX): Focuses on context. Zhang, H., Niu, Y., & Chang, S. F. (2018). Grounding Referring Expressions in Images by Variational Context. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 4158–4166.

Grounding Referring Expression(BBOX): Emphasizes relationships. Yu, Licheng, et al. "MAttNet: Modular attention network for referring expression comprehension." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

Datasets(BBox): RefCOCO (based on MSCOCO); RefCOCO+ (location information removed); RefCOCOg (collected differently, with longer sentences); RefCLEF.

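Grounding Referring Expression(Seg): Mostly generative models. The central question is how to combine image and text: either use the whole text as a single feature, or use each word as a feature.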
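The two ways of combining text with image features named on this slide can be contrasted in a small sketch. It assumes the image is a grid of per-location feature vectors and the expression is a list of word vectors. Mean pooling stands in for a learned sentence encoder, and concatenation stands in for the fusion step. The function names and toy values are hypothetical, and the sketch does not reproduce any of the cited models.

```python
def tile_and_concat(feature_map, text_vector):
    """Append the same text vector to the visual feature at every spatial location."""
    return [[pixel + text_vector for pixel in row] for row in feature_map]

def fuse_sentence(feature_map, word_vectors):
    """Option 1: pool all words into one sentence vector, then fuse once."""
    n = len(word_vectors)
    sentence = [sum(column) / n for column in zip(*word_vectors)]
    return tile_and_concat(feature_map, sentence)

def fuse_per_word(feature_map, word_vectors):
    """Option 2: fuse each word separately, giving one fused map per word."""
    return [tile_and_concat(feature_map, word) for word in word_vectors]

# Toy inputs (hypothetical): a 2x2 feature map with 2 channels, and 2 word vectors.
feature_map = [[[0.1, 0.2], [0.3, 0.4]],
               [[0.5, 0.6], [0.7, 0.8]]]
words = [[1.0, 0.0], [0.0, 1.0]]

sentence_map = fuse_sentence(feature_map, words)   # one 2x2 map with 4 channels
word_maps = fuse_per_word(feature_map, words)      # two 2x2 maps with 4 channels each
```

Option 1 yields a single fused map, while option 2 yields a sequence of fused maps that a sequential model could then consume one word at a time.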

Grounding Referring Expression(Seg) Hu, R., Rohrbach, M., & Darrell, T. (2016). Segmentation from natural language expressions. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9905 LNCS(d), 108–124.


Grounding Referring Expression(Seg) Li, R., Kuo, Y., Shu, M., & Qi, X. (2016). Referring Image Segmentation via Recurrent Refinement Networks Supplementary Material.


Grounding Referring Expression(Seg): Encoding the whole sentence at once may lose information, so the expression is processed word by word. Liu, C., Lin, Z., Shen, X., Yang, J., & Lu, X. (n.d.). Recurrent Multimodal Interaction for Referring Image Segmentation. 1271–1280.

Grounding Referring Expression(Seg) Margffoy-Tuay, Edgar, et al. "Dynamic multimodal instance segmentation guided by natural language queries." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

Grounding Referring Expression(Seg): Processing word by word may fail to attend to global information. Ye, L. (n.d.). Cross-Modal Self-Attention Network for Referring Image Segmentation.

Grounding Referring Expression(Seg+Video): Addresses temporal inconsistency by using temporal information (overlap) to re-rank candidates. Khoreva, A., Rohrbach, A., & Schiele, B. (n.d.). Video Object Segmentation with Language Referring Expressions.
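Re-ranking by temporal overlap can be sketched as follows. The sketch assumes boxes are `(x1, y1, x2, y2)` tuples, that each candidate in the current frame carries a per-frame grounding score, and that the final score is a weighted sum of that score and the IoU with the box chosen in the previous frame. The weighting scheme, the `weight` parameter, and the function names are illustrative assumptions rather than the cited paper's exact procedure.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def rerank(candidates, previous_box, weight=0.5):
    """Sort (box, grounding_score) pairs by score blended with overlap to the previous box."""
    def combined(candidate):
        box, score = candidate
        return (1 - weight) * score + weight * iou(box, previous_box)
    return sorted(candidates, key=combined, reverse=True)

# Toy example (hypothetical values): the higher-scoring box is far from the previous
# frame's box, so the temporally consistent box moves to the top after re-ranking.
previous_box = (0, 0, 10, 10)
candidates = [((50, 50, 60, 60), 0.9), ((1, 1, 11, 11), 0.8)]
reranked = rerank(candidates, previous_box)
```

Setting `weight=0` falls back to the per-frame grounding score alone, which makes the effect of the temporal term easy to compare.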

Datasets (Seg): ReferIt; UNC (based on MSCOCO); UNC+ (location information removed); G-Ref (based on COCO, collected differently).

Grounding Referring Relationship Krishna, Ranjay, et al. "Referring relationships." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
