You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon (University of Washington), Santosh Divvala (Allen Institute for Artificial Intelligence), Ross Girshick (Facebook AI Research), Ali Farhadi (University of Washington)
IEEE Conference on Computer Vision and Pattern Recognition, 2016

The output block is $7 \times 7 \times (5B + K)$. With $B = 2$ and $K = 20$, this is $7 \times 7 \times 30$.

$7 \times 7$ is a grid of output cells.
Each cell is asked to produce $B = 2$ bounding boxes, each consisting of $(x, y, w, h, c)$:
  $(x, y) \in [0,1]^2$ is the centre of the box w.r.t. the cell
  $(w, h) \in [0,1]^2$ is the size of the box w.r.t. the image
  $c \in [0,1]$ is a confidence score
Each cell is also asked to produce $K = 20$ logit scores, one for each of the $K$ classes.
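The layout described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ordering of values inside a cell (all box values first, then the class scores) and the names `output_depth`, `decode_cell` and `Box` are hypothetical and are not taken from the slides or the paper's code. The sketch checks the depth arithmetic and converts a cell-relative centre into image-relative coordinates.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

S = 7   # the grid is S x S cells
B = 2   # bounding boxes per cell
K = 20  # number of classes


def output_depth(b: int = B, k: int = K) -> int:
    """Numbers produced per cell: 5 per box (x, y, w, h, c) plus K class scores."""
    return 5 * b + k


@dataclass
class Box:
    cx: float  # box centre x, as a fraction of image width
    cy: float  # box centre y, as a fraction of image height
    w: float   # box width, as a fraction of image width
    h: float   # box height, as a fraction of image height
    c: float   # confidence score


def decode_cell(row: int, col: int, cell: Sequence[float],
                s: int = S, b: int = B) -> Tuple[List[Box], List[float]]:
    """Split one cell's vector into its boxes and class scores.

    Assumed layout (hypothetical): b groups of (x, y, w, h, c), then K scores.
    (x, y) is relative to the cell, so it is shifted by the cell index and
    scaled by the grid size; (w, h) is already relative to the whole image.
    """
    boxes = []
    for i in range(b):
        x, y, w, h, c = cell[5 * i:5 * i + 5]
        boxes.append(Box((col + x) / s, (row + y) / s, w, h, c))
    class_scores = list(cell[5 * b:])
    return boxes, class_scores
```

With these values each cell holds `output_depth()` = 30 numbers, and the whole grid proposes `S * S * B` = 98 candidate boxes per image.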

It seems to me peculiar to encode “This scene has two objects, a $k_1$ and a $k_2$, in boxes $b_1$ and $b_2$” as data in a $7 \times 7$ array. Why not encode it as a sequence, or maybe a tree if there is more semantic knowledge?
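To make the contrast between the two encodings concrete, here is a minimal sketch that flattens a grid-shaped output into a plain sequence of (class, box) pairs. It is an illustration only, not the paper's post-processing: the function name `grid_to_sequence`, the per-cell layout (box values first, then class scores), the confidence threshold of 0.5, and picking the single highest-scoring class per cell are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

BoxXYWH = Tuple[float, float, float, float]  # (cx, cy, w, h), image-relative
Detection = Tuple[int, BoxXYWH]              # (class index k, box b)


def grid_to_sequence(grid: Sequence[Sequence[Sequence[float]]],
                     s: int = 7, b: int = 2,
                     threshold: float = 0.5) -> List[Detection]:
    """Turn an s x s grid of cell vectors into a list of detections.

    Each grid[row][col] is assumed to hold b groups of (x, y, w, h, c)
    followed by the class scores. A box is kept when its confidence c
    reaches the (assumed) threshold.
    """
    detections: List[Detection] = []
    for row in range(s):
        for col in range(s):
            cell = grid[row][col]
            scores = cell[5 * b:]
            # Index of the highest class score in this cell.
            k = max(range(len(scores)), key=scores.__getitem__)
            for i in range(b):
                x, y, w, h, c = cell[5 * i:5 * i + 5]
                if c >= threshold:
                    box = ((col + x) / s, (row + y) / s, w, h)
                    detections.append((k, box))
    return detections
```

For a scene with two confident boxes, the result is a two-element list of the form `[(k1, b1), (k2, b2)]`, which is the sequence encoding the slide asks about; the grid holds the same information spread over 49 cells.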

This approach to object detection has been extended to detecting moving objects in a video stream. (How fast is the car in this bounding box moving?) The question asked has been “Can I predict the boxes and their data, given two successive frames of the video?” Sequence/recurrent approaches have not yet been successful.

MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving (2016). Marvin Teichmann, Michael Weber, Marius Zoellner, Roberto Cipolla, Raquel Urtasun.
MODNet: Motion and Appearance based Moving Object Detection for Autonomous Driving (2017). Mennatullah Siam, Heba Mahgoub, Mohamed Zahran, Senthil Yogamani, Martin Jagersand.