TagSense: A Smartphone-based Approach to Automatic Image Tagging


TagSense: A Smartphone-based Approach to Automatic Image Tagging
1. TagSense: a mobile-phone-based collaborative system that senses the people, activity, and context in a picture, and merges them carefully to create tags on the fly.
2. TagSense is an attempt to embrace additional dimensions of sensing.

Overview
- Introduction
- Scope
- System Overview
- Design and Implementation
- Performance Evaluation
- Limitations
- Future of TagSense

Introduction
- Sensor-assisted tagging: tags are systematically organized into a "when-where-who-what" format.
- Is it better than image processing/face recognition?
- Challenges faced:
  - identifying the individuals in the picture
  - mining the gathered sensor data
  - staying within the energy budget

Contributions
- Envisioning an alternative, out-of-band opportunity for automatic image tagging.
- Designing TagSense, an architecture for coordinating the mobile phone sensors and processing the sensed information to tag images.
- Implementing and evaluating TagSense on Android phones.

Picture 1: November 21st afternoon, Nasher Museum, in-door, Romit, Sushma, Naveen, Souvik, Justin, Vijay, Xuan, standing, talking.
Picture 2: December 4th afternoon, Hudson Hall, out-door, Xuan, standing, snowing.

Picture 3: November 21st noon, Duke Wilson Gym, in-door, Chuan, Romit, playing, music.
These tags were extracted using location services, light-sensor readings, accelerometers, and sound. TagSense tags each picture with the time, location, individual names, and basic activity.

Scope of TagSense
TagSense requires the content of a picture to have an electronic footprint that can be captured over at least one of the sensing dimensions. Images of objects (e.g., bicycles, furniture, paintings), of animals, or of people without phones cannot be recognized. TagSense therefore narrows its focus to identifying the individuals in a picture and their basic activities.

System Overview
TagSense architecture: the camera phone triggers sensing in participating mobile phones and gathers the sensed information. It then determines who is in the picture and tags the picture with the people and the context.

System Overview
The application prompts the user for a session password, which acts as a shared session key. Phone-to-phone communication is performed using the WiFi ad hoc mode. The participating phones perform basic activity recognition on the sensed information and send the results back to the camera phone.
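The slides do not specify the wire protocol, but a minimal sketch of this trigger-and-collect exchange might look like the following. The port number, message format, and the read_local_sensors helper are all assumptions for illustration, not details from TagSense itself.

```python
# Sketch of the capture trigger; protocol details (port, JSON message
# format) are assumptions, not from the paper.
import json
import socket

SESSION_PORT = 9999          # hypothetical port
SESSION_KEY = "photo-walk"   # the shared session password

def broadcast_capture_trigger(picture_id: str) -> None:
    """Camera phone: ask all phones in the session to report sensor data."""
    msg = json.dumps({"key": SESSION_KEY, "cmd": "capture", "pic": picture_id})
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.sendto(msg.encode(), ("255.255.255.255", SESSION_PORT))
    sock.close()

def serve_one_trigger(read_local_sensors) -> None:
    """Participant phone: on a valid trigger, reply with summarized sensor data."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", SESSION_PORT))
    data, addr = sock.recvfrom(4096)
    req = json.loads(data.decode())
    if req.get("key") == SESSION_KEY and req.get("cmd") == "capture":
        reply = json.dumps({"pic": req["pic"], "sensors": read_local_sensors()})
        sock.sendto(reply.encode(), addr)
    sock.close()
```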

Mechanisms
- Pause signature from the accelerometer readings
- Compass directions
- Multiple snapshots

Design and Implementation
- Who are in the picture?
- What are they doing?
- Where is the picture taken?
- When is the picture taken?

Who are in the picture?
- Accelerometer-based motion signatures
- Complementary compass directions
- Moving objects
- Combining the opportunities

Accelerometer-based motion signatures
Subjects of a picture often move into a specific posture in preparation for the picture, stay still during the click, and then move again to resume normal behavior. This "pause signature" shows up in the accelerometer readings.
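As a rough illustration, the pause signature can be detected as a still window flanked by movement in the accelerometer magnitude. This is a minimal sketch, assuming a 3-axis trace sampled around the shutter click; the window sizes and thresholds are illustrative, not the paper's.

```python
# Minimal sketch of pause-signature detection: move -> still -> move
# around the shutter click. Window sizes and thresholds are illustrative.
import numpy as np

def is_posing(acc: np.ndarray, click_idx: int, win: int = 50,
              still_thresh: float = 0.05, move_thresh: float = 0.3) -> bool:
    """acc: (N, 3) accelerometer trace; click_idx: sample index of the click."""
    mag = np.linalg.norm(acc, axis=1)
    before = mag[max(0, click_idx - 2 * win):click_idx - win]
    during = mag[click_idx - win:click_idx + win]
    after = mag[click_idx + win:click_idx + 2 * win]
    return (np.std(during) < still_thresh and   # still while the picture is taken
            np.std(before) > move_thresh and    # moving into the pose
            np.std(after) > move_thresh)        # resuming normal behavior
```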

Complementary compass directions
The posing signature may be a sufficient condition, but it is obviously not necessary. People in the picture roughly face the camera, so the directions of their compasses will be roughly complementary to the camera's facing direction. Since the user and her phone may not face the same direction, TagSense maintains a Personal Compass Offset (PCO):

UserFacing = (CameraAngle + 180) mod 360
PCO = ((UserFacing + 360) - CompassAngle) mod 360
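These two formulas translate directly into code; a small sketch, assuming angles in degrees measured clockwise from north:

```python
# Direct translation of the slide's formulas; angles in degrees,
# measured clockwise from north (an assumption of this sketch).

def user_facing(camera_angle: float) -> float:
    """Subjects face the camera, so their facing is opposite the camera's."""
    return (camera_angle + 180) % 360

def pco(camera_angle: float, compass_angle: float) -> float:
    """Personal Compass Offset: the gap between where the user faces and
    where her phone's compass points, computed when she is known to pose."""
    return ((user_facing(camera_angle) + 360) - compass_angle) % 360
```

Once calibrated from a posing picture, the PCO can be applied to later compass readings to recover the user's facing direction even when she is not posing.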

Periodically recalibrating the PCO
If TagSense identifies Alice in a picture through her posing signature, her PCO can be computed immediately. In subsequent pictures, even if Alice is not posing, her PCO can still reveal her facing direction, which in turn indicates whether she is in the picture. This continues to work as long as Alice does not change the orientation of her phone.

Figure 4: (a) Personal Compass Offset (PCO) (b) PCO distribution from 50 pictures where subjects are facing the camera. PCO calibration is necessary to detect people in a picture using compass.

Moving Subjects
The essential idea is to take multiple snapshots from the camera, derive the subject's motion vector from these snapshots, and correlate it with the accelerometer measurements recorded by different phones. The phone whose accelerometer motion best matches the optically derived motion is deemed to be in the picture.

Figure 5: Extracting motion vectors of people from two successive snapshots in (a) and (b): (c) The optical flow field showing the velocity of each pixel; (d) The corresponding color graph; (e) The result of edge detection; (f) The motion vectors for the two detected moving objects.

1. The velocity of each pixel is computed by performing a spatial correlation across two snapshots (optical flow).
2. The average velocity of the four corner pixels is computed and subtracted from the object's velocity, compensating for camera jitter.
3. The color of each pixel is redefined based on its velocity.
4. An edge-finding algorithm identifies the objects in the picture.
5. The average velocity of one-third of the pixels, located at the center of each object, is computed and returned as the motion vectors of the people in the picture.
6. TagSense assimilates the accelerometer readings from the different phones, computes their individual velocities, and matches the optical velocity against each phone's accelerometer readings (see the sketch after this list).
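A minimal sketch of that final matching step, assuming each candidate object's optical motion vector and each phone's accelerometer trace are already available. The (N, 2) ground-plane traces, the sampling interval, and the use of cosine similarity are all assumptions of this sketch; the projection between image and ground coordinates is glossed over.

```python
# Sketch: integrate each phone's accelerometer trace into a velocity
# estimate and pick the phone whose motion best matches the optical vector.
import numpy as np

def velocity_from_acc(acc: np.ndarray, dt: float) -> np.ndarray:
    """Integrate a (N, 2) ground-plane accelerometer trace over the
    inter-snapshot interval (assumes the phone starts at rest)."""
    return np.trapz(acc, dx=dt, axis=0)

def best_matching_phone(optical_vec: np.ndarray,
                        phone_traces: dict, dt: float) -> str:
    """Return the id of the phone whose accelerometer-derived motion
    direction best matches the optically derived motion vector."""
    def cos_sim(a, b):
        n = np.linalg.norm(a) * np.linalg.norm(b)
        return float(np.dot(a, b) / n) if n > 0 else -1.0
    scores = {pid: cos_sim(optical_vec, velocity_from_acc(acc, dt))
              for pid, acc in phone_traces.items()}
    return max(scores, key=scores.get)
```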

Combining the opportunities (summarized in the sketch after this list)
1. First, search for the posing signature and compute the user's facing direction. If the signature is present, the person is deemed to be in the picture and her PCO is calibrated.
2. In the absence of the posing signature, check whether the person is reasonably static. If so, and her facing direction deviates by less than 45°, her name is added to the tag.
3. If the person is not static, compute the picture's optical motion vectors and correlate them with the accelerometer/compass readings.
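The cascade fits in a few lines. This sketch captures the decision logic only; its boolean inputs are assumed to come from the posing-signature, compass, and optical-flow checks described above.

```python
# Sketch of the person-tagging decision cascade; inputs are assumed to be
# produced by the posing-signature, compass, and optical-flow checks above.

def should_tag(posing: bool, static: bool, facing_offset_deg: float,
               optical_match: bool) -> bool:
    if posing:
        return True                           # posing signature found; PCO is (re)calibrated here
    if static:
        return abs(facing_offset_deg) < 45.0  # still and facing roughly toward the camera
    return optical_match                      # moving: rely on optical-flow correlation
```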

Discussion
- Cannot pinpoint people in a picture
- Cannot identify kids in a picture
- The compass-based method assumes people are facing the camera

What are they doing?
- Accelerometer: standing, sitting, walking, jumping, biking, playing
- Acoustic: talking, music, silence
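As a toy illustration of how such coarse labels might be derived, the sketch below thresholds simple accelerometer and acoustic features. The features and thresholds are invented for illustration; the slides do not describe the actual classifiers.

```python
# Toy activity labeling from simple features; thresholds are made up.
import numpy as np

def accel_activity(acc: np.ndarray) -> str:
    """Map a (N, 3) accelerometer trace to a coarse activity label using
    the variability of its magnitude (thresholds purely illustrative)."""
    energy = np.std(np.linalg.norm(acc, axis=1))
    if energy < 0.3:
        return "standing"   # near-still; "sitting" needs orientation as well
    if energy < 3.0:
        return "walking"
    return "jumping"        # high-energy bucket (biking/playing look similar)

def acoustic_activity(energy: float, zero_crossing_rate: float) -> str:
    """Coarse sound label from two simple acoustic features (illustrative)."""
    if energy < 0.01:
        return "silence"
    return "talking" if zero_crossing_rate > 0.1 else "music"
```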

Where is the picture taken?
- Place: derived from the GPS coordinates
- Indoor/outdoor: inferred from the light sensor on the phone
- Location information and the phone compass can be combined to tag picture backgrounds
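A minimal sketch of the "where" tags, assuming a reverse-geocoding helper and a raw light-sensor reading in lux; the 1000-lux indoor/outdoor threshold is an illustrative assumption.

```python
# Sketch of the "where" tags; reverse_geocode is any coordinates-to-place
# service available to the phone, and 1000 lux is an illustrative cutoff.
def where_tags(lat: float, lon: float, light_lux: float, reverse_geocode) -> list:
    place = reverse_geocode(lat, lon)                # e.g., "Nasher Museum"
    setting = "out-door" if light_lux > 1000 else "in-door"
    return [place, setting]
```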

When is the picture taken?
- Time: inherited from the device
- Weather: fetched by contacting an internet weather service
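A sketch of the "when" tags; the weather endpoint and its JSON shape are hypothetical stand-ins for whichever internet weather service is contacted.

```python
# Sketch of the "when" tags; the weather URL and response format are
# hypothetical, not a real service.
import datetime
import json
import urllib.request

def when_tags(lat: float, lon: float) -> list:
    now = datetime.datetime.now()
    part = "morning" if now.hour < 12 else "afternoon" if now.hour < 18 else "evening"
    tags = [now.strftime("%B %d"), part]
    url = f"https://weather.example.com/current?lat={lat}&lon={lon}"  # hypothetical endpoint
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            tags.append(json.load(resp)["condition"])  # e.g., "snowing"
    except OSError:
        pass  # offline: skip the weather tag
    return tags
```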

Performance Evaluation
- Tagging people
- Tagging activities and context
- Tag-based image search

Tagging People

Overall Performance
Figure 10: The overall precision of TagSense is not as high as iPhoto's and Picasa's, but its recall is much better, while the fall-out is comparable.

Method-wise and Scenario-wise Performance

Searching Images by Name

Tagging Activities and Context

Tag-based Image Search

Limitations of TagSense
- TagSense's vocabulary of tags is quite limited.
- TagSense does not generate captions.
- TagSense cannot tag pictures taken in the past.
- TagSense requires users to input a group password at the beginning of a photo session.

Future of TagSense
- Smartphones are becoming context-aware with personal sensing.
- The granularity of localization will approach a foot.
- Smartphones are replacing point-and-shoot cameras.

Conclusion
- Mobile phones are becoming inseparable from humans and are replacing traditional cameras.
- TagSense leverages this trend to automatically tag pictures with people and their activities.
- TagSense has somewhat lower precision and comparable fall-out, but significantly higher recall, than iPhoto/Picasa.