Visually Fingerprinting Humans without Face Recognition

Presentation transcript:

Visually Fingerprinting Humans without Face Recognition He Wang, Xuan Bao, Romit Roy Choudhury, Srihari Nelakuditi

Can a camera identify a person in its view? Consider holding up your camera and looking at a person. In this work, we ask: can the camera identify the person in its view?

Can a camera identify a person in its view? Face recognition is a possible solution. But … the face is not always visible, and the face is a permanent visual identifier … privacy? One natural approach is face recognition. However, a person's face may not always be visible. Moreover, a face is a permanent identifier, and a user cannot turn off identification at will.

{color + motion} = a "spatiotemporal descriptor" of a person. This paper: the face may not be the only identifier for a human; clothing colors are also partial identifiers; human motion (a sequence of walks, stops, and turns) could also be a unique identifier. This paper observes that the face may not be the only identifier for a human. Other visual attributes can also act as identifiers. For example, the color of a person's clothing can serve as a partial identifier. Even motion patterns, such as the direction in which a person walks and the durations for which they pause, can discriminate one human from another. In other words, color and motion can together form a "spatiotemporal descriptor" of a person and can be used to identify him. We refer to this spatiotemporal descriptor as a visual fingerprint.

Joe Bob Kim Joe: {my color + my motion} Bob: {my color + my motion} Kim: {my color + my motion} Importantly, this visual fingerprint can be gathered by both the subject and the viewer. For instance, each phone can sense its user’s motion pattern. Now, when Alice looks at a person through her smartphone, her smartphone can also compute the person’s motion pattern from the video. If Joe, Bob, and Kim announce their motion fingerprints, Alice’s phone can match them with the one in its view, and determine that Alice is looking at Bob.

Joe Bob Kim Joe: {my color + my motion} Bob: {my color + my motion} Kim: {my color + my motion} ???: {his color + his motion} from video

Many applications if humans can be visually fingerprinted. Joe Bob Kim "He is Bob" Many applications are possible if we can visually fingerprint humans.

1. Augmented Reality. James, Bob, John, Kevin, Jason, Paul, David. "Aah! I see people's messages." "Looking for a cofounder." "Am skilled in UI design." Consider a university-organized matchmaking event for startups, where students and faculty come to form teams. One could design an application such that when you look at a person through a smartphone or Google Glass, their virtual name tag pops up on the display. Even better, you can see the messages they want to share with those around them. Visual fingerprinting can enable such human-centric augmented reality applications.

2. Privacy-Preserving Pictures/Videos. "Please do not record me." Another potential application relates to privacy. With the proliferation of cameras in recent years, many people are concerned about being included in a video or a picture taken by a stranger. They wish to have the ability to express a privacy preference such as "please do not record me" and then have cameras honor it by automatically removing them from videos and pictures. Such an application can be built by visually identifying and erasing the requesting user.

3. Communicating to Shoppers. Visual fingerprints can also enable a store to communicate with its shoppers. Suppose Walmart wants to send a coupon to customer A based on her shopping behavior. Using the surveillance camera view, it can compute A's motion fingerprint and broadcast a coupon with that fingerprint as the destination address. A's phone accepts the coupon since its sensor-based motion fingerprint matches the destination address. In other words, a user's motion trajectory can serve as her virtual address for communication.

Our System: InSight. We refer to our system as InSight. InSight can be thought of as a switchboard operator that matches two forms of motion fingerprints: one derived from the viewer's camera ({his visual address}) and another from the viewee's smartphone sensors ({my visual address}). By matching the two, the InSight server establishes a communication channel between a viewer and a viewee.

System Design: Extracting Motion Fingerprint

Difficult to match raw data from sensors and videos; need to map raw data to a common alphabet. The raw motion data from sensors and from video cannot be matched directly, so we map both to a common alphabet.

String of Motion Alphabet. Walk east, walk east, pause, walk south. In this work, we adopt a discrete approach and abstract motion as a string composed of a basic motion alphabet. In this example, the person's motion string is walk east, walk east, pause, walk south, and so on, denoted by {E, E, P, S, …}.
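As a rough illustration, the sketch below (not the authors' code; the symbol set, the angle convention of degrees counterclockwise from east, and the speed threshold are all assumptions) quantizes per-window speed and heading into such a string:

```python
# Minimal sketch: quantize a motion trace into a string of motion symbols.
def heading_to_symbol(heading_deg):
    """Map a heading in degrees (counterclockwise from east) to one of
    8 compass symbols using 45-degree bins."""
    symbols = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]
    return symbols[int(((heading_deg + 22.5) % 360) // 45)]

def encode_motion(samples, speed_thr=0.2):
    """samples: list of (speed in m/s, heading in degrees) per time window."""
    string = []
    for speed, heading in samples:
        if speed < speed_thr:
            string.append("P")            # pausing
        else:
            string.append(heading_to_symbol(heading))
    return string

# Walking east twice, pausing, then walking south -> ['E', 'E', 'P', 'S']
print(encode_motion([(1.2, 0), (1.1, 5), (0.05, 0), (1.0, 270)]))
```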

Comparing Motion Strings. Bob: my motion string = {E, E, P, S, …}. Joe: my motion string = {E, E, P, N, …}. From video: his motion string = {E, E, P, S, …}. Now, comparing motion patterns boils down to comparing motion strings. Suppose the phone looks at a person and, from the viewfinder, extracts the person's motion string as {E, E, P, S}. Bob's and Joe's phones extract their owners' strings as shown here. After comparing, the camera can infer that it is looking at Bob.

Comparing Motion Strings. Bob: {E, E, P, S, …}. Joe: {E, E, P, N, …}. From video: {E, E, P, S, …}. "He is Bob." We believe human motion patterns are sufficiently dissimilar that people can be distinguished even with short motion strings. Even if two people walk in lock-step, such as in a grocery store, they are likely to soon perform different motions, enabling InSight to disambiguate them.

Motion Alphabet. Let's now talk about designing the individual alphabets. We explore six motion alphabets: IsPausing, IsRotating, IsWalking, step duration, step phase, and step direction.

extracting motion from sensor extracting motion from video

IsRotating (from sensors): project the gyroscope's angular velocity onto gravity, $\omega_g = \frac{\vec{\omega} \cdot \vec{g}}{|\vec{g}|}$, integrate it into a turn angle $\alpha = \int \omega_g \, dt$, and set IsRotating when $\alpha > \alpha_{thr}$.
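A minimal sketch of this computation, assuming (N, 3) arrays of gyroscope and gravity samples and an illustrative threshold:

```python
import numpy as np

def is_rotating(gyro, gravity, dt, alpha_thr=np.deg2rad(45)):
    """gyro: (N, 3) angular velocities in rad/s; gravity: (N, 3) gravity
    vectors. Projects each angular velocity onto gravity (rotation about
    the vertical axis), integrates to the total turn angle alpha, and
    thresholds it. alpha_thr is an assumed value, not the paper's."""
    g_unit = gravity / np.linalg.norm(gravity, axis=1, keepdims=True)
    omega_g = np.sum(gyro * g_unit, axis=1)   # omega . g / |g| per sample
    alpha = np.abs(np.sum(omega_g) * dt)      # integral of omega_g dt
    return alpha > alpha_thr
```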

IsWalking (from sensors): compute $std(|acceleration|)$ and the rotation angle $\alpha$ over a window, and feed these two features to a bagged decision tree that classifies IsWalking.
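A sketch of this classifier using scikit-learn; the feature pair follows the slide, but the training windows and labels are made up for illustration:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier

def window_features(accel_mag, alpha):
    """Features for one time window: std of |acceleration| and rotation angle."""
    return [np.std(accel_mag), alpha]

# Toy training windows: [std(|acceleration|), rotation angle], 1 = walking.
X_train = [[0.05, 0.10], [0.04, 0.90], [0.90, 0.10], [1.10, 0.20]]
y_train = [0, 0, 1, 1]

# BaggingClassifier's default base learner is a decision tree.
clf = BaggingClassifier(n_estimators=10).fit(X_train, y_train)
print(clf.predict([window_features(np.array([9.6, 10.4, 9.2, 10.8]), 0.1)]))
```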

Step duration (from sensors): low-pass filter the raw accelerometer magnitude reading; peaks in the filtered signal mark primary and secondary footsteps. With $\Delta T$ the interval between successive primary footsteps, the step duration is $T_{step} = \frac{\Delta T}{2}$.
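One plausible implementation with SciPy; the cutoff frequency and minimum peak spacing are assumptions, not the paper's values:

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def step_duration(accel_mag, fs):
    """Estimate T_step = delta_T / 2 from the accelerometer magnitude.
    Peaks in the filtered signal are footsteps; successive *primary*
    footsteps (same foot) are delta_T apart, so one step lasts delta_T / 2."""
    b, a = butter(2, 3.0 / (fs / 2))          # ~3 Hz low-pass (assumed)
    smooth = filtfilt(b, a, accel_mag)
    # Enforce a minimum spacing so only the stronger, primary peaks survive.
    peaks, _ = find_peaks(smooth, distance=int(0.6 * fs))
    delta_t = np.mean(np.diff(peaks)) / fs
    return delta_t / 2
```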

Step phase (from sensors): the same filtered magnitude also yields step phase markers, the time instants at which the primary and secondary footsteps occur.

Step direction (from sensors): combine the magnetic field with gravity, $H = \mathcal{R}_{axis} \times \vec{g}$, and quantize the resulting heading into 8 directions.

extracting motion from sensor extracting motion from video

Detection and Tracking (from video): human detection suffers from errors and occlusion, so detections are linked across frames by box association and smoothed with a Kalman filter over position, speed, and size, with state $s_k = [x_k, y_k, v_{x_k}, v_{y_k}]$.
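A minimal constant-velocity Kalman filter over this state; the noise covariances and frame rate are illustrative assumptions:

```python
import numpy as np

dt = 1 / 30.0                                  # one video frame (assumed)
F = np.array([[1, 0, dt, 0],                   # constant-velocity transition
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]])
H = np.array([[1, 0, 0, 0],                    # we observe box position only
              [0, 1, 0, 0]])
Q = np.eye(4) * 1e-2                           # process noise (assumed)
R = np.eye(2) * 1e-1                           # measurement noise (assumed)

def kalman_step(s, P, z):
    """Predict with the motion model, then correct with detection z = [x, y]."""
    s, P = F @ s, F @ P @ F.T + Q              # predict
    y = z - H @ s                              # innovation
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    return s + K @ y, (np.eye(4) - K @ H) @ P  # update
```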

IsWalking (from video): derived from the tracked velocities $\{v_{x_k}, v_{y_k}\}$ and headings $\{h_k\}$.

Step direction (from video): the tracked headings $\{h_k\}$ are quantized into the same 8 directions.

Space-Time Interest Points (from video): smooth each frame with a 2D Gaussian kernel $g$ in space and apply a quadrature pair of 1D Gabor filters $h_{ev}, h_{od}$ in time; the response is $R = (I * g * h_{ev})^2 + (I * g * h_{od})^2$.
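A sketch of this response computation, the standard periodic interest-point detector; the filter parameters here are illustrative, not the paper's:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def interest_point_response(video, sigma=2.0, tau=2.0, omega=0.25):
    """R = (I*g*h_ev)^2 + (I*g*h_od)^2 for a (T, H, W) grayscale clip:
    2D Gaussian smoothing in space, quadrature 1D Gabor filters in time."""
    t = np.arange(-int(3 * tau), int(3 * tau) + 1)
    h_ev = -np.cos(2 * np.pi * omega * t) * np.exp(-t**2 / tau**2)
    h_od = -np.sin(2 * np.pi * omega * t) * np.exp(-t**2 / tau**2)
    smoothed = gaussian_filter(video, sigma=(0, sigma, sigma))  # space only
    ev = np.apply_along_axis(np.convolve, 0, smoothed, h_ev, mode="same")
    od = np.apply_along_axis(np.convolve, 0, smoothed, h_od, mode="same")
    return ev**2 + od**2  # large R = locally periodic motion (e.g., footsteps)
```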

Step duration and step phase (from video): the temporal positions of the space-time interest points serve as step phase markers and yield the step duration $T_{step}$.

IsRotating (from video): features describing the distribution of space-time interest points are fed to a bagged decision tree that labels a track as rotating or not rotating.

Matching Motion Alphabets: IsPausing, IsRotating, and IsWalking must agree; step durations must agree within a threshold ratio; step phases must fall within a range limit; and step directions must be identical or adjacent.

Matching Motion Strings: the overall motion similarity is the fraction of aligned alphabet pairs that match under the rules above, $Similarity_{motion} = \frac{matched\ pairs}{total\ pairs}$.
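A sketch of the string comparison; the adjacency table and match rules below are a simplification of the slide's criteria:

```python
# Adjacent compass directions count as a match; other symbols must be equal.
ADJACENT = {"E": ("NE", "SE"), "N": ("NE", "NW"), "W": ("NW", "SW"),
            "S": ("SE", "SW"), "NE": ("N", "E"), "NW": ("N", "W"),
            "SE": ("S", "E"), "SW": ("S", "W")}

def symbols_match(a, b):
    return a == b or a in ADJACENT.get(b, ())

def motion_similarity(video_string, sensor_string):
    """Similarity_motion = matched pairs / total pairs over aligned windows."""
    pairs = list(zip(video_string, sensor_string))
    return sum(symbols_match(a, b) for a, b in pairs) / len(pairs)

print(motion_similarity(["E", "E", "P", "S"], ["E", "NE", "P", "S"]))  # 1.0
```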

System Design: Extracting Color Fingerprint

Extracting Color Fingerprint: pose estimation isolates the clothing area; pixel colors are converted from RGB to HSV; spatiograms then record both the color histogram and the spatial distribution of each color.

Matching Color Fingerprint: given two spatiograms $s_1 = \{n, \mu, \sigma\}$ and $s_2 = \{n', \mu', \sigma'\}$ (per-bin pixel counts, mean positions, and spatial covariances), their similarity compares the color histograms and spatial distributions together:

$Similarity_{spatiograms} = \sum_{b=1}^{B} \sqrt{n_b n_b'} \cdot 8\pi |\Sigma_b \Sigma_b'|^{1/4} \, N(\mu_b;\, \mu_b', 2(\Sigma_b + \Sigma_b'))$

The color similarity is this score thresholded, $Similarity_{color} = [Similarity_{spatiograms} > T_{color}]$, and the overall score is $Similarity = \frac{Similarity_{motion} + Similarity_{color}}{2}$.
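A direct transcription of this formula for 2D spatial covariances; the bin structure and inputs are assumed, so treat this as a sketch rather than the authors' implementation:

```python
import numpy as np

def spatiogram_similarity(n, mu, sigma, n2, mu2, sigma2):
    """Similarity between two spatiograms: per-bin pixel counts n, mean
    positions mu (2-vectors), and 2x2 spatial covariances sigma."""
    total = 0.0
    for b in range(len(n)):
        S = 2 * (sigma[b] + sigma2[b])
        d = mu[b] - mu2[b]
        # 2D Gaussian N(mu_b; mu'_b, 2(Sigma_b + Sigma'_b))
        norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(S)))
        gauss = norm * np.exp(-0.5 * d @ np.linalg.inv(S) @ d)
        weight = 8 * np.pi * np.linalg.det(sigma[b] @ sigma2[b]) ** 0.25
        total += np.sqrt(n[b] * n2[b]) * weight * gauss
    return total
```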

Evaluation

Evaluation: (1) a scenario with real users; (2) video simulation with more users and in different environments.

Scenario with Real Users. Experiment design: 12 volunteers, each carrying a Samsung Android phone; participants moved around naturally, paused, and did as they pleased within a 20 m × 15 m area; they were given no instructions about clothing; identification was requested at 100 random times.

Scenario with Real Users. With motion fingerprints alone, accuracy reaches 72% after about 10 s of observation.

Scenario with Real Users. Combining motion and color, accuracy reaches 90% after about 6 s.

Video Simulation (for Scale). Experiment design: record videos of people in public places (outside the student union and in a university cafe); extract the motion fingerprint from video, $V(i)$; label the ground truth, $T(i)$; and simulate the sensor-side motion fingerprint as $S(i) = T(i) + Error(i)$.
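A toy version of this perturbation step; the uniform symbol-substitution error model is an assumption, not the paper's:

```python
import random

ALPHABET = ["E", "NE", "N", "NW", "W", "SW", "S", "SE", "P"]

def simulate_sensor_string(truth, error_rate=0.1):
    """S(i) = T(i) + Error(i): corrupt the ground-truth motion string with
    random symbol substitutions at a chosen rate."""
    return [random.choice(ALPHABET) if random.random() < error_rate else s
            for s in truth]
```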

Student Union Scenario (40 people, summer, outdoor). Motion alone: 40% accuracy; color alone: 50%.

Student Union Scenario (40 people, summer, outdoor). Combining motion and color raises accuracy to 90%: 88% at 5 s and 90% at 8 s of observation.

Cafe Scenario (15 people, winter, indoor). Motion alone: 80% accuracy; color alone: 0%.

Cafe Scenario (15 people, winter, indoor). Combining motion and color reaches 93% accuracy (80% at 6 s).

Conclusion. Faces are permanent visual identifiers for humans. This paper observes that human clothing and motion patterns can also serve as visual, temporary identifiers. Given the rich diversity in human behavior, these spatio-temporal identifiers can be sensed and expressed with a few bits. In other words, two humans are similar only for short segments in space-time, enabling techniques complementary to face recognition. Human recognition, sans faces, enables various new applications.

Questions, Comments? Thank You