FOCUS: Clustering Crowdsourced Videos by Line-of-Sight
Puneet Jain, Justin Manweiler, Arup Acharya, and Kirk Beaty

Clustered by shared subject

CHALLENGES

CAN IMAGE PROCESSING SOLVE THIS PROBLEM?

LOGICAL similarity does not imply VISUAL similarity
[Figure: views from Camera 1, Camera 2, Camera 3, and Camera 4]

VISUAL similarity does not imply LOGICAL similarity

CAN SMARTPHONE SENSING SOLVE THIS PROBLEM?

Sensors are noisy, hard to distinguish subjects… Why not triangulate?

GPS-COMPASS Line-of-Sight
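To make the triangulation question concrete, here is a minimal sketch of what a GPS/compass-only approach would compute: the point where two cameras' lines of sight cross. It is illustrative only and not taken from the talk. The function name `los_intersection`, the flat local coordinate frame (x = east, y = north, in meters), and the sample positions and bearings are all assumptions.

```python
import math

def los_intersection(p1, bearing1_deg, p2, bearing2_deg):
    """Intersect two line-of-sight rays in a flat local frame.

    p1, p2: (x, y) camera positions in meters (x = east, y = north).
    Bearings are compass degrees, clockwise from north.
    Returns the (x, y) crossing point, or None if the rays are
    parallel or cross behind either camera.
    """
    d1 = (math.sin(math.radians(bearing1_deg)), math.cos(math.radians(bearing1_deg)))
    d2 = (math.sin(math.radians(bearing2_deg)), math.cos(math.radians(bearing2_deg)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product of the directions
    if abs(denom) < 1e-9:
        return None  # parallel lines of sight never cross
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom  # distance along ray 1
    t2 = (dx * d1[1] - dy * d1[0]) / denom  # distance along ray 2
    if t1 < 0 or t2 < 0:
        return None  # crossing point lies behind a camera
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

# Two cameras 100 m apart, both looking inward at 45 degrees.
exact = los_intersection((0.0, 0.0), 45.0, (100.0, 0.0), 315.0)
# Same setup with a hypothetical 10-degree compass error on camera 2.
noisy = los_intersection((0.0, 0.0), 45.0, (100.0, 0.0), 325.0)
print(exact, noisy)
```

Comparing the two printed points shows how a modest compass error moves the estimated subject location, which is the weakness of sensing alone that the slides point to.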

INSIGHT

Simplifying Insight 1
Don’t need to visually identify the actual SUBJECT (hard to identify); can use the background as a PROXY (easy to identify)

Simplifying Insight 2
Don’t need to directly match videos; can compare all of them to a predefined visual MODEL
[Figure caption: same basic structure persists]

Simplifying Insight 3
Line-of-sight (triangulation) is almost enough, just not via sensing (alone)

FOCUS: Fast Optical Clustering of live User Streams
Sensing, Cloud, Vision

[Figure: FOCUS architecture. Video streams (Android, iOS, etc.) go through Video Extraction into FOCUS Cloud Video Analytics: image processing and computer vision running on Hadoop/HDFS (failover, elasticity), using a pre-defined reference “model”. The output is Clustered Videos. Users select and watch the organized live streams and can change angle or change focus. Example overlay: home: 2, away: 1.]

Multi-view Stereo Reconstruction
Model construction technique based on “Photo Tourism: Exploring image collections in 3D”, Snavely et al., SIGGRAPH 2006
Steps: multi-view reconstruction, keypoint extraction
Estimates camera POSE and content in field-of-view

Visualizing Camera Pose

Given a pre-defined 3D model, align incoming video frames to the model (also known as camera pose estimation)
Steps: multi-view reconstruction, keypoint extraction, frame-by-frame video-to-model alignment, sensory inputs
[Figure labels: ~1 second at 90th percentile; ~18 seconds at 90th percentile]

Steps: multi-view reconstruction, keypoint extraction, integration of sensory inputs
Gyroscope provides a “diff” from the vision-derived initial position
File size ≈ 1/Blur
[Figure: timeline of sampled frames and gyroscope readings at t-2, t-1, 0, 1, 2, 3, 4]
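The two ideas on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The first function picks the sampled frame with the largest encoded size, on the slide's premise that file size is roughly inversely related to blur. The second advances a vision-derived orientation using gyroscope readings. The function names, the single-axis (yaw-only) simplification, and the sample numbers are assumptions.

```python
def pick_sharpest(frames):
    """Return the id of the frame with the largest encoded size.

    frames: list of (frame_id, encoded_size_bytes) tuples.
    Larger compressed size is used as a cheap proxy for less blur.
    """
    return max(frames, key=lambda f: f[1])[0]

def integrate_gyro(initial_yaw_deg, samples):
    """Advance a vision-derived yaw using gyroscope samples.

    samples: list of (angular_rate_deg_per_s, dt_s) tuples.
    Returns the updated yaw wrapped into [0, 360).
    """
    yaw = initial_yaw_deg
    for rate, dt in samples:
        yaw += rate * dt  # accumulate the "diff" since the last vision fix
    return yaw % 360.0

# Hypothetical candidate frames around one sampling instant.
window = [("t-2", 41200), ("t-1", 58900), ("t", 47300)]
best = pick_sharpest(window)

# Vision fixed the yaw at 350 degrees; the gyroscope then reports
# 10 deg/s for two one-second intervals.
yaw_now = integrate_gyro(350.0, [(10.0, 1.0), (10.0, 1.0)])
print(best, yaw_now)
```

A real pipeline would track full 3D orientation and correct for gyroscope drift at each new vision fix. The yaw-only version is kept only to show the "diff from vision" idea.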

Field-of-view
Using POSE + model POINT CLOUD, FOCUS geometrically identifies the set of model points in the background of the view
Steps: multi-view reconstruction, keypoint extraction, pairwise model image analysis
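The geometric selection described above can be illustrated with a simple visibility test. The sketch below is an approximation under stated assumptions, not the method from the talk. It treats the field of view as a cone around the viewing direction rather than a rectangular frustum, and it ignores occlusion. The function name `points_in_view` and the sample points are hypothetical.

```python
import math

def points_in_view(camera_pos, view_dir, half_fov_deg, points):
    """Return the indices of model points inside a viewing cone.

    camera_pos: (x, y, z) camera position from pose estimation.
    view_dir:   (x, y, z) viewing direction (need not be unit length).
    half_fov_deg: half of the field-of-view angle, in degrees.
    points: list of (x, y, z) model points (the point cloud).
    """
    norm = math.sqrt(sum(c * c for c in view_dir))
    unit = [c / norm for c in view_dir]
    cos_limit = math.cos(math.radians(half_fov_deg))
    visible = set()
    for idx, p in enumerate(points):
        offset = [p[i] - camera_pos[i] for i in range(3)]
        dist = math.sqrt(sum(c * c for c in offset))
        if dist == 0:
            continue  # point coincides with the camera
        cos_angle = sum(offset[i] * unit[i] for i in range(3)) / dist
        if cos_angle >= cos_limit:
            visible.add(idx)  # within half_fov_deg of the view axis
    return visible

cloud = [(10, 0, 0), (10, 10, 0), (-5, 0, 0), (10, 2, 1)]
seen = points_in_view((0, 0, 0), (1, 0, 0), 30.0, cloud)
print(sorted(seen))  # indices of points inside the 60-degree cone
```

Returning point indices rather than coordinates keeps the result directly comparable across videos, since every video is matched against the same model.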

Finding the similarity across videos as the size of the point cloud set intersection
Similarity between image 1 & 2 = 18
Similarity between image 1 & 3 = 13
Steps: multi-view reconstruction, keypoint extraction, pairwise model image analysis
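The similarity measure on this slide is the size of a set intersection, so it translates directly into code. The sketch below is a minimal illustration. The function names and the tiny sets of model point ids are made up and do not reproduce the slide's figures.

```python
from itertools import combinations

def similarity(points_a, points_b):
    """Number of model points visible in both views."""
    return len(set(points_a) & set(points_b))

def pairwise_similarity(views):
    """Score every pair of views.

    views: dict mapping a video id to the set of model point ids it sees.
    Returns a dict {(id_a, id_b): score} with id_a < id_b.
    """
    return {
        (a, b): similarity(views[a], views[b])
        for a, b in combinations(sorted(views), 2)
    }

# Hypothetical visible-point sets for three videos.
views = {
    1: {101, 102, 103, 104, 105},
    2: {103, 104, 105, 106},
    3: {105, 106, 107},
}
scores = pairwise_similarity(views)
print(scores)
```

Because every video is reduced to a set of ids from the same model, no direct image-to-image matching between videos is needed.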

Clustering “similar” videos
Application of Modularity Maximization to the similarity scores
High modularity implies high correlation among the members of a cluster and minor correlation with the members of other clusters
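To show what "high modularity" measures, the sketch below scores a given clustering of a weighted similarity graph with the standard modularity formula, Q = sum over clusters of (internal weight / m) - (cluster degree / 2m)^2. It only evaluates a partition. The search for the partition that maximizes Q, which the slide refers to, is not shown. The function name, edge weights, and cluster labels are assumptions.

```python
def modularity(weights, labels):
    """Weighted modularity Q of a partition.

    weights: dict {(i, j): w} of undirected edges between distinct nodes.
    labels:  dict mapping each node to its cluster label.
    """
    total = sum(weights.values())  # m, the total edge weight
    if total == 0:
        return 0.0
    degree = {}
    inside = {}  # edge weight that falls inside each cluster
    for (i, j), w in weights.items():
        degree[i] = degree.get(i, 0.0) + w
        degree[j] = degree.get(j, 0.0) + w
        if labels[i] == labels[j]:
            inside[labels[i]] = inside.get(labels[i], 0.0) + w
    cluster_degree = {}
    for node, k in degree.items():
        c = labels[node]
        cluster_degree[c] = cluster_degree.get(c, 0.0) + k
    q = 0.0
    for c, k_sum in cluster_degree.items():
        q += inside.get(c, 0.0) / total - (k_sum / (2.0 * total)) ** 2
    return q

# Hypothetical similarity scores between four videos.
edges = {(0, 1): 18, (2, 3): 15, (1, 2): 2}
good = modularity(edges, {0: "a", 1: "a", 2: "b", 3: "b"})
bad = modularity(edges, {0: "a", 2: "a", 1: "b", 3: "b"})
print(good, bad)
```

The grouping that keeps the strongly similar videos together scores higher than the one that splits them, which is the property a maximization procedure exploits.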

RESULTS

Collegiate Football Stadium
Stadium: 33K seats, 56K maximum attendance
Model: 190K points, 412 images (2896 x 1944 resolution)
Android app on Samsung Galaxy Nexus, S3
325 videos captured, 15-30 seconds each

Line-of-Sight Accuracy (visual)

Line-of-Sight Accuracy
In >80% of the cases, line-of-sight estimation is off by < 40 meters
GPS/Compass line-of-sight estimation is off by < 260 meters for the same percentage

FOCUS Performance
75% true positives
Trigger GPS/Compass failover techniques

Natural Questions
What if a 3D model is not available? Online model generation from the first few uploads.
Stadiums look very different on a game day? Rigid structures in the background persist.
Where won’t it work? Natural or dynamic environments are hard.

Conclusion
Computer vision and image processing are often computation hungry, restricting real-time deployment.
Mobile sensing is a powerful source of metadata and can often reduce the computation burden.
Computer vision + mobile sensing + geometry, along with the right set of Big Data tools, can enable many real-time applications.
FOCUS displays one such fusion, a ripe area for further research.

Thank You
http://cs.duke.edu/~puneet
