Purposive Sensor Placement PhD Proposal Ser-Nam Lim.


Purposive Sensor Placement PhD Proposal Ser-Nam Lim

Overview
Part I: Sensor placement to reduce the sensitivity of background subtraction to illumination artifacts at online frame rates.
Part II: Collections of PTZ cameras: given a set of generic surveillance tasks, how can we schedule a collection of PTZ cameras to satisfy as many of those tasks as possible?
Part III: Scalable closed-form solution for large camera networks (details deferred to the proposal write-up).
Part IV: Extensions and future work.

Wide Baseline Stereo Configurations for Fast Illumination-invariant Background Subtraction PhD Proposal Part I Ser-Nam Lim

Problem Description Single-camera background subtraction suffers from: –Shadows. –Illumination changes. –Specularities. Stereo-based background subtraction can overcome many of these problems, but it is slow and produces inaccurate online matches.

Project Goal Real-time, dual-camera background subtraction. –Develop a fast two-camera background subtraction algorithm that doesn’t require solving the correspondence problem online. –Analyze the advantages of various camera configurations with respect to robustness of background subtraction: we assume objects to be detected move on a known ground plane.

Related Work
Fast two-camera background subtraction:
– Yuri A. Ivanov, Aaron F. Bobick, John Liu, “Fast Lighting Independent Background Subtraction”, IEEE Workshop on Visual Surveillance, ICCV'98, Bombay, India, January 1998.
Statistical approaches:
– Ahmed Elgammal, David Harwood, Larry Davis, “Non-parametric Model for Background Subtraction”, 6th European Conference on Computer Vision, Dublin, Ireland, July 2000.
– Chris Stauffer, W. Eric L. Grimson, “Adaptive Background Mixture Models for Real-Time Tracking”, IEEE Conference on Computer Vision and Pattern Recognition, Fort Collins, Colorado, USA, June 1999.
Range based:
– G. Gordon, T. Darrell, M. Harville, J. Woodfill, “Background Estimation and Removal Based on Range and Color”, IEEE Conference on Computer Vision and Pattern Recognition, Fort Collins, Colorado, USA, June 1999.
– B. Goldlucke, M. A. Magnor, “Joint 3D Reconstruction and Background Separation in Multiple Views”, IEEE Conference on Computer Vision and Pattern Recognition, Madison, Wisconsin, USA, June 2003.
– C. Eveland, K. Konolige, R. C. Bolles, “Background Modeling for Segmentation of Video-rate Stereo Sequences”, IEEE Conference on Computer Vision and Pattern Recognition, Santa Barbara, California, USA, June 1998.

Fast Illumination-invariant Multi-camera Approach Clever idea due to Ivanov et al. Background model: –Background conjugate pixels established offline. –Color differences between conjugate pixels. What are the problems? –False and missed detections caused by homogeneous objects.
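A minimal sketch of this style of detection, assuming the map of background conjugate pixels and the per-pair color-difference model were computed offline; the array names and the simple threshold test are illustrative, not the authors' implementation:

```python
import numpy as np

def detect_foreground(ref_img, sec_img, conjugate_map, bg_diff, thresh=15.0):
    """Two-camera detection in the spirit of Ivanov et al. (illustrative sketch).

    ref_img, sec_img : H x W x 3 float arrays from the reference and second cameras.
    conjugate_map    : H x W x 2 integer array; conjugate_map[y, x] = (y', x') is the
                       background conjugate of reference pixel (y, x), found offline.
    bg_diff          : H x W array of offline color differences between conjugate pairs.
    Returns a boolean foreground mask in the reference view.
    """
    ys, xs = conjugate_map[..., 0], conjugate_map[..., 1]
    warped_sec = sec_img[ys, xs]                      # second-view colors at the conjugates
    online_diff = np.linalg.norm(ref_img - warped_sec, axis=2)
    # Flag a pixel when the online color difference departs from the offline
    # background difference by more than a threshold.
    return np.abs(online_diff - bg_diff) > thresh
```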

False and Missed Detections

False Detections Given a background conjugate pair (p, p’): –p’ is occluded by a foreground object. –p is visible in the reference view. Can be expressed as: E_p = min(|P·b_sec1 − P·b_sec2|, |P·b_ref − P·b_sec2|), where P is the camera projection matrix.

Missed Detections Both p and p’ are occluded by a foreground object. Can be expressed as: E_n = max(|P·b_sec1 − P·b_sec2| − |P·b_ref − P·b_sec2|, 0).

Eliminating False Detections Consider a ground plane. Two-camera placement: –Baseline orthogonal to the ground plane. –The lower camera is used as the reference.

Reducing Missed Detections The initial detection is free of false detections, and the missed detections form a component adjacent to the ground plane. For a detected pixel I_t along each epipolar line in an initial foreground blob: 1. Compute the conjugate pixel (constrained stereo). 2. Determine the base point I_b. 3. If |I_t − I_b| > thres, increment I_t and repeat from step 1. 4. Otherwise, mark I_t as the lowermost pixel.
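A pseudocode-style sketch of this scan; `conjugate_of` and `base_point_of` are hypothetical helpers standing in for the constrained stereo match and the base-point computation described on the following slides:

```python
def find_lowermost_pixel(blob_column, conjugate_of, base_point_of, thresh):
    """Walk down the detected pixels of one epipolar line in an initial blob (sketch).

    blob_column : pixel coordinates I_t ordered from the top of the blob downwards.
    Returns the pixel taken to be the object's lowermost (ground-contact) point.
    """
    for I_t in blob_column:
        I_t_conj = conjugate_of(I_t)           # step 1: constrained stereo match
        I_b = base_point_of(I_t, I_t_conj)     # step 2: predicted base point
        if abs(I_t[0] - I_b[0]) <= thresh:     # steps 3-4: close enough, stop here
            return I_t
    return blob_column[-1]                     # fall back to the last detected pixel
```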

Determining Base Point Using Weak Perspective Model Proposition 1: In 3D space, the missed proportion of a homogeneous object with negligible front-to-back depth is independent of object position. Equivalently, the proportion that is correctly detected remains constant.

Determining Base Point Using Weak Perspective Model (Cont’d) Proof: The extent of the missed detection is determined by the length of the baseline; dividing by the object's extent, the proportion of missed detection is therefore a constant independent of object position. ∎

Determining Base Point Using Weak Perspective Model (Cont’d) Let t = [X_t, Y_t, Z_t, 1] be the 3D point of I_t. Let the conjugate of I_t be I’_t. Let m = [X_m, Y_m, Z_m, 1] be the 3D point of I_m = H⁻¹ · I’_t, with H the ground-plane homography. Let b = [X_b, Y_b, Z_b, 1] be the 3D base point of t.

Determining Base Point Using Weak Perspective Model (Cont’d) Y-direction: the 3D Y-coordinates of t, m and b are related by similar triangles, and the corresponding image coordinates follow from perspective projection (f is the focal length).

Determining Base Point Using Weak Perspective Model (Cont’d) Using a constant Z_ave (weak perspective), the same relation applies in the X-direction. Thus, I_m can be determined independently using H and I’_t: –The homogeneous-object and ground-plane assumptions are not necessary for this step.

Determining Base Point Using Perspective Model The weak perspective model can be violated, but it is much simpler. Perspective model: –Computationally less stable (in practice), sensitive to calibration errors. –Slower. Criminisi et al.: –A. Criminisi, I. Reid, A. Zisserman, “Single View Metrology”, 7th IEEE International Conference on Computer Vision, Kerkyra, Greece, September 1999.

Determining Base Point Using Perspective Model (Cont’d)

Projection matrix P = [P_1 P_2 P_3 P_4]. t = [X, Y, h, 1]^T, b = [X, Y, 0, 1]^T. I_b = λ_b (X P_1 + Y P_2 + P_4). I_t = λ_t (X P_1 + Y P_2 + h P_3 + P_4). Both λ_b and λ_t are unknown scale factors. P_3 is the vertical vanishing point v_ref of the ground plane up to scale: P_3 = λ_ref v_ref, with λ_ref an unknown scale factor.

Determining Base Point Using Perspective Model (Cont’d) Taking the vector product with I_b eliminates the term in I_b. With the normalized vanishing line of the ground plane in the reference and second views, and v_sec the vertical vanishing point of the second camera, applying the same construction to both cameras and equating the recovered height yields the constraint that determines the base point.
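As a hedged illustration of the elimination step, written out from the projection equations above and using the substitution P_3 = λ_ref·v_ref from the previous slide (the final constraint equating heights across the two cameras, which also involves the normalized vanishing line, is omitted):

```latex
I_t = \lambda_t \left( X P_1 + Y P_2 + h P_3 + P_4 \right)
    = \frac{\lambda_t}{\lambda_b}\, I_b + \lambda_t \lambda_{\mathrm{ref}}\, h\, v_{\mathrm{ref}}
\quad\Longrightarrow\quad
I_b \times I_t = \lambda_t \lambda_{\mathrm{ref}}\, h \left( I_b \times v_{\mathrm{ref}} \right).
```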

Robustness to Illumination Changes Geometrically, the algorithm is unaffected by: 1. Lighting changes, and 2. Shadows on background conjugate pairs.

Robustness to Specularities Perform a morphological operation, after which there are two possibilities: 1. Specularities in the same blob as the object. 2. Specularities in a different blob.

Robustness to Specularities (Cont’d) Case 1 – specularities in the same blob: –The virtual image lies below the ground plane. –Eliminated by the base-finding operation. Case 2 – specularities in a different blob: –Hard to find a good stereo match (Lambertian + specular at the point of reflection). –Even if matched, the match typically places I_m above I_t.

Near-background Object Detection Detecting objects that are close to the background is a known problem. Our approach overcomes this easily: –We need only detect the top portion of the object (easy). –The base-finding operation recovers the rest.

Additional Advantages Fast: –The background model updates on color dissimilarity, not disparities. Much more accurate: –Conjugate pairs are established offline, so a very accurate stereo algorithm that would be too slow for real time can be used; wrong stereo matches can even be corrected manually.

Experiments 1. Dealing with illumination changes using our sensor placement. 2. Dealing with specularities (night scene). 3. Dealing with specularities (rainy daytime scene). 4. Near-background object detection. 5. Indoor scene (requiring the perspective model).

Experiments (1)

Experiments (2)

Experiments (3)

Experiments (4)

Experiments (5)

Conclusions (Part I) Initial detection free of false detections using our sensor placement. Fast and illumination-invariant. Base-finding operations – effective removal of missed detections. Effective removal of detections due to specularities. Effective detection of near-background objects.

Collecting Surveillance Videos PhD Proposal Part II Ser-Nam Lim

Problem Description Given: –A collection of calibrated PTZ cameras and –A large surveillance site. How do we control the cameras to acquire surveillance videos? Why collect surveillance videos? To satisfy surveillance tasks, e.g., gait recognition.

Project Goals - Visual Requirements Targets have to be unobstructed during useful video collection intervals: –Involves predicting object trajectories in the future based on tracking. Targets have to be in the field of view in the collected videos: –Constrains the PT parameters of the cameras as a function of time during periods of visibility. Targets have to satisfy task-specific minimum resolutions in the collected videos: –Constrains the Z (zoom) parameter.

Project Goals - Performance Requirements Schedule cameras to maximize task coverage. Determine future time intervals within which the visual requirements of tasks are satisfied: –We first do this for each (camera, task) pair. –We then combine these solutions across tasks and then across cameras to schedule tasks.

Related Work
Static sensor planning:
– Anurag Mittal, Larry S. Davis, “Visibility Analysis and Sensor Planning in Dynamic Environments”, 8th European Conference on Computer Vision, Prague, Czech Republic, May 2004.
Dynamic sensor planning:
– S. Abrams, P. K. Allen, K. A. Tarabanis, “Dynamic Sensor Planning”, International Conference on Intelligent Autonomous Systems, Pittsburgh, PA, USA, February 1993.
– S. Abrams, P. K. Allen, K. A. Tarabanis, “Computing Camera Viewpoints in an Active Robot Work Cell”, International Journal of Robotics Research, vol. 19, no. 3.
Viewpoint loci:
– C. K. Cowan, P. D. Kovesi, “Automatic Sensor Placement from Vision Task Requirements”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 3, 1988.
– I. Stamos, P. K. Allen, “Interactive Sensor Planning”, IEEE Conference on Computer Vision and Pattern Recognition, Santa Barbara, California, USA, June 1998.

Related Work (Cont’d)
Scene modeling and reconstruction:
– K. N. Kutulakos, C. R. Dyer, “Global Surface Reconstruction by Purposive Control of Observer Motion”, IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Washington, USA, June 1994.
– K. Tarabanis, R. Y. Tsai, A. Kaul, “Computing Occlusion-free Viewpoints”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 3, 1996.
Logic and scheduling:
– Fusun Yaman, Dana Nau, “Logic of Motion”, 9th International Conference on Knowledge Representation and Reasoning, Whistler, British Columbia, Canada, June 2004.
– Chandra Chekuri, Amit Kumar, “Maximum Coverage Problem with Group Budget Constraints and Applications”, 7th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, Harvard University, Cambridge, USA, August 2004.

System Setup Detection – background subtraction. Tracking – mean shift, Kalman filter. Prediction – motion model derived from detection and tracking. Visibility intervals – based on prediction (future), uncertainties. Schedule – maximize coverage.

Task Visibility Intervals (TVI) Atomic representation: (c, (T, o), [r, d], [θ_c^-, θ_c^+]_t, [φ_c^-, φ_c^+]_t, [f_c^-, f_c^+]_t), where –c is the camera, –(T, o) is the task-object pair involved, –r is the earliest release time of T, –d is the deadline of T, –t denotes time instances such that r ≤ t ≤ d,

TVI (Cont’d) –[θ_c^-, θ_c^+]_t is the range of valid azimuth angles for the viewing orientation at time t, –[φ_c^-, φ_c^+]_t is the range of valid elevation angles for the viewing orientation at time t, and –[f_c^-, f_c^+]_t is the range of valid focal lengths at time t. Note: these times are in the future, based on prediction from the motion model.

Task Representation Tasks themselves are 3-tuples (p, α, ρ), where –p is the required duration of the task, including latencies involved in re-positioning cameras, –α is a predicate indicating the object direction (front, side, back) w.r.t. the camera, and –ρ is the minimal resolution.
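A compact sketch of these two records as plain data structures; the field names are illustrative and the per-time ranges are keyed by discrete time steps:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Range = Tuple[float, float]  # (minimum, maximum)

@dataclass
class Task:
    duration: float          # p: required duration, including camera re-positioning latency
    direction: str           # predicate on object direction w.r.t. camera: front/side/back
    min_resolution: float    # minimal resolution required by the task

@dataclass
class TVI:
    camera: int                                  # c
    task_object: Tuple[int, int]                 # (T, o)
    release: float                               # r: earliest release time
    deadline: float                              # d
    azimuth: Dict[float, Range] = field(default_factory=dict)    # [theta_c^-, theta_c^+]_t
    elevation: Dict[float, Range] = field(default_factory=dict)  # [phi_c^-, phi_c^+]_t
    focal: Dict[float, Range] = field(default_factory=dict)      # [f_c^-, f_c^+]_t
```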

Determine TVIs - Overview Procedure for each camera: 1. Determine temporal intervals of continuous visibility – visibility intervals. 2. Prune to determine sub-intervals of feasible camera settings. 3. Combine visibility intervals so that multiple tasks can be satisfied simultaneously.

Scheduling (Non-preemptive) - Overview TVIs are in the atomic form: (c, (T, o), [r, d], [θ_c^-, θ_c^+]_t, [φ_c^-, φ_c^+]_t, [f_c^-, f_c^+]_t). Multiple Task Visibility Intervals (MTVIs) result from step 3. A schedule for a camera is a sequence of (M)TVIs.

Scheduling (Non-preemptive) - Overview (Cont’d) A TVI cannot be interrupted once execution begins, i.e., it is non-preemptive. Best single-camera schedule – DP over a source-sink DAG of TVIs. Multiple cameras – multiple sources and sinks.

Determine TVIs – Step 1 Represent a 3D object’s shape and positional uncertainty using an ellipsoid: X^T Q X = 0, where X is a homogeneous 3D point on the ellipsoid and Q is its 4×4 coefficient matrix. The polar plane π of the projection center c_p of camera c is given by π = Q · c_p.

The polar plane π is defined by the tangent points of the cone of rays through c_p with the ellipsoid. π intersects the ellipsoid to form a conic C.

Define the object’s motion model M: M(t) = (c, o, θ_C^-(t), θ_C^+(t), φ_C^-(t), φ_C^+(t)), where –c is the camera and o is the object, –θ_C^-(t), θ_C^+(t) are the minimum and maximum azimuth angles of conic C, and –φ_C^-(t), φ_C^+(t) are the minimum and maximum elevation angles of conic C.

Determine TVIs – Step 1 (Cont’d) Iterate through discrete time steps: 1. Identify pairs of objects with overlapping azimuth ranges [θ_C^-(t), θ_C^+(t)] – plane sweep. 2. Construct an n × n matrix O_t, where n is the number of objects. 3. Mark entry (i, j) as -1 if object i is closer to the camera than object j and their azimuth ranges overlap; otherwise mark entry (j, i).

Determine TVIs – Step 1 (Cont’d) 4. Repeat steps 1-3 for the elevation angles; entry (i, j) is 1 if object i occludes object j. 5. For all rows in each column j, perform an OR operation to form the matrix OC_t; OC_t(j) = 1 if object j is occluded. 6. Increment t and repeat steps 1-5.

Determine TVIs – Step 1 (Cont’d) 7. Stack all OC_t. 8. Scan the columns of the stacked matrix.
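A sketch of steps 1-8 with a simplified occluder test (a closer object with overlapping azimuth and elevation ranges occludes the farther one) standing in for the plane sweep; `objects_over_time[t]` is assumed to give each object's angular ranges and distance from the camera at time t:

```python
import numpy as np

def overlaps(a, b):
    """True if two angular intervals [a0, a1] and [b0, b1] intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def occlusion_columns(objects_over_time):
    """Return a T x n boolean matrix: entry (t, j) is True if object j is occluded at t."""
    stacked = []
    for objs in objects_over_time:
        n = len(objs)
        occluded = np.zeros(n, dtype=bool)
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                # Object i occludes j if it is closer and overlaps in both angles.
                if (objs[i]['dist'] < objs[j]['dist']
                        and overlaps(objs[i]['azimuth'], objs[j]['azimuth'])
                        and overlaps(objs[i]['elevation'], objs[j]['elevation'])):
                    occluded[j] = True
        stacked.append(occluded)              # one row (OC_t) per time step
    return np.vstack(stacked)
```

Scanning each column of the returned matrix for maximal runs of un-occluded entries then yields the visibility intervals of step 8.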

Determine TVIs – Step 2 What do we have at this point? –Intervals of continuous visibility. What’s next? –Determine sub-intervals of feasible camera settings.

Determine TVIs – Step 2 (Cont’d) Camera projection matrix: P(R) = K[R | I], where –K is the camera intrinsic matrix, and –R is the rotation matrix in the world coordinate frame. We assume pure rotating cameras, so P(R) is parameterized by R.

Determine TVIs – Step 2 (Cont’d) The image of the object ellipsoid Q(t) is an image conic C(t): C’(t) = P(R) · Q’(t) · P^T(R), where –C’(t) = C^{-1}(t) is the dual of C(t) (assuming C(t) has full rank), and –Q’(t) = adjoint of Q(t) is the dual of Q(t).

Determine TVIs – Step 2 (Cont’d) Let C(t) = [C_1 C_2 C_3]; then for the ellipsoid: –C_1 = [b^2, 0, 0]^T, –C_2 = [0, a^2, 0]^T, –C_3 = [0, 0, a^2 b^2]^T, where a and b are the width and height of C respectively. Use min(a, b) to adjust the focal length in K so as to satisfy the minimum resolution f_c^-.
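A numerical sketch of this projection, assuming Q'(t) is taken as the adjugate of Q(t) and that the image conic is close to the axis-aligned form above; the half-axis extraction is illustrative rather than the exact procedure:

```python
import numpy as np

def image_conic_extent(P, Q):
    """Project ellipsoid Q (4x4) through P (3x4); return half-axes (a, b) of the image conic."""
    Q_dual = np.linalg.det(Q) * np.linalg.inv(Q)   # adjugate of Q, i.e. the dual quadric
    C_dual = P @ Q_dual @ P.T                      # dual image conic C'(t)
    C = np.linalg.inv(C_dual)                      # image conic C(t), assumed full rank
    # For an (approximately) axis-aligned ellipse, read the half-axes off the diagonal.
    a = np.sqrt(abs(C[2, 2] / C[0, 0]))            # width
    b = np.sqrt(abs(C[2, 2] / C[1, 1]))            # height
    return a, b
```

min(a, b) can then be compared against the task's minimal resolution and the focal length in K scaled up until the requirement is met.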

Determine TVIs – Step 2 (Cont’d) Procedure for each camera and task-object pair: 1. Iterate t from the start to the end of the initial visibility interval (from the occlusion analysis). 2. Iterate θ_c from the minimum to maximum azimuth and φ_c from the minimum to maximum elevation. 3. Determine P(R), with R given by θ_c and φ_c.

Determine TVIs – Step 2 (Cont’d) 4. Set f = f_c^- such that min(a, b) satisfies the minimum resolution. 5. If the conic C at f has boundary image coordinates outside the image, go to step 7. 6. Increment f and repeat step 5.

Determine TVIs – Step 2 (Cont’d) 7. If f ≠ f_c^-, let f_c^+ = f, which gives the maximum resolution. 8. Update [r, d], [θ_c^-, θ_c^+]_t, [φ_c^-, φ_c^+]_t, [f_c^-, f_c^+]_t for time t.
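A sketch of the search in steps 1-8; `projection_matrix` and `conic_inside_image` are hypothetical helpers standing in for the quantities defined on the previous slides:

```python
def feasible_settings(times, pans, tilts, f_min, f_step, f_limit,
                      projection_matrix, conic_inside_image):
    """Collect feasible (pan, tilt, [f_min, f_max]) camera settings per time step (sketch).

    f_min is the smallest focal length that meets the task's minimum resolution;
    f is grown while the object's image conic stays inside the image bounds.
    """
    settings = {}
    for t in times:
        valid = []
        for pan in pans:
            for tilt in tilts:
                P = projection_matrix(pan, tilt)
                f, f_max = f_min, None
                while f <= f_limit and conic_inside_image(P, f, t):
                    f_max = f                     # largest focal length seen to fit so far
                    f += f_step
                if f_max is not None:             # at least f_min was feasible
                    valid.append((pan, tilt, f_min, f_max))
        if valid:
            settings[t] = valid                   # defines the theta/phi/f ranges at time t
    return settings
```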

Determine TVIs – Step 3 What do we have at this point? –TVIs of continuous visibility and feasible camera settings (single task only). What next? –We can schedule based on TVIs. –Can we do better? Compositing TVIs?

Determine TVIs – Step 3 (Cont’d) Combine TVIs so that multiple tasks can be performed in a single scheduled capture: (c, (T, o)_i, ∩_i [r_i, d_i], ∩_i [θ_{c,i}^-, θ_{c,i}^+]_t, ∩_i [φ_{c,i}^-, φ_{c,i}^+]_t, ∩_i [f_{c,i}^-, f_{c,i}^+]_t). Conditions: –∩_i [r_i, d_i] ≠ ∅, –|∩_i [r_i, d_i]| ≥ p_max, where p_max is the largest processing time among the tasks, and for t ∈ ∩_i [r_i, d_i], –∩_i [θ_{c,i}^-, θ_{c,i}^+]_t ≠ ∅, –∩_i [φ_{c,i}^-, φ_{c,i}^+]_t ≠ ∅, and –∩_i [f_{c,i}^-, f_{c,i}^+]_t ≠ ∅.
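A sketch of the pairwise combination test, using the `TVI` record sketched earlier and taking p_max as the larger of the two processing times; the per-time-step agreement check is illustrative:

```python
def intersect(r1, r2):
    """Intersect two closed intervals; return None if they are disjoint."""
    lo, hi = max(r1[0], r2[0]), min(r1[1], r2[1])
    return (lo, hi) if lo <= hi else None

def combine_tvis(tvi1, tvi2, p_max):
    """Return the combined [r, d] window if the two TVIs form a valid MTVI, else None."""
    window = intersect((tvi1.release, tvi1.deadline), (tvi2.release, tvi2.deadline))
    if window is None or window[1] - window[0] < p_max:
        return None                               # empty window, or shorter than p_max
    for t in tvi1.azimuth:                        # every shared time step must agree
        if t not in tvi2.azimuth:
            continue
        if (intersect(tvi1.azimuth[t], tvi2.azimuth[t]) is None
                or intersect(tvi1.elevation[t], tvi2.elevation[t]) is None
                or intersect(tvi1.focal[t], tvi2.focal[t]) is None):
            return None
    return window
```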

Determine TVIs – Step 3 (Cont’d)

Forming MTVIs is computationally expensive. Need an efficient approach to determine MTVIs: –Plane sweep algorithm.

Determine TVIs – Step 3 (Cont’d) Define the slack σ as: σ = [t_σ^-, t_σ^+] = [r, d − p], where (again) –d is the deadline, –r is the earliest release time, and –p is the processing time (duration of the task). Let |σ| = t_σ^+ − t_σ^-.

Determine TVIs – Step 3 (Cont’d) Creating a timeline for each camera: 1. Compute the set S_2 of all pairwise MTVIs with an O(n^2) approach. 2. Compute the slack σ_i = [t_{σ_i}^-, t_{σ_i}^+], using the larger processing time of the two tasks, for each i ∈ S_2. 3. Define the timeline as ∪_i {t_{σ_i}^-, t_{σ_i}^+}. 4. Define the Ordered Timeline (OT) as the sorted timeline.
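A sketch of steps 1-4, collecting slack endpoints over all pairwise combinations; the `combine` argument can be the `combine_tvis` sketch above, and `processing_time` returns a TVI's task duration:

```python
from itertools import combinations

def ordered_timeline(tvis, combine, processing_time):
    """Build the Ordered Timeline (OT) from the slacks of all pairwise MTVIs (sketch)."""
    endpoints = set()
    for a, b in combinations(tvis, 2):                 # the O(n^2) pairwise set S_2
        p = max(processing_time(a), processing_time(b))
        window = combine(a, b, p)                      # None if the pair is infeasible
        if window is None:
            continue
        r, d = window
        if r <= d - p:                                 # slack sigma = [r, d - p]
            endpoints.update((r, d - p))               # keep both slack start and end
    return sorted(endpoints)
```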

Determine TVIs – Step 3 (Cont’d) Why is OT defined on the slacks and not on r and d? –Splitting [r, d] may cause the interval to become smaller than p (the processing time).

Determine TVIs – Step 3 (Cont’d) Define along OT: 1. SS interval: a sequential pair of slacks’ start times. 2. SE interval: a sequential pair consisting of a slack’s start and a slack’s end time. 3. ES interval: a sequential pair consisting of a slack’s end and a slack’s start time. 4. EE interval: a sequential pair of slacks’ end times.

Determine TVIs – Step 3 (Cont’d) Inference algorithm, sweeping along OT: 1. SS – task-object pairs from the first MTVI can be performed. 2. SE – task-object pairs from both MTVIs can be performed. 3. ES – none. 4. EE – task-object pairs from the second MTVI can be performed.

Determine TVIs – Step 3 (Cont’d) 5. Keep an “active” set of MTVIs. 6. Combination is conditioned on being able to find common camera settings. 7. This yields a set cover (note: not necessarily minimum) for each time interval along OT.

Determine TVIs – Step 3 (Cont’d) An example:

Determine TVIs – Step 3 (Cont’d) Theorem 1: Let the slack’s start and end times of any MTVI be t_σ^- and t_σ^+. Then t_σ^-, t_σ^+ ∈ OT. Proof: Let the number of tasks in an MTVI be n (≥ 2). We can form up to C(n, 2) feasible pairwise MTVIs. The corresponding slacks [t_{σ_i}^-, t_{σ_i}^+] ∈ OT, for i = 1 … C(n, 2). Then t_σ^- = max(t_{σ_i}^-) and t_σ^+ = min(t_{σ_i}^+). ∎

Sensor Scheduling What do we have at this point? –(M)TVIs (for every camera) for scheduling. What next? –Tractability: job-shop scheduling is NP-hard. –We want to maximize coverage of tasks.

Sensor Scheduling – Tractability Theorem 2: Consider a camera. Let σ_max = argmax_{σ_i} |σ_i| and let p_min be the smallest processing time among all (M)TVIs. Then, if |σ_max| < p_min, any feasible schedule for the camera is ordered by the slacks’ start times.

Sensor Scheduling – Tractability (Cont’d) Proof: Suppose σ_1 = [t_{σ_1}^-, t_{σ_1}^+] precedes σ_2 = [t_{σ_2}^-, t_{σ_2}^+] in a schedule while t_{σ_1}^- > t_{σ_2}^-. Let p_1 be the processing time corresponding to σ_1. Then t_{σ_1}^- + p_1 > t_{σ_2}^- + p_1. If t_{σ_1}^- + p_1 > t_{σ_2}^+, the schedule is infeasible; this happens whenever t_{σ_2}^+ ≤ t_{σ_2}^- + p_1, i.e., t_{σ_2}^+ − t_{σ_2}^- ≤ p_1. Given that |σ_max| < p_min, t_{σ_2}^+ − t_{σ_2}^- ≤ p_1 always holds. ∎

Sensor Scheduling – Tractability (Cont’d) Theorem 3: A feasible schedule contains a sequence of n (M)TVIs, each with slack σ_i = [t_i^-, t_i^+], where i = 1 … n is the order in the sequence, such that t_n^+ − t_1^- ≥ (Σ_{i=1…n-1} p_i) − (Σ_{i=1…n-1} |σ_i|), with p_i the processing time of the i-th (M)TVI in the schedule.

Sensor Scheduling – Tractability (Cont’d) Proof: For the schedule to be feasible, the following must hold: t_1^- + p_1 ≤ t_2^+, t_2^- + p_2 ≤ t_3^+, …, t_{n-1}^- + p_{n-1} ≤ t_n^+. Summing these inequalities gives t_n^+ − t_1^- ≥ (Σ_{i=1…n-1} p_i) − (Σ_{i=1…n-1} |σ_i|). It is also clear that this is a sufficient condition to check for the feasibility of a schedule. ∎

Sensor Scheduling – Tractability (Cont’d) Corollary 1: Define a new operator ≼ such that if σ_1 (= [t_{σ_1}^-, t_{σ_1}^+]) ≼ σ_2 (= [t_{σ_2}^-, t_{σ_2}^+]), then t_{σ_1}^- + p_1 ≤ t_{σ_2}^+. Consider a schedule of (M)TVIs with slacks σ_{i=1…n}. If σ_1 ≼ σ_2, σ_2 ≼ σ_3, …, σ_{n-1} ≼ σ_n, then the schedule is feasible. The converse is also true. Proof: Follows easily from Theorem 3. ∎
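Corollary 1 gives a linear-time feasibility check for a candidate sequence; a sketch, with each scheduled (M)TVI given as (slack_start, slack_end, processing_time):

```python
def is_feasible(schedule):
    """True iff consecutive slacks satisfy sigma_i 'precedes' sigma_{i+1} (Corollary 1),
    i.e. slack_start_i + p_i <= slack_end_{i+1} for every consecutive pair."""
    for (s1, _e1, p1), (_s2, e2, _p2) in zip(schedule, schedule[1:]):
        if s1 + p1 > e2:
            return False
    return True
```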

Sensor Scheduling – Single camera Forming the best schedule for each camera: 1. Based on Theorem 2, sort the (M)TVIs by the corresponding slacks’ start times. 2. Form a DAG (Directed Acyclic Graph): each (M)TVI is a node. Define a source node with outgoing edges to all nodes; the weight of each such edge is the number of tasks covered by the node.

Sensor Scheduling – Single camera (Cont’d) Define a sink node with incoming edges from all nodes except the source; let their weights be 0. A directed edge exists between two nodes, with slacks σ_1 and σ_2 respectively, if σ_1 ≼ σ_2. The weights of these edges are determined on the fly during the DP in step 3, based on the backtracked path.

Sensor Scheduling – Single camera (Cont’d) 3. Run DP on the DAG to find the optimal path. Based on Corollary 1, such a path is a feasible schedule. Trick of the trade: make sure that |σ_max| < p_min; this is typically easy to ensure in practice (Theorem 2).
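A sketch of steps 1-3 for one camera; each (M)TVI is given as (slack_start, slack_end, processing_time, task_set), and edge weights are realized as the number of newly covered tasks along the backtracked path, a simplified stand-in for the full DP table:

```python
def best_single_camera_schedule(mtvis):
    """DP over the slack-ordered DAG of (M)TVIs (illustrative sketch).

    mtvis: list of (slack_start, slack_end, proc_time, task_set) tuples.
    Returns (covered_tasks, schedule) maximizing distinct task coverage.
    """
    if not mtvis:
        return set(), []
    nodes = sorted(mtvis, key=lambda m: m[0])          # Theorem 2: order by slack start time
    best = []                                          # best[i] = (covered tasks, path) ending at node i
    for i, (s_i, e_i, p_i, tasks_i) in enumerate(nodes):
        entry = (set(tasks_i), [i])                    # reached directly from the source node
        for j in range(i):                             # predecessors j with sigma_j "preceding" sigma_i
            s_j, _, p_j, _ = nodes[j]
            if s_j + p_j <= e_i:                       # Corollary 1 feasibility of the edge j -> i
                cand = best[j][0] | set(tasks_i)
                if len(cand) > len(entry[0]):
                    entry = (cand, best[j][1] + [i])
        best.append(entry)
    covered, path = max(best, key=lambda b: len(b[0]))
    return covered, [nodes[i] for i in path]
```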

Sensor Scheduling – Single camera (Cont’d) Example: consider the following set of (M)TVIs, represented by the tasks they accomplish and sorted in order of their slacks’ start times: {{T_1, T_2}, {T_2, T_3}, {T_3, T_4}, {T_5, T_6}}, corresponding to nodes 1 to 4.

Sensor Scheduling – Single camera (Cont’d) Resulting DAG (source s, nodes 1-4, sink t).

Sensor Scheduling – Single camera (Cont’d) Dynamic programming: best path = {s, 1, 3, 4, t}. The layout of the DAG stays the same by ensuring |σ_max| < p_min. [DP table: distance from sink for each node, with the accumulated task sets up to {T_1, …, T_6} along the backtracked paths.]

Sensor Scheduling – Multiple Cameras Single-camera scheduling is made tractable by keeping |σ_max| < p_min. Extension to multi-camera scheduling: –One (source, sink) pair per camera. –Connecting cameras: an outgoing edge from one camera’s sink to the source of another camera. –Outgoing edges from all camera sinks to a final sink node.

Sensor Scheduling – Multiple Cameras (Cont’d) Three-camera example: C_1: {{T_1,T_2}, {T_3,T_4}, {T_5,T_6}}, C_2: {{T_1,T_2}, {T_3,T_4}, {T_7,T_8}}, and C_3: {{T_1,T_2}, {T_9,T_10}, {T_11,T_12}}, corresponding to nodes 1 to 9.

Sensor Scheduling – Multiple Cameras (Cont’d) Resulting DAG (sources s_1, s_2, s_3 and sinks t_1, t_2, t_3 for the three cameras, plus a final sink t).

The DP table gives the schedule {s_1, n_3, t_1, s_2, n_5, n_6, t_2, s_3, n_7, n_8, n_9, t_3, t}. [DP table: distance from sink for each node, with accumulated task sets up to {T_1, …, T_12}.]

Experiments 1. Constructing (M)TVIs. 2. Application demonstration: background subtraction, 3D estimation (height and width) → ellipsoid, and analysis of object visibility, camera settings, and the schedule. 3. Heavy occlusion: Anurag Mittal, Larry S. Davis, “M2Tracker: A Multi-View Approach to Segmenting and Tracking People in a Cluttered Scene Using Region-based Stereo”, 7th European Conference on Computer Vision, Copenhagen, Denmark, June 2002. 4. A flavor of zoom control.

Experiments (1a)

Experiments (1b)

Experiments (2)

Experiments (3)

Experiments (4) – Just a Flavor

Conclusions (Part II) 1. Determining TVIs: occlusion analysis, camera settings. 2. Determining MTVIs: plane-sweep inference engine. 3. Making the scheduling problem tractable: single camera → ordering by slacks’ start times, source-sink DAG; multiple cameras → multiple sources and sinks.

A Scalable Closed-form Solution for Multi-camera Visibility Planning PhD Proposal Part III Ser-Nam Lim

Summary Details are deferred to the proposal write-up. Basically: –Express trajectories in closed form (continuous time). –Plane sweep. –E.g., simulations show that for more than 3-4 objects and 4 cameras, the plane sweep is already more efficient; this is attractive for huge camera networks. –Circles represent object sizes and positional uncertainties, plus probabilistic prediction.

Experiment

Future Work PhD Proposal Part IV Ser-Nam Lim

Extensions 1. Binocular + monocular background subtraction. 2. Dynamic background subtraction. 3. Keck Lab four-camera system. 4. Large camera network calibration. 5. Do sensing strategies play a role in tracking?

Q & A Thank you.