Presentation on theme: "Aditya Mavlankar and David Varodayan" — Presentation transcript:

1 Aditya Mavlankar and David Varodayan
Region-of-Interest Prediction for Interactively Streaming Regions of High Resolution Video. Good afternoon! I am Aditya Mavlankar and my project partner is David Varodayan. The title of our project is .... The motivation behind our project is that high-resolution videos will be more broadly available in the near future. However, there are challenges in delivering this high-resolution content to the client: the client might have a low-resolution display panel or only a low bit-rate for communication. Aditya Mavlankar and David Varodayan, EE392J Class Project (Winter 2007), Stanford University

2 Outline
Demo
Problem description
Tracking mode: motion vectors, feature tracker
Manual mode: autoregressive moving average predictor
Conclusions

3 Demo We plan to overcome this challenge by making the system interactive. What I mean by interactive should be clear from this demo. The client's screen is divided into two parts: the overview display and the ROI display. The zoom factor is changed with the scroll wheel of the mouse, and the ROI can be moved around by keeping the left mouse button pressed and moving the mouse.

4 Parts of the Client’s Display
Overview display area; ROI display area. Just to define the terminology: we call this the overview display and we call this the ROI display. The location of the ROI is shown by overlaying a rectangle on the overview video. You might have noticed that the size and color of the rectangle vary according to the zoom factor.

5 Region-of-Interest Trajectory
[Slide figure: ROI rectangles shown at several zoom factors.] The original video is available in multiple resolutions, the largest of which is the highest resolution. We define the ROI trajectory as the path over which the ROI moves. Let's say these are the various resolutions or zoom factors possible; the ROI is allowed to move about within any of these resolutions, e.g. as in this animation.

6 Streaming over a Realistic Network
[Slide diagram: the server sends overview video pixels and ROI pixels to the client, the client sends ROI location info back, and the network introduces delay, delay jitter and packet loss.] Note that we plan to design the system so that it works over a realistic network. Notice that the loop of interaction is closed for the ROI display. For the overview video, the server knows beforehand exactly which pixels are to be streamed; for the ROI, these pixels are decided by the user on the fly. Our goal is to render the ROI immediately, or at most after one frame-interval, despite the delay and delay jitter introduced by the network. One way is to predict the user's trajectory beforehand at the client's end and request the pixels in advance to make sure that they arrive in time, as sketched below. Pro-active pre-fetching of ROI pixels adapts to a changing ROI; a start-up delay and buffering combat delay jitter.
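To make this closed loop concrete, here is a minimal sketch of such a pre-fetch loop. The client object and all of its methods (predict_roi, request_roi_pixels, and so on) are hypothetical names introduced only for illustration; they are not part of the actual system.

```python
# Minimal sketch of pro-active ROI pre-fetching (all names hypothetical).
# While frame n is shown, the client predicts the ROI for frame n + d and
# requests those pixels early enough to absorb network delay and jitter.

def prefetch_loop(client, d):
    """d: look-ahead in frame-intervals, chosen to cover the round-trip delay."""
    while client.is_streaming():
        n = client.current_frame_index()                 # frame displayed right now
        predicted_roi = client.predict_roi(n + d)        # predicted ROI location and zoom
        client.request_roi_pixels(n + d, predicted_roi)  # ask the server in advance
        client.wait_for_next_frame_interval()            # overview video keeps playing
```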

7 Prior Work vs. Our Approach
Prior work extrapolates mouse moves to predict the user's ROI trajectory d frame-intervals in advance: Ramanathan et al. (interactive streaming of light fields) and Kurutepe et al. (interactive 3DTV). These approaches are agnostic of the actual video content. We plan to process the frames of the overview video present in the client's buffer to better predict the user's ROI; tracking objects becomes possible by processing the future frames of the overview video already in the buffer. People have designed interactive systems in the past: Ramanathan, who graduated from our group, and Kurutepe, from Murat Tekalp's group. In their work they extrapolate the mouse moves: Ramanathan uses a simple ARMA filter, whereas Kurutepe uses a more advanced linear filter, namely the Kalman filter. However, these approaches have been agnostic of the actual video content. We plan to process the buffered overview frames instead. You might remember that the overview video is buffered at the client's side and hence a few future frames are already available.

8 Modes of Operation Overview video frames available in the client’s buffer ahead of time are used in both modes. Manual mode: The user continues to indicate his desired ROI. Predict ROI ahead of time and pre-fetch data. Tracking mode: The user indicates a desired object by clicking on it. Track the object ahead of time and pre-fetch data until the mode is switched off.

9 Timeline
[Timeline diagram over frame indices n, n+d, n+d+a-b+1 and n+d+a: frame n is currently displayed, ROI data for frame n+d is currently pre-fetched, and the b overview frames n+d+a-b+1 through n+d+a are currently available in the client's buffer and used for ROI prediction and pre-fetching.]

10 Motion Vectors Included in the compressed bitstream sent by the server
Used in the tracking mode: the user indicates a point on the desired object; this point is propagated into future frames by following the motion vectors, and the ROI is pre-fetched accordingly, as in the sketch below.
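A minimal sketch of this propagation step. It assumes the decoder exposes one backward motion vector per block of each future frame; the data layout, block size and function names are illustrative assumptions, not the project's exact scheme.

```python
# Sketch of propagating a user-selected point through decoder motion vectors.
# Assumption: mv_field[by][bx] holds the backward motion vector (mvx, mvy) of
# the block at block row by, block column bx of the *next* frame, pointing to
# its reference position in the current frame, as in block-based codecs.

def propagate_point(point, mv_field, block_size=16):
    """Move point (x, y) forward by one frame by following the co-located
    block's motion vector (assumes locally smooth motion)."""
    x, y = point
    bx, by = int(x) // block_size, int(y) // block_size
    mvx, mvy = mv_field[by][bx]
    return (x - mvx, y - mvy)    # negated backward MV gives the forward displacement

def track_clicked_point(point, mv_fields_for_future_frames):
    """Propagate the clicked point through all buffered future frames and
    return the predicted trajectory used to pre-fetch the ROI."""
    trajectory = [point]
    for mv_field in mv_fields_for_future_frames:
        point = propagate_point(point, mv_field)
        trajectory.append(point)
    return trajectory
```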

11 Demonstration of Tracking using Motion Vectors
Videos shown: aditya_tracked_sequence_tt_hood_algo_1.avi, aditya_tracked_sequence_tt_hood_v1.avi, aditya_tracked_sequence_tt_wheel_v1.avi, aditya_tracked_sequence_tt_equipment_v1.avi, aditya_tracked_sequence_sf_bee_v1.avi, aditya_tracked_sequence_cg_baldguy_v1.avi

12 Kanade-Lucas-Tomasi Feature Tracking
Performs Lucas-Kanade optical flow estimation for a limited number of the most easy-to-track feature windows.
Applies an iterative Newton-Raphson method over a multiresolution pyramid to minimize the feature window residuals.
Easy-to-track feature windows are defined by local gradient matrices with large eigenvalues.
An open implementation is available; an illustrative sketch follows.
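As a hedged illustration of these same ideas (not the open implementation used in the project), OpenCV's Shi-Tomasi feature selection and pyramidal Lucas-Kanade tracker can be combined as follows; the frame filenames are placeholders.

```python
import cv2

# Illustrative only: OpenCV's Shi-Tomasi detector and pyramidal Lucas-Kanade
# tracker follow the same recipe (feature windows with large gradient-matrix
# eigenvalues, iterative refinement over a multiresolution pyramid). The frame
# filenames are placeholders.

prev_gray = cv2.cvtColor(cv2.imread("frame_000.png"), cv2.COLOR_BGR2GRAY)
next_gray = cv2.cvtColor(cv2.imread("frame_001.png"), cv2.COLOR_BGR2GRAY)

# Select a limited number of easy-to-track feature windows.
features = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                                   qualityLevel=0.01, minDistance=10)

# Pyramidal Lucas-Kanade: iterative (Newton-Raphson-style) minimization of the
# feature window residual over maxLevel + 1 pyramid levels.
new_features, status, err = cv2.calcOpticalFlowPyrLK(
    prev_gray, next_gray, features, None,
    winSize=(21, 21), maxLevel=3,
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

tracked = new_features[status.flatten() == 1]   # keep successfully tracked points
```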

13 Example: Tractor Low-Resolution Sequence

14 Example: Sunflower Low-Resolution Sequence

15 Demonstration of Tracking using KLT Tracker
Videos shown: david_tt1_center_cinepak.avi david_tt1_stabilize_cinepak.avi david_tt1_blend_cinepak.avi david_sf1_blend_cinepak.avi david_cg1_blend_cinepak.avi

16 Manual Mode Differences with respect to tracking mode
Always show the user the requested trajectory, using concealment from the low resolution if necessary.
The user continues to provide trajectory input.
We are interested in the accuracy of the trajectory prediction, not its smoothness.
We measure the distortion incurred when the low-resolution video must be upsampled to conceal prediction mishits.
Trajectory prediction approaches:
Autoregressive moving average (ARMA) model: extrapolates the trajectory based on the estimated velocity (see the sketch after this list).
Kanade-Lucas-Tomasi feature tracker: follows the feature nearest the ROI center into future frames.
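A minimal sketch of the velocity-based extrapolation behind the ARMA approach, assuming the predictor only sees the ROI centers the user has indicated so far; the exponential smoothing weight and the function name are assumptions, not the project's exact model.

```python
# Sketch of a velocity-based (ARMA-style) trajectory extrapolator. The
# smoothing weight alpha and the function name are illustrative assumptions.

def predict_roi_center(history, d, alpha=0.5):
    """history: ROI centers (x, y) indicated by the user so far, newest last.
    Returns the predicted center d frame-intervals into the future."""
    if len(history) < 2:
        return history[-1]                    # not enough data to estimate velocity
    vx, vy = 0.0, 0.0
    # Exponentially smoothed velocity over the observed mouse trajectory.
    for (x0, y0), (x1, y1) in zip(history[:-1], history[1:]):
        vx = alpha * (x1 - x0) + (1.0 - alpha) * vx
        vy = alpha * (y1 - y0) + (1.0 - alpha) * vy
    x, y = history[-1]
    return (x + d * vx, y + d * vy)           # constant-velocity extrapolation

# Example: predict 5 frame-intervals ahead from the last few observed centers.
# predict_roi_center([(100, 80), (104, 82), (108, 84)], d=5)
```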

17 Distortion for Tractor Sequence (trajectory 1)
[Plot: distortion for the Tractor sequence, trajectory 1, comparing the KLT feature tracker and the ARMA predictor.]

18 Distortion for Tractor Sequence (trajectory 2)
[Plot: distortion for the Tractor sequence, trajectory 2, comparing the ARMA predictor and the KLT feature tracker.]

19 Conclusions For interactive streaming, pre-fetching data is crucial to render the ROI with acceptable latency and quality. Tracking mode: tracking objects by processing the overview video frames present in the client's buffer is feasible and gives good results; it relieves the user of indicating the ROI explicitly while the mode is switched on. Manual mode: trajectory prediction by processing the buffered video can reduce distortion when the content and the user's trajectory are jerky.

20 The End

