A Robust Abstraction for First-Person Video Streaming: Techniques, Applications, and Experiments. Neil J. McCurdy, William G. Griswold, Leslie A. Lenert. Department of Computer Science and Engineering, University of California, San Diego.


A Robust Abstraction for First-Person Video Streaming: Techniques, Applications, and Experiments Neil J. McCurdy William G. Griswold Leslie A. Lenert Department of Computer Science and Engineering University of California, San Diego

2 Why stream first-person video? Remote vision at dangerous job sites –Disaster Response –Hazmat –SWAT Live streams for remote loved ones –My-day live diaries Citizen reporting –Cell-phone cameras broadcasting news-worthy events –Think YouTube, but live –No tripods, no expert camera work

8 Challenges of first-person video
Limited bandwidth “in the wild”
–Cellular networks (60-80 Kbps)
–Multiple cameras sharing the network drop per-camera throughput
First-person video compression is difficult
–Low inter-frame overlap reduces compression opportunities
–Must either reduce frame rate or image quality
–Low frame-rate video is disorienting. How do the frames relate to one another?
Aesthetic challenges
–Blair Witch-type nausea
–Constant motion is difficult to track
–Camera operator’s interests may not intersect the viewer’s interests

9 RealityFlythrough (RFT): A novel solution
What we do
–Reduce frame rate
–Approximately reconstruct camera motion using sensors and image processing
Benefits
–High-quality frames
–Disorientation minimized
–Long dwell time on each frame
–Aesthetically appealing: calm, mesmerizing

11 Roadmap Introduction Video compression challenges How RealityFlythrough works Experimental results Conclusion

12 Video compression challenges revisited
High-panning video has little redundancy between frames
–Most codecs do little better than MJPEG
–e.g. sizes of different encodings of the 1st clip: mpeg4: 364 KB, mjpeg: 359 KB
Of course, with redundancy, mpeg4 improves
–For the 2nd clip: mpeg4: 284 KB, mjpeg: 386 KB
Decimating the frame rate to preserve image quality further reduces temporal redundancy, forcing still more decimation
–Causes confusion and disorientation
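The redundancy point on this slide can be seen with a toy experiment: delta-coding consecutive "frames" only pays off when they overlap. A minimal sketch with synthetic byte-string frames and zlib standing in for a real codec (not the encoders measured in the paper):

```python
import os
import zlib

def delta_size(prev, cur):
    """Compressed size of the inter-frame delta -- a crude stand-in
    for the cost of a predicted (P) frame."""
    delta = bytes(p ^ c for p, c in zip(prev, cur))
    return len(zlib.compress(delta, 9))

frame = os.urandom(4096)                    # an incompressible "image"
similar = frame[:4000] + os.urandom(96)     # ~98% overlap: slow pan
unrelated = os.urandom(4096)                # no overlap: fast first-person pan

# With overlap the delta is mostly zeros and compresses well; without
# overlap the delta is as random as the frame itself, so inter-frame
# coding does "little better than MJPEG", as the slide observes.
print(delta_size(frame, similar))    # small
print(delta_size(frame, unrelated))  # roughly the full frame size
```

The same effect, at scale, is why mpeg4 beats MJPEG on the 2nd clip but not the 1st.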

15 RFT System Architecture
Cameras: ImageCapture, SensorCapture, StreamCombine (352x288 video resolution)
Link: H.323 video conferencing stream over 1xEVDO cellular (~60 Kbps)
RFT Server: RFT MCU (Multipoint Control Unit) and RFT Engine
How RFT Works

16 Simplifying 3d space
We know the orientation of each frame
We project the camera’s image onto a virtual wall at that same orientation
When the user’s orientation is the same as the camera’s, the entire screen is filled with the image
Results in a 2d simplification of 3d space
How RFT Works
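The virtual-wall idea can be sketched with a yaw-only pinhole projection (illustrative geometry, not the actual RFT renderer): the wall carries the image at the camera's heading, and when the viewer's heading matches, the wall's edges land exactly at the screen edges.

```python
import math

def project_corner_x(x, z, rel_yaw, f=1.0):
    """Rotate a wall corner by the viewer's heading relative to the wall,
    then apply a pinhole perspective divide (horizontal coordinate only)."""
    xr = x * math.cos(rel_yaw) - z * math.sin(rel_yaw)
    zr = x * math.sin(rel_yaw) + z * math.cos(rel_yaw)
    return f * xr / zr

# A wall 1 unit away spanning x in [-1, 1] (a 90-degree field of view).
# Viewer aligned with the camera: corners land at screen edges -1 and +1,
# so the image fills the screen, as the slide describes.
print(project_corner_x(-1.0, 1.0, 0.0), project_corner_x(1.0, 1.0, 0.0))
# A turned viewer sees the wall obliquely: corners shift off the edges.
print(project_corner_x(1.0, 1.0, 0.2))
```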

17 The transition
A transition between frames is achieved by moving the user’s viewpoint from the orientation of the source frame to that of the destination frame
The virtual walls are shown in perspective
Overlapping portions of images are alpha-blended
How RFT Works
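The two ingredients of a transition can be sketched in a few lines: interpolate the viewer's heading from source to destination, and cross-fade the overlap. A minimal sketch (simplified to one rotation axis and one pixel; not the RFT renderer):

```python
import math

def interp_heading(src, dst, t):
    """Move the heading along the shortest arc from the source frame's
    orientation (t=0) to the destination's (t=1), in radians."""
    diff = math.atan2(math.sin(dst - src), math.cos(dst - src))
    return src + t * diff

def blend_pixel(src_px, dst_px, t):
    """Linear cross-fade for the overlapping region of the two images."""
    return tuple(round((1 - t) * s + t * d) for s, d in zip(src_px, dst_px))

# Halfway through a transition the heading is midway between the frames
# and the overlap shows an even mix of the two images.
print(interp_heading(0.0, 1.0, 0.5))
print(blend_pixel((0, 0, 0), (200, 100, 50), 0.5))
```

The `atan2` form handles the wrap-around at 2π, so a transition never takes the long way around the sphere.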

18 Images are projected inside a sphere How RFT Works

20 Point matching improves experience
If frames overlap, point matching allows for more accurate placement
–Uses the SIFT method [Lowe, 2004]; autopano implementation
–Client device computes the match and transmits meta-data with the frame
2d morphing between frames improves the blend
Works with inter-frame and inter-camera matches
How RFT Works
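The acceptance criterion behind SIFT matching, Lowe's ratio test, is easy to sketch in pure Python. This is an illustrative nearest-neighbour matcher over toy 2-D descriptors, not the autopano code the system uses:

```python
def ratio_test_match(desc, candidates, ratio=0.8):
    """Return the index of the best-matching descriptor, or None if the
    best match is not clearly better than the runner-up (Lowe's ratio test)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(desc, c)), i)
        for i, c in enumerate(candidates)
    )
    (best_d, best_i), (second_d, _) = dists[0], dists[1]
    # We compare squared distances, so the ratio threshold is squared too.
    return best_i if best_d < (ratio ** 2) * second_d else None

features = [(0.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
print(ratio_test_match((0.0, 1.0), features))    # unambiguous match
print(ratio_test_match((10.0, 10.5), features))  # ambiguous: rejected
```

Rejecting ambiguous matches is what keeps a mis-matched point from warping a frame into the wrong place.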

21 Point matching meets sensors
New point-matched frames join the panorama
The panorama consists of the 5 most recent frames (older ones are discarded)
A new panorama is started when a non-point-matched frame arrives. Sensor data positions the frame.
How RFT Works
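The bookkeeping described on this slide amounts to a small sliding window with a reset rule. A sketch (illustrative data structure, not RFT's actual implementation):

```python
from collections import deque

class PanoramaBuffer:
    """Keep the 5 most recent point-matched frames; a frame that fails
    point matching clears the panorama and starts a new one, positioned
    by sensor data alone."""

    def __init__(self, max_frames=5):
        self.frames = deque(maxlen=max_frames)  # oldest frames fall off

    def add(self, frame, point_matched):
        if not point_matched:
            self.frames.clear()   # start a fresh panorama
        self.frames.append(frame)

pano = PanoramaBuffer()
for i in range(7):
    pano.add(f"frame{i}", point_matched=True)
print(list(pano.frames))                 # only the 5 most recent survive
pano.add("frame7", point_matched=False)  # sensor-positioned frame resets it
print(list(pano.frames))
```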

22 Field study
Experimental setup
Hazmat bulking process
–Workers wore full hazmat suits
–Labor-intensive
–Accurate motion model for the head-mounted camera
0.5 fps video transmitted over 1xEVDO
Hazmat supervisor used the video to explain the bulking process
Results
Ran for 64 minutes
Much more camera motion than expected
Supervisor preferred transitions over other encoding techniques
–Not because of frame quality
–Traditional first-person video was too busy (“It interferes with my thinking. Literally, it’s messing with my head”)
–1 fps “video” without transitions was seen as useless
Experimental results

27 Lab study
Goal: determine whether people may actually prefer transitions to traditional first-person video
Experimental setup
Three first-person videos, each encoded in 4 different ways
–encFast: RFT transitions sampled at 1 fps
–encSlow: RFT transitions sampled at 0.67 fps
–encIdeal: regular video encoded at 11 fps (∞ bitrate)
–encChoppy: regular video encoded at 5 fps (same bitrate as the RFT encodings)
Subjects did side-by-side comparisons and ranked the encodings in order of preference
Subjects answered questions to help them arrive at a task-independent ranking
Experimental results

28 Taking out the trash
(video clips compared side by side: encChoppy, encFast, encIdeal)
Experimental results

31 Results and analysis
12/14 subjects preferred one of our encodings to encChoppy
4/14 subjects preferred our encodings to encIdeal, with 4 more on the fence
Our encodings grew on people (4 subjects ranked our encodings higher at the end of the experiment than at the beginning)
Positives: calm, smooth, slow-motion, sharp, artistic, soft, not-so-dizzy
Negatives: herky-jerky, artificial, makes me feel detached, insecure
Our encodings gave subjects time to catch up with what the camera operator was seeing. First-person video tends to dart around too much.
Experimental results

32 Conclusion
First-person video is difficult to compress
To stream it, we must sacrifice image quality or frame rate
Very low frame-rate video (< 5 fps) is disorienting
Video streamed at a low bitrate (e.g. 60 Kbps) loses both frame rate and image quality and can be painful to watch
Our solution
–Transmit high-quality, low frame-rate (~1 fps) video along with tilt-sensor meta-data
–“Reconstruct” intervening frames by inferring camera motion from the meta-data
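The ~1 fps operating point follows from simple arithmetic: a 60 Kbps link has room for roughly one high-quality CIF JPEG per second. A back-of-the-envelope sketch (the 7 KB frame size and 64-byte metadata overhead are illustrative assumptions, not numbers from the talk):

```python
def sustainable_fps(link_kbps, frame_kbytes, metadata_bytes=64):
    """Frames per second a link can carry for a given JPEG frame size,
    including a small per-frame sensor-metadata overhead."""
    bits_per_frame = frame_kbytes * 1024 * 8 + metadata_bytes * 8
    return link_kbps * 1000 / bits_per_frame

# A ~60 Kbps 1xEVDO link with ~7 KB JPEG frames sustains about 1 fps,
# matching the "high-quality, low frame-rate" operating point above.
print(round(sustainable_fps(60, 7), 2))
```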

37 Other slides

38 Lab study results

39 Why digital instead of analog?
RealityFlythrough piggy-backs on the wireless mesh network deployed by first-responders on-site
Varying network conditions can be better managed in the digital domain: frame rates can be throttled and image quality can be degraded
–Can also guarantee eventual delivery of high-quality data
Supports multiple cameras using the same bandwidth-management techniques
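The throttling described here could be sketched as a tiny policy that trades frame rate for per-frame quality when the measured link degrades. This is an illustrative policy with made-up constants (3 KB quality floor, halving steps), not RFT's actual controller:

```python
def throttle(measured_kbps, target_fps=1.0, min_frame_bytes=3000):
    """Pick a (frame rate, per-frame byte budget) pair from measured
    bandwidth, degrading frame rate only once image quality would drop
    below a usable floor."""
    budget = measured_kbps * 1000 / 8 / target_fps   # bytes per frame
    fps = target_fps
    while budget < min_frame_bytes and fps > 0.25:
        fps /= 2          # halve the frame rate...
        budget *= 2       # ...to double the byte budget per frame
    return fps, int(budget)

print(throttle(60))   # healthy link: full target rate
print(throttle(10))   # starved link: trade frame rate for frame quality
```

Frames encoded below their original quality could later be re-sent at full quality, giving the "eventual delivery of high-quality data" the slide mentions.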

40 Related Work
Panoramic Viewfinder – Baudisch, et al.
Recognizing Panoramas – Brown, Lowe
View Morphing – Seitz and Dyer
Efficient Representations of Video Sequences and their Applications – Irani, et al.
Predictive perceptual compression for real time video communication – Komogortsev, Khan