A Robust Abstraction for First-Person Video Streaming: Techniques, Applications, and Experiments
Neil J. McCurdy, William G. Griswold, Leslie A. Lenert
Department of Computer Science and Engineering
University of California, San Diego

Why stream first-person video?
Remote vision at dangerous job sites
  – Disaster response
  – Hazmat
  – SWAT
Live streams for remote loved ones
  – My-day live diaries
Citizen reporting
  – Cell-phone cameras broadcasting newsworthy events
  – Think YouTube, but live
  – No tripods, no expert camera work

Challenges of first-person video
Limited bandwidth “in the wild”
  – Cellular networks (60-80 Kbps)
  – Multiple cameras sharing an 802.11 network reduce total throughput
First-person video compression is difficult
  – Low inter-frame overlap reduces compression opportunities
  – Must reduce either frame rate or image quality
  – Low frame-rate video is disorienting: how do the frames relate to one another?
Aesthetic challenges
  – Blair Witch-type nausea
  – Constant motion is difficult to track
  – Camera operator’s interests may not intersect the viewer’s interests

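The frame-rate versus image-quality trade-off can be made concrete with a little arithmetic. The sketch below computes the average byte budget per frame on a 60 Kbps link. It assumes 1 Kbps = 1000 bits/s and ignores protocol overhead; the function name and the frame rates chosen are illustrative only.

```python
def bytes_per_frame(bitrate_kbps: float, fps: float) -> float:
    """Average number of bytes available per frame on a link.

    Assumption: 1 Kbps = 1000 bits/s, no protocol overhead.
    """
    if bitrate_kbps <= 0 or fps <= 0:
        raise ValueError("bitrate and frame rate must be positive")
    bits_per_second = bitrate_kbps * 1000
    return bits_per_second / 8 / fps


# Illustrative frame rates on a 60 Kbps cellular link.
for fps in (0.5, 1, 5, 11):
    budget = bytes_per_frame(60, fps)
    print(f"{fps:>4} fps -> {budget:8.0f} bytes per frame")
```

At 1 fps each frame can use about 7.5 KB; at 11 fps the budget falls below 700 bytes per frame, which is why either frame rate or image quality has to give.
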
RealityFlythrough (RFT): A novel solution
What we do:
  – Reduce frame rate
  – Approximately reconstruct camera motion using sensors and image processing
Benefits:
  – High-quality frames
  – Disorientation minimized
  – Long dwell time on each frame
  – Aesthetically appealing: calm, mesmerizing

Roadmap
Introduction
Video compression challenges
How RealityFlythrough works
Experimental results
Conclusion

Video compression challenges revisited
High-panning video has little redundancy between frames
  – Most codecs do little better than MJPEG
  – e.g., sizes of different encodings of the 1st clip: mpg4 364 KB, mjpeg 359 KB
Of course, with redundancy, mpg4 improves
  – For the 2nd clip: mpg4 284 KB, mjpeg 386 KB
Decimating the frame rate to preserve image quality further reduces temporal redundancy, forcing further decimation of the frame rate
  – Causes confusion and disorientation

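To put the two clip measurements side by side, the short sketch below expresses each mpg4 file size as a fraction of the corresponding mjpeg size. The sizes are the ones quoted on the slide; the helper name and clip labels are illustrative.

```python
def relative_size(candidate_kb: float, baseline_kb: float) -> float:
    """Return the candidate size as a fraction of the baseline size."""
    if baseline_kb <= 0:
        raise ValueError("baseline size must be positive")
    return candidate_kb / baseline_kb


# (mpg4 KB, mjpeg KB) as quoted on the slide.
clips = {
    "1st clip (high panning)": (364, 359),
    "2nd clip (more redundancy)": (284, 386),
}

for name, (mpg4_kb, mjpeg_kb) in clips.items():
    ratio = relative_size(mpg4_kb, mjpeg_kb)
    print(f"{name}: mpg4 is {ratio:.1%} of the mjpeg size")
```

For the first clip mpg4 comes out about 1.4% larger than mjpeg; for the second it is roughly 26% smaller.
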
[Section: How RFT Works]
RFT System Architecture
[Diagram labels: Cameras; ImageCapture; SensorCapture; StreamCombine (352x288 video resolution); 802.11; 1xEVDO cellular (~60 Kbps); H.323 video conferencing stream; RFT Server; RFT MCU (Multipoint Control Unit); RFT Engine]

Simplifying 3D space
We know the orientation of each frame
We project the camera’s image onto a virtual wall at that same orientation
When the user’s orientation is the same as the camera’s, the entire screen is filled with the image
Results in a 2D simplification of 3D space

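A minimal sketch of the wall-placement idea, not the RFT implementation: a frame's orientation is turned into a unit direction, and the virtual wall is centred along that direction at a fixed distance. The axis convention (x east, y up, z north), the yaw/pitch parameterisation, and all function names are assumptions made for illustration.

```python
import math


def view_direction(yaw_deg: float, pitch_deg: float) -> tuple:
    """Unit vector for a heading (yaw) and tilt (pitch), both in degrees.

    Assumed convention: yaw 0 looks north (+z), positive yaw turns east (+x),
    positive pitch looks up (+y).
    """
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


def wall_center(yaw_deg: float, pitch_deg: float, distance: float = 1.0) -> tuple:
    """Centre of the virtual wall that carries a frame with this orientation."""
    return tuple(distance * c for c in view_direction(yaw_deg, pitch_deg))


def angle_between_deg(a: tuple, b: tuple) -> float:
    """Angle in degrees between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))


camera = view_direction(30, 10)
user = view_direction(30, 10)
print(wall_center(30, 10))
print(angle_between_deg(camera, user))  # near zero: the user faces the wall head-on
```

When the angle between the user's view and the camera's view is zero, the wall is seen head-on, which corresponds to the image filling the screen.
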
The transition
A transition between frames is achieved by moving the user’s orientation from the point of view of the source frame to the point of view of the destination frame
The virtual walls are shown in perspective
Overlapping portions of images are alpha-blended

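The transition can be sketched as an interpolation of the viewer's orientation plus a cross-fade where images overlap. The slides do not state the interpolation schedule or the blend weights, so the linear interpolation, the linear alpha ramp, and the function names below are assumptions.

```python
def shortest_angle_delta(start_deg: float, end_deg: float) -> float:
    """Signed smallest rotation from start to end, in [-180, 180)."""
    return (end_deg - start_deg + 180.0) % 360.0 - 180.0


def interpolate_orientation(src: tuple, dst: tuple, t: float) -> tuple:
    """Viewer (yaw, pitch) in degrees at transition progress t in [0, 1]."""
    yaw = (src[0] + t * shortest_angle_delta(src[0], dst[0])) % 360.0
    pitch = src[1] + t * (dst[1] - src[1])
    return yaw, pitch


def alpha_blend(src_pixel: tuple, dst_pixel: tuple, t: float) -> tuple:
    """Cross-fade two RGB pixels; t = 0 shows the source, t = 1 the destination."""
    return tuple(round((1 - t) * s + t * d) for s, d in zip(src_pixel, dst_pixel))


# Sweep from a frame at yaw 350 to a frame at yaw 10, crossing north.
for step in range(5):
    t = step / 4
    print(t, interpolate_orientation((350, 0), (10, 10), t))
```

Taking the shortest arc matters here: going from yaw 350 to yaw 10 should turn 20 degrees through north, not 340 degrees the other way.
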
Images are projected inside a sphere

Point matching improves experience
If frames overlap, point matching allows for more accurate placement
  – Use the SIFT method [Lowe, 2004]; autopano implementation
  – Client device computes the match and transmits metadata with the frame
2D morphing between frames improves the blend
Works both inter-frame and inter-camera

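To illustrate how matched points can refine placement, the sketch below estimates a 2D offset between two overlapping frames from already-matched point pairs. This is a deliberate simplification: it assumes a pure translation and uses a median to tolerate a bad match. It does not reproduce SIFT or autopano, and the function name and sample coordinates are hypothetical.

```python
from statistics import median


def estimate_offset(matches: list) -> tuple:
    """Estimate (dx, dy) that aligns the new frame with the previous frame.

    matches: list of ((x_prev, y_prev), (x_new, y_new)) pairs, each pair
    being the same scene point located in both frames.
    """
    if not matches:
        raise ValueError("at least one matched pair is required")
    dx = median(prev[0] - new[0] for prev, new in matches)
    dy = median(prev[1] - new[1] for prev, new in matches)
    return dx, dy


# Four consistent matches and one outlier (made-up coordinates).
sample = [
    ((100, 50), (60, 52)),
    ((200, 80), (161, 82)),
    ((150, 120), (110, 121)),
    ((90, 30), (49, 33)),
    ((10, 10), (300, 200)),  # bad match
]
print(estimate_offset(sample))
```

The median keeps the single bad match from dragging the estimate away from the offset the other four pairs agree on.
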
Point matching meets sensors
New point-matched frames join the panorama
The panorama consists of the 5 most recent frames (older ones are discarded)
A new panorama is started when a non-point-matched frame arrives; sensor data positions the frame

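The panorama rules on this slide can be sketched with a bounded queue. The class and field names are hypothetical, and treating the very first frame as sensor-placed is an assumption; only the five-frame limit and the restart rule come from the slide.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    frame_id: int
    yaw_deg: float       # from the orientation sensor
    pitch_deg: float     # from the tilt sensor
    point_matched: bool  # True if a point match to the previous frame was found


class PanoramaTracker:
    """Keeps the current panorama: at most the 5 most recent frames."""

    MAX_FRAMES = 5

    def __init__(self) -> None:
        self.frames = deque(maxlen=self.MAX_FRAMES)

    def add(self, frame: Frame) -> str:
        """Add a frame and report how it was placed."""
        if frame.point_matched and self.frames:
            # Joins the current panorama; deque drops the oldest beyond 5.
            self.frames.append(frame)
            return "point-match"
        # No usable match: start a new panorama positioned by sensor data.
        self.frames.clear()
        self.frames.append(frame)
        return "sensor"


tracker = PanoramaTracker()
for i in range(7):
    tracker.add(Frame(i, yaw_deg=10.0 * i, pitch_deg=0.0, point_matched=True))
print([f.frame_id for f in tracker.frames])
print(tracker.add(Frame(7, 200.0, 0.0, point_matched=False)))
print([f.frame_id for f in tracker.frames])
```
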
[Section: Experimental results]
Field study
Experimental setup:
Hazmat bulking process
  – Wore full hazmat suits
  – Labor-intensive
  – Accurate motion model for head-mounted camera
0.5 fps video transmitted over 1xEVDO
Hazmat supervisor used video to explain the bulking process
Results:
Ran for 64 minutes
Much more camera motion than expected
Supervisor preferred transitions over other encoding techniques
  – Not because of frame quality
  – Traditional first-person video was too busy (“It interferes with my thinking. Literally, it’s messing with my head”)
  – 1 fps “video” without transitions was seen as useless

Lab study
Determine if people may actually prefer transitions to traditional first-person video
Experimental setup:
Three first-person videos encoded in 4 different ways
  – encFast: RFT transitions sampled at 1 fps
  – encSlow: RFT transitions sampled at 0.67 fps
  – encIdeal: Regular video encoded at 11 fps (∞ bitrate)
  – encChoppy: Regular video encoded at 5 fps
  – [Slide callout beside the encodings: “Same bitrate”]
Subjects did side-by-side comparisons and ranked encodings in order of preference
Subjects answered questions to help them arrive at a task-independent ranking

Taking out the trash
[Encodings shown: encChoppy, encFast, encIdeal]

Results and analysis
12/14 subjects preferred one of our encodings to encChoppy
4/14 subjects preferred our encodings to encIdeal, with 4 more on the fence!
Our encodings grew on people (4 people ranked our encodings higher at the end of the experiment than at the beginning)
Positives: calm, smooth, slow-motion, sharp, artistic, soft, not-so-dizzy
Negatives: herky-jerky, artificial, makes me feel detached, insecure
Our encodings gave subjects time to catch up with what the camera operator was seeing. First-person video tends to dart around too much.

Conclusion
First-person video is difficult to compress
To stream it, we must sacrifice image quality or frame rate
Very low frame-rate video (< 5 fps) is disorienting
Video streamed at a low bitrate (e.g., 60 Kbps) loses both frame rate and image quality and can be painful to watch
Our solution
  – Transmit high-quality, low frame-rate (~1 fps) video along with tilt-sensor metadata
  – “Reconstruct” intervening frames by inferring camera motion from the metadata
[Example labels on slide: Low overlap, Low frame-rate, Low quality]

http://www.realityflythrough.com
nemccurd@cs.ucsd.edu

Other slides

Lab study results

Why digital instead of analog?
RealityFlythrough piggybacks on the wireless mesh network that first responders deploy on-site
Varying network conditions can be better managed in the digital domain: frame rates can be throttled and image quality can be degraded
  – Can also guarantee eventual delivery of high-quality data
Supports multiple cameras using the same bandwidth-management techniques

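One way to picture frame-rate throttling across several cameras is the sketch below. It is not the RFT policy: the equal split of throughput, the fixed frame size, the 1 fps cap, and the function name are all assumptions chosen to keep the example small.

```python
def sustainable_fps(total_kbps: float, cameras: int, frame_bytes: float,
                    max_fps: float = 1.0) -> float:
    """Frame rate each camera can sustain when the link is shared equally.

    Assumptions: 1 Kbps = 1000 bits/s, equal share per camera, fixed frame size.
    """
    if total_kbps <= 0 or cameras <= 0 or frame_bytes <= 0:
        raise ValueError("all inputs must be positive")
    per_camera_bytes_per_s = total_kbps * 1000 / 8 / cameras
    return min(max_fps, per_camera_bytes_per_s / frame_bytes)


# Throttle as more cameras share a 60 Kbps link with 7500-byte frames.
for n in (1, 2, 3):
    print(n, "camera(s):", round(sustainable_fps(60, n, 7500), 2), "fps each")
```

Holding frame size fixed preserves image quality and lets the frame rate absorb the change in available bandwidth; a real policy could also lower quality instead.
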
Related Work
Panoramic Viewfinder
  – Baudisch, et al.
Recognizing Panoramas
  – Brown, Lowe
View Morphing
  – Seitz and Dyer
Efficient Representations of Video Sequences and their Applications
  – Irani, et al.
Predictive perceptual compression for real time video communication
  – Komogortsev, Khan