Published by Cornelia Hamilton. Modified over 9 years ago.
1
MULTIMEDIA TECHNOLOGIES FOR ENHANCED TELE-COLLABORATION
Zhengyou Zhang, Principal Researcher
Communications and Collaboration Systems R
zhang@microsoft.com
2
Motivation
Mission: research and develop new technologies that improve users' experiences when collaborating across distances.
Ultimate goal: give users immersive remote-collaboration experiences similar to face-to-face meetings.
Focus: audio, video, and data.
3
Outline
1. Improve audio and video capture
2. Improve the captured audio and video
3. Enable novel scenarios
4
Distributed Meetings
RoundTable
Novel AV capture devices
5
Motivation
Traveling to meetings is time-consuming, expensive, and stressful.
It is difficult to get from one back-to-back meeting to the next.
If you miss a meeting, there is no good, inexpensive way to capture and view it.
6
Main Scenario for Distributed Meetings
Fred sets up a meeting with Outlook.
At the beginning of the meeting, Fred starts DM.
Barney views it remotely with the DM client and uses the telephone or VoIP for voice communication.
Betty views the meeting later, using whiteboard (WB) and speaker indexing and time compression.
7
DM Room Diagram
8
DM Capture Devices
Whiteboard images
Overview video
360° video and audio
9
Prototype: RingCam
360° video from five $60 640×480 IEEE 1394 cameras
3000×480 panorama, used in 1500×240 mode
8-element microphone array, placed low on the table to minimize reflections
Camera array elevated above the table to give a good viewpoint
Camera and microphone array connected by a thin rod
Privacy mode with status light
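The RingCam numbers can be cross-checked with a little arithmetic. This is a sketch under stated assumptions: the gap between the cameras' combined width (5 × 640 = 3200 columns) and the 3000-column panorama is attributed to overlap used for stitching, and the 1500×240 mode is treated as a 2× downsample in each dimension. The function name `ringcam_budget` is hypothetical.

```python
def ringcam_budget(num_cameras=5, cam_width=640, pano_width=3000,
                   pano_height=480, mode=(1500, 240)):
    """Back-of-the-envelope figures for a ring of cameras stitched into a 360-degree panorama."""
    raw_width = num_cameras * cam_width              # columns captured before stitching
    overlap_total = raw_width - pano_width           # assumption: columns lost to overlap
    overlap_per_seam = overlap_total / num_cameras   # a closed ring has as many seams as cameras
    full_pixels = pano_width * pano_height
    mode_pixels = mode[0] * mode[1]
    return {
        "raw_width": raw_width,
        "overlap_per_seam": overlap_per_seam,
        "pixels_per_degree_full": pano_width / 360,
        "pixels_per_degree_mode": mode[0] / 360,
        "pixel_reduction": full_pixels / mode_pixels,  # how many times fewer pixels per frame
    }


print(ringcam_budget())
```

Under these assumptions each seam overlaps by about 40 columns, and the 1500×240 mode carries a quarter of the pixels of the full panorama, which is one reason a lower-resolution panorama is cheaper to send.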
10
DM Capture Devices: RingCam
360° video
Audio: sound source localization, beamforming
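The slides do not describe the RingCam's sound source localization or beamforming algorithms, so the following is only a generic sketch for a single microphone pair. It swaps in plain time-domain cross-correlation to estimate the time difference of arrival (GCC-PHAT is a common, more robust variant), converts it to a far-field angle, and uses delay-and-sum to steer toward the talker. The 8-element array geometry is not modeled; all function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly room temperature


def estimate_delay(x, y, max_lag):
    """Lag d (in samples) maximizing sum_n x[n] * y[n + d]; positive d means y lags x."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                score += x[n] * y[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, sample_rate, mic_spacing):
    """Far-field model: delay = spacing * sin(theta) / c; returns theta in degrees."""
    tau = lag / sample_rate
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing))  # clamp against noise
    return math.degrees(math.asin(s))


def delay_and_sum(channels, lags):
    """Advance each channel by the number of samples it lags the reference, then average."""
    length = len(channels[0])
    out = [0.0] * length
    for ch, lag in zip(channels, lags):
        for n in range(length):
            m = n + lag
            if 0 <= m < len(ch):
                out[n] += ch[m]
    return [v / len(channels) for v in out]
```

Sound arriving from the steered direction adds coherently after alignment, while sound from other directions partially cancels, which is the basic idea behind beamforming toward the localized speaker.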
11
Microsoft RoundTable: Product Shipped!
12
Video
13
Active Speaker Detection (ASD)
Why?
The bandwidth required for full-resolution panoramic video is too high for the current Internet infrastructure.
Even if it were possible, displays are usually not wide enough.
Solution:
Automatically detect the person who begins to speak.
Send a close-up of the current speaker to the remote side.
Send the panoramic video at a lower resolution.
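Sending a close-up implies mapping the detected speaker direction to a window of the full-resolution panorama. A minimal sketch, assuming column 0 corresponds to azimuth 0°, columns increase with azimuth, and the window wraps around the 360° seam; the close-up width and the name `closeup_columns` are hypothetical.

```python
def closeup_columns(azimuth_deg, pano_width=3000, closeup_width=320):
    """Panorama column indices for a close-up centred on azimuth_deg (wraps at 360 degrees)."""
    center = round((azimuth_deg % 360.0) / 360.0 * pano_width)
    start = center - closeup_width // 2
    return [(start + i) % pano_width for i in range(closeup_width)]


cols = closeup_columns(0.0, pano_width=1500, closeup_width=10)
print(cols)  # the window straddles the seam between the last and first columns
```

The modulo keeps a speaker sitting at the panorama's seam from producing a split or truncated close-up.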
14
ASD Challenges
People do not always look at the camera.
Many people in a meeting can confuse the detector.
Different rooms have different colors, so skin-color-based techniques are not reliable.
Heads can be very small (10×10 pixels), too small for a face detector to work.
The ASD module must be efficient enough to run on a DSP chip.
15
Our ASD Approach
Multimodal: audio and visual
Audio: output of SSL (sound source localization)
Visual: head and upper-body appearance; motion
Boosting
Learns the difference between speakers and non-speakers
Implicitly exploits the correlation between audio and video
A cascade pruning mechanism selects features that reject non-speakers early
Only 20 SSL and image features are selected
47% error-rate reduction compared with SSL alone
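The runtime side of a boosted cascade can be sketched as follows: weighted decision stumps vote on a combined audio and visual feature vector, and a candidate is rejected as soon as its running score falls below a stage threshold. Training (how boosting picks the features, weights, and thresholds) is not shown. The feature indices, thresholds, and weights in the usage example are made up; the `Stump`/`Stage` structures are hypothetical, not the product's implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    feature: int      # index into the combined SSL + image feature vector
    threshold: float
    polarity: int     # +1: vote "speaker" when value > threshold; -1: when value <= threshold
    weight: float     # vote weight learned by boosting

    def vote(self, features: Sequence[float]) -> float:
        above = features[self.feature] > self.threshold
        fires = above if self.polarity > 0 else not above
        return self.weight if fires else -self.weight


@dataclass
class Stage:
    stumps: List[Stump]
    reject_below: float  # running score below which the candidate is rejected


def classify(stages: List[Stage], features: Sequence[float]) -> bool:
    """Accumulate votes stage by stage; stop at the first stage whose threshold is not met."""
    score = 0.0
    for stage in stages:
        score += sum(s.vote(features) for s in stage.stumps)
        if score < stage.reject_below:
            return False  # early rejection: remaining stages are skipped
    return True


# Hypothetical two-stage cascade: feature 0 = SSL confidence, feature 1 = motion energy.
stages = [
    Stage([Stump(0, 0.5, 1, 1.0)], reject_below=0.0),
    Stage([Stump(1, 0.2, 1, 0.5)], reject_below=0.6),
]
print(classify(stages, [0.9, 0.3]))  # strong audio cue and motion -> speaker
```

Because most candidate regions are non-speakers, rejecting them after the first few cheap stumps is what makes this kind of detector practical on a DSP.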
16
ASD Examples
17
Recorded Meeting User Study
10 meetings were recorded with MSR groups.
At least one group member was absent from each meeting.
A total of 11 meetings were viewed by absent members in a usability lab.
In-meeting and offline participants were interviewed afterwards.
18
User Study Results: In-meeting Participants (N = 10 groups; 1 = strongly disagree, 5 = strongly agree)
I was comfortable having this meeting recorded. (avg 3.9, std dev 0.7)
The system got in the way of us having a productive meeting. (avg 1.7, std dev 0.4)
I felt like I acted differently because the meeting was being recorded. (avg 3.1, std dev 1.1)
It was awkward having the camera sitting in the center of the table. (avg 3.0, std dev 0.8)
It was awkward having the camera in the upper corner of the room. (avg 1.8, std dev 0.5)
19
User Study Results: Offline Participants (N = 11)
It was important for me to view this meeting. (avg 3.7, std dev 0.5)
I was able to get the information I needed from the recorded session. (avg 4.6, std dev 0.5)
I would use this system again if I had to miss a meeting. (avg 4.4, std dev 0.8)
I would recommend the use of this system to my peers. (avg 4.0, std dev 0.9)
Being able to browse the meeting using the whiteboard was useful. (avg 3.2, std dev 1.2)
Being able to browse the meeting using the timeline was useful. (avg 4.0, std dev 0.9)
Being able to speed up the meeting using time compression was useful. (avg 4.1, std dev 1.3)
Being able to see the panoramic (360°) view of the meeting room was useful. (avg 4.4, std dev 0.9)
Being able to see the current speaker in the top-left corner was useful. (avg 4.1, std dev 1.2)
20
Improving captured audio & video
Head-Size Equalization
Virtual Lighting
21
Head-Size Equalization
Why? Strong foreshortening: people sitting at the far end of the table appear very small relative to those near the camera.
Solution: warp the image to equalize head sizes with minimal distortion.
22
Spatially Varying Uniform Scaling
Warp the green curve onto the red curve.
Problem: faces at the far end may be very blurry, because they are enlarged from only a few pixels.
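One simplified reading of "spatially varying uniform scaling" is sketched below, not the published formulation: each column x gets its own scale factor s(x), the column is stretched vertically by s(x) about a horizontal center line, and the horizontal position is the running sum (discrete integral) of s, so that locally the horizontal and vertical stretch match. The per-column scales are hypothetical inputs (in practice they would come from comparing the two curves), and a real warp would use the inverse mapping with interpolation.

```python
from typing import List, Tuple


def svu_column_positions(scales: List[float]) -> Tuple[List[float], float]:
    """Output x-position of each source column: sum of the scales of all columns before it."""
    positions, acc = [], 0.0
    for s in scales:
        positions.append(acc)
        acc += s
    return positions, acc  # acc is the total output width


def svu_map(x: int, y: float, scales: List[float], positions: List[float],
            y_center: float) -> Tuple[float, float]:
    """Forward-map source pixel (x, y): same scale factor horizontally and vertically."""
    s = scales[x]
    return positions[x], y_center + s * (y - y_center)


# Hypothetical scales: enlarge the middle columns (far end of the table) by 2x.
scales = [1.0, 2.0, 1.0]
positions, width = svu_column_positions(scales)
print(positions, width, svu_map(1, 110.0, scales, positions, 100.0))
```

The shear introduced where s(x) changes quickly is the "distortion" to be kept minimal, which is why the scale should vary smoothly across the image; the 2× columns are also where the upsampling blur appears.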
23
Half-Ring Camera Array
Five cameras, each with a different lens
FOVs: 60°, 45°, 25°, 45°, 60°
Total FOV: 180°
The central camera, with its 25° FOV, provides enough resolution for the far end of the table.
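Why a narrow lens helps can be shown with a quick calculation. Assuming, hypothetically, that every camera has a 640-pixel-wide sensor (the slide does not say) and ignoring lens distortion, angular resolution is roughly sensor width divided by FOV. The individual FOVs also add up to 235°, so about 55° is covered by more than one camera.

```python
def fov_summary(fovs_deg, total_fov_deg, sensor_width_px):
    """Overlap between cameras and approximate pixels per degree for each lens."""
    overlap = sum(fovs_deg) - total_fov_deg                # degrees covered more than once
    px_per_deg = [sensor_width_px / f for f in fovs_deg]   # rough: assumes uniform sampling
    return overlap, px_per_deg


overlap, px = fov_summary([60, 45, 25, 45, 60], 180, 640)
print(overlap, [round(p, 1) for p in px])
```

Under this assumption the central 25° camera samples the far end of the table about 2.4 times more densely than a 60° camera, so distant faces need much less digital enlargement.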
24
Results
25
Virtual Lighting: Motivation
Improve perceptual video quality.
Two factors affect perceptual image quality: the camera device and the lighting conditions.
Observation: camera devices are getting better, but lighting conditions will stay the same.
26
Learning-Based Color Tone Mapping
Improve the “virtual” lighting conditions.
Learn from images taken by professional photographers (in good lighting conditions).
Algorithm:
Data collection: 400 celebrity images
Training: a mixture of Gaussians models the color statistics of the face region
27
Learning-Based Color Tone Mapping
Algorithm (continued):
Color tone mapping: given an input image, detect its face region.
Create a color tone mapping function so that the color statistics of the face region are similar to those of the training images.
Apply the color tone mapping function to all pixels in the input image.
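The three steps can be sketched with a deliberate simplification: instead of the mixture of Gaussians described above, each color channel of the face region is modeled by a single mean and standard deviation, and a per-channel linear map moves those statistics onto target values. Face detection is assumed to have already produced the face pixels, and the target statistics in the example are invented, not learned from the 400 training images.

```python
from statistics import mean, pstdev


def channel_stats(pixels):
    """Per-channel (mean, std) of a list of RGB tuples."""
    return [(mean([p[c] for p in pixels]), pstdev([p[c] for p in pixels])) for c in range(3)]


def build_tone_map(face_pixels, target_stats):
    """Per-channel (gain, offset) sending the face statistics onto the target statistics."""
    maps = []
    for (mu_s, sd_s), (mu_t, sd_t) in zip(channel_stats(face_pixels), target_stats):
        gain = sd_t / sd_s if sd_s > 0 else 1.0
        maps.append((gain, mu_t - gain * mu_s))
    return maps


def apply_tone_map(image, maps):
    """Apply the mapping to every pixel of the image, not just the face, clamped to [0, 255]."""
    return [tuple(min(255.0, max(0.0, g * v + o)) for v, (g, o) in zip(px, maps))
            for px in image]


face = [(100, 80, 60), (120, 100, 80)]         # pixels inside a (hypothetical) detected face
target = [(150, 10), (120, 10), (100, 10)]     # invented target (mean, std) per channel
maps = build_tone_map(face, target)
print(apply_tone_map([(100, 80, 60)], maps))
```

Fitting the map on the face but applying it globally brightens or recolors the whole frame consistently, so the background shifts along with the skin tones instead of producing a visible seam.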
28
Application to Video Sequences
Image intensities change over time because of automatic gain control and lighting changes.
Global intensity change detector: update the color tone mapping function whenever a global intensity change is detected.
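A minimal sketch of the update policy, assuming the global intensity change detector simply compares the frame's mean intensity with the mean recorded when the mapping was last built. The threshold value and the `build_map` callable (standing in for face detection plus tone-map fitting) are hypothetical.

```python
from statistics import mean


class ToneMapUpdater:
    """Rebuild the tone map only when global mean intensity drifts past a threshold."""

    def __init__(self, build_map, threshold=10.0):
        self.build_map = build_map   # callable(frame) -> mapping
        self.threshold = threshold   # hypothetical: allowed change in mean grey level (0-255)
        self.ref_mean = None         # mean intensity when the mapping was last built
        self.mapping = None
        self.rebuilds = 0

    def process(self, frame):
        """frame: flat list of grey levels. Returns the mapping to apply to this frame."""
        m = mean(frame)
        if self.ref_mean is None or abs(m - self.ref_mean) > self.threshold:
            self.mapping = self.build_map(frame)
            self.ref_mean = m
            self.rebuilds += 1
        return self.mapping


updater = ToneMapUpdater(lambda f: sum(f) / len(f), threshold=10.0)
for frame in ([100] * 4, [105] * 4, [130] * 4):
    updater.process(frame)
print(updater.rebuilds)
```

Comparing against the reference mean, rather than the previous frame, means slow drifts from automatic gain control still trigger an update once they accumulate, while frame-to-frame noise does not cause the mapping to flicker.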
29
Experiment results Video
30
User Study
16 video sequences taken by 8 different webcams
Users viewed the original and processed versions of each sequence side by side and rated each one with a MOS score:
1: very bad quality, 2: bad quality, 3: acceptable, 4: good quality, 5: very good quality
18 users responded.
Average score improved from 2.55 to 3.30.
T-test score: 0.001%
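Because each user rated both versions of the same sequence, a paired t-test is a natural way to check whether such an improvement is significant; the slide does not say which t-test variant was used, so pairing is an assumption here. The sketch computes the t statistic and degrees of freedom on invented ratings; turning t into a p-value needs the Student t distribution (for example `scipy.stats.t.sf`), which is not shown.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic on per-item differences (after - before); returns (t, df)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample std dev of differences / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Invented MOS ratings for four items, original vs processed.
t, df = paired_t([2, 3, 2, 3], [3, 4, 4, 3])
print(round(t, 3), df)
```

Pairing removes the large variation between sequences and between raters, so a consistent per-item gain can be significant even when the two overall averages are only modestly apart.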
31
Enabling Novel Scenarios
Live Whiteboard
Whiteboard Archiving & Browsing
Collaborative Projector-WB-Camera Systems
32
Motivations
Seamlessly connect the physical world with the digital world.
Seamlessly connect two physical worlds.
Make them a shared collaborative space.
33
Live Whiteboard
A whiteboard provides a large shared space where participants can focus their attention and express their ideas spontaneously.
Many meetings use the whiteboard heavily: brainstorming sessions, lectures, project planning meetings, patent disclosures, etc.
Difficulties:
Content is hard to archive or share with remote participants.
Participants are busy taking notes instead of spending their time sharing and absorbing ideas.
Solution: let a camera watch the whiteboard.
34
A Typical Image Sequence
The person must be segmented from the whiteboard background.
35
Image Analyzer Overview
Heuristics:
Whiteboard content is stationary.
Whiteboard background pixels are typically the majority.
Strategies for speed:
Analyze the image at the cell level (16×16 pixel blocks).
Cells pass through 6 modules and are not processed further once they are found not to be whiteboard.
Compute in Bayer format: 1 channel instead of 3.
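The cell-level strategy can be sketched as below. The six actual modules are not described, so two hypothetical checks stand in for them: a stationarity test against the previous frame (whiteboard content is stationary) and a brightness test (whiteboard background is bright). A single grey-level channel stands in for working directly on one Bayer channel; demosaicing is not shown, and the tolerances are made up.

```python
CELL = 16  # cell size in pixels, as on the slide


def cell_mean(frame, r0, c0):
    """Mean grey level of the CELL x CELL block whose top-left corner is (r0, c0)."""
    vals = [frame[r][c] for r in range(r0, r0 + CELL) for c in range(c0, c0 + CELL)]
    return sum(vals) / len(vals)


def is_stationary(cur, prev, r0, c0, tol=8.0):
    """Hypothetical module: a large change since the previous frame suggests a moving person."""
    return abs(cell_mean(cur, r0, c0) - cell_mean(prev, r0, c0)) <= tol


def is_bright(cur, prev, r0, c0, min_level=150.0):
    """Hypothetical module: whiteboard background is bright (prev unused; shared signature)."""
    return cell_mean(cur, r0, c0) >= min_level


MODULES = [is_stationary, is_bright]


def classify_cells(cur, prev):
    """Grid of booleans: True if a cell passed every module and is treated as whiteboard."""
    rows, cols = len(cur) // CELL, len(cur[0]) // CELL
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            r0, c0 = i * CELL, j * CELL
            # all() stops at the first failing module, so rejected cells skip later checks
            row.append(all(m(cur, prev, r0, c0) for m in MODULES))
        grid.append(row)
    return grid
```

Working on 16×16 cells means a 640×480 frame is judged as 1,200 cells rather than 307,200 pixels, and the early exit keeps the more expensive modules from running on cells already known to contain the person.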
36
Live Whiteboard: Integration with Messenger
[Diagram labels: camera (USB 2), local WB client, real-time WB processing, whiteboard updates, annotations, Windows Messenger (T.120), remote WB client, whiteboard changes; demo video]
37
Whiteboard Archiving
Capture whiteboard content plus audio/video with a high-resolution digital still camera.
Produce key frames:
A key frame usually corresponds to a major topic.
Print key frames as notes, or cut and paste them into documents.
Record a time stamp for each stroke.
Efficient meeting browsing:
Key frames to navigate between sections
Strokes to bring up the audio from the moment they were written
38
Key Frame Extraction
[Figure: number of strokes over time, divided into Chapter 1 and Chapter 2, with Key Frame 1 and Key Frame 2]
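The figure suggests that the stroke count rises while people write and falls when the board is erased, so chapter boundaries sit at major erasures. A minimal sketch under that assumption: a frame becomes a key frame when the next frame has at least `min_drop` fewer strokes, and the last frame is always kept. The threshold is hypothetical, and a practical detector would smooth the curve, since erasing usually spans several frames.

```python
def key_frames(stroke_counts, min_drop=5):
    """Indices of frames to keep: each frame just before a major erasure, plus the last frame."""
    keys = []
    for i in range(len(stroke_counts) - 1):
        if stroke_counts[i] - stroke_counts[i + 1] >= min_drop:
            keys.append(i)  # the board is fullest just before it is erased
    if stroke_counts:
        keys.append(len(stroke_counts) - 1)
    return keys


# Chapter 1 builds up to 12 strokes, is erased, then chapter 2 builds up to 9.
print(key_frames([0, 3, 8, 12, 12, 2, 0, 4, 9]))
```

Taking the frame just before the drop captures each chapter's whiteboard at its most complete, which is what makes a key frame useful as printed notes.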
39
Browsing Interface: Demo
[Screenshot labels: key frame thumbnails, current strokes, future strokes, raw image, VCR & timeline control]
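With a time stamp recorded for each stroke, the browsing features reduce to simple lookups. In this sketch, strokes written at or before the playback time t count as "current" and later ones as "future", and clicking near a stroke seeks playback to that stroke's time stamp. The `Stroke` structure and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stroke:
    start_time: float                   # seconds from the start of the recording
    points: List[Tuple[float, float]]   # whiteboard coordinates along the stroke


def split_strokes(strokes, t):
    """Current strokes were written at or before t; future strokes after t."""
    current = [s for s in strokes if s.start_time <= t]
    future = [s for s in strokes if s.start_time > t]
    return current, future


def seek_time_for_click(strokes, x, y):
    """Start time of the stroke with a point closest to (x, y), used to seek the audio/video."""
    def dist2(s):
        return min((px - x) ** 2 + (py - y) ** 2 for px, py in s.points)
    return min(strokes, key=dist2).start_time


strokes = [Stroke(5.0, [(0, 0), (1, 0)]), Stroke(42.0, [(10, 10), (11, 10)])]
print(seek_time_for_click(strokes, 10.5, 9.8))
```

Keeping future strokes visible (for example faded) lets a viewer see where the discussion is heading and jump straight to the moment a particular diagram was drawn.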
40
Projector-Whiteboard-Camera
The projector is a great tool for presentations but not so convenient for discussions.
Projector-Whiteboard-Camera system:
The whiteboard serves as both the projection surface (output) and the writing surface (input)
Seamless integration of computer presentations with whiteboard discussions
Enables remote collaboration on a shared workspace
41
Full-Duplex Collaborative System
42
Problem: Visual Echo
Visual information captured in the 1st room is projected in the 2nd room.
That projected visual information is re-captured in the 2nd room,
and is sent back and projected in the 1st room.
This is visual echo.
43
Visual Echo Cancellation
Suppress the projected content from the captured image.
Send only the visual information of the physical world.
[Diagram labels: local meeting room (whiteboard, camera, projector); remote room (display); communication network; video mixers; visual echo cancellation; P = presentation, A = annotation, W = writings; signals P+A, P+A+W, P+W]
44
VEC Components
Prerequisites: geometric calibration (gives the homography H) and photometric calibration (gives a color lookup table).
Color mapping & geometric warping: from the projected content, produce the estimated visual echo E.
Albedo estimation & color clustering: compare the captured image I with E to obtain the recovered writings W.
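The pipeline can be sketched in grey levels under several assumptions: H maps camera pixel coordinates to projector coordinates, the photometric lookup table is a callable from projector value to expected camera value, the projected image is sampled with nearest-neighbour lookup, and a pixel counts as writing when its albedo (captured divided by expected) falls below a hypothetical threshold, since ink absorbs light. The color clustering step is not modeled, and pixels outside the projected area are simply not classified.

```python
def apply_homography(H, x, y):
    """Map (x, y) through a 3x3 homography given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def estimate_echo(projected, H_cam_to_proj, lut, cam_h, cam_w):
    """Predict what the camera would see of the projected image alone."""
    echo = [[0.0] * cam_w for _ in range(cam_h)]
    for r in range(cam_h):
        for c in range(cam_w):
            px, py = apply_homography(H_cam_to_proj, c, r)   # geometric warping
            i, j = int(round(py)), int(round(px))            # nearest-neighbour sample
            if 0 <= i < len(projected) and 0 <= j < len(projected[0]):
                echo[r][c] = lut(projected[i][j])            # photometric color mapping
    return echo


def recover_writings(captured, echo, max_albedo=0.7):
    """True where captured / expected is low, i.e. something dark was written on the board."""
    return [[e > 0 and (i / e) < max_albedo for i, e in zip(crow, erow)]
            for crow, erow in zip(captured, echo)]


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
echo = estimate_echo([[100, 200], [200, 100]], identity, lambda v: 0.9 * v, 2, 2)
print(recover_writings([[90, 60], [175, 88]], echo))
```

Using a ratio rather than a difference makes the test largely independent of how bright the projected content is at each pixel: dark ink on a bright slide region and on a dim one both show up as low albedo.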
45
VEC Results Demo
46
Acknowledgments Ross Cutler, Li-wei He, Miao Liao, Zicheng Liu, Ruigang Yang, Cha Zhang et al.
47
Thank you! Q & A