MULTIMEDIA TECHNOLOGIES FOR ENHANCED TELE-COLLABORATION Zhengyou Zhang Principal Researcher Communications and Collaboration Systems.


1 MULTIMEDIA TECHNOLOGIES FOR ENHANCED TELE-COLLABORATION Zhengyou Zhang, Principal Researcher, Communications and Collaboration Systems (zhang@microsoft.com)

2 Motivation  Mission Research and develop new technologies to improve users’ experiences in collaboration across distances  Ultimate goal Provide users with immersive experiences in remote collaboration similar to face-to-face meetings  Focus Audio; Video; Data

3 Outline 1. Improve audio and video capture 2. Improve captured audio and video 3. Enable novel scenarios

4 Novel AV capture devices: Distributed Meetings; RoundTable

5 Motivation  Traveling to meetings is time-consuming, expensive, and stressful  Back-to-back meetings are hard to get between  If you miss a meeting, there is no good way to capture and view it inexpensively

6 Main Scenario for Distributed Meetings  Fred sets up meeting with Outlook  At beginning of meeting, Fred starts DM  Barney views it remotely with DM client and uses telephone/VoIP for voice communication  Betty views the meeting later, using WB and speaker indexing and time compression

7 DM Room Diagram

8 DM Capture Devices: whiteboard images; overview video; 360° video and audio

9 Prototype: RingCam  360° video using five $60 640x480 IEEE 1394 cameras  3000x480 panorama, used in 1500x240 mode  8-element microphone array, placed low to the table to minimize reflections  Camera array elevated above the table to give a good viewpoint  Camera and microphone array connected by a thin rod  Privacy mode with status light

10 DM Capture Devices: RingCam  360° video  Audio: sound source localization; beamforming
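
The slide names sound source localization and beamforming without giving details. As a rough illustration only, the sketch below estimates the delay between two microphone channels by cross-correlation and then applies delay-and-sum beamforming. The function names, the two-channel setup, and the integer-sample delays are assumptions made for this example, not the RingCam implementation.

```python
def cross_correlation_delay(ref, sig, max_lag):
    """Return the lag (in samples) at which `sig` best matches `ref`.

    A positive result means `sig` is a delayed copy of `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):
                score += r * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, delays):
    """Average the channels after advancing each one by its delay in samples."""
    length = len(channels[0])
    out = []
    for n in range(length):
        total, count = 0.0, 0
        for ch, d in zip(channels, delays):
            m = n + d
            if 0 <= m < length:
                total += ch[m]
                count += 1
        out.append(total / count if count else 0.0)
    return out
```

With a known microphone spacing and sample rate, the estimated delay could be converted to an arrival angle; that step is omitted here because the slides do not give the array geometry.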

11 Microsoft RoundTable Product Shipped !

12 Video

13 Active Speaker Detection (ASD)  Why?  Bandwidth requirement for the full-resolution panoramic video is too high for the current Internet infrastructure  Even if possible, display is usually not wide enough  Solution:  Automatically detect the person who begins to speak  Send a close-up of the current speaker to remote side  Panoramic video is sent in lower resolution

14 ASD Challenges  People do not always look at the camera  Many people in a meeting, confusing the detector  Different rooms have different colors; skin color based technique not reliable  Head size could be very small (10x10), so face detector does not work  ASD module must be very efficient to implement on a DSP chip

15 Our ASD Approach  Multimodal: Audio & Visual  Audio: output from SSL (Sound Source Localization)  Visual: Head & upper body appearance; motion  Boosting  Learn the difference between speakers & non-speakers  Implicitly explore the correlation between audio & video  Cascade pruning mechanism selects features that reject non-speakers early  Only 20 SSL & image features are selected  47% reduction in error rate compared with SSL-only
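
To make the idea of a boosted classifier with cascade pruning concrete, here is a minimal sketch assuming decision-stump weak classifiers and a per-stage rejection threshold. The feature names, thresholds, and weights are hypothetical placeholders; the slide does not list the 20 features that were actually selected.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    feature: str         # name of the feature this weak classifier reads (hypothetical)
    threshold: float     # decision threshold on that feature
    weight: float        # vote weight, as would be learned by boosting
    reject_below: float  # prune the window if the running score drops below this


def score_window(features, stumps):
    """Return (is_speaker, score, stumps_evaluated) for one candidate window."""
    score = 0.0
    for used, stump in enumerate(stumps, start=1):
        vote = 1.0 if features[stump.feature] > stump.threshold else -1.0
        score += stump.weight * vote
        if score < stump.reject_below:
            # Early rejection: the remaining features are never computed.
            return False, score, used
    return score > 0.0, score, len(stumps)
```

Placing the audio (SSL) feature first lets most non-speaker windows be rejected after a single cheap test, which is the kind of saving that matters on a DSP.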

16 ASD Examples

17 Recorded Meeting User Study  10 meetings recorded using MSR groups  At least one group member was absent from the meeting  Total of 11 meetings viewed by absent members in a usability lab  In-meeting and offline participants were interviewed afterwards

18 User Study Results: In-meeting Participants (N = 10 groups)
Scale: 1 = Strongly Disagree, 5 = Strongly Agree
Question | Avg | Std dev
I was comfortable having this meeting recorded. | 3.9 | 0.7
The system got in the way of us having a productive meeting. | 1.7 | 0.4
I felt like I acted differently because the meeting was being recorded. | 3.1 | 1.1
It was awkward having the camera sitting in the center of the table. | 3.0 | 0.8
It was awkward having the camera in the upper corner of the room. | 1.8 | 0.5

19 User Study Results: Offline Participants (N = 11)
Question | Avg | Std dev
It was important for me to view this meeting. | 3.7 | 0.5
I was able to get the information I needed from the recorded session. | 4.6 | 0.5
I would use this system again if I had to miss a meeting. | 4.4 | 0.8
I would recommend the use of this system to my peers. | 4.0 | 0.9
Being able to browse the meeting using the whiteboard was useful. | 3.2 | 1.2
Being able to browse the meeting using the timeline was useful. | 4.0 | 0.9
Being able to speed up the meeting using time compression was useful. | 4.1 | 1.3
Being able to see the panoramic (360°) view of the meeting room was useful. | 4.4 | 0.9
Being able to see the current speaker in the top-left corner was useful. | 4.1 | 1.2

20 Improving captured audio & video: Head-Size Equalization; Virtual Lighting

21 ♥ Head-Size Equalization ♥  Why?  Strong foreshortening  People sitting at the far end of the table appear very small relative to those near the camera  Solution  Warp image to equalize head size w/ minimal distortion

22 Spatially Varying Uniform Scaling  Warp green curve to red curve Problem: Faces at the far end may be very blurred !

23 Half-Ring Camera Array  Five cameras, each with a different lens  FOVs: 60˚ 45˚ 25˚ 45˚ 60˚ Total FOV: 180˚  Central camera w/ 25˚ FOV provides enough resolution for the far end of the table

24 Results

25 ♥ Virtual Lighting: Motivation ♥  Improve perceptual video quality  Two factors that affect perceptual image quality  Camera device  Lighting condition  Observation  Camera devices are getting better  Lighting conditions will stay the same

26 Learning-based color tone mapping  Improve “virtual” lighting condition  Learn from images taken by professional photographers (in good lighting conditions)  Algorithm:  Data collection: 400 celebrity images  Training: Mixture of Gaussians to model the color statistics of the face region

27 Learning-based color tone mapping  Algorithm (continued):  Color tone mapping: Given an input image, detects its face region Creating a color tone mapping function so that the face region color statistics is similar to the color statistics of the training images Applying the color tone mapping function to all the pixels in the input image
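
The mapping step can be sketched in a simplified form. The slides model face color statistics with a mixture of Gaussians; the example below swaps in a single per-channel mean and standard deviation, which is a deliberate simplification. The function names and the `target_stats` format are assumptions for illustration, and face detection is assumed to have happened already.

```python
from statistics import mean, pstdev


def channel_stats(pixels):
    """Per-channel (mean, std) for a list of (r, g, b) pixels."""
    return [(mean(ch), pstdev(ch)) for ch in zip(*pixels)]


def build_tone_map(face_pixels, target_stats):
    """Return a function that maps one pixel so face statistics match the target."""
    source_stats = channel_stats(face_pixels)

    def tone_map(pixel):
        out = []
        for value, (s_mean, s_std), (t_mean, t_std) in zip(
            pixel, source_stats, target_stats
        ):
            gain = t_std / s_std if s_std > 0 else 1.0
            mapped = (value - s_mean) * gain + t_mean
            out.append(min(255.0, max(0.0, mapped)))  # clip to the 8-bit range
        return tuple(out)

    return tone_map
```

The returned function is built from the face region only but is meant to be applied to every pixel of the frame, mirroring the last step on the slide.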

28 Application to video sequence  Image intensities change over time:  Automatic gain control  Lighting change  Global intensity change detector:  Updating color tone mapping function whenever a global intensity change is detected
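
One simple way to realize a global intensity change detector is to compare each frame's mean intensity with the mean at the last update. The sketch below does that; the threshold value and the flat-list frame format are assumptions for the example, and the slides do not specify which statistic the actual detector uses.

```python
def mean_intensity(frame):
    """Average of a frame given as a flat list of gray levels."""
    return sum(frame) / len(frame)


def frames_needing_update(frames, threshold=10.0):
    """Yield indices of frames where the tone map should be recomputed.

    The first frame always triggers; later frames trigger when their mean
    intensity drifts more than `threshold` from the last trigger frame.
    """
    reference = None
    for index, frame in enumerate(frames):
        level = mean_intensity(frame)
        if reference is None or abs(level - reference) > threshold:
            reference = level
            yield index
```

Recomputing the mapping only on these frames avoids flicker from small frame-to-frame fluctuations while still following automatic gain control and lighting changes.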

29 Experiment results Video

30 User study  16 video sequences taken by 8 different webcams  Each user views the side-by-side comparison for each sequence and rates each one on the MOS scale:  1: very bad quality  2: bad quality  3: acceptable  4: good quality  5: very good quality  18 users responded  Avg. MOS improved from 2.55 to 3.30  T-test score: 0.001%

31 Enabling Novel Scenarios: Live Whiteboard; Whiteboard Archiving & Browsing; Collaborative Projector-WB-Camera Systems

32 Motivations  Seamlessly connect the physical world with the digital world  Seamlessly connect two physical worlds  Make them a shared collaborative space

33 ♥ Live Whiteboard ♥  Whiteboard provides a large shared space for the participants to focus their attention and express their ideas spontaneously  Many meetings use the whiteboard heavily: brainstorming sessions, lectures, project planning meetings, patent disclosures, etc.  Difficulties  Content is hard to archive or share with remote participants  Participants are busy taking notes instead of sharing and absorbing ideas  Solution: let a camera watch the whiteboard

34 A typical image sequence Segmentation of the person and WB background is needed

35 Image analyzer overview  Heuristics  WB content is stationary  Pixels of WB background are typically the majority  Strategies for speed  Image analysis at cell level (16x16 pixel blocks)  Cells pass through 6 modules, not processed further if not WB  Compute in Bayer format 1 channel instead of 3
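
The cell-level strategy can be illustrated with a small sketch that reduces a grayscale frame to 16x16-pixel cell means and flags cells that stayed stationary between two frames. Only the cell size comes from the slide; the mean-based test, the tolerance, and the function names are assumptions, and the six analysis modules are not reproduced.

```python
CELL = 16  # cell size in pixels, as stated on the slide


def cell_means(image):
    """Mean gray level of every CELL x CELL block of a 2-D list image."""
    rows, cols = len(image) // CELL, len(image[0]) // CELL
    means = []
    for cy in range(rows):
        row = []
        for cx in range(cols):
            total = 0
            for y in range(cy * CELL, (cy + 1) * CELL):
                total += sum(image[y][cx * CELL:(cx + 1) * CELL])
            row.append(total / (CELL * CELL))
        means.append(row)
    return means


def stationary_cells(prev_means, curr_means, tolerance=4.0):
    """Return a grid of booleans: True where a cell barely changed between frames."""
    return [
        [abs(c - p) <= tolerance for p, c in zip(prow, crow)]
        for prow, crow in zip(prev_means, curr_means)
    ]
```

Working on one value per cell instead of 256 pixels is what makes the later stages cheap; cells that fail the stationarity test (for example, those covered by the person) can be dropped before any further processing.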

36 Live Whiteboard  Integration with Messenger [Diagram labels: Video; USB 2; Real-Time WB Processing; Whiteboard changes; Local WB client; Windows Messenger; T120; Whiteboard update; Annotations; Remote WB client] Demo Video

37 ♥ Whiteboard Archiving ♥  Capture whiteboard content + audio/Video  high-resolution digital still camera  Produce key frames  A KF usually corresponds to a major topic  Print as notes, or cut & paste into documents  Record time stamps of each stroke  Efficient meeting browsing  Key frames to navigate between sections  Strokes to bring up the audio at the moment when they were written

38 Key Frame Extraction [Figure: plot of the number of strokes, with Chapter 1, Chapter 2, Key Frame 1, and Key Frame 2 marked]
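
The figure suggests that key frames follow the number of strokes on the board. One plausible reading, sketched below, is to take a key frame at each peak in the stroke count that precedes a large erasure. The `min_drop` ratio and the peak rule are assumptions for illustration; the slides do not state the actual criterion.

```python
def key_frame_indices(stroke_counts, min_drop=0.5):
    """Pick a key frame at each peak that precedes a large erasure.

    A peak is confirmed once the count falls below (1 - min_drop) * peak.
    The final peak is always kept so the last chapter has a key frame.
    """
    if not stroke_counts:
        return []
    keys = []
    peak_index, peak_value = 0, stroke_counts[0]
    for index, count in enumerate(stroke_counts):
        if count > peak_value:
            peak_index, peak_value = index, count
        elif peak_value > 0 and count < (1.0 - min_drop) * peak_value:
            keys.append(peak_index)
            peak_index, peak_value = index, count
    if peak_value > 0:
        keys.append(peak_index)
    return keys
```

Each returned index marks the moment the board was fullest within a chapter, which matches the intuition that a key frame should summarize a major topic before it is erased.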

39 Browsing Interface: Demo [Screenshot labels: Key Frame Thumbnails; Future Strokes; Current Strokes; Raw Image; VCR & Timeline Control]

40 ♥ Projector-Whiteboard-Camera ♥  Projector  A great tool for presentation  Not so convenient for discussions  Projector-Whiteboard-Camera System  Whiteboard: Projecting surface (Output) and Writing surface (Input)  Seamless integration of computer presentation with whiteboard discussions  Enable remote collaboration on a shared workspace

41 Full-Duplex Collaborative System

42 Problem: Visual Echo  Visual info captured in the 1st room is projected in the 2nd room  That projected visual info is re-captured in the 2nd room  And is sent back and projected in the 1st room  Visual echo

43 Visual Echo Cancellation  Suppress projected content from captured image  Send only the visual info of the physical world [Diagram labels: Local Meeting Room; Remote Room; Whiteboard; Camera; Projector; Display; Video Mixer; Communication Network; Visual Echo Cancellation; signals P (Presentation), A (Annotation), W (Writings), P+A, P+W, P+A+W]

44 VEC Components  Prerequisites  Geometric calibration  Photometric calibration [Diagram labels: captured image I; Homography H (given by geometric calibration); Color Lookup Table (given by color calibration); Color Mapping & Geometric Warping; estimated visual echo E; albedo estimation & color clustering; recovered writings W]
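
The components named on this slide can be tied together in a minimal sketch. It assumes that the estimated visual echo E has already been produced by warping the projected image with H and applying the color lookup table, and that albedo is estimated as the per-pixel ratio I / E, with low-albedo pixels treated as writing. The ratio model, the threshold, and the function names are assumptions for illustration; the color clustering step is omitted.

```python
def apply_homography(H, x, y):
    """Map point (x, y) through the 3x3 homography H (row-major nested lists)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def estimate_albedo(captured, echo, eps=1e-6):
    """Per-pixel ratio I / E for two equally sized gray images (2-D lists)."""
    return [
        [i / max(e, eps) for i, e in zip(irow, erow)]
        for irow, erow in zip(captured, echo)
    ]


def writing_mask(albedo, threshold=0.7):
    """True where the surface reflects much less than the blank board would."""
    return [[a < threshold for a in row] for row in albedo]
```

On a blank board the captured image matches the predicted echo and the ratio stays near 1, so only physical marker strokes fall below the threshold and get sent to the remote room.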

45 VEC Results Demo

46 Acknowledgments Ross Cutler, Li-wei He, Miao Liao, Zicheng Liu, Ruigang Yang, Cha Zhang et al.

47 Thank you ! Q & A
