TELEPRESENCE TUTORIAL July 30, 2012
Overview
1 Introduction to Telepresence
2 Introduction to the IETF CLUE work
3 Telepresence scenarios
4 CLUE Framework
5 Realization of the CLUE framework
INTRODUCTION TO TELEPRESENCE (STEPHEN BOTZKO)
What is Telepresence: Co-location. At its core, telepresence uses technology and “stagecraft” to create a sense of co-location (meeting participants feel they are in the same space). Key aspects: gaze awareness, eye contact, actual-size rendering. (Image: Telepresence Dinner)
History: “Toward the Telehandshake” (1983), in Media for Interactive Communications, Bretz and Schmidbauer. Commercial systems began in the 1990s: TeleSuite was founded in 1993, and Cisco, HP, Polycom and others had systems by 2010.
Some Product Examples
Telepresence: Definition Telepresence: An interactive audio-visual communications experience between remote locations, where the users enjoy a strong sense of realism and presence between all participants by optimizing a variety of attributes such as audio and video quality, eye contact, body language, spatial audio, coordinated environments and natural image size.
How is it done? Lay out physical space / Identify sight lines
How is it done? Partition the space
How is it done? Place cameras and displays
Essential Co-location Requirements: preserve spatial relationships between streams; maintain coherence of the audio and video “stage”; ability to scale images to true size; ability to select the best sight line. Many of these facilities can also be used to enhance other, non-telepresence applications.
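To make “scale images to true size” concrete: if a renderer knows how wide a strip of the remote room a capture covers, and the physical and pixel width of its own display, simple proportion gives the image width that shows people at life size. This is a minimal sketch; the function name and parameters are hypothetical, and real systems also have to account for viewing distance, lens distortion and images wider than one display.

```python
def true_size_pixels(captured_width_m: float,
                     display_width_m: float,
                     display_width_px: int) -> int:
    """Pixels an image must span so that captured_width_m of the remote scene
    appears at its real (life) size on this display.

    Hypothetical helper: assumes square pixels, a uniform pixel density, and
    that the capture's physical width is known (e.g. from spatial attributes).
    """
    if min(captured_width_m, display_width_m, display_width_px) <= 0:
        raise ValueError("all widths must be positive")
    pixels_per_meter = display_width_px / display_width_m
    return round(captured_width_m * pixels_per_meter)


# A camera covering 1.5 m of the remote table, shown on a display that is
# 1.6 m wide with 1920 horizontal pixels:
print(true_size_pixels(1.5, 1.6, 1920))  # 1800
```

If the result exceeds the display's pixel width, the image cannot be shown at true size on that display alone.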
IETF CLUE Working Group (Mary Barnes)
What is CLUE? CLUE = ControLling mUltiple streams for tElepresence. Motivation: currently deployed telepresence systems are not interoperable.
What is CLUE? Objectives: describe the data required for the handling of multiple streams; define the behavior required to negotiate the use of multiple streams of audio and video media flows.
Scope of CLUE: RTP- and SIP-based systems. Define signaling for transporting CLUE information. Apply existing protocols for signaling and transport, with extensions to existing protocols in the appropriate WGs (e.g., AVTCORE and MMUSIC).
Working Towards a Solution (diagram): Use Cases, Requirements, Data Model, Call Flows, RTP Usage, Signaling
CLUE TELEPRESENCE SCENARIOS (RONI EVEN)
Overview: Telepresence (TP) systems. The primary objective is an immersive experience as close to “being there” as possible: life-size video display, eye contact, gaze direction, spatial audio.
Central cameras, semi-circular seating
Cameras located with screens: semi-circular or linear classroom seating
Telepresence architecture: TP systems typically have multiple cameras and microphones. A typical system has the same number of monitors and cameras (1 and 3 are common, but some systems have 2 or 4).
Additional Use Cases: dynamically add video sources from an endpoint based on meeting context (e.g., turn on a document camera or provide a video stream of a presentation); different numbers of cameras and screens, for example 3 cameras with six screens or with one big screen.
CLUE Framework (Allyn Romanow, Andy Pepperell)
Power of the Framework: Interoperable (different vendors, types of devices). Extensible (new functionality, future). Receiver driven (the receiver chooses what to receive and the encoding). Media captures (description used by the renderer; advertised by the provider; chosen by the consumer).
What is the Framework? (Diagram: Vendor One, Vendor Two)
Provider and Consumer (diagram: endpoints and an MCU, each saying “I am a provider. I advertise. I am also a consumer. I choose.”)
Basic Idea (cartoon dialogue): “Meow... Send me 2 streams of 360 at 1080p, and 1 audio.” “I can send you 1 image of both of us, or 2 images each of 1 of us. I can send them at 1080p or 720p, with mono audio at 64 kbit/s.” “I can send you one image, 2 images, or 3 images; 1 or 2 mono audio streams. I can send streams at 1080p, 720p, and 360p as long as the total is not over 4896 Mbps. All at 4 Mbps, not exceeding 6 Mbps. Audio at 64 kbit/s each.” “Woof woof. I’ll take the single stream at 720p, single mono audio.”
Media Captures: a fundamental CLUE concept. A media capture is a media representation of some portion of the provided scene. Example 1: video from the left camera of 3. Example 2: a stereo audio capture of a room’s audio.
Capture Attributes: each capture is described via its attributes: high-level categorization (audio vs. video); spatial information (3-D Cartesian coordinates) to enable correct rendering; switched capture; a mechanism for extensibility.
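A minimal sketch of a media capture described by the attributes listed above, using the two examples from the previous slide. The class and field names are hypothetical illustrations, not the CLUE data model’s element names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Tuple

Point3D = Tuple[float, float, float]  # (x, y, z) Cartesian coordinates


class MediaType(Enum):
    AUDIO = "audio"
    VIDEO = "video"


@dataclass
class MediaCapture:
    capture_id: str                               # e.g. "VC0" or "AC0"
    media_type: MediaType                         # high-level categorization
    point_of_capture: Optional[Point3D] = None    # spatial information, if known
    switched: bool = False                        # True for a voice-switched capture
    extra: dict = field(default_factory=dict)     # extension attributes

    def has_spatial_info(self) -> bool:
        return self.point_of_capture is not None


# Example 1: video from the left camera of 3 (coordinates are made up).
left = MediaCapture("VC0", MediaType.VIDEO, point_of_capture=(-1.0, 0.0, 0.0))
# Example 2: a stereo audio capture of the room, with no spatial information.
room_audio = MediaCapture("AC0", MediaType.AUDIO, extra={"channels": "stereo"})
```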
Capture Scene: each alternative representation of a scene is a capture entry in a capture scene.
Example capture scene “Main Media”, with capture scene entries:
(VC0, VC1, VC2): three cameras
(VC3, VC4): two cameras, moved and zoomed out
(VC5): switched (based on voice), composed PiP
(AC0): audio
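The capture scene above can be modelled as a list of alternative entries, each a tuple of capture IDs; a consumer then picks one alternative per media type. This is a sketch only: the class, and the convention that IDs starting with “VC” are video and “AC” audio, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CaptureScene:
    name: str
    # Each entry is one alternative representation of the whole scene,
    # given as a tuple of capture IDs.
    entries: List[Tuple[str, ...]]

    def entries_for(self, prefix: str) -> List[Tuple[str, ...]]:
        """Entries whose captures all start with prefix ("VC" video, "AC" audio)."""
        return [e for e in self.entries if all(c.startswith(prefix) for c in e)]


main_media = CaptureScene(
    "Main Media",
    [("VC0", "VC1", "VC2"),   # three cameras
     ("VC3", "VC4"),          # two cameras, moved and zoomed out
     ("VC5",),                # switched (based on voice), composed PiP
     ("AC0",)],               # audio
)

video_alternatives = main_media.entries_for("VC")
# A consumer picks one alternative per media type, e.g. the two-camera view:
chosen = video_alternatives[1]
```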
Basic CLUE Messaging: 1) Provider to consumer: capture advertisement. 2) Consumer to provider: stream choice. 3) Provider to consumer: media streams. Potentially multiple further exchanges of capture advertisement and stream choice follow.
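The exchange above can be simulated in a few lines. Everything here (the class names, the naive “first N captures” consumer policy) is hypothetical; the point is only the ordering: advertisement, then choice, then media, with a later choice replacing an earlier one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Advertisement:
    captures: List[str]   # capture IDs the provider offers


@dataclass
class StreamChoice:
    captures: List[str]   # the subset the consumer wants to receive


class Provider:
    def __init__(self, captures: List[str]):
        self.captures = captures
        self.sending: List[str] = []

    def advertise(self) -> Advertisement:
        return Advertisement(list(self.captures))

    def on_choice(self, choice: StreamChoice) -> List[str]:
        unknown = set(choice.captures) - set(self.captures)
        if unknown:
            raise ValueError(f"not advertised: {sorted(unknown)}")
        self.sending = list(choice.captures)  # a new choice replaces the old one
        return self.sending                   # streams the provider now sends


class Consumer:
    def __init__(self, max_streams: int):
        self.max_streams = max_streams        # e.g. number of screens

    def choose(self, adv: Advertisement) -> StreamChoice:
        # Deliberately naive policy: take the first captures that fit.
        return StreamChoice(adv.captures[: self.max_streams])


provider = Provider(["VC0", "VC1", "VC2"])
consumer = Consumer(max_streams=2)

# Round 1: advertisement, stream choice, media streams.
streams = provider.on_choice(consumer.choose(provider.advertise()))

# A further exchange after the consumer's situation changes.
consumer.max_streams = 3
streams_after = provider.on_choice(consumer.choose(provider.advertise()))
```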
Provider Capture Advertisement: the provider tells the consumer about its media captures. Enumeration of available media captures, including the organisation of captures into scenes. Physical constraints: e.g., the center camera may also be used for a “zoomed out” view. Encoding constraints: the provider expresses its overall encoding capabilities, which allows modelling of multiple constituent physical units.
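In the CLUE framework drafts, physical constraints like the center camera example are expressed as simultaneous transmission sets: sets of captures that the provider can send at the same time. A minimal sketch of checking a consumer’s selection against them; the function name and capture IDs are hypothetical.

```python
from typing import Iterable, List, Set


def can_send_together(selection: Iterable[str], sim_sets: List[Set[str]]) -> bool:
    """True if every selected capture fits inside one simultaneous transmission set."""
    chosen = set(selection)
    return any(chosen <= s for s in sim_sets)


# Hypothetical provider: VC0/VC1/VC2 are the left/center/right cameras, and VC3
# is a zoomed-out view produced by the same physical center camera as VC1,
# so VC1 and VC3 never appear in the same set.
sim_sets = [{"VC0", "VC1", "VC2"}, {"VC0", "VC2", "VC3"}]
```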
Consumer Choice: the consumer tells the provider which captures it wishes to receive, with encoding parameters such as max resolution, mbps, etc. This instantiates provider media captures as “real” streams. Captures can have multiple instantiations; there is not a simple one-to-one mapping between captures and encodings. The media model is no longer simply “transmitter chooses”.
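Because a capture can be instantiated more than once, a consumer choice is better thought of as a list of stream requests than as a set of captures. A sketch with hypothetical names and values: a consumer that both displays and records the switched capture VC5 asks for it twice, at two resolutions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StreamRequest:
    capture_id: str
    max_width: int
    max_height: int
    max_kbps: int


def instantiations(requests: List[StreamRequest]) -> Counter:
    """How many real streams each capture is instantiated as."""
    return Counter(r.capture_id for r in requests)


choice = [
    StreamRequest("VC5", 1920, 1080, 4000),  # for display
    StreamRequest("VC5", 640, 360, 500),     # for recording
    StreamRequest("VC0", 1280, 720, 2000),
]
```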
Receiver Choosing is Powerful: the consumer does its own layout, because it knows its display hardware (number of streams, bandwidth, resolution). The receiver can get multiple representations of the same scene, e.g., for recording, or so that an MCU can switch different versions out. The result is expanded functionality and flexibility.
FRAMEWORK REALIZATION (ROB HANSEN)
Example Endpoint - Alice
Example Call Flow (Alice and Bob): SIP: INVITE; SIP: 200 OK; SIP: ACK; (optional) single-stream RTP + RTCP; CLUE: Advertisement; CLUE: Configure; multi-stream RTP + RTCP.
Example SIP INVITE: acceptable to non-CLUE endpoints. As always, SDP defines the limits of the RTP sessions. The INVITE contains CLUE transport details. Alice’s SDP has 1 audio m-line and 1 video m-line:
    v=0
    o=alice IN IP4 client.atlanta.example.com
    s=-
    c=IN IP
    t=0 0
    b=AS:6064
    m=audio RTP/AVP 0
    a=rtpmap:0 PCMU/8000
    m=video RTP/AVP 96
    b=AS:6000
    a=rtpmap:96 H264/90000
    a=fmtp:96 profile-level-id=42e016;max-mbps=244800;max-fs=8160
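The fmtp values in Alice’s SDP can be checked by hand. RFC 6184 gives max-fs in 16x16 macroblocks per frame and max-mbps in macroblocks per second, so 8160 and 244800 correspond to 1920x1080 at 30 frames per second (the frame rate is an assumption; the slide does not state it). The session-level b=AS:6064 (kbit/s) also appears to be the video’s 6000 kbit/s plus 64 kbit/s for PCMU audio. Function names are illustrative.

```python
def macroblocks_per_frame(width: int, height: int) -> int:
    """Frame size in 16x16 macroblocks, rounding partial macroblocks up."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def h264_limits(width: int, height: int, fps: int) -> dict:
    """max-fs (macroblocks per frame) and max-mbps (macroblocks per second)."""
    fs = macroblocks_per_frame(width, height)
    return {"max-fs": fs, "max-mbps": fs * fps}


print(h264_limits(1920, 1080, 30))  # {'max-fs': 8160, 'max-mbps': 244800}

# Session-level budget, assuming 64 kbit/s for PCMU audio plus the video
# m-line's b=AS:6000.
session_kbps = 64 + 6000
```

Note that 1080 lines round up to 68 macroblock rows (1088 lines), which is why 8160 rather than 8100 appears in the SDP.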
Example CLUE Advertisement: Capture Scene (captures, entries); Simultaneous Transmission Sets; Encoding Group (encodings).
Example Captures. Video: Capture 1, static video, spatial parameters; Capture 2, static video, spatial parameters; Capture 3, static video, spatial parameters; Capture 4, switched video, no spatial parameters. Audio: Capture 5, mixed audio, no spatial parameters.
Capture Spatial Parameters (diagram showing point of capture, axis of capture and area of capture): Capture 1, static video, region A; Capture 2, static video, region B; Capture 3, static video, region C.
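A renderer can use the area of capture to place captures in the right spatial order and to check that they tile the scene without gaps. The sketch below assumes a simplified one-dimensional area (horizontal extent in meters) and invents coordinates for regions A, B and C; none of these numbers come from the example, and the function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Area:
    # Simplification: only the horizontal extent of the area of capture, in
    # meters. The framework drafts describe areas with 3-D points.
    x_min: float
    x_max: float


def left_to_right(areas: Dict[str, Area]) -> List[str]:
    """Capture names ordered by the horizontal center of their area of capture."""
    return sorted(areas, key=lambda name: (areas[name].x_min + areas[name].x_max) / 2)


def contiguous(areas: Dict[str, Area]) -> bool:
    """True if, left to right, each area starts where the previous one ends."""
    order = left_to_right(areas)
    return all(math.isclose(areas[a].x_max, areas[b].x_min)
               for a, b in zip(order, order[1:]))


# Hypothetical coordinates for Alice's regions A, B and C.
alice = {
    "capture 3": Area(1.0, 3.0),    # region C
    "capture 1": Area(-3.0, -1.0),  # region A
    "capture 2": Area(-1.0, 1.0),   # region B
}
```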
Example Entries: entries of the same media type define alternative views of the scene. Alice advertises three entries:
Entry 1: video captures 1, 2 & 3 (three static cameras)
Entry 2: video capture 4 (switched video stream)
Entry 3: audio capture 5 (mixed audio stream)
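Since entries of one media type are alternatives, a consumer typically picks one of them per media type. A sketch of one possible (hypothetical) consumer policy over Alice’s three entries: choose the video entry with the most captures that still fits on the available screens.

```python
from typing import Dict, List, Tuple

# Alice's advertised entries: name -> (media type, capture numbers).
ENTRIES: Dict[str, Tuple[str, List[int]]] = {
    "entry 1": ("video", [1, 2, 3]),   # three static cameras
    "entry 2": ("video", [4]),         # switched video stream
    "entry 3": ("audio", [5]),         # mixed audio stream
}


def pick_video_entry(screens: int) -> str:
    """Hypothetical policy: the video entry with the most captures that fits."""
    fitting = [(len(caps), name) for name, (kind, caps) in ENTRIES.items()
               if kind == "video" and len(caps) <= screens]
    if not fitting:
        raise ValueError("no video entry fits")
    return max(fitting)[1]
```

With three screens this picks entry 1, matching Bob’s first Configure; with a single screen it picks the switched stream in entry 2.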
Encoding Group & Encodings: encodings define the maximum encoding parameters available for streams. Alice advertises the ability to encode up to three video streams at 1080p, max 4 Mbit/s each, but with an overall encoding group limit of 6 Mbit/s. The audio encoding is max 64 kbit/s.
CLUE Configure: Bob selects the three static camera streams at 720p, and the mixed audio stream: static capture 1, max 2 Mbit/s; static capture 2, max 2 Mbit/s; static capture 3, max 2 Mbit/s; mixed audio capture 5.
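Bob’s request has to fit Alice’s encoding constraints: at most three video streams, each at most 1080p and 4 Mbit/s, and at most 6 Mbit/s in total for the group. A minimal sketch of that check, assuming the audio encoding (max 64 kbit/s) is budgeted separately; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EncodingGroup:
    max_streams: int       # number of individual encodings in the group
    per_stream_kbps: int   # max bandwidth of each individual encoding
    total_kbps: int        # overall limit for the whole group
    max_height: int        # e.g. 1080 for 1080p


@dataclass(frozen=True)
class Requested:
    capture: int
    height: int
    kbps: int


def fits(group: EncodingGroup, streams: List[Requested]) -> bool:
    """True if the requested streams respect every limit of the encoding group."""
    return (len(streams) <= group.max_streams
            and all(s.kbps <= group.per_stream_kbps and s.height <= group.max_height
                    for s in streams)
            and sum(s.kbps for s in streams) <= group.total_kbps)


alice_video = EncodingGroup(max_streams=3, per_stream_kbps=4000,
                            total_kbps=6000, max_height=1080)

first_configure = [Requested(1, 720, 2000), Requested(2, 720, 2000), Requested(3, 720, 2000)]
second_configure = [Requested(4, 1080, 4000)]
too_much = [Requested(1, 1080, 4000), Requested(2, 1080, 4000)]
```

Three 720p streams at 2 Mbit/s use exactly the 6 Mbit/s group limit, and the second Configure later in the example (switched capture 4 at 1080p, max 4 Mbit/s) also fits; two 4 Mbit/s streams would not.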
Multi-stream media (Alice to Bob): an audio RTP session (audio port) and a video RTP session (video port). Alice sends 1 audio stream, and 3 multiplexed video streams in the video session.
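With three video streams multiplexed into one RTP session, the receiver has to tell them apart. One natural key is the SSRC in each RTP header (RFC 3550, bytes 8 to 11). The sketch below reads it and looks up a capture; how SSRCs are bound to captures was still being worked out in the RTP usage drafts, so the lookup table and its values are assumptions.

```python
import struct
from typing import Dict


def rtp_ssrc(packet: bytes) -> int:
    """Return the SSRC of an RTP packet (RFC 3550: bytes 8-11, network order)."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return struct.unpack_from("!I", packet, 8)[0]


def make_rtp(ssrc: int, seq: int = 0, timestamp: int = 0, pt: int = 96) -> bytes:
    """Build a minimal 12-byte RTP header (no CSRCs, no extension) for testing."""
    return struct.pack("!BBHII", 0x80, pt & 0x7F, seq, timestamp, ssrc)


# Hypothetical SSRC-to-capture binding for Alice's three static captures.
SSRC_TO_CAPTURE: Dict[int, str] = {
    0x1111: "capture 1",
    0x2222: "capture 2",
    0x3333: "capture 3",
}


def route(packet: bytes) -> str:
    """Which capture an incoming video packet belongs to."""
    return SSRC_TO_CAPTURE[rtp_ssrc(packet)]
```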
Call flow with reconfiguration (Alice and Bob): SIP: INVITE; SIP: 200 OK; SIP: ACK; (optional) single-stream media; CLUE: Advertisement; CLUE: Configure; multi-stream media. Bob then changes his request: CLUE: Configure; different multi-stream media.
2nd CLUE Configure: advertise/configure is not offer/answer; the messages are sent independently. Bob now requests the single switched video stream at 1080p: switched capture 4, max 4 Mbit/s; mixed audio capture 5.
Summary: 1) CLUE is about more than telepresence: it is developing building blocks for other multi-stream applications. 2) CLUE uses SIP and SDP signaling for session setup. 3) CLUE defines additional non-offer/answer signaling to communicate CLUE-specific information.
References
CLUE Requirements: draft-ietf-clue-telepresence-requirements
CLUE Use Cases: draft-ietf-clue-telepresence-use-cases
CLUE Framework: draft-ietf-clue-framework
RTP Usage: draft-lennox-clue-rtp-usage; draft-even-clue-rtp-mapping
Call Flows: draft-romanow-clue-call-flow
Data Model: draft-romanow-clue-data-model
Contributors to the tutorial (alphabetical order): Mary Barnes, Espen Berger, Stephen Botzko, Mark Duckworth, Roni Even, Rob Hansen, Paul Kyzivat, Jonathan Lennox, Andy Pepperell, Allyn Romanow
QUESTIONS?
BACKUP