ViSiCAST 2001 Technical Audit 8 October 2001, Brussels Michele Wakefield - Project Manager, ITC
The ViSiCAST Project: Virtual Signing – Capture, Animation, Storage and Transmission
Aims of ViSiCAST Project: “…support improved access by deaf citizens to information and services in sign language”. User-friendly methods to capture & generate signs; machine-readable system to describe gestures; … preferred medium is sign language.
Tessa at the Post Office: using a speech recogniser, convert the counter clerk’s voice input to text and generate a sign stream from the text. BSL – limited repertoire.
ViSiCAST Consortium: Independent Television Commission; Televirtual; University of East Anglia; The Post Office; Royal National Institute for Deaf People; Instituut voor Doven; Hamburg University; Institut für Rundfunktechnik; Institut National des Télécommunications.
Project Dimensions. Duration: start January 2000, finish December 2002 (36 months). Total costs: 3770 kECU, of which 2876 kECU funding from the EC.
ViSiCAST Project Highlights: prototype enabling text translation and direct synthesis of sign language gestures; quality assessment support to another EU project; new TESSA system trial at the Science Museum, London; BCS IT Award and Gold Medal achieved; innovative transmission assessment for broadcast TV; BBC seeking to deliver a closed signing service for broadcast DTV; WWW weather forecaster with virtual signer available in 3 sign languages.
ViSiCAST Project Structure [diagram]: Technology (Animation, Linguistics); User Applications (Broadcast, WWW, High Street); Evaluation; Exploitation & Dissemination.
Presentations by Core Streams. Technology: Animation & Linguistics – WP4 Animation: Mark, Televirtual (10); WP5 Linguistics: Thomas, UH (10). User: Applications – WP1 Broadcast: Werner, IRT; Francoise, INT (10); WP2 WWW: Corrie/Margriet, IvD (10); WP3 Face to Face: Stephen, UEA (10). Exploitation & Dissemination – WP7, 8: Michele (10).
Technology Focus Objectives WP 4 Animation Increased realism in sign generation Enhanced signing experience WP5 Sign Language Linguistics Use of natural sign language Synthesis of sign language gestures
Animation: Initial Work Developed TESSA & VISIA Avatars Developed Capture / Animation system Integrated into early demos of WPs 1-2-3
Animation Work: Objectives. WP4: develop high-resolution avatars plus related capture, animation and transmission formats, including compression. To enable and support application development in WPs using WP4 (& WP5) products. To further develop, compare and integrate both proprietary and standard solutions, where appropriate.
Animation: Current Work Through Year Two. Continued to support application development. Continuous upgrade to VISIA / TESSA player (OpenGL renderer under ActiveX control). Bug fixing / motion capture support. .baf format and compression layer with WP1 to create Broadcast Demonstrator using the ViSiCAST system. MPEG compatibility / parallel development in WP4 and applications.
Animation: Continuing & Future Work. Working on ways to improve facial animation / realism (forehead / eyes). Exploring statistical methods to define and generate facial animation. Working on ways to facilitate avatar creation (photographic acquisition). Mask 2 + improved motion capture.
MPEG-4 compliant Animation: Achievements. MPEG-4 SNHC for interoperable animation: MPEG-4 SNHC player and server delivered in June 2001; 5 to 25 kbit/s; 7 to 14 bit/vertex. Making use of an MPEG-4 compliant Visia model. Compliance with the VRML standard (H-Anim specifications). Incorporating a full compression layer: 3D mesh & texture encoding; motion parameter (BAP/FAP) encoding. Implementing importation and editing tools. Open delivery interface: MPEG-2, IP, ATM… (see the bit-rate sketch below).
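To give a feel for how the quoted 5 to 25 kbit/s range can arise, here is a minimal back-of-the-envelope sketch in Python; the frame rates, parameter counts and bits per Body Animation Parameter are illustrative assumptions, not the project's actual coder settings.

```python
# Rough bit-rate estimate for a stream of MPEG-4 Body Animation Parameters (BAPs).
# All numbers below are illustrative assumptions, not ViSiCAST coder settings.

def bap_bitrate_kbps(frame_rate_hz: float, params_per_frame: int, bits_per_param: float) -> float:
    """Return the raw animation bit rate in kbit/s before entropy coding."""
    return frame_rate_hz * params_per_frame * bits_per_param / 1000.0

if __name__ == "__main__":
    # A sparse upper-body parameter subset at a modest frame rate sits near the low end...
    print(bap_bitrate_kbps(frame_rate_hz=12.5, params_per_frame=50, bits_per_param=8))   # 5.0 kbit/s
    # ...while a fuller parameter set at 25 fps approaches the upper end of the range.
    print(bap_bitrate_kbps(frame_rate_hz=25.0, params_per_frame=120, bits_per_param=8))  # 24.0 kbit/s
```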
MPEG-4 compliant Animation: Perspectives. Advanced interoperable distributed animation system. Improved facial animation. MPEG-4 System layer implementation: multimedia (audio, video, text…) synchronisation; error resilience; management of scene description. MPEG-compliant SiGML-driven animation. Open input/output interface.
Presentation by Streams - Linguistics WP 4 Animation Increased realism in sign generation Enhanced signing experience WP5 Sign Language Linguistics Use of natural sign language Synthesis of sign language gestures
WP5: Language Technology. Goal within the project: to provide semi-automatic translation from English into BSL, DGS and NGT. Can also be used to assist the user in monolingual language input. No established writing system exists for sign languages.
The last year: 3 deliverables. D5-1: Defining the interfaces. D5-2: Transfer to XML: SiGML definition. D5-3: Prototype translation system: English to notation.
D5-1: Defining the interfaces Adaptation of Discourse Representation Structure Extension of HamNoSys, a phonetic transcription system for sign language Notation conventions for all non-manual aspects relevant for (European) sign languages Body movement Head movement Facial expressions Mouthing and Mouth gestures Eye movement Synchronicity with manual elements
D5-2: SiGML. Defines an XML domain based on the D5-1 manual and non-manual notation. Simple timing model; probably to be revised to ease integration with upcoming synchronisation models as required for broadcasting etc. (SMIL, XMT (MPEG-4), etc.). A hypothetical fragment is sketched below.
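To show the flavour of such an XML encoding, here is a minimal sketch that builds a SiGML-like fragment for one sign; the element and attribute names are invented placeholders, not the actual SiGML tag set.

```python
import xml.etree.ElementTree as ET

# Build a hypothetical SiGML-like fragment for a single sign.
# Tag and attribute names are placeholders, not the real SiGML vocabulary.
sign = ET.Element("sign", gloss="WEATHER")
ET.SubElement(sign, "manual", hamnosys="(HamNoSys string)")          # manual tier from D5-1
nonmanual = ET.SubElement(sign, "nonmanual")
ET.SubElement(nonmanual, "mouthing", value="weather")                # spoken component
ET.SubElement(nonmanual, "head", value="tilt_forward", start="0.5", end="1.0")  # simple timing

print(ET.tostring(sign, encoding="unicode"))
```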
D5-3: Prototype text-to-sign notation. English to semantics (DRS): CMU parser; DRS construction. Semantics to sign language notation: DRS to HPSG semantics (ALE/MRS); HPSG generation (ALE/LinGo); HPSG PHON (HamNoSys) to SiGML. The pipeline is sketched below.
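A minimal sketch of that pipeline as a chain of stages, purely to show the data flow; every function body is a stub standing in for the real components (CMU parser, DRS construction, ALE/MRS transfer, ALE/LinGo generation, HamNoSys-to-SiGML conversion), and the return values are placeholders.

```python
# Skeleton of the English -> sign-notation pipeline as function composition.
# All stages are stubs; types are plain strings/dicts purely for illustration.

def parse_english(sentence: str) -> dict:
    """English text -> syntactic analysis (stand-in for the CMU parser)."""
    return {"tokens": sentence.split()}

def build_drs(parse: dict) -> dict:
    """Syntactic analysis -> Discourse Representation Structure."""
    return {"drs": parse["tokens"]}

def drs_to_hpsg_semantics(drs: dict) -> dict:
    """DRS -> HPSG semantics (stand-in for the ALE/MRS transfer step)."""
    return {"mrs": drs["drs"]}

def generate_sign(hpsg_sem: dict) -> dict:
    """HPSG generation: semantics -> sign whose PHON value is a HamNoSys string."""
    return {"gloss": "WEATHER", "hamnosys": "(HamNoSys string)"}

def to_sigml(sign: dict) -> str:
    """HamNoSys PHON -> SiGML (placeholder markup, not real SiGML)."""
    return f"<sign gloss='{sign['gloss']}'>{sign['hamnosys']}</sign>"

def english_to_sigml(sentence: str) -> str:
    return to_sigml(generate_sign(drs_to_hpsg_semantics(build_drs(parse_english(sentence)))))

print(english_to_sigml("The weather will be sunny tomorrow"))
```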
HPSG modelling of sign languages. Aiming at proper sign language, not anything like SEE. No detailed grammars published, no usable dictionaries. Most importantly, the work is data-driven: the lexicon and every aspect of our grammar fragment.
Example: Verifying details
Demo: D5-3 plus D4-2 Due month 26 (Feb 02), i.e. work in progress Complete route from English to sign language animation
Synthetic Animation of SiGML. Convert avatar-independent SiGML to an avatar-specific description: define all SiGML locations (shoulder, eyes, fingertip, etc.) in terms of the avatar's geometry; define hand shapes in terms of rotations of the hand joints; determine arm joint rotations from hand positions by inverse kinematics (sketch below); convert SiGML movements into numerically defined trajectories; output in BAF format or VRML.
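As one illustration of deriving arm joint rotations from a hand position, here is a planar two-link inverse kinematics solver (shoulder and elbow only, elbow-down solution). The real avatar skeleton is three-dimensional and multi-jointed, so this is a toy sketch of the principle, not the project's solver.

```python
import math

def two_link_ik(x: float, y: float, upper: float, fore: float) -> tuple:
    """Planar 2-link inverse kinematics: return (shoulder, elbow) angles in radians
    that place the wrist at (x, y), given upper-arm and forearm lengths."""
    d2 = x * x + y * y
    cos_elbow = (d2 - upper**2 - fore**2) / (2.0 * upper * fore)
    cos_elbow = max(-1.0, min(1.0, cos_elbow))       # clamp for unreachable targets
    elbow = math.acos(cos_elbow)                     # elbow-down solution
    shoulder = math.atan2(y, x) - math.atan2(fore * math.sin(elbow),
                                             upper + fore * math.cos(elbow))
    return shoulder, elbow

# Wrist target 0.4 m forward and 0.1 m up, with a 0.3 m upper arm and 0.25 m forearm.
print(two_link_ik(0.4, 0.1, 0.3, 0.25))
```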
Biocontrol model. Model each joint by a second-order control system: a muscle applies a torque to the joint, resisted by a moment of inertia and damping. Generate different types of motion (fast, slow, etc.) by varying the model parameters. A numerical sketch follows.
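A minimal numerical sketch of such a second-order joint, assuming the muscle torque is a spring-like pull towards a target angle; the spring form and all parameter values are assumptions for illustration, not the project's biocontrol parameters.

```python
def simulate_joint(target, theta0=0.0, inertia=1.0, damping=4.0, stiffness=20.0,
                   dt=0.01, steps=200):
    """Integrate I*theta'' + b*theta' = tau with tau = k*(target - theta):
    a muscle-like torque drives the joint towards `target`, resisted by
    inertia and damping. Returns the angle trajectory (Euler integration)."""
    theta, omega, path = theta0, 0.0, []
    for _ in range(steps):
        torque = stiffness * (target - theta)          # "muscle" torque
        alpha = (torque - damping * omega) / inertia   # angular acceleration
        omega += alpha * dt
        theta += omega * dt
        path.append(theta)
    return path

# Varying the parameters changes the character of the motion:
# low stiffness / high damping gives a slow, smooth movement,
# high stiffness gives a faster, snappier one.
slow = simulate_joint(target=1.0, stiffness=10.0, damping=6.0)
fast = simulate_joint(target=1.0, stiffness=60.0, damping=8.0)
print(round(slow[-1], 3), round(fast[-1], 3))
```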
Ambient motion. If only the hands, arms and face are animated, the result is stiff and lifeless. Animate the spine and head by mixing “ambient motion” from motion capture files with synthetic animation. A blending sketch follows.
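One simple way to realise such mixing is a per-joint weighted blend between the motion-captured “ambient” pose and the synthetic pose, with weights near 1 for the spine and head and near 0 for the signing arms and hands. The joint names and weights below are illustrative assumptions, and a real system would blend rotations with quaternion interpolation rather than this per-angle mix.

```python
# Per-joint blend of ambient (motion-captured) and synthetic joint angles.
AMBIENT_WEIGHT = {"spine": 0.9, "head": 0.7, "shoulder": 0.2, "elbow": 0.0, "wrist": 0.0}

def blend_pose(ambient: dict, synthetic: dict) -> dict:
    """Return a pose where each joint angle is w*ambient + (1 - w)*synthetic."""
    blended = {}
    for joint, synth_angle in synthetic.items():
        w = AMBIENT_WEIGHT.get(joint, 0.0)
        blended[joint] = w * ambient.get(joint, synth_angle) + (1.0 - w) * synth_angle
    return blended

ambient_frame = {"spine": 0.05, "head": -0.02, "shoulder": 0.10}   # from a mocap file
synthetic_frame = {"spine": 0.0, "head": 0.0, "shoulder": 0.8, "elbow": 1.2, "wrist": 0.3}
print(blend_pose(ambient_frame, synthetic_frame))
```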
Usefulness of synthetic animation. An alternative route to creating animations. Every important physical feature of a sign is notated in HamNoSys and guaranteed to be reproduced in the animation: precise contacts between hands; the relationship between hands and body. Any avatar can be targeted at low additional cost.
Closing the feedback loop. So far, only the native signers involved in the project can judge the output of our HPSG generation system, which requires intimate knowledge of HamNoSys at least. With the animation output, we gain access to the intuitions of many more native signers than today. This opens the way to more formal evaluation of the generation system than has been available to date.
Summary: Language Technology. First successful steps in HPSG language modelling and translation of English to sign language. Established and extended sign language notation encoded with a standard description model (XML). Already close to closing the feedback loop to allow native signers to evaluate our language production system.
Presentation by Streams: Animation and Linguistics; User Applications: evaluation of broadcast transmission for DTV; Exploitation and Dissemination.
User Applications Objectives WP1 Television Closed signing for Broadcast DTT Enhanced signing experience Regulation and Standards WP2 Internet Information and Education for Deaf People WP3 Face to Face High Street Post Office Counter Services Science Museum Trial - Summer 2001
Presentation by Streams - Television WP1 Television Closed signing for Broadcast DTT Enhanced signing experience Regulation and Standards WP2 Internet Information and Education for Deaf People WP3 Face to Face High Street Post Office Counter Services Science Museum Trial - Summer 2001
VH on TV: The Advantages. Low transmission rate (< 25 kbit/s). Compatibility with signing on other media and foreign deaf languages. Precise, sharp representation of the signer. Open display options. Compliance with international standards: MPEG, DVB. Future-proof: cost saving allows a vast number of signed programmes; no transition from video-based to VH signing.
Broadcast VH Signing: Achievements. Integrated TX system for broadcast to STBs; demonstrator complete end of 2000. Implementing virtual human s/w in the STB. Incorporating a compression layer. Using the MPEG-2 delivery layer for maximum compliance: with existing hardware; with MPEG & DVB standards; with proprietary formats.
Broadcast VH Signing: Functional architecture [block diagram: encoder system (MPEG-2 AV, MPEG-4 SNHC and BAF encoders feeding a packetiser and MUX); MPEG-2 TS delivery; decoder system (deMUX and dePacketiser feeding MPEG-2 AV, MPEG-4 SNHC and BAF decoders, SNHC/BAF players and a compositor); normative vs proprietary paths indicated].
Broadcast VH Signing: System layer implementation [block diagram: encoder side with Thomson MPEG encoder and UDP/TCP packetiser feeding an RF modulator; MPEG-2 TS delivery; decoder side with DVB receiver card and IP filter feeding the compositor].
Broadcast VH Signing: Versatile delivery architecture [diagram: coding layer with MPEG-2 (audio, video), MPEG-4 (audio, video, SNHC, scene description, FlexMUX), proprietary (BAF, SiGML, text) and MPEG-7 content description streams; DVB-compliant delivery over the MPEG-2 Transport Stream (TS) via Packetized Elementary Streams (PES) and sections].
Broadcast VH Signing: Perspectives. Advanced TX system for broadcast to STBs. Open, MPEG & DVB compliant architecture. Improved synchronisation layer. Integrating a compositing layer. Implementing a complete MPEG-4 multimedia player. Integrating the SiGML stream.
Broadcast VH Signing: Targeted architecture [block diagram: MPEG-2/4 AV, MPEG-4 SNHC and BAF encoders feeding a packetiser and MUX; MPEG-2 TS delivery; deMUX and dePacketiser feeding MPEG-2/4 AV, MPEG-4 SNHC and BAF decoders, a multimedia player and MPEG compositor; normative vs proprietary paths indicated].
Presentation by Streams - WWW WP1 Television Closed signing for Broadcast DTT Enhanced signing experience Regulation and Standards WP2 Internet Information and Education for Deaf People WP3 Face to Face High Street Post Office Counter Services Science Museum Trial - Summer 2001
Weather Forecast Application. First WWW application: daily weather forecast in 3 sign languages. Content creation; example forecast; evaluation.
Creation of content. Source: forecast in free text. Tool for semi-automatic conversion: manual standardisation of the text, then automatic generation of the sign languages (see the sketch below). Result: 3 webpages: English/BSL, Dutch/SLN and German/DGS.
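As a rough illustration of how a standardised forecast could drive generation in the three sign languages, here is a minimal lookup-based sketch. The phrase inventory, sign glosses and the idea of mapping standardised phrases to stored sign sequences per language are assumptions for illustration, not the project's actual conversion tool.

```python
# Toy semi-automatic conversion: a person first standardises the free-text forecast
# into known phrases; each phrase then maps to a stored sign sequence per language.
# Phrase inventory and glosses are invented for illustration.

SIGN_SEQUENCES = {
    "sunny intervals": {"BSL": ["SUN", "SOMETIMES"], "SLN": ["ZON", "SOMS"], "DGS": ["SONNE", "MANCHMAL"]},
    "rain later":      {"BSL": ["RAIN", "LATER"],    "SLN": ["REGEN", "LATER"], "DGS": ["REGEN", "SPAETER"]},
}

def generate(standardised_phrases, language):
    """Concatenate the stored sign sequences for each standardised phrase."""
    signs = []
    for phrase in standardised_phrases:
        signs.extend(SIGN_SEQUENCES[phrase][language])
    return signs

forecast = ["sunny intervals", "rain later"]   # output of the manual standardisation step
for lang in ("BSL", "SLN", "DGS"):
    print(lang, generate(forecast, lang))
```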
Demo
Evaluation with Deaf users. Subjective quality of signing rated as ‘reasonable’ or ‘good’. 68% correct or partially correct. Improvement possibilities: mouthing; facial expressions.
Mouthing: scores for signs depending to varying degrees on mouthing.
Facial Expressions: scores for signs depending to varying degrees on facial expressions.
Next Steps. Improvements. Beta-testing online: larger user group; user feedback. Exploitation planning.
Presentation by Streams – Face to Face WP1 Television Closed signing for Broadcast DTT Enhanced signing experience Regulation and Standards WP2 Internet Information and Education for Deaf People WP3 Face to Face High Street Post Office Counter Services Science Museum Trial - Summer 2001
WP3: Face-to-face transactions. Research concentrated on TESSA (Text and Sign Support Agent), which enables Post Office counter clerks to “translate” from (English) speech to sign language. System developments: Autumn 2000: new system software completed, incorporating IBM “Via Voice” speech recognition and an improved avatar. Spring 2001: 200 new signs recorded, processed and added to the system. Spring/Summer 2001: development and testing of the “unconstrained system”.
First System using Constrained Speech Recognition
“Unconstrained” Speech System
Demo
Testing the Speech Recognition Accuracy of the Unconstrained System. Single speaker; 200 “constrained” phrases. Three recording conditions: studio microphone in acoustic booth; boom microphone I in lab; boom microphone II in Science Museum Post Office. Three conditions for the recogniser: untrained; acoustic models fully trained on boom microphone II in lab; acoustic and language models fully trained.
Speech recognition accuracy of unconstrained system
Language Processing I [diagram: co-occurrence matrix of words (a, about, access, account, …, you, you’ve, your) versus phrases 1, 2, 3, …, n].
Language Processing II. Each entry W(i,j) of the co-occurrence matrix is transformed, compressing the value of the entry, into the normalised average uncertainty about phrase p_j given word w_i. Given the M words output by the recogniser, a score for each phrase is computed from these transformed entries; scores above a threshold T are displayed to the PO clerk in a list. One possible form is sketched below.
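The exact transform and scoring formulas are not reproduced here, so the following is a plausible stand-in rather than the project's formula: compress each co-occurrence count, normalise each word's row over phrases, average the resulting weights over the M recognised words, and shortlist phrases whose score exceeds the threshold T.

```python
import math

# Illustrative stand-in for the phrase-retrieval scoring described above;
# the log compression and row normalisation are assumptions, not the real formulas.

def transform(W):
    """Compress each co-occurrence count and normalise each word's row over phrases."""
    T = []
    for row in W:
        compressed = [math.log(1.0 + count) for count in row]   # compress entry values
        total = sum(compressed) or 1.0
        T.append([c / total for c in compressed])                # weights per phrase
    return T

def phrase_scores(T, recognised_word_ids):
    """Average the transformed weights over the M recognised words, per phrase."""
    M = len(recognised_word_ids) or 1
    n_phrases = len(T[0])
    return [sum(T[i][j] for i in recognised_word_ids) / M for j in range(n_phrases)]

# Tiny example: 3 words x 2 phrases; recogniser output = words 0 and 2; threshold 0.3.
W = [[5, 0], [1, 1], [0, 4]]
scores = phrase_scores(transform(W), recognised_word_ids=[0, 2])
shortlist = [j for j, s in enumerate(scores) if s > 0.3]
print(scores, shortlist)
```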
Testing the Phrase Retrieval Accuracy of the Unconstrained System. 10 speakers. For each speaker and each of the 200 phrases: record one utterance of the “constrained” phrase; ask the speaker to write down another way of expressing the phrase; record the speaker saying this phrase. Training the recogniser was not possible for 10 different speakers, hence phrase retrieval accuracy is measured on the text of the unconstrained phrases only.
Phrase Recognition Results on Text of Alternative Utterances: average accuracy = 73.3%.
Future Work. Unconstrained system: investigate use of partial string matching of word sequences and phoneme sequences; investigate use of Latent Semantic Analysis; add spoken language(s) translation. Sign recognition: collect data; configure baseline system.
Exploitation and Dissemination Highlights. BBC collaboration on a closed signing solution for broadcast DTV. TESSA: BCS IT Award & Gold Medal. WWW weather forecasting in 3 European sign languages. Close involvement of deaf people.
Dissemination Highlights November 2000: TESSA wins British Computer Society Gold Medal for IT February 2001: TESSA exhibited at Royal Society March 2001: TESSA appears on “Computer Club” (German TV) July–September 2001: TESSA on exhibition at Science Museum, London October 8th 2001: TESSA appears on “Blue Peter” (BBC TV) November 2001: TESSA on show at COMDEX, Las Vegas
Exploitation Highlights / Short Term. Bandwidth-efficient closed signing: excessive in-vision signing is disliked by hearing people; impacts on DTT multiplexes where bit-rate is already at a premium. BBC investigation of closed signing for DTV: demonstration of avatar-based signing; body-suit capture technologies.
Short Term: WWW strategy. Give away the basic web browser; sell the SiGML authoring tool presented; establish a de facto standard.
Exploitation Highlights: Medium to Long Term. Conversion of subtitles: a high percentage of programmes is subtitled; supports a wide range of deaf signing languages; subtitles translated in the set-top box; overcomes spectrum capacity and scheduling restrictions. Requirements: a reliable unconstrained translator; a next-generation DVB-compliant STB with built-in signing decoder.