ETSI STQ-Aurora Distributed Speech Recognition (DSR)
Bernhard Noé

Slide 2: Aurora Activities

- Standardisation of the DSR Front-End, including compression
  - DSR Front-End Standard (WI007) published in Feb 2000
  - Advanced Front-End (WI008) selected in Feb 2002; approval of the standard planned for mid 2002
- DSR Front-End Extension for Tonal-Language Recognition and Speech Reconstruction (WI 030)
- Definition of Applications and Protocols
  - Architecture definition, client/server protocol
  - Liaison with other standardisation bodies; contribution to other standardisation groups

Slide 3: Aurora Participants

- Participants
  - Alcatel, Comverse, Ericsson, France Telecom, Hewlett Packard, Hutchinson, IBM, Microsoft, Mitsubishi, Motorola, Nokia, Nuance, Qualcomm, Siemens, SpeechWorks, Texas Instruments, Verbaltek, VoiceSignals, and others
- Chairman of Aurora: David Pearce, Motorola

Slide 4: Aurora WI008 Front-End: System Overview, Requirements

[Diagram: Front-End / Terminal (WI008 Front-End: Noise Reduction, Feature Extraction) -> Transmission channel (3G, IP, ITU, etc.) -> Back-End / Server (Speaker Independent (SI) Phoneme Reference, Word Model, Grammar, Transaction, Application)]

- Language independent, low delay, medium complexity, data rate < 4.8 kbit/s, support for 8 kHz, 11 kHz and 16 kHz sample rates
- Noise robust; match WI007 performance for clean speech
- High performance (25% / 50% reduction of WER relative to WI007)
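The "25% / 50% reduction of WER" requirement is a relative measure against the WI007 front-end. A minimal sketch of that calculation follows; the function name and the example figures are hypothetical and are not taken from the Aurora evaluation.

```python
def relative_wer_reduction(wer_baseline: float, wer_candidate: float) -> float:
    """Relative word error rate (WER) reduction versus a baseline, in percent."""
    if wer_baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (wer_baseline - wer_candidate) / wer_baseline


# Hypothetical WERs in percent, for illustration only.
baseline_wer = 20.0   # stands in for a WI007 result
candidate_wer = 10.0  # stands in for a WI008 candidate result
print(relative_wer_reduction(baseline_wer, candidate_wer))
```

With these made-up numbers, halving the WER corresponds to the 50% target.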

Slide 5: Aurora WI008 Front-End Competition

- First submission with performance results on small vocabulary databases in Jan 2001
  - 6 candidates from Nokia, Ericsson, Qualcomm/OGI/ICSI, Motorola and Alcatel/France-Télécom
- Final submission with performance results on small and large vocabulary databases in Jan 2002
  - 2 candidates from Qualcomm/OGI/ICSI and Motorola/France-Télécom/Alcatel

Slide 6: Aurora WI008 Front-End Selection

- Small vocabulary databases (10 digits)
  - Real-world SDC databases and synthetic TI-Digits database with artificially added noise
  - Word-based recognizer, pre-tuned but then fixed
- Large vocabulary database (5000 words)
  - Wall Street Journal database with artificially added noise
  - Phoneme-based recognizer with language model
- 93 test sets in total, covering different languages, noise levels, microphones, noise types and degrees of mismatch between training and test
- Selection criterion: absolute recognition performance

Slide 7: Front-End Standard

- Overall best performance: Absolute Accuracy % (weighted sum of all test sets with files ranging from dB SNR + clean data)
- Best performance in most of the test sets
- Operational features:
  - Complexity / RAM / ROM: ~ wMops / 3.8 / 3.7 kWords
  - Terminal latency: 63 msec
  - Data rate: 4.8 kbit/s
  - 39 features
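To get a feel for the 4.8 kbit/s figure, it can be turned into an average bit budget per feature frame. The sketch below assumes one feature vector every 10 ms. This slide does not state the frame shift; 10 ms appears in this deck only as the update-rate goal for WI 030, so the value is an assumption here. The function name is hypothetical.

```python
DATA_RATE_BPS = 4800   # from the slide: 4.8 kbit/s
FRAME_SHIFT_MS = 10    # assumption: one feature vector every 10 ms


def bits_per_frame(data_rate_bps: int, frame_shift_ms: int) -> int:
    """Average number of channel bits available per feature frame."""
    return data_rate_bps * frame_shift_ms // 1000


budget = bits_per_frame(DATA_RATE_BPS, FRAME_SHIFT_MS)
print(budget)      # average bits per frame
print(2 * budget)  # average bits per frame pair
```

This average covers everything sent on the channel, so the quantised features share it with framing, header and error-protection bits.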

Slide 8: Front-End Standard: Signal Processing in the Terminal

[Diagram, Terminal Front-End: input signal -> Feature Extraction -> Feature Compression -> Framing, Bit-Stream, Error Protection -> to channel]

[Diagram, Feature Extraction: input signal -> Noise Reduction -> Waveform Processing -> Cepstrum Calculation -> Blind Equalization -> to feature compression; with 11 and 16 kHz Extension]

Slide 9: Front-End Standard: Signal Processing in the Server

Decoding, Error Mitigation and Decompression

[Diagram: from channel -> Bit-Stream Decoding, Error Mitigation -> Feature Decompression -> Speech Engine with Feature Interface]

Slide 10: Front-End Standard: Overall Performance

Slide 11: Front-End Standard: Compression and Encoding / Decoding

- Compression: split VQ of pairwise grouped cepstral features with 6 / 8 bit resolution per pair
- Framing, bit-stream and error protection
  - CRC code generated for each frame pair
- Multiframe format, synchronisation sequence, header field and error protection are as in ETSI ES (WI007)
- Frame packet stream includes VAD bit (WI008 only)
- Error mitigation scheme based on CRC and first derivative of the feature set
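Split vector quantisation (split VQ) encodes each feature pair independently by sending the index of the nearest entry in that pair's codebook. A minimal sketch follows. The 14-feature frame, the allocation of six 6-bit pairs plus one 8-bit pair, and the random codebooks are all assumptions for illustration; the slide gives only "6 / 8 bit resolution per pair", and a real front-end would use trained codebooks. All function names are hypothetical.

```python
import random


def make_codebook(bits, dim=2, seed=0):
    """Random placeholder codebook with 2**bits entries (real ones are trained)."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(-1.0, 1.0) for _ in range(dim))
            for _ in range(2 ** bits)]


def nearest_index(vec, codebook):
    """Index of the codebook entry with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])))


def split_vq_encode(features, codebooks):
    """Quantise consecutive feature pairs, one codebook per pair."""
    if len(features) != 2 * len(codebooks):
        raise ValueError("need exactly two features per codebook")
    pairs = [tuple(features[i:i + 2]) for i in range(0, len(features), 2)]
    return [nearest_index(p, cb) for p, cb in zip(pairs, codebooks)]


def split_vq_decode(indices, codebooks):
    """Rebuild the (quantised) feature vector from the transmitted indices."""
    decoded = []
    for idx, cb in zip(indices, codebooks):
        decoded.extend(cb[idx])
    return decoded


# Assumed bit allocation: six pairs at 6 bits and one pair at 8 bits.
bit_alloc = [6, 6, 6, 6, 6, 6, 8]
codebooks = [make_codebook(b, seed=i) for i, b in enumerate(bit_alloc)]

frame = [0.05 * k - 0.3 for k in range(14)]   # hypothetical feature vector
indices = split_vq_encode(frame, codebooks)
print(indices)
print(sum(bit_alloc), "bits per frame before CRC and framing")
```

The server side only needs the same codebooks and `split_vq_decode` to recover the quantised features.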

Slide 12: Aurora WI 030: Overview, Goals

- New work item (WI 030) "DSR front-end extension for tonal language recognition and speech reconstruction" since Jun 2001
  - Improved recognition in tonal languages
  - Server-based speech reconstruction for verification purposes

Slide 13: Aurora WI 030: Goals, Activities

- Goals
  - Update rate 10 msec, minimum set of additional features
  - Data rate < 1000 bit/s
- Definition of requirements and test set for "Intelligibility"
- Definition of requirements for "Tonal-Language Recognition evaluation"
- Currently IBM and Motorola are the main contributors

Slide 14: Aurora Applications and Protocols: Goals, Activities

- Goals
  - Exploit and reuse existing protocols as far as possible
  - Start with the DSR model first but keep it open for further extensions (multimodal I/O)
- Activities
  - Bring DSR into 3GPP
  - Approve extensions necessary for DSR within 3GPP, IETF, ...
  - Define transport and session protocol requirements
  - Define meta information needed
  - Define extensions for multimodal operation

Slide 15: Aurora Applications and Protocols: Transport and Session Control

- Meta information
  - VAD, DTMF, barge-in and speech segments in DTX mode
  - Codec negotiation
- Transport protocol (work in progress)
  - Use RTP; definition of an RTP payload for DSR
- Session protocol (work in progress)
  - Agreement to use SIP / SDP as adopted by 3GPP
  - Extensions for codec negotiation
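Since the DSR payload format is described here as work in progress, the sketch below shows only the generic part: wrapping an opaque payload in the 12-byte fixed RTP header (version 2, no padding, no extension, no CSRC entries). The dynamic payload type 96, the placeholder payload bytes and the function name are assumptions, not a defined DSR format.

```python
import struct


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 96, marker: bool = False) -> bytes:
    """Prefix a payload with a fixed 12-byte RTP header (no CSRC list)."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    byte0 = 2 << 6                               # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | payload_type    # marker bit + payload type
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    return header + payload


# Placeholder payload standing in for compressed DSR feature frames.
dsr_payload = bytes(12)
packet = build_rtp_packet(dsr_payload, seq=1, timestamp=160, ssrc=0x1234ABCD)
print(len(packet), packet[:12].hex())
```

In a SIP / SDP session, the payload type and its mapping to a DSR codec name would be agreed during codec negotiation; those details are outside this sketch.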

Slide 16: Aurora Applications and Protocols: Liaison to other Standardization Bodies

- 3GPP
  - DSR was launched into 3GPP in July 2001 (goal: bring DSR into Release 5), now probably Release 6
  - DSR has achieved state 1 (some questions to be solved)
    - Comparison between AMR-based SR and DSR-based SR
    - Other open issues: service examples, billing, ...
    - New subgroup in 3GPP: Speech Enabled Services
  - Approve extensions necessary for DSR within 3GPP, IETF, ...
- ITU-T SG16
  - Agreement to avoid duplication of work
