Published by Nigel Peters. Modified over 9 years ago.
2
Coding Technologies for Speech and Audio Signals
Takehiro Moriya (守谷 健弘), NTT Communication Science Labs.
ISPACS 2005, 2005.12.16
NTT Labs. 2005
3
Self introduction
1980 Joined NTT, basic research
–Transform domain interleave VQ
–Conjugate VQ
1989 Guest researcher at AT&T Bell Labs
1990 Standardization for Japanese PDC (PSI-CELP)
1993 Standardization for ITU-T (CS-ACELP)
1995 Standardization for MPEG-4 (TwinVQ)
2001 Standardization for MPEG lossless audio
4
Technologies of speech and audio coding
[Figure: bit rate [kbit/s] (2 to 1024) versus year (1975 to 2005). Application labels: vocoder, telephone, mobile phone, music, streaming, archive, ubiquitous VoIP/mobile. Plotted technologies: PARCOR, LSP, APC-AB, VSELP, PSI-CELP, G.711, G.722, G.726, G.728, G.729, wideband, CD, DAT, MPEG-1, MPEG-2, MP3, AAC, MPEG-4, MPEG-4 (lossless).]
5
Outline
1. Fundamentals
–1.1 Time domain for speech
–1.2 Frequency domain for audio
2. Standardization
–2.1 ITU-T speech coding
–2.2 MPEG audio coding
3. Hot topics
–3.1 MPEG lossless (ALS, SLS, DST)
–3.2 MPEG SBR and SSC
–3.3 MPEG surround
6
Fundamentals
7
Category of coding
[Diagram labels: coding; compression, presentation, metadata; speech, language; lossless, lossy; time-domain, frequency-domain; text, speech, audio, image, video]
8
Time-domain
linear prediction -> CELP
predictive coefficients
–PARCOR (partial autocorrelation)
–LSP (line spectral pair)
vector quantization of excitation source
–algebraic structure (ACELP)
Big market for cellular phone and VoIP
9
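LPC (Linear Predictive Coding)
[Block diagram: the excitation (innovation, prediction residual) enters a summing node Σ that produces the synthesized output; the output is fed back through a chain of unit delays Z^-1 weighted by the predictive coefficients α1, α2, ..., αp]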
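The LPC diagram can be sketched as a pair of filters: analysis removes the predictable part of each sample to leave the prediction residual, and synthesis feeds the residual (excitation) back through the same predictor. This is a minimal illustration, not code from the talk; the function names and the convention that `alpha[i]` weights the sample `i + 1` steps in the past are assumptions.

```python
def lpc_analysis(signal, alpha):
    """Prediction residual e[n] = x[n] - sum_i alpha[i] * x[n-1-i]."""
    residual = []
    for n, x in enumerate(signal):
        # Predict the current sample from the past input samples.
        pred = sum(a * signal[n - 1 - i]
                   for i, a in enumerate(alpha) if n - 1 - i >= 0)
        residual.append(x - pred)
    return residual


def lpc_synthesis(excitation, alpha):
    """Synthesized output y[n] = e[n] + sum_i alpha[i] * y[n-1-i]."""
    out = []
    for n, e in enumerate(excitation):
        # Feedback loop: predict from the past *output* samples.
        pred = sum(a * out[n - 1 - i]
                   for i, a in enumerate(alpha) if n - 1 - i >= 0)
        out.append(e + pred)
    return out


signal = [1.0, 0.5, -0.25, 0.75, 0.0, -1.0]
alpha = [0.9, -0.2]                      # illustrative coefficients only
residual = lpc_analysis(signal, alpha)
rebuilt = lpc_synthesis(residual, alpha)  # matches signal up to rounding
```

With an unquantized residual the synthesis filter inverts the analysis filter; a coder saves bits by sending a cheaper description of the excitation instead.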
10
Family of LPC parameters
predictive coefficients α1 ... αp
PARCOR coefficients k1 ... kp
LSP parameters ω1 ... ωp
[Figure: LSP parameters ω1, ω2, ..., ωp marked along the frequency axis]
merits of LSP: stability, interpolation, quantization, prediction
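Two members of this parameter family come out of the same computation. The sketch below uses the Levinson-Durbin recursion, a standard method that the slide itself does not name, to turn autocorrelation values into predictive coefficients and PARCOR (reflection) coefficients. The function names are hypothetical, and the sign convention for PARCOR coefficients differs between textbooks.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum_n x[n] * x[n - lag] for lag = 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (alpha, parcor, error) for the predictor sum_i alpha[i]*x[n-1-i]."""
    a = [0.0] * (order + 1)      # a[1..m] holds the order-m predictor
    err = r[0]                   # prediction error energy at order 0
    parcor = []
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err            # PARCOR coefficient of stage m
        parcor.append(k)
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= (1.0 - k * k)     # error shrinks whenever |k| < 1
    return a[1:], parcor, err


# Autocorrelation of a first-order process with correlation 0.5 per lag.
alpha, parcor, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
# alpha is [0.5, 0.0], parcor is [0.5, 0.0], err is 0.75
```

The synthesis filter is stable when every PARCOR coefficient has magnitude below 1, which makes stability easy to check after quantization.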
11
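CELP (Code Excited Linear Prediction)
[Block diagram: adaptive codebook (periodic) and random codebook (noise, pulse) outputs are scaled by gains, summed (+), and passed through LPC synthesis (LSP parameter); the perceptual error against the input is fed back to the codebook search (analysis by synthesis)]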
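The analysis-by-synthesis loop can be illustrated with a toy codebook search: every candidate excitation is passed through the LPC synthesis filter, the best gain is computed in closed form, and the candidate with the smallest squared error wins. This is a simplified sketch under stated assumptions: it has a single fixed codebook, no adaptive codebook, and no perceptual weighting, and the names `synthesize` and `search_codebook` are hypothetical.

```python
def synthesize(excitation, alpha):
    """All-pole LPC synthesis: y[n] = e[n] + sum_i alpha[i] * y[n-1-i]."""
    out = []
    for n, e in enumerate(excitation):
        pred = sum(a * out[n - 1 - i]
                   for i, a in enumerate(alpha) if n - 1 - i >= 0)
        out.append(e + pred)
    return out


def search_codebook(target, codebook, alpha):
    """Return (index, gain, error) of the code vector that best matches target."""
    best = None
    for index, code in enumerate(codebook):
        y = synthesize(code, alpha)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue                       # silent candidate, nothing to scale
        gain = sum(t * v for t, v in zip(target, y)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 0, 0]]  # toy code vectors
alpha = [0.5]                                           # toy predictor
target = [2.0 * v for v in synthesize(codebook[2], alpha)]
index, gain, error = search_codebook(target, codebook, alpha)
```

Only the index and the quantized gain need to be transmitted, since the decoder holds the same codebook and filter.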
12
Synthesis model for vocoder
[Block diagram: pitch interval, gain, random excitation -> Σ -> synthesis filter]
13
Synthesis model for multi-pulse
[Block diagram: pitch interval, gain, amplitude and position of pulse -> Σ -> synthesis filter]
14
Synthesis model for regular multi-pulse
[Block diagram: pitch interval, gain, amplitude of regular pulse -> Σ -> synthesis filter]
15
Synthesis model for CELP
[Block diagram: pitch interval, gain, selection of code vector -> Σ -> synthesis filter]
16
Synthesis model of VSELP
[Block diagram: pitch interval, gain, polarity (+/-) of basis vector -> Σ -> synthesis filter]
17
Synthesis model for CS-CELP
[Block diagram: pitch interval, gain, selection of vector pair (+/-, +/-) -> Σ -> synthesis filter]
18
Synthesis model of ACELP
[Block diagram: pitch interval, gain, selection of unit pulse position (+/-) -> Σ -> synthesis filter]
Simplicity is the seal of truth
19
Frequency-domain
Lapped transform: MDCT
–no frame-boundary noise and no information loss, thanks to the overlap
Filter bank: QMF
–trades off time resolution against frequency resolution
adaptive noise control
psycho-acoustics
20
Transform coding
[Block diagram: input -> transform time to frequency -> quantization -> transform frequency to time -> output; envelope estimation, adaptive bit allocation, side information]
21
Basis functions of DCT
[Figure: DCT basis functions, frequency versus time]
22
Basis functions of MDCT
[Figure: MDCT basis functions; labels: overlap with previous frame, overlap with next frame, symmetry, anti-symmetry]
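The overlap property can be checked numerically. The sketch below applies a windowed MDCT to frames of 2N samples that advance by N samples, then inverts and overlap-adds them; with a sine window the aliasing from neighbouring frames cancels and the input comes back. The sine window, the 2/N scaling of the inverse, and all function names are choices made for this illustration and are not taken from the slides.

```python
import math


def sine_window(N):
    """Window of length 2N with w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(frame):
    """2N input samples -> N coefficients."""
    N = len(frame) // 2
    return [sum(frame[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples (scaled by 2/N)."""
    N = len(coeffs)
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]


def analysis_synthesis(signal, N):
    """MDCT every hop of N samples, invert, and overlap-add."""
    w = sine_window(N)
    tail = (-len(signal)) % N                 # pad up to a multiple of N
    padded = [0.0] * N + list(signal) + [0.0] * (N + tail)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        frame = [w[n] * padded[start + n] for n in range(2 * N)]
        y = imdct(mdct(frame))
        for n in range(2 * N):
            out[start + n] += w[n] * y[n]     # synthesis window + overlap-add
    return out[N:N + len(signal)]


signal = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
rebuilt = analysis_synthesis(signal, N=4)     # equals signal up to rounding
```

Each frame produces only N coefficients for N new samples, so the 50% overlap costs no extra data.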
23
QMF for MPEG-1, 2 Layer-I, II
[Block diagram: 32-band QMF filter bank (analysis) -> down sample -> adaptive bit allocation for 32 equal bands (energy, masking) -> adaptive quantization -> bit stream -> reconstruction -> 32-band QMF filter bank (synthesis)]
24
QMF for MPEG-1, 2 Layer-III
[Block diagram: 32-band QMF filter bank (analysis) -> down sample -> long and short MDCT -> adaptive bit allocation for Bark-scale bands (energy, masking) -> adaptive quantization (Huffman coding), bit reservoir -> bit stream -> reconstruction -> 32-band QMF filter bank (synthesis)]
25
QMF for MPEG extension tools
[Block diagram: 32-band QMF filter bank (analysis) -> SBR (Spectral Band Replication), PS (Parametric Stereo), Surround -> bit stream -> reconstruction -> 32-band QMF filter bank (synthesis)]
26
Masking effect
[Figure: log spectrum versus frequency, showing the original spectrum, the allowable noise level, the audible level and the masked region]
27
Physical and perceptual distortion
[Diagram labels: original; additive noise; result of compression (un-noticeable, masking); additive echo; un-noticeable region; characteristics of perception; application]
28
Distortion by additional noise
[Figure: original and distortion shown over time and as log spectrum versus frequency; the distortion is noticeable]
29
Distortion by data compression
[Figure: original and distortion shown over time and as log spectrum versus frequency; quantization noise is controlled so that the distortion is masked]
30
Distortion by echo
[Figure: original and distortion shown over time (40 ms) and as log spectrum versus frequency; the echo is masked. Applications: watermark, search or recognition]
31
Predictive coding and transform coding
time-domain (prediction): Speech (5 ms)
–large correlation: predictable
–small correlation: unpredictable
–effect: prediction gain = waveform energy / residual energy
–method: closed-loop quantization
frequency-domain (transform, subband): Audio (30 ms)
–large correlation: varied spectrum
–small correlation: flat spectrum
–effect: transform gain = arithmetic mean / geometric mean
–method: adaptive bit allocation, weighted quantization
prediction gain = transform gain
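The two gains on this slide are simple ratios, sketched below. The slide does not say what the arithmetic and geometric means are taken over; this sketch assumes they are means of per-band spectral energies, which is the usual definition of transform coding gain. The function names are hypothetical.

```python
import math


def prediction_gain(waveform, residual):
    """Waveform energy divided by prediction residual energy."""
    return sum(x * x for x in waveform) / sum(e * e for e in residual)


def transform_gain(band_energies):
    """Arithmetic mean over geometric mean of (positive) band energies."""
    n = len(band_energies)
    arithmetic = sum(band_energies) / n
    geometric = math.exp(sum(math.log(p) for p in band_energies) / n)
    return arithmetic / geometric


flat = transform_gain([1.0, 1.0, 1.0, 1.0])    # flat spectrum: gain of 1
varied = transform_gain([4.0, 1.0])            # 2.5 / 2.0 = 1.25
pg = prediction_gain([2.0, 2.0], [1.0, 1.0])   # 8 / 2 = 4
```

A flat spectrum gives a gain of 1, meaning neither prediction nor adaptive bit allocation can help; the more the spectrum varies, the larger the gain.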
32
Standards
33
Example of standard
ITU-T
–cellular phone
–VoIP
–TV-phone
–FAX
ISO/IEC JPEG, MPEG
–digital camera, video
–digital broadcasting
–portable music player, DVD
34
Merits of standard
interoperability
open source
–long term maintenance
–visible patent holders
Integration of the highest technologies
cost reduction by mass production
market creation
35
Circulatory evolution of market
[Cycle diagram labels: basic research, research, R & D, patent, disclosure of technology, patent pool, standard, competition, service, product, service and products, cost reduction, convenient, users, market, royalty]
36
Standardization for speech
ITU-T G.
IMT-2000 (International Mobile Telecommunication)
GSM (European, Asia)
TIA (North America)
US FS-1015 (LPC-10), 1016 (CELP), 1017 (MELP)
Japanese Cellular
–PDC full/half rate
–PHS
–cdmaOne
–PDC enhanced full rate
37
ITU-T standard for speech
Telephone band (8 kHz sample)
–G.711 PCM 64 kbit/s
–G.726 ADPCM 32 kbit/s (16, 24, 40 kbit/s)
–G.727 Embedded ADPCM 32 kbit/s (16, 24, 40 kbit/s)
–G.728 Low-delay CELP 16 kbit/s
–G.723.1 ACELP/MP-MLQ 5.3/6.3 kbit/s
–G.729 CS-ACELP 8 kbit/s
Wide band (16 kHz sample)
–G.722 SB-ADPCM 64, 56, 48 kbit/s
–G.722.1 Transform coding 24, 32 kbit/s
–G.722.2 AMR-WB 6.6 - 24 kbit/s
38
Standard for IMT-2000
3GPP (3rd Generation Partnership Project) (ARIB, TTC, T1, ETSI, TTA)
3GPP2
bi-directional CODEC
AMR (Adaptive Multi-Rate)
AMR-WB (wide band)
video phone (H.263)
Audio/Low rate speech
packet transmission (MPEG-4)
39
Bandwidth and bitrate for audio coding
[Figure: bandwidth [kHz] (0 to 24) versus rate [kbit/s] (24 to 768) for MPEG-4, MPEG-1, MPEG-2 (1/2 sample), MPEG-2 multi-channel, AC-3, AAC, MD, CD and DAT]
40
Basic technology for audio coding
Codec         Transform       Quantization
MPEG-1 L1,2   subband         adaptive bit
MPEG-1 L3     subband+MDCT    adaptive+Huffman
ATRAC         subband+MDCT    adaptive bit
AC-3          MDCT            adaptive+Huffman
AAC           MDCT            adaptive+Huffman
TwinVQ        MDCT            adaptive VQ
41
MPEG-1, 2/audio
MPEG-1
–sampling rate: 32, 44.1, 48 kHz stereo
–algorithm:
  Layer-I: 32 band split
  Layer-II: + improved quantizer
  Layer-III: + MDCT + variable length + bit reservoir ++
MPEG-2
–low sampling rate 16, 22.05, 24 kHz
–multi channel 5.1ch
–backward compatibility
42
MPEG-2/AAC
3 profiles: main, LC (Low Complexity), SSR (Scalable Sampling Rate)
sampling rate: 32, 44.1, 48 kHz, plus x2, x1/2, x1/4
channel: 1-48
bit rate: 8-576 kbit/s/ch
MDCT 1024 or 128
TNS (Temporal Noise Shaping)
MS (Middle-Side) stereo / intensity stereo
non-linear scale quantizer + variable length code (2 and 4 dimension Huffman code)
43
Tools in MPEG-4 audio
Low rate speech: HVXC (Harmonic Vector eXcitation Coder)
Speech (narrow/wide): CELP
Low rate audio: TwinVQ (Transform domain Weighted Interleave VQ)
Audio: MPEG-2 AAC (Advanced Audio Coder)
Error resilient framework
Parametric audio coding: HILN
Fine granular scalable audio coding: BSAC
Low delay audio coding: LD-AAC
Low overhead Audio Transport: LATM
44
MPEG-4 General audio
[Block diagram labels: TwinVQ (interleave VQ for MDCT), AAC (scale factor, Huffman coding), BSAC (scale factor, bit-slice arithmetic); common tools: scalability, stereo coding, TNS, LTP, IMDCT -> output]
45
Audio Demo (low rate)
ITU-T G.711 64 kbit/s
ITU-T G.726 32 kbit/s
ITU-T G.728 16 kbit/s
ITU-T G.729 8 kbit/s
PDC Full 6.7 kbit/s
PDC Half 3.45 kbit/s
MPEG4 HVXC 2 kbit/s
MPEG4 TwinVQ 8 kbit/s
46
Hot Topics
47
Background of lossless coding
Demand for lossless compression of audio
–archiving analog and digital contents
–delivery over broadband network
–high quality audio format up to 24 bit, 192 kHz sampling
–multi-channel medical data, seismic data, sensor array, etc.
MPEG-4 extension
–official tools (open source)
–interoperability (good for over 100 years)
48
Family of MPEG lossless
ALS
–one-step compression in time domain
SLS
–scalable to lossless from MPEG lossy core
–fine grain scalability in frequency domain
–Integer MDCT
DST
–1-bit oversample format
–compatible with Sony-Philips SACD format
49
Property of ALS
Time domain adaptive prediction
–simple to high-performance backward prediction
–BGMC for prediction residual
–Golomb-Rice Code for PARCOR
–Progressive order prediction
–Long-term prediction
–Hierarchical block switching
extension
–Floating-point support
–Multi-channel predictive coding
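Golomb-Rice codes suit small integers clustered near zero, such as quantized prediction data. The sketch below is a generic Rice coder over signed integers and does not reproduce the ALS bitstream format: the zigzag sign mapping, the unary convention (ones terminated by a zero), the string-of-bits representation, and the function names are all assumptions chosen for readability.

```python
def zigzag(v):
    """Map signed to unsigned: 0, -1, 1, -2, 2 ... -> 0, 1, 2, 3, 4 ..."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u):
    """Inverse of zigzag."""
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(values, k):
    """Unary quotient, a '0' terminator, then k remainder bits per value."""
    bits = []
    for v in values:
        u = zigzag(v)
        q, r = u >> k, u & ((1 << k) - 1)
        remainder = format(r, "0{}b".format(k)) if k else ""
        bits.append("1" * q + "0" + remainder)
    return "".join(bits)


def rice_decode(bits, k, count):
    """Decode `count` values from a bit string produced by rice_encode."""
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":      # read the unary quotient
            q += 1
            pos += 1
        pos += 1                     # skip the '0' terminator
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append(unzigzag((q << k) | r))
    return values


residual = [0, -1, 3, -4, 7, 2]
code = rice_encode(residual, k=2)
decoded = rice_decode(code, k=2, count=len(residual))   # equals residual
```

The parameter k should track the typical magnitude of the data: too small and the unary part grows, too large and every value pays for unused remainder bits.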
50
Prediction residual
[Figure: amplitude versus time for the original wave and the prediction residual wave]
51
Predictive coding
[Diagram comparing vocoder (compression ratio 1/30), waveform coding (ratio 1/10) and lossless coding (ratio 1/2). Shared blocks: input, prediction, residual, synthesis. Other labels: parameters, pulse interval, all residual, codebook for residual, magnify 30 times.]
different framework, rich commonality
52
Compression and decoding time
[Figure: compression ratio [%] (45 to 50) versus averaged decoding time [sec] for 30 sec files (48, 96, 192 kHz), drawn at two time scales (0 to 15 and 20 to 140). Compared coders: Monkey's Audio (free software), OptimFrog (free software), MPEG-4 SLS, ALS (reference decoder), ALS (high-compression), ALS (enhanced decoder).]
53
Quality improvements by SBR and PS
[Figure: relative quality versus stereo bit rate [kbit/s] (24 to 144) for MP3, AAC, HE-AAC and HE-AAC V2. Profiles: AAC profile (AAC), HE-AAC profile (AAC, SBR), HE-AAC V2 profile (AAC, SBR, PS). Annotations: Japanese digital broadcasting (2003), Japanese mobile digital broadcasting (2006).]
54
MPEG SBR (HE-AAC)
[Block diagram: full-band input -> down sample -> low-pass input -> AAC stereo encoder -> AAC stereo bit stream -> AAC stereo decoder -> low-pass output; high frequency analysis (Spectral Band Replication) of the full-band input produces the SBR bit stream (envelope, excitation), which drives high frequency synthesis to give the full-band output]
55
MPEG SBR+PS (HE-AAC v2)
[Block diagram: stereo input -> mix down -> monaural input -> AAC monaural encoder -> AAC monaural bit stream -> AAC monaural decoder -> monaural output -> PS (parametric stereo) synthesis -> stereo output; PS analysis of the stereo input produces the PS bit stream carrying channel level differences and inter-channel correlation]
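The two stereo cues named on this slide can be computed from a pair of channels as sketched below. This is a broadband, time-domain simplification for illustration only: the standardized tool works per frequency band on filter-bank signals, and the plain average used as the mix-down, the decibel scale for the level difference, and the function name are assumptions.

```python
import math


def stereo_cues(left, right):
    """Return (mono, channel level difference in dB, inter-channel correlation)."""
    energy_l = sum(l * l for l in left)
    energy_r = sum(r * r for r in right)
    cross = sum(l * r for l, r in zip(left, right))
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]   # simple mix-down
    cld_db = 10.0 * math.log10(energy_l / energy_r)       # level difference
    icc = cross / math.sqrt(energy_l * energy_r)          # normalized correlation
    return mono, cld_db, icc


left = [1.0, 2.0, 3.0]
right = [0.5, 1.0, 1.5]          # same shape at half the amplitude
mono, cld_db, icc = stereo_cues(left, right)
# cld_db is about 6.02 dB and icc is 1.0 for these fully correlated channels
```

Only the mono signal goes through the AAC coder; the cues are a few numbers per frame, which is why the stereo image costs so few bits.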
56
MPEG surround
[Block diagram: 5-ch input -> mix-down -> stereo input -> AAC stereo encoder -> AAC stereo bit stream -> AAC stereo decoder -> stereo output -> surround synthesis -> 5-ch output; surround analysis of the 5-ch input produces the surround bit stream carrying channel level differences, inter-channel correlation and channel prediction coefficients]
57
History of MPEG Audio
[Timeline, 1992 to 2006: MPEG-1, MPEG-2 MC/LSF*, MPEG-2 AAC, MPEG-4 V1 and V2, SBR, SSC, MP3 on 4, surround, lossless (DST, ALS, SLS); other labels: 2001, 2005, forward and backward compatibility]
*MC/LSF: Multi-channel and Low Sampling Frequency
58
Future challenge
Open problems
–all-mighty coder for both speech and audio at less than 16 kbit/s
–Wave field synthesis (multi-channel)
Integrated service
–video
–copyright management