1
Digital Voice Communication on HF
Ford Amateur Radio League March 12, 2015 David Treharne, N8HKU
2
Overview
J2E emissions (part of telephony).
Need protocols that work, two or more of them, and that are not considered encryption or a cypher.
Amateurs have been working on this since 2000, or even before that.
There are a lot of proprietary algorithms that do not work together and are not open for experimentation.
Can Codec 2 work?
Where will it be useful? (Rag chewing, emergency communications, contests?)
3
Why digital? Analog has done quite well for SSB and FM.
The commercial world uses a lot of digital signals for both voice and data.
Digital can improve the signal-to-noise ratio by eliminating the channel noise.
The challenge: how to get a digitally compressed voice signal to fit within our 2,700 Hz SSB bandwidth.
4
Conventional compression
Normal digital sampling
A signal must be sampled at a minimum of twice its highest frequency when it is digitized.
3,000 Hz of voice needs at least 6,000 samples per second, more reasonably 8,000 samples per second.
Compression algorithms:
Use patterns to reduce the size.
Use lossy compression techniques.
If you lose some of the compressed data, then you lose the complete signal.
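The sampling arithmetic above can be checked with a short sketch. The function names are illustrative only and do not come from the presentation.

```python
def minimum_sample_rate(highest_freq_hz):
    """Sampling theorem: sample at least twice the highest frequency present."""
    return 2 * highest_freq_hz


def usable_bandwidth(sample_rate_hz):
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2


voice_min_rate = minimum_sample_rate(3000)   # 6000 samples per second
telephone_band = usable_bandwidth(8000)      # 4000.0 Hz, leaving margin above 3000 Hz

print(voice_min_rate, telephone_band)
```

Sampling at 8,000 samples per second leaves 1,000 Hz of margin above the 3,000 Hz voice band, which is one reason it is the more comfortable choice.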
5
From: David Rowe Presentations on Codec2
The following slides are from two Codec2 presentations by David Rowe, VK5DGR:
2010: codec2_tapr_2010_v0.2
2012: Ica_2012_codec2
6
Codec 2
Make a digital encoding that models speech, not just compresses it. Similar to commercial versions, but made simpler, and designed for voice work only.
open source speech codec
low bit rate (2400 bit/s down to 1400 bit/s)
applications include digital speech for HF and VHF radio
fills gap in open source speech codecs beneath bit/s
work in progress
samples & source: rowetel.com/codec2.html
7
This is Not a DSP talk
Pitch Estimation
Linear Prediction
Line Spectrum Pairs
Voicing Estimation
Vector Quantisation
Source-Filter Model of Speech Production
Inverse Discrete Fourier Transform
Overlap-add Synthesis
8
Codec2 @ 1400 bit/s
The benefits of a compressed voice signal:
Send 45 phone calls in a standard 64 kbit/s phone channel
175 bytes/s
30 second voice mail in 5250 bytes
30 minute podcast in 308 Kbytes (normal podcast: 20,000 Kbytes)
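The figures on this slide follow directly from the 1400 bit/s rate. The sketch below reproduces them; treating "Kbytes" as 1024 bytes is an assumption, chosen because it matches the 308 Kbytes quoted.

```python
BIT_RATE = 1400          # bit/s, from the slide
PHONE_CHANNEL = 64000    # bit/s, standard telephone channel

bytes_per_second = BIT_RATE / 8                    # 175.0
calls_per_channel = PHONE_CHANNEL // BIT_RATE      # 45 whole calls
voicemail_bytes = 30 * bytes_per_second            # 30 second message
podcast_bytes = 30 * 60 * bytes_per_second         # 30 minute podcast
podcast_kbytes = round(podcast_bytes / 1024)       # assumes 1 Kbyte = 1024 bytes

print(bytes_per_second, calls_per_channel, voicemail_bytes, podcast_kbytes)
```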
9
Main Application – Voice over Digital Radio
RF spectrum is extremely limited, and subject to noise and bit errors.
Traditional analog speech systems (FM, SSB) are really efficient in power and bandwidth use.
But there is interest in using digital techniques, if the right codec is available.
Compressed speech requires fewer bit/s over the channel.
This means less bandwidth.
Less transmitter power, saving battery and improving speech quality.
10
Power Efficiency: Tighter Compression = Better Speech
[Figure: transmit power versus bit rate. Labels: 1 Watt, 2 at 0.5 Watt, Noise, 1400 bit/s, 2800 bit/s.]
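One way to read the power and bit rate labels on this slide is through energy per bit. The sketch below assumes that the receiver needs a fixed amount of energy per bit to decode reliably; that assumption and the function name are not stated on the slide.

```python
import math


def energy_per_bit(power_watts, bit_rate):
    """Joules spent on each transmitted bit."""
    return power_watts / bit_rate


full_rate = energy_per_bit(1.0, 2800)   # 1 watt at 2800 bit/s
half_rate = energy_per_bit(0.5, 1400)   # 0.5 watt at 1400 bit/s

# Same energy per bit: halving the bit rate lets the power drop by half.
print(math.isclose(full_rate, half_rate))
```

Under that assumption, a codec that needs half the bits can either run at half the power or keep the power and gain margin against the noise.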
11
Digital Voice Radio System
Transmit: mic -> codec2 enc -> FEC enc -> mod -> HF/VHF radio
Receive: HF/VHF radio -> demod -> FEC dec -> codec2 dec -> D/A -> spkr
12
Voice is not like data
If you get a single error in a data packet, the entire packet is useless.
If you get a few errors in a voice packet, it probably sounds OK (10% errors are OK).
If you discard or lose a voice packet every now and again, it's probably OK (unlike data, where all data must be perfect).
In speech packets some bits are more important than others (protect the most important data only).
These factors can be used to build a better voice system (less power, less spectrum, more robust).
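The idea of protecting only the most important bits can be sketched with a simple repetition code. This is a stand-in chosen for clarity, not the error correction that Codec 2 or FreeDV actually uses, and the frame layout (important bits first) is hypothetical.

```python
def protect(frame_bits, n_important):
    """Send each of the first n_important bits three times; send the rest once."""
    repeated = [bit for b in frame_bits[:n_important] for bit in (b, b, b)]
    return repeated + list(frame_bits[n_important:])


def recover(received_bits, n_important):
    """Majority-vote the repeated bits; pass the unprotected bits through."""
    voted = [
        1 if sum(received_bits[3 * i:3 * i + 3]) >= 2 else 0
        for i in range(n_important)
    ]
    return voted + list(received_bits[3 * n_important:])


frame = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical frame, important bits first
sent = protect(frame, 4)

received = list(sent)
received[1] ^= 1    # channel error on a protected bit
received[14] ^= 1   # channel error on an unprotected bit

decoded = recover(received, 4)
print(decoded)
```

The error on the protected bit is voted away, while the error on the unprotected bit reaches the decoder. For voice that second error is tolerable, which is why the extra bits are spent only where they matter.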
13
Codec 2 Author - David Rowe
Ham Radio operator, VK5DGR, first licensed in at age 13 (first computer in 1982) 20 years experience in speech coding Built some of the first real time speech codecs in the late 1980's on early DSP chips Now work full time on open software/open hardware for developing world communications (Wants efficient communications in parts of Africa not covered by phones or even by cell phone towers. Use a type of Mesh network to transmit efficient voice over long distances with little power or equipment)
14
Proprietary Codecs come in hardware or licensed software form
difficult to distribute
they cannot be modified
understanding how they work is discouraged
modification is actually illegal under the license
D-Star uses the AMBE coding system. We cannot modify it; we can only purchase and use the hardware chip that performs this function.
15
Speech Coding
Take speech samples (e.g. 16 bit samples at 8 kHz sampling rate).
Compress to 1400 to 2400 bit/s.
What can we throw away?
Retain intelligible speech.
Retain natural speech.
Use a model of speech and send the model parameters, which is more efficient than coding the waveform.
(This is not just compression; it looks at how the human voice works. These models are not good for music, noises, etc.)
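To see how much has to be thrown away, the uncompressed bit rate can be compared with the target rates. This sketch uses only the numbers on the slide; the variable names are illustrative.

```python
SAMPLE_RATE = 8000      # samples per second
BITS_PER_SAMPLE = 16

raw_bit_rate = SAMPLE_RATE * BITS_PER_SAMPLE      # 128000 bit/s uncompressed

for target in (2400, 1400):
    ratio = raw_bit_rate / target
    print(f"{target} bit/s keeps 1 bit in every {ratio:.0f}")

ratio_2400 = raw_bit_rate / 2400
ratio_1400 = raw_bit_rate / 1400
```

The codec has to discard all but roughly one bit in 53 at 2400 bit/s, and one in 91 at 1400 bit/s, which is why a speech model is needed instead of general-purpose compression.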
16
Model Parameter
An example of a model parameter is pitch:
for humans it is in the range 50 to 500 Hz (100 Hz for males, 500 Hz for children)
it can be accurately represented with 7 bits
it is updated every 20 ms
so 7/0.02 = 350 bit/s to represent pitch
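A minimal sketch of a 7 bit pitch quantiser follows. It assumes uniform (linear) steps across 50 to 500 Hz, which the slide does not specify; the real codec may space its levels differently. All names are illustrative.

```python
F_MIN_HZ = 50.0
F_MAX_HZ = 500.0
BITS = 7
LEVELS = 2 ** BITS                       # 128 possible codes
STEP_HZ = (F_MAX_HZ - F_MIN_HZ) / (LEVELS - 1)
FRAME_MS = 20


def quantise_pitch(f0_hz):
    """Map a pitch in Hz to a 7 bit index (assumed uniform steps)."""
    clamped = min(max(f0_hz, F_MIN_HZ), F_MAX_HZ)
    return round((clamped - F_MIN_HZ) / STEP_HZ)


def dequantise_pitch(index):
    """Map a 7 bit index back to a pitch in Hz."""
    return F_MIN_HZ + index * STEP_HZ


pitch_bit_rate = BITS * 1000 / FRAME_MS  # 7 bits every 20 ms = 350 bit/s

index = quantise_pitch(230.0)
print(index, round(dequantise_pitch(index), 1), pitch_bit_rate)
```

With these assumptions each step is about 3.5 Hz, so the decoded pitch is never more than about 1.8 Hz away from the original.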
17
Sinusoidal Speech Coding
[Figure: 40 ms of female speech. Axes: amplitude (16 bit samples) versus time (samples). Pitch period: 35 samples, or 4.4 ms at 8 kHz sample rate (230 Hz).]
Notice the repetition in the signal; this gives us the pitch for this 40 ms sample.
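The repetition in the waveform is what a pitch estimator looks for. The sketch below uses plain autocorrelation, a textbook method picked for illustration and not necessarily the estimator Codec 2 uses, on a synthetic signal that repeats every 35 samples. The test signal and function names are assumptions.

```python
import math

SAMPLE_RATE = 8000  # Hz, from the slide


def period_to_pitch(period_samples, sample_rate=SAMPLE_RATE):
    """Return (period in ms, pitch in Hz) for a pitch period given in samples."""
    period_ms = 1000.0 * period_samples / sample_rate
    pitch_hz = sample_rate / period_samples
    return period_ms, pitch_hz


def estimate_period(samples, min_lag=16, max_lag=160):
    """Pick the lag with the largest autocorrelation.

    Lags 16 to 160 correspond to pitches from 500 Hz down to 50 Hz at 8 kHz.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic "voiced" signal (not real speech): three harmonics that repeat
# every 35 samples, 320 samples long (40 ms at 8 kHz).
test_signal = [
    sum(a * math.sin(2 * math.pi * k * n / 35) for k, a in ((1, 1.0), (2, 0.5), (3, 0.25)))
    for n in range(320)
]

period = estimate_period(test_signal)
period_ms, pitch_hz = period_to_pitch(period)
print(period, round(period_ms, 3), round(pitch_hz, 1))
```

A 35 sample period works out to 4.375 ms and about 229 Hz, which the slide rounds to 4.4 ms and 230 Hz.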
18
Sinusoidal Speech Coding
[Figure: spectrum of the previous time-domain segment. Axes: amplitude (dB) versus frequency (Hz). Blue: spectrum of the segment. Red: amplitude estimates of each sine wave, at harmonics of 230 Hz. Pitch: 230 Hz, or 4.3 ms.]
See how the speech spectrum is made up of peaks spaced by about 230 Hz? Well, 230 Hz happens to be the pitch of the speech at this instant in time. Each peak can be thought of as a sine wave. A sinusoidal codec models the speech as a set of sine waves, each with its own frequency, phase, and amplitude. So instead of sending the speech waveform like a regular telephone, a sinusoidal encoder sends the sinusoid parameters over the channel to the decoder, which then reconstructs the speech. The parameters change over time, so we update them at regular intervals, like every 20 ms. It turns out that if you do all of this right, the speech at the decoder sounds pretty close to the original.
19
Sinusoidal Speech Model
[Figure: sine waves 1, 2, ... L, each with its own amplitude, phase, and frequency.]
In practice speech consists of voiced speech (like vowels) and unvoiced speech (like consonants).
For voiced speech the frequencies are harmonics of the pitch frequency.
However, if the number of harmonics L is high, noise can be accurately generated using harmonics with random phases.
So instead of sending the speech waveform, we send the pitch frequency and the harmonic amplitudes and phases.
These parameters are time varying, so we update them every 10-20 ms.
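The harmonic model can be written as a sum of sine waves at multiples of the pitch. The sketch below builds one 20 ms frame that way; the function names, the flat amplitudes, and the fixed random seed are assumptions for illustration, not the codec's own synthesis code.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz


def max_harmonics(f0_hz, sample_rate=SAMPLE_RATE):
    """Number of harmonics of f0 that fit at or below half the sample rate."""
    return int((sample_rate / 2) // f0_hz)


def synthesise(f0_hz, amplitudes, phases, n_samples, sample_rate=SAMPLE_RATE):
    """Sum harmonics k*f0, each with its own amplitude and phase."""
    frame = []
    for n in range(n_samples):
        t = n / sample_rate
        frame.append(sum(
            a * math.cos(2 * math.pi * (k + 1) * f0_hz * t + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases))
        ))
    return frame


L = max_harmonics(80.0)                    # 50 harmonics for an 80 Hz pitch
voiced = synthesise(80.0, [1.0] * L, [0.0] * L, 160)

rng = random.Random(0)                     # fixed seed so the sketch is repeatable
noisy_phases = [rng.uniform(-math.pi, math.pi) for _ in range(L)]
unvoiced_like = synthesise(80.0, [1.0] * L, noisy_phases, 160)

print(L, len(voiced), len(unvoiced_like))
```

With an 80 Hz pitch and an 8 kHz sample rate, harmonics up to 4 kHz number 4000 / 80 = 50. Randomising the phases, as in the second call, is the trick the slide describes for making the same harmonics sound like noise.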
20
1 Sine Wave
21
(Male speaker, 80 Hz nominal pitch)
3 sine waves
22
10 sine waves
23
25 sine waves
24
50 sine wave harmonics, changing only the phase and amplitude of each, produce a very close match to the original signal.
25
Encoder Block Diagram
Input: 16 bit, 8 kHz samples. Output: 2550 bit/s quantised model parameters.
Blocks: Pitch est, Pitch Quant, FFT, MBE Voicing est, LPC Analysis, LPC to LSP, LSP Quant, LPC Correction, Energy Quant.
Pitch: 7 bits of data for the pitch, or fundamental frequency, of the speaker at that point in time.
Voicing: 2 bits (sampled at 10 ms intervals) that mark whether the sound is voiced, like a vowel, or unvoiced, like a consonant. The MBE voicing algorithm looks at the first 1 kHz of the voice spectrum, compares it to an ideal voiced spectrum, and makes a voiced/unvoiced decision.
LSPs: 36 bits (the model parameters of the speech). Line Spectrum Pairs (LSPs) are a convenient way to encode LPCs for transmission over a channel.
LPC correction: 1 bit to help with the low pitch frequencies. LPC tends to make errors near 0 Hz, so a single bit is used to correct low frequency LPC errors. This makes a big difference to speech quality, especially for males.
Energy: 5 bits for the energy of the signal (how loud).
Phase: although necessary for high quality speech, phase information is discarded. We regenerate the phase information at the decoder using a rule based approach.
26
Bit Allocation
From one of the first versions of the Codec: 51 bits per 20 ms frame, or 2550 bit/s.

Parameter                        Bits/frame
Spectral magnitudes (LSPs)       36
Low frequency LPC correction      1
Energy                            5
Voicing (updated each 10 ms)      2
Pitch                             7
Total                            51
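The bit allocation can be checked, and a frame packed, with a few lines of code. The field order and the pack and unpack helpers below are hypothetical; the slides give the field widths but not how the bits are laid out in a frame.

```python
FRAME_MS = 20

# (field name, width in bits); widths from the slide, order assumed.
ALLOCATION = [
    ("lsp", 36),
    ("lpc_correction", 1),
    ("energy", 5),
    ("voicing", 2),
    ("pitch", 7),
]

bits_per_frame = sum(width for _, width in ALLOCATION)   # 51
bit_rate = bits_per_frame * 1000 // FRAME_MS             # 2550 bit/s


def pack_frame(values):
    """Pack the quantised parameters into one 51 bit integer."""
    word = 0
    for name, width in ALLOCATION:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_frame(word):
    """Recover the parameters from a packed frame."""
    values = {}
    for name, width in reversed(ALLOCATION):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


example = {"lsp": 0x123456789, "lpc_correction": 1, "energy": 17, "voicing": 2, "pitch": 51}
packed = pack_frame(example)
print(bits_per_frame, bit_rate, unpack_frame(packed) == example)
```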
27
Decoder Block Diagram
Inputs shown: LSPs, Energy, LPC Correction, Voicing. Output: 16 bit, 8 kHz samples.
Blocks: LSP to LPC, FFT, Recover Harm Amps, Phase Synthesis, Post Filter, Inverse FFT, Overlap Add.
Phase synthesis: phases are synthesised using a rule based approach. More work is needed here to improve speech quality.
Post filter: helps with background noise. This codec can distort background noise and make it sound unpleasant. The post filter tracks the average background noise level. The phase of any harmonics below this level is randomised so they sound more like noise. The post filter still needs some work. It is an alternative to the mixed voicing models used in MBE and MELP codecs, and it uses zero bits to handle background noise.
Inverse FFT: used for synthesis. We create a synthetic speech spectrum, then IFFT to create the time domain signal.
Overlap add: a way to overlap adjacent IFFTs to create a continuous time domain signal.
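Overlap add can be sketched without any DSP library. The example below assumes triangular windows with 50% overlap, a common textbook choice that the slides do not confirm for Codec 2, and uses constant frames so the result is easy to check.

```python
def triangular_window(hop):
    """Window of length 2*hop that ramps up, then down.

    Chosen so that two windows offset by hop always sum to 1.
    """
    return [n / hop if n < hop else 2 - n / hop for n in range(2 * hop)]


def overlap_add(frames, hop):
    """Window each frame and add it into the output, advancing by hop samples."""
    window = triangular_window(hop)
    out = [0.0] * (hop * (len(frames) + 1))
    for i, frame in enumerate(frames):
        for n, sample in enumerate(frame):
            out[i * hop + n] += window[n] * sample
    return out


hop = 80                                       # 10 ms at 8 kHz
frames = [[1.0] * (2 * hop) for _ in range(4)]  # stand-ins for IFFT outputs
signal = overlap_add(frames, hop)

steady = signal[hop:-hop]                      # region covered by two frames
print(len(signal), min(steady), max(steady))
```

Wherever two frames overlap, the fading-out tail of one and the fading-in start of the next add back to the original level, so there is no click or gap at the frame boundaries.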
28
FreeDV: The Digital voice for HF
Speech is compressed down to 1600 bit/s, then modulated onto a 1.25 kHz wide 16QPSK signal, which is sent to the mic input of an SSB radio. On receive, the signal is received by the SSB radio, then demodulated and decoded by FreeDV. Communications should be readable down to 2 dB S/N, and long-distance contacts are reported using 1-2 watts of power. FreeDV was built by an international team of radio amateurs working together on coding, design, user interface and testing. FreeDV is open source software, released under the GNU Lesser General Public License version 2.1. The FDMDV modem and Codec 2 speech codec used in FreeDV are also open source.
29
Running Codec2 May 21, 2014: SmartMic announced! An embedded hardware product that allows you to run FreeDV without a PC. Plug SmartMic into your SSB or FM radio, and you now have Digital Voice (DV). (allows function with only 1 soundcard) $195.00
30
Issues over HF
Fading: The signal is lost, and data is lost. Some fading can be handled by the Forward Error Correction (FEC) bits. Ham radio HF packet communication uses FEC. Codec2 also tries to decode signals even after fading. Most modems are designed to throw out data when signals are missing, since they are designed for data, where everything must be perfect. Codec2 is designed to try to decode even if there are errors. This allows us to still hear the rest of the speech even after a dropout. Since speech has a lot of redundancy, this is not an issue. (We do this all the time on HF.)
Group delay: Signals bounce off the ionosphere at different heights, so copies of the same signal arrive at different times and the delayed copies mix with the signals that follow. 5 ms is a common delay on HF. We need a codec that sends each packet of bits for longer than 5 ms so that it is not affected by the delay. (Ham radio RTTY at 45 baud sends the same signal for well over 5 ms before switching to a new signal, which handles this delay. That is why that speed works so well for HF communications!)
Lining up the signal in the passband: This is already a problem for us on HF SSB. Codec 2 starts a transmission by sending a sequence of known tones. The receiver knows the frequencies of these tones and lines up the decoder to match them.
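The group delay argument comes down to comparing how long each symbol is held with the 5 ms delay. The sketch below does that for RTTY and for two ways of sending 1600 bit/s. It assumes that the "16QPSK" on the FreeDV slide means 16 parallel QPSK carriers sharing the bit rate, which the slides do not spell out, and it ignores any FEC or pilot overhead.

```python
DELAY_MS = 5.0   # common HF multipath delay, from the slide


def symbol_duration_ms(baud):
    """How long one symbol is held on the air."""
    return 1000.0 / baud


def per_carrier_baud(bit_rate, carriers, bits_per_symbol):
    """Symbol rate on each carrier when the bits are shared across carriers."""
    return bit_rate / (carriers * bits_per_symbol)


rtty_ms = symbol_duration_ms(45)                                    # about 22.2 ms
single_ms = symbol_duration_ms(per_carrier_baud(1600, 1, 2))        # one QPSK carrier
parallel_ms = symbol_duration_ms(per_carrier_baud(1600, 16, 2))     # 16 QPSK carriers (assumed)

for name, ms in (("RTTY", rtty_ms), ("1 carrier", single_ms), ("16 carriers", parallel_ms)):
    verdict = "longer" if ms > DELAY_MS else "shorter"
    print(f"{name}: {ms:.2f} ms per symbol, {verdict} than the {DELAY_MS} ms delay")
```

Under these assumptions a single fast carrier would hold each symbol for only 1.25 ms, while sharing the bits across many slow carriers keeps each symbol on the air for 20 ms, close to the RTTY figure.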
31
What does this sound like?
Male vs. female: pitch differences. Codec2 has trouble with low frequencies, so it adds an extra bit.
Background noise and speech codecs: if the noise does not sound like a voice with harmonics, then it does not bother the coding very much.
32
Conclusions
Digital HF will continue in the experimentation phase into the future.
Use more bits to make it sound better, or to add in more error correction?
Find a way to do this with one sound card? (I am not sure why it requires two cards.)
Best used when a communication link has been established; then switch to digital.
When it does work, it eliminates all of the QRM and QRN on SSB.
It could be the ragchew method of the future!
Maybe, just maybe, even for contests!
33
Bibliography freeDV: http://freedv.org/tiki-index.php
Codec 2: