Published by Elfreda French. Modified over 6 years ago.
1
System integration – current status and future priorities
Alastair H. Moore
Technical Project Meeting, Berlin, May 2016
2
[System block diagram. Labels recovered from the slide:]
Audio (OSX): 12 channel digital audio; Naoqi; playrec; m/c audio buffers; Matlab audio DSP; mono audio; Python ASR; transcription; Dialogue manager; Synth speech; Audio DOA & saliency
Ego sphere: Python Interface; DOA; Motor commands (head pose)
Video (Ubuntu): Image frames; Video stream; C++ visual localisation; Face detector; Face positions; Visual DOA & saliency
3
playrec Matlab interface to portaudio
Has dynamic internal buffer structure
‘rec’ or ‘playrec’ to record audio
‘getrec’ to retrieve the audio
Allows online processing in Matlab: buffers are stored until requested, so no buffers are missed
If sound is to be output to the soundcard (as is currently done for auditioning the processed audio), setting the number of buffers gives a trade-off between latency and the risk of buffer underrun (audio glitches)
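The latency side of that trade-off can be estimated with simple arithmetic. The sketch below is illustrative only: the function name, the block size of 512 frames and the 48 kHz sample rate are assumptions, not values from the slides, and it ignores any extra delay added by the driver or soundcard.

```python
def output_latency_ms(num_buffers, frames_per_buffer, sample_rate_hz):
    """Approximate delay added by queueing num_buffers blocks before playback.

    More queued buffers mean more protection against underrun, but every
    queued block adds frames_per_buffer / sample_rate_hz seconds of delay.
    """
    if num_buffers < 1 or frames_per_buffer < 1 or sample_rate_hz <= 0:
        raise ValueError("all arguments must be positive")
    return 1000.0 * num_buffers * frames_per_buffer / sample_rate_hz


if __name__ == "__main__":
    # Assumed example settings: 512-frame blocks at 48 kHz.
    for n in (1, 2, 4, 8):
        print(n, "buffers:", round(output_latency_ms(n, 512, 48000), 1), "ms")
```

Choosing the number of buffers then amounts to picking the smallest value that does not glitch on the target machine.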
4
Audio localisation -> ego sphere
Matlab
  Spherical harmonic domain
  Pseudo-intensity vectors
  DPD-MUSIC
  Single source direction of arrival written to EARS map object
  Map object written to XML file
Python
  Reads XML file
  Converts DOAs to the required co-ordinate system
  Sends to egosphere
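The Python side of this hand-off could look roughly like the sketch below. The XML layout, the attribute names and the choice of Cartesian unit vectors as the target co-ordinate system are all assumptions made for illustration; the slides do not give the actual EARS map schema or the egosphere's conventions, and the final send step is omitted.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical XML layout: the real EARS map schema is not shown in the slides.
EXAMPLE_XML = """<map>
  <source id="0" azimuth_deg="30.0" inclination_deg="90.0"/>
</map>"""


def read_doas(xml_text):
    """Return a list of (azimuth_deg, inclination_deg) tuples from the XML."""
    root = ET.fromstring(xml_text)
    return [
        (float(src.get("azimuth_deg")), float(src.get("inclination_deg")))
        for src in root.iter("source")
    ]


def to_unit_vector(azimuth_deg, inclination_deg):
    """Convert spherical angles to a Cartesian unit vector (x, y, z).

    Assumed convention: azimuth measured in the x-y plane from +x,
    inclination measured down from +z.
    """
    az = math.radians(azimuth_deg)
    inc = math.radians(inclination_deg)
    return (
        math.sin(inc) * math.cos(az),
        math.sin(inc) * math.sin(az),
        math.cos(inc),
    )


if __name__ == "__main__":
    for az, inc in read_doas(EXAMPLE_XML):
        print(to_unit_vector(az, inc))
```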
5
Audio localisation -> ego sphere
Scope for improvement
  Use confidence of localisation estimate as ‘saliency’ parameter in egosphere
  May need to add parameter to MAP object
  Avoid sending any DOAs when SNR is poor/no speech activity
  Incorporate tracking: audio only or audio-visual
  Need interface to get visual DOAs into Matlab
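The first two proposals could be combined in a small gating step before anything is sent to the egosphere. This is a sketch under stated assumptions: the function name, the message fields, the 0 to 1 confidence scale and the threshold of 0.5 are hypothetical, and the speech-activity flag is assumed to come from some upstream detector.

```python
def doa_message(azimuth_deg, elevation_deg, confidence, speech_active,
                min_confidence=0.5):
    """Return a dict to forward to the egosphere, or None to send nothing.

    confidence is assumed to lie in [0, 1]; it is reused as the saliency.
    """
    # Suppress estimates made without speech or with low confidence.
    if not speech_active or confidence < min_confidence:
        return None
    saliency = min(max(confidence, 0.0), 1.0)  # clamp to [0, 1]
    return {
        "azimuth_deg": azimuth_deg,
        "elevation_deg": elevation_deg,
        "saliency": saliency,
    }
```

Returning None rather than a zero-saliency message means the head is never steered towards an estimate made during silence.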
6
Audio enhancement -> ASR
Matlab
  Spherical harmonic domain beamforming
  1st order (relatively wide beams)
  fixed look direction (chosen for robustness of demo)
  limited to 5 kHz
  Coherent-to-diffuse ratio-based post filter
  Uses simulated HRTFs
  Enhanced audio written to TCP/IP pipe in continuous stream of small blocks
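To illustrate how a coherent-to-diffuse ratio (CDR) can drive a post filter, the sketch below uses a generic Wiener-style gain, CDR / (CDR + 1), with a floor to limit artefacts. This is a common textbook form and not necessarily the filter used in the system described here; the function name and the floor value are assumptions, and the gain would be applied per time-frequency bin.

```python
def cdr_postfilter_gain(cdr, gain_floor=0.1):
    """Wiener-style gain for one time-frequency bin.

    cdr is the linear (not dB) coherent-to-diffuse power ratio, used here
    as a stand-in for the signal-to-noise ratio.
    """
    if cdr < 0:
        raise ValueError("cdr must be non-negative")
    gain = cdr / (cdr + 1.0)
    # The floor keeps diffuse-dominated bins from being zeroed completely.
    return max(gain, gain_floor)
```

Bins dominated by the coherent (direct) sound pass almost unchanged, while bins dominated by diffuse sound are attenuated down to the floor.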
7
Audio enhancement -> ASR
Python script
  Reads audio from pipe
  Endpointing using basic energy-based voice activity detector
  Sends audio to Google ASR
  Transcription sent to Naoqi dialogue system
  ‘Holds off’ further ASR while Nao speaks
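Energy-based endpointing of a block stream can be sketched as follows. This is a minimal illustration, not the script described above: the function names, the threshold and the hangover length are assumptions, blocks are taken to be non-empty lists of float samples, and the socket reading and ASR calls are left out.

```python
def block_energy(block):
    """Mean squared amplitude of one non-empty block of samples."""
    return sum(x * x for x in block) / len(block)


def find_utterances(blocks, threshold, hangover_blocks=3):
    """Return (start, end) block-index pairs, with end exclusive.

    An utterance starts at the first block whose energy exceeds threshold
    and ends once hangover_blocks consecutive quiet blocks have been seen.
    """
    utterances = []
    start = None
    quiet = 0
    for i, block in enumerate(blocks):
        if block_energy(block) > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= hangover_blocks:
                # Trim the trailing quiet blocks from the utterance.
                utterances.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Stream ended mid-utterance; close it at the last active block.
        utterances.append((start, len(blocks) - quiet))
    return utterances
```

The hangover stops short pauses between words from splitting one utterance into several ASR requests.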
8
Audio enhancement -> ASR
Scope for improvement
  Steer beam using DOAs
  Can it be done robustly?
  Post filter with higher frequency HRTFs
  Acoustic echo cancellation to avoid ‘hold off’ period
  Add dereverberation?
9
Visual localisation -> egosphere
10
Egosphere behaviour
DOAs arrive from audio and video subsystems
All DOAs are attended to (looked at) with priority according to saliency
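One simple way to realise "priority according to saliency" is a priority queue, sketched below. The class and method names are hypothetical, and the actual egosphere may schedule attention quite differently (for example with decay over time, which is not modelled here).

```python
import heapq
import itertools


class AttentionQueue:
    """Hands out pending DOAs in order of decreasing saliency."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so equal saliencies are served in arrival order.
        self._counter = itertools.count()

    def add(self, saliency, doa):
        # heapq is a min-heap, so store the negated saliency.
        heapq.heappush(self._heap, (-saliency, next(self._counter), doa))

    def next_target(self):
        """Return the most salient pending DOA, or None if none remain."""
        if not self._heap:
            return None
        _, _, doa = heapq.heappop(self._heap)
        return doa
```

Audio and visual subsystems would both call `add`, and the head-pose controller would call `next_target` whenever it is ready to look somewhere new.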
11
[System block diagram. Labels recovered from the slide:]
Audio (Ubuntu / OSX): 12 channel digital audio; Naoqi; playrec; naolab; Audio stream; Matlab audio DSP; Python ASR; Dialogue manager; Synth speech
Ego sphere: Python Interface; Motor commands (head pose)
Video (Ubuntu): Video stream; C++ visual localisation; Face detector; Synchronised mono video + face positions