System integration – current status and future priorities

Alastair H. Moore Technical Project Meeting Berlin, May 2016

[System diagram. Blocks and signals shown: 12 channel digital audio; OSX; Naoqi; playrec; m/c audio buffers; Matlab audio DSP; mono audio; Python ASR; transcription; Dialogue manager; Synth speech; Audio DOA & saliency; Python Interface; DOA; Ego sphere; Motor commands (head pose); Ubuntu; Video stream; Image frames; C++ visual localisation; Face detector; Face positions; Visual DOA & saliency]

playrec Matlab interface to portaudio
Has a dynamic internal buffer structure
‘rec’ or ‘playrec’ to record audio
‘getrec’ to retrieve the audio
Allows online processing in Matlab: buffers are stored until requested, so no buffers are missed
If sound is output to the soundcard (as is currently done for auditioning the processed audio), the number of buffers sets a trade-off between latency and the risk of buffer underrun (audio glitches)
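The latency side of this trade-off can be estimated with simple arithmetic. The sketch below is illustrative only: the buffer size of 512 frames and the 48 kHz sample rate are assumed values, not figures from the slides, and `output_latency_ms` is a hypothetical helper rather than part of playrec.

```python
def output_latency_ms(num_buffers, frames_per_buffer, sample_rate_hz):
    """Delay contributed by audio queued for output, in milliseconds.

    Each queued buffer holds frames_per_buffer samples per channel, so the
    queue as a whole represents num_buffers * frames_per_buffer samples.
    """
    return 1000.0 * num_buffers * frames_per_buffer / sample_rate_hz


if __name__ == "__main__":
    # Assumed values for illustration: 512-frame buffers at 48 kHz.
    for n in (1, 2, 4, 8):
        print(n, "buffers:", round(output_latency_ms(n, 512, 48000), 1), "ms")
```

More queued buffers give the processing more slack before an underrun, at the cost of a proportionally longer delay before the processed audio is heard.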

Audio localisation -> ego sphere
Matlab
Spherical harmonic domain: pseudo-intensity vectors, DPD-MUSIC
Single source direction of arrival written to EARS map object
Map object written to XML file
Python
Reads XML file
Converts DOAs to required co-ordinate system
Sends to egosphere
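The Python side of this hand-off can be sketched as follows. The slides do not show the XML schema or the egosphere's co-ordinate convention, so everything specific here is an assumption: the `<map>` and `<source>` element names, the `azimuth_deg` and `elevation_deg` attributes, and the choice of a Cartesian unit vector as the target representation are all hypothetical.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical file content; the real EARS map schema is not given in the slides.
EXAMPLE_XML = """<map>
  <source azimuth_deg="90" elevation_deg="0"/>
  <source azimuth_deg="0" elevation_deg="30"/>
</map>"""


def doa_to_unit_vector(azimuth_deg, elevation_deg):
    """Convert azimuth/elevation in degrees to an (x, y, z) unit vector.

    Assumed convention: x forward, y left, z up, azimuth measured
    anticlockwise from x, elevation measured upwards from the horizontal.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def read_doas(xml_text):
    """Parse every <source> element and return its DOA as a unit vector."""
    root = ET.fromstring(xml_text)
    return [doa_to_unit_vector(float(s.get("azimuth_deg")),
                               float(s.get("elevation_deg")))
            for s in root.iter("source")]


if __name__ == "__main__":
    for vec in read_doas(EXAMPLE_XML):
        print(vec)
```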

Audio localisation -> ego sphere
Scope for improvement
Use confidence of localisation estimate as ‘saliency’ parameter in egosphere
May need to add a parameter to the map object
Avoid sending any DOAs when SNR is poor or there is no speech activity
Incorporate tracking: audio only or audio-visual
Need an interface to get visual DOAs into Matlab
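One way the first two improvements might fit together is sketched below: an estimate is dropped when there is no speech activity or the SNR is poor, and otherwise its confidence is forwarded as the saliency. The `DoaEstimate` fields, the 5 dB threshold, and the message layout are assumptions made for illustration; none of them come from the existing map object or egosphere interface.

```python
from dataclasses import dataclass


@dataclass
class DoaEstimate:
    # Hypothetical fields; the current map object may not carry all of these.
    azimuth_deg: float
    elevation_deg: float
    confidence: float      # assumed to lie roughly in the range 0 to 1
    snr_db: float
    speech_active: bool


def to_egosphere_message(est, min_snr_db=5.0):
    """Return a message dict for the egosphere, or None to send nothing."""
    # Suppress estimates made without speech or in poor SNR.
    if not est.speech_active or est.snr_db < min_snr_db:
        return None
    # Use the localisation confidence as saliency, clipped to [0, 1].
    saliency = min(max(est.confidence, 0.0), 1.0)
    return {"azimuth_deg": est.azimuth_deg,
            "elevation_deg": est.elevation_deg,
            "saliency": saliency}


if __name__ == "__main__":
    print(to_egosphere_message(DoaEstimate(20.0, 5.0, 0.8, 12.0, True)))
    print(to_egosphere_message(DoaEstimate(20.0, 5.0, 0.8, 2.0, True)))
```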

Audio enhancement -> ASR
Matlab
Spherical harmonic domain beamforming
1st order (relatively wide beams)
Fixed look direction (chosen for robustness of demo)
Limited to 5 kHz
Coherent-to-diffuse ratio-based post filter
Uses simulated HRTFs
Enhanced audio written to TCP/IP pipe in a continuous stream of small blocks
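To give a feel for how wide a first-order beam is, the sketch below evaluates the half-power beamwidth of a first-order maximum-directivity (hypercardioid) pattern. The slides do not say which first-order design is used, so the choice of this pattern is an assumption made purely for illustration.

```python
import math


def first_order_pattern(theta_rad):
    """Normalised first-order maximum-directivity beam pattern.

    theta_rad is the angle away from the look direction. The pattern is
    (1 + 3 cos(theta)) / 4, which equals 1 in the look direction.
    """
    return (1.0 + 3.0 * math.cos(theta_rad)) / 4.0


def half_power_beamwidth_deg():
    """Full width between the two angles where the pattern falls to 1/sqrt(2)."""
    # Solve (1 + 3 cos(theta)) / 4 = 1 / sqrt(2) for theta.
    cos_theta = (4.0 / math.sqrt(2.0) - 1.0) / 3.0
    return 2.0 * math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    print("Half-power beamwidth (degrees):", round(half_power_beamwidth_deg(), 1))
```

Under this assumption the half-power beamwidth comes out at a little over 100 degrees, which is consistent with the description of the beams as relatively wide.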

Python script
Reads audio from pipe
Endpointing using basic energy-based voice activity detector
Sends audio to Google ASR
Transcription sent to Naoqi dialogue system
‘Holds off’ further ASR while Nao speaks
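The endpointing step can be sketched as a simple frame-energy detector with a hangover period. This is a minimal illustration, not the project's script: the -30 dB threshold, the three-frame hangover, and the function names are all assumed.

```python
import math


def frame_energy_db(frame):
    """Mean-square energy of one frame of samples, in dB (floored to avoid log(0))."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)


def find_utterances(frames, threshold_db=-30.0, hangover_frames=3):
    """Return (start, end) frame indices of detected utterances, end exclusive.

    An utterance starts at the first frame above the threshold and ends once
    hangover_frames consecutive frames fall below it.
    """
    utterances = []
    start = None
    silent_run = 0
    for i, frame in enumerate(frames):
        if frame_energy_db(frame) > threshold_db:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= hangover_frames:
                # End the utterance at the first frame of the silent run.
                utterances.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended mid-utterance; trim any trailing silence.
        utterances.append((start, len(frames) - silent_run))
    return utterances


if __name__ == "__main__":
    quiet, loud = [0.0] * 160, [0.5] * 160
    print(find_utterances([quiet] * 2 + [loud] * 3 + [quiet] * 4 + [loud] * 2))
```

Each detected span would then be sent to the recogniser as one request; a ‘hold off’ could be added by skipping frames while the robot is speaking.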

Scope for improvement
Steer beam using DOAs. Can it be done robustly?
Post filter with higher frequency HRTFs
Acoustic echo cancellation to avoid ‘hold off’ period
Add dereverberation?

Visual localisation -> egosphere

Egosphere behaviour DOAs arrive from audio and video subsystems
All DOAs are attended to (looked at) with priority according to saliency
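The behaviour described above can be pictured as a priority queue keyed on saliency. The slides do not describe how the egosphere implements this, so the `AttentionQueue` class below is a hypothetical stand-in that only illustrates the ordering rule.

```python
import heapq
import itertools


class AttentionQueue:
    """Holds incoming DOAs and hands them back most salient first."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so equal saliencies are served in arrival order.
        self._counter = itertools.count()

    def add(self, source, azimuth_deg, elevation_deg, saliency):
        """Queue a DOA from the named subsystem (for example 'audio' or 'video')."""
        # heapq is a min-heap, so store the negated saliency.
        heapq.heappush(self._heap, (-saliency, next(self._counter),
                                    source, azimuth_deg, elevation_deg))

    def next_target(self):
        """Return (source, azimuth, elevation, saliency), or None if empty."""
        if not self._heap:
            return None
        neg_saliency, _, source, az, el = heapq.heappop(self._heap)
        return source, az, el, -neg_saliency


if __name__ == "__main__":
    queue = AttentionQueue()
    queue.add("audio", 30.0, 0.0, 0.4)
    queue.add("video", -10.0, 5.0, 0.9)
    print(queue.next_target())
    print(queue.next_target())
```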

[Second system diagram. Blocks and signals shown: 12 channel digital audio; Ubuntu / OSX; Naoqi; playrec; naolab; Audio stream; Matlab audio DSP; Python ASR; Dialogue manager; Synth speech; Python Interface; Ego sphere; Motor commands (head pose); Ubuntu; Video stream; C++ visual localisation; Face detector; Synchronised mono video + face positions]
