System integration – current status and future priorities


System integration – current status and future priorities
Alastair H. Moore
Technical Project Meeting, Berlin, May 2016

[Figure: block diagram of the integrated system. Blocks: Naoqi, playrec, Matlab audio DSP, Python ASR, dialogue manager, ego sphere with Python interface, C++ visual localisation, face detector. Signal labels: 12 channel digital audio, m/c audio buffers, mono audio, transcription, synth speech, audio DOA & saliency, visual DOA & saliency, DOA, motor commands (head pose), video stream, image frames, face positions. Platform labels: OSX, Ubuntu.]

playrec
- Matlab interface to portaudio
- Has a dynamic internal buffer structure
- ‘rec’ or ‘playrec’ to record audio
- ‘getrec’ to retrieve the audio
- Allows online processing in Matlab: buffers are stored until requested, so no buffers are missed
- If sound is to be output to the soundcard (as is currently done for auditioning the processed audio), the number of buffers sets a trade-off between latency and the risk of buffer underrun (audio glitches)
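
The latency side of the buffer-count trade-off can be estimated with simple arithmetic. The sketch below is illustrative only: the function name, the block length of 512 frames and the 48 kHz sample rate are assumptions, not values from the slides, and it counts only the audio queued ahead of the soundcard.

```python
def output_latency_ms(num_buffers, frames_per_buffer, sample_rate_hz):
    """Delay added by audio queued for output, in milliseconds.

    Each queued buffer holds frames_per_buffer / sample_rate_hz seconds
    of audio, so the queue as a whole delays the output by that amount
    times the number of buffers.
    """
    return 1000.0 * num_buffers * frames_per_buffer / sample_rate_hz


def latency_table(buffer_counts, frames_per_buffer=512, sample_rate_hz=48000.0):
    """Map each candidate buffer count to its queueing latency in ms."""
    return {
        n: output_latency_ms(n, frames_per_buffer, sample_rate_hz)
        for n in buffer_counts
    }
```

Fewer queued buffers give lower latency but leave less slack if a processing block finishes late, which is when an underrun becomes audible.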

Audio localisation -> ego sphere
Matlab
- Spherical harmonic domain
- Pseudo-intensity vectors
- DPD-MUSIC
- Single-source direction of arrival written to EARS map object
- Map object written to XML file
Python
- Reads XML file
- Converts DOAs to the required co-ordinate system
- Sends to egosphere
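
The Python side of this hand-off might look roughly like the sketch below. The XML layout (a `map` element holding `source` elements with `azimuth_deg` and `elevation_deg` attributes) is hypothetical, since the slides do not give the EARS map schema, and the target co-ordinate system is assumed here to be a Cartesian unit vector purely for illustration.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical file content; the real EARS map schema is not shown in the slides.
EXAMPLE_XML = """<map>
  <source azimuth_deg="90.0" elevation_deg="0.0"/>
  <source azimuth_deg="0.0" elevation_deg="30.0"/>
</map>"""


def read_doas(xml_text):
    """Return a list of (azimuth_deg, elevation_deg) pairs from the XML text."""
    root = ET.fromstring(xml_text)
    return [
        (float(src.get("azimuth_deg")), float(src.get("elevation_deg")))
        for src in root.iter("source")
    ]


def doa_to_unit_vector(azimuth_deg, elevation_deg):
    """Convert azimuth/elevation in degrees to an (x, y, z) unit vector.

    Assumed convention: x forward, y left, z up, azimuth measured
    anticlockwise from x, elevation measured upward from the x-y plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (
        math.cos(el) * math.cos(az),
        math.cos(el) * math.sin(az),
        math.sin(el),
    )
```

A real converter would also need the axis and sign conventions of both the array and the egosphere, which the slides do not specify.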

Audio localisation -> ego sphere
Scope for improvement
- Use confidence of localisation estimate as ‘saliency’ parameter in egosphere (may need to add a parameter to the map object)
- Avoid sending any DOAs when SNR is poor or there is no speech activity
- Incorporate tracking, audio only or audio-visual (needs an interface to get visual DOAs into Matlab)

Audio enhancement -> ASR
Matlab
- Spherical harmonic domain beamforming
  - 1st order (relatively wide beams)
  - fixed look direction (chosen for robustness of demo)
  - limited to 5 kHz
- Coherent-to-diffuse ratio-based post filter
  - Uses simulated HRTFs
- Enhanced audio written to TCP/IP pipe in continuous stream of small blocks
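
On the receiving end, a continuous TCP stream has no built-in block boundaries, so the reader has to reassemble fixed-size blocks itself. The sketch below shows one way to do that in Python. The sample format (16-bit little-endian mono PCM) and the block length of 160 samples are assumptions, as the slides do not state them.

```python
import socket
import struct

BLOCK_SAMPLES = 160      # assumed block length in samples
BYTES_PER_SAMPLE = 2     # assumed 16-bit PCM


def recv_exact(sock, num_bytes):
    """Read exactly num_bytes from a stream socket.

    A single recv() may return fewer bytes than requested, so keep
    reading until the whole block has arrived.
    """
    chunks = []
    remaining = num_bytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("stream closed in the middle of a block")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_block(sock, block_samples=BLOCK_SAMPLES):
    """Return one block of audio as a list of signed 16-bit integers."""
    raw = recv_exact(sock, block_samples * BYTES_PER_SAMPLE)
    return list(struct.unpack("<%dh" % block_samples, raw))
```

Reading whole blocks keeps the downstream voice activity detector working on frames of a known, constant duration.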

Audio enhancement -> ASR
Python script
- Reads audio from pipe
- Endpointing using basic energy-based voice activity detector
- Sends audio to Google ASR
- Transcription sent to Naoqi dialogue system
- ‘Holds off’ further ASR while Nao speaks
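
As an illustration of energy-based endpointing, the sketch below marks a block as speech when its mean-square energy exceeds a threshold and closes an utterance after a run of quiet blocks. It is not the script used in the project: the function names, the -30 dB threshold and the three-block hangover are assumed values chosen for the example.

```python
import math


def block_energy_db(block):
    """Mean-square energy of one block in dB, floored to avoid log(0)."""
    mean_square = sum(x * x for x in block) / len(block)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def find_utterances(blocks, threshold_db=-30.0, hangover_blocks=3):
    """Return (start, end) block indices of detected utterances.

    end is exclusive. An utterance starts at the first block above the
    threshold and ends once hangover_blocks consecutive blocks fall
    below it; the trailing quiet blocks are not included.
    """
    utterances = []
    start = None
    silent_run = 0
    for i, block in enumerate(blocks):
        if block_energy_db(block) > threshold_db:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= hangover_blocks:
                utterances.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Stream ended while an utterance was still open.
        utterances.append((start, len(blocks) - silent_run))
    return utterances
```

The hangover stops short pauses between words from splitting one utterance into several recogniser requests.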

Audio enhancement -> ASR
Scope for improvement
- Steer beam using DOAs. Can it be done robustly?
- Post filter with higher frequency HRTFs
- Acoustic echo cancellation to avoid ‘hold off’ period
- Add dereverberation?

Visual localisation -> egosphere

Egosphere behaviour
- DOAs arrive from audio and video subsystems
- All DOAs are attended to (looked at) with priority according to saliency
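
One simple way to model "attend to everything, most salient first" is a priority queue. The sketch below is a minimal illustration and not the egosphere implementation: the class name, method names and tuple layout are hypothetical.

```python
import heapq
import itertools


class AttentionQueue:
    """Queue of pending DOAs, served in order of decreasing saliency."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so that equally salient DOAs are served in arrival order.
        self._counter = itertools.count()

    def push(self, azimuth_deg, elevation_deg, saliency, source):
        # heapq is a min-heap, so store the negated saliency.
        entry = (-saliency, next(self._counter), (azimuth_deg, elevation_deg, source))
        heapq.heappush(self._heap, entry)

    def pop_most_salient(self):
        """Remove and return (azimuth_deg, elevation_deg, saliency, source)."""
        neg_saliency, _, (az, el, source) = heapq.heappop(self._heap)
        return az, el, -neg_saliency, source

    def __len__(self):
        return len(self._heap)
```

Because nothing is discarded, every DOA from either subsystem is eventually looked at; saliency only decides the order.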

[Figure: block diagram of the integrated system. Blocks: Naoqi, playrec, naolab, Matlab audio DSP, Python ASR, dialogue manager, ego sphere with Python interface, C++ visual localisation, face detector. Signal labels: 12 channel digital audio, audio stream, synth speech, motor commands (head pose), video stream, synchronised mono video + face positions. Platform labels: Ubuntu / OSX, Ubuntu.]