
Understanding the soundscape concept: the role of sound recognition and source identification
David Chesmore, Audio Systems Laboratory, Department of Electronics, University of York

Overview of Presentation
- Role of soundscape analysis
- Instrument for Soundscape Recognition, Identification and Evaluation (ISRIE)
- Soundscape description language
- Applications
- Conclusions

Role of Soundscape Analysis
Potential applications:
- identifying relevant sound elements in a soundscape (e.g. high-intensity sounds)
- determining positive and negative sounds
- biodiversity studies
- tranquil areas
- preserving important soundscapes
- planning and noise abatement studies

Soundscape Analysis Options
Manual
- Advantage: subjective (captures human perception of the soundscape)
- Disadvantages: time consuming, limited resources, subjective (varies between listeners), very large storage requirements
Automatic
- Advantages: objective (once trained), continuous analysis possible, much reduced data storage requirements
- Disadvantage: reliability of sound element classification

How to Automatically Classify Sounds?
Major issues to address:
- separation and localisation of sounds in the soundscape (especially with multiple simultaneous sounds)
- classification of sounds, which depends on feature overlap and the number of sound elements
The number of elements, localisation requirements, etc. depend on the application.

Instrument for Soundscape Recognition, Identification and Evaluation (ISRIE)
- ISRIE is a collaborative project between York, Southampton and Newcastle Universities
- One of three projects arising from the EPSRC Noisy Futures Sandpit
- York: sound separation + sound classification
- Southampton: applications + interface with users
- Newcastle: sound localisation + arrays

Aim of ISRIE
The aim is to produce an instrument capable of automatically identifying sounds in a soundscape by:
- separating sounds in 3-D
- localising sounds within the 3-D field
- classifying sounds into a restricted range of categories

Outline of ISRIE
[Figure: block diagram. A sensor feeds a localisation + separation stage, followed by classification; ISRIE outputs the location (alt, az), duration, SPL, L_EQ and sound category of each element.]

Sound Separation: Sensor
B-format microphone as sensor:
- provides 3-D directional information
- a coincident microphone array reduces convolutive separation problems to instantaneous ones
- more compact and practical than multi-microphone solutions
Outputs:
- W: omni-directional component
- X: figure-of-eight response along x-axis
- Y: figure-of-eight response along y-axis
- Z: figure-of-eight response along z-axis
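For background (not on the slide): under the standard first-order Ambisonics convention, a plane wave s(t) arriving from azimuth theta and elevation phi is encoded into the four B-format channels as

```latex
W = \tfrac{1}{\sqrt{2}}\, s(t), \quad
X = s(t)\cos\theta\cos\phi, \quad
Y = s(t)\sin\theta\cos\phi, \quad
Z = s(t)\sin\phi ,
```

so the ratios of X, Y, Z to W carry the direction-of-arrival information exploited by the separation method on the next slide.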

Overview of Separation Method
1. Use a coincident microphone array.
2. Transform into the time-frequency domain.
3. Find the direction-of-arrival (DOA) vector for each time-frequency point.
4. Filter sources based on known or estimated positions in 3-D space.
A code sketch of this pipeline is given below.
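A minimal Python sketch of the pipeline, not the ISRIE implementation: it uses an STFT rather than the DT-CWT introduced below, estimates a per-bin DOA azimuth from the acoustic intensity of the B-format signals, and keeps only bins pointing near a known source direction (azimuth-only for brevity). All names and parameter values are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

def separate_by_doa(w, x, y, fs, target_az_deg, tol_deg=15.0, nperseg=1024):
    """Isolate one source from B-format W/X/Y signals by masking
    time-frequency bins whose DOA lies within tol_deg of a known
    target azimuth (2-D case for brevity)."""
    _, _, W = stft(w, fs, nperseg=nperseg)
    _, _, X = stft(x, fs, nperseg=nperseg)
    _, _, Y = stft(y, fs, nperseg=nperseg)

    # Per-bin active intensity components: I = Re{conj(W) * [X, Y]}
    ix = np.real(np.conj(W) * X)
    iy = np.real(np.conj(W) * Y)
    az = np.degrees(np.arctan2(iy, ix))        # DOA azimuth per T-F bin

    # Binary mask: keep bins whose DOA is close to the target azimuth
    diff = (az - target_az_deg + 180.0) % 360.0 - 180.0
    mask = np.abs(diff) < tol_deg

    _, s_hat = istft(W * mask, fs, nperseg=nperseg)
    return s_hat
```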

Assumptions
- Sources are approximately W-disjoint orthogonal, i.e. sparse in the time-frequency domain: the power in any time-frequency window is attributed to one source.
- Sound sources are geographically spaced (sparse).
- Noise sources have unique directions of arrival (DOA).
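Stated formally (background, following Yilmaz and Rickard, who coined the term): if s_i(tau, omega) denotes the time-frequency transform of source i, W-disjoint orthogonality requires

```latex
\hat{s}_i(\tau,\omega)\,\hat{s}_j(\tau,\omega) = 0
\qquad \forall\, \tau,\omega,\; i \neq j ,
```

which real mixtures only satisfy approximately, hence the "approximately W-disjoint orthogonal" wording above.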

The Dual-Tree Complex Wavelet Transform (DT-CWT)
- Efficient filterbank structure
- Approximately shift invariant
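For experimentation, the third-party Python package dtcwt implements this filterbank; a minimal sketch (the API is as we understand the package's documentation, so verify before relying on it):

```python
import numpy as np
import dtcwt  # third-party package: pip install dtcwt

fs = 16000
t = np.arange(4096) / fs
sig = np.sin(2 * np.pi * 440 * t)              # test tone

transform = dtcwt.Transform1d()
pyramid = transform.forward(sig, nlevels=5)    # complex, ~shift-invariant coefficients

for level, coeffs in enumerate(pyramid.highpasses, start=1):
    print(f"level {level}: {coeffs.shape[0]} complex coefficients")

recon = transform.inverse(pyramid)             # near-perfect reconstruction
print("max reconstruction error:", np.max(np.abs(recon.ravel() - sig)))
```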

[Figure: separation example using the STFT]

[Figure: separation example using the DT-CWT]

Separation Results: Speech
- 3 male speakers, recorded in the anechoic chamber at the ISVR (Institute of Sound and Vibration Research)
- Mixed to virtual B-format at known locations spaced around the microphone
- Performance measures: [Table: per-speaker SIR original (dB), SIR separated (dB), SIR gain (dB), PSRM (dB)]

Source Estimation and Tracking
- The examples above used known source locations; in many deployment scenarios this is acceptable.
- More versatility could be provided by finding and tracking source locations.
- Two approaches considered (a sketch of the first follows):
  - 3-D histogram approach
  - clustering using a plastic self-organising map
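A minimal sketch of the 3-D histogram idea (our own illustration, using a plain rectangular azimuth/elevation histogram rather than the geodesic binning in the results on the next slide):

```python
import numpy as np

def doa_histogram_peaks(az_deg, el_deg, n_peaks=2, bin_deg=5.0):
    """Cluster per-bin DOA estimates by histogramming azimuth/elevation
    and returning the centres of the n_peaks most populated cells."""
    az_edges = np.arange(-180.0, 180.0 + bin_deg, bin_deg)
    el_edges = np.arange(-90.0, 90.0 + bin_deg, bin_deg)
    hist, _, _ = np.histogram2d(az_deg, el_deg, bins=[az_edges, el_edges])

    flat = np.argsort(hist, axis=None)[::-1][:n_peaks]  # largest cells first
    ai, ei = np.unravel_index(flat, hist.shape)
    az_centres = az_edges[ai] + bin_deg / 2.0
    el_centres = el_edges[ei] + bin_deg / 2.0
    return list(zip(az_centres, el_centres))
```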

Results: 2 Speakers, Directional Geodesic Histogram
- Peaks at (0, 0) and (10, 20) degrees
- Blur between the peaks because two real sources only approximate the sparsity assumptions

Signal Classification
- What features? TDSC
- Which classifier? ANN: MLP, LVQ
- Which sounds?

ISRIE Sound Categories
[Table: the restricted set of target sound categories]

Time-Domain Signal Coding
- A purely time-domain technique
- Successfully used for species recognition (birds, crickets, bats, wood-boring insects) and heart sound recognition
- Current applications: environmental sound, vehicle recognition

Time-Domain Signal Coding
[Figure: waveform segmented into epochs (intervals between successive zero-crossings) along the time axis]
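The figure's epoch segmentation can be sketched as follows. This is our illustration: the zero-crossing definition of an epoch follows the figure, while the shape measure used here (a count of local extrema per epoch) is one common choice in the TDSC literature and is assumed, not taken from the slides.

```python
import numpy as np

def tdsc_epochs(sig):
    """Split a signal into epochs (segments between successive
    zero-crossings) and code each by duration and a simple shape
    measure (number of local extrema within the epoch)."""
    # Indices where the signal changes sign
    signs = np.signbit(sig).astype(np.int8)
    zc = np.where(np.diff(signs) != 0)[0] + 1
    codes = []
    for start, end in zip(zc[:-1], zc[1:]):
        epoch = sig[start:end]
        duration = end - start                      # in samples
        slope_signs = np.signbit(np.diff(epoch)).astype(np.int8)
        extrema = int(np.count_nonzero(np.diff(slope_signs)))  # slope sign changes
        codes.append((duration, extrema))
    return codes
```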

Multiscale TDSC (MTDSC)
- A new method of presenting duration-shape (D-S) data
- Replaces the S-matrix, A-matrix or D-matrix
- Multiscale: built from groups of epochs in powers of 2 (512, 256, etc.)
- Inspired by wavelets

MTDSC
[Diagram: one frame of the multiscale structure with n = 4 levels; each cell holds the value computed over its group of epochs.]
Level 1: S1(1) S1(2) S1(3) S1(4) S1(5) S1(6) S1(7) S1(8)
Level 2: S2(1) S2(2) S2(3) S2(4)
Level 3: S3(1) S3(2)
Level 4: S4
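The dyadic layout above can be generated by repeatedly averaging adjacent pairs of frame values; a small illustration (ours, assuming the per-frame values are simple means over their epochs, consistent with the "use mean" remark two slides below):

```python
import numpy as np

def mtdsc_levels(frame_values):
    """Build the multiscale pyramid: level 1 holds the per-frame
    values; each higher level halves the resolution by averaging
    adjacent pairs, ending in a single value. Assumes a
    power-of-two number of frames, as on the slide."""
    levels = [np.asarray(frame_values, dtype=float)]
    while levels[-1].size > 1:
        v = levels[-1]
        levels.append(v.reshape(-1, 2).mean(axis=1))  # pairwise means
    return levels

# Example: 8 frame values -> levels of size 8, 4, 2, 1 as in the diagram
print([lvl.tolist() for lvl in mtdsc_levels([1, 3, 2, 2, 5, 7, 6, 8])])
```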

MTDSC Example
[Figure: MTDSC of a logarithmic chirp, 100 Hz to 24 kHz; epoch frame length 2^m]

MTDSC (continued)
Currently use shape, but will investigate:
- epoch duration (zero-crossing interval) only
- epoch duration and shape
- epoch duration, shape and energy
Also use the mean; variance and higher-order statistics can be used for larger values of m (e.g. 9).

MTDSC Results (1)
- Pipeline: audio, then MTDSC data generation and stacking, then a 3-output LVQ network; the winning output determines the result
- Overall network accuracy: 76%
- Some categories better than others (road, rail: 93%)

MTDSC Results (2)
- 3 Japanese cicada species used for biodiversity studies in northern Japan (2 common, 1 rare)
- 21 test files from field recordings, including 1 with -6 dB SNR
- Backpropagation MLP classifier
- 20 of 21 test files correctly classified (~95% accuracy)
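A sketch of this classification stage (ours: scikit-learn's MLPClassifier stands in for the slide's backpropagation MLP, and random placeholder vectors stand in for stacked MTDSC features):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# X: one stacked MTDSC feature vector per recording; y: species label.
# Random data here is purely a placeholder for real features.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 15))
y = rng.integers(0, 3, size=120)   # three species

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```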

Practical ISRIE
[Figure: the ISRIE block diagram as before (sensor, localisation + separation, classification, with outputs location (alt, az), duration, SPL, L_EQ and category), extended with user-supplied data: an approximate location and the required sound category.]

Restricting Location
[Figure: a cone of acceptance around the target direction; signals arriving from outside the cone are automatically rejected.]
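The cone-of-acceptance test reduces to thresholding the angle between a candidate DOA and the target direction; a minimal sketch:

```python
import numpy as np

def in_cone(az_deg, el_deg, target_az_deg, target_el_deg, half_angle_deg):
    """True if direction (az, el) lies within the cone of acceptance
    of half-angle half_angle_deg around the target direction."""
    def unit(az, el):
        az, el = np.radians(az), np.radians(el)
        return np.array([np.cos(el) * np.cos(az),
                         np.cos(el) * np.sin(az),
                         np.sin(el)])
    cos_angle = np.dot(unit(az_deg, el_deg), unit(target_az_deg, target_el_deg))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))) <= half_angle_deg
```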

Further Automated Analysis
- At present, ISRIE only provides classified sound elements in a small range of categories.
- Can we create a soundscape description language (SDL)?
- It needs to be flexible enough to accommodate manually and automatically generated soundscapes.
- Take inspiration from speech recognition, natural language, and bioacoustics (e.g. automated ID of insects, birds, bats, cetaceans).

sonotag = Σ(L, θ, d, t, D, a, c, p, G)
where
  L = label
  θ = direction of sound
  d = estimated distance to sound
  t = onset time
  D = duration
  a = received sound pressure level
  c = classification (a = automatic, m = manual)
  p = level of confidence in classification
  G = geotag = G(ll, lo, al), where ll = latitude, lo = longitude, al = altitude
Other possibilities exist.

Example of Monaural Sonotags
18 s recording of O. viridulus at a nature reserve in Yorkshire in 2003:
Σ(O. viridulus, -, 1, 11:45, 2, 50, a, 0.99, (53.914, -0.845, 10))
Σ(O. viridulus, -, 1, 11:50, 1.5, 50, a, 0.99, (53.914, -0.845, 10))
Σ(plane, -, 100, 11:52.5, 5, 35, a, 0.96, (53.914, -0.845, 10))
Σ(Bird1, -, 100, 12:02, 5, 41, a, 0.99, (53.914, -0.845, 10))
(The direction field is blank: monaural recordings carry no directional information.)

Example of 3-D Sonotags
Σ(speaker2, 0, 0, 1.5, 14:00, 5, 43, a, 0.96, (53.9, -0.9, 10))
Σ(speaker1, 10, 20, 2, 14:00, 5, 42, a, 0.92, (53.9, -0.9, 10))
Treat separated sounds as monaural recordings for classification.
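For implementation purposes, a sonotag maps naturally onto a small record type; a sketch (field names and types are our reading of the Σ(...) definition above, not part of the presentation):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Sonotag:
    label: str                                  # L
    direction: Optional[Tuple[float, float]]    # (alt, az) in degrees; None if monaural
    distance_m: float                           # d: estimated distance to sound
    onset: str                                  # t: onset time, e.g. "11:45"
    duration_s: float                           # D
    spl_db: float                               # a: received sound pressure level
    classified_by: str                          # c: "a" automatic, "m" manual
    confidence: float                           # p
    geotag: Tuple[float, float, float]          # G: (lat, lon, alt)

tag = Sonotag("O. viridulus", None, 1, "11:45", 2, 50, "a", 0.99,
              (53.914, -0.845, 10))
```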

Applications (1)
- BS 4142 assessments
- PPG 24 assessments
- Noise nuisance applications
- Other acoustic consultancy problems
- Soundscape recordings
- Future noise policy

Applications (2)
- Biodiversity assessment and endangered species monitoring
- Alien invasive species (e.g. cane toad in Australia)
- Anthropogenic noise effects on animals
- Habitat fragmentation
- Tranquillity studies

Conclusions
- ISRIE has been shown to be successful in separating and classifying urban sounds.
- Much work remains to be done, especially in classification.
- Automated soundscape description is possible, but a flexible and formal framework is needed.