Audio and Speech Computers & New Media.

Topics for Today
- General Audio
  - Basics of the audio signal
  - Features
  - Event detection
- Speech
  - Detection
  - Segmentation
  - Speaker identification
  - Recognition
- Audio generation in software applications

The Audio Signal
- Energy at each frequency step, for every recorded point in time

Features for Audio Analysis
- Data over time and frequency

Energy Over Time
- What are these? Speech, music, gunshot

Summarizing the Audio Signal
- Sum energy for bands of frequencies over intervals of time
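This summarization can be sketched in a few lines: transform one time window to the frequency domain, then sum the magnitude-squared energy into a handful of frequency bands. This is a minimal pure-Python illustration (a naive DFT, not the slides' own code), with function names and band counts chosen for the example.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame (O(n^2))."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def band_energies(frame, num_bands):
    """Sum |X[k]|^2 over equal-width frequency bands up to the Nyquist bin."""
    spectrum = dft(frame)
    half = len(spectrum) // 2          # keep non-negative frequencies only
    band_width = half // num_bands
    return [sum(abs(spectrum[k]) ** 2 for k in range(b * band_width, (b + 1) * band_width))
            for b in range(num_bands)]

# A 64-sample frame of a pure tone at DFT bin 4:
# its energy should land in the lowest of four bands.
frame = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
energies = band_energies(frame, 4)
```

Running this per window of a longer recording yields the time-by-band summary the slide describes; real systems use an FFT rather than this O(n^2) DFT.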

Audio Signal Analysis
- Fast Fourier Transform (FFT)
  - Commonly used on audio signals
  - Allows analysis of frequency features across time
- Discrete Wavelet Transform (DWT)
  - FFTs use equal-sized windows, whereas wavelet windows vary with frequency
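The varying-window property of wavelets can be illustrated with the simplest wavelet, the Haar transform: each level halves the signal into coarse averages and fine differences, so lower-frequency content is effectively analyzed with wider windows. A minimal sketch (illustrative only, not the slides' material):

```python
def haar_dwt(signal):
    """One level of the Haar wavelet transform: pairwise averages
    (approximation) and pairwise differences (detail)."""
    approx = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(len(signal) // 2)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(len(signal) // 2)]
    return approx, detail

def haar_multilevel(signal, levels):
    """Repeated Haar DWT: each level doubles the effective window size,
    giving the frequency-dependent resolution the slide contrasts with FFTs."""
    details = []
    for _ in range(levels):
        signal, d = haar_dwt(signal)
        details.append(d)
    return signal, details

# A constant signal has no detail at any scale.
approx, details = haar_multilevel([1.0] * 8, 3)
```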

Audio Signal Analysis
- Mel-frequency cepstral coefficients (MFCC)
  - Based on FFTs
  - Maps results into bands approximating the human auditory system
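The "bands approximating the human auditory system" come from the mel scale, which is roughly linear below 1 kHz and logarithmic above it. A sketch of the common conversion formula and of laying out filter-bank edges equally spaced in mel (a standard textbook recipe, not code from the slides):

```python
import math

def hz_to_mel(f):
    """A widely used mel-scale formula: mel = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, num_bands):
    """Filter-bank edges equally spaced in mel, hence nonlinearly spaced
    in Hz: bands widen toward high frequencies, as in human hearing."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (num_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(num_bands + 2)]

edges = mel_band_edges(0, 8000, 10)
```

By construction 1000 Hz maps to about 1000 mel, and successive band widths in Hz grow toward the top of the range.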

Event Detection
- Mapping audio cues to events
- Recognizing sounds related to particular events (e.g., gunshot, falling, scream)

Classifying Audio Signals
- Features are extracted from audio signals
  - Can be over time, frequency, or both
  - Features create a multidimensional space of data points
- Supervised learning
  - Train a classifier with a set of labeled signals
  - SVMs, neural nets, ...
- Unsupervised learning
  - Cluster unlabeled signals based on similarity
  - HAC, k-means, ...
- The same approach applies to most any type of signal, not just audio
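As a concrete instance of the unsupervised route, here is a minimal k-means clusterer over feature vectors. This is a generic sketch (deterministically seeded from the first k points for simplicity), not the slides' implementation; the feature values are made up to show two obvious clusters.

```python
def kmeans(points, k, iters=20):
    """Minimal k-means over feature vectors (lists of floats).
    Deterministic: the first k points seed the centers."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean).
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        for c in range(k):
            if clusters[c]:
                # Move each center to the mean of its assigned points.
                centers[c] = [sum(dim) / len(clusters[c]) for dim in zip(*clusters[c])]
    return centers, clusters

# Two tight groups of 2-D feature vectors separate cleanly into k=2 clusters.
features = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
centers, clusters = kmeans(features, 2)
```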

Speech Detection
- Another audio signal classification task
- Complicated by background sounds
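A classic crude speech detector combines short-time energy with zero-crossing rate (ZCR): voiced speech is energetic but crosses zero far less often than broadband noise. The thresholds below are illustrative placeholders, not values from the slides:

```python
def frame_features(frame):
    """Short-time energy and zero-crossing rate (ZCR) for one frame."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return energy, crossings / (len(frame) - 1)

def is_speech(frame, energy_thresh=0.01, zcr_thresh=0.5):
    """Crude voiced-speech test: enough energy, but a ZCR below the
    noise-like range. Thresholds are illustrative, not tuned."""
    energy, zcr = frame_features(frame)
    return energy > energy_thresh and zcr < zcr_thresh
```

Real speech detectors are statistical classifiers trained on many features, but this shows why background sounds (which can match either cue alone) complicate the task.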

Distinguishing between Speakers
- Speaker segmentation/diarization
  - Identify when a change in speaker occurs
  - Self-similarity assessments
  - Useful for basic indexing or summarization of speech content
- Speaker identification
  - Requires a label attached to the training data, or a label attached to a cluster from unsupervised learning
  - Enables search (and other features) based on speaker
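The self-similarity idea can be sketched directly: compare each frame's feature vector to the previous one and flag frames where similarity drops, as candidate speaker-change points. A minimal illustration with cosine similarity (threshold and feature vectors are made up for the example):

```python
import math

def cosine_sim(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def change_points(features, threshold=0.8):
    """Flag frame indices where similarity to the previous feature vector
    drops below the threshold -- a crude segmentation cue."""
    return [i for i in range(1, len(features))
            if cosine_sim(features[i - 1], features[i]) < threshold]

# Three frames resembling one speaker, then three resembling another:
# the only low-similarity transition is at index 3.
feats = [[1, 0], [1, 0.1], [0.9, 0], [0, 1], [0, 1.1], [0.1, 1]]
```

Practical diarization systems compare windows of frames (e.g., with BIC or embeddings) rather than single frames, but the detect-a-drop structure is the same.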

Speech Recognition
- Segment utterances and characterize phonemes
  - Use gaps to segment
- Group phoneme segments into words
- Group words into requests or sentences
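The "use gaps to segment" step can be sketched as a scan over per-frame energies: an utterance ends once a run of low-energy frames is long enough. A hypothetical minimal version (threshold and gap length are placeholders):

```python
def segment_on_gaps(energies, silence_thresh=0.01, min_gap=3):
    """Split a sequence of per-frame energies into (start, end) segments,
    end-exclusive, separated by runs of at least min_gap quiet frames."""
    segments, start, silent = [], None, 0
    for i, e in enumerate(energies):
        if e > silence_thresh:
            if start is None:
                start = i          # a new utterance begins
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= min_gap:  # the gap is long enough: close the segment
                segments.append((start, i - silent + 1))
                start, silent = None, 0
    if start is not None:          # close a segment that runs to the end
        segments.append((start, len(energies) - silent))
    return segments
```

The min_gap guard keeps brief intra-word pauses from splitting one utterance in two.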

Speech Recognition
- Continuous speech
- What to do for a noisy signal
- Language models for disambiguation
- Speaker-dependent training improves recognition
- Topic spotting
- Heuristic search
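"Language models for disambiguation" means scoring acoustically similar word sequences by how likely they are as language. A toy bigram model makes the idea concrete, using the classic "recognize speech" vs. "wreck a nice beach" confusion (corpus and scoring are illustrative, not the slides' material):

```python
from collections import Counter

def train_bigrams(corpus):
    """Count word bigrams in a list of sentences (lists of words)."""
    counts = Counter()
    for sent in corpus:
        for a, b in zip(sent, sent[1:]):
            counts[(a, b)] += 1
    return counts

def score(sentence, bigrams):
    """Toy score: product of (count + 1) over the sentence's bigrams,
    so sequences seen in training beat unseen ones."""
    s = 1
    for a, b in zip(sentence, sentence[1:]):
        s *= bigrams[(a, b)] + 1
    return s

def disambiguate(candidates, bigrams):
    """Pick the candidate word sequence the language model prefers."""
    return max(candidates, key=lambda c: score(c, bigrams))

bigrams = train_bigrams([["recognize", "speech"],
                         ["recognize", "speech", "today"]])
candidates = [["wreck", "a", "nice", "beach"], ["recognize", "speech"]]
```

Real recognizers combine such language-model scores with acoustic scores inside the heuristic search the slide mentions.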

Playing Back or Generating Audio
- Where do you find audio cues in software outside of games?
- Mapping events in software to audio cues
  - LogoMedia included audio cues to speed up stepping through code
  - InfoSound used audio to aid in program comprehension
  - Caitlin mapped code elements to different instruments

Spatialized Audio
- An additional geographic/navigational channel
- Examples
  - Joyce's interactive Central Park hyperaudio
  - Audio maps of a city for the visually impaired
    - Convey distances, directions, and object sizes
    - Not for use while moving, at time of writing

Spatialized Audio Generation
- Head-related transfer function (HRTF)
  - Differences in timing and signal strength determine how we identify the position of a sound
  - Easy to apply with headphones
- In open space
  - Beamforming: timing for constructive interference to create a stronger signal at the desired location
  - Crosstalk cancellation: destructive interference to remove parts of the signal at the desired location
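One of the timing differences HRTFs capture is the interaural time difference (ITD): sound reaches the nearer ear first. A sketch using Woodworth's spherical-head approximation (the formula, head radius, and speed of sound are textbook values, not from the slides):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at ~20 C
HEAD_RADIUS = 0.0875    # m, a typical adult value

def itd(azimuth_deg):
    """Woodworth's spherical-head approximation of the interaural time
    difference: ITD = (r / c) * (sin(theta) + theta), theta in radians,
    for a source at the given azimuth (0 = straight ahead)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (math.sin(theta) + theta)
```

A source straight ahead gives zero ITD, and a source at 90 degrees gives roughly 0.66 ms, which is about the maximum delay human listeners experience; headphone spatializers apply such per-ear delays (plus level and spectral cues) directly.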

Echology: Interacting with Spatialized Audio
- An interactive 2D soundscape combining human collaboration with aquarium activity
- Goal: engage visitors to spend more time with (and learn more about) beluga whales
- Spatialized sound based on whale activity and human interaction

Echology Interaction
- Whale activity is classified to create different sounds in the soundstage
- Visitors determine how the sounds move through space

Echology Architecture

Topics for Today
- General Audio
  - Basics of the audio signal
  - Features
  - Event detection
- Speech
  - Detection
  - Segmentation
  - Speaker identification
  - Recognition
- Audio generation in software applications