A System for Hybridizing Vocal Performance By Kim Hang Lau.

Parameters of the singing voice
• Parameters of the singing voice can be loosely classified as:
  – Timbre
  – Pitch contour
  – Time contour (rhythm)
  – Amplitude envelope (projections)

Vocal Modification
• Vocal modification refers to the signal processing of live or recorded singing to achieve a different inflection and/or timbre
• Commercially available units include:
  – Intonation corrector
  – Pitch/formant processor
  – Harmonizer
  – Vocoder

Objectives
• Prototype a system for vocal modification
• Modify a source vocal sample to match the time evolution, pitch contour and amplitude envelope of a similarly sung target vocal sample
• Simulate a transfer of singing techniques from a target vocalist to a source vocalist, thus hybridizing the vocal performance

Order of Presentation
• System overview
• Individual components
• System evaluation
• System limitations
• Conclusions and recommendations

System Overview
• Three components:
  – Pitch-marking
  – Time-alignment
  – Time/pitch/amplitude modification engine
• Inspired by Verhelst’s prototype system for the post-synchronization of speech utterances

Targeted System Specifications
  Vocal performance              Commercial singing
  Vocal pitch range              Hz
  Detection accuracy/resolution  10 cents
  Detection dynamic range        40 dB
  Sampling rate                  44.1 kHz and 48 kHz
  Time-scale modification        ±20%
  Pitch-scale modification       ±600 cents
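The cent figures in the specification can be related to frequency ratios through the standard definition of the cent (1200 cents per octave). The following is a minimal sketch of that arithmetic only; the function names are illustrative and are not taken from the presented system.

```python
import math

def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for an interval in cents (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)

def ratio_to_cents(ratio: float) -> float:
    """Interval in cents for a given frequency ratio."""
    return 1200.0 * math.log2(ratio)

# Pitch-scale range of +/-600 cents: half an octave down or up.
lowest = cents_to_ratio(-600)    # about 0.707
highest = cents_to_ratio(600)    # about 1.414
# A 10 cent resolution corresponds to a frequency change of roughly 0.58%.
resolution = cents_to_ratio(10)  # about 1.0058
```

Under this definition, ±600 cents means the pitch-modification factor stays between about 0.71 and 1.41.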

Component No.1 Pitch-marking

Pitch-marking and Glottal Closure Instants (GCIs)
• Information generated from pitch-marking:
  – Pitch period
  – Amplitude envelope
  – Voiced/unvoiced segment boundaries
[Figure: pitch-marks; labels P, P’; 5 ms scale]
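As an illustration of how pitch periods and voiced/unvoiced boundaries could be derived from a list of pitch-marks, here is a small sketch. It assumes the marks are sample indices in increasing order; the gap threshold `max_period_s` and both function names are hypothetical choices for the example, not values from the presentation.

```python
def pitch_periods(marks, fs):
    """Return (period in seconds, F0 in Hz) for each pair of consecutive marks."""
    result = []
    for a, b in zip(marks, marks[1:]):
        period = (b - a) / fs
        result.append((period, 1.0 / period))
    return result

def voiced_segments(marks, fs, max_period_s=0.02):
    """Group marks into voiced runs.

    A gap between marks longer than max_period_s (an assumed threshold)
    is treated as an unvoiced stretch that ends the current run.
    Returns a list of (first_mark, last_mark) tuples.
    """
    if not marks:
        return []
    segments = []
    start = prev = marks[0]
    for m in marks[1:]:
        if (m - prev) / fs > max_period_s:
            segments.append((start, prev))
            start = m
        prev = m
    segments.append((start, prev))
    return segments

marks = [0, 441, 882, 1323, 5000, 5400, 5800]
periods = pitch_periods(marks[:4], 44100)   # 10 ms periods, i.e. about 100 Hz
segments = voiced_segments(marks, 44100)    # two voiced runs separated by a gap
```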

Pitch-marking applying Dyadic Wavelet Transform (DyWT)
• Kadambe adapted Mallat’s algorithm for edge detection in image signals to the detection of GCIs in speech signals
• He assumed a correspondence between edges in an image signal and GCIs in a speech signal
• DyWT computation for dyadic scales 2^3 to 2^5 was sufficient for pitch-marking
• If a particular peak detected in the DyWT matches across two consecutive scales, starting from a lower scale, that time-instant is taken as a GCI
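The cross-scale matching rule can be sketched as follows. The sketch assumes the DyWT peaks have already been located at each scale and are supplied as lists of sample indices, ordered from the lower to the higher scale. The tolerance `tol` and the function name are hypothetical, and the wavelet transform itself is not shown.

```python
def match_gcis(peaks_by_scale, tol=2):
    """Pick candidate GCIs from peaks that agree across consecutive scales.

    peaks_by_scale: list of peak-index lists, ordered from lower to higher scale.
    tol: maximum index difference (assumed) for two peaks to count as matching.
    """
    gcis = []
    for lower, upper in zip(peaks_by_scale, peaks_by_scale[1:]):
        for p in lower:
            matched = any(abs(p - q) <= tol for q in upper)
            already_found = any(abs(p - g) <= tol for g in gcis)
            if matched and not already_found:
                gcis.append(p)
    return sorted(gcis)

# Peaks near 100 persist across all three scales, peaks near 400 across the
# upper two, and peaks near 540 across the lower two; 250 and 700 are isolated.
peaks = [[100, 250, 540], [101, 400, 541], [102, 401, 700]]
candidates = match_gcis(peaks)
```

Because lower scales are visited first, a peak that persists over several scales is reported once, at its position in the lowest scale where it matched.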

[Figure: Mallat and Kadambe decompositions; original signal, scales 2^1 to 2^5, base-band]

The proposed pitch-marking scheme
• Detection principle:
  – Detection of the scale that contains the fundamental period
  – Starting from a higher scale (of lower frequency), there is a considerable jump in frame power when this scale is encountered
• Features:
  – 4X decimation to support high sampling rates
  – Frame-based processing and error correction for possible quasi-real-time detection
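One way to read the detection principle is as a scan over the per-scale frame powers, from the highest scale downward, stopping at the first large jump. The sketch below illustrates that reading only. The jump threshold `jump_ratio`, the power floor, and the function names are assumptions made for the example; the presentation does not state how the jump is quantified.

```python
def frame_power(frame):
    """Mean squared value of one frame of samples."""
    return sum(v * v for v in frame) / len(frame)

def select_scale(subband_frames, jump_ratio=4.0, floor=1e-12):
    """Return the first scale, scanning from high to low, whose frame power
    exceeds that of the previous (higher) scale by at least jump_ratio.

    subband_frames: dict mapping scale number to the samples of one frame.
    Returns None if no jump is found.
    """
    scales = sorted(subband_frames, reverse=True)
    prev = max(frame_power(subband_frames[scales[0]]), floor)
    for s in scales[1:]:
        p = frame_power(subband_frames[s])
        if p >= jump_ratio * prev:
            return s
        prev = max(p, floor)
    return None

# Toy frame: scales 5 and 4 carry little energy, scale 3 carries most of it.
frames = {
    5: [0.01, -0.01] * 4,
    4: [0.015, -0.015] * 4,
    3: [1.0, -1.0] * 4,
    2: [0.5, -0.5] * 4,
}
chosen = select_scale(frames)
```

A limitation of this simple scan is that it cannot flag the highest scale itself, since there is no higher scale to compare against.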

The proposed pitch-marking system

Comparisons of results with Auto-Tune
[Figure panels: Proposed system; Auto-Tune]

Component No.2 The Modification Engine

Time/pitch/amplitude modification engine
• Inputs, each a function of the index n:
  – Time-modification factor
  – Pitch-modification factor
  – Amplitude-modification factor
  – D(n): time-warping function

TD-PSOLA (Time-domain Pitch Synchronous Overlap-Add)
• Time-domain splicing overlap-add method
• Used in prosodic modification of speech
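To make the overlap-add idea concrete, here is a much simplified sketch of TD-PSOLA with constant modification factors. It assumes the pitch-marks are already known and strictly increasing. The names `alpha` (time factor), `beta` (pitch factor) and `td_psola`, as well as the normalisation by the summed window weights, are choices made for this example rather than details of the presented engine, which uses time-varying factors.

```python
import math

def hann(n):
    """Symmetric Hann window of odd length n (n >= 3)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def td_psola(x, marks, alpha=1.0, beta=1.0):
    """Simplified TD-PSOLA with constant factors.

    x: input samples; marks: analysis pitch-marks (sample indices).
    alpha: output duration / input duration; beta: output F0 / input F0.
    """
    n_out = int(len(x) * alpha)
    y = [0.0] * n_out
    wsum = [0.0] * n_out
    t = marks[0] * alpha                      # first synthesis mark
    while t < marks[-1] * alpha:
        # Analysis mark closest to the synthesis position mapped back in time.
        i = min(range(len(marks)), key=lambda k: abs(marks[k] - t / alpha))
        j = min(i, len(marks) - 2)
        period = marks[j + 1] - marks[j]      # local pitch period in samples
        centre = int(round(t))
        # Copy a two-period, Hann-windowed grain to the synthesis position.
        for k, w in enumerate(hann(2 * period + 1)):
            src = marks[i] - period + k
            dst = centre - period + k
            if 0 <= src < len(x) and 0 <= dst < n_out:
                y[dst] += w * x[src]
                wsum[dst] += w
        t += period / beta                    # spacing sets the output pitch
    # Divide by the accumulated window weight to keep the level steady.
    return [v / w if w > 1e-6 else 0.0 for v, w in zip(y, wsum)]

# Identity check on a 100-sample-period sine with marks at every period.
x = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
marks = list(range(100, 1000, 100))
y = td_psola(x, marks)
```

With `alpha = beta = 1` the grains are laid back exactly where they came from, so the output reproduces the input between the first and last marks. Grains are repeated or skipped when `alpha` differs from 1, and their spacing changes when `beta` differs from 1.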

Evaluation of the modification engine
[Figure panels: Original; TD-PSOLA; Auto-Tune]

Component No.3 Time-alignment

Time-alignment
• Based on Verhelst’s prototype system that applies Dynamic Time Warping (DTW)
• He claimed that the basic local constraint produces the most accurate time-warping path
• Exponential increase in computation as length of comparison increases
• Accuracy deteriorates as length of comparison increases
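For reference, a textbook DTW over two feature sequences can be sketched as below. The "basic local constraint" is taken here to mean the unweighted three-step rule (horizontal, vertical and diagonal predecessors), which is an assumption about the slide's terminology. The scalar features and absolute-difference distance are placeholders for whatever features the prototype actually compares.

```python
def dtw(a, b, dist=lambda u, v: abs(u - v)):
    """Dynamic Time Warping with horizontal, vertical and diagonal steps.

    Returns (total cost, warping path as a list of (index in a, index in b)).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # vertical step
                                 cost[i][j - 1],      # horizontal step
                                 cost[i - 1][j - 1])  # diagonal step
    # Backtrack from the end of both sequences to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path

total, path = dtw([1, 2, 3, 4], [1, 2, 2, 3, 4])
```

In this toy case the repeated 2 in the second sequence is absorbed by mapping two of its frames to one frame of the first, so the total cost is zero.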

Adaptations from Verhelst’s method
• Proposed to perform time-alignment on a voiced/unvoiced segmental basis:
  – DTW for voiced segments
  – Linear Time Warping (LTW) for unvoiced segments
• Global constraints are introduced to further reduce computations
• Synchronization of voiced/unvoiced segments is required; this is edited manually in the current implementation
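The two adaptations can be illustrated with a short sketch. Linear time warping is shown as a proportional index mapping between two segments. The global constraint is shown as a fixed-width band around the diagonal, which is one common choice and only an assumption here, since the slide does not say which global constraint was used. Both function names are hypothetical.

```python
def linear_time_warp(n_src, n_tgt):
    """Map each target frame index to a source frame index by linear scaling."""
    if n_tgt == 1:
        return [0]
    return [round(j * (n_src - 1) / (n_tgt - 1)) for j in range(n_tgt)]

def in_band(i, j, n, m, width):
    """True if DTW cell (i, j) lies within `width` frames of the diagonal
    joining (0, 0) and (n - 1, m - 1). Cells outside the band can be skipped."""
    diagonal_i = j * (n - 1) / max(m - 1, 1)
    return abs(i - diagonal_i) <= width

mapping = linear_time_warp(5, 3)            # 5 source frames onto 3 target frames
kept = sum(in_band(i, j, 10, 10, 2) for i in range(10) for j in range(10))
```

For a 10 by 10 grid with a band half-width of 2, only 44 of the 100 cells need to be evaluated, which shows how a global constraint cuts the DTW workload.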

Manipulation of modification parameters
• The modification factors are smoothed with linear-phase FIR low-pass filters before being fed to the modification engine
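A minimal sketch of such smoothing is given below. It relies on the fact that an FIR filter with symmetric taps has linear phase, so its delay is a constant half filter length and can be removed by centring the filter on each sample. The Hann-shaped taps, the filter length and the edge handling are illustrative assumptions, not the filter design used in the prototype.

```python
import math

def lowpass_taps(length=9):
    """Symmetric, all-positive taps with unit DC gain (odd length assumed)."""
    raw = [0.5 - 0.5 * math.cos(2 * math.pi * (k + 1) / (length + 1))
           for k in range(length)]
    total = sum(raw)
    return [r / total for r in raw]

def smooth(seq, taps):
    """Filter seq with the taps centred on each sample (delay compensated).

    The ends are padded by repeating the first and last values, so the
    output has the same length as the input.
    """
    half = len(taps) // 2
    padded = [seq[0]] * half + list(seq) + [seq[-1]] * half
    return [sum(t * padded[n + k] for k, t in enumerate(taps))
            for n in range(len(seq))]

taps = lowpass_taps(9)
# A factor sequence with an abrupt step from 0.8 to 1.2.
factor = [0.8] * 20 + [1.2] * 20
smoothed = smooth(factor, taps)
```

The step in the factor sequence becomes a gradual ramp centred on the original transition, while constant stretches are left unchanged.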

The Prototype System

System Evaluation: case 1

System Evaluation: case 2

System Limitations
• Segmentation:
  – Lack of a reliable technique for voiced/unvoiced segmentation
  – Segmentation and classification of different vocal sounds is the key to devising rules for modification
• Modification engine:
  – Lacks the capability to handle pitch transitions; totally dependent on the pitch-marking stage

System Limitations
• Pitch-marking:
  – Proposed system lacks robustness
  – Despite the desirable time response of the wavelet filter bank, its frequency response is not capable of isolating harmonics effectively and efficiently
• Time-alignment:
  – The DTW basic local constraint allows infinite time expansion and compression
  – This factor often causes distortions in the synthesized vocal sample

Conclusions and Recommendations
• The current system works well for slow and continuous singing
• Further improvements to the individual components are recommended to handle greater dynamic changes of the vocal signal, thereby extending the current good results to a wider range of singing styles

Questions & Answers

Wavelet filter bank

Dyadic Spline Wavelet

Wide-band analysis

DTW local constraints

Calculation of pitch-marks

DyWT