Preliminary F0 Statistics for Young Swedish Males and Forensic Phonetics, Jonas Lindh


Preliminary F0 Statistics for Young Swedish Males and Forensic Phonetics Jonas Lindh – Department of Linguistics, Göteborg University and GSLT (Graduate School of Language Technology) IAFPA 2006

Outline
Background and Introduction
–F0 and Forensic Phonetics
–Modulation theory of speech
Hypotheses
Methods
Results
–F0 statistics for young Swedish males
–Robustness test
–Vocal effort test
–Liveliness illustration
Conclusions
Future Work

Background and Introduction
F0 as a reliable parameter for speaker identification (French, 1990; Hollien, 1990; Künzel, 1987; Nolan, in Braun, 1995).
Technical, physiological and psychological factors affecting F0 (Braun, 1995).
Fundamental frequency measures.
Some previous studies and results.

Background and Introduction (Braun, 1995)
Technical factors
–Tape speed (unfortunately still a problem).
–Sample durations (50, 75, 14, 120 s?).
Physiological factors
–Age, smoking, operations.
–Larynx size, shape and mass.
–Between-speaker variation.
Psychological factors
–Noise level, emotions, time of day.
–Vocal effort, speaking rate, F0 dynamics, voice quality.
–Within-speaker variation.

Background and Introduction
Fundamental frequency measures
–Average
–Standard deviation
–Median
–Interquartile range
–F0 mode
–Base value!
Modulation theory of speech.
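The distribution measures listed above can all be computed from a per-frame F0 track. The following is a minimal Python sketch, not the author's procedure: it assumes unvoiced frames are coded as 0 Hz, takes the F0 mode as the centre of the fullest histogram bin with an assumed (hypothetical) 5 Hz `bin_width`, and leaves the base value aside because it is defined by its own formula.

```python
import statistics

def f0_measures(f0_hz, bin_width=5.0):
    """Summary measures of one speaker's F0 track (Hz).

    Assumption: unvoiced frames are coded as 0 and are skipped.
    At least two voiced frames are needed for the SD and quartiles.
    """
    voiced = [f for f in f0_hz if f > 0]
    q1, _, q3 = statistics.quantiles(voiced, n=4)  # quartile cut points

    # F0 mode: centre of the most populated histogram bin (bin width is an assumption)
    counts = {}
    for f in voiced:
        b = int(f // bin_width)
        counts[b] = counts.get(b, 0) + 1
    mode_bin = max(counts, key=counts.get)

    return {
        "mean": statistics.mean(voiced),
        "sd": statistics.stdev(voiced),      # sample standard deviation
        "median": statistics.median(voiced),
        "iqr": q3 - q1,                      # interquartile range
        "mode": (mode_bin + 0.5) * bin_width,
    }
```

The histogram-based mode depends on the bin width, which is one reason individual histograms are worth inspecting alongside the single number.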

Modulation theory of speech
“The theory /…/ considers speech signals as the result of allowing conventional gestures to modulate a carrier signal that has the personal characteristics of the speaker. This implies that in general the conventional information can only be retrieved by demodulation. In order to perceive the phonetic quality of a speech signal, listeners evaluate the deviations of the properties of the signal (F0, formant frequencies, etc.) from those they expect of a neutral vocalization produced by the speaker with properties given by his age, sex, vocal effort, speech rate, etc.” (from the abstract of Traunmüller, 1994)

F0 Liveliness
Average F0 variation (SD in semitones) as a function of the type of speech. Under ‘Type’, the speech samples are classified according to their expected liveliness (Traunmüller & Eriksson, 1995).
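Because liveliness is expressed as an SD in semitones rather than in Hz, F0 values first have to be put on a logarithmic scale. A minimal sketch, assuming the SD is simply taken over per-frame F0 values converted to semitones; the reference frequency `ref_hz` is a hypothetical, arbitrary choice that cancels out.

```python
import math
import statistics

def sd_semitones(f0_hz, ref_hz=100.0):
    """Standard deviation of F0 on a semitone scale.

    Each voiced value is converted with st = 12 * log2(f / ref_hz).
    Changing ref_hz shifts every value by the same constant, so the
    SD does not depend on it. Frames coded as 0 (unvoiced) are skipped.
    """
    st = [12 * math.log2(f / ref_hz) for f in f0_hz if f > 0]
    return statistics.stdev(st)
```

Working in semitones makes the SD comparable across speakers with different mean F0, since equal frequency ratios count as equal excursions.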

F0 Mean, SD and ‘liveliness’
Investigation | Type | n | Sex | Age | F0 (Hz) | F0 SD (st)
Rappaport (1958), German | 1 | 190 | m | | |
Chevrie-Muller et al. (1967), French | 2 | 21 | m | 20– | |
Boë et al. (1975), French | 2 | 30 | m | | |
Takefuta et al. (1972), English | 4 | 24 | m | | |
Chen (1974), Mandarin Chinese | 2 | 2 | m | 30– | |
Rose (1991), Wú | 2 | 4 | m | 25– | |
Kitzing (1979), Swedish | 2 | 51 | m | 21– | |
Pegoraro Krook (1988), Swedish | 2 | 198 | m | 20– | |

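F0 Mean, SD and ‘liveliness’
Investigation | Type | n | Sex | Age | F0 (Hz) | F0 SD (st)
Johns-Lewis (1986), English: Conversation | 2 | 5 | m | 24– | |
Johns-Lewis (1986), English: Reading | 3 | 5 | m | 24– | |
Johns-Lewis (1986), English: Acting | 4 | 5 | m | 24– | |
Graddol (1986), English: Reading passage A | 2 | 12 | m | 25– | |
Graddol (1986), English: Reading passage B | 3 | 12 | m | 25– | |
Average/investigation | | 10 | m | | |
Average/balanced speaker | | 471 | m | | 119 | 2.8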

F0 Liveliness (Traunmüller & Eriksson, 1995)
The SD of F0 increases with increasing ‘liveliness’ of the discourse.
The SD of F0 seems to be larger in tone languages than in non-tone languages.

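F0 baseline (Traunmüller & Eriksson, 1995)
$F_b = F_{\text{mean}} - k\,\sigma(F)$, where $\sigma(F)$ is the standard deviation of F0 and $k$ is a constant (approx. 1.43).
Approximately 5% of F0 values lie below $F_b$.
Different liveliness, same $F_b$: tested by changing the excursion factor $k_e$, but not $F_b$, when resynthesizing natural speech, with $k_e$ = 0.156, 0.414, 0.704, 1.000, 1.290 and 1.566, on the sentence “Det finns folkstammar som äter både kattkött och hundkött” (“There are tribes that eat both cat meat and dog meat”).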
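The base-value formula can be sketched as follows. This assumes the mean and SD are taken over voiced frames in Hz; Traunmüller & Eriksson may have computed them on another scale, so treat the function as illustrative only. The hypothetical helper `share_below` reports the fraction of voiced frames under a threshold, which makes it easy to check the "approximately 5%" statement on a given recording.

```python
import statistics

K = 1.43  # constant k as quoted on the slide

def base_value(f0_hz, k=K):
    """Base value F_b = mean - k * SD over voiced frames (Hz; scale is an assumption)."""
    voiced = [f for f in f0_hz if f > 0]
    return statistics.mean(voiced) - k * statistics.stdev(voiced)

def share_below(f0_hz, threshold_hz):
    """Fraction of voiced frames whose F0 lies below threshold_hz."""
    voiced = [f for f in f0_hz if f > 0]
    return sum(f < threshold_hz for f in voiced) / len(voiced)
```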

Hypotheses concerning F0 for young Swedish males
–The F0 median is more robust than the F0 mean with respect to technical factors, i.e. less sensitive to outliers.
–Within a voice modality (creaky voice, shouting or raising one’s voice), the base value shows the least within-speaker variation of the measures presented.
–The 5% limit frequency (an alternative baseline) is more robust than the base value when the technical factor is positive octave jumps.
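The first and third hypotheses can be illustrated with synthetic data. The sketch below uses made-up values, not data from the study: it doubles two frames to imitate positive octave jumps and reports how much each measure shifts. Taking the 5% limit frequency as the 5th percentile from `statistics.quantiles` is an assumption about its exact definition.

```python
import statistics

K = 1.43  # constant k for the base value, as quoted on the slide

def measures(f0_hz):
    v = sorted(f for f in f0_hz if f > 0)  # voiced frames only
    mean, sd = statistics.mean(v), statistics.stdev(v)
    return {
        "mean": mean,
        "median": statistics.median(v),
        "base": mean - K * sd,                   # F_b = mean - k * SD
        "p5": statistics.quantiles(v, n=20)[0],  # 5th percentile (assumed 5% limit)
    }

def shifts(clean, corrupted):
    """Change in each measure caused by the corruption (corrupted - clean)."""
    a, b = measures(clean), measures(corrupted)
    return {name: b[name] - a[name] for name in a}

# Synthetic example: 20 frames at 100-119 Hz, two of which jump up an octave
clean = list(range(100, 120))
jumped = [2 * f if f in (110, 111) else f for f in clean]
print(shifts(clean, jumped))
```

With these values the mean rises by about 11 Hz while the median moves by 1 Hz, and the base value drops by roughly 30 Hz (the jumps inflate the SD) while the 5th percentile does not move.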

Methods
The software Praat (Boersma & Weenink, 2005) was used to automatically extract F0 data from 109 young male speakers (20–30 years old).
–The group exists as such in the Swedia database.
–62% of convicted criminals in Sweden in 2004 (25–35).
The recordings (spontaneous speech) were taken from the Swedia database. Mean duration: 52.3 s.

Methods
The interviewer’s speech was edited out.
Octave jumps were checked manually.
Ongoing: collection of the 5% limit frequency, the F0 mode (histograms of each speaker’s F0 distribution) and the interquartile range.
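Octave jumps were checked manually in this study. As a hypothetical pre-screen to speed up such a check (not part of the described method), frames can be flagged when F0 moves by about 12 semitones relative to the previous voiced frame. The function name and the 3-semitone tolerance are illustrative assumptions.

```python
import math

def flag_octave_jumps(f0_hz, tol_st=3.0):
    """Indices of voiced frames lying about one octave (12 semitones) above
    or below the preceding voiced frame. Heuristic pre-screen only.

    Frames coded as 0 (unvoiced) are skipped; the last voiced frame stays
    the reference across unvoiced gaps.
    """
    flagged = []
    prev = None
    for i, f in enumerate(f0_hz):
        if f <= 0:
            continue
        if prev is not None:
            step = 12 * math.log2(f / prev)  # signed interval in semitones
            if abs(abs(step) - 12) <= tol_st:
                flagged.append(i)
        prev = f
    return flagged
```

For the track [100, 102, 204, 205, 103, 0, 104] this flags indices 2 and 4, i.e. both the jump up and the return; the flagged frames would still be inspected by hand.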

Methods
A small robustness test was made by measuring F0 in a simultaneous recording on four different devices (material from Livijn, 2004).
–The North Wind and the Sun (in Swedish).
–MCA, cassette, mobile and digital (reference).

Methods
Vocal effort test: 5 male speakers from Traunmüller & Eriksson (2000).
–High-quality recordings.
–5 distances per subject, outdoors (0.3, 1.5, 7.5, 37.5 and 187.5 m).
–“Jag tog ett violett, åtta svarta och sex vita.” (“I took one violet, eight black and six white.”)

Methods
A liveliness illustration: recordings of a simulated carrier signal and of a neutral, happy, sad and angry voice.

Results
Mean of means: 120.8 Hz (65% between … Hz)
Mean of medians: 115.8 Hz (68% between … Hz)
Mean of SDs: 24.1 Hz (56% between … Hz)
Mean of baselines: 86.3 Hz (68% between … Hz)
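The group figures are means of per-speaker values, together with the share of speakers falling in a frequency interval (the interval limits are not preserved in this transcript). A minimal sketch of that aggregation, using a hypothetical `group_summary` helper to which the caller supplies the interval limits:

```python
import statistics

def group_summary(per_speaker, lo_hz, hi_hz):
    """Mean of per-speaker values (e.g. each speaker's median F0) and the
    share of speakers whose value lies in the closed interval [lo_hz, hi_hz]."""
    share = sum(lo_hz <= v <= hi_hz for v in per_speaker) / len(per_speaker)
    return statistics.mean(per_speaker), share
```

In use, `per_speaker` would hold one value per speaker, such as the median or base value computed from that speaker's F0 track.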

Conclusions
The median is more robust than the mean with respect to technical factors, i.e. less sensitive to outliers.
–Yes. The manual check and the results confirm this.
The base value shows the least within-speaker variation of the measures presented, within a voice modality.
–Yes. However, shouting or raising one’s voice can raise one’s base value.
–68% within 30 Hz, the same as for the median.
The 5% limit frequency is more robust than the base value when the technical factor is positive octave jumps.
–Yes (robustness test).

Conclusions
F0 should be measured in case work.
If baseline values differ, there should be a reasonable explanation for the difference if it is not to be taken as indicating different speakers.
–For example, differences in ‘voice modality’ (creak, shout etc.).

Future work
F0 mode (ongoing) and individual histograms.
More measures at different “liveliness” levels for the same and different speakers on different recording devices.
Sample size vs. content.
Authentic case material.
Separate study of creaky voice.

Thank you for your attention. Questions?

References
Boersma, P. & Weenink, D. (2005) Praat: doing phonetics by computer (Version ) [Computer program]. Retrieved October 7, 2005.
Braun, A. (1995) Fundamental frequency – how speaker-specific is it? In Braun and Köster (eds) (1995): 9–23.
Brottsförebyggande Rådet [Swedish National Council for Crime Prevention]: [www]. Retrieved November 26, 2005.
Bruce, G. (1982) Developing the Swedish Intonation Model. Working Papers 22, Lund University, Department of Linguistics.
Jassem, W., Steffen-Batog, S. & Czajka, M. (1973) Statistical characteristics of short-term average F0 distributions as personal voice features. In W. Jassem (ed.) (1973) Speech Analysis and Synthesis, vol. 3: 209–25. Warsaw: Polish Academy of Science.
Kitzing, P. (1979) Glottografisk frekvensindikering: En undersökningsmetod för mätning av röstläge och röstomfång samt framställning av röstfrekvensdistributionen [Glottographic frequency indication: a method for measuring voice pitch and voice range and for presenting the voice frequency distribution]. Lund University, Malmö.
Nolan, F. (1983) The Phonetic Bases of Speaker Recognition. Cambridge: Cambridge University Press.
Rose, P. (1991) How effective are long term mean and standard deviation as normalisation parameters for tonal fundamental frequency? Speech Communication 10:
Rose, P. (2002) Forensic Speaker Identification. New York: Taylor & Francis.
Traunmüller, H. (1994) Conventional, biological, and environmental factors in speech communication: A modulation theory. Phonetica 51:
Traunmüller, H. & Eriksson, A. (1995) The frequency range of the voice fundamental in the speech of male and female adults. Unpublished manuscript.
Traunmüller, H. & Eriksson, A. (1995) The perceptual evaluation of F0-excursions in speech as evidenced in liveliness estimations. J. Acoust. Soc. Am. 97:
Traunmüller, H. & Eriksson, A. (2000) Acoustic effects of variation in vocal effort by men, women, and children. J. Acoust. Soc. Am. 107: