King Saud University College of Engineering IE – 341: “Human Factors Engineering” Fall – 2016 (1st Sem. 1437-8H) Human Capabilities Part – C. Speech.

Slides:



Advertisements
Similar presentations
Acoustic/Prosodic Features
Advertisements

Speech Perception Dynamics of Speech
Your Vocal Instrument.
PHONETICS AND PHONOLOGY
The nature of sound Types of losses Possible causes of hearing loss Educational implications Preparing students for hearing assessment.
Basic Spectrogram Lab 8. Spectrograms §Spectrograph: Produces visible patterns of acoustic energy called spectrograms §Spectrographic Analysis: l Acoustic.
The Human Voice. I. Speech production 1. The vocal organs
Speech Perception Overview of Questions Can computers perceive speech as well as humans? Does each word that we hear have a unique pattern associated.
The Human Voice Chapters 15 and 17. Main Vocal Organs Lungs Reservoir and energy source Larynx Vocal folds Cavities: pharynx, nasal, oral Air exits through.
Introduction to Speech Production Lecture 1. Phonetics and Phonology Phonetics: The physical manifestation of language in sound waves. –How sounds are.
Speech Communications Chapter 7. Speech Communications  The Nature of Speech    Criteria for Evaluating Speech    Components of Speech Communication.
Chapter 2 Introduction to articulatory phonetics
The Description of Speech
Phonetics HSSP Week 5.
Speech Communications (Chapter 7) Prepared by: Ahmed M. El-Sherbeeny, PhD 1.
Human Capabilities Part - B. Speech Communications (Chapter 7) Prepared by: Ahmed M. El-Sherbeeny, PhD 1.
IE341: Human Factors Engineering Prof. Mohamed Zaki Ramadan Lecture 6 – Auditory Displays.
Phonetics and Phonology
Artificial Intelligence 2004 Speech & Natural Language Processing Natural Language Processing written text as input sentences (well-formed) Speech.
LE 222 Sound and English Sound system
Chapter 7 SPEECH COMMUNICATIONS
Acoustic Phonetics 3/9/00. Acoustic Theory of Speech Production Modeling the vocal tract –Modeling= the construction of some replica of the actual physical.
MUSIC 318 MINI-COURSE ON SPEECH AND SINGING
English Phonetics and Phonology
Chapter 3.2 Speech Communication Human Performance Engineering Robert W. Bailey, Ph.D. Third Edition.
Say “blink” For each segment (phoneme) write a script using terms of the basic articulators that will say “blink.” Consider breathing, voicing, and controlling.
Speech Perception 4/4/00.
Speech Science VI Resonances WS Resonances Reading: Borden, Harris & Raphael, p Kentp Pompino-Marschallp Reetzp
1 Speech Synthesis User friendly machine must have complete voice communication abilities Voice communication involves Speech synthesis Speech recognition.
Artificial Intelligence 2004 Speech & Natural Language Processing Speech Recognition acoustic signal as input conversion into written words Natural.
Human Capabilities Part – I. Hearing (Chapter 6*) Prepared by: Ahmed M. El-Sherbeeny, PhD 1.
Introduction to Language Phonetics 1. Explore the relationship between sound and spelling Become familiar with International Phonetic Alphabet (IPA )
Introduction to Digital Speech Processing Presented by Dr. Allam Mousa 1 An Najah National University SP_1_intro.
PHONETIC 1 MGSTER. RAMON GUERRA by: Mgster. Ramon Guerra.
Acoustic Phonetics 3/14/00.
Speech in the DHH Classroom A new perspective. Speech in the DHH Bilingual Classroom Important to look beyond the traditional view of speech Think of.
Whip Around If you were stranded on an island, what two things would you like to have with you? Think about this question and be prepared to share aloud.
 What is one fun thing that you did this summer?  Think about this question and be prepared to share aloud.
PHONETICS AND PHONOLOGY
Speech Audiometry Lecture 8.
IIS for Speech Processing Michael J. Watts
Chapter 3: The Speech Process
L 17 The Human Voice.
ARTICULATORY PHONETICS
Chapter 4: The Sounds of American English
Improving voice and diction Introduction
The Human Voice. 1. The vocal organs
What is Speech-Language Therapy?
Introduction to Linguistics
Vowels and Consonant Serikova Aigerim.
Structure of Spoken Language
Sounds of Language: fənɛ́tɪks
Chapter 3: The Speech Process
Essentials of English Phonetics
King Saud University College of Engineering IE – 341: “Human Factors Engineering” Fall – 2017 (1st Sem H) Chapter 3. Information Input and Processing.
The Human Voice. 1. The vocal organs
Topic 10: Now hear this © Siemens AG All rights reserved.
WHAT IS SOUND?!?!? Sound Vibration
Speech Perception.
Previous Microphone Array System Integrated Microphone Array System
Chapter 3. Information Input and Processing
1. SPEECH PRODUCTION MUSIC 318 MINI-COURSE ON SPEECH AND SINGING
Chapter 2 Phonology.
Speech Perception (acoustic cues)
Sound, language, thought and sense integration
What is phonetics? It is the study of the production, transmission and reception of speech sounds. It studies the medium of the spoken language. It looks.
INTRODUCTION TO PHONETICS for III H.E.C.E., V Semester Students
PHONETICS AND PHONOLOGY
Speech Communications
Presentation transcript:

King Saud University College of Engineering IE – 341: “Human Factors Engineering” Fall – 2016 (1st Sem. 1437-8H) Human Capabilities Part – C. Speech Communications (Chapter 7) Prepared by: Ahmed M. El-Sherbeeny, PhD

Lesson Overview Introduction The Nature of Speech Criteria for Evaluating Speech Components of Speech Communication Systems

Introduction Speech is form of “display” Source of speech i.e. form of auditory information Source of speech Mostly human (focus of this lesson) Could also be synthesized i.e. machine; e.g. voice mail, access confirmation) Receiver of speech Mostly human Could also be machine: “voice recognition” not advanced as synthesized sound

The Nature of Speech Speech: closely associated with breathing Organs associated with speech: Lungs Larynx contains vocal cords Pharynx channel bet. larynx & mouth Mouth (AKA: oral cavity): tongue, lips, teeth, velum Nasal cavity Watch the following video: “how speech works” https://youtu.be/C2lRhe_Fc04 Velum

Cont. The Nature of Speech Vocal cords Contains vibrating folds Opening between folds: glottis / epiglottis Vibrates 80-400 times/sec. Rate of vibration of vocal cords: controls freq. of resulting speech sounds Watch the following video on vocal cords: https://youtu.be/P2pLJfWUjc8 Speech/sound waves: Produced by: vocal cords Further modified by “resonators”: pharynx, oral cavity, nasal cavity Further articulated by “manipulators”: Mouth: tongue, lips, velum Nasal cavity: velum, pharynx muscles Also watch “Vocal Cords in Action”: https://youtu.be/iYpDwhpILkQ

Cont. The Nature of Speech Types of Speech sounds Phonemes Basic unit of speech Defn: “shortest segment of speech which, if changed, would change the meaning of a word” Phonemes in English language: Vowel sounds: 13 (e.g. u sound in put, u sound in but) Consonant sounds: 25 (e.g. g sound in gyp, g in gale) Diphthongs (i.e. sound combinations): e.g. oy sound in boy; ou sound in about Can you compare these to Arabic phonemes? Combining phonemes: Phonemes form syllables ⇒ syllables form words (e.g. ac·a·dem·ic) ⇒ words form sentences Note Phonemes > letters (why?): since phonemes change when combined together (e.g. d in di different than du) * Can you find the number of diphthongs in the English / Arabic languages?

Cont. The Nature of Speech Depicting Speech Sound is generated by variations in air pressure This is represented in several graphical ways Method 1: waveform Shows intensity variation over time (relative scale) Listen to file below for verse “بسم الله الرحمن الرحيم”* *The clip shown was created using program “Wavepad Sound Editor”; clip is by Sh. Abdul-Bassitt Abdul-Samad

Cont. The Nature of Speech Cont. Depicting Speech Method 2: spectrum Shows for given phoneme / word: intensity of various frequencies in that sound sample (see right) Which freq. has highest intensity in shown figure? Method 3: sound spectrogram Frequency: vertical scale Time: horizontal scale Intensity: degree of darkness on plot (see right) * note, the figure states c. Sound Spectogram; the correct spelling is Spectrogram

Cont. The Nature of Speech Intensity of Speech (AKA “Speech Power”) Variation among phonemes Vowels speech power » consonants e.g. a in “talk” has speech power: 680 times > th in then (i.e 28 dB difference) Variation among speech types conversational speech: 45-55 dBA* Telephone/lecture speech: 65 dBA Loud speech: 75 dBA Shouting: 85 dBA Variation: Male & Female Male > female by 3-5 dB (in general) Men in lower freq. has higher intensity than women (see right) *note: A in dBA: most widely used sound filter (i.e. used to control noise in sound); sound level meter is thus less sensitive to very high and very low frequencies; C filter is used for v. high sounds (dBC)

Criteria for Evaluating Speech Speech Intelligibility Defn: “degree/percentage to which a speech message (e.g. group of words) is correctly recognized” This’s major criterion for evaluating speech Assessment of speech intelligibility: Either repeating back read material Or answering questions regarding material Speech Intelligibility tests: Nonsense syllables (e.g. un, us, mus, sub, sud, …) these have least intelligibility Phonetically balanced (PB) word lists Nonsense syllables < words Intelligibility < sentences Complete sentences These have highest intelligibility, even when some words are not recognized (i.e. depends on context) e.g. “Did you go to the store” may sound as “Dijoo …” Watch the following videos regarding speech intelligibility: Measuring speech intelligibility : https://youtu.be/pPS3Z11Wf7Q Research study on speech intelligibility: https://youtu.be/hR7PeFEhnG0 * What level of hearing is speech intelligibility: detectability, discriminability, identification?

Cont. Criteria for Evaluating Speech Speech Quality Another criterion for evaluating speech May be important in identifying a specific speaker e.g. on phone (i.e. absolute identification) Also important to choose bet. different products e.g. speaker phone on home phones, mobile phones Assessment of speech quality Usually done using rating system e.g. people listen to speech and asked to rate quality: excellent, fair, poor, unacceptable, etc. May also be done by comparing to some standard speech quality

Components of Speech Communication Systems Speaker Message Transmission System Noise Environment Hearer Discussed here in terms of Effects on intelligibility of speech communications Methods to improve intelligibility of system

Cont. Speech Communication Systems Speaker Intelligibility of speaker usu. called “enunciation” Research found higher intelligibility is caused by: Longer syllable duration Speaking with high intensity Making use of speech time with spoken words and little pauses Variation of speech frequencies Differences bet. Intelligibilities generate from: Structure of articulators (sound-producing organs) Speech habits that people acquire Speech training may improve speech intelligibility (but not very much) Watch the following video on improving enunciation: https://youtu.be/pBDS6Li2WQM

Cont. Speech Communication Systems Message Affected by: phonemes used, words, context Phoneme Confusions Some speech sounds more easily confused than others e.g. letters in each group (consonants) can be confused with each other: DVPBGCET, FXSH, KJA, MN Avoid using single letters in presence of noise Word Characteristics: for higher intelligibility use: More familiar words Longer words: for longer words even if part of word is dropped, rest can still be figured out e.g. “word-spelling” alphabet: alpha, bravo, Charlie, delta, … instead of A, B, C, D

Cont. Speech Communication Systems Cont. Message Context features: for higher intelligibility use: Sentences (rather than words) Meaningful sentences (rather than non-sense phrases) e.g. “This book is great” rather than “is great book this” Less vocabulary (words) in the presence of noise More words with noise ⇒ less intelligibility (see below) Note, -ve SNR means noise is more intense than signal Also note, monosyllable: words with only one syllable (e.g. hit, ant, cube, fish) * thus, in the presence of noise it’s best to use 1-2 meaningful words (e.g. help me; calling someone by their First and Last Names)

Cont. Speech Communication Systems Transmission System Transmission Systems Natural: air Artificial: telephone, radio, etc. Artificial systems cause distortions, e.g. Amplitude distortion Frequency distortion Filtering Low-pass filter: eliminates freq. above some level High-pass filter: eliminates freq. Below level Filtering: freq. > 4000 Hz, < 600 Hz: little effect on intelligibility; but how about > 1000 Hz, < 3000 Hz? Filtering freq. > 4000 Hz  low pass filter  (from figure above) intelligibility > 90%; Filtering freq. < 600 Hz  high pass filter  (from figure above): intelligibility > 90% why? A: remember –simply– we hear high-sensitivity sounds between 1-3 kHz (use this to answer the second question shown above)

Cont. Speech Communication Systems Noise Environment causes biggest harm to speech intelligibility SNR (signal to noise ratio): Simplest way to evaluate impact of noise on intelligibility Study: for noise level of 35-100 dB ⇒ SNR = 12 dB for threshold of intelligibility (what to do for loud noise?) However, SNR does not take frequency into consideration (only intensity) Other measures (taking freq. into consideration): Articulation index (AI): a measure (0-1) of speech intelligibility while knowing the noise environment Preferred-octave speech interference level (PSIL): rough measure of effect of noise on speech reception Preferred noise criteria (PNC) curves: suggest acceptable noise level for different work environments (e.g. offices) A: Shouting is required for loud noise since SNR = 12 dB @ noise = 100 dB ⇒ signal = 112 dB

Cont. Speech Communication Systems Cont. Noise Environment Reverberation: Bouncing effect of noise from walls, floor, ceiling in a closed room Greatly decreases speech intelligibility (e.g. classrooms) In general, the longer the reverberation time, the more the speech intelligibility decreases Examine the linear relation (right) for a decaying 60 dB noise

Cont. Speech Communication Systems Hearer To receive speech under noise: hearer should Have normal hearing Be trained to receive messages Be able to withstand stress of situation Age Also affects speech reception (i.e. intelligibility); see right 20-29 age group: base level Note, unaltered speech: 120 wpm vs. speeded speech: 300 wpm Hearing protection Prevents hearing loss May improve SI for noise >80 dBA Decreases SI for noise <80 dBA Note the large SI drop at the 60 year mark

References Human Factors in Engineering and Design. Mark S. Sanders, Ernest J. McCormick. 7th Ed. McGraw: New York, 1993. ISBN: 0-07-112826-3.