Published by Cornelius Fitzgerald. Modified over 9 years ago.
1/59 Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship
Clifford Nass, Stanford University
2/59 Speaking is Fundamental
- Fundamental means of human communication
- Everyone speaks
  - IQs as low as 50
  - Brains as small as 400 grams
- Humans are built for words
  - Learn a new word every two hours for 11 years
3/59 Listening to Speech is Fundamental
- Womb: mother’s voice differentiation
- One day old: differentiate speech vs. other sounds
  - Responses
  - Brain hemispheres
- Four-day-olds: differentiate native language vs. other languages
- Adults:
  - Phoneme differentiation at 40-50 phonemes per second
  - Cope with cocktail parties
4/59 Listening Beyond Speech is Fundamental
- Humans are acutely aware of para-linguistic cues:
  - Gender
  - Personality
  - Accent
  - Emotion
  - Identity
5/59 Humans are Wired for Speech
- Special parts of the brain devoted to:
  - Speech recognition
  - Speech production
  - Para-linguistic processing
  - Voice recognition and discrimination
6/59 Therefore …
- Voice interfaces should be the most enjoyable, efficient, and memorable method for providing and acquiring information
7/59 Are They?
- No!
- Why not?
  - Machines are different from humans
  - Technology is insufficient
- But are these good reasons?
8/59 It’s Easy to Create Rich Interactions
9/59 Critical Insights
- Voice = Human
- Technology Voice = Human Voice
- Human-Technology Interaction = Human-Human Interaction
10/59 Where’s the Leverage?
- Social sciences can give us:
  - What’s important
  - What’s unimportant
  - Understanding
  - Methods
  - Unanswered questions
11/59 Examples of the Power of Social Science
12/59 Male or Female Voice?
- Is gender important?
- Can technology have gender?
13/59 The Case of BMW
14/59 Brains are Built to Detect Voice Gender
- First human category
  - Infants at six months
  - Self-identification by 2-3 years old
  - Within seconds for adults
- Multiple ways to recognize gender in voice:
  - Pitch
  - Pitch range
  - Variety of other spectral characteristics
15/59 Once a Person Identifies Gender by Voice
- Gender guides every interaction
  - Same-gender favoritism
    - Trust
    - Comfort
  - Gender stereotyping
16/59 Gender and Products
- Gender should match product
  - More appropriate
  - More credible
- Mutual influence of voice and product gender
  - Female voices feminize products (and conversely)
  - Female products feminize voices (and conversely)
- “Match principle”
17/59 Research Context
- “Gender” of voice (synthetic)
- Gender of user
- “Gender” of product
- E-commerce website
18/59 Examples of Advertisements
- “Female” voice; female product
- “Male” voice; female product
- “Male” voice; male product
19/59 Appropriateness of the Voice
20/59 Voice/Product Gender Influences
- Female voices feminize products; male voices masculinize products
  - Strongest for opposite-gender products
- Female products feminize voices; male products masculinize voices
- Strong preference when voice matches product
21/59 Results for User Gender
- People trust voices that match themselves
  - Females conform more with “female” voices
  - Males conform more with “male” voices
- People like voices that match themselves
  - Females like the “female” voice more
  - Males like the “male” voice more
22/59 Other Results
- Participants denied stereotyping technology
- Participants denied harboring stereotypes!
23/59 People Stereotype Voices by Gender
- Voice “gender” should match content “gender”:
  - Product descriptions
  - Teaching
  - Praise
  - Jokes
24/59 Gender is Marked by Word Choice
- Female speech:
  - More “I,” “you,” “she,” “her,” “their,” “myself”
  - Less “the,” “that,” “these,” “one,” “two,” “some more”
  - More compliments
  - More apologies
  - More relationships between things
  - Less description of particular things
  - “They” for living things only
- Voices should speak consistently with their “gender”
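The slide lists marker words but not how to check a script against them. A minimal sketch, assuming one simply compares how often each word list appears in a script's text: `marker_rates` is a hypothetical helper, splitting “some more” into “some” and “more” is an assumption, and the talk gives no thresholds for when a script reads as “female” or “male.”

```python
import re
from collections import Counter

# Word lists taken from the slide. Splitting "some more" into "some" and
# "more" is an assumption about the original slide, not part of the talk.
MORE_IN_FEMALE_SPEECH = {"i", "you", "she", "her", "their", "myself"}
LESS_IN_FEMALE_SPEECH = {"the", "that", "these", "one", "two", "some", "more"}


def marker_rates(text: str) -> dict[str, float]:
    """Hypothetical helper: share of word tokens falling in each marker list."""
    # Lowercase so "I" matches "i"; tokens are runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"more_in_female_speech": 0.0, "less_in_female_speech": 0.0}
    counts = Counter(tokens)
    total = len(tokens)
    more = sum(counts[w] for w in MORE_IN_FEMALE_SPEECH)
    less = sum(counts[w] for w in LESS_IN_FEMALE_SPEECH)
    return {
        "more_in_female_speech": more / total,
        "less_in_female_speech": less / total,
    }
```

For example, `marker_rates("I think you and I should ask her")` counts 4 of 8 tokens in the first list (0.5) and none in the second (0.0). Comparing these shares across drafts of the same script is more meaningful than reading the absolute numbers.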
25/59 Selecting Voices
- Voices manifest many traits:
  - Gender
  - Personality
  - Age
  - Ethnicity
- Voice traits should match content traits:
  - Content
  - Language style
  - Appearance (e.g., accent and race)
  - Context
- Voice traits should match user traits
26/59 If Only One Voice
- Consider stereotypes
- Masculine vs. feminine (same voice)
  - Boost high frequencies (feminine)
  - Boost low frequencies (masculine)
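The talk does not say how the frequency boost was done. One simple way to tilt a single voice's spectrum, sketched here under the assumption that the voice is available as a list of float samples, is a pair of first-order filters: pre-emphasis, y[n] = x[n] - a*x[n-1], lifts high frequencies relative to low ones, and a one-pole low-pass, y[n] = x[n] + a*y[n-1], does the reverse. The function names and the default a = 0.5 are arbitrary illustrative choices; a production system would more likely use a proper shelving equalizer and normalize levels to avoid clipping.

```python
def boost_highs(samples: list[float], a: float = 0.5) -> list[float]:
    """Pre-emphasis y[n] = x[n] - a*x[n-1] (0 < a < 1).

    Gain is (1 - a) at DC and (1 + a) at the Nyquist frequency,
    so the spectrum tilts toward high frequencies.
    """
    out = []
    prev_x = 0.0  # x[-1] assumed to be silence
    for x in samples:
        out.append(x - a * prev_x)
        prev_x = x
    return out


def boost_lows(samples: list[float], a: float = 0.5) -> list[float]:
    """One-pole low-pass y[n] = x[n] + a*y[n-1] (0 < a < 1).

    Gain is 1/(1 - a) at DC and 1/(1 + a) at the Nyquist frequency,
    so the spectrum tilts toward low frequencies.
    """
    out = []
    prev_y = 0.0  # y[-1] assumed to be silence
    for x in samples:
        y = x + a * prev_y
        out.append(y)
        prev_y = y
    return out
```

With a = 0.5, a steady (DC) input settles at 0.5x after `boost_highs` and at 2x after `boost_lows`, while a signal alternating at the Nyquist frequency settles at 1.5x and about 0.67x, which is the tilt in each direction.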
27/59 Emotions
28/59 Emotion and Voice
- Voice is the first indicator of emotion
- Voice emotion has many markers:
  - Pitch: value, range, change rate
  - Amplitude: value, range, change rate
  - Words per minute
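The slide names the markers but not how they are measured. A minimal sketch, assuming a per-frame pitch and amplitude track has already been produced by some upstream tracker (not shown) and the word count is known: reading “value” as the mean, “range” as max minus min, and “change rate” as the mean absolute change per second are assumptions of this sketch, as are the names `Frame`, `track_stats`, and `voice_emotion_markers`.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Frame:
    """One analysis frame from a hypothetical upstream pitch/level tracker."""
    t: float      # time from start of utterance, in seconds (strictly increasing)
    pitch: float  # fundamental frequency, in Hz
    amp: float    # amplitude in any consistent unit (e.g., RMS)


def track_stats(times: list[float], values: list[float]) -> dict[str, float]:
    """Value, range, and change rate of one contour (definitions are assumptions)."""
    # Change rate: mean absolute change per second between consecutive frames.
    rates = [
        abs(v2 - v1) / (t2 - t1)
        for t1, t2, v1, v2 in zip(times, times[1:], values, values[1:])
    ]
    return {
        "value": mean(values),
        "range": max(values) - min(values),
        "change_rate": mean(rates),
    }


def voice_emotion_markers(frames: list[Frame], word_count: int) -> dict:
    """Summarize the slide's markers for one utterance."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    times = [f.t for f in frames]
    duration_s = times[-1] - times[0]
    if duration_s <= 0:
        raise ValueError("frames must span a positive duration")
    return {
        "pitch": track_stats(times, [f.pitch for f in frames]),
        "amplitude": track_stats(times, [f.amp for f in frames]),
        "words_per_minute": word_count * 60 / duration_s,
    }
```

For three frames at 0, 1, and 2 s with pitches 200, 220, and 210 Hz and a 5-word utterance, this gives a pitch value of 210 Hz, a range of 20 Hz, a change rate of 15 Hz/s, and 150 words per minute.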
29/59 Emotion is Always Relevant
- User has initial emotion
- Interactions create emotions
- Voice is particularly powerful
- Frustration is particularly powerful
30/59 Emotion and Technology
- Could technology-based voices exhibit emotion?
- Could technology-based voice emotion influence people?
31/59 Research Context
- Create upset or happy drivers
- Have them “drive” for 25 minutes
- Female voice gives information and makes suggestions:
  - Upbeat
  - Subdued
32/59 Number of Accidents
33/59 Results
- People speak to the car much more when the voice’s emotion is consistent with the driver’s
- People like the car much more when the voice’s emotion is consistent with the driver’s
34/59 Implications
- User emotion is a critical part of any interaction
- Emotion must match content
- Perception of voice
  - Trust
  - Intelligence
- User
  - Performance
  - Comfort
  - Enjoyment
35/59 One Voice Emotion: Select for Goal
- Overall liking: slightly happy voice
- Attention-getting: anger, sadness
- Trust and vulnerability: sadness (mild)
36/59 If You Can’t Manipulate Voice Emotion
- Manipulate content
- Manipulate music
37/59 Using the First Person: Should IT Say “I”?
38/59 Should Voice Interfaces Say “I”?
- When should a voice interface say “I”?
- Does synthetic vs. recorded speech affect the answer to the previous question?
39/59 The Importance of “I”
- “I” is the most basic claim to humanity
  - “I think, therefore I am”
  - “I, Robot”
  - Dobby and monsters don’t say “I”
- “I” is the marker of responsibility
  - “I made a mistake” vs. “Mistakes were made”
40/59 Research Context
- Auction site
- Telephone interface with speech recognition
- Recorded bidding behavior
- Online questionnaire
41/59 Average Bidding Price
42/59 Results
- When “I” + recorded speech or “no I” + synthetic speech:
  - System is higher quality
  - Users were much more relaxed
- “No I” is more objective
- “I” is more “present”
43/59 Results
- “I” is right for embodiments:
  - Robots
  - Characters
  - Autonomous intelligence (“KITT”)
- “I” is wrong when voice is second fiddle to technology:
  - Traditional car
  - Heavily branded products
44/59 Design
- Text-to-speech is a machine voice
- Recorded speech is a human voice
- Design questions are:
  - Not philosophical questions
  - Not judgment questions
  - Experimentally verifiable
45/59 Mistakes are Tough to Talk About
46/59 Who is Responsible for Errors?
- Recognition is not perfect
- When the system fails, who should be assigned responsibility?
  - System
  - User
  - No one
47/59 Responding to Errors
- Modesty
  - Likable
  - Unintelligent (people believe modesty!)
- Criticism
  - Isn’t really constructive
  - Unpleasant
  - Intelligent
- Scapegoating
  - Effective
  - Safe
48/59 System Responses to Errors
- System blame (most common)
- No blame
- User blame
49/59 Research Context
- Amazon-by-phone
- Numerous planned interaction errors
50/59 Book Buying
51/59 Results
- Neutral and system blame sell much better than user blame
- Neutral (no blame) is easier to use and nicer than system blame
- User blame is most intelligent!
- System blame is least intelligent
52/59 Results for Errors
- Take responsibility when unavoidable
  - Increases trust
  - Increases liking
  - Weak negative effect on intelligence
- Ignore errors whenever possible
- Duck responsibility to a third party if needed
  - Blame the phone line
  - Blame the road
- Making the Microsoft paperclip likable!
53/59 Results for Errors
- Show commitment to the interaction
  - Make guesses
  - Show concern
- Gricean maxims
  - Quality
  - Quantity
  - Relevance
  - Clarity
54/59 Design
- Error recovery is critically important
  - Negative experiences are more memorable
- Adaptation is crucially important
- Flattery is effective
  - Note times when interaction is successful
- Design to avoid errors
  - Alignment (good repetition)
  - Air quotes
- Scripting is important at all stages of the interaction
55/59 Other Areas of Importance/Research
56/59 Other Key Findings
- Personality
- Accents
- Multiple voices and mixing voices
- Input vs. output modality
- Microphone type
57/59 Tying it All Together
- Voice interfaces can be the most enjoyable, efficient, and memorable method for acquiring and providing information
- Voice interfaces turn up the volume knob on user responses
- The key is leveraging social aspects of speech
58/59 Summary – Part 1
- Humans are wired for speech
- Interactions with voice interfaces are fundamentally social
  - Same social rules
  - Same social expectations
59/59 Summary – Part 2
- Social aspects of voice interfaces can be beneficial
  - Users perform better
  - Users feel better
  - Users understand better
- Social aspects of voice interfaces cannot be ignored
  - Social audit is critical
  - Social design is critical
- Design psychology can be leveraged
  - Less expensive than technology
  - More effective than technology
  - Broader impact than technology