1
Talking with computers
Meeting 6 – Module 2: Talking with computers (Books C and S)
Tutor: Dr. Jihene Kaabi-Harrath
2
In this module, we are interested in a very special type of communication: communication between people and ICT systems, or Human-Technology Communication (HTC). We will look specifically at one of the most popular current trends: the use of speech. One example of speech use is dictation systems, in which speech partially substitutes for typing on a keyboard.
3
1.2 ICT everywhere
Examples of ICT systems:
- Use of a cash point or ATM (Automatic Teller Machine).
- An automated system in a customer service department that directs a client's call to a human operator.
- Use of a video-cassette recorder to record an interesting television program that you wouldn't like to miss when you go out.
4
1.3 Basic design issues
The basic criteria that concern system interface designers are:
- Ease of use: the most basic design issue. If we need or want to become users of a technology system, we would like it to be as simple to use as possible.
- Efficiency: a crucial aspect that contributes to the success or failure of a system.
- Accessibility: as many people as possible should be able to use the system. For example, it can be found everywhere, it is accessible to disabled persons, and it offers a multiple-language interface.
5
1.3 Basic design issues
ATM (Automatic Teller Machine):
- Ease of use: the human interface is usually very intuitive, and there is no learning involved in operating an ATM.
- Accessibility: ATMs aren’t accessible to users with severe motion impairments.
ACC (Automatic Call Center):
- Ease of use: easy to use.
- Accessibility: there are a number of automatically programmed responses that attempt to cover most types of queries the call center receives.
- Efficiency: using the system takes a lot of time.
6
1.3 Basic design issues
Definition: the International Organization for Standardization (ISO) defined the term usability as the “effectiveness, efficiency and satisfaction with which a specific set of users can achieve a specified set of tasks in a particular environment.”
7
1.4 Integrating knowledge, ideas and experiences
The notion of the user interface became a general concept for both system designers and researchers. Computer companies thought that if they could improve the physical appearance of the user interface, they would have a better chance of commercial success, so they paid little attention to more relevant issues in HTC. Academic researchers, on the other hand, were concerned with figuring out ways in which computers might help people in their work or their personal lives. Those researchers focused on the capabilities and limitations of human users.
8
1.4 Integrating knowledge, ideas and experiences
HTC is an area to which a number of different disciplines contribute: knowledge and expertise from different disciplines are necessary for good interface design.
Figure: Human–Technology Communication and related disciplines
9
1.4 Integrating knowledge, ideas and experiences
The contribution of computer science and engineering to the development of HTC is extremely important, but HTC also draws on the knowledge, ideas and expertise of many other areas. Some of these areas include:
- Linguistics
- Cognitive psychology
- Philosophy
- Sociology
- Economics and human factors
Linguistics is a very important factor in speech-based interfaces. It is a discipline of particular interest to developers of speech recognition systems.
10
1.5 The changing face of technology
A commonly used technique for interfacing with ICT systems is the input-output methodology. A commonly used input device is the keyboard, and a commonly used output device is the monitor screen.
- Command-based interface: requires the user to type in a valid command to an ICT system. This requires users to learn the commands and may be tedious to use.
- Form-fill interface: based on the idea of filling in paper forms, and introduced for less experienced users. It is now commonly used in web page design.
11
1.5 The changing face of technology
- Menu-driven interface: provides a choice of options to select from.
- GUI (Graphical User Interface): Apple’s Lisa was one of the first commercial computers with a GUI. The GUI is one of the most important developments in the HTC field to date: it transformed HTC from a text-based into a visual activity. The Microsoft Windows desktop is perhaps the best-known example of a GUI.
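The difference between the command-based and menu-driven styles can be sketched in a few lines of Python. This is only an illustrative toy, not part of the module materials: the `ACTIONS` table and the function names are hypothetical. It shows that a command-based interface rejects anything the user has not memorised exactly, while a menu-driven interface lists the valid options first.

```python
# Hypothetical actions for a toy ICT system; the names are illustrative only.
ACTIONS = {
    "balance": "Showing your balance",
    "withdraw": "Dispensing cash",
    "quit": "Goodbye",
}


def run_command(text):
    """Command-based style: the user must type a valid command from memory."""
    command = text.strip().lower()
    if command not in ACTIONS:
        return f"Unknown command: {command!r}"
    return ACTIONS[command]


def render_menu():
    """Menu-driven style: the system lists the options, so nothing is memorised."""
    return [f"{number}. {name}" for number, name in enumerate(ACTIONS, start=1)]


def run_menu_choice(choice):
    """Map a typed menu number back to the action it stands for."""
    names = list(ACTIONS)
    try:
        index = int(choice) - 1
    except ValueError:
        return "Please pick a number from the menu"
    if not 0 <= index < len(names):
        return "Please pick a number from the menu"
    return ACTIONS[names[index]]


if __name__ == "__main__":
    print(run_command("balance"))
    print(run_command("ballance"))  # a typing slip is rejected outright
    print("\n".join(render_menu()))
    print(run_menu_choice("2"))
```

In the command-based path a single typing slip produces an error, which is the learning burden described above; the menu path trades that burden for an extra step of reading the options.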
12
1.5 The changing face of technology
Examples of visual interfaces: a menu-driven interface
13
1.5 The changing face of technology
Examples of visual interfaces: a form-filling web page used on the OU’s intranet for applying for car-parking permits
14
1.5 The changing face of technology
Examples of visual interfaces: a menu-driven spreadsheet application
15
1.5 The changing face of technology
- Notion of direct manipulation: in a context of direct manipulation, rather than dealing with abstract entities, you deal directly with the objects around you. This gives the user a sense of immediacy and naturalness.
- Ultimate goal of HTC: to design and develop interfaces that enable natural communication between users and machines, such as speech-based interfaces.
16
1.6 Speech-based Interfacing in context (1)
Motivation:
- Speech is the most natural and preferred mode of communication between people. Speaking feels more natural than typing on a keyboard.
- Business considerations, such as a probable increase in productivity.
- Accessibility for impaired users.
Difficulties:
- Technically difficult to build.
17
Part 2 Making things better
18
2.1 Introduction
- Recognition errors: one of the most basic problems in implementing spoken dialogue systems is the potential for speech recognition errors.
- Pronunciation: in order to achieve robustness, that is, the capability to deal with a wide range of variations in pronunciation and surrounding environmental conditions, some designers are looking at integrating speech with other modes of interacting with machines.
- Multimodal architectures: schemes for human interfaces that use more than one mode of interaction.
19
2.1 Introduction
Virtual reality vs augmented reality: in contrast with virtual reality, which attempts to place the user within a high-quality, three-dimensional animated representation of the world, augmented reality builds on the fact that the physical configuration of a device is a major factor in determining its usability.
20
2.2 Speech-based Interfacing in context (2)
Read Article 2, Sharon Oviatt’s “Taming recognition errors with a multimodal interface”, Book C, page 38.
21
2.2 Speech-based Interfacing in context (2)
Activity 2.3, Book C, page 23
1- The main problems for the use of speech-based technology are:
- Speech style different from the training data used to develop the recognizer (spontaneous speech and accented speakers). A speech recognizer is developed using a set of training data that doesn’t represent all the possible ways in which speech can be realized (in terms of accent, intonation, articulation, etc.). So, whenever the speech style diverges from the original training data, there is the possibility of error.
22
2.2 Speech-based Interfacing in context (2)
- Extraneous noise (speech within a natural field environment). A speech recognizer is developed under specific conditions (e.g. a silent laboratory), which makes it difficult for a system to handle inputs in real-world environments that involve unpredictable noise.
23
2.2 Speech-based Interfacing in context (2)
3- Multimodal interfacing is a type of human interface architecture that uses more than one modality of communication between user and system.
4-
- First, a multimodal architecture enables users to choose freely the most appropriate input mode for a given situation.
- Second, the complexity of language processing can be minimized in a multimodal architecture, as users tend to use simpler and briefer language.
- Finally, a multimodal architecture allows error correction triggered by the user with the choice of an alternative input mode.
24
2.2 Speech-based Interfacing in context (2)
5- Mutual disambiguation is the capability of a multimodal system to make correct decisions based on an evaluation of ambiguous or conflicting inputs from different modes. This is clearly an advantage when the input modes are error-prone.
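One way to picture mutual disambiguation is as a search over pairs of hypotheses from two recognisers. The sketch below is a simplified illustration and not the method from Oviatt's article: the hypothesis lists, the scores, the object identifiers and the rule "the spoken object type must match the type of the object pointed at" are all assumptions made for the example.

```python
from itertools import product

# Assumed n-best list from a speech recogniser: (spoken object type, score).
# The top-ranked guess, "boat", is a recognition error in this scenario.
SPEECH_NBEST = [("boat", 0.50), ("road", 0.45)]

# Assumed n-best list from a pen/gesture recogniser:
# ((object type, object id), score) for the object the user pointed at.
GESTURE_NBEST = [(("road", "road_17"), 0.70), (("river", "river_3"), 0.30)]


def fuse(speech_nbest, gesture_nbest):
    """Return the best compatible (type, object id, joint score), or None.

    A pair is compatible only when the spoken type equals the type of the
    pointed-at object; the joint score is the product of the two scores.
    """
    candidates = []
    for (spoken, s_score), ((kind, obj_id), g_score) in product(speech_nbest, gesture_nbest):
        if spoken == kind:
            candidates.append((spoken, obj_id, s_score * g_score))
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: candidate[2])


if __name__ == "__main__":
    print(fuse(SPEECH_NBEST, GESTURE_NBEST))
```

Here the speech recogniser alone would have chosen "boat", but no pointed-at object is a boat, so the fused result falls back to the second speech hypothesis, "road", paired with the most likely gesture target.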
25
2.3 Towards communication
Multimodal interfacing explores to a great extent the capabilities and the limitations of both the users and the available technology.
Figure: Aspects of human communication
26
2.3 Towards communication
In particular, paralanguage, that is, how you say things rather than what you say, is a central element that conveys meaning in a conversation. In fact, you’ve seen in Oviatt’s article how paralinguistic elements contribute to speech recognition errors. Aspects such as the pitch (how ‘high’ or ‘low’ the voice sounds) and volume (loudness) of speech carry a great deal of information, not only about what you’re saying but also about yourself: who you are, your feelings, and so on.
From this wider perspective, the notion of direct manipulation is quite limited. One direction in which this concept has been extended is known as virtual reality.
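To make the pitch and volume mentioned above more concrete, the following sketch measures both on a synthetic signal. It is a rough illustration under strong assumptions: a pure sine tone stands in for a voice, the 8000 Hz sample rate is an arbitrary choice, and counting zero crossings is a crude pitch proxy that works for clean tones but not reliably for real speech. All function names are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # samples per second (an assumed rate for this example)


def make_tone(frequency_hz, amplitude, seconds=1.0):
    """Synthesise a pure sine tone as a stand-in for a voiced sound."""
    count = int(SAMPLE_RATE * seconds)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE)
        for n in range(count)
    ]


def volume_rms(samples):
    """Root-mean-square amplitude: a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pitch_estimate_hz(samples):
    """Upward zero crossings per second: a crude proxy for pitch."""
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:]) if prev < 0 <= cur
    )
    return crossings * SAMPLE_RATE / len(samples)


if __name__ == "__main__":
    quiet_low = make_tone(110, 0.2)
    loud_high = make_tone(220, 0.5)
    for name, tone in (("quiet, low", quiet_low), ("loud, high", loud_high)):
        print(name, round(volume_rms(tone), 3), pitch_estimate_hz(tone))
```

The louder tone gives the larger RMS value and the higher tone gives the larger crossing rate, which is the sense in which pitch and volume are measurable properties of the signal, separate from the words being spoken.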
27
2.3 Towards communication
Virtual reality is basically a collection of techniques for creating a ‘simulated experience’ (with the help of various devices connected to a computer system: a headset, manipulative gloves and special clothing with embedded sensors). These means connect you with the outside world, letting you know about the objects around you, and letting you interact with these objects and with other people. The idea of virtual reality is to stimulate your senses in ways that give you the impression of being immersed in an alternative reality.
28
2.3 Towards communication
Helmets are used in this virtual reality driving simulator
29
Book S Speech recognition
30
Introduction
Humans use speech as the most natural way to communicate. Generally, Automatic Speech Recognition (ASR) is the recognition and understanding of human speech by machines. It is a difficult task because of the complexity of the speech signal and the associated algorithms. Despite this difficulty, several commercial applications of speech recognition already exist. For example, AT&T (one of the largest US telecommunications companies) uses speech recognition to handle telephone call placement (e.g. users requesting a person-to-person call, calls charged to their telephone cards, etc.). The system processes about one billion calls per year with an accuracy of 95% (Levinson, 1995, p. 400).
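It can help to turn the quoted figures into absolute numbers. The short calculation below assumes that the 95% accuracy applies per call, which the source does not state explicitly, and the function name is hypothetical.

```python
def expected_misrecognised(calls_per_year, accuracy_percent):
    """Calls per year handled incorrectly, assuming accuracy is per call."""
    return calls_per_year * (100 - accuracy_percent) // 100


if __name__ == "__main__":
    errors = expected_misrecognised(1_000_000_000, 95)
    print(errors)         # misrecognised calls per year
    print(errors // 365)  # roughly how many per day
```

Under that assumption, a 5% error rate on about one billion calls still means about 50 million misrecognised calls a year, which is one reason recognition errors remain a central design concern even for systems with high accuracy.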
31
Introduction
ASR belongs to the area of Digital Signal Processing (DSP)/Computer Science. Before attempting any of the experiments described in this book you will need to verify your recording system and install the CSLU Toolkit software. The details are covered in Experiments 1, 2 and 3 in Book E, Experiments, Part 1 Speech recognition.