Automation Reliability in Unmanned Aerial Vehicle Control: A Reliance-Compliance Model of Automation Dependence in High Workload Stephen R. Dixon, Christopher.

Slides:

Advertisements

Similar presentations

Method Participants 184 five-year-old (M age=5.63, SD=0.22) kindergarten students from 30 classrooms in central Illinois Teacher ratings The second edition.

Advertisements

Postgraduate Course 7. Evidence-based management: Research designs.

Original Figures for "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring"

Using prosody to avoid ambiguity: Effects of speaker awareness and referential context Snedeker and Trueswell (2003) Psych 526 Eun-Kyung Lee.

1 Emergency Services Electronic Searches Mission Pilot CAPR Attachment 10 Paragraph b Richard Shulak Wasatch Sqdn. RMR-UT-008 January 2000.

Errors and Uncertainties in Biology Accuracy Accuracy indicates how close a measurement is to the accepted value. For example, we'd expect a balance.

LOGO Dual-Task Performance Consequences of Imperfect Alerting Associated With a Cockpit Display of Traffic Information Christopher Wickens, Angela Colcombe.

Tables Data collected during an experiment should be recorded in a Table The First column of the table contains the manipulated variable The second column.

Multimodal feedback : an assessment of performance and mental workload (NASA-TLX) 남종용.

Haptic Signals for Communication under Workload In a primarily visual task, haptic signals can be more resistant to large cognitive workloads than visual.

Title slide PIPELINE QRA SEMINAR. PIPELINE RISK ASSESSMENT INTRODUCTION TO RISK IDENTIFICATION 2.

How Science Works Glossary AS Level. Accuracy An accurate measurement is one which is close to the true value.

Verification of the Change Blindness Phenomenon While Managing Critical Events on a Combat Information Display 作者： Joseph DiVita et al. 報告者：李正彥日期： 2006/4/27.

Use of Aerial Views in Remote Ground Vehicle Operations Roger A. Chadwick New Mexico State University Department of Psychology Douglas J. Gillan (PI)

Reliability of Selection Measures. Reliability Defined The degree of dependability, consistency, or stability of scores on measures used in selection.

Quality Assurance in the clinical laboratory

Descriptive Statistics

Testing Hypotheses.

Pilot: Customizing a Commercially Available Digital Game to Assess Cognitive Function William C. M. Grenhart, John F. Sprufera, Jason C. Allaire, & Anne.

Effects of Warning Validity and Proximity on Responses to Warnings Joachim Meyer, Israel HUMAN FACTORS, Vol. 43, No. 4 (2001)

Collective Mistrust of Alarms James P. Bliss, Ph.D. Susan Sidone Holly Mason Old Dominion University.

Chapter 11 Research Methods in Behavior Modification.

Design of Experiments Chapter 21.

ERT 312 SAFETY & LOSS PREVENTION IN BIOPROCESS RISK ASSESSMENT Prepared by: Miss Hairul Nazirah Abdul Halim.

1 Statistical Inference. 2 The larger the sample size (n) the more confident you can be that your sample mean is a good representation of the population.

LOGO Imperfect in- vehicle collision avoidance warning systems can aid distracted drivers Masha Maltz, David Shinar Transportation Research Part F 10 (2007)

Chapter 2 Research in Abnormal Psychology. Slide 2 Research in Abnormal Psychology  Clinical researchers face certain challenges that make their investigations.

VI. Evaluate Model Fit Basic questions that modelers must address are: How well does the model fit the data? Do changes to a model, such as reparameterization,

Inference and Inferential Statistics Methods of Educational Research EDU 660.

Lecture 16 Section 8.1 Objectives: Testing Statistical Hypotheses − Stating hypotheses statements − Type I and II errors − Conducting a hypothesis test.

Chapter 9 Three Tests of Significance Winston Jackson and Norine Verberg Methods: Doing Social Research, 4e.

COMPANY L O G O On the Independence of Compliance and Reliance: Are Automation False Alarms Worse Than Misses? Stephen R. Dixon, Christopher D. Wickens,

[Hayes, Dassen, Schilder and Wallage, Principles of Auditing An Introduction to ISAs, edition 2.1] © Pearson Education Limited 2007 Slide 7.1 Internal.

Application control chart concepts of designing a pre-alarm system Sheue-Ling Hwang, Jhih-Tsong Lin, Guo-Feng Liang, Yi-Jan Yau, Tzu-Chung Yenn, Chong-Cheng.

Chapter 10 Verification and Validation of Simulation Models

Effect of a concurrent auditory task on visual search performance in a driving-related image-flicker task Professor: Liu Student: Ruby.

QC THE MULTIRULE INTERPRETATION

Experiment 2 (N=10) Purpose: Examine the ability of rare abrupt onsets (20% of trials) to capture attention away from a relevant cue. Design: Half of the.

Computer and Robot Vision II Chapter 20 Accuracy Presented by: 傅楸善 & 王林農指導教授 : 傅楸善博士.

Age Differences in Visual Search for Traffic Signs During a Simulated Conversation 學生：董瑩蟬.

REFERENCES Bargh, J. A., Gollwitzer, P. M., Lee-Chai, A., Barndollar, K., & Troetschel, R. (2001). The automated will: Nonconscious activation and pursuit.

When the brain is prepared to learn: Enhancing human learning using real- time fMRI Y, J. J. a, Hinds, O. b, Ofen, N. a, Thompson, T. W. b, Whitﬁeld-Gabrieli,

Boeing-MIT Collaborative Time- Sensitive Targeting Project July 28, 2006 Stacey Scott, M. L. Cummings (PI) Humans and Automation Laboratory

Older adults generally perform worse than younger adults on tests of episodic long-term memory, but show preserved performance on tests of semantic memory.

Disrupting face biases in visual attention Anna S. Law, Liverpool John Moores University Stephen R. H. Langton, University of Stirling Introduction Method.

Modeling Speed-Accuracy Tradeoffs in Recognition Darryl W. Schneider John R. Anderson Carnegie Mellon University.

Ken W.L. Chan, Alan H.S. Chan* Displays 26 (2005) 109–119 Spatial S–R compatibility of visual and auditory signals: implications for human–machine interface.

Common Pitfalls in Randomized Evaluations Jenny C. Aker Tufts University.

Sampling techniques validity & reliability Lesson 8.

Shadow Detection in Remotely Sensed Images Based on Self-Adaptive Feature Selection Jiahang Liu, Tao Fang, and Deren Li IEEE TRANSACTIONS ON GEOSCIENCE.

PS Research Methods I with Kimberly Maring Unit 9 – Experimental Research Chapter 6 of our text: Zechmeister, J. S., Zechmeister, E. B., & Shaughnessy,

Aerial Inspection J.P. Donley System Maintenance Director Pedernales Electric Cooperative.

Body Position Influences Maintenance of Objects in Visual Short-Term Memory Mia J. Branson, Joshua D. Cosman, and Shaun P. Vecera Department of Psychology,

Quality Assurance in the clinical laboratory

Verifying Precipitation Events Using Composite Statistics

Impacts of workload on trust in imperfect automated systems

Oliver Sawi1,2, Hunter Johnson1, Kenneth Paap1;

Critical Reading of Clinical Study Results

Management of Multiple Dynamic Human Supervisory Control Tasks

More about Tests and Intervals

Daphna Shohamy, Anthony D. Wagner Neuron

Volume 17, Issue 21, Pages (November 2007)

The Generality of Parietal Involvement in Visual Attention

Between Thoughts and Actions: Motivationally Salient Cues Invigorate Mental Action in the Human Brain Avi Mendelsohn, Alex Pine, Daniela Schiller Neuron

Evidence Based Practice

Scientific Method.

Computer Vision II Chapter 20 Accuracy

Simon Hanslmayr, Jonas Matuschek, Marie-Christin Fellner

Embry-Riddle Aeronautical University, College of Aviation

A Goal Direction Signal in the Human Entorhinal/Subicular Region

Presentation transcript:

Automation Reliability in Unmanned Aerial Vehicle Control: A Reliance-Compliance Model of Automation Dependence in High Workload Stephen R. Dixon, Christopher D. Wickens University of Illinois HUMAN FACTORS, Vol. 48, No. 3, Fall 2006, pp

Introduction Unmanned aerial vehicles (UAVs) are now commonly used to fulfill military reconnaissance missions without endangering human pilots.

Imperfect Automation Imperfect automation has been shown to create different states of overtrust, undertrust, or calibrated trust complacency, and performance loss.

Diagnostic Failures: Misses and False Alarms The focus of the current study was on imperfect automation diagnostic alerting systems. There is some evidence that the generic costs of alerting system false alarms may be greater than those of misses.

Reliance Versus Compliance Reliance refers to the human operator state when the alert is silent, signaling “all is well.” Reliant operators will have ample resources to allocate to concurrent tasks because they rely on the automation to let them know when a problem occurs.

Compliance describes the operator’s response when the alarm sounds, whether true or false. A compliant operator will rapidly switch attention from concurrent activities to the alarm domain.

The Current Study UAVsimulation provided an ideal test bed for two experiments that examined the issues of imperfect automation in dual-task settings. 4 hypotheses: H1: The symptoms of automation dependence (benefits if correct, costs if incorrect) will emerge primarily at high workload. Automation imperfection driven by misses and false alarms would show qualitatively different effects as reflected by measures of reliance and of compliance.

H2: Indices of high reliance will decrease with increasing miss rate. H3: Indices of high compliance will decrease with increasing false alarm rate. H4: The two vectors of reliance and compliance will show relative independence from each other.

METHODS: EXPERIMENT 1 Participants: Thirty-two undergraduate and graduate students received $8/hr, plus bonuses of $20, $10, and $5, for first-, second-, and third-place finishes, out of groups of 8 participants.

Apparatus The experimental simulation ran on an Evans and Sutherland SimFusion 4000q system. the experimental environment was subdivided into four separate windows.

The top left window contained a 3-D egocentric image view of the terrain directly below the UAV(6000 feet altitude). The bottom left window contained a 2-D top-down map of the 20 × 20 mile (32 × 32 km) simulation world.

The bottom center window contained the message box, with “fly to” coordinates and command target (CT) report questions. The bottom right window contained the four system gauges for the system failure monitoring task.

Procedure Each participant flew one UAV through 10 consecutive mission legs. During each leg, the participant completed three goal-oriented tasks: mission navigation and command target inspection, target of opportunity (TOO) search, and systems monitoring.

mission navigation and command target inspection: Once participants arrived at the CT location, they loitered around the target, manipulated a camera for closer target inspection via a joystick, and reported back relevant information to mission command.

TOO search: a task similar to the CT report except that the TOOs were much smaller than the CT report objects and were camouflaged. Systems monitoring: participants were also required to monitor the system gauges for possible system failures.

Design The auditory autoalerts for the SFs were provided for three out of the four conditions, using a between-subjects design. four conditions:A100, A67f, A67m, and baseline.

A100 condition: A = automation, 100% reliable. A67f condition: f = false alarm, 67% reliable, provided 10 true alerts and an additional 5 false alarms. A67m condition: (m = miss, 67% reliable) provided 10 true alerts but failed to alert an additional 5 events (10 true alarms plus 5 misses).

RESULTS: EXPERIMENT 1-Primary Task: Mission Navigation and CT Inspection Tracking error and CT reporting: Planned comparisons revealed no main effect for tracking error (all ps >.10) or for CT reporting speed an accuracy. Repeats: the A67m condition generated twice as many repeats as the did A67f condition, t(14) = 2.52, p =.01.

Secondary Task: TOO Monitoring TOO detection rates: detection rates were significantly lower in the A67m (miss) condition than in the A67f (false alarm) condition in both the lowworkload, t(12)=2.25, p <.05, and high-workload trials, t(12) = 2.20, p <.05.

TOO detection times: the A67f condition may have generated longer detection times than the A67m condition did, t(6) = 1.40, p =.10 (approaching significance).

SF Monitoring SF detection rates: Planned comparisons revealed that the 67% reliable conditions resulted in poorer detection rates than did the baseline condition. SF detection times: it is interesting to note that the A67m condition resulted in detection times slower than those in the A67f condition.

DISCUSSION: EXPERIMENT 1 Perfect automation had a beneficial effect, relative to baseline, on performance in the automated task, but it had no benefit on concurrent task performance. Imperfect automation (67%) hurt both the automated task and concurrent tasks, even dropping these below baseline in some cases.

METHODS: EXPERIMENT 2 The procedures of Experiment 2 replicated those of Experiment 1. An A80 condition (A = automation, 80% reliable) failed by giving 1 false alarm and 1 miss during each mission (8 true alarms, 1 miss, and 1 false alarm).

A60f condition (f = false alarm, 60% reliable) was created by imposing 3 automation false alarms and 1 automation miss (4 automation failures) out of the 10 possible system failures. A60m condition (m = miss, 60% reliable) resulted in 3 misses and 1false alarm (6 true alarms plus 3 misses and 1 false alarm).

RESULTS: EXPERIMENT 2- Mission Completion performance in the two 60% reliable conditions was worse than baseline, t(20) = 2.77, p <.05, whereas the A80 condition did not differ from baseline. There was no statistical difference between the A60f and A60 conditions.

TOO Monitoring TOO detection rates: Planned comparisons revealed that there was no difference between the 60% reliable conditions and baseline, t(20) = 1.17, p >.10, whereas performance in the A80 condition was better than baseline. TOO detection times: 60% reliable conditions was worse than baseline, t(16) = 3.09, p <.01, but there was no difference between the A80 condition and baseline.

SF Monitoring SF detection rates: the reduced rates in the A60f condition in high workload (50%), as compared with the other conditions (mean = 74%). SF detection times: in the high workload trials, performance in the 60% reliable conditions may have been worse than baseline.

Complacency effect: responses when an alert correctly sounded (A60f = s; A60m = 3.96 s) and those when the alert failed to sound (A60f = s; A60m = s).

DISCUSSION: EXPERIMENT 2 Highly reliable automation did not benefit performance in the automated task relative to baseline, but it had a small benefit to concurrent task performance. Lowreliability automation (60%) hurt both the automated task and concurrent tasks, with different effects for false alarms and misses.

MODELING OF AUTOMATION DEPENDENCE It is possible to assess measures of reliance and compliance. Reliance is indexed by (a) the performance on secondary or concurrent tasks. (b) Reliance is also indexed by the time required to respond to an unannounced failure.

Compliance is indexed by the response time and accuracy to an announced system failure (higher compliance shorter RT) under high workload.

All four measures of reliance showed a correlation in the expected direction SF automation miss rate correlates with TOO miss rate, r =.50; RT to TOO, r =.73; repeats, r =.76; and RT to SF misses, r = –0.97.

the correlations were in the expected direction. The correlation of automation FA rate with RT to SF was r =.37; with SF miss rate it was r =.73 – that is, higher FA rate→ less compliance → slower and less accurate response to the SF alerts.

Correlations on the pooled data revealed that miss rate → reliance (r =.67, p <.01); miss rate → compliance (r =.07, ns); FA rate → reliance (r = –.50, p =.06); FA rate → compliance (r =.49, p =.11).

GENERAL DISCUSSION A100 performance was superior to baseline performance in the RTto system failures only at high workload, supporting H1. People depend on automation even when it is imperfect.

Our data revealed a strong effect of miss rate on reliance (r =.67), as participants became less trusting of the automation to alert them if a failure occurred. Negative effects of high false alert rate on compliance, reflecting the “cry wolf” phenomenon.