Reassuring, Misleading, Debunking: Comparing Effects of XAI Methods on Human Decisions
CHRISTINA HUMER, ANDREAS HINTERREITER, BENEDIKT LEICHTMANN, MARTINA MARA, and MARC STREIT
Abstract: Trust calibration is essential in AI-assisted decision-making. If human users understand the rationale on which an AI model has made a prediction, they can decide whether they consider this prediction reasonable. Especially in high-risk tasks such as mushroom hunting (where a wrong decision may be fatal), it is important that users make correct choices to trust or overrule the AI. Various explainable AI (XAI) methods are currently being discussed as potentially useful for facilitating understanding and subsequently calibrating user trust. So far, however, it remains unclear which approaches are most effective. In this paper, the effects of XAI methods on human AI-assisted decision-making in the high-risk task of mushroom picking were tested. For that endeavor, the effects of (i) Grad-CAM attributions, (ii) nearest-neighbor examples, and (iii) network-dissection concepts were compared in a between-subjects experiment with N = 501 participants. In general, nearest-neighbor examples improved decision correctness the most. However, varying effects for different task items became apparent. All explanations seemed to be particularly effective when they revealed reasons to (i) doubt a specific AI classification when the AI was wrong and (ii) trust a specific AI classification when the AI was correct. Our results suggest that well-established methods, such as Grad-CAM attribution maps, might not be as beneficial to end users as expected and that XAI techniques for use in real-world scenarios must be chosen carefully.
Keywords: explainable artificial intelligence, XAI, trust calibration, AI-assisted decision-making, mushroom identification
Source: Humer C, Hinterreiter A, Leichtmann B, et al. Reassuring, Misleading, Debunking: Comparing Effects of XAI Methods on Human Decisions. OSF Preprints, October 13, 2022.
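For readers less familiar with the explanation types compared in the abstract, below is a minimal, illustrative sketch of how Grad-CAM attributions and nearest-neighbor example explanations can be produced for an image classifier. It is not the authors' implementation: the ResNet-50 backbone, the choice of `layer4` as the target layer, and the helper names `grad_cam` and `nearest_neighbor_examples` are assumptions made purely for illustration.

```python
# Illustrative sketch only: one way to produce two of the explanation types
# compared in the paper (Grad-CAM attributions and nearest-neighbor examples).
# Model, target layer, and function names are hypothetical, not from the study.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V2").eval()
target_layer = model.layer4  # last convolutional block (assumed target layer)

# Capture the target layer's activations and the gradients flowing back into it.
activations, gradients = {}, {}
target_layer.register_forward_hook(lambda m, i, o: activations.update(v=o))
target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(v=go[0]))

def grad_cam(image: torch.Tensor) -> torch.Tensor:
    """Grad-CAM heat map for the model's top-1 class of a (1, 3, H, W) image."""
    logits = model(image)
    logits[0, logits.argmax()].backward()
    # Weight each feature map by its average gradient, ReLU, then upsample.
    weights = gradients["v"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["v"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return (cam / cam.max().clamp(min=1e-8)).squeeze()

def nearest_neighbor_examples(image: torch.Tensor, reference_feats: torch.Tensor, k: int = 3):
    """Indices of the k reference images whose penultimate-layer features are
    closest to the query image (an example-based explanation)."""
    backbone = torch.nn.Sequential(*list(model.children())[:-1])  # drop the classifier head
    with torch.no_grad():
        query = backbone(image).flatten(1)          # (1, 2048) feature vector
    dists = torch.cdist(query, reference_feats)     # (1, n_reference) distances
    return dists.topk(k, largest=False).indices.squeeze(0)
```

In the study's terms, the heat map would play the role of the attribution map overlaid on the queried mushroom photo, while the returned indices would point to the labeled reference images shown to participants as nearest-neighbor examples.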