Seminar 1: BrIAS Fellow Prof. Rafael Bello
Important Considerations for Generating Counterfactual Explanations
Abstract: Counterfactual Explanations (CEs) have become one of the leading post-hoc methods for explaining AI models in the field of Explainable AI (XAI). The core idea is that, given an input x to a model M, a CE presents a user with a new, slightly modified input x′, illustrating how a different outcome could be achieved if certain changes were applied to x. In other words, a CE identifies the minimal modification to a given feature vector that alters a classifier’s decision (M(x) ≠ M(x′)). Various methods have been proposed to generate counterfactuals (x′) that satisfy important properties such as actionability, causality, diversity, proximity, sparsity, plausibility, and robustness. This seminar focuses on analyzing the last two properties—plausibility and robustness—and discusses strategies to improve them using concepts from Rough Set Theory and Out-of-Distribution (OOD) detection.
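To make the definition above concrete, the following is a minimal, hypothetical sketch (not the speaker's method) of counterfactual search for a toy linear classifier: it looks for a nearby x′ whose prediction differs from that of x, preferring small and sparse changes. The classifier weights, the input, and the random-search procedure are all illustrative assumptions.

```python
# Hypothetical sketch: find x' close to x such that M(x') != M(x),
# preferring proximity and sparsity (small L1 change on few features).
import numpy as np

rng = np.random.default_rng(0)

# Toy linear classifier M (weights chosen arbitrarily for illustration).
w, b = np.array([1.5, -2.0, 0.5]), -0.2
M = lambda z: int(w @ z + b > 0)

x = np.array([0.1, 0.4, 0.3])   # original input
target = 1 - M(x)               # we want the opposite decision

best, best_cost = None, np.inf
for _ in range(5000):
    # Random sparse perturbation: change one or two features slightly.
    delta = np.zeros_like(x)
    idx = rng.choice(len(x), size=rng.integers(1, 3), replace=False)
    delta[idx] = rng.normal(0.0, 0.5, size=len(idx))
    x_prime = x + delta
    if M(x_prime) == target:                 # decision flipped
        cost = np.linalg.norm(delta, 1)      # proximity/sparsity proxy
        if cost < best_cost:
            best, best_cost = x_prime, cost

if best is not None:
    print("x :", x, "-> class", M(x))
    print("x':", np.round(best, 3), "-> class", M(best),
          "| L1 change:", round(best_cost, 3))
```

Properties such as plausibility and robustness, the focus of the seminar, would require additional constraints beyond this simple proximity/sparsity trade-off.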
Seminar 2: BrIAS Fellow Prof. Willem Zuidema
Under the hood: what LLMs learn about our language, and what they teach us about us
Abstract: Large Language Models (LLMs) and Neural Speech Models (NSMs) have made major advances in recent years in their ability to mimic and process human language and speech. Their internal representations, however, are notoriously difficult to interpret, which limits their usefulness for cognitive science and neuroscience. Yet a new generation of post-hoc interpretability techniques, based on causal interventions, provides an increasingly detailed look under the hood. These techniques allow us, in some cases, to reveal the nature of the learned representations, assess how general the learned rules are, and formulate new hypotheses about how humans might process aspects of language and speech. I will discuss examples involving syntactic priming and phonotactics, and speculate on the future impact of AI models on the cognitive science of language.
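As a rough illustration of what a causal intervention on internal representations looks like, here is a hypothetical toy sketch (not the speaker's experimental setup): a hidden activation computed from one input is patched into the forward pass of another input, and the resulting shift in the output indicates what that representation contributes to the model's behavior. The tiny two-layer network and its inputs are assumptions made purely for illustration.

```python
# Hypothetical toy "activation patching" intervention on a two-layer network.
import numpy as np

rng = np.random.default_rng(1)

# Toy network with a single hidden vector we can intervene on.
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))

def forward(x, patched_hidden=None):
    h = np.tanh(W1 @ x)               # hidden representation
    if patched_hidden is not None:    # causal intervention: overwrite h
        h = patched_hidden
    return W2 @ h, h                  # output logits and hidden state

x_a = np.array([1.0, 0.0, 0.5])       # "source" input
x_b = np.array([0.0, 1.0, -0.5])      # "target" input

_, h_a = forward(x_a)                              # cache source activation
logits_b, _ = forward(x_b)                         # normal run on target
logits_patched, _ = forward(x_b, patched_hidden=h_a)  # target run with source h

print("output on B:         ", np.round(logits_b, 3))
print("output on B, patched:", np.round(logits_patched, 3))
print("shift caused by intervention:", np.round(logits_patched - logits_b, 3))
```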