Georg Lange
Title
Cited by
Year
An interpretability illusion for activation patching of arbitrary subspaces
G Lange, A Makelov, N Nanda
LessWrong, 2023
Cited by 3 · 2023
Is this the subspace you are looking for? An interpretability illusion for subspace activation patching
A Makelov, G Lange, A Geiger, N Nanda
The Twelfth International Conference on Learning Representations, 2023
Cited by 2 · 2023
Quantifying Psychostimulant-induced Sensitization Effects on Dopamine and Acetylcholine Release across different Timescales
G Lange
2023
Reproducibility report for "Interpretable Complex-Valued Neural Networks for Privacy Protection"
A Sheverdin, N Corten, A Knijff, G Lange
ML Reproducibility Challenge 2020, 2021
2021
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
A Makelov, G Lange, N Nanda
ICLR 2024 Workshop on Secure and Trustworthy Large Language Models