Catherine Olsson
Anthropic
Verified email at mit.edu
Title
Cited by
Year
Estimating the reproducibility of psychological science
Open Science Collaboration
Science 349 (6251), aac4716, 2015
Cited by: 9148, Year: 2015
Dota 2 with large scale deep reinforcement learning
C Berner, G Brockman, B Chan, V Cheung, P Dębiak, C Dennison, ...
arXiv preprint arXiv:1912.06680, 2019
Cited by: 1645, Year: 2019
An open, large-scale, collaborative effort to estimate the reproducibility of psychological science
Open Science Collaboration
Perspectives on Psychological Science 7, 657-660, 2012
Cited by: 725, Year: 2012
Training a helpful and harmless assistant with reinforcement learning from human feedback
Y Bai, A Jones, K Ndousse, A Askell, A Chen, N DasSarma, D Drain, ...
arXiv preprint arXiv:2204.05862, 2022
Cited by: 635, Year: 2022
Constitutional AI: Harmlessness from AI feedback
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
arXiv preprint arXiv:2212.08073, 2022
Cited by: 539, Year: 2022
TensorFuzz: Debugging neural networks with coverage-guided fuzzing
A Odena, C Olsson, D Andersen, I Goodfellow
International Conference on Machine Learning, 4901-4911, 2019
Cited by: 335, Year: 2019
Language models (mostly) know what they know
S Kadavath, T Conerly, A Askell, T Henighan, D Drain, E Perez, ...
arXiv preprint arXiv:2207.05221, 2022
Cited by: 213, Year: 2022
A general language assistant as a laboratory for alignment
A Askell, Y Bai, A Chen, D Drain, D Ganguli, T Henighan, A Jones, ...
arXiv preprint arXiv:2112.00861, 2021
Cited by: 200, Year: 2021
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned
D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, ...
arXiv preprint arXiv:2209.07858, 2022
Cited by: 196, Year: 2022
In-context learning and induction heads
C Olsson, N Elhage, N Nanda, N Joseph, N DasSarma, T Henighan, ...
arXiv preprint arXiv:2209.11895, 2022
Cited by: 174, Year: 2022
Predictability and surprise in large generative models
D Ganguli, D Hernandez, L Lovitt, A Askell, Y Bai, A Chen, T Conerly, ...
Proceedings of the 2022 ACM Conference on Fairness, Accountability, and …, 2022
Cited by: 161, Year: 2022
Discriminator rejection sampling
S Azadi, C Olsson, T Darrell, I Goodfellow, A Odena
arXiv preprint arXiv:1810.06758, 2018
Cited by: 147, Year: 2018
A mathematical framework for transformer circuits
N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, ...
Transformer Circuits Thread 1, 1, 2021
Cited by: 143, Year: 2021
Toy models of superposition
N Elhage, T Hume, C Olsson, N Schiefer, T Henighan, S Kravec, ...
arXiv preprint arXiv:2209.10652, 2022
Cited by: 133, Year: 2022
Is generator conditioning causally related to GAN performance?
A Odena, J Buckman, C Olsson, T Brown, C Olah, C Raffel, I Goodfellow
International conference on machine learning, 3849-3858, 2018
Cited by: 133, Year: 2018
Discovering language model behaviors with model-written evaluations
E Perez, S Ringer, K Lukošiūtė, K Nguyen, E Chen, S Heiner, C Pettit, ...
arXiv preprint arXiv:2212.09251, 2022
Cited by: 119, Year: 2022
Dawn Drain
N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, ...
Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson …, 2021
Cited by: 113, Year: 2021
Dota 2 with large scale deep reinforcement learning
CB OpenAI, G Brockman, B Chan, V Cheung, P Debiak, C Dennison, ...
arXiv preprint arXiv:1912.06680 2, 2019
Cited by: 102, Year: 2019
Dawn Drain
C Olsson, N Elhage, NJ Neel Nanda, N DasSarma, T Henighan, B Mann, ...
Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy …, 2022
Cited by: 99, Year: 2022
Unrestricted adversarial examples
TB Brown, N Carlini, C Zhang, C Olsson, P Christiano, I Goodfellow
arXiv preprint arXiv:1809.08352, 2018
Cited by: 94, Year: 2018