Elizabeth Barnes
Alignment Research Center
Verified email at alignment.org
Title
Cited by
Year
Evaluating large language models trained on code
M Chen, J Tworek, H Jun, Q Yuan, HPO Pinto, J Kaplan, H Edwards, ...
arXiv preprint arXiv:2107.03374, 2021
Cited by: 2022; Year: 2021
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ...
arXiv preprint arXiv:2206.04615, 2022
Cited by: 746; Year: 2022
Toward trustworthy AI development: mechanisms for supporting verifiable claims
M Brundage, S Avin, J Wang, H Belfield, G Krueger, G Hadfield, H Khlaaf, ...
arXiv preprint arXiv:2004.07213, 2020
Cited by: 329; Year: 2020
Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings
T Everitt, PA Ortega, E Barnes, S Legg
arXiv preprint arXiv:1902.09980, 2019
Cited by: 30; Year: 2019
Understanding agent incentives using causal influence diagrams. Part I: Single action settings
T Everitt, PA Ortega, E Barnes, S Legg
CoRR, abs/1902.09980, 2019
Cited by: 4; Year: 2019
Advanced Artificial Intelligence: Policy and Strategy
E Barnes
CUSPE, 2016
Cited by: 3; Year: 2016
Evaluating large language models trained on code (2021)
M Chen, J Tworek, H Jun, Q Yuan, HP de Oliveira Pinto, J Kaplan, ...
arXiv preprint arXiv:2107.03374, 2021
Year: 2021
Reflection Mechanisms as an Alignment Target: A Survey
M Hobbhahn, E Landgrebe, E Barnes
NeurIPS ML Safety Workshop