Evaluating large language models trained on code. M Chen, J Tworek, H Jun, Q Yuan, HPO Pinto, J Kaplan, H Edwards, et al. arXiv preprint arXiv:2107.03374, 2021. Cited by 2022.
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, et al. arXiv preprint arXiv:2206.04615, 2022. Cited by 746.
Toward trustworthy AI development: Mechanisms for supporting verifiable claims. M Brundage, S Avin, J Wang, H Belfield, G Krueger, G Hadfield, H Khlaaf, et al. arXiv preprint arXiv:2004.07213, 2020. Cited by 329.
Understanding agent incentives using causal influence diagrams. Part I: Single action settings. T Everitt, PA Ortega, E Barnes, S Legg. arXiv preprint arXiv:1902.09980, 2019. Cited by 30.
Advanced Artificial Intelligence: Policy and Strategy. E Barnes. CUSPE, 2016. Cited by 3.
Reflection Mechanisms as an Alignment Target: A Survey. M Hobbhahn, E Landgrebe, E Barnes. NeurIPS ML Safety Workshop.