Thinking fast and slow with deep learning and tree search TW Anthony, Z Tian, D Barber Advances in Neural Information Processing Systems, 5360-5370, 2017 | 173 | 2017 |
OpenSpiel: A framework for reinforcement learning in games M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ... arXiv preprint arXiv:1908.09453, 2019 | 43 | 2019 |
Policy Gradient Search: Online Planning and Expert Iteration without Search Trees TW Anthony, R Nishihara, P Moritz, T Salimans, J Schulman arXiv preprint arXiv:1904.03646, 2019 | 17 | 2019 |
From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization J Perolat, R Munos, JB Lespiau, S Omidshafiei, M Rowland, P Ortega, ... arXiv preprint arXiv:2002.08456, 2020 | 15 | 2020 |
Smooth markets: A basic mechanism for organizing gradient-based learners D Balduzzi, WM Czarnecki, TW Anthony, IM Gemp, E Hughes, JZ Leibo, ... arXiv preprint arXiv:2001.04678, 2020 | 8 | 2020 |
OpenSpiel: A Framework for Reinforcement Learning in Games. CoRR abs/1908.09453 (2019) M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ... arXiv preprint cs.LG/1908.09453, 2019 | 7 | 2019 |
Learning to Play No-Press Diplomacy with Best Response Policy Iteration T Anthony, T Eccles, A Tacchetti, J Kramár, I Gemp, TC Hudson, N Porcel, ... arXiv preprint arXiv:2006.04635, 2020 | 6 | 2020 |
Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games E Hughes, TW Anthony, T Eccles, JZ Leibo, D Balduzzi, Y Bachrach arXiv preprint arXiv:2003.00799, 2020 | 3 | 2020 |
Learning to Play against Any Mixture of Opponents MO Smith, T Anthony, Y Wang, MP Wellman arXiv preprint arXiv:2009.14180, 2020 | 2 | 2020 |
On the role of planning in model-based deep reinforcement learning JB Hamrick, AL Friesen, F Behbahani, A Guez, F Viola, S Witherspoon, ... arXiv preprint arXiv:2011.04021, 2020 | | 2020 |
Multiagent Reinforcement Learning in Games with an Iterated Dominance Solution Y Bachrach, T Lattimore, M Garnelo, J Perolat, D Balduzzi, T Anthony, ... | | 2019 |
Neural Design of Contests and All-Pay Auctions using Multi-Agent Simulation T Anthony, I Gemp, J Kramar, T Eccles, A Tacchetti, Y Bachrach | | 2019 |
Iterative Empirical Game Solving via Single Policy Best Response MO Smith, T Anthony, MP Wellman | | |