PyTorch: An imperative style, high-performance deep learning library A Paszke, S Gross, F Massa, A Lerer, J Bradbury, G Chanan, T Killeen, ... Advances in Neural Information Processing Systems, 8024-8035, 2019 | 47643 | 2019 |
PaLM: Scaling Language Modeling with Pathways A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... arXiv preprint arXiv:2204.02311, 2022 | 4514 | 2022 |
JAX: composable transformations of Python+NumPy programs J Bradbury, R Frostig, P Hawkins, MJ Johnson, C Leary, D Maclaurin, ... https://github.com/google/jax, 18, 2018 | 2722 | 2018 |
Pointer Sentinel Mixture Models S Merity, C Xiong, J Bradbury, R Socher ICLR 2017, 2016 | 2245 | 2016 |
Ask me anything: Dynamic memory networks for natural language processing A Kumar, O Irsoy, P Ondruska, M Iyyer, J Bradbury, I Gulrajani, V Zhong, ... ICML 2016, 2016 | 1558 | 2016 |
Gemini: a family of highly capable multimodal models Gemini Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023 | 1493 | 2023 |
Learned in Translation: Contextualized Word Vectors B McCann, J Bradbury, C Xiong, R Socher NIPS 2017, 2017 | 1272 | 2017 |
PaLM 2 technical report R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... arXiv preprint arXiv:2305.10403, 2023 | 1219 | 2023 |
Scaling language models: Methods, analysis & insights from training gopher JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ... arXiv preprint arXiv:2112.11446, 2021 | 947 | 2021 |
Non-Autoregressive Neural Machine Translation J Gu, J Bradbury, C Xiong, VOK Li, R Socher ICLR 2018, 2018 | 848 | 2018 |
Quasi-Recurrent Neural Networks J Bradbury, S Merity, C Xiong, R Socher ICLR 2017, 2016 | 631 | 2016 |
PaLI: A Jointly-Scaled Multilingual Language-Image Model X Chen, X Wang, S Changpinyo, AJ Piergiovanni, P Padlewski, D Salz, ... arXiv preprint arXiv:2209.06794, 2022 | 535 | 2022 |
OpenSpiel: A framework for reinforcement learning in games M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ... arXiv preprint arXiv:1908.09453, 2019 | 271 | 2019 |
Efficiently Scaling Transformer Inference R Pope, S Douglas, A Chowdhery, J Devlin, J Bradbury, A Levskaya, ... MLSys 2023, 2022 | 225 | 2022 |
Scaling Up Models and Data with t5x and seqio A Roberts, HW Chung, A Levskaya, G Mishra, J Bradbury, D Andor, ... arXiv preprint arXiv:2203.17189, 2022 | 143* | 2022 |
VeLO: Training versatile learned optimizers by scaling up L Metz, J Harrison, CD Freeman, A Merchant, L Beyer, J Bradbury, ... arXiv preprint arXiv:2211.09760, 2022 | 59 | 2022 |
On Machine Learning and Programming Languages M Innes, S Karpinski, V Shah, D Barber, P Stenetorp, T Besard, ... SysML 2018, 2018 | 25 | 2018 |
A Flexible Approach to Automated RNN Architecture Generation M Schrimpf, S Merity, J Bradbury, R Socher ICLR Workshop 2018, 2018 | 23 | 2018 |
Exploring the limits of Concurrency in ML Training on Google TPUs S Kumar, Y Wang, C Young, J Bradbury, N Kumar, D Chen, A Swing MLSys 2021, 2021 | 19 | 2021 |
Towards Neural Machine Translation with Latent Tree Attention J Bradbury, R Socher SPNLP 2017, 2017 | 18 | 2017 |