Łukasz Kaiser
OpenAI & CNRS
Verified email at openai.com
Title
Cited by
Year
Attention is all you need
A Vaswani
Advances in Neural Information Processing Systems, 2017
Cited by 135231 · 2017
TensorFlow: Large-scale machine learning on heterogeneous systems
M Abadi, A Agarwal, P Barham, E Brevdo, Z Chen, C Citro, GS Corrado, ...
Cited by 31520* · 2015
Google’s neural machine translation system: Bridging the gap between human and machine translation
Y Wu
arXiv preprint arXiv:1609.08144, 2016
Cited by 8937 · 2016
GPT-4 technical report
J Achiam, S Adler, S Agarwal, L Ahmad, I Akkaya, FL Aleman, D Almeida, ...
arXiv preprint arXiv:2303.08774, 2023
Cited by 3852 · 2023
Evaluating large language models trained on code
M Chen, J Tworek, H Jun, Q Yuan, HPDO Pinto, J Kaplan, H Edwards, ...
arXiv preprint arXiv:2107.03374, 2021
Cited by 2879 · 2021
Reformer: The efficient transformer
N Kitaev, Ł Kaiser, A Levskaya
arXiv preprint arXiv:2001.04451, 2020
Cited by 2668 · 2020
Image transformer
N Parmar, A Vaswani, J Uszkoreit, L Kaiser, N Shazeer, A Ku, D Tran
International conference on machine learning, 4055-4064, 2018
Cited by 1988 · 2018
Attention is all you need
A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ...
Advances in neural information processing systems, 2017
Cited by 1907 · 2017
Attention is all you need
A Vaswani
Advances in neural information processing systems 30, I, 2017
Cited by 1790 · 2017
Training verifiers to solve math word problems
K Cobbe, V Kosaraju, M Bavarian, M Chen, H Jun, L Kaiser, M Plappert, ...
arXiv preprint arXiv:2110.14168, 2021
Cited by 1725 · 2021
Rethinking attention with performers
K Choromanski, V Likhosherstov, D Dohan, X Song, A Gane, T Sarlos, ...
arXiv preprint arXiv:2009.14794, 2020
Cited by 1556 · 2020
Attention is all you need
A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ...
arXiv preprint arXiv:1706.03762, 2017
Cited by 1431 · 2017
Regularizing neural networks by penalizing confident output distributions
G Pereyra, G Tucker, J Chorowski, Ł Kaiser, G Hinton
arXiv preprint arXiv:1701.06548, 2017
Cited by 1264 · 2017
Grammar as a Foreign Language
O Vinyals
arXiv preprint arXiv:1412.7449, 2015
Cited by 1144 · 2015
Generating Wikipedia by summarizing long sequences
PJ Liu, M Saleh, E Pot, B Goodrich, R Sepassi, L Kaiser, N Shazeer
arXiv preprint arXiv:1801.10198, 2018
Cited by 981 · 2018
Model-based reinforcement learning for Atari
L Kaiser, M Babaeizadeh, P Milos, B Osinski, RH Campbell, ...
arXiv preprint arXiv:1903.00374, 2019
Cited by 974 · 2019
Multi-task sequence to sequence learning
MT Luong, QV Le, I Sutskever, O Vinyals, L Kaiser
arXiv preprint arXiv:1511.06114, 2015
Cited by 962 · 2015
Universal transformers
M Dehghani, S Gouws, O Vinyals, J Uszkoreit, Ł Kaiser
arXiv preprint arXiv:1807.03819, 2018
Cited by 959 · 2018
Tensor2tensor for neural machine translation
A Vaswani, S Bengio, E Brevdo, F Chollet, AN Gomez, S Gouws, L Jones, ...
arXiv preprint arXiv:1803.07416, 2018
Cited by 635 · 2018
Adding gradient noise improves learning for very deep networks
A Neelakantan, L Vilnis, QV Le, I Sutskever, L Kaiser, K Kurach, J Martens
arXiv preprint arXiv:1511.06807, 2015
Cited by 605 · 2015