Big Bird: Transformers for longer sequences M Zaheer, G Guruganesh, KA Dubey, J Ainslie, C Alberti, S Ontanon, ... Advances in Neural Information Processing Systems 33, 17283-17297, 2020 | 605 | 2020 |
ETC: Encoding long and structured inputs in transformers J Ainslie, S Ontanon, C Alberti, V Cvicek, Z Fisher, P Pham, A Ravula, ... arXiv preprint arXiv:2004.08483, 2020 | 105 | 2020 |
FNet: Mixing tokens with Fourier transforms J Lee-Thorp, J Ainslie, I Eckstein, S Ontanon arXiv preprint arXiv:2105.03824, 2021 | 89 | 2021 |
RealFormer: Transformer likes residual attention R He, A Ravula, B Kanagal, J Ainslie arXiv preprint arXiv:2012.11747, 2020 | 20 | 2020 |
Making transformers solve compositional tasks S Ontanón, J Ainslie, V Cvicek, Z Fisher arXiv preprint arXiv:2108.04378, 2021 | 11 | 2021 |
LongT5: Efficient text-to-text transformer for long sequences M Guo, J Ainslie, D Uthus, S Ontanon, J Ni, YH Sung, Y Yang arXiv preprint arXiv:2112.07916, 2021 | 5 | 2021 |
ReadTwice: Reading very large documents with memories Y Zemlyanskiy, J Ainslie, M de Jong, P Pham, I Eckstein, F Sha arXiv preprint arXiv:2105.04241, 2021 | 4 | 2021 |
Improving compositional generalization in classification tasks via structure annotations J Kim, P Ravikumar, J Ainslie, S Ontañón arXiv preprint arXiv:2106.10434, 2021 | 3 | 2021 |
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction CY Lee, CL Li, T Dozat, V Perot, G Su, N Hua, J Ainslie, R Wang, Y Fujii, ... arXiv preprint arXiv:2203.08411, 2022 | 2 | 2022 |
Iterative decoding for compositional generalization in transformers L Ruiz, J Ainslie, S Ontañón arXiv preprint arXiv:2110.04169, 2021 | 2 | 2021 |
Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT J Lee-Thorp, J Ainslie arXiv preprint arXiv:2205.12399, 2022 | | 2022 |
Attention neural networks with sparse attention mechanisms JT Ainslie, S Ontañón, P Pham, M Zaheer, G Guruganesh, KA Dubey, ... US Patent App. 17/589,542, 2022 | | 2022 |
LogicInference: A New Dataset for Teaching Logical Inference to seq2seq Models S Ontanon, J Ainslie, V Cvicek, Z Fisher arXiv preprint arXiv:2203.15099, 2022 | | 2022 |
Attention neural networks with sparse attention mechanisms JT Ainslie, S Ontañón, P Pham, M Zaheer, G Guruganesh, KA Dubey, ... US Patent 11,238,332, 2022 | | 2022 |
ShopTalk: A System for Conversational Faceted Search G Manku, J Lee-Thorp, B Kanagal, J Ainslie, J Feng, Z Pearson, E Anjorin, ... arXiv preprint arXiv:2109.00702, 2021 | | 2021 |