BLOOM: A 176B-Parameter Open-Access Multilingual Language Model TL Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... arXiv preprint arXiv:2211.05100, 2022 | 1208 | 2022 |
The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset H Laurençon, L Saulnier, T Wang, C Akiki, A Villanova del Moral, ... Advances in Neural Information Processing Systems 35, 31809-31826, 2022 | 117 | 2022 |
OBELICS: An open web-scale filtered dataset of interleaved image-text documents H Laurençon, L Saulnier, L Tronchon, S Bekman, A Singh, A Lozhkov, ... Advances in Neural Information Processing Systems 36, 2024 | 71 | 2024 |
The ROOTS search tool: Data transparency for LLMs A Piktus, C Akiki, P Villegas, H Laurençon, G Dupont, AS Luccioni, ... arXiv preprint arXiv:2302.14035, 2023 | 21 | 2023 |
DP-Parse: Finding word boundaries from raw speech with an instance lexicon R Algayres, T Ricoul, J Karadayi, H Laurençon, S Zaiem, A Mohamed, ... Transactions of the Association for Computational Linguistics 10, 1051-1065, 2022 | 11 | 2022 |
Continuous homeostatic reinforcement learning for self-regulated autonomous agents H Laurençon, CR Ségerie, J Lussange, BS Gutkin arXiv preprint arXiv:2109.06580, 2021 | 6 | 2021 |
CALM: A multi-task benchmark for comprehensive assessment of language model bias V Gupta, PN Venkit, H Laurençon, S Wilson, RJ Passonneau arXiv preprint arXiv:2308.12539, 2023 | 1 | 2023 |
What matters when building vision-language models? H Laurençon, L Tronchon, M Cord, V Sanh arXiv preprint arXiv:2405.02246, 2024 | | 2024 |
Unlocking the conversion of web screenshots into HTML code with the WebSight dataset H Laurençon, L Tronchon, V Sanh arXiv preprint arXiv:2403.09029, 2024 | | 2024 |