Mostofa Patwary

Cited by

	All	Since 2019
Citations	9382	7594
h-index	39	33
i10-index	62	51

Citations per year

Year	Citations
2012	46
2013	81
2014	130
2015	222
2016	337
2017	391
2018	487
2019	561
2020	815
2021	989
2022	1327
2023	1876
2024	2014

Public access (based on funding mandates)

13 articles available
2 articles not available

Co-authors

Mohammad Shoeybi, Director of Applied Research at NVIDIA. Verified email at nvidia.com
Bryan Catanzaro, NVIDIA. Verified email at acm.org
Narayanan Sundaram, Meta. Verified email at fb.com
Alok Choudhary, Professor, Northwestern University. Verified email at eecs.northwestern.edu
Pradeep Dubey, Intel Corporation. Verified email at intel.com
Raul Puri, Research Scientist, OpenAI. Verified email at openai.com
Fredrik Manne, Department of Informatics, University of Bergen. Verified email at ii.uib.no
Jared Casper, Research Scientist, NVIDIA. Verified email at nvidia.com
Assefaw Gebremedhin, Washington State University, School of EECS. Verified email at eecs.wsu.edu
Gregory Diamos, Landing AI. Verified email at landing.ai
Ryan P. Adams, Princeton University. Verified email at princeton.edu
Alex Pothen, Professor of Computer Science, Purdue University. Verified email at purdue.edu
Rob H. Bisseling, Professor in Mathematics, Utrecht University. Verified email at uu.nl
Nadathur Satish

Applied Deep Learning Research, NVIDIA

Verified email at nvidia.com

Natural Language Processing, Large-Scale Deep Learning, High Performance Computing, Parallel Algorithms, Algorithm Engineering


Title	Cited by	Year
Megatron-LM: Training Multi-Billion Parameter Language Models Using GPU Model Parallelism. M Shoeybi, M Patwary, R Puri, P LeGresley, J Casper, B Catanzaro. arXiv preprint arXiv:1909.08053, 2019	1674	2019
Scalable Bayesian Optimization Using Deep Neural Networks. J Snoek, O Rippel, K Swersky, R Kiros, N Satish, N Sundaram, M Patwary, ... arXiv preprint arXiv:1502.05700, 2015	1324	2015
Deep learning scaling is predictable, empirically. J Hestness, S Narang, N Ardalani, G Diamos, H Jun, H Kianinejad, ... arXiv preprint arXiv:1712.00409, 2017	751	2017
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model. S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ... arXiv preprint arXiv:2201.11990, 2022	730*	2022
Efficient large-scale language model training on GPU clusters using megatron-LM. D Narayanan, M Shoeybi, J Casper, P LeGresley, M Patwary, ... Proceedings of the International Conference for High Performance Computing …, 2021	586	2021
Twitter trending topic classification. K Lee, D Palsetia, R Narayanan, MMA Patwary, A Agrawal, A Choudhary. Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on …, 2011	481	2011
GraphMat: High performance graph analytics made productive. N Sundaram, N Satish, MMA Patwary, SR Dulloor, MJ Anderson, ... Proceedings of the VLDB Endowment 8 (11), 1214-1225, 2015	406	2015
Navigating the maze of graph analytics frameworks using massive graph datasets. N Satish, N Sundaram, MMA Patwary, J Seo, J Park, MA Hassaan, ... Proceedings of the 2014 ACM SIGMOD international conference on Management of …, 2014	246	2014
A new scalable parallel DBSCAN algorithm using the disjoint-set data structure. MMA Patwary, D Palsetia, A Agrawal, W Liao, F Manne, A Choudhary. SC'12: Proceedings of the International Conference on High Performance …, 2012	234	2012
Training Question Answering Models From Synthetic Data. R Puri, R Spring, M Patwary, M Shoeybi, B Catanzaro. arXiv preprint arXiv:2002.09599, 2020	163	2020
Controllable Story Generation with External Knowledge Using Large-Scale Language Models. P Xu, M Patwary, M Shoeybi, R Puri, P Fung, A Anandkumar, B Catanzaro. Proceedings of the 2020 Conference on Empirical Methods in Natural Language …, 2020	150*	2020
Factuality enhanced language models for open-ended text generation. N Lee, W Ping, P Xu, M Patwary, PN Fung, M Shoeybi, B Catanzaro. Advances in Neural Information Processing Systems 35, 34586-34599, 2022	145	2022
BioMegatron: Larger Biomedical Domain Language Model. HC Shin, Y Zhang, E Bakhturina, R Puri, M Patwary, M Shoeybi, R Mani. Proceedings of the 2020 Conference on Empirical Methods in Natural Language …, 2020	140	2020
Fast maximum clique algorithms for large graphs. RA Rossi, DF Gleich, AH Gebremedhin, MMA Patwary. Proceedings of the companion publication of the 23rd international …, 2014	118	2014
Fast Algorithms for the Maximum Clique Problem on Massive Sparse Graphs. B Pattabiraman, M Patwary, M Ali, AH Gebremedhin, W Liao, ... arXiv preprint arXiv:1209.5818, 2012	112	2012
StarCoder 2 and The Stack v2: The Next Generation. A Lozhkov, R Li, LB Allal, F Cassano, J Lamy-Poirier, N Tazi, A Tang, ... arXiv preprint arXiv:2402.19173, 2024	102	2024
ColPack: Software for graph coloring and related problems in scientific computing. AH Gebremedhin, D Nguyen, MMA Patwary, A Pothen. ACM Transactions on Mathematical Software (TOMS) 40 (1), 1-31, 2013	102	2013
Deep learning at 15PF: supervised and semi-supervised classification for scientific data. T Kurth, J Zhang, N Satish, E Racah, I Mitliagkas, MMA Patwary, T Malas, ... Proceedings of the International Conference for High Performance Computing …, 2017	96	2017
End-to-End Training of Neural Retrievers for Open-Domain Question Answering. DS Sachan, M Patwary, M Shoeybi, N Kant, W Ping, WL Hamilton, ... arXiv preprint arXiv:2101.00408, 2021	94	2021
Parallel efficient sparse matrix-matrix multiplication on multicore platforms. MMA Patwary, NR Satish, N Sundaram, J Park, MJ Anderson, ... International Conference on High Performance Computing, 48-57, 2015	84	2015
