| Title | Cited by | Year |
|---|---|---|
| Demystifying parallel and distributed deep learning: An in-depth concurrency analysis. T Ben-Nun, T Hoefler. ACM Computing Surveys (CSUR) 52 (4), 1-43, 2019 | 473 | 2019 |
| A package for OpenCL based heterogeneous computing on clusters with many GPU devices. A Barak, T Ben-Nun, E Levy, A Shiloh. 2010 IEEE international conference on cluster computing workshops and …, 2010 | 148 | 2010 |
| Neural Code Comprehension: A Learnable Representation of Code Semantics. T Ben-Nun, AS Jakobovits, T Hoefler. Advances in Neural Information Processing Systems 31, 2018 | 138 | 2018 |
| Groute: An asynchronous multi-GPU programming model for irregular computations. T Ben-Nun, M Sutton, S Pai, K Pingali. ACM SIGPLAN Notices 52 (8), 235-248, 2017 | 118 | 2017 |
| Augment your batch: Improving generalization through instance repetition. E Hoffer, T Ben-Nun, I Hubara, N Giladi, T Hoefler, D Soudry. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 108* | 2020 |
| Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks. T Hoefler, D Alistarh, T Ben-Nun, N Dryden, A Peste. Journal of Machine Learning Research 22 (241), 1-124, 2021 | 84 | 2021 |
| A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning. T Ben-Nun, M Besta, S Huber, AN Ziogas, D Peter, T Hoefler. The 33rd IEEE International Parallel & Distributed Processing Symposium …, 2019 | 66 | 2019 |
| Memory access patterns: The missing piece of the multi-GPU puzzle. T Ben-Nun, E Levy, A Barak, E Rubin. SC'15: Proceedings of the International Conference for High Performance …, 2015 | 64 | 2015 |
| Solution X-ray scattering form factors of supramolecular self-assembled structures. P Székely, A Ginsburg, T Ben-Nun, U Raviv. Langmuir 26 (16), 13110-13129, 2010 | 62 | 2010 |
| X+: a comprehensive computationally accelerated structure analysis tool for solution X-ray scattering from supramolecular self-assemblies. T Ben-Nun, A Ginsburg, P Székely, U Raviv. Journal of Applied Crystallography 43 (6), 1522-1531, 2010 | 58 | 2010 |
| Stateful dataflow multigraphs: A data-centric model for performance portability on heterogeneous architectures. T Ben-Nun, J de Fine Licht, AN Ziogas, T Schneider, T Hoefler. Proceedings of the International Conference for High Performance Computing …, 2019 | 55 | 2019 |
| ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations. C Cummins, ZV Fisches, T Ben-Nun, T Hoefler, MFP O’Boyle, H Leather. International Conference on Machine Learning, 2244-2253, 2021 | 50* | 2021 |
| Deep learning for post-processing ensemble weather forecasts. P Grönquist, C Yao, T Ben-Nun, N Dryden, P Dueben, S Li, T Hoefler. Philosophical Transactions of the Royal Society A 379 (2194), 1-18, 2021 | 42 | 2021 |
| Graph processing on FPGAs: Taxonomy, survey, challenges. M Besta, D Stanojevic, JDF Licht, T Ben-Nun, T Hoefler. arXiv preprint arXiv:1903.06697, 2019 | 41 | 2019 |
| Taming unbalanced training workloads in deep learning with partial collective operations. S Li, T Ben-Nun, SD Girolamo, D Alistarh, T Hoefler. Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of …, 2020 | 31 | 2020 |
| A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations. AN Ziogas, T Ben-Nun, GI Fernández, T Schneider, M Luisier, T Hoefler. Proceedings of the International Conference for High Performance Computing …, 2019 | 29 | 2019 |
| Optimizing Parallel Graph Connectivity Computation via Subgraph Sampling. M Sutton, T Ben-Nun, A Barak. IEEE International Parallel and Distributed Processing Symposium, 2018 | 29 | 2018 |
| MAPS: Optimizing Massively Parallel Applications using Device-Level Memory Abstraction. E Rubin, E Levy, A Barak, T Ben-Nun. ACM Transactions on Architecture and Code Optimization (TACO) 11 (4), 44, 2015 | 27 | 2015 |
| Substream-Centric Maximum Matchings on FPGA. M Besta, M Fischer, T Ben-Nun, J de Fine Licht, T Hoefler. 27th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays …, 2018 | 25 | 2018 |
| Accelerating deep learning frameworks with micro-batches. Y Oyama, T Ben-Nun, T Hoefler, S Matsuoka. 2018 IEEE International Conference on Cluster Computing (CLUSTER), 402-412, 2018 | 22 | 2018 |