Scalable Bayesian optimization using deep neural networks J Snoek, O Rippel, K Swersky, R Kiros, N Satish, N Sundaram, M Patwary, ... International conference on machine learning, 2171-2180, 2015 | 1284 | 2015 |

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU VW Lee, C Kim, J Chhugani, M Deisher, D Kim, AD Nguyen, N Satish, ... Proceedings of the 37th annual international symposium on Computer …, 2010 | 1207 | 2010 |

Designing efficient sorting algorithms for manycore GPUs N Satish, M Harris, M Garland 2009 IEEE International Symposium on Parallel & Distributed Processing, 1-10, 2009 | 921 | 2009 |

Sort vs. hash revisited: Fast join implementation on modern multi-core CPUs C Kim, T Kaldewey, VW Lee, E Sedlar, AD Nguyen, N Satish, J Chhugani, ... Proceedings of the VLDB Endowment 2 (2), 1378-1389, 2009 | 437 | 2009 |

FAST: fast architecture sensitive tree search on modern CPUs and GPUs C Kim, J Chhugani, N Satish, E Sedlar, AD Nguyen, T Kaldewey, VW Lee, ... Proceedings of the 2010 ACM SIGMOD International Conference on Management of …, 2010 | 435 | 2010 |

ClearPath: highly parallel collision avoidance for multi-agent simulation SJ Guy, J Chhugani, C Kim, N Satish, M Lin, D Manocha, P Dubey Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer …, 2009 | 431 | 2009 |

Graphicionado: A high-performance and energy-efficient accelerator for graph analytics TJ Ham, L Wu, N Sundaram, N Satish, M Martonosi 2016 49th annual IEEE/ACM international symposium on microarchitecture …, 2016 | 426 | 2016 |

GraphMat: High performance graph analytics made productive N Sundaram, NR Satish, MMA Patwary, SR Dulloor, SG Vadlamudi, ... arXiv preprint arXiv:1503.07241, 2015 | 397 | 2015 |

3.5-D blocking optimization for stencil computations on modern CPUs and GPUs A Nguyen, N Satish, J Chhugani, C Kim, P Dubey SC'10: Proceedings of the 2010 ACM/IEEE International Conference for High …, 2010 | 396 | 2010 |

Glow: Graph lowering compiler techniques for neural networks N Rotem, J Fix, S Abdulrasool, G Catron, S Deng, R Dzhabarov, N Gibson, ... arXiv preprint arXiv:1805.00907, 2018 | 320 | 2018 |

Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort N Satish, C Kim, J Chhugani, AD Nguyen, VW Lee, D Kim, P Dubey Proceedings of the 2010 ACM SIGMOD International Conference on Management of …, 2010 | 318 | 2010 |

DySER: Unifying functionality and parallelism specialization for energy-efficient computing V Govindaraju, CH Ho, T Nowatzki, J Chhugani, N Satish, ... IEEE Micro 32 (5), 38-51, 2012 | 306 | 2012 |

Data tiering in heterogeneous memory systems SR Dulloor, A Roy, Z Zhao, N Sundaram, N Satish, R Sankaran, ... Proceedings of the Eleventh European Conference on Computer Systems, 1-16, 2016 | 266 | 2016 |

Navigating the maze of graph analytics frameworks using massive graph datasets N Satish, N Sundaram, MMA Patwary, J Seo, J Park, MA Hassaan, ... Proceedings of the 2014 ACM SIGMOD international conference on Management of …, 2014 | 245 | 2014 |

Deep learning inference in Facebook data centers: Characterization, performance optimizations and hardware implications J Park, M Naumov, P Basu, S Deng, A Kalaiah, D Khudia, J Law, P Malani, ... arXiv preprint arXiv:1811.09886, 2018 | 210 | 2018 |

Fast updates on read-optimized databases using multi-core CPUs J Krueger, C Kim, M Grund, N Satish, D Schwalb, J Chhugani, H Plattner, ... arXiv preprint arXiv:1109.6885, 2011 | 191 | 2011 |

IMP: Indirect memory prefetcher X Yu, CJ Hughes, N Satish, S Devadas Proceedings of the 48th International Symposium on Microarchitecture, 178-190, 2015 | 190 | 2015 |

Streaming similarity search over one billion tweets using parallel locality-sensitive hashing N Sundaram, A Turmukhametova, N Satish, T Mostak, P Indyk, S Madden, ... Proceedings of the VLDB Endowment 6 (14), 1930-1941, 2013 | 172 | 2013 |

Can traditional programming bridge the ninja performance gap for parallel computing applications? N Satish, C Kim, J Chhugani, H Saito, R Krishnaiyer, M Smelyanskiy, ... ACM SIGARCH Computer Architecture News 40 (3), 440-451, 2012 | 148 | 2012 |

PALM: Parallel architecture-friendly latch-free modifications to B+ trees on many-core processors J Sewall, J Chhugani, C Kim, N Satish, P Dubey Proceedings of the VLDB Endowment 4 (11), 795-806, 2011 | 147 | 2011 |