A detailed and flexible cycle-accurate network-on-chip simulator N Jiang, DU Becker, G Michelogiannakis, J Balfour, B Towles, DE Shaw, ... 2013 IEEE international symposium on performance analysis of systems and …, 2013 | 896 | 2013 |
Simba: Scaling deep-learning inference with multi-chip-module-based architecture YS Shao, J Clemons, R Venkatesan, B Zimmer, M Fojtik, N Jiang, B Keller, ... Proceedings of the 52nd Annual IEEE/ACM International Symposium on …, 2019 | 445 | 2019 |
Indirect adaptive routing on large scale interconnection networks N Jiang, J Kim, WJ Dally Proceedings of the 36th annual international symposium on Computer …, 2009 | 186 | 2009 |
A 0.32–128 TOPS, scalable multi-chip-module-based deep neural network inference accelerator with ground-referenced signaling in 16 nm B Zimmer, R Venkatesan, YS Shao, J Clemons, M Fojtik, N Jiang, B Keller, ... IEEE Journal of Solid-State Circuits 55 (4), 920-932, 2020 | 106 | 2020 |
BookSim 2.0 user’s guide N Jiang, G Michelogiannakis, D Becker, B Towles, WJ Dally Stanford University, q1, 2010 | 78 | 2010 |
Network congestion avoidance through speculative reservation N Jiang, DU Becker, G Michelogiannakis, WJ Dally IEEE International Symposium on High-Performance Comp Architecture, 1-12, 2012 | 77 | 2012 |
An in-network architecture for accelerating shared-memory multiprocessor collectives B Klenk, N Jiang, G Thorson, L Dennison 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture …, 2020 | 63 | 2020 |
Network endpoint congestion control for fine-grained communication N Jiang, L Dennison, WJ Dally Proceedings of the International Conference for High Performance Computing …, 2015 | 59 | 2015 |
Packet chaining: Efficient single-cycle allocation for on-chip networks G Michelogiannakis, N Jiang, D Becker, WJ Dally Proceedings of the 44th Annual IEEE/ACM International Symposium on …, 2011 | 57 | 2011 |
A 0.11 pJ/op, 0.32-128 TOPS, scalable multi-chip-module-based deep neural network accelerator with ground-reference signaling in 16nm B Zimmer, R Venkatesan, YS Shao, J Clemons, M Fojtik, N Jiang, B Keller, ... 2019 Symposium on VLSI Circuits, C300-C301, 2019 | 55 | 2019 |
Adaptive backpressure: Efficient buffer management for on-chip networks DU Becker, N Jiang, G Michelogiannakis, WJ Dally 2012 IEEE 30th International Conference on Computer Design (ICCD), 419-426, 2012 | 42 | 2012 |
Use of stashing buffers to improve the efficiency of crossbar switches MA Blumrich, N Jiang, LR Dennison US Patent 11,108,704, 2021 | 34 | 2021 |
BookSim interconnection network simulator N Jiang, G Michelogiannakis, D Becker, B Towles, W Dally Online, https://nocs.stanford.edu/cgi-bin/trac.cgi/wiki/Resources/BookSim, 2016 | 30 | 2016 |
Exploiting idle resources in a high-radix switch for supplemental storage MA Blumrich, N Jiang, LR Dennison SC18: International Conference for High Performance Computing, Networking …, 2018 | 29 | 2018 |
Network endpoint congestion management N Jiang, LR Dennison, WJ Dally US Patent 10,063,481, 2018 | 29 | 2018 |
A MIPS R2000 implementation N Pinckney, T Barr, M Dayringer, M McKnett, N Jiang, C Nygaard, ... Proceedings of the 45th Annual Design Automation Conference, 102-107, 2008 | 22 | 2008 |
Parallelized radix-2 scalable Montgomery multiplier N Jiang, D Harris 2007 IFIP International Conference on Very Large Scale Integration, 146-150, 2007 | 22 | 2007 |
Simba: scaling deep-learning inference with chiplet-based architecture YS Shao, J Clemons, R Venkatesan, B Zimmer, M Fojtik, N Jiang, B Keller, ... Communications of the ACM 64 (6), 107-116, 2021 | 21 | 2021 |
A 0.11 pJ/op, 0.32-128 TOPS, scalable multi-chip-module-based deep neural network accelerator designed with a high-productivity VLSI methodology R Venkatesan, YS Shao, B Zimmer, J Clemons, M Fojtik, N Jiang, B Keller, ... 2019 IEEE Hot Chips 31 Symposium (HCS), 1-24, 2019 | 13 | 2019 |
Scalable in-network computation for massively-parallel shared-memory processors B Klenk, N Jiang, LR Dennison, GM Thorson US Patent 11,171,798, 2021 | 5 | 2021 |