| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| SlimPajama: A 627B token cleaned and deduplicated version of RedPajama | D Soboleva, F Al-Khateeb, R Myers, JR Steeves, J Hestness, N Dey | https://www.cerebras.net/blog/slimpajama-a-627b-token-cleaned-and …, 2023 | 163* | 2023 |
| Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster | N Dey, G Gosal, ZC Chen, H Khachane, W Marshall, R Pathria, M Tom, ... | arXiv preprint arXiv:2304.03208, 2023 | 97 | 2023 |
| BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model | N Dey, D Soboleva, F Al-Khateeb, B Yang, R Pathria, H Khachane, ... | arXiv preprint arXiv:2309.11568, 2023 | 10* | 2023 |
| 37,000 Human-Planned Robotic Grasps With Six Degrees of Freedom | VR Osorio, R Iyengar, X Yao, P Bhattachan, A Ragobar, N Dey, B Tripp | IEEE Robotics and Automation Letters 5 (2), 3346-3351, 2020 | 5 | 2020 |
| Sparse maximal update parameterization: A holistic approach to sparse training dynamics | N Dey, S Bergsma, J Hestness | The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024 | 3 | 2024 |
| Position Interpolation Improves ALiBi Extrapolation | F Al-Khateeb, N Dey, D Soboleva, J Hestness | arXiv preprint arXiv:2310.13017, 2023 | 2 | 2023 |
| Studying CNN representations through activation dimensionality reduction and visualization | NS Dey | University of Waterloo, 2021 | 1 | 2021 |
| Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs | S Bergsma, NS Dey, G Gosal, G Gray, D Soboleva, J Hestness | The Thirteenth International Conference on Learning Representations, 2025 | | 2025 |
| Empirical Upper Bounds for Unstructured Sparsity in Compute-Efficient Language Modeling | E Singh, S Bergsma, NS Dey, J Hestness, G Gray | Workshop on Machine Learning and Compression, NeurIPS 2024, 2024 | | 2024 |
| The Practitioner’s Guide to the Maximal Update Parameterization | N Dey, Q Anthony, J Hestness | https://cerebras.ai/blog/the-practitioners-guide-to-the-maximal-update …, 2024 | | 2024 |
| Identifying and interpreting tuning dimensions in deep networks | NS Dey, JE Taylor, BP Tripp, A Wong, GW Taylor | NeurIPS 2020 Workshop on Shared Visual Representations in Human & Machine …, 2020 | | 2020 |