Amirkeivan Mohtashami
Title · Cited by · Year
Meditron-70B: Scaling medical pretraining for large language models
Z Chen, AH Cano, A Romanou, A Bonnet, K Matoba, F Salvi, ...
arXiv preprint arXiv:2311.16079, 2023
Cited by 75 · 2023
Landmark Attention: Random-Access Infinite Context Length for Transformers
A Mohtashami, M Jaggi
Advances in Neural Information Processing Systems (NeurIPS) 2023, 2023
Cited by 62* · 2023
Masked Training of Neural Networks with Partial Gradients
A Mohtashami, M Jaggi, SU Stich
The 25th International Conference on Artificial Intelligence and Statistics, 2021
Cited by 27* · 2021
Critical parameters for scalable distributed learning with large batches and asynchronous updates
S Stich, A Mohtashami, M Jaggi
International Conference on Artificial Intelligence and Statistics, 4042-4050, 2021
Cited by 19 · 2021
Characterizing & finding good data orderings for fast convergence of sequential gradient methods
A Mohtashami, S Stich, M Jaggi
arXiv preprint arXiv:2202.01838, 2022
Cited by 13 · 2022
The splay-list: A distribution-adaptive concurrent skip-list
V Aksenov, D Alistarh, A Drozdova, A Mohtashami
34th International Symposium on Distributed Computing 179, 2020
Cited by 10 · 2020
Special Properties of Gradient Descent with Large Learning Rates
A Mohtashami, M Jaggi, S Stich
ICML 2023, 2022
Cited by 9* · 2022
QuaRot: Outlier-free 4-bit inference in rotated LLMs
S Ashkboos, A Mohtashami, ML Croci, B Li, M Jaggi, D Alistarh, T Hoefler, ...
arXiv preprint arXiv:2404.00456, 2024
Cited by 6 · 2024
Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models
A Mohtashami, M Verzetti, PK Rubenstein
Practical ML for Developing Countries Workshop @ ICLR 2023, 2023
Cited by 5 · 2023
Social Learning: Towards Collaborative Learning with Large Language Models
A Mohtashami, F Hartmann, S Gooding, L Zilka, M Sharifi, ...
arXiv preprint arXiv:2312.11441, 2023
Cited by 2 · 2023
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
M Pagliardini, A Mohtashami, F Fleuret, M Jaggi
arXiv preprint arXiv:2402.02622, 2024
Cited by 1 · 2024
CoTFormer: More Tokens With Attention Make Up For Less Depth
A Mohtashami, M Pagliardini, M Jaggi
Workshop on Advancing Neural Network Training @ NeurIPS 2023, 2023
Cited by 1 · 2023
Reproducibility Report for "On Warm-Starting Neural Network Training"
A Mohtashami, E Pajouheshgar, K Kireev
ML Reproducibility Challenge 2020, 2021
2021
A Gradient-Based Approach to Neural Networks Structure Learning
AA Moinfar, A Mohtashami, M Soleymani, A Sharifi-Zarchi
2019
TPS (Task Preparation System): A Tool for Developing Tasks in Programming Contests
K Mirjalali, AK Mohtashami, M Roghani, H Zarrabi-Zadeh
2019
MLO
J Bachmann Ona, SA Bahreinian, LF Barba Flores, WA Ben Naceur, ...
Articles 1–16