The AdEMAMix Optimizer: Better, Faster, Older
This paper critiques the use of a single exponential moving average (EMA) of gradients in momentum-based optimizers and proposes AdEMAMix, which combines two EMAs to improve gradient relevance, speed up convergence, and reduce model forgetting during training.
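To make the "two EMAs" idea concrete, here is a minimal sketch of an Adam-style step that mixes a fast and a slow EMA of the gradient. It is an illustration under stated assumptions, not the paper's reference implementation: the function names (`init_state`, `two_ema_step`), the parameter names (`beta_fast`, `beta_slow`, `beta_v`, `alpha`), and all default values are hypothetical and do not come from this episode description. See the linked paper for the exact update rule and schedules.

```python
import math


def init_state(n):
    """Hypothetical helper: zero-initialised optimizer state for n parameters."""
    return {"t": 0, "m_fast": [0.0] * n, "m_slow": [0.0] * n, "v": [0.0] * n}


def two_ema_step(theta, grad, state, lr=0.1, beta_fast=0.9, beta_slow=0.999,
                 beta_v=0.999, alpha=1.0, eps=1e-8):
    """One Adam-style update mixing a fast and a slow EMA of the gradient.

    All names and defaults here are illustrative assumptions.
    """
    state["t"] += 1
    t = state["t"]
    new_theta = []
    for i, (p, g) in enumerate(zip(theta, grad)):
        # Fast EMA: reacts quickly to recent gradients.
        state["m_fast"][i] = beta_fast * state["m_fast"][i] + (1 - beta_fast) * g
        # Slow EMA: keeps a long memory of older gradients.
        state["m_slow"][i] = beta_slow * state["m_slow"][i] + (1 - beta_slow) * g
        # Second-moment estimate, as in Adam.
        state["v"][i] = beta_v * state["v"][i] + (1 - beta_v) * g * g
        # Bias-correct the fast EMA and the second moment (assumed choice).
        m_hat = state["m_fast"][i] / (1 - beta_fast ** t)
        v_hat = state["v"][i] / (1 - beta_v ** t)
        # Mix the two EMAs; alpha weights the slow one.
        update = (m_hat + alpha * state["m_slow"][i]) / (math.sqrt(v_hat) + eps)
        new_theta.append(p - lr * update)
    return new_theta


if __name__ == "__main__":
    state = init_state(1)
    theta = two_ema_step([1.0], [2.0], state)
    print(theta, state)
```

With a larger `beta_slow`, a gradient seen many steps ago still contributes to the update, which is the intuition behind the "Older" in the title.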
https://arxiv.org/abs//2409.03137
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
--- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/support
1645 episodes