{getToc} $title={Table of Contents}
Summary
This paper studies nonconvex stochastic optimization under heavy-tailed noises, where the stochastic gradient noise is only assumed to have a finite p-th moment for some p in (1, 2]. The authors revisit the existing Batched Normalized Stochastic Gradient Descent with Momentum (Batched NSGDM) algorithm and show that it converges in expectation at the optimal O(T^((1-p)/(3p-2))) rate without gradient clipping.
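For concreteness, below is a minimal sketch of a Batched NSGDM step in its common formulation (exponential-moving-average momentum over a mini-batch of stochastic gradients, followed by a step along the normalized momentum direction). The function and parameter names are illustrative, and the exact parameterization in the paper may differ.

```python
import numpy as np

def batched_nsgdm(grad_oracle, x0, T, eta, beta, batch_size, rng):
    """Sketch of Batched NSGDM: momentum over batched stochastic
    gradients, then a normalized (fixed-length) descent step.
    No gradient clipping is applied anywhere."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(T):
        # Average a mini-batch of stochastic gradients.
        g = np.mean([grad_oracle(x, rng) for _ in range(batch_size)], axis=0)
        # Exponential-moving-average momentum.
        m = beta * m + (1.0 - beta) * g
        # Step of fixed length eta along the normalized momentum direction.
        x = x - eta * m / max(np.linalg.norm(m), 1e-12)
    return x
```

The point visible in the update is that each step has length exactly eta no matter how large the noisy gradient is; normalization, rather than clipping, is what keeps heavy-tailed noise from blowing up the iterates.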
Highlights
- Batched NSGDM converges in expectation at the optimal rate of O(T^((1-p)/(3p-2))) under heavy-tailed noises.
- The algorithm does not require gradient clipping, which is a common technique used in previous works.
- The authors also establish a refined lower complexity result that perfectly matches their new convergence theory for Batched NSGDM.
- The analysis is conducted under a generalized smoothness condition and a generalized heavy-tailed noise assumption (sketched after this list).
- The authors initiate the study of optimization under heavy-tailed noises with an unknown tail index p.
- Batched NSGDM can achieve the optimal rate even under the relaxed smoothness condition.
- The authors provide a simple strategy for setting the stepsize and momentum parameter when the tail index p is unknown.
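As a rough guide, the two conditions referenced above are usually written as follows; this is a sketch of their standard forms, and the generalized versions in the paper may differ in their exact statements and constants.

```latex
% (L_0, L_1)-relaxed (generalized) smoothness:
\|\nabla f(x) - \nabla f(y)\| \le \big(L_0 + L_1 \|\nabla f(x)\|\big)\,\|x - y\|

% Heavy-tailed stochastic gradient noise with tail index p \in (1, 2],
% where g(x) denotes the stochastic gradient at x:
\mathbb{E}\big[\|g(x) - \nabla f(x)\|^{p}\big] \le \sigma^{p}
```

For p < 2 the noise can have infinite variance, which is precisely the regime the classical finite-variance analysis of SGD does not cover.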
Key Insights
- The Batched NSGDM algorithm is a powerful tool for dealing with heavy-tailed noises in stochastic nonconvex optimization problems, and its convergence rate is optimal in T.
- The analysis of Batched NSGDM under heavy-tailed noises is more challenging than that of traditional SGD: the noise is only assumed to have a finite p-th moment, so its variance can be infinite when p < 2.
- The generalized smoothness condition and generalized heavy-tailed noises assumption used in this paper are more realistic and general than the traditional smoothness and finite variance assumptions.
- The authors' refined lower complexity result provides a more precise order on problem-dependent parameters, which is important for understanding the convergence behavior of the algorithm.
- The strategy for setting the stepsize and momentum parameter when the tail index p is unknown is simple and effective, and it allows the algorithm to converge at a rate of O(T^((1-p)/(2p))) (see the worked rates after this list).
- The work provides new insights to the optimization community and opens up promising directions for future algorithm design.
- The analysis of Batched NSGDM under heavy-tailed noises has important implications for many machine learning applications, where the data is often noisy and has heavy tails.
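To make the two rates concrete, here are the exponents evaluated at representative tail indices; the convergence criterion is taken to be the expected gradient norm of the output iterate, the usual measure for nonconvex problems, and the numbers below are simple substitutions into the exponents quoted above.

```latex
% Known tail index p \in (1, 2]: optimal rate
O\big(T^{\frac{1-p}{3p-2}}\big):
\quad p = 2 \;\Rightarrow\; T^{-1/4}, \qquad p = 1.5 \;\Rightarrow\; T^{-1/5}

% Unknown tail index p: rate under a p-agnostic stepsize/momentum choice
O\big(T^{\frac{1-p}{2p}}\big):
\quad p = 2 \;\Rightarrow\; T^{-1/4}, \qquad p = 1.5 \;\Rightarrow\; T^{-1/6}
```

Both exponents coincide at p = 2, recovering the classical T^(-1/4) rate under finite variance, while for heavier tails knowing p buys a faster rate (T^(-1/5) versus T^(-1/6) at p = 1.5), and both rates degrade toward T^0 as p approaches 1.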
Mindmap
Citation
Liu, Z., & Zhou, Z. (2024). Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2412.19529