Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping



Summary

This paper studies nonconvex stochastic optimization under heavy-tailed noises, where the stochastic gradient noise is only assumed to have a finite p-th moment for some p ∈ (1, 2]. Rather than designing a new method, the authors show that the existing Batched Normalized Stochastic Gradient Descent with Momentum (Batched NSGDM) algorithm converges in expectation at the optimal rate of O(T^((1-p)/(3p-2))) without gradient clipping.
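Although the paper focuses on the analysis, the underlying update is simple to state. The following is a minimal sketch of the Batched NSGDM iteration, assuming the standard normalized-momentum form; the oracle interface and the default values of batch_size, eta, and beta are illustrative placeholders, not the paper's tuned schedule.

```python
import numpy as np

def batched_nsgdm(grad_oracle, x0, T, batch_size=32, eta=1e-2, beta=0.9):
    """Minimal sketch of Batched Normalized SGD with Momentum (Batched NSGDM).

    grad_oracle(x, batch_size) is assumed to return the average of
    `batch_size` i.i.d. stochastic gradients at x. The defaults for
    batch_size, eta, and beta are placeholders; the paper tunes them as
    functions of the horizon T and the tail index p.
    """
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)                    # momentum buffer
    for _ in range(T):
        g = grad_oracle(x, batch_size)      # mini-batch stochastic gradient
        m = beta * m + (1.0 - beta) * g     # exponential moving average of gradients
        norm = np.linalg.norm(m)
        if norm > 0.0:
            x = x - eta * (m / norm)        # normalized step: length is always eta
    return x
```

Because every step has length at most eta no matter how large an individual stochastic gradient is, the iteration is naturally insensitive to rare heavy-tailed gradient spikes, which is the intuition for why no explicit clipping threshold is needed.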

Highlights

  • The Batched NSGDM algorithm converges in expectation at the optimal rate of O(T^((1-p)/(3p-2))) under heavy-tailed noises.
  • The algorithm does not require gradient clipping, which is a common technique used in previous works.
  • The authors also establish a refined lower complexity bound that exactly matches their new convergence theory for Batched NSGDM.
  • The analysis is conducted under the generalized smoothness condition and generalized heavy-tailed noises assumption.
  • The authors initiate the study of optimization under heavy-tailed noises with an unknown tail index p.
  • Batched NSGDM can achieve the optimal rate even under the relaxed smoothness condition.
  • The authors provide a simple strategy for setting the stepsize and momentum parameter when the tail index p is unknown.
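To make these highlights concrete, here is a hypothetical toy run that reuses the batched_nsgdm sketch from the Summary on a quadratic objective with synthetic infinite-variance noise; the horizon-dependent choices of eta and beta below are purely illustrative and are not the schedule proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def heavy_tailed_grad(x, batch_size):
    """Toy oracle: gradient of f(x) = 0.5 * ||x||^2 plus symmetrized Pareto
    noise with tail index 1.5, i.e. finite mean but infinite variance."""
    noise = rng.pareto(1.5, size=(batch_size, x.size))
    signs = rng.choice([-1.0, 1.0], size=(batch_size, x.size))
    return x + (signs * noise).mean(axis=0)

T = 2000
eta = T ** -0.75          # illustrative horizon-dependent stepsize (not the paper's)
beta = 1.0 - T ** -0.5    # illustrative momentum parameter (not the paper's)

x0 = np.ones(10)
x_final = batched_nsgdm(heavy_tailed_grad, x0, T, batch_size=16, eta=eta, beta=beta)
print(np.linalg.norm(x0), "->", np.linalg.norm(x_final))  # distance to the optimum x* = 0
```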

Key Insights

  • The Batched NSGDM algorithm handles heavy-tailed noises in stochastic nonconvex optimization without any clipping mechanism, and its convergence rate is optimal in T.
  • The analysis of Batched NSGDM under heavy-tailed noises is more challenging than that of traditional SGD, since the noise is only assumed to have a finite p-th moment and its variance can be infinite when p < 2.
  • The generalized smoothness condition and generalized heavy-tailed noises assumption used in this paper are more realistic and general than the traditional smoothness and finite variance assumptions.
  • The authors' refined lower complexity result provides a more precise order on problem-dependent parameters, which is important for understanding the convergence behavior of the algorithm.
  • The strategy for setting the stepsize and momentum parameter when the tail index p is unknown is simple and effective, and it still allows the algorithm to converge at a rate of O(T^((1-p)/(2p))) (see the sanity check after this list).
  • The authors' work provides new insights for the optimization community and opens up avenues for future algorithm design.
  • The analysis of Batched NSGDM under heavy-tailed noises has important implications for many machine learning applications, where the data is often noisy and has heavy tails.
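As a quick sanity check on the two rates quoted above (assuming, as in the paper, a tail index p ∈ (1, 2]), the exponents behave as expected at the boundary; the averaged gradient-norm criterion below is written out only for illustration.

```latex
% Rates as functions of the tail index p \in (1, 2],
% measured by the averaged gradient norm over T iterations.
% Known p (optimally tuned Batched NSGDM):
\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\bigl[\|\nabla f(x_t)\|\bigr]
  = \mathcal{O}\!\left(T^{\frac{1-p}{3p-2}}\right)
% Unknown p (p-agnostic tuning):
\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\bigl[\|\nabla f(x_t)\|\bigr]
  = \mathcal{O}\!\left(T^{\frac{1-p}{2p}}\right)
% Boundary check at p = 2 (finite variance):
%   (1-p)/(3p-2) = (1-p)/(2p) = -1/4, recovering the classical O(T^{-1/4}) rate.
% As p -> 1, both exponents tend to 0 and convergence slows to a halt.
```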



Citation

Liu, Z., & Zhou, Z. (2024). Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2412.19529
