Stochastic gradient descent with random label noises: doubly stochastic models and inference stabilizer
Xiong, Haoyi (ce4ad3c5-7887-4830-941c-02e593f20dae)
Li, Xuhong (dc41cbd3-4bcc-4b29-9eef-41e66ff34115)
Yu, Boyang (8fcbdeda-72a9-469c-8f64-4d8a1723b0f6)
Wu, Dongrui (1fc199bb-8a24-4918-b9f5-e172249adc5c)
Zhu, Zhanxing (e55e7385-8ba2-4a85-8bae-e00defb7d7f0)
Dou, Dejing (ede75a76-6d15-443b-b5c4-63fc2e007f8c)
Xiong, Haoyi, Li, Xuhong, Yu, Boyang, Wu, Dongrui, Zhu, Zhanxing and Dou, Dejing (2023) Stochastic gradient descent with random label noises: doubly stochastic models and inference stabilizer. Machine Learning: Science and Technology. (doi:10.1088/2632-2153/ad13ba).
Abstract
Random label noises (or observational noises) widely exist in practical machine learning settings. While previous studies primarily focus on the effects of label noises on learning performance, our work investigates the implicit regularization effects of label noises under the mini-batch sampling setting of stochastic gradient descent (SGD), assuming that the label noises are unbiased. Specifically, we analyze the learning dynamics of SGD over the quadratic loss with unbiased label noises, modeling the dynamics of SGD as a stochastic differential equation (SDE) with two diffusion terms (namely, a Doubly Stochastic Model). While the first diffusion term is caused by mini-batch sampling over the (label-noiseless) loss gradients, as in many other works on SGD [1, 2], our model treats the second noise term of the SGD dynamics, caused by mini-batch sampling over the label noises, as an implicit regularizer. Our theoretical analysis finds that such an implicit regularizer favors convergence points that stabilize model outputs against perturbations of the parameters (namely, inference stability). Though a similar phenomenon has been investigated by Blanc et al. [3], our work does not assume SGD to be an Ornstein-Uhlenbeck-like process and achieves a more generalizable result, with the convergence of the approximation proved. To validate our analysis, we design two sets of empirical studies on the implicit regularizer of SGD with unbiased random label noises, for deep neural network training and for linear regression. Our first experiment studies the noisy self-distillation trick for deep learning, where student networks are trained on the outputs of well-trained teachers perturbed with additive unbiased random label noises. Our experiments show that the implicit regularizer caused by the label noises tends to select models with improved inference stability.
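To make the setting concrete, here is a minimal sketch (our own illustration, not code from the paper) of SGD on a quadratic loss in which the two noise sources of the doubly stochastic model are explicit: mini-batch sampling of the gradients, and fresh unbiased label noise added to the targets, in the spirit of the noisy self-distillation experiment where clean teacher outputs are perturbed before each update. All names and constants (X, y, sigma, lr, etc.) are illustrative assumptions.

    import numpy as np

    # Hypothetical setup: least-squares regression with clean "teacher" labels y.
    rng = np.random.default_rng(0)
    n, d = 512, 20
    X = rng.standard_normal((n, d))
    theta_star = rng.standard_normal(d)
    y = X @ theta_star                            # label-noiseless targets

    theta = np.zeros(d)
    lr, batch, sigma, steps = 0.01, 32, 0.5, 5000
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)      # noise source 1: mini-batch sampling
        eps = sigma * rng.standard_normal(batch)  # noise source 2: unbiased label noise
        resid = X[idx] @ theta - (y[idx] + eps)   # residuals against noisy targets
        theta -= lr * (X[idx].T @ resid) / batch  # SGD step on 0.5 * mean squared error

Schematically, the continuous-time counterpart is an SDE of the form dθ_t = -∇L(θ_t) dt + Σ₁^{1/2}(θ_t) dW_t + Σ₂^{1/2}(θ_t) dB_t, with one diffusion term per noise source; the paper's exact diffusion coefficients may differ from this sketch.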
Text: Xiong+et+al_2023_Mach._Learn.__Sci._Technol._10.1088_2632-2153_ad13ba - Accepted Manuscript
More information
e-pub ahead of print date: 8 December 2023
Identifiers
Local EPrints ID: 486446
URI: http://eprints.soton.ac.uk/id/eprint/486446
ISSN: 2632-2153
PURE UUID: 98356c22-e291-4d64-9a3c-2e79b1e73d72
Catalogue record
Date deposited: 22 Jan 2024 17:53
Last modified: 17 Mar 2024 06:51
Contributors
Author: Haoyi Xiong
Author: Xuhong Li
Author: Boyang Yu
Author: Dongrui Wu
Author: Zhanxing Zhu
Author: Dejing Dou