Towards making deep transfer learning never hurt
Wan, Ruosi, Xiong, Haoyi, Li, Xingjian, Zhu, Zhanxing and Huan, Jun (2020) Towards making deep transfer learning never hurt. In 2019 IEEE International Conference on Data Mining (ICDM). IEEE, pp. 578-587.
Record type: Conference or Workshop Item (Paper)
Abstract
Transfer learning has frequently been used to improve deep neural network training by incorporating the weights of pre-trained networks as the starting point of optimization and as a regularizer. While deep transfer learning can usually boost performance, yielding higher accuracy and faster convergence, transferring weights from inappropriate networks hurts the training procedure and may lead to even lower accuracy. In this paper, we view deep transfer learning as minimizing a linear combination of the empirical loss and a regularizer based on the pre-trained weights, where the regularizer can restrict the training procedure from lowering the empirical loss when the two terms have conflicting descent directions (i.e., gradients). Following this view, we propose a novel strategy for making regularization-based Deep Transfer learning Never Hurt (DTNH): at each iteration of the training procedure, it computes the derivatives of the two terms separately, then re-estimates a new descent direction that does not hurt the empirical loss minimization while preserving the regularization effects of the pre-trained weights. Extensive experiments have been conducted using common transfer learning regularizers, such as L2-SP and knowledge distillation, on a wide range of deep transfer learning benchmarks including Caltech, MIT Indoor 67, CIFAR-10 and ImageNet. The empirical results show that the proposed descent direction estimation strategy DTNH consistently improves the performance of deep transfer learning tasks based on all of the above regularizers, even when transferring pre-trained weights from inappropriate networks. In all cases, DTNH improves on state-of-the-art regularizers, with 0.1%-7% higher accuracy across all experiments.
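The abstract's core idea — computing the gradients of the empirical loss and the regularizer separately, then re-estimating a descent direction that never opposes the empirical-loss gradient — can be sketched with a simple projection rule. This is a hypothetical illustration of that kind of conflict-aware update, not the paper's exact estimator: the function name `dtnh_direction`, the weighting `alpha`, and the specific projection formula are all assumptions made for this sketch.

```python
def dot(a, b):
    """Inner product of two vectors given as lists of floats."""
    return sum(x * y for x, y in zip(a, b))

def dtnh_direction(g_loss, g_reg, alpha=0.1):
    """Sketch of a conflict-aware descent direction (hypothetical;
    the paper's exact re-estimation rule may differ).

    Combines the empirical-loss gradient g_loss with the regularizer
    gradient g_reg, but removes any component of g_reg that points
    against g_loss, so the combined step does not increase the
    empirical loss to first order.
    """
    g_reg = [alpha * g for g in g_reg]
    conflict = dot(g_reg, g_loss)
    if conflict < 0:
        # Project out the part of the regularizer gradient that
        # opposes the empirical-loss gradient; the orthogonal part
        # (the regularization effect that does no harm) is kept.
        scale = conflict / dot(g_loss, g_loss)
        g_reg = [r - scale * l for r, l in zip(g_reg, g_loss)]
    return [l + r for l, r in zip(g_loss, g_reg)]
```

After this re-estimation, the returned direction always has a non-negative inner product with the empirical-loss gradient, so a small step along it cannot increase the empirical loss to first order, while any non-conflicting component of the regularizer gradient is preserved.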
This record has no associated files available for download.
More information
Published date: 30 January 2020
Identifiers
Local EPrints ID: 486289
URI: http://eprints.soton.ac.uk/id/eprint/486289
PURE UUID: c3d4218b-9372-46a0-8d54-f7f95988531b
Catalogue record
Date deposited: 16 Jan 2024 17:51
Last modified: 17 Mar 2024 06:51
Contributors
Author: Ruosi Wan
Author: Haoyi Xiong
Author: Xingjian Li
Author: Zhanxing Zhu
Author: Jun Huan