University of Southampton Institutional Repository

Towards making deep transfer learning never hurt

Wan, Ruosi, Xiong, Haoyi, Li, Xingjian, Zhu, Zhanxing and Huan, Jun (2020) Towards making deep transfer learning never hurt. In 2019 IEEE International Conference on Data Mining (ICDM). IEEE. pp. 578-587.

Record type: Conference or Workshop Item (Paper)

Abstract

Transfer learning has frequently been used to improve deep neural network training by incorporating the weights of pre-trained networks as the starting point of optimization and as a basis for regularization. While deep transfer learning can usually boost performance with better accuracy and faster convergence, transferring weights from an inappropriate network hurts the training procedure and may even lead to lower accuracy. In this paper, we view deep transfer learning as minimizing a linear combination of the empirical loss and a regularizer based on the pre-trained weights, where the regularizer can restrict the training procedure from lowering the empirical loss whenever the two terms have conflicting descent directions (e.g., derivatives). Following this view, we propose a novel strategy for making regularization-based Deep Transfer learning Never Hurt (DTNH) that, at each iteration of the training procedure, computes the derivatives of the two terms separately and then re-estimates a new descent direction that does not hurt the empirical loss minimization while preserving the regularization effects of the pre-trained weights. Extensive experiments have been carried out using common transfer learning regularizers, such as L2-SP and knowledge distillation, on a wide range of deep transfer learning benchmarks including Caltech, MIT Indoor 67, CIFAR-10 and ImageNet. The empirical results show that the proposed descent direction estimation strategy DTNH consistently improves the performance of deep transfer learning tasks based on all of the above regularizers, even when transferring pre-trained weights from inappropriate networks. Overall, DTNH improves on state-of-the-art regularizers in all cases, with 0.1%-7% higher accuracy across all experiments.
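
A minimal sketch of the descent-direction re-estimation idea described in the abstract, written in PyTorch. It assumes that, whenever the regularizer's gradient opposes the empirical-loss gradient, the conflicting component is projected out before the two are combined; the function name, the projection rule and the usage notes are illustrative assumptions, not the authors' published implementation.

import torch

def dtnh_direction(loss_grad: torch.Tensor, reg_grad: torch.Tensor,
                   reg_weight: float = 0.01) -> torch.Tensor:
    # Inner product between the empirical-loss gradient and the regularizer
    # gradient; a negative value indicates conflicting descent directions.
    dot = torch.dot(loss_grad.flatten(), reg_grad.flatten())
    if dot < 0:
        # Assumed rule: remove the component of the regularizer gradient that
        # opposes the empirical-loss gradient, so that following the combined
        # direction cannot increase the empirical loss to first order.
        reg_grad = reg_grad - (dot / loss_grad.flatten().norm().pow(2)) * loss_grad
    # Combine the (possibly projected) regularizer gradient with the
    # empirical-loss gradient to form the new descent direction.
    return loss_grad + reg_weight * reg_grad

# Hypothetical usage inside one training iteration, with loss_grad and reg_grad
# obtained separately for each parameter tensor (e.g. via torch.autograd.grad on
# the empirical loss and on an L2-SP penalty towards the pre-trained weights):
#   direction = dtnh_direction(loss_grad, reg_grad)
#   param.data -= learning_rate * direction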

This record has no associated files available for download.

More information

Published date: 30 January 2020

Identifiers

Local EPrints ID: 486289
URI: http://eprints.soton.ac.uk/id/eprint/486289
PURE UUID: c3d4218b-9372-46a0-8d54-f7f95988531b

Catalogue record

Date deposited: 16 Jan 2024 17:51
Last modified: 17 Mar 2024 06:51

Contributors

Author: Ruosi Wan
Author: Haoyi Xiong
Author: Xingjian Li
Author: Zhanxing Zhu
Author: Jun Huan
