Transfer Learning in Deep Learning
Transfer learning is a method of carrying knowledge from an already trained model over to a new, similar, or entirely different task. With this technique, researchers can save a great deal of time and computation, and obtain higher performance even with small datasets.

Transfer learning (TL) is a deep learning method in which knowledge from a trained model is adapted to a new, similar, or wholly different task. A pre-trained model, typically developed on a large dataset using several GPUs by a large firm, is used as a feature extractor for the new task. It is critical that the dataset on which the model was originally trained and the new dataset on which it is deployed as a feature extractor have some similarities. These pre-trained models are built on enormous datasets; one of them is ImageNet, which contains 1.2 million images in 1000 different categories (classes). Because this post focuses on computer vision, the models presented here are typically built from Convolutional Neural Network (CNN) layers followed by a few fully connected layers. Popular modern CNN-based image classification architectures that have been trained on the ImageNet dataset include the following (a short loading sketch follows the list):
- AlexNet
- InceptionNetV3
- VGG19 and VGG16
- ResNet
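As a minimal sketch of how one of these pre-trained models can be obtained, the snippet below loads VGG16 with ImageNet weights via torchvision. This assumes PyTorch and torchvision are installed; the `weights` argument requires torchvision 0.13 or newer (older versions use `pretrained=True`).

```python
import torch
from torchvision import models

# Load VGG16 pre-trained on ImageNet (torchvision >= 0.13 style;
# older versions use models.vgg16(pretrained=True) instead).
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.eval()

# ImageNet-style input: one 3-channel 224x224 image.
dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(dummy)

print(logits.shape)  # torch.Size([1, 1000]) -- one score per ImageNet class
```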
The following are some examples of situations in which transfer learning might be beneficial:
1. Limited Dataset: While deep learning projects often require a massive quantity of data for training, TL can help you achieve strong results even with a limited amount of data to work with.
2. Speed: Training a model from scratch can take days or even weeks. TL can greatly speed up the process by freezing the weights of the pre-trained layers and training only a portion of the network, or by adding a few new layers on top of a model that has already been pre-trained (a freezing sketch follows this list).
3. Better Results: Using a pre-trained model can deliver far better results than training from scratch. One reason is that selecting appropriate hyper-parameters for training from scratch is a challenging endeavor, whereas a pre-trained model lets you start from weights that are already known to work well.
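To illustrate the freezing mentioned in point 2, here is a small sketch, assuming the torchvision VGG16 from the earlier snippet, that freezes the convolutional backbone so that only the classifier's parameters remain trainable:

```python
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze the convolutional backbone; its weights will not be updated during training.
for param in model.features.parameters():
    param.requires_grad = False

# Only the fully connected classifier remains trainable.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable:,} of {total:,}")
```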
The models listed above typically consist of two parts: a feature-extraction part and a classification part. Depending on the task, there are several ways to employ a pre-trained model for transfer learning. The most common are:
1. Using a pre-trained model as a feature extractor: If your dataset's classes differ from ImageNet's, you can simply replace the network's classification section and train only that part. For instance, flattening the output of VGG16's convolutional backbone yields 25088 features (512 × 7 × 7), so you can add a linear layer with 25088 input features and as many output features as your dataset has classes (see the first sketch after this list).
2. Fine-tuning the Model: When your dataset is sufficiently large, you can start from the weights of a pre-trained network and continue training on your own data rather than training only a new head. If your dataset differs somewhat from ImageNet, but not by much, and it is large enough, overfitting is less likely to occur. In CNN-based networks, the early layers capture general features such as edges and colors, while the later layers learn domain- and dataset-specific features, so unfreezing and training only those last few layers is frequently a good idea (see the second sketch after this list).
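First sketch: using the pre-trained network as a feature extractor, as described in point 1. This is only an illustrative sketch; `num_classes = 20` is a hypothetical dataset size, and torchvision's VGG16 is assumed.

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 20  # hypothetical: the number of classes in your dataset

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze the convolutional feature extractor.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the classification head: VGG16's flattened feature map is 512 * 7 * 7 = 25088.
model.classifier = nn.Linear(25088, num_classes)

# Train only the new head.
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
```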
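Second sketch: fine-tuning, as described in point 2. Here the last convolutional block of torchvision's VGG16 (`features[24:]`) is unfrozen along with the classifier, and each group gets its own learning rate; the exact layers to unfreeze and the learning rates are judgment calls, not fixed rules.

```python
import torch
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze the whole backbone first, then unfreeze the last convolutional block.
for param in model.features.parameters():
    param.requires_grad = False
for param in model.features[24:].parameters():
    param.requires_grad = True

# Smaller learning rate for the pre-trained conv layers, larger one for the classifier.
# In practice you would also replace model.classifier with a head sized for your own
# classes, as in the first sketch.
optimizer = torch.optim.Adam([
    {"params": model.features[24:].parameters(), "lr": 1e-5},
    {"params": model.classifier.parameters(), "lr": 1e-4},
])
```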