Transfer Learning & Fine Tuning: Using Pre-Trained Models
Modern deep learning often begins not with training from scratch, but by standing on the shoulders of giants: networks that have already been trained on massive datasets and captured useful patterns.
Transfer Learning and Fine Tuning let you tap into this pre-trained knowledge, drastically reducing training time and data requirements. In this chapter, we explore both concepts, show how they work in TensorFlow/Keras, and discuss when to apply each technique.
Transfer Learning: Using Pre-Trained Networks as Feature Extractors
Transfer Learning treats a pre-trained model—such as ResNet, VGG, or MobileNet—as a fixed feature extractor. By freezing its convolutional layers and appending new classification layers, you retain rich hierarchical features learned on large datasets like ImageNet.
```python
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

# Load MobileNetV2 pre-trained on ImageNet, without its classification head.
base_model = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False  # freeze the convolutional base

# Append a new classification head for the target task (10 classes here).
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_data, validation_data=val_data, epochs=10)
```
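The `fit` call above assumes that `train_data` and `val_data` already exist; the snippet does not define them. As one minimal, hypothetical sketch, they could be built from an image folder (paths here are placeholders) and passed through MobileNetV2's own preprocessing:

```python
IMG_SIZE = (224, 224)
BATCH_SIZE = 32

# Hypothetical directories laid out as data/<split>/<class_name>/*.jpg
train_data = tf.keras.utils.image_dataset_from_directory(
    'data/train', image_size=IMG_SIZE, batch_size=BATCH_SIZE, label_mode='categorical')
val_data = tf.keras.utils.image_dataset_from_directory(
    'data/val', image_size=IMG_SIZE, batch_size=BATCH_SIZE, label_mode='categorical')

# MobileNetV2 expects inputs scaled to [-1, 1]; apply its preprocessing function.
preprocess = tf.keras.applications.mobilenet_v2.preprocess_input
train_data = train_data.map(lambda x, y: (preprocess(x), y)).prefetch(tf.data.AUTOTUNE)
val_data = val_data.map(lambda x, y: (preprocess(x), y)).prefetch(tf.data.AUTOTUNE)
```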
In this pattern, only the newly added dense layers learn from your smaller dataset, while the base model’s parameters remain fixed. This approach excels when your dataset shares characteristics with the pre-trained model’s original domain and when computational resources are limited.
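As an optional sanity check (not part of the original example), you can confirm that only the new head contributes trainable parameters:

```python
# With the base model frozen, only the new Dense layers should be trainable.
trainable = sum(int(tf.size(w)) for w in model.trainable_weights)
frozen = sum(int(tf.size(w)) for w in model.non_trainable_weights)
print(f"Trainable parameters: {trainable:,}")
print(f"Frozen parameters:    {frozen:,}")
```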
Fine Tuning: Adapting Pre-Trained Weights to Your Task
After training the top layers, you can unfreeze deeper layers in the base model to refine the learned representations. Fine Tuning involves unfreezing selected layers (and usually lowering the learning rate) to allow gradual adjustments that better suit your specific data.
```python
# Unfreeze the last 20 layers of the base model.
for layer in base_model.layers[-20:]:
    layer.trainable = True

# Recompile with a much lower learning rate so the pre-trained weights
# are nudged gently rather than overwritten.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
model.fit(train_data, validation_data=val_data, epochs=10)
```
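One common refinement, offered here as a hedged variant rather than part of the snippet above, is to keep `BatchNormalization` layers frozen during fine tuning so their moving statistics are not skewed by small batches:

```python
from tensorflow.keras.layers import BatchNormalization

# Unfreeze the last 20 layers, but leave BatchNormalization layers frozen.
for layer in base_model.layers[-20:]:
    layer.trainable = not isinstance(layer, BatchNormalization)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
```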
By choosing how many layers to unfreeze and applying a smaller learning rate, you achieve a balance between retaining foundational features and adapting to new patterns. Fine tuning shines when your target dataset diverges moderately from the original training domain or when task-specific nuances demand deeper adjustment.
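To decide where the unfreeze boundary should sit, one common approach (a sketch, not a prescription) is to list the base model's layers by index and freeze everything below a chosen cutoff:

```python
# Inspect layer names and indices to choose where to start unfreezing.
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)

# Assumption: unfreeze roughly the last 20 layers; adjust the cutoff to taste.
fine_tune_at = len(base_model.layers) - 20
base_model.trainable = True
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False
```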
| Aspect | Transfer Learning | Fine Tuning |
|---|---|---|
| Definition | Use a pre-trained model as a fixed feature extractor | Unfreeze and retrain parts of the base model |
| Trainable Layers | Only newly added custom layers | Custom layers + selected pre-trained layers |
| Learning Rate | Standard (e.g., 1e-3) | Lower (e.g., 1e-5) |
| Dataset Requirement | Small dataset suffices | Larger dataset helps avoid overfitting |
| Computational Cost | Low | Higher |
| Ideal When | Tasks closely match pre-trained domain | Tasks differ moderately from original domain |
Pre-trained models span diverse domains and sizes. Convolutional backbones like ResNet, VGG, and EfficientNet excel in computer vision, while Transformers such as BERT and GPT dominate NLP tasks. Mobile-friendly architectures (MobileNet, EfficientNet Lite) suit edge devices. For sequential data, RNNs or LSTMs with pre-trained embeddings offer strong starting points.
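Within the Keras ecosystem, swapping the vision backbone is usually a one-line change. A sketch with ResNet50 (the same frozen-backbone recipe applies to any `tf.keras.applications` model that supports `include_top=False`):

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Same pattern as before, with a different feature extractor.
# Note: each backbone has its own preprocessing, e.g.
# tf.keras.applications.resnet50.preprocess_input for ResNet50.
backbone = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
backbone.trainable = False

x = GlobalAveragePooling2D()(backbone.output)
outputs = Dense(10, activation='softmax')(x)  # 10 classes assumed, as in the earlier example
resnet_model = Model(inputs=backbone.input, outputs=outputs)
```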
Whether you freeze an entire backbone or selectively unfreeze layers for fine tuning, these techniques let you harness the power of large-scale pre-training on your own problems, turning hours of training into minutes and scarce data into actionable insights.