Transfer Learning & Fine Tuning: Using Pre-Trained Models
Modern deep learning often begins not with training from scratch, but by standing on the shoulders of giants: networks that have already been trained on massive datasets and captured useful patterns.
Transfer Learning and Fine Tuning let you tap into this pre-trained knowledge, drastically reducing training time and data requirements. In this chapter, we explore both concepts, show how they work in TensorFlow/Keras, and discuss when to apply each technique.
Transfer Learning: Using Pre-Trained Networks as Feature Extractors
Transfer Learning treats a pre-trained model—such as ResNet, VGG, or MobileNet—as a fixed feature extractor. By freezing its convolutional layers and appending new classification layers, you retain rich hierarchical features learned on large datasets like ImageNet.
```python
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

# Load MobileNetV2 pre-trained on ImageNet, without its classification head.
base_model = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False  # freeze the convolutional base

# Append a new classification head for the target task (10 classes here).
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_data, validation_data=val_data, epochs=10)
```
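The `fit` call above assumes that `train_data` and `val_data` already exist; the snippet does not define them. As one minimal, hypothetical sketch, they could be built from an image folder (paths here are placeholders) and passed through MobileNetV2's own preprocessing:

```python
IMG_SIZE = (224, 224)
BATCH_SIZE = 32

# Hypothetical directories laid out as data/<split>/<class_name>/*.jpg
train_data = tf.keras.utils.image_dataset_from_directory(
    'data/train', image_size=IMG_SIZE, batch_size=BATCH_SIZE, label_mode='categorical')
val_data = tf.keras.utils.image_dataset_from_directory(
    'data/val', image_size=IMG_SIZE, batch_size=BATCH_SIZE, label_mode='categorical')

# MobileNetV2 expects inputs scaled to [-1, 1]; apply its preprocessing function.
preprocess = tf.keras.applications.mobilenet_v2.preprocess_input
train_data = train_data.map(lambda x, y: (preprocess(x), y)).prefetch(tf.data.AUTOTUNE)
val_data = val_data.map(lambda x, y: (preprocess(x), y)).prefetch(tf.data.AUTOTUNE)
```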
In this pattern, only the newly added dense layers learn from your smaller dataset, while the base model’s parameters remain fixed. This approach excels when your dataset shares characteristics with the pre-trained model’s original domain and when computational resources are limited.
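As an optional sanity check (not part of the original example), you can confirm that only the new head contributes trainable parameters:

```python
# With the base model frozen, only the new Dense layers should be trainable.
trainable = sum(int(tf.size(w)) for w in model.trainable_weights)
frozen = sum(int(tf.size(w)) for w in model.non_trainable_weights)
print(f"Trainable parameters: {trainable:,}")
print(f"Frozen parameters:    {frozen:,}")
```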
Fine Tuning: Adapting Pre-Trained Weights to Your Task
After training the top layers, you can unfreeze deeper layers in the base model to refine the learned representations. Fine Tuning involves unfreezing selected layers (and usually lowering the learning rate) to allow gradual adjustments that better suit your specific data.
```python
# Unfreeze the last 20 layers of the base model.
for layer in base_model.layers[-20:]:
    layer.trainable = True

# Recompile with a much lower learning rate so the pre-trained weights
# are nudged gently rather than overwritten.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
model.fit(train_data, validation_data=val_data, epochs=10)
```
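One common refinement, offered here as a hedged variant rather than part of the snippet above, is to keep `BatchNormalization` layers frozen during fine tuning so their moving statistics are not skewed by small batches:

```python
from tensorflow.keras.layers import BatchNormalization

# Unfreeze the last 20 layers, but leave BatchNormalization layers frozen.
for layer in base_model.layers[-20:]:
    layer.trainable = not isinstance(layer, BatchNormalization)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
```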
By choosing how many layers to unfreeze and applying a smaller learning rate, you achieve a balance between retaining foundational features and adapting to new patterns. Fine tuning shines when your target dataset diverges moderately from the original training domain or when task-specific nuances demand deeper adjustment.
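To decide where the unfreeze boundary should sit, one common approach (a sketch, not a prescription) is to list the base model's layers by index and freeze everything below a chosen cutoff:

```python
# Inspect layer names and indices to choose where to start unfreezing.
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)

# Assumption: unfreeze roughly the last 20 layers; adjust the cutoff to taste.
fine_tune_at = len(base_model.layers) - 20
base_model.trainable = True
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False
```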
| Aspect | Transfer Learning | Fine Tuning |
|---|---|---|
| Definition | Use a pre-trained model as a fixed feature extractor | Unfreeze and retrain parts of the base model |
| Trainable Layers | Only newly added custom layers | Custom layers + selected pre-trained layers |
| Learning Rate | Standard (e.g., 1e-3) | Lower (e.g., 1e-5) |
| Dataset Requirement | Small dataset suffices | Larger dataset helps avoid overfitting |
| Computational Cost | Low | Higher |
| Ideal When | Tasks closely match pre-trained domain | Tasks differ moderately from original domain |
Pre-trained models span diverse domains and sizes. Convolutional backbones like ResNet, VGG, and EfficientNet excel in computer vision, while Transformers such as BERT and GPT dominate NLP tasks. Mobile-friendly architectures (MobileNet, EfficientNet Lite) suit edge devices. For sequential data, RNNs or LSTMs with pre-trained embeddings offer strong starting points.
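Within the Keras ecosystem, swapping the vision backbone is usually a one-line change. A sketch with ResNet50 (the same frozen-backbone recipe applies to any `tf.keras.applications` model that supports `include_top=False`):

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Same pattern as before, with a different feature extractor.
# Note: each backbone has its own preprocessing, e.g.
# tf.keras.applications.resnet50.preprocess_input for ResNet50.
backbone = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
backbone.trainable = False

x = GlobalAveragePooling2D()(backbone.output)
outputs = Dense(10, activation='softmax')(x)  # 10 classes assumed, as in the earlier example
resnet_model = Model(inputs=backbone.input, outputs=outputs)
```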
Whether you freeze an entire backbone or selectively unfreeze layers for fine tuning, these techniques let you harness the power of large-scale pre-training on your own problems, turning hours of training into minutes and scarce data into actionable insights.