This chapter covers transfer learning and pre-trained models, a cornerstone of modern AI that enables rapid development of accurate models with limited data. For the AI-900 exam, approximately 10-15% of questions touch on these concepts, often in the context of Azure Cognitive Services and custom vision. You will learn what transfer learning is, how it works internally, its key components, and how to leverage pre-trained models in Azure. Understanding these concepts is critical for answering questions about training custom models, fine-tuning, and choosing between built-in AI services and custom solutions.
Imagine a world-class chef who has spent years perfecting a set of base sauces: béchamel, velouté, tomato, hollandaise, and espagnole. These sauces are the result of thousands of hours of training on millions of ingredients. Now, a new restaurant opens and needs to create a signature dish. Instead of starting from scratch (growing wheat for flour, milking cows for butter), the chef takes the pre-made béchamel and adds a few local spices and a unique garnish. The base sauce already knows how to thicken, emulsify, and carry flavor. The chef only needs to fine-tune the final layers.
This is transfer learning. The pre-trained model (the base sauce) has learned general features from a massive dataset (the chef's lifetime of experience). The new task (the signature dish) requires only minor adjustments: a few new layers or fine-tuning of existing weights. Without the base sauce, the chef would need years to develop a comparable foundation.
In machine learning, transfer learning takes a model trained on a large, generic dataset (like ImageNet with 1.2 million images) and adapts it to a specific, smaller dataset (like X-ray images). The pre-trained model's early layers detect edges, textures, and shapes, which are universal features, while later layers are retrained to recognize domain-specific patterns. This dramatically reduces training time, data requirements, and computational cost. Just as the chef's sauce is a shortcut to culinary excellence, transfer learning is a shortcut to high-performing models without starting from zero.
What is Transfer Learning and Why Does It Exist?
Transfer learning is a machine learning technique where a model developed for one task is reused as the starting point for a model on a second task. It exists because training a deep neural network from scratch requires enormous amounts of labeled data, computational resources, and time. For example, training a state-of-the-art image classifier on ImageNet (1.2 million images, 1000 classes) can take days or weeks on multiple GPUs. Transfer learning bypasses this by taking a pre-trained model—one already trained on a large, generic dataset—and adapting it to a new, often smaller, dataset.
In the AI-900 context, transfer learning is the engine behind Azure Custom Vision and many other Azure Cognitive Services. It allows you to train a custom image classifier with as few as 50 images per class, achieving high accuracy in minutes instead of days. The exam expects you to understand when to use transfer learning versus training from scratch, and how Azure services abstract this complexity.
How Transfer Learning Works Internally
Neural networks learn hierarchical features. In a convolutional neural network (CNN) for images, early layers learn low-level features like edges, corners, and colors. Middle layers learn mid-level features like shapes and textures. Later layers learn high-level features specific to the original task, such as faces, wheels, or fur.
When you apply transfer learning, you typically:
1. Remove the original classifier head (the final fully connected layers that map features to the original classes).
2. Freeze the early layers' weights so they are not updated during training on the new dataset. This preserves the general feature detectors.
3. Add a new classifier head with the number of outputs equal to your new number of classes.
4. Train the new head (and optionally fine-tune some later layers) on your new dataset.
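When every pre-trained layer stays frozen and only the new head is trained, the frozen layers act as a fixed feature extractor; this variant is often called feature extraction. When some of the later pre-trained layers are also unfrozen and updated, usually with a low learning rate, the process is called fine-tuning. In both cases, the new head learns to combine the pre-trained features for your specific task.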
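The freeze-and-retrain idea can be sketched without any deep learning framework. The toy below is an illustration under stated assumptions, not how Azure or any library implements it: `FROZEN_W` stands in for pre-trained weights with made-up values, and `extract_features`, `train_head`, and `predict` are hypothetical helper names. Only the small head is updated during training.

```python
import math
import random

# Stand-in for pre-trained weights (hypothetical values). They are never updated.
FROZEN_W = [[1.0, -1.0], [0.5, 0.5]]


def extract_features(x):
    """Frozen layer: a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=200, lr=0.1, seed=0):
    """Train only the new head (logistic regression) on the frozen features."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in FROZEN_W]  # head starts random
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = extract_features(x)
            p = sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)
            error = p - y  # gradient of cross-entropy with respect to the logit
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias


def predict(head, x):
    """Probability of class 1 for input x, using frozen features plus the head."""
    weights, bias = head
    feats = extract_features(x)
    return sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)


# Tiny "new task": the label is 1 when the first input exceeds the second.
data = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], 0), ([1.0, 3.0], 0)]
head = train_head(data)
```

Because the extractor is fixed, training touches only three numbers (two head weights and a bias), which is why this style of training needs little data and little compute.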
Key Components and Defaults
Pre-trained models: Common models used for transfer learning include:
- ResNet: Residual networks with variants like ResNet-18, ResNet-50, and ResNet-152. A common default backbone for image tasks. (Azure Custom Vision does not let you pick a backbone by name; you choose a domain instead.)
- VGG: Very deep convolutional networks (VGG-16, VGG-19).
- Inception: GoogLeNet architecture with inception modules.
- EfficientNet: A family of networks that scales depth, width, and resolution together for efficiency.
Azure Custom Vision: When you train a Custom Vision project, the service starts from a pre-trained model and fine-tunes it on your images. You do not select the backbone architecture directly. Instead, you choose the model domain (e.g., "General", "Food", "Landmarks"), which selects a pre-trained model suited to that kind of image.
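Training parameters:
- Training type: Custom Vision offers Quick Training and Advanced Training; with Advanced Training you set a training time budget in hours.
- Iterations: In Custom Vision, an iteration is a trained version of the model. Each training run creates a new iteration; it is not a count of optimizer steps that you tune.
- Learning rate: Typically around 0.001 when you fine-tune in your own code, but Azure handles this automatically.
- Batch size: Determined by the service; you cannot set it manually.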
Configuration and Verification in Azure
To use transfer learning in Azure:
1. Create a Custom Vision resource in the Azure portal.
2. Create a project and select a domain (e.g., "General" for most tasks).
3. Upload images and tag them with labels.
4. Train the model; Azure automatically performs transfer learning.
5. Evaluate using the precision, recall, and mAP metrics provided.
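To make the evaluation step concrete, here is a minimal sketch of how precision and recall are computed for one tag. The function name `precision_recall` and the sample labels are illustrative assumptions; the Custom Vision portal reports these metrics for you.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for one class, treating it as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative labels: four real cats, two real dogs.
y_true = ["cat", "cat", "cat", "cat", "dog", "dog"]
y_pred = ["cat", "cat", "dog", "dog", "cat", "dog"]
precision, recall = precision_recall(y_true, y_pred, positive="cat")
```

Precision answers "of the images tagged cat, how many were cats?", while recall answers "of the real cats, how many did the model find?".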
You can also use the Custom Vision Python SDK to programmatically train and manage models. Example:
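```python
from pathlib import Path
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.training.models import ImageFileCreateBatch, ImageFileCreateEntry
from msrest.authentication import ApiKeyCredentials

# Placeholders: supply your own endpoint, training key, and (image_path, label) pairs
endpoint, training_key = "<your-endpoint>", "<your-training-key>"
labeled_paths = []
# Authenticate
credentials = ApiKeyCredentials(in_headers={"Training-key": training_key})
trainer = CustomVisionTrainingClient(endpoint, credentials)
# Create a project in the General classification domain
domain = next(d for d in trainer.get_domains() if d.type == "Classification" and d.name == "General")
project = trainer.create_project("My Custom Project", domain_id=domain.id)
# Create one tag per distinct label, then upload the tagged images (at most 64 per batch)
tags = {label: trainer.create_tag(project.id, label) for label in {lbl for _, lbl in labeled_paths}}
image_list = [ImageFileCreateEntry(name=Path(p).name, contents=Path(p).read_bytes(), tag_ids=[tags[label].id])
              for p, label in labeled_paths]
trainer.create_images_from_files(project.id, ImageFileCreateBatch(images=image_list))
# Train (needs at least two tags with five or more images each)
iteration = trainer.train_project(project.id)
```
Interaction with Related Technologies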
Many Azure Cognitive Services (now branded Azure AI services) are built on pre-trained models, and several let you adapt those models to your own data. For example:
- Computer Vision: Pre-trained on millions of images. You call it as-is and cannot retrain it; when you need custom classes, use Custom Vision instead.
- Custom Vision: Allows you to fine-tune a pre-trained model on your own images.
- Form Recognizer (now Azure AI Document Intelligence): Uses transfer learning with pre-trained layout and OCR models, then fine-tunes on your forms.
Azure Machine Learning also supports transfer learning. You can use its automated ML (AutoML) capability to automatically select a pre-trained model and fine-tune it on your dataset. This is useful for advanced users who need more control than Custom Vision offers.
Edge Cases and Exam Traps
Overfitting: With very small datasets (<50 images per class), transfer learning can still overfit. Azure Custom Vision mitigates this with data augmentation (random flips, rotations, crops).
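As an illustration of what flips, rotations, and crops do to an image, here is a small framework-free sketch that treats an image as a nested list of pixel values. The function names are hypothetical, the transforms are deterministic for clarity (real augmentation applies them randomly), and nothing here reflects the internals of Custom Vision.

```python
def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def center_crop(image, size):
    """Keep the central size x size window."""
    top = (len(image) - size) // 2
    left = (len(image[0]) - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def augment(image):
    """Return the original plus two transformed copies."""
    return [image, horizontal_flip(image), rotate_90_clockwise(image)]


small = [[1, 2], [3, 4]]
variants = augment(small)
```

Each variant keeps the same label as the original, so one labeled image yields several training examples.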
Domain mismatch: If the pre-trained model's original domain (e.g., natural images) is very different from your domain (e.g., medical X-rays), transfer learning may not help much. In such cases, you may need to fine-tune more layers or train from scratch.
Catastrophic forgetting: When fine-tuning, if you unfreeze too many layers and use a high learning rate, the model may forget previously learned features. This is why freezing is common.
Azure-specific: The exam may ask about the number of images needed for Custom Vision—the answer is at least 5 images per class, but 50+ is recommended for good accuracy.
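A quick pre-upload check can flag tags that fall below these thresholds. This is a sketch with hypothetical names (`check_tag_counts` and the two constants); the numbers 5 and 50 come from the text above.

```python
from collections import Counter

MIN_IMAGES_PER_TAG = 5           # minimum stated above
RECOMMENDED_IMAGES_PER_TAG = 50  # recommendation stated above


def check_tag_counts(labels):
    """Map each tag to a short status based on how many images carry it."""
    report = {}
    for tag, count in Counter(labels).items():
        if count < MIN_IMAGES_PER_TAG:
            report[tag] = "too few to train"
        elif count < RECOMMENDED_IMAGES_PER_TAG:
            report[tag] = "trainable, below recommended"
        else:
            report[tag] = "ok"
    return report


# Illustrative label list: one entry per uploaded image.
labels = ["cat"] * 60 + ["dog"] * 12 + ["bird"] * 3
report = check_tag_counts(labels)
```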
Summary
Transfer learning is the practice of leveraging a pre-trained model to solve a new task faster and with less data. It works by freezing early layers (general features) and retraining later layers (specific features). Azure Cognitive Services like Custom Vision make transfer learning accessible with minimal code. Understanding when to use it and its limitations is key for AI-900 success.
Select a Pre-Trained Model
Choose a model architecture that has been trained on a large, relevant dataset. Common choices include ResNet, VGG, Inception, or EfficientNet. In Azure Custom Vision, this choice is abstracted away: you select a domain (e.g., 'General') rather than a named architecture. The pre-trained model has learned to detect edges, textures, and shapes from millions of images. The choice of backbone affects accuracy and speed; deeper models (ResNet-152) are typically more accurate but slower. For the exam, remember that Custom Vision automatically selects an appropriate pre-trained model based on the domain.
Remove the Classifier Head
The original classifier head maps the learned features to the original classes (e.g., 1000 classes for ImageNet). This head is removed because your new task likely has a different number of classes. In code, you typically access the model's layers and slice off the final fully connected layer. For example, in PyTorch: `model.fc = nn.Identity()`. Azure Custom Vision does this internally when you train a new project.
Freeze Early Layer Weights
To preserve the general feature detectors, freeze the weights of the early layers so they are not updated during training. This is done by setting `requires_grad = False` on those parameters in PyTorch, or `layer.trainable = False` in TensorFlow/Keras. In Azure Custom Vision, freezing is automatic: only the new classifier head and possibly the last few convolutional layers are trained. Freezing reduces the risk of overfitting on small datasets and shortens training time.
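What freezing means for an update step can be shown with a framework-free toy. The `Layer` class, `freeze`, and `sgd_step` below are hypothetical stand-ins with made-up weights and gradients; in practice the framework's optimizer does this for you once the flags are set.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    weights: list
    trainable: bool = True


def freeze(layers, names):
    """Mark the named layers as frozen so later updates skip them."""
    for layer in layers:
        if layer.name in names:
            layer.trainable = False


def sgd_step(layers, grads, lr=0.001):
    """Apply one gradient-descent update to trainable layers only."""
    for layer in layers:
        if not layer.trainable:
            continue  # frozen: keep the pre-trained weights as they are
        layer.weights = [w - lr * g for w, g in zip(layer.weights, grads[layer.name])]


model = [
    Layer("conv1", [0.5, -0.2]),
    Layer("conv2", [0.1, 0.3]),
    Layer("head", [0.0, 0.0]),
]
freeze(model, {"conv1", "conv2"})
grads = {"conv1": [1.0, 1.0], "conv2": [1.0, 1.0], "head": [1.0, -1.0]}
sgd_step(model, grads)
```

After the step, only the `head` weights have moved; `conv1` and `conv2` still hold their original values even though gradients were available for them.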
Add a New Classifier Head
Replace the removed head with a new one that has the number of outputs equal to your new number of classes. This new head typically consists of a few fully connected layers and a softmax activation. In Azure Custom Vision, you do not set the number of classes directly; it follows from the tags you create and apply to your images, and the service creates a new head with the appropriate size. The new head starts with random weights and will learn to combine the frozen features for your specific task.
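A minimal sketch of such a head, assuming a single fully connected layer followed by softmax: `new_head`, `head_forward`, the tag names, and the feature values are all illustrative, not part of any Azure API.

```python
import math
import random


def new_head(num_features, num_classes, seed=0):
    """Create a randomly initialized weight matrix: one row per class."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.01) for _ in range(num_features)] for _ in range(num_classes)]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def head_forward(head, features):
    """Score each class from the frozen features, then apply softmax."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in head]
    return softmax(logits)


tags = ["cat", "dog", "bird"]      # the number of outputs follows from the tags
features = [0.2, 1.5, 0.0, 0.7]    # stand-in for the frozen backbone's output
head = new_head(num_features=len(features), num_classes=len(tags))
probs = head_forward(head, features)
```

Before any training, the probabilities are close to uniform because the head's weights are small random values; training is what makes them informative.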
Fine-Tune on New Dataset
Train the model on your new dataset, updating only the weights of the new head (and optionally unfreezing some later layers for fine-tuning). Use a low learning rate (e.g., 0.001) to avoid distorting the pre-trained features. In Azure Custom Vision, you click 'Train' and choose Quick Training or Advanced Training; each run produces a new iteration, which is a versioned copy of the trained model. Training of this kind minimizes cross-entropy loss on your labeled images. After training, you evaluate using the provided metrics (precision, recall, mAP).
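The loss and a single low-learning-rate update can be worked through by hand. In this sketch the logits themselves are treated as the trainable parameters to keep the arithmetic visible, which is a simplification; the values are made up.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target])


logits = [2.0, 1.0, 0.1]  # illustrative scores for three classes
target = 0                # index of the correct class
learning_rate = 0.001

probs = softmax(logits)
loss_before = cross_entropy(probs, target)

# For softmax plus cross-entropy, the gradient w.r.t. each logit is p_i - y_i.
grad = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
updated = [z - learning_rate * g for z, g in zip(logits, grad)]
loss_after = cross_entropy(softmax(updated), target)
```

The loss drops only slightly after one step, which is the point of a low learning rate: many small nudges adapt the model without wiping out what the pre-trained weights already encode.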
Enterprise Scenario 1: Retail Product Classification
A large e-commerce company needs to automatically classify product images into hundreds of categories (e.g., electronics, clothing, home goods). Established categories have up to 10,000 images each, but many new categories have only about 100 images. Using transfer learning, they take a pre-trained ResNet-50 model and fine-tune it on their product images. The model achieves 95% accuracy after 20 minutes of training. Without transfer learning, training from scratch would require millions of images and days of GPU time. In production, the model is deployed as an Azure Container Instance behind a load balancer handling 1000 requests per second. Common pitfalls: using too few images per class (below 50) leads to overfitting; the solution is to use data augmentation and a simpler model like ResNet-18.
Enterprise Scenario 2: Medical Imaging Diagnostics
A hospital wants to detect pneumonia from chest X-rays. They have 5,000 labeled X-rays, but training a deep network from scratch would be risky due to data scarcity and the need for high accuracy. They use transfer learning with a pre-trained DenseNet-121 model originally trained on ImageNet. The early layers are frozen, and the classifier head is replaced with a binary output (pneumonia vs. normal). Fine-tuning on the X-ray dataset takes 2 hours on a single GPU and achieves 92% sensitivity. The model is deployed via Azure Kubernetes Service with autoscaling. Key consideration: domain mismatch—natural images vs. medical images—so they unfreeze the last 20% of layers to adapt to the new domain. Misconfiguration: forgetting to normalize X-ray images to the same mean/std as the pre-trained model's training data (ImageNet statistics) can reduce accuracy by 10%.
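The normalization step mentioned above can be sketched as follows. The mean and standard deviation are the widely used ImageNet channel statistics; replicating a grayscale X-ray pixel across three channels is an assumption about the preprocessing, and the function names are hypothetical.

```python
# Per-channel ImageNet statistics (RGB), for pixel values scaled to [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_gray_pixel(value):
    """Scale a 0-255 grayscale value to [0, 1], copy it to 3 channels, and normalize."""
    scaled = value / 255.0
    return tuple((scaled - mean) / std for mean, std in zip(IMAGENET_MEAN, IMAGENET_STD))


def normalize_gray_image(image):
    """Apply the same normalization to every pixel of a nested-list image."""
    return [[normalize_gray_pixel(v) for v in row] for row in image]


xray = [[0, 128], [255, 64]]  # illustrative 2x2 grayscale image
normalized = normalize_gray_image(xray)
```

Skipping this step feeds the frozen layers inputs on a different scale from what they saw during pre-training, which is why it shows up as a misconfiguration.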
Enterprise Scenario 3: Document Classification
A law firm needs to classify legal documents into categories like contracts, briefs, and memos. They use Azure Form Recognizer, which internally uses transfer learning with a pre-trained layout and OCR model. They provide 200 examples per category. The service fine-tunes the pre-trained model to recognize the document structure and text patterns. In production, the model processes 10,000 documents daily with 99% accuracy. Performance considerations: the pre-trained model is optimized for English text; for other languages, additional fine-tuning is required. Common failure: using documents with unusual layouts (e.g., handwritten notes) requires more training data and possibly a custom model built from scratch.
Exactly What AI-900 Tests on This Topic
AI-900 objective 2.3 covers 'Describe concepts of transfer learning and pre-trained models.' The exam expects you to:
Define transfer learning and explain its benefits (less data, less time, and often higher accuracy on small datasets).
Identify scenarios where transfer learning is appropriate vs. training from scratch.
Recognize that Azure Custom Vision uses transfer learning.
Understand that pre-trained models are trained on large generic datasets (e.g., ImageNet).
Know that fine-tuning adapts a pre-trained model to a specific task.
Most Common Wrong Answers and Why
'Transfer learning requires more data than training from scratch.' This is false. The whole point is to reduce data requirements. Candidates confuse transfer learning with training from scratch.
'Transfer learning only works for image classification.' False. It works for many tasks: text classification (e.g., BERT), speech recognition, etc. The exam may show a non-image example to test this.
'Pre-trained models cannot be modified.' False. They can be fine-tuned by retraining the last layers. Azure Custom Vision does exactly this.
'Transfer learning always improves accuracy.' Not always. If the new domain is very different from the original, transfer learning may not help. The exam might present a scenario with medical images and ask if transfer learning is beneficial—the answer is 'Yes, but with caution.'
Specific Numbers and Terms That Appear Verbatim
ImageNet: The dataset with 1.2 million images and 1000 classes.
ResNet: The architecture often used as a pre-trained model.
Fine-tuning: The process of retraining the last layers.
Feature extraction: Using the pre-trained model as a fixed feature extractor.
Minimum images per class: 5 (Custom Vision), but 50+ recommended.
Iteration: In Custom Vision, a trained version of the model; each training run creates a new one.
Edge Cases and Exceptions
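Small datasets: Custom Vision will not train a classifier with fewer than 5 images per tag, and even above that minimum (roughly under 50 images per class) transfer learning can still overfit. Use data augmentation.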
Domain shift: If the pre-trained model's domain is too different (e.g., natural images vs. satellite imagery), consider unfreezing more layers or using a different pre-trained model.
Class imbalance: If one class has 1000 images and another has 10, the model may be biased. Use weighted loss or oversampling.
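One common way to derive loss weights is inverse class frequency, sketched below. The formula `total / (num_classes * count)` is one conventional choice rather than the only one, and `balanced_class_weights` is a hypothetical helper name.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: total / (num_classes * count)."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


# The imbalance from the text: 1000 images of one class, 10 of another.
labels = ["common"] * 1000 + ["rare"] * 10
weights = balanced_class_weights(labels)
```

Here each rare example counts 100 times as much as a common one in the loss, so the two classes contribute equally overall.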
How to Eliminate Wrong Answers
If an option says 'start from scratch' or 'train a new model from random weights,' it is likely wrong unless data is abundant and domain is unique.
If an option says 'no training needed,' it is wrong—transfer learning still requires fine-tuning.
If an option mentions 'Azure Custom Vision does not use transfer learning,' it is false.
Look for keywords: 'pre-trained,' 'fine-tune,' 'less data,' 'faster training.' These point to transfer learning.
Transfer learning reuses a pre-trained model to solve a new task with less data and time.
Pre-trained models are trained on large generic datasets like ImageNet (1.2M images, 1000 classes).
Fine-tuning involves removing the old classifier head, freezing early layers, adding a new head, and training on new data.
Azure Custom Vision is a service that automates transfer learning for image classification and object detection.
Minimum images per class for Custom Vision is 5, but 50+ is recommended for good accuracy.
Transfer learning is not always beneficial; it works best when the new domain is similar to the pre-training domain.
Common pre-trained architectures include ResNet, VGG, Inception, and EfficientNet.
Azure Cognitive Services like Computer Vision, Custom Vision, and Form Recognizer leverage transfer learning.
Freezing early layers prevents overfitting and reduces training time.
The exam tests understanding of when to use transfer learning vs. training from scratch.
These come up on the exam all the time. Here's how to tell them apart.
Transfer Learning
Requires less labeled data (e.g., 50 images per class vs. 1000+)
Much faster training (minutes vs. days)
Uses pre-trained weights as a starting point
Lower computational cost (can run on CPU for small datasets)
Best when new dataset is similar to pre-training dataset
Training from Scratch
Requires large labeled dataset (millions of images for deep networks)
Training can take days or weeks on multiple GPUs
Weights are initialized randomly
High computational cost (requires powerful GPU clusters)
Necessary when new domain is very different from any pre-trained model
Azure Custom Vision
Uses transfer learning to fine-tune on your images
You provide labeled images and train a custom model
Supports object detection and classification
Requires at least 5 images per class
Model can be exported for offline use (when trained on a compact domain)
Azure Computer Vision
Provides pre-trained models that cannot be retrained
No custom training; you use the API as-is
Offers pre-built capabilities like OCR, landmark detection
No minimum image requirement; works out-of-the-box
Typically consumed as a cloud API over the internet
Mistake
Transfer learning means you don't need to train at all.
Correct
Transfer learning still requires training (fine-tuning) on the new dataset. The pre-trained model provides a starting point, but the new classifier head must be trained to adapt to the new task.
Mistake
Pre-trained models are only available for images.
Correct
Pre-trained models exist for many domains: text (BERT, GPT), speech (DeepSpeech), video (I3D), etc. Azure Cognitive Services include pre-trained models for language, speech, and vision.
Mistake
You cannot use transfer learning with Azure Cognitive Services.
Correct
Azure Custom Vision and Form Recognizer explicitly use transfer learning. Other services like Computer Vision offer pre-trained models that can be used as feature extractors.
Mistake
Transfer learning always gives better accuracy than training from scratch.
Correct
If the new dataset is very different from the pre-training dataset (e.g., medical images vs. natural images), transfer learning may not help and could even hurt if fine-tuned improperly. It is often beneficial but not guaranteed.
Mistake
Fine-tuning means retraining the entire model.
Correct
Fine-tuning typically retrains only the last few layers (classifier head) while freezing the early layers. In some cases, later layers may be unfrozen, but the entire model is rarely retrained from scratch.
Transfer learning is a technique where a model trained on one task is reused as the starting point for a model on a second task. For example, a model trained on ImageNet (1.2 million images) can be fine-tuned to classify X-ray images with only a few hundred examples. This saves time, data, and computational resources. In Azure, Custom Vision uses transfer learning to let you train custom image classifiers with minimal data.
Azure Custom Vision starts with a pre-trained model that has learned general image features from a large dataset. When you upload your labeled images and click 'Train,' the service conceptually removes the original classifier head, adds a new one matching your number of tags, and fine-tunes the model on your images. The early layers are frozen to preserve general features, and the new head is trained. This process typically takes minutes and requires as few as 5 images per class.
Fine-tuning is a specific type of transfer learning where the pre-trained model's weights are slightly adjusted (fine-tuned) on the new dataset. In contrast, transfer learning can also include using the pre-trained model as a fixed feature extractor without updating its weights. Fine-tuning usually involves unfreezing some of the later layers and training them with a low learning rate. In Azure Custom Vision, both are used: early layers are frozen (feature extraction), and the new head is trained (fine-tuning).
Use transfer learning when you have a limited amount of labeled data (e.g., a few hundred images) and the new task is similar to the original task the pre-trained model was trained on (e.g., natural images). Training from scratch is only advisable if you have a very large dataset (millions of images) and a unique domain that is not well-represented by existing pre-trained models. In most real-world scenarios, transfer learning is recommended as it yields better results faster.
Pre-trained models in Azure are neural networks that have been trained on large datasets by Microsoft. Examples include the models behind Computer Vision (trained on millions of images), Text Analytics (trained on vast text corpora), and Speech Services (trained on hundreds of hours of audio). These models can be used directly via APIs or used as a starting point for transfer learning in services like Custom Vision and Form Recognizer.
Yes, transfer learning is widely used in natural language processing (NLP) with models like BERT and GPT, and in speech recognition with models like DeepSpeech. In Azure, you can use transfer learning with Custom Text (for custom text classification) and Custom Speech (for custom speech models). The same principles apply: start with a pre-trained model and fine-tune on your specific data.
The minimum is 5 images per class, but 50 or more per class is recommended for good accuracy. With fewer images, the model may overfit. Azure Custom Vision uses data augmentation (random flips, rotations, crops) to artificially increase the effective dataset size, which helps with small datasets.