GCDL · Chapter 56 of 101 · Objective 3.2

Pre-Trained AI Models on Google Cloud

The availability of pre-trained AI models on Google Cloud makes this a critical topic for the GCDL exam under Domain: Data Analytics & AI, Objective 3.2. Pre-trained models are a cornerstone of modern AI deployment, enabling organizations to leverage state-of-the-art capabilities without building models from scratch. Expect approximately 10-15% of exam questions to touch on this area, focusing on use cases, model types, and the trade-offs between pre-trained and custom models. You will learn about Vertex AI Model Garden, pre-built APIs, and how to select the right model for a given business problem.

Updated Jul 21, 2026

Reviewed by Johnson Ajibi· Senior Network & Security Engineer · MSc IT Security

Pre-Trained Models as Expert Interns

Suppose you're running a large company and need to analyse thousands of customer emails daily. Instead of training every new employee from scratch (teaching them language, grammar, business etiquette, and product knowledge), you hire a group of expert interns who already have a strong foundation. These interns have completed a rigorous general education (pre-training) covering language, reasoning, and common knowledge. They arrive knowing how to read and write, understand context, and follow instructions. Your job is now much simpler: you give them a short, specific training session on your company's products and policies (fine-tuning). They can then start handling emails immediately with high accuracy. The key is that the heavy lifting of building general intelligence is already done; you only pay for the final specialization. On Google Cloud's Vertex AI, pre-trained models like BERT and ResNet, and the models behind pre-built services such as the Cloud Vision API, are analogous to these expert interns. They come with general capabilities learned from massive datasets, and many of them can be adapted to your specific task with minimal additional data and compute. This approach dramatically reduces the time, cost, and expertise needed to deploy AI solutions, making advanced AI accessible to organizations without deep machine learning teams.

How It Actually Works

What Are Pre-Trained AI Models and Why Do They Exist?

Pre-trained AI models are machine learning models that have been trained on large, general datasets by Google or third-party providers. They encapsulate learned patterns, features, and representations that can be reused for a variety of downstream tasks. For example, a pre-trained image classification model like ResNet-50 has already learned to recognize edges, textures, shapes, and common objects from millions of images. A pre-trained language model like BERT has learned grammar, context, and semantics from billions of words.

The primary reason pre-trained models exist is the prohibitive cost and complexity of training large models from scratch. Training a state-of-the-art model can require thousands of GPU-hours, terabytes of data, and deep expertise in machine learning. Pre-trained models democratize AI by allowing organizations to start with a powerful baseline and only fine-tune on their specific data—often with just a few hundred or thousand examples. This approach is called transfer learning.

On Google Cloud, pre-trained models are available through several services:
- Vertex AI Model Garden: A curated repository of foundation models, including Google's own models (e.g., PaLM, Codey, Imagen) and open-source models (e.g., BERT, ResNet, YOLO).
- Pre-built APIs: Fully managed services like Cloud Vision API, Natural Language API, Translation API, and Speech-to-Text API that provide ready-to-use inference endpoints without any model training.
- Vertex AI custom deployment (Vertex AI was formerly named AI Platform (Unified)): Allows you to deploy pre-trained models from TensorFlow Hub, PyTorch Hub, or custom containers.

How Pre-Trained Models Work Internally

A pre-trained model is essentially a set of learned weights and biases that define a neural network. The training process involves two phases:
1. Pre-training: The model is trained on a large, generic dataset using unsupervised or self-supervised learning. For example, BERT is pre-trained on English Wikipedia and BooksCorpus using a masked language model objective (it learns to predict missing words in a sentence) alongside a next-sentence prediction objective. This forces the model to understand context and syntax.
2. Fine-tuning: The pre-trained weights are used as a starting point, and the model is further trained on a smaller, task-specific dataset with a supervised objective. For instance, you can take BERT and fine-tune it on a set of customer support emails labeled by category (billing, technical, etc.). Only the final classification layer may need to be trained from scratch; the rest of the network is adjusted slightly.

During inference, the model takes input (e.g., an image, text, or audio) and passes it through its layers. The pre-trained layers extract general features (e.g., edges, shapes, syntax), and the fine-tuned layers map those features to the specific task output (e.g., category probabilities, bounding boxes, translated text).
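
The two phases and the inference path can be illustrated with a deliberately tiny sketch. It is not how BERT or ResNet are implemented: `frozen_features` is a hypothetical stand-in for pre-trained layers whose weights stay fixed, and only a small logistic "head" is trained on a handful of labeled examples. All names and the toy dataset are assumptions made for illustration.

```python
import math

# Hypothetical stand-in for frozen pre-trained layers: turns raw input into features.
# In a real model these would be learned weights that stay fixed or change only slightly.
def frozen_features(x):
    return [x, x * x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune_head(examples, epochs=500, lr=0.1):
    """Train only the new classification head (two weights and a bias)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            f = frozen_features(x)
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

def predict(x, w, b):
    """Inference: frozen layers extract features, the tuned head maps them to a class."""
    f = frozen_features(x)
    return 1 if sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5 else 0

# Tiny task-specific dataset: label 1 when the magnitude of x is large, 0 when small.
train = [(-2.0, 1), (-1.5, 1), (-0.3, 0), (0.0, 0), (0.4, 0), (1.6, 1), (2.0, 1)]
w, b = fine_tune_head(train)
```

Seven examples are enough here only because the frozen features already make the classes easy to separate, which is the intuition behind transfer learning.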

Key Components and Defaults

Vertex AI Model Garden:

- Models are organized by task: text generation, image generation, code generation, vision, etc.
- Foundation models include:
  - PaLM 2: Google's large language model for text generation, summarization, and chat. Available in sizes: Bison, Gecko. (PaLM 2 has since been superseded by the Gemini family on Vertex AI.)
  - Codey: For code generation and completion. Based on PaLM 2.
  - Imagen: For text-to-image generation.
  - Chirp: For speech recognition.
- Open-source models include: BERT, ResNet, EfficientNet, YOLO, and many more from TensorFlow Hub.
- Endpoints for deployed models can autoscale. Hosted foundation models are billed by usage, while models you deploy yourself are billed for the compute their endpoints consume.
- Fine-tuning can be done via Vertex AI Pipelines or custom training jobs.

Pre-built APIs:
- Cloud Vision API: Returns labels, OCR, safe search, and face detection. Maximum batch size: 16 images per request. Max image size: 20 MB. Supports JPEG, PNG, GIF, BMP, WEBP.
- Natural Language API: Entity extraction, sentiment analysis, syntax analysis. Max text length: 1 MB per document.
- Translation API: Supports 100+ languages. Default translation model is NMT (Neural Machine Translation).
- Speech-to-Text API: Supports 125+ languages. Recommended sample rate: 16 kHz or higher. Can process up to 1 minute of audio per request (synchronous) or up to 480 minutes (asynchronous).
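
Because of the 16-images-per-request limit noted above, a client with a larger backlog has to split its work into batches. A minimal sketch of that chunking step follows; the bucket name and file names are hypothetical, and the batch size is a parameter so it can track whatever the current limit is.

```python
def chunk_images(image_uris, batch_size=16):
    """Split a list of image URIs into request-sized batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [image_uris[i:i + batch_size] for i in range(0, len(image_uris), batch_size)]

# Hypothetical backlog of 40 images in a hypothetical bucket.
uris = [f"gs://example-bucket/img_{i}.jpg" for i in range(40)]
batches = chunk_images(uris)
```

Each inner list would then become the `requests` array of one annotate call.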

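Vertex AI custom model deployment (formerly AI Platform (Unified)):

You can deploy pre-trained models from TensorFlow SavedModel format, PyTorch TorchScript, or custom containers.

A small general-purpose machine type such as n1-standard-2 (2 vCPUs, 7.5 GB RAM) is a common starting point for prediction. Autoscaling adds replicas as traffic grows; the sustainable request rate depends on the model, the machine type, and the replica limits you set rather than on a fixed per-deployment figure.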

Configuration and Verification Commands

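Listing and deploying models via gcloud:

gcloud ai models list --region=us-central1

This lists the models registered in your project (the Vertex AI Model Registry), including any you have imported or tuned from Model Garden. To deploy a model, create an endpoint and then deploy to it using the endpoint ID returned by the create command (ENDPOINT_ID and MODEL_ID below are placeholders):

gcloud ai endpoints create --region=us-central1 --display-name=my-endpoint
gcloud ai endpoints deploy-model ENDPOINT_ID --region=us-central1 --model=MODEL_ID --display-name=my-deployment --machine-type=n1-standard-4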

Using Cloud Vision API via curl:

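# PROJECT_ID is a placeholder for the project to bill; the header is needed when calling with user credentials
curl -X POST -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "x-goog-user-project: PROJECT_ID" \
  -H "Content-Type: application/json" \
  https://vision.googleapis.com/v1/images:annotate -d '{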
    "requests": [{
      "image": {"source": {"imageUri": "gs://bucket/image.jpg"}},
      "features": [{"type": "LABEL_DETECTION", "maxResults": 10}]
    }]
  }'

Response includes labels with confidence scores.
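
For readers who prefer to see the request and response shapes in code, here is a small sketch that builds the same JSON body as the curl call and pulls labels out of a response. It makes no network call. The `sample_response` is a hypothetical, hand-written example of the documented shape (`responses`, `labelAnnotations`, `description`, `score`), and the helper names are illustrative.

```python
import json

def build_label_request(image_uri, max_results=10):
    """Build the JSON body for a LABEL_DETECTION request on one image."""
    return {
        "requests": [{
            "image": {"source": {"imageUri": image_uri}},
            "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
        }]
    }

def extract_labels(response, min_score=0.0):
    """Return (description, score) pairs at or above min_score."""
    labels = []
    for result in response.get("responses", []):
        for annotation in result.get("labelAnnotations", []):
            if annotation.get("score", 0.0) >= min_score:
                labels.append((annotation["description"], annotation["score"]))
    return labels

# Hypothetical response, written by hand for illustration only.
sample_response = {
    "responses": [{
        "labelAnnotations": [
            {"description": "Dog", "score": 0.97},
            {"description": "Grass", "score": 0.62},
        ]
    }]
}

body = json.dumps(build_label_request("gs://bucket/image.jpg"))
confident = extract_labels(sample_response, min_score=0.8)
```

Filtering on a minimum score is a design choice for the caller; the API itself returns every label up to `maxResults`.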

Verifying model performance: Use Vertex AI Model Evaluation to compute metrics like precision, recall, and F1 score on a test set.
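
To make those three metrics concrete, the sketch below computes them by hand for a binary task. It illustrates the definitions only and is not the code Vertex AI runs; the label lists are made up.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Made-up test set: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

With these lists, precision and recall are both 3/4, so F1 is also 0.75.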

Interaction with Related Technologies

Pre-trained models integrate with:
- Vertex AI Pipelines: Automate the fine-tuning and deployment workflow.
- Cloud Storage: Store training data and model artifacts.
- Cloud Functions: Trigger inference on events (e.g., new image uploaded to Cloud Storage).
- BigQuery ML: Use pre-trained models directly in SQL queries for predictions.
- Vertex Explainable AI (formerly AI Explanations): Get feature attributions for model predictions.

Exam-Relevant Details

The GCDL exam tests understanding of when to use pre-trained vs. custom models. Key points:

Pre-trained models are ideal when you have limited data (<10,000 examples) or limited ML expertise.

Custom models are needed when the task is highly specific (e.g., proprietary medical imaging) or requires unique domain knowledge.

Pre-built APIs are fully managed; you cannot access the underlying model weights. Vertex AI Model Garden allows fine-tuning.

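Foundation models like PaLM 2 are available via Vertex AI. They can be adapted with a handful of examples placed in the prompt (few-shot learning, which involves no weight updates) or tuned on a small labeled dataset, with as few as 100 examples.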

The exam may ask about responsible AI: pre-trained models can inherit biases from training data; you must evaluate and mitigate.

Pricing: Pre-built APIs charge per request (e.g., $1.50 per 1,000 images for Cloud Vision label detection). Vertex AI charges for the compute used by training jobs and deployed endpoints, and by usage for hosted foundation models.
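
A quick back-of-the-envelope calculation shows how per-request pricing adds up. The sketch uses the $1.50 per 1,000 images figure quoted above and the 1,000-requests-per-month free tier mentioned later in this chapter; it assumes a single flat rate, whereas real price lists may have volume tiers, so treat the result as an estimate.

```python
def monthly_vision_cost(images, price_per_1000=1.50, free_units=1000):
    """Estimate a month's label-detection bill under a flat per-1,000 rate."""
    billable = max(0, images - free_units)
    return round(billable / 1000 * price_per_1000, 2)

cost_large = monthly_vision_cost(51_000)  # 50,000 billable images
cost_small = monthly_vision_cost(500)     # entirely within the free tier
```

Here 51,000 images leave 50,000 billable, which comes to $75.00, while 500 images cost nothing.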

Trap Patterns on the Exam

Candidates often confuse pre-trained models with custom models. Common wrong answers:
- "Pre-trained models always require more data than custom models": False; they require less.
- "You cannot fine-tune a pre-trained model on Google Cloud": False; you can via Vertex AI.
- "All pre-trained models are available as APIs only": False; many are available as deployable models.
- "Pre-trained models are always more accurate than custom models": False; custom models can outperform if sufficient data and expertise exist.

Another trap: assuming that a pre-trained model cannot be used for a task different from its pre-training objective. For example, BERT (pre-trained on masked language modeling) can be fine-tuned for classification, question answering, or summarization.

Walk-Through

Identify Business Requirement

Start by clearly defining the AI task: classification, object detection, text generation, translation, etc. Assess available data: do you have labeled examples? How many? If you have fewer than 10,000 labeled examples, a pre-trained model is likely the best choice. Also consider latency requirements, privacy (data residency), and budget. Document the expected input/output format. This step sets the foundation for model selection.

Select Pre-Trained Model or API

Choose between a pre-built API (fully managed, no training) or a deployable model from Vertex AI Model Garden. For common tasks like image labeling or sentiment analysis, the Cloud Vision API or Natural Language API are fastest to implement. For more custom tasks (e.g., classifying proprietary product images), select a model like ResNet from Model Garden that can be fine-tuned. Consider model size: larger models (e.g., PaLM) offer better accuracy but higher cost and latency. Use the Vertex AI Model Garden UI or gcloud to browse models.
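
The selection logic in the first two steps can be written down as a small decision helper. This is only a sketch of the rules of thumb used in this chapter (about 100 examples to fine-tune, 100,000 or more plus ML expertise to justify a custom model); the function name, parameters, and return strings are hypothetical, and real decisions also weigh latency, privacy, and budget.

```python
def recommend_approach(labeled_examples, task_is_common, needs_custom_labels, has_ml_expertise):
    """Map the chapter's rules of thumb to a suggested starting point."""
    if task_is_common and not needs_custom_labels:
        return "pre-built API"
    if labeled_examples >= 100_000 and has_ml_expertise:
        return "custom model"
    if labeled_examples >= 100:
        return "fine-tune pre-trained model"
    return "foundation model with zero-shot or few-shot prompting"

# Image labeling with standard labels: the managed API is the quickest route.
generic_labels = recommend_approach(0, True, False, False)
# Proprietary product images with a few thousand labels: fine-tune from Model Garden.
product_images = recommend_approach(5_000, False, True, False)
```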

Prepare Data and Fine-Tune (If Needed)

If using a pre-built API, skip this step. For fine-tuning, partition your labeled data into training (80%), validation (10%), and test (10%) sets. Upload to Cloud Storage. Create a Vertex AI dataset and start a training job. You can use Vertex AI AutoML for automated fine-tuning or custom training with your own code. Monitor training metrics (loss, accuracy) via Vertex AI Experiments. Fine-tuning typically takes 1-10 hours depending on data size and model complexity. Use early stopping to prevent overfitting.
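
The 80/10/10 partition can be sketched in a few lines. This is a plain random split with a fixed seed for reproducibility; it does not stratify by label, which you may want for imbalanced classes. The example data is made up.

```python
import random

def split_dataset(examples, train=0.8, val=0.1, seed=42):
    """Shuffle once, then cut into train, validation, and test partitions."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains becomes the test set
    )

# Made-up labeled examples: (text, category id).
data = [(f"email {i}", i % 3) for i in range(1000)]
train_set, val_set, test_set = split_dataset(data)
```

Keeping the test set untouched until the end is what makes the reported accuracy trustworthy.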

Deploy Model or API Endpoint

For pre-built APIs, you get an endpoint immediately after enabling the service. For a fine-tuned model, deploy it to a Vertex AI endpoint. Choose a machine type based on latency and throughput needs. For low-latency, use n1-standard-4 or higher. Enable autoscaling with min/max nodes. Test the endpoint with sample inputs using gcloud or client libraries. Verify that the endpoint returns predictions in the expected format (e.g., JSON). Set up monitoring with Cloud Monitoring to track request count, latency, and errors.

Integrate and Monitor in Production

Integrate the endpoint into your application via REST or gRPC. Use Cloud Functions or Cloud Run for serverless integration. Implement logging with Cloud Logging to capture predictions and errors. Monitor for data drift: if the input distribution changes, model accuracy may degrade. Use Vertex AI Model Monitoring to detect drift and trigger retraining. Set up alerts for high latency or error rates. Plan for periodic retraining (e.g., monthly) to maintain accuracy. Document the model version and deployment date for audit trails.
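
To give a feel for what "drift" means in practice, the sketch below compares the share of each category in recent traffic against a baseline and reports the largest change. It is one simple statistic chosen for illustration, not a description of how Vertex AI Model Monitoring computes drift, and the 0.2 alert threshold is a hypothetical value.

```python
from collections import Counter

def category_shares(values):
    """Return each category's fraction of the total."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    return {category: count / len(values) for category, count in counts.items()}

def max_share_shift(baseline, recent):
    """Largest absolute change in any category's share between two samples."""
    base = category_shares(baseline)
    new = category_shares(recent)
    return max(abs(base.get(k, 0.0) - new.get(k, 0.0)) for k in set(base) | set(new))

# Made-up samples of predicted email categories.
baseline = ["billing"] * 50 + ["technical"] * 40 + ["other"] * 10
recent = ["billing"] * 20 + ["technical"] * 70 + ["other"] * 10

shift = max_share_shift(baseline, recent)
needs_review = shift > 0.2  # hypothetical alert threshold
```

In this example billing drops from 50% to 20% of traffic, a shift of 0.3, which would trip the alert and prompt a look at whether retraining is needed.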

What This Looks Like on the Job

Enterprise Scenario 1: E-Commerce Product Categorization

A large online retailer wants to automatically categorize new product listings into 500 categories (e.g., Electronics > Laptops, Clothing > Shoes). They have 50,000 labeled products but the categories change frequently. They choose to use a pre-trained BERT model from Vertex AI Model Garden and fine-tune it on their product titles and descriptions. The fine-tuning takes 3 hours on an n1-standard-8 machine. They deploy the model to an endpoint with autoscaling (1-10 nodes). In production, the model achieves 94% accuracy. When a new category is added, they collect 100 examples and fine-tune again, which takes only 30 minutes due to transfer learning. The key benefit: they avoided building a model from scratch, saving months of development time.

Enterprise Scenario 2: Healthcare Document Processing

A hospital network needs to extract medical codes (ICD-10) from clinical notes. They have strict data privacy requirements: data must stay within their VPC, so they cannot use a public API. Because pre-built APIs such as the Natural Language API can be neither fine-tuned nor privately deployed, they instead take a pre-trained language model from Vertex AI Model Garden and serve it from a private endpoint using Vertex AI Private Endpoints. They fine-tune the model using de-identified notes. The model is deployed in us-central1 with a single node. They process 10,000 notes per day with an average latency of 200ms. The fine-tuning improved accuracy from 70% (zero-shot) to 92%. Misconfiguration risk: if they had not used a private endpoint, data would have left their VPC, which could put them in breach of their HIPAA obligations.

Enterprise Scenario 3: Customer Support Chatbot

A telecom company wants to build a chatbot to answer billing questions. They use the PaLM 2 foundation model via Vertex AI with few-shot learning: they provide 20 example conversations in the prompt. No fine-tuning is needed. The model generates responses with high relevance. They integrate with Dialogflow CX for conversation management. The system handles 100,000 conversations per month. Cost is $0.002 per prediction. Common pitfall: prompt engineering is critical—poorly structured prompts lead to irrelevant answers. They iterate on prompts using Vertex AI Prompt Optimizer.
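
Few-shot prompting as described in this scenario amounts to assembling example exchanges ahead of the new question. The sketch below shows one possible layout; the `Customer:`/`Agent:` labels, the instruction text, and the example conversations are all hypothetical, and the resulting string would be sent to whichever foundation model you use.

```python
def build_few_shot_prompt(instruction, examples, question):
    """Assemble an instruction, worked examples, and the new question into one prompt."""
    parts = [instruction.strip(), ""]
    for customer_text, agent_text in examples:
        parts.append(f"Customer: {customer_text}")
        parts.append(f"Agent: {agent_text}")
        parts.append("")
    parts.append(f"Customer: {question}")
    parts.append("Agent:")  # left open for the model to complete
    return "\n".join(parts)

# Hypothetical billing examples.
examples = [
    ("Why is my bill higher this month?", "Your plan's promotional discount ended on the 1st."),
    ("Can I change my billing date?", "Yes, you can pick a new date in account settings."),
]
prompt = build_few_shot_prompt(
    "You are a billing support agent. Answer briefly and politely.",
    examples,
    "How do I download last month's invoice?",
)
```

Keeping every example in the same format as the final question is the structural consistency that the prompt-engineering pitfall above refers to.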

Performance and Scale Considerations

Pre-built APIs have rate limits: Cloud Vision API enforces a default per-project request quota on the order of 1,500 requests per minute; check the Quotas page in the Cloud console for the current value. For higher throughput, request a quota increase.

Deployed models can scale to thousands of requests per second with autoscaling, but cold starts can cause latency spikes. Use min replicas to keep at least one instance warm.

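Fine-tuning cost: For a ResNet-50 model with 10,000 images, budget roughly $5 per hour of training as an illustrative figure; the actual rate depends on the machine type (e.g., n1-standard-8) and any attached GPUs. Typical fine-tuning takes 2-4 hours.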
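
These figures lend themselves to quick planning arithmetic. The sketch below estimates how long a backlog takes under a per-minute quota and what a fine-tuning run might cost; the quota and hourly rate are passed in as parameters because both are illustrative numbers taken from this section rather than guaranteed values.

```python
import math

def minutes_to_process(total_requests, requests_per_minute):
    """Minimum whole minutes needed to send a backlog without exceeding the quota."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return math.ceil(total_requests / requests_per_minute)

def fine_tuning_cost_range(hourly_rate, min_hours, max_hours):
    """Low and high cost estimates for a training run."""
    return hourly_rate * min_hours, hourly_rate * max_hours

# 100,000 requests at an assumed quota of 1,500 per minute.
backlog_minutes = minutes_to_process(100_000, 1_500)
# An assumed $5 per hour for 2 to 4 hours of fine-tuning.
low, high = fine_tuning_cost_range(5.0, 2, 4)
```

Under these assumptions the backlog needs at least 67 minutes and the fine-tuning run costs between $10 and $20.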

What Goes Wrong with Misconfiguration

Overfitting: Fine-tuning with too few examples (e.g., 10) causes the model to memorize rather than generalize. Solution: use data augmentation or a simpler model.

Underfitting: Not fine-tuning enough (e.g., 1 epoch) may not adapt the model. Solution: monitor validation loss and train until convergence.

Data mismatch: If the pre-training data distribution is very different from your task (e.g., medical vs. general text), performance may be poor. Solution: consider a domain-specific pre-trained model (e.g., BioBERT for biomedical text).

Cost explosion: Leaving autoscaling enabled with no max limit can cause high bills during traffic spikes. Always set max replicas.
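
The cost-explosion risk can be bounded before deployment by checking the replica settings and computing the worst-case bill. This is a sketch: the returned keys mirror the `min_replica_count` and `max_replica_count` parameter names used by the Vertex AI Python SDK's deploy call, while the validation rules, the $0.25 hourly node rate, and the 730-hour month are assumptions for illustration.

```python
def validate_autoscaling(min_replicas, max_replicas):
    """Reject settings that risk cold starts or an unbounded bill."""
    if min_replicas < 1:
        raise ValueError("keep at least one replica warm to avoid cold starts")
    if max_replicas is None:
        raise ValueError("always set an explicit maximum replica count")
    if max_replicas < min_replicas:
        raise ValueError("max_replicas must be >= min_replicas")
    return {"min_replica_count": min_replicas, "max_replica_count": max_replicas}

def worst_case_monthly_cost(max_replicas, hourly_rate_per_node, hours=730):
    """Bill if the endpoint ran at its maximum replica count all month."""
    return max_replicas * hourly_rate_per_node * hours

settings = validate_autoscaling(1, 10)
ceiling = worst_case_monthly_cost(settings["max_replica_count"], 0.25)  # hypothetical rate
```

With these numbers the ceiling is $1,825 per month, which is the figure to compare against your budget before choosing the maximum.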

How GCDL Actually Tests This

Exactly What GCDL Tests

Under Objective 3.2, the exam focuses on:
- Identifying appropriate use cases for pre-trained models vs. custom models.
- Recognizing the available pre-trained model services: Vertex AI Model Garden, Cloud Vision API, Natural Language API, Translation API, Speech-to-Text API, and Document AI.
- Understanding transfer learning as the mechanism behind fine-tuning.
- Knowing the benefits: reduced time, data, and expertise requirements.
- Awareness of limitations: bias, lack of domain specificity, and inability to modify underlying architecture.

Key Takeaways

Pre-trained models leverage transfer learning: they are trained on large general datasets and fine-tuned on smaller task-specific data.

Google Cloud offers pre-built APIs (Vision, Natural Language, Translation, Speech) and fine-tunable models via Vertex AI Model Garden.

Use pre-trained models when you have limited data (<10,000 examples) or limited ML expertise; use custom models for highly specific tasks with abundant data.

Fine-tuning can be done with as few as 100 examples; Vertex AI AutoML simplifies the process.

Pre-built APIs charge per request; Vertex AI charges per compute hour and prediction.

Foundation models like PaLM 2 support zero-shot and few-shot learning via prompt engineering.

Pre-trained models may contain biases; always evaluate for fairness and use responsible AI practices.

Vertex AI Model Garden includes both Google and open-source models.

Private endpoints are available for sensitive data that cannot leave your VPC.

Monitor for data drift and retrain periodically to maintain accuracy.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Pre-built APIs (e.g., Cloud Vision)

Fully managed; no model training or deployment needed.

Fixed set of features (e.g., label detection, OCR). Cannot add custom labels.

Pricing per request (e.g., $1.50/1,000 images).

No access to model weights; cannot modify model architecture.

Best for common, well-defined tasks with standard output formats.

Vertex AI Model Garden (Fine-tunable)

Requires fine-tuning and deployment; more control.

Can be fine-tuned for custom labels or tasks (e.g., classify your own products).

Pricing per compute hour and per prediction; can be cheaper at high volume.

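Access to model weights for open-source models, whose architecture you can also modify; Google's proprietary foundation models can be tuned but their weights are not exposed.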

Best for domain-specific tasks with custom requirements and moderate data.

Pre-trained Models (Transfer Learning)

Requires less data (100-10,000 examples).

Faster to develop (hours to days).

Lower expertise needed; can use AutoML.

May inherit biases or limitations from pre-training.

Ideal for most business applications.

Custom Models (Train from Scratch)

Requires large dataset (100,000+ examples).

Longer development time (weeks to months).

Requires deep ML expertise.

Full control over architecture and training data.

Necessary for highly novel or specialized tasks.

Watch Out for These

Mistake

Pre-trained models cannot be fine-tuned; you must use them as-is.

Correct

Many pre-trained models on Vertex AI Model Garden can be fine-tuned on custom data. Only pre-built APIs (e.g., Cloud Vision API) are fixed. Fine-tuning adjusts the model's weights to the new task, improving accuracy.

Mistake

Pre-trained models are always free to use.

Correct

Pre-built APIs charge per request (e.g., $1.50 per 1,000 images). Vertex AI charges for compute and storage. Only limited free tier quotas exist (e.g., 1,000 requests per month for Vision API).

Mistake

You need a large dataset to use a pre-trained model.

Correct

Pre-trained models are designed for transfer learning, which requires much less data than training from scratch. With as few as 100 labeled examples, you can fine-tune effectively. Zero-shot prompting requires no examples at all, and few-shot prompting needs only a handful of examples in the prompt rather than a training dataset.

Mistake

Pre-trained models are always more accurate than custom models.

Correct

Custom models trained on large, high-quality domain-specific datasets can outperform pre-trained models. Pre-trained models are a strong baseline but are not optimal for every task, especially if the domain is unique.

Mistake

All pre-trained models on Google Cloud are Google-owned.

Correct

Vertex AI Model Garden includes both Google's foundation models (PaLM 2, Codey, Imagen) and open-source models from the community (BERT, ResNet, YOLO). Users can also bring their own pre-trained models.

Frequently Asked Questions

What is the difference between a pre-built API and a fine-tunable model on Google Cloud?

A pre-built API (e.g., Cloud Vision API) is a fully managed service with a fixed set of capabilities. You send requests and get results; you cannot modify the model or add custom labels. A fine-tunable model from Vertex AI Model Garden (e.g., BERT) allows you to train it further on your own data, customizing its output. Pre-built APIs are easier to use but less flexible; fine-tunable models require more effort but can be tailored to your specific task.

Can I use a pre-trained model without any training data?

Yes, for some tasks. Foundation models like PaLM 2 support zero-shot learning: you provide a prompt and the model generates a response without any examples. For example, you can ask it to summarize a text without providing summaries. However, for higher accuracy, you may use few-shot learning by including a few examples in the prompt. For tasks requiring custom labels (e.g., classifying your own products), you typically need at least 100 labeled examples for fine-tuning.

How much does it cost to use pre-trained models on Google Cloud?

Costs vary. Pre-built APIs charge per request: Cloud Vision label detection is $1.50 per 1,000 images; Natural Language entity extraction is about $1.00 per 1,000 units, where a unit is up to 1,000 characters of text. Vertex AI Model Garden charges for compute resources used during fine-tuning and inference. For example, fine-tuning a ResNet-50 might cost about $5 per hour, depending on the machine type and any attached GPUs. Inference costs depend on the machine type and number of predictions. There are free tier quotas for some APIs (e.g., 1,000 requests/month for Vision).

What is Vertex AI Model Garden?

Vertex AI Model Garden is a curated repository of foundation models and open-source models that can be deployed and fine-tuned on Google Cloud. It includes Google's models like PaLM 2, Codey, and Imagen, as well as popular open-source models like BERT, ResNet, and YOLO. You can browse models by task, deploy them to endpoints, and fine-tune them using Vertex AI training. It is the primary tool for leveraging pre-trained models beyond fixed APIs.

How do I choose between a pre-trained model and a custom model?

Choose a pre-trained model if you have limited labeled data (fewer than 10,000 examples), limited ML expertise, or need a quick solution. Choose a custom model if you have a large, high-quality dataset (100,000+ examples), the task is highly domain-specific (e.g., medical imaging with rare conditions), or you need full control over the model architecture. Pre-trained models are a good starting point for most business problems; custom models are only necessary when pre-trained models fail to meet accuracy requirements.

Can I deploy a pre-trained model on-premises or in another cloud?

Yes, if the model is open-source (e.g., BERT, ResNet) and you have the weights. You can export the model from Vertex AI and deploy it anywhere. However, Google's proprietary models (e.g., PaLM 2) are only available through Google Cloud services. For on-premises deployment, you would need to use an open-source alternative or run the model in a container on your own infrastructure.

What is fine-tuning and how does it work on Vertex AI?

Fine-tuning is the process of taking a pre-trained model and training it further on a smaller, task-specific dataset. On Vertex AI, you can use AutoML to automatically fine-tune a model by providing labeled data. Alternatively, you can write your own training code and run it on Vertex AI Training. The pre-trained weights are loaded, and the model is updated using your data. Fine-tuning typically requires fewer epochs than training from scratch, and the learning rate is usually lower to avoid destroying the pre-trained features.
