This chapter provides a deep dive into Vertex AI, Google Cloud's unified machine learning platform. For the GCDL exam, understanding Vertex AI is critical because it integrates data preparation, training, deployment, and monitoring into a single service. Approximately 10-15% of exam questions touch on Vertex AI, either directly or as part of broader ML workflows. This chapter covers the core components, how they work together, and the exact configurations you need to know for the exam.
Vertex AI is like a fully automated factory assembly line for building and running AI models. The raw materials (your data) arrive at the loading dock (Data Preparation). They are sorted and cleaned on the conveyor belt (Feature Engineering). The assembly line has multiple workstations: AutoML is like a robotic arm that can assemble a product from standard parts without human intervention; Custom Training is like a skilled craftsman who builds a custom product using specialized tools (your own code). Once the product is built, it moves to the quality control station (Model Evaluation) where it is tested against strict criteria. If it passes, it is packaged and shipped to the warehouse (Model Registry). When a customer places an order (prediction request), the product is retrieved from the warehouse and shipped out (Prediction Serving). The factory manager (Vertex AI Pipelines) can orchestrate the entire process, including retooling for new products (retraining) and monitoring efficiency (Model Monitoring). The factory floor is monitored by security cameras (IAM and VPC-SC) to ensure only authorized workers have access. This analogy is mechanistic: each component has a specific role, and they interact in a defined sequence, just like Vertex AI services work together.
What is Vertex AI and Why Does It Exist?
Vertex AI is Google Cloud's end-to-end machine learning platform that unifies all the services needed to build, train, deploy, and manage ML models. Before Vertex AI, Google Cloud had separate services like AI Platform (for training and prediction), AutoML, and various data preparation tools. This fragmentation required users to stitch together multiple services, leading to complexity and integration challenges. Vertex AI solves this by providing a single API and console for the entire ML workflow.
From the GCDL perspective, Vertex AI is the recommended path for most ML workloads on Google Cloud. The exam expects you to understand its key components, how they interact, and when to use each one.
How Vertex AI Works Internally – Step Through the Mechanism
Vertex AI is built on a microservices architecture running on Google Kubernetes Engine (GKE). When you submit a training job, the following happens:
Request Reception: The Vertex AI API receives your training request, which includes the training code, data source, and configuration (machine type, hyperparameters, etc.).
Resource Allocation: The Vertex AI scheduler allocates the requested resources from a pool of pre-warmed GKE clusters. These clusters are multi-tenant but isolated at the pod level using Kubernetes namespaces and resource quotas.
Container Execution: Your training code is packaged into a Docker container (either a pre-built image provided by Google or a custom one). Vertex AI pulls this container and runs it on the allocated GKE pod. Training data in Cloud Storage is made available to the container through Cloud Storage FUSE, which exposes buckets under the /gcs/ path.
Logging and Monitoring: All stdout/stderr output is streamed to Cloud Logging. Metrics (like loss, accuracy) are collected via Cloud Monitoring if you use the Vertex AI SDK for logging.
Model Artifact Creation: After training, the model artifact (e.g., SavedModel for TensorFlow) is saved to a Cloud Storage bucket specified in the configuration.
Model Registration: The model is automatically registered in the Vertex AI Model Registry, which stores metadata like version, framework, and evaluation metrics.
Deployment: For serving, you create an endpoint that deploys the model to a GKE pod with an autoscaled node pool. The endpoint receives prediction requests via gRPC or HTTP, and the model container processes them.
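The first step above (the request that carries the container, machine type, and output location) can be sketched as a plain dictionary shaped like a CustomJob request body. This is a minimal illustration, not the SDK itself: the helper name build_custom_job and the bucket gs://example-bucket are assumptions made for the example, while the machine type and image URI are the ones used elsewhere in this chapter.

```python
import json


def build_custom_job(display_name, image_uri, machine_type,
                     output_uri_prefix, replica_count=1):
    """Return a dict shaped like a Vertex AI CustomJob request body.

    Hypothetical helper for illustration; it only assembles the structure
    and does not call any Google Cloud API.
    """
    return {
        "displayName": display_name,
        "jobSpec": {
            # One worker pool: which machine, how many replicas, which image.
            "workerPoolSpecs": [
                {
                    "machineSpec": {"machineType": machine_type},
                    "replicaCount": replica_count,
                    "containerSpec": {"imageUri": image_uri},
                }
            ],
            # Where the job writes its model artifacts.
            "baseOutputDirectory": {"outputUriPrefix": output_uri_prefix},
        },
    }


job = build_custom_job(
    display_name="my_training_job",
    image_uri="gcr.io/cloud-aiplatform/training/tf-cpu.2-6:latest",
    machine_type="n1-standard-4",
    output_uri_prefix="gs://example-bucket/training-output",  # hypothetical bucket
)
print(json.dumps(job, indent=2))
```

Keeping the request as data makes it easy to see which parts map to resource allocation (machineSpec, replicaCount), container execution (containerSpec), and artifact creation (baseOutputDirectory).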
Key Components, Values, Defaults, and Timers
Vertex AI Workbench: Jupyter-based notebooks for exploration and development. Default machine type is n1-standard-4 (4 vCPU, 15 GB memory). Pre-built images for TensorFlow, PyTorch, etc.
AutoML: For tabular data, the default train budget is 1 hour (can be set up to 72 hours). For image classification, the default is 8 hours. AutoML uses neural architecture search (NAS) and transfer learning internally.
Custom Training: You specify machine type (e.g., n1-highmem-8, a2-highgpu-1g for GPUs). The default timeout is 24 hours (configurable up to 7 days). Training jobs have a maximum of 1000 replicas for distributed training.
Vertex AI Pipelines: Uses Kubeflow Pipelines (KFP) as the orchestration engine. Pipelines are defined as a graph of components using the KFP SDK. The default pipeline root is a Cloud Storage bucket. Each component runs as a Kubernetes pod.
Model Registry: Stores models with versions. You can set a default version for serving. Models are stored as artifacts in Cloud Storage, and the registry holds metadata (framework, version, labels).
Endpoints: Can be public or private (using Private Service Connect). Supports autoscaling with a default min replicas of 2 and max of 10. The default traffic split is 100% to the deployed model.
Model Monitoring: Monitors for feature drift and skew. The default monitoring interval is 1 hour. You can set alert thresholds (e.g., drift > 0.2 triggers an alert).
Explainable AI: Provides feature attributions using Shapley values or integrated gradients. The default number of samples for Shapley is 25.
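The endpoint settings listed above (traffic split and min/max replicas) have simple consistency rules that are easy to check before deploying. The sketch below is a hypothetical validator written for this chapter, not part of any Google library; it assumes traffic percentages must total 100 and that the maximum replica count cannot be lower than the minimum.

```python
def validate_deployment(traffic_split, min_replicas, max_replicas):
    """Return a list of problems found in a deployment configuration.

    traffic_split maps a deployed-model ID (or "0" for the model being
    deployed in this request) to a percentage of traffic.
    """
    errors = []
    if any(pct < 0 for pct in traffic_split.values()):
        errors.append("traffic percentages must not be negative")
    if sum(traffic_split.values()) != 100:
        errors.append("traffic split must sum to 100")
    if min_replicas < 1:
        errors.append("min replicas must be at least 1")
    if max_replicas < min_replicas:
        errors.append("max replicas must be >= min replicas")
    return errors


# A single model taking all traffic, with the replica bounds quoted above.
print(validate_deployment({"0": 100}, min_replicas=2, max_replicas=10))

# A split that only adds up to 90 and inverted replica bounds.
print(validate_deployment({"0": 60, "1234": 30}, min_replicas=2, max_replicas=1))
```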
Configuration and Verification Commands
To create a custom training job using the gcloud CLI:
gcloud ai custom-jobs create \
  --region=us-central1 \
  --display-name=my_training_job \
  --worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri=gcr.io/cloud-aiplatform/training/tf-cpu.2-6:latest
To list training jobs:
gcloud ai custom-jobs list --region=us-central1
To deploy a model to an endpoint:
gcloud ai endpoints deploy-model $ENDPOINT_ID \
  --model=$MODEL_ID \
  --display-name=my_deployment \
  --traffic-split=0=100 \
  --region=us-central1
To get a prediction:
gcloud ai endpoints predict $ENDPOINT_ID \
  --json-request=instances.json \
  --region=us-central1
How Vertex AI Interacts with Related Technologies
Cloud Storage: Used for storing training data, model artifacts, and pipeline outputs. Vertex AI jobs run as a service account (by default a Google-managed service agent), and that account needs permission to read from and write to the specified bucket.
Cloud Logging and Monitoring: Training logs are sent to Cloud Logging. Vertex AI also exports metrics (e.g., prediction latency, request count) to Cloud Monitoring.
Cloud IAM: Predefined roles control permissions: aiplatform.admin (full control), aiplatform.user (use of Vertex AI resources), and aiplatform.viewer (read-only). Custom roles can be created for more granular access.
VPC Service Controls: Can be used to restrict data exfiltration from Vertex AI. For example, you can configure a VPC perimeter that includes Vertex AI and Cloud Storage buckets.
Cloud KMS: You can encrypt model artifacts and training data with customer-managed encryption keys (CMEK) by specifying a key in the training or prediction request.
BigQuery: Vertex AI can read data directly from BigQuery for training without needing to export to Cloud Storage first. This is done by specifying the BigQuery source in the training input.
Dataflow: For large-scale data preprocessing, you can use Dataflow (Apache Beam) and store the output in Cloud Storage or BigQuery for ingestion by Vertex AI.
Trap Patterns on the Exam
Confusing AI Platform with Vertex AI: The exam may refer to AI Platform (the predecessor). Know that Vertex AI is the current unified platform, and AI Platform is legacy.
Assuming AutoML is always better: AutoML is great for tabular and vision tasks, but for custom architectures (e.g., NLP transformers), custom training is required.
Misunderstanding Model Registry vs. Model Versions: The registry stores models and their versions, but it does not store the actual model artifacts (they are in Cloud Storage). The registry only holds metadata.
Forgetting that endpoints can be private: Many assume endpoints are always public, but Vertex AI supports private endpoints that clients reach from inside a VPC (for example, through Private Service Connect) instead of over the public internet.
Prepare Data and Environment
First, you prepare your data: clean, normalize, and split into training/validation/test sets. Store the data in Cloud Storage or BigQuery. Next, set up a Vertex AI Workbench notebook instance or your local environment with the Vertex AI SDK. Install necessary libraries like tensorflow, pytorch, or sklearn. Authenticate using a service account with appropriate IAM roles (e.g., aiplatform.user). This step is crucial because Vertex AI will access the data from the specified source during training.
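The train/validation/test split described above can be sketched in a few lines of standard-library Python. This is an illustrative helper, not a Vertex AI feature: the 80/10/10 fractions and the fixed seed are assumptions chosen for the example.

```python
import random


def split_dataset(rows, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle rows reproducibly and split them into train/validation/test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Fixing the seed matters once the data is uploaded: retraining on a different random split makes evaluation metrics harder to compare between runs.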
Define Training Configuration
You define the training job configuration: machine type (e.g., n1-standard-4), number of replicas, container image (pre-built or custom), hyperparameters, and training budget for AutoML. For custom training, you also specify the training code location (Python script or Docker image). You can optionally set up hyperparameter tuning by defining the parameter space and optimization objective. The configuration is submitted as a JSON or YAML file, or directly via the gcloud command.
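The hyperparameter tuning part of the configuration above (a parameter space plus an optimization objective) might look like the following when written out as a request body. This is a sketch under assumptions: the helper name, the metric name accuracy, the parameter ranges, and the trial counts are all illustrative values, not defaults.

```python
def build_tuning_job(display_name, trial_job_spec, max_trials=20, parallel_trials=5):
    """Return a dict shaped like a hyperparameter tuning job request body.

    Hypothetical helper; it assembles the structure only and calls no API.
    """
    return {
        "displayName": display_name,
        "maxTrialCount": max_trials,
        "parallelTrialCount": parallel_trials,
        "studySpec": {
            # Optimization objective: the metric the training code reports.
            "metrics": [{"metricId": "accuracy", "goal": "MAXIMIZE"}],
            # Parameter space searched across trials.
            "parameters": [
                {
                    "parameterId": "learning_rate",
                    "doubleValueSpec": {"minValue": 1e-4, "maxValue": 1e-1},
                    "scaleType": "UNIT_LOG_SCALE",
                },
                {
                    "parameterId": "batch_size",
                    "discreteValueSpec": {"values": [32, 64, 128]},
                },
            ],
        },
        # Every trial runs this job spec with different parameter values.
        "trialJobSpec": trial_job_spec,
    }


trial_spec = {
    "workerPoolSpecs": [
        {
            "machineSpec": {"machineType": "n1-standard-4"},
            "replicaCount": 1,
            "containerSpec": {
                "imageUri": "gcr.io/cloud-aiplatform/training/tf-cpu.2-6:latest"
            },
        }
    ]
}
tuning_job = build_tuning_job("my_tuning_job", trial_spec)
print(sorted(p["parameterId"] for p in tuning_job["studySpec"]["parameters"]))
```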
Submit Training Job
Submit the training job to Vertex AI using the API, gcloud CLI, or Console. Vertex AI schedules the job on its managed GKE infrastructure. It allocates the requested resources, pulls the container image, and mounts the data. The job runs until completion or timeout. During training, you can monitor logs and metrics in Cloud Logging and Cloud Monitoring. If the job fails, Vertex AI provides error messages in the logs.
Register and Evaluate Model
After training completes, the model artifact is saved to the specified Cloud Storage location. Vertex AI automatically registers the model in the Model Registry with metadata (framework, version, evaluation metrics). You can then evaluate the model using the evaluation results (e.g., accuracy, precision) stored in the registry. If the model meets your criteria, you can promote it to production by setting it as the default version.
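The "meets your criteria" decision above is usually a simple gate on evaluation metrics. The sketch below shows one way to express it; the function names, the confusion-matrix counts, and the thresholds are hypothetical and chosen only to illustrate the check.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def should_promote(metrics, thresholds):
    """Promote only if every required metric meets its minimum value."""
    return all(metrics.get(name, 0.0) >= minimum
               for name, minimum in thresholds.items())


precision, recall = precision_recall(tp=90, fp=10, fn=30)
metrics = {"precision": precision, "recall": recall}

# Illustrative acceptance criteria for a production candidate.
thresholds = {"precision": 0.85, "recall": 0.80}
print(metrics, should_promote(metrics, thresholds))
```

Here the model clears the precision bar but misses the recall bar, so it would not be set as the default version.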
Deploy Model to Endpoint
Create an endpoint (if one does not already exist) and deploy the model to it. You specify machine type, min/max replicas, and traffic split. Vertex AI provisions the serving infrastructure (GKE pods) and loads the model. Prediction requests to a public endpoint go to the regional API hostname (e.g., us-central1-aiplatform.googleapis.com) and identify the endpoint by its resource path. Vertex AI automatically scales the number of replicas based on traffic (default metric: CPU utilization).
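To build intuition for CPU-based autoscaling, the sketch below applies a simplified proportional rule: scale the replica count by the ratio of observed to target utilization, then clamp to the configured bounds. This is an assumption for illustration (it has the same shape as the Kubernetes Horizontal Pod Autoscaler formula), not a description of Vertex AI's actual algorithm; the 60% target and the 2 to 10 replica range are the values quoted in this chapter.

```python
import math


def desired_replicas(current, observed_cpu_pct, target_cpu_pct=60,
                     min_replicas=2, max_replicas=10):
    """Estimate a replica count with a simplified proportional scaling rule."""
    raw = math.ceil(current * observed_cpu_pct / target_cpu_pct)
    # Never go below the minimum or above the maximum configured replicas.
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(current=2, observed_cpu_pct=90))   # busy: scale out
print(desired_replicas(current=4, observed_cpu_pct=30))   # idle: scale in
print(desired_replicas(current=8, observed_cpu_pct=120))  # capped at the maximum
```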
Monitor and Retrain
Enable Model Monitoring to detect feature drift or skew. Vertex AI compares prediction requests against a baseline (training data statistics). If drift exceeds a threshold (default 0.2), an alert is sent via Cloud Monitoring. You can set up a pipeline to automatically retrain the model when drift is detected. This step ensures the model remains accurate over time.
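The baseline comparison above can be illustrated with a small drift score for one categorical feature. The sketch uses the L-infinity distance (the largest difference in probability for any single category), which is the kind of distance described for categorical features; treat the metric choice, the feature values, and the function names here as assumptions for the example. The 0.2 threshold is the value quoted in this chapter.

```python
from collections import Counter


def distribution(values):
    """Turn a list of categorical values into a probability distribution."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(baseline, serving):
    """Largest absolute difference in probability across all categories."""
    categories = set(baseline) | set(serving)
    return max(abs(baseline.get(c, 0.0) - serving.get(c, 0.0)) for c in categories)


# Hypothetical payment-method feature: training data versus recent requests.
baseline = distribution(["card"] * 70 + ["cash"] * 30)
serving = distribution(["card"] * 40 + ["cash"] * 60)

DRIFT_THRESHOLD = 0.2
score = l_infinity_distance(baseline, serving)
drifted = score > DRIFT_THRESHOLD
print(round(score, 3), drifted)
```

In this example the share of card payments falls from 70% to 40%, so the score is about 0.3 and the feature would be flagged.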
Enterprise Scenario 1: Retail Demand Forecasting
A large retailer wants to forecast demand for thousands of SKUs across hundreds of stores. They have historical sales data in BigQuery and want to train a model that predicts weekly demand. Using Vertex AI, they create a pipeline that:
Extracts data from BigQuery and preprocesses it (feature engineering like day-of-week, holiday flags) using Dataflow.
Trains a custom time-series model using TensorFlow on a GPU machine (n1-standard-8 with 1 K80 GPU).
Registers the model and deploys it to an endpoint with autoscaling (min 2, max 10 replicas).
Sets up Model Monitoring to detect drift in feature distributions (e.g., sudden changes in promotion patterns).
In production, the endpoint handles thousands of requests per second. They use private endpoints to keep traffic within their VPC. Common misconfiguration: forgetting to set the VPC-SC perimeter, which could allow data exfiltration.
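The pipeline in this scenario is a dependency graph: each step can only start once the steps it depends on have finished. The sketch below models that idea with Python's standard-library graphlib rather than the KFP SDK, so it shows the ordering concept only; the step names are invented for the example.

```python
from graphlib import TopologicalSorter

# Map each step to the set of steps it depends on.
pipeline = {
    "extract_from_bigquery": set(),
    "preprocess_with_dataflow": {"extract_from_bigquery"},
    "train_model": {"preprocess_with_dataflow"},
    "register_model": {"train_model"},
    "deploy_to_endpoint": {"register_model"},
    "enable_monitoring": {"deploy_to_endpoint"},
}

# A valid execution order: every step appears after its dependencies.
order = list(TopologicalSorter(pipeline).static_order())
for position, step in enumerate(order, start=1):
    print(position, step)
```

An orchestrator such as Vertex AI Pipelines works from the same kind of graph, which is also what lets it rerun only the steps downstream of a change.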
Enterprise Scenario 2: Medical Image Classification
A healthcare company needs to classify X-ray images for disease detection. They use Vertex AI AutoML Vision with a training budget of 24 hours. The dataset (100,000 images) is stored in a Cloud Storage bucket with CMEK encryption. They deploy the model to an endpoint with a single replica (since traffic is low) but enable autoscaling up to 5 replicas. They use Explainable AI to provide heatmaps showing which parts of the image influenced the prediction (for regulatory compliance).
Performance consideration: AutoML Vision training cost is proportional to the budget (e.g., $20 per hour for training). A misconfiguration is setting the budget too high (72 hours) when the model converges earlier, wasting money. The exam may ask about the default training budget for AutoML Vision (8 hours) and the maximum (72 hours).
Enterprise Scenario 3: Fraud Detection with Custom Model
A financial services company builds a custom fraud detection model using XGBoost. They use Vertex AI Workbench for development, then submit a custom training job with hyperparameter tuning (100 trials). The training data is large (10 TB) and stored in Cloud Storage. They use distributed training with 10 replicas (each with n1-highmem-16). After training, they deploy the model to an endpoint with 2 replicas for low latency (<100ms). They enable Model Monitoring to detect concept drift (e.g., changes in transaction patterns).
Common pitfall: not using a private endpoint, which exposes the model to the internet. Another is forgetting to grant the serving service account the IAM permissions it needs to read the Cloud Storage data used for feature engineering.
What GCDL Tests on Vertex AI
The GCDL exam objectives for domain 3 (Data Analytics and AI) include:
3.3: Explain the capabilities of Google Cloud’s AI Platform (Vertex AI) – This includes understanding the components (Workbench, AutoML, Custom Training, Pipelines, Model Registry, Endpoints, Monitoring).
3.4: Explain the benefits and use cases of pre-trained APIs – While not Vertex AI per se, the exam may compare pre-trained APIs (Vision API, Natural Language API) with custom models built on Vertex AI.
Common Wrong Answers and Why Candidates Choose Them
"Vertex AI only supports AutoML, not custom training." – Wrong. Vertex AI supports both AutoML and custom training. Candidates confuse Vertex AI with the old AutoML product.
"You must export data from BigQuery to Cloud Storage before training." – Wrong. Vertex AI can read directly from BigQuery. Candidates think of the old AI Platform that required Cloud Storage.
"Model Registry stores the actual model files." – Wrong. The registry stores metadata; the actual model artifacts are in Cloud Storage. Candidates misunderstand the separation.
"Vertex AI endpoints are always public." – Wrong. You can create private endpoints using Private Service Connect. Candidates assume default is the only option.
Specific Numbers and Terms That Appear Verbatim
Default training budget for AutoML tabular: 1 hour (max 72 hours)
Default training budget for AutoML Vision: 8 hours (max 72 hours)
Default machine type for Workbench: n1-standard-4
Default timeout for custom training: 24 hours (max 7 days)
Default monitoring interval: 1 hour
Default drift threshold: 0.2
Default min replicas for endpoints: 2
Default max replicas: 10
Supported frameworks: TensorFlow, PyTorch, scikit-learn, XGBoost
Edge Cases and Exceptions
Regional availability: Not all Vertex AI features are available in all regions. For example, GPU support is limited to certain zones.
Quotas: There are default quotas for training jobs (e.g., 100 concurrent jobs per project) and endpoints (e.g., 50 endpoints per project).
Custom containers: You can bring your own Docker image, but it must follow Vertex AI conventions (e.g., run on port 8080 for prediction).
Prediction request size: Maximum request size is 1.5 MB for online prediction. For larger payloads, use batch prediction.
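The request size limit above is easy to check before sending a request. The sketch below builds the same JSON shape that an instances.json file uses (an object with an "instances" list) and compares its encoded size to the limit. Reading 1.5 MB as 1,500,000 bytes is an assumption, as are the function names and sample payloads.

```python
import json

# Assumption: 1.5 MB interpreted as 1.5 * 10^6 bytes.
MAX_ONLINE_REQUEST_BYTES = 1_500_000


def build_request(instances):
    """Serialize instances into the JSON body used for online prediction."""
    return json.dumps({"instances": instances}).encode("utf-8")


def fits_online_prediction(instances):
    """Return True if the encoded request is within the online size limit."""
    return len(build_request(instances)) <= MAX_ONLINE_REQUEST_BYTES


small = [{"feature_a": 1.0, "feature_b": "x"}]
large = [{"blob": "a" * 2_000_000}]  # roughly 2 MB of text in one instance

print(fits_online_prediction(small))  # small tabular row
print(fits_online_prediction(large))  # oversized payload: use batch prediction
```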
How to Eliminate Wrong Answers Using the Underlying Mechanism
When you see a question about "unified ML platform," remember that Vertex AI integrates all stages. If an answer separates services (e.g., "use AI Platform for training and AutoML for prediction"), it's likely wrong because Vertex AI is unified. For questions about model deployment, recall that endpoints are separate from models; you deploy a model version to an endpoint. If an answer says "deploy model directly from training," it's incomplete because you need to register it first.
Vertex AI unifies data preparation, training, deployment, and monitoring into a single platform.
AutoML is suitable for standard tasks with minimal code; custom training is for advanced users needing control.
Vertex AI Pipelines orchestrates multi-step ML workflows using Kubeflow Pipelines.
Model Registry stores metadata, not model artifacts (which are in Cloud Storage).
Endpoints can be public or private (Private Service Connect).
Default training budget for AutoML tabular is 1 hour; for AutoML Vision is 8 hours.
Model Monitoring detects feature drift and skew with default threshold of 0.2.
Vertex AI supports reading data directly from BigQuery without exporting to Cloud Storage.
Custom training jobs default timeout is 24 hours (max 7 days).
Prediction request size limit is 1.5 MB for online prediction.
These come up on the exam all the time. Here's how to tell them apart.
AutoML
No code required; just provide data and training budget.
Uses neural architecture search to find the best model.
Limited to supported problem types (tabular, image, text, video).
Training budget default: 1 hour (tabular), 8 hours (vision).
Less control over model architecture and hyperparameters.
Custom Training
Requires custom code (Python, Docker).
Full control over model architecture, hyperparameters, and training loop.
Supports any framework or custom logic.
Can use distributed training with multiple replicas.
More flexible but requires ML expertise.
Mistake
Vertex AI only works with TensorFlow.
Correct
Vertex AI supports multiple frameworks: TensorFlow, PyTorch, scikit-learn, XGBoost, and custom containers. The pre-built training containers include images for each framework. You can also bring your own Docker image for unsupported frameworks.
Mistake
AutoML always produces better models than custom training.
Correct
AutoML is optimized for standard tasks (tabular, image, text) but may not outperform a well-tuned custom model for specialized domains. Custom training gives full control over architecture and hyperparameters. The choice depends on the problem and data.
Mistake
Vertex AI Pipelines is required for all training jobs.
Correct
Pipelines are optional. You can submit a standalone training job without a pipeline. Pipelines are used for complex workflows that involve multiple steps (e.g., data preprocessing, training, evaluation).
Mistake
Model Registry automatically deploys models to endpoints.
Correct
Model Registry only stores metadata. You must explicitly deploy a model version to an endpoint. The registry does not trigger deployment automatically.
Mistake
Vertex AI endpoints can only be accessed from within the same project.
Correct
Endpoints can be accessed from other projects if you grant IAM permissions. Also, private endpoints can be accessed from VPC-peered networks across projects.
Vertex AI is the successor to AI Platform. It unifies all ML services (AutoML, custom training, prediction, pipelines) into one API and console. AI Platform is deprecated and not recommended for new projects. The exam focuses on Vertex AI, but you may see legacy references. Key difference: Vertex AI exposes training and prediction through a single API surface, whereas AI Platform split them across separate services.
Yes, you can use custom containers for training. Your container must be stored in a container registry (e.g., Artifact Registry or Docker Hub) and must follow Vertex AI conventions: it should accept command-line arguments for the training script, and it should write the model artifact to the path specified by the environment variable AIP_MODEL_DIR. The container must also be compatible with the machine type (e.g., GPU drivers if using GPUs).
Vertex AI endpoints autoscale based on CPU utilization (default target: 60%). You can configure min and max replicas (default min: 2, max: 10). The autoscaler adds or removes replicas gradually to handle traffic spikes. You can also use custom metrics for autoscaling. Note: there is a cold start delay when new replicas are added (typically 30-60 seconds).
The default timeout is 24 hours, but you can configure it up to 7 days (168 hours). After the timeout, the job is automatically cancelled. For distributed training, the timeout applies to the entire job, not individual replicas. If you need longer training, consider splitting into multiple jobs or using checkpointing.
To create a private endpoint, you must enable Private Service Connect (PSC) for the endpoint. During endpoint creation, specify 'private' as the access type and provide a VPC network. The endpoint will get an internal IP address in your VPC. Clients within the same VPC or connected via VPC peering can access it without going through the public internet. You also need to configure DNS and firewall rules.
Online prediction is for real-time, low-latency requests (e.g., <100ms). It uses deployed endpoints with autoscaling. Batch prediction is for processing large datasets asynchronously. It does not require an endpoint; you submit a batch prediction job that reads input from Cloud Storage or BigQuery and writes results to Cloud Storage or BigQuery. Online prediction has a 1.5 MB request size limit; batch prediction has no such limit.
Yes, but they are separate services. BigQuery ML allows you to create and train models using SQL in BigQuery. Vertex AI can import BigQuery ML models and deploy them to endpoints for online prediction. Alternatively, you can train models in Vertex AI and use them in BigQuery for batch prediction via remote models.
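The custom-container answer above mentions the AIP_MODEL_DIR environment variable. A training script typically reads it once at startup to decide where to save the model. The sketch below is a hypothetical helper: it assumes the variable holds a gs:// URI and that the bucket is also reachable through the Cloud Storage FUSE mount at /gcs/, and it falls back to a local directory when the variable is not set (for example, when testing on a laptop).

```python
import os


def resolve_model_dir(env=None, default="./model"):
    """Return a filesystem path where the training script should save the model."""
    env = os.environ if env is None else env
    model_dir = env.get("AIP_MODEL_DIR", default)
    if model_dir.startswith("gs://"):
        # Assumption: gs://bucket/path is mounted as /gcs/bucket/path.
        return "/gcs/" + model_dir[len("gs://"):]
    return model_dir


# Simulate the variable as it might be set inside a training container.
print(resolve_model_dir({"AIP_MODEL_DIR": "gs://example-bucket/run-1/model"}))

# No variable set: fall back to a local directory.
print(resolve_model_dir({}))
```

Passing the environment in as a parameter keeps the function easy to test without touching the real process environment.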