GCDL | Chapter 10 of 101 | Objective 3.3

Vertex AI and Generative AI

This chapter covers Vertex AI and Generative AI on Google Cloud, a core topic for the GCDL exam (Domain: Data Analytics AI, Objective 3.3). Approximately 15-20% of exam questions touch this area, focusing on the capabilities of Vertex AI, the generative AI workflow, and the use of foundation models. We'll explore the key components (Vertex AI Studio, Model Garden, AutoML, and custom training) and how the platform exposes foundation models such as PaLM 2 and Gemini. Understanding these tools and their appropriate use cases is critical for the Digital Leader role, which requires making informed recommendations about AI solutions.

25 min read
Intermediate
Updated May 31, 2026

AI Studio as a Master Chef's Kitchen

Imagine a professional kitchen where a master chef (the ML engineer) can create any dish (AI model) using an array of pre-prepped ingredients (pre-trained models), precise recipes (prompts), and specialized tools (AutoML, tuning). The chef doesn't need to grow vegetables from seed or butcher a whole cow: those are provided ready to use. For a custom dish, the chef can tweak the recipe (fine-tuning) or combine ingredients (multimodal models). The kitchen has a pantry (Model Garden) with hundreds of curated ingredients, each labeled with its origin, best uses, and nutritional info (model cards). The chef can experiment with small batches (prototyping in Vertex AI Studio) before cooking for a banquet (deployment). The kitchen also has a food safety inspector (safety filters) that checks every dish for allergens (harmful content) before serving. Each part of the analogy maps directly to a Vertex AI capability: pre-trained models are base ingredients, prompts are recipes, fine-tuning is recipe modification, and safety filters are quality gates. The exam tests understanding of which 'ingredient' to use for which 'dish' (use case), not just that the kitchen exists.

How It Actually Works

What Are Vertex AI and Generative AI?

Vertex AI is Google Cloud's unified machine learning (ML) platform. It integrates data preparation, model training, deployment, and monitoring into a single service. For the GCDL exam, you need to understand that Vertex AI abstracts away much of the infrastructure complexity, allowing organizations to build, deploy, and scale ML models faster. Generative AI refers to models that generate new content – text, images, code, audio, video – based on prompts. Vertex AI provides access to Google's foundation models (PaLM 2, Gemini, Imagen) and tools to customize them.

Key Components and Their Roles

Vertex AI Studio: A web-based interface for prototyping and testing generative AI models. It includes a prompt library, model comparison tools, and safety settings. The exam tests that Studio is used for early experimentation, not production deployment.

Model Garden: A curated repository of foundation models, including Google's own (PaLM 2 for text, Gemini for multimodal, Codey for code, Imagen for images) and third-party models (e.g., Anthropic's Claude, open-source models). Model Garden provides model cards with details on capabilities, limitations, and pricing. The exam emphasizes that Model Garden is the central place to discover and select models.

Generative AI Studio (part of Vertex AI Studio): Specifically for generative AI workflows. It includes:

- Prompt Designer: For text prompts with parameters like temperature (0-1), top-k, top-p, and max output tokens.
- Tuning: For supervised fine-tuning of foundation models using your own labeled data. The exam tests that fine-tuning is for adapting a model to a specific task (e.g., summarization of medical documents) and requires a dataset of at least 100 examples.
- Safety Filters: Configurable thresholds to block harmful content based on categories like hate speech, harassment, sexually explicit, and dangerous content. Each category has a slider from 1 (most restrictive) to 5 (least restrictive). The exam expects you to know the default is 2 for most categories and that you can adjust per use case.

AutoML: For training custom models without writing code. It automates architecture search, hyperparameter tuning, and feature engineering. AutoML supports tabular, image, video, and text data. The exam tests that AutoML is appropriate when you have labeled data but lack ML expertise, and that it is not for generative AI (which uses foundation models).

Custom Training: For experienced ML engineers who want full control using TensorFlow, PyTorch, or scikit-learn. Vertex AI provides managed notebooks, distributed training, and hyperparameter tuning jobs. The exam tests that custom training is the most flexible but requires more effort.

How Generative AI Works on Vertex AI

The workflow for generative AI on Vertex AI is:

1. Select a foundation model from Model Garden. For example, for text generation, you might choose PaLM 2 for Text (text-bison@002) or Gemini Pro (gemini-pro).
2. Prototype in Generative AI Studio: write prompts, adjust parameters, and test with sample inputs.
3. Optional: Tune the model using your own dataset. This creates a tuned model version that you can deploy.
4. Deploy the model to an endpoint for serving predictions. Vertex AI handles scaling, load balancing, and monitoring.
5. Integrate via the Vertex AI SDK or REST API.

Important exam details:

- Temperature: Controls randomness. Low temperature (0.1) makes output more deterministic; high (0.9) makes it more creative. Default is 0.0 for PaLM 2 text models.
- Max output tokens: Limits response length. Default is 256 for text-bison@002, max 1024.
- Top-k and top-p: Token sampling strategies. Top-k samples from the k most likely tokens; top-p samples from the smallest set of most likely tokens whose cumulative probability reaches p. Default top-p is 0.95.
- Context window: The number of tokens the model can consider. For Gemini Pro, it's 32,768 tokens (about 25,000 words). For PaLM 2 Bison, it's 8,192 tokens.
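To make the sampling parameters concrete, here is a minimal sketch of how temperature, top-k, and top-p reshape a next-token distribution. It is a toy illustration in plain Python, not how Vertex AI implements sampling; the `logits` values and all function names are hypothetical.

```python
import math

def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Treat temperature 0 as greedy: all probability on the top-scoring token.
        best = max(logits, key=logits.get)
        return {tok: 1.0 if tok == best else 0.0 for tok in logits}
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the peak for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for tok, prob in ranked:
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return kept

# Hypothetical next-token scores for the prompt "The sky is".
logits = {"blue": 2.0, "clear": 1.0, "falling": 0.0, "purple": -1.0}

cold = apply_temperature(logits, 0.1)  # almost all mass on "blue"
warm = apply_temperature(logits, 0.9)  # mass spread across more tokens
candidates = top_p_filter(top_k_filter(warm, 3), 0.95)
```

At low temperature the top token dominates, which is why factual use cases favor values near 0; top-k and top-p then trim the unlikely tail before a token is drawn.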

Safety Filters and Responsible AI

Vertex AI enforces safety filters at the API level. The exam tests that you can configure thresholds per request or per project. Categories and default thresholds:

- Hate speech: Default threshold 2
- Harassment: Default threshold 2
- Sexually explicit: Default threshold 2
- Dangerous content: Default threshold 2

Thresholds range from 1 (most restrictive) to 5 (least restrictive). If the model's response exceeds the threshold for any category, the API returns a safety error instead of the response. You can also set blocklists for specific words or phrases.
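The threshold logic described above can be sketched as a simple per-category comparison. This is a conceptual illustration only: the 1-5 severity ratings, the category keys, and the function name are hypothetical and do not reflect how the service actually scores or represents content.

```python
# Default thresholds as described in this chapter (1 = most restrictive, 5 = least).
DEFAULT_THRESHOLDS = {
    "hate_speech": 2,
    "harassment": 2,
    "sexually_explicit": 2,
    "dangerous_content": 2,
}

def blocked_categories(severity_by_category, thresholds=None):
    """Return the categories whose severity exceeds the configured threshold."""
    active = dict(DEFAULT_THRESHOLDS)
    active.update(thresholds or {})  # per-request overrides win over defaults
    return sorted(
        category
        for category, severity in severity_by_category.items()
        if severity > active[category]
    )

# Hypothetical severity ratings (1 = benign, 5 = severe) for one model response.
ratings = {
    "hate_speech": 1,
    "harassment": 3,
    "sexually_explicit": 1,
    "dangerous_content": 2,
}

default_result = blocked_categories(ratings)
strict_result = blocked_categories(ratings, {"dangerous_content": 1})
lenient_result = blocked_categories(ratings, {"harassment": 5})
```

With the defaults, only the harassment rating exceeds its threshold. Tightening one category blocks more, and loosening one lets the same response through, which mirrors the per-use-case adjustment the exam asks about.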

Pricing and Quotas

Pay-as-you-go: Billed on input and output volume. The rates quoted here are per 1,000 characters: for text-bison@002, input is $0.0005 and output is $0.0005; for Gemini Pro, input is $0.00025 and output is $0.0005.

Tuning: Additional cost per hour of training time.

Quotas: Requests per minute (RPM) and tokens per minute (TPM) limits per model per region. Default is 300 RPM for text-bison@002 in us-central1.
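A quick way to reason about pay-as-you-go pricing is to turn the per-1,000-character rates into a monthly estimate. The sketch below uses only the rates quoted in this section, which may be out of date; the traffic figures and the function name are hypothetical.

```python
# Rates quoted in this chapter, in USD per 1,000 characters (may be out of date).
RATES_PER_1K_CHARS = {
    "text-bison@002": {"input": 0.0005, "output": 0.0005},
    "gemini-pro": {"input": 0.00025, "output": 0.0005},
}

def estimate_cost(model, input_chars, output_chars):
    """Estimate the charge in USD for a given volume of input and output characters."""
    rates = RATES_PER_1K_CHARS[model]
    return (input_chars / 1000) * rates["input"] + (output_chars / 1000) * rates["output"]

# Hypothetical workload: 50,000 requests per month,
# each with 4,000 input characters and 1,000 output characters.
requests_per_month = 50_000
input_chars = requests_per_month * 4_000
output_chars = requests_per_month * 1_000

gemini_monthly = estimate_cost("gemini-pro", input_chars, output_chars)
bison_monthly = estimate_cost("text-bison@002", input_chars, output_chars)
```

Under these assumptions the estimate comes to roughly $75 per month for Gemini Pro and $125 for text-bison@002, before any tuning costs.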

Integration with Other Google Cloud Services

BigQuery: Store and query data for training or evaluation.

Cloud Storage: Store training data, model artifacts, and predictions.

Cloud Functions/Cloud Run: Trigger model predictions on events.

Cloud Logging and Monitoring: Track model performance and errors.

Vertex AI Pipelines: Orchestrate ML workflows using Kubeflow Pipelines.

The exam tests that you understand these integrations, especially that BigQuery is used for large-scale data analysis and Vertex AI for ML modeling.

Common Exam Scenarios

When to use AutoML vs. custom training: AutoML for non-experts with labeled data; custom training for experts needing custom architectures.

When to use a foundation model vs. train from scratch: Prefer a foundation model unless your use case is so specialized that no suitable pre-trained model exists.

When to fine-tune vs. prompt engineering: Fine-tune for domain-specific tasks (e.g., legal document analysis) when you have at least 100 labeled examples (500 or more is recommended); use prompt engineering for general tasks, with examples included in the prompt (few-shot learning).

Safety filter configuration: Adjust per category based on application risk tolerance. For a children's app, set thresholds to 1; for an adult content platform, set to 5.
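The first three rules of thumb above can be condensed into a small decision helper. This is only a study aid that encodes this chapter's guidance; the function name, parameters, and return strings are hypothetical.

```python
def recommend_approach(generative, needs_custom_architecture=False,
                       labeled_examples=0, needs_domain_adaptation=False):
    """Map a scenario to the option this chapter's rules of thumb point to."""
    if generative:
        # Generative tasks start from a foundation model, never AutoML.
        if needs_domain_adaptation and labeled_examples >= 100:
            return "fine-tune a foundation model"
        return "prompt engineering with a foundation model"
    if needs_custom_architecture:
        return "custom training"
    if labeled_examples > 0:
        return "AutoML"
    return "not covered by these rules"

legal_analysis = recommend_approach(
    generative=True, labeled_examples=500, needs_domain_adaptation=True
)
churn_prediction = recommend_approach(generative=False, labeled_examples=5_000)
```

Here `legal_analysis` resolves to fine-tuning and `churn_prediction` to AutoML, matching the scenarios listed above.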

Walk-Through

1. Select foundation model from Model Garden

Navigate to Vertex AI > Model Garden. Browse or search for a model that fits your use case. For text generation, you might select PaLM 2 for Text (text-bison@002). Review the model card for details on capabilities, pricing, and limitations. Note that Model Garden includes both Google and third-party models. The exam tests that you choose the right model for the task: PaLM 2 for text, Codey for code, Imagen for images, Gemini for multimodal.

2. Prototype in Generative AI Studio

Open Generative AI Studio from the Vertex AI console. Create a new prompt in Prompt Designer. Write your prompt (e.g., 'Summarize the following article: ...'). Set parameters: temperature (default 0.0), top-k, top-p, max output tokens, and safety filters. Click 'Submit' to see the model's response. Iterate on the prompt to improve results. This step is for experimentation; production use requires the API or SDK.

3. Tune the model (optional)

If the base model's responses are not specific enough, you can fine-tune it with your own data. In Generative AI Studio, go to the Tuning tab. Upload a dataset in JSONL format with prompt-response pairs. Specify the number of training steps (default 100) and learning rate. Start the tuning job. Vertex AI creates a tuned model version. For the exam, remember that effective tuning needs at least 100 examples. Tuning costs are based on training duration.
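As a rough illustration of the JSONL dataset mentioned above, the sketch below serializes prompt-response pairs as one JSON object per line and enforces the 100-example minimum cited in this chapter. The field names `input_text` and `output_text` are assumptions; check the tuning documentation for the model you use. The helper name and sample data are hypothetical.

```python
import json

MIN_EXAMPLES = 100  # minimum cited in this chapter

def to_jsonl(pairs):
    """Serialize (prompt, response) pairs as one JSON object per line."""
    if len(pairs) < MIN_EXAMPLES:
        raise ValueError(f"need at least {MIN_EXAMPLES} examples, got {len(pairs)}")
    lines = []
    for prompt, response in pairs:
        if not prompt.strip() or not response.strip():
            raise ValueError("prompt and response must be non-empty")
        # Field names are assumptions, not confirmed by this chapter.
        lines.append(json.dumps({"input_text": prompt, "output_text": response}))
    return "\n".join(lines) + "\n"

# Hypothetical placeholder data; a real dataset would hold your own examples.
pairs = [(f"Summarize report {i}", f"Summary of report {i}") for i in range(100)]
jsonl = to_jsonl(pairs)
```

The resulting string can be written to a `.jsonl` file and uploaded in the Tuning tab.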

4. Deploy the model to an endpoint

Once satisfied with the prototype or tuned model, deploy it to a Vertex AI endpoint. In the console, go to Models, select the model version, and click 'Deploy to Endpoint'. Choose the machine type (e.g., n1-standard-4), scaling options (min/max nodes), and enable logging. The endpoint exposes a REST URL for prediction requests. The exam tests that deployment is the step that makes the model available for production predictions.

5. Integrate with application via API

Use the Vertex AI SDK or REST API to send prediction requests to the endpoint. For example, using the Python SDK:

```python
from google.cloud import aiplatform

# Initialize the client with your project and region.
aiplatform.init(project='my-project', location='us-central1')

# Reference the deployed endpoint by its full resource name.
endpoint = aiplatform.Endpoint('projects/.../locations/.../endpoints/...')

# Send one prediction request and print the model's output.
response = endpoint.predict(instances=[{'prompt': 'Your prompt here'}])
print(response.predictions)
```

The exam tests that you know the API exists and that you need to initialize the client with project and location.

What This Looks Like on the Job

Enterprise Scenario 1: Customer Support Chatbot

A large e-commerce company wants to build a chatbot to handle customer inquiries about orders, returns, and product information. They have a large dataset of historical support tickets. Using Vertex AI, they select the PaLM 2 for Chat model (chat-bison@002) from Model Garden. They prototype in Generative AI Studio, crafting prompts that include examples of good responses. They fine-tune the model on their support ticket data (10,000 examples) to improve domain-specific accuracy. After tuning, they deploy the model to an endpoint with autoscaling (min 2, max 10 nodes) to handle variable traffic. They integrate with their existing web app via the Vertex AI API. Safety filters are set to threshold 2 for all categories to prevent offensive responses. A common misconfiguration is setting temperature too high (e.g., 0.9), causing the chatbot to give creative but incorrect answers. A range of 0.1-0.3 is more appropriate for factual responses. The exam might ask about the appropriate model (chat-bison vs text-bison) or the need for fine-tuning.

Enterprise Scenario 2: Medical Report Summarization

A healthcare provider needs to condense lengthy medical reports into concise summaries for doctors. They use Vertex AI's PaLM 2 for Text (text-bison@002). They cannot fine-tune due to data privacy concerns (PHI), so they rely on prompt engineering with few-shot examples. They set temperature to 0.0 for deterministic output. Max output tokens is set to 500 to limit summary length. Safety filters are critical: they set the 'dangerous content' threshold to 1 (most restrictive) because harmful content is unacceptable in a clinical setting. They deploy the model to a private endpoint with VPC Service Controls (VPC-SC) for data isolation. The exam tests that fine-tuning is not always possible due to data sensitivity, and that prompt engineering with safety filters is a valid alternative.

Enterprise Scenario 3: Image Generation for Marketing

A marketing agency uses Vertex AI's Imagen to generate product images. They select Imagen from Model Garden. In Generative AI Studio, they write prompts like 'A modern office chair with a sleek design, photorealistic, 4K'. They adjust parameters like aspect ratio (1:1, 4:3, 16:9) and number of images (1-8). Safety filters are set to threshold 3 for 'sexually explicit' to allow some artistic nudity but block explicit content. They deploy the model via an endpoint and integrate with their content management system. A common mistake is using a text model for image generation – the exam tests that you must use Imagen for images.

How GCDL Actually Tests This

Exactly What GCDL Tests on Vertex AI and Generative AI (Objective 3.3)

The GCDL exam focuses on high-level understanding, not implementation details. You must know:

The purpose of each Vertex AI component (Studio, Model Garden, AutoML, custom training).

When to use a foundation model vs. AutoML vs. custom training.

The difference between prompt engineering and fine-tuning.

The role of safety filters and how to adjust them.

The types of foundation models available (text, code, image, multimodal).

Integration with other Google Cloud services (BigQuery, Cloud Storage).

Common Wrong Answers and Why Candidates Choose Them

1. Wrong: 'Vertex AI is only for custom-trained models.' Candidates confuse Vertex AI with the older AI Platform. Reality: Vertex AI includes pre-trained foundation models and AutoML.

2. Wrong: 'Fine-tuning requires thousands of examples.' Reality: Minimum is 100 examples, though 500+ is recommended. The exam tests the minimum.

3. Wrong: 'Safety filters are optional and can be disabled.' Reality: Safety filters are always on by default; you can adjust thresholds but not disable them entirely.

4. Wrong: 'AutoML is the best choice for generative AI.' Reality: AutoML is for custom models, not generative AI. Use foundation models for generative tasks.

Specific Numbers and Terms That Appear on the Exam

Default temperature: 0.0 for text-bison@002.

Context window: 8,192 tokens for PaLM 2 Bison, 32,768 for Gemini Pro.

Minimum fine-tuning examples: 100.

Safety filter thresholds: 1-5, default 2.

Model Garden vs. Generative AI Studio: Model Garden for discovery, Studio for prototyping.

Edge Cases the Exam Loves

Multimodal models: Gemini can process text, images, audio, and video. The exam may ask which model to use for a task involving images and text.

Code generation: Use Codey models (code-bison, codechat-bison), not text-bison.

Regional availability: Some models are only available in certain regions (e.g., us-central1, europe-west4).

How to Eliminate Wrong Answers

If the scenario involves generating text, image, or code from a prompt, the answer is a foundation model, not AutoML.

If the scenario requires adapting a model to a specific domain with labeled data, the answer is fine-tuning.

If the scenario is about experimenting with prompts, the answer is Generative AI Studio.

If the scenario mentions data privacy and no training, the answer is prompt engineering with a foundation model.

Key Takeaways

Vertex AI is a unified ML platform: data preparation, training, deployment, monitoring.

Model Garden is the repository for foundation models; Generative AI Studio is for prototyping.

Foundation models (PaLM 2, Gemini, Codey, Imagen) are pre-trained and can be used via prompts.

Fine-tuning requires at least 100 examples and is for domain adaptation.

Safety filters are always on; default threshold is 2 for all categories (1-5 scale).

AutoML is for custom models with your own data, not for generative AI.

Gemini is multimodal (text, image, audio, video); Gemini Pro has a 32K-token context window.

Codey models are specifically for code generation and chat.

Integration with BigQuery, Cloud Storage, and Cloud Logging is common.

Pricing is usage-based, billed on input and output volume; quotas exist at project and model level.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Vertex AI AutoML

No coding required; uses automated ML techniques.

Best for non-experts with labeled data.

Supports tabular, image, text, and video data.

Limited model architectures; no custom layers.

Faster to train and deploy.

Vertex AI Custom Training

Full control over model architecture and training.

Requires ML expertise and coding (TensorFlow, PyTorch, etc.).

Supports any framework and custom algorithms.

Can use distributed training and hyperparameter tuning.

More flexible but more effort.

Watch Out for These

Mistake

Vertex AI is just a rebranding of Cloud AutoML.

Correct

Vertex AI is a unified platform that includes AutoML, but also provides custom training, model deployment, MLOps, and access to foundation models. AutoML is only one component.

Mistake

You must fine-tune a foundation model to use it.

Correct

Foundation models can be used out-of-the-box via prompt engineering. Fine-tuning is optional and only needed for domain-specific adaptation.

Mistake

Safety filters block all harmful content automatically.

Correct

Safety filters are configurable per category with thresholds 1-5. The default threshold 2 blocks most harmful content, but you can adjust it based on your application's risk tolerance.

Mistake

Generative AI Studio is the only way to use foundation models.

Correct

Generative AI Studio is for prototyping. Production use requires the Vertex AI SDK or REST API to call the model endpoint.

Mistake

AutoML can be used for generative AI tasks like text generation.

Correct

AutoML is for custom models (e.g., classification, regression) using your own data. For generative AI, you use foundation models from Model Garden.

Frequently Asked Questions

What is the difference between Vertex AI and AI Platform?

Vertex AI is the successor to AI Platform, unifying all ML services into one platform. AI Platform is legacy. Vertex AI includes additional features like Model Garden, Generative AI Studio, and integration with foundation models. For the exam, always refer to Vertex AI.

When should I use prompt engineering vs. fine-tuning?

Use prompt engineering (including few-shot examples) when the foundation model already performs well on your task with just a prompt. Use fine-tuning when you need the model to learn domain-specific patterns or terminology, and you have at least 100 labeled examples. Fine-tuning is more expensive and time-consuming.

Can I use Vertex AI with my existing TensorFlow models?

Yes. Vertex AI supports custom training with TensorFlow, PyTorch, and scikit-learn. You can bring your own container or use pre-built containers. Vertex AI also supports model deployment with these frameworks.

How do safety filters work in Vertex AI?

Safety filters evaluate model output against four categories: hate speech, harassment, sexually explicit, and dangerous content. Each category has a threshold from 1 (most restrictive) to 5 (least restrictive). If the output exceeds the threshold, the API returns a safety error. You can set thresholds per request or at the project level.

What is the context window and why does it matter?

The context window is the maximum number of tokens the model can consider when generating a response. For PaLM 2 Bison, it's 8,192 tokens; for Gemini Pro, it's 32,768 tokens. A larger context window allows the model to handle longer documents or conversations. If your input exceeds the context window, you must truncate or split it.
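When an input is too long, one option is to split it into chunks that each fit within the window. The sketch below shows the idea with a crude word-based token estimate derived from this chapter's own figure (32,768 tokens for about 25,000 words). Real tokenizers differ, so treat the ratio and the helper names as assumptions.

```python
import math

# Rough ratio implied by this chapter: 32,768 tokens for about 25,000 words.
TOKENS_PER_WORD = 32_768 / 25_000

def estimate_tokens(text):
    """Crude token estimate based on whitespace-separated words."""
    return math.ceil(len(text.split()) * TOKENS_PER_WORD)

def split_to_fit(text, context_window, reserved_for_output=0):
    """Split text into chunks whose estimated size fits the input budget."""
    budget = context_window - reserved_for_output
    if budget <= 0:
        raise ValueError("no room left for input after reserving output tokens")
    words_per_chunk = max(1, int(budget / TOKENS_PER_WORD))
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]

# Hypothetical 20,000-word document sent to a model with an 8,192-token window,
# keeping 1,024 tokens free for the response.
document = "word " * 20_000
chunks = split_to_fit(document, context_window=8_192, reserved_for_output=1_024)
```

Each chunk can then be processed separately, for example by summarizing each one and then combining the partial summaries.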

Can I use Vertex AI for real-time predictions?

Yes. Deploy a model to an endpoint for real-time predictions. Vertex AI automatically scales based on traffic. You can also use batch prediction for asynchronous requests.

What is the difference between text-bison and chat-bison?

text-bison is for text generation tasks (summarization, classification, etc.) with a single prompt. chat-bison is for multi-turn conversations, where you provide a history of messages. chat-bison is optimized for dialogue use cases.
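The difference is easiest to see in the shape of the input. The sketch below contrasts a single-prompt request with a chat request that carries the message history. The dictionary layouts, class, and function names are illustrative assumptions rather than the exact request schema.

```python
from dataclasses import dataclass, field

def text_request(prompt):
    """Single-turn input: one prompt, no memory of earlier exchanges."""
    return {"prompt": prompt}

@dataclass
class ChatSession:
    """Multi-turn input: every request carries the conversation so far."""
    context: str
    history: list = field(default_factory=list)

    def add_turn(self, author, content):
        self.history.append({"author": author, "content": content})

    def as_request(self, new_message):
        # Copy the history so building a request does not mutate the session.
        messages = self.history + [{"author": "user", "content": new_message}]
        return {"context": self.context, "messages": messages}

# Hypothetical support conversation.
session = ChatSession(context="You are a support agent for an online store.")
session.add_turn("user", "Where is my order?")
session.add_turn("bot", "Could you share your order number?")

chat_payload = session.as_request("It is 12345.")
text_payload = text_request("Summarize the following article: ...")
```

The chat payload grows with each turn, which is why the context window matters more for dialogue use cases than for single prompts.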
