CCNA Fundamentals of Generative AI Questions

76

MCQ · easy

A company is using Vertex AI to deploy a text generation model for a chatbot. They want to reduce the response latency. Which configuration change is most effective?

A. Enable model quantization

B. Use a smaller model variant

C. Increase the number of GPUs

D. Use a larger batch size

Answer: B

Smaller models have faster inference, directly reducing latency.

Why this answer

Option B is correct because a smaller model variant has fewer parameters and performs fewer computations per generated token, which directly lowers latency. In Vertex AI, choosing a smaller variant (for example, `text-bison@002` rather than a larger model) speeds up token generation without requiring hardware changes.

Exam trap

Google Cloud often tests the misconception that increasing compute resources (GPUs) or batch size always reduces latency, when in fact these optimizations target throughput, not per-request response time.

How to eliminate wrong answers

Option A is wrong because model quantization (e.g., reducing weights from FP32 to INT8) can reduce memory footprint and improve throughput, but it does not guarantee lower latency per request and may introduce accuracy trade-offs; it is not the most effective single change for latency reduction. Option C is wrong because increasing the number of GPUs can improve throughput for batch processing but does not reduce per-request latency; in fact, it may increase communication overhead and cost without speeding up individual inference. Option D is wrong because using a larger batch size increases throughput for concurrent requests but actually increases the latency for each individual request, as the model processes more sequences together before returning results.

77

MCQ · hard

You are a generative AI architect at a social media company. You are tasked with building a content moderation system that uses a generative model to flag toxic comments. The system must have very low false positive rates (i.e., not flag harmless comments) to avoid user backlash, but it must catch nearly all toxic comments. You have a large dataset of labeled toxic and non-toxic comments. You plan to use a pre-trained LLM and fine-tune it for classification. During experimentation, you notice that the model's recall for toxic comments is high (95%) but its precision is low (60%), leading to many false positives. You need to improve precision without substantially reducing recall. Which approach should you try first?

A. Gather additional toxic comments from similar platforms to augment the training data.

B. Apply a higher weight to the toxic class in the loss function during fine-tuning.

C. Use a smaller pre-trained model that is inherently less sensitive to subtle toxic language.

D. Tune the classification threshold on a held-out validation set to a higher value (e.g., require higher probability to classify as toxic).

Answer: D

Increasing the threshold reduces false positives (improves precision) at the cost of some recall; the loss can be kept small by choosing the threshold on validation data.

Why this answer

Option D is correct because threshold tuning is a straightforward, post-hoc way to trade a little recall for precision: raising the decision threshold means only high-confidence predictions are flagged as toxic, and no retraining is needed. Option A is incorrect because adding more toxic samples might increase recall but not necessarily precision. Option B is incorrect because weighting the toxic class more heavily can improve recall but may hurt precision. Option C is incorrect because a smaller model might have less capacity to distinguish nuance, worsening precision.
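
Threshold tuning can be sketched in a few lines. The example below is a minimal illustration, not a reference implementation: the validation scores and labels are made up, and `pick_threshold` is a hypothetical helper that returns the threshold with the best precision among those that still meet a minimum recall.

```python
from typing import List, Optional, Tuple


def precision_recall(scores: List[float], labels: List[int],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when scores >= threshold are flagged as toxic (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(scores: List[float], labels: List[int],
                   min_recall: float) -> Optional[float]:
    """Best-precision threshold among candidates whose recall is >= min_recall."""
    best = None  # (precision, threshold)
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        if r >= min_recall and (best is None or p > best[0]):
            best = (p, t)
    return None if best is None else best[1]


# Hypothetical held-out validation set: model probabilities and true labels.
val_scores = [0.95, 0.90, 0.85, 0.70, 0.65, 0.60, 0.55, 0.40, 0.30, 0.20]
val_labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

default_p, default_r = precision_recall(val_scores, val_labels, 0.5)
tuned = pick_threshold(val_scores, val_labels, min_recall=0.95)
tuned_p, tuned_r = precision_recall(val_scores, val_labels, tuned)
print(f"threshold 0.5: precision={default_p:.2f} recall={default_r:.2f}")
print(f"threshold {tuned}: precision={tuned_p:.2f} recall={tuned_r:.2f}")
```

On this toy data the tuned threshold removes one false positive while recall stays the same, which is the kind of trade the question asks for.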

78

Multi-Select · easy

Which TWO are components of the Vertex AI Generative AI Studio?

Select 2 answers

A. Dataflow

B. Model Garden

C. Pipeline templates

D. Cloud Functions

E. Prompt Editor

Answers: B, E

Model Garden is a component for discovering and selecting models.

Why this answer

Model Garden is a core component of Vertex AI Generative AI Studio that provides a curated repository of foundation models, including Google's PaLM and Gemini models, as well as third-party models. It allows users to discover, compare, and deploy these models directly within the studio environment, making it essential for generative AI workflows.

Exam trap

Google Cloud often tests the distinction between core generative AI studio components (like Model Garden and Prompt Editor) and broader GCP services (like Dataflow or Cloud Functions) that are not part of the studio, leading candidates to select familiar but incorrect options.

79

Multi-Select · medium

Which TWO are benefits of using retrieval-augmented generation (RAG) over fine-tuning?

Select 2 answers

A. No need for training

B. Higher accuracy on all tasks

C. More up-to-date information

D. Reduced model size

E. Lower latency

Answers: A, C

RAG does not require fine-tuning; it works with the base model plus retrieval.

Why this answer

Option A is correct because RAG does not require any training or fine-tuning of the underlying model. It works by retrieving relevant documents from an external knowledge base at inference time and providing them as context to the model, which generates an answer based on that context. This eliminates the need for costly and time-consuming model retraining or parameter updates.
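
The retrieve-then-generate flow described above can be sketched without any model training. This is a toy illustration under stated assumptions: `KNOWLEDGE_BASE` is made-up data, and word overlap stands in for the embedding similarity a real system would use; the resulting prompt would then be sent to an unmodified base model.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word set used for the toy similarity score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    query_words = tokenize(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_words & tokenize(doc)),
                    reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: List[str], k: int = 2) -> str:
    """Place the retrieved passages in the prompt as context for the model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return ("Answer using only the context below.\n"
            f"Context:\n{context}\n"
            f"Question: {query}")


# Hypothetical knowledge base; updating it needs no retraining.
KNOWLEDGE_BASE = [
    "Refunds are issued within 14 days of purchase.",
    "The mobile app supports offline mode on Android and iOS.",
    "Support is available by email from 9am to 5pm on weekdays.",
]

prompt = build_prompt("Within how many days are refunds issued?", KNOWLEDGE_BASE, k=1)
print(prompt)
```

Because the knowledge lives in the document store rather than in the weights, editing `KNOWLEDGE_BASE` changes the answers immediately, which is also why option C (more up-to-date information) holds.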

Exam trap

Google Cloud often tests the misconception that RAG reduces latency or model size, when in fact it increases system complexity and inference time due to the retrieval step, while fine-tuning keeps the model unchanged in size and latency.

80

MCQ · easy

A company wants to build a chatbot that answers questions using their internal knowledge base. Which approach is most suitable?

A. Use Retrieval-Augmented Generation (RAG)

B. Fine-tune a model on the knowledge base

C. Train a new model from scratch

D. Use zero-shot prompting with no context

Answer: A

RAG retrieves relevant context and generates answers, perfect for knowledge base Q&A.

Why this answer

Retrieval-Augmented Generation (RAG) combines retrieval of relevant documents from a knowledge base with generative responses, making it ideal for this use case.

81

MCQ · medium

Refer to the exhibit. A data scientist runs the gcloud command and sees the model listed. However, when they try to deploy the model to an endpoint, they get an error: 'Model is not deployable'. What is the most likely reason?

A. The model is still in training and not yet ready.

B. The model was imported from a custom container but without a serving specification or artifact.

C. The model does not have the correct IAM permissions assigned to the deployment service account.

D. The region for the endpoint is different from the model's region.

Answer: B

A model must have a serving container or artifacts to be deployable.

Why this answer

Option B is correct because a model imported from a custom container must include a serving specification (e.g., a `predict` route) and an artifact (e.g., a saved model file) to be deployable. Without these, Vertex AI cannot determine how to serve predictions, resulting in the 'Model is not deployable' error. The `gcloud` command listing the model only confirms its registration, not its readiness for deployment.

Exam trap

Google Cloud often tests the misconception that a model listed in the registry is automatically deployable, but the trap here is that Vertex AI separates model registration from deployment readiness, requiring explicit serving configuration for custom containers.

How to eliminate wrong answers

Option A is wrong because if the model were still in training, it would not appear in the model list via `gcloud`; Vertex AI only registers a model after training completes. Option C is wrong because IAM permissions affect the deployment action itself (e.g., who can deploy), not the deployability status of the model; the error 'Model is not deployable' is a model-level validation, not an authorization failure. Option D is wrong because region mismatch between the endpoint and model would cause a resource-location error, not a 'Model is not deployable' error; Vertex AI enforces regional consistency but does not block deployment based on region alone.

82

Multi-Select · medium

Which TWO of the following are best practices for prompt engineering?

Select 2 answers

A. Provide context and examples in the prompt

B. Append random noise to prompts to improve creativity

C. Use clear and specific instructions

D. Always use the maximum possible number of tokens

E. Use negative prompts to discourage undesired outputs

Answers: A, C

Context and examples help the model understand the desired output.

Why this answer

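Clear and specific instructions guide the model toward the intended task, and providing context and examples shows it what a good output looks like. Appending random noise (B) makes outputs less predictable rather than better, and always using the maximum number of tokens (D) adds cost and latency without improving quality. Negative prompts (E) are not treated as a best practice in this question; stating what the model should do is generally more reliable than listing what it should avoid.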
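
The two correct practices can be combined in a single prompt. The sketch below is illustrative only: `build_few_shot_prompt` is a hypothetical helper, and the `Input:`/`Output:` layout is one possible convention rather than a required format.

```python
from typing import List, Tuple


def build_few_shot_prompt(instruction: str, context: str,
                          examples: List[Tuple[str, str]], query: str) -> str:
    """Combine a clear instruction, background context and worked examples."""
    parts = [instruction.strip(), f"Context: {context.strip()}"]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # Leave the final output open so the model completes it.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


prompt = build_few_shot_prompt(
    instruction="Classify the sentiment of each review as positive or negative.",
    context="Reviews come from customers of an online bookstore.",
    examples=[
        ("The delivery was fast and the book was in perfect condition.", "positive"),
        ("The pages were torn and nobody answered my emails.", "negative"),
    ],
    query="Great selection, I found every title I wanted.",
)
print(prompt)
```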

83

MCQ · easy

A data scientist needs to fine-tune a foundation model for a sentiment analysis task without managing infrastructure. Which Google Cloud service should they use?

A. Compute Engine

B. BigQuery ML

C. Cloud Run

D. Vertex AI Model Garden

Answer: D

Model Garden offers managed fine-tuning of foundation models without infrastructure overhead.

Why this answer

Vertex AI Model Garden is the correct service because it provides a curated hub of foundation models that can be fine-tuned with managed infrastructure, eliminating the need for the data scientist to provision or manage servers. It supports one-click deployment and fine-tuning workflows for sentiment analysis, directly addressing the requirement to avoid infrastructure management.

Exam trap

The trap here is that candidates often confuse BigQuery ML's ability to train models on tabular data with the capability to fine-tune large language models, but BigQuery ML does not support fine-tuning of foundation models for NLP tasks.

How to eliminate wrong answers

Option A is wrong because Compute Engine is an IaaS offering that requires the user to manually provision, configure, and manage virtual machines, which contradicts the requirement of not managing infrastructure. Option B is wrong because BigQuery ML is designed for creating and executing machine learning models using SQL queries on structured data in BigQuery, not for fine-tuning large foundation models for natural language tasks like sentiment analysis. Option C is wrong because Cloud Run is a serverless container platform for running stateless HTTP-driven applications, but it does not provide native support for fine-tuning foundation models; it would require the user to build and manage the fine-tuning pipeline themselves.

84

MCQ · medium

A company fine-tunes a model using Vertex AI and notices the model's performance drops on the original training task (e.g., language understanding) after fine-tuning for a new task (e.g., summarization). What could be the cause?

A. Data leakage

B. Model quantization

C. Catastrophic forgetting

D. Underfitting

Answer: C

Fine-tuning on a narrow task can overwrite general knowledge, leading to performance degradation on the original task.

Why this answer

Catastrophic forgetting occurs when a neural network loses previously learned knowledge upon being fine-tuned on a new task. In this scenario, fine-tuning the model for summarization overwrites the weights responsible for language understanding, causing performance degradation on the original task. This is a well-known limitation of sequential fine-tuning in deep learning.

Exam trap

Google Cloud often tests the distinction between catastrophic forgetting and underfitting, as candidates may mistakenly think the model simply didn't learn the new task well, rather than recognizing that it forgot the original task due to weight overwriting.

How to eliminate wrong answers

Option A is wrong because data leakage refers to the inadvertent exposure of target information during training, which would typically inflate performance metrics rather than cause a drop on the original task. Option B is wrong because model quantization reduces numerical precision (e.g., from FP32 to INT8) to improve inference speed and memory efficiency, but it does not inherently cause performance loss on a previously learned task; any accuracy loss from quantization is generally uniform across tasks. Option D is wrong because underfitting means the model fails to capture patterns in the training data, resulting in poor performance on both the original and new tasks, not a selective drop on the original task after fine-tuning.

85

MCQ · hard

A developer uses the Vertex AI Python SDK to call a Gemini model for structured JSON output. However, the model often returns malformed JSON. Which parameter should the developer set in the generation configuration to enforce valid JSON output?

A. Set the temperature to a lower value (0.1) to reduce variation.

B. Set the 'response_mime_type' parameter to 'application/json'.

C. Include few-shot examples of the desired JSON format in the system prompt.

D. Switch to a smaller model to reduce complexity.

Answer: B

This parameter forces the model to output valid JSON, supported by Gemini.

Why this answer

Option B is correct because setting `response_mime_type` to `'application/json'` in the generation configuration instructs the Gemini API to constrain the model's output to valid JSON format. This parameter leverages the model's native structured output capability, ensuring the response adheres to JSON syntax without relying on post-processing or prompt engineering.

Exam trap

Google Cloud often tests the misconception that prompt engineering (e.g., few-shot examples or temperature tuning) can reliably enforce structured output, when in fact the correct approach is to use the API's native structured output parameter like `response_mime_type`.

How to eliminate wrong answers

Option A is wrong because lowering temperature reduces randomness but does not enforce structural constraints; the model can still produce malformed JSON due to token-level deviations. Option C is wrong because few-shot examples in the system prompt improve formatting consistency but do not guarantee valid JSON output, as the model may still generate syntax errors or deviate from the schema. Option D is wrong because switching to a smaller model reduces capacity and may increase the likelihood of malformed output, and model size does not address the need for structured output enforcement.
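
A hedged sketch of how this might look in practice: the dictionary below carries the `response_mime_type` setting named in the answer (the `temperature` value is illustrative), and `parse_model_json` is a hypothetical client-side check that fails loudly if a reply is not a JSON object. The SDK call itself is left as a comment because it needs credentials and is an assumption about the caller's setup.

```python
import json
from typing import Any, Dict

# Generation settings as described in the answer; temperature is illustrative.
generation_config: Dict[str, Any] = {
    "response_mime_type": "application/json",
    "temperature": 0.2,
}

# Assumed usage with the Vertex AI SDK (not executed here):
#   model.generate_content(prompt, generation_config=generation_config)


def parse_model_json(raw_text: str) -> Dict[str, Any]:
    """Parse a reply expected to be a JSON object; raise ValueError otherwise."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model returned malformed JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


# Made-up replies standing in for model output.
good = parse_model_json('{"sentiment": "positive", "score": 0.93}')
print(good["sentiment"])
```

Keeping a parse step on the client is a design choice rather than a requirement: it turns any unexpected reply into an explicit error instead of a silent downstream failure.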

86

Multi-Select · hard

Which THREE of the following are potential risks when deploying generative AI?

Select 3 answers

A. Hallucinations

B. Memorization of sensitive training data

C. Bias and fairness issues

D. Increased model accuracy

E. Toxic or harmful content generation

Answers: A, C, E

Models can generate false or fabricated information.

Why this answer

Option A is correct because generative AI models, particularly large language models (LLMs), can produce plausible-sounding but factually incorrect or nonsensical outputs, known as hallucinations. This occurs due to the model's probabilistic nature and lack of true understanding, where it generates text based on learned patterns rather than verified facts.

Exam trap

Google Cloud often tests the distinction between risks and benefits, so the trap here is that candidates may mistakenly identify 'increased model accuracy' as a risk, when it is actually a performance improvement and not a deployment risk.

87

MCQ · easy

A startup is building a customer support chatbot using Vertex AI and wants to ground responses in their product documentation to reduce hallucinations. Which approach should they use?

A. Enable Vertex AI Grounding with a custom enterprise data store containing the documentation.

B. Use the Codey API for text generation.

C. Use the base model without any grounding to maximize flexibility.

D. Fine-tune the model on the documentation and deploy.

Answer: A

Grounding ties responses to specific documents, reducing hallucinations.

Why this answer

Vertex AI Grounding with a custom enterprise data store is the correct approach because it allows the chatbot to retrieve and cite specific chunks from the product documentation in real time, directly reducing hallucinations by constraining responses to verified content. This method uses the underlying grounding service to query a vector-based data store (powered by Vertex AI Search) and append source references to the model's output, ensuring factual accuracy without retraining.

Exam trap

Google Cloud often tests the misconception that fine-tuning is the best way to incorporate domain knowledge, but the trap here is that fine-tuning does not provide dynamic, verifiable grounding with citations, whereas Vertex AI Grounding with a custom data store does, making it the correct choice for reducing hallucinations in a retrieval-augmented generation use case.

How to eliminate wrong answers

Option B is wrong because the Codey API is designed for code generation tasks (e.g., code completion, chat), not for grounding responses in external documents; it lacks the retrieval-augmented generation (RAG) capabilities needed to reduce hallucinations from product documentation. Option C is wrong because using a base model without grounding maximizes flexibility but also maximizes the risk of hallucination, as the model relies solely on its training data and cannot verify facts against the documentation. Option D is wrong because fine-tuning the model on the documentation embeds the content into the model's weights, which is static, costly to update, and does not provide real-time citation or retrieval; it also risks overfitting and does not leverage Vertex AI's built-in grounding infrastructure for dynamic fact-checking.

88

MCQ · easy

A prompt engineer wants to improve the model's adherence to a specific output format (e.g., always start with a greeting). Which technique should they try first?

A. Use a lower temperature to make the output more deterministic.

B. Fine-tune the model on many examples of the desired format.

C. Include a system instruction at the beginning of the prompt that specifies the desired format.

D. Modify the model's tokenizer to encode the format rules.

Answer: C

System instructions set global behavior and are the easiest first step.

Why this answer

Option C is correct because system instructions are the most direct and efficient method to enforce output formatting in large language models. By placing a clear directive at the beginning of the prompt (e.g., 'Always start your response with a greeting'), the model's attention mechanism is guided to prioritize this rule during generation, without requiring retraining or hyperparameter changes.

Exam trap

Google Cloud often tests the misconception that hyperparameter tuning (like temperature) can enforce structural output rules, when in fact it only controls randomness, not format adherence.

How to eliminate wrong answers

Option A is wrong because lowering temperature reduces randomness but does not enforce a specific structural rule like starting with a greeting; it only makes token selection more deterministic, which could still produce varied formats. Option B is wrong because fine-tuning is a resource-intensive process that requires a curated dataset and retraining, making it an overkill for a simple formatting constraint that can be achieved with a prompt instruction. Option D is wrong because modifying the tokenizer would alter how input text is split into tokens, not how the model adheres to output format rules; tokenizers have no mechanism to enforce generation constraints.

89

MCQ · hard

A team is training a custom foundation model using JAX on TPUs on Google Cloud. They encounter frequent Out of Memory (OOM) errors. Which action is most effective in resolving the OOM error?

A. Reduce the model size by decreasing the number of layers.

B. Increase the batch size to maximize TPU utilization.

C. Use mixed precision training (bfloat16) to reduce memory footprint.

D. Enable model parallelism using GSPMD to distribute the model across TPU cores.

Answer: D

Model parallelism directly addresses memory constraints by partitioning the model.

Why this answer

Option D is correct because OOM errors when training large foundation models on TPUs often stem from the model exceeding the memory of a single TPU core. GSPMD (Generalized SPMD) enables automatic model parallelism, sharding the model's parameters, gradients, and optimizer states across multiple TPU cores, thereby reducing per-core memory pressure without altering the model architecture or precision.

Exam trap

Google Cloud often tests the misconception that mixed precision (bfloat16) alone is sufficient to resolve OOM errors, when in fact for very large models the memory bottleneck is the model size itself, not just the precision, and model parallelism is required.

How to eliminate wrong answers

Option A is wrong because reducing the number of layers changes the model architecture and may degrade model quality; it is a workaround, not a systematic solution to memory management. Option B is wrong because increasing the batch size increases memory consumption for activations and gradients, exacerbating OOM errors rather than resolving them. Option C is wrong because while mixed precision training (bfloat16) halves the memory footprint of tensors, it does not address the fundamental issue of a model being too large to fit on a single TPU core; it only provides a constant-factor reduction and may still result in OOM for very large models.
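
The difference between a constant-factor saving and sharding can be made concrete with back-of-the-envelope arithmetic. Everything in this sketch is an assumption for illustration: a 7-billion-parameter model, 16 GiB of memory per core, four stored copies per parameter (weights, gradients and two Adam moment buffers), and no accounting for activations.

```python
def training_state_gib(num_params: int, bytes_per_value: int,
                       num_shards: int = 1) -> float:
    """Approximate per-device memory (GiB) for weights, gradients and Adam state."""
    copies = 4  # weights + gradients + two Adam moments (simplifying assumption)
    total_bytes = num_params * bytes_per_value * copies
    return total_bytes / num_shards / 2**30


NUM_PARAMS = 7_000_000_000   # hypothetical model size
CORE_MEMORY_GIB = 16         # hypothetical per-core memory

fp32 = training_state_gib(NUM_PARAMS, bytes_per_value=4)
bf16 = training_state_gib(NUM_PARAMS, bytes_per_value=2)
bf16_sharded = training_state_gib(NUM_PARAMS, bytes_per_value=2, num_shards=8)

print(f"float32, one core:      {fp32:6.1f} GiB")
print(f"bfloat16, one core:     {bf16:6.1f} GiB")
print(f"bfloat16, 8-way shard:  {bf16_sharded:6.1f} GiB (limit {CORE_MEMORY_GIB} GiB)")
```

Under these assumptions, halving the precision still leaves the state far above the per-core limit, while splitting it across eight cores brings it below, which mirrors the reasoning for choosing option D over option C.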

90

MCQ · easy

Refer to the exhibit. A developer sees this error when trying to call a Vertex AI endpoint for online prediction. What permission does the requesting identity need to be granted?

A. aiplatform.prediction.predict

B. aiplatform.endpoints.predict

C. aiplatform.endpoints.use

D. aiplatform.models.predict

Answer: B

The error explicitly states this permission is required.

Why this answer

The error occurs when calling a Vertex AI endpoint for online prediction, which requires the `aiplatform.endpoints.predict` permission. This permission is specifically scoped to the endpoint resource, allowing the identity to send prediction requests to a deployed model endpoint. The correct IAM role binding must include this permission for the requesting identity to successfully invoke the endpoint.

Exam trap

Google Cloud often tests the distinction between permissions scoped to endpoints versus models, and candidates mistakenly choose `aiplatform.models.predict` because they think prediction is always tied to the model, not the endpoint serving it.

How to eliminate wrong answers

Option A is wrong because `aiplatform.prediction.predict` is not a valid IAM permission in Vertex AI; the correct permission for prediction is scoped to the endpoint or model resource, not a generic 'prediction' service. Option C is wrong because `aiplatform.endpoints.use` does not exist as a permission; Vertex AI uses `aiplatform.endpoints.predict` for invoking predictions on endpoints. Option D is wrong because `aiplatform.models.predict` is a permission for calling prediction directly on a model resource, not on an endpoint, and the error specifically references an endpoint call, not a model call.

91

MCQ · medium

A healthcare company is using Vertex AI to build a generative AI assistant that helps doctors draft clinical notes. The assistant uses a fine-tuned PaLM 2 model deployed on a private endpoint. Recently, doctors have reported that the assistant takes over 30 seconds to respond, causing workflow delays. Additionally, the monthly Vertex AI costs have increased by 40% without a proportional increase in usage. The model responses are generally accurate but sometimes include irrelevant details. The company wants to improve response time and cost while maintaining acceptable quality. A review of logs shows that most requests are for similar note types (e.g., progress notes, discharge summaries) and that the same prompt is used repeatedly with minor variations. What should the company do first?

A. Switch to a larger model (e.g., Gemini 1.5 Pro) to improve response quality and reduce irrelevant details

B. Increase the Vertex AI endpoint's maximum request quota to handle concurrent requests

C. Apply model quantization (e.g., INT8) to reduce model size and inference time

D. Implement response caching for common queries and batch process similar requests

Answer: D

Caching reduces redundant computations, and batching improves throughput, together cutting latency and cost.

Why this answer

Option D is correct because caching and batching directly address latency and cost: the logs show the same prompts recurring with minor variations, so common responses can be reused and similar requests processed in groups. Option A (switching to a larger model) would increase latency and cost. Option B (increasing the request quota) does not improve performance or cost efficiency. Option C (model quantization) might help latency but could reduce accuracy; it is also more complex than caching and batching as a first step.
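
Response caching can be sketched as a thin wrapper around the model call. This is a minimal illustration, not a production design: `CachedGenerator` and `fake_model` are hypothetical names, the cache lives in memory, and only prompts that are identical after whitespace and case normalization count as the same request.

```python
import hashlib
from typing import Callable, Dict


def normalize(prompt: str) -> str:
    """Collapse whitespace and case so trivially different prompts share a key."""
    return " ".join(prompt.lower().split())


class CachedGenerator:
    """Wrap a text-generation callable with an in-memory response cache."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._cache: Dict[str, str] = {}
        self.model_calls = 0  # counts how often the underlying model is invoked

    def __call__(self, prompt: str) -> str:
        key = hashlib.sha256(normalize(prompt).encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.model_calls += 1
            self._cache[key] = self._generate(prompt)
        return self._cache[key]


def fake_model(prompt: str) -> str:
    """Stand-in for the slow, billable endpoint call."""
    return f"DRAFT NOTE for: {normalize(prompt)}"


assistant = CachedGenerator(fake_model)
first = assistant("Draft a progress note template")
second = assistant("  draft a Progress   note template ")
print(assistant.model_calls)
```

The second request is served from the cache, so the model is invoked only once. Requests that carry patient-specific details would produce different keys and still reach the model.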

92

MCQ · medium

A healthcare company is building a clinical decision support system using Gemini 1.5 Pro on Vertex AI. They need responses that are highly accurate and comply with medical regulations, including traceability to source documents. They have a large corpus of curated medical guidelines stored in PDFs in Cloud Storage. Their team has experience with both fine-tuning and prompt engineering. Which approach best ensures regulatory compliance and accuracy?

A. Use a combination of grounding to the medical guidelines and prompt engineering with system instructions specifying compliance requirements.

B. Use prompt engineering with system instructions and few-shot examples, but no grounding.

C. Use grounding to the medical guidelines but rely on prompt engineering only for compliance instructions.

D. Fine-tune the model on the medical guidelines corpus to internalize the knowledge.

Answer: A

Grounding ensures traceability to source documents, and prompt engineering enforces regulatory language, together meeting compliance.

Why this answer

Option A is correct because combining grounding (which ties answers to the actual guidelines) with prompt engineering (which enforces compliance requirements) provides traceability and accuracy. Option B (prompt engineering only) relies on the model's pre-trained knowledge, which is less reliable. Option C (grounding without explicit compliance constraints in the system instructions) may still allow the model to generate ungrounded responses if not properly constrained. Option D (fine-tuning only) risks the model memorizing rather than citing sources, and updates require retraining.

93

MCQ · easy

Refer to the exhibit. A developer runs this command but forgets to specify the model name. What will happen?

A. The command will fail with an error

B. The command will prompt for a name

C. The model will be uploaded with a default name

D. The command will succeed but the model will be unlisted

Answer: A

Missing required --display-name causes an error.

Why this answer

In the context of the `gcloud ai models upload` command (or similar Vertex AI model commands), the model's display name is supplied through the required `--display-name` flag. If it is omitted, the CLI fails with an error because a required flag is missing, and no model is registered. The command does not default to any name or prompt interactively; it validates required parameters before execution.

Exam trap

Google Cloud often tests the misconception that cloud CLI tools will either prompt for missing required parameters or apply a sensible default, when in reality they fail fast with a clear error to enforce explicit configuration.

How to eliminate wrong answers

Option B is wrong because the command does not prompt for a name; it expects the name to be supplied with `--display-name` in the initial command, and if the flag is missing, it immediately returns a usage error. Option C is wrong because there is no default name mechanism; the display name must be provided explicitly so the model can be identified in the registry. Option D is wrong because the command will not succeed at all; it fails before any upload occurs, so no model is created in any state (listed or unlisted).

94

MCQ · medium

A data scientist notices that a Gemini model generates inconsistent responses to similar prompts. What is the likely cause?

A. Model is not fine-tuned enough

B. The prompt is too short

C. The temperature setting is too low

D. The top_p or temperature parameters are set too high causing randomness

Answer: D

High temperature or top_p increases randomness and variability.

Why this answer

Option D is correct because high temperature (e.g., >1.0) or high top_p (e.g., >0.9) increases the randomness of token sampling, causing the model to select less probable tokens. This directly leads to inconsistent responses for similar prompts, as the model's output distribution becomes more uniform and less deterministic.

Exam trap

Google Cloud often tests the misconception that fine-tuning or prompt length is the primary cause of output inconsistency, when in fact the sampling parameters (temperature and top_p) directly control randomness and are the most common culprit.

How to eliminate wrong answers

Option A is wrong because fine-tuning adjusts the model's weights for a specific task, but it does not control the randomness of token generation; even a fully fine-tuned model will produce inconsistent outputs if sampling parameters are set too high. Option B is wrong because prompt length affects context and specificity, not the inherent randomness of the generation process; a short prompt can still yield consistent responses if temperature and top_p are low. Option C is wrong because a low temperature setting (e.g., 0.1) actually reduces randomness, making outputs more deterministic and consistent, not inconsistent.
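
The effect of the two sampling parameters can be shown on a toy distribution. This sketch uses made-up logits for three candidate tokens and plain-Python versions of temperature scaling and nucleus (top-p) filtering; it illustrates the general technique and is not a description of how Gemini implements sampling internally.

```python
import math
from typing import Dict


def softmax_with_temperature(logits: Dict[str, float],
                             temperature: float) -> Dict[str, float]:
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


logits = {"yes": 3.0, "maybe": 1.5, "no": 0.5}  # hypothetical next-token logits
cold = softmax_with_temperature(logits, temperature=0.2)
hot = softmax_with_temperature(logits, temperature=2.0)

print({tok: round(p, 3) for tok, p in cold.items()})
print({tok: round(p, 3) for tok, p in hot.items()})
print(list(top_p_filter(hot, top_p=0.5)))
print(list(top_p_filter(hot, top_p=0.95)))
```

At low temperature almost all probability sits on one token, so repeated calls agree. At high temperature the mass spreads out, and a high `top_p` keeps all three candidates in play, which is where run-to-run inconsistency comes from.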

95

Multi-Select · medium

Which THREE of the following are common techniques to reduce harmful biases in generative AI models? (Choose three.)

Select 3 answers

A. Use reinforcement learning from human feedback (RLHF) with a reward model that penalizes biased or unfair outputs.

B. Curate diverse and balanced training datasets that overrepresent underrepresented groups.

C. Decrease the model's temperature parameter to make outputs more deterministic.

D. Apply adversarial training to remove protected attribute information from hidden representations.

E. Conduct a legal review of all generated outputs before release.

Answers: A, B, D

RLHF can shape model behavior to avoid biased generations.

Why this answer

A is correct because RLHF uses a reward model trained on human preferences to score model outputs, and explicitly penalizing biased or unfair outputs during fine-tuning directly reduces harmful biases. This technique aligns the model's behavior with human values by optimizing against a learned reward signal that captures bias-related concerns.

Exam trap

Google Cloud often tests the distinction between hyperparameter tuning (like temperature) and actual bias mitigation techniques, so candidates mistakenly think lowering temperature reduces bias when it only affects output randomness.

96

MCQ · easy

A company wants to generate images from text descriptions using Google Cloud. Which service should they use?

A. Vertex AI Imagen

B. Vertex AI Gemini

C. Cloud Vision API

D. AutoML Vision

Answer: A

Imagen is the dedicated text-to-image service.

Why this answer

Vertex AI Imagen is Google Cloud's purpose-built service for generating high-fidelity images from text descriptions using diffusion models. It directly addresses the requirement of text-to-image generation, offering capabilities like image editing, upscaling, and style transfer, which are not available in other Vertex AI or Vision services.

Exam trap

The trap here is that candidates may confuse Vertex AI Gemini's multimodal capabilities (understanding images) with generative image creation, or assume that Cloud Vision API or AutoML Vision can be repurposed for generation, when in fact they are strictly analysis or custom training tools.

How to eliminate wrong answers

Option B is wrong because Vertex AI Gemini is a multimodal large language model (LLM) that can process text, images, audio, and video, but it is not optimized or primarily designed for generating images from text; its strength lies in understanding and reasoning across modalities, not in image synthesis. Option C is wrong because Cloud Vision API is a pre-trained model for analyzing and extracting information from images (e.g., object detection, OCR, label detection), not for generating images from text. Option D is wrong because AutoML Vision is a service for training custom image classification or object detection models on labeled datasets, not for generative text-to-image tasks.

97

MCQmedium

A data scientist fine-tunes a large language model on Vertex AI but gets poor results on validation data. What is the most likely cause?

A.Incorrect learning rate

B.Insufficient training data

C.Using wrong model family

D.Overfitting due to too many epochs

AnswerB

Fine-tuning requires enough representative data to adapt the model without overfitting or underfitting.

Why this answer

Fine-tuning a large language model on Vertex AI with poor validation results is most likely due to insufficient training data. Large language models have billions of parameters and require a substantial amount of high-quality, task-specific data to effectively adapt to a new domain or task; without enough examples, the model cannot learn the desired patterns and will perform poorly on unseen data.

Exam trap

The trap here is that candidates often assume hyperparameter tuning (like learning rate) is the primary cause of poor fine-tuning results, but in generative AI, data quantity and quality are the most common bottlenecks, especially when using pre-trained models on Vertex AI.

How to eliminate wrong answers

Option A is wrong because an incorrect learning rate typically causes training instability (e.g., loss divergence or slow convergence) rather than consistently poor validation results, and Vertex AI's default hyperparameters are often reasonable. Option C is wrong because using the wrong model family (e.g., choosing a text generation model for a classification task) would likely cause immediate, obvious failures or mismatches in output format, not just poor validation performance after fine-tuning. Option D is wrong because overfitting due to too many epochs would manifest as high training accuracy with low validation accuracy, but the question states poor results on validation data without mentioning training performance, so overfitting cannot be inferred from the information given.

98

MCQhard

A financial services company is building a customer service agent using Vertex AI Agent Builder. They want the agent to only answer questions based on their approved policy documents, which are stored in Cloud Storage. They also need to ensure that the agent never reveals internal employee names or account numbers. They have set up grounding with the documents but find that the agent sometimes ignores the grounding and generates responses using the model's internal knowledge. What should they do to strictly constrain the agent to only use the provided documents?

A.Add a system instruction that says 'Only answer from the provided documents.'

B.Use the 'vertex-ai-agent-builder' with strict grounding mode and disable fallback to model knowledge.

C.Set the model's temperature to 0 and top_p to 0.1.

D.Fine-tune the model on the policy documents to limit its knowledge.

AnswerB

Strict grounding mode ensures the agent only uses the grounded documents, with no fallback.

Why this answer

Option B is correct because Agent Builder provides a 'strict grounding' mode that prevents the model from falling back to internal knowledge, ensuring responses rely solely on the grounded documents. Option A (system instruction) may be overridden. Option C (temperature adjustment) does not force grounding.

Option D (fine-tuning) does not fully block internal knowledge and requires additional effort.

99

MCQmedium

A company is developing a code generation assistant and wants to ensure the model respects access control policies, e.g., it should not generate code that uses internal APIs that the user is not authorized to access. Which technique is most effective for embedding such policy constraints into the model's behavior?

A.Include a system prompt that instructs the model to never generate code using internal APIs.

B.Use a retriever to fetch policy documents and prepend them to each prompt.

C.Fine-tune the model on a dataset of code snippets that follow access control policies, including negative examples of disallowed API usage.

D.Train a separate classifier to rerank model outputs and reject non-compliant generations.

AnswerC

Fine-tuning directly embeds the policy into model weights.

Why this answer

Option C is correct because fine-tuning on a curated dataset with policy-compliant examples teaches the model to respect constraints. Option A is incorrect because prompt engineering alone can be easily circumvented by users. Option B is incorrect because retrieval-augmented generation (RAG) does not enforce policy on generated code.

Option D is incorrect because a separate classifier that reranks or rejects outputs can help, but it filters generations after the fact and is less direct than fine-tuning on explicit compliance data.

100

MCQhard

A developer uses Vertex AI to generate code but the output is not syntactically correct. Which parameter should be adjusted?

A.candidate_count

B.max_output_tokens

C.temperature

D.top_k

AnswerC

Lower temperature (e.g., 0.2) makes the model more focused and likely to produce valid syntax.

Why this answer

Temperature controls the randomness of token selection during generation. A high temperature increases the likelihood of less probable tokens, which can lead to syntactically incorrect code. Lowering temperature makes the model more deterministic and conservative, favoring higher-probability tokens that are more likely to form valid syntax.

Exam trap

Google Cloud often tests the misconception that increasing candidate_count or max_output_tokens will improve output quality, when in fact these parameters only affect quantity or length, not the underlying token selection logic that determines syntactic correctness.

How to eliminate wrong answers

Option A is wrong because candidate_count controls how many different response candidates are generated, not the syntactic correctness of any single output. Option B is wrong because max_output_tokens limits the length of the generated text, not the quality or validity of the syntax. Option D is wrong because top_k limits the number of highest-probability tokens considered at each step; while it can affect output quality, it does not directly address syntactic correctness as effectively as temperature does.
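
The effect of temperature described above can be seen in a few lines of Python. This is a minimal sketch of temperature-scaled softmax, not Vertex AI's internal implementation; the function name `softmax_with_temperature` and the three logits are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw token scores to probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.5)

print(f"T=0.2 -> top token probability {low[0]:.3f}")
print(f"T=1.5 -> top token probability {high[0]:.3f}")
```

At the lower temperature almost all probability mass sits on the highest-scoring token, which is why low-temperature sampling tends to stay on the most likely (and usually syntactically valid) continuation.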

101

MCQmedium

A company wants to use a pre-trained language model for customer support summarization. They need to ensure responses are concise and accurate. Which prompt engineering technique is most effective?

A.Zero-shot prompting

B.Few-shot prompting with examples

C.Chain-of-thought prompting

D.Negative prompting

AnswerB

Few-shot provides examples to guide the model, improving accuracy and conciseness.

Why this answer

Few-shot prompting (B) is most effective because it provides the model with a small set of example input-output pairs (e.g., a customer query and its concise summary), which guides the model to produce outputs that match the desired format, length, and accuracy. This technique is particularly useful for summarization tasks where consistency and adherence to a specific style are critical, as it reduces ambiguity without requiring fine-tuning.

Exam trap

Google Cloud often tests the misconception that zero-shot prompting is sufficient for all tasks, but the trap here is that candidates overlook the need for explicit guidance in format-sensitive tasks like summarization, where few-shot examples provide the necessary constraint for consistency.

How to eliminate wrong answers

Option A (Zero-shot prompting) is wrong because it relies solely on the model's pre-trained knowledge without any examples, which often leads to inconsistent or overly verbose summaries, especially when the task requires a specific format or level of conciseness. Option C (Chain-of-thought prompting) is wrong because it is designed for multi-step reasoning tasks (e.g., arithmetic or logic problems) and is unnecessary for summarization, where the goal is to condense information rather than reason through steps. Option D (Negative prompting) is wrong because it instructs the model on what to avoid (e.g., 'do not include details'), which can be imprecise and may inadvertently suppress relevant information, making it less reliable than providing positive examples of desired outputs.
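
A few-shot prompt of the kind described above might be assembled as follows. This is a rough sketch: the helper `build_few_shot_prompt`, the instruction wording, and the example tickets are all hypothetical, and the resulting string would still need to be sent to whichever model API is in use.

```python
def build_few_shot_prompt(examples, new_ticket):
    """Assemble a prompt from (ticket, summary) pairs plus the ticket to summarize."""
    parts = ["Summarize each support ticket in one sentence.", ""]
    for ticket, summary in examples:
        parts.append(f"Ticket: {ticket}")
        parts.append(f"Summary: {summary}")
        parts.append("")
    # The final block is left open so the model completes the summary.
    parts.append(f"Ticket: {new_ticket}")
    parts.append("Summary:")
    return "\n".join(parts)


# Made-up examples that demonstrate the desired length and tone.
examples = [
    ("I was charged twice for my March invoice and need one charge reversed.",
     "Customer requests a refund for a duplicate March charge."),
    ("The mobile app logs me out every time I switch networks.",
     "Customer reports repeated logouts when changing networks."),
]

prompt = build_few_shot_prompt(examples, "My order arrived with a cracked screen.")
print(prompt)
```

The examples carry the format constraint implicitly: each summary is a single sentence, so the model is nudged toward the same shape without a long list of rules.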

102

MCQmedium

A data scientist fine-tunes a foundation model on customer support transcripts. After evaluation, the model's responses are too formal. Which adjustment during fine-tuning is most likely to make responses more conversational?

A.Increase the batch size to stabilize training.

B.Decrease the number of fine-tuning steps to prevent overfitting.

C.Include examples of informal customer interactions in the fine-tuning data.

D.Use a higher learning rate for faster adaptation.

AnswerC

The training data teaches the model the desired tone; adding conversational examples directly influences style.

Why this answer

The training data directly influences the tone and style of model outputs. Including examples of informal conversations in the fine-tuning dataset teaches the model the desired conversational tone. Other options affect training dynamics but not the style.

103

MCQeasy

Which Google Cloud service provides a managed environment for prompt engineering and model evaluation?

A.AI Platform Notebooks

B.Dialogflow CX

C.Vertex AI Generative AI Studio

D.Cloud Composer

AnswerC

This service provides tools for prompt design and evaluation.

Why this answer

Vertex AI Generative AI Studio is the correct answer because it is a managed service within Vertex AI specifically designed for prompt engineering, model tuning, and evaluation of generative AI models. It provides a no-code interface for testing prompts, comparing model outputs, and iterating on prompt design, directly supporting the workflow described in the question.

Exam trap

The trap here is that candidates may confuse Vertex AI Generative AI Studio with AI Platform Notebooks, assuming that any model development environment supports prompt engineering, when in fact Generative AI Studio is the specialized tool for that purpose.

How to eliminate wrong answers

Option A is wrong because AI Platform Notebooks is a managed Jupyter notebook service for custom model development and training, not a dedicated environment for prompt engineering or model evaluation. Option B is wrong because Dialogflow CX is a conversational AI platform for building chatbots and virtual agents, focused on intent classification and dialogue management, not prompt engineering or model evaluation. Option D is wrong because Cloud Composer is a managed Apache Airflow service for workflow orchestration and scheduling, unrelated to prompt engineering or model evaluation.

104

Multi-Selectmedium

A company is deploying a generative AI model for medical diagnosis support. Which THREE considerations are critical for responsible AI?

Select 3 answers

A.Ensure the training data is diverse and representative.

B.Maximize model throughput to handle high volumes.

C.Implement human oversight for all diagnostic suggestions.

D.Provide clear disclaimers about the model's limitations.

E.Use the cheapest model to reduce costs.

AnswersA, C, D

Diverse data reduces bias.

Why this answer

Option A is correct because diverse and representative training data is critical for responsible AI in medical diagnosis. If the data lacks diversity, the model may exhibit bias, leading to inaccurate or harmful diagnoses for underrepresented groups. This directly impacts fairness, safety, and regulatory compliance in healthcare AI.

Exam trap

Google Cloud often tests the distinction between operational metrics (like throughput or cost) and ethical/regulatory requirements (like fairness, transparency, and human oversight) in responsible AI, leading candidates to mistakenly select performance-based options as critical considerations.

105

MCQeasy

What is the purpose of grounding in Vertex AI?

A.To improve training speed

B.To connect model outputs to verifiable sources

C.To reduce model size for faster inference

D.To enable multi-modal inputs

AnswerB

Grounding ensures the model's responses are based on authoritative information.

Why this answer

Grounding in Vertex AI connects model outputs to verifiable, external sources of information (such as Google Search, enterprise data sources, or third-party databases) to reduce hallucinations and improve factual accuracy. By referencing grounded sources, the model can provide citations and allow users to verify claims, which is critical for enterprise applications requiring trust and compliance.

Exam trap

Google Cloud often tests grounding by conflating it with fine-tuning or prompt engineering, so the trap here is assuming grounding modifies the model's weights or training process, when in fact it is a retrieval-based augmentation layer applied at inference time.

How to eliminate wrong answers

Option A is wrong because grounding does not improve training speed; it is a runtime technique applied during inference to augment responses with real-time data, not a training optimization. Option C is wrong because grounding does not reduce model size or accelerate inference; it may actually add latency due to the retrieval step. Option D is wrong because grounding is not about enabling multi-modal inputs; it specifically addresses output verification and source attribution, whereas multi-modal support is a separate capability for processing images, audio, or video alongside text.
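
To make the "retrieval layer at inference time" idea concrete, here is a toy sketch of the pattern. It does not use the Vertex AI grounding API; the keyword-overlap retriever, the function names, and the two policy snippets are assumptions chosen only to show that grounding adds sources to the request rather than changing model weights.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, top_n=2):
    """Rank documents by the number of words shared with the query (toy retriever)."""
    query_words = tokenize(query)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_words & tokenize(text))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_n]]


def build_grounded_prompt(query, documents):
    """Prepend retrieved sources, labeled by id, so the answer can cite them."""
    source_ids = retrieve(query, documents)
    lines = ["Answer using only the sources below and cite their ids."]
    for doc_id in source_ids:
        lines.append(f"[{doc_id}] {documents[doc_id]}")
    lines.append(f"Question: {query}")
    return "\n".join(lines), source_ids


# Hypothetical enterprise documents.
documents = {
    "refund-policy": "Refunds are issued within 14 days of purchase.",
    "shipping-policy": "Standard shipping takes 5 business days.",
}

prompt, sources = build_grounded_prompt("How many days do refunds take?", documents)
print(prompt)
```

Because the retrieval step runs on every request, it adds some latency, which matches the point made about Option C.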

106

MCQmedium

During model evaluation, a team observes good performance on training data but poor on validation data. Which regularization technique is most appropriate to address this?

A.Add more training data

B.Increase the learning rate

C.Apply dropout

D.Use a larger batch size

AnswerC

Dropout is a regularization method that prevents co-adaptation of neurons, reducing overfitting.

Why this answer

The scenario describes overfitting, where the model memorizes training data but fails to generalize to unseen validation data. Dropout is a regularization technique that randomly deactivates a fraction of neurons during training, forcing the network to learn more robust features and reducing co-adaptation, which directly mitigates overfitting.

Exam trap

Google Cloud often tests the distinction between techniques that improve generalization (regularization) versus those that improve optimization (learning rate, batch size), leading candidates to confuse data augmentation or hyperparameter tuning with regularization methods like dropout.

How to eliminate wrong answers

Option A is wrong because adding more training data can help reduce overfitting but is not a regularization technique; it addresses data scarcity, not the core issue of model complexity. Option B is wrong because increasing the learning rate can cause training instability, divergence, or overshooting of the loss minimum, and does not prevent overfitting. Option D is wrong because using a larger batch size often leads to sharper minima and poorer generalization, potentially worsening overfitting, and is not a regularization method.
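
The mechanism behind dropout can be sketched in plain Python. This is an illustrative version of "inverted" dropout on a list of activations, assuming a drop rate of 0.5; real frameworks apply the same idea to tensors, and the function name and sample values here are made up.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate` during training.

    Surviving units are scaled by 1 / (1 - rate) so the expected activation
    is unchanged, which lets inference skip dropout entirely.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # seeded generator so the run is repeatable
hidden = [1.0] * 1000
dropped = dropout(hidden, rate=0.5, rng=rng)
zeroed = sum(1 for value in dropped if value == 0.0)

print(f"{zeroed} of {len(hidden)} units were dropped in this pass")
print("inference output unchanged:", dropout(hidden, rate=0.5, training=False) == hidden)
```

Because a different random subset of units is silenced on each training pass, no single unit can rely on a specific partner being present, which is the "co-adaptation" the explanation refers to.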

107

Multi-Selectmedium

A company is building a conversational AI using the Gemini API on Vertex AI. They want to reduce the chance of generating toxic content while still allowing creative and engaging responses for their gaming community. Which TWO safety settings should they adjust in the safety_settings parameter?

Select 2 answers

A.Set the threshold for 'SEXUALLY_EXPLICIT' category to BLOCK_NONE.

B.Set the threshold for 'SEXUALLY_EXPLICIT' category to BLOCK_LOW_AND_ABOVE.

C.Enable the 'harm_category' filter for 'DANGEROUS_CONTENT' with threshold BLOCK_ONLY_HIGH.

D.Set the threshold for 'HARASSMENT' category to BLOCK_LOW_AND_ABOVE.

E.Set the threshold for 'HATE_SPEECH' category to BLOCK_ONLY_HIGH.

AnswersC, E

Blocks only high probability dangerous content, maintaining safety without stifling creativity.

Why this answer

Option C is correct because setting the 'DANGEROUS_CONTENT' category to BLOCK_ONLY_HIGH allows the model to generate creative and engaging responses for a gaming community while still blocking the most severe dangerous content. This balances safety with creative freedom, as the gaming context may involve simulated conflict or action that is not genuinely harmful.

Exam trap

Google Cloud often tests the misconception that stricter blocking (e.g., BLOCK_LOW_AND_ABOVE) is always better for safety, but in creative contexts like gaming, BLOCK_ONLY_HIGH is the correct balance to avoid stifling legitimate content.
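
The two selected settings could be written down roughly as below. This sketch uses plain dictionaries rather than the SDK's own classes; the `HARM_CATEGORY_*` strings mirror the category names as I understand them from the Gemini API, and should be checked against the current SDK before use. The `THRESHOLDS` ordering and the `strictness` helper are hypothetical additions for illustration.

```python
# Thresholds ordered from most permissive to most restrictive.
THRESHOLDS = [
    "BLOCK_NONE",
    "BLOCK_ONLY_HIGH",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_LOW_AND_ABOVE",
]

# Settings corresponding to answers C and E (names assumed, see lead-in).
safety_settings = [
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"},
]


def strictness(setting):
    """Return the position of a setting's threshold in THRESHOLDS (higher is stricter)."""
    return THRESHOLDS.index(setting["threshold"])


for setting in safety_settings:
    print(setting["category"], "->", setting["threshold"],
          f"(strictness {strictness(setting)})")
```

Reading the thresholds as an ordered scale makes the trade-off visible: BLOCK_ONLY_HIGH sits just above BLOCK_NONE, so it blocks the most clearly harmful content while leaving room for game-style conflict and action.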

108

MCQhard

Refer to the exhibit. This IAM policy is applied to a Vertex AI project. A user 'test@example.com' reports they cannot create a ModelEvaluationPipelineJob. Which action should the administrator take?

A.Grant the user roles/aiplatform.specialist at the project level.

B.Add the user roles/aiplatform.user at the model level to allow pipeline creation.

C.Add the user to the roles/aiplatform.admin role at the project level.

D.Remove the service account from roles/aiplatform.admin.

AnswerC

Admin role includes permissions to create pipeline jobs.

Why this answer

The roles/aiplatform.user role does not have the permissions to create pipeline jobs; it only allows viewing and using models and endpoints. The roles/aiplatform.admin role has full control, so adding the user to this role is the simplest fix. There is no roles/aiplatform.specialist role; removing the service account would not help; and granting a role at the model level is insufficient for creating pipeline jobs.

109

Multi-Selecthard

Which THREE factors should be considered when choosing between fine-tuning and prompt engineering for a generative AI task? (Choose three.)

Select 3 answers

A.Availability of labeled training data

B.Cost of API calls per request

C.Latency requirements for the application

D.Degree of task specialization required

E.Size of the base model

AnswersA, C, D

Fine-tuning needs labeled data.

Why this answer

Option A is correct because fine-tuning requires a labeled dataset specific to the target task to adjust model weights via supervised learning, whereas prompt engineering relies on the model's existing knowledge without additional training data. Without sufficient labeled data, prompt engineering is often the only viable approach, as fine-tuning would risk overfitting or poor generalization.

Exam trap

Google Cloud often tests the misconception that cost or model size are primary decision factors, when in reality the core trade-off is between data availability (labeled vs. unlabeled) and the degree of task specialization required.

110

MCQhard

A company is using Vertex AI to generate personalized marketing emails. The model sometimes produces biased content. What is the most effective way to detect and mitigate bias?

A.Add more diverse training data

B.Manually review all generated emails before sending

C.Switch to a different generative model

D.Use Vertex AI Explainable AI to analyze predictions and detect bias in training data

AnswerD

Explainable AI helps identify bias sources.

Why this answer

Vertex AI Explainable AI provides feature attributions and model explanations that help identify which input features (e.g., demographic attributes, phrasing patterns) contribute most to biased outputs. By analyzing these attributions against training data, you can pinpoint and mitigate bias at the source, rather than relying on post-hoc manual review or model swapping.

Exam trap

Google Cloud often tests the misconception that bias mitigation is solely a data quantity problem, leading candidates to choose 'add more diverse training data' without recognizing the need for diagnostic tools like Explainable AI to first detect and understand the bias.

How to eliminate wrong answers

Option A is wrong because simply adding more diverse training data does not guarantee detection of existing bias; it may reduce bias over time but lacks the diagnostic capability to identify specific biased patterns. Option B is wrong because manual review of all generated emails is not scalable, introduces human bias, and does not address the root cause of bias in the model's training or architecture. Option C is wrong because switching to a different generative model does not inherently detect or mitigate bias; the new model may have similar or different biases, and the underlying issue of biased training data or model behavior remains unaddressed.

111

MCQeasy

You are a generative AI lead at a healthcare startup developing a system to summarize patient medical records for quick review by doctors. The system uses a fine-tuned LLM. After deployment, doctors report that the summaries often miss critical details like medication dosages and allergy information. The current pipeline preprocesses patient records by extracting text from EHR, feeding it to the LLM, and outputting a summary. The team has limited time and budget. They cannot retrain the model because it is hosted as a managed API. Which action should you take to most effectively improve the summarization quality without changing the model?

A.Increase the maximum output token limit to force the model to include more details.

B.Replace the LLM with a simpler extractive summarization model that selects sentences from the original document.

C.Implement a retrieval-augmented generation (RAG) system that pulls supplementary data from external drug databases.

D.Revise the prompt to explicitly ask for medication dosages and allergies, and format the input text by adding headings (e.g., '### Medications') to emphasize important sections.

AnswerD

Prompt engineering is a low-cost, no-model-change solution that can emphasize key information.

Why this answer

Option D is correct because prompt engineering is the most effective and cost-efficient way to improve LLM output without retraining or changing the model. By explicitly instructing the model to include medication dosages and allergies, and by structuring the input with clear headings, you guide the model's attention to critical sections, directly addressing the missing details. This approach leverages the LLM's existing capabilities and requires no changes to the hosted API or additional infrastructure.

Exam trap

Google Cloud often tests the misconception that increasing output length or adding external data automatically improves quality, when in fact the most direct and cost-effective fix is to refine the input prompt to guide the model's focus.

How to eliminate wrong answers

Option A is wrong because increasing the maximum output token limit does not force the model to include specific missing details; it only allows longer responses, which may still omit critical information if the prompt does not direct the model's focus. Option B is wrong because replacing the LLM with an extractive summarization model would require retraining or deploying a new model, contradicting the constraint of not changing the model, and extractive methods cannot generate new text to explicitly mention dosages or allergies if they are not present in the original text. Option C is wrong because implementing a RAG system to pull from external drug databases adds complexity, cost, and latency, and does not address the core issue of missing details from the patient's own records; the problem is about extracting existing information, not supplementing with external data.
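
The prompt revision in Option D might look something like this. It is a sketch under stated assumptions: the helper `build_summary_prompt`, the instruction wording, and the sample record are invented for illustration, and the output is only the prompt text that would be sent to the managed API.

```python
def build_summary_prompt(record_sections):
    """Format EHR text under headings and ask explicitly for the must-have fields."""
    instructions = (
        "Summarize the patient record below for a doctor. "
        "Always list every medication with its dosage and every allergy. "
        "If a section is empty, say so explicitly."
    )
    body = []
    for heading, text in record_sections.items():
        body.append(f"### {heading}")
        body.append(text.strip() or "(none recorded)")
        body.append("")
    return instructions + "\n\n" + "\n".join(body).rstrip()


# Made-up record used only to show the layout.
record = {
    "Medications": "Lisinopril 10 mg once daily",
    "Allergies": "Penicillin",
    "Visit notes": "Follow-up for blood pressure; no new complaints.",
}

prompt = build_summary_prompt(record)
print(prompt)
```

Nothing about the hosted model changes here; only the request text does, which is why this fits the "limited time and budget, cannot retrain" constraint.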

112

Multi-Selecthard

A company is fine-tuning a Gemma model using Vertex AI. They observe that the model overfits. Which TWO actions should they take to mitigate overfitting?

Select 2 answers

A.Use a larger batch size

B.Increase the number of training epochs

C.Use more diverse data

D.Reduce the learning rate

E.Add dropout during fine-tuning

AnswersC, E

More diverse training data reduces overfitting to narrow patterns.

Why this answer

Option C is correct because introducing more diverse data helps the model generalize better by exposing it to a wider variety of patterns, reducing the risk of memorizing noise from a limited dataset. Option E is correct because dropout randomly deactivates a fraction of neurons during fine-tuning, which prevents co-adaptation and acts as a regularization technique to combat overfitting in transformer-based models like Gemma.

Exam trap

Google Cloud often tests the misconception that reducing the learning rate or increasing batch size are universal fixes for overfitting, when in fact these hyperparameters primarily affect optimization dynamics rather than regularization.

113

Multi-Selecteasy

A developer is using Vertex AI Studio to test a text generation model. Which two actions can be performed in Vertex AI Studio? (Choose TWO)

Select 2 answers

A.Manage IAM roles

B.Monitor model cost

C.Create a dataset

D.Deploy a model to an endpoint

E.Fine-tune a model

AnswersD, E

From Studio, you can deploy a fine-tuned model directly to an endpoint.

Why this answer

Option D is correct because Vertex AI Studio provides a direct interface to deploy a text generation model to an endpoint for serving predictions. This action is a core capability of the platform, allowing developers to test and then operationalize their models without leaving the Studio environment.

Exam trap

Google Cloud often tests the distinction between actions performed within a specific tool (Vertex AI Studio) versus broader platform capabilities (IAM, cost monitoring, dataset creation) to see if candidates understand the scope and purpose of each service.

114

MCQeasy

A startup wants to use a pre-trained model to generate product descriptions without training. Which Google Cloud service should they use?

A.Vertex AI Prediction

B.AI Platform Training

C.Cloud AutoML

D.Vertex AI Generative AI Studio

AnswerD

Vertex AI Generative AI Studio is designed for accessing and experimenting with foundation models for generative tasks.

Why this answer

Vertex AI Generative AI Studio provides access to pre-trained foundation models like Gemini for text generation via a user interface and API, making it the easiest choice for generating product descriptions without training.

115

MCQmedium

Refer to the exhibit. A developer runs this command. What is the primary purpose?

A.Create a training pipeline

B.Deploy a model to an endpoint

C.Train a model

D.Upload a model artifact to Model Registry

AnswerD

The 'models upload' command registers a model with the specified container and artifacts.

Why this answer

The exhibit shows a `models upload` command (in the gcloud CLI, `gcloud ai models upload`). This command registers a trained model artifact, together with its serving container, in the Vertex AI Model Registry, which is a centralized repository for versioning and managing trained models. It does not initiate training, deployment, or pipeline creation; its sole purpose is to register the model artifact for later use.

Exam trap

Google Cloud often tests the distinction between model registration (uploading a trained artifact) and model training or deployment, so the trap here is that candidates confuse the `models upload` command with initiating a training job or deployment, when it only stores the model artifact for versioning and reuse.

How to eliminate wrong answers

Option A is wrong because creating a training pipeline is a separate operation that submits training work, whereas `models upload` only registers an existing artifact. Option B is wrong because deploying a model to an endpoint is a later step that uses commands such as `gcloud ai endpoints create` and `gcloud ai endpoints deploy-model`, which involve an endpoint and serving resources, not just registering a model. Option C is wrong because training a model is performed via a training job (e.g., `gcloud ai custom-jobs create` with training code and compute resources), not by registering an already-trained artifact.

116

MCQhard

An MLOps engineer wants to implement continuous evaluation of a generative model in production. Which Vertex AI component should they use?

A.Vertex AI Model Monitoring

B.Vertex AI Feature Store

C.Vertex AI Prediction

D.Vertex AI Pipelines

AnswerA

Model Monitoring provides continuous evaluation of model metrics and alerts on degradation.

Why this answer

Vertex AI Model Monitoring is the correct component because it provides continuous evaluation of model performance in production, including detecting prediction drift, data drift, and feature attribution drift. For generative models, it can monitor output quality and safety metrics over time, alerting engineers to degradation or shifts in model behavior without requiring manual intervention.

Exam trap

Google Cloud often tests the distinction between monitoring (ongoing evaluation of deployed models) and serving (handling inference requests), leading candidates to mistakenly choose Vertex AI Prediction for continuous evaluation tasks.

How to eliminate wrong answers

Option B is wrong because Vertex AI Feature Store is designed for managing, storing, and serving feature data for training and predictions, not for monitoring model performance or evaluating outputs in production. Option C is wrong because Vertex AI Prediction handles model serving and inference requests, but it does not include built-in continuous evaluation or drift detection capabilities. Option D is wrong because Vertex AI Pipelines orchestrates ML workflows for training and batch prediction, but it is not a real-time monitoring service for production model evaluation.
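
The idea of drift detection mentioned above can be illustrated with a small, self-contained comparison of two label distributions. This is not how Vertex AI Model Monitoring is implemented or configured; the L-infinity distance, the `ALERT_THRESHOLD` value, and the sample labels are assumptions chosen only to show what "alerting on a shift" means.

```python
from collections import Counter


def distribution(labels):
    """Turn a list of categorical labels into a label -> frequency mapping."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def l_infinity_distance(baseline, current):
    """Largest absolute difference in frequency across all labels."""
    p, q = distribution(baseline), distribution(current)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


# Hypothetical safety ratings of model outputs at launch and in a recent window.
baseline = ["ok"] * 95 + ["flagged"] * 5
current = ["ok"] * 80 + ["flagged"] * 20

ALERT_THRESHOLD = 0.1  # hypothetical alerting threshold
score = l_infinity_distance(baseline, current)
drifted = score > ALERT_THRESHOLD

print(f"drift score: {score:.2f}, alert: {drifted}")
```

A managed monitoring service runs this kind of comparison on a schedule and raises the alert automatically, which is the "without requiring manual intervention" part of the explanation.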

117

MCQeasy

A company is deploying a generative AI model for medical advice. What is the most important consideration?

A.Model latency

B.Safety and fairness

C.Model size

D.Cost of inference

AnswerB

Patient safety and avoiding bias are the top priorities.

Why this answer

In medical advice applications, a generative AI model's outputs can directly impact patient health, making safety and fairness the paramount consideration. Incorrect or biased advice could lead to misdiagnosis or harm, outweighing performance metrics like latency or cost. Regulatory frameworks such as HIPAA and FDA guidelines for clinical decision support further mandate rigorous validation of model safety and fairness before deployment.

Exam trap

Google Cloud often tests the misconception that technical performance metrics like latency or cost are the primary concerns in high-stakes domains, when in fact ethical and safety considerations take precedence.

How to eliminate wrong answers

Option A is wrong because model latency, while important for user experience, is secondary to ensuring the advice is safe and unbiased; a fast but harmful response is unacceptable in healthcare. Option C is wrong because model size correlates with computational resources and potential capability, but does not inherently guarantee safety or fairness; a larger model may amplify biases or generate more confident but incorrect advice. Option D is wrong because cost of inference is a business consideration that must be balanced against safety requirements, but it is not the most critical factor when human lives are at stake.

118

MCQhard

Refer to the exhibit. An administrator creates this IAM policy for a Vertex AI project. What is the effect of this policy?

A.Alice can view models; Bob can delete models

B.Alice can deploy pre-trained models; Bob can create and manage custom model code

C.Both have full access to all Vertex AI resources

D.Alice can train models; Bob can deploy models

AnswerB

aiplatform.user includes deployment permissions; customCodeModelAdmin covers custom code management.

Why this answer

Option B is correct because the policy uses a separate binding for each user, and each binding attaches a role rather than an individual permission. Alice's role (`aiplatform.user`) carries permissions such as `aiplatform.models.get` together with the deployment permissions she needs for pre-trained models, while Bob's role (`customCodeModelAdmin`) carries `aiplatform.models.create` and `aiplatform.models.update`, allowing him to create and manage custom model code.

Exam trap

Google Cloud often tests the distinction between specific IAM permissions (e.g., `get` vs. `create` vs. `delete`) and the common misconception that viewing a model implies full access or that creating a model implies the ability to deploy it.

How to eliminate wrong answers

Option A is wrong because Alice is granted `aiplatform.models.get`, which allows viewing models but not deleting them; Bob is granted `aiplatform.models.create` and `aiplatform.models.update`, which allow creating and updating models but not deleting them. Option C is wrong because the policy does not grant full access to all Vertex AI resources; it only grants specific permissions on models, and neither user has permissions for other resources like datasets or endpoints. Option D is wrong because Alice's permission (`aiplatform.models.get`) does not include training models, and Bob's permissions (`aiplatform.models.create` and `aiplatform.models.update`) are for custom model code management, not deploying models.
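
The distinction between `get`, `create`, `update`, and `delete` can be sketched as a small permission check. This is not a reconstruction of the exhibit: the role names under `roles/example.*`, the member addresses, and the `is_allowed` helper are hypothetical, and real Vertex AI roles bundle many more permissions than shown.

```python
# Hypothetical role definitions for illustration only; real predefined
# roles contain far more permissions than these.
ROLE_PERMISSIONS = {
    "roles/example.modelViewer": {"aiplatform.models.get"},
    "roles/example.modelEditor": {
        "aiplatform.models.create",
        "aiplatform.models.update",
    },
}

# A policy is a list of bindings; each binding attaches one role to members.
POLICY = {
    "bindings": [
        {"role": "roles/example.modelViewer", "members": ["user:alice@example.com"]},
        {"role": "roles/example.modelEditor", "members": ["user:bob@example.com"]},
    ]
}


def is_allowed(policy, member, permission):
    """Return True if any binding gives `member` a role containing `permission`."""
    for binding in policy["bindings"]:
        if member in binding["members"]:
            if permission in ROLE_PERMISSIONS.get(binding["role"], set()):
                return True
    return False


print(is_allowed(POLICY, "user:alice@example.com", "aiplatform.models.get"))
print(is_allowed(POLICY, "user:bob@example.com", "aiplatform.models.delete"))
```

Because access is the union of the permissions in a member's roles, a permission that no bound role contains (here, `aiplatform.models.delete`) is denied for everyone.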

119

MCQmedium

A company is using Vertex AI to generate email responses. They want to ensure sensitive customer data (PII) is not included in the output. What is the most effective approach?

A.Use a system prompt instructing the model to avoid PII.

B.Fine-tune the model on a dataset that excludes PII.

C.Manually review each output before sending.

D.Configure safety filters to block PII categories.

AnswerD

Safety filters can automatically block PII.

Why this answer

Option D is correct because safety filters in Vertex AI act on the model's response before it is returned, blocking configured categories of content, which in this scenario include PII. This gives an automated guardrail at the output layer that catches sensitive data even if the model attempts to generate it, unlike prompt-based instructions, which the model may fail to follow.

Exam trap

The trap here is that candidates assume a system prompt (Option A) is sufficient to control model behavior, but the exam tests the understanding that prompts are not enforceable guardrails, whereas safety filters are a hard technical control.

How to eliminate wrong answers

Option A is wrong because system prompts are merely instructions and do not guarantee the model will comply; the model can still generate PII due to its training data or adversarial inputs. Option B is wrong because fine-tuning on a dataset that excludes PII does not prevent the model from generating PII from its pre-trained knowledge, and fine-tuning is costly and may not cover all edge cases. Option C is wrong because manual review is not scalable, introduces latency, and is prone to human error, making it ineffective for high-volume email generation.
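
The idea of a filter that inspects the response before it is returned can be illustrated with a standalone sketch. This is not the Vertex AI safety filter API: the `PII_PATTERNS` table and the `redact_pii` helper are hypothetical, and the three regular expressions cover only a few simple formats.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact_pii(text):
    """Replace each match with a placeholder and report which kinds were found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label} REDACTED]", text)
        if count:
            found.append(label)
    return text, found


# A model-generated draft is checked after generation, before it is sent.
draft = "Hi Dana, we emailed dana@example.com and called 555-123-4567."
clean, kinds = redact_pii(draft)
print(clean)
print(kinds)
```

The check runs on every output regardless of what the prompt said, which is the property that separates an output-side control from an instruction the model is merely asked to follow.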

120

MCQeasy

A medical imaging team wants to generate synthetic X-ray images to augment a training dataset for a rare disease. Which type of generative model is most suitable for generating high-fidelity, realistic medical images?

A.Generative Adversarial Network (GAN)

B.Diffusion model

C.Variational Autoencoder (VAE)

D.Autoregressive model (e.g., PixelCNN)

AnswerB

Diffusion models currently produce the highest quality images.

Why this answer

Diffusion models are the most suitable for generating high-fidelity, realistic medical images because they iteratively denoise random noise into a coherent image through a learned reverse diffusion process, which produces superior sample quality and diversity compared to GANs, especially for complex, high-dimensional data like X-rays. Their training stability and ability to model fine-grained anatomical details without mode collapse make them the current state-of-the-art for medical image synthesis.
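
The noising and denoising steps can be sketched in plain Python using one common formulation, a linear noise schedule in which x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps. The schedule constants, the three-pixel "image", and the function names are assumptions for illustration; a real model would use a trained network to estimate the noise, whereas this sketch reuses the true noise to show that the step is invertible.

```python
import math
import random


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t under a linear beta schedule."""
    product = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        product *= 1.0 - beta
    return product


def add_noise(x0, t, num_steps, rng):
    """Forward process: mix the clean signal with Gaussian noise at step t."""
    abar = alpha_bar(t, num_steps)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(abar) * x + math.sqrt(1.0 - abar) * e for x, e in zip(x0, eps)]
    return xt, eps


def recover_x0(xt, eps_hat, t, num_steps):
    """Invert the forward step given a noise estimate (a trained network would supply it)."""
    abar = alpha_bar(t, num_steps)
    return [(x - math.sqrt(1.0 - abar) * e) / math.sqrt(abar) for x, e in zip(xt, eps_hat)]


rng = random.Random(0)
pixels = [0.1, 0.5, 0.9]  # toy "image" with three pixel intensities
noisy, true_eps = add_noise(pixels, t=500, num_steps=1000, rng=rng)
restored = recover_x0(noisy, true_eps, t=500, num_steps=1000)
print(restored)
```

Training amounts to teaching a network to predict `eps` from `noisy` and `t`; generation then starts from pure noise and applies the learned estimate step by step.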

Exam trap

Google Cloud often tests the misconception that GANs are the default choice for image generation due to their popularity, but the trap here is that for high-fidelity medical imaging, diffusion models are preferred because they avoid GANs' mode collapse and training instability, which are critical in safety-sensitive domains.

How to eliminate wrong answers

Option A is wrong because GANs, while capable of generating realistic images, suffer from training instability, mode collapse, and difficulty in capturing the full diversity of medical image distributions, often producing artifacts that are unacceptable in clinical contexts. Option C is wrong because VAEs tend to generate blurry, less detailed images due to their reliance on a variational lower bound and a Gaussian prior, which fails to capture the sharp edges and fine textures critical in X-ray images. Option D is wrong because autoregressive models like PixelCNN generate images pixel by pixel, which is computationally prohibitive for high-resolution medical images and lacks the global coherence and efficiency of diffusion models.

121

Multi-Selectmedium

A company is designing a prompt engineering strategy for a customer service chatbot using Gemini. Which two practices are recommended for improving response quality? (Choose TWO)

Select 2 answers

A.Use chain-of-thought prompting

B.Always provide multiple examples in the prompt

C.Avoid any context in the prompt

D.Set temperature to 1.0 for maximum creativity

E.Include a system instruction to define the role

AnswersA, E

Chain-of-thought encourages logical reasoning, improving accuracy.

Why this answer

Chain-of-thought prompting (A) is recommended because it guides the model to reason step by step, improving accuracy on complex customer service queries by breaking down multi-step problems. This technique leverages Gemini's ability to follow logical sequences, reducing errors in tasks like troubleshooting or escalation decisions. A system instruction (E) is recommended because it defines the chatbot's role and expected behavior up front, so responses stay consistent across the conversation.

Exam trap

Google Cloud often tests the misconception that higher temperature always improves creativity, but in customer service, lower temperature is critical for deterministic, safe responses, and candidates may overlook the role of system instructions in defining behavior.
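
The three ideas discussed here (a role-defining system instruction, a step-by-step cue, and a low temperature) can be combined in one request. The sketch below builds a generic dictionary rather than calling any real API: the `build_chat_request` helper, its field names, and the default temperature of 0.2 are all hypothetical.

```python
def build_chat_request(user_message, role_description, temperature=0.2):
    """Assemble a generic request dict; the field names are illustrative, not a real API schema."""
    # System instruction: fixes the assistant's role and boundaries.
    system_instruction = (
        f"You are {role_description}. "
        "Stay on topic and escalate to a human agent when unsure."
    )
    # Chain-of-thought cue: ask for intermediate reasoning before the reply.
    prompt = (
        f"Customer message: {user_message}\n"
        "Work through the problem step by step, then give the final reply "
        "on a line that starts with 'Reply:'."
    )
    return {
        "system_instruction": system_instruction,
        "prompt": prompt,
        "temperature": temperature,  # low value favors consistent answers
    }


request = build_chat_request(
    "My router keeps dropping the connection.",
    "a support agent for a home internet provider",
)
print(request["system_instruction"])
print(request["prompt"])
```

Keeping the role in the system instruction and the reasoning cue in the prompt means each can be changed without touching the other.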

122

MCQhard

A research team is training a large language model from scratch using TPUs on Google Cloud. Which storage solution provides the highest throughput for training data?

A.Cloud Storage

B.Persistent Disk

C.Cloud Filestore

D.Cloud Spanner

AnswerA

Cloud Storage provides high throughput for large datasets, especially with parallel reads.

Why this answer

Cloud Storage provides the highest throughput for training data because object storage serves many readers at once, so aggregate bandwidth grows as more TPU hosts read in parallel. Training jobs typically stream `gs://` objects through a `tf.data` input pipeline (TensorFlow reads these paths via `tf.io.gfile`) or mount the bucket with Cloud Storage FUSE, avoiding the I/O bottlenecks of block storage. Persistent Disk and Filestore have lower aggregate throughput limits and are not optimized for the distributed, streaming read patterns typical of large-scale training.

Exam trap

Google Cloud often tests the misconception that local or attached block storage (Persistent Disk) is faster than object storage for ML training, but candidates fail to recognize that TPU training requires distributed, parallel data access that object storage (Cloud Storage) uniquely provides at scale.

How to eliminate wrong answers

Option B is wrong because Persistent Disk is a block storage device with a maximum throughput of ~1.2 GB/s per instance (for pd-ssd), which is far below the multi-GB/s requirements of TPU training and cannot scale horizontally across many workers without complex striping. Option C is wrong because Cloud Filestore is a managed NFS filestore that introduces network latency and has throughput caps (e.g., 1.2 GB/s for the Basic tier, 4.8 GB/s for the High Scale tier), making it unsuitable for the high-throughput, low-latency data streaming needed by TPUs. Option D is wrong because Cloud Spanner is a globally distributed relational database service designed for transactional consistency and ACID compliance, not for high-throughput sequential read of training data; its throughput is limited by node count and query overhead, and it is not a file storage solution.
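
The scaling argument can be made concrete with a little arithmetic. The sketch below is a simplified model, not a benchmark: the worker count and the 0.5 GB/s per-worker read rate are hypothetical, and the 1.2 GB/s cap is the per-volume figure quoted above.

```python
def aggregate_read_gbps(num_workers, per_worker_gbps, shared_cap_gbps=None):
    """Total read rate across workers, limited by a shared cap if one applies."""
    total = num_workers * per_worker_gbps
    if shared_cap_gbps is not None:
        total = min(total, shared_cap_gbps)
    return total


workers = 64             # hypothetical number of hosts reading in parallel
per_worker = 0.5         # hypothetical GB/s each host can pull

# Object storage: no single shared cap in this model, so reads add up.
object_storage = aggregate_read_gbps(workers, per_worker)

# Single shared volume: every reader competes for the same 1.2 GB/s.
shared_volume = aggregate_read_gbps(workers, per_worker, shared_cap_gbps=1.2)

print(object_storage)
print(shared_volume)
```

Under these assumptions the parallel readers reach 32 GB/s in aggregate, while the shared volume stays at its cap no matter how many workers are added.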

123

Multi-Selecthard

Which THREE are valid methods to reduce bias in generative AI outputs?

Select 3 answers

A.Using only English prompts

B.Increasing model size

C.Using a more diverse training dataset

D.Using safety filters

E.Applying prompt engineering to instruct the model to be fair

AnswersC, D, E

Diverse data reduces the risk of model learning biased patterns.

Why this answer

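Option C is correct because training on a more diverse dataset reduces representational bias by exposing the model to a wider range of demographics, cultures, and perspectives. This directly mitigates the model's tendency to overrepresent majority groups or underrepresent minorities, which is a root cause of biased outputs in generative AI. Option D is correct because safety filters screen outputs after generation and can block harmful or discriminatory content that the model still produces. Option E is correct because prompt instructions that ask for fair, balanced treatment steer the model at inference time, although they are a softer control than the other two.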

Exam trap

Google Cloud often tests the misconception that increasing model size or using a single language (like English) can solve bias, when in reality these actions can worsen bias by amplifying existing skews or introducing new cultural blind spots.

124

MCQhard

A company is deploying a generative AI application that generates medical reports. They need to ensure the output is factual and minimizes hallucinations. Which approach is most effective?

A.Fine-tune the model with RLHF

B.Set the temperature to 0.0

C.Implement retrieval-augmented generation (RAG) with a curated knowledge base

D.Use prompt engineering to instruct the model to be accurate

AnswerC

RAG grounds outputs in retrieved facts, reducing hallucinations.

Why this answer

Retrieval-Augmented Generation (RAG) is the most effective approach because it grounds the model's output in a curated, authoritative knowledge base of medical data. By retrieving relevant, verified documents at inference time, RAG directly reduces the model's reliance on its parametric memory, which is the primary source of hallucinations in generative AI. This is especially critical in high-stakes domains like medical reporting, where factual accuracy is paramount.

Exam trap

The trap here is that candidates often choose 'Set the temperature to 0.0' because they confuse reducing randomness with eliminating factual errors, but temperature only controls output variability, not the truthfulness of the model's internal knowledge.
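
The effect of temperature can be checked with a temperature-scaled softmax. The logits below are hypothetical and describe a model whose weights favor a false statement; the point is that lowering the temperature only sharpens the existing preference.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical next-token candidates: index 0 is the true statement,
# index 1 is a false one that the model happens to score higher.
logits = [1.0, 2.0]

warm = softmax_with_temperature(logits, temperature=1.0)
cold = softmax_with_temperature(logits, temperature=0.05)

print(warm)
print(cold)
```

At temperature 1.0 the false candidate is merely likely; near zero it is chosen almost every time. A temperature of exactly 0.0 is usually implemented as picking the highest-scoring token directly, which gives the same result.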

How to eliminate wrong answers

Option A is wrong because RLHF (Reinforcement Learning from Human Feedback) optimizes the model for human preference alignment and helpfulness, but it does not provide a mechanism to retrieve or verify facts from an external source, so it cannot reliably prevent hallucinations in factual domains. Option B is wrong because setting temperature to 0.0 makes the model deterministic (always picking the highest-probability token), but it does not correct factual errors stored in the model's weights; the model can still confidently generate false information. Option D is wrong because prompt engineering instructs the model to be accurate, but it is a soft constraint that the model can easily override; without external grounding, the model has no way to verify its own output against a trusted source.
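
The retrieve-then-generate flow can be sketched with a toy in-memory knowledge base. Everything here is an assumption for illustration: the two policy sentences are made up, retrieval uses simple word overlap rather than embeddings, and the model call itself is left out.

```python
import re

# Hypothetical curated knowledge base; a real one would hold vetted documents.
KNOWLEDGE_BASE = [
    "Clinic policy: radiology reports must list the imaging modality and date.",
    "Clinic policy: lab reports must include reference ranges for each value.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, top_k=1):
    """Rank documents by word overlap with the query and keep the best top_k."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_grounded_prompt(query, documents):
    """Put the retrieved passages in front of the question as the only allowed source."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


question = "What must a radiology report list?"
prompt = build_grounded_prompt(question, KNOWLEDGE_BASE)
print(prompt)
```

The model then answers from the supplied passage instead of its parametric memory, and the retrieved text can be logged alongside the output for later verification.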
