Knowledge + Practice

CCNA Business Strategies for Generative AI Solutions Questions

53 of 128 questions · Page 2/2 · Business Strategies for Generative AI Solutions · Answers revealed

76

MCQeasy

A startup wants to quickly prototype a gen AI application. Which Google Cloud service should they use first?

A.Vertex AI Workbench

B.Cloud TPUs

C.Gen AI Studio

D.Dataflow

AnswerC

Provides a low-code environment for quickly testing and iterating on gen AI models.

Why this answer

Gen AI Studio (now part of Vertex AI) provides a low-code/no-code interface for quickly prototyping generative AI applications using pre-trained models like PaLM 2 and Gemini. It allows startups to experiment with prompts, tune models, and deploy without managing infrastructure, making it the fastest path from idea to prototype.

Exam trap

The trap here is that candidates confuse Vertex AI Workbench (a general ML IDE) with Gen AI Studio (a generative AI prototyping tool), or assume that rapid prototyping requires custom hardware like TPUs, when Google explicitly designed Gen AI Studio for this purpose.

How to eliminate wrong answers

Option A is wrong because Vertex AI Workbench is a Jupyter-based development environment for building custom ML models, not a rapid prototyping tool for generative AI; it requires more setup and coding. Option B is wrong because Cloud TPUs are specialized hardware accelerators for training large models, not a service for quick prototyping—they involve significant configuration and cost. Option D is wrong because Dataflow is a serverless data processing service for batch and stream pipelines (e.g., ETL), unrelated to generative AI application prototyping.

77

Multi-Selecteasy

Which TWO Google Cloud services can be used together to implement a RAG (retrieval-augmented generation) pipeline? (Select 2)

Select 2 answers

A.Cloud SQL

B.Vertex AI Vector Search

C.Bigtable

D.Vertex AI PaLM API

E.Cloud Functions

AnswersB, D

Provides vector similarity search for retrieval.

Why this answer

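Vertex AI Vector Search (option B) is correct because it provides a managed vector database for storing and querying embeddings, which is essential for the retrieval step in a RAG pipeline. It enables semantic similarity search over large datasets, allowing the system to fetch relevant context documents based on a user query. The Vertex AI PaLM API (option D) supplies the generation step: the retrieved documents are added to the prompt, and the model produces an answer grounded in them.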

Exam trap

Google Cloud often tests the misconception that any database (like Cloud SQL or Bigtable) is the natural vector store for RAG. Both are general-purpose data stores, whereas Vertex AI Vector Search is purpose-built for similarity search over embeddings at scale, which makes it the expected retrieval component here.
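
The retrieve-then-generate flow can be sketched in a few lines. This is a toy illustration, not the Vertex AI API: `embed` is a hypothetical bag-of-words stand-in for an embedding model, the in-memory `DOCS` list stands in for a vector index, and `build_prompt` only assembles the text that would be sent to a generation model.

```python
import math
import re
from collections import Counter

def embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=1):
    """Retrieval step: return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, context):
    """Augmentation step: place the retrieved context ahead of the question."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

# Illustrative documents (assumed, not from the question).
DOCS = [
    "Returns are accepted within 30 days of purchase.",
    "Standard shipping takes five business days.",
]

query = "how many days does shipping take"
top = retrieve(query, DOCS)
prompt = build_prompt(query, top)  # this text would go to the generation model
```

In a real pipeline the similarity search would be delegated to the vector search service and the prompt would be sent to the model API; only the division of labor is shown here.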

78

Multi-Selectmedium

A company is considering whether to use Vertex AI's Generative AI Studio. Which TWO are benefits?

Select 2 answers

A.It is always cheaper than using third-party APIs

B.It integrates seamlessly with Vertex AI Pipelines for MLOps

C.It generates outputs that are always more accurate than custom models

D.It provides built-in tools for prompt engineering and iterative testing

E.It requires no coding or machine learning expertise to use

AnswersB, D

Integration allows automating deployment, monitoring, and retraining.

Why this answer

Option B is correct because Vertex AI Generative AI Studio is designed to work natively with Vertex AI Pipelines, enabling users to incorporate generative models into end-to-end MLOps workflows for automation, monitoring, and retraining. This integration allows seamless orchestration of prompt tuning, model evaluation, and deployment within the same managed environment, reducing operational overhead.

Exam trap

Google Cloud often tests the misconception that 'no-code' tools eliminate the need for any ML expertise, but the trap here is that Generative AI Studio still requires understanding of prompt engineering, model evaluation, and cost trade-offs to avoid poor outputs or unexpected expenses.

79

MCQmedium

An e-commerce company uses a generative AI model to generate marketing copy. They notice that the model occasionally produces off-brand or inappropriate content. What is the best way to mitigate this?

A.Reduce the model's temperature

B.Increase the model's top-k sampling

C.Use a safety filter

D.Fine-tune the model on brand guidelines

AnswerD

Trains the model to adhere to brand style and content.

Why this answer

Option D is correct because fine-tuning on brand guidelines directly addresses brand-specific content issues. Option C (safety filter) is too broad. Option A (reducing temperature) affects creativity but not brand adherence. Option B (increasing top-k sampling) increases diversity, not control.

80

MCQeasy

A healthcare company wants to use Gemini to analyze patient records and summarize findings. Which data privacy practice is most critical when using the Gemini API on Vertex AI?

A.Fine-tune Gemini using PHI to improve accuracy.

B.Disable request-response logging in Vertex AI to ensure data is not stored.

C.Enable Vertex AI Data Governance to mask or redact PII before sending to the API.

D.Use the text-davinci-003 model instead of Gemini, as it is more private.

AnswerC

C is correct because Data Governance can automatically protect sensitive data.

Why this answer

Option C is correct because Vertex AI Data Governance allows you to configure data masking or redaction of personally identifiable information (PII) before the data is sent to the Gemini API, ensuring compliance with healthcare regulations like HIPAA. This is the most critical practice because it prevents PHI from being exposed to the model or stored in logs, directly addressing the core privacy requirement. Disabling logging alone (Option B) does not prevent PHI from being processed by the model, and fine-tuning with PHI (Option A) introduces significant compliance risks.

Exam trap

Google Cloud often tests the misconception that disabling logging is sufficient for data privacy, when in fact the critical step is preventing sensitive data from being sent to the API in the first place, which is achieved through data masking or redaction.

How to eliminate wrong answers

Option A is wrong because fine-tuning Gemini using PHI would require storing and processing that data in a training pipeline, which violates HIPAA and other data privacy regulations unless strict de-identification and contractual safeguards are in place; it also increases the attack surface for data breaches. Option B is wrong because disabling request-response logging in Vertex AI prevents storage of API interactions but does not prevent PHI from being sent to and processed by the Gemini model itself, leaving the data exposed during inference. Option D is wrong because text-davinci-003 is an OpenAI model, not available on Vertex AI, and it does not inherently offer better privacy controls; the comparison is irrelevant and the premise is false.
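
The "redact before sending" idea can be illustrated with a minimal sketch. It does not use any Vertex AI feature: the regular expressions and the `redact` helper are assumptions chosen for illustration, and real PHI detection needs far broader coverage than two patterns.

```python
import re

# Illustrative patterns only (assumptions): a US-style SSN and an email address.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def redact(text):
    """Replace each matched identifier with a placeholder label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Patient SSN 123-45-6789, contact jane.doe@example.com, reports mild fever."
safe_note = redact(note)  # only this redacted text would be sent to the model API
```

The point is the ordering: redaction runs on the client side, so the identifiers never reach the model or its logs.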

81

Multi-Selectmedium

A healthcare provider is planning to deploy generative AI for clinical note summarization. Which THREE actions are essential for regulatory compliance (e.g., HIPAA)?

Select 3 answers

A.Implement role-based access controls to limit who can view AI-generated notes.

B.Anonymize patient data before using it for model training or inference.

C.Allow clinicians to share AI-generated summaries with anyone in the organization.

D.Store raw patient data in model training logs for auditing.

E.Ensure data encryption at rest and in transit.

AnswersA, B, E

Access controls ensure only authorized users see sensitive data.

Why this answer

Option A is correct because role-based access controls (RBAC) are a core requirement under HIPAA's Security Rule (45 CFR § 164.312(a)(1)) to ensure that only authorized personnel can access electronic protected health information (ePHI). In the context of generative AI for clinical note summarization, RBAC prevents unauthorized viewing of AI-generated summaries that may contain sensitive patient data, thereby enforcing the minimum necessary standard.

Exam trap

Google Cloud often tests the misconception that sharing AI-generated summaries freely within an organization is acceptable under HIPAA, when in fact the minimum necessary rule strictly limits access to only those who need the information for their job functions.

82

Multi-Selectmedium

Which TWO strategies are effective for reducing latency in a generative AI chat application deployed on Vertex AI? (Select 2)

Select 2 answers

A.Deploy on TPU instead of GPU

B.Use streaming responses

C.Increase the max output tokens

D.Enable model quantization

E.Use larger batch sizes

AnswersB, D

Reduces perceived latency.

Why this answer

Option B is correct because streaming responses reduce perceived latency by sending tokens to the client as they are generated, rather than waiting for the full response. This leverages server-sent events (SSE) or chunked transfer encoding to deliver partial results immediately, improving user experience in chat applications.

Exam trap

Google Cloud often tests the distinction between reducing actual latency (e.g., model optimization) versus reducing perceived latency (e.g., streaming), and candidates mistakenly choose options that increase throughput (like larger batch sizes) without realizing they harm per-request latency.
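
The difference between perceived and actual latency can be shown with a small simulation. The token list and the 50 ms per-token cost are assumed values, and `generate_tokens` is a hypothetical generator rather than a real model client.

```python
def generate_tokens(tokens, per_token_ms):
    """Yield (token, elapsed_ms) pairs, simulating a fixed per-token generation cost."""
    elapsed = 0
    for tok in tokens:
        elapsed += per_token_ms
        yield tok, elapsed

TOKENS = ["Your", "order", "ships", "tomorrow."]  # assumed example response
PER_TOKEN_MS = 50                                  # assumed generation cost per token

stream = list(generate_tokens(TOKENS, PER_TOKEN_MS))

# A streaming client can display text as soon as the first token arrives.
time_to_first_token_ms = stream[0][1]
# A non-streaming client waits until the last token has been generated.
time_to_full_response_ms = stream[-1][1]
```

Total generation time is unchanged by streaming; only the wait before the user sees anything shrinks, which is why streaming counts as a perceived-latency improvement.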

83

MCQmedium

A startup is building a generative AI tool that helps users write code. They want to launch quickly but need to ensure the generated code is secure and does not introduce vulnerabilities. They have a small team of developers with some ML experience. The tool should be cloud-hosted. Which approach balances speed, security, and cost?

A.Deploy the tool without any security checks and rely on manual review

B.Train a custom code generation model from scratch on a large dataset

C.Use a pre-trained code model (e.g., Codey) and add a security filtering layer

D.Use a smaller model and restrict outputs to only simple code patterns

AnswerC

Leverages existing model, adds security checks, fast to deploy.

Why this answer

Option C is correct because using a pre-trained code model with a security filtering layer provides a good balance: quick start, built-in safety checks, and manageable cost. Option B (building from scratch) is too slow. Option A (manual review) doesn't scale. Option D (restricting outputs) may reduce usefulness.

84

MCQhard

A retailer wants to generate personalized product descriptions using PaLM API. They have concerns about data privacy. What is the best practice to mitigate these concerns?

A.Train a custom model from scratch on proprietary data stored on-premise

B.Use the PaLM API directly with anonymized customer data

C.Encrypt all data in transit and at rest using customer-managed encryption keys

D.Enable data residency and use prompt engineering to avoid including personally identifiable information

AnswerD

Vertex AI allows data to stay in specific regions, and careful prompt design can generate personalized content without exposing raw PII.

Why this answer

Option D is correct because data residency ensures customer data is processed and stored within a specific geographic region, addressing regulatory compliance, while prompt engineering allows the retailer to avoid sending PII to the PaLM API entirely. This combination mitigates privacy risks without requiring custom model training or relying solely on encryption, which does not prevent the API from processing sensitive data.

Exam trap

Google Cloud often tests the misconception that encryption alone (Option C) is sufficient for data privacy, when in fact it does not prevent the API from accessing or processing the data, which is the core concern in this scenario.

How to eliminate wrong answers

Option A is wrong because training a custom model from scratch on proprietary data is cost-prohibitive, requires extensive ML expertise, and does not leverage the PaLM API's pre-trained capabilities, making it an inefficient solution for generating personalized descriptions. Option B is wrong because using the PaLM API directly with anonymized customer data still transmits data to Google's servers, and anonymization may not be irreversible or sufficient to prevent re-identification, violating privacy policies. Option C is wrong because encrypting data in transit (e.g., TLS 1.3) and at rest (e.g., AES-256) protects against unauthorized access but does not prevent the PaLM API from processing the data, meaning the retailer's privacy concerns about data exposure to the API remain unaddressed.

85

MCQhard

A healthcare startup is exploring GenAI for clinical note summarization. They have concerns about patient data privacy. Which Google Cloud approach best addresses privacy while still using powerful models?

A.Deploy open-source models on-premises

B.Use a third-party API with anonymization of patient data

C.Use Vertex AI with model customization (fine-tuning)

D.Use Vertex AI with data residency controls and no external data sharing

AnswerD

Vertex AI offers regional endpoints and commitments to not use customer data for training, addressing privacy while providing powerful models.

Why this answer

Vertex AI with data residency controls and no external data sharing ensures that patient data remains within specified geographic boundaries and is not used for model training or improvement, directly addressing healthcare privacy regulations like HIPAA. This approach leverages Google Cloud's powerful models while maintaining strict data governance, unlike options that risk data exposure or lack enterprise-grade controls.

Exam trap

The trap here is that candidates often assume fine-tuning (Option C) inherently provides privacy, but without explicit data residency and no-sharing policies, it fails to meet strict healthcare compliance requirements.

How to eliminate wrong answers

Option A is wrong because deploying open-source models on-premises, while offering data control, often lacks the advanced summarization capabilities and scalability of Vertex AI's foundation models, and still requires significant effort to ensure HIPAA compliance without Google's built-in privacy safeguards. Option B is wrong because using a third-party API, even with anonymization, introduces risks of data leakage or re-identification, and typically does not provide contractual guarantees against model training on patient data, violating many healthcare privacy policies. Option C is wrong because fine-tuning a model on Vertex AI without explicit data residency controls and no external data sharing may still allow Google to process data outside desired regions or use it for service improvements, failing to meet strict data privacy requirements.

86

Multi-Selecthard

A financial institution is deploying a generative AI solution that generates investment advice. They must ensure fairness, avoid toxic outputs, and comply with regulations like GDPR. Which TWO strategies should they implement? (Choose two.)

Select 2 answers

A.Use Vertex AI Safety Attributes to filter harmful content in both input and output.

B.Set the model temperature to 0 to eliminate creativity and reduce bias.

C.Implement a human review process for any advice above a certain risk threshold.

D.Fine-tune the model exclusively on compliant financial documents.

E.Disable request logging to avoid storing sensitive data.

AnswersA, C

A is correct because it proactively blocks toxic content.

Why this answer

Options A and C are correct because using safety attributes to filter harm and implementing a human-in-the-loop for high-risk outputs are direct measures. Option E is wrong because disabling logging is against compliance. Option D is wrong because training only on compliant data is insufficient for every scenario. Option B is wrong because setting the temperature to 0 does not guarantee fairness.
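
A safety filter and a human-review threshold can be combined in one routing step. The sketch below is hypothetical: the `Advice` fields, the 0.7 threshold, and the route names are assumptions, and the safety flag and risk score are presumed to come from upstream components that are not shown.

```python
from dataclasses import dataclass

@dataclass
class Advice:
    text: str
    risk_score: float     # assumed output of an upstream risk model, 0.0 to 1.0
    flagged_unsafe: bool  # assumed output of a safety filter

REVIEW_THRESHOLD = 0.7  # hypothetical cut-off for mandatory human review

def route(advice):
    """Decide what happens to a generated piece of advice."""
    if advice.flagged_unsafe:
        return "block"         # harmful content never reaches the customer
    if advice.risk_score >= REVIEW_THRESHOLD:
        return "human_review"  # high-risk advice waits for a reviewer
    return "auto_send"

decisions = [
    route(Advice("Consider a diversified index fund.", 0.2, False)),
    route(Advice("Move all savings into one stock.", 0.9, False)),
    route(Advice("(toxic output)", 0.1, True)),
]
```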

87

MCQmedium

Refer to the exhibit. A data scientist runs this command to upload a custom model to Vertex AI. What is the primary purpose of the --container-image-uri flag?

A.To indicate the model artifact location

B.To set the training container

C.To specify the base image for model serving

D.To define the prediction container

AnswerC

Defines the serving environment for predictions.

Why this answer

The --container-image-uri flag in the `gcloud ai models upload` command specifies the custom container image that Vertex AI will use to serve predictions. This is the base image for model serving, not for training, because Vertex AI uses this image to create the serving environment that hosts the model and handles prediction requests.

Exam trap

The trap here is that candidates confuse the --container-image-uri flag with the training container (Option B) because both involve custom containers, but Vertex AI separates training and serving containers, and this flag is exclusively for serving.

How to eliminate wrong answers

Option A is wrong because the model artifact location is specified via the --artifact-uri flag, not --container-image-uri. Option B is wrong because the training container is set during model training (e.g., via `gcloud ai custom-jobs`), not during model upload; --container-image-uri is for serving. Option D is wrong because, although the flag does determine the container used for predictions, this question treats 'prediction container' as a misnomer: the flag sets the base image for the serving container, not the prediction container itself (which is built from this base image).

88

MCQhard

A company deployed a large language model on Vertex AI using the configuration shown in the exhibit. During peak usage, users report high latency. Which change is most likely to improve latency?

A.Remove the accelerator to simplify deployment.

B.Increase minReplicaCount to 3.

C.Switch to a GPU with more memory, such as NVIDIA_TESLA_A100.

D.Change machineType to n1-standard-4 to reduce cost.

AnswerB

More replicas ready at all times reduces cold-start and scaling latency.

Why this answer

Option B is correct because increasing the minimum number of replicas ensures that more instances are ready to serve traffic during bursts, reducing the time spent scaling up. Option C is wrong because the GPU (T4) is already suitable for inference; upgrading may not address the core latency issue. Option D is wrong because switching to a less powerful machine type (n1-standard-4) would likely increase latency. Option A is wrong because removing the accelerator would significantly degrade performance for a large model.

89

MCQeasy

A retail company with a large FAQ database wants to build a generative AI customer service chatbot that can answer questions accurately with up-to-date information. Which business strategy should they prioritize?

A.Use retrieval-augmented generation (RAG) with vector search on the FAQ database.

B.Train a new model from scratch using the FAQ data.

C.Fine-tune a foundational model on the entire FAQ dataset.

D.Use a general-purpose language model without any customization.

AnswerA

RAG retrieves current, relevant information from the database, providing accurate and fresh responses without model retraining.

Why this answer

Option A is correct because retrieval-augmented generation (RAG) with vector search allows the chatbot to dynamically retrieve the most relevant, up-to-date FAQ entries from a large database at inference time, grounding the generative model's responses in verified content without requiring retraining. This approach combines the flexibility of a pre-trained language model with the accuracy of real-time information retrieval, ensuring answers reflect the latest FAQ updates.

Exam trap

Google Cloud often tests the misconception that fine-tuning is the best way to inject domain knowledge, but the trap here is that fine-tuning cannot efficiently handle frequently changing data, whereas RAG provides a modular, update-friendly architecture that avoids retraining costs.

How to eliminate wrong answers

Option B is wrong because training a new model from scratch on FAQ data is computationally prohibitive, requires massive datasets and resources, and still cannot guarantee up-to-date answers without frequent retraining. Option C is wrong because fine-tuning a foundational model on the entire FAQ dataset risks catastrophic forgetting of general language capabilities and does not inherently handle dynamic updates; any FAQ change would require re-fine-tuning. Option D is wrong because a general-purpose language model without customization lacks domain-specific knowledge and cannot access the company's proprietary FAQ database, leading to hallucinated or outdated answers.

90

MCQeasy

A startup with limited budget wants to quickly test a generative AI use case for personalized email marketing. Which approach minimizes time-to-market and cost?

A.Hire a team of AI researchers to build a solution.

B.Develop a custom model from scratch.

C.Fine-tune a large open-source model on internal data.

D.Use a managed API like the PaLM API with prompt engineering.

AnswerD

Quick to implement, pay-per-use, no infrastructure management.

Why this answer

Option D is correct because using a managed API like the PaLM API with prompt engineering eliminates the need for infrastructure setup, model training, and data preparation. This approach leverages a pre-trained model via a simple REST API call, allowing the startup to iterate on prompts and achieve personalized email content in hours rather than weeks, minimizing both time-to-market and cost.

Exam trap

Google Cloud often tests the misconception that fine-tuning (Option C) is always the fastest and cheapest path for customization, but the trap here is that fine-tuning still requires significant compute and data preparation, whereas prompt engineering on a managed API is truly zero-infrastructure and pay-per-use, making it the optimal choice for a quick, low-cost test.

How to eliminate wrong answers

Option A is wrong because hiring a team of AI researchers is expensive and time-consuming, requiring salaries, compute resources, and months of development, which contradicts the limited budget and quick testing goal. Option B is wrong because developing a custom model from scratch demands vast amounts of labeled data, significant GPU/TPU compute, and deep expertise, making it cost-prohibitive and slow for a rapid proof-of-concept. Option C is wrong because fine-tuning a large open-source model still requires substantial compute for training (e.g., GPU hours for LoRA or full fine-tuning), data curation, and deployment overhead, which exceeds the minimal cost and speed constraints of a quick test.

91

MCQmedium

A healthcare company wants to use generative AI to summarize patient records but must comply with HIPAA. Which deployment option should they choose?

A.Use Vertex AI on Google Cloud with data residency

B.Use Google Workspace AI

C.Use an on-premises deployment of open-source model

D.Use a third-party API

AnswerC

Full control over data and compliance.

Why this answer

Option C is correct because an on-premises deployment of an open-source model ensures that all patient data remains within the organization's controlled infrastructure, never leaving the local network. This eliminates any risk of data transmission to external cloud services, which is critical for HIPAA compliance where protected health information (PHI) must be safeguarded against unauthorized access or breaches. On-premises solutions allow the organization to implement its own security controls, encryption, and audit trails without relying on a third-party's compliance posture.

Exam trap

The trap here is that candidates assume cloud providers like Google Cloud or AWS are automatically HIPAA-compliant with data residency, but they overlook the shared responsibility model and the need for a BAA, which still exposes data to the provider's infrastructure and potential third-party risks, making on-premises the only option that guarantees full data control.

How to eliminate wrong answers

Option A is wrong because Vertex AI on Google Cloud, even with data residency, still involves data processing on Google's infrastructure, which requires a Business Associate Agreement (BAA) and may not satisfy all HIPAA requirements if the organization cannot fully control data access or auditing. Option B is wrong because Google Workspace AI is a SaaS offering that processes data on Google's servers, and while it can be HIPAA-compliant with a BAA, it introduces shared responsibility and potential data exposure risks that an on-premises solution avoids. Option D is wrong because using a third-party API means sending PHI to an external service, which requires the third-party to be HIPAA-compliant and sign a BAA, but it still exposes data to network transmission and external processing, increasing the attack surface and compliance burden.

92

MCQhard

A global bank wants to deploy a generative AI assistant for employees across multiple European countries, each with strict data residency laws. Which deployment strategy is most compliant?

A.Deploy separate model instances in each country's cloud region.

B.Use a federated learning approach where data stays on-premises.

C.Deploy a single model in a US region and use data masking.

D.Use a third-party API that processes data outside Europe.

AnswerA

Ensures data never leaves the country, meeting local compliance requirements.

Why this answer

Option A is correct because deploying separate model instances in each country's cloud region ensures that data never crosses national borders, directly complying with strict data residency laws like the GDPR's data localization requirements. This strategy uses regional cloud infrastructure (e.g., AWS eu-central-1, Azure westeurope) to keep both training and inference data within the specific jurisdiction, avoiding any cross-border data transfer.

Exam trap

Google Cloud often tests the misconception that data masking or anonymization alone satisfies data residency laws, but the trap here is that data residency requires the data to physically remain within the jurisdiction, not just be obfuscated.

How to eliminate wrong answers

Option B is wrong because federated learning only keeps training data on-premises, but the model parameters or gradients must still be exchanged with a central server, which can violate data residency if that server is outside the country. Option C is wrong because deploying a single model in a US region and using data masking does not prevent the underlying data from being processed or stored in the US, which violates EU data residency laws like GDPR. Option D is wrong because using a third-party API that processes data outside Europe directly violates data residency requirements, as the data physically leaves the European Economic Area (EEA) without adequate safeguards.

93

MCQeasy

A business wants to build a generative AI application but has limited data science resources. What is the recommended path?

A.Use Vertex AI's AutoML and pre-built APIs to accelerate development

B.Hire a team of ML engineers to develop an in-house solution

C.Purchase a third-party generative AI SaaS product off-the-shelf

D.Build a custom model from scratch using TensorFlow

AnswerA

AutoML abstracts away model building complexity, and APIs provide ready-to-use functionality.

Why this answer

Option A is correct because Vertex AI's AutoML and pre-built APIs lower the barrier to entry for teams without deep machine learning expertise. Option D (building from scratch) requires extensive expertise. Option B (hiring a team) is costly and time-consuming. Option C (buying a SaaS product) may not offer customization.

94

MCQmedium

Refer to the exhibit. A sudden surge of traffic reaches 15,000 requests per second, but the endpoint can only handle 1,000 req/s per replica. What will happen to new requests?

A.They will be processed, and replicas will exceed maxReplicaCount.

B.They will be redirected to a different model.

C.They will receive HTTP 429 (Too Many Requests) errors.

D.They will be queued until capacity becomes available.

AnswerC

Once max replicas are reached, new requests get a 429 status code.

Why this answer

Option C is correct because when a surge of 15,000 requests per second hits an endpoint configured with a maxReplicaCount (e.g., 10 replicas at 1,000 req/s each = 10,000 req/s capacity), any excess requests beyond that capacity are rejected with an HTTP 429 (Too Many Requests) status code. This is standard behavior in autoscaling systems: once the replica count reaches its maximum limit, the service cannot scale further, and new requests are throttled to prevent overload.

Exam trap

The trap here is that candidates assume autoscaling can handle any traffic surge indefinitely, ignoring the hard limit of maxReplicaCount, and thus incorrectly choose Option A or D, failing to recognize that HTTP 429 is the standard throttling mechanism when capacity is exhausted.

How to eliminate wrong answers

Option A is wrong because the maxReplicaCount is a hard upper limit; replicas cannot exceed this configured value, so new requests are not processed beyond that capacity. Option B is wrong because traffic redirection to a different model is not a standard behavior for capacity overflow; it would require explicit routing rules or a load balancer configured for failover, which is not implied in the scenario. Option D is wrong because queuing is not the default behavior for HTTP-based endpoints in this context; while some systems support request queuing (e.g., with message brokers), the exhibit describes a direct endpoint handling, and HTTP 429 is the standard response for rate limiting per RFC 6585.
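
The capacity arithmetic behind the answer can be worked through directly. The value of 10 for maxReplicaCount is the explanation's own example, not a figure taken from the exhibit, and `rejected_rps` is a hypothetical helper name.

```python
def rejected_rps(incoming_rps, per_replica_rps, max_replicas):
    """Requests per second that exceed capacity once scaling has hit its ceiling."""
    capacity = per_replica_rps * max_replicas
    return max(0, incoming_rps - capacity)

# 15,000 req/s arriving, 1,000 req/s per replica, at most 10 replicas (assumed).
excess = rejected_rps(15_000, 1_000, 10)

# A surge that fits within capacity produces no rejections.
no_excess = rejected_rps(8_000, 1_000, 10)
```

With these numbers, 10,000 req/s are served and the remaining 5,000 req/s are the ones that would receive the 429 response.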

95

MCQhard

Refer to the exhibit. A developer receives this error when trying to call a model for prediction. What is the most likely cause?

A.The project has exceeded its prediction quota.

B.The developer's service account lacks the required IAM role.

C.The model version has been deprecated.

D.The model is not deployed on an endpoint.

AnswerB

The 403 error is a standard permission denied response from IAM.

Why this answer

The error when calling a model for prediction most likely stems from the developer's service account lacking the required IAM role. In Vertex AI, invoking a prediction endpoint requires a role that grants prediction access, such as 'aiplatform.user'; without it, the API returns a permission-denied error. This is a common misconfiguration when service accounts are created without explicit roles attached.

Exam trap

Google Cloud often tests the misconception that quota limits are the default cause of prediction errors, but the trap here is that permission-denied errors are more frequently due to missing IAM roles rather than quota exhaustion, especially in multi-service-account environments.

How to eliminate wrong answers

Option A is wrong because exceeding the prediction quota would return a '429 RESOURCE_EXHAUSTED' or 'Quota exceeded' error, not a generic permission-denied error. Option C is wrong because a deprecated model version would still be accessible for predictions until it is deleted, and the error would typically indicate 'Model version not found' rather than an authorization failure. Option D is wrong because if the model is not deployed on an endpoint, the error would be 'Model not deployed' or 'Endpoint not found', not a permission error.

96

MCQhard

A company is using generative AI for code generation and wants to evaluate the quality of generated code for security vulnerabilities. Which metric is most appropriate?

A.BLEU score

B.Automatic static analysis

C.Human evaluation

D.Perplexity

AnswerB

Scans for common vulnerabilities efficiently.

Why this answer

Option B is correct because automatic static analysis can scan generated code for known vulnerability patterns efficiently and repeatably. Option A (BLEU score) measures text similarity to a reference, not security. Option C (human evaluation) is subjective and expensive.

Option D (perplexity) measures language model confidence, not code security.
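To make the idea of static analysis concrete, here is a toy sketch that parses generated Python code and flags calls commonly treated as risky. The rule set `RISKY_CALLS` and the function `find_risky_calls` are hypothetical; production scanners apply far larger rule sets, so treat this only as an illustration of the principle.

```python
import ast

# Hypothetical, deliberately tiny rule set for illustration.
RISKY_CALLS = {"eval", "exec"}


def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each risky call in the source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only direct calls by name, such as eval(...), are checked here.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_CALLS:
                findings.append((node.lineno, node.func.id))
    return sorted(findings)


generated = "x = input()\nresult = eval(x)\nprint(result)\n"
print(find_risky_calls(generated))  # [(2, 'eval')]
```

Because the check never executes the generated code, it can run automatically on every model output, which is what makes it cheaper than human review.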

97

Multi-Selecteasy

A company is adopting generative AI for customer support. Which TWO strategies should they implement to manage risks related to brand reputation?

Select 2 answers

A.Establish a human-in-the-loop escalation process for sensitive interactions.

B.Publish a disclaimer that the AI may make mistakes.

C.Implement automated monitoring for toxic or off-brand language.

D.Deploy the model without any content filters to maximize helpfulness.

E.Disable customer support AI entirely to avoid any risk.

AnswersA, C

Human oversight ensures appropriate handling of sensitive issues.

Why this answer

Option A is correct because a human-in-the-loop escalation process ensures that sensitive or ambiguous customer interactions are reviewed by a human agent before an AI-generated response is sent. This directly mitigates brand reputation risk by preventing the AI from inadvertently making offensive, legally problematic, or factually incorrect statements that could go viral. The human reviewer acts as a safety net, catching edge cases that automated filters might miss, such as nuanced sarcasm or cultural insensitivity.

Exam trap

Google Cloud often tests the distinction between passive risk communication (like disclaimers) and active risk mitigation (like human-in-the-loop or automated monitoring), trapping candidates who think a disclaimer is sufficient to manage brand reputation risk.

98

Multi-Selecteasy

A company is using Vertex AI generative models for a high-volume text summarization service. Which two strategies can reduce operational costs?

Select 2 answers

A.Increase the model's max output tokens to 2048.

B.Implement retry logic with exponential backoff.

C.Lower the temperature parameter to 0.

D.Use batch prediction instead of online prediction.

E.Reduce the size of the model (e.g., switch from text-bison@002 to text-bison-light).

AnswersD, E

Batch prediction has lower per-request cost for large jobs compared to online prediction.

Why this answer

Batch prediction reduces costs by processing multiple requests in a single batch job, which avoids the per-request overhead and idle compute time associated with online prediction. This is especially cost-effective for high-volume, non-real-time workloads like text summarization, as you pay only for the compute time used during the batch job rather than for each individual inference.

Exam trap

Google Cloud often tests the misconception that adjusting inference parameters like temperature or output length can reduce costs, when in reality only reducing model size or switching to batch processing directly lowers operational expenses.

99

MCQeasy

A retail company wants to deploy a generative AI chatbot to assist customers with product recommendations. The chatbot must align with the company's brand voice and provide accurate, up-to-date information. Which strategy should the company prioritize when developing this solution?

A.Ground the model with proprietary product data and brand guidelines in a retrieval-augmented generation (RAG) architecture.

B.Use a generic pre-trained model without customization to reduce development time.

C.Deploy a large language model with a feedback loop to iteratively improve responses.

D.Train the model on public customer reviews to capture common preferences.

AnswerA

RAG with curated data ensures responses are accurate, up-to-date, and on-brand.

Why this answer

Option A is correct because retrieval-augmented generation (RAG) allows the chatbot to ground its responses in the company's proprietary product data and brand guidelines, ensuring factual accuracy and brand consistency. By retrieving relevant information from a curated knowledge base at inference time, the model can provide up-to-date recommendations without requiring retraining, which is critical for a retail environment with frequently changing inventory.

Exam trap

Google Cloud often tests the distinction between fine-tuning and RAG, where candidates mistakenly believe that fine-tuning on historical data is sufficient for real-time accuracy, but the trap here is that only RAG can provide up-to-date grounding without retraining.

How to eliminate wrong answers

Option B is wrong because using a generic pre-trained model without customization will produce responses that lack the company's specific brand voice and may hallucinate product details, leading to inaccurate recommendations. Option C is wrong because deploying a large language model with only a feedback loop does not address the need for accurate, up-to-date information; feedback loops improve responses over time but do not ground the model in proprietary data, so initial outputs can still be incorrect. Option D is wrong because training on public customer reviews introduces noise, bias, and outdated opinions, and does not align with the company's brand guidelines or provide accurate product information.

100

Multi-Selecthard

Which THREE factors should be considered when choosing between a fine-tuned model and a prompted foundation model for a generative AI solution? (Select 3)

Select 3 answers

A.Need for domain-specific vocabulary

B.Inference latency requirements

C.Size of training data available

D.Whether the model is open-source

E.Token cost per request

AnswersA, C, E

Fine-tuning can incorporate domain language.

Why this answer

Option A is correct because fine-tuning allows the model to learn domain-specific vocabulary and terminology that may not be well-represented in the foundation model's pre-training data. This is critical for specialized fields like legal, medical, or technical domains where precise language is required for accurate outputs.

Exam trap

Google Cloud often tests the misconception that inference latency is a deciding factor between fine-tuning and prompting, when in reality both can be optimized for speed, and the key differentiators are data availability, domain specificity, and cost per token.

101

MCQeasy

A large e-commerce company is experiencing high costs for their generative AI product recommendation system. The system generates personalized product descriptions for millions of users daily. The team wants to reduce cost while maintaining quality. They are using a fine-tuned version of a large foundation model hosted on Vertex AI. The current cost is driven by the number of tokens processed. Which approach should they take?

A.Optimize prompts to generate shorter, more concise descriptions

B.Switch to a larger, more capable foundation model

C.Retrain the model with more product data to improve efficiency

D.Increase the batch size of inference requests

AnswerA

Shorter outputs use fewer tokens, reducing cost.

Why this answer

Option A is correct because prompt engineering to reduce output length decreases token usage per request, directly lowering cost without model changes. Option B (switching to a larger model) increases cost. Option C (retraining with more product data) does not reduce the number of tokens processed at inference time.

Option D (increasing batch size) may improve throughput but does not reduce the tokens billed per request.
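The effect of shorter outputs can be checked with simple arithmetic. The sketch below assumes token-based pricing with separate input and output rates; the prices, token counts, and the `daily_cost` helper are hypothetical values chosen for illustration, not published rates.

```python
def daily_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
               price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimate daily spend when billing is proportional to tokens processed."""
    per_request = (input_tokens / 1000) * price_in_per_1k \
        + (output_tokens / 1000) * price_out_per_1k
    return requests_per_day * per_request


# Hypothetical workload: one million descriptions per day.
before = daily_cost(1_000_000, 200, 300, 0.001, 0.002)
after = daily_cost(1_000_000, 200, 150, 0.001, 0.002)  # output length halved
print(f"before={before:.2f} after={after:.2f}")
```

Under these assumed numbers, halving the output length cuts the daily bill without touching the model, which is why option A targets the actual cost driver.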

102

MCQmedium

A company deployed a generative AI chatbot using Vertex AI PaLM API for customer support. Users report high latency (average 5 seconds per response). They need to reduce latency without significantly affecting response quality. Which design change should they prioritize?

A.Apply model quantization to the deployed model

B.Migrate the chatbot to run on edge devices

C.Increase the batch size of inference requests

D.Switch to a larger, more powerful foundation model

AnswerA

Quantization reduces model size and speeds inference with minor accuracy trade-offs.

Why this answer

Model quantization reduces the precision of the model's weights (e.g., from FP32 to INT8), which decreases the computational load and memory footprint during inference. This directly lowers latency per request on the Vertex AI PaLM API while preserving most of the model's accuracy, making it the most effective single change for reducing response time without significantly degrading quality.
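As an illustration of what reducing precision means, the following sketch applies symmetric 8-bit quantization to a short list of weights in plain Python. The helper names are hypothetical, and real frameworks quantize whole tensors with calibrated scales; this only shows the rounding step and the small error it introduces.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight differs from the original by at most half a quantization step.
print(q, [round(r, 4) for r in restored])
```

Each weight now fits in one byte instead of four, and the bounded rounding error is the "minor accuracy trade-off" the answer refers to.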

Exam trap

Google Cloud often tests the misconception that increasing computational power (larger model) or batching always improves latency, when in fact these changes can increase per-request delay or degrade quality in interactive applications.

How to eliminate wrong answers

Option B is wrong because migrating to edge devices introduces network latency and limited compute resources, which often increases overall latency and reduces response quality for a cloud-based PaLM API chatbot. Option C is wrong because increasing batch size improves throughput for bulk processing but does not reduce per-request latency; in fact, it can increase the time to first token for individual requests. Option D is wrong because switching to a larger, more powerful foundation model increases computational requirements and inference time, directly worsening latency rather than reducing it.

103

Multi-Selecteasy

A company is choosing a generative AI model for code generation. Which TWO considerations are most important?

Select 2 answers

A.The total number of model parameters

B.Whether the model's training data includes the target programming languages

C.The open-source license of the model

D.The maximum context length supported by the model

E.The latency of the model's inference endpoint

AnswersB, D

If the model hasn't seen the language, it will generate poor code.

Why this answer

Option B is correct because a generative AI model for code generation must have been trained on the target programming languages to produce syntactically and semantically correct code. Without such training data, the model cannot understand language-specific syntax, libraries, or idioms, leading to irrelevant or erroneous outputs.

Exam trap

The trap here is that candidates often assume more parameters (A) or lower latency (E) are always better, but the exam tests the understanding that domain-specific training data relevance (B) and context length (D) are critical for code generation accuracy and handling long code sequences.

104

MCQhard

A large insurance company is using generative AI to automate claims processing. They have deployed a custom fine-tuned model on Vertex AI that reads claim documents and extracts key information. Recently, they noticed that the model’s performance degrades over time for certain claim types, leading to incorrect payouts. The team needs to detect and address model drift with minimal manual intervention. They have a data pipeline that captures incoming claims and user feedback on predictions. Which approach should they take?

A.Implement a human review process for all claims the model processes

B.Set up continuous evaluation with automated retraining pipelines based on performance metrics

C.Switch to a simpler rule-based system to avoid drift

D.Manually retrain the model monthly using a snapshot of recent claims

AnswerB

Automates drift detection and model updates with minimal manual intervention.

Why this answer

Option B is correct because it establishes a closed-loop MLOps pipeline where continuous evaluation of performance metrics (e.g., precision, recall, or F1-score on streaming data) triggers automated retraining when drift is detected. This minimizes manual intervention while ensuring the model adapts to distribution shifts in claim types, which is critical for maintaining accurate payouts in production.
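The closed loop described above can be sketched as a rolling metric with a retraining trigger. This is a simplified illustration: `DriftMonitor` is a hypothetical class, it tracks plain accuracy from user feedback rather than precision, recall, or F1, and a production pipeline would launch a retraining job instead of returning a flag.

```python
from collections import deque


class DriftMonitor:
    """Track recent prediction outcomes and signal when retraining is needed."""

    def __init__(self, window: int = 100, min_accuracy: float = 0.9):
        self.outcomes = deque(maxlen=window)  # keeps only the latest `window` results
        self.min_accuracy = min_accuracy

    def record(self, correct: bool) -> bool:
        """Record one feedback label; return True when retraining should start."""
        self.outcomes.append(correct)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return accuracy < self.min_accuracy


monitor = DriftMonitor(window=4, min_accuracy=0.75)
for label in [True, True, True, True, False, False]:
    if monitor.record(label):
        print("accuracy below threshold: trigger retraining pipeline")
```

Running one monitor per claim type would surface the case in the question, where performance degrades only for certain categories.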

Exam trap

Google Cloud often tests the misconception that periodic manual retraining (Option D) is sufficient, but the trap here is that it ignores the need for real-time drift detection and automated response, which is essential for production systems handling high-stakes financial decisions.

How to eliminate wrong answers

Option A is wrong because implementing human review for all claims defeats the purpose of automation and introduces significant operational cost and latency, failing the requirement for minimal manual intervention. Option C is wrong because switching to a simpler rule-based system cannot handle the complexity and variability of claim documents, and it will still suffer from drift as claim patterns evolve over time. Option D is wrong because manually retraining monthly on a snapshot ignores real-time drift detection and may miss sudden shifts between retraining cycles, leading to prolonged periods of degraded performance.

105

MCQmedium

A media company uses generative AI to produce personalized news summaries for subscribers. They notice that the summaries sometimes contain factual inaccuracies, leading to customer complaints. The team needs to improve accuracy without slowing down the generation speed. They are using a pre-trained model via Vertex AI. What strategy should they implement?

A.Switch to a larger, more accurate foundation model

B.Fine-tune the model on a dataset of verified news articles

C.Implement retrieval-augmented generation (RAG) with a trusted knowledge base

D.Add a human-in-the-loop review for every summary

AnswerC

RAG provides factual grounding without sacrificing speed.

Why this answer

Option C is correct because retrieval-augmented generation (RAG) grounds the model's output in a trusted, external knowledge base, allowing it to retrieve verified facts in real time without retraining. This directly addresses factual inaccuracies while maintaining generation speed, as the pre-trained model remains unchanged and only the retrieval step is added. RAG avoids the latency of human review and the computational cost of fine-tuning or switching models.
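The retrieval step can be sketched in a few lines. The example below is a toy: it ranks documents by shared words instead of embedding similarity, the `retrieve` and `build_prompt` helpers are hypothetical, and the resulting prompt would still need to be sent to the pre-trained model.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by the number of words they share with the query."""
    query_terms = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Prepend retrieved passages so the model answers from trusted text."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


knowledge_base = [
    "The city council approved the budget on Monday",
    "The football final ends in a draw",
    "Rain is expected this weekend",
]
print(build_prompt("when was the budget approved", knowledge_base, k=1))
```

The model itself is unchanged; only the prompt grows by the retrieved passages, which is why this approach adds a lookup rather than a retraining cycle.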

Exam trap

Google Cloud often tests the misconception that fine-tuning is the default solution for accuracy issues, but the trap here is that RAG provides a faster, more scalable way to ground outputs in verified data without retraining, which is critical when speed and accuracy must both be maintained.

How to eliminate wrong answers

Option A is wrong because switching to a larger foundation model would increase inference latency and computational cost, contradicting the requirement to not slow down generation speed, and it does not guarantee improved factual accuracy without additional grounding. Option B is wrong because fine-tuning on a dataset of verified news articles requires significant time, data, and compute resources, and it may not prevent hallucinations on unseen topics, while also risking catastrophic forgetting of the model's general capabilities. Option D is wrong because adding a human-in-the-loop review for every summary introduces unacceptable latency and operational overhead, making it impractical for real-time personalized news generation at scale.

106

MCQhard

A large enterprise wants to deploy multiple generative AI models across different business units while ensuring cost governance and usage tracking. Which Google Cloud solution is best suited?

A.Use Vertex AI Endpoint with monitoring

B.Deploy each model in a separate project with IAM policies

C.Implement a custom cost allocation using labels

D.Use Cloud Billing budgets and alerts per model

AnswerB

Projects isolate resources and costs per business unit.

Why this answer

Option B is correct because deploying each model in a separate Google Cloud project with IAM policies provides the strongest isolation for cost governance and usage tracking. This approach ensures that each business unit's model usage is billed to its own project, enabling granular cost allocation and independent monitoring without cross-project interference. It also allows per-project budget alerts and usage quotas, directly addressing the enterprise's need for decentralized cost control.

Exam trap

The trap here is that candidates often confuse cost allocation mechanisms (like labels or budgets) with true cost isolation, assuming that tagging or alerting alone can enforce per-business-unit governance without the structural separation that separate projects provide.

How to eliminate wrong answers

Option A is wrong because Vertex AI Endpoint with monitoring tracks model performance and latency but does not inherently isolate costs per business unit; it aggregates usage under a single project, making per-unit cost governance difficult. Option C is wrong because custom cost allocation using labels requires manual tagging and can be inconsistent or incomplete, leading to inaccurate cost tracking; labels are metadata, not a billing boundary. Option D is wrong because Cloud Billing budgets and alerts per model are not natively supported; budgets apply at the project or billing account level, not per model, and cannot enforce cost isolation across multiple business units.

107

Multi-Selectmedium

A business leader is developing a gen AI strategy. Which three key components should be included in the strategy?

Select 3 answers

A.Focus solely on technology

B.Plan for responsible AI

C.Establish data governance policies

D.Define clear use cases with ROI

E.Involve stakeholders across departments

AnswersB, C, D

Responsible AI addresses fairness, transparency, and accountability.

Why this answer

Option B is correct because responsible AI is a foundational component of any generative AI strategy, ensuring ethical use, bias mitigation, and compliance with emerging regulations. Without a plan for responsible AI, the organization risks reputational damage, legal liability, and deployment failures due to lack of trust. This goes beyond simple fairness checklists to include continuous monitoring of model outputs for toxicity, hallucination, and privacy violations.

Exam trap

Google Cloud often tests the misconception that stakeholder involvement is a core strategic component, when in fact it is an implementation enabler, while responsible AI, data governance, and defined use cases with ROI are the three pillars that form the strategy itself.

108

MCQeasy

A retail company plans to use Vertex AI's generative AI to create product descriptions. They need to ensure descriptions are factually accurate and do not misrepresent products. Which strategy should they prioritize?

A.Implement human-in-the-loop review

B.Use prompt engineering

C.Use a larger model

D.Increase temperature parameter

AnswerA

Humans can verify and correct factual errors.

Why this answer

Human-in-the-loop (HITL) review is the correct strategy because it directly addresses the need for factual accuracy and prevention of misrepresentation. While generative AI can produce fluent text, it lacks a reliable grounding mechanism for product-specific facts, making human oversight essential to catch hallucinations, verify claims, and ensure compliance with advertising standards. This approach aligns with responsible AI practices and is a core recommendation for high-stakes content generation.

Exam trap

Google Cloud often tests the misconception that prompt engineering or model size alone can solve factual accuracy issues, when in reality, generative AI's inherent lack of ground truth makes human validation indispensable for high-stakes content.

How to eliminate wrong answers

Option B is wrong because prompt engineering, while useful for guiding output style and structure, does not guarantee factual accuracy; it cannot prevent the model from generating plausible-sounding but incorrect product details. Option C is wrong because using a larger model may improve fluency and reduce some errors, but it does not eliminate hallucinations or misrepresentations, and can even introduce more subtle inaccuracies. Option D is wrong because increasing the temperature parameter makes the model's output more random and creative, which increases the risk of generating factually incorrect or misleading descriptions, the opposite of what is needed.

109

MCQeasy

A retail company wants to use gen AI for customer service chatbots. They have a large volume of customer interactions. What is the primary business consideration for deploying a gen AI solution?

A.Minimizing latency at any cost

B.Using open-source models only

C.Choosing the most complex model

D.Ensuring data privacy and compliance

AnswerD

Data privacy and regulatory compliance are top business considerations for handling customer data.

Why this answer

Option D is correct because ensuring data privacy and compliance is critical when handling customer data. Option A is wrong because minimizing latency at any cost can be too expensive. Option B is wrong because open-source models may not meet all requirements.

Option C is wrong because complexity doesn't guarantee success.

110

MCQhard

A global financial services firm wants to deploy generative AI for personalized investment recommendations. They must comply with regulations in multiple jurisdictions, including GDPR and the SEC's Marketing Rule. The solution must also be auditable. Which approach best balances regulatory compliance, scalability, and cost?

A.Build a centralized model in a cloud region with the most stringent regulations and apply it globally.

B.Use a single global model with a unified compliance layer applied post-generation.

C.Deploy separate, jurisdiction-specific models with tailored guardrails and audit trails for each region.

D.Rely on a third-party API with built-in compliance for all regions.

AnswerC

This ensures compliance with local regulations and provides auditable logs.

Why this answer

Option C is correct because deploying separate, jurisdiction-specific models allows each model to be trained and governed with guardrails and audit trails that directly map to local regulations like GDPR (data minimization, right to erasure) and the SEC Marketing Rule (fair, clear, and not misleading disclosures). This approach avoids the compliance conflicts that arise when a single model must satisfy contradictory requirements across regions, and it scales cost-effectively by only applying the necessary compliance overhead to each region's data and inference pipeline.

Exam trap

Google Cloud often tests the misconception that a single global model with a post-generation compliance layer is sufficient, but the trap is that post-generation filtering cannot undo model outputs that already violate local regulations, and it fails to provide the granular audit trails required for each jurisdiction's specific rules.

How to eliminate wrong answers

Option A is wrong because building a centralized model in the most stringent region and applying it globally would force all jurisdictions to comply with that region's rules, potentially conflicting with local laws (e.g., GDPR's restrictions on cross-border data transfers) and increasing latency and cost for regions with less strict regulations. Option B is wrong because a single global model with a unified compliance layer applied post-generation cannot retroactively fix model outputs that violate jurisdiction-specific rules (e.g., the SEC Marketing Rule's prohibition of misleading statements), and it creates an audit trail that is difficult to map to individual regulatory frameworks. Option D is wrong because relying on a third-party API with built-in compliance for all regions assumes a one-size-fits-all solution that rarely exists; third-party APIs often lack granular control over jurisdiction-specific guardrails and audit logging, and they introduce vendor lock-in and data sovereignty risks.

111

MCQmedium

A company wants to deploy a generative AI chatbot for customer service but is concerned about cost unpredictability due to variable usage. Which pricing model should they choose to best manage costs?

A.Committed use discounts

B.Free tier

C.Pay-as-you-go

D.Provisioned throughput

AnswerD

Provides fixed capacity with predictable monthly cost.

Why this answer

Option D is correct because provisioned throughput provides fixed capacity with a predictable monthly cost, ideal for managing cost uncertainty. Option A (committed use discounts) requires a commitment, and costs still vary if usage exceeds it. Option B (free tier) is too limited.

Option C (pay-as-you-go) varies directly with usage.
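The predictability argument can be illustrated by comparing the two billing shapes across a quiet month and a busy month. All figures below are hypothetical, as are the helper name and the flat provisioned fee; the point is only that one bill moves with usage and the other does not.

```python
def pay_as_you_go_cost(requests: int, tokens_per_request: int,
                       price_per_1k_tokens: float) -> float:
    """Monthly bill when every token is charged at a fixed unit price."""
    return requests * tokens_per_request / 1000 * price_per_1k_tokens


PROVISIONED_MONTHLY_FEE = 600.0  # hypothetical flat fee for reserved capacity

for label, requests in [("quiet month", 100_000), ("busy month", 1_000_000)]:
    variable = pay_as_you_go_cost(requests, 500, 0.002)
    print(f"{label}: pay-as-you-go={variable:.2f} provisioned={PROVISIONED_MONTHLY_FEE:.2f}")
```

Under these assumptions the pay-as-you-go bill swings tenfold between the two months while the provisioned fee stays constant, at the price of overpaying when traffic is low.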

112

Multi-Selecteasy

Which TWO are key business considerations when adopting generative AI solutions?

Select 2 answers

A.Training duration on public datasets

B.Number of model parameters

C.Data privacy and compliance requirements

D.Model accuracy on benchmarks

E.Cost of inference per request

AnswersC, E

Privacy and compliance are critical business and legal considerations.

Why this answer

Data privacy and compliance requirements (Option C) are a key business consideration because generative AI models often process sensitive or proprietary data, and regulations like GDPR, HIPAA, or CCPA mandate strict controls on data handling, storage, and model training. Failure to address these can result in legal penalties, reputational damage, and loss of customer trust, making it a top priority for enterprise adoption.

Exam trap

Google Cloud often tests the distinction between technical metrics (like training duration, parameter count, and benchmark accuracy) and true business considerations (like compliance, cost, and scalability), leading candidates to confuse model performance indicators with strategic business drivers.

113

MCQeasy

A retail company wants to integrate generative AI into its customer service chatbot to handle routine inquiries. They have a limited budget and want to launch quickly. Which strategy is most appropriate?

A.Partner with a generative AI vendor for a custom solution

B.Use pre-trained models via Google Cloud's Generative AI Studio API

C.Fine-tune an open-source model on their customer service logs

D.Build a custom LLM from scratch using the company's own data

AnswerB

Using pre-trained models via API is cost-effective and fast to implement.

Why this answer

Option B is correct because using pre-trained models via Google Cloud's Generative AI Studio API allows the company to leverage existing, powerful models without the high cost and time investment of custom development or fine-tuning. This approach enables rapid deployment on a limited budget by simply integrating the API into their chatbot, handling routine inquiries effectively without requiring extensive machine learning expertise or infrastructure.

Exam trap

Google Cloud often tests the misconception that fine-tuning or custom models are always better for domain-specific tasks, but the trap here is that for routine inquiries with limited budget and time, pre-trained APIs offer the fastest and most cost-effective solution without sacrificing quality.

How to eliminate wrong answers

Option A is wrong because partnering with a generative AI vendor for a custom solution typically involves significant upfront costs, long development cycles, and vendor lock-in, which contradicts the company's limited budget and need for quick launch. Option C is wrong because fine-tuning an open-source model on customer service logs requires substantial computational resources, data preparation, and machine learning expertise, making it slower and more expensive than using a pre-trained API. Option D is wrong because building a custom LLM from scratch is extremely resource-intensive, requiring massive datasets, specialized hardware, and months of training, which is impractical for a company with limited budget and a need for speed.

114

MCQeasy

An e-commerce company is using a generative AI model to recommend products. They notice that the recommendations are often irrelevant. What is the most likely cause?

A.Using an outdated model version

B.Incorrect regional endpoint configuration

C.Inadequate prompt engineering

D.Overfitting on training data

AnswerC

The model's output quality heavily depends on the prompt; poor prompts lead to irrelevant responses.

Why this answer

Inadequate prompt engineering is the most likely cause because generative AI models rely heavily on the quality and specificity of the input prompt to produce relevant outputs. If the prompts used to generate product recommendations are vague, poorly structured, or lack context (e.g., not including user preferences or historical behavior), the model will return generic or irrelevant suggestions. This is a common failure point in recommendation systems where the prompt acts as the primary interface for steering model behavior.

Exam trap

Google Cloud often tests the misconception that model performance issues are always due to training data or model version problems, when in fact prompt engineering is the most immediate and common cause of output irrelevance in generative AI systems.

How to eliminate wrong answers

Option A is wrong because using an outdated model version may affect performance or feature availability, but it does not directly cause irrelevant recommendations; the model would still generate outputs consistent with its training, and relevance is more tied to prompt quality. Option B is wrong because incorrect regional endpoint configuration would cause connectivity or latency issues (e.g., API timeouts or routing errors), not irrelevant content generation; the model's output relevance is independent of the endpoint's geographic location. Option D is wrong because overfitting on training data would cause the model to memorize specific patterns and perform poorly on new or diverse inputs, but in a recommendation context, overfitting typically leads to overly narrow or repetitive suggestions, not broadly irrelevant ones; the primary issue with irrelevant outputs is prompt misalignment, not training data memorization.

115

MCQeasy

A company is choosing between Google's Gemini API and an open-source model. Which factor is most important for a business with limited ML expertise?

A.Ease of integration and availability of support

B.Model parameter count

C.Cost per token

D.Community size

AnswerA

Limited ML expertise means the team needs a solution that is easy to integrate and comes with reliable support.

Why this answer

Option A is correct because ease of integration and support reduces the need for in-house ML expertise. Option B (parameter count) is not relevant to ease of use. Option C (cost per token) is important but secondary to feasibility.

Option D (community size) is helpful but not as critical as managed support.

116

Multi-Selecthard

A financial services firm must comply with regulations when using gen AI. Which two measures are critical?

Select 2 answers

A.Implement audit trails

B.Deploy without risk assessment

C.Use a closed-source model

D.Use explainable AI

E.Use only synthetic data

AnswersA, D

Audit trails provide accountability and support regulatory reviews.

Why this answer

Audit trails are critical for compliance because they provide a tamper-evident, chronological record of all AI model inputs, outputs, and decisions. This enables firms to demonstrate regulatory adherence (e.g., under GDPR or SOX) by reconstructing the exact sequence of events that led to a specific AI-generated output, which is essential for accountability and forensic review.
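One common way to make a log tamper-evident is to chain entries by hash, so that altering any earlier record invalidates everything after it. The sketch below is a minimal illustration with hypothetical helper names; a regulated deployment would also need secure storage, access control, and retention policies.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list[dict], record: dict) -> None:
    """Append a record whose hash covers both its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({"prev": prev_hash, "record": record, "hash": _digest(prev_hash, record)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash in order; any edited record breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list[dict] = []
append_entry(audit_log, {"input": "loan query", "output": "summary A"})
append_entry(audit_log, {"input": "risk query", "output": "summary B"})
print(verify(audit_log))  # True
```

Recording each model input and output this way lets a reviewer reconstruct the sequence of events and detect after-the-fact edits.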

Exam trap

Google Cloud often tests the misconception that 'closed-source models are inherently more compliant' or that 'synthetic data eliminates privacy risks,' when in reality, compliance hinges on transparency, auditability, and risk assessment rather than the model's source or data origin.

117

MCQmedium

A healthcare organization wants to use generative AI for medical report summaries. What is the primary concern?

A.Ensuring HIPAA compliance and data security when using cloud AI services

B.The model's ability to generate fluent and coherent summaries

C.Minimizing the cost of each API call to stay within budget

D.Latency of responses for real-time use cases

AnswerA

Generative AI models processing PHI must be HIPAA-compliant, requiring a signed Business Associate Agreement (BAA) with Google Cloud.

Why this answer

The primary concern for a healthcare organization using generative AI for medical report summaries is ensuring HIPAA compliance and data security when using cloud AI services. Medical data is protected health information (PHI), and any cloud-based AI service must have a Business Associate Agreement (BAA) in place and enforce encryption at rest and in transit to avoid regulatory penalties and data breaches.

Exam trap

Google Cloud often tests the misconception that technical performance (fluency, cost, latency) is the top priority, when in regulated industries like healthcare, compliance and data security are the non-negotiable primary concerns.

How to eliminate wrong answers

Option B is wrong because while fluency and coherence are important for summary quality, they are secondary to the legal and security obligations of handling PHI; a fluent summary that leaks data is non-compliant. Option C is wrong because cost minimization is an operational concern, not the primary risk; HIPAA violations carry fines up to $50,000 per violation, far outweighing API call costs. Option D is wrong because latency is a performance metric relevant for real-time use, but medical report summarization is typically asynchronous or batch-processed, and compliance takes precedence over speed.

118

Multi-Selecteasy

Which THREE are essential components of a responsible AI strategy for GenAI? (Select three.)

Select 3 answers

A.Use of only open-source models

B.Maximum model size

C.Human oversight for critical decisions

D.Model transparency and explainability

E.Bias detection and mitigation

AnswersC, D, E

Human oversight prevents harmful automated decisions and ensures ethical use.

Why this answer

Human oversight for critical decisions (C) is essential because GenAI models can produce plausible but incorrect or harmful outputs. A responsible AI strategy mandates that a human-in-the-loop reviews high-stakes outputs, such as medical diagnoses or financial approvals, to prevent automated errors from causing real-world harm. This aligns with the principle of human accountability in AI governance frameworks like the NIST AI Risk Management Framework.

Exam trap

Google Cloud often tests the misconception that technical attributes like model size or open-source licensing are core to responsible AI, when in fact the focus is on governance practices like transparency, bias mitigation, and human oversight.

119

MCQhard

You are the Generative AI lead for a global retail company that is building a customer service chatbot using a large language model (LLM) on Vertex AI. The chatbot will handle order inquiries, returns, and product recommendations. The company has a multi-cloud strategy and uses Google Cloud for AI workloads, but customer data is stored in AWS DynamoDB and on-premises databases. The legal team mandates that no customer personally identifiable information (PII) is sent to the LLM for training or inference, and that the model's responses must comply with GDPR and CCPA. The engineering team has proposed using a fine-tuned version of Gemini with retrieval-augmented generation (RAG) from a vector database. During a pilot, the chatbot occasionally hallucinates and invents order details, and response latency is over 10 seconds for complex queries. The budget for this project is limited, and the team needs to balance cost, compliance, and performance. Which course of action should you recommend?

A.Implement a two-model architecture: a smaller model for simple queries and a larger model for complex queries, with a router based on query complexity.

B.Switch to a purely fine-tuned model without RAG, and rely on fine-tuning data that excludes PII to ensure compliance.

C.Use a larger, more powerful LLM with chain-of-thought prompting to improve reasoning and reduce hallucinations, and cache frequent queries to reduce latency.

D.Ground the model with a curated knowledge base from DynamoDB and on-premises data, and use prompt engineering to explicitly instruct the model not to generate PII. Implement a PII detection and redaction layer before sending queries to the LLM.

AnswerD

Grounding reduces hallucinations by restricting responses to verified data, and prompt engineering with PII detection ensures compliance without significant latency increase or budget overrun.

Why this answer

Option D is correct because grounding the model in a curated knowledge base and screening for PII before inference directly addresses hallucinations and compliance without high cost or added latency. Option A is too complex and expensive for a limited budget. Option C increases latency further because chain-of-thought prompting on a larger model adds reasoning steps to every response. Option B removes the RAG capability, which increases hallucination risk.
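
The redaction layer and grounding instruction described in option D can be sketched roughly as follows. This is an illustration only: the regular expressions are deliberately crude and will both miss and over-match real data, and `PII_PATTERNS`, `redact_pii`, and `build_grounded_prompt` are hypothetical names, not part of Vertex AI.

```python
import re

# Hypothetical patterns for illustration; real deployments need broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_pii(text):
    """Replace each matched span with a typed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def build_grounded_prompt(question, retrieved_facts):
    """Assemble a prompt that confines the model to redacted, retrieved facts."""
    context = "\n".join(f"- {redact_pii(fact)}" for fact in retrieved_facts)
    return (
        "Answer using only the facts below. If the answer is not there, say so.\n"
        f"Facts:\n{context}\n"
        f"Question: {redact_pii(question)}"
    )
```

Redacting both the user question and the retrieved records means nothing reaches the model unscreened, whichever store the record came from.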

120

MCQhard

Refer to the exhibit. This JSON describes a Vertex AI endpoint with a deployed model. Which statement about scaling is true?

A.The endpoint uses only dedicated resources, no automatic scaling

B.The endpoint will automatically scale based on GPU utilization

C.The endpoint will scale from 1 to 3 replicas based on load using automatic scaling

D.The endpoint can scale to zero when not in use

AnswerA

DedicatedResources with min/max replicas means manual scaling.

Why this answer

Option A is correct because the JSON shows that the endpoint is configured with `dedicatedResources` and no `autoscalingMetricSpecs` or `minReplicaCount`/`maxReplicaCount` fields. In Vertex AI, when you specify only `machineSpec` and a fixed `minReplicaCount` (here implicitly 1) without a `maxReplicaCount` or autoscaling metrics, the endpoint uses dedicated resources with no automatic scaling — the model will always run on exactly the number of replicas you define, regardless of load.

Exam trap

Google Cloud often tests the misconception that any endpoint with a `minReplicaCount` and `maxReplicaCount` automatically enables scaling, but the trap here is that without `autoscalingMetricSpecs`, the endpoint uses dedicated resources and does not scale dynamically — the `maxReplicaCount` is ignored if autoscaling metrics are absent.

How to eliminate wrong answers

Option B is wrong because Vertex AI automatic scaling is based on CPU utilization or custom metrics, not GPU utilization; GPU utilization is not a supported metric for autoscaling in Vertex AI endpoints. Option C is wrong because the JSON does not include `autoscalingMetricSpecs` or a `maxReplicaCount` field, which are required to enable automatic scaling from a minimum to a maximum number of replicas; without these, the endpoint uses a fixed replica count. Option D is wrong because Vertex AI endpoints with dedicated resources cannot scale to zero; scaling to zero is only possible with private endpoints using manual scaling or when using Vertex AI Prediction with a custom container that supports scale-to-zero, but dedicated resources always maintain at least one replica.

121

MCQmedium

A company's generative AI model is producing biased outputs. What is the most effective mitigation strategy?

A.Use a larger model with more parameters to improve overall accuracy

B.Fine-tune the model using a balanced, representative dataset and implement output filtering

C.Use prompt engineering to instruct the model to avoid biased language

D.Increase the diversity of input samples by random sampling

AnswerB

Balanced data reduces bias during training, and filters catch biased outputs in production.

Why this answer

Fine-tuning on a balanced, representative dataset directly addresses the root cause of biased outputs by correcting the model's learned associations, while output filtering provides a safety net to catch residual bias. This combination is more effective than superficial fixes because it modifies the model's internal weights rather than just masking outputs.

Exam trap

Google Cloud often tests the misconception that prompt engineering or model scaling alone can fix bias, when in fact only retraining or fine-tuning with balanced data addresses the underlying weight distribution.

How to eliminate wrong answers

Option A is wrong because increasing model size does not inherently reduce bias; larger models can amplify biases present in training data due to higher capacity to memorize spurious correlations. Option C is wrong because prompt engineering only provides a surface-level instruction that the model may ignore or fail to generalize, especially if the bias is deeply embedded in its parameters. Option D is wrong because random sampling of inputs does not address the model's biased internal representations; it only diversifies the prompts, not the training data that caused the bias.

122

MCQmedium

A startup is building a generative AI content creation tool. They want to minimize operational costs while maintaining low latency for end users. Which deployment strategy should they adopt?

A.Deploy the model on edge devices to reduce cloud dependency.

B.Build an on-premises infrastructure to avoid cloud egress fees.

C.Use a serverless inference endpoint that scales to zero when not in use.

D.Provision dedicated GPU instances for consistent performance.

AnswerC

Serverless aligns cost with usage and auto-scales to meet demand.

Why this answer

Option C is correct because serverless inference endpoints, such as Amazon SageMaker Serverless Inference or Google Cloud Run, automatically scale to zero when idle, eliminating costs during periods of no traffic. This directly addresses the startup's goal of minimizing operational costs. Latency stays low as long as cold starts are mitigated, for example with provisioned concurrency or a small number of warm instances for burst handling.

Exam trap

Google Cloud often tests the misconception that 'scaling to zero' is only for CPU workloads, but serverless platforms with GPU support (e.g., Cloud Run with GPUs) can also scale to zero, making them cost-effective for variable generative AI workloads.

How to eliminate wrong answers

Option A is wrong because deploying on edge devices introduces significant hardware procurement and maintenance costs, and edge GPUs typically lack the compute power for large generative models, leading to higher latency for complex inference tasks. Option B is wrong because building on-premises infrastructure incurs high upfront capital expenditure and ongoing operational overhead for power, cooling, and maintenance, which contradicts the goal of minimizing operational costs. Option D is wrong because provisioning dedicated GPU instances incurs costs even when idle, as reserved or on-demand instances bill per hour regardless of usage, making it inefficient for variable or low-traffic workloads.
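
The cost argument against always-on instances can be made concrete with a small break-even calculation. The sketch below assumes a simple billing model (a flat hourly rate for dedicated capacity, a per-second rate for serverless capacity that is charged only while busy); the function names are hypothetical and any rates passed in are placeholders, not published prices.

```python
def dedicated_monthly_cost(hourly_rate, hours_in_month=730):
    """An always-on instance bills for every hour, busy or idle."""
    return hourly_rate * hours_in_month


def serverless_monthly_cost(per_second_rate, busy_seconds):
    """A scale-to-zero endpoint bills only while it is serving requests."""
    return per_second_rate * busy_seconds


def break_even_utilization(hourly_rate, per_second_rate):
    """Fraction of time an endpoint must be busy before dedicated is cheaper."""
    return hourly_rate / (per_second_rate * 3600)
```

If the expected busy fraction is well below the break-even value, scale-to-zero is the cheaper choice; above it, dedicated capacity starts to win.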

123

MCQeasy

A marketing agency wants to generate images using Imagen on Vertex AI. They need to ensure the images are unique and avoid copyright issues. Which parameter adjustment is most relevant?

A.Increase training steps

B.Increase seed variability

C.Use negative prompts

D.Set safety threshold

AnswerC

Specifies elements to avoid, reducing copyright risk.

Why this answer

Negative prompts allow the model to exclude specific concepts, styles, or elements from generated images, directly reducing the risk of replicating copyrighted or trademarked content. By explicitly telling Imagen what not to include, the agency can steer outputs away from protected works without needing to modify training data or safety filters.

Exam trap

Google Cloud often tests the distinction between safety filters (which block harmful content) and negative prompts (which control stylistic or conceptual exclusion), leading candidates to mistakenly choose safety threshold adjustments for copyright avoidance.

How to eliminate wrong answers

Option A is wrong because increasing training steps does not affect the uniqueness or copyright compliance of outputs; it only refines model convergence on the existing training distribution. Option B is wrong because seed variability controls randomness in latent noise initialization, not the semantic content of the image, so it cannot prevent copyright infringement. Option D is wrong because safety thresholds filter harmful or policy-violating content (e.g., violence, hate speech), not copyrighted or trademarked elements.

124

MCQmedium

A global news agency is using a generative AI model to summarize breaking news articles in real-time. The model is deployed on Vertex AI across multiple regions (us-central1, europe-west4, asia-southeast1) for low latency worldwide. The agency has a Service Level Objective (SLO) of 99.9% availability and p99 latency under 2 seconds. Recently, during a major event, traffic spiked 10x, and the europe-west4 region experienced latency spikes over 5 seconds and some 503 errors. The team suspects the regional endpoint is under-provisioned. Which combination of actions should they take to meet the SLO consistently?

A.Enable the global endpoint feature in Vertex AI with automatic traffic splitting, and increase the minimum replicas for each regional endpoint

B.Increase the maximum replicas for the europe-west4 endpoint and reduce the min replicas in other regions

C.Implement Cloud CDN caching for common summaries and reduce the number of regions to two

D.Configure a global load balancer with a single Vertex AI endpoint and increase max replicas globally

AnswerA

Global endpoint distributes traffic and increases capacity; higher min replicas prevent cold starts during spikes.

Why this answer

Enabling the global endpoint with automatic traffic splitting and increasing the minimum replicas per region (option A) provides both failover and capacity. Simply increasing max replicas in europe-west4 while reducing min replicas elsewhere (option B) does not help if traffic shifts to another region. A single endpoint behind a global load balancer (option D) raises max replicas but, without higher min replicas, still risks cold starts. Cloud CDN (option C) is for static content, not model inference.
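
To choose a minimum replica count that survives a spike, a team might start from a rough capacity estimate like the one below. It is a back-of-the-envelope sketch: `replicas_needed` is a hypothetical helper, and the per-replica throughput and headroom values are assumptions to be measured in load tests, not Vertex AI settings.

```python
import math


def replicas_needed(peak_rps, rps_per_replica, headroom=0.25):
    """Replicas required to serve peak_rps while keeping spare capacity."""
    if rps_per_replica <= 0:
        raise ValueError("rps_per_replica must be positive")
    return math.ceil(peak_rps * (1 + headroom) / rps_per_replica)


def replicas_for_spike(normal_rps, spike_factor, rps_per_replica, headroom=0.25):
    """Same estimate, expressed as a multiple of normal traffic."""
    return replicas_needed(normal_rps * spike_factor, rps_per_replica, headroom)
```

Comparing the result for normal traffic with the result for a 10x spike shows how far autoscaling would have to climb, and therefore how much of that gap the minimum replica count should already cover.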

125

MCQhard

A large enterprise runs a generative AI solution serving millions of daily inference requests. To reduce costs, they propose using serverless endpoints (Vertex AI Prediction) with a custom container, but they notice high latency during cold starts. Which strategy best addresses this problem while minimizing cost?

A.Set a minimum number of replicas to maintain a baseline of always-on instances.

B.Upgrade to GPU-accelerated machines for all replicas.

C.Implement client-side request batching to reduce the number of inference calls.

D.Use prewarmed containers by setting an idle timeout to keep instances alive.

AnswerA

Option A is correct because it eliminates cold starts for the baseline load, and autoscaling handles additional traffic.

Why this answer

Option A is correct because setting a minimum number of replicas ensures that a baseline of always-on instances is maintained, eliminating cold starts for the majority of requests. This directly addresses the latency spike caused by container initialization and model loading in serverless endpoints, while the cost impact is limited to the minimum replicas rather than scaling all instances.

Exam trap

Google Cloud often tests the misconception that prewarming via idle timeout is a configurable parameter in serverless ML services, but in Vertex AI Prediction, the idle timeout is fixed and not user-adjustable, making minimum replicas the correct approach.

How to eliminate wrong answers

Option B is wrong because upgrading to GPU-accelerated machines increases cost significantly without solving cold start latency; GPUs primarily improve per-request throughput, not initialization time. Option C is wrong because client-side request batching reduces the number of inference calls but does not affect cold start latency; it may even increase perceived latency for individual requests. Option D is wrong because setting an idle timeout to keep instances alive is not a supported mechanism in Vertex AI Prediction; the service uses an internal keep-alive policy, and user-configurable idle timeouts are not available, making this option technically infeasible.

126

Multi-Selectmedium

A financial institution is implementing a generative AI chatbot to handle customer inquiries. The institution must comply with regulatory requirements (e.g., GDPR, SOX) and ensure data privacy. Which TWO actions should the institution take?

Select 2 answers

A.Establish a Center of Excellence (CoE) for AI governance to oversee model deployment and monitoring.

B.Use Vertex AI without additional data governance controls to simplify deployment.

C.Use a pre-trained model without customization to reduce development time.

D.Implement model validation and testing to ensure outputs meet regulatory standards.

E.Deploy the model on-premises only to keep data within local infrastructure.

AnswersA, D

A CoE provides centralized governance and best practices.

Why this answer

Options A and D are correct. A: establishing a Center of Excellence (CoE) for AI governance provides oversight and standardization. D: implementing model validation and testing ensures the model behaves as expected and helps meet compliance requirements.

Option B is wrong because using Vertex AI without data governance controls would not satisfy regulatory demands. Option C is wrong because a pre-trained model without customization may not meet specific compliance needs. Option E is wrong because deploying on-premises only is not necessary and may limit scalability.

127

MCQhard

A company has been using an on-premises ML infrastructure for generative AI and wants to migrate to Google Cloud. They have a pipeline that fine-tunes a large language model weekly using a proprietary dataset. The migration must minimize downtime and data transfer costs. Which approach best addresses these requirements?

A.Use Vertex AI Pipelines to orchestrate the fine-tuning process, and use Vertex AI Managed Datasets to incrementally sync new data with BigQuery as the source.

B.Use AutoML to train a new model directly from the dataset without fine-tuning.

C.Deploy the existing pipeline on a Google Kubernetes Engine cluster and use Google Cloud Filestore for shared storage.

D.Use Cloud Storage Transfer Service to move all data to Cloud Storage, then set up a Vertex AI custom training job to run the fine-tuning.

AnswerA

Option A is correct because it allows incremental sync and automated pipeline execution with minimal disruption.

Why this answer

Option A is correct because Vertex AI Pipelines with Managed Datasets allows incremental data transfer and automates fine-tuning in the cloud, minimizing downtime. Option B is wrong because AutoML does not support fine-tuning custom large language models. Option C is wrong because running the existing pipeline as a custom solution on GKE is complex and may not reduce costs. Option D is wrong because moving all data in one bulk transfer causes longer downtime and higher transfer costs than an incremental sync.

128

MCQhard

A large enterprise is deploying a generative AI-powered code assistant for their developers. The solution uses Vertex AI with a fine-tuned Codey model. The security team requires that all prompts and responses be logged for audit purposes, but the logs must not contain sensitive information such as API keys or passwords. The operations team is concerned about high latency during peak usage. You need to design a solution that meets security requirements without compromising performance. Which approach should you take?

A.Use Cloud Audit Logs to capture all API calls to Vertex AI, but do not log the actual prompts and responses

B.Enable Vertex AI model monitoring with Cloud Logging, and configure a log sink with a custom exclusion filter to redact sensitive patterns before storing

C.Log all prompts and responses to Cloud Storage and use a Cloud DLP job to scan and redact sensitive data periodically

D.Implement a custom proxy that logs all requests after stripping sensitive data, then forward to the model

AnswerB

This ensures all interactions are logged but sensitive data is removed, meeting security without major performance impact.

Why this answer

Option B is correct because it uses Vertex AI model monitoring with Cloud Logging to capture prompts and responses, then applies a custom exclusion filter with a log sink to redact sensitive patterns (e.g., API keys, passwords) in real time before logs are stored. This meets the security requirement for audit logging without sensitive data while avoiding the latency overhead of post-processing or a custom proxy, thus satisfying the operations team's performance concern.

Exam trap

Google Cloud often tests the misconception that post-processing redaction (e.g., Cloud DLP) or custom proxies are acceptable for real-time logging, when in fact native streaming redaction via log sinks is required to meet both security and performance constraints.

How to eliminate wrong answers

Option A is wrong because Cloud Audit Logs capture only administrative actions (e.g., model deployment) and not the actual prompts and responses, failing the audit requirement. Option C is wrong because logging all data to Cloud Storage and running a periodic Cloud DLP job introduces significant latency and potential exposure window between logging and redaction, violating the performance requirement. Option D is wrong because implementing a custom proxy adds network hop latency and operational overhead, degrading performance during peak usage, and does not leverage native Vertex AI logging capabilities.
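
The idea of stripping sensitive patterns before a log entry is stored can be illustrated in application code with Python's standard `logging` module. This is a sketch of pattern-based redaction in general, not of the Cloud Logging sink mechanism named in option B; `SECRET_PATTERNS`, `scrub`, and `RedactingFilter` are hypothetical names, and the single regular expression is an assumption about what a credential looks like.

```python
import logging
import re

# Hypothetical patterns; extend them to match the credential formats in use.
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|password|secret|token)(\s*[:=]\s*)\S+"),
]


def scrub(text):
    """Keep the field name and separator, replace the value with a marker."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(r"\1\2[REDACTED]", text)
    return text


class RedactingFilter(logging.Filter):
    """Logging filter that scrubs each record's message before handlers see it."""

    def filter(self, record):
        # Render the message with its arguments first, then scrub the result.
        record.msg = scrub(record.getMessage())
        record.args = ()
        return True
```

Attaching the filter with `logger.addFilter(RedactingFilter())` scrubs every record passing through that logger, so the prompt and response text is still logged for audit while the matched values are not.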
