CCNA Gen AI Business Strategy Questions

75 of 128 questions · Page 1/2 · Gen AI Business Strategy topic · Answers revealed

1
Multi-Selectmedium

A company is establishing governance practices for generative AI models. Which three actions are essential for responsible AI deployment?

Select 3 answers
A.Use model versioning to track changes.
B.Regularly audit model outputs for bias.
C.Monitor for data leakage from training data.
D.Implement a human review process for critical decisions.
E.Open-source the model to ensure transparency.
AnswersA, B, D

Versioning ensures reproducibility and accountability for model updates.

Why this answer

Options A, B, and D are correct. Model versioning provides traceability; regular bias audits ensure fairness; human review processes catch critical errors. Monitoring for training-data leakage (Option C) is important but not always considered a core governance pillar; open-sourcing the model (Option E) is voluntary and not essential.

2
MCQhard

A government agency is deploying a generative AI chatbot to answer citizen questions about public services. The chatbot must provide accurate and consistent information, scale to handle peak loads during tax season, and comply with strict data sovereignty laws that require all data to stay within the country. The agency has a moderate budget and in-house IT team but limited AI expertise. Which deployment architecture should they choose?

A.Build and host the model on-premises using open-source tools
B.Deploy a pre-trained model on Vertex AI in the required region with auto-scaling
C.Deploy the model on Vertex AI across multiple regions for availability
D.Use a third-party managed generative AI service that guarantees data residency
AnswerB

Keeps data within region, auto-scales, and requires minimal AI expertise.

Why this answer

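Option B is correct because deploying on Vertex AI within a single in-country region keeps data inside the required boundary, autoscaling handles peak loads, and a pre-trained model on a managed platform suits a team with limited AI expertise. Option C (multi-region) risks violating data sovereignty if any region lies outside the country. Option A (on-premises) is hard to scale for peak loads and demands AI expertise the agency lacks.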

Option D (managed service from a third-party) may not meet sovereignty or budget.

3
MCQmedium

A team built a GenAI chatbot that uses a vector database to retrieve context. Users report irrelevant responses. What is the most likely business strategy issue?

A.The model is too small to generate accurate responses
B.The chatbot is too verbose
C.The system is overfitting to the training data
D.The embedding model is not aligned with the domain vocabulary
AnswerD

If the embeddings do not capture domain-specific meanings, retrieved context will be irrelevant, leading to poor answers.

Why this answer

Option D is correct because irrelevant responses in a RAG (Retrieval-Augmented Generation) chatbot most often stem from the embedding model failing to capture domain-specific semantics. If the embedding model was trained on general text (e.g., Wikipedia) but the chatbot operates in a specialized field like legal or medical, the vector similarity search will retrieve context that is semantically distant from the user's query, leading to irrelevant answers. This is a business strategy issue because the team chose an embedding model that does not align with their domain vocabulary, undermining the entire retrieval pipeline.
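To make the retrieval failure concrete, here is a minimal sketch of similarity-based retrieval. The two-dimensional vectors, document ids, and function names are made up for illustration; real embeddings have hundreds of dimensions and come from an embedding model, not hand-written lists.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, index, top_k=1):
    """Return the top_k document ids ranked by similarity to the query."""
    ranked = sorted(
        index,
        key=lambda doc_id: cosine_similarity(query_vec, index[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical toy index: one legal document, one unrelated document.
index = {
    "contract_law_clause": [0.9, 0.1],
    "customer_etiquette_tips": [0.1, 0.9],
}

# The same legal question, embedded two ways (both vectors are invented):
query_domain_aware = [0.8, 0.2]  # embedder that captures the legal sense
query_general = [0.3, 0.7]       # general embedder that misses the legal sense

print(retrieve(query_domain_aware, index))  # -> ['contract_law_clause']
print(retrieve(query_general, index))       # -> ['customer_etiquette_tips']
```

The generator never sees the right document in the second case, so no amount of model size fixes the answer.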

Exam trap

Google Cloud often tests the misconception that irrelevant responses are caused by model size or overfitting, when in fact the retrieval stage (embedding model and vector search) is the primary bottleneck in a RAG architecture.

How to eliminate wrong answers

Option A is wrong because model size (number of parameters) primarily affects generation quality and coherence, not the relevance of retrieved context; a small model can still produce accurate responses if the retrieved context is correct. Option B is wrong because verbosity is a stylistic output issue unrelated to the core problem of irrelevant responses; a verbose chatbot might still be accurate. Option C is wrong because overfitting to training data would cause the model to memorize specific examples and fail to generalize, but the symptom here is irrelevant responses due to poor retrieval, not hallucination or memorization of training data.

4
MCQhard

A company has a generative AI model that is too slow for real-time inference. What architectural change would help?

A.Apply model quantization and deploy on TPUs
B.Switch to a larger, more accurate model
C.Deploy the model on more powerful CPUs
D.Use distributed training across multiple GPUs
AnswerA

Quantization reduces memory footprint and speeds up computation, and TPUs provide high throughput for trained models.

Why this answer

Model quantization reduces the precision of the model's weights (e.g., from FP32 to INT8), which significantly decreases memory footprint and computation time, enabling faster inference. Deploying on TPUs (Tensor Processing Units) further accelerates matrix operations through specialized hardware, making this combination ideal for real-time latency requirements.
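As a rough illustration of the FP32-to-INT8 idea, the sketch below applies symmetric per-tensor quantization to a short list of weights. It is a simplified teaching example, not how any particular framework or TPU runtime implements quantization; the function names and sample weights are assumptions.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]

weights = [0.42, -1.27, 0.05, 0.9]   # made-up FP32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(q)  # -> [42, -127, 5, 90]
```

Each weight now needs 1 byte instead of 4, which is where the memory saving comes from; the price is the small rounding error captured in `max_error`.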

Exam trap

Google Cloud often tests the distinction between training optimization (distributed training) and inference optimization (quantization, pruning, hardware acceleration), so the trap here is that candidates confuse improving training speed with improving inference latency.

How to eliminate wrong answers

Option B is wrong because switching to a larger, more accurate model increases computational complexity and latency, worsening the speed problem. Option C is wrong because CPUs are general-purpose processors with limited parallel matrix computation capabilities compared to GPUs or TPUs, so using more powerful CPUs still cannot match the throughput needed for real-time inference. Option D is wrong because distributed training across multiple GPUs addresses training speed, not inference latency; inference is typically a single-pass operation that benefits from model optimization and hardware acceleration, not parallel training techniques.

5
MCQhard

A financial services firm is developing a GenAI application for investment advice. They need to ensure regulatory compliance. Which business strategy should they prioritize?

A.Rapidly deploy an MVP and iterate based on user feedback
B.Implement strict human-in-the-loop review for all investment recommendations
C.Open-source the model to gain community trust
D.Partner with a cloud provider that offers indemnification for model outputs
AnswerB

Human oversight is required by regulations for financial advice, ensuring accuracy and compliance.

Why this answer

In regulated industries like financial services, GenAI applications must prioritize compliance over speed. Option B is correct because a human-in-the-loop (HITL) review ensures that every investment recommendation is auditable and meets regulatory standards (e.g., SEC or FINRA rules), mitigating risks of hallucinated or non-compliant outputs. This strategy directly addresses the need for accountability and transparency in high-stakes decision-making.

Exam trap

Google Cloud often tests the misconception that speed or technical features (like open-sourcing or indemnification) can substitute for regulatory compliance, but in regulated domains, human oversight and auditability are non-negotiable.

How to eliminate wrong answers

Option A is wrong because rapidly deploying an MVP without rigorous compliance checks risks generating non-compliant or misleading investment advice, which could lead to severe regulatory penalties and loss of client trust. Option C is wrong because open-sourcing the model does not inherently ensure regulatory compliance; it may expose proprietary data or create liability if the model produces biased or inaccurate outputs, and community trust does not substitute for legal adherence. Option D is wrong because cloud provider indemnification covers legal costs for model outputs but does not prevent the generation of non-compliant advice; it is a risk transfer mechanism, not a compliance strategy.

6
Multi-Selecteasy

A team is selecting a foundation model for a text summarization use case. They need to consider factors that affect both model performance and production deployment. Which THREE factors are most critical? (Choose three.)

Select 3 answers
A.Model parameter count (billions of parameters).
B.Inference latency and throughput capabilities.
C.Context window length (maximum input tokens).
D.Training data provenance and licensing.
E.Pricing per token (input + output).
AnswersB, C, E

B is correct because latency and throughput affect user experience and scaling.

Why this answer

Inference latency and throughput are critical for production deployment because they directly determine the user experience and operational cost. A model with high latency may be unsuitable for real-time summarization, while low throughput limits the number of concurrent requests the system can handle, affecting scalability and cost-efficiency.

Exam trap

Google Cloud often tests the distinction between model-centric factors (like parameter count) and deployment-centric factors (like latency and pricing), trapping candidates who assume bigger models are always better without considering operational constraints.
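The three selected factors lend themselves to quick back-of-the-envelope checks. The sketch below is one way to frame them; all prices, token counts, and speeds are hypothetical, and it assumes input and output tokens share a single context window, which is not true of every model.

```python
def fits_context(input_tokens, max_output_tokens, context_window):
    """Context window check (assumes input and output share one window)."""
    return input_tokens + max_output_tokens <= context_window

def cost_per_request(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Per-token pricing, with separate input and output rates."""
    return (input_tokens / 1000 * price_in_per_1k
            + output_tokens / 1000 * price_out_per_1k)

def meets_latency(output_tokens, seconds_per_token, budget_seconds):
    """Crude latency check: generation time grows with output length."""
    return output_tokens * seconds_per_token <= budget_seconds

# Hypothetical summarization request: 10,000 tokens in, 500 tokens out.
print(fits_context(10_000, 500, context_window=32_000))   # -> True
print(cost_per_request(10_000, 500, 0.5, 1.5))            # -> 5.75
print(meets_latency(500, seconds_per_token=0.02, budget_seconds=12))  # -> True
```

Parameter count appears in none of these checks, which is the point of the question.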

7
MCQmedium

A company wants to scale their generative AI application globally with low latency. Which infrastructure configuration is most suitable?

A.Use a CDN to cache responses.
B.Multiple regional endpoints with traffic routing to the nearest region.
C.On-premises deployment for all regions.
D.Single endpoint in us-central1 with high max replicas.
AnswerB

Regional deployment reduces latency by serving from nearby cloud regions.

Why this answer

Option B is correct because deploying multiple regional endpoints with traffic routing to the nearest region minimizes latency by directing user requests to the geographically closest inference endpoint. This architecture leverages global load balancing (e.g., using Anycast DNS or HTTP(S) load balancers with backend services in multiple regions) to reduce round-trip time (RTT) and meet latency SLAs for real-time generative AI applications.
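The routing decision itself is simple to picture. In the sketch below, the latency figures are invented and `pick_region` is a hypothetical helper; in practice a global load balancer makes this choice, not application code.

```python
def pick_region(latency_ms_by_region):
    """Return the region with the lowest measured round-trip time."""
    return min(latency_ms_by_region, key=latency_ms_by_region.get)

# Hypothetical round-trip times seen by a user in Western Europe.
measured = {
    "us-central1": 140,
    "europe-west4": 25,
    "asia-southeast1": 210,
}

print(pick_region(measured))  # -> europe-west4
```

With only a single us-central1 endpoint, the same user would pay the 140 ms round trip on every request regardless of replica count.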

Exam trap

The trap here is that candidates often confuse CDN caching with real-time inference, assuming caching can accelerate dynamic AI responses, but generative AI outputs are unique per request and cannot be pre-cached.

How to eliminate wrong answers

Option A is wrong because a CDN caches static content (e.g., images, CSS) but cannot cache dynamic, context-dependent generative AI responses, which require real-time model inference; thus, it does not reduce latency for API calls. Option C is wrong because on-premises deployment lacks global scalability and introduces high latency for users outside the local region, defeating the purpose of global low-latency access. Option D is wrong because a single endpoint in us-central1 forces all global traffic to traverse long distances, causing high latency for users far from that region, regardless of the number of replicas.

8
MCQeasy

A retail company wants to use generative AI to generate product descriptions for thousands of items. They need to ensure that the descriptions are consistent with their brand voice and do not contain factual inaccuracies. What is the most effective strategy?

A.Use a rule-based system to generate descriptions from product attributes.
B.Fine-tune a model on historical product descriptions and use prompt engineering with brand guidelines.
C.Use a large language model with no safety filters to maximize output variety.
D.Use a pre-trained model without any customization and rely on post-processing filters.
AnswerB

Fine-tuning tailors the model to the brand's style; prompt engineering reinforces guidelines and reduces hallucinations.

Why this answer

Option B is correct because fine-tuning on historical product descriptions tailors the model to the brand's specific style, and prompt engineering with brand guidelines ensures adherence to voice and reduces hallucinations. Option A is wrong because rule-based systems lack the flexibility and creativity of generative AI. Option D is wrong because a generic pre-trained model may not capture brand voice and is more prone to hallucination.

Option C is wrong because removing safety filters increases the risk of inappropriate or inaccurate content.

9
Multi-Selectmedium

A company has a generative AI chatbot on Vertex AI that shows high response latency. They want to reduce latency without significantly increasing cost. Which TWO actions should they take? (Choose two.)

Select 2 answers
A.Increase the min_replica_count to keep more instances always warm.
B.Enable streaming responses using server-sent events.
C.Reduce the max_output_tokens parameter in the model configuration.
D.Use machine types with GPUs.
E.Switch to a larger model like Gemini 1.5 Pro for better accuracy.
AnswersB, C

B is correct because streaming gives partial results sooner.

Why this answer

Option B is correct because enabling streaming responses using server-sent events (SSE) allows the chatbot to send tokens incrementally as they are generated, rather than waiting for the full response. This reduces the perceived latency for the end user, as the first token appears much sooner, even though the total generation time may remain similar. This approach directly addresses high response latency without increasing compute cost, as it does not require additional infrastructure or model changes.

Exam trap

The trap here is that candidates often confuse reducing latency with reducing total generation time, but streaming only reduces perceived latency by delivering tokens earlier, while options like reducing max_output_tokens actually cut total generation time and cost by limiting output length.
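The difference between perceived and total latency can be shown with a toy latency model. The prefill time and per-token speed below are assumptions chosen for round numbers, not measurements of any Vertex AI model.

```python
def latency_seconds(output_tokens, streaming, prefill_seconds=0.5, seconds_per_token=0.02):
    """Return (first_visible, total) seconds for one response.

    Toy model: total time is a fixed prefill cost plus a per-token cost.
    With streaming, the user sees output once the first token is ready.
    """
    total = prefill_seconds + output_tokens * seconds_per_token
    first_visible = prefill_seconds + seconds_per_token if streaming else total
    return first_visible, total

# 1,000 output tokens without streaming: the user waits for everything.
print(latency_seconds(1000, streaming=False))

# Same response streamed: total unchanged, first output arrives almost at once.
print(latency_seconds(1000, streaming=True))

# Capping output at 250 tokens shortens the total generation time itself.
print(latency_seconds(250, streaming=False))
```

Under these assumptions streaming cuts the wait for the first output from about 20.5 s to about 0.5 s, while lowering the output cap from 1,000 to 250 tokens cuts total generation time from about 20.5 s to about 5.5 s.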

10
MCQhard

A global corporation with 50,000 employees has seen rapid adoption of GenAI across marketing, product, and engineering teams. Each team selected its own models and cloud accounts, resulting in fragmented governance, unexpected costs, and varying output quality. The CFO demands a unified strategy to control costs and ensure consistency. The Chief AI Officer proposes several solutions. Which course of action best balances control with innovation?

A.Migrate all GenAI workloads to a single on-premises server to reduce cloud costs
B.Establish a GenAI Center of Excellence (CoE) that provides approved models, shared APIs, and best practices, while allowing team-specific customizations
C.Mandate all teams use a single model (e.g., Gemini) via a centralized Vertex AI endpoint with usage quotas
D.Allow teams to continue using their own models but require them to submit monthly cost reports
AnswerB

A CoE promotes standardization and governance while enabling innovation through customization, balancing both needs.

Why this answer

Option B is correct because a GenAI Center of Excellence provides standardized models and best practices while allowing teams to customize as needed, balancing control and flexibility. C (mandate a single model) stifles innovation. D (monthly reports) does not address fragmentation proactively.

A (on-prem) is costly and limits model access.

11
MCQmedium

A retail company is building a product description generator using a large language model on Vertex AI. They need to ensure the generated descriptions do not contain offensive language. Which strategy should they implement?

A.Fine-tune the model on a dataset of clean product descriptions
B.Implement a content moderation filter (e.g., Perspective API) as a post-processing step
C.Use Vertex AI Model Monitoring to detect anomalies in model predictions
D.Include explicit instructions in the prompt to avoid offensive language
AnswerB

Post-processing filters catch offensive outputs before delivery to users.

Why this answer

Option B is correct because content moderation filters like Perspective API act as a post-processing safeguard that can catch offensive language the model might generate despite prompt engineering or fine-tuning. This approach provides a deterministic, rule-based or ML-based check that is independent of the model's training, ensuring compliance with content policies in production. It is a standard practice for deploying LLMs in customer-facing applications where safety is critical.
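The shape of a post-processing gate can be sketched as follows. The `toxicity_score` function here is a deliberately naive blocklist stand-in, not the Perspective API; a production system would call a real moderation service at that point. The blocklist, threshold, and function names are all hypothetical.

```python
import re

BLOCKED_TERMS = {"idiot", "stupid"}  # placeholder list for illustration only

def toxicity_score(text):
    """Stand-in scorer: 1.0 if any blocked term appears, else 0.0."""
    words = re.findall(r"[a-z']+", text.lower())
    return 1.0 if any(word in BLOCKED_TERMS for word in words) else 0.0

def moderate(generated_text, threshold=0.5):
    """Return (text, passed). Flagged text is withheld so the caller can
    regenerate it or route it to human review."""
    if toxicity_score(generated_text) >= threshold:
        return None, False
    return generated_text, True

print(moderate("A sturdy oak table for everyday family meals."))
print(moderate("Only an idiot would skip this deal!"))  # -> (None, False)
```

Because the gate runs on the output, it applies no matter how the text was produced, which is what makes it independent of prompting and fine-tuning.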

Exam trap

Google Cloud often tests the misconception that prompt engineering or fine-tuning alone can guarantee safety, when in practice a dedicated post-processing filter is required for reliable content moderation in production.

How to eliminate wrong answers

Option A is wrong because fine-tuning on clean product descriptions reduces but does not eliminate the risk of generating offensive language; the model can still hallucinate or produce harmful outputs due to biases in the base model or adversarial inputs. Option C is wrong because Vertex AI Model Monitoring detects anomalies in prediction distributions (e.g., drift, data skew) but does not inspect individual outputs for offensive content; it is a monitoring tool, not a content filter. Option D is wrong because including explicit instructions in the prompt is a weak safeguard; LLMs can ignore or misinterpret instructions, especially under prompt injection or when generating long descriptions, making it unreliable as a sole defense.

12
MCQeasy

A developer is using Vertex AI with an API key and gets an 'API key not valid' error. What is the likely cause?

A.Wrong endpoint
B.Quota exceeded
C.Expired API key
D.Insufficient permissions
AnswerC

The error message explicitly says 'API key not valid', which is common when the key has expired.

Why this answer

The error indicates that the API key used to authenticate with Vertex AI is no longer valid. API keys can expire due to a configured expiration policy or if they have been revoked in the Google Cloud Console. Since the developer is using an API key directly (rather than a service account or OAuth token), an expired key is the most direct cause of an authentication failure.

Exam trap

The trap here is that candidates confuse authentication failures (401) with authorization failures (403), leading them to select 'Insufficient permissions' when the actual issue is an expired or invalid API key.

How to eliminate wrong answers

Option A is wrong because a wrong endpoint would result in a DNS resolution or HTTP 404 error, not an authentication-related error. Option B is wrong because exceeding a quota returns a 429 HTTP status code (RESOURCE_EXHAUSTED), not an authentication failure. Option D is wrong because insufficient permissions would return a 403 HTTP status code (PERMISSION_DENIED), which is distinct from the authentication error caused by an invalid or expired API key.
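The elimination logic above amounts to a lookup from HTTP status code to likely cause. The sketch below encodes only the general HTTP semantics described in this explanation; it is a study aid with a hypothetical `diagnose` helper, and a real service may report a given failure under a different code, so the response body should always be checked.

```python
LIKELY_CAUSE = {
    401: "authentication failed: invalid, expired, or revoked credential",
    403: "authenticated but not authorized: insufficient permissions",
    404: "resource not found: possibly the wrong endpoint or path",
    429: "quota or rate limit exceeded",
}

def diagnose(status_code):
    """Map an HTTP status code to the most likely category of failure."""
    return LIKELY_CAUSE.get(status_code, "unclassified: inspect the response body")

for code in (401, 403, 404, 429, 500):
    print(code, "->", diagnose(code))
```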

13
MCQeasy

A financial services firm wants to use generative AI to summarize lengthy regulatory documents for compliance officers. They need high accuracy and the ability to reference specific source paragraphs. The team is evaluating a retrieval-augmented generation (RAG) approach on Google Cloud. However, they are concerned about latency when querying large documents. Which architecture change would most effectively reduce response time?

A.Switch to a pure vector search without indexing
B.Increase the number of chunks retrieved per query
C.Use a larger embedding model to improve retrieval accuracy
D.Implement semantic chunking with overlapping to reduce document size per retrieval
AnswerD

Smaller, well-structured chunks speed up retrieval and generation.

Why this answer

Semantic chunking with overlapping reduces the size of each retrieved chunk while preserving context, which directly lowers the amount of text processed per query and speeds up the generation step. This architecture change minimizes latency by ensuring the retriever fetches only the most relevant, compact segments, reducing the load on both the embedding and LLM inference stages.
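The overlap mechanic can be illustrated with a simple sliding window over words. Note that this is a fixed-size stand-in: true semantic chunking splits on meaning boundaries such as sections or paragraphs, which this sketch does not attempt. The function name and parameters are assumptions.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into word windows where consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(sample, chunk_size=5, overlap=2):
    print(chunk)
# -> w0 w1 w2 w3 w4
# -> w3 w4 w5 w6 w7
# -> w6 w7 w8 w9 w10
# -> w9 w10 w11
```

Each chunk stays small enough to keep the prompt short, while the shared words at the boundaries preserve context that a hard cut would lose.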

Exam trap

Google Cloud often tests the misconception that improving retrieval accuracy or increasing context always benefits latency, when in fact reducing the per-query data volume through smarter chunking is the most direct way to cut response time.

How to eliminate wrong answers

Option A is wrong because pure vector search without indexing would require a full scan of all document embeddings, drastically increasing retrieval time and negating any latency benefit. Option B is wrong because increasing the number of chunks retrieved per query expands the context window, which increases the LLM's processing time and overall response latency. Option C is wrong because a larger embedding model improves retrieval accuracy but introduces higher computational cost during both indexing and query encoding, which increases latency rather than reducing it.

14
MCQmedium

A large enterprise has deployed generative AI assistants in three separate departments (HR, Marketing, and Customer Support) using different tools and models. Over the past quarter, the company has observed escalating cloud costs, inconsistent user experiences, and reports of data leakage in Customer Support logs. The CTO wants to address these issues while maintaining innovation velocity. As the Generative AI Leader, what course of action should you recommend?

A.Standardize on a single model and tool across all departments, restricting usage to one platform.
B.Implement a centralized AI governance platform with cost monitoring, model registry, and security guardrails.
C.Discontinue the Customer Support assistant to eliminate data leakage risk and reduce costs.
D.Allow each department to continue independently but require monthly cost and compliance reports.
AnswerB

Centralized governance addresses cost, security, and consistency while allowing flexibility.

Why this answer

Option B is correct because implementing a centralized AI governance platform provides unified cost management, security controls, and standardization, aligning with business strategies to scale generative AI responsibly. Option A is wrong because restricting every department to a single model and platform addresses consistency at the expense of the innovation velocity the CTO wants to maintain. Option C is wrong because it fails to address root causes and misses an opportunity for optimization.

Option D is wrong because letting departments continue independently preserves the fragmentation behind the cost and consistency problems, and monthly reports only surface data leakage after the fact.

15
MCQhard

An enterprise wants to adopt GenAI across departments but faces resistance from legal and compliance. Which strategy should the AI leader prioritize?

A.Outsource the entire initiative to a consulting firm
B.Build a comprehensive governance framework covering data use, review, and monitoring
C.Deploy a single pilot in a low-risk department to demonstrate value
D.Mandate use of GenAI through executive order
AnswerB

A governance framework ensures that GenAI use is compliant, transparent, and aligned with corporate policies, gaining trust from legal and compliance.

Why this answer

Option B is correct because legal and compliance resistance stems from concerns about data privacy, regulatory adherence, and model accountability. A comprehensive governance framework directly addresses these by defining data usage policies, implementing review mechanisms for model outputs, and establishing continuous monitoring to detect drift or bias, which is essential for enterprise-grade GenAI deployment.

Exam trap

Google Cloud often tests the misconception that a low-risk pilot (Option C) is the best first step to overcome resistance, but the trap is that without a governance framework, even a pilot can expose the enterprise to compliance risks, and the question specifically asks for a strategy to address legal and compliance resistance, not just to demonstrate value.

How to eliminate wrong answers

Option A is wrong because outsourcing to a consulting firm does not resolve internal legal and compliance concerns; it shifts responsibility without ensuring the enterprise has control over data governance, model transparency, or audit trails, which are critical for regulatory compliance. Option C is wrong because deploying a single low-risk pilot, while useful for proof-of-concept, does not address the root cause of resistance from legal and compliance—it may demonstrate value but lacks the governance structure needed to satisfy their requirements for data handling, review, and monitoring across all departments. Option D is wrong because mandating use through executive order bypasses the legitimate concerns of legal and compliance teams, likely escalating resistance and risking non-compliance with regulations like GDPR or HIPAA, as GenAI models can inadvertently expose sensitive data or produce unverifiable outputs.

16
MCQeasy

A company wants to offer a generative AI feature where the output must follow a very specific tone and style as per the brand guidelines. Which strategy is most reliable?

A.Post-process the output with a style transfer algorithm.
B.Use a general-purpose model with a system prompt describing the style.
C.Use a different model for each content type.
D.Fine-tune a model on a dataset of branded content.
AnswerD

Fine-tuning internalizes the style, leading to more reliable and consistent output.

Why this answer

Fine-tuning a model on a dataset of branded content is the most reliable strategy because it adjusts the model's internal weights to consistently produce outputs that match the specific tone and style of the brand. Unlike prompt-based methods, fine-tuning embeds the stylistic constraints directly into the model's parameters, ensuring adherence even for complex or nuanced brand guidelines.

Exam trap

The trap here is that candidates overestimate the reliability of prompt engineering (Option B) for enforcing strict, consistent stylistic constraints, underestimating how easily a general-purpose model can deviate from a system prompt when faced with complex or ambiguous inputs.

How to eliminate wrong answers

Option A is wrong because post-processing with a style transfer algorithm adds latency, can introduce artifacts, and may not preserve the original content's meaning while reliably matching brand-specific tone and style. Option B is wrong because a general-purpose model with a system prompt is fragile—subtle variations in prompt phrasing or model updates can cause the output to drift from the desired style, and the model lacks deep internalization of the brand's unique patterns. Option C is wrong because using a different model for each content type does not guarantee consistent tone and style across types; it increases maintenance overhead and still requires each model to be individually tuned or prompted to follow brand guidelines.

17
MCQeasy

A startup is deciding between using a pre-trained model via API vs. hosting their own open-source model. Which factor is most critical for their decision?

A.The accuracy on a benchmark dataset
B.The number of parameters in the model
C.The level of community support for the open-source model
D.Total cost of ownership including infrastructure and expertise
AnswerD

A startup must consider API pricing vs. cloud infrastructure and the hiring costs for model maintenance.

Why this answer

Total cost of ownership (TCO) is the most critical factor because it encompasses not only the direct costs of infrastructure (compute, storage, networking) but also the hidden costs of expertise (MLOps engineers, security hardening, ongoing maintenance) and opportunity costs. A pre-trained API may have higher per-token costs but lower upfront investment, while self-hosting an open-source model requires significant capital expenditure on GPUs, cooling, and power, plus the operational burden of scaling inference under variable load. This decision directly impacts the startup's burn rate and runway, making TCO the primary driver for a resource-constrained organization.
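A TCO comparison of this kind reduces to a break-even calculation. The sketch below assumes self-hosting has a fixed monthly cost (infrastructure plus staffing) and the API has a purely per-token cost; every figure is hypothetical and real pricing is rarely this simple.

```python
def api_monthly_cost(tokens_per_month, price_per_1k_tokens):
    """Pay-as-you-go API: cost scales linearly with usage."""
    return tokens_per_month / 1000 * price_per_1k_tokens

def self_hosted_monthly_cost(infrastructure, staffing):
    """Self-hosting: treated here as a fixed monthly cost."""
    return infrastructure + staffing

def break_even_tokens(price_per_1k_tokens, infrastructure, staffing):
    """Monthly token volume above which self-hosting becomes cheaper."""
    return self_hosted_monthly_cost(infrastructure, staffing) / price_per_1k_tokens * 1000

# Hypothetical startup: 10M tokens per month, 0.5 per 1k tokens via API,
# or 6,000 infrastructure + 14,000 staffing per month to self-host.
print(api_monthly_cost(10_000_000, 0.5))        # -> 5000.0
print(self_hosted_monthly_cost(6_000, 14_000))  # -> 20000
print(break_even_tokens(0.5, 6_000, 14_000))    # -> 40000000.0
```

Under these made-up numbers the API is four times cheaper at current volume, and self-hosting only wins above 40M tokens per month.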

Exam trap

Google Cloud often tests the misconception that technical superiority (accuracy or parameter count) is the primary decision factor, when in reality the business context—specifically TCO—drives the choice between API consumption and self-hosting for startups.

How to eliminate wrong answers

Option A is wrong because benchmark accuracy is a static metric that does not account for real-world deployment costs, latency requirements, or data privacy constraints; a model with slightly lower accuracy may be far more cost-effective or compliant. Option B is wrong because the number of parameters is a coarse proxy for model capability but does not directly determine inference cost, latency, or the total cost of ownership; a smaller model with efficient quantization can outperform a larger model in throughput and cost per request. Option C is wrong because community support, while helpful for troubleshooting, does not address the core financial and operational viability of self-hosting; a well-supported model still requires the startup to bear all infrastructure and expertise costs.

18
MCQhard

A financial institution wants to deploy a gen AI model for fraud detection but must comply with strict regulations regarding explainability. What is the best strategy?

A.Use Vertex AI Explainable AI with a complex model
B.Deploy multiple models and ensemble
C.Use a large black-box model and rely on external auditing
D.Implement a smaller interpretable model with acceptable accuracy
AnswerD

Interpretable models satisfy explainability requirements while maintaining reasonable performance.

Why this answer

Option D is correct because regulatory compliance for fraud detection demands explainability, which complex black-box models cannot provide. A smaller interpretable model (e.g., logistic regression or decision tree) offers transparency into decision factors, satisfying regulations like GDPR's right to explanation while maintaining acceptable accuracy for the use case.
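What "transparency into decision factors" looks like can be sketched with a hand-written logistic regression. The feature names, weights, and bias below are invented for illustration; a real model would learn them from labeled transactions.

```python
import math

# Hypothetical learned parameters of a small fraud model.
WEIGHTS = {"amount_zscore": 1.2, "new_device": 0.8, "foreign_ip": 1.5}
BIAS = -3.0

def explain(features):
    """Return (fraud_probability, per-feature contributions to the logit)."""
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    logit = BIAS + sum(contributions.values())
    probability = 1 / (1 + math.exp(-logit))
    return probability, contributions

transaction = {"amount_zscore": 2.0, "new_device": 1.0, "foreign_ip": 1.0}
probability, contributions = explain(transaction)

# Each contribution is exact, not an approximation: weight times input value.
for name, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:+.2f}")
print(f"fraud probability: {probability:.2f}")
```

The explanation is the model itself, which is the distinction from post-hoc tools that approximate a black box after the fact.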

Exam trap

Google Cloud often tests the misconception that post-hoc explainability tools (like Vertex AI Explainable AI) are equivalent to inherent model interpretability, leading candidates to choose complex models with added explanation layers instead of simpler, transparent models.

How to eliminate wrong answers

Option A is wrong because Vertex AI Explainable AI provides post-hoc explanations for complex models, but these approximations may not meet strict regulatory standards for full transparency and can be unreliable. Option B is wrong because ensembling multiple models increases complexity and opacity, making it harder to explain individual predictions and often violating explainability requirements. Option C is wrong because relying on external auditing for a large black-box model does not guarantee inherent explainability; auditors still face the same opacity, and regulations typically require model-inherent interpretability, not just external review.

19
MCQmedium

A company deployed a Gemini model on Vertex AI for real-time inference. After a week, they notice that some requests return 500 Internal Server Error, and the endpoint is occasionally unreachable. The endpoint is configured with minReplicaCount=1 and maxReplicaCount=2. What is the most likely cause?

A.Autoscaling is disabled, so the endpoint cannot handle traffic spikes.
B.The model was updated while the endpoint was serving requests.
C.The endpoint is under-provisioned: minReplicaCount=1 is too low for peak load, causing the single replica to become saturated.
D.The project has reached its Vertex AI endpoint quota.
AnswerC

C is correct because the single replica cannot handle bursts, leading to errors.

Why this answer

Option C is correct because with minReplicaCount=1 and maxReplicaCount=2, the endpoint starts with a single replica. Under peak load, that single replica can become saturated (CPU/memory exhaustion), causing 500 errors and unreachability. Autoscaling can add a second replica, but if the traffic spike is sudden or the scaling metric takes time to trigger, the single replica is overwhelmed before the second instance is provisioned.
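The provisioning gap can be seen in a toy simulation. The capacity, delay, and traffic numbers are assumptions, and the scaling rule is far simpler than Vertex AI's actual autoscaler; the sketch only shows why a sudden spike hurts before the second replica arrives.

```python
def simulate(traffic_rps, capacity_per_replica=10, max_replicas=2, provisioning_delay=3):
    """Return requests dropped at each time step, starting from one replica."""
    replicas = 1
    pending_ready_at = None  # time step at which a requested replica comes online
    dropped = []
    for t, rps in enumerate(traffic_rps):
        if pending_ready_at is not None and t >= pending_ready_at:
            replicas += 1
            pending_ready_at = None
        capacity = replicas * capacity_per_replica
        dropped.append(max(0, rps - capacity))
        # Request a scale-up only when overloaded and none is already pending.
        if rps > capacity and replicas < max_replicas and pending_ready_at is None:
            pending_ready_at = t + provisioning_delay
    return dropped

# Traffic jumps from 5 to 18 requests per second at step 2.
print(simulate([5, 5, 18, 18, 18, 18, 18]))  # -> [0, 0, 8, 8, 8, 0, 0]
```

Requests are dropped for three steps while the second replica is still provisioning; raising the minimum replica count would remove that window at the cost of always paying for the extra instance.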

Exam trap

The trap here is that candidates assume autoscaling instantly handles spikes, but they overlook the provisioning delay and the fact that a single replica can be overwhelmed before the second replica is ready.

How to eliminate wrong answers

Option A is wrong because autoscaling is not disabled; the minReplicaCount=1 and maxReplicaCount=2 configuration explicitly enables autoscaling between 1 and 2 replicas. Option B is wrong because updating a model while the endpoint is serving requests does not cause 500 errors or unreachability; Vertex AI supports model updates with zero-downtime via canary deployments or traffic splitting. Option D is wrong because reaching the Vertex AI endpoint quota would result in a 429 (Too Many Requests) or a quota-exceeded error, not a 500 Internal Server Error or intermittent unreachability.
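
The provisioning delay described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not how Vertex AI autoscaling is implemented: the function name, the 80% utilization threshold, the three-step provisioning delay, and the traffic numbers are all hypothetical, and "dropped" requests stand in for the 500 errors in the scenario.

```python
def simulate_endpoint(traffic, capacity_per_replica, min_replicas, max_replicas,
                      scale_up_threshold=0.8, provisioning_delay=3):
    """Return the number of requests dropped at each time step (toy model)."""
    replicas = min_replicas
    pending = []  # time steps at which requested replicas become ready
    dropped = []
    for t, requests in enumerate(traffic):
        # Replicas that have finished provisioning join the pool.
        ready = [p for p in pending if p <= t]
        pending = [p for p in pending if p > t]
        replicas = min(max_replicas, replicas + len(ready))

        capacity = replicas * capacity_per_replica
        dropped.append(max(0, requests - capacity))

        # Request one more replica when utilization crosses the threshold,
        # but never exceed max_replicas (counting replicas still provisioning).
        utilization = requests / capacity
        if utilization > scale_up_threshold and replicas + len(pending) < max_replicas:
            pending.append(t + provisioning_delay)
    return dropped


# Hypothetical traffic: quiet, then a sudden sustained spike.
spike = [50, 50, 200, 200, 200, 200, 200]
print(simulate_endpoint(spike, 100, min_replicas=1, max_replicas=2))
print(simulate_endpoint(spike, 100, min_replicas=2, max_replicas=2))
```

In this sketch the single-replica configuration drops requests until the second replica is ready, while starting with two replicas absorbs the same spike without drops.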

20
MCQmedium

A healthcare company wants to use generative AI to summarize patient records. They are concerned about data privacy and HIPAA compliance. Which Google Cloud feature should they use to protect patient data?

A.Cloud Audit Logs
B.Confidential VMs
C.Cloud Data Loss Prevention (DLP) API
D.Customer-managed encryption keys (CMEK) with VPC Service Controls
AnswerD

CMEK ensures data is encrypted with keys controlled by the customer, and VPC-SC prevents data exfiltration.

Why this answer

D is correct because Customer-managed encryption keys (CMEK) with VPC Service Controls provide a defense-in-depth approach for HIPAA compliance. CMEK allows the healthcare company to control and manage the encryption keys used to protect patient data at rest, while VPC Service Controls prevent data exfiltration by restricting data movement outside a defined service perimeter. This combination ensures that even if an attacker gains access, they cannot decrypt the data or move it out of the controlled environment, directly addressing data privacy and HIPAA requirements.

Exam trap

The trap here is that candidates often confuse data discovery and de-identification tools (DLP) with data protection and access control mechanisms (CMEK + VPC Service Controls), leading them to pick Cloud DLP API despite it not providing encryption or perimeter controls required for HIPAA compliance.

How to eliminate wrong answers

Option A is wrong because Cloud Audit Logs only record who did what, when, and where, but do not protect or encrypt patient data; they are a monitoring tool, not a data protection mechanism. Option B is wrong because Confidential VMs encrypt data in use using AMD SEV, but they do not control data exfiltration or provide the perimeter-based access controls needed for HIPAA compliance; they focus on memory encryption, not data movement restrictions. Option C is wrong because Cloud Data Loss Prevention (DLP) API is used for inspecting, classifying, and de-identifying sensitive data, but it does not provide encryption key management or network-level controls to prevent unauthorized data access or exfiltration.

21
Multi-Selecthard

A company is considering monetizing a generative AI-powered product. Which two business models are most common and viable?

Select 2 answers
A.Free with advertising.
B.One-time license fee for the model.
C.Pay-per-use based on tokens consumed.
D.Subscription tiered by usage.
E.Selling user data collected from interactions.
AnswersC, D

Pay-per-use matches costs to usage, common in cloud API services.

Why this answer

Option C is correct because pay-per-use based on tokens consumed aligns directly with the operational cost structure of generative AI models, where each inference incurs compute and memory costs proportional to the number of tokens processed. This model allows customers to pay only for what they use, making it viable for variable workloads and avoiding upfront commitment, while providers can scale revenue with usage. It is the most common monetization strategy for API-based generative AI services, such as OpenAI's GPT-4 or Anthropic's Claude, where pricing is explicitly tied to token counts.

Exam trap

Google Cloud often tests the misconception that one-time licensing (Option B) is viable for AI models, but candidates must recognize that generative AI models are not static software—they require ongoing compute, updates, and scaling, making subscription or pay-per-use models the only sustainable approaches.
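
To make the two answer choices concrete, the sketch below computes a bill under each model. All prices, tier sizes, and function names are hypothetical values chosen for illustration; they are not taken from any provider's price list.

```python
def pay_per_use_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Cost of one billing period when every token is charged individually."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k


def tiered_subscription_cost(tokens_used, tiers):
    """Return the price of the smallest tier whose allowance covers the usage.

    tiers: list of (monthly_token_allowance, monthly_price), sorted ascending.
    """
    for allowance, price in tiers:
        if tokens_used <= allowance:
            return price
    raise ValueError("usage exceeds the largest tier")


# Hypothetical numbers for illustration only.
tiers = [(1_000_000, 20.0), (10_000_000, 150.0)]
print(pay_per_use_cost(2000, 1000, price_in_per_1k=0.5, price_out_per_1k=1.5))
print(tiered_subscription_cost(5_000_000, tiers))
```

Both functions tie revenue to consumption, which is the property that a one-time license lacks.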

22
MCQmedium

A company is using a generative AI model for internal report generation. They notice costs are high because each request processes large amounts of text. Which business strategy would most effectively reduce costs while maintaining quality?

A.Fine-tune a smaller model on a specialized dataset.
B.Use a more powerful model to reduce retries.
C.Implement caching for repeated requests.
D.Increase the batch size for online predictions.
AnswerA

A smaller fine-tuned model can provide sufficient quality at lower cost for specific tasks.

Why this answer

Fine-tuning a smaller model on a specialized dataset reduces computational cost per inference because smaller models have fewer parameters and require less memory and processing power. By tailoring the model to the company's specific domain (e.g., internal reports), it can maintain output quality comparable to a larger general-purpose model, directly addressing the cost-per-request issue without sacrificing accuracy.

Exam trap

Google Cloud often tests the misconception that 'bigger is always better' or that caching universally reduces costs, but the trap here is that candidates overlook the unique nature of generative AI outputs and the cost benefits of model specialization over raw scale or caching.

How to eliminate wrong answers

Option B is wrong because using a more powerful model typically increases per-request cost and latency, and while it may reduce retries, the net cost often rises due to higher compute requirements. Option C is wrong because caching only helps if identical requests are repeated frequently; for generative AI report generation, each request is often unique (different text inputs), making caching ineffective for reducing per-request processing costs. Option D is wrong because increasing batch size for online predictions can reduce per-request cost only if requests are batched together, but online (real-time) predictions usually require low latency and process one request at a time, so larger batch sizes are not applicable and may increase latency.

23
MCQhard

A company with limited AI expertise wants to adopt gen AI. They need a solution that integrates with existing data and applications. Which Google Cloud offering is best?

A.Apigee
B.Colab Enterprise
C.BigQuery ML
D.Vertex AI Agent Builder
AnswerD

Provides a low-code platform for building and deploying gen AI agents that integrate with enterprise data and applications.

Why this answer

Option D is correct because Vertex AI Agent Builder is designed for building conversational AI agents with easy integration to enterprise data sources. Option A is wrong because Apigee is an API management platform. Option B is wrong because Colab Enterprise is a notebook environment, not a full solution.

Option C is wrong because BigQuery ML is for SQL-based ML, not gen AI agents.

24
MCQmedium

A global nonprofit organization is deploying a generative AI chatbot to provide educational content in multiple languages to underserved communities. They operate in regions with limited internet connectivity. The chatbot must work offline or with minimal data usage. The team has a moderate budget and limited technical staff. Which deployment strategy should they use?

A.Fine-tune an open-source model and host it on a cloud VM with auto-scaling
B.Deploy a distilled version of the model on edge devices using TensorFlow Lite
C.Host a large foundation model on Google Cloud and use a mobile app to send API requests
D.Deploy a distilled, smaller model on Google Cloud VM instances
AnswerB

Enables offline inference with low resource usage.

Why this answer

Option B is correct because deploying a distilled version of the model on edge devices using TensorFlow Lite directly addresses the constraints of offline operation, minimal data usage, and limited technical staff. Distillation reduces model size and computational requirements, enabling inference on local hardware without cloud dependency, which is critical for underserved regions with intermittent connectivity.

Exam trap

The trap here is that candidates confuse 'distillation on edge' with 'distillation on cloud VMs' (Option D), overlooking that edge deployment is the only way to guarantee offline functionality, while cloud VMs still require network access for inference.

How to eliminate wrong answers

Option A is wrong because hosting a fine-tuned model on a cloud VM with auto-scaling requires constant internet connectivity for the chatbot to function, which fails the offline requirement. Option C is wrong because using a large foundation model via API requests from a mobile app incurs high data usage and relies on continuous cloud access, contradicting the need for minimal data usage and offline capability. Option D is wrong because deploying a distilled model on Google Cloud VM instances still requires internet connectivity for inference, missing the offline requirement, and does not leverage edge deployment for local processing.

25
MCQeasy

Which of the following is a key consideration when selecting a GenAI model for a cost-sensitive application?

A.Model size in parameters
B.Latency and throughput requirements
C.Number of training epochs
D.The model's training data source
AnswerB

Latency and throughput directly determine the infrastructure needed and thus the cost per inference.

Why this answer

For cost-sensitive applications, latency and throughput requirements directly impact infrastructure costs, as lower latency often requires more expensive compute resources (e.g., higher GPU memory, faster inference hardware) and higher throughput may necessitate scaling out instances. Model size in parameters is a secondary factor that influences latency and throughput, but the primary cost driver is the operational performance needed to meet service-level agreements (SLAs).

Exam trap

Google Cloud often tests the misconception that model size (parameters) is the primary cost driver, but the exam emphasizes that operational metrics like latency and throughput are the direct determinants of infrastructure cost in production.

How to eliminate wrong answers

Option A is wrong because model size in parameters affects memory and compute requirements but is not the key consideration for cost sensitivity; a smaller model can still be costly if latency or throughput demands are high. Option C is wrong because number of training epochs is a training-time hyperparameter that does not directly influence inference cost or operational cost in a deployed application. Option D is wrong because the model's training data source impacts bias, accuracy, and compliance, but not the direct operational cost of running inference at scale.
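
The link between latency, throughput, and infrastructure cost can be sketched with a back-of-the-envelope sizing calculation based on Little's law (average requests in flight = arrival rate x time in system). The function names, the per-replica concurrency, and the hourly price are hypothetical assumptions, and real serving systems need headroom beyond this estimate.

```python
import math


def replicas_needed(peak_rps, latency_s, concurrency_per_replica):
    """Estimate replicas from Little's law: in-flight = arrival rate x latency."""
    in_flight = peak_rps * latency_s
    return max(1, math.ceil(in_flight / concurrency_per_replica))


def monthly_cost(replicas, hourly_price, hours=730):
    """Cost of keeping the replicas running for roughly one month."""
    return replicas * hourly_price * hours


# Hypothetical workload: 100 requests/s, each replica handles 8 concurrent requests.
for latency in (0.25, 0.5):
    n = replicas_needed(peak_rps=100, latency_s=latency, concurrency_per_replica=8)
    print(latency, n, monthly_cost(n, hourly_price=2.0))
```

Under these assumptions, the same model at the same request rate needs more replicas, and therefore costs more, when each request takes longer to serve.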

26
MCQhard

A research organization is building a generative AI model to assist in drug discovery by generating molecular structures. They have a large dataset of proprietary chemical compounds and want to train a model from scratch. They have extensive ML expertise but limited GPU resources. The organization must comply with strict data privacy regulations that prohibit data from leaving their on-premises environment. Which strategy enables them to train the model efficiently while meeting compliance?

A.Train the model entirely on-premises using existing servers
B.Use Google Cloud Confidential VMs with attached GPUs for secure training
C.Partner with a cloud provider to train the model on their infrastructure
D.Transfer the data to Google Cloud and use standard GPU instances
AnswerB

Confidential VMs encrypt data in use, meeting privacy needs with scalable GPUs.

Why this answer

Google Cloud Confidential VMs with attached GPUs provide hardware-based memory encryption (using AMD SEV or Intel TDX) that protects data in use, enabling secure training on sensitive proprietary chemical data in the cloud. This allows the organization to leverage scalable GPU resources for efficient model training while maintaining compliance with strict data privacy regulations that prohibit data from leaving their on-premises environment.

Exam trap

Google Cloud often tests the misconception that any cloud GPU instance is sufficient for compliance, but the trap here is that standard GPU instances lack in-use memory encryption, which is required when data privacy regulations prohibit data from leaving the on-premises environment.

How to eliminate wrong answers

Option A is wrong because training entirely on-premises using existing servers would be inefficient due to limited GPU resources, leading to excessively long training times for a generative AI model from scratch. Option C is wrong because partnering with a cloud provider without specifying a secure, encrypted compute environment (like Confidential VMs) would expose the proprietary data to potential privacy risks and violate compliance requirements. Option D is wrong because transferring data to Google Cloud and using standard GPU instances does not provide the necessary in-use data encryption, leaving the data vulnerable during processing and failing to meet strict data privacy regulations.

27
Multi-Selecthard

A global e-commerce company uses generative AI to generate product descriptions in multiple languages. They want to ensure consistency across markets while respecting cultural nuances. Which THREE strategies should they adopt?

Select 3 answers
A.Standardize all descriptions to a neutral tone to avoid cultural issues.
B.Develop region-specific prompt templates that incorporate local cultural references and legal requirements.
C.Engage local marketing teams to review and approve AI-generated descriptions before publication.
D.Use a single global model with a translation layer to convert English descriptions.
E.Use A/B testing to measure engagement metrics per region and iterate on prompts.
AnswersB, C, E

Tailored prompts guide the model to produce culturally appropriate content.

Why this answer

Options B, C, and E are correct. Region-specific prompts, local human review, and A/B testing ensure consistency and cultural sensitivity. Option D is wrong because direct translation may miss nuances.

Option A is wrong because it reduces localization.

28
MCQmedium

A bank wants to use LLMs to generate responses for customer support chat. All conversations must be logged, and any PII must be masked. The solution must comply with financial regulations. Which combination of Vertex AI services should be used?

A.Deploy a custom model on Cloud Run and write a Cloud Function to mask PII.
B.Use Vertex AI Prediction with a custom container that masks PII before inference.
C.Use the Gemini API directly with a custom logging solution in Cloud Logging.
D.Use Vertex AI Agent Builder with Data Governance, which can automatically mask PII and log interactions.
AnswerD

D is correct because it provides built-in compliance features.

Why this answer

Option D is correct because Vertex AI Agent Builder integrates with Data Governance to automatically mask PII and log interactions, meeting both the logging and compliance requirements without custom development. This managed service ensures adherence to financial regulations by providing built-in data loss prevention (DLP) capabilities and audit trails, unlike the other options which require manual or less integrated approaches.

Exam trap

Google Cloud often tests the misconception that custom development (e.g., Cloud Functions or custom containers) is necessary for PII masking and logging, when in fact managed services like Vertex AI Agent Builder with Data Governance provide a more compliant and integrated solution out of the box.

How to eliminate wrong answers

Option A is wrong because deploying a custom model on Cloud Run with a Cloud Function for PII masking introduces operational complexity and latency, and does not natively integrate with Vertex AI's logging or compliance features, risking gaps in regulatory adherence. Option B is wrong because using Vertex AI Prediction with a custom container that masks PII before inference still requires custom development for logging and does not leverage Vertex AI's built-in data governance, making it harder to ensure consistent compliance across all interactions. Option C is wrong because using the Gemini API directly with a custom logging solution in Cloud Logging lacks automatic PII masking and data governance, forcing manual implementation that is error-prone and may not meet strict financial regulations for auditability and data protection.

29
MCQmedium

A marketing agency uses gen AI for content generation. They need to keep branding consistent across all generated content. What is a key business consideration?

A.Use only generated content
B.Implement content moderation and brand guidelines
C.Use the most creative model
D.Optimize for speed
AnswerB

Guides the model to produce on-brand content and review outputs.

Why this answer

Option B is correct because consistent branding requires enforcing predefined guidelines on tone, style, and terminology across all generated content. Without content moderation and brand guidelines, a generative AI model may produce off-brand, inconsistent, or even harmful outputs, undermining brand identity. This is a core business strategy for deploying gen AI at scale, ensuring alignment with marketing objectives.

Exam trap

Google Cloud often tests the misconception that generative AI can be deployed autonomously without governance, leading candidates to overvalue raw creativity or speed over the business-critical need for controlled, brand-aligned output.

How to eliminate wrong answers

Option A is wrong because relying solely on generated content without human oversight or curation risks producing factually incorrect, off-brand, or legally problematic material, as generative models lack inherent understanding of brand context. Option C is wrong because the most creative model may prioritize novelty over adherence to brand constraints, leading to unpredictable outputs that violate brand guidelines. Option D is wrong because optimizing for speed can sacrifice output quality and consistency, increasing the likelihood of generating content that fails to meet brand standards or requires extensive post-editing.

30
Multi-Selecthard

An organization is developing a GenAI strategy for multiple business units. Which THREE steps should they take to ensure alignment? (Select three.)

Select 3 answers
A.Implement a chargeback model for usage costs
B.Allow each business unit to independently choose models
C.Establish common data governance policies
D.Create a center of excellence (CoE) for GenAI
E.Prioritize use cases based on ROI and risk
AnswersC, D, E

Common policies ensure data consistency, compliance, and reusability across units.

Why this answer

Establishing common data governance policies (C) ensures that all business units adhere to consistent standards for data quality, privacy, and security, which is critical for training and deploying reliable GenAI models. Without unified governance, disparate data practices can lead to model bias, compliance violations, and integration failures across the organization.

Exam trap

Google Cloud often tests the misconception that financial controls (chargeback) or decentralized model selection are sufficient for alignment, when in fact they miss the core need for shared governance, centralized expertise, and risk-based prioritization.

31
MCQeasy

Refer to the exhibit. A user receives this error when trying to get predictions from a Vertex AI endpoint. What is the most likely cause?

A.The endpoint does not exist
B.The endpoint is in a different region
C.The user lacks necessary IAM permissions
D.The model is not deployed
AnswerC

PERMISSION_DENIED indicates missing permissions.

Why this answer

Option C is correct because the error message explicitly says PERMISSION_DENIED, indicating lack of IAM permissions. Option A (endpoint does not exist) would give a NOT_FOUND error. Option D (model not deployed) would give a different error.

Option B (different region) would also give a different error.

32
MCQeasy

Refer to the exhibit. What access does the IAM policy grant to developer@example.com?

A.Ability to use Vertex AI models for prediction and view metadata.
B.No effective permissions because the role is incorrect.
C.Ability to deploy and manage models.
D.Full control over all Vertex AI resources.
AnswerA

roles/aiplatform.user grants permissions to predict, explain, and view resources.

Why this answer

Option A is correct because roles/aiplatform.user allows using models for predictions and viewing metadata, but not deploying or deleting models. Options C and D require higher roles.

33
MCQmedium

A healthcare startup wants to use generative AI to provide clinical decision support. They must minimize the risk of harmful hallucinations. Which business strategy is most appropriate?

A.Implement retrieval-augmented generation with meticulously curated medical literature.
B.Limit the model's output length to reduce hallucination risk.
C.Deploy a large general-purpose model and rely on post-processing filters.
D.Use a custom fine-tuned model on a proprietary medical dataset.
AnswerA

RAG uses retrieved, vetted documents to generate answers, significantly reducing hallucinations by grounding responses in authoritative sources.

Why this answer

Retrieval-augmented generation (RAG) grounds the model's output in a trusted, external knowledge base—here, curated medical literature—which directly reduces the risk of hallucination by forcing the model to cite or derive answers from verified sources. This is the most effective strategy for clinical decision support because it combines generative flexibility with factual accuracy, unlike methods that only limit output or rely on post-hoc filtering.

Exam trap

Google Cloud often tests the misconception that fine-tuning alone is sufficient for domain-specific accuracy, when in fact RAG is superior for reducing hallucinations because it provides dynamic, verifiable grounding rather than static memorization.

How to eliminate wrong answers

Option B is wrong because limiting output length does not address the root cause of hallucinations; a short response can still be factually incorrect or harmful. Option C is wrong because post-processing filters are reactive and cannot reliably catch subtle or context-dependent hallucinations in a high-stakes medical domain, and large general-purpose models lack domain-specific grounding. Option D is wrong because a custom fine-tuned model on a proprietary dataset may still hallucinate if the dataset is incomplete, biased, or not rigorously curated, and fine-tuning does not inherently provide a retrieval mechanism to verify facts against authoritative sources.
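
The grounding step of RAG can be sketched in a few lines. This is a toy illustration under stated assumptions, not a production retriever: it ranks documents by word overlap instead of embeddings, the function names are hypothetical, the sample sentences are placeholders rather than vetted medical guidance, and the resulting prompt would still need to be sent to a model.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, documents, top_k=2):
    """Rank documents by how many query words they share (toy retriever)."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_grounded_prompt(query, documents, top_k=2):
    """Build a prompt that restricts the model to the retrieved sources."""
    context = retrieve(query, documents, top_k)
    sources = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(context))
    return (
        "Answer using only the sources below and cite them by number. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


# Placeholder corpus standing in for a curated knowledge base.
corpus = [
    "Metformin is a first-line treatment for type 2 diabetes.",
    "Ibuprofen is a nonsteroidal anti-inflammatory drug.",
    "Insulin therapy is used in type 1 diabetes.",
]
print(build_grounded_prompt("first-line treatment for type 2 diabetes", corpus, top_k=1))
```

The design point is that the answer is constrained to retrieved, citable text, so the knowledge base can be curated and updated without retraining the model.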

34
MCQmedium

A large enterprise is evaluating gen AI for internal knowledge management. They need to ensure accuracy and reduce hallucinations. Which strategy is most effective?

A.Fine-tune a model on domain-specific data
B.Increase model temperature
C.Use Retrieval-Augmented Generation (RAG)
D.Use a larger model without customization
AnswerC

RAG retrieves relevant documents and conditions the model on them, dramatically reducing hallucinations.

Why this answer

Retrieval-Augmented Generation (RAG) is the most effective strategy because it grounds the model's responses in an external, authoritative knowledge base, retrieving relevant documents at inference time to provide factual context. This directly reduces hallucinations by ensuring the generated output is based on retrieved evidence rather than relying solely on the model's parametric memory, which is critical for enterprise knowledge management where accuracy is paramount.

Exam trap

Google Cloud often tests the misconception that fine-tuning is the universal solution for domain adaptation, but the trap here is that fine-tuning does not provide a dynamic, verifiable knowledge source, whereas RAG explicitly decouples knowledge storage from generation, enabling real-time updates and source attribution.

How to eliminate wrong answers

Option A is wrong because fine-tuning on domain-specific data embeds knowledge into the model's weights, which can still lead to hallucinations when the model encounters novel or edge-case queries, and it does not provide a mechanism to cite or verify the source of information. Option B is wrong because increasing model temperature introduces randomness into token selection, which amplifies hallucinations and reduces the determinism required for accurate knowledge retrieval. Option D is wrong because using a larger model without customization does not address the root cause of hallucinations; larger models still rely on parametric memory and can fabricate information, especially for niche or proprietary enterprise data.
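
The effect of temperature mentioned for Option B can be shown numerically. The sketch below applies a temperature-scaled softmax to a hypothetical set of three token logits; the function name and the logit values are illustrative assumptions, not output from any real model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.0]
low = softmax_with_temperature(logits, temperature=0.5)
high = softmax_with_temperature(logits, temperature=2.0)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the lower temperature most of the probability mass sits on the top-ranked token; at the higher temperature the distribution flattens, so less likely tokens are sampled more often.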

35
MCQeasy

A company wants to measure the business impact of a GenAI content generation tool. Which metric is most appropriate?

A.Reduction in content production time
B.Number of model parameters
C.Model accuracy on a test set
D.Training loss
AnswerA

This metric directly measures the business value of automation and efficiency.

Why this answer

Option A is correct because the primary business impact of a GenAI content generation tool is operational efficiency, measured by the reduction in content production time. This metric directly correlates to cost savings and faster time-to-market, which are key business outcomes. Unlike technical metrics, it reflects real-world value delivery.

Exam trap

Google Cloud often tests the confusion between technical performance metrics (e.g., accuracy, loss) and business impact metrics (e.g., time savings, cost reduction), leading candidates to select a technically impressive but irrelevant option like model parameters or accuracy.

How to eliminate wrong answers

Option B is wrong because the number of model parameters is a model architecture metric, not a business impact metric; it does not measure how the tool affects content production workflows or ROI. Option C is wrong because model accuracy on a test set evaluates technical performance on a static dataset, not the tool's effectiveness in a dynamic business environment where content quality and relevance vary. Option D is wrong because training loss is a training-phase optimization metric that indicates model convergence, not post-deployment business outcomes like productivity gains.

36
MCQeasy

A startup wants to leverage Google Cloud's generative AI but has limited ML expertise. Which Google Cloud service allows them to build generative AI applications without deep ML knowledge?

A.Vertex AI Generative AI Studio
B.Cloud TPU
C.TensorFlow
D.Apigee
AnswerA

Design and deploy generative AI apps without coding.

Why this answer

Vertex AI Generative AI Studio is a managed service that provides a low-code/no-code interface for building, testing, and deploying generative AI applications using pre-trained foundation models. It abstracts away the complexities of model training, infrastructure management, and ML pipeline orchestration, enabling teams with limited ML expertise to leverage generative AI capabilities through simple prompts and visual workflows.

Exam trap

The trap here is that candidates confuse infrastructure-level services (Cloud TPU) or developer tools (TensorFlow) with managed application-building platforms, assuming that any ML-related Google Cloud service can be used without expertise, when in fact only Vertex AI Generative AI Studio provides the necessary abstraction for non-ML practitioners.

How to eliminate wrong answers

Option B (Cloud TPU) is wrong because Cloud TPUs are specialized hardware accelerators designed for training and running large-scale ML models, requiring deep expertise in distributed computing, model optimization, and TensorFlow/PyTorch programming — not a service for building generative AI applications without ML knowledge. Option C (TensorFlow) is wrong because TensorFlow is an open-source ML framework that requires programming skills to define, train, and deploy models; it does not provide a managed, no-code interface for generative AI application development. Option D (Apigee) is wrong because Apigee is an API management platform focused on securing, scaling, and analyzing API traffic, not a service for building or deploying generative AI models or applications.

37
MCQhard

A team has developed a generative AI model for real-time translation. The evaluation metrics and business requirements are shown. Which business decision is most appropriate given the trade-offs?

A.Accept the model as-is because all other metrics are within limits.
B.Optimize the model for cost efficiency, even if accuracy drops slightly to 90%.
C.Prioritize latency reduction even if it increases cost.
D.Reduce accuracy to 85% to achieve both latency and cost targets.
AnswerB

Cost is the only metric out of range; minor accuracy loss is acceptable.

Why this answer

Option B is correct because the business requirements prioritize cost efficiency as the primary constraint, and the model currently exceeds the cost target. A slight accuracy drop to 90% (still within acceptable limits) allows cost to be reduced, aligning with the core business goal. The trade-off is acceptable since latency and other metrics remain within bounds, and accuracy at 90% still meets the minimum threshold for real-time translation quality.

Exam trap

Google Cloud often tests the ability to prioritize business constraints over model perfection, and the trap here is assuming that accuracy must be preserved at all costs, when in fact cost efficiency is the binding requirement and a small accuracy trade-off is acceptable.

How to eliminate wrong answers

Option A is wrong because accepting the model as-is ignores the fact that cost exceeds the business requirement, which is a critical failure for deployment at scale. Option C is wrong because prioritizing latency reduction, even if it increases cost, directly violates the cost constraint and does not address the primary business need for cost efficiency. Option D is wrong because reducing accuracy to 85% is unnecessary; the cost target can likely be met with a smaller accuracy drop (e.g., to 90%), and 85% may fall below the acceptable quality threshold for real-time translation, risking user trust.

38
MCQmedium

A healthcare organization is developing a generative AI system to assist doctors with clinical decision support. They are concerned about regulatory compliance (e.g., HIPAA) and potential liability. What is the most important business strategy to mitigate these risks?

A.Limit the system to non-critical administrative tasks only.
B.Use an open-source model to avoid vendor lock-in and reduce costs.
C.Fully automate the system to reduce human error.
D.Implement a human-in-the-loop review process with clear accountability for AI-generated recommendations.
AnswerD

Human oversight ensures compliance and provides a clear chain of responsibility.

Why this answer

Option D is correct because human oversight and clear accountability are essential for high-stakes decisions. Option C is wrong because automation without oversight increases liability. Option B is wrong because open-source models may not comply with privacy requirements.

Option A is wrong because limiting scope reduces utility but does not address accountability.

39
MCQmedium

A retail company wants to use GenAI to generate product descriptions. They have a small team of data scientists. What is the most efficient approach?

A.Collect more data for several months before starting
B.Train a model from scratch using their product data
C.Use a foundation model API with prompt engineering and few-shot examples
D.Buy a proprietary model from a startup
AnswerC

A foundation model API provides high-quality output with minimal effort; prompt engineering tailors it to product descriptions.

Why this answer

Option C is correct because using a foundation model API with prompt engineering and few-shot examples is the most efficient approach for a small team. It leverages pre-trained models (e.g., GPT-4, Claude) via API calls, requiring no infrastructure or training data, while prompt engineering and few-shot examples allow the model to adapt to the company's product catalog with minimal effort and cost.

Exam trap

Google Cloud often tests the misconception that more data or custom training is always better, but the trap here is that candidates overlook the efficiency and sufficiency of foundation model APIs with prompt engineering for small teams with limited data and compute resources.

How to eliminate wrong answers

Option A is wrong because collecting more data for several months delays deployment unnecessarily; foundation models already have broad language understanding and can generate product descriptions with minimal domain-specific data via few-shot prompting. Option B is wrong because training a model from scratch is computationally expensive, requires large labeled datasets, and demands deep ML expertise, which is inefficient for a small team with limited resources. Option D is wrong because buying a proprietary model from a startup introduces vendor lock-in, potential licensing costs, and may not offer the flexibility or rapid iteration that API-based foundation models provide.
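
The few-shot prompting idea behind option C can be sketched as follows. This is a minimal illustration, not a specific vendor's API: the helper `build_few_shot_prompt` and the example products are hypothetical, and the call to the foundation model itself is omitted.

```python
def build_few_shot_prompt(examples, product):
    """Assemble a few-shot prompt from (name, features, description) examples."""
    parts = ["Write a short, engaging product description."]
    for name, features, description in examples:
        parts.append(
            f"Product: {name}\nFeatures: {features}\nDescription: {description}"
        )
    # The final block leaves the description empty for the model to complete.
    parts.append(f"Product: {product[0]}\nFeatures: {product[1]}\nDescription:")
    return "\n\n".join(parts)


# Hypothetical examples taken from an imagined product catalog.
EXAMPLES = [
    ("Trail Mug", "steel, 350 ml, insulated", "Keeps coffee hot from trailhead to summit."),
    ("Desk Lamp", "LED, dimmable, USB-C", "Soft, adjustable light for late-night work."),
]

prompt = build_few_shot_prompt(EXAMPLES, ("Yoga Mat", "6 mm, non-slip, recycled foam"))
print(prompt)
```

The resulting string would be sent to whichever foundation model API the team chooses; no training data or infrastructure is involved.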

40
MCQhard

A media company wants to build a multi-modal generative app that accepts text, image, and video inputs and produces summaries. The app must handle variable-length videos up to 10 minutes. Which architecture is most scalable and cost-effective?

A.Use a pipeline to split videos into short clips, extract key frames, and process with Gemini 1.5 Pro (with context caching) to generate summaries.
B.Use Video Intelligence API to generate video captions, then feed captions to a text model.
C.Convert all inputs to text descriptions and use a text-only model.
D.Deploy a single Vertex AI endpoint with a model that can ingest multi-modal data directly.
AnswerA

A is correct because it handles variable-length content efficiently within model limits.

Why this answer

Option A is correct because splitting videos into short clips and extracting key frames reduces the computational load and token usage, while Gemini 1.5 Pro's context caching efficiently handles variable-length videos up to 10 minutes by reusing processed context across requests. This approach balances scalability (by avoiding processing entire videos at once) and cost-effectiveness (by minimizing API calls and storage), making it ideal for a multi-modal summarization app.

Exam trap

Google Cloud often tests the misconception that a single multi-modal endpoint is inherently scalable, but the trap here is that direct ingestion of raw video without preprocessing (like key frame extraction) leads to prohibitive token costs and latency, making pipeline-based approaches with caching more practical for variable-length inputs.

How to eliminate wrong answers

Option B is wrong because using Video Intelligence API to generate captions and then feeding them to a text model loses visual and temporal information from the video, such as scene transitions and non-verbal cues, which degrades summary quality. Option C is wrong because converting all inputs (images, videos) to text descriptions discards multi-modal richness, forcing a text-only model to infer visual details, which is inaccurate and inefficient for variable-length videos. Option D is wrong because deploying a single Vertex AI endpoint with a multi-modal model directly ingesting raw data would be computationally expensive and unscalable for 10-minute videos, as it requires processing every frame without optimization, leading to high latency and cost.
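
The clip-splitting step in option A can be sketched roughly as below. This only computes time ranges; the function names and the choice of clip midpoints as key frame timestamps are assumptions for illustration, and real frame extraction and the model call are left out.

```python
def split_into_clips(duration_s, clip_s=60):
    """Return (start, end) second ranges covering a video of duration_s whole seconds."""
    if duration_s < 0 or clip_s <= 0:
        raise ValueError("duration_s must be >= 0 and clip_s must be > 0")
    return [
        (start, min(start + clip_s, duration_s))
        for start in range(0, duration_s, clip_s)
    ]


def key_frame_times(clips):
    """Use the midpoint of each clip as a stand-in for a key frame timestamp."""
    return [(start + end) / 2 for start, end in clips]


clips = split_into_clips(125, clip_s=60)
print(clips)
print(key_frame_times(clips))
```

Because the number of clips grows with video length, a 10-minute video simply yields more ranges to process rather than one oversized request.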

41
Multi-Selectmedium

What are THREE best practices for responsible generative AI deployment?

Select 3 answers
A.Monitor model performance and data drift over time
B.Maximize model size for best accuracy
C.Maintain human oversight for critical decisions
D.Implement content filters to block harmful or biased outputs
E.Avoid fine-tuning the model to preserve original capabilities
AnswersA, C, D

Continuous monitoring helps detect degradation and ensures the model remains reliable.

Why this answer

Option A is correct because continuous monitoring of model performance and data drift is essential for maintaining the reliability and safety of generative AI systems. Data drift occurs when the statistical properties of input data change over time, which can degrade model accuracy and introduce unintended biases. Regular monitoring allows teams to detect these shifts early and retrain or adjust the model to sustain responsible behavior.

Exam trap

Google Cloud often tests the misconception that bigger models are always better, but the trap here is that responsible AI deployment focuses on safety, fairness, and reliability rather than raw performance metrics like model size.
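
One common way to quantify data drift is the population stability index (PSI), sketched below under stated assumptions: the inputs are already binned into proportions (for example, shares of prompts by length bucket), the sample numbers are made up, and the 0.2 alert threshold is a rule of thumb rather than anything prescribed by the question.

```python
import math


def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions given as lists of proportions."""
    psi = 0.0
    for e, a in zip(expected, actual):
        # Clamp to eps so empty bins do not cause division by zero or log(0).
        e = max(e, eps)
        a = max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


baseline = [0.25, 0.25, 0.25, 0.25]  # hypothetical shares at launch
current = [0.10, 0.20, 0.30, 0.40]   # hypothetical shares this week

psi = population_stability_index(baseline, current)
print(f"PSI = {psi:.3f}", "(investigate)" if psi > 0.2 else "(stable)")
```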

42
Multi-Selectmedium

Which TWO actions are recommended best practices for cost optimization when deploying generative AI models on Vertex AI?

Select 2 answers
A.Use batch prediction for non-real-time workloads
B.Set up autoscaling with a minimum number of replicas to avoid excessive scaling
C.Deploy the model in a single region to reduce network costs
D.Store all model prediction logs indefinitely for auditing
E.Always use GPU instances for inference
AnswersA, B

Batch prediction avoids paying for an always-on online endpoint when the workload can wait, reducing cost.

Why this answer

Option A is correct because batch prediction processes predictions asynchronously in large batches, which is significantly more cost-effective than online (real-time) prediction for workloads that do not require immediate responses. Vertex AI batch prediction jobs automatically scale down to zero when not in use, eliminating idle compute costs, and you only pay for the resources consumed during the job execution.

Exam trap

Google Cloud often tests the misconception that single-region deployment always reduces costs, when in reality it can increase network egress charges and latency penalties for global users, making multi-region strategies with traffic management more cost-effective.

43
MCQmedium

A team set a budget alert for their GenAI API usage at $10,000. They received the alert with current spend of $12,500. Which business action is most appropriate as a first step?

A.Pause all non-critical use cases immediately
B.Switch to a cheaper model provider
C.Review usage patterns and optimize prompt lengths and frequencies
D.Increase the budget by 50% to $15,000
AnswerC

Optimizing usage is the most cost-effective first step; it can reduce consumption without disrupting operations.

Why this answer

Option C is correct because the first step in responding to a budget overrun should be to analyze usage patterns and optimize prompt lengths and frequencies. This approach identifies inefficiencies (e.g., unnecessarily verbose prompts, excessive retries) that directly reduce token consumption and cost without disrupting critical operations. It aligns with the principle of cost optimization before making architectural or policy changes.

Exam trap

Google Cloud often tests the misconception that immediate cost-cutting actions (like pausing or switching models) are the best first step, when in fact data-driven analysis and optimization should precede any operational or financial changes.

How to eliminate wrong answers

Option A is wrong because pausing all non-critical use cases is a reactive, blunt measure that may disrupt business processes and does not address the root cause of cost overruns; it should be considered only after analysis shows specific non-critical usage is the primary driver. Option B is wrong because switching to a cheaper model provider without understanding current usage patterns risks degrading output quality or compatibility, and may not address inefficiencies like prompt bloat or high-frequency calls. Option D is wrong because increasing the budget without investigating the overrun ignores the underlying issue and can lead to uncontrolled spending; it is a financial workaround, not a cost management strategy.
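
To see why prompt length is a sensible first lever, here is a back-of-the-envelope cost model. Every number is hypothetical (request volume, token counts, and per-1,000-token prices are chosen only so the baseline matches the $12,500 spend in the question), and real pricing varies by provider and model.

```python
def monthly_cost(requests, prompt_tokens, output_tokens, in_price, out_price):
    """Monthly cost in dollars; prices are per 1,000 tokens (hypothetical)."""
    per_request = (prompt_tokens * in_price + output_tokens * out_price) / 1000
    return requests * per_request


# Baseline: verbose 2,000-token prompts.
before = monthly_cost(1_000_000, 2000, 500, in_price=0.005, out_price=0.005)
# After trimming prompts to 1,200 tokens, with everything else unchanged.
after = monthly_cost(1_000_000, 1200, 500, in_price=0.005, out_price=0.005)

print(f"before: ${before:,.0f}  after: ${after:,.0f}")
```

Under these assumptions, trimming prompts alone brings spend back under the $10,000 alert threshold without pausing any use case.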

44
MCQhard

A financial services firm is deploying a generative AI chatbot for customer inquiries. They have strict compliance requirements: all conversations must be auditable and the model must not use customer data for training. Which Google Cloud offering should they choose?

A.Private Google Access for on-premises connectivity
B.Dialogflow CX with Cloud Logging
C.Cloud AI Platform Pipelines
D.Vertex AI Agent Builder with data governance controls
AnswerD

Vertex AI Agent Builder offers built-in audit logging and data governance to meet compliance requirements.

Why this answer

Vertex AI Agent Builder is correct because it provides built-in data governance controls that prevent customer data from being used for model training, while also supporting full auditability through integration with Cloud Audit Logs and Cloud Logging. This directly addresses the firm's compliance requirements for auditable conversations and data privacy.

Exam trap

The trap here is that candidates may confuse Dialogflow CX (a conversational AI platform) with Vertex AI Agent Builder, not realizing that Dialogflow CX lacks the native data governance controls to prevent customer data from being used for model training, which is the key differentiator for compliance-heavy use cases.

How to eliminate wrong answers

Option A is wrong because Private Google Access is a networking feature that enables on-premises hosts to reach Google APIs over internal IP addresses, but it does not provide any chatbot functionality, audit logging, or data governance controls. Option B is wrong because Dialogflow CX with Cloud Logging provides conversational AI and audit logging, but it lacks the specific data governance controls to prevent customer data from being used for model training, which is a critical compliance requirement. Option C is wrong because Cloud AI Platform Pipelines is a workflow orchestration service for ML pipelines, not a chatbot deployment solution, and it does not offer the required auditability or data governance for customer conversations.

45
MCQhard

A machine learning engineer is defining a Vertex AI pipeline for model evaluation using the JSON representation shown. The pipeline fails with an error that the 'eval_dataset' parameter is missing. What is the issue?

A.The component 'comp-model-eval' does not accept 'eval_dataset' as input
B.The 'project' parameter should be a pipeline input, not a constant
C.The runtimeConfig parameter values must be strings, not references
D.The pipeline spec does not declare 'eval_dataset' as a pipeline input parameter
AnswerD

The root inputDefinitions is empty, so 'eval_dataset' is not recognized as a pipeline parameter.

Why this answer

The pipeline spec references 'eval_dataset' as a componentInput, but it is not declared in the root's inputDefinitions (option D). The runtimeConfig supplies the value, but the pipeline spec does not declare the parameter. The component (A) may be fine.

The constant (B) is for project and is valid. The runtimeConfig values (C) are correct; the spec is simply missing the input definition.
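
The JSON from the question is not reproduced here, so the sketch below uses an assumed, simplified spec shaped like a compiled pipeline spec (`root.inputDefinitions`, `root.dag.tasks`, `componentInputParameter`). The helper `undeclared_pipeline_inputs` is hypothetical; it only illustrates the kind of mismatch described above.

```python
def undeclared_pipeline_inputs(pipeline_spec):
    """List parameters that tasks read from pipeline inputs but root never declares."""
    root = pipeline_spec.get("root", {})
    declared = set(root.get("inputDefinitions", {}).get("parameters", {}))
    missing = set()
    for task in root.get("dag", {}).get("tasks", {}).values():
        for param in task.get("inputs", {}).get("parameters", {}).values():
            ref = param.get("componentInputParameter")
            if ref is not None and ref not in declared:
                missing.add(ref)
    return sorted(missing)


# Assumed shape: the task wires 'eval_dataset' to a pipeline input,
# but the root inputDefinitions block is empty.
spec = {
    "root": {
        "inputDefinitions": {},
        "dag": {
            "tasks": {
                "model-eval": {
                    "componentRef": {"name": "comp-model-eval"},
                    "inputs": {
                        "parameters": {
                            "eval_dataset": {"componentInputParameter": "eval_dataset"},
                            "project": {"runtimeValue": {"constant": "my-project"}},
                        }
                    },
                }
            }
        },
    }
}

print(undeclared_pipeline_inputs(spec))
```

Declaring the parameter under `root.inputDefinitions.parameters` would make the same check return an empty list.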

46
MCQhard

A company wants to ensure only authorized users can deploy gen AI models. The current policy allows all users in the domain. What is the best practice to restrict deployment?

A.Remove the binding
B.Add more roles
C.Add condition to restrict deployment
D.Use organizational policies
AnswerC

Conditions in IAM allow policies like requiring a specific IP range or MFA for deployment actions.

Why this answer

Option C is correct because adding a condition to the existing binding (IAM Conditions in Google Cloud; AWS and Azure offer comparable mechanisms such as policy condition keys and Conditional Access) lets you limit model deployment to requests that match specific attributes, such as the target resource, resource tags, or request time. This is the best practice because it enforces fine-grained access control without removing existing permissions or adding unnecessary roles, directly addressing the requirement to restrict deployment while maintaining existing user access.

Exam trap

Google Cloud often tests the misconception that organizational policies (Option D) are the catch-all for access control, but they are designed for resource-level governance (e.g., disabling service creation), not for user-specific deployment restrictions, which require conditional IAM policies.

How to eliminate wrong answers

Option A is wrong because removing the binding (e.g., an IAM policy binding or role assignment) would revoke all deployment permissions for all users, which is too restrictive and would break legitimate use cases. Option B is wrong because adding more roles does not inherently restrict deployment; it only grants additional permissions, potentially widening the attack surface and violating the principle of least privilege. Option D is wrong because organizational policies (e.g., organization policies in GCP or Azure Policy) are typically used for compliance and governance at the resource hierarchy level, not for fine-grained, user-specific deployment restrictions; they lack the granularity to target individual authorized users.
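
As a rough sketch of what "adding a condition" looks like, the snippet below attaches a condition block to a binding in an IAM policy represented as a plain Python dict. The helper `add_condition`, the domain, and the expiry timestamp are hypothetical, and the time-based expression is only an example; which condition attributes a given service supports should be checked in its documentation.

```python
import copy


def add_condition(policy, role, condition):
    """Return a copy of policy with condition attached to every binding for role."""
    updated = copy.deepcopy(policy)
    for binding in updated.get("bindings", []):
        if binding.get("role") == role:
            binding["condition"] = dict(condition)
    # Policies that contain conditional bindings must use policy version 3.
    updated["version"] = 3
    return updated


policy = {
    "version": 1,
    "bindings": [
        {"role": "roles/aiplatform.user", "members": ["domain:example.com"]},
    ],
}

condition = {
    "title": "temporary-deploy-access",
    "description": "Illustrative only: access expires at a fixed time.",
    "expression": 'request.time < timestamp("2030-01-01T00:00:00Z")',
}

restricted = add_condition(policy, "roles/aiplatform.user", condition)
print(restricted)
```

The original binding stays in place; the condition narrows when it applies instead of removing access outright.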

47
MCQmedium

An organization uses an IAM policy for Vertex AI as shown. A security audit reveals that engineer@example.com deployed a model that inadvertently exposed sensitive data. What is the most likely reason this happened?

A.Audit logging is not enabled for DATA_WRITE events.
B.The admin user did not review the deployment.
C.The engineer had the aiplatform.user role, which includes permissions to deploy models without additional review.
D.The policy does not include a separation of duties between development and production.
AnswerC

The user role allows deployment, and no approval gate is enforced.

Why this answer

Option C is correct because the `aiplatform.user` role in Vertex AI includes permission to deploy models to endpoints (for example, `aiplatform.endpoints.deploy`), which allows any user with that role to deploy models without requiring additional approvals or administrative review. This lack of a secondary authorization step means the engineer could deploy a model that exposed sensitive data, even if the model had not been properly vetted for data leakage.

Exam trap

The trap here is that candidates may focus on operational failures like missing audit logs or lack of review, rather than recognizing that the IAM role itself grants the permission to deploy without any guardrails, which is the direct technical cause of the exposure.

How to eliminate wrong answers

Option A is wrong because audit logging for DATA_WRITE events records actions after they occur, but does not prevent the deployment itself; the exposure happened due to insufficient permissions control, not missing logs. Option B is wrong because the admin user not reviewing the deployment is a process failure, but the root cause is that the IAM policy granted the engineer the ability to deploy without any review being required. Option D is wrong because while separation of duties is a best practice, the specific IAM policy shown does not enforce it; the question asks for the most likely reason the exposure occurred, which is the direct permission granted by the `aiplatform.user` role.

48
MCQeasy

A small marketing agency with 10 employees is exploring generative AI to create personalized ad copy for their clients. They have a limited budget of $5,000 per month and no in-house machine learning expertise. The CEO wants to have a working prototype within two weeks to show to a potential client. The agency's data is sensitive and cannot be shared with unauthorized third parties. Which strategy should they pursue?

A.Hire a team of data scientists to fine-tune an open-source model
B.Use a third-party platform that requires on-premise deployment
C.Build a custom foundation model from scratch using their client data
D.Use Google's Generative AI Studio with pre-trained models via API
AnswerD

Managed service enables quick, low-cost prototyping with data privacy.

Why this answer

Option D is correct because Google's Generative AI Studio provides pre-trained models via API, allowing the agency to quickly prototype personalized ad copy without needing in-house ML expertise. This approach respects the $5,000 budget (API usage is cost-effective for small-scale prototyping), meets the two-week timeline (no training required), and ensures data privacy by using Google Cloud's data governance controls (data is not shared with unauthorized third parties).

Exam trap

Google Cloud often tests the misconception that building or fine-tuning a model from scratch is the only way to achieve customization, when in fact pre-trained APIs with prompt engineering or lightweight fine-tuning can meet business constraints like budget, timeline, and expertise.

How to eliminate wrong answers

Option A is wrong because hiring a team of data scientists to fine-tune an open-source model would exceed the $5,000 monthly budget and the two-week timeline, and the agency lacks the in-house expertise to manage such a team. Option B is wrong because requiring on-premise deployment contradicts the agency's lack of ML expertise and limited budget; on-premise solutions typically involve high upfront costs and ongoing maintenance. Option C is wrong because building a custom foundation model from scratch is prohibitively expensive (often millions of dollars), requires vast amounts of data and compute resources, and cannot be completed within two weeks or within a $5,000 budget.

49
Multi-Selectmedium

Which TWO factors are most critical when deciding to build a custom GenAI model vs. using a pre-built API? (Select two.)

Select 2 answers
A.Availability of in-house ML talent
B.Need for domain-specific knowledge
C.Number of layers in the model
D.Brand reputation of the model provider
E.Volume of expected inference requests
AnswersA, B

Building a custom model requires significant ML expertise; without it, using an API is more practical.

Why this answer

Option A is correct because building a custom GenAI model requires specialized machine learning expertise, including proficiency in frameworks like PyTorch or TensorFlow, experience with distributed training (e.g., using Horovod or DeepSpeed), and the ability to fine-tune architectures like transformers. Without in-house ML talent, the organization cannot effectively manage data curation, hyperparameter tuning, or model evaluation, making a pre-built API the more viable choice. This factor directly determines whether the organization has the technical capacity to undertake custom development.

Exam trap

Google Cloud often tests the distinction between strategic business factors (like in-house talent and domain specificity) versus operational or vendor-related details (like model layers, brand reputation, or request volume) to see if candidates can separate high-level decision drivers from low-level implementation concerns.

50
MCQhard

A company wants to use generative AI for creative content generation (e.g., marketing copy). They need to ensure the content is original and does not plagiarize existing materials. Which combination of strategies is most effective?

A.Use a model with a high temperature setting and post-process with plagiarism checker.
B.Fine-tune the model on a dataset of already-created content to learn style.
C.Use a retrieval-augmented generation system that explicitly avoids copying.
D.Limit the model to generate only short snippets.
AnswerC

RAG can be configured to paraphrase or generate novel content while staying relevant, reducing plagiarism risk.

Why this answer

Option C is correct because retrieval-augmented generation (RAG) systems explicitly retrieve relevant, non-copyrighted or licensed content from a curated knowledge base and generate outputs grounded in that retrieved data, which inherently reduces the risk of verbatim copying. Unlike simple plagiarism checkers or temperature adjustments, RAG combines retrieval with generation to ensure originality by design, making it the most effective strategy for avoiding plagiarism in creative content generation.

Exam trap

Google Cloud often tests the misconception that randomness (high temperature) or post-processing (plagiarism checkers) can prevent plagiarism, when in fact only retrieval-augmented generation or similar grounding techniques address the root cause of copying from training data.

How to eliminate wrong answers

Option A is wrong because high temperature settings increase randomness and creativity but do not prevent the model from memorizing and reproducing training data verbatim; a post-process plagiarism checker can only detect copying after generation, not prevent it, and may miss paraphrased or structurally similar content. Option B is wrong because fine-tuning on already-created content teaches the model to mimic existing styles and patterns, which increases the risk of overfitting and reproducing copyrighted or plagiarized material, especially if the dataset contains protected works. Option D is wrong because limiting output length does not address the core issue of originality; short snippets can still be direct copies of existing phrases or sentences, and the strategy fails to ensure content is novel or properly attributed.

51
MCQmedium

A manufacturing company wants to use generative AI to create maintenance manuals from sensor data. The manuals must be accurate and reflect the latest equipment configurations. Which approach best ensures data freshness and consistency?

A.Train the model in real-time as sensor data streams in.
B.Periodically retrain the model with the latest sensor data.
C.Have human technicians review and update the manuals manually.
D.Use a retrieval-augmented generation (RAG) system that queries a live database of sensor configurations.
AnswerD

RAG ensures responses are based on the most current data.

Why this answer

Option D is correct because a retrieval-augmented generation (RAG) system retrieves the most current equipment configurations directly from a live database at inference time, ensuring the generated manual reflects real-time sensor data without requiring model retraining. This approach decouples the static knowledge in the LLM from the dynamic data source, guaranteeing both accuracy and freshness while avoiding the latency and cost of continuous retraining.

Exam trap

Google Cloud often tests the misconception that retraining (Option B) is the only way to keep an LLM current, when in fact RAG provides a more efficient and accurate mechanism for incorporating live data without modifying the model itself.

How to eliminate wrong answers

Option A is wrong because training a model in real-time as sensor data streams in is impractical due to catastrophic forgetting, high computational overhead, and the inability of online learning to guarantee that the model's weights stabilize to reflect the latest configurations without extensive validation. Option B is wrong because periodic retraining introduces a window of staleness between retraining cycles, during which sensor data may change, leading to manuals that are not current; it also requires significant infrastructure for data collection, preprocessing, and model deployment. Option C is wrong because manual review and update by human technicians is slow, error-prone, and cannot scale to the volume and velocity of sensor data, defeating the purpose of using generative AI for automation.
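
The key property of the RAG approach, that fresh data is fetched at request time rather than baked into model weights, can be sketched as below. The in-memory `LIVE_CONFIGS` dict stands in for a live database, and `build_grounded_prompt` is a hypothetical helper; the model call is omitted.

```python
# Stand-in for a live configuration database (hypothetical data).
LIVE_CONFIGS = {"pump-7": {"firmware": "2.4.1", "max_rpm": 3200}}


def build_grounded_prompt(equipment_id, config_store):
    """Look up the current configuration and embed it in the prompt."""
    config = config_store[equipment_id]  # fetched at request time
    facts = "\n".join(f"- {key}: {value}" for key, value in sorted(config.items()))
    return (
        f"Using only the configuration below, write the maintenance steps "
        f"for {equipment_id}.\n\nCurrent configuration:\n{facts}"
    )


print(build_grounded_prompt("pump-7", LIVE_CONFIGS))

# A configuration change is reflected on the very next request, with no retraining.
LIVE_CONFIGS["pump-7"]["firmware"] = "2.5.0"
print(build_grounded_prompt("pump-7", LIVE_CONFIGS))
```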

52
MCQmedium

A company wants to use Generative AI for customer support chatbots. They are concerned about cost and latency. Which deployment option best balances these concerns?

A.Deploy an open-source model on-premise to avoid cloud costs
B.Rely on a third-party chatbot API that abstracts the model
C.Use the largest available foundation model via API for highest accuracy
D.Use a fine-tuned version of a smaller model on Vertex AI with response caching
AnswerD

A tuned smaller model reduces compute cost and caching minimizes repeated inference, lowering latency. Vertex AI provides scalable infrastructure.

Why this answer

Option D is correct because using a fine-tuned smaller model on Vertex AI with response caching reduces both cost and latency. Smaller models require fewer computational resources, and caching avoids redundant inference calls, directly addressing the company's concerns without sacrificing accuracy for the specific task.

Exam trap

Google Cloud often tests the misconception that 'larger model = better accuracy always' or that 'on-premise is always cheaper,' ignoring the total cost of ownership, scaling overhead, and the efficiency gains from fine-tuning and caching for specific use cases.

How to eliminate wrong answers

Option A is wrong because deploying on-premise incurs high upfront hardware and maintenance costs, and may not scale efficiently for variable customer support loads, often increasing total cost of ownership (TCO) despite avoiding cloud fees. Option B is wrong because relying on a third-party chatbot API abstracts the model but does not inherently optimize cost or latency; it may introduce per-call pricing and network overhead, and the provider controls model size and caching. Option C is wrong because using the largest available foundation model via API maximizes accuracy but also maximizes inference cost and latency due to higher parameter count and compute requirements, which is the opposite of balancing cost and latency.
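
Response caching can be sketched as follows. This is a minimal in-memory illustration: `CachedResponder` and `fake_model` are hypothetical names, the normalization is deliberately simple, and a production cache would also need expiry and size limits.

```python
class CachedResponder:
    """Wrap a generate function and reuse answers for repeated questions."""

    def __init__(self, generate):
        self._generate = generate
        self._cache = {}
        self.model_calls = 0

    def answer(self, question):
        # Normalize case and whitespace so trivially different phrasings share a key.
        key = " ".join(question.lower().split())
        if key not in self._cache:
            self.model_calls += 1
            self._cache[key] = self._generate(question)
        return self._cache[key]


def fake_model(question):
    """Stand-in for a call to a deployed model endpoint."""
    return f"Answer to: {question}"


responder = CachedResponder(fake_model)
first = responder.answer("How do I reset my password?")
second = responder.answer("how do I  reset my password?")
print(first == second, responder.model_calls)
```

Only the first question reaches the model; the repeat is served from memory, which is where both the cost and latency savings come from.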

53
MCQhard

A company is evaluating the ROI of a generative AI project. Which metric is most appropriate?

A.Reduction in time to complete tasks using the generative AI tool
B.Reduction in model error rate on a test set
C.Increase in user satisfaction scores
D.Cost per inference compared to historical average
AnswerA

Time savings directly translate to labor cost reduction or increased throughput, providing a clear ROI.

Why this answer

Option A is correct because the primary business justification for a generative AI project is operational efficiency, measured directly by the reduction in time to complete tasks. Unlike technical metrics such as model error rate, this metric ties the AI's output to tangible productivity gains, which is the core of ROI analysis in a business context. Generative AI tools are designed to augment human workflows, so time savings translate into cost savings and increased throughput, making it the most appropriate metric for evaluating return on investment.

Exam trap

Google Cloud often tests the distinction between technical performance metrics (like model error rate) and business outcome metrics, trapping candidates who default to evaluating AI models as they would in a data science context rather than from a business leadership perspective.

How to eliminate wrong answers

Option B is wrong because reduction in model error rate on a test set is a technical performance metric, not a business ROI metric; it measures model accuracy but does not account for the cost of deployment, user adoption, or actual business value generated. Option C is wrong because increase in user satisfaction scores, while valuable, is a lagging indicator that can be influenced by factors unrelated to the AI's direct impact on productivity or cost, and it does not quantify financial return. Option D is wrong because cost per inference compared to historical average focuses solely on operational cost efficiency, ignoring the revenue or time-saving benefits that the generative AI tool provides, thus failing to capture the full ROI picture.
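
A simple way to turn time savings into an ROI figure is sketched below. The formula is the usual (benefit - cost) / cost, and every input value is a made-up assumption for illustration.

```python
def simple_roi(hours_saved_per_task, tasks_per_month, hourly_cost, monthly_ai_cost):
    """ROI as a ratio: (labor value of time saved - AI cost) / AI cost."""
    benefit = hours_saved_per_task * tasks_per_month * hourly_cost
    return (benefit - monthly_ai_cost) / monthly_ai_cost


# Hypothetical: 30 minutes saved on each of 2,000 tasks, $40/hour labor, $10,000/month AI spend.
roi = simple_roi(0.5, 2000, 40.0, 10000.0)
print(f"ROI: {roi:.0%}")
```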

54
MCQeasy

A startup wants to generate concise summaries of long news articles using an LLM on Vertex AI. They prioritize low latency and cost. Which model choice is most appropriate?

A.Use Gemini 1.5 Pro for the highest accuracy.
B.Use PaLM 2 Bison, as it is the most economical.
C.Use Vertex AI Text Embeddings, since embeddings can generate summaries.
D.Use Gemini 1.5 Flash, which is designed for high throughput and low cost.
AnswerD

D is correct because Flash balances performance and cost.

Why this answer

Gemini 1.5 Flash is optimized for high-throughput, low-latency, and cost-efficient summarization tasks, making it the ideal choice for a startup that needs to process long news articles quickly without incurring high costs. It balances performance and economy, whereas Gemini 1.5 Pro prioritizes accuracy at higher latency and cost, and PaLM 2 Bison is less efficient for this use case.

Exam trap

The trap here is that candidates often assume the most accurate model (Gemini 1.5 Pro) is always the best choice, overlooking the specific business requirements for low latency and cost, which Gemini 1.5 Flash directly addresses.

How to eliminate wrong answers

Option A is wrong because Gemini 1.5 Pro, while offering high accuracy, has higher latency and cost, which contradicts the startup's priority for low latency and cost. Option B is wrong because PaLM 2 Bison is not the most economical for summarization; it is a general-purpose model that may not provide the optimized throughput and cost-efficiency of Gemini 1.5 Flash, and it is being deprecated in favor of newer models. Option C is wrong because Vertex AI Text Embeddings generate vector representations of text, not natural language summaries; they cannot produce concise textual summaries directly.

55
Multi-Selecteasy

A company is considering using gen AI for customer support. Which two business strategies are most important for success?

Select 2 answers
A.Measure customer satisfaction metrics
B.Ignore data privacy
C.Deploy without testing
D.Ensure human-in-the-loop for critical interactions
E.Use the cheapest model
AnswersA, D

Metrics help evaluate success and guide improvements.

Why this answer

Measuring customer satisfaction metrics (A) is critical because it provides quantitative feedback on the generative AI system's performance, enabling iterative improvements to the model's responses and alignment with business goals. Without metrics like CSAT or NPS, the company cannot validate whether the AI is reducing resolution time or improving user experience, which are key ROI indicators for gen AI deployments.

Exam trap

Google Cloud often tests the misconception that cost optimization (cheapest model) or speed-to-market (deploy without testing) are primary success factors, when in reality governance, safety, and continuous measurement are the foundational strategies for sustainable gen AI adoption.

56
MCQhard

A financial services firm wants to deploy generative AI for automated investment advice. They are subject to strict regulatory oversight requiring explainability and audit trails. Which strategy best meets these requirements?

A.Fine-tune a model on historical trading data without human review.
B.Use a black-box large language model with monitoring.
C.Deploy a rule-based system augmented with generative AI for content generation.
D.Implement human-in-the-loop with full logging of model inputs, outputs, and human decisions.
AnswerD

This provides a transparent audit trail and human accountability, satisfying regulatory demands.

Why this answer

Option D is correct because it directly addresses the regulatory requirements for explainability and audit trails by incorporating human oversight and comprehensive logging. The human-in-the-loop (HITL) mechanism ensures that critical investment decisions are reviewed by qualified professionals, while full logging of model inputs, outputs, and human decisions creates a transparent, auditable record. This approach satisfies financial regulations like MiFID II or SEC rules that mandate explainability and accountability in automated advice systems.

Exam trap

Google Cloud often tests the misconception that monitoring or rule-based augmentation alone is sufficient for regulatory compliance, when in fact strict oversight and complete audit trails are mandatory for explainability in high-stakes domains like finance.

How to eliminate wrong answers

Option A is wrong because fine-tuning a model on historical trading data without human review introduces risks of overfitting to past market conditions and lacks the necessary audit trail and explainability for regulatory compliance. Option B is wrong because using a black-box large language model with monitoring still fails to provide the required explainability, as the internal decision-making process remains opaque and cannot be audited or justified to regulators. Option C is wrong because a rule-based system augmented with generative AI for content generation, while more transparent, still lacks the structured human oversight and full logging of decisions needed to meet strict audit trail requirements, and the generative AI component can introduce unpredictable outputs that undermine explainability.

57
MCQmedium

A healthcare provider plans to implement gen AI for clinical note summarization. They have limited AI expertise. Which Google Cloud approach best aligns with their business strategy?

A.Hire a team of data scientists
B.Use Vertex AI Agent Builder with pre-built templates
C.Deploy an open-source model on Compute Engine
D.Build a custom model from scratch
AnswerB

Leverages managed services and reduces the need for in-house AI expertise.

Why this answer

Vertex AI Agent Builder provides pre-built templates and a low-code interface specifically designed for organizations with limited AI expertise. It enables rapid deployment of generative AI solutions like clinical note summarization without requiring deep data science skills, directly aligning with the healthcare provider's business strategy of minimizing technical overhead while leveraging AI.

Exam trap

Google Cloud often tests the misconception that 'more technical control' (e.g., custom models or open-source deployment) is always better, but the trap here is that the question explicitly prioritizes business strategy and limited expertise, making low-code/no-code solutions like Vertex AI Agent Builder the correct choice over technically complex alternatives.

How to eliminate wrong answers

Option A is wrong because hiring a team of data scientists contradicts the 'limited AI expertise' constraint and introduces significant cost and time overhead, which is not a strategic fit for rapid implementation. Option C is wrong because deploying an open-source model on Compute Engine requires substantial DevOps, model tuning, and infrastructure management expertise, which the provider lacks. Option D is wrong because building a custom model from scratch demands advanced machine learning skills, large labeled datasets, and extensive training resources, making it impractical for an organization with limited AI expertise.

58
MCQmedium

A startup with $500k in seed funding wants to integrate GenAI into their SaaS product for automated report generation. They have 2 ML engineers and expect 10,000 monthly users initially. They estimate that using a foundation model API (e.g., Gemini) will cost $0.10 per 1K tokens, and each report uses about 5K tokens. Alternatively, they could fine-tune an open-source model on their domain data, estimated at $50k for compute and $20k for engineering time, with inference cost of $0.02 per 1K tokens on a dedicated endpoint. Which approach is more cost-effective over the first 12 months assuming 50,000 reports per month?

A.Use the foundation model API because it has lower upfront cost
B.Use a combination of both depending on report complexity
C.Build a custom model from scratch
D.Fine-tune the open-source model because it has lower per-report cost
AnswerD

Fine-tuning yields a lower per-token cost, resulting in about $130k total over the first year, which is cheaper than the roughly $300k the API would cost.

Why this answer

Option D is correct because the total cost of fine-tuning over 12 months is $70,000 upfront plus $0.02 per 1K tokens * 5K tokens per report * 50,000 reports per month * 12 months = $60,000, totaling $130,000. The API approach costs $0.10 per 1K tokens * 5K tokens * 50,000 reports * 12 = $300,000, so fine-tuning costs less than half as much at this volume despite the upfront investment.
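The comparison can be checked by working directly from the figures in the question. The sketch below uses only those figures; the function and constant names are illustrative, and `Decimal` is used to avoid floating-point rounding on the prices.

```python
from decimal import Decimal

TOKENS_PER_REPORT_K = 5        # each report uses about 5K tokens
REPORTS_PER_MONTH = 50_000
MONTHS = 12


def twelve_month_cost(price_per_1k_tokens, upfront=Decimal("0")):
    """Upfront cost plus inference cost for all reports over the period."""
    per_report = Decimal(price_per_1k_tokens) * TOKENS_PER_REPORT_K
    return upfront + per_report * REPORTS_PER_MONTH * MONTHS


api_total = twelve_month_cost("0.10")
fine_tune_total = twelve_month_cost(
    "0.02", upfront=Decimal("50000") + Decimal("20000")  # compute + engineering
)

print(f"API:       ${api_total:,.0f}")
print(f"Fine-tune: ${fine_tune_total:,.0f}")
```

With these inputs the API comes to $300,000 and fine-tuning to $130,000 over 12 months. The per-report saving is $0.40, so the $70,000 upfront cost is recovered after 175,000 reports, or 3.5 months at the stated volume.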

Exam trap

Google Cloud often tests the misconception that lower upfront cost always means lower total cost, ignoring the multiplicative effect of per-unit costs at scale—candidates fixate on the $70k fine-tuning investment versus the API's zero upfront cost without calculating the 12-month total.

How to eliminate wrong answers

Option A is wrong because it ignores the per-report cost at scale; the API's $0.10 per 1K tokens leads to $300k over 12 months, more than double the fine-tuning total of $130k. Option B is wrong because a combination approach would not reduce costs: using the API for complex reports still incurs high per-token costs, and the problem does not specify complexity tiers that would justify splitting workloads. Option C is wrong because building a custom model from scratch requires massive data, compute, and engineering resources (often millions of dollars), far beyond the $500k seed funding and small team, making it impractical for a startup.

59
MCQmedium

A company wants to use GenAI to automate customer support. They have a large knowledge base. Which approach maximizes ROI in the first 6 months?

A.Deploy a general-purpose chatbot without customization
B.Use a pre-built conversational AI platform with Retrieval-Augmented Generation (RAG)
C.Build a custom LLM from scratch using their data
D.Fine-tune a foundation model on historical support tickets
AnswerB

A pre-built platform with RAG allows rapid deployment and leverages existing knowledge base, maximizing ROI in the short term.

Why this answer

Option B maximizes ROI in the first 6 months because it leverages a pre-built conversational AI platform integrated with Retrieval-Augmented Generation (RAG). RAG allows the model to dynamically retrieve relevant information from the existing knowledge base at inference time, providing accurate, context-aware responses without the need for costly retraining or custom model development. This approach balances rapid deployment, low upfront investment, and high accuracy, making it the most cost-effective solution for automating customer support quickly.
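The retrieve-then-generate flow behind RAG can be illustrated with a toy example. This is a sketch under simplifying assumptions: word overlap stands in for the vector search a real platform would use, the knowledge-base entries are made up, and no model is actually called; only the grounded prompt is built.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, knowledge_base, top_k=1):
    """Rank articles by word overlap with the question (a stand-in for vector search)."""
    q = tokenize(question)
    ranked = sorted(knowledge_base, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, knowledge_base):
    """Place the retrieved passages in the prompt so the answer is grounded in them."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, knowledge_base))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical knowledge-base entries, for illustration only.
kb = [
    "Refunds are issued within 5 business days of receiving the returned item.",
    "Password resets are done from the account settings page.",
]
prompt = build_prompt("How long do refunds take?", kb)
```

Because the knowledge base is consulted at request time, updating an article changes the next answer without any retraining, which is the property that makes this approach quick to deploy.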

Exam trap

Google Cloud often tests the misconception that fine-tuning is always the best way to incorporate proprietary data, but the trap here is that fine-tuning does not provide real-time access to a dynamic knowledge base and is far more resource-intensive than RAG, which is the optimal strategy for rapid, cost-effective deployment in customer support scenarios.

How to eliminate wrong answers

Option A is wrong because deploying a general-purpose chatbot without customization would rely solely on the model's pre-trained knowledge, which lacks access to the company's specific knowledge base, leading to frequent hallucinations and incorrect answers that degrade customer trust and require extensive human oversight. Option C is wrong because building a custom LLM from scratch using their data is prohibitively expensive (often millions of dollars) and time-consuming (typically 12+ months), far exceeding the 6-month ROI window and requiring massive computational resources and specialized ML teams. Option D is wrong because fine-tuning a foundation model on historical support tickets alone does not incorporate the live knowledge base; it only adapts the model to past conversation patterns, which may become stale or miss updated information, and still requires significant compute and data preparation costs without the real-time retrieval capability that RAG provides.

60
MCQhard

A financial institution wants to use generative AI to generate personalized investment advice. They face strict regulatory requirements on explainability and bias. Which approach should they take?

A.Use a foundation model with prompt engineering
B.Use a custom model trained from scratch
C.Use a RAG system with curated proprietary data
D.Use a closed-source model with vendor lock-in
AnswerC

Enables control, explainability, and bias auditing.

Why this answer

Option C is correct because a Retrieval-Augmented Generation (RAG) system allows the financial institution to ground generative AI outputs in curated, proprietary data sources (e.g., regulatory guidelines, client risk profiles, historical performance). This approach enhances explainability by enabling traceable citations back to specific documents, and reduces bias by controlling the data fed to the model, which is critical for meeting strict regulatory requirements like GDPR or SEC rules on algorithmic fairness.

Exam trap

Google Cloud often tests the misconception that prompt engineering alone can solve domain-specific compliance needs, when in reality RAG is required to ground outputs in curated, auditable data for regulated industries.

How to eliminate wrong answers

Option A is wrong because prompt engineering alone on a foundation model does not guarantee explainability or bias control; the model may still generate outputs based on its pre-trained, opaque weights, making it impossible to trace advice to specific regulatory or proprietary data. Option B is wrong because training a custom model from scratch requires massive amounts of labeled, unbiased data and computational resources, and still risks hidden biases in the training process, while also lacking the built-in retrieval mechanism for transparent, auditable citations. Option D is wrong because a closed-source model with vendor lock-in limits the institution's ability to audit the model's internal logic, customize bias mitigation, or ensure compliance with evolving regulations, as the vendor controls all updates and data handling.

61
MCQmedium

A financial institution wants to deploy a generative AI solution for contract analysis. They need to ensure compliance with regulations. Which approach is best?

A.Deploy a large open-source model fine-tuned on public legal documents
B.Use a general-purpose pre-trained model with no modifications to minimize risk
C.Fine-tune a model on a curated dataset of past contracts and implement human-in-the-loop review
D.Implement retrieval-augmented generation (RAG) with the company's legal document database
AnswerC

Fine-tuning on relevant data improves accuracy, and human review catches any regulatory violations before finalization.

Why this answer

Option C is best because fine-tuning on a curated dataset of past contracts ensures the model learns domain-specific language and compliance patterns, while human-in-the-loop review provides a critical safety net for regulatory adherence. This combination directly addresses the need for accuracy and accountability in contract analysis, where errors can have legal consequences.

Exam trap

Google Cloud often tests the misconception that retrieval-augmented generation (RAG) alone is sufficient for domain-specific compliance, when in fact it requires fine-tuning or strict validation to prevent misinterpretation of retrieved legal texts.

How to eliminate wrong answers

Option A is wrong because deploying a large open-source model fine-tuned on public legal documents introduces risks from unvetted, potentially outdated or jurisdictionally inappropriate data, and lacks the controlled curation needed for compliance. Option B is wrong because a general-purpose pre-trained model with no modifications will lack the specialized knowledge of contract law, regulatory terms, and clause structures, leading to high error rates and non-compliance. Option D is wrong because retrieval-augmented generation (RAG) with the company's legal document database, while useful for grounding responses, does not inherently train the model on compliance patterns and still requires careful prompt engineering and validation to avoid hallucinations in critical contract analysis.

62
MCQhard

A global company deploying gen AI across multiple regions needs to minimize latency and comply with data sovereignty. What architecture should they adopt?

A.Single global deployment with CDN
B.Multi-region deployment with Vertex AI
C.Use a third-party API
D.On-premises deployment only
AnswerB

Vertex AI supports deploying models in multiple regions, reducing latency and enabling data residency compliance.

Why this answer

Option B is correct because multi-region deployment with Vertex AI allows serving models close to users (low latency) while adhering to data residency requirements. Option A is wrong because a single global deployment may violate data sovereignty and increase latency. Option C is wrong because third-party APIs may not offer multi-region data control. Option D is wrong because on-premises deployment is costly and limits scalability.

63
MCQhard

A media company is using a generative AI model to create video captions. The model is deployed on Vertex AI with autoscaling. During peak hours, they observe high latency and request timeouts. Which action would most effectively address this issue?

A.Optimize the prompt to reduce output length
B.Reduce the maximum number of replicas to limit resource usage
C.Switch to a GPU-based machine type for faster inference
D.Increase the minimum number of replicas in the autoscaling configuration
AnswerD

Higher minimum replicas reduce cold starts and improve latency during traffic spikes.

Why this answer

Increasing the minimum number of replicas ensures that during peak hours, the model already has a baseline of warm instances ready to handle requests, reducing cold-start latency and preventing timeouts. Autoscaling can take time to spin up new replicas, so a higher minimum replica count directly mitigates the latency spike by pre-provisioning capacity.
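The effect of warm capacity can be seen in a toy simulation. This is not how Vertex AI's autoscaler is implemented; it is a simplified model with made-up numbers in which new replicas become ready only after a fixed delay, and any load above current capacity counts as timed out.

```python
def timed_out_requests(traffic, min_replicas, max_replicas,
                       per_replica_capacity, scale_up_delay):
    """Count requests that exceed serving capacity in a toy autoscaler."""
    ready = min_replicas
    pending = []   # steps remaining until each requested replica is ready
    dropped = 0
    for load in traffic:
        # Replicas requested earlier become ready once their delay elapses.
        pending = [t - 1 for t in pending]
        ready += sum(1 for t in pending if t <= 0)
        pending = [t for t in pending if t > 0]

        capacity = ready * per_replica_capacity
        if load > capacity:
            dropped += load - capacity
            needed = -(-load // per_replica_capacity)   # ceiling division
            to_add = min(needed, max_replicas) - ready - len(pending)
            pending.extend([scale_up_delay] * max(0, to_add))
    return dropped


# Hypothetical peak-hour traffic per time step.
traffic = [100, 400, 400, 400, 100]
cold = timed_out_requests(traffic, min_replicas=1, max_replicas=4,
                          per_replica_capacity=100, scale_up_delay=2)
warm = timed_out_requests(traffic, min_replicas=4, max_replicas=4,
                          per_replica_capacity=100, scale_up_delay=2)
```

With one warm replica, the spike arrives two steps before the extra replicas are ready and 600 requests exceed capacity; with four warm replicas, none do. The trade-off is that the higher minimum is paid for even during quiet periods.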

Exam trap

The trap here is that candidates confuse performance optimization (faster inference per request) with capacity planning (ensuring enough concurrent replicas), leading them to choose GPU upgrades or prompt tweaks instead of addressing the autoscaling configuration.

How to eliminate wrong answers

Option A is wrong because optimizing the prompt to reduce output length may lower per-request compute time but does not address the root cause of insufficient concurrent serving capacity during traffic spikes. Option B is wrong because reducing the maximum number of replicas would cap the autoscaler's ability to add instances, worsening the bottleneck and increasing timeouts. Option C is wrong because switching to a GPU-based machine type can accelerate inference per request but does not solve the scaling issue; it may even increase cold-start time and cost without guaranteeing enough replicas to handle peak load.

64
Multi-Selectmedium

Which THREE are best practices for responsible deployment of generative AI in a customer-facing application?

Select 3 answers
A.Implement human-in-the-loop review for sensitive outputs
B.Train the model on all available data to maximize coverage
C.Implement content filters to block inappropriate outputs
D.Use only small models to reduce risk
E.Conduct regular bias and fairness audits
AnswersA, C, E

Human review adds accountability and error correction.

Why this answer

Option A is correct because human-in-the-loop (HITL) review ensures that sensitive outputs—such as those involving protected health information (PHI), personally identifiable information (PII), or high-stakes decisions—are vetted by a human before reaching the customer. This mitigates the risk of harmful or biased generations that automated guardrails might miss, aligning with responsible AI principles like accountability and safety.

Exam trap

Google Cloud often tests the misconception that 'more data is always better' or that 'smaller models are safer,' when in fact responsible deployment hinges on data quality, continuous monitoring, and layered safeguards rather than model size or data volume alone.

65
MCQhard

A healthcare provider wants to use generative AI to automatically draft clinical notes from doctor-patient conversations. They must comply with HIPAA and ensure patient data privacy. Which strategy best meets their requirements?

A.Outsource note generation to a third-party HIPAA-compliant vendor
B.Use Google Cloud Healthcare API integrated with Vertex AI
C.Deploy a custom model on-premises with strict access controls
D.Use a public LLM with a data anonymization pipeline
AnswerB

The Healthcare API is HIPAA-compliant and allows secure AI processing.

Why this answer

Option B is correct because Google Cloud Healthcare API with Vertex AI provides a HIPAA-compliant, managed environment that integrates generative AI capabilities directly with healthcare data. The Healthcare API enforces data residency, access controls, and audit logging, while Vertex AI allows fine-tuning or using foundation models without exposing PHI to public endpoints. This combination ensures patient data privacy and regulatory compliance without requiring on-premises infrastructure.

Exam trap

Google Cloud often tests the misconception that on-premises deployment (Option C) is always the most secure choice, but the trap here is that cloud-native HIPAA-compliant services like Google Cloud Healthcare API can offer superior security, compliance, and scalability when properly configured with BAAs and data residency controls.

How to eliminate wrong answers

Option A is wrong because outsourcing to a third-party vendor introduces additional risk of data exposure during transmission and requires extensive Business Associate Agreements (BAAs) and due diligence, which may not fully align with the provider's direct control over privacy. Option C is wrong because deploying a custom model on-premises, while secure, is often cost-prohibitive and lacks the scalability and managed compliance features of cloud-native solutions like Google Cloud Healthcare API, which already handles HIPAA requirements. Option D is wrong because using a public LLM with a data anonymization pipeline is risky; anonymization is not foolproof and can be reversed via re-identification attacks, and public LLMs typically do not offer HIPAA-compliant data processing guarantees, violating privacy requirements.

66
MCQmedium

A company is building a search application that requires grounding answers in their internal knowledge base. They want to use Vertex AI Search and Conversation with a custom datastore. Which configuration is essential to ensure the model only answers based on their documents?

A.Enable streaming responses to get real-time answers.
B.Fine-tune the model on the company's documents.
C.Configure the answer generation to use grounding with the enterprise datastore as the source.
D.Set the model's temperature to 0 to make responses deterministic.
AnswerC

C is correct because grounding ties the answer to the datastore content.

Why this answer

Option C is correct because Vertex AI Search and Conversation provides a built-in grounding capability that explicitly ties answer generation to a specified enterprise datastore. By configuring grounding with the custom datastore as the source, the model is constrained to retrieve and synthesize answers exclusively from the indexed documents, preventing reliance on its parametric knowledge or external sources.

Exam trap

Google Cloud often tests the distinction between techniques that influence output style (temperature, streaming) versus those that control knowledge sources (grounding), leading candidates to confuse deterministic generation with factual grounding.

How to eliminate wrong answers

Option A is wrong because enabling streaming responses controls the delivery mechanism (real-time token-by-token output) but does not restrict the model's knowledge source; it can still generate answers from its training data. Option B is wrong because fine-tuning adapts the model's weights to the company's documents, which can improve relevance but does not guarantee grounding—the model may still hallucinate or use pre-training knowledge, and Vertex AI Search does not require fine-tuning for retrieval-augmented generation. Option D is wrong because setting temperature to 0 makes responses deterministic (low randomness) but does not enforce grounding; the model can still confidently produce incorrect answers from its internal knowledge.

67
Multi-Selecthard

Which THREE factors should be considered when selecting a foundation model for a generative AI application in a regulated industry?

Select 3 answers
A.Transparency of the model's training data and sources
B.Support for data residency and sovereignty requirements
C.Latency and throughput requirements
D.Size of the model in terms of parameters
E.Bias and fairness evaluation results
AnswersA, B, E

Regulated industries require understanding of data provenance to ensure compliance.

Why this answer

Option A is correct because in regulated industries (e.g., healthcare, finance), transparency of training data and sources is critical for compliance with regulations like GDPR or HIPAA. Without knowing the provenance and composition of the training data, an organization cannot audit for prohibited content, verify consent, or ensure the model does not inadvertently expose sensitive information. This transparency directly impacts the ability to perform due diligence and meet legal obligations for data usage.

Exam trap

Google Cloud often tests the misconception that technical performance metrics (like latency or parameter count) are primary selection criteria for regulated industries, when in fact governance factors like transparency, data residency, and bias evaluation are the non-negotiable requirements.

68
MCQmedium

An ML engineer sees the above deployment output. The business wants to reduce inference cost. Which action should they take?

A.Use a larger model
B.Change to a lower-cost machine type
C.Deploy to multiple regions
D.Increase traffic split
AnswerB

Using a smaller machine type reduces per-request compute cost.

Why this answer

Option B is correct because switching to a lower-cost machine type directly reduces the per-request compute cost without altering the model architecture or inference logic. This is a common cost-optimization strategy in cloud-based ML deployments, where instance types (e.g., from GPU to CPU or from a larger to a smaller GPU) can be selected based on latency and throughput requirements, provided the model fits within the machine's memory and compute constraints.

Exam trap

Google Cloud often tests the misconception that 'more resources' (larger model, more regions) always improves performance, but here the business goal is cost reduction, so the correct action is to downsize infrastructure while maintaining acceptable quality.

How to eliminate wrong answers

Option A is wrong because using a larger model increases both memory footprint and compute operations per inference, which raises cost and latency—the opposite of the business goal. Option C is wrong because deploying to multiple regions adds infrastructure overhead, data transfer costs, and management complexity, increasing rather than reducing inference cost. Option D is wrong because increasing traffic split (e.g., routing more requests to a shadow or canary deployment) does not reduce cost; it may increase resource utilization or require additional compute capacity.

69
MCQeasy

A retail company wants to build a chatbot that answers product questions and provides personalized recommendations. They have a small labeled dataset and limited ML expertise. Which approach should they take?

A.Fine-tune Gemini with their product data using Vertex AI Generative AI Studio.
B.Build a custom transformer model using TensorFlow on Vertex AI Workbench.
C.Use BigQuery ML to train a classification model on customer queries.
D.Use Vertex AI Agent Builder with a pre-built agent and integrate their product catalog via Search and Conversation.
AnswerD

D is correct because it leverages managed services with minimal ML effort.

Why this answer

Option D is correct because Vertex AI Agent Builder provides a pre-built agent framework that integrates with Search and Conversation, allowing the company to quickly deploy a chatbot using their product catalog without needing extensive ML expertise. This approach leverages Google's foundation models and retrieval-augmented generation (RAG) to answer product questions and generate personalized recommendations, making it ideal for a small labeled dataset and limited ML resources.

Exam trap

Google Cloud often tests the misconception that fine-tuning or custom model building is necessary for domain-specific tasks, when in fact pre-built agent frameworks with RAG can achieve the same goal with far less data and expertise.

How to eliminate wrong answers

Option A is wrong because fine-tuning Gemini with a small labeled dataset risks overfitting and requires significant ML expertise to manage the fine-tuning pipeline, which the company lacks. Option B is wrong because building a custom transformer model from scratch using TensorFlow on Vertex AI Workbench demands deep ML expertise and large datasets, contradicting the company's constraints. Option C is wrong because BigQuery ML is designed for structured data classification (e.g., SQL-based models), not for building conversational chatbots that handle natural language queries and recommendations.

70
MCQeasy

A company is evaluating whether to build a custom generative AI solution from scratch or use a pre-built API from a cloud provider. Which factor most strongly supports the build-from-scratch approach?

A.The team has limited machine learning expertise.
B.Speed to market is the top priority.
C.Minimizing initial development cost is critical.
D.The solution requires deep integration with proprietary data and unique domain-specific outputs.
AnswerD

Custom models can be fine-tuned on proprietary data for unique needs.

Why this answer

Building a custom generative AI solution from scratch is most strongly supported when deep integration with proprietary data and unique domain-specific outputs is required. Pre-built APIs are typically trained on general data and may not capture the nuances of specialized domains, whereas a custom model can be fine-tuned or trained from scratch on proprietary datasets to achieve higher accuracy and relevance for unique business needs.

Exam trap

The trap here is that candidates may confuse 'minimizing cost' (Option C) with long-term total cost of ownership, but the exam specifically tests the immediate strategic driver for build vs. buy, which is the need for proprietary data integration and unique outputs.

How to eliminate wrong answers

Option A is wrong because limited ML expertise would favor using a pre-built API to avoid the complexity of model training, infrastructure management, and hyperparameter tuning. Option B is wrong because speed to market is a key advantage of pre-built APIs, which offer immediate access to generative capabilities without the months of development required for a custom solution. Option C is wrong because minimizing initial development cost typically favors pre-built APIs, which have lower upfront investment compared to the significant costs of data preparation, compute resources, and specialized talent needed for building from scratch.

71
MCQmedium

A data scientist is trying to get online predictions from a Vertex AI endpoint but receives the error shown. What is the most likely cause?

A.The region in the request does not match the endpoint region
B.The model has not been deployed to the specified endpoint
C.The endpoint ID is incorrect
D.The model ID is incorrect
AnswerB

The error message directly states the model is not deployed to the endpoint.

Why this answer

The error indicates that the model is not deployed to the endpoint. In Vertex AI, an endpoint is a resource that hosts one or more deployed models. If a model has not been deployed to the endpoint, any prediction request to that endpoint will fail with a 'model not found' or similar error, even if the endpoint ID and region are correct.

Exam trap

Google Cloud often tests the distinction between endpoint existence and model deployment, where candidates confuse a valid endpoint ID with the requirement that a model must be explicitly deployed to that endpoint before predictions can be served.

How to eliminate wrong answers

Option A is wrong because if the region in the request did not match the endpoint region, the error would typically be a 'region mismatch' or 'not found' error at the API routing level, not a model deployment error. Option C is wrong because an incorrect endpoint ID would result in a '404 Not Found' or 'endpoint not found' error, not a model deployment error. Option D is wrong because the model ID is not directly used in the prediction request to an endpoint; the endpoint routes to the deployed model, so an incorrect model ID would not cause this specific error unless the model was never deployed.

72
MCQhard

A media company uses generative AI to produce personalized news summaries. They notice that summaries occasionally contain factual errors and biased language. What business strategy should they implement to address these issues while maintaining user engagement?

A.Disable personalization and serve generic summaries to all users.
B.Allow users to flag errors and manually correct summaries in real-time.
C.Implement a human review layer for high-risk topics and use automated fact-checking for all content, with a feedback loop for model improvement.
D.Replace AI with entirely human-written summaries.
AnswerC

This ensures accuracy and allows continuous improvement.

Why this answer

Option C is correct because it balances accuracy and engagement by combining automated fact-checking with human review for high-risk topics. This hybrid approach reduces factual errors and biased language while maintaining the personalization that drives user engagement. The feedback loop continuously improves the model, addressing root causes rather than just symptoms.

Exam trap

Google Cloud often tests the misconception that either full automation or full human oversight is the only solution, when the correct answer is a hybrid approach that leverages the strengths of both AI and human judgment.

How to eliminate wrong answers

Option A is wrong because disabling personalization eliminates the core value proposition of generative AI for news summaries, likely reducing user engagement significantly without addressing the underlying model flaws. Option B is wrong because allowing real-time manual corrections by users is impractical at scale, introduces latency, and does not prevent errors from reaching users in the first place; it also lacks a systematic feedback mechanism for model improvement. Option D is wrong because replacing AI with entirely human-written summaries is cost-prohibitive, slow, and defeats the purpose of using generative AI for scalability and personalization.

73
MCQeasy

A company wants to estimate the total cost of ownership (TCO) for a gen AI solution on Google Cloud. Which factors are most important?

A.Only model training cost
B.Compute, storage, and API call costs
C.Only inference cost
D.Only compute cost
AnswerB

These three categories cover the primary cost drivers in a gen AI solution.

Why this answer

Option B is correct because the total cost of ownership (TCO) for a generative AI solution on Google Cloud encompasses all operational expenses, including compute (e.g., TPU/GPU instances for training and inference), storage (e.g., Cloud Storage for datasets and model artifacts), and API call costs (e.g., Vertex AI prediction requests). Focusing on a single cost component, such as training or inference alone, ignores the recurring expenses of serving the model and storing data, which often dominate long-term TCO.

Exam trap

Google Cloud often tests the misconception that TCO is dominated by a single cost factor (e.g., training), when in reality, inference and API costs frequently surpass training expenses in production deployments.

How to eliminate wrong answers

Option A is wrong because it ignores inference, storage, and API costs, which are significant for production gen AI solutions where models are queried repeatedly. Option C is wrong because inference cost is only one part of TCO; training, storage, and API overhead also contribute heavily, especially with large models like PaLM 2 or Gemini. Option D is wrong because compute cost alone excludes storage (e.g., model checkpoints, training data) and API call fees (e.g., per-token billing for Vertex AI), leading to an incomplete TCO estimate.

74
Multi-Selecthard

Which TWO strategies can effectively reduce the operational costs of a generative AI model in production without significantly degrading user experience?

Select 2 answers
A.Use larger batch sizes for inference
B.Increase the frequency of model retraining to improve efficiency
C.Cache frequent prompt completions
D.Adopt a pay-per-use pricing model instead of a flat rate
E.Deploy multiple models and route requests by complexity
AnswersC, D

Caching reduces duplicate inference calls, lowering cost.

Why this answer

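Caching frequent prompt completions reduces operational costs by eliminating redundant inference calls for identical or similar user requests. This directly lowers compute usage and latency without degrading user experience, as cached responses are served instantly. It is a common optimization in production LLM deployments, especially for high-traffic applications with repetitive queries. The answer key pairs this with option D: under pay-per-use pricing, spend tracks actual request volume, so the company does not pay a flat rate for capacity that sits idle when traffic is low, and the change is invisible to users.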
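The caching idea can be sketched as follows. This is a minimal in-memory example under stated assumptions: `CompletionCache` and the `generate` callable are hypothetical names, the callable stands in for whatever model call the application makes, and matching is exact after simple normalization. Matching merely similar prompts would need something more, such as an embedding-based lookup, which is not shown here.

```python
import hashlib


class CompletionCache:
    """In-memory cache keyed by a normalized prompt (illustrative only)."""

    def __init__(self, generate):
        self._generate = generate  # hypothetical callable: prompt -> completion
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Collapse whitespace and case so trivially different prompts share an entry.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def complete(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]  # served without calling the model
        self.misses += 1
        completion = self._generate(prompt)  # the only place inference cost is incurred
        self._store[key] = completion
        return completion
```

A production cache would also need an eviction policy and an expiry time so that stale answers are not served indefinitely; both are omitted to keep the sketch short.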

Exam trap

Google Cloud often tests the misconception that increasing batch sizes or retraining frequency inherently reduces costs. Larger batches can make interactive requests wait for a batch to fill, which adds latency and risks degrading user experience, and more frequent retraining adds compute spend and operational overhead without guaranteeing any savings.

75
MCQeasy

A startup wants to build a generative AI application for customer support. Their main concern is cost control while maintaining low latency. Which Google Cloud service is most suitable for deploying their custom model?

A.BigQuery ML
B.Cloud Run
C.Vertex AI Workbench
D.Vertex AI Prediction
AnswerD

Vertex AI Prediction provides autoscaling online prediction endpoints with low latency, ideal for cost-sensitive production.

Why this answer

Vertex AI Prediction is the correct choice because it provides fully managed online prediction endpoints for custom models, with autoscaling that adds and removes replicas as traffic changes. This addresses the startup's need for cost control by keeping provisioned capacity close to actual demand instead of paying for peak capacity around the clock. It also supports low latency through optimized prediction containers and can leverage GPUs or TPUs for inference, making it well suited to real-time customer support applications.
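The cost effect of autoscaling can be illustrated with simple arithmetic. The sketch below is a simulation only and does not call any Google Cloud API: the function names, the per-replica capacity, the traffic profile, and the price are all assumed values, and the minimum replica count an endpoint actually allows depends on the service and its configuration.

```python
import math


def replicas_needed(requests_per_second, capacity_per_replica, min_replicas, max_replicas):
    """Replicas required for the load, clamped to the configured bounds."""
    needed = math.ceil(requests_per_second / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))


def daily_cost(hourly_rps, capacity_per_replica, min_replicas, max_replicas,
               price_per_replica_hour):
    """Sum replica-hours over one day of hourly traffic samples."""
    return sum(
        replicas_needed(rps, capacity_per_replica, min_replicas, max_replicas)
        * price_per_replica_hour
        for rps in hourly_rps
    )


if __name__ == "__main__":
    # Assumed traffic: 16 quiet hours at 5 req/s, 8 busy hours at 45 req/s.
    traffic = [5] * 16 + [45] * 8
    autoscaled = daily_cost(traffic, capacity_per_replica=10, min_replicas=1,
                            max_replicas=5, price_per_replica_hour=1.0)
    fixed_at_peak = daily_cost(traffic, capacity_per_replica=10, min_replicas=5,
                               max_replicas=5, price_per_replica_hour=1.0)
    print(autoscaled, fixed_at_peak)
```

Under these assumptions the autoscaled endpoint uses fewer than half the replica-hours of one provisioned permanently for peak load, while still having enough replicas during the busy hours to keep latency low.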

Exam trap

The trap here is that candidates often confuse development tools (like Vertex AI Workbench) or batch inference services (like BigQuery ML) with production deployment services, overlooking that Vertex AI Prediction is the only option purpose-built for serving custom models with cost-efficient, low-latency inference.

How to eliminate wrong answers

Option A is wrong because BigQuery ML is designed for training and executing machine learning models using SQL queries directly within BigQuery, not for deploying custom models as low-latency, real-time prediction endpoints; it is more suited for batch inference on large datasets. Option B is wrong because Cloud Run is a general-purpose serverless platform for running stateless containers. It can host a model server, but it lacks the managed model-serving features that Vertex AI Prediction provides for custom models, so the team would have to build and maintain the serving stack itself. Option C is wrong because Vertex AI Workbench is a Jupyter-based development environment for building and training models, not a deployment service; it does not provide managed prediction endpoints or the infrastructure for serving custom models in production.

