CCNA Foundation Model Applications Questions

60 of 135 questions · Page 2/2 · Foundation Model Applications topic · Answers revealed

76
MCQeasy

A developer is calling the Amazon Bedrock InvokeModel API to generate text with the AI21 Labs Jurassic-2 Mid model. The API call includes a maxTokens parameter, but the request fails with the error shown in the exhibit. What is the most likely cause of this error?

A.The API request is missing a required parameter such as 'prompt'.
B.The AWS region does not support the AI21 Labs model.
C.The value of 'maxTokens' exceeds the model's maximum limit.
D.The parameter name is incorrect; the model expects 'maxTokens' with a capital T.
AnswerD

Jurassic-2 Mid uses 'maxTokens' (capital T) as the parameter name for controlling output length.

Why this answer

The AI21 Labs Jurassic-2 Mid model expects the parameter name 'maxTokens' with a capital 'T' (camelCase). The error occurs because the API request used a different casing (e.g., 'maxtokens' or 'max_tokens'), which the model's schema does not recognize. Amazon Bedrock's InvokeModel API passes parameters directly to the model, so parameter names must match the model's exact specification.

Exam trap

Cisco often tests the nuance that model-specific parameter names must match exactly, including case sensitivity, and candidates mistakenly assume all Bedrock models use the same parameter naming convention.

How to eliminate wrong answers

Option A is wrong because the 'prompt' parameter is required for text generation models, and the error message in the exhibit does not indicate a missing required parameter; it points to an unrecognized parameter name. Option B is wrong because AWS region support for AI21 Labs models is independent of parameter naming errors; a region mismatch would produce a different error (e.g., 'Model not found' or 'AccessDeniedException'). Option C is wrong because exceeding the model's maximum token limit would result in a validation error about the value, not about the parameter name itself.

77
MCQeasy

A company uses Amazon Bedrock to build a chatbot. The chatbot needs to answer questions based on internal company documents. Which AWS service should be integrated with Bedrock to enable Retrieval Augmented Generation (RAG) without managing infrastructure?

A.Amazon OpenSearch Service
B.Amazon DynamoDB
C.Amazon RDS
D.Amazon Kendra
AnswerD

Kendra provides managed search with connectors to documents, ideal for RAG.

Why this answer

Amazon Kendra is a fully managed intelligent search service that can be directly integrated with Amazon Bedrock to implement Retrieval Augmented Generation (RAG) without any infrastructure management. It indexes internal company documents and retrieves relevant passages, which are then passed to the foundation model as context to generate accurate, grounded answers.

Exam trap

AWS often tests the distinction between managed services that require infrastructure management (like OpenSearch Service) and fully managed services (like Kendra) that abstract away all infrastructure concerns, making candidates incorrectly choose OpenSearch for its search capabilities.

How to eliminate wrong answers

Option A is wrong because Amazon OpenSearch Service requires you to manage clusters, configure indexing, and handle scaling — it is not a serverless, zero-infrastructure option. Option B is wrong because Amazon DynamoDB is a NoSQL key-value and document database designed for transactional workloads, not for semantic search or document retrieval needed in RAG. Option C is wrong because Amazon RDS is a relational database service that requires provisioning and managing database instances, and it lacks native semantic search capabilities for document retrieval.

78
MCQhard

A company uses Amazon SageMaker JumpStart to deploy a foundation model. They want to fine-tune the model on their own dataset. Which SageMaker capability should they use?

A.SageMaker Managed Spot Training
B.SageMaker Studio Classic
C.SageMaker Canvas
D.SageMaker Autopilot
AnswerB

Studio Classic provides Jupyter notebooks to write custom code for fine-tuning foundation models.

Why this answer

SageMaker Studio Classic provides an integrated development environment for building, training, and fine-tuning models using notebooks. Autopilot automates model building; Canvas is for no-code ML; Managed Spot Training reduces cost but is not the primary tool for fine-tuning.

79
MCQmedium

A company is using Amazon Bedrock to generate marketing copy. They want to evaluate the quality of the generated text. Which metric is MOST suitable for assessing the relevance and coherence of the content?

A.Accuracy
B.ROUGE-N
C.Perplexity
D.BLEU score
AnswerB

ROUGE-N compares n-gram overlap, suitable for summarization and copy.

Why this answer

ROUGE-N (Recall-Oriented Understudy for Gisting Evaluation) measures the overlap of n-grams between generated text and reference text, making it suitable for assessing relevance and coherence in content generation tasks like marketing copy. It evaluates how well the generated text captures key phrases and maintains logical flow, which aligns with the need to assess content quality beyond simple factual accuracy.

Exam trap

AWS often tests the distinction between metrics designed for translation (BLEU) versus summarization/generation (ROUGE), leading candidates to mistakenly choose BLEU for coherence evaluation when ROUGE is the correct choice for recall-based content assessment.

How to eliminate wrong answers

Option A is wrong because Accuracy is a classification metric (e.g., correct predictions/total predictions) and does not measure text relevance or coherence; it is irrelevant for generative text evaluation. Option C is wrong because Perplexity measures how well a language model predicts a sequence (lower is better for fluency) but does not directly assess relevance or coherence against a reference; it is a model-internal metric, not a quality metric for generated content. Option D is wrong because BLEU score (Bilingual Evaluation Understudy) is primarily designed for machine translation, focusing on precision of n-gram matches, and is less sensitive to recall and coherence in single-language text generation tasks like marketing copy.

80
MCQhard

A company uses Amazon Bedrock to generate code. They want to ensure the code follows security best practices and does not contain vulnerabilities. Which approach is most effective?

A.Implement a post-processing step using AWS WAF.
B.Use Amazon CodeGuru Security to review generated code.
C.Train a custom model on the company’s secure code.
D.Use a foundation model trained only on secure code.
AnswerB

CodeGuru Security automatically scans code for vulnerabilities and provides actionable recommendations.

Why this answer

Amazon CodeGuru Security reviews code for security vulnerabilities and provides recommendations. Using a model trained on secure code may not be sufficient; WAF is for web traffic; training a custom model requires significant effort and may not catch all issues.

81
Multi-Selectmedium

Which TWO actions are best practices when deploying foundation models on Amazon SageMaker for production? (Choose TWO.)

Select 2 answers
A.Manually warm up endpoints by sending dummy requests before traffic spikes.
B.Create a separate endpoint for each model to isolate traffic.
C.Use multi-model endpoints (MMEs) to serve multiple models on a single instance.
D.Implement inference pipelines to handle preprocessing and postprocessing steps separately.
E.Deploy models directly to production without load testing to avoid delays.
AnswersC, D

MMEs optimize resource utilization and reduce costs for multiple models.

Why this answer

Option C is correct because Amazon SageMaker Multi-Model Endpoints (MMEs) allow you to host multiple models on a single instance, which reduces hosting costs by sharing resources across models while still providing low-latency inference. This is a best practice for production deployments where you need to serve many models efficiently without provisioning separate endpoints for each.

Exam trap

AWS often tests the misconception that manual endpoint warm-up is necessary for production traffic spikes, but SageMaker's auto-scaling and built-in health checks handle this automatically, making option A a common distractor.

82
MCQmedium

Refer to the exhibit. A user invokes Claude v2 using the AWS CLI. The response is truncated. What is the most likely cause?

A.The AWS CLI is missing the --endpoint-url parameter.
B.The max_tokens_to_sample is too low.
C.The model does not support this use case.
D.The prompt includes a stop sequence 'Assistant:'.
AnswerD

Claude uses 'Assistant:' as a stop sequence, causing it to stop generating after its response.

Why this answer

Option D is correct because the prompt includes the stop sequence 'Assistant:', which causes the model to halt generation as soon as it encounters that token sequence. In Claude v2, stop sequences are used to control the output length and structure; when the model generates the exact stop sequence, it truncates the response at that point, even if more content could have been produced.

Exam trap

Cisco often tests the distinction between token limits and stop sequences, where candidates mistakenly attribute truncation to max_tokens_to_sample when the actual cause is a configured stop sequence in the prompt or API parameters.

How to eliminate wrong answers

Option A is wrong because the --endpoint-url parameter is used to specify a custom endpoint for the AWS CLI, but its absence does not cause response truncation; it would instead result in a connection error or default endpoint usage. Option B is wrong because max_tokens_to_sample controls the maximum number of tokens the model can generate, but if it were too low, the response would be cut off at that token limit, not at a specific stop sequence; the question states the response is truncated, not that it reached a token limit. Option C is wrong because Claude v2 supports a wide range of use cases including text generation, and the model's capability is not the cause of truncation; truncation is explicitly controlled by stop sequences or token limits.

83
MCQeasy

A company uses Amazon Bedrock to generate product descriptions. They notice that the output sometimes contains incorrect information. What should they do to improve accuracy?

A.Increase the temperature parameter.
B.Implement Retrieval-Augmented Generation (RAG).
C.Use a larger foundation model.
D.Use AWS WAF to filter outputs.
AnswerB

RAG retrieves relevant information from a knowledge base to augment the prompt, improving factual accuracy.

Why this answer

Option B is correct because Retrieval-Augmented Generation (RAG) enhances the accuracy of foundation model outputs by grounding the generation in authoritative, up-to-date external knowledge sources. Instead of relying solely on the model's parametric memory, RAG retrieves relevant documents or data from a vector database (e.g., Amazon OpenSearch Serverless) and injects them into the prompt context, reducing hallucinations and incorrect information in product descriptions.

Exam trap

AWS often tests the misconception that simply using a larger or more powerful model (Option C) is the universal fix for accuracy issues, when in fact the root cause of hallucinations is often a lack of grounded, retrievable context that RAG specifically addresses.

How to eliminate wrong answers

Option A is wrong because increasing the temperature parameter makes the model's output more random and creative, which would likely increase, not decrease, the frequency of incorrect information. Option C is wrong because using a larger foundation model does not inherently fix factual accuracy; larger models can still hallucinate or produce outdated information without access to current or domain-specific data. Option D is wrong because AWS WAF is a web application firewall that filters HTTP traffic for security threats (e.g., SQL injection, XSS) and has no mechanism to validate or correct the factual accuracy of generated text.

84
MCQmedium

A data scientist runs the above AWS CLI command and receives the error. What is the most likely cause?

A.The IAM role does not have permissions.
B.The model is being updated.
C.The model is being deprecated.
D.The model is not supported in the current AWS region.
AnswerD

Foundation models are region-specific; the chosen model may not be available in the region used.

Why this answer

The ModelNotReadyException typically indicates the model is not available in the current region. The model may not be supported or is still being deployed. The error does not suggest deprecation, updating, or permissions issues.

85
MCQeasy

A company uses Amazon Bedrock to power a generative chatbot for employee onboarding. Recently, some employees reported that the chatbot occasionally provides responses that contain biased or offensive language. The company has a strict policy for respectful communication. They want to implement a solution quickly without retraining the model. Which action should they take?

A.Add a human reviewer to approve every response.
B.Use a different foundational model known for unbiased outputs.
C.Enable Amazon Bedrock's built-in content moderation filters.
D.Fine-tune the model on a dataset of polite conversations.
AnswerC

Guardrails can be activated immediately to filter harmful content.

Why this answer

Option B is correct because Amazon Bedrock's built-in content moderation filters (Guardrails) can be applied immediately to filter biased or offensive content without retraining. Option A (fine-tuning) is time-consuming and requires a dataset. Option C (switch model) may not be quick and still could produce biased outputs.

Option D (human reviewer) is slow and not scalable.

86
MCQeasy

A developer wants to experiment with a foundation model for code generation without writing any code. Which AWS service provides a playground for models like CodeWhisperer?

A.Amazon CodeGuru
B.AWS Lambda
C.Amazon SageMaker Studio
D.Amazon Bedrock Playground
AnswerD

Bedrock provides a no-code playground to test models like Claude or CodeWhisperer.

Why this answer

Amazon Bedrock Playground is the correct answer because it provides a web-based interface for experimenting with foundation models (FMs) from providers like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon itself, including the CodeWhisperer model for code generation. This allows the developer to test prompts, adjust parameters, and see model responses without writing any code, directly fulfilling the requirement of a no-code playground.

Exam trap

The trap here is that candidates may confuse Amazon SageMaker Studio (a full ML development environment) with a no-code playground, overlooking that Bedrock Playground is specifically designed for zero-code experimentation with foundation models like CodeWhisperer.

How to eliminate wrong answers

Option A is wrong because Amazon CodeGuru is a service for automated code reviews and application profiling, not a playground for experimenting with foundation models or code generation. Option B is wrong because AWS Lambda is a serverless compute service for running code in response to events, not a no-code environment for testing foundation models. Option C is wrong because Amazon SageMaker Studio is an integrated development environment (IDE) for building, training, and deploying machine learning models, which typically requires writing code (e.g., Python notebooks) and is not a simple playground for foundation model experimentation without coding.

87
MCQeasy

A developer wants to quickly experiment with multiple foundation models using a single API. Which service provides this capability?

A.Amazon Bedrock
B.AWS Lambda
C.Amazon Bedrock
D.Amazon SageMaker Studio
AnswerA

Bedrock provides a single API to invoke multiple foundation models.

Why this answer

Amazon Bedrock is a fully managed service that provides a single API to access and experiment with multiple foundation models from leading AI providers like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon itself. This allows developers to quickly test different models without managing underlying infrastructure or learning separate APIs for each provider.

Exam trap

The trap here is that candidates may confuse Amazon Bedrock with Amazon SageMaker, thinking SageMaker also provides a unified API for multiple foundation models, but SageMaker requires you to deploy and manage individual models, whereas Bedrock is purpose-built for serverless access to a curated set of foundation models via a single API.

How to eliminate wrong answers

Option B (AWS Lambda) is wrong because Lambda is a serverless compute service for running code in response to events, not a service for accessing or experimenting with foundation models via a single API. Option C (Amazon Bedrock) is actually the same as the correct answer (A) and is listed as a duplicate; in the exam, such duplicates are typically a distractor, but since both A and C are identical, the correct choice is the one marked as correct (A). Option D (Amazon SageMaker Studio) is wrong because SageMaker Studio is an integrated development environment (IDE) for building, training, and deploying machine learning models, but it does not provide a single unified API for multiple foundation models; it requires you to manage models and endpoints yourself.

88
MCQmedium

Refer to the exhibit. The training job is failing with an error 'CUDA out of memory'. Which hyperparameter change is MOST likely to resolve the issue?

A.Increase the number of epochs to 10
B.Increase learning_rate to 5e-4
C.Reduce per_device_train_batch_size to 4
D.Increase max_seq_length to 1024
AnswerC

Smaller batch size uses less GPU memory.

Why this answer

The 'CUDA out of memory' error indicates that the GPU memory is exhausted during training. Reducing `per_device_train_batch_size` directly decreases the number of samples processed simultaneously per GPU, which lowers memory consumption for activations, gradients, and optimizer states. This is the most direct and effective hyperparameter change to resolve an out-of-memory condition.

Exam trap

AWS often tests the misconception that increasing epochs or learning rate can fix resource exhaustion errors, when in fact only adjustments that reduce per-step memory usage (like batch size or sequence length) are effective.

How to eliminate wrong answers

Option A is wrong because increasing the number of epochs does not affect per-step memory usage; it only increases the total number of training iterations, which would not resolve an immediate memory allocation failure. Option B is wrong because increasing the learning rate changes the step size for gradient updates but has no impact on GPU memory consumption during forward/backward passes. Option D is wrong because increasing `max_seq_length` increases the sequence length of input tokens, which enlarges the memory footprint for attention matrices and hidden states, making the out-of-memory error worse.

89
MCQhard

A financial services company is using Amazon Bedrock to generate investment summaries. They want to ensure that the model outputs are factually accurate and based on the latest market data. Which combination of services should they use to achieve this? (Select TWO)

A.Amazon SageMaker Ground Truth for data labeling
B.Amazon DynamoDB as the knowledge base store
C.Amazon Kendra for indexing the knowledge base
D.Amazon Aurora with the pgvector extension
E.Amazon Bedrock Knowledge Bases with RAG
AnswerD, E

Aurora with pgvector can store and query embeddings for RAG.

Why this answer

Amazon Aurora with the pgvector extension (Option D) enables storing and querying vector embeddings directly within a PostgreSQL-compatible database, which is essential for Retrieval-Augmented Generation (RAG). When combined with Amazon Bedrock Knowledge Bases (Option E), it allows the company to retrieve the most current market data as vector embeddings, ensuring the generated investment summaries are grounded in factual, up-to-date information rather than relying solely on the model's static training data.

Exam trap

The trap here is that candidates often confuse a general-purpose search service like Amazon Kendra with a vector database purpose-built for RAG, overlooking that Bedrock Knowledge Bases requires a vector store (e.g., Aurora with pgvector or Amazon OpenSearch Serverless) to perform semantic similarity retrieval, not just keyword-based indexing.

How to eliminate wrong answers

Option A is wrong because Amazon SageMaker Ground Truth is a data labeling service for creating training datasets, not for storing or retrieving knowledge bases for RAG; it does not provide real-time market data retrieval. Option B is wrong because Amazon DynamoDB is a NoSQL key-value and document database that lacks native vector search capabilities (e.g., pgvector or OpenSearch vector engine), making it unsuitable for efficient similarity search required in RAG workflows. Option C is wrong because Amazon Kendra is an intelligent search service that can index documents, but it is not a vector database optimized for storing and querying embeddings; it also does not integrate directly with Bedrock Knowledge Bases as a vector store.

90
Multi-Selectmedium

Which TWO actions can help reduce bias in a foundation model’s outputs? (Choose two.)

Select 2 answers
A.Fine-tune the model on a balanced, representative dataset
B.Use careful prompt engineering with neutral wording
C.Restrict model access to a subset of users
D.Increase temperature to add randomness
E.Use a larger foundation model
AnswersA, B

Fine-tuning with balanced data can correct biases.

Why this answer

Options B and D are correct. Prompt engineering with neutral phrasing can reduce biased responses. Fine-tuning with a balanced dataset can mitigate biases.

Option A (increase temperature) increases randomness, not reduce bias. Option C (larger model) may amplify biases. Option E (limit users) does not address bias.

91
Multi-Selectmedium

Which TWO actions are recommended for improving the factual accuracy of a foundation model's responses when using RAG?

Select 2 answers
A.Include relevant context from the knowledge base in the prompt
B.Increase the max_tokens parameter
C.Provide clear instructions in the system prompt
D.Use the largest foundation model available
E.Increase the temperature parameter
AnswersA, C

RAG relies on accurate context to ground responses.

Why this answer

Including relevant context from the knowledge base and providing clear system instructions improve accuracy. Other options do not directly help.

92
Multi-Selectmedium

A data scientist is using a foundation model to summarize long documents. Which TWO of the following steps are most likely to improve the quality of the summaries?

Select 2 answers
A.Break the input document into chunks and summarize each chunk separately.
B.Use a high temperature parameter to increase creativity.
C.Provide few-shot examples of desired summaries in the prompt.
D.Use a low frequency penalty to reduce repetition.
E.Use a longer context length by increasing the max tokens parameter.
AnswersA, C

Chunking allows handling of long documents that exceed context length.

Why this answer

Option A is correct because foundation models have a fixed maximum context window (e.g., 4,096 tokens for GPT-3.5). By breaking a long document into smaller chunks and summarizing each independently, you avoid truncation and ensure the model can process the entire content without losing information. This chunking strategy is a standard preprocessing technique for handling documents that exceed the model's context length.

Exam trap

AWS often tests the misconception that increasing max tokens extends the model's input capacity, when in reality it only controls the output length, while the input is constrained by the model's inherent context window.

93
MCQmedium

A team deployed a text generation model on Amazon Bedrock. They want to monitor for toxic content in model outputs. Which evaluation approach is MOST effective?

A.Enable CloudWatch Logs and set a metric filter for toxic words
B.Use Amazon SageMaker Ground Truth for human annotation
C.Manually review a sample of outputs each week
D.Use Amazon Bedrock Model Evaluation with toxicity metrics
AnswerD

Bedrock Model Evaluation provides automated toxicity assessment.

Why this answer

Amazon Bedrock Model Evaluation with toxicity metrics is the most effective approach because it provides automated, built-in evaluation of model outputs for toxic content using predefined metrics, directly integrated with the Bedrock service. This eliminates the need for manual effort or custom filtering, ensuring consistent and scalable monitoring of harmful content.

Exam trap

The trap here is that candidates may choose CloudWatch metric filters (Option A) because they associate monitoring with logs, but fail to recognize that toxicity detection requires semantic understanding beyond simple keyword matching.

How to eliminate wrong answers

Option A is wrong because CloudWatch Logs with a metric filter for toxic words is a simplistic, keyword-based approach that cannot detect nuanced or context-dependent toxicity, such as sarcasm or implicit hate speech, and requires manual setup of word lists. Option B is wrong because Amazon SageMaker Ground Truth for human annotation is designed for creating labeled datasets, not for real-time or automated monitoring of model outputs, and introduces latency and cost overhead. Option C is wrong because manually reviewing a sample of outputs each week is not scalable, introduces human bias, and fails to provide continuous or real-time monitoring, making it ineffective for production systems.

94
MCQeasy

A developer is using Amazon Bedrock to generate code snippets. The model often produces insecure code. Which prompt engineering technique is MOST effective to improve security?

A.Use chain-of-thought prompting to step through the code
B.Provide few-shot examples of secure code
C.Set max_tokens to a low value to limit output
D.Include specific instructions to avoid common security vulnerabilities
AnswerD

Direct instructions in the prompt can effectively guide the model.

Why this answer

Option D is correct because directly instructing the model to avoid specific security vulnerabilities (e.g., SQL injection, buffer overflows) is the most explicit and effective way to constrain the output. Amazon Bedrock models respond well to clear, imperative instructions in the system prompt or user message, making this a direct application of prompt engineering for safety. Chain-of-thought or few-shot examples may improve reasoning or style but do not guarantee the model will avoid insecure patterns unless explicitly told to do so.

Exam trap

The trap here is that candidates often overestimate the effectiveness of few-shot examples or reasoning techniques for security, assuming they implicitly teach safety, when in fact explicit instructions are required to override the model's default training biases toward common (but insecure) coding patterns.

How to eliminate wrong answers

Option A is wrong because chain-of-thought prompting improves reasoning steps but does not inherently enforce security constraints; it may still produce insecure code if the model's reasoning path includes unsafe patterns. Option B is wrong because few-shot examples of secure code can guide style but do not prevent the model from generating insecure code when the prompt does not explicitly forbid it; the model may still default to common insecure patterns from its training data. Option C is wrong because setting max_tokens to a low value limits output length but does not affect the security of the generated code; it may truncate a secure solution or force incomplete code, not improve safety.

95
MCQmedium

A data scientist uses Amazon Bedrock. The model responses are too long. Which parameter should they adjust to limit the output length?

A.temperature
B.max_tokens
C.stop sequences
D.top_p
AnswerB

Reducing max_tokens directly caps the output length.

Why this answer

The `max_tokens` parameter directly controls the maximum number of tokens (words or subwords) the model can generate in a single response. By reducing this value, the data scientist caps the output length, preventing overly long responses. Temperature and top_p affect randomness and diversity, not length, while stop sequences define when generation halts but do not enforce a hard token limit.

Exam trap

AWS often tests the distinction between parameters that control output length (`max_tokens`) versus those that control output randomness or diversity (`temperature`, `top_p`), leading candidates to confuse 'limiting length' with 'limiting creativity'.

How to eliminate wrong answers

Option A is wrong because temperature controls the randomness of token selection (higher values increase creativity, lower values make output more deterministic), not the length of the response. Option C is wrong because stop sequences are custom strings (e.g., '###' or 'END') that tell the model to cease generation when encountered, but they do not limit the total number of tokens generated before that point. Option D is wrong because top_p (nucleus sampling) limits the cumulative probability of token choices to a threshold (e.g., 0.9), affecting diversity, not the maximum output length.

96
MCQmedium

A financial services company is evaluating Amazon Bedrock for a compliance application that requires explainable AI decisions. The model's output must be auditable and traceable to specific reasoning. Which Bedrock feature should they use to meet this requirement?

A.Create a knowledge base with financial regulations to guide the model.
B.Fine-tune a custom model on regulatory documents to improve reasoning.
C.Enable model invocation logging in Amazon Bedrock and store logs in Amazon S3.
D.Amazon Bedrock Guardrails to filter sensitive content.
AnswerC

Logging captures full input/output pairs, enabling auditors to review and trace decisions.

Why this answer

Option C is correct because model invocation logging records all requests and responses, enabling traceability. Option A is wrong because guardrails filter content but don't provide reasoning. Option B is wrong because custom models are still black boxes.

Option D is wrong because knowledge bases are for retrieval, not reasoning traceability.

97
MCQmedium

A company uses a foundation model for real-time translation in a chat application. The latency is high. Which optimization would reduce latency the most?

A.Increase batch size
B.Use model distillation to create a smaller model
C.Use a larger model
D.Use a CDN for model weights
AnswerB

Distillation reduces model size and inference latency.

Why this answer

Model distillation reduces the size of the foundation model by training a smaller 'student' model to mimic the behavior of a larger 'teacher' model. This directly decreases inference latency because the smaller model requires fewer computational resources (FLOPs) per forward pass, which is critical for real-time translation in a chat application where low latency is paramount.

Exam trap

Cisco often tests the distinction between throughput optimization (batch size) and latency optimization (model size/distillation), leading candidates to mistakenly choose increasing batch size when the question explicitly asks for reducing latency.

How to eliminate wrong answers

Option A is wrong because increasing batch size improves throughput (more requests processed per unit time) but does not reduce per-request latency; in fact, it can increase latency for individual requests as the model must wait for the batch to fill. Option C is wrong because using a larger model increases the number of parameters and computational complexity, which would increase latency, not reduce it. Option D is wrong because a CDN for model weights only accelerates the initial download of the model to edge locations, not the inference latency of each translation request; once the model is loaded, inference speed is determined by the model architecture and hardware, not network delivery.

98
MCQmedium

Refer to the exhibit. A data scientist created this endpoint config for a foundation model in Amazon SageMaker. However, the endpoint fails to scale under load. What is the most likely reason?

A.Missing AutoScaling configuration
B.Variant weight is 1.0
C.Instance type is too small
D.InitialInstanceCount is 1
AnswerA

Auto scaling policy is required to add instances under load.

Why this answer

The endpoint fails to scale under load because the endpoint configuration shown lacks an AutoScaling policy. Without AutoScaling, SageMaker will not automatically adjust the number of instances based on traffic, so even if the initial instance count is 1, the endpoint cannot add more instances to handle increased load. AutoScaling must be explicitly configured via Application Auto Scaling to define scaling policies and target tracking metrics.

Exam trap

AWS often tests the misconception that setting a higher InitialInstanceCount or choosing a larger instance type alone enables scaling, when in fact AutoScaling must be explicitly configured as a separate step.

How to eliminate wrong answers

Option B is wrong because a variant weight of 1.0 is the default and does not prevent scaling; it simply means all traffic is routed to that variant. Option C is wrong because the instance type being 'too small' would cause performance issues or throttling, but it does not prevent the endpoint from scaling out; scaling is controlled by AutoScaling, not instance size. Option D is wrong because an InitialInstanceCount of 1 is a valid starting point; the endpoint can still scale out if AutoScaling is configured, so a single initial instance does not inherently block scaling.

99
Multi-Selecteasy

Which TWO factors are MOST important when selecting a foundation model for a text summarization task? (Choose two.)

Select 2 answers
A.Maximum output length (max tokens)
B.Model creation date
C.Model training cost
D.Maximum input length (context window)
E.Image support
AnswersA, D

Determines the length of the summary.

Why this answer

Options A and C are correct. The maximum input length (context window) determines how much text the model can process at once. The output length (max tokens) affects the summary detail.

Options B (training cost) is not a selection factor for pre-trained models. D (image support) irrelevant. E (model creation date) is not a primary factor.

100
MCQmedium

A developer is trying to invoke the Claude v2 model in Amazon Bedrock from a Lambda function. The Lambda function's IAM role has the following policy attached: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "bedrock:InvokeModel", "Resource": "*" } ] } When the Lambda function runs, it receives the error shown in the exhibit. Which additional step is most likely needed to resolve this issue?

A.Change the AWS region to one where Claude v2 is available.
B.Use a different model ID such as 'anthropic.claude-v1'.
C.Request access to the Anthropic Claude model through the Amazon Bedrock console.
D.Add a condition to the IAM policy to specify the model ARN.
AnswerC

Model access must be explicitly granted per model even with IAM permissions.

Why this answer

In Amazon Bedrock, even with IAM permissions allowing access to all models, you must also request access to specific foundation models through the AWS console. The AccessDeniedException here indicates the model is not enabled for the account. Option A is correct.

Option B is incorrect because the policy already allows all models. Option C is incorrect because the region is irrelevant to this error. Option D is incorrect because the API call is correct.

101
MCQhard

Refer to the exhibit. A developer deploys this CloudFormation stack but the agent fails to query the knowledge base. What is a likely cause?

A.The KnowledgeBaseId is not passed correctly
B.The agent role does not have permissions to invoke the knowledge base
C.The embedding model is not available in the region
D.The OpenSearch collection type should be SEARCH not VECTORSEARCH
AnswerB

The agent's IAM role must have bedrock:InvokeKnowledgeBase permission.

Why this answer

The correct answer is B because the agent role must have an IAM policy that grants the `bedrock:Retrieve` and `bedrock:RetrieveAndGenerate` permissions on the knowledge base. Without these permissions, the agent cannot invoke the knowledge base, even if the KnowledgeBaseId is correctly passed and the embedding model is available.

Exam trap

AWS often tests the distinction between resource creation permissions and runtime invocation permissions, trapping candidates who assume that a successful stack deployment implies all runtime permissions are correctly configured.

How to eliminate wrong answers

Option A is wrong because if the KnowledgeBaseId were not passed correctly, the stack would likely fail during creation or the agent would receive a different error (e.g., resource not found), not a generic failure to query. Option C is wrong because if the embedding model were not available in the region, the CloudFormation stack itself would fail during creation of the knowledge base, not during a subsequent query. Option D is wrong because the OpenSearch collection type for a knowledge base must be `VECTORSEARCH` to store and query vector embeddings; `SEARCH` is used for full-text search and does not support the vector similarity search required by the knowledge base.

102
MCQmedium

A company is using Amazon Bedrock to build a text-to-SQL application. They want to ensure that the generated SQL queries are valid and safe. Which approach is BEST?

A.Fine-tune the model on a dataset of valid SQL queries
B.Use a separate model to validate the SQL after generation
C.Configure a guardrail to filter and validate the generated SQL
D.Limit the max_tokens to 50 to reduce complexity
AnswerC

Guardrails can enforce rules and reject invalid queries.

Why this answer

Amazon Bedrock guardrails provide a native, configurable mechanism to filter and validate model outputs, including SQL queries, against defined policies such as regex patterns, denied topics, and content filters. This approach directly addresses both validity and safety without requiring additional model training or external validation services, making it the most integrated and efficient solution for ensuring generated SQL is syntactically correct and free of harmful operations like DROP or DELETE.

Exam trap

Cisco often tests the misconception that fine-tuning or output length limits can solve safety and validity requirements, when in fact guardrails are the purpose-built AWS service for content filtering and validation at inference time.

How to eliminate wrong answers

Option A is wrong because fine-tuning on valid SQL queries improves the model's ability to generate syntactically correct SQL but does not guarantee safety; a fine-tuned model can still produce harmful queries (e.g., DROP TABLE) if the training data includes such patterns or if the model generalizes incorrectly. Option B is wrong because using a separate model for validation introduces additional latency, cost, and complexity, and still requires a policy or rule set to define what constitutes 'valid and safe'—which is exactly what Bedrock guardrails already provide natively. Option D is wrong because limiting max_tokens to 50 does not ensure SQL validity or safety; it only truncates output, potentially producing incomplete or syntactically invalid SQL, and does not prevent generation of dangerous commands.

103
MCQeasy

A startup is deploying a foundation model on Amazon SageMaker for real-time inference. They notice high latency (over 2 seconds per request). Which action is most likely to reduce latency?

A.Enable auto-scaling on the SageMaker endpoint to handle more concurrent requests.
B.Switch to a smaller, distilled version of the model.
C.Deploy the model on a CPU-based instance instead of GPU.
D.Increase the batch size parameter in the inference request.
AnswerB

Smaller models have fewer parameters, reducing computation time and latency.

Why this answer

Option B is correct because using a smaller, distilled version of the model directly reduces the computational complexity per inference request. Distillation compresses the model by training a smaller student network to mimic a larger teacher model, resulting in fewer parameters and faster forward passes. This is the most direct way to cut latency when the model size is the bottleneck, as it reduces the number of floating-point operations (FLOPs) required per request.

Exam trap

AWS often tests the distinction between latency (time per single request) and throughput (requests per second), so candidates mistakenly choose auto-scaling or batch size increases, which improve throughput but not per-request latency.

How to eliminate wrong answers

Option A is wrong because enabling auto-scaling adds more endpoint instances to handle higher concurrency, but it does not reduce the latency of a single inference request; it only improves throughput under load. Option C is wrong because CPU-based instances are generally slower for deep learning inference than GPU instances, especially for large foundation models, so switching to CPU would increase latency, not reduce it. Option D is wrong because increasing the batch size in the inference request means processing multiple inputs together, which increases the time to first byte for each individual request and does not reduce per-request latency; it is a throughput optimization, not a latency reduction technique.

104
Multi-Selecthard

Which TWO of the following are valid methods to reduce the risk of foundation models generating harmful or biased content?

Select 2 answers
A.Use a smaller model
B.Use a content filter
C.Apply prompt engineering to guide output
D.Fine-tune the model on a biased dataset
E.Disable all logging
AnswersB, C

Content filters can block harmful outputs.

Why this answer

Option B is correct because content filters act as a safety layer that intercepts and blocks harmful or biased outputs before they reach the user. These filters can be rule-based or use a separate classifier model trained to detect toxic, hateful, or biased language, reducing the risk of harmful content generation without altering the underlying model.

Exam trap

AWS often tests the misconception that simply using a smaller model or disabling logging can reduce bias, when in fact these actions either have no effect or worsen the problem, whereas content filters and prompt engineering are direct, effective mitigation strategies.

105
MCQmedium

An e-commerce company uses Amazon Bedrock to generate product descriptions from keywords. Some descriptions contain inaccurate details about product specifications. Which approach should the company take to reduce factual errors?

A.Increase the maxTokens parameter to allow more detailed descriptions.
B.Use a different foundation model from Bedrock for each product category.
C.Deploy the model to a SageMaker endpoint and use human-in-the-loop validation.
D.Include the product specifications in the prompt and instruct the model to base the description on the provided data.
AnswerD

Providing facts in the prompt grounds the model's output and reduces fabrication.

Why this answer

Option D is correct because providing the product specifications directly in the prompt and instructing the model to base the description on that data grounds the generation in factual information, reducing hallucinations. This technique, known as prompt engineering with in-context learning, ensures the model uses the given data rather than relying on its training data, which may contain inaccuracies.

Exam trap

AWS often tests the misconception that increasing model parameters or changing models alone improves factual accuracy, when in fact prompt engineering with grounded data is the most effective and efficient method to reduce hallucinations.

How to eliminate wrong answers

Option A is wrong because increasing maxTokens only allows longer outputs but does not improve factual accuracy; it may even increase the chance of hallucinations by generating more unverified content. Option B is wrong because using a different foundation model for each category does not inherently reduce factual errors; all models can hallucinate, and this approach adds complexity without addressing the root cause of inaccurate specifications. Option C is wrong because deploying to a SageMaker endpoint with human-in-the-loop validation is an operational pattern for custom models, but it is overkill and inefficient for this use case; prompt engineering (Option D) is a simpler, more direct solution that avoids the latency and cost of human review for every generation.

106
MCQeasy

A company wants to classify customer emails into categories (e.g., complaint, inquiry, feedback) using a foundation model. Which approach is MOST efficient?

A.Use Amazon Comprehend for custom classification
B.Train a custom model using Amazon SageMaker
C.Fine-tune a large language model on labeled emails
D.Use Amazon Lex with a classifier intent
AnswerA

Comprehend provides a ready-to-use classification API.

Why this answer

Amazon Comprehend provides a managed custom classification API that is purpose-built for text classification tasks like categorizing emails. It requires only a small set of labeled data to train a custom classifier, eliminating the need to manage infrastructure or fine-tune large models, making it the most efficient choice for this specific use case.

Exam trap

AWS often tests the misconception that any NLP task requires a large language model or custom training in SageMaker, when in fact managed services like Comprehend are optimized for common classification tasks and are more efficient.

How to eliminate wrong answers

Option B is wrong because training a custom model using Amazon SageMaker involves provisioning instances, managing training jobs, and handling model deployment, which is overkill and less efficient for a straightforward text classification task that can be handled by a managed service. Option C is wrong because fine-tuning a large language model (LLM) on labeled emails is computationally expensive, requires significant expertise in prompt engineering and hyperparameter tuning, and is not the most efficient approach when a simpler, purpose-built service like Comprehend exists. Option D is wrong because Amazon Lex is designed for building conversational chatbots and intent-based routing, not for batch or real-time text classification of emails; its classifier intent feature is meant for dialog management, not document categorization.

107
MCQmedium

A company is using Amazon Bedrock to generate images from text prompts. They need to ensure the generated images do not contain offensive content. Which feature should be enabled?

A.VPC endpoints
B.AWS WAF
C.Content moderation with AI
D.IAM policies
AnswerC

Bedrock's content moderation uses AI to detect and block offensive content.

Why this answer

Amazon Bedrock includes built-in content moderation that can filter harmful content in inputs and outputs. IAM policies (B) control access but not content. WAF (C) protects web applications.

VPC endpoints (D) secure network traffic.

108
MCQeasy

A company wants to automatically summarize customer support tickets into a short paragraph. Which AWS service is MOST appropriate for this task?

A.Amazon Bedrock
B.Amazon Rekognition
C.Amazon Polly
D.Amazon Comprehend
AnswerA

Amazon Bedrock provides access to foundation models that can summarize text.

Why this answer

Amazon Bedrock provides access to foundation models that can perform summarization. Option C is correct because Bedrock is a managed service offering pre-trained models for tasks like text summarization. Option A (Amazon Comprehend) is for NLP tasks like entity extraction, not summarization.

Option B (Amazon Rekognition) is for image/video analysis. Option D (Amazon Polly) is text-to-speech.

109
MCQhard

An organization uses SageMaker JumpStart to deploy a foundation model for real-time inference. They observe high latency. What is the most effective way to reduce latency?

A.Compile the model with SageMaker Neo
B.Use a larger instance with more memory
C.Use batch transform instead
D.Enable SageMaker Inference Recommender
AnswerA

Neo compiles models for faster inference on specific hardware.

Why this answer

SageMaker Neo compiles the model to optimize it for the target hardware, reducing inference latency by applying hardware-specific optimizations such as kernel fusion, quantization, and memory layout tuning. This directly addresses the high latency issue for real-time inference without changing the instance type or inference mode.

Exam trap

AWS often tests the misconception that increasing instance size or switching to batch processing is the primary solution for latency, when in fact model compilation with SageMaker Neo is the most direct and cost-effective optimization for real-time inference.

How to eliminate wrong answers

Option B is wrong because using a larger instance with more memory may reduce latency due to increased compute capacity, but it is less effective and more costly than model compilation, which optimizes the model itself for the existing hardware. Option C is wrong because batch transform is designed for offline, asynchronous inference on large datasets, not for real-time inference, and it would not reduce latency for a real-time endpoint. Option D is wrong because SageMaker Inference Recommender helps select the optimal instance type and configuration for a given model, but it does not directly reduce latency; it recommends deployment parameters, whereas compilation actively optimizes the model.

110
Multi-Selecthard

Which THREE are benefits of using Amazon Bedrock over self-managing foundation models on EC2? (Choose THREE.)

Select 3 answers
A.Built-in integration with AWS services such as AWS CloudWatch and AWS CloudTrail.
B.Lower data transfer costs between cloud regions.
C.Access to a curated set of foundation models from different providers.
D.Managed infrastructure for model hosting and scaling.
E.Greater control over model fine-tuning and customization.
AnswersA, C, D

Bedrock natively logs to CloudWatch and CloudTrail for monitoring and auditing.

Why this answer

Option A is correct because Amazon Bedrock provides built-in integration with AWS services like CloudWatch for monitoring model invocation metrics and CloudTrail for auditing API calls. This eliminates the need to manually set up logging and monitoring infrastructure when self-managing foundation models on EC2, where you would have to configure these integrations yourself.

Exam trap

The trap here is that candidates may confuse 'managed infrastructure' with 'greater control'—Bedrock simplifies operations but reduces customization flexibility, so option E is a common distractor for those who think managed services offer more control than self-managed solutions.

111
MCQmedium

A financial services company is deploying a foundation model to analyze customer sentiment from call transcripts. The model outputs must be consistent and deterministic for auditing purposes. Which parameter configuration should the company use?

A.Set temperature to 0.1 and top_p to 0.9.
B.Set temperature to 0.7 and top_p to 1.0.
C.Set temperature to 0.5 and top_p to 0.5.
D.Set temperature to 0 and top_p to 1.
AnswerD

Temperature 0 makes the model deterministic.

Why this answer

Setting temperature to 0 and top_p to 1 forces the model to always select the highest-probability token at each step, producing deterministic and repeatable outputs. This is essential for auditing and compliance in financial services, where consistency is required. Any nonzero temperature introduces randomness, which undermines determinism.

Exam trap

AWS often tests the misconception that low temperature (e.g., 0.1) is 'deterministic enough,' but only temperature exactly 0 guarantees deterministic outputs, and top_p must be 1 to avoid interfering with the argmax selection.

How to eliminate wrong answers

Option A is wrong because temperature 0.1 still introduces slight randomness, making outputs non-deterministic and unsuitable for auditing. Option B is wrong because temperature 0.7 introduces significant randomness, and top_p 1.0 does not constrain it, leading to high variability. Option C is wrong because temperature 0.5 introduces randomness, and top_p 0.5 further restricts token sampling but does not eliminate the stochastic behavior from the nonzero temperature.

112
MCQhard

Refer to the exhibit. A developer sees this error when calling Amazon Bedrock for inference. What is the MOST likely cause and recommended solution?

A.The model ID is incorrect; use a different model
B.The prompt is too long; reduce the number of tokens in the prompt
C.The request rate exceeds the model's throughput limit; implement retries with exponential backoff
D.Increase the max_tokens_to_sample value
AnswerC

Throttling is due to rate limits; exponential backoff handles it.

Why this answer

Option A is correct. The error indicates throttling (rate exceeded). Retries with exponential backoff handle transient throttling.

Option B (fix prompt) is unrelated. Option C (change model) not needed. Option D (increase max_tokens) could exacerbate the issue.

113
MCQhard

A research team is using Amazon Bedrock to analyze scientific papers. They want the model to generate answers based only on papers published after 2023. Which approach should they use?

A.Fine-tune the model on a dataset of post-2023 papers and deploy it.
B.Set the maxTokens to a low value to force the model to rely on recent context.
C.Include a system prompt instructing the model to ignore data before 2023.
D.Use Amazon Bedrock Knowledge Bases with a metadata filter to retrieve only papers published after 2023, and generate responses based on retrieved content.
AnswerD

Metadata filtering ensures only relevant recent documents are used, grounding the model in current data.

Why this answer

Option D is correct because Amazon Bedrock Knowledge Bases with a metadata filter allows you to restrict retrieval to only documents that match specific metadata criteria, such as publication year. By filtering the vector search to only include papers published after 2023, the model generates responses based solely on that retrieved content, ensuring it does not rely on pre-2023 data. This approach is the only one that guarantees the model's answers are grounded exclusively in the specified time range.

Exam trap

AWS often tests the misconception that a system prompt or fine-tuning can reliably restrict a model's knowledge to a specific time period, when in fact only a retrieval-based approach with metadata filtering can enforce such temporal constraints.

How to eliminate wrong answers

Option A is wrong because fine-tuning the model on a dataset of post-2023 papers does not prevent the model from using its pre-existing training data (which includes pre-2023 knowledge) during inference; fine-tuning adjusts weights but does not erase prior knowledge, so the model could still generate answers based on older information. Option B is wrong because setting maxTokens to a low value limits the length of the generated response but does not control the temporal scope of the model's knowledge; the model can still draw on pre-2023 training data regardless of token count. Option C is wrong because a system prompt instructing the model to ignore data before 2023 is merely a suggestion and not a technical enforcement; the model has no inherent mechanism to filter its own training data by date, so it may still generate answers based on pre-2023 information, especially if the prompt is not strictly followed.

114
Multi-Selecthard

A company is deploying a customer service chatbot using a large language model (LLM) via Amazon Bedrock. The application must meet high accuracy for domain-specific queries, low latency, and be cost-effective. Which TWO strategies should the company adopt to achieve these goals? (Choose two.)

Select 2 answers
A.Store user prompts in a shared cache to reuse common queries.
B.Fine-tune the model on a large corpus of customer service transcripts to improve domain knowledge.
C.Use a Retrieval-Augmented Generation (RAG) architecture with a vector database for domain context.
D.Select a smaller, faster model that trades some accuracy for throughput.
E.Increase the model's maximum token limit to handle longer customer queries.
AnswersA, C

Caching frequent queries reduces latency and cost by avoiding repeated model invocations.

Why this answer

Retrieval-Augmented Generation (RAG) provides domain-specific context without full fine-tuning, reducing cost and latency. Caching responses for common queries reduces latency. Option A is not necessarily cost-effective; fine-tuning is expensive and may be overkill.

Option B is not good practice; it reduces security. Option D is overkill for latency; model choice should be driven by capability, not just throughput.

115
MCQeasy

Which AWS service provides a serverless API for accessing foundation models with per-token pricing?

A.Amazon Bedrock
B.Amazon API Gateway
C.AWS Lambda
D.Amazon SageMaker
AnswerA

Bedrock provides a serverless API with per-token pricing.

Why this answer

Amazon Bedrock is the managed service that offers a serverless API for foundation models, charging per token.

116
MCQeasy

A data science team is fine-tuning a Llama 2 7B model on Amazon SageMaker for a text classification task. After the first training run, they notice the loss is not decreasing and the model is overfitting to the small training set. What should the team change to mitigate overfitting?

A.Add dropout layers and reduce the learning rate.
B.Increase the number of epochs to allow the model to learn more patterns.
C.Increase the batch size and use gradient accumulation.
D.Remove dropout layers from the model architecture.
AnswerA

Dropout randomly drops neurons to prevent co-adaptation, and a lower learning rate helps stabilize training, both reducing overfitting.

Why this answer

Option D is correct because increasing dropout and reducing learning rate are standard regularization techniques. Option A is wrong because increasing batch size can slightly regularize but often insufficiently. Option B is wrong because increasing epochs typically worsens overfitting.

Option C is wrong because removing dropout reduces regularization, worsening overfitting.

117
MCQmedium

A company uses Amazon Bedrock to generate marketing copy. They want to measure the quality of generated text compared to reference text. Which metric is most appropriate?

A.F1 score
B.BLEU
C.RMSE
D.Accuracy
AnswerB

BLEU calculates n-gram overlap between candidate and reference text, suitable for generation evaluation.

Why this answer

BLEU (Bilingual Evaluation Understudy) is the most appropriate metric for evaluating the quality of generated text against reference text in tasks like machine translation and text generation. It measures n-gram precision between the generated and reference texts, making it ideal for assessing marketing copy generated by Amazon Bedrock.

Exam trap

AWS often tests the distinction between classification/regression metrics and text generation metrics, leading candidates to mistakenly apply F1 score or accuracy to evaluate generated text quality instead of using BLEU or similar sequence-based metrics.

How to eliminate wrong answers

Option A is wrong because F1 score is a classification metric that measures harmonic mean of precision and recall, not suitable for evaluating text generation quality against reference text. Option C is wrong because RMSE (Root Mean Square Error) is a regression metric used for continuous numerical predictions, not for text or sequence evaluation. Option D is wrong because Accuracy is a classification metric that measures the proportion of correct predictions, which does not account for the sequential and linguistic nuances of generated text.

118
MCQmedium

A company is developing a chatbot using Amazon Bedrock and wants to ensure the model's responses do not include toxic or biased language. The company has a labeled dataset of undesirable responses. Which approach should be used to fine-tune the foundation model to reduce harmful outputs?

A.Use reinforcement learning from human feedback (RLHF) with a reward model trained on human preferences.
B.Perform supervised fine-tuning on a curated dataset of safe responses.
C.Use prompt engineering to instruct the model to avoid toxic language.
D.Implement adversarial validation by testing against toxic inputs.
AnswerA

RLHF uses human feedback to train a reward model, which then guides the base model to generate safer outputs.

Why this answer

Reinforcement learning from human feedback (RLHF) is the correct approach because it directly optimizes the model to avoid toxic or biased outputs by training a reward model on human-labeled preferences. The reward model scores the model's responses, and the foundation model is fine-tuned via reinforcement learning to maximize these scores, effectively reducing harmful language. This method is specifically designed to align model behavior with nuanced human values, such as avoiding toxicity, which supervised fine-tuning alone cannot guarantee.

Exam trap

Cisco often tests the misconception that supervised fine-tuning or prompt engineering alone can reliably eliminate harmful outputs, when in fact RLHF is required to align the model with nuanced human preferences through iterative feedback.

How to eliminate wrong answers

Option B is wrong because supervised fine-tuning on a curated dataset of safe responses teaches the model to mimic safe patterns but does not explicitly penalize toxic outputs during generation; it lacks a reward signal to discourage harmful language when the model deviates from the training distribution. Option C is wrong because prompt engineering is a static, instruction-based technique that can be easily bypassed by adversarial inputs or subtle variations in phrasing; it does not modify the model's internal weights to reliably avoid toxic language. Option D is wrong because adversarial validation only tests the model's robustness to toxic inputs without fine-tuning the model itself; it identifies vulnerabilities but does not reduce harmful outputs in production.

119
MCQeasy

A company wants to use a foundation model to automatically summarize lengthy documents. Which capability of foundation models is being utilized?

A.Text generation
B.Sentiment analysis
C.Text classification
D.Machine translation
AnswerA

Summarization is a form of text generation where the model produces concise output.

Why this answer

Summarization is a text generation task where the model produces a concise version of the original content. Foundation models (e.g., GPT, Claude) are pre-trained on vast corpora and can generate coherent summaries by predicting the next tokens conditioned on the input document. This directly utilizes the text generation capability, not classification or translation.

Exam trap

Cisco often tests the distinction between text generation and text classification, so the trap here is that candidates may confuse summarization (a generative task) with classification or analysis tasks, especially when the question emphasizes 'understanding' the document rather than 'producing' new text.

How to eliminate wrong answers

Option B (Sentiment analysis) is wrong because it involves classifying the emotional tone of text (positive, negative, neutral), not generating a summary. Option C (Text classification) is wrong because it assigns predefined labels or categories to text, whereas summarization requires generating new text. Option D (Machine translation) is wrong because it converts text from one language to another, not condensing content within the same language.

120
MCQmedium

A company uses Amazon Bedrock to generate code snippets for internal tools. They notice that the generated code often contains security vulnerabilities such as SQL injection and cross-site scripting. The security team has compiled a comprehensive list of secure coding guidelines and examples of vulnerable patterns. The development team wants to reduce vulnerabilities without significantly slowing down the code generation process. They have tried adding the guidelines to the system prompt, but the model still produces insecure code occasionally. The team is considering additional measures. Which action should they take to most effectively eliminate security vulnerabilities in the generated code?

A.Implement a post-processing step using Amazon CodeGuru or a similar static analysis tool to scan the generated code for vulnerabilities and reject or fix insecure code.
B.Use a larger, more expensive foundation model that specializes in code generation.
C.Include the complete secure coding guidelines in every prompt.
D.Increase the temperature parameter of the foundation model to promote more diverse outputs.
AnswerA

Correct: Post-processing with static analysis reliably catches vulnerabilities and can be automated without slowing down generation significantly.

Why this answer

Option A is correct because it introduces a deterministic, post-generation validation layer that catches vulnerabilities the model might miss. Amazon CodeGuru Reviewer or similar static analysis tools can scan generated code for patterns like SQL injection and XSS, then reject or fix insecure code without modifying the generation process itself. This approach directly addresses the security team's guidelines while maintaining generation speed, as the model's inference latency is unaffected.

Exam trap

AWS often tests the misconception that prompt engineering alone can fully control model output, when in reality, deterministic post-processing steps are required to enforce strict security or compliance requirements.

How to eliminate wrong answers

Option B is wrong because using a larger, more expensive foundation model does not guarantee elimination of security vulnerabilities; all models can produce insecure code, and size does not correlate with adherence to specific security guidelines. Option C is wrong because including the complete secure coding guidelines in every prompt increases token usage and may cause the model to ignore or truncate the guidelines, leading to inconsistent results and slower generation due to longer prompts. Option D is wrong because increasing the temperature parameter promotes more diverse and random outputs, which would likely increase the probability of generating insecure code rather than reducing it.

121
MCQeasy

An e-commerce company uses a foundation model to generate personalized email subject lines. The marketing team notices that the subject lines sometimes contain product recommendations that are out of stock. Which action would best reduce the generation of out-of-stock recommendations without retraining the model?

A.Implement a post-processing step to replace out-of-stock recommendations with in-stock alternatives.
B.Fine-tune the model on a dataset of past successful subject lines that only include in-stock products.
C.Add a system prompt that explicitly instructs the model to only recommend products that are in stock.
D.Use a retrieval-augmented generation (RAG) approach to retrieve a list of in-stock products and include it in the prompt.
AnswerC

A system prompt can constrain the model's output to follow the instruction, reducing unwanted recommendations.

Why this answer

Option C is correct because adding a system prompt that explicitly instructs the model to only recommend in-stock products directly constrains the model's output at inference time without requiring retraining. This leverages the model's instruction-following capability to filter its generated content based on the provided context, which is a lightweight and immediate solution.

Exam trap

AWS often tests the distinction between inference-time interventions (like prompt engineering) and training-time interventions (like fine-tuning), and the trap here is that candidates may confuse RAG (which retrieves external data but does not enforce constraints) with a system prompt that directly instructs the model, leading them to select D instead of C.

How to eliminate wrong answers

Option A is wrong because post-processing replacement of out-of-stock recommendations with in-stock alternatives is reactive and may introduce irrelevant or incorrect substitutions, failing to prevent the model from generating out-of-stock items in the first place. Option B is wrong because fine-tuning the model requires retraining on a new dataset, which contradicts the question's constraint of 'without retraining the model.' Option D is wrong because while RAG can retrieve a list of in-stock products, including it in the prompt does not guarantee the model will exclusively recommend those items; the model may still generate out-of-stock recommendations from its parametric knowledge, especially if the prompt is not strictly enforced.

122
Multi-Selecthard

A data scientist is fine-tuning a foundation model on Amazon Bedrock for a custom summarization task. Which THREE practices should they follow to optimize the fine-tuning process?

Select 3 answers
A.Start with a base model that is already strong in the domain.
B.Use the default hyperparameters without tuning.
C.Use a representative dataset that reflects the target task.
D.Monitor training loss and validation loss to avoid overfitting.
E.Train for as many epochs as possible.
AnswersA, C, D

A good base model reduces training time and improves results.

Why this answer

Starting with a base model that is already strong in the domain (Option A) is correct because it reduces the amount of fine-tuning data and compute required. Amazon Bedrock provides access to various foundation models (e.g., Anthropic Claude, Amazon Titan) that have been pre-trained on diverse corpora; selecting one that is already proficient in the target domain (e.g., legal or medical summarization) means the model's existing knowledge can be adapted with fewer training steps, leading to better performance and lower risk of catastrophic forgetting.

Exam trap

Cisco often tests the misconception that more epochs always improve model performance, when in fact excessive training leads to overfitting, and they expect candidates to recognize that monitoring loss curves and using early stopping are critical practices.

123
MCQmedium

A developer encounters the error shown above when using Amazon Bedrock. What is the most likely cause?

A.The model is not available in the region
B.The IAM role lacks the required permission
C.The request is throttled
D.The model is out of service
AnswerB

The error explicitly states the role is not authorized for the action.

Why this answer

The error indicates an access denied or authorization failure when invoking the Amazon Bedrock model. The most likely cause is that the IAM role used by the developer does not have the required permission, such as `bedrock:InvokeModel`, attached to its policy. Without this permission, the API call to Bedrock is rejected regardless of model availability or service status.

Exam trap

AWS often tests the distinction between service availability errors and authorization errors, so the trap here is that candidates may confuse a permissions failure with a model unavailability or throttling issue, especially when the error message is generic.

How to eliminate wrong answers

Option A is wrong because if the model were not available in the region, the error would typically be a `ModelNotFoundException` or `ValidationException`, not an access denied error. Option C is wrong because throttling errors return a `ThrottlingException` with HTTP 429 status code, not an authorization error. Option D is wrong because if the model were out of service, the error would be a `ServiceUnavailableException` or `ModelNotReadyException`, not a permissions-related error.

124
MCQeasy

A marketing agency uses a foundation model to generate images for social media campaigns. Some generated images have contained violent or inappropriate content, damaging the brand. The agency needs to prevent such content from being displayed automatically. They are using Amazon Bedrock for image generation with Stable Diffusion. What is the most effective way to filter out inappropriate images?

A.Use Amazon Rekognition to analyze images after generation.
B.Manually review all images before posting.
C.Restrict the prompt to avoid triggering keywords.
D.Enable the safety checker in Amazon Bedrock's image generation models.
AnswerD

Built-in safety checker filters out inappropriate images without additional overhead.

Why this answer

Option B is correct because Stable Diffusion models in Bedrock include a safety checker that can detect and block NSFW content before output. Option A (Amazon Rekognition) introduces additional cost and latency. Option C (manual review) is not scalable.

Option D (restrict prompt) is unreliable as the model can still generate inappropriate content from safe prompts.

125
MCQhard

An e-commerce company is using a foundation model to generate product descriptions. They want to reduce costs by caching frequently requested descriptions. Which AWS service should they use to implement a cache?

A.Amazon CloudFront
B.Amazon DynamoDB
C.Amazon S3
D.Amazon ElastiCache
AnswerD

ElastiCache provides low-latency caching for frequently used data.

Why this answer

Amazon ElastiCache is the correct choice because it provides an in-memory caching layer (using Redis or Memcached) that can store frequently requested product descriptions, reducing the need to invoke the foundation model repeatedly. This directly lowers inference costs and latency by serving cached responses instead of generating new ones each time.

Exam trap

Cisco often tests the distinction between caching at the application layer (ElastiCache) versus caching at the content delivery layer (CloudFront), leading candidates to mistakenly choose CloudFront for any caching need.

How to eliminate wrong answers

Option A is wrong because Amazon CloudFront is a content delivery network (CDN) that caches static and dynamic content at edge locations, but it is not designed for application-level caching of model-generated text; it caches HTTP responses, not arbitrary key-value data. Option B is wrong because Amazon DynamoDB is a fully managed NoSQL database optimized for high-throughput, low-latency reads and writes, but it is not a caching service; using it as a cache would incur higher costs and lack native TTL-based eviction policies for transient data. Option C is wrong because Amazon S3 is an object storage service for storing large amounts of unstructured data, not a low-latency cache; retrieving descriptions from S3 would introduce significant latency compared to an in-memory cache, defeating the purpose of cost reduction.

126
Multi-Selectmedium

Which THREE of the following are factors to consider when selecting a foundation model for a text generation task?

Select 3 answers
A.Supported output modalities
B.Pricing per token
C.Model size (parameters)
D.Training data source and diversity
E.Availability of automatic scaling
AnswersB, C, D

Cost per token affects operational expense.

Why this answer

Pricing per token is a critical factor because foundation model APIs (e.g., Amazon Bedrock, OpenAI) charge based on the number of input and output tokens. For text generation tasks, token costs directly impact operational budgets, especially for high-volume or long-context applications. Selecting a model with lower per-token pricing can significantly reduce inference costs without sacrificing quality.

Exam trap

AWS often tests the distinction between model-level attributes (e.g., token pricing, training data, parameter count) and platform-level operational features (e.g., scaling, output modalities), leading candidates to incorrectly select options like automatic scaling or multimodal support for a text-only task.

127
MCQmedium

A company uses Amazon Bedrock to generate summarizations of lengthy reports. Users report that the summaries are too verbose and include excessive detail. Which prompt engineering technique should the team apply to address this issue?

A.Reduce the input context length to limit available information.
B.Increase the maxTokens parameter in the inference request.
C.Include few-shot examples of desired outputs.
D.Add explicit constraints like 'Provide a concise summary in two sentences.'
AnswerD

Explicit constraints directly guide the model to produce shorter output, addressing verbosity effectively.

Why this answer

Option D is correct because adding explicit constraints like 'Provide a concise summary in two sentences' directly instructs the model to limit verbosity and detail. This prompt engineering technique uses clear, specific instructions to control output length and style, which is the most effective way to address overly verbose summaries without altering model parameters or input data.

Exam trap

The trap here is that candidates confuse reducing input length (Option A) with controlling output length, or they mistakenly think increasing maxTokens (Option B) can somehow shorten output, when in fact it does the opposite.

How to eliminate wrong answers

Option A is wrong because reducing input context length does not guarantee concise output; the model may still generate verbose summaries from the remaining text, and it risks losing critical information needed for accurate summarization. Option B is wrong because increasing the maxTokens parameter actually allows the model to generate longer outputs, which would exacerbate the verbosity issue rather than solve it. Option C is wrong because few-shot examples can guide output format but are less direct and reliable than explicit constraints; they may not consistently enforce conciseness, especially if the examples themselves are not perfectly aligned with the desired brevity.

128
MCQmedium

A large enterprise uses Amazon Bedrock to power a conversational agent that handles customer service inquiries. The agent is built using Bedrock Agents and retrieves information from a knowledge base that contains product documentation and FAQs. Recently, users have reported that the agent sometimes provides incorrect information that contradicts the knowledge base. The development team verified that the knowledge base contains accurate and up-to-date data. They also confirmed that the retrieval process correctly fetches relevant documents. However, the agent occasionally ignores the retrieved context and generates plausible-sounding but incorrect answers. The team is concerned about customer trust and wants to improve the accuracy of the agent's responses without overhauling the architecture. They have already tuned the prompt template to instruct the model to use the context. The issue persists. Which additional action should the team take to reduce the number of hallucinated responses?

A.Reduce the chunk size of documents in the knowledge base to retrieve more granular information.
B.Switch to a larger foundation model with more parameters.
C.Increase the temperature parameter of the foundation model.
D.Add explicit instructions in the system prompt to require the model to base its answers solely on the retrieved context and to state when it doesn't have enough information.
AnswerD

Correct: Strengthening the prompt with explicit directives can reduce hallucinations by forcing the model to rely on the provided context.

Why this answer

Option B directly addresses the model ignoring context by strengthening the instruction. Option A increases randomness, Option C does not guarantee use of context, Option D may not help if retrieval is already good.

129
Multi-Selecteasy

Which TWO AWS services can be used together to build a chatbot that leverages a foundation model for natural language understanding?

Select 2 answers
A.Amazon Rekognition
B.Amazon Lex
C.Amazon Polly
D.AWS Glue
E.Amazon Bedrock
AnswersB, E

Lex handles dialog management and intent recognition.

Why this answer

Amazon Lex provides the conversational interface and natural language understanding (NLU) to interpret user intents and manage dialog, while Amazon Bedrock gives access to foundation models (FMs) for advanced natural language generation and understanding. Together, Lex can route utterances to a Bedrock FM via a Lambda function or direct integration, enabling a chatbot that leverages a pre-trained FM for richer responses.

Exam trap

AWS often tests the distinction between services that handle conversational interfaces (Lex) versus those that provide generative AI models (Bedrock), tempting candidates to pick Polly (speech output) or Rekognition (vision) as part of a chatbot, when they are not core to NLU or FM integration.

130
MCQhard

A team is fine-tuning a foundation model using SageMaker. They want to minimize training time while keeping the model's original knowledge. Which technique is BEST suited?

A.Use Parameter Efficient Fine-Tuning (PEFT) such as LoRA
B.Use distributed training across multiple GPUs
C.Use prompt engineering instead of fine-tuning
D.Full fine-tuning on the new dataset
AnswerA

PEFT methods adapt the model with fewer trainable parameters, reducing training time and preserving original knowledge.

Why this answer

Parameter Efficient Fine-Tuning (PEFT) methods like LoRA (Low-Rank Adaptation) are best suited because they freeze the pre-trained model weights and inject trainable low-rank matrices into specific layers, drastically reducing the number of trainable parameters. This minimizes training time and computational cost while preserving the model's original knowledge, as only a small fraction of parameters are updated during fine-tuning.

Exam trap

AWS often tests the distinction between techniques that modify the model (fine-tuning) versus those that only change the input (prompt engineering), and the trap here is that candidates may choose distributed training (Option B) thinking it reduces time, but it does not address parameter efficiency or knowledge preservation as directly as PEFT.

How to eliminate wrong answers

Option B is wrong because distributed training across multiple GPUs accelerates training but does not inherently preserve the model's original knowledge or reduce the number of updated parameters; it still requires full or partial parameter updates and does not address the goal of minimizing training time through parameter efficiency. Option C is wrong because prompt engineering is a zero-shot or few-shot inference technique that does not involve training at all, so it cannot be used to fine-tune the model on a new dataset. Option D is wrong because full fine-tuning updates all model parameters, which is computationally expensive, time-consuming, and risks catastrophic forgetting of the original knowledge, contrary to the goal of minimizing training time while preserving original knowledge.

131
MCQhard

A data scientist is fine-tuning a foundation model on a custom dataset using Amazon SageMaker. After training, the model shows high accuracy on training data but poor on validation. Which action should be taken?

A.Add dropout layers
B.Reduce training epochs or add regularization
C.Increase learning rate
D.Use a different foundation model
AnswerB

Reducing epochs prevents overfitting; regularization also helps.

Why this answer

The model is overfitting, as indicated by high training accuracy but poor validation performance. Reducing training epochs or adding regularization (e.g., L1/L2 weight decay) directly addresses overfitting by limiting the model's capacity to memorize noise. In Amazon SageMaker, this can be implemented via hyperparameter tuning or by modifying the training script to include regularization terms.

Exam trap

AWS often tests the misconception that overfitting is solved by increasing model complexity or data augmentation, but the correct approach is to reduce capacity or add regularization.

How to eliminate wrong answers

Option A is wrong because adding dropout layers is a regularization technique that could help, but it is not the only or most direct action; the question asks for a single action, and reducing epochs or adding regularization (Option B) is a more fundamental fix for overfitting. Option C is wrong because increasing the learning rate can cause the model to diverge or overshoot minima, worsening generalization and potentially increasing overfitting. Option D is wrong because using a different foundation model does not address the root cause of overfitting; the current model is capable of learning the training data, and the issue is with training dynamics, not model architecture.

132
MCQeasy

Refer to the exhibit. A developer runs this command but gets an error: 'An error occurred (AccessDeniedException) when calling the ListFoundationModels operation'. What is the most likely cause?

A.The IAM role does not have bedrock:ListFoundationModels permission
B.The AWS CLI version is outdated
C.The foundation model is not available in us-west-2
D.The region us-west-2 does not support Bedrock
AnswerA

AccessDeniedException is due to missing IAM permissions.

Why this answer

The error 'AccessDeniedException' when calling ListFoundationModels indicates that the IAM role or user executing the AWS CLI command lacks the required permission to list foundation models in Amazon Bedrock. The specific permission needed is bedrock:ListFoundationModels, which must be attached to the IAM identity via a policy. Without this permission, the API call is denied regardless of other factors like region or CLI version.

Exam trap

AWS often tests the distinction between service availability errors (e.g., region not supported) and IAM permission errors, where candidates mistakenly attribute an AccessDeniedException to regional or model availability issues rather than missing IAM permissions.

How to eliminate wrong answers

Option B is wrong because an outdated AWS CLI version would typically produce a different error (e.g., 'InvalidClientTokenId' or 'UnrecognizedClientException'), not an AccessDeniedException, and the ListFoundationModels API is available in recent CLI versions. Option C is wrong because the error is an access denial, not a model availability issue; if a model were unavailable, the error would be something like 'ValidationException' or 'ResourceNotFoundException' when trying to use that specific model. Option D is wrong because us-west-2 (Oregon) fully supports Amazon Bedrock and its APIs; the error is explicitly an IAM permissions issue, not a regional unsupported service error.

133
MCQeasy

A company wants to use a pre-trained foundation model for sentiment analysis without any customization. Which Amazon Machine Learning service provides access to foundation models via API?

A.Amazon Bedrock
B.Amazon Textract
C.Amazon Comprehend
D.Amazon Rekognition
AnswerA

Why this answer

Amazon Bedrock provides a managed API to access foundation models from providers like AI21 Labs, Anthropic, and Amazon. Amazon Rekognition is for images; Textract for document text; Comprehend for natural language processing (but not foundation models per se).

134
MCQmedium

A company is building a chatbot using Amazon Bedrock to answer customer questions about their product catalog. The chatbot should only use information from the company's internal knowledge base and should not generate answers based on the model's pre-training data. Which feature should be enabled?

A.Use prompt engineering to instruct the model to only use the knowledge base
B.Configure a knowledge base with Retrieval Augmented Generation (RAG)
C.Enable model invocation logging to review responses
D.Fine-tune the model on the product catalog data
AnswerB

RAG grounds responses in the provided knowledge base, avoiding use of pre-training data.

Why this answer

Option B is correct because configuring a knowledge base with Retrieval Augmented Generation (RAG) allows the chatbot to retrieve relevant documents from the company's internal knowledge base and use them as context for generating answers. This ensures the model's responses are grounded solely in the provided data, preventing reliance on its pre-training knowledge.

Exam trap

The trap here is that candidates often confuse fine-tuning with RAG, assuming fine-tuning alone can restrict the model to a specific knowledge domain, when in fact fine-tuning does not prevent the model from using its pre-training data and can still produce off-topic responses.

How to eliminate wrong answers

Option A is wrong because prompt engineering alone cannot reliably prevent the model from using its pre-training data; it only provides instructions that the model may still override with its internal knowledge. Option C is wrong because model invocation logging only records responses for auditing and debugging, it does not constrain the model's source of information. Option D is wrong because fine-tuning adapts the model to the product catalog but does not guarantee that the model will ignore its pre-training data; it can still generate answers from its original training corpus.

135
MCQhard

An enterprise deploys a foundation model on Amazon Bedrock with a knowledge base. Users report that the model is returning outdated information. What is the most likely cause?

A.The model was fine-tuned
B.The model is not the latest version
C.The knowledge base data source is not refreshed
D.The inference parameters are incorrect
AnswerC

If the underlying data source hasn't been updated, the knowledge base contains stale data.

Why this answer

When a knowledge base is attached to a foundation model on Amazon Bedrock, the model retrieves information from the data source to augment its responses. If the data source is not refreshed, the model will return outdated information even if the model itself is current. Option C directly addresses this by identifying the stale data source as the root cause.

Exam trap

The trap here is that candidates may confuse model versioning (Option B) with data freshness, but the question specifically ties the symptom to the knowledge base, making the refresh cycle the critical factor.

How to eliminate wrong answers

Option A is wrong because fine-tuning adjusts the model's weights on a specific dataset, which does not inherently cause outdated information; in fact, fine-tuning could update the model with newer data. Option B is wrong because using an older model version might affect performance or capabilities, but the question specifically states the model is returning outdated information, which points to the knowledge base content, not the model version. Option D is wrong because inference parameters (e.g., temperature, top_p) control randomness and creativity of responses, not the freshness or accuracy of the information retrieved from the knowledge base.

← PreviousPage 2 of 2 · 135 questions total

Ready to test yourself?

Try a timed practice session using only Foundation Model Applications questions.