CCNA Generative Ai Fundamentals Questions — Page 2 of 2

MCQhard

Refer to the exhibit. A developer receives an error when trying to invoke the Claude Instant model from an application. The application uses the IAM role 'MyAppRole'. Which IAM policy statement should be added to the role to resolve the error?

A.{"Effect":"Allow","Action":"bedrock:GetFoundationModel","Resource":"*"}

B.{"Effect":"Allow","Action":"bedrock:InvokeModel","Resource":"arn:aws:bedrock:us-east-1::foundation-model/*"}

C.{"Effect":"Allow","Action":"bedrock:InvokeModel","Resource":"arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1"}

D.{"Effect":"Allow","Action":"bedrock:*","Resource":"*"}

AnswerC

This grants the minimal required permission for the specific model.

Why this answer

The error indicates missing permission to invoke the specific model. The correct action is 'bedrock:InvokeModel' on the specific model ARN.

Practice this question →

MCQhard

Refer to the exhibit. A developer is optimizing latency for a generative AI model deployed on SageMaker. Based on the exhibit, which change would most likely reduce per-token latency?

A.Use a CPU instance

B.Reduce model size through quantization

C.Switch to a larger instance type

D.Increase batch size to 10

AnswerB

Quantization reduces the precision of model weights, decreasing compute per token and thus latency.

Why this answer

Option C, reduce model size through quantization, directly reduces computation per token, lowering latency. Option A (larger instance) may help but is less targeted. Option B (increase batch size) improves throughput but not per-token latency.

Option D (CPU instance) would increase latency.

Practice this question →

MCQhard

An enterprise wants to ensure that generative AI applications built on AWS comply with data privacy regulations. They need to prevent the model from using customer data in future training. Which feature of Amazon Bedrock should they enable?

A.Policy-based data governance

B.Opt-out of model improvement

C.Data encryption at rest

D.Model customization with customer data

AnswerB

Opting out ensures customer data is not used for AWS model training or service improvement.

Why this answer

Option D, opt-out of model improvement, prevents AWS from using customer data for service improvement and training. Option A (encryption) protects data at rest but does not prevent use in training. Option B (model customization) may actually use customer data.

Option C (policy-based data governance) is not a specific Bedrock feature for this purpose.

Practice this question →

Multi-Selecthard

A company is using Amazon Bedrock to generate creative marketing copy. They want to reduce the randomness of the output while maintaining diversity. Which TWO parameters should they adjust?

Select 2 answers

A.Increase the temperature

B.Increase the max token count

C.Increase the top_k value

D.Decrease the top_p value

E.Decrease the temperature

AnswersD, E

Lower top_p reduces the set of possible tokens, making output less random.

Why this answer

Decreasing the temperature (Option E) reduces randomness by lowering the probability of sampling lower-ranked tokens, making the model more deterministic. Decreasing top_p (Option D) narrows the cumulative probability threshold for token selection, which also reduces randomness while still allowing some diversity within the narrowed set. Together, these parameters control the trade-off between creativity and determinism in Amazon Bedrock's text generation.

Exam trap

Cisco often tests the misconception that increasing top_k or top_p reduces randomness, when in fact increasing either expands the token pool and can increase randomness, while decreasing them is what reduces randomness.

Practice this question →

Multi-Selecthard

Which THREE considerations are essential when deploying a generative AI application in a regulated industry such as healthcare?

Select 3 answers

A.Lowest possible inference latency for real-time responses.

B.Full audit trail of model inputs and outputs for accountability.

C.Robust content filtering to block harmful or inaccurate outputs.

D.Maximum creative freedom for the model to generate diverse responses.

E.Data privacy and compliance with regulations like HIPAA.

AnswersB, C, E

Required for compliance and investigation.

Why this answer

Options A, B, and D are correct. Data privacy and compliance (e.g., HIPAA) are mandatory. Robust filtering for harmful output is required to prevent harm.

Full auditability of model responses is needed for regulatory compliance. Option C is wrong because creative freedom is often restricted in regulated contexts. Option E is wrong because faster inference is a performance concern, not a regulatory essential.

Practice this question →

MCQhard

A team is developing a real-time code completion feature using an LLM deployed on Amazon SageMaker. They observe high latency under load. Which optimization technique should they prioritize?

A.Increase batch size

B.Switch to a larger instance type

C.Increase instance count with Auto Scaling

D.Use model quantization

AnswerD

Quantization reduces model precision and size, leading to faster inference with minimal accuracy loss.

Why this answer

Option B, model quantization, reduces the model size and speeds up inference directly, lowering latency. Option A (Auto Scaling) improves throughput but not per-request latency. Option C (increase batch size) improves throughput but may increase per-token latency.

Option D (larger instance) may improve but not as effectively as quantization.

Practice this question →

MCQmedium

A company is using Amazon Bedrock to summarize long documents. They notice that the summary sometimes omits key details. What is the most likely cause?

A.The model is overfitted

B.The prompt lacks examples

C.The model's context window is too small

D.The temperature parameter is too high

AnswerC

A small context window truncates the input document, causing the model to miss key details.

Why this answer

Option A, the model's context window is too small, causes the model to only see part of the document, resulting in omitted details. Option B (temperature too high) increases randomness, not omission. Option C (lack of examples) may affect quality but not omission due to length.

Option D (overfitting) would affect performance on new data, not specifically omission of details.

Practice this question →

MCQmedium

A company is using Amazon SageMaker JumpStart to deploy a pre-trained text generation model. After deployment, the model produces slow inference responses. Which action is most likely to improve inference latency?

A.Quantize the model weights to FP16 or INT8.

B.Deploy the model on a more powerful instance type with higher GPU memory.

C.Fine-tune the model on a smaller dataset.

D.Increase the batch size for inference requests.

AnswerB

More compute resources reduce inference time per request.

Why this answer

Option B is correct because deploying the model on a more powerful instance type with higher GPU memory directly addresses the computational bottleneck causing slow inference. A larger GPU provides more CUDA cores and memory bandwidth, enabling faster matrix operations and reducing the time per forward pass for the pre-trained text generation model.

Exam trap

Cisco often tests the misconception that model optimization techniques like quantization always improve latency without trade-offs, but the most direct and reliable method for reducing inference latency is upgrading to a more powerful instance type with higher GPU memory.

How to eliminate wrong answers

Option A is wrong because quantizing model weights to FP16 or INT8 reduces model size and can improve latency, but it may degrade output quality and is not the most direct or guaranteed fix for slow inference; the question asks for the action most likely to improve latency, and upgrading hardware is more reliable. Option C is wrong because fine-tuning on a smaller dataset adjusts the model for a specific task but does not inherently speed up inference; it may even increase latency if the fine-tuned model is larger or uses more complex attention patterns. Option D is wrong because increasing batch size for inference requests typically increases throughput (requests per second) but can increase per-request latency due to longer queue times and higher memory usage, making it counterproductive for reducing individual response time.

Practice this question →

MCQeasy

A data scientist wants to fine-tune a foundation model on a specific domain dataset using Amazon SageMaker. Which built-in SageMaker feature can simplify the training process?

A.SageMaker Neo

B.SageMaker Canvas

C.SageMaker JumpStart

D.SageMaker Ground Truth

AnswerC

JumpStart offers one-click fine-tuning for many foundation models, with built-in notebooks and scripts.

Why this answer

Option A, SageMaker JumpStart, provides pre-trained foundation models and built-in fine-tuning scripts, simplifying the process. Option B (Ground Truth) is for data labeling. Option C (Neo) is for model optimization.

Option D (Canvas) is a no-code ML tool for business analysts.

Practice this question →

MCQeasy

A developer is using Amazon Bedrock to create a chatbot. They want to ensure the bot does not generate toxic or offensive content. Which feature should they enable?

A.Use careful prompt engineering to avoid toxic responses.

B.Fine-tune the model on a dataset of safe responses.

C.Enable content filtering on the Bedrock model.

D.Implement external response validation using a third-party API.

AnswerC

Content filtering provides automated detection and blocking of inappropriate content.

Why this answer

Option B is correct because Bedrock offers content filtering to detect and block harmful content. Option A is wrong because fine-tuning may not fully filter toxic content. Option C is wrong because prompt engineering alone is not enough.

Option D is wrong because response validation is not a built-in feature of Bedrock.

Practice this question →

MCQeasy

A company wants to build a generative AI application that generates personalized marketing emails based on customer data. They have a small dataset of past emails. Which AWS service should they use to fine-tune a foundation model with their data?

A.Amazon SageMaker

B.Amazon Comprehend

C.AWS Lambda

D.Amazon Bedrock

AnswerA

SageMaker with JumpStart allows fine-tuning of foundation models using custom datasets and provides managed training infrastructure.

Why this answer

Amazon SageMaker provides a managed environment for training and fine-tuning models, including foundation models via JumpStart. Bedrock offers managed APIs but not direct fine-tuning. Lambda is for serverless code, not model training.

Comprehend is for NLP analysis, not text generation.

Practice this question →

MCQeasy

A startup wants to generate product descriptions from a few keywords using a foundation model. They need a fully managed serverless solution that requires no infrastructure setup. Which AWS service should they use?

A.Amazon SageMaker

B.Amazon Comprehend

C.AWS Lambda

D.Amazon Bedrock

AnswerD

Bedrock is a serverless service offering foundation models via API.

Why this answer

Amazon Bedrock is a fully managed serverless service that provides access to foundation models (FMs) from leading AI providers via a simple API, making it ideal for generating product descriptions from keywords without any infrastructure management. It directly supports generative AI tasks like text generation, unlike other AWS services that focus on different ML or NLP capabilities.

Exam trap

The trap here is that candidates may confuse Amazon SageMaker's managed ML capabilities with a serverless generative AI service, overlooking that SageMaker requires explicit infrastructure setup for model hosting, while Bedrock is purpose-built for serverless access to foundation models.

How to eliminate wrong answers

Option A is wrong because Amazon SageMaker is a fully managed machine learning platform that requires setting up training jobs, endpoints, and infrastructure for custom models, not a serverless solution for directly using pre-built foundation models. Option B is wrong because Amazon Comprehend is a natural language processing (NLP) service for tasks like sentiment analysis and entity extraction, not for generative text creation from keywords. Option C is wrong because AWS Lambda is a serverless compute service that runs custom code but does not natively provide access to foundation models; you would need to integrate it with another service like Bedrock to generate descriptions, making it not a standalone solution for this use case.

Practice this question →

MCQeasy

A developer is testing different prompts for a text generation model on Amazon Bedrock. Which parameter controls the randomness of the model's output?

A.top_p

B.stop_sequences

C.temperature

D.max_tokens

AnswerC

Temperature directly scales the logits before softmax, controlling the randomness of token selection.

Why this answer

Option D is correct because temperature controls the randomness of the model's predictions. Lower values make output more deterministic; higher values increase randomness. Option A (max_tokens) controls output length.

Option B (top_p) is nucleus sampling. Option C (stop_sequences) defines stopping criteria.

Practice this question →

MCQhard

An organization is using Amazon Bedrock to power a customer service chatbot. They notice that the chatbot occasionally generates hallucinated information about product specifications. Which strategy should be implemented to reduce hallucinations?

A.Fine-tune the model on a dataset of product specification conversations.

B.Integrate a Retrieval Augmented Generation (RAG) system with the product catalog.

C.Use more detailed prompts with explicit instructions to avoid speculation.

D.Increase the temperature parameter to make outputs more conservative.

AnswerB

RAG provides up-to-date, factual context to the model, reducing hallucinations.

Why this answer

Retrieval Augmented Generation (RAG) grounds the model's responses in authoritative, up-to-date product catalog data, directly reducing hallucinations by ensuring the chatbot references verified facts rather than relying solely on its parametric memory. This is the most effective strategy because it provides a retrieval-based factual foundation that fine-tuning or prompt engineering alone cannot guarantee.

Exam trap

Cisco often tests the misconception that prompt engineering or fine-tuning alone can solve hallucination problems, when in fact they lack the dynamic, verifiable grounding that RAG provides.

How to eliminate wrong answers

Option A is wrong because fine-tuning on product specification conversations may reinforce patterns from the training data but does not prevent the model from generating plausible-sounding but incorrect details when faced with queries outside the fine-tuned distribution; it also cannot dynamically incorporate real-time catalog updates. Option C is wrong because while more detailed prompts can reduce speculation, they do not provide the model with access to external, authoritative data—hallucinations can still occur when the model's internal knowledge is incomplete or outdated. Option D is wrong because increasing the temperature parameter makes outputs more random and creative, not more conservative; decreasing temperature would make outputs more deterministic and less prone to hallucination, but even low temperature cannot eliminate hallucinations without a retrieval mechanism.

Practice this question →

MCQeasy

A developer wants to test different foundation models quickly without setting up infrastructure. Which AWS service allows interactive prompting and comparison of multiple models?

A.Amazon Comprehend

B.Amazon Bedrock Playground

C.Amazon Lex

D.Amazon SageMaker Studio

AnswerB

Bedrock offers a playground to interactively test and compare foundation models.

Why this answer

Amazon Bedrock provides a playground feature for testing models. Option A (SageMaker Studio) is for full notebook environment. Option C (Comprehend) is for analysis.

Option D (Lex) is for chatbots.

Practice this question →

MCQeasy

A startup is building a customer support chatbot using Amazon Bedrock with the Claude foundation model. The chatbot needs to answer questions based on a knowledge base of frequently asked questions (FAQs) stored in an Amazon S3 bucket. The team wants to implement Retrieval Augmented Generation (RAG) to provide accurate and context-aware responses. They are evaluating different approaches to integrate the knowledge base. What is the most efficient way to implement RAG with Bedrock?

A.Use AWS Lambda to fetch documents from S3 and inject them into the prompt.

B.Manually extract all FAQs and include them in the prompt each time the chatbot responds.

C.Fine-tune the Claude model on the FAQs so the model memorizes the knowledge base.

D.Use Amazon Bedrock Knowledge Bases to directly connect the S3 bucket and retrieve relevant documents for the prompt.

AnswerD

Bedrock Knowledge Bases provides a managed RAG solution with automatic indexing and retrieval.

Why this answer

Option A is correct. Amazon Bedrock Knowledge Bases provides a native feature to connect to data sources like S3, automatically chunk and index documents, and retrieve relevant information. This is the most efficient and managed approach.

Option B is incorrect because manually including all FAQs in the prompt would exceed token limits and be impractical. Option C is incorrect because fine-tuning the model on FAQs is overkill for this use case and does not allow dynamic updates. Option D is a possible custom solution but is less efficient than using the built-in knowledge base feature.

Practice this question →

Multi-Selectmedium

Which TWO factors are most important when selecting a foundation model in Amazon Bedrock for a text summarization task with strict latency requirements?

Select 2 answers

A.Average response latency per request.

B.Model size in billions of parameters.

C.Maximum input token limit.

D.Output quality and token efficiency for summarization tasks.

E.Availability of fine-tuning capability for domain adaptation.

AnswersA, D

Low latency is critical for real-time summarization.

Why this answer

Options A and B are correct. Response latency directly impacts user experience, so a model with low latency is essential. Output quality/token ensures the summaries are accurate and concise.

Option C is wrong because fine-tuning increases cost and latency. Option D is wrong because model size affects latency but latency itself is the direct factor. Option E is wrong because input token limit is relevant but not as critical as latency and quality for this use case.

Practice this question →

Multi-Selecteasy

Which TWO actions can help reduce the likelihood of hallucinations in a generative AI model used for question answering?

Select 2 answers

A.Increase the maximum token count to allow more complete answers.

B.Use Retrieval Augmented Generation (RAG) with a trusted knowledge base.

C.Fine-tune the model on the training data used for the application.

D.Set a lower temperature parameter (e.g., 0.1) to reduce randomness.

E.Use a larger foundation model with more parameters.

AnswersB, D

Grounding on real documents reduces hallucinations.

Why this answer

Options A and C are correct. Grounding the model on a knowledge base (RAG) reduces hallucinations by providing factual context. Reducing the temperature parameter makes the model more deterministic, lowering the chance of making up information.

Option B is wrong because fine-tuning on the same data that caused hallucinations may not fix the issue. Option D is wrong because increasing max tokens may allow more hallucinated content. Option E is wrong because using a larger model often increases hallucination risk due to more parameters.

Practice this question →

MCQmedium

Refer to the exhibit. A user invoked a Claude model using provisioned throughput and received a ThrottlingException. Which is the most likely cause?

A.The model is not available in the region

B.The provisioned throughput request per minute limit was exceeded

C.The prompt was too long

D.The inference type should be ON_DEMAND

AnswerB

Throttling occurs when the request rate exceeds the allowed limit for the provisioned throughput.

Why this answer

Option A is correct. Provisioned throughput has a requests-per-minute limit, and exceeding it causes a ThrottlingException. Option B would produce a different error.

Option C would be a validation error, not throttling. Option D is not the cause because PROVISIONED is valid.

Practice this question →

Multi-Selectmedium

Which TWO actions would improve the grounding of responses from a generative AI model using RAG? (Choose 2)

Select 2 answers

A.Fine-tune the model on unrelated data

B.Reduce the context window to save tokens

C.Increase the model's temperature parameter

D.Use RAG with a knowledge base of relevant documents

E.Include source citations in the prompt instructions

AnswersD, E

RAG provides retrieved context, reducing reliance on model's parametric knowledge.

Why this answer

Using RAG with relevant documents directly grounds responses in factual data. Including source citations in prompts encourages the model to base answers on retrieved information. Increasing temperature or reducing context would likely hurt grounding.

Fine-tuning on unrelated data does not help.

Practice this question →

Multi-Selecthard

Which THREE are best practices for building a secure and scalable generative AI application using Amazon Bedrock? (Choose 3)

Select 3 answers

A.Implement guardrails to filter harmful content

B.Deploy models on EC2 instances for better control

C.Store API keys in source code for easy access

D.Use AWS KMS to encrypt data and model artifacts

E.Use foundation models from multiple providers via Bedrock

AnswersA, D, E

Guardrails enforce content policies and prevent inappropriate outputs.

Why this answer

Using multiple foundation models through Bedrock's multi-model support allows flexibility and best-of-breed selection. Guardrails provide content safety. KMS encryption protects data at rest and in transit.

Storing keys in source code is insecure. EC2 deployment is not applicable to Bedrock's serverless model.

Practice this question →

MCQhard

A company fine-tunes a foundation model using SageMaker to create a domain-specific chatbot. After deployment on Bedrock, the model shows high confidence in incorrect answers. What is the most likely cause and its solution?

A.The model was not pre-trained on enough data; use a larger base model

B.The training data was imbalanced; collect more diverse data

C.The model is overfitting; apply regularization techniques during fine-tuning

D.The inference temperature is too low; increase it

AnswerC

Overfitting leads to overconfidence on training patterns. Regularization helps generalize better.

Why this answer

Overfitting during fine-tuning can cause the model to be overly confident even when wrong. Regularization (e.g., early stopping, dropout) reduces overconfidence.

Practice this question →

MCQeasy

A developer is using Amazon Bedrock's Claude model to summarize long documents. The developer notices that the summaries sometimes miss key points. Which parameter adjustment is most likely to improve summary completeness?

A.Increase the max_tokens parameter.

B.Increase the top_k parameter.

C.Increase the temperature parameter.

D.Increase the top_p parameter.

AnswerA

More tokens allow the model to include more details in the summary.

Why this answer

Increasing max_tokens allows the model to generate longer outputs, which is essential when summarizing long documents because the summary may need more tokens to capture all key points. If max_tokens is too low, the model truncates the response, potentially omitting important details. This directly addresses the issue of missing key points by providing sufficient output length for a complete summary.

Exam trap

Cisco often tests the misconception that parameters controlling randomness (temperature, top_k, top_p) affect output length or completeness, when in fact they only influence token selection diversity and creativity.

How to eliminate wrong answers

Option B is wrong because increasing top_k controls the number of highest-probability tokens considered during sampling, which affects randomness and diversity, not the length or completeness of the output. Option C is wrong because increasing temperature increases randomness in token selection, which can lead to more creative but less focused summaries, potentially worsening completeness. Option D is wrong because increasing top_p (nucleus sampling) also controls randomness by selecting tokens with cumulative probability, and does not extend the output length or guarantee inclusion of key points.

Practice this question →

Multi-Selecthard

An organization is evaluating different foundation models (FMs) on Amazon Bedrock for a legal document analysis task. Which THREE factors should they consider when selecting a model? (Choose 3.)

Select 3 answers

A.The region where the model is hosted

B.Model size (number of parameters)

C.Cost per inference call

D.Support for the specific language of the documents

E.Token limits for input and output

AnswersB, D, E

Larger models often have better understanding but higher cost.

Why this answer

Options A, B, and D are correct. Model size affects capability and cost, token limits determine the length of documents that can be processed, and language support is critical for legal documents. Option C (region) is not a model capability factor.

Option E (cost per inference) is operational but not a primary technical selection factor for this task.

Practice this question →

100

Multi-Selecthard

A company wants to evaluate the performance of a generative AI model before deployment. Which TWO metrics are most relevant for measuring model quality? (Select two.)

Select 2 answers

A.BLEU score

B.Response time

C.Perplexity

D.Model size

E.CPU utilization

AnswersA, C

BLEU evaluates the quality of generated text by comparing n-grams with reference translations.

Why this answer

Options A (BLEU score) and C (Perplexity) are standard for evaluating text generation quality. BLEU measures similarity to reference text, and perplexity measures how well the model predicts a sample. Option B (CPU utilization) is operational, not quality.

Option D (Response time) is latency. Option E (Model size) is a design parameter.

Practice this question →

101

MCQmedium

A data scientist is evaluating foundation models for a text summarization task and wants to use a standard metric. Which metric is commonly used to assess the quality of generated summaries?

A.F1 score

B.ROUGE

C.BLEU

D.Accuracy

AnswerB

ROUGE measures recall-based overlap for summaries.

Why this answer

ROUGE is the standard metric for summarization, measuring overlap of n-grams. Option A (Accuracy) is for classification. Option C (BLEU) is for translation.

Option D (F1 score) is for classification.

Practice this question →

102

Multi-Selectmedium

A company is using Amazon Bedrock to generate marketing copy. They want to ensure the output is safe and appropriate. Which TWO actions should they take? (Choose 2.)

Select 2 answers

A.Enable content filtering with guardrails

B.Set temperature to 0 for deterministic output

C.Use model fine-tuning with unsafe examples

D.Use a private endpoint for Bedrock

E.Implement human review of all generated content

AnswersA, E

Guardrails can block harmful or inappropriate content automatically.

Why this answer

Options A and D are correct. Guardrails filter content in real time, and human review catches subtle issues. Option B (fine-tuning with unsafe examples) could introduce bias.

Option C (low temperature) reduces creativity but does not ensure safety. Option E (private endpoint) addresses networking, not content safety.

Practice this question →

103

MCQeasy

A developer wants to generate product description images using Amazon Bedrock. They need to ensure the generated images match a specific brand style. Which feature should they primarily use?

A.Prompt engineering with detailed style descriptions.

B.Output grounding to verify brand compliance.

C.Data augmentation to increase dataset diversity.

D.Fine-tuning the image generation model on brand assets.

AnswerA

Prompt engineering is the simplest way to steer image generation toward a desired style.

Why this answer

Option A is correct because prompt engineering allows the developer to specify style guidelines in the text prompt, influencing the output. Option B is wrong because fine-tuning for image style is time-consuming. Option C is wrong because grounding is for text, not images.

Option D is wrong because data augmentation is not directly relevant.

Practice this question →

104

MCQmedium

A developer invoked an Amazon Bedrock model and received the following error: 'ValidationException: 1 validation error detected: Value 'claude-instant-v1' at 'modelId' failed to satisfy constraint: Member must satisfy enum value set: [ai21.j2-mid-v1, amazon.titan-text-lite-v1, anthropic.claude-v2, ...]'. What is the likely cause?

A.The Lambda function does not have the necessary IAM permissions

B.The modelId is not available in the current AWS region

C.The modelId is not part of the allowed enum of models for the account

D.The modelId is deprecated and has been renamed

AnswerC

The error explicitly states the value must satisfy the enum set, meaning the model ID is invalid or not in the allowed list.

Why this answer

Option C is correct because the error indicates the modelId 'claude-instant-v1' is not in the allowed enum set. This is usually because the model ID is incorrectly spelled or not available in this region/account. Option A (deprecated) would give a different message.

Option B (region availability) would mention region. Option D (permissions) would be a different error type.

Practice this question →

105

MCQhard

A team is deploying a generative AI model for medical report generation. They must ensure patient data privacy and comply with HIPAA. Which AWS service feature is essential for de-identifying protected health information (PHI) before sending data to a foundation model?

A.AWS CloudHSM

B.Amazon Comprehend Medical

C.Amazon Macie

D.AWS Key Management Service (AWS KMS)

AnswerB

Comprehend Medical provides PHI detection and de-identification.

Why this answer

Amazon Comprehend Medical is the correct service because it is specifically designed to extract and de-identify protected health information (PHI) from unstructured medical text using natural language processing (NLP). It can detect entities such as patient names, dates, and medical record numbers, and then redact or replace them before the data is sent to a foundation model, ensuring HIPAA compliance.

Exam trap

The trap here is that candidates confuse general data protection services like Macie or encryption services like KMS with the specialized PHI de-identification capability of Amazon Comprehend Medical, assuming any security service can handle HIPAA compliance for generative AI workflows.

How to eliminate wrong answers

Option A is wrong because AWS CloudHSM provides hardware security modules (HSMs) for cryptographic key storage and operations, but it does not perform data de-identification or PHI detection. Option C is wrong because Amazon Macie is a data security service that discovers and protects sensitive data using machine learning and pattern matching, but it is designed for data classification and access control, not for de-identifying PHI in unstructured text for downstream AI processing. Option D is wrong because AWS Key Management Service (AWS KMS) manages encryption keys for data at rest and in transit, but it does not have the capability to identify or remove PHI from text content.

Practice this question →

106

MCQhard

A developer deployed this guardrail to block sensitive topics and sexual content. However, the model still generates responses about a specific sensitive topic that is not in the TopicPolicy. What should the developer do to prevent this?

A.Add a SensitiveInformationPolicy to filter PII

B.Increase the InputStrength of the content filter to MAX

C.Change the TopicPolicy Type from DENY to ALLOW

D.Add the specific topic to the TopicPolicy list

AnswerD

Adding the topic to the TopicPolicy with Type DENY will block it.

Why this answer

The guardrail's TopicPolicy only blocks the defined topic 'sensitive-topic'. To block additional topics, add them to the list. Option A (change type) would allow.

Option B (SensitiveInformationPolicy) is for PII. Option C (increase strength) does not add topics.

Practice this question →

107

MCQhard

A team is using Amazon Bedrock to generate images from text prompts. The generated images often contain artifacts and do not match the prompt description. Which combination of steps should the team take to improve image quality?

A.Fine-tune the model using SageMaker Ground Truth and increase the training epochs.

B.Increase the max token count and use a larger model variant.

C.Refine the prompt with more descriptive language and adjust the CFG scale and inference steps.

D.Use a different foundation model and increase the image resolution.

AnswerC

Better prompts and tuning inference parameters directly improve image quality.

Why this answer

Option C is correct because refining the prompt with more descriptive language helps the model better interpret the user's intent, while adjusting the CFG (Classifier-Free Guidance) scale controls how strictly the model adheres to the prompt, and increasing inference steps allows the diffusion process to produce higher-quality, artifact-free images. These are standard hyperparameters in diffusion-based image generation models on Amazon Bedrock, directly addressing both artifacts and prompt mismatch.

Exam trap

AWS often tests the misconception that image quality issues are best solved by model retraining or changing the model, rather than by adjusting inference-time parameters like CFG scale and inference steps, which are the immediate and correct levers for prompt adherence and artifact reduction.

How to eliminate wrong answers

Option A is wrong because fine-tuning a model using SageMaker Ground Truth and increasing training epochs is a data labeling and retraining approach that is overkill and not directly applicable to improving inference-time image quality for a pre-trained Bedrock model; it also does not address prompt adherence or artifact reduction. Option B is wrong because increasing the max token count and using a larger model variant does not fix artifacts or prompt mismatch—max token count affects text generation length, not image quality, and a larger model may not inherently improve prompt alignment without prompt engineering. Option D is wrong because using a different foundation model and increasing image resolution may change output characteristics but does not systematically address artifacts or prompt mismatch; higher resolution can even amplify artifacts if the underlying generation process is not optimized.

Practice this question →

108

MCQeasy

A data scientist is using Amazon SageMaker to train a large language model from scratch. Which AWS service is most suitable for managing the training infrastructure, including automatic scaling and spot instance recovery?

A.AWS Lambda function.

B.Amazon SageMaker Notebook instance.

C.Amazon SageMaker Training job.

D.Amazon EC2 with a custom setup.

AnswerC

SageMaker Training manages infrastructure, automatically recovers from spot interruptions, and scales.

Why this answer

Amazon SageMaker Training jobs are the most suitable service for managing training infrastructure because they provide built-in automatic scaling, managed spot instance recovery, and distributed training orchestration. This allows the data scientist to focus on model development rather than provisioning and managing EC2 instances, load balancers, or recovery scripts.

Exam trap

Cisco often tests the distinction between managed services (SageMaker Training) and unmanaged services (EC2 custom setup), where candidates mistakenly choose EC2 thinking they need full control, overlooking SageMaker's built-in spot recovery and scaling capabilities.

How to eliminate wrong answers

Option A is wrong because AWS Lambda functions are serverless compute services designed for short-running, event-driven tasks (max 15-minute execution time) and cannot manage long-running training jobs or infrastructure scaling. Option B is wrong because Amazon SageMaker Notebook instances are interactive development environments for prototyping and exploration, not designed to manage production training infrastructure or handle automatic scaling and spot instance recovery. Option D is wrong because Amazon EC2 with a custom setup requires manual provisioning, configuration of auto-scaling groups, and custom scripts for spot instance interruption handling, which is less efficient and more error-prone than SageMaker's managed training service.

Practice this question →

109

MCQmedium

A company uses Amazon Bedrock to generate marketing content. They want to reduce costs while maintaining response quality. Which action is most effective?

A.Fine-tune a larger model to improve accuracy and reduce retries.

B.Increase the temperature parameter to get shorter responses.

C.Select a smaller foundation model that still meets accuracy requirements.

D.Cache previous responses to reuse for similar prompts.

AnswerC

Smaller models have lower per-token costs and are faster.

Why this answer

Option D is correct because selecting a smaller, efficient foundation model can reduce cost per token while maintaining quality for simple tasks. Option A is wrong because increasing temperature does not reduce cost. Option B is wrong because caching may not be effective for variable outputs.

Option C is wrong because fine-tuning increases cost.

Practice this question →

110

MCQhard

A healthcare organization wants to use generative AI to draft clinical notes from patient-physician conversations. They must comply with HIPAA and minimize false medical information. Which approach should they take?

A.Use Amazon SageMaker JumpStart with a publicly available clinical model and no additional modifications.

B.Use a generic open-source LLM hosted on Amazon EC2 with manual prompt engineering.

C.Use Amazon Bedrock with a HIPAA-eligible foundation model and connect it to a medical knowledge base via RAG.

D.Use Amazon Bedrock with a large foundation model and a high temperature setting for creativity.

AnswerC

Ensures compliance and accuracy through grounding on trusted medical sources.

Why this answer

Option A is correct because Amazon Bedrock offers HIPAA-eligible models and allows grounding with medical knowledge bases to reduce hallucinations. Option B is wrong because it does not use grounding. Option C is wrong because open-source LLMs may not be HIPAA-compliant.

Option D is wrong because increasing temperature introduces more randomness, worsening accuracy.

Practice this question →

111

MCQeasy

A company uses Amazon Bedrock Agents to build an agent that interacts with users through a chat interface. The agent is configured with a knowledge base containing product documentation. Sometimes the agent fails to answer simple questions like 'What is your return policy?' and instead says it cannot find the answer. The knowledge base does contain the return policy. What is the most likely reason?

A.Increase the agent's maximum timeout for processing

B.Use a more powerful foundation model for reasoning

C.Add more documents to the knowledge base

D.Simplify and clarify the agent's instruction prompt to emphasize knowledge base usage

AnswerD

A clear prompt instructing the agent to consult the knowledge base for all answers can dramatically improve consistency.

Why this answer

The agent's instruction prompt might be too complex or not explicitly directing the agent to use the knowledge base. Simplifying the prompt to clearly instruct the agent to first search the knowledge base can resolve the issue. Increasing timeout or adding more data is unnecessary.

A stronger model may help but is not the root cause.

Practice this question →

112

MCQmedium

A media company runs batch inference jobs to generate captions for thousands of images weekly using a foundation model on Amazon Bedrock. They want to minimize costs while maintaining predictable throughput. Which pricing option should they choose?

A.SageMaker Batch Transform

B.On-demand inference

C.Provisioned Throughput

D.Spot instances (EC2 Spot)

AnswerC

Reserves capacity for a model, providing consistent performance and lower per-token cost for large batches.

Why this answer

Provisioned Throughput reserves capacity for a specific model, offering predictable performance and cost savings for steady workloads. On-demand is pay-per-use but may be costlier for high volume. Batch Transform is for SageMaker, not Bedrock.

Spot instances are not available for Bedrock.

Practice this question →

113

Multi-Selectmedium

Which TWO strategies can help reduce inference costs when using Amazon Bedrock? (Select TWO.)

Select 2 answers

A.Use a higher temperature setting to generate fewer tokens

B.Increase the max tokens to allow longer responses

C.Use provisioned throughput for high-volume, predictable workloads

D.Cache frequently used responses in Amazon ElastiCache

E.Select a smaller foundation model variant

AnswersC, E

Provisioned throughput offers a discounted hourly rate compared to on-demand per-request pricing.

Why this answer

Using provisioned throughput for predictable workloads reduces per-request cost. Choosing a smaller model variant requires less compute. Caching responses is not directly supported, and increasing max tokens increases cost.

Practice this question →

114

Multi-Selecthard

A research team is using Amazon SageMaker to fine-tune a large language model. They want to optimize training cost and time without sacrificing model quality. Which THREE strategies should they implement? (Choose 3)

Select 3 answers

A.Use a larger instance type with more GPUs.

B.Apply parameter-efficient fine-tuning (PEFT) techniques like LoRA.

C.Increase the batch size to the maximum that fits in GPU memory.

D.Use managed spot training with checkpointing.

E.Enable mixed precision training (FP16).

AnswersB, D, E

LoRA fine-tunes a small subset of parameters, reducing compute and memory.

Why this answer

Option B is correct because Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA (Low-Rank Adaptation) freeze the pre-trained model weights and inject trainable rank decomposition matrices into specific layers. This drastically reduces the number of trainable parameters (often by 10,000x), lowering memory and compute requirements while preserving model quality, making it ideal for cost- and time-sensitive fine-tuning.

Exam trap

Cisco often tests the misconception that simply scaling up hardware (larger instances) or maximizing batch size is the best optimization strategy, when in fact algorithmic efficiency (PEFT, mixed precision) and cost-saving infrastructure (spot instances) are the correct approaches for balancing cost, time, and quality.

Practice this question →

115

MCQeasy

Refer to the exhibit. A developer wants to choose a model that can generate text (not just embeddings) and has the lowest cost. Based on the exhibit, which model should they select?

A.Titan Embed Text

B.Titan Text Express

C.Titan Text Lite

D.Need more information

AnswerC

Titan Text Lite is a text generation model and is the most cost-effective option among those listed.

Why this answer

Option A, Titan Text Lite, is a text generation model and is the lighter, cheaper option compared to Express. Titan Text Express is more expensive. Titan Embed Text is for embeddings, not text generation.

Therefore, Titan Text Lite is correct.

Practice this question →

116

MCQhard

A research team needs to generate high-quality images with Amazon Bedrock that are realistic and consistent with a specific artistic style. Which combination of parameters should they use?

A.Use a CFG (classifier-free guidance) scale and include a style prompt

B.High temperature, low top_p

C.Low temperature, high top_p

D.Increase the number of steps and reduce the number of samples

AnswerA

CFG scale controls how closely the image follows the prompt; a style prompt (e.g., 'in the style of Monet') ensures artistic consistency.

Why this answer

Option D is correct because for image generation models like Stable Diffusion XL, the CFG (classifier-free guidance) scale controls adherence to the prompt, and a style prompt can enforce artistic consistency. Option A and B (temperature, top_p) are for text models. Option C (increasing steps) improves quality but not style consistency.

Practice this question →

117

Multi-Selecteasy

Which THREE factors should be considered when selecting a foundation model for a text generation task? (Select three.)

Select 3 answers

A.Context window length

B.Inference latency

C.Model license

D.Number of parameters

E.AWS Region availability

AnswersA, C, D

Determines the maximum input size, critical for long documents or conversations.

Why this answer

Options A (Context window length), B (Number of parameters), and D (Model license) are key selection criteria. Context window affects input length, parameters affect capability, license affects usage rights. Option C (Inference latency) is operational but often considered after selection.

Option E (AWS Region availability) is relevant for deployment but not model selection.

Practice this question →