CCNA Fundamentals of Generative AI Questions

75 of 117 questions · Page 1/2 · Fundamentals of Generative AI · Answers revealed

1
MCQhard

A data scientist is deploying a fine-tuned Mistral model on Amazon Bedrock. After deployment, inference latency is too high for real-time applications. Which configuration change can reduce latency without significantly impacting output quality?

A.Reduce max tokens from 1024 to 256
B.Switch to a larger model variant
C.Decrease the top-p to 0.5
D.Increase the temperature to 0.9
AnswerA

Generating fewer tokens speeds up inference, and for many use cases 256 tokens is sufficient.

Why this answer

Reducing the max tokens limit decreases the number of generated tokens, directly reducing latency. Lowering temperature or using a larger model may not help or may degrade quality.

2
Multi-Selectmedium

Which THREE steps are typically involved in fine-tuning a foundation model? (Select THREE.)

Select 3 answers
A.Deploy the model immediately without additional training
B.Prepare a labeled dataset specific to the target domain
C.Train the model on the domain dataset with a lower learning rate
D.Select a pre-trained foundation model as the starting point
E.Choose a model architecture with more parameters than the base model
AnswersB, C, D

Fine-tuning requires annotated data that reflects the desired task.

Why this answer

Fine-tuning involves preparing a labeled dataset, selecting a pre-trained base model, and training it further on the domain data. Deploying without tuning is not fine-tuning, and selecting a model with more parameters is not a step.

3
MCQhard

Refer to the exhibit. A developer runs the CLI command to summarize text using Claude v2 in Bedrock. The output is shorter than expected. Which change should the developer make to allow a longer response?

A.Increase 'max_tokens_to_sample' to 1000
B.Change the prompt to include 'Write a long summary'
C.Set 'stop_reason' to 'none'
D.Use a different region like us-west-2
AnswerA

This parameter directly controls the maximum number of tokens in the generated response.

Why this answer

The 'max_tokens_to_sample' parameter is set to 200, which limits output length. Increasing it allows longer responses. The prompt or region change does not affect length.

4
MCQmedium

A developer is using the Amazon Bedrock API to generate text. They notice that the model sometimes returns harmful content despite setting safety parameters. What is the BEST way to add an additional layer of content filtering?

A.Fine-tune the model on a curated safe dataset
B.Configure content filters in Amazon Bedrock Guardrails
C.Improve prompt engineering with more specific instructions
D.Use AWS WAF to filter API responses
AnswerB

Guardrails provide configurable content filters to block harmful content.

Why this answer

Amazon Bedrock Guardrails provides a dedicated, configurable content filtering layer that can block harmful content at inference time, independent of the model's built-in safety parameters. This allows developers to enforce custom policies (e.g., hate speech, violence) without modifying the model itself, making it the best additional safeguard.

Exam trap

Cisco often tests the misconception that fine-tuning or prompt engineering alone can fully prevent harmful outputs, when in fact a separate, configurable guardrail layer is the recommended approach for production-grade content filtering in Amazon Bedrock.

How to eliminate wrong answers

Option A is wrong because fine-tuning the model on a curated safe dataset adjusts the model's weights to reduce harmful outputs, but it does not guarantee filtering of all harmful content at inference and requires significant retraining effort; it is not an 'additional layer' but a model modification. Option C is wrong because improving prompt engineering with more specific instructions can guide the model's behavior but cannot reliably block harmful content that the model might generate despite instructions, as it lacks enforcement at the API response level. Option D is wrong because AWS WAF is a web application firewall designed to filter HTTP requests to web applications, not to inspect or filter the content of API responses from Bedrock; it operates at the network layer, not the application content layer.

5
MCQmedium

A media company uses a foundation model on Amazon Bedrock to generate article summaries. The model occasionally omits important details. Which prompt engineering technique is most likely to improve completeness?

A.Use a lower temperature setting
B.Increase the max tokens limit
C.Include a list of required key points in the prompt
D.Add 'Be concise' to the prompt
AnswerC

Specifying required content guides the model to include those details, improving completeness.

Why this answer

Providing a structured output format (e.g., bullet points, required sections) helps the model cover all aspects, reducing omissions.

6
MCQmedium

A company is building a chatbot using Amazon Bedrock. They want to ensure the model's responses are grounded in their internal knowledge base and avoid generating information outside that scope. Which feature should they use?

A.Amazon Bedrock Knowledge Bases
B.Agents for Amazon Bedrock
C.Model Evaluation on Amazon Bedrock
D.Guardrails for Amazon Bedrock
AnswerA

Knowledge Bases enable RAG by connecting FM to private data, grounding responses.

Why this answer

Amazon Bedrock Knowledge Bases is the correct feature because it allows you to connect a foundation model (FM) to your internal data sources, such as documents or databases, and use Retrieval Augmented Generation (RAG) to ground responses in that specific knowledge. This ensures the chatbot only generates information from the provided knowledge base, preventing hallucinations or out-of-scope content.

Exam trap

Cisco often tests the distinction between features that control content (Guardrails) versus features that provide source data (Knowledge Bases), leading candidates to mistakenly choose Guardrails when the question is about grounding responses in internal data.

How to eliminate wrong answers

Option B is wrong because Agents for Amazon Bedrock are designed to orchestrate multi-step tasks and interact with external APIs, not to restrict the model's responses to a specific knowledge base. Option C is wrong because Model Evaluation on Amazon Bedrock is used to assess model performance and safety, not to control the source of information for responses. Option D is wrong because Guardrails for Amazon Bedrock enforce content policies (e.g., filtering harmful or off-topic content) but do not ground responses in a specific internal knowledge base.

7
MCQhard

A company operates in a region where Amazon Bedrock is not available. They want to use generative AI but must keep data within the country. Which solution should they consider?

A.Use Amazon SageMaker to host an open-source model in the local region.
B.Wait for Bedrock to become available in their region; there is no alternative.
C.Use Amazon Bedrock in the nearest available region with cross-region inference.
D.Use an API from a third-party generative AI provider with AWS PrivateLink.
AnswerA

SageMaker is available in all regions and allows full control over data residency.

Why this answer

Option A is correct because Amazon SageMaker can deploy models in any AWS region, including those without Bedrock, and can use custom models with data staying in-region. Option B is wrong because cross-region inference sends data outside the country. Option C is wrong because using a third-party model outside AWS may not comply.

Option D is wrong because there is no region-constrained Bedrock offering.

8
MCQeasy

A company is using Amazon Bedrock to generate marketing copy. They want to ensure the model's responses are factually accurate and grounded in their proprietary knowledge base. Which feature should they use?

A.Model customization
B.Fine-tuning
C.Retrieval Augmented Generation (RAG)
D.Prompt engineering
AnswerC

RAG retrieves relevant documents from the knowledge base and includes them in the prompt, enabling factually grounded responses.

Why this answer

Option B, Retrieval Augmented Generation (RAG), retrieves relevant information from the company's knowledge base to ground the model's responses, improving factual accuracy. Option A (Model customization) tailors the model's behavior but does not necessarily ground responses in real-time data. Option C (Prompt engineering) relies on crafting prompts, which may not guarantee factual accuracy.

Option D (Fine-tuning) updates model weights but may not incorporate up-to-date knowledge.

9
MCQhard

A data scientist fine-tuned a large language model on Amazon SageMaker for financial report generation. The model produces responses that are too short and incomplete, often cutting off mid-sentence. What parameter should be adjusted first?

A.Increase the temperature parameter
B.Increase the top_p parameter
C.Increase the maximum token count
D.Switch to a different foundation model
AnswerC

Max tokens sets a hard limit on the number of tokens generated; raising it allows longer responses.

Why this answer

The max tokens parameter limits the length of generated responses. Increasing it allows the model to produce longer completions. Temperature, top_p, and model change affect quality or diversity, but not the length cap.

10
MCQhard

A research lab is using Amazon SageMaker to fine-tune a large language model (LLM) for scientific text summarization. The training dataset contains 10 million documents, and the lab has a limited budget but needs to minimize training time. They have access to SageMaker Training with managed spot instances, which offer significant cost savings but are interruptible. The team is considering different training strategies to balance cost, time, and model quality. Which strategy should they use?

A.Use SageMaker's distributed training with data parallelism on multiple managed spot instances, and enable checkpointing.
B.Fine-tune only the last few layers of the model on a smaller subset of the data.
C.Use a single on-demand instance to avoid interruptions and maximize throughput.
D.Use a single large GPU instance to train the model from scratch.
AnswerA

Distributed training speeds up processing, spot instances reduce cost, and checkpointing handles interruptions.

Why this answer

Option B is correct. Using SageMaker's distributed training with data parallelism on multiple spot instances, combined with checkpointing, maximizes throughput while managing interruptions. Spot instances reduce cost, and checkpointing allows resuming from failures.

Option A is incorrect because training from scratch on a single GPU is extremely slow and expensive. Option C is incorrect because on-demand instances are costly and do not optimize budget. Option D is incorrect because fine-tuning only the last few layers on a subset reduces model quality and does not effectively use the full dataset.

11
MCQeasy

Which AWS service provides a serverless experience for building and scaling generative AI applications with access to various foundation models?

A.Amazon Bedrock
B.Amazon SageMaker
C.Amazon Lex
D.AWS Lambda
AnswerA

Bedrock provides a serverless experience with pre-trained foundation models from leading AI companies.

Why this answer

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models through a single API, without managing the underlying infrastructure.

12
MCQmedium

A company wants to use Amazon Bedrock to generate images from text descriptions. Which model should they use?

A.Amazon Titan Image Generator
B.Stable Diffusion XL
C.Amazon Titan Text
D.Amazon Polly
AnswerA

Titan Image Generator is an AWS-managed model for generating images from text.

Why this answer

Option B is correct because Amazon Titan Image Generator is designed for image generation, and it is fully managed by AWS. Option A (Stable Diffusion XL) is also available but not AWS-native. Option C (Titan Text) is for text.

Option D (Polly) is for speech.

13
MCQhard

A financial services company is deploying a generative AI model on Amazon SageMaker for real-time fraud detection. The model, a fine-tuned Llama 2 7B, must respond to transaction requests within 500 milliseconds. The team has deployed the model using a SageMaker real-time endpoint with a single ml.g5.2xlarge instance. During load testing, the endpoint achieves an average latency of 450 ms at 10 requests per second (RPS), but the latency spikes to over 2 seconds at 20 RPS. The team needs to maintain sub-500 ms latency at up to 50 RPS. The model is too large to fit on a single GPU, so they are using CPU instances. They considered using a larger instance type but want to minimize cost. What should the team do to meet the latency requirement cost-effectively?

A.Upgrade to a single ml.g5.4xlarge instance
B.Attach an Amazon Elastic Inference accelerator to the existing instance
C.Use a SageMaker multi-model endpoint with multiple ml.g5.xlarge instances and auto scaling
D.Use SageMaker Serverless Inference to automatically scale
AnswerC

Distributing load across smaller instances reduces cost and meets latency via scaling.

Why this answer

Option C is correct because a SageMaker multi-model endpoint (MME) allows multiple model replicas to be hosted on a fleet of instances, enabling horizontal scaling to handle increased throughput. By using multiple ml.g5.xlarge instances with auto scaling, the team can distribute the 50 RPS load across several instances, keeping per-instance latency low while minimizing cost compared to a single larger instance. This approach also leverages the fact that the model is too large for a single GPU but can be efficiently served on CPU instances with proper load distribution.

Exam trap

The trap here is that candidates assume a larger single instance (Option A) is the simplest solution, but they overlook the cost-efficiency and scalability benefits of horizontal scaling with a multi-model endpoint, which is specifically designed for high-throughput, low-latency inference with models that don't fit on a single GPU.

How to eliminate wrong answers

Option A is wrong because upgrading to a single ml.g5.4xlarge instance provides more vCPUs and memory but does not address the fundamental bottleneck of a single instance handling 50 RPS; latency would still spike due to sequential processing limits. Option B is wrong because Amazon Elastic Inference (EI) accelerators are designed for low-latency GPU-based inference and are not compatible with CPU-only instances; they also cannot help if the model does not fit on a single GPU. Option D is wrong because SageMaker Serverless Inference has a maximum concurrency limit and cold start latency that can exceed 500 ms, making it unsuitable for real-time fraud detection with strict sub-500 ms latency requirements.

14
MCQeasy

A developer invoked an Amazon Bedrock model and received this output. What does the stopReason field indicate?

A.A content filter blocked the output
B.The input prompt was too long
C.The model reached the maximum token limit set in the request
D.The model reached a natural stopping point
AnswerC

The stopReason 'max_tokens' explicitly indicates the output was truncated due to the token limit.

Why this answer

Option A is correct because stopReason 'max_tokens' means the model stopped because it reached the maximum token limit specified in the request. Option B (natural stop) would be 'stop' or 'end_turn'. Option C (input too long) would be a different error.

Option D (content filter) would be 'content_filtered'.

15
Multi-Selectmedium

Which TWO AWS services can be used to build a chatbot that responds to customer inquiries using a company's documentation as source? (Select two.)

Select 2 answers
A.Amazon Bedrock with RAG
B.Amazon Polly
C.Amazon Q Business
D.Amazon Transcribe
E.Amazon Lex
AnswersA, C

Bedrock with RAG can retrieve from documentation and generate answers using foundation models.

Why this answer

Options B (Amazon Bedrock with RAG) and C (Amazon Q Business) are correct. Bedrock with RAG retrieves from the documentation to generate answers, and Amazon Q Business is a conversational service that can use enterprise data. Option A (Amazon Lex) can build chatbots but requires integration for documentation retrieval.

Option D (Amazon Polly) is text-to-speech. Option E (Amazon Transcribe) is speech-to-text.

16
MCQmedium

A user has this IAM policy and attempts to invoke the model in the us-west-2 region. They receive an AccessDenied error. What is the reason?

A.The resource ARN is malformed
B.The action 'bedrock:InvokeModel' is incorrect
C.The model ID is case-sensitive
D.The policy does not allow the us-west-2 region
AnswerD

The ARN's region is us-east-1, so the policy does not grant access in us-west-2.

Why this answer

The resource ARN specifies us-east-1, so the policy grants access only in that region. Option B (action) is correct but not the issue. Option C (case sensitivity) is not relevant.

Option D (malformed) is false.

17
MCQeasy

A developer runs this AWS CLI command to invoke a model in us-west-2 but receives an error: 'An error occurred (ModelNotFoundException) when calling the InvokeModel operation: Model not found'. What is the most likely cause?

A.The request body is not properly formatted
B.The --region parameter is missing from the command
C.The model is not available in the us-west-2 region
D.The user's IAM role lacks permissions to invoke the model
AnswerC

Claude v2 is available only in certain regions like us-east-1.

Why this answer

The model anthropic.claude-v2 is not available in us-west-2; it is available in us-east-1. Option A (permissions) would give AccessDenied. Option B (body format) would give validation error.

Option D (region missing) is not the issue as region is specified.

18
MCQmedium

A developer is using the Amazon Bedrock InvokeModel API with the above request to summarize meeting notes. The response is a single word repeated many times. Which parameter is MOST likely causing this issue?

A.topP set to 0.9
B.stopSequences is empty
C.maxTokenCount set to 100
D.temperature set to 0
AnswerD

Temperature 0 makes output deterministic and prone to repetition.

Why this answer

A temperature of 0 forces the model to always select the highest-probability token at each step, which can lead to repetitive loops if the most likely token repeatedly points back to itself (e.g., the same word). This deterministic behavior eliminates randomness, causing the model to get stuck in a single-word cycle rather than generating diverse or coherent text.

Exam trap

AWS often tests the misconception that temperature only affects 'creativity' or 'randomness,' when in fact a temperature of 0 causes deterministic argmax selection, which can paradoxically produce repetitive or stuck outputs rather than simply 'less creative' text.

How to eliminate wrong answers

Option A is wrong because topP set to 0.9 (nucleus sampling) actually increases diversity by considering tokens whose cumulative probability reaches 0.9, which would reduce repetition, not cause it. Option B is wrong because an empty stopSequences list means no custom stopping conditions are applied, but this does not force repetition; the model would still generate until a natural stop (e.g., EOS token) or maxTokenCount is reached. Option C is wrong because maxTokenCount set to 100 only limits the total number of tokens generated; it does not influence token selection probability or cause a single word to repeat—it would simply stop after 100 tokens regardless of content.

19
MCQmedium

A company wants to personalize its generative AI model for its specific domain without sharing data with third-party model providers. Which method should they use?

A.Fine-tuning the foundation model on their proprietary data
B.Prompt engineering with domain-specific examples
C.Retrieval-augmented generation (RAG) with a domain-specific knowledge base
D.Model distillation using a larger foundation model
AnswerA

Fine-tuning adapts the model to the domain using private data, and Bedrock supports this.

Why this answer

Amazon Bedrock allows fine-tuning of foundation models with customer data in a secure environment. Option A (Prompt engineering) doesn't personalize deeply. Option B (RAG) adds context but doesn't modify model.

Option C (Distillation) requires an existing model.

20
MCQmedium

A media company is using Amazon Bedrock to generate captions for images. They have a batch processing pipeline that sends thousands of images daily to the Bedrock API using the Titan Image Generator G1 model. Recently, they started receiving ThrottlingException errors during peak hours. The team needs to process all images within 24 hours without changing the model or the application code. The current account has a default quota of 10 requests per second (RPS) for the Titan model in us-east-1. The team estimates they need 50 RPS during peak hours. They have already implemented exponential backoff in the client, but the errors persist. What is the MOST effective solution to resolve the throttling issue?

A.Request a service quota increase for the InvokeModel API for the Titan model in us-east-1
B.Use Amazon SageMaker batch transform to process images offline
C.Distribute the requests across multiple AWS Regions
D.Switch to a different foundation model that has a higher default quota
AnswerA

Increasing quota directly resolves throttling.

Why this answer

The team has already implemented exponential backoff, but the errors persist because their current quota of 10 RPS is insufficient for the required 50 RPS. Requesting a service quota increase for the InvokeModel API for the Titan Image Generator G1 model in us-east-1 directly addresses the root cause by raising the throughput limit, allowing the existing application code and model to handle the peak load without any architectural changes.

Exam trap

The trap here is that candidates may think exponential backoff or distributing across Regions solves all throttling, but the core issue is a hard service quota that must be increased, not a transient rate limit.

How to eliminate wrong answers

Option B is wrong because Amazon SageMaker batch transform is designed for offline inference on SageMaker endpoints, not for invoking Bedrock APIs; it would require changing the application code and infrastructure, which the question explicitly prohibits. Option C is wrong because distributing requests across multiple AWS Regions would require modifying the application code to route traffic to different endpoints, and it does not address the underlying quota issue in the primary region; it also introduces latency and complexity. Option D is wrong because switching to a different foundation model would require changing the application code and potentially the image generation logic, which is not allowed; moreover, other models may have different default quotas or capabilities, and the goal is to process images with the Titan model.

21
MCQeasy

A developer is building an application that generates product descriptions from images using a multimodal model. Which AWS service provides access to multimodal foundation models?

A.Amazon Rekognition
B.Amazon Textract
C.Amazon Comprehend
D.Amazon Bedrock
AnswerD

Bedrock provides access to foundation models, including multimodal models that can generate text from images.

Why this answer

Option B, Amazon Bedrock, offers access to multimodal models like Claude 3 that can process images and text. Option A (Rekognition) is for image and video analysis, not generation. Option C (Textract) extracts text from documents.

Option D (Comprehend) is for NLP.

22
MCQmedium

An application uses this configuration to enable RAG. What is required for the knowledge base to function?

A.The agent must have internet access to retrieve documents
B.The embedding model ARN must include the account ID
C.The embedding model must be fine-tuned on the domain data
D.The knowledge base must have a vector index configured in Amazon OpenSearch Serverless
AnswerD

A vector store is required to index and query the embedded documents.

Why this answer

Option A is correct because a vector index (e.g., in Amazon OpenSearch Serverless) is necessary to store and retrieve embeddings. Option B (fine-tune embedding model) is optional. Option C (internet access) is not needed.

Option D (account ID in ARN) is incorrect format.

23
MCQmedium

A company deployed a chatbot using Amazon Lex integrated with a Lambda function that invokes Claude on Amazon Bedrock. The Lambda function retrieves relevant documents from an Amazon Kendra index to use as context. Users report that the chatbot's responses are often irrelevant or incorrect despite the Kendra index containing accurate information. The logs show that the Lambda function is correctly passing retrieved documents to the model. What is the most likely cause and solution?

A.Switch to a larger foundation model like Claude 3 Opus
B.The model's temperature is set too high; reduce it to 0.1
C.The maximum tokens limit is too low; increase it to 4096
D.The chunking strategy for documents is too coarse or inappropriate; refine chunking and use semantic search in Kendra
AnswerD

Proper chunking ensures each chunk contains coherent information relevant to potential queries; Kendra's semantic search improves relevance.

Why this answer

The issue likely stems from the chunking and retrieval strategy. If the retrieved document chunks do not contain the exact answer or are poorly segmented, the model may not have the necessary context. Improving chunking to be more semantic and ensuring retrieval uses a relevant similarity metric (e.g., using Kendra's relevance tuning) would help.

Increasing temperature or reducing tokens would degrade quality. Switching model may not address the root cause.

24
Multi-Selectmedium

A company is building a generative AI application using Amazon Bedrock and needs to ensure that the model does not generate outputs containing personally identifiable information (PII). Which TWO actions should the company take? (Choose 2)

Select 2 answers
A.Implement a custom AWS Lambda function to scan and redact PII from inputs and outputs.
B.Use AWS Identity and Access Management (IAM) policies to restrict model access.
C.Enable Amazon CloudWatch Logs to capture and audit model outputs.
D.Configure Amazon Bedrock Guardrails to block or mask PII.
E.Place the Bedrock model endpoint within a private VPC.
AnswersA, D

Lambda can use PII detection libraries to filter sensitive data.

Why this answer

Option A is correct because a custom AWS Lambda function can be integrated into the application workflow to programmatically scan and redact PII from both inputs and outputs before they reach or leave the Bedrock model. This provides a flexible, code-driven approach to data sanitization, allowing the use of libraries like Amazon Comprehend or regex patterns to detect and mask PII entities such as names, addresses, and social security numbers.

Exam trap

Cisco often tests the distinction between network-level security controls (like VPCs) and content-level data protection mechanisms, leading candidates to mistakenly choose VPC isolation as a solution for PII redaction.

25
MCQhard

A company wants to use a large language model to generate code based on natural language descriptions. They need to minimize latency and control costs by running inference on their own infrastructure. Which approach is most suitable?

A.Use Amazon Bedrock API
B.Use Amazon SageMaker to deploy a custom LLM
C.Use Amazon Comprehend
D.Use Amazon Lex
AnswerB

SageMaker can deploy models on customer-specified instances, giving control over latency and cost.

Why this answer

Option B, using Amazon SageMaker to deploy a custom LLM, allows the company to run inference on their own infrastructure with controlled costs and latency. Option A (Amazon Bedrock) is a managed service and does not run on customer infrastructure. Option C (Amazon Lex) is for conversational bots, not code generation.

Option D (Amazon Comprehend) is for NLP tasks like sentiment analysis, not code generation.

26
MCQeasy

A startup wants to quickly prototype a generative AI application for summarizing news articles. They have limited ML expertise and want minimal infrastructure management. Which AWS service should they use?

A.Amazon Bedrock with a foundation model accessed via API.
B.Amazon SageMaker to build and train a custom summarization model.
C.AWS Lambda with a custom Python script using the Hugging Face Transformers library.
D.Amazon EC2 instance running a pre-trained model from AWS Marketplace.
AnswerA

Serverless, fully managed, and easy to use for prototyping.

Why this answer

Option C is correct because Amazon Bedrock provides serverless access to foundation models via API, requiring no ML infrastructure. Option A is wrong because Amazon SageMaker requires managing training jobs and endpoints. Option B is wrong because AWS Lambda is a compute service, not a generative AI service.

Option D is wrong because Amazon EC2 requires manual setup of models.

27
Multi-Selectmedium

A company is deploying a generative AI model on Amazon Bedrock and needs to monitor for potential misuse. Which THREE measures should they implement? (Choose 3)

Select 3 answers
A.Require multi-factor authentication (MFA) for all API calls.
B.Configure Amazon Bedrock Guardrails to block harmful content.
C.Use AWS CloudTrail to log API calls and Amazon Bedrock actions.
D.Place the Bedrock endpoint in a private VPC with no internet access.
E.Enable model invocation logging in Amazon CloudWatch.
AnswersB, C, E

Guardrails proactively filter inputs and outputs.

Why this answer

Option B is correct because Amazon Bedrock Guardrails provides configurable content filters that can block harmful or undesirable content in both input prompts and model responses. This is a direct monitoring and prevention mechanism for misuse, allowing administrators to define policies for topics, toxicity, and sensitive information.

Exam trap

Cisco often tests the distinction between security controls that prevent access (like MFA or VPC isolation) versus monitoring controls that detect or block misuse at the content level, leading candidates to confuse network security with content safety.

28
MCQmedium

A company is building a chatbot using Amazon Bedrock and wants to ensure that the model generates responses consistent with its brand voice. Which technique should be used to provide the model with examples of desired responses without fine-tuning the model?

A.Fine-tune the model on a dataset of brand-compliant conversations.
B.Use prompt chaining to break down the conversation into multiple steps.
C.Implement a Retrieval Augmented Generation (RAG) system with brand documents.
D.Include few-shot examples in the system prompt to demonstrate the desired tone.
AnswerD

In-context learning via few-shot examples guides model behavior without retraining.

Why this answer

Option D is correct because few-shot prompting allows you to provide the model with examples of desired responses directly in the system prompt, guiding the model's tone and style without modifying its underlying weights. This technique is ideal for brand voice consistency when fine-tuning is not an option, as it leverages in-context learning to influence output behavior.

Exam trap

AWS often tests the distinction between in-context learning (few-shot prompting) and fine-tuning, trapping candidates who confuse RAG (which retrieves facts) with style guidance, or who think prompt chaining is for tone control rather than task decomposition.

How to eliminate wrong answers

Option A is wrong because fine-tuning requires modifying the model's weights, which contradicts the requirement of not fine-tuning the model. Option B is wrong because prompt chaining is a technique for decomposing complex tasks into sequential steps, not for providing examples of desired tone or style. Option C is wrong because Retrieval Augmented Generation (RAG) retrieves external knowledge from documents to ground responses in facts, but it does not inherently teach the model the specific tone or brand voice; it augments context, not style.

29
MCQhard

A financial services company is subject to strict regulatory requirements. They plan to use generative AI to summarize customer interaction logs. Which combination of AWS services and configurations best ensures compliance while maintaining accuracy?

A.Deploy an open-source model on Amazon Bedrock in a local on-premises server.
B.Use Amazon Bedrock with a foundation model and public internet access without encryption.
C.Use Amazon SageMaker to host a fine-tuned model with a public API key.
D.Use Amazon Bedrock with a private VPC endpoint, AWS KMS encryption, and content filtering.
AnswerD

This configuration meets regulatory requirements for data privacy and content safety.

Why this answer

Option D is correct because Amazon Bedrock with a private VPC endpoint and data encryption at rest and in transit ensures data sovereignty, and using a foundation model that supports content filtering reduces risk of non-compliant outputs. Option A is wrong because Bedrock does not support local on-premises deployment. Option B is wrong because SageMaker alone does not provide built-in content filtering.

Option C is wrong because using a public API key violates security policies.

30
Multi-Selecthard

Which TWO factors are most important when selecting a foundation model for a sentiment analysis task? (Choose 2)

Select 2 answers
A.Composition of the model's training data (domain, language)
B.API pricing per invocation
C.Inference latency
D.Model size (number of parameters)
E.Color scheme of the model's documentation
AnswersA, D

Training data must cover the target domain and language for accurate sentiment analysis.

Why this answer

Model size and training data composition directly impact performance on sentiment. API pricing (C) is a business factor but less critical for model selection; latency (E) is also a factor but not as fundamental as size and data quality.

31
MCQmedium

A retail company wants to generate product descriptions from catalog data. The data includes structured attributes (e.g., price, brand) and unstructured reviews. The team needs to ensure factual accuracy. Which approach is most appropriate?

A.Use prompt engineering with few-shot examples
B.Fine-tune a foundation model on the entire product catalog
C.Deploy a larger foundation model with more parameters
D.Implement Retrieval-Augmented Generation (RAG) with a knowledge base
AnswerD

RAG retrieves relevant product data at inference time, ensuring factual accuracy and allowing updates without retraining.

Why this answer

Retrieval-Augmented Generation (RAG) retrieves relevant documents (product attributes, reviews) and provides them as context to the model, reducing hallucinations and grounding responses in facts.

32
MCQeasy

What is a foundation model?

A.A model that only works with tabular data
B.A model that requires no additional tuning for new tasks
C.A model trained on diverse data that can be adapted to many tasks
D.A model that is specifically trained for one task, like image classification
AnswerC

This defines a foundation model: large-scale, pre-trained, adaptable.

Why this answer

A foundation model is a large-scale neural network trained on vast amounts of data, which can be adapted to various downstream tasks through fine-tuning or prompt engineering.

33
MCQmedium

A company wants to automate the extraction of key information from customer support tickets using generative AI. They have a small labeled dataset. Which approach would be most effective?

A.Fine-tune a foundation model on the labeled data
B.Use zero-shot prompting with a foundation model
C.Train a custom model from scratch
D.Use Amazon Comprehend custom entity recognition
AnswerA

Fine-tuning adapts the model to the task using the labeled data, improving accuracy with limited samples.

Why this answer

Option C, fine-tuning a foundation model on the labeled data, is most effective with a small dataset as it adapts the model to the specific task without needing massive data. Option A (training from scratch) requires large datasets. Option B (zero-shot) may not be accurate enough.

Option D (Comprehend custom entities) is a traditional approach that may also work but fine-tuning often yields better results with generative AI.

34
MCQmedium

A company deployed a question-answering system using Amazon Bedrock with a knowledge base (RAG). Users report that the model often hallucinates facts not in the knowledge base. What is the most effective way to reduce hallucinations?

A.Reduce the maximum context length to limit model input
B.Fine-tune the foundation model on a large general corpus
C.Improve the relevance of retrieved documents by refining the retrieval strategy
D.Increase the chunk size of documents in the knowledge base
AnswerC

Better retrieval ensures only pertinent information is provided, reducing the chance of hallucination.

Why this answer

Improving retrieval relevance ensures that the model receives accurate and contextually relevant information, reducing its reliance on parametric knowledge. Increasing chunk size or context may include irrelevant data. Fine-tuning alone may not fix hallucination if the model still lacks specific facts.

35
Multi-Selectmedium

A company is deploying a customer-facing chatbot using Amazon Bedrock. They want to reduce the risk of generating biased or harmful responses. Which TWO measures should they implement? (Choose 2.)

Select 2 answers
A.Implement a human-in-the-loop review for sensitive replies
B.Train the model exclusively on historical customer conversations
C.Use guardrails to filter content
D.Set the temperature parameter to 1.5
E.Disable logging to improve performance
AnswersA, C

Human reviewers can catch subtle biases that automated filters miss.

Why this answer

Options A and C are correct. Guardrails filter biased/harmful content, and human-in-the-loop review catches nuanced issues. Option B (training on historical conversations) may reinforce existing biases.

Option D (high temperature) increases randomness and potential harm. Option E (disabling logging) reduces ability to audit and improve.

36
MCQmedium

A startup is building an AI-powered code assistant using a large language model (LLM). They want to ensure the model generates syntactically correct code and avoids security vulnerabilities. Which technique should they prioritize?

A.Augment prompts with few-shot examples of secure coding practices and unit tests
B.Deploy the model with max tokens set to 4096
C.Fine-tune the model on a large corpus of open-source code
D.Use chain-of-thought prompting to explain reasoning before code generation
AnswerA

Providing examples of secure code and expected test results helps ground the model's output in desired patterns.

Why this answer

Contextual grounding by providing code examples and security guidelines in the prompt (prompt engineering) helps guide the model to produce safe and correct code. Fine-tuning on secure codebases would also help but is more resource-intensive; prompt engineering is a quicker first step.

37
MCQmedium

A financial services company wants to generate personalized investment recommendations using a large language model via Amazon Bedrock. They have customer data that includes risk tolerance, portfolio holdings, and financial goals. The company is highly concerned about data privacy and must avoid exposing sensitive personally identifiable information (PII) to the model. They plan to use a foundation model to generate recommendations based on customer profiles. What is the best approach to protect customer privacy while still enabling personalization?

A.Fine-tune the model on a large dataset of investment recommendations without any customer-specific data.
B.Use prompt engineering to instruct the model to disregard any personally identifiable information.
C.Preprocess the customer data to replace sensitive fields with placeholders, then use the processed data in the prompt.
D.Include the customer data directly in the prompt and rely on the model to anonymize it.
AnswerC

This reduces privacy risk by removing PII while retaining relevant non-sensitive features for personalization.

Why this answer

Option C is correct. Preprocessing customer data to replace sensitive fields with placeholders (e.g., using synthetic IDs) allows the model to generate personalized recommendations without accessing real PII. This minimizes risk.

Option A is incorrect because relying on the model to anonymize data is unreliable and may still leak PII. Option B is incorrect because prompt engineering instructions are not a robust privacy control. Option D is incorrect because fine-tuning on generic data does not produce personalized recommendations for individual customers.

38
MCQhard

A bank is using Amazon Bedrock to summarize customer support transcripts. The summaries often contain factual inaccuracies (hallucinations). Which approach is most effective for reducing hallucinations?

A.Decrease the top-p to 0.1
B.Increase the model's temperature to make outputs more diverse
C.Fine-tune a smaller model on a large dataset of transcripts
D.Implement RAG by grounding summarization on retrieved transcripts
AnswerD

RAG provides specific context from the original transcript, aligning summarization with facts.

Why this answer

Retrieval-Augmented Generation (RAG) grounds the model's output on retrieved transcripts, reducing the chance of fabricating details. Fine-tuning on transcripts may reinforce patterns but does not guarantee factual accuracy at inference time.

39
MCQmedium

A machine learning engineer notices that a generative AI model occasionally produces biased outputs. Which AWS feature can automatically filter harmful content before it reaches users?

A.Amazon CloudWatch alarms
B.Amazon SageMaker Clarify
C.AWS Identity and Access Management (IAM) policies
D.Amazon Bedrock Guardrails
AnswerD

Guardrails allow configuring filters for harmful content, topics, and PII.

Why this answer

Amazon Bedrock Guardrails provide content filtering and topic control. Option A (IAM) is for permissions. Option C (CloudWatch) is for monitoring.

Option D (SageMaker Clarify) is for bias detection, not real-time filtering.

40
MCQhard

A machine learning team is fine-tuning a foundation model using Amazon SageMaker. They need to optimize training time and cost. Which approach should they take?

A.Use a larger instance type with more vCPUs
B.Increase the batch size to the maximum possible
C.Use the full model weights and train on a single GPU
D.Use Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA
AnswerD

PEFT reduces memory and time by updating only a small number of parameters.

Why this answer

Option B is correct because Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA only update a small subset of parameters, significantly reducing compute requirements. Option A (full model weights on single GPU) is slow and expensive. Option C (maximum batch size) may cause out-of-memory errors.

Option D (larger instance) increases cost without necessarily improving efficiency.

41
MCQeasy

A startup wants to integrate a generative AI chatbot into their mobile app with minimal latency. Which AWS service is purpose-built for deploying foundation models with low latency and high throughput?

A.AWS Lambda
B.Amazon SageMaker
C.Amazon Bedrock
D.Amazon Transcribe
AnswerC

Bedrock provides managed endpoints with low latency for foundation models.

Why this answer

Amazon Bedrock offers low-latency inference endpoints. Option A (SageMaker) is for training and inference but not purpose-built for foundation models. Option C (Lambda) is for serverless compute.

Option D (Transcribe) is for speech.

42
Multi-Selecteasy

Which TWO actions are best practices for reducing hallucinations in generative AI models? (Choose 2)

Select 2 answers
A.Increase the model size
B.Fine-tune the model on proprietary data
C.Use retrieval-augmented generation (RAG)
D.Use a smaller model to limit complexity
E.Apply prompt engineering with clear instructions and constraints
AnswersC, E

RAG grounds responses in retrieved documents, improving factual accuracy.

Why this answer

RAG provides factual grounding, and prompt engineering with clear constraints reduces hallucinations. Increasing model size (A) may not reduce hallucinations. Fine-tuning (C) can help but is not a direct best practice for all cases; RAG and prompt engineering are more effective.

Using a smaller model (E) may increase hallucinations.

43
MCQhard

A company operates a customer support chatbot that uses Amazon Bedrock with a knowledge base sourced from an S3 bucket containing frequently updated product documentation. The knowledge base uses OpenSearch Serverless as the vector store and is configured to sync daily. The chatbot uses the RetrieveAndGenerate API with a custom Lambda function that applies a system prompt instructing the model to base answers solely on the retrieved context. After a major update to the product documentation, the IT team verifies that the data source sync completed successfully and the new chunks are present in the OpenSearch index. However, the chatbot continues to respond with outdated information. Further investigation reveals that the Lambda function includes a response caching mechanism using Amazon ElastiCache for Redis with a Time-To-Live (TTL) of 24 hours. The cache key is based on the user query. The team notes that no cache invalidation is performed after documentation updates. What is the most likely cause of the outdated responses?

A.The ElastiCache cache is returning stale cached responses that contain the old information.
B.The 'maximum results' parameter in the RetrieveAndGenerate API is set to a value too low to retrieve the new chunks.
C.The embedding model used by the knowledge base has not been retrained on the new documentation.
D.The IAM role for the Lambda function lacks permissions to access the new S3 objects.
AnswerA

Correct. The cache is not invalidated on document updates, so identical queries return cached old responses.

Why this answer

Since the data source sync succeeded and the index contains new chunks, the retrieval should be able to access the latest data. However, the Lambda function caches responses keyed by query. With a 24-hour TTL and no invalidation, the cache returns stale responses containing the old safety information.

Clearing the cache or reducing TTL would resolve the issue. The other options are less likely: low max results might cause missing new chunks but would not consistently return old info; the embedding model is not retrained per sync; IAM permissions would affect sync, not retrieval.

44
MCQhard

A data scientist is unable to invoke the Claude v2 model from an EC2 instance with IP 10.0.1.5. What is the most likely reason?

A.The condition is too restrictive
B.The policy does not allow the ec2:AssociateAddress action
C.The resource ARN is incorrect because it should not include the account ID
D.The model is not available in us-east-1
AnswerC

Bedrock model ARNs omit the account ID; having it results in an invalid resource.

Why this answer

Option B is correct because the resource ARN is malformed. Amazon Bedrock model ARNs do not include the account ID; the correct format is arn:aws:bedrock:region::foundation-model/anthropic.claude-v2. Option A is not an action.

Option C is incorrect because the IP is within the allowed range. Option D is incorrect because the model is available in us-east-1.

45
MCQeasy

A company wants to use a pre-trained generative AI model to analyze customer feedback. They need to adjust the model for their specific domain without retraining from scratch. Which approach is MOST suitable?

A.Fine-tuning the model on domain-specific data
B.Reinforcement Learning from Human Feedback (RLHF)
C.Training a new model from scratch on the domain data
D.Using prompt engineering to provide context
AnswerA

Fine-tuning is efficient for domain adaptation using pre-trained models.

Why this answer

Fine-tuning is the most suitable approach because it takes a pre-trained generative AI model and updates its weights using a smaller, domain-specific dataset (e.g., customer feedback transcripts). This allows the model to adapt to the company's specific terminology, sentiment patterns, and context without the massive computational cost and data requirements of training from scratch. It preserves the general language understanding from pre-training while specializing the model for the target domain.

Exam trap

Cisco often tests the distinction between prompt engineering (a zero-shot or few-shot method that does not modify the model) and fine-tuning (which updates model weights), leading candidates to mistakenly choose prompt engineering as a simpler but insufficient solution for deep domain adaptation.

How to eliminate wrong answers

Option B (RLHF) is wrong because RLHF is a technique used to align model outputs with human preferences through reward modeling, not primarily for domain adaptation; it requires a separate reward model and human feedback loop, making it overkill and less direct for simply specializing on domain-specific data. Option C (training a new model from scratch) is wrong because it discards the benefits of pre-training, requiring enormous amounts of domain data and compute resources, which contradicts the requirement to avoid retraining from scratch. Option D (prompt engineering) is wrong because while it can provide context at inference time, it does not adjust the model's internal weights or permanently adapt it to the domain; it relies on the model's existing knowledge and may fail for nuanced or rare domain-specific terms.

46
MCQeasy

A company is building a customer service chatbot using Amazon Bedrock. Which component of a foundation model determines the creativity and randomness of the generated responses?

A.Prompt template
B.Temperature
C.Max tokens
D.Top-p
AnswerB

Temperature scales the logits before softmax, controlling randomness. Lower values make outputs more deterministic.

Why this answer

The temperature parameter controls randomness. Higher values (e.g., >1) produce more creative but less focused outputs, while lower values (e.g., near 0) produce more deterministic responses.

47
MCQhard

A healthcare startup is using Amazon Bedrock to generate clinical notes. They must prevent the model from outputting any personally identifiable information (PII) such as patient names. What is the most effective approach?

A.Fine-tune the model on de-identified data only
B.Configure a guardrail in Amazon Bedrock to deny PII topics
C.Use a prompt engineering technique to instruct the model to avoid PII
D.Post-process the output with a regex filter
AnswerB

Guardrails provide robust content filtering that can detect and block PII, making this the most effective approach.

Why this answer

Option B is correct because Amazon Bedrock guardrails provide content filtering that can deny PII topics and block sensitive information. Option A (prompt engineering) can be bypassed. Option C (fine-tuning on de-identified data) is costly and not guaranteed.

Option D (regex post-processing) is brittle and incomplete.

48
MCQhard

A company is building a multi-step AI agent using Amazon Bedrock Agents to automate a complex business process that requires memory across interactions. The agent needs to remember user preferences and previous steps. Which approach best maintains state across sessions?

A.Store state in AWS Lambda environment variables
B.Use Amazon DynamoDB tables managed by Lambda functions
C.Implement state management using AWS Step Functions
D.Leverage Amazon Bedrock Agents' built-in memory feature
AnswerD

Bedrock Agents provide session memory out-of-the-box, automatically persisting context across turns.

Why this answer

Amazon Bedrock Agents have built-in memory management (session memory) that automatically persists context across interactions, using a backend like DynamoDB. Lambda would require custom state management. Step Functions orchestrate but don't inherently store memory.

DynamoDB alone lacks the agent logic.

49
MCQmedium

A company is building a customer support chatbot using Amazon Bedrock. They have a large corpus of internal documentation and want to provide accurate answers without retraining the model. Which approach should they use?

A.Use advanced prompt engineering with a generic foundation model.
B.Use Retrieval Augmented Generation (RAG) with Amazon Bedrock Knowledge Base.
C.Use a pre-trained foundation model without customization.
D.Fine-tune the model on their documentation using Amazon SageMaker.
AnswerB

RAG retrieves relevant documents at inference time, providing accurate answers from internal data without retraining.

Why this answer

Option C is correct because Retrieval Augmented Generation (RAG) with Bedrock Knowledge Base allows the model to retrieve relevant documents from internal sources and generate grounded answers, avoiding the need for fine-tuning. Option A is wrong because a pre-trained model alone lacks domain knowledge. Option B is wrong because fine-tuning requires labeled data and is more costly.

Option D is wrong because prompt engineering alone cannot incorporate proprietary data effectively.

50
MCQhard

A startup is fine-tuning a large language model (LLM) for code generation using Amazon SageMaker. They are using a p4d.24xlarge instance with a single GPU. The training process is extremely slow, taking over 48 hours for one epoch. The dataset is 10GB of code snippets. The company needs to iterate quickly. Which action would most significantly reduce training time without sacrificing model quality?

A.Enable distributed training using SageMaker’s data parallelism library across multiple GPUs
B.Switch to spot instances to reduce cost, not time
C.Increase the batch size to use GPU memory more efficiently
D.Use a smaller foundation model to reduce compute per step
AnswerA

Distributed training scales across GPUs/nodes, significantly speeding up training while preserving model size.

Why this answer

Distributed training across multiple GPUs and instances dramatically reduces time by parallelizing the workload. Increasing instance count or using a smaller model helps but may not be optimal. Spot instances could be unstable.

Data parallelism is a standard technique for large models.

51
Multi-Selecteasy

Which TWO of the following are key advantages of using Amazon Bedrock for building generative AI applications?

Select 2 answers
A.Automatic optimization of prompts for all models without user intervention.
B.Ability to fine-tune models using your own data without managing underlying infrastructure.
C.Eliminates the need for any data preprocessing before model invocation.
D.Guaranteed identical outputs from all models for the same prompt.
E.Access to multiple foundation models from different providers via a single API.
AnswersB, E

Correct. Bedrock provides managed fine-tuning capabilities, abstracting infrastructure.

Why this answer

Amazon Bedrock provides access to multiple foundation models from different providers via a single API, and it allows you to fine-tune models using your own data without managing infrastructure. Automatic prompt optimization is not a built-in feature; model outputs are not guaranteed to be identical; and data preprocessing is still required.

52
Multi-Selectmedium

A data science team is evaluating foundation models for a code generation task. They need a model that is fine-tuned for code and can be deployed on Amazon SageMaker. Which THREE criteria are important to consider when selecting a model?

Select 3 answers
A.Licensing and usage terms
B.Cost per token for inference
C.Context window length
D.The training algorithm used
E.Model size and architecture
AnswersA, C, E

Must be compatible with the deployment and business use.

Why this answer

Option A is correct because licensing and usage terms directly impact whether a foundation model can be legally used for commercial code generation. Models like Code Llama or StarCoder have specific licenses (e.g., Llama 2 Community License, OpenRAIL-M) that may restrict fine-tuning, redistribution, or use in proprietary products. Ignoring these terms could lead to compliance violations when deploying on Amazon SageMaker.

Exam trap

Cisco often tests the distinction between model selection criteria (licensing, context window, architecture) and operational concerns (cost, training internals), tempting candidates to pick cost per token or training algorithm as relevant when they are not primary factors for selecting a fine-tuned model for deployment.

53
MCQhard

A developer is using Amazon Bedrock's Converse API to build a multi-turn conversation. They notice the model forgets earlier context after a few exchanges. What is the most likely cause?

A.The API has a rate limit that truncates history
B.The model's context window is too small for the conversation
C.The model's maximum output length is set too low
D.The developer is not sending the previous messages in each request
AnswerD

The Converse API requires the client to maintain and send conversation history.

Why this answer

The API requires passing the message history explicitly; if not, context is lost. Option A (Rate limits) affect throughput. Option B (Model capacity) is not typical.

Option D (Output length) truncates responses.

54
Multi-Selectmedium

Which THREE are key capabilities of Amazon Bedrock? (Choose 3)

Select 3 answers
A.Automatic model selection based on use case
B.Model customization through fine-tuning
C.Guardrails to filter harmful content
D.Serverless inference for foundation models
E.Built-in vector database for knowledge bases
AnswersB, C, D

Bedrock supports fine-tuning for Amazon Titan and other models.

Why this answer

Bedrock offers serverless inference (A), model customization (B), and guardrails for content filtering (D). Built-in vector database (C) is not a Bedrock capability; Bedrock integrates with external vector stores. Auto model selection (E) is not available; users choose models explicitly.

55
MCQeasy

Which AWS service provides a fully managed experience for building generative AI applications with a variety of foundation models through a unified API?

A.AWS Lambda
B.Amazon SageMaker
C.Amazon Rekognition
D.Amazon Bedrock
AnswerD

Amazon Bedrock provides a unified API to access and manage foundation models from various providers.

Why this answer

Option D is correct because Amazon Bedrock is a fully managed service that offers a choice of foundation models from Amazon and other providers via a single API. Option A (SageMaker) is a broader ML platform. Option B (Lambda) is serverless compute.

Option C (Rekognition) is for image/video analysis.

56
MCQhard

A team is using Amazon Bedrock with a Claude model and wants to ensure responses adhere to a specific output format such as JSON. Which technique should be applied?

A.Use a retrieval-augmented generation (RAG) approach
B.Attach a guardrail with a JSON schema
C.Include a system prompt with explicit formatting instructions
D.Customize the model with a JSON training dataset
AnswerC

A system prompt can instruct the model to output JSON or other formats.

Why this answer

System prompt instructions can enforce output format. Option A (Guardrails) is for safety. Option B (Retrieval augmentation) is for knowledge.

Option D (Model customization) is for fine-tuning.

57
MCQeasy

A company wants to generate product descriptions from a few keywords without managing infrastructure. Which AWS service provides a serverless API for accessing foundation models?

A.Amazon Lex
B.Amazon SageMaker
C.Amazon Bedrock
D.Amazon Comprehend
AnswerC

Bedrock provides a serverless API to invoke foundation models directly.

Why this answer

Amazon Bedrock offers a serverless API to invoke foundation models. Option A (SageMaker) is for training and deployment, not serverless API. Option C (Comprehend) is for NLP analysis, not generative.

Option D (Lex) is for conversational interfaces.

58
MCQhard

A company is building a chatbot that must provide accurate answers based on internal documents without retraining the model. Which approach should they use?

A.Reinforcement learning from human feedback (RLHF)
B.Fine-tuning the model on internal documents
C.Model distillation to a smaller model
D.Prompt engineering with retrieval-augmented generation (RAG)
AnswerD

RAG retrieves relevant documents at inference time, providing up-to-date answers.

Why this answer

RAG (Retrieval-Augmented Generation) retrieves relevant documents and augments the prompt, avoiding retraining. Option A (Fine-tuning) requires retraining. Option C (Distillation) reduces model size.

Option D (RLHF) is for aligning model behavior.

59
MCQhard

A data scientist is fine-tuning a large language model on Amazon SageMaker for a text summarization task. The training loss decreases steadily but the validation loss starts increasing after a few epochs. What should the scientist do to address this issue?

A.Reduce the batch size
B.Increase the learning rate
C.Increase the number of training epochs
D.Use early stopping based on validation loss
AnswerD

Early stopping prevents overfitting by halting training when validation loss stops improving.

Why this answer

The validation loss increasing while training loss decreases is a classic sign of overfitting. Early stopping based on validation loss halts training when the validation loss stops improving, preventing overfitting and saving computational resources. This is a standard technique in SageMaker's built-in training algorithms and custom training scripts.

Exam trap

Cisco often tests the distinction between overfitting and underfitting; the trap here is that candidates may mistakenly think increasing epochs (Option C) always improves performance, ignoring the validation loss divergence that signals overfitting.

How to eliminate wrong answers

Option A is wrong because reducing batch size introduces more noise into gradient estimates, which can actually worsen generalization and does not directly address overfitting. Option B is wrong because increasing the learning rate can cause the optimizer to overshoot minima, leading to divergence or unstable training, not reduced overfitting. Option C is wrong because increasing the number of training epochs would exacerbate overfitting, as the model would continue to memorize the training data beyond the point where validation loss degrades.

60
MCQhard

A developer attached this IAM policy to a role used by an application that invokes Claude v2 in us-east-1. The application receives an access denied error. What is the MOST likely cause?

A.The Allow statement does not include a condition on the region
B.The Deny statement is blocking requests because the condition does not match the resource ARN's region
C.The Deny statement uses StringNotEquals instead of StringEquals
D.The resource ARN in the Allow statement is incorrect
AnswerB

The Deny condition checks aws:RequestedRegion, which may differ from the region in the resource ARN if requests are made to a different region.

Why this answer

The Deny statement uses a `StringNotEquals` condition on `aws:RequestedRegion` set to `us-east-1`. This means the Deny applies to any request where the requested region is NOT `us-east-1`. Since the resource ARN in the Deny statement is `arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2`, the condition does not match the resource's region (the resource ARN itself is in us-east-1), but the Deny is triggered when the request is made to a different region, blocking the call.

The application is likely invoking the model from a region other than us-east-1, causing the Deny to take effect.

Exam trap

Cisco often tests the subtle interaction between Allow and Deny statements with condition operators, where candidates mistakenly think the Deny is blocking because of a region mismatch on the resource ARN itself, rather than understanding that the Deny's condition evaluates the request's region, not the resource's region.

How to eliminate wrong answers

Option A is wrong because the Allow statement does not need a region condition; the Allow grants access to the specific resource ARN, and the Deny is the one causing the issue. Option C is wrong because using `StringNotEquals` is correct for this pattern—it denies requests that are NOT in the specified region; `StringEquals` would deny only requests in us-east-1, which is not the intended behavior. Option D is wrong because the resource ARN in the Allow statement (`arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2`) is correct for Claude v2 in us-east-1; the error is not due to an incorrect ARN but due to the Deny statement's condition logic.

61
MCQhard

A company is building a generative AI application to personalize email marketing campaigns. They use Amazon Bedrock with Anthropic Claude 3 Sonnet. The system takes customer data (name, purchase history) from an Amazon DynamoDB table and generates a personalized email body. During testing, the team notices that some emails contain factually incorrect information, such as recommending products the customer never purchased. The DynamoDB table is queried correctly and the correct data is passed to the model. The prompts include the customer data as context. The team has already tried adjusting the temperature and top-p parameters, but the issue persists. They need to improve the factual accuracy of the generated emails without significantly increasing latency or cost. The application is currently deployed on a single AWS Lambda function that invokes Bedrock. The DynamoDB table is small (few thousand records). Which course of action should the team take?

A.Use a structured prompt that explicitly instructs the model to base its response only on the provided customer data, and request a JSON object as output with the email body as a field.
B.Switch to a larger model like Claude 3 Opus to improve accuracy.
C.Reduce the temperature to 0 to make the model fully deterministic.
D.Fine-tune the model on a dataset of correct email examples paired with customer data.
AnswerA

This makes the model strictly follow the context and reduces hallucinations.

Why this answer

Option B is correct because the issue is that the model is ignoring the provided context despite it being passed. Anthropic Claude supports prompt caching for repeated context, but the core problem is that the model is not using the context reliably. Using a deterministic response format with JSON mode and adding explicit instructions to base responses only on provided data can significantly improve accuracy.

Option A is wrong because fine-tuning would be overkill for a small dataset and may cause overfitting, plus it increases cost and latency. Option C is wrong because reducing temperature further may make outputs too repetitive but does not guarantee factual correctness. Option D is wrong because using a larger model would increase cost and latency without necessarily solving the context adherence issue.

62
MCQeasy

A developer is creating a generative AI application using Amazon Bedrock and needs to ensure that responses do not include toxic or harmful content. Which feature should be enabled?

A.Amazon CloudWatch Logs for prompt logging.
B.Amazon Virtual Private Cloud (VPC) for network isolation.
C.Amazon Bedrock Guardrails.
D.AWS Identity and Access Management (IAM) policies.
AnswerC

Guardrails enforce content policies, filter toxic content, and block denied topics.

Why this answer

Amazon Bedrock Guardrails is the correct feature because it is specifically designed to enforce content policies, filter toxic or harmful content, and block undesirable topics in generative AI responses. It provides configurable thresholds for hate, insults, sexual content, violence, and other harmful categories, ensuring compliance with safety requirements without modifying the underlying model.

Exam trap

The trap here is that candidates often confuse monitoring/logging services (CloudWatch) or security controls (VPC, IAM) with content safety features, not realizing that Bedrock Guardrails is the only option that directly filters toxic or harmful content at the application layer.

How to eliminate wrong answers

Option A is wrong because Amazon CloudWatch Logs for prompt logging captures and stores logs for monitoring and debugging, but it does not actively filter or block toxic content in responses. Option B is wrong because Amazon Virtual Private Cloud (VPC) provides network isolation and security at the infrastructure layer, but it has no mechanism to inspect or control the semantic content of AI-generated responses. Option D is wrong because AWS Identity and Access Management (IAM) policies control authentication and authorization for API calls, but they cannot enforce content safety rules or filter harmful language in model outputs.

63
Multi-Selecthard

Which TWO practices help ensure responsible AI when deploying generative AI applications? (Select TWO.)

Select 2 answers
A.Deploy the model without any content filters to maximize creativity
B.Increase model size to improve accuracy at the expense of interpretability
C.Use only synthetic data for training to avoid privacy issues
D.Implement guardrails to filter harmful or inappropriate content
E.Monitor the model's outputs for bias and drift over time
AnswersD, E

Guardrails like Amazon Bedrock Guardrails help enforce content policies and prevent harmful outputs.

Why this answer

Implementing guardrails (e.g., content filtering) and monitoring for bias are key responsible AI practices. Using diverse training data is important but not a deployment practice. Publicly deploying without safeguards is irresponsible.

64
MCQmedium

Refer to the exhibit. A company sets up a knowledge base for a customer support chatbot using Amazon Bedrock. Users report that the chatbot misses relevant details from long documents. Which change to the data source configuration would most likely improve retrieval?

A.Increase the chunk size in FIXED_SIZE chunking
B.Change chunking strategy to SEMANTIC
C.Add more documents to the S3 bucket
D.Change the embedding model to a larger one
AnswerB

Semantic chunking groups related content, preserving context and improving retrieval accuracy.

Why this answer

The chunking strategy is set to FIXED_SIZE, which may split documents into chunks that are too small or lose context. Switching to SEMANTIC chunking improves retrieval by grouping paragraphs with similar meaning.

65
MCQeasy

A company wants to build a generative AI application that can summarize customer support tickets. They need to ensure the model stays up-to-date with the latest product documentation without retraining. Which AWS service would best support this requirement?

A.Amazon Bedrock with Retrieval Augmented Generation (RAG)
B.Amazon Comprehend
C.Amazon Rekognition
D.Amazon SageMaker Ground Truth
AnswerA

Amazon Bedrock supports RAG, which enables the model to retrieve current information from a knowledge base, keeping summaries up-to-date without retraining.

Why this answer

Option D is correct because Amazon Bedrock with RAG allows the model to retrieve and incorporate up-to-date information from external sources without retraining. Option A (Amazon Comprehend) is for NLP but not generative summarization with live updates. Option B (Amazon Rekognition) is for image/video analysis.

Option C (Amazon SageMaker Ground Truth) is for data labeling.

66
MCQmedium

A financial services firm fine-tuned a generative AI model on Amazon SageMaker to summarize quarterly reports. The summaries often miss key financial metrics such as revenue and profit margins. The fine-tuning dataset contained full reports with summaries that included these metrics. The model appears to understand the reports but omits critical numbers. Which course of action would most likely improve the summaries?

A.Re-fine-tune using a carefully crafted dataset that includes explicit instructions to include key metrics and provides examples of correct summaries
B.Increase the maximum number of tokens in the summary
C.Switch to a different pre-trained model like Claude instead of the current one
D.Implement a post-processing Lambda function that extracts metrics from the original report and appends them to the summary
AnswerA

Better alignment through example prompts and targets teaches the model to focus on essential numbers.

Why this answer

The fine-tuning dataset likely lacks explicit instruction in the prompts to include specific metrics. Re-fine-tuning with examples that emphasize extracting and reporting numbers, or using a format that forces structured output, would help. Increasing length may include more text but not guarantee key metrics.

Changing model or post-processing won't fix the underlying training deficiency.

67
Multi-Selectmedium

A company is building a generative AI application to generate product descriptions from customer reviews. They want to use Amazon Bedrock to access a foundation model. Which TWO actions should the company take to ensure responsible AI practices?

Select 2 answers
A.Use a single foundation model without any customization to avoid bias.
B.Implement human review of all generated descriptions before publication.
C.Monitor and log model inputs and outputs for auditing.
D.Regularly evaluate model performance and fine-tune with diverse data.
E.Disable content filtering to allow maximum creativity.
AnswersB, C

Human review provides oversight to catch harmful or biased outputs.

Why this answer

Options A and C are correct. Implementing human review (A) ensures oversight and catches harmful outputs. Monitoring and logging (C) enables auditing and detection of misuse.

Option B is incorrect because using a single model does not automatically avoid bias; customization may be needed. Option D is incorrect because disabling content filtering increases risk of generating inappropriate content. Option E is plausible but not a requirement specific to responsible AI; evaluation is part of ongoing improvement but not the immediate action.

68
MCQmedium

Refer to the exhibit. A developer has attached this IAM policy to their user. When trying to invoke the Anthropic Claude v2 model using the Bedrock runtime, they receive an AccessDeniedException. Which change to the policy would resolve the issue?

A.Add the bedrock:InvokeModelWithResponseStream action
B.Change the Action to bedrock:ListFoundationModels
C.Change the Resource to arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2
D.Remove the Resource element and set Effect to Deny
AnswerC

Correct. This ARN matches the Claude v2 model, allowing invocation.

Why this answer

The policy grants access only to the Titan model resource. To invoke Claude v2, the resource must match the Claude model ARN. Adding other actions or removing the resource condition would not grant the correct access; listing models does not allow invocation.

69
MCQhard

A healthcare company wants to use generative AI to automatically generate patient summary reports from electronic health records (EHRs). The solution must be HIPAA compliant and data must not leave AWS. They plan to use Amazon Bedrock with a foundation model. The EHR data is stored in Amazon S3 and contains protected health information (PHI). Which approach best meets compliance requirements?

A.Use Amazon Bedrock with a HIPAA-eligible account, enable encryption with KMS, and de-identify PHI in the prompt
B.Use a publicly available foundation model API outside AWS for better accuracy
C.Use Amazon Comprehend Medical for entity extraction and then feed results into a model on Amazon Bedrock without de-identification
D.Use Amazon SageMaker with a public model from the internet without encryption
AnswerA

Bedrock is HIPAA-eligible when used with AWS Organizations and BAA; de-identification and KMS encryption protect PHI.

Why this answer

Amazon Bedrock operates within a HIPAA-eligible environment when configured appropriately, and using AWS KMS for encryption and not storing PHI in prompts (using de-identification) can maintain compliance. Using public models or non-HIPAA services would violate requirements. SageMaker with encryption can also be HIPAA-eligible, but Bedrock with proper settings is simpler.

70
MCQmedium

A company is using Amazon Bedrock to generate code snippets. They notice the model occasionally generates code that fails to compile. What is the most effective way to improve code quality without retraining?

A.Reduce the temperature parameter to 0 for deterministic output.
B.Increase the max token limit to allow the model to complete the code fully.
C.Fine-tune the model on a dataset of correct code snippets.
D.Use few-shot prompt engineering with correct code examples and formatting instructions.
AnswerD

Examples help the model understand the expected output and reduce errors.

Why this answer

Option B is correct because prompt engineering with examples and constraints can guide the model to produce more accurate code. Option A is wrong because reducing temperature increases determinism but doesn't guarantee correctness. Option C is wrong because fine-tuning is expensive and may overfit.

Option D is wrong because increasing max tokens may lead to more errors.

71
MCQeasy

A developer wants to test different prompt variations for a chatbot without making repeated API calls. Which Amazon Bedrock feature can help compare model responses?

A.Model evaluation on Amazon SageMaker
B.Amazon Bedrock Playground
C.AWS Security Token Service (STS)
D.Amazon CloudWatch Logs
AnswerB

The playground allows developers to test and compare prompts interactively.

Why this answer

Option D is correct because Amazon Bedrock Playground provides an interactive interface to experiment with prompts and compare outputs side by side. Option A (CloudWatch Logs) is for monitoring. Option B (Model evaluation on SageMaker) is for offline evaluation.

Option C (AWS STS) is for security tokens.

72
Multi-Selecteasy

Which TWO are benefits of using Amazon SageMaker JumpStart for foundation models? (Choose 2)

Select 2 answers
A.Built-in fine-tuning scripts and notebooks
B.No coding required to fine-tune models
C.Automatic scaling without any configuration
D.Pre-trained foundation models available in the catalog
E.Free unlimited usage for all models
AnswersA, D

JumpStart provides prebuilt notebooks and scripts for common fine-tuning tasks.

Why this answer

JumpStart provides pre-trained foundation models and built-in fine-tuning scripts, accelerating development. It does require some coding for customization. It offers many models but not unlimited free usage (charges apply for infrastructure).

Scaling is configurable but not fully automatic without setup.

73
MCQmedium

A company wants to build a customer support chatbot that answers questions based on a large internal knowledge base. Which AWS service is most suitable for implementing RAG to retrieve relevant documents?

A.Amazon Lex
B.Amazon Polly
C.Amazon Connect
D.Amazon Kendra
AnswerD

Kendra provides intelligent search and retrieval from indexed documents, ideal for RAG workflows.

Why this answer

Amazon Kendra is a highly accurate enterprise search service that can retrieve relevant documents from various sources, which can then be provided to a foundation model for generation. Lex, Connect, and Polly are not designed for document retrieval.

74
MCQeasy

A developer is building a customer-facing chatbot using Amazon Bedrock. To ensure the chatbot does not generate offensive or inappropriate content, which AWS feature should they implement?

A.AWS Identity and Access Management (IAM) policies
B.Amazon Bedrock Guardrails
C.Prompt engineering with system prompts
D.Increasing the model temperature parameter
AnswerB

Guardrails enable content filtering, topic control, and safety mechanisms for Bedrock models.

Why this answer

Amazon Bedrock Guardrails provide content filtering, allowing you to define policies to block harmful or inappropriate content. Prompt templates and temperature affect output style but not safety. IAM controls access but not content.

75
MCQmedium

A developer is building a chatbot using Amazon Bedrock and Claude. They notice that the model sometimes generates harmful or biased responses. Which AWS service can they use to implement guardrails?

A.AWS WAF
B.Amazon GuardDuty
C.AWS Shield
D.Amazon Bedrock Guardrails
AnswerD

Bedrock Guardrails allows you to define content filters and deny topics to moderate model responses.

Why this answer

Option C, Amazon Bedrock Guardrails, is the native service for adding content filters and safety controls to models in Bedrock. Option A (AWS WAF) is a web application firewall, not for model output. Option B (Amazon GuardDuty) is a threat detection service.

Option D (AWS Shield) protects against DDoS attacks.

Page 1 of 2 · 117 questions totalNext →

Ready to test yourself?

Try a timed practice session using only Fundamentals of Generative AI questions.