This chapter covers Amazon Bedrock, a fully managed service that provides access to foundation models (FMs) from leading AI companies via a single API, enabling developers to build generative AI applications without managing underlying infrastructure. For the DVA-C02 exam, understanding Bedrock's capabilities—such as model access, fine-tuning, knowledge bases, agents, and guardrails—is critical, as questions on generative AI and foundation models are increasingly common. While the exact percentage varies, expect 5-10% of exam questions to touch on Bedrock or related AI/ML services, especially in the context of developing and deploying AI-powered features.
Imagine a master chef (the developer) who wants to create a gourmet meal (an AI application) but doesn't have the time or expertise to grow the ingredients from scratch. Instead, the chef uses a platform like a high-end culinary marketplace (Amazon Bedrock). This marketplace provides pre-prepared, high-quality base ingredients (foundation models, or FMs) from various renowned suppliers (AI companies like Anthropic, Stability AI, etc.). The chef selects the perfect base ingredient (e.g., a pre-trained language model) and then customizes it by adding a secret sauce (fine-tuning with their own data) or combining it with other elements (RAG using a knowledge base). The platform handles all the kitchen logistics: storage (model hosting), temperature control (inference scaling), and cleanup (compliance and governance). The chef never needs to build a farm, raise livestock, or even know how to make the base ingredient from scratch. The platform also ensures that the chef's proprietary recipe (data) remains private and secure, and that the final dish can be served to thousands of customers (scalable inference) without the chef worrying about the underlying infrastructure. Furthermore, the platform provides a standard menu (model catalog) and allows the chef to create limited-time specials (custom models) that can be reused. The chef pays only for the ingredients and the kitchen time used (pay-per-use pricing), not for the entire restaurant infrastructure. In this analogy, the master chef is the AI developer, the culinary marketplace is Amazon Bedrock, the base ingredients are the foundation models, the secret sauce is fine-tuning, and the gourmet meal is the final AI application. The marketplace's value lies in abstraction, speed, and scale, which is exactly what Bedrock offers for AI development.
What is Amazon Bedrock and Why It Exists
Amazon Bedrock is a fully managed AWS service that provides access to a curated set of foundation models (FMs) from leading AI companies like Anthropic, Stability AI, AI21 Labs, Cohere, Meta, and Amazon itself. It was announced in preview in April 2023 and became generally available in September 2023. It addresses a key challenge for developers: building generative AI applications typically requires either training a model from scratch (costly and complex) or managing the hosting and inference of existing models (operational overhead). Bedrock abstracts away these complexities by offering a serverless experience where developers can discover, customize, and deploy FMs via a single API, paying only for what they use.
For the DVA-C02 exam, Bedrock is most relevant to Domain 1 (Development with AWS Services), which covers writing application code that calls AWS services. The exam tests your ability to integrate Bedrock into applications, understand its security model, and differentiate it from other AI/ML services like SageMaker or Lex.
How Bedrock Works Internally
Bedrock operates as a managed inference layer between the developer and the underlying foundation models. When you call the InvokeModel API, Bedrock routes your request to the appropriate FM hosted in a secure, multi-tenant environment. The request includes:
- modelId: The identifier of the FM (e.g., anthropic.claude-v2).
- contentType: Typically application/json.
- accept: The expected response format.
- body: The input payload, which varies by model (e.g., for Claude, it includes prompt, max_tokens_to_sample, temperature, etc.).
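The four request fields above can be assembled in a few lines. The following is a minimal Python sketch, assuming the Claude v2 text-completion body format; the helper name `build_claude_v2_request` and its default values are illustrative and not part of any SDK.

```python
import json


def build_claude_v2_request(prompt: str, max_tokens: int = 256, temperature: float = 0.5) -> dict:
    """Return keyword arguments suitable for bedrock-runtime's invoke_model call."""
    body = {
        # Claude v2's text-completion format wraps the user text in Human/Assistant turns.
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
        "temperature": temperature,
    }
    return {
        "modelId": "anthropic.claude-v2",
        "contentType": "application/json",
        "accept": "application/json",
        "body": json.dumps(body),  # the body travels as a JSON string
    }


request = build_claude_v2_request("Hello, how are you?")
print(request["modelId"])
# With boto3 installed and model access granted, the call would look like:
#   boto3.client("bedrock-runtime").invoke_model(**request)
```

Keeping request construction in a pure function makes the model-specific body format easy to unit test without calling AWS.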
Bedrock handles:
- Inference optimization: Automatic scaling of compute resources based on demand.
- Model versioning: You can pin to a specific version or use the latest.
- Security: Data is encrypted in transit (TLS) and at rest (AWS KMS). Bedrock does not use your data to improve the base models (unless you opt in).
- Monitoring: Integration with CloudWatch for metrics (invocations, latency, throttling) and CloudTrail for API logging.
Key Components, Values, Defaults, and Timers
- Model Catalog: A list of available FMs. Each model has a unique modelId (e.g., stability.stable-diffusion-xl-v0 for image generation). The catalog is region-specific; not all models are available in all regions.
- Inference Parameters: Common parameters include:
  - maxTokens (or max_tokens_to_sample): Maximum number of tokens in the response. Default varies by model (e.g., Claude v2 defaults to 256).
  - temperature: Controls randomness (0-1). Default 1.0 for many models.
  - topP: Nucleus sampling threshold (0-1). Default 1.0.
  - stopSequences: Array of strings that stop generation.
- Fine-tuning: You can customize a model using your own labeled data (JSONL format with prompt-completion pairs). Fine-tuning creates a custom model that you invoke through Provisioned Throughput or, where supported, on-demand. Fine-tuning is available only for select models (e.g., Amazon Titan, Cohere Command, Meta Llama 2).
- Provisioned Throughput: For guaranteed performance, you can purchase provisioned throughput in model units. Each model unit provides a baseline of requests per minute (RPM) and tokens per minute (TPM). Pricing is per-hour per-model-unit.
- Guardrails: A feature that allows you to define content filters (e.g., block hate speech) and topic policies (e.g., prevent the model from discussing certain topics). Guardrails are evaluated before and after model inference.
- Knowledge Bases: A managed RAG (Retrieval-Augmented Generation) solution. You point Bedrock to an S3 bucket containing documents, and Bedrock automatically chunks, embeds, and indexes them into a vector store (e.g., Amazon OpenSearch Serverless). The RetrieveAndGenerate API then retrieves relevant chunks and passes them to the FM with the user query.
- Agents: Bedrock Agents enable you to create AI agents that can orchestrate multi-step tasks, call Lambda functions, and use knowledge bases. Agents are built using a prompt template and can be configured with action groups.
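To make the agent and action-group idea more concrete, here is a hedged sketch of a Lambda handler that an agent might call. It assumes the event and response shape used for action groups defined with an API schema; the `orderId` parameter, the `/orders/{orderId}` path used in the usage note, and the in-memory `ORDERS` table are hypothetical.

```python
import json

# Hypothetical in-memory stand-in for an order table (e.g. DynamoDB in a real system).
ORDERS = {"1001": "SHIPPED", "1002": "PROCESSING"}


def lambda_handler(event, context):
    """Handle an agent action-group call that looks up an order status."""
    # Parameters arrive as a list of {"name", "type", "value"} entries.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    order_id = params.get("orderId")  # "orderId" is an assumed parameter name
    status = ORDERS.get(order_id)

    if status is None:
        code, payload = 404, {"error": f"order {order_id} not found"}
    else:
        code, payload = 200, {"orderId": order_id, "status": status}

    # Echo the routing fields back so the agent can match the response to its request.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": code,
            "responseBody": {"application/json": {"body": json.dumps(payload)}},
        },
    }
```

An agent configured with a matching API schema would send an event with `apiPath` set to something like `/orders/{orderId}`; the handler itself does not depend on the exact path.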
Configuration and Verification Commands
To invoke a model using the AWS CLI:
aws bedrock-runtime invoke-model \
--model-id anthropic.claude-v2 \
--body '{"prompt":"\n\nHuman: Hello, how are you?\n\nAssistant:","max_tokens_to_sample":256,"temperature":0.5}' \
--cli-binary-format raw-in-base64-out \
invoke-model-output.txt
To list available models:
aws bedrock list-foundation-models
To create a guardrail:
aws bedrock create-guardrail \
--name my-guardrail \
--blocked-input-messaging "Input blocked" \
--blocked-outputs-messaging "Output blocked" \
--content-policy-config '{"filtersConfig":[{"type":"SEXUAL","inputStrength":"HIGH","outputStrength":"HIGH"}]}'
Interaction with Related Technologies
AWS Lambda: Bedrock Agents can invoke Lambda functions as part of an action group. This allows the agent to perform actions like querying a database or sending an email.
Amazon S3: Used to store training data for fine-tuning and documents for knowledge bases.
Amazon OpenSearch Serverless: Default vector store for knowledge bases. Bedrock automatically creates and manages the index.
AWS KMS: For encryption of data at rest (e.g., fine-tuned models, knowledge base data).
CloudWatch: For logging and monitoring inference requests.
Exam-Relevant Details
Bedrock is serverless; you do not manage instances.
Fine-tuning is asynchronous; you check status via GetModelCustomizationJob.
Provisioned throughput is regional and model-specific.
Guardrails are evaluated synchronously and can block input or output.
Knowledge bases use Amazon Titan Embeddings by default for vectorization.
Bedrock supports streaming responses via InvokeModelWithResponseStream.
The maximum input token limit varies by model (e.g., Claude v2 supports up to 100k tokens).
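As an illustration of the streaming point above, the sketch below collects text from a response stream. It assumes each event carries a `chunk` whose `bytes` hold a JSON document with a `completion` field, as Claude v2 does; other models use different field names. The events here are simulated so the example runs offline.

```python
import json
from typing import Iterable, Iterator


def iter_completion_text(events: Iterable[dict]) -> Iterator[str]:
    """Yield text fragments from a response stream's events."""
    for event in events:
        chunk = event.get("chunk")
        if chunk is None:
            continue  # skip any event that carries no chunk
        payload = json.loads(chunk["bytes"])
        # "completion" is the field used by Claude v2; other models name it differently.
        yield payload.get("completion", "")


# Simulated events standing in for response["body"] from invoke_model_with_response_stream.
fake_events = [
    {"chunk": {"bytes": json.dumps({"completion": "Hello"}).encode()}},
    {"chunk": {"bytes": json.dumps({"completion": ", world"}).encode()}},
]
print("".join(iter_completion_text(fake_events)))
```

In an application, each yielded fragment would be forwarded to the user interface as it arrives instead of being joined at the end.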
Select a Foundation Model
The developer starts by browsing the Bedrock model catalog via the AWS Console, CLI, or SDK. They select a model based on task requirements (text generation, image generation, embeddings, etc.), pricing, and latency. Each model has a unique `modelId` (e.g., `anthropic.claude-instant-v1`). The developer must ensure the model is available in their AWS region. For the exam, remember that not all models are in every region; check `list-foundation-models` for availability.
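A regional availability check can be scripted. The sketch below takes a `bedrock` control-plane client as an argument so it can be exercised with a stub; the helper name `text_model_ids` is hypothetical, and the `byOutputModality` and `byProvider` filters are used on the assumption that the installed boto3 version supports them.

```python
from typing import List, Optional


def text_model_ids(bedrock_client, provider: Optional[str] = None) -> List[str]:
    """Return sorted IDs of text-output models visible in the client's region."""
    kwargs = {"byOutputModality": "TEXT"}
    if provider:
        kwargs["byProvider"] = provider
    response = bedrock_client.list_foundation_models(**kwargs)
    return sorted(m["modelId"] for m in response["modelSummaries"])


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   print(text_model_ids(boto3.client("bedrock", region_name="us-east-1")))
```

Running the same function against clients for two regions and comparing the results shows which models are missing from one of them.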
Configure Inference Parameters
The developer sets parameters like `temperature`, `topP`, `maxTokens`, and `stopSequences` in the request body. These control the randomness and length of the output. For example, a low temperature (0.1) makes output more deterministic, suitable for factual Q&A; a high temperature (0.9) produces more creative responses. The exam may ask which parameter to adjust to reduce repetitiveness (increase temperature) or to make output more focused (lower temperature or lower `topP`).
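To see why these two parameters change the output, the following sketch applies the textbook form of temperature scaling and nucleus (top-p) filtering to a toy set of next-token scores. It illustrates the general technique only; how each model provider implements sampling internally is not specified here, and the token scores are made up.

```python
import math
from typing import Dict


def next_token_distribution(logits: Dict[str, float], temperature: float = 1.0,
                            top_p: float = 1.0) -> Dict[str, float]:
    """Apply temperature scaling, then nucleus (top-p) filtering, to raw scores."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    # Temperature divides the scores: values below 1 sharpen the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}  # stable softmax
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Nucleus sampling keeps the smallest set of top tokens whose mass reaches top_p.
    kept, mass = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    return {tok: p / mass for tok, p in kept.items()}  # renormalise the survivors


logits = {"Paris": 3.0, "Lyon": 1.5, "Rome": 0.5}  # made-up scores
for t in (0.1, 0.9):
    dist = next_token_distribution(logits, temperature=t)
    print(t, {tok: round(p, 3) for tok, p in dist.items()})
```

At the lower temperature almost all probability lands on the top-scoring token, while lowering `top_p` removes unlikely tokens from consideration entirely.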
Invoke the Model
The developer calls the `InvokeModel` API (or `InvokeModelWithResponseStream` for streaming). The request includes `modelId`, `body` (the prompt and parameters), and optional headers like `contentType`. Bedrock routes the request to the appropriate model, performs inference, and returns the response. The response carries the generated text and, depending on the model, metadata such as input and output token counts. CloudWatch records metrics for each invocation; capturing full request and response payloads requires enabling model invocation logging.
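Reading the result is the step that most often trips people up, because the SDK returns the body as a stream instead of a string. Below is a small sketch of parsing a Claude v2 response; the `completion` and `stop_reason` fields follow that model's text-completion format, and the response object is simulated with `io.BytesIO` so the example runs without AWS access.

```python
import io
import json


def parse_claude_v2_response(response: dict) -> dict:
    """Extract generated text and stop reason from an invoke_model response."""
    # response["body"] is a file-like stream; read() returns the raw JSON bytes.
    payload = json.loads(response["body"].read())
    return {
        "text": payload["completion"].strip(),
        "stop_reason": payload.get("stop_reason"),
    }


# Simulated response standing in for the return value of invoke_model.
fake_response = {
    "body": io.BytesIO(
        json.dumps({"completion": " I'm doing well, thanks!", "stop_reason": "stop_sequence"}).encode()
    )
}
print(parse_claude_v2_response(fake_response))
```

Because the body is a stream, it can be read only once; store the parsed result if it is needed in more than one place.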
Fine-Tune the Model (Optional)
If the base model does not meet accuracy requirements, the developer can fine-tune it on custom data. They prepare a JSONL file with prompt-completion pairs, upload it to S3, and start a `CreateModelCustomizationJob`. The job trains a new model version. The developer specifies the base model, training data location, output S3 bucket, and optional hyperparameters (e.g., learning rate, epoch count). The job is asynchronous; status can be checked via `GetModelCustomizationJob`. Once complete, the custom model is available for inference.
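Two pieces of this workflow are easy to sketch: serialising the training data and waiting for the asynchronous job. The example below assumes the `prompt`/`completion` JSONL layout described above and polls with `get_model_customization_job`; the helper names, the sample pair, and the polling interval are illustrative choices.

```python
import json
import time
from typing import Callable, Iterable, Tuple


def to_jsonl(pairs: Iterable[Tuple[str, str]]) -> str:
    """Serialise (prompt, completion) pairs as one JSON object per line."""
    return "\n".join(json.dumps({"prompt": p, "completion": c}) for p, c in pairs)


def wait_for_job(bedrock_client, job_arn: str, poll_seconds: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll GetModelCustomizationJob until the job leaves its in-progress states."""
    while True:
        status = bedrock_client.get_model_customization_job(jobIdentifier=job_arn)["status"]
        if status not in ("InProgress", "Stopping"):
            return status  # Completed, Failed, or Stopped
        sleep(poll_seconds)


# Hypothetical training pair, shown only to illustrate the file layout.
print(to_jsonl([("What is your return window?", "30 days from delivery.")]))
```

The resulting text would be uploaded to S3 and referenced as the training data location when starting the job; injecting `sleep` keeps the polling loop testable.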
Set Up a Knowledge Base (RAG)
For retrieval-augmented generation, the developer creates a knowledge base. They specify an S3 bucket with documents (PDF, HTML, TXT, etc.), a vector store (default Amazon OpenSearch Serverless), and an embedding model (default Amazon Titan Embeddings). Bedrock automatically chunks documents (default chunk size ~300 tokens), generates embeddings, and indexes them. The developer can then use the `RetrieveAndGenerate` API to query the knowledge base: the user query is embedded, relevant chunks are retrieved, and the FM generates an answer grounded in those chunks.
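The query step can be sketched as a thin wrapper around `RetrieveAndGenerate`. The function below expects a `bedrock-agent-runtime` client and returns the answer with the S3 URIs of the cited chunks; the wrapper name `ask_knowledge_base` is hypothetical, and the response fields it reads (`output.text`, `citations`, `retrievedReferences`) are assumed from the API's documented shape.

```python
from typing import Dict, List


def ask_knowledge_base(agent_runtime_client, kb_id: str, model_arn: str, question: str) -> Dict:
    """Query a knowledge base and return the grounded answer plus its source URIs."""
    response = agent_runtime_client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    )
    # Collect the S3 locations of every chunk the answer cites.
    sources: List[str] = [
        ref["location"]["s3Location"]["uri"]
        for citation in response.get("citations", [])
        for ref in citation.get("retrievedReferences", [])
        if "s3Location" in ref.get("location", {})
    ]
    return {"answer": response["output"]["text"], "sources": sources}


# Example usage (requires boto3, credentials, and an existing knowledge base):
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   ask_knowledge_base(client, "<knowledge-base-id>", "<model-arn>", "What is the refund policy?")
```

Returning the source URIs alongside the answer lets the application show users where a grounded response came from.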
Enterprise Scenario 1: Customer Support Chatbot
A large e-commerce company wants to build a chatbot that answers customer queries about orders, returns, and policies. They use Bedrock with Anthropic Claude v2 as the base model. They fine-tune Claude on historical support tickets to improve accuracy on domain-specific questions. They also create a knowledge base from their help center articles (S3 bucket) so the bot can ground responses in official documentation. The agent is configured with an action group that calls a Lambda function to look up order status from a DynamoDB table. In production, the chatbot handles thousands of concurrent requests. Provisioned throughput is purchased for the fine-tuned model to ensure consistent latency (e.g., 100 model units). Common misconfigurations: not setting appropriate stopSequences (the bot may generate extra text), or not monitoring guardrails (the bot may produce inappropriate responses). The exam might ask how to reduce hallucination: use RAG with a knowledge base.
Enterprise Scenario 2: Content Generation for Marketing
A media company uses Bedrock to generate social media posts, ad copy, and blog drafts. They use Stability AI's Stable Diffusion XL for image generation and AI21 Labs' Jurassic-2 for text. They fine-tune Jurassic-2 on their brand voice guidelines. To ensure brand consistency, they set temperature low (0.2) for factual copy and higher (0.8) for creative ideas. They use guardrails to block offensive content and restrict topics to approved categories. The system integrates with a CI/CD pipeline: new copy is generated, reviewed by a human, and published. Performance considerations: image generation is compute-intensive; using on-demand inference may cause variable latency. They use provisioned throughput for text generation during peak hours. A common mistake is mishandling streaming responses; the team avoids it by calling InvokeModelWithResponseStream and rendering partial results to the user as they arrive.
Enterprise Scenario 3: Document Summarization for Legal
A law firm uses Bedrock to summarize lengthy contracts. They use Amazon Titan Text Express for summarization. They fine-tune Titan on legal documents to handle jargon. They also use a knowledge base with past case summaries. The agent is configured to extract key clauses and present them in a structured format. Security is critical: data must not leave the AWS environment. Bedrock's data privacy guarantees (no training on customer data) and KMS encryption are essential. They also use AWS PrivateLink to keep traffic within the VPC. A common issue is hitting token limits: contracts may exceed the model's context window. They pre-chunk documents and summarize each chunk, then combine results. The exam may test that Bedrock does not use your data for model improvement unless you opt in.
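The chunk-then-combine approach described for long contracts can be sketched independently of any particular model. In the example below, `summarize` is a caller-supplied function (for instance, a wrapper around a Bedrock invocation), and word count is used as a rough stand-in for token count, which is an assumption; real limits are measured in tokens.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text: str, summarize: Callable[[str], str], max_words: int = 2000) -> str:
    """Summarise each chunk, then summarise the concatenated partial summaries."""
    chunks = chunk_words(text, max_words)
    if len(chunks) <= 1:
        return summarize(text)  # short enough for a single call
    partials = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partials))


# Toy summariser for illustration: keeps only the first three words of its input.
toy = lambda t: " ".join(t.split()[:3])
print(summarize_long("one two three four five six seven eight", toy, max_words=4))
```

If the combined partial summaries are still too long for the context window, the same function can be applied again to its own output.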
What DVA-C02 Tests on Bedrock
The exam focuses on practical integration of Bedrock into applications, not on the underlying ML theory. Key areas:
- Model Invocation: How to call InvokeModel and InvokeModelWithResponseStream. Know the required parameters: modelId and body. Understand that body format varies by model.
- Fine-tuning vs. Prompt Engineering: The exam tests when to fine-tune (need to improve accuracy on domain-specific tasks) vs. use prompt engineering (simpler, faster). Fine-tuning is for when prompt engineering isn't enough.
- Knowledge Bases (RAG): Understand that RAG reduces hallucinations by grounding responses in retrieved documents. Know the default embedding model (Amazon Titan Embeddings) and vector store (Amazon OpenSearch Serverless).
- Guardrails: Know that guardrails can filter content based on categories (hate, sexual, violence, etc.) and can block input or output. They are evaluated synchronously.
- Agents: Understand that agents can orchestrate multi-step tasks and call Lambda functions. They use a prompt template.
- Security: Data is encrypted at rest and in transit. Bedrock does not use customer data for training (unless opted in).
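For the guardrail content filters mentioned above, the following sketch builds the `contentPolicyConfig` structure in the same shape as the CLI example's `--content-policy-config` argument. The helper name `content_policy` is hypothetical, and applying one strength to both input and output is a simplification chosen for brevity.

```python
from typing import Dict

VALID_STRENGTHS = {"NONE", "LOW", "MEDIUM", "HIGH"}


def content_policy(filters: Dict[str, str]) -> dict:
    """Build a contentPolicyConfig that applies one strength to input and output."""
    config = []
    for category, strength in filters.items():
        if strength not in VALID_STRENGTHS:
            raise ValueError(f"unknown strength: {strength}")
        config.append({"type": category, "inputStrength": strength, "outputStrength": strength})
    return {"filtersConfig": config}


policy = content_policy({"HATE": "HIGH", "SEXUAL": "HIGH", "VIOLENCE": "MEDIUM"})
print(policy)
# With boto3, this dict could be passed as contentPolicyConfig to create_guardrail,
# alongside name, blockedInputMessaging, and blockedOutputsMessaging.
```

Validating strengths locally surfaces typos before the API call is made.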
Common Wrong Answers
Using SageMaker instead of Bedrock: Candidates might choose SageMaker for deploying a pre-trained model, but SageMaker typically requires you to deploy and manage endpoints and their instances. Bedrock is serverless and easier for simply invoking FMs.
Confusing fine-tuning with training from scratch: Fine-tuning starts from a base model; training from scratch requires massive data and compute. The exam tests that fine-tuning is for customization, not training a new model.
Believing Bedrock trains on your data: AWS explicitly states Bedrock does not use customer data to improve models unless you opt in. This is a common trap.
Not knowing the difference between on-demand and provisioned throughput: On-demand is pay-per-call; provisioned is reserved capacity for consistent performance.
Specific Numbers and Terms
modelId examples: anthropic.claude-v2, amazon.titan-text-express-v1, stability.stable-diffusion-xl-v0.
Default chunk size for knowledge bases: ~300 tokens.
Maximum input tokens for Claude v2: 100k.
Guardrail content filter strengths: NONE, LOW, MEDIUM, HIGH.
Fine-tuning job statuses: InProgress, Completed, Failed, Stopping, Stopped.
Edge Cases
Some models do not support fine-tuning (e.g., Stable Diffusion). The exam may ask which models can be fine-tuned.
Knowledge bases require the S3 bucket to be in the same region as Bedrock.
Guardrails can be applied at the model invocation level or at the agent level.
How to Eliminate Wrong Answers
If the question asks for a serverless way to use foundation models, eliminate SageMaker (which generally requires you to deploy and manage endpoints).
If the question involves retrieving external data to augment prompts, the answer is likely a knowledge base (RAG).
If the question is about content filtering, look for guardrails.
If the question involves multi-step reasoning and API calls, consider agents.
Bedrock is a serverless service for accessing foundation models via API – no infrastructure management.
Use `InvokeModel` for synchronous inference and `InvokeModelWithResponseStream` for streaming responses.
Fine-tuning customizes a base model with your data; it is asynchronous and uses `CreateModelCustomizationJob`.
Knowledge bases provide RAG by chunking documents, embedding them, and storing in Amazon OpenSearch Serverless.
Guardrails filter content based on categories (hate, sexual, violence, etc.) with configurable strength levels.
Agents can orchestrate multi-step tasks and call Lambda functions via action groups.
Bedrock does not use your data to improve its models unless you opt in.
Model availability varies by region; always check with `list-foundation-models`.
Provisioned throughput provides reserved capacity for consistent performance; on-demand is pay-per-call.
The default embedding model for knowledge bases is Amazon Titan Embeddings.
These come up on the exam all the time. Here's how to tell them apart.
Amazon Bedrock
- Serverless – no infrastructure management.
- Provides pre-trained foundation models via API.
- Fine-tuning only on supported base models.
- Pay-per-inference or provisioned throughput.
- Best for integrating generative AI quickly.
Amazon SageMaker
- Requires managing instances or endpoints.
- Can train models from scratch or deploy any model.
- Full control over training and deployment.
- Pay for compute and storage resources.
- Best for custom ML workflows and training.
Bedrock On-Demand
- No upfront commitment; pay per token.
- May have variable latency during peak times.
- Suitable for low-volume or bursty workloads.
- Automatic scaling, but subject to availability.
- No model units to manage.
Bedrock Provisioned Throughput
- Reserved capacity in model units (hourly billing).
- Consistent low latency under load.
- Suitable for high-volume production workloads.
- Must specify number of model units.
- Can be used for custom models and fine-tuned models.
Mistake
Bedrock trains foundation models from scratch on customer data.
Correct
Bedrock provides access to pre-trained models. It does not train models from scratch. Fine-tuning adapts a base model using customer data, but the base model remains unchanged. Bedrock does not use customer data for training its own models unless the customer explicitly opts in.
Mistake
You must provision EC2 instances to use Bedrock.
Correct
Bedrock is serverless. You do not manage any underlying infrastructure. Provisioned throughput is an option for guaranteed capacity, but it is still managed by AWS.
Mistake
All foundation models are available in every AWS region.
Correct
Model availability varies by region. For example, some models may only be available in us-east-1 or us-west-2. Always check the model catalog for regional availability.
Mistake
Bedrock Agents can only use Bedrock models.
Correct
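Agents use a Bedrock foundation model for reasoning and orchestration, but their actions are not confined to Bedrock. Through action groups they invoke Lambda functions, which can call any AWS service or external API, and they can also query knowledge bases.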
Mistake
Knowledge bases require you to manage the vector database yourself.
Correct
Bedrock integrates with Amazon OpenSearch Serverless, which is a fully managed service. Bedrock automatically creates and manages the vector index. You do not need to provision or scale the database.
Bedrock is serverless and provides immediate access to a curated set of foundation models via a single API. You do not manage any infrastructure. SageMaker requires you to deploy a model on an endpoint (EC2 instances) and manage scaling, patching, etc. For the exam, if the question asks for the easiest way to integrate a generative AI model without managing servers, choose Bedrock. If you need to deploy a custom model not available in Bedrock, choose SageMaker.
Yes, Bedrock includes image generation models, most notably Stability AI's Stable Diffusion XL (modelId: `stability.stable-diffusion-xl-v0`). You invoke it with a text prompt and parameters like `cfg_scale` (prompt adherence) and `steps` (quality). The response is a base64-encoded image. Other image models may be added over time.
Bedrock encrypts data in transit (TLS) and at rest using AWS KMS. AWS does not use customer data to improve the base models unless the customer explicitly opts in (via a separate setting). Additionally, you can use AWS PrivateLink to keep traffic within your VPC. For compliance, Bedrock integrates with CloudTrail for auditing API calls.
It varies by model. For example, Anthropic Claude v2 supports up to 100,000 tokens (about 75,000 words). Amazon Titan Text Express supports up to 8,000 tokens. Always check the model's documentation. If your input exceeds the limit, you must truncate or chunk it.
Bedrock integrates with Amazon CloudWatch, which provides metrics such as `Invocations`, `InvocationLatency`, `InvocationThrottles`, `InputTokenCount`, and `OutputTokenCount`. You can set alarms on these metrics. Costs are tracked via AWS Cost Explorer, with separate line items for on-demand inference, provisioned throughput, and storage (for custom models).
Yes, you can use AWS PrivateLink to create a VPC endpoint for Bedrock. This allows you to invoke models without traffic leaving the AWS network. However, the models themselves are hosted by AWS and accessed via the endpoint. There is no option to run Bedrock models entirely offline on-premises.
A guardrail is a content filter that blocks harmful or unwanted inputs/outputs based on predefined policies (e.g., hate speech, violence). A knowledge base is a RAG system that provides the model with external information to ground responses. They serve different purposes: guardrails enforce safety; knowledge bases improve accuracy.