CLF-C02 | Chapter 19 of 130 | Objective 3.3

AWS Database Services

This chapter covers AWS database services, a core part of the Cloud Technology and Services domain (Domain 3, which carries 34% of the scored content on CLF-C02). AWS publishes weights only per domain, so treat any per-objective figure as an estimate; this guide puts objective 3.3 at roughly 12-15% of the questions. You will need to differentiate between relational and non-relational databases, understand the purpose of each major AWS database service, and know when to use each one. We'll explore Amazon RDS, Amazon DynamoDB, Amazon Redshift, Amazon ElastiCache, Amazon Neptune, and Amazon DocumentDB, with a focus on exam-relevant details.

Updated May 31, 2026

Database as a Filing Cabinet System

Imagine you run a busy office with thousands of customer files. Your filing cabinets are the databases, and there are two main ways to organize them: relational and non-relational.

A relational database is like a set of cabinets with predefined folders and cross-reference cards. Each folder has a fixed structure: customer name, address, order history. To find every customer who ordered a specific product, you follow the cross-reference cards (keys and indexes) that link orders to customers. This works well when your data is highly structured and you need complex queries. But if you start adding new types of data, such as social media posts or sensor readings, you have to redesign the folders and the cross-reference system, which is slow and rigid.

A non-relational database is more like a set of boxes where each box can hold different types of items. You label each item with tags (key-value pairs) and search by tag. This is faster for simple lookups and scales easily because you can just add more boxes. However, complex queries across multiple boxes are harder because there are no built-in cross-references.

AWS offers both types: Amazon RDS for relational and Amazon DynamoDB for non-relational. RDS manages the filing cabinet for you: it handles backups, patching, and replication. DynamoDB gives you a box system that scales automatically to millions of requests per second. The exam tests whether you can choose between them based on data structure and query patterns.

How It Actually Works

What Are AWS Database Services and Why Do You Need Them?

Databases store and manage data. In traditional on-premises environments, you install database software on servers, configure storage, set up backups, and manage patching—all yourself. AWS database services are managed services that offload these administrative tasks to AWS. You get a database endpoint and credentials, and AWS handles the underlying infrastructure. This allows you to focus on application development rather than database administration.

The CLF-C02 exam expects you to know the main categories of databases: relational (SQL) and non-relational (NoSQL). Within relational, Amazon RDS is the primary service. For non-relational, Amazon DynamoDB is the key service. There are also specialized services: Amazon Redshift for data warehousing, Amazon ElastiCache for caching, Amazon Neptune for graph databases, and Amazon DocumentDB for MongoDB-compatible document databases.

How AWS Database Services Work

Each AWS database service has its own unit of deployment: a DB instance for RDS, a table for DynamoDB, and a cluster for Redshift, ElastiCache, Neptune, and DocumentDB. You define the configuration through the AWS Management Console, the AWS CLI, or an SDK. Behind the scenes, AWS provisions compute and storage resources, sets up networking, and enables monitoring; for instance-based and cluster-based services it also installs the database engine.

Amazon RDS supports multiple database engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Amazon Aurora. Aurora is a MySQL/PostgreSQL-compatible engine designed by AWS for higher performance and availability. RDS automates:

- Backups: Automated daily backups with point-in-time recovery; retention is configurable up to 35 days.
- Patching: Automatic minor version updates.
- Replication: Multi-AZ deployments for high availability (synchronous standby replica in another Availability Zone).
- Scaling: You can scale compute (CPU/RAM) and storage independently, but compute changes involve some downtime.

Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It is fully managed and serverless, so you don't provision servers. You create tables, define a primary key, and optionally add secondary indexes. DynamoDB automatically distributes data across partitions based on the partition key. It supports:

- On-demand capacity: Pay per request, ideal for unpredictable workloads.
- Provisioned capacity: Specify read/write capacity units, with auto-scaling options.
- Global tables: Multi-region replication for low-latency access worldwide.
- DAX (DynamoDB Accelerator): In-memory cache for microsecond read latency.
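To make the partitioning idea concrete, here is a minimal sketch of how a hash of the partition key can decide where an item lives. It is an illustration only: DynamoDB's internal hash function and partition count are not public, so `hashlib.md5`, the helper names `partition_for` and `spread`, and the fixed partition count are all assumptions made for the example.

```python
import hashlib


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index with a stable hash.

    MD5 is a stand-in here; it is NOT DynamoDB's actual hash function.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def spread(keys, num_partitions):
    """Count how many keys land on each partition."""
    counts = [0] * num_partitions
    for key in keys:
        counts[partition_for(key, num_partitions)] += 1
    return counts


if __name__ == "__main__":
    # Many distinct keys spread roughly evenly across partitions.
    print(spread([f"user-{i}" for i in range(10_000)], 4))
    # A single "hot" key sends every request to the same partition.
    print(spread(["game-1"] * 10_000, 4))
```

The second call shows why partition key choice matters: a key with few distinct values concentrates traffic on one partition, however many partitions exist.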

Amazon Redshift is a petabyte-scale data warehouse. It uses columnar storage and massively parallel processing (MPP) to run complex analytical queries. It is not for transactional workloads. Redshift clusters consist of a leader node (coordinates queries) and compute nodes (store and process data). You can pause, resize, and use concurrency scaling.

Amazon ElastiCache provides managed Redis or Memcached. It is an in-memory cache that sits in front of a database (like RDS) to reduce read latency and offload read traffic. It is not a primary database; it stores temporary data that can be regenerated.

Amazon Neptune is a graph database for highly connected data like social networks, recommendation engines, and fraud detection. It supports property graph and RDF models.

Amazon DocumentDB is a MongoDB-compatible document database. It is useful when you need to migrate MongoDB workloads to AWS with little or no change to application code.

Key Tiers, Configurations, and Pricing Models

RDS:

- Instance types: Standard (e.g., db.m5), Memory Optimized (e.g., db.r5), Burstable (e.g., db.t3, suited to dev/test).
- Storage: General Purpose SSD (gp2/gp3), Provisioned IOPS (io1/io2), Magnetic (previous generation, not recommended for new workloads).
- Multi-AZ: Synchronous standby replica in another AZ for high availability.
- Read Replicas: Asynchronous copies for read scaling (up to 15 for Aurora and for RDS MySQL, MariaDB, and PostgreSQL; up to 5 for Oracle and SQL Server).
- Pricing: Pay per instance hour + storage + I/O. Reserved instances offer discounts.

DynamoDB:

- Capacity modes: On-demand (pay per request) or Provisioned (pay for the read/write capacity you specify).
- Read/Write Capacity Units: 1 RCU = 1 strongly consistent read per second (or 2 eventually consistent reads per second) for an item up to 4 KB; 1 WCU = 1 write per second for an item up to 1 KB. Larger items consume additional units.
- DAX: Additional cost for in-memory cache nodes.
- Global tables: Replication cost across regions.
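The capacity-unit arithmetic is easier to remember with a worked example. The sketch below assumes the standard rounding rules: item size is rounded up to the next 4 KB block for reads and the next 1 KB block for writes, and an eventually consistent read costs half a strongly consistent one. The function names are made up for this example and are not part of any AWS SDK.

```python
import math


def read_capacity_units(item_size_bytes: int, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: item size rounds up to 4 KB blocks per read."""
    blocks = max(1, math.ceil(item_size_bytes / 4096))
    total = blocks * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        total = math.ceil(total / 2)
    return total


def write_capacity_units(item_size_bytes: int, writes_per_second: int) -> int:
    """WCUs needed: item size rounds up to 1 KB blocks per write."""
    blocks = max(1, math.ceil(item_size_bytes / 1024))
    return blocks * writes_per_second


if __name__ == "__main__":
    # 6 KB items, 100 reads/s: 2 blocks per read.
    print(read_capacity_units(6 * 1024, 100))         # 200
    print(read_capacity_units(6 * 1024, 100, False))  # 100
    # 2.5 KB items, 50 writes/s: 3 blocks per write.
    print(write_capacity_units(2560, 50))             # 150
```

At Cloud Practitioner level you are unlikely to be asked to compute these values, but knowing that item size drives capacity consumption helps explain why small items are cheaper to read and write.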

Redshift:

- Node types: Dense Compute (dc2) for high performance, Dense Storage (ds2) for large data volumes, RA3 with managed storage.
- Pricing: Per node hour plus storage. Reserved instances available.
- Concurrency Scaling: Additional clusters for burst capacity.

ElastiCache:

- Node types: Similar to EC2 instance types (e.g., cache.m5, cache.r5).
- Pricing: Per node hour.

Comparison to On-Premises

On-premises: You buy servers, install database software, configure storage, manage backups, apply patches, set up replication, and monitor performance. This requires DBA expertise and capital expenditure.

AWS Managed: You choose a service and configuration. AWS handles OS patching, database patching, backup automation, hardware maintenance, and replication (if Multi-AZ). You pay as you go. The trade-off is less control over the underlying environment (e.g., you cannot modify OS parameters).

When to Use Each Service

RDS: Use for traditional applications that require relational data with ACID transactions, complex joins, and structured schemas. Examples: ERP, CRM, e-commerce platforms.

DynamoDB: Use for applications that need high scalability, low latency, and flexible schemas. Ideal for serverless, IoT, gaming, ad-tech, and real-time bidding.

Redshift: Use for business intelligence, analytics, and reporting on large datasets. Not for OLTP.

ElastiCache: Use to speed up read-heavy workloads, session stores, or as a cache for database query results.

Neptune: Use for social networks, fraud detection, knowledge graphs.

DocumentDB: Use when migrating MongoDB workloads or building applications that need a document database with MongoDB compatibility.

Exam Trap: RDS vs. DynamoDB

A common exam question asks: "Which AWS service is best for storing user session data?" Many candidates choose RDS because they think relational is always better. But session data is typically key-value (session ID -> session state), requires low latency, and does not need complex queries. Between the two, DynamoDB is the better choice (ElastiCache is another common answer when the session data is temporary and can be regenerated). Conversely, for a financial ledger with strict consistency and joins, RDS is correct.

Code Example: Creating a DynamoDB Table via AWS CLI

aws dynamodb create-table \
    --table-name Users \
    --attribute-definitions AttributeName=UserId,AttributeType=S \
    --key-schema AttributeName=UserId,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST

This creates a table with a partition key (UserId) and on-demand billing. No servers to manage.

Aurora Serverless

Aurora Serverless is an on-demand, auto-scaling configuration for Amazon Aurora. It starts up, shuts down, and scales based on application demand. It is useful for intermittent or unpredictable workloads. Behind the scenes, AWS uses a fleet of instances that are pooled. You pay only for the capacity consumed (ACU - Aurora Capacity Units).
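Paying only for capacity consumed comes down to simple ACU-hour arithmetic. The sketch below assumes a made-up price of $0.12 per ACU-hour and a made-up daily usage profile; actual prices vary by Region and Aurora version, and storage and I/O are billed separately.

```python
def acu_hours(usage):
    """Sum ACU-hours from (acus, hours) pairs."""
    return sum(acus * hours for acus, hours in usage)


def aurora_serverless_cost(usage, price_per_acu_hour):
    """Compute cost only (storage and I/O are billed separately)."""
    return acu_hours(usage) * price_per_acu_hour


if __name__ == "__main__":
    # Hypothetical day: 8 h at 2 ACUs, a 2 h spike at 16 ACUs, 14 h paused.
    day = [(2, 8), (16, 2), (0, 14)]
    print(acu_hours(day))  # 48
    # 0.12 is an assumed price, not a quoted AWS rate.
    print(round(aurora_serverless_cost(day, 0.12), 2))
```

A provisioned instance sized for the 16-ACU spike would be billed at that size for all 24 hours, which is why intermittent workloads tend to favor the serverless configuration.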

Walk-Through

1. Launch an RDS Instance

Navigate to the RDS console in the AWS Management Console. Click 'Create database'. Choose a database engine (e.g., MySQL). Select a template: Production (Multi-AZ, Provisioned IOPS) or Dev/Test (single-AZ, burstable). Specify the DB instance identifier, master username, and password. Choose an instance class (e.g., db.t3.micro for free tier) and storage type (General Purpose SSD). Configure connectivity: VPC, subnet group, public accessibility (usually no). Set the backup retention period (0-35 days; 0 disables automated backups). Optionally enable Multi-AZ; read replicas are added after the instance has been created. Click 'Create database'. AWS provisions the instance, installs the engine, and creates a DB endpoint. This process takes several minutes. Behind the scenes, AWS allocates EC2 instances, attaches EBS volumes, and configures security groups.

2. Configure DynamoDB Table

In the DynamoDB console, click 'Create table'. Enter the table name and primary key (partition key, optionally a sort key). Choose a data type for each key (string, binary, number). Select the capacity mode: On-demand (pay per request) or Provisioned (set read/write capacity units). Optionally add secondary indexes (a local secondary index can only be defined at table creation; a global secondary index can also be added later). Enable auto-scaling if provisioned. Optionally enable DynamoDB Streams to capture changes. Click 'Create'. DynamoDB automatically partitions data across multiple storage nodes. Behind the scenes, AWS allocates throughput capacity and storage resources. The table accepts reads and writes as soon as its status is ACTIVE, which usually takes seconds. Reads are eventually consistent by default; you can request strongly consistent reads per operation.

3. Set Up ElastiCache Cluster

In the ElastiCache console, choose Redis or Memcached. Click 'Create'. Select cluster mode enabled or disabled. Specify the cluster name, node type (e.g., cache.t3.micro), number of replicas (for Redis), and subnet group. Configure security: encryption in transit and at rest (optional). Set the backup window and retention (Redis only). Click 'Create'. AWS provisions cache nodes running the chosen engine and provides a cluster endpoint for application use. Behind the scenes, ElastiCache sets up replication (for Redis) and sharding (if cluster mode is enabled). Memcached does not support replication.

4. Query DynamoDB with PartiQL

PartiQL is a SQL-compatible query language for DynamoDB. In the DynamoDB console, go to 'PartiQL editor'. Type a query: `SELECT * FROM Users WHERE UserId = '123'`. Click 'Run'. DynamoDB translates PartiQL into internal API calls. Behind the scenes, DynamoDB uses the partition key to locate the item in the correct partition. If the WHERE clause does not specify the partition key, DynamoDB performs a full table scan, which is slow and expensive on large tables. PartiQL supports INSERT, UPDATE, DELETE, and SELECT. It is useful for users familiar with SQL, but note that DynamoDB does not support joins.
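The same lookup can be issued from code. This sketch builds the arguments for the `execute_statement` operation of the low-level DynamoDB client in boto3, using a `?` placeholder rather than string concatenation. The helper name `build_partiql_lookup` is an assumption for this example, the key is assumed to be a string attribute, and the actual call is left commented out because it needs AWS credentials and an existing `Users` table.

```python
def build_partiql_lookup(table_name: str, key_name: str, key_value: str) -> dict:
    """Build execute_statement arguments for a single-item lookup.

    The key value is passed as a typed parameter ({"S": ...} = string)
    instead of being interpolated into the statement text.
    """
    statement = f'SELECT * FROM "{table_name}" WHERE "{key_name}" = ?'
    return {"Statement": statement, "Parameters": [{"S": key_value}]}


if __name__ == "__main__":
    params = build_partiql_lookup("Users", "UserId", "123")
    print(params)
    # With credentials configured, the call would look like this:
    # import boto3
    # client = boto3.client("dynamodb")
    # items = client.execute_statement(**params)["Items"]
```

Because the WHERE clause names the partition key, DynamoDB can go straight to the right partition instead of scanning the table.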

5. Create a Redshift Cluster

In the Redshift console, click 'Create cluster'. Provide a cluster identifier, choose a node type (e.g., dc2.large), and set the number of compute nodes (e.g., 2). Set master user credentials. Configure networking: VPC, subnet, security group. Optionally enable enhanced VPC routing, audit logging, and automatic snapshots (retention up to 35 days). Click 'Create cluster'. AWS provisions a leader node and compute nodes. The leader node handles query coordination and result aggregation. Compute nodes store data and execute query fragments. The cluster takes 10-20 minutes to become available. You can then connect using a SQL client such as psql or a BI tool such as Amazon QuickSight.

What This Looks Like on the Job

Scenario 1: E-Commerce Platform with RDS and ElastiCache

An online retailer uses Amazon RDS with MySQL to store customer accounts, product inventory, and order history. The database handles complex queries like "show all orders from last month with total over $100". As traffic grows, read-heavy pages (product details) slow down. The team adds an ElastiCache Redis cluster in front of RDS. Product details are cached with a TTL of 5 minutes. This reduces database load by 70%. Cost: RDS instance (db.r5.large, ~$200/month), ElastiCache (cache.r5.large, ~$150/month). Without caching, they would need larger RDS instances or read replicas, costing more. Misconfiguration: Setting TTL too long causes stale prices; too short reduces cache hit rate. The team monitors cache hit ratio via CloudWatch and adjusts TTL accordingly.
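The caching pattern in this scenario is usually called cache-aside. The sketch below uses an in-process dictionary in place of Redis and a plain function in place of the RDS query, so it runs without any AWS resources. `TTLCache` and `load_product` are hypothetical names, and the 300-second TTL mirrors the 5 minutes in the scenario.

```python
import time


class TTLCache:
    """Cache-aside with per-entry expiry (a dict stands in for Redis)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        """Return the cached value, or call loader(key) and cache the result."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: fall through to the database.
        self.misses += 1
        value = loader(key)
        self._store[key] = (value, now + self._ttl)
        return value

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def load_product(product_id):
    """Stand-in for a SELECT against the RDS database."""
    return {"id": product_id, "price": 19.99}


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=300)
    for _ in range(10):
        cache.get_or_load("sku-42", load_product)
    print(cache.hit_ratio())  # 0.9: one miss, then nine hits
```

The TTL trade-off from the scenario is visible here: a longer TTL raises the hit ratio but lets a changed price stay stale for longer.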

Scenario 2: Gaming Leaderboard with DynamoDB

A mobile game stores player scores in DynamoDB. The table has partition key = game_id, sort key = player_id. Scores are updated frequently. The leaderboard query is: "top 100 players by score". To support this, the team creates a Global Secondary Index (GSI) with game_id as the partition key and score as the sort key, then queries the index in descending order with a limit of 100. DynamoDB automatically maintains the index. The game uses on-demand capacity because traffic spikes during tournaments. Cost: During normal load, ~$50/month; during a tournament, ~$500/month due to high writes. Without the GSI, scanning the entire table would be slow and expensive. Misconfiguration: Choosing provisioned capacity with insufficient WCUs causes throttling and failed writes. The team sets CloudWatch alarms on ThrottledRequests.
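A "top 100 by score" read maps onto a single Query against the index. The sketch below builds the arguments for the `query` operation of the low-level DynamoDB client in boto3, assuming the index uses game_id as its partition key and score as its sort key. The table name `PlayerScores` and index name `game_id-score-index` are hypothetical, and the call itself is commented out because it needs credentials and a real table.

```python
def build_top_scores_query(table_name: str, index_name: str,
                           game_id: str, limit: int = 100) -> dict:
    """Build Query arguments that return the highest scores first."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "game_id = :g",
        "ExpressionAttributeValues": {":g": {"S": game_id}},
        # False = descending by the index sort key (score).
        "ScanIndexForward": False,
        "Limit": limit,
    }


if __name__ == "__main__":
    params = build_top_scores_query("PlayerScores", "game_id-score-index", "game-1")
    print(params)
    # With credentials configured, the call would look like this:
    # import boto3
    # top_100 = boto3.client("dynamodb").query(**params)["Items"]
```

The ordering comes from reading the index sort key backwards (`ScanIndexForward=False`), not from a property of the index itself.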

Scenario 3: Data Warehouse for Analytics with Redshift

A marketing analytics company ingests billions of ad impressions daily into Amazon Redshift. They use a ds2.xlarge cluster with 10 nodes. Data is loaded via the COPY command from S3. Analysts run complex queries across dimensions (user, campaign, time). Redshift's columnar storage and compression reduce storage costs. The team uses workload management (WLM) to prioritize queries. Cost: ~$2,000/month for cluster, plus S3 storage. Misconfiguration: Choosing the wrong distribution style slows queries. With EVEN distribution on a table that is joined on a key, matching rows sit on different nodes and must be moved across the network at query time; with KEY distribution on a column that has few distinct values, data skews onto a few nodes. The team uses KEY distribution on well-distributed join keys. Also, forgetting to vacuum and analyze after large deletes degrades performance.
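The effect of distribution style on joins can be shown with a small simulation. This is a toy model, not Redshift's real placement logic: round-robin placement stands in for EVEN distribution, a CRC32 hash stands in for KEY distribution, and the tables, node count, and function names are all invented for the example.

```python
import random
import zlib

NUM_NODES = 4


def even_placement(rows, num_nodes=NUM_NODES):
    """Round-robin placement that ignores row content (like EVEN)."""
    return [i % num_nodes for i in range(len(rows))]


def key_placement(rows, key, num_nodes=NUM_NODES):
    """Hash the join column so equal values share a node (like KEY)."""
    return [zlib.crc32(str(row[key]).encode("utf-8")) % num_nodes for row in rows]


def local_join_fraction(left, left_nodes, right, right_nodes, key):
    """Fraction of matching row pairs that already sit on the same node."""
    right_index = {}
    for row, node in zip(right, right_nodes):
        right_index.setdefault(row[key], []).append(node)
    local = total = 0
    for row, node in zip(left, left_nodes):
        for other_node in right_index.get(row[key], []):
            total += 1
            local += node == other_node
    return local / total if total else 0.0


rng = random.Random(0)
customers = [{"customer_id": c} for c in range(100)]
orders = [{"order_id": i, "customer_id": rng.randrange(100)} for i in range(1000)]

even_fraction = local_join_fraction(
    orders, even_placement(orders), customers, even_placement(customers), "customer_id")
key_fraction = local_join_fraction(
    orders, key_placement(orders, "customer_id"),
    customers, key_placement(customers, "customer_id"), "customer_id")

if __name__ == "__main__":
    print(f"EVEN: {even_fraction:.0%} of join pairs are node-local")
    print(f"KEY:  {key_fraction:.0%} of join pairs are node-local")
```

With KEY placement on the join column every matching pair is co-located, while round-robin placement leaves most pairs on different nodes, which is the data movement the scenario's team is trying to avoid.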

How CLF-C02 Actually Tests This

Exactly What CLF-C02 Tests on Objective 3.3 (AWS Database Services)

The exam expects you to:

Differentiate between relational and non-relational databases.

Identify the appropriate AWS service for a given use case.

Understand the basic features of Amazon RDS, DynamoDB, Redshift, ElastiCache, Neptune, and DocumentDB.

Know the difference between RDS Multi-AZ (high availability) and Read Replicas (performance).

Recognize that Aurora is a MySQL/PostgreSQL-compatible relational database with better performance and availability.

Understand that DynamoDB is serverless and scales automatically.

Know that Redshift is for analytics/data warehousing, not OLTP.

Understand that ElastiCache is for caching, not as a primary database.

Common Wrong Answers and Why

1. Wrong: Use RDS for a gaming leaderboard with high write throughput. Why: Candidates think relational is always best. But DynamoDB handles high write throughput and provides low latency. RDS would need sharding and is harder to scale.

2. Wrong: Use DynamoDB for an ERP system with complex joins. Why: DynamoDB does not support joins. ERP typically requires relational integrity. RDS is correct.

3. Wrong: Use Redshift for an OLTP application. Why: Redshift is optimized for batch analytics, not transactional workloads. RDS or DynamoDB are better.

4. Wrong: Multi-AZ RDS improves read performance. Why: Multi-AZ is for high availability only. Read replicas improve read performance.

Specific Terms and Values

RDS: Supports MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, Aurora.

Aurora: MySQL-compatible (up to 5x the throughput of standard MySQL) and PostgreSQL-compatible (up to 3x the throughput of standard PostgreSQL).

DynamoDB: Single-digit millisecond latency, supports key-value and document.

Redshift: Columnar storage, MPP, petabyte-scale.

ElastiCache: Supports Redis and Memcached.

Neptune: Graph database.

DocumentDB: MongoDB-compatible.

Tricky Distinctions

RDS Multi-AZ vs. Read Replicas: Multi-AZ provides a standby in another AZ for failover; not used for reads. Read Replicas are for read scaling and can be in a different region.

DynamoDB On-Demand vs. Provisioned: On-demand is for unpredictable workloads; provisioned with auto-scaling is for predictable.

Aurora vs. RDS MySQL: Aurora is a separate engine with better performance and automatic storage scaling (up to 128 TB).

Decision Rule for Multiple Choice

When a question asks "Which AWS database service is best for...?" follow this elimination strategy:

1. If the workload requires complex queries, joins, or transactions → relational (RDS or Aurora).
2. If the workload requires high scalability, low latency, or flexible schema → DynamoDB.
3. If the workload is analytics/reporting on large data → Redshift.
4. If the workload is caching → ElastiCache.
5. If the workload involves highly connected data → Neptune.
6. If the workload needs MongoDB compatibility → DocumentDB.
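The elimination strategy can be written down as an ordered lookup, which is a handy way to drill it. This is a study aid rather than a real service selector: the keyword sets are a loose paraphrase of the six rules, and the names `RULES` and `recommend` are invented for this sketch.

```python
# Rules are checked in order; the first rule that shares a keyword
# with the scenario wins.
RULES = [
    ({"complex queries", "joins", "transactions"}, "Amazon RDS or Amazon Aurora"),
    ({"high scalability", "low latency", "flexible schema"}, "Amazon DynamoDB"),
    ({"analytics", "reporting", "data warehouse"}, "Amazon Redshift"),
    ({"caching"}, "Amazon ElastiCache"),
    ({"highly connected data", "graph"}, "Amazon Neptune"),
    ({"mongodb compatibility"}, "Amazon DocumentDB"),
]


def recommend(requirements):
    """Return the first matching service, or None if no rule applies."""
    wanted = {r.lower() for r in requirements}
    for keywords, service in RULES:
        if keywords & wanted:
            return service
    return None


if __name__ == "__main__":
    print(recommend({"joins", "transactions"}))    # Amazon RDS or Amazon Aurora
    print(recommend({"Flexible schema"}))          # Amazon DynamoDB
    print(recommend({"reporting"}))                # Amazon Redshift
    print(recommend({"MongoDB compatibility"}))    # Amazon DocumentDB
```

Because the rules are ordered, a scenario that mentions both joins and low latency resolves to the relational answer first, which matches how the list reads from top to bottom.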

Key Takeaways

Amazon RDS is a managed relational database service supporting six engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Aurora.

Amazon DynamoDB is a fully managed NoSQL key-value and document database that delivers single-digit millisecond latency at any scale.

Amazon Redshift is a petabyte-scale data warehouse using columnar storage and MPP, ideal for analytical queries.

Amazon ElastiCache provides managed Redis or Memcached for in-memory caching, not as a primary database.

RDS Multi-AZ provides high availability via a synchronous standby replica; Read Replicas provide read scaling via asynchronous replication.

Aurora is a MySQL/PostgreSQL-compatible relational database with up to 5x the throughput of standard MySQL (up to 3x for PostgreSQL), automatic storage scaling, and 6 copies of data across 3 AZs.

DynamoDB supports two capacity modes: On-demand (pay per request) and Provisioned (capacity you specify in advance, optionally with auto-scaling).

For the exam, remember: OLTP + relational = RDS/Aurora; OLTP + high scale = DynamoDB; OLAP = Redshift; caching = ElastiCache; graph = Neptune; MongoDB = DocumentDB.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Amazon RDS

Relational database with schema enforcement

Supports complex queries with joins

Managed service with Multi-AZ and read replicas

Best for OLTP with moderate scalability needs

Pricing based on instance hours and storage

Amazon DynamoDB

Non-relational key-value and document database

No joins; queries based on primary key or indexes

Serverless, auto-scales to millions of requests per second

Best for high-scale, low-latency applications

Pricing based on read/write capacity units or on-demand

Watch Out for These

Mistake

Amazon RDS is a single database engine.

Correct

RDS is a managed service that supports multiple database engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Amazon Aurora. You choose the engine when launching.

Mistake

DynamoDB cannot handle complex queries.

Correct

DynamoDB supports PartiQL (SQL-compatible), secondary indexes, and transactions. However, it does not support joins. Complex queries can be implemented using application-side joins or streams.

Mistake

Redshift is suitable for transactional (OLTP) workloads.

Correct

Redshift is optimized for online analytical processing (OLAP) and data warehousing. It is not designed for high-frequency inserts/updates typical of OLTP. Use RDS or DynamoDB for OLTP.

Mistake

ElastiCache can be used as a primary database.

Correct

ElastiCache is an in-memory cache, not a durable database. Data is volatile and can be lost on node failure if not persisted (Redis supports persistence but is still not a primary store). Use it to speed up access to data stored in a durable database like RDS or DynamoDB.

Mistake

Aurora and RDS MySQL are the same.

Correct

Aurora is a separate engine that is MySQL- and PostgreSQL-compatible. It offers better performance (up to 5x throughput for MySQL), automatic storage scaling (up to 128 TB), and higher availability (6 copies across 3 AZs). RDS for MySQL runs the standard MySQL engine as a managed service.

Frequently Asked Questions

What is the difference between RDS Multi-AZ and Read Replicas?

Multi-AZ creates a synchronous standby replica in a different Availability Zone for high availability. If the primary fails, traffic automatically fails over to the standby. Read Replicas are asynchronous copies used to offload read traffic; they can be in the same or different region. Multi-AZ does not improve read performance; Read Replicas do. Exam tip: If the question mentions 'high availability' or 'disaster recovery', think Multi-AZ. If it mentions 'read scaling' or 'reducing read latency', think Read Replicas.
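From the application's point of view, the difference shows up in which endpoints it connects to. The sketch below sends writes to the primary endpoint and rotates reads across replica endpoints. The hostnames are placeholders, `EndpointRouter` is a hypothetical helper, and treating any statement that starts with SELECT as a read is a deliberate simplification.

```python
import itertools


class EndpointRouter:
    """Send writes to the primary and spread reads across read replicas."""

    def __init__(self, primary, replicas=()):
        self.primary = primary
        # Round-robin over replicas; None means reads also use the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


if __name__ == "__main__":
    router = EndpointRouter(
        primary="mydb.example.internal",  # placeholder hostnames
        replicas=["mydb-ro-1.example.internal", "mydb-ro-2.example.internal"],
    )
    print(router.endpoint_for("SELECT * FROM orders"))     # first replica
    print(router.endpoint_for("SELECT * FROM products"))   # second replica
    print(router.endpoint_for("INSERT INTO orders VALUES (1)"))  # primary
```

The Multi-AZ standby never appears in this router: it has no endpoint of its own, and after a failover the same primary endpoint simply points at the promoted standby.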

When should I use DynamoDB vs. RDS?

Use DynamoDB when you need high scalability, low latency (single-digit milliseconds), and a flexible schema. It's ideal for serverless, IoT, gaming, and real-time applications. Use RDS when you need complex queries, joins, transactions (ACID), and a fixed schema. For example, an e-commerce order system with multiple related tables is better on RDS. Exam tip: If the scenario mentions 'key-value' or 'document', think DynamoDB. If it mentions 'SQL queries' or 'joins', think RDS.

What is Amazon Aurora and how is it different from RDS MySQL?

Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database built for the cloud. It offers up to 5x throughput of standard MySQL and 3x of PostgreSQL. Aurora automatically scales storage from 10 GB to 128 TB, replicates six copies of data across three Availability Zones, and has built-in failover in seconds. RDS MySQL is standard MySQL on EC2 with managed features. Aurora is more expensive but provides better performance and availability. Exam tip: If the question mentions 'high performance' or 'MySQL compatible with better throughput', think Aurora.

Can I use ElastiCache as a primary database?

No. ElastiCache is an in-memory cache, not a durable database. Data stored in ElastiCache is volatile; if the cache node fails, data is lost (unless Redis persistence is enabled, but still not recommended as primary). Use ElastiCache to cache data from a durable database like RDS or DynamoDB to improve read performance. Exam tip: If a question says 'store user session data', ElastiCache is a good choice because session data is temporary and can be regenerated.

What is the difference between Amazon Redshift and Amazon RDS?

Redshift is a data warehouse for analytical queries (OLAP) on large datasets. It uses columnar storage and MPP for fast query performance. RDS is a transactional database (OLTP) for operational workloads. Redshift is not designed for high-frequency inserts/updates; RDS is. Exam tip: If the scenario involves 'business intelligence', 'reporting', or 'large-scale analytics', choose Redshift. If it involves 'customer orders' or 'inventory', choose RDS.

What is DynamoDB Accelerator (DAX)?

DAX is an in-memory cache for DynamoDB that delivers microsecond read latency. It sits between your application and DynamoDB. When you read an item, DAX checks its cache first; if found, it returns the cached result. If not, it queries DynamoDB and caches the result. DAX is useful for read-heavy workloads that require extremely low latency. Exam tip: If the question mentions 'microsecond latency' for DynamoDB, think DAX.

What is the difference between Amazon DocumentDB and DynamoDB?

Amazon DocumentDB is a MongoDB-compatible document database that supports MongoDB APIs and drivers. DynamoDB is a key-value and document database with its own API. DocumentDB is better if you are migrating a MongoDB application or need MongoDB-specific features like aggregation pipelines. DynamoDB is better for high-scale, low-latency key-value workloads. Exam tip: If the question mentions 'MongoDB compatibility', choose DocumentDB.
