This chapter covers Amazon DocumentDB, a fully managed, MongoDB-compatible document database service. For the DVA-C02 exam, DocumentDB appears in questions about migrating MongoDB workloads to AWS, choosing the right database for JSON-like documents, and understanding managed services vs. self-managed EC2 deployments. Approximately 5-8% of exam questions touch on DocumentDB, often comparing it with DynamoDB or self-managed MongoDB on EC2. You will learn its architecture, key features, migration considerations, and how it differs from native MongoDB.
Imagine you need to store and query thousands of books in a library. With a self-managed MongoDB library, you must build the shelves, organize the catalog, handle checkouts, and fix broken shelves yourself. Amazon DocumentDB is like a fully managed library service: you bring your books (JSON-like documents) and your catalog (indexes), and AWS provides the building, security, backups, and scaling. Specifically, DocumentDB’s storage is like a set of automated book stacks that replicate every book to three different rooms (Availability Zones). When you query for a book by title, the library’s catalog (in-memory cache) quickly points you to the right shelf. If the catalog doesn’t have the book, the system checks the actual stacks (storage). The library also automatically makes nightly copies of the entire collection (backups) and can instantly restore a book from a previous version (point-in-time recovery). As more readers come, the library can add more reading rooms (read replicas) without closing. This managed approach eliminates the need to hire librarians (DBAs) for routine tasks, exactly as DocumentDB reduces operational overhead for MongoDB workloads.
What is Amazon DocumentDB?
Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, fully managed document database service that supports MongoDB workloads. It is designed to be compatible with MongoDB 3.6, 4.0, and 5.0 APIs, meaning you can use existing MongoDB drivers, applications, and tools with minimal changes. DocumentDB is not a fork of MongoDB; it implements the MongoDB wire protocol and uses a distributed, fault-tolerant, self-healing storage system that replicates data across three Availability Zones (AZs) in a single AWS Region.
Why DocumentDB Exists
Before DocumentDB, customers who wanted a managed MongoDB experience on AWS had to either run MongoDB on EC2 (self-managed) or use MongoDB Atlas (a third-party service). Self-managing MongoDB on EC2 requires significant operational overhead: provisioning EC2 instances, setting up replication, managing backups, handling failovers, scaling storage, and patching the database software. DocumentDB eliminates this by providing a fully managed service with built-in high availability, automated backups, and seamless scaling. The exam tests whether you can identify scenarios where DocumentDB is the right choice: when you need MongoDB compatibility but want to offload database administration to AWS.
How It Works Internally
DocumentDB separates compute (database instances) from storage (a distributed storage volume). The storage layer is a 6-replica, 3-AZ distributed storage system. Data is written to 6 copies across 3 AZs, and reads can be served from any of the 6 copies. This provides durability (99.999999% durability) and high availability. The compute layer consists of one primary instance (for writes) and up to 15 read replicas. The primary writes to the storage layer, and replicas read from the same storage. This means replicas have very low replication lag (typically under 100 ms) because they don’t replay a write-ahead log; they simply read from the shared storage.
Key Components and Defaults
Instance classes: Memory-optimized (R5, R6g) and burstable (T3) – exam may ask which is suitable for production vs. dev/test.
Storage: Automatically scales from 10 GiB up to 128 TiB (older material quotes 64 TiB, the previous limit). No need to provision storage in advance; it grows as you write data. This is a key exam point: DocumentDB uses auto-scaling storage, unlike self-managed MongoDB where you must pre-allocate storage.
Backups: Automated backups are enabled by default with a 1-day retention (can be increased up to 35 days). Manual snapshots persist indefinitely.
Point-in-Time Recovery (PITR): Allows restoring to any second within the backup retention period, up to the last 5 minutes.
Encryption: At rest encryption is enabled using AWS KMS (default is AWS managed key, but you can use a customer managed key). In-transit encryption is supported using TLS.
Monitoring: Amazon CloudWatch metrics (CPU, memory, connections, read/write latency, etc.) and Amazon RDS Performance Insights (available for DocumentDB) – exam may ask about monitoring tools.
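The backup retention and point-in-time recovery figures above can be turned into a quick sanity check. The following is a minimal sketch, not an AWS API: the function names are hypothetical, and the 5-minute lag is taken from the description above. On a real cluster the authoritative bounds are the restorable times the service itself reports when you describe the cluster.

```python
from datetime import datetime, timedelta, timezone


def restorable_window(now, retention_days=1, latest_lag_minutes=5):
    """Approximate the (earliest, latest) point-in-time restore bounds.

    Assumes the window spans the backup retention period and ends
    roughly five minutes before the current time.
    """
    if not 1 <= retention_days <= 35:
        raise ValueError("retention_days must be between 1 and 35")
    earliest = now - timedelta(days=retention_days)
    latest = now - timedelta(minutes=latest_lag_minutes)
    return earliest, latest


def can_restore_to(target, now, retention_days=1):
    """Return True if target falls inside the approximate restore window."""
    earliest, latest = restorable_window(now, retention_days)
    return earliest <= target <= latest


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(can_restore_to(now - timedelta(hours=2), now, retention_days=7))
```

With the default 1-day retention, a target from two days ago falls outside the window, which is why longer retention matters when recovery from older mistakes is a requirement.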
Configuration and Verification Commands
To create a DocumentDB cluster using the AWS CLI:
aws docdb create-db-cluster \
    --db-cluster-identifier my-cluster \
    --engine docdb \
    --master-username myuser \
    --master-user-password mypassword \
    --vpc-security-group-ids sg-12345678
Then create an instance (the --engine option is required here as well):
aws docdb create-db-instance \
    --db-instance-identifier my-instance \
    --db-instance-class db.r5.large \
    --engine docdb \
    --db-cluster-identifier my-cluster
To verify the cluster status:
aws docdb describe-db-clusters --db-cluster-identifier my-cluster
Output includes endpoints, port (default 27017), cluster status, and more.
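If you script the verification step, the JSON printed by describe-db-clusters can be reduced to the few fields mentioned above. This is a minimal sketch: the helper name is hypothetical, and the sample document is a hand-written, heavily trimmed stand-in for real output (the endpoint values are placeholders).

```python
import json


def summarize_cluster(describe_output):
    """Pick status, endpoints, and port out of describe-db-clusters JSON."""
    data = json.loads(describe_output)
    cluster = data["DBClusters"][0]  # one cluster when an identifier is given
    return {
        "status": cluster.get("Status"),
        "writer_endpoint": cluster.get("Endpoint"),
        "reader_endpoint": cluster.get("ReaderEndpoint"),
        "port": cluster.get("Port"),
    }


# Hypothetical, trimmed sample; real output contains many more fields.
SAMPLE_OUTPUT = json.dumps({
    "DBClusters": [{
        "DBClusterIdentifier": "my-cluster",
        "Status": "available",
        "Endpoint": "my-cluster.cluster-xxxxxx.docdb.amazonaws.com",
        "ReaderEndpoint": "my-cluster.cluster-ro-xxxxxx.docdb.amazonaws.com",
        "Port": 27017,
    }]
})

if __name__ == "__main__":
    print(summarize_cluster(SAMPLE_OUTPUT))
```

A deployment script could poll this summary until the status reads "available" before creating instances or running migrations.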
Interaction with Related Technologies
AWS DMS (Database Migration Service): Can migrate existing MongoDB databases to DocumentDB with near-zero downtime using ongoing replication.
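AWS Lambda: Can connect to DocumentDB via MongoDB drivers to build serverless applications. Note: Lambda functions must be attached to a VPC that has a network path to the cluster, either the same VPC or a peered one. A NAT gateway does not provide that path; it only gives VPC-attached functions outbound internet access.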
AWS CloudFormation: Infrastructure as code for provisioning DocumentDB clusters.
Amazon CloudWatch: Logs, metrics, and alarms.
AWS KMS: For encryption key management.
Compatibility Limitations
DocumentDB is compatible with MongoDB 3.6, 4.0, and 5.0 APIs but does not support all MongoDB features. Support also varies by engine version, so check the compatibility table in the AWS documentation. Exam-relevant points:
Change streams: Supported, but disabled by default; you must enable them explicitly before use.
MapReduce: Not supported; use the aggregation pipeline instead.
TTL indexes: Supported.
Multi-document transactions: Supported on the 4.0 and 5.0 compatible engines, not on 3.6.
Retryable writes: Not supported; set retryWrites=false in the connection string.
Bulk write operations with ordered:true: Supported, but with some nuances.
Some aggregation operators: e.g., $lookup with uncorrelated subqueries may have limitations.
Sharding: Instance-based clusters do not support sharding; scaling is achieved by increasing instance size or using read replicas. Sharding is available only in DocumentDB elastic clusters, a separate cluster type. This is a major exam distinction: a standard DocumentDB cluster has a single writer and is not a sharded cluster like those offered by MongoDB Atlas.
Pricing Model
DocumentDB charges for instance hours (compute), I/O operations, backup storage (1x the cluster storage is free for automated backups), and data transfer. The I/O cost can be significant for write-heavy workloads; the exam may ask about cost optimization (e.g., using an instance class with more memory so that more reads are served from cache rather than storage, or batching writes).
Create a DocumentDB Cluster
You start by creating a cluster using the AWS Management Console, CLI, or CloudFormation. You specify the engine as 'docdb', a master username and password, VPC security groups, and optionally a KMS key for encryption. The cluster is created with a primary endpoint and a reader endpoint. Storage is provisioned automatically as a 6-replica, 3-AZ distributed volume. The cluster status changes from 'creating' to 'available' (typically within a few minutes). There is no need to specify storage size; it starts at 10 GiB and auto-scales.
Add DB Instances to the Cluster
After the cluster is available, you create DB instances (compute nodes) that connect to the shared storage. The first instance becomes the primary (writer). You can add up to 15 read replicas. Each instance has its own endpoint. The reader endpoint load-balances across all read replicas. Instances can be of different classes (e.g., primary as db.r5.large, replicas as db.r5.xlarge) – but typically they are the same for balanced performance. The exam may ask about the maximum number of read replicas (15).
Connect to the Cluster
You connect using standard MongoDB connection strings. The primary (cluster) endpoint is used for writes; the reader endpoint for read-only operations. Example connection string: 'mongodb://myuser:mypassword@my-cluster.cluster-xxxxxx.docdb.amazonaws.com:27017/?ssl=true&ssl_ca_certs=rds-combined-ca-bundle.pem&replicaSet=rs0'. TLS is enabled by default on new clusters; it can be turned off in the cluster parameter group, but that is not recommended. The 'replicaSet' parameter is 'rs0' (fixed). The exam may test that the port is 27017 and that TLS is on by default.
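Hand-assembled connection strings tend to break when the password contains characters such as '@' or '/'. Below is a minimal sketch of a builder that percent-encodes the credentials. The function name is hypothetical. It uses the newer tls/tlsCAFile option names from the MongoDB connection string format rather than the ssl/ssl_ca_certs spelling shown above, and the CA bundle filename is an assumption; use whichever bundle you downloaded from AWS. It also sets retryWrites=false, because DocumentDB does not support retryable writes.

```python
from urllib.parse import quote_plus, urlencode


def build_docdb_uri(user, password, host, port=27017,
                    ca_file="global-bundle.pem",
                    read_preference="secondaryPreferred"):
    """Build a MongoDB-style URI for a DocumentDB cluster endpoint."""
    options = {
        "tls": "true",
        "tlsCAFile": ca_file,          # assumed filename of the CA bundle
        "replicaSet": "rs0",           # fixed value for DocumentDB
        "readPreference": read_preference,
        "retryWrites": "false",        # retryable writes are not supported
    }
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"mongodb://{credentials}@{host}:{port}/?{urlencode(options)}"


if __name__ == "__main__":
    print(build_docdb_uri(
        "myuser", "p@ss/word",
        "my-cluster.cluster-xxxxxx.docdb.amazonaws.com",
    ))
```

Connecting in replica set mode with a secondary-preferred read preference lets the driver route reads to replicas while writes still go to the primary.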
Perform CRUD Operations
You can use standard MongoDB CRUD commands: insertOne, insertMany, find, updateOne, updateMany, deleteOne, deleteMany. DocumentDB supports indexes, including compound, sparse, unique, TTL, and text indexes (availability depends on the engine version). DocumentDB does not run WiredTiger; it uses its own storage engine on top of the distributed storage layer. Write operations are acknowledged once they are durably recorded in the storage layer. Reads from the primary are read-after-write consistent; reads from replicas are eventually consistent, so send a read to the primary when it must reflect the latest write.
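In Python the same CRUD operations use snake_case method names. The sketch below assumes the `collection` argument is a pymongo Collection obtained from a connected client; the function name and the sample document are hypothetical. No connection is opened here, so the function can be exercised against any object that offers the same four methods.

```python
def crud_round_trip(collection):
    """Insert, update, read, and delete one document.

    `collection` is assumed to be a pymongo Collection
    (for example client["library"]["books"]).
    """
    # Create: insert_one returns a result carrying the generated _id.
    inserted = collection.insert_one({"title": "Dune", "copies": 3})
    key = {"_id": inserted.inserted_id}

    # Update: increment a field atomically within the single document.
    collection.update_one(key, {"$inc": {"copies": 1}})

    # Read: fetch the document back by its _id.
    document = collection.find_one(key)

    # Delete: remove the document again so the sketch leaves no residue.
    collection.delete_one(key)
    return document
```

Run against a real cluster, the returned document should show copies equal to 4, because the read goes through the same connection after the update.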
Monitor and Scale
You monitor the cluster using CloudWatch metrics (e.g., DatabaseConnections, ReadIOPS, WriteIOPS, CPUUtilization, FreeableMemory). To scale vertically, modify the instance class (requires a reboot). To scale horizontally, add read replicas. Storage scales automatically – no action needed. Automated backups occur daily during the preferred backup window. You can restore a cluster to any point within the retention period using the AWS CLI or console. If you delete the cluster, you can retain a final snapshot.
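Monitoring is more useful when a metric is tied to an alarm. The following sketch builds the parameters for a CPU alarm on one instance and passes them to CloudWatch through boto3. The function names, alarm name pattern, threshold, and evaluation settings are illustrative assumptions. boto3 is imported lazily so that the parameter builder can be inspected without AWS credentials.

```python
def cpu_alarm_params(instance_id, threshold_percent=80.0):
    """Return keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"{instance_id}-high-cpu",   # hypothetical naming scheme
        "Namespace": "AWS/DocDB",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "DBInstanceIdentifier", "Value": instance_id},
        ],
        "Statistic": "Average",
        "Period": 300,                # seconds per datapoint
        "EvaluationPeriods": 3,       # alarm after three breaching periods
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_cpu_alarm(instance_id, threshold_percent=80.0):
    """Create the alarm; requires boto3 and AWS credentials."""
    import boto3  # imported here so the builder above has no dependencies

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**cpu_alarm_params(instance_id, threshold_percent))
```

An alarm like this does not scale anything by itself; it would typically notify an operator or trigger automation that changes the instance class or adds a replica.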
Enterprise Scenario 1: Migrating a MongoDB Application to AWS Managed Services
A financial services company runs a customer profile application on MongoDB 3.6 on EC2. They want to reduce operational overhead and improve availability. They choose DocumentDB because it is compatible with the MongoDB 3.6 API and provides automated backups, multi-AZ replication, and patching. The migration uses AWS DMS: a full load plus ongoing replication from the source MongoDB to DocumentDB. During cutover, they update the application connection string to point to the DocumentDB cluster endpoint. The application works with minimal changes because the queries, indexes, and aggregation pipelines are compatible. Post-migration, they report lower latency, which they attribute to DocumentDB's storage layer. A key lesson: DMS copies documents but not secondary indexes, so they had to recreate their indexes (including TTL indexes) on the DocumentDB cluster before cutover. They also set up CloudWatch alarms for high CPU and connections to trigger scaling actions.
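A DMS task for this kind of migration needs a table-mappings document that selects what to copy, plus a migration type covering both the full load and ongoing replication. The sketch below generates such a mapping as JSON. The function names and the "profiles" database name are hypothetical; for a MongoDB source, the schema-name and table-name locators are assumed to correspond to the database and collection names.

```python
import json

# DMS migration type for a full load followed by ongoing replication (CDC).
MIGRATION_TYPE = "full-load-and-cdc"


def selection_rule(rule_id, database="%", collection="%"):
    """Build one DMS selection rule; '%' acts as a wildcard."""
    return {
        "rule-type": "selection",
        "rule-id": str(rule_id),
        "rule-name": f"include-{rule_id}",   # hypothetical naming scheme
        "object-locator": {
            "schema-name": database,
            "table-name": collection,
        },
        "rule-action": "include",
    }


def table_mappings(*rules):
    """Serialize rules into the JSON document a DMS task expects."""
    return json.dumps({"rules": list(rules)}, indent=2)


if __name__ == "__main__":
    # Migrate every collection in a hypothetical "profiles" database.
    print(table_mappings(selection_rule(1, database="profiles")))
```

Narrow selection rules are also a simple way to rehearse the migration on a single collection before committing to the full data set.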
Enterprise Scenario 2: Building a Serverless Content Management System
A media company builds a serverless CMS using AWS Lambda and DocumentDB. They store articles as JSON documents with nested metadata. They choose DocumentDB over DynamoDB because they need rich querying (e.g., aggregation pipelines for analytics) and MongoDB’s document model. They deploy the DocumentDB cluster in a VPC with private subnets. The Lambda functions are attached to the same VPC so they can reach the cluster. They use connection pooling (e.g., Mongoose with a connection pool size of 10) to avoid exhausting database connections. They set up read replicas to handle read-heavy traffic (article views). For write-heavy operations (content ingestion), they size the primary instance for the write load. They learned that DocumentDB’s I/O cost can be high for frequent updates; they optimized by batching writes. They also use Performance Insights to identify slow queries and add appropriate indexes.
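The write batching mentioned in this scenario could look like the sketch below. It groups documents into fixed-size chunks and sends each chunk with a single insert_many call. The function names and the batch size of 500 are assumptions, and `collection` is assumed to be a pymongo Collection; how much I/O this saves depends on the workload.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def ingest(collection, documents, batch_size=500):
    """Insert documents in batches and return how many were sent.

    ordered=False lets the server continue past an individual failed
    document instead of stopping at the first error.
    """
    total = 0
    for batch in batched(documents, batch_size):
        collection.insert_many(batch, ordered=False)
        total += len(batch)
    return total
```

Because `batched` consumes its input lazily, the ingest function also works with generators that stream articles from a feed without loading them all into memory.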
Common Pitfalls in Production
Connection limits: Each instance type has a maximum number of connections. Exceeding it causes connection failures. For example, db.r5.large supports up to 1,600 connections. Applications must use connection pooling.
Storage scaling: Although storage auto-scales, it cannot shrink. You can only delete and recreate the cluster to reduce storage.
Failover behavior: During a primary failure, DocumentDB automatically fails over to a read replica (promotes it). The failover typically completes in under 30 seconds. Applications should use the reader endpoint for reads and handle connection retries for writes.
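Unsupported features: Attempting to use unsupported MongoDB features (e.g., MapReduce or retryable writes) will result in errors. Other features, such as change streams, TTL indexes, and multi-document transactions, are supported but depend on the engine version or on explicit configuration. The exam often tests these limitations.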
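Two of the pitfalls above, connection limits and failover handling, lend themselves to small helpers. The sketch below is illustrative only: the function names are hypothetical, the connection limit is passed in rather than looked up, and ConnectionError stands in for whatever reconnect-style exception your driver raises (with pymongo that would likely be pymongo.errors.AutoReconnect).

```python
import time


def max_pool_size(instance_connection_limit, client_count, reserved=0):
    """Largest per-client pool that keeps the total under the limit.

    `reserved` holds back connections for admin tools and monitoring.
    """
    usable = instance_connection_limit - reserved
    if client_count < 1 or usable < client_count:
        raise ValueError("not enough connections for one per client")
    return usable // client_count


def with_retries(operation, retry_on=(ConnectionError,), attempts=5,
                 base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on failure.

    Only retry operations that are safe to repeat; re-running a
    non-idempotent write can apply it twice.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
```

For example, with the 1,600-connection figure quoted above, 100 application processes, and 100 connections held in reserve, each process could use a pool of at most 15. With the default settings the retry helper waits 0.5, 1, 2, and 4 seconds between attempts, which covers a failover of the length described above.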
The DVA-C02 exam tests Amazon DocumentDB in the context of selecting appropriate AWS database services for application development (Objective 1.3: Determine the appropriate AWS data store for a given workload). Specific areas:
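Compatibility and Limitations: You must know which MongoDB features are unsupported or limited. MapReduce is not supported, and standard (instance-based) clusters cannot be sharded. Change streams and TTL indexes are supported, and multi-document transactions are supported on the 4.0 and 5.0 compatible engines. A common wrong answer is selecting a standard DocumentDB cluster when the scenario requires sharding (e.g., petabyte-scale data). The correct answer would be DocumentDB elastic clusters, MongoDB Atlas, or self-managed MongoDB on EC2 with sharding, depending on the options offered.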
High Availability and Replication: DocumentDB replicates data across 3 AZs with 6 copies. Read replicas can be up to 15. The exam may ask about failover behavior (automatic, <30 seconds) and how to achieve read scaling (add read replicas). A trap: some candidates think DocumentDB uses synchronous replication like Amazon RDS Multi-AZ; actually, storage is shared, so replicas have very low lag.
Migration with AWS DMS: The exam may present a scenario of migrating MongoDB to DocumentDB. The correct approach is to use AWS DMS with ongoing replication (CDC). A wrong answer might suggest using AWS Database Migration Service with SCT (Schema Conversion Tool) – but SCT is for relational databases. DocumentDB is schema-less, so SCT is not needed.
Pricing and Scaling: DocumentDB charges for I/O operations. A scenario may ask how to reduce I/O costs. A larger instance class reduces read I/O, because more of the working set fits in memory and fewer reads go to storage; it does not by itself reduce write I/O, which is better addressed by batching writes and dropping unused indexes. A wrong answer might suggest adding read replicas (which helps read scaling but not write I/O cost).
Connection Management: The exam may ask about best practices for connecting from Lambda. The correct answer: place Lambda in the same VPC as DocumentDB and use connection pooling. A wrong answer: using a NAT gateway or VPC peering (unnecessary if in same VPC).
Specific Values: Default port 27017, maximum read replicas 15, storage auto-scales from 10 GiB to 128 TiB, backup retention up to 35 days, point-in-time recovery to any second within retention.
Edge cases: Multi-document ACID transactions are available on the 4.0 and 5.0 compatible engines but not on 3.6. If a scenario requires transactions and the cluster is pinned to 3.6 compatibility, DocumentDB as configured is not suitable. Geospatial queries with 2dsphere indexes are supported, but this is less tested.
DocumentDB is compatible with MongoDB 3.6, 4.0, and 5.0 APIs but implements only a subset of them: MapReduce is not supported, and instance-based clusters cannot be sharded. Change streams and TTL indexes are supported.
Storage auto-scales from 10 GiB to 128 TiB and is replicated across 3 AZs with 6 copies.
Maximum of 15 read replicas per cluster; use reader endpoint for load balancing.
Automated backups are enabled by default with 1-day retention (up to 35 days); point-in-time recovery allows restoring to any second within retention.
DocumentDB supports multi-document transactions on the 4.0 and 5.0 compatible engines, but not on 3.6.
Connection from Lambda requires the function to be in the same VPC as the DocumentDB cluster.
AWS DMS can migrate MongoDB to DocumentDB with full load + ongoing replication.
I/O cost is a significant factor; more instance memory reduces read I/O through caching, and batching reduces write I/O.
Failover is automatic and typically completes in under 30 seconds.
Default port is 27017; TLS is enabled by default.
Instance-based DocumentDB clusters are not sharded; scaling is vertical (instance size) or horizontal (read replicas). Sharding requires elastic clusters.
Performance Insights is available for monitoring query performance.
These come up on the exam all the time. Here's how to tell them apart.
Amazon DocumentDB
Fully managed – automated backups, patching, failover.
Auto-scaling storage from 10 GiB to 128 TiB.
Replicates data across 3 AZs with 6 copies.
Max 15 read replicas.
I/O charges apply; cost can be higher for write-heavy workloads.
Self-Managed MongoDB on EC2
Full control over configuration and MongoDB version.
Must provision and manage EBS volumes; storage scaling requires manual intervention.
Replication configurable (e.g., replica sets) but managed by you.
Can set up any number of replicas (within limits of EC2).
EC2 and EBS costs; no I/O charges, but you pay for provisioned IOPS.
Amazon DocumentDB
Document database – supports nested JSON, arrays, and complex queries.
MongoDB-compatible – use existing MongoDB drivers and tools.
Supports aggregation pipeline, secondary indexes, and text search.
Not serverless; you provision instances.
Ideal for migrating existing MongoDB applications.
Amazon DynamoDB
Key-value and document database – simple key-based access.
Proprietary API – requires DynamoDB SDK or PartiQL.
Limited query capabilities – primary key and secondary indexes only.
Serverless – auto-scales throughput and storage.
Ideal for new applications requiring low-latency at any scale.
Mistake
DocumentDB is a fully managed version of MongoDB Atlas.
Correct
DocumentDB is an AWS native service that implements the MongoDB wire protocol. It is not a hosted MongoDB Atlas; it is a separate implementation with different architecture and some feature differences. AWS does not license MongoDB code; it built its own document database compatible with MongoDB drivers.
Mistake
DocumentDB supports all MongoDB features.
Correct
DocumentDB supports a subset of the MongoDB 3.6, 4.0, and 5.0 APIs. MapReduce is not supported, for example, and instance-based clusters cannot be sharded. Change streams and TTL indexes are supported. Always check the compatibility table in the AWS documentation.
Mistake
You can scale DocumentDB storage down by modifying the cluster.
Correct
Storage auto-scales up but never down. The only way to reduce storage is to restore a snapshot to a new cluster with less data. This is a common exam trap.
Mistake
DocumentDB read replicas can accept writes.
Correct
Only the primary accepts writes. A replica starts accepting writes only after it is promoted to primary, which happens automatically when the primary fails or when you initiate a failover yourself (for example with aws docdb failover-db-cluster). Under normal operation the primary is the only writer.
Mistake
DocumentDB requires you to provision storage in advance.
Correct
DocumentDB uses auto-scaling storage that starts at 10 GiB and grows as needed up to 128 TiB. No upfront provisioning is required. This is a key advantage over self-managed MongoDB.
Yes, DocumentDB is compatible with MongoDB drivers that target the 3.6, 4.0, and 5.0 APIs. You can use the same connection string format with TLS enabled. However, not every MongoDB feature is available: MapReduce is not supported, and retryable writes must be disabled (retryWrites=false) in the connection string. Always test your application against DocumentDB to ensure compatibility.
Use AWS Database Migration Service (DMS). Create a DMS replication instance, set the source as your MongoDB database and the target as your DocumentDB cluster. Perform a full load first, then enable ongoing replication (change data capture) to keep the target in sync. During cutover, stop writes to the source and update your application to connect to DocumentDB. This approach typically results in seconds of downtime.
Not in standard (instance-based) clusters. To scale reads horizontally, you add read replicas (up to 15). For write scaling, you must vertically scale the primary instance (choose a larger instance class). If you need sharding (e.g., for very large datasets), consider DocumentDB elastic clusters, self-managed MongoDB on EC2, or MongoDB Atlas.
DocumentDB storage auto-scales up to 128 TiB. It starts at 10 GiB and grows automatically as you insert data. You cannot provision storage manually. If you need more than that, you must partition data across multiple clusters (sharding at the application level) or use elastic clusters.
Place the Lambda function in the same VPC as the DocumentDB cluster. Ensure the security group of the Lambda function allows outbound traffic to the DocumentDB security group on port 27017. Use a MongoDB driver in your Lambda code (e.g., pymongo for Python). Implement connection pooling by creating a global client outside the handler and reusing it across invocations. Avoid opening a new connection for each invocation.
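The "global client" advice can be sketched as follows. The environment variable name DOCDB_URI, the "cms" database, the "articles" collection, and the pool size of 10 are all assumptions made for illustration. pymongo is imported lazily inside the default factory, and the factory parameter exists only so the caching behaviour can be exercised without a database.

```python
import os

_client = None  # module-level, so it survives across warm invocations


def _default_factory(uri):
    """Create a real client; requires pymongo in the deployment package."""
    from pymongo import MongoClient

    return MongoClient(uri, maxPoolSize=10)


def get_client(factory=_default_factory):
    """Return the cached client, creating it on first use."""
    global _client
    if _client is None:
        # DOCDB_URI is a hypothetical environment variable name.
        _client = factory(os.environ["DOCDB_URI"])
    return _client


def handler(event, context, factory=_default_factory):
    """Lambda entry point: look up one article by its slug."""
    client = get_client(factory)
    article = client["cms"]["articles"].find_one({"slug": event["slug"]})
    return {"found": article is not None}
```

Lambda calls the handler with only the event and context, so the extra factory argument falls back to its default in production. Each concurrent execution environment still holds its own client, which is why the per-client pool size matters when concurrency is high.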
Yes, on the 4.0 and 5.0 compatible engines, which support multi-document ACID transactions. Clusters running with 3.6 compatibility only offer single-document atomic operations. If your application requires transactions and you cannot move off 3.6 compatibility, consider upgrading the cluster, using DynamoDB Transactions, or running self-managed MongoDB (which supports multi-document transactions since version 4.0).
DocumentDB supports memory-optimized instances (db.r5, db.r6g) and burstable instances (db.t3). For production workloads, use db.r5 or db.r6g. Burstable instances (db.t3) are suitable for development, testing, or low-traffic applications. The exam may ask which instance type is cost-effective for a dev/test environment.