This chapter covers AWS Cloud Map service discovery for Amazon ECS, a critical mechanism for dynamic microservice communication. On the DVA-C02 exam, service discovery appears in roughly 5-8% of questions, often integrated with ECS, EKS, or App Mesh. Understanding Cloud Map's namespace types, health checking, and integration with Route 53 is essential for architecting scalable, resilient containerized applications. We'll dive into the internals, configuration, and exam traps.
Imagine a large office building where each department has its own phone number, but those numbers change every time a new employee joins or leaves. Instead of memorizing numbers, each desk has a dynamic phonebook that updates automatically. When Department A wants to call Department B, it looks up "Billing Team" in the phonebook, which returns the current extension for any available billing employee. The phonebook also tracks health: if a billing employee is on lunch (unhealthy), their extension is left off the list until they return. This is exactly how AWS Cloud Map works. Cloud Map maintains a registry of service instances (like employees) with their IP addresses and ports (like phone extensions). Services query Cloud Map by a logical name (e.g., "backend-service") and get back a list of healthy, available instances. Cloud Map also supports health checks (Route 53 health checks or custom health checks) so that unhealthy instances are excluded from discovery results. In ECS, the ECS service registers each task with Cloud Map when it launches and deregisters it when it stops. Clients, or a service mesh such as App Mesh, use this information to route traffic only to healthy instances, enabling dynamic, resilient communication without hardcoded endpoints.
What is AWS Cloud Map and Why Does It Exist?
AWS Cloud Map is a fully managed service discovery service that maps logical service names to the physical resources (IP addresses, ports, etc.) that implement those services. In a dynamic environment like Amazon ECS, tasks are ephemeral: they come and go, and their IP addresses are not known in advance. Traditional DNS-based service discovery with static records breaks down because the records are not updated automatically when tasks scale up, scale down, or fail. Cloud Map solves this by providing a real-time, health-aware registry.
Cloud Map supports three namespace types in two families: DNS-based (public DNS and private DNS) and API-based (HTTP).
DNS-based namespaces use Route 53 to create DNS records that resolve to service instances. They support health checks and the MULTIVALUE and WEIGHTED routing policies (there is no latency-based routing), and the TTL is set per DNS record. Their instances can also be queried through the DiscoverInstances API.
API-based (HTTP) namespaces are discoverable only through the DiscoverInstances API. Because no DNS caching is involved, results reflect registrations and health changes as soon as Cloud Map records them, and clients can filter on custom attributes (key-value pairs).
For ECS, the typical pattern is a private DNS namespace: you attach a Cloud Map service to an ECS service through the serviceRegistries parameter, and ECS then registers and deregisters the service's tasks automatically.
How Cloud Map Works Internally
Cloud Map operates as a centralized registry. Each service in Cloud Map corresponds to a logical service name (e.g., "orders-service"). Each instance of that service (e.g., an ECS task) is registered with a unique identifier, IP address, port, and optional attributes (e.g., version, region).
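When a task starts, Amazon ECS (or your own code, through the Cloud Map API) calls RegisterInstance. For DNS namespaces this creates resource record sets in the namespace's Route 53 hosted zone; for HTTP namespaces it creates an instance entry that is visible only through the API. If the service has no health check, every registered instance is treated as healthy. With a health check, the instance does not wait for a first pass: a new Route 53 health check is treated as healthy until it has enough data, and a custom health check starts as HEALTHY unless the registration sets the AWS_INIT_HEALTH_STATUS attribute to UNHEALTHY.
When a client service needs to discover the orders-service, it performs a DNS lookup (for DNS namespaces) or a DiscoverInstances API call (required for HTTP namespaces, optional for DNS ones). The response contains the current healthy instances. The client then picks one: with DNS, typically the first address in the answer, since Route 53 already selects up to eight healthy records at random; with the API, using whatever selection logic the client implements.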
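To make the register/discover flow concrete, here is a minimal in-memory model in Python. It is a hypothetical toy (the `ToyRegistry` and `Instance` names are made up for illustration), not the Cloud Map API; it only shows that discovery returns the registered instances that are currently healthy, and that deregistration removes an instance entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One registered endpoint, e.g. an ECS task (hypothetical toy type)."""
    instance_id: str
    ipv4: str
    port: int
    healthy: bool = True
    attributes: dict = field(default_factory=dict)


class ToyRegistry:
    """In-memory stand-in for a registry: namespace -> service -> instances."""

    def __init__(self):
        # Maps (namespace, service) to {instance_id: Instance}.
        self._services = {}

    def register(self, namespace, service, instance):
        self._services.setdefault((namespace, service), {})[instance.instance_id] = instance

    def deregister(self, namespace, service, instance_id):
        # Removing an unknown ID is a no-op in this toy model.
        self._services.get((namespace, service), {}).pop(instance_id, None)

    def discover(self, namespace, service):
        """Return (ip, port) pairs for the healthy instances of a service."""
        instances = self._services.get((namespace, service), {}).values()
        return [(i.ipv4, i.port) for i in instances if i.healthy]


registry = ToyRegistry()
registry.register("my-namespace", "orders-service", Instance("task-001", "10.0.1.5", 8080))
registry.register("my-namespace", "orders-service", Instance("task-002", "10.0.1.6", 8080, healthy=False))
print(registry.discover("my-namespace", "orders-service"))  # [('10.0.1.5', 8080)]
registry.deregister("my-namespace", "orders-service", "task-001")
print(registry.discover("my-namespace", "orders-service"))  # []
```

The unhealthy task-002 stays registered but is never returned, which mirrors the difference between "unhealthy" and "deregistered".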
Key Components, Values, Defaults, and Timers
Namespace: A logical container for services. Types: DNS (public or private) and HTTP (API-based).
Service: A logical group of instances. Each service has a name, namespace, and optional health check configuration.
Instance: A single endpoint (e.g., an ECS task) with a unique ID, IP address, port, and attributes.
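Health Check: Either a Route 53 health check (HealthCheckConfig, available for public DNS and HTTP namespaces, because Route 53 health checkers must reach the endpoint from the internet) or a custom health check (HealthCheckCustomConfig, usable in any namespace, where something else reports status through UpdateInstanceCustomHealthStatus). Route 53 health checks probe every 30 seconds, and the failure threshold defaults to 3 consecutive results (configurable from 1 to 10).
TTL: For DNS namespaces, the TTL of each DNS record. It is required when you define the records; 60 seconds is a common choice and the value used in this chapter. Values as low as 0 are accepted, but very low TTLs increase DNS query load.
ECS service registry: Service discovery is configured on the ECS service, not in the task definition, through the serviceRegistries parameter. It takes the Cloud Map service ARN (registryArn) and, depending on network mode and record type, containerName and containerPort or port.
Registration: ECS automatically registers each task with Cloud Map when it starts (if the ECS service has a service registry configured) and deregisters it when the task stops.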
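The service-level settings above (routing policy, record types, TTL) end up in the DnsConfig document passed to `aws servicediscovery create-service --dns-config`. The Python sketch below assembles that JSON. `build_dns_config` is a hypothetical helper, and its checks cover only the constraints stated in this chapter (MULTIVALUE or WEIGHTED, the four record types, a non-negative TTL), not every rule the real API enforces.

```python
import json

# Routing policies Cloud Map accepts for DNS namespaces.
ROUTING_POLICIES = {"MULTIVALUE", "WEIGHTED"}
RECORD_TYPES = {"A", "AAAA", "SRV", "CNAME"}


def build_dns_config(routing_policy, records):
    """Build the JSON for `create-service --dns-config`.

    `records` is a list of (record_type, ttl_seconds) pairs. This helper is
    hypothetical; it only checks a few obvious constraints.
    """
    if routing_policy not in ROUTING_POLICIES:
        raise ValueError(f"unsupported routing policy: {routing_policy}")
    dns_records = []
    for record_type, ttl in records:
        if record_type not in RECORD_TYPES:
            raise ValueError(f"unsupported record type: {record_type}")
        if ttl < 0:
            raise ValueError("TTL must be non-negative")
        dns_records.append({"Type": record_type, "TTL": ttl})
    config = {"RoutingPolicy": routing_policy, "DnsRecords": dns_records}
    # Compact separators give the same string as the CLI example in this chapter.
    return json.dumps(config, separators=(",", ":"))


print(build_dns_config("MULTIVALUE", [("A", 60)]))
# {"RoutingPolicy":"MULTIVALUE","DnsRecords":[{"Type":"A","TTL":60}]}
```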
Configuration and Verification Commands
To create a private DNS namespace (the call is asynchronous; look up the new namespace ID with aws servicediscovery list-namespaces):
aws servicediscovery create-private-dns-namespace --name my-namespace --vpc vpc-12345678
To create a service:
aws servicediscovery create-service --name orders-service --namespace-id ns-abc123 --dns-config '{"RoutingPolicy":"MULTIVALUE","DnsRecords":[{"Type":"A","TTL":60}]}'
To register an instance:
aws servicediscovery register-instance --service-id srv-xyz789 --instance-id task-001 --attributes '{"AWS_INSTANCE_IPV4":"10.0.1.5","AWS_INSTANCE_PORT":"8080"}'
To discover instances via API:
aws servicediscovery discover-instances --namespace-name my-namespace --service-name orders-service
Interaction with Related Technologies
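Amazon ECS: ECS integrates natively with Cloud Map. When you create an ECS service, you can specify serviceRegistries (the classic service discovery covered in this chapter) or serviceConnectConfiguration (ECS Service Connect); both rely on Cloud Map namespaces, and both are properties of the ECS service rather than the task definition. With serviceRegistries, ECS calls RegisterInstance when a task starts and DeregisterInstance when it stops.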
AWS App Mesh: App Mesh uses Cloud Map for service discovery when you configure virtual nodes with Cloud Map service discovery type. App Mesh queries Cloud Map to get the list of healthy endpoints for each virtual node.
Route 53: For DNS namespaces, Cloud Map creates a Route 53 hosted zone (private for a private DNS namespace, public for a public one) and manages its record sets. For public DNS namespaces, Route 53 health checks can monitor each instance's health endpoint.
Elastic Load Balancing: Cloud Map is often used together with ALB/NLB for traffic routing, but Cloud Map itself is not a load balancer; it provides the list of targets.
Important Exam Points
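DNS vs API discovery: DNS needs no SDK and resolver caching keeps lookups cheap, but clients can see stale answers until the TTL expires; API discovery (DiscoverInstances) reflects registry changes immediately and supports attribute filtering. The exam may ask which to use when clients must react quickly to changes.
Health checking: If health checks are enabled, only healthy instances are returned by default. For Route 53 health checks, the default failure threshold is 3 (status changes after 3 consecutive failures or successes).
Multi-value answer: With the MULTIVALUE routing policy, Route 53 returns up to 8 healthy records chosen at random. With WEIGHTED, it returns the record for one randomly selected instance, because Cloud Map gives all instances the same weight.
Fargate: Service discovery works the same way for the Fargate and EC2 launch types. Fargate tasks always use awsvpc network mode, so they can register A or SRV records.
Cross-account discovery: Cloud Map does not support cross-account discovery natively; workarounds involve sharing DNS resolution through Route 53 private hosted zones or calling the Cloud Map API with cross-account IAM roles.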
Limits: Default limit of 50 namespaces per account, 50 services per namespace, 1000 instances per service (soft limits).
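The MULTIVALUE/WEIGHTED distinction is easy to mix up, so here is a small Python simulation of what each policy puts into a DNS answer. It models only the selection rule described above (random choice among healthy instances, at most 8 for MULTIVALUE, exactly one for WEIGHTED); the function names are illustrative, and real answers also pass through resolver caches.

```python
import random

# Route 53 includes at most 8 records in a multivalue answer.
MAX_MULTIVALUE_RECORDS = 8


def multivalue_answer(healthy_ips, rng=random):
    """Up to 8 healthy records, chosen at random (models MULTIVALUE)."""
    return rng.sample(healthy_ips, min(MAX_MULTIVALUE_RECORDS, len(healthy_ips)))


def weighted_answer(healthy_ips, rng=random):
    """One randomly selected instance (models Cloud Map's equal-weight WEIGHTED)."""
    return [rng.choice(healthy_ips)] if healthy_ips else []


fleet = [f"10.0.1.{n}" for n in range(1, 101)]  # 100 healthy tasks
rng = random.Random(7)
print(len(multivalue_answer(fleet, rng)))  # 8
print(len(weighted_answer(fleet, rng)))    # 1
```

Even with 100 healthy tasks, a single MULTIVALUE answer never lists more than 8; load spreads across the fleet because different queries get different random subsets.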
Step-by-Step: How an ECS Task Registers with Cloud Map
The ECS service definition specifies serviceRegistries with the ARN of the Cloud Map service.
ECS starts the task, either on an EC2 container instance or on Fargate.
With awsvpc network mode (always used on Fargate), the task gets its own elastic network interface (ENI) with a private IP address from the subnet.
ECS calls RegisterInstance with the task's private IP, the port (from the container port mapping), and optional attributes.
Cloud Map creates DNS records (for DNS namespaces) or an instance entry (for HTTP namespaces).
If a health check is configured, status updates begin: in a private DNS namespace, ECS uses a custom health check fed by the task's container health check; in a public namespace, a Route 53 health check can probe the endpoint every 30 seconds.
The instance counts as healthy from the start by default; an unhealthy report (for Route 53 checks, 3 consecutive failures by default) excludes it from results.
Client services query Cloud Map and get the list of healthy instances.
When the task stops, ECS calls DeregisterInstance, removing the instance and its DNS records.
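Conceptually, ECS keeps the Cloud Map service in step with the set of running tasks. The sketch below models that bookkeeping as a diff between running tasks and registered instances; `reconcile` is a hypothetical illustration of the idea, not a description of how ECS is implemented internally.

```python
def reconcile(running, registered):
    """Compute the registry changes needed to match the running tasks.

    Both arguments map a task ID to its private IPv4 address.
    Returns (to_register, to_deregister).
    """
    to_register = {tid: ip for tid, ip in running.items() if tid not in registered}
    to_deregister = sorted(tid for tid in registered if tid not in running)
    return to_register, to_deregister


running = {"task-001": "10.0.1.5", "task-003": "10.0.1.7"}     # after scale events
registered = {"task-001": "10.0.1.5", "task-002": "10.0.1.6"}  # current registry
print(reconcile(running, registered))
# ({'task-003': '10.0.1.7'}, ['task-002'])
```

Here task-003 would get a RegisterInstance call and task-002 a DeregisterInstance call, while task-001 is left untouched.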
Common Pitfalls
DNS caching: Clients may cache DNS records beyond TTL, causing them to try unhealthy instances. Use low TTL or API-based discovery.
Health check misconfiguration: If health check endpoint returns 200 but the service is not ready, traffic may be routed to unready instances.
Namespace type mismatch: Records in a public DNS namespace are publicly resolvable, so using one for internal services exposes their private IP addresses to anyone who queries it. Use a private DNS namespace for internal services.
IAM permissions: ECS registers and deregisters tasks through its service-linked role (AWSServiceRoleForECS), not the task execution role. Code that registers instances itself (for example, on-premises hosts) needs servicediscovery:RegisterInstance and servicediscovery:DeregisterInstance.
Configure the ECS Service with Service Discovery
Service discovery is configured on the ECS service, not in the task definition. In the CreateService request, the serviceRegistries parameter takes the registryArn of the Cloud Map service. For SRV records you also provide containerName and containerPort (or port) so ECS knows which port to register. Amazon ECS uses this configuration to register each task automatically when it starts.
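In the ECS API, the service discovery settings live in the serviceRegistries parameter of CreateService. Here is a sketch of that fragment built in Python. The ARN, container name, and port are placeholder values; the field names (registryArn, containerName, containerPort, port) follow the ECS ServiceRegistry type, while `service_registries` itself is a hypothetical helper.

```python
import json


def service_registries(registry_arn, container_name=None, container_port=None, port=None):
    """Build the `serviceRegistries` list for an ECS CreateService request.

    Optional fields are omitted when not given. containerName/containerPort
    are typically needed for SRV records; `port` is an alternative for awsvpc.
    """
    entry = {"registryArn": registry_arn}
    if container_name is not None:
        entry["containerName"] = container_name
    if container_port is not None:
        entry["containerPort"] = container_port
    if port is not None:
        entry["port"] = port
    return [entry]


# Placeholder ARN for a Cloud Map service (account ID and service ID are made up).
arn = "arn:aws:servicediscovery:us-east-1:123456789012:service/srv-xyz789"
print(json.dumps({"serviceRegistries": service_registries(arn, "orders", 8080)}, indent=2))
```

For an awsvpc task that registers only A records, the registryArn alone is usually enough; the extra fields matter when ECS must publish a port in SRV records.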
Task Launch and IP Assignment
ECS launches the task. With `awsvpc` network mode, which is required for A records (`bridge` and `host` modes support only SRV records), each task gets its own elastic network interface (ENI) with a private IP address from the VPC subnet. ECS records this IP along with the container port from the task definition's port mappings; this IP is the endpoint that will be registered in Cloud Map.
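ECS registers A records only for awsvpc tasks; bridge and host modes support SRV records only. That rule is easy to encode as a lookup, for example in a pre-deployment check. `allowed_record_types` is a hypothetical helper written for this sketch, not part of any AWS SDK.

```python
def allowed_record_types(network_mode):
    """DNS record types ECS service discovery can register for a network mode."""
    if network_mode == "awsvpc":
        return {"A", "SRV"}
    if network_mode in ("bridge", "host"):
        # Tasks share the host's IP, so clients need the port from an SRV record.
        return {"SRV"}
    raise ValueError(f"unknown network mode: {network_mode}")


for mode in ("awsvpc", "bridge", "host"):
    print(mode, sorted(allowed_record_types(mode)))
# awsvpc ['A', 'SRV']
# bridge ['SRV']
# host ['SRV']
```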
ECS Calls RegisterInstance
Amazon ECS calls the Cloud Map `RegisterInstance` API with the service ID, an instance ID derived from the task, and attributes including `AWS_INSTANCE_IPV4` (the private IP) and `AWS_INSTANCE_PORT` (the container port). If the namespace is DNS-based, Cloud Map creates the corresponding Route 53 records. If the service has a health check, the instance starts out healthy by default and stays in discovery results until a check reports it unhealthy.
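Code that registers instances itself (for example, on-premises hosts) has to send the same kind of attribute map. A hypothetical Python helper that builds it is sketched below; note that Cloud Map attribute values are strings, so the port is converted. The output matches the `--attributes` JSON in the register-instance CLI example earlier in this chapter.

```python
import ipaddress
import json


def instance_attributes(ipv4, port, extra=None):
    """Build the Attributes map for a RegisterInstance call.

    `extra` holds optional custom attributes such as a version tag.
    """
    ipaddress.IPv4Address(ipv4)  # raises ValueError for a malformed address
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    attributes = {"AWS_INSTANCE_IPV4": ipv4, "AWS_INSTANCE_PORT": str(port)}
    attributes.update(extra or {})
    return attributes


print(json.dumps(instance_attributes("10.0.1.5", 8080), separators=(",", ":")))
# {"AWS_INSTANCE_IPV4":"10.0.1.5","AWS_INSTANCE_PORT":"8080"}
```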
Health Check Execution (if configured)
If the service has a Route 53 health check (public DNS or HTTP namespaces), Route 53 probes the instance's IP and port every 30 seconds. For HTTP and HTTPS checks, a 2xx or 3xx status on the configured path (default '/') counts as a pass. When consecutive failures reach the failure threshold (default 3), the instance becomes `UNHEALTHY` and is removed from DNS responses; the same number of consecutive passes makes it `HEALTHY` again. For ECS tasks in a private DNS namespace, ECS uses a custom health check instead and reports status from the task's container health check.
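The threshold rule can be modeled as a small state machine. This Python sketch assumes a single checker and the default threshold of 3; real Route 53 health checks aggregate results from many checker locations, so treat it as an illustration of the counting rule only. At a 30-second interval, three consecutive failures mean roughly 90 seconds (3 × 30 s) before an instance drops out of answers.

```python
class ThresholdHealth:
    """Flip status only after `threshold` consecutive results that disagree with it."""

    def __init__(self, threshold=3, healthy=True):
        self.threshold = threshold
        self.healthy = healthy  # new checks start out healthy
        self._streak = 0        # consecutive results contradicting the current status

    def record(self, passed):
        if passed == self.healthy:
            self._streak = 0  # an agreeing result resets the streak
        else:
            self._streak += 1
            if self._streak >= self.threshold:
                self.healthy = passed
                self._streak = 0
        return self.healthy


check = ThresholdHealth()
print([check.record(p) for p in [False, False, True, False, False, False, True, True, True]])
# [True, True, True, True, True, False, False, False, True]
```

The single pass in the third position resets the failure streak, so the instance goes unhealthy only after the later run of three failures, and recovers only after three passes in a row.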
Client Discovery via DNS or API
A client service (e.g., another ECS task) needs to communicate with the registered service. It performs a DNS lookup on the service's DNS name, which is the service name followed by the namespace name (e.g., `orders-service.my-namespace`). With the MULTIVALUE policy, Route 53 returns up to 8 healthy IP addresses chosen at random. Alternatively, the client can use the Cloud Map `DiscoverInstances` API to get a list of instances with their attributes and filter on those attributes (for example, with query parameters).
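To picture API discovery with filtering, here is a Python sketch that mimics the shape of a DiscoverInstances call: keep instances whose health matches and whose attributes equal every query parameter. The instance dictionaries loosely follow the response fields (InstanceId, HealthStatus, Attributes), but the `discover` function is a hypothetical stand-in, and `version` is a made-up custom attribute.

```python
def discover(instances, query_parameters=None, healthy_only=True):
    """Return instances matching all query parameters (exact string match)."""
    query_parameters = query_parameters or {}
    matches = []
    for inst in instances:
        if healthy_only and inst["HealthStatus"] != "HEALTHY":
            continue
        attrs = inst["Attributes"]
        if all(attrs.get(key) == value for key, value in query_parameters.items()):
            matches.append(inst)
    return matches


instances = [
    {"InstanceId": "task-001", "HealthStatus": "HEALTHY",
     "Attributes": {"AWS_INSTANCE_IPV4": "10.0.1.5", "version": "v2"}},
    {"InstanceId": "task-002", "HealthStatus": "HEALTHY",
     "Attributes": {"AWS_INSTANCE_IPV4": "10.0.1.6", "version": "v1"}},
    {"InstanceId": "task-003", "HealthStatus": "UNHEALTHY",
     "Attributes": {"AWS_INSTANCE_IPV4": "10.0.1.7", "version": "v2"}},
]
print([i["InstanceId"] for i in discover(instances, {"version": "v2"})])  # ['task-001']
```

DNS cannot express this kind of filter: a DNS answer carries addresses (and ports, for SRV), not custom attributes.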
Deregistration on Task Stop
When the ECS task stops (due to scale-in, failure, or manual stop), Amazon ECS calls `DeregisterInstance` with the same instance ID. Cloud Map removes the instance entry and its DNS records, so API discovery stops returning the instance shortly afterward, but resolvers and clients that cached the DNS answer may keep using the old IP until the TTL expires.
Enterprise Scenario 1: E-Commerce Microservices on ECS
A large online retailer runs its order processing system on ECS with Fargate. The system consists of multiple microservices: orders, inventory, payments, and notifications. Each service scales independently based on load. Without Cloud Map, the services would need to hardcode IPs or put a load balancer in front of every inter-service call, which adds latency and cost. They deploy Cloud Map with a private DNS namespace (internal.example.com). Each service is registered with a service name (e.g., orders.internal.example.com). The orders service discovers the inventory service by DNS lookup. Each task definition includes a container health check against the service's /health endpoint, and ECS reports the results to Cloud Map. During Prime Day, the inventory service scales from 10 to 100 tasks. ECS registers the new tasks in Cloud Map and deregisters terminated ones. The inventory service's DNS records use a 5-second TTL so the orders service picks up changes quickly. Without Cloud Map, they would have to build and operate this discovery logic themselves.
Enterprise Scenario 2: Hybrid Deployment with App Mesh
A financial services company uses AWS App Mesh for service-to-service communication with mutual TLS. They deploy their services on ECS and on-premises via AWS Direct Connect. They use Cloud Map as the service discovery backend for App Mesh. Each service is registered in Cloud Map with attributes like version and region. App Mesh virtual nodes are configured with CloudMapServiceDiscovery referencing the namespace and service name. App Mesh queries Cloud Map to get the list of healthy instances and distributes traffic using weighted routing. Health checks are performed by App Mesh's sidecar proxy (Envoy). Cloud Map ensures that unhealthy instances are not included in Envoy's endpoint list. This setup allows them to migrate services from on-premises to ECS gradually by registering both on-premises and ECS instances in the same Cloud Map service, with different attributes for routing.
What Goes Wrong When Misconfigured
DNS TTL too high: If TTL is set to 300 seconds, clients will cache stale IPs for 5 minutes, causing traffic to go to terminated tasks. This results in connection errors. Solution: use low TTL (e.g., 5 seconds) or switch to API-based discovery.
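Why a high TTL hurts can be shown with a toy client-side cache. The Python sketch below (a hypothetical `CachingResolver`, with time passed in explicitly so the behavior is deterministic) keeps returning a replaced task's IP until the cached answer expires. With a 300-second TTL that window is up to 5 minutes; with 5 seconds it is up to 5 seconds.

```python
class CachingResolver:
    """Toy client DNS cache: reuse an answer until its TTL expires."""

    def __init__(self, lookup, ttl):
        self._lookup = lookup  # callable: name -> current list of IPs
        self._ttl = ttl
        self._cache = {}       # name -> (expires_at, ips)

    def resolve(self, name, now):
        cached = self._cache.get(name)
        if cached and now < cached[0]:
            return cached[1]  # possibly stale
        ips = self._lookup(name)
        self._cache[name] = (now + self._ttl, ips)
        return ips


records = {"orders-service.my-namespace": ["10.0.1.5"]}
resolver = CachingResolver(lambda name: list(records[name]), ttl=300)

print(resolver.resolve("orders-service.my-namespace", now=0))    # ['10.0.1.5']
records["orders-service.my-namespace"] = ["10.0.1.9"]             # task replaced
print(resolver.resolve("orders-service.my-namespace", now=299))  # ['10.0.1.5'] (stale)
print(resolver.resolve("orders-service.my-namespace", now=300))  # ['10.0.1.9']
```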
Health check path incorrect: If the health check path is /health but the service exposes /status, the health check always fails, and all instances are marked unhealthy. No traffic is routed. Solution: ensure path matches.
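Missing IAM permissions: ECS registers and deregisters tasks through its service-linked role (AWSServiceRoleForECS), not the task execution role, so adding Cloud Map permissions to the execution role does not help. Permission failures show up instead for code that registers instances itself (for example, on-premises hosts or a container calling the API): the caller needs servicediscovery:RegisterInstance and servicediscovery:DeregisterInstance. Solution: attach the AWS managed policy AWSCloudMapRegisterInstanceAccess to that caller.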
What DVA-C02 Tests
The DVA-C02 exam tests Cloud Map service discovery primarily under Domain 1: Development with AWS Services (Objective 1.1 - Develop code that uses AWS services). Specific sub-objectives include:
Implement service discovery using AWS Cloud Map.
Differentiate between DNS-based and API-based namespaces.
Understand health check integration.
Integrate Cloud Map with ECS and App Mesh.
Common Wrong Answers and Why
"Use Route 53 private hosted zones instead of Cloud Map" – While Route 53 can be used for service discovery, Cloud Map provides health checking, instance registration/deregistration, and API-based discovery. The exam expects you to know Cloud Map is the managed service discovery solution.
"Cloud Map supports weighted routing only" – Cloud Map supports MULTIVALUE (default), WEIGHTED, and LATENCY routing policies for DNS namespaces. MULTIVALUE returns up to 8 healthy records.
"Health checks are mandatory" – Health checks are optional. If not configured, all registered instances are considered healthy.
"API-based namespaces are slower than DNS" – API-based namespaces are actually faster because they avoid DNS resolution and caching. The exam may test this.
Specific Numbers and Terms
Common TTL: 60 seconds (set per DNS record when you create the service).
Route 53 health check interval: 30 seconds.
Failure threshold: 3 consecutive results by default (configurable from 1 to 10).
Max records returned for MULTIVALUE: 8.
Instance attributes: AWS_INSTANCE_IPV4, AWS_INSTANCE_IPV6, AWS_INSTANCE_PORT, AWS_INSTANCE_CNAME.
Namespace types: DNS_PUBLIC, DNS_PRIVATE, HTTP.
Routing policies: MULTIVALUE, WEIGHTED (DNS namespaces only).
Edge Cases and Exceptions
Cross-VPC discovery: A private DNS namespace is created for a single VPC. To resolve it from other VPCs, associate those VPCs with the namespace's Route 53 private hosted zone, or use API discovery; either way, clients still need network connectivity to the instances (VPC peering or Transit Gateway).
IPv6: Cloud Map supports IPv6 via AWS_INSTANCE_IPV6 attribute and AAAA records.
Service discovery for ECS with external tasks: External instances (on-prem) can be registered manually via API, but ECS does not manage them.
How to Eliminate Wrong Answers
If a question asks about service discovery for microservices, eliminate options that mention:
Using only Route 53 without Cloud Map (unless the context is simple DNS without health checks).
Putting a load balancer in front of every inter-service call when the question asks for service discovery (ELB distributes traffic, but it is not a service registry).
Hardcoding IP addresses.
Using EC2 Auto Scaling groups directly (they don't provide per-task registration).
Cloud Map is the AWS managed service discovery solution, not just a DNS service.
Three namespace types: public DNS and private DNS (simple, but answers are cached for the TTL) and HTTP (API-only, always current, supports attribute filtering).
ECS tasks automatically register/deregister with Cloud Map when the ECS service uses serviceRegistries.
Health checks are optional; Route 53 health checks use a 30s interval and a default failure threshold of 3.
DNS MULTIVALUE returns up to 8 healthy records; 60s is a common TTL.
API-based discovery avoids DNS caching, so clients see changes immediately.
Cloud Map does not support cross-account discovery natively.
IAM: ECS registers tasks through its service-linked role; your own registration code needs servicediscovery:RegisterInstance and DeregisterInstance.
Cloud Map integrates with ECS, EKS, App Mesh, and Route 53.
When clients must react to changes immediately, use API discovery (HTTP namespaces).
These come up on the exam all the time. Here's how to tell them apart.
Cloud Map DNS Namespace
Uses Route 53 for DNS resolution
Supports A, AAAA, SRV, CNAME records
TTL-based caching (60s is a common TTL)
Health checks via Route 53 (public DNS) or custom (private DNS)
Changes reach clients only after cached answers expire
Cloud Map HTTP Namespace
Uses REST API for discovery
Returns JSON with IP, port, attributes
No caching; real-time results
Health checks via custom or Route 53
Reflects changes immediately; suits fast-changing fleets
Cloud Map Service Discovery
Decentralized; each client gets list of endpoints
No single point of failure
Supports attributes for advanced routing
Health checking per instance
Integrated with ECS and App Mesh
Traditional Load Balancer (ALB/NLB)
Centralized; all traffic goes through LB
Potential SPOF (unless multi-AZ)
No attribute-based routing
Health checking per target group
Adds latency and cost
Mistake
Cloud Map is just a managed Route 53 hosted zone.
Correct
Cloud Map provides a full service registry with instance registration/deregistration, health checking, and API-based discovery. Route 53 is only used as the DNS backend for DNS namespaces; Cloud Map adds a layer of abstraction and automation.
Mistake
You must use health checks with Cloud Map.
Correct
Health checks are optional. You can configure a service without health checks, and all registered instances will be considered healthy and returned in discovery results.
Mistake
Cloud Map DNS namespaces support only A records.
Correct
Cloud Map supports A (IPv4), AAAA (IPv6), SRV (service location), and CNAME records. You specify the record type in the `DnsRecords` configuration when creating the service.
Mistake
API-based namespaces are slower because they require an API call.
Correct
API discovery is not slowed by DNS caching: DiscoverInstances returns the registry's current state, whereas DNS answers can be stale until the TTL expires. Each API call does cost a network round trip, so the advantage is freshness and attribute filtering rather than raw lookup speed.
Mistake
Cloud Map can discover instances across different AWS accounts.
Correct
Cloud Map does not natively support cross-account service discovery. Workarounds include associating the namespace's Route 53 private hosted zone with a VPC in the other account, or calling the DiscoverInstances API with cross-account IAM roles.
Service discovery is configured on the ECS service rather than the task definition: add a `serviceRegistries` entry with the `registryArn` of the Cloud Map service (plus `containerName` and `containerPort` when registering SRV records). Use `awsvpc` network mode for A records; `bridge` and `host` modes support only SRV records. ECS will automatically register the task when it starts and deregister it when it stops.
DNS namespaces use Route 53 to create DNS records (A, AAAA, SRV, CNAME) that clients resolve with ordinary DNS. They support health checks and the MULTIVALUE and WEIGHTED routing policies. HTTP namespaces are discovered only through the DiscoverInstances API, which returns instance attributes and is not subject to DNS caching. Use HTTP (API) discovery when clients must see changes immediately or when you need attribute-based filtering.
Yes, Cloud Map works with both the EC2 and Fargate launch types, and the registration and discovery process is identical. Fargate tasks always use the `awsvpc` network mode; on EC2, `awsvpc` is needed for A records, while `bridge` and `host` modes support SRV records only.
For public DNS namespaces (and HTTP namespaces), you can configure Route 53 health checks that probe the instance's endpoint (default path '/') every 30 seconds. After 3 consecutive failures (the default threshold), the instance becomes unhealthy and is removed from DNS responses; the same number of consecutive passes restores it. For private DNS namespaces, such as those ECS typically uses, a custom health check applies instead: ECS reports task health from container health checks, and other code reports status with UpdateInstanceCustomHealthStatus.
ECS registers and deregisters tasks through its service-linked role (`AWSServiceRoleForECS`), so the task execution role needs no Cloud Map permissions for this. If your own code registers instances, it needs `servicediscovery:RegisterInstance` and `servicediscovery:DeregisterInstance`, plus Route 53 permissions such as `route53:ChangeResourceRecordSets` and the health check actions when DNS records or health checks are involved. The AWS managed policy `AWSCloudMapRegisterInstanceAccess` provides these permissions.
Yes, you can manually register on-premises instances using the Cloud Map API or AWS CLI. However, ECS will not automatically manage those instances. You need to handle registration/deregistration yourself.
You set the TTL per DNS record when you create the Cloud Map service; 60 seconds is a common choice. Values as low as 0 seconds are allowed. Lower TTLs reduce stale answers but increase DNS query load.
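The query-load side of that trade-off is simple arithmetic: a client that looks a name up continuously refreshes it about once per TTL, so it sends roughly 3600 / TTL queries per hour for that name. A quick Python check (the helper name is illustrative):

```python
def max_queries_per_hour(ttl_seconds):
    """Approximate upper bound on cache refreshes per hour for one name on one client."""
    if ttl_seconds <= 0:
        raise ValueError("a TTL of 0 disables caching; every lookup is a query")
    return 3600 // ttl_seconds


for ttl in (300, 60, 5):
    print(ttl, max_queries_per_hour(ttl))
# 300 12
# 60 60
# 5 720
```

Dropping the TTL from 60 to 5 seconds multiplies each client's lookup traffic for that name by 12, which is the cost of the faster reaction to task changes.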