DVA-C02 · Chapter 69 of 101 · Objective 1.1

ECS Service Discovery with Cloud Map

This chapter covers AWS Cloud Map service discovery for Amazon ECS, a critical mechanism for dynamic microservice communication. On the DVA-C02 exam, service discovery appears in roughly 5-8% of questions, often integrated with ECS, EKS, or App Mesh. Understanding Cloud Map's namespace types, health checking, and integration with Route 53 is essential for architecting scalable, resilient containerized applications. We'll dive into the internals, configuration, and exam traps.

25 min read
Intermediate
Updated May 31, 2026

Cloud Map as a Dynamic Phonebook for Microservices

Imagine a large office building where each department has its own phone number, but the extensions change every time an employee joins or leaves. Instead of memorizing numbers, each desk has a dynamic phonebook that updates automatically. When Department A wants to call Department B, it looks up "Billing Team" in the phonebook, which returns the current extension of any available billing employee. The phonebook also tracks health: if a billing employee is at lunch (unhealthy), their extension is left out of the results until they return.

AWS Cloud Map works the same way. It maintains a registry of service instances (the employees) with their IP addresses and ports (the extensions). Services query Cloud Map by a logical name (e.g., "backend-service") and get back the healthy, available instances. With health checking enabled, unhealthy instances stay registered but are omitted from discovery results. In ECS, each task is registered with Cloud Map when it launches and deregistered when it stops. Clients, or a service mesh such as App Mesh, use this information to route traffic only to healthy instances, enabling dynamic, resilient communication without hardcoded endpoints.

How It Actually Works

What is AWS Cloud Map and Why Does It Exist?

AWS Cloud Map is a fully managed service discovery service that maps logical service names to the physical resources (IP addresses, ports, etc.) that implement those services. In a dynamic environment like Amazon ECS, tasks are ephemeral: they come and go, and their IP addresses are not known in advance. DNS-based discovery with static records breaks down because the records are not updated when tasks scale up, scale down, or fail. Cloud Map solves this by providing a continuously updated, health-aware registry.

Cloud Map supports two types of namespaces: DNS-based and API-based.

- DNS-based namespaces (public or private) use Route 53 to create DNS records that resolve to service instances. They support the MULTIVALUE and WEIGHTED routing policies and health checks. DNS TTL is configurable (default 60 seconds).
- API-based namespaces (HTTP) are queried through the Cloud Map DiscoverInstances API instead of DNS. They avoid DNS caching delays and support attributes (custom key-value pairs) for advanced filtering.

For ECS, the typical pattern is a private DNS namespace: you attach a Cloud Map service to the ECS service through its serviceRegistries setting, and ECS registers each task for you.

How Cloud Map Works Internally

Cloud Map operates as a centralized registry. Each service in Cloud Map corresponds to a logical service name (e.g., "orders-service"). Each instance of that service (e.g., an ECS task) is registered with a unique identifier, IP address, port, and optional attributes (e.g., version, region).

When a task starts, ECS (or your own code, via the Cloud Map API) calls RegisterInstance to create a resource record set in the DNS namespace (for DNS) or an instance entry in the HTTP namespace. The instance is treated as healthy if health checking is disabled; if a Route 53 health check is enabled, Cloud Map waits for the health check to pass before marking it healthy.

When a client service needs to discover the orders-service, it performs a DNS lookup (for DNS namespace) or an HTTP API call (for HTTP namespace). The response contains the current healthy instances. The client then picks one (e.g., round-robin for DNS, or custom logic for API).
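A minimal sketch of the client side of that lookup, assuming a response shaped like the output of DiscoverInstances (an Instances list whose entries carry HealthStatus and an Attributes map). The helper names and the sample data are hypothetical; in a real client the response would come from an SDK call such as boto3's servicediscovery discover_instances.

```python
import random

def healthy_endpoints(response):
    """Return (ip, port) tuples for instances reported as HEALTHY."""
    endpoints = []
    for inst in response.get("Instances", []):
        if inst.get("HealthStatus") != "HEALTHY":
            continue
        attrs = inst.get("Attributes", {})
        ip = attrs.get("AWS_INSTANCE_IPV4")
        port = attrs.get("AWS_INSTANCE_PORT")
        if ip and port:
            endpoints.append((ip, int(port)))
    return endpoints

def pick_endpoint(response, rng=random):
    """Client-side choice: pick one healthy endpoint at random."""
    endpoints = healthy_endpoints(response)
    if not endpoints:
        raise LookupError("no healthy instances returned")
    return rng.choice(endpoints)

# Hypothetical sample in the shape of a DiscoverInstances response.
sample_response = {
    "Instances": [
        {
            "InstanceId": "task-001",
            "HealthStatus": "HEALTHY",
            "Attributes": {"AWS_INSTANCE_IPV4": "10.0.1.5", "AWS_INSTANCE_PORT": "8080"},
        },
        {
            "InstanceId": "task-002",
            "HealthStatus": "UNHEALTHY",
            "Attributes": {"AWS_INSTANCE_IPV4": "10.0.1.6", "AWS_INSTANCE_PORT": "8080"},
        },
    ]
}

print(pick_endpoint(sample_response))
```

Random choice is only one option; a client could just as well rotate through the list or filter on custom attributes first.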

Key Components, Values, Defaults, and Timers

Namespace: A logical container for services. Types: DNS (public or private) and HTTP (API-based).

Service: A logical group of instances. Each service has a name, namespace, and optional health check configuration.

Instance: A single endpoint (e.g., an ECS task) with a unique ID, IP address, port, and attributes.

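Health Check: Either a Route 53 health check (for public DNS and HTTP namespaces, against publicly reachable endpoints) or a custom health check whose status is reported to Cloud Map by ECS or by your own code (the only option for private DNS namespaces). The Route 53 health check interval is 30 seconds. Unhealthy threshold is configurable (default 3).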

TTL: For DNS namespaces, the TTL of DNS records. Default is 60 seconds. Minimum is 1 second (but not recommended for performance).

Service registry in the ECS service definition: the serviceRegistries setting references the Cloud Map service by its ARN (registryArn). It is set on the ECS service, not in the task definition.

Registration: ECS automatically registers tasks with Cloud Map when they start (if the ECS service includes a service registry configuration). Deregistration occurs when the task stops.
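The threshold behaviour in the Health Check entry above can be modelled with a few lines of code. This is a simplified sketch, not Cloud Map's or Route 53's implementation: it assumes a single checker, a starting state of UNKNOWN, and a status that flips only after `threshold` consecutive identical results. The function name is hypothetical.

```python
def track_health(results, threshold=3):
    """Return the status after each check result (True = pass, False = fail).

    The status changes only once `threshold` consecutive results agree.
    """
    status = "UNKNOWN"
    streak_value = None  # result currently being counted
    streak = 0           # how many times in a row it has occurred
    history = []
    for ok in results:
        if ok == streak_value:
            streak += 1
        else:
            streak_value, streak = ok, 1
        if streak >= threshold:
            status = "HEALTHY" if ok else "UNHEALTHY"
        history.append(status)
    return history

# Three passes mark the instance healthy; three failures mark it unhealthy.
print(track_health([True, True, True, False, False, False]))
```

A single failed check in the middle of a healthy run resets the failure count, which is why a flapping endpoint can stay in its previous state for a long time.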

Configuration and Verification Commands

To create a namespace:

aws servicediscovery create-private-dns-namespace --name my-namespace --vpc vpc-12345678

To create a service:

aws servicediscovery create-service --name orders-service --namespace-id ns-abc123 --dns-config '{"RoutingPolicy":"MULTIVALUE","DnsRecords":[{"Type":"A","TTL":60}]}'
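If you generate the --dns-config value from code, a small builder keeps the JSON well-formed. The sketch below is illustrative only: the function name is hypothetical, and the validation reflects the rules as I understand them from the Cloud Map API documentation (two routing policies, four record types, CNAME requiring WEIGHTED), so treat those checks as assumptions to confirm against the current docs.

```python
import json

VALID_POLICIES = {"MULTIVALUE", "WEIGHTED"}
VALID_RECORD_TYPES = {"A", "AAAA", "SRV", "CNAME"}

def build_dns_config(record_types, ttl, routing_policy="MULTIVALUE"):
    """Build a DnsConfig structure for a Cloud Map service."""
    if routing_policy not in VALID_POLICIES:
        raise ValueError(f"unsupported routing policy: {routing_policy}")
    unknown = set(record_types) - VALID_RECORD_TYPES
    if unknown:
        raise ValueError(f"unsupported record types: {sorted(unknown)}")
    # Assumption: CNAME records are only valid with the WEIGHTED policy.
    if "CNAME" in record_types and routing_policy != "WEIGHTED":
        raise ValueError("CNAME records require the WEIGHTED routing policy")
    if ttl < 0:
        raise ValueError("TTL must not be negative")
    return {
        "RoutingPolicy": routing_policy,
        "DnsRecords": [{"Type": t, "TTL": ttl} for t in record_types],
    }

dns_config_json = json.dumps(build_dns_config(["A"], 60), separators=(",", ":"))
print(dns_config_json)
```

The printed string can be passed as the --dns-config argument in the command above.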

To register an instance:

aws servicediscovery register-instance --service-id srv-xyz789 --instance-id task-001 --attributes '{"AWS_INSTANCE_IPV4":"10.0.1.5","AWS_INSTANCE_PORT":"8080"}'
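The attribute map in that command is easy to get wrong by hand (ports must be strings, and IPv4 and IPv6 use different keys). A minimal sketch of a helper that builds it, using only the standard library; the function name and the custom attribute in the example are hypothetical.

```python
import ipaddress

def build_instance_attributes(ip, port, **custom):
    """Build the Attributes map for a RegisterInstance call."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    if not 1 <= int(port) <= 65535:
        raise ValueError(f"port out of range: {port}")
    ip_key = "AWS_INSTANCE_IPV4" if addr.version == 4 else "AWS_INSTANCE_IPV6"
    attrs = {ip_key: str(addr), "AWS_INSTANCE_PORT": str(port)}
    # Custom attributes are free-form key-value pairs; values are sent as strings.
    attrs.update({key: str(value) for key, value in custom.items()})
    return attrs

print(build_instance_attributes("10.0.1.5", 8080, version="v2"))
```

The resulting dictionary matches the JSON passed to --attributes above, with one extra custom attribute that clients could filter on.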

To discover instances via API:

aws servicediscovery discover-instances --namespace-name my-namespace --service-name orders-service

Interaction with Related Technologies

Amazon ECS: ECS integrates natively with Cloud Map. When you create an ECS service, you can specify serviceConnectConfiguration (Service Connect) or serviceRegistries (service discovery). The latter registers tasks directly in a Cloud Map service: ECS automatically calls RegisterInstance when a task starts and DeregisterInstance when it stops.

AWS App Mesh: App Mesh uses Cloud Map for service discovery when you configure virtual nodes with Cloud Map service discovery type. App Mesh queries Cloud Map to get the list of healthy endpoints for each virtual node.

Route 53: For DNS namespaces, Cloud Map creates and manages a Route 53 hosted zone (private for DNS_PRIVATE, public for DNS_PUBLIC) and its record sets. Route 53 health checks can monitor an instance's health endpoint when that endpoint is publicly reachable.

Elastic Load Balancing: Cloud Map is often used together with ALB/NLB for traffic routing, but Cloud Map itself is not a load balancer; it provides the list of targets.

Important Exam Points

DNS vs API namespaces: DNS is slower (due to caching and TTL) but simpler; API is faster and supports attributes. The exam may ask which to use for low-latency discovery.

Health checking: If health checks are enabled, only healthy instances are returned. The default unhealthy threshold is 3 (i.e., after 3 consecutive failures).

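Multi-value answer: For the DNS MULTIVALUE routing policy, Cloud Map returns up to 8 healthy records, chosen at random. For WEIGHTED, Route 53 returns one record from a randomly selected healthy instance; Cloud Map gives every record the same weight, so you cannot skew traffic toward particular instances.

Service discovery for ECS with Fargate: Works the same as with the EC2 launch type. Fargate tasks always use the awsvpc network mode.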

Cross-account discovery: Cloud Map does not support cross-account discovery natively; you need to use Route 53 private hosted zones or API-based with IAM.

Limits: Default limit of 50 namespaces per account, 50 services per namespace, 1000 instances per service (soft limits).
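The MULTIVALUE behaviour from the exam points above (at most 8 healthy records per answer, chosen at random) can be simulated in a few lines. This is a toy model for study purposes, not how Route 53 is implemented; the function name and the instance dictionaries are hypothetical.

```python
import random

MAX_MULTIVALUE_ANSWERS = 8  # the limit quoted in this chapter

def multivalue_answer(instances, rng=None):
    """Return up to 8 IPs drawn at random from the healthy instances."""
    rng = rng or random.Random()
    healthy = [inst["ip"] for inst in instances if inst["healthy"]]
    return rng.sample(healthy, min(len(healthy), MAX_MULTIVALUE_ANSWERS))

# Twelve registered instances, two of them unhealthy.
fleet = [{"ip": f"10.0.1.{n}", "healthy": n not in (3, 7)} for n in range(1, 13)]
answer = multivalue_answer(fleet)
print(len(answer), answer)
```

Because each answer is a random subset, a client that resolves the name repeatedly will eventually see every healthy instance even when more than 8 are registered.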

Step-by-Step: How an ECS Task Registers with Cloud Map

1. The ECS service definition specifies serviceRegistries with the ARN of the Cloud Map service.

2. ECS starts the task on EC2 or Fargate.

3. In awsvpc network mode, the task receives its own elastic network interface with a private IP address.

4. ECS calls RegisterInstance with the task's private IP, port (from the container port mapping), and optional attributes.

5. Cloud Map creates a DNS record (for a DNS namespace) or adds an instance entry (for an HTTP namespace).

6. Health status is tracked from here on. In a private DNS namespace, ECS reports task health to Cloud Map through a custom health check. A Route 53 health check, which probes a publicly reachable health endpoint every 30 seconds, is an option only for public DNS and HTTP namespaces.

7. With a Route 53 health check, the instance starts in UNKNOWN state and becomes HEALTHY after the health check passes (default 3 successful checks).

8. Client services query Cloud Map and get the list of healthy instances.

9. When the task stops, ECS calls DeregisterInstance, removing the record.
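In the ECS API, the link to Cloud Map is made on the ECS service through its serviceRegistries parameter, which points at the Cloud Map service ARN. A minimal sketch of the request parameters for a Fargate service follows; the helper name, cluster, subnet, security group, and ARN values are all placeholders, and the resulting dictionary is meant to be passed to an SDK call such as boto3's ecs create_service.

```python
def build_service_request(cluster, service_name, task_definition,
                          registry_arn, subnets, security_groups,
                          desired_count=2):
    """Build CreateService parameters with Cloud Map service discovery."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        "launchType": "FARGATE",
        # awsvpc networking gives each task its own ENI and private IP.
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                "assignPublicIp": "DISABLED",
            }
        },
        # The Cloud Map service that ECS registers each task into.
        "serviceRegistries": [{"registryArn": registry_arn}],
    }

# Placeholder identifiers for illustration only.
request = build_service_request(
    cluster="demo-cluster",
    service_name="orders-service",
    task_definition="orders:1",
    registry_arn="arn:aws:servicediscovery:us-east-1:123456789012:service/srv-xyz789",
    subnets=["subnet-11111111"],
    security_groups=["sg-22222222"],
)
print(request["serviceRegistries"])
```

With A records and awsvpc networking, registryArn alone is enough; SRV records additionally need a port or a containerName and containerPort pair.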

Common Pitfalls

DNS caching: Clients may cache DNS records beyond TTL, causing them to try unhealthy instances. Use low TTL or API-based discovery.

Health check misconfiguration: If health check endpoint returns 200 but the service is not ready, traffic may be routed to unready instances.

Namespace type mismatch: Records in a public DNS namespace are publicly resolvable, so registering internal services there exposes their private IP addresses to anyone who queries the name. Use a private DNS namespace for internal services.

IAM permissions: ECS registers and deregisters tasks through its service-linked role (AWSServiceRoleForECS), so the task execution role does not need Cloud Map permissions for this. Code that calls Cloud Map itself needs servicediscovery:RegisterInstance and servicediscovery:DeregisterInstance to register instances, or servicediscovery:DiscoverInstances to look them up.

Walk-Through

1. Configure Service Discovery on the ECS Service

In the ECS service definition (not the task definition), you specify the service discovery configuration under `serviceRegistries`. You provide the `registryArn` of the Cloud Map service. You can also set `containerName` and `containerPort`, which are needed when the Cloud Map service uses SRV records with the bridge or host network mode. ECS uses this configuration to register each task automatically when it starts.

2. Task Launch and IP Assignment

ECS launches the task using the `awsvpc` network mode (required for A records; the bridge and host modes support only SRV records). Each task gets its own elastic network interface (ENI) with a private IP address from the VPC subnet. ECS records this IP along with the container port (from the task definition's port mappings). This IP is the endpoint that will be registered in Cloud Map.

3. ECS Calls RegisterInstance

ECS calls the Cloud Map `RegisterInstance` API with the service ID, an instance ID (a unique identifier such as the task ID), and attributes including `AWS_INSTANCE_IPV4` (the private IP) and `AWS_INSTANCE_PORT` (the container port). If the namespace is DNS-based, Cloud Map creates a Route 53 record set. If the service has a Route 53 health check configured, the instance is initially marked as `UNKNOWN`.

4. Health Check Execution (if configured)

If the Cloud Map service has a health check configuration, only instances that pass are returned to clients. A Route 53 health check probes the instance's IP and port every 30 seconds and expects a successful HTTP response (2xx or 3xx) on the configured path (default is '/'). After 3 consecutive successful checks, the instance state changes to `HEALTHY`. If 3 consecutive failures occur, it becomes `UNHEALTHY` and is removed from DNS responses. Route 53 health checkers run outside your VPC, so they work only for publicly reachable endpoints; for tasks in a private DNS namespace, ECS reports health to Cloud Map through a custom health check instead.

5. Client Discovery via DNS or API

A client service (e.g., another ECS task) needs to communicate with the registered service. It performs a DNS lookup on the service's DNS name (e.g., `orders-service.my-namespace.local`). Route 53 returns up to 8 healthy IP addresses (for the MULTIVALUE policy), selected at random. Alternatively, the client can use the Cloud Map `DiscoverInstances` API to get a list of instances with attributes, allowing custom filtering.

6. Deregistration on Task Stop

When the ECS task stops (due to scale-in, failure, or manual stop), ECS calls `DeregisterInstance` with the same instance ID. Cloud Map removes the DNS record or instance entry. The instance disappears from new discovery results, but DNS resolvers and clients may keep serving the cached record until its TTL expires.
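The stale-record window after deregistration is easiest to see with a toy resolver cache. The sketch below is a simplified model, not real resolver code: the class name is hypothetical, the clock is injected so the example runs instantly, and the registry dictionary stands in for Cloud Map's DNS records.

```python
class TtlCache:
    """Cache DNS-style answers for ttl_seconds before asking the resolver again."""

    def __init__(self, ttl_seconds, resolver, clock):
        self.ttl = ttl_seconds
        self.resolver = resolver
        self.clock = clock
        self._entries = {}  # name -> (expires_at, answer)

    def lookup(self, name):
        now = self.clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[0]:
            return entry[1]  # still within TTL: may be stale
        answer = self.resolver(name)
        self._entries[name] = (now + self.ttl, answer)
        return answer

NAME = "orders-service.my-namespace.local"
registry = {NAME: ["10.0.1.5"]}  # stand-in for the records Cloud Map manages
now = [0.0]                      # fake clock, in seconds

cache = TtlCache(60, lambda name: list(registry.get(name, [])), lambda: now[0])

first = cache.lookup(NAME)       # t=0: record is cached
registry[NAME] = []              # task stops and is deregistered
now[0] = 30.0
stale = cache.lookup(NAME)       # t=30: cache still returns the old IP
now[0] = 60.0
fresh = cache.lookup(NAME)       # t=60: TTL expired, cache re-resolves

print(first, stale, fresh)
```

In this model the client can hit a stopped task for up to one full TTL, which is the trade-off behind choosing a low TTL or API-based discovery.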

What This Looks Like on the Job

Enterprise Scenario 1: E-Commerce Microservices on ECS

A large online retailer runs its order processing system on ECS with Fargate. The system consists of multiple microservices: orders, inventory, payments, and notifications. Each service scales independently based on load. Without Cloud Map, the services would need to hardcode IPs or use a load balancer for every inter-service call, which adds latency and cost. They deploy Cloud Map with a private DNS namespace (internal.example.com). Each service is registered with a service name (e.g., orders.internal.example.com). The orders service discovers the inventory service by DNS lookup. They configure health checks on each service's /health endpoint. During Prime Day, the inventory service scales from 10 to 100 tasks. Cloud Map automatically registers new tasks and deregisters terminated ones. The orders service's DNS cache TTL is set to 5 seconds to quickly adapt to changes. Without Cloud Map, they would have to implement custom service discovery logic.

Enterprise Scenario 2: Hybrid Deployment with App Mesh

A financial services company uses AWS App Mesh for service-to-service communication with mutual TLS. They deploy their services on ECS and on-premises via AWS Direct Connect. They use Cloud Map as the service discovery backend for App Mesh. Each service is registered in Cloud Map with attributes like version and region. App Mesh virtual nodes are configured with CloudMapServiceDiscovery referencing the namespace and service name. App Mesh queries Cloud Map to get the list of healthy instances and distributes traffic using weighted routing. Health checks are performed by App Mesh's sidecar proxy (Envoy). Cloud Map ensures that unhealthy instances are not included in the Envoy's endpoint list. This setup allows them to migrate services from on-prem to ECS gradually by registering both on-prem and ECS instances in the same Cloud Map service, with different attributes for routing.

What Goes Wrong When Misconfigured

DNS TTL too high: If TTL is set to 300 seconds, clients will cache stale IPs for 5 minutes, causing traffic to go to terminated tasks. This results in connection errors. Solution: use low TTL (e.g., 5 seconds) or switch to API-based discovery.

Health check path incorrect: If the health check path is /health but the service exposes /status, the health check always fails, and all instances are marked unhealthy. No traffic is routed. Solution: ensure path matches.

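Missing IAM permissions: ECS registers tasks through its service-linked role, so the usual failure is on the caller's side: the principal creating the ECS service, or code calling the Cloud Map API directly, receives an access denied error. Solution: grant that principal the required servicediscovery actions. For code that registers instances itself, the AWS managed policy AWSCloudMapRegisterInstanceAccess is one option.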

How DVA-C02 Actually Tests This

What DVA-C02 Tests

The DVA-C02 exam tests Cloud Map service discovery primarily under Domain 1: Development with AWS Services (Task Statement 1.1: Develop code for applications hosted on AWS). Topics to be ready for include:

Implement service discovery using AWS Cloud Map.

Differentiate between DNS-based and API-based namespaces.

Understand health check integration.

Integrate Cloud Map with ECS and App Mesh.

Common Wrong Answers and Why

1. "Use Route 53 private hosted zones instead of Cloud Map" – While Route 53 can be used for service discovery, Cloud Map provides health checking, instance registration/deregistration, and API-based discovery. The exam expects you to know Cloud Map is the managed service discovery solution.

2. "Cloud Map supports weighted routing only" – Cloud Map supports two routing policies for DNS namespaces: MULTIVALUE (default) and WEIGHTED. MULTIVALUE returns up to 8 healthy records.

3. "Health checks are mandatory" – Health checks are optional. If not configured, all registered instances are considered healthy.

4. "API-based namespaces are slower than DNS" – API-based discovery returns the current registry state and avoids stale DNS caches, so the exam treats it as the faster option for picking up changes.

Specific Numbers and Terms

Default TTL: 60 seconds.

Default health check interval: 30 seconds.

Unhealthy threshold: 3 consecutive failures.

Max records returned for MULTIVALUE: 8.

Instance attributes: AWS_INSTANCE_IPV4, AWS_INSTANCE_IPV6, AWS_INSTANCE_PORT, AWS_INSTANCE_CNAME.

Namespace types: DNS_PUBLIC, DNS_PRIVATE, HTTP.

Routing policies: MULTIVALUE, WEIGHTED (for DNS).

Edge Cases and Exceptions

Cross-VPC discovery: Cloud Map DNS namespaces are limited to a single VPC (for private). For cross-VPC, use Route 53 private hosted zones with VPC associations, or use HTTP namespaces with API calls across VPCs (requires VPC peering or Transit Gateway).

IPv6: Cloud Map supports IPv6 via AWS_INSTANCE_IPV6 attribute and AAAA records.

Service discovery for ECS with external tasks: External instances (on-prem) can be registered manually via API, but ECS does not manage them.

How to Eliminate Wrong Answers

If a question asks about service discovery for microservices, eliminate options that mention:

Using only Route 53 without Cloud Map (unless the context is simple DNS without health checks).

Using Elastic Load Balancer for inter-service communication (an internal load balancer can carry that traffic, but it adds a hop and cost and is not a service registry).

Hardcoding IP addresses.

Using EC2 Auto Scaling groups directly (they don't provide per-task registration).

Key Takeaways

Cloud Map is the AWS managed service discovery solution, not just a DNS service.

Two namespace types: DNS (simpler, slower) and HTTP (faster, supports attributes).

ECS tasks are automatically registered with and deregistered from Cloud Map when the ECS service is configured with serviceRegistries.

Health checks are optional; default interval 30s, unhealthy threshold 3.

DNS MULTIVALUE returns up to 8 healthy records; TTL default 60s.

API-based discovery is faster than DNS and avoids caching issues.

Cloud Map does not support cross-account discovery natively.

IAM: ECS registers tasks through its service-linked role; your own code needs servicediscovery:RegisterInstance and DeregisterInstance only if it registers instances itself.

Cloud Map integrates with ECS, EKS, App Mesh, and Route 53.

For low-latency discovery, use HTTP namespaces.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Cloud Map DNS Namespace

Uses Route 53 for DNS resolution

Supports A, AAAA, SRV, CNAME records

TTL-based caching (default 60s)

Supports health checks via Route 53 (public namespaces) or custom health checks

Slower due to DNS caching

Cloud Map HTTP Namespace

Uses REST API for discovery

Returns JSON with IP, port, attributes

No caching; real-time results

Health checks via custom or Route 53

Faster; suitable for low-latency

Cloud Map Service Discovery

Decentralized; each client gets list of endpoints

No single point of failure

Supports attributes for advanced routing

Health checking per instance

Integrated with ECS and App Mesh

Traditional Load Balancer (ALB/NLB)

Centralized; all traffic goes through LB

Potential SPOF (unless multi-AZ)

No attribute-based routing

Health checking per target group

Adds latency and cost

Watch Out for These

Mistake

Cloud Map is just a managed Route 53 hosted zone.

Correct

Cloud Map provides a full service registry with instance registration/deregistration, health checking, and API-based discovery. Route 53 is only used as the DNS backend for DNS namespaces; Cloud Map adds a layer of abstraction and automation.

Mistake

You must use health checks with Cloud Map.

Correct

Health checks are optional. You can configure a service without health checks, and all registered instances will be considered healthy and returned in discovery results.

Mistake

Cloud Map DNS namespaces support only A records.

Correct

Cloud Map supports A (IPv4), AAAA (IPv6), SRV (service location), and CNAME records. You specify the record type in the `DnsRecords` configuration when creating the service.

Mistake

API-based namespaces are slower because they require an API call.

Correct

API-based namespaces are actually faster than DNS because they avoid DNS resolution overhead and caching issues. API calls are direct and return results immediately, whereas DNS has TTL and propagation delays.

Mistake

Cloud Map can discover instances across different AWS accounts.

Correct

Cloud Map does not natively support cross-account service discovery. You would need to use Route 53 private hosted zones shared via AWS RAM or use API-based discovery with cross-account IAM roles.


Frequently Asked Questions

How do I enable service discovery for an ECS task?

In the ECS service definition, add a `serviceRegistries` entry with the `registryArn` of the Cloud Map service. Use the `awsvpc` network mode if you want A records; the bridge and host modes support SRV records only. ECS will automatically register each task when it starts and deregister it when it stops.

What is the difference between DNS and HTTP namespaces in Cloud Map?

DNS namespaces use Route 53 to create DNS records (A, AAAA, SRV, CNAME) that are resolved by clients. They support health checks and two routing policies (MULTIVALUE, WEIGHTED). HTTP namespaces are queried through the DiscoverInstances API; they return instance attributes and are not subject to DNS caching. Use HTTP for low-latency requirements and when you need attribute-based filtering.

Can I use Cloud Map with ECS Fargate?

Yes, Cloud Map works with both the EC2 and Fargate launch types. Fargate tasks always use the `awsvpc` network mode; on EC2, the bridge and host modes also work, but only with SRV records. The registration and discovery process is otherwise identical.

How does health checking work in Cloud Map?

Route 53 health checks are available for public DNS and HTTP namespaces and monitor a publicly reachable health endpoint (default path '/'). The health check runs every 30 seconds. After 3 consecutive successful checks, the instance is marked healthy; after 3 consecutive failures, it becomes unhealthy and is removed from DNS responses. For private DNS namespaces, use custom health checks: ECS, or your own code via UpdateInstanceCustomHealthStatus, reports each instance's status to Cloud Map.

What IAM permissions are needed for Cloud Map service discovery?

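ECS registers and deregisters tasks through its service-linked role (AWSServiceRoleForECS), so neither the task role nor the task execution role needs Cloud Map permissions for that. Code that registers instances itself needs `servicediscovery:RegisterInstance` and `servicediscovery:DeregisterInstance`, plus Route 53 permissions if the service uses DNS records or Route 53 health checks. The AWS managed policy `AWSCloudMapRegisterInstanceAccess` is intended for this case. Clients that call DiscoverInstances need `servicediscovery:DiscoverInstances`.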

Can I use Cloud Map for on-premises services?

Yes, you can manually register on-premises instances using the Cloud Map API or AWS CLI. However, ECS will not automatically manage those instances. You need to handle registration/deregistration yourself.

What is the default TTL for DNS records in Cloud Map?

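This chapter treats 60 seconds as the default TTL. When you create a service through the Cloud Map API you set the TTL explicitly on each DNS record, and the API accepts values far larger than a day, so check the current API reference for the exact range. Lower TTLs reduce caching but increase DNS query load.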
