ECS Service Discovery and AWS Service Connect

This chapter covers ECS Service Discovery and AWS Service Connect, two mechanisms for enabling communication between microservices running on Amazon ECS. For the SAA-C03 exam, these topics appear in roughly 5-8% of questions, often in scenarios involving dynamic port mapping, service meshes, or inter-service communication. Understanding the trade-offs between DNS-based discovery and proxy-based service connectivity is critical for designing high-performance, scalable container architectures. We will dive deep into the mechanisms, configuration, and exam traps.

Updated May 31, 2026

Airline Hub-and-Spoke vs. Direct Flights

ECS Service Discovery is like an airline's hub-and-spoke system where each flight (service) announces its arrival gate (IP address and port) to a central registry (Amazon Route 53 or Cloud Map). When a passenger (client service) wants to connect to another flight, they check the registry to find the correct gate and then walk directly there. This works well for static, predictable schedules. AWS Service Connect, on the other hand, is like a direct, non-stop shuttle service between two specific airports. The shuttle company (AWS Service Connect) handles all the logistics: it assigns a dedicated, fixed gate (virtual DNS name) for each destination, and passengers simply show up at that gate. The shuttle automatically routes them to any available aircraft (task) at the destination, even if the aircraft changes gates (IP changes). With Service Connect, the client doesn't need to query a registry; it just sends traffic to a well-known endpoint, and the network handles the rest. The key difference: discovery requires the client to look up and remember the address each time; Service Connect provides a stable, always-available endpoint that abstracts away the underlying task churn.

How It Actually Works

What are ECS Service Discovery and AWS Service Connect?

ECS Service Discovery and AWS Service Connect are two distinct approaches to enable communication between services running in an Amazon ECS cluster. Both solve the problem of finding and connecting to a service instance, but they do so in fundamentally different ways.

ECS Service Discovery integrates with AWS Cloud Map to register each task automatically as a DNS A record or SRV record. When a task starts, ECS registers its IP address (and, for SRV records, its port, including dynamically assigned host ports) with Cloud Map. Other services then resolve the service name via DNS to get the address of a healthy task. This is a client-side discovery pattern: the client performs the DNS resolution and connects directly to the task.

AWS Service Connect is a newer capability that places a managed proxy layer between services. ECS adds an Envoy-based proxy container to each task; it intercepts outbound traffic addressed to a configured service name and load-balances it across healthy tasks. The application connects to a stable alias (e.g., http://service-b:8080), and the local proxy forwards the request to a backend task. From the application's point of view this behaves like server-side discovery: the application code never needs to know the backend addresses.

How They Work Internally

ECS Service Discovery (DNS-based):

When you enable service discovery on an ECS service, you create a namespace in AWS Cloud Map. DNS namespaces can be public or private, and Cloud Map backs them with a Route 53 hosted zone. Each registered task gets a DNS record with a TTL of 60 seconds by default. The record type can be A (IPv4), AAAA (IPv6), or SRV (which also carries the port number and is therefore needed for dynamic port mapping). Clients perform standard DNS resolution, which returns up to 8 healthy records in random order. There is no load balancer in the path: the client picks one answer and opens a direct TCP connection.

Example DNS resolution:

$ dig +short my-service.my-namespace.local
10.0.1.45
10.0.2.12
10.0.3.78

If a task becomes unhealthy or stops, Cloud Map automatically deregisters the DNS record. However, due to DNS caching (TTL), clients may still attempt to connect to the old IP for up to 60 seconds. This can cause brief connection failures during deployments or scaling events.
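
To make the staleness window concrete, here is a minimal Python sketch of a client that caches DNS answers for the TTL. It is a toy model, not how any particular resolver is implemented: `CachingResolver`, the `registry` list (standing in for Cloud Map), and the timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CachingResolver:
    """Toy client-side DNS cache: reuses an answer until its TTL expires."""
    ttl_seconds: int
    _cached_ips: list = field(default_factory=list)
    _expires_at: float = 0.0

    def resolve(self, now: float, registry: list) -> list:
        # Re-query the registry (standing in for Cloud Map / Route 53)
        # only when the cached answer has expired.
        if now >= self._expires_at:
            self._cached_ips = list(registry)
            self._expires_at = now + self.ttl_seconds
        return self._cached_ips


registry = ["10.0.1.45", "10.0.2.12"]      # healthy tasks at t=0
resolver = CachingResolver(ttl_seconds=60)

first = resolver.resolve(now=0, registry=registry)
registry.remove("10.0.1.45")               # task stops and is deregistered
stale = resolver.resolve(now=30, registry=registry)   # still inside the TTL
fresh = resolver.resolve(now=61, registry=registry)   # TTL expired, re-queried

print(stale)  # the stopped task's IP is still in the cached answer
print(fresh)  # only the remaining healthy task
```

In this model the stopped task stays in the client's answer set until the cached entry expires, which is the behavior described above. Lowering the TTL shrinks the window but does not remove it.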

AWS Service Connect (Proxy-based):

Service Connect runs an Envoy-based proxy as a sidecar container in each task. The application is configured to send traffic to a virtual endpoint (e.g., http://service-b:8080). ECS sets up the task so that traffic for this endpoint reaches the local proxy, which forwards it to a backend task. A control plane distributes routing information to all proxies, including the list of healthy tasks and their IP:port combinations. The proxy load-balances requests across those tasks (round-robin by default) and stops sending traffic to endpoints that keep failing. Because routing updates are pushed to the proxy, failover does not have to wait for a DNS TTL to expire.

Service Connect also supports mTLS encryption between proxies, and you can configure timeouts, retries, and circuit breakers at the proxy level.
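
The failover behavior can be sketched as a round-robin balancer whose backend list is updated out of band. This is a simplified model, not Envoy's actual algorithm: `ToyProxy` and `mark_unhealthy` are hypothetical names, and the addresses reuse the example IPs from the DNS output with an assumed port of 8080.

```python
from itertools import count


class ToyProxy:
    """Round-robin balancer that skips backends marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.unhealthy = set()
        self._counter = count()

    def mark_unhealthy(self, backend):
        # Stands in for a routing update pushed by the control plane.
        self.unhealthy.add(backend)

    def pick(self):
        healthy = [b for b in self.backends if b not in self.unhealthy]
        if not healthy:
            raise RuntimeError("no healthy backends")
        return healthy[next(self._counter) % len(healthy)]


backends = ["10.0.1.45:8080", "10.0.2.12:8080", "10.0.3.78:8080"]
proxy = ToyProxy(backends)

before = [proxy.pick() for _ in range(3)]   # one request to each backend
proxy.mark_unhealthy("10.0.2.12:8080")      # backend task fails
after = [proxy.pick() for _ in range(4)]    # failed backend is never chosen

print(before)
print(after)
```

The point of the model is that the very next `pick()` after the update already avoids the failed backend, with no cache to expire on the client side.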

Key Components, Values, Defaults, and Timers

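Cloud Map Namespace:
- Types: HTTP (instances are discovered through API calls only) or DNS (instances are also discoverable through DNS queries). DNS namespaces can be private (resolvable only inside the associated VPC) or public.
- DNS record TTL: 60 seconds by default for ECS service discovery, configurable per record. Lower values shorten the staleness window but increase the number of DNS queries.
- Health checks: Route 53 health checks (HTTP/HTTPS/TCP) are available only for public DNS namespaces. In private namespaces, ECS reports task health to Cloud Map through custom health checks. Unhealthy tasks are removed from DNS responses.

Service Connect Configuration:
- Client alias: The DNS name and port that the client uses to reach the service. The local proxy handles this name inside the task, so no Route 53 lookup is involved.
- Port mapping: You reference a named port from the task definition (portName) and, through the client alias, the port that clients connect to.
- Timeouts: The idle timeout defaults to 5 minutes for HTTP, HTTP/2, and gRPC traffic and to 1 hour for TCP. The per-request timeout defaults to 15 seconds. Both are configurable.
- Health: The proxy sends traffic only to tasks that ECS considers healthy and takes endpoints that keep failing out of rotation.
- mTLS: Requires AWS Private CA (formerly ACM Private CA).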

Configuration and Verification Commands

Creating a Service Discovery namespace:

aws servicediscovery create-private-dns-namespace \
    --name my-namespace.local \
    --vpc vpc-12345678

Creating an ECS service with service discovery:

aws ecs create-service \
    --cluster my-cluster \
    --service-name my-service \
    --task-definition my-task:1 \
    --desired-count 3 \
    --service-registries registryArn=arn:aws:servicediscovery:us-east-1:123456789012:service/srv-xxx

Verifying service discovery records:

aws servicediscovery list-instances --service-id srv-xxx

Enabling Service Connect on an ECS service:

aws ecs create-service \
    --cluster my-cluster \
    --service-name my-service \
    --task-definition my-task:1 \
    --desired-count 3 \
    --service-connect-configuration '{
        "enabled": true,
        "namespace": "arn:aws:servicediscovery:us-east-1:123456789012:namespace/ns-xxx",
        "services": [{"portName": "http", "clientAliases": [{"dnsName": "service-b", "port": 8080}]}]
    }'
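
Writing JSON by hand inside a shell string is easy to get wrong. One option, sketched here in Python under the assumption that you script your deployments, is to build the same configuration as a dictionary and serialize it. The keys mirror the command above; the ARN is the same placeholder, and `shlex.quote` makes the result safe to paste into a shell.

```python
import json
import shlex

service_connect_config = {
    "enabled": True,
    "namespace": "arn:aws:servicediscovery:us-east-1:123456789012:namespace/ns-xxx",
    "services": [
        {
            "portName": "http",  # must match a named port in the task definition
            "clientAliases": [{"dnsName": "service-b", "port": 8080}],
        }
    ],
}

# Serialize once, then quote the whole payload as a single shell argument.
payload = json.dumps(service_connect_config)
print("--service-connect-configuration", shlex.quote(payload))
```

Generating the argument this way keeps the structure reviewable as data and avoids backslash-escaped quotes entirely.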

Verifying Service Connect: Check the proxy logs in CloudWatch or use the ECS console to view the service connect configuration.

Interaction with Related Technologies

Application Load Balancer (ALB): Both service discovery and Service Connect can be used alongside an ALB. For external-facing services, you typically front them with an ALB. For internal microservices communication, you use Service Connect or service discovery. The exam often tests whether to use ALB + service discovery vs. Service Connect alone.

AWS App Mesh: App Mesh is a full service mesh that also uses Envoy proxies. Service Connect is a simpler, ECS-native alternative to App Mesh. App Mesh provides more advanced traffic management (weighted routing, retries, circuit breakers) but requires more configuration. For SAA-C03, Service Connect is the recommended approach for simple inter-service communication within an ECS cluster.

VPC Lattice: AWS VPC Lattice is a newer service that provides service-to-service connectivity across VPCs and accounts, and it can also be used with ECS. Service Connect, by contrast, only connects ECS services that share a Cloud Map namespace.

Performance and Scaling Considerations

Service Discovery: DNS resolution overhead adds latency (typically <10ms in VPC). Caching can reduce this, but TTL-based staleness can cause issues. It scales well because DNS is distributed.

Service Connect: The proxy adds ~1-3ms latency per hop. It scales by adding more proxy sidecars. The control plane updates routing tables quickly (within seconds). For high-throughput applications, ensure the proxy has adequate CPU/memory limits.
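
As a back-of-the-envelope check on the proxy overhead, the sketch below multiplies the per-hop range quoted above by the number of service-to-service hops in a request path. The function name and the four-hop example are illustrative assumptions; real overhead depends on payload size, encryption, and the resources given to the proxy.

```python
def added_proxy_latency_ms(hops, per_hop_ms=(1.0, 3.0)):
    """Return the (low, high) latency added by proxies across `hops` calls."""
    low, high = per_hop_ms
    return hops * low, hops * high


# A request that fans through four internal services in sequence.
low, high = added_proxy_latency_ms(4)
print(f"added latency: {low:.0f}-{high:.0f} ms")
```

For a short call chain the added latency stays in the low milliseconds, but it grows linearly with chain depth, which matters for deeply nested call graphs.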

Walk-Through

1. Enable Cloud Map Namespace

First, you create a Cloud Map namespace. For private DNS, you specify the VPC. Cloud Map creates a private hosted zone in Route 53 associated with the VPC. The namespace name (e.g., 'my-app.local') becomes the DNS suffix for all services. This step is one-time per application environment.

2. Register a Service in Cloud Map

For each ECS service you want to discover, you create a Cloud Map service. You define the DNS record type (A or SRV), TTL, and health check configuration (optional). Cloud Map assigns a unique service ID. This step is also one-time per service.
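
The settings from this step map onto the `DnsConfig` structure that Cloud Map's CreateService API accepts. The sketch below builds that structure in Python and prints it as JSON. The namespace ID `ns-xxx` is a placeholder, and the choice of an SRV record with a 60-second TTL simply mirrors the defaults discussed in this chapter.

```python
import json

dns_config = {
    "NamespaceId": "ns-xxx",          # placeholder; use your namespace ID
    "RoutingPolicy": "MULTIVALUE",    # return multiple healthy records per query
    "DnsRecords": [
        {"Type": "SRV", "TTL": 60},   # SRV carries the port; TTL is in seconds
    ],
}

# Custom health check so ECS can report task health (assumed threshold of 1).
health_check_custom_config = {"FailureThreshold": 1}

print(json.dumps(dns_config, indent=2))
print(json.dumps(health_check_custom_config))
```

The resulting JSON could be passed to `aws servicediscovery create-service` through its `--dns-config` and `--health-check-custom-config` options.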

3. Create ECS Service with Service Discovery

When you create or update an ECS service, you specify the Cloud Map service ARN in the serviceRegistries parameter. ECS automatically registers each task as an instance in Cloud Map. The registration includes the task's private IP and, if using dynamic port mapping, the host port. ECS also deregisters the instance when the task stops.

4. Client Resolves DNS

When a client service needs to call the backend service, it performs a DNS query for the service name (e.g., 'backend.my-app.local'). Route 53 returns up to 8 IP addresses (or SRV records) in random order. The client then opens a TCP connection to one of the IPs. DNS caching at the OS level may reuse the same IP for up to the TTL (default 60 seconds).
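
When SRV records are in use, the client has to extract the port from each answer before connecting. The following Python sketch parses lines in the format printed by `dig +short SRV` (priority, weight, port, target). The sample answers and task hostnames are invented for illustration.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_line(line):
    """Parse one 'priority weight port target' line into an SrvRecord."""
    priority, weight, port, target = line.split()
    # Drop the trailing dot of the fully qualified target name.
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))


# Hypothetical answers for two tasks using dynamically assigned host ports.
sample_answers = """\
1 1 32768 task-a.backend.my-app.local.
1 1 32771 task-b.backend.my-app.local.
"""

records = [parse_srv_line(line) for line in sample_answers.splitlines() if line.strip()]
for record in records:
    print(f"{record.target}:{record.port}")
```

Each target still needs an address lookup to obtain the task's IP, so an SRV-aware client performs two steps where an A-record client performs one.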

5. Service Connect Proxy Interception

With Service Connect, each task has an Envoy proxy sidecar. When the client application sends a request to a configured service name (e.g., 'service-b:8080'), the task's network configuration routes the traffic to the local proxy. The proxy consults its routing table (distributed by the control plane) and forwards the request to a healthy backend task. The proxy also handles retries and timeouts.

What This Looks Like on the Job

Enterprise Scenario 1: E-Commerce Microservices with Service Discovery

A large e-commerce platform runs hundreds of microservices on ECS (Fargate). They initially used service discovery with Cloud Map so that services like 'inventory', 'pricing', and 'orders' could find each other. During flash sales, however, tasks were started and replaced rapidly, and the 60-second DNS TTL caused many client requests to hit stale IPs, which increased error rates. They mitigated this by reducing the TTL to 10 seconds and implementing client-side retry logic. Still, the architecture required each service to handle DNS resolution and connection management. The team later migrated to Service Connect, which removed the staleness issue because routing updates reach the proxies without waiting for a TTL. Performance improved, and error rates dropped by 40%. Configuration: They used a single namespace 'prod.local', and each service had a client alias like 'inventory.prod.local'. The proxies were allocated 256 CPU units and 512 MB memory.

Enterprise Scenario 2: Financial Services with Strict Compliance

A financial institution needed mTLS between all microservices for compliance. They evaluated App Mesh but found it too complex for their 20-service stack. Service Connect provided built-in mTLS with ACM private CA. They configured each service with a client alias and enabled mTLS. The proxies handled certificate rotation automatically. They also used Service Connect's circuit breaker to prevent cascading failures. One misconfiguration: they initially set the idle timeout too low (2 seconds), causing long-running database queries from an API service to be cut off. They increased it to 30 seconds. The team monitors proxy metrics (active connections, request duration) in CloudWatch.

Scenario 3: Hybrid Deployment with Service Discovery and ALB

A media company runs a legacy monolith alongside new microservices on ECS. The monolith is exposed via an ALB. New microservices communicate internally using Service Connect. However, the monolith needs to call a new service. They could not add a proxy to the monolith, so they instead used service discovery: the monolith queries DNS to find the new service. This hybrid approach works, but the monolith's DNS cache (60s) causes occasional connection failures during deployments. They plan to eventually migrate the monolith to ECS with Service Connect.

How SAA-C03 Actually Tests This

Exactly What SAA-C03 Tests

Objective 3.3: Design high-performing and scalable application architectures. Sub-objectives include 'Implement service-to-service communication' and 'Choose between service discovery and service mesh.'

Key concepts tested: Differences between DNS-based discovery and proxy-based connectivity; when to use service discovery vs. Service Connect; integration with Cloud Map; dynamic port mapping; health checks; TTL values; mTLS; and the role of Envoy proxy.

Common Wrong Answers and Why

1. "Service Discovery is always better because it has lower latency." – Wrong. While DNS adds minimal latency, the stale DNS cache can cause failures. Service Connect has slightly higher per-request latency (proxy overhead) but provides faster failover and better resiliency. The exam expects you to choose based on requirements like 'minimize connection failures during deployments' → Service Connect.

2. "Service Connect requires an Application Load Balancer." – Wrong. Service Connect works independently; it does not require an ALB. The ALB is used for external traffic; Service Connect is for internal.

3. "Service Discovery supports mTLS." – Wrong. Service Discovery does not provide encryption between services. You would need to implement mTLS at the application level or use a separate service mesh. Service Connect supports mTLS natively.

4. "Service Connect only works with Fargate." – Wrong. It works with both Fargate and EC2 launch types.

Specific Numbers and Terms

Default DNS TTL: 60 seconds

Service Connect proxy: Envoy

Cloud Map namespace types: DNS (public/private) and HTTP

Service Connect supports up to 1000 services per namespace (soft limit)

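mTLS requires AWS Private CA (formerly ACM Private CA)

Service Connect timeouts: idle timeout defaults to 5 minutes for HTTP, HTTP/2, and gRPC (1 hour for TCP); per-request timeout defaults to 15 seconds; both configurable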

Edge Cases and Exceptions

If you use service discovery with dynamic port mapping, you must use SRV records (not A records). The exam might test that A records only work with static port mapping.

If your client is outside the VPC, it cannot resolve private DNS namespaces. You would need a public namespace or a Route 53 resolver endpoint.

Service Connect does not support cross-account or cross-VPC communication out of the box. For that, you need VPC Lattice or App Mesh.

How to Eliminate Wrong Answers

If the question mentions 'minimize latency' and 'no additional proxy', choose service discovery.

If the question mentions 'fast failover', 'mTLS', or 'reduce application complexity', choose Service Connect.

If the question mentions 'external clients', consider ALB + service discovery or ALB alone.

If the question mentions 'cross-account', look for VPC Lattice or Transit Gateway.
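
These elimination rules can be written down as a small lookup, which some readers find easier to memorize. The Python sketch below is only a study aid: the function name, the keyword strings, and the order in which the rules are checked are assumptions, not an official decision tree.

```python
def pick_option(keywords):
    """Map scenario keywords to the answer family suggested by the rules above."""
    keywords = {k.lower() for k in keywords}
    if "cross-account" in keywords:
        return "VPC Lattice or Transit Gateway"
    if "external clients" in keywords:
        return "ALB (alone or with service discovery)"
    if keywords & {"fast failover", "mtls", "reduce application complexity"}:
        return "Service Connect"
    if {"minimize latency", "no additional proxy"} <= keywords:
        return "Service discovery"
    return "Unclear: re-read the scenario"


print(pick_option({"mTLS", "fast failover"}))
print(pick_option({"minimize latency", "no additional proxy"}))
print(pick_option({"cross-account"}))
```

Checking the cross-account and external-client rules first reflects that those requirements rule out both ECS-internal options before latency or failover even come into play.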

Key Takeaways

ECS Service Discovery uses Cloud Map DNS (A or SRV records) with a default TTL of 60 seconds.

AWS Service Connect uses an Envoy sidecar proxy for transparent service-to-service communication.

Service Discovery requires the client to perform DNS resolution and handle load balancing; Service Connect handles it automatically.

Service Connect supports mTLS using ACM Private CA; Service Discovery does not.

Service Connect provides faster failover because proxy routing tables update immediately, unlike DNS TTL.

Dynamic port mapping with Service Discovery requires SRV records; A records are only for static ports.

Service Connect works with both Fargate and EC2 launch types.

For cross-VPC or cross-account communication, use VPC Lattice or App Mesh, not Service Connect.

The exam often asks you to choose between these based on requirements: low latency vs. fast failover vs. mTLS.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

ECS Service Discovery (DNS-based)

Client-side discovery: client resolves DNS and connects directly.

No additional proxy; minimal overhead (one DNS lookup, typically under 10 ms inside a VPC).

Requires client to handle retries, load balancing, and failover.

DNS caching (TTL) can cause stale connections during scaling.

No built-in mTLS or traffic management.

AWS Service Connect (Proxy-based)

Server-side discovery: proxy handles routing transparently.

Envoy proxy adds ~1-3ms latency per request.

Built-in load balancing, retries, circuit breakers, and mTLS.

Immediate failover via proxy routing table updates.

Tightly integrated with ECS; simpler client code.

Watch Out for These

Mistake

ECS Service Discovery automatically load balances traffic across tasks.

Correct

Service Discovery only provides DNS resolution with round-robin. The client is responsible for load balancing (e.g., connection pooling, retry logic). Service Connect provides actual load balancing via the Envoy proxy.

Mistake

Service Connect requires an Application Load Balancer.

Correct

Service Connect is independent of ALB. ALB is for external traffic; Service Connect is for internal service-to-service communication within the ECS cluster.

Mistake

Service Discovery works across VPCs by default.

Correct

Private DNS namespaces are only resolvable within the VPC they are associated with. For cross-VPC resolution, you need a Route 53 Resolver or a public namespace.

Mistake

Service Connect adds significant latency (10ms+).

Correct

The Envoy proxy adds approximately 1-3ms per hop, which is negligible for most applications. The benefits of fast failover and mTLS outweigh the slight overhead.

Mistake

You can use A records for dynamic port mapping with service discovery.

Correct

Dynamic port mapping requires SRV records because the port is not the standard container port. A records only contain IP addresses and assume a well-known port.

Frequently Asked Questions

Can I use ECS Service Discovery without Cloud Map?

No. ECS Service Discovery is built on AWS Cloud Map. You must create a Cloud Map namespace and service, then reference the service ARN in your ECS service definition. Cloud Map handles registration and deregistration of tasks.

Does Service Connect work with external clients (outside the VPC)?

No. Service Connect is designed for internal service-to-service communication between ECS services that share a namespace. For external clients, you typically use an Application Load Balancer or API Gateway.

What is the default health check interval for Service Connect?

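Service Connect does not expose a configurable health check interval in its service configuration. The proxy routes traffic based on the task health that ECS reports and takes endpoints that keep failing out of rotation. The defaults worth memorizing are the timeouts instead: a 15-second per-request timeout and a 5-minute idle timeout for HTTP traffic.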

Can I use both Service Discovery and Service Connect on the same ECS service?

Yes, but it is not recommended. They serve different purposes. If you enable both, the service will be registered in Cloud Map and also have the proxy sidecar. This might be useful during migration, but typically you choose one.

How does Service Connect handle scaling events?

When a new task starts, the proxy registers with the control plane, and routing tables are updated across all proxies within seconds. When a task stops, the proxy deregisters, and traffic is immediately redirected to other healthy tasks. No DNS TTL delays.

Is there an additional cost for using Service Connect?

Service Connect itself does not have an additional cost beyond the underlying resources (CPU/memory for the proxy sidecar). However, you pay for Cloud Map if you use it for the namespace. There are no per-request charges.

What are the limitations of Service Connect?

Service Connect only connects ECS services that share a namespace, and it does not support cross-account or cross-VPC communication out of the box. It handles HTTP, HTTP/2, gRPC, and plain TCP traffic, but it does not expose every Envoy feature.
