This chapter covers Amazon OpenSearch Service, a managed service for search, log analytics, and real-time application monitoring. For the SAA-C03 exam, OpenSearch Service appears in roughly 5-8% of questions, often integrated with Kinesis Data Firehose, CloudWatch Logs, or Lambda for log ingestion. You must understand its architecture, deployment options, security, and integration patterns to answer scenario-based questions correctly.
Amazon OpenSearch Service is like a massive, automated library card catalog system for a library that holds millions of books, but the books are your log files, application metrics, and searchable data. Instead of a librarian manually filing index cards, OpenSearch automatically ingests data, breaks it into pieces (shards), and distributes those shards across multiple bookshelves (nodes) in a cluster. When you want to find a specific piece of information, you don't walk the entire library; you query the catalog, which knows exactly which shelf holds the relevant shard. The catalog also keeps copies of each card (replicas) on different shelves so if one shelf collapses, you can still find the card. You can also set up automated routines to start a fresh drawer when the current one fills up (rollover), archive old cards to cheaper storage (UltraWarm), or delete them entirely. The entire system is managed by a head librarian (the cluster manager) who decides which shelves get new cards and handles any shelf that goes silent. The analogy maps back as follows: the 'catalog' is the cluster state and routing table, 'shards' are partitions of an index, 'replicas' are copies for fault tolerance, 'UltraWarm' is a warm storage tier using S3 and caching, and the 'head librarian' is the elected master node.
What is Amazon OpenSearch Service?
Amazon OpenSearch Service is a fully managed service that makes it easy to deploy, operate, and scale OpenSearch clusters in the AWS Cloud. OpenSearch is a fork of Elasticsearch 7.10 and Kibana, offering search, analytics, and visualization capabilities. The service is used for:
Log analytics (ingesting and analyzing application logs, infrastructure logs)
Full-text search (e-commerce, content management)
Real-time application monitoring (metrics, traces)
Security information and event management (SIEM)
The SAA-C03 exam focuses on its operational aspects: architecture, data ingestion, security, performance, and cost optimization. You are not expected to write complex OpenSearch queries, but you must know how to set up a cluster, ingest data, secure it, and troubleshoot common issues.
How It Works Internally
An OpenSearch cluster consists of nodes (EC2 instances) that store data and perform indexing and search operations. Data is organized into indices, which are logical namespaces pointing to one or more shards. Shards are the fundamental unit of data distribution — each index is split into a configurable number of primary shards, and each primary shard can have one or more replica shards.
When you send a document to be indexed, the following happens:
1. The coordinating node receives the request and determines which shard the document belongs to, based on a hash of the document's routing value (by default, its ID).
2. The request is forwarded to the node holding the primary shard.
3. The primary shard indexes the document and, if replication is enabled, forwards the request to its replica shards in parallel.
4. Once the replicas confirm, the primary shard sends a success response back to the coordinating node, which responds to the client.
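The routing decision in step 1 can be sketched in a few lines of Python. This is an illustration only: `route_to_shard` is a hypothetical helper, and CRC32 stands in for the hash function, since the engine uses its own hash internally.

```python
import zlib

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard number from a hash of the document's routing value."""
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # CRC32 is a stand-in for illustration; the real engine uses a different hash.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# The same ID always maps to the same shard while the shard count is unchanged.
placement = {
    doc_id: route_to_shard(doc_id, 5)
    for doc_id in ("order-1001", "order-1002", "order-1003")
}
print(placement)
```

Because the result depends on the number of primary shards, changing that number would send existing IDs to different shards. That is why the primary shard count is fixed when an index is created.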
Search requests follow a scatter-gather model: the coordinating node sends the query to one copy of every shard in the index (either the primary or one of its replicas), each shard executes the query locally, and the coordinating node merges the results and returns them.
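A toy model of scatter-gather, assuming each shard is just a dictionary of document ID to text and that scoring is plain term frequency (real relevance scoring is far more involved). The function names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document ID)

def search_shard(docs: Dict[str, str], term: str, k: int) -> List[Hit]:
    """Score documents on one shard by naive term frequency; return the local top k."""
    hits = [(float(text.lower().split().count(term)), doc_id) for doc_id, text in docs.items()]
    return heapq.nlargest(k, [hit for hit in hits if hit[0] > 0])

def scatter_gather(shards: List[Dict[str, str]], term: str, k: int) -> List[Hit]:
    """Run the query on each shard, then merge the local results into a global top k."""
    per_shard = [search_shard(shard, term, k) for shard in shards]  # scatter
    merged = (hit for hits in per_shard for hit in hits)
    return heapq.nlargest(k, merged)                                # gather

shards = [
    {"a1": "error timeout error", "a2": "ok"},
    {"b1": "error in payment"},
]
top = scatter_gather(shards, "error", 2)
print(top)
```

Each shard only returns its own top k, so the coordinating step merges at most k hits per shard rather than every matching document.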
Key Components, Values, and Defaults
Node types:
Data nodes: Store data and execute queries. By default, all nodes are data nodes.
Master nodes: Manage cluster state, track node membership, and coordinate shard allocation. For production, you should have at least 3 dedicated master nodes (separate from the data nodes) to avoid split-brain scenarios.
UltraWarm nodes: Provide a warm storage tier using Amazon S3 and a local cache (node EBS or instance store). Ideal for infrequently accessed data.
Coordinating-only nodes: Act as load balancers, offloading coordination tasks from data nodes. Useful for large clusters.
Instance types: OpenSearch supports a wide range of instances, including general-purpose (e.g., m5, m6g), compute-optimized (c5, c6g), memory-optimized (r5, r6g), and storage-optimized (i3, i3en). For UltraWarm, only dedicated instance types are allowed (ultrawarm1.medium.search and ultrawarm1.large.search).
EBS storage: Data nodes can use EBS volumes (gp2, gp3, io1, io2). The default is gp2. Volume size can range from 10 GiB to 3 TiB per node (some instances allow larger).
Shard sizing: A recommended shard size is between 10 GiB and 50 GiB. The default limit on the number of shards per node is 1,000, but a common guideline is to stay at or below about 20 shards per GiB of heap memory.
Replication: By default, each index has 1 replica (1 primary + 1 replica = 2 copies). You can increase replicas for higher read throughput and fault tolerance.
Snapshot: Automated snapshots are taken every hour by default and retained for 14 days. Manual snapshots can be stored in S3.
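To make the shard sizing and replication defaults above concrete, here is a small arithmetic sketch. The 30 GiB target is an assumed midpoint of the 10-50 GiB range, and both function names are hypothetical.

```python
import math

def primary_shards_for(index_size_gib: float, target_shard_gib: float = 30.0) -> int:
    """Estimate how many primary shards keep each shard near the target size."""
    if index_size_gib <= 0 or target_shard_gib <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gib / target_shard_gib))

def total_shards(primaries: int, replicas: int = 1) -> int:
    """Total shard copies for an index: each primary plus its replicas."""
    return primaries * (1 + replicas)

primaries = primary_shards_for(300)      # a 300 GiB index with ~30 GiB shards
copies = total_shards(primaries, 1)      # default of 1 replica per primary
print(primaries, copies)
```

The estimate is only a starting point; expected growth and the number of data nodes usually shift the final number.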
Configuration and Verification Commands
You can create a domain (cluster) via AWS Management Console, AWS CLI, or CloudFormation.
CLI example to create a domain:
aws opensearch create-domain \
--domain-name my-domain \
--engine-version OpenSearch_2.5 \
--cluster-config InstanceType=r5.large.search,InstanceCount=3,DedicatedMasterEnabled=true,DedicatedMasterType=c5.large.search,DedicatedMasterCount=3 \
--ebs-options EBSEnabled=true,VolumeType=gp2,VolumeSize=100 \
--vpc-options SubnetIds=subnet-abc,subnet-def \
--access-policies '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":"arn:aws:iam::123456789012:role/MyApp"},"Action":"es:*","Resource":"arn:aws:es:us-east-1:123456789012:domain/my-domain/*"}]}'
Check cluster health:
GET _cluster/health
The response includes status (green, yellow, red), number of nodes, active shards, etc.
List indices:
GET _cat/indices?v
View node information:
GET _cat/nodes?v
How It Interacts with Related Technologies
Amazon Kinesis Data Firehose: Commonly used to stream data into OpenSearch. Firehose can transform data via Lambda and then batch-load into an OpenSearch domain. This is a common pattern for log ingestion.
Amazon CloudWatch Logs: You can stream CloudWatch Logs to OpenSearch via a subscription filter that invokes a Lambda function, which pushes the log events to the OpenSearch Service domain endpoint. Alternatively, use Kinesis Data Firehose.
AWS Lambda: Used as a transformation layer before indexing, or as a custom ingestion mechanism.
Amazon S3: Used for snapshots and for UltraWarm storage. You can also load data from S3 using Logstash or custom scripts.
Amazon Cognito: Provides authentication for OpenSearch Dashboards (Kibana) via user pools or identity pools.
AWS IAM: Used for access control via resource-based policies (domain access policies) or identity-based policies.
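As an illustration of the Firehose-plus-Lambda pattern listed above, the sketch below shows the general shape of a transformation function: Firehose passes base64-encoded records, and the function returns each record with its `recordId`, a `result`, and re-encoded `data`. The `level` field and its normalization are made up for the example.

```python
import base64
import json

def lambda_handler(event, context):
    """Normalize JSON log records before Firehose delivers them to the domain."""
    output = []
    for record in event["records"]:
        raw = base64.b64decode(record["data"]).decode("utf-8")
        try:
            doc = json.loads(raw)
            if not isinstance(doc, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Hand unparseable records back unchanged so Firehose can route them as failures.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        # Hypothetical transformation: lower-case the log level, defaulting to "info".
        doc["level"] = str(doc.get("level", "info")).lower()
        encoded = base64.b64encode(json.dumps(doc).encode("utf-8")).decode("ascii")
        output.append({"recordId": record["recordId"], "result": "Ok", "data": encoded})
    return {"records": output}
```

Every incoming `recordId` must appear in the response, which is why failed records are returned rather than skipped.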
Security
Network isolation: Deploy the domain inside a VPC (recommended). Public access is possible but should be restricted, for example with an IP-based or IAM-based access policy or fine-grained access control.
Encryption: At rest using AWS KMS (optional but recommended). In transit using TLS (default).
Authentication: Fine-grained access control (FGAC) for OpenSearch users and roles. Also supports SAML, Cognito, and HTTP basic auth.
Audit logs: Can be enabled (this requires fine-grained access control) to publish request and authentication activity to CloudWatch Logs.
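For a public domain, an IP-restricted resource-based policy might be generated as in the sketch below. The function name, account ID, domain name, and CIDR range are placeholders, not values from any real environment.

```python
import json

def ip_restricted_policy(domain_arn: str, allowed_cidrs: list) -> str:
    """Build a domain access policy that allows HTTP calls only from the given CIDR ranges."""
    if not allowed_cidrs:
        raise ValueError("at least one CIDR range is required")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "es:ESHttp*",
            "Resource": f"{domain_arn}/*",
            "Condition": {"IpAddress": {"aws:SourceIp": allowed_cidrs}},
        }],
    }
    return json.dumps(policy)

# Placeholder ARN and CIDR range for illustration only.
policy_json = ip_restricted_policy(
    "arn:aws:es:us-east-1:123456789012:domain/my-domain",
    ["203.0.113.0/24"],
)
print(policy_json)
```

The resulting string is what would be passed to `--access-policies`. For domains inside a VPC, security groups take over this role, since IP-based policies do not apply there.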
Performance Considerations
Shard count: Too many shards (oversharding) wastes memory and CPU. Too few (undersharding) limits parallelism. The general rule: (Number of data nodes) * (Heap per node in GB) * 20 = maximum shard count.
Instance sizing: Monitor CPU, memory, and disk I/O. Use CloudWatch metrics like CPUUtilization, JVMMemoryPressure, FreeStorageSpace.
UltraWarm: Use for indices older than a certain age (e.g., 30 days) to reduce costs. UltraWarm nodes are cheaper but have higher latency.
Reserved instances: Available for cost savings with 1- or 3-year terms.
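The shard-count rule of thumb above is a budget of roughly 20 shards per GiB of heap, summed over all data nodes. The sketch below works it through, assuming the service gives the JVM half of the instance RAM up to 32 GiB; the function names are hypothetical.

```python
def heap_gib(instance_ram_gib: float) -> float:
    """Approximate JVM heap per node: half the instance RAM, capped at 32 GiB."""
    return min(instance_ram_gib / 2, 32.0)

def max_shards(data_nodes: int, instance_ram_gib: float, shards_per_heap_gib: int = 20) -> int:
    """Cluster-wide shard budget under the shards-per-GiB-of-heap guideline."""
    return int(data_nodes * heap_gib(instance_ram_gib) * shards_per_heap_gib)

# Six r5.large.search data nodes have 16 GiB of RAM each, so about 8 GiB of heap per node.
budget = max_shards(data_nodes=6, instance_ram_gib=16)
print(budget)
```

The budget counts every shard copy, so an index with 10 primaries and 1 replica uses 20 of it.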
Common Pitfalls
Not using dedicated master nodes: Dedicated masters are recommended for every production domain, and they become essential for stability as the number of data nodes grows (for example, beyond 10).
Insufficient replica count: A single replica provides fault tolerance; without it, losing a node means data loss.
Oversharding: Creating an index with many shards (e.g., 100) for a small dataset. This wastes resources.
Ignoring cluster health: A yellow status means some replicas are unassigned; red means missing primary shards.
Create an OpenSearch Domain
In the AWS Management Console, navigate to Amazon OpenSearch Service and create a new domain. Choose the engine version (e.g., OpenSearch 2.5). Configure the cluster: number of nodes, instance types, and dedicated master nodes (3 are recommended for production). Enable EBS storage and choose the volume type and size. Configure the network: either public access (restricted by an access policy) or VPC placement. Set up access policies: either a JSON resource-based policy or an IAM role or user. Enable encryption at rest and in transit as required. This step defines the cluster's baseline capacity and security.
Ingest Data into OpenSearch
Data can be ingested via the OpenSearch API (bulk indexing), Amazon Kinesis Data Firehose, Logstash, or custom applications. For logs, a common pattern is to stream CloudWatch Logs to a Lambda function that formats and pushes to OpenSearch. Firehose can directly put records into OpenSearch with optional transformation. Each document is indexed into a specific index. The service automatically distributes documents across primary shards based on the document ID's hash. After indexing, data is searchable almost immediately (near real-time, with a default refresh interval of 1 second).
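The CloudWatch Logs to Lambda pattern described above comes down to two steps: decoding the subscription payload (base64-encoded, gzip-compressed JSON) and turning its log events into a bulk request body. The sketch below covers only those two steps. The index name and document field names are assumptions, and sending the signed request to the domain is left out.

```python
import base64
import gzip
import json

def decode_subscription_event(event: dict) -> dict:
    """Unpack the base64 + gzip payload that a CloudWatch Logs subscription delivers."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))

def to_bulk_body(payload: dict, index_name: str) -> str:
    """Build a newline-delimited bulk body: one action line plus one document line per event."""
    lines = []
    for log_event in payload["logEvents"]:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": log_event["id"]}}))
        lines.append(json.dumps({
            "@timestamp": log_event["timestamp"],   # epoch milliseconds
            "message": log_event["message"],
            "log_group": payload["logGroup"],
            "log_stream": payload["logStream"],
        }))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

Reusing the log event `id` as the document `_id` means a retried invocation overwrites the same documents instead of creating duplicates.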
Monitor Cluster Health
Use Amazon CloudWatch metrics (CPUUtilization, JVMMemoryPressure, FreeStorageSpace, ClusterStatus.green/yellow/red) and OpenSearch's _cluster/health API. A green status means all primary and replica shards are active. Yellow means all primaries are active but some replicas are unassigned (common during node failures). Red means at least one primary shard is unassigned, so part of the data is unavailable and may be lost if no copy can be recovered. Set up CloudWatch alarms for high JVM memory pressure (>75%) or low free storage. Regularly review shard sizes and reindex if necessary.
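The green/yellow/red rules can be expressed as a small classifier. This is a simplified model: each shard copy is a dictionary with a `primary` flag and a `state`, and anything other than `STARTED` is treated as not active.

```python
from typing import Dict, List

def cluster_status(shards: List[Dict]) -> str:
    """Derive cluster health from shard copies: red beats yellow, yellow beats green."""
    if any(s["primary"] and s["state"] != "STARTED" for s in shards):
        return "red"      # at least one primary is not active
    if any(not s["primary"] and s["state"] != "STARTED" for s in shards):
        return "yellow"   # all primaries active, some replica is not
    return "green"        # every primary and replica is active

shards = [
    {"index": "logs-1", "primary": True, "state": "STARTED"},
    {"index": "logs-1", "primary": False, "state": "UNASSIGNED"},
]
print(cluster_status(shards))
```

A single-node domain with the default of one replica reports yellow under this rule, because a replica cannot be placed on the same node as its primary.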
Implement Index Lifecycle Management
Use Index State Management (ISM) policies to automate index rollovers, transitions, and deletions. For example, create a policy that rolls over an index every 30 days or when it reaches 50 GB, then moves it to UltraWarm after 90 days, and deletes it after 365 days. This reduces manual management and cost. ISM policies are JSON documents applied to an index pattern. They can also perform force merges or change replica counts.
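The example policy described above could look roughly like the following, built here as a Python dictionary and serialized to JSON. The state names, the `logs-*` index pattern, and the priority are assumptions; the thresholds are the ones from the paragraph.

```python
import json

ism_policy = {
    "policy": {
        "description": "Roll over, move to UltraWarm, then delete (illustrative sketch)",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                # Roll over at 30 days or 50 GB, whichever comes first.
                "actions": [{"rollover": {"min_index_age": "30d", "min_size": "50gb"}}],
                "transitions": [{"state_name": "warm", "conditions": {"min_index_age": "90d"}}],
            },
            {
                "name": "warm",
                # Migrate the index to UltraWarm storage.
                "actions": [{"warm_migration": {}}],
                "transitions": [{"state_name": "delete", "conditions": {"min_index_age": "365d"}}],
            },
            {
                "name": "delete",
                "actions": [{"delete": {}}],
                "transitions": [],
            },
        ],
        # Assumed pattern: attach the policy to new indices whose names start with "logs-".
        "ism_template": {"index_patterns": ["logs-*"], "priority": 100},
    }
}

policy_body = json.dumps(ism_policy, indent=2)
print(policy_body)
```

The serialized body is what would be sent when creating the policy through the ISM API. Rollover also needs a write alias configured on the index, which this sketch does not show.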
Secure the Domain
Enable fine-grained access control (FGAC) to manage users, roles, and permissions within OpenSearch. Use the built-in internal user database or integrate with SAML or Amazon Cognito. For network security, place the domain in a VPC with private subnets and use security groups to restrict traffic. With FGAC enabled, turn on audit logs to CloudWatch Logs to track requests to the domain. Use IAM policies to control who can perform administrative actions (e.g., create/delete domains). For OpenSearch Dashboards (Kibana) access, use Cognito or SAML authentication.
Enterprise Scenario 1: Centralized Log Analytics
A large e-commerce company ingests application logs from thousands of microservices running on Amazon ECS. They use Amazon Kinesis Data Firehose to stream logs from CloudWatch Logs into an OpenSearch domain. The domain has 6 r5.large.search data nodes, 3 dedicated master nodes, and 500 GB gp3 EBS volumes per node. They use ISM policies to roll over indices every 24 hours and move old indices to UltraWarm after 30 days. They monitor cluster health with CloudWatch alarms and have a replica count of 2 for critical indices. Common misconfiguration: too few primary shards per index concentrated load on a few nodes (hot spots); they fixed this by raising the primary shard count from 5 to 10 for newly created indices, so that each daily index spreads across more nodes.
Enterprise Scenario 2: Full-Text Search for a CMS
A media company uses OpenSearch to power search for millions of articles. They have a 3-node cluster (c5.large.search) with 1 replica. They use Cognito for Kibana authentication. They periodically reindex to update mappings. Performance issue: they initially used m5 instances, but search latency was high. Switching to c5 (compute-optimized) reduced latency by 40%. They also enabled slow logs to identify expensive queries.
Enterprise Scenario 3: SIEM for Security Logs
A financial services firm ingests VPC Flow Logs, CloudTrail logs, and GuardDuty findings into OpenSearch for security analysis. They use Lambda to transform and push data. The domain is deployed in a VPC with no public access. They use FGAC to restrict access to sensitive indices. They have a 30-day retention for hot data, then move to UltraWarm for 6 months, then snapshot to S3. A critical issue they faced: they forgot to enable replica shards, and a node failure caused data loss. They now enforce a minimum of 1 replica for all indices.
SAA-C03 Exam Focus on Amazon OpenSearch Service
Objective Codes: The topic falls under Domain 3: Design High-Performing Architectures, in the task statements that cover choosing high-performing, scalable storage and data ingestion solutions. Specifically, you need to know how to design a solution for log analytics and search workloads.
Common Wrong Answers:
1. Choosing Amazon ElastiCache for log analytics: Candidates often pick ElastiCache because it's a caching service, but it is not designed for full-text search or complex aggregations. OpenSearch is the correct choice for search and analytics.
2. Selecting Amazon RDS for search: RDS is a relational database, not optimized for full-text search at scale. OpenSearch provides inverted indices and near-real-time search.
3. Assuming OpenSearch is always public: Candidates forget that OpenSearch can be deployed in a VPC for security. The exam often presents a scenario requiring private access.
4. Ignoring UltraWarm for cost optimization: When asked to reduce costs for old data, many choose to delete indices or use S3 directly. UltraWarm is the managed solution for warm data.
Specific Values and Terms:
Default replica count: 1
Recommended shard size: 10-50 GB
UltraWarm uses S3 and local cache
Dedicated master count: minimum 3 for production
Automated snapshot interval: hourly, retained 14 days
ISM policy: for index lifecycle management
Edge Cases:
Multitenancy: Use multiple indices or separate domains. The exam may ask about isolation.
Cross-cluster search: Allows querying multiple domains from a single endpoint. Not commonly tested.
OpenSearch vs Elasticsearch: The exam uses the term OpenSearch Service; know it's the managed version of OpenSearch (fork of Elasticsearch).
Elimination Strategy:
If the question involves log analytics, search, or Kibana dashboards, OpenSearch is likely the answer.
If cost optimization for old data is needed, look for UltraWarm or S3 snapshots.
If security is a concern, look for VPC placement and FGAC.
If the question mentions 'near real-time search' and 'aggregations', it's OpenSearch.
Amazon OpenSearch Service is a managed search and analytics engine, used for log analytics, full-text search, and monitoring.
A domain is a cluster of nodes; for production, use at least 3 dedicated master nodes and 2+ data nodes.
Default replica count is 1; increase for higher read throughput and fault tolerance.
Recommended shard size is 10–50 GB; avoid oversharding (max 20 shards per GB of heap).
UltraWarm nodes provide a cost-effective warm tier using S3 and local cache.
Use Index State Management (ISM) policies to automate index rollover, transition to UltraWarm, and deletion.
Secure with VPC placement, fine-grained access control, encryption at rest/in transit, and audit logs.
Common ingestion patterns: Kinesis Data Firehose, Lambda, Logstash, CloudWatch Logs subscription.
Monitor cluster health (green/yellow/red) and JVM memory pressure via CloudWatch.
OpenSearch Service supports OpenSearch (engine versions 1.x, 2.x) and legacy Elasticsearch versions up to 7.10.
These come up on the exam all the time. Here's how to tell them apart.
Amazon OpenSearch Service
Designed for full-text search, log analytics, and aggregations.
Supports complex queries: term, match, range, aggregations.
Data is indexed and stored on disk (EBS or S3).
Near real-time indexing with 1-second refresh interval.
Supports Kibana/OpenSearch Dashboards for visualization.
Amazon ElastiCache
Designed for caching frequently accessed data (key-value store).
Supports simple lookups by key; no full-text search.
Data is stored in memory (RAM) with optional persistence.
Sub-millisecond latency for cache hits.
Provides Redis or Memcached; no built-in visualization.
Amazon OpenSearch Service
Schema-less; documents can have different fields.
Optimized for search and aggregation queries.
Data is stored in inverted indices.
Scales horizontally by adding shards and nodes.
Not ACID-compliant; eventual consistency.
Amazon RDS
Requires predefined schema (tables, columns).
Optimized for transactional queries (CRUD).
Data is stored in rows and columns.
Scales vertically (larger instances) or with read replicas.
ACID-compliant; strong consistency.
Mistake
OpenSearch Service is the same as Amazon Elasticsearch Service.
Correct
Amazon OpenSearch Service is the successor to Amazon Elasticsearch Service. It supports both OpenSearch (the open-source fork) and legacy Elasticsearch versions up to 7.10. The exam uses the name 'OpenSearch Service' for the managed service.
Mistake
You must manage OpenSearch clusters manually.
Correct
OpenSearch Service is fully managed. AWS handles patching, updates, failover, and backups. You only configure the domain settings and monitor via CloudWatch.
Mistake
OpenSearch can only be accessed over the internet.
Correct
You can deploy OpenSearch inside a VPC, making it private. This is recommended for production workloads. Public access should be restricted, for example with an IP-based or IAM-based access policy.
Mistake
UltraWarm nodes store data on EBS.
Correct
UltraWarm nodes use Amazon S3 for data storage and a local cache (instance store or EBS) for recently accessed data. This reduces cost compared to keeping all data on hot data nodes.
Mistake
Index replicas are only for fault tolerance.
Correct
Replicas also improve read throughput because search queries can be executed on any replica shard. They provide both high availability and performance.
You can stream CloudWatch Logs to OpenSearch using a Lambda function that is triggered by a CloudWatch Logs subscription filter. The Lambda function processes the log events and indexes them into OpenSearch via the bulk API. Alternatively, use Kinesis Data Firehose: configure CloudWatch Logs to stream to a Kinesis stream, which Firehose reads and delivers to OpenSearch. The Lambda approach is simpler for small volumes; Firehose is better for high throughput.
Hot nodes are standard data nodes that store data on EBS volumes, providing low-latency access. Warm nodes (UltraWarm) store data in Amazon S3 and use a local cache for recently accessed data. Warm nodes are cheaper but have higher latency for first access. Use hot nodes for frequently accessed data and warm nodes for older, less frequently accessed data. You can transition indices from hot to warm using ISM policies.
Yes, you can create a public domain that is accessible over the internet. However, you should restrict access, for example with an IP-based access policy that allows only specific IP addresses, an IAM-based policy, or fine-grained access control. For production, it is strongly recommended to deploy the domain inside a VPC for better security. VPC domains are not publicly accessible; you can use a proxy or VPN to access them.
First, register a snapshot repository in Amazon S3 using the OpenSearch API:
PUT _snapshot/my-s3-repository
{ "type": "s3", "settings": { "bucket": "my-bucket", "region": "us-east-1", "role_arn": "arn:aws:iam::..." } }
Then, create a snapshot:
PUT _snapshot/my-s3-repository/snapshot-name
The IAM role must have permissions to write to the S3 bucket. Manual snapshots are incremental and can be restored to a different domain.
The maximum EBS volume size per node depends on the instance type and the volume type: smaller instances are capped at lower sizes, while larger instances support multi-TiB volumes. Check the AWS documentation for the limit of a specific instance type. UltraWarm nodes use a local cache in front of S3, and each node can address a fixed maximum amount of storage (1.5 TiB for ultrawarm1.medium.search and 20 TiB for ultrawarm1.large.search).
OpenSearch automatically detects node failures via the master node (or dedicated masters). If a data node fails, the master reassigns its shards to other nodes. If a primary shard is lost, a replica is promoted to primary. The cluster may be in yellow or red state until shards are recovered. AWS automatically replaces failed nodes in a managed domain.
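Yes, OpenSearch provides near real-time indexing with a default refresh interval of 1 second. This means data is searchable approximately 1 second after ingestion. If a document must be visible to searches immediately, you can pass refresh=true (or refresh=wait_for) on the write request, but forcing a refresh on every write is expensive and not recommended for most use cases. Setting the refresh interval to -1 disables automatic refresh, which helps bulk-load throughput rather than search freshness. For streaming analytics, combine OpenSearch with Kinesis Data Analytics.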