DVA-C02 · Chapter 80 of 101 · Objective 1.3

Amazon OpenSearch Service

This chapter covers Amazon OpenSearch Service, a managed service for real-time search, monitoring, and log analytics. On the DVA-C02 exam, OpenSearch accounts for a small share of questions (this guide estimates roughly 5-8%; AWS does not publish per-service weightings), primarily in the context of integrating search into applications, log analysis, and performance optimization. You will learn how OpenSearch works internally, its key components, configuration options, and common exam scenarios. Mastering this service is essential for building scalable, search-driven applications on AWS.

25 min read
Intermediate
Updated May 31, 2026

OpenSearch as a Library Card Catalog System

Imagine a massive library with millions of books, each containing thousands of words. The books are shelved in no particular order. To find a book containing the phrase "cloud computing," a librarian would have to walk every aisle, open every book, and read every page. That is what scanning a database without an index looks like: slow, and it does not scale.

To solve this, the library builds a card catalog. When a new book arrives, a librarian reads it and creates an index card for each significant word, listing the book's ID and the page numbers where that word appears. The cards are sorted alphabetically in a set of drawers. Now, to find "cloud computing," the librarian pulls the 'cloud' card and the 'computing' card, keeps the books listed on both, and uses the page numbers to confirm the two words appear side by side. The catalog itself lives in a separate, highly organized room built for fast lookups.

This is how Amazon OpenSearch Service works: it ingests documents, analyzes and tokenizes the text, and builds an inverted index mapping terms to document IDs and positions. When a search query arrives, OpenSearch does not scan raw documents; it consults the index, retrieves matching document IDs, scores them by relevance, and returns the results in milliseconds. The library card catalog is the inverted index; the drawers are Apache Lucene segments; and the librarian's ability to find cards rapidly is powered by OpenSearch's distributed architecture across multiple nodes.

How It Actually Works

What is Amazon OpenSearch Service?

Amazon OpenSearch Service is a fully managed service that makes it easy to deploy, operate, and scale OpenSearch clusters in the AWS cloud. OpenSearch is an open-source, distributed search and analytics engine derived from Elasticsearch 7.10. It is used for log analytics, real-time application monitoring, full-text search, and security information and event management (SIEM). The DVA-C02 exam tests your ability to integrate OpenSearch into applications, configure it for performance and security, and troubleshoot common issues.

How OpenSearch Works Internally

At its core, OpenSearch is built on Apache Lucene, a high-performance, full-featured text search engine library. When a document is indexed, OpenSearch performs the following steps:

1. Analysis: The document's text fields are passed through an analyzer that tokenizes the text into terms (tokens) and may apply filters like lowercasing, stemming, or stop word removal.
2. Indexing: Each token is mapped to the document ID and position within the document. This mapping is stored in an inverted index, which allows fast lookups of terms to documents.
3. Sharding: The index is divided into shards, which are distributed across multiple nodes in the cluster. Each shard is a fully functional Lucene index that can be searched independently.
4. Replication: Each primary shard can have one or more replica shards, which are copies of the primary shard. Replicas provide high availability and increase search throughput by allowing searches to run on replicas in parallel.
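The analysis and indexing steps can be sketched in a few lines of Python. This is a toy model, not how Lucene is implemented: the stop-word list, the regex tokenizer, and the function names are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; real analyzers are configurable per field.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "is"}

def analyze(text):
    """Lowercase, split on letters/digits, and drop stop words.

    Returns (position, term) pairs. Positions count every token,
    including removed stop words, so phrase offsets stay meaningful.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [(pos, tok) for pos, tok in enumerate(tokens) if tok not in STOP_WORDS]

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in analyze(text):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def search_all_terms(index, query):
    """Return the sorted IDs of documents that contain every query term."""
    terms = [term for _, term in analyze(query)]
    if not terms:
        return []
    postings = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*postings))

docs = {
    1: "Cloud computing for the enterprise",
    2: "The history of computing",
    3: "Cloud storage and cloud computing basics",
}
index = build_inverted_index(docs)
print(search_all_terms(index, "cloud computing"))
```

The lookup touches only the postings for "cloud" and "computing"; it never rereads the document text, which is the property the card-catalog analogy describes.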

Key Components and Defaults

- Cluster: A collection of one or more nodes that together hold your data and provide federated indexing and search capabilities.
- Node: A single server that is part of a cluster. Nodes come in three types:
  - Data nodes: Store data and execute data-related operations (indexing, searching, aggregations).
  - Master nodes: Manage cluster-wide operations, such as creating/deleting indexes, tracking nodes, and deciding which shards to allocate to which nodes. If a domain has no dedicated master nodes, the data nodes take on these duties in addition to their own work.
  - UltraWarm nodes: Provide a cost-effective way to store large amounts of read-only data using Amazon S3 and a caching layer.
- Index: A collection of documents that share similar characteristics. An index is identified by a name and is analogous to a database in the relational world.
- Document: A basic unit of information that can be indexed. It is expressed in JSON format. Each document has a unique ID within its index.
- Shard: A partition of an index. Each shard is a Lucene index. On Amazon OpenSearch Service, a new index gets 5 primary shards, each with 1 replica, unless you specify otherwise (open-source OpenSearch defaults to 1 primary and 1 replica). The number of primary shards is defined at index creation and cannot be changed later (without reindexing).
- Replica: A copy of a primary shard. Replicas never reside on the same node as their primary shard. The default number of replicas is 1.
- Segment: A Lucene index is composed of segments, which are immutable. When a document is indexed, it is first written to an in-memory buffer and then flushed to disk as a segment. Segments are periodically merged to optimize search performance.
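Why the primary shard count is fixed becomes clearer once you see how a document finds its shard: the engine hashes a routing value (the document ID by default) and takes it modulo the number of primary shards. The sketch below is a simplified model; `zlib.crc32` is a stand-in for the Murmur3 hash OpenSearch uses, and the document IDs are made up.

```python
import zlib

def route_to_shard(routing_value, num_primary_shards):
    """Pick a primary shard as hash(routing) mod number_of_primary_shards.

    zlib.crc32 is a stand-in; OpenSearch uses a Murmur3 hash of the
    routing value, which defaults to the document ID.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

doc_ids = [f"product-{n}" for n in range(1000)]
with_five = {d: route_to_shard(d, 5) for d in doc_ids}
with_six = {d: route_to_shard(d, 6) for d in doc_ids}

# Documents whose shard would change if the primary count went from 5 to 6.
moved = [d for d in doc_ids if with_five[d] != with_six[d]]
print(f"{len(moved)} of {len(doc_ids)} documents would map to a different shard")
```

Changing the divisor sends most documents to a different shard, so lookups by ID would miss existing data. That is why a new shard count means writing the documents again into a new index.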

Configuration and Verification Commands

OpenSearch provides a RESTful API over HTTP. Common operations include:

- Create an index: PUT /my-index with optional settings in the request body.
- Index a document: POST /my-index/_doc with JSON body.
- Search: GET /my-index/_search with query in the request body.
- Get cluster health: GET /_cluster/health returns status (green, yellow, red).
- Get node info: GET /_cat/nodes?v lists nodes with details.
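As a rough illustration, the operations above map onto plain HTTP requests like the following. The sketch only builds the requests with the standard library and never sends them; the endpoint is a hypothetical placeholder, and the shard and replica values are arbitrary examples.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute your own domain endpoint.
ENDPOINT = "https://search-example.us-east-1.es.amazonaws.com"

def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for an OpenSearch REST call."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(
        ENDPOINT + path, data=data, headers=headers, method=method
    )

create_index = build_request(
    "PUT", "/my-index",
    {"settings": {"index": {"number_of_shards": 3, "number_of_replicas": 1}}},
)
index_doc = build_request("POST", "/my-index/_doc", {"title": "hello"})
search = build_request("GET", "/my-index/_search", {"query": {"match_all": {}}})
health = build_request("GET", "/_cluster/health")

for req in (create_index, index_doc, search, health):
    print(req.get_method(), req.full_url)
```

Actually sending these requests would also require whatever authentication the domain's access policy demands, which this sketch leaves out.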

Interaction with Related AWS Services

OpenSearch Service integrates with several AWS services:

- Amazon CloudWatch Logs: You can stream log data from CloudWatch Logs to OpenSearch for real-time analysis using a Lambda subscription filter or a Kinesis Data Firehose delivery stream.
- AWS Lambda: Lambda can be used to process and transform data before indexing into OpenSearch, or to trigger actions based on search results.
- Amazon Kinesis Data Firehose: Firehose can deliver streaming data directly to OpenSearch, providing near-real-time ingestion.
- AWS Identity and Access Management (IAM): IAM roles and policies control access to the OpenSearch API. Fine-grained access control can be implemented using OpenSearch's built-in security plugin.
- Amazon S3: Index snapshots can be stored in S3 for backup and restore. UltraWarm nodes use S3 as the primary storage layer.
- Amazon VPC: OpenSearch domains can be deployed within a VPC for network isolation, accessible only via private IP addresses.
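For the CloudWatch Logs to Lambda path, the first job of the function is to unpack the subscription payload, which arrives base64-encoded and gzip-compressed. The sketch below shows only that decoding step; the handler name, the output field names, and the sample event are assumptions, and the indexing call itself is left out.

```python
import base64
import gzip
import json

def decode_cloudwatch_logs_event(event):
    """Decode a CloudWatch Logs subscription event into indexable documents."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    # Control messages are connectivity checks and carry no log data.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    return [
        {
            "event_id": log_event["id"],
            "@timestamp": log_event["timestamp"],  # epoch milliseconds
            "log_group": payload["logGroup"],
            "log_stream": payload["logStream"],
            "message": log_event["message"],
        }
        for log_event in payload["logEvents"]
    ]

def handler(event, context):
    """Hypothetical Lambda entry point; sending docs to OpenSearch is omitted."""
    docs = decode_cloudwatch_logs_event(event)
    return {"decoded": len(docs)}

# Local check with a hand-built event in the subscription payload shape.
sample_payload = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "/aws/lambda/orders",
    "logStream": "2026/05/31/[$LATEST]abc",
    "logEvents": [{"id": "1", "timestamp": 1780000000000, "message": "ERROR timeout"}],
}
sample_event = {
    "awslogs": {
        "data": base64.b64encode(
            gzip.compress(json.dumps(sample_payload).encode("utf-8"))
        ).decode("ascii")
    }
}
print(handler(sample_event, None))
```

In a real function the decoded documents would typically be sent in a single bulk request rather than one request per log event.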

Performance Considerations

Shard size: The recommended shard size is between 10 GB and 50 GB. Too many small shards waste resources; too few large shards slow down recovery.

Instance types: Memory-optimized instances (e.g., R5, R6g) are recommended for search-heavy workloads, while compute-optimized instances (e.g., C5, C6g) are suitable for indexing-heavy workloads.

UltraWarm: For long-term storage of older, less frequently accessed data, UltraWarm nodes provide a cost-effective option. Data is stored in S3 and cached on local nodes.

Dedicated master nodes: AWS recommends three dedicated master nodes for every production domain, whatever the data node count. Offloading cluster management from the data nodes improves stability, and three nodes keep a voting majority if one of them fails.

Snapshot and restore: Automated snapshots are taken every hour by default and retained for 14 days. Manual snapshots can be stored in S3.
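The shard-size guidance above can be turned into a rough primary shard count with a common sizing rule of thumb: (source data + room to grow) × (1 + indexing overhead) ÷ target shard size. The function below is a sketch under assumed defaults (10% indexing overhead, 30 GB target shards); treat the result as a starting point to validate with real data, not a guarantee.

```python
import math

def estimate_primary_shards(source_gb, growth_gb=0.0,
                            indexing_overhead_pct=10, target_shard_gb=30.0):
    """Approximate primary shard count.

    (source data + room to grow) * (1 + indexing overhead) / target shard size.
    The 10% overhead and 30 GB target are assumed defaults for illustration.
    """
    if target_shard_gb <= 0:
        raise ValueError("target_shard_gb must be positive")
    index_size_gb = (source_gb + growth_gb) * (100 + indexing_overhead_pct) / 100
    return max(1, math.ceil(index_size_gb / target_shard_gb))

# 200 GB today plus 100 GB of expected growth, aiming for ~30 GB shards.
print(estimate_primary_shards(200, growth_gb=100))
# 1 TB of logs with 50 GB shards.
print(estimate_primary_shards(1000, target_shard_gb=50))
```

Because the primary count cannot be changed in place, it is worth running this kind of estimate before creating the index rather than after.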

Common Exam Scenarios

1. Log analysis: You need to analyze application logs in real-time. The solution involves streaming logs from CloudWatch Logs to OpenSearch via a Lambda function or Firehose.
2. Full-text search: You are building an e-commerce application that requires fast product search. You index product data into OpenSearch and query using match queries.
3. Performance tuning: A search application is slow. You need to optimize by adjusting shard count, using appropriate instance types, or enabling request caching.
4. Security: You need to restrict access to the OpenSearch endpoint. Options include IAM policies, resource-based policies, IP-based access policies, or VPC placement.

Exam Traps

Confusing OpenSearch with Amazon Elasticsearch Service: Amazon OpenSearch Service is the successor to (and the new name of) Amazon Elasticsearch Service. The exam may refer to both, but OpenSearch Service is the current name.

Shard count cannot be changed: Once an index is created, the number of primary shards is fixed. To change it, you must reindex the data.

Replicas can be changed dynamically: Unlike primary shards, the number of replicas can be updated at any time using the _settings API.

OpenSearch does not support SQL by default: While OpenSearch has SQL support via a plugin, the exam may test the use of the RESTful API for querying.

UltraWarm nodes are for read-only data: They cannot be used for indexing new data. They are ideal for time-series data that is no longer being updated.

Verification Commands

Check cluster health: curl -XGET '<domain-endpoint>/_cluster/health'

List indices: curl -XGET '<domain-endpoint>/_cat/indices?v'

Search: curl -XGET '<domain-endpoint>/my-index/_search' -H 'Content-Type: application/json' -d '{"query": {"match": {"field": "value"}}}'

Walk-Through

1. Create an OpenSearch Domain

Log into the AWS Management Console, navigate to Amazon OpenSearch Service, and click 'Create domain'. Choose a domain name (e.g., 'my-search-domain'), select the deployment type (development or production), and choose the version of OpenSearch. For production, enable dedicated master nodes and set the number of data nodes to at least 3 for high availability. Configure storage: choose instance type (e.g., r5.large.search), storage type (EBS or instance store), and size (e.g., 100 GB per node). Set up network configuration: choose VPC placement for internal access or public access with IP-based policies. Finally, configure access policies using IAM or resource-based policies. Click 'Create' and wait for the domain to initialize (typically 15-30 minutes).

2. Index Documents into OpenSearch

Once the domain is active, obtain the endpoint URL from the console (e.g., https://my-search-domain-1234567890.us-east-1.es.amazonaws.com). Use a tool like curl or the OpenSearch client library to index documents. For example, to index a product document: `curl -XPOST 'https://<endpoint>/products/_doc/1' -H 'Content-Type: application/json' -d '{"name": "laptop", "price": 999.99, "description": "High-performance laptop"}'`. The index 'products' is created automatically if it does not exist. OpenSearch returns a response with the document ID and index result. Repeat for multiple documents. Documents are searchable in near real time rather than instantly: they become visible after the next index refresh, which runs every second by default, so a search issued immediately after indexing may miss them.

3. Search Documents Using Query DSL

To search for documents, send a GET request to the search endpoint: `curl -XGET 'https://<endpoint>/products/_search' -H 'Content-Type: application/json' -d '{"query": {"match": {"description": "laptop"}}}'`. OpenSearch returns a JSON response with hits, including the total number of matches and the documents with relevance scores. The query DSL supports many query types: match, term, range, bool, and others. For aggregations, use the `aggs` field to compute metrics like average price. The search is distributed: the coordinating node sends the query to one copy (primary or replica) of every shard in the index, merges the per-shard results, and returns the top hits.
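A search body that combines a scored match, an unscored filter, and an aggregation might be assembled like this. The field names follow the product example in this walk-through; the function name and the price ceiling are assumptions for illustration.

```python
import json

def build_product_search(text, max_price=None, size=10):
    """Build a search body: scored full-text match plus an unscored price filter."""
    bool_query = {"must": [{"match": {"description": text}}]}
    if max_price is not None:
        # Filter context: narrows results without affecting relevance scores.
        bool_query["filter"] = [{"range": {"price": {"lte": max_price}}}]
    return {
        "size": size,
        "query": {"bool": bool_query},
        "aggs": {"average_price": {"avg": {"field": "price"}}},
    }

body = build_product_search("laptop", max_price=1500)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the `-d` payload of the `_search` call shown above.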

4. Monitor Cluster Health and Performance

Regularly check cluster health using `curl -XGET 'https://<endpoint>/_cluster/health'`. The status is green (all shards allocated), yellow (all primaries allocated, but one or more replicas not allocated), or red (one or more primary shards not allocated). Use `_cat/nodes?v` to view node statistics. For performance, watch the CloudWatch metrics the domain publishes, such as CPUUtilization, JVMMemoryPressure, FreeStorageSpace, and SearchableDocuments. Set up alarms for high memory pressure (above 80%) or low storage (less than 20% free). Use OpenSearch Dashboards (the successor to Kibana, included with each domain) to visualize logs and metrics in real-time.
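A monitoring script might interpret the health response along these lines. This is a minimal sketch: the sample response is trimmed and its values are made up, and the wording of the findings is my own.

```python
import json

def summarize_health(health):
    """Turn a _cluster/health response into a short, human-readable finding."""
    status = health["status"]
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return "green: all primary and replica shards are allocated"
    if status == "yellow":
        return f"yellow: primaries are allocated, {unassigned} replica shard(s) unassigned"
    if status == "red":
        return f"red: at least one primary shard is unassigned ({unassigned} unassigned in total)"
    raise ValueError(f"unexpected status: {status!r}")

# Trimmed sample response; values are made up for illustration.
sample = json.loads("""
{"cluster_name": "example", "status": "yellow", "number_of_nodes": 1,
 "active_primary_shards": 5, "active_shards": 5, "unassigned_shards": 5}
""")
print(summarize_health(sample))
```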

5. Optimize Indexing and Search Performance

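To improve indexing throughput, use bulk requests: send multiple documents in one API call using the `_bulk` endpoint. The body is newline-delimited JSON, and every line, including the last, must end with a real newline. For example, in bash: `curl -XPOST 'https://<endpoint>/_bulk' -H 'Content-Type: application/json' --data-binary $'{"index": {"_index": "products", "_id": "2"}}\n{"name": "mouse", "price": 29.99}\n{"index": {"_index": "products", "_id": "3"}}\n{"name": "keyboard", "price": 79.99}\n'`. The `$'...'` quoting turns each `\n` into a newline; inside plain single quotes the shell would send a literal backslash-n and the request would be rejected. Tune the refresh interval: set `index.refresh_interval` to -1 during bulk indexing to disable refresh, then set it back to 1s. For search performance, make sure the shard request cache is on for frequently repeated queries (`index.requests.cache.enable: true`, which is the default); by default it caches only requests with `size=0`, such as aggregations. Use filter context instead of query context for non-scoring filters to leverage caching. Consider using index aliases to manage reindexing without downtime.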
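Building the bulk payload in code avoids the shell-quoting pitfalls of a hand-written request. The helper below is a sketch; the function names and the batch size of 2 are arbitrary choices for illustration.

```python
import json

def build_bulk_body(index_name, docs):
    """Serialize (doc_id, document) pairs as an NDJSON _bulk payload."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk API requires a newline after the last line as well.
    return "\n".join(lines) + "\n"

def chunked(items, size):
    """Yield successive batches so each bulk request stays a manageable size."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

products = [
    ("2", {"name": "mouse", "price": 29.99}),
    ("3", {"name": "keyboard", "price": 79.99}),
    ("4", {"name": "monitor", "price": 199.99}),
]
for batch in chunked(products, 2):
    print(build_bulk_body("products", batch), end="")
```

Each action line is followed by its document line, and the response to a real bulk call reports success or failure per item, so callers should inspect it rather than assume every document was indexed.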

What This Looks Like on the Job

Enterprise Scenario 1: Centralized Log Analytics

A large e-commerce company uses Amazon OpenSearch Service to aggregate logs from thousands of microservices. Each service writes logs to CloudWatch Logs. A Lambda function is triggered by CloudWatch Logs subscription filters, processes the log entries, and indexes them into an OpenSearch cluster with three data nodes (r5.xlarge.search) and three dedicated master nodes. The cluster handles over 10 million log entries per day. Developers use OpenSearch Dashboards to create dashboards for error rates, latency percentiles, and user activity. Common issues include high JVM memory pressure due to inefficient indexing (fixed by using bulk requests and adjusting the refresh interval) and slow searches due to too many shards (mitigated by using time-based indices with rollover, managed through Index State Management policies).

Enterprise Scenario 2: Full-Text Search for a Content Platform

A media streaming service uses OpenSearch to power search across its catalog of movies and TV shows. Catalog metadata (title, description, actors, genre) is indexed into an OpenSearch domain with 6 data nodes (c5.2xlarge.search) and 2 replicas per primary shard. The search endpoint is integrated into the mobile app via API Gateway and Lambda. To handle peak traffic of 5,000 queries per second, they use a dedicated cluster with appropriate shard sizing (20 primary shards per index, up to 50 GB each). Because a provisioned domain does not scale its data nodes automatically, they resize it with their own automation driven by CPU utilization alarms. Challenges include relevance tuning with custom scoring scripts and handling partial matches with n-gram and edge n-gram tokenizers. A misconfigured shard count once led to slow recovery after a node failure; they resolved it by reindexing into a new index with a corrected shard count, since restoring a snapshot preserves the original shard layout.

Enterprise Scenario 3: SIEM for Security Operations

A financial services company uses OpenSearch as a SIEM to analyze security logs from firewalls, IDS, and servers. Logs are streamed via Amazon Kinesis Data Firehose directly into OpenSearch. The cluster uses UltraWarm nodes for logs older than 30 days, reducing storage costs by 70%. Security analysts run complex aggregations to detect anomalies. The cluster is deployed in a VPC with no public access, and access is controlled via IAM and fine-grained access control. Common pitfalls: not enabling fine-grained access control leads to unauthorized access; not sizing UltraWarm nodes correctly causes cache evictions and slow queries. They also use Cross-Cluster Search to query across multiple domains for historical analysis.

How DVA-C02 Actually Tests This

What DVA-C02 Tests on Amazon OpenSearch Service

DVA-C02 Task 1.3 covers using data stores in application development, which is where OpenSearch fits. Exam questions focus on:

- Integration patterns: How to stream data into OpenSearch (CloudWatch Logs → Lambda → OpenSearch, Kinesis Firehose → OpenSearch).
- Security: IAM policies, resource-based policies, VPC placement, fine-grained access control.
- Performance optimization: Shard sizing, instance types, UltraWarm, request caching.
- High availability: Dedicated master nodes, replica shards, multi-AZ.
- Troubleshooting: Cluster health (green/yellow/red), memory pressure, slow queries.

Common Wrong Answers and Why

1. "Use Amazon RDS for full-text search": RDS is a relational database, not optimized for full-text search at scale. OpenSearch provides near-real-time search with inverted indexes.
2. "Change the number of primary shards dynamically": Primary shard count is fixed at index creation. Changing it requires reindexing. Replicas can be changed dynamically.
3. "OpenSearch supports SQL natively": While there is a SQL plugin, the native API is RESTful JSON (Query DSL). The exam expects familiarity with the REST API.
4. "UltraWarm nodes are for indexing new data": UltraWarm nodes are read-only and use S3 storage. They are for older, less frequently accessed data.

Specific Numbers and Terms

Default number of primary shards: 5 per index on Amazon OpenSearch Service (open-source OpenSearch defaults to 1).

Default number of replica shards: 1 per primary shard.

Recommended shard size: 10-50 GB.

Automated snapshot interval: every hour, retained for 14 days.

Dedicated master nodes: three recommended for every production domain; an odd number keeps a clear voting majority.

Cluster health statuses: green, yellow, red.

Edge Cases and Exceptions

Indexing into a non-existent index: OpenSearch auto-creates the index with default settings. This can lead to suboptimal shard count. Always create indices explicitly.

Network partition: If nodes lose connectivity, each side of the partition may try to elect a master. With three dedicated master nodes, only the side that holds a majority can do so, which prevents split-brain.

High memory pressure: Caused by large aggregations or heavy indexing. Use circuit breakers (e.g., indices.breaker.total.use_real_memory).

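Search timeout: OpenSearch applies no search timeout by default. Set the timeout parameter on the request (for example, "timeout": "5s") so a slow query returns the partial results gathered so far instead of running unbounded.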
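The network-partition case above comes down to majority voting among master-eligible nodes. The arithmetic below is a simplified model of that rule, not the actual election protocol, and the function names are my own.

```python
def quorum(master_eligible_nodes):
    """Votes needed to elect a master: a strict majority."""
    return master_eligible_nodes // 2 + 1

def tolerated_failures(master_eligible_nodes):
    """How many master-eligible nodes can be lost while keeping a quorum."""
    return master_eligible_nodes - quorum(master_eligible_nodes)

for n in (1, 2, 3, 5):
    print(f"{n} master-eligible: quorum={quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Two master nodes tolerate no more failures than one, which is why three is the usual minimum: it is the smallest count that survives the loss of a node while still forming a majority.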

How to Eliminate Wrong Answers

If the question involves real-time log analysis, eliminate services that are not designed for search (e.g., S3, DynamoDB).

If the question asks about changing shard count, look for answers involving reindexing or creating a new index.

If security is the focus, consider VPC placement and IAM over IP-based policies for fine-grained control.

If performance is the issue, think about shard size, instance type, and caching before scaling out the cluster.

Key Takeaways

Amazon OpenSearch Service is a managed search and analytics engine based on Apache Lucene.

An OpenSearch index is composed of primary shards (fixed at creation) and replica shards (dynamic).

Default shard count on Amazon OpenSearch Service: 5 primaries, each with 1 replica (open-source OpenSearch defaults to 1 and 1). Recommended shard size: 10-50 GB.

Three dedicated master nodes are recommended for every production domain.

UltraWarm nodes provide cost-effective storage for read-only data using S3.

Automated snapshots occur every hour and are retained for 14 days.

Common integration patterns: CloudWatch Logs → Lambda → OpenSearch, Kinesis Firehose → OpenSearch.

Cluster health statuses: green (all shards allocated), yellow (replicas not allocated), red (primary shards missing).

Security options: IAM policies, resource-based policies, VPC placement, fine-grained access control.

Performance tuning: use bulk requests, adjust refresh interval, enable request caching, use filter context.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Amazon OpenSearch Service

Designed for full-text search and log analytics

Uses inverted indexes for fast search

Supports complex aggregations and scoring

Not ACID compliant; eventual consistency

Query language is RESTful JSON (Query DSL)

Amazon DynamoDB

Designed for key-value and document storage

Uses primary key and secondary indexes

Supports simple queries and transactions

ACID compliant with DynamoDB Transactions

Query language is a mix of JSON and expression syntax

Amazon OpenSearch Service

Provides real-time search and analytics

Supports custom indexing and mapping

Can store and search any JSON document

Integrates with many AWS services

Requires separate domain and cluster management

Amazon CloudWatch Logs Insights

Designed for querying CloudWatch log groups

Limited to log data stored in CloudWatch

Uses a purpose-built query language with piped commands (fields, filter, stats, sort)

No separate cluster to manage

Automatic scaling and no capacity planning

Watch Out for These

Mistake

OpenSearch is a relational database and can replace RDS.

Correct

OpenSearch is a search and analytics engine, not a relational database. It does not support ACID transactions or joins. It is optimized for full-text search and aggregations.

Mistake

You can change the number of primary shards after index creation.

Correct

Primary shard count is fixed at index creation. To change it, you must reindex the data into a new index with the desired shard count.

Mistake

OpenSearch automatically scales indefinitely without configuration.

Correct

A provisioned OpenSearch Service domain does not auto-scale. You resize it by changing the instance type, node count, or storage in the domain configuration, either manually or through your own automation. UltraWarm provides cost-effective scaling for read-only data.

Mistake

UltraWarm nodes can be used for indexing new data.

Correct

UltraWarm nodes are read-only and store data in S3. They are designed for infrequently accessed data. Indexing requires hot data nodes.

Mistake

OpenSearch only supports JSON over HTTP.

Correct

While the primary API is RESTful JSON, OpenSearch also supports a SQL plugin and Python/Java clients. However, the exam focuses on the REST API.

Frequently Asked Questions

How do I stream CloudWatch Logs to Amazon OpenSearch Service?

You can stream CloudWatch Logs to OpenSearch using a Lambda function. Create a subscription filter in CloudWatch Logs that triggers a Lambda function. The Lambda function receives log events, processes them (e.g., parses JSON), and indexes them into OpenSearch using the REST API. Alternatively, you can use Amazon Kinesis Data Firehose with a CloudWatch Logs subscription filter as the source and OpenSearch as the destination.

Can I change the number of primary shards in an existing index?

No, the number of primary shards is set at index creation and cannot be changed. To change it, you must create a new index with the desired shard count and reindex the data from the old index using the Reindex API or by indexing documents again. You can use index aliases to switch traffic seamlessly.
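The reindex-and-switch approach might be sequenced like this. The sketch only builds the request bodies; the index names, the alias, and the shard count of 6 are hypothetical.

```python
def build_reindex_body(source_index, dest_index):
    """Body for POST /_reindex: copy every document from source to dest."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}

def build_alias_swap(alias, old_index, new_index):
    """Body for POST /_aliases: both actions are applied atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

# Hypothetical names: products-v1 has the wrong shard count, products-v2 fixes it.
steps = [
    ("PUT", "/products-v2", {"settings": {"index": {"number_of_shards": 6}}}),
    ("POST", "/_reindex", build_reindex_body("products-v1", "products-v2")),
    ("POST", "/_aliases", build_alias_swap("products", "products-v1", "products-v2")),
]
for method, path, body in steps:
    print(method, path, body)
```

Because applications query the `products` alias rather than a concrete index, the final step moves traffic to the new index without any client change.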

What does cluster health status 'yellow' mean?

A yellow status means that all primary shards are allocated, but some replica shards are not allocated. This typically occurs when there are not enough nodes to place replicas (replicas cannot be on the same node as their primary shard). It can also indicate that the cluster is still recovering. The cluster is operational but not fully resilient to node failures.
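A toy allocator makes the rule concrete: a replica is left unassigned whenever no node is free of other copies of the same shard. This is a deliberately simplified model (real allocation also weighs disk usage, zone awareness, and balance), and the function is my own illustration.

```python
def allocate(num_nodes, primaries, replicas_per_primary):
    """Toy allocator: never place two copies of a shard on the same node.

    Returns (status, unassigned_replica_count).
    """
    if num_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    unassigned = 0
    for shard in range(primaries):
        holders = {shard % num_nodes}  # node that holds the primary
        for _ in range(replicas_per_primary):
            free_nodes = [n for n in range(num_nodes) if n not in holders]
            if free_nodes:
                holders.add(free_nodes[0])
            else:
                unassigned += 1
    return ("green" if unassigned == 0 else "yellow"), unassigned

print(allocate(num_nodes=1, primaries=5, replicas_per_primary=1))
print(allocate(num_nodes=2, primaries=5, replicas_per_primary=1))
print(allocate(num_nodes=2, primaries=5, replicas_per_primary=2))
```

A single-node domain with one replica configured is the classic cause of a permanently yellow cluster; adding a node or setting replicas to 0 clears it.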

How do I secure my OpenSearch domain?

You can secure your domain using: (1) IAM policies to control access to the OpenSearch API, (2) resource-based policies attached to the domain, (3) IP-based access policies in the domain access policy, (4) placing the domain inside a VPC for network isolation, and (5) enabling fine-grained access control with the built-in security plugin for user-level permissions.
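As an example of option (2), a resource-based domain access policy that lets a single IAM role issue HTTP GET and POST requests could be generated as follows. The account ID, role name, and domain name are placeholders, and the choice of actions is an assumption about what a search application needs.

```python
import json

def build_domain_access_policy(account_id, region, domain_name, role_name):
    """Resource-based policy letting one IAM role call HTTP GET/POST on a domain."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:role/{role_name}"},
                "Action": ["es:ESHttpGet", "es:ESHttpPost"],
                "Resource": f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*",
            }
        ],
    }

# Placeholder account ID, role, and domain names.
policy = build_domain_access_policy(
    "123456789012", "us-east-1", "my-search-domain", "search-app-role"
)
print(json.dumps(policy, indent=2))
```

Leaving out `es:ESHttpDelete` and `es:ESHttpPut` keeps the role from deleting or creating indexes, which is a reasonable least-privilege starting point for a read-mostly application.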

What is UltraWarm and when should I use it?

UltraWarm is a storage tier for Amazon OpenSearch Service that uses Amazon S3 combined with a caching layer to provide cost-effective storage for read-only data. It is ideal for time-series data that is older and less frequently accessed, such as logs older than 30 days. UltraWarm nodes cannot be used for indexing new data; they are for search and analysis only.

How do I optimize search performance in OpenSearch?

Optimize search performance by: (1) using appropriate instance types (memory-optimized for search-heavy workloads), (2) sizing shards between 10-50 GB, (3) enabling request caching for frequently used queries, (4) using filter context instead of query context for non-scoring filters, (5) using index aliases to manage reindexing, and (6) tuning the refresh interval during bulk indexing.

What is the difference between OpenSearch and Elasticsearch?

OpenSearch is an open-source fork of Elasticsearch 7.10, created after Elastic changed its license. Amazon OpenSearch Service is the managed service that supports both OpenSearch and legacy Elasticsearch versions. For new domains, OpenSearch is the default. The APIs are largely compatible, but OpenSearch includes additional features like fine-grained access control and alerting.
