CS0-003 | Chapter 41 of 100 | Objective 1.2

Elastic Stack (ELK) for Log Analysis

This chapter covers the Elastic Stack (formerly ELK) for log analysis, a core toolset for security operations. On the CS0-003 exam, approximately 10-15% of questions in Domain 1 (Security Operations) involve log analysis and SIEM-like tools, with the Elastic Stack being one of the most commonly tested platforms. You will be expected to understand its components, how they work together, and how to interpret log data to detect security incidents. This chapter provides the depth needed to answer scenario-based questions about log ingestion, indexing, searching, and visualization.

25 min read
Intermediate
Updated May 31, 2026

ELK Stack as a Forensic Lab

Imagine a forensic lab in a police station. Evidence (logs) arrives from various crime scenes (servers, applications, network devices). The lab has three main sections: First, the intake desk (Logstash) receives evidence in many formats—typed reports, handwritten notes, audio recordings. The intake officer normalizes everything: transcribes audio to text, converts handwritten notes to typed format, and stamps each piece with a timestamp and source ID. This standardized evidence is then stored in a massive evidence locker (Elasticsearch), where each item is indexed by case number, date, type, and keywords. Detectives (Kibana) can then search the locker using a simple interface—they type a suspect's name and instantly see all related evidence, even from different cases. They can also create dashboards showing patterns, like a map of crime locations over time. Without the intake desk, evidence would be unusable; without the locker, it would be lost; without the interface, detectives couldn't find connections. Similarly, the ELK stack ingests disparate logs, stores them in a searchable index, and provides visualization for security analysts to detect threats.

How It Actually Works

What is the Elastic Stack and Why Does It Exist?

The Elastic Stack (formerly known as ELK Stack) is a collection of open-source tools designed for ingesting, storing, searching, and visualizing log data. It was created to address the challenge of centralized log management in distributed systems. Before tools like ELK, administrators had to SSH into each server and grep through log files individually—a slow, error-prone process. The Elastic Stack provides a unified platform where logs from hundreds or thousands of sources can be collected, normalized, and made searchable in near real-time.

The core components are:

- Elasticsearch: A distributed, RESTful search and analytics engine based on Apache Lucene. It stores data as JSON documents in indices and provides powerful full-text search capabilities.
- Logstash: A server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and sends it to a stash like Elasticsearch.
- Kibana: A visualization layer that works on top of Elasticsearch, providing dashboards, charts, and a query interface.

The stack was originally called ELK (Elasticsearch, Logstash, Kibana). With the addition of Beats (lightweight data shippers), it was rebranded as the Elastic Stack. On the exam, you may still see the term "ELK" used interchangeably.

How It Works Internally – Step Through the Mechanism

The Elastic Stack operates in a pipeline fashion:

1. Data Ingestion: Data enters the stack through Beats (e.g., Filebeat for log files, Metricbeat for metrics) or directly into Logstash. Beats are lightweight agents installed on source machines. They read log files, collect metrics, or capture network data, and send it to Logstash or Elasticsearch.

2. Data Processing (Logstash): Logstash uses a pipeline with three stages: inputs, filters, and outputs.
   - Inputs: Define where data comes from (e.g., beats, syslog, TCP/UDP).
   - Filters: Process and transform data. Common filters include:
     - grok: Parses unstructured log data into structured fields using predefined patterns.
     - mutate: Performs general field transformations like renaming, removing, or converting data types.
     - date: Parses timestamps to use for event ordering.
     - geoip: Adds geographic location data from IP addresses.
   - Outputs: Define where processed data goes (e.g., Elasticsearch, file, stdout).

3. Indexing and Storage (Elasticsearch): Elasticsearch receives JSON documents and indexes them. An index is a collection of documents with similar characteristics. Each document contains fields with values. Elasticsearch uses an inverted index to enable fast full-text searches. The default number of shards per index is 1 (since version 7.0, previously 5) and replicas per index is 1. Shards are the building blocks of the index; they distribute data across nodes in a cluster.

4. Search and Visualization (Kibana): Kibana connects to Elasticsearch via its REST API. Users can query indices using the Kibana Query Language (KQL) or Lucene syntax. Results can be displayed in tables, charts, maps, or dashboards. Kibana also provides a Discover tab for ad-hoc exploration and a Dashboard tab for saved visualizations.
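The processing, storage, and search stages described above can be mimicked in a few lines of plain Python. This is only an illustrative sketch, not how Logstash or Elasticsearch are implemented: the regular expression stands in for a grok pattern, a dictionary of sets stands in for the inverted index, and the names `parse_line`, `InvertedIndex`, the `_parsefailure` tag, and the sample log line are all hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Simplified stand-in for a grok pattern: named groups become fields.
ACCESS_LOG = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) [^"]*" (?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Filter stage: turn one raw line into a structured document."""
    match = ACCESS_LOG.match(line)
    if match is None:
        # Keep the event and tag it instead of dropping it.
        return {"message": line, "tags": ["_parsefailure"]}
    doc = match.groupdict()
    doc["message"] = line
    # Comparable to the date filter: parse the log's own timestamp.
    parsed = datetime.strptime(doc["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    doc["@timestamp"] = parsed.isoformat()
    # Comparable to a mutate/convert step: store the status as a number.
    doc["response"] = int(doc["response"])
    return doc


class InvertedIndex:
    """Storage stage: map each term to the IDs of documents containing it."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def index(self, doc):
        doc_id = len(self.docs)
        self.docs[doc_id] = doc
        for term in re.findall(r"\w+", doc["message"].lower()):
            self.postings[term].add(doc_id)
        return doc_id

    def search(self, term):
        """Search stage: look the term up instead of scanning every document."""
        ids = sorted(self.postings.get(term.lower(), ()))
        return [self.docs[i] for i in ids]


sample = '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /admin HTTP/1.1" 500 512'
store = InvertedIndex()
store.index(parse_line(sample))
store.index(parse_line("not an access log line"))
hits = store.search("admin")
print(len(hits), hits[0]["clientip"], hits[0]["response"])
```

The point of the sketch is the division of labor: parsing happens once at ingest time, so a later search is a dictionary lookup rather than a grep over raw text.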

Key Components, Values, Defaults, and Timers

Elasticsearch Cluster: A collection of nodes. Default port for HTTP API is 9200, for inter-node transport is 9300. The cluster health status is green (all shards assigned), yellow (primary shards assigned but replicas not), or red (some primary shards not assigned).
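The three health colors follow a simple rule that can be written down directly. The sketch below is a hypothetical model of that rule, assuming each shard is represented as a dictionary with `primary` and `assigned` flags; it is not the Elasticsearch API, and `cluster_status` is an illustrative name.

```python
def cluster_status(shards):
    """Return 'green', 'yellow', or 'red' for a list of shard records.

    Each record is a dict with two booleans: 'primary' and 'assigned'.
    """
    # Any unassigned primary means some data is unavailable.
    if any(s["primary"] and not s["assigned"] for s in shards):
        return "red"
    # All primaries are fine, but at least one replica has no home.
    if any(not s["primary"] and not s["assigned"] for s in shards):
        return "yellow"
    return "green"


# A single-node cluster cannot place a replica next to its own primary.
single_node = [
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": False},
]
print(cluster_status(single_node))
```

This is why a fresh single-node installation with the default one replica typically reports yellow rather than green.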

Logstash: Runs as a Java process. On package installs, pipeline configuration files live in the /etc/logstash/conf.d/ directory. The pipeline workers default to the number of CPU cores. The -f flag specifies a config file or directory.

Beats: Filebeat is the most common for log files. It uses a harvester to read each log file line by line. It keeps track of the current position using a registry file. Default close_inactive is 5 minutes (after which the harvester closes the file). scan_frequency defaults to 10 seconds.
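The harvester-plus-registry idea can be illustrated with a small sketch. This is not Filebeat's actual code or registry format: it assumes a JSON file keyed only by path (real Filebeat also tracks file identity), and `harvest` and the file names are hypothetical.

```python
import json
import os
import tempfile


def harvest(log_path, registry_path):
    """Return lines added since the last call and persist the new offset."""
    registry = {}
    if os.path.exists(registry_path):
        with open(registry_path, encoding="utf-8") as fh:
            registry = json.load(fh)

    offset = registry.get(log_path, 0)
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()

    # Only ship complete lines; a partial last line waits for the next pass.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()

    registry[log_path] = offset + end
    with open(registry_path, "w", encoding="utf-8") as fh:
        json.dump(registry, fh)
    return lines


with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "app.log")
    reg = os.path.join(tmp, "registry.json")
    with open(log, "w", encoding="utf-8") as fh:
        fh.write("first\nsecond\n")
    print(harvest(log, reg))  # first pass sees both lines
    with open(log, "a", encoding="utf-8") as fh:
        fh.write("third\n")
    print(harvest(log, reg))  # second pass sees only the new line
```

Because the offset is saved after every pass, restarting the shipper neither re-sends old lines nor skips new ones, which is the behavior the registry file exists to guarantee.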

Kibana: Default port 5601. It requires a saved object index called .kibana for storing dashboards, visualizations, etc.

Index Lifecycle Management (ILM): Automates management of indices over time. Policies define phases: hot (active writes), warm (read-only, optimized for search), cold (less frequent access, reduced hardware), delete (remove after a period). Default rollover conditions can be based on size (e.g., 50GB) or age (e.g., 30 days).
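The rollover and phase logic can be expressed as two small functions. This is a sketch under stated assumptions, not the ILM implementation: the 50GB and 30-day limits come from the examples above, while the phase thresholds in `PHASES` and the function names are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

GIB = 1024 ** 3

# Hypothetical policy: minimum age at which an index enters each phase.
PHASES = [
    ("hot", timedelta(days=0)),
    ("warm", timedelta(days=7)),
    ("cold", timedelta(days=30)),
    ("delete", timedelta(days=90)),
]


def should_rollover(size_bytes, created_at, now,
                    max_size_bytes=50 * GIB, max_age=timedelta(days=30)):
    """Roll over as soon as either the size or the age condition is met."""
    return size_bytes >= max_size_bytes or (now - created_at) >= max_age


def phase_for(age):
    """Return the last phase whose minimum age has been reached."""
    current = PHASES[0][0]
    for name, min_age in PHASES:
        if age >= min_age:
            current = name
    return current


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(should_rollover(12 * GIB, now - timedelta(days=31), now))
print(phase_for(timedelta(days=10)))
```

Note that the conditions are combined with "or": a small index still rolls over once it is old enough, and a young index rolls over once it is large enough.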

Configuration and Verification Commands

Filebeat configuration (/etc/filebeat/filebeat.yml):

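filebeat.inputs:
# Tail every .log file directly under /var/log
- type: log
  enabled: true
  paths:
    - /var/log/*.log
output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "filebeat-%{[agent.version]}-%{+yyyy.MM.dd}"
# Filebeat expects template settings alongside an explicit index name
setup.template.name: "filebeat"
setup.template.pattern: "filebeat-*"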

Logstash pipeline (/etc/logstash/conf.d/logstash.conf):

input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "apache-logs-%{+YYYY.MM.dd}"
  }
}

Verify cluster health:

curl -X GET "localhost:9200/_cluster/health?pretty"

List indices:

curl -X GET "localhost:9200/_cat/indices?v"

Search a specific index:

curl -X GET "localhost:9200/filebeat-*/_search?q=status:500"

Kibana Dev Tools: Use the Console tab to run Elasticsearch queries directly.
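The `q=status:500` shortcut above uses query string syntax, which can also be written as a JSON request body of the kind you would paste into the Console. The following sketch builds such a request with the Python standard library without sending it; `build_search_request` is a hypothetical helper, and the host and index pattern are the same placeholders used in the curl examples.

```python
import json
import urllib.request


def build_search_request(base_url, index_pattern, query_string, size=10):
    """Build (but do not send) a _search request with a Query DSL body."""
    body = {
        "size": size,
        "query": {"query_string": {"query": query_string}},
    }
    return urllib.request.Request(
        url=f"{base_url}/{index_pattern}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("http://localhost:9200", "filebeat-*", "status:500")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually run it against a reachable cluster:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response)["hits"]["total"])
```

Keeping the body as a Python dictionary makes it easy to extend later, for example by adding a time range filter, without hand-editing a URL.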

How It Interacts with Related Technologies

The Elastic Stack often integrates with other security tools:

- SIEM: Elastic Security (formerly SIEM) app in Kibana provides pre-built dashboards and detection rules for security events. It can ingest alerts from Suricata, Zeek, and other NIDS.
- Threat Intelligence: Elasticsearch can store threat intelligence feeds (e.g., known malicious IPs, domains) and enrich logs via Logstash filters or Enrichment policies.
- Cloud Services: Filebeat can collect logs from AWS CloudTrail, Azure Audit Logs, and GCP Audit Logs via modules. Logstash can pull from S3 buckets using the S3 input plugin.
- Containers: Filebeat and Metricbeat can run as sidecar containers or DaemonSets in Kubernetes to collect logs and metrics from pods and nodes.

Performance Considerations

Indexing rate: Elasticsearch can handle thousands of events per second per node depending on hardware. Use bulk indexing to improve throughput: Logstash sends events to Elasticsearch in batches, sized by pipeline.batch.size (default 125 events per worker).
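To make bulk indexing concrete, here is a minimal sketch of how a batch of events can be turned into a request body for the Elasticsearch `_bulk` endpoint, which expects newline-delimited JSON with an action line before each document. The helper names, the index name, and the batch size of 2 are hypothetical; the sketch only builds the text and does not contact a cluster.

```python
import json


def batches(events, batch_size):
    """Split a list of events into consecutive batches."""
    for start in range(0, len(events), batch_size):
        yield events[start:start + batch_size]


def build_bulk_body(index, docs):
    """Return an NDJSON body: one action line, then one source line, per doc."""
    if not docs:
        return ""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


events = [{"status": code} for code in (200, 404, 500, 200, 503)]
for batch in batches(events, batch_size=2):
    print(repr(build_bulk_body("apache-logs-demo", batch)))
```

Sending one request per batch instead of one per event reduces network round trips, which is where most of the throughput gain comes from.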

Shard sizing: Aim for shards between 10-50GB. Too many small shards cause overhead; too large shards slow recovery. Use ILM to manage shard size.
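The sizing guideline translates into simple arithmetic. The sketch below assumes a hypothetical 30GB target (a midpoint of the 10-50GB range), a 120GB daily index, and 30 days of retention; the function names are illustrative and the numbers are examples, not recommendations.

```python
import math


def primary_shards_needed(index_size_gb, target_shard_gb=30):
    """Number of primary shards that keeps each shard near the target size."""
    if index_size_gb <= 0:
        raise ValueError("index_size_gb must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def total_shards(index_count, primaries, replicas=1):
    """Total shard copies the cluster must hold across all retained indices."""
    return index_count * primaries * (1 + replicas)


primaries = primary_shards_needed(120)
print(primaries)                    # primaries per daily index
print(120 / primaries)              # resulting size per shard in GB
print(total_shards(30, primaries))  # shard copies held over 30 days
```

The last figure is the one that tends to surprise people: small per-index choices multiply across retention and replicas, which is why many tiny daily indices create overhead.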

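Memory: Elasticsearch uses the Java heap. Older releases default to a 1GB heap, while recent releases size the heap automatically from available memory. Production nodes often set the heap to about 50% of available RAM, staying below roughly 32GB so the JVM can keep using compressed object pointers. The remaining RAM is used by the OS for filesystem cache.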

Logstash memory: Also runs on JVM. Default heap is 1GB. For high throughput, increase to 4-8GB.

Common Pitfalls

Not using ILM: Indices grow unbounded, consuming disk and slowing queries.

Ignoring mapping conflicts: Elasticsearch dynamically maps field types. If the same field appears with different types in different documents, mapping conflicts occur. Use explicit mappings via index templates.

Overly complex grok patterns: Grok is CPU-intensive. Use dissect filter for simpler parsing when possible.

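Not configuring time zone: Elasticsearch stores @timestamp in UTC. If a log timestamp carries no zone offset, the Logstash date filter assumes the Logstash host's time zone unless you set the filter's timezone option. Specify it whenever sources log in a different zone to avoid off-by-hours errors.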
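The off-by-hours problem is easy to reproduce. The sketch below is plain Python rather than Logstash, and it assumes a hypothetical source that logs local time at UTC-5 without writing the offset; `to_utc` is an illustrative helper.

```python
from datetime import datetime, timedelta, timezone


def to_utc(raw, fmt, assumed_offset_hours):
    """Interpret a zone-less timestamp under an assumed offset, then convert to UTC."""
    naive = datetime.strptime(raw, fmt)
    local = naive.replace(tzinfo=timezone(timedelta(hours=assumed_offset_hours)))
    return local.astimezone(timezone.utc)


raw = "2024-03-05 14:00:00"
fmt = "%Y-%m-%d %H:%M:%S"

correct = to_utc(raw, fmt, assumed_offset_hours=-5)  # what the source meant
wrong = to_utc(raw, fmt, assumed_offset_hours=0)     # parser assumed UTC
print(correct.isoformat())
print(wrong.isoformat())
print(correct - wrong)
```

A five-hour skew like this is enough to break event correlation across sources during an investigation, so the zone should be stated explicitly wherever the log format omits it.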

Security Hardening

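Enable Elastic Stack security: Provides authentication, authorization, and encryption. These features were formerly packaged as X-Pack Security (and before that, Shield). Core security is included in the free Basic license and is enabled by default starting with version 8.0; on earlier versions, turn it on explicitly.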

Use TLS for all communications: Between Beats and Logstash, Logstash and Elasticsearch, and Kibana and browser.

Role-based access control (RBAC): Define roles with specific index privileges (read, write, delete) and cluster privileges (monitor, manage).
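The idea of index-level privileges can be sketched as a lookup over role definitions. The structure below loosely mirrors how roles pair index name patterns with privileges, but it is a simplified model: the role names, index patterns, and the `is_allowed` function are hypothetical, and real authorization is enforced by Elasticsearch itself.

```python
from fnmatch import fnmatchcase

# Hypothetical roles: cluster privileges plus per-pattern index privileges.
ROLES = {
    "soc_analyst": {
        "cluster": {"monitor"},
        "indices": [
            {"names": ["filebeat-*", "winlogbeat-*"], "privileges": {"read"}},
        ],
    },
    "ingest_writer": {
        "cluster": set(),
        "indices": [
            {"names": ["filebeat-*"], "privileges": {"write"}},
        ],
    },
}


def is_allowed(role_name, index, privilege):
    """Return True if the role grants the privilege on the given index."""
    role = ROLES.get(role_name)
    if role is None:
        return False
    for grant in role["indices"]:
        matches = any(fnmatchcase(index, pattern) for pattern in grant["names"])
        if matches and privilege in grant["privileges"]:
            return True
    return False


print(is_allowed("soc_analyst", "filebeat-8.12.0-2024.05.01", "read"))
print(is_allowed("soc_analyst", "filebeat-8.12.0-2024.05.01", "delete"))
print(is_allowed("ingest_writer", "winlogbeat-2024.05.01", "write"))
```

Separating the analyst role (read only) from the ingest role (write only) follows least privilege: a compromised shipper credential cannot read logs, and an analyst account cannot tamper with them.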

Audit logging: Enable audit logging in Elasticsearch to track administrative actions.

Exam-Specific Details

The default port for Elasticsearch REST API is 9200.

The default port for Logstash Beats input is 5044.

The default port for Kibana is 5601.

The default number of shards per index in Elasticsearch 7.x is 1.

The default number of replicas per index is 1.

Filebeat uses a registry file to track the state of files it reads.

The grok filter is used to parse unstructured log data into structured fields.

The mutate filter is used to rename, remove, or convert fields.

ILM stands for Index Lifecycle Management.

Beats are lightweight data shippers.

The Elastic Stack is often used for centralized logging and security analytics.

Walk-Through

1. Install and Configure Beats

Install Filebeat on the source machine from which logs need to be collected. For example, on Ubuntu, after adding the Elastic APT repository: `sudo apt-get install filebeat`. Edit `/etc/filebeat/filebeat.yml` to specify the log file paths (e.g., `/var/log/syslog`) and the output destination (e.g., Elasticsearch or Logstash). Enable the Filebeat system module if collecting system logs: `sudo filebeat modules enable system`. Load the index template: `sudo filebeat setup --index-management`. Finally, start the service: `sudo systemctl start filebeat`. Filebeat will now tail the specified log files and send each new line as a JSON event to the output.

2. Configure Logstash Pipeline

If using Logstash as an intermediary, create a pipeline configuration file, e.g., `/etc/logstash/conf.d/beats.conf`. Define an input section to listen on port 5044 for Beats connections. Add filter plugins to parse and enrich the data; for example, use `grok` to parse syslog messages or Apache logs. Use the `date` filter to parse timestamps. Define an output section that sends processed events to Elasticsearch, specifying the host and index pattern like `beats-%{+YYYY.MM.dd}`. Test the configuration with `sudo -u logstash /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/beats.conf`. Then start Logstash: `sudo systemctl start logstash`.

3. Set Up Elasticsearch Cluster

Install Elasticsearch on one or more nodes. Configure `/etc/elasticsearch/elasticsearch.yml` with cluster name, node name, network host (e.g., `0.0.0.0` for all interfaces), and discovery settings. For a single-node cluster, set `discovery.type: single-node`. Start Elasticsearch: `sudo systemctl start elasticsearch`. Verify cluster health with `curl -X GET "localhost:9200/_cluster/health"`. The response should show `"status" : "green"` or `"yellow"`. If using multiple nodes, ensure they can communicate via port 9300. Adjust heap size in `/etc/elasticsearch/jvm.options` (e.g., `-Xms4g -Xmx4g`).

4. Configure Kibana

Install Kibana on a server that can access Elasticsearch. Edit `/etc/kibana/kibana.yml` to set `elasticsearch.hosts: ["http://localhost:9200"]`. If Elasticsearch is remote, change the host accordingly. Set `server.host: "0.0.0.0"` to allow external connections. Start Kibana: `sudo systemctl start kibana`. Access the web interface at `http://<kibana-server>:5601`. On first login, you may need to configure an index pattern (e.g., `filebeat-*`) to define which indices Kibana will use for exploration. Kibana will read the mapping from Elasticsearch to display available fields.

5. Explore Logs in Kibana

In Kibana, go to the Discover tab. Select the index pattern you created (e.g., `filebeat-*`). You will see a histogram of events over time and a list of recent documents. Use the search bar to filter events using KQL (e.g., `system.syslog.program:"sshd"`). Click on a document to expand all fields. Create visualizations by going to the Visualize tab: choose a visualization type (e.g., pie chart, bar chart), select an index pattern, and configure metrics (e.g., count) and buckets (e.g., terms aggregation on `source.ip`). Save visualizations and add them to a Dashboard for continuous monitoring.
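A count metric with a terms bucket on `source.ip` amounts to "group by value, count, keep the top N". The sketch below reproduces that idea in plain Python over a handful of made-up documents; it is not how Kibana or Elasticsearch compute aggregations, and the sample events and function names are hypothetical.

```python
from collections import Counter


def filter_docs(docs, field, value):
    """Rough equivalent of a field:"value" filter in the search bar."""
    return [doc for doc in docs if doc.get(field) == value]


def terms_aggregation(docs, field, size=5):
    """Count documents per distinct field value and return the top entries."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return counts.most_common(size)


events = [
    {"source.ip": "198.51.100.9", "process.name": "sshd"},
    {"source.ip": "198.51.100.9", "process.name": "sshd"},
    {"source.ip": "203.0.113.7", "process.name": "sshd"},
    {"source.ip": "198.51.100.9", "process.name": "sshd"},
    {"source.ip": "192.0.2.44", "process.name": "cron"},
]

ssh_events = filter_docs(events, "process.name", "sshd")
for ip, count in terms_aggregation(ssh_events, "source.ip"):
    print(ip, count)
```

In a dashboard, the same result would drive a bar or pie chart, and a single address dominating the counts is the kind of pattern that suggests a brute-force attempt.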

What This Looks Like on the Job

Enterprise Scenario 1: Centralized Log Management for a 500-Server Environment

A large e-commerce company runs 500 Linux servers across multiple data centers. Each server generates syslog, application logs, and security logs. The security team needs to search for anomalies (e.g., failed SSH attempts, SQL injection attempts) across all servers in real time. They deploy Filebeat on each server, sending logs to a cluster of three Logstash nodes (load-balanced via a TCP load balancer). Logstash parses the logs using grok patterns and enriches them with GeoIP data. The output goes to a six-node Elasticsearch cluster with 12TB of SSD storage. Kibana dashboards show real-time failed login attempts by source country, top targeted ports, and error rates. The team uses Index Lifecycle Management to rotate indices every 30 days or 50GB, keeping hot data on SSDs and moving older data to warm nodes with spinning disks. Common issues: Logstash becomes a bottleneck if pipelines are too complex; they mitigate by using multiple pipeline workers and tuning the batch size (e.g., 125 events per batch).

Enterprise Scenario 2: Security Incident Response with Elastic Security

A financial institution uses the Elastic Stack as its SIEM. They ingest logs from firewalls, IDS/IPS, endpoints (via Winlogbeat and Auditbeat), and cloud services (CloudTrail via Filebeat module). The SOC analysts use Kibana's Security app to investigate alerts. Pre-built detection rules (e.g., "Suspicious Process Execution") trigger alerts when certain patterns are matched. When an alert fires, analysts use the Timeline feature to pivot from the alert to related events—for example, from a suspicious IP to all connections from that IP in the last 24 hours. They can also use machine learning jobs to detect anomalies in user behavior. Performance scales by adding more Elasticsearch nodes and using index aliases for time-based data. Misconfiguration example: if ILM is not set up, indices can grow to hundreds of gigabytes, causing slow queries and disk pressure.

Enterprise Scenario 3: Cloud-Native Logging with Kubernetes

A SaaS provider runs microservices on Kubernetes. Each pod writes logs to stdout/stderr, which are collected by a DaemonSet running Filebeat. Filebeat enriches each event with Kubernetes metadata (pod name, namespace, labels) using its add_kubernetes_metadata processor and forwards it to a Logstash service for further parsing. The logs are stored in Elasticsearch and visualized in Kibana. Developers use Kibana to debug application errors, while the security team monitors for suspicious activity like unexpected network connections. Challenges: log volume can spike during incidents; they use Logstash's persistent queues to buffer data and prevent loss. They also configure Elasticsearch's disk-based shard allocation to prevent nodes from running out of disk space.

How CS0-003 Actually Tests This

CS0-003 Exam Focus on Elastic Stack

This topic appears in Domain 1: Security Operations, specifically under Objective 1.2: "Given a scenario, analyze indicators of potentially malicious activity." The exam expects you to understand how to use the Elastic Stack to analyze logs for security incidents. You will not be asked to memorize configuration syntax in detail, but you must know the purpose of each component and how they interact.

Common Wrong Answers and Why Candidates Choose Them

1. "Logstash is used for visualization." This is incorrect because Logstash is a data processing pipeline; Kibana is the visualization tool. Candidates confuse the roles because both are part of the stack.

2. "Elasticsearch stores data in a relational database format." Elasticsearch stores data as JSON documents in indices, not relational tables. Candidates familiar with SQL databases may assume similar structure.

3. "Filebeat can parse log data into structured fields." Filebeat is a lightweight shipper and does not parse logs; it sends raw data. Logstash or Elasticsearch ingest pipelines handle parsing. Candidates might think Filebeat has built-in parsing capabilities.

4. "Kibana is used to collect logs from servers." Kibana is a visualization and management interface, not a data collector. Beats or Logstash collect data.

Specific Numbers, Values, and Terms Appearing on the Exam

Default ports: Elasticsearch API 9200, Logstash Beats input 5044, Kibana 5601.

Default shards per index: 1 (since 7.x).

Default replicas per index: 1.

Filebeat uses a "registry file" to track file state.

ILM = Index Lifecycle Management.

Grok filter is used for parsing unstructured logs.

Beats are "lightweight data shippers."

Edge Cases and Exceptions

If Elasticsearch cluster status is "yellow," all primary shards are assigned but some replicas are unassigned. This is often due to insufficient nodes to place replicas. The exam may ask what this status indicates.

If Logstash fails to parse a log line, the event is still sent to Elasticsearch but may have a _grokparsefailure tag. Candidates should know that parsing failures are not dropped by default.

Filebeat can handle multiline log messages (e.g., stack traces) using the multiline configuration option. This is a common trick question.
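Multiline handling boils down to deciding whether a line starts a new event or continues the previous one. The sketch below applies one common rule, "a line that begins with whitespace belongs to the event before it", which is comparable in spirit to a Filebeat multiline setting that matches leading whitespace and appends after the previous line. The function name and sample lines are hypothetical, and this is not Filebeat's implementation.

```python
import re

# Lines that start with whitespace are treated as continuations.
CONTINUATION = re.compile(r"^\s")


def join_multiline(lines):
    """Merge continuation lines into the event that precedes them."""
    events = []
    for line in lines:
        if events and CONTINUATION.match(line):
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


raw_lines = [
    "2024-05-01 12:00:00 ERROR Unhandled exception",
    "    at com.example.App.run(App.java:42)",
    "    at com.example.App.main(App.java:10)",
    "2024-05-01 12:00:01 INFO Recovered",
]

for event in join_multiline(raw_lines):
    print(repr(event))
```

Without this step, each stack-trace line would arrive as its own event, and a later parsing stage would tag most of them as failures because they do not look like a normal log line.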

How to Eliminate Wrong Answers

If a question asks which component is used for "visualizing log data," eliminate Logstash and Elasticsearch—only Kibana provides visualization.

If the question mentions "parsing unstructured logs," the answer is Logstash (or Elasticsearch ingest pipelines), not Filebeat.

For "storing and searching log data," the answer is Elasticsearch.

For "collecting logs from a server," the answer is a Beat (e.g., Filebeat).

Key Takeaways

The Elastic Stack consists of Elasticsearch (storage/search), Logstash (processing), Kibana (visualization), and Beats (data shippers).

Elasticsearch stores data as JSON documents in indices; default shards per index is 1, replicas is 1 (7.x).

Logstash uses a pipeline with input, filter, and output stages; the grok filter parses unstructured logs.

Filebeat is a lightweight log shipper that uses a registry file to track file positions; it does not parse logs.

Kibana serves its web interface on port 5601, connects to Elasticsearch over the REST API (port 9200 by default), and provides dashboards, visualizations, and the Discover tab.

Index Lifecycle Management (ILM) automates index rollover, warm/cold phases, and deletion based on size or age.

Common Elasticsearch ports: 9200 (HTTP API), 9300 (inter-node transport). Logstash Beats input port: 5044.

The Elastic Stack is widely used for centralized logging, security analytics (SIEM), and operational monitoring.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Elastic Stack (ELK)

Open-source core (Elasticsearch, Logstash, Kibana) with paid X-Pack features.

Uses JSON documents stored in indices; schema on write.

Query language: KQL (Kibana Query Language) and Lucene syntax.

Scalable via sharding and clustering; ILM for index management.

Commonly self-hosted or used via Elastic Cloud.

Splunk

Proprietary software with a free tier (500 MB/day ingestion).

Uses a flat file structure with indexed fields; schema on search.

Search Processing Language (SPL) for queries.

Scales via indexers and search heads; data is stored in buckets.

Usually deployed as a managed service (Splunk Cloud) or on-premises.

Watch Out for These

Mistake

ELK stands for Elasticsearch, Logstash, and Kibana, and that is the complete stack.

Correct

The Elastic Stack now includes Beats (lightweight data shippers) as a core component. The acronym ELK is still used informally, but the official name is Elastic Stack, and Beats are essential for data collection.

Mistake

Logstash is required for the Elastic Stack to work.

Correct

Logstash is optional. Beats can send data directly to Elasticsearch, and Elasticsearch has ingest pipelines that can parse data. Logstash is only needed when complex transformations or multiple input/output plugins are required.

Mistake

Elasticsearch is a database that uses SQL for queries.

Correct

Elasticsearch uses a REST API with JSON queries. It supports a Query DSL (Domain Specific Language) and a simple query string syntax. SQL is not its primary query language, though Elasticsearch SQL is available as a built-in feature that translates SQL statements into Query DSL.

Mistake

Kibana can collect logs from servers on its own.

Correct

Kibana does not collect data; it only visualizes and manages data already stored in Elasticsearch. Data collection is done by Beats or Logstash.

Mistake

Filebeat parses log data into structured fields before sending.

Correct

Filebeat sends raw log lines as the `message` field. Parsing into structured fields (like timestamp, IP address) is done by Logstash filters or Elasticsearch ingest pipelines. Filebeat can do some basic structuring via modules, but it does not parse arbitrary log formats.


Frequently Asked Questions

What is the difference between Elasticsearch and Logstash?

Elasticsearch is a search and analytics engine that stores data as JSON documents in indices and allows full-text search. Logstash is a data processing pipeline that ingests data from multiple sources, transforms it (e.g., parsing, enriching), and outputs it to a destination like Elasticsearch. In short, Logstash processes data before storage; Elasticsearch stores and indexes it for search.

Do I need Logstash if I use Filebeat?

No, Filebeat can send data directly to Elasticsearch. Logstash is only needed if you require complex transformations, multiple input sources, or additional output destinations. For simple log collection, Filebeat to Elasticsearch is sufficient. Elasticsearch also has ingest pipelines that can perform basic parsing.

What is the default port for Kibana?

The default port for Kibana is 5601. You access the web interface at http://<kibana-server>:5601. This can be changed in the kibana.yml configuration file with the server.port setting.

How does Filebeat handle log rotation?

Filebeat uses a registry file to keep track of the current position in each log file. When a log file is rotated (renamed or replaced), Filebeat detects the change via the file's inode and continues reading from the new file. It can also be configured to close inactive files after a period (close_inactive, default 5 minutes).

What is the purpose of the grok filter in Logstash?

The grok filter parses unstructured log data into structured fields using predefined patterns. For example, it can extract an IP address, timestamp, and HTTP status code from an Apache access log line. Grok patterns are combinations of regular expressions and named fields, making it easier to search and analyze logs.

What does Elasticsearch cluster status 'yellow' mean?

A yellow status means all primary shards are assigned to nodes, but one or more replica shards are unassigned. This often occurs when there are not enough nodes to place the replicas (e.g., a single-node cluster with replicas configured). The cluster is operational but not fully resilient to node failures.

Can I use the Elastic Stack for real-time log analysis?

Yes, the Elastic Stack supports near real-time log analysis. Filebeat sends logs as they are written, Logstash processes them with minimal delay, and Elasticsearch makes them searchable within seconds. Kibana dashboards can refresh automatically to show up-to-date data.
