CS0-003 | Chapter 23 of 100 | Objective 1.2

User and Entity Behaviour Analytics (UEBA)

This chapter covers User and Entity Behavior Analytics (UEBA), a critical security operations technology that uses machine learning to detect insider threats, compromised accounts, and advanced persistent threats. For the CS0-003 exam, understanding UEBA is essential as it appears in approximately 10-15% of Security Operations questions, particularly those involving threat detection, data analysis, and incident response. This chapter will explain how UEBA works, its components, configuration, and how to interpret its outputs for effective security monitoring.

Updated May 31, 2026

The Office Building Security Guard

Imagine you are the security guard in a large office building with hundreds of employees. You know everyone who works there, their typical arrival times, their usual floors, and even their coffee break habits. One day, you see Bob from accounting, who usually arrives at 8:30 AM, show up at 3:00 AM and head straight to the server room. You also notice he is carrying a large duffel bag, which he never does. Your 'baseline' of Bob's normal behavior triggers an alert. You don't need to know exactly what he is doing; you just know it is anomalous. You investigate and find he is trying to steal backup tapes.

This is UEBA. Just as you, the guard, learn patterns over time (baselines) and detect deviations, UEBA systems build profiles of users and entities (like servers and applications) by analyzing logs and network traffic. They use machine learning to spot unusual activity, such as a user logging in from a new country, accessing sensitive data at odd hours, or a server making outbound connections to a known malicious IP.

The guard doesn't need a rule saying 'Bob is bad at 3 AM'; he just knows it's not normal. UEBA works similarly: it detects the unknown unknowns, the insider threats, and the compromised accounts that rule-based systems miss.

How It Actually Works

What is UEBA and Why Does It Exist?

User and Entity Behavior Analytics (UEBA) is a security technology that leverages machine learning, statistical analysis, and big data to establish baselines of normal behavior for users, devices, applications, and other entities. It then detects anomalies that may indicate malicious activity. UEBA emerged as a response to the limitations of traditional signature-based detection (e.g., antivirus, IDS/IPS) and rule-based correlation (e.g., SIEM rules). These methods often fail against zero-day exploits, insider threats, and attackers who use legitimate credentials, because in those scenarios no known signature or rule exists to match against.

UEBA is distinct from User Behavior Analytics (UBA) because it also covers non-user entities such as servers, endpoints, network devices, and applications. For example, a server that suddenly starts communicating with a Command and Control (C2) server would be flagged even if no user is involved. The CS0-003 exam specifically tests the ability to differentiate UEBA from other detection technologies like SIEM, EDR, and NTA.

How UEBA Works Internally

UEBA systems operate through a multi-stage pipeline:

1. Data Ingestion: UEBA collects data from diverse sources, including:
- Authentication logs (Windows Event ID 4624, 4625)
- VPN logs
- DHCP logs
- DNS logs
- Proxy logs
- Email logs
- Endpoint logs (from EDR)
- Network flow data (NetFlow, sFlow)
- Cloud API logs (e.g., AWS CloudTrail, Azure AD sign-in logs)

2. Feature Extraction: The system extracts features such as:
- Login time, location, device
- Accessed files and folders
- Volume of data transferred
- Number of failed logins
- Peer group interactions

3. Baseline Modeling: Machine learning algorithms create profiles for each entity. Common algorithms include:
- Statistical methods: Mean, median, standard deviation for numerical features (e.g., login frequency).
- Time series analysis: ARIMA, exponential smoothing for temporal patterns.
- Clustering: K-means or DBSCAN to group users with similar roles (e.g., HR vs. IT).
- Deep learning: Autoencoders for complex anomaly detection.

4. Anomaly Scoring: Each event is assigned a risk score based on deviation from the baseline. Scores are often normalized (0-100). Events exceeding a threshold (default often 70-80) generate alerts.

5. Contextualization: The system enriches alerts with context: peer group comparison, asset criticality, threat intelligence feeds. For example, a login from a known malicious IP raises the score.

6. Presentation: Alerts appear in a dashboard with severity, entity name, anomaly description, and supporting evidence.

Key Components, Values, Defaults, and Timers

Baseline Window: Typically 30 days of historical data to establish initial baselines. Some vendors recommend 7-14 days for quick deployment, but accuracy improves with longer windows.

Anomaly Threshold: Commonly set to 2-3 standard deviations from the mean. For a normally distributed feature, about 95% of observations fall within 2 standard deviations of the mean and about 99.7% fall within 3.

Risk Score Calculation: Weighted sum of multiple anomaly scores. Example formula: Score = (DeviationScore * 0.5) + (ThreatIntelScore * 0.3) + (AssetCriticality * 0.2).

Retention Period: Raw logs may be kept for 90-365 days; aggregated baselines may be retained longer.

Peer Group Size: Minimum 5-10 entities to form a meaningful peer group. Clusters with fewer members may not yield reliable baselines.

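Learning Rate: How quickly the model adapts to changes. A low learning rate (e.g., 0.01) makes the model slow to adapt, so legitimate changes in behavior keep generating false positives for longer. A high rate (e.g., 0.1) adapts quickly but makes the baseline less stable and lets it absorb an attacker's gradually shifting behavior, which risks false negatives.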
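The example risk score formula listed above can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's implementation: the function name risk_score, the assumption that every sub-score is already on a 0-100 scale, and the range check are all hypothetical. Only the weights (0.5, 0.3, 0.2) come from the example formula.

```python
def risk_score(deviation: float, threat_intel: float, asset_criticality: float) -> float:
    """Weighted sum of three sub-scores, each assumed to be on a 0-100 scale."""
    # Weights taken from the example formula in the text.
    sub_scores = {
        "deviation": (deviation, 0.5),
        "threat_intel": (threat_intel, 0.3),
        "asset_criticality": (asset_criticality, 0.2),
    }
    total = 0.0
    for name, (value, weight) in sub_scores.items():
        # Hypothetical guard: reject inputs outside the assumed scale.
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
        total += value * weight
    return total


if __name__ == "__main__":
    # Strong deviation, known-bad IP, moderately critical asset.
    print(round(risk_score(90, 100, 50), 1))
    # Same deviation with no threat intelligence match.
    print(round(risk_score(90, 0, 50), 1))
```

With these weights, the same behavioral deviation lands well above or well below a 70-80 alert threshold depending on whether threat intelligence corroborates it, which is the point of weighting context alongside deviation.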

Configuration and Verification Commands

UEBA systems are often delivered as SaaS or as part of SIEM platforms (e.g., Microsoft Sentinel UEBA, Splunk User Behavior Analytics). Configuration is typically GUI-based, but some platforms offer APIs and CLI-like interfaces.

For example, in Microsoft Sentinel UEBA, you enable it via the Azure portal:

Resource: Sentinel > Entity Behavior Analytics > Enable

To check status:

Sentinel > Entity Behavior Analytics > Manage > Entity Insights

In Splunk UBA, configuration involves:

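# Set input sources (illustrative stanza; site-specific changes usually belong in the app's local/ directory, because default/ is overwritten on upgrade)
# File: $SPLUNK_HOME/etc/apps/SA-UEBA/default/inputs.conf
[monitor:///var/log/auth.log]
index = ueba
sourcetype = linux_auth

# Verify data ingestion (run as a search in Splunk)
index=ueba sourcetype=linux_auth | stats count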

Interaction with Related Technologies

SIEM: UEBA often feeds alerts into SIEM for correlation with other events. SIEM can also enrich UEBA with threat intelligence.

EDR: UEBA uses endpoint logs for user and process behavior. EDR alerts (e.g., suspicious PowerShell) can be incorporated into UEBA scoring.

SOAR: UEBA alerts can trigger automated playbooks (e.g., disable account, isolate host).

Threat Intelligence: Matches anomalies against known IOCs to increase risk scores.

NTA (Network Traffic Analysis): Provides network behavior data for entity profiles (e.g., unusual data exfiltration).

Limitations

False Positives: Changes in job role, travel, or new applications can trigger alerts. Tuning is required.

Cold Start Problem: Insufficient historical data leads to inaccurate baselines. Synthetic baselines or transfer learning can help.

Adversarial Evasion: Attackers can slowly adapt their behavior to avoid detection (low and slow attacks). Some UEBA systems use adaptive learning to counter this.

Exam Relevance

CS0-003 Objective 1.2 (Security Operations) expects you to:

Explain how UEBA detects anomalies.

Differentiate UEBA from signature-based and rule-based detection.

Identify appropriate use cases (insider threat, credential misuse, data exfiltration).

Understand the role of baselines and peer groups.

Common exam question types:
- "Which technology would detect a user accessing files at unusual times?" Answer: UEBA.
- "What is the primary advantage of UEBA over SIEM?" Answer: Detection of unknown threats via behavioral baselines.
- "What data source is essential for UEBA?" Answer: Authentication logs.

Walk-Through

1. Data Collection from Sources

UEBA ingests data from multiple sources: authentication logs (Windows Event ID 4624/4625, Linux auth.log), network flow data (NetFlow, sFlow), DNS logs, VPN logs, proxy logs, email logs, and EDR logs. The data is collected via syslog, API, or agent-based collectors. Each event is timestamped and associated with a user or entity identifier (e.g., username, IP address, hostname). The volume can reach millions of events per day for a mid-size enterprise. The system normalizes data into a common schema for analysis.
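To make the normalization step concrete, here is a hedged sketch that maps two of the sources mentioned above onto one schema. Everything about the shape of this code is an assumption for illustration: the NormalizedEvent fields, the function names, the simplified Windows record (a dict with EventID, TimeCreated, TargetUserName, and IpAddress keys), and the single sshd message pattern. Real collectors handle many more formats and edge cases.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class NormalizedEvent:
    """Hypothetical common schema shared by all log sources."""
    timestamp: datetime
    entity: str      # user or host identifier, lower-cased
    source_ip: str
    action: str
    outcome: str     # "success" or "failure"


# Windows logon events: 4624 = successful logon, 4625 = failed logon.
WINDOWS_LOGON_OUTCOME = {4624: "success", 4625: "failure"}

# Matches typical sshd password messages found in Linux auth.log.
SSHD_PATTERN = re.compile(
    r"(?P<result>Accepted|Failed) password for (?:invalid user )?"
    r"(?P<user>\S+) from (?P<ip>\S+)"
)


def from_windows(record: dict) -> Optional[NormalizedEvent]:
    """Normalize a simplified Windows logon record; ignore other event IDs."""
    outcome = WINDOWS_LOGON_OUTCOME.get(record.get("EventID"))
    if outcome is None:
        return None
    return NormalizedEvent(
        timestamp=datetime.fromisoformat(record["TimeCreated"]),
        entity=record["TargetUserName"].lower(),
        source_ip=record["IpAddress"],
        action="logon",
        outcome=outcome,
    )


def from_sshd(message: str, timestamp: datetime) -> Optional[NormalizedEvent]:
    """Normalize one sshd message; return None if it is not a password event."""
    match = SSHD_PATTERN.search(message)
    if match is None:
        return None
    outcome = "success" if match.group("result") == "Accepted" else "failure"
    return NormalizedEvent(
        timestamp=timestamp,
        entity=match.group("user").lower(),
        source_ip=match.group("ip"),
        action="logon",
        outcome=outcome,
    )
```

Once both sources produce the same record type, later stages such as feature extraction and scoring do not need to know where an event originated.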

2. Feature Extraction and Aggregation

The system extracts features from raw events. For a user, features include: login time, login location (geo-IP), device used, number of failed logins, files accessed, data volume transferred, and applications used. For a server: CPU usage, network connections, processes running, and DNS queries. Features are aggregated over time windows (e.g., hourly, daily) to create behavioral attributes like 'average logins per day' or 'typical data transfer volume per session'.
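As an illustration of aggregation over a daily window, the sketch below turns raw login events into the 'average logins per day' attribute mentioned above. The function names and the (username, timestamp) input shape are assumptions made for this example. Note one simplification stated in the comments: days with no logins are not counted, so the average is over active days only.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, Tuple


def daily_login_counts(events: Iterable[Tuple[str, datetime]]) -> Dict[str, Dict]:
    """Count logins per user per calendar day."""
    counts: Dict[str, Dict] = defaultdict(lambda: defaultdict(int))
    for user, timestamp in events:
        counts[user][timestamp.date()] += 1
    return counts


def average_logins_per_day(events: Iterable[Tuple[str, datetime]]) -> Dict[str, float]:
    """Average logins per active day for each user.

    Simplification: days on which a user never logged in are not
    included, so this is an average over active days only.
    """
    return {
        user: mean(per_day.values())
        for user, per_day in daily_login_counts(events).items()
    }


if __name__ == "__main__":
    sample = [
        ("alice", datetime(2026, 5, 4, 9, 0)),
        ("alice", datetime(2026, 5, 4, 13, 30)),
        ("alice", datetime(2026, 5, 5, 9, 5)),
        ("bob", datetime(2026, 5, 4, 8, 30)),
    ]
    print(average_logins_per_day(sample))
```

The same grouping pattern extends to other features (bytes transferred per session, failed logins per hour) by changing the key and the value being accumulated.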

3. Baseline Modeling via ML

Machine learning algorithms build baselines for each entity. Statistical methods calculate mean and standard deviation for numerical features. For categorical features (e.g., login location), the system computes frequency distributions. Time series models capture periodic patterns (e.g., logging in on weekdays vs. weekends). Clustering groups similar entities (peer groups). The baseline is updated continuously with a learning rate (for example, 0.05; the actual value is product-specific) to adapt to gradual changes. The model also considers seasonality, such as holiday effects.
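One common way to update a numerical baseline continuously is an exponentially weighted moving average, where the learning rate is the weight given to each new observation. The sketch below uses that technique as an assumption; the text does not say which update rule any particular product uses, and the function names are hypothetical.

```python
from typing import Tuple


def update_baseline(mean: float, variance: float, observation: float,
                    learning_rate: float = 0.05) -> Tuple[float, float]:
    """Exponentially weighted update of a running mean and variance."""
    delta = observation - mean
    new_mean = mean + learning_rate * delta
    # Standard incremental form of the exponentially weighted variance.
    new_variance = (1.0 - learning_rate) * (variance + learning_rate * delta * delta)
    return new_mean, new_variance


def steps_to_adapt(old_level: float, new_level: float,
                   learning_rate: float, tolerance: float = 1.0) -> int:
    """Count updates needed for the mean to come within `tolerance` of a new level."""
    mean, variance, steps = old_level, 0.0, 0
    while abs(new_level - mean) > tolerance:
        mean, variance = update_baseline(mean, variance, new_level, learning_rate)
        steps += 1
    return steps


if __name__ == "__main__":
    # A user's daily logins shift permanently from 10 to 20.
    for rate in (0.01, 0.05, 0.1):
        print(rate, steps_to_adapt(10.0, 20.0, rate))
```

The loop shows the trade-off directly: a smaller learning rate needs many more observations before the baseline reflects a legitimate change, while a larger one follows any sustained shift quickly, including one introduced gradually by an attacker.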

4. Anomaly Detection and Scoring

When a new event occurs, the system compares it to the baseline. For numerical features, it calculates a z-score: (observed - mean) / standard deviation. For categorical features, it uses the probability of the observed value in the entity's history. Each feature contributes a sub-score. The total anomaly score is a weighted sum. If the score exceeds a threshold (for example, 80 out of 100), an alert is generated. The threshold is tunable to balance false positives vs. false negatives. Events are also scored for peer group deviation: if a user behaves differently from their peers, the score increases.
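The scoring step can be sketched as follows. The z-score formula is the one given above; everything else is an assumption for illustration, including the function names, the mapping of a z-score onto 0-100 by capping at 3 standard deviations, the 0.6/0.4 feature weights, and the threshold constant of 80.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Sequence, Tuple

ALERT_THRESHOLD = 80.0  # illustrative value; tunable in practice


def z_score(observed: float, history: Sequence[float]) -> float:
    """(observed - mean) / standard deviation of the entity's history."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # No variation in history: any different value is maximally unusual.
        return 0.0 if observed == mu else float("inf")
    return (observed - mu) / sigma


def rarity(value: str, history: Sequence[str]) -> float:
    """1 minus the relative frequency of `value` in the history (0 = always seen)."""
    return 1.0 - Counter(history)[value] / len(history)


def anomaly_score(z: float, rare: float, z_cap: float = 3.0,
                  weights: Tuple[float, float] = (0.6, 0.4)) -> float:
    """Weighted sum of a numerical and a categorical sub-score, each 0-100."""
    numeric_sub = min(abs(z) / z_cap, 1.0) * 100.0
    categorical_sub = rare * 100.0
    return weights[0] * numeric_sub + weights[1] * categorical_sub


if __name__ == "__main__":
    transfer_mb_history = [100, 110, 90, 105, 95]
    location_history = ["US"] * 29 + ["CA"]
    score = anomaly_score(z_score(400, transfer_mb_history),
                          rarity("RU", location_history))
    print(round(score, 1), "ALERT" if score >= ALERT_THRESHOLD else "ok")
```

A transfer four times the usual size from a never-before-seen country saturates both sub-scores, while a typical transfer from the usual country scores near zero.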

5. Alert Enrichment and Presentation

The alert is enriched with context: entity name, anomaly description, risk score, supporting evidence (e.g., 'User logged in from Russia at 3 AM, first time in 90 days'), peer group comparison, and asset criticality. Threat intelligence feeds are checked: if the source IP is known malicious, the score is boosted. The alert is sent to the SIEM or SOAR for triage. Security analysts review the alert in a dashboard, which shows a timeline of the entity's behavior. False positives can be dismissed to train the model (feedback loop).
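A hedged sketch of the enrichment step is shown below. The Alert class, the enrich function, the boost sizes (15 and 10 points), and the in-memory sets standing in for a threat intelligence feed and an asset inventory are all hypothetical; the IP addresses come from ranges reserved for documentation.

```python
from dataclasses import dataclass, field, replace
from typing import List, Set


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record produced by the scoring stage."""
    entity: str
    score: float
    source_ip: str
    evidence: List[str] = field(default_factory=list)


def enrich(alert: Alert, malicious_ips: Set[str], critical_assets: Set[str],
           intel_boost: float = 15.0, criticality_boost: float = 10.0) -> Alert:
    """Return a copy of the alert with boosted score and added evidence."""
    score = alert.score
    evidence = list(alert.evidence)
    if alert.source_ip in malicious_ips:
        score += intel_boost
        evidence.append(f"Source IP {alert.source_ip} matches threat intelligence")
    if alert.entity in critical_assets:
        score += criticality_boost
        evidence.append(f"{alert.entity} is tagged as a critical asset")
    # Keep the score on the 0-100 scale used elsewhere.
    return replace(alert, score=min(score, 100.0), evidence=evidence)


if __name__ == "__main__":
    raw = Alert("web01", 70.0, "203.0.113.9", ["Outbound connection to new IP"])
    enriched = enrich(raw, malicious_ips={"203.0.113.9"}, critical_assets={"web01"})
    print(enriched.score)
    for line in enriched.evidence:
        print("-", line)
```

Returning a new record instead of mutating the original keeps the pre-enrichment score available, which is useful when analysts later tune how much weight context should carry.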

What This Looks Like on the Job

Scenario 1: Suspected Insider Threat in a Financial Institution

A large bank deploys UEBA to monitor employees with access to sensitive customer data. The system ingests authentication logs, database access logs, and email logs. One day, a senior analyst, Alice, who normally works 9-5 from the New York office, logs in at 2 AM from a VPN in Brazil. She then accesses 10,000 customer records and sends an email with a large attachment to a personal Gmail account. UEBA detects multiple anomalies: unusual time, unusual location, unusual data access volume, and unusual email behavior. The risk score spikes to 95. The alert is sent to the SOC, which investigates and finds Alice's credentials were stolen. The account is disabled within minutes. Without UEBA, this attack might have gone unnoticed until data was leaked.

Scenario 2: Compromised Server in a Tech Company

A tech company uses UEBA to monitor server behavior. A web server normally handles HTTP requests and connects to the internal database. One day, the server starts making outbound connections to an unknown external IP on port 4444 (the default Metasploit listener port, often associated with C2 traffic). UEBA flags this because the baseline shows that the server only ever initiates connections to internal IPs. The anomaly score is 88. The SOC investigates and finds a web shell planted by an attacker. They isolate the server and remove the malware. UEBA detected the attack despite there being no signature for the web shell.

Performance Considerations

In production, UEBA systems must handle high data volumes. A typical deployment ingests 100-500 GB of logs per day. Scaling requires distributed processing (e.g., Spark, Elasticsearch). Baselines for millions of entities require significant storage (10-50 TB). To reduce noise, organizations often exclude low-risk entities (e.g., guest accounts) and use whitelists for known good behaviors (e.g., backup software). Misconfiguration often involves setting the anomaly threshold too low, causing alert fatigue, or too high, missing real threats. Another common mistake is not updating baselines after major changes (e.g., merger, new software rollout), leading to false positives.

How CS0-003 Actually Tests This

What CS0-003 Tests on UEBA

CS0-003 Objective 1.2 (Security Operations) focuses on the application of UEBA for threat detection. Specifically, you need to:
- Identify UEBA as a detection method for insider threats, compromised accounts, and data exfiltration.
- Differentiate UEBA from SIEM, EDR, and NTA. UEBA uses behavioral baselines and ML; SIEM uses rules; EDR focuses on endpoints; NTA focuses on network traffic.
- Understand data sources: authentication logs, VPN logs, DNS logs, and proxy logs are key.
- Know the concept of baselines and peer groups.
- Recognize limitations: false positives, cold start, and adversarial evasion.

Common Wrong Answers and Why

1. Choosing SIEM instead of UEBA for detecting unknown threats. Candidates think SIEM can detect anything if rules are written, but SIEM is rule-based and cannot detect novel attacks without a rule.

2. Selecting EDR for user behavior monitoring. EDR focuses on endpoint processes, not user login patterns or data access.

3. Believing UEBA replaces SIEM. UEBA complements SIEM; they are often used together.

4. Confusing UEBA with User Behavior Analytics (UBA). The exam may ask about entity behavior; UEBA includes non-user entities such as servers and applications, while UBA covers users only.

Specific Numbers and Terms on the Exam

Baseline window: 30 days (typical).

Anomaly threshold: 2-3 standard deviations.

Risk score: often 0-100.

Peer group: minimum 5-10 members.

Data sources: authentication logs are the most critical.

Edge Cases

New employee: No baseline exists; UEBA may use peer group baseline or require a learning period.

Remote work: Changes in login location and time may be normal; UEBA must be tuned.

Low-and-slow attacks: Attackers who act gradually may never trigger per-event thresholds; some UEBA products use cumulative scoring to catch them.
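Cumulative scoring, mentioned in the last edge case, can be sketched as a sliding window that sums an entity's recent event scores. The class name, the 7-day window, and both thresholds are illustrative assumptions, and the sketch assumes events arrive in time order.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Tuple


class CumulativeRisk:
    """Track one entity's scores over a sliding time window."""

    def __init__(self, window: timedelta = timedelta(days=7),
                 per_event_threshold: float = 80.0,
                 cumulative_threshold: float = 150.0) -> None:
        self.window = window
        self.per_event_threshold = per_event_threshold
        self.cumulative_threshold = cumulative_threshold
        self._events: Deque[Tuple[datetime, float]] = deque()

    def add(self, timestamp: datetime, score: float) -> bool:
        """Record an event; return True if it should raise an alert.

        Assumes timestamps are non-decreasing across calls.
        """
        self._events.append((timestamp, score))
        # Drop events that have aged out of the window.
        while self._events[0][0] < timestamp - self.window:
            self._events.popleft()
        total = sum(s for _, s in self._events)
        return score >= self.per_event_threshold or total >= self.cumulative_threshold


if __name__ == "__main__":
    tracker = CumulativeRisk()
    start = datetime(2026, 5, 1, 2, 0)
    # Five modest anomalies on consecutive days, each far below 80.
    for day in range(5):
        print(day + 1, tracker.add(start + timedelta(days=day), 40.0))
```

No single event in the example would alert on its own, but the running total crosses the cumulative threshold on the fourth day, which is how this approach surfaces activity that is deliberately kept below per-event limits.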

How to Eliminate Wrong Answers

If the question mentions 'unknown threat' or 'zero-day', the answer is likely UEBA (not SIEM or signature-based).

If the focus is on user login patterns, UEBA is correct; if on process behavior, EDR is correct.

If the question asks about 'real-time correlation of logs', it's SIEM; if 'behavioral baselines', it's UEBA.

Key Takeaways

UEBA detects anomalies by comparing events to baselines of normal behavior for users and entities.

Key data sources include authentication logs, VPN logs, DNS logs, and proxy logs.

Baselines are typically built over a 30-day window using statistical and machine learning methods.

Anomaly scores are calculated based on deviation from the baseline, often using z-scores.

Peer groups help contextualize behavior by comparing similar entities (e.g., same department).

UEBA is complementary to SIEM; SIEM provides rules, UEBA provides anomaly detection.

Common use cases: insider threat detection, compromised account detection, data exfiltration.

Limitations include false positives, cold start problems, and evasion by slow attacks.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

UEBA

Detects unknown threats via behavioral baselines

Uses machine learning and statistical analysis

Focuses on anomalies, not predefined rules

Requires baseline learning period (e.g., 30 days)

Outputs risk scores and anomaly descriptions

SIEM

Detects known threats via correlation rules

Uses rule-based matching and correlation

Focuses on matching events to predefined rules

Works immediately with configured rules

Outputs alerts based on rule matches

Watch Out for These

Mistake

UEBA is the same as SIEM.

Correct

UEBA uses machine learning to detect anomalies based on behavioral baselines, while SIEM uses rule-based correlation to match events against predefined rules. They are complementary but distinct.

Mistake

UEBA can detect all types of attacks.

Correct

UEBA is effective for anomalies but may miss attacks that blend into normal behavior (e.g., slow data exfiltration). It also requires tuning to reduce false positives.

Mistake

UEBA requires no historical data to start.

Correct

UEBA needs a baseline window (typically 30 days) to learn normal behavior. Without it, the system produces many false positives or fails to detect anomalies.

Mistake

UEBA only monitors user behavior.

Correct

UEBA monitors both users and entities (servers, applications, devices). The 'E' in UEBA stands for Entity, which includes non-user entities.

Mistake

UEBA is a standalone tool that replaces other security tools.

Correct

UEBA is often integrated with SIEM, EDR, and SOAR to provide enriched detection. It is not a replacement but an enhancement.


Frequently Asked Questions

What is the difference between UEBA and UBA?

UEBA (User and Entity Behavior Analytics) extends UBA by also monitoring non-user entities like servers, applications, and network devices. UBA only focuses on user behavior. The CS0-003 exam uses the term UEBA to emphasize entity coverage.

How long does it take for UEBA to establish a baseline?

Typically 30 days of historical data is recommended for accurate baselines. Some vendors offer a minimum of 7-14 days, but accuracy improves with more data. New entities may use peer group baselines initially.

Can UEBA detect zero-day attacks?

Yes, because UEBA does not rely on signatures. It detects deviations from normal behavior, so any unusual activity—even if never seen before—can be flagged. However, if the attack mimics normal behavior, it may be missed.

What is a peer group in UEBA?

A peer group is a set of entities with similar roles or characteristics, such as all employees in the HR department. UEBA compares an entity's behavior to its peer group to identify outliers. For example, an HR employee accessing source code repositories would be anomalous.

How does UEBA handle false positives?

Analysts can dismiss alerts and provide feedback. The model learns from this feedback to adjust baselines and thresholds. Tuning the anomaly threshold and excluding known-good behaviors (whitelisting) also reduces false positives.

What are the most important data sources for UEBA?

Authentication logs are the most critical because they capture login activity, which is a strong indicator of compromised accounts. Other important sources include VPN logs, DNS logs, proxy logs, and data access logs.

Is UEBA a replacement for SIEM?

No, UEBA complements SIEM. SIEM provides centralized log management and rule-based correlation, while UEBA adds behavioral anomaly detection. Many organizations integrate UEBA alerts into their SIEM for a unified view.
