This chapter covers Service Level Agreements (SLAs) for security services and the role of Managed Security Service Providers (MSSPs) in enterprise security. These topics map to CompTIA Security+ SY0-701 Objective 5.3 (Explain the processes associated with third-party risk assessment and management), which lists SLAs among its agreement types, and are essential for understanding how organizations contract external security expertise. You will learn how SLAs define measurable performance targets, how MSSPs deliver monitoring and response, and how to evaluate these agreements to avoid common pitfalls. This knowledge is critical for the exam and for real-world security management.
Imagine you own a fleet of delivery trucks and you purchase a comprehensive insurance policy that includes roadside assistance. The insurance company (MSSP) provides a 24/7 hotline, towing, flat tire repair, and jump-starts, but they do not own or drive your trucks. You (the client) still own the vehicles and are responsible for routine maintenance, driver training, and fuel. The service level agreement (SLA) specifies response times: for a flat tire, they must arrive within 30 minutes; for a major breakdown, within 2 hours. If they fail, you get a partial refund. However, the policy excludes wear-and-tear repairs and damage from improper loading. In this analogy, the trucks are your IT assets, the roadside assistance is the MSSP's security monitoring and incident response, the SLA defines guaranteed metrics (e.g., time to detect a breach, time to respond), and exclusions are services you must handle internally. The MSSP does not take ownership of your risk—they provide a defined set of services under contract. Just as you wouldn't expect the insurance company to replace your engine due to poor maintenance, you shouldn't expect an MSSP to patch your internal vulnerabilities unless specified. The SLA is the contract that makes the relationship clear and enforceable.
What are Security SLAs?
A Service Level Agreement (SLA) is a contractual commitment between a service provider and a client that defines the expected level of service, performance metrics, remedies for failures, and exclusions. In the context of security, an SLA typically covers aspects like uptime of security tools, time to detect incidents, time to respond, and reporting frequency. SLAs are not just technical documents; they are legal contracts that allocate risk and responsibility.
How Security SLAs Work Mechanically
An SLA specifies key performance indicators (KPIs) and service level objectives (SLOs). For example, an MSSP might guarantee 99.9% uptime for its Security Information and Event Management (SIEM) platform, meaning no more than 8.76 hours of downtime per year. If the SIEM is down for 10 hours, the SLA may trigger a service credit (e.g., 5% refund of monthly fee). The SLA also defines measurement methodology: uptime is calculated as total minutes in a month minus unscheduled downtime, divided by total minutes. It excludes planned maintenance windows (e.g., 2 AM to 4 AM on Sundays).
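The arithmetic above can be checked with a short sketch. The function names and the 50-minute downtime figure are illustrative assumptions, not part of any real SLA; planned maintenance is simply not counted as downtime, as the methodology describes.

```python
def uptime_percent(total_minutes: int, unscheduled_downtime_minutes: float) -> float:
    """Uptime as (total - unscheduled downtime) / total, expressed as a percentage."""
    return (total_minutes - unscheduled_downtime_minutes) / total_minutes * 100


def annual_downtime_budget_hours(target_percent: float) -> float:
    """Hours of downtime per 365-day year allowed by an availability target."""
    return (1 - target_percent / 100) * 365 * 24


# A 30-day month has 43,200 minutes; assume 50 minutes of unscheduled downtime.
MINUTES_IN_MONTH = 30 * 24 * 60
monthly = uptime_percent(MINUTES_IN_MONTH, 50)

print(f"Monthly uptime: {monthly:.3f}%")
print(f"99.9% yearly budget: {annual_downtime_budget_hours(99.9):.2f} hours")
```

With these assumed numbers the month comes in just under 99.9%, so 50 minutes of unscheduled downtime in a single month would already miss the target.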
For incident response, an SLA might specify: "Time to Triage" (TTT) within 15 minutes of alert generation, "Time to Respond" (TTR) within 1 hour for critical incidents, and "Time to Resolve" (TTRes) within 8 hours. The MSSP uses a ticketing system with timestamps to measure these. The client can audit these metrics via monthly reports. If the MSSP fails to meet the TTR for three incidents in a quarter, the client may terminate the contract without penalty.
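As a rough illustration of how ticket timestamps could be turned into these measurements, the sketch below checks one ticket against the three targets quoted above. The `Ticket` fields are hypothetical, and it assumes all three clocks start at alert generation, which a real SLA would need to state explicitly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Targets quoted in the text above.
TARGETS = {
    "triage": timedelta(minutes=15),
    "respond": timedelta(hours=1),
    "resolve": timedelta(hours=8),
}


@dataclass
class Ticket:
    """Hypothetical ticket record holding the four timestamps being measured."""
    alert_at: datetime
    triaged_at: datetime
    responded_at: datetime
    resolved_at: datetime


def breaches(ticket: Ticket) -> list[str]:
    """Return the names of the targets this ticket missed."""
    elapsed = {
        "triage": ticket.triaged_at - ticket.alert_at,
        "respond": ticket.responded_at - ticket.alert_at,
        "resolve": ticket.resolved_at - ticket.alert_at,
    }
    return [name for name, took in elapsed.items() if took > TARGETS[name]]


t0 = datetime(2024, 1, 1, 9, 0)
ticket = Ticket(
    alert_at=t0,
    triaged_at=t0 + timedelta(minutes=10),    # within 15 minutes
    responded_at=t0 + timedelta(minutes=80),  # misses the 1-hour target
    resolved_at=t0 + timedelta(hours=6),      # within 8 hours
)
print(breaches(ticket))  # ['respond']
```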
Key Components, Variants, and Standards
Metrics: Common security SLA metrics include:
- Mean Time to Detect (MTTD)
- Mean Time to Respond (MTTR); note that MTTR sometimes means "repair" or "resolve"
- Mean Time to Contain (MTTC)
- Uptime/Availability (e.g., 99.9% for SIEM)
- False Positive Rate (e.g., <5%)
- Report Delivery (e.g., within 5 business days of month end)
Remedies: Service credits (e.g., 10% of monthly fee per hour of downtime), termination rights, or escalation paths.
Exclusions: The SLA typically lists what is NOT covered, such as:
- Incidents caused by the client's failure to patch systems
- Attacks exceeding a certain volume (e.g., DDoS over 100 Gbps)
- Zero-day exploits before a signature is available
Standards: While there is no single SLA standard, frameworks like ITIL and ISO/IEC 20000 provide guidance. The US National Institute of Standards and Technology (NIST) SP 800-61, the Computer Security Incident Handling Guide, offers best practices for incident handling that can inform SLAs.
How Organizations Deploy SLAs
When procuring an MSSP, the client drafts a Request for Proposal (RFP) that includes desired SLA metrics. The MSSP responds with proposed SLAs. Negotiation focuses on realistic targets: a 1-minute MTTD for all alerts is unrealistic; a 5-minute MTTD for critical alerts may be achievable. The client must also define how alerts are categorized (critical, high, medium, low) and what constitutes an incident. For example, a single failed login is an alert, not an incident; 10 failed logins in 5 minutes may be an incident.
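The failed-login example can be expressed as a sliding-window rule. This is a minimal sketch under the thresholds given above (10 failures in 5 minutes); the function name and signature are illustrative, not taken from any SIEM product.

```python
from collections import deque
from datetime import datetime, timedelta


def is_incident(failed_logins, threshold=10, window=timedelta(minutes=5)):
    """Return True if `threshold` or more failures fall inside any sliding window."""
    recent = deque()
    for ts in sorted(failed_logins):
        recent.append(ts)
        # Drop failures that are older than the window relative to the newest one.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False


base = datetime(2024, 1, 1, 12, 0)
single = [base]                                              # one alert, not an incident
burst = [base + timedelta(seconds=20 * i) for i in range(10)]  # 10 failures in about 3 minutes

print(is_incident(single), is_incident(burst))  # False True
```

Writing the rule down this precisely is what lets both parties agree on when the SLA clock for an incident actually starts.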
Real Command/Tool Examples
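While SLAs are contractual, tools help monitor them. For example, in a SIEM such as Splunk (ELK would use its own query syntax), a search like the following lists IDS signatures that fired more than 100 times:
source="ids.log" | stats count by signature | where count > 100 | table signature, count
A search like this shows alert volume per signature; measuring detection time additionally requires comparing each event's timestamp with the time the alert was raised. For MSSP reporting, the provider might use a platform like ServiceNow to track tickets. A sample SLA report might include: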
Total alerts: 10,000
Critical alerts: 200
Average MTTD: 4.2 minutes
Average MTTR: 45 minutes
SLA breaches: 2 (both due to delayed response on medium incidents)
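A report like the one above is essentially an aggregation over per-incident records. The sketch below shows one way that aggregation might look; the records, field names, and per-severity response targets are all made-up assumptions for illustration.

```python
from statistics import mean

# Hypothetical per-incident records: minutes to detect and minutes to respond.
incidents = [
    {"severity": "critical", "detect_min": 3.0, "respond_min": 40.0},
    {"severity": "critical", "detect_min": 5.0, "respond_min": 55.0},
    {"severity": "medium", "detect_min": 6.0, "respond_min": 250.0},
]

# Assumed response targets in minutes per severity; real values come from the SLA.
RESPONSE_TARGET_MIN = {"critical": 60.0, "medium": 240.0}


def summarize(records):
    """Return average detection time, average response time, and breach count."""
    breach_count = sum(
        1 for r in records if r["respond_min"] > RESPONSE_TARGET_MIN[r["severity"]]
    )
    return {
        "mttd_min": mean(r["detect_min"] for r in records),
        "mttr_min": mean(r["respond_min"] for r in records),
        "breaches": breach_count,
    }


report = summarize(incidents)
print(f"Average MTTD: {report['mttd_min']:.1f} minutes")
print(f"Average MTTR: {report['mttr_min']:.1f} minutes")
print(f"SLA breaches: {report['breaches']}")
```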
Variants: In-House vs. Co-Managed
Some organizations use a co-managed model where the MSSP handles Level 1 monitoring and the internal team handles Level 2/3 response. The SLA then defines handoff times: e.g., MSSP must escalate to client within 10 minutes of detecting a critical incident. Other variants include fully outsourced (MSSP does everything) and hybrid (MSSP provides tools but client staff operate them).
Common Pitfalls
Overly aggressive SLAs: A 99.999% uptime target (about 5.26 minutes of downtime per year) is expensive and may not be necessary.
Vague definitions: "Timely response" is not measurable; use specific minutes.
No measurement methodology: The SLA must state how metrics are calculated (e.g., using NTP-synchronized clocks).
Ignoring exclusions: Clients often miss that zero-day attacks are excluded, leading to surprise when MSSP doesn't cover them.
MSSP Integration
An MSSP typically deploys sensors (e.g., network taps, endpoint agents) on the client's network. The client must provide access (e.g., VPN, firewall rules). The SLA may include a "readiness" metric: e.g., MSSP must have full visibility within 30 days of contract signing. The MSSP's Security Operations Center (SOC) analysts monitor alerts 24/7 and follow playbooks defined in the SLA. For example, a playbook for ransomware might include: isolate the host, block the C2 IP, notify the client, and preserve forensic data.
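A playbook of this kind is an ordered list of steps whose execution is logged with timestamps. The sketch below models only that ordering and logging; `execute` is a hypothetical callable supplied by the caller, so the example does not assume any real EDR, firewall, or ticketing API.

```python
from datetime import datetime, timezone

# Steps quoted from the ransomware playbook above.
RANSOMWARE_PLAYBOOK = [
    "isolate the host",
    "block the C2 IP",
    "notify the client",
    "preserve forensic data",
]


def run_playbook(steps, execute):
    """Run steps in order and return an audit log of (step, UTC timestamp) pairs.

    `execute` is a placeholder for whatever actually performs one step.
    """
    log = []
    for step in steps:
        execute(step)
        log.append((step, datetime.now(timezone.utc)))
    return log


# Usage with a stand-in executor that only prints the step name.
audit_log = run_playbook(RANSOMWARE_PLAYBOOK, lambda step: print(f"running: {step}"))
```

The timestamped log is the evidence later used to show whether response targets were met.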
Legal Considerations
SLAs often include limitation of liability clauses. For example, the MSSP's liability may be capped at the total fees paid in the last 12 months. This means that if the client paid $100k in fees over that period and a breach causes a $1M loss, the client may only recover $100k. The SLA should also address data ownership: who owns the logs and alerts? Typically the client owns the data, but the MSSP may retain copies for a period (e.g., 90 days).
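The effect of a liability cap is a simple minimum. In this sketch the $100k fee total is an assumption implied by the example's figures, and the function name is illustrative.

```python
def recoverable_amount(loss: float, fees_last_12_months: float) -> float:
    """Recovery under a cap equal to the fees paid in the last 12 months."""
    return min(loss, fees_last_12_months)


# Assumed: $100k in fees over the last 12 months, $1M loss from the breach.
print(recoverable_amount(1_000_000, 100_000))  # 100000
```

The remaining $900k stays with the client, which is why a cap clause deserves as much attention in negotiation as the metrics do.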
Exam Relevance
For SY0-701, you need to understand:
The purpose of SLAs (to define expectations and remedies)
Common metrics (MTTD, MTTR, uptime)
The difference between SLA, SLO, and KPI
How SLAs relate to outsourcing security (MSSP)
That SLAs do not transfer legal liability for security breaches
Define Security Requirements
The client identifies what security services are needed: 24/7 monitoring, incident response, threat intelligence, etc. They also determine internal capabilities—what they can handle themselves. For example, a small e-commerce company may lack a SOC, so they need full monitoring. They document required metrics: detection time under 10 minutes for critical alerts, response under 1 hour. This step produces a list of SLAs to include in the RFP.
Draft and Negotiate SLA
The client writes an RFP with desired SLAs. MSSPs respond with proposals. Negotiation involves adjusting metrics to realistic levels—e.g., 5-minute detection may be too fast for all alert types; they agree on 10 minutes for critical, 30 for medium. They define measurement methods (e.g., using ticketing system timestamps) and exclusions (e.g., DDoS above 50 Gbps). The final SLA is signed as part of the contract.
Deploy MSSP Infrastructure
The MSSP installs sensors: network taps, endpoint agents, log collectors. The client configures firewall rules to allow outbound traffic from sensors to MSSP's SIEM. The MSSP validates that logs are flowing and alerts are generating. This step may have its own SLA (e.g., full deployment within 30 days). The client's IT team provides credentials and access as needed.
Monitor and Measure Performance
The MSSP's SOC analysts monitor alerts and follow playbooks. They log every action in a ticketing system. Monthly, they generate an SLA report showing metrics: total alerts, MTTD, MTTR, uptime, breaches. The client reviews the report. If metrics are missed, they request service credits per the SLA. For example, if SIEM uptime was 99.8% (below 99.9%), the client gets a 5% credit.
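The credit in the uptime example can be computed as below. This is a sketch assuming a flat credit rate and a hypothetical $20,000 monthly fee; real SLAs often use tiered schedules instead.

```python
def service_credit(monthly_fee: float, measured_uptime: float,
                   target_uptime: float = 99.9, credit_percent: float = 5.0) -> float:
    """Credit owed for the month: a flat percentage if uptime fell below target."""
    if measured_uptime < target_uptime:
        return monthly_fee * credit_percent / 100
    return 0.0


# Hypothetical monthly fee of $20,000 with measured uptime of 99.8%.
print(service_credit(20_000, 99.8))  # 1000.0
print(service_credit(20_000, 99.95))  # 0.0
```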
Review and Update SLA
Annually, the client and MSSP review the SLA. They may adjust metrics based on changing threats or business needs. For example, if ransomware attacks have increased, they might tighten the MTTR target for critical incidents. They also review exclusions: if the MSSP now offers a zero-day protection module, they may add it and adjust the SLA. The updated SLA is signed for the next term.
Scenario 1: Retail Company with PCI DSS Compliance
A mid-sized retailer outsources SIEM monitoring to an MSSP to meet PCI DSS Requirement 10 (log monitoring). The SLA specifies: MTTD < 15 minutes for critical alerts, SIEM uptime 99.9%, and monthly compliance reports. One month, the MSSP fails to detect a brute-force attack on a POS system for 45 minutes. The client's internal auditor catches it. The client reviews the SLA report and finds the MSSP's MTTD was 12 minutes on average, but the specific incident was missed due to a rule misconfiguration (the MSSP's rule for failed logins was set to 50 attempts per minute, but the attack used 40). The client invokes the SLA breach clause and receives a 10% service credit. The MSSP updates the rule. The client also learns to specify that all PCI-related alerts must be treated as critical, regardless of volume.
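Two things went wrong in this scenario: a threshold set above the attack rate, and an average that hid one slow detection. The sketch below reproduces both with made-up detection times chosen so the average is 12 minutes; only the 12-minute average, the 45-minute miss, and the 50/40 thresholds come from the scenario.

```python
from statistics import mean

MTTD_TARGET_MIN = 15  # the SLA requires detection in under 15 minutes


def rule_fires(attempts_per_minute: int, threshold: int) -> bool:
    """A simple threshold rule: alert only at or above the configured rate."""
    return attempts_per_minute >= threshold


# The rule was set to 50 attempts per minute; the attack ran at 40.
print(rule_fires(40, threshold=50))  # False

# Hypothetical detection times in minutes: eleven routine alerts and the missed attack.
detection_minutes = [9] * 11 + [45]

average = mean(detection_minutes)  # (11 * 9 + 45) / 12 = 12
worst = max(detection_minutes)
missed = [m for m in detection_minutes if m >= MTTD_TARGET_MIN]

print(average, worst, missed)  # 12 45 [45]
```

Reporting the worst case or a per-incident breach list alongside the mean would have surfaced the miss without an internal audit.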
Scenario 2: Healthcare Provider with Ransomware Incident
A hospital network contracts an MSSP for 24/7 SOC services. The SLA includes MTTR < 1 hour for critical incidents. A ransomware attack encrypts files on a file server. The MSSP detects the initial lateral movement within 5 minutes (MTTD met) but takes 2 hours to respond because the playbook required client authorization before isolation. The SLA did not specify the authorization process. The client argues that MTTR should include the time to get approval. The MSSP counters that the SLA excludes delays caused by the client. Ultimately, they revise the SLA to include a pre-approved isolation list and a 15-minute authorization window. The incident causes 4 hours of downtime, and the SLA breach leads to a 15% credit. The client also realizes they need a separate SLA for backup restoration (the MSSP did not handle backups).
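The dispute comes down to whether time spent waiting on the client counts toward MTTR. The sketch below computes the response time both ways; the 90-minute wait is a hypothetical figure, since the scenario gives only the 2-hour total.

```python
from datetime import timedelta

MTTR_TARGET = timedelta(hours=1)


def measured_response(total: timedelta, waiting_on_client: timedelta,
                      exclude_client_delay: bool) -> timedelta:
    """Response time with or without the time spent waiting on the client."""
    return total - waiting_on_client if exclude_client_delay else total


total = timedelta(hours=2)       # from the scenario
waiting = timedelta(minutes=90)  # hypothetical time spent awaiting authorization

for exclude in (False, True):
    took = measured_response(total, waiting, exclude)
    print(f"exclude_client_delay={exclude}: {took}, breach={took > MTTR_TARGET}")
```

Under these assumed numbers the same incident is a breach by one reading and compliant by the other, which is why the measurement rule belongs in the SLA itself.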
Common Mistake: Overlooking Exclusions
A financial firm signs an MSSP contract with an SLA that promises "detection of all known malware." A new variant of Emotet infects the network, and the MSSP does not detect it for 8 hours because no signature existed. The client demands a credit, but the SLA excludes zero-day threats. The client had not negotiated a separate zero-day detection SLA (e.g., using behavioral analysis). The lesson: always review exclusions and negotiate coverage for emerging threats if needed.
Exactly What SY0-701 Tests
Objective 5.3 covers "Explain the processes associated with third-party risk assessment and management," which includes agreement types such as SLAs. The exam expects you to:
Define SLA, SLO, and KPI and differentiate them.
Identify common security SLA metrics (MTTD, MTTR, uptime).
Understand that SLAs are contractual, not technical controls.
Recognize that MSSPs provide outsourced security services under an SLA.
Know that SLAs do not eliminate the client's responsibility for security—they define service levels, not liability.
Common Wrong Answers and Why
"SLAs guarantee 100% security." Wrong: SLAs guarantee service levels, not prevention of all breaches. Candidates choose this because they confuse service guarantees with security guarantees.
"MSSPs take full responsibility for all security incidents." Wrong: The client retains ultimate responsibility. Candidates think outsourcing transfers liability, but it does not.
"MTTR always means 'time to resolve'." Wrong: MTTR can mean respond, repair, or resolve. The exam may specify which one. Candidates assume one definition.
"SLAs are optional for MSSP contracts." Wrong: SLAs are standard and often legally required. Candidates may think they are just nice-to-have.
Specific Terms and Values
SLA (Service Level Agreement) : Contract defining service expectations.
SLO (Service Level Objective) : Target metric within SLA (e.g., 99.9% uptime).
KPI (Key Performance Indicator) : Measurable value (e.g., number of incidents).
MTTD (Mean Time to Detect) : Average time from incident occurrence to detection.
MTTR (Mean Time to Respond) : Average time from detection to response action.
Uptime: Percentage of time a service is operational (often 99.9% or 99.99%).
Service Credit: Refund for SLA breach (e.g., 5% of monthly fee).
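One way to keep these terms straight is to model them: the SLA is the container, each SLO is a target inside it, and a KPI is the measured value compared against an SLO. The class and field names below are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field


@dataclass
class SLO:
    """A single target, e.g. uptime >= 99.9 or MTTD <= 15 minutes."""
    name: str
    target: float
    higher_is_better: bool = True

    def met(self, kpi_value: float) -> bool:
        """Compare a measured KPI value against this objective's target."""
        if self.higher_is_better:
            return kpi_value >= self.target
        return kpi_value <= self.target


@dataclass
class SLA:
    """The agreement: a provider, a set of SLOs, and a remedy for misses."""
    provider: str
    slos: list[SLO] = field(default_factory=list)
    remedy: str = "service credit"

    def missed(self, kpis: dict[str, float]) -> list[str]:
        """Names of the SLOs whose measured KPI did not meet the target."""
        return [s.name for s in self.slos if not s.met(kpis[s.name])]


sla = SLA(
    provider="ExampleMSSP",  # hypothetical provider name
    slos=[
        SLO("uptime_percent", 99.9),
        SLO("mttd_minutes", 15, higher_is_better=False),
    ],
)
measured_kpis = {"uptime_percent": 99.8, "mttd_minutes": 12}
print(sla.missed(measured_kpis))  # ['uptime_percent']
```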
Common Trick Questions
A question might ask: "Which document defines the expected response time for a security incident?" Answer: SLA (not policy, not BCP).
A scenario might describe an MSSP missing a detection time—what is the consequence? Answer: Service credit, not termination (unless specified).
They might ask: "Who is ultimately responsible for security when using an MSSP?" Answer: The client organization.
Decision Rule for Eliminating Wrong Answers
If a question asks about SLAs in a scenario, eliminate any answer that:
Suggests the SLA guarantees no breaches.
Implies the MSSP assumes all liability.
Uses vague terms like "timely" without metrics.
Confuses SLA with a security policy or procedure.
The correct answer will reference specific metrics, contractual remedies, or the shared responsibility model.
SLA (Service Level Agreement) is a contract defining expected service levels, metrics, and remedies.
Common SLA metrics include MTTD (Mean Time to Detect), MTTR (Mean Time to Respond), and uptime percentage.
MSSPs provide outsourced security monitoring and response under an SLA.
The client retains ultimate responsibility for security; MSSPs augment, not replace, internal efforts.
Service credits are partial refunds for SLA breaches, not full compensation for losses.
SLAs must be specific, measurable, and include exclusions and measurement methodology.
For SY0-701, know the difference between SLA, SLO, and KPI, and that SLAs are contractual, not technical controls.
These come up on the exam all the time. Here's how to tell them apart.
SLA (Service Level Agreement)
A contract between provider and client
Includes multiple metrics and remedies
Legally binding
Example: 'SIEM uptime must be 99.9%'
Defines consequences for non-compliance
SLO (Service Level Objective)
A specific target within an SLA
A single measurable goal
Not a contract itself
Example: '99.9% uptime' is an SLO
Part of the SLA, not standalone
MSSP (Managed Security Service Provider)
Outsourced security monitoring
Cost-effective for small teams
24/7 coverage without hiring
Limited customization of tools
Client retains ultimate responsibility
In-House SOC
Internal team of security analysts
Higher cost (salaries, tools)
Full control over processes
Can tailor tools to environment
Full responsibility for operations
Mistake
An SLA guarantees that the MSSP will prevent all security incidents.
Correct
An SLA defines service levels like detection time and uptime; it does not guarantee prevention. Incidents can still occur, and the SLA provides remedies (e.g., credits) if the MSSP fails to meet metrics.
Mistake
Once you sign an MSSP contract, you no longer need internal security staff.
Correct
The client retains responsibility for security governance, patching, user training, and often incident response coordination. The MSSP augments, not replaces, internal capabilities.
Mistake
MTTR always means 'mean time to resolve'.
Correct
MTTR can stand for 'respond', 'repair', or 'resolve' depending on context. The SLA must define which meaning is used. The exam may test this ambiguity.
Mistake
Service credits fully compensate for losses from a security incident.
Correct
Service credits are typically a small percentage of the monthly fee (e.g., 5-10%). They do not cover business losses like downtime, data loss, or reputation damage.
Mistake
All MSSP SLAs are standardized and non-negotiable.
Correct
SLAs are negotiated between client and provider. Clients can request custom metrics, exclusions, and remedies. Standard SLAs are starting points.
An SLA is a contract that includes multiple SLOs (Service Level Objectives). The SLO is a specific target, like 99.9% uptime, while the SLA defines the overall agreement, including remedies for missed SLOs. For the exam, remember that SLOs are the individual metrics within an SLA.
An MSSP does not take over responsibility for the client's security. The client retains ultimate responsibility for their security posture. The MSSP provides defined services under an SLA, but the client must still manage internal policies, patching, and user training. The SLA does not transfer legal liability.
When an MSSP misses an SLA target, the client typically receives service credits (e.g., 5-10% of monthly fee). Repeated failures may allow contract termination. The SLA specifies the remedy. Credits do not cover business losses.
An SLA does not guarantee that no breach will occur. SLAs guarantee service levels, not security outcomes. No provider can prevent all incidents. The SLA defines how quickly they will detect and respond, not that incidents won't happen.
Common exclusions include zero-day attacks (before signatures are available), attacks exceeding a certain volume (e.g., DDoS > 100 Gbps), incidents caused by client's failure to patch, and planned maintenance windows. Always review exclusions carefully.
MTTR can mean Mean Time to Respond, Mean Time to Repair, or Mean Time to Resolve. The SLA must specify which definition is used. For incident response, it often means time from detection to first response action.
Include specific metrics (MTTD, MTTR, uptime), measurement methodology, reporting frequency, exclusions, remedies (service credits, termination rights), and escalation paths. Also define alert severity levels and incident classification.