Knowledge + Practice

CCNA Soa Monitoring Logging Questions

75 of 302 questions · Page 1/5 · Soa Monitoring Logging topic · Answers revealed


1

MCQ · hard

A SysOps administrator is troubleshooting an application that runs on EC2 instances behind an ALB. Users report intermittent 503 errors. The administrator checks the ALB access logs and finds entries with 'elb_status_code' 503 and 'target_status_code' '-'. What is the most likely cause?

A. The target instances are unhealthy, causing the ALB to return 503.

B. The SSL certificate on the ALB has expired.

C. The target instances have high CPU utilization.

D. The security group on the ALB is blocking traffic.

Answer: A

If all targets are unhealthy, ALB returns 503.

Why this answer

The ALB access log entry with `elb_status_code` 503 and `target_status_code` '-' indicates that the load balancer itself generated the 503 because it had no target to route the request to. The dash for the target status code means the request never reached a target instance, which happens when the target group has no healthy, in-service targets available, for example while failing targets are being deregistered or replaced. Because targets move in and out of service, the errors appear intermittently.
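The log check described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the documented ALB access log field order (type, time, elb, client:port, target:port, three processing times, then `elb_status_code` and `target_status_code`); the function name and the shortened sample entries are hypothetical.

```python
import shlex

# 0-based positions of the two fields in an ALB access log entry,
# assuming the documented field order described in the lead-in.
ELB_STATUS_INDEX = 8
TARGET_STATUS_INDEX = 9


def is_lb_generated_503(log_line: str) -> bool:
    """Return True when the ALB answered 503 without reaching a target."""
    # shlex keeps quoted fields (such as the request line) as one token.
    fields = shlex.split(log_line)
    return fields[ELB_STATUS_INDEX] == "503" and fields[TARGET_STATUS_INDEX] == "-"


# Hypothetical, shortened log entries for illustration only.
SAMPLE_503 = (
    "https 2024-01-01T00:00:00.000000Z app/my-alb/0123456789abcdef "
    "203.0.113.10:54321 - -1 -1 -1 503 - 120 337 "
    '"GET https://example.com:443/ HTTP/1.1" "curl/8.0"'
)
SAMPLE_200 = (
    "http 2024-01-01T00:00:01.000000Z app/my-alb/0123456789abcdef "
    "203.0.113.10:54322 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 "
    '"GET http://example.com:80/ HTTP/1.1" "curl/8.0"'
)
```

Counting how often `is_lb_generated_503` is true over a time window gives a rough picture of when the target group had nothing to serve requests.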

Exam trap

The trap here is that candidates often confuse a 503 error with target-side issues (like high CPU or application errors), but the dash in the target_status_code is the key indicator that the ALB itself is rejecting the request due to no healthy targets, not that the request reached a target and failed.

How to eliminate wrong answers

Option B is wrong because an expired SSL certificate on the ALB would cause the TLS handshake to fail on the client side before any HTTP request is processed, so it would not produce a 503 with a dash for the target status code. Option C is wrong because high CPU utilization on target instances would still allow the ALB to forward requests to them (resulting in a target_status_code like 200 or 500), but the dash indicates no request was forwarded. Option D is wrong because the ALB's security group controls inbound traffic to the load balancer; if it were blocking traffic, client connections would time out before reaching the ALB, so no HTTP response or access log entry would be generated.


2

MCQ · hard

A SysOps administrator receives an alarm that an EC2 instance's status check has failed. The instance is part of an Auto Scaling group behind an Application Load Balancer. The administrator needs to ensure that the instance is automatically replaced and that the root cause is investigated. What is the MOST efficient combination of actions to achieve this?

A. Configure an Auto Scaling lifecycle hook to terminate the unhealthy instance and send the instance system log to an S3 bucket for analysis.

B. Create a CloudWatch alarm that triggers an SNS notification to the administrator to manually replace the instance.

C. Reboot the instance from the AWS Management Console and then review CloudTrail logs.

D. Manually stop and start the instance to recover it, then check the system logs.

Answer: A

Lifecycle hooks allow custom actions before termination, and system logs help with root cause analysis.

Why this answer

Option A is correct because it combines automatic instance replacement via the Auto Scaling group's health check (which marks the instance unhealthy and terminates it) with a lifecycle hook that captures the instance's system log before termination and sends it to S3 for root cause analysis. This is the most efficient approach as it requires no manual intervention and preserves diagnostic data.

Exam trap

The trap here is that candidates may think manual actions (reboot, stop/start) are sufficient for recovery, but the question explicitly requires automatic replacement and root cause investigation, which only a lifecycle hook with data capture provides.

How to eliminate wrong answers

Option B is wrong because it relies on manual replacement via SNS notification, which is inefficient and violates the requirement for automatic replacement. Option C is wrong because rebooting an instance with a failed status check does not address the underlying issue and does not automatically replace the instance; CloudTrail logs record API calls, not system-level diagnostics. Option D is wrong because manually stopping and starting the instance is not automatic and does not guarantee recovery; it also fails to capture diagnostic data for root cause analysis.


3

MCQ · easy

A SysOps administrator receives a notification that an EC2 instance's status check has failed. The instance is part of an Auto Scaling group. What is the immediate impact on the application?

A. The instance is still accessible and serving traffic.

B. The instance is immediately terminated.

C. The instance is automatically stopped and started.

D. The Auto Scaling group will launch a new instance to replace the failed one, potentially causing temporary downtime.

Answer: D

Auto Scaling replaces the instance, but there may be a brief interruption.

Why this answer

When an EC2 instance fails a status check, the Auto Scaling group marks it unhealthy and schedules it for replacement: the group terminates the impaired instance and launches a new one. Replacement is not instantaneous, because the new instance must launch, boot, and pass its health checks before it is in service, so the application can see reduced capacity or temporary downtime if the failed instance was actively handling traffic.

Exam trap

The trap here is that candidates stop at the termination of the failed instance (Option B) and overlook the impact on the application: the replacement process is not instantaneous, so capacity is reduced until the new instance is in service.

How to eliminate wrong answers

Option A is wrong because a failed status check indicates the instance is impaired (e.g., unreachable due to OS-level issues or hardware problems), so it cannot be assumed to be accessible or serving traffic. Option B is wrong because termination alone does not describe the impact on the application; the Auto Scaling group also launches a replacement, and the application is affected until that instance is in service. Option C is wrong because EC2 status check failures do not automatically stop and start the instance; that would require a manual action or an automated recovery via CloudWatch alarms or EC2 auto-recovery, not the Auto Scaling group's default behavior.


4

Matching · medium

Match each AWS monitoring tool to its function.


Concepts

Matches

Centralized log storage and analysis

Time-series data points

Trigger actions based on metrics

Event-driven automation (now EventBridge)

Customizable monitoring views

Why these pairings

These are components of Amazon CloudWatch.


5

MCQ · medium

A company uses an Amazon S3 bucket to store sensitive data. The SysOps administrator needs to be notified within 15 minutes if any object in the bucket becomes publicly accessible. Which solution will meet this requirement with the least operational overhead?

A. Configure an S3 event notification for all object creation events and publish to an Amazon SNS topic that sends an email alert.

B. Use an AWS Config managed rule to detect 's3-bucket-public-read-prohibited' and trigger an SNS notification via Amazon EventBridge.

C. Enable Amazon CloudTrail data events for the S3 bucket and create a CloudWatch Logs metric filter for PutObjectAcl (or PutObject with public ACL) and set an alarm.

D. Configure S3 event notifications for 's3:ObjectCreated:Put' and 's3:ObjectCreated:PutObjectAcl' with a suffix/prefix filter for public grants, sending to an SNS topic.

Answer: D

Correct. This allows real-time notification specifically when objects are created with public ACLs, meeting the requirement with minimal overhead.

Why this answer

Option D is correct because S3 event notifications can be configured specifically for `s3:ObjectCreated:Put` and `s3:ObjectCreated:PutObjectAcl` events, and you can filter by prefix/suffix to detect public grants (e.g., `public-read` or `public-read-write`). This directly triggers an SNS topic for near-real-time notification within seconds, meeting the 15-minute requirement with minimal overhead, as no additional services or complex configurations are needed.

Exam trap

The trap here is that candidates often choose CloudTrail or Config because they associate them with security monitoring, but they overlook the latency and overhead of those services compared to the direct, low-latency S3 event notification mechanism designed for real-time object-level alerts.

How to eliminate wrong answers

Option A is wrong because S3 event notifications for all object creation events do not filter for public ACLs; they would trigger on every object upload, causing noise and failing to specifically detect public accessibility. Option B is wrong because the AWS Config managed rule `s3-bucket-public-read-prohibited` checks the bucket-level policy, not individual object ACLs, and Config evaluations run every 10 minutes or on configuration changes, which may not guarantee notification within 15 minutes for object-level changes. Option C is wrong because CloudTrail data events for S3 have a latency of up to 15 minutes (often longer) for delivery to CloudWatch Logs, and metric filters plus alarms add complexity and potential delay, making it unreliable for the 15-minute requirement and higher operational overhead.


6

Multi-Select · medium

A company is using an Auto Scaling group with a dynamic scaling policy based on average CPU utilization. The SysOps administrator notices that the scaling is not triggering as expected. Which THREE steps should the administrator take to troubleshoot the issue?

Select 3 answers

A. Check the scaling activity history in the Auto Scaling group for any errors or cooldown periods.

B. Ensure that the EC2 instances are passing the ELB health checks.

C. Review the scaling policy's cooldown period and threshold settings.

D. Verify that the CloudWatch alarm associated with the scaling policy is in ALARM state when CPU is high.

E. Manually increase the desired capacity to see if the scaling policy takes effect.

Answers: A, C, D

Activity history shows why scaling did not occur.

Why this answer

Option A is correct because the scaling activity history provides a log of all scaling actions, including errors, cooldown periods, and why a scaling event was or was not triggered. By reviewing this history, the administrator can identify if the scaling policy was blocked by a cooldown period, if the alarm state was not reached, or if there were any configuration errors that prevented the scaling action from executing.

Exam trap

The trap here is that candidates may confuse ELB health checks with the metric-based alarm that drives scaling, or think that manually adjusting capacity is a valid diagnostic step, when in fact it bypasses the automated policy logic and does not reveal why the policy failed to trigger.


7

MCQ · hard

A SysOps administrator is troubleshooting an issue where an EC2 instance running a web server is not reachable from the internet. The instance has a public IP and is in a public subnet. The security group allows HTTP and HTTPS from 0.0.0.0/0. The network ACL allows all inbound and outbound traffic. What should the administrator check NEXT?

A. Check that the instance is associated with an Elastic IP address.

B. Verify that the subnet's route table has a route to an internet gateway.

C. Confirm that the instance's operating system firewall is disabled.

D. Review the VPC Flow Logs for the instance's network interface.

Answer: B

Without a route to the internet gateway, traffic cannot reach the internet.

Why this answer

The instance is in a public subnet with a public IP and security group allowing HTTP/HTTPS, and the network ACL allows all traffic. The most likely remaining issue is that the subnet's route table lacks a route to an internet gateway (IGW), which is required for traffic to and from the internet. Without this route, the instance cannot send responses back to internet clients, making it unreachable despite having a public IP.
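The route table check can be expressed as a small predicate. The sketch below assumes route entries shaped like the output of the EC2 `DescribeRouteTables` API (`DestinationCidrBlock`, `GatewayId`, `State`); the function name and the sample tables are hypothetical.

```python
def has_internet_gateway_route(route_table: dict) -> bool:
    """Return True if the table has an active default route via an internet gateway."""
    for route in route_table.get("Routes", []):
        is_default = route.get("DestinationCidrBlock") == "0.0.0.0/0"
        via_igw = route.get("GatewayId", "").startswith("igw-")
        is_active = route.get("State", "active") == "active"
        if is_default and via_igw and is_active:
            return True
    return False


# Hypothetical route tables for illustration only.
LOCAL_ONLY = {
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    ]
}
WITH_IGW = {
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
        {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc1234", "State": "active"},
    ]
}
```

A subnet whose table looks like `LOCAL_ONLY` behaves as a private subnet even if its instances have public IP addresses.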

Exam trap

The trap here is that candidates often assume a public IP and permissive security groups are sufficient for internet access, overlooking the critical requirement of a route table entry pointing to an internet gateway for the subnet.

How to eliminate wrong answers

Option A is wrong because an Elastic IP is not required for internet connectivity; an instance with an auto-assigned public IP can already be reached from the internet if routing is correct. Option C is wrong because, although an OS firewall could block traffic, the more fundamental network-level routing should be checked first, and the OS firewall is not the most likely cause given that the security group and NACL are permissive. Option D is wrong because VPC Flow Logs record accepted and rejected traffic at the network interface, not the routing configuration; they would not reveal a missing IGW route, so they are not the next logical check.


8

MCQ · hard

A SysOps administrator notices that an Amazon RDS for MySQL instance's CPU utilization is consistently above 80% during business hours. The administrator wants to identify the queries causing the high load without impacting performance. Which action should be taken?

A. Enable the MySQL slow query log and store it in CloudWatch Logs.

B. Enable Performance Insights on the RDS instance.

C. Enable Enhanced Monitoring to get OS-level metrics.

D. Increase the retention period for CloudWatch metrics to 15 months.

Answer: B

Performance Insights provides a real-time dashboard of database performance and top SQL queries with minimal overhead.

Why this answer

Performance Insights visualizes database load and identifies the SQL queries that contribute most to it, which makes it well suited to finding the queries behind high CPU utilization. It operates with minimal overhead by sampling the database engine's internal performance data, so it can diagnose query performance issues without noticeably impacting the production workload.

Exam trap

The trap here is that candidates often confuse Enhanced Monitoring (OS-level metrics) with Performance Insights (database-level query analysis), or assume the slow query log is the best tool for identifying all high-CPU queries despite its threshold-based limitation.

How to eliminate wrong answers

Option A is wrong because the MySQL slow query log captures only queries that exceed a defined execution time threshold, not all queries causing high CPU utilization, and enabling it can add I/O overhead that may impact performance. Option C is wrong because Enhanced Monitoring provides OS-level metrics (CPU, memory, disk I/O) but does not identify which specific SQL queries are consuming CPU resources. Option D is wrong because increasing CloudWatch metric retention to 15 months only preserves historical data for long-term analysis; it does not help identify the queries currently causing high CPU load.


9

Multi-Select · medium

A SysOps administrator is investigating a performance issue with an Amazon RDS for PostgreSQL instance. The administrator has enabled Performance Insights. Which TWO metrics from Performance Insights can help identify the root cause of a sudden increase in database load? (Choose TWO.)

Select 2 answers

A. Read IOPS and Write IOPS.

B. Average Active Sessions.

C. DB Load by Wait Events.

D. CPUUtilization percentage.

E. Top SQL queries by DB Load.

Answers: C, E

This shows the distribution of load across different wait events, helping pinpoint the type of contention.

Why this answer

Performance Insights measures database load in units of Average Active Sessions (AAS). The 'DB Load by Wait Events' metric breaks down this load by the specific wait events (e.g., I/O, locks, CPU) that are causing sessions to wait, directly pinpointing the bottleneck. This is the primary diagnostic view for identifying the root cause of a sudden load increase.

Exam trap

The trap here is that candidates confuse 'Average Active Sessions' (the overall load metric) with 'DB Load by Wait Events' (the breakdown), or they mistakenly think raw I/O metrics like IOPS are sufficient to diagnose database-level contention, when in fact wait event analysis is required to isolate the specific resource bottleneck.


10

MCQ · hard

A SysOps administrator needs to ensure that all S3 buckets in the account are logged to CloudTrail for data events. The administrator enables CloudTrail with data events for S3 and selects 'All buckets' in the current account. However, after a week, they notice that some buckets are not being logged. What is the most likely reason?

A. The IAM user who created the trail does not have s3:PutObject permissions on the buckets.

B. The S3 buckets do not have a bucket policy that allows CloudTrail to write the log files.

C. The S3 buckets are in a different AWS Region from the CloudTrail trail.

D. The S3 buckets have server access logging enabled, which conflicts with CloudTrail logging.

Answer: B

Without the appropriate bucket policy, CloudTrail cannot deliver logs to the target bucket.

Why this answer

When CloudTrail delivers S3 data event logs to a destination bucket, it writes log files on behalf of the trail. Even if the trail is configured to log data events for 'All buckets,' CloudTrail must have explicit permissions to write to the destination bucket. The destination bucket requires a bucket policy that grants CloudTrail the s3:PutObject action; without this policy, CloudTrail cannot deliver logs for any bucket, including those being monitored.
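A destination bucket policy of the kind described above might look like the following sketch. It builds the policy document as a Python dictionary; the bucket name, account ID, and function name are hypothetical, and a production policy would typically add further conditions (for example, restricting which trail may write).

```python
import json


def cloudtrail_bucket_policy(bucket: str, account_id: str) -> dict:
    """Build a minimal bucket policy that lets CloudTrail deliver log files."""
    service = {"Service": "cloudtrail.amazonaws.com"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # CloudTrail checks the bucket ACL before delivering logs.
                "Sid": "AWSCloudTrailAclCheck",
                "Effect": "Allow",
                "Principal": service,
                "Action": "s3:GetBucketAcl",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # The write permission the explanation refers to.
                "Sid": "AWSCloudTrailWrite",
                "Effect": "Allow",
                "Principal": service,
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/AWSLogs/{account_id}/*",
                "Condition": {
                    "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
                },
            },
        ],
    }


# Hypothetical bucket name and account ID for illustration only.
policy_json = json.dumps(cloudtrail_bucket_policy("example-trail-logs", "111122223333"), indent=2)
```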

Option B correctly identifies this missing bucket policy as the most likely reason some buckets are not logged.

Exam trap

The trap here is that candidates assume enabling CloudTrail with 'All buckets' automatically grants write permissions, but they overlook the critical requirement of a bucket policy on the destination bucket that explicitly allows CloudTrail to deliver logs.

How to eliminate wrong answers

Option A is wrong because the IAM user who created the trail does not need s3:PutObject permissions on the buckets being logged; CloudTrail itself writes the logs to the destination bucket, and the trail creation only requires permissions to create the trail and configure logging, not to write to each source bucket. Option C is wrong because CloudTrail can log data events for S3 buckets in any region as long as the trail is configured with 'All buckets' or a bucket ARN that includes the region; regional mismatch does not prevent logging. Option D is wrong because server access logging and CloudTrail data event logging are independent features that can coexist on the same bucket without conflict; enabling one does not disable the other.


11

MCQ · medium

An environment has 12 individual CloudWatch metric alarms covering CPU, memory, disk, and network. When one instance degrades, all 12 alarms fire simultaneously and send 12 separate notifications to the on-call engineer. The team wants a single notification per incident regardless of how many individual alarms trigger. What CloudWatch feature addresses this?

A. Create a composite alarm that enters ALARM state when any of the 12 child alarms is in ALARM state, and configure a single SNS action on the composite alarm only

B. Increase the alarm evaluation period on all 12 alarms to 30 minutes so they fire less frequently

C. Use an SNS topic with a delivery policy that batches notifications sent within a 60-second window

D. Configure all 12 alarms to write to the same CloudWatch Events rule and suppress duplicate events with EventBridge deduplication

Answer: A

The composite alarm's rule expression 'ALARM(alarm1) OR ALARM(alarm2) OR ...' triggers when any child fires. By routing all notifications through the composite alarm's action and removing actions from the child alarms, exactly one notification is sent per incident. Child alarm states remain visible in the console for root cause analysis.

Why this answer

Option A is correct because a composite alarm in CloudWatch can aggregate multiple child alarms into a single parent alarm. When any of the 12 child alarms enters the ALARM state, the composite alarm transitions to ALARM and triggers a single SNS notification, thereby reducing alert noise to one notification per incident.
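As a rough illustration, the rule expression for such a composite alarm can be generated from the child alarm names. The helper names, alarm names, and topic ARN below are hypothetical; the returned dictionary uses parameter names accepted by the CloudWatch `PutCompositeAlarm` API (`AlarmName`, `AlarmRule`, `AlarmActions`), but the sketch does not call AWS.

```python
def composite_alarm_rule(child_alarm_names: list[str]) -> str:
    """Join child alarms into an 'any child in ALARM' rule expression."""
    if not child_alarm_names:
        raise ValueError("at least one child alarm is required")
    return " OR ".join(f'ALARM("{name}")' for name in child_alarm_names)


def composite_alarm_params(name: str, children: list[str], topic_arn: str) -> dict:
    """Build request parameters with a single SNS action on the composite alarm."""
    return {
        "AlarmName": name,
        "AlarmRule": composite_alarm_rule(children),
        "AlarmActions": [topic_arn],  # the only place a notification is sent from
    }


# Hypothetical names for illustration only.
params = composite_alarm_params(
    "instance-degraded",
    ["cpu-high", "memory-high", "disk-full"],
    "arn:aws:sns:us-east-1:111122223333:on-call",
)
```

For the single-notification behavior to hold, the child alarms themselves should carry no notification actions.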

Exam trap

The trap here is that candidates may think SNS batching or EventBridge deduplication can consolidate separate alarm notifications, but those services do not aggregate distinct alarm state changes into a single event; only composite alarms provide that logical grouping.

How to eliminate wrong answers

Option B is wrong because increasing the evaluation period to 30 minutes does not consolidate multiple notifications into one; it merely delays the alarms, and all 12 would still fire individually after the longer period. Option C is wrong because SNS delivery policies control retry behavior for HTTP/HTTPS endpoints, not deduplication or aggregation of separate alarm notifications; each alarm still sends its own message to the topic. Option D is wrong because CloudWatch Events (now EventBridge) can route alarm state changes to targets, but it has no built-in deduplication that collapses multiple distinct alarm events into a single notification; each matching event still invokes the target.


12

Multi-Select · easy

A company needs to monitor the CPU and memory utilization of its EC2 instances. Which TWO services can be used to collect and visualize these metrics?

Select 2 answers

A. Amazon CloudWatch

B. AWS CloudTrail

C. Amazon CloudWatch Agent

D. AWS Config

E. AWS Systems Manager

Answers: A, C

CloudWatch collects CPU utilization by default and can display metrics.

Why this answer

Amazon CloudWatch is the native AWS monitoring service that collects and stores metrics such as CPU utilization and memory utilization from EC2 instances. However, by default, CloudWatch only captures hypervisor-level metrics (like CPU) and not in-guest metrics (like memory utilization). To collect memory utilization, you must install the Amazon CloudWatch Agent on the instance, which sends custom metrics to CloudWatch.

Together, CloudWatch and the CloudWatch Agent provide both collection and visualization of CPU and memory metrics.
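For illustration, a minimal CloudWatch agent configuration that collects memory utilization could be generated as below. The `metrics` / `metrics_collected` / `mem` structure and the `mem_used_percent` measurement follow the agent's configuration schema; the function name and the 60-second interval are assumptions for this sketch.

```python
import json


def memory_metrics_config(interval_seconds: int = 60) -> dict:
    """Build a minimal agent configuration that reports memory utilization."""
    return {
        "metrics": {
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                }
            }
        }
    }


# Serialized form, suitable for writing to the agent's JSON configuration file.
config_json = json.dumps(memory_metrics_config(), indent=2)
```

CPU utilization needs no such entry because EC2 publishes it to CloudWatch without an agent.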

Exam trap

The trap here is that candidates often assume CloudWatch alone collects all EC2 metrics, but they miss that memory utilization requires the CloudWatch Agent because it is an in-guest metric not provided by the hypervisor.


13

MCQ · medium

A company uses AWS CloudFormation to deploy infrastructure. The operations team wants to be notified when a stack update fails. What is the simplest way to achieve this?

A. Enable CloudTrail and create a metric filter for 'UpdateStack' events, then set an alarm.

B. Write a script that periodically checks the CloudFormation console for stack status and sends an email.

C. Create an Amazon EventBridge rule that matches CloudFormation events and triggers a Lambda function to send an SNS notification.

D. Configure an SNS topic in the CloudFormation stack's notification options.

Answer: D

CloudFormation can directly send stack events to SNS topics, including failure notifications.

Why this answer

Option D is correct because CloudFormation natively supports specifying an SNS topic in the stack's notification options, which automatically sends notifications on stack events such as failures, without requiring any additional services or custom code. This is the simplest and most direct method to notify the operations team when a stack update fails.
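Because the stack's SNS topic receives every stack event, not only failures, a subscriber may want to pick out the failed updates. The sketch below assumes the notification body is a series of `Key='Value'` lines (such as `ResourceStatus='UPDATE_FAILED'`); that layout, the status set, and the function names are assumptions for illustration.

```python
# Stack-level statuses treated as a failed update in this sketch.
FAILURE_STATUSES = {
    "UPDATE_FAILED",
    "UPDATE_ROLLBACK_IN_PROGRESS",
    "UPDATE_ROLLBACK_COMPLETE",
    "UPDATE_ROLLBACK_FAILED",
}


def parse_stack_event(message: str) -> dict:
    """Parse Key='Value' lines from a notification body into a dictionary."""
    event = {}
    for line in message.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without a key/value separator
            event[key.strip()] = value.strip().strip("'")
    return event


def is_stack_update_failure(message: str) -> bool:
    """Return True when the event reports a failed update of the stack itself."""
    event = parse_stack_event(message)
    return (
        event.get("ResourceType") == "AWS::CloudFormation::Stack"
        and event.get("ResourceStatus") in FAILURE_STATUSES
    )


# Hypothetical notification body for illustration only.
SAMPLE = (
    "StackName='web-tier'\n"
    "ResourceType='AWS::CloudFormation::Stack'\n"
    "ResourceStatus='UPDATE_ROLLBACK_IN_PROGRESS'\n"
)
```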

Exam trap

The trap here is that candidates often over-engineer the solution by choosing EventBridge or CloudTrail-based approaches, overlooking CloudFormation's built-in SNS notification feature as the simplest and most direct option.

How to eliminate wrong answers

Option A is wrong because CloudTrail logs API calls but does not directly trigger notifications; creating a metric filter and alarm adds unnecessary complexity when a built-in notification mechanism exists. Option B is wrong because writing a script to poll the CloudFormation console is inefficient, introduces latency, and violates the principle of using event-driven notifications over polling. Option C is wrong because while EventBridge with Lambda and SNS can work, it is more complex than the native SNS integration and requires custom code, making it not the simplest solution.


14

MCQ · hard

A SysOps administrator creates this IAM policy for a monitoring application. The application needs to publish custom metrics to CloudWatch and retrieve information about EC2 instances and Auto Scaling groups. The application reports that it cannot list EC2 instances. What is the most likely reason?

A. The policy uses a wildcard for the ec2:Describe action; it should specify ec2:DescribeInstances.

B. The policy does not grant access to the us-east-1 region.

C. The policy is missing the cloudwatch:PutMetricData action for the monitoring application.

D. The policy should include ec2:DescribeTags to list instances.

Answer: A

The action 'ec2:Describe' without a wildcard does not match any action.
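The mismatch can be illustrated with a simplified model of how action patterns are compared. This is only a sketch, not IAM's evaluation engine: it assumes case-insensitive matching where `*` stands for any run of characters and `?` for a single character, and the function name is hypothetical.

```python
import re


def action_matches(pattern: str, action: str) -> bool:
    """Return True if an action name matches a policy action pattern."""
    # Translate wildcards to regex; escape every other character literally.
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, action, flags=re.IGNORECASE) is not None


# 'ec2:Describe' is a literal string, so it does not cover DescribeInstances.
literal_result = action_matches("ec2:Describe", "ec2:DescribeInstances")
# With a trailing wildcard, or the full action name, the call is covered.
wildcard_result = action_matches("ec2:Describe*", "ec2:DescribeInstances")
exact_result = action_matches("ec2:DescribeInstances", "ec2:DescribeInstances")
```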

Why this answer

The policy lists the action as 'ec2:Describe', with no trailing wildcard. IAM does support wildcards in action names, so 'ec2:Describe*' would match every EC2 Describe action, but the literal string 'ec2:Describe' is not the name of any EC2 action and therefore matches nothing. The policy grants no EC2 Describe permissions, so the application fails when it tries to list instances. Specifying 'ec2:DescribeInstances' fixes the problem.

Exam trap

The trap here is that candidates read 'ec2:Describe' as if it were the wildcard 'ec2:Describe*'. IAM matches action names literally unless a wildcard character is present, so without the '*' the policy effectively grants no EC2 Describe permissions.

How to eliminate wrong answers

Option B is wrong because the policy does not specify a region restriction, so it implicitly applies to all regions; the error is not due to region access. Option C is wrong because the application reports it cannot list EC2 instances, not that it cannot publish metrics; missing cloudwatch:PutMetricData would cause a different failure. Option D is wrong because ec2:DescribeTags is not required to list instances; the core permission needed is ec2:DescribeInstances, and the malformed action name is the root cause.


15

Multi-Select · easy

A SysOps administrator wants to monitor the CPU utilization of an Amazon RDS instance and receive an alert if it exceeds 90% for 5 consecutive minutes. Which TWO AWS services are required to set up this monitoring? (Choose TWO.)

Select 2 answers

A. Amazon Simple Notification Service (SNS)

B. AWS Config

C. Amazon RDS Enhanced Monitoring

D. Amazon CloudWatch Alarms

E. Amazon CloudWatch

Answers: D, E

CloudWatch Alarms can monitor the metric and trigger an alert.

Why this answer

Amazon CloudWatch is the service that collects and stores metrics such as CPU utilization from RDS instances. Amazon CloudWatch Alarms allow you to set a threshold (e.g., CPU > 90%) and evaluate it over a specified period (e.g., 5 consecutive minutes) to trigger an action, such as sending a notification via SNS.
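The "90% for 5 consecutive minutes" condition corresponds to an alarm with a 60-second period that must breach in 5 consecutive evaluation periods. The function below is a simplified model of that evaluation over one-minute datapoints; it ignores missing-data handling, and its name and defaults are assumptions for this sketch.

```python
def breaches_for_consecutive_periods(
    datapoints: list[float], threshold: float = 90.0, periods: int = 5
) -> bool:
    """Return True once `periods` consecutive datapoints exceed the threshold."""
    streak = 0
    for value in datapoints:
        # A datapoint at or below the threshold resets the streak.
        streak = streak + 1 if value > threshold else 0
        if streak >= periods:
            return True
    return False


# Hypothetical one-minute CPUUtilization averages for illustration only.
sustained = [95.0, 96.0, 97.0, 94.0, 93.0]
interrupted = [95.0, 96.0, 80.0, 94.0, 93.0]
```

The interrupted series never alarms, because the dip to 80% restarts the count.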

Exam trap

The trap here is that candidates often confuse Enhanced Monitoring (which provides OS-level metrics) with the standard CloudWatch metrics (which already include CPU utilization), leading them to incorrectly select Enhanced Monitoring as a required service.


16

MCQ · medium

A SysOps administrator wants to be alerted when an EC2 instance's status check fails. The instance is part of an Auto Scaling group. What is the BEST approach?

A. Use Amazon EventBridge to detect status check failures.

B. Create a CloudWatch alarm on the 'StatusCheckFailed' metric.

C. Enable CloudTrail to monitor EC2 instance status changes.

D. Configure an Auto Scaling lifecycle hook to send a notification.

Answer: B

CloudWatch has built-in metrics for status checks.

Why this answer

Option B is correct because the 'StatusCheckFailed' metric is automatically published by EC2 to CloudWatch, and a CloudWatch alarm on this metric can directly trigger an SNS notification or other action when the status check fails. This is the simplest and most reliable method for alerting on instance health, regardless of whether the instance is in an Auto Scaling group.
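One possible set of alarm parameters is sketched below. The keys are parameter names of the CloudWatch `PutMetricAlarm` API, so the dictionary could be passed to a client call such as boto3's `put_metric_alarm(**params)`; the function name, alarm name, instance ID, topic ARN, and the choice of two one-minute evaluation periods are assumptions.

```python
def status_check_alarm_params(instance_id: str, topic_arn: str) -> dict:
    """Build alarm parameters that notify when an instance fails a status check."""
    return {
        "AlarmName": f"status-check-failed-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",  # the metric is 0 (passed) or 1 (failed)
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # SNS topic that delivers the alert
    }


# Hypothetical identifiers for illustration only.
params = status_check_alarm_params(
    "i-0123456789abcdef0", "arn:aws:sns:us-east-1:111122223333:ops-alerts"
)
```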

Exam trap

The trap here is that candidates often confuse CloudTrail (API logging) with CloudWatch (metrics and alarms), or assume EventBridge is the best choice for all event-driven monitoring, when in fact CloudWatch alarms on the native 'StatusCheckFailed' metric are the simplest and most direct solution for status check alerts.

How to eliminate wrong answers

Option A is wrong because EventBridge's EC2 instance state-change events cover lifecycle transitions such as pending, running, stopping, and terminated, not status check results; alerting on status checks through EventBridge would need additional plumbing and is less direct than a CloudWatch alarm on the 'StatusCheckFailed' metric. Option C is wrong because CloudTrail records API calls (e.g., StartInstances, StopInstances), not status check results, so it cannot detect status check failures. Option D is wrong because Auto Scaling lifecycle hooks are designed for custom actions during instance launch or termination, not for monitoring ongoing instance health or status check failures.


17

MCQ · hard

A company uses AWS Organizations to manage multiple accounts. The security team needs a centralized view of all API calls made across all accounts. Which solution should the SysOps administrator implement?

A. Use AWS Config aggregator to view configuration changes across accounts.

B. Create a CloudTrail trail in the management account that logs events for all accounts in the organization.

C. Use CloudWatch cross-account dashboards to view metrics from all accounts.

D. Enable CloudTrail in each account and have each account send logs to its own S3 bucket.

Answer: B

Organization trails centralize logging across accounts.

Why this answer

Option B is correct because AWS CloudTrail supports an organization trail that, when created in the management account, automatically logs API calls for all member accounts in the AWS Organization. This provides a centralized, single point of access to all API activity across the organization without needing to configure individual trails per account.

Exam trap

The trap here is that candidates may confuse AWS Config (which tracks configuration changes) with CloudTrail (which tracks API calls), or assume that individual account trails are sufficient for a centralized view, overlooking the simplicity and automatic coverage of an organization trail.

How to eliminate wrong answers

Option A is wrong because AWS Config aggregator provides a centralized view of resource configuration changes and compliance status, not API calls (which are logged by CloudTrail). Option C is wrong because CloudWatch cross-account dashboards aggregate metrics (e.g., CPU utilization, latency), not API call logs. Option D is wrong because sending logs to separate S3 buckets in each account does not provide a centralized view; it requires aggregating logs manually or using additional services like S3 replication or Athena, which is less efficient than an organization trail.


18

MCQ · medium

A SysOps administrator is troubleshooting an EC2 instance that is unresponsive. The administrator can SSH into the instance but finds that the CloudWatch agent is not sending custom metrics. The CloudWatch agent configuration file is at '/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json'. What should the administrator check first?

A.Verify that the IAM role attached to the EC2 instance has the CloudWatchAgentServerPolicy.

B.Ensure that the IAM user has permissions to access CloudWatch.

C.Check if the security group allows outbound traffic on port 443.

D.Run 'sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a status' to check the agent status.

AnswerA

The IAM role must have permissions to put metrics.

Why this answer

The correct first check is to verify the IAM role attached to the EC2 instance has the CloudWatchAgentServerPolicy. The CloudWatch agent uses the instance's IAM role to obtain credentials for publishing metrics to CloudWatch. Without this policy, the agent will fail to send custom metrics even if it is running correctly and the instance has network connectivity.

Exam trap

The trap here is that candidates often jump to checking network connectivity (security group rules) or agent status first, overlooking that the IAM role permission is the most common root cause for a CloudWatch agent that is installed and running but not sending metrics.

How to eliminate wrong answers

Option B is wrong because the IAM user's permissions are irrelevant; the agent on an EC2 instance obtains its credentials from the instance's IAM role, not from a user. Option C is wrong because, although the agent does need outbound HTTPS (port 443) to reach the CloudWatch endpoints, security groups allow all outbound traffic by default, so missing IAM permissions are a more common cause of failure than blocked network access. Option D is wrong because checking the agent status is a valid troubleshooting step, but the question asks what to check first; verifying IAM permissions is the more fundamental prerequisite before investigating agent runtime issues.

19

MCQmedium

A SysOps administrator needs to monitor the CPU utilization of an Amazon EC2 instance and receive an email notification when the metric exceeds 90% for 5 consecutive minutes. The solution should use the least operational overhead. Which combination of AWS services should be used?

A.Create a CloudWatch alarm on the CPUUtilization metric and configure the alarm to send a notification to an Amazon SNS topic with email subscriptions.

B.Create an Amazon EventBridge rule that triggers an AWS Lambda function to check the CPUUtilization metric and send an email via Amazon SES.

C.Configure the EC2 instance to publish CPU logs to Amazon CloudWatch Logs, then create a metric filter to detect high CPU and trigger an SNS notification.

D.Use AWS CloudTrail to monitor EC2 CPU metrics and send notifications to an Amazon SQS queue.

AnswerA

This is the simplest approach. CloudWatch natively monitors EC2 metrics and can trigger SNS notifications without any custom code.

Why this answer

Option A is correct because a CloudWatch alarm directly monitors the CPUUtilization metric for an EC2 instance and can be configured to evaluate whether the metric exceeds 90% for 5 consecutive minutes (e.g., 5 evaluation periods of 1 minute each, which assumes detailed monitoring, since basic monitoring publishes EC2 metrics at 5-minute intervals). The alarm then publishes to an Amazon SNS topic, which sends email notifications to subscribed endpoints, requiring no additional infrastructure or code, thus minimizing operational overhead.
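
To make the alarm settings concrete, here is a minimal sketch of the arguments one might hand to boto3's CloudWatch `put_metric_alarm` call. The instance ID and topic ARN are invented for illustration, and the 60-second period assumes detailed monitoring is enabled on the instance.

```python
def cpu_alarm_params(instance_id, topic_arn, threshold=90.0, minutes=5):
    """Build keyword arguments for CloudWatch's put_metric_alarm API."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,                   # seconds per datapoint
        "EvaluationPeriods": minutes,   # look at the last N datapoints
        "DatapointsToAlarm": minutes,   # all N must breach the threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],    # SNS topic with email subscriptions
    }


# Hypothetical identifiers, for illustration only.
alarm = cpu_alarm_params(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
# With credentials configured:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```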

Exam trap

The trap here is that candidates may overcomplicate the solution by introducing Lambda or log-based filters, when the simplest and most direct path—a CloudWatch alarm on the existing CPUUtilization metric with an SNS action—is the correct answer for minimal operational overhead.

How to eliminate wrong answers

Option B is wrong because it introduces unnecessary complexity by using an EventBridge rule and a Lambda function to poll or process metrics, which increases operational overhead and latency compared to a native CloudWatch alarm. Option C is wrong because publishing CPU logs to CloudWatch Logs and creating a metric filter is designed for log-based metrics (e.g., parsing log entries), not for the native CPUUtilization metric, which is already available as a CloudWatch metric without logs. Option D is wrong because AWS CloudTrail records API calls and management events, not EC2 CPU utilization metrics, and cannot monitor or trigger notifications based on performance metrics.

20

MCQhard

A company runs a web application on Amazon EC2 instances behind an Application Load Balancer (ALB). The SysOps administrator notices that the application's response time is increasing during peak hours. The administrator wants to set up a CloudWatch dashboard that displays the average latency of requests across all instances and the number of healthy hosts. Which metrics should be used?

A.Use the ALB's 'TargetResponseTime' metric and the ALB's 'UnHealthyHostCount' metric.

B.Use the ALB's 'TargetResponseTime' metric and the ALB's 'HealthyHostCount' metric.

C.Use the ALB's 'RequestCount' metric and the EC2 Auto Scaling group's 'GroupInServiceInstances' metric.

D.Use the ALB's 'Latency' metric and the EC2 instance's 'CPUUtilization' metric.

AnswerB

These are the correct ALB metrics for latency and healthy host count.

Why this answer

Option B is correct because the ALB's 'TargetResponseTime' metric measures the time, in seconds, from when a request leaves the load balancer until the target starts to send a response, so its Average statistic directly reflects application latency across all targets. The ALB's 'HealthyHostCount' metric shows the number of healthy registered targets, which is the exact metric needed to monitor host health. Together, these two metrics provide the required visibility into average latency and healthy host count across all instances.
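
One way to picture the resulting dashboard is as the JSON body accepted by CloudWatch's `put_dashboard` API. The sketch below builds such a body with two metric widgets; the region, load balancer, and target group identifiers are placeholders, and the layout values are arbitrary.

```python
import json


def alb_dashboard_body(region, load_balancer, target_group):
    """Return a dashboard body (JSON string) with latency and healthy-host widgets."""
    latency_widget = {
        "type": "metric", "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            "title": "Average target response time",
            "region": region, "stat": "Average", "period": 60,
            "metrics": [["AWS/ApplicationELB", "TargetResponseTime",
                         "LoadBalancer", load_balancer]],
        },
    }
    healthy_widget = {
        "type": "metric", "x": 12, "y": 0, "width": 12, "height": 6,
        "properties": {
            "title": "Healthy hosts",
            "region": region, "stat": "Minimum", "period": 60,
            "metrics": [["AWS/ApplicationELB", "HealthyHostCount",
                         "TargetGroup", target_group,
                         "LoadBalancer", load_balancer]],
        },
    }
    return json.dumps({"widgets": [latency_widget, healthy_widget]})


# Placeholder dimension values, for illustration only.
body = alb_dashboard_body(
    "us-east-1",
    "app/my-alb/0123456789abcdef",
    "targetgroup/my-targets/0123456789abcdef",
)
# boto3.client("cloudwatch").put_dashboard(DashboardName="alb-health", DashboardBody=body)
```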

Exam trap

The trap here is that candidates confuse 'UnHealthyHostCount' with 'HealthyHostCount' or mistakenly use instance-level metrics (like CPUUtilization) instead of ALB-level metrics, failing to recognize that the ALB's own metrics are the authoritative source for request latency and target health.

How to eliminate wrong answers

Option A is wrong because 'UnHealthyHostCount' tracks unhealthy hosts, not healthy hosts; the question specifically asks for the number of healthy hosts. Option C is wrong because 'RequestCount' measures total requests, not latency, and 'GroupInServiceInstances' is an Auto Scaling group metric, not an ALB metric; the ALB's 'HealthyHostCount' is the correct source for healthy host count. Option D is wrong because 'Latency' is a Classic Load Balancer metric rather than an ALB metric (the ALB equivalent is 'TargetResponseTime'), and 'CPUUtilization' measures instance CPU usage, not host health or latency.

21

MCQmedium

An application running on Amazon ECS with Fargate is experiencing intermittent failures. The application logs show connection timeouts to an RDS MySQL database. The database is in the same VPC but a different subnet. Which CloudWatch metric should be examined first to diagnose the issue?

A.DatabaseConnections for the RDS instance.

B.MemoryUtilization for the RDS instance.

C.CPUUtilization for the ECS tasks.

D.VPC Flow Logs for the ECS task's elastic network interface.

AnswerD

Flow Logs can show if packets are being rejected or dropped, indicating network ACL or security group issues.

Why this answer

Option D is correct because VPC Flow Logs capture metadata about network traffic at the elastic network interface level, including whether packets were accepted or rejected. Since the application logs show connection timeouts (not authentication or resource exhaustion), the most likely cause is a network path issue, such as a network ACL or security group rule blocking traffic between the ECS task's subnet and the RDS subnet. VPC Flow Logs will reveal if packets from the ECS task to the RDS database are being rejected, allowing you to pinpoint the exact network failure.
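
As an illustration of what to look for, the sketch below parses records in the default flow log format and picks out rejected traffic to the MySQL port. The two sample records are invented, and which interface actually records the REJECT depends on where the traffic is blocked.

```python
# Field order of the default (version 2) flow log format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_flow_record(line):
    """Split one default-format flow log record into a dict of strings."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def rejected_to_port(lines, port):
    """Return parsed records whose action is REJECT and destination port matches."""
    port = str(port)
    records = (parse_flow_record(line) for line in lines)
    return [r for r in records if r["action"] == "REJECT" and r["dstport"] == port]


# Invented sample records, for illustration only.
sample = [
    "2 123456789012 eni-0abc1234 10.0.1.25 10.0.2.40 49152 3306 6 3 180 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0abc1234 10.0.1.25 10.0.3.10 49153 443 6 10 5200 1700000000 1700000060 ACCEPT OK",
]
blocked = rejected_to_port(sample, 3306)
for record in blocked:
    print(record["srcaddr"], "->", record["dstaddr"], record["action"])
```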

Exam trap

The trap here is that candidates often jump to RDS metrics (DatabaseConnections) or ECS metrics (CPUUtilization) because they seem directly related to the database or application, but the symptom of 'connection timeouts' specifically points to a network-layer issue, which VPC Flow Logs are designed to diagnose.

How to eliminate wrong answers

Option A is wrong because DatabaseConnections measures the number of client connections to the RDS instance, which would not indicate connection timeouts caused by network-layer blocking; a high connection count might cause new connection rejections, but timeouts suggest packets never reached the database. Option B is wrong because MemoryUtilization for RDS reflects the database's memory pressure, which could cause slow queries or crashes but not TCP-level connection timeouts from a different subnet. Option C is wrong because CPUUtilization for ECS tasks measures compute resource usage of the application containers, not network connectivity; high CPU could cause application slowness but not specific connection timeouts to a database in a different subnet.

22

MCQeasy

A SysOps administrator needs to monitor the CPU utilization of an EC2 instance and receive an alert when it exceeds 80% for 10 consecutive minutes. Which AWS service should be used to configure this monitoring and alerting?

A.Amazon EventBridge

B.Amazon CloudWatch Alarms

C.AWS Trusted Advisor

D.AWS Config

AnswerB

CloudWatch Alarms monitor metrics and trigger actions based on threshold breaches.

Why this answer

Amazon CloudWatch Alarms is the correct service because it allows you to monitor a specific metric, such as EC2 CPUUtilization, and trigger an action (e.g., an SNS notification) when the metric crosses a defined threshold (80%) for a specified number of consecutive evaluation periods. Ten minutes corresponds to 10 datapoints at a 1-minute period (available with detailed monitoring) or 2 datapoints at the 5-minute period of basic monitoring. This directly fulfills the requirement for threshold-based alerting on a single metric over a sustained duration.

Exam trap

The trap here is that candidates confuse Amazon EventBridge (which can trigger actions based on events but cannot natively evaluate sustained metric thresholds) with CloudWatch Alarms, or mistakenly think AWS Config or Trusted Advisor can monitor real-time performance metrics, when they are designed for configuration compliance and best-practice recommendations respectively.

How to eliminate wrong answers

Option A is wrong because Amazon EventBridge is a serverless event bus used to route events from sources (e.g., AWS services, custom apps) to targets (e.g., Lambda, Step Functions), but it does not natively evaluate metric thresholds over time or generate alarms based on sustained CPU utilization. Option C is wrong because AWS Trusted Advisor provides best-practice checks and recommendations (e.g., underutilized instances, security gaps) but does not perform real-time metric monitoring or alerting on CPU utilization thresholds. Option D is wrong because AWS Config is a service for recording and evaluating resource configuration changes against rules (e.g., ensuring EBS volumes are encrypted), not for monitoring performance metrics like CPU utilization or generating threshold-based alerts.

23

MCQmedium

An application running on EC2 instances sends custom metrics to CloudWatch using the PutMetricData API. The SysOps admin notices that some metrics are missing from the CloudWatch console. What is the most likely cause?

A.The metric data does not include a unit

B.The metric data does not include a dimension

C.The metric data is being sent with a timestamp older than 14 days

D.The namespace in the PutMetricData call does not match the namespace in the CloudWatch console

AnswerD

Metrics are organized by namespace; a mismatch hides the data.

Why this answer

Option D is correct because custom metrics in CloudWatch are uniquely identified by the combination of namespace, metric name, and dimensions. If the namespace used in the PutMetricData API call does not match the namespace being viewed in the CloudWatch console, the metrics will not appear under that namespace. CloudWatch does not automatically merge or alias namespaces, so mismatched namespaces cause the data to be stored under a different namespace, making it invisible in the console view.
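
The identity rule can be sketched in a few lines. The helpers below build arguments of the shape accepted by boto3's `put_metric_data` and compute the (namespace, metric name, dimensions) key that distinguishes one metric from another; the namespaces and metric name are made up for the example.

```python
def metric_datum(name, value, dimensions=None, unit=None):
    """Build one MetricData entry; dimensions and unit are optional."""
    datum = {"MetricName": name, "Value": value}
    if dimensions:
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in dimensions.items()
        ]
    if unit:
        datum["Unit"] = unit
    return datum


def put_metric_data_params(namespace, data):
    """Build keyword arguments for CloudWatch's put_metric_data API."""
    return {"Namespace": namespace, "MetricData": list(data)}


def metric_identity(namespace, datum):
    """Return the tuple that identifies a metric: namespace, name, dimensions."""
    dims = tuple(sorted((d["Name"], d["Value"]) for d in datum.get("Dimensions", [])))
    return (namespace, datum["MetricName"], dims)


# Hypothetical namespaces: the application publishes to one,
# while the operator browses the other in the console.
datum = metric_datum("OrdersProcessed", 42, {"Service": "checkout"}, unit="Count")
published = put_metric_data_params("MyApp/Orders", [datum])
same_metric = metric_identity(published["Namespace"], datum) == metric_identity("MyApp", datum)
print(same_metric)  # prints False: a different namespace means a different metric
```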

Exam trap

The trap here is that candidates often assume missing metrics are due to timestamp or dimension issues, but the most common real-world cause is a namespace mismatch between the PutMetricData call and the console filter, which CloudWatch does not automatically reconcile.

How to eliminate wrong answers

Option A is wrong because the unit field in PutMetricData is optional; CloudWatch accepts metric data without a unit and displays it without a unit label. Option B is wrong because dimensions are optional for custom metrics; while dimensions help organize metrics, a metric without dimensions is still valid and will appear under the specified namespace. Option C is wrong because PutMetricData accepts timestamps up to two weeks in the past, so only data points older than that would be affected, and an application publishing live metrics is unlikely to send timestamps that old.

24

MCQeasy

A company wants to monitor the number of messages in an Amazon SQS queue and scale the number of EC2 instance consumers based on queue depth. Which combination of AWS services should be used?

A.Amazon CloudWatch and Amazon EC2 Auto Scaling

B.Amazon Elastic Load Balancing and Amazon EC2 Auto Scaling

C.Amazon CloudWatch and AWS Lambda

D.AWS CloudTrail and Amazon EventBridge

AnswerA

CloudWatch monitors queue depth; Auto Scaling adjusts instance count based on alarms.

Why this answer

Amazon CloudWatch monitors the SQS queue depth (ApproximateNumberOfMessagesVisible metric) and triggers an Amazon EC2 Auto Scaling scaling policy based on a CloudWatch alarm. This allows the number of EC2 consumer instances to dynamically scale in or out in response to the queue depth, ensuring efficient processing without over-provisioning.

Exam trap

The trap here is that candidates often confuse Elastic Load Balancing with queue-based scaling, assuming ELB can scale EC2 instances based on SQS depth, but ELB only distributes incoming client traffic across targets and cannot read SQS metrics.

How to eliminate wrong answers

Option B is wrong because Elastic Load Balancing distributes incoming traffic to EC2 instances but does not monitor SQS queue depth or trigger scaling actions; it is not designed for queue-based scaling. Option C is wrong because while AWS Lambda can process SQS messages, it is a serverless compute service and does not manage EC2 instance scaling; using Lambda alone would not scale EC2 instances. Option D is wrong because AWS CloudTrail records API activity for auditing, and Amazon EventBridge routes events between services, but neither directly monitors SQS queue depth nor triggers EC2 Auto Scaling adjustments.

25

MCQhard

A company has a production environment with multiple EC2 instances that send logs to CloudWatch Logs. The operations team wants to search across all log groups for a specific error pattern. What is the most efficient way to achieve this?

A.Use CloudWatch Logs Insights to query across all log groups.

B.Set up a subscription filter to stream logs to an Amazon ES domain.

C.Use CloudWatch Logs filter patterns on each log group.

D.Download all logs to an S3 bucket and use Amazon Athena to query.

AnswerA

CloudWatch Logs Insights can query multiple log groups with a single query.

Why this answer

CloudWatch Logs Insights allows you to run a single query, written in its purpose-built query language, across multiple log groups at once, making it the most efficient way to search for a specific error pattern across all log groups without needing to set up additional infrastructure or manually query each group individually.
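
A minimal sketch of such a query, expressed as arguments for boto3's CloudWatch Logs `start_query` call, might look like the following. The log group names, the error pattern, and the time range are assumptions for illustration, and the pattern is assumed not to contain a slash.

```python
def error_search_params(log_groups, pattern, start_epoch, end_epoch, limit=50):
    """Build keyword arguments for CloudWatch Logs' start_query API."""
    query = (
        "fields @timestamp, @log, @message "
        f"| filter @message like /{pattern}/ "
        "| sort @timestamp desc "
        f"| limit {limit}"
    )
    return {
        "logGroupNames": list(log_groups),
        "startTime": start_epoch,   # seconds since the Unix epoch
        "endTime": end_epoch,
        "queryString": query,
    }


# Hypothetical log groups and time range.
params = error_search_params(
    ["/app/web", "/app/worker"], "TimeoutError", 1700000000, 1700003600
)
print(params["queryString"])
# With credentials configured:
#   query_id = boto3.client("logs").start_query(**params)["queryId"]
#   results = boto3.client("logs").get_query_results(queryId=query_id)
```

Queries run asynchronously, so a real script would poll `get_query_results` until the reported status is no longer running.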

Exam trap

The trap here is that candidates may overcomplicate the solution by choosing a more complex architecture (like streaming to Elasticsearch or using Athena) when CloudWatch Logs Insights provides a native, serverless, and efficient way to query across multiple log groups directly.

How to eliminate wrong answers

Option B is wrong because setting up a subscription filter to stream logs to an Amazon ES (now Amazon OpenSearch Service) domain adds unnecessary complexity, latency, and cost; it requires provisioning and managing a search cluster, which is overkill for a simple cross-log-group search. Option C is wrong because CloudWatch Logs filter patterns operate on a single log group at a time, so you would need to configure and run separate searches for each log group, which is inefficient and not scalable for searching across all log groups. Option D is wrong because downloading all logs to an S3 bucket and using Amazon Athena introduces significant overhead, including export delays, storage costs, and the need to define a schema; it is not the most efficient approach for real-time or ad-hoc searching across log groups.

26

Multi-Selecthard

A company uses CloudWatch Logs to store application logs. The logs must be retained for 3 years for compliance. Which TWO steps should be taken to achieve this? (Choose TWO.)

Select 2 answers

A.Configure an S3 lifecycle policy to transition objects to Glacier after 3 years.

B.Enable CloudWatch Logs Insights.

C.Set the log group retention period to 3 years.

D.Use CloudWatch Logs subscription filter to stream logs to Amazon Kinesis Firehose.

E.Export logs to an Amazon S3 bucket using CloudWatch Logs export tasks.

AnswersA, E

Lifecycle policies manage long-term retention.

Why this answer

Option A is correct because S3 lifecycle policies can transition objects to Glacier after a specified number of days, which allows long-term archival storage at low cost. CloudWatch Logs can itself retain data for 3 years (retention can be set from 1 day up to 10 years, or to never expire), but exporting to S3 and then applying a lifecycle policy is a valid method to ensure compliance with the 3-year retention requirement.

Option E is correct because CloudWatch Logs export tasks can export log data to an S3 bucket, where it can be stored indefinitely and managed with lifecycle policies for long-term retention.

Exam trap

The trap here is that candidates might think setting the log group retention period to 3 years is sufficient, but that actually causes logs to be deleted after 3 years, whereas the requirement is to retain them for 3 years, so exporting to S3 and using lifecycle policies is the correct approach to ensure logs are available for the full compliance period.

27

MCQeasy

A SysOps administrator manages an Application Load Balancer (ALB) that distributes traffic to an Auto Scaling group of EC2 instances. The administrator needs to receive a notification whenever the number of unhealthy targets in the ALB target group exceeds a threshold of 2 for at least 5 consecutive minutes. Which solution meets this requirement with the least operational overhead?

A.Create a CloudWatch alarm on the 'UnHealthyHostCount' metric for the ALB target group, with a threshold of 2 and an evaluation period of 5 minutes. Configure the alarm to send an Amazon SNS notification.

B.Enable AWS CloudTrail logging for the ALB and create a CloudWatch metric filter for 'UnHealthyHostCount' events. Then create an alarm on that metric to notify via SNS.

C.Use an AWS Config rule to evaluate the health of the ALB target group and trigger an SNS notification when non-compliant.

D.Create an Amazon EventBridge rule that triggers every minute to call the AWS CLI command describe-target-health and send a notification via Lambda if unhealthy count exceeds 2.

AnswerA

CloudWatch automatically collects the 'UnHealthyHostCount' metric from the ALB. The alarm triggers when the metric exceeds the threshold, and SNS delivers notifications directly. This requires no custom code and is the simplest approach.

Why this answer

Option A is correct because CloudWatch natively publishes the 'UnHealthyHostCount' metric for ALB target groups, allowing a direct alarm with a threshold of 2 and an evaluation period of 5 minutes. This alarm can trigger an SNS notification with minimal configuration, requiring no custom scripts or additional services. The requirement for 'at least 5 consecutive minutes' is satisfied by setting the evaluation period to 5 minutes (or using 5 datapoints with a 1-minute period).

Exam trap

The trap here is that candidates may overcomplicate the solution by choosing custom polling (Option D) or misapplying services like CloudTrail or Config, failing to recognize that CloudWatch metrics already provide the exact functionality needed with zero additional code.

How to eliminate wrong answers

Option B is wrong because AWS CloudTrail logs API calls, not real-time metric data like 'UnHealthyHostCount'; creating a metric filter for 'UnHealthyHostCount' events is invalid as CloudTrail does not emit such events. Option C is wrong because AWS Config rules evaluate resource compliance against desired configurations (e.g., security groups, tags), not real-time health metrics like unhealthy host counts; Config cannot trigger based on dynamic metric thresholds. Option D is wrong because it introduces unnecessary operational overhead by requiring a custom Lambda function and EventBridge rule to poll the describe-target-health CLI command every minute, whereas CloudWatch provides a built-in, simpler solution.

28

MCQhard

A company uses AWS CloudFormation to deploy infrastructure. A SysOps admin wants to receive a notification when a stack update fails. Which approach is the most efficient?

A.Write a script that polls the CloudFormation API and sends notifications

B.Use AWS Config to monitor stack resources

C.Create an EventBridge rule that matches CloudFormation stack events

D.Enable CloudTrail and create a metric filter for stack update failures

AnswerC

EventBridge can filter on stack events like UPDATE_ROLLBACK_IN_PROGRESS.

Why this answer

Option C is correct because Amazon EventBridge can directly capture CloudFormation stack status changes (e.g., UPDATE_FAILED, UPDATE_ROLLBACK_IN_PROGRESS) in real time and trigger a notification via SNS or Lambda. This approach is serverless, requires no polling, and is the most efficient method for reacting to stack update failures as they occur.
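
For a sense of what such a rule matches, here is a sketch of an event pattern together with a toy matcher that handles only exact-value patterns (it is not a reimplementation of EventBridge's matching rules). The field layout follows the CloudFormation stack status change event as I understand it; the choice of statuses and the sample event are assumptions for illustration.

```python
import json

# Statuses chosen here to represent a failed update; adjust as needed.
FAILURE_STATUSES = [
    "UPDATE_FAILED",
    "UPDATE_ROLLBACK_IN_PROGRESS",
    "UPDATE_ROLLBACK_FAILED",
]

EVENT_PATTERN = {
    "source": ["aws.cloudformation"],
    "detail-type": ["CloudFormation Stack Status Change"],
    "detail": {"status-details": {"status": FAILURE_STATUSES}},
}


def matches(pattern, event):
    """Toy matcher: every pattern key must exist and hold one of the listed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Invented sample event, trimmed to the fields the pattern inspects.
sample_event = {
    "source": "aws.cloudformation",
    "detail-type": "CloudFormation Stack Status Change",
    "detail": {"status-details": {"status": "UPDATE_ROLLBACK_IN_PROGRESS"}},
}
print(matches(EVENT_PATTERN, sample_event))  # prints True
# The pattern would be supplied to EventBridge as a JSON string, e.g.
#   boto3.client("events").put_rule(Name="cfn-update-failed",
#                                   EventPattern=json.dumps(EVENT_PATTERN))
```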

Exam trap

The trap here is that candidates often overcomplicate the solution by choosing CloudTrail or polling, missing the fact that EventBridge provides native, real-time event capture for CloudFormation stack status changes without additional overhead.

How to eliminate wrong answers

Option A is wrong because polling the CloudFormation API introduces latency, consumes unnecessary compute resources, and is less efficient than an event-driven approach. Option B is wrong because AWS Config is designed to evaluate resource compliance against rules, not to monitor CloudFormation stack lifecycle events or send failure notifications. Option D is wrong because CloudTrail logs API calls, but creating a metric filter for stack update failures requires additional steps (e.g., setting up a CloudWatch alarm) and introduces delay compared to native EventBridge event matching.

29

MCQmedium

Refer to the exhibit. A SysOps administrator runs the command to find 'CreateKeyPair' events in January 2023 but gets an empty list. The administrator knows that key pairs were created during that time. What is the most likely reason?

A.The events occurred in a different AWS region.

B.The start and end times are outside the 90-day retention period.

C.CloudTrail is not enabled in the account.

D.The IAM user does not have 'cloudtrail:LookupEvents' permission.

AnswerA

The lookup is for us-east-1, but the trail might be single-region.

Why this answer

The `aws cloudtrail lookup-events` command returns events only from the region specified in the AWS CLI configuration (or the `--region` parameter). If the administrator did not specify a region, the command defaults to the region set in the CLI profile. Since `CreateKeyPair` events are regional (each key pair is created in a specific region), the empty result indicates the events occurred in a different AWS region than the one queried.
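
The regional behavior can be sketched as a loop that repeats the lookup once per region. The function below accepts any callable shaped like `boto3.client`; a small stand-in client is included so the example runs without AWS access, and its behavior (events only in eu-west-1) is invented. Only the first page of results per region is read.

```python
from datetime import datetime, timezone


def find_events(client_factory, regions, event_name, start, end):
    """Return {region: events} for each region whose event history has matches."""
    found = {}
    for region in regions:
        client = client_factory("cloudtrail", region_name=region)
        response = client.lookup_events(
            LookupAttributes=[
                {"AttributeKey": "EventName", "AttributeValue": event_name}
            ],
            StartTime=start,
            EndTime=end,
        )
        events = response.get("Events", [])
        if events:
            found[region] = events
    return found


class FakeCloudTrail:
    """Stand-in for a CloudTrail client so the sketch runs offline."""

    def __init__(self, region):
        self.region = region

    def lookup_events(self, LookupAttributes, StartTime, EndTime):
        # Pretend key pairs were only ever created in eu-west-1.
        if self.region == "eu-west-1":
            return {"Events": [{"EventName": LookupAttributes[0]["AttributeValue"]}]}
        return {"Events": []}


def fake_factory(service, region_name):
    return FakeCloudTrail(region_name)


result = find_events(
    fake_factory,
    ["us-east-1", "eu-west-1"],
    "CreateKeyPair",
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    datetime(2023, 2, 1, tzinfo=timezone.utc),
)
print(sorted(result))  # prints ['eu-west-1']
```

Against a real account, `boto3.client` could be passed in place of `fake_factory`.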

Exam trap

The trap here is that candidates assume CloudTrail events are globally visible by default, but in reality `lookup-events` is region-scoped: it returns only events recorded in the region the command targets, so `--region` (or the CLI profile's default region) must point at the region where the key pairs were created.

How to eliminate wrong answers

Option B is wrong because, although CloudTrail event history covers only the last 90 days, nothing in the scenario indicates that the lookup was run more than 90 days after January 2023; the question assumes the requested time range is still within the retention window. Option C is wrong because CloudTrail event history is enabled by default in all AWS accounts, and the `lookup-events` command works against it even when no trail has been created. Option D is wrong because if the IAM user lacked `cloudtrail:LookupEvents` permission, the command would return an access denied error, not an empty list.

30

MCQeasy

A company wants to visualize the geographic distribution of failed login attempts to their web application. The application runs on EC2 instances behind an ALB. They have access logs enabled for the ALB. Which service should be used to create the visualization?

A.Amazon CloudWatch Dashboard with a custom widget.

B.Amazon Kinesis Data Analytics with a Lambda function.

C.Amazon S3 Select with Athena.

D.Amazon QuickSight with S3 as a data source.

AnswerD

QuickSight can directly connect to ALB access logs in S3 and create geospatial charts.

Why this answer

Amazon QuickSight is a fully managed business intelligence service that can directly query ALB access logs stored in Amazon S3, enabling the creation of geospatial visualizations (e.g., heat maps) of failed login attempts. ALB access logs are delivered to S3 in a structured format, and QuickSight can parse and visualize this data without additional processing. This makes QuickSight with S3 as a data source the correct choice for building the required geographic visualization.

Exam trap

The trap here is that candidates confuse data querying (Athena) with data visualization (QuickSight), or assume CloudWatch can handle geospatial log analysis, when in fact QuickSight is the only option that natively provides interactive geospatial dashboards from S3 data.

How to eliminate wrong answers

Option A is wrong because CloudWatch Dashboards with custom widgets are designed for real-time metrics and logs, not for ad-hoc geospatial analysis of historical ALB access logs stored in S3; they lack native geospatial visualization capabilities. Option B is wrong because Kinesis Data Analytics is a real-time stream processing service, not suited for batch visualization of historical log data, and adding a Lambda function introduces unnecessary complexity and cost for a simple query-and-visualize task. Option C is wrong because S3 Select is a server-side filtering tool that returns only a subset of data from an object, not a visualization service; Athena can query the logs but does not create visualizations—it would require an additional BI tool to render the geographic map.

31

MCQmedium

A company runs a web application on Amazon EC2 instances. The application logs are sent to Amazon CloudWatch Logs. The SysOps administrator needs to monitor the logs for an increasing number of HTTP 500 errors. The administrator wants to create a metric filter that will count the number of lines containing 'HTTP 500' in the log group. Which syntax should the administrator use for the metric filter pattern?

A.[error, HTTP, 500]

B."HTTP 500"

C."HTTP" && "500"

D.[HTTP, 500, ...]

AnswerB

This is the correct syntax for matching the exact string 'HTTP 500' anywhere in the log line.

Why this answer

Option B is correct because CloudWatch Logs metric filter patterns use literal string matching by enclosing the exact text in double quotes. The pattern "HTTP 500" will match any log line that contains the exact substring 'HTTP 500', which is the simplest and most reliable way to count occurrences of HTTP 500 errors.
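
As a sketch of how the pattern fits into a metric filter, the helper below builds arguments of the shape accepted by boto3's CloudWatch Logs `put_metric_filter` call. The log group, filter name, metric name, and namespace are all placeholders.

```python
def http_500_filter_params(log_group, namespace="MyApp"):
    """Build keyword arguments for CloudWatch Logs' put_metric_filter API."""
    return {
        "logGroupName": log_group,
        "filterName": "http-500-count",
        # The double quotes are part of the pattern: they make it a phrase match.
        "filterPattern": '"HTTP 500"',
        "metricTransformations": [
            {
                "metricName": "Http500Count",
                "metricNamespace": namespace,
                "metricValue": "1",   # add 1 for each matching log event
                "defaultValue": 0,    # report 0 when nothing matches
            }
        ],
    }


# Hypothetical log group name.
params = http_500_filter_params("/app/web/access")
print(params["filterPattern"])  # prints "HTTP 500" including the quotes
# With credentials configured:
#   boto3.client("logs").put_metric_filter(**params)
```

An alarm on the resulting `Http500Count` metric would then cover the "increasing number of errors" part of the requirement.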

Exam trap

The trap here is that candidates confuse the space-delimited token pattern syntax (square brackets) with literal string matching, leading them to choose options like A or D that only match specific token positions rather than any occurrence of 'HTTP 500' in the log line.

How to eliminate wrong answers

Option A is wrong because square brackets define a space-delimited pattern: the comma-separated entries name the fields of a log line by position, so [error, HTTP, 500] describes lines made up of three space-separated fields rather than searching for the substring 'HTTP 500'. Option C is wrong because "HTTP" && "500" is not a valid way to combine quoted terms; the && operator is available only inside the conditions of JSON and space-delimited patterns, and even a pattern that required both terms would not require them to be adjacent. Option D is wrong because [HTTP, 500, ...] is likewise a positional field pattern, with the ellipsis standing for an unknown number of trailing fields, so it does not test for the phrase 'HTTP 500' anywhere in the line.

32

MCQmedium

A SysOps administrator is setting up monitoring for an application that runs on Amazon ECS with Fargate launch type. The application's performance degrades when memory utilization exceeds 80%. The administrator wants to receive a notification when memory usage approaches this threshold. What should the administrator do?

A.Install the CloudWatch agent on each Fargate task

B.Use Amazon CloudWatch Container Insights to view ReservedMemory metric

C.Enable Service Auto Scaling with a target tracking policy based on MemoryUtilization

D.Create a CloudWatch alarm on the ECS service's MemoryUtilization metric

AnswerD

ECS provides MemoryUtilization metric for services, and an alarm can trigger notifications.

Why this answer

Option D is correct because Amazon ECS services automatically publish a `MemoryUtilization` metric to CloudWatch for Fargate tasks. By creating a CloudWatch alarm on this metric with a threshold of 80%, the administrator can trigger an SNS notification when memory usage approaches the threshold, enabling proactive remediation before performance degrades.

Exam trap

The trap here is that candidates often assume they need to install an agent or use Container Insights to get memory metrics, but ECS Fargate automatically publishes `MemoryUtilization` and `CPUUtilization` metrics to CloudWatch without any extra setup.

How to eliminate wrong answers

Option A is wrong because the CloudWatch agent cannot be installed on Fargate tasks; Fargate is a serverless compute engine that does not allow direct access to the underlying host or installation of agents. Option B is wrong because Container Insights reports reserved memory as `MemoryReserved` (there is no `ReservedMemory` metric), and a reservation reflects what the task definition requests rather than what the application is using; the relevant metric is `MemoryUtilization`, which is available without Container Insights, and viewing a metric does not by itself send a notification. Option C is wrong because Service Auto Scaling with a target tracking policy based on `MemoryUtilization` would automatically adjust the number of tasks to maintain a target utilization, but the question asks for a notification when memory approaches 80%, not for automatic scaling.

33

MCQhard

A company stores sensitive data in an S3 bucket. The security team requires that any access to the bucket from outside the corporate network be logged and immediately alerted. Which solution meets these requirements?

A.Enable S3 server access logging and store logs in a separate bucket for later analysis.

B.Enable S3 server access logging, stream logs to CloudWatch Logs, create a metric filter for external IP addresses, and set a CloudWatch alarm.

C.Enable AWS CloudTrail and create a metric filter for 'GetObject' events, then set a CloudWatch alarm.

D.Use AWS Config rules to detect when the bucket policy is changed to allow public access.

AnswerB

This provides both logging and real-time alerting for external access.

Why this answer

Option B is correct because it combines S3 server access logging with CloudWatch Logs streaming, metric filtering for external IP addresses, and a CloudWatch alarm. This provides both logging and immediate alerting for any access from outside the corporate network, meeting the security team's requirements.

Exam trap

The trap here is that candidates often confuse AWS CloudTrail (which logs API calls) with S3 server access logs (which log object-level access), leading them to choose Option C without realizing CloudTrail requires explicit data event logging and does not provide the same granularity for immediate alerting on external IP access.

How to eliminate wrong answers

Option A is wrong because it only enables logging to a separate bucket for later analysis and does not provide the immediate alerting that is required. Option C is wrong because CloudTrail logs only management events (such as bucket configuration changes) by default, not S3 data events such as 'GetObject'. Even with data events enabled at additional cost, a metric filter on 'GetObject' alone would match every read regardless of where it came from, so it would not isolate access from outside the corporate network. Option D is wrong because AWS Config rules detect configuration changes (such as bucket policy modifications) but do not log or alert on actual access events from external IPs.
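
The "external IP" test at the heart of this scenario can be sketched in a few lines. The example below assumes access log entries have already been parsed into dictionaries; the `remote_ip` key and the corporate CIDR range are hypothetical, chosen from documentation-only address space.

```python
import ipaddress

# Hypothetical corporate range (a documentation-only address block).
CORPORATE_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_external(remote_ip, networks=CORPORATE_NETWORKS):
    """Return True when the address falls outside every corporate network."""
    addr = ipaddress.ip_address(remote_ip)
    return not any(addr in net for net in networks)


def external_requests(entries):
    """Keep only the parsed log entries whose requester IP is external."""
    return [entry for entry in entries if is_external(entry["remote_ip"])]


sample = [
    {"remote_ip": "203.0.113.25", "operation": "REST.GET.OBJECT"},
    {"remote_ip": "198.51.100.7", "operation": "REST.GET.OBJECT"},
]
flagged = external_requests(sample)
```

A metric filter applies the same idea declaratively: it matches log events whose requester address is not in the known ranges and increments a metric that the alarm watches.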

34

MCQmedium

A SysOps administrator notices that an Amazon EC2 instance's CPU utilization is consistently above 90% during business hours. The instance is part of an Auto Scaling group with a simple scaling policy based on average CPU utilization. However, the Auto Scaling group is not launching new instances. What is the most likely cause?

A.The scaling policy is in a cooldown period after a previous scaling activity.

B.The Auto Scaling group has a minimum size equal to the current number of instances.

C.The Auto Scaling group has a scheduled scaling action that is overriding the dynamic policy.

D.The instance is not healthy and is being terminated by the Auto Scaling group.

AnswerA

Cooldown prevents additional scaling actions until it expires.

Why this answer

The simple scaling policy in Auto Scaling has a cooldown period (default 300 seconds) that prevents the group from launching or terminating instances immediately after a previous scaling activity. If the policy triggered a scale-out event recently, the cooldown period is still active, so even though CPU utilization remains above 90%, no new instances are launched until the cooldown expires. This is the most likely cause because the cooldown is designed to stabilize metrics and avoid thrashing.

Exam trap

The trap here is that candidates often assume high CPU utilization always triggers a scale-out immediately, forgetting that simple scaling policies enforce a cooldown period that can delay subsequent scaling actions, even when the metric remains elevated.

How to eliminate wrong answers

Option B is wrong because if the minimum size equals the current number of instances, the Auto Scaling group would still launch new instances to meet the desired capacity set by the scaling policy; the minimum size only prevents scaling below that number, not above it. Option C is wrong because a scheduled scaling action overrides dynamic policies only at the scheduled time, but it does not block the dynamic policy from acting during business hours unless the scheduled action explicitly sets the desired capacity to a value that prevents scaling. Option D is wrong because an unhealthy instance is terminated and replaced by the Auto Scaling group, which would launch a new instance, not block scaling; the group would still respond to high CPU utilization with a new launch.
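
The cooldown behavior can be modeled with simple arithmetic. This is a simplified sketch, not the Auto Scaling implementation: times are plain seconds, and the function names are hypothetical.

```python
DEFAULT_COOLDOWN_SECONDS = 300  # default cooldown cited in the explanation


def cooldown_remaining(last_activity_end, now, cooldown=DEFAULT_COOLDOWN_SECONDS):
    """Seconds left before a simple scaling policy may act again."""
    elapsed = now - last_activity_end
    return max(0, cooldown - elapsed)


def can_scale_out(alarm_in_alarm, last_activity_end, now,
                  cooldown=DEFAULT_COOLDOWN_SECONDS):
    """A simple scaling policy acts only if the alarm fires and the cooldown is over."""
    return alarm_in_alarm and cooldown_remaining(last_activity_end, now, cooldown) == 0


# The previous scaling activity ended at t=1000 s; CPU is still above 90%.
blocked = can_scale_out(True, last_activity_end=1000, now=1120)
allowed = can_scale_out(True, last_activity_end=1000, now=1300)
```

At t=1120 the alarm is still in ALARM but 180 seconds of cooldown remain, so no instance is launched; at t=1300 the cooldown has expired and the policy can act.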

35

MCQmedium

A SysOps administrator notices that an EC2 instance's CPU utilization is consistently above 90% during business hours. The instance is part of an Auto Scaling group with a scaling policy based on average CPU utilization. Despite high utilization, no scaling events are triggered. What is the most likely cause?

A.The scaling policy has a cooldown period that is too long, preventing new scaling activities.

B.The instance type is not supported by the Auto Scaling group's launch configuration.

C.The CloudWatch alarm is in the ALARM state but the Auto Scaling group has a suspended process for Add instances.

D.The Auto Scaling group's health check type is set to ELB, causing the instance to be marked unhealthy.

AnswerA

A long cooldown period can delay or prevent scaling actions, even if the metric is high.

Why this answer

The most likely cause is that the scaling policy has a cooldown period that is too long. After a scaling activity completes, the Auto Scaling group enters a cooldown period that prevents additional scaling activities from being triggered until the cooldown expires. If the cooldown period is set too long (e.g., 600 seconds or more), the group will not launch new instances even if the CloudWatch alarm remains in ALARM state with high CPU utilization, because the scaling policy is blocked from executing.

Exam trap

The trap here is that candidates often assume a scaling policy will always trigger when the CloudWatch alarm is in ALARM state, overlooking the cooldown period as a deliberate throttling mechanism that can prevent scaling activities from being initiated.

How to eliminate wrong answers

Option B is wrong because if the instance type were not supported by the launch configuration, the instance would fail to launch or would be in an impaired state, but the existing instance would still be running and scaling events would still be triggered (though they might fail). Option C is wrong because if the Add instances process were suspended, the Auto Scaling group would not launch new instances at all, but the question states that no scaling events are triggered, which implies the scaling policy itself is not executing; a suspended process would still allow the CloudWatch alarm to trigger a scaling event (which would then be blocked), but the event would still appear in the scaling activity history. Option D is wrong because the health check type being set to ELB would cause the instance to be marked unhealthy only if the ELB health checks fail, but high CPU utilization alone does not cause an ELB health check failure; the instance would still be considered healthy and scaling events would still be triggered based on the CloudWatch alarm.

36

Drag & Dropmedium

Drag and drop the steps to troubleshoot an unhealthy target in an Application Load Balancer target group into the correct order.

Why this order

Troubleshooting starts with security group rules, then health check configuration, then instance and application status, then logs, and finally replacement if needed.

37

MCQhard

A company has a multi-account AWS environment using AWS Organizations. The security team requires that all API calls made in any account be logged to a centralized S3 bucket in the management account. Additionally, the team wants to be alerted when an IAM user in any account creates a new access key. Currently, CloudTrail is enabled in each account but logs are stored locally. Which solution meets these requirements with the least operational overhead?

A.Use CloudWatch Logs cross-account subscription to send logs from each account's CloudTrail log group to a central account. Create a metric filter and alarm on CreateAccessKey events.

B.Create a CloudTrail trail in each account that sends logs to a centralized S3 bucket in the management account. Create an SNS topic in each account to alert on CreateAccessKey.

C.Enable CloudTrail in all accounts and use an AWS Lambda function to copy logs from each account's S3 bucket to the centralized bucket. Create a CloudWatch Events rule in each account to send alerts on CreateAccessKey.

D.Create a new CloudTrail organization trail in the management account that logs events to a centralized S3 bucket. Then create an Amazon EventBridge rule in the management account that matches the CreateAccessKey event and sends a notification via SNS.

AnswerD

Organization trails simplify logging across accounts; EventBridge can capture events from all accounts.

Why this answer

Option D is correct because an organization trail logs API calls from all accounts to a single S3 bucket, and an EventBridge rule can alert on CreateAccessKey events across accounts. Option A is wrong because it relies on CloudWatch Logs cross-account subscriptions, which are more complex to set up and maintain. Option B is wrong because it requires a separate trail and a separate SNS topic in each account. Option C is wrong because it requires a Lambda function to copy logs and a rule in each account, adding complexity.
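
To make the alert condition concrete, the sketch below scans the contents of a CloudTrail log file for `CreateAccessKey` calls. It runs offline against sample data; the function name and the sample record are hypothetical, while the record fields (`eventSource`, `eventName`, `recipientAccountId`, `userIdentity`) follow the CloudTrail record format.

```python
import json


def access_key_creations(log_file_text):
    """Return one summary per IAM CreateAccessKey call in a CloudTrail log file."""
    records = json.loads(log_file_text).get("Records", [])
    return [
        {
            "account": record.get("recipientAccountId"),
            "user": record.get("userIdentity", {}).get("userName"),
            "time": record.get("eventTime"),
        }
        for record in records
        if record.get("eventSource") == "iam.amazonaws.com"
        and record.get("eventName") == "CreateAccessKey"
    ]


# Hypothetical log content with one matching and one non-matching record.
sample_log = json.dumps({"Records": [
    {
        "eventSource": "iam.amazonaws.com",
        "eventName": "CreateAccessKey",
        "eventTime": "2024-01-01T12:00:00Z",
        "recipientAccountId": "111122223333",
        "userIdentity": {"type": "IAMUser", "userName": "alice"},
    },
    {
        "eventSource": "s3.amazonaws.com",
        "eventName": "CreateBucket",
        "eventTime": "2024-01-01T12:05:00Z",
        "recipientAccountId": "111122223333",
        "userIdentity": {"type": "IAMUser", "userName": "bob"},
    },
]})
hits = access_key_creations(sample_log)
```

An EventBridge rule expresses the same condition as an event pattern on the event source and event name, so no code needs to run against the log files themselves.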

38

Multi-Selectmedium

A company wants to automatically remediate an Amazon EC2 instance that becomes unresponsive by rebooting it. The solution should use AWS managed services to minimize custom code. Which combination should a SysOps administrator use? (Choose TWO.)

Select 2 answers

A.Amazon EC2 Auto Scaling and lifecycle hooks

B.Amazon CloudWatch alarm on EC2 status check failures

C.Amazon CloudWatch alarm and AWS Lambda function

D.Amazon EventBridge rule to trigger an SNS notification

E.AWS Systems Manager Automation document to reboot the instance

AnswersB, E

Status checks detect unresponsive instances.

Why this answer

Options B and E work together. Option B is correct because Amazon CloudWatch can monitor EC2 status check failures (both system and instance checks) and raise an alarm when an instance stops responding. Option E is correct because, once the alarm enters the ALARM state, that state change can be used to start an AWS Systems Manager Automation document that reboots the instance, which is a managed remediation approach that requires no custom code. This combination minimizes custom code by using built-in AWS services.

Exam trap

The trap here is that candidates often choose Option C (CloudWatch alarm + Lambda) because it is a common pattern, but the question explicitly requires minimizing custom code, making the managed Systems Manager Automation document the correct choice over a custom Lambda function.

39

Multi-Selecteasy

A company uses CloudWatch Logs to monitor application logs. The SysOps administrator wants to search for specific error patterns across multiple log groups. Which THREE AWS services can be used to achieve this?

Select 3 answers

A.CloudWatch Logs Insights

B.Amazon OpenSearch Service

C.Amazon Kinesis Data Analytics

D.Amazon Athena

E.AWS Glue

AnswersA, B, D

Supports querying across multiple log groups.

Why this answer

CloudWatch Logs Insights is correct because it is a native AWS service designed specifically for querying and analyzing log data stored in CloudWatch Logs. Its purpose-built query language (with commands such as `fields`, `filter`, and `stats`) lets you run a single query across multiple log groups to search for specific error patterns, making it a direct and efficient solution for this use case without requiring data export or additional infrastructure.

Exam trap

The trap here is that candidates may overlook Amazon OpenSearch Service and Amazon Athena as valid options because they require additional configuration (streaming or exporting logs), but the question asks which services 'can be used' to achieve the goal, not which are the most direct or native, so all three (A, B, D) are technically feasible.
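
For the most direct of the three options, a Logs Insights query for an error pattern might look like the string built below. This is a minimal sketch: the helper name and the example pattern are hypothetical, and the log groups to search are selected separately when the query is started.

```python
def build_error_query(pattern, limit=20):
    """Build a CloudWatch Logs Insights query that finds messages matching a regex."""
    escaped = pattern.replace("/", "\\/")  # keep slashes from ending the regex literal
    return (
        "fields @timestamp, @log, @message\n"
        f"| filter @message like /{escaped}/\n"
        "| sort @timestamp desc\n"
        f"| limit {limit}"
    )


query = build_error_query("ERROR|Exception")
print(query)
```

The `@log` field identifies which log group each result came from, which is useful when one query spans several groups.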

40

Multi-Selecteasy

A SysOps administrator needs to monitor the disk space usage on an EC2 instance running Windows Server. Which actions are required to collect this metric? (Select TWO.)

Select 2 answers

A.Install the CloudWatch Logs agent to monitor disk usage logs.

B.Enable EC2 status checks to monitor disk health.

C.Use Windows Performance Monitor to track disk space and send to CloudWatch.

D.Create an IAM role with permissions to publish custom metrics and attach it to the instance.

E.Install the CloudWatch agent on the instance and configure it to collect disk metrics.

AnswersD, E

The instance needs IAM permissions to publish metrics.

Why this answer

Option D is correct because the CloudWatch agent requires permissions to publish custom metrics to CloudWatch. An IAM role with the appropriate policy (e.g., CloudWatchAgentServerPolicy) must be attached to the EC2 instance to allow the agent to send disk space metrics. Option E is correct because the CloudWatch agent must be installed and configured with a JSON configuration file that collects disk space metrics, which EC2 does not publish by default. On Windows Server this is the `LogicalDisk` object (for example, the `% Free Space` counter); the `disk` section applies to Linux instances.

Exam trap

The trap here is that candidates often confuse the CloudWatch Logs agent with the CloudWatch agent, or assume that EC2 status checks or Performance Monitor can directly send disk metrics to CloudWatch without additional configuration.
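
On Windows Server, the CloudWatch agent reads Performance Monitor counters, so free disk space is typically collected through the `LogicalDisk` object. The sketch below generates a minimal agent configuration fragment; the 60-second interval and the choice to collect from all drives (`"*"`) are assumptions for illustration.

```python
import json

# Minimal metrics section for a Windows instance (assumed values noted inline).
agent_config = {
    "metrics": {
        "metrics_collected": {
            "LogicalDisk": {
                "measurement": [{"name": "% Free Space", "unit": "Percent"}],
                "resources": ["*"],                # assumed: every logical drive
                "metrics_collection_interval": 60  # assumed: once per minute
            }
        }
    }
}

config_text = json.dumps(agent_config, indent=2)
print(config_text)
```

The resulting JSON would be saved as the agent's configuration file; the instance role from Option D is what lets the agent publish the collected values.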

41

Multi-Selectmedium

A SysOps administrator is setting up monitoring for an RDS MySQL database. The administrator needs to be notified when the database connection count exceeds 100. Which steps should be taken to achieve this? (Choose TWO.)

Select 2 answers

A.Configure the CloudWatch alarm to send a notification to an SNS topic.

B.Create a CloudWatch alarm on the 'DatabaseConnections' metric.

C.Enable Enhanced Monitoring for RDS.

D.Enable CloudTrail to log RDS API calls.

E.Create an Amazon EventBridge rule that triggers on RDS events.

AnswersA, B

SNS can send email or SMS.

Why this answer

Option A is correct because Amazon CloudWatch alarms can send notifications to an Amazon SNS topic when the alarm state changes. This allows the SysOps administrator to receive alerts (e.g., via email, SMS, or HTTP) when the database connection count exceeds the threshold. Option B is correct because the 'DatabaseConnections' metric is a standard CloudWatch metric for RDS MySQL that tracks the number of current connections to the database instance.

Creating an alarm on this metric with a threshold of 100 will trigger when the connection count exceeds that value.

Exam trap

The trap here is that candidates often confuse Enhanced Monitoring (which provides OS-level metrics) with CloudWatch metrics (which provide database-level metrics like connection counts), leading them to incorrectly select Enhanced Monitoring as a solution for connection-based alarms.

42

MCQhard

A SysOps administrator needs to monitor Amazon EC2 instances for disk space usage. Disk space metrics are not available by default in Amazon CloudWatch. The administrator wants to collect disk space metrics from all EC2 instances across multiple AWS accounts and aggregate them in a single CloudWatch dashboard. Which combination of steps should the administrator take?

A.Install the CloudWatch agent on each instance using SSM Run Command, configure the agent to collect disk metrics, and use CloudWatch cross-account observability to aggregate metrics from multiple accounts.

B.Enable detailed monitoring on the EC2 instances, create a custom metric in CloudWatch for disk space, and use CloudWatch Logs to forward logs to a central account.

C.Use AWS Config to track disk space and send metrics to CloudWatch.

D.Use AWS Trusted Advisor to monitor disk space and send alerts via SNS.

AnswerA

The CloudWatch agent collects custom metrics (e.g., disk space) and sends them to CloudWatch. SSM Run Command automates the installation. Cross-account observability allows a central monitoring account to view metrics from all member accounts.

Why this answer

Option A is correct because the CloudWatch agent is required to collect custom metrics like disk space from EC2 instances, as these are not available by default. SSM Run Command enables scalable installation across instances without logging in to each one, provided the instances are managed by Systems Manager. CloudWatch cross-account observability allows you to aggregate metrics from multiple AWS accounts into a single monitoring account, meeting the requirement for a unified dashboard.

Exam trap

The trap here is that candidates assume detailed monitoring or AWS Config can capture OS-level metrics like disk space, when in fact only an in-guest agent (CloudWatch agent) can collect such data, and cross-account aggregation requires a specific feature (cross-account observability) rather than simple log forwarding.

How to eliminate wrong answers

Option B is wrong because enabling detailed monitoring only provides hypervisor-level metrics (CPU, network, etc.) at 1-minute frequency, not disk space metrics; disk space requires an in-guest agent. Option C is wrong because AWS Config tracks resource configuration changes (e.g., instance type, security groups) and can trigger rules, but it does not collect or emit disk space metrics to CloudWatch. Option D is wrong because AWS Trusted Advisor provides best-practice checks and recommendations, but it does not collect real-time disk space metrics or send them to CloudWatch for dashboard aggregation.

43

MCQeasy

A SysOps administrator needs to monitor the CPU utilization of an Amazon EC2 instance and send an alert when it exceeds 90% for 5 consecutive minutes. Which combination of AWS services should the administrator use to meet this requirement?

A.Amazon CloudWatch metric (CPUUtilization), a CloudWatch alarm, and an Amazon SNS topic.

B.Amazon CloudWatch Logs, a metric filter to extract CPU utilization from logs, and an alarm on that metric.

C.A CloudWatch dashboard and an AWS Lambda function that checks the dashboard periodically.

D.Amazon EventBridge (CloudWatch Events) and a Lambda function that calls the EC2 DescribeInstances API.

AnswerA

This combination allows monitoring of the metric, evaluation against a threshold, and notification via SNS. It is the simplest and most direct method.

Why this answer

The correct approach is to use a CloudWatch metric for CPUUtilization, which is automatically published by EC2 instances. A CloudWatch alarm can be configured to evaluate this metric over a period of 5 consecutive minutes with a threshold of 90%, and when the alarm state is triggered, it publishes to an SNS topic to send notifications. This is the native, efficient, and recommended method for monitoring and alerting on EC2 CPU utilization.

Exam trap

The trap here is that candidates may confuse CloudWatch Logs metric filters (used for custom log-based metrics) with the built-in EC2 metrics, or think that EventBridge can directly access CPU utilization data, when in fact CPUUtilization is a CloudWatch metric and must be monitored via CloudWatch alarms.

How to eliminate wrong answers

Option B is wrong because CloudWatch Logs and metric filters are used to extract custom metrics from log data (e.g., application logs), not to monitor the built-in CPUUtilization metric which is already available as a CloudWatch metric without needing log extraction. Option C is wrong because a CloudWatch dashboard is a visualization tool and does not trigger alerts; a Lambda function polling a dashboard periodically is inefficient, introduces latency, and is not a supported pattern for real-time alerting. Option D is wrong because EventBridge and a Lambda function calling DescribeInstances API only retrieves instance metadata and state, not CPU utilization metrics; CPU utilization is a CloudWatch metric, not available via the EC2 DescribeInstances API.
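
The "above 90% for 5 consecutive minutes" condition can be sketched as a check over the most recent datapoints. This is a simplified model that assumes one datapoint per minute and ignores missing-data handling; the function name is hypothetical.

```python
def alarm_state(datapoints, threshold=90.0, evaluation_periods=5):
    """Evaluate the newest datapoints the way a consecutive-period alarm would."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    # Every one of the last N datapoints must breach the threshold.
    return "ALARM" if all(value > threshold for value in recent) else "OK"


sustained = alarm_state([50, 95, 96, 97, 98, 99])  # last five all above 90
interrupted = alarm_state([95, 96, 80, 97, 98])    # one dip resets the streak
too_short = alarm_state([95, 96])                  # not enough datapoints yet
```

A single dip below the threshold keeps the alarm in OK, which is why a short spike does not trigger a notification.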

44

MCQmedium

A SysOps administrator needs to monitor memory utilization on an Amazon EC2 instance. Memory metrics are not available by default in Amazon CloudWatch for EC2 instances. Which action should the administrator take to collect memory utilization metrics?

A.Install the CloudWatch agent on the EC2 instance

B.Enable detailed monitoring on the EC2 instance

C.Use an AWS Lambda function to query the EC2 instance for memory metrics

D.Use Amazon Inspector to collect memory metrics

AnswerA

The CloudWatch agent can collect memory and other system-level metrics from the EC2 instance and publish them to CloudWatch custom metrics.

Why this answer

The CloudWatch agent is the correct solution because it can collect custom metrics, including memory utilization, from EC2 instances. Unlike the default hypervisor-level metrics (CPU, network, disk), memory metrics require an in-guest agent to read the operating system's memory counters and publish them to CloudWatch.

Exam trap

The trap here is that candidates confuse 'detailed monitoring' (which increases metric frequency) with the ability to collect new metric types, assuming it will magically include memory metrics when it only affects existing hypervisor-level metrics.

How to eliminate wrong answers

Option B is wrong because enabling detailed monitoring only increases the frequency of default EC2 metrics (e.g., CPU, disk I/O) from 5 minutes to 1 minute; it does not add memory metrics. Option C is wrong because AWS Lambda cannot directly query an EC2 instance's OS-level memory metrics without an agent or API endpoint installed inside the instance. Option D is wrong because Amazon Inspector is a vulnerability assessment service that scans for software vulnerabilities and network exposures, not a tool for collecting OS-level performance metrics like memory utilization.
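
To show why an in-guest agent is needed, the sketch below computes a memory usage percentage from the kind of data only the operating system exposes. It assumes a Linux guest and `/proc/meminfo`-style text; the formula is one common definition of "used" memory and is not claimed to be the CloudWatch agent's exact calculation.

```python
def memory_used_percent(meminfo_text):
    """Compute used memory as a percentage from /proc/meminfo-style text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # values are reported in kB
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return round(100.0 * (total - available) / total, 2)


# Hypothetical sample; on a real instance this text would come from /proc/meminfo.
sample = """MemTotal:        8000000 kB
MemFree:         1000000 kB
MemAvailable:    2000000 kB
"""
used = memory_used_percent(sample)
```

The hypervisor cannot see these counters, which is why neither basic nor detailed monitoring can report them.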

45

MCQhard

A SysOps administrator is managing a fleet of EC2 instances in an Auto Scaling group. The instances are behind an Application Load Balancer. The administrator notices that the 'SurgeQueueLength' metric for the ALB is frequently high. What does this indicate, and what is the BEST remediation action?

A.The targets are unhealthy; decrease the desired capacity to reduce load.

B.The targets are not able to handle the request rate; increase the desired capacity or add scaling policies.

C.The load balancer is accepting too many connections; increase the idle timeout.

D.The load balancer is overloaded; replace it with a Network Load Balancer.

AnswerB

High SurgeQueueLength indicates requests are waiting for targets to become available.

Why this answer

The SurgeQueueLength metric measures the number of requests that are queued by the load balancer because no healthy target is available to process them. (AWS documents SurgeQueueLength as a Classic Load Balancer metric; for an ALB, the closest signals are RequestCountPerTarget and TargetResponseTime.) A frequently high value indicates that the targets (EC2 instances) are overwhelmed and cannot keep up with the incoming request rate. The best remediation is to increase the desired capacity of the Auto Scaling group or add scaling policies (e.g., based on RequestCountPerTarget) to automatically add more instances to handle the load.

Exam trap

The trap here is that candidates confuse SurgeQueueLength with connection-level metrics (like idle timeout) or assume the load balancer itself is the bottleneck, when in fact the metric directly indicates insufficient target capacity.

How to eliminate wrong answers

Option A is wrong because decreasing desired capacity would reduce the number of targets, worsening the queue length, and unhealthy targets are indicated by the 'UnHealthyHostCount' metric, not SurgeQueueLength. Option C is wrong because increasing the idle timeout only affects how long the ALB keeps idle connections open, not the rate of incoming requests or the queue depth; SurgeQueueLength is about request backlog, not connection persistence. Option D is wrong because replacing the ALB with a Network Load Balancer (NLB) does not address the root cause—the targets are under-provisioned; an NLB operates at Layer 4 and does not provide the same request queuing behavior, but the underlying capacity issue remains.

46

MCQeasy

A company is using AWS CloudTrail to log API activity. They need to ensure that log files are protected from unauthorized modification and can be used to verify the integrity of log files. Which AWS feature should be enabled?

A.Enable CloudTrail log file integrity validation.

B.Enable S3 server-side encryption on the CloudTrail S3 bucket.

C.Stream CloudTrail logs to Amazon CloudWatch Logs.

D.Enable S3 Multi-Factor Authentication (MFA) Delete on the CloudTrail S3 bucket.

AnswerA

Integrity validation uses SHA-256 hashing and digital signatures to verify that log files have not been modified.

Why this answer

CloudTrail log file integrity validation uses a SHA-256 hash chain to create a digest file that can be used to verify that log files have not been modified, deleted, or tampered with after delivery. This feature is specifically designed to provide cryptographic assurance of log file integrity, meeting the requirement to protect against unauthorized modification and enable verification.

Exam trap

The trap here is that candidates often confuse data protection features like encryption or deletion prevention with integrity verification, not realizing that integrity validation specifically requires a cryptographic hash chain to detect modification, not just access control or encryption.

How to eliminate wrong answers

Option B is wrong because enabling S3 server-side encryption protects log files at rest from unauthorized access but does not provide a mechanism to verify that the log files have not been modified or tampered with after they were written. Option C is wrong because streaming CloudTrail logs to CloudWatch Logs enables real-time monitoring and alerting but does not provide cryptographic integrity verification of the original log files stored in S3. Option D is wrong because S3 MFA Delete protects against accidental or unauthorized deletion of objects by requiring multi-factor authentication for delete operations, but it does not provide a hash-based integrity check to detect modification of log file contents.
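
The hashing step behind integrity validation can be illustrated with the standard library. This is only a sketch of the idea: the real digest files are also digitally signed and chained, and in practice the check is run with the `aws cloudtrail validate-logs` CLI command. The sample log contents are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_log(log_bytes: bytes, expected_hash: str) -> bool:
    """A log file is intact only if its hash matches the recorded one."""
    return sha256_hex(log_bytes) == expected_hash


original = b'{"Records": []}'
recorded_hash = sha256_hex(original)  # stands in for the hash kept in a digest file
tampered = b'{"Records": [{"eventName": "DeleteTrail"}]}'

intact_ok = verify_log(original, recorded_hash)
tampered_ok = verify_log(tampered, recorded_hash)
```

Encryption and MFA Delete restrict who can read or delete the object, but only a recorded hash like this reveals that the contents changed.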

47

MCQmedium

An administrator needs to be notified when the root user signs in to the AWS Management Console. Which method should be used?

A.Create a CloudWatch Events rule for 'AWS Console Sign-In' events and set the target to an SNS topic.

B.Enable CloudTrail Insights to detect root login anomalies.

C.Create a CloudWatch alarm on the RootAccountUsage metric.

D.Use AWS Config to track IAM password policy changes.

AnswerA

CloudTrail records console sign-ins, and CloudWatch Events can react to them.

Why this answer

Option A is correct because you can create an Amazon CloudWatch Events rule (now an Amazon EventBridge rule) that matches the console sign-in event recorded by AWS CloudTrail (detail type `AWS Console Sign In via CloudTrail`) and filters on a user identity type of `Root`. When the root user signs in, this event is generated, and the rule can publish to an SNS topic that notifies the administrator. This is the recommended approach for real-time alerting on root user activity.

Exam trap

The trap here is that candidates may think CloudWatch alarms can monitor root account usage directly via a metric, but AWS does not expose a 'RootAccountUsage' metric; instead, you must use CloudTrail events as the source for event-driven alerts.

How to eliminate wrong answers

Option B is wrong because CloudTrail Insights analyzes write management events to detect unusual activity patterns, but it does not provide real-time notifications for specific events like root user sign-ins; it focuses on anomaly detection, not event-driven alerts. Option C is wrong because the 'RootAccountUsage' metric is not a standard CloudWatch metric; CloudWatch does not have a built-in metric for root account usage, and you cannot create an alarm on a non-existent metric. Option D is wrong because AWS Config tracks resource configuration changes, such as IAM password policy changes, but it does not monitor or alert on root user sign-in events.
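
An event pattern for root console sign-ins, together with a deliberately simplified matcher, is sketched below. The matcher is hypothetical and handles only exact-value lists and nested objects; the real rule engine supports many more operators. The sample events are trimmed to the fields the pattern inspects.

```python
# Event pattern as it might be attached to the rule.
ROOT_SIGN_IN_PATTERN = {
    "detail-type": ["AWS Console Sign In via CloudTrail"],
    "detail": {"userIdentity": {"type": ["Root"]}},
}


def matches(pattern, event):
    """Simplified matcher: lists hold allowed values, dictionaries recurse."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


root_event = {
    "detail-type": "AWS Console Sign In via CloudTrail",
    "detail": {"eventName": "ConsoleLogin", "userIdentity": {"type": "Root"}},
}
iam_user_event = {
    "detail-type": "AWS Console Sign In via CloudTrail",
    "detail": {"eventName": "ConsoleLogin", "userIdentity": {"type": "IAMUser"}},
}
```

Only the first event satisfies the pattern, so sign-ins by ordinary IAM users would not notify the administrator.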

48

Multi-Selecthard

A SysOps administrator is monitoring an Amazon ECS cluster running Fargate tasks. The administrator wants to receive a notification when any task fails to start due to insufficient memory. Which combination of actions should be taken? (Choose TWO.)

Select 2 answers

A.Enable AWS CloudTrail and create a metric filter for RunTask API calls.

B.Configure the CloudWatch Events rule to send notifications to an SNS topic.

C.Create a CloudWatch Events rule that matches ECS task state changes with a reason of 'RESOURCE:MEMORY'.

D.Create a CloudWatch alarm on the ECS cluster's CPUUtilization metric.

E.Enable CloudWatch Logs for the ECS cluster and filter for error messages.

AnswersB, C

SNS can send email or SMS notifications for alarms.

Why this answer

Option B is correct because Amazon CloudWatch Events (now Amazon EventBridge) can trigger an SNS notification when a specific ECS task state change occurs. Option C is correct because you can create a CloudWatch Events rule that matches ECS task state changes with a reason of 'RESOURCE:MEMORY', which indicates the task failed to start due to insufficient memory. Together, these actions ensure you receive a notification when a Fargate task fails to start due to memory constraints.

Exam trap

The trap here is that candidates often confuse CloudTrail (audit logging) with CloudWatch Events (event-driven notifications), or they mistakenly think CPU metrics can indicate memory-related failures, leading them to select options that do not directly capture the specific 'RESOURCE:MEMORY' reason.

49

MCQmedium

A company uses Amazon CloudFront to serve content from a custom origin. A SysOps administrator needs to detect IP addresses that generate a high rate of HTTP 403 (Forbidden) errors, which may indicate malicious bots attempting to access restricted content. The administrator wants to automatically add these IP addresses to an AWS WAF IP set to block them. Which solution meets this requirement with the least operational overhead?

A.Configure CloudFront access logs to be delivered to an Amazon S3 bucket, and use Amazon Athena to query logs for IPs with many 403 errors. Then manually add those IPs to a WAF IP set.

B.Enable AWS CloudTrail for CloudFront and create a CloudWatch metric filter for 'Forbidden' events. Use a CloudWatch alarm to notify the administrator via email, who then manually updates the WAF IP set.

C.Use AWS Config to monitor CloudFront distributions and trigger an AWS Lambda function when a high number of 403 errors is detected by evaluating access logs stored in S3.

D.Enable CloudFront standard logs and stream them to Amazon CloudWatch Logs. Create a metric filter for 403 status codes, grouped by source IP. Set a CloudWatch alarm on the metric that triggers an AWS Lambda function to update the WAF IP set.

AnswerD

This solution automates detection and remediation. CloudWatch Logs processes the logs in near real-time, the metric filter counts 403 error responses per IP, and the alarm invokes Lambda to block the IP via WAF. This is fully automated and requires minimal operational overhead.

Why this answer

Option D is correct because it provides a fully automated, serverless solution with minimal operational overhead. By streaming CloudFront standard logs to CloudWatch Logs, you can create a metric filter that counts 403 errors grouped by source IP, then use a CloudWatch alarm to trigger a Lambda function that programmatically updates the WAF IP set. This eliminates manual intervention and leverages native AWS integrations.

Exam trap

The trap here is that candidates may confuse CloudTrail (which logs API calls) with CloudFront access logs (which log HTTP requests), leading them to choose Option B, which is technically incorrect for detecting HTTP 403 errors.

How to eliminate wrong answers

Option A is wrong because it requires manual querying with Athena and manual updates to the WAF IP set, which introduces significant operational overhead and delays in blocking malicious IPs. Option B is wrong because CloudTrail does not capture HTTP 403 error events from CloudFront; CloudTrail records API calls, not HTTP request-level responses. Option C is wrong because AWS Config is designed for resource compliance and configuration tracking, not for real-time log analysis or triggering actions based on error rates from access logs.
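
The per-IP counting that drives this automation can be shown offline. The sketch below assumes tab-separated CloudFront standard log lines in which the client IP (`c-ip`) is the fifth field and the status code (`sc-status`) is the ninth; the threshold, function name, and sample lines are hypothetical.

```python
from collections import Counter

C_IP, SC_STATUS = 4, 8  # zero-based positions of c-ip and sc-status


def ips_over_threshold(log_lines, threshold):
    """Return the client IPs that produced at least `threshold` 403 responses."""
    counts = Counter()
    for line in log_lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#Version' / '#Fields' headers
        fields = line.split("\t")
        if len(fields) > SC_STATUS and fields[SC_STATUS] == "403":
            counts[fields[C_IP]] += 1
    return sorted(ip for ip, n in counts.items() if n >= threshold)


def make_line(ip, status):
    """Build a shortened sample log line with only the leading fields."""
    return "\t".join([
        "2024-01-01", "00:00:01", "IAD89-C1", "512", ip,
        "GET", "d111111abcdef8.cloudfront.net", "/private/a", status,
    ])


sample_lines = [
    "#Version: 1.0",
    make_line("198.51.100.7", "403"),
    make_line("198.51.100.7", "403"),
    make_line("198.51.100.7", "403"),
    make_line("198.51.100.7", "200"),
    make_line("198.51.100.9", "403"),
]
offenders = ips_over_threshold(sample_lines, threshold=3)
```

In the automated design, the Lambda function would take a list like `offenders` and add those addresses to the WAF IP set.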

Practice this question →

50

MCQeasy

A company wants to monitor the health of its web application running on EC2 instances behind an Application Load Balancer (ALB). Which CloudWatch metric from the ALB can indicate that requests are failing due to server errors?

A.HTTPCode_Target_5XX_Count

B.HTTPCode_Target_4XX_Count

C.HTTPCode_Target_2XX_Count

D.HTTPCode_Target_3XX_Count

AnswerA

This counts server-side errors from targets.

Why this answer

The correct answer is A because the HTTPCode_Target_5XX_Count metric from the Application Load Balancer (ALB) specifically counts the number of HTTP response codes in the 5xx range returned by the target (EC2 instances). A 5xx status code indicates a server-side error, such as an internal server error (500), gateway timeout (504), or service unavailable (503), which directly reflects that requests are failing due to issues on the EC2 instances themselves.

Exam trap

The trap here is that candidates often confuse HTTPCode_Target_5XX_Count with HTTPCode_ELB_5XX_Count, mistakenly thinking any 5xx error is from the target, when in fact the ELB can also generate 5xx errors (e.g., 502 from a malformed response) that are tracked separately.

How to eliminate wrong answers

Option B is wrong because HTTPCode_Target_4XX_Count tracks client-side errors (e.g., 400 Bad Request, 403 Forbidden, 404 Not Found), which indicate issues with the request from the client, not server failures. Option C is wrong because HTTPCode_Target_2XX_Count counts successful responses (e.g., 200 OK), which indicate healthy application behavior, not failures. Option D is wrong because HTTPCode_Target_3XX_Count counts redirection responses (e.g., 301 Moved Permanently, 302 Found), which are not errors and do not indicate server-side problems.

Practice this question →

51

Multi-Selecthard

A SysOps administrator is troubleshooting an issue where an EC2 instance running a web server is becoming unresponsive under high load. The administrator has enabled detailed monitoring and set up CPUUtilization alarms. Which THREE additional steps could help diagnose the root cause? (Choose THREE.)

Select 3 answers

A.Install the CloudWatch agent and collect disk space metrics.

B.Place the instance behind an Auto Scaling group.

C.Install the CloudWatch agent and collect memory metrics.

D.Increase the instance size to handle more load.

E.Enable access logs on the load balancer to analyze request patterns.

AnswersA, C, E

Disk full can cause failures.

Why this answer

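Option A is correct because under high load, the web server could become unresponsive due to disk space exhaustion (e.g., from log files filling the root partition). The CloudWatch agent can collect disk space metrics, which are not available by default, allowing the administrator to correlate disk usage with performance degradation. Option C is correct for the same reason: memory utilization is an in-guest metric that CloudWatch does not collect by default, so the agent is needed to reveal memory exhaustion under load. Option E is correct because load balancer access logs show request volume, latency, and source patterns, which helps tie the unresponsiveness to specific traffic.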

Exam trap

The trap here is that candidates confuse reactive scaling actions (like resizing the instance or adding Auto Scaling) with diagnostic steps, failing to recognize that the question asks for steps to diagnose the root cause, not to mitigate the symptom.

Practice this question →

52

MCQmedium

A company uses AWS CloudFormation to deploy its infrastructure. The SysOps administrator needs to be notified if a stack creation fails. Which method is the most efficient way to achieve this?

A.Use Amazon Simple Email Service (SES) to send emails on stack failure.

B.Specify an SNS topic ARN in the 'NotificationARNs' parameter of the stack.

C.Create a Lambda function that polls the CloudFormation API for stack status changes.

D.Enable CloudTrail and create a metric filter for 'CreateStack' events.

AnswerB

CloudFormation sends events to the SNS topic automatically.

Why this answer

Option B is correct because CloudFormation natively supports specifying an Amazon SNS topic ARN in the 'NotificationARNs' parameter of the stack. When a stack creation fails, CloudFormation automatically publishes a notification to the SNS topic, which can then deliver the message via email, SMS, or other protocols without any custom polling or additional services.

Exam trap

The trap here is that candidates may overcomplicate the solution by choosing CloudTrail or Lambda polling, missing the fact that CloudFormation has a built-in, efficient notification mechanism via SNS that requires no additional services or custom code.

How to eliminate wrong answers

Option A is wrong because Amazon SES is an email sending service, not a notification delivery mechanism integrated with CloudFormation; it would require custom logic to trigger on stack failure. Option C is wrong because polling the CloudFormation API for stack status changes is inefficient, introduces latency, and incurs additional API call costs compared to the native push-based notification via SNS. Option D is wrong because CloudTrail and metric filters are used for auditing and monitoring API calls, not for real-time notification of stack failures; they would require additional setup with CloudWatch Alarms and SNS to achieve similar functionality, making it less efficient than the direct SNS integration.
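A minimal sketch of Option B, assuming the stack is created through the CloudFormation CreateStack API. The helper name, stack name, template URL, and topic ARN are hypothetical; the returned dict is shaped like the keyword arguments of that API call.

```python
def create_stack_request(stack_name, template_url, topic_arn):
    """Build CreateStack arguments that attach an SNS topic for stack events."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # CloudFormation publishes stack events to every topic listed here.
        "NotificationARNs": [topic_arn],
    }

request = create_stack_request(
    "demo-stack",  # hypothetical stack name
    "https://example.com/template.yaml",  # hypothetical template location
    "arn:aws:sns:us-east-1:123456789012:stack-events",  # hypothetical topic
)
```

Note that the topic receives all stack events, not only failures, so an email subscriber who cares only about failed creations may want a subscription filter or a small consumer that checks the resource status in each message.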

Practice this question →

53

Multi-Selecteasy

A SysOps admin needs to set up centralized logging for multiple AWS accounts. Which TWO services should be used together to aggregate logs into a single S3 bucket? (Choose 2.)

Select 2 answers

A.Amazon VPC Flow Logs

B.AWS Config

C.AWS CloudTrail

D.Amazon S3 cross-region replication

E.Amazon CloudWatch Logs subscription filter

AnswersB, C

Config can deliver configuration history and snapshots to a central S3 bucket.

Why this answer

AWS Config and AWS CloudTrail are the correct pair because CloudTrail records API activity across accounts, and AWS Config records resource configuration changes. Both can be configured to deliver log files to a centralized S3 bucket by setting up a trail (CloudTrail) or a delivery channel (Config) that points to the same bucket, often using a bucket policy that grants cross-account write permissions. This enables aggregated logging for multiple AWS accounts in a single S3 bucket.

Exam trap

The trap here is that candidates often confuse the services that record account-level activity and configuration (CloudTrail, Config) with network traffic logging (VPC Flow Logs) or data movement features (S3 replication, CloudWatch subscription filters), failing to recognize that CloudTrail and Config each deliver directly to an S3 bucket in another account through a trail or a delivery channel, with no additional infrastructure.

Practice this question →

54

MCQhard

A company is using AWS Lambda functions to process incoming messages from Amazon SQS. The Lambda function sometimes fails due to a transient error, and the message is not processed. The team wants to automatically retry failed messages and send them to a dead-letter queue (DLQ) after three failed attempts. Which configuration meets these requirements?

A.Set the Lambda function's reserved concurrency to 1 and enable 'maximumRetryAttempts' to 2.

B.Create an SQS queue with a visibility timeout that allows three retries before sending to a DLQ.

C.Configure the SQS queue as an event source for Lambda with a DLQ specified in the Lambda function's dead-letter configuration.

D.Configure the SQS queue with a redrive policy that allows three maximum receives before sending to a DLQ.

AnswerD

For an SQS event source, retries and dead-lettering are governed by the source queue's redrive policy. A maximum receive count of 3 moves a message to the DLQ after three failed receives.

Why this answer

Option D is correct because when an SQS queue is configured as an event source for Lambda, the Lambda service polls the queue and invokes the function synchronously with each batch. If the invocation fails, the messages are not deleted; they become visible again after the visibility timeout expires and are received and processed again. A redrive policy on the source queue with a maximum receive count of 3 moves a message to the DLQ once it has been received three times without being deleted, which provides automatic retries and DLQ routing without custom code.

Exam trap

The trap here is that candidates confuse the Lambda function's dead-letter configuration with the SQS redrive policy. The function-level DLQ applies only to asynchronous invocations; it is not used for messages that arrive through an SQS event source mapping. Choosing Option C would therefore never move a failed SQS message to the DLQ.

How to eliminate wrong answers

Option A is wrong because reserved concurrency limits how many instances of the function run at once and does not control retry behavior; the maximum retry attempts setting applies to asynchronous invocations and stream-based event sources, not to SQS event sources, and the option does not configure a DLQ at all. Option B is wrong because the visibility timeout only controls how long a message stays hidden after being received; without a redrive policy, a failing message is retried until the retention period expires and is never sent to a DLQ. Option C is wrong because the dead-letter configuration on the Lambda function is used for the function's asynchronous invocation queue, not for event source queues; the DLQ must be configured on the source SQS queue.
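The redrive policy referred to in Option D is a small JSON document stored as a queue attribute. The following is a minimal sketch, assuming a hypothetical DLQ ARN and helper name; the returned dict is shaped like the `Attributes` argument of the SQS SetQueueAttributes API.

```python
import json

def redrive_attributes(dlq_arn, max_receive_count=3):
    """Build queue attributes that send a message to the DLQ after
    `max_receive_count` receives without a successful delete."""
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects the policy as a JSON string, not a nested object.
    return {"RedrivePolicy": json.dumps(policy)}

attributes = redrive_attributes(
    "arn:aws:sqs:us-east-1:123456789012:orders-dlq"  # hypothetical DLQ ARN
)
```

The policy is set on the source queue, and the source queue's visibility timeout should comfortably exceed the function timeout so that a message is not received again while it is still being processed.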

Practice this question →

55

MCQeasy

A SysOps administrator wants to receive an email when the average CPU utilization of an EC2 instance exceeds 90% for 5 minutes. What should the administrator create?

A.A CloudWatch Logs metric filter on the instance logs.

B.An AWS Config rule to detect high CPU usage.

C.A CloudWatch Events rule on the EC2 instance state change.

D.A CloudWatch alarm on the CPUUtilization metric with an SNS notification.

AnswerD

Standard approach for metric-based alerts.

Why this answer

Option D is correct because a CloudWatch alarm on the CPUUtilization metric can be configured to evaluate the average CPU usage over a 5-minute period and trigger an action when it exceeds 90%. The alarm can send a notification via Amazon SNS, which can deliver an email to subscribed endpoints. This directly meets the requirement for email notification based on a sustained metric threshold.

Exam trap

The trap here is that candidates may confuse CloudWatch Logs metric filters (which require log data) with CloudWatch metrics (which are numeric time-series data), or think AWS Config can monitor performance metrics instead of configuration compliance.

How to eliminate wrong answers

Option A is wrong because CloudWatch Logs metric filters analyze log data (e.g., application logs) for specific patterns, not numeric metrics like CPU utilization; CPU utilization is a standard EC2 metric emitted by the hypervisor, not a log entry. Option B is wrong because AWS Config rules evaluate resource configurations (e.g., instance type, tags) for compliance, not real-time performance metrics like CPU usage; Config does not monitor metric thresholds or trigger SNS notifications for metric breaches. Option C is wrong because a CloudWatch Events rule on EC2 instance state changes (e.g., running, stopped) does not monitor CPU utilization; it only reacts to lifecycle events, not metric-based conditions.
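A minimal sketch of the alarm in Option D, written as the arguments one might pass to the CloudWatch PutMetricAlarm API. The helper name, alarm name, instance ID, and topic ARN are hypothetical; the threshold and period come from the question.

```python
def cpu_alarm_request(instance_id, topic_arn):
    """Build PutMetricAlarm arguments: average CPU above 90% for one 5-minute period."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # seconds, i.e. 5 minutes
        "EvaluationPeriods": 1,
        "Threshold": 90.0,
        "ComparisonOperator": "GreaterThanThreshold",
        # The SNS topic needs a confirmed email subscription to deliver mail.
        "AlarmActions": [topic_arn],
    }

alarm = cpu_alarm_request(
    "i-0123456789abcdef0",  # hypothetical instance ID
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",  # hypothetical topic
)
```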

Practice this question →

56

MCQmedium

A SysOps administrator notices that an EC2 instance's CPU utilization has been at 100% for the past hour. The administrator checks CloudWatch metrics and sees no anomalies in network or disk I/O. Which step should the administrator take to investigate further?

A.Install the CloudWatch Logs agent on the instance to capture system logs.

B.Check the EC2 instance's CPU credit balance in CloudWatch.

C.Stop the EC2 instance and start it again to reset the CPU.

D.Enable detailed monitoring on the EC2 instance to get 1-minute CloudWatch metrics.

AnswerD

Detailed monitoring provides higher-resolution data to diagnose the issue.

Why this answer

Detailed monitoring (1-minute metrics) provides higher-resolution data than the default 5-minute metrics, allowing the administrator to identify short-lived CPU spikes or patterns that might be averaged out in the standard 5-minute interval. Since network and disk I/O appear normal, the issue is likely a process or application consuming CPU, and finer-grained metrics help pinpoint the timing and correlate with specific events or logs.

Exam trap

The trap here is that candidates assume CPU credit balance (Option B) is always the answer for high CPU utilization, but credits only apply to T-series instances, and the question does not specify the instance type, making detailed monitoring the more universally correct first step for investigation.

How to eliminate wrong answers

Option A is wrong because the CloudWatch Logs agent captures system logs (e.g., /var/log/messages) but does not provide CPU utilization metrics; the administrator already has CPU metrics and needs higher resolution, not logs. Option B is wrong because CPU credit balance is only relevant for burstable performance instance types (e.g., T2/T3); the question does not specify the instance type, and 100% CPU utilization for an hour on a non-burstable instance would not involve credits. Option C is wrong because stopping and starting the instance does not reset CPU utilization; it only changes the underlying host, and the root cause (e.g., a runaway process) would persist unless the instance is configured to terminate and relaunch.

Practice this question →

57

MCQmedium

A SysOps administrator suspects that an Amazon Linux 2 EC2 instance has been compromised. The instance is part of an Auto Scaling group and is currently running. The administrator needs to preserve the root volume for forensic analysis while minimizing the impact to the application. Which action should the administrator take FIRST?

A.Create an Amazon Machine Image (AMI) of the instance while it is running.

B.Stop the instance and create an Amazon EBS snapshot of the root volume.

C.Detach the root volume from the instance and create a snapshot of the volume for analysis.

D.Terminate the instance immediately to stop any malicious activity.

AnswerC

Detaching preserves the volume and allows forensic snapshot; Auto Scaling will launch a replacement instance.

Why this answer

Detaching the root volume and creating a snapshot preserves the evidence while the instance can be terminated and replaced by the Auto Scaling group. Option D (terminate) deletes the root volume by default, which destroys the evidence. Option B (stop) leaves the volume attached to a stopped instance. Option A (AMI) creates a new image but does not preserve the exact state if the instance continues running.

Practice this question →

58

Multi-Selecthard

A SysOps administrator needs to receive alerts when an S3 bucket is publicly accessible. Which TWO AWS services can be used to monitor and detect this configuration?

Select 2 answers

A.AWS CloudTrail

B.AWS Security Hub

C.AWS Trusted Advisor

D.AWS Config

E.Amazon CloudWatch

AnswersB, D

Security Hub aggregates findings from Config and other services.

Why this answer

AWS Security Hub (B) is correct because it aggregates security findings from multiple AWS services, including Amazon GuardDuty and AWS Config, and can detect publicly accessible S3 buckets via its built-in security standards (e.g., CIS AWS Foundations Benchmark). AWS Config (D) is correct because it can evaluate S3 bucket configurations against rules, such as the managed rule 's3-bucket-public-read-prohibited' or 's3-bucket-public-write-prohibited', and trigger alerts when a bucket becomes publicly accessible.

Exam trap

The trap here is that candidates often choose AWS Trusted Advisor (C) because it has a 'S3 Bucket Permissions' check, but they overlook that it does not provide real-time alerts or continuous monitoring, unlike AWS Config which can trigger immediate notifications via Amazon SNS.
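The two managed rules named above can be enabled with very little configuration. The sketch below builds the `ConfigRule` argument of the AWS Config PutConfigRule API for each; the helper name and the choice to reuse the managed rule name as the rule name are assumptions for illustration.

```python
# Managed rule names from the explanation above, mapped to their source identifiers.
MANAGED_RULES = {
    "s3-bucket-public-read-prohibited": "S3_BUCKET_PUBLIC_READ_PROHIBITED",
    "s3-bucket-public-write-prohibited": "S3_BUCKET_PUBLIC_WRITE_PROHIBITED",
}

def managed_rule(rule_name):
    """Build a ConfigRule definition for an AWS managed rule, scoped to S3 buckets."""
    return {
        "ConfigRuleName": rule_name,
        "Source": {"Owner": "AWS", "SourceIdentifier": MANAGED_RULES[rule_name]},
        "Scope": {"ComplianceResourceTypes": ["AWS::S3::Bucket"]},
    }

rules = [managed_rule(name) for name in MANAGED_RULES]
```

Alerting is a separate step in this sketch: a compliance change would typically be routed to a notification target through an event rule.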

Practice this question →

59

MCQmedium

A company has an application that writes logs to CloudWatch Logs. The SysOps administrator needs to search for a specific error pattern across multiple log groups. Which solution is the most efficient?

A.Create a CloudWatch dashboard to visualize log data.

B.Use CloudWatch Logs Insights to query the log groups.

C.Create a metric filter to count the error pattern.

D.Create a subscription filter to stream logs to Amazon ES.

AnswerB

Logs Insights is designed for interactive log analysis.

Why this answer

CloudWatch Logs Insights is purpose-built for interactive ad-hoc querying of log data across multiple log groups, enabling efficient pattern matching and filtering without requiring pre-configured infrastructure. It uses a dedicated query language optimized for searching, aggregating, and analyzing log events, making it the most efficient solution for searching a specific error pattern across multiple log groups.

Exam trap

The trap here is that candidates often confuse metric filters (which only count occurrences) with the ability to search and retrieve actual log events, leading them to choose Option C instead of the correct query-based solution.

How to eliminate wrong answers

Option A is wrong because CloudWatch dashboards are designed for visualizing metrics and log data in pre-defined widgets, not for performing ad-hoc searches or queries across multiple log groups. Option C is wrong because metric filters only count occurrences of a pattern and emit a metric, but they do not allow you to search or retrieve the actual log events containing the error pattern. Option D is wrong because subscription filters stream logs to Amazon ES (now OpenSearch Service) for long-term analysis and visualization, which adds latency, cost, and operational overhead compared to directly querying the log groups with Logs Insights.
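As an illustration of Option B, the sketch below builds a Logs Insights query and the arguments one might pass to the CloudWatch Logs StartQuery API. The helper name, log group names, and time range are hypothetical, and the pattern is interpolated as a regular expression, so the caller is assumed to supply a safe, already-escaped pattern.

```python
def error_query_request(pattern, log_groups, start_epoch, end_epoch, limit=50):
    """Build StartQuery arguments that search several log groups for a pattern."""
    query = (
        "fields @timestamp, @log, @message"
        f" | filter @message like /{pattern}/"
        " | sort @timestamp desc"
        f" | limit {limit}"
    )
    return {
        "logGroupNames": list(log_groups),
        "startTime": start_epoch,  # seconds since the Unix epoch
        "endTime": end_epoch,
        "queryString": query,
    }

request = error_query_request(
    "ConnectionTimeout",  # hypothetical error pattern
    ["/app/web", "/app/worker"],  # hypothetical log group names
    1700000000,
    1700003600,
)
```

Including `@log` in the selected fields shows which log group each matching event came from, which is useful when one query spans several groups.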

Practice this question →

60

MCQhard

An IAM policy is attached to an EC2 instance role to allow sending logs to CloudWatch Logs. The application running on the instance fails to send logs to the log group 'MyAppLogGroup'. Which change is required to fix the issue?

A.Install the CloudWatch agent on the instance.

B.Attach the policy to the EC2 instance instead of the instance role.

C.Add a new statement allowing logs:PutLogEvents on 'arn:aws:logs:us-east-1:123456789012:log-group:MyAppLogGroup:log-stream:*'.

D.Change the log group ARN in the policy to include the log stream name.

AnswerC

PutLogEvents requires permissions on the log stream resource, not just the log group.

Why this answer

The IAM policy attached to the EC2 instance role is missing the `logs:PutLogEvents` permission for the specific log stream within the log group. Even if the policy allows `logs:CreateLogStream` and `logs:DescribeLogGroups`, the application cannot send log events without `logs:PutLogEvents` on the log stream resource. Option C adds the required statement with the correct ARN pattern to resolve the failure.

Exam trap

The trap here is that candidates assume the CloudWatch agent is required for any log delivery, or that attaching a policy directly to the instance is possible, when the real issue is a missing `PutLogEvents` permission on the log stream resource.

How to eliminate wrong answers

Option A is wrong because the CloudWatch agent is not required for sending logs via the AWS SDK or CLI; the application can use the `PutLogEvents` API directly, and the issue is a permissions problem, not a missing agent. Option B is wrong because IAM policies cannot be attached directly to an EC2 instance; they must be attached to an IAM role that is then associated with the instance profile. Option D is wrong because the log group ARN in the policy does not need to include the log stream name; the policy can use a wildcard for the log stream (e.g., `log-stream:*`) to allow `PutLogEvents` on any stream within the group.
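The statement described in Option C can be sketched as a policy document. The helper name is hypothetical; the region, account ID, and log group name are the ones used in the option text.

```python
def put_log_events_policy(region, account_id, log_group):
    """Build an IAM policy allowing PutLogEvents on every stream in one log group."""
    resource = (
        f"arn:aws:logs:{region}:{account_id}"
        f":log-group:{log_group}:log-stream:*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["logs:PutLogEvents"],
                # The wildcard covers any log stream within the group.
                "Resource": resource,
            }
        ],
    }

policy = put_log_events_policy("us-east-1", "123456789012", "MyAppLogGroup")
```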

Practice this question →

61

Multi-Selecthard

A SysOps administrator is troubleshooting a slow-running Amazon RDS for PostgreSQL instance. The administrator suspects that a specific query is causing high I/O. Which tools should be used together to identify the query and its I/O impact? (Choose THREE.)

Select 3 answers

A.Amazon RDS Performance Insights

B.Amazon CloudWatch Logs

C.AWS CloudTrail

D.Amazon RDS Enhanced Monitoring

E.RDS Performance Insights with database load and wait events

AnswersA, D, E

Shows top SQL queries and their load.

Why this answer

Amazon RDS Performance Insights (A) provides a database performance dashboard that visualizes database load and identifies the specific queries causing high load, including I/O wait events. By correlating this with Amazon RDS Enhanced Monitoring (D), which offers OS-level metrics like read/write IOPS and queue depth, you can pinpoint the exact query and its I/O impact. The combination of RDS Performance Insights with database load and wait events (E) directly surfaces the query text and the wait types (e.g., IO:DataFileRead) to confirm high I/O.

Exam trap

The trap here is that candidates might select CloudWatch Logs or CloudTrail thinking they can capture query logs or API calls to diagnose performance, but neither provides the real-time database load breakdown or OS-level I/O metrics needed to identify the specific query and its I/O impact.

Practice this question →

62

MCQhard

A SysOps administrator is responsible for a multi-tier web application running on AWS. The application consists of an Application Load Balancer (ALB), an Auto Scaling group of EC2 instances, and an Amazon RDS for MySQL database. Recently, the operations team has been receiving alerts from CloudWatch that the ALB's 'HTTPCode_Target_5XX_Count' metric is spiking periodically. The team has also noticed that the database CPU utilization is high during these spikes. The application logs show that some requests are timing out. The administrator needs to identify the root cause and implement a remediation. After reviewing the architecture, the administrator rules out the database as the bottleneck because the database connections are pooled and the query response times are normal. The administrator suspects that the issue is related to the application server's health. Which course of action should the administrator take to diagnose and resolve the issue?

A.Increase the health check timeout and threshold to allow for transient high CPU usage.

B.Add an Amazon ElastiCache cluster to cache database queries.

C.Decrease the health check interval to detect unhealthy instances faster.

D.Increase the idle timeout on the ALB to keep connections open longer.

AnswerA

This reduces false positive health check failures during short-lived CPU spikes, preventing unnecessary 5xx errors.

Why this answer

Option A is correct because the periodic spikes in ALB 5xx errors, high database CPU, and application timeouts, combined with normal query response times and pooled connections, strongly suggest that the application servers are becoming overwhelmed and failing health checks. By increasing the health check timeout and threshold, the ALB will allow the EC2 instances more time to recover from transient CPU or memory pressure before being marked unhealthy and removed from service, which prevents unnecessary instance replacement and reduces the cascading load on remaining instances and the database.

Exam trap

The trap here is that candidates often assume high database CPU means the database is the bottleneck, but the question explicitly states query response times are normal and connections are pooled, so the real issue is application server health causing retries and cascading load, which is remediated by tuning health check sensitivity rather than adding caching or changing timeouts.

How to eliminate wrong answers

Option B is wrong because adding an ElastiCache cluster addresses database read performance, but the database is not the bottleneck (query response times are normal and connections are pooled), so caching would not resolve the application server health issue causing the 5xx errors. Option C is wrong because decreasing the health check interval would cause the ALB to check instances more frequently, potentially marking them unhealthy even faster during transient spikes, worsening the problem by cycling instances out of service more aggressively. Option D is wrong because increasing the idle timeout on the ALB keeps connections open longer, which does not address the root cause of application server health failures; it would only delay connection closure, potentially masking the issue and increasing resource consumption on already stressed instances.
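One rough way to reason about Option A versus Option C: a target is marked unhealthy after a number of consecutive failed checks, so the tolerance window is approximately the check interval multiplied by the unhealthy threshold. The helper below is a back-of-the-envelope sketch with illustrative values, not recommended settings, and it ignores the time each check takes to time out.

```python
def seconds_until_unhealthy(interval_s, unhealthy_threshold):
    """Approximate time a target can keep failing checks before it is marked unhealthy."""
    return interval_s * unhealthy_threshold

# Illustrative settings only.
baseline = seconds_until_unhealthy(interval_s=30, unhealthy_threshold=2)
higher_threshold = seconds_until_unhealthy(interval_s=30, unhealthy_threshold=5)
shorter_interval = seconds_until_unhealthy(interval_s=10, unhealthy_threshold=2)
```

With these numbers, raising the threshold widens the window from 60 to 150 seconds, while shortening the interval narrows it to 20 seconds, which is why Option C makes transient spikes more likely to take instances out of service.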

Practice this question →

63

MCQeasy

A company has an Auto Scaling group of EC2 instances that are critical for production. The SysOps administrator needs to be notified immediately when any instance in the group enters the 'InService' state after a scale-out event. Which approach should be used?

A.Create a CloudWatch alarm on the Auto Scaling group metric 'GroupTotalInstances' with a threshold of 1.

B.Use CloudTrail to monitor the 'CompleteLifecycleAction' API call and send notifications via CloudWatch Events.

C.Configure an Auto Scaling lifecycle hook that sends a notification to an SNS topic when the instance transitions to the 'InService' state.

D.Configure an Amazon SQS queue and have the EC2 instances send a message when they become 'InService'.

AnswerC

Lifecycle hooks provide a direct mechanism to notify when an instance reaches a specific state.

Why this answer

Option C is correct because an Auto Scaling lifecycle hook on the instance launch transition can publish a notification to an SNS topic as part of the scale-out. The hook pauses the new instance in the 'Pending:Wait' state and sends the notification; once the lifecycle action completes or the heartbeat timeout expires, the instance moves to 'InService'. This gives the SysOps administrator a per-instance notification tied directly to the scale-out event.

Exam trap

The trap here is that candidates may confuse lifecycle hooks with CloudWatch alarms or CloudTrail, thinking that monitoring API calls or aggregate metrics can provide per-instance state notifications, but only lifecycle hooks offer a native, event-driven mechanism tied directly to the instance state transition.

How to eliminate wrong answers

Option A is wrong because the 'GroupTotalInstances' metric tracks the total number of instances in the Auto Scaling group, not the state of individual instances; a threshold of 1 would trigger on any change in instance count, not specifically when an instance enters 'InService'. Option B is wrong because 'CompleteLifecycleAction' is an API call made by the administrator or automation to signal the completion of a lifecycle action, not a notification that an instance has entered 'InService'; CloudTrail logs API calls but does not directly send notifications for state transitions. Option D is wrong because having EC2 instances send a message to an SQS queue when they become 'InService' requires custom scripting and does not provide a native, reliable mechanism tied to the Auto Scaling lifecycle; it also introduces latency and potential failure points.
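A minimal sketch of the hook in Option C, written as arguments for the Auto Scaling PutLifecycleHook API. The helper name, hook name, group name, ARNs, and the short heartbeat timeout are hypothetical choices for illustration.

```python
def launch_hook_request(group_name, topic_arn, role_arn, heartbeat_s=60):
    """Build PutLifecycleHook arguments that notify an SNS topic on instance launch."""
    return {
        "LifecycleHookName": "notify-on-launch",  # hypothetical hook name
        "AutoScalingGroupName": group_name,
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
        "NotificationTargetARN": topic_arn,
        # Role that allows Auto Scaling to publish to the topic.
        "RoleARN": role_arn,
        # How long the instance waits before the default result applies.
        "HeartbeatTimeout": heartbeat_s,
        # CONTINUE lets the instance proceed to InService if nobody responds.
        "DefaultResult": "CONTINUE",
    }

hook = launch_hook_request(
    "prod-web-asg",  # hypothetical group name
    "arn:aws:sns:us-east-1:123456789012:asg-events",  # hypothetical topic
    "arn:aws:iam::123456789012:role/asg-notify",  # hypothetical role
)
```

A short heartbeat with a `CONTINUE` default keeps the delay before the instance enters service small when the hook is used only for notification.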

Practice this question →

64

MCQeasy

A SysOps administrator needs to monitor the memory utilization of an EC2 instance running Windows Server. Which steps are required to collect memory metrics?

A.Install and configure the Amazon CloudWatch agent on the instance.

B.Install the AWS Systems Manager Agent (SSM Agent) and use Run Command.

C.Enable detailed monitoring on the EC2 instance.

D.Use the AWS Management Console to enable memory monitoring.

AnswerA

The CloudWatch agent can collect memory and disk metrics.

Why this answer

Amazon CloudWatch does not collect memory metrics from EC2 instances by default; it only captures hypervisor-level metrics such as CPU, network, and disk I/O. To monitor in-guest memory utilization on a Windows Server instance, you must install and configure the Amazon CloudWatch agent, which sends custom metrics (e.g., Memory % Committed Bytes In Use) to CloudWatch. The agent uses the Windows Performance Monitor (PerfMon) counters to gather this data.

Exam trap

The trap here is that candidates assume memory metrics are automatically available or can be enabled via a simple console toggle, when in fact they require the CloudWatch agent to be installed and configured on the instance.

How to eliminate wrong answers

Option B is wrong because the AWS Systems Manager Agent (SSM Agent) and Run Command are used for management tasks like patching or executing scripts, not for collecting and publishing memory metrics to CloudWatch. Option C is wrong because enabling detailed monitoring only increases the frequency of existing hypervisor-level metrics (e.g., CPU, network) from 5 minutes to 1 minute; it does not add in-guest memory metrics. Option D is wrong because the AWS Management Console does not have a built-in toggle to enable memory monitoring; memory metrics require the CloudWatch agent to be installed and configured on the instance.

Practice this question →

65

MCQmedium

A SysOps administrator needs to monitor memory utilization of an Amazon EC2 instance. The default Amazon CloudWatch metrics for EC2 do not include memory utilization. Which solution should the administrator implement to collect memory metrics and set alarms?

A.Install the CloudWatch agent on the instance and configure it to collect memory metrics

B.Enable detailed monitoring on the EC2 instance

C.Use AWS Systems Manager Patch Manager to report memory usage

D.Use AWS CloudTrail to log memory events

AnswerA

The CloudWatch agent can capture OS-level metrics (memory, disk, etc.) and send them to CloudWatch, allowing alarms to be created.

Why this answer

The CloudWatch agent is specifically designed to collect custom metrics, such as memory utilization, from EC2 instances and on-premises servers. Unlike the default EC2 metrics, which only capture hypervisor-level metrics (e.g., CPU, disk I/O, network), memory utilization requires OS-level access. The CloudWatch agent uses the `mem` plugin to gather memory data and can publish it to CloudWatch as custom metrics, enabling alarm configuration.

Exam trap

The trap here is that candidates often confuse 'detailed monitoring' with the ability to collect additional metrics, but detailed monitoring only increases the resolution of existing metrics, not the scope of what is collected.

How to eliminate wrong answers

Option B is wrong because enabling detailed monitoring increases the frequency of existing EC2 metrics (e.g., CPU, disk I/O) from 5 minutes to 1 minute, but it does not add new metrics like memory utilization. Option C is wrong because AWS Systems Manager Patch Manager is used for patching and compliance, not for collecting or reporting memory usage metrics. Option D is wrong because AWS CloudTrail logs API calls and management events, not OS-level performance data like memory utilization.
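For Option A, the agent is driven by a JSON configuration file. The sketch below generates a minimal metrics section for a Linux instance using the `mem` plugin mentioned above; the helper name and the 60-second interval are assumptions, and a Windows instance would use performance counter sections instead.

```python
import json

def agent_memory_config(interval_s=60):
    """Build a minimal CloudWatch agent config that collects memory usage on Linux."""
    return {
        "metrics": {
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_s,
                }
            }
        }
    }

config_json = json.dumps(agent_memory_config(), indent=2)
```

The resulting custom metric can then be used in a CloudWatch alarm in the same way as a built-in EC2 metric.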

Practice this question →

66

MCQhard

A company is running a critical application on Amazon EC2 instances. The application performance has degraded over the past week. The SysOps administrator suspects a memory leak. The administrator needs to collect detailed memory usage metrics every minute and store them for 30 days. Which solution is the MOST cost-effective and operationally efficient?

A.Use AWS CloudTrail to log memory usage and store the logs in Amazon S3.

B.Enable default EC2 monitoring to collect memory metrics every 5 minutes.

C.Enable detailed monitoring on the EC2 instances to collect memory metrics every 1 minute.

D.Install the CloudWatch agent on the EC2 instances to collect memory metrics and publish them to CloudWatch.

AnswerD

The CloudWatch agent can collect memory metrics and publish them as custom metrics with 1-minute resolution.

Why this answer

Option D is correct because the CloudWatch agent is designed to collect in-guest metrics such as memory utilization, which neither default nor detailed EC2 monitoring provides. Configured with a 60-second collection interval, the agent publishes memory metrics to CloudWatch at 1-minute granularity. CloudWatch retains 1-minute data points for 15 days, then aggregates them to 5-minute resolution and keeps them for 63 days, so the data remains available for the required 30 days at no extra storage effort (at 5-minute granularity after day 15). Metric retention periods are fixed by the service and cannot be changed per metric.
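The retention behavior can be made concrete with a small lookup. This is a minimal sketch, assuming the standard CloudWatch schedule for standard-resolution metrics (1-minute data for 15 days, 5-minute data for 63 days, 1-hour data for 455 days); the function name `finest_period_seconds` is hypothetical and not part of any AWS SDK.

```python
from typing import Optional

# Standard CloudWatch retention tiers for standard-resolution metrics:
# (maximum age in days, finest period in seconds still available).
# Assumption: these mirror the documented schedule, which is not configurable.
RETENTION_TIERS = [
    (15, 60),      # 1-minute data points are kept for 15 days
    (63, 300),     # 5-minute data points are kept for 63 days
    (455, 3600),   # 1-hour data points are kept for 455 days (15 months)
]


def finest_period_seconds(age_days: float) -> Optional[int]:
    """Return the finest period still retained for a data point of this age."""
    for max_age_days, period in RETENTION_TIERS:
        if age_days <= max_age_days:
            return period
    return None  # older than 455 days: the data point has expired


for age in (10, 30, 100, 500):
    print(age, finest_period_seconds(age))
```

Under these assumptions, a 30-day-old data point is still queryable, but only as a 5-minute aggregate.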

Exam trap

The trap here is that candidates often confuse detailed EC2 monitoring (which only covers hypervisor-level metrics) with in-guest monitoring, assuming that enabling detailed monitoring will capture memory usage, but memory is a guest OS metric that requires the CloudWatch agent.

How to eliminate wrong answers

Option A is wrong because AWS CloudTrail logs API activity, not system-level metrics like memory usage; it cannot capture memory utilization data from EC2 instances. Option B is wrong because default EC2 monitoring collects metrics (e.g., CPU, network) every 5 minutes, not memory metrics, and the 5-minute interval does not meet the 1-minute requirement. Option C is wrong because detailed EC2 monitoring collects metrics every 1 minute but only includes hypervisor-level metrics (e.g., CPU, disk I/O, network); memory metrics are not provided by EC2 monitoring and require an in-guest agent like the CloudWatch agent.
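For illustration, a CloudWatch agent configuration that collects memory metrics every 60 seconds might look like the sketch below. It is built as a Python dictionary and serialized to JSON; the namespace `Example/App` is a made-up placeholder, and the measurement name assumes a Linux instance.

```python
import json

# Sketch of a CloudWatch agent configuration (Linux).
# "Example/App" is a hypothetical namespace; the agent uses "CWAgent" if omitted.
agent_config = {
    "agent": {"metrics_collection_interval": 60},  # default interval in seconds
    "metrics": {
        "namespace": "Example/App",
        # Tag every metric with the instance ID resolved by the agent.
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "metrics_collected": {
            "mem": {
                "measurement": ["mem_used_percent"],
                "metrics_collection_interval": 60,  # 1-minute granularity
            }
        },
    },
}

config_json = json.dumps(agent_config, indent=2)
print(config_json)
```

The resulting JSON would typically be saved as the agent's configuration file or stored in Systems Manager Parameter Store before starting the agent.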

67

MCQeasy

A SysOps administrator needs to receive an email notification when an IAM user's console login fails. Which AWS service should be used to set up this notification?

A.Amazon CloudWatch

B.AWS CloudTrail

C.Amazon Simple Notification Service (SNS)

D.AWS Config

AnswerA

CloudWatch can create a metric filter on a CloudTrail log group to count failed login events, set an alarm on that metric, and publish to SNS for email notification. CloudWatch is the core monitoring and alerting service.

Why this answer

Amazon CloudWatch can monitor AWS CloudTrail log events for IAM console login failures by creating a metric filter on the CloudTrail log group for the `ConsoleLogin` event with a `failure` status. When the filter matches, CloudWatch can trigger an alarm that sends an email notification via Amazon SNS. This is the correct service because CloudWatch is designed for monitoring and alerting on operational metrics and log patterns.

Exam trap

The trap here is that candidates often pick Amazon SNS directly, forgetting that SNS is a notification channel, not a monitoring service, and requires CloudWatch to detect the failure event first.

How to eliminate wrong answers

Option B (AWS CloudTrail) is wrong because CloudTrail only records API activity and audit logs; it cannot directly send email notifications or trigger alerts without CloudWatch. Option C (Amazon SNS) is wrong because SNS is a pub/sub messaging service that delivers notifications, but it cannot monitor or detect login failures on its own; it requires a trigger from another service like CloudWatch. Option D (AWS Config) is wrong because Config evaluates resource configurations and compliance rules, not real-time login events or authentication failures.
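To make the detection step concrete, the sketch below shows a metric filter pattern commonly used for failed console sign-ins, together with a local Python stand-in that applies the same test to sample records. The sample events are trimmed and made up for illustration, and `is_failed_console_login` is a hypothetical helper, not an AWS API.

```python
# CloudWatch Logs metric filter pattern (JSON syntax) for failed console sign-ins.
FILTER_PATTERN = (
    '{ ($.eventName = "ConsoleLogin") && '
    '($.errorMessage = "Failed authentication") }'
)


def is_failed_console_login(event: dict) -> bool:
    """Local stand-in for the filter: True for a failed console sign-in record."""
    return (
        event.get("eventName") == "ConsoleLogin"
        and event.get("errorMessage") == "Failed authentication"
    )


# Trimmed, made-up CloudTrail records for illustration only.
sample_events = [
    {"eventName": "ConsoleLogin", "errorMessage": "Failed authentication",
     "responseElements": {"ConsoleLogin": "Failure"}},
    {"eventName": "ConsoleLogin", "responseElements": {"ConsoleLogin": "Success"}},
    {"eventName": "DescribeInstances"},
]

failed_count = sum(is_failed_console_login(e) for e in sample_events)
print(failed_count)  # 1
```

In a real setup the pattern would be attached to the CloudTrail log group as a metric filter, and a CloudWatch alarm on the resulting metric would publish to an SNS topic with an email subscription.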

68

MCQeasy

A SysOps administrator needs to audit all API calls made in an AWS account for compliance and security analysis. The logs must be stored securely for at least one year. Which AWS service should the administrator enable?

A.AWS CloudTrail

B.Amazon CloudWatch Logs

C.AWS Config

D.Amazon GuardDuty

AnswerA

CloudTrail records API activity and delivers log files to S3, enabling auditing and compliance needs.

Why this answer

AWS CloudTrail is the correct service because it records all API calls made in an AWS account, including the identity of the caller, the time of the call, the source IP address, and the request parameters. This audit log is essential for compliance and security analysis, and CloudTrail can be configured to store logs in an S3 bucket with lifecycle policies to retain them for at least one year.

Exam trap

The trap here is that candidates often confuse CloudTrail with CloudWatch Logs, thinking CloudWatch Logs can capture API calls, but CloudWatch Logs only stores logs from services that explicitly send them, not the full audit trail of API activity.

How to eliminate wrong answers

Option B (Amazon CloudWatch Logs) is wrong because CloudWatch Logs is designed for monitoring, storing, and accessing log files from AWS resources (e.g., EC2, Lambda), not for auditing API calls; it does not capture AWS API activity by default. Option C (AWS Config) is wrong because AWS Config evaluates resource configurations against desired policies and records configuration changes, but it does not log API calls or provide a record of who made changes. Option D (Amazon GuardDuty) is wrong because GuardDuty is a threat detection service that analyzes CloudTrail logs, VPC flow logs, and DNS logs for malicious activity, but it does not itself generate or store API call logs for auditing.
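The one-year retention requirement is usually met with an S3 lifecycle rule on the bucket that receives the trail's log files. A minimal sketch of such a rule follows, expressed as the dictionary shape accepted by the S3 `PutBucketLifecycleConfiguration` API; the rule ID, the 90-day transition, and the 30-day safety margin are illustrative assumptions.

```python
RETENTION_DAYS = 365  # "at least one year" from the question

# Lifecycle configuration for the bucket that stores CloudTrail log files.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "retain-cloudtrail-logs",    # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "AWSLogs/"},  # default CloudTrail key prefix
            # Move older logs to cheaper storage (illustrative choice).
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            # Expire only after the minimum retention period has passed.
            "Expiration": {"Days": RETENTION_DAYS + 30},
        }
    ]
}

rule = lifecycle_configuration["Rules"][0]
print(rule["Expiration"]["Days"])
```

Expiring objects slightly later than the minimum keeps the configuration on the safe side of the compliance requirement.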

69

MCQhard

A company has an S3 bucket that stores sensitive data. A SysOps administrator needs to detect when objects in the bucket are publicly accessible. Which AWS service should the administrator use to continuously monitor and report on public access?

A.AWS Config

B.S3 server access logs

C.Amazon GuardDuty

D.AWS Trusted Advisor

AnswerA

AWS Config can continuously evaluate S3 bucket policies against rules for public access.

Why this answer

AWS Config provides two managed rules, `s3-bucket-public-read-prohibited` and `s3-bucket-public-write-prohibited`, that continuously evaluate S3 bucket policies and ACLs against the desired configuration. When a bucket becomes publicly accessible, AWS Config flags it as noncompliant and can trigger automated remediation or notifications. This makes it the correct service for ongoing monitoring and reporting of public access to sensitive data.

Exam trap

The trap here is that candidates often confuse 'detecting public access' with 'auditing access logs' (S3 server access logs) or 'threat detection' (GuardDuty), but the question specifically asks for continuous monitoring and reporting of the bucket's configuration state, which is exactly what AWS Config's managed rules provide.

How to eliminate wrong answers

Option B is wrong because S3 server access logs record detailed requests made to the bucket (e.g., requester, action, response status), but they do not evaluate or report on the bucket's public access configuration; they are used for auditing who accessed objects, not for detecting whether the bucket itself is publicly accessible. Option C is wrong because Amazon GuardDuty is a threat detection service that analyzes VPC Flow Logs, DNS logs, and CloudTrail events for malicious activity, not S3 bucket policies or ACLs; it does not monitor public access settings on S3 buckets. Option D is wrong because AWS Trusted Advisor provides one-time or periodic checks (e.g., S3 Bucket Permissions check) but does not offer continuous, real-time monitoring or compliance reporting; it is a best-practice advisory tool, not a configuration monitoring service.
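As a rough illustration, the two managed rules could be described with the request shape used by the AWS Config `PutConfigRule` API. The sketch below only builds the dictionaries; the helper `managed_s3_rule` is hypothetical, and nothing here calls AWS.

```python
def managed_s3_rule(name: str, source_identifier: str) -> dict:
    """Build a ConfigRule definition for an AWS-managed rule scoped to S3 buckets."""
    return {
        "ConfigRuleName": name,
        "Scope": {"ComplianceResourceTypes": ["AWS::S3::Bucket"]},
        "Source": {"Owner": "AWS", "SourceIdentifier": source_identifier},
    }


config_rules = [
    managed_s3_rule("s3-bucket-public-read-prohibited",
                    "S3_BUCKET_PUBLIC_READ_PROHIBITED"),
    managed_s3_rule("s3-bucket-public-write-prohibited",
                    "S3_BUCKET_PUBLIC_WRITE_PROHIBITED"),
]

for rule in config_rules:
    print(rule["ConfigRuleName"], rule["Source"]["SourceIdentifier"])
```

Each dictionary would be passed as the `ConfigRule` argument when the rule is created, after which Config re-evaluates buckets whenever their configuration changes.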

70

MCQmedium

A company uses AWS CloudTrail to log API calls across all regions. The SysOps administrator notices that logs for a specific region are missing from the centralized S3 bucket. What is the most likely cause?

A.The CloudTrail trail is not enabled for that region.

B.The S3 bucket policy denies write access from CloudTrail for that region.

C.CloudTrail log file validation is disabled.

D.The IAM role for CloudTrail does not have permissions to write logs from that region.

AnswerA

Correct. CloudTrail must be explicitly enabled for each region or a multi-region trail must be used. Missing logs for a specific region strongly suggests the trail is not applied there.

Why this answer

CloudTrail trails can be configured to log API calls from specific regions or all regions. If logs for a particular region are missing from the centralized S3 bucket, the most likely cause is that the trail was not enabled for that region during trail creation or update. By default, a trail applied to all regions will automatically log activity from every region, but if the trail is configured for a single region or a subset, other regions will not have their logs delivered.

Exam trap

The trap here is that candidates often assume missing logs are due to a permissions or policy issue (options B or D), when in fact the most common root cause is a simple configuration oversight where the trail is not set to log from all regions or the specific region was not included.

How to eliminate wrong answers

Option B is wrong because a bucket policy that blocked CloudTrail would normally affect delivery from every region, and the failure would surface in the trail's delivery status rather than as silently missing logs for one region. Option C is wrong because log file validation is a security feature that adds a digest file for integrity checks; disabling it does not prevent logs from being delivered to the S3 bucket. Option D is wrong because IAM roles are global rather than region-specific; if permissions were missing, delivery would fail for all regions, not just one.
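A quick way to reason about this is to check which trails cover a given region. The sketch below works on trail descriptions shaped like the `trailList` entries returned by the CloudTrail `DescribeTrails` API (`Name`, `HomeRegion`, `IsMultiRegionTrail`); the helper name and the sample trail are made up.

```python
from typing import Dict, List


def trails_covering(region: str, trails: List[Dict]) -> List[str]:
    """Return the names of trails that log events from the given region."""
    return [
        t["Name"]
        for t in trails
        if t.get("IsMultiRegionTrail") or t.get("HomeRegion") == region
    ]


# Made-up trail description: a single-region trail created in us-east-1.
sample_trails = [
    {"Name": "central-trail", "HomeRegion": "us-east-1", "IsMultiRegionTrail": False},
]

print(trails_covering("us-east-1", sample_trails))  # ['central-trail']
print(trails_covering("eu-west-1", sample_trails))  # []
```

An empty result for a region matches the symptom in the question: no trail applies there, so no log files are delivered for it.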

71

Multi-Selecthard

A company is using Amazon CloudWatch Logs to collect logs from multiple AWS services. The SysOps administrator needs to query logs across multiple log groups in real-time. Which THREE of the following are capabilities of CloudWatch Logs Insights?

Select 3 answers

A.Export query results directly to an S3 bucket.

B.Schedule queries to run at a specific time.

C.Run queries in real-time against incoming log data.

D.Visualize query results with bar charts and line graphs.

E.Query multiple log groups in a single query.

AnswersC, D, E

Insights queries are near real-time on log data ingested.

Why this answer

CloudWatch Logs Insights runs queries against log data shortly after it is ingested, so results are near real-time and suitable for live troubleshooting (option C). Queries that use `stats` aggregations can be visualized as line or bar charts in the console (option D), and a single query can span multiple log groups (option E).

Exam trap

The trap here is that candidates may confuse CloudWatch Logs Insights' near-real-time querying with scheduling or export capabilities, which are handled by other mechanisms such as EventBridge schedules or CloudWatch Logs export tasks to S3, not by Insights itself.
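For illustration, a Logs Insights query that spans two log groups and produces chartable output might be prepared as below. The dictionary follows the parameter names of the CloudWatch Logs `StartQuery` API; the log group names are hypothetical and the sketch does not call AWS.

```python
import time

# Count ERROR lines in 5-minute buckets; the time-bucketed stats output
# is the kind of result the console can render as a line or bar chart.
QUERY = """
fields @timestamp, @logStream, @message
| filter @message like /ERROR/
| stats count(*) as error_count by bin(5m)
"""

now = int(time.time())
start_query_params = {
    "logGroupNames": ["/example/app", "/example/api"],  # hypothetical log groups
    "startTime": now - 3600,   # last hour, in epoch seconds
    "endTime": now,
    "queryString": QUERY.strip(),
}

print(start_query_params["queryString"])
```

Listing both log groups in one request is what lets a single query correlate errors across services.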

72

Multi-Selecteasy

A SysOps administrator wants to be alerted when the root user of the AWS account signs in. Which TWO services can be used together to achieve this?

Select 2 answers

A.AWS Lambda

B.AWS CloudTrail

C.AWS Config

D.Amazon CloudWatch Events (Amazon EventBridge)

E.AWS Trusted Advisor

AnswersB, D

CloudTrail logs ConsoleLogin events for the root user.

Why this answer

Option B is correct because AWS CloudTrail records root user sign-ins as `ConsoleLogin` events whose `userIdentity.type` is `Root`. Option D is correct because Amazon CloudWatch Events (now part of Amazon EventBridge) can be configured with a rule that matches these CloudTrail events and triggers a notification action, such as sending an SNS alert, when the root user signs in.

Exam trap

The trap here is that candidates often pick AWS Config or Trusted Advisor because they associate them with security monitoring, but neither service captures API call events like root sign-ins, which require CloudTrail and EventBridge for event-driven alerting.
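The rule that ties the two services together is an EventBridge event pattern. A sketch of the pattern commonly used for root sign-ins follows, along with a deliberately simplified local matcher (`matches` is a hypothetical helper that handles only exact-value lists and nested objects, not the full EventBridge pattern language). The sample events are trimmed and made up.

```python
# Event pattern for console sign-ins by the root user.
ROOT_SIGN_IN_PATTERN = {
    "detail-type": ["AWS Console Sign In via CloudTrail"],
    "detail": {"userIdentity": {"type": ["Root"]}},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: supports exact-value lists and nested objects only."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


root_event = {
    "detail-type": "AWS Console Sign In via CloudTrail",
    "detail": {"eventName": "ConsoleLogin", "userIdentity": {"type": "Root"}},
}
iam_event = {
    "detail-type": "AWS Console Sign In via CloudTrail",
    "detail": {"eventName": "ConsoleLogin", "userIdentity": {"type": "IAMUser"}},
}

print(matches(ROOT_SIGN_IN_PATTERN, root_event))  # True
print(matches(ROOT_SIGN_IN_PATTERN, iam_event))   # False
```

With the pattern attached to a rule, an SNS topic as the rule target delivers the alert.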

73

Multi-Selectmedium

A SysOps administrator needs to set up monitoring for an application that runs on an EC2 instance. The application generates custom metrics that should be available for analysis in CloudWatch. Which steps are required to achieve this? (Select TWO.)

Select 2 answers

A.Attach an IAM role to the EC2 instance with permissions to call PutMetricData.

B.Create an SNS topic and subscribe the application to send metrics.

C.Install the CloudWatch Logs agent to send custom metrics.

D.Use the CloudWatch agent or AWS CLI to publish custom metrics using the put-metric-data command.

E.Enable detailed monitoring on the EC2 instance to collect custom metrics.

AnswersA, D

The instance needs IAM permissions to publish custom metrics.

Why this answer

Option A is correct because the EC2 instance must have an IAM role attached with permissions to call PutMetricData, which authorizes the instance to publish custom metrics to CloudWatch. Without this IAM role, any attempt to send metrics from the instance will fail due to missing credentials.

Exam trap

The trap here is confusing the CloudWatch Logs agent with the CloudWatch agent, as the Logs agent cannot send custom metrics, and assuming detailed monitoring automatically captures application-level metrics rather than just increasing the frequency of default EC2 metrics.
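The two required pieces can be sketched side by side: an IAM policy that allows `cloudwatch:PutMetricData`, and a request body in the shape the `PutMetricData` API expects. The namespace, metric name, and instance ID below are placeholders, and the namespace condition is an optional tightening rather than a requirement.

```python
NAMESPACE = "Example/App"  # hypothetical custom namespace

# IAM policy for the instance role (step A).
publish_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "cloudwatch:PutMetricData",
            "Resource": "*",  # PutMetricData has no resource-level permissions
            # Optional: restrict publishing to one namespace.
            "Condition": {"StringEquals": {"cloudwatch:namespace": NAMESPACE}},
        }
    ],
}

# Request parameters for publishing one data point (step D).
put_metric_data_params = {
    "Namespace": NAMESPACE,
    "MetricData": [
        {
            "MetricName": "QueueDepth",  # placeholder application metric
            "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
            "Value": 42.0,
            "Unit": "Count",
        }
    ],
}

print(publish_policy["Statement"][0]["Action"])
```

The same values map onto `aws cloudwatch put-metric-data` options when publishing from the AWS CLI instead of an SDK.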

74

MCQmedium

A SysOps administrator is troubleshooting an issue where an Application Load Balancer (ALB) is returning HTTP 503 errors to clients. The target group is healthy, and the instances are passing health checks. What is the most likely cause of the 503 errors?

A.The target instances have reached the maximum number of concurrent connections allowed by the application or operating system.

B.The target instances do not have enough capacity to handle the request volume.

C.The health check settings are configured incorrectly, causing healthy instances to be marked as unhealthy.

D.The security group for the targets does not allow traffic from the ALB.

AnswerA

When the application's connection backlog is full, the ALB receives a 'connection refused' or timeout, resulting in a 503.

Why this answer

Option A is correct because, with a healthy target group, an HTTP 503 from the ALB typically indicates that the load balancer could not establish a connection to a target because the target's connection queue is full. Option B is wrong because insufficient capacity usually causes 502 or timeout errors, not 503. Option C is wrong because misconfigured health checks would show unhealthy targets, which contradicts the scenario.

Option D is wrong because a missing security group rule would cause 502 or timeout.

75

MCQeasy

A company uses Amazon CloudWatch Logs to store application logs. The SysOps administrator needs to count the occurrences of the string 'ERROR' in the logs and trigger an Amazon SNS notification when more than 10 errors occur within a 5-minute window. Which steps should the administrator take?

A.Create a metric filter on the log group and then create a CloudWatch alarm on the resulting metric

B.Create a CloudWatch alarm directly on the log group

C.Create an AWS Lambda function to parse the logs and send a notification to Amazon SNS

D.Create an Amazon EventBridge rule to filter log events and send to SNS

AnswerA

Creating a metric filter on the log group produces a metric that can be used in a CloudWatch alarm. This is the standard, low-operational-overhead approach.

Why this answer

A metric filter on a CloudWatch Logs log group extracts a numeric metric (e.g., count of 'ERROR' occurrences) and publishes it to a CloudWatch custom metric. A CloudWatch alarm can then be configured on that metric to evaluate a threshold (e.g., >10) over a specified period (e.g., 5 minutes) and trigger an SNS notification when breached. This is the native, serverless, and cost-effective approach for counting log patterns and alerting.

Exam trap

The trap here is that candidates may think they can directly alarm on a log group (Option B) or assume a Lambda function is required for custom log parsing (Option C), but the exam expects knowledge of CloudWatch Logs metric filters as the native solution for counting patterns and triggering alarms.

How to eliminate wrong answers

Option B is wrong because CloudWatch alarms cannot be created directly on a log group; alarms require a numeric metric as input, not raw log data. Option C is wrong because while a Lambda function could parse logs and send to SNS, it introduces unnecessary complexity, cost, and potential latency compared to the built-in metric filter and alarm mechanism. Option D is wrong because Amazon EventBridge rules can filter log events from CloudWatch Logs but cannot perform aggregation (e.g., count occurrences over a time window) to trigger an alarm based on a threshold; EventBridge is designed for event-driven patterns, not metric-based alerting.
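The two steps in option A can be sketched as the request parameters for the CloudWatch Logs `PutMetricFilter` and CloudWatch `PutMetricAlarm` APIs. The log group, metric names, and SNS topic ARN are placeholders; the sketch only builds the dictionaries and does not call AWS.

```python
NAMESPACE = "Example/App"  # hypothetical custom namespace

# Step 1: metric filter that emits 1 for every log event containing "ERROR".
metric_filter_params = {
    "logGroupName": "/example/app",
    "filterName": "error-count",
    "filterPattern": "ERROR",
    "metricTransformations": [
        {
            "metricName": "ErrorCount",
            "metricNamespace": NAMESPACE,
            "metricValue": "1",
            "defaultValue": 0,  # report 0 when nothing matches
        }
    ],
}

# Step 2: alarm when more than 10 errors occur within one 5-minute period.
alarm_params = {
    "AlarmName": "app-error-rate",
    "Namespace": NAMESPACE,
    "MetricName": "ErrorCount",
    "Statistic": "Sum",
    "Period": 300,               # 5 minutes, in seconds
    "EvaluationPeriods": 1,
    "Threshold": 10,
    "ComparisonOperator": "GreaterThanThreshold",
    "TreatMissingData": "notBreaching",
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder
}

print(alarm_params["MetricName"], alarm_params["Threshold"])
```

Using `Sum` over a 300-second period with `GreaterThanThreshold` at 10 is what encodes "more than 10 errors within a 5-minute window".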
