220-1101 · Chapter 20 of 123 · Objective 5.2

Troubleshoot: Storage and RAID

This chapter covers troubleshooting storage devices and RAID arrays, a core competency for the CompTIA A+ 220-1101 exam. Storage failures are among the most common hardware issues, and RAID adds complexity with multiple drives and configurations. Storage and RAID troubleshooting has its own objective within the Hardware and Network Troubleshooting domain, the most heavily weighted domain on the exam (29%), which makes this a high-yield topic. You will learn to identify symptoms, isolate causes, and apply systematic troubleshooting steps for both single-drive and RAID failures, including interpreting error messages, testing components, and recovering data.

25 min read
Intermediate
Updated May 31, 2026

RAID: A Team of Secretaries with a Filing System

Imagine a busy office with several secretaries (hard drives) who share the task of managing a large filing cabinet (the data storage). The boss (the RAID controller) assigns different strategies for how they work together. In RAID 0, the boss splits each document into pages and gives each page to a different secretary to file simultaneously. This speeds up filing (performance), but if one secretary goes on leave (a drive fails), part of every document is missing and none of them can be read: there is no redundancy. In RAID 1, the boss gives a complete copy of every document to two secretaries. One can go on leave and you still have the other copy, but you use twice the filing space (capacity cost). In RAID 5, the boss splits documents across three or more secretaries and also has them calculate a simple math check (parity), stored with a different secretary each time. If one secretary is absent, the others can reconstruct the missing pages from the math checks. This balances speed, space, and fault tolerance. RAID 10 combines mirroring (RAID 1) and striping (RAID 0): the boss first pairs up the secretaries so that each pair keeps identical copies, then splits documents across the pairs. This provides both speed and redundancy, but costs more filing space. The boss (controller) can be a dedicated manager (hardware RAID), or the secretaries can coordinate among themselves (software RAID). Either way, the boss decides the strategy and the secretaries follow it exactly.

How It Actually Works

Understanding Storage Devices and Their Failure Modes

Storage devices—hard disk drives (HDDs) and solid-state drives (SSDs)—are electromechanical or solid-state components that store data persistently. The CompTIA A+ 220-1101 exam expects you to recognize common failure symptoms and differentiate between HDD and SSD issues.

Hard Disk Drives (HDDs):
- Use spinning platters and a moving actuator arm with read/write heads.
- Common failures: bad sectors, head crashes, spindle motor failure, logical corruption.
- Symptoms: clicking or grinding noises (physical failure), slow read/write speeds, frequent errors, inability to boot, file system corruption.
- SMART (Self-Monitoring, Analysis, and Reporting Technology) attributes: Reallocated Sectors Count, Current Pending Sector Count, Spin Retry Count, Read Error Rate. Thresholds vary by manufacturer; many tools flag drives when values exceed safe limits.

Solid-State Drives (SSDs):
- Use NAND flash memory chips; no moving parts.
- Common failures: NAND wear-out (limited program/erase cycles), controller failure, power-loss corruption, firmware bugs.
- Symptoms: sudden failure without warning (especially with controller failure), drive not detected, corrupted data, TRIM issues (performance degradation over time).
- SMART attributes: Media Wearout Indicator, Percentage Used, Erase Fail Count, Power-On Hours.

Common Troubleshooting Steps for Storage Issues:
1. Identify the symptom: drive not detected, boot failure, data corruption, slow performance, strange noises.
2. Check connections: for internal SATA drives, ensure the power and data cables are secure. For external USB/eSATA drives, check the cable and port.
3. Test with known-good hardware: swap cables, try a different port, or connect the drive to another system.
4. Check BIOS/UEFI: ensure the drive is detected in system setup. If not, check the SATA mode (AHCI vs. IDE) and whether the drive or port is enabled.
5. Use operating system tools:
   - Windows: Disk Management (diskmgmt.msc), CHKDSK, SFC, Device Manager.
   - Linux: fdisk -l, lsblk, smartctl, badblocks.
6. Run manufacturer diagnostics: many vendors (Seagate, WD, Samsung, Crucial) provide free tools to test drive health.
7. Check SMART data: use tools like CrystalDiskInfo (Windows) or smartctl (Linux). Look for reallocated sectors, pending sectors, uncorrectable errors, and temperature.
8. Back up and replace: if the drive is failing, back up data immediately and replace the drive.

Understanding RAID and Its Failure Modes

RAID (Redundant Array of Independent Disks) combines multiple physical drives into one logical unit for performance, redundancy, or both. The exam covers RAID 0, 1, 5, 10 (1+0), and sometimes JBOD (Just a Bunch Of Disks).

RAID Levels:
- RAID 0 (Striping): Data is split across drives in stripes. No redundancy: if one drive fails, all data is lost. Performance improves because the drives read and write simultaneously. Minimum 2 drives.
- RAID 1 (Mirroring): Data is duplicated exactly on two or more drives. If one drive fails, the other(s) still hold the data. Read performance may improve; write performance is similar to a single drive. Minimum 2 drives.
- RAID 5 (Striping with Parity): Data and parity are striped across three or more drives. Parity is computed with XOR and distributed across all drives. Can survive one drive failure. Read performance is good; write performance suffers because each write also requires reading and updating parity. Minimum 3 drives.
- RAID 10 (RAID 1+0): Combines mirroring and striping: mirrored pairs are created first, then data is striped across the pairs. Can survive multiple drive failures as long as no mirror pair loses both drives. Good performance and redundancy. Minimum 4 drives.
- JBOD: Drives are concatenated into one logical volume; no redundancy and no performance gain. If one drive fails, the data stored on that drive is lost, and damage to file system structures can make the rest of the volume hard to recover as well.
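The capacity trade-offs above reduce to simple arithmetic. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes every member is counted at the size of the smallest drive, which is how striped and mirrored arrays generally treat mismatched drives. JBOD is left out because it simply adds the sizes together.

```python
# Minimum member counts for the exam's RAID levels.
MIN_DRIVES = {"0": 2, "1": 2, "5": 3, "10": 4}


def usable_capacity_tb(level: str, drive_sizes_tb: list[float]) -> float:
    """Usable space for a RAID level, sizing every member to the smallest drive."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported level: {level}")
    n = len(drive_sizes_tb)
    if n < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives, got {n}")
    smallest = min(drive_sizes_tb)
    if level == "0":
        return n * smallest          # striping: all space, no redundancy
    if level == "1":
        return smallest              # mirroring: every drive holds the same data
    if level == "5":
        return (n - 1) * smallest    # one drive's worth of space holds parity
    if n % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return (n // 2) * smallest       # RAID 10: half the drives hold mirror copies


print(usable_capacity_tb("5", [4.0, 4.0, 4.0, 4.0]))   # 12.0
print(usable_capacity_tb("10", [4.0, 4.0, 4.0, 4.0]))  # 8.0
```

With the same four 4 TB drives, RAID 5 leaves 12 TB usable while RAID 10 leaves 8 TB; that difference is the extra filing space mirroring costs.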

RAID Implementation Types:
- Hardware RAID: Uses a dedicated RAID controller card (e.g., LSI, Adaptec). The controller handles all RAID calculations; the OS sees a single disk. A battery-backed write cache (BBWC) or NVRAM protects cached writes during power loss.
- Software RAID: Managed by the operating system (e.g., Windows Storage Spaces, Linux mdadm). Uses the CPU for calculations; no dedicated hardware. Can be less performant but more flexible.
- Firmware/driver-based RAID ("fake RAID"): Common on consumer motherboards (e.g., Intel Rapid Storage Technology). Uses a combination of firmware and an OS driver; not true hardware RAID. Can cause issues when drives are migrated to a different system or controller.

Common RAID Failure Symptoms and Troubleshooting:
1. Degraded array: One or more drives have failed, but the array is still operational (if the redundancy level allows). Performance may degrade. The RAID controller or OS will alert you (e.g., beeps, status LED, event log).
2. Failed drive: A drive is completely unresponsive or shows errors. The array may become offline or nonfunctional depending on the RAID level.
3. Offline array: Multiple drives failed or the configuration was lost. No access to data.
4. Logical corruption: File system errors, missing data, invalid partition table.
5. Performance issues: Slow read/write, high latency, resync/rebuild in progress.
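Whether a failure leaves an array degraded or offline depends only on the RAID level and which drives are gone. A minimal sketch, assuming drives are numbered from 0 and, for RAID 10, that drives 0 and 1, 2 and 3, and so on form the mirror pairs; real controllers number and pair members in their own way, and the function name is hypothetical.

```python
def array_state(level: str, n_drives: int, failed: set[int]) -> str:
    """Return 'optimal', 'degraded', or 'offline' for a set of failed drive numbers."""
    if not failed:
        return "optimal"
    if level == "0":
        return "offline"  # no redundancy: every stripe loses a piece
    if level == "1":
        # A mirror survives while at least one copy remains.
        return "offline" if len(failed) >= n_drives else "degraded"
    if level == "5":
        # Single parity can rebuild exactly one missing member.
        return "offline" if len(failed) > 1 else "degraded"
    if level == "10":
        # Assumed pairing: (0, 1), (2, 3), ... are mirror pairs.
        for first in range(0, n_drives, 2):
            if {first, first + 1} <= failed:
                return "offline"  # both copies of one stripe member are gone
        return "degraded"
    raise ValueError(f"unsupported level: {level}")


print(array_state("10", 4, {0, 2}))  # one drive from each pair: degraded
print(array_state("10", 4, {0, 1}))  # a whole mirror pair: offline
print(array_state("5", 4, {1, 2}))   # two members of a RAID 5 set: offline
```

The two RAID 10 cases show what "can survive multiple failures" means in practice: two failures are survivable only when they land in different mirror pairs.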

Troubleshooting Steps for RAID:
1. Identify the symptom: check the RAID controller status (via BIOS/UEFI, management software, or OS tools). Look for degraded, failed, or offline status.
2. Back up data immediately: if the array is degraded, back up critical data before attempting any repair, because another failure during the repair can make the array unrecoverable.
3. Check physical connections: ensure all drives are properly connected (power and data cables). Reseat cables and drives.
4. Check drive health individually: if possible, test each drive with diagnostics. A drive may be failing but not yet failed.
5. Check the RAID configuration: verify that the RAID level, stripe size, and member drives are correct.
6. Replace failed drives: use identical or compatible drives that match the existing members in interface (SATA, SAS, etc.) and speed, with at least the capacity of the smallest member.
7. Rebuild the array: after the failed drive is replaced, initiate a rebuild (or confirm the controller started one) and monitor progress. Rebuild time depends on array size, drive speed, and controller load.
8. Check for consistency errors: some controllers support consistency checks or patrol reads to detect and correct latent errors.
9. Update firmware/drivers: firmware updates for the RAID controller and drives may fix known bugs.
10. If the array is offline and the data is critical: professional data recovery may be required. Do not attempt a rebuild or reinitialize the array without the proper knowledge, because these operations can overwrite recoverable data.

Key Troubleshooting Tools and Commands

Windows:
- diskpart – list disk, select disk, detail disk.
- chkdsk /f /r – fix file system errors and locate bad sectors.
- sfc /scannow – System File Checker.
- wmic diskdrive get status – check drive status (WMIC is deprecated in recent Windows releases).
- Get-PhysicalDisk (PowerShell) – list physical disks and their health status, including Storage Spaces members.

Linux:
- lsblk – list block devices.
- fdisk -l – list partitions.
- smartctl -a /dev/sda – view SMART data.
- mdadm --detail /dev/md0 – check RAID status (software RAID).
- dmesg | grep sd – kernel messages about drive errors.

RAID Controller Utilities:
- MegaRAID Storage Manager (LSI/Avago) – GUI.
- storcli – command line for LSI controllers.
- HP Smart Storage Administrator (HP servers).
- Dell OpenManage Server Administrator.

Important RAID Concepts for the Exam

Stripe size: The amount of data written to each drive before moving to the next. Common values: 64 KB, 128 KB. Larger stripe size benefits sequential transfers; smaller benefits random I/O.
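To see what stripe size controls, the sketch below maps a logical byte offset to a drive in a RAID 0 set, using this chapter's definition of stripe size (the data written to each drive before moving to the next). It assumes a plain round-robin layout with no parity; the function name is hypothetical, and real controllers may lay data out differently.

```python
def locate(offset: int, stripe_size: int, n_drives: int) -> tuple[int, int]:
    """Map a logical byte offset to (drive index, byte offset on that drive)."""
    chunk = offset // stripe_size        # which stripe-sized chunk of the volume
    drive = chunk % n_drives             # chunks are dealt round-robin to drives
    row = chunk // n_drives              # full passes across all drives so far
    return drive, row * stripe_size + offset % stripe_size


KB = 1024
for offset in (0, 64 * KB, 128 * KB, 200_000):
    print(offset, locate(offset, 64 * KB, 2))
```

With a 64 KB stripe on two drives, the first 64 KB lands on drive 0, the next 64 KB on drive 1, and the third chunk wraps back to drive 0. A large sequential read therefore spans many chunks and keeps both drives busy, while a small random read usually touches only one drive.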

Parity: XOR calculation that allows data reconstruction if one drive fails. RAID 5 uses distributed parity; RAID 6 uses double parity (can survive two failures).
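The XOR idea behind RAID 5 fits in a few lines. This is a minimal sketch with made-up two-byte blocks: it assumes equal-length blocks and ignores the parity rotation real arrays use, which changes where parity is stored but not the arithmetic.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"\x0f\xa0", b"\x33\x01", b"\x55\xff"]  # blocks on three data drives
parity = xor_blocks(data)                        # parity block on a fourth drive

# The drive holding data[1] fails: XOR everything that survived.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

This also shows why a second failure during a RAID 5 rebuild is fatal: with two blocks missing from the same stripe, a single XOR equation cannot recover both.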

Hot spare: A spare drive that automatically replaces a failed drive in the array. The controller starts rebuilding immediately.

Global vs. dedicated hot spare: Global can be used by any array; dedicated is assigned to a specific array.

RAID rebuild: Process of reconstructing data on a replacement drive. During rebuild, performance is degraded and the array is vulnerable if another drive fails.

Write hole: In RAID 5/6, a power loss during parity update can cause data inconsistency. Some controllers use NVRAM or BBWC to mitigate.

TRIM/UNMAP: For SSDs in RAID, ensure the controller or OS supports pass-through of TRIM commands to maintain performance. Not all RAID controllers support TRIM.

Walk-Through

1. Gather Symptoms and Error Messages

Start by collecting all symptoms: Is the drive or array detected? Are there error beeps or LED indicators? For RAID, note the controller's status (e.g., 'Degraded', 'Failed'). Check system logs (Event Viewer in Windows, syslog in Linux) for disk errors, such as Windows event IDs 7, 11, and 15. Also note any application errors or crashes. For example, a 'Disk not found' error at boot may indicate a failed drive or a loose cable, and a SMART warning indicates impending failure. Documenting the exact error message is crucial for narrowing down the cause.
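When a log shows many disk events, a small lookup table can turn them into a first-pass checklist. This sketch is a hypothetical helper: the meanings paraphrase the standard Windows "disk" source messages for event IDs 7, 11, and 15, and the next steps follow this chapter's troubleshooting order rather than any official Microsoft guidance.

```python
# Windows System-log events from the "disk" source (paraphrased meanings).
DISK_EVENTS = {
    7: ("bad block on the device", "check SMART data and run manufacturer diagnostics"),
    11: ("controller error reported by the driver", "reseat or swap cables and ports, then check the controller"),
    15: ("device not ready for access", "check power and data connections and BIOS/UEFI detection"),
}


def triage(event_ids: list[int]) -> list[str]:
    """Turn a list of logged event IDs into one note per distinct ID."""
    notes = []
    for event_id in sorted(set(event_ids)):
        if event_id in DISK_EVENTS:
            meaning, next_step = DISK_EVENTS[event_id]
            notes.append(f"Event {event_id}: {meaning}; next: {next_step}")
        else:
            notes.append(f"Event {event_id}: not in this table; look up the full message")
    return notes


for note in triage([7, 7, 11]):
    print(note)
```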

2. Verify Physical Connections

For internal drives, power off and open the case. Reseat the SATA/power cables on both the drive and the motherboard/controller. Check for bent pins or debris. For external drives, try different USB ports or cables. For RAID arrays, ensure all drives are properly seated in the backplane or caddy. A loose connection can cause intermittent failures that mimic drive failure. Use known-good cables for testing. This step rules out one of the most common causes of an apparently failed drive: a bad or loose cable.

3. Check BIOS/UEFI and Controller Configuration

Boot into BIOS/UEFI and verify that the drive or RAID controller is detected. For a single drive, check that the SATA port is enabled and set to the correct mode (AHCI is recommended for modern operating systems; IDE only for legacy compatibility). If the drives belong to a motherboard RAID array, leave the SATA mode set to RAID; switching it to AHCI can make the array, and any OS installed on it, unbootable. For RAID, enter the RAID controller configuration utility (usually by pressing Ctrl+I, Ctrl+R, or Ctrl+M during POST). Verify the array status: 'Optimal', 'Degraded', or 'Offline'. Check that all expected member drives are present and have the correct status. Also check the boot order: if the correct drive is not first, the system may not boot.

4. Use Operating System Disk Utilities

Boot from a live CD/USB or the OS if possible. Use Disk Management (Windows) or `lsblk` (Linux) to see if the drive appears. If it appears but is uninitialized or shows an invalid partition table, the issue may be logical. Run `chkdsk` (Windows) or `fsck` (Linux) to check file system integrity. For RAID, use the OS tools: Windows Storage Spaces or Linux `mdadm` to check status. For example, `mdadm --detail /dev/md0` shows if the array is active and which drives are in sync. If the drive is not listed, the issue is likely physical or driver-related.
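On Linux software RAID, the kernel also summarizes array health in /proc/mdstat, where a member-status field such as "[2/2] [UU]" means all members are active and "[2/1] [U_]" means one is missing. The sketch below parses that field; the sample text is illustrative rather than captured from a real system, the function name is hypothetical, and levels without redundancy (RAID 0) do not print the field, so they are skipped. `mdadm --detail` remains the authoritative check.

```python
import re

# Member-status field, e.g. "[2/2] [UU]" (healthy) or "[2/1] [U_]" (one missing).
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
HEADER = re.compile(r"^(md\d+)\s*:")


def md_array_health(mdstat_text: str) -> dict[str, str]:
    """Map each md array that reports a member-status field to 'ok' or 'degraded'."""
    health = {}
    current = None
    for line in mdstat_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            healthy = active == expected and "_" not in status.group(3)
            health[current] = "ok" if healthy else "degraded"
            current = None
    return health


SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      976630464 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sdd1[1] sdc1[0]
      488254464 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

print(md_array_health(SAMPLE))  # {'md0': 'degraded', 'md1': 'ok'}
```

On a live system, the same function could be fed the contents of `open("/proc/mdstat").read()`.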

5. Run Drive Diagnostics and Check SMART Data

Use manufacturer diagnostic tools (e.g., SeaTools for Seagate, Data Lifeguard for WD) or third-party tools like CrystalDiskInfo. Check SMART attributes: Reallocated Sectors (if increasing, drive is failing), Current Pending Sectors (sectors waiting to be reallocated), and Raw Read Error Rate. For SSDs, check Percentage Used (wear level). If SMART shows critical values, the drive should be replaced immediately. For RAID, check the controller's event log for past errors. A drive with many reallocated sectors may be causing the array to degrade.
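The SMART checks in this step can be written as a few explicit rules. This is a sketch under stated assumptions: the HDD keys mirror smartctl's ATA attribute names, "Percentage_Used" is a hypothetical key for an SSD's wear level, and the rules (flag any nonzero reallocated, pending, or uncorrectable count, flag any increase since the last reading, and flag wear at or above 90%) are illustrative choices, not vendor thresholds.

```python
from typing import Optional

# Counters that should stay at zero on a healthy HDD (smartctl-style names).
HDD_COUNTERS = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def smart_warnings(raw: dict[str, int],
                   previous: Optional[dict[str, int]] = None,
                   wear_cutoff_pct: int = 90) -> list[str]:
    """Return human-readable warnings for one drive's raw SMART values."""
    warnings = []
    for name in HDD_COUNTERS:
        value = raw.get(name, 0)
        if value > 0:
            warnings.append(f"{name} = {value}")
        if previous is not None and value > previous.get(name, 0):
            warnings.append(f"{name} increased since last check")
    used = raw.get("Percentage_Used")  # hypothetical key for SSD wear
    if used is not None and used >= wear_cutoff_pct:
        warnings.append(f"SSD wear: {used}% of rated endurance used")
    return warnings


print(smart_warnings({"Reallocated_Sector_Ct": 8}, previous={"Reallocated_Sector_Ct": 2}))
print(smart_warnings({"Percentage_Used": 95}))
```

Comparing against a previous reading matters because, as noted above, a reallocated count that keeps increasing is a stronger failure signal than a small count that stays flat.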

6. Isolate the Faulty Component

If a specific drive is suspected, test it individually in a known-good system. If the drive fails there, it's defective. If it works, the issue may be the controller, cables, or backplane. Swap cables and ports to isolate. For RAID, if one drive consistently drops out, replace it with a known-good drive. If the array returns to 'Optimal' once the rebuild onto the replacement finishes, and stays there, the original drive was faulty. If problems persist, the controller or backplane may be defective. Document serial numbers and firmware versions for RMA.

7. Replace Faulty Hardware and Rebuild

If a drive is confirmed faulty, replace it with an identical or compatible drive (same or larger capacity, same interface, same speed if possible). For RAID, the controller will automatically start rebuilding if a hot spare is configured, or you must manually initiate the rebuild. During rebuild, the array is in a degraded state and performance is lower. Do not power off during rebuild. After rebuild, verify array status and run consistency checks. For non-RAID drives, replace the drive and restore data from backup.
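The replacement rules in this step can be checked before the drive goes into the array. A minimal sketch: the `Drive` record, the function name, and the byte counts are hypothetical, and exact byte capacity is compared because drives sold under the same nominal size can differ slightly between models.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    capacity_bytes: int   # exact size, not the label on the box
    interface: str        # e.g. "SATA" or "SAS"
    rpm: int = 0          # 0 for SSDs


def replacement_problems(new: Drive, members: list[Drive]) -> list[str]:
    """List reasons a replacement may not suit the array's surviving members."""
    problems = []
    if new.capacity_bytes < min(d.capacity_bytes for d in members):
        problems.append("too small: must be at least as large as the smallest member")
    if any(d.interface != new.interface for d in members):
        problems.append("interface differs from the existing members")
    if any(d.rpm != new.rpm for d in members):
        problems.append("speed differs: usually accepted, but may slow the array")
    return problems


members = [Drive(4_000_787_030_016, "SATA", 7200)] * 3
print(replacement_problems(Drive(4_000_787_030_016, "SATA", 7200), members))  # []
print(replacement_problems(Drive(3_999_999_999_000, "SATA", 7200), members))
```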

8. Verify Resolution and Document

After replacement or repair, verify that the issue is resolved: check that the drive/array is detected, the file system is accessible, performance is normal, and no errors appear in logs. Run a full backup and confirm that it completes without read errors. Document the failure symptoms, root cause (e.g., 'failed HDD with bad sectors'), replacement part details, and steps taken. This documentation helps with future troubleshooting and warranty claims. Also update the inventory and asset tags if applicable.

What This Looks Like on the Job

In a small business server with a hardware RAID 5 array of four 4TB SATA HDDs, the administrator notices that the server is beeping and the RAID controller status shows 'Degraded'. The event log indicates that drive 3 has failed. The company uses a Dell PowerEdge server with a PERC H730 controller. The admin follows the troubleshooting steps: first, he checks the physical connections and reseats the drive caddy and cables, but the status doesn't change. He then uses the Dell OpenManage tool to view the drive's SMART data, which shows a high number of reallocated sectors and a predicted failure. He orders a replacement drive of the same model and capacity. No hot spare is configured, so the array stays degraded until the new drive is installed in the failed drive's slot; the controller then rebuilds onto it (on controllers that do not start the rebuild automatically, the admin starts it from the controller utility). The rebuild takes about 8 hours for 4TB drives. During this time the array has no redundancy left: if another drive fails, the data on the array is lost. The admin monitors the rebuild progress via the controller utility. After completion, the array status shows 'Optimal', and he runs a consistency check to verify data integrity.

In another scenario, a cloud data center uses a large RAID 10 array of 24 SSDs for a database server. A drive fails, but because RAID 10 can tolerate multiple failures (as long as no mirror pair loses both drives), the array remains online. The hot spare automatically activates and rebuilds. The administrator receives an alert and replaces the failed drive later during a maintenance window. The performance impact is limited because a RAID 10 rebuild only copies data from the failed drive's mirror partner instead of reading every drive to recompute parity. However, a bug in the controller's firmware can make the rebuild degrade performance badly or even fail the array, so keeping firmware updated is critical.

In a home user scenario, a gaming PC with a software RAID 0 (striping) of two SSDs suddenly becomes unbootable. The user hears no strange noises, and the BIOS sees both drives individually. Using a live Linux USB, he runs mdadm --examine and finds that the superblock is corrupted. He attempts to reassemble the array with mdadm --assemble --force, but one drive shows errors. He then uses ddrescue to clone the failing drive to a new SSD and reassembles the array with the clone. This recovers most data, but some files are corrupted. This highlights the risk of RAID 0 (no redundancy) and the importance of backups.
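The 8-hour figure in the first scenario is consistent with simple arithmetic, since a rebuild has to write the entire replacement drive. The sketch below is a rough estimate, not a prediction: the function name is hypothetical, and the 140 MB/s and 50 MB/s rates are assumed figures, not values from the scenario. Many controllers let you set a rebuild rate, and production I/O competes with the rebuild, so real rebuilds often run slower.

```python
def rebuild_hours(capacity_tb: float, rate_mb_s: float) -> float:
    """Hours to rewrite an entire replacement drive at a steady rate."""
    capacity_bytes = capacity_tb * 1e12      # drive makers use decimal terabytes
    seconds = capacity_bytes / (rate_mb_s * 1e6)
    return seconds / 3600


print(round(rebuild_hours(4, 140), 1))  # 7.9 at an assumed 140 MB/s
print(round(rebuild_hours(4, 50), 1))   # 22.2 if the rebuild is throttled to 50 MB/s
```

The longer the rebuild, the longer a RAID 5 array runs without redundancy, which is one reason larger drives make backups before a rebuild even more important.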

How 220-1101 Actually Tests This

The CompTIA A+ 220-1101 exam tests storage and RAID troubleshooting in the Hardware and Network Troubleshooting domain, under the objective "Given a scenario, troubleshoot and diagnose problems with storage drives and RAID arrays." You must be able to identify symptoms and apply troubleshooting steps for storage devices and RAID arrays. The exam will present scenarios where you must determine the most likely cause or the next step.

Common wrong answers include:
(1) Choosing to rebuild a RAID 0 array after a drive failure. RAID 0 cannot survive any drive failure; the array is lost and data must be restored from backup. Many candidates mistakenly think RAID 0 can be rebuilt.
(2) Assuming a clicking sound from an HDD indicates a logical error. Clicking almost always means a physical failure such as a head crash, which requires replacement, not CHKDSK.
(3) Thinking that replacing a failed drive in a RAID 5 array with a smaller-capacity drive is acceptable. The replacement must be at least as large as the smallest member drive.
(4) Confusing RAID 10 with RAID 0+1. The exam may ask about the order; RAID 10 (striped mirrors) is more common and tolerates more combinations of drive failures.

Key numbers: RAID 0 requires a minimum of 2 drives, RAID 1 requires 2, RAID 5 requires 3, and RAID 10 requires 4. A hot spare can be global or dedicated. Rebuild time depends on drive size and controller. The exam loves to test the difference between hardware and software RAID (e.g., software RAID uses CPU resources).

Edge cases: a drive that is detected in BIOS but not in the OS may have a driver issue or be part of a RAID configuration that the OS doesn't recognize. If a RAID array is offline due to multiple failures, do not attempt to rebuild; seek data recovery. Finally, remember that TRIM is not always supported in RAID; for SSDs in RAID, performance may degrade over time if TRIM is not passed through.

Key Takeaways

RAID 0 requires minimum 2 drives, provides no redundancy, and one drive failure causes total data loss.

RAID 1 mirrors data; minimum 2 drives, and the array keeps running as long as at least one drive in the mirror survives.

RAID 5 requires minimum 3 drives; uses distributed parity to survive one drive failure.

RAID 10 (1+0) requires minimum 4 drives; combines mirroring and striping, can survive multiple failures if no mirror pair is fully lost.

A clicking sound from an HDD indicates physical failure; do not run CHKDSK—replace the drive.

SMART attributes like Reallocated Sector Count and Current Pending Sector Count indicate drive health.

When replacing a drive in a RAID array, use an identical or larger capacity drive of the same interface.

Hardware RAID uses a dedicated controller; software RAID uses CPU and OS resources.

A degraded array is still accessible but vulnerable; rebuild immediately with a new drive.

SSDs can fail suddenly; monitor SMART attributes like Media Wearout Indicator.

TRIM is not always supported in RAID configurations; verify controller support for SSDs.

Always back up critical data before attempting RAID rebuild or drive replacement.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Hardware RAID

Dedicated RAID controller handles all RAID calculations, offloading CPU.

Often includes battery-backed cache (BBWC) for data protection during power loss.

OS sees a single logical drive; no OS configuration needed.

Generally faster and more reliable, especially for RAID 5/6.

More expensive due to controller hardware.

Software RAID

Uses host CPU for RAID calculations, consuming system resources.

No dedicated cache; may be vulnerable to data loss on power failure.

Configuration is done within the OS (e.g., Windows Storage Spaces, Linux mdadm).

Can be more flexible (e.g., RAID levels not supported by hardware).

Lower cost; no extra hardware required.

Watch Out for These

Mistake

RAID 0 provides redundancy because it uses multiple drives.

Correct

RAID 0 (striping) provides no redundancy. It splits data across drives to improve performance, but if any one drive fails, all data in the array is lost.

Mistake

A clicking hard drive can be fixed by running CHKDSK.

Correct

Clicking is a physical symptom of a head crash or mechanical failure. Running CHKDSK may cause further damage. The drive should be replaced and data recovered by professionals if needed.

Mistake

You can replace a failed drive in a RAID 5 array with any drive of the same interface.

Correct

The replacement drive must be at least as large as the smallest drive in the array. Ideally, it should be identical in model, capacity, and speed to avoid performance mismatches and ensure compatibility.

Mistake

SSDs do not fail; they are more reliable than HDDs.

Correct

SSDs can fail suddenly due to controller failure, NAND wear, or power loss. They have a limited number of program/erase cycles. SMART monitoring is still important.

Mistake

A degraded RAID array means data is lost.

Correct

A degraded array means one or more drives have failed but the array is still operational (if redundancy allows). Data is intact, but the array is vulnerable to another failure. The failed drive should be replaced and the array rebuilt immediately.

Frequently Asked Questions

What should I do if my RAID 5 array is degraded?

First, identify the failed drive by checking the RAID controller status (e.g., via BIOS or management software). Replace the failed drive with a compatible one (same or larger capacity). Then initiate a rebuild from the controller utility. During rebuild, the array is vulnerable—back up critical data if possible. After rebuild, verify the array status is 'Optimal' and run a consistency check.

Can I replace a failed drive in a RAID array with a larger capacity drive?

Yes, you can use a larger capacity drive, but the extra space will not be usable until all drives in the array are replaced and the array is expanded. The replacement must be at least as large as the smallest drive in the array. For optimal performance, use identical drives.

What does a clicking noise from a hard drive mean?

A clicking noise (often called the 'click of death') indicates a physical failure, typically a head crash or spindle motor issue. The drive should be powered off immediately to prevent further damage. Data recovery may be possible by professional services, but the drive is not repairable and must be replaced.

How do I check if my SSD is failing?

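Use a tool like CrystalDiskInfo (Windows) or `smartctl` (Linux) to view SMART data. Key attributes for SSDs: Percentage Used (reported by NVMe drives; counts up from 0% toward 100% of rated endurance), Media Wearout Indicator (on drives that report it, typically a normalized value that starts at 100 and counts down as the flash wears), Erase Fail Count, and Reallocated Block Count. If Percentage Used approaches 100%, the wearout indicator approaches its threshold, or reallocated blocks keep increasing, the SSD is wearing out or failing and should be replaced.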

What is the difference between hardware RAID and software RAID?

Hardware RAID uses a dedicated controller card that handles all RAID calculations, offloading the CPU and often providing a cache for better performance and data protection. Software RAID is managed by the operating system (e.g., Windows Storage Spaces, Linux mdadm) and uses the host CPU, which can impact performance. Hardware RAID is generally more reliable and faster but costs more.

Can I recover data from a failed RAID 0 array?

RAID 0 has no redundancy, so if one drive fails, the entire array is lost. However, if the failure is logical (e.g., corrupted superblock) rather than physical, data recovery software may be able to reconstruct the stripe set from the member drives, provided all of them are still readable. If one drive has physically failed, professional data recovery may be possible but is expensive and not guaranteed.

What does 'SMART' stand for and why is it important?

SMART stands for Self-Monitoring, Analysis, and Reporting Technology. It is a monitoring system built into HDDs and SSDs that tracks various attributes (e.g., reallocated sectors, temperature, power-on hours) to predict drive failure. Monitoring SMART data helps identify failing drives before they cause data loss.
