This chapter covers the signs of hard drive failure, a critical topic for the CompTIA A+ 220-1101 exam under Objective 5.3 (Given a scenario, troubleshoot and diagnose problems with storage drives and RAID arrays). Hard drive failures are among the most common hardware issues, and the exam expects you to identify symptoms, interpret diagnostic tools, and recommend appropriate actions. Approximately 10-15% of hardware troubleshooting questions may touch on hard drive failure signs, so mastering this topic is essential for passing the exam.
A hard drive is like a light bulb that has a limited lifespan. When a bulb is new, it shines brightly and consistently. Over time, the filament weakens, and you may notice flickering — the bulb briefly dims or brightens erratically. This flickering is analogous to a hard drive developing bad sectors or experiencing read/write errors. Just as a flickering bulb is a sign that failure is imminent, a hard drive with increasing bad sectors or S.M.A.R.T. errors is warning you. Eventually, the bulb will burn out completely, just as a hard drive will fail catastrophically. The key is to recognize the flickering — the early warning signs — and replace the bulb (or drive) before total failure. In both cases, the failure is often gradual, with intermittent symptoms that worsen over time. Ignoring the flickering leads to sudden darkness; ignoring S.M.A.R.T. warnings leads to data loss.
What is Hard Drive Failure and Why It Matters
Hard drive failure refers to the inability of a hard disk drive (HDD) or solid-state drive (SSD) to reliably store or retrieve data. For HDDs, this often involves mechanical breakdown of moving parts; for SSDs, it involves wear on NAND flash memory cells. The exam focuses on identifying the signs of impending failure so that data can be backed up and the drive replaced before total data loss occurs. Understanding these signs is crucial for any IT professional.
How Hard Drives Work Internally
HDDs: An HDD consists of spinning platters coated with magnetic material, read/write heads on an actuator arm, and a spindle motor. Data is stored magnetically in concentric tracks divided into sectors. When the drive is powered on, the platters spin at a constant speed (e.g., 5400 or 7200 RPM). The heads float nanometers above the platters on a cushion of air. To read or write data, the actuator arm moves the heads to the correct track, and the heads read or write magnetic flux changes.
SSDs: An SSD uses NAND flash memory chips arranged in pages and blocks. Data is written in pages, but erased in blocks. The SSD controller manages wear leveling, garbage collection, and error correction. Unlike HDDs, SSDs have no moving parts, making them more resistant to physical shock but susceptible to write endurance limits (e.g., 100 TBW for a typical consumer SSD).
Key Components and Their Failure Modes
Spindle Motor (HDD): Can seize or fail to spin up, causing a clicking noise or no spin. The drive may be detected but not accessible.
Read/Write Heads (HDD): Can crash into the platter, causing a screeching sound and permanent data loss. Head crashes often result from physical shock.
Platters (HDD): Can develop scratches or defects (bad sectors). Bad sectors are areas that cannot reliably store data. The drive firmware marks them as unusable.
Actuator Arm (HDD): Can become stuck or fail to move, resulting in a ticking sound (the arm trying to move but failing).
NAND Flash (SSD): Cells can wear out after a limited number of program/erase cycles (e.g., roughly 1,000 cycles for QLC up to around 100,000 for SLC, with TLC and MLC in between). When cells wear out, the SSD may become read-only or fail completely.
Controller (Both): The drive controller can fail due to electrical issues, firmware bugs, or physical damage. This may cause the drive not to be recognized or to behave erratically.
Capacitors (SSD): Used to flush write cache during power loss. Failure can lead to data corruption.
S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology)
S.M.A.R.T. is a monitoring system built into most modern drives (both HDD and SSD). It tracks various attributes and can predict imminent failure. Key S.M.A.R.T. attributes for HDDs:
- Raw Read Error Rate: Increases when the drive has difficulty reading data from the platter.
- Spin-Up Time: Time taken for the platters to reach operating speed. Increasing values can indicate motor or bearing wear.
- Reallocated Sectors Count: Number of sectors that have been remapped to spare sectors. A high or rapidly increasing count indicates a failing platter.
- Current Pending Sector Count: Number of unstable sectors that the drive has had trouble reading and is waiting to resolve. A pending sector is remapped if a later write to it fails, and it is removed from the pending list if it can be read or rewritten successfully.
- Uncorrectable Sector Count: Number of sectors that could not be read and are considered permanently damaged.
For SSDs, S.M.A.R.T. attributes include:
- Media Wearout Indicator: Percentage of NAND life used (0% = new, 100% = worn out). The attribute name, ID, and direction are vendor-specific; some drives report remaining life instead, counting down from 100.
- Total LBAs Written: Total data written to the drive, used to estimate remaining life.
- Reallocated Sectors Count: Same as HDD, but refers to NAND blocks that have been remapped.
- Power-On Hours: Total time the drive has been powered on.
S.M.A.R.T. can be checked using tools like smartctl (Linux), CrystalDiskInfo (Windows), or Disk Utility (macOS). The drive firmware compares each attribute's normalized value against a vendor-defined threshold; if the value falls to or below the threshold, the drive reports a S.M.A.R.T. failure. However, S.M.A.R.T. is not foolproof: some drives fail without warning.
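The threshold comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's firmware logic: the `SmartAttribute` class, the `failing_attributes` function, and the sample numbers are all hypothetical. It assumes the common convention that a normalized value at or below its threshold counts as failed, and that a threshold of 0 marks an informational attribute.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One S.M.A.R.T. attribute as a reporting tool might show it (hypothetical model)."""
    attr_id: int
    name: str
    value: int      # normalized value; higher is healthier
    threshold: int  # vendor-defined failure threshold


def failing_attributes(attributes):
    """Return the attributes whose normalized value is at or below the threshold."""
    # Assumption: a threshold of 0 means the attribute is informational
    # and never trips a failure on its own.
    return [a for a in attributes if a.threshold > 0 and a.value <= a.threshold]


# Illustrative values only, not taken from a real drive.
sample = [
    SmartAttribute(5, "Reallocated_Sector_Ct", 30, 36),
    SmartAttribute(3, "Spin_Up_Time", 97, 21),
    SmartAttribute(9, "Power_On_Hours", 80, 0),
]

failed = failing_attributes(sample)
print("FAILED" if failed else "PASSED", [a.name for a in failed])
```

Note that the comparison uses the normalized value, not the raw count, which is why a drive can show a non-zero raw Reallocated Sectors Count and still report an overall PASS.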
Common Symptoms of Hard Drive Failure
Unusual Noises (HDD): Clicking, grinding, screeching, or whining sounds. Clicking often indicates the actuator arm repeatedly trying to move. Grinding suggests head contact with platters. These are mechanical failure signs.
Slow Performance: Drives with bad sectors or failing controllers may take longer to read/write data. You may notice file access delays, boot slowdowns, or system freezes.
Frequent Crashes or Blue Screen of Death (BSOD): Corrupted data from bad sectors can cause operating system crashes. Common BSOD codes include UNEXPECTED_KERNEL_MODE_TRAP, CRITICAL_PROCESS_DIED, and KERNEL_DATA_INPAGE_ERROR.
File System Errors: Errors like "The file or directory is corrupted and unreadable" or "Cyclic redundancy check (CRC) error" indicate data integrity problems.
Missing Files or Folders: Data may disappear or become inaccessible. This can be due to bad sectors corrupting the file system index.
Drive Not Detected: The BIOS or OS fails to see the drive. This could be due to a dead controller, failed motor, or complete NAND failure.
Boot Failure: The system may hang at POST or display "No boot device found." The drive may spin up but not be recognized.
S.M.A.R.T. Warnings: The system or diagnostic tool reports an imminent failure. This is a proactive warning.
Diagnostic Tools and Commands
Windows:
- CHKDSK: Checks for file system errors and bad sectors. Run chkdsk C: /f /r to fix errors and locate bad sectors. /f fixes file system errors, /r finds bad sectors and recovers readable data.
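- S.M.A.R.T. Status: Use wmic diskdrive get status in Command Prompt; a healthy drive reports "OK". WMIC is deprecated in recent Windows releases, where the PowerShell cmdlet Get-PhysicalDisk reports a comparable HealthStatus. Disk Management shows volume health but not S.M.A.R.T. data.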
- Performance Monitor: Can track disk queue length, latency, and throughput. High disk queue length (>2 per disk) indicates congestion.
- Event Viewer: Look for disk-related errors with Event ID 7 (bad block), 51 (error during a paging operation), or 153 (disk timeout).
Linux:
- smartctl: smartctl -a /dev/sda shows all S.M.A.R.T. attributes. smartctl -H /dev/sda shows overall health status.
- badblocks: badblocks -v /dev/sda scans for bad sectors. Use with caution; it can stress a failing drive.
- fsck: Checks file system integrity, for example fsck /dev/sda1. Run it only on an unmounted file system.
- dmesg: Kernel messages may show I/O errors, such as ata1.00: failed command: READ DMA.
macOS:
- Disk Utility: First Aid verifies and repairs the file system (permission repair was removed in OS X 10.11). It also shows S.M.A.R.T. status (if supported).
- smartctl: Install via Homebrew (brew install smartmontools) and run smartctl -a /dev/disk0.
How Failure Signs Interact with Other Components
RAID Arrays: In a RAID 1 or RAID 5, a single drive failure does not cause data loss, but the array may degrade. Symptoms include degraded array status, slower performance, and alerts from the RAID controller. The failed drive must be replaced promptly.
Operating System: A failing drive can cause OS crashes, file corruption, and boot failures. Backup and replacement are critical.
Backup Software: Backup may fail or produce incomplete backups due to read errors. This is often the first sign of a failing drive.
Virtualization: In a hypervisor, a failing drive in a datastore can cause VM performance issues, snapshot failures, or VM crashes.
Best Practices for Handling Hard Drive Failure
Backup Immediately: If you suspect failure, back up critical data to another medium. Use imaging tools like Clonezilla or ddrescue for failing drives.
Run Diagnostics: Use the manufacturer's diagnostic tool (e.g., SeaTools for Seagate, Data Lifeguard for WD) for thorough testing.
Check S.M.A.R.T. Data: Use smartctl or a GUI tool to review attributes. Pay attention to reallocated sectors, pending sectors, and uncorrectable errors.
Replace the Drive: If the drive is under warranty, initiate an RMA. Otherwise, replace with a new drive.
Secure Erase: If disposing of the drive, sanitize it with ATA Secure Erase (hdparm --security-erase on Linux, which requires setting a temporary drive password first) or the manufacturer's utility.
Exam Tips
The exam often asks about "clicking noise" as a sign of mechanical failure in HDDs.
"S.M.A.R.T. failure" is a key term — know that it indicates the drive predicts failure.
"Bad sectors" are not always fatal; the drive can reallocate them, but a growing count is bad.
"Slow performance" can be caused by many things, but when combined with other symptoms, point to drive failure.
SSDs can fail without warning; they may become read-only or disappear entirely.
CHKDSK with /r is the Windows command to find bad sectors and recover data.
The exam may present scenarios where you must choose between drive failure and other issues (e.g., loose cables, incorrect boot order). Always check cables first, then S.M.A.R.T. status, then run diagnostics.
Identify Unusual Noises
Listen for sounds like clicking, grinding, or screeching from the drive. Clicking often indicates the actuator arm repeatedly trying to move but failing, possibly due to a stuck pivot or damaged voice coil. Grinding suggests the read/write head is contacting the platter, which can cause scratches and data loss. Screeching may indicate bearing failure in the spindle motor. These mechanical symptoms are almost exclusively HDD issues; SSDs are silent. If you hear these sounds, power down immediately to prevent further damage and, if the data is critical, send the drive to a professional data recovery service. The exam expects you to recognize clicking as a sign of imminent HDD failure.
Check S.M.A.R.T. Status
Use a tool like smartctl or CrystalDiskInfo to read S.M.A.R.T. attributes. Look for the overall health status (PASS/FAIL) and specific attributes: Reallocated Sectors Count (ID 5), Current Pending Sector Count (ID 197), Uncorrectable Sector Count (ID 198). A non-zero or rapidly increasing raw value indicates physical damage. For SSDs, check the wear indicator for remaining life; its name and ID are vendor-specific (commonly attribute 231 or 233). S.M.A.R.T. is not perfect, and some drives fail without warning, but a failure status is a strong indicator. The exam may present a scenario where S.M.A.R.T. shows a warning; the correct action is to back up data and replace the drive.
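As a rough sketch of how those three attributes could be pulled out of a report, the Python below parses text laid out like the attribute table that `smartctl -A` prints for ATA drives. The `SAMPLE` text is illustrative rather than captured from a real drive, the column layout is an assumption based on that table format, and `parse_raw_values` is a hypothetical helper name.

```python
WATCHED_IDS = {5, 197, 198}  # reallocated, pending, uncorrectable


def parse_raw_values(smartctl_output):
    """Map attribute ID -> (name, raw value) for the watched rows of the table."""
    result = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit() check.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id in WATCHED_IDS and fields[9].isdigit():
            result[attr_id] = (fields[1], int(fields[9]))
    return result


# Illustrative sample text (assumed layout), not output from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

for attr_id, (name, raw) in sorted(parse_raw_values(SAMPLE).items()):
    verdict = "investigate" if raw > 0 else "ok"
    print(f"{attr_id:>3} {name:<24} raw={raw} -> {verdict}")
```

In this sample only the pending sector count is non-zero, which is the kind of early sign worth acting on before the overall status changes to FAIL.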
Run CHKDSK or fsck
On Windows, run `chkdsk C: /f /r` from an elevated command prompt. The `/r` parameter locates bad sectors and recovers readable data, and it includes everything `/f` does. The scan may take hours and can stress a failing drive, so back up first. On Linux, use `fsck -y /dev/sda1` on an unmounted file system to repair file system errors. These tools can fix logical errors but cannot repair physical damage. If CHKDSK reports many bad sectors or hangs, the drive is likely failing. If the target is the system drive, CHKDSK cannot lock the volume while Windows is running; it offers to schedule the scan for the next restart, or you can run it from the recovery environment. The exam may ask what command to use to check for bad sectors: the answer is `chkdsk /r`.
Check Event Logs
In Windows, open Event Viewer and navigate to Windows Logs > System. Look for disk-related errors with Event ID 7 (bad block), 11 (controller error), or 153 (disk I/O retried, typically after a timeout). These events indicate I/O errors that may be caused by failing hardware. Also check the Application logs for data corruption errors. In Linux, run `dmesg | grep -i error` to see kernel-level drive errors. Event logs can confirm that symptoms like crashes or slow performance are due to disk issues. The exam may test your ability to interpret Event IDs; know that Event ID 7 indicates a bad sector.
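The same filtering idea as `dmesg | grep -i error` can be sketched in Python when you want to match several patterns at once. The patterns and the sample lines below are assumptions chosen for illustration; real kernel messages vary by driver and kernel version, and `disk_error_lines` is a hypothetical helper name.

```python
import re

# Assumed patterns for illustration; adjust them to the messages your kernel emits.
DISK_ERROR_PATTERNS = [
    re.compile(r"ata\d+(\.\d+)?: failed command", re.IGNORECASE),
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"medium error", re.IGNORECASE),
]


def disk_error_lines(log_lines):
    """Return only the log lines that match one of the disk error patterns."""
    return [
        line for line in log_lines
        if any(pattern.search(line) for pattern in DISK_ERROR_PATTERNS)
    ]


# Illustrative log lines, not captured from a real system.
SAMPLE_LOG = [
    "[  101.2] ata1.00: failed command: READ DMA",
    "[  101.3] usb 1-1: new high-speed USB device number 2",
    "[  101.9] blk_update_request: I/O error, dev sda, sector 12345",
]

for line in disk_error_lines(SAMPLE_LOG):
    print(line)
```

To run it against a live system you could feed it the lines of saved `dmesg` output; the filter itself does not call any system command.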
Test with Manufacturer Tool
Download the hard drive manufacturer's diagnostic tool (e.g., SeaTools for Seagate, Data Lifeguard Diagnostic for WD, SSD Toolbox for Intel). These tools perform proprietary tests like short DST (Drive Self Test) and long DST. A short test checks the drive's basic functionality; a long test scans every sector. If the tool reports a failure, the drive should be replaced under warranty if applicable. Manufacturer tools are more thorough than generic S.M.A.R.T. readers. The exam may not test specific manufacturer tools, but you should know that they exist and are more reliable than OS-level tools for confirming failure.
Backup and Replace
If any diagnostic indicates failure, immediately back up critical data to another medium. Use a tool like ddrescue on Linux to clone a failing drive sector by sector, skipping bad areas. Then replace the drive with a new one. For SSDs, consider that they may become read-only when near end-of-life, so backup promptly. After replacement, verify the backup is intact. The exam emphasizes that the first step after identifying failure signs is to back up data, then replace the drive. Do not attempt to repair a physically failing drive — it is unreliable.
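The "clone sector by sector, skipping bad areas" idea can be illustrated with a small Python sketch. It is not how ddrescue is implemented (the real tool also keeps a map file and retries difficult regions); it only shows the skip-and-record principle. The `rescue_copy` and `make_fake_drive` functions are hypothetical, and the "drive" is simulated in memory so the example is safe to run.

```python
def rescue_copy(read_block, block_count, block_size):
    """Copy block by block, substituting zeros for blocks that cannot be read."""
    image = bytearray()
    bad_blocks = []
    for index in range(block_count):
        try:
            data = read_block(index)
        except OSError:
            # Record the failure and keep going instead of aborting the copy.
            bad_blocks.append(index)
            data = bytes(block_size)  # zero-filled placeholder
        image += data
    return bytes(image), bad_blocks


def make_fake_drive(blocks, unreadable):
    """Build a reader that raises OSError for the simulated bad blocks."""
    def read_block(index):
        if index in unreadable:
            raise OSError(f"simulated read error at block {index}")
        return blocks[index]
    return read_block


blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
reader = make_fake_drive(blocks, unreadable={2})
image, bad = rescue_copy(reader, block_count=4, block_size=4)
print(bad)    # [2]
print(image)  # b'AAAABBBB\x00\x00\x00\x00DDDD'
```

Keeping the list of bad blocks matters because it tells you which parts of the image are placeholders rather than recovered data.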
Enterprise Scenario 1: RAID Array with Failing Drive
In a data center, a server uses RAID 5 with four 1TB HDDs (3TB usable). The monitoring system alerts that one drive has a high Reallocated Sectors Count (over 1000). The array is still operational, but performance has dropped because reads that hit unreadable sectors on the failing drive are retried and, when they fail, reconstructed from the data and parity on the other drives. The administrator checks S.M.A.R.T. data, confirms the drive is failing, and hot-swaps it with a new drive. The RAID controller automatically rebuilds the array (about 4 hours for 1TB). The old drive is then securely erased and RMA'd. Risk if ignored: the drive could fail completely, leaving the array degraded, and a second drive failure before or during the rebuild would cause data loss.
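Reconstructing data from parity relies on XOR: the parity block is the XOR of the data blocks in a stripe, so any one missing block equals the XOR of the surviving blocks and the parity. The Python sketch below shows this for a single stripe with made-up byte values; real controllers also rotate parity across drives, which is omitted here. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_usable_tb(drive_count, drive_tb):
    """RAID 5 spends one drive's worth of capacity on parity."""
    return (drive_count - 1) * drive_tb


# One stripe across a four-drive array: three data blocks plus one parity block.
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(data)

# Simulate losing the second drive and rebuilding its block from the rest.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])     # True
print(raid5_usable_tb(4, 1))  # 3
```

This also shows why a second failure is fatal: with two blocks of a stripe missing, the XOR of what remains no longer determines either one.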
Enterprise Scenario 2: SSD Wear in a SQL Database Server
A high-transaction SQL server uses a 1TB NVMe SSD whose wear indicator (Percentage Used in the NVMe health log) reads 95%, leaving only 5% of its rated life. The drive has been in service for 3 years. The database is write-intensive, which causes high write amplification. The administrator uses smartctl to check the total data written (e.g., 500 TB). The SSD's TBW rating is 600 TB, so host writes alone have consumed about 83% of the rated endurance, and the wear indicator confirms the drive is near end-of-life. The administrator schedules a maintenance window to migrate the database to a new SSD with higher endurance (e.g., 1.5 PBW). Failure to replace could result in the SSD becoming read-only mid-transaction, causing database corruption.
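The endurance figures in this scenario can be checked with simple arithmetic. The Python sketch below uses the scenario's numbers (500 TB written against a 600 TB rating over 3 years) and assumes, purely for illustration, that the write rate stays constant. The function names are hypothetical.

```python
def endurance_used_percent(written_tb, rated_tbw):
    """Share of the rated write endurance already consumed, in percent."""
    return 100.0 * written_tb / rated_tbw


def days_until_rated_limit(written_tb, rated_tbw, tb_per_day):
    """Days until the rated TBW is reached at a constant write rate (assumption)."""
    remaining_tb = max(rated_tbw - written_tb, 0)
    return remaining_tb / tb_per_day


# Scenario figures: 500 TB written in 3 years, 600 TB rating.
daily_tb = 500 / (3 * 365)

print(round(endurance_used_percent(500, 600), 1))            # 83.3
print(round(days_until_rated_limit(500, 600, daily_tb)))     # 219
```

The TBW-based figure counts host writes, while the drive's wear indicator reflects wear on the flash itself, so the two percentages need not match; write amplification is one reason the indicator can run ahead.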
Enterprise Scenario 3: Desktop with Clicking Drive
An employee reports that their desktop PC makes a clicking noise and occasionally freezes. The IT technician backs up the user's documents to a network share, then runs CHKDSK /r, which finds 50 bad sectors. The technician then replaces the HDD with an SSD (a common upgrade). The old drive is destroyed for data security. The technician notes that the clicking was the actuator arm failing. Common mistake: The user thought the noise was from a fan, leading to delayed action and potential data loss.
Exam Objective: 5.3 Given a scenario, troubleshoot and diagnose problems with storage drives and RAID arrays.
This objective includes hard drive failure symptoms. The exam tests your ability to identify symptoms and select the correct troubleshooting step. Key areas:
1. Symptoms:
- Grinding noise → HDD mechanical failure.
- Clicking noise → Actuator arm failure.
- S.M.A.R.T. failure → Drive predicts imminent failure.
- Slow performance → Could be many things, but with other symptoms points to drive.
- BSOD with KERNEL_DATA_INPAGE_ERROR → Drive read failure.
- Drive not detected → Possible controller failure or power issue.
2. Common Wrong Answers:
- "Replace the drive immediately without backing up." (Wrong: always back up first if possible.)
- "Run CHKDSK without parameters." (Wrong: need /r to scan for bad sectors.)
- "Defragment the drive." (Wrong: defragmentation does not fix bad sectors and can stress a failing drive.)
- "Check the cables first." (Partially correct: always check cables, but if symptoms point to failure, diagnostics are needed.)
- "The drive is fine because S.M.A.R.T. status is PASS." (Wrong: S.M.A.R.T. can miss failures.)
3. Specific Numbers and Terms:
- S.M.A.R.T. attribute thresholds: Reallocated Sectors raw value > 0 is concerning.
- CHKDSK /r: the exact command.
- Event ID 7: bad block.
- TBW (terabytes written) for SSDs.
- 5400/7200 RPM for HDDs.
4. Edge Cases:
- SSD failure without warning: S.M.A.R.T. may show OK, then drive dies. Always have backups.
- Intermittent failure: Drive works sometimes, fails at other times. This can be due to thermal issues or bad sectors that are only accessed occasionally.
- RAID: A single drive failure in RAID 1 or 5 does not cause data loss, but the array is degraded. The exam may ask what to do when a drive in a RAID array fails.
5. Eliminate Wrong Answers:
- If the question mentions "clicking noise," eliminate any answer that suggests software fix or cable check as the primary solution. The correct answer is to back up and replace.
- If the question says "S.M.A.R.T. failure," eliminate answers that say "ignore it" or "run CHKDSK first." The correct answer is to back up and replace.
- If the question describes "slow performance with no other symptoms," drive failure is less likely; check other causes like fragmentation, insufficient RAM, or malware first.
Exam Tip: Always remember the order: back up data first, then diagnose, then replace. The exam loves to test this sequence.
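As a study aid, the elimination rules and the back-up-first order can be written down as a small decision function. This Python sketch is only a mnemonic built from the guidance in this chapter, not an official troubleshooting procedure; the symptom keywords and the `triage` function are hypothetical.

```python
# Symptoms that this chapter treats as strong evidence of drive failure.
STRONG_SIGNS = {"clicking", "grinding", "smart_failure"}


def triage(symptoms):
    """Return an ordered list of actions for a collection of symptom keywords."""
    symptoms = set(symptoms)
    if symptoms & STRONG_SIGNS:
        # Strong mechanical or S.M.A.R.T. evidence: protect the data first.
        return ["back up data", "run diagnostics", "replace drive"]
    if symptoms == {"slow_performance"}:
        # Slowness alone is weak evidence of drive failure.
        return ["check other causes first (fragmentation, RAM, malware)"]
    # Ambiguous symptoms: rule out simple causes before blaming the drive.
    return ["check cables", "check S.M.A.R.T. status", "run diagnostics"]


print(triage({"clicking", "slow_performance"}))
print(triage({"slow_performance"}))
print(triage({"bsod", "slow_performance"}))
```

Writing the rules this way makes the priority explicit: a strong sign overrides everything else, and slow performance only matters in combination with other symptoms.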
Clicking or grinding noise from an HDD indicates mechanical failure; back up and replace immediately.
S.M.A.R.T. failure status means the drive predicts imminent failure; do not ignore.
CHKDSK /r is the Windows command to scan for bad sectors and recover data.
Event ID 7 in Windows indicates a bad block on the disk.
SSDs can fail without warning; always maintain backups.
The first step when drive failure is suspected is to back up critical data.
Defragmentation does not fix bad sectors and can harm a failing drive.
Manufacturer diagnostic tools (e.g., SeaTools, Data Lifeguard) are more reliable than OS tools.
In a RAID array, a single drive failure does not cause data loss but degrades the array.
SSD endurance is rated in TBW (terabytes written); a drive that exceeds its TBW rating is more likely to fail.
These come up on the exam all the time. Here's how to tell them apart.
HDD (Hard Disk Drive)
Uses spinning platters and read/write heads.
Failure often preceded by mechanical noises (clicking, grinding).
S.M.A.R.T. attributes focus on reallocated sectors, spin-up time.
Susceptible to physical shock and vibration.
Slower random access due to seek time and rotational latency.
SSD (Solid-State Drive)
Uses NAND flash memory, no moving parts.
Failure often silent; drive may become read-only or disappear.
S.M.A.R.T. attributes focus on media wearout, total bytes written.
More resistant to physical shock but limited write endurance.
Fast random access; no seek time.
Mistake
Defragmenting a failing hard drive can fix bad sectors.
Correct
Defragmentation rearranges files to contiguous sectors to improve performance, but it does not repair physical defects. Running defrag on a failing drive can actually stress the drive further, potentially causing more damage. The correct action is to back up data and replace the drive.
Mistake
A clicking noise from a hard drive is normal and can be ignored.
Correct
Clicking is never normal for a healthy HDD. It indicates a mechanical problem, typically the actuator arm repeatedly trying to move but failing (often called the "click of death"). This is a sign of imminent failure and requires immediate backup and replacement.
Mistake
SSDs never fail, so they don't need backups.
Correct
SSDs can fail due to NAND wear, controller failure, or electrical issues. Unlike HDDs, they often fail without warning (e.g., sudden inability to write or complete failure). Regular backups are essential for both HDDs and SSDs.
Mistake
If S.M.A.R.T. status is PASS, the drive is perfectly healthy.
Correct
S.M.A.R.T. is not infallible. Some drives fail without any S.M.A.R.T. warning. Additionally, S.M.A.R.T. thresholds vary by manufacturer. A PASS status does not guarantee the drive is healthy; other symptoms should still be investigated.
Mistake
Running CHKDSK /r will repair physical bad sectors.
Correct
CHKDSK /r identifies bad sectors and marks them as unusable, but it cannot repair physical damage. It also attempts to recover readable data from those sectors. The underlying physical defect remains; the drive will continue to develop more bad sectors over time.
A clicking noise from an HDD typically indicates a mechanical failure, often called the "click of death." It usually means the actuator arm is repeatedly trying to move but failing, possibly due to a stuck pivot or damaged voice coil. This is a sign of imminent drive failure. You should immediately back up any accessible data and replace the drive. Do not ignore clicking noises.
Yes, S.M.A.R.T. is not 100% reliable. Some drives fail suddenly without any prior S.M.A.R.T. alerts. This is more common with SSDs, which can have sudden controller failures. For HDDs, mechanical failures like head crashes can happen without warning. Therefore, regular backups are essential even if S.M.A.R.T. status is PASS.
CHKDSK /f fixes logical file system errors only (e.g., incorrect file sizes, orphaned clusters). CHKDSK /r does everything /f does, plus it scans the entire disk for bad sectors and attempts to recover readable data from them. /r is more thorough but takes longer. For diagnosing hard drive failure, use /r.
You can check S.M.A.R.T. status using the command `wmic diskdrive get status` in Command Prompt. This returns "OK" for a healthy drive, "Pred Fail" when failure is predicted, or "Unknown". WMIC is deprecated in recent Windows releases, so it may not be available. For detailed attributes, use third-party tools like CrystalDiskInfo or HDDScan. In Windows 10/11, you can also use the built-in PowerShell command `Get-PhysicalDisk | Select-Object *` to see health and operational status.
The Media Wearout Indicator (attribute 231 or 233 on many drives; the ID is vendor-specific) shows the percentage of NAND life used. If it is high (e.g., 95% or more), the SSD is nearing end-of-life. Some vendors report remaining life instead, in which case a low value is the warning sign. Back up all data immediately and plan to replace the SSD. The drive may become read-only or fail completely soon.
Yes, a faulty SATA cable can cause intermittent disconnections, slow performance, and even CRC errors, mimicking a failing drive. Always check and reseat cables before concluding the drive is failing. However, if symptoms persist with a known good cable, the drive is likely failing. The exam may include cable issues as a distractor.
A bad sector is a sector that has been verified as defective and cannot be read or written. The drive firmware remaps it to a spare sector. A pending sector is a sector that the drive suspects might be bad but has not yet confirmed; it is reallocated if a later write to it fails, and cleared from the pending list if it can be read or rewritten successfully. A high count of pending sectors often precedes reallocated sectors.