220-1102 | Chapter 90 of 131 | Objective 3.1

Troubleshoot: Linux Common Issues

This chapter covers common Linux issues you may encounter as a CompTIA A+ technician, focusing on troubleshooting techniques for boot problems, application failures, user access issues, and system performance. Linux is widely used in servers and embedded systems, and the 220-1102 exam expects you to diagnose and resolve basic Linux problems. Approximately 5-10% of the Software Troubleshooting domain questions involve Linux, so mastering these concepts is essential for passing the exam.

Updated May 31, 2026

Linux Troubleshooting: A Detective's Toolkit

Imagine you are a detective investigating a crime scene in a large office building. The building's security system (Linux kernel) keeps logs of every door opening, alarm trigger, and badge swipe. Your first step is to check the central security console (system logs via journalctl or /var/log/). If you see a door that failed to lock, that's like a service failing to start. You might use a master key (root access) to open any door, but you need to be careful not to disturb evidence. To trace a specific person's movements, you'd look at the badge reader logs (application logs). If the network is down, it's like all the phones are dead—you check the main switchboard (network configuration with ip addr or ifconfig). Sometimes a door is stuck because a piece of paper is wedged in it (a misconfigured file in /etc/). You'd compare the current door's behavior to the building's blueprints (configuration files) to see what changed. If a service is running but not responding, it's like a security guard who is awake but ignoring his radio—you might need to restart him (restart the service). The detective's toolkit includes a magnifying glass (grep), a notepad (less), and a phone to call for backup (man pages or online forums). Each clue leads you closer to the root cause, and you document every step to avoid repeating work.

How It Actually Works

Understanding Linux Troubleshooting

Linux troubleshooting involves identifying and resolving issues that prevent the system from operating correctly. Common problem categories include boot failures, service crashes, file system errors, permission problems, and network connectivity issues. The exam focuses on command-line tools and system logs to diagnose problems efficiently.

Boot Process and Common Failures

The Linux boot process follows a sequence: BIOS/UEFI -> Bootloader (GRUB) -> Kernel -> Init system (systemd) -> Services. A failure at any stage can prevent the system from starting. Common boot issues include:

- Corrupted bootloader: GRUB may fail to load due to a missing configuration file (/boot/grub/grub.cfg) or a damaged Master Boot Record (MBR). You can reinstall GRUB from a live CD/USB with commands like grub-install /dev/sda.
- Kernel panic: Occurs when the kernel encounters a fatal error, often due to missing or corrupt kernel modules or hardware incompatibility. Check the kernel boot parameters or try booting an older kernel from the GRUB menu.
- File system errors: A dirty file system (e.g., ext4) can cause boot hangs. Use fsck to repair it, but run it from a live environment or single-user mode so the file system is not mounted read-write.
- Init system failure: systemd may fail to start critical services. Boot into recovery mode or use systemctl to check service status.

Service and Application Issues

Services managed by systemd can fail due to configuration errors, missing dependencies, or resource limits. Key commands:

- systemctl status <service>: Check if a service is running, stopped, or failed.
- journalctl -u <service>: View logs for a specific service.
- systemctl restart <service>: Restart a service.
- systemctl enable <service>: Enable a service to start at boot.
- systemctl list-units --type=service --state=failed: List all failed services.
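The first two commands are usually run together, so they can be wrapped in a small function. This is a minimal sketch, assuming a systemd host; triage_service is a hypothetical helper name, not a standard tool.

```shell
#!/usr/bin/env bash
# triage_service is a hypothetical helper, not part of systemd.
# It checks whether a unit is active and, if not, prints its status
# and the last 20 journal lines for that unit.
triage_service() {
    local unit="$1"
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is active"
        return 0
    fi
    echo "$unit is not active; status and recent log lines follow"
    systemctl status "$unit" --no-pager
    journalctl -u "$unit" -n 20 --no-pager
    return 1
}

# Example on a systemd host (the unit name is an assumption):
#   triage_service nginx
```

The non-zero return value lets a calling script decide whether to attempt a restart.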

Common causes of service failure:

- Port conflicts: A service may fail to bind to a port if another service is already using it. Use netstat -tulpn or ss -tulpn to check listening ports.
- Permission errors: The service user may not have read/write access to necessary files or directories. Check file ownership and permissions using ls -l.
- Configuration syntax errors: Many services (e.g., Apache, Nginx) can validate their configuration files. Run apachectl configtest or nginx -t to check.
- Resource exhaustion: Out of memory or file descriptors. Use free -h for memory, ulimit -n for file descriptor limits.
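For the port-conflict case, it helps to filter the ss listing down to one port. The sketch below assumes the usual ss -tulpn column layout (local address and port in the fifth field); port_listeners is a hypothetical helper.

```shell
#!/usr/bin/env bash
# port_listeners is a hypothetical helper. It reads `ss -tulpn` output on
# stdin and prints the sockets bound to one port, so a conflict is easy to spot.
port_listeners() {
    local port="$1"
    awk -v want="$port" 'NR > 1 {
        n = split($5, parts, ":")        # field 5 is Local Address:Port
        owner = (NF >= 7) ? $7 : "-"     # process column; needs root to be complete
        if (parts[n] == want) print $1, $5, owner
    }'
}

# Example on a live system:
#   sudo ss -tulpn | port_listeners 80
```

If the function prints a process other than the one you are trying to start, that process already holds the port.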

File System Issues

File system problems can cause data loss or system instability. Common issues:

- Disk full: Use df -h to check disk usage. If a partition is full, find large files with du -sh /* | sort -rh.
- Inode exhaustion: Even if disk space is available, lack of inodes can prevent new files. Check with df -i.
- Corrupted file system: Use fsck to repair. For ext4, fsck.ext4 -f /dev/sda1 forces a check. Always unmount the partition first.
- Mount failures: A mount point may fail due to incorrect /etc/fstab entries. Check with mount -a and review syslog.
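The disk-full and inode checks can share one filter. This is a rough sketch that reads df output in POSIX format (-P); full_filesystems and the 90% default threshold are illustrative choices, not values from the exam objectives.

```shell
#!/usr/bin/env bash
# full_filesystems is a hypothetical helper. It reads `df -P` output on stdin
# and prints mount points at or above a usage threshold (default 90, an
# arbitrary example value). Mount points containing spaces are not handled.
full_filesystems() {
    local threshold="${1:-90}"
    awk -v limit="$threshold" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)               # strip the trailing percent sign
        if (use + 0 >= limit) print $6, use "%"
    }'
}

# Typical use on a live system:
#   df -P  | full_filesystems 90     # block usage
#   df -Pi | full_filesystems 90     # inode usage
```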

User and Permission Issues

Linux enforces strict file permissions. Common problems:

- Permission denied: The user lacks read/write/execute access. Use chmod to change permissions or chown to change ownership.
- User cannot execute a script: The script must have execute permission (chmod +x script.sh) and the user must be able to read it.
- Sudo issues: The user may not be in the sudoers file. Add an entry with visudo, or add the user to the administrative group: usermod -aG wheel username on RHEL-family distributions, usermod -aG sudo username on Debian/Ubuntu.
- Password expiration: Users may be locked out if their password has expired. Use chage -l username to check and passwd username to reset.
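The script case can be checked in the same order the shell itself cares about: does the file exist, can it be read, can it be executed. A minimal sketch, with script_access as a hypothetical helper that tests access for the user running it:

```shell
#!/usr/bin/env bash
# script_access is a hypothetical helper. It reports the first reason the
# current user cannot run a script: missing, unreadable, or not executable.
script_access() {
    local path="$1"
    if [ ! -e "$path" ]; then echo "missing"; return 1; fi
    if [ ! -r "$path" ]; then echo "not readable"; return 1; fi
    if [ ! -x "$path" ]; then echo "not executable (try: chmod +x $path)"; return 1; fi
    echo "ok"
}

# Example (the path is an assumption):
#   script_access ./script.sh
```

Run it as the affected user, since the result depends on who is asking.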

Network Connectivity Issues

Network problems can prevent access to services. Key diagnostic commands:

- ip addr show: Display IP addresses and interface status.
- ping <host>: Test basic connectivity.
- traceroute <host>: Trace the path to a host.
- ss -tulpn: Show listening ports and associated processes.
- nslookup <domain> or dig <domain>: DNS resolution checks.
- curl -I <URL>: Check HTTP service availability.

Common issues:

- Interface down: Bring it up with ip link set <interface> up.
- Wrong IP configuration: Check DHCP or static settings in /etc/network/interfaces or /etc/sysconfig/network-scripts/.
- DNS resolution failure: Check /etc/resolv.conf for correct nameservers.
- Firewall blocking: Use iptables -L or firewall-cmd --list-all (if using firewalld) to check rules.
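A name that does not resolve and a host that does not answer need different fixes, so it is worth testing them separately. The sketch below assumes getent and the iputils version of ping are installed; check_host is a hypothetical helper.

```shell
#!/usr/bin/env bash
# check_host is a hypothetical helper. It tests name resolution first and
# reachability second, and reports which layer failed.
# Note: some hosts drop ICMP, so "no-reply" is a hint rather than proof.
check_host() {
    local host="$1"
    if ! getent ahosts "$host" > /dev/null; then
        echo "dns-failure: check /etc/resolv.conf"
        return 1
    fi
    if ! ping -c 1 -W 2 "$host" > /dev/null 2>&1; then
        echo "no-reply: check interface, route, and firewall"
        return 2
    fi
    echo "reachable"
}

# Example (the host name is an assumption):
#   check_host www.example.com
```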

Performance Issues

Performance problems can manifest as slow response or high load. Tools:

- top or htop: View processes consuming CPU/memory.
- free -h: Memory usage.
- iostat: Disk I/O statistics.
- vmstat: Summary of processes, memory, swap, block I/O, and CPU activity.
- dmesg: Kernel ring buffer messages, often revealing hardware errors.

Common causes:

- CPU bottleneck: High CPU usage from a runaway process. Use top to find and kill it.
- Memory leak: A process consumes increasing memory over time. Use ps aux --sort=-%mem to identify.
- Disk I/O saturation: High disk wait time. Use iostat -x 1 to monitor.
- Swap usage: Excessive swapping indicates low memory. Check with swapon --show.
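To put a number on "low memory", the MemAvailable figure in /proc/meminfo can be expressed as a percentage of total RAM. This is a small sketch; mem_available_pct is a hypothetical helper and it assumes a kernel that reports MemAvailable.

```shell
#!/usr/bin/env bash
# mem_available_pct is a hypothetical helper. It reads /proc/meminfo-style
# text on stdin and prints available memory as a whole-number percentage.
mem_available_pct() {
    awk '$1 == "MemTotal:"     { total = $2 }
         $1 == "MemAvailable:" { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

# Example on a live system:
#   mem_available_pct < /proc/meminfo
```

A low percentage together with active swap in swapon --show suggests memory pressure rather than a CPU or disk bottleneck.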

Log Files and System Monitoring

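Logs are crucial for troubleshooting. Primary logs:

- /var/log/messages (RHEL-family) or /var/log/syslog (Debian/Ubuntu): General system messages.
- /var/log/auth.log (Debian/Ubuntu) or /var/log/secure (RHEL-family): Authentication logs.
- /var/log/kern.log: Kernel logs.
- /var/log/boot.log: Boot messages.
- /var/log/dmesg: Snapshot of the kernel ring buffer saved at boot on some distributions; run dmesg for the live buffer.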

Use journalctl on systemd-based systems for unified logging. Examples:

- journalctl -xe: Jump to the end of the journal and add explanatory text where available.
- journalctl -p err: Show entries at priority err and more severe (crit, alert, emerg).
- journalctl --since "1 hour ago": Show logs from the last hour.
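These filters combine, which is often more useful than any one alone. The following sketch wraps the priority and time filters with an optional unit filter; recent_errors is a hypothetical helper and the one-hour default is only an example.

```shell
#!/usr/bin/env bash
# recent_errors is a hypothetical helper. It shows journal entries at
# priority err or worse since a given time, optionally for one unit.
recent_errors() {
    local since="${1:-1 hour ago}" unit="${2:-}"
    if [ -n "$unit" ]; then
        journalctl -p err --since "$since" --no-pager -u "$unit"
    else
        journalctl -p err --since "$since" --no-pager
    fi
}

# Examples (the unit name is an assumption):
#   recent_errors
#   recent_errors "30 min ago" sshd
```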

Recovery and Repair Tools

When the system is unbootable, use a live CD/USB to:

Mount the root filesystem: mount /dev/sda1 /mnt

Bind-mount /dev, /proc, and /sys under /mnt, then chroot into the system: chroot /mnt

Repair the bootloader: grub-install /dev/sda

Reset passwords: passwd root

Edit configuration files.
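The bootloader repair steps above can be sketched as one function. This is a dry-run sketch under stated assumptions: a BIOS/MBR layout, the example devices /dev/sda1 and /dev/sda, and a distribution where the command is named grub-install (RHEL-family systems call it grub2-install). run and repair_bootloader are hypothetical helpers, and commands are only printed unless DRY_RUN is set to 0.

```shell
#!/usr/bin/env bash
# run and repair_bootloader are hypothetical helpers; device names are examples.
# With DRY_RUN=1 (the default here) each command is printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repair_bootloader() {
    local root_part="$1" disk="$2" target="${3:-/mnt}" fs
    run mount "$root_part" "$target" || return 1
    for fs in dev proc sys; do
        # grub-install inside the chroot needs the live system's device nodes
        run mount --bind "/$fs" "$target/$fs" || return 1
    done
    run chroot "$target" grub-install "$disk" || return 1
    run umount -R "$target"
}

# Print the plan first, then rerun as root with DRY_RUN=0 from the live environment:
#   repair_bootloader /dev/sda1 /dev/sda
```

Printing the plan before executing it gives you a chance to confirm the device names with lsblk first.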

Common Commands Summary

lsblk: List block devices.

fdisk -l: Partition table.

blkid: UUID of filesystems.

lspci: PCI devices.

lsusb: USB devices.

uname -a: Kernel version.

cat /etc/os-release: Distribution info.

Exam Focus: Key Values and Defaults

GRUB configuration: /boot/grub/grub.cfg (auto-generated; user settings go in /etc/default/grub)

systemd unit files: /etc/systemd/system/ and /usr/lib/systemd/system/

Default runlevels: replaced by systemd targets (multi-user.target corresponds to runlevel 3, graphical.target to runlevel 5)

File system types: ext4, xfs, btrfs

Swap space: Typically 1-2 times RAM for systems with less than 2GB RAM.

Sudoers file: /etc/sudoers (edit with visudo)

Network configuration: /etc/sysconfig/network-scripts/ifcfg-* (RHEL/CentOS) or /etc/network/interfaces (Debian/Ubuntu)

Troubleshooting Methodology

1. Identify the symptom.

2. Gather information from logs and commands.

3. Form a hypothesis.

4. Test the hypothesis.

5. Implement a fix.

6. Verify resolution.

7. Document the solution.

Always start with the simplest possible cause (e.g., check if the service is running) before diving into complex kernel debugging.

Walk-Through

1. Identify the Symptom

Begin by clearly defining the problem. Is the system not booting? Is a service failing? Are users unable to log in? Use precise language like 'The web server returns a 503 error' or 'The system hangs at GRUB prompt.' This step narrows down the scope and guides your initial investigation. For example, if a user cannot SSH in, the issue could be network, firewall, SSH configuration, or authentication. Document the exact error message if available.

2. Check System Logs

Logs are your best friend. Use `journalctl -xe` for recent errors or `journalctl -p err` to see only entries at error priority or worse. For boot issues, check `dmesg` or `/var/log/boot.log`. For service issues, use `journalctl -u <service>`. For authentication, check `/var/log/auth.log` (`/var/log/secure` on RHEL-family systems). Logs often contain the exact error message and timestamp, which helps correlate with other events. For example, a 'Permission denied' error in logs points to file permission issues.

3. Verify System Resources

Check if the system has adequate resources. Run `df -h` for disk space, `free -h` for memory, and `top` for CPU usage. A full disk can cause services to fail to write logs or create temporary files. Low memory may cause OOM (Out of Memory) killer to terminate processes. Use `dmesg | grep -i oom` to see if the OOM killer was invoked. Also check inode usage with `df -i` as inode exhaustion can prevent new files even with free space.

4. Inspect Configuration Files

Many issues stem from misconfigured files. For services, check their configuration files in `/etc/`. Use `grep -v '^#' <config>` to see non-commented lines. For network issues, check `/etc/network/interfaces` or `/etc/sysconfig/network-scripts/ifcfg-*`. For sudo issues, check the syntax of `/etc/sudoers` with `visudo -c` and make changes only through `visudo`. Common errors include typos, wrong paths, or incorrect permissions on the config file itself (usually 644, although `/etc/sudoers` is normally 0440).
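The plain `grep -v '^#'` filter still prints blank lines and indented comments. A slightly stricter variant is sketched below; active_lines is a hypothetical helper, and it assumes the file uses `#` for comments (some formats use `;` instead).

```shell
#!/usr/bin/env bash
# active_lines is a hypothetical helper. It prints only the lines of a config
# file that are neither blank nor comments, including indented "#" comments.
active_lines() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Example (the path is an assumption):
#   active_lines /etc/ssh/sshd_config
```

With the noise removed, a typo or a duplicated directive is much easier to spot.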

5. Test Connectivity and Dependencies

If the issue involves networking, test connectivity with `ping`, `traceroute`, and `nslookup`. For services, ensure dependencies are running: e.g., a web server may depend on a database. Use `systemctl list-dependencies <service>` to see required units. Also check if the necessary ports are open with `ss -tulpn`. If a service fails to start due to a missing dependency, install the required package or start the dependency first.

6. Implement and Verify Fix

Based on your findings, apply the fix. This could be restarting a service (`systemctl restart <service>`), editing a config file, fixing permissions (`chmod`/`chown`), or freeing disk space. After the fix, verify that the symptom is resolved. For example, if you fixed a web server config, test with `curl -I http://localhost`. If the fix doesn't work, revert the change and try another hypothesis. Document the solution for future reference.
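Reverting a change is only easy if a copy of the original exists. One possible approach is a timestamped backup taken before each edit; backup_config is a hypothetical helper and the .bak suffix is just a convention chosen for this sketch.

```shell
#!/usr/bin/env bash
# backup_config is a hypothetical helper. It copies a file next to itself with
# a timestamp suffix, preserving mode and ownership, and prints the copy's path.
backup_config() {
    local file="$1" stamp
    stamp=$(date +%Y%m%d-%H%M%S)
    cp -p "$file" "$file.bak.$stamp" && echo "$file.bak.$stamp"
}

# Example (the path is an assumption):
#   bak=$(backup_config /etc/nginx/nginx.conf)
#   ...edit and test; if the fix does not help, revert:
#   cp -p "$bak" /etc/nginx/nginx.conf
```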

What This Looks Like on the Job

In enterprise environments, Linux servers are often deployed as web servers, database servers, or application servers. A common scenario is a web server (Apache/Nginx) that suddenly returns 502 Bad Gateway errors. The root cause could be that the backend PHP-FPM service has crashed or run out of worker processes. As a sysadmin, you would check the web server error logs (/var/log/apache2/error.log) and the PHP-FPM logs (/var/log/php-fpm.log). You might see 'WARNING: [pool www] server reached pm.max_children setting', indicating that the process manager has hit its worker limit. Increasing pm.max_children in /etc/php-fpm.d/www.conf and restarting PHP-FPM resolves the issue. Performance considerations include monitoring the number of concurrent connections and adjusting the pool settings accordingly. Setting pm.max_children too high is also a misconfiguration, because the workers can then exhaust the server's memory.
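A common rule of thumb for choosing pm.max_children, offered here as an assumption rather than anything the exam specifies, is to divide the memory you are willing to give PHP-FPM by the typical size of one worker. The sketch below does only that arithmetic; estimate_max_children is a hypothetical helper and both inputs are values you would measure on your own server.

```shell
#!/usr/bin/env bash
# estimate_max_children is a hypothetical helper implementing a rule of thumb:
# workers = memory budget for PHP-FPM / average memory per worker (both in MB).
estimate_max_children() {
    local budget_mb="$1" worker_mb="$2"
    if [ "$worker_mb" -le 0 ]; then
        echo "worker size must be positive" >&2
        return 1
    fi
    echo $(( budget_mb / worker_mb ))
}

# Example with made-up numbers: 2048 MB budget, 64 MB per worker
#   estimate_max_children 2048 64    # prints 32
```

Treat the result as a starting point and watch memory use after the change, since worker size varies with the application.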

Another scenario is a database server (MySQL/MariaDB) that becomes unresponsive. The issue could be a corrupted table or a full disk. Using df -h reveals 100% disk usage on the partition containing the database files. You find large log files in /var/log/mysql/ that need to be rotated or deleted. After freeing space, you use mysqlcheck -u root -p --auto-repair --all-databases to repair corrupted tables. In production, you would set up log rotation with logrotate to prevent this in the future.

A third scenario involves user authentication issues on a central LDAP server. Users report being unable to log in to their workstations. Checking /var/log/auth.log shows 'pam_ldap: error trying to bind' indicating the LDAP server is unreachable. You use ldapsearch -x -H ldap://ldap.example.com -b dc=example,dc=com to test connectivity. If it fails, you check network connectivity and firewall rules. The issue might be that the LDAP server's IP changed without updating the client configuration in /etc/ldap.conf. Updating the configuration and restarting nscd or sssd resolves the issue.

How 220-1102 Actually Tests This

The 220-1102 exam tests Linux troubleshooting under Objective 3.1: Given a scenario, troubleshoot common software problems. Specific areas include:

- Boot issues: Know how to repair GRUB, use recovery mode, and interpret kernel panics. Common wrong answer: choosing 'reinstall the kernel' when the issue is a corrupt bootloader. Candidates often confuse GRUB with LILO (legacy). The exam expects you to know that grub-install repairs the bootloader.
- Service management: Understand systemd commands (systemctl, journalctl). A common trap: using the service command instead of systemctl on modern distributions. The exam may present a scenario where a service fails to start due to a missing configuration file; candidates might incorrectly check rc.local instead of the unit file.
- File system issues: Know fsck and its options. A wrong answer is to run fsck on a mounted partition without unmounting first, which can cause data loss. The exam emphasizes that fsck should be run from a live environment or single-user mode.
- Permission problems: Understand chmod, chown, and sudo configuration. A common mistake is to use chmod 777 as a quick fix, but the exam expects proper permission settings (e.g., 755 for directories, 644 for files). Another trap: confusing the sudoers file location (/etc/sudoers) with /etc/sudo.conf.
- Log analysis: Know the default log locations and how to use journalctl. Candidates often look in /var/log/messages when the system uses systemd and should use journalctl instead.
- Network troubleshooting: Use ip, ss, ping, traceroute. A wrong answer is to use ifconfig (deprecated) when ip addr is the modern equivalent.

Edge cases the exam loves:

A system that boots to a GRUB prompt: You need to manually load the kernel (linux /boot/vmlinuz-... root=/dev/sda1), load the matching initrd, and then run boot.

A service that fails to start due to a typo in the unit file: Use systemctl daemon-reload after editing.

A user who cannot sudo because they are not in the sudoers file: Add them with visudo.

To eliminate wrong answers, focus on the underlying mechanism. For example, if a service fails to start, check logs first (not immediately reinstall). If a file cannot be read, check permissions and ownership before assuming corruption.

Key Takeaways

Use `journalctl -xe` for detailed error messages on systemd systems.

Repair a corrupted bootloader with `grub-install /dev/sda` from a live environment.

Check disk space with `df -h` and inodes with `df -i`.

Restart services with `systemctl restart <service>`.

Edit the sudoers file using `visudo` to avoid syntax errors.

Use `ip addr` instead of deprecated `ifconfig`.

Never run `fsck` on a mounted filesystem; unmount first.

Check service logs with `journalctl -u <service>`.

A kernel panic may be resolved by booting an older kernel.

Configuration files for services are typically in `/etc/`.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

systemd

Uses `systemctl` for service management.

Uses `journalctl` for logging.

Supports parallel service startup for faster boot.

Uses unit files with .service extension.

Common on RHEL 7+, Ubuntu 15.04+, Debian 8+.

SysV init

Uses `service` and `chkconfig` commands.

Logs to `/var/log/` files like syslog.

Starts services sequentially.

Uses init scripts in `/etc/init.d/`.

Common on older distributions like RHEL 5/6, Debian 7.

Watch Out for These

Mistake

Linux always uses `ifconfig` to configure network interfaces.

Correct

`ifconfig` is deprecated on many modern distributions. The `ip` command from the iproute2 package is the modern replacement. For the exam, know both but prefer `ip addr` and `ip link`.

Mistake

Running `fsck` on a mounted filesystem is safe.

Correct

Running `fsck` on a mounted filesystem can cause severe corruption. Always unmount the partition first or boot from a live CD. The exam expects you to know this safety rule.

Mistake

GRUB configuration is stored in `/boot/grub/grub.conf`.

Correct

The main configuration file is `/boot/grub/grub.cfg` (auto-generated) or `/etc/default/grub` for user settings. The legacy `grub.conf` may exist on older systems, but modern distributions use `grub.cfg`.

Mistake

The `service` command works on all Linux distributions.

Correct

The `service` command is native to SysV init systems. Modern distributions using systemd manage services with `systemctl`; where `service` is still present, it is a compatibility wrapper. The exam tests both, but systemd is more common.

Mistake

A kernel panic always requires a full reinstall.

Correct

A kernel panic can often be resolved by booting into an older kernel from GRUB, checking hardware, or reinstalling the kernel package. A full reinstall is rarely necessary.


Frequently Asked Questions

How do I fix GRUB bootloader failure in Linux?

If GRUB fails to load, boot from a live CD/USB, mount your root partition (e.g., `mount /dev/sda1 /mnt`), bind-mount `/dev`, `/proc`, and `/sys` under `/mnt`, chroot into it (`chroot /mnt`), then reinstall GRUB with `grub-install /dev/sda` (`grub2-install` on RHEL/CentOS). If you need to regenerate the configuration, run `update-grub` (on Debian/Ubuntu) or `grub2-mkconfig -o /boot/grub2/grub.cfg` (on RHEL/CentOS). This restores the bootloader without reinstalling the OS.

What commands can I use to check why a service failed to start?

First, check the service status with `systemctl status <service>`. This shows if it failed and often gives a reason. For more details, use `journalctl -u <service>` to see the service's log output. You can also check `/var/log/messages` or `/var/log/syslog` for related entries. Common causes include missing dependencies, permission errors, or configuration syntax errors.

How do I troubleshoot permission denied errors in Linux?

Use `ls -l` to check file permissions and ownership. If the user doesn't have read/write/execute permission, use `chmod` to adjust (e.g., `chmod 644 file`). If ownership is wrong, use `chown user:group file`. For directories, ensure the execute bit is set for traversal. If the error occurs when running a script, ensure the script has execute permission (`chmod +x script.sh`). For sudo issues, verify the user is in the sudoers file via `visudo`.

What is the difference between `systemctl` and `service` commands?

`systemctl` is used on systems with systemd (modern distributions), while `service` is used on SysV init systems (older distributions). Both manage services, but `systemctl` offers more features like `enable`/`disable` and `daemon-reload`. On systemd systems, `service` is often kept as a wrapper that passes the request to `systemctl` for compatibility, but the exam expects you to know the native command. For example, `systemctl start httpd` vs `service httpd start`.

How can I recover a lost root password in Linux?

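Boot into recovery mode or single-user mode. For GRUB, press 'e' at the GRUB menu, find the line starting with `linux`, append `init=/bin/bash` (or `rd.break` on RHEL-family systems that use dracut), then boot. With `init=/bin/bash` you get a root shell: remount the root filesystem read-write (`mount -o remount,rw /`), then run `passwd` to reset the root password. With `rd.break` the real root is mounted read-only at `/sysroot`, so remount it read-write (`mount -o remount,rw /sysroot`) and run `chroot /sysroot` before `passwd`; on SELinux systems also run `touch /.autorelabel` so the changed file is relabeled on the next boot. After rebooting, you can log in with the new password.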

What should I do if a Linux system is running out of disk space?

Use `df -h` to identify which partition is full. Then use `du -sh /* | sort -rh` to find large directories. Common culprits include log files in `/var/log`, temporary files in `/tmp`, and user home directories. You can delete unnecessary files, rotate logs with `logrotate`, or move data to another partition. If the root partition is full, you may need to boot from a live CD to free space.

How do I check if a specific port is open on a Linux server?

Use `ss -tulpn | grep :<port>` to see if a service is listening on that port. Alternatively, use `netstat -tulpn` (if installed). For external checks, use `nmap` from another machine: `nmap -p <port> <server_ip>`. If the port is not open, check if the service is running (`systemctl status <service>`) and if firewall rules are blocking it (`iptables -L` or `firewall-cmd --list-all`).
