SY0-701 | Chapter 4 of 212 | Objective 1.4

Hashing Algorithms

This chapter covers cryptographic hashing algorithms, a core concept in the General Security Concepts domain of the SY0-701 exam (Objective 1.4). Hashing is essential for ensuring data integrity, secure password storage, and digital signatures. Understanding how hashes work, their properties, and common attacks is critical for exam success and real-world security practice.

25 min read
Intermediate
Updated May 31, 2026

The Tamper-Evident Shipping Container

Imagine you're shipping a valuable painting across the ocean in a steel container. Before departure, you seal the container with a unique, tamper-evident lock. This lock isn't a key; it's a digital fingerprint of the entire container's contents. At the destination, the recipient recalculates that fingerprint using the same method. If even a single screw inside the container is turned, the fingerprint changes completely, and the recipient knows instantly that the container was tampered with. This is how cryptographic hashing works: a hash function takes any input data (the painting and packing) and produces a fixed-size string (the lock's seal). If the data changes by even one bit, the hash output changes dramatically (the avalanche effect). A hash is not encryption; it is a one-way function, so you cannot reverse the hash to get the original painting back. Attackers try to defeat the seal by finding two different inputs that produce the same hash (a collision), so that malicious data can be substituted undetected. Defenders use hashing to verify file integrity, store passwords securely (store the hash, not the password), and digitally sign documents. The mechanism is deterministic: the same input always yields the same hash, and a different input yields, for all practical purposes, a different hash. This is why hashing is fundamental to data integrity and authentication.

How It Actually Works

What is a Cryptographic Hash?

A cryptographic hash function is a mathematical algorithm that takes an input (or 'message') of arbitrary length and produces a fixed-size string of bytes called a digest. For practical purposes the digest acts as a unique fingerprint of the input: collisions must exist because the output space is finite, but a secure hash makes them infeasible to find. Even a tiny change in the input yields a completely different hash (the avalanche effect). Hashing is a one-way function: it is computationally infeasible to reverse the hash to recover the original input. This property makes hashes ideal for verifying data integrity without revealing the data itself.

Properties of a Secure Hash

For a hash function to be cryptographically secure, it must satisfy several properties:

Deterministic: The same input always produces the same hash.

Preimage Resistance: Given a hash, it should be infeasible to find any input that produces that hash.

Second Preimage Resistance: Given an input, it should be infeasible to find a different input that produces the same hash.

Collision Resistance: It should be infeasible to find any two different inputs that produce the same hash.

Avalanche Effect: A small change in input (e.g., flipping a single bit) should change about half of the output bits.
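
The determinism and avalanche properties above can be observed directly. The following is a minimal sketch using Python's standard `hashlib`; the sample message and the helper names `sha256_bits` and `differing_bits` are illustrative assumptions, not part of any standard.

```python
import hashlib


def sha256_bits(data: bytes) -> int:
    """Return the SHA-256 digest of data as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two inputs."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")


original = b"pay alice 100"
# Flip the lowest bit of the first byte: a one-bit change to the input.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

# Deterministic: hashing the same input twice gives the same digest.
assert hashlib.sha256(original).digest() == hashlib.sha256(original).digest()

# Avalanche: roughly half of the 256 output bits should change.
print(differing_bits(original, flipped), "of 256 bits differ")
```

The exact count varies from input to input, but for a well-behaved hash it should cluster around 128.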

Common Hashing Algorithms

MD5 (Message Digest 5): Produces a 128-bit hash. Designed by Ron Rivest in 1991. Collisions can be generated in seconds using a standard laptop. MD5 is considered broken and should not be used for security purposes. SY0-701 expects you to know MD5 is weak.

SHA-1 (Secure Hash Algorithm 1): Produces a 160-bit hash. Developed by the NSA. In 2017, Google and CWI Amsterdam demonstrated a practical collision attack (SHAttered). SHA-1 is deprecated and should not be used.

SHA-2 Family: Includes SHA-224, SHA-256, SHA-384, and SHA-512. SHA-256 produces a 256-bit hash, SHA-512 produces 512 bits. SHA-2 is currently secure and widely used. For SY0-701, SHA-256 is the recommended minimum.

SHA-3 Family: The latest SHA standard, released in 2015. Based on the Keccak algorithm. Offers different output sizes (SHA3-224, SHA3-256, SHA3-384, SHA3-512). SHA-3 is structurally different from SHA-2 and provides a backup in case SHA-2 is broken.

RIPEMD: A family of hash functions developed in Europe. RIPEMD-160 is the most common (160-bit output). Less widely used than SHA but still considered secure.
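
To make the output sizes listed above concrete, here is a short sketch that asks Python's `hashlib` for each algorithm's digest size. The algorithm names are spelled the way `hashlib` expects them; RIPEMD-160 is left out because its availability depends on the local OpenSSL build. The helper `digest_bits` is a name introduced for this example.

```python
import hashlib

# Names as hashlib spells them. MD5 and SHA-1 appear only for comparison.
ALGORITHMS = ["md5", "sha1", "sha256", "sha512", "sha3_256"]


def digest_bits(name: str) -> int:
    """Return the output size of a hashlib algorithm in bits."""
    return hashlib.new(name).digest_size * 8


for name in ALGORITHMS:
    digest = hashlib.new(name, b"hello").hexdigest()
    # Each hex character encodes 4 bits, so hex length = bits / 4.
    print(f"{name:9} {digest_bits(name):4} bits  {len(digest):3} hex chars")
```

Note that the output length stays the same regardless of how long the input is.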

How Hashing Works Mechanically

Take SHA-256 as an example. The algorithm processes data in 512-bit blocks:

1. Padding: A single 1 bit is appended to the message, followed by 0 bits until the length is congruent to 448 mod 512. The original message length in bits is then appended as a 64-bit integer, so the padded message is a whole number of 512-bit blocks.

2. Initialization: Eight 32-bit initial hash values (H0-H7) are set to specific constants.

3. Processing: Each 512-bit block is expanded into a message schedule of 64 32-bit words and then goes through 64 rounds of compression. The compression function uses bitwise operations (AND, OR, XOR, NOT), modular addition, and rotation and shift functions. After the final round, the result is added into the running hash values H0-H7.

4. Output: After all blocks are processed, the final hash is the concatenation of H0 through H7 (256 bits total).

This process ensures that any change in input propagates through many rounds, causing the avalanche effect.
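
The padding step is the easiest part of the algorithm to check by hand. The sketch below implements only that step, for byte-aligned messages; the function name `sha256_pad` is an assumption for this example, and the compression rounds are deliberately left out.

```python
import struct


def sha256_pad(message: bytes) -> bytes:
    """Apply SHA-256 padding: 0x80, zero bytes, then the 64-bit bit length."""
    bit_length = len(message) * 8
    # 0x80 is a single 1 bit followed by seven 0 bits.
    padded = message + b"\x80"
    # Add zero bytes until the length is 56 mod 64 bytes (448 mod 512 bits).
    padded += b"\x00" * ((56 - len(padded)) % 64)
    # Append the original length in bits as a big-endian 64-bit integer.
    return padded + struct.pack(">Q", bit_length)


padded = sha256_pad(b"abc")
blocks = [padded[i:i + 64] for i in range(0, len(padded), 64)]
print(len(padded), "bytes in", len(blocks), "block(s)")
```

A 55-byte message still fits in one block, while a 56-byte message spills into a second block because the 0x80 byte and the 8-byte length no longer fit.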

Hashing for Password Storage

Instead of storing plaintext passwords, systems store the hash. When a user logs in, the system hashes the entered password and compares it to the stored hash. To prevent rainbow table attacks (precomputed hash dictionaries), a salt is added: a random value unique to each user that is concatenated with the password before hashing. Even if two users have the same password, their salted hashes will differ.

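Key stretching (e.g., bcrypt, PBKDF2, Argon2) makes each password guess deliberately expensive. PBKDF2 does this by iterating an underlying hash many thousands of times (10,000 iterations was a common older figure; current guidance calls for far higher counts), bcrypt uses a configurable cost factor, and Argon2 adds a memory cost as well. The result is that brute-force attacks slow down dramatically. SY0-701 expects you to know that salting and key stretching are essential for secure password storage.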
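
As a rough illustration of salting plus key stretching, the sketch below uses `hashlib.pbkdf2_hmac` from the Python standard library. The function names, the 16-byte salt, and the iteration count are assumptions chosen for the example; a real deployment would pick its work factor from current guidance and its own hardware.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration only; tune it for your environment.
ITERATIONS = 600_000


def hash_password(password: str, iterations: int = ITERATIONS):
    """Return (salt, derived_key); the salt is random and unique per call."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, stored_key)


# The salt and key are stored together; the password itself is never stored.
salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))
```

Because the salt is generated per call, hashing the same password twice produces two different stored values, which is what defeats precomputed tables.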

Hashing for Data Integrity

File integrity is verified by comparing the hash of the current file to a known good hash (e.g., from the vendor's website). Any difference indicates tampering. Tools like sha256sum (Linux) or Get-FileHash (PowerShell) compute hashes.
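
The same check can be scripted. This is a minimal sketch, assuming Python's `hashlib`; `sha256_file` and `matches_published_hash` are hypothetical helper names, and the temporary file stands in for a real download.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare a file's SHA-256 to a published value, ignoring case and whitespace."""
    return sha256_file(path) == published_hex.strip().lower()


# Demo: a throwaway file plays the role of the downloaded artifact.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example download contents")
    demo_path = tmp.name

published = hashlib.sha256(b"example download contents").hexdigest()
intact = matches_published_hash(demo_path, published.upper() + "\n")

with open(demo_path, "ab") as handle:
    handle.write(b"!")  # simulate a one-byte modification
still_matches = matches_published_hash(demo_path, published)

os.remove(demo_path)
print(intact, still_matches)
```

The comparison is only as trustworthy as the source of the published value, so the known good hash should come from a channel the attacker cannot also modify.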

Digital Signatures

A digital signature uses hashing plus asymmetric encryption. The sender hashes the message and encrypts the hash with their private key (signing). The recipient decrypts the hash with the sender's public key and compares it to their own hash of the message. If they match, the message is authentic and untampered.
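
The hash-then-sign flow can be sketched with textbook RSA arithmetic. Everything about the key here is a toy assumption: the two primes are small known Mersenne primes, the digest is reduced modulo the key, and there is no signature padding, so this shows the data flow only and must not be used for real signing.

```python
import hashlib

# TOY PARAMETERS, for illustration only: the modulus is far too small for
# real use and real schemes add padding (for example RSA-PSS).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce the digest so it fits below the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Signer: transform the message digest with the private exponent."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Verifier: recover the digest with the public exponent and compare."""
    return pow(signature, E, N) == digest_as_int(message)


signature = sign(b"Transfer 100 to Alice")
print(verify(b"Transfer 100 to Alice", signature))   # original message
print(verify(b"Transfer 900 to Alice", signature))   # altered message
```

Only the digest is signed, which is why a collision in the hash function translates directly into a forged signature.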

Common Attacks on Hashing

Collision Attack: Attacker finds two different inputs that produce the same hash. Exploited to create fraudulent certificates (e.g., Flame malware used an MD5 collision).

Preimage Attack: Attacker finds an input that produces a given hash. Infeasible for secure hashes.

Rainbow Table Attack: Precomputed hash chains for cracking passwords. Mitigated by salting.

Length Extension Attack: Given H(M) and the length of M, an attacker can compute H(M || padding || extra) without knowing M. SHA-3 and HMAC are not vulnerable; SHA-256 is vulnerable in naive use.
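
The difference between the naive keyed hash and HMAC is easy to show side by side. The sketch below uses Python's standard `hmac` module; the key and the function names are assumptions for the example, and the naive construction is included only to mark the pattern to avoid.

```python
import hashlib
import hmac

SECRET_KEY = b"example-shared-secret"  # hypothetical key for illustration


def naive_tag(key: bytes, message: bytes) -> str:
    """Naive MAC, H(key || message): the pattern exposed to length extension."""
    return hashlib.sha256(key + message).hexdigest()


def hmac_tag(key: bytes, message: bytes) -> str:
    """HMAC-SHA256: the key is mixed in through two nested hash passes."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the HMAC and compare it to the received tag in constant time."""
    return hmac.compare_digest(hmac_tag(key, message), tag)


message = b"amount=100&to=alice"
tag = hmac_tag(SECRET_KEY, message)
print(verify_tag(SECRET_KEY, message, tag))
print(verify_tag(SECRET_KEY, message + b"&to=mallory", tag))
```

With the naive construction an attacker who sees a valid tag can extend the message and compute a matching tag without the key; with HMAC the extended message simply fails verification.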

Real Command Examples

Computing SHA-256 hash on Linux:

sha256sum file.txt

On Windows PowerShell:

Get-FileHash file.txt -Algorithm SHA256

Verifying a downloaded file (the checksum format puts two spaces between the hash and the file name):

echo "expected_hash  file.txt" | sha256sum --check

Standards and RFCs

MD5: RFC 1321

SHA-1: FIPS PUB 180-1

SHA-2: FIPS PUB 180-4

SHA-3: FIPS PUB 202

PBKDF2: RFC 2898 (obsoleted by RFC 8018)

HMAC: RFC 2104

SY0-701 may test your knowledge of which algorithms are deprecated (MD5, SHA-1) and which are current (SHA-2, SHA-3).

Walk-Through

1. Generate Original Hash

The defender (e.g., software vendor) computes a hash of the original file using a secure algorithm like SHA-256. This hash serves as the integrity baseline. The defender publishes the hash on a trusted website or alongside the download. Example command: `sha256sum original_software.iso > checksum.txt`. The output is a 64-character hexadecimal string for SHA-256. This step establishes a reference point. In a SOC scenario, the hash might be stored in a secure database or digitally signed to prevent tampering. The defender must ensure the hash itself is not altered; otherwise, integrity verification is meaningless.

2. Attacker Modifies Data

An attacker intercepts the file during transmission (e.g., man-in-the-middle) or gains unauthorized access to the distribution server. They modify the file to insert malware, backdoors, or other malicious payloads. The modified file now has different content. The attacker may also attempt to replace the published hash with a hash of the modified file to avoid detection. However, if the hash is signed or published on a separate secure channel, the attacker cannot easily alter it. The attacker's goal is to make the modified file appear legitimate, either by replacing the hash or by exploiting a collision. Note the limits of a collision attack: the practical attacks on MD5 and SHA-1 let an attacker craft two files of their own that share a hash. Producing a malicious file that matches the hash of an existing file the attacker did not create is a second preimage attack, which remains infeasible even for MD5. A collision is therefore only useful when the attacker can influence the original file before it is hashed or signed, as step 4 describes.

3. Recipient Verifies Integrity

The end user downloads the file and computes its hash using the same algorithm. On Linux: `sha256sum downloaded_software.iso`. On Windows: `Get-FileHash downloaded_software.iso -Algorithm SHA256`. The user compares the computed hash to the published hash (e.g., from the vendor's website). If they match, the file is intact. If they differ, the file has been altered or corrupted somewhere between the vendor and the user. The user should reject the file and report the discrepancy. In a SOC environment, automated tools like Tripwire or OSSEC continuously monitor file integrity by comparing current hashes to stored baselines. Any mismatch triggers an alert. The analyst must investigate the cause: was it a legitimate update, a disk error, or a security breach?
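
The baseline comparison that file integrity monitoring performs can be sketched in a few lines. This is a simplified illustration, not how Tripwire or OSSEC are implemented; the function names and the demo file names are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def hash_file(path: str) -> str:
    """Return the SHA-256 hex digest of a (small) file."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def build_baseline(root: str) -> dict:
    """Map each file's path relative to root to its SHA-256 hex digest."""
    baseline = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            baseline[os.path.relpath(full, root)] = hash_file(full)
    return baseline


def compare_to_baseline(root: str, baseline: dict) -> dict:
    """Report files that were modified, removed, or added since the baseline."""
    current = build_baseline(root)
    return {
        "modified": sorted(p for p in baseline if p in current and current[p] != baseline[p]),
        "missing": sorted(p for p in baseline if p not in current),
        "added": sorted(p for p in current if p not in baseline),
    }


# Demo in a throwaway directory: record a baseline, then change things.
with tempfile.TemporaryDirectory() as root:
    for name, data in [("app.bin", b"v1"), ("config.ini", b"debug=false")]:
        with open(os.path.join(root, name), "wb") as handle:
            handle.write(data)
    baseline = build_baseline(root)
    with open(os.path.join(root, "app.bin"), "wb") as handle:
        handle.write(b"v1-trojanized")
    with open(os.path.join(root, "dropper.exe"), "wb") as handle:
        handle.write(b"unexpected")
    report = compare_to_baseline(root, baseline)

print(report)
```

In practice the baseline itself has to be protected, for example by signing it or keeping it on read-only media, or an attacker can simply update it.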

4. Attack Exploits Weak Hash

If the defender used a weak hash algorithm like MD5 or SHA-1, the attacker may generate a collision. For example, in 2017, Google demonstrated a collision for SHA-1 (SHAttered) using 9 quintillion computations. The attacker creates two PDF files with different content (one benign, one malicious) that produce the same SHA-1 hash. The benign file is submitted for signing (e.g., code signing certificate). Once signed, the attacker swaps it with the malicious file, which has the same hash and thus the same signature. The signature remains valid, and the malicious file appears trusted. This attack undermines digital signatures and integrity verification. To mitigate, always use collision-resistant hashes like SHA-256 or SHA-3.

5. Defender Implements Strong Hashing

To defend against such attacks, the defender must use a secure hash algorithm (SHA-256 or higher) and protect the hash itself. Best practices include: (1) Using HMAC (Hash-based Message Authentication Code) with a secret key to prevent length extension attacks; (2) Digitally signing the hash to ensure authenticity; (3) Publishing hashes over HTTPS and on multiple trusted sources; (4) Using key stretching algorithms (bcrypt, PBKDF2, Argon2) for password hashing with a unique salt per user; (5) Regularly updating hashing algorithms as older ones become deprecated. In a SOC, analysts should verify that all critical files and configurations are hashed using SHA-256 or SHA-3 and that baseline hashes are stored in a tamper-proof manner (e.g., read-only media or signed database).

What This Looks Like on the Job

Scenario 1: Software Supply Chain Attack

A SOC analyst at a large enterprise receives an alert from the file integrity monitoring (FIM) system: the hash of a critical application binary has changed. The analyst pulls the current hash using sha256sum and compares it to the baseline stored in a signed database. The hashes do not match. The analyst isolates the affected system and investigates the change. The vendor's website shows the original hash, but the downloaded file from the company's internal mirror has a different hash. Further investigation reveals the mirror was compromised, and the binary was replaced with a trojanized version. The analyst blocks the malicious hash across the network using endpoint detection and response (EDR) tools and initiates incident response. Common mistake: ignoring the alert, assuming it was a legitimate software update. The correct response is to always verify the change against the vendor's official hash and investigate any discrepancy.

Scenario 2: Password Hash Dump

A penetration tester recovers a password hash file from a compromised server. The hashes are unsalted MD5. Using a rainbow table, the tester cracks 80% of the passwords within hours. The report recommends migrating to a salted key stretching algorithm such as bcrypt, PBKDF2, or Argon2; salted SHA-256 alone would defeat rainbow tables but is still fast enough to brute-force. The IT team implements salting with a 16-byte random salt per user and uses PBKDF2 with 100,000 iterations. After migration, even if the hash file is stolen again, every guess costs the attacker far more work, and strong passwords become impractical to recover. The SOC should monitor for any use of weak hashing algorithms in the environment and enforce strong password policies.
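
Why unsalted hashes fall so quickly can be shown with a plain lookup table. This sketch simplifies a rainbow table to an ordinary dictionary (real rainbow tables use hash chains to save space), and the password list and helper names are assumptions for illustration.

```python
import hashlib
import os

COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]  # tiny sample list


def md5_hex(data: bytes) -> str:
    """Unsalted MD5, used here only to reproduce the weak scheme."""
    return hashlib.md5(data).hexdigest()


# Built once by the attacker and reusable against every unsalted dump.
LOOKUP = {md5_hex(p.encode()): p for p in COMMON_PASSWORDS}


def crack_unsalted(stolen_hashes: list) -> dict:
    """Return the passwords recovered by a simple table lookup."""
    return {h: LOOKUP[h] for h in stolen_hashes if h in LOOKUP}


def salted_md5(password: str, salt: bytes) -> str:
    """Prefix a per-user salt so precomputed tables no longer apply."""
    return md5_hex(salt + password.encode())


stolen = [md5_hex(b"password"), md5_hex(b"correct horse battery staple")]
recovered = crack_unsalted(stolen)
print(recovered)  # only the common password is in the table

salt = os.urandom(16)
print(salted_md5("password", salt) in LOOKUP)
```

Salting removes the precomputation shortcut but not the speed of MD5, which is why the migration also needs a key stretching algorithm.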

Scenario 3: Digital Signature Verification Failure

An email security gateway receives a digitally signed email. The gateway computes the hash of the email body and compares it to the decrypted hash from the signature. They match, so the email is considered authentic. However, the gateway uses SHA-1 for the hash. An attacker who can get a benign message of their own crafting signed could prepare a colliding malicious message with the same SHA-1 hash, and the signature would validate for both. The gateway would accept the malicious email. The fix is to configure the gateway to reject SHA-1 signatures and require SHA-256 or higher. The SOC should audit all cryptographic configurations to ensure compliance with current standards.

How SY0-701 Actually Tests This

What SY0-701 Tests

Objective 1.4 (Explain the importance of using appropriate cryptographic solutions) covers hashing, salting, key stretching, and digital signatures. You must know:

The difference between hashing and encryption (one-way vs two-way).

Properties: collision resistance, preimage resistance, avalanche effect.

Specific algorithms: MD5 (weak, 128-bit), SHA-1 (weak, 160-bit), SHA-2 (secure, 256/384/512-bit), SHA-3 (secure).

Use cases: data integrity, password storage, digital signatures.

Salting and key stretching (bcrypt, PBKDF2, Argon2).

Common attacks: collision, rainbow table, length extension.

Common Wrong Answers

1. "Hashing is a form of encryption." Many candidates confuse hashing with encryption. Encryption is reversible with a key; hashing is one-way. The exam expects you to know they are different.

2. "MD5 is still secure for most purposes." MD5 is broken; collisions can be generated in seconds. Never choose MD5 as a secure option.

3. "SHA-1 is acceptable for digital signatures." SHA-1 is deprecated; the SHAttered attack proved collision feasibility. The exam will consider SHA-1 insecure.

4. "Salting is unnecessary if you use a strong hash." Salting prevents rainbow table attacks regardless of hash strength. Always salt passwords.

Specific Terms and Values

MD5: 128-bit output

SHA-1: 160-bit output

SHA-256: 256-bit output

SHA-512: 512-bit output

RIPEMD-160: 160-bit output

HMAC: Hash-based Message Authentication Code (uses a secret key)

PBKDF2, bcrypt, Argon2: key stretching algorithms

Trick Questions

A question may describe a scenario where a file's hash matches, but the file is malicious. This is a collision attack. The correct answer is to use a stronger hash algorithm (e.g., SHA-256 instead of MD5).

A question may ask about "reversing a hash." The correct answer is that it's infeasible for secure hashes; attackers use brute force or rainbow tables, not reversal.

"Which algorithm provides the best integrity?" Choose SHA-256 or SHA-3, not MD5 or SHA-1.

Decision Rule

On scenario questions involving hashing, first identify the use case (integrity, password storage, digital signature). Then evaluate the algorithm: if it's MD5 or SHA-1, it's likely the weak link. Look for missing salt or lack of key stretching for passwords. The correct answer will involve replacing weak algorithms with SHA-256 or SHA-3 and adding salt/key stretching for passwords.

Key Takeaways

Hashing is a one-way function used for integrity and password storage, not confidentiality.

MD5: 128-bit, broken, never use. SHA-1: 160-bit, deprecated, avoid. SHA-256/512: secure, use them.

Salting adds a random value to each password before hashing to prevent rainbow table attacks.

Key stretching (bcrypt, PBKDF2, Argon2) iterates hashing to slow brute-force attacks.

Digital signatures combine hashing with asymmetric encryption to provide integrity and non-repudiation.

Collision attacks find two inputs with the same hash; preimage attacks find an input for a given hash.

HMAC (Hash-based Message Authentication Code) uses a secret key to prevent length extension attacks.

Always verify file integrity by comparing hashes from a trusted source (e.g., vendor website over HTTPS).

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Hashing (SHA-256)

One-way function: cannot reverse output to input

No key required; output is deterministic

Used for integrity verification and password storage

Fixed output size (256 bits for SHA-256)

Relevant attack classes: collision and preimage (none practical against SHA-256)

Encryption (AES-256)

Two-way function: reversible with correct key

Requires a secret key for encryption and decryption

Used for confidentiality of data

Output size equals input size (with padding)

Relevant attack classes: brute force and side-channel

MD5

128-bit hash output

Collision attacks feasible in seconds

Deprecated and insecure

Used historically for file integrity

Not suitable for digital signatures

SHA-256

256-bit hash output

Collision resistant (no practical attacks)

Currently secure and widely used

Recommended for file integrity and signatures

Specified in FIPS 180-4

Plain Hashing (SHA-256)

No secret key used

Vulnerable to length extension attacks

Same hash for same input every time

Used for public integrity verification

Simple to compute

Keyed Hashing (HMAC-SHA256)

Mixes a secret key into the message through two nested hash passes (not simple concatenation)

Resistant to length extension attacks

Output depends on both message and key

Used for message authentication (MAC)

Provides authenticity in addition to integrity

Watch Out for These

Mistake

Hashing and encryption are the same thing.

Correct

Hashing is a one-way function that produces a fixed-size output from arbitrary input. Encryption is a two-way function that requires a key to encrypt and decrypt. Hashing is not reversible; encryption is.

Mistake

MD5 is still secure enough for non-critical applications.

Correct

MD5 is cryptographically broken. Collisions can be generated in seconds on modern hardware. MD5 should never be used for security purposes, regardless of perceived criticality.

Mistake

SHA-1 is still acceptable if you trust the source.

Correct

SHA-1 is deprecated due to practical collision attacks (e.g., SHAttered). Using SHA-1 for integrity verification or digital signatures is insecure. Use SHA-256 or higher.

Mistake

Salting passwords is optional if you use a strong hash like SHA-256.

Correct

Salting is essential regardless of hash strength. Without salt, identical passwords produce identical hashes, enabling rainbow table attacks. Salt ensures each hash is unique.

Mistake

A hash collision means the hash algorithm is completely broken.

Correct

All hash functions have collisions due to the pigeonhole principle (infinite inputs, finite outputs). A secure hash makes collisions computationally infeasible to find. When collisions can be found efficiently (as with MD5 and SHA-1), the algorithm is considered broken.
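
The pigeonhole argument can be demonstrated by shrinking the output space. The sketch below keeps only the first 16 bits of SHA-256, an assumption made purely so that a collision turns up quickly; the function names are illustrative.

```python
import hashlib


def truncated_hash(data: bytes, bits: int = 16) -> int:
    """Keep only the first `bits` bits of SHA-256 to simulate a tiny hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)


def find_collision(bits: int = 16):
    """Hash counter strings until two different inputs share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        candidate = str(counter).encode()
        value = truncated_hash(candidate, bits)
        if value in seen:
            # Return both inputs and how many hashes were computed.
            return seen[value], candidate, counter + 1
        seen[value] = candidate
        counter += 1


first, second, attempts = find_collision()
print(first, second, "collide after", attempts, "hashes")
```

With only 65,536 possible outputs a collision is guaranteed within 65,537 inputs and usually appears after a few hundred, which is the birthday effect. The full 256-bit output makes the same search infeasible.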

Frequently Asked Questions

What is the difference between hashing and encryption?

Hashing is a one-way function that converts data into a fixed-size digest that cannot be reversed. Encryption is a two-way function that uses a key to transform data into ciphertext that can be decrypted back to plaintext. Hashing ensures integrity; encryption ensures confidentiality. On the exam, if the scenario involves making data unreadable without a key, it's encryption. If it's about verifying that data hasn't changed, it's hashing.

Why is MD5 considered insecure?

MD5 is insecure because collision attacks can be performed in seconds on modern hardware. A collision means two different inputs produce the same hash. This allows an attacker to substitute a malicious file that has the same MD5 hash as a legitimate one. For SY0-701, remember that MD5 is broken and should never be used for security purposes. Always choose SHA-256 or higher.

What is a salt and why is it important?

A salt is a random value added to a password before hashing. It ensures that even if two users have the same password, their hashes will be different. Salting prevents rainbow table attacks, where an attacker uses precomputed hash-to-password mappings. On the exam, if a password storage scenario lacks a salt, that is a vulnerability. The correct answer will include using a unique salt per user.

What is the avalanche effect in hashing?

The avalanche effect means that a small change in the input (e.g., flipping a single bit) causes approximately half of the output bits to change. This property is crucial because it makes the hash unpredictable and ensures that similar inputs produce completely different hashes. The exam may test this concept in the context of strong hash function properties.

What is a length extension attack?

A length extension attack exploits the Merkle-Damgård construction used in MD5, SHA-1, and SHA-2. Given `H(M)` and the length of `M`, an attacker can compute `H(M || padding || extra)` without knowing `M`. This can break naive message authentication schemes. HMAC and SHA-3 are not vulnerable. For SY0-701, know that HMAC mitigates this attack.

Which hashing algorithm should I use for password storage?

For password storage, use a key stretching algorithm like bcrypt, PBKDF2, or Argon2, not a plain hash like SHA-256. These algorithms are designed to be slow, which makes each brute-force guess far more expensive. Always include a unique salt per user. On the exam, if the answer choices include bcrypt or PBKDF2, they are likely correct for password storage scenarios.

What is the difference between SHA-2 and SHA-3?

SHA-2 (SHA-256/512) is based on the Merkle-Damgård structure, while SHA-3 is based on the Keccak sponge construction. SHA-3 is the latest standard (FIPS 202) and is structurally different, providing a backup in case SHA-2 is broken. Both are currently secure. For the exam, know that SHA-3 is the newer standard but SHA-2 is still widely used and acceptable.
