This chapter covers cryptographic hashing algorithms, a core concept in the General Security Concepts domain of the SY0-701 exam (Objective 1.4). Hashing is essential for ensuring data integrity, secure password storage, and digital signatures. Understanding how hashes work, their properties, and common attacks is critical for exam success and real-world security practice.
Imagine you're shipping a valuable painting across the ocean in a steel container. Before departure, you seal the container with a unique, tamper-evident seal. The seal is not a key that opens anything; it is a fingerprint of the container's entire contents. At the destination, the recipient recalculates that fingerprint using the same method. If even a single screw inside the container has been turned, the fingerprint changes completely, and the recipient knows at once that the container was tampered with. This is how cryptographic hashing works: a hash function takes any input data (the painting and packing) and produces a fixed-size string (the seal). If the data changes by even one bit, the hash output changes dramatically (the avalanche effect). A hash is not encryption; it is a one-way function, so you cannot reverse the hash to get the original painting. Attackers try to defeat this by finding two different inputs that produce the same hash (a collision), which would let them substitute malicious data undetected. Defenders use hashing to verify file integrity, store passwords securely (store the hash, not the password), and digitally sign documents. The mechanism is deterministic: the same input always yields the same hash, and with a secure function any change to the input yields an unrelated-looking hash. This is why hashing is fundamental to data integrity and authentication.
What is a Cryptographic Hash?
A cryptographic hash function is a mathematical algorithm that takes an input (or 'message') of arbitrary length and produces a fixed-size output called a digest. The digest acts as a fingerprint of the input: collisions must exist in theory because the output size is fixed, but for a secure function they are infeasible to find, and even a tiny change in the input yields a completely different hash (the avalanche effect). Hashing is a one-way function: it is computationally infeasible to reverse the hash to recover the original input. This property makes hashes ideal for verifying data integrity without revealing the data itself.
Properties of a Secure Hash
For a hash function to be cryptographically secure, it must satisfy several properties:
Deterministic: The same input always produces the same hash.
Preimage Resistance: Given a hash, it should be infeasible to find any input that produces that hash.
Second Preimage Resistance: Given an input, it should be infeasible to find a different input that produces the same hash.
Collision Resistance: It should be infeasible to find any two different inputs that produce the same hash.
Avalanche Effect: A small change in input (e.g., flipping a single bit) should change about half of the output bits.
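The determinism and avalanche properties above can be observed directly. The following is a minimal sketch using Python's standard `hashlib`; the sample message and the helper names `sha256_bits` and `differing_bits` are illustrative assumptions, not part of any standard.

```python
import hashlib

def sha256_bits(data: bytes) -> int:
    """Return the SHA-256 digest of `data` as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two digests."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")

original = b"transfer $100 to Alice"  # hypothetical sample message
# Flip the lowest bit of the first byte: 't' (0x74) becomes 'u' (0x75).
flipped = bytes([original[0] ^ 0x01]) + original[1:]

# Deterministic: hashing the same input twice gives the same digest.
same_every_time = (
    hashlib.sha256(original).digest() == hashlib.sha256(original).digest()
)

# Avalanche: a one-bit input change should flip roughly half of the 256 bits.
changed = differing_bits(original, flipped)
print(f"deterministic: {same_every_time}, bits changed: {changed} of 256")
```

The exact count varies by input, but it should land near 128, which is what "about half of the output bits" means in practice.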
Common Hashing Algorithms
MD5 (Message Digest 5): Produces a 128-bit hash. Designed by Ron Rivest in 1991. Collisions can be generated in seconds using a standard laptop. MD5 is considered broken and should not be used for security purposes. SY0-701 expects you to know MD5 is weak.
SHA-1 (Secure Hash Algorithm 1): Produces a 160-bit hash. Developed by the NSA. In 2017, researchers from Google and CWI Amsterdam demonstrated a practical collision attack (SHAttered). SHA-1 is deprecated and should not be used.
SHA-2 Family: Includes SHA-224, SHA-256, SHA-384, and SHA-512. SHA-256 produces a 256-bit hash, SHA-512 produces 512 bits. SHA-2 is currently secure and widely used. For SY0-701, SHA-256 is the recommended minimum.
SHA-3 Family: The latest SHA standard, released in 2015. Based on the Keccak algorithm. Offers different output sizes (SHA3-224, SHA3-256, SHA3-384, SHA3-512). SHA-3 is structurally different from SHA-2 and provides a backup in case SHA-2 is broken.
RIPEMD: A family of hash functions developed in Europe. RIPEMD-160 is the most common (160-bit output). It is less widely used than SHA and has no known practical attacks, although its 160-bit output leaves a smaller security margin than SHA-256.
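The output sizes listed above can be checked programmatically. This is a small sketch, assuming a standard Python build where `hashlib` exposes these algorithm names (FIPS-restricted builds may refuse `md5`); RIPEMD-160 is omitted because its availability depends on the underlying OpenSSL build.

```python
import hashlib

# Algorithm names as hashlib spells them.
ALGORITHMS = ["md5", "sha1", "sha256", "sha512", "sha3_256"]

def digest_bits(name: str) -> int:
    """Return the output length in bits for a named hashlib algorithm."""
    return hashlib.new(name).digest_size * 8

sizes = {name: digest_bits(name) for name in ALGORITHMS}
for name, bits in sizes.items():
    print(f"{name:9s} {bits:4d} bits")
```

Note that output size alone does not indicate strength: MD5 and SHA-1 are broken because of structural weaknesses, not only because their digests are short.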
How Hashing Works Mechanically
Take SHA-256 as an example. The algorithm processes data in 512-bit blocks:
Padding: The input message is padded with a single 1 bit followed by 0 bits until its length is congruent to 448 mod 512. The original length in bits is then appended as a 64-bit integer, which completes the final 512-bit block.
Initialization: Eight 32-bit initial hash values (H0-H7) are set to specific constants.
Processing: Each 512-bit block is first expanded into a message schedule of 64 32-bit words, then run through 64 rounds of compression. The compression function uses bitwise operations (AND, OR, XOR, NOT), modular addition, and rotation functions.
Output: After all blocks are processed, the final hash is the concatenation of H0 through H7 (256 bits total).
This process ensures that any change in input propagates through many rounds, causing the avalanche effect.
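The padding step is the easiest part of this process to reproduce by hand. The sketch below computes only the padding bytes, assuming the message is a whole number of bytes; `sha256_padding` is a hypothetical helper name, and the compression rounds are left to `hashlib`.

```python
def sha256_padding(message_len: int) -> bytes:
    """Return the padding SHA-256 appends to a message of `message_len` bytes."""
    # A single 1 bit (0x80 as a byte), then zero bytes until the length
    # is 56 mod 64 bytes, which is 448 mod 512 bits.
    zero_bytes = (55 - message_len) % 64
    # Finally, the original length in bits as a 64-bit big-endian integer.
    length_field = (message_len * 8).to_bytes(8, "big")
    return b"\x80" + b"\x00" * zero_bytes + length_field

for n in (0, 3, 55, 56, 64):
    padded = n + len(sha256_padding(n))
    print(f"{n:3d}-byte message -> {padded // 64} block(s)")
```

A 55-byte message still fits in one block, while a 56-byte message spills into a second block because the 1 bit and the 64-bit length field no longer fit.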
Hashing for Password Storage
Instead of storing plaintext passwords, systems store the hash. When a user logs in, the system hashes the entered password and compares it to the stored hash. To prevent rainbow table attacks (precomputed hash dictionaries), a salt is added: a random value unique to each user that is concatenated with the password before hashing. Even if two users have the same password, their salted hashes will differ.
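Key stretching algorithms (e.g., bcrypt, PBKDF2, Argon2) deliberately make each password guess expensive to slow down brute-force attacks. PBKDF2 does this by iterating a keyed hash many times (e.g., 10,000 iterations or more), bcrypt uses a tunable cost factor, and Argon2 adds memory hardness. SY0-701 expects you to know that salting and key stretching are essential for secure password storage.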
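Salting and key stretching can be combined in a few lines. The following is a minimal sketch using PBKDF2 from Python's standard library; the iteration count, the 16-byte salt length, and the function names are illustrative assumptions rather than a recommended configuration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def new_record(password: str) -> tuple:
    """Create the (salt, hash) pair a system would store for a new user."""
    salt = os.urandom(16)  # random salt, unique per user
    return salt, hash_password(password, salt)

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-derive the hash from the login attempt and compare in constant time."""
    return hmac.compare_digest(hash_password(password, salt), stored)

# Two users with the same password end up with different stored hashes.
salt_a, hash_a = new_record("correct horse")
salt_b, hash_b = new_record("correct horse")
print(hash_a != hash_b, verify_password("correct horse", salt_a, hash_a))
```

The salt is stored next to the hash in plaintext; its job is to make each hash unique, not to stay secret.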
Hashing for Data Integrity
File integrity is verified by comparing the hash of the current file to a known good hash (e.g., from the vendor's website). Any difference indicates tampering. Tools like sha256sum (Linux) or Get-FileHash (PowerShell) compute hashes.
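The same check can be scripted. This sketch hashes a file in chunks and compares the result to a known good value; the temporary file stands in for a real download, and the helper names `sha256_file` and `matches_known_good` are assumptions made for illustration.

```python
import hashlib
import os
import tempfile

def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_known_good(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    return sha256_file(path) == expected_hex.strip().lower()

# Demo on a throwaway file standing in for a downloaded release.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "file.txt")
    with open(path, "wb") as handle:
        handle.write(b"release build 1.0\n")
    published = sha256_file(path)  # stands in for the vendor's published hash
    intact = matches_known_good(path, published)
    with open(path, "ab") as handle:  # simulate tampering
        handle.write(b"extra bytes")
    tampered_detected = not matches_known_good(path, published)

print(intact, tampered_detected)
```

Reading in chunks keeps memory use constant, which matters for large files such as ISO images.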
Digital Signatures
A digital signature uses hashing plus asymmetric encryption. The sender hashes the message and encrypts the hash with their private key (signing). The recipient decrypts the hash with the sender's public key and compares it to their own hash of the message. If they match, the message is authentic and untampered.
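The hash-then-sign flow can be illustrated with textbook RSA. The sketch below is a toy under loud assumptions: the key is built from two small, publicly known primes, there is no signature padding, and nothing here is safe for real use. Real systems rely on a vetted cryptographic library.

```python
import hashlib

# Toy RSA key built from two Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)

def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce the digest into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Sender: hash the message, then apply the private exponent to the hash."""
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Recipient: recover the hash with the public exponent and compare."""
    return pow(signature, E, N) == digest_as_int(message)

message = b"Pay Bob 50 dollars"  # hypothetical message
signature = sign(message)
print(verify(message, signature))                   # original message
print(verify(b"Pay Bob 5000 dollars", signature))   # altered message
```

Because only the hash is signed, the signature is only as strong as the hash function: two messages with the same hash share the same valid signature.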
Common Attacks on Hashing
Collision Attack: Attacker finds two different inputs that produce the same hash. Exploited to create fraudulent certificates (e.g., Flame malware used an MD5 collision).
Preimage Attack: Attacker finds an input that produces a given hash. Infeasible for secure hashes.
Rainbow Table Attack: Precomputed hash chains for cracking passwords. Mitigated by salting.
Length Extension Attack: Given H(M) and the length of M, an attacker can compute H(M || padding || extra) without knowing M. SHA-3 and HMAC are not vulnerable; SHA-256 is vulnerable in naive use.
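HMAC, the usual mitigation for length extension, is available in Python's standard library. This is a minimal sketch; the key and message values are hypothetical, and a real key should be randomly generated and kept secret.

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the message with a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def tag_is_valid(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

key = b"shared-secret-key"          # hypothetical; use a random key in practice
message = b"amount=100&to=alice"    # hypothetical message
tag = make_tag(key, message)

print(tag_is_valid(key, message, tag))
print(tag_is_valid(key, message + b"&to=mallory", tag))
```

The naive alternative, hashing the key concatenated with the message using plain SHA-256, is the construction that length extension breaks; HMAC's nested hashing avoids it.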
Real Command Examples
Computing SHA-256 hash on Linux:
sha256sum file.txt
On Windows PowerShell:
Get-FileHash file.txt -Algorithm SHA256
Verifying a downloaded file (the checksum line uses two spaces between the hash and the filename):
echo "expected_hash  file.txt" | sha256sum --check
Standards and RFCs
MD5: RFC 1321
SHA-1: FIPS PUB 180-1
SHA-2: FIPS PUB 180-4
SHA-3: FIPS PUB 202
PBKDF2: RFC 2898 (updated by RFC 8018)
HMAC: RFC 2104
SY0-701 may test your knowledge of which algorithms are deprecated (MD5, SHA-1) and which are current (SHA-2, SHA-3).
1. Generate Original Hash
The defender (e.g., software vendor) computes a hash of the original file using a secure algorithm like SHA-256. This hash serves as the integrity baseline. The defender publishes the hash on a trusted website or alongside the download. Example command: `sha256sum original_software.iso > checksum.txt`. The output is a 64-character hexadecimal string for SHA-256. This step establishes a reference point. In a SOC scenario, the hash might be stored in a secure database or signed with a digital certificate to prevent tampering. The defender must ensure the hash itself is not altered; otherwise, integrity verification is meaningless.
2. Attacker Modifies Data
An attacker intercepts the file during transmission (e.g., man-in-the-middle) or gains unauthorized access to the distribution server. They modify the file to insert malware, backdoors, or other malicious payloads. The modified file now has different content. The attacker may also attempt to replace the published hash with a hash of the modified file to avoid detection. However, if the hash is signed or published on a separate secure channel, the attacker cannot easily alter it. The attacker's goal is to make the modified file appear legitimate by either creating a collision (finding two inputs with same hash) or by replacing the hash. Since collision attacks are feasible for MD5 and SHA-1, the attacker might use a collision to create a malicious file that has the same hash as the original.
3. Recipient Verifies Integrity
The end user downloads the file and computes its hash using the same algorithm. On Linux: `sha256sum downloaded_software.iso`. On Windows: `Get-FileHash downloaded_software.iso -Algorithm SHA256`. The user compares the computed hash to the published hash (e.g., from the vendor's website). If they match, the file is intact. If they differ, the file has been tampered with. The user should reject the file and report the discrepancy. In a SOC environment, automated tools like Tripwire or OSSEC continuously monitor file integrity by comparing current hashes to stored baselines. Any mismatch triggers an alert. The analyst must investigate the cause: was it a legitimate update, a disk error, or a security breach?
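The baseline comparison that file integrity monitoring performs can be sketched in a few lines. This is not how Tripwire or OSSEC are implemented; it is a simplified illustration with hypothetical file names and helper functions, using temporary files in place of monitored paths.

```python
import hashlib
import os
import tempfile

def sha256_file(path: str) -> str:
    """Return the SHA-256 hex digest of a (small) file read in one go."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()

def build_baseline(paths: list) -> dict:
    """Record a known good hash for every monitored path."""
    return {path: sha256_file(path) for path in paths}

def find_changes(baseline: dict) -> list:
    """Return monitored paths that are missing or whose hash no longer matches."""
    changed = []
    for path, known_hash in baseline.items():
        if not os.path.exists(path) or sha256_file(path) != known_hash:
            changed.append(path)
    return changed

with tempfile.TemporaryDirectory() as workdir:
    app = os.path.join(workdir, "app.bin")
    conf = os.path.join(workdir, "app.conf")
    for path, content in ((app, b"binary v1"), (conf, b"debug=false\n")):
        with open(path, "wb") as handle:
            handle.write(content)

    baseline = build_baseline([app, conf])
    before = find_changes(baseline)           # nothing has changed yet
    with open(conf, "wb") as handle:          # simulate an unauthorized edit
        handle.write(b"debug=true\n")
    after = [os.path.basename(p) for p in find_changes(baseline)]

print(before, after)
```

A mismatch only says that the file changed. Deciding whether the cause was an update, a disk error, or a breach remains the analyst's job.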
4. Attack Exploits Weak Hash
If the defender used a weak hash algorithm like MD5 or SHA-1, the attacker may generate a collision. For example, in 2017, researchers from Google and CWI Amsterdam demonstrated a collision for SHA-1 (SHAttered) using roughly 9 quintillion (2^63) SHA-1 computations. In such an attack, the attacker creates two PDF files with different content (one benign, one malicious) that produce the same SHA-1 hash. The benign file is submitted for signing (e.g., with a code signing certificate). Once it is signed, the attacker swaps it with the malicious file, which has the same hash and thus the same signature. The signature remains valid, and the malicious file appears trusted. This attack undermines digital signatures and integrity verification. To mitigate it, always use collision-resistant hashes like SHA-256 or SHA-3.
5. Defender Implements Strong Hashing
To defend against such attacks, the defender must use a secure hash algorithm (SHA-256 or higher) and protect the hash itself. Best practices include: (1) Using HMAC (Hash-based Message Authentication Code) with a secret key to prevent length extension attacks; (2) Digitally signing the hash to ensure authenticity; (3) Publishing hashes over HTTPS and on multiple trusted sources; (4) Using key stretching algorithms (bcrypt, PBKDF2, Argon2) for password hashing with a unique salt per user; (5) Regularly updating hashing algorithms as older ones become deprecated. In a SOC, analysts should verify that all critical files and configurations are hashed using SHA-256 or SHA-3 and that baseline hashes are stored in a tamper-proof manner (e.g., read-only media or signed database).
Scenario 1: Software Supply Chain Attack
A SOC analyst at a large enterprise receives an alert from the file integrity monitoring (FIM) system: the hash of a critical application binary has changed. The analyst pulls the current hash using sha256sum and compares it to the baseline stored in a signed database. The hashes do not match. The analyst isolates the affected system and investigates the change. The vendor's website shows the original hash, but the downloaded file from the company's internal mirror has a different hash. Further investigation reveals the mirror was compromised, and the binary was replaced with a trojanized version. The analyst blocks the malicious hash across the network using endpoint detection and response (EDR) tools and initiates incident response. Common mistake: ignoring the alert, assuming it was a legitimate software update. The correct response is to always verify the change against the vendor's official hash and investigate any discrepancy.
Scenario 2: Password Hash Dump
A penetration tester recovers a password hash file from a compromised server. The hashes are unsalted MD5. Using a rainbow table, the tester cracks 80% of the passwords within hours. The report recommends migrating to a salted key stretching algorithm such as bcrypt or PBKDF2. The IT team implements salting with a 16-byte random salt per user and uses PBKDF2 with 100,000 iterations. After migration, even if the hash file is stolen again, rainbow tables no longer apply and every guess costs 100,000 iterations, so cracking strong passwords becomes computationally impractical. The SOC should monitor for any use of weak hashing algorithms in the environment and enforce strong password policies.
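The weakness in this scenario can be reproduced with a plain lookup table, which is a simpler relative of a rainbow table (rainbow tables use hash chains to save space). In this sketch the wordlist and passwords are hypothetical, and MD5 appears only to mirror the scenario; some FIPS-restricted Python builds may refuse it.

```python
import hashlib
import os

WORDLIST = ["password", "letmein", "qwerty", "dragon"]  # hypothetical tiny wordlist

# The attacker precomputes hash -> password once and reuses it for every dump.
LOOKUP = {hashlib.md5(word.encode()).hexdigest(): word for word in WORDLIST}

def crack_unsalted(stolen_hashes: list) -> list:
    """Look up each stolen hash; None means the password was not in the table."""
    return [LOOKUP.get(h) for h in stolen_hashes]

# Two users chose the same password, so their unsalted hashes are identical.
alice = hashlib.md5(b"letmein").hexdigest()
bob = hashlib.md5(b"letmein").hexdigest()
recovered = crack_unsalted([alice, bob])

# With a random per-user salt, the precomputed table no longer matches.
salt = os.urandom(16)
salted = hashlib.md5(salt + b"letmein").hexdigest()
salted_hit = LOOKUP.get(salted)

print(alice == bob, recovered, salted_hit)
```

Salting defeats precomputation but not guessing one user at a time, which is why the fix in this scenario also adds key stretching.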
Scenario 3: Digital Signature Verification Failure
An email security gateway receives a digitally signed email. The gateway computes the hash of the email body and compares it to the decrypted hash from the signature. They match, so the email is considered authentic. However, the gateway uses SHA-1 for the hash. An attacker could have used a collision attack to create a malicious email with the same SHA-1 hash as a benign one. The gateway would accept the malicious email. The fix is to configure the gateway to reject SHA-1 signatures and require SHA-256 or higher. The SOC should audit all cryptographic configurations to ensure compliance with current standards.
What SY0-701 Tests
Objective 1.4 (Compare and contrast basic concepts of cryptography) includes hashing under integrity concepts. You must know:
The difference between hashing and encryption (one-way vs two-way).
Properties: collision resistance, preimage resistance, avalanche effect.
Specific algorithms: MD5 (weak, 128-bit), SHA-1 (weak, 160-bit), SHA-2 (secure, 256/384/512-bit), SHA-3 (secure).
Use cases: data integrity, password storage, digital signatures.
Salting and key stretching (bcrypt, PBKDF2, Argon2).
Common attacks: collision, rainbow table, length extension.
Common Wrong Answers
"Hashing is a form of encryption." Many candidates confuse hashing with encryption. Encryption is reversible with a key; hashing is one-way. The exam expects you to know they are different.
"MD5 is still secure for most purposes." MD5 is broken; collisions can be generated in seconds. Never choose MD5 as a secure option.
"SHA-1 is acceptable for digital signatures." SHA-1 is deprecated; the SHAttered attack proved collision feasibility. The exam will consider SHA-1 insecure.
"Salting is unnecessary if you use a strong hash." Salting prevents rainbow table attacks regardless of hash strength. Always salt passwords.
Specific Terms and Values
MD5: 128-bit output
SHA-1: 160-bit output
SHA-256: 256-bit output
SHA-512: 512-bit output
RIPEMD-160: 160-bit output
HMAC: Hash-based Message Authentication Code (uses a secret key)
PBKDF2, bcrypt, Argon2: key stretching algorithms
Trick Questions
A question may describe a scenario where a file's hash matches, but the file is malicious. This is a collision attack. The correct answer is to use a stronger hash algorithm (e.g., SHA-256 instead of MD5).
A question may ask about "reversing a hash." The correct answer is that it's infeasible for secure hashes; attackers use brute force or rainbow tables, not reversal.
"Which algorithm provides the best integrity?" Choose SHA-256 or SHA-3, not MD5 or SHA-1.
Decision Rule
On scenario questions involving hashing, first identify the use case (integrity, password storage, digital signature). Then evaluate the algorithm: if it's MD5 or SHA-1, it's likely the weak link. Look for missing salt or lack of key stretching for passwords. The correct answer will involve replacing weak algorithms with SHA-256 or SHA-3 and adding salt/key stretching for passwords.
Hashing is a one-way function used for integrity and password storage, not confidentiality.
MD5: 128-bit, broken, never use. SHA-1: 160-bit, deprecated, avoid. SHA-256/512: secure, use them.
Salting adds a random value to each password before hashing to prevent rainbow table attacks.
Key stretching (bcrypt, PBKDF2, Argon2) iterates hashing to slow brute-force attacks.
Digital signatures combine hashing with asymmetric encryption to provide integrity and non-repudiation.
Collision attacks find two inputs with the same hash; preimage attacks find an input for a given hash.
HMAC (Hash-based Message Authentication Code) uses a secret key to prevent length extension attacks.
Always verify file integrity by comparing hashes from a trusted source (e.g., vendor website over HTTPS).
These come up on the exam all the time. Here's how to tell them apart.
Hashing (SHA-256)
One-way function: cannot reverse output to input
No key required; output is deterministic
Used for integrity verification and password storage
Fixed output size (256 bits for SHA-256)
Attacked through collision and preimage searches (none practical against SHA-256)
Encryption (AES-256)
Two-way function: reversible with correct key
Requires a secret key for encryption and decryption
Used for confidentiality of data
Output size grows with input size (plus any padding)
Vulnerable to brute-force and side-channel attacks
MD5
128-bit hash output
Collision attacks feasible in seconds
Deprecated and insecure
Used historically for file integrity
Not suitable for digital signatures
SHA-256
256-bit hash output
Collision resistant (no practical attacks)
Currently secure and widely used
Recommended for file integrity and signatures
FIPS 180-4 compliant
Plain Hashing (SHA-256)
No secret key used
Vulnerable to length extension attacks
Same hash for same input every time
Used for public integrity verification
Simple to compute
Keyed Hashing (HMAC-SHA256)
Mixes a secret key into the hash through nested inner and outer hashing (not simple concatenation)
Resistant to length extension attacks
Output depends on both message and key
Used for message authentication (MAC)
Provides authenticity in addition to integrity
Mistake
Hashing and encryption are the same thing.
Correct
Hashing is a one-way function that produces a fixed-size output from arbitrary input. Encryption is a two-way function that requires a key to encrypt and decrypt. Hashing is not reversible; encryption is.
Mistake
MD5 is still secure enough for non-critical applications.
Correct
MD5 is cryptographically broken. Collisions can be generated in under a second on modern hardware. MD5 should never be used for security purposes, regardless of perceived criticality.
Mistake
SHA-1 is still acceptable if you trust the source.
Correct
SHA-1 is deprecated due to practical collision attacks (e.g., SHAttered). Using SHA-1 for integrity verification or digital signatures is insecure. Use SHA-256 or higher.
Mistake
Salting passwords is optional if you use a strong hash like SHA-256.
Correct
Salting is essential regardless of hash strength. Without salt, identical passwords produce identical hashes, enabling rainbow table attacks. Salt ensures each hash is unique.
Mistake
A hash collision means the hash algorithm is completely broken.
Correct
All hash functions have collisions due to the pigeonhole principle (infinite inputs, finite outputs). A secure hash makes collisions computationally infeasible to find. When collisions can be found efficiently (as with MD5 and SHA-1), the algorithm is considered broken.
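The pigeonhole argument can be made concrete by weakening a hash on purpose. The sketch below truncates SHA-256 to 24 bits, an assumption chosen so that a collision appears within a few thousand attempts; the full 256-bit output is not affected by this experiment.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, n_bytes: int = 3) -> bytes:
    """First n_bytes of SHA-256: a deliberately weakened 24-bit hash by default."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 3) -> tuple:
    """Hash counter strings until two different inputs share a truncated digest."""
    seen = {}
    for i in count():
        candidate = f"input-{i}".encode()
        tag = truncated_hash(candidate, n_bytes)
        if tag in seen:
            return seen[tag], candidate, i + 1
        seen[tag] = candidate

first, second, attempts = find_collision()
print(f"{first!r} and {second!r} collide after {attempts} hashes")
```

With 24 bits, a collision is expected after a few thousand hashes (the birthday bound). The same search against the full 256-bit output would need on the order of 2^128 hashes, which is why collisions exist for SHA-256 but cannot be found in practice.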
Hashing is a one-way function that converts data into a fixed-size digest that cannot be reversed. Encryption is a two-way function that uses a key to transform data into ciphertext that can be decrypted back to plaintext. Hashing ensures integrity; encryption ensures confidentiality. On the exam, if the scenario involves making data unreadable without a key, it's encryption. If it's about verifying that data hasn't changed, it's hashing.
MD5 is insecure because collision attacks can be performed in seconds on modern hardware. A collision means two different inputs produce the same hash. This allows an attacker to substitute a malicious file that has the same MD5 hash as a legitimate one. For SY0-701, remember that MD5 is broken and should never be used for security purposes. Always choose SHA-256 or higher.
A salt is a random value added to a password before hashing. It ensures that even if two users have the same password, their hashes will be different. Salting prevents rainbow table attacks, where an attacker uses precomputed hash-to-password mappings. On the exam, if a password storage scenario lacks a salt, that is a vulnerability. The correct answer will include using a unique salt per user.
The avalanche effect means that a small change in the input (e.g., flipping a single bit) causes approximately half of the output bits to change. This property is crucial because it makes the hash unpredictable and ensures that similar inputs produce completely different hashes. The exam may test this concept in the context of strong hash function properties.
A length extension attack exploits the Merkle-Damgård construction used in MD5, SHA-1, and SHA-2. Given `H(M)` and the length of `M`, an attacker can compute `H(M || padding || extra)` without knowing `M`. This can break naive message authentication schemes. HMAC and SHA-3 are not vulnerable. For SY0-701, know that HMAC mitigates this attack.
For password storage, use a key stretching algorithm like bcrypt, PBKDF2, or Argon2, not a plain hash like SHA-256. These algorithms are designed to be slow, making brute-force attacks infeasible. Always include a unique salt per user. On the exam, if the answer choices include bcrypt or PBKDF2, they are likely correct for password storage scenarios.
SHA-2 (SHA-256/512) is based on the Merkle-Damgård structure, while SHA-3 is based on the Keccak sponge construction. SHA-3 is the latest standard (FIPS 202) and is structurally different, providing a backup in case SHA-2 is broken. Both are currently secure. For the exam, know that SHA-3 is the newer standard but SHA-2 is still widely used and acceptable.