
What Is Data Carving? Security Definition

Also known as: data carving, file carving, forensic data recovery, magic bytes, foremost tool

Reviewed by Johnson Ajibi · Senior Network & Security Engineer · MSc IT Security

Quick Definition

Data carving is a technique used to recover files from a hard drive or other storage device even if the file system is damaged or the files were deleted. Instead of using the directory or folder structure, it looks for file signatures and patterns inside the raw data. This allows investigators to find documents, images, and other files that the user tried to erase or that were lost due to corruption.

Must Know for Exams

In the EC-Council Computer Hacking Forensic Investigator (CHFI) exam, data carving appears prominently in the disk and file system forensics domain. The exam objectives explicitly require candidates to understand file system structures, data recovery techniques, and tools like Foremost, Scalpel, and PhotoRec. You may be asked to explain the difference between carving contiguous files and fragmented files, or to identify the correct tool for a given recovery scenario.

The CHFI exam often tests the concept through scenario-based questions. For instance, you might be told that a suspect formatted a hard drive and reinstalled the operating system. The question will ask how an investigator can recover specific user files that were present before the format. The correct answer will involve data carving, because the file system has been rebuilt but the old data sectors may remain unallocated. The exam also tests your knowledge of file signatures: you may need to recall the magic bytes for common file types like JPEG, PDF, and ZIP.

Other certification exams, such as CompTIA Security+ and SANS GIAC, also cover carving in the context of forensic methodology. In Security+, it appears as part of the forensic investigation procedure, testing the sequence of steps including acquisition, analysis, and recovery. In GIAC Certified Forensic Examiner (GCFE) exams, carving questions are more advanced, covering techniques like carving from unallocated space and dealing with file fragmentation. Across all these exams, the core principle is the same: carving recovers data when metadata is unavailable.

Simple Meaning

Imagine you have a giant library where every book has a special label on its cover telling you the title, author, and where it belongs on the shelf. The librarian uses these labels to find any book quickly. Now imagine someone removes all those labels and scrambles the books into a huge pile on the floor. Without the labels, the librarian cannot use the catalog to find a specific book. However, if the librarian knows that every mystery novel starts with a specific phrase like "It was a dark and stormy night," they can still search through the pile, find pages that begin with that phrase, and piece together the mystery novel.

Data carving works exactly like that. The file system, like the library catalog, keeps track of where every file starts and ends. When a file is deleted or the catalog is damaged, that information is lost. But many file types have unique patterns called magic bytes or file signatures at their beginning, and sometimes at their end. For example, a JPEG image file almost always starts with the bytes FF D8 FF. A data carving tool scans every single byte on the storage device, looking for these unique patterns. When it finds one, it starts copying data from that point onward. It then looks for the file's ending signature or uses other rules to decide when the file is complete. This is like searching through the pile of books for every page that says "It was a dark and stormy night" and then reading until you reach "The End." The result is that you can recover many files even when the file system is completely broken.

Full Technical Definition

Data carving, also known as file carving, is a forensic data recovery technique that extracts files from a storage medium using file content signatures rather than relying on file system metadata such as the Master File Table (MFT) on NTFS, the inode table on Ext4, or the File Allocation Table (FAT). This technique is essential in digital forensics when the file system is corrupted, formatted, or partially overwritten, or when files have been deleted and their metadata entries are no longer valid.

The core mechanism involves scanning the raw binary data of a storage device, typically represented as a forensic image (a bit-for-bit copy), for known file headers or magic bytes. Each file format has a standard signature that identifies the file type. For example, a Portable Network Graphics (PNG) file begins with the eight-byte signature 89 50 4E 47 0D 0A 1A 0A. A PDF file starts with 25 50 44 46 (which is ASCII for "%PDF"). The carving tool reads the contiguous data starting from this header and attempts to determine the file length by detecting a footer signature, using file size information embedded in the header, or applying statistical analysis to identify the end of the file.
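As a minimal sketch of signature matching, the Python snippet below checks a buffer against the three signatures quoted in this article. The names SIGNATURES and identify are illustrative and do not come from any carving tool; a real carving profile would hold far more entries.

```python
from typing import Optional

# Signatures quoted in the text; the mapping itself is an illustrative assumption.
SIGNATURES = {
    "jpeg": bytes.fromhex("FFD8FF"),
    "png": bytes.fromhex("89504E470D0A1A0A"),
    "pdf": b"%PDF",  # 25 50 44 46 in hex
}


def identify(data: bytes) -> Optional[str]:
    """Return the first file type whose signature starts the buffer, else None."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    print(identify(b"%PDF-1.7 example"))
    print(identify(b"plain text"))
```

A match on the leading bytes only suggests a file type. It says nothing yet about where the file ends or whether the bytes that follow are intact.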

More advanced carving methods include bifragment gap carving, which can recover files that have been split into two fragments, and smart carving, which analyzes file content and structure to reassemble fragments in the right order. For instance, when carving a JPEG image, the tool may look for the FF D8 FF header and then search for the FF D9 footer. However, due to fragmentation, the data between these markers might not all belong to the same image. Classic tools like Foremost and Scalpel are header/footer carvers that assume contiguous files, while PhotoRec adds format-aware checks that help in some fragmented cases. All three rely on carving profiles or built-in format definitions that describe the signatures and structural rules for many file types.
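The basic header/footer approach for JPEG can be sketched in a few lines of Python. This is a simplified illustration, not how any particular tool is implemented: the function name carve_jpegs and the default size limit are assumptions, the whole image is assumed to fit in memory, and every file is assumed to be contiguous.

```python
JPEG_HEADER = b"\xFF\xD8\xFF"
JPEG_FOOTER = b"\xFF\xD9"


def carve_jpegs(raw: bytes, max_size: int = 20_000_000):
    """Return (offset, data) pairs for header..footer spans found in raw."""
    carved = []
    pos = 0
    while True:
        start = raw.find(JPEG_HEADER, pos)
        if start == -1:
            break
        # Look for the footer only within max_size bytes of the header.
        end = raw.find(JPEG_FOOTER, start + len(JPEG_HEADER), start + max_size)
        if end == -1:
            pos = start + 1  # header with no footer in range; keep scanning
            continue
        end += len(JPEG_FOOTER)
        carved.append((start, raw[start:end]))
        pos = end
    return carved


if __name__ == "__main__":
    sample = bytes(10) + b"\xFF\xD8\xFF\xE0abc\xFF\xD9" + bytes(5)
    for offset, data in carve_jpegs(sample):
        print(offset, len(data))
```

Because the sketch stops at the first FF D9 it sees, it will cut a file short if that byte pair occurs earlier, for example at the end of an embedded thumbnail, and it will splice in foreign data if the file is fragmented. Those are the weaknesses the more advanced methods try to address.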

In real IT environments, data carving is implemented using dedicated forensic software on write-blocked devices to ensure evidence integrity. The process is typically performed on a forensic workstation where the investigator runs a carving tool against a disk image. The output is a set of recovered files, which are then manually reviewed or further analyzed for metadata like EXIF data in photos or document properties. Carving is not limited to hard drives; it works on any digital storage, including SSDs, USB flash drives, SD cards, and memory dumps from mobile devices.

Real-Life Example

Think of a lost-and-found office in a large sports stadium after a big game. Thousands of people have left, and hundreds of items were found: phones, wallets, keys, jackets, and water bottles. The lost-and-found staff does not have any tags or labels on these items that tell them whom each item belongs to or where the owner was sitting. They cannot use a list of seat numbers to return the items. Instead, they must examine each item individually to figure out who it might belong to. They look at the phone's background wallpaper for a face, check the wallet for a driver's license, or look inside the jacket for a dry cleaning label with a name. They are essentially "carving" identity information out of the physical items themselves, without using any lost item database.

In data carving, the storage device is the stadium full of lost items. The file system metadata, like the lost item database, is missing or broken. The investigator cannot ask the database, "Where is the photo file that was saved on Tuesday?" because that database is gone. Instead, the investigator scans the raw data, which is like physically opening every bag and looking inside. When they see the unique pattern FF D8 FF for a JPEG image, that is like finding a phone with a specific brand logo. They then pull out that data segment, which is like picking up the phone and placing it in the recovery pile. The process is painstaking and not perfect, just like the stadium staff might mistake one black jacket for another, but it recovers items that would otherwise be permanently lost.

Why This Term Matters

In real IT work, especially in digital forensics and incident response, data carving is a critical last resort for recovering evidence. When a cybercriminal deletes files, reformats a drive, or even partially overwrites data, the file system metadata is often destroyed. Without data carving, those files would be considered gone forever, making it impossible to prove what happened in a data breach, intellectual property theft, or legal case. Forensic examiners use carving to recover deleted emails, incriminating photos, documents, and logs that can be presented in court.

Beyond law enforcement, businesses use data carving in incident response to understand the scope of an attack. If an attacker wipes logs after gaining access, carving the drive can recover log fragments that show the attacker's commands and movements. IT teams also use carving to recover accidentally deleted files when backup systems fail. For example, if an employee deletes an important spreadsheet and the recycle bin is emptied, a quick carving operation can often retrieve the file before it is overwritten.

Data carving also plays a role in data sanitization verification. When decommissioning drives, organizations must ensure sensitive data is truly gone. Carving can test whether a wiping tool was effective by attempting to recover files from the erased drive. If carving returns nothing, that is strong evidence the sanitization succeeded, although a thorough verification should also confirm that every sector was actually overwritten. Carving provides a practical, file-level check that a drive no longer holds recoverable data.
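A sector-level check can complement a carving pass during sanitization verification. The sketch below assumes the drive was wiped with zeros (a random-data wipe would need a different test) and that the image fits in memory; the function name nonzero_sectors and the 512-byte sector size are illustrative choices.

```python
def nonzero_sectors(raw: bytes, sector_size: int = 512):
    """Return the indexes of sectors that contain any non-zero byte."""
    zero = bytes(sector_size)
    hits = []
    for offset in range(0, len(raw), sector_size):
        sector = raw[offset:offset + sector_size]
        # The last sector may be short, so compare against a matching length.
        if sector != zero[:len(sector)]:
            hits.append(offset // sector_size)
    return hits


if __name__ == "__main__":
    image = bytearray(4 * 512)
    image[1030] = 0xFF  # simulate leftover data in the third sector
    print(nonzero_sectors(bytes(image)))
```

An empty result means every sector reads as zeros. Any listed sector is a candidate for closer inspection or for a targeted carving run.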

How It Appears in Exam Questions

In certification exams, data carving questions usually appear in three main patterns. The first is the definition or tool identification question. For example, a multiple-choice question might ask: "Which of the following tools is BEST suited for recovering files from a damaged file system without relying on file system metadata?" The options might include tools like FTK Imager, WinHex, Foremost, and EnCase. The correct answer is Foremost, because it is a dedicated carving tool. Another variation asks candidates to identify the file signature that marks the beginning of a PNG image file.

The second pattern is the scenario question. These questions describe a forensic situation and ask what the investigator should do next. For instance: "A forensic analyst receives a hard drive from a suspect. The drive was formatted and the operating system was reinstalled. The analyst needs to recover the suspect's personal photos and documents from before the format. Which technique should the analyst use?" The answer is data carving, because the new file system overwrites the metadata, but the underlying data sectors may still contain the original file content until overwritten by new data.

The third pattern involves troubleshooting or analysis questions. A question might present a partially recovered file that appears corrupted and ask why this happened. The correct answer often involves file fragmentation: the file was stored in non-contiguous clusters, and the carving tool only recovered the first fragment. The candidate must understand that simple carving tools cannot handle fragmentation, and more advanced techniques like bifragment gap carving are required. These questions test deeper understanding of how carving algorithms work and their limitations.


Example Scenario

Situation: An IT security analyst at a small company discovers that an employee, who has just been terminated, may have been stealing customer data. The employee used a company-issued laptop. Before leaving, the employee deleted all files from the Desktop and My Documents folders, emptied the Recycle Bin, and ran a factory reset on the laptop, which reinstalls Windows and formats the drive. The manager asks the analyst to recover any evidence of data theft from the laptop.

How Data Carving Applies: The factory reset has completely rebuilt the file system. The old file names, folder structures, and metadata are gone. Standard file recovery tools that look in the Recycle Bin or try to read deleted file entries in the MFT will find nothing because those structures have been overwritten by the new Windows installation. However, the underlying storage sectors that contained the old files might not all be overwritten. The analyst creates a forensic image of the drive using a write blocker. Then, they run a data carving tool like PhotoRec on the image. The tool scans every sector for file signatures. It will find multiple JPEG images, PDF files, and Excel spreadsheets that start with their unique magic bytes. Even though the new Windows installation has overwritten some sectors, many sectors remain unchanged because the new OS does not use every byte of the drive. The analyst recovers these files, which include spreadsheets with customer names and credit card numbers. The evidence proves the data theft, and the company can take legal action.

Common Mistakes

Believing that data carving works perfectly on every storage device and always recovers complete, usable files.

Data carving is not a magic bullet. It relies on file signatures that may be corrupted, and files are often fragmented across the disk. If a file is fragmented, a simple carving tool will only recover the first fragment, producing a corrupted output. Additionally, if the file signature is partially overwritten, the tool will miss the file entirely.

Always expect incomplete results. Use advanced carving tools that can handle fragmentation, and verify every recovered file by opening it or checking its checksum against known values. In forensic practice, multiple tools should be used and results cross-checked.

Thinking that data carving can recover all types of data, including encrypted or compressed files, without any special handling.

File signatures for encrypted data may not be present or may be hidden within the encryption wrapper. For example, a TrueCrypt container has no standard file header that a carving tool recognizes. The tool will see random data and may not carve it at all, or may carve a large chunk that is still encrypted and unusable.

Understand that data carving is effective for files with known, unencrypted signatures like JPEG, PDF, and MP4. For encrypted containers, use brute-force or key recovery methods, not carving. Always check the context of the investigation to determine if encryption is involved before relying solely on carving.
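One way to check whether a region might be encrypted before relying on carving is to measure its byte entropy. The sketch below computes Shannon entropy in bits per byte; the function name is illustrative. Treat the result as a hint only, since compressed formats such as JPEG and ZIP also score close to the maximum of 8.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"\x00" * 4096))         # uniform filler
    print(shannon_entropy(bytes(range(256)) * 16))  # every byte value equally likely
```

A region with no recognizable signature and entropy near 8 bits per byte across many sectors is consistent with an encrypted container, which points the investigation toward key recovery rather than carving.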

Assuming that data carving is only useful after file deletion and never for active file systems.

Data carving can also recover files from unallocated clusters on an active file system. These are clusters that the file system no longer considers part of any file, often because the file was deleted or the cluster was marked as free space. Carving from unallocated space is a standard forensic procedure even when the file system is healthy.

Remember that carving is used not only for damaged or formatted drives but also for extracting evidence from unallocated space on a working system. In incident response, carving the unallocated space of a running server can recover deleted command-and-control scripts or temporary files created by malware.

Confusing data carving with file hashing or file signature verification.

File hashing (like MD5 or SHA-1) creates a digital fingerprint of a file to verify its integrity. Data carving recovers the file itself from raw data. They serve different purposes: hashing checks that a file has not changed, while carving recovers the file content when the file system is not available.

Use both techniques together. Carve the file first, then compute its hash to verify that the recovered file matches the original (if a known hash exists from a backup or previous acquisition). The hash is a verification step after carving, not an alternative to it.

Exam Trap — Don't Get Fooled

The trap: "When a hard drive is formatted (not wiped), all files on the drive are permanently destroyed and cannot be recovered." This statement is false. Understand the difference between formatting and wiping. A quick format deletes only the metadata that tells the OS where files start and end, and the data remains on the disk until new data writes over it. Be aware that a full (non-quick) format on Windows Vista and later also writes zeros to the volume, so in that case it behaves like a wipe.

Data carving can recover files from a formatted drive because it does not rely on that metadata. A wipe (like using a DoD-standard erasure tool) overwrites all sectors with zeros or random data, making carving impossible. In exams, if a question says "formatted," remember that carving is still an option.

If it says "wiped" or "overwritten with zeros," carving will not work.

Commonly Confused With

Data Carving vs File Recovery

File recovery typically refers to recovering files by reading file system structures like the Master File Table or directory entries. Data carving bypasses these structures entirely and works directly with raw data. File recovery requires a recognizable file system, while data carving does not.

If you accidentally delete a file and go to the Recycle Bin, you are doing file recovery. If you drop your hard drive in water and the file system is unreadable, you use data carving to pull out any JPEGs you can find in the leftover data.

Data Carving vs File Signature Analysis

File signature analysis involves checking the signature bytes of files that are already accessible to verify they match their claimed extension (e.g., a .jpg file should start with FF D8 FF). Data carving uses the same signatures but to locate and extract files that are not visible in the file system. One is verification, the other is extraction.

If you have a folder full of files and you check that every .jpg starts with FF D8 FF, that is signature analysis. If you scan the raw drive for every FF D8 FF sequence and pull out the data that follows, that is data carving.

Data Carving vs File System Metadata Recovery

Metadata recovery aims to restore the file system structures themselves, such as rebuilding the Master File Table or the boot sector. Data carving does not care about these structures; it treats the entire device as a bag of bytes. Metadata recovery tries to fix the catalog, while data carving goes straight to the books.

If your computer cannot boot because the boot sector is corrupt, you try to rebuild the boot sector (metadata recovery). If you just want to get your photos off the drive without fixing it, you use data carving.

Step-by-Step Breakdown

1. Acquisition of a Forensic Image

Before any carving begins, the target storage device is imaged using a write blocker to prevent any changes. A bit-for-bit copy (raw image or E01 format) is created. This ensures the original evidence remains intact and all analysis is done on the copy.

2. Selection of Carving Profiles

The forensic investigator chooses which file types to carve. Most carving tools come with predefined profiles listing known file signatures for hundreds of formats. The investigator can enable only the types relevant to the investigation, such as JPEG, PDF, and Office documents, to save time and reduce noise.

3. Scanning for File Headers

The carving tool begins reading the forensic image from the first byte to the last. It compares every sequence of bytes against the signatures in the selected carving profiles. When a match is found, the tool notes the starting offset and the file type.

4. Extraction of File Content

From the header offset, the tool starts copying data forward. It looks for the corresponding footer signature to determine the file's end. If no footer is found, it may use embedded size information (such as the file-size field in a BMP header or the chunk lengths in a PNG file), stop at the maximum size configured for that file type, or continue until the next header is encountered.

5. Handling Fragmentation and Validation

Simple carving assumes files are contiguous. If a file is fragmented, the tool will likely produce a corrupted output. Advanced tools attempt to detect fragmentation patterns and may reconstruct files from multiple fragments. After extraction, each recovered file is validated by checking its internal structure (e.g., valid JPEG markers) and computing its hash for verification.
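The scanning and extraction steps can be illustrated with a format that carries its own length information. A PNG file is a fixed 8-byte signature followed by chunks, each made of a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC, ending with an IEND chunk. The sketch below walks those chunks to find where a carved PNG ends. The function names are illustrative, the image is assumed to fit in memory, the file is assumed to be contiguous, and the CRCs are not verified here.

```python
import struct

PNG_SIGNATURE = bytes.fromhex("89504E470D0A1A0A")


def png_length(raw: bytes, start: int):
    """Return the byte length of the PNG at start, or None if it is incomplete."""
    if raw[start:start + 8] != PNG_SIGNATURE:
        return None
    pos = start + 8
    while pos + 8 <= len(raw):
        length, chunk_type = struct.unpack(">I4s", raw[pos:pos + 8])
        pos += 8 + length + 4  # chunk header + data + CRC
        if pos > len(raw):
            return None  # chunk claims more data than the image holds
        if chunk_type == b"IEND":
            return pos - start
    return None


def carve_pngs(raw: bytes):
    """Return (offset, data) pairs for every structurally complete PNG in raw."""
    carved = []
    pos = raw.find(PNG_SIGNATURE)
    while pos != -1:
        length = png_length(raw, pos)
        if length is None:
            pos = raw.find(PNG_SIGNATURE, pos + 1)
            continue
        carved.append((pos, raw[pos:pos + length]))
        pos = raw.find(PNG_SIGNATURE, pos + length)
    return carved
```

Walking the structure doubles as a first validation pass: a candidate whose chunk lengths run past the end of the image, or that never reaches IEND, is rejected instead of being written out as a corrupt file.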

Practical Mini-Lesson

Data carving is a foundational skill for any digital forensics professional. When you approach a case where a suspect has tried to destroy evidence by deleting files or reformatting a drive, carving is often the only path forward. The process begins with proper acquisition: you must use a hardware write blocker or software write blocker to ensure the drive is not modified during imaging. A forensic image is then created using tools like FTK Imager or dd (on Linux). The image is the playground for carving.
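Acquisition is normally paired with a hash of the image so that later analysis can be shown to work on an unmodified copy. The following sketch hashes a binary stream in fixed-size chunks with Python's hashlib, which keeps memory use constant for large images. The function name hash_stream and the 1 MiB chunk size are illustrative choices; in practice you would pass an opened image file instead of the in-memory stream used in the demo.

```python
import hashlib
import io


def hash_stream(stream, algorithm: str = "sha256", chunk_size: int = 1024 * 1024) -> str:
    """Return the hex digest of everything readable from a binary stream."""
    digest = hashlib.new(algorithm)
    # iter() with a sentinel keeps calling read() until it returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # With a real image: with open("disk.img", "rb") as f: hash_stream(f)
    # ("disk.img" is a hypothetical path).
    print(hash_stream(io.BytesIO(b"example image bytes")))
```

Recording the digest at acquisition time and recomputing it before and after carving gives a simple way to demonstrate that the working copy did not change.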

Next, you select your tools and profiles. Foremost and Scalpel are classic carving tools that are highly configurable. Each reads a configuration file, foremost.conf or scalpel.conf respectively, where you specify the file types to carve and their signatures. For example, to carve JPEG files, you add a line like: jpg y 200000000 \xFF\xD8\xFF \xFF\xD9. The fields are the file extension, a case-sensitivity flag for the signature match, the maximum file size in bytes, the header, and the footer. This line tells the tool to look for the header FF D8 FF and the footer FF D9, with a maximum file size of 200 MB. PhotoRec is another powerful tool that does not require as much configuration because it automatically detects over 400 file types.
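To make the rule format concrete, the sketch below parses a line like the JPEG example into its parts. It is a simplified reading of the format under stated assumptions: only \xHH escapes are decoded, all five fields are assumed to be present, and the function names are hypothetical. The real configuration files support more syntax, such as wildcards, which this sketch does not handle.

```python
import re


def decode_escapes(text: str) -> bytes:
    """Turn \\xHH escapes into raw bytes; other characters pass through as ASCII."""
    return re.sub(
        rb"\\x([0-9A-Fa-f]{2})",
        lambda match: bytes.fromhex(match.group(1).decode("ascii")),
        text.encode("ascii"),
    )


def parse_rule(line: str) -> dict:
    """Parse 'ext case size header footer' into a dictionary."""
    extension, case_flag, max_size, header, footer = line.split()
    return {
        "extension": extension,
        "case_sensitive": case_flag.lower() == "y",
        "max_size": int(max_size),
        "header": decode_escapes(header),
        "footer": decode_escapes(footer),
    }


if __name__ == "__main__":
    print(parse_rule(r"jpg y 200000000 \xFF\xD8\xFF \xFF\xD9"))
```

Seeing the rule as data makes the tool's behavior easier to reason about: the header and footer are plain byte strings to search for, and the size is the cutoff applied when no footer turns up.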

In practice, you will rarely carve every file type at once because it produces thousands of files, many of which are system files or irrelevant. Instead, you tailor your carving to the case. If you are looking for specific documents, you might carve only PDF, DOCX, and XLSX. Each recovered file must be manually reviewed. They will have no original file names or dates, so you must rely on content analysis to classify them. Carving is time-consuming and computationally intensive. A 1 TB drive can take hours to carve completely.
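An image of that size will not fit in memory, so a scanner has to read it piece by piece. The sketch below shows one way to do that: it reads fixed-size chunks and carries the last few bytes of each chunk into the next one so that a signature straddling a chunk boundary is still found. The function name find_offsets and the chunk size are illustrative assumptions, not taken from any tool.

```python
import io


def find_offsets(stream, signature: bytes, chunk_size: int = 1024 * 1024):
    """Return the absolute offsets of every occurrence of signature in stream."""
    offsets = []
    overlap = len(signature) - 1  # bytes carried over between chunks
    tail = b""
    base = 0  # absolute offset of the first byte of the current window
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        pos = window.find(signature)
        while pos != -1:
            offsets.append(base + pos)
            pos = window.find(signature, pos + 1)
        # The tail is shorter than the signature, so no match is reported twice.
        tail = window[-overlap:] if overlap else b""
        base += len(window) - len(tail)
    return offsets


if __name__ == "__main__":
    data = b"A" * 5 + b"\xFF\xD8\xFF" + b"B" * 4 + b"\xFF\xD8\xFF"
    # A tiny chunk size forces signatures to straddle chunk boundaries.
    print(find_offsets(io.BytesIO(data), b"\xFF\xD8\xFF", chunk_size=4))
```

The same function works on an opened image file. Running one pass per signature is simple but slow, which is one reason restricting the profile to the file types that matter shortens a carving run.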

What can go wrong? The biggest issue is file fragmentation. On heavily used file systems, files are frequently fragmented. A simple carving tool will produce a clipping of the first fragment, which appears as a partial image or corrupt document. To mitigate this, use tools that support fragmentation recovery, such as R-Studio or the smart carving features in EnCase. Another issue is that carving can produce false positives: sequences of bytes that happen to match a header but are not actual files. Always validate recovered files by opening them or checking their integrity with tools like file (on Linux) or by loading them in a hex editor to verify the structure.
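Bifragment gap carving, mentioned earlier in this article, can be sketched as a brute-force search: given the sector holding a header and the sector holding a footer, try removing every possible run of sectors in between and keep the first candidate that a validator accepts. The sketch below is a toy illustration under stated assumptions. The function names are hypothetical, and the validator assumes a made-up format whose last four bytes are a CRC-32 of everything before them, standing in for a real format-specific check such as decoding a JPEG.

```python
import zlib


def crc_trailer_ok(candidate: bytes) -> bool:
    """Toy validator: the last 4 bytes must be the CRC-32 of the preceding bytes."""
    if len(candidate) <= 4:
        return False
    return zlib.crc32(candidate[:-4]) == int.from_bytes(candidate[-4:], "big")


def bifragment_gap_carve(sectors, first, last, is_valid):
    """Search for a gap sectors[gap_start:gap_end] whose removal yields a valid file.

    first and last are the indexes of the header and footer sectors.
    Returns (gap_start, gap_end, data) or None if no gap works.
    """
    for gap_start in range(first + 1, last):
        for gap_end in range(gap_start + 1, last + 1):
            candidate = b"".join(sectors[first:gap_start] + sectors[gap_end:last + 1])
            if is_valid(candidate):
                return gap_start, gap_end, candidate
    return None


if __name__ == "__main__":
    body = b"HEADAAAABBBB"
    trailer = zlib.crc32(body).to_bytes(4, "big")
    disk = [b"junk", b"HEAD", b"AAAA", b"XXXX", b"YYYY", b"BBBB", trailer, b"junk"]
    print(bifragment_gap_carve(disk, 1, 6, crc_trailer_ok))
```

The search is quadratic in the distance between header and footer and calls the validator for every candidate, so it is only practical when the two fragments are reasonably close together and validation is cheap. It also assumes exactly two fragments; a contiguous file should be tested before the gap search is attempted.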

Carving connects to broader forensic concepts like unallocated space and file system analysis. Understanding how the file system organizes data helps you interpret why carving is needed and what its limitations are. For example, on a healthy NTFS system, a 100 KB file might be stored in a single cluster run if the drive is not fragmented. After deletion, that run remains until overwritten. Carving can recover it easily. But if the same file was in 20 fragments across the disk, carving will fail unless you use advanced techniques. As a professional, you must know when carving is the right tool and when other methods, like metadata recovery or journal analysis, are more appropriate.

Memory Tip

Remember the three Ps of carving: Pattern (find the header), Pull (copy the data from that point), and Piece together (handle fragmentation).


Frequently Asked Questions

Can data carving recover files from a solid-state drive (SSD) that has been trimmed?

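It is usually difficult. TRIM tells the SSD which blocks no longer hold valid data, and the controller is then free to erase them during garbage collection; on many drives, reads of trimmed blocks return zeros almost immediately. Once a deleted file's blocks have been trimmed, carving will typically find nothing. If TRIM is disabled or has not yet been issued for those blocks, carving may still work.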

Does data carving work on drives that have been overwritten once?

No. If a drive sector has been overwritten with new data, the old data is gone. Data carving cannot recover overwritten data. It only works for data in sectors that have not been overwritten since the file was deleted or the system was reformatted.

What is the difference between data carving and file carving?

There is no difference. The terms are used interchangeably in digital forensics to describe the process of extracting files from raw data using file signatures.

Can data carving recover files from a RAID array?

Yes, but the process is more complex. You first need to reconstruct the RAID volume from its member disks (either logically or by imaging each disk), and then carve the reconstructed volume. The stripes and parity complicate the carving process.
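For the simplest case, reconstruction can be sketched in a few lines. The example below assumes RAID 0 (striping only, no parity), a known member order, a known stripe size, and data that starts at offset zero on every member with no controller metadata in front of it. Real arrays often violate one or more of these assumptions, and the function name is illustrative.

```python
def destripe_raid0(disks, stripe_size: int) -> bytes:
    """Interleave equal-sized stripes from each member disk, in member order."""
    volume = bytearray()
    rows = min(len(disk) for disk in disks) // stripe_size
    for row in range(rows):
        begin = row * stripe_size
        for disk in disks:
            volume += disk[begin:begin + stripe_size]
    return bytes(volume)


if __name__ == "__main__":
    original = bytes(range(16))
    disk0 = original[0:4] + original[8:12]
    disk1 = original[4:8] + original[12:16]
    print(destripe_raid0([disk0, disk1], stripe_size=4) == original)
```

Once the logical volume has been rebuilt this way, it can be carved like any other image. Carving the member disks individually would only find files small enough to sit inside a single stripe.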

What is the most common tool for data carving in CHFI labs?

Foremost is frequently used in CHFI lab exercises. It is open source, is developed primarily for Linux, and is easy to configure. PhotoRec is another common tool, especially for recovering media files.

Can data carving recover deleted text messages from a mobile phone?

It can recover text messages that are stored as files in the phone's app data partition, but modern smartphones use complex file systems and encryption. Carving the raw memory dump of a phone can recover fragments of messages if they are not encrypted, but full recovery is challenging.

Summary

Data carving is a powerful forensic technique that recovers files by scanning raw storage data for known file signatures, bypassing the file system entirely. This method is essential when file system metadata is destroyed due to formatting, deletion, or corruption. Understanding data carving means grasping the concept of magic bytes, the difference between simple and advanced carving, and the real-world limitations caused by file fragmentation and overwritten data.

For certification exams like CHFI and Security+, you must know the tools (Foremost, Scalpel, PhotoRec), the signatures for common file types, and when to apply carving versus other recovery methods. Remember that carving does not work on wiped drives or on sectors that have been overwritten, and it often produces incomplete results for fragmented files. Despite these limitations, data carving remains a cornerstone of digital forensics, enabling investigators to recover evidence that would otherwise be lost.

Master this concept, and you will be well prepared for both your exams and real-world forensic work.