A financial institution is implementing a data loss prevention (DLP) solution to protect customer financial information. The DLP system must detect and block the transmission of credit card numbers via email. Which of the following is the BEST approach to ensure accurate detection while minimizing false positives?
Combining a regular expression with a Luhn check reduces false positives by verifying that a matched number has a valid checksum.
Why this answer
Option A is correct because combining a regular expression for credit card number patterns with Luhn algorithm validation significantly reduces false positives. The regular expression finds candidate digit sequences, and the Luhn (mod 10) checksum then confirms that the final check digit is consistent with the other digits. Because only about 1 in 10 arbitrary digit sequences passes the check, most random numbers that happen to match the pattern are not flagged as credit card numbers. This two-stage approach is a standard DLP practice for accurately detecting structured sensitive data such as card numbers.
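A minimal Python sketch of the regex-plus-Luhn approach described above. It assumes a hypothetical candidate pattern that accepts 13 to 19 digits, optionally separated by single spaces or dashes; commercial DLP products ship their own tuned patterns, and the names `CANDIDATE`, `luhn_valid`, and `find_card_numbers` are illustrative only.

```python
import re

# Hypothetical candidate pattern (an assumption, not a product's rule):
# 13-19 digits, optionally separated by single spaces or dashes.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn (mod 10) checksum."""
    total = 0
    # Walk digits right to left; double every second digit,
    # starting with the one immediately left of the check digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0

def find_card_numbers(text: str) -> list[str]:
    """Return digit strings that match the pattern AND pass the Luhn check."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())  # strip separators
        if luhn_valid(digits):
            hits.append(digits)
    return hits

# 4111 1111 1111 1111 is a common test card number (Luhn-valid);
# the order reference matches the pattern but fails the checksum.
text = "Card 4111 1111 1111 1111; order ref 1234-5678-9012-3456."
print(find_card_numbers(text))  # ['4111111111111111']
```

In this sketch the regular expression is deliberately permissive (favoring recall) and the Luhn check removes most accidental matches (favoring precision); a DLP policy could then block any message for which the function returns a non-empty list.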
Exam trap
The trap here is that candidates may choose option C, thinking simple pattern matching is sufficient, but they overlook the need for Luhn validation to avoid false positives from non-credit-card digit sequences.
How to eliminate wrong answers
Option B is wrong because hashing outbound emails and comparing the results against a database of known credit card hashes only catches numbers already in that database: any card number that was not pre-hashed goes undetected. It is also brittle, because hashing a whole email, or a number formatted differently (spaces instead of dashes), produces a different hash, so every candidate digit sequence would have to be extracted and normalized before hashing.
Option C is wrong because a simple regular expression like '\d{4}-\d{4}-\d{4}-\d{4}' matches any dash-separated 16-digit sequence (e.g., an order or reference ID formatted that way) without validating its checksum, which produces false positives. It also misses card numbers written with spaces or without separators, as well as 15-digit American Express numbers.
Option D is wrong because a machine learning classifier requires a large labeled training set, can still produce high false positive rates or miss formats it was not trained on, and lacks the deterministic checksum validation that Luhn provides.
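To illustrate why the option C pattern alone is unreliable, this short Python check runs it against three made-up strings (the sample texts are hypothetical; 4111111111111111 is a widely used test card number, not a real account). It flags a dash-formatted order ID while missing the same Luhn-valid card number written two other ways.

```python
import re

# The naive pattern from option C: exactly four dash-separated groups of four digits.
NAIVE = re.compile(r"\d{4}-\d{4}-\d{4}-\d{4}")

samples = {
    "order ID": "Order ref 1234-5678-9012-3456",  # not a card number (fails Luhn), yet it matches
    "spaced card": "Card 4111 1111 1111 1111",    # Luhn-valid test number written with spaces: missed
    "unseparated card": "Card 4111111111111111",  # same number with no separators: missed
}

results = {label: bool(NAIVE.search(text)) for label, text in samples.items()}
print(results)
# {'order ID': True, 'spaced card': False, 'unseparated card': False}
```

The result shows both failure modes at once: a false positive on the order ID and false negatives on real card formats.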