A company uses an AI model to generate personalized marketing emails. They want to prevent the model from leaking the system prompt used to configure its behavior. Which attack should they guard against?
Prompt leaking extracts the hidden system prompt via crafted user inputs.
Why this answer
Prompt leaking is an attack where an adversary crafts inputs to trick the model into revealing its system prompt or hidden instructions. Since the system prompt defines the model's behavior and often contains proprietary or sensitive configuration details, preventing its disclosure is critical. Guarding against prompt leaking directly addresses the goal of keeping the system prompt confidential.
Exam trap
Cisco often tests the distinction between attacks on training data (model inversion, membership inference, data poisoning) versus attacks on the inference-time configuration (prompt leaking), so candidates mistakenly choose a training-data attack when the question explicitly targets the system prompt.
How to eliminate wrong answers
Option B is wrong because model inversion attacks aim to reconstruct training data from the model's outputs, not to extract the system prompt which is part of the model's runtime configuration, not its training data. Option C is wrong because membership inference attacks determine whether a specific data point was used in the model's training set, which is unrelated to leaking the system prompt. Option D is wrong because data poisoning involves corrupting the training data to alter the model's behavior, not extracting the system prompt that is provided at inference time.