CompTIA AI+ AI0-001 (AI0-001) - Questions 526–600

1000 questions total · 14 pages · All types, answers revealed

Page 8 of 14

526
MCQ · easy

A model serving endpoint is tested using curl commands. Based on the exhibit, what is the most likely issue?

A. The server is returning HTTP 500 errors
B. The input features are malformed
C. The model is experiencing intermittent high latency leading to timeouts
D. The model is not deployed on the server
Answer: C

Some requests succeed while others time out, suggesting intermittent performance degradation.

Why this answer

The exhibit shows that the first curl request succeeds (HTTP 200), but subsequent requests fail with 'curl: (28) Operation timed out' once the request timeout is reached. This pattern of a success followed by timeouts is characteristic of a model experiencing high latency spikes, not a persistent server error or configuration issue. The server is reachable and the model responds correctly some of the time, ruling out deployment or malformed input issues.

Exam trap

CompTIA often tests the distinction between persistent errors (like 500 or 404) and intermittent timeout failures, where candidates mistakenly attribute timeouts to server errors or input issues rather than recognizing the pattern of variable latency.

How to eliminate wrong answers

Option A is wrong because the exhibit shows HTTP 200 responses for successful requests, not HTTP 500 errors; a server returning 500 errors would consistently fail with a 5xx status code, not timeouts. Option B is wrong because the first request succeeds, proving the input features are correctly formatted and accepted by the model; malformed features would cause persistent failures across all requests. Option D is wrong because the successful first request confirms the model is deployed and serving predictions; an undeployed model would return a 404 or 503 error, not a timeout after a successful response.

527
MCQ · hard

An ML team monitors a production model using a dashboard that shows daily performance metrics. Over the past month, the model's accuracy has dropped from 92% to 87%, while the data distribution of input features has remained stable according to statistical tests. Which type of model drift is most likely occurring?

A. Data drift (covariate shift)
B. Model decay
C. Overfitting
D. Concept drift
Answer: D

Concept drift changes the mapping from inputs to outputs, reducing accuracy.

Why this answer

Concept drift occurs when the relationship between input features and the target variable changes, even if the input data distribution remains stable. In this scenario, the model's accuracy declines from 92% to 87% while input feature distributions are unchanged, indicating that the underlying mapping from features to labels has shifted—a classic sign of concept drift.

Exam trap

CompTIA often tests the distinction between data drift and concept drift by presenting a scenario where input distributions are stable but model performance degrades, leading candidates to mistakenly choose data drift (covariate shift) because they focus on the input features rather than the label relationship.

How to eliminate wrong answers

Option A is wrong because data drift (covariate shift) refers to changes in the distribution of input features, which the question explicitly states has remained stable according to statistical tests. Option B is wrong because model decay is a general term for performance degradation over time, but it is not a specific type of drift; the question asks for the type of drift, and concept drift is the precise classification. Option C is wrong because overfitting is a training-time issue where a model fits noise in the training data, leading to poor generalization on new data; it does not explain a gradual performance drop in production while input distributions remain stable.
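The scenario can be reproduced on synthetic data. The sketch below is illustrative only: the uniform feature, the thresholds 0.5 and 0.7, and the name fixed_model are assumptions chosen to make the effect visible, not values from the question.

```python
import random
import statistics

def fixed_model(x):
    # Stand-in for the deployed model: it learned the old rule "positive when x > 0.5".
    return x > 0.5

def accuracy(xs, ys):
    # Fraction of samples where the frozen model agrees with the current label.
    return sum(fixed_model(x) == y for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)

# Inputs are drawn from the same distribution in both periods (no data drift).
xs_before = [rng.random() for _ in range(5000)]
xs_after = [rng.random() for _ in range(5000)]

# The mapping from input to label changes between periods (concept drift).
ys_before = [x > 0.5 for x in xs_before]
ys_after = [x > 0.7 for x in xs_after]

mean_before = statistics.mean(xs_before)
mean_after = statistics.mean(xs_after)
acc_before = accuracy(xs_before, ys_before)
acc_after = accuracy(xs_after, ys_after)

print(f"input mean before/after: {mean_before:.3f} / {mean_after:.3f}")
print(f"accuracy before/after:   {acc_before:.3f} / {acc_after:.3f}")
```

The input statistics stay close while accuracy falls, which is the signature that separates concept drift from covariate shift.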

528
MCQ · medium

A machine learning team notices that their model's performance degrades when deployed to a new geographic region. The data distribution in the new region differs from the training data. Which concept best describes this issue?

A. Covariate shift
B. Data leakage
C. Underfitting
D. Overfitting
Answer: A

Covariate shift happens when the distribution of input features changes between training and deployment.

Why this answer

Covariate shift occurs when the distribution of the input features (covariates) changes between training and deployment, while the conditional relationship P(Y|X) remains the same. In this scenario, the model's performance degrades because the new geographic region has a different data distribution than the training data, which is the classic definition of covariate shift. This is a common issue in machine learning when models are deployed in environments not represented in the training set.

Exam trap

CompTIA often tests the distinction between covariate shift and overfitting. Candidates mistakenly assume that performance degradation on new data is always due to overfitting, but overfitting means poor generalization to unseen data drawn from the same distribution as the training set, not to data from a different distribution.

How to eliminate wrong answers

Option B is wrong because data leakage refers to information from outside the training set (e.g., future data or target information) being used to train the model, which artificially inflates performance, not a distribution shift between training and deployment. Option C is wrong because underfitting occurs when a model is too simple to capture patterns in the training data, resulting in poor performance on both training and test sets, not specifically a degradation due to a change in data distribution. Option D is wrong because overfitting happens when a model learns noise or specific patterns in the training data too well, leading to poor generalization on unseen data from the same distribution, not a shift to a different distribution.

529
MCQ · hard

A company is evaluating a vendor's AI system for hiring. The vendor claims the system is fair because it achieves demographic parity. However, the company discovers that the system has significantly different false positive rates across groups. Which fairness issue does this indicate?

A. The system violates individual fairness
B. The system suffers from selection bias
C. The system violates equalised odds
D. The system is not calibrated
Answer: C

Equalised odds demands equal false positive rates across groups; significant differences indicate unfairness.

Why this answer

Equalised odds requires that a model's false positive rates and true positive rates are equal across all demographic groups. Since the vendor's system has significantly different false positive rates across groups, it violates the equalised odds fairness criterion, even if demographic parity (equal selection rates) is satisfied. This is a core fairness metric in AI governance, as it ensures that errors are distributed equitably.

Exam trap

CompTIA often tests the distinction between demographic parity and equalised odds, trapping candidates who assume that equal selection rates (demographic parity) automatically guarantee fairness across all error types.

How to eliminate wrong answers

Option A is wrong because individual fairness focuses on treating similar individuals similarly, not on group-level error rates like false positives. Option B is wrong because selection bias refers to systematic errors in data collection or sampling that lead to unrepresentative training data, not to post-deployment disparities in model errors across groups. Option D is wrong because calibration measures whether predicted probabilities match actual outcomes within each group, which is a separate property from equalised odds; a model can be calibrated yet still have unequal false positive rates.
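A small worked example shows how a system can satisfy demographic parity and still fail equalised odds. The two groups and their counts below are made up for illustration; each record is a hypothetical (true label, predicted label) pair.

```python
def selection_rate(rows):
    # Share of candidates the model selects, regardless of the true label.
    return sum(pred for _, pred in rows) / len(rows)

def false_positive_rate(rows):
    # Among truly unqualified candidates (truth == 0), the share selected anyway.
    preds_for_negatives = [pred for truth, pred in rows if truth == 0]
    return sum(preds_for_negatives) / len(preds_for_negatives)

# Each tuple is (truth, prediction); 1 = qualified / selected.
group_a = [(1, 1)] * 4 + [(1, 0)] * 1 + [(0, 1)] * 1 + [(0, 0)] * 4
group_b = [(1, 1)] * 2 + [(1, 0)] * 3 + [(0, 1)] * 3 + [(0, 0)] * 2

for name, rows in (("A", group_a), ("B", group_b)):
    print(name, "selection rate:", selection_rate(rows),
          "FPR:", false_positive_rate(rows))
```

Both groups are selected at the same rate, so demographic parity holds, yet the false positive rates differ, which is exactly the equalised odds violation the question describes.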

530
Multi-Select · hard

Which TWO are valid techniques to reduce overfitting in a deep neural network? (Choose TWO.)

Select 2 answers
A. Increase batch size
B. Increase learning rate
C. L2 regularization
D. Gradient clipping
E. Dropout
Answers: C, E

L2 regularization adds a penalty for large weights, discouraging complex models.

Why this answer

L2 regularization (option C) is a valid technique to reduce overfitting by adding a penalty term proportional to the square of the weight magnitudes to the loss function. This discourages the network from learning overly complex patterns, effectively shrinking weights and improving generalization. Dropout (option E) randomly drops a fraction of neurons during training, which prevents co-adaptation of features and forces the network to learn more robust representations, also reducing overfitting.

Exam trap

CompTIA often tests the distinction between techniques that improve training stability (like gradient clipping or adjusting batch size/learning rate) versus those that directly regularize the model to reduce overfitting (like L2 regularization and dropout), leading candidates to confuse optimization tricks with regularization methods.
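Both techniques are simple to express in code. The following is a minimal sketch in plain Python, assuming a flat list of weights and the common "inverted dropout" scaling; the function names and the lam and rate parameters are illustrative, not taken from any particular framework.

```python
import random

def l2_penalty(weights, lam):
    # Term added to the loss: lam times the sum of squared weights.
    return lam * sum(w * w for w in weights)

def dropout(activations, rate, rng, training=True):
    # Inverted dropout: zero each activation with probability `rate` during
    # training and scale the survivors so the expected value is unchanged.
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
weights = [1.0, -2.0]
activations = [0.5, 1.0, 1.5, 2.0]

print("L2 penalty:", l2_penalty(weights, lam=0.1))
print("train-time activations:", dropout(activations, rate=0.5, rng=rng))
print("inference activations: ", dropout(activations, rate=0.5, rng=rng, training=False))
```

At inference time dropout is disabled and the activations pass through unchanged, which is why it regularizes training without altering the deployed network's capacity.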

531
MCQ · medium

A company wants to build a conversational agent that can handle complex multi-step tasks such as booking a flight, reserving a hotel, and scheduling a car rental in a single session. The agent must be able to break down the user's request into sub-tasks, call external APIs, and reason about the results. Which design pattern is BEST suited for this requirement?

A. Retrieval-Augmented Generation (RAG) with a vector store
B. An agentic workflow implementing the ReAct pattern with tool use
C. A single large language model prompt with all instructions
D. Fine-tuning a model on a dataset of flight, hotel, and rental conversations
Answer: B

ReAct (Reasoning+Acting) agents iteratively decompose tasks, call APIs, and reason about results, perfectly suiting complex multi-step workflows.

Why this answer

Agentic workflows, particularly the ReAct pattern, combine reasoning and acting (tool calls) allowing the agent to iteratively decompose tasks, use APIs, and adapt based on results.

532
Multi-Select · hard

A team is using k-fold cross-validation to evaluate a model. They observe high variance in performance scores across folds. Which TWO actions are most likely to reduce this variance? (Choose TWO.)

Select 2 answers
A. Increase the number of folds
B. Use stratified cross-validation
C. Decrease the number of folds
D. Shuffle data before splitting
E. Use a more complex model
Answers: A, B

More folds mean each training set is larger and more similar to the full dataset, reducing variance.

Why this answer

Increasing the number of folds (e.g., from 5 to 10) means each training split contains a larger share of the data (90% instead of 80%), so the models trained in each iteration are more similar to one another and to a model trained on the full dataset, which tends to stabilize the performance estimate. Stratified cross-validation ensures that each fold maintains the same class distribution as the original dataset, which stabilizes performance scores when the dataset is imbalanced, thereby reducing variance across folds.

Exam trap

CompTIA often tests the misconception that decreasing the number of folds reduces variance. Under the reasoning this question expects, fewer folds increase variance because each training set is smaller and less representative of the full dataset.
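Stratification can be sketched without any library. The helper below, stratified_folds, is a hypothetical name for illustration; it assumes class labels in a plain list and deals the shuffled indices of each class round-robin across the folds.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    # Group sample indices by class, shuffle within each class, then deal
    # them out round-robin so every fold gets the same class proportions.
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Imbalanced toy dataset: 10 positives and 90 negatives.
labels = [1] * 10 + [0] * 90
folds = stratified_folds(labels, k=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print("fold sizes:        ", [len(fold) for fold in folds])
print("positives per fold:", positives_per_fold)
```

With an unstratified random split, some folds could end up with very few positives, which is one source of the fold-to-fold score variance described above.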

533
MCQ · easy

A team is building a recommendation system using collaborative filtering. They have a sparse user-item matrix. Which technique should they use to handle the sparsity and improve recommendations?

A. Association rule mining
B. Matrix factorization
C. k-nearest neighbors
D. Content-based filtering
Answer: B

Matrix factorization reduces dimensionality and captures latent features, effectively handling sparsity.

Why this answer

Matrix factorization (B) is the correct technique because it decomposes the sparse user-item matrix into lower-dimensional latent factor matrices, effectively capturing underlying patterns and filling in missing entries. This directly addresses sparsity by learning dense representations that generalize beyond observed interactions, which is a core strength in collaborative filtering for recommendation systems.

Exam trap

CompTIA often tests the misconception that k-nearest neighbors (k-NN) is the go-to for collaborative filtering, but candidates fail to recognize that k-NN's performance collapses under high sparsity, whereas matrix factorization explicitly models latent factors to overcome this.

How to eliminate wrong answers

Option A is wrong because association rule mining (e.g., Apriori algorithm) is designed for market basket analysis to find frequent itemsets and rules, not for handling sparse user-item matrices in collaborative filtering; it fails to generalize from sparse data and does not model latent factors. Option C is wrong because k-nearest neighbors (k-NN) is a memory-based collaborative filtering method that relies on direct similarity computations between users or items, which degrades severely with high sparsity due to lack of overlapping ratings, leading to poor recommendations. Option D is wrong because content-based filtering uses item features (e.g., genre, keywords) to recommend similar items, not the user-item interaction matrix; it does not address sparsity in collaborative filtering and ignores collaborative signals from other users.
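To make the idea concrete, here is a minimal sketch of matrix factorization trained with stochastic gradient descent on a tiny, made-up ratings dictionary. The hyperparameters (rank, lr, reg, epochs) and the data are assumptions for illustration; production systems typically rely on a dedicated library.

```python
import math
import random

def factorize(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    # Learn user factors P and item factors Q so that P[u] . Q[i] approximates
    # each observed rating; unobserved cells are simply never visited.
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - sum(P[u][f] * Q[i][f] for f in range(rank))
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(ratings, P, Q):
    squared = [(r - predict(P, Q, u, i)) ** 2 for (u, i), r in ratings.items()]
    return math.sqrt(sum(squared) / len(squared))

# Sparse toy matrix: 12 of 16 user-item cells observed (ratings 1 to 5).
ratings = {(0, 0): 5, (0, 1): 4, (0, 2): 1, (1, 0): 4, (1, 1): 5, (1, 3): 1,
           (2, 1): 1, (2, 2): 5, (2, 3): 4, (3, 0): 1, (3, 2): 4, (3, 3): 5}

P0, Q0 = factorize(ratings, 4, 4, epochs=0)   # untrained factors, for comparison
P, Q = factorize(ratings, 4, 4)
print("RMSE before training:", round(rmse(ratings, P0, Q0), 3))
print("RMSE after training: ", round(rmse(ratings, P, Q), 3))
print("predicted rating for unobserved cell (0, 3):", round(predict(P, Q, 0, 3), 2))
```

The last line is the point of the technique: the learned dense factors yield a score for a user-item pair that was never observed.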

534
Multi-Select · medium

Which TWO are best practices for deploying AI models in a containerized production environment? (Select TWO.)

Select 2 answers
A. Always pull the latest image tag for automatic updates
B. Store model artifacts inside the container image for portability
C. Use an orchestration platform like Kubernetes for scaling and health management
D. Package the model and its dependencies into a single container image
E. Configure JVM heap arguments inside the container if using Java
Answers: C, D

Kubernetes provides automated scaling and self-healing.

Why this answer

Option C is correct because orchestration platforms like Kubernetes provide automated scaling, self-healing, and rolling updates for containerized AI models. Kubernetes uses liveness and readiness probes to monitor model health and restart failed containers, ensuring high availability in production.

Exam trap

CompTIA often tests the distinction between containerization best practices (e.g., immutable images, external model storage) and generic software deployment habits (e.g., using latest tags, embedding data), so candidates mistakenly select options that seem convenient but violate production reliability principles.

535
Multi-Select · easy

Which TWO of the following are common activation functions used in deep neural networks?

Select 2 answers
A. Linear Regression
B. Support Vector Machine
C. K-means
D. ReLU
E. Sigmoid
Answers: D, E

ReLU is the most common activation for hidden layers.

Why this answer

ReLU (Rectified Linear Unit) is a common activation function in deep neural networks because it introduces non-linearity while being computationally efficient, outputting the input directly if positive and zero otherwise. It helps mitigate the vanishing gradient problem, making it a default choice for hidden layers in many architectures. Sigmoid squashes any real input into the range (0, 1), which makes it a common choice for the output layer in binary classification.

Exam trap

CompTIA often tests the distinction between machine learning algorithms (like Linear Regression, SVM, K-means) and neural network components (like activation functions), so candidates mistakenly select algorithms as activation functions because they recognize them as common ML terms.
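For reference, the two correct options are one-line functions. This is a plain-Python sketch; the sigmoid uses a two-branch form to avoid overflow for large negative inputs.

```python
import math

def relu(x):
    # Passes positive inputs through unchanged and clamps negatives to zero.
    return max(0.0, x)

def sigmoid(x):
    # Squashes any real number into (0, 1); the two branches are
    # algebraically identical but avoid overflow in math.exp.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), round(sigmoid(value), 4))
```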

536
MCQ · easy

A machine learning engineer has a dataset of 100,000 records. She splits it into 70% training, 15% validation, and 15% test sets. After training, the model achieves 95% accuracy on training and 85% on validation. What does the accuracy difference most likely indicate?

A. The validation set is too small
B. The model generalizes well
C. The model is overfitting
D. The test set should be larger
Answer: C

Overfitting explains high training accuracy and lower validation accuracy.

Why this answer

The 10-percentage-point gap between training accuracy (95%) and validation accuracy (85%) is a classic sign of overfitting. The model has memorized patterns specific to the training set rather than learning generalizable features, causing it to perform worse on unseen validation data. In machine learning, a significant drop in performance from training to validation indicates poor generalization, which is the hallmark of overfitting.

Exam trap

CompTIA often tests the distinction between overfitting and data split issues, trapping candidates who mistake a performance gap for an insufficient validation set rather than recognizing it as a model generalization problem.

How to eliminate wrong answers

Option A is wrong because a 15% validation set (15,000 records) is generally sufficient for reliable evaluation; the issue is not size but the performance gap. Option B is wrong because good generalization would show similar accuracy on training and validation sets, not a 10-point drop. Option D is wrong because the test set size (15%) is standard and does not affect the training-to-validation accuracy discrepancy; the problem lies in model behavior, not data partitioning.

537
MCQ · easy

A data engineer needs to process streaming clickstream data for real-time feature engineering in an ML pipeline. Which data pipeline technology is BEST suited for this task?

A. Apache Spark in batch mode
B. Snowflake
C. Apache Kafka
D. Apache Airflow
Answer: C

Kafka is purpose-built for real-time data streaming and can feed into ML pipelines.

Why this answer

Apache Kafka is the best choice because it is a distributed streaming platform designed for high-throughput, fault-tolerant, real-time data ingestion and processing. It can capture clickstream events as they occur and make them immediately available for feature engineering in an ML pipeline, supporting exactly-once semantics and low-latency delivery.

Exam trap

CompTIA often tests the distinction between data ingestion/messaging systems (Kafka) and batch processing or storage systems, leading candidates to confuse Airflow's orchestration role with actual stream processing capabilities.

How to eliminate wrong answers

Option A is wrong because Apache Spark in batch mode processes data in static, finite batches with high latency, making it unsuitable for real-time streaming clickstream data. Option B is wrong because Snowflake is a cloud-based data warehouse optimized for analytical queries on structured, stored data, not for real-time stream ingestion or processing. Option D is wrong because Apache Airflow is a workflow orchestration tool for scheduling and monitoring batch jobs, not a stream processing or messaging system capable of handling real-time data streams.

538
Multi-Select · easy

A company wants to use machine learning to recommend products to customers based on their purchase history. Which TWO techniques are appropriate for this task? (Select TWO)

Select 2 answers
A. Collaborative filtering
B. Principal Component Analysis (PCA)
C. K-Nearest Neighbors (KNN)
D. Naive Bayes
E. Linear regression
Answers: A, C

Collaborative filtering uses behavior patterns to recommend items.

Why this answer

Collaborative filtering recommends based on user similarities. K-Nearest Neighbors can find similar users or items. Both are suitable for recommendation.

539
MCQ · easy

A startup is building a chatbot for customer service. They have 500 recorded conversations and want to use a pre-trained language model to generate responses. However, they have limited computational resources and need the chatbot to respond in real-time. They are considering fine-tuning a large model like GPT-3 or using a smaller model like DistilBERT. The conversation data contains industry-specific jargon. Which approach should they take?

A. Use GPT-3 via API without fine-tuning
B. Fine-tune DistilBERT on the conversation data
C. Train a custom RNN from scratch on the conversations
D. Implement a rule-based system with keywords
Answer: B

DistilBERT is smaller, faster, and fine-tuning on domain-specific data will adapt it to jargon while meeting real-time requirements.

Why this answer

Option B is correct because fine-tuning DistilBERT on the 500 recorded conversations allows the model to adapt to industry-specific jargon while maintaining real-time responsiveness due to its smaller size. DistilBERT is a distilled version of BERT that retains 97% of BERT’s language understanding with 40% fewer parameters, making it suitable for limited computational resources. Fine-tuning on domain-specific data is essential here, as pre-trained models like GPT-3 lack exposure to the startup’s specialized terminology, and using a smaller model ensures low-latency inference for real-time chatbot responses.

Exam trap

CompTIA often tests the misconception that larger pre-trained models like GPT-3 are always superior for domain adaptation, ignoring the critical trade-offs of computational cost, latency, and the need for fine-tuning on small, specialized datasets.

How to eliminate wrong answers

Option A is wrong because using GPT-3 via API without fine-tuning would not adapt to the industry-specific jargon in the 500 conversations, leading to generic or incorrect responses, and the API call latency and cost are unsuitable for real-time constraints with limited resources. Option C is wrong because training a custom RNN from scratch on only 500 conversations is insufficient for learning complex language patterns, resulting in poor generalization and high risk of overfitting, while also requiring significant computational resources for training. Option D is wrong because a rule-based system with keywords cannot handle the variability and nuance of natural language in customer service conversations, especially with industry-specific jargon, and would fail to generate coherent, context-aware responses beyond predefined patterns.

540
MCQ · easy

A security analyst is reviewing logs from an AI-powered recommendation system and notices an unusually high number of requests for products from a specific vendor. The analyst suspects data poisoning. Which mitigation strategy should be implemented first?

A. Encrypt all training data at rest
B. Deploy an anomaly detection system on model outputs
C. Retrain the model with a smaller, curated dataset
D. Implement input validation and sanitization for training data
Answer: D

Input validation prevents poisoned data from entering the training pipeline.

Why this answer

Option D is correct because input validation and sanitization directly prevent malicious or anomalous data from entering the training pipeline, which is the root cause of data poisoning. In an AI-powered recommendation system, poisoned training data can cause the model to learn biased associations, such as favoring a specific vendor. By validating and sanitizing inputs before they are used for training, the attack vector is blocked at the earliest stage, making it the most effective first mitigation step.

Exam trap

CompTIA often tests the principle of defense in depth by making candidates choose a reactive or recovery measure (like retraining or monitoring outputs) instead of the proactive control that stops the attack at the input stage.

How to eliminate wrong answers

Option A is wrong because encrypting training data at rest protects confidentiality and integrity during storage, but it does not prevent malicious data from being ingested into the training set; data poisoning occurs before encryption is applied. Option B is wrong because deploying an anomaly detection system on model outputs is a reactive measure that detects poisoning after the model has already been compromised, rather than preventing the attack. Option C is wrong because retraining with a smaller, curated dataset may reduce the impact of poisoning but does not address the underlying vulnerability that allowed poisoned data to enter the system; it is a recovery step, not a first-line mitigation.

541
MCQ · easy

A data analyst wants to predict housing prices based on square footage, number of bedrooms, and location. Which machine learning approach is most suitable?

A. K-means clustering
B. Decision tree regression
C. Association rule mining
D. Linear regression
Answer: D

Linear regression models the linear relationship between input features and a continuous output.

Why this answer

Linear regression is the most suitable approach because the problem involves predicting a continuous numeric target (housing prices) from multiple independent variables (square footage, bedrooms, location). Linear regression models the linear relationship between the features and the target, providing interpretable coefficients and efficient training for this type of regression task.

Exam trap

The trap here is that candidates may confuse regression (predicting a continuous value) with classification or unsupervised learning, and incorrectly select decision tree regression or clustering because they see 'prediction' and assume any tree-based or grouping method works.

How to eliminate wrong answers

Option A is wrong because K-means clustering is an unsupervised learning algorithm used for grouping unlabeled data into clusters, not for predicting a continuous target variable. Option B is wrong because decision tree regression can be used for regression, but it is not the most suitable here; it tends to overfit and lacks the interpretability and simplicity of linear regression for a straightforward linear relationship. Option C is wrong because association rule mining is an unsupervised technique for discovering frequent itemsets and rules in transactional data, not for predicting numeric values.

542
Multi-Select · easy

Which TWO are common types of adversarial attacks on AI models?

Select 2 answers
A. Hyperparameter tuning
B. Transfer learning
C. Evasion attack
D. Backdoor attack
E. Data poisoning
Answers: C, E

Evasion attacks craft input perturbations to cause misclassification at test time.

Why this answer

Evasion attacks (Option C) are a common type of adversarial attack where an attacker crafts malicious input data that is intentionally designed to cause a trained AI model to make incorrect predictions or classifications, often by adding imperceptible perturbations to legitimate inputs. This exploits the model's sensitivity to small changes in feature space, leading to misclassification without altering the model itself. Data poisoning (Option E) targets the training phase instead: the attacker injects or alters training data so that the model learns incorrect or attacker-chosen behavior.

Exam trap

CompTIA often tests the distinction between attack types that occur during training (data poisoning) versus inference (evasion), and candidates may mistakenly classify hyperparameter tuning or transfer learning as attacks because they sound like active manipulations, but they are standard ML practices.

543
MCQ · easy

Which metric is most appropriate for evaluating a binary classification model where the positive class is rare and false positives are costly?

A. Accuracy
B. F1-score
C. Precision
D. Recall
Answer: C

Correct; precision measures how many predicted positives are actually positive, reducing false positives.

Why this answer

Precision is the most appropriate metric when the positive class is rare and false positives are costly because it measures the proportion of true positive predictions among all positive predictions. In this scenario, minimizing false positives is critical, and precision directly penalizes them by requiring high confidence before labeling an instance as positive. This aligns with the business need to avoid costly false alarms, such as in fraud detection or medical diagnosis for rare diseases.

Exam trap

CompTIA often tests the misconception that accuracy is always the best metric, but the trap here is that candidates overlook how class imbalance and asymmetric costs make precision or recall more relevant, and they fail to distinguish between F1-score and precision when the cost of false positives is explicitly stated.

How to eliminate wrong answers

Option A is wrong because accuracy is misleading for imbalanced datasets; a model that predicts the majority class for all instances can achieve high accuracy while failing to identify any positive cases, which is useless when the positive class is rare. Option B is wrong because F1-score balances precision and recall, but when false positives are costly, precision alone is more appropriate; F1-score would still allow some false positives in favor of recall, which is undesirable here. Option D is wrong because recall focuses on capturing all positive instances, but it does not penalize false positives; in a rare positive class scenario with high cost of false positives, maximizing recall would likely increase false positives, which is counterproductive.
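The trade-off behind this answer can be seen by moving the decision threshold on a scored dataset. The scores and labels below are invented for illustration, and precision_at and recall_at are hypothetical helper names.

```python
def precision_at(threshold, scored):
    # Of everything predicted positive at this threshold, how much is truly positive?
    predicted_positive = [label for score, label in scored if score >= threshold]
    if not predicted_positive:
        return None  # precision is undefined when nothing is predicted positive
    return sum(predicted_positive) / len(predicted_positive)

def recall_at(threshold, scored):
    # Of all true positives, how many are caught at this threshold?
    true_positive = sum(label for score, label in scored if score >= threshold)
    return true_positive / sum(label for _, label in scored)

# (model score, true label) pairs; 1 marks the positive class.
scored = [(0.95, 1), (0.92, 1), (0.85, 0), (0.80, 1), (0.70, 0),
          (0.65, 0), (0.60, 1), (0.40, 0), (0.30, 0), (0.10, 0)]

for threshold in (0.5, 0.9):
    print(f"threshold {threshold}: precision={precision_at(threshold, scored):.2f}, "
          f"recall={recall_at(threshold, scored):.2f}")
```

Raising the threshold removes the false positives at the cost of missed positives, which is the right trade when false positives are the expensive error.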

544
MCQ · hard

A deep learning model for image classification is overfitting the training data. The team has already tried data augmentation and dropout. Which additional technique should they implement to reduce overfitting?

A. Batch normalization
B. Increase number of epochs
C. Gradient clipping
D. Early stopping
Answer: D

Early stopping monitors validation loss and stops training when it starts to increase, reducing overfitting.

Why this answer

Early stopping (Option D) is the correct additional technique because it halts training when validation performance stops improving, directly preventing the model from memorizing noise in the training data. Since data augmentation and dropout are already in use, early stopping provides a complementary regularization effect by limiting the number of training iterations before overfitting occurs.

Exam trap

CompTIA often tests the distinction between techniques that address overfitting versus those that solve optimization issues, leading candidates to confuse batch normalization or gradient clipping as overfitting solutions when they are not.

How to eliminate wrong answers

Option A is wrong because batch normalization primarily accelerates training and stabilizes learning by normalizing layer inputs. It can have a slight regularizing side effect, but it is not a primary overfitting countermeasure. Option B is wrong because increasing the number of epochs would exacerbate overfitting by giving the model more opportunities to memorize training data, making the problem worse. Option C is wrong because gradient clipping is used to prevent exploding gradients in deep networks, especially in RNNs, and does not address overfitting from excessive model capacity or insufficient regularization.
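Early stopping is usually implemented with a patience counter. The sketch below runs the logic over a pre-recorded list of validation losses rather than a live training loop; the loss values and the patience settings are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    # Track the best validation loss seen so far and stop once `patience`
    # consecutive epochs pass without an improvement.
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_loss

# Validation loss falls, then rises as the model starts to overfit.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.49]

print("patience=2:", early_stopping_epoch(val_losses, patience=2))
print("patience=4:", early_stopping_epoch(val_losses, patience=4))
```

In practice the model weights from the best epoch are restored after stopping. The two calls also show the design choice involved: a short patience stops sooner but can miss a later improvement.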

545
MCQ · medium

An AI engineer is tuning a deep learning model and observes that the training loss decreases very slowly. The learning rate is set to 0.001. Which adjustment is most likely to speed up convergence?

A. Increase the learning rate to 0.01
B. Add more hidden layers
C. Decrease the learning rate to 0.0001
D. Increase the batch size
Answer: A

A higher learning rate allows larger weight updates, potentially speeding up convergence.

Why this answer

A learning rate of 0.001 is causing the model to take very small steps toward the minimum of the loss function, resulting in slow convergence. Increasing the learning rate to 0.01 allows larger weight updates per iteration, which typically speeds up training. However, care must be taken not to overshoot the optimum, as an excessively high learning rate can cause divergence.

Exam trap

CompTIA often tests the misconception that decreasing the learning rate always improves training, when in fact a learning rate that is too low is a primary cause of slow convergence, and the correct adjustment is to increase it within a safe range.

How to eliminate wrong answers

Option B is wrong because adding more hidden layers increases model complexity and the number of parameters, which generally slows training and can exacerbate the slow convergence problem rather than solving it. Option C is wrong because decreasing the learning rate to 0.0001 would make the updates even smaller, further slowing convergence. Option D is wrong because increasing the batch size provides a more accurate gradient estimate but reduces the frequency of updates per epoch, which can actually slow convergence in terms of steps needed to reach a given loss.
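The effect of the learning rate can be checked on a toy problem. This sketch runs gradient descent on the one-dimensional loss (w - 3)^2, which is an assumption chosen for clarity and not a deep network; the third call uses a deliberately excessive rate to show divergence.

```python
def steps_to_converge(lr, start=0.0, target=3.0, tol=0.01, max_steps=10_000):
    # Gradient descent on loss(w) = (w - target)^2, whose gradient is 2 * (w - target).
    # Returns the number of steps needed to get within `tol` of the minimum,
    # or None if the run diverges or hits the step limit.
    w = start
    for step in range(1, max_steps + 1):
        grad = 2.0 * (w - target)
        w -= lr * grad
        if abs(w - target) < tol:
            return step
        if abs(w - target) > 1e6:
            return None
    return None

slow = steps_to_converge(0.001)
fast = steps_to_converge(0.01)
too_high = steps_to_converge(1.1)

print("lr=0.001 steps:", slow)
print("lr=0.01  steps:", fast)
print("lr=1.1   steps:", too_high)
```

On this toy loss the tenfold larger rate needs roughly a tenth of the steps, while the excessive rate overshoots further on every update and never converges.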

546
MCQ · easy

A data scientist is building a binary classification model to predict customer churn. The dataset has 90% non-churn and 10% churn. After training, the model achieves 90% accuracy, but the recall for the churn class is only 20%. Which metric should the team primarily focus on to evaluate the model's effectiveness?

A.Recall for the churn class
B.Accuracy
C.Area Under the ROC Curve (AUC-ROC)
D.Precision for the non-churn class
AnswerA

Recall measures how many actual churners are correctly identified, which is the key concern.

Why this answer

When classes are imbalanced, accuracy is misleading. Recall (or F1) for the minority class is more informative.
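
A short worked example makes the gap between accuracy and recall concrete. The counts below are hypothetical: they assume 1,000 customers and are chosen only so that they reproduce the 90% accuracy and 20% churn recall from the question.

```python
# Hypothetical confusion-matrix counts for 1,000 customers:
# 100 churners (10%) and 900 non-churners (90%).
tp, fn = 20, 80     # churners caught vs. churners missed
tn, fp = 880, 20    # non-churners correctly passed vs. wrongly flagged

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)                      # share of actual churners identified
precision = tp / (tp + fp)                   # share of churn predictions that are right
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f} f1={f1:.2f}")
```

Accuracy stays at 0.90 because the majority class dominates the count, while recall shows that 80 of the 100 churners go undetected.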

547
Multi-Selectmedium

A cybersecurity team is red-teaming their internal LLM-powered code assistant. They want to test the model's resistance to jailbreaking techniques that bypass safety guardrails. Which TWO of the following should they include in their red teaming exercise to effectively evaluate jailbreak resilience?

Select 2 answers
A.Model inversion to reconstruct training data
B.Role-playing scenarios where the model is asked to act as a character with no restrictions (e.g., DAN)
C.Encoding obfuscation, such as base64 encoding malicious instructions
D.Payload splitting across multiple user messages
E.Few-shot prompting with benign examples
AnswersB, C

Role-playing scenarios are a classic jailbreak technique that attempts to override system instructions by assigning the model an unrestricted persona.

Why this answer

Role-playing scenarios (e.g., DAN) and encoding obfuscation (e.g., base64) are common jailbreak techniques. Payload splitting is a type of prompt injection, not specifically jailbreaking. Few-shot prompting is a legitimate technique.

Model inversion is a privacy attack.

548
MCQhard

A research team is training a deep learning model for image classification using a small dataset of 1,000 labeled images. They are concerned about overfitting. Which combination of regularisation techniques would be MOST effective?

A.Use early stopping without any other regularisation
B.Dropout with a rate of 0.5 and L2 regularisation
C.L1 regularisation and batch normalisation
D.Increase learning rate and use momentum
AnswerB

Dropout and L2 regularisation together effectively reduce overfitting by preventing reliance on specific neurons and penalising large weights.

Why this answer

Dropout randomly disables neurons during training to prevent co-adaptation, and L2 regularisation penalises large weights. Both are standard regularisation techniques. L1 promotes sparsity but is less common for dense layers.

Batch normalisation helps convergence but is not primarily a regularisation method.
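
The two techniques in option B can be sketched in a few lines of plain Python. This is an illustration of the mechanics only, assuming inverted dropout and a simple sum-of-squares penalty; the function names and the value of `lam` are hypothetical, and deep learning frameworks provide their own implementations.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(activations)       # dropout is disabled at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def l2_penalty(weights, lam):
    """L2 term added to the loss: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

rng = random.Random(42)
hidden = [0.5, 1.2, -0.7, 0.9]
print(dropout(hidden, rate=0.5, rng=rng))                   # some units zeroed, others doubled
print(dropout(hidden, rate=0.5, rng=rng, training=False))   # unchanged at inference
print(l2_penalty([0.5, -1.5, 2.0], lam=0.01))               # grows with weight magnitude
```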

549
MCQmedium

An AI system experiences degraded accuracy over time due to changes in user behavior. Which monitoring metric should be prioritized to detect this issue earliest?

A.API response latency
B.Data drift detection on input features
C.Area under the ROC curve (AUC)
D.Model accuracy on a holdout validation set
AnswerB

Data drift detects changes before performance degrades.

Why this answer

Option B is correct: Data drift detection monitors changes in input distribution, which often precedes accuracy drop. Option A is wrong because latency doesn't reflect data shift. Option C is wrong because AUC is a lagging indicator.

Option D is wrong because accuracy on a holdout validation set is also lagging.
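
One way to monitor input drift is the Population Stability Index (PSI), which compares how a feature's values are spread across buckets at training time and in production. The sketch below is a minimal version assuming equal-width buckets and synthetic data; the function name, bucket count, and sample values are illustrative assumptions.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a current sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1   # clamp out-of-range values
        # a small floor avoids log(0) for empty buckets
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i % 100 for i in range(1000)]               # training-time feature values
shifted = [(i % 100) * 0.5 + 50 for i in range(1000)]   # production values, shifted upward

print(f"PSI, same distribution:    {psi(baseline, baseline):.3f}")
print(f"PSI, shifted distribution: {psi(baseline, shifted):.3f}")
```

Because PSI needs only input features, it can be computed before ground-truth labels arrive, which is why it can flag a problem earlier than accuracy or AUC.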

550
Multi-Selecteasy

A data scientist is monitoring a deployed image classification model. Which TWO actions are best practices for detecting model drift? (Choose 2.)

Select 2 answers
A.Schedule automatic weekly retraining of the model.
B.Increase the model's complexity to improve generalization.
C.Use a holdout test set to periodically evaluate model accuracy.
D.Monitor the average prediction confidence of the model.
E.Track the distribution of input data over time.
AnswersC, E

Periodically evaluating accuracy on a holdout test set reveals performance degradation caused by drift.

Why this answer

Option C is correct because periodically evaluating the model on a holdout test set that reflects the current production data distribution is a direct method to detect accuracy degradation caused by model drift. This approach measures whether the model's performance on unseen data has declined over time, which is a key indicator of drift. Option E is correct because tracking the distribution of input data over time reveals data drift, a change in the inputs relative to what the model saw during training.

Exam trap

CompTIA often tests the distinction between detection and remediation actions, so candidates mistakenly choose retraining (Option A) as a detection method when it is actually a corrective action.

551
MCQhard

Refer to the exhibit. An AI governance review finds that a model was deployed without required ethics approval. Based on the audit log, who is most responsible for the compliance failure?

A.Bob
B.Alice
C.Carol
D.System
AnswerA

Bob deployed the model without ethics approval.

Why this answer

Bob is the data scientist who deployed the model to production. The audit log shows that Bob executed the deployment command without first obtaining the required ethics approval. As the individual who performed the action that violated the governance policy, Bob bears primary responsibility for the compliance failure.

Exam trap

CompTIA often tests the distinction between who performed the action versus who requested or approved it, leading candidates to incorrectly blame the project manager or ethics officer instead of the deployer.

How to eliminate wrong answers

Option B (Alice) is wrong because Alice is the project manager who requested the model deployment, but she did not perform the actual deployment action; the audit log shows she only submitted the request. Option C (Carol) is wrong because Carol is the ethics officer who approved the model earlier, but the audit log indicates she did not approve this specific deployment; the failure is that Bob bypassed the required approval step. Option D (System) is wrong because the system is an automated deployment pipeline that executed Bob's command; it has no agency or responsibility for compliance decisions, and the governance policy assigns accountability to human actors.

552
Multi-Selecthard

A security engineer is hardening an LLM application against indirect prompt injection attacks. Which TWO controls are MOST effective? (Select two.)

Select 2 answers
A.Output filtering
B.Input validation and sanitization
C.Rate limiting
D.Differential privacy
E.Federated learning
AnswersA, B

Filtering outputs can block actions that arise from injected instructions.

Why this answer

Input validation and sanitization can filter malicious content in retrieved data, and output filtering can prevent the model from executing injected instructions. Both are key defenses.

553
MCQhard

A large hospital system deploys an AI triage system for emergency rooms. The system uses patient vitals and symptoms to recommend treatment priority. Six months after deployment, complaints arise that the system frequently underestimates the severity of symptoms for patients from certain ethnic backgrounds. A data scientist runs a bias audit and finds that the model's false negative rate is 20% higher for the minority group. The hospital's AI governance board requires immediate corrective action. The data science team has limited resources and cannot retrain the entire model from scratch. They have access to the training data, which is imbalanced. The model is a gradient boosted tree. Which course of action best addresses the bias while minimizing operational impact?

A.Rebalance the training data using SMOTE and retrain the model
B.Use adversarial debiasing during training to remove protected attribute correlations
C.Post-process the model's predictions by adjusting thresholds for the minority group
D.Replace the model with a simpler logistic regression model to improve interpretability
AnswerC

Threshold adjustment is fast, cheap, and directly minimizes false negative disparity.

Why this answer

Option C is correct because post-processing by adjusting decision thresholds for the minority group directly compensates for the higher false negative rate without requiring retraining. Since the team has limited resources and cannot retrain the entire gradient boosted tree model, this approach minimizes operational impact while addressing the bias. The threshold adjustment effectively lowers the probability cutoff for the minority group, making the model more sensitive to their symptoms and reducing underestimation of severity.

Exam trap

CompTIA often tests the misconception that bias mitigation always requires retraining or complex algorithmic changes, when in fact post-processing threshold adjustments can be a quick, effective fix for deployed models with limited resources.

How to eliminate wrong answers

Option A is wrong because SMOTE rebalances the training data by oversampling the minority class, but retraining the entire gradient boosted tree model from scratch is resource-intensive and contradicts the constraint of limited resources; moreover, SMOTE may introduce synthetic noise that degrades model performance. Option B is wrong because adversarial debiasing is a training-time technique that requires modifying the model architecture and retraining, which is not feasible given the limited resources and the fact that the model is already deployed; it also does not directly address the false negative rate disparity without full retraining. Option D is wrong because replacing the model with a simpler logistic regression model would require retraining and likely reduce predictive performance, especially for complex interactions in patient vitals and symptoms, and does not guarantee bias reduction; interpretability alone does not correct the existing bias.
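
The threshold adjustment in option C can be sketched as follows. The scores, labels, and both cutoffs are hypothetical values chosen for illustration, and the helper name is an assumption; a real deployment would pick the group-specific cutoff from a validation set.

```python
def false_negative_rate(scores, labels, threshold):
    """Share of truly severe cases (label 1) scored below the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    missed = sum(1 for s in positives if s < threshold)
    return missed / len(positives)

# Hypothetical model scores for truly severe patients in the affected group.
scores = [0.35, 0.42, 0.48, 0.55, 0.61, 0.70, 0.44, 0.52, 0.38, 0.66]
labels = [1] * len(scores)

default_threshold = 0.50
adjusted_threshold = 0.40   # lower cutoff applied to this group only

fnr_default = false_negative_rate(scores, labels, default_threshold)
fnr_adjusted = false_negative_rate(scores, labels, adjusted_threshold)
print(f"FNR at {default_threshold}: {fnr_default:.2f}")
print(f"FNR at {adjusted_threshold}: {fnr_adjusted:.2f}")
```

The model itself is untouched; only the decision rule changes. A lower cutoff will typically raise false positives for the same group, so the trade-off should be reviewed alongside the false negative rate.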

554
Multi-Selectmedium

A team is implementing a RAG system. They are designing the document loading and chunking strategy. Which TWO techniques are commonly used for chunking documents? (Select two.)

Select 2 answers
A.Fixed-size chunking with a token limit
B.Frequency-based chunking by term occurrence
C.Character-level chunking with no overlap
D.Semantic chunking using sentence boundaries
E.Hierarchical chunking using document structure
AnswersA, D

Why this answer

Fixed-size chunking (based on token count) and semantic chunking (based on natural boundaries) are both standard approaches. Hierarchical chunking is less common, and character-level is rarely used. Overlap is a parameter, not a chunking strategy.
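
The two strategies can be compared with a minimal sketch. It assumes whitespace-separated words stand in for tokens and that sentences end in ".", "!" or "?"; the function names and the sample document are illustrative, and production systems normally use a real tokenizer.

```python
import re

def fixed_size_chunks(text, max_tokens=8, overlap=2):
    """Split whitespace 'tokens' into windows of max_tokens that overlap by `overlap`."""
    tokens = text.split()
    step = max_tokens - overlap
    return [" ".join(tokens[i:i + max_tokens]) for i in range(0, len(tokens), step)]

def sentence_chunks(text, max_tokens=8):
    """Group whole sentences so each chunk stays within the token budget where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence]).split()) > max_tokens:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks

doc = ("RAG retrieves context. Chunks are embedded and indexed. "
       "Queries fetch the closest chunks. The model answers with them.")
print(fixed_size_chunks(doc))
print(sentence_chunks(doc))
```

Fixed-size windows may cut a sentence in half, which is what the overlap parameter partly compensates for, while the sentence-based version keeps every chunk aligned to a natural boundary.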

555
Multi-Selecteasy

A data scientist is tuning a deep learning model. Which TWO hyperparameters directly affect the model's capacity to overfit?

Select 2 answers
A.Number of layers in the network.
B.Batch size.
C.Optimizer choice (e.g., SGD vs Adam).
D.Dropout rate.
E.Learning rate.
AnswersA, D

More layers increase capacity, raising overfitting risk.

Why this answer

Option A is correct because increasing the number of layers increases the model's depth, which expands its representational capacity and allows it to learn more complex patterns, including noise, thereby directly increasing overfitting risk. Option D is correct because dropout is a regularization technique that randomly drops neurons during training; a low dropout rate (e.g., 0.0) removes this regularization, while a high rate (e.g., 0.5) reduces overfitting by preventing co-adaptation of neurons.

Exam trap

CompTIA often tests the distinction between hyperparameters that affect model capacity (number of layers, dropout rate) versus those that affect training dynamics (batch size, optimizer, learning rate), leading candidates to mistakenly select learning rate or batch size as direct overfitting controls.

556
MCQhard

A company deploys an AI model for loan approval. The model shows bias against a protected group. The team decides to use adversarial debiasing. What is the PRIMARY advantage of this approach?

A.It guarantees the model's predictions are private.
B.It reduces bias while preserving predictive performance by learning representations that are invariant to sensitive attributes.
C.It is simpler to implement than pre-processing techniques.
D.It ensures equal approval rates across all groups.
AnswerB

This is the core benefit of adversarial debiasing.

Why this answer

Adversarial debiasing is an in-processing technique that trains a primary model to predict the target (e.g., loan approval) while simultaneously training an adversary to predict the sensitive attribute from the model's learned representations. The primary model is penalized when the adversary succeeds, forcing it to learn representations that are invariant to the sensitive attribute. This reduces bias while preserving predictive performance because the model retains the ability to learn task-relevant patterns that are not correlated with the protected attribute.

Exam trap

The trap here is that candidates confuse 'reducing bias' with 'ensuring equal outcomes' (demographic parity), but adversarial debiasing targets equalized odds or equal opportunity by focusing on representation invariance, not strict rate equality.

How to eliminate wrong answers

Option A is wrong because adversarial debiasing does not guarantee privacy; it addresses fairness, not confidentiality, and does not provide differential privacy or encryption. Option C is wrong because adversarial debiasing is an in-processing technique that is generally more complex to implement than pre-processing techniques like reweighing or sampling, which modify the dataset before training. Option D is wrong because adversarial debiasing aims to reduce bias by learning invariant representations, but it does not enforce equal approval rates across groups; equal approval rates would be demographic parity, which is a different fairness metric and may not align with the model's predictive performance.

557
MCQmedium

A financial services company has a real-time fraud detection system that uses Apache Kafka to stream transaction events, a TensorFlow Serving model for scoring, and a Redis cache for lookup of historical fraud patterns. The system processes 10,000 transactions per second with an SLA of 100ms latency per transaction. Recently, after a model update, the latency for some transactions spiked to over 500ms, causing timeouts. The model uses a deep neural network with 10 million parameters. The engineering team suspects the issue is due to increased model inference time. Which action should be taken to reduce latency without significant loss in accuracy?

A.Add more Redis nodes to the cache cluster
B.Increase the number of Kafka partitions and consumer threads
C.Decrease the inference batch size from 32 to 1
D.Quantize the model weights from FP32 to FP16
AnswerD

FP16 quantization reduces model size and speeds up inference, typically with minimal accuracy impact.

Why this answer

The latency spike is caused by increased model inference time after a model update. Quantizing model weights from FP32 to FP16 reduces memory bandwidth and computation requirements, directly speeding up inference on compatible hardware (e.g., GPUs with Tensor Cores) with minimal accuracy loss. This addresses the root cause—model inference latency—without changing the system architecture.

Exam trap

The trap here is that candidates confuse system-level scaling (adding cache nodes or Kafka partitions) with model-level optimization, failing to recognize that the latency spike originates from the model inference step itself.

How to eliminate wrong answers

Option A is wrong because adding Redis nodes improves cache lookup throughput, but the latency spike is due to model inference time, not cache performance. Option B is wrong because increasing Kafka partitions and consumer threads improves message ingestion parallelism, but does not reduce the per-transaction inference latency of the TensorFlow Serving model. Option C is wrong because decreasing the inference batch size from 32 to 1 reduces throughput and increases per-transaction overhead (e.g., kernel launch latency), which would worsen latency, not improve it.
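
The storage and precision effect of moving from FP32 to FP16 can be shown with the standard library alone. This is a minimal sketch using Python's `struct` half-precision format; the weight values are made up, and it does not measure inference speed, which depends on hardware support.

```python
import struct

def to_fp16_and_back(value):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

weights = [0.1234567, -1.9876543, 0.0004321, 3.1415926]   # illustrative weights
for w in weights:
    w16 = to_fp16_and_back(w)
    print(f"{w:+.7f} -> {w16:+.7f}  (abs error {abs(w - w16):.2e})")

fp32_bytes = len(weights) * struct.calcsize("<f")
fp16_bytes = len(weights) * struct.calcsize("<e")
print(f"storage: {fp32_bytes} bytes as FP32, {fp16_bytes} bytes as FP16")
```

Each weight takes half the memory, and the rounding error is small relative to the weight itself, which is the basis for the claim of minimal accuracy impact.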

558
MCQhard

An AI model is deployed to a mobile app with limited computational resources. The model is a deep neural network with high latency. Which technique is best to reduce inference time?

A.Increase batch size
B.Add more layers
C.Use a larger model
D.Quantization
AnswerD

Quantization reduces model size and speeds up inference by using lower-precision arithmetic.

Why this answer

Quantization reduces the precision of the model's weights and activations (e.g., from 32-bit floating point to 8-bit integer), which decreases memory footprint and speeds up computation on resource-constrained devices like mobile phones. This directly lowers inference latency without requiring additional hardware or architectural changes.

Exam trap

CompTIA often tests the misconception that increasing batch size or model size improves performance on edge devices, when in fact these techniques increase resource demands and latency in low-resource environments.

How to eliminate wrong answers

Option A is wrong because increasing batch size improves throughput (samples per second) but does not reduce per-sample latency; it actually increases memory usage and can worsen latency on mobile devices with limited resources. Option B is wrong because adding more layers increases the model depth, which increases computational complexity and latency, making inference slower. Option C is wrong because using a larger model (more parameters) increases both memory and compute requirements, directly increasing inference time on constrained devices.
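
The 32-bit float to 8-bit integer conversion described above can be sketched with symmetric per-tensor quantization. This is one simple scheme among several, assumed here for illustration; the function names and weight values are hypothetical, and mobile runtimes ship their own quantization tooling.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the int8 range [-127, 127]."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [round(v / scale) for v in values], scale

def dequantize(q_values, scale):
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in q_values]

weights = [0.82, -0.41, 0.05, -1.27, 0.33]   # illustrative FP32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print("int8 codes:", q)
print("max abs error:", max(abs(w - r) for w, r in zip(weights, restored)))
```

Each weight is stored in one byte instead of four, and integer arithmetic is typically cheaper on mobile hardware, at the cost of a rounding error bounded by half the scale.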

559
MCQmedium

An AI model's performance drops significantly in production compared to testing. The data shows distribution shift. What is the best first step?

A.Add more features
B.Retrain model with new data
C.Use a different algorithm
D.Reduce model complexity
AnswerB

Retraining with current data addresses drift.

Why this answer

Option B (Retrain model with new data) is correct because retraining with more representative data adapts to distribution shift. Option A (Add more features) may not address the shift. Option C (Use a different algorithm) is a larger change that does not address the data.

Option D (Reduce model complexity) might worsen performance.

560
MCQhard

A company is training a large language model from scratch and wants to minimise its environmental impact. Which practice aligns with green AI principles?

A.Use model pruning and train on a smaller, representative dataset
B.Use more GPUs to parallelise training and reduce wall-clock time
C.Deploy the model on a cloud provider with renewable energy certificates
D.Train the model on a larger dataset to improve accuracy
AnswerA

Pruning reduces model size and computational cost; training on a smaller dataset also lowers energy consumption, aligning with green AI.

Why this answer

Green AI advocates for resource-efficient AI, including using smaller models, pruning, and efficient architectures to reduce carbon footprint. Training larger models with more data increases environmental impact, not reduces it.

561
Multi-Selectmedium

A data scientist is using differential privacy to protect individual privacy in a training dataset. Which TWO actions are correct implementations of differential privacy?

Select 2 answers
A.Train the model on a small subset of data to reduce exposure
B.Remove all personally identifiable information (PII) from the dataset
C.Aggregate data into groups before training
D.Set a privacy budget (epsilon) to limit information leakage
E.Add noise to the training data to mask individual contributions
AnswersD, E

The privacy budget epsilon quantifies the privacy guarantee and is a core concept of differential privacy.

Why this answer

Option D is correct because setting a privacy budget (epsilon) is a core mechanism in differential privacy that quantifies and limits the amount of information leaked about any individual in the dataset. By controlling epsilon, the data scientist can formally bound the privacy loss, ensuring that the model's outputs do not reveal whether any specific individual's data was included in training. Option E is correct because adding calibrated noise masks any single individual's contribution, which is how that bound is enforced in practice.

Exam trap

CompTIA often tests the misconception that removing PII or using data aggregation alone constitutes differential privacy, when in fact differential privacy requires a formal mathematical framework with noise addition and a privacy budget parameter.
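
The relationship between epsilon and noise can be illustrated with the Laplace mechanism applied to a counting query. This is a minimal sketch that adds noise to a query result instead of to training data; the dataset, function names, and epsilon values are assumptions made for the example.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) noise, drawn as the difference of two exponential samples."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng):
    """Counting query (sensitivity 1) released with Laplace noise of scale 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(7)
ages = [23, 35, 41, 52, 67, 29, 48, 71, 38, 55]   # hypothetical records
for epsilon in (0.1, 1.0, 10.0):
    noisy = private_count(ages, lambda a: a >= 40, epsilon, rng)
    print(f"epsilon={epsilon}: noisy count = {noisy:.2f} (true count = 6)")
```

A smaller epsilon means a larger noise scale and a stronger privacy guarantee, while a larger epsilon keeps the answer closer to the true count. Removing PII or aggregating records involves neither step.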

562
MCQmedium

A team is developing an AI agent to assist users with multi-step tasks such as booking a flight, reserving a hotel, and scheduling a car rental. The agent needs to reason about the order of steps and handle dependencies. Which pattern is BEST suited?

A.Simple tool use without reasoning
B.Using a single prompt with all instructions
C.Fine-tuning a model to output all steps at once
D.ReAct pattern (Reasoning and Acting)
AnswerD

ReAct interleaves reasoning and acting, allowing the agent to plan and adjust.

Why this answer

The ReAct pattern (Reasoning and Acting) is best suited because it interleaves reasoning traces with tool calls, allowing the agent to dynamically plan and adjust steps based on intermediate results. For multi-step tasks with dependencies (e.g., booking a flight before a hotel), ReAct enables the agent to reason about order, handle failures, and call external APIs step-by-step, which is essential for robust task completion.

Exam trap

CompTIA often tests the misconception that a single large prompt or fine-tuned output can handle all multi-step tasks, but the key exam trap is that candidates overlook the need for dynamic reasoning and tool interaction, which only the ReAct pattern provides.

How to eliminate wrong answers

Option A is wrong because simple tool use without reasoning lacks the ability to plan or handle dependencies; it can only execute isolated function calls without context. Option B is wrong because using a single prompt with all instructions cannot adapt to dynamic changes or intermediate results; it assumes a static plan that fails if any step requires conditional logic or error recovery. Option C is wrong because fine-tuning a model to output all steps at once (single-shot generation) cannot handle real-time feedback from external systems or adapt to variable execution order, making it brittle for interactive multi-step workflows.
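
The reason, act, observe loop can be sketched without a language model. In the toy version below, the `reason` function is a hand-written stand-in for the model's reasoning step, and the three tools are hypothetical stubs instead of real booking APIs; only the control flow is meant to reflect the pattern.

```python
# Hypothetical tools; a real agent would call external booking APIs here.
def book_flight(state):
    return {"flight": "confirmed"}

def reserve_hotel(state):
    if "flight" not in state:
        return {"error": "hotel needs confirmed flight dates"}
    return {"hotel": "confirmed"}

def schedule_car(state):
    if "hotel" not in state:
        return {"error": "car pickup needs a hotel address"}
    return {"car": "confirmed"}

TOOLS = {"book_flight": book_flight, "reserve_hotel": reserve_hotel, "schedule_car": schedule_car}

def reason(state):
    """Stand-in for the LLM: choose the next action from what has been observed so far."""
    if "flight" not in state:
        return "Travel dates come first, so book the flight.", "book_flight"
    if "hotel" not in state:
        return "Flight is set, so the hotel can be reserved.", "reserve_hotel"
    if "car" not in state:
        return "Hotel is set, so schedule the car.", "schedule_car"
    return "All steps are done.", None

def run_agent(max_turns=10):
    state, trace = {}, []
    for _ in range(max_turns):
        thought, action = reason(state)        # reason about the current state
        if action is None:
            break
        observation = TOOLS[action](state)     # act by calling a tool
        if "error" not in observation:
            state.update(observation)          # fold the observation into the state
        trace.append((thought, action, observation))
    return state, trace

final_state, trace = run_agent()
for thought, action, observation in trace:
    print(f"Thought: {thought}\nAction: {action}\nObservation: {observation}\n")
```

Because each action is chosen after the previous observation, the order of steps follows the dependencies instead of being fixed in a single up-front plan.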

563
MCQhard

Based on the exhibit, what is the most likely cause of the accuracy drop?

A.A required feature is missing from the production data pipeline.
B.Data drift in the 'income' feature has caused the model to become less accurate.
C.The model was overfitted to the training data.
D.The model's confidence threshold needs to be adjusted.
AnswerB

The detected distribution shift for 'income' indicates data drift, a common cause of performance degradation.

Why this answer

The exhibit shows a sudden and sustained drop in model accuracy coinciding with a shift in the distribution of the 'income' feature. This is a classic symptom of data drift, where the statistical properties of the input feature change over time, causing the model's learned patterns to no longer match the production data. Option B correctly identifies this as the most likely cause because the model was trained on a prior income distribution and is now encountering values outside that range.

Exam trap

CompTIA often tests the distinction between data drift and model overfitting by presenting a sudden accuracy drop after stable performance, leading candidates to incorrectly attribute it to overfitting when the exhibit clearly shows a distribution shift in a specific feature.

How to eliminate wrong answers

Option A is wrong because a missing feature in the production data pipeline would typically cause a pipeline failure or missing-value error, not an accuracy drop that coincides with a specific feature's distribution shift. Option C is wrong because overfitting would manifest as high training accuracy with poor generalization from the start, not a sudden accuracy drop after a period of stable performance; the exhibit shows a clear change point, not a consistently low accuracy. Option D is wrong because adjusting the confidence threshold changes the precision-recall trade-off but does not address the underlying cause of the model's predictions becoming less reliable due to shifted input distributions; it would not restore the original accuracy level.

564
MCQmedium

A healthcare startup is developing a deep learning model to detect diabetic retinopathy from retinal images. The model is trained on a dataset of 10,000 labeled images. During initial testing, the model achieves 99% accuracy on the training set but only 85% on the test set. The startup wants to deploy the model in a clinical setting where false negatives (missing a disease) are critical. The team has access to additional unlabeled retinal images from multiple sources. Which strategy should the team use to improve the model's generalization and reduce false negatives?

A.Use semi-supervised learning with the unlabeled images to improve feature representations
B.Apply aggressive data augmentation to the training set
C.Increase the learning rate during training
D.Add more convolutional layers to the model
AnswerA

Semi-supervised learning utilizes unlabeled data to learn generalizable features, reducing overfitting and improving test performance.

Why this answer

Semi-supervised learning leverages the large pool of unlabeled retinal images to learn robust feature representations, which helps the model generalize better to unseen data. By reducing overfitting (the gap between 99% training and 85% test accuracy), this approach directly improves test-set performance. Additionally, semi-supervised methods can be tuned to emphasize recall, thereby reducing false negatives critical in clinical diabetic retinopathy screening.

Exam trap

CompTIA often tests the misconception that simply increasing data or model complexity (augmentation, layers) always improves generalization, when in fact semi-supervised learning is the targeted solution for leveraging unlabeled data to close the train-test accuracy gap and address class-specific metrics like false negatives.

How to eliminate wrong answers

Option B is wrong because aggressive data augmentation, while helpful for generalization, does not directly address the high false-negative rate; it may even distort critical pathological features if applied too aggressively. Option C is wrong because increasing the learning rate typically destabilizes training, leading to divergence or poor convergence, and does not reduce false negatives or improve generalization. Option D is wrong because adding more convolutional layers increases model capacity, which would likely worsen overfitting given the already large gap between training and test accuracy, and does not specifically target false negatives.

565
MCQhard

A company uses the above policy to control AI model access. A data scientist tries to run inference with model "llama-3-70b" at 150 requests in 30 minutes. What will happen?

A.All requests are allowed because the model is in the allowed list
B.All requests are denied because the rate limit is per minute and 150 exceeds the limit
C.The first 100 requests are allowed; the remaining 50 are denied
D.All requests are denied because the second rule blocks all models
AnswerC

The rate limit allows 100 requests per 30-minute window; requests beyond that limit are denied.

Why this answer

Option C is correct because the policy allows up to 100 requests per 30 minutes for models in the allowed list, and 'llama-3-70b' is in that list. The rate limit is applied per 30-minute window, not per minute, so the first 100 requests are allowed, and the remaining 50 exceed the limit and are denied.

Exam trap

The trap here is that candidates often misinterpret the rate limit as a per-minute value (like 100 per minute) rather than the stated 100 per 30 minutes, leading them to incorrectly select Option B.

How to eliminate wrong answers

Option A is wrong because it ignores the rate limit; being in the allowed list does not bypass the cap of 100 requests per 30 minutes. Option B is wrong because it misreads the rate limit as a per-minute value; the policy specifies a 30-minute window, so requests are counted across the whole window and only those beyond the first 100 are denied. Option D is wrong because the second rule does not block all models; it only blocks models not in the allowed list, and 'llama-3-70b' is explicitly allowed.
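
The outcome can be simulated with a minimal fixed-window limiter. The sketch assumes a cap of 100 requests per 30-minute window, as the explanation describes, and evenly spaced requests; the exhibit's actual policy syntax is not reproduced here, and the function name is illustrative.

```python
def apply_rate_limit(request_times, limit, window_seconds):
    """Fixed-window limiter: allow at most `limit` requests in each window."""
    decisions, counts = [], {}
    for t in request_times:
        window = int(t // window_seconds)          # which window this request falls in
        counts[window] = counts.get(window, 0) + 1
        decisions.append(counts[window] <= limit)  # True = allowed, False = denied
    return decisions

# 150 requests spread evenly over 30 minutes (one every 12 seconds).
times = [i * 12 for i in range(150)]
decisions = apply_rate_limit(times, limit=100, window_seconds=30 * 60)

allowed = sum(decisions)
denied = len(decisions) - allowed
print(f"allowed={allowed}, denied={denied}")
```

All 150 requests land in the same window, so the first 100 pass and the remaining 50 are rejected, matching option C.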

566
Multi-Selecthard

A company is deploying a large language model via a REST API using a cloud AI service. They expect high traffic and need to minimize latency while controlling costs. Which THREE strategies should they implement?

Select 3 answers
A.Enable prompt caching
B.Use batching to send multiple requests in one API call
C.Auto-scale the number of API endpoints
D.Quantize the model to FP16
E.Implement rate limiting for API requests
AnswersA, B, E

Prompt caching allows the API to reuse cached results for common prompt prefixes, reducing latency and cost for repeated queries.

Why this answer

Option A is correct because prompt caching stores the intermediate key-value (KV) cache from previous inference runs so it can be reused for later prompts that begin with the same prefix. When a cached prefix is reused, the model skips recomputing the attention keys and values for the cached portion, significantly reducing time-to-first-token (TTFT) latency and lowering compute cost per request. This is especially effective for high-traffic scenarios where many requests share the same system prompt or leading context.

Exam trap

CompTIA often tests the distinction between infrastructure-level optimizations (like auto-scaling) and model-level optimizations (like quantization), expecting candidates to recognize that only API-layer strategies (caching, batching, rate limiting) directly control latency and cost at the REST endpoint.

567
Multi-Selecteasy

A data scientist is preparing a dataset for a classification model. The dataset contains several categorical variables with high cardinality. Which TWO encoding methods are appropriate for converting these categorical variables into numerical features?

Select 2 answers
A.Min-max scaling
B.K-means clustering
C.One-hot encoding
D.Principal component analysis (PCA)
E.Label encoding
AnswersC, E

One-hot encoding converts each category into a binary vector, suitable for categorical variables.

Why this answer

One-hot encoding is appropriate for high-cardinality categorical variables because it creates binary columns for each category, allowing the model to treat each category as an independent feature without imposing an ordinal relationship. This is crucial for classification models that assume numerical inputs, as it prevents the model from misinterpreting arbitrary integer labels as having meaningful order or magnitude.

Exam trap

CompTIA often tests the distinction between encoding methods and preprocessing techniques, trapping candidates who confuse label encoding (which is valid for ordinal data but problematic for nominal high-cardinality features) with one-hot encoding, or who mistake scaling or clustering for categorical encoding.
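
The difference between the two encodings is easiest to see side by side. The sketch below is a plain-Python illustration with a made-up `cities` column and hypothetical helper names; libraries such as scikit-learn and pandas provide equivalent encoders.

```python
def label_encode(values):
    """Map each distinct category to an integer, assigned in sorted order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """Map each value to a binary vector with one column per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == cat else 0 for cat in categories] for v in values], categories

cities = ["Lima", "Oslo", "Cairo", "Oslo", "Lima"]
labels, mapping = label_encode(cities)
vectors, columns = one_hot_encode(cities)

print("label encoding:", labels, mapping)
print("one-hot columns:", columns)
for city, vec in zip(cities, vectors):
    print(f"{city:>5} -> {vec}")
```

Label encoding yields one integer column but implies an order (Cairo < Lima < Oslo) that the data does not have, while one-hot encoding avoids the implied order at the cost of one column per category.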

568
Multi-Selecthard

During a security audit of an AI-powered code generation tool, the audit team discovers that the system prompt (which contains sensitive internal instructions) can be leaked through carefully crafted user inputs. Which THREE OWASP LLM Top 10 categories are MOST directly relevant to this finding?

Select 3 answers
A.Model denial of service
B.Prompt injection
C.Insecure output handling
D.Supply chain vulnerabilities
E.Sensitive information disclosure
AnswersB, C, E

Prompt injection (LLM01) is the direct attack technique that tricks the model into revealing the system prompt.

Why this answer

Prompt injection (direct or indirect) is the attack vector that causes the system prompt leak. Sensitive information disclosure is the consequence. Insecure output handling can also be relevant if the leak is due to improper output management.

Model denial of service, supply chain vulnerabilities, and training data poisoning are not directly related to prompt leaking.

569
Multi-Selectmedium

A data scientist is preparing a dataset for a binary classification model. The dataset has 1000 samples, with 800 positives and 200 negatives. To evaluate the model properly, which THREE steps should they take? (Select THREE)

Select 3 answers
A.Remove the minority class samples to make the dataset balanced
B.Use a stratified train-test split to preserve class proportions
C.Apply SMOTE (Synthetic Minority Over-sampling Technique) to balance the training set
D.Report only accuracy as the evaluation metric
E.Use precision, recall, and F1-score for evaluation
AnswersB, C, E

Stratified split ensures both training and test sets have similar class ratios.

Why this answer

Option B is correct because stratified train-test splitting ensures that the class distribution (80% positive, 20% negative) is preserved in both training and test sets. This prevents the model from being evaluated on a test set that has a different class ratio, which could give a misleading impression of performance, especially in imbalanced datasets.

Exam trap

CompTIA often tests the misconception that removing minority samples or relying solely on accuracy is acceptable for imbalanced datasets, when in fact these approaches degrade model performance and evaluation validity.
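
The stratified split described above can be sketched in plain Python as follows. This is an illustrative implementation under simple assumptions (integer labels, a fixed seed, a hypothetical `stratified_split` helper), not the behaviour of any specific library.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    indices_by_class = {}
    for i, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(i)

    train_idx, test_idx = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# The 800 positive / 200 negative dataset from the question.
labels = [1] * 800 + [0] * 200
train_idx, test_idx = stratified_split(labels)
test_positives = sum(labels[i] for i in test_idx)
```

Resampling techniques such as SMOTE would then be applied to the training indices only, so the test set keeps the original class ratio.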

570
MCQeasy

Which technique adds controlled noise to query results or training data to prevent an attacker from inferring whether a specific individual's data was included in the dataset?

A.Anonymisation
B.Federated learning
C.Differential privacy
D.Pseudonymisation
AnswerC

Differential privacy injects noise into computations or outputs to bound the risk of re-identification.

Why this answer

Differential privacy adds calibrated noise to ensure the output does not reveal individual participation. Anonymisation removes identifiers. Pseudonymisation replaces identifiers.

Federated learning decentralises data but does not necessarily add noise.
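
One common way to add calibrated noise is the Laplace mechanism. The sketch below assumes a counting query with sensitivity 1 and a hypothetical true count of 100; the function names are illustrative.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count with Laplace noise; a counting query has sensitivity 1."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)
# Repeated releases are shown only to illustrate that the noise is zero-mean.
noisy_counts = [private_count(100, epsilon=0.5, rng=rng) for _ in range(10000)]
mean_noisy = sum(noisy_counts) / len(noisy_counts)
```

A smaller `epsilon` means a larger noise scale and stronger privacy, at the cost of less accurate answers.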

571
MCQeasy

A company wants to build a system that automatically tags uploaded images with objects they contain (e.g., 'car', 'tree', 'person'). Which AI application type is this?

A.Image classification/object detection
B.Recommendation system
C.Anomaly detection
D.Document intelligence
AnswerA

Object detection identifies and localizes objects in images, matching the requirement.

Why this answer

Option A is correct because the task of identifying and labeling objects (e.g., 'car', 'tree', 'person') within an image is a classic use case for image classification combined with object detection. Image classification assigns a single label to the entire image, while object detection localizes and classifies multiple objects within the image, which is exactly what the system requires.

Exam trap

Cisco often tests the distinction between image classification (single label per image) and object detection (multiple localized objects), so candidates may mistakenly choose image classification alone when the question implies multiple objects per image.

How to eliminate wrong answers

Option B is wrong because recommendation systems analyze user behavior and preferences to suggest items (e.g., movies, products), not to identify objects in images. Option C is wrong because anomaly detection identifies unusual patterns or outliers in data (e.g., fraud detection), not the presence of common objects in images. Option D is wrong because document intelligence focuses on extracting text, structure, and information from documents (e.g., OCR, form processing), not on visual object recognition.

572
MCQeasy

Refer to the exhibit. What is the recall of the model?

A.0.72
B.0.8
C.0.7
D.0.73
AnswerB

TP = 72, FN = 18, so recall = 72 / (72 + 18) = 0.8.

Why this answer

Recall is calculated as True Positives divided by (True Positives + False Negatives). From the confusion matrix, True Positives = 72 and False Negatives = 18, so recall = 72 / (72 + 18) = 72 / 90 = 0.8. This measures the model's ability to correctly identify all actual positive cases.

Exam trap

CompTIA often tests recall by providing a confusion matrix and expects candidates to correctly identify the denominator as TP+FN, not total samples, to avoid confusing recall with accuracy or precision.

How to eliminate wrong answers

Option A (0.72) is wrong because it divides True Positives by the total number of predictions (72/100 = 0.72); that is neither recall nor accuracy, since accuracy would also count True Negatives in the numerator. Option C (0.7) is wrong because it likely results from misreading the matrix (e.g., using 72/102 or confusing with precision). Option D (0.73) is wrong because it may come from a miscalculation such as (72 + 1)/(72 + 18 + 10) = 73/100, which is not a standard metric.
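
The recall arithmetic can be checked with a few lines of Python. The counts are the ones quoted in the explanation (TP = 72, FN = 18); the function name is illustrative.

```python
def recall(true_positives, false_negatives):
    """Recall = TP / (TP + FN): the share of actual positives the model found."""
    return true_positives / (true_positives + false_negatives)


model_recall = recall(true_positives=72, false_negatives=18)
```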

573
MCQhard

A team is training a deep learning model for image classification. The training loss decreases steadily but the validation loss plateaus after 20 epochs and then starts to increase. Which action is MOST likely to improve generalization?

A.Add more convolutional layers
B.Increase the learning rate
C.Implement early stopping
D.Reduce the batch size
AnswerC

Early stopping monitors validation loss and stops training before overfitting occurs, directly addressing the plateau and rise.

Why this answer

Early stopping halts training when validation loss stops improving, preventing overfitting. Increasing learning rate would worsen divergence; adding more layers increases capacity and overfitting; reducing batch size may help optimization but not directly address overfitting.
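
A minimal sketch of patience-based early stopping is shown below. The validation losses are made-up numbers that mimic the plateau-then-rise pattern in the question, and `patience=3` is an arbitrary assumption.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, last_epoch_run) for a patience-based early stopping rule."""
    best_loss = float("inf")
    best_epoch = -1
    last_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss has stopped improving
    return best_epoch, last_epoch


# Hypothetical validation losses: improve, plateau, then rise.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.50, 0.52, 0.56, 0.61]
best_epoch, stopped_at = early_stopping_epoch(val_losses)
```

In practice the weights saved at `best_epoch` would be restored after training stops.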

574
MCQmedium

An e-commerce company uses an AI system to set dynamic prices for products. A customer complains that the price they see is higher than the price shown to a friend for the same product at the same time. The company wants to ensure pricing fairness. Which ethical principle should guide the redesign of the pricing algorithm?

A.Transparency and explainability
B.Privacy by design
C.Accountability
D.Beneficence
AnswerA

Transparency requires the company to disclose how prices are determined, helping to ensure fairness and build trust.

Why this answer

Transparency and explainability is the correct principle because the core issue is that the customer cannot understand why the AI system set a different price for them compared to their friend. Redesigning the algorithm to provide clear, understandable reasons for price variations—such as demand, purchase history, or time of day—directly addresses this lack of visibility. This principle ensures that the system's decision-making process is open to scrutiny, which is essential for building trust and resolving fairness complaints in dynamic pricing models.

Exam trap

CompTIA often tests the distinction between 'accountability' (who is responsible) and 'transparency' (how the decision is made), leading candidates to pick accountability when the question explicitly asks for the principle that guides the redesign to ensure fairness through understanding.

How to eliminate wrong answers

Option B (Privacy by design) is wrong because the complaint is about price disparity and lack of understanding, not about how customer data is collected, stored, or protected. Option C (Accountability) is wrong because while accountability is important for assigning responsibility, it does not directly solve the customer's need to understand why the price differs; it focuses on who is responsible rather than making the algorithm's logic visible. Option D (Beneficence) is wrong because beneficence refers to doing good or maximizing benefits, but the immediate ethical failure here is the lack of clarity and justification for the pricing decision, not the absence of overall positive outcomes.

575
MCQhard

A research team is fine-tuning a BERT model for a text classification task. They notice that the model's performance on the validation set fluctuates wildly across epochs, sometimes dropping significantly from one epoch to the next. Which technique is MOST likely to stabilise training?

A.Use a smaller batch size
B.Increase the learning rate
C.Apply gradient clipping
D.Increase the number of epochs
AnswerC

Gradient clipping limits the norm of gradients, preventing large destabilising updates.

Why this answer

Gradient clipping directly addresses the problem of exploding gradients, which can cause large, destabilizing weight updates during fine-tuning of large models like BERT. By capping the gradient norm (e.g., to a value like 1.0), it prevents a single batch from drastically altering the model's parameters, thus smoothing out validation performance fluctuations across epochs.

Exam trap

CompTIA often tests the misconception that increasing epochs or adjusting batch size alone can fix training instability, when the root cause is gradient explosion, which only gradient clipping directly mitigates.

How to eliminate wrong answers

Option A is wrong because using a smaller batch size typically increases gradient variance, which can actually worsen fluctuations in validation performance rather than stabilize them. Option B is wrong because increasing the learning rate amplifies the magnitude of weight updates, making the model more prone to overshooting minima and causing even more erratic validation loss spikes. Option D is wrong because increasing the number of epochs does not address the underlying instability; it merely extends training, which could allow the model to eventually converge but does not prevent the wild epoch-to-epoch drops caused by gradient instability.
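
Clipping by global norm can be sketched as follows. It operates on a flat list of gradient values for simplicity; deep learning frameworks provide their own utilities for this, and the function name here is illustrative.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale gradients so their L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads), norm
    scale = max_norm / norm
    return [g * scale for g in grads], norm


# A gradient with norm 5.0 is scaled down to norm 1.0; its direction is unchanged.
clipped, original_norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
clipped_norm = math.sqrt(sum(g * g for g in clipped))
```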

576
MCQmedium

An AI team is deploying a large language model for a customer-facing application. They need to ensure that the model's output is always in valid JSON format for downstream processing. Which prompt engineering technique should they use?

A.Enable JSON mode in the model's API parameters
B.Use few-shot examples of JSON outputs in the prompt
C.Post-process the output with a JSON validator and reject invalid responses
D.Add a system prompt that says 'You are a helpful assistant.'
AnswerA

JSON mode instructs the model to produce only valid JSON, ensuring downstream parsability.

Why this answer

Structured output via JSON mode (available in many LLM APIs) constrains the model to output only valid JSON, which is critical for programmatic consumption.

577
MCQmedium

A data scientist is training a resume screening model to rank job applicants. The training data includes historical hiring decisions from the past 10 years. The company wants to avoid unfair bias against underrepresented groups. Which type of bias is most likely present in the training data?

A.Algorithmic bias
B.Selection bias
C.Confirmation bias
D.Historical bias
AnswerD

Historical bias is present when the training data reflects existing societal inequalities, such as discriminatory hiring practices.

Why this answer

Historical bias occurs when the training data reflects past societal biases, such as underrepresentation of certain groups in hiring. The model learns these patterns, perpetuating unfairness. Selection bias arises from non-random sampling, confirmation bias from favoring information that confirms preexisting beliefs, and algorithmic bias from model design choices.

578
Multi-Selectmedium

Which TWO actions should be taken to ensure an AI model complies with GDPR requirements when processing personal data?

Select 2 answers
A.Limit data collection to only what is necessary for the model
B.Provide a full explanation of model predictions
C.Store all user data for a minimum of 10 years
D.Anonymize all personal data before use
E.Implement user data deletion upon request
AnswersA, E

Data minimization is a GDPR principle.

Why this answer

Option A is correct because GDPR's data minimization principle (Article 5(1)(c)) requires that personal data collected be adequate, relevant, and limited to what is necessary for the purpose for which it is processed. In AI model training, this means collecting only the features essential for the model's objective, reducing the risk of processing excessive or irrelevant personal data.

Exam trap

CompTIA often tests the misconception that anonymization is always required before any AI processing of personal data, but GDPR allows processing under lawful bases without anonymization, making Option D a tempting but incorrect choice.

579
MCQeasy

A developer wants to secure an AI API service. Which practice is MOST effective for preventing unauthorized access to the model?

A.Using a larger context window
B.Enforcing least-privilege API access with proper key management
C.Enabling response logging
D.Implementing rate limiting
AnswerB

Correct. Least-privilege and key management are foundational access controls.

Why this answer

Enforcing least-privilege API access with proper key management is the most effective practice because it ensures that each API key or token has only the minimum permissions necessary for its intended function, reducing the attack surface. Proper key management includes rotating keys, using scoped access tokens (e.g., OAuth 2.0 scopes), and storing keys securely (e.g., using a secrets manager like AWS Secrets Manager or HashiCorp Vault). This directly prevents unauthorized access by limiting what a compromised or misused key can do, unlike other options that address secondary concerns.

Exam trap

CompTIA often tests the distinction between preventive and detective controls, and the trap here is that candidates confuse rate limiting (a throttling mechanism) with access control, thinking it prevents unauthorized access when it only limits the frequency of requests.

How to eliminate wrong answers

Option A is wrong because using a larger context window increases the amount of input the model can process but does nothing to authenticate or authorize API requests; it is a model configuration parameter, not a security control. Option C is wrong because enabling response logging aids in auditing and detecting breaches after they occur, but it does not prevent unauthorized access in real time; it is a detective control, not a preventive one. Option D is wrong because implementing rate limiting mitigates denial-of-service attacks and abuse by throttling request volume, but it does not verify the identity or permissions of the requester; an attacker with a valid key could still access the model within rate limits.

580
MCQeasy

A startup wants to add an AI-powered virtual assistant to their mobile app. They have limited in-house AI expertise and need a solution that can be integrated quickly with minimal infrastructure management. Which deployment pattern is MOST suitable?

A.Implement an asynchronous processing queue for all user requests
B.Train and deploy a custom model on an on-premises server
C.Deploy the model on edge devices for offline inference
D.Use a cloud-based AI microservice (e.g., Amazon Lex, Azure Bot Service) with a pre-built model
AnswerD

Cloud AI microservices provide ready-to-use models, easy integration, and managed infrastructure, ideal for rapid development.

Why this answer

Using AI microservices from a cloud provider (e.g., AWS, Azure, GCP) allows quick integration, scalability, and minimal management. Training on-premises requires expertise and resources. Edge deployment is complex.

Async queues are for batch processing, not for a real-time assistant.

581
MCQmedium

A data science team uses Git for version control of model code and DVC for data versioning. They want to implement a model registry to track trained models, their hyperparameters, and performance metrics. Which tool is specifically designed for this purpose and integrates with the existing workflow?

A.Apache Airflow
B.Docker
C.MLflow Model Registry
D.Kubernetes
AnswerC

MLflow provides a model registry that stores model versions and metadata.

Why this answer

MLflow Model Registry is specifically designed for managing model versions, tracking metadata, and integrating with Git and DVC. Apache Airflow is for workflow orchestration, not model registry. Kubernetes is for container orchestration.

Docker is for containerization.

582
MCQeasy

An organisation is developing an AI policy. According to the NIST AI RMF, which function involves establishing policies and procedures to ensure the organisation governs AI responsibly?

A.Manage
B.Measure
C.Govern
D.Map
AnswerC

Govern involves setting policies, roles, and responsibilities for AI governance.

Why this answer

The NIST AI RMF's Govern function focuses on establishing governance structures, policies, and accountability mechanisms. Map, Measure, and Manage are other functions in the framework.

583
MCQeasy

Which embedding type is MOST suitable for capturing semantic meaning of text in a RAG pipeline?

A.Bag-of-words vectors
B.Dense embeddings from a pre-trained transformer model
C.TF-IDF vectors
D.One-hot encoding
AnswerB

Dense embeddings capture contextualized semantic meaning, enabling effective similarity search.

Why this answer

Dense embeddings represent semantic meaning in a continuous vector space, ideal for similarity search in RAG.
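
Similarity search over dense embeddings usually comes down to comparing vectors, for example with cosine similarity. The sketch below uses tiny made-up 3-dimensional vectors; real embeddings come from a model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings for a query and two document chunks.
query = [0.9, 0.1, 0.2]
chunks = {"doc_a": [0.8, 0.2, 0.1], "doc_b": [0.1, 0.9, 0.3]}

ranked = sorted(
    ((name, cosine_similarity(query, vector)) for name, vector in chunks.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
```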

584
MCQmedium

A company has a TensorFlow model trained on-premises and wants to deploy it on AWS SageMaker for scalable inference. What is the BEST way to package the model for deployment?

A.Convert the model to ONNX and upload to SageMaker
B.Upload the .h5 file to S3 and create a SageMaker endpoint directly
C.Package the model in a Docker container with a TensorFlow serving script and push to Amazon ECR
D.Use SageMaker Studio to train the model again from scratch
AnswerC

This creates an inference container that SageMaker can deploy; it includes the model and serving logic.

Why this answer

SageMaker expects models in a container format; the inference container should include the model artifacts and the serving code, allowing SageMaker to host it on scalable endpoints.

585
MCQeasy

An AI security team is conducting a threat model for a new document summarization service. They want to identify threats related to spoofing of the AI's identity. Which STRIDE category should they consider?

A.Repudiation
B.Tampering
C.Information disclosure
D.Spoofing
AnswerD

Spoofing involves impersonation, such as an attacker pretending to be the AI service.

Why this answer

Spoofing in STRIDE refers to impersonating something or someone else. In the context of AI, an attacker could spoof the AI service to provide false summaries.

586
Multi-Selecthard

A team is implementing a RAG system for a legal document Q&A. They need to chunk documents effectively. Which THREE chunking strategies should they consider to improve retrieval accuracy for legal texts that contain hierarchical sections (clauses, sub-clauses, definitions)?

Select 3 answers
A.Hierarchical chunking that indexes chunks at clause and sub-clause levels with parent relationships
B.Overlapping chunks with a 10% overlap between consecutive chunks
C.Fixed-size chunking with a 512-token window and no overlap
D.Chunking based on the document's table of contents and section hierarchy
E.Semantic chunking that splits at natural boundaries (e.g., section headings, paragraph breaks)
AnswersA, D, E

Allows retrieval of granular chunks while maintaining broader context.

Why this answer

Semantic chunking splits at natural boundaries (e.g., paragraphs, sections), preserving meaning. Hierarchical chunking indexes with parent-child relationships for context. Fixed-size chunking is simple but may break sentences or clauses.

Overlapping chunks can help but are not a primary strategy for accuracy; a sliding window is one specific overlapping technique.
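
As a rough sketch of hierarchy-aware chunking, the snippet below splits a document at numbered clause headings and records each chunk's parent clause. The numbering pattern, the `chunk_by_clause` helper, and the sample text are assumptions for illustration. A continuation line that happens to start with a number would be misread as a heading, so a real pipeline would need a stricter parser.

```python
import re

CLAUSE_PATTERN = re.compile(r"^(\d+(?:\.\d+)*)\s+(.*)")


def chunk_by_clause(text):
    """Split text at numbered clause headings and link each chunk to its parent."""
    chunks = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = CLAUSE_PATTERN.match(line)
        if match:
            number = match.group(1)
            parent = number.rsplit(".", 1)[0] if "." in number else None
            current = {"id": number, "parent": parent, "text": match.group(2)}
            chunks.append(current)
        elif current is not None and line:
            current["text"] += " " + line  # continuation of the current clause
    return chunks


document = """1 Definitions
1.1 Agreement means this contract.
1.2 Party means a signatory.
2 Payment
2.1 Fees are due within 30 days
of the invoice date."""

chunks = chunk_by_clause(document)
```

Storing the `parent` identifier lets a retriever return a sub-clause together with the clause that gives it context.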

587
MCQhard

A healthcare startup is developing a deep learning model to detect diabetic retinopathy from retinal fundus images. The dataset contains 50,000 images, but only 5% are labeled as positive for the disease. The team uses a convolutional neural network (CNN) with a final sigmoid layer and binary cross-entropy loss. After training for 20 epochs, the model achieves 95% accuracy on the test set, but the recall for the positive class is only 10%. The team suspects the model is biased toward the negative class due to class imbalance. The data is stored in a secure environment, and no additional labeled data can be obtained. The team has access to the following techniques: oversampling the minority class, undersampling the majority class, using class weights in the loss function, applying data augmentation, and using a different architecture. Which course of action is most likely to improve recall for the positive class while maintaining reasonable overall performance?

A.Undersample the majority class to balance the dataset
B.Oversample the minority class using synthetic image generation
C.Assign higher class weights to the positive class in the loss function
D.Replace the CNN with a transformer-based architecture
AnswerC

Class weights force the model to focus on the minority class, improving recall.

Why this answer

Assigning higher class weights to the positive class in the loss function directly penalizes misclassifications of the minority class during training. This forces the model to pay more attention to positive samples without altering the dataset distribution, which is critical when no additional labeled data can be obtained and the data is in a secure environment. It improves recall by increasing the gradient contribution from positive samples, while maintaining overall performance because the model still sees the original data distribution.

Exam trap

The trap here is that candidates often choose oversampling (Option B) as the default solution for class imbalance, but fail to recognize that synthetic image generation for medical images can introduce unrealistic patterns and is not a standard or safe technique, whereas class weights are a lightweight, data-preserving approach that directly addresses the loss function.

How to eliminate wrong answers

Option A is wrong because undersampling the majority class discards a large number of negative samples, which can lead to loss of valuable information and degrade overall accuracy, especially with a 95% negative class. Option B is wrong because oversampling the minority class using synthetic image generation (e.g., SMOTE) is not directly applicable to high-dimensional image data without careful adaptation, and it may introduce unrealistic artifacts that harm generalization; the question specifies 'synthetic image generation' which is not a standard or safe approach for retinal fundus images. Option D is wrong because replacing the CNN with a transformer-based architecture does not address the class imbalance problem; transformers are not inherently better at handling imbalanced data and would require more data and computational resources, which are not available here.
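
The effect of a positive-class weight on binary cross-entropy can be sketched in plain Python. The weight of 19 is an assumption derived from the 95:5 class ratio in the question, and the predictions are made-up values for a model that under-predicts the positive class.

```python
import math


def weighted_bce(y_true, p_pred, pos_weight=1.0):
    """Mean binary cross-entropy with an extra weight on positive examples."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# One positive and three negatives, all scored 0.1 by a hypothetical model.
y_true = [1, 0, 0, 0]
p_pred = [0.1, 0.1, 0.1, 0.1]

unweighted_loss = weighted_bce(y_true, p_pred)
weighted_loss = weighted_bce(y_true, p_pred, pos_weight=19.0)
```

With the weight applied, the missed positive dominates the loss, so gradient updates push harder toward detecting positives.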

588
MCQhard

During inference, a model served via a REST API occasionally returns high latency due to cold starts. The team uses a containerized service on Kubernetes with horizontal pod autoscaling. Which solution minimizes cold start impact while controlling cost?

A.Configure the autoscaler based on request count with a shorter cooldown period
B.Increase CPU and memory requests for the inference container
C.Switch to vertical pod autoscaling
D.Use a sidecar container that pre-warms the model and set a minimum replica count
AnswerD

Pre-warming ensures the model is loaded; minimum replicas keep pods ready, reducing cold starts.

Why this answer

A sidecar warm-up agent and a minimum replica count keep pods ready. Increasing resources may not fix cold starts; autoscaling based on request count may lag; vertical scaling helps but not directly.

589
MCQeasy

A company wants to deploy a machine learning model that requires continuous learning as new data arrives. The model must be able to adapt to changing patterns without retraining from scratch. Which approach should be used?

A.Transfer learning
B.Online learning
C.Batch learning
D.Unsupervised learning
AnswerB

Online learning updates the model incrementally, allowing adaptation to new data without full retraining.

Why this answer

Online learning (also called incremental learning) updates the model incrementally as each new data point arrives, without requiring full retraining. This makes it ideal for scenarios where data arrives continuously and patterns shift over time, as the model can adapt its parameters on the fly.

Exam trap

CompTIA often tests the distinction between training paradigms (online vs. batch) and other ML concepts like transfer learning or unsupervised learning, so candidates may confuse 'continuous learning' with 'transfer learning' or incorrectly assume that any learning method can handle streaming data.

How to eliminate wrong answers

Option A is wrong because transfer learning reuses a pre-trained model on a new but related task, but it does not inherently support continuous adaptation to streaming data—it typically requires a separate fine-tuning phase. Option C is wrong because batch learning trains the model on the entire dataset at once and requires retraining from scratch when new data arrives, making it unsuitable for continuous learning. Option D is wrong because unsupervised learning is a paradigm for finding patterns in unlabeled data, not a deployment strategy for handling streaming data or model updates.
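
Incremental updating can be illustrated with a one-feature linear model trained by stochastic gradient descent, one sample at a time. The data stream, learning rate, and target relationship (y = 2x + 1) are assumptions chosen for the example.

```python
def online_update(weight, bias, x, y, learning_rate=0.05):
    """One SGD step on squared error for a single (x, y) sample."""
    error = (weight * x + bias) - y
    return weight - learning_rate * error * x, bias - learning_rate * error


weight, bias = 0.0, 0.0
# Hypothetical stream: samples arrive one at a time and are never stored.
stream = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0]] * 300
for x, y in stream:
    weight, bias = online_update(weight, bias, x, y)
```

Each sample adjusts the parameters slightly, so the model can follow shifting patterns without a full retraining pass.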

590
Multi-Selecthard

A data scientist is evaluating a binary classification model for fraud detection. The dataset is highly imbalanced (99% non-fraud, 1% fraud). Which TWO metrics are most appropriate for assessing model performance? (Choose two.)

Select 2 answers
A.Precision
B.Recall
C.F1 score
D.Area under the ROC curve (AUC-ROC)
E.Accuracy
AnswersA, B

Precision measures the proportion of predicted fraud that is actually fraud, important to avoid false positives.

Why this answer

Precision is appropriate because it measures the proportion of predicted fraud cases that are actually fraudulent, which is critical when false positives (flagging legitimate transactions as fraud) are costly. In a highly imbalanced dataset like this (99% non-fraud), precision directly evaluates the model's ability to avoid overwhelming fraud analysts with false alarms.

Exam trap

CompTIA often tests the misconception that AUC-ROC is always the best metric for imbalanced datasets, but the trap here is that AUC-ROC can be misleadingly high even when the model performs poorly on the minority class, whereas precision and recall directly address the class imbalance.
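
A small worked example shows why accuracy hides poor minority-class performance. The confusion-matrix counts below are hypothetical: 1,000 transactions, 10 of them fraud, with a model that catches half the fraud and raises five false alarms.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = classification_metrics(tp=5, fp=5, fn=5, tn=985)
```

Accuracy comes out at 0.99 even though the model misses half the fraud cases, which precision and recall (both 0.5) make visible.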

591
MCQhard

A research lab trains a language model using DP-SGD. What primary privacy risk does this technique mitigate?

A.Data poisoning attacks
B.Membership inference attacks
C.Adversarial patch attacks
D.Model inversion attacks
AnswerB

DP-SGD explicitly bounds the contribution of each datapoint, making membership inference harder.

Why this answer

DP-SGD (Differentially Private Stochastic Gradient Descent) mitigates membership inference attacks by adding calibrated noise to gradients during training, which bounds the influence any single training example can have on the final model. This differential privacy guarantee makes it difficult for an adversary to determine whether a specific data point was included in the training set, directly addressing the core risk of membership inference.

Exam trap

CompTIA often tests the distinction between privacy risks (membership inference, model inversion) and security risks (poisoning, adversarial examples), and the trap here is that candidates confuse 'privacy risk' with 'security risk' and pick data poisoning or adversarial attacks instead of recognizing that DP-SGD is specifically designed for differential privacy against membership inference.

How to eliminate wrong answers

Option A is wrong because data poisoning attacks involve injecting malicious data to corrupt model behavior, which DP-SGD does not specifically prevent—it only limits per-example influence but does not detect or filter poisoned inputs. Option C is wrong because adversarial patch attacks target image classifiers by placing physical patches on objects to cause misclassification, which is a computer vision robustness issue unrelated to the privacy guarantees of DP-SGD. Option D is wrong because model inversion attacks aim to reconstruct training data features or attributes from the model, and while DP-SGD provides some defense, its primary and most direct mitigation is against membership inference, not full inversion which requires stronger assumptions and additional techniques.
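
The core of a DP-SGD step (clip each example's gradient, sum, add Gaussian noise, average) can be sketched in plain Python. The gradients, clipping norm, and noise multiplier are illustrative assumptions. A real implementation would also track the privacy budget, which this sketch omits.

```python
import math
import random


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(v * v for v in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, v in enumerate(grad):
            summed[i] += v * scale  # each example contributes at most clip_norm
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


rng = random.Random(0)
grads = [[0.3, 0.4], [30.0, 40.0]]  # the second example would dominate without clipping

# noise_multiplier=0.0 isolates the clipping step (no privacy guarantee in this setting).
clipped_only = dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
private_grad = dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single example can move the model, and the noise masks whatever influence remains.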

592
MCQeasy

Which OWASP LLM Top 10 vulnerability involves an attacker manipulating the LLM through crafted inputs that override the system's intended instructions?

A.Sensitive information disclosure
B.Prompt injection
C.Supply chain vulnerabilities
D.Model denial of service
AnswerB

Correct. Prompt injection is the top OWASP LLM vulnerability.

Why this answer

Prompt injection (Option B) is the correct answer because it directly describes an attack where crafted inputs override the system's intended instructions, causing the LLM to execute unauthorized actions or reveal restricted information. This vulnerability exploits the LLM's inability to distinguish between user-supplied content and system-level directives, effectively hijacking the model's behavior.

Exam trap

CompTIA often tests candidates' ability to distinguish between the attack vector (prompt injection) and its potential outcomes (e.g., sensitive information disclosure), leading them to incorrectly select the consequence rather than the root vulnerability.

How to eliminate wrong answers

Option A is wrong because sensitive information disclosure is a consequence of other vulnerabilities (e.g., prompt injection or insecure output handling), not the mechanism of overriding instructions. Option C is wrong because supply chain vulnerabilities involve compromised third-party components (e.g., pre-trained models, libraries) rather than direct input manipulation. Option D is wrong because model denial of service focuses on exhausting computational resources (e.g., via excessive token generation or resource-intensive queries), not on subverting instruction adherence.

593
Multi-Selectmedium

A data engineering team is designing a data pipeline to process streaming sensor data and feed it into an ML model for anomaly detection. Which THREE components are essential for this pipeline?

Select 3 answers
A.Apache Airflow for scheduling recurring batch jobs
B.Amazon S3 as a data lake for storing raw sensor data
C.Snowflake as a real-time streaming destination
D.Apache Kafka for ingesting streaming sensor data
E.Apache Spark Structured Streaming for real-time processing
AnswersB, D, E

S3 is a scalable object store that can serve as a data lake for raw sensor data, accessible for both streaming and batch processing.

Why this answer

Amazon S3 is essential as a data lake for storing raw sensor data because it provides durable, scalable, and cost-effective object storage that can serve as a central repository for streaming data before and after processing. In a streaming pipeline, raw data must be persisted for reprocessing, historical analysis, and compliance, and S3's integration with Apache Spark and Kafka makes it a natural landing zone for sensor data.

Exam trap

CompTIA often tests the distinction between batch and streaming technologies, and the trap here is that candidates confuse Airflow's scheduling capability with real-time streaming orchestration, or assume Snowflake can act as a streaming sink when it is fundamentally a batch-oriented warehouse.

594
MCQmedium

You are an AI governance officer at a bank that uses a machine learning model to predict credit risk. The model was developed by an external vendor and uses a proprietary algorithm. The bank's compliance team has determined that the model must be explainable to meet regulatory requirements. However, the vendor claims the model is a 'black box' and cannot provide explanations. You need to ensure compliance while maintaining the model's performance. What is the best course of action?

A.Ignore the requirement as the model is proprietary
B.Ask the vendor to develop a custom explanation module
C.Replace the model with a simpler, interpretable model
D.Use a model-agnostic explanation technique like SHAP
AnswerD

SHAP provides explanations for any model, satisfying regulatory needs.

Why this answer

D is correct because model-agnostic explanation techniques like SHAP (SHapley Additive exPlanations) can provide post-hoc interpretability for any black-box model without requiring access to its internal structure or proprietary algorithm. This allows the bank to meet regulatory explainability requirements while preserving the vendor's proprietary model and its predictive performance.

Exam trap

The trap here is that candidates may assume that a 'black box' model cannot be explained at all, leading them to choose replacement with a simpler model (Option C), when in fact model-agnostic techniques like SHAP or LIME can provide explanations without altering the model itself.

How to eliminate wrong answers

Option A is wrong because ignoring regulatory requirements is not a viable option for a financial institution; it would lead to non-compliance and potential penalties. Option B is wrong because the vendor has already stated that the model is a 'black box' for which it cannot provide explanations, so a custom explanation module would depend on the vendor changing or exposing its proprietary algorithm, which is impractical and outside the bank's control. Option C is wrong because replacing the model with a simpler, interpretable model would sacrifice the predictive performance that the current model provides, which may be critical for accurate credit risk assessment.
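
To make the idea of a model-agnostic explanation concrete, the sketch below computes exact Shapley values by brute force for a small black-box function. It is not the SHAP library's API: `black_box_score`, the feature names, and the baseline values are all hypothetical, chosen only for illustration. The point is that the explainer only calls the model and never inspects its internals.

```python
from itertools import combinations
from math import factorial


def black_box_score(income, debt, late_payments):
    # Hypothetical stand-in for the vendor's model; the explainer only calls it.
    return 0.5 * income - 0.3 * debt - 2.0 * late_payments + 0.01 * income * debt


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        args = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(**args)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Standard Shapley weight for a coalition of this size.
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi


# Illustrative applicant and reference point (assumed values).
applicant = {"income": 6.0, "debt": 2.0, "late_payments": 1.0}
baseline = {"income": 4.0, "debt": 1.0, "late_payments": 0.0}
phi = shapley_values(black_box_score, applicant, baseline)
```

The attributions in `phi` sum to the difference between the applicant's score and the baseline score, which is the property that makes this kind of explanation usable for per-decision reporting. Brute force is only feasible for a handful of features; practical tools approximate the same quantity.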

595
MCQeasy

A dataset used for training a classification model contains 10% missing values in a feature that is known to be important. The data scientist decides to impute the missing values. Which imputation method is most robust if the data is not missing completely at random?

A.Delete all rows with missing values
B.Use multiple imputation to model missing values
C.Replace missing values with the mean of the feature
D.Fill missing values with 0
AnswerB

Multiple imputation provides unbiased estimates under the missing-at-random (MAR) assumption.

Why this answer

Multiple imputation is the most robust of the listed methods when data is not missing completely at random (that is, not MCAR) because it uses a statistical model to account for the relationships between the missing feature and other observed features, generating multiple plausible values and combining them to produce valid standard errors. Its estimates are unbiased when missingness depends on observed data (missing at random, MAR). This approach preserves the variability and structure of the data, unlike simpler methods that introduce bias whenever missingness is related to other variables.

Exam trap

CompTIA often tests the misconception that mean imputation is a safe default for missing data, but the trap here is that mean imputation assumes data is missing completely at random (MCAR), which is rarely true in real-world datasets, and it fails to account for the underlying missing data mechanism.

How to eliminate wrong answers

Option A is wrong because deleting rows with missing values reduces sample size and can introduce selection bias, especially when data is not missing completely at random, leading to loss of statistical power and potentially skewed model performance. Option C is wrong because replacing missing values with the mean of the feature ignores the correlation with other features and reduces variance, which can distort the distribution and bias the model when missingness is not random. Option D is wrong because filling missing values with 0 is arbitrary and assumes the missing value is zero, which is rarely valid for continuous features and can severely distort the feature's distribution and model coefficients.
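
The difference between mean imputation and a model-based approach can be seen on synthetic data. The sketch below is a simplified illustration, not a full multiple-imputation procedure: it fits one regression on the observed rows, draws several imputed datasets by adding random residual noise, and pools the resulting means. A complete implementation would also draw the regression parameters themselves for each imputation. The data, the missingness rule, and all variable names are assumptions.

```python
import random
import statistics

random.seed(0)

# Synthetic data (assumption): y depends on x, and y is missing whenever x is large,
# so missingness depends on an observed variable (MAR, not MCAR).
xs = [random.uniform(0, 10) for _ in range(500)]
ys = [2.0 * x + random.gauss(0, 1) for x in xs]
missing = [x > 7 for x in xs]

observed = [(x, y) for x, y, m in zip(xs, ys, missing) if not m]
obs_x = [x for x, _ in observed]
obs_y = [y for _, y in observed]

# Mean imputation: every missing y gets the mean of the observed y values.
mean_y = statistics.fmean(obs_y)
mean_imputed = [mean_y if m else y for y, m in zip(ys, missing)]

# Least-squares fit of y on x using observed rows only.
mx = statistics.fmean(obs_x)
slope = sum((x - mx) * (y - mean_y) for x, y in observed) / sum((x - mx) ** 2 for x in obs_x)
intercept = mean_y - slope * mx
resid_sd = statistics.stdev([y - (intercept + slope * x) for x, y in observed])


def impute_once():
    # One plausible completed dataset: prediction plus random residual noise.
    return [
        intercept + slope * x + random.gauss(0, resid_sd) if m else y
        for x, y, m in zip(xs, ys, missing)
    ]


# Pool the estimate of mean(y) across several imputed datasets.
pooled_mean = statistics.fmean(statistics.fmean(impute_once()) for _ in range(5))
true_mean = statistics.fmean(ys)
mean_imputed_estimate = statistics.fmean(mean_imputed)
```

Because the missing rows are exactly those with large `x`, `mean_imputed_estimate` falls well below `true_mean`, while `pooled_mean` stays close to it by using the relationship between `x` and `y`.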

596
Multi-Selecteasy

Which TWO of the following are common activation functions used in neural networks? (Choose two.)

Select 2 answers
A.Gradient descent
B.LSTM
C.Dropout
D.ReLU
E.Sigmoid
AnswersD, E

ReLU is a widely used activation function.

Why this answer

ReLU (Rectified Linear Unit) is a widely used activation function that outputs the input directly if it is positive, and zero otherwise, introducing non-linearity while mitigating the vanishing gradient problem. Sigmoid is another common activation function that maps any real-valued input to a value between 0 and 1, making it useful for binary classification output layers. Both are fundamental building blocks in neural network architectures.

Exam trap

CompTIA often tests the distinction between activation functions and other neural network components like optimizers (gradient descent), architectures (LSTM), or regularization techniques (dropout), expecting candidates to recognize that only ReLU and Sigmoid directly compute a neuron's output from its input.
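
As a quick illustration of the two correct options, the sketch below implements ReLU and sigmoid for a single scalar input using only the standard library. The function names are illustrative; deep learning frameworks provide their own vectorized versions.

```python
import math


def relu(x: float) -> float:
    # Passes positive inputs through unchanged and clips negatives to zero.
    return max(0.0, x)


def sigmoid(x: float) -> float:
    # Maps any real input into (0, 1); branching avoids overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)
```

For example, `relu(-3.0)` is `0.0`, `relu(2.5)` is `2.5`, and `sigmoid(0.0)` is `0.5`.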

597
Multi-Selecthard

An AI operations team is monitoring a deployed image classification model. They notice a gradual increase in prediction confidence but a drop in accuracy. Which THREE actions should they take to diagnose the issue?

Select 3 answers
A.Analyze the model's calibration curve to see if confidence scores align with actual accuracy.
B.Increase the size of the training dataset by collecting more unlabeled data.
C.Compare the distribution of input features between training and recent production data.
D.Evaluate model performance on a held-out test set collected at deployment time.
E.Retrain the model immediately with the most recent data.
AnswersA, C, D

Calibration analysis reveals whether the model is overconfident due to drift.

Why this answer

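Option A is correct because a calibration curve (reliability diagram) directly compares predicted confidence scores against actual accuracy. In this scenario, increasing confidence with dropping accuracy indicates miscalibration: the model is becoming overconfident. Analyzing the calibration curve reveals whether the confidence scores systematically deviate from true probabilities, which is the core diagnostic step for this specific symptom. Options C and D complete the diagnosis: comparing training and production feature distributions shows whether the inputs have drifted, and re-evaluating on the held-out test set from deployment time shows whether the model still performs as it did on the data it was validated against.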

Exam trap

CompTIA often tests the distinction between diagnostic actions and corrective actions—candidates mistakenly jump to retraining (Option E) or data collection (Option B) instead of first analyzing calibration and data distribution (Options A, C, D) to identify the specific type of drift or miscalibration.
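
The calibration check described for Option A can be sketched as a binned comparison of confidence and accuracy, summarized as an expected calibration error (ECE). This is a minimal illustration under assumed inputs: the confidence scores and correctness flags below are made up, and the function names are not taken from any particular monitoring tool.

```python
def calibration_bins(confidences, correct, n_bins=5):
    """Return (mean confidence, accuracy, count) for each non-empty confidence bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    rows = []
    for items in bins:
        if items:
            mean_conf = sum(c for c, _ in items) / len(items)
            accuracy = sum(1 for _, ok in items if ok) / len(items)
            rows.append((mean_conf, accuracy, len(items)))
    return rows


def expected_calibration_error(confidences, correct, n_bins=5):
    # Weighted average gap between confidence and accuracy across bins.
    rows = calibration_bins(confidences, correct, n_bins)
    total = sum(count for _, _, count in rows)
    return sum(count / total * abs(conf - acc) for conf, acc, count in rows)


# Hypothetical recent predictions: high confidence, but only half are right.
confidences = [0.95, 0.92, 0.97, 0.91, 0.65, 0.62]
correct = [True, False, True, False, True, False]
ece = expected_calibration_error(confidences, correct)
```

A well-calibrated model has bins where mean confidence and accuracy roughly match, giving an ECE near zero. Here the top bin has a mean confidence near 0.94 but only 50% accuracy, which is the overconfidence pattern the question describes.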

598
MCQmedium

An organization wants to implement an AI system to automatically categorize support tickets into predefined categories. They have a labeled dataset of 10,000 tickets. Which approach is MOST appropriate?

A.Use a rule-based system with keyword matching
B.Use a prompt-based LLM with few-shot examples
C.Fine-tune a pre-trained text classification model
D.Train a custom neural network from scratch
AnswerC

Fine-tuning leverages existing knowledge and works well with 10k labeled examples.

Why this answer

Fine-tuning a pre-trained text classification model is a standard and effective approach for supervised classification when labeled data is available.

599
MCQeasy

In the AI project lifecycle, which phase involves partitioning the dataset into training, validation, and test sets?

A.Data acquisition
B.Model selection
C.Data preparation
D.Problem definition
AnswerC

Data preparation encompasses cleaning, normalisation, and splitting into train/validation/test sets.

Why this answer

Data preparation includes splitting the data to evaluate model performance and prevent leakage.
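
A minimal sketch of the split, using only the standard library, is shown below. The 70/15/15 ratios, the seed, and the function name `split_dataset` are assumptions for illustration; real projects often use a library helper and may stratify by label.

```python
import random


def split_dataset(rows, train=0.7, val=0.15, seed=42):
    """Shuffle once, then cut into disjoint train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]  # whatever remains becomes the test set
    return train_set, val_set, test_set


train_set, val_set, test_set = split_dataset(range(100))
```

Because the three lists are slices of one shuffled sequence, no row appears in more than one set, which is what keeps test performance an honest estimate.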

600
MCQeasy

A company wants to use AI to automatically categorize customer support tickets into topics like 'billing', 'technical', 'account'. They have 10,000 labeled examples. Which algorithm is most suitable for this task?

A.DBSCAN
B.Apriori
C.Principal component analysis (PCA)
D.Logistic regression
AnswerD

Logistic regression is a supervised learning algorithm for classification, suitable for multi-class problems with moderate data.

Why this answer

Logistic regression is a supervised learning algorithm that models the probability of a categorical outcome based on input features. With 10,000 labeled examples, it can efficiently learn decision boundaries to classify tickets into 'billing', 'technical', or 'account' by using a softmax (multinomial logistic regression) extension for multi-class classification.

Exam trap

CompTIA often tests the distinction between supervised and unsupervised learning, so the trap here is that candidates may confuse clustering (DBSCAN) or dimensionality reduction (PCA) with classification, overlooking that labeled data requires a supervised algorithm like logistic regression.

How to eliminate wrong answers

Option A is wrong because DBSCAN is an unsupervised clustering algorithm that groups data based on density, not classification; it cannot use labeled examples to predict predefined categories. Option B is wrong because Apriori is an association rule mining algorithm used for market basket analysis to find frequent itemsets, not for classifying text into topics. Option C is wrong because PCA is an unsupervised dimensionality reduction technique that transforms features to capture variance, but it does not perform classification or use labels to assign categories.
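
The softmax step mentioned in the explanation can be sketched as follows. The class scores in `logits` are hypothetical numbers standing in for what a trained multinomial logistic regression would compute from a ticket's features; only the final step of turning scores into class probabilities is shown.

```python
import math


def softmax(scores):
    # Subtracting the maximum keeps the exponentials numerically stable.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["billing", "technical", "account"]
logits = [2.0, 0.5, -1.0]  # hypothetical per-class scores for one ticket
probs = softmax(logits)
predicted = labels[probs.index(max(probs))]
```

The probabilities sum to 1, and the ticket is assigned to the class with the highest probability, here `billing`.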
