Question 315 of 500
AI Concepts and Foundations · Hard · Multiple Select · Objective-mapped

Quick Answer

The answer is model quantization, weight pruning, and knowledge distillation. All three reduce the computation needed at inference time on edge devices while preserving most of the model's accuracy. Quantization lowers the precision of weights, typically from 32-bit floats to 8-bit integers, which cuts memory traffic and speeds up matrix operations on hardware with integer support. Weight pruning removes redundant connections, shrinking the model and reducing the number of operations per inference. Knowledge distillation trains a smaller "student" network to mimic a larger "teacher" model, compressing what the teacher has learned into a lightweight architecture. On the CompTIA AI+ AI0-001 exam, this question tests your understanding of deployment trade-offs: a common trap is confusing hyperparameter tuning (which affects training) with inference optimization. Remember the mnemonic "Q-P-K" for Quantization, Pruning, Knowledge distillation: three levers to pull when your model needs to run fast on a tiny chip.

AI0-001 AI Concepts and Foundations Practice Question

This AI0-001 practice question tests your understanding of AI Concepts and Foundations. Read the scenario carefully and evaluate each option against the stated constraints before committing to an answer. Then compare your reasoning against the explanation and wrong-answer breakdown below to see why each distractor is designed to mislead on exam day.

A team is deploying a deep learning model that uses a convolutional neural network (CNN) for image recognition. The model achieves high accuracy but is very slow to infer on edge devices. Which THREE optimization techniques should the team consider to speed up inference without significant accuracy loss? (Select three.)

Correct answer & explanation

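  • Use weight pruning to remove unnecessary connections in the network.
  • Implement knowledge distillation by training a smaller model to mimic the larger one.
  • Apply model quantization to reduce weight precision.

Weight pruning removes redundant or less important connections (weights) from the neural network, reducing the number of computations required during inference. Knowledge distillation trains a smaller model to mimic the larger one, and quantization stores weights at lower precision. Each technique speeds up inference on edge devices while typically causing only a minor drop in accuracy when applied carefully, which is why all three are standard optimizations for deploying CNNs on resource-constrained hardware.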
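
To make the weight pruning idea concrete, here is a minimal sketch in plain Python. It assumes magnitude is used as the measure of importance, which is one common criterion rather than the only one. The function name `magnitude_prune`, the `sparsity` parameter, and the sample weights are hypothetical and chosen only for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    weights: flat list of floats; sparsity: fraction in [0, 1] to remove.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value.
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical weights for a single layer (illustration only).
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.4, -0.03]
pruned = magnitude_prune(weights, sparsity=0.5)
nonzero = sum(1 for w in pruned if w != 0.0)
print(pruned)
print(f"non-zero weights: {nonzero} of {len(weights)}")
```

Zeroing weights does not make inference faster by itself; the saving appears only when the runtime or hardware skips the zeroed entries, and a pruned model is usually fine-tuned afterwards to recover accuracy.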

Key principle: Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Answer analysis

Option-by-option breakdown

For each option: why learners choose it and why it is or isn't the right answer here.

  • Use larger convolutional filters (e.g., 7x7 instead of 3x3) to capture more context.

    Why it's wrong here

    Larger filters increase computation and slow down inference.

  • Use weight pruning to remove unnecessary connections in the network.

    Why this is correct

    Pruning reduces computation and memory footprint.

    Related concept

    Read the scenario before looking for a memorised answer.

  • Implement knowledge distillation by training a smaller model to mimic the larger one.

    Why this is correct

    Knowledge distillation creates a compact model that retains much of the original accuracy.

  • Increase the number of convolutional layers to improve feature extraction.

    Why it's wrong here

    More layers increase computational cost and latency.

  • Apply model quantization to reduce weight precision.

    Why this is correct

    Quantization reduces model size and speeds up inference, often with minimal accuracy loss.

Common exam traps

Common exam trap: answer the scenario, not the keyword

Questions like this target the misconception that adding model capacity (larger filters or more layers) always improves performance. That view ignores the trade-off in inference speed and leads candidates to select options that actually worsen latency on edge devices.
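
The cost of larger filters can be estimated with simple arithmetic. The sketch below counts multiply-accumulate operations (MACs) for one standard convolution layer, assuming stride 1 and padding that keeps the output size the same for both kernel sizes. The layer shape (56x56 output, 64 input and 64 output channels) is a hypothetical example, not taken from the question.

```python
def conv_macs(out_h, out_w, c_in, c_out, kernel):
    """Multiply-accumulate count for one standard 2D convolution layer."""
    return out_h * out_w * c_in * c_out * kernel * kernel


# Hypothetical layer shape chosen only for illustration.
shape = dict(out_h=56, out_w=56, c_in=64, c_out=64)
macs_3x3 = conv_macs(kernel=3, **shape)
macs_7x7 = conv_macs(kernel=7, **shape)
ratio = macs_7x7 / macs_3x3

print(f"3x3: {macs_3x3:,} MACs")
print(f"7x7: {macs_7x7:,} MACs")
print(f"7x7 costs {ratio:.2f}x the 3x3 layer")
```

Because the cost grows with the square of the kernel size, the ratio is 49/9 (about 5.4x) regardless of the layer shape chosen, which is why swapping 3x3 filters for 7x7 filters moves latency in the wrong direction.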

Detailed technical explanation

How to think about this question

Weight pruning often uses magnitude-based pruning, where weights with absolute values below a threshold are set to zero. The resulting sparsity reduces computation when specialized hardware or software libraries (e.g., sparse matrix multiplication kernels) exploit it. Knowledge distillation trains a compact student network to replicate the softmax outputs of a larger teacher network, transferring knowledge while reducing model size. Quantization reduces the precision of weights and activations (e.g., from 32-bit floating point to 8-bit integer), which lowers memory bandwidth requirements and allows faster integer arithmetic on edge devices such as ARM CPUs or NPUs.
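
The distillation step, in which the student replicates the teacher's softmax outputs, can be sketched as a loss function. This is a minimal illustration in plain Python, assuming a temperature-softened softmax and a KL-divergence loss, which is a common formulation but not the only one. The logits, the temperature of 2.0, and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) computed on temperature-softened outputs."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for one image over three classes.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 4.0, 1.0]    # prefers a different class

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
print(f"student close to teacher: {loss_close:.4f}")
print(f"student far from teacher: {loss_far:.4f}")
```

Training the student to minimize this loss pulls its output distribution toward the teacher's. In practice this term is often combined with the ordinary loss on the true labels.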

Key Concepts to Remember

  • Read the scenario before looking for a memorised answer.
  • Find the constraint that changes the correct option.
  • Eliminate answers that are true in general but not in this case.

Exam Day Tips

  • Watch for words such as best, first, most likely and least administrative effort.
  • Review why wrong options are wrong, not only why the correct option is correct.

Key takeaway

Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Real-world example

How this comes up in practice

A team shipping an image-recognition CNN to edge devices faces this scenario directly: the model is accurate but too slow on the target hardware. The correct answer here is not the most general option; it is the best answer for the specific constraint described, which is faster inference without significant accuracy loss. Real exam questions reward reading the full scenario before eliminating options, because the constraint defines which answer fits.

What to study next

Got this wrong? Here's your next step.

Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.

FAQ

Questions learners often ask

What does this AI0-001 question test?

This question tests AI Concepts and Foundations: specifically, recognising which optimization techniques speed up inference on edge devices without significant accuracy loss.

What is the correct answer to this question?

The correct answers are weight pruning, knowledge distillation, and model quantization. Weight pruning removes redundant or less important connections (weights) from the neural network, reducing the number of computations required during inference. Knowledge distillation trains a smaller model to mimic the larger one, and quantization reduces weight precision. Each speeds up inference on edge devices while typically causing only a minor drop in accuracy when applied carefully.

What should I do if I get this AI0-001 question wrong?

Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.

What is the key concept behind this question?

Read the scenario before looking for a memorised answer.

About these practice questions

Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.

Same concept, more angles

1 more way this is tested on AI0-001

These questions test the same concept from different angles. Work through them to make sure you can recognise it however the exam phrases it.

Variation 1. A self-driving car company is developing an object detection system using a convolutional neural network (CNN). The system needs to detect pedestrians and vehicles in real-time with high accuracy. Which technique can reduce inference time while maintaining accuracy?

Difficulty: hard
  • A. Apply model pruning and quantization
  • B. Use a pre-trained model and fine-tune it
  • C. Add more convolutional layers
  • D. Increase number of filters in each layer

Why A: Model pruning removes redundant or less important weights from the CNN, reducing computational load, while quantization converts floating-point weights to lower-precision integers (e.g., INT8). Together, they shrink model size and speed up inference without significantly degrading accuracy, making them ideal for real-time object detection in resource-constrained environments like autonomous vehicles.
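
The conversion of floating-point weights to INT8 can be sketched as follows. This is a minimal illustration in plain Python, assuming symmetric per-tensor quantization with a single scale factor; real toolchains may use other schemes such as per-channel scales or zero points. The function names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to the range [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # float value of one integer step
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate float values."""
    return [v * scale for v in q]


# Hypothetical float32 weights (illustration only).
weights = [0.42, -1.27, 0.003, 0.9, -0.55]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)
print(f"scale: {scale:.4f}, largest rounding error: {max_error:.4f}")
```

Each weight now needs 1 byte instead of 4, and the rounding error per weight is bounded by half a quantization step, which is why accuracy often drops only slightly.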

Last reviewed: Jun 30, 2026

This AI0-001 practice question is part of Courseiva's free CompTIA certification practice question bank. Courseiva provides original exam-style practice questions with explanations, topic-based practice, mock exams, readiness tracking, and study analytics to help learners prepare for the AI0-001 exam.