A team is deploying a deep learning model that uses a convolutional neural network (CNN) for image recognition. The model achieves high accuracy but is very slow at inference on edge devices. Which THREE optimization techniques should the team consider to speed up inference without significant accuracy loss? (Select three.)
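- A. Use larger convolutional filters (e.g., 7x7 instead of 3x3) to capture more context.
Why wrong: Larger filters increase computation and slow down inference. A 7x7 kernel performs 49 multiply-accumulates per input channel at each output position, versus 9 for a 3x3 kernel, so the same layer does roughly 5.4 times the work.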
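- B. Use weight pruning to remove unnecessary connections in the network.
Why correct: Pruning removes low-importance weights (commonly those with the smallest magnitude), reducing computation and memory footprint; the network is usually fine-tuned afterward to recover accuracy. Unstructured sparsity speeds up inference only where the runtime has sparse kernels, so structured pruning of whole filters or channels is often preferred on edge hardware.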
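- C. Implement knowledge distillation by training a smaller model to mimic the larger one.
Why correct: Knowledge distillation trains a compact student model on the original (teacher) model's softened output probabilities as well as the true labels, so the student retains much of the teacher's accuracy at a fraction of the inference cost.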
- D. Increase the number of convolutional layers to improve feature extraction.
Why wrong: Each added layer contributes computation that must run in sequence, increasing both computational cost and latency.
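- E. Apply model quantization to reduce weight precision.
Why correct: Quantization stores weights (and often activations) at lower precision, for example 8-bit integers instead of 32-bit floats. This cuts model size by about 4x and lets edge hardware use faster integer arithmetic, often with minimal accuracy loss.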
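To make option A's cost argument concrete, the sketch below counts multiply-accumulates (MACs) for a standard, non-grouped convolution layer, ignoring bias terms. The helper name `conv2d_macs` and the layer shape (56x56 output, 64 input and 64 output channels) are illustrative assumptions, not part of the question; for any fixed shape the 7x7/3x3 cost ratio is 49/9, about 5.44.

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulate count for a standard (non-grouped) k x k convolution."""
    # Each output element needs k * k * c_in MACs; there are h_out * w_out * c_out outputs.
    return h_out * w_out * c_out * k * k * c_in


# Hypothetical layer shape chosen for illustration only.
macs_3x3 = conv2d_macs(56, 56, 64, 64, k=3)
macs_7x7 = conv2d_macs(56, 56, 64, 64, k=7)
ratio = macs_7x7 / macs_3x3
print(f"3x3: {macs_3x3:,} MACs")  # 3x3: 115,605,504 MACs
print(f"7x7: {macs_7x7:,} MACs")  # 7x7: 629,407,744 MACs
print(f"7x7 costs {ratio:.2f}x the 3x3 layer")  # 7x7 costs 5.44x the 3x3 layer
```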
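Option B can be illustrated with a minimal sketch of unstructured magnitude pruning, written in plain Python over a flat list of weights for readability. The function name `magnitude_prune` and the sample weights are hypothetical; real frameworks (PyTorch, for instance, ships a `torch.nn.utils.prune` module) apply such masks per tensor, and the pruned network is normally fine-tuned afterward.

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction `sparsity` of weights.

    Returns the pruned weights and a 0/1 mask (1 = kept). This is
    unstructured pruning: individual weights are removed, not whole filters.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    mask = [0 if i in pruned_idx else 1 for i in range(len(weights))]
    pruned = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned, mask


# Hypothetical weights for one small layer, flattened.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, -0.2, 0.4]
pruned, mask = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0, 0.0, 0.4]
print(f"kept {sum(mask)} of {len(mask)} weights")  # kept 4 of 8 weights
```

The zeros only save time if the inference runtime can skip them; removing entire channels instead (structured pruning) shrinks the dense tensors themselves.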
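For option C, one common distillation objective combines a temperature-softened KL-divergence term between teacher and student outputs with ordinary cross-entropy on the true label. The plain-Python sketch below assumes that form; the function names, temperature of 4, and weighting `alpha=0.5` are illustrative choices, and in practice the loss would be computed on framework tensors inside the student's training loop with the teacher frozen.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature; higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft = kl * temperature ** 2  # T^2 keeps soft-target gradients comparable across temperatures
    hard = -math.log(softmax(student_logits)[label])  # ordinary cross-entropy at T = 1
    return alpha * soft + (1 - alpha) * hard


# Hypothetical logits for a 3-class problem; the true class is 0.
teacher = [3.0, 1.0, -1.0]
student = [1.0, 0.5, -0.5]
print(distillation_loss(student, teacher, label=0))
```

The soft term is zero only when the student reproduces the teacher's softened distribution, which is how the student learns the teacher's relative confidence across wrong classes rather than just the top label.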
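Option E can be sketched with symmetric per-tensor INT8 quantization of a weight list: one float scale maps each weight to an integer in [-127, 127], and each stored weight then needs 1 byte instead of 4. The function names and sample weights are hypothetical; production toolchains also quantize activations and often use per-channel scales, which this sketch does not cover.

```python
def quantize_int8_symmetric(weights):
    """Map floats to integers in [-127, 127] with one shared scale (per-tensor)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # round() returns an int; the clamp guards against floating-point overshoot.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [x * scale for x in q]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, -0.2, 0.4]
q, scale = quantize_int8_symmetric(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)
print(f"scale={scale:.6f}, max reconstruction error={max_err:.6f}")
```

Because every value is rounded to the nearest multiple of `scale`, the per-weight reconstruction error is at most half a quantization step, which is why accuracy loss is often small.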