You have a TensorFlow training script that runs on a single machine. To speed up training on one Vertex AI machine with 8 GPUs, which tf.distribute strategy should you use?
Trap 1: tf.distribute.ParameterServerStrategy
ParameterServerStrategy targets clusters with separate worker and parameter-server tasks, typically trained asynchronously. A single 8-GPU machine does not need that setup.
Trap 2: tf.distribute.TPUStrategy
TPUStrategy is for TPU hardware, not GPUs.
Trap 3: tf.distribute.MultiWorkerMirroredStrategy
MultiWorkerMirroredStrategy is for synchronous training across multiple machines. With only one machine, MirroredStrategy is the simpler fit.
- A
tf.distribute.ParameterServerStrategy
Why wrong: ParameterServerStrategy targets clusters with separate worker and parameter-server tasks, typically trained asynchronously. A single 8-GPU machine does not need that setup.
- B
tf.distribute.MirroredStrategy
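Why correct: MirroredStrategy performs synchronous training across multiple GPUs on a single machine, keeping a replica of the model on each GPU and combining gradients with all-reduce.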
- C
tf.distribute.TPUStrategy
Why wrong: TPUStrategy is for TPU hardware, not GPUs.
- D
tf.distribute.MultiWorkerMirroredStrategy
Why wrong: MultiWorkerMirroredStrategy is for synchronous training across multiple machines. With only one machine, MirroredStrategy is the simpler fit.
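
The reasoning behind the options can be sketched in code. The following is a minimal illustration, not a definitive implementation: `pick_strategy_name` and `build_and_compile_model` are hypothetical helpers invented for this example, and the toy model, layer sizes, and per-replica batch size are assumptions. It leaves out ParameterServerStrategy, which would be a separate choice for asynchronous multi-machine training.

```python
def pick_strategy_name(num_machines: int, accelerator: str) -> str:
    """Hypothetical helper: map a hardware layout to a tf.distribute strategy name."""
    if accelerator == "tpu":
        return "TPUStrategy"  # TPU hardware only
    if num_machines > 1:
        return "MultiWorkerMirroredStrategy"  # synchronous, several machines
    return "MirroredStrategy"  # synchronous, one machine with one or more GPUs


def build_and_compile_model(per_replica_batch_size: int = 64):
    """Hypothetical helper: build a toy Keras model under MirroredStrategy."""
    # Imported lazily so pick_strategy_name works without TensorFlow installed.
    import tensorflow as tf

    # With no arguments, MirroredStrategy uses all GPUs visible to the process.
    strategy = tf.distribute.MirroredStrategy()

    # Each replica processes its own slice of the global batch.
    global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

    # Variables must be created inside the scope so they are mirrored per GPU.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(20,)),  # assumed toy input width
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    return model, global_batch_size


if __name__ == "__main__":
    # The scenario in the question: one machine, 8 GPUs.
    print(pick_strategy_name(num_machines=1, accelerator="gpu"))
```

On the 8-GPU machine from the question, `strategy.num_replicas_in_sync` would be expected to report 8, so the global batch size is usually scaled accordingly before calling `model.fit`.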