The Google Professional Machine Learning Engineer (PMLE) exam tests your ability to design and operate production machine learning systems — not just train models. It covers the full ML lifecycle: framing the problem, preparing data, training at scale, deploying models, monitoring for drift, and keeping the pipeline reliable. If you have only ever trained models in notebooks, this exam will expose the gaps.
Before building a model, frame the problem correctly: What is the prediction target? What labels do you have? What are the business metrics, and how do they relate to ML metrics (accuracy, AUC, RMSE)? Is this classification, regression, ranking, or generation? Can the problem be solved with rules or simple statistics instead of ML? Data strategy: structured data lives in Cloud SQL, BigQuery, or Spanner; unstructured data (images, audio, video, text) lives in Cloud Storage. Do feature engineering in BigQuery ML, and serve features from Vertex AI Feature Store, which keeps them consistent between training and serving and so avoids training-serving skew. Validate data with TensorFlow Data Validation (TFDV), the library behind TFX's ExampleValidator component, to detect schema drift and anomalies.
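That validation flow is easiest to see in code. A minimal sketch using the standalone TFDV library; the CSV paths are hypothetical placeholders:

```python
import tensorflow_data_validation as tfdv

# Compute summary statistics over the training set and infer a schema from them.
train_stats = tfdv.generate_statistics_from_csv(data_location="train.csv")
schema = tfdv.infer_schema(statistics=train_stats)

# Validate a fresh batch against that schema: missing columns, type changes,
# and unexpected values surface as anomalies.
new_stats = tfdv.generate_statistics_from_csv(data_location="new_batch.csv")
anomalies = tfdv.validate_statistics(statistics=new_stats, schema=schema)
tfdv.display_anomalies(anomalies)  # renders an anomaly table in a notebook
```

In a TFX pipeline the same check runs automatically as the StatisticsGen, SchemaGen, and ExampleValidator components.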
Vertex AI is GCP's unified ML platform. Custom training: run your training code in a managed container on CPUs, GPUs, or TPUs. Training pipelines in Vertex AI Pipelines (Kubeflow Pipelines SDK or TFX) orchestrate multi-step workflows with automatic caching and artefact tracking. Hyperparameter tuning: Vertex AI Vizier (Bayesian optimisation) explores the hyperparameter space more efficiently than grid or random search. Vertex AI Experiments tracks runs, parameters, and metrics for comparison. Model Registry: versioned model artefacts with aliases (production, staging, challenger), separating model management from deployment. Distributed training: data parallelism (the same model on multiple workers, each seeing a different shard of every batch) versus model parallelism (model layers split across devices, for models too large for one device). In TensorFlow, MirroredStrategy covers a single node with multiple GPUs, while MultiWorkerMirroredStrategy spans multiple nodes; see the sketch below.
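The two strategies share the same code structure and differ only in scope. A minimal sketch of data-parallel training, assuming TensorFlow 2.x and a toy model and dataset:

```python
import numpy as np
import tensorflow as tf

# Single node, all visible GPUs. For multiple nodes, swap in
# tf.distribute.MultiWorkerMirroredStrategy() and set TF_CONFIG on each worker.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Variables created inside the scope are mirrored on every replica.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Each replica trains on a different shard of every global batch (data
# parallelism); gradients are all-reduced across replicas before each update.
x = np.random.rand(256, 10).astype("float32")  # placeholder training data
y = np.random.rand(256, 1).astype("float32")
model.fit(x, y, epochs=1, batch_size=32)
```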
Vertex AI Endpoints: deploy model versions with traffic splits for A/B testing and canary rollouts. Dedicated endpoints (always-on) versus Serverless prediction (autoscaling to zero). Online prediction: low-latency, single-record requests. Batch prediction: high-throughput, asynchronous, for scoring large datasets. Model optimisation for serving: quantisation (FP32 to INT8 reduces model size and improves latency with some accuracy trade-off), distillation (train a smaller student model to mimic a larger teacher), TensorRT or ONNX Runtime for GPU inference optimisation. Feature latency: pre-compute slow features offline, serve fast features online from Memorystore. Explainability: Vertex Explainable AI provides feature attributions using SHAP (Shapley values) or Integrated Gradients. Required for regulated industries and for debugging unexpected model behaviour.
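Once the endpoint exists, a canary rollout is a single SDK call. A minimal sketch, assuming the google-cloud-aiplatform client library; the project, region, machine type, and resource IDs are placeholders:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Existing endpoint serving the current production model.
endpoint = aiplatform.Endpoint(endpoint_name="1234567890")
# Newly registered challenger version from the Model Registry.
challenger = aiplatform.Model(model_name="0987654321")

# Route 10% of traffic to the challenger; the remaining 90% stays on the
# currently deployed model. Widen the split as the canary proves itself.
challenger.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-4",
    traffic_percentage=10,
)
```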
MLOps maturity levels: Level 0 (manual, notebook-driven), Level 1 (automated training pipeline, triggered by a schedule or by data drift), Level 2 (full CI/CD for ML: code changes trigger the pipeline, with evaluation gates before promotion). Model monitoring: Vertex AI Model Monitoring detects training-serving skew (a gap between the training data distribution and the live prediction input distribution) and prediction drift (a change in the model's output distribution over time); the sketch below shows the underlying comparison. Alerts can trigger retraining pipelines. Data freshness: stale feature data degrades model performance before accuracy metrics detect it. Governance: Dataplex for data cataloguing and lineage, BigQuery authorized views to expose only approved subsets of training data, Vertex AI Model Cards for model documentation. Privacy-preserving ML: differential privacy (add calibrated noise during training) and federated learning (train on device and aggregate model updates centrally, so no raw data leaves the device).
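Managed Model Monitoring does this comparison for you, but the mechanics can be sketched with TFDV; the file paths, feature name, and threshold here are illustrative assumptions:

```python
import tensorflow_data_validation as tfdv

# Statistics over training data and over logged serving requests.
train_stats = tfdv.generate_statistics_from_csv(data_location="train.csv")
serving_stats = tfdv.generate_statistics_from_csv(data_location="serving_logs.csv")
schema = tfdv.infer_schema(statistics=train_stats)

# Flag 'country' if the L-infinity distance between its training and serving
# distributions exceeds the threshold.
tfdv.get_feature(schema, "country").skew_comparator.infinity_norm.threshold = 0.01

anomalies = tfdv.validate_statistics(
    statistics=train_stats,
    schema=schema,
    serving_statistics=serving_stats,
)
tfdv.display_anomalies(anomalies)
```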
Misconception: High model accuracy guarantees the model solves the business problem correctly.
Reality: High accuracy does not mean the model is correct for the business problem — check that ML metrics align with business outcomes.

Misconception: Training-serving skew and model drift are the same phenomenon.
Reality: Training-serving skew is not the same as model drift — skew is a pipeline bug (different transformations at training and serving time); drift is a real-world data change.

Misconception: Quantisation always preserves full model accuracy.
Reality: Quantisation reduces model size and latency but can reduce accuracy — always benchmark before deploying a quantised model, as in the sketch below.
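To make that last point concrete, a minimal sketch of post-training quantisation followed by a quick accuracy check, assuming TensorFlow 2.x; the model and data are toy placeholders standing in for your trained model and held-out set:

```python
import numpy as np
import tensorflow as tf

# Toy FP32 model standing in for a trained production model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Post-training dynamic-range quantisation: weights are stored as INT8.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()
print(f"quantised model size: {len(tflite_bytes)} bytes")

# Benchmark the quantised model on held-out data before promoting it.
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x_test = np.random.rand(32, 4).astype(np.float32)  # placeholder features
y_test = np.random.randint(0, 3, size=32)          # placeholder labels
preds = []
for x in x_test:
    interpreter.set_tensor(inp["index"], x[np.newaxis])
    interpreter.invoke()
    preds.append(int(interpreter.get_tensor(out["index"])[0].argmax()))
print("quantised accuracy:", float((np.array(preds) == y_test).mean()))
# Compare this against the FP32 model's accuracy on the same set; only promote
# the quantised model if the accuracy drop is within your tolerance.
```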