You are tasked with building a robust ML pipeline that must be idempotent and handle data skew between training and serving. Which three practices should you implement?
Unique output paths prevent collisions between runs and support idempotency.
Why this answer
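Idempotent components produce the same outputs when re-run with the same inputs, so retries and re-executions are safe. Passing data between components as GCS URIs, rather than as in-memory objects, is a best practice. Skew detection should compare the distribution of the training data with the distribution of the serving data.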
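To make the skew comparison concrete, here is a minimal sketch for a single categorical feature. It assumes the L-infinity distance (the largest difference in any category's share) as the distance measure. The function names and the 0.1 threshold are hypothetical, chosen only for illustration; a production pipeline would more likely rely on a validation library than on hand-written code.

```python
from collections import Counter


def category_frequencies(values):
    """Return each category's share of the total count."""
    if not values:
        raise ValueError("cannot compute frequencies of an empty sample")
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(training_values, serving_values):
    """Largest absolute difference in category share between the two samples."""
    train_freq = category_frequencies(training_values)
    serve_freq = category_frequencies(serving_values)
    # A category missing from one sample counts as a share of 0.0 there.
    categories = set(train_freq) | set(serve_freq)
    return max(
        abs(train_freq.get(c, 0.0) - serve_freq.get(c, 0.0)) for c in categories
    )


def has_skew(training_values, serving_values, threshold=0.1):
    """Flag skew when the distance exceeds the (hypothetical) threshold."""
    return l_infinity_distance(training_values, serving_values) > threshold
```

For example, a feature split evenly between two categories in training but 80/20 in serving would be flagged, while two samples with identical proportions would not.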
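Writing outputs to paths keyed by a unique run ID supports idempotency, because separate runs never overwrite each other's outputs. Avoiding in-memory data passing between components is especially important for large datasets.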
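The two ideas above can be sketched together: a component that receives and returns URIs instead of data, and writes to a path derived from the run ID. This is an illustration under stated assumptions, not a real storage client. The `store` dictionary stands in for object storage, and `output_uri`, `run_component`, the bucket name, and the path layout are all hypothetical.

```python
def output_uri(base_uri, run_id, component, filename):
    """Build a deterministic output URI scoped to one run and one component."""
    parts = [base_uri.rstrip("/"), run_id, component, filename]
    return "/".join(parts)


def run_component(store, input_uri, base_uri, run_id):
    """Hypothetical component: reads by URI, writes by URI, returns only the URI.

    `store` is a plain dict used here as a stand-in for object storage.
    """
    out_uri = output_uri(base_uri, run_id, "uppercase", "data.txt")
    if out_uri in store:
        # A retry of the same run finds its earlier output and does nothing more.
        return out_uri
    store[out_uri] = store[input_uri].upper()
    return out_uri


store = {"gs://example-bucket/raw/data.txt": "abc"}
first = run_component(
    store, "gs://example-bucket/raw/data.txt", "gs://example-bucket/pipeline", "run-001"
)
retry = run_component(
    store, "gs://example-bucket/raw/data.txt", "gs://example-bucket/pipeline", "run-001"
)
```

Because the path depends only on the run ID and component name, a retry resolves to the same URI as the first attempt, while a different run ID yields a separate location.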