A platform team is designing a monitoring strategy for a multi-tenant Kubernetes cluster. Each tenant runs workloads in separate namespaces. The team needs to ensure tenant isolation while providing aggregated cluster-wide dashboards. Which approach best meets these requirements?
Per-tenant Prometheus instances ensure isolation, and Thanos sidecars enable secure global aggregation with proper RBAC.
Why this answer
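Option D is correct because a dedicated Prometheus instance per tenant enforces strong isolation. Each tenant's metrics live in a separate TSDB, so tenants cannot read each other's data through a shared instance, and one tenant's high-cardinality metrics or expensive queries cannot exhaust resources the other tenants depend on. For the aggregated view, a Thanos sidecar runs next to each Prometheus and exposes its data over the Thanos StoreAPI; a central Thanos Querier fans out each query to all sidecars and merges the results, so platform-team dashboards can span every tenant. Access to that global Querier is then restricted to the platform team through RBAC. The design meets both requirements without funnelling all tenants through a single Prometheus.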
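To make the fan-out idea concrete, here is a toy Python model of the architecture. It is a sketch under stated assumptions, not Thanos or Prometheus code: `TenantPrometheus` and `GlobalQuerier` are hypothetical names, each tenant store holds a single value per metric instead of real time series, and the `tenant` label mimics the unique external labels that Thanos relies on to tell Prometheus instances apart. Each tenant store only ever reads its own data, while the querier is the one place that sees all tenants.

```python
from dataclasses import dataclass, field


@dataclass
class TenantPrometheus:
    """Hypothetical stand-in for one tenant's Prometheus plus its Thanos sidecar."""

    tenant: str
    # Simplification: metric name -> latest value, not a real labelled TSDB.
    samples: dict[str, float] = field(default_factory=dict)

    def external_labels(self) -> dict[str, str]:
        # A unique external label per instance keeps merged results distinguishable.
        return {"tenant": self.tenant}

    def query(self, metric: str) -> list[tuple[dict[str, str], float]]:
        # Only this tenant's own data is ever read here.
        if metric in self.samples:
            return [(self.external_labels(), self.samples[metric])]
        return []


class GlobalQuerier:
    """Hypothetical fan-out layer playing the role of a central querier."""

    def __init__(self, stores: list[TenantPrometheus]) -> None:
        self.stores = stores

    def query(self, metric: str) -> list[tuple[dict[str, str], float]]:
        # Fan out to every tenant store and concatenate the labelled results.
        results: list[tuple[dict[str, str], float]] = []
        for store in self.stores:
            results.extend(store.query(metric))
        return results

    def total(self, metric: str) -> float:
        # Cluster-wide aggregate, e.g. for a platform-team dashboard.
        return sum(value for _, value in self.query(metric))


# Usage: two isolated tenants, one global view.
acme = TenantPrometheus("acme", {"http_requests_total": 120.0})
globex = TenantPrometheus("globex", {"http_requests_total": 80.0})
querier = GlobalQuerier([acme, globex])

print(querier.total("http_requests_total"))  # 200.0
print(acme.query("http_requests_total"))     # [({'tenant': 'acme'}, 120.0)]
```

The design choice the model highlights: isolation comes from physically separate stores, and aggregation is added as a separate read path on top, so granting someone the global view never requires merging the tenants' storage.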
Exam trap
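CNCF exams often test the misconception that namespace labels alone provide sufficient isolation. In practice, labels are only metadata attached to time series: they do not enforce access control, because any client that can reach a shared Prometheus can omit the namespace matcher and query every tenant's data, and they do not enforce resource boundaries, because one tenant's high-cardinality metrics can exhaust the shared instance's memory. That makes a single Prometheus instance a security and reliability risk in multi-tenant clusters.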
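A minimal Python sketch of why a label is a filter rather than a boundary, assuming a single shared store modelled as a list of labelled series. `SHARED_SERIES` and `select` are hypothetical names; `select` only imitates how an equality matcher such as `http_requests_total{namespace="tenant-a"}` narrows a PromQL selection, and is not the Prometheus API.

```python
from typing import Optional

# Hypothetical shared Prometheus: every tenant's series live in one store,
# distinguished only by a namespace label.
SHARED_SERIES = [
    ({"__name__": "http_requests_total", "namespace": "tenant-a"}, 120.0),
    ({"__name__": "http_requests_total", "namespace": "tenant-b"}, 80.0),
]


def select(
    metric: str, matchers: Optional[dict[str, str]] = None
) -> list[tuple[dict[str, str], float]]:
    """Return series for `metric` whose labels equal every given matcher."""
    matchers = matchers or {}
    return [
        (labels, value)
        for labels, value in SHARED_SERIES
        if labels["__name__"] == metric
        and all(labels.get(name) == wanted for name, wanted in matchers.items())
    ]


# Tenant A's dashboard filters on its own namespace...
scoped = select("http_requests_total", {"namespace": "tenant-a"})
# ...but nothing forces the matcher: dropping it exposes tenant B's data too.
unscoped = select("http_requests_total")

print(len(scoped), len(unscoped))  # 1 2
```

The matcher is chosen by whoever writes the query, so on a shared instance tenant scoping depends on query discipline rather than on anything the store itself enforces.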
How to eliminate wrong answers
Option A is wrong because a single Prometheus instance with namespace labels does not enforce tenant isolation: any user with access to Prometheus can query all namespaces, and a misconfigured or malicious tenant could overload the instance and affect everyone else.
Option B is wrong because a global Prometheus with recording rules is still a single instance. Recording rules only precompute aggregations, so this design neither isolates tenant workloads nor removes the single point of failure and performance bottleneck.
Option C is wrong because if each tenant deploys and views its own monitoring stack separately, there is no unified query layer to combine metrics across tenants, so the team cannot build aggregated cluster-wide dashboards.