CCNA ML Data Model Management Questions

45 questions · ML Data Model Management topic · All types, answers revealed

1
MCQ · medium

Refer to the exhibit. A team runs this Vertex AI Pipeline definition but the deploy component never executes, even though the evaluate step outputs a metric of 0.9. What is the most likely cause?

A. The deploy component depends on the gate component, but the gate is not producing an output.
B. The deploy container image does not exist.
C. The evaluate component must be run before train, but the pipeline order is incorrect.
D. The condition should reference the evaluate component's output directly instead of using an input variable.
E. The pipeline should use a custom component for the condition instead of the built-in type.
Answer: D

The `condition` expression must directly use the output reference, not a local input.

Why this answer

Option D is correct because in Vertex AI Pipelines, the `condition` block evaluates a boolean expression at pipeline compile time, not runtime. If the condition references an input variable rather than the actual output of the `evaluate` component, the pipeline will use the default or placeholder value (often `False`), causing the deploy step to be skipped even when the runtime metric is 0.9. The condition must directly reference the `evaluate` component's output (e.g., `evaluate.outputs['metric']`) to be evaluated correctly at runtime.
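
The difference between the two kinds of reference can be illustrated with a toy runner in plain Python. This is an analogy only, not the Vertex AI Pipelines or KFP API; `run_pipeline`, `on_input`, `on_output`, and the 0.8 threshold are hypothetical names and values chosen for illustration.

```python
# Toy illustration only: a hand-rolled runner, not the KFP / Vertex AI API.
THRESHOLD = 0.8  # hypothetical deployment threshold


def run_pipeline(runtime_metric, condition):
    """Run train -> evaluate, then deploy only if `condition` holds."""
    # Pipeline input with a placeholder default; nothing updates it at runtime.
    params = {"metric": 0.0}
    # Output produced by the evaluate step while the pipeline runs.
    outputs = {"evaluate": {"metric": runtime_metric}}
    executed = ["train", "evaluate"]
    if condition(params, outputs):
        executed.append("deploy")
    return executed


def on_input(params, outputs):
    # Gate wired to the pipeline input: sees the placeholder, not the metric.
    return params["metric"] > THRESHOLD


def on_output(params, outputs):
    # Gate wired to the evaluate step's output: sees the real metric.
    return outputs["evaluate"]["metric"] > THRESHOLD
```

With a runtime metric of 0.9, `run_pipeline(0.9, on_input)` skips deploy, while `run_pipeline(0.9, on_output)` includes it.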

Exam trap

Google Cloud often tests the distinction between compile-time and runtime evaluation in pipeline orchestration, trapping candidates who assume that pipeline input parameters are dynamically resolved at the same point as component outputs.

How to eliminate wrong answers

Option A is wrong because the gate component is not required for the deploy step; the condition is evaluated based on the evaluate component's output, not a gate output. Option B is wrong because if the deploy container image did not exist, the pipeline would fail with an image pull error, not silently skip execution. Option C is wrong because the pipeline order (train → evaluate → deploy) is correct; the evaluate step must run after train, and the condition is on evaluate's output, not on train.

Option E is wrong because the built-in `condition` component in Vertex AI Pipelines is fully capable of evaluating boolean expressions; a custom component is not needed and would not fix the issue of referencing an input variable instead of a component output.

2
MCQ · easy

Refer to the exhibit. A team runs the command above and sees only two models. They know there is a model 'model-v3' created three days ago. What is the most likely reason it is not listed?

A. The model was created in a different region.
B. The model is in a different project.
C. The model is not deployed to an endpoint.
D. The model's display name contains a hyphen.
E. The model was created by a different user.
Answer: A

The --region flag filters models by location; missing models are likely in another region.

Why this answer

The `gcloud ai models list` command lists models within a specific region, as Vertex AI models are regional resources. If 'model-v3' was created in a different region, it would not appear in the output unless the `--region` flag is set to that region. This is the most likely reason the model is missing from the list.
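
Because listing is per region, finding a missing model usually means repeating the command once per candidate region. A minimal sketch that builds those command lines is shown here; the project ID and region list are placeholder assumptions, and the helper name `list_models_commands` is hypothetical.

```python
import shlex


def list_models_commands(project, regions):
    """Build one `gcloud ai models list` command line per region."""
    commands = []
    for region in regions:
        argv = [
            "gcloud", "ai", "models", "list",
            f"--region={region}",    # Vertex AI models are regional resources
            f"--project={project}",  # listing is also scoped to one project
        ]
        commands.append(shlex.join(argv))
    return commands
```

Calling `list_models_commands("my-project", ["us-central1", "europe-west4"])` yields two command strings, one per region, that can be run in a shell.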

Exam trap

Google Cloud often tests the regional scope of Vertex AI resources, trapping candidates who assume model listing is global or project-wide, when in fact it is region-specific and requires the correct `--region` flag.

How to eliminate wrong answers

Option B is wrong because the `gcloud ai models list` command operates within a single project (the current configured project or one specified with `--project`), but the question states the team sees only two models, implying they are in the correct project; a different project would require explicit project specification. Option C is wrong because model listing does not require deployment to an endpoint; Vertex AI lists all models in the project/region regardless of deployment status. Option D is wrong because hyphens in display names are allowed and do not affect listing; the command lists models by their resource name or display name without filtering on special characters.

Option E is wrong because model listing is not user-scoped; all models in the project/region are visible to any user with appropriate permissions, regardless of who created them.

3
MCQ · medium

Two teams are collaborating on a project and want to use a shared Feature Store in Vertex AI. They need to ensure that features are discoverable and that access is controlled. What is the best practice?

A. Export features to CSV files in Cloud Storage and share the bucket
B. Build a custom feature pipeline using Dataflow and store in Cloud SQL
C. Each team stores features in their own BigQuery table and shares the table
D. Use Vertex AI Feature Store and grant appropriate IAM roles to each team
Answer: D

Vertex AI Feature Store provides a unified repository with access control and discovery.

Why this answer

Option D is correct because Vertex AI Feature Store supports sharing features with access controls (IAM) and enables discovery through the UI and API. Option A is wrong because CSV files in Cloud Storage are not a feature store. Option B is wrong because an ad hoc custom pipeline is not a managed solution.

Option C is wrong because BigQuery tables alone lack feature store metadata and online serving.

4
MCQ · hard

A team has set up the IAM policy above on a Vertex AI project. Alice, a data scientist, reports that she cannot create a Vertex AI Training custom job using a pre-built container. Other data scientists in the group 'data-scientists@example.com' have the same issue. What is the most likely cause?

A. The 'roles/aiplatform.user' role does not grant the permission to create custom training jobs.
B. The Vertex AI Custom Code Service Agent service account is missing the 'roles/aiplatform.user' role.
C. Alice is not included in the 'data-scientists@example.com' group.
D. The service account 'vertex-ai@project.iam.gserviceaccount.com' does not have permission to access the training data.
Answer: A

Creating custom jobs requires 'aiplatform.customJobs.create', which is not in the aiplatform.user role.

Why this answer

The 'roles/aiplatform.user' role includes the 'aiplatform.customJobs.create' permission, which is required to create custom training jobs. However, the issue is that Alice and her group cannot create a custom job using a pre-built container. The most likely cause is that the 'roles/aiplatform.user' role does not grant the permission to create custom training jobs with pre-built containers; it only allows using managed models or notebooks.

To create custom training jobs, the 'roles/aiplatform.customJobUser' role is needed, which includes the necessary permissions for custom job creation.

Exam trap

Google Cloud often tests the distinction between the 'roles/aiplatform.user' role and more specific roles like 'roles/aiplatform.customJobUser', leading candidates to assume the basic user role covers all Vertex AI actions, when in fact it does not include custom job creation.

How to eliminate wrong answers

Option B is wrong because the Vertex AI Custom Code Service Agent service account is used for custom code training, but the issue is about pre-built containers, not custom code, and the service account's role assignment is not the cause of the permission error for the data scientists. Option C is wrong because the problem states that other data scientists in the group have the same issue, implying Alice is likely in the group; if she were not, she would have a different error, but the group-wide issue points to a role/permission problem. Option D is wrong because the service account 'vertex-ai@project.iam.gserviceaccount.com' is not directly involved in creating custom jobs; it is used for Vertex AI's internal operations, and the error is about creating the job, not accessing training data.

5
MCQ · medium

Refer to the exhibit. A team leader applies this IAM policy on a Vertex AI model resource. What does the condition accomplish?

A. Allows the data scientist to access model evaluations only
B. Limits access to models whose resource name starts with 'dev-'
C. Limits access to models owned by the data scientist
D. Limits access to models only in the us-central1 region
E. Limits access to models created after a certain date
Answer: B

The condition 'resource.name.startsWith' matches only models with the 'dev-' prefix.

Why this answer

The condition in the IAM policy uses the `resource.name.startsWith('dev-')` condition expression, which restricts access to Vertex AI model resources whose resource name begins with the prefix 'dev-'. This is a common pattern for environment-based access control, allowing the data scientist to only interact with models designated for development.
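
Since the exhibit is not reproduced here, the following is only a sketch of what a conditional binding of this shape might look like, written as a Python dict in the IAM policy JSON layout. The role, member, and title are assumptions; `condition_allows` is a hypothetical local stand-in that mimics the prefix check, whereas IAM itself evaluates the real expression.

```python
# Hypothetical binding; role, member, and title are illustrative assumptions.
binding = {
    "role": "roles/aiplatform.user",
    "members": ["user:data-scientist@example.com"],
    "condition": {
        "title": "dev-models-only",
        "expression": "resource.name.startsWith('dev-')",
    },
}


def condition_allows(resource_name, prefix="dev-"):
    """Mimic the prefix check locally; this is not how IAM evaluates CEL."""
    return resource_name.startswith(prefix)
```

Under this check a name such as `dev-churn-model` passes, while `prod-churn-model` does not.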

Exam trap

Google Cloud often tests the distinction between resource name prefix matching and other common IAM conditions like resource labels, resource location, or creation timestamp, leading candidates to confuse a simple string prefix check with more complex attribute-based conditions.

How to eliminate wrong answers

Option A is wrong because the condition does not restrict access to model evaluations; it filters based on the resource name prefix, not the resource type or sub-resource. Option C is wrong because IAM conditions cannot dynamically check resource ownership; they operate on resource attributes like name, not on who created the resource. Option D is wrong because the condition does not reference any region attribute; region-based filtering would require a condition on `resource.location` or similar.

Option E is wrong because the condition does not involve any date or timestamp comparison; it only checks the string prefix of the resource name.

6
MCQ · easy

A team is using Cloud Composer to orchestrate ML workflows. They want to allow multiple data scientists to contribute DAGs without interfering with each other. What is the recommended approach?

A. Give each data scientist write access to the DAGs folder in Cloud Storage
B. Use a complex naming convention for DAG files to avoid overwriting
C. Store DAGs in a source control repository and use CI/CD to deploy to Cloud Composer
D. Create a separate Cloud Composer environment for each data scientist
Answer: C

Version control and CI/CD provide collaboration, testing, and safe deployment.

Why this answer

Option C is correct because Cloud Composer (based on Apache Airflow) recommends managing DAGs via source control and CI/CD pipelines to ensure version control, code review, and consistent deployment. This prevents conflicts when multiple data scientists contribute, as each change is tracked and tested before being synced to the DAGs folder in Cloud Storage, avoiding overwrites or broken workflows.
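
One check a CI step might run before syncing DAG files is a scan for duplicate `dag_id` values, since two contributors reusing the same ID would interfere with each other. The sketch below is an assumption about what such a check could look like, not a Cloud Composer feature; `find_duplicate_dag_ids` is a hypothetical helper that works on file contents passed in as strings.

```python
import re
from collections import defaultdict

# Matches dag_id="..." or dag_id='...' in DAG source text.
DAG_ID_PATTERN = re.compile(r"""dag_id\s*=\s*["']([^"']+)["']""")


def find_duplicate_dag_ids(sources):
    """Map each dag_id declared in more than one file to those files.

    `sources` maps a file name to that file's text.
    """
    owners = defaultdict(set)
    for filename, text in sources.items():
        for dag_id in DAG_ID_PATTERN.findall(text):
            owners[dag_id].add(filename)
    return {
        dag_id: sorted(files)
        for dag_id, files in owners.items()
        if len(files) > 1
    }
```

A pipeline could fail the build whenever the returned dict is non-empty, so the conflict is caught in review rather than in the shared environment.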

Exam trap

The trap here is that candidates may assume direct write access or naming conventions are sufficient for collaboration, but Google Cloud tests the understanding that production-grade ML workflows require source control and CI/CD to enforce code quality and prevent deployment conflicts.

How to eliminate wrong answers

Option A is wrong because giving each data scientist direct write access to the DAGs folder in Cloud Storage bypasses version control and can lead to accidental overwrites, conflicts, or deployment of untested code, breaking production workflows. Option B is wrong because a complex naming convention does not prevent race conditions or overwrites when multiple data scientists upload files simultaneously; it only reduces the probability of name collisions but does not address the core need for controlled, auditable deployments. Option D is wrong because creating a separate Cloud Composer environment for each data scientist is cost-prohibitive, inefficient, and defeats the purpose of shared orchestration; it also introduces overhead in managing multiple environments and does not solve the collaboration problem at scale.

7
Multi-Select · medium

A machine learning team is collaborating on a project using Vertex AI Experiments to track model training runs. They want to ensure that all team members can reproduce any experiment by using the same code, data, and environment. Which THREE actions should the team take?

Select 3 answers
A. Store the training code in a Cloud Source Repository and tag commits with the experiment ID.
B. Build a custom container image for training and push it to Artifact Registry with a fixed tag.
C. Record the path and version of the training dataset in the experiment parameters.
D. Share a service account key with all team members so they can access the same resources.
E. Use Vertex AI's hyperparameter tuning job to automatically find the best parameters.
Answers: A, B, C

This ensures the exact code version is tied to the experiment.

Why this answer

Option A is correct because storing training code in a Cloud Source Repository with tags linked to experiment IDs ensures that every team member can retrieve the exact code version used for a given experiment. This is a core reproducibility practice in Vertex AI Experiments, where the code snapshot is a key component of the experiment lineage.
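
The three correct actions amount to recording code, environment, and data identifiers alongside each run. A minimal sketch of collecting them into one parameter dict follows; the key names and the `run_fingerprint` field are assumptions for illustration, and the resulting dict would still need to be logged through whatever experiment-tracking call the team uses.

```python
import hashlib
import json


def experiment_params(commit_sha, image_uri, dataset_uri, dataset_version):
    """Collect the reproducibility inputs as flat experiment parameters."""
    params = {
        "code_commit": commit_sha,           # exact code version (option A)
        "container_image": image_uri,        # training environment (option B)
        "dataset_uri": dataset_uri,          # data location (option C)
        "dataset_version": dataset_version,  # data version (option C)
    }
    # Stable fingerprint: identical inputs always hash to the same value.
    canonical = json.dumps(params, sort_keys=True).encode("utf-8")
    params["run_fingerprint"] = hashlib.sha256(canonical).hexdigest()[:12]
    return params
```

Two runs with the same fingerprint used the same code, image, and dataset version, which makes mismatches easy to spot when comparing experiments.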

Exam trap

Google Cloud often tests the distinction between actions that enable reproducibility versus actions that improve model performance or access control, so candidates mistakenly select hyperparameter tuning or service account sharing as reproducibility measures.

8
MCQ · hard

A large e-commerce company uses Vertex AI to train a recommendation model daily. The training pipeline is built with Vertex AI Pipelines and involves three steps: data preprocessing, training, and model evaluation. The pipeline is triggered by a Cloud Scheduler job every morning at 8 AM. Recently, the pipeline has been failing intermittently during the data preprocessing step, with an error message indicating 'ResourceExhausted: Quota limits exceeded for read api requests.' The team has checked and confirmed that the quota for BigQuery read requests is not exceeded at the project level. The preprocessing step reads data from a BigQuery table with billions of rows. The team has also noticed that the pipeline runs on a custom machine type (n1-standard-4) with a persistent disk. What is the most likely cause of this error?

A. The BigQuery table is partitioned on a date column, and the pipeline is querying a specific partition that exceeds the quota.
B. The Cloud Scheduler job is triggering multiple pipeline runs that overlap, causing concurrent quota usage.
C. The preprocessing component is using a BigQuery client library that does not use exponential backoff for retries.
D. The pipeline is using a shared VPC that has traffic shaping limits.
Answer: C

Without backoff, rapid retries can exhaust per-user read API quotas.

Why this answer

Option C is correct because the error 'ResourceExhausted: Quota limits exceeded for read api requests' indicates that the BigQuery API is throttling requests from the client, even though the project-level quota is not exceeded. The preprocessing component likely uses a BigQuery client library that lacks exponential backoff retry logic, causing rapid, repeated requests that exhaust the per-client or per-connection quota. Implementing exponential backoff would allow the client to back off and retry, preventing quota exhaustion.
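
Exponential backoff itself is simple to sketch. The version below is a generic illustration in plain Python, not the BigQuery client library's retry mechanism; `QuotaExceeded` is a hypothetical stand-in for a retryable quota error, and the delay values are assumptions.

```python
import random
import time


class QuotaExceeded(Exception):
    """Stand-in for a retryable quota error such as ResourceExhausted."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep, rng=random.random):
    """Call `fn`, retrying quota errors with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: sleep a random fraction of the ceiling so that
            # parallel workers do not all retry at the same instant.
            sleep(ceiling * rng())
```

The `sleep` and `rng` parameters exist only so the behavior can be exercised without real waiting; in normal use the defaults apply.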

Exam trap

The trap here is that candidates assume quota errors always mean the project-level limit is reached, but Google Cloud tests the nuance that per-client or per-connection rate limits can be exhausted independently, especially when retry logic is missing.

How to eliminate wrong answers

Option A is wrong because querying a specific partition does not inherently exceed quota; partitioning actually reduces data scanned and can lower quota usage. Option B is wrong because Cloud Scheduler triggers a single pipeline run at 8 AM, and overlapping runs would require multiple triggers or a long-running pipeline, which is not indicated; the error is specific to read API requests, not concurrency. Option D is wrong because shared VPC traffic shaping limits affect network throughput, not BigQuery read API quota, which is a separate resource governed by Google Cloud's API quota system.

9
MCQ · easy

A data science team is using a shared Cloud Storage bucket to store training data. Multiple team members are simultaneously uploading new data files, and occasionally the wrong version of a file is used in training, leading to inconsistent results. Which best practice should the team implement to ensure data version consistency?

A. Use Cloud Composer to schedule a daily snapshot of the Cloud Storage bucket.
B. Migrate all training data to BigQuery and use time-travel queries to access historical versions.
C. Enable object versioning on the Cloud Storage bucket and use the version ID when referencing data files.
D. Restrict write access to the bucket to only one team member using IAM roles.
Answer: C

Object versioning provides a way to keep multiple versions of an object, ensuring consistency.

Why this answer

Option C is correct because enabling object versioning on a Cloud Storage bucket preserves each object's history, allowing the team to reference a specific version ID when reading data files. This ensures that every training run uses the exact same version of a file, eliminating inconsistency from concurrent uploads. The version ID acts as an immutable pointer, decoupling the training process from the bucket's live state.
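
In Cloud Storage, each object version is identified by a generation number, which can be appended to the object URI after a `#`. A minimal sketch of recording pinned URIs for a training run is shown below; the bucket name, object names, and generation values are placeholder assumptions, and both helper names are hypothetical.

```python
def pinned_uri(bucket, object_name, generation):
    """Return a URI that names one specific generation of an object."""
    return f"gs://{bucket}/{object_name}#{generation}"


def build_manifest(bucket, generations):
    """Map each object name to its pinned URI so a run can record it.

    `generations` maps an object name to the generation number to pin.
    """
    return {
        name: pinned_uri(bucket, name, generation)
        for name, generation in sorted(generations.items())
    }
```

Storing the manifest with the training run means a later rerun reads the same bytes even if teammates have since uploaded newer versions.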

Exam trap

Google Cloud often tests the distinction between data versioning (object-level immutability) and data backup (snapshots or time-travel), leading candidates to choose snapshot or database-centric solutions that do not provide per-file version consistency in a shared object store.

How to eliminate wrong answers

Option A is wrong because Cloud Composer schedules workflows (e.g., Airflow DAGs) but does not provide per-object version consistency; a daily snapshot captures a point-in-time state but does not prevent concurrent uploads from overwriting files between snapshots. Option B is wrong because BigQuery time-travel queries access table snapshots within a 7-day window, but the scenario involves files in Cloud Storage, not tables; migrating all training data to BigQuery is an unnecessary architectural change that does not address file-level versioning. Option D is wrong because restricting write access to one team member creates a bottleneck and single point of failure, violating the team's need for simultaneous uploads and not solving the core problem of identifying which version is used.

10
MCQ · hard

A machine learning engineer needs to share a trained model with the product team for integration. The model is stored in Cloud Storage, and the product team’s service account needs read access. The engineer wants to follow the principle of least privilege. Which IAM configuration should be used?

A. Generate a signed URL with read access and share it with the product team.
B. Grant the product team's service account the roles/storage.objectViewer role at the bucket level.
C. Grant the product team's service account the roles/storage.objectAdmin role at the bucket level.
D. Grant the product team's service account the roles/storage.objectViewer role at the project level.
Answer: B

A bucket-level grant gives read access to objects in that bucket only, following least privilege.

Why this answer

Option B is correct because granting the product team's service account the roles/storage.objectViewer role at the bucket level provides read-only access to objects in that specific bucket, adhering to the principle of least privilege. This role allows the service account to list and read objects without granting broader permissions, such as modifying or deleting them, and scoping it to the bucket prevents unnecessary access to other buckets in the project.
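
The grant can be pictured as one extra binding in the bucket's IAM policy. The sketch below edits a policy represented as a plain dict in the IAM policy JSON layout; it does not call any Cloud Storage API, and the service account email used in the example is a placeholder assumption.

```python
import copy


def grant_object_viewer(policy, service_account_email):
    """Return a copy of a bucket IAM policy with read-only access added."""
    role = "roles/storage.objectViewer"
    member = f"serviceAccount:{service_account_email}"
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the same role.
        if binding["role"] == role and "condition" not in binding:
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated
```

Because the policy belongs to a single bucket, the added member gains no access to other buckets in the project.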

Exam trap

The trap here is that candidates may confuse the principle of least privilege with convenience, choosing a signed URL (Option A) because it seems simple, or selecting a project-level role (Option D) without realizing it grants access to all buckets, both of which violate the core requirement of minimal necessary permissions.

How to eliminate wrong answers

Option A is wrong because generating a signed URL with read access creates a time-limited, publicly accessible URL that bypasses IAM authentication, which violates the principle of least privilege by not using the service account's identity and potentially exposing the model to unauthorized users if the URL is leaked. Option C is wrong because granting the roles/storage.objectAdmin role at the bucket level provides full control over objects, including delete and overwrite permissions, which exceeds the required read-only access and violates least privilege. Option D is wrong because granting the roles/storage.objectViewer role at the project level gives read access to all buckets in the project, not just the specific bucket containing the model, which violates least privilege by granting broader access than necessary.

11
MCQ · medium

A company has multiple teams that need to access and manage ML models in Vertex AI. Different teams require different permission levels: the data science team should be able to create and update models, while the MLOps team should have full control. What is the recommended approach to manage access?

A. Grant the 'aiplatform.user' role to a Google Group containing all users
B. Use folders in Google Cloud Resource Manager and assign IAM roles at the folder level
C. Use labels and tags on models to control access
D. Create a separate Google Cloud project for each team
Answer: B

Folders allow hierarchical policy management, and IAM roles can be scoped appropriately for each team.

Why this answer

Option B is correct because Google Cloud Resource Manager folders allow hierarchical IAM policy inheritance, enabling you to assign roles like 'roles/aiplatform.user' (for data science) and 'roles/aiplatform.admin' (for MLOps) at the folder level. This approach scales across multiple projects within the folder, ensuring consistent permissions without per-project duplication. It aligns with the principle of least privilege and centralized access management for Vertex AI resources.

Exam trap

The trap here is that candidates confuse resource labels/tags (which are for organization and cost allocation) with IAM-based access control, leading them to incorrectly select Option C as a viable permission management method.

How to eliminate wrong answers

Option A is wrong because granting 'aiplatform.user' to a Google Group containing all users gives the same permission level to everyone, failing to differentiate between data science (create/update) and MLOps (full control) needs; it violates least privilege. Option C is wrong because labels and tags are metadata for organizing and filtering resources, not IAM mechanisms—they cannot enforce access control or grant permissions to models. Option D is wrong because creating a separate project for each team introduces administrative overhead, breaks centralized model governance, and does not inherently solve fine-grained access within Vertex AI; it also complicates cross-team model sharing and cost tracking.

12
MCQ · medium

A data engineer is setting up a data pipeline for ML training. The raw data is in Cloud Storage, and they need to transform it into features stored in Vertex AI Feature Store. The pipeline should run daily. Which service should they use?

A. Cloud Composer with Airflow DAG.
B. Cloud Dataproc with Spark.
C. Dataflow with Apache Beam pipeline.
D. Vertex AI Pipelines with custom components.
E. Cloud Functions on a schedule.
Answer: C

Dataflow can read from Cloud Storage, transform, and write to Feature Store efficiently.

Why this answer

Dataflow with Apache Beam is the correct choice because it provides a fully managed, serverless service for both batch and streaming data processing, which is ideal for transforming raw data from Cloud Storage into features for Vertex AI Feature Store on a daily schedule. Dataflow handles auto-scaling, exactly-once processing, and integrates natively with Google Cloud services, making it efficient for ETL pipelines that need to run reliably at scale.
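
The per-record transformation in such a pipeline is ordinary Python. As a sketch, the function below turns one raw CSV line into a feature record; in a Beam pipeline a function like this could be applied with a `Map` step. The column layout (`user_id,amount,event_time`), the feature names, and the 100.0 threshold are all assumptions for illustration.

```python
import csv
from datetime import datetime, timezone


def to_feature_row(csv_line):
    """Turn one raw `user_id,amount,event_time` line into a feature record."""
    user_id, amount, event_time = next(csv.reader([csv_line]))
    # Assume the raw timestamp has no offset and is meant as UTC.
    timestamp = datetime.fromisoformat(event_time).replace(tzinfo=timezone.utc)
    return {
        "entity_id": user_id,
        "purchase_amount": float(amount),
        "is_large_purchase": float(amount) >= 100.0,  # hypothetical feature
        "feature_timestamp": timestamp.isoformat(),
    }
```

Keeping the transform a pure function makes it easy to unit test outside the pipeline before running it at scale.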

Exam trap

Google Cloud often tests the distinction between orchestration (Cloud Composer) and actual data processing (Dataflow), leading candidates to pick Cloud Composer because they see 'schedule' in the question, but the core requirement is transforming data, not just scheduling it.

How to eliminate wrong answers

Option A is wrong because Cloud Composer with Airflow DAG is primarily an orchestration tool for scheduling and monitoring workflows, not a data processing engine; it would need to delegate the actual transformation to another service like Dataflow or Dataproc. Option B is wrong because Cloud Dataproc with Spark is optimized for big data analytics and interactive queries, but it requires managing clusters and is less suited for a simple, daily batch transformation pipeline that benefits from serverless, auto-scaling execution. Option D is wrong because Vertex AI Pipelines with custom components is designed for orchestrating ML workflows (e.g., training, evaluation, deployment), not for generic data transformation tasks; it adds unnecessary complexity for a simple daily ETL job.

Option E is wrong because Cloud Functions on a schedule is built for short, event-driven tasks with bounded execution time and memory (1st gen functions time out after 9 minutes), making it unsuitable for processing large volumes of raw data from Cloud Storage into features.

13
MCQ · medium

A team of data scientists and ML engineers is collaborating on a project using Vertex AI Workbench. They need to share notebooks and code, but want to avoid conflicts and maintain a history of changes. Which approach should they use?

A. Email notebook files to each other and manually merge changes.
B. Store notebooks in a shared Cloud Storage bucket and access them simultaneously.
C. Use Vertex AI Experiments to share notebook outputs.
D. Use a git repository (e.g., Cloud Source Repositories) to manage code and notebooks.
Answer: D

Git provides branching, merging, and history.

Why this answer

Option D is correct because using a git repository (e.g., Cloud Source Repositories) provides version control, branching, and a full history of changes, which is essential for collaborative development. This approach avoids conflicts by allowing team members to work on separate branches and merge changes systematically, unlike shared storage or manual methods that lack conflict resolution and audit trails.

Exam trap

The trap here is that candidates confuse collaboration tools (like shared storage or experiment tracking) with version control, assuming that any shared access or logging mechanism can replace the structured history and conflict resolution of a git-based workflow.

How to eliminate wrong answers

Option A is wrong because emailing notebook files and manually merging changes is error-prone, lacks any version history or conflict detection, and does not scale for team collaboration. Option B is wrong because storing notebooks in a shared Cloud Storage bucket and accessing them simultaneously can lead to write conflicts, data corruption, and no built-in version history or merge capabilities. Option C is wrong because Vertex AI Experiments is designed for tracking and comparing model training runs and their metrics, not for managing source code or notebook version control.

14
MCQ · medium

A data science team is using Vertex AI Feature Store for online serving. They notice that the online serving latency is high. What is the most likely cause?

A. The features are being computed on the fly instead of being precomputed.
B. The feature table has too many rows.
C. The feature values are stored in Cloud Storage.
D. The online store is not configured for high throughput.
E. The serving endpoint is in a different region than the client.
Answer: C

Cloud Storage has high latency for per-request access; online store should use Bigtable or Memorystore.

Why this answer

Option C is correct because Vertex AI Feature Store requires feature values to be stored in a low-latency online store (such as a Bigtable or Redis cluster) for serving. When features are stored in Cloud Storage, each online serving request must read from object storage, which introduces significant latency due to network overhead and lack of indexing. This design violates the fundamental architecture of Feature Store, which expects precomputed features in a key-value store optimized for low-latency lookups.

Exam trap

The trap here is that candidates may assume any cloud storage is acceptable for online serving, but the PMLE exam tests the specific architectural requirement that Vertex AI Feature Store must use a low-latency online store (like Bigtable or Redis) for serving, not Cloud Storage.

How to eliminate wrong answers

Option A is wrong because computing features on the fly would increase latency, but the question states the team is using Vertex AI Feature Store for online serving, which implies features are precomputed; the high latency is not due to on-the-fly computation but rather the storage backend. Option B is wrong because the number of rows in a feature table does not directly cause high online serving latency; Vertex AI Feature Store uses indexing and partitioning to handle large tables efficiently. Option D is wrong because the online store's throughput configuration affects capacity under load, not baseline latency; high latency is more likely a storage or network issue.

Option E is wrong because while cross-region latency can add delay, Vertex AI Feature Store endpoints are regional by default, and the question does not indicate a region mismatch; the more direct cause is the storage layer.

15
MCQ · easy

Your team manages multiple ML models in Vertex AI Model Registry. Each model has several versions deployed to different endpoints for testing and production. You need to implement a process where a model version can be promoted from a staging environment to production only after it has passed automated validation tests and been approved by a designated reviewer. The team uses CI/CD pipelines (Cloud Build) for training and deployment. Currently, model versions are deployed to endpoints using Vertex AI Endpoints with a single traffic split configuration. You want to track promotion requests and enforce approval gates. What should you do?

A. Deploy each model version to a separate endpoint, and use a custom database to track which endpoint is 'production'. Then use migration scripts to switch traffic.
B. Store the model version metadata in a BigQuery table and use a scheduled query to automatically update the endpoint deployment based on validation results.
C. Use Vertex AI Model Registry labels to mark versions as 'staging' or 'production', and create a Cloud Function that checks the label before deploying to the endpoint.
D. Use Vertex AI Model Registry version aliases ('staging', 'production') and configure Cloud Build to trigger a Cloud Run service that handles approval logic, then update the alias upon approval.
Answer: D

Version aliases provide a built-in way to denote environment stages and can be updated programmatically after validation and approval.

Why this answer

Option D is correct because Vertex AI Model Registry version aliases (e.g., 'staging', 'production') are designed to track model version lifecycle stages. By integrating Cloud Build to trigger a Cloud Run service that enforces approval logic before updating the alias, you create a clear promotion gate. This approach natively supports tracking promotion requests and enforcing approval without custom databases or manual scripts, aligning with CI/CD best practices.
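
The approval logic such a service enforces can be sketched with an in-memory stand-in. `ToyRegistry` below is hypothetical and does not use the Vertex AI SDK; it only shows the order of the gates (staged, validated, approved) and the audit trail of promotion requests.

```python
class ToyRegistry:
    """In-memory stand-in for a model registry mapping aliases to versions."""

    def __init__(self):
        self.aliases = {}   # e.g. {"staging": "3", "production": "2"}
        self.requests = []  # audit trail of every promotion request

    def promote(self, version, validation_passed, approved_by=None):
        """Move the 'production' alias to `version` only if every gate passes."""
        if self.aliases.get("staging") != version:
            status = "rejected: version is not staged"
        elif not validation_passed:
            status = "rejected: validation failed"
        elif approved_by is None:
            status = "pending: awaiting reviewer approval"
        else:
            self.aliases["production"] = version
            status = "promoted"
        # Record the request whether or not it succeeded.
        self.requests.append(
            {"version": version, "approved_by": approved_by, "status": status}
        )
        return status
```

A request without a reviewer stays pending and leaves the `production` alias untouched; only a validated, approved, staged version moves it.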

Exam trap

Google Cloud often tests the distinction between labels (key-value metadata) and aliases (semantic lifecycle tags) in Vertex AI Model Registry, leading candidates to choose Option C because they confuse labels with the built-in promotion mechanism that aliases provide.

How to eliminate wrong answers

Option A is wrong because deploying each model version to a separate endpoint and using a custom database to track 'production' adds unnecessary complexity and operational overhead; Vertex AI already provides version aliases and traffic splitting to manage promotions. Option B is wrong because using a BigQuery table and scheduled queries to update endpoint deployments introduces latency and lacks real-time approval enforcement; it also bypasses the native Model Registry lifecycle management. Option C is wrong because Vertex AI Model Registry labels are key-value metadata not designed for version promotion workflows; they lack the built-in semantics of aliases and would require custom logic to enforce approval gates, whereas aliases directly support staging/production promotion.
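
The promotion gate described above can be sketched in plain Python. This is an in-memory model only: `AliasRegistry` and `promote` are hypothetical names, and nothing here calls the Vertex AI SDK. A real implementation would update the alias through the Model Registry API from the approval service.

```python
from dataclasses import dataclass, field


@dataclass
class AliasRegistry:
    """Hypothetical in-memory stand-in for a model's alias-to-version map."""
    aliases: dict = field(default_factory=dict)    # alias -> version id
    audit_log: list = field(default_factory=list)  # promotion requests, in order

    def promote(self, version, tests_passed, approved_by=None):
        """Move the 'production' alias to `version` only if every gate passes."""
        if self.aliases.get("staging") != version:
            outcome = "rejected: version is not in staging"
        elif not tests_passed:
            outcome = "rejected: validation tests failed"
        elif not approved_by:
            outcome = "rejected: no reviewer approval"
        else:
            self.aliases["production"] = version
            outcome = "promoted"
        # Every request is recorded, whether or not it succeeded.
        self.audit_log.append((version, approved_by, outcome))
        return outcome


registry = AliasRegistry(aliases={"staging": "3", "production": "2"})
print(registry.promote("3", tests_passed=True))
# -> rejected: no reviewer approval
print(registry.promote("3", tests_passed=True, approved_by="reviewer@example.com"))
# -> promoted
```

Keeping the audit log next to the alias update is one way to satisfy the "track promotion requests" requirement without a separate database.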

16
MCQmedium

A team is using AI Platform Data Labeling Service to label data for a classification model. They want to allow a labeler from a different team to work on the same dataset. What is the correct way to grant access?

A.Add the labeler's account as a Project Editor on the project
B.Share the Cloud Storage bucket containing the data with the labeler
C.Export the dataset and have the labeler create a new dataset
D.Add the labeler as a participant in the labeling task and assign IAM roles on the dataset
AnswerD

The Data Labeling Service allows adding participants to tasks, and IAM roles control access.

Why this answer

Option D is correct because labeling tasks are shared by granting the labeler role on the dataset resource. Option A is wrong because sharing the entire project gives too much access. Option B is wrong because the Data Labeling Service does not use Cloud Storage ACLs for task access.

Option C is wrong because exporting and reimporting causes duplication.

17
Multi-Selectmedium

Which THREE considerations are important when setting up a shared feature store in Vertex AI Feature Store for multiple teams?

Select 3 answers
A.Enable feature monitoring for data quality and freshness
B.Use separate BigQuery tables for each team's features
C.Implement data governance policies for feature creation and access
D.Create a feature sharing policy to enable cross-team discovery
E.Allow each team to build independent ingestion pipelines
AnswersA, C, D

Monitoring helps maintain trust in the feature store.

Why this answer

Option A is correct because Vertex AI Feature Store provides built-in feature monitoring that tracks data quality metrics (e.g., fraction of null values, distribution drift) and freshness (e.g., staleness of feature values). Enabling this monitoring is critical when multiple teams share a feature store to ensure that features remain reliable and up-to-date for downstream models, preventing silent degradation.

Exam trap

Google Cloud often tests the misconception that a shared feature store requires separate physical storage per team (Option B) or fully independent ingestion (Option E), when in reality the value lies in centralization with controlled access and standardized pipelines.
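
The two monitoring signals mentioned above, null fraction and staleness, can be illustrated with a short sketch. The helper names and thresholds are assumptions made for illustration; this is not the Feature Store monitoring API, which computes such statistics as a managed service.

```python
from datetime import datetime, timedelta, timezone


def null_fraction(values):
    """Fraction of entries that are None; 0.0 for an empty list."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def is_stale(last_updated, max_age, now=None):
    """True when the newest feature value is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated > max_age


# Hypothetical feature column and timestamps.
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
print(null_fraction([1.0, None, 3.0, None]))  # -> 0.5
print(is_stale(datetime(2024, 1, 1, tzinfo=timezone.utc), timedelta(hours=6), now))  # -> True
```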

18
MCQhard

Two teams independently develop two different versions of a model for the same use case. They both deploy to the same Vertex AI endpoint, causing conflicts. What is the best way to manage multiple model versions and avoid conflicts in a collaborative environment?

A.Have each team work on a separate Google Cloud project
B.Use custom metadata to tag each version and rely on team coordination
C.Deploy each team's model to a separate endpoint
D.Use Vertex AI Model Registry with staging and production channels, and implement CI/CD to control promotions
AnswerD

Model registry with staging/production allows controlled version management and rollback.

Why this answer

Option D is correct because using Vertex AI Model Registry with separate staging and production channels, together with CI/CD, controls which version is promoted. Option C is wrong because deploying to different endpoints increases management overhead. Option B is wrong because versioning metadata does not enforce deployment order.

Option A is wrong because separate projects create silos and increase cost.

19
Multi-Selecthard

Which TWO actions should be taken to ensure reproducibility of ML experiments when collaborating across teams on Vertex AI?

Select 2 answers
A.Lock dependency versions in a container image used for training
B.Share notebooks via Colab Enterprise with real-time editing
C.Version control datasets using DVC or Vertex AI ML Metadata
D.Allow each team to use their own preferred environment
E.Always use random seeds for all random operations
AnswersA, C

Container images with fixed versions ensure environment reproducibility.

Why this answer

Locking dependency versions in a container image ensures that the exact same software environment (e.g., Python packages, CUDA libraries, system tools) is used every time a training job runs. This eliminates variability from package updates or OS patches, which is a fundamental requirement for reproducibility across teams. Vertex AI supports custom containers for training, making this a direct and reliable method.

Exam trap

The trap here is that candidates often think 'always use random seeds' is a safe blanket rule, but in practice, seeds must be explicitly set and logged per run, and some operations (e.g., certain GPU kernels) are inherently non-deterministic, making this option an oversimplification that is not a guaranteed action for reproducibility.
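
The point about setting and logging seeds per run can be shown with a minimal sketch. `run_experiment` is a hypothetical stand-in for a training job, and it covers only Python's standard `random` module; ML frameworks and GPU kernels have their own seeding and determinism settings that this example does not address.

```python
import random


def run_experiment(seed):
    """Hypothetical training stand-in: draws numbers from a seeded generator."""
    rng = random.Random(seed)  # local generator, so no global state is shared
    sample = [rng.random() for _ in range(3)]
    # Record the seed next to the result so the run can be replayed later.
    return {"seed": seed, "sample": sample}


first = run_experiment(seed=1234)
second = run_experiment(seed=1234)
print(first["sample"] == second["sample"])  # -> True
```

Returning the seed with the result mirrors what an experiment tracker would store as a run parameter.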

20
Multi-Selecthard

Which THREE of the following are valid ways to share a Vertex AI model across two different Google Cloud projects?

Select 3 answers
A.Use Vertex AI Model Registry's cross-project sharing feature with IAM conditions.
B.Publish the model to Google Cloud Marketplace.
C.Export the model to a Cloud Storage bucket accessible by both projects and import into the second project.
D.Use IAM to grant the second project's service account Vertex AI User role on the model resource.
E.Use the gcloud ai models copy command to copy the model across projects.
AnswersA, C, D

Model Registry supports sharing model versions across projects with fine-grained IAM.

Why this answer

Option A is correct because Vertex AI Model Registry supports cross-project sharing by allowing you to grant IAM roles with conditions on the model resource. This enables a model registered in one project to be accessed by a service account from another project without moving or copying the model artifacts.

Exam trap

The trap here is that candidates may assume a dedicated copy command exists for moving models across projects, but Vertex AI relies on IAM-based sharing or export/import workflows instead.

21
MCQeasy

Refer to the exhibit. A team runs this command to upload a model to Vertex AI. They want to create this model as a new version under an existing model named 'my_model'. What is missing from the command?

A.--description='Second version'
B.--version=v2
C.--labels=team=ml
D.--service-account=sa@project.iam.gserviceaccount.com
E.--parent-model=my_model
AnswerE

The --parent-model flag indicates the existing model to add a version to.

Why this answer

Option E is correct because the `--parent-model` flag is required when uploading a new model version to an existing model in Vertex AI. Without specifying the parent model name, the command would attempt to create a brand-new model rather than adding a version to the existing 'my_model'. The `gcloud ai models upload` command uses this flag to associate the new version with the specified parent model.

Exam trap

Google Cloud often tests the distinction between creating a new model versus adding a version to an existing model, and the trap here is that candidates assume a `--version` flag exists (like in some other services) instead of recognizing the required `--parent-model` parameter.

How to eliminate wrong answers

Option A is wrong because `--description` is an optional metadata field and does not affect versioning or parent-model association. Option B is wrong because Vertex AI does not support a `--version` flag; model versions are automatically assigned by the service based on the order of uploads under the same parent model. Option C is wrong because `--labels` are optional key-value pairs for organizing resources and have no role in version creation.

Option D is wrong because `--service-account` is used for specifying a custom service account for model deployment, not for versioning or parent-model linkage.
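
The exhibit is not reproduced here. As an illustration only, the sketch below assembles a `gcloud ai models upload` argument list and shows where `--parent-model` fits. `build_upload_command` is a hypothetical helper and every value passed to it is a placeholder; depending on the setup, the parent model may need to be given as a model ID or full resource name rather than a display name.

```python
import shlex


def build_upload_command(display_name, region, image_uri, artifact_uri,
                         parent_model=None):
    """Assemble a `gcloud ai models upload` argument list from placeholders."""
    args = [
        "gcloud", "ai", "models", "upload",
        f"--region={region}",
        f"--display-name={display_name}",
        f"--container-image-uri={image_uri}",
        f"--artifact-uri={artifact_uri}",
    ]
    if parent_model:
        # With this flag the upload becomes a new version of an existing model
        # instead of a brand-new model resource.
        args.append(f"--parent-model={parent_model}")
    return args


cmd = build_upload_command(
    display_name="my_model",
    region="us-central1",
    image_uri="REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG",  # placeholder
    artifact_uri="gs://BUCKET/model/",                          # placeholder
    parent_model="my_model",
)
print(shlex.join(cmd))
```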

22
MCQeasy

To enable collaboration on notebook-based experiments across teams, what is the recommended approach in Google Cloud?

A.Use Colab Enterprise notebooks with shared runtimes and IAM permissions
B.Share Docker images containing the notebook environment
C.Each team member works on their own local Jupyter notebook and shares screenshots
D.Store notebooks in a Cloud Storage bucket and open them with Vertex AI Workbench
AnswerA

Colab Enterprise enables collaborative editing and shared compute resources.

Why this answer

Colab Enterprise notebooks with shared runtimes and IAM permissions is the recommended approach because it provides a fully managed, collaborative environment where multiple users can work on the same notebook simultaneously, with fine-grained access control via IAM and consistent runtime configurations. This eliminates version conflicts and environment drift, which are common in distributed notebook workflows.

Exam trap

Google Cloud often tests the misconception that shared storage (like Cloud Storage) alone is sufficient for collaboration, but the key requirement is shared runtimes and concurrent editing, which only Colab Enterprise provides among the options.

How to eliminate wrong answers

Option B is wrong because sharing Docker images containing the notebook environment addresses environment reproducibility but does not enable real-time collaboration or shared runtime execution; each user would still need to launch their own instance and manually sync changes. Option C is wrong because each team member working on their own local Jupyter notebook and sharing screenshots is a manual, non-scalable approach that lacks version control, concurrent editing, and centralized data access, making it unsuitable for team collaboration. Option D is wrong because storing notebooks in a Cloud Storage bucket and opening them with Vertex AI Workbench provides shared storage but does not inherently support shared runtimes or concurrent editing; Vertex AI Workbench instances are typically single-user, and multiple users would need to coordinate access to avoid conflicts.

23
MCQeasy

A data scientist needs to share a BigQuery dataset with a colleague in a different team so they can run queries. What is the simplest and most secure way to grant access?

A.Export the dataset to Cloud Storage and share the bucket
B.Add the colleague's account as a BigQuery Data Viewer on the dataset
C.Share the service account key of a BigQuery job user with the colleague
D.Add the colleague's account as a Project Viewer on the entire project
AnswerB

Direct IAM binding on the dataset provides least-privilege access.

Why this answer

Option B is correct because dataset-level access control in BigQuery (via IAM) grants fine-grained access to a specific dataset. Option D is wrong because granting a role on the entire project gives too much access. Option A is wrong because exporting to Cloud Storage adds unnecessary complexity and produces stale data.

Option C is wrong because sharing a service account key is a security risk.

24
MCQhard

A team uses Vertex AI Experiments to track ML training runs. They want to automatically trigger a retraining pipeline when new labeled data arrives in BigQuery, and ensure the pipeline uses only approved libraries from a central artifact registry. Which combination of services should they use?

A.Cloud Composer to orchestrate, with Cloud Storage for libraries.
B.Vertex AI Pipelines with a scheduled trigger, and use Cloud Build to pull libraries from Artifact Registry.
C.Cloud Functions triggered by BigQuery, Cloud Build to run training, and Artifact Registry for libraries.
D.Vertex AI Experiments with continuous evaluation, and a Cloud Run job for training.
E.Dataflow to preprocess, then trigger a Cloud Run job.
AnswerB

Scheduled pipeline can query BigQuery for new data, and Cloud Build ensures consistent library versions.

Why this answer

Option B is correct because Vertex AI Pipelines provides a managed orchestration service for ML workflows, and a scheduled trigger can run the pipeline at regular intervals so that each run checks BigQuery for newly arrived labeled data before retraining. Cloud Build is used to pull approved libraries from Artifact Registry, ensuring only vetted dependencies are used during pipeline execution, which meets the security and compliance requirement.

Exam trap

The trap here is that candidates may confuse Cloud Build (a CI/CD service) with Vertex AI Training (a managed ML training service), or think that Cloud Composer is the only orchestration option for ML pipelines, when Vertex AI Pipelines is the native, more integrated choice for ML workflows on Vertex AI.

How to eliminate wrong answers

Option A is wrong because Cloud Composer (based on Apache Airflow) is a general-purpose workflow orchestrator, not specifically designed for Vertex AI Pipelines, and using Cloud Storage for libraries does not enforce the use of a central artifact registry with version control and access policies. Option C is wrong because Cloud Functions triggered by BigQuery can initiate a retraining pipeline, but Cloud Build is a CI/CD tool, not a managed ML training service; Vertex AI Training or Pipelines should be used for the actual training run, not Cloud Build. Option D is wrong because Vertex AI Experiments tracks runs but does not orchestrate retraining pipelines; continuous evaluation is a monitoring feature, not a trigger mechanism, and Cloud Run is a serverless compute service for containers, not a managed ML training service.

Option E is wrong because Dataflow is a stream/batch data processing service, not a trigger mechanism for retraining, and Cloud Run is a general-purpose container platform rather than a managed ML training service; the option also does not address the approved-library requirement at all.

25
MCQeasy

A team wants to share a trained model with another team who will deploy it to a different Google Cloud project. Which is the recommended way to transfer the model?

A.Copy the model artifact from one project's Cloud Storage to another using gsutil.
B.Export the model as a SavedModel, store in a shared Cloud Storage bucket, and import into the second project.
C.Package the model in a Docker container and push to a cross-project Container Registry.
D.Use Cloud Marketplace to publish the model.
E.Use Vertex AI Model Registry with cross-project IAM permissions to allow the second project to access the model.
AnswerE

Model Registry maintains version history and metadata while enabling cross-project sharing.

Why this answer

Option E is correct because Vertex AI Model Registry supports cross-project access via IAM permissions, allowing the second project to directly deploy the model without copying artifacts. This approach maintains a single source of truth, avoids data duplication, and leverages Vertex AI's built-in versioning and lineage tracking. It is the recommended pattern for sharing models across projects in Google Cloud.

Exam trap

Google Cloud often tests the misconception that copying artifacts (gsutil) or using shared storage is the simplest approach, but the exam expects candidates to recognize that Vertex AI Model Registry with cross-project IAM is the recommended, managed solution for model sharing across projects.

How to eliminate wrong answers

Option A is wrong because copying model artifacts via gsutil bypasses Vertex AI's model management, losing metadata, versioning, and deployment history, and is not a recommended practice for production model sharing. Option B is wrong because exporting a SavedModel to a shared Cloud Storage bucket still requires manual import and does not leverage Vertex AI's model registry, leading to potential versioning and access control issues. Option C is wrong because packaging the model in a Docker container and pushing to a cross-project Container Registry is more appropriate for containerized inference services, not for sharing a trained model artifact itself, and adds unnecessary complexity.

Option D is wrong because Cloud Marketplace is designed for publishing commercial solutions, not for internal team-to-team model sharing within an organization.

26
Multi-Selectmedium

Which TWO of the following are recommended methods to ensure data privacy when collaborating with external partners on ML projects?

Select 2 answers
A.Use Vertex AI Feature Store with access controls.
B.Use Cloud DLP to de-identify data before sharing.
C.Grant the partner project's service account direct access to the raw data in BigQuery.
D.Use Confidential VMs for training with sensitive data.
E.Share data via email.
AnswersB, D

DLP can redact, tokenize, or mask sensitive data.

Why this answer

Cloud DLP (Data Loss Prevention) is a recommended method to de-identify sensitive data before sharing it with external partners. It can automatically detect and mask, tokenize, or redact PII, PCI, or other sensitive elements, ensuring that only anonymized data leaves your environment. This aligns with the principle of least privilege and data minimization for external collaboration.

Exam trap

Google Cloud often tests the misconception that access controls alone (like IAM or Feature Store ACLs) are sufficient for data privacy with external partners, but the key requirement is de-identification or encryption in use, not just authorization.
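
To make the difference between masking and tokenizing concrete, here is a small sketch. It is a stand-in for the idea only and does not use the Cloud DLP API: the e-mail regex is deliberately simplified, and `mask_emails`, `tokenize_emails`, and the salt value are hypothetical.

```python
import hashlib
import re

# Simplified pattern for illustration; real detectors handle far more cases.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text):
    """Replace every e-mail address with a fixed placeholder."""
    return EMAIL.sub("[EMAIL]", text)


def tokenize_emails(text, salt):
    """Replace each address with a salted hash so equal inputs map to equal tokens."""
    def _token(match):
        digest = hashlib.sha256((salt + match.group(0)).encode()).hexdigest()
        return f"tok_{digest[:12]}"
    return EMAIL.sub(_token, text)


record = "Contact ana@example.com about the refund"
print(mask_emails(record))  # -> Contact [EMAIL] about the refund
print(tokenize_emails(record, salt="demo-salt"))
```

Masking removes the value entirely, while tokenizing keeps records joinable on the token without exposing the original address.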

27
MCQhard

A company needs to maintain an audit trail of model changes for compliance. Multiple teams will be updating models. What is the best approach to track who created, modified, or deployed each model version?

A.Enable Cloud Storage audit logs and require all model files to be stored in a bucket
B.Use Cloud Logging to collect logs from all services and search for model names
C.Use Vertex AI Experiments and Metadata to track model lineage and audit logs
D.Ask team members to maintain a shared spreadsheet of changes
AnswerC

Vertex AI provides built-in audit capabilities with user attribution and metadata.

Why this answer

Option C is correct because Vertex AI automatically records actions (including user identity) in Cloud Audit Logs and tracks lineage in ML Metadata. Option A is wrong because Cloud Storage logs only show object-level access, not model-specific actions. Option D is wrong because manual logging is error-prone.

Option B is wrong because Cloud Logging alone does not correlate events to model versions.

28
Multi-Selectmedium

A team of data scientists and ML engineers is collaborating on a shared feature store in Vertex AI Feature Store. They need to ensure that feature definitions are versioned and that changes are reviewed before being used in production pipelines. Which TWO practices should they implement?

Select 2 answers
A.Allow data scientists to edit feature definitions directly in the Vertex AI Feature Store console.
B.Require code reviews for all changes to feature definitions before merging to the main branch.
C.Define multiple feature views in Vertex AI Feature Store for different environments and manage access via IAM.
D.Store feature definition code in a version-controlled repository such as Cloud Source Repositories.
E.Use scheduled batch jobs to synchronize feature definitions from a shared spreadsheet to Vertex AI Feature Store.
AnswersB, D

Code reviews ensure quality and approval.

Why this answer

Option B is correct because requiring code reviews for all changes to feature definitions before merging to the main branch enforces a peer-review gate, ensuring that modifications are validated for correctness, consistency, and compliance before they reach production. This aligns with MLOps best practices for governance and reduces the risk of introducing errors or breaking changes into the feature store.

Exam trap

Google Cloud often tests the distinction between environment isolation (IAM and multiple feature views) and the actual versioning/review process, leading candidates to mistakenly select Option C as a versioning practice when it only addresses access control and environment separation.

29
MCQhard

A company has multiple business units using the same Vertex AI environment. They need to enforce that models deployed to production have passed a validation pipeline, and only the ML Engineering team can deploy to production. Which IAM configuration should they use?

A.Use Vertex AI Workbench with user-managed notebooks.
B.Use custom roles with permissions to deploy models, and use Cloud Audit Logs to monitor deployments.
C.Use Binary Authorization to ensure models are signed.
D.Use organization policies to restrict deployment to specific locations.
E.Use Vertex AI Model Registry with automated deployment via Cloud Build, and restrict those permissions to the ML Engineering team using IAM conditions.
AnswerE

This ensures only approved pipelines trigger deployment and only authorized team can initiate.

Why this answer

Option E is correct because it combines Vertex AI Model Registry (which enforces that only validated models are promoted to production) with Cloud Build for automated deployment, and uses IAM conditions to restrict deployment permissions exclusively to the ML Engineering team. This ensures that models must pass the validation pipeline before deployment, and only authorized personnel can trigger the deployment process.

Exam trap

The trap here is that candidates may confuse monitoring (Audit Logs) or location restrictions (Organization Policies) with enforcing a validation pipeline and team-specific deployment permissions, missing the need for a model registry and automated deployment with IAM conditions.

How to eliminate wrong answers

Option A is wrong because Vertex AI Workbench with user-managed notebooks is a development environment for building and training models, not a mechanism for enforcing deployment validation or restricting deployment permissions. Option B is wrong because custom roles with deployment permissions and Cloud Audit Logs only provide monitoring and access control, but do not enforce that models have passed a validation pipeline before deployment. Option C is wrong because Binary Authorization is designed for container image signing and attestation, not for validating ML model pipelines or restricting deployment to specific teams.

Option D is wrong because organization policies can restrict deployment to specific locations (e.g., regions), but they do not enforce model validation or restrict deployment permissions to a specific team.

30
MCQeasy

A team is using Vertex AI Pipelines to automate their ML workflow. They want to ensure that pipeline runs are reproducible and that artifacts are tracked. Which feature should they use?

A.Vertex AI Feature Store
B.Vertex AI Experiments
C.Vertex AI Model Registry
D.Vertex AI Endpoints
AnswerB

Experiments track parameters, metrics, and artifacts for each run.

Why this answer

Vertex AI Experiments is the correct feature because it captures parameters, metrics, and artifacts for each pipeline run, enabling reproducibility and lineage tracking. This directly supports the team's need to ensure runs are reproducible and artifacts are tracked, as Experiments automatically logs metadata for every execution.

Exam trap

The trap here is that candidates confuse artifact tracking with model management or deployment features, leading them to select Model Registry or Endpoints instead of recognizing that Experiments provides the run-level metadata and lineage required for reproducibility.

How to eliminate wrong answers

Option A is wrong because Vertex AI Feature Store is designed for managing and serving feature data for ML models, not for tracking pipeline runs or artifacts. Option C is wrong because Vertex AI Model Registry focuses on managing model versions and deployment, not on capturing run-level metadata or artifact lineage. Option D is wrong because Vertex AI Endpoints are for deploying models to serve predictions, not for tracking reproducibility or artifacts in pipeline runs.

31
Multi-Selecteasy

Which TWO of the following are best practices for versioning ML models and datasets?

Select 2 answers
A.Use Vertex AI Model Registry for model versioning and lineage tracking.
B.Use semantic versioning for datasets.
C.Store datasets and models in the same Cloud Storage bucket with version prefixes.
D.Use Git LFS for dataset versioning.
E.Use Cloud Data Catalog to tag dataset versions.
AnswersA, B

Model Registry is designed for model versioning and captures lineage.

Why this answer

Vertex AI Model Registry is a managed service that automatically tracks model versions, artifacts, and lineage metadata (e.g., training runs, evaluation metrics, and source datasets). It provides a centralized hub for model governance, enabling reproducibility and auditability without manual versioning overhead. This makes it a best practice for versioning ML models in a production MLOps workflow.

Exam trap

Google Cloud often tests the misconception that storing artifacts in the same bucket with version prefixes is sufficient for versioning, when in fact it lacks lineage tracking, automated metadata, and governance controls that dedicated registries and versioning schemes provide.

32
MCQhard

A company uses Vertex AI Experiments to track ML training runs. They want to enforce that all training runs use only approved libraries from a central Artifact Registry to ensure compliance. Which approach should they take?

A.Use a startup script in the training VM to install libraries from Artifact Registry.
B.Use Vertex AI Pipelines with a component that pulls libraries from Artifact Registry at runtime.
C.Create a custom Vertex AI training container that installs libraries from Artifact Registry at build time and restrict training job submission to that container using IAM.
D.Configure Vertex AI Training with a custom job configuration that specifies the library sources.
E.Use Cloud Build to build the training image with approved libraries and push to Container Registry, then restrict training jobs to that image.
AnswerC

This encapsulates libraries in the container and controls usage.

Why this answer

Option C is correct because it enforces compliance at the image level: by building a custom container that installs only approved libraries from Artifact Registry at build time, and then restricting training job submission to that specific container using IAM, you ensure that no unauthorized libraries can be introduced at runtime. This approach eliminates the risk of developers injecting unapproved dependencies via startup scripts or runtime pulls, and it aligns with the principle of immutable infrastructure for ML training.

Exam trap

The trap here is that candidates confuse runtime library installation (options A, B, D) with build-time image hardening (option C), overlooking that only a pre-built, IAM-restricted container can truly prevent unauthorized dependencies from being loaded during training.

How to eliminate wrong answers

Option A is wrong because a startup script runs after the VM starts, allowing users to modify or override the library list at runtime, which does not enforce compliance. Option B is wrong because pulling libraries from Artifact Registry at runtime still permits dynamic changes to dependencies, and the pipeline component itself could be modified to pull from other sources. Option D is wrong because a custom job configuration only specifies library sources as metadata; it does not prevent the training job from installing additional or different libraries during execution.

Option E is wrong because it pushes the image to Container Registry (now deprecated in favor of Artifact Registry) and does not restrict training jobs to that image via IAM—any user with permissions could submit a job using a different image.
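
As a complement to the IAM restriction, a CI step could reject job submissions that reference anything other than the approved image. The sketch below is one hypothetical way to express that check; the repository prefix is invented, and requiring a digest-pinned reference is an added assumption, not something the question states.

```python
# Hypothetical allowlist of Artifact Registry repository prefixes.
APPROVED_REPOSITORIES = (
    "us-central1-docker.pkg.dev/example-project/approved-training/",
)


def is_approved_image(image_uri):
    """Accept only digest-pinned images from an approved repository."""
    in_approved_repo = image_uri.startswith(APPROVED_REPOSITORIES)
    # A digest reference cannot be silently re-pointed the way a tag can.
    pinned_by_digest = "@sha256:" in image_uri
    return in_approved_repo and pinned_by_digest


good = ("us-central1-docker.pkg.dev/example-project/approved-training/"
        "trainer@sha256:" + "0" * 64)
print(is_approved_image(good))                         # -> True
print(is_approved_image("docker.io/someone/trainer"))  # -> False
```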

33
MCQhard

A financial services company uses Vertex AI Pipelines to train and deploy models for fraud detection. The ML team consists of data scientists who develop models and ML engineers who deploy them. They use a CI/CD pipeline with Cloud Build to build and push Docker images to Artifact Registry, then trigger Vertex AI Pipelines. Recently, the team noticed that a model deployed to production was trained on a dataset that had not been approved by the data governance team. Upon investigation, they found that a data scientist accidentally used an unapproved version of the training data by specifying a Cloud Storage path that was not the latest approved dataset. The company needs to enforce that only approved datasets are used in training jobs. Which approach should they take?

A.Implement a manual approval process where data scientists request dataset paths from the data governance team before each training run.
B.After training, run a validation step that checks if the dataset used matches the latest approved version, and roll back if not.
C.Use a curated dataset registry in BigQuery or Cloud Storage with IAM conditions that allow access only to datasets tagged as 'approved'. Modify the CI/CD pipeline to pass only approved dataset references to the training job.
D.Restrict all Cloud Storage buckets to be read-only for the data scientists, and have ML engineers copy approved datasets to a separate bucket.
AnswerC

This automates governance by restricting training to approved datasets via IAM and pipeline configuration.

Why this answer

Option C is correct because it enforces governance at the source by using IAM conditions to restrict access to only approved datasets, preventing unauthorized data from being used in training. This approach integrates with the CI/CD pipeline to automatically pass only approved dataset references, eliminating the risk of human error in specifying Cloud Storage paths.

Exam trap

Google Cloud often tests the distinction between reactive validation (Option B) and proactive enforcement (Option C), where candidates mistakenly choose a post-training check that wastes resources instead of a preventive IAM-based control.

How to eliminate wrong answers

Option A is wrong because a manual approval process is error-prone, slow, and does not scale; it relies on human compliance rather than automated enforcement, leaving the system vulnerable to accidental misuse. Option B is wrong because it is reactive—it detects the issue after training has already occurred, wasting compute resources and potentially exposing the model to unapproved data before rollback. Option D is wrong because it restricts data scientists' access entirely, which hinders their ability to experiment and develop models; it also shifts the burden to ML engineers without addressing the root cause of dataset version control.
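
The "pass only approved dataset references" step can be sketched as a resolver that the CI/CD pipeline calls instead of accepting a raw Cloud Storage path. The registry contents, bucket names, and function names below are hypothetical; in practice the registry would live in BigQuery or Cloud Storage and be protected by IAM.

```python
class UnapprovedDatasetError(ValueError):
    """Raised when a requested dataset or version is not in the registry."""


# Hypothetical registry maintained by the data governance team.
APPROVED_DATASETS = {
    "fraud-transactions": {
        "latest": "v3",
        "versions": {
            "v2": "gs://example-approved-data/fraud-transactions/v2/",
            "v3": "gs://example-approved-data/fraud-transactions/v3/",
        },
    },
}


def resolve_dataset(name, version=None):
    """Return the URI of an approved dataset version, or raise."""
    entry = APPROVED_DATASETS.get(name)
    if entry is None:
        raise UnapprovedDatasetError(f"dataset {name!r} is not approved")
    version = version or entry["latest"]
    uri = entry["versions"].get(version)
    if uri is None:
        raise UnapprovedDatasetError(f"{name!r} version {version!r} is not approved")
    return uri


print(resolve_dataset("fraud-transactions"))
# -> gs://example-approved-data/fraud-transactions/v3/
```

Because the training job receives only the resolved URI, a data scientist never types a path by hand, which removes the failure mode described in the scenario.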

34
MCQmedium

A company uses Vertex AI Pipelines for model training and deployment. The pipeline includes a model evaluation step that produces metrics. If the metrics are below a threshold, the pipeline should fail and not deploy. Which component should they use?

A.Use a conditional operator in the pipeline to skip or fail based on metrics.
B.A Python component that uses the SDK to raise an exception if metrics are low.
C.A Vertex AI Model Evaluation component configured with a threshold.
D.Use Cloud Monitoring to trigger an alert and manually stop deployment.
E.A custom container that returns a non-zero exit code on failure.
AnswerA

Conditionals are the standard way to control pipeline flow based on data.

Why this answer

Option A is correct because Vertex AI Pipelines supports conditional execution within the pipeline DAG through the Kubeflow Pipelines SDK (`dsl.Condition`, or `dsl.If`/`dsl.Else` in newer SDK releases). This allows you to compare model metrics (e.g., accuracy, AUC) against a defined threshold and either skip the deployment step or route execution to a branch that fails the run. This is the native, declarative way to control pipeline flow based on evaluation results without relying on external services or manual intervention.

Exam trap

The trap here is that candidates confuse raising an exception in a component (Option B) with pipeline-level conditional control, not realizing that Vertex AI Pipelines provides explicit conditional operators for this exact purpose, which keep the gate visible in the pipeline graph rather than buried in component code.

How to eliminate wrong answers

Option B is wrong because raising an exception inside a Python component fails that task, but it hides the gating logic inside component code instead of expressing it in the pipeline graph, and its effect on the run depends on the pipeline's retry and error-handling configuration; it bypasses the pipeline's built-in conditional logic. Option C is wrong because Vertex AI Model Evaluation component does not have a configurable threshold that automatically fails the pipeline; it only produces evaluation metrics, and the threshold logic must be implemented separately (e.g., via a conditional). Option D is wrong because Cloud Monitoring alerts are for observability and manual intervention, not for programmatically failing a pipeline; this approach introduces latency and human error, and does not integrate with Vertex AI Pipelines' native failure mechanisms.

Option E is wrong because a custom container returning a non-zero exit code will cause the pipeline step to fail, but it does not provide a way to conditionally fail based on metrics without additional logic inside the container; moreover, it is less maintainable and less transparent than using a built-in conditional operator.
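The gate in Option A comes down to comparing an evaluation metric with a threshold and letting that comparison decide whether the deploy step runs. The following is a minimal plain-Python sketch of that control flow, not KFP code: `evaluate`, `run_pipeline`, `MetricBelowThreshold`, and the 0.85 threshold are hypothetical names and values chosen for illustration. In an actual pipeline the same comparison would sit in a conditional block such as `dsl.Condition`.

```python
from typing import Dict, List

THRESHOLD = 0.85  # hypothetical minimum acceptable accuracy


class MetricBelowThreshold(Exception):
    """Raised when the evaluation metric does not clear the threshold."""


def evaluate(accuracy: float) -> Dict[str, float]:
    """Stand-in for an evaluation step that emits metrics."""
    return {"accuracy": accuracy}


def run_pipeline(
    accuracy: float,
    threshold: float = THRESHOLD,
    fail_if_low: bool = True,
) -> List[str]:
    """Return the steps that ran; deploy runs only if the gate passes."""
    executed = ["train", "evaluate"]
    metrics = evaluate(accuracy)
    if metrics["accuracy"] >= threshold:
        executed.append("deploy")  # gate passed: deployment branch runs
    elif fail_if_low:
        # Gate failed: stop the run instead of silently skipping deploy.
        raise MetricBelowThreshold(
            f"accuracy {metrics['accuracy']:.2f} < threshold {threshold:.2f}"
        )
    return executed
```

With `fail_if_low=False` the sketch models the "skip" behavior, and with the default it models the "fail" behavior the question asks for.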

35
Multi-Selecthard

A machine learning team is deploying a model for real-time predictions using Vertex AI. They need to ensure that the deployment follows best practices for collaboration and governance. Which TWO actions should they take?

Select 2 answers
A.Use a continuous integration/continuous deployment (CI/CD) pipeline to deploy model versions.
B.Store all model artifacts in a local file system to reduce latency.
C.Enable model monitoring to detect data drift and performance degradation.
D.Manually configure autoscaling parameters for the endpoint.
E.Allow any team member to deploy directly to production without review.
AnswersA, C

CI/CD ensures consistent, repeatable deployments.

Why this answer

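Option A is correct because using a CI/CD pipeline for deploying model versions ensures automated, repeatable, and auditable deployments, which is a best practice for collaboration and governance. This approach enforces version control, testing, and approval gates, reducing the risk of errors and enabling rollback if needed. Option C is also correct because model monitoring detects data drift and performance degradation after deployment, giving the whole team an automated, shared signal for when a deployed model needs review or retraining.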

Exam trap

Google Cloud often tests the misconception that local storage or manual configuration is acceptable for governance, when in fact centralized artifact storage and automated scaling are required for collaboration and reliability.

36
MCQmedium

A data science team uses a shared Cloud Storage bucket to store training datasets. They notice that some team members accidentally overwrite existing datasets, causing issues with reproducibility. Which approach best prevents accidental overwrites while maintaining collaboration?

A.Use a single shared service account with strict IAM roles that allow only append operations.
B.Require team members to manually rename files before uploading.
C.Set bucket permissions to read-only for all team members except the data owner.
D.Enable object versioning on the bucket and use lifecycle rules to manage versions.
AnswerD

Versioning allows recovery of previous versions if overwritten.

Why this answer

Option D is correct because enabling object versioning on a Cloud Storage bucket preserves all versions of an object, so even if a team member overwrites a dataset, the previous version remains accessible. This maintains collaboration (anyone can upload) while preventing permanent data loss. Lifecycle rules can then be used to manage storage costs by automatically deleting old versions after a specified period.

Exam trap

The trap here is that candidates may think IAM roles or permissions are the only way to control data integrity, overlooking that object versioning provides a safety net without blocking collaboration.

How to eliminate wrong answers

Option A is wrong because Cloud Storage does not support 'append-only' IAM roles; objects are immutable and must be rewritten entirely, so this approach would not prevent overwrites and would break normal upload workflows. Option B is wrong because relying on manual renaming is error-prone and does not enforce any technical control, so accidental overwrites can still occur. Option C is wrong because making the bucket read-only for most team members prevents them from uploading new datasets at all, which destroys collaboration and is overly restrictive.
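To make the versioning behavior concrete, here is a toy in-memory model of a bucket with object versioning enabled. It is only a sketch: `VersionedBucket` and its methods are hypothetical, and the generation numbers are simple counters rather than the values Cloud Storage assigns. The `apply_lifecycle` method mimics a lifecycle rule based on the `numNewerVersions` condition.

```python
from typing import Dict, List, Optional, Tuple


class VersionedBucket:
    """Toy model: every upload adds a version instead of replacing data."""

    def __init__(self) -> None:
        # object name -> list of (generation, data), oldest first
        self._objects: Dict[str, List[Tuple[int, bytes]]] = {}
        self._next_generation = 1

    def upload(self, name: str, data: bytes) -> int:
        """Store a new live version; earlier versions become noncurrent."""
        generation = self._next_generation
        self._next_generation += 1
        self._objects.setdefault(name, []).append((generation, data))
        return generation

    def read(self, name: str, generation: Optional[int] = None) -> bytes:
        """Read the live version, or a specific earlier generation."""
        versions = self._objects[name]
        if generation is None:
            return versions[-1][1]
        for gen, data in versions:
            if gen == generation:
                return data
        raise KeyError(f"{name}#{generation} not found")

    def apply_lifecycle(self, num_newer_versions: int) -> None:
        """Delete versions that have at least this many newer versions."""
        if num_newer_versions < 1:
            raise ValueError("num_newer_versions must be at least 1")
        for name, versions in self._objects.items():
            self._objects[name] = versions[-num_newer_versions:]
```

An accidental overwrite of `train.csv` therefore leaves the earlier generation readable until a lifecycle rule removes it.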

37
MCQmedium

A data science team is collaborating on a project to build a churn prediction model. They use Vertex AI Workbench instances for development. Each data scientist has their own instance with a persistent disk. They share code via a GitHub repository. They want to ensure that the model training is reproducible across different team members' environments. Currently, they manually install Python packages in their instances, and they have noticed that the model metrics differ slightly between runs on different instances. Which of the following is the best action to ensure reproducibility?

A.Standardize the instance machine type and ensure all have the same number of CPUs.
B.Use Cloud Functions to run the training code instead.
C.Use Vertex AI Experiments with a fixed environment by specifying a prebuilt container.
D.Create a custom Docker image with all dependencies and use it in Vertex AI Training jobs.
E.Ask all team members to use the same Python virtual environment and install packages from a requirements.txt file.
AnswerC

Experiments track parameters and metrics while ensuring a consistent environment.

Why this answer

Option C is correct because Vertex AI Experiments with a prebuilt container ensures a fixed, reproducible environment by pinning the exact OS, Python version, and all dependencies. This eliminates the variability introduced by manual package installations and differing instance configurations, directly addressing the team's issue of inconsistent model metrics across runs.

Exam trap

Google Cloud often tests the distinction between environment reproducibility (which requires fixed software stacks) and hardware consistency (which is less critical for deterministic training), leading candidates to mistakenly choose hardware standardization (Option A) or manual dependency management (Option E).

How to eliminate wrong answers

Option A is wrong because standardizing machine type and CPU count does not control for differences in Python package versions or system libraries, which are the primary cause of metric discrepancies. Option B is wrong because Cloud Functions are designed for event-driven, stateless workloads and are not suitable for long-running model training jobs; they also do not inherently enforce a fixed environment. Option D is wrong because while a custom Docker image is a valid approach, it is not the best action here because Vertex AI Experiments with a prebuilt container provides a simpler, managed solution that automatically tracks experiments and environments without requiring the team to build and maintain custom images.

Option E is wrong because manually using a requirements.txt file and virtual environments is error-prone and does not guarantee identical system-level dependencies or Python interpreter versions across different instances, leading to subtle reproducibility issues.
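One way to see why manually installed environments drift is to fingerprint them. The sketch below, using only the standard library, hashes the Python version together with the installed package versions so that two instances can be compared; `installed_packages`, `environment_fingerprint`, and `current_fingerprint` are hypothetical helper names, and this check is an illustration rather than a Vertex AI feature.

```python
import hashlib
import platform
from importlib import metadata
from typing import Dict


def installed_packages() -> Dict[str, str]:
    """Map each installed distribution name (lowercased) to its version."""
    packages: Dict[str, str] = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            packages[name.lower()] = dist.version
    return packages


def environment_fingerprint(python_version: str, packages: Dict[str, str]) -> str:
    """Hash the interpreter version plus sorted name==version pins."""
    lines = [f"python=={python_version}"]
    lines += [f"{name}=={version}" for name, version in sorted(packages.items())]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def current_fingerprint() -> str:
    """Fingerprint of the environment this code is running in."""
    return environment_fingerprint(platform.python_version(), installed_packages())
```

If two team members get different values from `current_fingerprint()`, their environments differ in at least one package or in the interpreter version, which a fixed container image avoids by construction.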

38
Drag & Dropmedium

Drag and drop the steps to perform a hyperparameter tuning job on Vertex AI in the correct order.

Why this order

Define the search space, then create and run the tuning job, monitor, and select the best parameters.
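The four stages in this order can be sketched with a small random search in plain Python. This is an assumption-laden illustration, not the Vertex AI tuning API: `toy_objective`, `sample`, `tune`, and the parameter ranges are all hypothetical, and random search stands in for whatever search algorithm the managed service applies.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
SearchSpace = Dict[str, Tuple[float, float]]

# 1. Define the search space (hypothetical ranges).
SPACE: SearchSpace = {"learning_rate": (0.001, 0.5), "dropout": (0.0, 0.6)}


def toy_objective(params: Params) -> float:
    """Hypothetical loss with its minimum at lr=0.1, dropout=0.3."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["dropout"] - 0.3) ** 2


def sample(space: SearchSpace, rng: random.Random) -> Params:
    """Draw one parameter set uniformly from the search space."""
    return {name: rng.uniform(low, high) for name, (low, high) in space.items()}


def tune(
    objective: Callable[[Params], float],
    space: SearchSpace,
    max_trials: int,
    seed: int = 0,
) -> Tuple[Params, float, List[Tuple[float, Params]]]:
    rng = random.Random(seed)
    trials: List[Tuple[float, Params]] = []
    # 2. Create and run the tuning job: one trial per sampled parameter set.
    for trial_id in range(max_trials):
        params = sample(space, rng)
        loss = objective(params)
        trials.append((loss, params))
        # 3. Monitor: report each trial as it completes.
        print(f"trial {trial_id}: loss={loss:.4f}")
    # 4. Select the best parameters (lowest loss).
    best_loss, best_params = min(trials, key=lambda trial: trial[0])
    return best_params, best_loss, trials
```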

39
MCQhard

Refer to the exhibit. A team uses this Cloud Build configuration to deploy a model to a Vertex AI endpoint. The build succeeds up to the 'upload' step, but the 'deploy-model' step fails with an error that the model 'my-model' does not exist. What is the most likely cause?

A.The deploy step uses the display name instead of the model resource ID
B.The model was not uploaded because the artifact URI is a directory, not a valid SavedModel
C.The Vertex AI API was not enabled for the project
D.The region in the deploy step does not match the model's region
AnswerB

The artifact URI must point to the directory that directly contains the model artifacts (for example, saved_model.pb), not an arbitrary parent directory.

Why this answer

The 'deploy-model' step fails because the model was not successfully uploaded. The 'upload' step registers the model with Vertex AI, which for a TensorFlow model expects the artifact URI to point at a valid SavedModel directory (one containing a saved_model.pb file and a variables subdirectory). If the artifact URI points to a directory that is not a valid SavedModel, the upload may appear to succeed but does not register a usable model resource, causing the subsequent deploy step to fail with 'model does not exist'.

Exam trap

Google Cloud often tests the distinction between a successful upload step and a valid model registration, trapping candidates who assume any directory upload creates a usable model resource.

How to eliminate wrong answers

Option A is wrong because the deploy step uses the model resource ID, not the display name; the error message explicitly says 'my-model' does not exist, indicating the model resource was never created. Option C is wrong because if the Vertex AI API were not enabled, the build would fail at the 'upload' step or earlier with an API enablement error, not specifically at the deploy step. Option D is wrong because region mismatch would cause a different error (e.g., 'model not found in region') or a permission error, but the error message states the model does not exist, implying it was never registered in any region.
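A pre-upload check can catch this class of failure early. The sketch below inspects a local export directory for the SavedModel layout described above before it is copied to Cloud Storage; `saved_model_problems` is a hypothetical helper, and checking the local copy rather than the `gs://` path is an assumption made to keep the example dependency-free.

```python
import tempfile
from pathlib import Path
from typing import List


def saved_model_problems(model_dir: Path) -> List[str]:
    """Return a list of layout problems; an empty list means the layout looks valid."""
    if not model_dir.is_dir():
        return [f"{model_dir} is not a directory"]
    problems: List[str] = []
    graph_files = ("saved_model.pb", "saved_model.pbtxt")
    if not any((model_dir / name).is_file() for name in graph_files):
        problems.append("missing saved_model.pb (or saved_model.pbtxt)")
    if not (model_dir / "variables").is_dir():
        problems.append("missing variables/ subdirectory")
    return problems


# Demo: a parent directory fails the check, the export subdirectory passes.
with tempfile.TemporaryDirectory() as tmp:
    parent = Path(tmp) / "models"
    export = parent / "1"
    (export / "variables").mkdir(parents=True)
    (export / "saved_model.pb").write_bytes(b"")  # placeholder file for the demo
    print(saved_model_problems(parent))  # two problems reported
    print(saved_model_problems(export))  # []
```

A build step could run a check like this and stop before the upload when the list is non-empty.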

40
MCQhard

A team is building a CI/CD pipeline for ML using Cloud Build. The pipeline trains a model and deploys it to Vertex AI. Recently, a change in the data processing step caused the model to be trained with a different data version, leading to a failed deployment because the model was invalid. How should the team prevent this in the future?

A.Add a manual review step before training
B.Pin all library versions in the Docker image
C.Use a data versioning tool (e.g., DVC) to track datasets and ensure the pipeline always uses the correct version
D.Schedule a cron job to check for data changes
AnswerC

Data versioning ensures reproducibility and consistency across pipeline runs.

Why this answer

Option C is correct because the root cause is a data version mismatch, not a code or environment issue. A data versioning tool like DVC (Data Version Control) tracks dataset versions via hash-based pointers in Git, ensuring the pipeline retrieves the exact dataset version used during training. This prevents silent failures when data processing steps change the data schema or content, which library pinning or manual reviews cannot guarantee.

Exam trap

The trap here is that candidates confuse environment reproducibility (pinning libraries) with data reproducibility, assuming that locking code dependencies is sufficient to prevent model failures caused by data drift or version changes.

How to eliminate wrong answers

Option A is wrong because a manual review step before training introduces human latency and does not enforce data version consistency; it relies on a person to catch a version mismatch that may not be visually obvious. Option B is wrong because pinning library versions in the Docker image addresses dependency drift in code, not data versioning; the model failed due to a different data version, not a library incompatibility. Option D is wrong because scheduling a cron job to check for data changes is reactive and does not prevent the pipeline from using the wrong data version; it only alerts after the fact, and the pipeline would still train on incorrect data.
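The idea behind hash-based data versioning can be illustrated with a few lines of standard-library Python: record a content hash for the dataset next to the code, then refuse to train when the data on disk no longer matches. This is a sketch in the spirit of such tools, not DVC's actual implementation; `file_sha256`, `verify_dataset`, and `DataVersionMismatch` are hypothetical names, and SHA-256 is chosen here only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


class DataVersionMismatch(Exception):
    """Raised when a dataset's content hash differs from the pinned hash."""


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dataset(path: Path, pinned_sha256: str) -> None:
    """Fail fast if the dataset content is not the pinned version."""
    actual = file_sha256(path)
    if actual != pinned_sha256:
        raise DataVersionMismatch(
            f"{path.name}: expected {pinned_sha256[:12]}, got {actual[:12]}"
        )


# Demo: the check passes for the pinned data and blocks a silent change.
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "train.csv"
    dataset.write_text("id,churned\n1,0\n2,1\n")
    pinned = file_sha256(dataset)  # would be committed alongside the code
    verify_dataset(dataset, pinned)

    dataset.write_text("id,churned\n1,0\n2,0\n")  # upstream processing changed
    try:
        verify_dataset(dataset, pinned)
    except DataVersionMismatch as err:
        print(f"training blocked: {err}")
```

Running the verification as the first pipeline step turns a silent data change into an explicit, early failure.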

41
MCQhard

A Vertex AI pipeline is triggered from Cloud Build using the configuration above. The pipeline fails with an error: 'Unable to submit build: The source code is not available.' What is the most likely cause?

A.The Docker build step failed silently due to a missing dependency.
B.The 'gcloud builds submit' command does not have access to the source code in the Cloud Build environment.
C.The Docker image tag does not include a hash, causing the push to fail.
D.The Cloud Build service account lacks permission to access the Vertex AI Pipeline API.
AnswerB

The source code must be provided or referenced explicitly; using 'gcloud builds submit' in a step requires the source to be available via a trigger or artifact.

Why this answer

The error 'Unable to submit build: The source code is not available' indicates that the Cloud Build environment cannot locate the source code when the 'gcloud builds submit' command is executed. This typically happens when the pipeline is triggered from Cloud Build but the source code is not properly staged or accessible in the build context, often because the build configuration does not include the source directory or the source is not uploaded to Cloud Storage. Option B correctly identifies that the command lacks access to the source code in the Cloud Build environment.

Exam trap

Google Cloud often tests the distinction between source code availability errors and permission or build failures, leading candidates to mistakenly attribute the error to service account permissions or Docker issues when the root cause is a missing or misconfigured source path.

How to eliminate wrong answers

Option A is wrong because a silent Docker build step failure due to a missing dependency would produce a different error, such as 'Failed to build' or 'Docker build failed', not a source code unavailability error. Option C is wrong because the Docker image tag missing a hash would cause a push failure with an error like 'unauthorized' or 'tag invalid', not a source code availability issue. Option D is wrong because a permission issue with the Cloud Build service account accessing the Vertex AI Pipeline API would result in an authorization error (e.g., 'Permission denied'), not a source code unavailability error.

42
Multi-Selectmedium

Which TWO options are recommended practices for managing model versions across teams in Google Cloud?

Select 2 answers
A.Store all model files in a GitHub repository
B.Maintain a custom database to map model names to artifact locations
C.Use AI Platform (Unified) Models as the primary model registry
D.Use Vertex AI Model Registry to track model versions and their deployment history
E.Use Cloud Storage buckets with object versioning enabled to store model artifacts
AnswersD, E

Model Registry is the recommended service for managing model versions.

Why this answer

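Vertex AI Model Registry is the recommended service for managing model versions across teams because it provides a centralized repository to track model versions, their associated metadata, and deployment history. It integrates natively with Vertex AI endpoints and pipelines, enabling consistent governance and lineage tracking across the ML lifecycle. Cloud Storage buckets with object versioning (Option E) complement the registry by retaining earlier generations of the underlying model artifacts, so an overwritten artifact can still be recovered.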

Exam trap

Google Cloud often tests the distinction between legacy AI Platform (Unified) Models and the current Vertex AI Model Registry, expecting candidates to recognize that the registry is the recommended service for version management and deployment history, not just a generic model storage location.

43
Matchingmedium

Match each ML pipeline component to its description.

Concepts
Matches

Production ML pipeline framework by Google

ML toolkit for Kubernetes-based workflows

Unified stream and batch data processing service

Managed Apache Airflow workflow orchestration

Serverless ML pipeline orchestration on Vertex AI

Why these pairings

Pipeline components are key for MLOps in Google Cloud.

44
MCQeasy

A data science team is deploying a large NLP model to Vertex AI for real-time inference. They notice high latency per request. Which action should they take first to reduce latency?

A.Use Cloud Functions for inference.
B.Use model optimization techniques like quantization or pruning.
C.Use Vertex AI Model Optimization to quantize the model and deploy on a smaller machine.
D.Enable autoscaling and set min replicas to 5.
E.Implement batch prediction instead of online prediction.
AnswerC

Quantization reduces model size and latency directly.

Why this answer

Option C is correct because it directly addresses the root cause of high latency in real-time inference: model size and compute requirements. Vertex AI Model Optimization applies quantization or pruning to reduce the model's memory footprint and computational cost, allowing it to run on a smaller, faster machine (e.g., fewer vCPUs or less GPU memory) while maintaining acceptable accuracy. This is the first step recommended by Google Cloud best practices for latency-sensitive deployments, as it reduces per-request processing time without requiring architectural changes.

Exam trap

Google Cloud often tests the misconception that scaling out (autoscaling) or switching to batch processing is the first step to reduce latency, when in fact model optimization and hardware matching are the primary levers for per-request performance in real-time inference.

How to eliminate wrong answers

Option A is wrong because Cloud Functions are stateless, short-lived compute units with capped execution timeouts (9 minutes for 1st gen functions) and limited GPU support, making them unsuitable for hosting large NLP models for real-time inference; they introduce cold-start latency and lack the persistent infrastructure needed for model serving. Option B is wrong because it suggests using model optimization techniques like quantization or pruning but omits the critical step of deploying on a smaller machine; without adjusting the underlying hardware, the latency reduction from optimization alone may be insufficient, and the question asks for the first action to take. Option D is wrong because enabling autoscaling with a minimum of 5 replicas increases resource availability but does not reduce per-request latency; it may even increase cost and complexity without addressing the model's inference speed.

Option E is wrong because batch prediction is designed for asynchronous, high-throughput processing of large datasets, not for real-time inference; it introduces higher latency per request due to queuing and batching overhead, making it counterproductive for reducing latency in a real-time scenario.
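To show what quantization does to a model's weights, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It illustrates the general technique only and makes no claim about how any Vertex AI tooling implements it; `quantize_int8`, `dequantize`, and the sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the int8 representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical layer weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Each restored value differs from the original by at most half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, f"scale={scale:.4f}", f"max_error={max_error:.4f}")
```

Each weight then needs one byte instead of the four bytes of a 32-bit float, at the cost of a rounding error bounded by half the scale, which is the trade-off between footprint and accuracy the explanation refers to.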

45
MCQeasy

A team wants to ensure that only approved models are deployed to production. Which Vertex AI feature should they use?

A.Vertex AI Experiments.
B.Cloud DLP.
C.Vertex AI Pipelines.
D.Vertex AI Feature Store.
E.Vertex AI Model Registry with versioning and alias.
AnswerE

Model Registry provides version control and alias-based deployment gates.

Why this answer

Vertex AI Model Registry with versioning and alias (Option E) is the correct feature because it allows teams to manage model lifecycle, track approved versions, and assign aliases (e.g., 'champion' or 'production') to designate which model is approved for deployment. This ensures only vetted models are promoted to production, aligning with governance and compliance requirements.

Exam trap

Google Cloud often tests the distinction between model tracking (Experiments) and model governance (Registry), so the trap here is assuming that any 'management' feature (like Pipelines or Experiments) can enforce deployment approvals, when only the Registry with aliases provides explicit version control and approval semantics.

How to eliminate wrong answers

Option A is wrong because Vertex AI Experiments is designed for tracking and comparing ML training runs, not for managing model deployment approvals. Option B is wrong because Cloud DLP (Data Loss Prevention) is a service for inspecting and masking sensitive data, not for model governance or deployment control. Option C is wrong because Vertex AI Pipelines orchestrates ML workflows (e.g., training, evaluation) but does not inherently enforce approval gates for production deployment.

Option D is wrong because Vertex AI Feature Store is used for storing, serving, and sharing feature data, not for model versioning or deployment approval.
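The alias-based approval pattern can be sketched with a toy registry in plain Python. This is not the Vertex AI SDK: `ModelRegistry`, `deploy_to_production`, and the `production` alias are hypothetical, and the check inside `deploy_to_production` is an assumption about how a team's deployment script might enforce the convention, since the registry itself only records versions and aliases.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ModelRegistry:
    """Toy registry: version id -> set of aliases attached to that version."""

    versions: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, version_id: str) -> None:
        self.versions.setdefault(version_id, set())

    def set_alias(self, alias: str, version_id: str) -> None:
        """Point an alias at one version, removing it from any other."""
        if version_id not in self.versions:
            raise KeyError(f"unknown version {version_id}")
        for aliases in self.versions.values():
            aliases.discard(alias)
        self.versions[version_id].add(alias)

    def resolve(self, alias: str) -> str:
        """Return the version id that currently carries the alias."""
        for version_id, aliases in self.versions.items():
            if alias in aliases:
                return version_id
        raise LookupError(f"no version has alias {alias!r}")


def deploy_to_production(
    registry: ModelRegistry, version_id: str, approved_alias: str = "production"
) -> str:
    """Deploy only the version that reviewers marked with the approved alias."""
    if registry.resolve(approved_alias) != version_id:
        raise PermissionError(f"version {version_id} is not approved")
    return f"deployed version {version_id}"
```

Promotion then becomes a single reviewed action, moving the alias, rather than an ad hoc deployment of whichever version is newest.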
