CCNA Data For Ai Questions — Page 3 of 3

151

Multi-Select (hard)

A data scientist is using Einstein Discovery to analyze sales data. The model results show a high correlation between two predictor variables. Which TWO actions should the data scientist take?

Select 2 answers

A. Apply regularization.

B. Combine them into a single feature.

C. Include both to capture more information.

D. Increase the sample size.

E. Remove one of the correlated variables.

Answers: B, E

Combining the two predictors creates a single variable that captures their joint effect, and dropping one of them removes the redundant signal.

Why this answer

Removing one correlated variable or combining them reduces multicollinearity.


152

MCQ (hard)

A data scientist notices that a Salesforce Einstein model's performance degrades over time. The model was trained on data from the last year. What is the most likely cause?

A. Concept drift

B. Underfitting

C. Data leakage

D. Overfitting

Answer: A

Concept drift describes changes in the relationship between predictors and target over time, leading to performance decay.

Why this answer

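Option A is correct because concept drift occurs when the relationship between the predictors and the target changes over time, so a model trained on last year's data gradually loses accuracy. Overfitting and underfitting are training issues that do not by themselves worsen over time. Data leakage would inflate performance estimates during training rather than cause gradual decay in production.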
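Concept drift shows up as gradual decay. A common safeguard is to score the model on recent labeled outcomes in successive time windows and alert when accuracy falls well below the level measured at deployment. The minimal sketch below uses hypothetical window sizes and a hypothetical tolerance. It is not Salesforce's built-in model monitoring, and an alert marks drift as a likely cause rather than proving it.

```python
def windowed_accuracy(outcomes, window):
    """Accuracy for consecutive, non-overlapping windows of (predicted, actual) pairs.

    `outcomes` must be in time order; a trailing partial window is ignored.
    """
    scores = []
    for start in range(0, len(outcomes) - window + 1, window):
        chunk = outcomes[start:start + window]
        scores.append(sum(pred == actual for pred, actual in chunk) / window)
    return scores

def drift_alerts(scores, baseline, tolerance=0.15):
    """Indices of windows whose accuracy fell more than `tolerance` below `baseline`."""
    return [i for i, score in enumerate(scores) if score < baseline - tolerance]

# Hypothetical monthly outcomes: the model slowly loses touch with reality.
month1 = [(1, 1)] * 9 + [(1, 0)] * 1
month2 = [(1, 1)] * 8 + [(1, 0)] * 2
month3 = [(1, 1)] * 6 + [(1, 0)] * 4
scores = windowed_accuracy(month1 + month2 + month3, window=10)
print(scores)                              # [0.9, 0.8, 0.6]
print(drift_alerts(scores, baseline=0.9))  # [2]
```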


153

MCQ (hard)

A data scientist discovers that an AI model used for loan approval predicts high default risk disproportionately for a specific demographic group. What is the first step to address this issue?

A. Use a different algorithm

B. Audit the training data for bias

C. Remove demographic features from the model

D. Retrain the model with more data

Answer: B

Auditing helps identify and mitigate bias in data.

Why this answer

Option B is correct because auditing the training data for bias helps identify whether the model learned biased patterns. Option D is wrong because retraining with more data may not solve the bias if the new data also contains bias. Option C is wrong because removing demographic features may not eliminate bias if other correlated (proxy) features remain.

Option A is wrong because changing the algorithm does not address biased data.
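A first-pass audit can be as simple as comparing two things for each demographic group: how many training records it has, and how often it carries the positive (default) label. The sketch below assumes records are dicts with hypothetical `group` and `defaulted` keys. A real audit would also check proxy features and how the labels were produced.

```python
from collections import defaultdict

def audit_by_group(records, group_key, label_key):
    """Map each group to (record count, share of records with a positive label)."""
    counts = defaultdict(int)
    positives = defaultdict(int)
    for record in records:
        group = record[group_key]
        counts[group] += 1
        positives[group] += int(bool(record[label_key]))
    return {group: (counts[group], positives[group] / counts[group]) for group in counts}

def rate_ratio(report):
    """Lowest positive-label rate divided by the highest; far below 1 signals a gap to investigate."""
    rates = [rate for _, rate in report.values()]
    if max(rates) == 0:
        return 1.0  # no group has any positives, so there is no gap to measure
    return min(rates) / max(rates)

# Hypothetical training records.
training = (
    [{"group": "A", "defaulted": d} for d in (1, 0, 0, 0)]
    + [{"group": "B", "defaulted": d} for d in (1, 1, 1, 0)]
)
report = audit_by_group(training, "group", "defaulted")
print(report)                        # {'A': (4, 0.25), 'B': (4, 0.75)}
print(round(rate_ratio(report), 2))  # 0.33
```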


154

MCQ (easy)

A retail company has implemented a Salesforce AI lead scoring model to prioritize high-value customers. After three months, the model's AUC-ROC score is only 0.55, indicating poor performance. The data scientist reviews the training data and finds that 20% of the records are exact duplicates due to multiple data imports from different sources. The duplicates have inconsistent target labels (some labeled 'converted', others 'not converted'). What should the data scientist do to improve model performance?

A. Downsample duplicates to reduce their impact but keep all records.

B. Use the duplicates as a separate class to indicate noisy data.

C. Remove all duplicate records and keep only one instance per duplicate group, resolving label conflicts by majority vote.

D. Keep all duplicates because they represent multiple interactions; increase model complexity to handle them.

Answer: C

This cleans the data, removes noise, and provides consistent labels, likely improving model performance.

Why this answer

Duplicate records with conflicting labels confuse the model. Removing duplicates and resolving label conflicts (e.g., by majority vote) is the most effective step to clean the data and improve performance.
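The cleanup in option C takes only a few lines. Group records by their feature values, keep one row per group, and assign the group's most common label. The field names are hypothetical. Ties go to the label seen first, which is an assumption; tied groups may deserve manual review.

```python
from collections import Counter, defaultdict

def dedupe_majority(records, feature_keys, label_key):
    """Collapse exact duplicates (same feature values) into one row with the majority label."""
    labels_by_key = defaultdict(list)
    for record in records:
        key = tuple(record[k] for k in feature_keys)
        labels_by_key[key].append(record[label_key])
    cleaned = []
    for key, labels in labels_by_key.items():
        # most_common orders equal counts by first appearance, so ties go to the earliest label.
        label, _ = Counter(labels).most_common(1)[0]
        row = dict(zip(feature_keys, key))
        row[label_key] = label
        cleaned.append(row)
    return cleaned

# Hypothetical leads imported three times from different sources.
leads = [
    {"email": "ana@example.com", "source": "Web", "status": "converted"},
    {"email": "ana@example.com", "source": "Web", "status": "converted"},
    {"email": "ana@example.com", "source": "Web", "status": "not converted"},
    {"email": "raj@example.com", "source": "Event", "status": "not converted"},
]
for row in dedupe_majority(leads, ["email", "source"], "status"):
    print(row)
# {'email': 'ana@example.com', 'source': 'Web', 'status': 'converted'}
# {'email': 'raj@example.com', 'source': 'Event', 'status': 'not converted'}
```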


155

MCQ (easy)

A retail company uses Salesforce Data Cloud to power Einstein AI for personalized product recommendations. They have integrated customer data from multiple sources: ERP (order history), marketing automation (email engagement), and web analytics (browsing behavior). The data model includes a unified Customer__dlm object with fields: Age__c, TotalSpend__c, LastPurchaseDate__c, EmailEngagementScore__c, and WebSessionCount__c. The AI model is configured to predict "LikelyToPurchaseNextWeek__c" (Boolean). The data team has noticed that the predictions are less accurate for new customers (those with less than 30 days of data). The model was trained on all customer data without any filtering. The team wants to improve model performance without increasing training frequency. What should they do?

A. Increase the training window from Last_90_Days to Last_180_Days to include more data.

B. Include a new feature such as 'DaysSinceFirstPurchase__c' to capture customer maturity.

C. Change the model from classification to regression to predict a probability instead of binary.

D. Exclude new customers from the training set to focus on established customers.

Answer: B

This feature directly differentiates new and established customers, helping the model adapt.

Why this answer

Option B is correct because adding a feature that captures customer tenure (e.g., days since first purchase) lets the model learn patterns specific to new vs. established customers, improving accuracy. Option A is incorrect because increasing the training window does not address the missing tenure information. Option D is incorrect because excluding new customers would bias the model and ignore a valuable segment.

Option C is incorrect because changing to regression would not solve the underlying issue and changes the problem scope.
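Deriving the tenure feature from option B is a single date calculation. This sketch assumes purchase dates are available per customer. It uses a fixed `as_of` date so that training and scoring compute tenure the same way. The function name is hypothetical; in practice the field would be built inside the platform rather than in Python.

```python
from datetime import date

def days_since_first_purchase(purchase_dates, as_of):
    """Customer tenure in days from the earliest purchase to `as_of`.

    Returns None for customers with no purchases, so the model sees
    'no history' rather than a misleading zero.
    """
    if not purchase_dates:
        return None
    return (as_of - min(purchase_dates)).days

as_of = date(2024, 6, 30)
print(days_since_first_purchase([date(2024, 6, 20), date(2024, 6, 25)], as_of))  # 10
print(days_since_first_purchase([date(2023, 6, 30), date(2024, 1, 15)], as_of))  # 366
print(days_since_first_purchase([], as_of))                                      # None
```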


156

MCQ (medium)

Refer to the exhibit. A data scientist sees this error when training an Einstein Discovery model for customer churn prediction. What is the most likely reason for the error?

A. The field count (8) exceeds the maximum of 5 allowed fields.

B. The positive examples (180) are insufficient for the number of fields (8).

C. The model name contains a version number, which is not allowed.

D. The dataset has too few records (3200) for 8 fields.

Answer: B

50 positive examples per field × 8 fields = 400 required; 180 falls below that threshold.

Why this answer

Einstein Discovery requires at least 50 positive examples per predictor field. With 8 fields, at least 400 positive examples are needed. Only 180 were provided, causing the error.
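The arithmetic behind this check is shown below. It takes the explanation's rule of 50 positive examples per predictor field as given; that rule has not been verified here against Salesforce documentation. The sketch also shows the two ways out: collect more positives, or use fewer fields.

```python
def required_positives(num_fields, per_field=50):
    """Minimum positive examples for a given number of predictor fields."""
    return num_fields * per_field

def max_supported_fields(positives, per_field=50):
    """Largest number of predictor fields a given count of positives can support."""
    return positives // per_field

print(required_positives(8))          # 400
print(180 >= required_positives(8))   # False
print(max_supported_fields(180))      # 3
```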


157

MCQ (medium)

Refer to the exhibit. A data access policy is defined for a customer data set. Which statement best describes this policy?

A. All users can read customer data.

B. Only users in the EU region can read customer data, with emails and phones partially masked.

C. All users can read customer data but only those in the EU can see unmasked emails.

D. Only users in the EU region can read customer data, with emails and phones fully hidden.

Answer: B

The condition 'region eq EU' limits access, and masking method 'partial' obscures part of the data.

Why this answer

Option B is correct because the policy shown in the exhibit applies a row-level security filter that restricts access to customer data based on the user's region, and a column-level masking rule that partially masks email and phone fields. Only users whose region attribute is set to 'EU' can view the rows, and even then, the email and phone columns are obfuscated (e.g., j***@example.com, ***-***-1234). This matches the behavior of a data masking policy combined with a regional access filter, which is a common pattern in data platforms like Snowflake or AWS Lake Formation.
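The behavior described above combines a row-level region filter with partial masking of email and phone. The exhibit is not reproduced here, so the field names, the masking formats (taken from the examples in the explanation), and the `apply_policy` function are all hypothetical. Data platforms typically express such policies declaratively rather than in application code.

```python
def mask_email(email):
    """Keep the first character of the local part and the domain: jane@example.com -> j***@example.com."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"

def mask_phone(phone):
    """Keep only the last four digits: 555-867-1234 -> ***-***-1234."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return f"***-***-{digits[-4:]}"

def apply_policy(user_region, rows):
    """Row filter (only EU users see rows) plus column masking for every user who does."""
    if user_region != "EU":
        return []
    return [
        {**row, "email": mask_email(row["email"]), "phone": mask_phone(row["phone"])}
        for row in rows
    ]

customers = [{"name": "Jane", "email": "jane@example.com", "phone": "555-867-1234"}]
print(apply_policy("US", customers))
# []
print(apply_policy("EU", customers))
# [{'name': 'Jane', 'email': 'j***@example.com', 'phone': '***-***-1234'}]
```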

Exam trap

Salesforce often tests the distinction between 'partial masking' and 'full hiding' in data access policies, where candidates mistakenly assume that a regional restriction implies full obfuscation rather than a controlled partial reveal.

How to eliminate wrong answers

Option A is wrong because the policy does not grant read access to all users; it explicitly restricts access to only users in the EU region. Option C is wrong because it claims all users can read customer data, which contradicts the regional filter, and it also states that EU users can see unmasked emails, whereas the policy partially masks emails and phones for all viewers. Option D is wrong because it says emails and phones are fully hidden, but the policy applies partial masking (e.g., showing a portion of the data), not complete suppression.


158

MCQ (easy)

A non-profit organization uses Data Cloud to manage donor data from multiple sources (email campaigns, event attendance, donations). They want to use an AI model to predict future donations. The data scientist says the model needs a unified view of each donor with consistent fields. What is the first step the data architect should take in Data Cloud to enable this?

A. Define a Data Model that maps each source's fields to a common donor object

B. Create a Data Stream for each source (email, events, donations)

C. Set up Data Actions to clean data at each source

D. Immediately start training the AI model on raw data from the streams

Answer: A

Provides the unified schema required for the AI model.

Why this answer

Option A is the correct first step. Defining a Data Model that maps fields from each source to a common donor object ensures all data conforms to a single schema, which is the foundation for a unified view. Option B, creating Data Streams, is also needed, but streams only ingest each source separately; without the mapping they do not produce a unified donor view.

Option D is premature without unified data. Option C is not the first step; cleaning data at the source is good practice but does not create the unified donor view in Data Cloud.
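Conceptually, the Data Model step is a set of per-source field mappings onto one shared donor schema. In Data Cloud this mapping is configured in the platform, not written as code. The sketch below only illustrates the idea, and every source field name in it is hypothetical. It also assumes email alone identifies a donor, which is a simplification of real identity resolution.

```python
# Hypothetical source fields mapped onto a common donor schema.
FIELD_MAPS = {
    "email_campaigns": {"SubscriberEmail": "email", "FullName": "name"},
    "events": {"AttendeeEmail": "email", "AttendeeName": "name"},
    "donations": {"DonorEmail": "email", "DonorName": "name", "GiftAmount": "amount"},
}

def to_common_schema(source, record):
    """Rename a source record's fields to the shared donor field names."""
    mapping = FIELD_MAPS[source]
    return {target: record[field] for field, target in mapping.items() if field in record}

def unified_donors(records_by_source):
    """Build one row per donor (keyed by lowercase email) from all sources."""
    donors = {}
    for source, records in records_by_source.items():
        for record in records:
            row = to_common_schema(source, record)
            key = row["email"].lower()
            donor = donors.setdefault(
                key, {"email": key, "name": row.get("name"), "sources": [], "total_given": 0}
            )
            donor["sources"].append(source)
            donor["total_given"] += row.get("amount", 0)
    return donors

donors = unified_donors({
    "email_campaigns": [{"SubscriberEmail": "Ana@Example.org", "FullName": "Ana Diaz"}],
    "events": [{"AttendeeEmail": "ana@example.org", "AttendeeName": "Ana Diaz"}],
    "donations": [{"DonorEmail": "ana@example.org", "DonorName": "Ana Diaz", "GiftAmount": 50}],
})
print(donors["ana@example.org"])
# {'email': 'ana@example.org', 'name': 'Ana Diaz', 'sources': ['email_campaigns', 'events', 'donations'], 'total_given': 50}
```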


159

MCQ (medium)

A company is preparing data for Einstein Prediction Builder to forecast lead conversion. They have historical data with fields like Lead Source, Industry, Number of Employees, and Converted (boolean). Which data preparation step is most critical?

A. Mix data from all lead sources without normalization

B. Ensure data completeness by handling missing values in Lead Source

C. Use only the last 3 months of data for training

D. Remove all records with outliers in Number of Employees

Answer: B

Completeness is a key data quality dimension; missing values in a predictor reduce model reliability.

Why this answer

Handling missing values in Lead Source is critical because Einstein Prediction Builder requires complete, high-quality data to train accurate predictive models. Missing categorical fields like Lead Source can introduce bias or cause the model to ignore important patterns in lead conversion. Ensuring data completeness through imputation or removal of incomplete records is a standard data preparation step for AI/ML in Salesforce.
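The sketch below shows two common ways to fill a missing categorical value like Lead Source. The first uses an explicit "Unknown" category, which preserves the fact that the value was missing. The second uses the most frequent value (the mode). The function and record layout are hypothetical, and the right strategy depends on why the values are missing.

```python
from collections import Counter

def fill_missing_category(records, field, strategy="unknown"):
    """Return copies of `records` with empty values of `field` filled in.

    strategy="unknown" fills with the literal "Unknown"; strategy="mode"
    fills with the most common non-empty value (requires at least one).
    """
    def is_missing(value):
        return value is None or value == ""

    if strategy == "mode":
        present = [r[field] for r in records if not is_missing(r.get(field))]
        fill = Counter(present).most_common(1)[0][0]
    else:
        fill = "Unknown"
    return [
        {**r, field: fill} if is_missing(r.get(field)) else dict(r)
        for r in records
    ]

leads = [
    {"LeadSource": "Web", "Converted": True},
    {"LeadSource": None, "Converted": False},
    {"LeadSource": "Web", "Converted": False},
    {"LeadSource": "", "Converted": True},
]
print([r["LeadSource"] for r in fill_missing_category(leads, "LeadSource")])
# ['Web', 'Unknown', 'Web', 'Unknown']
print([r["LeadSource"] for r in fill_missing_category(leads, "LeadSource", strategy="mode")])
# ['Web', 'Web', 'Web', 'Web']
```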

Exam trap

Salesforce often tests the misconception that more data or aggressive cleaning (like removing outliers or using only recent data) always improves AI model accuracy, when in fact data completeness and representative sampling are more critical for supervised learning tasks like lead conversion prediction.

How to eliminate wrong answers

Option A is wrong because mixing data from all lead sources without normalization can introduce scale differences and skew model predictions; normalization is often required for numerical features, but the key issue here is that mixing without handling categorical consistency (like Lead Source) can degrade model performance. Option C is wrong because using only the last 3 months of data may not capture seasonal trends or sufficient historical patterns, leading to overfitting or poor generalization; Einstein Prediction Builder benefits from a broader historical window (e.g., 12-24 months) to learn conversion patterns. Option D is wrong because removing all records with outliers in Number of Employees can discard valuable data points that represent legitimate business segments (e.g., large enterprises) and reduce model robustness; outlier treatment should be context-aware, not automatic removal.


160

MCQ (easy)

Which Salesforce feature automatically flags data quality issues before training an AI model?

A. Flow

B. Validation Rules

C. Duplicate Matching

D. Data Manager (Data Prep Assistant)

Answer: D

Data Manager scans datasets for missing values, outliers, and other issues before modeling.

Why this answer

Option D is correct because Data Manager (Data Prep Assistant) in Salesforce provides data quality checks and recommendations. Validation Rules enforce constraints at data-entry time, Duplicate Matching handles deduplication, and Flow is for process automation.
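To make "data quality checks" concrete, here is a minimal sketch of two checks a profiling tool commonly runs on a numeric field. The first is the share of missing values; the second flags outliers with the 1.5 × IQR rule. This is a generic illustration, not the actual logic Data Manager uses, and the field and values are hypothetical.

```python
from statistics import quantiles

def profile_numeric(values):
    """Missing-value rate and 1.5*IQR outliers for one numeric field (None = missing).

    Needs at least two non-missing values for the quartile calculation.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)  # default "exclusive" method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "missing_rate": (len(values) - len(present)) / len(values),
        "outliers": [v for v in present if v < low or v > high],
    }

employees = [10, 12, 11, 13, None, 12, 5000, 14, None, 11]
print(profile_numeric(employees))
# {'missing_rate': 0.2, 'outliers': [5000]}
```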


161

MCQ (hard)

Refer to the exhibit. A data analyst has defined this field mapping for Einstein Prediction Builder. Which data issue would most likely arise from this mapping?

A. The 'LeadSource' field should be mapped to 'Text' instead of 'Category' to preserve verbatim values

B. The 'Amount' field should be mapped to 'Category' to discretize the values

C. The 'Id' field should be excluded as it can cause data leakage and overfitting

D. The 'CloseDate' field should be mapped to 'Text' to avoid date parsing issues

Answer: C

Unique identifiers act as keys and should not be used as predictors.

Why this answer

Option C is correct because including the 'Id' field in Einstein Prediction Builder can cause data leakage and overfitting. The 'Id' field is a unique identifier that has no predictive value for the target outcome, but the model could learn to memorize specific records based on it, leading to poor generalization on unseen data.
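The memorization problem can be shown with a deliberately degenerate "model" that learns nothing but a lookup from record Id to label. It scores perfectly on the records it was trained on and falls back to a fixed guess on any new Id. The Ids and labels below are hypothetical.

```python
def train_on_id(rows):
    """'Train' by memorizing each record's label under its unique Id."""
    return {row["Id"]: row["Converted"] for row in rows}

def predict(model, row, default=False):
    """Look up the memorized label; unseen Ids get the default guess."""
    return model.get(row["Id"], default)

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the actual label."""
    return sum(predict(model, row) == row["Converted"] for row in rows) / len(rows)

train = [{"Id": "L1", "Converted": True}, {"Id": "L2", "Converted": False},
         {"Id": "L3", "Converted": True}, {"Id": "L4", "Converted": False}]
new = [{"Id": "L5", "Converted": True}, {"Id": "L6", "Converted": True},
       {"Id": "L7", "Converted": False}, {"Id": "L8", "Converted": True}]

model = train_on_id(train)
print(accuracy(model, train))  # 1.0
print(accuracy(model, new))    # 0.25
```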

Exam trap

Salesforce often tests the concept of data leakage by including a seemingly harmless field like 'Id', tricking candidates into thinking all fields should be mapped, when in fact unique identifiers must be excluded to prevent overfitting.

How to eliminate wrong answers

Option A is wrong because 'LeadSource' is a categorical field with a limited set of values (e.g., 'Web', 'Phone'), so mapping it to 'Category' is appropriate; mapping it to 'Text' would treat each unique value as a separate token, which is not suitable for prediction models. Option B is wrong because 'Amount' is a continuous numeric field, and mapping it to 'Category' would discretize it, losing granularity and potentially reducing model accuracy; it should remain as a numeric field. Option D is wrong because 'CloseDate' is a date field, and mapping it to 'Text' would prevent proper date parsing and feature extraction (e.g., day of week, month); Einstein Prediction Builder handles date fields natively for time-based features.


162

MCQ (medium)

A team has limited labeled data for a Salesforce predictive model but wants to leverage a pre-trained model from a related task. Which machine learning approach should they use?

A. Transfer learning

B. Unsupervised learning

C. Supervised learning

D. Reinforcement learning

Answer: A

Transfer learning adapts a pre-trained model to a new task with limited data.

Why this answer

Option A is correct because transfer learning starts from a pre-trained model and fine-tunes it with limited labeled data. Option B is wrong because unsupervised learning does not use labels. Option C is wrong because supervised learning from scratch typically needs a large labeled dataset, which the team lacks.

Option D is wrong because reinforcement learning is for sequential decision-making, not classification.


163

Multi-Select (medium)

Which TWO actions are required to prepare data for an Einstein Discovery model?

Select 2 answers

A. Remove all records with missing values in any field.

B. Select exactly 10 predictor fields manually.

C. Ensure the data is stored in a Salesforce object or a connected data source.

D. Create a separate dataset for training and validation.

E. Define the outcome field that the model will predict.

Answers: C, E

Einstein Discovery requires data accessible via Salesforce.

Why this answer

Options C and E are correct. C: The data must be in a supported Salesforce object (standard or custom) or a connected data source. E: The outcome field must be specified.

Option A is not required; Einstein handles missing values. Option B is not required; predictor fields can be auto-selected. Option D is not required; Einstein Discovery performs its own model validation, so a separate validation dataset is not needed.
