Salesforce AI Associate: Questions 226–300

506 questions total · 7 pages · All types, answers revealed

226

MCQ · medium

A retail company wants to increase average order value by showing personalized product recommendations on their website. They currently use Salesforce Commerce Cloud and have Einstein Recommendations enabled. However, they notice that recommendations are not reflecting recent customer interactions, such as items added to cart but not purchased. What should the administrator do to improve recommendation relevance?

A.Reset the Einstein Recommendations model and retrain from scratch.

B.Implement an Einstein Bot to directly ask customers about their preferences.

C.Disable Einstein Recommendations for email and ads, leaving only web recommendations.

D.Enable the 'Add to Cart' and 'Checkout' events in the Einstein Recommendations data integration to capture real-time data.

Answer: D

Capturing real-time events allows the model to incorporate recent cart activity.

Why this answer

Option D is correct because Einstein Recommendations relies on data integration events to capture real-time customer behavior. By enabling 'Add to Cart' and 'Checkout' events, the system can ingest recent interactions (e.g., items added to cart but not purchased) and adjust recommendations accordingly, improving relevance without requiring a full model reset.

Exam trap

Salesforce often tests the misconception that resetting the model (Option A) is the default fix for stale recommendations, when in fact the root cause is almost always missing or misconfigured data integration events.

How to eliminate wrong answers

Option A is wrong because resetting and retraining the model from scratch would discard all historical learning and does not address the missing real-time event data; it is an overreaction that would degrade performance temporarily. Option B is wrong because an Einstein Bot is designed for conversational interactions, not for passively capturing behavioral events like add-to-cart; it would add unnecessary complexity and user friction without solving the data integration gap. Option C is wrong because disabling recommendations for email and ads does not affect the capture of real-time events on the website; the issue is about data ingestion, not channel distribution.

227

MCQ · easy

A marketing manager uses Einstein recommendations on their website, but customers are receiving suggestions for products they already purchased. What is the most likely cause?

A.The product catalog is not updated with purchase history.

B.The model is using real-time browsing data that includes past purchases.

C.The recommendation model is not filtering out previously purchased items.

D.The recommendations are based on collaborative filtering without personalization.

Answer: C

Einstein recommendations can exclude purchased items.

Why this answer

C is correct because the most likely cause is that the recommendation model is not configured to exclude previously purchased items. Einstein Recommendations uses customer purchase history to personalize suggestions, but if the model's filtering logic does not explicitly remove items the customer has already bought, those items will continue to appear in the recommendations. This is a common oversight in model configuration rather than a data sync or algorithm type issue.

Exam trap

Salesforce often tests the distinction between data source issues (e.g., catalog not updated) versus model configuration issues (e.g., missing filters), and the trap here is assuming the problem is a data sync failure when it is actually a missing business rule in the recommendation logic.

How to eliminate wrong answers

Option A is wrong because the product catalog typically contains product metadata (e.g., name, price, category), not individual customer purchase history; purchase history is stored separately in order or transaction objects. Option B is wrong because real-time browsing data includes current session behavior, not past purchases; past purchases are historical data, not real-time. Option D is wrong because collaborative filtering inherently personalizes recommendations based on user-item interactions; the issue is not the algorithm type but the lack of a filter to exclude purchased items.

228

MCQ · medium

A multinational corporation uses Salesforce AI to analyze customer feedback across multiple languages. They have 10,000 English reviews, 2,000 Spanish reviews, and 500 French reviews. The sentiment model performs well on English (F1=0.85) but poorly on French (F1=0.40). The data scientist wants to improve French sentiment performance without collecting new data. What should they do?

A.Translate all French reviews to English and train only on English data.

B.Use a multilingual pre-trained model without any additional French data.

C.Remove French data and use only English and Spanish to avoid imbalance.

D.Apply data augmentation to the French reviews using back-translation (translate to another language and back) to create more training examples.

Answer: D

Back-translation generates realistic paraphrases, augmenting the French dataset and improving model performance.

Why this answer

Data augmentation techniques like back-translation generate synthetic French samples, effectively increasing the minority language's representation and helping the model learn better.
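
A minimal sketch of back-translation as an augmentation step follows. The `translate` callable is a hypothetical stand-in for whatever machine-translation service is available, and `toy_translate` is only a lookup table so that the example runs; neither is a Salesforce API.

```python
from typing import Callable, List

# Hypothetical signature: translate(text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def back_translate(texts: List[str], translate: Translator,
                   source: str = "fr", pivot: str = "en") -> List[str]:
    """Return paraphrases produced by translating source -> pivot -> source.

    Round trips identical to the original are dropped because they add
    no new training signal.
    """
    augmented = []
    for text in texts:
        pivot_text = translate(text, source, pivot)
        round_trip = translate(pivot_text, pivot, source)
        if round_trip != text:
            augmented.append(round_trip)
    return augmented


def toy_translate(text: str, src: str, dst: str) -> str:
    """Toy stand-in (assumption): a lookup table, not a real MT model."""
    table = {
        ("fr", "en"): {"tres bon produit": "very good product"},
        ("en", "fr"): {"very good product": "produit excellent"},
    }
    return table.get((src, dst), {}).get(text, text)


reviews = ["tres bon produit", "livraison lente"]
extra = back_translate(reviews, toy_translate)
training_set = reviews + extra
```

Each paraphrase would normally inherit the sentiment label of the review it was generated from, since back-translation is meant to change the wording and not the meaning.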

229

Multi-Select · medium

Which TWO actions are essential for ensuring transparency in an AI system? (Choose two.)

Select 2 answers

A.Hide the model's internal logic to protect intellectual property

B.Log all AI decisions and allow audit

C.Train the model on the largest dataset available

D.Provide clear explanations for AI decisions

E.Obtain consent from all data subjects

Answers: B, D

Auditability is essential for transparency.

Why this answer

Options B and D are correct because logging decisions for audit and providing clear explanations are both key to transparency. Option A is wrong because hiding the model's internal logic reduces transparency. Option C is wrong because training on the largest available dataset may embed biases and does not directly relate to transparency. Option E is wrong because consent is about privacy, not transparency.

230

Multi-Select · medium

Which THREE factors should be considered when selecting features for a predictive model in Salesforce?

Select 3 answers

A.Volume of data available for each feature

B.Correlation between features to avoid multicollinearity

C.Relevance of the feature to the target variable

D.Compliance with data privacy regulations

E.Business interpretability of the feature

Answers: B, C, E

Multicollinearity can harm model stability.

Why this answer

Option B is correct because multicollinearity occurs when two or more features are highly correlated, which can destabilize model coefficients and reduce interpretability. In Salesforce's predictive models, such as those built with Einstein Discovery, correlated features can inflate variance and lead to unreliable predictions. Avoiding multicollinearity ensures that the model's feature importance estimates are trustworthy and that the model generalizes well to new data.

Exam trap

Salesforce often tests the distinction between feature selection criteria (predictive power, correlation, interpretability) and broader data management concerns (privacy, volume), leading candidates to mistakenly include compliance or data volume as direct feature selection factors.
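
One common way to screen for multicollinearity is to compute pairwise Pearson correlations and flag pairs above a cutoff. The sketch below is illustrative only: the feature names, the values, and the 0.9 threshold are all assumptions, and it does not reproduce how Einstein Discovery handles correlated fields.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical candidate features (column name -> values per record).
features = {
    "annual_revenue": [100.0, 200.0, 300.0, 400.0],
    "monthly_revenue": [8.0, 17.0, 25.0, 33.0],
    "support_cases": [5.0, 1.0, 4.0, 2.0],
}

THRESHOLD = 0.9  # assumed cutoff; choose one that suits the data
names = list(features)
correlated_pairs = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(pearson(features[a], features[b])) > THRESHOLD
]
```

Here only the two revenue columns are flagged, which suggests keeping one of them as a predictor.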

231

MCQ · easy

A company wants to use Einstein to predict the optimal discount amount for each deal. Which type of machine learning problem does this represent?

A.Reinforcement learning

B.Regression

C.Classification

D.Clustering

Answer: B

Regression predicts continuous numeric outcomes.

Why this answer

Predicting a continuous numerical value, such as the optimal discount amount for a deal, is a regression problem. In the context of Einstein, this would use a regression model to learn from historical deal data and output a specific discount percentage or dollar amount, rather than a category or cluster.

Exam trap

Salesforce often tests the distinction between regression and classification by presenting a scenario where the output is a number, leading candidates to mistakenly think it is classification because they associate 'prediction' with categories, but the key is whether the output is continuous or discrete.

How to eliminate wrong answers

Option A is wrong because reinforcement learning involves an agent learning to make sequences of decisions through trial and error to maximize a reward, not predicting a single continuous value like a discount amount. Option C is wrong because classification predicts discrete categories or labels (e.g., 'high discount' vs 'low discount'), not a continuous numerical output. Option D is wrong because clustering groups unlabeled data into clusters based on similarity, without a target variable to predict a specific discount amount.
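
To make the regression versus classification distinction concrete, the sketch below fits a one-variable least-squares line to made-up deal data and predicts a continuous discount. The data, the single predictor, and the 8% bucket boundary are assumptions for illustration; this is not how Einstein builds its models.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: deal size (in thousands) and discount given (%).
deal_size = [10.0, 20.0, 30.0, 40.0]
discount_pct = [5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(deal_size, discount_pct)

# Regression output: a continuous number for a new deal of size 30.
predicted = slope * 30.0 + intercept

# A classifier would instead output a discrete label such as this bucket.
bucket = "high" if predicted >= 8.0 else "low"
```

The regression result can take any value on a continuous scale, whereas the bucket can only ever be one of a fixed set of labels.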

232

MCQ · easy

A company is building a chatbot using Einstein Bot's AI capabilities. They want to train intent recognition using historical chat transcripts. The transcripts contain many typos (e.g., 'hellp' instead of 'help') and slang (e.g., 'gonna' instead of 'going to'). The initial model performs poorly, misclassifying many intents. What data cleaning step is most important?

A.Use a spell-checker only for words that appear infrequently.

B.Keep the raw text as is because it reflects real user behavior.

C.Normalize text by applying spell-correction and replacing slang with standard terms.

D.Remove all messages that contain typos or slang to clean the dataset.

Answer: C

Normalization reduces noise and variability, enabling the model to focus on meaningful patterns.

Why this answer

Normalizing text by correcting common typos and expanding slang reduces vocabulary sparsity and helps the model learn consistent word associations, improving intent recognition.
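
A minimal sketch of this kind of normalization is shown below. The two lookup tables are hypothetical and deliberately tiny; a real pipeline would likely use a spell-correction library and a much larger slang dictionary.

```python
import re

# Hypothetical lookup tables; real ones would be far larger.
TYPO_FIXES = {"hellp": "help", "pasword": "password"}
SLANG = {"gonna": "going to", "wanna": "want to"}


def normalize(message: str) -> str:
    """Lowercase, strip punctuation, fix known typos, expand known slang."""
    tokens = re.findall(r"[a-z0-9']+", message.lower())
    cleaned = []
    for token in tokens:
        token = TYPO_FIXES.get(token, token)     # spell-correction step
        cleaned.append(SLANG.get(token, token))  # slang-expansion step
    return " ".join(cleaned)


example = normalize("Hellp, I'm gonna reset my pasword!")
```

After normalization, "hellp" and "help" map to the same token, so the intent model sees one vocabulary entry instead of two.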

233

MCQ · easy

A machine learning team is preparing a dataset for a supervised learning task. They have 100,000 labeled samples. Which data preparation step is essential before splitting into train/test sets?

A.Normalize all features to the same scale.

B.Remove all outliers from the dataset.

C.Shuffle the dataset randomly.

D.Visualize the data distribution for each feature.

Answer: C

Shuffling prevents biased splits.

Why this answer

Option C is correct because shuffling the dataset randomly before splitting into train/test sets ensures that the data distribution is similar across both subsets. Without shuffling, the split might inadvertently follow the order in which the data was stored (e.g., records sorted by label or collected in batches), leading to biased model evaluation. This step is essential for supervised learning so that the test set is representative of the overall population. Time-ordered data is the usual exception: it is typically split chronologically rather than shuffled.

Exam trap

Salesforce often tests the misconception that normalization or outlier removal must be done before splitting, but the trap here is that candidates overlook the fundamental need to randomize the data order to avoid temporal or structural bias in the train/test split.

How to eliminate wrong answers

Option A is wrong because normalizing features to the same scale is a preprocessing step typically applied after splitting the data, using statistics (e.g., mean and standard deviation) computed only from the training set to avoid data leakage into the test set. Option B is wrong because removing all outliers before splitting can introduce bias and reduce the dataset's representativeness; outlier handling should be done with care, often after splitting, and may be domain-specific. Option D is wrong because visualizing data distributions is an exploratory step that helps understand the data but is not essential before splitting; it can be performed after splitting to avoid influencing the split decisions.
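
The ordering described above (shuffle, split, then scale with training-set statistics) can be sketched as follows. The data, the 80/20 split, and the seed are assumptions chosen for illustration.

```python
import random
from statistics import mean, pstdev


def shuffled_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows, then cut it into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


# Hypothetical data stored sorted by label: without shuffling, the last
# 20% of rows (the would-be test set) would contain only label 1.
data = [(float(i), 0) for i in range(50)] + [(float(i), 1) for i in range(50, 100)]
train, test = shuffled_split(data)

# Scaling statistics come from the training rows only, then are reused
# unchanged on the test rows.
train_values = [x for x, _ in train]
mu, sigma = mean(train_values), pstdev(train_values)
train_scaled = [((x - mu) / sigma, y) for x, y in train]
test_scaled = [((x - mu) / sigma, y) for x, y in test]
```

Computing `mu` and `sigma` after the split is what keeps information about the test rows out of the preprocessing step.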

234

MCQ · medium

Refer to the exhibit. What action should be taken?

A.Increase the accuracy threshold

B.Remove the model

C.Deploy as is

D.Retrain the model with balanced training data

Answer: D

Balancing data can reduce disparate impact.

Why this answer

Option D is correct because retraining with balanced data can help reduce disparate impact. Option C is wrong because deploying with a ratio of 0.6 is likely illegal and unethical. Option A is wrong because increasing the accuracy threshold does not address fairness. Option B is wrong because removing the model may be too drastic without attempting mitigation.
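
The exhibit is not reproduced here, so the sketch below uses made-up outcomes chosen to yield the 0.6 ratio mentioned above. It assumes the ratio is a disparate impact ratio (selection rate of one group divided by that of a reference group) and compares it with the commonly cited four-fifths (0.8) rule of thumb.

```python
def selection_rate(outcomes):
    """Share of favorable outcomes (1 = favorable, 0 = unfavorable)."""
    return sum(outcomes) / len(outcomes)


def disparate_impact_ratio(protected, reference):
    """Selection rate of the protected group relative to the reference group."""
    return selection_rate(protected) / selection_rate(reference)


# Hypothetical model decisions, not taken from the missing exhibit.
reference_group = [1] * 5 + [0] * 5   # 50% favorable
protected_group = [1] * 3 + [0] * 7   # 30% favorable

ratio = disparate_impact_ratio(protected_group, reference_group)

FOUR_FIFTHS = 0.8  # widely used rule-of-thumb threshold
needs_mitigation = ratio < FOUR_FIFTHS
```

A ratio well below 0.8 is usually read as a signal to mitigate (for example, by rebalancing the training data) before deployment.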

235

MCQ · easy

Which method is most suitable for ingesting streaming data from IoT sensors into a data lake?

A.Copying data via FTP.

B.Batch ingestion every 24 hours.

C.Manual upload via web interface.

D.Real-time streaming with Apache Kafka.

Answer: D

Kafka provides high-throughput, fault-tolerant streaming for IoT data.

Why this answer

Apache Kafka is the most suitable option because it is a distributed streaming platform designed for high-throughput, fault-tolerant, real-time data ingestion. IoT sensors generate continuous, high-velocity data streams, and Kafka's publish-subscribe model allows data to be ingested into a data lake with low latency, ensuring near-real-time availability for analytics.

Exam trap

Salesforce often tests the distinction between batch and real-time processing, and the trap here is that candidates may choose batch ingestion (Option B) thinking it is simpler or sufficient, overlooking the fundamental requirement for low-latency streaming in IoT sensor data ingestion.

How to eliminate wrong answers

Option A is wrong because FTP (File Transfer Protocol) is a batch-oriented file transfer protocol that lacks real-time streaming capabilities, introduces latency, and does not handle continuous data streams from IoT sensors efficiently. Option B is wrong because batch ingestion every 24 hours introduces unacceptable latency for streaming IoT data, which often requires immediate processing for time-sensitive applications like anomaly detection or predictive maintenance. Option C is wrong because manual upload via a web interface is impractical for high-frequency sensor data, as it requires human intervention, cannot scale, and introduces significant delays and errors.

236

MCQ · medium

A nonprofit organization uses Salesforce Nonprofit Cloud with Einstein Discovery to analyze donation patterns. They have activated a story that predicts which donors are most likely to churn (stop donating) in the next three months. The story shows a top influence called 'DonationFrequency' with a negative correlation: donors who donate less than once per quarter are 40% more likely to churn. The director of development wants to use this insight to create a retention campaign. However, the story also includes a field called 'LastDonationAmount' which has a small positive influence. The development team wants to ensure the predictions are actionable. What should the administrator do to maximize the effectiveness of the Einstein Discovery story for this retention campaign?

A.Retrain the prediction model using only 'DonationFrequency' and 'LastDonationAmount' as predictors.

B.Delete the 'LastDonationAmount' influence from the story to simplify the output.

C.Adjust the influence weight of 'DonationFrequency' to be higher in the story settings.

D.Create a segment of donors with low donation frequency and use that as the target for the retention campaign.

Answer: D

Focusing on the strongest actionable influence maximizes campaign impact.

Why this answer

Option D is correct because the most actionable insight from the Einstein Discovery story is the strong negative correlation of 'DonationFrequency' with churn. By creating a segment of donors with low donation frequency, the administrator can directly target the highest-risk group for a retention campaign, making the prediction actionable without altering the model or its output. This approach leverages the story's findings as-is, which is the intended use of Einstein Discovery insights.

Exam trap

Salesforce often tests the misconception that administrators can directly edit or retrain Einstein Discovery models to suit specific needs, when in fact the platform is designed to be used as-is, with actionable insights derived from segmenting the data rather than altering the model.

How to eliminate wrong answers

Option A is wrong because retraining the model with only two predictors removes other potentially valuable influences and violates the principle of using the model as generated by Einstein Discovery, which automatically selects the most predictive features. Option B is wrong because deleting an influence from the story does not change the underlying model; it only hides the field from the UI, and the prediction still uses 'LastDonationAmount' internally, so this does not make the output more actionable. Option C is wrong because Einstein Discovery does not allow manual adjustment of influence weights in story settings; the influence percentages are determined by the model's algorithm and cannot be overridden by an administrator.

237

MCQ · medium

A mid-size company uses Sales Cloud with Einstein Lead Scoring and Einstein Activity Capture. The sales team reports that lead scores are not updating for leads that have been engaged via email and calendar events over the past two weeks. The admin checks the Einstein Lead Scoring model and finds that the model status is 'Active' and was retrained last month. The admin also verifies that Einstein Activity Capture is enabled and syncing data correctly. However, the lead scores remain unchanged. Upon further investigation, the admin discovers that the leads were created before the Einstein Lead Scoring model was activated, and the model's training data includes only leads created after activation. The company has over 10,000 leads, but only 200 were created after activation. Historical conversion data for leads created before activation is not being used. What should the admin do to ensure lead scores reflect recent engagement?

A.Map the email and event fields to the lead object so that the model can use them

B.Add the Activity Count field to the scoring fields list in the model configuration

C.Re-enable Einstein Activity Capture to resync all historical emails and events

D.Retrain the Einstein Lead Scoring model using all historical lead data, including pre-activation leads

Answer: D

Retraining with a larger dataset improves the model's ability to score older leads.

Why this answer

Option D is correct because retraining the model with all historical lead data (including pre-activation leads) will include conversion patterns from a larger dataset, improving accuracy and enabling scores for older leads. Option A is wrong because field mapping alone does not cause scoring to update. Option B is wrong because the scoring fields are separate from activity tracking. Option C is wrong because Einstein Activity Capture is already syncing; the issue is with the model.

238

MCQ · hard

A company is developing an AI system to screen job applications. They want to ensure compliance with ethical AI standards and avoid discrimination. Which approach demonstrates the most robust ethical governance?

A.Only test the model for bias after receiving complaints from applicants

B.Rely solely on Salesforce's built-in fairness metrics to validate the model

C.Remove sensitive attributes from training data to ensure fairness

D.Implement an AI ethics board with cross-functional stakeholders, conduct bias testing before deployment, and establish ongoing monitoring

Answer: D

This provides a robust governance framework.

Why this answer

Option D (implement an AI ethics board with cross-functional stakeholders, conduct bias testing before deployment, and establish ongoing monitoring) is the most comprehensive. Option A (testing only after complaints) is reactive, not proactive. Option B (relying solely on Salesforce's built-in fairness metrics) is insufficient without organizational governance. Option C (removing sensitive attributes without testing for proxy variables) might miss subtle biases.

239

Multi-Select · easy

Which TWO statements are true about Einstein Prediction Builder in Salesforce? (Choose two.)

Select 2 answers

A.It is only available for lead scoring models.

B.It only supports predictions on the Opportunity object.

C.The prediction model automatically retrains every 24 hours.

D.It can use related object fields as predictors in the model.

E.It allows users to create custom predictions using fields from standard and custom objects.

Answers: D, E

Related object fields can be included as input features.

Why this answer

Option D is correct because Einstein Prediction Builder can include fields from related objects (e.g., child objects or lookup objects) as predictors in the model, enabling richer data inputs for predictions. This is achieved through the platform's ability to traverse relationships and aggregate data from related records, which significantly enhances model accuracy.

Exam trap

The trap here is that candidates often assume Einstein Prediction Builder is limited to lead scoring or a single object, but Salesforce designed it to be object-agnostic, and the automatic retraining interval is not 24 hours but rather triggered by data changes or a configurable schedule.

240

MCQ · hard

A service team uses Einstein Discovery to analyze customer churn. The story shows 'Average Resolution Time' is a key driver. What is the best action?

A.Reduce Average Resolution Time through process changes

B.Update account records via process builder

C.Configure a flow to send churn alerts

D.Create a custom report on churn

Answer: A

Directly addresses the driver identified by AI.

Why this answer

Einstein Discovery identifies drivers, so reducing resolution time directly addresses the root cause. A custom report only shows data, a flow only sends alerts, and Process Builder only updates records; none of them reduces resolution time.

241

MCQ · medium

A data scientist notices that the model accuracy drops significantly after retraining with new data. Upon inspection, they find that many records have missing values for a key feature. Which data quality improvement should be prioritized first?

A.Implement imputation for missing feature values.

B.Normalize the feature range.

C.Reduce the number of features.

D.Remove duplicate records.

Answer: A

Imputation addresses missing data, a common cause of accuracy drop.

Why this answer

The core issue is that missing values in a key feature introduce noise and bias, directly degrading model performance. Imputation (option A) is the most direct and impactful first step because it preserves the dataset size and feature set, allowing the model to learn from complete patterns. Without addressing missing data first, other quality improvements like normalization or feature reduction would be applied to corrupted data, failing to resolve the root cause.

Exam trap

Salesforce often tests the misconception that data quality improvements like normalization or feature reduction are universal fixes, when in this scenario the first priority is to handle the missing data, because it directly undermines model training and inference.

How to eliminate wrong answers

Option B is wrong because normalizing the feature range (e.g., scaling to 0-1) does not address missing values; it only adjusts the distribution of existing values, leaving the model to train on incomplete records. Option C is wrong because reducing the number of features may discard the key feature entirely, which could be critical for prediction, and does not fix the missing data problem in the remaining features. Option D is wrong because removing duplicate records addresses redundancy, not missing values; duplicates are not the cause of the accuracy drop, and removing them could even reduce valuable training data.
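
A minimal sketch of one simple imputation strategy, filling gaps with the median of the observed values, is shown below. The field name and records are hypothetical, and median imputation is only one of several options (mean, mode, or model-based imputation are alternatives).

```python
from statistics import median


def impute_median(rows, field):
    """Return copies of rows with None in `field` replaced by the median."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = median(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


# Hypothetical records with a gap in a key feature.
records = [
    {"tenure_months": 12},
    {"tenure_months": None},
    {"tenure_months": 30},
    {"tenure_months": 18},
]
completed = impute_median(records, "tenure_months")
```

All four records are kept, which is the point of imputing instead of dropping rows with missing values.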

242

Multi-Select · hard

Which THREE of the following are best practices for feature engineering in Einstein Studio?

Select 3 answers

A.Remove all records with missing values

B.Apply normalization to numerical features

C.Use raw data directly without any transformation

D.Use domain knowledge to create derived features

E.Use one-hot encoding for categorical variables

Answers: B, D, E

Normalization ensures features are on a similar scale.

Why this answer

Options B, D, and E are correct. Normalization scales features, domain knowledge creates meaningful derived features, and one-hot encoding handles categorical variables. Option A is wrong because removing records with missing values can lose information; imputation is often better. Option C is wrong because raw data often needs processing.
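
The three practices can be illustrated in plain Python as below. This is a generic sketch with made-up opportunity data; it does not use or represent the Einstein Studio interface.

```python
def min_max(values):
    """Scale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode categories as 0/1 columns, ordered alphabetically."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical opportunity data.
amounts = [100.0, 250.0, 400.0]
days_open = [10, 25, 20]
industry = ["retail", "health", "retail"]

scaled_amounts = min_max(amounts)          # normalization
industry_encoded = one_hot(industry)       # columns: health, retail
amount_per_day = [a / d for a, d in zip(amounts, days_open)]  # derived feature
```

The derived `amount_per_day` column is the kind of feature that comes from domain knowledge: neither raw column expresses deal velocity on its own.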

243

Multi-Select · easy

A company is ingesting data from multiple sources into Data Cloud for Einstein. Which THREE data preparation steps should be performed?

Select 3 answers

A.Normalization

B.Field mapping

C.Encryption

D.Data labeling

E.Deduplication

Answers: A, B, E

Ensures consistent data formats across sources.

Why this answer

Normalization is correct because Data Cloud requires data from multiple sources to be transformed into a consistent format, such as standardizing date formats, units, or naming conventions, to ensure the data can be unified and analyzed effectively. This step is critical for Einstein AI models to process data without inconsistencies that could skew predictions or insights.

Exam trap

Salesforce often tests the distinction between data preparation steps (normalization, field mapping, deduplication) and data security or ML-specific tasks (encryption, data labeling) to see if candidates confuse operational data engineering with security or model training processes.
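
The three preparation steps can be sketched generically as follows. The source field names, the date formats, and the choice of email as the deduplication key are assumptions for illustration; in Data Cloud these steps are configured in the product, not written by hand like this.

```python
from datetime import datetime

# Hypothetical mapping from source-specific field names to a shared schema.
FIELD_MAP = {
    "Email_Address": "email", "mail": "email",
    "SignupDate": "signup_date", "created": "signup_date",
}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed source formats


def map_fields(record):
    """Field mapping: rename source fields to the shared schema."""
    return {FIELD_MAP.get(k, k): v for k, v in record.items()}


def normalize(record):
    """Normalization: consistent email casing and ISO 8601 dates."""
    out = dict(record)
    out["email"] = out["email"].strip().lower()
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(out["signup_date"], fmt)
        except ValueError:
            continue
        out["signup_date"] = parsed.date().isoformat()
        break
    return out


def deduplicate(records, key="email"):
    """Deduplication: keep the first record seen for each key value."""
    seen = {}
    for record in records:
        seen.setdefault(record[key], record)
    return list(seen.values())


crm = [{"Email_Address": " Ana@Example.com ", "SignupDate": "2024-01-05"}]
web = [
    {"mail": "ana@example.com", "created": "01/05/2024"},
    {"mail": "bo@example.com", "created": "02/10/2024"},
]
unified = deduplicate([normalize(map_fields(r)) for r in crm + web])
```

The order matters: the two records for Ana only become recognizable as duplicates once their fields are mapped and their values normalized.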

244

MCQ · medium

An AI Associate deploys an Einstein Bot that uses sentiment analysis to escalate frustrated customers. After launch, the bot escalates disproportionately for non-native English speakers. What is the most likely cause?

A.The sentiment model was trained on a non-representative dataset.

B.The bot is routing to the wrong department.

C.The escalation threshold is set too low.

D.The bot is not properly connected to the escalation queue.

Answer: A

Training data lacking linguistic diversity causes biased sentiment detection.

Why this answer

Option A is correct because the sentiment analysis model likely exhibits bias due to training data that does not adequately represent the linguistic patterns, idioms, or expressions of non-native English speakers. This causes the model to misinterpret neutral or positive statements from these users as negative or frustrated, leading to disproportionate escalations. A non-representative dataset is a common source of algorithmic bias in AI systems.

Exam trap

Salesforce often tests the concept that bias in AI systems typically originates from the training data or model design, not from operational configuration issues like thresholds or routing, which are common distractors.

How to eliminate wrong answers

Option B is wrong because routing to the wrong department would cause misdirected escalations, not a disproportionate escalation rate for a specific demographic group. Option C is wrong because a low escalation threshold would increase escalations across all users uniformly, not selectively for non-native English speakers. Option D is wrong because a disconnected escalation queue would prevent any escalations from being processed, not cause selective over-escalation.

245

Multi-Select · easy

According to Salesforce's AI Trust Principles, which TWO practices are essential for ethical AI deployment?

Select 2 answers

A.Use AI to replace human jobs entirely.

B.Ensure models achieve 100% accuracy before deployment.

C.Fully automate decision-making without human review.

D.Hold the organization accountable for AI-driven outcomes.

E.Be transparent about how AI models are built and used.

Answers: D, E

Accountability ensures responsible use.

Why this answer

Option D is correct because Salesforce's AI Trust Principles emphasize organizational accountability for AI-driven outcomes, ensuring that the organization takes responsibility for the decisions and impacts of its AI systems. This principle aligns with ethical AI deployment by requiring governance, oversight, and mechanisms to address unintended consequences, rather than shifting blame to the technology itself.

Exam trap

The trap here is that candidates often confuse 'automation' with 'efficiency' and select Option C, overlooking that ethical AI frameworks like Salesforce's explicitly require human-in-the-loop review for high-stakes decisions.

246

MCQ · easy

A healthcare company uses AI to predict patient readmission rates. What is a critical ethical requirement?

A.Explanation of predictions to doctors

B.Low latency

C.High precision

D.Use of external data sources

Answer: A

Explainability ensures doctors can trust and act on predictions responsibly.

Why this answer

Option A is correct because doctors need explanations of predictions to make informed decisions and maintain accountability. Option B is wrong because low latency is a performance requirement, not an ethical one. Option C is wrong because, while precision is important, explanation is more critical for ethical use. Option D is wrong because using external data may introduce privacy risks.

247

Multi-Select · easy

A company is preparing customer data to train a custom AI model for sentiment analysis. Which two data preparation best practices should they follow? (Choose two.)

Select 2 answers

A.Use only data from the last month.

B.Ensure data is representative of all customer demographics.

C.Remove all records with missing values.

D.Label data manually by a single annotator.

E.Anonymize personally identifiable information (PII) before training.

Answers: B, E

Representative data prevents model bias and improves generalization across customer segments.

Why this answer

Ensuring representative data and anonymizing PII are critical for model fairness and privacy. Removing all records with missing values can discard useful information; using only recent data may introduce bias; single-annotator labeling can cause subjective bias.
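
As a rough illustration of the anonymization step, the sketch below masks email addresses and phone-like numbers with regular expressions. The patterns are simplified assumptions and will miss many real-world PII formats (names, addresses, account numbers), so they are a starting point and not a complete solution.

```python
import re

# Simplified patterns (assumptions); production systems need broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def anonymize(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


masked = anonymize("Contact jo@example.com or 555-123-4567 about my order.")
```

The sentiment-bearing words are left untouched, so the masked text remains usable as training data.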

248

MCQ · medium

A company uses Einstein Bots to handle sales inquiries. The bot sometimes provides incorrect product information, leading to customer dissatisfaction. What is the MOST ethical course of action?

A.Add a disclaimer that the bot may make mistakes and escalate complex issues

B.Replace the bot with human agents entirely

C.Keep the bot but do not inform customers of errors

D.Blame the bot developers publicly

Answer: A

Transparency about limitations and providing human escalation is ethical.

Why this answer

Option A is correct because transparency with customers about bot limitations is ethical and builds trust. Option B is wrong because replacing the bot with human agents entirely is disproportionate when the errors can be mitigated through disclosure and escalation. Option C is wrong because it hides the issue. Option D is wrong because blame is not constructive.

249

MCQ · easy

A health app collects users' location data for AI-driven recommendations, but users are not informed about this data collection. Which ethical principle is most compromised?

A.Transparency

B.Data minimization and consent

C.Fairness

D.Accountability

Answer: B

Collecting data without consent violates privacy and consent principles.

Why this answer

Option B is correct: data minimization and consent require that only necessary data is collected, and only with permission. Option A is wrong because transparency involves disclosure, but the deeper issue is unauthorized collection. Option C is wrong because fairness is about bias. Option D is wrong because accountability is about responsibility.

250

MCQeasy

A marketing agency needs to ingest real-time social media mentions for a sentiment analysis AI model. Which Data Cloud object type should they use to set up the ingestion?

A.Data Lake Object

B.Calculated Insight

C.Data Stream Object with Event type

D.Data Transform

AnswerC

Specifically designed for real-time streaming ingest.

Why this answer

Option C is correct because Data Stream objects with type 'Event' are designed for real-time data ingestion. Option A is wrong because Data Lake Objects are for batch. Option B is wrong because Calculated Insights aggregate data.

Option D is wrong because Data Transforms process existing data.

251

MCQhard

A data scientist notices that an Einstein model for predicting customer churn has unusually high accuracy on training data but performs poorly on validation data. Which data issue is the most likely cause?

A.The dataset has an imbalanced class distribution

B.The dataset contains many missing values

C.The model was trained on stale data from a different season

D.A field containing future information (e.g., 'churn_date') was included in features

AnswerD

Data leakage from a field that reveals the outcome causes overfitting and high train accuracy.

Why this answer

Option D is correct because including a field like 'churn_date' in the feature set introduces target leakage, where the model has access to information that would not be available at prediction time. This causes the model to appear highly accurate on training data (since it can directly 'see' the outcome) but fails to generalize to validation data where such future information is absent. In Salesforce Einstein, features must be strictly historical or static to avoid this data leakage issue.

Exam trap

Salesforce often tests the concept of data leakage by presenting it as a scenario where the model performs well on training data but poorly on validation data, and the trap is that candidates may confuse this with overfitting or class imbalance, rather than recognizing the inclusion of a future or target-related field as the root cause.

How to eliminate wrong answers

Option A is wrong because imbalanced class distribution typically causes the model to predict the majority class, leading to high accuracy on training data but poor performance on validation data only if the imbalance is extreme and not handled; however, the question describes 'unusually high accuracy' on training data, which is more characteristic of overfitting or leakage, not class imbalance. Option B is wrong because missing values generally degrade model performance across both training and validation sets, not causing a stark contrast between high training accuracy and low validation accuracy. Option C is wrong because stale data from a different season would cause poor performance on both training and validation data if the validation data is from the same season, or poor performance on validation data if it is from a different season, but it would not explain unusually high training accuracy.
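
One way to screen for this kind of leakage before training is to record, for each candidate field, when its value becomes known, and to train only on fields known before the prediction moment. The sketch below is a minimal illustration under that assumption; the field names and the `FIELD_AVAILABILITY` catalogue are hypothetical and are not Einstein objects or settings.

```python
# Hypothetical feature catalogue: each field is tagged with when its value
# becomes known relative to the prediction moment. Names are illustrative.
FIELD_AVAILABILITY = {
    "tenure_months": "before_prediction",
    "support_tickets_90d": "before_prediction",
    "plan_tier": "before_prediction",
    "churn_date": "after_outcome",        # only populated once churn happened
    "cancellation_reason": "after_outcome",
}


def split_leaky_features(fields, availability):
    """Return (safe, leaky) field lists based on when each value is known."""
    safe, leaky = [], []
    for name in fields:
        # Fields missing from the catalogue are treated as leaky so that a
        # person reviews them before they reach the training set.
        if availability.get(name) == "before_prediction":
            safe.append(name)
        else:
            leaky.append(name)
    return safe, leaky


safe_fields, leaky_fields = split_leaky_features(
    list(FIELD_AVAILABILITY), FIELD_AVAILABILITY
)
print("train on:", safe_fields)
print("exclude:", leaky_fields)
```

Treating uncatalogued fields as leaky is a deliberately conservative choice here: a field wrongly excluded costs some accuracy, while a leaky field wrongly included invalidates the evaluation.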

252

MCQmedium

Refer to the exhibit. What is the most likely cause of the fairness issue?

A.The model overfits to the male group.

B.The training data is imbalanced, causing the model to perform better on the majority group.

C.The overall accuracy is too low.

D.The model is inherently biased against females.

AnswerB

Imbalanced data leads to unequal performance.

Why this answer

Option B is correct because imbalanced training data often leads to disparate performance across groups. Option A is wrong because there is no indication of overfitting. Option C is wrong because overall accuracy can be high despite bias.

Option D is wrong because the model is not inherently biased.

253

Multi-Selecthard

Which THREE are valid considerations when deploying an Einstein Bot?

Select 3 answers

A.Defining the intents the bot should handle.

B.Testing the bot in a sandbox before activation.

C.Limiting the bot to a single communication channel.

D.Training the bot with sample user dialogs.

E.Ensuring the bot can escalate to a human agent only via email.

AnswersA, B, D

Intents are the core of bot functionality.

Why this answer

Defining intents is a core requirement for an Einstein Bot because intents represent the specific goals or tasks users want to accomplish, such as checking an order status or resetting a password. The bot uses Natural Language Processing (NLP) to map user utterances to these intents, enabling it to route conversations appropriately. Without clearly defined intents, the bot cannot accurately understand or respond to user requests.

Exam trap

Salesforce often tests the misconception that a bot must be restricted to one channel or a single escalation method, when in reality Einstein Bots are built for omnichannel flexibility and support multiple escalation options.

254

MCQmedium

A sales operations team is training an AI model to forecast quarterly revenue. They have five years of historical data, which includes a strong seasonal pattern but also a significant outlier: during the pandemic year, revenue dropped by 70% from typical values. The model trains with high accuracy on historical data but fails to predict future quarters accurately, consistently overestimating revenue. What should the data scientist do to improve forecast accuracy?

A.Add a binary feature indicating whether each quarter was during the pandemic.

B.Remove the data points corresponding to the pandemic year from the training set.

C.Normalize the entire dataset using Z-scores to reduce the impact of the outlier.

D.Include the outlier data and increase the model capacity to capture the anomaly.

AnswerB

Removing the outlier helps the model focus on typical patterns, improving generalization to future non-pandemic quarters.

Why this answer

Option B is correct because removing the pandemic year data eliminates the extreme outlier that is causing the model to learn a distorted seasonal pattern. The 70% revenue drop is not representative of future quarters, so including it forces the model to overestimate revenue to compensate for the anomaly. By training only on typical data, the model can learn the true seasonal pattern and generalize better to future quarters.

Exam trap

Salesforce often tests the misconception that you should keep all data and adjust the model (e.g., via normalization or capacity increase) rather than removing non-representative outliers, leading candidates to pick options like C or D.

How to eliminate wrong answers

Option A is wrong because adding a binary pandemic feature does not remove the outlier's influence; the model may still overfit to the anomalous drop and fail to generalize, as the feature only labels the outlier without correcting the skewed distribution. Option C is wrong because Z-score normalization scales the data but does not eliminate the outlier's impact on the model's learned weights; the extreme value still distorts the mean and variance, leading to biased forecasts. Option D is wrong because increasing model capacity to capture the anomaly encourages overfitting to the pandemic year's unique pattern, which will not recur, thus worsening generalization and maintaining the overestimation error.

255

Multi-Selecteasy

Which TWO features are part of Einstein AI capabilities in Salesforce Sales Cloud?

Select 2 answers

A.Einstein Opportunity Scoring

B.Einstein Case Classification

C.Einstein Lead Scoring

D.Einstein Bots

E.Einstein Article Recommendations

AnswersA, C

Part of Sales Cloud Einstein.

Why this answer

Einstein Opportunity Scoring is a core Einstein AI capability in Sales Cloud that uses predictive models to analyze historical data and assign a score to each opportunity, indicating its likelihood to close. This helps sales reps prioritize their efforts on deals most likely to convert, directly leveraging AI to enhance sales productivity.

Exam trap

Salesforce often tests the distinction between Sales Cloud and Service Cloud Einstein features, so the trap here is assuming that all Einstein AI capabilities are available across all clouds, when in fact features like Case Classification and Article Recommendations are exclusive to Service Cloud.

256

MCQhard

A company wants to deploy an Einstein Prediction Builder model to predict lead conversion within 30 days. They have historical data from the past 12 months. Which data preprocessing step is most critical to ensure the model learns correctly?

A.Normalize all numerical features to a 0-1 range.

B.Remove leads that converted after 30 days.

C.Include only leads that were assigned to a sales rep.

D.Ensure the target variable is computed based on conversion status at exactly 30 days from creation.

AnswerD

Accurate target alignment is crucial.

Why this answer

Option D is correct because the target variable for a time-based prediction model like Einstein Prediction Builder must be computed at a precise, consistent point in time—in this case, exactly 30 days from lead creation. If the target is computed at varying intervals, the model will learn incorrect patterns, as it cannot distinguish between leads that converted at 31 days versus those that never converted, leading to biased or invalid predictions.

Exam trap

Salesforce often tests the concept of 'target variable definition' in time-series or prediction scenarios, where candidates mistakenly focus on data cleaning or feature engineering instead of the precise labeling of the outcome variable, which is the foundational step for supervised learning.

How to eliminate wrong answers

Option A is wrong because normalizing numerical features to a 0-1 range is not the most critical step for a binary classification model like lead conversion; Einstein Prediction Builder handles feature scaling internally, and the primary concern is correct target definition, not feature normalization. Option B is wrong because removing leads that converted after 30 days would discard valuable negative examples (non-conversion within the window) and introduce survivorship bias, making the model unable to learn the true conversion rate within the 30-day period. Option C is wrong because including only leads assigned to a sales rep introduces selection bias and ignores leads that may convert without assignment, which is not a required preprocessing step for the model to learn correctly; the model should be trained on all leads to generalize properly.
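
The labeling rule described above can be sketched in a few lines. This is an illustration only, assuming each lead has a creation date and an optional conversion date; the function name and the sample leads are hypothetical, and Einstein Prediction Builder itself is configured declaratively rather than through code like this.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)


def label_converted_within_window(created, converted, as_of):
    """Return 1 or 0 for 'converted within 30 days of creation', or None when
    the 30-day window is still open on `as_of` and the label is not yet known."""
    deadline = created + WINDOW
    if converted is not None and converted <= deadline:
        return 1
    if as_of < deadline:
        return None  # too recent: leave out of training rather than guess
    return 0  # never converted, or converted after the window closed


# Hypothetical leads: (id, created, converted)
leads = [
    ("L1", date(2024, 1, 1), date(2024, 1, 20)),  # converted on day 19
    ("L2", date(2024, 1, 1), date(2024, 2, 15)),  # converted on day 45
    ("L3", date(2024, 1, 1), None),               # never converted
    ("L4", date(2024, 3, 25), None),              # window still open
]
as_of = date(2024, 4, 1)
labels = {
    lead_id: label_converted_within_window(created, converted, as_of)
    for lead_id, created, converted in leads
}
print(labels)
```

Note that the lead converting on day 45 is kept as a negative example rather than removed, which matches the reasoning against Option B.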

257

MCQeasy

Which data transformation is most appropriate for converting categorical variables into numerical format for a machine learning model?

A.Normalization.

B.One-hot encoding.

C.Principal component analysis.

D.Standardization.

AnswerB

One-hot encoding creates binary columns for each category, making them usable in models.

Why this answer

One-hot encoding is the correct transformation because it converts categorical variables into a binary vector representation, where each category becomes a separate column with a 1 or 0. This allows machine learning models to interpret categorical data without implying any ordinal relationship, which is essential for algorithms that rely on numerical distances or linear algebra.

Exam trap

Salesforce often tests the distinction between data preprocessing techniques (normalization, standardization) and encoding methods, trapping candidates who confuse scaling with categorical conversion.

How to eliminate wrong answers

Option A is wrong because normalization scales numerical features to a range (e.g., 0 to 1) and is used for continuous data, not for converting categorical variables into numbers. Option C is wrong because principal component analysis (PCA) is a dimensionality reduction technique that transforms existing numerical features into uncorrelated components, not a method for encoding categorical data. Option D is wrong because standardization centers data around a mean of 0 and standard deviation of 1, which is applied to numerical features and would not create meaningful representations for categorical variables.
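
To make the transformation concrete, here is a minimal pure-Python sketch of one-hot encoding. The `one_hot_encode` helper and the sample industry values are illustrative assumptions; in practice a data library or the platform itself would normally perform this step.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with one slot per category."""
    categories = sorted(set(values))  # sorted so the column order is stable
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one column is "hot" per row
        rows.append(row)
    return categories, rows


columns, encoded = one_hot_encode(["Retail", "Finance", "Retail", "Health"])
print(columns)
for row in encoded:
    print(row)
```

Each row contains a single 1, so no category is treated as larger or smaller than another.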

258

Multi-Selecthard

A company wants to ensure their AI is fair. Which TWO steps are appropriate?

Select 2 answers

A.Use a single fairness metric to evaluate the model

B.Deploy the model quickly to gather real-world data

C.Remove all sensitive attributes from the data

D.Test model performance on different demographic groups

E.Involve diverse stakeholders in model development

AnswersD, E

Disaggregated testing reveals performance disparities.

Why this answer

Options D and E are correct. Testing the model on different demographic groups helps identify disparities, and involving diverse stakeholders brings multiple perspectives. Option A is wrong because a single metric cannot capture all aspects of fairness.

Option B is wrong because deploying quickly without testing can exacerbate unfairness. Option C is wrong because removing all sensitive attributes may not eliminate bias, since proxy features can still encode them.
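
Testing on different demographic groups amounts to computing the same metric separately per group instead of once overall. The sketch below shows the idea with plain accuracy; the group labels and records are made-up illustration data, and a real review would typically look at several metrics, not just this one.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, y_true, y_pred) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical evaluation records: (group, actual label, predicted label)
records = [
    ("X", 1, 1), ("X", 0, 0), ("X", 1, 1), ("X", 0, 0),
    ("Y", 1, 0), ("Y", 0, 0),
]
overall = sum(int(t == p) for _, t, p in records) / len(records)
per_group = accuracy_by_group(records)
print("overall:", round(overall, 3))
print("per group:", per_group)
```

Here the overall figure looks acceptable while group Y is served noticeably worse, which is the kind of disparity an aggregate number hides.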

259

MCQhard

Refer to the exhibit. A data pipeline fails during the DataTransformation stage. What is the most likely root cause?

A.The pipeline has a network connectivity issue.

B.The data type for 'income' is incorrect.

C.A transformation step references the 'age' column, but it is not present in the input data.

D.The 'age' column contains null values.

AnswerC

The error clearly states 'age' column not found.

Why this answer

Option C is correct because the error occurs during the DataTransformation stage, which processes data after it has been successfully ingested. If a transformation step references the 'age' column but that column is missing from the input data, the pipeline will fail with a column-not-found error. This is a common schema mismatch issue in data pipelines, distinct from connectivity or data quality problems.

Exam trap

Salesforce often tests the distinction between pipeline stages (ingestion vs. transformation) and the specific type of error (missing column vs. data quality issue) to see if candidates understand that a missing column causes an immediate failure, while nulls or type mismatches may be handled differently depending on the pipeline configuration.

How to eliminate wrong answers

Option A is wrong because a network connectivity issue would typically cause the pipeline to fail during the data ingestion or extraction stage, not during the DataTransformation stage. Option B is wrong because an incorrect data type for 'income' would cause a type conversion error, but the question specifically states the failure is during transformation, and the error would be related to type mismatch, not a missing column. Option D is wrong because null values in the 'age' column would not cause a pipeline failure during transformation unless the transformation logic explicitly fails on nulls; most pipelines handle nulls gracefully or can be configured to skip or impute them.
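
The difference between a missing column and a column that merely contains nulls can be checked before the transformation runs. The following is a small sketch of such a pre-flight check, assuming rows arrive as dictionaries; the `missing_columns` helper and the sample rows are hypothetical and do not represent any specific pipeline product.

```python
def missing_columns(rows, required):
    """Return the required column names that appear in none of the rows."""
    present = set()
    for row in rows:
        present.update(row.keys())
    return sorted(set(required) - present)


required = ["age", "income"]

# 'age' is absent from the input entirely: a schema mismatch.
rows_without_age = [{"name": "a", "income": 50000}]

# 'age' exists but is null: a data quality issue, not a schema issue.
rows_with_null_age = [{"name": "b", "income": 42000, "age": None}]

print(missing_columns(rows_without_age, required))
print(missing_columns(rows_with_null_age, required))
```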

260

Multi-Selecthard

Which TWO actions best promote transparency in an AI system?

Select 2 answers

A.Limit access to the model's logic to protect intellectual property.

B.Publish an audit trail of model inputs and decisions.

C.Use a complex deep learning model for higher accuracy.

D.Provide clear explanations for individual predictions.

E.Remove feature importance to simplify the model.

AnswersB, D

Audit trails provide insight into decision process.

Why this answer

Option B is correct because publishing an audit trail of model inputs and decisions enables external verification of the AI system's behavior, which is a core requirement for transparency. This allows stakeholders to trace how specific inputs led to particular outputs, ensuring accountability and facilitating debugging or compliance audits.

Exam trap

Salesforce often tests the misconception that transparency is about protecting the model or maximizing accuracy, when in fact it is about openness and explainability of decisions.

261

MCQhard

An organization uses Einstein Search to power a portal's search functionality. Users report that search results are not ranking relevant documents highly. Which configuration change is most likely to improve relevance?

A.Enable stemming and synonyms.

B.Tune the search algorithm by promoting frequently accessed content.

C.Increase the number of indexed fields.

D.Decrease the index refresh frequency.

AnswerB

Promoting popular content improves relevance.

Why this answer

Option B is correct because search relevance is improved by promoting popular or authoritative content. Option A is wrong because stemming and synonyms help recall but not necessarily ranking. Option C is wrong because more indexed fields may add noise.

Option D is wrong because refresh frequency affects freshness, not the ranking algorithm.

262

MCQeasy

A fraud detection model is being trained on transaction data where only 1% of transactions are fraudulent. The current model predicts 'non-fraud' for all transactions, achieving 99% accuracy. Which technique should be applied to improve model performance?

A.Remove the minority class to have balanced data

B.Set a lower classification threshold for fraud

C.Add more features like transaction location

D.Oversample the minority class or undersample the majority class

AnswerD

Resampling techniques create a more balanced training set, improving recall for fraud.

Why this answer

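Oversampling the minority class or undersampling the majority class addresses class imbalance, allowing the model to learn minority patterns. Option A is wrong because removing the minority class is counterproductive: the model would have no fraud examples left to learn from. Option B may help, but lowering the threshold is less common than resampling. Option C is wrong because adding features alone does not fix the imbalance.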
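
The scenario can be reproduced with a small sketch: a dataset with 1% fraud, the 99% accuracy of an "always non-fraud" baseline, and simple random oversampling of the minority class. The `oversample_minority` helper, the `is_fraud` key, and the synthetic transactions are assumptions made for illustration.

```python
import random


def oversample_minority(rows, label_key="is_fraud", seed=0):
    """Duplicate minority rows at random until both classes are equal in size.

    Assumes the positive class (label 1) is the smaller, non-empty class.
    """
    rng = random.Random(seed)
    minority = [r for r in rows if r[label_key] == 1]
    majority = [r for r in rows if r[label_key] == 0]
    # Sample with replacement to fill the gap between the two classes.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


# Synthetic data: 1,000 transactions, 10 of them fraudulent (1%).
transactions = [
    {"amount": 10 + i, "is_fraud": 1 if i % 100 == 0 else 0}
    for i in range(1000)
]

# Accuracy of a model that always predicts "non-fraud".
baseline_accuracy = sum(1 for t in transactions if t["is_fraud"] == 0) / len(transactions)

balanced = oversample_minority(transactions)
fraud_share = sum(t["is_fraud"] for t in balanced) / len(balanced)
print("baseline accuracy:", baseline_accuracy)
print("fraud share after oversampling:", fraud_share)
```

Resampling should be applied to the training split only, so that evaluation still reflects the real class distribution.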

263

MCQmedium

A social media platform's AI recommends content that inadvertently amplifies misinformation. An ethical review board is considering changes. Which of the following actions best addresses the unintended harm?

A.Increase the amount of training data

B.Remove all AI recommendation engines

C.Conduct an ethical impact assessment and adjust algorithms accordingly

D.Reduce overall user engagement metrics

AnswerC

Assessments help identify and mitigate unintended consequences.

Why this answer

Option C is correct: an ethical impact assessment, reviewed by a diverse ethical review board, can identify and mitigate unintended harm. Option A is wrong because more data doesn't guarantee less misinformation. Option B is wrong because removing AI entirely may be impractical.

Option D is wrong because reducing engagement without analysis may be arbitrary.

264

MCQmedium

A healthcare provider implements Data Cloud to predict patient readmission rates. They have HIPAA compliance requirements. The data includes sensitive patient health information (PHI). The AI model must be trained without exposing PHI to unauthorized users. The data architect uses Data Cloud's data masking on PHI fields. However, model performance drops significantly after masking because the masked values lose predictive value. What additional step should the architect consider to maintain model performance while protecting PHI?

A.Use tokenization for highly predictive fields like diagnosis codes instead of masking

B.Implement differential privacy within Einstein Studio

C.Remove masking and rely on user permissions to restrict access

D.Increase the volume of training data to compensate for masking

AnswerA

Tokenization retains referential integrity while hiding actual values.

Why this answer

Option A is the best approach because tokenization preserves the relationship between values (e.g., diagnosis codes) while obscuring the actual PHI. This allows the model to learn patterns without exposing sensitive data. Option C is wrong because removing masking exposes PHI, which conflicts with the HIPAA requirement.

Option B is not directly available in Einstein Studio as a built-in feature; differential privacy might be complex to implement. Option D does not address the masking issue.
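
Why tokenization keeps predictive value while masking loses it can be shown with a generic sketch. It uses a keyed hash (HMAC) as a stand-in for a deterministic tokenization service; this is an assumption for illustration and not a description of how Data Cloud implements masking or tokenization. The key, the `tokenize` and `mask` helpers, and the sample codes are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for the demo; a real key would live in a secrets vault.
SECRET_KEY = b"demo-key-not-for-production"


def tokenize(value, key=SECRET_KEY):
    """Deterministic token: the same input always yields the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def mask(value):
    """Masking replaces every value with the same placeholder."""
    return "****"


diagnosis_codes = ["E11.9", "I10", "E11.9", "J45.909"]
tokens = [tokenize(code) for code in diagnosis_codes]
masked = [mask(code) for code in diagnosis_codes]

print("distinct tokens:", len(set(tokens)))   # categories are still separable
print("distinct masked:", len(set(masked)))   # all information is collapsed
```

Because equal codes map to equal tokens, a model can still treat the field as a categorical feature, whereas the masked column carries no signal at all.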

265

MCQmedium

An admin creates a predictive model in Einstein Prediction Builder to forecast customer churn. The model shows high accuracy on test data but poor performance in production. What is the most likely cause?

A.Improper feature scaling in the training pipeline

B.The model is overfitted to the training data

C.Target leakage in the training dataset

D.Data drift between training and production environments

AnswerD

Changes in customer behavior or data collection can make the model less effective.

Why this answer

Option D is correct because the model's high accuracy on test data but poor performance in production is a classic symptom of data drift. In Einstein Prediction Builder, the model was trained on historical data that may not reflect current customer behavior patterns, leading to a mismatch between training and production distributions. This is not a model training issue but a data environment shift.

Exam trap

Salesforce often tests the distinction between overfitting (which affects test data) and data drift (which affects production data), trapping candidates who confuse high test accuracy with model generalization.

How to eliminate wrong answers

Option A is wrong because improper feature scaling would typically cause poor performance on both test and production data, not specifically a drop in production only. Option B is wrong because overfitting would manifest as high training accuracy but low test accuracy, not high test accuracy with poor production performance. Option C is wrong because target leakage would inflate accuracy on both training and test sets, not just training, and would not explain a production-only degradation.
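
Drift of this kind can be monitored by comparing the distribution of a feature in the training data with its distribution in production. The sketch below uses the population stability index (PSI) on a categorical feature as one possible measure; the function, the `eps` floor, and the sample billing-plan data are illustrative assumptions, and Einstein Prediction Builder is not claimed to expose this calculation.

```python
import math
from collections import Counter


def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two samples of a categorical feature.

    `eps` is a small floor that avoids log(0) when a category is absent
    from one of the samples.
    """
    categories = set(expected) | set(actual)
    expected_counts, actual_counts = Counter(expected), Counter(actual)
    psi = 0.0
    for category in categories:
        e = max(expected_counts[category] / len(expected), eps)
        a = max(actual_counts[category] / len(actual), eps)
        psi += (a - e) * math.log(a / e)
    return psi


# Hypothetical feature: billing plan at training time vs. in production.
training = ["monthly"] * 80 + ["annual"] * 20
production = ["monthly"] * 50 + ["annual"] * 50

no_drift = population_stability_index(training, training)
drift = population_stability_index(training, production)
print("PSI, same data:", no_drift)
print("PSI, shifted data:", round(drift, 3))
```

A commonly cited rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right threshold depends on the feature and the use case.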

266

MCQhard

A developer is implementing retrieval augmented generation (RAG) for a customer service bot. Which component is essential for supplying real-time data to the prompt?

A.A fine-tuned large language model

B.A prompt template in Prompt Builder

C.A static knowledge base

D.A vector database for embedding and retrieval

AnswerD

RAG relies on retrieving relevant chunks of data using vector similarity search, which then become part of the prompt.

Why this answer

Option D is correct: a vector database stores embeddings for efficient retrieval of relevant context. Option A is wrong because a fine-tuned LLM is static. Option B is wrong because Prompt Builder is for templates.

Option C is wrong because a knowledge base alone doesn't provide real-time retrieval augmentation unless it is indexed.
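
The retrieval step can be sketched with a toy example. It assumes word counts as a stand-in for embeddings and an in-memory list as a stand-in for the vector database; real systems use a learned embedding model and a dedicated vector store, and the documents and function names here are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. A real system uses a learned model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


DOCS = [
    "refunds are issued within 5 business days",
    "orders ship from the warehouse in 2 days",
    "password resets are done from the login page",
]
# Stand-in for a vector database: each document stored with its vector.
INDEX = [(doc, embed(doc)) for doc in DOCS]


def retrieve(question, k=1):
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(INDEX, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question):
    """Place the retrieved context in front of the user's question."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"


print(build_prompt("how long do refunds take"))
```

The language model only sees what retrieval returns, which is why the quality of the index matters as much as the quality of the prompt template.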

267

Multi-Selectmedium

Which TWO approaches are recommended for mitigating bias in AI models?

Select 2 answers

A.Increasing model depth

B.Removing all sensitive attributes

C.Re-weighting training samples

D.Adding regularization

E.Using adversarial debiasing

AnswersC, E

Adjusts for imbalanced representation.

Why this answer

Option C is correct because re-weighting training samples can adjust for imbalanced representation. Option E is correct because adversarial debiasing reduces bias by learning unbiased representations. Option A is wrong because increasing model depth can amplify bias.

Option B is wrong because removing all sensitive attributes may not eliminate bias due to proxy variables. Option D is wrong because regularization may not directly mitigate bias.
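
Re-weighting can be illustrated with inverse-frequency weights, where each sample's weight is n / (k × count of its group), with n samples and k groups. This is one simple weighting scheme among several; the helper name and the group data below are illustrative assumptions.

```python
from collections import Counter


def inverse_frequency_weights(groups):
    """Weight each sample so that every group contributes equally in total."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[group]) for group in groups]


# Hypothetical training set: group A is heavily over-represented.
groups = ["A"] * 8 + ["B"] * 2
weights = inverse_frequency_weights(groups)

total_a = sum(w for g, w in zip(groups, weights) if g == "A")
total_b = sum(w for g, w in zip(groups, weights) if g == "B")
print("weight per A sample:", weights[0])
print("weight per B sample:", weights[-1])
print("total weight A vs B:", total_a, total_b)
```

These weights would then be passed to a training routine that accepts per-sample weights, so errors on the under-represented group count for more.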

268

MCQmedium

Refer to the exhibit. A sales manager sees that an account has an Einstein Score of 78 with a confidence of 0.65. What is the most appropriate interpretation?

A.The score indicates the account has been contacted 78 times, with a 65% satisfaction rate.

B.The account is predicted to have a 78% chance of converting, and the model is 65% confident in that prediction.

C.The account is in the top 78% of scoring accounts, with a 65% chance of being accurate.

D.The account is predicted to convert, but the model's confidence is relatively low, suggesting the prediction should be verified.

AnswerD

Moderate confidence warrants human review.

Why this answer

Option D is correct because the Einstein Score is a predictive lead scoring model that outputs a conversion probability (0 to 100), and the confidence score (0 to 1) indicates the model's certainty in that prediction. A confidence of 0.65 is below the typical threshold (e.g., 0.75 or higher), meaning the prediction is less reliable and should be manually verified before acting on it.

Exam trap

Salesforce often tests the distinction between the prediction score (what is predicted) and the confidence score (how sure the model is), leading candidates to misinterpret the confidence as an accuracy percentage or to conflate the two values into a single probability.

How to eliminate wrong answers

Option A is wrong because the Einstein Score is not a count of contacts or a satisfaction rate; it is a predicted conversion probability. Option B is wrong because it conflates the score (78) with a percentage chance of converting, but the score is already a probability (78%), and the confidence (0.65) is the model's certainty in that probability, not an additional percentage. Option C is wrong because the score does not represent a percentile rank (top 78%); it is an absolute probability, and the confidence is not an accuracy percentage but a measure of model certainty.

269

MCQeasy

For an AI project, data must be stored in a way that supports both training and real-time inference. Which storage solution meets this requirement?

A.Data warehouse (e.g., Snowflake)

B.Relational database (e.g., PostgreSQL)

C.Data lake (e.g., Amazon S3 or Azure Data Lake)

D.In-memory cache (e.g., Redis)

AnswerC

Data lakes store raw and processed data for various purposes.

Why this answer

A data lake (e.g., Amazon S3 or Azure Data Lake) is the correct choice because it can store vast amounts of raw, unstructured, and structured data in its native format, making it ideal for training AI models on diverse datasets. At the same time, data lakes support real-time inference by enabling direct access to data via APIs or streaming services (e.g., AWS Lambda or Azure Functions) without the latency of transforming data into a schema-on-write structure. This dual capability—handling both batch processing for training and low-latency reads for inference—is a key requirement that other storage solutions cannot fulfill as effectively.

Exam trap

Salesforce often tests the misconception that a data warehouse or relational database is sufficient for AI workloads because candidates overlook the need for raw, unstructured data storage and the flexibility of schema-on-read, instead focusing only on structured query performance.

How to eliminate wrong answers

Option A is wrong because a data warehouse (e.g., Snowflake) is optimized for structured, aggregated data and analytical queries, not for storing raw, unstructured data needed for AI training, and its schema-on-write approach introduces latency unsuitable for real-time inference. Option B is wrong because a relational database (e.g., PostgreSQL) enforces strict schemas and ACID transactions, which limit the flexibility to store diverse data types (e.g., images, text) required for AI training, and its row-based storage is inefficient for high-throughput, low-latency inference workloads. Option D is wrong because an in-memory cache (e.g., Redis) is designed for ephemeral, high-speed data access but lacks persistent storage and the capacity to hold large-scale training datasets, making it unsuitable for long-term data storage required for AI model training.

270

MCQmedium

A sales representative uses Einstein Activity Capture to log emails automatically. However, some critical emails are not being captured. What is the most likely reason?

A.The sender or recipient is not a Salesforce user with a license.

B.The emails were sent from a non-Outlook or Gmail client.

C.The Einstein Activity Capture feature is disabled for that specific user.

D.The emails contain attachments that exceed the size limit.

AnswerA

Only Salesforce users with licenses are captured.

Why this answer

Einstein Activity Capture relies on Salesforce user licenses to associate email activities with records. If the sender or recipient is not a licensed Salesforce user, the system cannot link the email to a user identity, causing it to be skipped during capture. This is the most common reason for missing emails because the feature is designed to log activities only for licensed users.

Exam trap

Salesforce often tests the misconception that Einstein Activity Capture requires a specific email client (like Outlook or Gmail), but the real limitation is the Salesforce user license requirement for the sender or recipient.

How to eliminate wrong answers

Option B is wrong because Einstein Activity Capture supports multiple email clients, including Outlook, Gmail, and Exchange/Google Workspace via server-side sync, so non-Outlook or Gmail clients are not a barrier. Option C is wrong because if the feature were disabled for that specific user, no emails would be captured at all, not just some critical ones. Option D is wrong because Einstein Activity Capture does not enforce a specific attachment size limit; it captures the email metadata and body, and attachments are handled separately without causing the email to be skipped.

271

MCQhard

A data scientist is building a predictive model for customer churn using Salesforce data. The dataset has 20 features, and the target variable is highly imbalanced (5% churn, 95% non-churn). Which technique should be applied to handle the class imbalance before training?

A.Apply Principal Component Analysis (PCA) for dimensionality reduction

B.Create interaction features between existing variables

C.Use accuracy as the evaluation metric

D.Use Synthetic Minority Over-sampling Technique (SMOTE)

AnswerD

SMOTE creates synthetic examples of the minority class.

Why this answer

SMOTE (Synthetic Minority Over-sampling Technique) is the correct choice because it generates synthetic samples for the minority class (churn) by interpolating between existing minority instances, effectively balancing the dataset without simply duplicating data. This prevents the model from being biased toward the majority class (non-churn) and improves recall for the churn class, which is critical in imbalanced classification problems.

Exam trap

Salesforce often tests the misconception that any data preprocessing technique (like PCA or feature engineering) can fix class imbalance, when in fact only resampling methods (SMOTE, ADASYN) or cost-sensitive learning directly address the skewed target distribution.

How to eliminate wrong answers

Option A is wrong because PCA is a dimensionality reduction technique that does not address class imbalance; it reduces feature space but does not alter the distribution of the target variable. Option B is wrong because creating interaction features may capture non-linear relationships but does not solve the imbalance problem; it can even exacerbate overfitting if the minority class remains underrepresented. Option C is wrong because accuracy is a misleading metric for imbalanced datasets—a model predicting all non-churn would achieve 95% accuracy but fail to identify any churn cases; metrics like precision, recall, F1-score, or AUC-ROC are appropriate instead.
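
The interpolation idea behind SMOTE can be sketched in pure Python. This is a simplified illustration, not the full published algorithm or any library's implementation: it picks a minority point, picks one of its nearest minority neighbours, and creates a new point somewhere on the line between them. The function name and the sample points are hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between
    a random minority point and one of its k nearest minority neighbours.

    Assumes `minority` holds at least two points of equal dimension.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding base itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical churn examples described by two numeric features.
churners = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
new_points = smote_like(churners, n_new=5)
print(new_points)
```

As with any resampling, synthetic points belong in the training split only; the validation and test data should keep the original 5% / 95% distribution.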

272

MCQmedium

After applying a log transformation to a numeric feature, an Einstein model’s performance dropped significantly. What is the most likely cause?

A.The data volume was reduced by the transformation

B.The feature was normally distributed after transformation

C.The feature contained zero or negative values

D.The transformation introduced multicollinearity with other features

AnswerC

Log of non-positive values is undefined, causing missing or infinity values.

Why this answer

Log transformation is undefined for zero or negative values: log(x) tends to negative infinity as x approaches 0, and the log of a negative number is not a real number. In Salesforce Einstein, numeric features with such invalid transformed values can cause the model to fail or produce erratic results, leading to a significant drop in performance. This is the most likely cause given the symptom described.

Exam trap

Salesforce often tests the misconception that log transformation always improves model performance, but the trap here is that candidates overlook the mathematical constraint that log is undefined for non-positive values, causing them to choose a less relevant option like data volume reduction or multicollinearity.

How to eliminate wrong answers

Option A is wrong because log transformation does not reduce data volume; it merely applies a mathematical function to each value, preserving the number of records. Option B is wrong because making a feature normally distributed is typically beneficial for many models, not detrimental; a normal distribution after transformation would likely improve, not degrade, performance. Option D is wrong because log transformation is applied to a single feature and does not introduce multicollinearity, which is a relationship between two or more independent variables; it cannot create collinearity with other features on its own.
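A quick way to see the constraint is to try it in Python. The sketch below is an illustration, not Einstein's own preprocessing: `safe_log_feature` is a hypothetical helper that uses `math.log1p` (log of 1 + x) so zeros are handled, and refuses negative inputs instead of silently producing invalid values.

```python
import math

def safe_log_feature(values):
    """Return log1p-transformed values, or raise if any value is negative."""
    bad = [v for v in values if v < 0]
    if bad:
        raise ValueError(f"{len(bad)} negative value(s); log transform not applicable")
    return [math.log1p(v) for v in values]

revenue = [0.0, 9.0, 99.0]  # made-up values that include a zero
print(safe_log_feature(revenue))

# A plain log on zero fails outright in Python.
try:
    math.log(0)
except ValueError as err:
    print("math.log(0) failed:", err)
```

Checking the minimum of a feature before choosing a log transform avoids the failure mode described in this question.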

273

MCQhard

A company is using customer support tickets to train a model for auto-classifying issues. The dataset includes fields like 'Case Title', 'Description', 'Product', and 'Customer Name'. Which privacy concern is most critical to address before training?

A.Anonymize personal identifiable information (PII) in Description and Title

B.Encrypt session tokens used in the support system

C.Remove all case numbers to prevent data leakage

D.Ensure all customers have opted in before using their data

AnswerA

PII in free-text fields must be anonymized to comply with privacy regulations and to keep customer details out of the trained model.

Why this answer

Anonymizing PII in the text fields is critical to avoid exposing customer information in model artifacts or predictions. Session tokens are irrelevant, and case numbers are not PII. Opt-in is a legal requirement but not directly about data preparation for AI.
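As a rough illustration of anonymizing free text, the sketch below masks email addresses and phone numbers with regular expressions. The patterns and the `redact` helper are assumptions made for this example and are deliberately simple; names and addresses usually need a dedicated entity-recognition step that regexes cannot provide.

```python
import re

# Simplified patterns for illustration; real PII detection needs broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text):
    """Replace email addresses and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

description = (
    "Customer jane.doe@example.com called from +1 (555) 010-9999 "
    "about a billing error."
)
print(redact(description))
```

Running the same function over 'Case Title' and 'Description' before training keeps these identifiers out of the training text.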

274

MCQhard

Refer to the exhibit. An admin runs a preprocess script before training an Einstein model. Why is normalization applied to the 'AnnualRevenue' and 'NumberOfEmployees' columns?

A.To detect outliers in the data

B.To ensure both features contribute equally to the model

C.To remove rows with missing values

D.To reduce the number of features from 30 to 2

AnswerB

Equalizing scales prevents one feature from having undue influence.

Why this answer

Normalization scales features like 'AnnualRevenue' and 'NumberOfEmployees' to a comparable range (e.g., 0–1 or with zero mean and unit variance). Without normalization, a feature with larger numeric values (e.g., revenue in millions) would dominate distance calculations in models like k-nearest neighbors and the weight updates in models trained with gradient descent, causing the model to undervalue the smaller-scale feature. By normalizing, both features contribute on an equal footing to the model's learning process, which is essential for many machine learning algorithms used in Einstein.

Exam trap

Salesforce often tests the distinction between data preprocessing steps (normalization, scaling) and other data preparation tasks (outlier detection, missing value handling, dimensionality reduction), and the trap here is that candidates confuse normalization with outlier detection or feature reduction because both involve numerical transformations.

How to eliminate wrong answers

Option A is wrong because detecting outliers is typically done using statistical methods (e.g., Z-score, IQR) or visualization, not normalization; normalization itself does not identify outliers. Option C is wrong because removing rows with missing values is a data cleaning step (e.g., using dropna() or imputation), not a purpose of normalization. Option D is wrong because reducing the number of features from 30 to 2 is dimensionality reduction (e.g., PCA or feature selection), not normalization, which preserves all features.
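The exhibit's script is not reproduced here, so the following is only a sketch of what min-max normalization does, using made-up account values. It compares the distance between two accounts before and after scaling to show how revenue stops dominating.

```python
import math

def min_max(values):
    """Scale values linearly into the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

# Made-up values for illustration.
annual_revenue = [1_000_000.0, 5_000_000.0, 9_000_000.0]
employees = [10.0, 500.0, 20.0]

raw = list(zip(annual_revenue, employees))
scaled = list(zip(min_max(annual_revenue), min_max(employees)))

# Distance between the first two accounts, before and after scaling.
print(math.dist(raw[0], raw[1]))        # driven almost entirely by revenue
print(math.dist(scaled[0], scaled[1]))  # both features now matter
```

Before scaling, the 490-employee difference barely changes the distance; after scaling it contributes as much as the revenue difference can.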

275

MCQeasy

A company wants to deploy an AI system that makes hiring decisions. To comply with ethical guidelines, what should they do before deployment?

A.Conduct an ethics review and perform bias testing on diverse datasets.

B.Ensure the system achieves high accuracy and ignore other metrics.

C.Deploy immediately and monitor for issues.

D.Test the system only on a small dataset to expedite launch.

AnswerA

Ethics review and bias testing are proactive measures.

Why this answer

Option A is correct because conducting an ethics review and performing bias testing on diverse datasets are essential steps to identify and mitigate potential discriminatory outcomes in AI-driven hiring systems. This aligns with ethical AI frameworks that require fairness, accountability, and transparency before deployment, ensuring the model does not perpetuate historical biases or violate anti-discrimination laws.

Exam trap

Salesforce often tests the misconception that high accuracy alone guarantees ethical AI, when in fact fairness metrics and bias testing are mandatory to prevent discriminatory outcomes in high-stakes applications like hiring.

How to eliminate wrong answers

Option B is wrong because prioritizing only high accuracy can mask harmful biases; a model may achieve high overall accuracy but still systematically discriminate against protected groups due to imbalanced data or proxy features. Option C is wrong because deploying immediately without prior testing violates ethical guidelines and can lead to real-world harm, legal liability, and loss of trust; monitoring alone cannot retroactively fix embedded biases. Option D is wrong because testing on a small dataset is insufficient to detect bias across diverse demographic groups and may lead to overfitting or failure to uncover edge cases, undermining the reliability and fairness of the system.
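One simple form of bias testing compares selection rates across groups. The sketch below uses invented counts and hypothetical group labels; the ratio of the lowest to the highest rate is a common screening heuristic (ratios well below 1 are often flagged for review), not a complete fairness audit and not something the exam question prescribes.

```python
def selection_rates(outcomes):
    """outcomes: list of (group, selected) pairs -> selection rate per group."""
    counts = {}
    for group, selected in outcomes:
        total, hits = counts.get(group, (0, 0))
        counts[group] = (total + 1, hits + int(selected))
    return {g: hits / total for g, (total, hits) in counts.items()}

def impact_ratio(rates):
    """Lowest selection rate divided by the highest one."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 0.0

# Invented model decisions for two hypothetical applicant groups.
outcomes = (
    [("group_a", True)] * 40 + [("group_a", False)] * 60
    + [("group_b", True)] * 20 + [("group_b", False)] * 80
)
rates = selection_rates(outcomes)
print(rates, impact_ratio(rates))
```

Here the model recommends 40% of one group and 20% of the other, a gap that overall accuracy alone would never reveal.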

276

MCQmedium

A Salesforce admin is preparing a dataset for Einstein Prediction Builder. The dataset contains a field "Income" with many missing values. The admin wants to minimize bias in the model. What is the best practice?

A.Delete all rows where Income is missing

B.Review the pattern of missingness and document reasons, then decide on imputation

C.Fill missing values with the mean of Income

D.Replace missing values with 0

AnswerB

Understanding why data is missing prevents bias from systematic exclusion or imputation.

Why this answer

Option B is correct because reviewing missingness patterns and documenting reasons helps uncover systemic biases before any imputation choice is made. Option A (delete rows) reduces the sample and may introduce selection bias; Option C (fill with mean) may distort relationships; Option D (replace with 0) is arbitrary and can skew results.
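Reviewing the pattern of missingness can start with something as small as a per-group missing rate. The sketch below uses invented rows and a hypothetical 'Segment' field; a rate that differs sharply between groups suggests the values are not missing at random.

```python
from collections import defaultdict

def missing_rate_by_group(rows, field, group_field):
    """Share of rows with a missing `field`, broken down by `group_field`."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += 1
        if row.get(field) is None:
            missing[row[group_field]] += 1
    return {group: missing[group] / totals[group] for group in totals}

# Invented records; 'Segment' is a hypothetical grouping field.
rows = [
    {"Segment": "Retail", "Income": 52000},
    {"Segment": "Retail", "Income": None},
    {"Segment": "Retail", "Income": None},
    {"Segment": "Enterprise", "Income": 91000},
    {"Segment": "Enterprise", "Income": 87000},
]
result = missing_rate_by_group(rows, "Income", "Segment")
print(result)
```

If Income is missing mostly for one segment, deleting those rows or filling them with a global mean would affect that segment disproportionately.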

277

MCQeasy

Refer to the exhibit. A data analyst runs a profile on a dataset and sees these statistics. Based on best practices, which action should be taken first?

A.Impute the 500 missing values with the mean

B.Remove the 200 duplicate records

C.Remove the 50 outliers in the Amount field

D.Skip all preprocessing and train the model directly

AnswerB

Duplicates can artificially inflate certain patterns and cause data leakage.

Why this answer

Option B is correct because duplicate records introduce bias and redundancy, leading to overfitting or skewed model performance. Removing duplicates is a standard first step in data preprocessing to ensure data integrity before handling missing values or outliers. In the context of the AI Associate exam, best practices prioritize deduplication early in the data cleaning pipeline.

Exam trap

Salesforce often tests the order of preprocessing steps, trapping candidates who jump to imputation or outlier removal without first cleaning duplicates, which is the foundational step in data preparation.

How to eliminate wrong answers

Option A is wrong because imputing missing values with the mean should only be considered after duplicates are removed, as duplicates can inflate the mean and distort imputation. Option C is wrong because removing outliers should be done after addressing duplicates and missing values, and only after understanding the domain context; premature outlier removal can discard legitimate data. Option D is wrong because skipping all preprocessing ignores fundamental data quality issues (missing values, duplicates, outliers) that degrade model accuracy and reliability, violating best practices for AI workflows.
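The claim that duplicates distort mean imputation can be checked with a toy dataset. The exhibit's statistics are not reproduced here, so the records below are invented; the point is only the order of operations.

```python
from statistics import mean

# Invented (record_id, amount) pairs; record 003 appears three times.
records = [
    ("001", 100.0), ("002", 120.0), ("003", 5000.0),
    ("003", 5000.0), ("003", 5000.0), ("004", None),
]

def dedupe(rows):
    """Drop exact duplicate rows while keeping the original order."""
    seen, unique = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def mean_amount(rows):
    """Mean of the non-missing amounts."""
    return mean(amount for _, amount in rows if amount is not None)

print(mean_amount(records))          # 3044.0, inflated by the duplicates
print(mean_amount(dedupe(records)))  # 1740.0, the value to impute with
```

Imputing record 004 before deduplication would fill it with 3044.0 instead of 1740.0.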

278

Multi-Selectmedium

Which TWO of the following are ethical considerations when deploying AI in Salesforce?

Select 2 answers

A.Maximizing model complexity for better accuracy

B.Providing transparency in AI-generated outcomes

C.Ensuring data privacy and compliance with regulations

D.Using only historical data without review for biases

AnswersB, C

Transparency helps users understand how decisions are made, building trust.

Why this answer

Option B is correct because transparency in AI-generated outcomes is a core ethical principle, especially in Salesforce's Einstein platform, where users must understand how predictions (e.g., lead scoring or opportunity insights) are made. Salesforce provides tools like 'Why This Prediction?' to explain model outputs, ensuring trust and accountability. Without transparency, users cannot validate or challenge AI decisions, leading to potential bias or misuse.

Exam trap

Salesforce often tests the misconception that maximizing accuracy (Option A) is always ethical, when in fact it can compromise interpretability and fairness, which are key to responsible AI deployment.

279

MCQmedium

A service organization wants to use Einstein Reply Recommendations to suggest responses to customer chats. The feature is enabled, but agents report that no recommendations appear. The admin has ensured the permission set is assigned and the chat data is flowing. What should the admin examine next?

A.The number of closed cases with successful chat histories.

B.The model training schedule.

C.The language settings for the chat channels.

D.The user's profile settings for chat.

AnswerC

Language support is a prerequisite for recommendations to appear.

Why this answer

Einstein Reply Recommendations require the chat channel language to be set to a supported language (e.g., English, Spanish, French, German, Portuguese, or Japanese). If the language setting is unsupported or mismatched, the model cannot generate suggestions, even if permissions and data flow are correct. This is a common misconfiguration because the feature silently fails when the language is not recognized.

Exam trap

Salesforce often tests the misconception that model training or data volume is the primary cause of missing recommendations, when in fact the language setting is a prerequisite that must be configured correctly before any recommendations can appear.

How to eliminate wrong answers

Option A is wrong because the number of closed cases with successful chat histories does not affect the real-time generation of reply recommendations; Einstein uses live chat transcripts and historical data for training, but the feature's immediate availability depends on language and model readiness, not case closure counts. Option B is wrong because the model training schedule determines when the model is updated, but recommendations can still appear immediately after initial training if the language is supported; a missing or unscheduled training would cause outdated or no suggestions, but the primary blocker here is language settings, not the training cadence. Option D is wrong because the user's profile settings for chat control agent interface permissions (e.g., ability to send messages), not the underlying Einstein model's ability to generate recommendations; the admin already confirmed the permission set is assigned, so profile settings are not the issue.

280

MCQhard

A Salesforce admin notices that Einstein Account Scoring is not generating scores for all accounts. Some accounts have no score even though they meet the data requirements. What is the most likely cause?

A.The org has fewer than 100 account records

B.Einstein features are not enabled in the org

C.Users do not have the 'View Scores' permission

D.The accounts lack custom fields for scoring

AnswerA

Einstein requires a minimum data volume (e.g., 100 records) to generate scores.

Why this answer

Einstein Account Scoring requires a minimum of 100 account records to generate meaningful scores. If the org has fewer than 100 accounts, the model cannot establish a baseline for scoring, resulting in no scores for any account. This is a hard threshold enforced by the Einstein AI engine, not a configurable setting.

Exam trap

The trap here is that candidates often assume missing scores are due to permission issues or missing custom fields, but the real constraint is a hard data volume minimum that the Einstein engine enforces silently.

How to eliminate wrong answers

Option B is wrong because if Einstein features were not enabled, no Einstein functionality would work at all, not just scoring for some accounts. Option C is wrong because the 'View Scores' permission controls user visibility of scores, not the generation of scores by the AI engine. Option D is wrong because Einstein Account Scoring uses standard fields and does not require custom fields; it leverages existing data like industry, revenue, and engagement history.

281

Multi-Selecteasy

Which TWO practices contribute to ethical AI transparency?

Select 2 answers

A.Using open-source algorithms.

B.Providing user-facing explanations.

C.Collecting sensitive demographic data.

D.Allowing users to opt out of AI processing.

E.Documenting model decisions.

AnswersB, E

Explanations help users understand how decisions are made.

Why this answer

Options B and E are correct. Providing user-facing explanations helps users understand AI behavior, and documenting model decisions provides an audit trail. Option A is wrong because open-source algorithms do not guarantee transparency about specific model decisions.

Option C is wrong because collecting sensitive data may violate privacy principles. Option D is wrong because opt-out is about user control, not transparency.

282

Multi-Selecteasy

A data scientist is preparing numeric features for a regression model. Which TWO transformations are commonly applied to improve model performance?

Select 2 answers

A.Normalize to a 0-1 range

B.Remove outliers beyond 3 standard deviations

C.Convert numbers to string labels

D.Apply one-hot encoding

E.Standardize to mean 0 and variance 1

AnswersA, E

Scales features to a common range, helpful for distance-based models.

Why this answer

Normalizing features to a 0-1 range (min-max scaling) puts all numeric features on a comparable scale, preventing features with larger magnitudes from dominating. Standardizing to mean 0 and variance 1 serves the same purpose. Scaling matters most for distance-based algorithms like k-nearest neighbors and for models trained with gradient descent, such as neural networks, where feature scale directly affects convergence speed and model accuracy.

Exam trap

Salesforce often tests the distinction between data cleaning (e.g., outlier removal) and feature transformation (e.g., scaling), leading candidates to mistakenly select outlier removal as a transformation that improves model performance.
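Standardization (option E) can be sketched with the standard library alone. The values below are made up, and `standardize` is an illustrative helper that uses the population standard deviation.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean 0 and variance 1 (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]

feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, std dev 2
z = standardize(feature)
print(z)
```

Unlike min-max scaling, the result is not bounded to 0-1; it is centered on zero with unit spread.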

283

Multi-Selecteasy

A Salesforce administrator deploys an Einstein Bot. Which TWO ethical considerations should be addressed? (Choose two.)

Select 2 answers

A.The bot should disclose it is an AI

B.The bot should mimic a human

C.The bot should make decisions autonomously

D.The bot should not collect personal data

E.The bot should escalate to a human when needed

AnswersA, E

Correct. Transparency requires the bot to identify itself as AI.

Why this answer

Option A is correct because ethical AI guidelines, including those from Salesforce, require that bots disclose their non-human identity to users. This transparency builds trust and ensures users are aware they are interacting with an AI, not a human, which is a core ethical principle in AI deployment.

Exam trap

Salesforce often tests the misconception that ethical AI means bots should never collect personal data, but the real ethical requirement is transparency and consent, not an absolute prohibition on data collection.

284

MCQhard

An admin reviews the Einstein service configuration JSON. Based on the exhibit, which statement is true?

A.Lead scoring is disabled for the org

B.Opportunities will not have Einstein scores

C.The lead scoring model retrains daily

D.The admin has not configured any scoring fields for leads

AnswerB

Opportunity scoring is disabled (false).

Why this answer

Option B is correct because the JSON exhibit shows that the Einstein Opportunity Scoring feature is disabled (the 'status' field is set to 'Disabled'), meaning opportunities will not have Einstein scores. This is determined by the configuration flag that controls whether Einstein scoring is active for opportunities in the org.

Exam trap

Salesforce often tests the distinction between lead scoring and opportunity scoring configurations, and candidates may mistakenly assume that if lead scoring is enabled, opportunity scoring must also be enabled, or overlook the specific 'status' field for each object in the JSON.

How to eliminate wrong answers

Option A is wrong because the JSON shows lead scoring is enabled (the 'leadScoring' object has 'status' set to 'Enabled'), so lead scoring is not disabled for the org. Option C is wrong because the JSON does not contain any field indicating a retraining schedule for the lead scoring model; the retraining frequency is not specified in this configuration snippet. Option D is wrong because the JSON includes a 'scoringFields' array under 'leadScoring' that lists specific fields like 'Industry' and 'AnnualRevenue', indicating the admin has configured scoring fields for leads.

285

MCQhard

A Salesforce admin has enabled Einstein Opportunity Scoring but notices that some opportunities are not being scored. The admin checks the data and finds that the opportunities have all required fields. What could be another reason for missing scores?

A.Opportunity team members have not been assigned

B.The org is on a Professional Edition

C.The org has fewer than 100 opportunities with activities

D.The opportunities are missing custom fields

AnswerC

Einstein requires sufficient data history to generate scores.

Why this answer

Einstein Opportunity Scoring requires a minimum of 100 opportunities with activities (e.g., emails, events, tasks) in the last 90 days to generate scores. Even if all required fields are present, the model needs sufficient historical data to calculate a predictive score. Option C correctly identifies this data volume threshold as the likely cause.

Exam trap

The trap here is that candidates assume missing scores are always due to missing fields or permissions, but Salesforce specifically requires a minimum data volume (100 opportunities with activities) for Einstein models to function.

How to eliminate wrong answers

Option A is wrong because opportunity team members are not a prerequisite for Einstein Opportunity Scoring; the model scores opportunities based on field values and activities, not team assignments. Option B is wrong because Einstein Opportunity Scoring is available on Professional Edition with the required add-on licenses, so edition alone does not prevent scoring. Option D is wrong because the admin already confirmed all required fields are present, and missing custom fields would not block scoring unless they are specifically mapped as required by the model, which is not the case here.

286

MCQmedium

An admin created a data stream to bring external customer data into Data Cloud for Einstein. The data stream fails with error 'Schema mismatch: expected 10 fields, got 8'. What is the likely cause?

A.The data flow has a filter that drops fields.

B.The target object has validation rules.

C.The source file has extra columns.

D.The data stream definition expects more fields than the source provides.

AnswerD

Directly matches the error: expected 10, got 8.

Why this answer

The error 'Schema mismatch: expected 10 fields, got 8' indicates that the data stream definition in Data Cloud is configured to map 10 fields from the source, but the actual source file or API response only provides 8 fields. This mismatch occurs when the schema defined in the data stream does not match the source schema, typically because the source has fewer columns than expected. Option D correctly identifies this as the likely cause.

Exam trap

Salesforce often tests the distinction between schema-level errors (field count mismatch) and data-level errors (validation rules, filters), leading candidates to confuse data flow operations with data stream schema definitions.

How to eliminate wrong answers

Option A is wrong because a filter in a data flow drops rows (records), not fields (columns), and the error explicitly mentions a field count mismatch, not a row count issue. Option B is wrong because validation rules on the target object would cause record-level failures during data insertion, not a schema mismatch error during the data stream definition or ingestion phase. Option C is wrong because extra columns in the source file would cause the error to report more fields than expected (e.g., 'expected 10, got 12'), not fewer.
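A pre-flight check on the source file can surface this kind of mismatch before ingestion. The sketch below is not a Data Cloud API; the field names, the `check_schema` helper, and the sample CSV are all hypothetical stand-ins for a 10-field definition and an 8-column source.

```python
import csv
import io

# Hypothetical field list standing in for the data stream definition.
EXPECTED_FIELDS = [
    "Id", "FirstName", "LastName", "Email", "Phone",
    "City", "Country", "Segment", "CreatedDate", "LastPurchaseDate",
]

def check_schema(csv_text, expected):
    """Compare a CSV header row against the expected field list."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return {
        "expected": len(expected),
        "got": len(header),
        "missing": [f for f in expected if f not in header],
        "unexpected": [f for f in header if f not in expected],
    }

# Made-up source extract with only 8 of the 10 columns.
source = (
    "Id,FirstName,LastName,Email,Phone,City,Country,Segment\n"
    "1,Ana,Silva,ana@example.com,555-0100,Lisbon,PT,Retail\n"
)
report = check_schema(source, EXPECTED_FIELDS)
print(report)
```

Listing the missing field names, rather than just the counts, shows whether the source or the definition needs to change.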

287

MCQmedium

During data transformation, a data scientist applies one-hot encoding to a categorical feature with 50 unique values. The resulting dataset has 50 new columns. What is a potential drawback of this transformation?

A.Reduction in training time

B.Increased interpretability of the model

C.High cardinality leading to sparse data and overfitting

D.Loss of ordinal information in categories

AnswerC

High cardinality creates many sparse columns, risking overfitting.

Why this answer

One-hot encoding a categorical feature with 50 unique values creates 50 binary columns, each representing one category. This high cardinality leads to a very sparse matrix (most entries are 0), which can cause the model to overfit by learning noise from rare categories, especially when the dataset is not large enough to support such dimensionality.

Exam trap

Salesforce often tests the misconception that one-hot encoding always improves model performance by preserving all information, when in fact high cardinality introduces sparsity and overfitting risks that can degrade model accuracy.

How to eliminate wrong answers

Option A is wrong because one-hot encoding increases the number of features, which typically increases training time due to higher dimensionality rather than reducing it. Option B is wrong because adding 50 new binary columns reduces interpretability; the model becomes more complex and harder to explain, especially with many dummy variables. Option D is wrong because one-hot encoding is designed for nominal (unordered) categories; a nominal feature has no inherent order, so there is no ordinal information to lose.
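The sparsity is easy to quantify. The sketch below one-hot encodes a synthetic column with 50 categories across 200 rows (both numbers are assumptions for illustration) and measures the share of zero cells.

```python
def one_hot(values):
    """Return the sorted category list and one binary row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows

# Synthetic column: 200 rows cycling through 50 category labels.
values = [f"category_{i % 50}" for i in range(200)]
categories, encoded = one_hot(values)

zeros = sum(cell == 0 for row in encoded for cell in row)
sparsity = zeros / (len(encoded) * len(categories))
print(len(categories), sparsity)  # 50 0.98
```

With 98% of cells equal to zero and only four rows per category, each new column carries very little signal for the model to learn from.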

288

MCQhard

A global company needs to ensure that customer data used for AI models complies with multiple regional regulations (GDPR, CCPA, LGPD). Which data governance practice is most effective?

A.Apply the strictest regulation globally.

B.Use a unified data catalog with tagging and classification.

C.Store all data in a single data warehouse.

D.Allow each region to manage its own data separately.

AnswerB

A data catalog helps track data lineage, apply policies, and ensure compliance per region.

Why this answer

Option B is correct because a unified data catalog with tagging and classification allows the organization to manage data governance and compliance across regions consistently.

289

MCQeasy

To ensure AI model fairness and avoid biased outcomes, which practice is most critical when preparing training data?

A.Use only recent data

B.Increase model complexity

C.Use balanced training data

D.Use more features

AnswerC

Balanced data reduces bias towards any group.

Why this answer

Option C is correct because using balanced training data across different groups helps prevent bias. Option A is wrong because using only recent data may not represent all demographics. Option B is wrong because increasing model complexity may overfit.

Option D is wrong because adding more features can introduce bias.

290

MCQmedium

A retail company uses Einstein Prediction Service to forecast customer churn. To improve model accuracy, which data preparation step is most critical?

A.Select only the top three features based on correlation.

B.Clean the dataset by handling missing values and outliers.

C.Use a different algorithm like neural networks.

D.Increase the dataset size by collecting more customer records.

AnswerB

Proper data cleaning ensures the model learns accurate patterns.

Why this answer

Handling missing values and outliers is the most critical data preparation step for Einstein Prediction Service because the underlying gradient boosting models (like XGBoost) are sensitive to data quality issues. Missing values can introduce bias or cause the model to misinterpret patterns, while outliers can disproportionately influence split decisions, reducing predictive accuracy for churn scenarios.

Exam trap

Salesforce often tests the misconception that feature selection or algorithm changes are the primary levers for accuracy, when in reality data cleaning is the foundational step that directly impacts model reliability in Einstein Prediction Service.

How to eliminate wrong answers

Option A is wrong because selecting only the top three features based on correlation ignores feature interactions and non-linear relationships that Einstein's ensemble methods rely on; it also risks discarding weakly correlated but collectively predictive features. Option C is wrong because the question asks about data preparation, not algorithm selection; changing the algorithm does not address data quality issues and Einstein Prediction Service already uses optimized algorithms (e.g., gradient boosting) that require clean input. Option D is wrong because simply increasing dataset size without cleaning existing data amplifies noise and bias; more records with missing values or outliers degrade model performance rather than improve accuracy.
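Outlier handling often starts with flagging rather than deleting. The sketch below applies the common 1.5 × IQR rule to made-up values; the threshold and the `iqr_outliers` helper are conventions chosen for this example, not settings taken from Einstein Prediction Service.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up monthly order counts with one extreme value.
orders = [10, 12, 11, 13, 12, 11, 300]
outliers = iqr_outliers(orders)
print(outliers)
```

Flagged values still need a domain check, since an extreme but legitimate customer may be exactly the pattern a churn model should learn.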

291

MCQeasy

In Salesforce Data Cloud, which AI capability is used to automatically generate audience segments based on customer behavior patterns?

A.Prompt Builder

B.Einstein AI (Data Cloud's machine learning capabilities)

C.Einstein Discovery

D.Einstein GPT

AnswerB

Data Cloud uses Einstein AI to analyze customer data and automatically create predictive segments.

Why this answer

Einstein AI in Salesforce Data Cloud provides machine learning capabilities that automatically analyze customer behavior patterns, such as purchase history and engagement metrics, to generate predictive audience segments. This enables marketers to target specific groups without manual rule creation, leveraging Data Cloud's unified data model and Einstein's propensity scoring.

Exam trap

Salesforce often tests the distinction between generative AI tools (like Einstein GPT or Prompt Builder) and predictive machine learning capabilities (like Einstein AI), leading candidates to confuse content generation with automated segmentation.

How to eliminate wrong answers

Option A is wrong because Prompt Builder is a tool for creating and managing prompts for generative AI models, not for automatically generating audience segments based on behavior patterns. Option C is wrong because Einstein Discovery is a separate analytics tool focused on identifying trends and root causes in data, not on generating audience segments from customer behavior. Option D is wrong because Einstein GPT is a generative AI assistant for content creation and summarization, not a machine learning engine for segment generation.

292

Multi-Selectmedium

A company wants to ensure their AI model complies with ethical guidelines. Which TWO actions are essential? (Choose two.)

Select 2 answers

A.Avoid transparency

B.Provide human oversight

C.Use the most complex model

D.Automate all decisions

E.Document model decisions

AnswersB, E

Correct. Human oversight ensures decisions can be reviewed and overridden.

Why this answer

Human oversight (Option B) is essential because it ensures that AI decisions can be reviewed, overridden, or corrected by a person, which is a core requirement of ethical AI frameworks such as the EU AI Act and NIST AI Risk Management Framework. This oversight helps catch biased outputs, edge cases, or harmful actions that the model might produce, maintaining accountability and safety.

Exam trap

Salesforce often tests the misconception that 'automation' is always the goal of AI, but the trap here is that ethical guidelines require human oversight and documentation, not full automation or complexity.

293

Multi-Selecthard

Which TWO components are essential for an AI ethics governance framework?

Select 2 answers

A.Using the most recent algorithms

B.Maximizing data collection

C.Assigning an ethics officer

D.Conducting regular audits

E.Establishing a code of ethics

AnswersD, E

Ensures ongoing compliance.

Why this answer

Option E is correct because a code of ethics provides foundational principles. Option D is correct because regular audits ensure ongoing compliance. Option C is wrong because, while an ethics officer is helpful, the role is not always considered essential.

Option A is wrong because using the latest algorithms is not an ethics component. Option B is wrong because maximizing data collection contradicts ethical principles like privacy.

294

MCQmedium

An AI system used for medical diagnosis occasionally produces incorrect results. A doctor notices the errors but continues using the system without reporting them. Which ethical principle is primarily at risk?

A.Fairness

B.Transparency

C.Privacy

D.Accountability

AnswerD

Healthcare professionals are accountable for AI-assisted decisions and must report errors.

Why this answer

Option D is correct: accountability means humans must oversee AI decisions and report issues. Option A is wrong because fairness is about bias. Option B is wrong because transparency is about disclosure.

Option C is wrong because privacy is about data protection.

295

MCQmedium

A company deploys an AI system that recommends loan amounts. They want to ensure explainability. Which approach best aligns with ethical AI?

A.Allow the system to update its features dynamically without documentation.

B.Restrict the system's use to only internal employees.

C.Provide loan recommendations with a detailed rationale.

D.Use a black-box neural network for highest accuracy.

AnswerC

Detailed rationale enables users to understand decisions.

Why this answer

Option C is correct because providing a detailed rationale supports explainability and transparency. Option A is wrong because undocumented changes undermine accountability. Option B is wrong because restricting use does not ensure explainability.

Option D is wrong because black-box models lack interpretability.

296

MCQmedium

Refer to the exhibit. An admin built a prediction model for case closure within 24 hours. The model accuracy is 72% with 500 training records. Which change would most likely improve accuracy?

A.Change the outcome to 'Escalated'

B.Increase the training sample size to 5000 records

C.Remove the 'Subject' field from the model

D.Add more fields like 'Comments'

AnswerB

More data typically improves accuracy.

Why this answer

Increasing the training sample size from 500 to 5000 records provides the model with more data to learn patterns from, which reduces overfitting and improves generalization. In CRM AI models, larger datasets typically lead to higher accuracy because the algorithm can better capture underlying relationships without being skewed by noise in a small sample.

Exam trap

Salesforce often tests the misconception that adding more fields always improves accuracy, when in reality, irrelevant or noisy features can degrade performance, while increasing sample size is a more reliable method to boost model accuracy.

How to eliminate wrong answers

Option A is wrong because changing the outcome to 'Escalated' alters the target variable entirely, which would require retraining a new model for a different prediction task and does not address the accuracy of the original case closure model. Option C is wrong because removing the 'Subject' field may discard valuable textual features that could help predict closure time; unless the field is irrelevant or noisy, reducing features often harms model performance. Option D is wrong because adding more fields like 'Comments' without ensuring data quality or relevance can introduce noise and increase dimensionality, potentially degrading accuracy rather than improving it.
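The point about noise in a small sample can be illustrated with a minimal sketch. It assumes the records are independent and uses the standard error of a proportion, which shrinks with the square root of the record count. The function name `standard_error` and the use of 0.72 as the example rate are illustrative choices, not part of any Salesforce tool, and the sketch does not guarantee that a given model's accuracy will rise.

```python
import math


def standard_error(rate: float, n: int) -> float:
    """Standard error of a proportion estimated from n independent records."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(rate * (1.0 - rate) / n)


# Compare the sampling noise at the two record counts from the question.
# The rate 0.72 is used purely for illustration.
for n in (500, 5000):
    print(f"n={n}: standard error = {standard_error(0.72, n):.4f}")
```

Ten times as many records cut the sampling noise by a factor of about 3.16 (the square root of 10), which is one reason patterns learned from 5000 records tend to be more stable than those learned from 500.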

297

Multi-Selecteasy

Which TWO of the following are key principles of trustworthy AI according to Salesforce's AI ethics guidelines?

Select 2 answers

A.Profitability

B.Complexity

C.Transparency

D.Explainability

E.Accountability

AnswersC, D

Transparency ensures users understand how AI decisions are made.

Why this answer

Transparency is a core principle of trustworthy AI because it requires that AI systems operate in a way that is open and understandable, allowing stakeholders to see how decisions are made. Salesforce's AI ethics guidelines emphasize transparency to ensure that users can trust the system's outputs and that any biases or limitations are visible.

Exam trap

Salesforce often tests the distinction between 'accountability' (a broader governance concept) and 'explainability' (a specific technical principle), leading candidates to mistakenly select accountability when the question explicitly asks for the two key principles from Salesforce's guidelines.

298

MCQhard

Refer to the exhibit. A Salesforce admin runs the Einstein model list command and sees the output. Which model is currently available for use in predictive scoring?

A.All models are available for scoring.

B.Lead Score Model

C.Case Deflection

D.Opportunity Win Rate

AnswerB

The status is 'Deployed', meaning it is ready for scoring.

Why this answer

Option B is correct because only the model with status 'Deployed' is ready for production use. Option A is incorrect because not all models are available. Options C and D are incorrect because those models show a status other than 'Deployed': 'Training' means the model is not yet ready, and 'Error' indicates a problem preventing deployment.
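The status check described above can be sketched in a few lines. This is a hypothetical illustration, not the actual command output or any Salesforce API: the `ModelEntry` type and `scoring_ready` function are invented here, and because the exhibit is not reproduced, the 'Training' and 'Error' statuses are assigned to the two non-deployed models arbitrarily.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One row of a hypothetical model listing."""
    name: str
    status: str


def scoring_ready(models: list[ModelEntry]) -> list[str]:
    """Return the names of models whose status is exactly 'Deployed'."""
    return [m.name for m in models if m.status == "Deployed"]


# Assumed statuses: only 'Lead Score Model' being deployed comes from the answer;
# which of the other two is 'Training' versus 'Error' is a placeholder.
models = [
    ModelEntry("Lead Score Model", "Deployed"),
    ModelEntry("Case Deflection", "Training"),
    ModelEntry("Opportunity Win Rate", "Error"),
]

print(scoring_ready(models))
```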

299

MCQhard

An organization is deploying an AI system for loan decisions. They want to ensure human oversight. Which is the best implementation?

A.The system operates fully autonomously but logs decisions for audit.

B.The system makes decisions automatically, with post-hoc review only for high-value loans.

C.The system provides recommendations, and a human must approve all decisions.

D.The system only flags edge cases for human review.

AnswerD

Edge-case review balances efficiency with oversight.

Why this answer

Option D is correct because flagging edge cases for human review efficiently focuses oversight on the most uncertain or risky decisions. Option A is wrong because full autonomy reduces human involvement, even when decisions are logged. Option B is wrong because post-hoc review of only high-value loans may miss issues in other cases. Option C is wrong because, while thorough, it may be too slow for low-risk decisions.
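One way to picture edge-case flagging is a simple routing rule. The sketch below is an assumption-laden illustration: the `route_decision` function, the confidence score, and the 0.8 threshold are all hypothetical and would need to be set by the organization's own risk policy.

```python
def route_decision(confidence: float, threshold: float = 0.8) -> str:
    """Send low-confidence (edge-case) decisions to a human reviewer.

    `confidence` is the model's own score in [0, 1]; `threshold` is a
    hypothetical policy setting, not a recommended value.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return "auto" if confidence >= threshold else "human_review"


# Example: a confident decision proceeds, an uncertain one is flagged.
for score in (0.95, 0.55):
    print(score, route_decision(score))
```

In this arrangement humans see only the uncertain cases, which is the trade-off the explanation describes: oversight is concentrated where the model is least reliable.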

300

MCQhard

A credit scoring company develops an AI model that includes social media activity as a factor. The model awards higher scores to individuals with many online connections and consistent posting. Consumer advocates argue that this penalizes individuals with limited internet access or those who value privacy. The company defends the model, stating that it predicts creditworthiness better than traditional models. However, a regulatory body is investigating potential discrimination. The company wants to address ethical concerns without completely abandoning the model. Which approach is most appropriate?

A.Remove social media data from the model immediately.

B.Increase the weight of traditional factors like income and payment history.

C.Conduct a thorough analysis to determine whether social media activity is a legitimate, non-discriminatory predictor of creditworthiness.

D.Continue using the current model but offer an alternative traditional scoring option.

AnswerC

Validating the factor's relevance and fairness ensures the model is both ethical and effective.

Why this answer

Option C is correct because validating the relevance of social media data through rigorous analysis ensures that the factor is both fair and predictive. Option A abandons an innovative feature without evidence of harm. Option B may not be enough if the feature is irrelevant. Option D is too narrow.
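Part of such an analysis could be a disparate impact check, sketched below under stated assumptions. The group labels and approval rates are made-up illustration data, the `disparate_impact_ratio` function is hypothetical, and the 0.8 cutoff borrows the "four-fifths" rule of thumb from employment-selection guidance as a screening heuristic rather than a legal test for credit scoring.

```python
def disparate_impact_ratio(approval_rates: dict[str, float]) -> float:
    """Ratio of the lowest group approval rate to the highest."""
    if not approval_rates:
        raise ValueError("at least one group is required")
    highest = max(approval_rates.values())
    if highest == 0:
        raise ValueError("highest approval rate is zero; ratio is undefined")
    return min(approval_rates.values()) / highest


# Hypothetical approval rates for two groups of applicants.
rates = {
    "high_social_media_activity": 0.60,
    "low_social_media_activity": 0.42,
}

ratio = disparate_impact_ratio(rates)
flagged = ratio < 0.8  # heuristic cutoff, assumed for illustration
print(f"ratio={ratio:.2f}, flagged for review={flagged}")
```

A flagged ratio would not settle the question by itself; it would only indicate that the factor deserves the closer scrutiny the correct answer calls for.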
