AI Associate Data for AI — All Questions With Answers

Question 1 (easy, multiple choice)

A company wants to use Einstein Prediction Builder to predict customer churn. Which data preparation step is essential before building the model?

Question 2 (medium, multiple choice)

A data scientist needs to prepare data for Einstein Discovery. The dataset includes a field 'Customer_Status__c' with values 'Active', 'Inactive', and 'Churned'. How should this field be treated?

Question 3 (hard, multiple choice)

A company uses Salesforce Data Cloud to unify customer data from multiple sources. After connecting a data stream, they notice that records are missing from the unified profile. What is the most likely cause?

Question 4 (easy, multiple choice)

A Salesforce admin is training an Einstein Bot to answer customer questions. Which data source should the bot use to provide accurate responses?

Question 5 (medium, multiple choice)

A company uses Einstein Discovery to identify factors that increase case resolution time. After training, the model shows that 'Case_Origin__c' has high importance. What action should the company take?

Question 6 (hard, multiple choice)

A company has set up Einstein Next Best Action with a recommendation strategy. They want to ensure that recommendations are personalized based on the customer's recent behavior. What data should be used?

Question 7 (easy, multiple choice)

A company wants to use Einstein Article Recommendations to suggest knowledge articles to support agents. What is a prerequisite for this feature?

Question 8 (medium, multi-select)

Which TWO actions are required to prepare data for an Einstein Discovery model?

Question 9 (hard, multi-select)

Which THREE factors should be considered when evaluating the quality of a dataset for an AI model?

Question 10 (medium, multi-select)

Which TWO data sources can be used with Einstein Prediction Builder?

Question 11 (medium, multiple choice)

A company uses Salesforce Data Platform to store customer data. They want to use this data to train an AI model for lead scoring, but they are concerned about data quality. Which step should they take first to ensure the data is suitable for AI?

Question 12 (hard, multiple choice)

A data scientist is building a predictive model for customer churn using Salesforce data. The dataset has 20 features, and the target variable is highly imbalanced (5% churn, 95% non-churn). Which technique should be applied to handle the class imbalance before training?

Question 13 (easy, multiple choice)

An administrator is configuring a Salesforce AI model that uses historical sales data. The data includes fields like 'Amount', 'Close_Date', and 'Lead_Source'. What is the primary purpose of data preprocessing in this context?

Question 14 (medium, multiple choice)

A company is deploying an AI model that recommends next best actions for sales reps. They notice that the model's recommendations are biased towards high-revenue opportunities. Which data-related action can help reduce this bias?

Question 15 (easy, multiple choice)

A Salesforce admin wants to use Einstein Prediction Builder to predict case resolution time. What type of data is most critical for training this model?

Question 16 (medium, multiple choice)

During the data preparation phase for an AI model, a data engineer discovers that the 'AnnualRevenue' field contains some negative values. What is the best course of action?

Question 17 (hard, multi-select)

Which TWO techniques are commonly used to handle missing values in a dataset for AI training?

Question 18 (medium, multi-select)

Which THREE factors should be considered when selecting features for a predictive model in Salesforce?

Question 19 (easy, multi-select)

Which TWO are common data quality issues that can negatively impact AI model performance?

Question 20 (easy, multiple choice)

A company wants to use its data from Salesforce to train an Einstein AI model. However, they need to exclude records where the customer has opted out of data use. Which field should they configure in the Data Manager?

Question 21 (medium, multiple choice)

A Salesforce admin is troubleshooting an Einstein Prediction Builder model that is not generating predictions. The model was created with a custom object 'Feedback__c'. The admin notices that the model's data source includes records with status 'In Progress' and 'Closed'. What is the most likely cause of the model not generating predictions?

Question 22 (hard, multiple choice)

A large enterprise is using Einstein Lead Scoring and notices that the model score is not updating for leads created via a web-to-lead form. The leads have all required fields populated. The admin has verified that the model is active and the data source includes the Lead object. What could be causing the score to remain static?

Question 23 (easy, multiple choice)

A company wants to use Einstein Article Recommendations to surface relevant knowledge articles to its support agents. What two data components are required to set up this feature?

Question 24 (medium, multiple choice)

An admin is configuring Einstein Vision and wants to train a model to identify product defects from images. The admin has uploaded 500 images of defective products and 500 images of non-defective products. However, the model training fails with an error about data quality. What is the most likely cause?

Question 25 (hard, multiple choice)

A company is using Einstein Discovery to predict customer churn. The model was created six months ago and has been making predictions. Recently, the model's accuracy has dropped significantly. The data scientist confirms that the data schema has not changed. What is the most likely reason for the drop in accuracy?

Question 26 (medium, multi-select)

A company is implementing Einstein Prediction Builder to predict whether a support case will escalate. Which TWO data preparation steps should the admin take to improve model accuracy?

Question 27 (hard, multiple choice)

You are a Salesforce AI Specialist at a mid-sized manufacturing company. The company uses Einstein Lead Scoring to prioritize leads. The model was trained on historical lead data and has been in production for three months. Recently, the sales team reports that high-scoring leads are not converting as expected. You investigate and find that the model's data source includes leads from the past 18 months. However, six months ago, the company changed its lead qualification process: they started requiring a demo before scoring leads as 'qualified.' As a result, the definition of a converted lead changed. What is the best course of action to improve model performance?

Question 28 (medium, multiple choice)

You are an admin at a financial services firm. The firm wants to use Einstein Next Best Action to offer personalized product recommendations to customers on its service portal. The data includes customer profiles, transaction history, and support case history. The Einstein Next Best Action strategy is configured with a recommendation that shows a 'Savings Account' offer to customers who have a checking account. However, the recommendation is not appearing for any customers. You check the Data Flow and see that the 'Account' object data is flowing correctly. The recommendation's filter condition is: AND( Has_Checking_Account__c = true, Age__c > 18 ). You verify that many customers meet these conditions. What is the most likely reason the recommendation is not appearing?

Question 29 (easy, multiple choice)

You are a Salesforce admin at a nonprofit organization. The organization uses Einstein Engagement Scoring to prioritize donors for outreach. The model is based on donation history and event attendance. Recently, the model stopped generating new scores for recently added donors. You check the data source and see that the model's data includes the 'Contact' and 'Opportunity' objects. The data refresh is scheduled daily. The model status is 'Active'. What should you investigate first to resolve the issue?

Question 30 (hard, multiple choice)

You are a data scientist at a retail company. The company uses Einstein Discovery to analyze customer purchase patterns. The model is built on a dataset of 50,000 transactions. The model's R-squared is 0.85, but the predictions for new customers are consistently off by a large margin. The data includes features like 'Customer Age', 'Income', 'Previous Purchases', and 'Product Category'. The model was trained on data from the past two years. However, six months ago, the company launched a new loyalty program that significantly changed purchasing behavior. You suspect the model is not generalizing to new customers. What should you do to validate your hypothesis?

Question 31 (medium, multiple choice)

A company is preparing customer data for a predictive model. They notice that many records have missing values for the 'annual income' field. Which approach is best to handle this issue while minimizing bias?

Question 32 (hard, multiple choice)

A team is labeling text data for a sentiment analysis model. To ensure consistency and quality, which practice should they prioritize?

Question 33 (easy, multiple choice)

For a real-time AI application that requires low-latency access to customer interaction data, which storage solution is most appropriate?

Question 34 (medium, multiple choice)

A company wants to use customer purchase history to train a recommendation model. Which action is essential to comply with data privacy regulations?

Question 35 (hard, multiple choice)

A data pipeline fails intermittently when processing large CSV files. The error log shows 'OutOfMemoryError'. Which configuration change is most likely to resolve this?

Question 36 (easy, multiple choice)

Which data transformation is most appropriate for converting categorical variables into numerical format for a machine learning model?

Question 37 (medium, multiple choice)

A dataset contains a 'date' column. Which feature engineering technique would best capture both long-term trends and seasonal patterns?

Question 38 (easy, multiple choice)

Which method is most suitable for ingesting streaming data from IoT sensors into a data lake?

Question 39 (hard, multiple choice)

A global company needs to ensure that customer data used for AI models complies with multiple regional regulations (GDPR, CCPA, LGPD). Which data governance practice is most effective?

Question 40 (medium, multi-select)

Which TWO data preparation steps are critical for ensuring high-quality training data?

Question 41 (hard, multi-select)

Which THREE are key dimensions of data quality that directly impact AI model performance?

Question 42 (easy, multi-select)

Which TWO considerations are important when labeling data for a supervised learning model?

Question 43 (medium, multiple choice)

Refer to the exhibit. A data access policy is defined for a customer data set. Which statement best describes this policy?

Exhibit

{
  "policy": {
    "resource": "customer_data",
    "action": "read",
    "conditions": [
      {"field": "region", "operator": "eq", "value": "EU"}
    ],
    "masking": {
      "fields": ["email", "phone"],
      "method": "partial"
    }
  }
}

Question 44 (hard, multiple choice)

Refer to the exhibit. The data pipeline is failing. What is the most likely cause?

Exhibit

2023-10-01 12:00:01 ERROR [DataPipeline] com.salesforce.datalake.pipeline.TransformException: Field 'account_id' not found in schema. Expected String, got null.
2023-10-01 12:00:02 INFO [DataPipeline] Retrying task 3/3 after 5000ms.

Question 45 (easy, multiple choice)

Refer to the exhibit. A developer runs a SOQL query. What does the output indicate?

Exhibit

[Exhibit not available in the source; only the image label "Network Topology" survives.]

Question 46 (medium, multiple choice)

A data engineer is troubleshooting a predictive model that stopped updating. The data flow from Data Cloud shows 'Data Transform Failed' with error: 'Field Amount cannot be null'. What is the most likely cause?

Question 47 (easy, multiple choice)

A company is preparing data for Einstein Article Recommendation. Which data source is most appropriate for training the model?

Question 48 (hard, multiple choice)

A retail company uses Einstein Next Best Action with customer data from Data Cloud. The recommendations are not personalized. The admin checks the data quality dashboard and finds that the 'Customer_Profile' object has 40% records with missing 'PreferredChannel' field. What is the best course of action?

Question 49 (medium, multiple choice)

An admin created a data stream to bring external customer data into Data Cloud for Einstein. The data stream fails with error 'Schema mismatch: expected 10 fields, got 8'. What is the likely cause?

Question 50 (easy, multiple choice)

A company wants to use Einstein Reply Recommendations in Service Cloud. What data is required to train the model?

Question 51 (hard, multiple choice)

A data architect is designing a data model for Einstein Discovery. The data includes categorical variables with high cardinality (e.g., postal codes). What is the best practice to handle such features?

Question 52 (medium, multiple choice)

A company uses Einstein Prediction Builder to predict customer churn. The model's accuracy is low. The admin reviews the training data and notices that only 2% of records are churned. What should the admin do to improve the model?

Question 53 (hard, multiple choice)

A system administrator receives an error when running a Data Cloud data transform: 'Row-level security settings are preventing access to the source data.' The admin has appropriate permissions. What is the most likely cause?

Question 54 (easy, multiple choice)

A marketer wants to use Einstein Segment Creation to build a segment for a campaign. Which data source can be used?

Question 55 (medium, multi-select)

A data analyst is evaluating data quality for an Einstein model. Which TWO dimensions are most critical for model accuracy?

Question 56 (easy, multi-select)

A company is ingesting data from multiple sources into Data Cloud for Einstein. Which THREE data preparation steps should be performed?

Question 57 (hard, multi-select)

A data scientist is using Einstein Discovery to analyze sales data. The model results show a high correlation between two predictor variables. Which TWO actions should the data scientist take?

Question 58 (medium, multiple choice)

Refer to the exhibit. What effect does this masking policy have on the data used for training an Einstein model?

Exhibit

{
  "maskingPolicy": {
    "name": "MaskSSN",
    "fields": ["SSN", "CreditCard"],
    "maskingType": "partial",
    "character": "*",
    "showLastFour": true
  }
}
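
Illustration (not part of the original question): a minimal Python sketch of what partial masking with the last four characters visible could look like. The function name `mask_keep_last_four` is hypothetical, and the exact behavior of the policy engine in the exhibit is an assumption; only the `character` and `showLastFour` settings are taken from the exhibit.

```python
def mask_keep_last_four(value: str, character: str = "*") -> str:
    """Replace every character except the last four with the mask character.

    Assumption: punctuation such as dashes is masked as well; a real
    masking engine may treat separators differently.
    """
    if len(value) <= 4:
        return value  # nothing left to hide once four characters are shown
    return character * (len(value) - 4) + value[-4:]


masked = mask_keep_last_four("123-45-6789")
print(masked)
```

Under this reading, a model trained on the masked field sees only the trailing four characters of each value, so the field no longer identifies a person uniquely.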

Question 59 (hard, multiple choice)

Refer to the exhibit. What is the most likely cause of this error?

Exhibit

2025-03-01 14:32:15 ERROR [DataTransformRunner] Transform failed: java.lang.ArithmeticException: / by zero
at com.salesforce.dc.datatransform.FormulaEvaluator.processField(FormulaEvaluator.java:256)

Question 60 (easy, multiple choice)

Refer to the exhibit. What data quality issue does the exhibit reveal?

Exhibit

SELECT Campaign__c, COUNT(*) as Records, COUNT(Response__c) as NonNullResponse
FROM UnifiedCustomer
GROUP BY Campaign__c

Results:
Campaign__c    | Records | NonNullResponse
Summer Sale    | 1000    | 750
Spring Promo   | 800     | 800
Fall Clearance | 600     | 0
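
Illustration (not part of the original question): the ratio of non-null responses to total records in the results above can be computed per campaign. This is a small Python sketch using only the figures from the exhibit; the variable names are hypothetical.

```python
# (campaign, total records, records with a non-null Response__c), from the exhibit
rows = [
    ("Summer Sale", 1000, 750),
    ("Spring Promo", 800, 800),
    ("Fall Clearance", 600, 0),
]

# Share of records in each campaign that have a populated Response__c
completeness = {name: non_null / records for name, records, non_null in rows}

for name, ratio in completeness.items():
    print(f"{name}: {ratio:.0%} populated")
```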

Question 61 (medium, multiple choice)

A company is preparing data for Einstein Prediction Builder to forecast lead conversion. They have historical data with fields like Lead Source, Industry, Number of Employees, and Converted (boolean). Which data preparation step is most critical?

Question 62 (hard, multiple choice)

A data scientist notices that an Einstein model for predicting customer churn has unusually high accuracy on training data but performs poorly on validation data. Which data issue is the most likely cause?

Question 63 (easy, multiple choice)

A company wants to build a sentiment analysis model using customer feedback. What is the best practice for labeling the training data?

Question 64 (medium, multiple choice)

A large enterprise needs to integrate data from Salesforce CRM, an external ERP, and marketing automation to train an AI model for cross-sell recommendations. Which data storage strategy is most aligned with Salesforce's AI capabilities?

Question 65 (hard, multiple choice)

A company is using customer support tickets to train a model for auto-classifying issues. The dataset includes fields like 'Case Title', 'Description', 'Product', and 'Customer Name'. Which privacy concern is most critical to address before training?

Question 66 (easy, multiple choice)

A fraud detection model is being trained on transaction data where only 1% of transactions are fraudulent. The current model predicts 'non-fraud' for all transactions, achieving 99% accuracy. Which technique should be applied to improve model performance?
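
Illustration (not part of the original question): the scenario can be reproduced with a few lines of Python. The dataset size of 10,000 is an assumption chosen for round numbers; only the 1% fraud rate and the always-'non-fraud' behavior come from the question.

```python
# Assumed sample: 10,000 transactions, 1% fraudulent (1 = fraud, 0 = non-fraud)
labels = [1] * 100 + [0] * 9900
predictions = [0] * len(labels)  # a model that always predicts non-fraud


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model identified."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(labels, predictions)
rec = recall(labels, predictions)
print(f"accuracy={acc:.2f}, recall on fraud={rec:.2f}")
```

Accuracy alone hides the fact that no fraudulent transaction is ever caught, which is why a per-class metric such as recall is usually inspected alongside it on imbalanced data.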

Question 67 (medium, multiple choice)

After applying a log transformation to a numeric feature, an Einstein model’s performance dropped significantly. What is the most likely cause?

Question 68 (hard, multiple choice)

A bank uses Einstein Discovery to generate insights about loan approval decisions. After deployment, they notice the model denies loans to a higher percentage of applicants from a certain postal code. Which action should be taken to ensure responsible AI?

Question 69 (easy, multiple choice)

A company plans to use Einstein Discovery to analyze sales data. Which data preparation step is essential for time-series forecasting?

Question 70 (medium, multi-select)

A company is training a customer service chatbot using historical conversation logs. Which TWO data preparation practices should be followed to ensure data quality?

Question 71 (hard, multi-select)

Before training an Einstein Prediction model, a data analyst must perform data quality checks. Which THREE checks are most critical?

Question 72 (easy, multi-select)

A data scientist is preparing numeric features for a regression model. Which TWO transformations are commonly applied to improve model performance?

Question 73 (hard, multiple choice)

Refer to the exhibit. A data analyst has defined this field mapping for Einstein Prediction Builder. Which data issue would most likely arise from this mapping?

Exhibit

{
  "fieldMapping": [
    {"sourceField": "Id", "targetType": "Text"},
    {"sourceField": "AccountName", "targetType": "Text"},
    {"sourceField": "CloseDate", "targetType": "Date"},
    {"sourceField": "Amount", "targetType": "Number"},
    {"sourceField": "LeadSource", "targetType": "Category"}
  ]
}

Question 74 (medium, multiple choice)

Refer to the exhibit. A data file for click-through model training has the above content. Which data quality issue is most critical to address before training?

Exhibit

Date,Clicks,Conversions
2023-01-01,100,10
01/02/2023,150,15
2023-03-01,200,
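
Illustration (not part of the original question): a short Python sketch of how the file above could be checked before training. It assumes ISO 8601 (`YYYY-MM-DD`) is the intended date format, which the exhibit does not state; the issue labels are hypothetical.

```python
import csv
import io
from datetime import datetime

raw = """Date,Clicks,Conversions
2023-01-01,100,10
01/02/2023,150,15
2023-03-01,200,
"""

issues = []
# start=2 because line 1 of the file is the header row
for line_no, row in enumerate(csv.DictReader(io.StringIO(raw)), start=2):
    try:
        datetime.strptime(row["Date"], "%Y-%m-%d")
    except ValueError:
        issues.append((line_no, "Date", "not in YYYY-MM-DD format"))
    if row["Conversions"] == "":
        issues.append((line_no, "Conversions", "missing value"))

for issue in issues:
    print(issue)
```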

Question 75 (easy, multiple choice)

Refer to the exhibit. A data analyst runs a profile on a dataset and sees these statistics. Based on best practices, which action should be taken first?

Exhibit

Total records: 10000
Missing values: 500
Duplicates: 200
Outliers in Amount: 50

Question 76 (medium, multiple choice)

A data scientist notices that the model accuracy drops significantly after retraining with new data. Upon inspection, they find that many records have missing values for a key feature. Which data quality improvement should be prioritized first?

Question 77 (hard, multiple choice)

A company is building a text classification model for customer support tickets. They have a dataset of 10,000 tickets. The team decides to use active learning for labeling. Which approach best aligns with active learning principles?

Question 78 (easy, multiple choice)

For an AI project, data must be stored in a way that supports both training and real-time inference. Which storage solution meets this requirement?

Question 79 (medium, multiple choice)

A data engineer needs to create a feature that represents the average purchase amount per customer over the last 30 days. The transactional data is timestamped. Which feature engineering technique is most appropriate?
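
Illustration (not part of the original question): one way to compute such a feature is a trailing-window aggregation over the timestamped transactions. The sketch below is plain Python with made-up sample data; the function name, the tuple layout, and the choice to treat the window as the 30 days up to and including `as_of` are all assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta


def avg_purchase_last_n_days(transactions, as_of, days=30):
    """Average purchase amount per customer within the trailing window.

    transactions: iterable of (customer_id, purchase_date, amount).
    The window covers dates after (as_of - days) up to and including as_of.
    """
    cutoff = as_of - timedelta(days=days)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for customer_id, purchase_date, amount in transactions:
        if cutoff < purchase_date <= as_of:
            sums[customer_id] += amount
            counts[customer_id] += 1
    return {cid: sums[cid] / counts[cid] for cid in sums}


sample = [
    ("c1", date(2024, 5, 1), 100.0),
    ("c1", date(2024, 5, 20), 50.0),
    ("c1", date(2024, 3, 1), 999.0),  # outside the window, ignored
    ("c2", date(2024, 5, 25), 20.0),
]
features = avg_purchase_last_n_days(sample, as_of=date(2024, 5, 30))
print(features)
```

Customers with no purchases in the window are simply absent from the result; whether to fill them with zero or leave them missing is a separate modeling decision.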

Question 80 (hard, multiple choice)

A healthcare AI model uses patient data. The legal team requires that all data used for training be de-identified according to HIPAA Safe Harbor method. Which data handling process satisfies this?

Question 81 (easy, multiple choice)

A team is building a pipeline to train a model daily. The source data arrives in CSV files but needs to be converted to Parquet for efficiency. Which pipeline step should perform this conversion?

Question 82 (medium, multiple choice)

During data transformation, a data scientist applies one-hot encoding to a categorical feature with 50 unique values. The resulting dataset has 50 new columns. What is a potential drawback of this transformation?
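
Illustration (not part of the original question): a minimal one-hot encoder in plain Python, applied to a synthetic feature with 50 distinct values. The category names and the sample size of 200 are made up; the point is only to show the shape of the output.

```python
def one_hot(values):
    """Return (sorted categories, encoded rows) with one column per category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one column is set per record
        rows.append(row)
    return categories, rows


# Synthetic feature: 200 records cycling through 50 distinct category labels
values = [f"cat_{i % 50:02d}" for i in range(200)]
categories, encoded = one_hot(values)

zero_share = sum(cell == 0 for row in encoded for cell in row) / (len(encoded) * len(categories))
print(len(categories), "columns;", f"{zero_share:.0%} of cells are zero")
```

Each record sets a single column, so 49 of every 50 cells are zero; the column count grows linearly with the number of distinct values.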

Question 83 (hard, multiple choice)

An organization uses Salesforce Data Cloud to unify customer data from multiple sources. They want to ensure that data lineage is tracked for AI models. Which practice supports data lineage?

Question 84 (easy, multiple choice)

A machine learning team is preparing a dataset for a supervised learning task. They have 100,000 labeled samples. Which data preparation step is essential before splitting into train/test sets?

Question 85 (medium, multi-select)

Which TWO of the following are common dimensions of data quality that must be addressed for AI training?

Question 86 (hard, multi-select)

Which TWO considerations are critical when planning data labeling for a computer vision project in a regulated industry?

Question 87 (easy, multi-select)

Which THREE types of data sources are commonly integrated into Salesforce Data Cloud for AI use cases?

Question 88 (medium, multiple choice)

Refer to the exhibit. A data scientist tries to query the dataset but receives an error. Which of the following is the most likely cause?

Exhibit

{
  "dataAccessPolicy": {
    "dataset": "customer_transactions",
    "allowedUsers": ["data_scientist", "ml_engineer"],
    "deniedUsers": ["intern"],
    "fields": ["transaction_id", "amount", "date"],
    "conditions": {
      "amount": {"gt": 0, "lt": 10000}
    }
  }
}

Question 89 (hard, multiple choice)

Refer to the exhibit. A data pipeline fails during the DataTransformation stage. What is the most likely root cause?

Exhibit

2024-05-10 14:32:15 ERROR PipelineRunner - Stage: DataTransformation
2024-05-10 14:32:15 ERROR PipelineRunner - Exception: Column 'age' not found in schema
Schema: name string, income float, region string

Question 90 (easy, multiple choice)

Refer to the exhibit. A data transformation configuration is shown. Which of the following describes the outcome of applying this transformation?

Exhibit

transform:
  - type: one-hot
    columns: [color]
  - type: standard-scaler
    columns: [price, weight]
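
Illustration (not part of the original question): assuming `standard-scaler` in the exhibit means z-score standardization (subtract the mean, divide by the population standard deviation), its effect on a numeric column can be sketched in Python as follows. The tool that consumes this configuration is not named in the source, and the sample prices are made up.

```python
from statistics import mean, pstdev


def standard_scale(xs):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(xs)
    sigma = pstdev(xs)
    if sigma == 0:
        return [0.0 for _ in xs]  # a constant column carries no spread to scale
    return [(x - mu) / sigma for x in xs]


prices = [10.0, 20.0, 30.0]
scaled = standard_scale(prices)
print([round(v, 4) for v in scaled])
```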

Question 91 (medium, multiple choice)

A company uses Einstein Prediction Builder to predict customer churn. The data includes account creation date, number of support cases, and average payment delay. After training, the model shows low confidence scores. What is the most likely cause?

Question 92 (easy, multiple choice)

A Salesforce admin wants to use Einstein Recommendations to suggest products. What is a key requirement for the data used to train the recommendation model?

Question 93 (hard, multiple choice)

An organization is preparing data for Einstein Next Best Action. They have multiple action types (discounts, product suggestions, content). Which data model approach best ensures accurate recommendations?

Question 94 (medium, multiple choice)

A data scientist is preparing data for Einstein Discovery. The dataset has 10,000 records with 5 predictors and one outcome. The outcome is binary (1/0). What is the minimum number of positive outcomes typically required for a reliable model?

Question 95 (easy, multiple choice)

An admin is setting up Einstein Article Recommendations. Which type of data is essential for the model to learn which articles are relevant?

Question 96 (hard, multiple choice)

A company uses Einstein Forecasting for revenue prediction. The historical data shows seasonal spikes every quarter. The model consistently underestimates peak periods. What is the best data preparation step to improve accuracy?

Question 97 (medium, multiple choice)

An admin is troubleshooting Einstein Sentiment. The model returns high confidence but wrong sentiment (e.g., positive reviews labeled negative). What is the most likely issue?

Question 98 (easy, multiple choice)

When using Einstein Lead Scoring, which data source is most critical for generating accurate lead scores?

Question 99 (hard, multiple choice)

A company has international customers and wants Einstein Prediction Builder to forecast deal closure probability. The data includes fields like 'region', 'product line', and 'deal amount'. What is a best practice to ensure the model works for all regions?

Question 100 (medium, multi-select)

Which TWO data preparation steps are required before using Einstein Discovery for sales forecasting? (Choose 2)

Question 101 (hard, multi-select)

Which THREE actions are recommended when preparing data for Einstein Next Best Action? (Choose 3)

Question 102 (medium, multi-select)

A data analyst is troubleshooting Einstein Article Recommendations that are not showing up on the site. Which TWO checks should be performed first? (Choose 2)

Question 103 (medium, multiple choice)

Refer to the exhibit. A data scientist sees this error when training an Einstein Discovery model for customer churn prediction. What is the most likely reason for the error?

Exhibit

{
  "model": "EinsteinDiscovery_Churn_v2",
  "status": "TRAINING_FAILED",
  "errorCode": "INSUFFICIENT_POSITIVE_EXAMPLES",
  "fieldCount": 8,
  "recordCount": 3200,
  "positiveExamples": 180
}

Question 104 (hard, multiple choice)

Refer to the exhibit. A developer runs this SOQL query to prepare data for Einstein Lead Scoring. The query returns an error. What is the most likely issue?

Exhibit

SELECT AccountId, SUM(Amount) TotalAmount
FROM Opportunity
WHERE CloseDate > LAST_N_DAYS:365
GROUP BY AccountId
HAVING TotalAmount > 100000

Question 105 (easy, multiple choice)

Refer to the exhibit. A dataflow is set up to prepare data for a prediction model. The model is expected to predict close probability for all open opportunities. What is wrong with this dataflow?

Exhibit

{
  "dataflow": "EinsteinDataPrep_Sales",
  "nodes": [
    {
      "id": 1,
      "type": "source",
      "object": "Opportunity"
    },
    {
      "id": 2,
      "type": "transform",
      "operation": "filter",
      "condition": "StageName = 'Closed Won'"
    },
    {
      "id": 3,
      "type": "output",
      "target": "EinsteinDiscovery_Stage"
    }
  ]
}

Question 106 (easy, multiple choice)

A company wants to train an AI model to predict customer churn using historical data that contains many missing values. What is the best practice for handling missing data?

Question 107 (medium, multiple choice)

A data scientist needs to feed customer interaction data into Einstein Discovery for predictive analysis. Which data format is required?

Question 108 (hard, multiple choice)

A company uses Salesforce Data Cloud to unify customer data from multiple sources for AI model training. After adding a new data source, model performance degrades significantly. What is the most likely cause?

Question 109 (easy, multiple choice)

Which data type is most commonly used for image recognition AI models?

Question 110 (medium, multiple choice)

A team has limited labeled data for a Salesforce predictive model but wants to leverage a pre-trained model from a related task. Which machine learning approach should they use?

Question 111 (hard, multiple choice)

After deploying an AI model in Salesforce, the data scientist notices high accuracy on the training set but poor accuracy on new incoming data. What is this phenomenon called?

Question 112 (easy, multiple choice)

To ensure AI model fairness and avoid biased outcomes, which practice is most critical when preparing training data?

Question 113 (medium, multiple choice)

A company wants to integrate external customer behavior data into Salesforce to enhance AI predictions. Which Salesforce Data Cloud feature is specifically designed to ingest and map external data?

Question 114 (hard, multiple choice)

A data scientist discovers that an AI model used for loan approval predicts high default risk disproportionately for a specific demographic group. What is the first step to address this issue?

Question 115 (medium, multi select)

Which TWO are best practices for data labeling in AI projects? (Choose two.)

Question 116 (hard, multi select)

Which THREE are key considerations for data privacy when using AI models that process customer data? (Choose three.)

Question 117 (easy, multi select)

Which TWO are common data quality issues that negatively impact AI model performance? (Choose two.)

Question 118 (medium, multiple choice)

Refer to the exhibit. What is the most likely cause of the error?

Exhibit

Error log:
2023-11-01 10:23:45 ERROR: Failed to load data from source 'salesforce_opportunities'. Reason: Field 'Amount' has null values in 30% of records.
2023-11-01 10:23:46 WARN: Data quality check: 'Stage' field contains 15 distinct values, expected 8.

Question 119 (hard, multiple choice)

Refer to the exhibit. What is the primary purpose of this policy?

Exhibit

JSON policy:
{
  "dataAccess": {
    "allowedSources": ["Salesforce", "AWS S3"],
    "dataFields": ["Account.Name", "Contact.Email"],
    "dataMasking": {"SSN": "XXXXX"},
    "retentionDays": 90
  }
}

Question 120 (easy, multiple choice)

What is being performed in this command?

Question 121 (medium, multiple choice)

A Salesforce admin is preparing a dataset for Einstein Prediction Builder. The dataset contains a field "Income" with many missing values. The admin wants to minimize bias in the model. What is the best practice?

Question 122 (easy, multiple choice)

When training an Einstein Discovery model, which data type is not supported as a predictor field?

Question 123 (hard, multiple choice)

A data scientist notices that a Salesforce Einstein model's performance degrades over time. The model was trained on data from the last year. What is the most likely cause?

Question 124 (medium, multiple choice)

To integrate external data into Salesforce for AI, which tool is recommended by Salesforce for building data pipelines?

Question 125 (easy, multiple choice)

In Salesforce CRM Analytics (formerly Einstein Analytics), what is the primary purpose of a dataset?

Question 126 (hard, multiple choice)

A company wants to use Einstein Next Best Action but needs to ensure data privacy. What is the required step for anonymizing customer data in Data Pipelines?

Question 127 (medium, multiple choice)

While building a prediction model in Einstein Studio, the system warns about "high cardinality" for a categorical field. What should the admin do?

Question 128 (easy, multiple choice)

Which Salesforce feature automatically flags data quality issues before training an AI model?

Question 129 (hard, multiple choice)

A data integration specialist is using Data Pipelines to combine Salesforce data with an external CSV file. The CSV has a header row but some rows have extra commas, causing parsing errors. What should the specialist do?

Question 130 (medium, multi select)

A Salesforce admin is reviewing data sources for Einstein Recommendation Builder. Which two data types are required for training? (Choose two.)

Question 131 (hard, multi select)

Which three practices help maintain data quality for AI models in Salesforce? (Choose three.)

Question 132 (easy, multi select)

When preparing data for Einstein Next Best Action, which two aspects must be considered for compliance with data privacy regulations? (Choose two.)

Question 133 (medium, multiple choice)

Refer to the exhibit. In the JSON configuration shown, which data preparation step could introduce bias?

Exhibit

{
  "dataset": "OpportunityPrediction",
  "fields": [
    {"name": "Amount", "type": "currency", "missing": "fill_with_median"},
    {"name": "Stage", "type": "picklist", "missing": "exclude"},
    {"name": "CreatedDate", "type": "date", "missing": "use_default"},
    {"name": "Description", "type": "textarea", "missing": "ignore"}
  ],
  "model": "EinsteinDiscovery",
  "target": "Won"
}

Question 134 (easy, multiple choice)

Refer to the exhibit. What is the most likely cause of the pipeline failure?

Exhibit

Pipeline: MyPipeline
Status: FAILED
Error: RecordsProcessed: 5000, Errors: 120
LastError: FIELD_INTEGRITY_EXCEPTION: CustomLeadField__c: value not found in picklist values

Question 135 (hard, multiple choice)

A global company uses Salesforce Einstein Discovery to predict customer churn. The dataset has the fields Customer_Since__c (date), Last_Interaction_Date__c (date), Support_Cases__c (number), Product_Usage__c (percentage), Region__c (picklist), and Churned__c (boolean target). The model was trained and deployed, but its predictions show bias against customers in the "EMEA" region. The data scientist notices that 80% of EMEA customers in the training data are labeled as churned, compared with only 20% of customers in other regions. Additionally, the Product_Usage__c field has many missing values for EMEA customers. The company wants to retrain the model to reduce bias. What is the best course of action?

Question 136 (easy, multiple choice)

A marketing agency needs to ingest real-time social media mentions for a sentiment analysis AI model. Which Data Cloud object type should they use to set up the ingestion?

Question 137 (medium, multiple choice)

A retailer's AI model for recommendation is producing poor results. Analysis shows that the customer entity has many duplicate records with slight variations. Which Data Cloud feature should be used to address this?

Question 138 (hard, multiple choice)

A large enterprise uses Data Cloud to power an Einstein model for lead scoring. The model's feature pipeline includes dozens of fields from multiple data streams. Performance has degraded, and the team suspects slow feature retrieval. What is the most efficient way to speed up feature computation in Data Cloud?

Question 139 (easy, multiple choice)

A company plans to train an AI model using data from Salesforce CRM and an external marketing automation platform. What is the first step to unify these data sources in Data Cloud?

Question 140 (medium, multiple choice)

A financial institution must ensure that customer data used for AI models does not expose personally identifiable information (PII) to unauthorized users. Which Data Cloud feature should be applied to the data model?

Question 141 (hard, multiple choice)

A data architect notices that a Data Stream from an external ERP system is failing intermittently with schema mismatch errors. The ERP team says the schema changes occasionally. What is the most effective long-term solution?

Question 142 (easy, multiple choice)

A news outlet wants to build an AI model that predicts article popularity using real-time social media mentions. Which data source type should they use to ingest tweets?

Question 143 (medium, multiple choice)

A manufacturer wants to improve demand forecasting by enriching its CRM orders with external demographic data. The external data is available via a SOAP API. How should the data architect implement this?

Question 144 (easy, multi select)

Which TWO of the following are valid methods to improve data quality in Data Cloud before training an AI model?

Question 145 (medium, multi select)

Which THREE of the following are required when setting up a data stream from Salesforce to Data Cloud?

Question 146 (hard, multi select)

Which THREE of the following are best practices for feature engineering in Einstein Studio?

Question 147 (medium, multiple choice)

A large retail company uses Data Cloud to consolidate customer data from e-commerce, POS, and loyalty programs. They plan to use Einstein Studio to build a churn prediction model. The data architect notices that the churn model's accuracy is below expectations. Upon investigation, they find that the customer entity in Data Cloud has multiple records for the same customer with slightly different spellings and addresses. The data comes from different streams. What should the data architect do to improve the model?

Question 148 (hard, multiple choice)

A financial services firm uses Data Cloud to enrich sales data with external credit scores via an API. They set up a Data Action to call the credit bureau API for each new lead. Over time, API costs are rising, and the action is slowing down lead processing. They only need credit scores for leads with a high probability of conversion. What is the best approach to reduce costs and improve performance?

Question 149 (easy, multiple choice)

A non-profit organization uses Data Cloud to manage donor data from multiple sources (email campaigns, event attendance, donations). They want to use an AI model to predict future donations. The data scientist says the model needs a unified view of each donor with consistent fields. What is the first step the data architect should take in Data Cloud to enable this?

Question 150 (medium, multiple choice)

A healthcare provider implements Data Cloud to predict patient readmission rates. They have HIPAA compliance requirements. The data includes sensitive patient health information (PHI). The AI model must be trained without exposing PHI to unauthorized users. The data architect uses Data Cloud's data masking on PHI fields. However, model performance drops significantly after masking because the masked values lose predictive value. What additional step should the architect consider to maintain model performance while protecting PHI?

Question 151 (easy, multi select)

A company is preparing customer data to train a custom AI model for sentiment analysis. Which two data preparation best practices should they follow? (Choose two.)

Question 152 (hard, multi select)

Data quality is critical for AI model performance. Which three data quality dimensions should be monitored? (Choose three.)

Question 153 (easy, multiple choice)

A retail company has implemented a Salesforce AI lead scoring model to prioritize high-value customers. After three months, the model's AUC-ROC score is only 0.55, indicating poor performance. The data scientist reviews the training data and finds that 20% of the records are exact duplicates due to multiple data imports from different sources. The duplicates have inconsistent target labels (some labeled 'converted', others 'not converted'). What should the data scientist do to improve model performance?
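
The situation described here, exact duplicates that disagree on the target label, can be surfaced with a small check. The following is a minimal Python sketch; the sample rows, the column layout, and the `find_label_conflicts` helper are hypothetical and only illustrate the diagnostic, not a Salesforce feature.

```python
from collections import defaultdict

# Hypothetical lead records as (email, industry, label) tuples.
rows = [
    ("a@example.com", "Retail", "converted"),
    ("a@example.com", "Retail", "not converted"),
    ("b@example.com", "Tech", "converted"),
    ("b@example.com", "Tech", "converted"),
    ("c@example.com", "Retail", "not converted"),
]


def find_label_conflicts(records):
    """Group rows by their feature values and return the groups
    whose duplicates carry more than one distinct label."""
    labels_by_features = defaultdict(set)
    for *features, label in records:
        labels_by_features[tuple(features)].add(label)
    return {key: labels for key, labels in labels_by_features.items() if len(labels) > 1}


conflicts = find_label_conflicts(rows)
print(conflicts)
```

In this sample, the two `b@example.com` rows are harmless duplicates with one label, while the `a@example.com` rows contradict each other and give the model no consistent signal.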

Question 154 (medium, multiple choice)

A telecom company uses Einstein Discovery to predict customer churn. The training dataset contains 100,000 records, but only 5% represent churned customers. The model achieves 95% accuracy on a holdout test set, but the recall for churn is only 20%. The business wants to proactively retain at-risk customers, so they need to identify as many churners as possible. What action should the data scientist take to improve churn recall?
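
The gap between 95% accuracy and 20% recall is easier to see as a confusion matrix. The sketch below reconstructs one in Python from the figures in the question. The holdout size of 20,000 and the assumption that the holdout has the same 5% churn rate as the training data are illustrative guesses, not values given in the question.

```python
# Figures from the question.
churn_rate = 0.05
accuracy = 0.95
recall = 0.20

# Assumption: a 20,000-record holdout with the same churn rate as training.
holdout_size = 20_000

positives = round(holdout_size * churn_rate)   # actual churners
negatives = holdout_size - positives           # actual non-churners

tp = round(positives * recall)                 # churners the model catches
fn = positives - tp                            # churners the model misses

correct = round(holdout_size * accuracy)       # all correct predictions
tn = correct - tp                              # non-churners predicted correctly
fp = negatives - tn                            # non-churners flagged as churners

# Accuracy of a trivial model that always predicts "no churn".
majority_baseline = negatives / holdout_size

print(f"TP={tp} FN={fn} FP={fp} TN={tn} baseline={majority_baseline:.2f}")
```

Under these assumptions the model misses four of every five churners, and a model that never predicts churn would reach the same accuracy, which is why accuracy alone says little on a dataset this imbalanced.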

Question 155 (hard, multiple choice)

A healthcare organization uses Salesforce to develop an AI model for patient readmission prediction. They must comply with HIPAA regulations. The dataset includes patient names, addresses, medical record numbers, and detailed clinical notes. The data scientist plans to train a supervised model using historical readmission outcomes. What is the most important data governance step before model training?

Question 156 (easy, multiple choice)

A marketing team wants to use Einstein Recommendations to personalize product offers on their e-commerce site. They have a dataset of 50,000 customers with purchase history. However, 40% of customers have no purchase history (new registrations). The model performs well for returning customers but gives generic recommendations for new ones. The team wants to improve recommendations for new customers. What data preparation step should they take?

Question 157 (medium, multiple choice)

A sales operations team is training an AI model to forecast quarterly revenue. They have five years of historical data, which includes a strong seasonal pattern but also a significant outlier: during the pandemic year, revenue dropped by 70% from typical values. The model trains with high accuracy on historical data but fails to predict future quarters accurately, consistently overestimating revenue. What should the data scientist do to improve forecast accuracy?

Question 158 (hard, multiple choice)

A financial services company uses Salesforce AI to detect fraudulent transactions. The dataset has 1 million legitimate transactions and only 1,000 fraudulent ones. The model trained with default parameters achieves 99.9% accuracy but identifies no fraud (precision and recall of 0). The data scientist wants to maximize fraud detection (recall) while minimizing false positives. Which approach is most effective?
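
The 99.9% figure follows directly from the class counts. As a minimal Python sketch, the arithmetic below scores a model that labels every transaction legitimate; treating the undefined precision (no positive predictions) as 0 is a convention assumed here to match the question's wording.

```python
# Class counts from the question.
legit = 1_000_000
fraud = 1_000

# A model that labels every transaction as legitimate.
tp, fp = 0, 0          # it never predicts fraud
tn, fn = legit, fraud  # so all fraud cases are missed

accuracy = (tp + tn) / (legit + fraud)
recall = tp / (tp + fn)
# Precision is 0/0 when nothing is predicted positive; report it as 0.0 by convention.
precision = tp / (tp + fp) if (tp + fp) else 0.0

print(f"accuracy={accuracy:.4f} recall={recall:.1f} precision={precision:.1f}")
```

The accuracy comes almost entirely from the majority class, so it cannot be used on its own to judge a fraud detector on this dataset.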

Question 159 (easy, multiple choice)

A company is building a chatbot using Einstein Bot's AI capabilities. They want to train intent recognition using historical chat transcripts. The transcripts contain many typos (e.g., 'hellp' instead of 'help') and slang (e.g., 'gonna' instead of 'going to'). The initial model performs poorly, misclassifying many intents. What data cleaning step is most important?

Question 160 (medium, multiple choice)

A multinational corporation uses Salesforce AI to analyze customer feedback across multiple languages. They have 10,000 English reviews, 2,000 Spanish reviews, and 500 French reviews. The sentiment model performs well on English (F1=0.85) but poorly on French (F1=0.40). The data scientist wants to improve French sentiment performance without collecting new data. What should they do?

Question 161 (medium, multi select)

A company is preparing their Salesforce Data Cloud for Einstein AI predictions. They need to ensure data quality and governance. Which TWO actions should they take? (Choose two.)

Question 162 (hard, multiple choice)

Refer to the exhibit. A data analyst receives an error when trying to use this model configuration for Einstein AI predictions. Which issue is most likely causing the error?

Exhibit

{
  "modelConfig": {
    "modelType": "classification",
    "targetField": "Churn__c",
    "featureFields": ["Age__c", "Tenure__c", "Usage__c"],
    "dataSource": "DataCloudObject",
    "trainingWindow": "Last_90_Days",
    "predictionWindow": "Next_30_Days",
    "splitRatio": 0.8,
    "regularization": "L2"
  }
}

Question 163 (easy, multiple choice)

A retail company uses Salesforce Data Cloud to power Einstein AI for personalized product recommendations. They have integrated customer data from multiple sources: ERP (order history), marketing automation (email engagement), and web analytics (browsing behavior). The data model includes a unified Customer__dlm object with fields: Age__c, TotalSpend__c, LastPurchaseDate__c, EmailEngagementScore__c, and WebSessionCount__c. The AI model is configured to predict "LikelyToPurchaseNextWeek__c" (Boolean). The data team has noticed that the predictions are less accurate for new customers (those with less than 30 days of data). The model was trained on all customer data without any filtering. The team wants to improve model performance without increasing training frequency. What should they do?