Practice AIF-C01 Fundamentals of AI and ML questions with full explanations on every answer.
Click any question to see the full explanation and answer options, or start a focused practice session above.
1. A data scientist wants to quickly build a supervised learning model for binary classification on a tabular dataset with 10,000 rows and 200 features. The dataset has some missing values and requires minimal code. Which AWS service should the data scientist use?
2. An ML team is deploying a real-time inference endpoint for a computer vision model using Amazon SageMaker. The model requires GPU acceleration for low latency. Which instance type should the team choose to minimize cost while meeting the GPU requirement?
3. A company is training a deep learning model on Amazon SageMaker using a custom Docker container. The training job fails with the error 'CannotStartContainerError: API error (500): failed to create shim task'. The team verifies that the container image is compatible with the selected instance type. What is the most likely cause of this error?
4. A machine learning engineer is using Amazon SageMaker to train a model and wants to automatically stop the training job if the loss does not improve for 10 consecutive epochs. Which SageMaker feature should be used?
5. A company needs to store large amounts of unstructured training data (images, videos) in a cost-effective manner while ensuring low-latency retrieval for training jobs running on Amazon SageMaker. Which storage solution should be used?
6. An organization wants to detect anomalies in real-time streaming data from IoT devices. The data includes sensor readings, and the team plans to use a machine learning model. Which AWS service should be used to build and deploy the model with minimal operational overhead?
7. During a SageMaker training job, the data scientist observes that the loss is not decreasing after the initial few epochs. The model is a deep neural network with ReLU activations. Which hyperparameter adjustment is most likely to help?
8. Which TWO services can be used to preprocess data for machine learning in AWS? (Choose two.)
9. Which THREE statements about Amazon SageMaker Ground Truth are correct? (Choose three.)
10. Which TWO factors should be considered when choosing between a CPU-based instance and a GPU-based instance for training a machine learning model on Amazon SageMaker? (Choose two.)
11. Refer to the exhibit. A data scientist attaches the above IAM policy to a SageMaker notebook instance role. The notebook is in the same AWS account as the S3 bucket. When trying to read a file from 's3://my-bucket/training/data.csv', the data scientist gets an Access Denied error. What is the most likely cause?
12. Refer to the exhibit. A data scientist is training a neural network model on SageMaker. The training log shows the loss values per epoch. Which issue is most likely occurring?
13. A data scientist is training a binary classification model to predict customer churn. The dataset has 10,000 records with 9,500 non-churners and 500 churners. After training a logistic regression model, the model achieves 95% accuracy on the test set. However, the business team reports that the model is not useful because it predicts almost all customers as non-churners. Which metric should the data scientist use to evaluate the model's performance in this scenario?
14. A company is deploying a machine learning model for real-time fraud detection. The model must make predictions with latency under 10 milliseconds. The data scientist trained a gradient boosting model that achieves high accuracy but has inference latency of 50 milliseconds. The team has access to a larger instance type with more CPU cores. Which approach should the data scientist take to reduce inference latency while maintaining accuracy?
15. A company wants to build a system that automatically categorizes customer support tickets into predefined categories (e.g., billing, technical, account). The team has a large dataset of historical tickets with their category labels. Which type of machine learning problem is this?
16. A data scientist is using Amazon SageMaker to train a deep learning model for image classification. The training job is taking too long. The dataset consists of 100,000 images stored in Amazon S3. Which action can the data scientist take to reduce training time without modifying the model architecture?
17. Which TWO of the following are best practices for preparing training data for a machine learning model?
18. A financial services company uses a machine learning model to approve loan applications. The model is a gradient boosting classifier trained on historical loan data. Recently, the company noticed that the model's approval rate for applicants from a certain demographic group is significantly lower than for other groups, even though the model's overall accuracy remains high. The data science team has been asked to address this potential bias while minimizing the impact on overall model performance. The team has access to the training data and the trained model. They have limited time and budget. Which course of action should the team take first?
19. A retail company uses a machine learning model to forecast daily product demand. The model is a time series model that uses historical sales data. The model has been performing well, but recently the forecasts have been consistently too low, leading to stockouts. The data scientist notices that the model was trained on data up to last year, and the company has since launched a successful marketing campaign that increased sales by 20%. The data scientist needs to update the model to reflect the new sales patterns. Which approach should the data scientist take?
20. A company wants to use AI to automatically transcribe customer service calls into text. Which AWS service is most suitable?
21. Which metric is most appropriate for evaluating a classification model when false positives are costly?
22. A data scientist is preparing data for a machine learning model. What is the purpose of splitting the data into training, validation, and test sets?
23. A company wants to automatically detect anomalies in server metrics. Which algorithm is most appropriate?
24. A team is training a binary classification model using Amazon SageMaker. They notice that the training accuracy is 99% but the test accuracy is only 70%. Which technique should they apply first to address this?
25. During model training, the loss decreases rapidly for the first few epochs and then plateaus. The validation loss starts increasing after some epochs. What should the team do to improve generalization?
26. A company is deploying a machine learning model for real-time fraud detection. The model must have latency under 100ms. Which infrastructure choice is most appropriate?
27. A team trains a model using Amazon SageMaker built-in XGBoost. After training, they want to evaluate feature importance. Which SageMaker feature allows them to view this?
28. An ML engineer wants to store training data in a format optimized for linear data scanning and columnar access in SageMaker. Which format is most appropriate?
29. A data scientist is preparing data for a classification task. Which TWO techniques are commonly used for handling missing values? (Choose two.)
30. Which AWS services can be used to build, train, and deploy custom machine learning models? (Choose two.)
31. A company is training a deep learning model for image classification. Which THREE practices help reduce overfitting? (Choose three.)
32. A company wants to automatically detect anomalies in their AWS CloudTrail logs to identify potential security threats. Which AWS service is specifically designed for this purpose?
33. A startup is building a recommendation engine for their e-commerce platform. They need a fully managed service that can generate personalized product recommendations based on user behavior. Which AWS service should they use?
34. A data scientist is working with a dataset that contains both numerical and categorical features. Which algorithm is commonly used for regression tasks in AWS SageMaker?
35. A company uses Amazon SageMaker to train a model. The training job fails with 'InsufficientInstanceCapacity' error. What is the most likely cause?
36. A financial services company needs to ensure that the machine learning models used for loan approval are explainable and meet regulatory compliance. Which AWS feature can help explain model predictions?
37. A company is using Amazon Rekognition to detect objects in images. They need to detect custom objects that are specific to their domain. What should they do?
38. A team is training a deep learning model using Horovod distributed training on SageMaker. They observe that the loss stops decreasing after a few epochs. Which technique should they implement to reduce overfitting?
39. A data scientist wants to perform automatic model tuning (hyperparameter optimization) on SageMaker. They need to find the best hyperparameters for a gradient boosting model. Which strategy is BEST for this task?
40. An e-commerce company stores user interaction logs in Amazon S3. They want to use machine learning to segment users based on purchasing behavior. Which unsupervised learning algorithm is most appropriate?
41. A company wants to use AWS services to process natural language text. Which TWO AWS services provide natural language processing (NLP) capabilities? (Select TWO.)
42. A data scientist is evaluating different AWS services for building a machine learning pipeline. Which THREE components are part of Amazon SageMaker? (Select THREE.)
43. A company is using Amazon Fraud Detector to detect fraudulent transactions. Which TWO actions can be taken to improve model accuracy? (Select TWO.)
44. A startup with limited ML expertise wants to quickly prototype a binary classification model using a small customer dataset. They need a managed environment to run Jupyter notebooks and access pre-built algorithms. Which AWS service should they choose?
45. A data scientist wants to host a pre-trained model on Amazon SageMaker for real-time inference with minimal latency. Which approach should they use?
46. A company is using Amazon SageMaker to train a model. They want to automatically stop training if the model performance stops improving on a validation dataset. Which SageMaker feature should they enable?
47. A company is training a deep learning model on Amazon SageMaker using a large dataset stored in S3. Training jobs are frequently failing with 'OutOfMemoryError'. The training algorithm uses PyTorch. How should the data scientist solve this without reducing model accuracy?
48. A data science team is using Amazon SageMaker to train multiple models with different hyperparameters. They want to track metrics, compare runs, and reproduce the best result. Which SageMaker feature should they use?
49. A company wants to use Amazon SageMaker to train a model using a custom Docker container that has specific dependencies. The training code is stored in an S3 bucket. Which steps must be taken to run the training job?
50. A financial services company needs to deploy a real-time fraud detection model with sub-100ms inference latency. The model is a large ensemble requiring 8 GB of memory per request. The workload has bursty traffic. Which Amazon SageMaker deployment strategy best meets these requirements?
51. A company is using Amazon SageMaker to train a large language model with hundreds of billions of parameters. The model does not fit into the memory of a single GPU. Which approach should they use to train the model efficiently?
52. A healthcare company is using Amazon SageMaker to deploy a model that makes predictions on patient data. They need to ensure that the model's predictions are explainable to comply with regulations. Which approach should they take?
53. A data scientist wants to deploy a custom model built with TensorFlow to Amazon SageMaker for real-time inference. Which TWO steps are required? (Choose two.)
54. A company wants to use Amazon SageMaker Ground Truth to build a labeled dataset for a custom object detection model. Which TWO labeling strategies are available? (Choose two.)
55. A data engineer is using Amazon SageMaker Data Wrangler to prepare tabular data for ML. Which THREE data transformations are natively supported? (Choose three.)
56. Refer to the exhibit. A data scientist ran a training job on Amazon SageMaker. The job failed with the error shown. What is the most likely cause?
57. Refer to the exhibit. A SageMaker training job fails with an 'AccessDenied' error when trying to read files from the S3 bucket 'my-training-data'. The IAM role used by the training job has the policy shown. What is the most likely reason for the failure?
58. Refer to the exhibit. A SageMaker real-time endpoint is experiencing increasing latency and memory errors after running for a few hours. What is the most likely cause and recommended fix?
59. A data scientist is building a binary classification model for fraud detection. The dataset is highly imbalanced (99% legitimate, 1% fraud). Which metric is most appropriate to evaluate model performance?
60. A company wants to predict customer churn. They have historical data with features like usage minutes, support tickets, contract length. The target is binary: churn/not churn. Which ML algorithm is best suited?
61. A team trained a deep learning model that achieves 99% accuracy on training data but only 70% on validation data. What is the most likely issue?
62. A data scientist is using Amazon SageMaker to train a model. The training job is taking longer than expected. Which change would most likely reduce training time?
63. A team has built a regression model to predict house prices. The RMSE is 50,000 on the test set. Which action is most appropriate to improve model performance?
64. A company is using Amazon Rekognition to detect objects in images. They find that the service sometimes mislabels objects. What is the best way to improve accuracy for their specific use case?
65. A data scientist needs to preprocess categorical data with high cardinality (e.g., zip code with 50,000 unique values). Which technique is most appropriate?
66. A company is using Amazon Comprehend for sentiment analysis on customer reviews. They notice that the sentiment is often incorrect for negative reviews with sarcasm. What is the likely cause?
67. A team is evaluating a classification model. The confusion matrix shows: TP=80, FN=20, FP=10, TN=90. What is the precision?
68. Which TWO of the following are best practices for data preprocessing in machine learning? (Select TWO.)
69. Which THREE of the following are capabilities of Amazon SageMaker? (Select THREE.)
70. Which TWO techniques are commonly used to prevent overfitting in machine learning models? (Select TWO.)
71. Refer to the exhibit. A data scientist ran a training job on Amazon SageMaker and it failed. Which action should the data scientist take FIRST to resolve the issue?
72. Refer to the exhibit. A developer wants to ensure the notebook instance can access the internet to download packages. Which property configuration ensures this?
73. Refer to the exhibit. A user tries to create a training job that reads data from a bucket named 'my-bucket'. The job fails with an access denied error. What is the most likely cause?
74. A startup needs to predict customer churn based on historical data containing labels (churned or not). Which type of machine learning should they use?
75. A data scientist is training a model using Amazon SageMaker and notices the training loss is decreasing but validation loss starts increasing after a few epochs. Which technique should they apply to address this?
76. A company wants to deploy a real-time inference endpoint for a custom model on SageMaker. The model has high latency (100ms) and they need to handle variable traffic with spikes. Which deployment strategy is most cost-effective?
77. A developer needs to preprocess a dataset consisting of customer reviews for sentiment analysis. Which text preprocessing technique is most likely to improve model accuracy?
78. A company wants to build a model to forecast monthly sales. The data is a time series with trend and seasonality. Which SageMaker algorithm is most appropriate?
79. An organization wants to use Amazon Rekognition to analyze images of people for a security application. They must comply with GDPR. What is the best practice?
80. In a binary classification problem, the model predicts majority class for all inputs. What is this issue called?
81. A data scientist is using SageMaker to train a model on a dataset with many features. They suspect some features are redundant. Which feature engineering technique would help?
82. A SageMaker endpoint is configured with automatic scaling. The model's inference time is 50ms, and traffic increases gradually. What scaling metric should be used to add instances before latency increases?
83. Which TWO of the following are types of feature scaling?
84. Which THREE are SageMaker built-in algorithms suitable for regression tasks?
85. Which TWO are best practices for model monitoring in production on AWS?
86. A company is training a large language model using Amazon SageMaker. The training job fails with the error 'OutOfMemory'. They are using a single ml.p3.2xlarge instance. The dataset is 50GB and the model is 2GB. The training script uses standard data loading. Which action should they take to resolve the issue?
87. A deployed model on an Amazon SageMaker endpoint is experiencing high inference latency (average 500ms) during peak hours. The model is a deep neural network with 10 million parameters. The endpoint uses a single ml.c5.xlarge instance. The company wants to reduce latency to under 200ms without retraining or changing the model architecture. Which action should they take?
88. A social media company needs to automatically detect and flag toxic comments in multiple languages. They have a large stream of user comments and require real-time moderation. Which AWS service is best suited for this task?
89. Which TWO of the following are examples of supervised learning tasks that can be performed using Amazon SageMaker built-in algorithms?
90. A data scientist at a retail company is tasked with building a model to predict customer churn. The dataset contains 100,000 records with features such as age, purchase history, customer support interactions, and a binary label indicating whether the customer churned in the past. The team needs a model that can be deployed for real-time inference with low latency. They have limited time and want to use a built-in algorithm from Amazon SageMaker that is optimized for classification tasks. Which approach should they take?
91. A manufacturing company is deploying IoT sensors to monitor equipment performance. The sensors generate continuous unlabeled time-series data with thousands of dimensions. The goal is to detect anomalies indicating potential failures in real time. The data science team has experience with unsupervised learning and wants to use a SageMaker built-in algorithm that can handle high-dimensional data and identify outliers. They also need to reduce the number of dimensions to improve training speed without losing important information. Which approach should they take?
92. A marketing agency wants to analyze customer feedback from social media posts to gauge sentiment. They have no labeled data and limited ML expertise. The team needs a managed service that provides pre-trained models for sentiment analysis without requiring them to train or manage infrastructure. They also need to process text in multiple languages. Which AWS service should they use?
93. A startup wants to build a product recommendation engine for their e-commerce platform. They have user purchase history and item metadata. They want a fully managed solution that can automatically train and deploy a recommendation model without needing to manage the underlying ML lifecycle. The solution should provide personalized recommendations based on collaborative filtering. Which AWS service should they use?
94. A financial institution is deploying a fraud detection model using Amazon SageMaker. The model must be able to handle sudden spikes in inference requests during promotional events while keeping costs low. The team wants to use a serverless architecture to avoid provisioning idle capacity and to scale automatically from zero. However, the inference latency requirement is under 5 seconds for each request. Which SageMaker inference option should they choose?
95. A data science team needs to choose a machine learning approach for a project that requires predicting customer churn based on historical data. The team has a labeled dataset with 10,000 records and needs to interpret the model's decisions to provide business insights. Which machine learning technique should the team prioritize?
96. Refer to the exhibit. A data scientist trained an XGBoost model using Amazon SageMaker. Which TWO actions should the data scientist take to improve the model's performance based on the exhibited training job metrics and resource configuration?
97. A company is building a chatbot to answer customer queries using Amazon Lex. The development team has created a large dataset of customer interactions and intends to use Amazon SageMaker to train a custom machine learning model for natural language understanding (NLU). The team wants to integrate the trained model with Amazon Lex to handle intents and slots. The team has limited experience with SageMaker and wants to minimize operational overhead. Which solution should the team use?
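Questions 13 and 67 above both turn on confusion-matrix arithmetic. As an illustrative sketch only, the standard metric definitions can be checked in a few lines of Python. The helper name classification_metrics is hypothetical, and the "always predict non-churn" baseline is an assumed scenario built from the class counts quoted in question 13, not output from any real model.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return standard metrics computed from confusion-matrix counts.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts taken from question 67: TP=80, FN=20, FP=10, TN=90.
q67 = classification_metrics(tp=80, fn=20, fp=10, tn=90)

# Assumed baseline for question 13: every one of the 10,000 records
# (9,500 non-churners, 500 churners) is labeled "non-churn".
baseline = classification_metrics(tp=0, fn=500, fp=0, tn=9500)

if __name__ == "__main__":
    print("Question 67 counts:", q67)
    print("All-non-churn baseline:", baseline)
```

Under these assumptions the baseline reaches 0.95 accuracy while its recall on churners is 0, which is why accuracy alone can mislead on imbalanced data.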
The Fundamentals of AI and ML domain covers the key concepts tested in this area of the AIF-C01 exam blueprint published by Amazon Web Services. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all AIF-C01 domains — no account required.
The Courseiva AIF-C01 question bank contains 97 questions in the Fundamentals of AI and ML domain. Click any question to see the full explanation and answer breakdown.
Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.
The session launcher on this page draws questions exclusively from the Fundamentals of AI and ML domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.