How many Fundamentals of AI and ML questions are on the AIF-C01 exam?

Fundamentals of AI and ML is Domain 1 of the AIF-C01 exam and, according to the AWS exam guide, accounts for 20% of the scored content. The Courseiva question bank has 97 practice questions for this domain.

How can I practice Fundamentals of AI and ML questions for AIF-C01?

Click any of the 97 questions listed on this page to see the full question and explanation, or use the session launcher to start a focused practice session of 10, 20, 30 or 50 questions drawn only from the Fundamentals of AI and ML domain.

Free AIF-C01 Fundamentals of AI and ML Practice Questions (2026)


All AIF-C01 Fundamentals of AI and ML questions (97)



A data scientist wants to quickly build a supervised learning model for binary classification on a tabular dataset with 10,000 rows and 200 features. The dataset has some missing values, and the solution should require minimal code. Which AWS service should the data scientist use?

An ML team is deploying a real-time inference endpoint for a computer vision model using Amazon SageMaker. The model requires GPU acceleration for low latency. Which instance type should the team choose to minimize cost while meeting the GPU requirement?

A company is training a deep learning model on Amazon SageMaker using a custom Docker container. The training job fails with the error 'CannotStartContainerError: API error (500): failed to create shim task'. The team verifies that the container image is compatible with the selected instance type. What is the most likely cause of this error?

A machine learning engineer is using Amazon SageMaker to train a model and wants to automatically stop the training job if the loss does not improve for 10 consecutive epochs. Which SageMaker feature should be used?
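The stopping rule in this scenario (halt when the loss has not improved for 10 consecutive epochs) is usually called patience-based early stopping. The minimal Python sketch below shows the rule in isolation. It is framework-agnostic and is not how SageMaker implements early stopping; the function name `stop_epoch`, the `min_delta` parameter, and the loss values are hypothetical.

```python
def stop_epoch(losses, patience=10, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never stops.

    Training stops once `patience` consecutive epochs pass without the loss
    improving on the best value seen so far by more than `min_delta`.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            # New best loss: record it and reset the patience counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical losses: improvement for 3 epochs, then a 10-epoch plateau.
losses = [1.0, 0.8, 0.6] + [0.65] * 10
print(stop_epoch(losses))  # 13
```

In practice the rule is usually applied to validation loss rather than training loss, because the goal is to stop before the model starts overfitting.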

A company needs to store large amounts of unstructured training data (images, videos) in a cost-effective manner while ensuring low-latency retrieval for training jobs running on Amazon SageMaker. Which storage solution should be used?

An organization wants to detect anomalies in real-time streaming data from IoT devices. The data includes sensor readings, and the team plans to use a machine learning model. Which AWS service should be used to build and deploy the model with minimal operational overhead?

During a SageMaker training job, the data scientist observes that the loss is not decreasing after the initial few epochs. The model is a deep neural network with ReLU activations. Which hyperparameter adjustment is most likely to help?

Which TWO services can be used to preprocess data for machine learning in AWS? (Choose two.)

Which THREE statements about Amazon SageMaker Ground Truth are correct? (Choose three.)

Which TWO factors should be considered when choosing between a CPU-based instance and a GPU-based instance for training a machine learning model on Amazon SageMaker? (Choose two.)

Refer to the exhibit. A data scientist attaches the above IAM policy to a SageMaker notebook instance role. The notebook is in the same AWS account as the S3 bucket. When trying to read a file from 's3://my-bucket/training/data.csv', the data scientist gets an Access Denied error. What is the most likely cause?

Refer to the exhibit. A data scientist is training a neural network model on SageMaker. The training log shows the loss values per epoch. Which issue is most likely occurring?

A data scientist is training a binary classification model to predict customer churn. The dataset has 10,000 records with 9,500 non-churners and 500 churners. After training a logistic regression model, the model achieves 95% accuracy on the test set. However, the business team reports that the model is not useful because it predicts almost all customers as non-churners. Which metric should the data scientist use to evaluate the model's performance in this scenario?
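The arithmetic behind this scenario can be checked directly. The Python sketch below assumes the test set keeps the same 9,500/500 class ratio as the full dataset and scores a degenerate model that labels every customer a non-churner; the helper name `evaluate` is hypothetical. Accuracy comes out at 95% even though the model never identifies a single churner, which is why class-specific metrics such as recall, precision, and F1 score are more informative on imbalanced data.

```python
def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Precision is defined as 0 here when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# 9,500 non-churners (label 0) and 500 churners (label 1), as in the scenario.
y_true = [0] * 9500 + [1] * 500
# Degenerate model: predicts "non-churner" for every customer.
y_pred = [0] * len(y_true)

metrics = evaluate(y_true, y_pred)
print(metrics)  # {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```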

A company is deploying a machine learning model for real-time fraud detection. The model must make predictions with latency under 10 milliseconds. The data scientist trained a gradient boosting model that achieves high accuracy but has inference latency of 50 milliseconds. The team has access to a larger instance type with more CPU cores. Which approach should the data scientist take to reduce inference latency while maintaining accuracy?

A company wants to build a system that automatically categorizes customer support tickets into predefined categories (e.g., billing, technical, account). The team has a large dataset of historical tickets with their category labels. Which type of machine learning problem is this?

A data scientist is using Amazon SageMaker to train a deep learning model for image classification. The training job is taking too long. The dataset consists of 100,000 images stored in Amazon S3. Which action can the data scientist take to reduce training time without modifying the model architecture?

Which TWO of the following are best practices for preparing training data for a machine learning model?

A financial services company uses a machine learning model to approve loan applications. The model is a gradient boosting classifier trained on historical loan data. Recently, the company noticed that the model's approval rate for applicants from a certain demographic group is significantly lower than for other groups, even though the model's overall accuracy remains high. The data science team has been asked to address this potential bias while minimizing the impact on overall model performance. The team has access to the training data and the trained model. They have limited time and budget. Which course of action should the team take first?

A retail company uses a machine learning model to forecast daily product demand. The model is a time series model that uses historical sales data. The model has been performing well, but recently the forecasts have been consistently too low, leading to stockouts. The data scientist notices that the model was trained on data up to last year, and the company has since launched a successful marketing campaign that increased sales by 20%. The data scientist needs to update the model to reflect the new sales patterns. Which approach should the data scientist take?

A company wants to use AI to automatically transcribe customer service calls into text. Which AWS service is most suitable?

Which metric is most appropriate for evaluating a classification model when false positives are costly?

A data scientist is preparing data for a machine learning model. What is the purpose of splitting the data into training, validation, and test sets?
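As a rough illustration of the three roles (the training set fits model parameters, the validation set guides hyperparameter tuning and model selection, and the test set gives a final estimate on data the model has never influenced), the Python sketch below shuffles records and carves out the three subsets. The 70/15/15 proportions, the fixed seed, and the function name are illustrative assumptions, not requirements.

```python
import random


def train_val_test_split(records, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle records and split them into training, validation, and test lists."""
    if val_frac + test_frac >= 1:
        raise ValueError("validation and test fractions must leave room for training data")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(1000))
print(len(train), len(val), len(test))  # 700 150 150
```

Random shuffling suits independent records. For time series, a chronological split is usually more appropriate, so the model is never evaluated on data that precedes its training data.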

A company wants to automatically detect anomalies in server metrics. Which algorithm is most appropriate?

A team is training a binary classification model using Amazon SageMaker. They notice that the training accuracy is 99% but the test accuracy is only 70%. Which technique should they apply first to address this?

During model training, the loss decreases rapidly for the first few epochs and then plateaus. The validation loss starts increasing after some epochs. What should the team do to improve generalization?

A company is deploying a machine learning model for real-time fraud detection. The model must have latency under 100ms. Which infrastructure choice is most appropriate?

A team trains a model using Amazon SageMaker built-in XGBoost. After training, they want to evaluate feature importance. Which SageMaker feature allows them to view this?

An ML engineer wants to store training data in a format optimized for linear data scanning and columnar access in SageMaker. Which format is most appropriate?

A data scientist is preparing data for a classification task. Which TWO techniques are commonly used for handling missing values? (Choose two.)
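Two widely used options are dropping incomplete records and imputing missing entries with a summary statistic such as the column mean or median. The Python sketch below shows both on a tiny hypothetical dataset in which `None` marks a missing value; the function names are hypothetical.

```python
from statistics import mean


def drop_missing(rows):
    """Listwise deletion: keep only rows that contain no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row)]


def impute_mean(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = [mean(row[c] for row in rows if row[c] is not None) for c in range(n_cols)]
    return [[col_means[c] if v is None else v for c, v in enumerate(row)] for row in rows]


rows = [[1.0, 10.0], [None, 20.0], [3.0, None]]
print(drop_missing(rows))  # [[1.0, 10.0]]
print(impute_mean(rows))   # [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
```

When imputing, compute the statistics on the training split only and reuse them for validation and test data, so information from held-out data does not leak into training. `statistics.mean` raises an error if a column has no observed values at all.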

Which AWS services can be used to build, train, and deploy custom machine learning models? (Choose two.)

A company is training a deep learning model for image classification. Which THREE practices help reduce overfitting? (Choose three.)

A company wants to automatically detect anomalies in their AWS CloudTrail logs to identify potential security threats. Which AWS service is specifically designed for this purpose?

A startup is building a recommendation engine for their e-commerce platform. They need a fully managed service that can generate personalized product recommendations based on user behavior. Which AWS service should they use?

A data scientist is working with a dataset that contains both numerical and categorical features. Which algorithm is commonly used for regression tasks in Amazon SageMaker?

A company uses Amazon SageMaker to train a model. The training job fails with 'InsufficientInstanceCapacity' error. What is the most likely cause?

A financial services company needs to ensure that the machine learning models used for loan approval are explainable and meet regulatory compliance. Which AWS feature can help explain model predictions?

A company is using Amazon Rekognition to detect objects in images. They need to detect custom objects that are specific to their domain. What should they do?

A team is training a deep learning model using Horovod distributed training on SageMaker. They observe that the loss stops decreasing after a few epochs. Which technique should they implement to reduce overfitting?

A data scientist wants to perform automatic model tuning (hyperparameter optimization) on SageMaker. They need to find the best hyperparameters for a gradient boosting model. Which strategy is BEST for this task?

An e-commerce company stores user interaction logs in Amazon S3. They want to use machine learning to segment users based on purchasing behavior. Which unsupervised learning algorithm is most appropriate?
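K-means is a commonly used unsupervised algorithm for this kind of segmentation, and SageMaker provides a built-in K-Means algorithm. To show how it groups records, the Python sketch below implements the basic Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is not SageMaker's implementation, and the two features per user (orders per month, average order value) and their values are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Basic Lloyd's k-means; returns (centroids, cluster label for each point)."""
    centroids = random.Random(seed).sample(points, k)  # start from k randomly chosen input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


# Hypothetical users: (orders per month, average order value).
users = [(1, 20), (2, 25), (1, 22), (10, 200), (12, 210), (11, 190)]
centroids, labels = kmeans(users, k=2)
print(labels)
```

Because k-means relies on Euclidean distance, features on larger scales (here, order value) dominate the result unless the features are scaled first.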

A company wants to use AWS services to process natural language text. Which TWO AWS services provide natural language processing (NLP) capabilities? (Select TWO.)

A data scientist is evaluating different AWS services for building a machine learning pipeline. Which THREE components are part of Amazon SageMaker? (Select THREE.)

A company is using Amazon Fraud Detector to detect fraudulent transactions. Which TWO actions can be taken to improve model accuracy? (Select TWO.)

A startup with limited ML expertise wants to quickly prototype a binary classification model using a small customer dataset. They need a managed environment to run Jupyter notebooks and access pre-built algorithms. Which AWS service should they choose?

A data scientist wants to host a pre-trained model on Amazon SageMaker for real-time inference with minimal latency. Which approach should they use?

A company is using Amazon SageMaker to train a model. They want to automatically stop training if the model performance stops improving on a validation dataset. Which SageMaker feature should they enable?

A company is training a deep learning model on Amazon SageMaker using a large dataset stored in S3. Training jobs are frequently failing with 'OutOfMemoryError'. The training algorithm uses PyTorch. How should the data scientist solve this without reducing model accuracy?

A data science team is using Amazon SageMaker to train multiple models with different hyperparameters. They want to track metrics, compare runs, and reproduce the best result. Which SageMaker feature should they use?

A company wants to use Amazon SageMaker to train a model using a custom Docker container that has specific dependencies. The training code is stored in an S3 bucket. Which steps must be taken to run the training job?

A financial services company needs to deploy a real-time fraud detection model with sub-100ms inference latency. The model is a large ensemble requiring 8 GB of memory per request. The workload has bursty traffic. Which Amazon SageMaker deployment strategy best meets these requirements?

A company is using Amazon SageMaker to train a large language model with hundreds of billions of parameters. The model does not fit into the memory of a single GPU. Which approach should they use to train the model efficiently?

A healthcare company is using Amazon SageMaker to deploy a model that makes predictions on patient data. They need to ensure that the model's predictions are explainable to comply with regulations. Which approach should they take?

A data scientist wants to deploy a custom model built with TensorFlow to Amazon SageMaker for real-time inference. Which TWO steps are required? (Choose two.)

A company wants to use Amazon SageMaker Ground Truth to build a labeled dataset for a custom object detection model. Which TWO labeling strategies are available? (Choose two.)

A data engineer is using Amazon SageMaker Data Wrangler to prepare tabular data for ML. Which THREE data transformations are natively supported? (Choose three.)

Refer to the exhibit. A data scientist ran a training job on Amazon SageMaker. The job failed with the error shown. What is the most likely cause?

Refer to the exhibit. A SageMaker training job fails with an 'AccessDenied' error when trying to read files from the S3 bucket 'my-training-data'. The IAM role used by the training job has the policy shown. What is the most likely reason for the failure?

Refer to the exhibit. A SageMaker real-time endpoint is experiencing increasing latency and memory errors after running for a few hours. What is the most likely cause and recommended fix?

A data scientist is building a binary classification model for fraud detection. The dataset is highly imbalanced (99% legitimate, 1% fraud). Which metric is most appropriate to evaluate model performance?

A company wants to predict customer churn. They have historical data with features such as usage minutes, support tickets, and contract length. The target is binary: churn or no churn. Which ML algorithm is best suited?

A team trained a deep learning model that achieves 99% accuracy on training data but only 70% on validation data. What is the most likely issue?

A data scientist is using Amazon SageMaker to train a model. The training job is taking longer than expected. Which change would most likely reduce training time?

A team has built a regression model to predict house prices. The RMSE is 50,000 on the test set. Which action is most appropriate to improve model performance?
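RMSE is the square root of the mean squared difference between actual and predicted values, RMSE = sqrt((1/n) * sum((y_i - y_hat_i)^2)), so it is expressed in the same units as the target. An RMSE of 50,000 therefore corresponds to a typical prediction error of roughly 50,000 in the price currency, which is best judged against typical house prices. The Python sketch below computes it on hypothetical prices; the function name is illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical house prices and predictions, each off by 50,000.
actual = [300_000, 450_000, 250_000, 500_000]
predicted = [350_000, 400_000, 300_000, 450_000]
print(rmse(actual, predicted))  # 50000.0
```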

A company is using Amazon Rekognition to detect objects in images. They find that the service sometimes mislabels objects. What is the best way to improve accuracy for their specific use case?

A data scientist needs to preprocess categorical data with high cardinality (e.g., zip code with 50,000 unique values). Which technique is most appropriate?
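One commonly used option for very high-cardinality features is the hashing trick (feature hashing), which maps each value into a fixed number of buckets instead of creating one column per value; target encoding and learned embeddings are other common choices. The Python sketch below assumes 1,024 buckets, an arbitrary choice, and uses `hashlib` rather than the built-in `hash()`, because Python randomizes string hashes between processes by default. The function names are hypothetical.

```python
import hashlib


def hash_bucket(value, n_buckets=1024):
    """Map a categorical value to a bucket index using a hash that is stable across runs."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hashed_one_hot(value, n_buckets=1024):
    """Encode a value as a fixed-length vector with a single 1 at its hash bucket."""
    vector = [0] * n_buckets
    vector[hash_bucket(value, n_buckets)] = 1
    return vector


zip_codes = ["98101", "10001", "94105"]
encoded = [hashed_one_hot(z) for z in zip_codes]
print([len(v) for v in encoded])  # [1024, 1024, 1024]
```

Different values can collide in the same bucket; more buckets reduce collisions at the cost of wider feature vectors.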

A company is using Amazon Comprehend for sentiment analysis on customer reviews. They notice that the sentiment is often incorrect for negative reviews with sarcasm. What is the likely cause?

A team is evaluating a classification model. The confusion matrix shows: TP=80, FN=20, FP=10, TN=90. What is the precision?
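Precision is TP / (TP + FP), the fraction of predicted positives that are actually positive; recall is TP / (TP + FN), the fraction of actual positives the model finds. The short Python sketch below computes these and related metrics from the counts given in the question. The function name is hypothetical, and the sketch does not guard against zero denominators.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted positives that are truly positive
    recall = tp / (tp + fn)     # share of actual positives that the model found
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1


precision, recall, accuracy, f1 = classification_metrics(tp=80, fn=20, fp=10, tn=90)
print(round(precision, 3), round(recall, 3), round(accuracy, 3), round(f1, 3))
# 0.889 0.8 0.85 0.842
```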

Which TWO of the following are best practices for data preprocessing in machine learning? (Select TWO.)

Which THREE of the following are capabilities of Amazon SageMaker? (Select THREE.)

Which TWO techniques are commonly used to prevent overfitting in machine learning models? (Select TWO.)

Refer to the exhibit. A data scientist ran a training job on Amazon SageMaker and it failed. Which action should the data scientist take FIRST to resolve the issue?

Refer to the exhibit. A developer wants to ensure the notebook instance can access the internet to download packages. Which property configuration ensures this?

Refer to the exhibit. A user tries to create a training job that reads data from a bucket named 'my-bucket'. The job fails with an access denied error. What is the most likely cause?

A startup needs to predict customer churn based on historical data containing labels (churned or not). Which type of machine learning should they use?

A data scientist is training a model using Amazon SageMaker and notices the training loss is decreasing but validation loss starts increasing after a few epochs. Which technique should they apply to address this?

A company wants to deploy a real-time inference endpoint for a custom model on SageMaker. The model has high latency (100ms) and they need to handle variable traffic with spikes. Which deployment strategy is most cost-effective?

A developer needs to preprocess a dataset consisting of customer reviews for sentiment analysis. Which text preprocessing technique is most likely to improve model accuracy?

A company wants to build a model to forecast monthly sales. The data is a time series with trend and seasonality. Which SageMaker algorithm is most appropriate?

An organization wants to use Amazon Rekognition to analyze images of people for a security application. They must comply with GDPR. What is the best practice?

In a binary classification problem, the model predicts the majority class for all inputs. What is this issue called?

A data scientist is using SageMaker to train a model on a dataset with many features. They suspect some features are redundant. Which feature engineering technique would help?
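One simple way to spot redundant numeric features is to flag pairs whose absolute Pearson correlation exceeds a threshold; dimensionality reduction such as PCA, which SageMaker also offers as a built-in algorithm, is another approach. The Python sketch below assumes a 0.95 threshold and uses hypothetical housing features in which `sqm` is essentially `sqft` converted to square meters. The function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def redundant_pairs(features, threshold=0.95):
    """List (name_a, name_b, r) for feature pairs with |r| >= threshold."""
    names = list(features)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(features[names[i]], features[names[j]])
            if abs(r) >= threshold:
                pairs.append((names[i], names[j], r))
    return pairs


features = {
    "sqft": [800, 1200, 1500, 2000, 2400],
    "sqm": [74.3, 111.5, 139.4, 185.8, 223.0],  # roughly sqft * 0.0929
    "bedrooms": [2, 3, 2, 4, 3],
}
print(redundant_pairs(features))
```

Pearson correlation only captures linear relationships, so a low value does not prove two features are unrelated.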

A SageMaker endpoint is configured with automatic scaling. The model's inference time is 50ms, and traffic increases gradually. What scaling metric should be used to add instances before latency increases?

Which TWO of the following are types of feature scaling?
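Two common types of feature scaling are min-max normalization, which maps values to the range [0, 1] using (x - min) / (max - min), and standardization (z-score scaling), which rescales to zero mean and unit variance using (x - mean) / std. The Python sketch below implements both with the standard library; the function names and the sample ages are hypothetical.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


ages = [20, 30, 40, 50, 60]
print(min_max_scale(ages))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(ages))
```

Fit the scaling parameters on training data only. Min-max scaling is sensitive to outliers, because a single extreme value sets the range.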

Which THREE are SageMaker built-in algorithms suitable for regression tasks?

Which TWO are best practices for model monitoring in production on AWS?

A company is training a large language model using Amazon SageMaker. The training job fails with the error 'OutOfMemory'. They are using a single ml.p3.2xlarge instance. The dataset is 50 GB and the model is 2 GB. The training script uses standard data loading. Which action should they take to resolve the issue?

A deployed model on an Amazon SageMaker endpoint is experiencing high inference latency (average 500ms) during peak hours. The model is a deep neural network with 10 million parameters. The endpoint uses a single ml.c5.xlarge instance. The company wants to reduce latency to under 200ms without retraining or changing the model architecture. Which action should they take?

A social media company needs to automatically detect and flag toxic comments in multiple languages. They have a large stream of user comments and require real-time moderation. Which AWS service is best suited for this task?

Which TWO of the following are examples of supervised learning tasks that can be performed using Amazon SageMaker built-in algorithms?

A data scientist at a retail company is tasked with building a model to predict customer churn. The dataset contains 100,000 records with features such as age, purchase history, customer support interactions, and a binary label indicating whether the customer churned in the past. The team needs a model that can be deployed for real-time inference with low latency. They have limited time and want to use a built-in algorithm from Amazon SageMaker that is optimized for classification tasks. Which approach should they take?

A manufacturing company is deploying IoT sensors to monitor equipment performance. The sensors generate continuous unlabeled time-series data with thousands of dimensions. The goal is to detect anomalies indicating potential failures in real time. The data science team has experience with unsupervised learning and wants to use a SageMaker built-in algorithm that can handle high-dimensional data and identify outliers. They also need to reduce the number of dimensions to improve training speed without losing important information. Which approach should they take?

A marketing agency wants to analyze customer feedback from social media posts to gauge sentiment. They have no labeled data and limited ML expertise. The team needs a managed service that provides pre-trained models for sentiment analysis without requiring them to train or manage infrastructure. They also need to process text in multiple languages. Which AWS service should they use?

A startup wants to build a product recommendation engine for their e-commerce platform. They have user purchase history and item metadata. They want a fully managed solution that can automatically train and deploy a recommendation model without needing to manage the underlying ML lifecycle. The solution should provide personalized recommendations based on collaborative filtering. Which AWS service should they use?

A financial institution is deploying a fraud detection model using Amazon SageMaker. The model must be able to handle sudden spikes in inference requests during promotional events while keeping costs low. The team wants to use a serverless architecture to avoid provisioning idle capacity and to scale automatically from zero. However, the inference latency requirement is under 5 seconds for each request. Which SageMaker inference option should they choose?

A data science team needs to choose a machine learning approach for a project that requires predicting customer churn based on historical data. The team has a labeled dataset with 10,000 records and needs to interpret the model's decisions to provide business insights. Which machine learning technique should the team prioritize?

Refer to the exhibit. A data scientist trained an XGBoost model using Amazon SageMaker. Which TWO actions should the data scientist take to improve the model's performance based on the exhibited training job metrics and resource configuration?

A company is building a chatbot to answer customer queries using Amazon Lex. The development team has created a large dataset of customer interactions and intends to use Amazon SageMaker to train a custom machine learning model for natural language understanding (NLU). The team wants to integrate the trained model with Amazon Lex to handle intents and slots. The team has limited experience with SageMaker and wants to minimize operational overhead. Which solution should the team use?


Other AIF-C01 exam domains

Applications of Foundation Models
Fundamentals of Generative AI
Guidelines for Responsible AI
Security, Compliance and Governance for AI Solutions

Frequently asked questions

What does the Fundamentals of AI and ML domain cover on the AIF-C01 exam?

The Fundamentals of AI and ML domain covers basic AI and ML concepts and terminology, practical use cases for AI, and the ML development lifecycle, as described in the AIF-C01 exam guide published by Amazon Web Services. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all AIF-C01 domains, with no account required.

How many Fundamentals of AI and ML questions are in the AIF-C01 question bank?

The Courseiva AIF-C01 question bank contains 97 questions in the Fundamentals of AI and ML domain. Click any question to see the full explanation and answer breakdown.

What is the best way to practice Fundamentals of AI and ML for AIF-C01?

Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.

Can I practice only Fundamentals of AI and ML questions for AIF-C01?

Yes — the session launcher on this page draws questions exclusively from the Fundamentals of AI and ML domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.
