MLA-C01 ML Model Development • Complete Question Bank
Complete MLA-C01 ML Model Development question bank, with answers and detailed explanations.
Question 1 · easy · multiple choice
{
"TrainingJobName": "my-xgboost-job",
"HyperParameters": {
"num_round": "100",
"max_depth": "6",
"eta": "0.3",
"subsample": "0.8",
"colsample_bytree": "0.8",
"objective": "binary:logistic",
"eval_metric": "auc"
},
"InputDataConfig": [
{
"ChannelName": "train",
"DataSource": {
"S3DataSource": {
"S3Uri": "s3://my-bucket/train.csv",
"S3DataType": "S3Prefix"
}
}
},
{
"ChannelName": "validation",
"DataSource": {
"S3DataSource": {
"S3Uri": "s3://my-bucket/validation.csv",
"S3DataType": "S3Prefix"
}
}
}
],
"AlgorithmSpecification": {
"TrainingImage": "811284229777.dkr.ecr.us-west-2.amazonaws.com/xgboost:1.5-1",
"TrainingInputMode": "File"
},
"RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",
"OutputDataConfig": {
"S3OutputPath": "s3://my-bucket/output"
},
"ResourceConfig": {
"InstanceType": "ml.m5.xlarge",
"InstanceCount": 1,
"VolumeSizeInGB": 30
},
"StoppingCondition": {
"MaxRuntimeInSeconds": 86400
}
}

Model Artifacts:
ModelArtifacts:
  S3ModelArtifacts: s3://my-bucket/output/model.tar.gz
ModelMetrics:
  Metrics:
    training:accuracy: 0.95
    validation:accuracy: 0.92
FinalHyperParameters:
  learning_rate: 0.01
  batch_size: 32
epochs: 10

[2024-01-15 10:30:45] Training job 'my-training-job' started.
[2024-01-15 10:31:10] Using algorithm 'built-in' with hyperparameters: {'epochs': 10, 'batch-size': 32, 'learning-rate': 0.001}
[2024-01-15 10:31:15] File system creation failed: No usable scratch space. Error: Input/output error.
[2024-01-15 10:31:15] Retrying with local SSD...
[2024-01-15 10:31:20] Training completed with status 'Failed'.

{
"EndpointConfigName": "my-config",
"ProductionVariants": [
{
"VariantName": "variant1",
"ModelName": "my-model-v1",
"InitialInstanceCount": 1,
"InstanceType": "ml.c5.large",
"InitialVariantWeight": 1.0
}
]
}

#!/bin/bash
set -e
cd /home/ec2-user/SageMaker
git clone https://github.com/org/repo.git
pip install -r requirements.txt

{
"TrainingJobName": "job-123",
"TrainingJobStatus": "Failed",
"FailureReason": "ClientError: Review the error message. Training failed due to insufficient instance memory.",
"AlgorithmSpecification": {
"TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.0-1",
"TrainingInputMode": "File"
},
"ResourceConfig": {
"InstanceType": "ml.m5.large",
"InstanceCount": 1,
"VolumeSizeInGB": 30
}
}

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sagemaker:CreateTrainingJob",
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::my-bucket/*"
}
]
}

[1] #011train-auc:0.890
[2] #011train-auc:0.895
[3] #011train-auc:0.892
[4] #011validation-auc:0.880
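
The boosting-round log entries above can be turned into structured records for inspection. The following is a minimal sketch, not part of the original exhibit: the helper name `parse_metric_log` and the regular expression are assumptions, and `#011` is treated as a literal token (it appears to be an escaped tab character in the log text).

```python
import re

# Matches entries such as "[3] #011train-auc:0.892".
# "#011" is matched literally, exactly as it appears in the exhibit.
METRIC_ENTRY = re.compile(r"\[(\d+)\]\s*#011([\w-]+):([0-9.]+)")


def parse_metric_log(text):
    """Return a list of (round, metric_name, value) tuples found in text."""
    return [
        (int(round_no), name, float(value))
        for round_no, name, value in METRIC_ENTRY.findall(text)
    ]


# Log text copied from the exhibit.
log = (
    "[1] #011train-auc:0.890 [2] #011train-auc:0.895 "
    "[3] #011train-auc:0.892 [4] #011validation-auc:0.880"
)

records = parse_metric_log(log)

# Pick the round with the highest training AUC.
best_train = max(
    (rec for rec in records if rec[1] == "train-auc"),
    key=lambda rec: rec[2],
)

for rec in records:
    print(rec)
print("best train-auc round:", best_train[0])
```

Separating the metric name from the value makes it easier to compare the training and validation series round by round.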
Refer to the exhibit. You are investigating a failed SageMaker training job. The following message appears in the training job's CloudWatch Logs:
2024-08-15 14:30:45,212 sagemaker-training-toolkit ERROR ClientError: An error occurred (ResourceLimitExceeded) when calling the CreateTrainingJob operation: The account-level service limit for 'ml.p3.16xlarge for training job usage' is 0.

Refer to the exhibit. You are configuring SageMaker Debugger for a training job. The following is part of the debugger configuration:
{
"DebugHookConfig": {
"CollectionConfigurations": [
{
"CollectionName": "gradients",
"Parameters": {
"save_interval": "500"
}
}
]
},
"DebugRules": [
{
"RuleConfigurationName": "LossNotDecreasing",
"RuleParameters": {
"rule_to_use": "LossNotDecreasing",
"save_interval": "500",
"patience": "10",
"threshold": "0.001"
}
}
]
}

Refer to the exhibit. A data scientist creates a SageMaker training job with the following configuration:
{
"AlgorithmSpecification": {
"TrainingImage": "382416733822.dkr.ecr.us-west-2.amazonaws.com/xgboost:1",
"TrainingInputMode": "File"
},
"InputDataConfig": [
{
"ChannelName": "train",
"DataSource": {
"S3DataSource": {
"S3DataType": "S3Prefix",
"S3Uri": "s3://my-bucket/train/",
"S3DataDistributionType": "FullyReplicated"
}
}
}
],
"HyperParameters": {
"objective": "reg:squarederror",
"num_round": "50",
"max_depth": "10"
},
"ResourceConfig": {
"InstanceType": "ml.m5.large",
"InstanceCount": 1,
"VolumeSizeInGB": 10
}
}

{
"TrainingJobName": "my-training-job",
"TrainingJobStatus": "Failed",
"FailureReason": "ClientError: Cannot evaluate expression: loss",
"AlgorithmSpecification": {
"TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/custom-latest",
"TrainingInputMode": "File"
},
"ResourceConfig": {
"InstanceType": "ml.m5.large",
"InstanceCount": 1,
"VolumeSizeInGB": 30
},
"StoppingCondition": {
"MaxRuntimeInSeconds": 86400
},
"OutputDataConfig": {
"S3OutputPath": "s3://my-bucket/output"
}
}

{
"HyperParameterTuningJobConfig": {
"Strategy": "Bayesian",
"HyperParameterTuningJobObjective": {
"Type": "Maximize",
"MetricName": "validation:accuracy"
},
"ResourceLimits": {
"MaxNumberOfTrainingJobs": 20,
"MaxParallelTrainingJobs": 5
},
"TrainingJobDefinition": {
"StaticHyperParameters": {
"epochs": "50"
},
"AlgorithmSpecification": {
"TrainingImage": "some-image",
"TrainingInputMode": "File"
},
"InputDataConfig": [
{
"ChannelName": "train",
"DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://bucket/train.csv" } }
}
],
"OutputDataConfig": { "S3OutputPath": "s3://bucket/output" },
"ResourceConfig": { "InstanceType": "ml.m5.large", "InstanceCount": 1 },
"StoppingCondition": { "MaxRuntimeInSeconds": 3600 }
}
}
}

{
"DebugHookConfig": {
"S3OutputPath": "s3://my-bucket/debug/",
"CollectionConfigurations": [
{"CollectionName": "losses"},
{"CollectionName": "gradients"}
]
},
"DebugRuleConfigurations": [
{
"RuleConfigurationName": "Overfitting",
"RuleEvaluatorImage": "...",
"InstanceType": "ml.t3.medium",
"VolumeSizeInGB": 5
}
]
}

{
"TrainingJobStatus": "Failed",
"FailureReason": "AlgorithmError: Data does not conform to the expected format. Please check that the input CSV has headers matching the training schema.",
"TrainingJobName": "my-model-training-20240301"
}

{
"TrainingJobName": "fraud-detection-model-20241015",
"TrainingJobStatus": "Failed",
"FailureReason": "AlgorithmError: Encountered an unexpected error during training: ValueError: Expected 2D array, got 1D array instead. Reshape your data using array.reshape(-1, 1) if your data has a single feature.",
"AlgorithmSpecification": {
"TrainingImage": "382416733822.dkr.ecr.us-west-2.amazonaws.com/sagemaker-scikit-learn:1.0-1-cpu-py3",
"TrainingInputMode": "File"
},
"ResourceConfig": {
"InstanceType": "ml.m5.large",
"InstanceCount": 1
},
"InputDataConfig": [
{
"ChannelName": "training",
"DataSource": {
"S3DataSource": {
"S3DataType": "S3Prefix",
"S3Uri": "s3://my-bucket/train/data.csv",
"S3DataDistributionType": "FullyReplicated"
}
},
"ContentType": "text/csv",
"CompressionType": "None"
}
]
}

Refer to the exhibit.
```
Training Job Name: my-training-job
Status: Failed
Failure Reason: ClientError: Data download failed. Unable to locate credentials. Please configure your SageMaker Execution Role with the necessary permissions.
```
This is the output from `aws sagemaker describe-training-job --training-job-name my-training-job`.
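
Several exhibits in this bank show a failure reason that starts with a category prefix such as `ClientError` or `AlgorithmError`, followed by a message. As a rough first triage step, the prefix can be separated from the message. This is only a sketch under that assumption: the helper name `split_failure_reason` is hypothetical, and the "Unknown" fallback is an illustrative choice rather than anything SageMaker returns.

```python
def split_failure_reason(reason):
    """Split 'Category: message' into a (category, message) tuple.

    Only the first colon is used as the separator, so nested messages such as
    'AlgorithmError: ... ValueError: ...' keep their inner text intact.
    """
    category, separator, message = reason.partition(":")
    if not separator:
        # No prefix found; "Unknown" is a placeholder chosen for this sketch.
        return ("Unknown", reason.strip())
    return (category.strip(), message.strip())


# Failure reasons copied (and shortened) from the exhibits.
reasons = [
    "ClientError: Data download failed. Unable to locate credentials.",
    "AlgorithmError: Data does not conform to the expected format.",
    "ClientError: Cannot evaluate expression: loss",
]

for reason in reasons:
    category, message = split_failure_reason(reason)
    print(category, "|", message)
```

The category is only a hint about where to look first; the message text and the CloudWatch logs still need to be read to find the actual cause.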