Question 1 (easy, multiple choice)
AI0-001 AI Models and Data Engineering • Complete Question Bank
Refer to the exhibit.

```
Epoch 1/10 - loss: 0.6932 - accuracy: 0.5234 - val_loss: 0.6918 - val_accuracy: 0.5312
Epoch 2/10 - loss: 0.4231 - accuracy: 0.8047 - val_loss: 0.5234 - val_accuracy: 0.7422
Epoch 3/10 - loss: 0.3125 - accuracy: 0.8828 - val_loss: 0.6015 - val_accuracy: 0.7344
Epoch 4/10 - loss: 0.2146 - accuracy: 0.9219 - val_loss: 0.7234 - val_accuracy: 0.7188
Epoch 5/10 - loss: 0.1478 - accuracy: 0.9531 - val_loss: 0.8342 - val_accuracy: 0.7031
```
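In this log, training loss falls every epoch while validation loss bottoms out at epoch 2 and then rises, which is the usual overfitting signature. The sketch below reads the same numbers in plain Python. It is an illustration of patience-based early stopping on `val_loss`, not the Keras implementation; the `patience=2` value and the function names are assumptions. In Keras the comparable setting is usually the `EarlyStopping` callback with `monitor="val_loss"` and `restore_best_weights=True`.

```python
# Per-epoch losses transcribed from the training log exhibit (epochs 1-5 of 10).
history = [
    {"epoch": 1, "loss": 0.6932, "val_loss": 0.6918},
    {"epoch": 2, "loss": 0.4231, "val_loss": 0.5234},
    {"epoch": 3, "loss": 0.3125, "val_loss": 0.6015},
    {"epoch": 4, "loss": 0.2146, "val_loss": 0.7234},
    {"epoch": 5, "loss": 0.1478, "val_loss": 0.8342},
]


def diverging_epochs(history):
    """Epochs where training loss fell but validation loss rose (overfitting signal)."""
    return [
        cur["epoch"]
        for prev, cur in zip(history, history[1:])
        if cur["loss"] < prev["loss"] and cur["val_loss"] > prev["val_loss"]
    ]


def early_stop(history, patience=2):
    """Return (best_epoch, stop_epoch) for patience-based early stopping on val_loss."""
    best = history[0]
    wait = 0
    for record in history[1:]:
        if record["val_loss"] < best["val_loss"]:
            best, wait = record, 0  # new best: reset the patience counter
        else:
            wait += 1
            if wait >= patience:
                return best["epoch"], record["epoch"]
    return best["epoch"], history[-1]["epoch"]


print(diverging_epochs(history))  # [3, 4, 5]
print(early_stop(history))        # (2, 4)
```

With a patience of 2, training would halt after epoch 4 and the weights from epoch 2 (lowest `val_loss`) would be the ones worth keeping.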
Refer to the exhibit.
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::ml-training-data/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "10.0.0.0/16"
        }
      }
    }
  ]
}
```

The following output is from an MLflow run:

```
Run ID: abc123
experiment_id: 1
status: FINISHED
start_time: 2023-10-01 10:00:00
end_time: 2023-10-01 10:05:00
params:
  learning_rate: 0.01
  max_depth: 10
  n_estimators: 100
metrics:
  train_accuracy: 0.999
  val_accuracy: 0.82
  val_f1: 0.79
tags:
  model_type: RandomForest
  dataset: churn_v2
```
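The MLflow run records a training accuracy of 0.999 against a validation accuracy of 0.82, a gap of about 0.18. The sketch below computes that gap and the run duration from the exhibit's values in plain Python. It does not use the MLflow client API; the `run` dictionary layout and the 0.05 `GAP_THRESHOLD` are assumptions chosen for illustration, not a standard cutoff.

```python
from datetime import datetime

# Values transcribed from the MLflow run exhibit (Run ID abc123).
run = {
    "start_time": "2023-10-01 10:00:00",
    "end_time": "2023-10-01 10:05:00",
    "metrics": {"train_accuracy": 0.999, "val_accuracy": 0.82, "val_f1": 0.79},
}

# Hypothetical tolerance for the train/validation gap; not part of the exhibit.
GAP_THRESHOLD = 0.05


def generalization_gap(metrics):
    """Training accuracy minus validation accuracy."""
    return metrics["train_accuracy"] - metrics["val_accuracy"]


def run_duration_seconds(run, fmt="%Y-%m-%d %H:%M:%S"):
    """Wall-clock length of the run from its start and end timestamps."""
    start = datetime.strptime(run["start_time"], fmt)
    end = datetime.strptime(run["end_time"], fmt)
    return (end - start).total_seconds()


gap = generalization_gap(run["metrics"])
print(f"train/val accuracy gap: {gap:.3f}")
print("possible overfitting" if gap > GAP_THRESHOLD else "gap within tolerance")
print(f"run duration: {run_duration_seconds(run):.0f} s")
```

A near-perfect training score paired with a much lower validation score is the pattern commonly read as overfitting; for a random forest, constraining `max_depth` is one of the usual levers to compare in a follow-up run.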
The following is a JSON schema snippet from a data pipeline:

```
{
  "type": "object",
  "properties": {
    "user_id": { "type": "integer" },
    "timestamp": { "type": "string", "format": "date-time" },
    "event_type": { "type": "string" },
    "value": { "type": "number" }
  },
  "required": ["user_id", "event_type", "value"]
}
```

The following is a confusion matrix for a binary classifier:
|                 | Predicted: Positive | Predicted: Negative |
|-----------------|---------------------|---------------------|
| Actual Positive | 80                  | 20                  |
| Actual Negative | 30                  | 70                  |

```
model:
  type: Sequential
  layers:
    - type: Dense
      units: 128
      activation: relu
    - type: Dense
      units: 64
      activation: relu
    - type: Dense
      units: 1
      activation: sigmoid
  optimizer:
    type: Adam
    learning_rate: 0.01
```

Data Validation Report:
Table: customers
- column "age": null values: 0, unique values: 87, min: 18, max: 99
- column "income": null values: 12, unique values: 1500, min: 0, max: 500000
- column "region": null values: 0, unique values: 4, values: ["North", "South", "East", "West"]
- column "gender": null values: 0, unique values: 2, values: ["M", "F"]
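For the confusion matrix exhibit, treating "Positive" as the positive class gives TP = 80, FN = 20, FP = 30 and TN = 70. That yields accuracy = 150/200 = 0.75, precision = 80/110 ≈ 0.727, recall = 80/100 = 0.80, specificity = 70/100 = 0.70 and F1 = 2·80/(2·80 + 30 + 20) = 160/210 ≈ 0.762. A minimal Python sketch of the same arithmetic follows; the function name is chosen for illustration.

```python
# Cell counts from the confusion matrix exhibit (positive class = "Positive").
TP, FN = 80, 20  # Actual Positive row
FP, TN = 30, 70  # Actual Negative row


def classification_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics derived from confusion-matrix counts."""
    precision = tp / (tp + fp)  # of predicted positives, how many are correct
    recall = tp / (tp + fn)     # of actual positives, how many were found
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


for name, value in classification_metrics(TP, FN, FP, TN).items():
    print(f"{name}: {value:.4f}")
```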
Data Pipeline Architecture:
- Source: IoT devices -> Kafka topic "sensor_data"
- Stream processing: Apache Flink job that ingests from Kafka, cleanses the data, and outputs to another Kafka topic, "cleaned_sensor_data"
- Batch processing: Apache Spark job that reads from "cleaned_sensor_data" via Kafka batch integration, performs feature engineering, and writes to HDFS as Parquet
- Model training: Python script reads from HDFS, trains an LSTM model, and saves it to the model registry
- Inference: REST API loads the model from the registry and serves predictions
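The architecture says the Spark job performs feature engineering before LSTM training but does not specify the features. One common preparation step for LSTMs on sensor data is slicing each series into fixed-length windows paired with the next value. The sketch below is a stdlib-only illustration of that idea, assuming a single univariate series per device and a next-step forecasting target; `make_sequences`, the window size and the readings are hypothetical. In the pipeline described, this logic would run inside the Spark job, and the result would be reshaped to (samples, timesteps, features) before training.

```python
from typing import List, Tuple


def make_sequences(readings: List[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Split a 1-D series into (past `window` values, next value) training pairs."""
    if window < 1:
        raise ValueError("window must be >= 1")
    inputs, targets = [], []
    for start in range(len(readings) - window):
        inputs.append(readings[start:start + window])  # timesteps fed to the LSTM
        targets.append(readings[start + window])       # value to predict next
    return inputs, targets


# Hypothetical cleaned readings from one device.
series = [20.1, 20.4, 20.3, 20.9, 21.2, 21.0]
X, y = make_sequences(series, window=3)
print(X)
print(y)
```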
```
{
  "data_pipeline": {
    "input": "raw_sales.csv",
    "steps": [
      {"type": "drop_columns", "columns": ["customer_id", "transaction_id"]},
      {"type": "impute_missing", "strategy": "mean", "columns": ["age", "income"]},
      {"type": "encode_categorical", "method": "onehot", "columns": ["product_category"]},
      {"type": "normalize", "method": "minmax", "columns": ["age", "income"]}
    ],
    "output": "processed_sales.parquet"
  }
}
```
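The `data_pipeline` config declares four steps but not what executes them. As a rough, stdlib-only sketch, assuming rows arrive as a list of dictionaries with missing values stored as `None`, a hypothetical interpreter `run_pipeline` could apply the steps in the listed order. File input and output (`raw_sales.csv`, `processed_sales.parquet`) are left out, and the sample rows are invented for illustration.

```python
from statistics import mean


def run_pipeline(rows, config):
    """Apply the config's steps, in order, to a list of dict rows (hypothetical interpreter)."""
    rows = [dict(r) for r in rows]  # work on copies so the input is untouched
    for step in config["steps"]:
        kind, cols = step["type"], step.get("columns", [])
        if kind == "drop_columns":
            for r in rows:
                for c in cols:
                    r.pop(c, None)
        elif kind == "impute_missing" and step.get("strategy") == "mean":
            for c in cols:
                fill = mean(r[c] for r in rows if r.get(c) is not None)
                for r in rows:
                    if r.get(c) is None:
                        r[c] = fill
        elif kind == "encode_categorical" and step.get("method") == "onehot":
            for c in cols:
                categories = sorted({r[c] for r in rows})
                for r in rows:
                    value = r.pop(c)  # replace the column with one indicator per category
                    for cat in categories:
                        r[f"{c}_{cat}"] = 1 if value == cat else 0
        elif kind == "normalize" and step.get("method") == "minmax":
            for c in cols:
                lo = min(r[c] for r in rows)
                hi = max(r[c] for r in rows)
                span = hi - lo
                for r in rows:
                    r[c] = (r[c] - lo) / span if span else 0.0
        else:
            raise ValueError(f"unsupported step: {step}")
    return rows


config = {
    "steps": [
        {"type": "drop_columns", "columns": ["customer_id", "transaction_id"]},
        {"type": "impute_missing", "strategy": "mean", "columns": ["age", "income"]},
        {"type": "encode_categorical", "method": "onehot", "columns": ["product_category"]},
        {"type": "normalize", "method": "minmax", "columns": ["age", "income"]},
    ]
}

# Hypothetical sample rows.
rows = [
    {"customer_id": 1, "transaction_id": 10, "age": 20, "income": 30000, "product_category": "books"},
    {"customer_id": 2, "transaction_id": 11, "age": None, "income": 50000, "product_category": "toys"},
    {"customer_id": 3, "transaction_id": 12, "age": 40, "income": None, "product_category": "books"},
]
processed = run_pipeline(rows, config)
print(processed)
```

Because the mean and min/max statistics are computed from whatever rows are passed in, a production pipeline would typically fit them on the training split only and reuse them for validation and test data, so that no information leaks across splits.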