PMLE Serving and scaling models • Complete Question Bank
Complete PMLE Serving and scaling models question bank, with answers and detailed explanations.
Refer to the exhibit.
```
Log entry:
{
  "severity": "ERROR",
  "message": "Model server process exited with code 137 (SIGKILL)",
  "container": {
    "memory_usage_mb": 4096,
    "memory_limit_mb": 4096
  },
  "@type": "type.googleapis.com/google.cloud.ml.v1.PredictionLog"
}
```
Refer to the exhibit.
```
$ curl -X POST -H "Content-Type: application/json" -d '{"instances": [[1.0, 2.0, 3.0]]}' https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/endpoints/123456:predict
{
  "error": {
    "code": 400,
    "message": "Prediction failed: exception during prediction: RuntimeError: Model input shape mismatch. Expected shape (None, 2) but received shape (1, 3)."
  }
}
```
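The 400 response in the exhibit reports that the model expects inputs of shape (None, 2), while the request sent one instance with three features. A minimal client-side check could catch this before the request is sent. This is only a sketch: `validate_instances` is a hypothetical helper (not part of any SDK), and it assumes the expected feature count is known to the caller.

```python
def validate_instances(instances, expected_features):
    """Return error strings for rows whose length differs from expected_features.

    Hypothetical helper for illustration; not part of the Vertex AI SDK.
    """
    errors = []
    for i, row in enumerate(instances):
        if len(row) != expected_features:
            errors.append(
                f"instance {i}: expected {expected_features} features, got {len(row)}"
            )
    return errors


# Same request body as the curl command in the exhibit.
payload = {"instances": [[1.0, 2.0, 3.0]]}
problems = validate_instances(payload["instances"], expected_features=2)
print(problems)
```

An empty list means every instance matches the second dimension the model expects.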
Match each concept to its description.
Convert a categorical variable into binary columns
Combine two or more features to capture interactions
Normalize numeric features to a standard range
Group continuous values into discrete intervals
Weight term frequency by inverse document frequency
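The five descriptions above can each be illustrated in a few lines of code. The sketch below assumes they correspond to the techniques commonly called one-hot encoding, feature crosses, min-max normalization, bucketing, and TF-IDF; the concept names were not preserved on this page, so that mapping, along with all function names, is an assumption made for illustration.

```python
import math
from bisect import bisect_right


def one_hot(value, vocabulary):
    # Convert a categorical value into binary columns, one per vocabulary entry.
    return [1 if value == v else 0 for v in vocabulary]


def feature_cross(a, b):
    # Combine two features into one synthetic feature that captures their interaction.
    return f"{a}_x_{b}"


def min_max_scale(values):
    # Normalize numeric values to the range [0, 1]; assumes max(values) > min(values).
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def bucketize(value, boundaries):
    # Map a continuous value to the index of its interval; boundaries must be sorted.
    return bisect_right(boundaries, value)


def tf_idf(term, doc, corpus):
    # Term frequency weighted by inverse document frequency.
    # Assumes the term appears in at least one document of the corpus.
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df)


corpus = [["a", "b"], ["a", "c"]]
print(one_hot("red", ["red", "green", "blue"]))
print(feature_cross("US", "mobile"))
print(min_max_scale([10, 20, 30]))
print(bucketize(25, [18, 30, 50]))
print(tf_idf("b", corpus[0], corpus))
```

In this toy corpus the term "a" appears in every document, so its TF-IDF weight is zero, while the rarer term "b" receives a positive weight.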
Match each concept to its description.
Stochastic gradient descent with a constant learning rate
Adaptive moment estimation with per-parameter learning rates
Root mean square propagation, adapts learning rate per parameter
Adaptive gradient algorithm, reduces learning rate for frequent features
Accelerates SGD by adding a fraction of the previous update
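The update rules described above can be compared side by side on a single scalar parameter. The sketch below assumes the descriptions refer to the optimizers usually called SGD, Adam, RMSprop, Adagrad, and Momentum; the function names, default hyperparameters, and the toy objective f(w) = w² are illustrative choices, not taken from the question bank.

```python
import math


def sgd(w, g, lr=0.1):
    # Plain SGD: constant learning rate.
    return w - lr * g


def momentum(w, g, v, lr=0.1, beta=0.9):
    # Adds a fraction (beta) of the previous update to the current step.
    v = beta * v + lr * g
    return w - v, v


def adagrad(w, g, acc, lr=0.1, eps=1e-8):
    # Accumulates squared gradients, so frequently updated parameters get smaller steps.
    acc = acc + g * g
    return w - lr * g / (math.sqrt(acc) + eps), acc


def rmsprop(w, g, avg, lr=0.1, rho=0.9, eps=1e-8):
    # Uses a moving average of squared gradients instead of a running sum.
    avg = rho * avg + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(avg) + eps), avg


def adam(w, g, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Bias-corrected first and second moment estimates; t is the 1-based step count.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


# One step on f(w) = w**2 starting at w = 1.0, where the gradient is 2 * w = 2.0.
w0, g0 = 1.0, 2.0
print("sgd     ", sgd(w0, g0))
print("momentum", momentum(w0, g0, v=0.0)[0])
print("adagrad ", adagrad(w0, g0, acc=0.0)[0])
print("rmsprop ", rmsprop(w0, g0, avg=0.0)[0])
print("adam    ", adam(w0, g0, m=0.0, v=0.0, t=1)[0])
```

Each function returns the updated parameter together with whatever state the optimizer carries to the next step, which makes the difference between the stateless SGD rule and the stateful variants explicit.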
$ gcloud ai endpoints describe my-endpoint --region=us-central1
displayName: my-endpoint
name: projects/123456/locations/us-central1/endpoints/789012
deployedModels:
- id: '1'
  model: projects/123456/locations/us-central1/models/456789
  displayName: model_v1
  createTime: '2024-01-15T10:00:00Z'
  modelDisplayName: test_model
  trafficSplit: 0.8
- id: '2'
  model: projects/123456/locations/us-central1/models/987654
  displayName: model_v2
  createTime: '2024-01-20T10:00:00Z'
  modelDisplayName: test_model_v2
  trafficSplit: 0.2
deployments:
- model: projects/my-project/locations/us-central1/models/123
  displayName: model_v1
  trafficPercentage: 100
  minReplicaCount: 2
  maxReplicaCount: 10
  machineType: n1-standard-4
  acceleratorType: NVIDIA_TESLA_T4
  acceleratorCount: 1
  strategy: manual
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: model-serving
spec:
  template:
    spec:
      containers:
      - image: gcr.io/my-project/model:v2
        resources:
          limits:
            cpu: '2'
            memory: 8Gi
        startupProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10
      containerConcurrency: 80
Refer to the exhibit.
gcloud ai endpoints deploy-model $ENDPOINT_ID \
  --model=$MODEL_ID \
  --display-name=my-model \
  --machine-type=n1-standard-4 \
  --min-replica-count=2 \
  --max-replica-count=10 \
  --traffic-split=0-100
Refer to the exhibit.
{
  "name": "projects/my-project/locations/us-central1/endpoints/1234",
  "displayName": "my-endpoint",
  "dedicatedEndpointEnabled": false,
  "deployedModels": [
    {
      "id": "model-a-1",
      "displayName": "model-a",
      "model": "projects/my-project/locations/us-central1/models/456",
      "dedicatedResources": {
        "minReplicaCount": 1,
        "maxReplicaCount": 5,
        "machineSpec": {
          "machineType": "n1-standard-4",
          "acceleratorType": "NVIDIA_TESLA_T4",
          "acceleratorCount": 1
        }
      }
    },
    {
      "id": "model-b-1",
      "displayName": "model-b",
      "model": "projects/my-project/locations/us-central1/models/789",
      "dedicatedResources": {
        "minReplicaCount": 1,
        "maxReplicaCount": 5,
        "machineSpec": {
          "machineType": "n1-standard-8",
          "acceleratorType": "NVIDIA_TESLA_T4",
          "acceleratorCount": 2
        }
      }
    }
  ],
  "trafficSplit": {
    "model-a-1": 50,
    "model-b-1": 50
  }
}
Refer to the exhibit.
gcloud ai endpoints describe projects/my-project/locations/us-central1/endpoints/456
...
deployedModels:
- id: 'bert-model-1'
model: projects/my-project/locations/us-central1/models/bert
displayName: bert
automaticResources:
minReplicaCount: 1
maxReplicaCount: 10
machineType: n1-standard-4
accelerator:
count: 0
enableAccessLogging: true
...
disableContainerLogging: true
...