A data engineer wants to train a linear regression model in BigQuery ML to predict sales. The training data includes a categorical feature with 1000+ unique values. Which method is most appropriate to handle this feature in the CREATE MODEL statement?
Trap 1: Set max_categorical_features=100 in the model options.
max_categorical_features is not a parameter in BigQuery ML linear regression.
Trap 2: Use the OPTIONS(ENCODE='ONE_HOT_ENCODING') parameter in the model…
ENCODE is not a valid model option in BigQuery ML.
Trap 3: The model automatically handles high-cardinality features without…
BigQuery ML automatically handles low-cardinality features but may not scale for 1000+ categories; manual handling is recommended.
- A
Set max_categorical_features=100 in the model options.
Why wrong: max_categorical_features is not a parameter in BigQuery ML linear regression.
- B
Use TRANSFORM clause with ML.FEATURE_CROSS or manual hashing.
TRANSFORM allows custom feature engineering including hashing for high-cardinality features.
- C
Use the OPTIONS(ENCODE='ONE_HOT_ENCODING') parameter in the model options.
Why wrong: ENCODE is not a valid model option in BigQuery ML.
- D
The model automatically handles high-cardinality features without any additional steps.
Why wrong: BigQuery ML automatically handles low-cardinality features but may not scale for 1000+ categories; manual handling is recommended.