Hard, multiple choice, objective-mapped

A data scientist is training a logistic regression model to predict customer churn using a small dataset with 500 records and 200 features. The model achieves 97% accuracy on the training set but only 65% on a held-out test set, indicating severe overfitting. The data scientist wants to reduce overfitting by automatically eliminating irrelevant features. Which technique should the data scientist apply?

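Answer choices

Apply L1 regularization (Lasso) to the model
Apply L2 regularization (Ridge) to the model
Use k-fold cross-validation to select the best model
Increase the number of training samples by data augmentation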

Why each option matters

Good practice is not just finding the correct option. The wrong answers often show the exact trap the exam wants you to fall into.

Best answer

Apply L1 regularization (Lasso) to the model

L1 regularization adds a penalty term that can zero out coefficients of less important features, performing feature selection and reducing model complexity to combat overfitting.
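The zeroing effect can be seen in a small from-scratch sketch. It is not the question's model or data: the synthetic dataset (200 records, 20 features, only the first two informative), the function names, and the values of `lam`, `lr` and `epochs` are all assumptions chosen for illustration. The fit uses proximal gradient descent, where a soft-thresholding step after each gradient update is what sets small coefficients to exactly zero.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    # Proximal step for the L1 penalty: shrink toward zero, clip at zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam, lr=0.5, epochs=200):
    """Proximal gradient descent on mean log-loss + lam * sum(|w_j|)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is left unpenalized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Hypothetical synthetic data: only features 0 and 1 influence the label.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(200)]
y = [1 if xi[0] - xi[1] + 0.3 * rng.gauss(0, 1) > 0 else 0 for xi in X]

w, b = fit_l1_logistic(X, y, lam=0.1)
kept = [j for j, wj in enumerate(w) if wj != 0.0]
print("features kept:", kept)
```

Raising `lam` zeroes more coefficients, while `lam=0` reduces the loop to plain, unregularized logistic regression.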

Distractor review

Apply L2 regularization (Ridge) to the model

L2 regularization penalizes large coefficients but does not eliminate features entirely; it shrinks coefficients but keeps them non-zero, which may not sufficiently reduce complexity when many irrelevant features are present.
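For comparison, the two penalized objectives can be written side by side. The notation is assumed here rather than taken from the question: $\mathcal{L}(w)$ is the log-loss, $w_j$ are the $p$ feature coefficients, and $\lambda \ge 0$ is the regularization strength.

```latex
\begin{aligned}
\text{L1 (Lasso):}\quad & \min_{w}\; \mathcal{L}(w) + \lambda \sum_{j=1}^{p} |w_j| \\
\text{L2 (Ridge):}\quad & \min_{w}\; \mathcal{L}(w) + \lambda \sum_{j=1}^{p} w_j^{2}
\end{aligned}
```

The L2 penalty's gradient, $2\lambda w_j$, fades as $w_j$ approaches zero, so the pull toward zero weakens and coefficients typically end up small but non-zero. The L1 penalty pulls with constant magnitude $\lambda$ no matter how small $w_j$ is, which is what lets it hold a coefficient at exactly zero.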

Distractor review

Use k-fold cross-validation to select the best model

Cross-validation is a technique for evaluating model performance and tuning hyperparameters, but it does not directly reduce overfitting or eliminate features. It would need to be combined with a regularized model.
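A minimal sketch of the splitting step makes the point concrete. The function name `k_fold_indices`, the seed, and the choice of 5 folds are assumptions for illustration; only the 500-record dataset size comes from the question. The splitter only decides which records are used for fitting and which for scoring, so it changes no coefficients and removes no features.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


splits = list(k_fold_indices(n_samples=500, k=5))
for train_idx, val_idx in splits:
    # A candidate model (for example, one regularization strength) would be
    # fit on train_idx and scored on val_idx here.
    print(len(train_idx), len(val_idx))
```

Averaging the validation scores across folds gives a way to compare candidate models, such as different regularization strengths, which is why cross-validation is paired with a regularized model and does not replace one.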

Distractor review

Increase the number of training samples by data augmentation

Data augmentation is common for image or text data. For tabular churn data with 200 features, generating realistic synthetic records is difficult, and adding samples would not eliminate irrelevant features, which is what the question asks for.

Questions learners often ask

What does this AI-900 question test?

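It tests whether you can recognize overfitting (high training accuracy, much lower test accuracy) and choose the regularization technique that also performs feature selection, which is L1 (Lasso).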

What is the correct answer to this question?

The correct answer is: Apply L1 regularization (Lasso) to the model. L1 regularization adds a penalty proportional to the sum of the absolute values of the coefficients. This penalty can shrink the coefficients of less important features to exactly zero, which performs automatic feature selection and directly addresses overfitting caused by too many irrelevant features. L2 regularization (Ridge) also penalizes large coefficients but does not force them to zero, so it does not eliminate features. Cross-validation assesses generalization and helps tune hyperparameters; it does not reduce overfitting by itself. Data augmentation increases dataset size, but generating realistic synthetic records for tabular churn data is not straightforward, and it does not remove features.

What should I do if I get this AI-900 question wrong?

Try more questions from the same exam bank, and focus on understanding why the wrong options are tempting.
