Easy | Multiple choice | Objective-mapped

A company builds a machine learning model to predict whether a customer will purchase a product. They use a training dataset with 50% purchasers and 50% non-purchasers. The model achieves 90% accuracy on the test set. However, when deployed, the model performs poorly because the actual customer base has only 5% purchasers. What is the most likely cause of this poor performance?

Answer choices

Why each option matters

Good practice is not just finding the correct option. The wrong answers often show the exact trap the exam wants you to fall into.

A

Distractor review

The model is overfitted to the training data.

Overfitting means the model performs well on the training data but poorly on unseen data from the same distribution. Here the test set likely mirrored the balanced training distribution and accuracy was still high, so overfitting is not the primary issue.

B

Distractor review

The model is underfitted and fails to capture key patterns.

Underfitting would cause poor performance on both training and test sets. The 90% test accuracy indicates the model captured patterns well on the balanced distribution.

C

Distractor review

Data leakage caused inflated accuracy during testing.

Data leakage would allow the model to see information it shouldn't, leading to unrealistically high accuracy. There is no evidence of leakage; the problem is a change in the underlying data distribution.

D

Best answer

The training and deployment data have different distributions.

This is correct. The training set had 50% purchasers, but the production environment only has 5%. The model's assumptions no longer hold, leading to poor real-world performance even though test accuracy was high.
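A rough way to see the effect is to work through the numbers. The sketch below assumes (the question does not say this) that the 90% test accuracy comes from the model catching 90% of purchasers and correctly rejecting 90% of non-purchasers; the function names are illustrative only.

```python
def precision_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of predicted purchasers who really are purchasers."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def accuracy_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Overall share of correct predictions at a given purchaser rate."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Assumed error rates (hypothetical): 90% sensitivity and 90% specificity.
SENSITIVITY = 0.9
SPECIFICITY = 0.9

balanced_precision = precision_at_prevalence(SENSITIVITY, SPECIFICITY, 0.50)
deployed_precision = precision_at_prevalence(SENSITIVITY, SPECIFICITY, 0.05)
deployed_accuracy = accuracy_at_prevalence(SENSITIVITY, SPECIFICITY, 0.05)

print(f"{balanced_precision:.2f}")  # 0.90 on the balanced test set
print(f"{deployed_precision:.2f}")  # 0.32 when only 5% are purchasers
print(f"{deployed_accuracy:.2f}")   # 0.90, so accuracy alone hides the problem
```

Under these assumptions, roughly two out of three customers flagged as purchasers in production would not buy, even though headline accuracy is unchanged. A model that always predicts "non-purchaser" would score 95% accuracy on the same customer base, which is one reason accuracy is a weak metric when classes are imbalanced.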

More questions from this exam

Keep practising from the same exam bank, or move into a focused topic page if this question exposed a weak area.

Question 1

A developer wants to build a virtual assistant that can understand user intents such as 'Book a flight' or 'Check weather' and extract relevant entities like destination and date. The developer has a small set of labeled example utterances. Which Azure AI Language feature should the developer use?

Question 2

A developer is building a customer support chatbot using Azure OpenAI. The chatbot should never reveal its system instructions or internal configuration. The developer wants to add a rule at the beginning of the conversation to prevent prompt injection attacks. Which technique should they use?

Question 3

A developer is using Azure OpenAI Service to generate product descriptions from technical specifications. The generated descriptions sometimes include plausible-sounding but incorrect details (hallucinations). The developer wants to ensure the model's responses are strictly based on the provided product data and do not add any external or invented information. Which approach should the developer use?

Question 4

A developer is using Azure OpenAI with GPT-4 to build a chatbot that answers legal questions based on a company's internal policy documents. The developer wants the model's responses to be maximally deterministic and factual, avoiding any creative or speculative language. Which parameter should the developer set to the lowest possible value in the API call?

Question 5

A developer is using Azure OpenAI to generate creative product descriptions. The outputs are often repetitive and lack variety. The developer wants to increase the diversity of the generated text while still keeping it coherent. Which parameter should the developer increase?

Question 6

A developer is using Azure OpenAI Service to generate product descriptions. They want the output to be highly focused and deterministic, with less randomness. Which parameter should they decrease?

FAQ

Questions learners often ask

What does this AI-900 question test?

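It tests whether you can recognize that a model trained on one data distribution can perform poorly when the deployment data follows a different distribution.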

What is the correct answer to this question?

The correct answer is D: The training and deployment data have different distributions. The model was trained on a balanced dataset, but the real-world data is heavily imbalanced. This is a classic mismatch between training and deployment distributions; because it is the class proportions that changed, it is usually called label shift or prior probability shift. The model learned to assume a roughly equal chance of purchase, which does not hold in production. Overfitting (A) would show high training accuracy but poor test accuracy on data from the same distribution, whereas here test accuracy was high. Underfitting (B) would show poor performance on both sets. Data leakage (C) would artificially inflate test performance, but nothing in the scenario suggests it; the issue is the difference in distribution.
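If the only thing that differs between training and production is the purchaser rate (an assumption, sometimes called prior probability shift), one common remedy is to rescale the model's predicted probabilities with Bayes' rule. The sketch below is a minimal illustration, not part of the exam material; the function name is hypothetical and it assumes the model outputs reasonably calibrated probabilities.

```python
def adjust_for_new_prior(p_train: float, train_prior: float, deploy_prior: float) -> float:
    """Rescale a predicted purchase probability from the training
    purchaser rate to the deployment purchaser rate (Bayes' rule)."""
    # Reweight the evidence for "purchaser" and "non-purchaser" by how much
    # each class's share changed between training and deployment.
    pos = p_train * deploy_prior / train_prior
    neg = (1.0 - p_train) * (1.0 - deploy_prior) / (1.0 - train_prior)
    return pos / (pos + neg)


# Trained on 50% purchasers, deployed where 5% are purchasers.
confident = adjust_for_new_prior(0.9, train_prior=0.50, deploy_prior=0.05)
undecided = adjust_for_new_prior(0.5, train_prior=0.50, deploy_prior=0.05)

print(f"{confident:.2f}")  # 0.32
print(f"{undecided:.2f}")  # 0.05
```

A score of 0.5 from the balanced model corresponds to only a 5% chance of purchase in production, so a decision threshold of 0.5 chosen on balanced data would likely flag far too many customers.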

What should I do if I get this AI-900 question wrong?

Review why each option is right or wrong, then try more questions from the same exam bank and focus on understanding why the wrong options are tempting.
