Knowledge + Practice

CCNA Describe features of generative AI workloads on Azure Questions

75 of 206 questions · Page 1/3 · Describe features of generative AI workloads on Azure · Answers revealed

Practice these questions Domain overview All questions

1

MCQmedium

A developer uses Azure OpenAI to generate product descriptions. The outputs often repeat the same phrases multiple times within a single description. Which parameter should the developer increase to reduce this repetition?

A.Temperature

B.Frequency penalty

C.Presence penalty

D.Max tokens

AnswerB

Correct. Increasing the frequency penalty discourages the model from repeating the same tokens, reducing repetition.

Why this answer

The frequency penalty parameter reduces repetition by penalizing tokens that have already appeared in the generated text. Increasing this value discourages the model from reusing the same phrases, making the output more diverse and less repetitive.

Exam trap

The trap here is that candidates confuse frequency penalty with presence penalty, thinking both reduce repetition equally, but frequency penalty specifically targets how often a token appears, while presence penalty only cares if it has appeared at all.

How to eliminate wrong answers

Option A is wrong because temperature controls randomness in token selection, not repetition; higher temperature increases creativity but does not prevent phrase repetition. Option C is wrong because presence penalty penalizes tokens based on whether they have appeared at all, not how often, so it reduces topic repetition but not multiple occurrences of the same phrase. Option D is wrong because max tokens limits the total length of the output, not the repetition of phrases within it.

Practice this question →

2

MCQmedium

A social media company uses Azure OpenAI Service to automatically generate captions for user-uploaded images. The company has a strict content policy that prohibits any generated captions containing profanity, hate speech, or self-harm references. Which feature of the Azure OpenAI Service should the company configure to automatically block such harmful content?

A.Temperature parameter

B.Top-p parameter

C.Content filtering

D.Max-tokens parameter

AnswerC

Content filtering detects and blocks harmful content categories like hate, violence, and self-harm, making it the correct feature for blocking prohibited content.

Why this answer

Content filtering is the correct feature because it is specifically designed to detect and block harmful content such as profanity, hate speech, and self-harm references in both input prompts and generated outputs. Azure OpenAI Service's content filtering system uses multi-class classification models to enforce responsible AI policies automatically, without requiring custom training or manual moderation.

Exam trap

The trap here is that candidates confuse parameters that control output generation (temperature, top-p, max-tokens) with safety mechanisms, assuming any configurable setting can be used to block harmful content, when in fact content filtering is a separate, dedicated feature.

How to eliminate wrong answers

Option A is wrong because the Temperature parameter controls the randomness of token selection in the model's output, not content safety. Option B is wrong because the Top-p parameter (nucleus sampling) limits the cumulative probability of token choices to influence output diversity, not filter harmful content. Option D is wrong because the Max-tokens parameter sets a hard limit on the length of the generated response, but has no ability to detect or block prohibited content.

Practice this question →

3

MCQmedium

What is a system prompt in an Azure OpenAI deployment?

A.A technical error message returned when the model fails

B.An instruction that defines the model's behavior, persona, and constraints for a session

C.A user's first message to start a conversation

D.A command to restart the AI model instance

AnswerB

System prompts configure how the model behaves — setting its role, response style, and guardrails for all user interactions.

Why this answer

Option B is correct because a system prompt in Azure OpenAI is a foundational instruction set that defines the model's behavior, persona, and constraints for a session. It acts as a persistent directive that guides the model's responses throughout the conversation, ensuring alignment with specific use cases like tone, safety, or domain focus.

Exam trap

The trap here is that candidates often confuse the system prompt with the user's first message or a technical error, because the term 'prompt' is broadly used in AI, but Azure OpenAI specifically distinguishes system prompts as developer-set instructions, not user inputs or error outputs.

How to eliminate wrong answers

Option A is wrong because a system prompt is not an error message; error messages in Azure OpenAI are returned as HTTP status codes (e.g., 400 for bad request) or specific error objects, not as system prompts. Option C is wrong because the user's first message is a 'user prompt' or 'user input,' not a system prompt; the system prompt is set by the developer before any user interaction. Option D is wrong because there is no command to restart an AI model instance in Azure OpenAI; model instances are stateless and managed via deployment endpoints, and a system prompt does not trigger a restart.

Practice this question →

4

MCQhard

A developer uses Azure OpenAI Service to generate product name suggestions. They want to ensure the model never outputs a specific word, such as 'Corporation', because it is too formal for their brand. Which parameter should the developer configure to reduce the probability of that token being generated?

A.Temperature

B.Logit Bias

C.Top P (Nucleus Sampling)

D.Frequency Penalty

AnswerB

Logit Bias is a parameter that directly modifies the logit (pre-softmax score) of specific tokens, allowing the developer to increase or decrease the chance of a particular word like 'Corporation' being generated.

Why this answer

Logit Bias is the correct parameter because it directly modifies the logits (raw prediction scores) for specific tokens before the softmax function, allowing the developer to reduce the probability of generating a particular token like 'Corporation'. By setting a negative bias value for that token's ID, the model is less likely to output it, even if it would otherwise be a high-probability choice. This is the only parameter that provides token-level control over output content.

Exam trap

The trap here is that candidates often confuse Logit Bias with Temperature or Top P, thinking that adjusting overall randomness or sampling scope can prevent a specific word, but only Logit Bias provides token-level control over generation probabilities.

How to eliminate wrong answers

Option A is wrong because Temperature controls the randomness of the entire output distribution by scaling logits uniformly, not the probability of a specific token. Option C is wrong because Top P (Nucleus Sampling) selects from a cumulative probability mass of tokens, but it does not allow targeting or reducing the chance of a single token like 'Corporation'. Option D is wrong because Frequency Penalty reduces the likelihood of tokens that have already appeared in the generated text, not the likelihood of a specific token regardless of context.

Practice this question →

5

MCQmedium

What is 'Azure AI Search' (formerly Cognitive Search) and how does it support generative AI?

A.A web crawling service that indexes publicly available web content for Azure customers

B.A search service that retrieves relevant document chunks for RAG — grounding LLM responses in source material

C.A service that searches Azure resource configurations for compliance violations

D.A full-text search plugin that adds search to Azure SQL databases

AnswerB

Azure AI Search is the retrieval engine in RAG — indexing vectors and retrieving relevant chunks to ground LLM answers in real documents.

Why this answer

Option B is correct because Azure AI Search is a cloud search service that indexes and retrieves relevant document chunks, which can be used in a Retrieval-Augmented Generation (RAG) pattern. By providing grounded, source-specific context to a large language model (LLM), it helps ensure the generated responses are based on factual, retrieved data rather than solely on the model's training data.

Exam trap

The trap here is that candidates confuse Azure AI Search with a general-purpose web crawler or a simple SQL full-text search plugin, overlooking its key role as a dedicated retrieval engine for RAG in generative AI workloads.

How to eliminate wrong answers

Option A is wrong because Azure AI Search is not a web crawling service; it indexes your own data (e.g., from Azure Blob Storage, Cosmos DB) using built-in indexers, not publicly available web content. Option C is wrong because Azure AI Search is not used for compliance or resource configuration scanning; that is the role of Azure Policy or Azure Security Center. Option D is wrong because Azure AI Search is a standalone search service with its own indexing and query capabilities, not a plugin that simply adds full-text search to Azure SQL databases.

Practice this question →

6

MCQmedium

A marketing team uses Azure OpenAI Service to generate product descriptions. They have a base description and want the model to produce multiple variations with different tones, such as formal, playful, and technical, while still being factually accurate. Which parameter should they adjust to control the randomness and diversity of the output?

A.temperature

B.max_tokens

C.top_p

D.frequency_penalty

AnswerA

Temperature directly controls the randomness of the model's output. Lower values produce more predictable, conservative text, and higher values increase creativity and variation.

Why this answer

Temperature controls the randomness of the model's output by scaling the logits before applying the softmax function. A higher temperature (e.g., 0.8) increases diversity and creativity, while a lower temperature (e.g., 0.2) makes the output more deterministic and focused. For generating product descriptions with different tones while maintaining factual accuracy, adjusting temperature is the correct approach.

Exam trap

Microsoft often tests the distinction between temperature and top_p, where candidates mistakenly choose top_p because both affect randomness, but temperature is the primary parameter for controlling overall diversity and creativity in the output.

How to eliminate wrong answers

Option B (max_tokens) is wrong because it limits the length of the generated text, not the randomness or diversity of the output. Option C (top_p) is wrong because it controls nucleus sampling, which selects from the smallest set of tokens whose cumulative probability exceeds a threshold; while it also affects diversity, it is not the primary parameter for controlling randomness—temperature is the standard choice. Option D (frequency_penalty) is wrong because it reduces the likelihood of repeating the same tokens or phrases, which addresses repetition rather than overall randomness or tonal variation.

Practice this question →

7

MCQmedium

What is a copilot in the context of Microsoft AI products?

A.A hardware accelerator for AI model training

B.An AI assistant integrated into products that helps users complete tasks using natural language

C.A type of database for storing conversation history

D.A software testing tool for AI models

AnswerB

Copilots embed LLM capabilities into products, letting users accomplish tasks by describing what they want in natural language.

Why this answer

Option B is correct because a copilot in Microsoft AI products, such as Microsoft 365 Copilot or GitHub Copilot, is an AI assistant that uses large language models (LLMs) and natural language processing to help users complete tasks like drafting documents, generating code, or summarizing emails. It integrates directly into the user interface of applications (e.g., Word, Excel, Teams) and interprets natural language prompts to produce contextually relevant outputs, leveraging Azure OpenAI Service under the hood.

Exam trap

The trap here is that candidates confuse the term 'copilot' with a general-purpose AI tool or hardware component, rather than recognizing it as a specific Microsoft product category that integrates generative AI as an assistant within existing applications to enhance user productivity.

How to eliminate wrong answers

Option A is wrong because a hardware accelerator for AI model training refers to specialized chips like GPUs (e.g., NVIDIA A100) or FPGAs, not a software-based AI assistant; copilots run on existing hardware and do not accelerate training. Option C is wrong because a database for storing conversation history is a data store (e.g., Azure Cosmos DB or a vector database), not an AI assistant; copilots may use such databases to maintain context but are not themselves databases. Option D is wrong because a software testing tool for AI models (e.g., Azure Machine Learning's model evaluation or fairness assessment tools) is used to validate model performance, not to assist users in completing tasks via natural language.

Practice this question →

8

MCQhard

A developer is using Azure OpenAI with GPT-4 to build a chatbot that answers legal questions based on a company's internal policy documents. The developer wants the model's responses to be maximally deterministic and factual, avoiding any creative or speculative language. Which parameter should the developer set to the lowest possible value in the API call?

A.Temperature

B.Frequency penalty

C.Presence penalty

D.Top_p

AnswerA

Temperature controls the randomness of the model's output. Lower values (down to 0) make the model more deterministic and factual, reducing creative or speculative language.

Why this answer

Temperature controls the randomness of the model's output. Setting it to the lowest possible value (0) makes the model deterministic, always choosing the most likely next token, which is ideal for factual, non-creative responses like legal answers. Higher temperature values introduce variability and creativity, which would be undesirable for this use case.

Exam trap

The trap here is that candidates often confuse 'randomness' with 'repetition' or 'topic diversity,' leading them to choose frequency or presence penalties, but those parameters do not enforce deterministic factual output—only temperature set to 0 does.

How to eliminate wrong answers

Option B (Frequency penalty) is wrong because it reduces repetition by penalizing tokens that have already appeared, but it does not control determinism or creativity; it can still allow creative language. Option C (Presence penalty) is wrong because it encourages the model to talk about new topics by penalizing tokens that have appeared at all, which can actually increase variability and speculative language. Option D (Top_p) is wrong because it controls nucleus sampling (the cumulative probability threshold for token selection) and, while it can reduce randomness, it does not guarantee maximal determinism like setting temperature to 0 does; a low top_p still allows some randomness within the selected nucleus.

Practice this question →

9

MCQmedium

What is 'chain of thought' prompting in generative AI?

A.Connecting multiple AI models in a processing pipeline

B.A prompting technique that elicits step-by-step reasoning to improve accuracy on complex tasks

C.Linking multiple conversation turns to maintain context

D.Training a model using sequential text data only

AnswerB

CoT prompting makes the model reason through steps explicitly — dramatically improving performance on multi-step reasoning tasks.

Why this answer

Chain of thought prompting is a technique where the model is asked to produce intermediate reasoning steps before arriving at a final answer, which significantly improves performance on multi-step arithmetic, logic, and common-sense reasoning tasks. Unlike a simple answer request, it forces the model to externalize its reasoning process, reducing errors from shortcut or pattern-matching behaviors. This is a prompting strategy, not a model architecture change, and is particularly effective in large language models like GPT-4 or Azure OpenAI's GPT-3.5 Turbo.

Exam trap

The trap here is that candidates confuse 'chain of thought' with 'chaining models' (Option A) because both involve the word 'chain', but chain of thought is a single-model prompting technique, not a multi-model pipeline.

How to eliminate wrong answers

Option A is wrong because connecting multiple AI models in a processing pipeline describes a model orchestration or ensemble architecture (e.g., using Azure Logic Apps to chain a translator with a summarizer), not a prompting technique. Option C is wrong because linking multiple conversation turns to maintain context refers to session management or multi-turn dialog state tracking (e.g., using conversation history in a chatbot), not a single-prompt reasoning method. Option D is wrong because training a model using sequential text data only describes a training data format (e.g., sequence-to-sequence learning or autoregressive pretraining), not a prompting strategy applied at inference time.

Practice this question →

10

MCQmedium

A developer uses Azure OpenAI Service to generate code. They provide a few examples of function definitions and their corresponding descriptions, then ask the model to write a new function based on a new description. Which technique is the developer using?

A.Fine-tuning the model with the examples

B.Prompt engineering with few-shot learning

C.Training a custom model from scratch

D.Using reinforcement learning from human feedback

AnswerB

In few-shot learning, you provide a few examples in the prompt to inform the model's output, without modifying the model itself.

Why this answer

The developer is using prompt engineering with few-shot learning, a technique where a small set of input-output examples (here, function definitions and descriptions) is included in the prompt to guide the model's behavior without modifying its weights. This leverages the model's in-context learning ability to generalize from the provided examples and generate a new function for a new description.

Exam trap

The trap here is that candidates confuse providing examples in the prompt (few-shot learning) with fine-tuning, because both involve using examples, but fine-tuning permanently alters the model's weights while prompt engineering does not.

How to eliminate wrong answers

Option A is wrong because fine-tuning involves updating the model's weights through additional training on a dataset, which is not what is happening here—the examples are provided only in the prompt at inference time. Option C is wrong because training a custom model from scratch requires a massive dataset, significant compute resources, and is not a technique used with Azure OpenAI Service for this task; the developer is using a pre-trained model. Option D is wrong because reinforcement learning from human feedback (RLHF) is a training process used to align model behavior based on human preferences, not a method for providing examples in a single prompt to generate code.

Practice this question →

11

MCQeasy

What is a 'system message' (system prompt) in Azure OpenAI chat models?

A.An error notification sent by Azure when the OpenAI service is unavailable

B.A developer-set instruction that defines the model's role, persona, and behavioural constraints

C.Automated messages the model sends to confirm it received the user's input

D.The first message a user sends to start a new conversation session

AnswerB

The system message shapes the model's responses — defining its persona, topic scope, and style before any user interaction begins.

Why this answer

Option B is correct because a system message (system prompt) in Azure OpenAI chat models is a developer-defined instruction that sets the model's role, persona, and behavioral constraints. This prompt is sent as part of the conversation context to guide the model's responses, ensuring it adheres to specific guidelines, tone, or safety rules. It is not an error notification, automated confirmation, or user input.

Exam trap

The trap here is that candidates confuse the system message with the user's first input or an error notification, because the term 'system' might be misinterpreted as an automated system-generated response rather than a developer-controlled instruction.

How to eliminate wrong answers

Option A is wrong because a system message is not an error notification; Azure OpenAI uses HTTP status codes (e.g., 503 Service Unavailable) or specific error responses to indicate service unavailability, not a system prompt. Option C is wrong because the model does not send automated confirmation messages; user input is acknowledged implicitly through the model's response, and there is no built-in 'received' confirmation mechanism in the chat completion API. Option D is wrong because the first message a user sends is a 'user message' (role: 'user'), not a system message; the system message is set by the developer before any user interaction to define the assistant's behavior.

Practice this question →

12

MCQmedium

What is a 'hallucination' in the context of large language models?

A.When a model refuses to answer a question

B.When a model generates plausible-sounding but factually incorrect information

C.When a model processes images instead of text

D.When a model runs out of context window space

AnswerB

Hallucination is when an LLM confidently produces false information — a key limitation of purely statistical language models.

Why this answer

In the context of large language models (LLMs), a hallucination occurs when the model generates text that is fluent, coherent, and plausible-sounding but is factually incorrect or nonsensical. This happens because LLMs are trained to predict the next token based on statistical patterns in their training data, not to verify facts against a ground truth. Option B correctly identifies this behavior.

Exam trap

The trap here is that candidates may confuse a model's refusal to answer (safety guardrails) with a hallucination, or think that running out of context window is a type of hallucination, when in fact hallucination is specifically about generating confident but false content.

How to eliminate wrong answers

Option A is wrong because a model refusing to answer a question is typically a safety or alignment feature (e.g., content filtering or refusal to comply with harmful prompts), not a hallucination. Option C is wrong because processing images instead of text describes a multimodal capability, not a hallucination; hallucinations can occur in text-only models. Option D is wrong because running out of context window space causes truncation or loss of earlier context, leading to incoherence or forgetting, but not the generation of plausible-sounding falsehoods that characterize a hallucination.

Practice this question →

13

MCQmedium

A developer uses Azure OpenAI Service to generate multiple alternative product slogans. The developer wants to get exactly 5 different slogan options in a single API call, each being a separate piece of text. Which parameter should the developer set to control the number of completions returned?

A.temperature

B.max_tokens

C.n

D.stop

AnswerC

The 'n' parameter directly determines how many completions the API returns for the prompt.

Why this answer

The 'n' parameter in Azure OpenAI Service specifies the number of completions (candidate responses) to generate for each API call. Setting n=5 returns exactly five distinct slogan options as separate text strings, fulfilling the requirement of a single request producing multiple alternatives.

Exam trap

The trap here is that candidates confuse parameters that affect output quality (temperature) or length (max_tokens) with the parameter that controls output quantity (n), leading them to pick a plausible-sounding but incorrect option like temperature.

How to eliminate wrong answers

Option A is wrong because temperature controls the randomness or creativity of the output, not the count of completions; it influences token probability distributions but does not determine how many responses are returned. Option B is wrong because max_tokens limits the total number of tokens in a single completion, not the number of completions; it caps response length but does not affect multiplicity. Option D is wrong because stop defines a sequence that halts generation for each completion, used to control output structure, not to specify how many completions are produced.

Practice this question →

14

MCQhard

A developer is using Azure OpenAI Service to generate product descriptions from technical specifications. The generated descriptions sometimes include plausible-sounding but incorrect details (hallucinations). The developer wants to ensure the model's responses are strictly based on the provided product data and does not add any external or invented information. Which approach should the developer use?

A.Use Azure OpenAI On Your Data to connect to a product database so the model retrieves and references only the provided specifications.

B.Increase the frequency penalty to discourage the model from repeating common phrases.

C.Decrease the temperature to 0 so the model always picks the most likely next token, making it more predictable.

D.Enable content filtering to block any outputs that contain harmful or biased language.

AnswerA

Azure OpenAI On Your Data grounds the model on your data, making it more likely to generate responses based solely on the provided facts, thus reducing hallucinations.

Why this answer

Option A is correct because Azure OpenAI On Your Data allows the developer to ground the model's responses in a specific data source, such as a product database. This ensures the model retrieves and references only the provided specifications, preventing the generation of external or invented information (hallucinations). By using this feature, the model's outputs are strictly based on the connected data, aligning with the requirement for factual accuracy.

Exam trap

The trap here is that candidates often confuse hyperparameter tuning (temperature, frequency penalty) or content filtering with data grounding, mistakenly believing these can prevent hallucinations when they only control output style or safety, not factual accuracy.

How to eliminate wrong answers

Option B is wrong because increasing the frequency penalty reduces the likelihood of repeating common phrases but does not prevent the model from inventing new, plausible-sounding details; it addresses repetition, not grounding. Option C is wrong because decreasing the temperature to 0 makes the model more deterministic and predictable, but it does not restrict the model to using only provided data; the model can still hallucinate based on its training data. Option D is wrong because enabling content filtering blocks harmful or biased language but does not ensure the model's responses are strictly based on the provided product data; it addresses safety, not factual grounding.

Practice this question →

15

MCQmedium

What is 'Azure AI Content Safety' and what types of harmful content does it detect?

A.A firewall that blocks malicious network traffic from reaching Azure AI services

B.A service that detects hate, violence, sexual, and self-harm content in text and images at configurable severity levels

C.A GDPR compliance tool that detects and redacts personal data from AI training datasets

D.Copyright detection software that identifies AI-generated content derived from copyrighted material

AnswerB

Azure AI Content Safety classifies content across four harm categories with severity scoring — enabling configurable safety filtering.

Why this answer

Azure AI Content Safety is a cloud service that detects harmful user-generated and AI-generated content in text and images. It identifies categories such as hate, violence, sexual, and self-harm content, and allows you to configure severity levels (safe, low, medium, high) to filter content appropriately. This makes option B correct because it accurately describes the service's purpose and the specific types of harmful content it detects.

Exam trap

The trap here is that candidates confuse Azure AI Content Safety with other Azure security or compliance services (like Azure Firewall, Azure Purview, or Content Moderator), leading them to pick options that describe unrelated capabilities such as network filtering, data privacy, or copyright detection.

How to eliminate wrong answers

Option A is wrong because Azure AI Content Safety is not a network firewall; it analyzes content for harmful material, not network traffic, and does not block malicious packets or use firewall rules. Option C is wrong because it describes a data privacy tool (like Azure Purview or DICOM de-identification), not a content safety service; Content Safety does not detect or redact personal data from training datasets. Option D is wrong because it refers to copyright detection or plagiarism checking, which is not a capability of Azure AI Content Safety; the service focuses on harmful content categories, not intellectual property infringement.

Practice this question →

16

MCQmedium

A company wants to build a chatbot that answers customer questions using a large language model. The company has an extensive internal knowledge base with accurate, up-to-date product information. To ensure the chatbot's answers are based on this reliable source rather than the model's internal knowledge, which technique should they use?

A.Fine-tuning the model on the knowledge base

B.Zero-shot learning

C.Grounding with retrieval-augmented generation

D.Prompt engineering with few-shot examples

AnswerC

Grounding retrieves relevant documents or data from the knowledge base and provides them as context to the model, enabling accurate and current responses.

Why this answer

Option C is correct because grounding with retrieval-augmented generation (RAG) retrieves relevant, up-to-date chunks from the internal knowledge base and provides them as context to the large language model (LLM) at inference time. This ensures the chatbot's answers are factually based on the company's reliable source rather than relying on the model's potentially outdated or incorrect parametric memory.

Exam trap

The trap here is that candidates often confuse fine-tuning (which alters the model's internal knowledge) with retrieval-augmented generation (which keeps the model unchanged and instead supplies external context at query time), leading them to incorrectly select fine-tuning as the method to ensure answers come from a specific knowledge base.

How to eliminate wrong answers

Option A is wrong because fine-tuning updates the model's weights using the knowledge base, which can cause catastrophic forgetting of other capabilities and does not guarantee that the model will use only the most current information from the knowledge base at inference time. Option B is wrong because zero-shot learning relies entirely on the model's pre-existing internal knowledge without any external retrieval, so the chatbot would not be constrained to the company's specific knowledge base. Option D is wrong because prompt engineering with few-shot examples provides in-context examples but does not dynamically retrieve and inject relevant, up-to-date content from the knowledge base, leaving the model free to generate answers from its internal training data.

Practice this question →

17

MCQmedium

What is 'guardrails' in generative AI applications and how are they implemented?

A.Physical barriers around AI data centres to prevent unauthorised access

B.Safety and quality constraints (content filters, system prompts, output validation) preventing harmful AI outputs

C.Legal terms of service that constrain how developers can use Azure OpenAI

D.Rate limits that prevent individual users from generating too many responses

AnswerB

Guardrails layer multiple protections — content safety, system prompts, RAG grounding, and output validation for defence-in-depth.

Why this answer

Guardrails in generative AI applications are safety and quality constraints implemented to prevent harmful or inappropriate AI outputs. They include content filters that block offensive language, system prompts that steer model behavior, and output validation that checks responses against predefined policies. This is correct because guardrails are a core feature of responsible AI deployment, ensuring that generative models like GPT-4 in Azure OpenAI Service produce safe, compliant, and contextually appropriate content.

Exam trap

The trap here is that candidates confuse operational controls (rate limits) or legal agreements (terms of service) with technical safety mechanisms (guardrails), which are specifically designed to filter and validate AI outputs in real time.

How to eliminate wrong answers

Option A is wrong because guardrails are not physical barriers; they are software-based safety mechanisms, not hardware security measures for data centers. Option C is wrong because legal terms of service are contractual agreements, not technical guardrails; they define usage rights and liabilities, not runtime constraints on AI outputs. Option D is wrong because rate limits control API call frequency to manage resource usage, not the content or safety of generated responses; guardrails focus on output quality and harm prevention, not throughput.

Practice this question →

18

MCQmedium

A developer is using Azure OpenAI to generate code snippets for a banking application. The developer wants to minimize the risk that the generated code contains security vulnerabilities or malicious instructions, even if the prompt is ambiguous. Which Azure OpenAI feature should the developer configure to address this concern?

A.Set the temperature parameter to 0

B.Enable content filters

C.Set max_tokens to a low value

D.Use a specific system message that requests secure code

AnswerB

Content filters are designed to detect and prevent the generation of harmful content, including code that could be used maliciously. This is the most direct way to improve safety.

Why this answer

Content filters in Azure OpenAI are specifically designed to detect and block harmful content, including security vulnerabilities and malicious instructions, in both prompts and completions. Unlike other parameters, content filters provide a safety layer that actively scans generated code for prohibited patterns, making them the correct choice for minimizing security risks in ambiguous prompts.

Exam trap

The trap here is that candidates often confuse model parameters (temperature, max_tokens) or prompt engineering (system messages) with actual safety mechanisms, overlooking that content filters are the only built-in feature that actively enforces security policies on generated output.

How to eliminate wrong answers

Option A is wrong because setting the temperature parameter to 0 only makes the model more deterministic and less creative, but it does not prevent the generation of insecure or malicious code—it simply reduces randomness. Option C is wrong because setting max_tokens to a low value limits the length of the output but does not filter or block harmful content; the model could still generate a short snippet containing a security vulnerability. Option D is wrong because while a specific system message requesting secure code can guide the model, it is not a guaranteed safeguard—the model may still produce insecure code if the prompt is ambiguous, and system messages lack the enforcement capability of content filters.

Practice this question →

19

MCQhard

What is 'constitutional AI' and how does it relate to responsible AI development?

A.Legal requirements in government constitutions that regulate AI development

B.A training approach using a set of ethical principles for the model to self-critique and revise outputs

C.Ensuring AI models are built on open standards that any organisation can adopt

D.A framework requiring AI models to have explicit constitutional rights and protections

AnswerB

Constitutional AI builds principle-following into training — the model evaluates its outputs against a constitution to improve helpfulness and harmlessness.

Why this answer

Constitutional AI is a training approach developed by Anthropic where a language model is fine-tuned using a set of written ethical principles (a 'constitution'). The model learns to self-critique its own outputs against these principles and revise them to be more helpful, harmless, and honest. This directly supports responsible AI development by embedding ethical guardrails into the model's behavior without relying solely on human feedback at every step.

Exam trap

The trap here is that candidates confuse 'constitutional' with government law or legal rights, when in fact it refers to a custom set of ethical principles used for model self-critique and revision.

How to eliminate wrong answers

Option A is wrong because constitutional AI is not about legal requirements in government constitutions; it is a technical training method using a custom set of ethical rules, not a legal framework. Option C is wrong because constitutional AI does not mandate open standards or interoperability; it focuses on model self-supervision based on a predefined constitution. Option D is wrong because constitutional AI does not grant rights or protections to the AI model itself; it uses a constitution as a guide for output behavior, not as a legal status for the model.

Practice this question →

20

MCQmedium

What is 'grounding with Bing search' in Microsoft Copilot?

A.Using Bing Maps to provide location-based responses

B.Retrieving current web information from Bing to augment LLM responses beyond its training cutoff

C.Translating Copilot responses using Microsoft's Bing Translator

D.Using Bing advertising data to personalize AI responses

AnswerB

Bing search grounding queries the web at inference time — providing current information that post-dates the model's training data.

Why this answer

Grounding with Bing search in Microsoft Copilot refers to the technique of retrieving real-time, current web information from Bing to augment the responses of a large language model (LLM) beyond its static training cutoff date. This allows Copilot to provide up-to-date answers on recent events, data, or topics not present in the model's original training corpus, effectively grounding the AI's output in verifiable, live web content.

Exam trap

The trap here is that candidates confuse 'grounding' with any Bing-related feature (like maps, translation, or ads) rather than recognizing it as a specific RAG technique for retrieving current web information to augment LLM responses.

How to eliminate wrong answers

Option A is wrong because grounding with Bing search is not about using Bing Maps for location-based responses; that would be a specific geolocation feature, not a general retrieval-augmented generation (RAG) technique. Option C is wrong because translating Copilot responses using Bing Translator is a separate language service, not a method for augmenting LLM responses with current web data. Option D is wrong because using Bing advertising data to personalize AI responses is unrelated to grounding; grounding focuses on factual retrieval from web search results, not ad-driven personalization.

Practice this question →

21

MCQmedium

A company uses Azure OpenAI Service to power an AI assistant that helps customers with product troubleshooting. The assistant must maintain the conversation history to provide contextually relevant answers across multiple turns. Which API endpoint should be used for this purpose?

A.Completions API

B.Chat Completions API

C.Embeddings API

D.Fine-tuning

AnswerB

The Chat Completions API processes a conversation history (list of messages) and generates responses that maintain context across multiple turns.

Why this answer

The Chat Completions API is designed for multi-turn conversational scenarios because it accepts a list of messages with roles (system, user, assistant) that represent the conversation history. This allows the model to maintain context across multiple interactions, making it the correct choice for an AI assistant that needs to provide contextually relevant answers over several turns.

Exam trap

The trap here is that candidates often confuse the Completions API with the Chat Completions API, assuming both can handle multi-turn dialogue, but the Completions API lacks the message-role structure needed for maintaining conversation context.

How to eliminate wrong answers

Option A is wrong because the Completions API is a single-turn endpoint that does not support conversation history or message roles; it simply generates a completion from a prompt without any built-in mechanism for maintaining context across multiple exchanges. Option C is wrong because the Embeddings API converts text into numerical vectors for similarity search or clustering, not for generating conversational responses or maintaining dialogue history. Option D is wrong because Fine-tuning is a training process that customizes a base model on a specific dataset, not an API endpoint for runtime inference; it does not handle conversation history during inference.

Practice this question →

22

MCQeasy

What is generative AI?

A.AI that classifies existing data into predefined categories

B.AI that creates new content such as text, images, or code based on learned patterns

C.AI that detects anomalies in structured data

D.AI that controls physical robots

AnswerB

Generative AI produces original content (text, images, code) by learning patterns from training data.

Why this answer

Generative AI refers to models that learn patterns from training data and then produce new, original content—such as text, images, audio, or code—that resembles the training distribution. Unlike discriminative models that map inputs to labels, generative models (e.g., GPT, DALL-E) sample from a learned probability distribution to create novel outputs. This is the core definition tested in AI-900 for the 'features of generative AI workloads' domain.

Exam trap

The trap here is that candidates confuse generative AI with discriminative AI tasks (like classification or anomaly detection) because both involve learning from data, but generative AI's defining characteristic is the creation of new content, not just analysis or labeling.

How to eliminate wrong answers

Option A is wrong because classifying existing data into predefined categories is a discriminative AI task (e.g., logistic regression, SVM), not generative—generative AI creates new data rather than assigning labels. Option C is wrong because detecting anomalies in structured data is an unsupervised or supervised anomaly detection task (e.g., using isolation forests or autoencoders), which does not involve generating new content. Option D is wrong because controlling physical robots falls under robotics and control systems (e.g., ROS, PID controllers), not generative AI, which focuses on content creation from learned patterns.

Practice this question →

23

MCQmedium

What is a prompt in the context of generative AI?

A.A configuration file for training AI models

B.The input text or instruction given to a generative AI model to guide its output

C.A reward signal used in reinforcement learning

D.A type of neural network activation function

AnswerB

A prompt is the text input that tells the AI model what to generate — prompt quality directly affects output quality.

Why this answer

In generative AI, a prompt is the input text or instruction provided to a model (such as GPT-4 or DALL-E) to guide its output. It acts as the starting context or query that the model uses to generate a relevant response, image, or completion. This is a fundamental concept in Azure OpenAI Service and other generative AI workloads, where prompt engineering is used to refine outputs.

Exam trap

The trap here is that candidates confuse 'prompt' with training-related concepts like configuration files or reinforcement learning signals, because generative AI models are often discussed alongside training terminology, but prompts are strictly inference-time inputs.

How to eliminate wrong answers

Option A is wrong because a configuration file for training AI models is typically a hyperparameter or training config (e.g., learning rate, batch size), not a prompt; prompts are used at inference time, not during training. Option C is wrong because a reward signal is used in reinforcement learning to provide feedback on actions, not as an input to guide generative output; prompts are static instructions, not dynamic rewards. Option D is wrong because an activation function (e.g., ReLU, sigmoid) is a mathematical operation within a neural network layer, not a text input; prompts are textual or token-based inputs to the model.

Practice this question →

24

MCQmedium

What are 'plugins' or 'tools' in the context of AI agents and Microsoft Copilot?

A.Browser extensions that block AI-generated content on websites

B.Extensions that give AI models the ability to call external APIs and take actions beyond text generation

C.Audio plugins for improving AI speech synthesis quality

D.Software updates for Azure OpenAI service deployments

AnswerB

Plugins/tools let AI agents search the web, query databases, run code, and call APIs — extending LLM capabilities to real-world actions.

Why this answer

Option B is correct because plugins (or tools) in AI agents and Microsoft Copilot are extensions that enable the AI to call external APIs, retrieve real-time data, or perform actions beyond text generation. This allows the AI to interact with services like databases, calendars, or custom business logic, making it an agent capable of executing tasks rather than just generating static responses.

Exam trap

The trap here is that candidates confuse 'plugins' with generic add-ons (like browser extensions or audio tools) rather than recognizing them as API-calling mechanisms that enable AI agents to perform actions beyond text generation.

How to eliminate wrong answers

Option A is wrong because browser extensions that block AI-generated content are unrelated to plugins in AI agents; plugins extend AI capabilities, not restrict them. Option C is wrong because audio plugins for speech synthesis are a specific audio processing tool, not the general-purpose API-calling extensions used in AI agents like Copilot. Option D is wrong because software updates for Azure OpenAI service deployments are infrastructure updates, not the extensibility mechanism that allows AI models to invoke external functions or services.

Practice this question →

25

MCQhard

A quality assurance team at a software company uses Azure OpenAI Service to generate compliance reports. They need the model to produce the exact same output for a given prompt every time the API is called, to ensure reproducibility during testing. Which parameter should they set to achieve this deterministic behavior?

A.Set temperature to 0

B.Set frequency penalty to 1

C.Set top_p to 1

D.Set max_tokens to the expected output length

AnswerA

Temperature controls randomness; setting it to 0 makes the model choose the most likely token every time, producing deterministic outputs.

Why this answer

Setting temperature to 0 forces the model to choose the most likely token at each step, eliminating randomness and producing deterministic outputs for the same prompt. This is essential for reproducibility in testing scenarios where identical results are required across API calls.

Exam trap

The trap here is that candidates confuse parameters that reduce variability (like frequency penalty or top_p=1) with the one that eliminates it entirely (temperature=0), assuming any penalty or high probability threshold ensures determinism.

How to eliminate wrong answers

Option B is wrong because frequency penalty reduces repetition by penalizing tokens that have already appeared, but it does not eliminate randomness—it only adjusts token probabilities, so outputs can still vary. Option C is wrong because setting top_p to 1 means the model considers all tokens with cumulative probability up to 1.0, which includes low-probability tokens and introduces variability, not determinism. Option D is wrong because max_tokens only caps the length of the output; it does not control the randomness of token selection, so outputs can differ even with the same token limit.

Practice this question →

26

MCQhard

A company uses Azure OpenAI Service to generate marketing copy for a new product. They have a strict brand voice that requires formal, technical language and explicitly prohibits any humorous or informal phrases. They want to enforce these constraints without retraining the model. Which technique should they use?

A.A) Fine-tuning

B.B) Prompt engineering

C.C) Reinforcement learning

D.D) Transfer learning

AnswerB

Prompt engineering designs the input prompt to control the model's output characteristics, such as tone, style, and content. This is a lightweight, no-training approach to enforce brand voice constraints.

Why this answer

Prompt engineering is correct because it allows the user to craft system messages or user prompts that explicitly instruct the model to use formal, technical language and avoid humor, all without modifying the underlying model weights. This technique leverages the model's instruction-following capability to enforce constraints at inference time, making it ideal for brand voice enforcement without retraining.

Exam trap

The trap here is that candidates often confuse fine-tuning (which requires retraining) with prompt engineering (which is inference-only), leading them to select fine-tuning when the question explicitly prohibits retraining.

How to eliminate wrong answers

Option A is wrong because fine-tuning involves retraining the model on a custom dataset, which contradicts the requirement to avoid retraining and is overkill for simple stylistic constraints. Option C is wrong because reinforcement learning requires a reward signal and iterative training to adjust model behavior, which is a retraining process and not applicable for inference-time constraints. Option D is wrong because transfer learning is a training paradigm for adapting a pre-trained model to a new task via additional training, which also requires retraining and does not directly enforce prompt-level constraints.

Practice this question →

27

MCQmedium

What is 'top_p' (nucleus sampling) in Azure OpenAI and how does it differ from temperature?

A.The maximum percentage of the context window used for generating output

B.Restricting token selection to those whose cumulative probability reaches p — an alternative diversity control to temperature

C.The probability threshold above which the model considers a response correct

D.A parameter setting the minimum confidence before the model outputs a response

AnswerB

Top_p=0.9 means only consider tokens that together hold 90% probability mass — adapting the selection pool to the distribution.

Why this answer

Option B is correct because top_p (nucleus sampling) in Azure OpenAI controls diversity by selecting tokens from the smallest set whose cumulative probability exceeds the threshold p, rather than sampling from the full probability distribution. This differs from temperature, which scales the logits before the softmax to flatten or sharpen the distribution; top_p dynamically cuts off the long tail of low-probability tokens, providing an alternative method to control randomness without affecting the relative ranking of high-probability tokens.

Exam trap

The trap here is that candidates confuse top_p with a confidence or correctness threshold, when in fact it is a sampling parameter that controls the diversity of token selection by truncating the probability distribution.

How to eliminate wrong answers

Option A is wrong because top_p does not relate to the context window size; the context window is a fixed token limit (e.g., 4096 tokens for GPT-3.5) that determines how much input the model can process, not a sampling parameter. Option C is wrong because top_p is not a correctness threshold; the model does not use probability thresholds to deem a response correct—it generates tokens probabilistically, and correctness is evaluated separately (e.g., via human judgment or metrics). Option D is wrong because top_p does not set a minimum confidence; confidence thresholds are not a standard parameter in Azure OpenAI's text generation—parameters like top_p and temperature control sampling behavior, not a confidence cutoff.

Practice this question →

28

MCQmedium

What are embeddings in the context of AI and language models?

A.The process of inserting AI capabilities into existing applications

B.Numerical vector representations of text that capture semantic meaning

C.The training dataset used to build a language model

D.Compressed versions of large language models for edge deployment

AnswerB

Embeddings convert text into high-dimensional vectors where semantic similarity is captured by vector proximity — enabling semantic search.

Why this answer

Option B is correct because embeddings are dense numerical vector representations of text that capture semantic meaning, enabling language models to understand relationships between words and phrases. In the context of AI and language models, embeddings map words, sentences, or documents to high-dimensional vectors where similar meanings are closer in vector space, which is fundamental for tasks like semantic search, clustering, and transfer learning.

Exam trap

The trap here is that candidates confuse the general term 'embedding' (as in integrating AI into apps) with the specific NLP concept of vector embeddings, leading them to pick Option A.

How to eliminate wrong answers

Option A is wrong because it describes 'embedding AI capabilities into applications,' which is a general integration concept, not the technical definition of embeddings in NLP. Option C is wrong because it confuses embeddings with the training dataset; embeddings are learned representations derived from data, not the dataset itself. Option D is wrong because it refers to model compression techniques like quantization or pruning for edge deployment, which are unrelated to the vector representations used for semantic encoding.

Practice this question →

29

MCQeasy

What is 'text generation' as a generative AI capability and what are common use cases?

A.Extracting and copying text from scanned images using OCR

B.Creating new coherent text from prompts for writing, code, summaries, and conversational AI

C.Converting speech audio into a written transcript

D.Formatting existing text by adding headings, bullets, and correct punctuation

AnswerB

Text generation is the core LLM capability — producing novel text for writing assistance, code, customer service, and content creation.

Why this answer

Text generation in generative AI refers to the capability of models (like GPT-4 or GPT-3.5) to produce new, coherent text based on a given prompt. This includes tasks such as writing articles, generating code, creating summaries, and powering conversational AI agents. The key distinction is that the output is novel content, not a direct extraction or transformation of existing text.

Exam trap

The trap here is that candidates confuse text generation with text extraction or transformation tasks (like OCR, transcription, or formatting), because all involve text, but only generative AI creates new, original content from a prompt.

How to eliminate wrong answers

Option A is wrong because it describes Optical Character Recognition (OCR), which extracts text from images but does not generate new content; it is a form of data extraction, not generative AI. Option C is wrong because it describes speech-to-text transcription, which converts audio to text without creating new or original content; it is a recognition task, not generation. Option D is wrong because it describes text formatting or editing (e.g., adding headings, bullets, punctuation), which modifies existing text but does not produce new, original content from a prompt; this is a transformation task, not generative AI.

Practice this question →

30

MCQmedium

What is 'content moderation' in the context of Azure OpenAI?

A.Controlling how much content a user is allowed to generate per day

B.Automatically filtering and classifying inputs/outputs for harmful content categories

C.Editing generated text to improve grammar and style

D.Optimising prompt length to reduce token costs

AnswerB

Content moderation screens for hate speech, violence, sexual content, and self-harm — protecting users and organisations from harmful AI outputs.

Why this answer

Content moderation in Azure OpenAI uses AI models to automatically scan both user prompts (inputs) and generated responses (outputs) for harmful content such as hate, violence, sexual material, and self-harm. It applies configurable severity filters (e.g., low, medium, high) to block or flag content that violates Microsoft's Responsible AI policies, ensuring safe deployment of generative AI workloads.

Exam trap

The trap here is that candidates confuse content moderation with usage quotas or prompt engineering, but the exam specifically tests the safety filtering and classification of harmful content as a core feature of responsible AI in Azure OpenAI.

How to eliminate wrong answers

Option A is wrong because it describes a rate-limiting or quota control feature, not content moderation; Azure OpenAI uses tokens-per-minute (TPM) limits for that purpose. Option C is wrong because it describes a grammar/style editing function, which is not part of content moderation; Azure OpenAI's content filters do not perform linguistic improvements. Option D is wrong because it describes prompt optimization for cost efficiency, which is unrelated to safety filtering; content moderation focuses on harmful content detection, not token usage.

Practice this question →

31

MCQhard

What is 'speculative decoding' and how does it improve LLM inference speed?

A.Predicting user input before they finish typing to pre-compute responses

B.Using a small draft model to generate candidate tokens that a large model verifies in parallel — improving throughput

C.Generating speculative forecasts about future events using language model knowledge

D.Running model inference on the CPU while the GPU processes the next request in parallel

AnswerB

Speculative decoding gets multiple tokens per main model pass — reducing latency without changing output quality.

Why this answer

Speculative decoding improves LLM inference speed by using a small, fast draft model to generate multiple candidate tokens in sequence, which are then verified in parallel by the large target model. This parallel verification allows the large model to accept or reject entire blocks of tokens at once, significantly reducing the number of sequential autoregressive steps required. The technique leverages the observation that draft models can produce acceptable continuations most of the time, and the large model only needs to correct mistakes, leading to higher throughput without sacrificing output quality.

Exam trap

The trap here is that candidates confuse speculative decoding with simple input prediction or CPU/GPU offloading, but Microsoft often tests the specific mechanism of using a draft model for parallel token verification as the defining characteristic of speculative decoding.

How to eliminate wrong answers

Option A is wrong because it describes input prediction or autocomplete, not speculative decoding; speculative decoding does not pre-compute responses based on partial user input but rather uses a draft model to generate candidate tokens for parallel verification. Option C is wrong because speculative decoding is a technique for accelerating inference, not a method for generating forecasts about future events; it has nothing to do with predictive modeling of real-world events. Option D is wrong because speculative decoding does not involve CPU/GPU parallelism for different requests; it is a single-request optimization where both draft and target models run on the same accelerator (typically GPU) to parallelize token generation within one inference pass.

Practice this question →

32

Drag & Dropmedium

Drag and drop the steps to perform a face detection using Azure Face API into the correct order.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

Why this order

Face detection requires setting up the resource, sending an image, and parsing the detected faces.

Practice this question →

33

MCQmedium

A company wants to use Azure OpenAI to generate product descriptions. They have a few example descriptions that perfectly match their desired style and structure. They want the model to produce new descriptions in the same style without retraining the underlying model. Which approach should they use?

A.Fine-tune the model on the example descriptions

B.Few-shot prompting with the examples in the prompt

C.Embeddings and similarity search

D.Content filtering configurations

AnswerB

Correct. Few-shot prompting uses the provided examples in the prompt to condition the model on the desired style without any retraining.

Why this answer

Few-shot prompting provides the model with a small number of example inputs and outputs directly in the prompt, allowing it to infer the desired style and structure without any training. This approach is ideal when you have a few high-quality examples and want to generate new content that matches them, without the cost and complexity of fine-tuning.

Exam trap

The trap here is that candidates often confuse fine-tuning with few-shot prompting, assuming that any use of examples requires retraining the model, when in fact the examples can simply be placed in the prompt to achieve the same effect without modifying the model.

How to eliminate wrong answers

Option A is wrong because fine-tuning requires retraining the model on a labeled dataset, which is unnecessary and more resource-intensive when only a few examples are available; it also changes the model weights permanently. Option C is wrong because embeddings and similarity search are used for retrieving relevant documents or measuring semantic similarity, not for generating new text in a specific style. Option D is wrong because content filtering configurations are designed to block harmful or policy-violating content, not to guide the model's output style or structure.

Practice this question →

34

MCQmedium

What is 'Azure AI Foundry's model hub' and what models are available there?

A.A marketplace where organisations can sell their custom-trained AI models to other Azure customers

B.A curated collection of leading AI models from OpenAI, Microsoft (Phi), Meta, Mistral, and others

C.A version control system for AI models similar to Git for code

D.A centralised repository of Microsoft's internal research models not available to customers

AnswerB

The model hub provides one-stop model discovery — GPT-4o, Phi-4, Llama 3, and more — deployable as Azure endpoints.

Why this answer

Azure AI Foundry's model hub is a curated collection of leading AI models from providers like OpenAI, Microsoft (Phi), Meta, Mistral, and others. It enables developers to discover, compare, and deploy pre-built models for generative AI workloads without needing to train models from scratch. This aligns with the exam's focus on leveraging existing AI services in Azure.

Exam trap

The trap here is that candidates confuse the model hub with a general marketplace or version control system, overlooking that it is specifically a curated collection of pre-built, ready-to-deploy models from multiple leading AI providers.

How to eliminate wrong answers

Option A is wrong because the model hub is not a marketplace for selling custom-trained models; it is a curated catalog of pre-built models from major providers. Option C is wrong because the model hub is not a version control system like Git; it is a repository for model discovery and deployment, not for tracking code changes. Option D is wrong because the model hub includes models from multiple third-party vendors and is fully available to customers, not restricted to Microsoft's internal research models.

Practice this question →

35

MCQmedium

What is fine-tuning in the context of large language models?

A.Adjusting the model's response speed for production deployment

B.Training a pre-trained model further on domain-specific data to improve task performance

C.Manually reviewing and correcting model outputs

D.Compressing a large model into a smaller, faster version

AnswerB

Fine-tuning adapts a foundation model to specific tasks or domains through additional training on targeted data.

Why this answer

Fine-tuning takes a pre-trained large language model (LLM) and continues the training process on a smaller, domain-specific dataset. This adjusts the model's weights to specialize its outputs for particular tasks (e.g., legal document summarization or medical Q&A) without retraining from scratch. It is distinct from prompt engineering or retrieval-augmented generation because it permanently modifies the model parameters.

Exam trap

The trap here is that candidates confuse fine-tuning with inference optimization or model compression, because all three can improve performance in production, but only fine-tuning actually modifies model weights through additional training on domain-specific data.

How to eliminate wrong answers

Option A is wrong because adjusting response speed for production deployment is an inference optimization technique (e.g., model quantization, batching, or using Azure OpenAI's throughput settings), not a training process like fine-tuning. Option C is wrong because manually reviewing and correcting outputs is a post-processing or human-in-the-loop validation step, not a model training method. Option D is wrong because compressing a large model into a smaller, faster version describes model distillation or pruning, which reduces model size and latency but does not involve training on domain-specific data to improve task performance.

Practice this question →

36

MCQmedium

What is 'Azure AI Services multi-service resource' and what is its advantage?

A.A resource that automatically selects the best AI model for each request based on the task

B.A single resource providing one API key for Vision, Language, Speech, and Translator with unified billing

C.A resource type that runs multiple AI workloads simultaneously on shared compute

D.An enterprise licence for unlimited usage of all Azure AI services

AnswerB

Multi-service resource simplifies management — one key, one endpoint, consolidated bill for multiple Azure AI services.

Why this answer

Option B is correct because an Azure AI Services multi-service resource provides a single endpoint and API key to access multiple Azure AI services (Vision, Language, Speech, Translator) under one resource, enabling unified billing and simplified management. This is distinct from single-service resources, which require separate keys and endpoints for each service, increasing administrative overhead.

Exam trap

The trap here is that candidates confuse 'multi-service resource' with a load balancer or auto-scaling feature, when in reality it is purely a billing and key-management convenience with no impact on how AI models are selected or executed.

How to eliminate wrong answers

Option A is wrong because it describes a hypothetical auto-selection mechanism that does not exist in Azure AI Services; multi-service resources do not automatically choose models—they expose individual APIs that must be called explicitly. Option C is wrong because multi-service resources do not run workloads on shared compute; they are logical containers for API access, and each service runs on its own dedicated backend infrastructure. Option D is wrong because there is no 'enterprise licence for unlimited usage'—Azure AI Services are billed per-call or per-transaction, and multi-service resources simply consolidate billing under one meter, not provide unlimited usage.

Practice this question →

37

MCQmedium

A developer is using Azure OpenAI Service to classify customer support tickets into categories such as 'Billing', 'Technical Issue', and 'Account Management'. The developer provides three labeled examples for each category in the prompt to improve the model's accuracy. What technique is the developer applying?

A.Fine-tuning

B.Few-shot learning

C.Prompt engineering

D.Retrieval-augmented generation

AnswerB

By providing a few labeled examples in the prompt, the developer is using few-shot learning to guide the model's classification behavior without retraining.

Why this answer

Few-shot learning is the correct technique because the developer is providing a small number of labeled examples (three per category) directly in the prompt to guide the model's output without updating the model's weights. This approach leverages the model's in-context learning ability, where the examples act as a pattern for the model to follow when classifying new tickets.

Exam trap

The trap here is that candidates often confuse few-shot learning with fine-tuning, assuming that any use of examples to improve accuracy must involve retraining the model, but few-shot learning does not modify model weights—it only uses examples in the prompt.

How to eliminate wrong answers

Option A is wrong because fine-tuning involves retraining the model on a custom dataset to update its weights, which is not what is happening here—the developer is only adding examples to the prompt, not training the model. Option C is wrong because prompt engineering is a broader practice that includes designing prompts for clarity and structure, but the specific technique of including labeled examples in the prompt to improve accuracy is called few-shot learning, not just prompt engineering. Option D is wrong because retrieval-augmented generation (RAG) involves fetching external data from a knowledge base at inference time to augment the prompt, whereas here the examples are static and pre-defined in the prompt itself.

Practice this question →

38

MCQmedium

What is 'Microsoft Copilot Studio' and what is it used for?

A.A professional audio/video editing suite powered by AI for content creators

B.A low-code platform for building custom AI bots and agents integrated with Microsoft 365

C.An IDE for enterprise developers building high-performance LLM applications in C#

D.A tool for generating Copilot-branded marketing content for Microsoft partners

AnswerB

Copilot Studio enables citizen developers to build domain-specific bots — extending Microsoft Copilot with custom knowledge and workflows.

Why this answer

Microsoft Copilot Studio is a low-code platform that allows users to build custom AI-powered bots and agents that integrate seamlessly with Microsoft 365 services. It extends the capabilities of Microsoft Copilot by enabling tailored conversational experiences, such as automating workflows, answering queries, and handling tasks within the Microsoft ecosystem without requiring extensive coding.

Exam trap

The trap here is that candidates may confuse 'Copilot Studio' with a general-purpose development tool or creative suite, rather than recognizing it as a low-code platform specifically for building custom AI bots integrated with Microsoft 365.

How to eliminate wrong answers

Option A is wrong because Microsoft Copilot Studio is not a professional audio/video editing suite; that describes tools like Adobe Premiere Pro or DaVinci Resolve, not a low-code AI bot builder. Option C is wrong because Copilot Studio is not an IDE for building LLM applications in C#; it is a low-code platform, whereas an IDE like Visual Studio or JetBrains Rider would be used for such development. Option D is wrong because Copilot Studio is not a marketing content generation tool for partners; it is a platform for creating custom AI agents, not for producing Copilot-branded marketing materials.

Practice this question →

39

MCQmedium

A company uses Azure OpenAI Service to automatically generate customer support email responses. They want to ensure that the model does not produce responses containing offensive language, hate speech, or biased content. Which Microsoft responsible AI principle is most directly addressed by implementing content filters that screen the model's output before it is sent?

A.A. Transparency

B.B. Reliability and Safety

C.C. Inclusiveness

D.D. Fairness

AnswerD

Fairness is the principle that AI systems should treat all people fairly and avoid bias. Implementing content filters to block hate speech and offensive language is a direct application of Fairness.

Why this answer

Implementing content filters to screen model outputs for offensive language, hate speech, or biased content directly addresses the Fairness principle, which requires AI systems to treat all people equitably and avoid reinforcing societal biases. By filtering out harmful or biased content, the organization ensures that the generated responses do not discriminate against or marginalize any group, aligning with Microsoft's commitment to fairness in AI.

Exam trap

The trap here is that candidates often confuse Reliability and Safety (which deals with system uptime and operational failures) with the specific need to prevent biased or offensive outputs, which falls under Fairness in Microsoft's responsible AI framework.

How to eliminate wrong answers

Option A is wrong because Transparency refers to the principle of making AI systems understandable and providing clear information about their capabilities and limitations, not about filtering outputs for harmful content. Option B is wrong because Reliability and Safety focuses on ensuring the AI system operates dependably and safely under normal conditions, which includes preventing failures but does not specifically target bias or offensive language filtering. Option C is wrong because Inclusiveness aims to design AI systems that empower and include all people, often through accessible interfaces and diverse data representation, but it does not directly address the screening of outputs for offensive or biased content.

Practice this question →

40

MCQeasy

A developer uses Azure OpenAI Service to generate marketing copy. They want the model to produce more focused and deterministic responses, reducing the variety of outputs for the same prompt. Which parameter should the developer decrease?

A.Temperature

B.Max tokens

C.Top P

D.Frequency penalty

AnswerA

Lowering temperature reduces randomness, making outputs more deterministic and focused.

Why this answer

Temperature controls the randomness of the model's output. Lowering temperature (e.g., from 1.0 to 0.2) makes the model more deterministic and focused, reducing output variety for the same prompt. This is the correct parameter to adjust for more consistent marketing copy.

Exam trap

The trap here is that candidates often confuse temperature with Top P, thinking both control randomness identically, but temperature directly scales logits while Top P sets a cumulative probability cutoff for token selection.

How to eliminate wrong answers

Option B (Max tokens) is wrong because it controls the maximum length of the output, not the randomness or determinism. Option C (Top P) is wrong because it controls nucleus sampling, which also affects output diversity but through cumulative probability threshold, not directly reducing variety in a deterministic way. Option D (Frequency penalty) is wrong because it reduces repetition of tokens based on their frequency in the output, not the overall randomness or determinism of the response.

Practice this question →

41

MCQhard

A developer uses Azure OpenAI Service to generate product descriptions. They want to ensure that the model only considers the most likely tokens that together have a cumulative probability of 0.95, ignoring very low-probability tokens that could lead to nonsensical outputs. Which parameter should they configure?

A.Temperature

B.Top_p

C.Frequency penalty

D.Presence penalty

AnswerB

Correct. Top_p (nucleus sampling) sets a cumulative probability threshold so that only the most probable tokens that together reach that threshold are considered, eliminating very unlikely tokens.

Why this answer

Option B (Top_p) is correct because the developer wants to limit token selection to those with a cumulative probability of 0.95, which is exactly what the Top_p (nucleus sampling) parameter controls. By setting Top_p to 0.95, the model will only consider the smallest set of tokens whose combined probability mass reaches 0.95, effectively ignoring low-probability tokens that could produce nonsensical outputs.

Exam trap

The trap here is that candidates often confuse Top_p with Temperature, assuming both control randomness, but Temperature scales logits without filtering low-probability tokens, whereas Top_p directly removes them based on cumulative probability mass.

How to eliminate wrong answers

Option A (Temperature) is wrong because it controls the randomness of token selection by scaling the logits before applying softmax, not by filtering based on cumulative probability. Option C (Frequency penalty) is wrong because it reduces the likelihood of tokens that have already appeared in the generated text, aiming to avoid repetition, not to filter low-probability tokens. Option D (Presence penalty) is wrong because it penalizes tokens that have appeared at least once in the text, encouraging the model to introduce new topics, but does not perform cumulative probability filtering.

Practice this question →

42

MCQmedium

What is 'retrieval augmented generation' (RAG) and which Azure services typically implement it?

A.Using Azure Storage to retrieve training data for model fine-tuning

B.Combining Azure AI Search (retrieval) with Azure OpenAI (generation) to ground LLM responses in a knowledge base

C.Using Azure CDN to deliver AI-generated content faster globally

D.A method of compressing large datasets before training language models

AnswerB

RAG: AI Search retrieves relevant documents → provided as context to Azure OpenAI → LLM generates answers grounded in retrieved content.

Why this answer

Retrieval Augmented Generation (RAG) is a pattern that combines a retrieval step with a generative step. In Azure, this is typically implemented by using Azure AI Search to retrieve relevant documents or chunks from a knowledge base, then passing those results as context to an Azure OpenAI model (e.g., GPT-4) to generate a grounded, fact-based response. This approach reduces hallucinations and ensures the output is based on authoritative data rather than the model's training data alone.

Exam trap

The trap here is that candidates confuse RAG with fine-tuning, mistakenly thinking retrieval modifies the model's training data, whereas RAG is a prompt-time augmentation that leaves the model unchanged.

How to eliminate wrong answers

Option A is wrong because RAG does not involve fine-tuning the model; it retrieves external data at inference time to augment the prompt, not to update the model's weights. Option C is wrong because Azure CDN is a content delivery network for caching and accelerating static assets, not a component of the RAG pipeline which focuses on retrieval and generation. Option D is wrong because RAG is not a compression technique; it is an architecture that retrieves relevant information from a vector or keyword index to provide context for generation, leaving the dataset and model unchanged.

Practice this question →

43

MCQeasy

A company wants to build a chatbot that answers customer questions using only their internal knowledge base, which consists of several PDFs and Word documents. They do not want the chatbot to use any information from the model's pre-trained knowledge. Which Azure OpenAI feature should they use to achieve this?

A.Content filtering

B.Prompt flow

C.Azure OpenAI on your data

D.Temperature parameter

AnswerC

This feature integrates the model with your own data sources (e.g., PDFs, databases) to generate responses based exclusively on your data, overriding the model's general knowledge.

Why this answer

Azure OpenAI on your data allows you to connect Azure OpenAI models to your own data sources (such as PDFs and Word documents) and restrict the model to generate responses solely from that data, without using the model's pre-trained knowledge. This is achieved by indexing the documents into an Azure Cognitive Search index and using retrieval-augmented generation (RAG) to ground the model's responses in your specific content.

Exam trap

The trap here is that candidates often confuse prompt engineering techniques (like setting temperature or using Prompt flow) with the data grounding mechanism provided by Azure OpenAI on your data, mistakenly thinking they can control knowledge sources through parameters or workflow tools.

How to eliminate wrong answers

Option A is wrong because content filtering is a safety feature that blocks harmful or policy-violating content in inputs and outputs, but it does not restrict the model's knowledge source to your own data. Option B is wrong because Prompt flow is a development tool for building and orchestrating AI workflows, not a feature that confines the model's knowledge to your documents. Option D is wrong because the temperature parameter controls the randomness of the model's responses, not the source of information the model uses.

Practice this question →

44

MCQmedium

What is 'retrieval-augmented generation' (RAG) and what problem does it solve?

A.Storing model responses in a cache to retrieve them faster for repeated questions

B.Retrieving relevant documents from a knowledge base to provide accurate context for LLM responses

C.Generating random responses and selecting the most relevant using a ranker model

D.A technique for making LLM responses shorter by removing irrelevant sections

AnswerB

RAG grounds LLM answers in retrieved documents — solving hallucination, knowledge cutoff, and private data limitations.

Why this answer

Retrieval-augmented generation (RAG) combines a retrieval step with a generative language model. It first retrieves relevant documents or passages from an external knowledge base (e.g., Azure Cognitive Search) and then feeds that context into the LLM to ground its response. This solves the problem of LLMs producing outdated, hallucinated, or factually incorrect answers by ensuring the model has access to current, authoritative information.

Exam trap

The trap here is that candidates confuse RAG with simple caching or response shortening, overlooking that the core innovation is grounding generation in externally retrieved, up-to-date knowledge rather than relying solely on the model's parametric memory.

How to eliminate wrong answers

Option A is wrong because caching model responses improves latency for repeated queries but does not address factual accuracy or grounding; it is a performance optimization, not a solution for hallucination or outdated knowledge. Option C is wrong because generating random responses and then ranking them is not how RAG works; RAG retrieves relevant documents first, then generates a single response grounded in that context, not a random selection. Option D is wrong because RAG is about augmenting the input with retrieved context, not about shortening responses; truncation or summarization techniques are separate concerns.

Practice this question →

45

MCQmedium

A developer is using Azure OpenAI Service to generate structured data in JSON format. They want to ensure that every response is valid JSON without adding instructions in every prompt. Which Azure OpenAI feature should they configure?

A.Set the temperature parameter to a low value (e.g., 0).

B.Set the top_p parameter to a high value (e.g., 1).

C.Set the response_format parameter to 'json_object'.

D.Set the max_tokens parameter to a high value (e.g., 2000).

AnswerC

Setting response_format to 'json_object' instructs the model to output valid JSON, which is exactly what the developer needs for structured data generation.

Why this answer

Option C is correct because Azure OpenAI Service provides a `response_format` parameter that can be set to `json_object`, which instructs the model to always return valid JSON output. This ensures structured data without requiring the developer to include formatting instructions in every prompt, as the service enforces JSON schema compliance at the API level.

Exam trap

The trap here is that candidates often confuse parameters that control randomness (temperature, top_p) or output length (max_tokens) with those that enforce output structure, leading them to incorrectly assume that low temperature alone can produce consistent JSON formatting.

How to eliminate wrong answers

Option A is wrong because setting the temperature parameter to a low value (e.g., 0) reduces randomness and makes output more deterministic, but it does not enforce any specific output format like JSON; it only controls creativity. Option B is wrong because setting top_p to a high value (e.g., 1) allows the model to consider a wider range of token probabilities, increasing diversity, but it does not guarantee structured JSON output. Option D is wrong because setting max_tokens to a high value (e.g., 2000) only controls the maximum length of the response, not its format; it cannot ensure the output is valid JSON.

Practice this question →

46

MCQmedium

What is 'few-shot prompting' and how does it improve model outputs?

A.Training a model with very few labelled examples using transfer learning

B.Including a small number of input-output examples in the prompt to demonstrate the desired task format

C.Generating a short (few-shot) response rather than a detailed answer

D.Running the model for only a few seconds to save compute costs

AnswerB

Few-shot prompting provides task demonstrations in the prompt — no training required, just examples that show the model what's expected.

Why this answer

Few-shot prompting improves model outputs by providing a small number of input-output examples directly in the prompt, which helps the model understand the desired task format, style, or reasoning pattern without requiring any fine-tuning or retraining. This technique leverages the model's in-context learning ability to generalize from the given examples and produce more accurate, consistent responses.

Exam trap

The trap here is that candidates confuse 'few-shot' with 'fewer training data' or 'shorter responses,' when the term specifically refers to the number of examples included in the prompt to guide the model's output.

How to eliminate wrong answers

Option A is wrong because few-shot prompting does not involve training or updating model weights; it relies on in-context learning within a single prompt, not transfer learning or additional training with labelled examples. Option C is wrong because 'few-shot' refers to the number of examples in the prompt, not the length of the response; the model can still generate detailed answers. Option D is wrong because few-shot prompting has nothing to do with compute time or cost savings; it is a prompt engineering technique that may actually increase token usage and latency.

Practice this question →

47

MCQeasy

What is the Whisper model available in Azure OpenAI used for?

A.Generating images from text descriptions

B.Transcribing spoken audio to text with high accuracy across languages

C.Generating very quiet (whispering) text-to-speech audio

D.Summarizing long documents into concise bullet points

AnswerB

Whisper is OpenAI's speech recognition model — it transcribes audio to text across many languages and audio conditions.

Why this answer

The Whisper model in Azure OpenAI is a large-scale speech recognition system designed to transcribe spoken audio into text. It supports multiple languages and is optimized for high accuracy, making it the correct choice for audio-to-text tasks.

Exam trap

The trap here is that the name 'Whisper' misleads candidates into thinking it relates to quiet speech or text-to-speech, when it is actually a speech-to-text model.

How to eliminate wrong answers

Option A is wrong because generating images from text descriptions is the function of DALL-E models, not Whisper. Option C is wrong because Whisper is for speech-to-text transcription, not text-to-speech generation; 'whispering' refers to the model's name, not the volume of output. Option D is wrong because summarizing long documents is a text-based task handled by GPT models, not by Whisper, which focuses on audio processing.

Practice this question →

48

MCQmedium

A writer uses Azure OpenAI Service to generate story ideas. The current configuration uses a temperature setting of 0, causing the model to produce identical outputs for the same prompt. The writer wants more creative and diverse outputs. Which parameter should be increased?

A.max_tokens

B.temperature

C.top_p

D.frequency_penalty

AnswerB

Increasing temperature increases randomness, leading to more creative and diverse outputs.

Why this answer

Temperature controls the randomness of the model's output. A temperature of 0 makes the model deterministic, always choosing the most likely next token, which leads to identical outputs for the same prompt. Increasing the temperature (e.g., to 0.7 or higher) introduces more randomness, allowing the model to sample from less likely tokens and produce more creative, diverse story ideas.

Exam trap

The trap here is that candidates may confuse temperature with top_p, thinking both are equally responsible for randomness, but temperature is the direct control for randomness while top_p is an alternative sampling method that can also affect diversity but is not the parameter to increase for more creative outputs.

How to eliminate wrong answers

Option A is wrong because max_tokens controls the maximum length of the generated output, not the diversity or creativity of the content. Option C is wrong because top_p (nucleus sampling) also influences randomness, but the question specifically asks for the parameter to increase for more creative outputs; while top_p can be adjusted, temperature is the primary parameter for controlling randomness and is the most direct answer. Option D is wrong because frequency_penalty reduces the repetition of tokens by penalizing tokens that have already appeared, which can increase diversity but is not the primary parameter for controlling randomness; it is more about penalizing repetition rather than introducing creative randomness.

Practice this question →

49

MCQmedium

What is 'prompt flow' in Azure AI Foundry?

A.A tool for managing the queue of prompt requests sent to Azure OpenAI during peak usage

B.A visual development tool for building, testing, and deploying LLM application pipelines

C.An automated system that suggests improvements to prompts based on output quality metrics

D.A monitoring dashboard showing the flow of prompts through an AI application in production

AnswerB

Prompt flow chains LLM calls, tools, and functions visually — enabling RAG pipelines and agents to be built, evaluated, and deployed.

Why this answer

Prompt flow in Azure AI Foundry is a visual development tool that enables developers to design, test, and deploy end-to-end pipelines for large language model (LLM) applications. It provides a graph-based interface to orchestrate LLM calls, data processing, and custom logic, making it easier to build complex generative AI workflows without writing extensive code.

Exam trap

The trap here is that candidates confuse 'prompt flow' with a monitoring or optimization tool, when in fact it is a visual pipeline builder for developing and testing LLM application workflows.

How to eliminate wrong answers

Option A is wrong because prompt flow is not a queue management tool for handling request spikes; Azure OpenAI provides built-in rate limiting and quota management for that purpose. Option C is wrong because prompt flow does not automatically suggest prompt improvements based on output metrics; that functionality is more aligned with features like prompt engineering guidance or evaluation tools within Azure AI Foundry. Option D is wrong because prompt flow is primarily a development and testing tool, not a production monitoring dashboard; monitoring is handled by separate services like Azure Monitor or Application Insights.

Practice this question →

50

MCQmedium

A company uses a generative AI model to answer customer questions about their products. They observe that the model sometimes produces factually incorrect or fabricated information. To reduce these inaccuracies, they want to provide the model with relevant, up-to-date product documentation as context before generating a response. Which technique is being applied?

A.Prompt Engineering

B.Grounding

C.Fine-tuning

D.Reinforcement Learning from Human Feedback (RLHF)

AnswerB

Grounding connects the model to external data sources (like product documentation) to provide factual context, significantly reducing hallucinations and improving accuracy.

Why this answer

B is correct because grounding is the technique of providing a generative AI model with specific, authoritative source data (such as product documentation) as context before generating a response. This anchors the model's output to verified facts, directly reducing hallucinations and fabricated information by constraining the generation to the provided context.

Exam trap

Microsoft often tests the distinction between grounding (providing external context at inference time) and fine-tuning (updating model weights), so candidates mistakenly choose fine-tuning when the scenario describes adding new information without retraining.

How to eliminate wrong answers

Option A is wrong because prompt engineering involves crafting input instructions to guide model behavior, but it does not inherently supply new, up-to-date factual context; it only refines how the model uses its existing training data. Option C is wrong because fine-tuning retrains the model on a specific dataset to adapt its weights, which is a more resource-intensive process and does not dynamically inject current documentation at inference time. Option D is wrong because Reinforcement Learning from Human Feedback (RLHF) uses human preferences to align model outputs with desired qualities (e.g., helpfulness, safety), but it does not provide real-time factual context to reduce inaccuracies.

Practice this question →

51

MCQmedium

A developer uses Azure OpenAI to generate product descriptions. They provide five examples of product descriptions that follow a specific format (name, features, price, call to action). They then ask the model to write a new description for a given product, expecting the same format. Which technique is the developer using?

A.Fine-tuning

B.Zero-shot learning

C.Few-shot learning

D.Reinforcement learning

AnswerC

Few-shot learning uses a small number of examples in the prompt to demonstrate the desired output format or style, exactly as the developer does with five examples.

Why this answer

The developer is using few-shot learning, which involves providing a small number of examples (in this case, five product descriptions) to guide the model's output format and style without updating the model's weights. This technique leverages the model's in-context learning ability to follow the demonstrated pattern for a new input.

Exam trap

The trap here is that candidates often confuse few-shot learning with fine-tuning, assuming that providing examples in the prompt constitutes training the model, when in fact fine-tuning involves a separate training phase that modifies model parameters.

How to eliminate wrong answers

Option A is wrong because fine-tuning requires retraining the model on a labeled dataset to adjust its weights, which is not happening here—the developer is simply providing examples in the prompt. Option B is wrong because zero-shot learning involves no examples at all, relying solely on the model's pre-trained knowledge to generate output, whereas here five examples are explicitly given. Option D is wrong because reinforcement learning uses reward signals to iteratively improve model behavior through trial and error, not by providing static examples in a single prompt.

Practice this question →

52

MCQmedium

A marketing team uses Azure OpenAI Service to generate headline ideas for a campaign. They find the generated headlines are often too similar and lack creativity. Which parameter should they increase to introduce more randomness in the generated text?

A.Frequency penalty

B.Top_p (nucleus sampling)

C.Temperature

D.Presence penalty

AnswerC

Temperature directly controls the level of randomness; increasing it makes the model more likely to choose less probable tokens, leading to more creative and varied outputs.

Why this answer

Option C (Temperature) is correct because temperature controls the randomness of token selection in the model's probability distribution. Increasing temperature (e.g., from 0.7 to 1.0) flattens the probability curve, making lower-probability tokens more likely to be chosen, which introduces more diversity and creativity in the generated headlines.

Exam trap

The trap here is that candidates often confuse temperature with frequency or presence penalties, thinking that penalizing repetition (frequency penalty) will increase creativity, when in fact temperature directly controls the randomness of token selection, which is the key to generating more diverse and creative text.

How to eliminate wrong answers

Option A (Frequency penalty) is wrong because it reduces repetition by penalizing tokens that have already appeared in the text, but it does not directly increase randomness; it only discourages the model from reusing the same words or phrases. Option B (Top_p, or nucleus sampling) is wrong because it limits the cumulative probability mass of tokens considered for sampling (e.g., top_p=0.9 means only the top 90% of probability mass is sampled), which can actually reduce randomness by cutting off the long tail of low-probability tokens. Option D (Presence penalty) is wrong because it penalizes tokens that have appeared at least once in the text, encouraging the model to introduce new topics, but it does not increase randomness in token selection; it only promotes novelty in content.

Practice this question →

53

MCQmedium

A marketing team uses Azure OpenAI Service to generate ad copy. They notice the model sometimes uses offensive language. Which Azure OpenAI feature should they use to automatically block such content?

A.Setting the temperature parameter to 0.0

B.Using the frequency_penalty parameter

C.Enabling content filtering

D.Configuring the max_tokens parameter

AnswerC

Content filtering is a built-in safety feature designed to detect and block harmful or offensive content in both prompts and completions.

Why this answer

Option C is correct because Azure OpenAI Service includes built-in content filtering that automatically detects and blocks offensive or harmful language in both prompts and completions. This feature uses AI-based classifiers to enforce responsible AI policies without requiring manual configuration of model parameters.

Exam trap

The trap here is that candidates confuse model parameters (temperature, frequency_penalty, max_tokens) with safety features, assuming they can control content appropriateness, when in fact content filtering is a separate, dedicated mechanism in Azure OpenAI Service.

How to eliminate wrong answers

Option A is wrong because setting the temperature parameter to 0.0 controls randomness in output generation, not content safety; it makes responses more deterministic but does not filter offensive language. Option B is wrong because the frequency_penalty parameter reduces repetition by penalizing tokens that have already appeared, which has no effect on blocking offensive or harmful content. Option D is wrong because configuring the max_tokens parameter limits the length of the generated response but does not inspect or block inappropriate language.

Practice this question →

54

MCQmedium

A marketing team wants to use Azure OpenAI to generate blog post outlines. They have a single example of an outline that follows their preferred structure: introduction, three key points, conclusion. They want the model to generate new outlines that follow the same structure without retraining the model. Which technique should they use?

A.Fine-tuning the model on a large dataset of blog outlines

B.Providing the example outline in the prompt (few-shot learning)

C.Setting the temperature parameter to a high value

D.Using the Azure OpenAI embeddings API

AnswerB

Correct: Few-shot learning uses examples in the prompt to guide the model's output without retraining.

Why this answer

Option B is correct because few-shot learning involves providing a small number of examples (in this case, one example outline) directly in the prompt to guide the model's output format and structure without any retraining. This technique leverages the model's in-context learning ability to mimic the given pattern, making it ideal for generating new outlines that follow the same structure.

Exam trap

The trap here is that candidates often confuse few-shot learning with fine-tuning, assuming that any task requiring consistent output format must involve retraining the model, when in fact in-context learning via prompt engineering is sufficient for small numbers of examples.

How to eliminate wrong answers

Option A is wrong because fine-tuning requires a large, labeled dataset and retraining the model, which is unnecessary and resource-intensive when the goal is to follow a single example structure without modifying the underlying model. Option C is wrong because setting the temperature parameter to a high value increases randomness and creativity in the output, which would likely cause the model to deviate from the desired structured format rather than adhere to it. Option D is wrong because the Azure OpenAI embeddings API is used for semantic similarity and search tasks (e.g., finding related content), not for generating structured text outputs like blog outlines.

Practice this question →

55

MCQeasy

What is GitHub Copilot and how does it use AI?

A.An automated GitHub Actions workflow for running CI/CD pipelines

B.An AI-powered code assistant that generates code completions and suggestions in IDEs using LLMs

C.A bot that automatically reviews and merges GitHub pull requests

D.A GitHub feature for visualizing code repository history

AnswerB

GitHub Copilot uses LLMs to suggest code completions, generate functions, explain code, and write tests directly in the development environment.

Why this answer

GitHub Copilot is an AI-powered code assistant developed by GitHub and OpenAI. It uses large language models (LLMs), specifically a version of OpenAI's Codex model, to analyze the context of the code a developer is writing in an IDE (like VS Code) and generate real-time code completions, suggestions, and even entire functions. This directly aligns with generative AI workloads on Azure, as Copilot leverages generative AI to produce new code content based on natural language prompts or existing code patterns.

Exam trap

The trap here is that candidates confuse GitHub Copilot with GitHub Actions or other automation features, because all are GitHub services, but Copilot is specifically a generative AI code assistant, not a CI/CD or repository management tool.

How to eliminate wrong answers

Option A is wrong because GitHub Copilot is not an automated CI/CD workflow; GitHub Actions is the service that runs CI/CD pipelines, and Copilot is a code generation tool, not a workflow executor. Option C is wrong because Copilot does not automatically review or merge pull requests; that is the function of tools like GitHub's built-in pull request review features or third-party bots (e.g., Dependabot). Option D is wrong because Copilot does not visualize repository history; that is handled by GitHub's Insights or git log commands, not by an AI code assistant.

Practice this question →

56

MCQhard

A company uses Azure OpenAI Service to generate marketing copy for social media posts. They want to prevent the model from producing content that contains offensive language, harmful stereotypes, or violent themes that go against their brand guidelines. Which feature should the company configure within Azure OpenAI Service?

A.Fine-tuning the model with a custom dataset

B.Configuring the content filtering (responsible AI filters)

C.Increasing the token limit per response

D.Using prompt engineering techniques

AnswerB

Azure OpenAI’s content filtering system is a built-in safeguard that automatically screens inputs and outputs for categories like hate, violence, sexual content, and self-harm. Companies can configure severity levels to prevent undesirable content from being generated.

Why this answer

B is correct because Azure OpenAI Service includes built-in content filtering (responsible AI filters) that automatically detects and blocks offensive language, harmful stereotypes, and violent themes in both input prompts and generated outputs. This feature enforces brand guidelines without requiring custom model modifications or manual oversight.

Exam trap

The trap here is that candidates often confuse fine-tuning or prompt engineering as content safety mechanisms, when in fact Azure OpenAI's content filtering is the only built-in feature designed specifically to block offensive or harmful content at inference time.

How to eliminate wrong answers

Option A is wrong because fine-tuning adapts the model to specific tasks or styles using custom data but does not enforce safety guardrails; it can even amplify harmful patterns if the training data contains them. Option C is wrong because increasing the token limit per response only controls the maximum length of generated text, not its content safety. Option D is wrong because prompt engineering techniques can guide model behavior but are not a reliable or enforceable mechanism to prevent offensive content—they can be bypassed by adversarial inputs.

Practice this question →

57

MCQhard

What is 'chain-of-thought prompting' and when is it most effective?

A.Linking multiple AI models in a pipeline where each model's output feeds the next

B.Prompting the model to show explicit reasoning steps before giving a final answer

C.Training a model on a sequence of related documents to build contextual knowledge

D.A method for connecting chatbot conversation turns to maintain long-term memory

AnswerB

CoT prompting ('think step by step') improves multi-step reasoning by externalising the reasoning process — most effective for maths and logic.

Why this answer

Chain-of-thought prompting instructs the model to break down a complex problem into intermediate reasoning steps before producing the final answer. This technique improves accuracy on tasks requiring multi-step logic, such as arithmetic, commonsense reasoning, or symbolic manipulation, by making the model's internal reasoning explicit and reducing errors from shortcut answers.

Exam trap

The trap here is that candidates confuse 'chain-of-thought prompting' with 'model chaining' or 'pipeline architectures' (Option A), because both involve a sequence, but chain-of-thought is a single-model prompting technique, not a multi-model workflow.

How to eliminate wrong answers

Option A is wrong because it describes a model pipeline or ensemble, not a prompting technique; chain-of-thought prompting does not involve linking multiple models. Option C is wrong because it describes sequential training or fine-tuning on related documents, which is a data preparation or transfer learning approach, not a prompting strategy. Option D is wrong because it describes conversation memory or state management in chatbots, which is unrelated to the explicit step-by-step reasoning elicited by chain-of-thought prompts.

Practice this question →

58

MCQmedium

What is the 'presence penalty' parameter in Azure OpenAI API calls?

A.A parameter requiring AI systems to acknowledge their presence as AI to users

B.A flat penalty discouraging repetition of any token already present in the response

C.A parameter indicating whether the AI is present online or offline

D.The minimum number of characters that must be present in a response

AnswerB

Presence penalty adds a flat penalty for any previously used token — encouraging vocabulary diversity regardless of repeat frequency.

Why this answer

The 'presence penalty' parameter in Azure OpenAI API calls applies a flat penalty to any token that has already appeared in the response so far, reducing the model's likelihood of repeating that token. This helps generate more diverse and less repetitive text by discouraging the reuse of tokens already present in the output sequence.

Exam trap

Microsoft often tests the distinction between 'presence penalty' and 'frequency penalty' — the trap here is that candidates confuse the presence penalty with a requirement for AI disclosure or a simple repetition penalty, missing that it specifically penalizes any token that has already appeared at least once, regardless of how many times.

How to eliminate wrong answers

Option A is wrong because it describes a transparency or disclosure requirement (like an AI disclosure policy), not a parameter that modifies token probabilities in the API. Option C is wrong because it confuses a presence/availability status with a model inference parameter; Azure OpenAI does not have an 'online/offline' parameter in API calls. Option D is wrong because it describes a minimum length constraint, which is unrelated to the presence penalty; the presence penalty operates on token-level repetition, not character count.

Practice this question →

59

MCQmedium

What is the purpose of Azure AI Content Safety in the context of generative AI deployments?

A.To compress generated content for faster delivery

B.To detect and filter harmful content in AI prompts and responses

C.To measure the quality and accuracy of AI-generated responses

D.To ensure AI content is written in the correct language

AnswerB

Content Safety screens generative AI inputs and outputs for violence, sexual content, hate speech, and other harmful categories.

Why this answer

Azure AI Content Safety is a service designed to detect and filter harmful content, such as hate speech, violence, self-harm, and sexually explicit material, in both user prompts and AI-generated responses. In generative AI deployments, this ensures that the model's outputs comply with safety policies and regulatory requirements, preventing the dissemination of offensive or dangerous content.

Exam trap

The trap here is that candidates confuse Azure AI Content Safety with general AI quality or language services, but the exam specifically tests its role as a safety filter for harmful content in generative AI pipelines, not for performance, accuracy, or language correctness.

How to eliminate wrong answers

Option A is wrong because Azure AI Content Safety does not perform compression; content delivery optimization is handled by services like Azure Content Delivery Network or Azure Front Door, not by a content safety filter. Option C is wrong because measuring quality and accuracy of AI responses is the role of evaluation metrics (e.g., BLEU, ROUGE, or Azure AI Studio's evaluation tools), not a safety detection service. Option D is wrong because language detection and translation are capabilities of Azure AI Translator or Azure AI Language, not Azure AI Content Safety, which focuses on harmful content regardless of language.

Practice this question →

60

MCQmedium

A marketing team uses Azure OpenAI Service to generate marketing copy. They notice the generated text is often repetitive, using the same phrases and words multiple times. Which parameter should they increase to directly reduce this repetition?

A.Temperature

B.Frequency penalty

C.Top-p

D.Max tokens

AnswerB

Correct. A higher frequency penalty reduces the likelihood of the model repeating the same tokens and phrases, directly addressing repetition.

Why this answer

Frequency penalty directly reduces repetition by penalizing tokens that have already appeared in the generated text. A higher frequency penalty value (e.g., 0.5 to 1.0) decreases the likelihood of the model reusing the same phrases or words, making the output more diverse and less repetitive.

Exam trap

The trap here is that candidates often confuse temperature or Top-p with repetition control, but those parameters affect randomness and diversity of vocabulary, not the direct penalization of repeated tokens that frequency penalty provides.

How to eliminate wrong answers

Option A is wrong because temperature controls the randomness of token selection, not repetition; increasing temperature makes output more random but does not specifically penalize repeated tokens. Option C is wrong because Top-p (nucleus sampling) limits the cumulative probability of token choices, affecting diversity of vocabulary but not directly penalizing repeated tokens. Option D is wrong because max tokens sets the maximum length of the generated response and has no effect on reducing repetition of phrases or words.

Practice this question →

61

MCQmedium

A company uses Azure OpenAI Service to generate marketing copy. They notice that sometimes the generated text contains repetitive phrases or gets stuck in loops. They want to reduce this behavior without changing the overall creativity of the model. Which parameter should they adjust?

A.Increase the frequency_penalty parameter.

B.Decrease the temperature parameter.

C.Increase the presence_penalty parameter.

D.Decrease the top_p parameter.

AnswerA

Correct. Increasing frequency_penalty reduces the likelihood that the model will repeat the same tokens, making it effective against repetitive loops.

Why this answer

Increasing the frequency_penalty parameter reduces the likelihood of the model repeating the same phrases by penalizing tokens that have already appeared in the generated text. This directly addresses the repetitive loops without altering the overall creativity, as frequency_penalty specifically targets token frequency rather than randomness or diversity.

Exam trap

The trap here is that candidates often confuse frequency_penalty with presence_penalty, assuming both reduce repetition equally, but frequency_penalty specifically targets repeated occurrences while presence_penalty only discourages topic reuse.

How to eliminate wrong answers

Option B is wrong because decreasing temperature reduces randomness and makes the model more deterministic, which can actually increase repetition, not reduce it. Option C is wrong because presence_penalty penalizes tokens based on whether they have appeared at all, not their frequency, so it encourages new topics but does not specifically target repetitive phrases. Option D is wrong because decreasing top_p narrows the set of candidate tokens to the most probable ones, which reduces diversity and can worsen repetition, not fix it.

Practice this question →

62

MCQmedium

A game development studio uses Azure OpenAI Service to generate unique backstories for non-player characters (NPCs). They want the generated stories to be coherent and relevant to a given character class (e.g., warrior, mage) but also creative and varied. Which parameter should the studio adjust primarily to increase the creativity and variety of the generated text?

A.Increase the temperature parameter

B.Increase the top_p parameter

C.Increase the frequency_penalty parameter

D.Decrease the max_tokens parameter

AnswerA

Higher temperature values make the model's output more random and creative, leading to greater variety in generated backstories.

Why this answer

Increasing the temperature parameter makes the model's output more random by scaling the probability distribution over tokens, which encourages less likely word choices and thus increases creativity and variety in generated text. For the game studio, a higher temperature (e.g., 0.8–1.0) will produce more diverse and imaginative backstories for different character classes, while still maintaining coherence if not set too high.

Exam trap

Microsoft often tests the distinction between temperature and top_p, where candidates mistakenly think top_p is the primary creativity control, but temperature is the fundamental parameter for adjusting randomness and variety in text generation.

How to eliminate wrong answers

Option B is wrong because increasing top_p (nucleus sampling) also adds randomness but it does so by limiting the cumulative probability mass of token choices; while it can increase variety, it is not the primary parameter for controlling creativity—temperature is more direct. Option C is wrong because increasing frequency_penalty reduces the likelihood of repeating the same tokens or phrases, which can improve diversity but primarily targets repetition, not overall creativity or variety in story content. Option D is wrong because decreasing max_tokens limits the length of the output, which may reduce coherence and detail but does not inherently increase creativity or variety; it can even constrain the model's ability to generate varied content.

Practice this question →

63

MCQmedium

A company uses Azure OpenAI Service to generate long technical reports. To manage costs, the development team needs to accurately estimate the number of tokens that a given prompt will consume before making any API call. Which Azure OpenAI Service feature should they use to obtain this estimate?

A.The Chat Completions API

B.The Embeddings API

C.The Token Counter tool in Azure OpenAI Studio

D.The Content Filter configuration

AnswerC

The Token Counter tool provides an accurate estimate of how many tokens a given prompt will use, allowing developers to predict costs before making an API call.

Why this answer

The Token Counter tool in Azure OpenAI Studio is specifically designed to estimate the number of tokens a prompt will consume before making an API call. This allows developers to predict costs accurately by calculating token usage for both input and expected output, without incurring actual API charges.

Exam trap

Microsoft often tests the misconception that the Chat Completions API itself can provide a pre-call token estimate, but in reality it only returns token usage after the call, making the Token Counter tool the correct pre-call estimation feature.

How to eliminate wrong answers

Option A is wrong because the Chat Completions API is used to generate responses from a model, not to estimate token counts; it consumes tokens during the call and returns usage in the response, but does not provide a pre-call estimate. Option B is wrong because the Embeddings API converts text into vector representations for semantic search or clustering, and while it does report token usage, its primary purpose is not token estimation for generative prompts. Option D is wrong because the Content Filter configuration manages safety filters for harmful content, not token counting or cost estimation.

Practice this question →

64

MCQmedium

What is 'context length' limitation in LLMs and how do 'long-context models' address it?

A.The physical cable length limitation when connecting AI servers in a data centre

B.The maximum text an LLM can process at once — long-context models extend this to 128K+ tokens

C.The minimum number of examples required before the model produces reliable outputs

D.The duration (in seconds) before an Azure OpenAI API request times out

AnswerB

Context windows limit conversation and document size — GPT-4o's 128K context enables full-document analysis and extended conversations.

Why this answer

Option B is correct because 'context length' in large language models (LLMs) refers to the maximum number of tokens (words, subwords, or characters) the model can process in a single input, including both the prompt and the generated output. Long-context models, such as GPT-4 Turbo or Claude 3, extend this limit to 128K tokens or more, enabling the model to handle entire documents, lengthy conversations, or large codebases without truncation.

Exam trap

The trap here is that candidates confuse 'context length' with unrelated operational metrics like API timeouts or hardware limits, rather than recognizing it as a core architectural token limit of the LLM itself.

How to eliminate wrong answers

Option A is wrong because it confuses a physical networking constraint (cable length in a data center) with a software-defined token limit in LLMs, which has nothing to do with hardware cabling. Option C is wrong because it misrepresents 'context length' as a minimum number of training examples for reliability, which is actually a concept related to few-shot learning or model fine-tuning, not the token window size. Option D is wrong because it conflates API timeout duration (a client-server network setting) with the model's internal token processing limit, which is a fixed architectural parameter of the LLM itself.

Practice this question →

65

MCQeasy

What is 'DALL-E' in Azure OpenAI and what does it do?

A.A text summarisation model that condenses long documents

B.An image generation model that creates images from natural language text prompts

C.A data analysis language for querying Azure databases

D.A code generation tool optimised for Python development

AnswerB

DALL-E is a text-to-image model — generating novel images from descriptive prompts with specified content and style.

Why this answer

DALL-E is an image generation model within Azure OpenAI that creates original images from natural language text prompts. It uses a transformer-based architecture trained on image-text pairs to generate visuals that match the semantic content of the input description, making it a core generative AI workload for visual content creation.

Exam trap

The trap here is that candidates may confuse DALL-E with other Azure OpenAI models like GPT for text generation or Codex for code, because all are part of the same service but serve fundamentally different modalities.

How to eliminate wrong answers

Option A is wrong because text summarization models (like GPT-3.5 or GPT-4 with summarization prompts) condense documents, not DALL-E. Option C is wrong because data analysis languages for querying Azure databases include KQL (Kusto Query Language) or T-SQL, not DALL-E. Option D is wrong because code generation tools optimized for Python development, such as GitHub Copilot or Azure OpenAI's Codex models, are distinct from DALL-E's image generation capability.

Practice this question →

66

MCQmedium

What is 'Azure OpenAI's batch API' and when should you use it?

A.An API for training new models in batches on your custom datasets

B.Asynchronous bulk processing of large inference request volumes at reduced cost

C.Grouping multiple Azure OpenAI API keys into a batch for easier management

D.A tool for running multiple prompt experiments simultaneously to find the best prompt

AnswerB

Batch API runs high-volume jobs (thousands of requests) asynchronously within 24h at ~50% cost reduction — ideal for offline processing.

Why this answer

Azure OpenAI's Batch API is designed for asynchronous processing of large volumes of inference requests, such as chat completions or embeddings, at a reduced cost compared to real-time API calls. It is ideal for workloads where immediate responses are not required, allowing you to submit a batch of requests and retrieve results later. This makes it a cost-effective solution for high-throughput, non-latency-sensitive tasks.

Exam trap

The trap here is that candidates confuse batch processing for inference with batch training of models, leading them to select Option A, but Azure OpenAI's Batch API is strictly for inference, not model training.

How to eliminate wrong answers

Option A is wrong because the Batch API is for inference (generating responses from existing models), not for training new models; model training uses separate services like Azure Machine Learning or fine-tuning APIs. Option C is wrong because the Batch API does not manage API keys; it processes inference requests in bulk, and API key management is handled through Azure's access control and key management features. Option D is wrong because the Batch API is not a tool for running prompt experiments; it is for processing a fixed set of prompts asynchronously, while prompt experimentation is typically done via interactive testing or A/B testing frameworks.

Practice this question →

67

MCQmedium

What is 'Azure OpenAI's content filter' configurability and why does it matter?

A.Configuring which users can access Azure OpenAI based on their location

B.Adjustable severity thresholds per harm category for legitimate domain-specific use cases

C.Setting the maximum token count before content is filtered for length

D.Configuring which Azure OpenAI models are available to different teams within an organisation

AnswerB

Some domains (medical, security research) need adjusted filters — Azure OpenAI provides configurable thresholds through an approval process.

Why this answer

Azure OpenAI's content filter configurability allows administrators to adjust severity thresholds for each harm category (e.g., hate, violence, self-harm) to accommodate legitimate domain-specific use cases, such as medical or legal content that may require higher tolerance. This matters because it balances safety with utility, enabling organizations to fine-tune filtering based on their unique content policies and compliance needs without blocking valid applications.

Exam trap

The trap here is that candidates confuse content filter configurability with other Azure OpenAI management features like access control, model selection, or output length limits, rather than recognizing it as a safety-tuning mechanism for harm categories.

How to eliminate wrong answers

Option A is wrong because Azure OpenAI's content filter configurability is about adjusting filtering parameters, not restricting user access by location (which is handled by Azure AD conditional access or network policies). Option C is wrong because the maximum token count is a model parameter for output length, not a content filter setting; content filters evaluate safety regardless of token count. Option D is wrong because model availability per team is managed through Azure RBAC and model deployments, not through the content filter configuration.

Practice this question →

68

MCQhard

A company uses Azure OpenAI to build a customer service chatbot. They want to prevent malicious users from injecting prompts that cause the chatbot to behave unexpectedly, such as revealing its system instructions. Which responsible AI consideration is most directly relevant?

A.Fairness

B.Reliability and Safety

C.Privacy and Security

D.Inclusiveness

AnswerB

Correct. This principle ensures the system is trustworthy and handles inputs safely, including defending against prompt injection attacks.

Why this answer

Prompt injection attacks target the system by embedding malicious instructions in user input, causing the model to override its original directives or reveal sensitive information. This directly undermines the reliability and safety of the AI system, as the chatbot's behavior becomes unpredictable and potentially harmful. Azure OpenAI's safety systems (e.g., content filtering, abuse detection) are designed to mitigate such risks, making Reliability and Safety the most relevant responsible AI consideration.

Exam trap

Microsoft often tests the distinction between 'Privacy and Security' (data protection) and 'Reliability and Safety' (operational integrity), causing candidates to mistakenly choose Privacy and Security because prompt injection can reveal system instructions, which feels like a privacy breach, but the primary responsible AI pillar is Reliability and Safety.

How to eliminate wrong answers

Option A is wrong because Fairness focuses on avoiding bias and ensuring equitable treatment across user groups, not on preventing adversarial manipulation of model behavior. Option C is wrong because Privacy and Security primarily concerns data protection, access control, and encryption, whereas prompt injection is an attack on the model's operational integrity, not on data confidentiality (though it may lead to data leaks, the core issue is behavioral safety). Option D is wrong because Inclusiveness addresses accessibility and accommodating diverse user needs, not defending against malicious inputs that cause unexpected model outputs.

Practice this question →

69

MCQhard

A company uses Azure OpenAI Service to generate creative product descriptions. They want to increase the randomness and variety of the generated outputs to produce more diverse suggestions. Which parameter should they increase?

A.Temperature

B.Top_p

C.Frequency penalty

D.Presence penalty

AnswerA

Increasing temperature raises the entropy of the output distribution, making the model more likely to select less probable tokens, thus increasing randomness and variety.

Why this answer

Temperature controls the randomness of the model's output by scaling the logits before applying the softmax function. Increasing temperature (e.g., from 0.7 to 1.0) flattens the probability distribution, making lower-probability tokens more likely to be chosen, which increases diversity and creativity in generated text.

Exam trap

The trap here is that candidates often confuse temperature with Top_p, assuming both control randomness equally, but temperature directly scales logits while Top_p filters the token set by cumulative probability—a subtle but critical distinction tested in AI-900.

How to eliminate wrong answers

Option B (Top_p) is wrong because it controls nucleus sampling—the cumulative probability threshold for token selection—not randomness; increasing Top_p can also increase diversity but does so by expanding the set of candidate tokens rather than adjusting their probability distribution. Option C (Frequency penalty) is wrong because it reduces the likelihood of tokens that have already appeared frequently in the text, which decreases repetition but does not directly increase randomness or variety. Option D (Presence penalty) is wrong because it penalizes tokens that have appeared at all in the text, encouraging the model to introduce new topics, but it does not increase the randomness of token selection.

Practice this question →

70

MCQeasy

A social media platform uses Azure OpenAI Service to generate summaries of user comments. The development team discovers that sometimes the generated summaries include offensive or harmful language that was present in the original comments. The team wants to ensure that the generated output is always free of hate speech, profanity, and self-harm references. What should the team configure in the Azure OpenAI Service?

A.Set the temperature parameter to 0

B.Configure a content filter

C.Increase the max_tokens parameter

D.Use a grounding source

AnswerB

Content filters in Azure OpenAI Service allow you to specify categories of harmful content to block, directly addressing the requirement.

Why this answer

Option B is correct because Azure OpenAI Service provides built-in content filtering that can be configured to block hate speech, profanity, and self-harm references in both input prompts and generated completions. This ensures that even if offensive language appears in the original user comments, the generated summaries will be free of such harmful content. The content filter operates at the service level, applying predefined severity thresholds to filter out undesirable language.

Exam trap

The trap here is that candidates may confuse model parameters like temperature or max_tokens with safety controls, or assume that grounding sources automatically sanitize output, when in fact content filters are the dedicated mechanism for blocking harmful language.

How to eliminate wrong answers

Option A is wrong because setting the temperature parameter to 0 makes the model deterministic (always choosing the highest-probability token) but does not filter or block offensive language; it only reduces randomness in output. Option C is wrong because increasing the max_tokens parameter only extends the maximum length of the generated response and has no effect on content safety or filtering. Option D is wrong because using a grounding source (e.g., Azure Cognitive Search) provides factual context to reduce hallucinations but does not filter hate speech, profanity, or self-harm references from the generated output.

Practice this question →

71

MCQmedium

A marketing team uses Azure OpenAI Service to generate product descriptions. They want the descriptions to follow a specific brand voice (formal, concise) and avoid generating any harmful or offensive language. Which combination of features should the team use?

A.A: Fine-tune the model with brand-specific data and enable content filtering.

B.B: Use few-shot learning with examples and disable content filtering for creativity.

C.C: Increase the temperature parameter and use the logprobs parameter.

D.D: Use the top_p parameter and set max_tokens to a low value.

AnswerA

Correct: Fine-tuning teaches brand voice; content filtering blocks harmful language.

Why this answer

Fine-tuning the model with brand-specific data allows the model to learn the desired brand voice (formal, concise) by adjusting its weights based on a curated dataset. Enabling content filtering ensures that any harmful or offensive language is blocked, either by Azure's built-in content moderation or by custom filters, meeting the safety requirement. This combination directly addresses both the style and safety needs.

Exam trap

The trap here is that candidates may think few-shot learning (Option B) is sufficient for style control, but it lacks the consistency of fine-tuning, and disabling content filtering is a critical safety oversight that Azure explicitly tests as a non-negotiable requirement.

How to eliminate wrong answers

Option B is wrong because disabling content filtering removes the safeguard against harmful or offensive language, which contradicts the requirement to avoid such content; few-shot learning alone cannot guarantee consistent brand voice adherence. Option C is wrong because increasing the temperature parameter makes the output more random and less predictable, which is counterproductive for maintaining a formal, concise brand voice; the logprobs parameter is used for debugging or ranking tokens, not for controlling style or safety. Option D is wrong because the top_p parameter (nucleus sampling) controls diversity but does not enforce a specific brand voice or filter content; setting max_tokens to a low value only limits output length, not style or safety.

Practice this question →

72

MCQmedium

What is 'Whisper' in Azure OpenAI and what can it do?

A.A low-power mode for running Azure OpenAI at reduced compute cost

B.A speech recognition model that transcribes audio files to text across 100+ languages

C.A secure communication channel for transmitting sensitive data to Azure OpenAI

D.A text-to-speech model that generates very quiet, whispered audio output

AnswerB

Whisper transcribes and translates audio — working across many languages and audio conditions for pre-recorded content.

Why this answer

Whisper is a speech recognition model available in Azure OpenAI that transcribes audio files into text. It supports over 100 languages and is designed for high accuracy in diverse acoustic environments, making it ideal for tasks like meeting transcription, voice note conversion, and multilingual audio processing.

Exam trap

The trap here is that the name 'Whisper' might mislead candidates into thinking it relates to quiet audio output (text-to-speech) or a low-power mode, when in fact it is a speech recognition model for transcribing audio to text.

How to eliminate wrong answers

Option A is wrong because Whisper is not a low-power mode; Azure OpenAI offers provisioned throughput units (PTUs) for cost optimization, but Whisper is a specific model for speech-to-text. Option C is wrong because Whisper does not provide a secure communication channel; Azure OpenAI uses Azure Private Link and encryption for data transmission, but Whisper itself is a model, not a networking feature. Option D is wrong because Whisper is a speech recognition (audio-to-text) model, not a text-to-speech model; Azure OpenAI offers text-to-speech via other models like Neural TTS, and 'whispered audio output' is a fictional feature.

Practice this question →

73

MCQeasy

What is the role of the Azure AI Foundry (AI Studio) playground?

A.A gaming environment where AI plays against human developers

B.An interactive testing environment for experimenting with AI models and prompts without coding

C.A virtual machine for running AI model training jobs

D.A sandbox for testing AI models in isolation from production data

AnswerB

The playground lets developers test models and prompts visually, exploring capabilities before writing application code.

Why this answer

The Azure AI Foundry (AI Studio) playground provides an interactive, no-code environment where developers and data scientists can experiment with generative AI models, test prompts, and adjust parameters like temperature and max tokens before integrating them into applications. This aligns with the need to prototype and validate model behavior without writing code, making it a key tool for rapid iteration in generative AI workloads.

Exam trap

The trap here is that candidates confuse the playground's interactive testing purpose with a training environment or a production isolation tool, overlooking that it is specifically designed for no-code experimentation with deployed models, not for model training or data governance.

How to eliminate wrong answers

Option A is wrong because the Azure AI Foundry playground is not a gaming environment; it is a testing interface for AI models, not a platform for AI-versus-human gameplay. Option C is wrong because the playground is not a virtual machine for training jobs; training is handled by compute clusters or managed compute resources in Azure Machine Learning, not the playground. Option D is wrong because while the playground is a sandbox for experimentation, it is not specifically isolated from production data—its purpose is to test prompts and models interactively, and isolation from production data is a security practice, not the defining role of the playground.

Practice this question →

74

MCQmedium

What is 'citation' in generative AI and why is it important for trust?

A.The model citing academic papers when asked about scientific topics

B.Indicating which source documents support an answer — enabling verification and reducing hallucination risk

C.Quoting user messages back to them to confirm the AI understood the question

D.Copyright attribution when the model quotes text from its training data

AnswerB

Citation grounds responses in sources — users can fact-check against cited documents, building trust in high-stakes applications.

Why this answer

Option B is correct because citation in generative AI refers to explicitly linking generated content back to specific source documents, which allows users to verify the information and reduces the risk of hallucination by grounding the model's output in verifiable data. This is a key feature in Azure OpenAI Service's 'grounding with your data' capability, where citations are provided alongside responses to build trust and transparency.

Exam trap

The trap here is that candidates confuse citation with generic referencing or legal attribution, but the AI-900 exam specifically tests citation as a mechanism for grounding and verifiability in enterprise generative AI workloads.

How to eliminate wrong answers

Option A is wrong because citation is not limited to academic papers; it applies to any source documents used to ground the model, such as internal company files or web content. Option C is wrong because quoting user messages back is a form of echo or confirmation, not citation, and does not involve referencing external sources for verification. Option D is wrong because copyright attribution is a legal or ethical concern, not the primary purpose of citation in generative AI, which is about enabling verification and reducing hallucination risk, not about licensing or ownership.

Practice this question →

75

MCQmedium

What is Azure AI Studio?

A.A video editing tool powered by AI for content creators

B.A unified platform for building, evaluating, and deploying generative AI applications

C.A specialized IDE for writing Python machine learning code only

D.A database service for storing conversation history from AI applications

AnswerB

Azure AI Studio provides an integrated environment for developing generative AI apps with access to models, prompt tools, and deployment.

Why this answer

Azure AI Studio is a unified platform designed specifically for building, evaluating, and deploying generative AI applications. It integrates tools for prompt engineering, model fine-tuning, and safety evaluation, enabling developers to create custom AI solutions using large language models (LLMs) from Azure OpenAI Service and other sources. This makes option B correct as it directly describes the platform's core purpose.

Exam trap

The trap here is that candidates may confuse Azure AI Studio with a general-purpose IDE or a specific tool like Azure Machine Learning studio, but the exam focuses on its unique role as a unified platform for generative AI workloads, not for traditional ML or non-AI tasks.

How to eliminate wrong answers

Option A is wrong because Azure AI Studio is not a video editing tool; it is a platform for developing AI applications, not for media editing. Option C is wrong because Azure AI Studio is not limited to Python machine learning code; it supports multiple languages and includes visual tools for building AI workflows, not just an IDE. Option D is wrong because Azure AI Studio is not a database service; it can integrate with databases like Azure Cosmos DB for storing conversation history, but it is not a database service itself.

Practice this question →

Page 1 of 3 · 206 questions totalNext →

Ready to test yourself?

Try a timed practice session using only Describe features of generative AI workloads on Azure questions.

Start 20-question session