CCNA Generative Ai Solutions Questions — Page 1 of 3

Multi-Selecthard

Which TWO actions can you take to mitigate the risk of generating harmful content when using Azure OpenAI Service? (Choose two.)

Select 2 answers

A.Set a system message that instructs the model to avoid harmful outputs.

B.Fine-tune the model on a dataset of safe examples.

C.Deploy the model in multiple regions.

D.Configure Azure AI Content Safety filters.

E.Increase the maxTokens parameter to allow longer responses.

AnswersA, D

System messages can guide model behavior.

Why this answer

Option A is correct because configuring content filters blocks harmful content. Option D is correct because using system messages to set behavior reduces risk. Option B is wrong as fine-tuning doesn't guarantee safety.

Option C is wrong as increasing maxTokens doesn't mitigate harm. Option E is wrong as multiple deployments don't affect safety.

Practice this question →

MCQeasy

You are testing an Azure OpenAI model with the parameters shown in the exhibit. The model generates very short responses. Which parameter should you modify to allow longer responses?

A.Increase frequency_penalty

B.Increase top_p

C.Increase temperature

D.Increase max_tokens

AnswerD

max_tokens directly controls the maximum length of the response.

Why this answer

The max_tokens parameter controls the maximum number of tokens (words or subwords) the model can generate in a single response. When responses are very short, increasing max_tokens allows the model to produce longer completions up to the specified limit. The other parameters affect randomness, diversity, or probability distribution, not the length cap.

Exam trap

The trap here is that candidates confuse parameters that control output length (max_tokens) with those that control output diversity or creativity (temperature, top_p, frequency_penalty), leading them to incorrectly adjust the latter when the real issue is a token limit.

How to eliminate wrong answers

Option A is wrong because frequency_penalty reduces the likelihood of repeating the same tokens or phrases, which can actually shorten responses by discouraging repetition, not lengthen them. Option B is wrong because top_p (nucleus sampling) controls the cumulative probability threshold for token selection, affecting diversity but not the maximum output length. Option C is wrong because temperature adjusts the randomness of token selection (higher = more creative, lower = more deterministic) and does not impose or remove a length constraint.

Practice this question →

MCQhard

You are building a generative AI application using Azure OpenAI Service. The application must provide citations for answers retrieved from a set of documents. You need to ensure that each answer includes a reference to the source document. Which configuration should you use?

A.Use Azure OpenAI On Your Data with the 'include citations' option

B.Add a system message requesting citations

C.Implement a custom prompt flow with citation logic

D.Fine-tune the model to include citations

AnswerA

This feature automatically returns source documents as citations.

Why this answer

Option A is correct because Azure OpenAI On Your Data provides a built-in 'include citations' feature that automatically retrieves and appends source document references to the generated answer. This configuration leverages the underlying search index to map each response segment back to the original document, ensuring compliance with citation requirements without custom development.

Exam trap

Microsoft often tests the misconception that a system message or prompt engineering alone can enforce reliable citation behavior, when in fact only a retrieval-augmented generation (RAG) configuration with explicit citation grounding—like Azure OpenAI On Your Data—can guarantee source-linked answers.

How to eliminate wrong answers

Option B is wrong because adding a system message requesting citations only instructs the model to include citations in its output, but the model has no inherent mechanism to reliably retrieve or verify source documents—it may hallucinate citations or omit them entirely. Option C is wrong because implementing a custom prompt flow with citation logic requires significant engineering effort and does not leverage the native, optimized citation pipeline in Azure OpenAI On Your Data, which handles retrieval-augmented generation (RAG) with citation grounding out of the box. Option D is wrong because fine-tuning the model to include citations would require a large, curated training dataset with correct citations and does not guarantee accurate source attribution for new, unseen documents; it also lacks the dynamic retrieval capability needed for document-grounded answers.

Practice this question →

MCQmedium

You need to generate a poem using Azure OpenAI. The poem should be about nature and have a cheerful tone. Which parameter should you adjust to influence the tone?

A.system message

B.max_tokens

C.top_p

D.temperature

AnswerA

System message guides the model's overall behavior and tone.

Why this answer

The system message is the correct parameter to influence the tone of the generated poem because it sets the initial context, persona, and behavioral guidelines for the model. By including an instruction like 'You are a cheerful poet who writes about nature in a happy tone,' you directly control the style and emotional quality of the output, which is exactly what the question requires.

Exam trap

Microsoft often tests the distinction between parameters that control randomness (temperature, top_p) versus those that control behavior and style (system message), leading candidates to mistakenly choose temperature as the primary tone influencer.

How to eliminate wrong answers

Option B (max_tokens) is wrong because it controls the maximum length of the generated response, not the tone or style. Option C (top_p) is wrong because it controls nucleus sampling, which affects the diversity of word choices by limiting the cumulative probability of token selection, not the tone. Option D (temperature) is wrong because it controls the randomness of the output (higher values increase creativity, lower values make output more deterministic), but it does not directly set or enforce a specific tone like 'cheerful'.

Practice this question →

MCQeasy

You need to provide a generative AI solution that can answer questions based on a large set of PDF documents stored in Azure Blob Storage. The solution must support natural language queries and return citations from the documents. Which Azure service combination should you use?

A.Azure Machine Learning and Azure Kubernetes Service

B.Azure Cognitive Search and Azure OpenAI Service with 'on your data'

C.Azure AI Bot Service and Azure Functions

D.Azure AI Document Intelligence and Azure AI Translator

AnswerB

Cognitive Search indexes PDFs; Azure OpenAI uses the index for grounded Q&A with citations.

Why this answer

Option C is correct because Azure Cognitive Search indexes the PDFs and Azure OpenAI on your data uses that index to answer with citations. Option A is wrong because Form Recognizer is for extraction, not Q&A. Option B is wrong because Azure ML is for training, not search.

Option D is wrong because Azure AI Bot Service alone doesn't index documents.

Practice this question →

MCQhard

Your team is developing an AI-powered document summarization solution using Azure OpenAI. You need to ensure that the solution complies with Microsoft's Responsible AI principles, specifically transparency. Which configuration should you implement?

A.Configure diagnostic logging to capture all model inputs and outputs.

B.Fine-tune the model on a custom dataset to improve accuracy.

C.Enable content filtering with severity levels high and medium.

D.Add a system message that informs users the summary is generated by AI.

AnswerD

Transparency requires clear disclosure of AI involvement.

Why this answer

Transparency under Microsoft's Responsible AI principles requires that users are aware when they are interacting with an AI system. Adding a system message that explicitly states the summary is AI-generated fulfills this disclosure requirement. Diagnostic logging (A) aids in accountability and debugging but does not directly inform the user.

Fine-tuning (B) improves accuracy but does not address transparency. Content filtering (C) mitigates harmful outputs but does not disclose AI involvement.

Exam trap

The trap here is that candidates confuse 'transparency' with 'accountability' or 'safety' and select diagnostic logging or content filtering, not realizing that transparency specifically requires user-facing disclosure of AI involvement.

How to eliminate wrong answers

Option A is wrong because diagnostic logging captures inputs and outputs for auditing and debugging, but it does not communicate to the end user that the content is AI-generated, which is the core requirement of transparency. Option B is wrong because fine-tuning the model on a custom dataset enhances performance and relevance but has no role in informing users about AI authorship. Option C is wrong because enabling content filtering with severity levels high and medium is a safety measure to block harmful content, not a mechanism for disclosing AI involvement to users.

Practice this question →

MCQeasy

You are using Azure OpenAI Service to generate product descriptions. The output is often too verbose. You need to reduce the length of generated text without changing the model. Which parameter should you adjust?

A.Max tokens

B.Frequency penalty

C.Temperature

D.Top-p (nucleus sampling)

AnswerA

Max tokens limits the length of the generated response.

Why this answer

Max tokens controls the total length of the generated response by capping the number of tokens (words/subwords) the model can output. Reducing this value directly truncates the output, making descriptions shorter without altering the model or its behavior. Other parameters influence randomness or repetition but do not enforce a strict length limit.

Exam trap

The trap here is that candidates confuse parameters that affect output style (temperature, top-p, frequency penalty) with the one parameter that directly controls output length (max tokens), leading them to choose a parameter that changes how the model writes rather than how much it writes.

How to eliminate wrong answers

Option B (Frequency penalty) is wrong because it reduces the likelihood of repeating the same tokens or phrases, which can change content style but does not enforce a maximum output length. Option C (Temperature) is wrong because it controls the randomness of token selection (higher = more creative, lower = more deterministic), not the number of tokens generated. Option D (Top-p) is wrong because it limits the cumulative probability of token choices (nucleus sampling), affecting diversity but not the total token count.

Practice this question →

Drag & Dropmedium

Drag and drop the steps to configure a multi-region disaster recovery for Azure Cognitive Services into the correct order.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

Why this order

Start by creating resources in both regions, then set up replication, configure failover routing, monitor health, and test.

Practice this question →

MCQeasy

A developer wants to use Azure OpenAI to generate text from a prompt. Which parameter controls the diversity of the generated output?

A.presence_penalty

B.frequency_penalty

C.temperature

D.max_tokens

AnswerC

Temperature controls randomness and diversity.

Why this answer

Temperature is the parameter that directly controls the randomness or diversity of the generated output by scaling the logits before applying the softmax function. A higher temperature (e.g., 1.0) increases the probability of less likely tokens, producing more creative and varied responses, while a lower temperature (e.g., 0.1) makes the output more deterministic and focused.

Exam trap

The trap here is that candidates often confuse frequency_penalty or presence_penalty with controlling diversity, but those parameters address repetition and topic novelty, not the fundamental randomness of token selection, which is exclusively governed by temperature.

How to eliminate wrong answers

Option A is wrong because presence_penalty penalizes tokens that have already appeared in the text so far, encouraging the model to introduce new topics or avoid repetition, but it does not directly control the overall diversity or randomness of the token selection. Option B is wrong because frequency_penalty reduces the likelihood of tokens based on how frequently they have occurred in the generated text, which helps prevent repetitive phrasing but does not adjust the probability distribution's entropy like temperature does. Option D is wrong because max_tokens simply limits the maximum number of tokens in the generated response and has no effect on the diversity or randomness of the output.

Practice this question →

Multi-Selectmedium

Which THREE are valid parameters when calling the Azure OpenAI Service chat completions API?

Select 3 answers

A.messages

B.index_name

C.temperature

D.max_tokens

E.embedding_model

AnswersA, C, D

Required input.

Why this answer

The `messages` parameter is required in the Azure OpenAI Service chat completions API call. It defines the conversation history and user input as an array of message objects, each with a `role` (system, user, assistant) and `content`. Without this parameter, the API cannot determine the context or prompt for generating a response.

Exam trap

Microsoft often tests the distinction between parameters for different Azure OpenAI API endpoints (chat completions vs. embeddings vs. search), so candidates may confuse `index_name` or `embedding_model` as valid chat completions parameters due to their familiarity with other Azure AI services.

Practice this question →

MCQeasy

You need to translate a document from English to Spanish using Azure OpenAI. Which parameter should you include in the prompt to specify the target language?

A.temperature

B.system message

C.max_tokens

D.user message

AnswerD

User message should contain the translation instruction.

Why this answer

Option D is correct because in Azure OpenAI, the user message is where you provide the instruction or context for the model, including specifying the target language for translation. By including 'Translate the following English text to Spanish' in the user message, you direct the model to perform the language translation task. The other parameters control generation behavior (temperature, max_tokens) or set the model's persona (system message), but do not specify the target language.

Exam trap

The trap here is that candidates confuse the system message (which sets overall behavior) with the user message (which carries the specific instruction), leading them to incorrectly select system message as the parameter for specifying the target language.

How to eliminate wrong answers

Option A is wrong because temperature controls the randomness of the model's output, not the target language. Option B is wrong because the system message sets the assistant's behavior or persona (e.g., 'You are a helpful translator'), but does not specify the target language for a specific translation request. Option C is wrong because max_tokens limits the length of the generated response, not the language of the output.

Practice this question →

MCQmedium

A company uses Microsoft Copilot for Microsoft 365 to automate email responses. They want to ensure that the Copilot responses comply with their data governance policies and do not expose sensitive information. What should they configure?

A.Microsoft Entra ID conditional access policies

B.Microsoft Purview Data Loss Prevention (DLP) policies

C.Microsoft Intune app protection policies

D.Microsoft Sentinel analytics rules

AnswerB

Purview DLP can prevent sensitive data from being shared in Copilot responses.

Why this answer

Microsoft Purview Data Loss Prevention (DLP) policies are designed to detect and prevent the accidental sharing of sensitive information, such as credit card numbers or personally identifiable information (PII), across Microsoft 365 services. By configuring DLP policies, the company can scan Copilot-generated email responses for sensitive data patterns and block or warn before the email is sent, ensuring compliance with data governance policies.

Exam trap

The trap here is that candidates often confuse data governance and content inspection with access control (Entra ID) or endpoint security (Intune), leading them to select a policy that governs who can access data rather than what data is allowed to be shared.

How to eliminate wrong answers

Option A is wrong because Microsoft Entra ID conditional access policies control authentication and access to applications based on user, device, or location conditions, but they do not inspect or govern the content of Copilot-generated email responses for sensitive data. Option C is wrong because Microsoft Intune app protection policies manage how data is handled within mobile apps (e.g., copy/paste restrictions, encryption at rest), but they do not scan or enforce content-level rules on email responses generated by Copilot. Option D is wrong because Microsoft Sentinel analytics rules are used for security information and event management (SIEM) to detect threats and anomalies in logs, not to enforce data governance or prevent sensitive data exposure in email content.

Practice this question →

MCQhard

Refer to the exhibit. You are using Azure OpenAI Service to generate summaries of meeting notes. The current configuration produces summaries that are too short and sometimes omit key points. How should you modify the parameters to get more complete summaries?

A.Increase max_tokens to 300.

B.Decrease temperature to 0.3.

C.Increase frequency_penalty to 0.5.

D.Set top_p to 0.9.

AnswerA

Higher max_tokens allows longer completions.

Why this answer

Option B is correct because increasing max_tokens allows the model to generate longer summaries. Option A is wrong because lowering temperature increases determinism but not length. Option C is wrong because increasing frequency_penalty reduces repetition, not length.

Option D is wrong because changing top_p affects diversity, not length.

Practice this question →

MCQeasy

You are building a chatbot using Azure OpenAI Service. The chatbot must not disclose sensitive information such as passwords or credit card numbers. Which Azure AI service should you integrate to filter such content?

A.Azure AI Language

B.Azure AI Bot Service

C.Azure AI Search

D.Azure AI Content Safety

AnswerD

It detects and filters unsafe or sensitive content.

Why this answer

Azure AI Content Safety is the correct service because it provides pre-built content filters that can detect and block sensitive information such as passwords, credit card numbers, and other personally identifiable information (PII) in text and images. It integrates directly with Azure OpenAI Service to apply these filters to both input prompts and output completions, ensuring that sensitive data is not disclosed in chatbot responses.

Exam trap

The trap here is that candidates often confuse Azure AI Language's PII detection feature (which identifies but does not block content) with Azure AI Content Safety's filtering capability, leading them to incorrectly choose Azure AI Language for proactive content blocking.

How to eliminate wrong answers

Option A is wrong because Azure AI Language is a natural language processing service for tasks like sentiment analysis, key phrase extraction, and language understanding, but it does not include built-in content filtering for sensitive data like passwords or credit card numbers. Option B is wrong because Azure AI Bot Service is a platform for building, deploying, and managing chatbots, but it does not provide native content filtering capabilities; it relies on other services like Content Safety for such functionality. Option C is wrong because Azure AI Search is a search-as-a-service solution for indexing and querying data, not a content safety or filtering service, and it cannot detect or block sensitive information in real-time chatbot interactions.

Practice this question →

Multi-Selectmedium

You are building a generative AI chatbot using Microsoft Copilot Studio. The chatbot must answer questions from a PDF document and a SQL database. Which THREE data sources can you configure? (Choose three.)

Select 3 answers

A.Custom connector to SQL database

B.SharePoint (store the PDF document)

C.Azure AI Search index

D.Dataverse (store SQL data)

E.Azure Blob Storage

AnswersA, B, D

Custom connectors allow integration with SQL databases.

Why this answer

A is correct because Microsoft Copilot Studio supports custom connectors to access external data sources like SQL databases. This allows the chatbot to query the SQL database directly using the connector's API, enabling real-time data retrieval for generative AI responses.

Exam trap

The trap here is that candidates often assume Azure AI Search or Blob Storage are natively configurable in Copilot Studio, but they require additional middleware or custom connectors, unlike SharePoint and Dataverse which are first-party supported sources.

Practice this question →

MCQhard

Refer to the exhibit. You are troubleshooting an Azure OpenAI API call that is returning incomplete responses. The response stops mid-sentence. Which parameter should you adjust?

A.Increase max_tokens to 1000.

B.Remove the stop parameter.

C.Increase temperature to 1.0.

D.Increase top_p to 1.0.

AnswerA

Increases token budget for response.

Why this answer

The `max_tokens` parameter controls the maximum number of tokens the model can generate in a single response. When a response stops mid-sentence, it typically means the token limit was reached before the model could complete its output. Increasing `max_tokens` to 1000 provides more room for the model to finish its generation, resolving the truncation issue.

Exam trap

The trap here is that candidates confuse parameters that control output length (`max_tokens`) with those that control output diversity (`temperature`, `top_p`) or early stopping (`stop`), leading them to pick options that change style rather than capacity.

How to eliminate wrong answers

Option B is wrong because removing the `stop` parameter would not fix mid-sentence truncation; the `stop` parameter defines sequences that halt generation early, and removing it could actually make responses longer but does not address a hard token limit. Option C is wrong because increasing `temperature` to 1.0 increases randomness and creativity in the output, but does not affect the maximum length of the response; it could even lead to more verbose or erratic completions. Option D is wrong because increasing `top_p` to 1.0 enables nucleus sampling with all tokens considered, which may alter the diversity of the output but does not extend the token budget; the model will still stop when `max_tokens` is exhausted.

Practice this question →

MCQmedium

You are implementing a RAG (Retrieval-Augmented Generation) solution using Azure AI Search and Azure OpenAI Service. The solution is returning answers that are not relevant to the user query. What is the most likely cause?

A.The max_tokens parameter is set too high.

B.The chunk size is too small.

C.The index includes too many documents.

D.The relevance score threshold is set too low.

AnswerD

Low threshold includes less relevant documents, leading to poor responses.

Why this answer

Option B is correct because low relevance score thresholds include irrelevant documents in the context. Option A is wrong because chunk size affects granularity but not relevance directly. Option C is wrong because tokens limit affects response length, not relevance.

Option D is wrong because indexing all documents might be desired; the issue is retrieval quality.

Practice this question →

MCQhard

You run the Azure CLI command shown in the exhibit. After a few minutes, the deployment fails with a quota error. What is the most likely cause?

A.The SKU name 'Standard' is invalid for Azure OpenAI deployments.

B.The model version '0613' is deprecated and no longer available.

C.The requested capacity of 10 exceeds the available quota for the gpt-4 model in that region.

D.The resource group name 'myResourceGroup' does not exist.

AnswerC

Quota errors occur when capacity exceeds regional limits.

Why this answer

The quota error indicates that the requested capacity (10 units) for the gpt-4 model exceeds the available quota in the target region. Azure OpenAI deployments require sufficient model-specific quota, which is region- and model-specific. The error is not related to SKU name validity, model version deprecation, or resource group existence.

Exam trap

The trap here is that candidates might confuse a quota error with a model deprecation or SKU issue, but the error message's explicit mention of 'quota' directly points to capacity limits, not configuration or availability problems.

How to eliminate wrong answers

Option A is wrong because 'Standard' is a valid SKU name for Azure OpenAI deployments; the error message specifically mentions quota, not an invalid SKU. Option B is wrong because model version '0613' is a valid and available version for gpt-4; deprecation would produce a different error (e.g., 'ModelNotFound'), not a quota error. Option D is wrong because if the resource group did not exist, the Azure CLI would fail immediately with a 'ResourceGroupNotFound' error, not after several minutes with a quota error.

Practice this question →

MCQeasy

You are developing a generative AI application that must comply with responsible AI principles. Which Azure AI service should you use to detect and filter harmful content in both input prompts and output responses?

A.Microsoft Purview

B.Azure AI Content Safety

C.Azure OpenAI Service

D.Azure AI Language

AnswerB

Content Safety is designed to detect and filter harmful content.

Why this answer

Option D is correct because Azure AI Content Safety provides content filtering for harmful categories. Option A is wrong because Azure OpenAI Service provides the model, not content safety. Option B is wrong because Azure AI Language offers moderation APIs but Content Safety is dedicated.

Option C is wrong because Microsoft Purview is for data governance, not content safety.

Practice this question →

Multi-Selectmedium

Which TWO options are valid ways to reduce the cost of using Azure OpenAI Service?

Select 2 answers

A.Use provisioned throughput with reserved capacity.

B.Increase the temperature parameter.

C.Use a smaller model like GPT-3.5 instead of GPT-4.

D.Increase the max_tokens parameter to get longer responses.

E.Enable content filtering on all requests.

AnswersA, C

Reserved capacity offers a discount compared to pay-as-you-go.

Why this answer

Options A and D are correct. A: Using a smaller model like GPT-3.5 instead of GPT-4 reduces cost per token. D: Provisioned throughput with reserved capacity offers lower per-token cost for high usage.

B is wrong because increasing max_tokens increases cost. C is wrong because using a higher temperature does not affect token count. E is wrong because content filters do not reduce token consumption.

Practice this question →

MCQhard

You are deploying a conversational AI solution using Microsoft Copilot Studio. The solution must comply with organizational data loss prevention (DLP) policies by preventing sensitive data from being sent to the underlying Azure OpenAI model. What should you configure?

A.Configure content filters in Azure OpenAI Studio

B.Define DLP policies in Microsoft 365 compliance center and apply to Copilot Studio

C.Enable Azure AI Content Safety in the bot's generative AI configuration

D.Set the temperature parameter to 0 to reduce variability

AnswerB

DLP policies in M365 can block sensitive data from being sent to AI models.

Why this answer

Option B is correct because Microsoft Copilot Studio integrates with Microsoft 365 DLP policies to prevent sensitive data from being sent to the underlying Azure OpenAI model. By defining DLP policies in the Microsoft 365 compliance center and applying them to Copilot Studio, you can enforce data loss prevention rules that block or restrict the transmission of sensitive information (e.g., credit card numbers, social security numbers) to the generative AI backend. This ensures compliance with organizational security requirements without modifying the AI model itself.

Exam trap

The trap here is that candidates confuse Azure AI Content Safety (which handles harmful content moderation) with DLP policies (which handle sensitive data protection), leading them to select Option C instead of the correct DLP-based approach.

How to eliminate wrong answers

Option A is wrong because content filters in Azure OpenAI Studio are designed to filter harmful or offensive content in model outputs, not to prevent sensitive data from being sent to the model as input; they operate on the response side, not the request side. Option C is wrong because Azure AI Content Safety is a service for detecting and filtering harmful content (e.g., hate speech, violence) in both inputs and outputs, but it does not enforce DLP policies or block sensitive data based on organizational compliance rules; it focuses on safety, not data loss prevention. Option D is wrong because setting the temperature parameter to 0 reduces the randomness of the model's responses, making them more deterministic, but it has no effect on preventing sensitive data from being sent to the model; it controls output variability, not input filtering.

Practice this question →

Multi-Selecthard

You are designing a generative AI solution using Azure OpenAI Service. The solution must support multiple languages and provide consistent quality across languages. Which THREE actions should you take?

Select 3 answers

A.Fine-tune the model on a dataset of a single language

B.Use a model that supports multiple languages (e.g., GPT-4)

C.Provide examples in multiple languages in the prompt

D.Set the temperature to 0 for all requests

E.Test the solution with representative prompts in each language

AnswersB, C, E

Multilingual models handle multiple languages natively.

Why this answer

Option B is correct because GPT-4 is a multilingual model pre-trained on diverse language corpora, enabling it to generate coherent and contextually appropriate responses across many languages without additional fine-tuning. This ensures consistent quality by leveraging the model's inherent cross-lingual capabilities, which is essential for a generative AI solution that must support multiple languages.

Exam trap

The trap here is that candidates may think fine-tuning on a single language (Option A) is sufficient for multilingual support, or that setting temperature to 0 (Option D) universally improves consistency, when in fact these actions undermine the required cross-lingual quality and flexibility.

Practice this question →

MCQeasy

You need to generate a summary of a long article using Azure OpenAI. The article is 10,000 tokens long. What should you do to fit the article within the model's context window?

A.Split the article into smaller sections and summarize each section separately.

B.Increase the temperature parameter.

C.Use a model with a smaller context window.

D.Set max_tokens to a lower value.

AnswerA

Chunking the input fits within the context window.

Why this answer

Option A is correct because the article exceeds the model's context window (typically 4096 or 8192 tokens for GPT-3.5/4). Splitting the article into smaller sections and summarizing each separately allows you to process the entire content within the token limits, then combine the summaries for a final coherent output. This is a standard chunking strategy for long documents when using Azure OpenAI.

Exam trap

The trap here is that candidates confuse parameters that control output behavior (temperature, max_tokens) with the fundamental input token limit, leading them to incorrectly believe adjusting these parameters can bypass the context window restriction.

How to eliminate wrong answers

Option B is wrong because increasing the temperature parameter affects randomness and creativity of the output, not the input token limit; it does not help fit a long article into the context window. Option C is wrong because using a model with a smaller context window would make the problem worse, as it reduces the maximum input length, not increase it. Option D is wrong because setting max_tokens to a lower value only truncates the output length, not the input; the article still exceeds the context window and will be rejected or truncated at the input stage.

Practice this question →

MCQeasy

You are using Azure OpenAI Service to generate marketing copy. You notice that the output sometimes contains factual inaccuracies about your company's products. Which action can you take to improve factual accuracy?

A.Lower the temperature to 0.

B.Include relevant product information in the system message.

C.Increase the maxTokens to 4000.

D.Add a stop sequence to limit output.

AnswerB

Providing accurate context in the prompt helps the model generate factual responses.

Why this answer

Option A is correct because providing ground truth data in the prompt (e.g., via RAG) improves accuracy. Option B is wrong as temperature affects creativity. Option C is wrong as maxTokens limits length.

Option D is wrong as stop sequences control generation stopping.

Practice this question →

Multi-Selecthard

Which THREE factors should you consider when choosing between Azure OpenAI Service and Azure Machine Learning for deploying a generative AI model?

Select 3 answers

A.Integration with Microsoft Purview for data governance.

B.Latency requirements: Azure OpenAI may offer lower latency for standard models.

C.Ability to scale to thousands of concurrent requests.

D.Need for custom model architecture: Azure ML supports custom models, Azure OpenAI uses pre-trained.

E.Operational overhead: Azure OpenAI is a fully managed service.

AnswersB, D, E

Azure OpenAI endpoints are optimized for low latency, whereas Azure ML may require additional optimization.

Why this answer

Options A, C, and E are correct. A: Azure OpenAI is fully managed with less operational overhead. C: Azure OpenAI offers pre-trained models, while Azure ML allows custom models.

E: Azure OpenAI may have lower latency for similar model sizes. B is wrong because both services support scaling. D is wrong because both services support governance.

Practice this question →

Multi-Selectmedium

Which TWO actions are required to enable a custom chatbot built with Azure OpenAI to answer questions based on a company's internal PDF documents?

Select 2 answers

A.Use Azure AI Document Intelligence to extract text from PDFs before indexing

B.Deploy Azure AI Content Safety to filter responses

C.Fine-tune the GPT model on the PDF content

D.Ingest the PDFs into an Azure Cognitive Search index

E.Configure the Azure OpenAI deployment to use 'Add your data' with the search index

AnswersD, E

Indexing enables retrieval of relevant content from PDFs.

Why this answer

Option D is correct because Azure Cognitive Search provides the indexing and retrieval capabilities needed to make PDF content searchable. By ingesting PDFs into an Azure Cognitive Search index, the chatbot can perform vector or keyword searches over the extracted text, enabling it to retrieve relevant passages to answer user questions. This is the standard approach for grounding a custom chatbot on proprietary documents without modifying the underlying model.

Exam trap

The trap here is that candidates often confuse fine-tuning (option C) with the RAG pattern, mistakenly believing they must retrain the model on proprietary data, when in fact the 'Add your data' feature with a search index is the correct and simpler approach for question-answering over internal documents.

Practice this question →

Multi-Selectmedium

Which TWO are valid ways to manage cost when using Azure OpenAI Service in a production application?

Select 2 answers

A.Fine-tune the model to reduce the number of examples needed in prompts

B.Increase the temperature parameter to 1.0

C.Use a smaller model like GPT-3.5-turbo instead of GPT-4 for simpler tasks

D.Provision more PTUs to get a lower rate per token

E.Set the max_tokens parameter to the minimum needed for the response

AnswersC, E

Smaller models have lower per-token costs.

Why this answer

Option C is correct because using a smaller model like GPT-3.5-turbo for simpler tasks directly reduces the per-token cost compared to GPT-4, which is significantly more expensive. Azure OpenAI Service charges based on model tier and token usage, so selecting the appropriate model for the task complexity is a primary cost management strategy.

Exam trap

The trap here is that candidates may confuse fine-tuning with prompt optimization, or assume that increasing PTUs lowers per-token cost, when in fact PTUs are a fixed-cost commitment that increases total expenditure.

Practice this question →

MCQmedium

Refer to the exhibit. You are deploying an Azure AI Services resource using an ARM template. After deployment, you cannot access the resource from any client, including the Azure portal. What is the most likely cause?

A.The SKU S0 does not support network restrictions.

B.The cognitiveServiceName conflicts with an existing resource.

C.The networkAcls defaultAction is set to Deny with no allowed IP or virtual network rules.

D.The virtualNetworkRules array is empty.

AnswerC

Blocks all traffic, including portal.

Why this answer

The networkAcls defaultAction set to Deny with no allowed IP or virtual network rules blocks all traffic to the Azure AI Services resource, including requests from the Azure portal and any client. This is because the default network access control list (ACL) denies all traffic unless explicitly permitted by IP rules or virtual network rules, rendering the resource inaccessible even for management operations.

Exam trap

The trap here is that candidates often assume an empty virtualNetworkRules array means no restrictions, but the defaultAction property controls the default behavior, and setting it to Deny with no allow rules blocks all traffic regardless of empty arrays.

How to eliminate wrong answers

Option A is wrong because the S0 SKU does support network restrictions; network ACLs are available for all paid SKUs (S0 and above), and the issue is not related to SKU limitations. Option B is wrong because a name conflict would cause a deployment error, not a scenario where the resource is deployed but inaccessible; the ARM template would fail with a conflict error during deployment. Option D is wrong because an empty virtualNetworkRules array is irrelevant when the defaultAction is Deny; the resource would still be blocked since no IP rules are specified to allow traffic, and an empty array does not implicitly permit any traffic.

Practice this question →

MCQeasy

You are building a conversational AI system using Azure OpenAI Service. The system must maintain context across multiple user turns. Which parameter determines how many previous messages are considered for the next response?

A.max_tokens

B.temperature

C.top_p

D.The length of the messages array in the API call

AnswerD

The messages array holds conversation history; its length determines how many previous turns are included.

Why this answer

Option D is correct because the Azure OpenAI Service API uses a `messages` array in the request body to represent the conversation history. Each entry in this array corresponds to a previous turn (with roles like 'user', 'assistant', or 'system'), and the length of this array directly determines how many prior messages are considered when generating the next response. By including more messages, you extend the context window; by truncating the array, you limit it.

Exam trap

The trap here is that candidates often confuse parameters that control output generation (like `max_tokens`, `temperature`, or `top_p`) with the mechanism for maintaining conversation history, which is explicitly managed by the structure of the API call's `messages` array.

How to eliminate wrong answers

Option A is wrong because `max_tokens` controls the maximum number of tokens (words/subwords) in the generated response, not the number of previous messages considered for context. Option B is wrong because `temperature` is a sampling parameter that influences the randomness or creativity of the output, not the conversation history length. Option C is wrong because `top_p` (nucleus sampling) sets a probability threshold for token selection, affecting output diversity, not the number of prior turns used as context.

Practice this question →

Multi-Selectmedium

Which TWO actions should you take to ensure that a generative AI model deployed on Azure Machine Learning is compliant with data privacy regulations?

Select 2 answers

A.Implement data masking during preprocessing.

B.Store all training data in a separate Azure region.

C.Use differential privacy during model training.

D.Encrypt the model at rest using Azure Key Vault.

E.Log all raw input data to Azure Monitor for auditing.

AnswersA, C

Data masking replaces sensitive information with realistic but fictional data, helping to comply with privacy regulations.

Why this answer

Options A and D are correct. A: Data masking in preprocessing helps anonymize sensitive data. D: Differential privacy adds noise to protect individual data points.

B is wrong because exposing raw data is not compliant. C is wrong because storing data in a different region does not address privacy. E is wrong because encryption at rest is about security, not privacy compliance.

Practice this question →

MCQmedium

A company is using Azure OpenAI to generate customer support responses. They want to ensure the model does not use any personally identifiable information (PII) in its outputs. What should they implement?

A.Fine-tune the model on anonymized data.

B.Use prompt engineering to instruct the model to redact PII.

C.Use Azure AI Content Safety to filter PII from the output.

D.Use a system message instructing the model to avoid PII.

AnswerC

Azure AI Content Safety can detect and block PII.

Why this answer

Azure AI Content Safety provides built-in PII detection and redaction capabilities that can automatically scan and filter sensitive information from model outputs. This is the most reliable approach because it operates as a post-processing filter, catching PII that the model might generate despite instructions. Fine-tuning, prompt engineering, and system messages are all fallible because they rely on the model's compliance rather than enforced filtering.

Exam trap

The trap here is that candidates confuse 'instruction-based approaches' (prompts, system messages) with 'enforcement-based approaches' (content safety filters), assuming that telling the model not to do something is as effective as actively filtering the output.

How to eliminate wrong answers

Option A is wrong because fine-tuning on anonymized data does not prevent the model from generating PII during inference; it only reduces the likelihood based on training data, and the model can still hallucinate or leak PII from its pretrained knowledge. Option B is wrong because prompt engineering is a soft instruction that the model may ignore or fail to apply consistently, especially with edge cases or adversarial inputs, and it does not provide guaranteed redaction. Option D is wrong because a system message is merely a directive to the model, not a technical enforcement mechanism; the model can still output PII if it misinterprets or overrides the instruction.

Practice this question →

MCQeasy

You are using Azure OpenAI Service to summarize customer emails. The summaries must be concise and contain only key information. Which prompt engineering technique should you apply?

A.Use few-shot prompting with examples of desired summaries

B.Use chain-of-thought prompting

C.Use zero-shot prompting with a one-sentence instruction

D.Use negative prompting to avoid verbose output

AnswerA

Few-shot examples guide the model to produce concise summaries.

Why this answer

Few-shot prompting is the correct technique because it provides the model with explicit examples of desired input-output pairs (e.g., a verbose email and its concise summary). This guides the model to learn the exact format, tone, and level of detail required for the summaries, which is critical for consistency in a production summarization pipeline. Without examples, the model may default to its training distribution and produce overly verbose or irrelevant output.

Exam trap

The trap here is that candidates often assume a simple instruction (zero-shot) is sufficient for summarization, underestimating how much the model relies on explicit examples to enforce output structure and conciseness, especially when the task requires domain-specific key information extraction.

How to eliminate wrong answers

Option B is wrong because chain-of-thought prompting is designed for multi-step reasoning tasks (e.g., math word problems, logical deduction) where intermediate steps are needed, not for summarization where the goal is direct extraction of key information. Option C is wrong because zero-shot prompting with a one-sentence instruction lacks the concrete examples needed to constrain the model's output style and length, often resulting in summaries that are too long or miss critical details. Option D is wrong because negative prompting (e.g., 'do not be verbose') is unreliable; the model may misinterpret the negation or still produce verbose output because it lacks positive examples of the desired concise format.

Practice this question →

MCQeasy

You are developing a generative AI application that uses Azure OpenAI Service to summarize large documents. The application experiences high latency when processing requests. You need to reduce the latency without changing the model. What should you do?

A.Increase the temperature parameter

B.Increase the top_p parameter

C.Reduce the max_tokens parameter in the API request

D.Increase the max_tokens parameter

AnswerC

Reducing max_tokens limits output length, reducing processing time.

Why this answer

Reducing the max_tokens parameter limits the length of the generated response, which directly reduces the processing time required by the Azure OpenAI Service to produce the output. Since latency is caused by the model generating a long sequence of tokens, capping the output tokens decreases the number of autoregressive decoding steps, thereby lowering response time without altering the underlying model.

Exam trap

The trap here is that candidates often confuse parameters that affect output length (max_tokens) with those that affect output diversity (temperature, top_p), mistakenly believing that adjusting randomness can speed up generation.

How to eliminate wrong answers

Option A is wrong because increasing the temperature parameter controls the randomness of the output, not the length or speed of generation; it has no direct impact on latency. Option B is wrong because increasing the top_p parameter (nucleus sampling) affects the diversity of token selection but does not reduce the number of tokens generated or the processing time. Option D is wrong because increasing the max_tokens parameter would allow longer responses, which would increase the number of decoding steps and worsen latency, the opposite of the desired outcome.

Practice this question →

Multi-Selecteasy

Which TWO Azure AI services can be used to build a conversational chatbot that uses generative AI? (Choose two.)

Select 2 answers

A.Azure AI Search

B.Azure AI Bot Service

C.Azure AI Translator

D.Azure AI Language

E.Azure OpenAI Service

AnswersB, E

It provides the framework to build and deploy chatbots.

Why this answer

Option A is correct because Azure OpenAI Service provides generative models. Option C is correct because Azure AI Bot Service provides the bot framework. Option B is wrong as Azure AI Search is for search, not conversational.

Option D is wrong as Azure AI Translator is for translation. Option E is wrong as Azure AI Language provides NLP but not a complete chatbot framework.

Practice this question →

MCQmedium

You are deploying a generative AI application that uses Azure OpenAI Service. You need to ensure that the application can handle sudden spikes in traffic without exceeding your quota. Which scaling strategy should you implement?

A.Use a Pay-as-you-go deployment and rely on Azure's automatic scaling.

B.Configure a provisioned throughput deployment with auto-scaling.

C.Create a deployment with a high base TPM and manually adjust during peak times.

D.Deploy multiple instances of the model in different regions.

AnswerB

Provisioned throughput allows you to define a base and max TPM, auto-scaling within quota.

Why this answer

Option D is correct because a provisioned throughput deployment with auto-scaling allows you to handle spikes while staying within quota. Option A is wrong as Pay-as-you-go has limited throughput. Option B is wrong as manual scaling is not automatic.

Option C is wrong as multiple instances don't help if quota is fixed.

Practice this question →

MCQeasy

You are developing a solution that uses Azure OpenAI to generate customer support responses. You want to prevent the model from repeating the same phrases. Which parameter should you adjust?

A.top_p

B.presence_penalty

C.temperature

D.frequency_penalty

AnswerD

Frequency penalty reduces the likelihood of repeating the same tokens.

Why this answer

The frequency_penalty parameter (option D) is correct because it directly reduces the likelihood of the model repeating the same phrases by penalizing tokens that have already appeared in the generated text. A higher frequency_penalty value (e.g., 0.5 to 1.0) decreases the probability of reusing tokens, making the output more diverse and less repetitive. This is specifically designed to address repetition in generative AI responses.

Exam trap

The trap here is that candidates often confuse presence_penalty with frequency_penalty, but presence_penalty only penalizes tokens that have appeared at least once (regardless of count), while frequency_penalty penalizes based on the actual frequency of occurrence, making it the correct choice for preventing repeated phrases.

How to eliminate wrong answers

Option A is wrong because top_p (nucleus sampling) controls the cumulative probability threshold for token selection, influencing randomness and diversity of output, but it does not specifically penalize repeated phrases. Option B is wrong because presence_penalty penalizes tokens that have appeared at least once in the text, encouraging the model to talk about new topics, but it does not target the frequency of repetition of the same phrases. Option C is wrong because temperature controls the randomness of token selection by scaling the logits before softmax, affecting creativity and variability, but it has no direct mechanism to prevent repetition of phrases.

Practice this question →

MCQhard

You are reviewing an ARM template for deploying Azure OpenAI Service. The template includes a deployment for gpt-35-turbo with a capacity of 100. You need to ensure that the deployment uses provisioned throughput instead of standard. What should you modify?

A.Change the sku name to 'ProvisionedManaged'.

B.Remove the raiPolicyName property.

C.Increase the capacity to 200.

D.Change the model format to 'GPT-4'.

AnswerA

ProvisionedManaged is the sku for provisioned throughput.

Why this answer

To use provisioned throughput (PTU) with Azure OpenAI Service, you must set the SKU name to 'ProvisionedManaged' in the ARM template. The default SKU is 'Standard', which uses pay-per-token consumption. Changing the SKU name to 'ProvisionedManaged' tells the resource provider to allocate dedicated throughput capacity for the deployment, ensuring consistent latency and throughput regardless of other workloads.

Exam trap

The trap here is that candidates often think increasing capacity or changing the model version enables provisioned throughput, but the exam tests the specific SKU name 'ProvisionedManaged' as the only way to switch from standard to provisioned throughput in an ARM template.

How to eliminate wrong answers

Option B is wrong because removing the raiPolicyName property does not affect throughput provisioning; it only removes content filtering or responsible AI policies, which are unrelated to capacity allocation. Option C is wrong because increasing capacity to 200 only scales the number of tokens per minute under the current SKU (Standard), but does not change the SKU to provisioned throughput; PTU requires the SKU name change, not just a higher capacity value. Option D is wrong because changing the model format to 'GPT-4' does not enable provisioned throughput; PTU is a SKU-level setting independent of the model version, and GPT-4 can also be deployed with Standard SKU.

Practice this question →

MCQhard

A research lab wants to use Azure OpenAI to generate synthetic data for training a model. They need to generate a large volume of data quickly and cost-effectively. Which approach should they use?

A.Use the batch API to process requests asynchronously.

B.Fine-tune a model to generate the data locally.

C.Use the streaming API with multiple concurrent connections.

D.Deploy a model on Azure Functions and call it in parallel.

AnswerA

Batch API is designed for high volume with lower cost.

Why this answer

The batch API is designed for high-throughput, asynchronous processing of large volumes of requests, making it ideal for generating synthetic data at scale. It allows the lab to submit many prompts in a single batch, which Azure OpenAI processes efficiently, reducing both cost and time compared to real-time processing.

Exam trap

The trap here is that candidates often confuse the batch API with the streaming API, assuming streaming is faster for volume, but the batch API is specifically designed for high-throughput, cost-effective asynchronous processing, not real-time use.

How to eliminate wrong answers

Option B is wrong because fine-tuning a model does not generate data locally; it adapts a pre-trained model to a specific task, and the data generation still requires API calls or local inference, which is not cost-effective for large volumes. Option C is wrong because the streaming API is designed for real-time, low-latency responses, not for high-throughput batch processing, and managing multiple concurrent connections increases complexity and cost without the efficiency of batching. Option D is wrong because deploying a model on Azure Functions and calling it in parallel introduces overhead from serverless scaling and per-execution costs, which is less cost-effective than the batch API's optimized queuing and processing.

Practice this question →

Multi-Selecthard

A developer is using Azure OpenAI to generate code snippets. The developer needs to ensure that the generated code does not contain security vulnerabilities. Which TWO actions should the developer take? (Choose two.)

Select 2 answers

A.Include examples of secure coding practices in the prompt.

B.Set the max_tokens parameter to a high value to allow longer outputs.

C.Fine-tune the model on a dataset containing examples of insecure code.

D.Use the content filtering feature to block malicious code patterns.

E.Add a system message that instructs the model to never generate insecure code.

AnswersA, D

Providing examples of secure code helps guide the model towards generating secure code.

Why this answer

Option A is correct because including examples of secure coding practices in the prompt (few-shot prompting) directly guides the model to generate code that follows those patterns. This technique leverages in-context learning, where the model uses the provided examples to shape its output, reducing the likelihood of producing insecure code without requiring fine-tuning or external filtering.

Exam trap

The trap here is that candidates often overestimate the effectiveness of system messages (Option E) or content filtering (Option D) for code security, while underestimating the power of few-shot prompting (Option A) to directly influence model behavior through example-based guidance.

Practice this question →

MCQeasy

Refer to the exhibit. You are reviewing the configuration of an Azure OpenAI Service resource. The resource is configured with customer-managed keys for encryption. What is the primary benefit of this configuration?

A.Enhanced control over data encryption keys

B.Simplified deployment process

C.Improved model performance

D.Reduced operational costs

AnswerA

Customer-managed keys give you control over encryption keys, improving security and compliance.

Why this answer

Customer-managed keys (CMK) allow you to control and manage the encryption keys used to protect your data at rest in Azure OpenAI Service. This provides enhanced control over who can access the keys, when they are rotated, and how they are stored, which is critical for meeting compliance and security requirements. The primary benefit is not performance, cost, or deployment simplicity, but rather the ability to enforce your own key lifecycle and access policies.

Exam trap

The trap here is that candidates often confuse customer-managed keys with platform-managed keys, assuming the primary benefit is cost savings or performance gains, when in reality the core advantage is granular control over encryption key governance and compliance.

How to eliminate wrong answers

Option B is wrong because customer-managed keys add complexity to the deployment process (you must create and manage a Key Vault, set permissions, and configure key rotation), not simplify it. Option C is wrong because encryption keys have no impact on model inference speed or accuracy; performance is determined by model size, token limits, and compute resources. Option D is wrong because CMK typically increases operational costs due to the need for additional Key Vault resources, key management overhead, and potential charges for key operations.

Practice this question →

MCQhard

Your team is building an application that uses Azure OpenAI Service to summarize legal documents. You need to ensure that the summaries do not include any personally identifiable information (PII) that might appear in the source documents. Which feature should you configure in the Azure OpenAI Service?

A.Configure rate limiting to reduce processing volume.

B.Enable content filtering with PII detection.

C.Set the system message to instruct the model to exclude PII.

D.Use the 'Add your data' feature to ground the model.

AnswerB

Content filtering can detect and redact PII.

Why this answer

Option B is correct because Azure OpenAI Service's content filtering system includes built-in PII detection capabilities that can automatically identify and redact personally identifiable information from model inputs and outputs. This feature operates at the platform level, ensuring PII is filtered regardless of how the model is prompted, providing a reliable safeguard for sensitive legal documents.

Exam trap

The trap here is that candidates often assume a system message is sufficient for content safety, but Microsoft explicitly tests that content filtering is the only guaranteed mechanism for PII removal, as model-level instructions can be overridden or ignored.

How to eliminate wrong answers

Option A is wrong because rate limiting controls the number of requests processed per time period to manage resource usage and prevent abuse, but it has no mechanism to detect or remove PII from summaries. Option C is wrong because while a system message can instruct the model to exclude PII, it relies entirely on the model's compliance and can be bypassed by adversarial prompts or model hallucinations, offering no guaranteed enforcement. Option D is wrong because the 'Add your data' feature grounds the model on your own documents for retrieval-augmented generation, but it does not include any PII detection or redaction capability—it simply provides additional context without filtering sensitive content.

Practice this question →

MCQeasy

You need to monitor usage and costs of your Azure OpenAI Service deployments. Which Azure tool should you use?

A.Azure Cost Management + Billing

B.Azure Monitor

C.Azure Service Health

D.Azure Advisor

AnswerA

It provides detailed cost analysis and budgeting.

Why this answer

Azure Cost Management + Billing is the correct tool for monitoring usage and costs of Azure OpenAI Service deployments because it provides detailed cost analysis, budget tracking, and usage reports across all Azure services. It allows you to set budgets, create cost alerts, and analyze spending patterns specifically for OpenAI model deployments, including per-model and per-region cost breakdowns.

Exam trap

The trap here is that candidates often confuse Azure Monitor (which tracks performance metrics like token usage and latency) with cost monitoring, but Azure Monitor does not provide billing data or cost analysis, which is the specific requirement in this question.

How to eliminate wrong answers

Option B (Azure Monitor) is wrong because it focuses on performance metrics, logs, and alerts for application health and resource utilization, not on cost tracking or billing data. Option C (Azure Service Health) is wrong because it monitors service-level issues, outages, and planned maintenance across Azure services, not usage or cost metrics. Option D (Azure Advisor) is wrong because it provides best-practice recommendations for optimizing cost, performance, and reliability, but it does not directly monitor or report on actual usage and costs in real time.

Practice this question →

MCQhard

Your organization uses Azure AI Document Intelligence to extract data from invoices. The extraction accuracy for total amounts is low. You have a labeled dataset of 500 invoices. You need to improve the model's accuracy for the 'total amount' field. What should you do?

A.Add additional predefined models for invoice processing.

B.Enable OCR enhancement to improve text recognition.

C.Increase the confidence threshold for the total amount field.

D.Create a custom neural model and train it with the labeled dataset.

AnswerD

Custom neural models can be trained to improve accuracy on specific fields.

Why this answer

Option D is correct because training a custom neural model with labeled data directly targets the field accuracy. Option A is wrong because adding more predefined models doesn't improve field-specific accuracy. Option B is wrong because confidence thresholds affect acceptance, not accuracy.

Option C is wrong because OCR is already used; the issue is extraction.

Practice this question →

MCQeasy

You need to generate an image of a cat wearing a hat using Azure OpenAI. Which model should you use?

A.Whisper

B.DALL-E

C.GPT-4

D.Codex

AnswerB

DALL-E generates images from text descriptions.

Why this answer

DALL-E is the correct model because it is specifically designed by OpenAI for generating images from text descriptions. Azure OpenAI Service hosts DALL-E, enabling you to create images like 'a cat wearing a hat' by providing a natural language prompt. Other models like Whisper, GPT-4, and Codex are optimized for speech recognition, text generation, and code generation respectively, not image synthesis.

Exam trap

The trap here is that candidates may confuse GPT-4's multimodal capabilities (which can analyze images but not generate them) with DALL-E's image generation role, or mistakenly think Whisper or Codex can handle image tasks due to their 'AI' branding.

How to eliminate wrong answers

Option A is wrong because Whisper is a speech-to-text model used for transcribing audio, not for generating images. Option C is wrong because GPT-4 is a large language model focused on text generation and understanding, lacking native image generation capabilities. Option D is wrong because Codex is a model specialized in code generation and completion, not image creation.

Practice this question →

MCQeasy

A developer wants to deploy a custom generative AI model using Azure Machine Learning. Which compute target should they choose for low-latency real-time inference?

A.Local deployment

B.Azure Batch

C.Azure Functions

D.Azure Kubernetes Service (AKS)

AnswerD

AKS is designed for real-time inference with low latency.

Why this answer

Azure Kubernetes Service (AKS) is the correct compute target for low-latency real-time inference because it supports horizontal pod autoscaling, GPU acceleration, and can be configured with a low-latency ingress controller (e.g., NGINX or Azure Application Gateway) to route inference requests directly to model containers. AKS also integrates with Azure Machine Learning's real-time inference endpoint, which uses a gRPC or HTTP-based scoring protocol to achieve sub-100ms response times.

Exam trap

The trap here is that candidates often confuse Azure Functions' serverless convenience with real-time capability, overlooking the cold-start penalty and lack of GPU support, while AKS is the only option that provides the necessary infrastructure for consistent low-latency inference.

How to eliminate wrong answers

Option A is wrong because local deployment (e.g., a local Docker container or Jupyter notebook) is intended for development and testing only, not for production-grade low-latency real-time inference, as it lacks scalability, load balancing, and network-level optimizations. Option B is wrong because Azure Batch is designed for high-throughput, parallel batch processing jobs (e.g., offline scoring of large datasets) and is not optimized for low-latency real-time inference due to its job-queue scheduling overhead and lack of persistent endpoints. Option C is wrong because Azure Functions, while serverless and capable of handling HTTP triggers, has a cold-start latency problem (often 1-10 seconds) and limited GPU support, making it unsuitable for sub-second real-time inference workloads.

Practice this question →

MCQhard

A company is developing a generative AI solution that must process sensitive customer data. They need to ensure that data remains within their Azure tenant and is not used to improve the base model. Which configuration is required in Azure OpenAI?

A.Opt out of abuse monitoring and data logging in the Azure OpenAI Studio.

B.Enable customer-managed keys for encryption.

C.Use a private endpoint to restrict access to the service.

D.Configure data residency to keep data in a specific region.

AnswerA

Opting out prevents Microsoft from using your data for model improvement.

Why this answer

Option A is correct because opting out of abuse monitoring and data logging in Azure OpenAI Studio ensures that sensitive customer data is not sent to Microsoft for review or used to improve the base model. This configuration is specifically designed for customers who need to process sensitive data while maintaining data residency within their Azure tenant, as it disables the default logging and human review processes that could expose data outside the tenant.

Exam trap

The trap here is that candidates confuse network-level security controls (private endpoints) or encryption (CMK) with data governance policies that control how Microsoft uses data for model training, leading them to select options that address access or storage but not the specific requirement to prevent data from being used to improve the base model.

How to eliminate wrong answers

Option B is wrong because customer-managed keys (CMK) only control encryption at rest and do not prevent data from being used for model improvement or abuse monitoring; CMK is about key ownership, not data governance for training. Option C is wrong because a private endpoint restricts network access to the service but does not affect how Microsoft processes or stores data for model improvement; it only secures the connection path, not the data usage policy. Option D is wrong because data residency configuration ensures data is stored in a specific geographic region but does not prevent Microsoft from using that data for abuse monitoring or base model training; residency is about location, not usage rights.

Practice this question →

MCQhard

You run the Azure CLI command shown in the exhibit to create an online endpoint for a generative AI model. The deployment fails because the selected VM instance type is not available in the East US region. Which action should you take to resolve the issue?

A.Increase the instance count to 2

B.Specify a different VM type or region that supports Standard_NC6s_v3

C.Use a batch endpoint instead of online endpoint

D.Change --compute-type to CPU

AnswerB

Choosing an available VM type or region resolves the issue.

Why this answer

The deployment failed because the Standard_NC6s_v3 VM instance type is not available in the East US region. The correct action is to either choose a different VM type that is available in East US or deploy to a different region that supports Standard_NC6s_v3. This directly addresses the root cause of the failure, as Azure Machine Learning online endpoints require the selected VM SKU to be available in the target region.

Exam trap

The trap here is that candidates may think increasing instance count or switching to a batch endpoint will bypass the regional SKU limitation, but neither changes the underlying VM type or region, so the deployment will still fail.

How to eliminate wrong answers

Option A is wrong because increasing the instance count does not change the VM type or region; it only scales out the number of instances, which does not resolve the unavailability of the VM SKU. Option C is wrong because switching to a batch endpoint does not address the VM availability issue; batch endpoints also require compatible compute resources and are designed for asynchronous, large-scale inference, not for fixing regional SKU unavailability. Option D is wrong because changing --compute-type to CPU would not help if the model requires GPU acceleration (as implied by the NC-series VM), and it does not solve the regional availability problem for the specified VM type.

Practice this question →

MCQhard

You are building a generative AI application using Azure OpenAI Service. The application must provide factual answers based on your company's internal knowledge base. You need to minimize the risk of the model generating incorrect information (hallucinations). Which approach should you take?

A.Implement Retrieval-Augmented Generation (RAG) with Azure AI Search.

B.Fine-tune the model on your company's documents.

C.Use few-shot prompting with examples of correct answers.

D.Set the max_tokens parameter to a low value.

AnswerA

RAG retrieves relevant documents and uses them as context, reducing hallucinations.

Why this answer

Option A is correct because Retrieval-Augmented Generation (RAG) with Azure AI Search grounds the model's responses in your company's internal knowledge base by retrieving relevant documents in real time and injecting them into the prompt. This reduces hallucinations by ensuring the model generates answers based on retrieved facts rather than relying solely on its parametric memory. Azure AI Search provides vector and hybrid search capabilities that efficiently index and query your documents, making RAG the most effective approach for factual accuracy.

Exam trap

Cisco often tests the misconception that fine-tuning (Option B) is the best way to ground a model in proprietary data, but the trap is that fine-tuning does not provide dynamic, query-specific retrieval and can still produce hallucinations, whereas RAG explicitly forces the model to use retrieved facts.

How to eliminate wrong answers

Option B is wrong because fine-tuning adjusts the model's weights on your documents, which can lead to overfitting and does not guarantee that the model will not hallucinate; it still relies on its internal knowledge and may generate plausible-sounding but incorrect information when queried outside the fine-tuned distribution. Option C is wrong because few-shot prompting provides examples but does not ground the model in your specific knowledge base; the model can still hallucinate if the examples do not cover the exact query context or if it extrapolates incorrectly. Option D is wrong because setting max_tokens to a low value only truncates the output length and does not improve factual accuracy; it may even cause incomplete or misleading answers without addressing the root cause of hallucinations.

Practice this question →

MCQmedium

You are developing a chat application that uses Azure OpenAI Service to answer customer queries. You need to ensure that the model does not generate responses containing internal company policies or confidential information. Which approach should you use?

A.Use Azure AI Search with index filtering to exclude documents with confidential info.

B.Fine-tune the model on a dataset that excludes confidential information.

C.Set the system message to instruct the model not to include confidential information.

D.Configure Azure AI Content Safety with custom categories for confidential terms.

AnswerC

System messages guide model behavior effectively for content restrictions.

Why this answer

Option B is correct because system messages provide instructions to the model on how to behave, including restricting certain topics. Option A is wrong because Azure AI Content Safety filters harmful content but does not prevent model from generating confidential info. Option C is wrong because fine-tuning may not guarantee avoidance of specific topics.

Option D is wrong because embeddings are for retrieval, not for restricting model output.

Practice this question →

Multi-Selectmedium

You are using Azure OpenAI Service to generate text. You need to reduce the likelihood of the model generating repetitive sequences. Which TWO parameters should you adjust?

Select 2 answers

A.frequency_penalty

B.max_tokens

C.top_p

D.temperature

E.presence_penalty

AnswersA, E

Frequency penalty reduces repetition by penalizing frequent tokens.

Why this answer

Frequency penalty reduces the likelihood of the model repeating the same tokens or phrases by subtracting a fixed penalty from the log-probability of tokens that have already appeared in the generated text. This directly discourages repetitive sequences, making it one of the two correct parameters to adjust.

Exam trap

The trap here is that candidates often confuse temperature and top_p with repetition control, but these parameters affect randomness and diversity of token selection, not the direct penalization of repeated tokens.

Practice this question →

MCQmedium

You are testing an Azure OpenAI chat completion. The response shown in the exhibit is returned. What does the finish_reason of 'content_filter' indicate?

A.The model's response was blocked by the content filter.

B.There was a system error during processing.

C.The user's prompt was flagged by the content filter.

D.The model refused to answer due to insufficient data.

AnswerA

The finish_reason indicates the response was filtered.

Why this answer

The 'content_filter' finish_reason indicates that the Azure OpenAI content filtering system detected that the model's generated response violated one of the configured content policies (e.g., hate, violence, self-harm, sexual content). The response was therefore blocked before being returned to the user, and the finish_reason explicitly signals this filtering action rather than a normal completion or a stop due to token limits.

Exam trap

The trap here is that candidates often confuse 'content_filter' with prompt rejection, but the finish_reason specifically indicates the model's output was blocked, not the user's input.

How to eliminate wrong answers

Option B is wrong because a system error during processing would return a different finish_reason such as 'error' or an HTTP 500 status, not 'content_filter'. Option C is wrong because the content_filter finish_reason applies to the model's response, not the user's prompt; if the prompt were flagged, the API would typically return a 400 error with a content filter violation message before any generation occurs. Option D is wrong because the model refusing to answer due to insufficient data would be indicated by a finish_reason of 'stop' (if the model generated a refusal message) or by a specific response text, not by the 'content_filter' reason.

Practice this question →

Matchingmedium

Match each Azure Cognitive Search skill to its capability.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Extract text from images

Identify entities like people or organizations

Extract key phrases from text

Detect language of text

Determine sentiment of text

Why these pairings

These are built-in cognitive skills in Azure Cognitive Search.

Practice this question →

Multi-Selectmedium

You are designing a generative AI solution that uses Azure OpenAI Service. The solution must not generate responses that include personally identifiable information (PII). Which TWO configurations should you implement? (Choose two.)

Select 2 answers

A.Use Azure AI Content Safety to filter PII in the output.

B.Enable diagnostic logging to audit all generated responses.

C.Set data retention policies to delete prompts and completions after 30 days.

D.Configure the system message to instruct the model not to generate PII.

E.Fine-tune the model on a dataset that excludes PII.

AnswersA, D

Content Safety can detect and block PII.

Why this answer

A and D are correct. Using system messages to instruct the model to avoid PII is a direct approach. Azure AI Content Safety can detect and filter PII in outputs.

B is wrong because fine-tuning on non-PII data does not guarantee avoidance. C is wrong because data retention policies affect storage, not generation. E is wrong because enabling logging does not prevent generation of PII.

Practice this question →

MCQhard

You are using Azure OpenAI Service to generate product descriptions. You notice that the model occasionally outputs descriptions that contain factual inaccuracies about product specifications. You want to reduce these hallucinations without changing the model. What should you do?

A.Increase the frequency_penalty parameter.

B.Decrease the temperature parameter.

C.Increase the max_tokens parameter.

D.Provide the product specifications in the prompt and use the system message to instruct the model to base answers on them.

AnswerD

Grounding the model with factual data in the prompt reduces hallucinations by providing accurate context.

Why this answer

Option D is correct because providing the product specifications directly in the prompt and using the system message to instruct the model to base its answers on them grounds the generation in factual data, reducing hallucinations. This technique, known as 'grounding' or 'retrieval-augmented generation' (RAG), does not modify the model itself but constrains its output to the provided context, which is the only way to reduce factual inaccuracies without changing model parameters.

Exam trap

The trap here is that candidates often confuse hyperparameter tuning (like temperature or frequency_penalty) with prompt engineering techniques, mistakenly believing that adjusting randomness or repetition penalties can fix factual hallucinations, when only providing the correct context in the prompt can do so without model changes.

How to eliminate wrong answers

Option A is wrong because increasing frequency_penalty reduces repetition of tokens by penalizing tokens that have already appeared, which does not address factual accuracy or hallucinations. Option B is wrong because decreasing temperature makes the model more deterministic and less creative, but it does not prevent the model from generating plausible-sounding but factually incorrect statements about product specifications. Option C is wrong because increasing max_tokens only allows longer responses, which can actually increase the chance of hallucinations by giving the model more opportunity to generate unsupported content.

Practice this question →

MCQhard

You are deploying an Azure OpenAI model for a healthcare application. You need to ensure that the model does not generate medical advice and that all responses include a disclaimer. Which configuration should you use?

A.Ground the model with your own medical documents.

B.Set max_tokens to 50 to limit response length.

C.Use Azure AI Content Safety to filter medical terms.

D.Configure a system message with instructions and enable content filtering.

AnswerD

System message can instruct disclaimer; content filtering blocks prohibited content.

Why this answer

Option D is correct because configuring a system message with explicit instructions (e.g., 'Do not provide medical advice; always include a disclaimer') combined with Azure AI Content Safety's content filtering allows you to enforce behavioral guardrails and block harmful outputs at the application layer. The system message sets the model's behavior, while content filtering provides a secondary safety net to catch policy violations, ensuring compliance in a regulated healthcare environment.

Exam trap

The trap here is that candidates confuse content filtering with behavioral control, assuming Azure AI Content Safety can enforce custom rules like 'do not generate medical advice' when it only filters predefined harmful categories, not domain-specific instructions.

How to eliminate wrong answers

Option A is wrong because grounding the model with medical documents (e.g., via Azure OpenAI on your data) does not prevent the model from generating medical advice; it only improves factual accuracy by referencing your data, but the model can still produce advice or omit disclaimers. Option B is wrong because setting max_tokens to 50 limits response length but does not control the content or ensure a disclaimer is included; the model could still generate medical advice within that token limit. Option C is wrong because Azure AI Content Safety filters harmful content based on predefined categories (e.g., hate, violence), but it does not have a built-in 'medical terms' filter; it cannot enforce a custom rule like 'do not generate medical advice' or 'include a disclaimer'.

Practice this question →

MCQmedium

A company uses Azure OpenAI to generate product descriptions. They want to ensure that the descriptions are consistent in style and tone. Which strategy should they use?

A.Fine-tune the model on a dataset of product descriptions.

B.Provide a few examples of desired style in the prompt (few-shot learning).

C.Set max_tokens to a small value to limit output length.

D.Increase the temperature to 1.0 for more creativity.

AnswerB

Examples guide the model to mimic the style.

Why this answer

Few-shot learning (option B) is the correct strategy because it directly controls style and tone by providing examples of desired output within the prompt. This leverages the model's in-context learning ability without modifying the underlying model weights, making it ideal for enforcing consistency without the cost and complexity of fine-tuning.

Exam trap

The trap here is that candidates often confuse fine-tuning (option A) as the only way to enforce style, overlooking that few-shot learning is a lighter, more flexible method that achieves the same goal without retraining.

How to eliminate wrong answers

Option A is wrong because fine-tuning requires a large, curated dataset and retraining the model, which is overkill for simple style consistency and introduces risks of catastrophic forgetting or overfitting to narrow patterns. Option C is wrong because setting max_tokens to a small value only truncates the output length; it does not influence the style, tone, or content of the generated text. Option D is wrong because increasing temperature to 1.0 increases randomness and creativity, which would actually reduce consistency in style and tone, not enforce it.

Practice this question →

MCQmedium

You are deploying a generative AI solution that uses DALL-E to generate images. The application must ensure that generated images do not contain violent content. Which feature should you enable?

A.Configure Azure AI Content Safety to moderate images

B.Use grounding with Azure AI Search

C.Fine-tune the DALL-E model

D.Enable content filtering in DALL-E

AnswerA

Azure AI Content Safety can moderate image content.

Why this answer

Azure AI Content Safety is a dedicated service for detecting and moderating harmful content, including violence, in images and text. By integrating this service into your DALL-E image generation pipeline, you can scan generated images for violent content before they are delivered to users, ensuring compliance with safety policies. This is the correct approach because Azure AI Content Safety provides pre-built, customizable content moderation models specifically designed for this purpose.

Exam trap

The trap here is that candidates may assume DALL-E has a built-in content filter that can be toggled on, but in reality, Azure OpenAI Service requires you to use an external service like Azure AI Content Safety for post-generation moderation.

How to eliminate wrong answers

Option B is wrong because grounding with Azure AI Search is used to connect AI models to specific data sources for retrieval-augmented generation (RAG), not for content moderation or filtering violent content. Option C is wrong because fine-tuning a DALL-E model is not supported by Azure OpenAI Service; DALL-E models are pre-trained and cannot be fine-tuned, and even if possible, fine-tuning would not guarantee the removal of violent content from outputs. Option D is wrong because DALL-E itself does not have a built-in content filtering feature that can be enabled; content filtering must be implemented externally using Azure AI Content Safety or similar services.

Practice this question →

MCQmedium

You are building a solution to generate product descriptions using Azure OpenAI Service. You need to ensure that the output adheres to a specific tone (professional, friendly) and length (50-100 words). Which parameter should you adjust?

A.Configure max_tokens to limit response length.

B.Modify the top_p parameter.

C.Set the system message with instructions about tone and length.

D.Adjust the temperature parameter.

AnswerC

System message effectively guides the model's output style.

Why this answer

Option B is correct because the system message defines the assistant's behavior and output characteristics. Option A is wrong because temperature affects creativity, not tone or length. Option C is wrong because top_p is for sampling diversity.

Option D is wrong because max_tokens sets a hard limit but doesn't enforce a minimum or specific style.

Practice this question →

MCQmedium

You are developing a customer support chatbot using Azure OpenAI Service. The chatbot must only answer questions related to the company's product catalog and policies. You want to minimize the risk of the chatbot generating harmful or off-topic responses. Which approach should you use?

A.Set the max_tokens parameter to 100.

B.Use a system message that instructs the model to only answer product-related questions.

C.Set the temperature parameter to 0.

D.Set the top_p parameter to 0.1.

AnswerB

System messages define the assistant's behavior and constraints.

Why this answer

Option B is correct because a system message sets the foundational behavior of the model by providing high-level instructions that guide all subsequent responses. By explicitly instructing the model to only answer product-related questions, you establish a clear boundary that minimizes off-topic or harmful outputs. This approach leverages the model's instruction-following capability, which is more effective than parameter tuning alone for content restriction.

Exam trap

The trap here is that candidates often confuse content filtering parameters (temperature, top_p, max_tokens) with instruction-based control, assuming that reducing randomness or output length can prevent off-topic responses, when in fact only explicit system-level instructions can enforce domain constraints.

How to eliminate wrong answers

Option A is wrong because setting max_tokens to 100 only limits the length of the response, not the content or topic; the model could still generate harmful or off-topic text within that token limit. Option C is wrong because setting temperature to 0 makes the model deterministic and reduces randomness, but it does not prevent the model from generating off-topic or harmful content if the prompt or context leads it there. Option D is wrong because setting top_p to 0.1 narrows the probability distribution for token selection, which reduces diversity but does not constrain the model to a specific domain or topic.

Practice this question →

MCQhard

You are using Azure OpenAI Service to generate marketing copy. You have a requirement to reduce the cost of inference without significantly impacting output quality. Which parameter should you adjust?

A.Decrease the max_tokens.

B.Increase the frequency_penalty.

C.Decrease the temperature.

D.Increase the top_p.

AnswerA

Lower max_tokens limits the output length, reducing tokens consumed and cost.

Why this answer

Decreasing max_tokens directly reduces the number of tokens generated per API call, which lowers the compute cost because Azure OpenAI charges per token (both input and output). Since the requirement is to reduce inference cost without significantly impacting output quality, reducing max_tokens is the most direct and effective parameter. It caps the response length, preventing unnecessarily verbose output while preserving the model's ability to generate high-quality, concise copy.

Exam trap

Microsoft often tests the misconception that temperature or top_p are cost-control parameters, when in fact they only affect output diversity and randomness, not token count or pricing; candidates mistakenly think lowering temperature reduces cost because it 'simplifies' output, but the real cost driver is token length.

How to eliminate wrong answers

Option B is wrong because increasing frequency_penalty reduces the likelihood of repeating the same phrases, which can actually increase token usage (and thus cost) by forcing the model to generate more varied, longer responses to avoid repetition. Option C is wrong because decreasing temperature reduces randomness and makes output more deterministic, but it does not directly affect the number of tokens generated or the cost per call; it changes the sampling behavior, not the length. Option D is wrong because increasing top_p (nucleus sampling) expands the pool of candidate tokens considered, which can lead to longer or more diverse outputs, potentially increasing token count and cost, not reducing it.

Practice this question →

Multi-Selecteasy

Which TWO features of Azure AI Content Safety can help you moderate user-generated content in a social media application?

Select 2 answers

A.Self-harm content detection.

B.Hate speech severity detection.

C.PII redaction.

D.Groundedness detection.

E.Prompt injection detection.

AnswersA, B

Detects self-harm content.

Why this answer

Self-harm content detection (A) is a feature of Azure AI Content Safety that specifically identifies text or images related to self-harm, which is a critical category for moderating user-generated content in social media to prevent harm and comply with safety policies. Hate speech severity detection (B) is another core feature that classifies hate speech into severity levels (e.g., low, medium, high), enabling nuanced moderation of offensive content.

Exam trap

The trap here is that candidates may confuse Azure AI Content Safety's features with those of other Azure AI services (like Azure AI Language for PII or Azure OpenAI for prompt injection), leading them to select options that are technically valid in Azure but not part of Content Safety's core moderation capabilities.

Practice this question →

Multi-Selecthard

You are deploying a generative AI model using Azure AI Foundry. The model must be accessible only from within a specific virtual network. Additionally, you need to monitor all API calls for auditing. Which two configurations are required? (Choose two.)

Select 2 answers

A.Assign a managed identity to the model deployment.

B.Configure CORS to allow only the VNet's domain.

C.Enable public network access from selected IP addresses.

D.Enable diagnostic settings to send logs to a Log Analytics workspace.

E.Disable public network access and configure a private endpoint.

AnswersD, E

Logs enable auditing of all API calls.

Why this answer

Option D is correct because enabling diagnostic settings to send logs to a Log Analytics workspace allows you to capture and audit all API calls made to the model deployment. This is essential for monitoring, security auditing, and compliance, as it records detailed telemetry such as request timestamps, caller IPs, and operation names. Option E is correct because disabling public network access and configuring a private endpoint ensures that the model is only accessible from within the specified virtual network, meeting the isolation requirement.

Exam trap

The trap here is that candidates often confuse network-level access controls (like IP whitelisting or CORS) with true VNet isolation via private endpoints, and they overlook that diagnostic settings are the standard Azure mechanism for auditing API calls, not managed identities or CORS.

Practice this question →

MCQmedium

You are developing a solution that uses Azure Document Intelligence to extract data from invoices and then uses Azure OpenAI to summarize the extracted data. The solution occasionally produces summaries that omit key fields like the invoice total. What should you do to improve accuracy?

A.Set temperature to 0 to make the output more deterministic

B.Use a larger model like GPT-4 instead of GPT-3.5

C.Increase the max_tokens parameter

D.Define a structured prompt that explicitly requests each field and provide examples

AnswerD

Structured prompt with explicit requests improves adherence to required fields.

Why this answer

Option D is correct because the issue is that the summarization prompt lacks explicit instructions for which fields to include. By defining a structured prompt that explicitly requests each key field (e.g., invoice total, date, vendor) and providing examples, you guide the Azure OpenAI model to consistently extract and include those fields in the summary, reducing omission errors. This approach leverages prompt engineering to improve output reliability without changing model parameters or size.

Exam trap

The trap here is that candidates often assume that model size or parameter tuning (temperature, max_tokens) is the primary fix for content omission, when in fact prompt engineering—specifically structured prompts with explicit field requests—is the correct solution for ensuring specific data is included in the output.

How to eliminate wrong answers

Option A is wrong because setting temperature to 0 makes the output more deterministic but does not force the model to include specific fields; it only reduces randomness in token selection, not the likelihood of omitting requested content. Option B is wrong because using a larger model like GPT-4 instead of GPT-3.5 improves general reasoning but does not guarantee that key fields are included unless the prompt explicitly requests them; the omission is a prompt design issue, not a model capability issue. Option C is wrong because increasing max_tokens only allows longer responses but does not influence which content the model chooses to include; the model may still omit fields even with a larger token budget.

Practice this question →

Multi-Selectmedium

You are designing a generative AI solution using Azure OpenAI Service with your own data indexed in Azure AI Search. Which THREE components are essential for the retrieval-augmented generation (RAG) pattern?

Select 3 answers

A.Data ingestion pipeline to Azure AI Search

B.Azure AI Search index

C.Azure Functions for orchestration

D.Azure API Management for rate limiting

E.Azure OpenAI model

AnswersA, B, E

Data must be ingested into the search index.

Why this answer

Option A is correct because a data ingestion pipeline is essential to load and index your data into Azure AI Search, enabling the retrieval step in RAG. Without this pipeline, the search index would have no data to query, breaking the retrieval-augmented generation pattern.

Exam trap

The trap here is that candidates often confuse optional production components (like Azure Functions for orchestration or API Management for rate limiting) with the core, mandatory components of the RAG pattern, which are the data source, search index, and the LLM model.

Practice this question →

MCQmedium

You are developing a generative AI solution that uses Azure OpenAI Service. The solution must generate product descriptions in multiple languages. You need to ensure that the model consistently follows specific formatting rules, such as including a bullet list of features. Which strategy should you use?

A.Fine-tune the model with a dataset containing formatted examples.

B.Set a system message with explicit formatting instructions.

C.Increase the max_tokens parameter to allow longer outputs.

D.Adjust the temperature parameter to a lower value.

AnswerB

System messages define behavior and formatting guidelines for the model.

Why this answer

Option B is correct because system messages in Azure OpenAI Service allow you to set persistent instructions that guide the model's behavior across the entire conversation. By including explicit formatting rules—such as requiring a bullet list of features—in the system message, you enforce consistent output structure without retraining the model. This approach is efficient, cost-effective, and directly leverages the API's design for controlling response format.

Exam trap

Microsoft often tests the misconception that fine-tuning is the only way to enforce output structure, when in fact system messages provide a lightweight, zero-shot alternative for formatting control.

How to eliminate wrong answers

Option A is wrong because fine-tuning requires a large, curated dataset and significant compute resources; it is overkill for simple formatting rules and introduces risk of overfitting or losing generality, whereas a system message achieves the same goal with zero training overhead. Option C is wrong because increasing max_tokens only extends the maximum length of the response, not the structure or format; it does not enforce bullet lists or any specific formatting rules. Option D is wrong because lowering the temperature parameter reduces randomness and makes outputs more deterministic, but it does not impose explicit formatting constraints like bullet lists; it controls creativity, not structure.

Practice this question →

MCQeasy

You need to generate a poem using Azure OpenAI. The poem should be about nature and have a cheerful tone. Which parameter should you adjust to influence the tone?

A.top_p

B.temperature

C.max_tokens

D.system message

AnswerD

System message guides the model's overall behavior and tone.

Why this answer

The system message (D) is the correct parameter to influence the tone of a generated poem because it acts as a high-level instruction that sets the behavior, persona, and style of the model. By including a directive like 'You are a cheerful poet writing about nature,' you directly control the tone without altering randomness or output length.

Exam trap

The trap here is that candidates confuse parameters that control randomness (temperature, top_p) with those that control instruction-following and style (system message), leading them to incorrectly select temperature as the primary tone influencer.

How to eliminate wrong answers

Option A is wrong because top_p controls nucleus sampling—the cumulative probability threshold for token selection—and does not directly set tone; it affects diversity of output, not style. Option B is wrong because temperature adjusts the randomness of token probabilities (higher values increase creativity, lower values make output more deterministic), but it does not specify a cheerful tone; it only influences how likely the model is to choose less probable tokens. Option C is wrong because max_tokens limits the length of the generated response and has no impact on the emotional tone or style of the poem.

Practice this question →

MCQeasy

You are developing a generative AI application that uses Azure OpenAI Service. You want to ensure that the application does not generate offensive content. Which Azure service should you use?

A.Azure AI Bot Service

B.Azure AI Content Safety

C.Azure AI Language

D.Azure AI Search

AnswerB

Azure AI Content Safety is designed to detect and filter offensive or harmful content.

Why this answer

Option B is correct because Azure AI Content Safety provides content filtering and moderation for harmful content. Option A is wrong because Azure AI Language is for NLP tasks, not content safety. Option C is wrong because Azure AI Search is for indexing.

Option D is wrong because Azure AI Bot Service is for building chatbots, not content safety.

Practice this question →

MCQmedium

A company uses Azure OpenAI Service to generate product descriptions. They notice that the descriptions sometimes contain factually incorrect information. Which strategy should they use to reduce hallucinations?

A.Increase the temperature parameter to 1.0.

B.Implement Retrieval-Augmented Generation (RAG) by grounding prompts with a knowledge base.

C.Reduce the max_tokens parameter to limit output length.

D.Add a system message instructing the model to be more careful.

AnswerB

RAG provides factual context from a trusted source, reducing hallucinations.

Why this answer

Option B is correct because Retrieval-Augmented Generation (RAG) grounds the model's output in a trusted, external knowledge base, providing factual context that directly reduces hallucinations. By retrieving relevant documents and injecting them into the prompt, the model generates responses based on verified information rather than relying solely on its parametric memory, which is the primary cause of factual inaccuracies in Azure OpenAI Service.

Exam trap

The trap here is that candidates often confuse hyperparameter tuning (temperature, max_tokens) or prompt engineering (system messages) as solutions for factual accuracy, when in fact only grounding with external data (RAG) directly addresses the hallucination problem by providing a verifiable source of truth.

How to eliminate wrong answers

Option A is wrong because increasing the temperature parameter to 1.0 increases randomness and creativity in the output, which actually exacerbates hallucinations by encouraging the model to generate less predictable and potentially more fabricated content. Option C is wrong because reducing max_tokens only truncates the output length; it does not address the root cause of factual inaccuracies and may even cut off critical context or reasoning. Option D is wrong because adding a system message to 'be more careful' is a vague instruction that the model cannot reliably interpret to correct factual errors; it lacks the concrete, grounded data source that RAG provides.

Practice this question →

MCQmedium

Your organization uses Azure OpenAI Service to generate code snippets. You want to log all user prompts and model responses for auditing purposes. What should you configure?

A.Use Azure API Management to log requests and responses.

B.Store logs in Azure Key Vault.

C.Enable diagnostic settings in Azure OpenAI Service to send logs to a Log Analytics workspace.

D.Use Azure AI Search to index the prompts and responses.

AnswerC

Diagnostic settings capture API call logs including prompts and completions for auditing.

Why this answer

Option A is correct because Azure OpenAI Service supports content filtering and logging via diagnostic settings; you can send logs to Azure Monitor or storage. Option B is wrong because Azure AI Search is not for logging. Option C is wrong because Azure API Management proxies but does not natively log model responses.

Option D is wrong because Azure Key Vault is for secrets management.

Practice this question →

MCQhard

Your organization uses Azure OpenAI Service with a data source configured as 'Azure OpenAI on your data'. You notice that the responses include outdated information even though the underlying data source has been updated. What is the most likely cause?

A.The model is using a cached version of the prompt

B.The index in Azure Cognitive Search has not been refreshed

C.The data source is configured to sync only daily

D.The Azure CDN is caching the responses

AnswerB

The search index must be re-indexed to reflect data changes.

Why this answer

When using Azure OpenAI Service with 'Azure OpenAI on your data', the responses are generated by querying an Azure Cognitive Search index that contains your data. If the underlying data source has been updated but the responses still include outdated information, the most likely cause is that the index in Azure Cognitive Search has not been refreshed to reflect those updates. The model itself does not store or cache the data; it relies on the index at query time, so an outdated index directly leads to outdated responses.

Exam trap

The trap here is that candidates may confuse the model's lack of awareness of data updates with caching mechanisms (like CDN or prompt caching), when the real issue is the decoupled indexing pipeline in Azure Cognitive Search that requires explicit refresh.

How to eliminate wrong answers

Option A is wrong because the model does not cache the prompt; each request is processed independently, and caching would not cause outdated information from the data source—it would only affect repeated identical prompts. Option C is wrong because while a sync schedule could cause delays, the question states the data source 'has been updated' and the responses are outdated, implying the index is not refreshed regardless of schedule; the default sync behavior is not the core issue. Option D is wrong because Azure CDN is used for static content delivery, not for caching Azure OpenAI responses, which are dynamic and not routed through CDN.

Practice this question →

MCQeasy

You are developing a generative AI solution that uses Azure OpenAI Service. You need to control the creativity of the generated responses. Which parameter should you adjust?

A.top_p

B.max_tokens

C.temperature

D.frequency_penalty

AnswerC

Temperature controls the randomness and creativity of the output.

Why this answer

Option B is correct because the temperature parameter controls the randomness of the output. Option A is wrong because max_tokens controls output length. Option C is wrong because top_p is another randomness parameter, but temperature is more direct.

Option D is wrong because frequency_penalty reduces repetition.

Practice this question →

MCQeasy

You are building a generative AI solution using Azure OpenAI Service. You want to deploy the model to a specific Azure region to minimize latency for users in Europe. Which deployment parameter must you set?

A.The model version (e.g., 0613).

B.The temperature parameter.

C.The Azure region of the Azure OpenAI resource.

D.The deployment name.

AnswerC

Region determines physical location.

Why this answer

Option C is correct because the Azure region of the Azure OpenAI resource directly determines the physical data center location where the model is deployed. By selecting a region in or near Europe (e.g., France Central, Sweden Central, UK South), you minimize network round-trip latency for European users. The region is set at the resource creation level and cannot be changed after deployment.

Exam trap

The trap here is that candidates confuse deployment parameters (region, capacity, model version) with inference parameters (temperature, max tokens), leading them to select temperature or model version as the answer for minimizing latency.

How to eliminate wrong answers

Option A is wrong because the model version (e.g., 0613) specifies which iteration of the GPT model to use, not the geographic deployment location; it affects model behavior and capabilities, not latency. Option B is wrong because the temperature parameter is a runtime inference setting that controls randomness in output generation, not a deployment parameter that influences where the model runs. Option D is wrong because the deployment name is a user-defined label for the model endpoint (e.g., 'gpt-35-turbo-deployment') and has no impact on the Azure region or network latency.

Practice this question →

Multi-Selecteasy

You are building a generative AI application using Azure OpenAI Service. You need to ensure that the application handles user data securely. Which TWO practices should you implement?

Select 2 answers

A.Disable data logging in the Azure OpenAI Service resource

B.Store prompts and completions in Azure Blob Storage

C.Use HTTPS for all API calls

D.Use system messages to instruct the model not to store data

E.Configure content filtering to block sensitive data

AnswersA, C

Disabling logging prevents storage of user data.

Why this answer

Option A is correct because disabling data logging in Azure OpenAI Service ensures that prompts and completions are not stored by the service for monitoring or abuse detection. This is a key security practice for handling sensitive user data, as it prevents inadvertent retention of confidential information. By default, Azure OpenAI may log data for operational purposes, so explicitly disabling logging is necessary to meet data privacy requirements.

Exam trap

The trap here is that candidates confuse content filtering (which blocks harmful outputs) with data security controls (which prevent data retention), leading them to select Option E instead of recognizing that disabling logging is the direct method to stop data storage.

Practice this question →

MCQeasy

You are designing a solution that uses Azure AI Document Intelligence to extract data from invoices. The solution must classify invoices by vendor and extract line items. Which prebuilt model should you use?

A.Prebuilt invoice model

B.Custom extraction model

C.Prebuilt receipt model

D.Prebuilt layout model

AnswerA

Invoice model extracts vendor and line items.

Why this answer

The Prebuilt invoice model (Option A) is specifically designed to extract common fields from invoices, including vendor details and line items, without requiring custom training. This model is optimized for invoice documents and provides out-of-the-box extraction of structured data such as vendor name, invoice date, and line-item descriptions, quantities, and amounts.

Exam trap

The trap here is that candidates often confuse the Prebuilt layout model with the Prebuilt invoice model, assuming layout extraction is sufficient for invoice data, but layout only provides raw text and table positions without the semantic understanding needed for vendor classification and line-item extraction.

How to eliminate wrong answers

Option B is wrong because a custom extraction model requires labeled training data and is used when prebuilt models do not meet specific document needs, but here the prebuilt invoice model already covers vendor classification and line-item extraction. Option C is wrong because the Prebuilt receipt model is designed for receipts (e.g., from stores or restaurants), not invoices, and does not extract vendor classification or line-item details in the same structured format. Option D is wrong because the Prebuilt layout model extracts text, tables, and selection marks but does not perform semantic classification or vendor-specific field extraction; it lacks the prebuilt understanding of invoice-specific fields.

Practice this question →

MCQhard

Refer to the exhibit. You are configuring an Azure OpenAI Service deployment for document summarization. The current parameters produce summaries that are often too verbose. You need to make the summaries more concise while maintaining factual accuracy. Which parameter change should you make?

A.Increase top_p to 1.0

B.Increase frequency_penalty to 0.5

C.Decrease max_tokens to 100

D.Increase temperature to 0.7

AnswerC

Reducing max_tokens limits output length, making summaries more concise.

Why this answer

Option C is correct because decreasing max_tokens to 100 directly limits the maximum length of the generated summary, forcing the model to produce shorter output. This addresses the verbosity issue without altering the model's factual accuracy, as max_tokens controls output length, not content selection or creativity.

Exam trap

Microsoft often tests the distinction between parameters that control output length (max_tokens) versus those that control creativity or diversity (temperature, top_p, frequency_penalty), leading candidates to mistakenly adjust the latter when the issue is simply excessive length.

How to eliminate wrong answers

Option A is wrong because increasing top_p to 1.0 makes the model consider a wider set of possible tokens, which can increase diversity and potentially lead to even more verbose or less focused summaries. Option B is wrong because increasing frequency_penalty to 0.5 penalizes tokens that have already appeared, reducing repetition but not directly controlling summary length; it may even cause the model to use more unique words, increasing verbosity. Option D is wrong because increasing temperature to 0.7 increases randomness in token selection, which can produce more creative but less consistent summaries, potentially harming factual accuracy and not reliably reducing length.

Practice this question →