AI-900 | Chapter 81 of 100 | Objective 5.3

System Messages and Grounding Prompts

This chapter covers system messages and grounding prompts in Azure OpenAI Service, which are critical for controlling generative AI behavior. For the AI-900 exam, this topic appears in approximately 10-15% of questions under Objective 5.3, focusing on how to guide model outputs and reduce hallucinations. Understanding the distinction between system messages (persistent instructions) and grounding prompts (contextual data) is essential for designing safe and effective AI solutions. You will learn the exact syntax, best practices, and common pitfalls tested on the exam.

Updated May 31, 2026

System Messages as the Constitution for AI

Imagine you are the CEO of a large corporation and you hire a new executive assistant. On their first day, you hand them a written document: the 'Company Constitution.' This document explicitly states: 'You are an executive assistant. Your role is to manage my calendar, screen calls, and draft emails. You must never make financial decisions, never share internal data with outsiders, and always respond in a professional tone. When you don't know something, say so, and ask for clarification.' This constitution is the system message. It sets the assistant's identity, boundaries, and behavior rules. Now, when you give a specific task like 'Schedule a meeting with the marketing team,' the assistant interprets that request through the lens of the constitution. The constitution is not the task; it is the persistent framework that shapes every response. If you later give a grounding prompt like 'The meeting is about Q3 budget, and use the attached spreadsheet for data,' that is like handing the assistant a specific reference document for that task. The grounding prompt provides context and facts for the current interaction, but the system message remains the unchanging set of rules. Without the constitution, the assistant might act on the task in unpredictable ways—maybe scheduling a meeting without checking your calendar, or using an informal tone. The system message ensures consistency and safety across all interactions.

How It Actually Works

What Are System Messages and Grounding Prompts?

System messages and grounding prompts are two mechanisms to control the behavior of large language models (LLMs) in Azure OpenAI Service. They are part of the broader concept of prompt engineering, which is the practice of crafting inputs to elicit desired outputs from generative AI models.

System messages are instructions that set the overall context, tone, and rules for the model's responses. They are sent as part of the API call's system role in the chat completions endpoint. The system message persists throughout the entire conversation session, influencing every response the model generates. For example, a system message might state: "You are an AI assistant that helps users find information about Azure services. Always respond in a professional tone and cite your sources."

Grounding prompts (or grounding data) are external sources of information provided to the model to anchor its responses in facts, reducing the likelihood of hallucinations. In Azure OpenAI, grounding is achieved through the Azure OpenAI On Your Data feature, which connects the model to a data source (e.g., an Azure Cognitive Search index, now named Azure AI Search, Azure Blob Storage, or a database). The model retrieves relevant chunks from that data source and uses them as context when generating responses. Grounding prompts are not part of the system message; they are injected into the prompt as context, either manually or through a retrieval-augmented generation (RAG) pattern.

Why They Exist

LLMs are trained on vast public datasets but have no inherent knowledge of proprietary data, recent events, or specific business rules. Without guidance, they can produce outputs that are off-brand, factually incorrect, or unsafe. System messages provide a way to constrain the model's behavior without retraining. Grounding prompts ensure the model's responses are based on authoritative data rather than its internal (and potentially outdated) knowledge.

How System Messages Work Internally

When you call the Azure OpenAI chat completions API, you send an array of messages with roles: system, user, and assistant. The system message is typically the first element and sets the stage. For example:

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant. Answer only from the provided context. If you don't know, say you don't know."},
    {"role": "user", "content": "What is the capital of France?"}
  ]
}

The model processes the entire conversation history (including the system message) as a single sequence of tokens. The system message is not hidden from the model; it is part of the input. The model learns to follow the system instruction because it has been trained on examples where similar instructions were used. In practice, the system message is most effective when it is clear, concise, and placed at the beginning of the conversation.
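The message array described above can be sketched as a small helper. This is a minimal illustration, not part of any SDK: `build_messages` and the sample history are hypothetical names introduced here. It shows the system message placed first, followed by earlier turns and the new user query, which is the sequence the model receives on each call.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_messages(system_message: str, history: List[Message], user_query: str) -> List[Message]:
    """Return the full message list for one chat completions call.

    The system message goes first, then prior user/assistant turns,
    then the new user query.
    """
    if any(m["role"] == "system" for m in history):
        raise ValueError("history should contain only user/assistant turns")
    return (
        [{"role": "system", "content": system_message}]
        + list(history)
        + [{"role": "user", "content": user_query}]
    )


# Hypothetical earlier turns in the same conversation.
history = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]

messages = build_messages(
    "You are a helpful assistant. If you don't know, say you don't know.",
    history,
    "And what is its population?",
)
print([m["role"] for m in messages])
```

Because the whole list is resent on every call, the system message stays in effect for each turn without any server-side state.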

Key parameters:

temperature: Controls randomness. Lower values (e.g., 0.0) make the model more deterministic, which is desirable for grounded responses.

top_p: Nucleus sampling. The model samples only from the smallest set of tokens whose cumulative probability reaches top_p, so lower values also reduce randomness.

max_tokens: Limits the length of the response.

stop: Sequences that stop generation.
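To make the effect of temperature and top_p concrete, here is a toy sketch of the sampling math on three made-up token scores. The function names and the logit values are illustrative assumptions, and treating a temperature of 0 as "always pick the top token" is a convention of this sketch rather than a statement about how the service implements it.

```python
import math
from typing import Dict


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Sketch convention: put all probability on the highest-scoring token.
        best = max(logits, key=logits.get)
        return {tok: (1.0 if tok == best else 0.0) for tok in logits}
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}  # renormalize the survivors


# Made-up scores for three candidate next tokens.
logits = {"Paris": 3.0, "Lyon": 1.5, "Berlin": 0.5}

cold = softmax_with_temperature(logits, 0.2)   # low temperature: sharply peaked
warm = softmax_with_temperature(logits, 1.5)   # high temperature: flatter
nucleus = top_p_filter(softmax_with_temperature(logits, 1.0), 0.9)

print(round(cold["Paris"], 3), round(warm["Paris"], 3), sorted(nucleus))
```

Lower temperature concentrates probability on the most likely token, and top_p trims the unlikely tail, which is why both are usually set low for grounded answers.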

How Grounding Prompts Work Internally

Grounding via Azure OpenAI On Your Data uses a RAG architecture:

1. Indexing: Your data is indexed into a search service (e.g., Azure Cognitive Search). The index stores chunks of text with embeddings.

2. Query: When a user asks a question, Azure OpenAI converts the user's query into an embedding vector.

3. Retrieval: The vector is used to search the index for semantically similar chunks. The top-k (e.g., 5) chunks are retrieved.

4. Augmentation: Those chunks are inserted into the prompt as context, typically after the system message and before the user's question. The model then generates a response based on that context.

5. Response: The model's output is constrained by the retrieved data, reducing hallucinations.
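The indexing, query, and retrieval steps above can be sketched with a toy in-memory index. This is only an illustration under stated assumptions: the word-count "embedding" stands in for the learned dense vectors a real search service would use, and `embed`, `cosine`, `retrieve`, and the sample chunks are hypothetical names and data.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for learned vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], top_k: int = 5) -> List[Tuple[float, str]]:
    """Index the chunks, embed the query, and return the top_k most similar chunks."""
    index = [(chunk, embed(chunk)) for chunk in chunks]          # step 1: indexing
    query_vec = embed(query)                                     # step 2: query embedding
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in index]  # step 3: retrieval
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical knowledge-base chunks.
chunks = [
    "Azure Blob Storage stores unstructured data such as documents and images.",
    "Password resets are handled through the self-service portal.",
    "The chat playground lets you set a system message for a deployment.",
]

top = retrieve("How do I reset my password?", chunks, top_k=2)
for score, chunk in top:
    print(f"{score:.3f}  {chunk}")
```

The retrieved chunks are what step 4 would then place into the prompt as context.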

A grounding prompt is not a separate message role: it is context (retrieved chunks or pasted text) that is added to the prompt alongside the system message and the user's question. You can also manually provide grounding data by including it in the user message (e.g., "Based on this article: [text], answer: ...").
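Manual grounding of the kind just described might look like the following sketch. The helper name `build_grounded_messages`, the `[docN]` labels, and the sample text are all assumptions made for illustration; the point is only that the grounding text travels in the prompt next to the system message and the question.

```python
from typing import Dict, List


def build_grounded_messages(system_message: str, chunks: List[str], question: str) -> List[Dict[str, str]]:
    """Place grounding text in the user message, after the system message."""
    # Label each chunk so an answer can point back to its source (labels are illustrative).
    context = "\n\n".join(f"[doc{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    user_content = (
        "Based on the following context, answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_content},
    ]


# Hypothetical grounding text and question.
grounded = build_grounded_messages(
    "Only answer from the provided context. If the context does not contain the answer, say you don't know.",
    ["Support hours are 9:00 to 17:00 on weekdays.", "Tickets are opened through the help portal."],
    "When is support available?",
)
print(grounded[1]["content"])
```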

Key Values and Defaults

System message max length: No hard limit, but best practice is under 1500 tokens.

Top-K retrieval: Default is 5 chunks. Can be configured from 1 to 20.

Chunk size: Default 1024 tokens. Can be set from 256 to 2048.

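Strictness: When grounding is enabled, you can set a strictness level (1-5, default 3). In On Your Data, the setting controls how aggressively retrieved chunks are filtered by relevance before they reach the model, which in turn determines how closely answers stay within the provided context. Higher strictness reduces hallucinations but may cause the model to refuse to answer even when the answer is implicitly in the data.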

Role: The system message can define a role (e.g., "You are a customer support agent for Contoso"). The model will adopt that persona.

Configuration and Verification

To configure system messages and grounding in Azure OpenAI, you use the Azure Portal or API.

Azure Portal (Chat Playground):

1. Go to Azure OpenAI Studio.

2. Select your deployment.

3. In the Chat Playground, you can set the system message in the "System message" field.

4. To enable grounding, click "Add your data" and connect to a data source (e.g., Azure Cognitive Search).

5. Configure the retrieval parameters (e.g., number of chunks, strictness).

API Call (Python SDK):

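from openai import AzureOpenAI  # requires openai>=1.0; the module-level ChatCompletion interface was removed in 1.0

# Placeholders: replace with your own resource endpoint and key.
client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_version="2023-12-01-preview",
    api_key="your-key",
)

response = client.chat.completions.create(
    model="gpt-4",  # the name of your deployment
    messages=[
        # The system message comes first and sets the rules for every response.
        {"role": "system", "content": "You are an AI assistant that helps with Azure. Only answer from the provided context."},
        {"role": "user", "content": "What is Azure Cognitive Search?"},
    ],
    temperature=0.0,  # deterministic output suits grounded answers
    max_tokens=500,   # cap on response length
)

print(response.choices[0].message.content)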

To verify that grounding is working, check the response's context field (if using On Your Data) which contains the retrieved chunks.
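A check of the context field could be sketched as below. The exact shape of that field depends on the API version, so the layout used here (a `context` object holding a `citations` list with `title` and `content` keys) is an assumption for illustration, and `sample_response` is invented data rather than real service output. Convert the SDK response to a plain dictionary before passing it in.

```python
from typing import Any, Dict, List


def extract_citations(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the retrieved chunks attached to the first choice, if any.

    Assumes the response carries message["context"]["citations"]; adjust
    the keys to match the API version you are calling.
    """
    message = response["choices"][0]["message"]
    context = message.get("context") or {}
    return list(context.get("citations") or [])


def is_grounded(response: Dict[str, Any]) -> bool:
    """Treat a response as grounded when at least one chunk was retrieved."""
    return len(extract_citations(response)) > 0


# Invented example response, shaped according to the assumption above.
sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "Support is available on weekdays [doc1].",
                "context": {
                    "citations": [
                        {"title": "support-hours.md", "content": "Support hours are 9:00 to 17:00 on weekdays."}
                    ]
                },
            }
        }
    ]
}

for citation in extract_citations(sample_response):
    print(citation["title"], "->", citation["content"])
print("grounded:", is_grounded(sample_response))
```

An empty citation list on a question that should be covered by your data is a useful signal that retrieval, not the model, is the problem.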

Interaction with Related Technologies

System messages and grounding prompts work alongside:

Content filtering: Azure OpenAI has built-in content filters that can block harmful outputs. System messages can also instruct the model to avoid certain topics.

Prompt engineering: System messages are a core prompt engineering technique. Others include few-shot learning (providing examples) and chain-of-thought prompting.

Fine-tuning: System messages can be used even with fine-tuned models. However, fine-tuning changes the base model, so system messages may need adjustment.

Azure AI Content Safety: Can be integrated to add an additional layer of filtering on top of system messages.

Best Practices for System Messages

Be specific: "You are a helpful assistant" is too vague. Instead: "You are a customer support agent for Contoso Ltd. You answer questions about Contoso products only. If asked about competitors, politely decline."

Use positive instructions: "Do not mention pricing" is less effective than "Only provide pricing when explicitly asked."

Place the system message at the beginning of the conversation.

Keep it concise: Long system messages may be ignored or cause the model to lose focus.

Test iteratively: Small changes can have large effects.

Best Practices for Grounding Prompts

Use high-quality, authoritative data sources.

Set appropriate chunk sizes and retrieval counts.

Use strictness settings to balance accuracy and flexibility.

Monitor for hallucinations even with grounding, as the model may still misinterpret the retrieved data.

Combine with system messages to set boundaries (e.g., "Only answer from the provided context").

Common Pitfalls

Overly restrictive system messages: Can cause the model to refuse legitimate requests.

Conflicting instructions: If the system message says "be creative" but grounding provides strict data, the model may become confused.

Ignoring grounding data: If the system message does not instruct the model to use the context, the model may ignore it.

Leaving system message empty: The model will default to its training behavior, which may not be appropriate for your scenario.

Walk-Through

1

Define the system message

Start by crafting a clear system message that defines the AI's role, tone, and boundaries. For example: 'You are an Azure support engineer. Answer only from the provided documentation. If unsure, say you don't know.' This message is sent as the first entry in the messages array with role 'system'. It will influence every response in the conversation. In the API, you set this in the 'messages' parameter. In Azure OpenAI Studio, you enter it in the 'System message' field. Ensure it is concise—under 1500 tokens—and uses positive language. Avoid contradictions like 'be creative but stick to facts.'

2

Prepare grounding data source

Index your authoritative data into Azure Cognitive Search or another supported data source. This involves uploading documents (PDFs, Word docs, web pages) and configuring a skillset to chunk and embed the text. Default chunk size is 1024 tokens with 200-token overlap. Choose an appropriate search configuration (e.g., semantic search for better relevance). In the Azure Portal, you create a Cognitive Search service, then use the Import Data wizard or a custom pipeline. The index will be used to retrieve relevant chunks at query time.
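The chunk size and overlap mentioned above can be illustrated with a small sliding-window sketch. It assumes the text has already been split into tokens (here a made-up list of 2,500 placeholder tokens); `chunk_tokens` is a hypothetical helper, not the chunker the service uses.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 1024, overlap: int = 200) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens with the previous window."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


# Placeholder tokens standing in for a tokenized document.
tokens = [f"t{i}" for i in range(2500)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some tokens twice.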

3

Configure grounding in Azure OpenAI

In Azure OpenAI Studio, navigate to your deployment and open the Chat Playground. Click 'Add your data' and select your data source (e.g., Azure Cognitive Search). Configure the retrieval parameters: number of chunks (default 5), strictness (default 3), and whether to include semantic search. Optionally, set a 'role' for the system message that instructs the model to use the context. For API calls, you use the 'dataSources' parameter in the request body. The model will automatically retrieve and inject context.
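For the API route, a request body with a `dataSources` entry might be assembled as in the sketch below. Only the `dataSources` name comes from the text above; the field names inside `parameters` (`indexName`, `topNDocuments`, `strictness`, `roleInformation`, and so on) and the `type` value are assumptions based on the preview-era request format and differ between API versions, so check them against the reference for the version you call. The endpoint, key, and index name are placeholders.

```python
import json
from typing import Any, Dict, List


def build_grounded_request(
    messages: List[Dict[str, str]],
    search_endpoint: str,
    search_key: str,
    index_name: str,
    top_n_documents: int = 5,
    strictness: int = 3,
    role_information: str = "",
) -> Dict[str, Any]:
    """Assemble a chat request body that asks the service to ground on a search index."""
    if not 1 <= strictness <= 5:
        raise ValueError("strictness must be between 1 and 5")
    if not 1 <= top_n_documents <= 20:
        raise ValueError("top_n_documents must be between 1 and 20")
    return {
        "messages": messages,
        "temperature": 0.0,
        "dataSources": [
            {
                "type": "AzureCognitiveSearch",  # assumed type name; varies by API version
                "parameters": {                   # assumed field names; verify before use
                    "endpoint": search_endpoint,
                    "key": search_key,
                    "indexName": index_name,
                    "topNDocuments": top_n_documents,
                    "strictness": strictness,
                    "roleInformation": role_information,
                },
            }
        ],
    }


body = build_grounded_request(
    messages=[{"role": "user", "content": "What is our refund policy?"}],
    search_endpoint="https://your-search.search.windows.net",  # placeholder
    search_key="your-search-key",                              # placeholder
    index_name="your-index",                                   # placeholder
    role_information="Only answer from the provided context.",
)
print(json.dumps(body, indent=2))
```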

4

Test and iterate

Send sample queries to verify the model's behavior. Check that responses are based on the grounded data and adhere to the system message. For example, ask a question outside the data scope (e.g., 'What is the weather?') to see if the model correctly refuses. Adjust the system message if the model is too restrictive or too permissive. Tweak retrieval parameters: increase strictness if hallucinations occur, increase chunk count for more context. Use the 'context' field in the API response to see which chunks were retrieved.
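The testing loop described above can be automated with a small harness. This sketch uses a stub in place of the real model so it runs offline; `run_scope_checks`, `is_refusal`, the refusal phrases, and `fake_ask` are all hypothetical, and a phrase match is a crude check that a real evaluation would likely replace with human review or a stronger classifier.

```python
from typing import Callable, List, Tuple

# Phrases this sketch treats as a refusal; tune them to your own system message.
REFUSAL_MARKERS = ("i don't know", "i do not know", "not in the provided")


def is_refusal(answer: str) -> bool:
    """Return True when the answer contains one of the refusal phrases."""
    lowered = answer.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def run_scope_checks(ask: Callable[[str], str], cases: List[Tuple[str, bool]]) -> List[str]:
    """Return the questions whose refusal behavior did not match the expectation."""
    failures: List[str] = []
    for question, should_refuse in cases:
        if is_refusal(ask(question)) != should_refuse:
            failures.append(question)
    return failures


def fake_ask(question: str) -> str:
    """Stub standing in for a call to the deployed model."""
    if "weather" in question.lower():
        return "I don't know; that is not in the provided documentation."
    return "Here is the answer based on the documentation [doc1]."


cases = [
    ("What is the weather?", True),             # outside the data scope: should refuse
    ("How do I configure the index?", False),   # inside the data scope: should answer
]
failures = run_scope_checks(fake_ask, cases)
print("failures:", failures)
```

Rerunning the same cases after each change to the system message or retrieval settings makes it easier to see whether an adjustment made the model too restrictive or too permissive.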

5

Deploy and monitor

Once satisfied, deploy the model via a web app or API endpoint. In production, monitor the system message effectiveness and grounding accuracy. Use Azure Monitor to track metrics like response length, content filter hits, and user feedback. Periodically update the data source to keep it current. If the model starts hallucinating, review the system message and retrieval configuration. Consider adding a 'safety system message' that reinforces the use of grounded data.

What This Looks Like on the Job

Enterprise Scenario 1: Customer Support Chatbot for a Software Company

A software company deploys a customer support chatbot using Azure OpenAI. They have a large knowledge base of troubleshooting articles, release notes, and FAQs. The system message is: 'You are a support agent for Acme Software. Only answer from the provided knowledge base. If you don't have an answer, suggest opening a support ticket.' Grounding is configured with Azure Cognitive Search indexing all support articles. The chatbot handles thousands of queries daily. A common issue is that the model may retrieve irrelevant chunks due to poor indexing or vague queries. To mitigate, the team sets strictness to 4 and uses semantic search. They also add a fallback system message: 'If the context does not contain the answer, say: I need to escalate this to a human agent.' Performance considerations: latency increases with the number of retrieved chunks; they use 3 chunks to balance speed and accuracy.

Enterprise Scenario 2: Internal HR Policy Assistant

A large enterprise uses an AI assistant to answer employee questions about HR policies. The data source includes PDFs of employee handbooks, benefit documents, and compliance guidelines. The system message is: 'You are an HR assistant. Provide accurate information based on the company's official policies. Do not give legal advice or speculate.' Grounding is essential to avoid outdated or incorrect answers. The HR team updates the data source quarterly. A misconfiguration occurred when the indexing pipeline failed to update after a policy change, causing the assistant to give old information. To prevent this, they set up a scheduled indexer with a 24-hour refresh. They also monitor the assistant's responses for policy citations. The strictness is set to 5 to ensure the model never fabricates policies.

Enterprise Scenario 3: Medical Information Bot (Non-diagnostic)

A healthcare provider creates a bot that answers general medical questions using trusted medical textbooks and guidelines. The system message explicitly states: 'You are an informational assistant. Do not provide diagnoses or treatment recommendations. Always advise users to consult a doctor.' Grounding uses a curated set of medical literature. A critical issue is that the model might retrieve a chunk that is out of context and provide misleading information. To address this, the team uses a custom chunking strategy with 512-token chunks and 50-token overlap, and they manually review the top 10 retrieved chunks for quality. They also implement a content filter that blocks any response containing certain keywords (e.g., 'diagnosis'). The system message includes: 'If the context is ambiguous, say: Please consult your healthcare provider.'

How AI-900 Actually Tests This

What AI-900 Tests (Objective 5.3)

The exam focuses on understanding the purpose and use of system messages and grounding prompts, not on deep technical implementation. Key points:

System messages set the persona, tone, and rules for the AI.

Grounding prompts provide context and data to reduce hallucinations.

You need to know that system messages are part of the prompt engineering technique.

Grounding is achieved through Azure OpenAI On Your Data feature, which connects to Azure Cognitive Search or other data sources.

The exam may ask about the order of messages: system message comes first.

Common exam question: 'Which technique would you use to ensure an AI assistant only answers from company documents?' Answer: Grounding prompts (On Your Data).

Common Wrong Answers and Why

1. 'Fine-tuning the model': Many candidates think fine-tuning is the only way to inject domain knowledge. But fine-tuning requires labeled data and retraining, whereas grounding is simpler and does not change the model. The exam tests that grounding is the recommended approach for domain-specific answers without retraining.

2. 'Using a higher temperature': Some believe that increasing temperature makes the model more accurate. In reality, higher temperature increases creativity and randomness, which can worsen hallucinations. Lower temperature (0.0) is used for grounded responses.

3. 'Setting the user message instead of system message': Candidates might think instructions can go anywhere. But system messages are specifically designed for persistent instructions; user messages are for the current query. The exam distinguishes the roles.

4. 'Using content filters': Content filters block harmful content but do not ground the model in specific data. They are a separate safety mechanism.

Specific Numbers and Terms

Default number of retrieved chunks: 5.

Default chunk size: 1024 tokens.

Strictness range: 1 to 5 (higher = stricter).

The feature name: 'Azure OpenAI On Your Data'.

Data source types: Azure Cognitive Search, Azure Blob Storage, Azure Cosmos DB, Azure SQL Database.

The API parameter for grounding in the request: 'dataSources'.

Edge Cases the Exam Loves

If the system message and grounding data conflict, the model may follow the system message. For example, if the system message says 'be creative' but the grounding data is strict, the model might ignore grounding.

If no system message is provided, the model uses its default behavior (helpful but unconstrained).

Grounding does not guarantee 100% accuracy; the model can still misinterpret retrieved data.

The exam may ask what happens when the grounding data does not contain the answer: the model should refuse to answer (if instructed) or might hallucinate.

How to Eliminate Wrong Answers

If the question asks about 'changing the model's behavior permanently,' it is fine-tuning, not grounding.

If the question mentions 'providing current data without retraining,' it is grounding.

If the question talks about 'setting rules for the AI's personality,' it is system message.

The term 'system message' always refers to the system role in the chat API.

Key Takeaways

System messages define the AI's persona, tone, and rules; they are sent as the first message with role `system`.

Grounding prompts provide external data to reduce hallucinations, typically via Azure OpenAI On Your Data.

The default number of retrieved chunks for grounding is 5; default chunk size is 1024 tokens.

Strictness ranges from 1 to 5; higher values force the model to stay closer to the provided context.

Grounding does not require fine-tuning and can be used with GPT-3.5, GPT-4, and other models.

System messages should be specific, positive, and placed at the beginning of the conversation.

Common exam scenario: Use grounding to answer questions from company documents without retraining.

If no system message is provided, the model defaults to its training behavior (helpful but unconstrained).

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

System Messages

Set persistent role, tone, and behavior rules.

Applied via the `system` role in the chat API.

Do not provide specific factual data.

Effective for defining persona and constraints.

Example: 'You are a professional assistant.'

Grounding Prompts

Provide context-specific data to anchor responses.

Applied via data sources (On Your Data) or manual context injection.

Reduce hallucinations by supplying authoritative information.

Effective for ensuring factual accuracy.

Example: 'Based on the following document: [text]...'

Watch Out for These

Mistake

System messages are optional and have little effect on the model's output.

Correct

System messages are highly influential. They set the overall context and can significantly change the model's behavior. Leaving them empty or using generic phrases like 'You are a helpful assistant' may lead to unpredictable outputs. The exam expects you to know that system messages are a critical part of prompt engineering.

Mistake

Grounding prompts are the same as system messages.

Correct

They are different. System messages set persistent rules and persona. Grounding prompts provide specific data for the current query. System messages are part of the `system` role; grounding data is injected as context from a data source or manually in the `user` message. The exam tests this distinction.

Mistake

Once you enable grounding, the model will never hallucinate.

Correct

Grounding reduces but does not eliminate hallucinations. The model can still misinterpret retrieved data, combine information incorrectly, or ignore the context if the system message does not instruct it to use the data. Strictness settings help but are not foolproof.

Mistake

You can only use grounding with Azure Cognitive Search.

Correct

Azure OpenAI On Your Data supports multiple data sources: Azure Cognitive Search, Azure Blob Storage, Azure Cosmos DB, Azure SQL Database, and more. The exam may list these as options. Cognitive Search is the most common but not the only one.

Mistake

System messages are only for the first conversation turn.

Correct

System messages persist for the entire conversation session. Each API call includes the system message, so it influences every response. You can update the system message mid-conversation by sending a new one in a subsequent call, but best practice is to keep it consistent.
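Because each call carries the system message, updating it amounts to sending a different first entry on the next call. A minimal sketch, with `replace_system_message` and the sample conversation as hypothetical names and data:

```python
from typing import Dict, List


def replace_system_message(messages: List[Dict[str, str]], new_system_message: str) -> List[Dict[str, str]]:
    """Return a new message list with the system message swapped and all turns kept."""
    turns = [m for m in messages if m["role"] != "system"]
    return [{"role": "system", "content": new_system_message}] + turns


# Hypothetical conversation so far.
conversation = [
    {"role": "system", "content": "You are a support agent. Answer briefly."},
    {"role": "user", "content": "How do I open a ticket?"},
    {"role": "assistant", "content": "Use the help portal."},
]

updated = replace_system_message(conversation, "You are a support agent. Answer in detail.")
print(updated[0]["content"])
```

The original list is left untouched, so the earlier rules can be restored if the change turns out to confuse the model.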


Frequently Asked Questions

What is the difference between a system message and a user message in Azure OpenAI?

A system message sets the overall context, persona, and rules for the AI assistant. It is sent with role `system` and persists for the entire conversation. A user message is the actual query or instruction from the user, sent with role `user`. The model treats system messages as high-level instructions and user messages as the immediate task. For example, the system message might say 'You are a helpful assistant,' while the user message asks 'What is Azure AI?' The exam tests that system messages are for persistent behavior, not for one-off queries.

How do I ensure my AI model only answers from company data?

Use the Azure OpenAI On Your Data feature to ground the model in your data. Configure a data source (e.g., Azure Cognitive Search) that indexes your company documents. Then, in the system message, instruct the model to only answer from the provided context. For example: 'You are a company assistant. Only answer from the provided context. If the context does not contain the answer, say you don't know.' Set strictness to a high value (e.g., 4 or 5) to enforce this. The exam expects you to know that grounding is the key technique, not fine-tuning.

What is the default number of chunks retrieved when using Azure OpenAI On Your Data?

The default number of retrieved chunks is 5. This is a specific number that appears on the exam. You can configure it from 1 to 20. More chunks provide more context but increase latency and token usage. For most scenarios, 3-5 chunks are recommended. The chunk size default is 1024 tokens. The exam may ask about these defaults.

Can I update the system message mid-conversation?

Yes, you can send a new system message in a subsequent API call, but it will replace the previous one. However, best practice is to keep the system message consistent throughout the conversation to avoid confusing the model. The exam does not focus on this nuance but expects you to know that the system message is set at the start and applies to the whole session.

What happens if the grounding data does not contain the answer to the user's question?

If the system message instructs the model to only answer from the provided context, the model should refuse to answer or say it does not know. However, if the system message is not explicit, the model may hallucinate an answer. The strictness setting also affects this: with high strictness, the model is more likely to refuse; with low strictness, it may guess. The exam tests that you should set an appropriate system message to handle this scenario.

Is fine-tuning required to use grounding prompts?

No, grounding prompts do not require fine-tuning. Grounding works with any Azure OpenAI model (e.g., GPT-3.5, GPT-4) by providing context at inference time. Fine-tuning modifies the model's weights, which is a separate process. The exam emphasizes that grounding is a simpler and faster way to inject domain knowledge without retraining.

What is the 'strictness' parameter in Azure OpenAI On Your Data?

Strictness is a parameter (1-5) that controls how strictly the model must adhere to the provided context. A value of 5 means the model will almost never deviate from the retrieved chunks, reducing hallucinations but potentially causing the model to refuse to answer even when the answer is implicitly present. A value of 1 allows more flexibility. The default is 3. The exam may ask about the purpose of strictness.
