AZ-900Chapter 97 of 127Objective 2.2

Azure OpenAI Service

This chapter covers Azure OpenAI Service, a managed service that brings OpenAI's powerful language models to Azure with enterprise security, compliance, and scalability. For AZ-900, this topic falls under Domain 2 (Azure Architecture and Services), Objective 2.2 (Describe Azure AI and machine learning services). While not a high-weight area, Azure OpenAI Service is a hot topic and frequently appears in scenario-based questions. Understanding its key features, use cases, and pricing model is essential for the exam. This chapter will give you the depth you need to answer any question confidently.

25 min read
Intermediate
Updated May 31, 2026

The AI Chef and the Recipe Book

Imagine you own a busy restaurant and want to create new dishes quickly without hiring a world-class chef. Azure OpenAI Service is like having a master chef (the AI model) who has studied millions of recipes (public internet data) and can cook anything you ask. But instead of cooking in your kitchen, you call the chef via a phone order (API call). You pay per dish (token usage) and can choose the chef's specialty level: GPT-4o is a Michelin-star chef for complex dishes, GPT-4o-mini is a line cook for simple tasks. The chef never learns your secret family recipes (your data is not used for training) and never stores your order history unless you ask (data retention). You can also customize the chef's knowledge by giving them a special recipe card (your own data via 'bring your own data') that they read before cooking. If you send too many orders at once, the chef gets overwhelmed and you might get a busy signal (rate limit). This service runs in Microsoft's cloud kitchen (Azure data centers) with enterprise-grade security and compliance, so you can trust the chef with your proprietary ingredients.

How It Actually Works

What is Azure OpenAI Service and What Business Problem Does It Solve?

Azure OpenAI Service is a cloud-based platform that provides access to OpenAI's advanced language models, including GPT-4o, GPT-4 Turbo, GPT-3.5-Turbo, and the DALL-E 3 image generation model, all hosted on Microsoft Azure. The core business problem it solves is the complexity and cost of building, training, and deploying large language models (LLMs) from scratch. Training a model like GPT-4 would require massive GPU clusters, petabytes of data, months of time, and millions of dollars. Azure OpenAI Service removes that barrier by offering pre-trained models as a pay-as-you-go API. This allows businesses to integrate AI capabilities—such as natural language understanding, text generation, summarization, translation, code generation, and image creation—into their applications without needing deep AI expertise.

How It Works: A Step-by-Step Mechanism

Azure OpenAI Service operates on a request-response model. Here's the flow:

1.

User sends a prompt: An application (e.g., a chatbot, content generator) sends a text input (prompt) to the Azure OpenAI endpoint via HTTP request. The prompt can be a question, a command, or a conversation history.

2.

Authentication and authorization: The request must include a valid API key or Azure Active Directory token. Azure checks the key against the service instance and verifies that the caller has permissions (e.g., Cognitive Services OpenAI User role).

3.

Content filtering: Before the prompt reaches the model, Azure's content safety system scans it for harmful content (e.g., hate speech, violence, self-harm). If flagged, the request is blocked and an error is returned.

4.

Model processing: The prompt is tokenized (converted into numerical tokens) and sent to the chosen model (e.g., GPT-4o). The model processes the tokens through its neural network layers, generating a response token by token. The model uses a "temperature" parameter to control randomness: lower temperature (e.g., 0.1) for factual responses, higher (e.g., 0.9) for creative generation.

5.

Response generation: The model outputs a text response, which is then detokenized back to human-readable text. The response is also subject to content filtering before being returned to the user.

6.

Billing and logging: Azure records the number of tokens used (both input and output) for billing. Optionally, you can enable diagnostic logging to capture request/response data for monitoring and debugging.

Key Components, Tiers, and Pricing Models

Models: Azure OpenAI Service offers several model families: - GPT-4o: Latest flagship model, multimodal (text and image input), fast and cost-effective. - GPT-4 Turbo: Previous generation, optimized for instruction following and longer context (up to 128K tokens). - GPT-3.5-Turbo: Faster and cheaper, suitable for simple tasks. - DALL-E 3: Generates images from text descriptions. - Whisper: Speech-to-text transcription. - Text-to-speech (TTS): Converts text to natural-sounding speech.

Pricing: Pay-per-token. Tokens are units of text (roughly 4 characters or 0.75 words). Pricing varies by model:

GPT-4o: $2.50 per 1M input tokens, $10 per 1M output tokens.

GPT-4 Turbo: $10 per 1M input tokens, $30 per 1M output tokens.

GPT-3.5-Turbo: $0.50 per 1M input tokens, $1.50 per 1M output tokens.

DALL-E 3: Per image generation (e.g., $0.04 per image for standard resolution).

Provisioned throughput (PTU): For high-volume, latency-sensitive workloads, you can purchase provisioned throughput units (PTUs) that guarantee a certain number of tokens per second (TPS). This is a reserved capacity model, billed hourly, and is more cost-effective for steady-state usage.

Content safety: Built-in content filtering across multiple categories (hate, sexual, violence, self-harm) with configurable severity thresholds. You can also create custom content filters using Azure AI Content Safety.

Data residency and compliance: Azure OpenAI Service is available in many Azure regions. Data processed can be stored at rest in the chosen region. Microsoft does not use your data to retrain the models (unless you explicitly opt in). The service complies with certifications like ISO 27001, SOC 2, HIPAA, and FedRAMP.

How It Compares to On-Premises or Self-Hosted Alternatives

Building an equivalent on-premises solution would require: - Hardware: Hundreds of GPUs (e.g., NVIDIA A100 or H100) costing millions of dollars. - Software: Deep learning frameworks (PyTorch, TensorFlow), model weights (if open-source like Llama 2), and custom infrastructure for scaling. - Expertise: Data scientists, ML engineers, and DevOps to train, fine-tune, and deploy models. - Ongoing costs: Electricity, cooling, hardware maintenance, and regular model updates.

Azure OpenAI Service eliminates all of that. You simply call an API. The trade-off is less control over the model internals and dependency on Azure's network. However, for most businesses, the speed to market and lower total cost of ownership (TCO) vastly outweigh the loss of control.

Azure Portal and CLI Touchpoints

Azure Portal:

Navigate to "Azure OpenAI" and create a new instance (requires approval for new subscriptions).

In the instance, go to "Model deployments" to deploy models like GPT-4o or DALL-E 3.

Use "Playground" (Chat, Completions, DALL-E) to test prompts interactively.

View metrics (tokens used, requests) under "Monitoring".

Set up content filters under "Content Filters".

Azure CLI:

Create an instance: az cognitiveservices account create --name myOpenAI --resource-group myRG --kind OpenAI --sku S0 --location eastus

Deploy a model: az cognitiveservices account deployment create --name myOpenAI --resource-group myRG --deployment-name myGPT4o --model-name gpt-4o --model-version 2024-08-06 --model-format OpenAI

Get endpoint and keys: az cognitiveservices account show --name myOpenAI --resource-group myRG --query properties.endpoint

Bicep/ARM: You can deploy the service as part of infrastructure-as-code. Example Bicep snippet:

resource openAi 'Microsoft.CognitiveServices/accounts@2023-05-01' = {
  name: 'myOpenAI'
  location: 'eastus'
  kind: 'OpenAI'
  sku: {
    name: 'S0'
  }
  properties: {
    customSubDomainName: 'myopenai'
  }
}

Walk-Through

1

Create Azure OpenAI Resource

First, you need to create an Azure OpenAI resource in your Azure subscription. In the Azure portal, search for 'Azure OpenAI' and click 'Create'. Fill in the subscription, resource group, region (e.g., East US), and name. For pricing tier, choose 'Standard S0' for pay-as-you-go. Note: New subscriptions may require explicit access approval via a registration form. Once created, you'll get an endpoint URL and two API keys (Key1 and Key2) that you use to authenticate requests. The keys are shown only once; you must store them securely, e.g., in Azure Key Vault.

2

Deploy a Model

After creating the resource, you need to deploy a specific model (e.g., GPT-4o) to make it accessible via the API. In the Azure portal, go to your OpenAI resource and select 'Model deployments' under 'Resource Management'. Click 'Create new deployment', choose the model (e.g., gpt-4o), version (e.g., 2024-08-06), and give it a deployment name (e.g., my-gpt4o). You can also set the content filter and rate limit defaults. Behind the scenes, Azure allocates compute resources to host that model and exposes an endpoint like `https://<resource-name>.openai.azure.com/openai/deployments/<deployment-name>/chat/completions?api-version=2024-08-01-preview`.

3

Set Up Content Filters

Azure OpenAI Service includes default content filters that block harmful content. You can customize severity thresholds (low, medium, high) for categories like hate, sexual, violence, and self-harm. In the Azure portal, under 'Content Filters', you can create new filter configurations and assign them to deployments. For example, you may want stricter filtering for a public-facing chatbot but relaxed for internal use. You can also disable filters entirely (not recommended). Content filtering happens before the prompt reaches the model and again on the response. If a request is blocked, the API returns a 400 error with a content filtering violation message.

4

Call the API from an Application

To use the deployed model, your application sends HTTP requests to the endpoint. For chat models, you send a POST request with a JSON body containing messages (system, user, assistant) and parameters like temperature, max_tokens, etc. Example using Python: ```python import openai openai.api_base = 'https://<resource-name>.openai.azure.com/' openai.api_key = '<your-key>' openai.api_type = 'azure' openai.api_version = '2024-08-01-preview' response = openai.ChatCompletion.create( engine='my-gpt4o', messages=[{'role':'user','content':'Hello!'}] ) ``` Azure handles authentication via the API key (or Azure AD token), processes the request, and returns the model's response. You are billed for the tokens used in both the request and response.

5

Monitor Usage and Costs

To avoid unexpected bills, you should monitor your token usage. In the Azure portal, under 'Metrics', you can view charts for 'Total Tokens', 'Prompt Tokens', 'Completion Tokens', and 'Requests'. You can set up alerts (e.g., email when token usage exceeds a threshold). Also, consider using Azure Cost Management to track spending by resource. For production applications, you might implement a token budget per user or session. Additionally, you can enable diagnostic settings to stream logs to Log Analytics or Storage for further analysis. Remember that provisioned throughput units (PTUs) have a different billing model—hourly, regardless of usage—so choose wisely based on traffic patterns.

What This Looks Like on the Job

Scenario 1: Customer Support Chatbot for an E-commerce Company

A large online retailer wants to reduce customer service costs by automating responses to common inquiries like order status, returns, and product questions. They deploy Azure OpenAI Service with GPT-4o to power a chatbot on their website. The team configures the model with a system message that instructs it to act as a helpful customer support agent. They also use 'bring your own data' to inject the company's FAQ and return policy into the prompt, so the model answers accurately. The chatbot handles 80% of queries without human intervention. The team monitors usage and sees an average of 500 tokens per conversation. At $10 per 1M output tokens, the cost is roughly $0.005 per conversation. However, they hit a snag: the default content filter blocks some legitimate conversations (e.g., mentioning 'kill' in a gaming context). They adjust the severity threshold for the violence category to 'high' to reduce false positives. The main risk is cost overrun if traffic spikes; they implement a daily token cap using Azure API Management.

Scenario 2: Code Generation Assistant for a Software Development Team

A software company uses Azure OpenAI Service to help developers write code faster. They deploy GPT-4 Turbo (128K context) to handle large codebases. The team creates an internal tool where developers can paste a function and ask for optimization suggestions. The model is configured with a temperature of 0.2 for consistent, deterministic outputs. They also enable streaming to show responses token-by-token, improving user experience. The team uses provisioned throughput (PTU) because they have steady usage of 1000 requests per minute. They purchase 10 PTUs, which guarantees a certain throughput. The cost is predictable but higher than pay-as-you-go during off-peak hours. A common mistake is not setting a max_tokens limit, causing the model to generate overly long responses and increasing costs. They set max_tokens to 500 per request. Data security is critical: they ensure no proprietary code is logged by disabling request/response logging.

Scenario 3: Generating Marketing Content for a Global Brand

A marketing agency uses DALL-E 3 within Azure OpenAI to generate product images for campaigns. They create a web app where users describe an image (e.g., 'a futuristic car in a city at night') and receive a generated image. The team deploys the DALL-E 3 model and integrates it with their content management system. They use the 'quality' parameter to balance speed and detail. The cost is per image: $0.04 for standard resolution, $0.08 for HD. They implement a queue to handle concurrent requests because the service has a rate limit of 10 requests per minute per deployment. They also set up content filters to block inappropriate image generations. A key challenge is that DALL-E 3 may produce images that inadvertently include copyrighted elements; they add a post-processing step to check for IP infringement. The agency uses Azure's content moderation API to flag risky images before publishing.

How AZ-900 Actually Tests This

Exam Objective 2.2: Describe Azure AI and machine learning services

For AZ-900, you are not expected to know how to code or deploy models. Instead, focus on: - What Azure OpenAI Service is: A managed service providing access to OpenAI's models on Azure with enterprise features. - Key benefits: Built-in content filtering, data residency, compliance, and integration with other Azure services. - Pricing model: Pay-per-token for standard, or provisioned throughput for reserved capacity. - Use cases: Text generation, summarization, code generation, image generation (DALL-E), and natural language understanding. - Data privacy: Microsoft does not use your data to retrain models; data stays in your chosen region.

Common Wrong Answers and Why Candidates Choose Them

1.

"Azure OpenAI Service is the same as using OpenAI directly." Many candidates think it's just a rebrand. Reality: Azure OpenAI adds enterprise SLAs, data residency, Azure Active Directory integration, content filtering, and compliance certifications not available on openai.com.

2.

"You must train the models yourself." Some assume you need to provide training data. Reality: The models are pre-trained. You can fine-tune (customize) them, but that's an advanced topic not tested on AZ-900.

3.

"It is free for Azure subscribers." No, it's a paid service with pay-as-you-go pricing. The confusion comes from Azure's free tier for other services, but OpenAI is not included.

4.

"It only works with text." The exam may test that GPT-4o is multimodal (text and image input). DALL-E 3 generates images. So it's not just text.

Specific Terms and Values - Token: The unit of billing. Roughly 4 characters or 0.75 words. - GPT-4o: The latest model, multimodal, and cost-effective. - DALL-E 3: Image generation model. - Content filtering: Built-in, configurable per severity level. - Provisioned throughput (PTU): Reserved capacity for predictable performance. - Standard S0: The pay-as-you-go pricing tier.

Edge Cases and Tricky Distinctions - Azure OpenAI vs. Azure Cognitive Services: Azure OpenAI is part of Cognitive Services but is listed separately in the portal. The exam may ask which service to use for text generation: answer is Azure OpenAI (not Language Understanding or QnA Maker). - Fine-tuning vs. Prompt engineering: Fine-tuning requires you to upload your own data; prompt engineering does not. For AZ-900, only know that fine-tuning is possible but not required. - Data residency: If a question mentions data sovereignty, Azure OpenAI supports regional deployment. The data stays in the region where the resource is created.

Memory Trick: Think "A-OAI" = Azure OpenAI = Added On Azure Infrastructure. The key exam points: enterprise security, content filtering, pay-per-token, no data used for training.

Key Takeaways

Azure OpenAI Service provides enterprise-grade access to OpenAI models (GPT-4o, GPT-4 Turbo, DALL-E 3) with built-in content filtering, data residency, and compliance certifications.

Pricing is pay-per-token (standard) or provisioned throughput (reserved capacity). Tokens are the unit of billing (~4 characters per token).

Microsoft does not use your data to retrain models by default; data stays in your chosen Azure region.

The service can be used for text generation, code generation, image generation, summarization, and translation.

Content filtering is enabled by default and can be configured per severity level for categories like hate, sexual, violence, and self-harm.

To use the service, you create an Azure OpenAI resource, deploy a model, and call the API with an endpoint and key.

Azure OpenAI is part of Azure Cognitive Services but is managed separately in the portal.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Azure OpenAI Service

Provides access to general-purpose large language models (GPT-4, DALL-E).

Suitable for open-ended text generation, chat, summarization, code generation.

Billed per token (input + output) or provisioned throughput.

Models are pre-trained on internet-scale data; fine-tuning optional.

Supports multimodal (text, image, speech) via different models.

Azure Cognitive Services (e.g., Language Service)

Provides specialized pre-built AI models for specific tasks (e.g., sentiment analysis, key phrase extraction, translation).

Suitable for structured NLP tasks with predefined outputs.

Billed per transaction (e.g., per 1000 text records) or per character.

Models are pre-trained and task-specific; no fine-tuning (except custom models).

Supports text only (except for vision and speech services).

Watch Out for These

Mistake

Azure OpenAI Service is the same as using ChatGPT.

Correct

ChatGPT is a consumer product by OpenAI. Azure OpenAI Service is an enterprise API that gives you programmatic access to the same underlying models (e.g., GPT-4o) but with Azure's security, compliance, and scalability features. You can build your own applications on top of it, while ChatGPT is a standalone chat interface.

Mistake

Your data is used to improve the models.

Correct

By default, Microsoft does not use your data to retrain or improve the models. Your data remains in your Azure tenant and is not shared with OpenAI. You can opt in to data sharing for model improvement, but it is off by default. This is a key enterprise differentiator.

Mistake

You need to train the model from scratch.

Correct

Azure OpenAI Service provides pre-trained models. You do not need to train them. You can optionally fine-tune a model with your own data, but that is an advanced feature not required for basic usage. The service is designed to be used out-of-the-box.

Mistake

Azure OpenAI Service is only for text generation.

Correct

It supports multiple modalities: GPT-4o can accept text and image inputs, DALL-E 3 generates images, and Whisper transcribes speech. So it is not limited to text.

Mistake

It is free to use with an Azure subscription.

Correct

Azure OpenAI Service is a paid service. You pay for the tokens you use (or provisioned throughput). There is no free tier for this service, though you may get initial credits with a new subscription.

Frequently Asked Questions

What is the difference between Azure OpenAI Service and OpenAI's API?

Azure OpenAI Service is a Microsoft-managed version of OpenAI's models hosted on Azure. It provides additional enterprise features like Azure Active Directory integration, data residency (your data stays in the region you choose), content filtering, and compliance certifications (HIPAA, FedRAMP, etc.). OpenAI's API is directly from OpenAI and may have different data handling policies. For AZ-900, remember that Azure OpenAI is the enterprise-ready option.

How is Azure OpenAI Service billed?

It is billed based on token usage (pay-as-you-go) or provisioned throughput (reserved capacity). Tokens are units of text; roughly 4 characters or 0.75 words. Each model has its own per-token price. For example, GPT-4o costs $2.50 per 1M input tokens and $10 per 1M output tokens. Provisioned throughput (PTU) is billed per hour regardless of usage, but guarantees a certain number of tokens per second.

Can I use my own data with Azure OpenAI Service?

Yes, you can use 'bring your own data' to inject your own documents (e.g., PDFs, web pages) into the model's context. This allows the model to answer based on your specific information. Additionally, you can fine-tune a model with your own dataset, but that is more advanced. For AZ-900, just know that you can use your own data without retraining the base model.

Does Azure OpenAI Service support image generation?

Yes, through the DALL-E 3 model. You can generate images from text descriptions. This is a separate model deployment. The billing is per image, not per token. For example, standard resolution images cost $0.04 each. The exam may test that Azure OpenAI includes DALL-E for image generation.

What content filtering is available in Azure OpenAI Service?

Azure OpenAI Service includes built-in content filters that block harmful content in prompts and responses. Filters cover categories: hate, sexual, violence, self-harm. You can configure severity thresholds (low, medium, high) for each category. You can also create custom content filters using Azure AI Content Safety. If a request is blocked, the API returns a 400 error.

Is Azure OpenAI Service available in all Azure regions?

No, it is available in select regions, including East US, West Europe, and France Central. New regions are added over time. For data residency requirements, you must choose a region that meets your compliance needs. The exam may test that you can select a region for data residency.

What is the difference between GPT-4o and GPT-4 Turbo?

GPT-4o is the latest model, offering faster response times and lower cost. It is multimodal (accepts text and image inputs). GPT-4 Turbo is the previous generation, optimized for instruction following and supports up to 128K tokens of context. For AZ-900, you don't need to memorize details, but know that different models exist for different use cases and costs.

Terms Worth Knowing

Ready to put this to the test?

You've just covered Azure OpenAI Service — now see how well it sticks with free AZ-900 practice questions. Full explanations included, no account needed.

Done with this chapter?