AI-900Chapter 20 of 100Objective 5.1

What is Generative AI?

This chapter covers generative AI, a transformative technology that creates new content rather than just analyzing existing data. For the AI-900 exam, understanding generative AI is crucial as it represents a growing domain in AI solutions. Approximately 10-15% of exam questions touch on generative AI concepts, including its definition, capabilities, and responsible AI considerations. You will learn what generative AI is, how it differs from discriminative AI, and key examples like GPT and DALL-E.

25 min read
Intermediate
Updated May 31, 2026

Generative AI as a Master Chef

Imagine a master chef who has studied thousands of recipes. When asked to create a new dish, the chef doesn't just repeat an existing recipe; instead, they understand the underlying principles of flavor combinations, cooking techniques, and ingredient properties. The chef can generate a novel recipe by predicting the next ingredient based on the previous ones, ensuring coherence and taste. Similarly, generative AI models are trained on vast datasets of text, images, or code. They learn the statistical patterns and structures within the data. When prompted, they generate new content by predicting the next token (word, pixel, etc.) in a sequence, conditioned on the input. The model doesn't simply memorize; it creates new combinations that are statistically plausible based on its training. Just as the chef might sometimes produce a dish that doesn't work, generative AI can produce outputs that are incorrect or nonsensical, requiring human oversight.

How It Actually Works

What is Generative AI?

Generative AI refers to a category of artificial intelligence models that generate new content—text, images, audio, video, or code—based on patterns learned from training data. Unlike discriminative models that classify or predict labels (e.g., 'Is this a cat?'), generative models produce original outputs that resemble the training distribution. For example, GPT-4 generates human-like text, while DALL-E creates images from textual descriptions. Generative AI is powered by deep learning, particularly transformer architectures.

Why Generative AI Exists

Traditional AI focused on perception and prediction: recognizing objects, translating languages, or recommending products. Generative AI addresses the need for creation—automating content production, aiding creativity, and enabling new forms of human-computer interaction. It powers applications like chatbots, code assistants, and art generation. The AI-900 exam tests the fundamental understanding that generative AI creates new data, not just analyzes it.

How Generative AI Works Internally

Generative models learn the probability distribution of the training data. For text, this means learning the likelihood of word sequences. During generation, the model samples from this distribution to produce new sequences. - Training Phase: The model is trained on a large corpus (e.g., billions of words). It learns to predict the next token given previous tokens. This is done using self-supervised learning, where the model masks parts of input and predicts them. - Architecture: Most modern generative models use transformers. Transformers consist of an encoder and decoder (or decoder-only for GPT). They use attention mechanisms to weigh the importance of different tokens in the context. Key components: multi-head self-attention, feed-forward neural networks, layer normalization. - Generation Phase: Given a prompt, the model generates tokens autoregressively—one token at a time. At each step, it computes a probability distribution over the vocabulary (e.g., 50,000 tokens for GPT-3) and selects a token based on sampling strategies like top-k (e.g., k=50) or top-p (e.g., p=0.9). Temperature controls randomness: lower temperature (e.g., 0.2) makes output more deterministic; higher (e.g., 0.8) increases diversity.

Key Components and Defaults

Tokenization: Text is split into tokens (subwords). GPT-3 uses byte pair encoding with a vocabulary of ~50,000 tokens.

Context Window: The maximum number of tokens the model can consider. GPT-3.5 has a context window of 4,096 tokens; GPT-4 can handle up to 32,768 tokens (8,000 for standard version).

Parameters: Number of weights. GPT-3 has 175 billion parameters; GPT-4 has reportedly 1.76 trillion (not officially confirmed).

Sampling Parameters:

Temperature: 0.7 is common default.

Top-p (nucleus sampling): 0.9 typical.

Top-k: 50 common.

Fine-tuning: Adapting a pre-trained model to a specific task using a smaller dataset. This adjusts weights slightly, not from scratch.

Configuration and Verification Commands

In Azure OpenAI Service, you interact with generative models via REST API or SDK. Example using Python:

import openai
openai.api_type = "azure"
openai.api_base = "https://your-resource.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "your-api-key"

response = openai.Completion.create(
  engine="text-davinci-003",
  prompt="What is generative AI?",
  max_tokens=200,
  temperature=0.7,
  top_p=0.9
)
print(response.choices[0].text)

To list available models:

az cognitiveservices account list-deployments --name <resource-name> --resource-group <rg>

Interaction with Related Technologies

Generative AI often works with: - Azure Cognitive Search: To provide grounding data for retrieval-augmented generation (RAG), reducing hallucinations. - Azure AI Content Safety: To filter harmful outputs. - Azure Machine Learning: For custom fine-tuning. - Power Platform: To generate content in low-code apps (e.g., Power Automate with GPT).

Example: Generative AI vs. Discriminative AI

Discriminative: A model trained to classify emails as spam or not spam. It learns decision boundaries.

Generative: A model trained to generate new spam emails that look realistic. It learns the distribution of spam emails.

Responsible AI Considerations

Generative AI can produce biased, harmful, or false content. Microsoft's Responsible AI principles include fairness, reliability, privacy, inclusivity, transparency, and accountability. For AI-900, you must know that generative AI requires careful oversight, content filters, and human review.

Exam-Relevant Details

GPT (Generative Pre-trained Transformer): A family of language models from OpenAI. Used in Azure OpenAI Service.

DALL-E: Generates images from text prompts.

Codex: Generates code (now part of GPT-4).

Azure OpenAI Service: Provides access to these models with enterprise-grade security.

Prompt Engineering: Crafting input to get desired output. Techniques include zero-shot, few-shot, and chain-of-thought prompting.

Hallucination: When the model generates plausible but incorrect information.

Temperature: Controls randomness. Lower values (e.g., 0.2) for factual tasks; higher (e.g., 0.8) for creative tasks.

Max Tokens: Limits the length of generated output. Default often 16, but can be set up to context window size.

Common Exam Traps

Trap: Generative AI can only generate text. Reality: It can generate text, images, audio, video, and code.

Trap: Generative AI is always accurate. Reality: It can hallucinate.

Trap: Generative AI does not need training data. Reality: It learns from vast datasets.

Trap: All AI is generative. Reality: Discriminative AI is also prevalent.

Summary of Key Numbers

GPT-3: 175B parameters, 4,096 token context.

GPT-4: Up to 32,768 tokens, multi-modal (text and image input).

Default temperature: 0.7.

Default top-p: 0.9.

Azure OpenAI Service: Requires Azure subscription, resource creation, and model deployment.

Conclusion

Generative AI is a powerful tool for creating content, but it must be used responsibly. For AI-900, focus on understanding its definition, capabilities, and the importance of responsible AI practices.

Walk-Through

1

Tokenize the Input Prompt

The input prompt is broken into tokens (subwords) using a tokenizer like Byte Pair Encoding (BPE). GPT-3 uses a vocabulary of ~50,000 tokens. Each token is mapped to an integer ID. For example, 'Generative AI' might become tokens ['Gener', 'ative', ' AI']. This step is invisible to the user but critical for model processing.

2

Encode Context with Attention

The token IDs are passed through embedding layers to create vector representations. The transformer's self-attention mechanism computes attention scores between every pair of tokens, allowing the model to weigh context. For example, in 'The cat sat on the mat', the word 'sat' attends to 'cat' and 'mat'. Multi-head attention (e.g., 96 heads in GPT-3) captures different relationships.

3

Predict Next Token Probabilities

The decoder (or decoder-only stack) outputs a probability distribution over the vocabulary for the next token. This is a softmax over 50,000 logits. The model selects a token using a sampling strategy: greedy (highest probability), top-k (only top k tokens considered), or top-p (cumulative probability p). Temperature scales logits before softmax. For example, temperature=0.7 flattens distribution slightly.

4

Append Token and Repeat

The selected token is appended to the input sequence. The new sequence is fed back into the model for the next token prediction. This autoregressive loop continues until the model generates an end-of-sequence token or the max_tokens limit is reached. The process is sequential, so generation time scales with output length.

5

Post-process and Return Output

The generated token IDs are decoded back into text using the tokenizer. The output may be further filtered by content safety models to block harmful content. In Azure OpenAI Service, the response includes the generated text, usage tokens (prompt and completion tokens), and optionally logprobs. The user receives the final string.

What This Looks Like on the Job

Enterprise Scenario 1: Customer Support Chatbot

A large e-commerce company deploys a GPT-4 powered chatbot via Azure OpenAI Service to handle customer inquiries. The problem: high volume of repetitive questions (order status, returns) overwhelming human agents. The solution: a chatbot that generates natural responses based on a knowledge base. Configuration: the model is fine-tuned on past support conversations and integrated with Azure Cognitive Search for retrieval-augmented generation (RAG) to ground answers in up-to-date product data. The deployment uses a temperature of 0.3 to ensure factual consistency. Scale: handles 10,000 queries per hour with average latency of 2 seconds. Common issue: hallucination—the model sometimes invents policies. Mitigation: content filters and human-in-the-loop review for sensitive queries. Misconfiguration: setting temperature too high (e.g., 0.9) leads to creative but incorrect answers, increasing escalation rates.

Enterprise Scenario 2: Code Generation for Developers

A software company uses GitHub Copilot (powered by OpenAI Codex) to assist developers. The problem: developers spend 30% of time writing boilerplate code. The solution: an AI pair programmer that suggests code in real-time within VS Code. Configuration: the model is integrated via the Copilot API, with context from the current file and open tabs. It uses a top-p of 0.9 and max tokens of 256 per suggestion. Scale: millions of developers use it daily. Performance: suggestions appear in under 500ms. Common issue: the model may suggest insecure code (e.g., SQL injection). Mitigation: developers must review all suggestions. Misconfiguration: if the model is not properly grounded with project-specific libraries, suggestions may use outdated APIs.

Enterprise Scenario 3: Marketing Content Generation

A media agency uses DALL-E 3 to generate images for ad campaigns. The problem: high cost of hiring graphic designers for initial concepts. The solution: generate multiple image variants from text prompts, then refine. Configuration: prompts are engineered with specific styles and negative prompts to avoid unwanted elements. The model generates 1024x1024 images with a quality setting of 'hd'. Scale: generates 500 images per day. Common issue: the model sometimes produces images with distorted faces or text. Mitigation: human selection and post-processing. Misconfiguration: failing to set appropriate content filters can result in inappropriate images being created.

How AI-900 Actually Tests This

What AI-900 Tests on Generative AI (Objective 5.1)

The exam focuses on the fundamental understanding of generative AI, not deep technical implementation. Key points: - Definition: Generative AI creates new content; discriminative AI classifies or predicts. - Capabilities: Text generation (GPT), image generation (DALL-E), code generation (Codex). - Azure Services: Azure OpenAI Service provides access to these models. - Responsible AI: Awareness of biases, hallucinations, and need for human oversight. - Prompt Engineering: Basic concept that input affects output.

Common Wrong Answers and Why

1.

Wrong: 'Generative AI can only generate text.' Why: Candidates confuse generative AI with language models. Reality: It also generates images, audio, video, and code.

2.

Wrong: 'Generative AI always produces accurate results.' Why: Because models seem intelligent, candidates assume correctness. Reality: Hallucinations are common.

3.

Wrong: 'Generative AI does not require training.' Why: Misunderstanding of pre-trained models. Reality: Models are trained on massive datasets.

4.

Wrong: 'Azure OpenAI Service is the only way to use generative AI in Azure.' Why: Candidates overlook Azure Machine Learning and other services. Reality: You can also fine-tune models in AML.

Specific Numbers and Terms

GPT-3: 175 billion parameters.

GPT-4: Up to 32,768 tokens context.

Temperature: Range 0-2; 0.7 default.

Top-p: 0.9 default.

Max tokens: Default 16; can be set up to model limit.

Azure OpenAI Service: Requires resource creation and model deployment.

Edge Cases and Exceptions

Zero-shot vs. few-shot: Zero-shot means no examples; few-shot provides a few examples in the prompt. Exam may test which is more likely to improve accuracy.

Fine-tuning vs. prompt engineering: Fine-tuning updates model weights; prompt engineering does not. Exam asks which is cheaper or faster.

Content filtering: Azure OpenAI Service has built-in filters for hate, violence, etc. Exam may ask about responsible AI features.

How to Eliminate Wrong Answers

If an answer says 'always' or 'never', it is likely wrong (e.g., 'Generative AI always produces accurate results').

If an answer confuses generative with discriminative (e.g., 'classifies images'), eliminate.

If an answer mentions 'training from scratch' for a specific task, it is likely wrong because most generative AI uses pre-trained models.

Focus on the word 'create' vs. 'analyze'. Generative AI creates; discriminative analyzes.

Key Takeaways

Generative AI creates new content; discriminative AI classifies or predicts.

Azure OpenAI Service provides access to GPT-4, DALL-E, and Codex models.

GPT-3 has 175 billion parameters; GPT-4 supports up to 32,768 tokens context.

Temperature controls randomness: lower for factual, higher for creative.

Top-p (nucleus sampling) and top-k are common sampling strategies.

Generative AI can hallucinate—produce plausible but incorrect information.

Responsible AI principles require content filters and human oversight.

Prompt engineering (zero-shot, few-shot) influences output quality.

Fine-tuning adapts a pre-trained model to a specific task with additional data.

Generative AI is not AGI; it is limited to patterns in training data.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Generative AI

Creates new content (text, images, etc.)

Learns the joint probability distribution P(X, Y)

Can generate data similar to training data

Examples: GPT, DALL-E, Codex

Used for content creation, chatbots, code generation

Discriminative AI

Classifies or predicts labels

Learns the conditional probability P(Y|X)

Cannot generate new data

Examples: Logistic regression, SVM, CNN classifiers

Used for spam detection, image recognition, sentiment analysis

Watch Out for These

Mistake

Generative AI is the same as artificial general intelligence (AGI).

Correct

Generative AI is a subset of AI focused on content generation, not general intelligence. AGI would perform any intellectual task, whereas generative AI is limited to patterns in its training data.

Mistake

Generative AI models understand language like humans.

Correct

Models do not understand meaning; they learn statistical patterns. They can produce coherent text without comprehension. This is why they can hallucinate.

Mistake

You can use generative AI without any data privacy concerns.

Correct

Prompts and outputs may contain sensitive data. Azure OpenAI Service processes data in compliance with Azure policies, but users must avoid sending PII. The model may also memorize training data.

Mistake

Higher temperature always gives better results.

Correct

Higher temperature increases randomness, which can be good for creativity but bad for factual tasks. Optimal temperature depends on the use case.

Mistake

Generative AI models are infallible when fine-tuned.

Correct

Fine-tuning improves performance on specific tasks but does not eliminate hallucinations or biases. The model still relies on its training data and can produce errors.

Do You Actually Know This?

Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.

Frequently Asked Questions

What is the difference between generative AI and discriminative AI?

Generative AI creates new data instances, while discriminative AI distinguishes between different types of data. For example, a generative model can write a poem, while a discriminative model can classify whether an email is spam. Generative models learn the joint probability distribution P(X,Y), whereas discriminative models learn P(Y|X).

What is temperature in generative AI?

Temperature is a parameter that controls the randomness of token selection. Lower values (e.g., 0.2) make the model more deterministic and focused, suitable for factual tasks. Higher values (e.g., 0.8) increase diversity and creativity, but may produce less coherent outputs. Default is often 0.7.

Can generative AI be used with Azure Cognitive Search?

Yes, Azure Cognitive Search can be integrated with generative AI models to implement retrieval-augmented generation (RAG). This grounds the model's responses in factual data from indexed documents, reducing hallucinations. The model generates answers based on retrieved context.

What are common pitfalls when using generative AI in enterprise?

Common pitfalls include: (1) Hallucinations—model invents facts; (2) Bias—model reproduces biases from training data; (3) Security—prompt injection attacks; (4) Cost—high token usage can increase expenses; (5) Compliance—outputs may violate regulations if not filtered. Mitigations include content filters, human review, and grounding with external data.

How does Azure OpenAI Service ensure responsible AI?

Azure OpenAI Service includes content filters that block harmful categories (hate, violence, self-harm, etc.). It also supports abuse monitoring and provides transparency documentation. Customers are required to implement their own safety measures and adhere to Microsoft's Responsible AI principles.

What is the maximum context length for GPT-4 in Azure OpenAI?

Azure OpenAI Service offers GPT-4 with an 8K context window (8,192 tokens) and a 32K context window (32,768 tokens). The 32K version can process longer documents but costs more per token. The context includes both prompt and generated tokens.

What is fine-tuning and how is it different from prompt engineering?

Fine-tuning involves updating the model's weights on a specific dataset to improve performance on a particular task. Prompt engineering involves crafting the input prompt to guide the model's output without changing the model. Fine-tuning is more expensive and requires training data, while prompt engineering is cheaper and faster.

Terms Worth Knowing

Ready to put this to the test?

You've just covered What is Generative AI? — now see how well it sticks with free AI-900 practice questions. Full explanations included, no account needed.

Done with this chapter?