AI-900Chapter 80 of 100Objective 5.3

Zero-Shot, Few-Shot, and Chain-of-Thought Prompts

How do advanced prompt engineering techniques—zero-shot, few-shot, and chain-of-thought—help you get accurate, structured, and safe outputs from LLMs? These methods are essential for the AI-900 exam's Generative AI domain (Objective 5.3) because they directly influence how you interact with large language models (LLMs) to get accurate, structured, and safe outputs. Approximately 10-15% of exam questions touch on prompt engineering, with a focus on distinguishing these techniques and knowing when to apply each. Mastering these concepts will help you optimize model performance in Azure OpenAI Service and other generative AI tools.

25 min read

Intermediate

Updated Jul 20, 2026

Reviewed by Johnson Ajibi· Senior Network & Security Engineer · MSc IT Security

Jump to a section

Explain it to me simply Where people get tripped up Test what I know Look up key terms

The Master Chef's Recipe Variations

A master chef knows the principles of cooking without ever having seen a specific dish. A customer asks for a 'savory fruit pie.' In zero-shot prompting, the chef immediately creates a pie using general knowledge: fruits, pastry, and savory seasonings, without any example. In few-shot prompting, the customer provides three examples: 'Here are recipes for apple, pear, and peach pies.' The chef studies them, then crafts a new cherry pie following the pattern. For chain-of-thought prompting, the customer says, 'Explain how you'd make a savory fruit pie step by step.' The chef verbalizes: 'First, select firm fruits like apples or pears. Then, sauté them with onions and herbs. Next, prepare a buttery crust. Finally, bake at 375°F until golden.' This step-by-step reasoning helps the chef (model) produce a better result by showing its work. The chef's knowledge (pre-training) provides the base, but the type of prompt determines how precisely the dish matches the request. Zero-shot is fastest but riskiest; few-shot is more reliable with examples; chain-of-thought is best for complex, multi-step tasks.

How It Actually Works

What Are Zero-Shot, Few-Shot, and Chain-of-Thought Prompts?

Prompt engineering is the practice of designing input text to guide a large language model (LLM) toward a desired output. The three techniques—zero-shot, few-shot, and chain-of-thought (CoT)—vary in how much context or demonstration you provide. They are not mutually exclusive; you can combine them (e.g., few-shot with chain-of-thought). The AI-900 exam expects you to understand their definitions, use cases, and trade-offs.

Zero-Shot Prompting

Zero-shot prompting means you give the model a task without any examples. The model relies entirely on its pre-training knowledge to infer the desired output. For example, you might input:

Translate the following English sentence to French: 'Hello, how are you?'

The model outputs 'Bonjour, comment allez-vous?' without ever seeing an example of English-to-French translation in the prompt. Zero-shot works well for common tasks like translation, summarization, and simple classification. However, accuracy suffers for niche or ambiguous tasks. The exam tests that zero-shot is the default mode for many Azure OpenAI deployments, but you often need few-shot for better results.

Few-Shot Prompting

Few-shot prompting provides a small number of examples (typically 2-5) in the prompt to demonstrate the task. The model learns from these examples and generalizes to new inputs. For instance:

Classify the sentiment of the following movie reviews as positive or negative.
Review: 'This movie was fantastic! I loved every minute.'
Sentiment: Positive
Review: 'A waste of time. Boring and predictable.'
Sentiment: Negative
Review: 'The acting was good, but the plot was confusing.'
Sentiment:

Here, two examples guide the model to output 'Positive' or 'Negative' for the third review. Few-shot is especially useful when the task is complex or domain-specific. The exam emphasizes that few-shot requires more tokens (cost and latency increase) but yields higher accuracy. A common wrong answer is to think more examples always improve performance—in reality, 3-5 examples often suffice; too many can confuse the model.

Chain-of-Thought Prompting

Chain-of-thought (CoT) prompting instructs the model to reason step by step before giving a final answer. This technique dramatically improves performance on arithmetic, logic, and multi-step reasoning tasks. For example:

Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
Let's think step by step.
Roger started with 5 balls.
2 cans of 3 balls each is 6 balls.
5 + 6 = 11.
The answer is 11.

The phrase 'Let's think step by step' triggers the model to produce intermediate reasoning. CoT can be combined with few-shot by providing examples that include reasoning steps. The exam tests that CoT reduces errors in mathematical and logical tasks but increases token usage. A common trap: candidates think CoT is only for math problems, but it applies to any multi-step reasoning, such as legal analysis or planning.

How These Techniques Interact with Azure OpenAI

In Azure OpenAI Service, you configure prompts in the 'System Message' and 'User Message' fields. Zero-shot is the simplest: you just write the user message. Few-shot requires adding examples in the conversation history or within the user message. CoT is typically implemented by including a reasoning instruction in the system message or user message. The exam may ask you to identify which technique is being used given a sample prompt.

Key Parameters and Defaults

When using these techniques via Azure OpenAI's API, you control the model's behavior with parameters:

temperature: Controls randomness. Lower values (e.g., 0.2) for factual tasks; higher (e.g., 0.8) for creative tasks. Default is 1.0.

max_tokens: Maximum number of tokens in the response. Default varies by model (e.g., 4096 for GPT-3.5).

top_p: Nucleus sampling; alternative to temperature. Default is 1.0.

For few-shot and CoT, you must manage token limits because examples and reasoning steps consume tokens. The exam may ask about the trade-off between accuracy and token usage.

Verification and Testing

You can test these techniques in Azure OpenAI Studio's Chat Playground. There, you can set system messages, user messages, and adjust parameters. The exam expects you to know that the Chat Playground is the primary tool for prompt engineering experimentation.

Interaction with Related Technologies

These techniques are foundational for retrieval-augmented generation (RAG), where you combine prompts with external data. For example, you might use few-shot prompting to format retrieved documents into a coherent answer. CoT can help the model reason over multiple retrieved pieces of information. The AI-900 exam covers RAG separately but expects you to understand how prompts integrate with it.

Trap Patterns on the Exam

Trap 1: Confusing zero-shot with few-shot. If the prompt contains examples, it's few-shot, even if only one example.

Trap 2: Believing that chain-of-thought always requires explicit 'step-by-step' phrasing. In practice, you can use other phrasing like 'Explain your reasoning.'

Trap 3: Assuming that more examples always improve few-shot performance. The optimal number is task-dependent; too many can exceed the context window or introduce noise.

Trap 4: Thinking that zero-shot is always inferior. For simple, well-defined tasks, zero-shot can be sufficient and more efficient.

Summary of Internal Mechanism

At a high level, LLMs process prompts by tokenizing the input, running it through transformer layers, and generating output tokens one by one. Zero-shot relies solely on the model's learned representations. Few-shot adds examples that the model attends to via self-attention, effectively conditioning the output on the pattern. CoT forces the model to allocate attention to intermediate reasoning steps, which improves accuracy on tasks requiring multiple inferential hops. The exam does not require deep knowledge of transformers but expects you to understand these behavioral differences.

Walk-Through

Identify the Task Type

Determine whether the task is simple (e.g., classification, translation) or complex (e.g., multi-step reasoning, arithmetic). Simple tasks may only need zero-shot. Complex tasks benefit from few-shot or chain-of-thought. For the exam, you must recognize that tasks like 'summarize this article' are zero-shot-appropriate, while 'solve this math problem' typically requires CoT.

Choose Zero-Shot for Simple Tasks

If the task is common and unambiguous, use zero-shot. Write a clear instruction without examples. For example: 'Translate to Spanish: Good morning.' The model uses its pre-training to respond. This minimizes token usage and latency. But beware: if the task is novel or requires specific formatting, zero-shot may fail. The exam tests that zero-shot is the baseline approach.

Add Examples for Few-Shot

If zero-shot yields poor results, add 2-5 examples demonstrating input-output pairs. Place them before the actual query. Ensure examples are representative and cover edge cases. The model will infer the pattern. For instance, for sentiment classification, provide examples of positive and negative reviews. The exam emphasizes that the number of examples should be kept small to avoid exceeding token limits.

Instruct Step-by-Step for CoT

For reasoning tasks, append an instruction like 'Let's think step by step' or 'Explain your reasoning.' The model will generate intermediate steps before the final answer. This improves accuracy but increases token usage. You can combine with few-shot by providing examples that include reasoning steps. The exam expects you to know that CoT is especially effective for math and logic.

Test and Iterate in Playground

Use Azure OpenAI Studio's Chat Playground to experiment with different prompts and parameters. Adjust temperature, max_tokens, and top_p. Compare outputs from zero-shot, few-shot, and CoT versions. For the exam, remember that the Playground is the recommended tool for prompt engineering, and you can save and share prompt templates.

What This Looks Like on the Job

Enterprise Scenario 1: Customer Support Ticket Classification

A large e-commerce company uses Azure OpenAI to automatically classify customer support tickets into categories like 'Billing', 'Technical Issue', 'Returns', etc. Initially, they used zero-shot prompting: 'Classify the following ticket: ...' Accuracy was only 70% because ticket language varied widely. They switched to few-shot prompting with 3 examples per category. Accuracy rose to 92%. They also added a chain-of-thought instruction for ambiguous tickets: 'Explain your reasoning before giving the category.' This helped the model handle edge cases like 'I want to return my order because the payment failed.' The system processes 10,000 tickets daily. Token costs increased by 15% due to examples and reasoning, but the reduction in manual reclassification saved $50,000/month. Misconfiguration: when they tried 10 examples per category, the context window was exceeded, causing truncation and errors. They learned to keep examples concise and limit to 5.

Enterprise Scenario 2: Legal Document Summarization

A law firm uses GPT-4 to summarize contracts. They need summaries that highlight key clauses and potential risks. Zero-shot prompts like 'Summarize this contract' produced generic summaries missing important details. They adopted few-shot prompting with 2 example summaries that included risk analysis. They also used chain-of-thought: 'Read the contract step by step. Identify each clause, then summarize.' This forced the model to list clauses before the final summary. The result was a structured output with bullet points. The firm processes 500 contracts per week. Token usage per contract increased from 2,000 to 4,000 tokens, but accuracy improved from 60% to 95%. A common pitfall was forgetting to clear the conversation history, causing the model to 'remember' previous contracts and produce inconsistent outputs. They now reset the session for each contract.

Enterprise Scenario 3: Code Generation for Internal Tools

A software company uses Azure OpenAI to generate code snippets for internal tools. Zero-shot prompts like 'Write a Python function to sort a list' worked well for simple tasks. But for complex tasks like 'Write a function to parse a CSV file and generate a report', zero-shot often produced buggy code. They used few-shot with 2 examples of similar functions, including docstrings and error handling. They also added chain-of-thought: 'First, outline the steps: read CSV, validate data, compute statistics, generate report. Then write the code.' This reduced bugs by 80%. They deploy the generated code after review. The team learned that zero-shot is fine for boilerplate, but few-shot+CoT is essential for complex logic. One mistake was using too high temperature (0.9), which produced creative but incorrect code. They now use temperature 0.2 for code generation.

How AI-900 Actually Tests This

What AI-900 Tests on This Topic

The exam objective 5.3 (Generative AI) includes prompt engineering techniques. Specifically, you must:

Define zero-shot, few-shot, and chain-of-thought prompting.

Identify which technique is being used given a sample prompt.

Understand the trade-offs: accuracy vs. token usage vs. latency.

Know when to use each technique based on task complexity.

Recognize that Azure OpenAI Studio's Chat Playground is the tool for experimentation.

Common Wrong Answers and Why Candidates Choose Them

'Few-shot prompting means providing no examples.' This is the opposite. Candidates confuse 'few' with 'zero.' Remember: zero-shot = zero examples; few-shot = a few examples.

'Chain-of-thought prompting is only for math problems.' While CoT is famous for math, it works for any reasoning task. The exam may use a logic puzzle or legal reasoning example. Don't limit CoT to arithmetic.

'More examples always improve few-shot performance.' Adding too many examples can exceed the context window or introduce conflicting patterns. The optimal number is typically 3-5. The exam may test that exceeding the context window causes errors.

'Zero-shot is always worse than few-shot.' For simple, common tasks, zero-shot is sufficient and more efficient. The exam expects you to recognize scenarios where zero-shot is appropriate.

Specific Numbers, Values, and Terms on the Exam

The phrase 'Let's think step by step' is the canonical chain-of-thought trigger.

Token limits: GPT-3.5 has a 4096-token context window; GPT-4 has 8192 or 32768. Few-shot and CoT consume tokens, so you must stay within limits.

Temperature: default 1.0; lower for deterministic tasks.

The Chat Playground is the primary tool for prompt engineering in Azure.

Edge Cases and Exceptions

Zero-shot with system message: You can use a system message to set context without examples. This is still zero-shot if no examples are provided.

Few-shot with zero examples in user message but examples in conversation history: This counts as few-shot because the model sees past interactions.

Chain-of-thought without explicit instruction: Some models internalize CoT from training, but for Azure OpenAI, you must explicitly prompt for step-by-step reasoning.

How to Eliminate Wrong Answers

If the prompt contains any example input-output pairs, it's few-shot (not zero-shot).

If the prompt asks the model to reason step by step, it's chain-of-thought (even if also few-shot).

If the task is multi-step reasoning, eliminate zero-shot as too simplistic.

If the question asks about token usage, remember that few-shot and CoT increase token consumption.

Key Takeaways

Zero-shot: no examples; few-shot: 2-5 examples; chain-of-thought: step-by-step reasoning.

Chain-of-thought is triggered by phrases like 'Let's think step by step'.

Few-shot improves accuracy for complex tasks but increases token usage.

Zero-shot is sufficient for simple, common tasks and is more efficient.

The optimal number of few-shot examples is typically 3-5; more can cause issues.

Use Azure OpenAI Studio's Chat Playground to experiment with prompts.

Temperature controls randomness: lower for factual tasks, higher for creative.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Zero-Shot

No examples provided in the prompt.

Fastest response and lowest token usage.

Works well for common, unambiguous tasks.

May fail for niche or complex tasks.

Best for simple classification, translation, summarization.

Few-Shot

Provides 2-5 examples in the prompt.

Higher accuracy for specialized tasks.

Increases token usage and latency.

Requires careful example selection.

Best for domain-specific classification, formatting, or style transfer.

Few-Shot

Focuses on pattern matching from examples.

Does not explicitly require reasoning steps.

Effective for tasks with clear input-output mappings.

Less effective for multi-step reasoning problems.

Token usage increases with number of examples.

Chain-of-Thought

Focuses on step-by-step reasoning.

Explicitly asks for reasoning before answer.

Effective for math, logic, and planning tasks.

Can be combined with few-shot for even better results.

Token usage increases due to reasoning steps.

Watch Out for These

Mistake

Zero-shot prompting means the model has never seen the task before.

Correct

Zero-shot means the prompt contains no examples. The model has seen countless similar tasks during pre-training; it just doesn't receive in-context examples.

Mistake

Few-shot prompting requires at least 5 examples.

Correct

Few-shot can work with as few as 1 example. The term 'few' is relative; typical usage is 2-5, but even 1 example constitutes few-shot.

Mistake

Chain-of-thought prompting always improves accuracy.

Correct

CoT improves accuracy for tasks requiring reasoning, but for simple tasks, it can add unnecessary tokens and sometimes confuse the model. It should be used selectively.

Mistake

The order of examples in few-shot prompts doesn't matter.

Correct

The order can significantly affect output. Models tend to be biased toward the last examples (recency bias). Placing diverse examples early and consistent ones later is a best practice.

Mistake

You cannot combine few-shot and chain-of-thought.

Correct

You can combine them by providing examples that include step-by-step reasoning. This is often more effective than either alone.

Do You Actually Know This?

Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.

Frequently Asked Questions

What is the difference between zero-shot and few-shot prompting?

Zero-shot prompting provides no examples in the prompt; the model relies solely on its pre-training. Few-shot prompting includes a few examples (usually 2-5) to demonstrate the task. For AI-900, remember that zero-shot is faster and cheaper but may be less accurate for specialized tasks. Few-shot is more accurate but uses more tokens.

When should I use chain-of-thought prompting?

Use chain-of-thought prompting for tasks that require multi-step reasoning, such as arithmetic problems, logical puzzles, or complex decision-making. It forces the model to show its work, reducing errors. Avoid it for simple tasks where it adds unnecessary tokens. The exam may present a math problem and ask which technique to use—CoT is the answer.

Can I combine few-shot and chain-of-thought?

Yes, you can combine them by providing examples that include step-by-step reasoning. For instance, in a math problem, give two examples with the reasoning steps, then ask the model to solve a new problem using similar reasoning. This often yields the best accuracy. The exam expects you to know that techniques are not mutually exclusive.

How does temperature affect these prompting techniques?

Temperature controls randomness. For zero-shot and few-shot with factual tasks, use a low temperature (e.g., 0.2) to get deterministic outputs. For creative tasks, use higher temperature (e.g., 0.8). For chain-of-thought, lower temperature helps keep reasoning focused. The exam may ask about parameter settings for specific scenarios.

What is the context window and why does it matter?

The context window is the maximum number of tokens the model can process in a single request. For GPT-3.5, it's 4096 tokens; for GPT-4, up to 32768. Few-shot and chain-of-thought consume tokens from this window. If your prompt exceeds the limit, it gets truncated, leading to errors. The exam may test that you must manage token usage.

How do I test different prompting techniques in Azure?

Use Azure OpenAI Studio's Chat Playground. You can set system messages, user messages, and parameters like temperature. You can also save and share prompt templates. The exam expects you to know that the Playground is the primary tool for prompt engineering experimentation.

What is an example of a zero-shot prompt on the exam?

A zero-shot prompt might be: 'Translate the following sentence to French: Hello.' No examples are given. The exam will ask you to identify the technique. If the prompt contains any example, it's few-shot. If it includes 'step by step', it's chain-of-thought.

Terms Worth Knowing

Artificial intelligence Computer vision Generative AI Machine learning Natural language processing Responsible AI

Ready to put this to the test?

You've just covered Zero-Shot, Few-Shot, and Chain-of-Thought Prompts — now see how well it sticks with free AI-900 practice questions. Full explanations included, no account needed.

Try AI-900 practice questions Back to all chapters

Done with this chapter?

Azure OpenAI Deployments and API Access

System Messages and Grounding Prompts

See the full AI-900 study guide