Azure Document Intelligence (Form Recognizer)

Azure Document Intelligence (formerly Form Recognizer) is a key AI service for extracting structured data from documents. For the AZ-204 exam, this topic appears in about 5-8% of questions, focusing on integrating the service, choosing the right model (prebuilt vs. custom), and understanding its capabilities. You'll learn how to build solutions that automate data extraction from invoices, receipts, and custom forms, which is critical for passing the 'Integrate' domain objective 5.1.

Updated Jul 20, 2026

Reviewed by Johnson Ajibi · Senior Network & Security Engineer · MSc IT Security

Document Intelligence as a Skilled Tax Auditor

Thousands of tax forms from different companies, each with its own layout, await a skilled tax auditor. The auditor doesn't just take a picture of the form; she extracts specific data: names, amounts, dates, and signatures. She uses a magnifying glass (OCR) to read the text, then applies her knowledge of tax rules (prebuilt models) to understand where each piece of data belongs. If a form is unusual, she uses a flexible approach (custom model) to learn its structure. She can also classify forms into piles (document classification) before auditing them. Like Document Intelligence, she can process forms in batches (batch processing) and handle forms with missing fields (optional fields). Her work is automated, but she needs training data to learn new form types (custom model training). The key is that she doesn't just store images; she extracts structured data that can be fed into a database or workflow. In the cloud, this auditor works 24/7, scales to handle millions of forms, and provides confidence scores for each extraction, so you know which data might need human review.

How It Actually Works

What is Azure Document Intelligence?

Azure Document Intelligence (formerly known as Azure Form Recognizer) is a cloud-based AI service that uses machine learning to extract key-value pairs, tables, and text from documents. It is part of Azure AI services (it was previously grouped under Azure Applied AI Services) and combines Optical Character Recognition (OCR) with deep learning models to understand document structure. The service is designed to handle both structured and unstructured documents, such as forms, invoices, receipts, and identity documents.

Why It Exists

Before Document Intelligence, extracting data from documents required manual data entry or complex custom OCR solutions. Manual entry is slow, error-prone, and doesn't scale. Custom OCR solutions require significant development effort to handle different layouts. Document Intelligence solves this by providing prebuilt models for common document types and the ability to train custom models for unique forms, all through a simple REST API or SDK.

How It Works Internally

Document Intelligence processes documents in several steps:

1. Document Ingestion: The document is submitted via API as an image, PDF, or TIFF file. PDFs and TIFFs can have up to 2,000 pages, and files can be up to 500 MB on the standard (S0) tier.

2. OCR: The service runs its Read OCR engine to extract text, including handwritten text (for some models and languages). Results are reported per word and per line with bounding polygons, and each word carries a confidence score.

3. Layout Analysis: The document is analyzed to identify structural elements like tables, selection marks, and field labels. This step uses deep learning models trained on a large document corpus.

4. Model Application: The extracted text and layout are passed to the selected model (prebuilt or custom). The model identifies fields and their values based on its training.

5. Post-processing: The service returns a JSON response containing extracted data, confidence scores, and metadata. For custom models, you can inspect results and relabel documents in Azure AI Document Intelligence Studio as part of a human review loop.

Key Components

Prebuilt Models: Ready-to-use models for invoices, receipts, identity documents, business cards, and more. Each model has a specific set of fields. For example, the invoice model extracts fields like InvoiceId, InvoiceDate, VendorName, and InvoiceTotal.

Custom Models: Trained on your own documents. In the v3.x API, custom extraction models are trained on labeled data (the older v2.x API also allowed training without labels). The service supports two types: template models (for fixed layouts) and neural models (for varied layouts).

Document Analysis: The general-purpose prebuilt-read and prebuilt-layout models return OCR text and layout (tables, selection marks) without extracting document-specific fields. Useful for custom processing.

Composed Models: Allows combining multiple custom models into a single endpoint. The service automatically selects the best model for each document based on the form type.

Document Classification: A custom classification model that categorizes documents before routing them to the appropriate extraction model.
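
The classify-then-route idea described above can be sketched in a few lines of Python. This is only an illustration: the document type labels, the model IDs other than 'prebuilt-invoice', and the 0.8 cutoff are all assumptions made for the example, not values defined by the service.

```python
# Hypothetical mapping from classifier labels to extraction model IDs.
# Only "prebuilt-invoice" is a real model ID; the others are placeholders.
ROUTES = {
    "invoice": "prebuilt-invoice",
    "claim-form": "claims-neural-v1",
    "intake-form": "intake-template-v1",
}


def route_document(doc_type, confidence, min_confidence=0.8):
    """Return the extraction model ID for a classified document.

    Returns None when the label is unknown or the classifier's confidence
    is below the cutoff, so the caller can send the file to manual triage.
    """
    if confidence < min_confidence:
        return None
    return ROUTES.get(doc_type)
```

For example, route_document("invoice", 0.95) returns "prebuilt-invoice", while a low-confidence or unknown label returns None and can be queued for a person to sort.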

Key Values, Defaults, and Timers

API Version: This chapter targets the stable version 2023-07-31 (v3.1). The 2024-02-29-preview version was a preview of v4.0, which later reached general availability as 2024-11-30. The exam focuses on the stable version.

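Pricing Tier: The free tier (F0) allows 500 pages per month. The Standard tier (S0) is pay-as-you-go and billed per 1,000 pages, with the rate depending on the model: Read is the cheapest (about $1.50 per 1,000 pages at the time of writing), while prebuilt and custom models cost more. Rates change, so check the Azure pricing page before estimating costs.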

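Limits: Files up to 500 MB on the S0 tier and 4 MB on the F0 tier (the older v2.1 API capped files at 50 MB); up to 2,000 pages per PDF or TIFF, with F0 processing only the first two pages; image dimensions between 50 x 50 and 10,000 x 10,000 pixels; up to 500 pages of training data (50 MB in total) for custom template models.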

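Confidence Threshold: The service does not apply a threshold. It returns a confidence score between 0 and 1 for each extracted field, and your application decides the cutoff. 0.8 (80%) is a common starting point; raise it for critical fields.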

Training Time: Custom template model training takes about 5-10 minutes for up to 500 pages. Neural model training can take up to an hour.

Read API: For OCR only, the Azure AI Vision Read API suits general images, while Document Intelligence's prebuilt-read model is optimized for text-heavy documents.
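
Because size and format limits differ by tier and API version, it can help to check a file locally before uploading it. The following is a minimal sketch, assuming the caller passes in the size limit that applies to their resource; the function name and the extension list (taken from the formats this chapter mentions) are illustrative, not part of any SDK.

```python
import os

# Extensions for the formats this chapter lists; adjust for your API version.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".pdf"}


def preflight(filename, size_bytes, max_bytes):
    """Return a list of problems found before upload; an empty list means OK.

    max_bytes is supplied by the caller because the limit depends on the
    pricing tier and API version of the target resource.
    """
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > max_bytes:
        problems.append(f"file is {size_bytes} bytes, limit is {max_bytes}")
    return problems
```

Rejecting a file locally avoids spending a request (and a slice of your rate limit) on a call that the service would refuse anyway.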

Configuration and Verification Commands

Using the Azure CLI, you can create a Document Intelligence resource:

az cognitiveservices account create --name myDocIntel --resource-group myRG --kind FormRecognizer --sku S0 --location eastus

To get the endpoint and key:

az cognitiveservices account keys list --name myDocIntel --resource-group myRG
az cognitiveservices account show --name myDocIntel --resource-group myRG --query "properties.endpoint"

In code, you use the SDK (C# example):

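using System;
using System.Collections.Generic;
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;

// endpoint (string), apiKey (string), and invoiceUri (System.Uri) are assumed to be defined elsewhere.
var client = new DocumentAnalysisClient(new Uri(endpoint), new AzureKeyCredential(apiKey));
// Submit the invoice and wait for the long-running analysis to finish.
AnalyzeDocumentOperation operation = await client.AnalyzeDocumentFromUriAsync(WaitUntil.Completed, "prebuilt-invoice", invoiceUri);
AnalyzeResult result = operation.Value;
// Print every extracted field together with its confidence score.
foreach (AnalyzedDocument document in result.Documents)
{
    foreach (KeyValuePair<string, DocumentField> field in document.Fields)
    {
        Console.WriteLine($"{field.Key}: {field.Value.Content} (confidence: {field.Value.Confidence})");
    }
}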

How It Interacts with Related Technologies

Azure Logic Apps: You can trigger document processing when a new file is added to Blob Storage, then call Document Intelligence to extract data and store it in a database.

Azure Functions: Use a function to process documents on demand or in batch, integrating with other services like Azure AI Search (formerly Cognitive Search) for indexing.

Power Automate: Prebuilt connectors allow low-code document processing workflows.

Azure AI Search (formerly Cognitive Search): Extracted data can be indexed to enable full-text search over document content.

Azure AI Document Intelligence Studio: A web UI to test models, train custom models, and label documents.

Best Practices

Use the prebuilt models for common document types to avoid training overhead.

For custom forms, start with a neural model if layouts vary; use template models for fixed layouts.

Always check confidence scores and implement fallback logic for low-confidence fields.

For large volumes, use batch processing with Azure Batch or Logic Apps.

Encrypt sensitive data at rest and in transit (Document Intelligence encrypts data at rest by default).

Walk-Through

Create Document Intelligence Resource

First, create an Azure AI services resource of kind 'FormRecognizer' (the resource kind still uses the old name) in the Azure portal or via CLI. Choose a region (e.g., East US) and pricing tier (S0 for production). Once created, note the endpoint and API key. These credentials are used to authenticate API calls. For lower latency, place the resource in the same region as the other Azure services that call it.

Select or Train a Model

Choose a prebuilt model (e.g., 'prebuilt-invoice') if your document matches a common type. For custom forms, you need to train a model. Use the Document Intelligence Studio to upload sample documents and label fields. For template models, provide at least 5 samples of the same layout. For neural models, provide as many varied samples as possible. Training creates a model ID that you use in API calls.
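
If you train through the REST API rather than the Studio, the build call takes a small JSON body. The sketch below assembles that body under the assumption that the 2023-07-31 build operation accepts modelId, buildMode, and azureBlobSource.containerUrl; verify the field names against the REST reference for your API version. The helper names and the example model ID are hypothetical.

```python
def choose_build_mode(fixed_layout):
    """Template models suit fixed layouts; neural models suit varied ones."""
    return "template" if fixed_layout else "neural"


def build_training_request(model_id, build_mode, container_sas_url):
    """Assemble the JSON body for a custom model build request.

    container_sas_url should point at the blob container that holds the
    labeled training documents.
    """
    if build_mode not in ("template", "neural"):
        raise ValueError("build_mode must be 'template' or 'neural'")
    return {
        "modelId": model_id,
        "buildMode": build_mode,
        "azureBlobSource": {"containerUrl": container_sas_url},
    }
```

A call such as build_training_request("intake-template-v1", choose_build_mode(True), sas_url) yields a body with buildMode set to "template"; the model ID you choose here is the one you later pass to the analyze call.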

Submit Document for Analysis

Call the 'AnalyzeDocument' API with the document URL or base64-encoded content. Specify the model ID (e.g., 'prebuilt-invoice' or your custom model ID). The service returns a JSON response containing the extracted fields, their values, confidence scores, and bounding boxes. The response is structured per page and per document.
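
At the REST level, submitting a document by URL comes down to one POST request. The sketch below only builds the URL, headers, and body so that it runs without credentials; it assumes the 2023-07-31 route under /formrecognizer/documentModels and the urlSource body field, and the function name is made up for the example. The service normally answers with HTTP 202 and an Operation-Location header that you poll for the result.

```python
import json
from urllib.parse import quote


def build_analyze_request(endpoint, api_key, model_id, document_url,
                          api_version="2023-07-31"):
    """Return (url, headers, body) for an analyze call on a document URL."""
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
        f"{quote(model_id, safe='')}:analyze?api-version={api_version}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,  # key-based authentication
        "Content-Type": "application/json",
    }
    body = json.dumps({"urlSource": document_url})
    return url, headers, body
```

The returned triple can be handed to any HTTP client. Keeping request construction separate from sending makes the model ID and API version easy to unit test.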

Parse and Process the Response

The API response includes an array of 'documents', each containing 'fields'. Iterate through the fields to extract values. For each field, check the confidence score; if below a threshold (e.g., 0.8), flag it for human review. Use the bounding box coordinates to highlight the extracted text in the original document for verification.
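
The review step described above can be sketched as a small function that splits fields into accepted and needs-review lists. It assumes the input is the analyzeResult portion of the JSON response, with 'documents', 'fields', 'content', and 'confidence' keys; the sample data is made up for illustration.

```python
def split_by_confidence(analyze_result, threshold=0.8):
    """Split extracted fields into (accepted, review) lists.

    Each entry is (document_index, field_name, content, confidence).
    Fields without a confidence value are treated as 0.0 and sent to review.
    """
    accepted, review = [], []
    for index, doc in enumerate(analyze_result.get("documents", [])):
        for name, field in doc.get("fields", {}).items():
            confidence = field.get("confidence") or 0.0
            entry = (index, name, field.get("content"), confidence)
            (accepted if confidence >= threshold else review).append(entry)
    return accepted, review


# Made-up sample shaped like an analyze result for one invoice.
sample = {
    "documents": [{
        "docType": "invoice",
        "fields": {
            "InvoiceId": {"type": "string", "content": "INV-100", "confidence": 0.97},
            "VendorName": {"type": "string", "content": "Contoso", "confidence": 0.62},
        },
    }]
}
accepted, review = split_by_confidence(sample)
```

With the sample above, InvoiceId lands in accepted and VendorName in review, because 0.62 is below the 0.8 cutoff.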

Implement Error Handling and Retry

Handle potential errors like invalid document format, exceeding size limits, or throttling (HTTP 429). Implement exponential backoff retry logic. For large-scale processing, use asynchronous analysis with 'WaitUntil.Started' and poll for completion. Also, consider using Azure Queue Storage to decouple document submission from processing.
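
The retry and polling logic described above can be sketched independently of any HTTP library. In this example the caller supplies the function that performs the request; the Throttled exception, the function names, and the default delays are assumptions chosen for illustration. The status strings follow the operation states used by the REST API ("notStarted", "running", "succeeded", "failed").

```python
import random
import time


class Throttled(Exception):
    """Raised by the caller-supplied function when the service returns HTTP 429."""


def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on Throttled with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Throttled:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay * 0.1))  # up to 10% jitter


def poll_until_done(get_status, interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll get_status() until the operation reaches a terminal state."""
    for _ in range(max_polls):
        status = get_status()
        if status in ("succeeded", "failed"):
            return status
        sleep(interval)
    raise TimeoutError("analysis did not finish within the polling budget")
```

Passing sleep as a parameter keeps both helpers testable without real waiting. If the service sends a Retry-After header with a 429 response, honoring it is usually a better choice than a computed delay.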

What This Looks Like on the Job

Enterprise Scenario 1: Automated Invoice Processing

A large retail company receives thousands of invoices daily from suppliers in various formats. They use Document Intelligence with the prebuilt invoice model to extract line items, totals, and due dates. The extracted data is fed into an ERP system for automated payment processing. In production, they process over 10,000 invoices per day with a 95% extraction accuracy. They implemented a human review queue for invoices with low confidence scores. Common pitfalls include missing fields due to unusual layouts or poor image quality. To mitigate, they preprocess documents (e.g., convert to high-res PDF) and use custom models for problematic suppliers.

Enterprise Scenario 2: Insurance Claims Processing

An insurance company uses custom models to extract data from claim forms. They trained a neural model on 500 varied claim forms. The model extracts policy numbers, dates, and descriptions. The extracted data is used to auto-populate claims systems. They encountered issues with handwritten text (e.g., signatures) and used the OCR's handwriting recognition feature. They also use document classification to route claims to different processing pipelines. Misconfiguration of the confidence threshold led to too many false positives, so they set it to 0.9 for critical fields.

Enterprise Scenario 3: Medical Record Digitization

A hospital network digitizes patient intake forms. They use a custom template model because the forms have a fixed layout. The model extracts patient name, date of birth, and medical history. They process over 5,000 forms per week. Performance considerations include using a dedicated S0 tier and caching model IDs. They encountered issues with forms that had missing fields; they handled this by marking fields as optional during training. Common misconfiguration: not retraining models when forms change, leading to degraded accuracy. They now have a quarterly retraining schedule.

How AZ-204 Actually Tests This

The AZ-204 exam tests Azure Document Intelligence under objective 5.1 (Integrate with Cognitive Services). Key areas:

1. Choosing the right model: Prebuilt vs. custom. The exam will ask when to use a prebuilt model (e.g., for invoices) vs. a custom model (e.g., for company-specific forms).

2. Understanding model types: Template vs. neural. Template models require fixed layouts; neural models handle varied layouts. The exam tests this distinction.

3. API usage: The 'AnalyzeDocument' API and its parameters, especially the model ID.

4. Confidence scores: How to use them and what threshold to set. The service returns scores but applies no threshold itself; 0.8 is a common cutoff chosen in application code.

5. Limits: Maximum file size (500 MB on S0, 4 MB on F0), pages per document (2,000 for PDF and TIFF), and training data requirements (minimum 5 samples for template).

Common Wrong Answers:

Choosing 'Form Recognizer' over 'Document Intelligence' (old name, but the exam uses the new name).

Assuming prebuilt models can be retrained (they cannot; you must use custom models for custom data).

Thinking custom models require labeling all fields (they don't; you can label only the fields you need).

Confusing the 'Read' API with Document Intelligence (Read only does OCR, not field extraction).

Edge Cases:

Documents with multiple pages: The API returns results per page. For invoices spanning multiple pages, the model may not link fields across pages.

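Handwritten text: Handwriting recognition comes from the underlying OCR and is limited to supported languages, so accuracy on handwritten fields varies by model and language. Test with your own samples before relying on it.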

Tables: Custom models can extract tables, but the structure must be labeled during training.
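
For the multi-page edge case, it can help to see which page each extracted field came from. A minimal sketch, assuming each field in the response carries a 'boundingRegions' list whose entries have a 'pageNumber'; the function name is illustrative.

```python
from collections import defaultdict


def fields_by_page(document):
    """Map page number -> sorted field names found on that page.

    A field that spans several pages appears under each of them.
    """
    pages = defaultdict(set)
    for name, field in document.get("fields", {}).items():
        for region in field.get("boundingRegions", []):
            pages[region["pageNumber"]].add(name)
    return {page: sorted(names) for page, names in sorted(pages.items())}
```

Grouping this way makes it easy to spot, for example, an invoice total that was found only on the last page while the header fields came from the first.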

Exam Tips:

If a question mentions 'varied layouts', think 'neural model'.

If a question mentions 'fixed layout', think 'template model'.

The exam may ask about the 'composed model' feature for routing to the correct custom model.

Know that 0.8 is a commonly used confidence threshold, and that the threshold is set in your application, not by the service.

Remember that Document Intelligence is now the official name, but 'Form Recognizer' may appear in legacy questions.

Key Takeaways

Azure Document Intelligence is the new name for Azure Form Recognizer; the exam uses 'Document Intelligence'.

Prebuilt models are for common documents (invoices, receipts, ID documents) and require no training.

Custom models come in two types: template (fixed layout) and neural (varied layout).

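Minimum training samples: 5 labeled documents for a custom model. Neural models generalize better with more varied samples, so aim well above the minimum (50 or more where you can).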

Maximum document size: 500 MB on the standard (S0) tier and 4 MB on the free (F0) tier; up to 2,000 pages for PDF/TIFF.

The service returns confidence scores but applies no threshold; 0.8 is a common cutoff that you set in application code.

Use the 'AnalyzeDocument' API with the model ID to extract data.

Document classification models can route documents to the correct custom model via composed models.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Prebuilt Model

Ready to use for common document types like invoices, receipts, and business cards.

No training required; just call the API with the model ID.

Extracts a fixed set of fields defined by Microsoft.

Cannot be customized or retrained.

Ideal for standard documents with predictable layouts.

Custom Model

Trained on your own documents to extract specific fields.

Requires manual labeling of sample documents (at least 5 for template, more for neural).

Extracts only the fields you label.

Can be retrained with new samples if the form changes.

Best for company-specific forms or documents with unique layouts.

Watch Out for These

Mistake

Document Intelligence can extract data from any document without training.

Correct

Prebuilt models only work for specific document types (invoices, receipts, etc.). For custom forms, you must train a custom model. The generic 'Read' API only extracts text, not structured fields.

Mistake

Custom template models can handle documents with varying layouts.

Correct

Template models require a fixed layout. For varying layouts, use neural models, which are trained on diverse samples and can adapt to different structures.

Mistake

You need to label every field in a document for custom model training.

Correct

You only need to label the fields you want to extract. Unlabeled fields are ignored. The model learns to find the labeled fields based on their relative position or content.

Mistake

Document Intelligence stores documents permanently after processing.

Correct

Documents are temporarily stored for processing and are deleted after analysis (typically within 24 hours). You are responsible for storing the original documents if needed.

Mistake

The prebuilt invoice model extracts all fields from any invoice.

Correct

The prebuilt model extracts a defined set of fields (e.g., InvoiceId, VendorName, InvoiceTotal). If your invoice has custom fields, you need a custom model.

Frequently Asked Questions

What is the difference between Azure Document Intelligence and the Read API?

The Read API (part of Azure AI Vision, formerly Computer Vision) performs OCR to extract text from images, but does not extract structured fields like key-value pairs or tables. Document Intelligence builds on OCR by applying models to extract specific fields (e.g., invoice total) and understand document structure. For the exam, if you need to extract text only, use Read; if you need structured data, use Document Intelligence.

How do I train a custom model in Document Intelligence?

Use the Document Intelligence Studio or the REST API. For a template model: upload at least 5 sample documents of the same layout, label the fields you want to extract, and train. For a neural model: upload varied samples (5 is the minimum, but 50 or more gives better coverage of layout variation), label fields, and train. The training process creates a model ID. You can then call the AnalyzeDocument API with that model ID.

What are the pricing tiers for Document Intelligence?

There are two tiers: Free (F0), with 500 pages per month, and Standard (S0), with pay-as-you-go pricing per 1,000 pages analyzed. The rate depends on the model: Read is the cheapest (about $1.50 per 1,000 pages at the time of writing), while prebuilt and custom models cost more, and some model training is billed separately. Rates change, so check the Azure pricing page. Storage and other services you use are billed on top.

Can Document Intelligence process handwritten text?

Yes, for supported languages. Handwriting is recognized by the underlying OCR (the same engine behind the Read model), so prebuilt models such as invoice can pick up handwritten values, though accuracy varies. For custom models, include handwritten samples in the training data if your forms contain handwriting.

What is a composed model in Document Intelligence?

A composed model is a collection of custom models combined under a single model ID (up to 100 in older API versions; newer versions and the S0 tier allow more). When you submit a document, the service automatically selects the best model to analyze it based on the form type. This is useful when you have multiple form types and want a single API endpoint.

How can I handle low confidence scores in extracted data?

You should check the confidence score for each field in the API response. If a field's confidence is below your threshold (e.g., 0.8), flag it for human review. You can also implement a fallback process, such as sending the document to a manual data entry queue or using a different model.

What are the input format requirements for Document Intelligence?

Supported formats: JPEG, PNG, BMP, TIFF, and PDF (text-based or scanned). For PDF and TIFF, up to 2,000 pages per document. Maximum file size is 500 MB on the standard (S0) tier and 4 MB on the free (F0) tier. For best results, use high-resolution images (300 DPI) and ensure text is clearly visible.
