This chapter covers Azure Document Intelligence, a key AI service for extracting structured data from documents. It is tested under AI-900 Objective 3.4 (Computer Vision workloads). Expect 2-3 questions touching on its capabilities, prebuilt models, and use cases. You must understand the difference between Document Intelligence and simple OCR, the available prebuilt models, and how custom models are trained.
Imagine a large company's mailroom receives thousands of diverse documents daily: invoices, contracts, receipts, forms. A traditional OCR clerk can only read printed text line by line, outputting raw characters without understanding context. Azure Document Intelligence is like hiring a team of specialized clerks, each trained on specific document types. One clerk handles invoices: she knows where to find the total amount, due date, and vendor name. Another handles contracts: she extracts parties, effective dates, and signature blocks. They don't just read text; they understand the document's structure and semantics. The lead clerk (the service) routes each document to the right specialist based on the document's layout and content. If a document is a new type, the team can be trained with just five examples (custom model). They work asynchronously: you drop a document in a batch, and they return structured JSON with confidence scores. They also handle handwriting, barcodes, and checkboxes. This is far beyond raw OCR—it's intelligent extraction that adapts to your business forms.
What is Azure Document Intelligence?
Azure Document Intelligence (formerly Form Recognizer) is a cloud-based AI service that applies machine learning to extract text, key-value pairs, tables, and document structure from documents. It goes beyond optical character recognition (OCR) by understanding the layout and semantics of forms and documents. The service is part of Azure AI services (formerly Azure Cognitive Services) and is designed to automate data entry from invoices, receipts, contracts, and custom forms.
Why it Exists
Traditional OCR extracts raw text but does not understand context. For example, OCR might output "$1,234.56" and "Invoice Date: 2023-01-15" as mere strings. Document Intelligence identifies that "$1,234.56" is the total amount and "2023-01-15" is the invoice date. This semantic understanding reduces manual data entry and enables downstream automation.
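To make the contrast concrete, here is a small Python sketch. The dictionary only imitates the general idea of a structured result: its shape is simplified and the confidence values are invented for illustration, so treat it as an assumption rather than a verbatim service response.

```python
# Raw OCR: a flat list of strings with no indication of what each one means.
ocr_lines = ["Invoice Date: 2023-01-15", "$1,234.56"]

# Structured extraction (simplified, illustrative shape): each value is
# attached to a named field and carries a confidence score (made-up numbers).
structured_fields = {
    "InvoiceDate": {"content": "2023-01-15", "confidence": 0.98},
    "InvoiceTotal": {"content": "$1,234.56", "confidence": 0.95},
}


def lookup(fields, name):
    """Return the extracted text for a named field, or None if it is absent."""
    field = fields.get(name)
    return field["content"] if field else None


print(lookup(structured_fields, "InvoiceTotal"))  # $1,234.56
```

With the raw OCR list, a program would have to guess which string is the total. With named fields, it can ask for the total directly.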
How It Works Internally
Document Intelligence uses a combination of OCR (Read API) and deep learning models. The process involves:
- Layout Analysis: The service first analyzes the document's structure, identifying regions like paragraphs, tables, and form fields. It uses a layout model trained on millions of documents.
- Key-Value Extraction: For forms, it identifies keys (labels) and their associated values. For example, on a tax form, "Name:" is a key, and the filled-in name is the value.
- Table Extraction: It detects tables and extracts rows, columns, and cell content, preserving spatial relationships.
- Prebuilt Models: Specialized models for invoices, receipts, identity documents, and business cards are fine-tuned to extract specific fields (e.g., invoice total, receipt merchant name).
- Custom Models: You can train models with as few as five sample documents (for template-based forms) or more for complex forms. Custom models learn the layout and field locations.
Key Components, Values, and Defaults
- API Versions: The service has several API versions. The examples in this chapter use 2023-07-31, which corresponds to v3.1; v3.0 corresponds to 2022-08-31. Newer and preview versions also exist. The exam may reference "v3.0" or "v3.1".
- Prebuilt Models:
  - prebuilt-invoice: Extracts invoice fields (e.g., VendorName, CustomerName, InvoiceTotal, DueDate).
  - prebuilt-receipt: Extracts receipt fields (e.g., MerchantName, TransactionDate, Total).
  - prebuilt-idDocument: Extracts from driver licenses and passports (e.g., FirstName, LastName, DocumentNumber, DateOfBirth).
  - prebuilt-businessCard: Extracts contact info from business cards.
  - prebuilt-read: OCR only, outputs text with bounding boxes.
  - prebuilt-layout: Extracts text, tables, and selection marks (checkboxes).
- Custom Models: Two types: template (fixed layout, needs at least 5 samples) and neural (complex layouts, needs more samples). Neural models are more accurate but slower.
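- Confidence Scores: Each extracted field includes a confidence value (0-1). The service reports the score and leaves the cut-off to your application; this guide treats 0.5 as a typical inclusion threshold, and the exam may ask about thresholds.
- Pricing: Based on pages processed. The free F0 tier covers 500 pages per month; the Standard S0 tier is billed per page.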
Configuration and Verification Commands
Using the Azure CLI, you can create a Document Intelligence resource:
az cognitiveservices account create --name myDocInt --resource-group myRG --kind FormRecognizer --sku F0 --location eastus
To analyze a document via REST API:
curl -v -X POST "https://myDocInt.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-invoice:analyze?api-version=2023-07-31" -H "Ocp-Apim-Subscription-Key: <key>" -H "Content-Type: application/json" --data-ascii "{\"urlSource\":\"https://example.com/invoice.pdf\"}"
The response includes an operation-location header; you poll that URL for results:
curl "https://myDocInt.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-invoice/analyzeResults/<resultId>?api-version=2023-07-31" -H "Ocp-Apim-Subscription-Key: <key>"
Interaction with Related Technologies
Document Intelligence integrates with Azure Logic Apps, Power Automate, and custom applications via REST API or SDKs (C#, Python, Java, JavaScript). It can output results to Azure Blob Storage, trigger downstream workflows, or feed into Azure Cognitive Search for indexing. It is often used alongside Azure Functions for serverless document processing pipelines.
Exam-Relevant Details
Document Intelligence is part of Cognitive Services under the Form Recognizer label (older name). The exam may use either term.
It supports PDF, TIFF, PNG, JPEG, and BMP formats.
The Read model extracts text only; Layout adds tables and selection marks.
Custom models require a minimum of 5 sample documents for template models; neural models need more (at least 5 but better with 50+).
The service can handle handwriting via the Read API (prebuilt-read) but not in all prebuilt models.
Billing is per page; each page analyzed incurs a cost, even if blank.
Common Exam Traps
Confusing Document Intelligence with OCR: Document Intelligence does OCR plus semantic extraction.
Thinking custom models need hundreds of samples: template models only need 5.
Assuming all prebuilt models extract the same fields: each is specialized.
Overlooking the difference between prebuilt-read and prebuilt-layout: layout includes tables and selection marks.
Create Document Intelligence Resource
In the Azure portal, search for 'Form Recognizer' and create a resource. Choose the 'Free F0' tier for testing (500 pages per month) or 'Standard S0' for production. Select a region (e.g., East US). Note the endpoint and key. Alternatively, use Azure CLI: `az cognitiveservices account create --kind FormRecognizer --sku F0 --name myDocInt --resource-group myRG --location eastus`. The resource must be in the same region as your storage if using managed identity.
Upload Documents to Blob Storage
For batch processing, store documents in Azure Blob Storage in a container. Use a folder structure if needed. Ensure the documents are in supported formats: PDF, TIFF, PNG, JPEG, BMP. Each page is counted separately. For large volumes, consider using a SAS token for secure access. The service can read directly from a URL or from a blob with public access or SAS.
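Before uploading a batch, it can help to screen out files the service will not accept. The following is a minimal sketch that checks file extensions against the formats listed above. The exact extension spellings (for example `.tif` versus `.tiff`) and the helper names are assumptions made for illustration.

```python
from pathlib import Path

# Formats listed in this chapter; the extension spellings are an assumption.
SUPPORTED_EXTENSIONS = {".pdf", ".tif", ".tiff", ".png", ".jpg", ".jpeg", ".bmp"}


def is_supported(filename):
    """Return True if the file extension matches a format listed above."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames):
    """Separate filenames into (supported, unsupported) lists, keeping order."""
    supported = [name for name in filenames if is_supported(name)]
    unsupported = [name for name in filenames if not is_supported(name)]
    return supported, unsupported


ready, rejected = split_by_support(["invoice.PDF", "scan.tiff", "notes.txt"])
```

An extension check says nothing about whether the file content is valid, so it is only a first filter before upload.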
Choose a Model (Prebuilt or Custom)
Select the appropriate model based on document type. For invoices, use `prebuilt-invoice`. For receipts, `prebuilt-receipt`. For identity documents, `prebuilt-idDocument`. For custom forms, train a custom model using at least 5 samples. The model ID is used in the API call. For layout analysis without field extraction, use `prebuilt-layout`. For pure OCR, use `prebuilt-read`.
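The selection rule above can be sketched as a small lookup. The model IDs come from this chapter; the document-type labels (`"invoice"`, `"ocr"`, and so on) and the function name are hypothetical choices made for this example.

```python
# Model IDs as named in this chapter; the dictionary keys are hypothetical labels.
PREBUILT_MODELS = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
    "layout": "prebuilt-layout",
    "ocr": "prebuilt-read",
}


def choose_model(document_type, custom_model_id=None):
    """Pick a model ID: a prebuilt model if one matches, else the custom model."""
    key = document_type.strip().lower()
    if key in PREBUILT_MODELS:
        return PREBUILT_MODELS[key]
    if custom_model_id:
        return custom_model_id
    raise ValueError(f"No prebuilt model for {document_type!r}; train a custom model")


print(choose_model("Invoice"))               # prebuilt-invoice
print(choose_model("w2", "my-w2-model"))     # my-w2-model
```

Falling back to a custom model only when no prebuilt model matches mirrors the advice in this step: use a prebuilt model where one exists for the document type.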
Analyze Document via API
Send a POST request to the endpoint: `https://<endpoint>/formrecognizer/documentModels/<modelId>:analyze?api-version=2023-07-31`. Include the subscription key in the header. The request body specifies the document URL (or base64 content for small files). The response includes an `operation-location` header with a URL to poll for results. The analysis is asynchronous; results are available after processing.
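As a rough illustration of the request described above, this Python sketch assembles the URL, headers, and JSON body using only the standard library. It builds the request without sending it, so it runs without an Azure resource. The function name is hypothetical, and the endpoint, key placeholder, and document URL are the example values used earlier in this chapter.

```python
import json
import urllib.request

API_VERSION = "2023-07-31"


def build_analyze_request(endpoint, model_id, key, document_url):
    """Build (but do not send) the POST request that starts an analysis."""
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
        f"{model_id}:analyze?api-version={API_VERSION}"
    )
    # The body points the service at a document it can fetch by URL.
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )


request = build_analyze_request(
    "https://myDocInt.cognitiveservices.azure.com",
    "prebuilt-invoice",
    "<key>",
    "https://example.com/invoice.pdf",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The sketch stops short of that step, which needs a real endpoint and key.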
Poll for Results and Parse JSON
Poll the operation-location URL with GET requests until the `status` field is 'succeeded'. The response JSON contains extracted fields with confidence scores. For example, an invoice response includes `fields` with `InvoiceTotal`, `VendorName`, etc. Each field has `type` (string, number, date) and `value`. Use confidence thresholds (e.g., >0.8) to filter low-confidence extractions. The JSON structure varies by model.
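The polling and filtering steps might look like the sketch below. To keep it runnable offline, the HTTP call is replaced by a caller-supplied `fetch_status` function (a hypothetical stand-in for the GET on the operation-location URL), and the sample responses are invented and heavily simplified compared with a real result.

```python
import time


def poll_until_done(fetch_status, interval_seconds=1.0, max_attempts=30):
    """Call fetch_status() until the operation succeeds, fails, or times out.

    fetch_status is a caller-supplied function that performs the GET on the
    operation-location URL and returns the parsed JSON as a dict.
    """
    for _ in range(max_attempts):
        result = fetch_status()
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("Analysis failed")
        time.sleep(interval_seconds)  # not finished yet; wait and retry
    raise TimeoutError("Analysis did not finish in time")


def high_confidence_fields(result, threshold=0.8):
    """Return {field name: content} for fields scoring above the threshold."""
    documents = result.get("analyzeResult", {}).get("documents", [])
    kept = {}
    for document in documents:
        for name, field in document.get("fields", {}).items():
            if field.get("confidence", 0.0) > threshold:
                kept[name] = field.get("content")
    return kept


# Simulated responses standing in for real HTTP calls (values are made up).
responses = iter([
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {"documents": [{"fields": {
        "VendorName": {"type": "string", "content": "Contoso", "confidence": 0.95},
        "InvoiceTotal": {"type": "number", "content": "1234.56", "confidence": 0.62},
    }}]}},
])
final = poll_until_done(lambda: next(responses), interval_seconds=0)
print(high_confidence_fields(final))  # {'VendorName': 'Contoso'}
```

Capping the number of attempts keeps a stuck operation from blocking the caller forever. Fields that fall below the threshold are dropped here, but a real pipeline would more likely send them for review.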
Enterprise Scenario 1: Accounts Payable Automation
A large retailer receives thousands of invoices daily from suppliers. Previously, clerks manually entered invoice data into the ERP system. Using Document Intelligence with the prebuilt-invoice model, they automate extraction of vendor name, invoice number, date, and total amount. The service processes PDFs uploaded to Azure Blob Storage. Extracted data is sent to Azure Logic Apps, which validates against purchase orders and triggers payment approval workflows. At scale, they use the S0 tier with 10,000 pages per month. Common issues: invoices with complex layouts or handwriting require custom model training. Misconfiguration (e.g., using prebuilt-receipt for invoices) yields incorrect fields.
Scenario 2: Mortgage Document Processing
A bank processes mortgage applications with forms like W-2, pay stubs, and tax returns. They use custom models trained on their specific forms. The bank uploads scanned documents (TIFF) to blob storage. Document Intelligence extracts key fields (e.g., employer name, income, tax year). The extracted JSON is integrated with a loan origination system via a custom application. Performance considerations: neural custom models require more samples but handle varied layouts. A common mistake is training on too few samples: 5 is only the minimum for template models, and stopping at that minimum for forms with varied layouts leads to poor accuracy. They monitor confidence scores and set up manual review for scores below 0.7.
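The manual-review rule in this scenario can be sketched as a simple triage step. The 0.7 cut-off comes from the scenario; the field names, contents, and scores in the sample are made up, and the input shape is a simplified assumption.

```python
REVIEW_THRESHOLD = 0.7  # the cut-off the bank uses in this scenario


def triage(fields, threshold=REVIEW_THRESHOLD):
    """Split fields into (accepted, needs_review) by confidence score."""
    accepted, needs_review = {}, {}
    for name, field in fields.items():
        # Scores below the threshold go to a human reviewer.
        target = needs_review if field["confidence"] < threshold else accepted
        target[name] = field["content"]
    return accepted, needs_review


# Hypothetical extraction result for a pay stub; names and scores are made up.
sample = {
    "EmployerName": {"content": "Contoso Ltd", "confidence": 0.93},
    "Income": {"content": "85,000", "confidence": 0.58},
}
accepted, needs_review = triage(sample)
print(accepted)      # {'EmployerName': 'Contoso Ltd'}
print(needs_review)  # {'Income': '85,000'}
```

Keeping the low-confidence values instead of discarding them lets the reviewer confirm or correct what the model proposed.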
Scenario 3: Expense Report Digitization
A consulting firm uses the prebuilt-receipt model to digitize employee expense receipts. Employees upload photos of receipts via a mobile app. The service extracts merchant name, date, and total. The output is integrated with an expense management tool. At 5000 receipts per month, they use the S0 tier. They discovered that low-resolution images cause extraction failures; they enforce minimum resolution (300 DPI). Another issue: receipts in non-English languages require the language parameter set appropriately (e.g., de for German). The exam may test that the service supports multiple languages via the Read API.
AI-900 Objective 3.4: Computer Vision Workloads
This topic falls under "Describe computer vision workloads on Azure" and specifically "Identify document intelligence solutions." The exam expects you to:
Recognize that Document Intelligence (Form Recognizer) extracts structured data from forms.
Differentiate between prebuilt models: invoice, receipt, ID document, business card.
Know that custom models require at least 5 sample documents.
Understand that it supports handwriting via the Read API.
Common Wrong Answers and Why
"Document Intelligence only works with printed text." Wrong: It also handles handwriting (via prebuilt-read). Many candidates overlook this.
"You need hundreds of documents to train a custom model." Wrong: Template custom models need only 5. Neural models need more but not hundreds.
"All prebuilt models extract the same fields." Wrong: Each is specialized. For example, prebuilt-invoice extracts InvoiceTotal; prebuilt-receipt extracts Total (no invoice fields).
"Document Intelligence is the same as OCR." Wrong: It includes OCR plus semantic extraction.
Specific Numbers and Terms on the Exam
Minimum samples for custom template model: 5.
Supported file formats: PDF, TIFF, PNG, JPEG, BMP.
Prebuilt model names: prebuilt-invoice, prebuilt-receipt, prebuilt-idDocument, prebuilt-businessCard, prebuilt-layout, prebuilt-read.
API version often referenced: 2023-07-31 (v3.1); v3.0 corresponds to 2022-08-31.
Free tier: 500 pages per month (F0).
Edge Cases and Exceptions
If a document has multiple pages, each page is billed separately.
The service can analyze documents stored in Azure Blob Storage using a SAS URI.
For security, use managed identities instead of keys in production.
The prebuilt-read model does not extract key-value pairs; it only returns text with bounding boxes.
How to Eliminate Wrong Answers
If a question asks about extracting 'total amount' from a receipt, look for prebuilt-receipt, not prebuilt-invoice.
If a question mentions 'training with 5 samples', it's a custom template model.
If a question says 'extracts tables and selection marks', it's prebuilt-layout.
If a question says 'extracts only text', it's prebuilt-read.
Azure Document Intelligence is a Cognitive Service for extracting structured data from documents, formerly called Form Recognizer.
Prebuilt models include invoice, receipt, ID document, business card, layout, and read.
Custom template models require a minimum of 5 sample documents.
Document Intelligence supports PDF, TIFF, PNG, JPEG, and BMP formats.
The service can extract handwriting via the prebuilt-read or layout model.
Each page processed incurs a cost; the free tier offers 500 pages per month.
The API version used in this chapter is 2023-07-31 (v3.1); v3.0 corresponds to 2022-08-31.
Confidence scores are provided for each extracted field; typically a threshold of 0.5 is used for inclusion.
These come up on the exam all the time. Here's how to tell them apart.
Azure Document Intelligence (Form Recognizer)
Extracts structured data (key-value pairs, tables) from forms.
Includes prebuilt models for invoices, receipts, ID documents, business cards.
Supports custom model training with as few as 5 samples.
Outputs JSON with field names and confidence scores.
Billed per page, with higher cost than OCR alone.
Azure Computer Vision OCR (Read API)
Extracts raw text only, with bounding boxes.
No prebuilt specialized models; single OCR capability.
No custom model training; uses a general OCR engine.
Outputs lines and words with text and bounding box coordinates.
Lower cost per page; suitable for simple text extraction.
Mistake
Document Intelligence requires hundreds of training documents for custom models.
Correct
Template custom models need only 5 sample documents. Neural models need more (at least 5, but 50+ recommended). The exam often tests the '5 minimum' for template models.
Mistake
Document Intelligence is just OCR.
Correct
Document Intelligence includes OCR but also provides semantic extraction (key-value pairs, tables, selection marks). It understands the document structure and meaning, not just raw text.
Mistake
All prebuilt models extract the same fields.
Correct
Each prebuilt model is specialized. For example, `prebuilt-invoice` extracts invoice-specific fields like `InvoiceTotal` and `VendorName`; `prebuilt-receipt` extracts receipt-specific fields like `MerchantName` and `Total`. They are not interchangeable.
Mistake
Document Intelligence cannot process handwritten text.
Correct
The `prebuilt-read` model (and layout model) can extract handwriting. However, prebuilt models like invoice and receipt may not handle handwriting well; they are optimized for printed text.
Mistake
Document Intelligence only works with documents in English.
Correct
Document Intelligence supports multiple languages through the Read API. The prebuilt models have limited language support (e.g., invoice model supports English, Spanish, French). For full language support, use `prebuilt-read` with the language parameter.
OCR (Optical Character Recognition) extracts raw text from images, returning words and lines with coordinates. Azure Document Intelligence includes OCR but also applies machine learning to understand the document's structure and semantics. It extracts key-value pairs (e.g., 'Invoice Total: $100'), tables, and selection marks. It also offers prebuilt models for specific document types and the ability to train custom models. For the exam, remember that Document Intelligence goes beyond simple OCR by providing structured output.
For a custom template model (fixed-layout forms), you need at least 5 sample documents. For a custom neural model (complex or varied layouts), you need at least 5 samples but more (50+) yield better accuracy. The exam often tests the '5 minimum' for template models. If a question says 'hundreds of samples', it's likely referring to neural models or a misconception.
Yes, but only through the prebuilt-read or prebuilt-layout models, which include handwriting recognition. The specialized prebuilt models (invoice, receipt, etc.) are optimized for printed text and may not handle handwriting well. For the exam, know that handwriting support exists but is not guaranteed in all models.
It supports PDF, TIFF, PNG, JPEG, and BMP. For PDF, it processes each page individually. The exam may ask about supported formats; remember these five. TIFF files with multiple frames are also supported, each frame treated as a page.
The free F0 tier allows 500 pages per month. After that, you pay per page. The exam may test this number. The S0 standard tier has no fixed limit but incurs costs per page.
Use prebuilt-invoice for invoices (contains fields like InvoiceTotal, VendorName, CustomerName, DueDate). Use prebuilt-receipt for receipts (fields like MerchantName, TransactionDate, Total). They are not interchangeable. The exam may give a scenario and ask which model to use.
prebuilt-read extracts only text (words, lines) with bounding boxes. prebuilt-layout also extracts tables, selection marks (checkboxes), and document structure (paragraphs, headings). Both can handle handwriting. For layout understanding, use prebuilt-layout.