AI-900 · Chapter 88 of 100 · Objective 5.3

Azure AI Search for RAG Applications

This chapter covers Azure AI Search in the context of Retrieval-Augmented Generation (RAG) applications, a critical topic for the AI-900 exam under Domain 5 (Generative AI). You will learn how Azure AI Search enables LLMs to ground responses in your own data, reducing hallucinations and improving accuracy. This area accounts for approximately 10-15% of exam questions, focusing on understanding the search service's role, its indexing pipeline, and how it integrates with Azure OpenAI Service. By the end, you'll be able to describe the components, configuration, and best practices for building a RAG solution.

25 min read

Intermediate

Updated May 31, 2026

Reviewed by Johnson Ajibi · Senior Network & Security Engineer · MSc IT Security


RAG as a Researcher with Indexed Library

Imagine a researcher (the LLM) who has a vast general knowledge but needs to answer specific questions about a company's internal policies. Instead of relying only on memory, the researcher is given a personal librarian (Azure AI Search) who maintains a meticulously organized card catalog (the search index). When a question arrives, the researcher doesn't guess—he asks the librarian: "Find all cards related to 'expense reimbursement policy'." The librarian instantly retrieves the top 5 most relevant cards, each pointing to a specific document and page. The researcher then reads those documents (the retrieved chunks) and synthesizes an answer grounded in the actual text. Without the librarian, the researcher might guess incorrectly or say "I don't know." The librarian's catalog is built in advance by indexing every document—extracting key terms, embedding them into vectors, and storing metadata. When a query comes, the librarian uses both keyword search (like scanning the catalog's subject headings) and vector search (like finding conceptually similar cards) to fetch the best matches. This hybrid approach ensures high recall and precision. The entire process is mechanistic: the librarian never interprets the question; she just matches patterns. The researcher then does the interpretation. This separation of retrieval and generation is the core of RAG, and Azure AI Search is the librarian that makes it fast, scalable, and accurate.

How It Actually Works

What is Azure AI Search and Why Does It Exist?

Azure AI Search (formerly Azure Cognitive Search) is a cloud search-as-a-service solution that provides full-text search, vector search, and hybrid search capabilities. In the context of RAG, it serves as the retrieval engine that supplies relevant documents or text chunks to an LLM. The LLM then uses these retrieved pieces to generate a grounded answer. Without a retrieval system, an LLM relies solely on its training data, which may be outdated or lack domain-specific information. RAG solves this by fetching fresh, relevant content from a search index at inference time.

How It Works Internally: The Indexing and Query Pipeline

The process has two main phases: indexing and querying.

Indexing Phase:
1. Data Ingestion: Documents (PDFs, Word, HTML, etc.) are pulled into Azure AI Search through a data source connector (e.g., Azure Blob Storage, Cosmos DB).
2. Document Cracking: The service opens each file and extracts its text and metadata. Where layout matters (headings, tables, scanned pages), Azure AI Document Intelligence can be used to extract that structure.
3. Text Splitting (Chunking): Documents are split into smaller chunks, typically 500-1000 tokens, with overlap (e.g., 100 tokens) to preserve context across chunk boundaries. This is usually done with the built-in Text Split skill or with custom logic.
4. Vectorization: Each chunk is converted into a vector embedding using a text-embedding model (e.g., Azure OpenAI's text-embedding-ada-002). The embedding is a high-dimensional vector (1536 dimensions for ada-002) that captures semantic meaning.
5. Index Creation: The chunks, their vectors, and metadata are stored in the search index. The index includes fields like id, content, content_vector, source_file, page_number. The content_vector field is configured with a vector search algorithm (e.g., HNSW, Hierarchical Navigable Small World) for efficient similarity search.
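
The chunking step can be sketched in a few lines of Python. This is a minimal illustration, not the service's own splitter: the function name chunk_tokens is hypothetical, and whitespace-separated words stand in for real model tokens.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=100):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Assumption: words approximate tokens; a real pipeline would use the
# embedding model's tokenizer instead.
words = [f"w{i}" for i in range(1200)]
chunks = chunk_tokens(words, chunk_size=500, overlap=100)
print([len(c) for c in chunks])  # [500, 500, 400]
```

Each window starts 400 tokens after the previous one, so the last 100 tokens of one chunk reappear at the start of the next and a sentence that straddles a boundary is still retrievable in full.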

Querying Phase:
1. User Query: The user submits a question (e.g., "What is the expense reimbursement policy?").
2. Query Vectorization: The query is converted into a vector using the same embedding model used during indexing.
3. Search Execution: Azure AI Search performs a hybrid search:
   - Keyword search: Uses BM25 ranking to find chunks containing the exact words "expense," "reimbursement," "policy."
   - Vector search: Uses cosine similarity to find chunks with vectors close to the query vector.
   - Hybrid fusion: Combines results using Reciprocal Rank Fusion (RRF) to produce a unified ranked list.
4. Result Retrieval: The top K results (e.g., 5) are returned, each containing the chunk text and metadata.
5. LLM Generation: The chunks are injected into a prompt template (e.g., "Context: {chunk1} {chunk2} ... Question: {query} Answer:") and sent to an LLM (e.g., GPT-4). The LLM generates an answer grounded in the provided context.
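
The fusion step is easier to picture with a small example. The sketch below implements the standard RRF formula, where each document's score is the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is the value commonly used in the literature and is assumed here; the chunk IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["chunk-7", "chunk-2", "chunk-9"]  # hypothetical BM25 order
vector_hits = ["chunk-2", "chunk-4", "chunk-7"]   # hypothetical vector order
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['chunk-2', 'chunk-7', 'chunk-4', 'chunk-9']
```

Chunks that appear in both lists rise to the top, which is why hybrid search tends to favor results that are strong on both exact wording and meaning. RRF uses only rank positions, so the raw BM25 and cosine scores never need to be on the same scale.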

Key Components, Values, Defaults, and Timers

Search Service Tier: Free, Basic, Standard (S1, S2, S3), and Storage Optimized (L1, L2). For production RAG, S1 or higher is recommended. The Free tier is limited to 3 indexes and 50 MB storage.

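Index Size Limits: Storage is allocated per partition and depends on the tier and on when the service was created. On services created before the 2024 capacity increase, S1 provides 25 GB per partition and S2 provides 100 GB per partition, with up to 12 partitions per service; newer services get substantially more. Check the current service limits before sizing an index.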

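Vector Search Dimensions: Each vector field has a maximum number of dimensions. The limit was 2048 when vector search launched and has been raised since, so check the current service limits. Azure OpenAI text-embedding-ada-002 uses 1536 dimensions, well within it.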

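HNSW Parameters: m (default 4, allowed range 4 to 10) controls the number of neighbors per node; efConstruction (default 400) controls how many candidates are considered while the graph is built, which affects index quality; efSearch (default 500) controls how many candidates are considered at query time, which affects search accuracy.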

Semantic Search: An optional feature that uses language understanding to improve ranking. It re-ranks the top 50 results from the initial search using a semantic model.

Indexers: Automate the indexing pipeline. They can run on a schedule (e.g., every 5 minutes) or on demand. The max indexing interval is 24 hours.

Data Sources: Supported connectors include Azure Blob Storage, Azure SQL Database, Cosmos DB, SharePoint, and more.

Skillsets: Use AI enrichment (e.g., OCR, entity recognition, key phrase extraction) during indexing. This is part of Azure AI Search's cognitive skills.

Quotas: Free tier: 3 indexes, 3 indexers, 3 data sources. S1: 50 indexes, 50 indexers, 50 data sources.

Configuration and Verification Commands

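You can configure Azure AI Search via the Azure portal, the REST API, an SDK, or the Azure CLI. The CLI (az search) manages the service itself: creating it, scaling it, and handling keys. Indexes, data sources, and indexers are data-plane objects with no az search subcommands, so the examples below use the CLI for the service and curl against the REST API for the rest. The files index.json, datasource.json, and indexer.json are placeholder definitions you supply.

# Create a search service
az search service create --name myservice --resource-group myrg --sku basic

# Fetch an admin key and set up the REST endpoint
KEY=$(az search admin-key show --service-name myservice --resource-group myrg --query primaryKey -o tsv)
BASE=https://myservice.search.windows.net
VER=api-version=2023-11-01

# Create an index (schema in index.json)
curl -X PUT "$BASE/indexes/myindex?$VER" -H "api-key: $KEY" -H "Content-Type: application/json" -d @index.json

# Create a data source (type azureblob; connection string and container go in datasource.json)
curl -X PUT "$BASE/datasources/myblob?$VER" -H "api-key: $KEY" -H "Content-Type: application/json" -d @datasource.json

# Create an indexer (indexer.json references myblob and myindex)
curl -X PUT "$BASE/indexers/myindexer?$VER" -H "api-key: $KEY" -H "Content-Type: application/json" -d @indexer.json

# Run an indexer
curl -X POST "$BASE/indexers/myindexer/run?$VER" -H "api-key: $KEY" -H "Content-Length: 0"

To verify indexing status:

curl "$BASE/indexers/myindexer/status?$VER" -H "api-key: $KEY"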

How It Interacts with Related Technologies

Azure OpenAI Service: Provides the embedding model for vectorization and the LLM for generation. The search service and Azure OpenAI are combined in a RAG pattern, often through Azure AI Foundry (formerly Azure AI Studio) or custom orchestration (e.g., LangChain, Semantic Kernel).

Azure AI Document Intelligence: Used to extract text, layout, and tables from complex documents (e.g., scanned PDFs) before chunking.

Azure AI Foundry (formerly Azure AI Studio): Provides a no-code interface to build RAG apps, connecting Azure AI Search as a data source and Azure OpenAI as the model.

Azure Functions or Logic Apps: Can orchestrate the RAG pipeline, handling pre-processing and post-processing.

Azure Cosmos DB: Often used to store chat history and metadata alongside the search index.

Walk-Through

Provision Azure AI Search Service

Create a search service in your Azure subscription. Choose the tier based on expected volume and performance needs. For RAG, S1 is typical. Configure networking (public or private endpoint) and authentication (API key or Microsoft Entra ID, formerly Azure AD). This is the foundational step; without a service, no indexing or querying can occur.

Prepare and Ingest Source Documents

Upload your documents (PDFs, Word files, etc.) to a data source like Azure Blob Storage. Ensure documents are in a supported format and are not encrypted. The search service will connect to this storage via a data source definition, which includes connection string and container path. For large volumes, consider partitioning data into multiple containers.

Create an Index Schema

Define the index structure in JSON, specifying fields for ID, chunk content, vector embedding, and metadata (e.g., source, page number). The vector field must have a 'dimensions' property (e.g., 1536) and a 'vectorSearchProfile' pointing to an HNSW algorithm configuration. This schema determines how data is stored and queried.
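
As an illustration, the schema described above might look like the following when built as a Python dictionary and serialized to JSON. The property names follow the 2023-11-01 REST API; the index name, the profile name vector-profile, the algorithm name hnsw-config, and the helper undefined_profiles are hypothetical. The field names are the ones used earlier in this chapter.

```python
import json

index_definition = {
    "name": "myindex",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {
            "name": "content_vector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,  # must match the embedding model's output size
            "vectorSearchProfile": "vector-profile",
        },
        {"name": "source_file", "type": "Edm.String", "filterable": True},
        {"name": "page_number", "type": "Edm.Int32", "filterable": True},
    ],
    "vectorSearch": {
        "algorithms": [
            {
                "name": "hnsw-config",
                "kind": "hnsw",
                "hnswParameters": {
                    "m": 4,
                    "efConstruction": 400,
                    "efSearch": 500,
                    "metric": "cosine",
                },
            }
        ],
        "profiles": [{"name": "vector-profile", "algorithm": "hnsw-config"}],
    },
}


def undefined_profiles(definition):
    """Return vector fields that point at a profile the index never declares."""
    declared = {p["name"] for p in definition["vectorSearch"]["profiles"]}
    return [
        f["name"]
        for f in definition["fields"]
        if "vectorSearchProfile" in f and f["vectorSearchProfile"] not in declared
    ]


print(json.dumps(index_definition, indent=2))
print(undefined_profiles(index_definition))  # [] when every profile is declared
```

The small check at the end catches a common slip: a vector field that names a profile missing from the vectorSearch section, which the service would reject when the index is created.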

Configure an Indexer with Skillset

Set up an indexer that automates the pipeline from data source to index. Attach a skillset if you need AI enrichment (e.g., OCR, key phrase extraction). Chunking happens in the skillset, typically with the Text Split skill, where you define chunk size and overlap. Embeddings are generated in the skillset as well, either with the built-in Azure OpenAI embedding skill (integrated vectorization) or with a custom skill that calls another model. Run the indexer to populate the index.
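
A scheduled indexer definition could be assembled like this. It is a sketch under stated assumptions: the helper schedule_interval and the skillset name myskillset are hypothetical, while the property names (dataSourceName, targetIndexName, skillsetName, schedule.interval) follow the REST API. Schedules are written as ISO 8601 durations, and the service accepts intervals from 5 minutes up to 24 hours.

```python
import json


def schedule_interval(minutes):
    """Format a run interval as an ISO 8601 duration such as PT2H or PT1H30M."""
    if not 5 <= minutes <= 1440:
        raise ValueError("interval must be between 5 minutes and 24 hours")
    days, rest = divmod(minutes, 1440)
    hours, mins = divmod(rest, 60)
    duration = "P" + (f"{days}D" if days else "")
    if hours or mins:
        duration += "T" + (f"{hours}H" if hours else "") + (f"{mins}M" if mins else "")
    return duration


indexer_definition = {
    "name": "myindexer",
    "dataSourceName": "myblob",
    "targetIndexName": "myindex",
    "skillsetName": "myskillset",  # hypothetical skillset holding the split and embedding skills
    "schedule": {"interval": schedule_interval(120)},
}

print(json.dumps(indexer_definition, indent=2))
print(schedule_interval(5), schedule_interval(90), schedule_interval(1440))  # PT5M PT1H30M P1D
```

Validating the interval before sending the definition gives a clearer error than a rejected request, and it keeps the 5-minute and 24-hour bounds visible in one place.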

Implement Hybrid Search Query

Build a query that combines keyword and vector search. Send a POST request to the index's search endpoint with 'search' for the full-text query and 'vectorQueries' for the vector query (early preview API versions called this 'vectors'); 'searchMode' ('any' or 'all') controls how the keyword terms are matched. The response contains a single ranked list. In a RAG app, you retrieve the top K results (e.g., 5) and pass the 'content' field to the LLM prompt. Test the query to ensure relevant chunks are returned.
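
A hybrid request body might be built as follows. This is a sketch that only constructs the JSON payload; it does not call the service. It assumes the stable 2023-11-01 REST API, which names the vector parameter vectorQueries, and the field names used earlier in this chapter. The function name build_hybrid_query and the three-number query vector are illustrative only; a real vector would come from the embedding model and have 1536 dimensions.

```python
import json


def build_hybrid_query(question, query_vector, top_k=5):
    """Return the JSON body for a combined keyword + vector search."""
    return {
        "search": question,  # full-text (BM25) part of the query
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": query_vector,     # embedding of the same question
                "fields": "content_vector",
                "k": top_k,                 # nearest neighbors to fetch
            }
        ],
        "select": "content,source_file,page_number",
        "top": top_k,  # size of the fused result list
    }


# The body would be POSTed to
#   https://<service>.search.windows.net/indexes/<index>/docs/search?api-version=2023-11-01
# with an api-key header (placeholders, not real values).
body = build_hybrid_query("What is the expense reimbursement policy?", [0.12, -0.03, 0.58])
print(json.dumps(body, indent=2))
```

Because the same question is sent both as text and as a vector, the service can run both searches and fuse them; dropping either the search or the vectorQueries property turns the request into a pure vector or pure keyword query.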

What This Looks Like on the Job

Enterprise Scenario 1: Customer Support Knowledge Base

A large insurance company uses Azure AI Search to power a RAG-based customer support chatbot. They have thousands of PDF policy documents and claim-handling guides. The problem: agents spent minutes searching for answers, and the previous chatbot hallucinated incorrect policy details. Solution: They ingested all documents into Azure Blob Storage, created an index with vector embeddings from text-embedding-ada-002, and set up an indexer to run nightly. The chatbot's LLM (GPT-4) is prompted with the top 5 retrieved chunks. Results: First-call resolution improved by 40%, and hallucination rates dropped below 1%. Common pitfall: Not chunking properly—if chunks are too large (e.g., entire 50-page document), the LLM context window fills with irrelevant text, degrading accuracy. They optimized chunk size to 512 tokens with 100-token overlap.

Enterprise Scenario 2: Internal HR Policy Assistant

A multinational corporation deployed a RAG assistant for employees to query HR policies (benefits, leave, code of conduct). They used Azure AI Search with hybrid search because employees often used different terminology (e.g., "vacation" vs. "annual leave"). The vector search captured semantic similarity, while keyword search ensured exact matches for policy numbers. They configured semantic search re-ranking to improve relevance. Scale: 50,000 documents, 1 million chunks, S2 tier. Performance: Average query latency under 500 ms. Misconfiguration: Initially, they used only keyword search, which missed semantically related queries. After enabling hybrid search, accuracy improved by 60%.

Enterprise Scenario 3: Legal Document Review

A law firm uses RAG to help lawyers quickly find precedents in a corpus of 500,000 legal documents. They needed high recall because missing a relevant case could have serious consequences. They used Azure AI Search with vector search only (no keyword) because the same legal concept is often worded differently from case to case, and exact word matches miss those variants. They tuned HNSW parameters toward accuracy: m=10 (the maximum the service allows), efConstruction=400, efSearch=800. They also used metadata filtering (e.g., court, date range) to narrow results. Pitfall: Without proper access control, sensitive documents were exposed to all users. They fixed this with Microsoft Entra ID (formerly Azure AD) authentication on the service and security filters that trim results to the documents each user's groups are allowed to see.

How AI-900 Actually Tests This

What AI-900 Tests on This Topic

The AI-900 exam (objective 5.3) expects you to understand the role of Azure AI Search in a RAG solution, not the deep technical configuration. Key points tested:
- Purpose: Azure AI Search retrieves relevant information from your own data to ground LLM responses.
- Components: Index, indexer, data source, skillset (optional).
- Search types: Keyword search, vector search, hybrid search (the exam may ask which is best for semantic understanding).
- Integration: It connects to Azure OpenAI Service for embeddings and generation.
- Benefits: Reduces hallucinations, provides up-to-date information, and allows access to private data.

Common Wrong Answers and Why Candidates Choose Them

"Azure AI Search generates the answer directly." – Wrong. It only retrieves chunks; the LLM generates the answer. Candidates confuse retrieval with generation.

"You can use a different embedding model for querying than for indexing." – Wrong. The query and the indexed chunks must be embedded with the same model, or their vectors are not comparable. Candidates assume embedding models are interchangeable.

"Vector search is always better than keyword search." – Wrong. Hybrid search is best. Candidates overvalue vectors because they are newer.

"RAG eliminates the need for fine-tuning." – Not necessarily. RAG and fine-tuning solve different problems. RAG is for accessing external data; fine-tuning is for model behavior.

Specific Numbers and Terms That Appear Verbatim

1536 dimensions: The output size of text-embedding-ada-002.

HNSW algorithm: The approximate nearest-neighbor algorithm used for vector search (exhaustive KNN is the alternative).

BM25: The keyword search ranking function.

Reciprocal Rank Fusion (RRF): The hybrid fusion method.

Semantic search: A separate feature that re-ranks results.

Indexer: Automates the pipeline.

Chunking: Splitting documents into smaller pieces.

Edge Cases and Exceptions

If no relevant chunks are found: The LLM should be instructed to say "I don't know" rather than hallucinate.

When using semantic search: It only re-ranks the top 50 results from the initial search; it doesn't retrieve new documents.

Free tier limitations: Only 3 indexes, 50 MB storage – not suitable for production RAG.
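
The first edge case, telling the LLM to admit when the context has no answer, is handled in the prompt. A minimal sketch, assuming a hypothetical helper build_grounded_prompt and illustrative wording for the instruction:

```python
def build_grounded_prompt(question, chunks):
    """Assemble a prompt that restricts the LLM to the retrieved chunks."""
    if chunks:
        context = "\n\n".join(
            f"[{i}] {text}" for i, text in enumerate(chunks, start=1)
        )
    else:
        context = "(no documents were retrieved)"
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, reply exactly: I don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_grounded_prompt(
    "What is the expense reimbursement policy?",
    ["Expenses must be filed within 30 days.", "Receipts are required."],  # made-up chunks
)
print(prompt)
```

Numbering the chunks also makes it possible to ask the model to cite which chunk supports each statement, which helps reviewers spot answers that drift from the source text.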

How to Eliminate Wrong Answers

Focus on the retrieval vs. generation distinction. If an answer says the search service "generates" or "creates" an answer, it's wrong. Also, remember that RAG uses external data, so any answer suggesting the LLM uses only training data is incorrect.

Key Takeaways

Azure AI Search is the retrieval component in RAG; it does not generate answers.

Hybrid search (keyword + vector) is the recommended approach for best accuracy.

The vector embedding model used during indexing must be the same as during querying.

Chunk size typically ranges from 500-1000 tokens with overlap (e.g., 100 tokens).

HNSW is the usual algorithm for vector search; parameters like m, efConstruction, and efSearch trade accuracy against speed.

Semantic search re-ranks the top 50 results from the initial search; it does not retrieve new documents.

Indexers automate the pipeline from data source to index, supporting scheduled runs.

The Free tier is limited to 3 indexes, 50 MB storage, and 3 indexers – unsuitable for production RAG.

RAG reduces hallucinations but does not eliminate the need for fine-tuning for behavior modification.

Access control is critical: use Microsoft Entra ID (formerly Azure AD) authentication and security filters to protect sensitive data.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Keyword Search (BM25)

Matches exact words and phrases from the query.

Works well for proper names, codes, and specific terms.

Fails on synonyms or semantically similar but different wording.

Ranking based on term frequency and inverse document frequency.

Faster for small indexes but less accurate for conceptual queries.
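
To make the keyword side concrete, here is a small BM25 scorer. It is a sketch of the common Lucene-style formula, not the service's implementation; k1 = 1.2 and b = 0.75 are the usual defaults, and the three tiny documents are made up.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # a term that is absent contributes nothing
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "expense reimbursement policy for travel".split(),
    "annual leave policy".split(),
    "how to claim back travel costs".split(),
]
scores = bm25_scores(["expense", "reimbursement"], docs)
print(scores)
```

Only the first document scores above zero. The third one is about the same topic ("claim back travel costs") but shares no words with the query, which is exactly the synonym gap listed above.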

Vector Search (Cosine Similarity)

Matches semantic meaning using vector embeddings.

Handles synonyms, paraphrases, and conceptual similarity.

Requires an embedding model to vectorize both documents and queries.

Ranking based on cosine distance between vectors.

More computationally expensive but captures deeper semantic relationships.
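
Cosine similarity itself is a short calculation. The sketch below uses made-up three-dimensional vectors in place of real 1536-dimensional embeddings; the function name and the example values are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


query = [1.0, 0.0, 1.0]        # stand-in embedding for "vacation"
same_topic = [2.0, 0.0, 2.0]   # stand-in for "annual leave": same direction, longer
unrelated = [0.0, 1.0, 0.0]    # stand-in for an unrelated chunk

print(cosine_similarity(query, same_topic))
print(cosine_similarity(query, unrelated))
```

The first pair points in the same direction, so its similarity is 1 (up to floating-point rounding) even though the vectors differ in length; the second pair is orthogonal and scores 0. The length check mirrors the rule that query and index vectors must come from the same model and have the same dimensions.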

Watch Out for These

Mistake

Azure AI Search is a vector database only.

Correct

It is a hybrid search service that supports keyword, vector, and hybrid search. It is not exclusively a vector database; it also provides full-text search with BM25 ranking.

Mistake

RAG requires fine-tuning the LLM on your data.

Correct

RAG does not modify the LLM. It retrieves relevant chunks from a search index and includes them in the prompt. The LLM remains unchanged; only the context provided to it changes.

Mistake

You must use Azure OpenAI for embeddings with Azure AI Search.

Correct

While Azure OpenAI is a common choice, you can use any embedding model whose output dimensions match the index's vector field. Azure AI Search also offers integrated vectorization, which calls an embedding model (such as one deployed in Azure OpenAI) for you during indexing and querying.

Mistake

Chunking is unnecessary; you can index entire documents as a single entry.

Correct

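Large documents can exceed the embedding model's input limit and the LLM's context window, and a single vector for a whole document blurs its meaning, which reduces relevance. Chunking into smaller pieces (e.g., 500 tokens) improves retrieval accuracy and allows the LLM to focus on the most relevant parts.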

Mistake

Semantic search and vector search are the same.

Correct

Semantic search uses a language model to re-rank results based on semantic understanding, but it still relies on an initial keyword or vector search. Vector search uses embeddings to find semantically similar vectors. They are complementary, not identical.


Frequently Asked Questions

What is the difference between Azure AI Search and Azure OpenAI Service in a RAG app?

Azure AI Search is the retrieval engine that indexes your data and returns relevant chunks. Azure OpenAI Service provides the embedding model (to vectorize text) and the generative LLM (to produce the final answer). They work together: the search service retrieves context, and OpenAI generates the response.

Do I need to fine-tune the LLM if I use RAG?

No. RAG does not require fine-tuning. The LLM remains generic; you provide relevant context from the search index in the prompt. Fine-tuning is a separate technique used to change the model's behavior or style, not to inject new factual knowledge.

Can I use Azure AI Search without vector search?

Yes. Azure AI Search supports full-text keyword search (BM25) alone. However, for RAG, vector search is highly recommended to capture semantic meaning. You can also use hybrid search (both) for best results.

What is the recommended chunk size for RAG?

A common starting point is 500 tokens with 100-token overlap. Adjust based on your document structure and LLM context window. For example, the original GPT-4 models have an 8K or 32K context window, so you can retrieve multiple chunks (e.g., 5 chunks of 500 tokens = 2500 tokens) and still have room for the query, the instructions, and the generated answer.
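
The arithmetic can be written out as a quick budget check. The chunk figures come from the answer above; the instruction, question, and answer sizes are illustrative assumptions, as is the function name.

```python
def prompt_token_budget(context_window, num_chunks, chunk_tokens,
                        instruction_tokens, question_tokens, max_answer_tokens):
    """Return (tokens used by the prompt, tokens left after reserving the answer)."""
    used = num_chunks * chunk_tokens + instruction_tokens + question_tokens
    remaining = context_window - used - max_answer_tokens
    return used, remaining


used, remaining = prompt_token_budget(
    context_window=8192,     # 8K model
    num_chunks=5,
    chunk_tokens=500,
    instruction_tokens=200,  # assumed size of the system instructions
    question_tokens=50,      # assumed size of the user question
    max_answer_tokens=800,   # assumed cap on the generated answer
)
print(used, remaining)  # 2750 4642
```

A negative remainder means the prompt would not fit, so you would retrieve fewer chunks, shrink them, or move to a model with a larger window.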

How do I secure my search index so only authorized users can query certain documents?

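Role-based access control (RBAC) with Microsoft Entra ID (formerly Azure AD) controls who can reach the search service and its indexes. To restrict individual documents, the established pattern is security trimming: store the allowed user or group IDs in a filterable field on each document, then add a filter on that field to every query based on the caller's identity or group membership.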
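
A security-trimming filter is an OData expression added to the query. The sketch below builds one from the caller's group IDs; the field name group_ids, the group names, and the helper security_filter are hypothetical, while the any(...) and search.in(...) constructs are part of the service's OData filter syntax.

```python
def security_filter(group_ids, field="group_ids"):
    """Build an OData filter that keeps documents visible to any of the groups."""
    if not group_ids:
        raise ValueError("caller has no groups; refusing to build an unfiltered query")
    for group in group_ids:
        # Commas and whitespace act as delimiters in search.in; quotes would break the literal.
        if not group or any(ch.isspace() or ch in ",'" for ch in group):
            raise ValueError(f"unsupported characters in group id: {group!r}")
    joined = ",".join(group_ids)
    return f"{field}/any(g: search.in(g, '{joined}'))"


print(security_filter(["hr-staff", "managers"]))
# group_ids/any(g: search.in(g, 'hr-staff,managers'))
```

The resulting string goes in the filter property of the search request. The filter must be applied by trusted server-side code, never built from values the end user can edit.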

What is the cost of Azure AI Search for a production RAG app?

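Cost depends on the tier and on the number of search units (replicas × partitions), and list prices vary by region and change over time. As a rough guide, Basic is on the order of $75/month per unit, S1 roughly $250/month, and S2 roughly $1,000/month. Indexer runs are not billed separately, but billable AI enrichment skills, semantic ranking beyond its free quota, and the Azure OpenAI calls used for embeddings and generation are extra. Size the tier from the storage your chunks and vectors need, then confirm against the current pricing page.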

Can I use Azure AI Search with open-source LLMs?

Yes. Azure AI Search is agnostic to the LLM. You can retrieve chunks and pass them to any LLM (e.g., Llama 2, Mistral) hosted on Azure or elsewhere. However, the embedding model must be compatible (same dimensions).
