AI-900 · Chapter 83 of 100 · Objective 5.3

Embeddings and Vector Search

As a core technique in Generative AI, embeddings and vector search enable semantic understanding and similarity search. For the AI-900 exam, this topic appears in Domain 5: Generative AI, Objective 5.3: 'Describe embeddings and vector search.' Approximately 5–10% of exam questions touch this area, often in the context of Retrieval Augmented Generation (RAG) and Azure AI Search. You will need to understand what embeddings are, how vector search works, and how Azure implements these technologies.

Updated Jul 20, 2026

Reviewed by Johnson Ajibi · Senior Network & Security Engineer · MSc IT Security

The Library Card Catalog for AI

Picture a library that holds every book, article, and note imaginable. Finding a specific book by its title is easy (like a keyword search). But what if you want books *similar* to a given one, or books about a concept whose exact words you don't know? A traditional card catalog organizes books by title, author, or subject keywords, so it cannot answer a similarity question. Now imagine a card catalog that assigns every book a coordinate in a three-dimensional space based on its content: one axis for 'scientificness', one for 'emotional tone', one for 'length'. Books with similar content cluster together. To find books similar to 'Moby Dick', you locate its coordinates and pull the books within a certain radius. That is vector search. The coordinates are the embedding: a numerical representation that captures semantic meaning. The library's coordinate system is the embedding model. The act of finding nearby books is the vector search algorithm, and closeness is measured with a metric such as cosine similarity. This is how Azure AI Search and Azure OpenAI embeddings work together: a model like text-embedding-ada-002 converts text to a vector (a list of numbers), and a vector index then retrieves similar vectors efficiently. The analogy simplifies two things: real embeddings have hundreds or thousands of dimensions rather than three, and the axes are latent dimensions learned by the model, not hand-labeled properties.

How It Actually Works

What Are Embeddings?

Embeddings are numerical representations of data (text, images, audio, or any other modality) that capture semantic meaning in a high-dimensional vector space. In the context of AI-900, you focus on text embeddings. An embedding model, such as OpenAI's text-embedding-ada-002 (1536 dimensions) or the newer text-embedding-3-small (1536 dimensions by default, reducible through the dimensions parameter), converts a piece of text into a list of floating-point numbers. The key property: texts with similar meanings produce vectors that are close together (by cosine distance), while unrelated texts produce vectors far apart.

How Embeddings Are Generated Internally

Embedding models are transformer-based neural networks. The input text is tokenized into subword units (e.g., using Byte Pair Encoding). The tokens are passed through multiple layers of self-attention and feedforward networks. The final hidden state (often the average of all token embeddings, or a special [CLS] token) is projected into a fixed-size vector. The model is trained on a massive corpus with a contrastive loss objective: maximize cosine similarity for semantically similar pairs and minimize it for dissimilar pairs. The resulting vector space is not interpretable by humans—each dimension does not correspond to a specific concept—but the relative positions encode meaning.
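The pooling step described above can be sketched in a few lines. This is a toy illustration only: the per-token hidden states are made-up 4-dimensional numbers, and the helper names `mean_pool` and `l2_normalize` are hypothetical, not part of any Azure or OpenAI library. A real model's pooling and projection may differ.

```python
import math


def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dims = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dims)]


def l2_normalize(vector):
    """Scale a vector to unit length so only its direction matters."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


# Made-up hidden states for a three-token input (illustrative values only).
hidden_states = [
    [1.0, 0.0, 2.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 1.0, 0.0],
]

pooled = mean_pool(hidden_states)   # one vector regardless of token count
embedding = l2_normalize(pooled)    # unit-length version of the pooled vector
```

Whatever the input length, the output has the same number of dimensions, which is what lets a vector index compare any two texts.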

What Is Vector Search?

Vector search is the process of finding vectors in a database that are most similar to a query vector. Unlike keyword search (which matches exact terms), vector search retrieves results based on semantic similarity. The most common distance metric is cosine similarity, defined as:

cosine_similarity(A, B) = (A · B) / (||A|| * ||B||)

Values range from -1 (opposite) to 1 (identical). In practice, vectors are often normalized to unit length, making cosine similarity equivalent to dot product.
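As a quick check of the formula and the normalization claim, here is a minimal sketch in plain Python. The two-dimensional vectors are arbitrary sample values chosen so the arithmetic is easy to follow; real embeddings have far more dimensions.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """(A . B) / (||A|| * ||B||), as in the formula above."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


a = [3.0, 4.0]
b = [4.0, 3.0]

cos_ab = cosine_similarity(a, b)              # 24 / (5 * 5) = 0.96
dot_unit = dot(normalize(a), normalize(b))    # same value after normalization
opposite = cosine_similarity(a, [-3.0, -4.0])  # vectors pointing opposite ways
```

Once both vectors are unit length, the denominator is 1, so the cheaper dot product gives the same ranking as cosine similarity.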

How Vector Search Works: The Index

Naively, comparing a query vector against every vector in a database is O(N) per search, which is too slow for large datasets. Vector indexes enable approximate nearest neighbor (ANN) search. Azure AI Search offers two algorithm kinds, HNSW and exhaustive KNN; IVF and DiskANN are ANN approaches you will meet in other vector stores:

HNSW (Hierarchical Navigable Small World): Builds a multi-layer graph where each higher layer is a sparser, coarser view of the data. Search starts at the top layer and descends, refining the candidate set. HNSW offers high recall and low latency but uses more memory.

Exhaustive KNN: Compares the query against every indexed vector (brute force). Results are exact, but cost grows linearly with index size, so it suits small datasets.

IVF (Inverted File Index): Partitions the vector space into clusters (using k-means). Search only compares the query to vectors in the nearest clusters. IVF is memory-efficient but may have lower recall. It is not an algorithm option in Azure AI Search.

DiskANN: A scalable algorithm that uses a compressed graph stored on disk, suitable for very large datasets. On Azure it is offered by Azure Cosmos DB rather than Azure AI Search.

In Azure AI Search, you configure the index with a vectorSearch profile that specifies the algorithm, metric, and parameters (e.g., m for HNSW connections, efConstruction for build quality).
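To make the O(N) baseline concrete, the following sketch scores every stored vector against the query and keeps the best k. It is a plain-Python illustration, not how any Azure service is implemented; the function name `exhaustive_top_k` and the tiny two-dimensional catalog are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot_ab = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot_ab / (norm_a * norm_b)


def exhaustive_top_k(query, vectors, k):
    """Score every vector (O(N) work) and return the k best as (id, score)."""
    scored = ((cosine_similarity(query, vec), doc_id)
              for doc_id, vec in vectors.items())
    best = heapq.nlargest(k, scored)
    return [(doc_id, score) for score, doc_id in best]


# Hypothetical catalog of document vectors.
catalog = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.0, 1.0],
    "doc-c": [1.0, 1.0],
    "doc-d": [-1.0, 0.0],
}

results = exhaustive_top_k([1.0, 0.2], catalog, k=2)
```

Every query touches every vector here. An ANN index exists to avoid exactly that loop, trading a small amount of accuracy for far fewer comparisons.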

Key Components in Azure

Azure OpenAI Service: Provides embedding models via API. You call POST https://{resource}.openai.azure.com/openai/deployments/{deployment-id}/embeddings (with an api-version query parameter) with the input text and get back a vector.

Azure AI Search: A cloud search service that can store and index vectors. You define an index with a vectorSearch configuration and a field of type Collection(Edm.Single) for the vector. You can combine vector search with traditional keyword search (hybrid search).

Semantic Kernel or LangChain: Libraries that orchestrate embedding generation and vector search in RAG pipelines.
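The embeddings call listed above can be sketched with the Python standard library. Treat the specifics as assumptions to check against your own resource: the api-version value `2023-05-15`, the `api-key` header, the placeholder resource and deployment names, and the sample response are illustrative. The request is only built here, not sent.

```python
import json
import urllib.request


def build_embeddings_request(resource, deployment_id, text, api_key,
                             api_version="2023-05-15"):
    """Build (but do not send) a POST to the Azure OpenAI embeddings endpoint."""
    url = (f"https://{resource}.openai.azure.com/openai/deployments/"
           f"{deployment_id}/embeddings?api-version={api_version}")
    body = json.dumps({"input": text}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def extract_embedding(response_json):
    """Pull the vector out of a parsed embeddings response."""
    return response_json["data"][0]["embedding"]


# Placeholder names; substitute your own resource, deployment, and key.
request = build_embeddings_request(
    "my-resource", "my-embedding-deployment",
    "How do I return a damaged item?", "<api-key>")

# To actually call the service (requires a real resource and key):
#   with urllib.request.urlopen(request) as resp:
#       vector = extract_embedding(json.load(resp))

# Shortened, made-up response used only to show the parsing step.
sample_response = {"data": [{"index": 0, "embedding": [0.1, 0.2, 0.3]}]}
vector = extract_embedding(sample_response)
```

In practice an SDK or an orchestration library usually wraps this call, but the underlying exchange is the same: text in, list of floats out.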

Configuration and Verification Commands

To create a vector index in Azure AI Search using the REST API:

{
  "name": "my-vector-index",
  "fields": [
    {"name": "id", "type": "Edm.String", "key": true},
    {"name": "content", "type": "Edm.String", "searchable": true},
    {"name": "contentVector", "type": "Collection(Edm.Single)", "searchable": true, "dimensions": 1536, "vectorSearchProfile": "my-vector-profile"}
  ],
  "vectorSearch": {
    "algorithms": [
      {"name": "my-hnsw-algorithm", "kind": "hnsw", "hnswParameters": {"metric": "cosine", "m": 4, "efConstruction": 400, "efSearch": 500}}
    ],
    "profiles": [
      {"name": "my-vector-profile", "algorithm": "my-hnsw-algorithm"}
    ]
  }
}

To query using vector search:

POST /indexes/my-vector-index/docs/search?api-version=2024-05-01-preview
{
  "vectorQueries": [{"kind": "vector", "vector": [0.1, 0.2, ...], "fields": "contentVector", "k": 10}],
  "select": "id, content"
}

The response includes an @search.score for each match, derived from the cosine metric configured on the index.
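A small helper can assemble the query body and interpret the returned score. This sketch assumes the `vectorQueries` request shape used by API versions 2023-11-01 and later, and assumes the relationship Azure documents for the cosine metric, score = 1 / (1 + cosine distance). Verify both against the API version you target. The function names are hypothetical.

```python
import json


def build_vector_query(query_vector, field="contentVector", k=10,
                       select="id, content"):
    """Assemble the JSON body for a pure vector query."""
    return {
        "vectorQueries": [{
            "kind": "vector",
            "vector": list(query_vector),
            "fields": field,
            "k": k,
        }],
        "select": select,
    }


def score_to_cosine_similarity(search_score):
    """Invert score = 1 / (1 + cosine_distance); similarity = 1 - distance."""
    cosine_distance = (1.0 - search_score) / search_score
    return 1.0 - cosine_distance


payload = build_vector_query([0.1, 0.2, 0.3], k=3)  # toy 3-dim query vector
body = json.dumps(payload)                           # ready to POST

identical = score_to_cosine_similarity(1.0)    # score 1.0 -> similarity 1.0
orthogonal = score_to_cosine_similarity(0.5)   # score 0.5 -> similarity 0.0
```

Under that assumption the score is not the raw cosine value, so a threshold tuned on cosine similarity should be converted before it is applied to @search.score.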

How Embeddings and Vector Search Interact with Related Technologies

Embeddings are the foundation of Retrieval Augmented Generation (RAG). In RAG, the user query is embedded and used to retrieve relevant documents via vector search, and those documents are injected into the prompt sent to a generative model (e.g., GPT-4). This grounds the model in factual data and reduces hallucinations. Azure AI Search is the primary vector store for RAG on Azure. Other related services include Azure Cosmos DB (which supports vector indexing) and Azure Cache for Redis (with RediSearch vector similarity).

Performance Considerations

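Dimensions: Higher dimensions capture more nuance but increase storage and compute. OpenAI's text-embedding-ada-002 and text-embedding-3-small both output 1536 dimensions; text-embedding-3-large outputs 3072. The text-embedding-3 models can return shorter vectors (for example, 512) via the dimensions parameter.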

Indexing time: HNSW construction is O(N log N). For large datasets, use incremental indexing.

Search latency: ANN searches are typically <100ms for millions of vectors. Exact search (brute force) is only feasible for small sets.

Recall: There is a trade-off between speed and accuracy. HNSW with a high efSearch can exceed 99% recall, at the cost of slower queries.
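The storage side of the dimensions trade-off is simple arithmetic. The sketch below estimates raw vector size only, assuming 4 bytes per value (Edm.Single is a 32-bit float) and ignoring index overhead such as the HNSW graph, which adds more on top. The one-million-document corpus is a hypothetical figure.

```python
BYTES_PER_FLOAT32 = 4  # Edm.Single stores each value as a 32-bit float


def raw_vector_bytes(num_vectors, dimensions):
    """Raw storage for the vectors alone, excluding any index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


# Hypothetical corpus of one million documents at three vector sizes.
sizes = {dims: raw_vector_bytes(1_000_000, dims) for dims in (512, 1536, 3072)}

for dims, size in sizes.items():
    print(f"{dims} dims: {size / 1e9:.2f} GB")
```

Doubling the dimensions doubles both the storage and the per-comparison work, which is why shorter vectors are attractive when recall stays acceptable.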

Exam-Relevant Details

Cosine similarity is the metric OpenAI recommends for its embeddings.

text-embedding-ada-002 outputs 1536 dimensions.

text-embedding-3-small outputs 1536 dimensions by default (can be shortened, for example to 512, via the dimensions parameter).

text-embedding-3-large outputs 3072 dimensions.

Vector search in Azure AI Search uses HNSW (approximate) or exhaustive KNN (exact) algorithms.

Azure AI Search supports hybrid search (vector + keyword) with semantic ranking.

RAG combines embeddings, vector search, and generative models.

Cosine similarity range: -1 to 1. For vectors normalized to unit length, it equals the dot product, which then also falls between -1 and 1.

Walk-Through

Generate embeddings for documents

Use an embedding model like text-embedding-ada-002 to convert each document into a vector. Send a POST request to the Azure OpenAI embeddings endpoint with the document text. The response contains a vector of 1536 floating-point numbers. Each document gets its own vector stored alongside its original text and metadata.

Create a vector index in Azure AI Search

Define an index schema with a field of type Collection(Edm.Single) to hold the embedding vector. Set the dimensions property to 1536. Configure a vectorSearch profile specifying the HNSW algorithm with cosine metric. This index will enable efficient ANN search.

Index documents with their vectors

Upload documents including the vector field to the index using the push API. Each document must have a unique key. The index builds the HNSW graph during indexing. The efConstruction parameter controls the quality of the graph; higher values yield better recall but slower indexing.

Generate embedding for user query

When a user submits a query, call the same embedding model to convert the query text into a vector. Use the same model and dimensions as for the documents. The query vector is not stored; it is used only for the search.

Perform vector search

Send a search request to Azure AI Search with the query vector in the vectorQueries array. Specify the field to search against and the number of top results (k). Rather than scoring every indexed vector, the engine walks the HNSW graph, computing cosine similarity only for the candidates it visits, and returns the approximate k nearest neighbors with their scores.

Return and optionally use results in RAG

The search results include the original document text and a similarity score. In a RAG pipeline, these documents are inserted into the prompt for a generative model (e.g., GPT-4) to produce a grounded answer. The score can be used to filter low-relevance results.
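The whole walk-through can be sketched end to end with a stand-in embedding. Everything here is an assumption for illustration: `toy_embed` is a bag-of-words counter over a tiny fixed vocabulary, not a real embedding model; the in-memory list replaces Azure AI Search; the FAQ text and the 0.3 score threshold are made up.

```python
import math
import re

# Tiny fixed vocabulary; a real model learns its dimensions instead.
VOCABULARY = ["return", "refund", "damaged", "item",
              "shipping", "delivery", "password", "reset"]


def toy_embed(text):
    """Stand-in embedding: unit-length word counts over VOCABULARY."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = [float(words.count(term)) for term in VOCABULARY]
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else counts


def search(query_vector, index, k):
    """Exact search over the in-memory index; vectors are unit length,
    so the dot product equals cosine similarity."""
    scored = [(sum(q * d for q, d in zip(query_vector, entry["vector"])), entry)
              for entry in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question, hits, min_score=0.3):
    """Keep hits above the threshold and place them in the prompt."""
    sources = [entry["content"] for score, entry in hits if score >= min_score]
    context = "\n".join(f"- {source}" for source in sources)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


# Steps 1-3: embed the documents and store vector, text, and key together.
documents = {
    "faq-1": "To return a damaged item, request a refund within 30 days.",
    "faq-2": "Shipping and delivery usually take three to five days.",
    "faq-3": "Use the reset link to change your password.",
}
index = [{"id": doc_id, "content": text, "vector": toy_embed(text)}
         for doc_id, text in documents.items()]

# Steps 4-6: embed the query with the same function, search, build the prompt.
question = "How do I return a damaged item?"
hits = search(toy_embed(question), index, k=2)
prompt = build_prompt(question, hits)
```

The second hit has no overlap with the question, so the score filter drops it and only the return-policy text reaches the prompt. The same filtering idea applies when the scores come from a real vector index.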

What This Looks Like on the Job

Enterprise Scenario 1: Customer Support Chatbot

A large e-commerce company builds a chatbot to answer product questions. They have thousands of product descriptions and FAQs. Using Azure OpenAI, they generate embeddings for each document and store them in Azure AI Search. When a customer asks 'How do I return a damaged item?', the query is embedded and vector search retrieves the most relevant return policy documents. These are fed to GPT-4 to generate a concise answer. The system handles 10,000 queries/day with <200ms latency. Misconfiguration: using too few dimensions (e.g., 256) caused poor recall; switching to 1536 fixed it.

Enterprise Scenario 2: Internal Knowledge Base Search

A law firm uses vector search to find relevant case law. They embed legal documents (briefs, rulings) and allow lawyers to search by describing a legal issue. The embedding model captures synonyms and related concepts (e.g., 'breach of contract' matches 'failure to perform'). They use hybrid search (vector + keyword) to ensure exact citation matches are also found. Performance: index of 500k vectors, HNSW with efSearch=500 returns results in 50ms. Common issue: lawyers expected exact phrase matches; they added keyword search as a fallback.

Enterprise Scenario 3: Image Similarity Search

A retail company wants to find visually similar products. They use a multimodal embedding model (e.g., CLIP) to embed product images into the same vector space as text descriptions. A user uploads a photo of a dress; it is embedded and vector search finds similar products from the catalog. They deploy Azure AI Search with a vector index of 1M images. Scaling: they spread the index across multiple partitions and add replicas to handle query load. Pitfall: not normalizing vectors led to incorrect similarity scores; they normalized to unit length.

How AI-900 Actually Tests This

What AI-900 Tests on This Topic (Objective 5.3)

Definition of embeddings: Understand that embeddings are numerical representations of data that capture semantic meaning.

Purpose: Enable similarity search, clustering, and classification.

Vector search: Know that it finds items with similar embeddings using distance metrics like cosine similarity.

Azure services: Be able to identify Azure OpenAI Service for generating embeddings and Azure AI Search for vector indexing and search.

RAG: Understand that vector search is a key component of Retrieval Augmented Generation.

Common Wrong Answers and Why Candidates Choose Them

Wrong: 'Embeddings are the same as tokens.' Candidates confuse tokenization (splitting text into words/subwords) with embedding (converting to vectors). Correct: Tokens are inputs; embeddings are outputs of a model.

Wrong: 'Vector search uses exact keyword matching.' Candidates may think vector search is just like SQL LIKE. Correct: Vector search finds semantically similar items, not exact matches.

Wrong: 'Cosine similarity ranges from 0 to 1.' They forget it can be negative. Correct: Cosine similarity ranges from -1 to 1.

Wrong: 'text-embedding-ada-002 outputs 512 dimensions.' They confuse it with a shortened text-embedding-3 vector. Correct: 1536 dimensions.

Specific Numbers and Terms That Appear on the Exam

1536: Dimensions of text-embedding-ada-002.

1536: Default dimensions of text-embedding-3-small (can be shortened, for example to 512, via the dimensions parameter).

Cosine similarity: Metric OpenAI recommends for its embeddings.

HNSW: Default algorithm in Azure AI Search for vector search.

Azure AI Search: The service for indexing and searching vectors.

RAG: Retrieval Augmented Generation.

Edge Cases and Exceptions

Embedding models can produce biased vectors if trained on biased data.

Vector search may return irrelevant results if the query is out-of-domain.

For very large datasets (>1B vectors), consider DiskANN or partitioned indexes.

Cosine similarity with non-normalized vectors is not equivalent to dot product.

How to Eliminate Wrong Answers

If the answer mentions 'exact match' or 'keyword', it is wrong for vector search.

If the answer says 'tokens' instead of 'vectors', it is wrong.

If the answer gives a dimension count other than 1536 for ada-002, it is wrong.

If the answer says 'Azure SQL Database' for vector search, treat it as a distractor: the answer the exam expects is Azure AI Search.

Key Takeaways

Embeddings convert data into numerical vectors that capture semantic meaning.

The OpenAI embedding model text-embedding-ada-002 outputs 1536 dimensions.

Vector search finds items with similar embeddings using cosine similarity.

Azure AI Search provides vector indexing with HNSW (approximate) or exhaustive KNN (exact) algorithms.

RAG uses embeddings and vector search to retrieve relevant information for generative models.

Cosine similarity ranges from -1 to 1; for normalized vectors, it is equivalent to dot product.

Hybrid search combines vector and keyword search for better results.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Keyword Search (BM25)

Matches exact terms in documents.

Cannot handle synonyms or paraphrases.

Uses inverted index and TF-IDF/BM25 scoring.

Fast and easy to implement.

Fails for semantic queries like 'how to fix a leaky faucet' if the word 'leak' is absent.

Vector Search

Matches based on semantic similarity.

Handles synonyms, paraphrases, and related concepts.

Uses ANN algorithms (HNSW, IVF).

Requires embedding model and vector index.

Can retrieve documents with completely different wording but similar meaning.
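The two approaches compared above are often combined in hybrid search, which needs a way to merge a keyword ranking with a vector ranking. One common technique is Reciprocal Rank Fusion (RRF), which Azure AI Search documents as its method for hybrid queries. The sketch below is a generic RRF implementation, not the service's code; the constant k=60 is a conventional choice and the document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked ID lists: each list contributes 1 / (k + rank) per ID."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results for the same query from each search type.
keyword_ranking = ["doc-citation", "doc-contract", "doc-lease"]
vector_ranking = ["doc-contract", "doc-performance", "doc-citation"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
```

RRF uses only rank positions, so it sidesteps the problem that BM25 scores and cosine-based scores sit on different scales. Documents that appear in both lists rise to the top.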

Watch Out for These

Mistake

Embeddings are just a list of random numbers.

Correct

Embeddings are learned, dense vectors where each dimension encodes latent semantic features. They are not random; they are produced by a trained neural network to preserve semantic similarity.

Mistake

Vector search always returns exact nearest neighbors.

Correct

Most vector search implementations use approximate nearest neighbor (ANN) algorithms for speed. They may miss some true nearest neighbors, but recall is typically >95% with proper tuning.

Mistake

Higher dimensions always give better results.

Correct

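Higher dimensions can capture more nuance but also increase noise and computational cost. The optimal dimension depends on the dataset and task. OpenAI's models output 1536 or 3072 dimensions by default, and the text-embedding-3 models can be shortened (for example, to 512) with the dimensions parameter.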

Mistake

Cosine similarity is the only metric for vector search.

Correct

Other metrics include Euclidean distance (L2), dot product, and Manhattan distance. Cosine is common for normalized vectors, but Azure AI Search supports multiple metrics.

Mistake

Embeddings can only be generated for text.

Correct

Embeddings can be generated for images, audio, video, and multimodal data. Models like CLIP produce embeddings that align images and text in the same vector space.
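The recall figure quoted for ANN search in the list above has a simple definition: the share of the true nearest neighbors that the approximate search also returned. A minimal sketch, with hypothetical result lists standing in for an exact search and an ANN search over the same query:

```python
def recall_at_k(exact_ids, approximate_ids):
    """Fraction of the exact top-k that the approximate search also found."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact result list must not be empty")
    return len(exact & set(approximate_ids)) / len(exact)


# Hypothetical top-5 results for one query.
exact_top5 = ["d1", "d2", "d3", "d4", "d5"]
ann_top5 = ["d1", "d3", "d2", "d5", "d9"]  # missed d4, returned d9 instead

recall = recall_at_k(exact_top5, ann_top5)
```

Order within the top k does not affect this measure. Averaging it over many queries gives the recall number used when tuning parameters such as efSearch.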

Frequently Asked Questions

What is the difference between embedding and tokenization?

Tokenization splits text into smaller units (tokens), which are then converted into numerical IDs. Embedding maps each token (or the entire sequence) into a dense vector of floating-point numbers. Tokenization is a preprocessing step; embedding is the output of a neural network.

Which Azure service is used for vector search?

Azure AI Search (formerly Azure Cognitive Search) is the primary service for vector search on Azure. It supports vector indexing, hybrid search, and semantic ranking.

How do I choose the right embedding model?

Consider the dimensionality, performance, and cost. text-embedding-ada-002 (1536 dims) is a good default. text-embedding-3-small (1536 dims by default, can be shortened) is newer and cheaper. text-embedding-3-large (3072 dims) offers higher accuracy but more storage. Test with your data.

What is cosine similarity and why is it used?

Cosine similarity measures the angle between two vectors. It ranges from -1 (opposite direction) to 1 (same direction). It is used because it is invariant to vector magnitude, focusing on direction, which captures semantic orientation.

Can I use vector search with SQL databases?

Yes, some Azure databases, such as Azure SQL Database and Azure Cosmos DB, now support vector search. However, Azure AI Search is purpose-built for search and offers features like hybrid search and semantic ranking, and it is the answer the exam expects.

What is HNSW in Azure AI Search?

HNSW (Hierarchical Navigable Small World) is a graph-based approximate nearest neighbor algorithm. It builds a multi-layer graph for efficient search. Parameters: m (number of connections), efConstruction (build quality), efSearch (search depth).

How does RAG use embeddings?

In RAG, user queries are embedded to retrieve relevant documents via vector search. The retrieved documents are added to the prompt for a generative model, grounding its response in factual data and reducing hallucinations.
