This chapter covers embeddings and vector search, a core technique in Generative AI that enables semantic understanding and similarity search. For the AI-900 exam, this topic appears in Domain 5: Generative AI, Objective 5.3: 'Describe embeddings and vector search.' Approximately 5–10% of exam questions touch this area, often in the context of Retrieval Augmented Generation (RAG) and Azure AI Search. You will need to understand what embeddings are, how vector search works, and how Azure implements these technologies.
Imagine a vast library where every book, article, and note is stored. Finding a specific book by its title is easy (like a keyword search). But what if you want to find books *similar* to a given one, or books about a concept without knowing the exact words? A traditional card catalog organizes books by title, author, or subject keywords, so it cannot answer a similarity question. Now imagine a catalog that assigns every book a coordinate in a three-dimensional space based on its content: one axis for 'scientificness', one for 'emotional tone', one for 'length'. Books with similar content cluster together. To find books similar to 'Moby Dick', you locate its coordinates and pull the books within a certain radius. That is vector search. The coordinates are the embedding, a numerical representation that captures semantic meaning. The library's coordinate system is the embedding model. Measuring how close two books are corresponds to the similarity metric (e.g., cosine similarity), and finding the nearby books is the vector search itself. Azure AI Search and OpenAI embeddings follow the same pattern: a model such as text-embedding-ada-002 converts text to a vector (a list of numbers), and a vector index enables efficient retrieval of similar vectors. The analogy simplifies one thing: a real model's axes are latent dimensions learned during training, numbering in the hundreds or thousands, not three hand-picked properties.
What Are Embeddings?
Embeddings are numerical representations of data (text, images, audio, or any other modality) that capture semantic meaning in a high-dimensional vector space. For AI-900, the focus is on text embeddings. An embedding model, such as OpenAI's text-embedding-ada-002 (1536 dimensions) or the newer text-embedding-3-small (1536 dimensions by default, reducible to a smaller size such as 512 through the dimensions parameter), converts a piece of text into a list of floating-point numbers. The key property: texts with similar meanings produce vectors that are close together (high cosine similarity), while unrelated texts produce vectors that are far apart.
How Embeddings Are Generated Internally
Embedding models are transformer-based neural networks. The input text is tokenized into subword units (e.g., using Byte Pair Encoding). The tokens are passed through multiple layers of self-attention and feedforward networks. The final hidden states are pooled (often by averaging all token representations, or by taking a special [CLS] token) and projected into a fixed-size vector. Such models are typically trained on a massive corpus with a contrastive loss objective: maximize cosine similarity for semantically similar pairs and minimize it for dissimilar pairs. The resulting vector space is not interpretable by humans (no single dimension corresponds to a specific concept), but the relative positions encode meaning.
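The pooling step described above can be sketched in a few lines. This is a toy illustration, not the internals of any particular model: the three-dimensional "hidden states" are invented numbers, and the helper names `mean_pool` and `l2_normalize` are hypothetical.

```python
import math

def mean_pool(token_vectors):
    """Average the per-token vectors into one fixed-size vector."""
    count = len(token_vectors)
    dims = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dims)]

def l2_normalize(vector):
    """Scale a vector to unit length so only its direction matters."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]

# Made-up final hidden states for a three-token input (assumption).
hidden_states = [
    [1.0, 0.0, 2.0],
    [3.0, 4.0, 0.0],
    [2.0, 2.0, 1.0],
]

pooled = mean_pool(hidden_states)      # one vector for the whole input
embedding = l2_normalize(pooled)       # unit-length sentence embedding
print(pooled)
print(embedding)
```

Whatever the number of input tokens, the output has the same length, which is what lets a vector index store every document in a field with one fixed dimensions value.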
What Is Vector Search?
Vector search is the process of finding vectors in a database that are most similar to a query vector. Unlike keyword search (which matches exact terms), vector search retrieves results based on semantic similarity. The most common distance metric is cosine similarity, defined as:
cosine_similarity(A, B) = (A · B) / (||A|| * ||B||)
Values range from -1 (opposite) to 1 (identical). In practice, vectors are often normalized to unit length, making cosine similarity equivalent to dot product.
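As a small worked example of the formula above, the following sketch computes cosine similarity for hand-picked two-dimensional vectors and checks that, after normalizing to unit length, the plain dot product gives the same value. The vectors and helper names are illustrative assumptions.

```python
import math

def dot(a, b):
    """Dot product A · B."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length ||A||."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """(A · B) / (||A|| * ||B||)"""
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    """Scale a vector to unit length."""
    length = norm(a)
    return [x / length for x in a]

a = [3.0, 4.0]
b = [4.0, 3.0]
c = [-3.0, -4.0]   # points in the opposite direction to a

print(cosine_similarity(a, b))   # 24 / 25, a similar direction
print(cosine_similarity(a, c))   # opposite direction, so negative
print(dot(normalize(a), normalize(b)))  # matches cosine_similarity(a, b) up to rounding
```

Note that normalization changes the lengths of the vectors, not the angle between them, so the possible range of the result stays -1 to 1.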
How Vector Search Works: The Index
Naively, comparing a query vector against every vector in a database is O(N) per search, which is too slow for large datasets. Vector indexes enable approximate nearest neighbor (ANN) search. Common index types include:
HNSW (Hierarchical Navigable Small World): Builds a multi-layer graph in which the higher layers are sparser, coarser views of the data. Search starts at the top layer and descends, refining the candidate set. HNSW offers high recall and low latency but uses more memory. This is the ANN algorithm that Azure AI Search provides.
IVF (Inverted File Index): Partitions the vector space into clusters (using k-means). Search only compares the query to vectors in the nearest clusters. IVF is memory-efficient but may have lower recall. It is offered by some other vector stores, not by Azure AI Search.
DiskANN: A scalable algorithm that uses a compressed graph stored on disk, suitable for very large datasets. On Azure it is available in Azure Cosmos DB, not in Azure AI Search.
In Azure AI Search, you choose between HNSW and exhaustive KNN (an exact brute-force scan, practical for small indexes), and you configure the index with a vectorSearch profile that specifies the algorithm, metric, and parameters (e.g., m for HNSW connections, efConstruction for build quality).
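For comparison with the ANN approach, the naive O(N) baseline mentioned above can be sketched as an exact scan that scores every stored vector. The tiny in-memory "index" and its two-dimensional vectors are invented for illustration.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def exact_top_k(query, vectors, k):
    """Brute-force search: score every vector, keep the k best. Cost grows with N."""
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored)

# Hypothetical document vectors (real embeddings have hundreds of dimensions).
index = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.0, 1.0],
    "doc-c": [1.0, 1.0],
    "doc-d": [-1.0, 0.0],
}

results = exact_top_k([1.0, 0.1], index, k=2)
for score, doc_id in results:
    print(doc_id, round(score, 3))
```

An ANN index such as HNSW aims to return nearly the same top-k list while visiting only a small fraction of the stored vectors.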
Key Components in Azure
Azure OpenAI Service: Provides embedding models via API. You call POST https://{resource}.openai.azure.com/openai/deployments/{deployment-id}/embeddings with input text and get back a vector.
Azure AI Search: A cloud search service that can store and index vectors. You define an index with a vectorSearch configuration and a field of type Collection(Edm.Single) for the vector. You can combine vector search with traditional keyword search (hybrid search).
Semantic Kernel or LangChain: Libraries that orchestrate embedding generation and vector search in RAG pipelines.
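A minimal sketch of how a client might prepare the embeddings call listed above, using only the Python standard library. The resource name, deployment name, key placeholder, and the `api-version` value are assumptions to replace with your own; the request is built but deliberately not sent.

```python
import json
import urllib.request

def build_embeddings_request(resource, deployment_id, api_key, text,
                             api_version="2024-02-01"):
    """Build (but do not send) a POST request for the Azure OpenAI embeddings endpoint.

    api_version is an assumed value; check which versions your resource supports.
    """
    url = (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment_id}/embeddings?api-version={api_version}"
    )
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )

# All three identifiers below are placeholders, not real resources.
request = build_embeddings_request(
    "my-resource", "my-embedding-deployment", "<your-key>",
    "How do I return a damaged item?",
)
print(request.get_method(), request.full_url)

# Sending it would look roughly like this (requires a real resource and key):
# with urllib.request.urlopen(request) as response:
#     vector = json.load(response)["data"][0]["embedding"]
```

In practice most projects would use an SDK or an orchestration library for this call; the raw request is shown only to make the moving parts visible.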
Configuration and Verification Commands
To create a vector index in Azure AI Search using the REST API:
{
  "name": "my-vector-index",
  "fields": [
    {"name": "id", "type": "Edm.String", "key": true},
    {"name": "content", "type": "Edm.String", "searchable": true},
    {"name": "contentVector", "type": "Collection(Edm.Single)", "searchable": true, "dimensions": 1536, "vectorSearchProfile": "my-vector-profile"}
  ],
  "vectorSearch": {
    "algorithms": [
      {"name": "my-hnsw-algorithm", "kind": "hnsw", "hnswParameters": {"metric": "cosine", "m": 4, "efConstruction": 400, "efSearch": 500}}
    ],
    "profiles": [
      {"name": "my-vector-profile", "algorithm": "my-hnsw-algorithm"}
    ]
  }
}
To query using vector search:
POST /indexes/my-vector-index/docs/search?api-version=2024-05-01-preview
{
  "search": "",
  "vectorQueries": [{"kind": "vector", "vector": [0.1, 0.2, ...], "fields": "contentVector", "k": 10}],
  "select": "id, content"
}
The response includes @search.score, which is derived from the cosine similarity between the query vector and each returned document vector.
How Embeddings and Vector Search Interact with Related Technologies
Embeddings are the foundation of Retrieval Augmented Generation (RAG). In RAG, the user query is embedded and used to retrieve relevant documents via vector search, and those documents are injected into the prompt sent to a generative model (e.g., GPT-4). This grounds the model in factual data and reduces hallucinations. Azure AI Search is the primary vector store for RAG on Azure. Other related services include Azure Cosmos DB (which supports vector indexing) and Azure Cache for Redis (with RediSearch vector similarity).
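The "inject into the prompt" step can be sketched as plain string assembly. The prompt wording, the `build_grounded_prompt` helper, and the sample documents are all hypothetical; real pipelines differ in how they phrase instructions and cite sources.

```python
def build_grounded_prompt(question, retrieved_docs, max_docs=3):
    """Combine the top retrieved documents and the user question into one prompt."""
    sources = retrieved_docs[:max_docs]
    context = "\n".join(
        f"[{number}] {doc['content']}" for number, doc in enumerate(sources, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Pretend these came back from a vector search, best match first (assumption).
retrieved = [
    {"id": "1", "content": "Damaged items can be returned within 30 days."},
    {"id": "2", "content": "Refunds are issued to the original payment method."},
    {"id": "3", "content": "Gift cards cannot be exchanged for cash."},
]

prompt = build_grounded_prompt("How do I return a damaged item?", retrieved, max_docs=2)
print(prompt)
```

The resulting string is what would be sent to the generative model, so the model answers from the retrieved text instead of from memory alone.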
Performance Considerations
Dimensions: Higher dimensions capture more nuance but increase storage and compute. OpenAI's ada-002 uses 1536; text-embedding-3-small defaults to 1536 and can be shortened (for example, to 512); text-embedding-3-large defaults to 3072.
Indexing time: HNSW construction is roughly O(N log N). For large datasets, use incremental indexing.
Search latency: ANN searches are typically <100ms for millions of vectors. Exact search (brute force) is only feasible for small sets.
Recall: There is a trade-off between speed and accuracy. HNSW with a high efSearch can exceed 99% recall, at the cost of slower queries.
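The storage side of the dimensions trade-off is simple arithmetic. The sketch below assumes each vector component is a 32-bit float (4 bytes, as with Edm.Single) and counts raw vector data only; graph structures and other index overhead are not included.

```python
BYTES_PER_FLOAT32 = 4  # one Edm.Single value

def raw_vector_storage_bytes(num_vectors, dimensions):
    """Raw size of the vector data alone, excluding any index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32

for dims in (512, 1536, 3072):
    gib = raw_vector_storage_bytes(1_000_000, dims) / 1024**3
    print(f"1,000,000 vectors x {dims} dims: {gib:.2f} GiB")
```

Doubling the dimension count doubles the raw storage, which is one reason shorter vectors are attractive when their retrieval quality is good enough.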
Exam-Relevant Details
Cosine similarity is the metric recommended for OpenAI embeddings.
text-embedding-ada-002 outputs 1536 dimensions.
text-embedding-3-small outputs 1536 dimensions by default (can be shortened, for example to 512, via the dimensions parameter).
text-embedding-3-large outputs 3072 dimensions.
Vector search in Azure AI Search uses the HNSW algorithm (approximate) or exhaustive KNN (exact).
Azure AI Search supports hybrid search (vector + keyword) with semantic ranking.
RAG combines embeddings, vector search, and generative models.
Cosine similarity range: -1 to 1. For normalized vectors, it's equivalent to dot product (the range is still -1 to 1).
Generate embeddings for documents
Use an embedding model like text-embedding-ada-002 to convert each document into a vector. Send a POST request to the Azure OpenAI embeddings endpoint with the document text. The response contains a vector of 1536 floating-point numbers. Each document gets its own vector stored alongside its original text and metadata.
Create a vector index in Azure AI Search
Define an index schema with a field of type Collection(Edm.Single) to hold the embedding vector. Set the dimensions property to 1536. Configure a vectorSearch profile specifying the HNSW algorithm with cosine metric. This index will enable efficient ANN search.
Index documents with their vectors
Upload documents including the vector field to the index using the push API. Each document must have a unique key. The index builds the HNSW graph during indexing. The efConstruction parameter controls the quality of the graph; higher values yield better recall but slower indexing.
Generate embedding for user query
When a user submits a query, call the same embedding model to convert the query text into a vector. Use the same model and dimensions as for the documents. The query vector is not stored; it is used only for the search.
Perform vector search
Send a search request to Azure AI Search that carries the query vector. Specify the field to search against and the number of top results (k). The search engine traverses the HNSW graph, computing cosine similarity only against the candidate vectors it visits instead of against every indexed vector, and returns the approximate k nearest neighbors with their similarity scores.
Return and optionally use results in RAG
The search results include the original document text and a similarity score. In a RAG pipeline, these documents are inserted into the prompt for a generative model (e.g., GPT-4) to produce a grounded answer. The score can be used to filter low-relevance results.
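The six steps above can be traced end to end with a toy, in-memory stand-in. Everything here is an assumption made for illustration: the three-dimensional vectors are written by hand instead of coming from an embedding model, `InMemoryVectorIndex` is a hypothetical class that does an exact scan (not HNSW), and the 0.5 score threshold is arbitrary.

```python
import math

DOCUMENTS = {
    "doc-1": "Return policy: send damaged goods back within 30 days.",
    "doc-2": "Shipping times: orders arrive in 3 to 5 business days.",
    "doc-3": "Gift cards never expire and can be used online.",
}
QUERY = "How do I return a damaged item?"

# Hand-made vectors standing in for real model output (assumption).
FAKE_EMBEDDINGS = {
    DOCUMENTS["doc-1"]: [0.9, 0.1, 0.0],
    DOCUMENTS["doc-2"]: [0.1, 0.9, 0.1],
    DOCUMENTS["doc-3"]: [0.0, 0.2, 0.9],
    QUERY: [0.8, 0.2, 0.1],
}

def embed(text):
    """Steps 1 and 4: placeholder for a call to an embedding model."""
    return FAKE_EMBEDDINGS[text]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class InMemoryVectorIndex:
    """Steps 2, 3 and 5: a fixed-dimension store with exact top-k search."""

    def __init__(self, dimensions):
        self.dimensions = dimensions
        self.documents = []

    def upload(self, key, content, vector):
        if len(vector) != self.dimensions:
            raise ValueError("vector length must match the index dimensions")
        self.documents.append({"id": key, "content": content, "vector": vector})

    def search(self, query_vector, k):
        scored = [
            {"id": d["id"], "content": d["content"],
             "score": cosine_similarity(query_vector, d["vector"])}
            for d in self.documents
        ]
        scored.sort(key=lambda hit: hit["score"], reverse=True)
        return scored[:k]

index = InMemoryVectorIndex(dimensions=3)
for key, text in DOCUMENTS.items():
    index.upload(key, text, embed(text))

hits = index.search(embed(QUERY), k=2)
# Step 6: drop low-scoring hits before passing text to a generative model.
relevant = [hit for hit in hits if hit["score"] >= 0.5]
for hit in relevant:
    print(hit["id"], round(hit["score"], 3), hit["content"])
```

The query shares almost no wording with the return-policy document, yet that document ranks first because its (hand-made) vector points in nearly the same direction as the query vector.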
Enterprise Scenario 1: Customer Support Chatbot
A large e-commerce company builds a chatbot to answer product questions. They have thousands of product descriptions and FAQs. Using Azure OpenAI, they generate embeddings for each document and store them in Azure AI Search. When a customer asks 'How do I return a damaged item?', the query is embedded and vector search retrieves the most relevant return policy documents. These are fed to GPT-4 to generate a concise answer. The system handles 10,000 queries/day with <200ms latency. Misconfiguration: using too few dimensions (e.g., 256) caused poor recall; switching to 1536 fixed it.
Enterprise Scenario 2: Internal Knowledge Base Search
A law firm uses vector search to find relevant case law. They embed legal documents (briefs, rulings) and allow lawyers to search by describing a legal issue. The embedding model captures synonyms and related concepts (e.g., 'breach of contract' matches 'failure to perform'). They use hybrid search (vector + keyword) to ensure exact citation matches are also found. Performance: index of 500k vectors, HNSW with efSearch=500 returns results in 50ms. Common issue: lawyers expected exact phrase matches; they added keyword search as a fallback.
Enterprise Scenario 3: Image Similarity Search
A retail company wants to find visually similar products. They use a multimodal embedding model (e.g., CLIP) to embed product images into the same vector space as text descriptions. A user uploads a photo of a dress; it is embedded and vector search finds similar products from the catalog. They deploy Azure AI Search with a vector index of 1M images. Scaling: they add partitions to spread the index across more storage and replicas to handle query load. Pitfall: not normalizing vectors led to incorrect similarity scores; they normalized to unit length.
What AI-900 Tests on This Topic (Objective 5.3)
Definition of embeddings: Understand that embeddings are numerical representations of data that capture semantic meaning.
Purpose: Enable similarity search, clustering, and classification.
Vector search: Know that it finds items with similar embeddings using distance metrics like cosine similarity.
Azure services: Be able to identify Azure OpenAI Service for generating embeddings and Azure AI Search for vector indexing and search.
RAG: Understand that vector search is a key component of Retrieval Augmented Generation.
Common Wrong Answers and Why Candidates Choose Them
Wrong: 'Embeddings are the same as tokens.' Candidates confuse tokenization (splitting text into words/subwords) with embedding (converting to vectors). Correct: Tokens are inputs; embeddings are outputs of a model.
Wrong: 'Vector search uses exact keyword matching.' Candidates may think vector search is just like SQL LIKE. Correct: Vector search finds semantically similar items, not exact matches.
Wrong: 'Cosine similarity ranges from 0 to 1.' They forget it can be negative. Correct: Cosine similarity ranges from -1 to 1.
Wrong: 'text-embedding-ada-002 outputs 512 dimensions.' They confuse it with a shortened vector size that the newer text-embedding-3 models can return. Correct: 1536 dimensions.
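To make the token-versus-embedding distinction in the first wrong answer concrete, here is a toy sketch. The whitespace tokenizer, the five-word vocabulary, and the two-dimensional vectors are all invented; real models use subword tokenizers and run the token vectors through many transformer layers before producing the final embedding.

```python
# Toy vocabulary and embedding table; both are made up for illustration.
VOCAB = {"vector": 0, "search": 1, "finds": 2, "similar": 3, "items": 4}
EMBEDDING_TABLE = [
    [0.2, 0.7],
    [0.6, 0.1],
    [0.5, 0.5],
    [0.3, 0.8],
    [0.9, 0.4],
]

def tokenize(text):
    """Tokenization: split text into units (here, lowercase words)."""
    return text.lower().split()

def to_token_ids(tokens):
    """Each token becomes an integer ID; still no notion of meaning."""
    return [VOCAB[token] for token in tokens]

def lookup_vectors(token_ids):
    """Embedding: each ID is mapped to a dense vector of floats."""
    return [EMBEDDING_TABLE[token_id] for token_id in token_ids]

tokens = tokenize("Vector search finds similar items")
token_ids = to_token_ids(tokens)
vectors = lookup_vectors(token_ids)

print(tokens)      # strings
print(token_ids)   # integers
print(vectors[0])  # floats
```

Tokens and IDs are discrete labels, while embeddings are continuous vectors that can be compared with a similarity metric.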
Specific Numbers and Terms That Appear on the Exam
1536: Dimensions of text-embedding-ada-002.
512: A shortened size that text-embedding-3-small can return via the dimensions parameter; its default is 1536.
Cosine similarity: Metric recommended for OpenAI embeddings.
HNSW: Default algorithm in Azure AI Search for vector search.
Azure AI Search: The service for indexing and searching vectors.
RAG: Retrieval Augmented Generation.
Edge Cases and Exceptions
Embedding models can produce biased vectors if trained on biased data.
Vector search may return irrelevant results if the query is out-of-domain.
For very large datasets (>1B vectors), consider DiskANN or partitioned indexes.
Cosine similarity with non-normalized vectors is not equivalent to dot product.
How to Eliminate Wrong Answers
If the answer mentions 'exact match' or 'keyword', it is wrong for vector search.
If the answer says 'tokens' instead of 'vectors', it is wrong.
If the answer gives a dimension count other than 1536 for ada-002, it is wrong.
If the answer names 'Azure SQL Database' as the primary service for vector search, it is most likely wrong; the exam associates vector indexing and search with Azure AI Search.
Embeddings convert data into numerical vectors that capture semantic meaning.
The default OpenAI embedding model text-embedding-ada-002 outputs 1536 dimensions.
Vector search finds items with similar embeddings using cosine similarity.
Azure AI Search provides vector indexing with the HNSW algorithm, plus exhaustive KNN for exact search.
RAG uses embeddings and vector search to retrieve relevant information for generative models.
Cosine similarity ranges from -1 to 1; for normalized vectors, it is equivalent to dot product.
Hybrid search combines vector and keyword search for better results.
These come up on the exam all the time. Here's how to tell them apart.
Keyword Search (BM25)
Matches exact terms in documents.
Cannot handle synonyms or paraphrases.
Uses inverted index and TF-IDF/BM25 scoring.
Fast and easy to implement.
Fails for semantic queries like 'how to fix a leaky faucet' if the word 'leak' is absent.
Vector Search
Matches based on semantic similarity.
Handles synonyms, paraphrases, and related concepts.
Uses ANN algorithms (HNSW, IVF).
Requires embedding model and vector index.
Can retrieve documents with completely different wording but similar meaning.
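When the two approaches are combined in hybrid search, the separate result lists have to be merged. Azure AI Search does this with Reciprocal Rank Fusion (RRF); the sketch below shows the general RRF idea with invented document IDs and an assumed constant of 60, and is not a reproduction of the service's internal implementation.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results for the same query from each search type, best first.
keyword_ranking = ["doc-cite", "doc-a", "doc-b"]
vector_ranking = ["doc-a", "doc-c", "doc-cite"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, which is how hybrid search can honor exact term matches and semantic matches at once.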
Mistake
Embeddings are just a list of random numbers.
Correct
Embeddings are learned, dense vectors whose dimensions jointly encode latent semantic features. They are not random; they are produced by a trained neural network to preserve semantic similarity.
Mistake
Vector search always returns exact nearest neighbors.
Correct
Most vector search implementations use approximate nearest neighbor (ANN) algorithms for speed. They may miss some true nearest neighbors, but recall is typically >95% with proper tuning.
Mistake
Higher dimensions always give better results.
Correct
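Higher dimensions can capture more nuance but also increase noise and computational cost. The optimal dimension depends on the dataset and task. OpenAI's models default to 1536 dimensions (text-embedding-ada-002, text-embedding-3-small) or 3072 (text-embedding-3-large), and the text-embedding-3 models can return shorter vectors via the dimensions parameter.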
Mistake
Cosine similarity is the only metric for vector search.
Correct
Other metrics include Euclidean distance (L2), dot product, and Manhattan distance. Cosine is common for normalized vectors, but Azure AI Search supports multiple metrics.
Mistake
Embeddings can only be generated for text.
Correct
Embeddings can be generated for images, audio, video, and multimodal data. Models like CLIP produce embeddings that align images and text in the same vector space.
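The recall figure quoted in the approximate-search correction above is measured by comparing an ANN result list with the exact (brute-force) result list for the same query. A minimal sketch, with made-up document IDs:

```python
def recall_at_k(approximate_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search returned."""
    exact = set(exact_ids)
    return len(exact & set(approximate_ids)) / len(exact)

exact_neighbors = ["d1", "d2", "d3", "d4", "d5"]        # from a brute-force scan
approximate_neighbors = ["d1", "d2", "d3", "d5", "d9"]  # from an ANN index

print(recall_at_k(approximate_neighbors, exact_neighbors))  # 4 of 5 found
```

Averaging this value over many test queries gives the recall number that tuning parameters such as efSearch are meant to improve.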
Tokenization splits text into smaller units (tokens), which are then converted into numerical IDs. Embedding maps each token (or the entire sequence) into a dense vector of floating-point numbers. Tokenization is a preprocessing step; embedding is the output of a neural network.
Azure AI Search (formerly Azure Cognitive Search) is the primary service for vector search on Azure. It supports vector indexing, hybrid search, and semantic ranking.
Consider the dimensionality, performance, and cost. text-embedding-ada-002 (1536 dims) is a good default. text-embedding-3-small (1536 dims by default, reducible via the dimensions parameter) is cheaper. text-embedding-3-large (3072 dims) offers higher accuracy but needs more storage. Test with your data.
Cosine similarity measures the angle between two vectors. It ranges from -1 (opposite direction) to 1 (same direction). It is used because it is invariant to vector magnitude, focusing on direction, which captures semantic orientation.
Yes, other Azure data services such as Azure SQL Database and Azure Cosmos DB now support vector search. However, Azure AI Search is purpose-built for search and offers features such as hybrid search and semantic ranking.
HNSW (Hierarchical Navigable Small World) is a graph-based approximate nearest neighbor algorithm. It builds a multi-layer graph for efficient search. Parameters: m (number of connections), efConstruction (build quality), efSearch (search depth).
In RAG, user queries are embedded to retrieve relevant documents via vector search. The retrieved documents are added to the prompt for a generative model, grounding its response in factual data and reducing hallucinations.