DP-900Chapter 38 of 101Objective 2.3

Full-Text Search in Azure SQL

This chapter covers full-text search in Azure SQL, a powerful feature for performing advanced text queries against character-based data. Full-text search is a key topic in DP-900 exam domain 'Relational Data' (objective 2.3), which focuses on querying relational data in Azure SQL Database and Azure SQL Managed Instance. Approximately 10-15% of exam questions touch on full-text search capabilities, often comparing them to basic LIKE queries. By the end of this chapter, you will understand how full-text indexing works, how to create and manage full-text indexes, and how to write full-text queries using CONTAINS and FREETEXT.

25 min read
Intermediate
Updated May 31, 2026

Library Card Catalog with Subject Index

Full-text search in Azure SQL is like a library that goes beyond a simple card catalog. Imagine a library where each book is stored on a shelf, and the card catalog only lists books by title and author. If you want to find books that mention 'quantum computing' anywhere in their content, you'd have to manually flip through every book. That's like a regular SQL LIKE query. Now, imagine the library builds a special index: a massive, alphabetized list of every significant word from every book, along with the exact page numbers where each word appears. This is the full-text index. When you search for 'quantum computing', the librarian doesn't go to the shelves. She goes to this index, finds the entries for 'quantum' and 'computing', intersects the page lists, and returns only the books that have both words on the same page. She can even rank the results by how many times the words appear and how close they are together. This index is huge, so it's stored in a separate structure called a full-text catalog. The librarian updates this index periodically, not in real-time, because rebuilding the index is expensive. In Azure SQL, the full-text index is stored in the database but managed by the Full-Text Engine, which uses a filter daemon process to extract text from various document types like PDFs or Word files. The index itself is a set of inverted lists, mapping each token to its locations. The engine uses a thesaurus to handle synonyms and a stoplist to ignore common words like 'the' or 'and'. This is exactly how Azure SQL's full-text search works: it builds an inverted index of tokens from text columns, enables fast linguistic searches, and supports ranking and proximity queries.

How It Actually Works

What is Full-Text Search and Why Does It Exist?

Full-text search in Azure SQL provides linguistic search capabilities against character-based data stored in SQL tables. Unlike the LIKE operator, which performs simple pattern matching and cannot index individual words efficiently, full-text search builds an inverted index that maps each significant word (token) to its locations within the indexed columns. This enables fast, flexible queries that can match words, phrases, inflectional forms (e.g., 'run', 'ran', 'running'), synonyms, and proximity (words near each other).

Full-text search is essential for applications that require search functionality similar to web search engines, such as e-commerce product search, document management, and content management systems. The DP-900 exam tests your ability to distinguish between basic pattern matching and full-text search, and to identify scenarios where full-text search is appropriate.

How Full-Text Search Works Internally

The full-text search process involves several components working together:

1. Full-Text Index Population: When you create a full-text index on a table, the Full-Text Engine (FTE) reads the data from the specified columns. For each row, it extracts the text and passes it to a word breaker (which identifies word boundaries) and a stemmer (which reduces words to their root forms). The resulting tokens are stored in an inverted index. This process is called a population. There are three types: - Full population: Indexes all rows. Occurs when the index is first created or rebuilt. - Incremental population: Indexes only rows that changed since the last population. Requires a timestamp column (timestamp or rowversion). - Manual population: Triggered by ALTER FULLTEXT INDEX START UPDATE POPULATION.

2.

Inverted Index Structure: The inverted index is a set of tables that store each token, its frequency, and the list of document keys (row identifiers) where the token appears. The index is stored in a full-text catalog, which is a logical container that can hold one or more full-text indexes. The catalog itself is stored as a filegroup in the database.

3.

Filter Daemon (Fdhost.exe): For columns that contain documents (e.g., varbinary(max) with a file extension), the Full-Text Engine uses a filter daemon process to extract text from the document format (PDF, Word, etc.). This process runs outside SQL Server for security and stability.

4.

Stoplist: A stoplist defines words that are ignored during indexing (noise words like 'the', 'a', 'and'). Azure SQL includes a system stoplist by default, but you can create custom stoplists.

5.

Thesaurus: A thesaurus file (XML) defines synonyms and expansions. For example, you can configure that searching for 'car' also returns results for 'automobile'. The thesaurus is applied at query time, not during indexing.

Key Components, Values, Defaults, and Timers

Full-Text Catalog: A logical container. Default name is sys.fulltext_catalogs. You can have multiple catalogs per database.

Full-Text Index: Created on a table with a unique, single-column, non-nullable index (usually a primary key). The columns indexed must be character-based (char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary(max)).

Word Breaker: Language-specific. Azure SQL supports over 50 languages. The language used affects stemming and stopwords.

Population Timer: By default, full population occurs immediately after index creation. For incremental populations, the engine tracks changes using a timestamp column. The population is asynchronous; queries can still run during population but may not see all results until population completes.

Full-Text Index Size: Typically 10-20% of the source data size, but can vary based on tokenization.

Full-Text Query Performance: Queries like CONTAINS and FREETEXT use the index directly, unlike LIKE which scans the table.

Configuration and Verification Commands

To create a full-text catalog:

CREATE FULLTEXT CATALOG ProductCatalog AS DEFAULT;

To create a full-text index on a table Products with a primary key ProductID and columns Name and Description:

CREATE FULLTEXT INDEX ON Products
(
    Name LANGUAGE 1033,  -- English
    Description LANGUAGE 1033
)
KEY INDEX PK_Products
ON ProductCatalog
WITH (CHANGE_TRACKING AUTO);

The CHANGE_TRACKING AUTO option enables automatic incremental population. Other options: MANUAL (manual population) and OFF (no tracking).

To view full-text catalogs:

SELECT * FROM sys.fulltext_catalogs;

To view full-text indexes:

SELECT * FROM sys.fulltext_indexes;

To populate a full-text index manually:

ALTER FULLTEXT INDEX ON Products START UPDATE POPULATION;

Interaction with Related Technologies

Full-text search integrates with: - Azure SQL Database: Fully supported in single databases and elastic pools. No need for separate service. - Azure SQL Managed Instance: Same support as SQL Server. - Azure Synapse Analytics: Not supported for dedicated SQL pools; use LIKE or other text search methods. - Semantic Search: An extension of full-text search that extracts key phrases and document similarity. Not commonly tested on DP-900.

Full-text search is often compared to LIKE queries. The exam expects you to know that LIKE '%keyword%' cannot use an index efficiently and performs a table scan, while full-text search uses an index for fast retrieval. However, full-text search has overhead in index maintenance and storage, so it's not suitable for every scenario.

Trap Patterns

Common wrong answers on the exam:

Thinking that LIKE can perform linguistic searches (it cannot; it only does pattern matching).

Assuming full-text search is always faster than LIKE (false for small datasets where index overhead outweighs benefit).

Confusing full-text indexing with clustered or nonclustered indexes (they are separate structures).

Believing that full-text search can index binary data directly (it requires a filter for document types).

Edge Cases and Exceptions

Full-text index cannot be created on tables without a unique key.

Columns of type text and ntext are deprecated; use varchar(max) and nvarchar(max) instead.

Full-text search does not support nvarchar(max) with LANGUAGE specification for certain languages? Actually, it does.

In Azure SQL Database, the full-text index is always stored in the same database; you cannot use a separate filegroup.

Thesaurus files are not directly manageable in Azure SQL Database; you must use the sp_fulltext_load_thesaurus_file stored procedure.

Verification Queries

To verify full-text index population status:

SELECT * FROM sys.dm_fts_index_population;

To search for a word using CONTAINS:

SELECT ProductID, Name
FROM Products
WHERE CONTAINS(Name, 'bike');

To search for meaning using FREETEXT:

SELECT ProductID, Name
FROM Products
WHERE FREETEXT(Name, 'mountain bicycle');

Both CONTAINS and FREETEXT can be used in a SELECT statement with a FROM clause. Note that CONTAINSTABLE and FREETEXTTABLE return a rank column, but these are more advanced.

Walk-Through

1

Create a full-text catalog

First, create a full-text catalog to logically group one or more full-text indexes. Use the `CREATE FULLTEXT CATALOG` statement. The catalog is stored in the database as a filegroup. You can specify a name and set it as the default catalog. For example: `CREATE FULLTEXT CATALOG ProductCatalog AS DEFAULT;`. This step is optional because you can also use an existing catalog or let Azure SQL create one automatically. However, for exam purposes, remember that a catalog must exist before creating an index.

2

Create a full-text index on a table

Use `CREATE FULLTEXT INDEX` to define which columns to index, the language for word breaking, and the unique key index. The table must have a unique, single-column, non-nullable index (usually the primary key). The columns can be `char`, `varchar`, `nchar`, `nvarchar`, `text`, `ntext`, `image`, `xml`, or `varbinary(max)`. Specify the language using a locale identifier (LCID), e.g., 1033 for English. The index is created on the specified catalog. Example: `CREATE FULLTEXT INDEX ON Products (Name LANGUAGE 1033, Description LANGUAGE 1033) KEY INDEX PK_Products ON ProductCatalog WITH (CHANGE_TRACKING AUTO);`

3

Populate the full-text index

After creation, the full-text index must be populated with data. With `CHANGE_TRACKING AUTO`, the index is automatically populated after creation and updated when data changes. Alternatively, you can use `CHANGE_TRACKING MANUAL` and manually start population with `ALTER FULLTEXT INDEX ON Products START UPDATE POPULATION;`. The population process reads the data, tokenizes it, and builds the inverted index. During population, queries still run but may return incomplete results. Use `sys.dm_fts_index_population` to monitor progress.

4

Query using CONTAINS or FREETEXT

Write queries using `CONTAINS` (for precise word or phrase matching) or `FREETEXT` (for meaning-based matching). `CONTAINS` supports exact words, phrases, prefixes, proximity, inflectional forms, and thesaurus. `FREETEXT` breaks the search string into individual words and matches based on meaning, ignoring stopwords. Both can be used in a `WHERE` clause. Example: `SELECT * FROM Products WHERE CONTAINS(Name, 'mountain')`. Full-text queries use the inverted index, so they are fast even on large tables. The `LIKE` operator, in contrast, does not use the full-text index.

5

Maintain the full-text index

Over time, maintain the full-text index by rebuilding or reorganizing it. Use `ALTER FULLTEXT INDEX ON Products REBUILD` to rebuild the index (full population). Use `ALTER FULLTEXT INDEX ON Products REORGANIZE` to merge internal fragments. In Azure SQL, index maintenance is mostly automatic, but you can manually trigger population. Also manage stoplists and thesaurus files. For example, to add a stopword: `ALTER FULLTEXT STOPLIST MyStoplist ADD 'the' LANGUAGE 1033;`. These maintenance tasks ensure optimal query performance.

What This Looks Like on the Job

Enterprise Scenarios for Full-Text Search in Azure SQL

1. E-commerce Product Search

An online retailer stores product names and descriptions in an Azure SQL Database. Customers need to search for products using natural language queries like 'blue running shoes'. Using full-text search with FREETEXT on the Description column allows the application to return relevant results even if the exact phrase isn't present. The full-text index is created with CHANGE_TRACKING AUTO so that new products are searchable immediately after insertion. The database is scaled to handle thousands of queries per second. A common misconfiguration is not using a stoplist, causing common words like 'the' and 'and' to bloat the index. In production, the team monitors the full-text index size and schedules a weekly rebuild during low traffic to defragment the index.

2. Document Management System

A law firm stores legal documents as varbinary(max) in Azure SQL Managed Instance. They use full-text search with filters (e.g., PDF filter) to extract text from the documents. Users can search for specific clauses or terms across thousands of documents. The full-text index includes a custom thesaurus to handle legal synonyms. The firm uses CONTAINS with proximity terms to find words within a certain distance (e.g., NEAR((breach, contract), 10)). A pitfall is that the filter daemon process can be resource-intensive; they had to increase the ft_ftdi_host_cpu setting (though not configurable in Azure SQL DB) and ensure the database has enough DTUs/vCores. Without proper indexing, searches would take minutes; with full-text, they take seconds.

3. Content Management for a News Portal

A news website uses Azure SQL Database to store articles. Editors need to search for articles by topic, author, or keyword. Full-text search enables ranking of results based on word frequency and proximity. They use CONTAINSTABLE to get ranked results. The full-text index is on the Title and Body columns. They also use language-specific word breakers for multilingual content. A common issue is that the full-text index does not automatically update for bulk inserts; they had to use CHANGE_TRACKING MANUAL and schedule population after nightly data loads. Without correct change tracking, the index becomes stale, and searches miss recent articles.

What Goes Wrong When Misconfigured

Missing unique key index: Full-text index creation fails.

Incorrect language specification: Stemming and stopwords behave incorrectly, leading to poor search results.

Not using a stoplist: Index size balloons with noise words, slowing down queries.

Overusing full-text search on small tables: The overhead of index maintenance outweighs the query performance benefit; LIKE might be faster.

Ignoring population status: Queries return incomplete results if the index hasn't finished populating.

How DP-900 Actually Tests This

DP-900 Exam Focus on Full-Text Search

Objective Codes: This topic falls under 'Relational Data' (objective 2.3) – 'Query relational data in Azure SQL Database and Azure SQL Managed Instance'. The exam specifically tests your ability to identify appropriate query methods for text data.

What the Exam Tests:

The difference between LIKE and full-text search (FTS).

When to use CONTAINS vs FREETEXT.

Requirements for creating a full-text index (unique key, character columns).

That FTS uses an inverted index, not a B-tree.

That FTS supports linguistic features (stemming, thesaurus, stoplists).

Common Wrong Answers and Why Candidates Choose Them:

1.

'LIKE is faster than full-text search for large text columns' – Candidates assume LIKE uses an index, but LIKE '%keyword%' cannot use an index and scans the entire table. FTS uses an inverted index, so it is faster for large datasets. The exam may present a scenario with millions of rows; FTS is the correct answer.

2.

'Full-text search can only search one column at a time' – Actually, you can include multiple columns in a full-text index and query them together using CONTAINS(*, 'keyword').

3.

'Full-text search is always better than LIKE' – False for small tables (e.g., 100 rows) where index overhead outweighs benefit. The exam tests that you consider the trade-off.

4.

'Full-text index is a B-tree index' – Candidates confuse it with clustered/nonclustered indexes. The full-text index is an inverted index stored in a separate structure.

Specific Numbers, Values, and Terms:

LCID 1033 for English.

CHANGE_TRACKING AUTO vs MANUAL.

CONTAINS for precise matching; FREETEXT for meaning.

sys.fulltext_indexes and sys.dm_fts_index_population.

Thesaurus files (.xml) and stoplists.

Edge Cases and Exceptions:

Full-text search cannot be used on tables without a unique key.

It does not support text or ntext columns (deprecated).

In Azure SQL Database, you cannot use a separate filegroup for the catalog.

FREETEXT ignores stopwords; CONTAINS does not (unless you use CONTAINS with FORMSOF(THESAURUS, ...)).

How to Eliminate Wrong Answers:

If the question asks for 'fastest way to find a word in a large text column', full-text search is correct.

If the question mentions 'linguistic search' or 'stemming', full-text search is the answer.

If the question mentions 'pattern matching with wildcards', LIKE is the answer.

If the question says 'no additional index needed', then LIKE is the only option (since full-text requires an index).

Exam Tips:

Remember that full-text search is an optional feature; you must enable it (it's enabled by default in Azure SQL).

Know that CONTAINSTABLE and FREETEXTTABLE return a rank column but are not required for basic queries.

Understand that full-text search is not available in Azure Synapse Analytics dedicated SQL pools.

Sample Exam Question:

You have a table with 10 million product descriptions. You need to search for products that contain the word 'waterproof' in the description. Which query method should you use?

A. SELECT * FROM Products WHERE Description LIKE '%waterproof%' B. SELECT * FROM Products WHERE CONTAINS(Description, 'waterproof') C. SELECT * FROM Products WHERE Description = 'waterproof' D. SELECT * FROM Products WHERE Description IN ('waterproof')

Correct answer: B. LIKE '%waterproof%' cannot use an index and will scan 10 million rows. CONTAINS uses the full-text index and returns results quickly. Option C and D only match exact strings.

Key Takeaways

Full-text search uses an inverted index, not a B-tree index.

To create a full-text index, the table must have a unique, single-column, non-nullable key index.

CONTAINS performs exact word or phrase matching; FREETEXT matches meaning and ignores stopwords.

LIKE '%keyword%' cannot use an index and performs a table scan; full-text search is faster for large datasets.

Full-text search supports multiple languages via language-specific word breakers and stoplists.

Change tracking can be AUTO (automatic) or MANUAL; population is asynchronous.

Full-text index can be created on char, varchar, nchar, nvarchar, text, ntext, image, xml, and varbinary(max) columns.

Full-text search is not available in Azure Synapse Analytics dedicated SQL pools.

The full-text catalog is a logical container for one or more full-text indexes.

Use sys.dm_fts_index_population to monitor index population status.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

LIKE Operator

Performs simple pattern matching with wildcards (% and _).

Cannot use indexes for patterns with leading wildcard ('%keyword').

No linguistic support (stemming, thesaurus, stopwords).

Works on any character data type.

Best for small tables or exact prefix matches where index can be used (e.g., 'keyword%').

Full-Text Search (CONTAINS/FREETEXT)

Performs linguistic search with word breaking, stemming, thesaurus, and stoplists.

Uses an inverted index for fast retrieval regardless of pattern position.

Supports proximity, inflectional forms, and weighted terms.

Requires a full-text index on the table with a unique key.

Best for large tables with natural language search requirements.

Watch Out for These

Mistake

Full-text search uses a B-tree index like regular indexes.

Correct

Full-text search uses an inverted index, which maps each word to its locations. B-tree indexes are used for equality and range queries, not for linguistic searches.

Mistake

LIKE '%keyword%' is faster than full-text search for large tables.

Correct

LIKE with a leading wildcard cannot use any index and performs a full table scan. Full-text search uses an inverted index, making it much faster for large tables.

Mistake

Full-text search can be created on any column without restrictions.

Correct

Full-text index requires a unique, single-column, non-nullable key index (usually primary key). The columns to be indexed must be character-based or binary with a filter.

Mistake

FREETEXT and CONTAINS are interchangeable and work the same way.

Correct

CONTAINS performs exact word or phrase matching, while FREETEXT matches meaning by breaking the search string into words and using stemming and thesaurus. They have different syntax and behavior.

Mistake

Full-text search indexes are automatically updated in real-time.

Correct

Change tracking can be automatic (AUTO) but updates are asynchronous. There is a delay between data modification and index update. For real-time requirements, use AUTO with a small population interval, but it's not instantaneous.

Do You Actually Know This?

Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.

Frequently Asked Questions

What is the difference between CONTAINS and FREETEXT in Azure SQL full-text search?

CONTAINS performs precise matching for words, phrases, prefixes, proximity, and inflectional forms. It is case-insensitive but respects exact word boundaries. For example, `CONTAINS(Column, 'bike')` matches 'bike' but not 'bikes' unless you use `FORMSOF(INFLECTIONAL, bike)`. FREETEXT, on the other hand, breaks the search string into individual words and matches based on meaning, using stemming and thesaurus. It ignores stopwords and returns results where any of the words appear. For example, `FREETEXT(Column, 'bicycle riding')` might match rows containing 'bike' or 'cycling'. For the DP-900 exam, remember that CONTAINS is for exact searches and FREETEXT is for meaning-based searches.

Can I use full-text search on an Azure SQL Database without any additional setup?

What are the requirements for creating a full-text index?

To create a full-text index, the table must have a unique, single-column, non-nullable key index (usually the primary key). The columns you want to index must be of a character-based data type: char, varchar, nchar, nvarchar, text, ntext, image, xml, or varbinary(max). For binary columns like varbinary(max), you need to specify a file extension column so that the filter daemon can extract text. Additionally, you need a full-text catalog (which can be created automatically if you don't specify one). The language for word breaking must be specified using a locale identifier (LCID), e.g., 1033 for English.

Is full-text search faster than LIKE for all queries?

No. Full-text search is faster than `LIKE '%keyword%'` for large tables because it uses an inverted index. However, for small tables (e.g., fewer than 1,000 rows), the overhead of the full-text index may outweigh the benefit, and `LIKE` might be comparable or even faster. Also, `LIKE 'keyword%'` (without leading wildcard) can use a regular index and may be very fast. For the DP-900 exam, you should understand that full-text search excels for large datasets and linguistic searches, while `LIKE` is simpler and works without additional index setup.

How do I monitor the status of a full-text index population?

Use the dynamic management view `sys.dm_fts_index_population`. This view returns the database ID, table ID, population type (full, incremental, manual), status (e.g., 'Running', 'Completed'), and other details. For example: `SELECT * FROM sys.dm_fts_index_population WHERE database_id = DB_ID();`. You can also check `sys.fulltext_indexes` for the `is_enabled` and `change_tracking_state` columns. Monitoring is important to ensure that the index is up-to-date, especially after large data loads.

Can I use full-text search on XML columns?

Yes, full-text search can be used on XML columns. The full-text engine can extract text from XML content, ignoring tags. However, it indexes the text content, not the XML structure. If you need to query specific XML elements, you should use XQuery instead. For the exam, know that XML is a supported data type for full-text indexing.

What is a stoplist and how do I use it?

A stoplist is a list of words that are ignored during full-text indexing (noise words). These are common words like 'the', 'and', 'a', etc. Azure SQL includes a system stoplist for each language. You can create a custom stoplist using `CREATE FULLTEXT STOPLIST` and attach it to a full-text index. To add a stopword: `ALTER FULLTEXT STOPLIST MyStoplist ADD 'the' LANGUAGE 1033;`. Using a stoplist reduces index size and improves query performance. The exam may test that stoplists help eliminate noise words.

Terms Worth Knowing

Ready to put this to the test?

You've just covered Full-Text Search in Azure SQL — now see how well it sticks with free DP-900 practice questions. Full explanations included, no account needed.

Done with this chapter?