This chapter covers Entity Linking and Knowledge Base, a key concept in Azure AI Language services. You will learn how entity linking disambiguates and identifies real-world entities from text, connecting them to a knowledge base like Wikipedia or a custom database. For the AI-900 exam, this topic appears in roughly 5-10% of questions under objective 4.2 (Natural Language Processing workloads). Mastery of entity linking is essential for understanding how Azure AI services (formerly Azure Cognitive Services) enrich text with structured data.
Imagine a vast library with millions of books. Each book has a unique entry in the card catalog: a card with the book's title, author, subject, and a specific shelf location. When a patron asks for 'the book about the French Revolution by Charles Dickens,' the librarian doesn't search the shelves blindly. Instead, she goes to the card catalog and looks up 'French Revolution' and 'Dickens.' The catalog disambiguates: it links the ambiguous phrase 'French Revolution' to the precise subject heading 'France--History--Revolution, 1789-1799' and 'Dickens' to the author 'Dickens, Charles, 1812-1870.' Dickens wrote no history of the Revolution, but the catalog shows that his novel 'A Tale of Two Cities' is set during it, so she retrieves that book. In entity linking, the knowledge base is the card catalog: it contains known entities (books) with unique identifiers (catalog entries) and relationships (author, subject). The system takes a mention (like 'Dickens') and links it to the correct entity (Charles Dickens, not his son) by consulting the knowledge base, just as the librarian uses the catalog to find the right book. Without the catalog, the librarian would guess and often get the wrong book; without a knowledge base, entity linking would produce ambiguous or incorrect results.
What is Entity Linking?
Entity linking (also called entity disambiguation or named entity linking) is the process of identifying mentions of entities in text and linking them to a unique, unambiguous identifier in a knowledge base. Unlike named entity recognition (NER), which only labels entities (e.g., 'Person', 'Location'), entity linking goes further by resolving ambiguity. For example, the mention 'Washington' could refer to George Washington, the U.S. state, or the city of Washington, D.C. Entity linking determines which specific entity is intended and returns a unique ID (e.g., a DBpedia or Wikidata ID) along with additional metadata such as a description and a link to a knowledge base article.
Why Entity Linking Exists
Natural language is inherently ambiguous. Words and phrases often map to multiple real-world entities. Without disambiguation, downstream applications (search, question answering, recommendation) produce incorrect results. Entity linking solves this by using context and a structured knowledge base to resolve ambiguity. In Azure, entity linking is a feature of the Azure AI Language service and uses Wikipedia as its knowledge base to provide disambiguation. The exam expects you to understand that entity linking is about disambiguation using a knowledge base, not just identification.
How Entity Linking Works Internally
The entity linking process in Azure AI Language involves several steps:
Input Text: The user provides a text document (up to 5,120 characters per document).
Mention Detection: The service first identifies potential entity mentions using NER. This step spots phrases that might be entities.
Candidate Generation: For each mention, the service queries the knowledge base to retrieve candidate entities that match the mention string. For example, 'Washington' generates candidates: George Washington, Washington state, Washington D.C., etc.
Disambiguation: Using contextual features (surrounding words, entity types, relationships), the service calculates a confidence score for each candidate. It selects the candidate with the highest score above a threshold (default is 0.5). The disambiguation uses a machine learning model trained on millions of documents.
Linking: The service returns the entity's unique ID in the data source (for Wikipedia, the entry's title, such as 'Washington (state)'), a name, and a URL to the knowledge base entry (e.g., a Wikipedia URL).
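The steps above can be sketched with a toy example. This is an illustration only, not the service's actual algorithm: the hand-written TOY_KB, the KBEntry class, and the keyword-overlap score are all assumptions standing in for the real knowledge base and machine learning model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KBEntry:
    entity_id: str        # unique identifier in the toy knowledge base
    url: str
    keywords: frozenset   # words that tend to appear near this entity

# Hypothetical miniature knowledge base keyed by lowercase mention string.
TOY_KB = {
    "washington": [
        KBEntry("George Washington",
                "https://en.wikipedia.org/wiki/George_Washington",
                frozenset({"president", "general", "army", "george"})),
        KBEntry("Washington (state)",
                "https://en.wikipedia.org/wiki/Washington_(state)",
                frozenset({"seattle", "state", "olympia", "pacific"})),
        KBEntry("Washington, D.C.",
                "https://en.wikipedia.org/wiki/Washington,_D.C.",
                frozenset({"capital", "congress", "white", "house"})),
    ],
}

def link_entities(text, threshold=0.5):
    """Return (mention, entity_id, url, score) for each linked mention."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    context = set(words)
    links = []
    for word in words:
        # Mention detection and candidate generation: a dictionary lookup.
        candidates = TOY_KB.get(word)
        if not candidates:
            continue
        # Disambiguation: fraction of each candidate's keywords seen in context.
        scored = [(len(c.keywords & context) / len(c.keywords), c)
                  for c in candidates]
        score, best = max(scored, key=lambda pair: pair[0])
        # Linking: keep the best candidate only if it clears the threshold.
        if score >= threshold:
            links.append((word, best.entity_id, best.url, score))
    return links

print(link_entities("The president and general George Washington led the army."))
print(link_entities("I visited Washington last year."))  # no context: []
```

In this toy, a sentence with no supporting context yields no link at all. The real service draws on far richer signals, so its behavior on the same sentence can differ.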
Key Components and Defaults
Knowledge Base: By default, Azure uses a knowledge base derived from Wikipedia. You cannot change this in the standard API, but you can build custom entity linking using Azure AI Search (formerly Azure Cognitive Search) or custom models.
Confidence Score: A value between 0 and 1. The default threshold is 0.5. If no candidate scores above 0.5, the entity is not linked.
Languages: The API supports multiple languages: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese Simplified, and Arabic. For each language, the knowledge base is specific (e.g., English Wikipedia vs. Spanish Wikipedia).
Input Limits: Each document must be under 5,120 characters. The API can process up to 10 documents per request.
Pricing: Entity linking is billed per text record (1,000 characters per record). The free tier includes 5,000 text records per month.
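To make the billing unit concrete, here is a small worked calculation. It assumes that a partial block of 1,000 characters counts as a full text record (rounding up); the constant and function names are illustrative.

```python
import math

CHARS_PER_RECORD = 1_000    # size of one text record, as stated above
FREE_TIER_RECORDS = 5_000   # monthly free allowance quoted above

def text_records(document: str) -> int:
    """Billable text records for one document, assuming partial records round up."""
    return math.ceil(len(document) / CHARS_PER_RECORD)

docs = ["a" * 250, "b" * 1_000, "c" * 1_001, "d" * 5_120]
per_doc = [text_records(d) for d in docs]
total = sum(per_doc)

print(per_doc)                      # [1, 1, 2, 6]
print(total)                        # 10
print(FREE_TIER_RECORDS - total)    # 4990 records left in the free tier
```

Note that a document at the 5,120-character limit counts as six records under this assumption, so document count alone understates usage.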
Configuration and Verification
Entity linking is part of the Azure AI Language service. You can call it through the REST API or an SDK. Here is an example using the Python SDK (the azure-ai-textanalytics package):

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Replace these placeholders with your own resource endpoint and key.
endpoint = "https://your-resource.cognitiveservices.azure.com/"
key = "your-key"
client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

documents = ["I visited Washington last year."]
response = client.recognize_linked_entities(documents)

for doc in response:
    if doc.is_error:
        # A failed document comes back as an error item, not an exception.
        print(f"Error: {doc.error.message}")
        continue
    for entity in doc.entities:
        print(f"Name: {entity.name}, ID: {entity.data_source_entity_id}")
        print(f"URL: {entity.url}")
        print(f"Matches: {[match.text for match in entity.matches]}")

Example output (the entity chosen depends on the service's model and the surrounding context):

Name: Washington (state), ID: Washington (state)
URL: https://en.wikipedia.org/wiki/Washington_(state)
Matches: ['Washington']

If the text instead mentioned "George Washington", the result would link to the George Washington entry.
Interaction with Other Azure Services
Entity linking works alongside NER and other NLP features. In a typical pipeline, you might first extract entities with NER, then link them to get more details. The results can be fed into knowledge mining solutions (e.g., Azure Cognitive Search) to enrich search indexes. For example, a legal document processing system can link case mentions to a knowledge base of laws and precedents.
Common Exam Scenarios
Differentiating NER vs. Entity Linking: NER labels entities (Person, Location) but does not disambiguate. Entity linking provides a unique ID. Exam questions often ask: "Which service would you use to get a Wikipedia link for a person mentioned in text?" Answer: Entity Linking.
Knowledge Base Requirement: Entity linking always requires a knowledge base. Azure's default is Wikipedia-derived. You cannot use entity linking without a knowledge base.
Languages Supported: The exam may test that entity linking supports multiple languages, but English is the most robust.
Confidence Threshold: If the score is below 0.5, no link is returned. The exam might ask what happens when confidence is low.
Performance Considerations
At scale, entity linking can be computationally expensive because it involves searching a large knowledge base. Azure handles this with pre-built indexes. For custom scenarios, you can use Azure Cognitive Search with custom indexes to perform entity linking on your own data. The standard API has a latency of a few hundred milliseconds per document.
Edge Cases
Multiple Entities in One Mention: Rare, but the API returns the best match.
No Match: If the mention is not in the knowledge base, no entities are returned.
Ambiguous Mentions with Low Confidence: The API may return no entity if confidence is below threshold.
Cross-language Linking: The knowledge base is language-specific; linking a French mention to an English KB may fail.
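The no-match and low-confidence cases can also be handled in application code. The sketch below filters results on the client side with a stricter cutoff. The two dataclasses are simplified stand-ins that mirror the name, url, matches, text, and confidence score fields discussed in this chapter; the 0.7 cutoff and the sample scores are illustrative values, not service output.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Match:
    text: str
    confidence_score: float

@dataclass
class LinkedEntity:
    name: str
    url: str
    matches: List[Match] = field(default_factory=list)

def confident_links(entities, min_score=0.7):
    """Keep entities that have at least one match at or above min_score."""
    kept = []
    for entity in entities:
        strong = [m for m in entity.matches if m.confidence_score >= min_score]
        if strong:
            kept.append(LinkedEntity(entity.name, entity.url, strong))
    return kept

# Hand-written sample results.
entities = [
    LinkedEntity("Jordan", "https://en.wikipedia.org/wiki/Jordan",
                 [Match("Jordan", 0.55)]),
    LinkedEntity("Seattle", "https://en.wikipedia.org/wiki/Seattle",
                 [Match("Seattle", 0.93)]),
]

print([e.name for e in confident_links(entities)])  # ['Seattle']
print(confident_links([]))                          # [] when nothing was linked
```

Because an empty entity list is a normal result rather than an error, downstream code should treat it as "nothing linked" and continue.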
Summary of Core Concepts
Entity linking = disambiguation + linking to a knowledge base.
Uses context to resolve ambiguity.
Returns unique IDs and URLs.
Part of Azure AI Language service.
Exam focus: differentiate from NER, understand knowledge base requirement, know supported languages and confidence threshold.
1. Input Text Submission
The user sends a document (or multiple documents) to the Azure AI Language endpoint. Each document must be a string of text up to 5,120 characters. The API accepts up to 10 documents per request. The request includes the language code (e.g., 'en') or lets the service auto-detect. The service validates the input and returns an error if limits are exceeded.
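Staying within these limits can be handled before any request is sent. The following sketch splits long text on whitespace and groups the pieces into request-sized batches. The helper names are hypothetical, and the dictionary shape (id, language, text) is assumed to match the dictionary input form the Python SDK accepts.

```python
MAX_CHARS = 5_120   # per-document limit stated above
MAX_DOCS = 10       # per-request limit stated above

def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, breaking on a space when possible."""
    text = text.strip()
    chunks = []
    while len(text) > max_chars:
        # Last space that keeps the chunk within the limit.
        cut = text.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            cut = max_chars          # no space found: hard cut
        chunks.append(text[:cut].rstrip())
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks

def build_batches(texts, language="en", max_docs=MAX_DOCS):
    """Turn raw texts into batches of document dictionaries, one batch per request."""
    docs = []
    for text in texts:
        for chunk in split_text(text):
            docs.append({"id": str(len(docs) + 1), "language": language, "text": chunk})
    return [docs[i:i + max_docs] for i in range(0, len(docs), max_docs)]

batches = build_batches(["Short ticket text."] * 25)
print([len(b) for b in batches])    # [10, 10, 5]

long_text = "word " * 2000          # about 10,000 characters
print([len(c) for c in split_text(long_text)])
```

Setting the language explicitly on each document, as done here, avoids depending on auto-detection. Splitting can separate a mention from the context that disambiguates it, so break on paragraph or sentence boundaries where the input allows.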
2. Mention Detection via NER
The service first performs Named Entity Recognition to identify potential entity mentions in the text. This step extracts phrases that could be entities, such as 'Washington', 'Microsoft', or 'Albert Einstein'. The NER model labels these mentions with types (Person, Location, Organization, etc.). However, NER does not disambiguate; it only identifies that a phrase is an entity.
3. Candidate Generation from KB
For each detected mention, the service queries the knowledge base (Wikipedia-derived) for candidate entities that match the mention string. For example, 'Washington' yields candidates like 'Washington (state)', 'Washington, D.C.', 'George Washington', and 'Washington (surname)'. Each candidate has a unique ID (e.g., Wikidata Q-number) and a description.
4. Disambiguation Using Context
The service uses a machine learning model to score each candidate based on context from the surrounding text. Features include words near the mention, the NER type, and entity relationships. The candidate with the highest confidence score above 0.5 is selected. Exact ties are unlikely because the scores are continuous; how the service breaks one is not documented, so applications should not depend on any particular order.
5. Return Linked Entity Result
The API returns a list of linked entities for each document. Each entity includes: name (e.g., 'Washington (state)'), ID (e.g., 'Washington (state)'), URL (e.g., 'https://en.wikipedia.org/wiki/Washington_(state)'), and matches (the exact mention text with offset and length). If no candidate exceeds 0.5, the mention is not linked.
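The offset and length in each match let you map a linked entity back to the exact span in the original text. The sketch below uses a hand-written dictionary shaped like the result described above (it is not real service output) and assumes offsets count Python string positions, which holds for plain ASCII input.

```python
text = "I visited Washington last year."

# Hand-written sample mirroring the fields described above.
linked_entities = [
    {
        "name": "Washington (state)",
        "id": "Washington (state)",
        "url": "https://en.wikipedia.org/wiki/Washington_(state)",
        "matches": [{"text": "Washington", "offset": 10, "length": 10}],
    }
]

def mention_spans(text, entities):
    """Return (span_text, entity_name, url) for every match of every entity."""
    spans = []
    for entity in entities:
        for match in entity["matches"]:
            start = match["offset"]
            end = start + match["length"]
            spans.append((text[start:end], entity["name"], entity["url"]))
    return spans

for span, name, url in mention_spans(text, linked_entities):
    print(f"{span!r} -> {name} ({url})")
```

Slicing by offset rather than searching for the match text matters when the same word appears more than once in a document.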
Enterprise Scenario 1: News Aggregation and Topic Tagging
A global news agency processes millions of articles daily. They use entity linking to automatically tag articles with relevant entities (people, organizations, locations) and link them to Wikipedia for enrichment. For example, an article mentioning 'Trump' is linked to Donald Trump (not Trump Tower). This enables personalized news feeds and automated fact-checking. In production, they use Azure AI Language with batch processing (up to 10 documents per request) and handle high throughput by scaling the Azure resource. Common issues: ambiguous names like 'Jordan' (country vs. person) require additional context, and because the service's threshold cannot be changed in the standard API, the team may apply a stricter cutoff (e.g., 0.7) to the returned confidence scores in their own code to reduce false positives. Incorrect links that slip through lead to poor recommendations.
Enterprise Scenario 2: Customer Support Ticket Routing
A large telecom company uses entity linking to route support tickets. When a customer writes 'My iPhone 12 is not working with Verizon,' the system links 'iPhone 12' to the product entity and 'Verizon' to the carrier entity. This information routes the ticket to the correct team (mobile devices) and provides links to knowledge base articles. They use custom entity linking via Azure Cognitive Search with a private knowledge base of products and services. Performance: latency under 500ms per ticket. Misconfiguration: if the knowledge base lacks product variants, the linking fails, and tickets are misrouted.
Enterprise Scenario 3: Legal Document Analysis
A law firm uses entity linking to extract case references and legal entities from documents. They link mentions of 'Brown v. Board of Education' to the specific case in a legal knowledge base. This helps in legal research and document comparison. They use the standard API but supplement with custom models for domain-specific entities. Scale: thousands of documents per day. Pitfall: the default Wikipedia KB may not include obscure legal cases, so they build a custom KB using Azure Cognitive Search.
What AI-900 Tests on Entity Linking
AI-900 objective 4.2 (Natural Language Processing workloads) includes entity linking as a sub-topic. The exam expects you to:
Differentiate between NER and Entity Linking: NER identifies entities; entity linking disambiguates and links to a knowledge base.
Understand the need for a knowledge base: Entity linking cannot work without a knowledge base (Wikipedia or custom).
Know the default knowledge base: Wikipedia.
Recognize the output: Unique ID, name, URL, and confidence score.
Know supported languages: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese Simplified, Arabic.
Understand confidence threshold: Default 0.5; if below, no link.
Common Wrong Answers and Why
'Entity linking is the same as NER': Wrong. NER labels entities but does not disambiguate. Candidates confuse the two because both extract entities. Remember: NER = identification; entity linking = identification + disambiguation.
'Entity linking does not require a knowledge base': Wrong. It always requires a knowledge base. Some think it can use any database or no database. The exam emphasizes that linking is to a knowledge base.
'Entity linking returns only the entity type': Wrong. It returns a unique ID and URL, not just the type. NER returns type; entity linking returns more.
'Entity linking supports only English': Wrong. It supports multiple languages, though English is most robust.
Specific Numbers and Terms on the Exam
Confidence threshold: 0.5 (default).
Maximum characters per document: 5,120.
Maximum documents per request: 10.
Knowledge base: Wikipedia.
Output fields: name, data_source_entity_id, url, matches.
Edge Cases and Exceptions
If the mention is not in the knowledge base, no entity is returned (empty list).
If multiple candidates have equal confidence, the API still returns a single entity for the mention; the tie-breaking rule is not documented, so do not rely on it.
The service does not link entities if the confidence is below 0.5; it returns an empty list for that mention.
Language auto-detection may misidentify; it's best to specify the language.
How to Eliminate Wrong Answers
If the question asks for 'disambiguation' or 'linking to Wikipedia', choose Entity Linking.
If the question mentions 'type' or 'category', think NER.
If the question says 'unique identifier', think Entity Linking.
If the question mentions 'confidence score', think Entity Linking (NER also has confidence but for type, not disambiguation).
Entity linking disambiguates mentions and links them to a knowledge base (Wikipedia by default).
Entity linking is different from NER: NER identifies; entity linking identifies + disambiguates.
The default confidence threshold is 0.5; entities below this are not linked.
Maximum input: 5,120 characters per document, up to 10 documents per request.
Supported languages: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese Simplified, Arabic.
Entity linking returns a unique ID (for Wikipedia, the entry's title, such as 'Washington (state)') and a URL to the knowledge base entry.
If a mention is not in the knowledge base, no linked entity is returned.
Entity linking is part of Azure AI Language service and is billed per text record (1,000 characters).
These come up on the exam all the time. Here's how to tell them apart.
Named Entity Recognition (NER)
Identifies entity mentions in text.
Labels entities with types (Person, Location, etc.).
Does not disambiguate between multiple entities with the same name.
Returns entity type and confidence for type.
Does not require a knowledge base.
Entity Linking
Identifies entity mentions AND disambiguates them.
Links mentions to a unique knowledge base entry.
Resolves ambiguity (e.g., 'Washington' as state vs. person).
Returns unique ID, name, URL, and confidence score.
Requires a knowledge base (Wikipedia or custom).
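The contrast can be seen in the shape of the data each feature conveys. The classes and values below are hand-written for illustration and are not SDK types: two mentions of 'Washington' get the same NER label, while entity linking resolves them to different knowledge base entries.

```python
from dataclasses import dataclass

@dataclass
class RecognizedEntity:   # what NER conveys: a span and its type
    text: str
    category: str

@dataclass
class LinkedEntity:       # what entity linking conveys: a resolved identity
    text: str
    name: str
    url: str

# "Seattle is in Washington." vs. "Congress meets in Washington."
ner_a = RecognizedEntity("Washington", "Location")
ner_b = RecognizedEntity("Washington", "Location")

link_a = LinkedEntity("Washington", "Washington (state)",
                      "https://en.wikipedia.org/wiki/Washington_(state)")
link_b = LinkedEntity("Washington", "Washington, D.C.",
                      "https://en.wikipedia.org/wiki/Washington,_D.C.")

print(ner_a == ner_b)            # True: NER alone cannot tell them apart
print(link_a.url == link_b.url)  # False: linking resolves distinct entities
```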
Mistake
Entity linking and named entity recognition are the same thing.
Correct
NER identifies and labels entities (person, location, etc.) but does not disambiguate. Entity linking disambiguates and links to a unique knowledge base entry. They are separate capabilities in Azure AI Language.
Mistake
Entity linking can be used without a knowledge base.
Correct
Entity linking always requires a knowledge base to resolve ambiguity. Azure's default knowledge base is derived from Wikipedia. Without a KB, the service cannot provide unique IDs or URLs.
Mistake
Entity linking only supports English.
Correct
Entity linking supports multiple languages: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese Simplified, and Arabic. However, English has the most comprehensive coverage.
Mistake
Entity linking returns the entity type (Person, Location, etc.).
Correct
Entity linking returns a unique ID, name, URL, and confidence score. It does not return entity types; that is the output of NER.
Mistake
The confidence threshold for entity linking is 0.8 by default.
Correct
The default confidence threshold is 0.5. If the confidence score for the best candidate is below 0.5, no linked entity is returned.
NER (Named Entity Recognition) identifies entities in text and classifies them into predefined types like Person, Location, Organization. It does not resolve ambiguity. Entity linking goes further: it disambiguates entities by linking them to a unique entry in a knowledge base (e.g., Wikipedia). For example, NER labels 'Washington' as a Location, but entity linking determines whether it's Washington state, Washington D.C., or George Washington, and returns a unique ID and URL. On the AI-900 exam, remember that entity linking is about disambiguation using a knowledge base.
Entity linking always requires a knowledge base to provide unique identifiers and disambiguation. In Azure AI Language, the default knowledge base is Wikipedia. If you need to link to a custom knowledge base, you can build a custom solution using Azure Cognitive Search or other tools. The exam tests that entity linking depends on a knowledge base.
The default confidence threshold is 0.5. If the best candidate's confidence score is below 0.5, the service does not return a linked entity for that mention. You cannot change this threshold in the standard API. The exam may ask what happens when confidence is low: the entity is not linked.
Azure AI Language entity linking supports the following languages: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese Simplified, and Arabic. English has the most comprehensive knowledge base coverage. The exam may test that multiple languages are supported, but English is the primary one.
Each document can be up to 5,120 characters. A single request can include up to 10 documents. If your text is longer, you must split it into multiple documents. The exam may ask about these limits, so remember: 5,120 characters per document and 10 documents per request.
Linking a person's name to their Wikipedia page is exactly what entity linking does. It identifies mentions of entities (e.g., a person's name) and links them to the corresponding Wikipedia page (or other knowledge base entry). For example, the mention 'Albert Einstein' would be linked to the Wikipedia article about Albert Einstein, returning the URL and unique ID. This is a common exam scenario.
If the mention is not found in the knowledge base, the service returns an empty list for that mention; no linked entity is returned. For example, a very obscure person not in Wikipedia would not be linked. The exam may test that entity linking cannot link to entities outside the knowledge base.