
Custom Translator

This chapter covers Custom Translator, a feature of Azure Cognitive Services that allows you to build custom neural machine translation models tailored to your domain-specific terminology and style. For the AI-900 exam, Custom Translator falls under NLP workload objective 4.5, which tests your understanding of translation services and customization capabilities. Approximately 5-8% of exam questions touch on translation services, with one or two specifically about Custom Translator's purpose, data requirements, and how it differs from the prebuilt Translator service.

Updated May 31, 2026

Custom Translator as a Bilingual Editor

Imagine you run a business translating legal contracts from English to Japanese. A generic translator (like Microsoft's prebuilt neural machine translation) knows standard vocabulary and grammar. But your contracts use specialized legal terms like 'force majeure' and 'indemnification' that the generic translator often mistranslates. You hire a bilingual editor (Custom Translator) who reads your existing translated contracts (parallel documents) and learns your specific terminology and style. The editor creates a custom glossary and style guide (custom model) that the generic translator must follow. Now, when the generic translator produces a draft, the editor automatically corrects 'force majeure' to the precise Japanese legal term you use, and adjusts sentence structure to match your preferred formality. The editor doesn't translate from scratch; it fine-tunes the generic output using your specialized knowledge. If you later add new terms, you just update the glossary. The editor works 24/7, never gets tired, and can handle millions of words. This is exactly how Custom Translator works: it takes Microsoft's powerful prebuilt translation model and adapts it using your parallel documents to produce domain-specific, consistent translations without retraining the entire model from scratch.

How It Actually Works

What is Custom Translator and Why Does It Exist?

Custom Translator is a cloud-based service within Azure Cognitive Services that enables you to create custom translation models for the Translator Text API. The prebuilt Translator service supports over 100 languages and works well for general text, but it often struggles with domain-specific terminology, brand names, acronyms, and stylistic preferences. For example, a medical device company might need 'pacemaker' and 'cardiac resynchronization therapy device' rendered with the same approved target-language terms every time, rather than drifting between near-synonyms such as 'artificial pacemaker'. Custom Translator solves this by allowing you to train a model on your own parallel documents (source-target sentence pairs) and monolingual data.

How It Works Internally

Custom Translator uses transfer learning from Microsoft's large-scale neural machine translation (NMT) models. Instead of training a model from scratch, which would require millions of sentence pairs and weeks of compute time, Custom Translator fine-tunes a pre-existing base model using your data. The process involves:

Base Model: Microsoft maintains a generic NMT model for each language pair, trained on billions of sentences from web content, news, and other public sources. This model already understands general grammar and vocabulary.

Training Data: You upload parallel documents (e.g., English source with French target) in sentence-aligned format. The service splits your data into training (80%), tuning (10%), and testing (10%) sets automatically.

Fine-Tuning: Custom Translator adjusts the weights of the base model's neural network using your training data. This is a supervised learning process where the model learns to map your source sentences to your target sentences. The tuning set is used to prevent overfitting, and the test set evaluates final model quality.

Bilingual Dictionary (Glossary): You can also provide a glossary of specific term translations (e.g., 'Azure' -> 'Azure' in French, not 'Bleu'). The glossary is applied as hard constraints during translation, ensuring those terms are always translated as specified.

Model Training Time: Training typically takes a few hours for small datasets (10,000 sentences) to up to 24 hours for large datasets (1 million+ sentences). Training is billed per hour of compute.
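The 80/10/10 split described above can be pictured with a few lines of Python. This is only a minimal sketch of the idea: the function name `split_sentence_pairs` and the shuffling strategy are assumptions made for illustration, and the service performs its own split internally.

```python
import random

def split_sentence_pairs(pairs, train_pct=80, tune_pct=10, seed=0):
    """Shuffle (source, target) pairs and split them into train, tune and test sets.

    Hypothetical helper: it mirrors the 80/10/10 proportions quoted in the text,
    not the service's actual selection logic.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)       # fixed seed keeps the split reproducible
    n_train = len(shuffled) * train_pct // 100  # integer arithmetic avoids rounding surprises
    n_tune = len(shuffled) * tune_pct // 100
    train = shuffled[:n_train]
    tune = shuffled[n_train:n_train + n_tune]
    test = shuffled[n_train + n_tune:]          # whatever remains is held out for testing
    return train, tune, test

# Illustrative data: 10,000 placeholder sentence pairs.
pairs = [(f"source sentence {i}", f"phrase cible {i}") for i in range(10_000)]
train_set, tune_set, test_set = split_sentence_pairs(pairs)
print(len(train_set), len(tune_set), len(test_set))
```

The point to take away is that the three sets never overlap: the tuning set guides fine-tuning, and the test set is only used to score the finished model.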

Key Components, Values, Defaults, and Timers

- Supported Languages: Custom Translator supports language pairs where both languages are available in the Translator Text API. As of 2024, it supports over 100 languages including English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Russian, and many more. A current list is available in the Azure documentation.
- Data Requirements: Minimum 10,000 parallel sentences for training, though 100,000+ yields better quality. Each sentence pair must be aligned: one source sentence per line, one target sentence per line in a UTF-8 encoded file. File formats: TMX (Translation Memory eXchange), XLIFF (XML Localization Interchange File Format), or plain text (source and target files).
- Glossary: You can upload a glossary file (maximum size 5 MB) in either of two formats:
  - Plain text format: Two columns (source term, target term) separated by tab.
  - TMX format: Standard translation memory format.
- Model Deployment: After training, you can deploy the model to a custom endpoint. The endpoint URL is: https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&category=your-category-id
  - The category ID is generated when you create a project. You must include it in API calls to use your custom model.
- Pricing: Custom Translator has two cost components:

Training: $10 per hour of training compute.

Translation: Standard Translator pricing ($10 per million characters for S1 tier) plus a $1 per million character surcharge for custom models.

Model Lifecycle: You can have up to 10 trained models per workspace. Models are stored indefinitely but you can delete them anytime.
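To see how the two cost components combine, here is a small worked calculation. It is a sketch that uses only the rates quoted in this chapter ($10 per training hour, $10 per million characters on S1, $1 per million surcharge); the function name `monthly_cost` is made up for the example, and real prices should be checked against the current Azure pricing page.

```python
from decimal import Decimal

# Rates as quoted in this chapter (assumptions, not live pricing).
TRAINING_RATE_PER_HOUR = Decimal("10")
STANDARD_RATE_PER_MILLION = Decimal("10")
CUSTOM_SURCHARGE_PER_MILLION = Decimal("1")

def monthly_cost(characters, training_hours=0):
    """Return the estimated cost in dollars for one month of custom translation."""
    millions = Decimal(characters) / Decimal(1_000_000)
    translation = millions * (STANDARD_RATE_PER_MILLION + CUSTOM_SURCHARGE_PER_MILLION)
    training = Decimal(training_hours) * TRAINING_RATE_PER_HOUR
    return translation + training

print(monthly_cost(10_000_000))                     # translation only
print(monthly_cost(10_000_000, training_hours=24))  # same volume plus a 24-hour training run
```

At these rates, custom translation works out to $11 per million characters, with training added on top only in the months when you retrain.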

Configuration and Verification Commands

Custom Translator is primarily managed through the Custom Translator portal and REST APIs. There is no native CLI for Custom Translator, but you can use Azure CLI to manage the Translator resource itself. Here are the key steps:

1. Create a Translator resource in Azure portal (or via CLI):

az cognitiveservices account create --name MyTranslator --resource-group MyRG --kind TextTranslation --sku S1 --location global

2. Access Custom Translator portal: Navigate to https://portal.customtranslator.azure.com and sign in with your Azure subscription. Create a workspace linked to your Translator resource.

3. Upload documents: Use the portal to upload source and target files. Or use the REST API:

curl -X POST "https://customtranslator.azure.com/api/workspaces/{workspaceId}/documents" -H "Authorization: Bearer {token}" -F "file=@mydocs.zip"

4. Train a model: In the portal, choose your data, glossary, and click 'Train'. Monitor training progress in the 'Models' tab.

5. Deploy model: After training completes (status 'Deployable'), click 'Deploy'. The endpoint URL and category ID appear.

6. Test translation: Use the API with category parameter:

curl -X POST "https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&to=fr&category=mycategoryid" -H "Ocp-Apim-Subscription-Key: {key}" -H "Content-Type: application/json" -d '[{"Text":"Hello world"}]'
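The same request can be assembled in Python, which makes it easier to see that the category parameter is the only thing separating a custom call from a standard one. This is a sketch, not an official client: `build_translate_request` is a hypothetical helper, and it only builds the URL, headers, and body without sending anything.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def build_translate_request(text, to_lang, key, category=None):
    """Return (url, headers, body) for a Translator v3 translate call.

    Leaving out `category` produces a request for the prebuilt model.
    """
    params = {"api-version": "3.0", "to": to_lang}
    if category:
        params["category"] = category  # selects the deployed custom model
    url = f"{ENDPOINT}?{urlencode(params)}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    body = json.dumps([{"Text": text}])  # JSON needs double quotes, which json.dumps guarantees
    return url, headers, body

url, headers, body = build_translate_request(
    "Hello world", "fr", key="{key}", category="mycategoryid"
)
print(url)
print(body)
```

Any HTTP client can then POST `body` to `url` with `headers`. Depending on how the Translator resource was created, an additional region header may also be required.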

How It Interacts with Related Technologies

Custom Translator is part of the Translator Text API ecosystem. It works alongside:

- Prebuilt Translator: The base model that Custom Translator fine-tunes. Without a custom model, the API uses the prebuilt model.
- Translator Text API: The same API endpoint handles both standard and custom translations. The only difference is the category parameter.
- Azure Cognitive Services: Custom Translator uses the same subscription key and region as other Cognitive Services. You can integrate it with Logic Apps, Power Automate, or custom applications via REST calls.
- Azure DevOps: You can automate model training using CI/CD pipelines with the Custom Translator REST API.
- Microsoft Translator Hub (legacy): The earlier version of Custom Translator, now deprecated. All new models must use Custom Translator.

Important Limits and Quotas

Document size: Each document can be up to 100 MB (uncompressed).

Number of documents per workspace: 1000.

Parallel sentences per training: Up to 10 million.

Glossary entries: Up to 50,000 entries.

Concurrent training jobs: 1 per workspace.

API rate limits: Standard Translator limits apply (e.g., S1 tier: 2000 requests per minute). Custom model calls count toward this limit.
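Limits like these are easy to check before you upload anything. The sketch below encodes the figures listed above in a hypothetical `preflight_problems` helper; treating 100 MB as 100 × 1024 × 1024 bytes is an assumption, and the numbers themselves should be confirmed against current documentation.

```python
# Limits as listed in this chapter (assumed values, verify before relying on them).
MAX_DOCUMENT_BYTES = 100 * 1024 * 1024  # assumes "MB" means mebibytes
MAX_DOCUMENTS_PER_WORKSPACE = 1000
MAX_PARALLEL_SENTENCES = 10_000_000
MAX_GLOSSARY_ENTRIES = 50_000
MIN_RECOMMENDED_SENTENCES = 10_000

def preflight_problems(document_sizes, sentence_count, glossary_entries):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if len(document_sizes) > MAX_DOCUMENTS_PER_WORKSPACE:
        problems.append("too many documents for one workspace")
    for index, size in enumerate(document_sizes):
        if size > MAX_DOCUMENT_BYTES:
            problems.append(f"document {index} exceeds the size limit")
    if sentence_count > MAX_PARALLEL_SENTENCES:
        problems.append("too many parallel sentences for one training run")
    if sentence_count < MIN_RECOMMENDED_SENTENCES:
        problems.append("fewer than the recommended 10,000 parallel sentences")
    if glossary_entries > MAX_GLOSSARY_ENTRIES:
        problems.append("glossary has too many entries")
    return problems

print(preflight_problems([5_000_000, 12_000_000], 50_000, 2_000))
print(preflight_problems([200 * 1024 * 1024], 1_000, 60_000))
```

Note that the low-sentence-count message is a quality warning rather than a hard failure, matching how the chapter describes the 10,000-sentence minimum.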

Quality Evaluation

Custom Translator provides a BLEU (Bilingual Evaluation Understudy) score to evaluate model quality. The BLEU score ranges from 0 to 100 (higher is better). A score above 40 indicates good quality; above 60 is excellent. However, BLEU is a reference-based metric — it compares model output to a human reference translation. It does not measure fluency or adequacy perfectly, but it is the standard benchmark. You can also manually review sample translations from the test set in the portal.
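To make the idea of a reference-based metric concrete, here is a simplified sentence-level BLEU in Python. It is a teaching sketch under stated assumptions: whitespace tokenization, a single reference, n-grams up to length 4, and no smoothing. The portal's score is computed over the whole test set and its exact implementation is not described here, so the numbers will not match.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, reference, max_n=4):
    """Return a 0-100 BLEU-style score for one candidate against one reference."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped overlap: a candidate n-gram is credited at most as often as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any empty n-gram level zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages candidates that are shorter than the reference.
    brevity_penalty = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * brevity_penalty * math.exp(sum(log_precisions) / max_n)

reference = "the cat sat on the mat"
print(simple_bleu("the cat sat on the mat", reference))
print(round(simple_bleu("the cat sat on a mat", reference), 1))
```

Changing a single word lowers the score sharply because it breaks several overlapping n-grams at once, which is one reason BLEU can disagree with human judgments of quality.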

Walk-Through

1

Prepare Parallel Documents

Collect or create sentence-aligned parallel documents in your source and target languages. Each source sentence must correspond exactly to one target sentence. For example, if your source file has 'Product X is launched.' then your target file must have the translation of that exact sentence. Use UTF-8 encoding. Common file formats: plain text (one sentence per line), TMX, or XLIFF. Ensure your data is clean — remove duplicate sentences, correct misalignments, and avoid HTML tags. Minimum 10,000 parallel sentences for meaningful training; 100,000+ for high quality. Also consider including domain-specific terminology in a glossary file (tab-separated source-target pairs).
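A quick script can catch the most common data problems before upload. The sketch below assumes the plain-text format (one sentence per line in separate source and target files, already read into lists); `clean_parallel_lines` is a hypothetical helper, and it cannot detect pairs that line up by position but are not actually translations of each other.

```python
def clean_parallel_lines(source_lines, target_lines):
    """Pair up source and target lines, dropping empty and duplicate pairs.

    Raises ValueError when the two files have different line counts, which is
    the clearest sign that the files are misaligned.
    """
    if len(source_lines) != len(target_lines):
        raise ValueError(
            f"misaligned files: {len(source_lines)} source lines "
            f"vs {len(target_lines)} target lines"
        )
    seen = set()
    pairs = []
    for source, target in zip(source_lines, target_lines):
        pair = (source.strip(), target.strip())
        if not pair[0] or not pair[1]:
            continue  # skip pairs where either side is blank
        if pair in seen:
            continue  # skip exact duplicates
        seen.add(pair)
        pairs.append(pair)
    return pairs

source = ["Product X is launched.", "", "Product X is launched.", "Contact support."]
target = ["Le produit X est lancé.", "", "Le produit X est lancé.", "Contactez le support."]
cleaned = clean_parallel_lines(source, target)
print(len(cleaned))
```

When reading the files from disk, open them with `encoding="utf-8"` so the text matches the encoding the service expects.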

2

Create Custom Translator Workspace

Sign in to the Custom Translator portal (portal.customtranslator.azure.com) with your Azure account. Create a workspace linked to your Translator resource. You will need the resource key and region. The workspace is a container for your projects, documents, models, and deployments. You can create multiple workspaces per subscription. Each workspace can have up to 10 models deployed simultaneously.

3

Upload Documents and Glossary

In the workspace, navigate to the 'Documents' tab and upload your source and target files. You can upload them as separate files (source.txt and target.txt) or as a single ZIP file containing both. The system will automatically detect and align sentences. For glossaries, upload a .txt file with tab-separated source and target terms. The glossary entries act as hard constraints — the model will always use the specified target term for the given source term. This is useful for brand names, product names, and fixed phrases.
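Because a glossary is just tab-separated text, it is worth validating it locally before upload. The following sketch assumes the two-column plain-text format described above; `parse_glossary` is a hypothetical helper, and rejecting conflicting entries is a design choice made for this example rather than documented service behavior.

```python
def parse_glossary(tsv_text):
    """Parse tab-separated 'source<TAB>target' lines into a dict.

    Raises ValueError for malformed lines or for a source term that is mapped
    to two different targets.
    """
    glossary = {}
    for line_number, line in enumerate(tsv_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        columns = line.split("\t")
        if len(columns) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 tab-separated columns, got {len(columns)}"
            )
        source, target = (column.strip() for column in columns)
        if source in glossary and glossary[source] != target:
            raise ValueError(f"line {line_number}: conflicting entries for {source!r}")
        glossary[source] = target
    return glossary

glossary = parse_glossary("Azure\tAzure\nforce majeure\tforce majeure\n")
print(glossary)
```

A conflicting entry matters more here than in training data, since each glossary term is meant to have exactly one fixed translation.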

4

Train the Custom Model

Go to the 'Models' tab and click 'Create new model'. Select your uploaded documents for training, tuning, and testing. The system auto-splits data (80/10/10) but you can override. Optionally attach a glossary. Choose a base model (usually the latest 'General' model). Click 'Train'. Training time varies: ~1 hour for 10k sentences, up to 24 hours for 1M+ sentences. Monitor progress in the portal. Once complete, the model status changes to 'Deployable' and a BLEU score is shown.

5

Deploy and Test the Model

After training, click 'Deploy' on the model. Deployment takes a few minutes. You will receive a category ID (a GUID). To test, use the Translator Text API with the category parameter set to that ID. For example: POST to https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&to=fr&category=your-category-id with your subscription key. The response will reflect your custom model's translations. You can also test interactively in the Custom Translator portal using the 'Test' tab.
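The response to that call is a JSON array with one item per input text, each carrying a list of translations. The sketch below shows one way to read it; `extract_translations` is a hypothetical helper and the sample payload is illustrative, not captured from a real custom model.

```python
import json

def extract_translations(response_text):
    """Return (target_language, translated_text) tuples from a translate response body."""
    results = []
    for item in json.loads(response_text):
        for translation in item.get("translations", []):
            results.append((translation["to"], translation["text"]))
    return results

# Illustrative payload in the shape returned by the v3 translate operation.
sample_response = '[{"translations": [{"text": "Bonjour le monde", "to": "fr"}]}]'
print(extract_translations(sample_response))
```

The response has the same shape whether or not a category was supplied, so comparing the output of the same sentence with and without the category ID is a simple way to confirm the custom model is actually being used.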

What This Looks Like on the Job

Enterprise Scenario 1: E-commerce Product Descriptions

A global online retailer sells products in 20 languages. Their product descriptions contain specific brand names, technical specifications, and marketing slogans. The prebuilt Translator often mistranslates brand names (e.g., 'Nike Air Max' becomes literal 'Nike Air Maximum') and uses inconsistent terminology for components like 'waterproof rating IPX7'. The retailer creates a Custom Translator model trained on 500,000 parallel sentences from their existing translated product catalog. They also upload a glossary with 2,000 brand names and technical terms. The custom model achieves a BLEU score of 65, up from 35 with the prebuilt model. After deployment, they integrate the API with their content management system. Every new product description is automatically translated using the custom endpoint. The custom model is retrained quarterly with new data. A common misconfiguration is forgetting to include the category parameter in API calls, which causes fallback to the generic model. Monitoring via Azure Monitor shows translation volume of 10 million characters per month, costing approximately $110 at the rates above (standard $100 + surcharge $10), plus amortized training costs.

Enterprise Scenario 2: Legal Document Translation

A law firm specializes in international contracts. They need precise translations of legal terms like 'force majeure', 'indemnification', and 'arbitration clause'. The prebuilt Translator often produces incorrect or ambiguous translations that could lead to legal liability. The firm trains a Custom Translator model on 200,000 parallel sentences from their archive of bilingual contracts. They also use a glossary with 500 legal terms and their approved translations. The model is deployed to a private endpoint. Lawyers use a custom web application that sends text to the API with the category ID. The firm also uses the 'dictionary' feature to verify specific term translations. A critical success factor is data quality — they spend significant effort aligning sentences perfectly, as misaligned data degrades model quality. The model is tested on a held-out set of 10,000 sentences and achieves BLEU 72. Without the custom model, they would need to hire human translators for every document, costing 10x more.

Common Pitfalls in Production

Insufficient Data: Training with less than 10,000 sentences often yields minimal improvement over the base model. Candidates often think 1,000 sentences is enough — it is not.

Misaligned Data: If source and target sentences are not perfectly aligned, the model learns incorrect mappings, reducing quality. Always use sentence-aligned files.

Overfitting: Using only a small, narrow dataset can cause the model to memorize rather than generalize. The tuning set helps, but diversifying training data is better.

Glossary Overuse: Adding too many glossary entries (e.g., every word) can make translations sound unnatural. Use glossaries only for fixed terms that must not vary.

Ignoring BLEU Score: A high BLEU score does not guarantee human-quality translation. Always do human evaluation for critical content.

How AI-900 Actually Tests This

What AI-900 Tests on Custom Translator

AI-900 objective 4.5 covers 'Identify capabilities of the Translation service' and specifically tests your understanding of when to use the prebuilt Translator vs. Custom Translator. The exam focuses on:

- Purpose: Custom Translator is for domain-specific or organization-specific terminology and style.
- Data Requirement: Minimum 10,000 parallel sentences.
- Training Process: Fine-tuning a base model, not training from scratch.
- Glossary: A bilingual dictionary for hard constraints.
- Deployment: Using a custom category ID in API calls.
- Comparison with other services: Custom Translator is different from Translator Text API (prebuilt) and from Azure AI Document Intelligence (which extracts text from documents).

Common Wrong Answers and Why Candidates Choose Them

1. 'Custom Translator trains a model from scratch.' Wrong. Candidates assume 'custom' means building from zero, but it uses transfer learning on a prebuilt base model. The correct answer: it fine-tunes an existing model.

2. 'Custom Translator requires 1,000 parallel sentences.' Wrong. The minimum is 10,000. 1,000 is too few for meaningful fine-tuning.

3. 'Custom Translator is used for real-time speech translation.' Wrong. Custom Translator is for text translation only. Speech translation uses a different service (Speech Translation API).

4. 'You must retrain the model every time you add new terms.' Wrong. You can use a glossary to add new terms without retraining. Retraining is only needed to improve overall model quality.

5. 'Custom Translator is the same as the Translator Text API.' Wrong. The Translator Text API is the prebuilt service; Custom Translator is an add-on that customizes it.

Specific Numbers and Terms That Appear on the Exam

Minimum parallel sentences: 10,000

BLEU score range: 0 to 100

Glossary file format: tab-separated (TSV)

API parameter to use custom model: category

Pricing surcharge: $1 per million characters on top of standard Translator pricing

Training cost: $10 per hour

Edge Cases and Exceptions the Exam Loves

What if you have fewer than 10,000 sentences? The portal will warn you, but you can still train. However, quality improvement will be marginal. The exam expects you to know the minimum recommended is 10,000.

Can you use Custom Translator for language pairs not in the prebuilt list? No. Both languages must be supported by the Translator Text API.

What happens if you don't include the category parameter? The API uses the prebuilt model.

Can you deploy multiple custom models? Yes, but each must have a unique category ID.

Is Custom Translator available in all regions? No. It is available in specific Azure regions (e.g., West US, West Europe, Southeast Asia). Check documentation.

How to Eliminate Wrong Answers Using Underlying Mechanism

If a question asks about 'customizing translation for medical terminology', remember:

Custom Translator uses parallel documents (source-target sentence pairs) to learn terminology.

It does NOT use a medical-specific base model — it fine-tunes the general model.

It does NOT require you to provide a medical dictionary (though glossary helps).

It does NOT change the underlying neural network architecture.

When you see answer options like 'Trains a new model from scratch' or 'Uses a separate API endpoint', eliminate them because Custom Translator uses transfer learning and the same API endpoint with a category parameter.

Key Takeaways

Custom Translator fine-tunes Microsoft's prebuilt NMT model using your parallel sentences — it does not train from scratch.

Minimum recommended parallel sentences: 10,000. More data (100k+) yields better quality.

Glossary files provide hard constraints for specific term translations; use tab-separated plain text or TMX format.

To use a custom model, include the 'category' parameter with your model's category ID in Translator Text API calls.

BLEU score (0-100) measures translation quality; higher is better, but human evaluation is still needed.

Custom Translator is not for speech translation — it is text-only.

Training costs $10 per hour; translation costs standard rate plus $1 per million characters surcharge.

Only language pairs supported by Translator Text API can be customized.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Custom Translator

Requires parallel documents (min 10k sentences) for training

Fine-tunes a base model using your data

Uses a custom category ID to invoke in API calls

Better accuracy for domain-specific terminology

Additional cost: $10/hour training + $1/million char surcharge

Prebuilt Translator Text API

No training required, ready to use immediately

Uses a generic base model trained on general web data

No category parameter needed

May mistranslate domain-specific terms

Standard pricing: $10/million characters (S1 tier)

Watch Out for These

Mistake

Custom Translator trains a machine translation model from scratch using your data.

Correct

Custom Translator uses transfer learning from Microsoft's prebuilt NMT model. It fine-tunes the existing model with your parallel sentences, not trains from scratch. Training from scratch would require millions of sentences and weeks of compute.

Mistake

You need at least 100,000 parallel sentences to get any improvement.

Correct

The minimum recommended is 10,000 parallel sentences. Smaller datasets can still yield improvements, especially with a glossary. 100,000+ gives better quality but is not a hard requirement.

Mistake

Custom Translator replaces the Translator Text API entirely.

Correct

Custom Translator is an add-on to the Translator Text API. You still use the same API endpoint and subscription key. The custom model is invoked by adding a 'category' parameter to your API calls.

Mistake

Once deployed, a custom model cannot be updated without retraining.

Correct

Terminology changes do not require retraining on your full parallel data. You update the glossary and attach the new version when you train, which produces a new model version that you then deploy. Old models remain available until deleted.

Mistake

Custom Translator works for any language pair, even unsupported ones.

Correct

Custom Translator only supports language pairs that are available in the Translator Text API. If a language is not supported by the prebuilt service, you cannot create a custom model for it.


Frequently Asked Questions

What is the minimum number of parallel sentences required for Custom Translator?

The minimum recommended is 10,000 parallel sentences. While you can train with fewer, the quality improvement over the prebuilt model will be minimal. For best results, use 100,000 or more. The portal will warn if you upload less than 10,000.

How do I use a custom model after training?

After training, deploy the model in the Custom Translator portal. You will receive a category ID (a GUID). In your API calls to the Translator Text API, add the parameter 'category=your-category-id'. For example: POST https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&to=fr&category=abc123. Include your subscription key in the header.

Can I update a custom model without retraining?

You can update the glossary and retrain the model, which creates a new version. You cannot modify the model weights without retraining. However, you can deploy multiple versions and switch between them by changing the category ID in your API calls.

What is the difference between a glossary and training data?

Training data consists of parallel sentences that teach the model sentence-level translation patterns. A glossary is a list of individual term translations that act as hard constraints — the model will always use the specified target term for the source term. Glossaries are useful for brand names and fixed phrases, while training data improves overall fluency.

Does Custom Translator support real-time speech translation?

No. Custom Translator is for text translation only. For real-time speech translation, you would use the Speech Translation API, which also supports custom models but through a different process (Custom Speech and Custom Translator integration).

How is Custom Translator priced?

There are two costs: training and translation. Training costs $10 per hour of compute. Translation costs the standard Translator Text API rate (e.g., $10 per million characters for S1 tier) plus a $1 per million characters surcharge for using a custom model. So total cost is $11 per million characters for custom translations.

Can I use Custom Translator for any language pair?

Only for language pairs that are already supported by the Translator Text API. If a language is not in the Translator list, you cannot create a custom model for it. Check the official list of supported languages.
