This chapter covers Azure AI Foundry (formerly Azure AI Studio), the unified platform for building, evaluating, and deploying generative AI solutions at scale. For the AI-900 exam, this topic falls under Domain 5: Generative AI, Objective 5.2 (Describe capabilities of Azure AI Foundry). Approximately 5-10% of exam questions touch on this area. Understanding Azure AI Foundry is critical because it is the primary tool for operationalizing generative AI models in Azure, and the exam tests your knowledge of its components, workflow, and integration with other Azure AI services.
Imagine a smart factory that produces AI solutions instead of physical products. Azure AI Foundry is the factory floor itself: the physical space and infrastructure. It has designated areas for different stages: raw materials storage (data sources), a design studio (AI Studio), an assembly line (model training and fine-tuning), a quality control station (evaluation and testing), and a shipping dock (deployment). The factory floor manager (the Azure AI Foundry portal) oversees everything, ensuring that each workstation has the right tools (compute, storage, connections) and that the workflow from raw data to finished AI model is smooth. Just as a factory floor provides the environment, utilities, and logistics for manufacturing, Azure AI Foundry provides the unified environment, security, and resource management for building, training, and deploying AI models. You don't build the factory floor each time you make a new product; you reuse the same floor for multiple production lines. Similarly, Azure AI Foundry is a shared platform in which an AI Hub hosts multiple projects and their associated resources, allowing teams to collaborate efficiently while maintaining governance and cost control.
What is Azure AI Foundry?
Azure AI Foundry is a unified platform that brings together various Azure AI services, including Azure OpenAI Service, Azure AI Search, the AI Studio development environment, and the model catalog, into a single, integrated experience. It provides a centralized hub for managing the entire lifecycle of generative AI applications: from data ingestion and model selection to fine-tuning, evaluation, deployment, and monitoring. Azure AI Foundry was previously branded Azure AI Studio (the name changed in late 2024). The exam may still reference the old name, so be aware of both.
Why Azure AI Foundry Exists
Before Azure AI Foundry, building a generative AI application often required stitching together multiple separate services: you'd use Azure OpenAI Service for the model, Azure AI Search for retrieval-augmented generation (RAG), Azure Functions for serverless logic, and Azure Monitor for observability. This fragmented approach led to complexity in managing connections, security, and costs. Azure AI Foundry solves this by providing a unified workspace where all these components are pre-integrated, with shared security policies, a common data foundation, and streamlined deployment pipelines. The primary goal is to reduce the time from idea to production-ready AI application.
How Azure AI Foundry Works Internally
Azure AI Foundry is built on a hierarchical resource model:
Azure AI Foundry Portal: The web-based UI for managing projects, models, and deployments.
AI Hub: The top-level resource that acts as a container for all AI projects within an organization. It provides shared networking, security, and governance settings. Each AI Hub has an associated Azure Storage account for storing data, models, and logs, and an Azure Key Vault for managing secrets.
AI Projects: Sub-resources within an AI Hub. Each project corresponds to a specific application or use case. Projects inherit the hub's security and networking configuration but can have their own data connections, model deployments, and evaluations.
AI Studio: The development environment within the portal where you can experiment with models, build prompts, and test RAG flows. AI Studio is essentially the 'design studio' of the factory.
Model Catalog: A repository of pre-trained models from Microsoft, Meta, OpenAI, Hugging Face, and others. Models can be deployed as serverless APIs or managed compute endpoints.
Connections: Managed data sources (e.g., Azure AI Search, Azure Blob Storage, Azure SQL Database) that can be used for grounding data in RAG scenarios.
Deployments: Endpoints that serve the model for inference. Deployments can be real-time (online) or batch.
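The hierarchy above can be pictured as a small data model. The following Python sketch is only an illustration of how projects sit inside a hub and read the hub's shared settings; the class names, fields, and the effective_settings helper are assumptions made for this example and are not part of any Azure SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """One application or use case inside a hub (illustrative model only)."""
    name: str
    connections: list[str] = field(default_factory=list)  # project-specific data sources
    deployments: list[str] = field(default_factory=list)  # model endpoints


@dataclass
class Hub:
    """Top-level container that owns the shared settings (illustrative model only)."""
    name: str
    region: str
    network: str = "public"  # "public" or "private"
    projects: dict[str, Project] = field(default_factory=dict)

    def add_project(self, name: str) -> Project:
        """Create a project under this hub and return it."""
        project = Project(name)
        self.projects[name] = project
        return project

    def effective_settings(self, project_name: str) -> dict[str, str]:
        """Settings a project sees: the hub's values, not a private copy."""
        if project_name not in self.projects:
            raise KeyError(project_name)
        return {"hub": self.name, "region": self.region, "network": self.network}


hub = Hub("myaihub", "eastus", network="private")
bot = hub.add_project("CustomerSupportBot")
bot.connections.append("azure-ai-search")
bot.deployments.append("gpt4o-mini-deploy")
print(hub.effective_settings("CustomerSupportBot"))
```

The point of the sketch is the direction of inheritance: security and networking live on the hub, while connections and deployments can differ per project.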
When you create an AI Hub, Azure automatically provisions:
A Storage account (standard tier, locally redundant storage by default)
A Key Vault (standard tier)
An Application Insights instance (for monitoring)
A Container Registry (for storing custom containers if needed)
A Cognitive Services resource (for unified API keys)
These resources are created in the same region as the hub and are managed by the hub. You can also bring your own resources if you have compliance requirements.
Key Components, Values, and Defaults
Hub SKU: There are two tiers: Free (F0) and Standard (S0). The free tier is limited to one project and 1,000 transactions per month. The standard tier is required for production workloads.
Storage Account: Default is Standard_LRS (locally redundant storage). For high availability, you can change to GRS or RA-GRS after creation.
Key Vault: Standard tier. Soft-delete and purge protection are enabled by default.
Application Insights: Pay-as-you-go. Default sampling rate is 100% (all telemetry collected).
Container Registry: Basic tier by default. For production, consider Standard or Premium for geo-replication.
Model Deployment Quotas: Each region has quotas for model deployment capacity. For example, GPT-4 deployments are limited by the number of provisioned throughput units (PTUs) available in that region. Default quota for GPT-4 is often 0 PTUs; you must request an increase.
Token Limits: For serverless deployments, the default token limit per minute varies by model. For GPT-4o, the default is 450,000 tokens per minute (TPM) in many regions. This can be increased via quota request.
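To see what a tokens-per-minute quota means in practice, here is a rough Python sketch that estimates whether a steady workload fits under a given limit. The 450,000 TPM default comes from the text above; the request rates and the 1,200 tokens per request are hypothetical numbers chosen for illustration, and real traffic is rarely this even.

```python
def required_tpm(requests_per_minute: float, tokens_per_request: int) -> float:
    """Tokens per minute a workload would consume at a steady request rate."""
    return requests_per_minute * tokens_per_request


def fits_quota(requests_per_minute: float, tokens_per_request: int,
               quota_tpm: int = 450_000) -> bool:
    """True when the estimated load stays within the deployment's TPM quota."""
    return required_tpm(requests_per_minute, tokens_per_request) <= quota_tpm


# Hypothetical workloads: 1,200 tokens per request (prompt plus completion) is an assumption.
print(fits_quota(300, 1_200))  # needs 360,000 TPM
print(fits_quota(500, 1_200))  # needs 600,000 TPM
```

If the estimate exceeds the quota, the options described above apply: request a quota increase or reduce tokens per request.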
Configuration and Verification Commands
You can create an AI Hub using the Azure CLI with the ml extension, where a hub is a workspace of kind hub. The following command creates a hub named 'myaihub' in the East US region:
az ml workspace create --kind hub --name myaihub --resource-group myrg --location eastus
To list all workspaces (hubs and projects) in a resource group:
az ml workspace list --resource-group myrg --output table
To create a project within a hub, pass the hub's full resource ID:
az ml workspace create --kind project --name myproject --resource-group myrg --hub-id /subscriptions/<subscription-id>/resourceGroups/myrg/providers/Microsoft.MachineLearningServices/workspaces/myaihub
To deploy a model from the catalog (e.g., GPT-4o-mini) as a serverless endpoint:
az ml model deploy --name gpt4o-mini-deploy --model azureml://registries/azure-openai/models/gpt-4o-mini/versions/1 --endpoint-name myendpoint --resource-group myrg --workspace-name myproject
To verify the deployment status:
az ml endpoint show --name myendpoint --resource-group myrg --workspace-name myproject
How Azure AI Foundry Interacts with Related Technologies
Azure OpenAI Service: Azure AI Foundry uses Azure OpenAI as the underlying model serving infrastructure. When you deploy a model like GPT-4o from the catalog, it actually creates an Azure OpenAI resource in the background. The hub manages the API keys and endpoint URLs.
Azure AI Search: For RAG scenarios, you can connect an Azure AI Search index as a 'data source' in the project. The hub handles the connection string and authentication.
Azure Machine Learning: Azure AI Foundry is built on top of Azure Machine Learning. Each project is essentially a Machine Learning workspace with additional generative AI capabilities. You can use ML pipelines for training and evaluation.
Azure Monitor: All telemetry from deployed models (e.g., request latency, token usage, errors) is sent to Application Insights, which is part of Azure Monitor. You can set up alerts for anomalies.
Azure Policy: You can apply Azure Policy to enforce compliance rules on AI Hubs, such as requiring specific encryption keys or restricting regions.
Create an AI Hub
Navigate to the Azure AI Foundry portal (ai.azure.com) or use Azure CLI. Choose a subscription, resource group, region, and hub name. Select the tier (Free or Standard). Azure automatically provisions the associated resources: Storage account, Key Vault, Application Insights, Container Registry, and a Cognitive Services resource. This takes about 2-3 minutes. The hub becomes the security boundary for all projects within it. All projects inherit the hub's networking configuration (public or private endpoint) and encryption settings.
Create an AI Project
Within the hub, create a project for each distinct application or use case. Provide a name and description, confirm the parent hub, and optionally specify a compute target if you plan to do fine-tuning (otherwise serverless is used). The project inherits the hub's storage, key vault, and other resources. You can also create project-specific data connections. The project workspace is where you will manage models, datasets, and deployments. Each project has its own Application Insights resource for monitoring.
Select and Deploy a Model
In the Model Catalog, browse available models (e.g., GPT-4o, GPT-4o-mini, Llama 3, Mistral). Select a model and click 'Deploy'. Choose the deployment type: 'Serverless API' (pay-per-token) or 'Managed Compute' (pay for the underlying VM). For serverless, you specify the deployment name and optionally request a quota increase for higher throughput. The deployment takes about 5-10 minutes. Once deployed, you get an endpoint URL and API key. You can test the model directly in the playground.
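The choice between the two deployment types is largely a billing question: pay per token, or pay for a VM whether or not it is busy. The Python sketch below compares the two for a month of usage. All prices are hypothetical placeholders (0.002 USD per 1,000 tokens and 4.00 USD per VM hour), not Azure list prices, and the function names are invented for this example.

```python
def serverless_cost(tokens: int, usd_per_1k_tokens: float) -> float:
    """Pay-per-token cost for the given number of tokens."""
    return tokens / 1000 * usd_per_1k_tokens


def managed_compute_cost(hours: float, usd_per_vm_hour: float) -> float:
    """Cost of keeping the underlying VM running for the given hours."""
    return hours * usd_per_vm_hour


def cheaper_option(tokens: int, hours: float,
                   usd_per_1k_tokens: float, usd_per_vm_hour: float) -> str:
    """Name the cheaper deployment type for one billing period."""
    token_bill = serverless_cost(tokens, usd_per_1k_tokens)
    vm_bill = managed_compute_cost(hours, usd_per_vm_hour)
    return "serverless" if token_bill <= vm_bill else "managed compute"


HOURS_PER_MONTH = 730  # a VM that runs all month

# Hypothetical prices, for illustration only.
print(cheaper_option(50_000_000, HOURS_PER_MONTH, 0.002, 4.00))     # light usage
print(cheaper_option(2_000_000_000, HOURS_PER_MONTH, 0.002, 4.00))  # heavy usage
```

Under these assumed prices, light usage favors serverless and sustained heavy usage favors managed compute; the break-even point moves with whatever the real prices are.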
Configure Data Connections for RAG
To ground the model with your own data, add a data connection. In the project, go to 'Data connections' and select a source: Azure AI Search, Azure Blob Storage, Azure SQL Database, or upload files directly. For Azure AI Search, you need an existing search service and index. The connection stores the endpoint and authentication method (API key or managed identity). Then, in the playground or in a prompt flow, you can reference this data source to perform retrieval-augmented generation. The hub manages the connection securely via Key Vault.
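The retrieve-then-generate pattern behind RAG can be shown without any Azure service. This Python sketch uses a toy keyword-overlap ranking as a stand-in for a search index and then assembles a grounded prompt; it does not call Azure AI Search or any model, and the function names and sample documents are made up for illustration.

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by shared words with the query (toy stand-in for a search index)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps input order on ties
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_grounded_prompt(query: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user question into one prompt."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


docs = [
    "Wire transfers over 10000 USD require manager approval.",
    "Savings accounts earn interest monthly.",
    "Branch opening hours are 9 to 5 on weekdays.",
]
question = "When do savings accounts earn interest"
passages = retrieve(question, docs, top_k=1)
prompt = build_grounded_prompt(question, passages)
print(prompt)
```

In a real project the retrieval step would be served by the connected data source, and the assembled prompt would be sent to the deployed model.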
Evaluate and Monitor the Deployment
After deployment, use the built-in evaluation tools to test the model's performance. You can run manual tests or automated evaluations using predefined metrics (e.g., groundedness, relevance, fluency). All inference requests are logged to Application Insights. You can view metrics like token consumption, latency, error rates, and user feedback. Set up alerts for thresholds (e.g., error rate > 5%). The hub also provides cost analysis per project. If you need to update the model, you can create a new deployment and shift traffic gradually.
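The alerting idea above amounts to comparing aggregate metrics against thresholds. The following Python sketch does that over an in-memory list of request records. The 5% error-rate threshold comes from the text; the 2,000 ms latency threshold, the RequestLog type, and the alert names are assumptions for this example, and a real setup would define alert rules in Azure Monitor rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class RequestLog:
    """One inference request as it might appear in telemetry (illustrative)."""
    latency_ms: float
    ok: bool


def error_rate(logs: list[RequestLog]) -> float:
    """Fraction of requests that failed; 0.0 for an empty log."""
    if not logs:
        return 0.0
    return sum(1 for record in logs if not record.ok) / len(logs)


def average_latency_ms(logs: list[RequestLog]) -> float:
    """Mean latency in milliseconds; 0.0 for an empty log."""
    if not logs:
        return 0.0
    return sum(record.latency_ms for record in logs) / len(logs)


def triggered_alerts(logs: list[RequestLog],
                     max_error_rate: float = 0.05,
                     max_avg_latency_ms: float = 2000.0) -> list[str]:
    """Names of the thresholds the log currently exceeds."""
    alerts = []
    if error_rate(logs) > max_error_rate:
        alerts.append("error_rate")
    if average_latency_ms(logs) > max_avg_latency_ms:
        alerts.append("latency")
    return alerts


healthy = [RequestLog(800.0, True) for _ in range(100)]
degraded = [RequestLog(800.0, True) for _ in range(18)] + [RequestLog(900.0, False) for _ in range(2)]
print(triggered_alerts(healthy))
print(triggered_alerts(degraded))
```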
Scenario 1: Enterprise Customer Support Chatbot
A large financial services company wants to build a customer support chatbot that answers questions about account balances, transaction history, and loan products. They use Azure AI Foundry to manage the project. They create an AI Hub in the East US region with Standard tier. Within the hub, they create a project called 'CustomerSupportBot'. They deploy GPT-4o-mini as the base model (serverless, with a quota of 1M TPM). They connect to an Azure AI Search index containing their knowledge base of FAQ documents and policy manuals. The data connection uses managed identity for secure access. The chatbot is deployed as a web app using Azure App Service, calling the endpoint via the API key stored in Key Vault. The team uses the evaluation tools to run 100 test questions and achieve a groundedness score of 95%. They set up Application Insights alerts for latency > 2 seconds and error rate > 1%. The system handles about 10,000 requests per day with an average latency of 800 ms.
Scenario 2: Internal Code Generation Assistant
A technology company wants to provide its developers with an AI assistant that can generate code snippets and answer technical questions based on internal code repositories. They create an AI Hub in West Europe. They deploy a Llama 3-70B model from the catalog on managed compute (Standard_NC24ads_A100_v4) for lower latency and data residency. They connect to an Azure Blob Storage container that holds the internal codebase. They use prompt flow in Azure AI Studio to create a custom prompt that includes the retrieved code context. The assistant is integrated into Microsoft Teams via a bot. The team notices that the model sometimes generates code that resembles open-source libraries with restrictive licenses; they add a content filter to block such outputs. They also set up a custom evaluation metric for code correctness using unit tests.
Common Pitfalls and Misconfigurations
Using Free tier for production: The Free tier has a 1,000 transaction per month limit, which is easily exceeded. Always use Standard for production.
Not requesting quota early: For popular models like GPT-4, default quotas are often zero. You must submit a quota request through Azure portal, which can take 1-2 business days. Plan ahead.
Incorrect networking: If you enable private endpoints on the hub, all projects inherit that setting. Ensure that your development machines have network access (e.g., via VPN or jump box).
Overlooking cost management: Serverless deployments can incur high costs if not monitored. Set budget alerts and consider using provisioned throughput for predictable workloads.
What AI-900 Tests on Azure AI Foundry (Objective 5.2)
The exam focuses on the capabilities and components of Azure AI Foundry, not on deep configuration details. You should know:
The purpose of Azure AI Foundry: unified platform for building, evaluating, and deploying generative AI applications.
The key components: AI Hub, AI Project, AI Studio, Model Catalog, Connections, Deployments.
The difference between Azure AI Foundry and individual services like Azure OpenAI Service.
The workflow: create hub → create project → select model → connect data → deploy → monitor.
The integration with Azure AI Search for RAG.
The ability to use pre-built models from the catalog or bring your own model.
Common Wrong Answers and Why Candidates Choose Them
'Azure AI Foundry is the same as Azure OpenAI Service' — Wrong because Azure OpenAI Service is only one of the services integrated into Foundry. Foundry also includes model catalog, data connections, evaluation, and monitoring. Candidates confuse the two because Foundry uses Azure OpenAI for model serving.
'You must use Azure AI Search with Azure AI Foundry' — Wrong. While Azure AI Search is commonly used for RAG, it is optional. You can use other data sources or no data at all. The exam tests that connections are flexible.
'Azure AI Foundry only supports OpenAI models' — Wrong. The model catalog includes models from Meta, Hugging Face, and others. The exam may ask which models are available.
'Azure AI Foundry replaces Azure Machine Learning' — Wrong. Foundry is built on top of Azure Machine Learning and uses ML workspaces as projects. They coexist.
Specific Numbers and Terms That Appear on the Exam
Model Catalog: Know that it contains both Microsoft and third-party models.
AI Hub: The top-level resource with shared security and networking.
AI Project: A workspace within a hub for a specific application.
Serverless API: Pay-per-token deployment.
Managed Compute: Pay for VM time.
Quota: Default token limits per minute (e.g., 450K TPM for GPT-4o).
Content Filters: Built-in safety filters (e.g., hate, violence, sexual content).
Evaluation Metrics: Groundedness, relevance, fluency.
Edge Cases and Exceptions
Free tier limitations: Only one project, 1,000 transactions/month. Exam may ask which tier is suitable for production.
Region availability: Not all models are available in all regions. For example, GPT-4o may not be in certain sovereign clouds.
Data residency: When using managed compute, data stays in the selected region; serverless may process data in other regions for safety filtering.
Private endpoints: If enabled on the hub, all projects must use private networking; this can break web app deployments if not configured correctly.
How to Eliminate Wrong Answers
If the question mentions 'unified platform for generative AI lifecycle', the answer is Azure AI Foundry.
If it mentions 'deploying a model as a serverless API', the answer is Azure AI Foundry or Azure OpenAI Service; but if it also mentions 'evaluation' or 'data connections', it's Foundry.
If it asks for 'the top-level resource in Foundry', the answer is AI Hub.
If it asks for 'the tool for building prompts and testing', the answer is AI Studio.
If it asks about 'retrieval-augmented generation', the answer is Azure AI Search integrated via Foundry connections.
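As a study aid, the elimination rules above can be written as a simple keyword lookup. This Python sketch is only a mnemonic: the phrases it matches are taken from the list above, the function name is invented, and real exam questions will be worded in ways a keyword match cannot anticipate.

```python
def likely_answer(question: str) -> str:
    """Map key phrases in a question to the answer the rules above suggest."""
    q = question.lower()
    if "top-level resource" in q:
        return "AI Hub"
    if "building prompts" in q:
        return "AI Studio"
    if "retrieval-augmented generation" in q:
        return "Azure AI Search (via Foundry connections)"
    if "serverless api" in q:
        # Evaluation or data connections point to Foundry specifically.
        if "evaluation" in q or "data connections" in q:
            return "Azure AI Foundry"
        return "Azure AI Foundry or Azure OpenAI Service"
    if "unified platform" in q:
        return "Azure AI Foundry"
    return "no rule matched"


print(likely_answer("Which service deploys a model as a serverless API and runs evaluation?"))
print(likely_answer("What is the top-level resource in Foundry?"))
```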
Azure AI Foundry is a unified platform for building, evaluating, and deploying generative AI applications.
The top-level resource is the AI Hub, which contains projects, storage, key vault, and monitoring.
Each project inherits the hub's security and networking settings.
The Model Catalog includes models from Microsoft, OpenAI, Meta, Hugging Face, and more.
Deployments can be serverless (pay-per-token) or managed compute (pay-per-VM).
RAG is supported via data connections to Azure AI Search, Blob Storage, SQL Database, etc.
Default token limit for GPT-4o is 450,000 TPM (varies by region).
Free tier is limited to 1,000 transactions/month and one project.
Evaluation metrics include groundedness, relevance, and fluency.
Content filters are built-in and can be configured per project.
Azure AI Foundry and Azure OpenAI Service are easy to confuse, and both come up on the exam all the time. Here's how to tell them apart.
Azure AI Foundry
Unified platform for entire generative AI lifecycle (build, evaluate, deploy, monitor).
Includes model catalog with multiple providers.
Provides built-in data connections for RAG (Azure AI Search, Blob, etc.).
Offers evaluation tools and content filters.
Supports both serverless and managed compute deployments.
Azure OpenAI Service
Focused solely on serving OpenAI models (GPT-4, GPT-4o, etc.).
No model catalog — only OpenAI models.
Does not include built-in data connections; you must integrate separately.
No evaluation tools; you must use separate services.
Pay-per-token or provisioned throughput (PTU) deployments only; no managed compute option.
Mistake
Azure AI Foundry is just a new name for Azure Machine Learning.
Correct
Azure AI Foundry is a higher-level platform that uses Azure Machine Learning as a foundation but adds generative AI-specific features like model catalog, playground, evaluation tools, and pre-built connections for RAG. They are not the same.
Mistake
You can only use OpenAI models in Azure AI Foundry.
Correct
The Model Catalog includes models from multiple sources, including Meta (Llama), Hugging Face, and Microsoft. You can also bring your own custom model.
Mistake
Azure AI Foundry requires you to write code to deploy models.
Correct
You can deploy models entirely through the graphical interface in AI Studio. Code is optional, though CLI and SDK are available for automation.
Mistake
Each AI Project requires its own Storage account and Key Vault.
Correct
Projects share the Storage account and Key Vault of their parent AI Hub. This reduces cost and simplifies management.
Mistake
Azure AI Foundry is only for generative AI and not for traditional ML.
Correct
While it emphasizes generative AI, you can also use it for traditional ML models via the underlying Azure Machine Learning capabilities.
What is the difference between Azure AI Foundry and Azure AI Studio?
Azure AI Foundry is the overall platform (the portal and resource hierarchy). Azure AI Studio is the development environment within the Foundry portal where you experiment with models, build prompts, and test RAG flows. Think of Foundry as the factory and Studio as the design lab.
Can I bring my own model to Azure AI Foundry?
Yes. You can register a custom model (e.g., a fine-tuned model) in the Model Catalog and deploy it as a managed compute endpoint. You can also bring a model containerized with Docker.
How do I connect Azure AI Search to a project?
In your project, go to 'Data connections' and add a new connection of type 'Azure AI Search'. You need the search service name and authentication method (API key or managed identity). Once added, you can reference the search index in prompt flows.
What is the default token limit for a GPT-4o serverless deployment?
The default is 450,000 tokens per minute (TPM) in many regions. This can be increased by requesting a quota increase via the Azure portal. The exam may ask for this number.
Is Azure AI Foundry available in every region?
No. It is available in most major regions, but some models may not be available in all regions. For example, GPT-4o is not available in all sovereign clouds. Check the Azure region availability documentation.
Can I use Azure AI Foundry without an Azure subscription?
No. You need an Azure subscription to create AI Hubs and projects. However, you can sign up for a free trial with $200 credit.
What happens if I exceed the Free tier transaction limit?
The hub will stop serving requests until the next billing period or until you upgrade to Standard tier. You will receive an error message.