AI-900 | Chapter 57 of 100 | Objective 3.3

Face Detection vs Face Recognition

This chapter covers the critical distinction between face detection and face recognition, a topic that appears in approximately 10-15% of AI-900 exam questions under domain 3.3 (Computer Vision). You will learn the exact technical differences, Azure service implementations, and common exam traps. Mastering this chapter ensures you can confidently answer scenario-based questions about when to use each feature and how they interact with Azure AI services.

25 min read
Intermediate
Updated May 31, 2026

Airport Security vs. the Detective

Imagine an airport security checkpoint. Face detection is the camera and software that spots every person walking through the terminal. It identifies that a face is present in the frame—nothing more. It doesn't know who you are; it only knows that there is a face at coordinates (x, y). This is like the security officer who sees you approach and acknowledges your presence. Face recognition, on the other hand, is like a detective who not only sees you but also checks your ID against a database of known criminals. The detective compares your face (biometric template) to a list of wanted persons and either matches you or not. The key difference: detection says "a face is here," recognition says "this face belongs to Alice." In Azure, Face API can perform both tasks, but they require different endpoints and parameters. Detection uses the 'Detect' operation, while recognition uses 'Identify' or 'Verify' operations against a PersonGroup. Just as airport security can work without a detective, face detection can work without recognition, but recognition always requires detection first.

How It Actually Works

What is Face Detection?

Face detection is a computer vision technique that locates human faces in digital images or video streams. It outputs bounding box coordinates (x, y, width, height) for each detected face, along with optional attributes like head pose, age estimation, emotion, and facial landmarks (eye corners, nose tip, mouth). The Azure Face API's Detect operation returns these attributes when you specify the returnFaceAttributes parameter. Detection does not identify the person; it merely confirms the presence of a face.
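To make the output concrete, here is a minimal sketch that reads a Detect-style JSON response and pulls out the face count and bounding boxes. The sample payload is invented for illustration (it assumes a detection-only call, so no `faceId` is present), and `summarize_faces` is a hypothetical helper name, not part of any SDK.

```python
import json

# Invented sample shaped like a Face - Detect response (detection only, no faceId).
SAMPLE_RESPONSE = """
[
  {"faceRectangle": {"top": 40, "left": 60, "width": 120, "height": 120}},
  {"faceRectangle": {"top": 55, "left": 300, "width": 90, "height": 95}}
]
"""


def summarize_faces(response_text):
    """Return the face count and each box as (left, top, width, height)."""
    faces = json.loads(response_text)
    boxes = []
    for face in faces:
        rect = face["faceRectangle"]
        boxes.append((rect["left"], rect["top"], rect["width"], rect["height"]))
    return len(boxes), boxes


count, boxes = summarize_faces(SAMPLE_RESPONSE)
print(count, boxes)
```

Nothing in this output says who the faces belong to, which is exactly the boundary between detection and recognition.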

How Face Detection Works Internally

The Face API uses deep neural networks trained on millions of labeled images. Microsoft does not publish the internal architecture, but a useful mental model is the classic detection pipeline: scan the image at multiple scales (an image pyramid), slide a window across each scale, and run a convolutional neural network (CNN) on each window to classify whether the region contains a face. This plays the same role as the older Viola-Jones algorithm, with learned deep features in place of hand-crafted ones. The Detect operation does not expose a tunable detection confidence threshold; what you can choose is the detection model, through the detectionModel parameter (for example detection_01 or detection_03).

Face Detection Attributes

When calling the Detect API, you can request specific attributes through returnFaceAttributes:
- age: estimated age in years (float)
- gender: male or female (string)
- emotion: anger, contempt, disgust, fear, happiness, neutral, sadness, surprise (dictionary with probability scores)
- smile: smile intensity (0 to 1)
- facialHair: beard, moustache, sideburns (each 0 to 1)
- headPose: roll, yaw, pitch in degrees
- glasses: NoGlasses, ReadingGlasses, Sunglasses, SwimmingGoggles
- occlusion: eyeOccluded, foreheadOccluded, mouthOccluded (boolean)
- blur: low, medium, high (with a numeric blur value)
- exposure: underExposure, goodExposure, overExposure
- noise: low, medium, high
- accessories: headwear, glasses, mask (list)
- qualityForRecognition: high, medium, low (indicates if the face is suitable for recognition)

Microsoft has retired the attributes that infer emotional state or identity traits (age, gender, emotion, smile, facialHair) under its Responsible AI commitments, so treat those as historical: current Face resources may reject requests for them.

What is Face Recognition?

Face recognition goes beyond detection by identifying or verifying a person's identity. It involves two main operations: Verify and Identify. Verification checks whether two faces belong to the same person (1:1 matching). Identification matches a detected face against a database of known persons (1:N matching). Both rely on generating a face template—a numerical vector (embedding) that uniquely represents facial features.

How Face Recognition Works

Recognition starts with detection: you must first detect the face to get its bounding box. Then, the Face API extracts a face template using a deep CNN. This chapter's working figure is a 128-dimensional vector (512-dimensional in newer versions), although Microsoft does not document the exact template format. Conceptually, the vector is compared against stored templates using a similarity measure such as cosine similarity. The resulting confidence score ranges from 0 to 1; a higher score indicates a closer match. The thresholds to remember are 0.6 for identification (default) and 0.5 for verification. The PersonGroup stores multiple face templates per person (up to 248 faces per person, up to 10,000 persons per PersonGroup, with a maximum of 1,000,000 total faces per group).
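The 1:1 versus 1:N distinction can be sketched with toy vectors. This is an illustration only: the three-dimensional "templates", the names `verify` and `identify`, and the use of plain cosine similarity are assumptions for teaching purposes, not the Face API's actual implementation. The 0.5 and 0.6 thresholds are the figures quoted in this chapter.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(template_a, template_b, threshold=0.5):
    """1:1 matching: do these two templates belong to the same person?"""
    score = cosine_similarity(template_a, template_b)
    return score >= threshold, score


def identify(probe, gallery, threshold=0.6):
    """1:N matching: compare one probe against every enrolled person."""
    candidates = []
    for person_id, templates in gallery.items():
        best = max(cosine_similarity(probe, t) for t in templates)
        if best >= threshold:
            candidates.append((person_id, best))
    # Strongest match first, mirroring a ranked candidate list.
    return sorted(candidates, key=lambda c: c[1], reverse=True)


# Toy gallery: each person has one or more enrolled templates.
gallery = {
    "alice": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "bob": [[0.0, 0.2, 0.9]],
}
probe = [0.85, 0.15, 0.05]
matches = identify(probe, gallery)
same_person, score = verify(probe, gallery["bob"][0])
print(matches, same_person)
```

Storing several templates per person and taking the best score is one simple way to model why enrolling multiple images per person tends to help.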

Azure Services for Face Detection and Recognition

Azure provides the Face API under Azure AI services (formerly Cognitive Services). The key operations are:
- Face - Detect: detects faces and returns attributes.
- Face - Find Similar: finds similar faces in a face list.
- Face - Group: divides faces into groups based on similarity.
- Face - Identify: identifies a detected face against a PersonGroup.
- Face - Verify: checks if two faces belong to the same person.

PersonGroup and PersonGroup Person

To use recognition, you must create a PersonGroup (or LargePersonGroup for up to 1 million persons). Each PersonGroup contains Person objects, and each Person has persisted face images. The PersonGroup Person - Add Face operation adds a face to a person. Then PersonGroup - Train must be called to generate templates. Training can take minutes for large groups. The Identify operation requires a trained PersonGroup.
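The ordering rules above (add faces, train, then identify) can be modeled with a small in-memory stand-in. `ToyPersonGroup` is a hypothetical class written for this chapter, not the Azure SDK, and it simplifies by treating any newly added face as requiring a fresh training run before Identify is allowed.

```python
MAX_FACES_PER_PERSON = 248  # per-person limit quoted in this chapter


class ToyPersonGroup:
    """In-memory stand-in for a PersonGroup; illustrative only."""

    def __init__(self):
        self.persons = {}     # person_id -> list of face image names
        self.trained = False

    def add_face(self, person_id, image_name):
        faces = self.persons.setdefault(person_id, [])
        if len(faces) >= MAX_FACES_PER_PERSON:
            raise ValueError("face limit reached for this person")
        faces.append(image_name)
        self.trained = False  # simplification: any change requires retraining

    def train(self):
        self.trained = True

    def identify(self, face_id):
        if not self.trained:
            raise RuntimeError("PersonGroupNotTrained")
        return []  # matching logic omitted in this sketch


group = ToyPersonGroup()
group.add_face("employee-1", "front.jpg")
try:
    group.identify("some-face-id")
except RuntimeError as error:
    print("Identify before training failed:", error)
group.train()
print("Identify after training returned:", group.identify("some-face-id"))
```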

Face Detection vs. Recognition: Key Differences

| Feature | Face Detection | Face Recognition |
|---------|----------------|------------------|
| Output | Bounding box, attributes | Person ID, confidence |
| Requires Training | No | Yes (PersonGroup must be trained) |
| Identifies Identity | No | Yes |
| Uses Templates | No | Yes (extracts and compares) |
| Azure Operation | Detect | Identify, Verify, Find Similar |
| Pricing | Per transaction | Per transaction (same cost) |

Common Exam Scenarios

The AI-900 exam tests your ability to choose between detection and recognition based on the scenario:
- Scenario A: A security camera needs to count the number of people entering a building. → Use face detection (no identity needed).
- Scenario B: An app unlocks a door only for authorized employees. → Use face recognition (identify against known persons).
- Scenario C: A photo album app groups photos of the same person. → Use face recognition with Find Similar or Group.
- Scenario D: A system blurs faces in a video for privacy. → Use face detection to locate faces, then apply blur.
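The reasoning behind these four scenarios can be written as a tiny decision function. This is a study aid under simplifying assumptions, not an official rule set; `choose_operation` and its parameter names are hypothetical.

```python
def choose_operation(needs_identity, compares_two_faces=False, has_known_persons=False):
    """Map scenario requirements to the Face API operation the chapter recommends."""
    if not needs_identity:
        return "Detect"                # counting, blurring, locating faces
    if compares_two_faces:
        return "Verify"                # 1:1 matching
    if has_known_persons:
        return "Identify"              # 1:N matching against a trained PersonGroup
    return "Find Similar / Group"      # grouping look-alike faces without enrolled persons


print(choose_operation(needs_identity=False))                          # Scenario A and D
print(choose_operation(needs_identity=True, has_known_persons=True))   # Scenario B
print(choose_operation(needs_identity=True))                           # Scenario C
```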

Pricing and Limits

Both detection and recognition are billed per 1,000 transactions. The Free tier allows 20 transactions per minute and 30K total per month. The S0 tier has no monthly limit but is rate-limited: 10 transactions per second for Detect and 10 for Identify. LargePersonGroups have higher limits but require more training time.

Integration with Other Azure Services

Face detection and recognition can be combined with:
- Azure Cognitive Search: index detected faces for search.
- Azure Video Indexer: detect and recognize faces in videos.
- Azure Logic Apps: automate workflows based on face detection triggers.
- Azure Custom Vision: not used for face recognition; use Face API specifically.

Code Example: Detect Faces

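import os

from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials

# Endpoint and key of your Face resource; the environment variable names are placeholders.
endpoint = os.environ['FACE_ENDPOINT']
key = os.environ['FACE_KEY']

face_client = FaceClient(endpoint, CognitiveServicesCredentials(key))
with open('photo.jpg', 'rb') as image:
    # Request optional attributes alongside the bounding boxes.
    detected_faces = face_client.face.detect_with_stream(image, return_face_attributes=['headPose', 'blur'])
for face in detected_faces:
    print('Face at', face.face_rectangle.left, face.face_rectangle.top)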

Code Example: Identify Face

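# Assumes person_group_id names a trained PersonGroup and detected_faces comes from a Detect call.
face_ids = [face.face_id for face in detected_faces]
results = face_client.face.identify(face_ids, person_group_id)
for result in results:
    # Each result lists candidates, each with a person_id and a confidence.
    print(result.face_id, [(c.person_id, c.confidence) for c in result.candidates])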

Error Handling

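Common errors:
- BadArgument: the request is malformed or a parameter is invalid; images over 6 MB are also rejected. Note that Detect finding no face is not an error: it returns an empty list.
- RateLimitExceeded: too many requests per second (HTTP 429).
- PersonGroupNotTrained: Identify called before training completes.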
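A common way to cope with rate-limit errors is to retry with exponential backoff. The sketch below shows the pattern in isolation: `RateLimitExceeded` here is a hypothetical exception class standing in for whatever your client raises on an HTTP 429, and `flaky_detect` simulates a call that is throttled twice before succeeding.

```python
import time


class RateLimitExceeded(Exception):
    """Hypothetical stand-in for a rate-limit (HTTP 429) failure."""


def call_with_backoff(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run operation(), doubling the wait after each rate-limit failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimitExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * (2 ** attempt))


# Simulated call that is throttled twice, then succeeds.
attempts = {"count": 0}
delays = []


def flaky_detect():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimitExceeded()
    return "ok"


# Record the delays instead of really sleeping, to keep the demo instant.
result = call_with_backoff(flaky_detect, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the helper testable; in production you would leave the default `time.sleep`.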

Best Practices

For recognition, use high-quality images (frontal, well-lit, no occlusion).

Add faces from multiple angles for each person (up to 248 per person), then train the PersonGroup.

Use qualityForRecognition attribute to filter low-quality faces.

Adjust confidence thresholds based on security needs (higher for stricter matching).

Walk-Through

1. Capture Image or Video Frame

The process begins with acquiring an image from a camera, file, or video stream. The image must be in JPEG, PNG, GIF, or BMP format, with a maximum size of 6 MB (or 4 MB for free tier). For video, frames are extracted at a specified rate (e.g., 1 FPS). The Face API accepts binary data or a URL. Ensure the image has adequate lighting and the face is not too small (minimum 36x36 pixels).
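A pre-flight check can catch unsuitable images before any API call is made. This is a minimal sketch using the limits quoted above (supported formats, 6 MB, 36x36 pixels); it judges format by file extension only, which is an assumption for brevity, and the function names are hypothetical.

```python
import os

MAX_BYTES = 6 * 1024 * 1024                       # 6 MB limit quoted in this chapter
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}
MIN_FACE_PIXELS = 36                              # minimum face width and height


def check_image(filename, size_bytes):
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format: " + (extension or "none"))
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 6 MB")
    return problems


def face_large_enough(width, height):
    """Check a detected face box against the minimum face size."""
    return width >= MIN_FACE_PIXELS and height >= MIN_FACE_PIXELS


print(check_image("lobby.jpg", 250_000))
print(check_image("scan.tiff", 7 * 1024 * 1024))
print(face_large_enough(30, 48))
```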

2. Detect Faces in the Image

Call the Face - Detect API with `returnFaceId=true` (needed for recognition) and optionally `returnFaceAttributes`. The API returns a list of detected face objects, each with a bounding box and a unique `faceId` (valid for 24 hours). If `returnFaceId=false`, only detection is performed and no `faceId` is issued. You can also specify `recognitionModel` (recognition_01, recognition_02, etc.) to choose which recognition model version extracts the face template; it must match the model of the PersonGroup you later compare against.
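To see how these parameters travel, the sketch below assembles the query string for a Detect request without sending anything. The endpoint host is a placeholder, `build_detect_url` is a hypothetical helper, and the `/face/v1.0/detect` path reflects the v1.0 REST route as an assumption about the API version you target.

```python
from urllib.parse import urlencode


def build_detect_url(endpoint, return_face_id, attributes=(), recognition_model=None):
    """Assemble a Face - Detect request URL from the chosen options."""
    params = {"returnFaceId": str(return_face_id).lower()}
    if attributes:
        params["returnFaceAttributes"] = ",".join(attributes)
    if recognition_model:
        params["recognitionModel"] = recognition_model
    # safe="," keeps the attribute list readable instead of percent-encoding commas.
    return endpoint.rstrip("/") + "/face/v1.0/detect?" + urlencode(params, safe=",")


# Placeholder endpoint; substitute your own Face resource endpoint.
url = build_detect_url(
    "https://example.cognitiveservices.azure.com/",
    return_face_id=True,
    attributes=["headPose", "blur"],
    recognition_model="recognition_04",
)
print(url)
```

A detection-only call would pass `return_face_id=False` and omit the recognition model, since no template needs to be referenced later.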

3. Extract Face Template (for Recognition)

If you requested `returnFaceId=true`, the Face API internally extracts a face template (a vector of numbers) from the detected face region. This template is not returned to the client; instead, the `faceId` serves as a reference. The template is used for all subsequent recognition operations. The extraction uses a deep neural network trained on millions of faces. The template is ephemeral for the `faceId` or persistent if added to a PersonGroup.

4. Create and Train PersonGroup

Before identification, you must create a PersonGroup (or LargePersonGroup) using the `PersonGroup - Create` operation. Then create Person objects within the group using `PersonGroup Person - Create`. For each person, add face images (up to 248) using `PersonGroup Person - Add Face`. Finally, call `PersonGroup - Train` to generate and store templates for all faces. Training is asynchronous; poll `PersonGroup - Get Training Status` until status is 'succeeded'. Training can take seconds to minutes depending on group size.
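Because training is asynchronous, client code usually polls until it finishes. Here is a minimal sketch of that loop: `wait_for_training` is a hypothetical helper, and `get_status` is any callable you supply that returns the current status string (in real code it would wrap the Get Training Status call). The demo feeds it a canned sequence of statuses.

```python
import time


def wait_for_training(get_status, poll_seconds=1.0, max_polls=60, sleep=time.sleep):
    """Poll until training succeeds (True) or fails (False); time out otherwise."""
    for _ in range(max_polls):
        status = get_status()
        if status == "succeeded":
            return True
        if status == "failed":
            return False
        sleep(poll_seconds)  # still queued or running: wait and ask again
    raise TimeoutError("training did not finish in time")


# Canned statuses standing in for successive Get Training Status responses.
statuses = iter(["notstarted", "running", "succeeded"])
finished = wait_for_training(lambda: next(statuses), sleep=lambda seconds: None)
print("Training finished:", finished)
```

Capping the number of polls avoids waiting forever if a large group takes longer than expected.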

5. Identify or Verify Faces

With a trained PersonGroup and a detected face (with `faceId`), call `Face - Identify`. Pass the `faceId` and the `personGroupId`. The API compares the face template against all templates in the group and returns a list of candidate persons with confidence scores above the threshold (default 0.6). For verification, call `Face - Verify` with two `faceId`s (or one `faceId` and one `personId`). The API returns `isIdentical` (boolean) and `confidence` (float).
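Identify returns person IDs rather than names, so the application keeps its own lookup. The sketch below reads an Identify-style JSON response and resolves the top candidate for each face; the payload, the IDs, and the `NAMES` table are all invented for illustration.

```python
import json

# Invented sample shaped like a Face - Identify response.
IDENTIFY_RESPONSE = """
[
  {"faceId": "face-1", "candidates": [{"personId": "person-a", "confidence": 0.92}]},
  {"faceId": "face-2", "candidates": []}
]
"""

# Application-side mapping from person IDs to display names (hypothetical data).
NAMES = {"person-a": "Alice"}


def best_matches(response_text, names):
    """Map each faceId to (name, confidence); (None, 0.0) means no candidate."""
    results = {}
    for entry in json.loads(response_text):
        candidates = entry["candidates"]
        if candidates:
            top = max(candidates, key=lambda c: c["confidence"])
            name = names.get(top["personId"], "unknown person")
            results[entry["faceId"]] = (name, top["confidence"])
        else:
            results[entry["faceId"]] = (None, 0.0)
    return results


matches = best_matches(IDENTIFY_RESPONSE, NAMES)
print(matches)
```

Each detected face gets its own entry, so a frame with several people yields several independent results.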

6. Post-Processing and Action

Based on recognition results, trigger an action: unlock a door, log access, display a name, etc. If using detection only, you may blur faces, count people, or analyze demographics. The results can be sent to Azure Logic Apps, Event Grid, or stored in Cosmos DB. For high-accuracy scenarios, implement a secondary verification (e.g., PIN) if confidence is below a custom threshold (e.g., 0.7).
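The fallback idea in this step can be expressed as a small policy function. This is a sketch of one possible policy using the 0.7 example threshold from the text; `access_decision` and the returned action strings are hypothetical.

```python
def access_decision(candidates, unlock_threshold=0.7):
    """Pick an action from (person_id, confidence) pairs returned by recognition."""
    if not candidates:
        return "deny"                          # nobody matched at all
    person_id, confidence = max(candidates, key=lambda c: c[1])
    if confidence >= unlock_threshold:
        return "unlock:" + person_id           # confident match
    return "require_pin:" + person_id          # plausible match, ask for a second factor


print(access_decision([("employee-17", 0.91)]))
print(access_decision([("employee-17", 0.64)]))
print(access_decision([]))
```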

What This Looks Like on the Job

Enterprise Scenario 1: Access Control for Office Buildings

A multinational corporation deploys face recognition at building entrances to replace badge swiping. They use Azure Face API with a LargePersonGroup containing 5,000 employees, each with 5-10 face images. Cameras capture a frame when motion is detected. The Detect operation runs first to locate faces, then Identify matches against the group. The system is configured with a confidence threshold of 0.7 to minimize false positives. During peak hours, the system handles 10 requests per second, which sits right at the S0 rate limit and requires careful rate limit management. A common issue is poor lighting causing low confidence; they added infrared cameras to improve image quality. A frequent misconfiguration is forgetting to retrain the group after adding new employees: the new faces are not matched until training runs again, and a group that has never been trained returns 'PersonGroupNotTrained' errors. The IT team uses Azure Monitor to track API latency and error rates.

Enterprise Scenario 2: Retail Customer Analytics

A retail chain uses face detection (not recognition) to analyze foot traffic and customer demographics. Cameras at store entrances send frames to the Face API with returnFaceAttributes for age, gender, and emotion. This data is aggregated in Azure Data Lake for store layout optimization. They explicitly set returnFaceId to false to avoid privacy concerns. The system processes 1,000 images per minute across 50 stores. A common pitfall is exceeding the free tier limit; they moved to S0 tier with a budget alert. They also use qualityForRecognition to filter blurry faces, though recognition is not used. The main challenge is handling occlusions (masks, sunglasses), which reduce detection accuracy. They considered training a custom model with Custom Vision but kept the pre-built Face API, which is optimized for faces.

Scenario 3: Photo Album Organization

A cloud photo storage service uses face recognition to group photos by person. Users upload photos, and the backend calls Detect to find faces, then Find Similar or Group to cluster them. They use a FaceList instead of a PersonGroup for ad-hoc grouping, which avoids the need for training. The API groups faces into clusters based on similarity, and users can then name each cluster. A common issue is that the same person in different lighting may not cluster together; the service allows manual merging. They use the recognition_03 model for better accuracy. Performance is critical: they process 100,000 photos daily, so they use batch operations and async calls.

How AI-900 Actually Tests This

The AI-900 exam tests your ability to distinguish between face detection and face recognition in specific scenarios. Objective 3.3 states: 'Identify capabilities of computer vision solutions, including face detection and face recognition.' You will see scenario-based multiple-choice questions where you must select the correct service or operation.

Common Wrong Answers and Why Candidates Choose Them:
1. Choosing 'Face Recognition' when the scenario only needs detection. Example: 'Count the number of people in a room.' Candidates see 'face' and think recognition, but detection suffices. The key is whether identity is needed.
2. Thinking Face API is the same as Custom Vision. Custom Vision can detect objects but is not optimized for face recognition. The exam expects you to know that Face API is the dedicated service.
3. Confusing 'Identify' with 'Detect'. Identify requires a trained PersonGroup; Detect does not. If the scenario mentions 'known persons,' it's recognition.
4. Assuming face detection always returns emotions. Emotions are optional attributes; you must request them. The exam may ask which attribute is available.

Specific Numbers and Terms That Appear on the Exam:
- Maximum faces per person in PersonGroup: 248
- Default confidence threshold for identification: 0.6
- Maximum persons per PersonGroup: 10,000 (or 1,000,000 for LargePersonGroup)
- Valid faceId duration: 24 hours
- Required operation for 1:N matching: Identify
- Required operation for 1:1 matching: Verify

Edge Cases and Exceptions:
- If multiple faces are detected, Identify returns results for each face independently.
- Verify can accept two faceIds OR a faceId and a personId (from a trained PersonGroup).
- The 'Find Similar' operation uses a FaceList (not PersonGroup) and returns similar faces based on appearance, not identity.
- Face detection can be performed on images with multiple faces; it returns all faces.

How to Eliminate Wrong Answers:
- If the question asks about 'identifying a person,' look for keywords like 'known,' 'database,' 'authorized,' 'employee.' That points to recognition.
- If the question asks about 'counting,' 'blurring,' 'location,' without identity, it's detection.
- If the question mentions 'training,' it must be recognition.
- Remember: Detection is a prerequisite for recognition, but not vice versa.

Exam Tip: The exam may show a diagram of the Face API workflow. Know that detection comes first, then optionally recognition. Also, know that the Face API is part of Azure AI services (formerly Cognitive Services), not Azure Machine Learning.

Key Takeaways

Face detection locates faces and returns bounding boxes; it does not identify individuals.

Face recognition identifies or verifies a person against a known set of faces (PersonGroup).

Detection is always a prerequisite for recognition; you must first detect a face to get a faceId.

The Face API's Identify operation uses a 1:N matching against a trained PersonGroup.

The default confidence threshold for identification is 0.6; for verification it is 0.5.

A PersonGroup can hold up to 10,000 persons (1,000,000 with LargePersonGroup), each with up to 248 faces.

Face detection can return attributes like age, emotion, and head pose when requested.

The faceId generated by detection expires after 24 hours.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Face Detection

Outputs bounding box and optional attributes (age, emotion, etc.)

Does not require any prior setup or training

Cannot determine identity; only says 'a face is here'

Use case: counting people, blurring faces, demographic analysis

API call: Face - Detect with returnFaceId=false

Face Recognition

Outputs person ID and confidence score

Requires a trained PersonGroup with registered faces

Can identify or verify a person's identity

Use case: access control, photo tagging, security

API call: Face - Identify or Face - Verify

Watch Out for These

Mistake

Face detection can identify a person by name.

Correct

Face detection only locates faces; it does not identify who they are. Identification requires face recognition using the Identify or Verify operations against a trained PersonGroup.

Mistake

Face recognition always returns the person's name.

Correct

Recognition returns a person ID (GUID) that you must map to a name in your application. The Face API does not store names; you manage that mapping.

Mistake

You must use Custom Vision for face detection.

Correct

Azure provides a dedicated Face API for face detection and recognition. Custom Vision can detect faces but is not optimized and does not provide facial attributes or recognition.

Mistake

Face recognition costs more per call than face detection.

Correct

Both are billed per transaction at the same rate, so the per-call cost is identical. A recognition workflow simply involves additional operations (training, adding faces) that may incur their own costs.

Mistake

You can identify a face without training a PersonGroup.

Correct

Identification requires a trained PersonGroup. Without training, the API will return a 'PersonGroupNotTrained' error. The training step generates face templates.

Frequently Asked Questions

What is the difference between face detection and face recognition in Azure?

Face detection is a computer vision operation that finds human faces in an image and returns their locations (bounding boxes) and optional attributes like age or emotion. It does not identify who the person is. Face recognition goes a step further by matching a detected face against a database of known faces to identify or verify a person's identity. In Azure, the Face API provides both capabilities: the Detect operation for detection, and Identify/Verify for recognition. Recognition requires a trained PersonGroup with registered faces.

Do I need to train a model for face detection?

No. Face detection in Azure Face API uses a pre-trained model. You can call the Detect operation immediately without any training. However, for face recognition (Identify or Verify), you must create and train a PersonGroup by adding face images of known persons and calling the Train operation. Training generates face templates used for matching.

What is the default confidence threshold for face identification?

The default confidence threshold for the Identify operation is 0.6. This means the API will only return candidates with a similarity score of 0.6 or higher. You can adjust this threshold using the `confidenceThreshold` parameter. For verification, the default is 0.5.

Can I use face recognition without first detecting faces?

No. Face recognition requires a faceId, which is obtained from the Detect operation. You must first detect faces in an image to get faceIds, then pass those faceIds to Identify or Verify. The detection step is always necessary.

What is the maximum number of faces I can add per person in a PersonGroup?

You can add up to 248 face images per person in a PersonGroup. This allows the model to learn variations in appearance (lighting, angle, expression) for better accuracy. Adding more than 248 will result in an error.

How long is a faceId valid?

A faceId returned by the Detect operation is valid for 24 hours. After that, it expires and cannot be used for recognition. You must re-detect the face to obtain a new faceId. This is a security measure to prevent misuse.

Can I use Custom Vision for face recognition?

Custom Vision can detect objects, including faces, but it is not optimized for face recognition. Azure recommends using the dedicated Face API for face detection and recognition because it provides specialized features like facial landmarks, attributes, and high-accuracy recognition models.
