This chapter covers the critical distinction between face detection and face recognition, a topic that appears in approximately 10-15% of AI-900 exam questions under domain 3.3 (Computer Vision). You will learn the exact technical differences, Azure service implementations, and common exam traps. Mastering this chapter ensures you can confidently answer scenario-based questions about when to use each feature and how they interact with Azure AI services.
Imagine an airport security checkpoint. Face detection is the camera and software that spots every person walking through the terminal. It identifies that a face is present in the frame—nothing more. It doesn't know who you are; it only knows that there is a face at coordinates (x, y). This is like the security officer who sees you approach and acknowledges your presence. Face recognition, on the other hand, is like a detective who not only sees you but also checks your ID against a database of known criminals. The detective compares your face (biometric template) to a list of wanted persons and either matches you or not. The key difference: detection says "a face is here," recognition says "this face belongs to Alice." In Azure, Face API can perform both tasks, but they require different endpoints and parameters. Detection uses the 'Detect' operation, while recognition uses 'Identify' or 'Verify' operations against a PersonGroup. Just as airport security can work without a detective, face detection can work without recognition, but recognition always requires detection first.
What is Face Detection?
Face detection is a computer vision technique that locates human faces in digital images or video streams. It outputs bounding box coordinates (x, y, width, height) for each detected face, along with optional attributes like head pose, age estimation, and emotion, and optional facial landmarks (eye corners, nose tip, mouth). The Azure Face API's Detect operation returns attributes when you specify the returnFaceAttributes parameter and landmarks when you set returnFaceLandmarks=true. Detection does not identify the person; it merely confirms the presence of a face.
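To make the output shape concrete, here is a minimal sketch that walks a Detect-style response. The `sample_response` is hand-written illustrative data, not real API output; it is modeled on the `faceRectangle` fields (left, top, width, height) that the REST response uses for the bounding box. The helper name `face_boxes` is hypothetical.

```python
# Hand-written sample shaped like a Face - Detect JSON response (illustrative only).
sample_response = [
    {"faceRectangle": {"left": 120, "top": 80, "width": 96, "height": 96}},
    {"faceRectangle": {"left": 400, "top": 60, "width": 110, "height": 110}},
]


def face_boxes(response):
    """Return one (left, top, width, height) tuple per detected face."""
    boxes = []
    for face in response:
        rect = face["faceRectangle"]
        boxes.append((rect["left"], rect["top"], rect["width"], rect["height"]))
    return boxes


boxes = face_boxes(sample_response)
print(f"{len(boxes)} face(s) detected")
for left, top, width, height in boxes:
    print(f"face at ({left}, {top}), size {width}x{height}")
```

Nothing in this output says who the faces belong to, which is exactly the limit of detection.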
How Face Detection Works Internally
Microsoft does not publish the internals of the Face API detector, so treat the following as a general picture of how CNN-based face detectors work rather than a specification. Such detectors are deep neural networks trained on millions of labeled images. A classic design scans the image at multiple scales (an image pyramid) with a sliding window and runs a convolutional neural network (CNN) at each window position to classify whether the region contains a face. This is similar in spirit to the Viola-Jones algorithm, but with learned features instead of hand-crafted ones. Internally, each candidate region gets a confidence score (0 to 1), and low-scoring regions are discarded. The Detect operation does not expose a parameter for tuning that cutoff; what you can choose is the detection model version, through the detectionModel parameter.
Face Detection Attributes
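When calling the Detect API, you can request specific attributes. Be aware that in June 2022 Microsoft announced the retirement of attributes that infer emotional state or identity characteristics (age, gender, emotion, smile, facial hair), so they may be unavailable on current resources even though older exam material still lists them. The documented attributes are: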
- age: estimated age in years (float)
- gender: male or female (string)
- emotion: anger, contempt, disgust, fear, happiness, neutral, sadness, surprise (dictionary with probability scores)
- smile: smile intensity (0 to 1)
- facialHair: beard, moustache, sideburns (each 0 to 1)
- headPose: roll, yaw, pitch in degrees
- glasses: NoGlasses, ReadingGlasses, Sunglasses, SwimmingGoggles
- occlusion: eyeOccluded, foreheadOccluded, mouthOccluded (boolean)
- blur: low, medium, high (with blur level)
- exposure: underExposure, goodExposure, overExposure
- noise: low, medium, high
- accessories: headwear, glasses, mask (list)
- qualityForRecognition: high, medium, low (indicates if the face is suitable for recognition)
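As one illustration of using an attribute, the sketch below filters detected faces by `qualityForRecognition` before they are sent on to recognition. The `detected` list is made-up data shaped like a Detect result, the face IDs are placeholders (real ones are GUIDs), and `usable_face_ids` is a hypothetical helper.

```python
# Illustrative data shaped like Detect results with faceAttributes requested.
detected = [
    {"faceId": "face-a", "faceAttributes": {"qualityForRecognition": "high"}},
    {"faceId": "face-b", "faceAttributes": {"qualityForRecognition": "low"}},
    {"faceId": "face-c", "faceAttributes": {"qualityForRecognition": "medium"}},
]

QUALITY_RANK = {"low": 0, "medium": 1, "high": 2}


def usable_face_ids(faces, minimum="medium"):
    """Keep the faceIds whose qualityForRecognition is at least `minimum`."""
    floor = QUALITY_RANK[minimum]
    keep = []
    for face in faces:
        # Treat a missing attribute as "low" so unknown quality is filtered out.
        quality = face.get("faceAttributes", {}).get("qualityForRecognition", "low")
        if QUALITY_RANK[quality.lower()] >= floor:
            keep.append(face["faceId"])
    return keep


print(usable_face_ids(detected))          # ['face-a', 'face-c']
print(usable_face_ids(detected, "high"))  # ['face-a']
```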
What is Face Recognition?
Face recognition goes beyond detection by identifying or verifying a person's identity. It involves two main operations: Verify and Identify. Verification checks whether two faces belong to the same person (1:1 matching). Identification matches a detected face against a database of known persons (1:N matching). Both rely on generating a face template—a numerical vector (embedding) that uniquely represents facial features.
How Face Recognition Works
Recognition starts with detection: you must first detect the face to get its bounding box. Then, the Face API extracts a face template using a deep CNN that outputs a 128-dimensional vector (or 512-dimensional in newer versions). This vector is compared against stored templates using cosine similarity. The similarity score ranges from 0 to 1; a higher score indicates a closer match. Azure recommends a confidence threshold of 0.6 for identification (default) and 0.5 for verification. The PersonGroup stores multiple face templates per person (up to 248 faces per person, up to 10,000 persons per PersonGroup, with a maximum of 1,000,000 total faces per group).
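The matching step described above can be sketched with plain Python. This is a toy model under stated assumptions: the Face API never returns templates to the client, so the 4-dimensional vectors here are invented stand-ins, and `verify` and `identify` are hypothetical local functions that only mimic the 1:1 and 1:N logic with the thresholds quoted in the text. Cosine similarity can in general be negative; the example vectors are chosen to stay in the 0 to 1 range.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(a, b, threshold=0.5):
    """1:1 matching: do two templates look like the same person?"""
    return cosine_similarity(a, b) >= threshold


def identify(probe, gallery, threshold=0.6):
    """1:N matching: best-scoring person at or above threshold, else None."""
    best_name, best_score = None, threshold
    for name, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Invented templates standing in for a trained PersonGroup.
gallery = {"alice": [1.0, 0.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0, 0.0]}
probe = [0.9, 0.1, 0.0, 0.0]

print(identify(probe, gallery))
print(verify(probe, gallery["bob"]))
```

The probe is close to the "alice" template and far from "bob", so identification returns "alice" while verification against "bob" fails.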
Azure Services for Face Detection and Recognition
Azure provides the Face API under Cognitive Services. The key operations are:
- Face - Detect: detects faces and returns attributes.
- Face - Find Similar: finds similar faces in a face list.
- Face - Group: divides faces into groups based on similarity.
- Face - Identify: identifies a detected face against a PersonGroup.
- Face - Verify: checks if two faces belong to the same person.
PersonGroup and PersonGroup Person
To use recognition, you must create a PersonGroup (or LargePersonGroup for up to 1 million persons). Each PersonGroup contains Person objects, and each Person has persisted face images. The PersonGroup Person - Add Face operation adds a face to a person. Then PersonGroup - Train must be called to generate templates. Training can take minutes for large groups. The Identify operation requires a trained PersonGroup.
Face Detection vs. Recognition: Key Differences
| Feature | Face Detection | Face Recognition |
|---------|----------------|------------------|
| Output | Bounding box, attributes | Person ID, confidence |
| Requires Training | No | Yes (PersonGroup must be trained) |
| Identifies Identity | No | Yes |
| Uses Templates | No | Yes (extracts and compares) |
| Azure Operation | Detect | Identify, Verify, Find Similar |
| Pricing | Per transaction | Per transaction (same cost) |
Common Exam Scenarios
The AI-900 exam tests your ability to choose between detection and recognition based on the scenario:
- Scenario A: A security camera needs to count the number of people entering a building. → Use face detection (no identity needed).
- Scenario B: An app unlocks a door only for authorized employees. → Use face recognition (identify against known persons).
- Scenario C: A photo album app groups photos of the same person. → Use face recognition with Find Similar or Group.
- Scenario D: A system blurs faces in a video for privacy. → Use face detection to locate faces, then apply blur.
Pricing and Limits
Both detection and recognition are billed per 1,000 transactions. The Free tier allows 20 transactions per minute and 30K total per month. The S0 tier has no monthly limit but is rate-limited: 10 transactions per second for Detect and 10 for Identify. LargePersonGroups have higher limits but require more training time.
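One way to stay under these limits is to pace calls on the client side. The sketch below spaces calls evenly (10 per second for S0, 20 per minute for the Free tier, as quoted above). `Throttle` is a hypothetical helper, not part of any Azure SDK; the clock and sleep functions are injectable so the demo runs instantly.

```python
import time


class Throttle:
    """Space calls at least 1/rate seconds apart (simple client-side pacing)."""

    def __init__(self, calls_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / calls_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


s0_throttle = Throttle(calls_per_second=10)         # S0: 10 transactions per second
free_throttle = Throttle(calls_per_second=20 / 60)  # Free: 20 transactions per minute

# Demo with a frozen clock and a recording "sleep" so nothing actually waits.
recorded = []
demo = Throttle(calls_per_second=10, clock=lambda: 0.0, sleep=recorded.append)
for _ in range(3):
    demo.wait()
print(recorded)
```

In real use you would call `s0_throttle.wait()` immediately before each Detect or Identify request.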
Integration with Other Azure Services
Face detection and recognition can be combined with:
- Azure Cognitive Search: index detected faces for search.
- Azure Video Indexer: detect and recognize faces in videos.
- Azure Logic Apps: automate workflows based on face detection triggers.
- Azure Custom Vision: not used for face recognition; use Face API specifically.
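Code Example: Detect Faces

from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials

# Placeholders: substitute your own resource endpoint and key.
endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
key = "<your-key>"

face_client = FaceClient(endpoint, CognitiveServicesCredentials(key))
with open('photo.jpg', 'rb') as image:
    # headPose and blur are requested here; age and emotion appear in older samples but have been retired.
    detected_faces = face_client.face.detect_with_stream(image, return_face_attributes=['headPose', 'blur'])
for face in detected_faces:
    print('Face at', face.face_rectangle.left, face.face_rectangle.top)

Code Example: Identify Face

# Assume person_group_id names a PersonGroup that has already been trained
face_ids = [face.face_id for face in detected_faces]
results = face_client.face.identify(face_ids, person_group_id)
# Each result lists candidates with person_id and confidence
for result in results:
    for candidate in result.candidates:
        print(result.face_id, candidate.person_id, candidate.confidence)

Error Handling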
Common errors:
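- BadArgument or InvalidImageSize: malformed request, or image too large (>6 MB). An image that contains no faces is not an error; Detect simply returns an empty list.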
- RateLimitExceeded: too many requests per second.
- PersonGroupNotTrained: Identify called before training completes.
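For RateLimitExceeded in particular, a common pattern is to retry with exponential backoff. The sketch below assumes a hypothetical `RateLimitExceeded` exception class standing in for whatever your client library raises on HTTP 429; `call_with_backoff` is likewise an illustrative helper, not an SDK function.

```python
import time


class RateLimitExceeded(Exception):
    """Hypothetical stand-in for the error a client raises on HTTP 429."""


def call_with_backoff(operation, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run operation(); on RateLimitExceeded wait 1s, 2s, 4s, ... and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimitExceeded:
            if attempt == max_attempts - 1:
                raise  # out of retries, surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Demo: a fake Detect call that is throttled twice, then succeeds.
attempts = []
recorded_delays = []


def flaky_detect():
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimitExceeded("429")
    return "ok"


result = call_with_backoff(flaky_detect, sleep=recorded_delays.append)
print(result, recorded_delays)  # ok [1.0, 2.0]
```

PersonGroupNotTrained should not be retried this way; it calls for training the group (or waiting for training to finish) first.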
Best Practices
- For recognition, use high-quality images (frontal, well-lit, no occlusion).
- Train the PersonGroup with multiple angles (up to 248 faces per person).
- Use the qualityForRecognition attribute to filter low-quality faces.
- Adjust confidence thresholds based on security needs (higher for stricter matching).
Capture Image or Video Frame
The process begins with acquiring an image from a camera, file, or video stream. The image must be in JPEG, PNG, GIF, or BMP format, with a maximum size of 6 MB (or 4 MB for free tier). For video, frames are extracted at a specified rate (e.g., 1 FPS). The Face API accepts binary data or a URL. Ensure the image has adequate lighting and the face is not too small (minimum 36x36 pixels).
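A quick local check before uploading can catch the most common rejections. This is a minimal sketch: it assumes 6 MB means 6 × 1024 × 1024 bytes, recognizes the four formats by their leading signature bytes, and does not check face size or lighting. `image_problem` is a hypothetical helper name.

```python
MAX_BYTES = 6 * 1024 * 1024  # 6 MB limit quoted above (binary megabytes assumed)

# Leading signature bytes of the accepted formats: JPEG, PNG, GIF (two variants), BMP.
SIGNATURES = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a", b"BM")


def image_problem(data):
    """Return None if the bytes look uploadable, else a short reason string."""
    if len(data) > MAX_BYTES:
        return "too large"
    if not data.startswith(SIGNATURES):
        return "unsupported format"
    return None


png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
print(image_problem(png_header))     # None
print(image_problem(b"plain text"))  # unsupported format
```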
Detect Faces in the Image
Call the Face - Detect API with `returnFaceId=true` (needed for recognition) and optionally `returnFaceAttributes`. The API returns a list of detected face objects, each with a unique `faceId` (valid for 24 hours) and bounding box. With `returnFaceId=false`, the response contains only the rectangles and any requested attributes, so the faces cannot be passed to recognition operations. You can also specify `recognitionModel` (recognition_01, recognition_02, etc.) to choose which recognition model version extracts the face features; it should match the model used by the PersonGroup you will compare against.
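The query parameters named above fit together as in this sketch, which only assembles the request URL and sends nothing. The host name is a placeholder, and `build_detect_url` is a hypothetical helper; the `/face/v1.0/detect` path and parameter names follow the REST API as described in this chapter.

```python
from urllib.parse import urlencode


def build_detect_url(endpoint, return_face_id=True, attributes=(), recognition_model=None):
    """Assemble a Face - Detect REST URL with its query parameters."""
    params = {"returnFaceId": str(return_face_id).lower()}
    if attributes:
        params["returnFaceAttributes"] = ",".join(attributes)
    if recognition_model:
        params["recognitionModel"] = recognition_model
    # safe="," keeps the attribute list readable instead of encoding commas as %2C
    return f"{endpoint.rstrip('/')}/face/v1.0/detect?{urlencode(params, safe=',')}"


url = build_detect_url(
    "https://example-resource.cognitiveservices.azure.com",  # placeholder host
    attributes=["headPose", "blur"],
    recognition_model="recognition_03",
)
print(url)
```

A detection-only caller would pass `return_face_id=False` and omit the recognition model.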
Extract Face Template (for Recognition)
If you requested `returnFaceId=true`, the Face API internally extracts a face template (a vector of numbers) from the detected face region. This template is not returned to the client; instead, the `faceId` serves as a reference. The template is used for all subsequent recognition operations. The extraction uses a deep neural network trained on millions of faces. The template is ephemeral for the `faceId` or persistent if added to a PersonGroup.
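Because the `faceId` is ephemeral, an application that caches IDs may want to track when each one was issued. The sketch below is one way to do that locally, using the 24-hour lifetime stated in this chapter; `face_id_is_fresh` is a hypothetical helper and the timestamps are invented.

```python
from datetime import datetime, timedelta, timezone

FACE_ID_TTL = timedelta(hours=24)  # lifetime of a faceId, per the text above


def face_id_is_fresh(detected_at, now=None):
    """True while a faceId issued at `detected_at` is still within its lifetime."""
    now = now or datetime.now(timezone.utc)
    return now - detected_at < FACE_ID_TTL


detected_at = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
same_day = datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc)
next_day = datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc)

print(face_id_is_fresh(detected_at, now=same_day))  # True
print(face_id_is_fresh(detected_at, now=next_day))  # False
```

When the check fails, the remedy is to run Detect again on the image to obtain a new `faceId`.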
Create and Train PersonGroup
Before identification, you must create a PersonGroup (or LargePersonGroup) using the `PersonGroup - Create` operation. Then create Person objects within the group using `PersonGroup Person - Create`. For each person, add face images (up to 248) using `PersonGroup Person - Add Face`. Finally, call `PersonGroup - Train` to generate and store templates for all faces. Training is asynchronous; poll `PersonGroup - Get Training Status` until status is 'succeeded'. Training can take seconds to minutes depending on group size.
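The polling step can be sketched independently of any SDK. Here `get_status` is an assumed callable that returns the training status string; in a real application it would wrap the `PersonGroup - Get Training Status` call. The demo feeds it a canned sequence of statuses so it runs without a network.

```python
import time


def wait_for_training(get_status, poll_seconds=1.0, max_polls=60, sleep=time.sleep):
    """Poll get_status() until it reports 'succeeded'; raise on 'failed' or timeout."""
    for _ in range(max_polls):
        status = get_status()
        if status == "succeeded":
            return True
        if status == "failed":
            raise RuntimeError("PersonGroup training failed")
        sleep(poll_seconds)
    raise TimeoutError("training did not finish in time")


# Canned statuses standing in for successive Get Training Status responses.
fake_statuses = iter(["notstarted", "running", "running", "succeeded"])
polls = []
trained = wait_for_training(lambda: next(fake_statuses), sleep=polls.append)
print(trained, len(polls))  # True 3
```

Only after this returns should the application call Identify against the group.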
Identify or Verify Faces
With a trained PersonGroup and a detected face (with `faceId`), call `Face - Identify`. Pass the `faceId` and the `personGroupId`. The API compares the face template against all templates in the group and returns a list of candidate persons with confidence scores above the threshold (default 0.6). For verification, call `Face - Verify` with two `faceId`s (or one `faceId` and one `personId`). The API returns `isIdentical` (boolean) and `confidence` (float).
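To show how an Identify result is typically consumed, the sketch below picks the top candidate per face and translates the person ID into a display name. The response data is invented but follows the faceId/candidates/personId/confidence shape described above; the IDs are short placeholders (real ones are GUIDs), and `best_matches` and `person_names` are hypothetical.

```python
# Illustrative data shaped like a Face - Identify response.
identify_response = [
    {"faceId": "face-1", "candidates": [{"personId": "person-a", "confidence": 0.92}]},
    {"faceId": "face-2", "candidates": []},  # no candidate cleared the threshold
]

# Application-side lookup table: Identify returns person IDs, not names.
person_names = {"person-a": "Alice"}


def best_matches(response, names, threshold=0.6):
    """Map each faceId to (name, confidence), or None when nobody matched."""
    results = {}
    for entry in response:
        top = max(entry["candidates"], key=lambda c: c["confidence"], default=None)
        if top is not None and top["confidence"] >= threshold:
            name = names.get(top["personId"], "unknown")
            results[entry["faceId"]] = (name, top["confidence"])
        else:
            results[entry["faceId"]] = None
    return results


print(best_matches(identify_response, person_names))
```

Each face in the request is resolved independently, so one image can yield a mix of matched and unmatched faces.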
Post-Processing and Action
Based on recognition results, trigger an action: unlock a door, log access, display a name, etc. If using detection only, you may blur faces, count people, or analyze demographics. The results can be sent to Azure Logic Apps, Event Grid, or stored in Cosmos DB. For high-accuracy scenarios, implement a secondary verification (e.g., PIN) if confidence is below a custom threshold (e.g., 0.7).
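The secondary-verification idea can be expressed as a small decision function. This is a sketch under the example thresholds from the text (0.6 to count as a candidate, 0.7 to unlock without a PIN); `access_decision` and its return labels are hypothetical.

```python
def access_decision(confidence, unlock_at=0.7, candidate_at=0.6):
    """Turn a recognition confidence into an action for a door controller."""
    if confidence is None or confidence < candidate_at:
        return "deny"         # no usable match
    if confidence < unlock_at:
        return "ask_for_pin"  # plausible match, require a second factor
    return "unlock"


for score in (0.92, 0.65, 0.40, None):
    print(score, "->", access_decision(score))
```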
Enterprise Scenario 1: Access Control for Office Buildings
A multinational corporation deploys face recognition at building entrances to replace badge swiping. They use Azure Face API with a LargePersonGroup containing 5,000 employees, each with 5-10 face images. Cameras capture a frame when motion is detected. The Detect operation runs first to locate faces, then Identify matches against the group. The system is configured with a confidence threshold of 0.7 to minimize false positives. During peak hours, the system handles 10 requests per second, requiring careful rate limit management. A common issue is poor lighting causing low confidence; they added infrared cameras to improve image quality. Misconfiguration often occurs when the group is not retrained after adding new employees: the new faces are not matched until training runs again, and a group that has never been trained returns 'PersonGroupNotTrained' errors. The IT team uses Azure Monitor to track API latency and error rates.
Enterprise Scenario 2: Retail Customer Analytics
A retail chain uses face detection (not recognition) to analyze foot traffic and customer demographics. Cameras at store entrances send frames to the Face API with returnFaceAttributes for age, gender, and emotion. This data is aggregated in Azure Data Lake for store layout optimization. They explicitly disable returnFaceId to avoid privacy concerns and reduce costs. The system processes 1,000 images per minute across 50 stores. A common pitfall is exceeding the free tier limit; they moved to S0 tier with a budget alert. They also use qualityForRecognition to filter blurry faces, though recognition is not used. The main challenge is handling occlusions (masks, sunglasses), which reduce detection accuracy. They considered training a custom model with Custom Vision but stayed with the pre-built Face API, which is optimized for faces.
Enterprise Scenario 3: Photo Album Organization
A cloud photo storage service uses face recognition to group photos by person. Users upload photos, and the backend calls Detect to find faces, then Find Similar or Group to cluster them by similarity. They use a FaceList instead of a PersonGroup for ad-hoc grouping, which avoids the need for training. Users can then name each cluster. A common issue is that the same person in different lighting may not cluster together; the service allows manual merging. They use the recognition_03 model for better accuracy. Performance is critical: they process 100,000 photos daily, so they use batch operations and async calls.
The AI-900 exam tests your ability to distinguish between face detection and face recognition in specific scenarios. Objective 3.3 states: 'Identify capabilities of computer vision solutions, including face detection and face recognition.' You will see scenario-based multiple-choice questions where you must select the correct service or operation.
Common Wrong Answers and Why Candidates Choose Them:
1. Choosing 'Face Recognition' when the scenario only needs detection. Example: 'Count the number of people in a room.' Candidates see 'face' and think recognition, but detection suffices. The key is whether identity is needed.
2. Thinking Face API is the same as Custom Vision. Custom Vision can detect objects but is not optimized for face recognition. The exam expects you to know that Face API is the dedicated service.
3. Confusing 'Identify' with 'Detect'. Identify requires a trained PersonGroup; Detect does not. If the scenario mentions 'known persons,' it's recognition.
4. Assuming face detection always returns emotions. Emotions are optional attributes; you must request them. The exam may ask which attribute is available.
Specific Numbers and Terms That Appear on the Exam:
- Maximum faces per person in PersonGroup: 248
- Default confidence threshold for identification: 0.6
- Maximum persons per PersonGroup: 10,000 (or 1,000,000 for LargePersonGroup)
- Valid faceId duration: 24 hours
- Required operation for 1:N matching: Identify
- Required operation for 1:1 matching: Verify
Edge Cases and Exceptions:
- If multiple faces are detected, Identify returns results for each face independently.
- Verify can accept two faceIds OR a faceId and a personId (from a trained PersonGroup).
- The 'Find Similar' operation uses a FaceList (not PersonGroup) and returns similar faces based on appearance, not identity.
- Face detection can be performed on images with multiple faces; it returns all faces.
How to Eliminate Wrong Answers:
- If the question asks about 'identifying a person,' look for keywords like 'known,' 'database,' 'authorized,' 'employee.' That points to recognition.
- If the question asks about 'counting,' 'blurring,' 'location,' without identity, it's detection.
- If the question mentions 'training,' it must be recognition.
- Remember: Detection is a prerequisite for recognition, but not vice versa.
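As a study aid, the keyword rules above can be written as a tiny classifier. This is only a mnemonic sketch: the keyword lists are taken from the bullets above, `likely_capability` is a hypothetical function, and real exam questions need reading in full rather than keyword matching.

```python
RECOGNITION_HINTS = ("known", "database", "authorized", "employee", "training")
DETECTION_HINTS = ("count", "blur", "location")


def likely_capability(scenario):
    """Guess detection vs. recognition from the keywords in a scenario."""
    text = scenario.lower()
    # Identity-related words win, because recognition already includes detection.
    if any(word in text for word in RECOGNITION_HINTS):
        return "face recognition"
    if any(word in text for word in DETECTION_HINTS):
        return "face detection"
    return "unclear"


print(likely_capability("Count the number of people entering a building"))
print(likely_capability("Unlock a door only for authorized employees"))
print(likely_capability("Blur faces in a video for privacy"))
```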
Exam Tip: The exam may show a diagram of the Face API workflow. Know that detection comes first, then optionally recognition. Also, know that the Face API is part of Cognitive Services, not Azure Machine Learning.
Face detection locates faces and returns bounding boxes; it does not identify individuals.
Face recognition identifies or verifies a person against a known set of faces (PersonGroup).
Detection is always a prerequisite for recognition; you must first detect a face to get a faceId.
The Face API's Identify operation uses a 1:N matching against a trained PersonGroup.
The default confidence threshold for identification is 0.6; for verification it is 0.5.
A PersonGroup can hold up to 10,000 persons (1,000,000 with LargePersonGroup), each with up to 248 faces.
Face detection can return attributes like age, emotion, and head pose when requested.
The faceId generated by detection expires after 24 hours.
These come up on the exam all the time. Here's how to tell them apart.
Face Detection
Outputs bounding box and optional attributes (age, emotion, etc.)
Does not require any prior setup or training
Cannot determine identity; only says 'a face is here'
Use case: counting people, blurring faces, demographic analysis
API call: Face - Detect with returnFaceId=false
Face Recognition
Outputs person ID and confidence score
Requires a trained PersonGroup with registered faces
Can identify or verify a person's identity
Use case: access control, photo tagging, security
API call: Face - Identify or Face - Verify
Mistake
Face detection can identify a person by name.
Correct
Face detection only locates faces; it does not identify who they are. Identification requires face recognition using the Identify or Verify operations against a trained PersonGroup.
Mistake
Face recognition always returns the person's name.
Correct
Identify returns a person ID (GUID), not a display name. Your application must map that ID to a name, either in its own records or by looking up the Person object it created in the PersonGroup.
Mistake
You must use Custom Vision for face detection.
Correct
Azure provides a dedicated Face API for face detection and recognition. Custom Vision can detect faces but is not optimized and does not provide facial attributes or recognition.
Mistake
A face recognition solution costs the same to run as face detection.
Correct
The per-call rate is identical, since both are billed per transaction. A recognition workflow makes more calls, however (Detect, then Identify or Verify, plus adding faces and training, which may incur costs), so the total is usually higher than detection alone.
Mistake
You can identify a face without training a PersonGroup.
Correct
Identification requires a trained PersonGroup. Without training, the API will return a 'PersonGroupNotTrained' error. The training step generates face templates.
Face detection is a computer vision operation that finds human faces in an image and returns their locations (bounding boxes) and optional attributes like age or emotion. It does not identify who the person is. Face recognition goes a step further by matching a detected face against a database of known faces to identify or verify a person's identity. In Azure, the Face API provides both capabilities: the Detect operation for detection, and Identify/Verify for recognition. Recognition requires a trained PersonGroup with registered faces.
No. Face detection in Azure Face API uses a pre-trained model. You can call the Detect operation immediately without any training. However, for face recognition (Identify or Verify), you must create and train a PersonGroup by adding face images of known persons and calling the Train operation. Training generates face templates used for matching.
The default confidence threshold for the Identify operation is 0.6. This means the API will only return candidates with a similarity score of 0.6 or higher. You can adjust this threshold using the `confidenceThreshold` parameter. For verification, the default is 0.5.
No. Face recognition requires a faceId, which is obtained from the Detect operation. You must first detect faces in an image to get faceIds, then pass those faceIds to Identify or Verify. The detection step is always necessary.
You can add up to 248 face images per person in a PersonGroup. This allows the model to learn variations in appearance (lighting, angle, expression) for better accuracy. Adding more than 248 will result in an error.
A faceId returned by the Detect operation is valid for 24 hours. After that, it expires and cannot be used for recognition. You must re-detect the face to obtain a new faceId. This is a security measure to prevent misuse.
Custom Vision can detect objects, including faces, but it is not optimized for face recognition. Azure recommends using the dedicated Face API for face detection and recognition because it provides specialized features like facial landmarks, attributes, and high-accuracy recognition models.