This chapter covers face attribute detection and emotion recognition using the Azure Cognitive Services Face API (now Azure AI Face). These capabilities are part of the computer vision domain (Objective 3.3), which Microsoft's skills outline weights at roughly 15-20% of the AI-900 exam. You will learn which attributes can be detected, how emotion detection works, and how to call the Face API to retrieve these insights. This topic matters because Microsoft frequently tests your ability to distinguish face detection, face identification, and attribute extraction.
Imagine a border control checkpoint where every traveler must present a passport. The passport contains structured fields: name, date of birth, gender, and a photo. The officer scans the passport and compares the photo to the traveler's face. In Azure Face API, detecting face attributes is like automatically reading those passport fields from a live photo. The service first locates the face (the photo), then extracts attributes such as age (date of birth), gender, emotion (mood indicator), and facial hair (like a beard descriptor). Emotion detection is like an additional stamp that says "happy" or "sad," inferred from the arrangement of facial muscles (the geometry of landmarks). The officer doesn't guess—they use a standardized set of categories. Similarly, Azure's emotion model uses a fixed set of eight emotions: happiness, sadness, surprise, anger, fear, disgust, contempt, and neutral. The confidence scores for each emotion are like the officer's certainty level. If the officer is 95% sure the traveler is happy, that's a high-confidence prediction. The system returns a JSON object with these scores, exactly as a passport scan returns structured data. The key difference: the Azure service can also detect attributes like accessories (glasses, mask) or blur—like noting if the passport photo is smudged. This mechanistic process of extracting predefined fields from an image is what the AI-900 exam expects you to understand.
What Are Face Attributes and Emotion Detection?
Face attributes are structured data points extracted from a detected face in an image. They go beyond simply locating a face (bounding box) and include characteristics like age, gender, emotion, facial hair, glasses, and more. Emotion detection is a subset of attribute extraction that classifies facial expressions into predefined emotional states. On the AI-900 exam, you are expected to know which attributes the Face API can return and how emotion confidence scores work.
The Mechanism: How the Face API Extracts Attributes
The Face API uses deep neural networks trained on large sets of labeled faces. Microsoft does not publish the model details, but the process can be understood as three steps:
Face Detection: First, the API finds all faces in the image and returns bounding box coordinates. This is a prerequisite for attribute extraction.
Landmark Detection: The API identifies 27 facial landmarks (e.g., eye corners, nose tip, mouth edges). These points are critical for aligning the face and normalizing for pose.
Attribute Classification: Using the aligned face, the service predicts each requested attribute. For example, age is returned as a single estimated number, gender as a label ('male' or 'female'), and emotion as confidence scores for eight categories.
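The following is a minimal sketch of in-plane alignment using two landmarks. It assumes landmark coordinates shaped like the Face API's `faceLandmarks` output (for example `pupilLeft` and `pupilRight`, each with `x` and `y`). The coordinate values are made up. Microsoft does not document how its own models align faces, so treat this as a generic illustration of the idea, not the service's implementation.

```python
import math

def eye_roll_degrees(landmarks: dict) -> float:
    """Angle of the pupil-to-pupil line relative to the image's horizontal axis."""
    left, right = landmarks["pupilLeft"], landmarks["pupilRight"]
    return math.degrees(math.atan2(right["y"] - left["y"], right["x"] - left["x"]))

def align_landmarks(landmarks: dict) -> dict:
    """Rotate every landmark by -roll around the pupils' midpoint so the eyes end up level."""
    theta = -math.radians(eye_roll_degrees(landmarks))
    left, right = landmarks["pupilLeft"], landmarks["pupilRight"]
    cx, cy = (left["x"] + right["x"]) / 2, (left["y"] + right["y"]) / 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    aligned = {}
    for name, p in landmarks.items():
        dx, dy = p["x"] - cx, p["y"] - cy
        aligned[name] = {"x": cx + dx * cos_t - dy * sin_t,
                         "y": cy + dx * sin_t + dy * cos_t}
    return aligned

# Made-up pixel coordinates in the {"x": ..., "y": ...} shape of faceLandmarks.
sample = {
    "pupilLeft": {"x": 100.0, "y": 120.0},
    "pupilRight": {"x": 160.0, "y": 130.0},
    "noseTip": {"x": 128.0, "y": 160.0},
}
aligned = align_landmarks(sample)
print(round(eye_roll_degrees(sample), 2))  # 9.46
print(round(aligned["pupilLeft"]["y"], 6), round(aligned["pupilRight"]["y"], 6))  # 125.0 125.0
```

Rotating around the midpoint between the pupils keeps the face centered while leveling the eyes. The angle is conceptually similar to the roll component of the `headPose` attribute.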
Key Attributes Available
The Face API can return the following attributes:
age: Estimated age in years (float).
gender: 'male' or 'female'.
smile: Smile intensity from 0 to 1 (float).
facialHair: Object with 'moustache', 'beard', 'sideburns' (each 0-1).
glasses: 'NoGlasses', 'ReadingGlasses', 'Sunglasses', 'SwimmingGoggles'.
headPose: Roll, yaw, pitch angles in degrees.
emotion: Object with confidence scores for: anger, contempt, disgust, fear, happiness, neutral, sadness, surprise. Each score is between 0 and 1, and they sum to 1 (or nearly 1 due to rounding).
hair: Object with 'bald' (0-1) and 'hairColor' array.
makeup: 'eyeMakeup' and 'lipMakeup' booleans.
occlusion: 'foreheadOccluded', 'eyeOccluded', 'mouthOccluded' booleans.
accessories: Array of objects with 'type' (e.g., 'glasses', 'headwear', 'mask') and 'confidence'.
blur: 'blurLevel' ('low', 'medium', 'high') and 'value' (0-1).
exposure: 'exposureLevel' ('goodExposure', 'overExposure', 'underExposure') and 'value'.
noise: 'noiseLevel' ('low', 'medium', 'high') and 'value'.
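The quality-related attributes at the end of this list (blur, exposure, noise, occlusion) can help you decide whether to trust the other attributes for a face. The sketch below assumes attribute dictionaries shaped as listed above. The function name and the rules it applies are hypothetical choices, not Azure defaults. Level strings are compared case-insensitively in case casing differs between API versions.

```python
def quality_problems(face_attributes: dict) -> list:
    """Return reasons a detected face may give unreliable attributes.

    The rules below are illustrative choices, not Azure defaults.
    """
    problems = []
    blur = face_attributes.get("blur", {})
    if str(blur.get("blurLevel", "")).lower() == "high":
        problems.append("blur is high")
    exposure = face_attributes.get("exposure", {})
    level = str(exposure.get("exposureLevel", ""))
    if level.lower() in ("overexposure", "underexposure"):
        problems.append(f"exposure is {level}")
    noise = face_attributes.get("noise", {})
    if str(noise.get("noiseLevel", "")).lower() == "high":
        problems.append("noise is high")
    occlusion = face_attributes.get("occlusion", {})
    for part in ("foreheadOccluded", "eyeOccluded", "mouthOccluded"):
        if occlusion.get(part):
            problems.append(part)
    return problems

# Made-up attribute values in the shape listed above.
sample = {
    "blur": {"blurLevel": "high", "value": 0.9},
    "exposure": {"exposureLevel": "underExposure", "value": 0.1},
    "noise": {"noiseLevel": "low", "value": 0.05},
    "occlusion": {"foreheadOccluded": False, "eyeOccluded": False, "mouthOccluded": True},
}
print(quality_problems(sample))
# ['blur is high', 'exposure is underExposure', 'mouthOccluded']
```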
Emotion Detection Deep Dive
Emotion recognition research commonly builds on the Facial Action Coding System (FACS), which maps facial muscle movements (Action Units) to expressions. Microsoft does not document the internals of its emotion model. Its output, however, is a set of confidence scores over eight emotion classes, normalized to sum to approximately 1 like a probability distribution. Because the scores share a single total, a mixed expression (for example, surprise combined with fear) appears as the total split between those categories rather than as two independent high scores. The highest score indicates the most likely emotion.
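A softmax is one common way for a classifier to turn raw outputs into scores that sum to 1. Microsoft does not document whether its emotion model works this way, so this is an illustration with made-up raw scores (logits), not the service's code. It also shows how a mixed expression splits the total between categories.

```python
import math

EMOTIONS = ["anger", "contempt", "disgust", "fear",
            "happiness", "neutral", "sadness", "surprise"]

def softmax(logits):
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw outputs for one face; fear and surprise are both strong.
logits = [0.1, -0.5, 0.0, 2.0, 0.3, 1.0, -1.0, 2.2]
scores = dict(zip(EMOTIONS, softmax(logits)))
top = max(scores, key=scores.get)
print(top, round(sum(scores.values()), 6))  # surprise 1.0
```

Because every score shares the same total, fear and surprise both receive substantial scores here, but neither comes close to 1.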
Exam tip: The AI-900 exam will ask which emotions are supported. The list is exactly: anger, contempt, disgust, fear, happiness, neutral, sadness, surprise. Contempt is often the one candidates forget.
API Call Example
To retrieve attributes, you must specify them in the request. Here's a sample REST call using curl:
```bash
curl -v -X POST "https://<your-endpoint>/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,emotion" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: <your-key>" \
  -d '{
    "url": "https://example.com/photo.jpg"
  }'
```

The response is a JSON array of face objects. Each face object looks like this (faceLandmarks omitted for brevity):
```json
{
  "faceId": "abc123",
  "faceRectangle": {
    "top": 100,
    "left": 200,
    "width": 150,
    "height": 150
  },
  "faceAttributes": {
    "age": 30.5,
    "gender": "female",
    "emotion": {
      "anger": 0.001,
      "contempt": 0.002,
      "disgust": 0.001,
      "fear": 0.003,
      "happiness": 0.95,
      "neutral": 0.03,
      "sadness": 0.01,
      "surprise": 0.002
    }
  }
}
```

Note: If you don't specify returnFaceAttributes, no attributes are returned. The exam often tests this configuration requirement.
Interaction with Face Identification
Attributes are separate from face identification. Face identification requires a PersonGroup and compares a detected face against enrolled faces. Attributes can be extracted during detection without any enrollment. However, you can combine both—for example, detect a face, get its attributes, and then identify it. The exam will ask you to differentiate: detection finds faces and optionally returns attributes; identification matches faces to known persons.
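The detect-then-identify workflow can be sketched as follows. The sketch assumes you called detect with `returnFaceId=true` and have a trained PersonGroup. The v1.0 Identify operation (`POST {endpoint}/face/v1.0/identify`) takes a JSON body with `personGroupId` and `faceIds`. The helper name and group name are hypothetical. `batch_size=10` reflects the per-call faceId limit in the v1.0 reference, which you should confirm for the API version you use. Identification is also a Limited Access feature that requires Microsoft's approval.

```python
import json
from typing import Dict, List

def build_identify_bodies(detected_faces: List[dict], person_group_id: str,
                          batch_size: int = 10) -> List[Dict]:
    """Group faceIds from a detect response into Identify request bodies."""
    face_ids = [face["faceId"] for face in detected_faces if "faceId" in face]
    return [
        {
            "personGroupId": person_group_id,
            "faceIds": face_ids[i:i + batch_size],
            "maxNumOfCandidatesReturned": 1,
        }
        for i in range(0, len(face_ids), batch_size)
    ]

# Made-up detect output: 12 faces, each with an attribute read at detection time.
detected = [{"faceId": f"face-{n}", "faceAttributes": {"age": 20.0 + n}} for n in range(12)]
bodies = build_identify_bodies(detected, "store-staff")  # hypothetical group name
print(len(bodies), [len(b["faceIds"]) for b in bodies])  # 2 [10, 2]
print(json.dumps(bodies[1]))
```

Attributes such as `age` come from the detect response itself. The identify call only returns candidate `personId` values with confidence scores.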
Pricing and Limits
The Face API is priced per transaction. Each detect call counts as one transaction, regardless of how many faces are in the image or how many attributes you request. A single detect call returns at most 100 faces; if an image contains more, the largest faces are returned. The exam may test that attribute extraction does not incur additional cost beyond the detection call.
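The billing rule is simple arithmetic: count calls, not faces or attributes. The sketch below is minimal. `DetectCall` and `price_per_1000` are hypothetical, and the price is a placeholder, not a real Azure rate.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_FACES_PER_CALL = 100  # detect returns at most 100 faces per image

@dataclass
class DetectCall:
    faces_in_image: int
    attributes: Tuple[str, ...]  # attributes requested via returnFaceAttributes

def billed_transactions(calls: List[DetectCall]) -> int:
    # One detect call = one transaction, whatever the face or attribute count.
    return len(calls)

def faces_returned(call: DetectCall) -> int:
    return min(call.faces_in_image, MAX_FACES_PER_CALL)

def estimated_cost(calls: List[DetectCall], price_per_1000: float) -> float:
    # price_per_1000 is a placeholder; look up the real rate for your region and tier.
    return billed_transactions(calls) / 1000 * price_per_1000

calls = [
    DetectCall(1, ("age",)),
    DetectCall(5, ("age", "emotion")),
    DetectCall(250, ("age", "glasses", "emotion")),
]
print(billed_transactions(calls))             # 3
print(sum(faces_returned(c) for c in calls))  # 106
```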
Regional Availability
The Face API is available in many Azure regions, including West US, East US, West Europe, and Southeast Asia. Feature availability can differ by region and changes over time, so check the Azure products-by-region listing before you deploy.
Responsible AI Considerations
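Microsoft has restricted the Face API's most sensitive capabilities because of privacy and fairness concerns. In June 2022, Microsoft announced that it was retiring facial analysis features that infer emotional states and identity attributes (emotion, gender, age, smile, facial hair, hair, and makeup). New customers lost access at that point, and existing customers lost access on June 30, 2023. Face identification and verification became Limited Access features that require an application, while detection-related features such as blur, exposure, glasses, head pose, landmarks, noise, and occlusion remain generally available. For the exam, know that emotion detection is a sensitive capability that Microsoft restricts. Questions may still test your technical understanding of how it works.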
Common Pitfalls
Not specifying attributes: If you don't include returnFaceAttributes, you get only faceId and rectangle.
Assuming attributes are always returned: Attributes may be missing if the face is too small, blurred, or occluded.
Confusing emotion with sentiment: Emotion is from facial expressions; sentiment analysis is from text.
Thinking age is precise: Age is an estimate; the exam may ask if it's exact (no).
Create Face API Resource
In the Azure portal, create a Face resource (part of Azure AI services, formerly Cognitive Services). Choose a pricing tier (F0 for free, S0 for standard), then note the endpoint and subscription key. These are your entry point for all Face API calls. For the exam, know that you can use either a dedicated Face resource or a multi-service Cognitive Services (Azure AI services) resource. The multi-service resource gives you one endpoint and key for several services. The dedicated resource keeps Face keys and billing separate.
Prepare Image Input
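The image must be JPEG, PNG, GIF (only the first frame is used), or BMP, and the file can be no larger than 6 MB. It can be provided as a URL or as binary data in the request body. The smallest detectable face is 36x36 pixels in an image up to 1920x1080 pixels; larger images need proportionally larger faces. The API returns up to 100 faces per image. Larger, frontal, clearly lit faces give more reliable attributes, and for recognition tasks Microsoft recommends faces of at least 200x200 pixels.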
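A cheap local check can reject files the service would refuse, before you spend a transaction on them. This sketch assumes the current Azure AI Face input rules (JPEG, PNG, GIF, or BMP, up to 6 MB) and identifies the format from the file's leading bytes. The function names are hypothetical. The sketch does not check the 36x36 minimum face size, which would require decoding the image and locating faces.

```python
from typing import List, Optional

MAX_IMAGE_BYTES = 6 * 1024 * 1024  # 6 MB limit from current Face docs (treated as MiB here)

# Leading "magic" bytes of the formats the Face docs list as supported.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
]

def sniff_format(data: bytes) -> Optional[str]:
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None

def preflight(data: bytes) -> List[str]:
    """Cheap local checks to run before spending a transaction on a detect call."""
    errors = []
    if sniff_format(data) is None:
        errors.append("unsupported format (expected JPEG, PNG, GIF, or BMP)")
    if len(data) > MAX_IMAGE_BYTES:
        errors.append(f"file is {len(data)} bytes; limit is {MAX_IMAGE_BYTES}")
    return errors

png_header = b"\x89PNG\r\n\x1a\n" + bytes(100)
tiff_header = b"II*\x00" + bytes(100)
print(preflight(png_header))   # []
print(preflight(tiff_header))  # ['unsupported format (expected JPEG, PNG, GIF, or BMP)']
```

Sniffing leading bytes is more reliable than trusting file extensions. The short `BM` signature can still match non-BMP data, so the service remains the final authority.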
Call Detect with Attributes
Send a POST request to the detect endpoint with the required parameters. The URL format is `https://{endpoint}/face/v1.0/detect`. Include the query parameters `returnFaceId` (true/false), `returnFaceLandmarks` (true/false), and `returnFaceAttributes` (a comma-separated list of the attributes you want), for example `returnFaceAttributes=age,gender,emotion`. The request headers must include `Ocp-Apim-Subscription-Key` and `Content-Type`. Use `application/json` when the body is a JSON object containing the image URL, and `application/octet-stream` when the body is raw image bytes.
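The same request can be built with Python's standard library. This is a sketch under stated assumptions. The endpoint name is hypothetical, and `build_detect_request` is not part of any Azure SDK. The code only constructs the request without sending it; to send it, pass the result to `urllib.request.urlopen` with a real endpoint and key. It sets `returnFaceId=false` by default because attribute extraction does not need a faceId.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Sequence

def build_detect_request(endpoint: str, key: str, attributes: Sequence[str],
                         image_url: Optional[str] = None,
                         image_bytes: Optional[bytes] = None,
                         return_face_id: bool = False) -> urllib.request.Request:
    """Build (but do not send) a Face API v1.0 detect request."""
    params = {
        "returnFaceId": str(return_face_id).lower(),
        "returnFaceLandmarks": "false",
        "returnFaceAttributes": ",".join(attributes),  # comma-separated list
    }
    # safe="," keeps the commas readable instead of encoding them as %2C.
    query = urllib.parse.urlencode(params, safe=",")
    url = f"{endpoint.rstrip('/')}/face/v1.0/detect?{query}"
    if image_url is not None:
        body = json.dumps({"url": image_url}).encode("utf-8")
        content_type = "application/json"
    elif image_bytes is not None:
        body = image_bytes
        content_type = "application/octet-stream"
    else:
        raise ValueError("pass either image_url or image_bytes")
    headers = {"Ocp-Apim-Subscription-Key": key, "Content-Type": content_type}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

req = build_detect_request(
    "https://my-face-resource.cognitiveservices.azure.com/",  # hypothetical endpoint
    "<your-key>",
    ["age", "emotion"],
    image_url="https://example.com/photo.jpg",
)
print(req.full_url)
```

Keeping commas unescaped with `safe=","` makes the URL match the `returnFaceAttributes=age,gender,emotion` form shown above.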
Parse Response JSON
The API returns a JSON array. Each object has a faceId (if requested), faceRectangle (top, left, width, height), and faceAttributes (if requested). The emotion object contains eight confidence scores. The sum of these scores is approximately 1. The highest score indicates the predicted emotion. For example, if happiness is 0.95, the person is likely happy. The exam may ask you to interpret such output.
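Interpreting the response mostly means finding the highest emotion score for each face while tolerating faces or responses that carry no emotion data. The minimal sketch below uses a trimmed, made-up response, and `top_emotions` is a hypothetical helper.

```python
import json
from typing import List, Optional, Tuple

# Trimmed, made-up detect response. The second face has no faceAttributes,
# to show the code tolerating missing data.
raw = """[
  {"faceRectangle": {"top": 100, "left": 200, "width": 150, "height": 150},
   "faceAttributes": {"emotion": {"anger": 0.001, "contempt": 0.002,
     "disgust": 0.001, "fear": 0.003, "happiness": 0.95, "neutral": 0.03,
     "sadness": 0.01, "surprise": 0.002}}},
  {"faceRectangle": {"top": 40, "left": 20, "width": 60, "height": 60}}
]"""

def top_emotions(faces: List[dict]) -> List[Optional[Tuple[str, float]]]:
    """Return (label, score) for each face, or None when no emotion data is present."""
    results = []
    for face in faces:
        emotion = face.get("faceAttributes", {}).get("emotion")
        if not emotion:
            results.append(None)
            continue
        label = max(emotion, key=emotion.get)
        results.append((label, emotion[label]))
    return results

print(top_emotions(json.loads(raw)))   # [('happiness', 0.95), None]
print(top_emotions(json.loads("[]")))  # [] (no face detected)
```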
Handle Errors and Edge Cases
Common errors: 400 if the image is invalid, in an unsupported format, or too large; 401 if the subscription key is wrong; 403 if the call quota for your tier is exhausted; and 429 if you exceed the rate limit for your pricing tier. If no face is detected, the call still succeeds and returns an empty array. If attributes cannot be extracted reliably for a face (e.g., it is too blurred), the attribute object may be missing or have default values. The exam tests understanding of these error conditions.
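The sketch below turns status codes into actionable messages. The hint text and helper names are hypothetical. The code treats 429 (Too Many Requests), the standard HTTP status for throttling, as the signal to back off. It assumes error bodies of the form {"error": {"code": ..., "message": ...}} and falls back to the plain hint if the body has another shape.

```python
import json

# Illustrative hints per HTTP status.
HINTS = {
    400: "bad request: check the image URL, format, and size",
    401: "unauthorized: check the key and endpoint",
    403: "forbidden: call quota may be exhausted (e.g. on the free tier)",
    429: "rate limited: wait and retry with backoff",
}

def interpret(status: int, body: str) -> str:
    if status == 200:
        faces = json.loads(body)  # an empty array means no face was found
        return "no face detected" if not faces else f"{len(faces)} face(s) detected"
    hint = HINTS.get(status, f"unexpected status {status}")
    try:
        # Assumed error shape: {"error": {"code": ..., "message": ...}}
        code = json.loads(body)["error"]["code"]
    except (ValueError, KeyError, TypeError):
        return hint
    return f"{hint} (code: {code})"

def backoff_delays(attempts: int, base_seconds: float = 1.0) -> list:
    """Exponential backoff schedule for retrying 429 responses: 1, 2, 4, ... seconds."""
    return [base_seconds * 2 ** i for i in range(attempts)]

print(interpret(200, "[]"))  # no face detected
print(backoff_delays(3))     # [1.0, 2.0, 4.0]
```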
Enterprise Scenario 1: Retail Customer Sentiment Analysis
A large retail chain wants to gauge customer reactions to new product displays. They install cameras at eye level near displays. The Face API is called on each frame to detect faces and extract emotion attributes. The system aggregates emotion data over time to measure happiness and surprise levels. This helps the marketing team decide which displays are most engaging. In production, the system must handle high throughput—up to 30 frames per second from multiple cameras. The solution uses Azure Functions to process images asynchronously and stores results in Cosmos DB. A common misconfiguration is not setting returnFaceAttributes to include 'emotion', resulting in no emotion data. Also, the system must comply with privacy regulations; faces are not stored, only aggregated emotion counts. When misconfigured, the system might return neutral for all faces if the image quality is poor (e.g., low light). The team learned to preprocess images to enhance brightness before calling the API.
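The privacy-preserving aggregation in this scenario can be sketched as counting each face's dominant emotion and discarding everything else. The frames below are made up and trimmed to a few emotions each, and `min_score` is a hypothetical threshold. The emotion attribute itself is now retired for new customers, so this illustrates the scenario and is not a deployable design.

```python
from collections import Counter
from typing import Iterable, List

def dominant_emotion(scores: dict, min_score: float = 0.5) -> str:
    """Highest-scoring emotion, or 'uncertain' if no score reaches min_score."""
    label = max(scores, key=scores.get)
    return label if scores[label] >= min_score else "uncertain"

def aggregate(frames: Iterable[List[dict]]) -> Counter:
    """Count dominant emotions across frames; no faces, IDs, or images are kept."""
    counts = Counter()
    for faces in frames:  # one detect response per frame
        for face in faces:
            scores = face.get("faceAttributes", {}).get("emotion")
            if scores:
                counts[dominant_emotion(scores)] += 1
    return counts

frames = [
    [{"faceAttributes": {"emotion": {"happiness": 0.9, "neutral": 0.1}}}],
    [{"faceAttributes": {"emotion": {"surprise": 0.7, "happiness": 0.3}}},
     {"faceAttributes": {"emotion": {"neutral": 0.4, "sadness": 0.35, "happiness": 0.25}}}],
    [],  # frame with no faces
]
print(dict(aggregate(frames)))  # {'happiness': 1, 'surprise': 1, 'uncertain': 1}
```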
Enterprise Scenario 2: Access Control with Liveness Detection
A financial institution uses face detection to verify identity at ATMs. They use the Face API to detect a face and extract attributes like glasses and facial hair to compare with a stored profile. However, they need to prevent spoofing with photos, so they combine attribute extraction with liveness detection, which the detect operation does not perform, to confirm that a real person is present. In this scenario, attribute extraction helps filter out faces that don't match the expected features (e.g., wrong gender). Performance considerations: the API must respond within 2 seconds to avoid user frustration. The team uses the S0 tier with a dedicated endpoint. A common mistake is to assume that face detection includes liveness detection; it does not. Liveness is a separate capability (current Azure AI Face offers it as its own feature with a dedicated API), and the exam may test that distinction.
Enterprise Scenario 3: Social Media Photo Tagging
A social media platform uses the Face API to automatically tag users in photos. They first detect faces and extract attributes like age and gender to suggest tags, then use face identification to match against friends. The platform processes millions of photos daily, using Azure Blob Storage to store images and Azure Functions to orchestrate. They found that attribute extraction sometimes fails for faces with heavy makeup or unusual angles, and they mitigated this by requesting multiple attributes and using the ones with the highest confidence. Cost is a major factor because billing is per transaction, so they cache results for identical images to avoid repeat calls. The exam may ask about cost optimization: reduce the number of calls, for example by caching and by requesting all needed attributes in one call rather than several. Requesting only the attributes you need does not change the per-call price, but it keeps responses smaller.
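The caching idea from this scenario can be sketched by keying results on a hash of the image bytes. `CachedDetector` and `fake_detect` are hypothetical. In a real system, `detect_fn` would wrap the actual detect call.

```python
import hashlib
from typing import Callable, Dict, List

class CachedDetector:
    """Skip repeat detect calls for byte-identical images (each call is billed)."""

    def __init__(self, detect_fn: Callable[[bytes], List[dict]]):
        self._detect_fn = detect_fn  # e.g. a wrapper around the real detect call
        self._cache: Dict[str, List[dict]] = {}
        self.calls_made = 0

    def detect(self, image_bytes: bytes) -> List[dict]:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._cache:
            self.calls_made += 1
            self._cache[key] = self._detect_fn(image_bytes)
        return self._cache[key]

def fake_detect(image_bytes: bytes) -> List[dict]:
    # Stand-in for the real API so the example runs offline.
    return [{"faceRectangle": {"top": 0, "left": 0, "width": 10, "height": 10}}]

detector = CachedDetector(fake_detect)
detector.detect(b"photo-1")
detector.detect(b"photo-1")  # served from the cache
detector.detect(b"photo-2")
print(detector.calls_made)   # 2
```

Caching attribute results is safe for byte-identical images. However, faceIds expire (24 hours after detection by default), so do not rely on cached faceIds for later identify or verify calls.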
What AI-900 Tests on This Topic
Objective 3.3 covers computer vision capabilities, including face detection, attribute extraction, and emotion recognition. You should be able to describe the capabilities of the Face API (detect, identify, verify, find similar) and list the attributes it can detect (age, gender, emotion, and so on). The exam expects you to know:
The exact list of detectable emotions (eight: anger, contempt, disgust, fear, happiness, neutral, sadness, surprise).
That attributes are optional and must be requested via returnFaceAttributes.
That emotion detection returns confidence scores that sum to 1.
That age is an estimate, not an exact value.
That face identification is different from attribute extraction.
Most Common Wrong Answers
Wrong emotion list: Candidates often include 'boredom' or 'excitement', but the exam uses only the eight standard emotions. The trap: an option listing 'surprise', 'fear', and 'disgust' is entirely correct, but one that includes 'confusion' is wrong.
Assuming age is exact: A question might ask if age is returned as an integer. The correct answer is a float (estimated). The wrong answer says 'exact age'.
Confusing detection with identification: A scenario describes finding a person in a crowd. Many choose 'face detection' but the correct answer is 'face identification' because it matches against a known set.
Forgetting to specify attributes: A question asks what is returned if you call detect without returnFaceAttributes. The wrong answer includes attributes; the correct answer is only faceId and rectangle.
Numbers and Terms That Appear Verbatim
27 facial landmarks
8 emotions
100 faces per image limit
6 MB maximum image file size
36x36 pixel minimum detectable face size
returnFaceAttributes parameter
faceId (string)
faceRectangle (top, left, width, height)
Edge Cases the Exam Loves
Image with multiple faces: The API returns an array of face objects. Each face has its own attributes.
No face detected: Returns an empty array.
Face partially occluded: Attributes may be missing or have lower confidence.
Emotion scores sum to 1: If a candidate thinks they sum to 100, they are wrong (they are 0-1).
How to Eliminate Wrong Answers
If a question asks which emotion is NOT supported, remember that 'contempt' and 'surprise' are both supported; 'boredom' is not.
If a question asks what is required to get attributes, look for 'returnFaceAttributes' in the answer choices.
If a question mentions matching a face to a database, it's identification, not detection.
If a question asks about age, remember it's an estimate (float).
Face detection is the first step; attribute extraction is optional and must be requested.
The eight emotions supported are: anger, contempt, disgust, fear, happiness, neutral, sadness, surprise.
Emotion confidence scores are between 0 and 1 and sum to 1.
Age is an estimated float, not an exact integer.
Face identification requires a PersonGroup; detection does not.
Maximum 100 faces can be detected per image.
Attributes like glasses, facial hair, and makeup are also available.
The Face API is a RESTful service; you call it via HTTP POST.
Image files can be up to 6 MB, and the smallest detectable face is 36x36 pixels.
The detect operation does not perform liveness detection; liveness is a separate capability.
These come up on the exam all the time. Here's how to tell them apart.
Face Detection
Finds faces in an image
Returns bounding box and faceId
Does not require a PersonGroup
Can optionally return attributes
One image can contain many faces (up to 100 returned per call)
Face Identification
Matches a detected face to known persons
Requires a PersonGroup with enrolled faces
Returns a personId and confidence
Does not return attributes (unless combined)
One-to-many (1:N) matching: each query face is compared against many enrolled persons
Mistake
Face detection and face identification are the same thing.
Correct
Face detection finds faces in an image and returns bounding boxes and optional attributes. Face identification matches a detected face against a database of known persons (PersonGroup). They are separate capabilities; identification requires a prior enrollment step.
Mistake
The Face API can detect emotions from text.
Correct
Emotion detection in Face API is based on facial expressions in images. Text-based sentiment analysis is a different service (Text Analytics). The exam tests that Face API works with images only.
Mistake
Age is returned as an exact integer.
Correct
Age is returned as a floating-point number (e.g., 30.5), representing an estimate. It is not guaranteed to be accurate.
Mistake
Emotion detection returns a single emotion label.
Correct
It returns confidence scores for eight emotions. The highest score indicates the most likely emotion, but all scores are provided.
Mistake
You must use a separate API call for each attribute.
Correct
You can request multiple attributes in a single call by comma-separating them in the returnFaceAttributes parameter. This reduces cost and latency.
Face detection locates faces in an image and returns their bounding box coordinates and an optional faceId. It can also extract attributes like age and emotion. Face identification compares a detected face against a database of known persons (PersonGroup) to find a match. Detection is a prerequisite for identification. The exam often tests this distinction: detection finds faces; identification matches them.
The Face API can detect eight emotions: anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise. Each emotion is returned as a confidence score between 0 and 1. The sum of all scores for a face is approximately 1. The highest score indicates the predicted emotion. The exam expects you to know this exact list.
You must include the query parameter `returnFaceAttributes` in your API request, with a comma-separated list of attributes you want. For example: `returnFaceAttributes=age,gender,emotion`. If omitted, no attributes are returned. The exam tests this configuration requirement.
The age value is not exact: it is an estimate returned as a floating-point number and is not guaranteed to be accurate. If the exam asks whether it is exact or approximate, the correct answer is approximate.
The Face API includes a Verify operation that takes two faceIds and returns whether they belong to the same person, along with a confidence score. This is different from identification, which matches a face against a group of enrolled persons.
The Face API supports JPEG, PNG, GIF (first frame only), and BMP images. The maximum file size is 6 MB, and the minimum detectable face size is 36x36 pixels.
The Face API's detect operation does not include liveness detection. Liveness detection is a separate capability that determines whether a face comes from a real person who is present or from a spoof such as a printed photo or video replay. The exam may test that it is not part of face detection.