AI-900 · Chapter 58 of 100 · Objective 3.3

Face Attributes and Emotion Detection

How can you extract face attributes and recognise emotions with the Azure Cognitive Services Face API? These capabilities are part of the Computer Vision domain (Objective 3.3) and appear in approximately 10-15% of AI-900 exam questions. You will learn the exact attributes that can be detected, how emotion detection works under the hood, and how to call the Face API to retrieve these insights. Mastery of this topic is essential because Microsoft frequently tests your ability to distinguish between face detection, face identification, and attribute extraction.

Updated Jul 20, 2026

Reviewed by Johnson Ajibi · Senior Network & Security Engineer · MSc IT Security

Face Attributes as a Biometric Passport

A passport is an identity document that every traveler presents at border control. It contains structured fields: name, date of birth, gender, and a photo. The officer scans the passport and compares the photo to the traveler's face. In the Azure Face API, detecting face attributes is like automatically reading those passport fields from a live photo. The service first locates the face (the photo), then extracts attributes such as age (date of birth), gender, emotion (a mood indicator), and facial hair (a beard descriptor). Emotion detection is like an additional stamp that says "happy" or "sad," inferred from the arrangement of facial muscles (the geometry of the landmarks). The officer doesn't guess freely but picks from a standardized set of categories. Similarly, Azure's emotion model uses a fixed set of eight emotions: happiness, sadness, surprise, anger, fear, disgust, contempt, and neutral. The confidence score for each emotion is like the officer's certainty level: if the officer is 95% sure the traveler is happy, that is a high-confidence prediction. The system returns a JSON object with these scores, just as a passport scan returns structured data. The key difference is that the Azure service can also detect attributes such as accessories (glasses, mask) or blur, which is like noting that the passport photo is smudged. This process of extracting predefined fields from an image is what the AI-900 exam expects you to understand.

How It Actually Works

What Are Face Attributes and Emotion Detection?

Face attributes are structured data points extracted from a detected face in an image. They go beyond simply locating a face (bounding box) and include characteristics like age, gender, emotion, facial hair, glasses, and more. Emotion detection is a subset of attribute extraction that classifies facial expressions into predefined emotional states. On the AI-900 exam, you are expected to know which attributes the Face API can return and how emotion confidence scores work.

The Mechanism: How the Face API Extracts Attributes

The Face API uses deep neural networks trained on millions of labeled faces. The process has three steps:

1. Face Detection: First, the API finds all faces in the image and returns bounding box coordinates. This is a prerequisite for attribute extraction.

2. Landmark Detection: The API identifies 27 facial landmarks (e.g., eye corners, nose tip, mouth edges). These points are critical for aligning the face and normalizing for pose.

3. Attribute Classification: Using the aligned face, separate classifiers predict each attribute. For example, an age regressor outputs a single number; a gender classifier outputs a binary label with confidence; an emotion classifier outputs confidence scores for eight emotion categories.

Key Attributes Available

The Face API can return the following attributes:

age: Estimated age in years (float).

gender: 'male' or 'female'.

smile: Smile intensity from 0 to 1 (float).

facialHair: Object with 'moustache', 'beard', 'sideburns' (each 0-1).

glasses: 'NoGlasses', 'ReadingGlasses', 'Sunglasses', 'SwimmingGoggles'.

headPose: Roll, yaw, pitch angles in degrees.

emotion: Object with confidence scores for: anger, contempt, disgust, fear, happiness, neutral, sadness, surprise. Each score is between 0 and 1, and they sum to 1 (or nearly 1 due to rounding).

hair: Object with 'bald' (0-1) and 'hairColor' array.

makeup: 'eyeMakeup' and 'lipMakeup' booleans.

occlusion: 'foreheadOccluded', 'eyeOccluded', 'mouthOccluded' booleans.

accessories: Array of objects with 'type' (e.g., 'glasses', 'headwear', 'mask') and 'confidence'.

blur: 'blurLevel' ('low', 'medium', 'high') and 'value' (0-1).

exposure: 'exposureLevel' ('goodExposure', 'overExposure', 'underExposure') and 'value'.

noise: 'noiseLevel' ('low', 'medium', 'high') and 'value'.
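
The attribute names above are what you pass, comma-separated, in the `returnFaceAttributes` query parameter. As a minimal sketch, the helper below (`build_detect_query` is a hypothetical name, not part of any SDK) checks the requested names against the list above and builds the query string using only the Python standard library.

```python
from urllib.parse import urlencode

# Attribute names copied from the list above.
KNOWN_ATTRIBUTES = {
    "age", "gender", "smile", "facialHair", "glasses", "headPose",
    "emotion", "hair", "makeup", "occlusion", "accessories",
    "blur", "exposure", "noise",
}


def build_detect_query(attributes, return_face_id=True, return_landmarks=False):
    """Build the query string for a detect call (hypothetical helper)."""
    unknown = [name for name in attributes if name not in KNOWN_ATTRIBUTES]
    if unknown:
        raise ValueError(f"Unknown attribute(s): {', '.join(unknown)}")
    params = {
        "returnFaceId": str(return_face_id).lower(),
        "returnFaceLandmarks": str(return_landmarks).lower(),
    }
    if attributes:
        # One parameter carries every requested attribute.
        params["returnFaceAttributes"] = ",".join(attributes)
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return urlencode(params, safe=",")


query = build_detect_query(["age", "gender", "emotion"])
print(query)
```

Validating names locally is a design choice for the sketch: it catches a misspelled attribute before a transaction is spent on a request that would be rejected.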

Emotion Detection Deep Dive

Emotion detection is based on the Facial Action Coding System (FACS), which maps facial muscle movements (Action Units) to emotions. The Azure model uses a convolutional neural network (CNN) trained on a large dataset of labeled expressions. The output is a probability distribution over eight emotion classes, so the confidence scores sum to 1. A real expression can blend emotions (a face could show both surprise and fear); in that case the probability is split between those classes rather than reported as two independent scores. The highest score indicates the most likely emotion.

Exam tip: The AI-900 exam will ask which emotions are supported. The list is exactly: anger, contempt, disgust, fear, happiness, neutral, sadness, surprise. Contempt is often the one candidates forget.
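
To make the "highest score wins" rule concrete, here is a small sketch that picks the dominant emotion from a score dictionary and checks that the scores sum to roughly 1. The function name `dominant_emotion` and both sets of scores are illustrative assumptions, not service output.

```python
def dominant_emotion(scores):
    """Return (label, confidence) for the highest-scoring emotion."""
    if not scores:
        raise ValueError("scores must not be empty")
    label = max(scores, key=scores.get)
    return label, scores[label]


# Illustrative scores for a clearly happy face.
happy = {
    "anger": 0.001, "contempt": 0.002, "disgust": 0.001, "fear": 0.003,
    "happiness": 0.95, "neutral": 0.03, "sadness": 0.01, "surprise": 0.002,
}

# Illustrative scores for a blended expression: the probability is split
# between surprise and fear instead of both being close to 1.
blended = {
    "anger": 0.01, "contempt": 0.0, "disgust": 0.01, "fear": 0.40,
    "happiness": 0.0, "neutral": 0.03, "sadness": 0.0, "surprise": 0.55,
}

for scores in (happy, blended):
    label, confidence = dominant_emotion(scores)
    total = sum(scores.values())
    print(label, confidence, round(total, 3))
```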

API Call Example

To retrieve attributes, you must specify them in the request. Here's a sample REST call using curl:

```shell
curl -v -X POST "https://<your-endpoint>/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,emotion" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: <your-key>" \
  -d '{
  "url": "https://example.com/photo.jpg"
}'
```

The response will include a JSON array of face objects. Each face object contains:

```json
{
  "faceId": "abc123",
  "faceRectangle": {
    "top": 100,
    "left": 200,
    "width": 150,
    "height": 150
  },
  "faceAttributes": {
    "age": 30.5,
    "gender": "female",
    "emotion": {
      "anger": 0.001,
      "contempt": 0.002,
      "disgust": 0.001,
      "fear": 0.003,
      "happiness": 0.95,
      "neutral": 0.03,
      "sadness": 0.01,
      "surprise": 0.002
    }
  }
}
```

Note: If you don't specify returnFaceAttributes, no attributes are returned. The exam often tests this configuration requirement.
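
The same request can be assembled in Python. The sketch below only builds the request object with the standard library and does not send it; `build_detect_request` is a hypothetical helper, and the endpoint `https://example.invalid` and the key are placeholders you would replace with your own resource's values.

```python
import json
import urllib.request


def build_detect_request(endpoint, key, image_url, attributes):
    """Assemble (but do not send) a detect request (hypothetical helper)."""
    url = (
        f"{endpoint.rstrip('/')}/face/v1.0/detect"
        f"?returnFaceId=true&returnFaceAttributes={','.join(attributes)}"
    )
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
    )


req = build_detect_request(
    "https://example.invalid",          # placeholder endpoint
    "<your-key>",                       # placeholder subscription key
    "https://example.com/photo.jpg",
    ["age", "gender", "emotion"],
)
print(req.get_method(), req.full_url)
# To actually call the service you would pass req to urllib.request.urlopen.
```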

Interaction with Face Identification

Attributes are separate from face identification. Face identification requires a PersonGroup and compares a detected face against enrolled faces. Attributes can be extracted during detection without any enrollment. However, you can combine both—for example, detect a face, get its attributes, and then identify it. The exam will ask you to differentiate: detection finds faces and optionally returns attributes; identification matches faces to known persons.
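
The difference shows up in the shape of the data. The sketch below contrasts an identify request and response with the detect output: identification takes face IDs plus a PersonGroup and returns candidate persons, not attributes. The field names reflect my understanding of the v1.0 identify operation and should be checked against the current API reference; the group name, IDs, and confidence value are invented for illustration.

```python
import json

# Request body for identify: the face IDs come from an earlier detect call.
identify_request = {
    "personGroupId": "employees",        # hypothetical group name
    "faceIds": ["abc123"],               # hypothetical faceId from detect
    "maxNumOfCandidatesReturned": 1,
    "confidenceThreshold": 0.5,
}

# Hand-written sample of the response shape; all values are invented.
identify_response = json.loads("""
[
  {
    "faceId": "abc123",
    "candidates": [
      {"personId": "person-001", "confidence": 0.92}
    ]
  }
]
""")


def best_match(result):
    """Return (personId, confidence) of the top candidate, or None."""
    candidates = result.get("candidates", [])
    if not candidates:
        return None
    top = max(candidates, key=lambda c: c["confidence"])
    return top["personId"], top["confidence"]


matches = {r["faceId"]: best_match(r) for r in identify_response}
print(matches)
```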

Pricing and Limits

The Face API is priced per transaction. Each API call counts as one transaction regardless of how many faces are in the image. However, there is a limit on the number of faces returned per image (default 100). For attribute extraction, the same transaction cost applies. The exam may test that attribute extraction does not incur additional cost beyond the detection call.

Regional Availability

The Face API is available in many Azure regions. However, some features like emotion detection may not be available in all regions due to compliance. For the exam, know that the Face API is generally available in West US, East US, West Europe, Southeast Asia, and others.

Responsible AI Considerations

Microsoft has retired the Face API capabilities that infer emotional states because of privacy and bias concerns. The retirement was announced in June 2022: new customers lost access at that point, and existing customers had until June 30, 2023 to stop using them. The AI-900 exam reflects this: you should know that emotion detection is a sensitive capability and that Microsoft restricts its use. However, the exam still tests the technical understanding of how it works.

Common Pitfalls

Not specifying attributes: If you don't include returnFaceAttributes, you get only faceId and rectangle.

Assuming attributes are always returned: Attributes may be missing if the face is too small, blurred, or occluded.

Confusing emotion with sentiment: Emotion is from facial expressions; sentiment analysis is from text.

Thinking age is precise: Age is an estimate; the exam may ask if it's exact (no).

Walk-Through

Create Face API Resource

In the Azure portal, create a Face resource (a Cognitive Services resource of type 'Face'). Choose a pricing tier (F0 for free, S0 for standard). Note the endpoint and subscription key; every Face API call needs both. For the exam, know that a dedicated Face resource is the usual choice, although a generic multi-service Cognitive Services resource also works for Face calls.

Prepare Image Input

The image must be JPEG, PNG, GIF (first frame only), or BMP, and the file can be at most 6 MB. The smallest face the API can detect is 36x36 pixels. The image can be provided as a URL or as binary data in the request body. The API will detect up to 100 faces per image. For attribute extraction, faces should be at least 200x200 pixels for best accuracy.
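
Checking an image locally before uploading avoids spending a transaction on a request that will be rejected. The following is a minimal sketch under stated assumptions: `check_image` is a hypothetical helper, it recognises only the JPEG, PNG, GIF, and BMP file signatures, and the size limit is passed in by the caller rather than hard-coded, so take the value from the current service documentation.

```python
# File signatures ("magic bytes") for the formats this sketch recognises.
SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
    "bmp": (b"BM",),
}


def sniff_format(data):
    """Return the format name whose signature matches, or None."""
    for name, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    return None


def check_image(data, max_bytes):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if sniff_format(data) is None:
        problems.append("unrecognised image format")
    if len(data) > max_bytes:
        problems.append(f"image is {len(data)} bytes, limit is {max_bytes}")
    return problems


# Fake payloads and a tiny limit, purely for illustration.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 100
print(check_image(fake_png, max_bytes=1024))        # []
print(check_image(b"not an image", max_bytes=8))
```

The sketch inspects only the first bytes and the total length; it does not decode the image, so it cannot tell you whether the faces inside are large enough for reliable attribute extraction.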

Call Detect with Attributes

Send a POST request to the detect endpoint with the required parameters. The URL format is: `https://{endpoint}/face/v1.0/detect`. Include query parameters: `returnFaceId` (true/false), `returnFaceLandmarks` (true/false), and `returnFaceAttributes` (comma-separated list of attributes you want). For example: `returnFaceAttributes=age,gender,emotion`. The request header must include `Ocp-Apim-Subscription-Key` and `Content-Type`. The body contains the image URL or binary data.

Parse Response JSON

The API returns a JSON array. Each object has a faceId (if requested), faceRectangle (top, left, width, height), and faceAttributes (if requested). The emotion object contains eight confidence scores. The sum of these scores is approximately 1. The highest score indicates the predicted emotion. For example, if happiness is 0.95, the person is likely happy. The exam may ask you to interpret such output.
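
As a sketch of that parsing step, the code below reads a hand-written response containing two faces, one of which has no `faceAttributes`, and reduces each to a short summary. The helper name `summarise` and the second face (`def456`) are invented for illustration.

```python
import json

# Hand-written sample in the shape described above; not real service output.
sample_response = """
[
  {
    "faceId": "abc123",
    "faceRectangle": {"top": 100, "left": 200, "width": 150, "height": 150},
    "faceAttributes": {
      "age": 30.5,
      "emotion": {"anger": 0.001, "contempt": 0.002, "disgust": 0.001,
                  "fear": 0.003, "happiness": 0.95, "neutral": 0.03,
                  "sadness": 0.01, "surprise": 0.002}
    }
  },
  {
    "faceId": "def456",
    "faceRectangle": {"top": 40, "left": 20, "width": 60, "height": 60}
  }
]
"""


def summarise(face):
    """Reduce one face object to its ID, bounding box, age, and top emotion."""
    rect = face["faceRectangle"]
    attrs = face.get("faceAttributes", {})   # absent if not requested
    emotion = attrs.get("emotion") or {}
    top = max(emotion, key=emotion.get) if emotion else None
    return {
        "faceId": face.get("faceId"),
        "box": (rect["left"], rect["top"], rect["width"], rect["height"]),
        "age": attrs.get("age"),
        "top_emotion": top,
    }


faces = json.loads(sample_response)
summaries = [summarise(face) for face in faces]
for summary in summaries:
    print(summary)
```

Using `.get` for the optional parts means the same code handles responses where attributes were not requested, and an empty array (no faces detected) simply produces an empty list.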

Handle Errors and Edge Cases

Common errors: 400 if the image is invalid or too large; 401 if the subscription key is wrong; 403 if the call volume quota is used up; 429 if the rate limit is exceeded (the limit depends on the pricing tier, for example 20 transactions per minute on the free F0 tier). If no face is detected, an empty array is returned. If attributes cannot be extracted for a face (e.g., too blurred), the attribute object may be missing or have default values. The exam tests understanding of these error conditions.
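
Throttling is the one error here that is worth retrying. Below is a minimal sketch of retry with exponential backoff. The helper `call_with_retry` is hypothetical, the caller decides which status counts as throttling, and the example assumes HTTP 429 (the standard "Too Many Requests" status). A fake call stands in for the network so the sketch runs offline.

```python
import time


def call_with_retry(call, is_throttled, max_attempts=4, base_delay=1.0,
                    sleep=time.sleep):
    """Run call() until it is not throttled or attempts run out.

    call() must return a (status, payload) tuple.
    """
    for attempt in range(max_attempts):
        status, payload = call()
        if not is_throttled(status):
            return status, payload
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x ... the base delay between attempts.
            sleep(base_delay * 2 ** attempt)
    return status, payload


# Fake service: throttled twice, then succeeds with no faces detected.
responses = iter([(429, None), (429, None), (200, [])])
delays = []   # record the waits instead of actually sleeping

status, payload = call_with_retry(
    lambda: next(responses),
    is_throttled=lambda s: s == 429,   # assumed throttling status
    sleep=delays.append,
)
print(status, payload, delays)
```

A 200 response with an empty list is a success, not an error: it means the call worked and no face was found.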

What This Looks Like on the Job

Enterprise Scenario 1: Retail Customer Emotion Analysis

A large retail chain wants to gauge customer reactions to new product displays. They install cameras at eye level near displays. The Face API is called on each frame to detect faces and extract emotion attributes. The system aggregates emotion data over time to measure happiness and surprise levels. This helps the marketing team decide which displays are most engaging. In production, the system must handle high throughput—up to 30 frames per second from multiple cameras. The solution uses Azure Functions to process images asynchronously and stores results in Cosmos DB. A common misconfiguration is not setting returnFaceAttributes to include 'emotion', resulting in no emotion data. Also, the system must comply with privacy regulations; faces are not stored, only aggregated emotion counts. When misconfigured, the system might return neutral for all faces if the image quality is poor (e.g., low light). The team learned to preprocess images to enhance brightness before calling the API.
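
The aggregation step in this scenario can be sketched as follows: each frame's detect result is reduced to a count of dominant emotions, and nothing that identifies a face is kept. The function name `tally_emotions` and the frame data are invented for illustration.

```python
from collections import Counter


def tally_emotions(frames):
    """Count the dominant emotion of every face across all frames.

    frames is a list of detect results, each a list of face objects.
    Only the counts are kept; face IDs and rectangles are discarded.
    """
    counts = Counter()
    for faces in frames:
        for face in faces:
            emotion = face.get("faceAttributes", {}).get("emotion")
            if emotion:   # skip faces that came back without emotion scores
                counts[max(emotion, key=emotion.get)] += 1
    return counts


# Three illustrative frames: two faces, no faces, then two faces of which
# one has no attributes (for example because 'emotion' was not requested).
frames = [
    [
        {"faceAttributes": {"emotion": {"happiness": 0.9, "neutral": 0.1}}},
        {"faceAttributes": {"emotion": {"surprise": 0.6, "happiness": 0.4}}},
    ],
    [],
    [
        {"faceAttributes": {"emotion": {"happiness": 0.7, "neutral": 0.3}}},
        {"faceRectangle": {"top": 1, "left": 1, "width": 50, "height": 50}},
    ],
]

counts = tally_emotions(frames)
print(dict(counts))
```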

Enterprise Scenario 2: Access Control with Liveness Detection

A financial institution uses face detection to verify identity at ATMs. They use Face API to detect a face and extract attributes like glasses and facial hair to compare with a stored profile. However, they need to prevent spoofing with photos. They combine attribute extraction with liveness detection (not part of Face API) to ensure a real person. In this scenario, attribute extraction helps filter out faces that don't match the expected features (e.g., wrong gender). Performance considerations: the API must respond within 2 seconds to avoid user frustration. The team uses the S0 tier with a dedicated endpoint. A common mistake is to assume that the Face API includes liveness detection—it does not. The exam may test that liveness detection is a separate feature.

Enterprise Scenario 3: Social Media Photo Tagging

A social media platform uses Face API to automatically tag users in photos. They first detect faces and extract attributes like age and gender to suggest tags. Then they use face identification to match against friends. The platform processes millions of photos daily. They use Azure Blob Storage to store images and Azure Functions to orchestrate. They found that attribute extraction sometimes fails for faces with heavy makeup or unusual angles. They mitigated by requesting multiple attributes and using the ones with highest confidence. Cost is a major factor—each call costs money, so they cache results for identical images. The exam may ask about cost optimization: you can reduce costs by not requesting attributes you don't need.
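
Caching results for identical images can be sketched with a content hash as the cache key. The `DetectCache` class is a hypothetical in-memory illustration, and `fake_detect` stands in for the real API call so the example runs offline; a production system would more likely use a shared store.

```python
import hashlib


class DetectCache:
    """Cache detect results keyed by the SHA-256 of the image bytes."""

    def __init__(self, detect):
        self._detect = detect     # function that performs the real call
        self._results = {}
        self.calls = 0            # number of billable calls actually made

    def get(self, image_bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._results:
            self.calls += 1
            self._results[key] = self._detect(image_bytes)
        return self._results[key]


def fake_detect(image_bytes):
    """Stand-in for the API call; returns one fake face per image."""
    return [{"faceId": f"face-{len(image_bytes)}"}]


cache = DetectCache(fake_detect)
first = cache.get(b"image-one")
again = cache.get(b"image-one")     # identical bytes: served from the cache
other = cache.get(b"image-two!")    # different bytes: new call
print(cache.calls)
```

Hashing the bytes only catches exact duplicates; a resized or re-encoded copy of the same photo produces a different hash and therefore a new call.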

How AI-900 Actually Tests This

What AI-900 Tests on This Topic

Objective 3.3: "Identify computer vision capabilities" includes face detection, attribute extraction, and emotion recognition. Specific sub-objectives: describe capabilities of the Face API (detect, identify, verify, find similar), and list attributes that can be detected (age, gender, emotion, etc.). The exam expects you to know:

The exact list of detectable emotions (eight: anger, contempt, disgust, fear, happiness, neutral, sadness, surprise).

That attributes are optional and must be requested via returnFaceAttributes.

That emotion detection returns confidence scores that sum to 1.

That age is an estimate, not an exact value.

That face identification is different from attribute extraction.

Most Common Wrong Answers

Wrong emotion list: Candidates often include 'boredom' or 'excitement'. The exam uses only the eight standard emotions. The trap: an answer option may list 'surprise', 'fear', and 'disgust' (all supported) next to 'confusion', which is not.

Assuming age is exact: A question might ask if age is returned as an integer. The correct answer is a float (estimated). The wrong answer says 'exact age'.

Confusing detection with identification: A scenario describes finding a person in a crowd. Many choose 'face detection' but the correct answer is 'face identification' because it matches against a known set.

Forgetting to specify attributes: A question asks what is returned if you call detect without returnFaceAttributes. The wrong answer includes attributes; the correct answer is only faceId and rectangle.

Numbers and Terms That Appear Verbatim

27 facial landmarks

8 emotions

100 faces per image limit

6 MB maximum image size

36x36 minimum detectable face size

returnFaceAttributes parameter

faceId (string)

faceRectangle (top, left, width, height)

Edge Cases the Exam Loves

Image with multiple faces: The API returns an array of face objects. Each face has its own attributes.

No face detected: Returns an empty array.

Face partially occluded: Attributes may be missing or have lower confidence.

Emotion scores sum to 1: If a candidate thinks they sum to 100, they are wrong (they are 0-1).

How to Eliminate Wrong Answers

If a question asks which emotion is NOT supported, remember that 'contempt' and 'surprise' are supported, so neither is the answer. 'Boredom' is not supported.

If a question asks what is required to get attributes, look for 'returnFaceAttributes' in the answer choices.

If a question mentions matching a face to a database, it's identification, not detection.

If a question asks about age, remember it's an estimate (float).

Key Takeaways

Face detection is the first step; attribute extraction is optional and must be requested.

The eight emotions supported are: anger, contempt, disgust, fear, happiness, neutral, sadness, surprise.

Emotion confidence scores are between 0 and 1 and sum to 1.

Age is an estimated float, not an exact integer.

Face identification requires a PersonGroup; detection does not.

Maximum 100 faces can be detected per image.

Attributes like glasses, facial hair, and makeup are also available.

The Face API is a RESTful service; you call it via HTTP POST.

Images can be at most 6 MB; the smallest detectable face is 36x36 pixels.

Liveness detection is not part of the Face API.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Face Detection

Finds faces in an image

Returns bounding box and faceId

Does not require a PersonGroup

Can optionally return attributes

One-to-many: one image to many faces

Face Identification

Matches a detected face to known persons

Requires a PersonGroup with enrolled faces

Returns a personId and confidence

Does not return attributes (unless combined)

One-to-many: one face compared against many enrolled persons

Watch Out for These

Mistake

Face detection and face identification are the same thing.

Correct

Face detection finds faces in an image and returns bounding boxes and optional attributes. Face identification matches a detected face against a database of known persons (PersonGroup). They are separate capabilities; identification requires a prior enrollment step.

Mistake

The Face API can detect emotions from text.

Correct

Emotion detection in Face API is based on facial expressions in images. Text-based sentiment analysis is a different service (Text Analytics). The exam tests that Face API works with images only.

Mistake

Age is returned as an exact integer.

Correct

Age is returned as a floating-point number (e.g., 30.5), representing an estimate. It is not guaranteed to be accurate.

Mistake

Emotion detection returns a single emotion label.

Correct

It returns confidence scores for eight emotions. The highest score indicates the most likely emotion, but all scores are provided.

Mistake

You must use a separate API call for each attribute.

Correct

You can request multiple attributes in a single call by comma-separating them in the returnFaceAttributes parameter. This reduces cost and latency.

Frequently Asked Questions

What is the difference between face detection and face identification in Azure Face API?

Face detection locates faces in an image and returns their bounding box coordinates and an optional faceId. It can also extract attributes like age and emotion. Face identification compares a detected face against a database of known persons (PersonGroup) to find a match. Detection is a prerequisite for identification. The exam often tests this distinction: detection finds faces; identification matches them.

Which emotions can the Azure Face API detect?

The Face API can detect eight emotions: anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise. Each emotion is returned as a confidence score between 0 and 1. The sum of all scores for a face is approximately 1. The highest score indicates the predicted emotion. The exam expects you to know this exact list.

How do I get attributes like age and gender from a face?

You must include the query parameter `returnFaceAttributes` in your API request, with a comma-separated list of attributes you want. For example: `returnFaceAttributes=age,gender,emotion`. If omitted, no attributes are returned. The exam tests this configuration requirement.

Is the age returned by Face API exact?

No, the age is an estimate returned as a floating-point number. It is not guaranteed to be accurate. The exam may ask if it's exact or approximate; the correct answer is approximate.

Can I use the Face API to verify if two faces belong to the same person?

Yes, the Face API includes a Verify operation that takes two faceIds and returns a confidence score indicating whether they are the same person. This is different from identification, which matches against a group.

What image formats are supported by the Face API?

The Face API supports JPEG, PNG, GIF (first frame only), and BMP. The maximum file size is 6 MB, and the smallest detectable face is 36x36 pixels.

Does the Face API support liveness detection?

No, the Face API does not include liveness detection. Liveness detection is a separate capability that determines if a face is from a real person or a spoof. The exam may test that this is not part of the Face API.
