CCNA Describe Features Of Computer Vision Workloads On Azure Questions — Page 3 of 3

151

MCQ · medium

What is 'face attribute analysis' in Azure AI Face service?

A. Identifying the named person in a photograph using a face database

B. Estimating age, emotion, head pose, and appearance attributes from detected faces

C. Verifying whether a submitted selfie matches a government-issued ID document

D. Detecting whether a face has been digitally manipulated or deepfaked

Answer: B

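Face attribute analysis returns estimated attributes for each detected face: age, emotion, head pose, glasses. Microsoft has since retired the attributes that infer emotion and age on Responsible AI grounds, so current versions of the service no longer return them.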

Why this answer

Face attribute analysis in Azure AI Face service extracts a set of facial attributes from detected faces, including estimated age, emotion (e.g., happiness, sadness, anger), head pose (pitch, yaw, roll), and appearance traits like facial hair, glasses, and makeup. This is distinct from identification or verification tasks because it does not match faces against a database or compare two images; it simply returns metadata about the face itself.

Exam trap

The trap here is that candidates confuse 'face attribute analysis' with 'face identification' or 'face verification', because all three involve faces, but attribute analysis only extracts descriptive metadata and does not perform any matching or recognition against a database.

How to eliminate wrong answers

Option A is wrong because identifying a named person using a face database is 'face identification' (or 'face recognition'), not attribute analysis; it requires a PersonGroup and training, not just detection. Option C is wrong because verifying a selfie against a government ID is 'face verification' (a 1:1 comparison of two faces for similarity), which may be paired with liveness detection to confirm that the selfie comes from a live person; neither is attribute analysis. Option D is wrong because detecting digital manipulation or deepfakes is not part of attribute analysis; it would require separate deepfake detection models, not standard attribute extraction.

152

MCQ · medium

A retail store wants to use Azure Computer Vision to count the number of people entering a store from live video feeds. Which prebuilt Azure Computer Vision capability should they use?

A. Image classification

B. Object detection

C. People Detection

D. Optical Character Recognition (OCR)

Answer: C

People Detection is a prebuilt Computer Vision feature that detects human bodies in images and provides bounding boxes, enabling accurate counting of people.

Why this answer

People Detection is a specialized prebuilt capability within Azure Computer Vision that detects people in images or video frames and returns a bounding box for each one, so counting people reduces to counting boxes. Unlike generic object detection, it is tuned for human figures across varied poses and partial occlusion, which makes it a good fit for counting store entries from live video feeds.
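
As a rough illustration of how bounding boxes turn into a count, the sketch below filters detections by confidence and counts what remains. The `Detection` class, its fields, and the 0.5 threshold are assumptions made for this example; they are not the service's actual response schema.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical person detection: a bounding box plus a confidence score."""
    x: int
    y: int
    w: int
    h: int
    confidence: float


def count_people(detections, min_confidence=0.5):
    """Count detections whose confidence meets the (assumed) threshold."""
    return sum(1 for d in detections if d.confidence >= min_confidence)


# One video frame with three candidate detections (made-up values).
frame = [
    Detection(10, 20, 50, 120, 0.92),
    Detection(200, 30, 45, 110, 0.31),   # low confidence, likely a false alarm
    Detection(320, 25, 55, 125, 0.78),
]
people_in_frame = count_people(frame)
```

Counting entries over time would additionally need tracking across frames so the same person is not counted twice; that step is outside this sketch.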

Exam trap

The trap here is that candidates often confuse 'Object Detection' with 'People Detection,' assuming the prebuilt object detection model can reliably count people, but Microsoft specifically offers People Detection as a separate, optimized API for this exact use case.

How to eliminate wrong answers

Option A is wrong because Image Classification assigns a single label to an entire image (e.g., 'store interior') and cannot locate or count multiple instances of people. Option B is wrong because generic Object Detection returns bounding boxes for many kinds of objects and is not specialized for people; it is not optimized for accurate people counting in crowded or occluded scenes. Option D is wrong because Optical Character Recognition (OCR) extracts text from images and has no capability to detect or count people.

153

MCQ · medium

An art gallery wants to build a mobile app that allows visitors to take a photo of a specific painting and receive detailed information about that artwork. The gallery has a library of high-quality images of each painting in their collection. Which Azure AI service should they use to build this identification capability?

A. Azure Custom Vision

B. Azure Computer Vision (pre-built image analysis)

C. Azure Face API

D. Azure Computer Vision (OCR)

Answer: A

Correct. Custom Vision enables you to train a custom image classifier using your own labeled images, which is exactly what the gallery needs to identify specific paintings.

Why this answer

Azure Custom Vision is the correct choice because it allows the gallery to train a custom image classification model using their library of high-quality painting images. This service enables the app to identify specific artworks from user-captured photos and return detailed information, as it is designed for custom classification scenarios where pre-built models are insufficient.

Exam trap

The trap here is that candidates confuse Azure Computer Vision's pre-built image analysis with Custom Vision, assuming the former can be customized for specific objects, but only Custom Vision supports training on custom datasets.

How to eliminate wrong answers

Option B is wrong because Azure Computer Vision (pre-built image analysis) provides general image tagging and description, but cannot be trained to recognize specific custom objects like individual paintings. Option C is wrong because Azure Face API is specialized for detecting and analyzing human faces, not for identifying artwork or objects. Option D is wrong because Azure Computer Vision (OCR) extracts text from images, which is irrelevant for identifying paintings by visual appearance.

154

MCQ · easy

What information does Azure AI Face service provide about detected faces beyond just their location?

A. Only the coordinates of the face bounding box

B. Age estimate, emotion, head pose, and other facial attributes

C. The person's name and identity from a public database

D. Only whether the face belongs to a human or not

Answer: B

Azure AI Face returns age estimates, detected emotion, glasses type, head pose, and other attributes alongside the face location.

Why this answer

Azure AI Face service can extract a wide range of facial attributes beyond just the bounding box coordinates. These include age estimate, emotion (e.g., happiness, sadness, surprise), head pose (pitch, yaw, roll), facial hair, glasses, and more, making option B correct.

Exam trap

The trap here is that candidates may assume the Face service only provides basic location data (bounding box) or mistakenly think it can look up identities from public databases like social media, when in fact it requires custom enrollment for identification.

How to eliminate wrong answers

Option A is wrong because the Face service does not return only bounding box coordinates; it can return a rich set of facial attributes. Option C is wrong because the Face service does not identify a person's name or identity from a public database; it requires prior enrollment in a private PersonGroup for identification. Option D is wrong because the service does not merely classify a face as human or not; it provides detailed attributes and can also perform verification and identification.

155

MCQ · hard

A parking lot management company uses security cameras to monitor vehicles. They need to both detect the presence of license plates in an image and read the alphanumeric characters on those plates. Which Azure Computer Vision capability should they use to achieve both requirements?

A. Image Analysis (describe image and detect objects)

B. Optical Character Recognition (OCR) - Read API

C. Face API

D. Custom Vision (object detection)

Answer: B

Correct. OCR detects text regions and extracts the characters, making it suitable for both locating and reading license plates.

Why this answer

Option B (OCR - Read API) is correct because Azure's Read API is designed to detect where text appears in an image (including the text on a license plate) and to extract the alphanumeric characters from it. This meets both requirements, locating the plate text and reading its characters, in a single call, using deep-learning-based recognition models optimized for printed and handwritten text.
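
Because OCR returns every piece of text in the frame, an application usually has to pick out the plate from the other text. The sketch below shows one way to do that with a pattern match. The `ocr_lines` structure and the plate pattern are assumptions for illustration only: real plate formats vary by jurisdiction, and the service's response schema differs from this simplified list.

```python
import re

# Illustrative plate shape: 2-4 characters, optional space or hyphen,
# 2-4 characters, with at least one digit somewhere. Not a real standard.
PLATE_PATTERN = re.compile(r"^(?=.*\d)[A-Z0-9]{2,4}[ -]?[A-Z0-9]{2,4}$")


def find_plates(ocr_lines):
    """Return the OCR lines whose text looks like a license plate."""
    return [
        line for line in ocr_lines
        if PLATE_PATTERN.match(line["text"].strip().upper())
    ]


# Hypothetical OCR output: recognized text plus a bounding box (x, y, w, h).
ocr_lines = [
    {"text": "PARKING LEVEL 2", "box": (5, 5, 300, 40)},
    {"text": "abc 1234", "box": (120, 210, 90, 30)},
    {"text": "EXIT", "box": (400, 10, 60, 25)},
]
plates = find_plates(ocr_lines)
```

Each match keeps its bounding box, so the same result answers both "where is the plate" and "what does it say".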

Exam trap

The trap here is that candidates confuse object detection (which can locate a license plate) with OCR (which can both locate and read the text), leading them to pick Custom Vision or Image Analysis instead of the Read API.

How to eliminate wrong answers

Option A is wrong because Image Analysis (describe image and detect objects) can identify objects like a car or a license plate region, but it does not extract the alphanumeric characters from the plate; it only provides object labels and bounding boxes. Option C is wrong because Face API is specialized for detecting, analyzing, and recognizing human faces, not license plates or text. Option D is wrong because Custom Vision (object detection) can be trained to detect license plates as objects, but it does not natively read the alphanumeric characters on the plate; you would need a separate OCR step to extract the text.

156

MCQ · medium

What is semantic segmentation in computer vision?

A. Detecting the boundaries of objects using rectangular boxes

B. Classifying each pixel in an image into a semantic category

C. Generating natural language descriptions of images

D. Extracting text from images using OCR

Answer: B

Semantic segmentation assigns a class label to every pixel, providing detailed scene understanding at pixel level.

Why this answer

Semantic segmentation is a computer vision task that assigns a class label to every single pixel in an image, effectively partitioning the image into regions that correspond to different semantic categories (e.g., road, car, pedestrian). This is distinct from object detection, which only provides bounding boxes around objects, and from image captioning or OCR, which operate at a higher or different level of abstraction.
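
The difference between a pixel-level mask and a bounding box can be seen on a toy example. The 4x4 label grid below is invented for illustration; a real segmentation model would produce a much larger mask, usually with integer class IDs rather than strings.

```python
from collections import Counter

# Toy segmentation output: one class label per pixel (made-up values).
mask = [
    ["sky",  "sky",  "sky",  "sky"],
    ["sky",  "tree", "sky",  "sky"],
    ["tree", "tree", "tree", "rock"],
    ["road", "tree", "road", "rock"],
]


def pixel_counts(mask):
    """Count how many pixels carry each class label."""
    return Counter(label for row in mask for label in row)


def bounding_box(mask, label):
    """Smallest (top, left, bottom, right) box containing every pixel of `label`."""
    coords = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value == label
    ]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


counts = pixel_counts(mask)
tree_box = bounding_box(mask, "tree")
box_area = (tree_box[2] - tree_box[0] + 1) * (tree_box[3] - tree_box[1] + 1)
```

Here the tree occupies 5 pixels, but its bounding box covers 9, so the box alone overstates the object's extent by including sky and road pixels.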

Exam trap

The trap here is that candidates often confuse semantic segmentation with object detection (Option A) because both involve identifying objects, but segmentation requires pixel-level precision rather than bounding boxes.

How to eliminate wrong answers

Option A is wrong because detecting boundaries of objects using rectangular boxes describes object detection, not semantic segmentation, which operates at the pixel level rather than with bounding boxes. Option C is wrong because generating natural language descriptions of images is image captioning, a different computer vision task that produces text, not pixel-level classification. Option D is wrong because extracting text from images using OCR is optical character recognition, which focuses on text extraction, not pixel-wise semantic labeling.

157

MCQ · medium

What does Azure AI Vision's 'smart crops' feature do?

A. Identifies agricultural crops in satellite imagery

B. Identifies the most important region for optimal thumbnail cropping at any aspect ratio

C. Removes unwanted background elements from images

D. Detects when an image has been cropped or edited

Answer: B

Smart crops analyzes visual saliency and returns optimal bounding boxes for cropping images to generate compelling thumbnails.

Why this answer

Azure AI Vision's smart crops feature uses AI to identify the most important region of an image and then crops it to any specified aspect ratio while keeping that region in focus. This is particularly useful for generating thumbnails that maintain visual context across different display sizes, such as social media previews or responsive web design.
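
The geometry behind aspect-ratio cropping can be sketched independently of the service. The function below is not Azure's algorithm; it assumes the important region is already known (the hypothetical `roi` argument) and only shows how a crop of a requested aspect ratio can be centered on that region while staying inside the image.

```python
def crop_for_aspect(image_w, image_h, roi, aspect):
    """Return (x, y, w, h) of the largest crop with the given aspect ratio
    (width / height), centered on the region of interest where possible.

    roi is (x, y, w, h) of the important region; it is an input here,
    whereas a real smart-crop feature would estimate it from the image.
    """
    # Largest crop of the requested shape that still fits in the image.
    crop_w = min(image_w, image_h * aspect)
    crop_h = crop_w / aspect

    # Center of the region of interest.
    cx = roi[0] + roi[2] / 2
    cy = roi[1] + roi[3] / 2

    # Center the crop on the region, then clamp it to the image bounds.
    x = min(max(cx - crop_w / 2, 0), image_w - crop_w)
    y = min(max(cy - crop_h / 2, 0), image_h - crop_h)
    return (round(x), round(y), round(crop_w), round(crop_h))


# A 1600x900 image whose subject sits toward the right-hand side.
subject = (1100, 200, 300, 400)
square_crop = crop_for_aspect(1600, 900, subject, aspect=1.0)
portrait_crop = crop_for_aspect(1600, 900, subject, aspect=0.5)
```

Both crops keep the subject in frame even though their shapes differ, which is the property that makes one analysis reusable for several thumbnail sizes.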

Exam trap

The trap here is that candidates confuse 'smart crops' with general image editing features like background removal or editing detection, but the key differentiator is that smart crops specifically focuses on preserving the most important region when resizing to different aspect ratios.

How to eliminate wrong answers

Option A is wrong because smart crops has nothing to do with agriculture; identifying crops in satellite imagery would call for a custom-trained model, not the smart crops feature. Option C is wrong because removing unwanted background elements is a separate capability (background removal, a form of segmentation); smart cropping keeps a rectangular region around the most important content rather than deleting the background. Option D is wrong because detecting whether an image has been cropped or edited is not a feature of Azure AI Vision; smart crops proposes new crop regions but does not analyze images for prior editing.

158

MCQ · easy

A museum wants to create an interactive exhibit where visitors can take a photo of a painting. The system should then generate a descriptive caption (e.g., 'A woman with a pearl earring') and classify the painting as either a portrait or landscape. Which Azure Computer Vision capability should they use without needing to train a custom model?

A. Custom Vision

B. Image Analysis

C. Face Detection

D. Optical Character Recognition (OCR)

Answer: B

Azure Image Analysis prebuilt model can describe image content in natural language and categorize images into various categories, including portrait and landscape.

Why this answer

Image Analysis in Azure Computer Vision provides pre-built capabilities for extracting rich information from images, including generating human-readable captions (via the 'describe' operation) and classifying images into categories like 'portrait' or 'landscape' without requiring any custom training. This directly matches the museum's need for both caption generation and portrait/landscape classification using a pre-trained model.

Exam trap

The trap here is that candidates often confuse Custom Vision (which requires training) with Image Analysis (which is pre-built), or mistakenly think Face Detection or OCR can generate descriptive captions, when in fact they are specialized for different tasks.

How to eliminate wrong answers

Option A is wrong because Custom Vision requires training a custom model with labeled images to recognize specific objects or scenes, which is unnecessary here since the museum needs pre-built captioning and classification. Option C is wrong because Face Detection is specialized for detecting human faces and their attributes, not for generating descriptive captions of entire paintings or classifying them as portrait/landscape. Option D is wrong because Optical Character Recognition (OCR) extracts text from images, which is irrelevant to generating captions or classifying a painting as a portrait or landscape.

159

MCQ · medium

A logistics company processes packages on an automated conveyor belt. They need to read shipping labels that are often rotated or skewed, and also detect whether a 'FRAGILE' sticker is present on the package. Which combination of Azure Computer Vision capabilities should they use?

A. OCR (Read API) and Object Detection

B. Image Classification and OCR (Read API)

C. Object Detection and Face Detection

D. Image Classification and Face Detection

Answer: A

Correct. OCR extracts text from images, even when rotated or skewed. Object Detection identifies and locates specific objects (like a 'FRAGILE' sticker) within the image.

Why this answer

The scenario requires reading rotated or skewed text from shipping labels (handled by the OCR Read API, which extracts printed and handwritten text from images, even when rotated or skewed) and detecting whether a 'FRAGILE' sticker is present (handled by Object Detection, which identifies and locates specific objects—like stickers—within an image). Option A correctly pairs these two capabilities to meet both requirements.

Exam trap

The trap here is that candidates confuse Image Classification (which labels the whole image) with Object Detection (which finds specific objects), leading them to pick Option B, thinking classification can detect a sticker, when it cannot provide location or multiple object instances.

How to eliminate wrong answers

Option B is wrong because Image Classification assigns a single label to the entire image (e.g., 'package'), but it cannot detect or locate a specific sticker like 'FRAGILE'—it lacks spatial localization. Option C is wrong because Face Detection is designed to detect human faces, not stickers or text, and is irrelevant to this logistics scenario. Option D is wrong because it combines Image Classification (which cannot detect stickers) with Face Detection (irrelevant), missing both the text-reading and sticker-detection requirements.

160

MCQ · medium

A social media platform wants to automatically detect and flag images that contain violent content or adult material before they are published. Which prebuilt Azure Computer Vision capability should they use?

A. Optical Character Recognition (OCR)

B. Object Detection

C. Image Analysis (with content moderation)

D. Background Removal

Answer: C

Image Analysis includes a content moderation feature that can detect adult, racy, and gory (violent) content. It provides a confidence score for each category.

Why this answer

Option C is correct because Azure Computer Vision's Image Analysis includes a content moderation feature that can detect adult, racy, and gory (violent) content in images and returns a confidence score for each category. This prebuilt capability is designed to flag inappropriate material before publication, making it the right choice for the social media platform's requirement.
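
A platform still has to decide what to do with the scores it gets back. The sketch below applies per-category thresholds to a set of moderation scores. The category names, score values, and thresholds are all assumptions chosen for illustration, not the service's response format or recommended settings.

```python
def categories_to_flag(scores, thresholds):
    """Return the sorted category names whose score meets its threshold."""
    return sorted(
        category
        for category, limit in thresholds.items()
        if scores.get(category, 0.0) >= limit
    )


# Hypothetical moderation scores for one uploaded image.
scores = {"adult": 0.02, "racy": 0.64, "gore": 0.91}

# Hypothetical policy: stricter on adult and gore than on racy content.
thresholds = {"adult": 0.5, "racy": 0.7, "gore": 0.5}

flagged = categories_to_flag(scores, thresholds)
hold_for_review = bool(flagged)
```

Keeping the thresholds in data rather than in code lets a moderation team tune the policy without redeploying the application.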

Exam trap

The trap here is that candidates often confuse Object Detection (which identifies objects) with content moderation (which classifies the nature of the image), leading them to pick Option B when the question specifically asks about detecting violent or adult material.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images, not violent or adult content. Option B is wrong because Object Detection identifies and locates objects within an image (e.g., people, cars) but does not classify content as violent or adult. Option D is wrong because Background Removal isolates the foreground subject from the background and has no capability to detect or moderate violent or adult material.

161

MCQ · easy

What is 'Azure Percept' (now deprecated) and what role did it play in edge AI?

A. A cloud-only AI service for high-accuracy computer vision inference

B. An edge AI hardware platform for deploying vision and speech AI models locally on devices

C. A perception layer in the Azure networking stack for monitoring packet loss

D. A service for perceiving user intent from mouse movements and keyboard patterns

Answer: B

Azure Percept enabled local edge AI — camera-based vision and audio AI running on device, reducing cloud dependency.

Why this answer

Azure Percept was a hardware and software platform designed to bring AI inference to the edge, specifically for vision and speech workloads. It included the Azure Percept DK (developer kit) with an Intel Movidius Myriad X VPU, enabling local processing of AI models without constant cloud connectivity. This made it ideal for low-latency, offline scenarios like manufacturing quality inspection or smart retail.

Exam trap

The trap here is that candidates confuse 'edge AI' with 'cloud AI' and assume Azure Percept was a cloud service, when in fact it was a hardware platform for local inference, often tested alongside the concept of 'Azure Percept Studio' for no-code model deployment.

How to eliminate wrong answers

Option A is wrong because Azure Percept was not a cloud-only service; it was an edge AI platform that could optionally sync with Azure cloud services but performed inference locally. Option C is wrong because Azure Percept has nothing to do with networking or packet loss monitoring; that describes Azure Network Watcher or similar tools. Option D is wrong because Azure Percept does not perceive user intent from mouse or keyboard patterns; that describes behavioral analytics, not an edge AI platform.

162

MCQ · easy

Which Azure AI service detects and identifies human faces in images, including attributes like age estimate and emotion?

A. Azure AI Vision

B. Azure AI Face

C. Azure AI Custom Vision

D. Azure AI Video Indexer

Answer: B

Azure AI Face detects faces in images and provides attributes like age estimate, emotion, and supports face verification.

Why this answer

Azure AI Face is the correct service because it is specifically designed to detect and identify human faces in images, and it can extract attributes such as age estimates, emotions (e.g., happiness, sadness), and facial landmarks. Unlike general-purpose image analysis, Azure AI Face uses specialized face detection models and returns face rectangles along with optional attribute data.

Exam trap

The trap here is that candidates confuse Azure AI Vision's basic face detection (which only returns bounding boxes) with Azure AI Face's specialized attribute extraction, leading them to select Azure AI Vision when the question explicitly asks for age estimate and emotion attributes.

How to eliminate wrong answers

Option A is wrong because Azure AI Vision provides general image analysis (e.g., object detection, OCR, scene description) but does not offer dedicated face attribute extraction like age or emotion; it only returns a basic face bounding box without detailed attributes. Option C is wrong because Azure AI Custom Vision is used to train custom image classification or object detection models on user-provided datasets, not for pre-built face detection with age and emotion attributes. Option D is wrong because Azure AI Video Indexer is focused on extracting insights from video content (e.g., speech transcription, scene segmentation, and face detection in video), but it is not the primary service for still-image face attribute analysis and does not provide the same granular attribute extraction as Azure AI Face.

163

MCQ · medium

What is the purpose of Azure AI Video Indexer's transcript feature?

A. To translate video subtitles into multiple languages

B. To automatically convert speech in videos to searchable text with timestamps

C. To generate written scripts for producing new videos

D. To extract text visible in video frames (on-screen text)

Answer: B

Video Indexer transcribes spoken content to text with timestamps, enabling text-based search across video content.

Why this answer

Azure AI Video Indexer's transcript feature uses automatic speech recognition (ASR) to convert spoken audio in videos into a text transcript, which is then indexed with precise timestamps for each word or phrase. This enables users to search, navigate, and analyze video content by keyword or phrase, making the video's audio content fully searchable and accessible.
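
What "searchable text with timestamps" buys you can be shown with a small keyword search. The `segments` list is a made-up stand-in for a transcript; the field names are assumptions and do not mirror Video Indexer's actual output format.

```python
def search_transcript(segments, keyword):
    """Return the start times (in seconds) of segments that mention the keyword."""
    needle = keyword.lower()
    return [seg["start"] for seg in segments if needle in seg["text"].lower()]


def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (fractions are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Hypothetical transcript: each segment has a start time and its spoken text.
segments = [
    {"start": 4.0, "text": "Welcome to the quarterly review"},
    {"start": 75.5, "text": "Revenue grew in the third quarter"},
    {"start": 3725.0, "text": "Questions about revenue forecasts"},
]

hits = search_transcript(segments, "revenue")
jump_points = [format_timestamp(t) for t in hits]
```

A player could use `jump_points` to let a viewer skip straight to each moment where the word is spoken.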

Exam trap

The trap here is that candidates often confuse the transcript feature (speech-to-text) with the OCR feature (on-screen text extraction) or with translation, because all three involve 'text' but serve fundamentally different purposes in Video Indexer's pipeline.

How to eliminate wrong answers

Option A is wrong because translation of subtitles is a separate feature in Video Indexer (the 'Translate' capability), not the core purpose of the transcript feature, which focuses on generating the original-language text from speech. Option C is wrong because the transcript feature extracts existing speech from a video; it does not generate new written scripts for producing videos, which would be a scriptwriting or content creation tool. Option D is wrong because extracting text visible in video frames (on-screen text) is handled by the OCR (optical character recognition) feature in Video Indexer, not the transcript feature, which deals exclusively with audio-derived speech.

164

MCQ · easy

What is facial recognition and what are the key responsible AI considerations for its use?

A. Facial recognition has no ethical concerns and should be deployed universally

B. Facial recognition requires ethical consideration regarding accuracy disparities, privacy, and potential for misuse

C. Facial recognition is only used for unlocking smartphones

D. Facial recognition is 100% accurate across all demographics

Answer: B

Responsible facial recognition deployment requires addressing demographic accuracy disparities, obtaining consent, protecting privacy, and preventing misuse.

Why this answer

Facial recognition is a computer vision technology that identifies or verifies individuals by analyzing facial features from images or video. The key responsible AI considerations include addressing accuracy disparities across demographic groups (e.g., higher false positive rates for certain ethnicities), ensuring privacy through data minimization and consent, and preventing misuse such as mass surveillance without oversight. Option B correctly captures these ethical imperatives, which are critical for trustworthy deployment.
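
An accuracy disparity is something that can be measured. As a minimal sketch, the code below computes a false positive rate per group from evaluation counts and reports which group fares worst. The group names and counts are invented placeholders, not results from any real system or study.

```python
def false_positive_rate(false_positives, true_negatives):
    """FPR = FP / (FP + TN): the share of non-matches wrongly reported as matches."""
    negatives = false_positives + true_negatives
    if negatives == 0:
        raise ValueError("need at least one negative example")
    return false_positives / negatives


# Placeholder evaluation counts per group (illustrative only).
counts = {
    "group_a": {"fp": 2, "tn": 998},
    "group_b": {"fp": 10, "tn": 990},
}

rates = {
    group: false_positive_rate(c["fp"], c["tn"])
    for group, c in counts.items()
}
worst_group = max(rates, key=rates.get)
```

Reporting a single overall accuracy figure would hide this gap, which is why per-group evaluation is part of responsible deployment.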

Exam trap

The trap here is that candidates may assume facial recognition is either harmless or perfectly accurate, ignoring the documented bias and privacy risks that responsible AI frameworks like Microsoft's Responsible AI Standard explicitly address.

How to eliminate wrong answers

Option A is wrong because facial recognition has significant ethical concerns, including bias, privacy violations, and potential for misuse, making universal deployment irresponsible. Option C is wrong because facial recognition is used in many applications beyond smartphone unlocking, such as security systems, identity verification, and law enforcement. Option D is wrong because facial recognition is not 100% accurate across all demographics; studies show accuracy disparities, particularly for women and people with darker skin tones, due to training data imbalances.

165

MCQ · easy

What is 'receipt analysis' in Azure AI Document Intelligence and what data does it extract?

A. Analysing customer satisfaction scores from post-purchase surveys

B. Extracting merchant name, items, prices, tax, and totals from retail receipt images

C. Verifying that a receipt matches the purchase record in a financial database

D. Detecting fraudulent receipts by comparing them to a known-good receipt database

Answer: B

Receipt analysis extracts structured financial fields from receipts — enabling automated expense management and bookkeeping.

Why this answer

Receipt analysis in Azure AI Document Intelligence is a prebuilt model designed to extract key-value pairs and line items from sales receipts. Option B correctly identifies that it extracts merchant name, items, prices, tax, and totals from retail receipt images, which is the primary function of this model.
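
The value of structured extraction is that the fields can be used directly in code. The sketch below runs a simple arithmetic check on a set of extracted fields. The `receipt` dictionary, its field names, and its values are assumptions for illustration; the prebuilt model's real output has its own schema, and a check like this is a downstream step rather than something the model performs.

```python
from decimal import Decimal

# Hypothetical fields extracted from one receipt image.
receipt = {
    "merchant_name": "Contoso Market",
    "items": [
        {"description": "Coffee beans", "price": Decimal("12.50")},
        {"description": "Filter papers", "price": Decimal("3.25")},
    ],
    "tax": Decimal("1.26"),
    "total": Decimal("17.01"),
}


def subtotal(receipt):
    """Sum of the line-item prices."""
    return sum((item["price"] for item in receipt["items"]), Decimal("0"))


def totals_consistent(receipt):
    """True when line items plus tax add up to the stated total."""
    return subtotal(receipt) + receipt["tax"] == receipt["total"]


items_sum = subtotal(receipt)
is_consistent = totals_consistent(receipt)
```

`Decimal` is used instead of `float` so that currency amounts compare exactly.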

Exam trap

The trap here is confusing the extraction of receipt data with downstream tasks like validation, fraud detection, or sentiment analysis, leading candidates to select options that describe post-processing steps rather than the core capability of the receipt analysis model.

How to eliminate wrong answers

Option A is wrong because analyzing customer satisfaction scores from post-purchase surveys is a text analytics or sentiment analysis task, not a document intelligence feature for structured data extraction from receipts. Option C is wrong because verifying a receipt against a financial database is a reconciliation or validation process, not a core extraction capability of the receipt analysis model. Option D is wrong because detecting fraudulent receipts by comparison to a known-good database is a fraud detection scenario, not a feature of the receipt analysis model, which focuses on extracting data rather than verifying authenticity.

166

MCQ · easy

A retail company wants to build a solution that automatically reads the printed text on product labels to update inventory records. The labels contain alphanumeric characters and are in various fonts and sizes. Which Azure Cognitive Service should they use?

A. Azure Face Service

B. Azure Form Recognizer

C. Azure Computer Vision - OCR

D. Azure Video Indexer

Answer: C

Computer Vision's OCR (Optical Character Recognition) reads printed and handwritten text from images, making it ideal for product label text extraction.

Why this answer

Azure Computer Vision's OCR (Optical Character Recognition) API is specifically designed to extract printed text from images, handling various fonts, sizes, and alphanumeric characters. This makes it the ideal choice for reading product labels to update inventory records, as it can process the diverse label formats commonly found in retail environments.

Exam trap

The trap here is that candidates may confuse Azure Form Recognizer (which includes OCR capabilities) with the simpler Computer Vision OCR service, but Form Recognizer is designed for structured document extraction, not general-purpose text reading from labels.

How to eliminate wrong answers

Option A is wrong because Azure Face Service is designed for detecting and analyzing human faces (e.g., facial attributes, emotions, identification), not for reading printed text on labels. Option B is wrong because Azure Form Recognizer is optimized for extracting structured data from forms and documents (e.g., invoices, receipts) using prebuilt or custom models, but it is overkill and less efficient for simple printed text extraction from labels; it relies on OCR as a subcomponent but adds unnecessary complexity for this use case. Option D is wrong because Azure Video Indexer is used for analyzing video content (e.g., speech transcription, scene detection, face recognition), not for extracting printed text from static images of labels.

167

MCQ · easy

What is 'image classification' in Azure AI Custom Vision?

A. Organising image files into folders on Azure Blob Storage by date

B. Assigning a category label to an entire image based on its dominant visual content

C. Converting colour images to black and white for accessibility purposes

D. Sorting images by their file size and resolution metadata

Answer: B

Image classification labels the whole image (cat/dog/car) — simpler than object detection, which locates specific instances within the image.

Why this answer

Image classification in Azure AI Custom Vision involves training a model to assign a single category label (e.g., 'dog', 'cat') to an entire image based on its dominant visual content. This is a supervised learning task where the model learns from labeled images to predict the most likely class for new, unseen images. Option B correctly describes this core functionality.
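
A classifier typically returns a probability for each label, and the application picks the most likely one. The sketch below shows that final step. The `predictions` dictionary and the optional `min_probability` cut-off are assumptions for illustration, not Custom Vision's response format.

```python
def top_label(predictions, min_probability=0.0):
    """Return the most probable label, or None if it falls below the cut-off."""
    label, probability = max(predictions.items(), key=lambda item: item[1])
    return label if probability >= min_probability else None


# Hypothetical classifier output for one image: label -> probability.
predictions = {"cat": 0.08, "dog": 0.89, "car": 0.03}

best = top_label(predictions)
strict = top_label(predictions, min_probability=0.95)
```

Note that the result is one label for the whole image; nothing in it says where in the image the dog appears, which is what separates classification from object detection.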

Exam trap

The trap here is that candidates may confuse image classification with object detection (which identifies multiple objects and their locations) or with simple image processing tasks like filtering or sorting, leading them to pick options that describe non-AI operations.

How to eliminate wrong answers

Option A is wrong because organizing image files into folders on Azure Blob Storage by date is a storage management task, not a computer vision AI workload; it does not involve any model training or inference. Option C is wrong because converting color images to black and white is a simple image processing operation (e.g., using an image library such as OpenCV), not a classification task that assigns semantic labels. Option D is wrong because sorting images by file size and resolution metadata is a file system or data preprocessing step, not a machine learning classification process that identifies visual content.

168

MCQ · medium

An autonomous drone needs to navigate a forest by identifying individual trees, including their exact shape and boundaries, to avoid colliding with branches. The drone also needs to distinguish between trees and other objects like rocks. Which Azure Computer Vision capability is best suited for this requirement?

A. Image classification

B. Object detection

C. Semantic segmentation

D. Optical character recognition (OCR)

Answer: C

Semantic segmentation labels each pixel with a class (e.g., 'tree', 'rock', 'sky'). This provides the precise shape and boundaries needed for collision avoidance.

Why this answer

Semantic segmentation is the correct choice because it classifies every pixel in an image, assigning each pixel to a specific class (e.g., 'tree', 'rock', 'branch'). This pixel-level precision allows the drone to identify the exact shape and boundaries of individual trees, which is essential for collision avoidance in a forest environment.

Exam trap

The trap here is that candidates confuse object detection (bounding boxes) with semantic segmentation (pixel-level masks), assuming bounding boxes provide enough detail for precise boundary avoidance, but the question explicitly requires 'exact shape and boundaries,' which only pixel-level segmentation can deliver.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label to the entire image, not individual objects or their boundaries, so it cannot provide the per-pixel detail needed to navigate around branches. Option B is wrong because object detection draws bounding boxes around objects, which gives approximate locations but not the precise shape or boundary of each tree, making it insufficient for avoiding fine branches. Option D is wrong because optical character recognition (OCR) extracts text from images, which is irrelevant to identifying trees, rocks, or other natural objects.
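
The difference between a bounding box and a pixel mask can be sketched with a toy grid. The `MASK` layout and the helper names below are made up for illustration; a real segmentation model would return a much larger per-pixel class map.

```python
# Hypothetical 4x5 segmentation mask: T = tree, R = rock, . = free space.
MASK = [
    "..T..",
    ".TTT.",
    "..T.R",
    "..T.R",
]


def bounding_box(mask, cls):
    """Smallest (top, left, bottom, right) box containing every pixel of cls."""
    cells = [(r, c) for r, row in enumerate(mask)
             for c, value in enumerate(row) if value == cls]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


def box_area(box):
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)


def pixel_count(mask, cls):
    """Number of pixels the mask actually assigns to cls."""
    return sum(row.count(cls) for row in mask)


box = bounding_box(MASK, "T")
print("box cells:", box_area(box), "tree pixels:", pixel_count(MASK, "T"))
```

In this toy grid the box covers 12 cells but only 6 of them are tree, so a planner that relied on the box alone would treat free space as an obstacle.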

169

MCQmedium

What is 'quality control' computer vision and how is it used in manufacturing?

A.Monitoring the quality of AI model outputs to ensure they meet accuracy standards

B.Detecting manufacturing defects at production line speeds with consistent accuracy

C.Verifying that factory video surveillance cameras meet quality standards

D.Controlling the quality of training images used to build computer vision models

AnswerB

QC vision identifies cracks, scratches, and assembly errors at speed — replacing inconsistent manual inspection with consistent AI.

Why this answer

Quality control in computer vision refers to using AI models to inspect products on a manufacturing line, detecting defects such as scratches, dents, or misalignments at high speed. Azure Custom Vision or Azure Computer Vision can be trained on labeled images of good and defective items to perform real-time inference, giving more consistent results than manual visual inspection. This directly addresses the need for automated, scalable defect detection in production environments.

Exam trap

The trap here is that candidates confuse 'quality control' of the AI model itself (Option A) with using computer vision to perform quality control on physical products, which is the core manufacturing use case.

How to eliminate wrong answers

Option A is wrong because it describes monitoring AI model output accuracy, which is a model governance or MLOps task, not the application of computer vision for physical product inspection in manufacturing. Option C is wrong because it confuses the quality of surveillance camera hardware with the computer vision workload used to inspect manufactured items; the question is about using vision AI for defect detection, not verifying camera specs. Option D is wrong because it refers to curating training data quality, which is a prerequisite for building models, not the operational use of computer vision for quality control on the factory floor.

170

MCQmedium

A brand monitoring company wants to automatically detect the presence of specific logos (e.g., Apple, Coca-Cola) in social media images. The logos can appear in various orientations and sizes within the image. Which Azure Computer Vision capability is specifically designed to identify popular brands from their logos?

A.Image Classification

B.Object Detection

C.Brand Detection

D.Optical Character Recognition

AnswerC

Brand detection is a built-in feature of Azure Computer Vision that identifies thousands of global brands from their logos, handling variations in orientation and size.

Why this answer

Brand Detection is a specialized Azure Computer Vision capability that uses a pre-trained model to identify thousands of global brands from their logos in images. It is specifically designed to handle variations in logo orientation, size, and placement, making it the correct choice for this scenario.

Exam trap

The trap here is that candidates often confuse Object Detection (which finds generic objects) with Brand Detection (which is a specialized, pre-trained subset for logos), leading them to select Object Detection because it also uses bounding boxes.

How to eliminate wrong answers

Option A is wrong because Image Classification assigns a single label to the entire image (e.g., 'soda can') but does not locate or identify specific brand logos within the image. Option B is wrong because Object Detection identifies and locates generic objects (e.g., 'bottle', 'car') using bounding boxes, but it is not pre-trained to recognize specific brand logos like Apple or Coca-Cola. Option D is wrong because Optical Character Recognition (OCR) extracts printed or handwritten text from images, not visual logos or brand symbols.

171

MCQeasy

A manufacturing company uses overhead cameras on an assembly line to check that each part is present in the correct location on a circuit board. The system must not only confirm the part is there but also draw a box around each part to show its exact position. Which Azure Computer Vision capability should they use?

A.Optical Character Recognition (OCR)

B.Image Classification

C.Object Detection

D.Face Detection

AnswerC

Object Detection identifies objects and returns their bounding box coordinates, making it suitable for locating each part and drawing boxes around them.

Why this answer

Object Detection is the correct capability because it not only identifies whether a specific object (like a circuit board part) is present in an image but also returns bounding box coordinates that indicate the exact location of each detected object. This meets the requirement to both confirm the part's presence and draw a box around it.

Exam trap

The trap here is that candidates confuse Image Classification (which only labels the whole image) with Object Detection (which provides per-object localization), especially when the question emphasizes both 'confirm the part is there' and 'draw a box around each part'.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) is designed to extract printed or handwritten text from images, not to detect or locate physical objects like circuit board parts. Option B is wrong because Image Classification assigns a single label to the entire image (e.g., 'circuit board with all parts') but does not provide bounding boxes or per-object localization. Option D is wrong because Face Detection is specialized for locating human faces and cannot be used to detect generic industrial parts on a circuit board.
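
One way to turn bounding boxes into a "part present in the correct location" check is to compare each detected box with the expected position using intersection over union (IoU). This is a minimal sketch: the part names, coordinates, and the 0.5 threshold are assumptions, and the detections are hard-coded rather than returned by a real detection service.

```python
def area(box):
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


def missing_parts(expected, detections, min_iou=0.5):
    """Names of expected parts with no same-label detection close enough."""
    missing = []
    for name, expected_box in expected.items():
        found = any(label == name and iou(box, expected_box) >= min_iou
                    for label, box in detections)
        if not found:
            missing.append(name)
    return missing


# Hypothetical board layout and detector output.
expected = {"capacitor": (10, 10, 30, 30), "resistor": (50, 10, 70, 20)}
detections = [("capacitor", (12, 11, 31, 30)), ("resistor", (90, 90, 110, 100))]
print(missing_parts(expected, detections))
```

Here the capacitor overlaps its expected position well, while the resistor is detected far from where it should be and is reported as missing.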

172

MCQmedium

A security company needs to identify individuals in a crowd by matching their faces against a database of known persons of interest. The system must detect faces, verify the identities, and provide a confidence score. Which Azure Computer Vision capability should they use?

A.Facial recognition

B.Optical character recognition (OCR)

C.Image classification

D.Object detection

AnswerA

Facial recognition uses face detection and matching against a known database to identify individuals, exactly as required for this security scenario.

Why this answer

Facial recognition, which Azure provides through the Azure AI Face service, detects human faces in images, matches them against a known database of persons, and returns a confidence score for each match. This directly aligns with the security company's requirement to identify individuals in a crowd by matching them against a watchlist.

Exam trap

The trap here is that candidates confuse 'facial recognition' (matching a face against a database of known identities) with 'object detection' (locating faces as objects), but only facial recognition provides the identity matching and confidence score required for this scenario.

How to eliminate wrong answers

Option B is wrong because Optical Character Recognition (OCR) extracts text from images, not faces or identities. Option C is wrong because Image Classification assigns a single label to an entire image (e.g., 'crowd'), not individual face detection and matching. Option D is wrong because Object Detection locates and labels generic objects (e.g., 'person') with bounding boxes, but does not perform identity verification or confidence-based matching against a known database.

173

MCQmedium

A manufacturing company uses cameras on an assembly line to inspect products for cosmetic defects such as scratches, dents, or color inconsistencies. They need to classify each product as 'defective' or 'non-defective' and also identify the precise region (e.g., a specific area of the product surface) that contains the defect. Which Azure Computer Vision capability should they use?

A.Image classification

B.Object detection

C.Semantic segmentation

D.Optical Character Recognition (OCR)

AnswerC

Semantic segmentation classifies every pixel, enabling the model to identify defective regions with high precision, even for irregular shapes.

Why this answer

Semantic segmentation is the correct choice because it assigns a class label (e.g., 'defective' or 'non-defective') to every pixel in the image, enabling the model to not only classify the product but also delineate the exact boundary of the defect region. This pixel-level precision is required to identify the precise area of the product surface containing the scratch, dent, or color inconsistency.

Exam trap

The trap here is that candidates often confuse object detection (bounding boxes) with semantic segmentation (pixel-level masks), assuming bounding boxes are sufficient for precise defect localization, but the question explicitly requires identifying the 'precise region' of the defect, which demands pixel-level accuracy.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label to the entire image (e.g., 'defective' or 'non-defective') but does not localize where the defect is on the product surface. Option B is wrong because object detection draws bounding boxes around objects (e.g., a product or a defect) but does not provide pixel-level segmentation of the defect region; it cannot precisely outline irregular defect boundaries. Option D is wrong because Optical Character Recognition (OCR) extracts text from images and is irrelevant for detecting cosmetic defects like scratches or dents.

174

MCQmedium

What is 'wildlife monitoring' as a computer vision application and what Azure services power it?

A.CCTV monitoring of wildlife parks to ensure visitor safety from animal encounters

B.Using computer vision to identify species, count populations, and track animals from camera trap images

C.Real-time video monitoring of endangered animal exhibits in zoos for welfare compliance

D.AI-powered smart thermostats that monitor and adapt wildlife sanctuary temperatures

AnswerB

Wildlife AI classifies species and tracks individuals from camera traps — enabling conservation monitoring at scales impossible for humans alone.

Why this answer

Option B is correct because 'wildlife monitoring' in the context of computer vision specifically refers to using AI to automatically analyze camera trap images to identify species, count populations, and track animal movements. Azure services such as Custom Vision (for training species-specific classifiers) and Computer Vision (for image analysis) power this by processing images captured in the field, enabling conservationists to gather data without manual review.

Exam trap

The trap here is that candidates confuse general surveillance or IoT applications with the specific computer vision task of species identification from static images, leading them to pick options that involve real-time video or environmental control rather than image analysis.

How to eliminate wrong answers

Option A is wrong because it describes a safety monitoring use case (visitor safety from animal encounters), which is a form of surveillance, not the ecological research application of wildlife monitoring that focuses on species identification and population counting. Option C is wrong because it describes real-time video monitoring of zoo exhibits for welfare compliance, which is a controlled, captive environment use case, not the typical remote, camera-trap-based wildlife monitoring in natural habitats. Option D is wrong because it describes smart thermostats for temperature control, which is an IoT/home automation application, not a computer vision workload — it has no image or video analysis component.

175

MCQhard

A security company needs to analyze live video feeds from multiple cameras to detect specific objects (e.g., vehicles, people) and also read license plate numbers from vehicles. Which combination of Azure Computer Vision capabilities should they use?

A.Object detection and Optical Character Recognition

B.Image analysis and face detection

C.Semantic segmentation and image captioning

D.Spatial analysis and image classification

AnswerA

Object detection finds and locates vehicles/people, and OCR reads the text on license plates, fulfilling both requirements.

Why this answer

Option A is correct because the scenario requires two distinct capabilities: detecting specific objects (vehicles, people) in live video feeds, which is handled by Azure Computer Vision's Object Detection feature, and reading license plate numbers, which requires Optical Character Recognition (OCR). Object detection identifies and locates objects within an image or video frame, while OCR extracts text from images, making this combination ideal for the use case.

Exam trap

The trap here is that candidates may confuse Image Analysis (which provides tags and descriptions) with Object Detection, or assume Face Detection can be generalized to other objects, leading them to choose Option B instead of the correct combination of Object Detection and OCR.

How to eliminate wrong answers

Option B is wrong because Image Analysis provides general content descriptions and tags, but not precise object localization, and Face Detection is limited to human faces, not vehicles or license plates. Option C is wrong because Semantic Segmentation classifies every pixel in an image into categories (e.g., road, sky) but does not detect specific objects or read text, and Image Captioning generates descriptive sentences, not object detection or OCR. Option D is wrong because Spatial Analysis analyzes people movement and interactions in a space, not object detection or text extraction, and Image Classification assigns a single label to an entire image, not multiple objects or license plate numbers.

176

MCQmedium

A retail company wants to automatically analyze in-store video footage to count the number of customers entering and exiting through different doors. Which Azure Computer Vision capability should they use?

A.Optical Character Recognition (OCR)

B.Image classification

C.Object detection

D.Face detection

AnswerC

Object detection identifies and localizes multiple objects (e.g., persons) in a scene, enabling counting and movement tracking.

Why this answer

Object detection is the correct capability because it can identify and locate multiple instances of people within a video frame, drawing bounding boxes around each person. With those per-frame boxes, the system can follow individuals across frames and count them as they cross virtual lines at doorways, distinguishing between entering and exiting movements. OCR and image classification do not locate people at all, and face detection locates only faces, so it misses anyone whose face is not visible.

Exam trap

The trap here is that candidates confuse face detection with person detection, assuming that counting people requires detecting faces, but face detection fails when faces are not visible, whereas object detection with the 'person' class works on full bodies regardless of orientation.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images, not people or objects, and cannot count customers entering or exiting doors. Option B is wrong because image classification assigns a single label to an entire image (e.g., 'crowded store'), but cannot detect multiple individual objects or their positions to perform counting. Option D is wrong because face detection identifies and locates human faces, not full bodies, and would miss customers whose faces are not visible (e.g., from behind or at a distance), making it unreliable for counting all entries and exits.
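
The virtual-line counting described above can be sketched as follows. The example assumes that some tracking step has already assigned a stable ID to each person and recorded the vertical centre of their bounding box per frame; the track data, the line position, and the direction convention are all hypothetical.

```python
def count_crossings(tracks, line_y):
    """Count line crossings from per-person sequences of box-centre y values.

    Moving from above the line (smaller y) to on or below it counts as an
    entry; the reverse counts as an exit. This convention is an assumption.
    """
    entered = exited = 0
    for ys in tracks.values():
        for prev, cur in zip(ys, ys[1:]):
            if prev < line_y <= cur:
                entered += 1
            elif cur < line_y <= prev:
                exited += 1
    return entered, exited


# Hypothetical tracks: person ID -> box-centre y position in each frame.
tracks = {
    1: [80, 95, 110],   # crosses the line downwards
    2: [130, 105, 90],  # crosses the line upwards
    3: [60, 70, 80],    # never reaches the line
}
print(count_crossings(tracks, line_y=100))
```

Because the count depends on body position rather than faces, it still works when customers are seen from behind.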

177

MCQeasy

What is 'Azure Custom Vision's training iterations' and why would you train multiple iterations?

A.Iterations represent attempts to upload images before one succeeds due to network issues

B.Versioned training runs — each iteration trains on all current tagged images and can be compared and published

C.The number of times the model scans the same image for different object types

D.A pricing unit where each API call consumes one iteration from your monthly quota

AnswerB

Each iteration produces a model version — run more iterations after adding images or labels to improve the model progressively.

Why this answer

In Azure Custom Vision, a training iteration is a versioned model produced by training on the current set of tagged images. Each iteration captures the model's learned patterns at a specific point in time. Training multiple iterations allows you to compare performance as you add images, correct tags, or change the training type, then publish the best-performing iteration to a prediction endpoint for production use.

Exam trap

The trap here is confusing 'iteration' with a technical term like 'epoch' or 'inference pass,' when in Custom Vision it specifically means a versioned training run that can be compared and published.

How to eliminate wrong answers

Option A is wrong because iterations are not related to upload retries; a failed image upload is a network or client issue and does not create a training iteration. Option C is wrong because the number of times a model scans an image for object types is determined by the model architecture and inference settings, not by training iterations. Option D is wrong because iterations are not a pricing unit; Azure Custom Vision pricing is based on training hours and prediction transactions, not on a per-iteration quota.
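
Comparing iterations before publishing could look something like the sketch below. The `Iteration` class, the metric values, and the choice of F1 as the selection rule are illustrative assumptions; they are not taken from the Custom Vision SDK.

```python
from dataclasses import dataclass


@dataclass
class Iteration:
    """Hypothetical summary of one training iteration's evaluation metrics."""
    name: str
    precision: float
    recall: float

    @property
    def f1(self) -> float:
        total = self.precision + self.recall
        return 2 * self.precision * self.recall / total if total else 0.0


def best_iteration(iterations):
    """Pick the iteration with the highest F1 score (an assumed rule)."""
    return max(iterations, key=lambda it: it.f1)


iterations = [
    Iteration("Iteration 1", precision=0.80, recall=0.70),
    Iteration("Iteration 2", precision=0.90, recall=0.85),
    Iteration("Iteration 3", precision=0.95, recall=0.60),
]
print(best_iteration(iterations).name)
```

A project with different priorities might favour precision or recall alone instead of a combined score.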

178

MCQmedium

What is the Azure AI Face service's 'face verification' capability?

A.Confirming that detected faces belong to humans and not artificial representations

B.Comparing two facial images to determine if they belong to the same person

C.Verifying that facial recognition results meet accuracy requirements

D.Confirming the identity of a known person against a database of millions

AnswerB

Face verification (1:1 comparison) returns a confidence score for whether two faces are the same individual — used in identity verification.

Why this answer

Azure AI Face service's 'face verification' capability is designed to compare two facial images and determine if they belong to the same person. It returns a confidence score together with a boolean result indicating whether the faces match; applications can also apply their own threshold to the confidence score. This is distinct from identification, which matches against a larger database.

Exam trap

The trap here is that candidates confuse 'face verification' (one-to-one matching) with 'face identification' (one-to-many matching), leading them to select option D, which describes identification against a large database.

How to eliminate wrong answers

Option A is wrong because the Face service's liveness detection (not verification) is used to confirm that detected faces belong to humans and not artificial representations like photos or masks. Option C is wrong because verifying accuracy requirements is a quality assurance or validation step, not a specific API capability of the Face service. Option D is wrong because confirming the identity of a known person against a database of millions is the 'face identification' capability, which uses a PersonGroup to find the best match, not the one-to-one comparison of face verification.
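
The one-to-one versus one-to-many distinction can be illustrated with toy feature vectors and cosine similarity. This is only a conceptual sketch: the vectors, names, and threshold are invented, and the Face service does not expose its internal matching in this form.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(face_a, face_b, threshold=0.7):
    """1:1 verification: do these two faces belong to the same person?"""
    similarity = cosine(face_a, face_b)
    return similarity >= threshold, similarity


def identify(probe, gallery):
    """1:N identification: which enrolled person is the closest match?"""
    name = max(gallery, key=lambda person: cosine(probe, gallery[person]))
    return name, cosine(probe, gallery[name])


# Hypothetical two-dimensional "face vectors".
selfie, id_photo = [1.0, 0.0], [0.8, 0.6]
print(verify(selfie, id_photo))
print(identify([0.8, 0.6], {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}))
```

`verify` needs exactly two faces and no database, while `identify` needs an enrolled gallery to search.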

179

MCQhard

A retail store wants to analyze customer behavior in front of a specific product display. They need to determine how long each customer stands in front of the display and whether they pick up an item. Which Azure Computer Vision capability should they use?

A.Image Classification

B.Optical Character Recognition (OCR)

C.Object Detection

D.Spatial Analysis

AnswerD

Spatial Analysis is a computer vision capability specifically designed for analyzing people's presence, movement, and interactions within a physical space. It can measure dwell time and detect actions like a person reaching for an item, making it the correct choice for this scenario.

Why this answer

Spatial Analysis is the correct Azure Computer Vision capability because it is specifically designed to analyze people's movement, presence, and interactions within a physical space using video feeds. It can track how long a customer stands in front of a display (dwell time) and detect actions like picking up an item, by following each detected person across video frames relative to zones defined in the camera view.

Exam trap

The trap here is that candidates confuse Object Detection (which only identifies objects in a static frame) with Spatial Analysis (which tracks movement and actions over time), leading them to pick Option C because they think detecting a person and an item is sufficient, but they miss the temporal and action-based requirements.

How to eliminate wrong answers

Option A is wrong because Image Classification assigns a single label to an entire image (e.g., 'product display') but cannot track individual customer duration or detect pick-up actions. Option B is wrong because Optical Character Recognition (OCR) extracts text from images, which is irrelevant to analyzing customer behavior or physical interactions. Option C is wrong because Object Detection identifies and locates objects (e.g., products or people) in an image but does not track temporal behavior like dwell time or detect specific human actions such as picking up an item.
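
Dwell time is a temporal measure, which is why a single-frame detection is not enough. The sketch below assumes a hypothetical stream of timestamped observations for one tracked customer, each saying whether that customer was inside the display zone; the sample data and the function name are made up.

```python
def dwell_time(samples):
    """Total seconds spent in the zone.

    samples is a list of (timestamp_seconds, in_zone) pairs in time order.
    An interval counts only if the person is in the zone at both ends.
    """
    total = 0.0
    for (t0, in0), (t1, in1) in zip(samples, samples[1:]):
        if in0 and in1:
            total += t1 - t0
    return total


# Hypothetical observations for one tracked customer.
samples = [(0.0, False), (1.0, True), (2.0, True), (3.0, True), (4.0, False)]
print(dwell_time(samples))
```

Producing the `in_zone` flag per frame is the part that requires tracking the same person over time.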

180

MCQeasy

What is 'celebrity recognition' in Azure AI Vision and what are its responsible AI limitations?

A.Identifying any person by their face in a photograph using a global identity database

B.Recognising well-known public figures in images, with responsible AI access restrictions

C.Automatically tagging images with the names of all people photographed at an event

D.A feature available to all Azure customers for identifying any person in any image

AnswerB

Celebrity recognition identifies public figures but has responsible AI controls — restricted use cases, no surveillance, no private individuals.

Why this answer

Celebrity recognition in Azure AI Vision is a specialized feature that identifies well-known public figures (e.g., actors, politicians, athletes) in images. It is not a general-purpose facial identification service; instead, it relies on a curated dataset of public figures and is subject to responsible AI access restrictions, including limited availability and usage policies to prevent misuse.

Exam trap

The trap here is that candidates confuse celebrity recognition with general facial recognition or identification, assuming it can identify any person in an image, when in fact it is restricted to a curated set of public figures and has responsible AI access controls.

How to eliminate wrong answers

Option A is wrong because celebrity recognition does not use a global identity database to identify any person; it only recognizes a predefined set of public figures, not arbitrary individuals. Option C is wrong because the feature does not automatically tag all people in an image; it only identifies specific celebrities, not every person photographed. Option D is wrong because the feature is not available to all Azure customers without restrictions; it requires special approval and is governed by responsible AI guidelines to limit its use.

181

MCQeasy

What is 'Azure AI Vision's image moderation' and what content categories does it detect?

A.Moderating the resolution and quality of user-uploaded images for platform standards

B.Detecting sexually explicit (adult) and suggestive (racy) content in images with confidence scores

C.Modifying images to blur or remove inappropriate elements automatically

D.Detecting copyright violations in user-uploaded images by comparing to known copyrighted works

AnswerB

Image moderation returns adult and racy scores — enabling automatic filtering of inappropriate visual content on platforms.

Why this answer

Azure AI Vision's image moderation is specifically designed to detect sexually explicit (adult) and suggestive (racy) content in images, returning confidence scores for each category. This is a core feature of the computer vision service that helps platforms comply with content policies by classifying inappropriate visual content rather than modifying images or checking for copyright violations.

Exam trap

The trap here is that candidates often confuse Azure AI Vision's image moderation with broader content moderation services (like Azure Content Moderator) or assume it performs automatic actions like blurring, when in fact it only returns classification scores for adult and racy content.

How to eliminate wrong answers

Option A is wrong because Azure AI Vision image moderation does not assess image resolution or quality; it focuses on content classification, not technical standards. Option C is wrong because the service only detects and scores content categories; it does not automatically blur or remove elements, which would require a separate processing pipeline. Option D is wrong because copyright detection is not a feature of Azure AI Vision image moderation; the service does not compare images against a database of copyrighted works, so that check would need a separate solution.

182

MCQmedium

A security company wants to monitor a restricted area using camera feeds. The system must detect if a person is present in each video frame and draw a rectangle around each detected person. Which Azure Cognitive Services Computer Vision capability should they use?

A.Image Analysis (Describe image)

B.Object Detection

C.Optical Character Recognition (OCR)

D.Face Detection

AnswerB

Object Detection is designed to locate and classify multiple objects in an image, returning bounding box coordinates for each detected object. It can detect people among other object classes.

Why this answer

Object Detection is the correct capability because it identifies and locates objects (including people) within an image by drawing bounding boxes around each detected instance. This directly matches the requirement to detect persons in video frames and draw rectangles around them, which is a core function of the Object Detection API in Azure Cognitive Services Computer Vision.

Exam trap

The trap here is that candidates confuse Face Detection (which only finds faces) with Object Detection (which finds full persons and other objects), leading them to choose D when the requirement is to detect entire people, not just their faces.

How to eliminate wrong answers

Option A is wrong because Image Analysis (Describe image) generates a human-readable caption or tags describing the scene, but it does not provide bounding box coordinates for individual objects. Option C is wrong because Optical Character Recognition (OCR) extracts text from images, not people or objects. Option D is wrong because Face Detection specifically detects human faces and returns face rectangles, but it does not detect full bodies or persons; it would miss people whose faces are not visible or who are turned away.

183

MCQmedium

What is 'model confidence score' in Azure Custom Vision predictions?

A.The percentage of training images the model correctly labelled during training

B.A per-prediction certainty measure indicating how sure the model is about a specific classification

C.A rating of the training data quality provided by the annotation team

D.Microsoft's certification level for how well a Custom Vision model meets enterprise standards

AnswerB

Confidence scores let applications set thresholds — accepting high-confidence predictions and routing low-confidence ones for human review.

Why this answer

In Azure Custom Vision, the model confidence score is a per-prediction value (ranging from 0 to 1) that quantifies the model's certainty that a given input image belongs to a specific class. It is computed during inference based on the probability distribution output by the trained classifier, not during training. This score helps users decide whether to accept or reject a prediction based on a custom threshold.

Exam trap

The trap here is that candidates confuse training accuracy (how well the model performed on the training set) with the per-prediction confidence score, leading them to select Option A instead of recognizing that confidence is a real-time inference measure.

How to eliminate wrong answers

Option A is wrong because it describes training accuracy (the percentage of correctly labelled training images), not the per-prediction confidence score returned during inference. Option C is wrong because it confuses annotation quality metrics (e.g., inter-rater agreement) with the model's own probabilistic output; confidence score is independent of annotation team ratings. Option D is wrong because there is no Microsoft certification level for Custom Vision models; confidence score is a technical output, not a compliance or enterprise standard rating.
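
Applying a custom threshold to per-prediction confidence might look like this. The tag names, probabilities, and the 0.8 cut-off are hypothetical, and the predictions are hard-coded rather than fetched from a prediction endpoint.

```python
def route(predictions, accept_at=0.8):
    """Accept the top prediction if it is confident enough, else escalate.

    predictions is a list of (tag, probability) pairs with values in [0, 1].
    """
    tag, probability = max(predictions, key=lambda p: p[1])
    decision = "accept" if probability >= accept_at else "human_review"
    return tag, decision


print(route([("scratch", 0.93), ("ok", 0.07)]))
print(route([("scratch", 0.55), ("ok", 0.45)]))
```

The threshold is an application decision: raising it sends more images to human review, and lowering it accepts more predictions automatically.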

184

MCQmedium

What is 'spatial analysis' in Azure AI Vision?

A.Analysing the geographic distribution of Azure data centres globally

B.Analysing video to understand people's movements and interactions within physical spaces

C.Mapping pixels in an image to three-dimensional coordinates

D.Categorising images by their physical dimensions and file size

AnswerB

Spatial analysis uses video AI to count people, detect zone entry, measure density, and analyse movement patterns in real-world environments.

Why this answer

Spatial analysis in Azure AI Vision uses video analytics to detect and track people in a physical space, analyzing their movements, positions, and interactions over time. It leverages computer vision models to understand spatial relationships and patterns, such as how people move through a store or queue at a counter.

Exam trap

The trap here is that candidates confuse 'spatial' with geographic or 3D mapping concepts, when in Azure AI Vision it specifically refers to analyzing people's movements and interactions within a physical space from video feeds.

How to eliminate wrong answers

Option A is wrong because it describes the geographic distribution of Azure data centers, which is a cloud infrastructure concept, not a computer vision feature. Option C is wrong because mapping pixels to 3D coordinates is a 3D reconstruction or depth estimation task, not spatial analysis as defined in Azure AI Vision. Option D is wrong because categorizing images by physical dimensions and file size is a basic file metadata operation, unrelated to analyzing people's movements in video.


185

MCQmedium

A retail store uses security cameras to analyze customer behavior. They need to detect when a person enters a specific zone (e.g., an aisle) and count how many people are in that zone at any given time. Which Azure Computer Vision capability should they use?

A.Spatial Analysis

B.Object Detection

C.Image Classification

D.Optical Character Recognition (OCR)

AnswerA

Spatial Analysis provides real-time tracking of people in a zone, including entry detection and head counts.

Why this answer

Spatial Analysis is the correct Azure Computer Vision capability because it is specifically designed to analyze video feeds from cameras to detect people entering predefined zones, track their movement, and count occupancy in real time. This capability uses AI models to understand spatial relationships and events within a video frame, such as a person crossing a line or entering a zone, which directly matches the requirement to detect when a person enters a specific aisle and count how many people are in that zone.

Exam trap

The trap here is that candidates often confuse Object Detection with Spatial Analysis because both can detect people in a frame, but Object Detection lacks the temporal and spatial reasoning (zone/line crossing, tracking, counting) required for this scenario.

How to eliminate wrong answers

Option B (Object Detection) is wrong because it identifies and locates objects (like people) within an image or frame but does not track their movement across zones or count occupancy over time; it provides bounding boxes and labels per frame without spatial event analysis. Option C (Image Classification) is wrong because it assigns a single label to an entire image (e.g., 'aisle with people') and cannot detect multiple individuals, track their entry into a zone, or provide real-time counts. Option D (Optical Character Recognition) is wrong because it extracts text from images and is irrelevant to detecting people, tracking movement, or counting occupancy in a physical space.
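
The zone counting described above can be sketched in a few lines. This is only an illustration of the idea, not the Spatial Analysis API: the `point_in_polygon` and `zone_count` helpers, the `aisle` polygon, and the sample foot positions are all hypothetical, and a real deployment would take tracked person positions from the service's own output.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if point p lies inside the polygon."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def zone_count(positions: List[Point], zone: List[Point]) -> int:
    """Count how many person positions fall inside the zone polygon."""
    return sum(point_in_polygon(p, zone) for p in positions)

# Hypothetical aisle zone (floor-plane coordinates) and one frame of positions.
aisle = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
frame_positions = [(1.0, 1.0), (3.5, 0.5), (5.0, 1.0)]
print(zone_count(frame_positions, aisle))  # 2
```

An entry event would then be a person whose position was outside the zone in one frame and inside it in the next, which is why tracking across frames matters for this scenario.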


186

MCQhard

A medical research team wants to analyze MRI scans to identify and measure the precise boundaries of tumors. They need to assign each pixel in the image to a class (e.g., tumor, healthy tissue, background). Which Azure Computer Vision capability should they use?

A.Object Detection

B.Image Classification

C.Semantic Segmentation

D.Optical Character Recognition

AnswerC

Semantic segmentation classifies every pixel, producing a detailed map of regions (e.g., tumor boundaries), which is exactly what the research team needs.

Why this answer

Semantic segmentation assigns a class label to every pixel in an image, making it the correct choice for precisely delineating tumor boundaries in MRI scans. The output is a pixel-level mask, which lets the research team differentiate tumor, healthy tissue, and background at the finest granularity and measure the extent of each region.

Exam trap

The trap here is that candidates often confuse object detection with segmentation, assuming bounding boxes are sufficient for boundary measurement, but the exam tests that semantic segmentation provides the pixel-level precision required for medical imaging tasks.

How to eliminate wrong answers

Option A is wrong because object detection identifies and locates objects with bounding boxes, not pixel-level boundaries, so it cannot measure precise tumor edges. Option B is wrong because image classification assigns a single label to the entire image, not per-pixel classes, and thus cannot segment different tissue types within the same scan. Option D is wrong because optical character recognition extracts text from images, which is irrelevant to analyzing medical imaging data like MRI scans.
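
To make the "pixel-level mask" concrete, here is a minimal sketch of what a team might do with one once a segmentation model has produced it. The class ids, the tiny `mask`, and the `mm_per_pixel` scale are assumptions made up for illustration; no particular Azure API is implied.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical class ids, for illustration only.
BACKGROUND, HEALTHY, TUMOR = 0, 1, 2

def class_pixel_counts(mask: List[List[int]]) -> Dict[int, int]:
    """Count how many pixels were assigned to each class."""
    return dict(Counter(label for row in mask for label in row))

def class_area_mm2(mask: List[List[int]], class_id: int, mm_per_pixel: float) -> float:
    """Convert a class's pixel count into physical area, given the pixel spacing."""
    return class_pixel_counts(mask).get(class_id, 0) * mm_per_pixel ** 2

# A toy 3x4 mask; a real MRI slice would be hundreds of pixels per side.
mask = [
    [0, 0, 1, 1],
    [0, 1, 2, 2],
    [0, 1, 2, 1],
]
print(class_pixel_counts(mask))            # {0: 4, 1: 5, 2: 3}
print(class_area_mm2(mask, TUMOR, 0.5))    # 0.75
```

A bounding box around the same tumor would cover four pixels here, one of them healthy tissue, which is why boxes are too coarse for boundary measurement.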


187

MCQeasy

A retail chain wants to analyze in-store security camera feeds to count the number of customers entering the store each hour. Which Azure Computer Vision capability should they use?

A.Image classification

B.Object detection

C.Optical Character Recognition (OCR)

D.Facial recognition

AnswerB

Object detection locates and identifies multiple objects in an image, such as people. By counting the number of 'person' detections per frame, the system can estimate foot traffic.

Why this answer

Object detection is the correct capability among the options because it can identify and locate multiple instances of 'person' objects within each video frame; those detections can then be aggregated over time to estimate the number of customers entering per hour. Object detection by itself does not track individuals between frames, so avoiding double counting needs additional logic. Image classification only labels the entire image with a single category, so it cannot provide the per-object counts or locations needed for customer counting.

Exam trap

The trap here is that candidates confuse object detection with image classification, thinking that classifying an image as 'crowded' or 'empty' is sufficient for counting, when in fact object detection is required to enumerate individual instances.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label to the entire image (e.g., 'store interior'), but cannot detect or count individual objects like people. Option C is wrong because Optical Character Recognition (OCR) extracts text from images, not people or objects, so it is irrelevant for counting customers. Option D is wrong because facial recognition identifies specific individuals by their facial features, which is unnecessary and raises privacy concerns for simple customer counting; object detection with a generic 'person' class suffices.
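
Counting 'person' detections in a frame can be sketched as below. The detection dictionaries, with their `label` and `confidence` keys, are a hypothetical shape chosen for illustration rather than the exact response format of any Azure API.

```python
from typing import Dict, List

def count_people(detections: List[Dict], min_confidence: float = 0.5) -> int:
    """Count detections labelled 'person' at or above the confidence threshold."""
    return sum(
        1
        for d in detections
        if d["label"] == "person" and d["confidence"] >= min_confidence
    )

# One frame's detections (made-up values).
frame = [
    {"label": "person", "confidence": 0.92},
    {"label": "person", "confidence": 0.41},
    {"label": "shopping cart", "confidence": 0.88},
    {"label": "person", "confidence": 0.77},
]
print(count_people(frame))  # 2
```

A per-frame count like this measures how many people are visible, not how many unique customers entered; turning it into an hourly entry count requires following each person across frames.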


188

MCQeasy

What is the purpose of 'image moderation' using Azure AI Content Safety?

A.Adjusting image brightness and contrast for better display quality

B.Detecting and categorizing harmful content in images (sexual, violent, hate) for automatic content filtering

C.Verifying that images meet minimum quality standards for AI training

D.Compressing images to reduce bandwidth during content delivery

AnswerB

Image moderation returns severity scores for harmful content categories, enabling platforms to automatically filter inappropriate images.

Why this answer

Azure AI Content Safety's image moderation is designed to detect and categorize harmful content such as sexual, violent, and hate-related material within images. This enables automatic content filtering to ensure compliance with safety policies, which is a core computer vision workload for content moderation.

Exam trap

The trap here is that candidates confuse general image processing tasks (like brightness adjustment or compression) with the specific purpose of content moderation, which is solely about detecting and categorizing harmful content.

How to eliminate wrong answers

Option A is wrong because adjusting image brightness and contrast is a basic image processing task, not a purpose of content safety moderation. Option C is wrong because verifying minimum quality standards for AI training is unrelated to content safety; Azure AI Content Safety focuses on harmful content detection, not data quality. Option D is wrong because compressing images to reduce bandwidth is a storage or delivery optimization task, not a content moderation feature.
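
The "severity scores per category" idea can be turned into a filtering decision with a few lines of policy code. This sketch assumes the analysis arrives as a list of `{"category", "severity"}` entries; the sample values, the `flagged_categories` helper, and the thresholds are hypothetical and would be tuned per platform.

```python
from typing import Dict, List

def flagged_categories(
    categories_analysis: List[Dict],
    thresholds: Dict[str, int],
    default_threshold: int = 4,
) -> List[str]:
    """Return the categories whose severity meets or exceeds its threshold."""
    return [
        item["category"]
        for item in categories_analysis
        if item["severity"] >= thresholds.get(item["category"], default_threshold)
    ]

# Made-up analysis result for one uploaded image.
analysis = [
    {"category": "Hate", "severity": 0},
    {"category": "Sexual", "severity": 4},
    {"category": "Violence", "severity": 2},
]
policy = {"Sexual": 2, "Violence": 4}

flags = flagged_categories(analysis, policy)
print(flags)                                   # ['Sexual']
print("block" if flags else "allow")           # block
```

Keeping the thresholds outside the function lets a stricter policy (for example, a children's platform) reuse the same moderation output.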


189

MCQeasy

A digital art library wants to automatically generate a list of relevant keywords (e.g., 'landscape', 'portrait', 'abstract', 'nature') for each image in their collection. Which Azure Computer Vision capability should they use?

A.Optical Character Recognition (OCR)

B.Image Tagging

C.Image Captioning

D.Object Detection

AnswerB

Image Tagging automatically generates a list of relevant keywords or tags that describe the content of the image.

Why this answer

Image Tagging (B) is the correct capability because it analyzes the content of an image and returns a set of relevant keywords (tags) based on the detected objects, scenes, and concepts. This directly matches the requirement to generate a list of keywords like 'landscape', 'portrait', 'abstract', and 'nature' for each image.

Exam trap

The trap here is that candidates confuse Image Captioning (which produces a single sentence) with Image Tagging (which produces a list of keywords), or they assume Object Detection is needed because it identifies objects, but it does not return a simple keyword list.

How to eliminate wrong answers

Option A (OCR) is wrong because it extracts printed or handwritten text from images, not conceptual keywords about the image's content. Option C (Image Captioning) is wrong because it generates a single human-readable sentence describing the image, not a list of discrete keywords. Option D (Object Detection) is wrong because it identifies and locates specific objects within an image using bounding boxes, but does not produce a flat list of thematic keywords.


190

MCQmedium

What is the difference between Azure AI Vision and Azure AI Custom Vision?

A.Azure AI Vision is faster; Custom Vision is more accurate

B.Azure AI Vision offers pre-built models; Custom Vision trains custom models on your labeled images

C.Azure AI Vision analyzes only photos; Custom Vision analyzes documents

D.They are different names for the same service

AnswerB

AI Vision = ready-to-use general models; Custom Vision = train your own specialized models with your own labeled data.

Why this answer

Azure AI Vision provides pre-built, ready-to-use models for common computer vision tasks like image analysis, OCR, and facial recognition, requiring no training data. Azure AI Custom Vision, on the other hand, allows you to train custom models using your own labeled images to solve specific classification or object detection problems. This distinction makes option B correct because it accurately captures the core difference between a pre-built service and a customizable training platform.

Exam trap

The trap here is that candidates confuse 'pre-built' with 'faster' or 'more accurate,' or assume both services are interchangeable, when in fact the key differentiator is whether you need to provide your own labeled training data (Custom Vision) or can rely on Microsoft's pre-trained models (Azure AI Vision).

How to eliminate wrong answers

Option A is wrong because it falsely claims a performance trade-off; both services can be optimized for speed or accuracy depending on configuration, and the fundamental difference is not about speed versus accuracy. Option C is wrong because Azure AI Vision can analyze a wide range of visual data including photos, videos, and documents (via OCR), while Custom Vision is not limited to documents and is primarily for custom image classification and object detection. Option D is wrong because they are distinct services with different APIs, capabilities, and use cases; Azure AI Vision is a pre-built service, whereas Custom Vision is a training and deployment platform for custom models.


191

MCQeasy

What is 'form recognition' in Azure AI Document Intelligence and what types of forms does it support?

A.Recognising when a web form has been submitted by a user in a browser application

B.Extracting key-value pairs and tables from structured form documents using pre-built or custom models

C.Generating HTML forms automatically from a database schema

D.Validating that completed forms meet schema and data type requirements

AnswerB

Form recognition handles tax forms, applications, and custom business forms — returning structured JSON with extracted field values.

Why this answer

Form recognition in Azure AI Document Intelligence (formerly Form Recognizer) is a specialized service that uses optical character recognition (OCR) and machine learning to extract key-value pairs, tables, and text from structured or semi-structured documents. It supports pre-built models for common forms like invoices and receipts, as well as custom models trained on user-provided form samples. Option B correctly describes this extraction capability.

Exam trap

The trap here is that candidates confuse 'form recognition' with general OCR or web form processing, but the exam specifically tests the understanding that it extracts structured data (key-value pairs and tables) from document images or PDFs using pre-built or custom models.

How to eliminate wrong answers

Option A is wrong because it describes a client-side web form submission event, which is unrelated to Azure AI Document Intelligence's document analysis capabilities. Option C is wrong because generating HTML forms from a database schema is a web development task, not a feature of Azure's document intelligence service. Option D is wrong because validating form data against schema and data type requirements is a data validation process, not the extraction of content from scanned or digital form documents.


192

MCQmedium

What is 'medical imaging AI' and what Azure services support clinical imaging applications?

A.AI systems that replace radiologists by autonomously making all diagnoses from scans

B.AI that analyses radiology and pathology images for clinical decision support — assisted by Azure AI Health Insights

C.Storing and managing medical images in Azure Blob Storage with HIPAA compliance

D.Automatically scheduling patient appointments based on medical image analysis results

AnswerB

Medical imaging AI detects abnormalities and assists clinical workflows — with regulatory compliance requirements.

Why this answer

Medical imaging AI refers to AI systems that analyze radiology and pathology images to assist clinicians with diagnosis, treatment planning, and clinical decision support. Azure AI Health Insights provides pre-built models for clinical decision support; its Radiology Insights model, for example, draws inferences from radiology reports to support quality checks and follow-up recommendations within clinical workflows.

Exam trap

The trap here is that candidates confuse data storage or workflow automation with AI analysis, or assume AI fully replaces human radiologists, when the exam emphasizes AI as a decision-support tool that assists, not replaces, clinicians.

How to eliminate wrong answers

Option A is wrong because it incorrectly claims AI systems replace radiologists entirely; in reality, medical imaging AI is designed to augment, not replace, human expertise by providing decision support. Option C is wrong because storing images in Azure Blob Storage with HIPAA compliance is a data management task, not an AI capability for analyzing medical images. Option D is wrong because scheduling appointments based on image analysis is a workflow automation feature, not a core function of medical imaging AI, which focuses on image interpretation.


193

MCQmedium

A retail store wants to analyze customer movement patterns, such as dwell time in front of displays and foot traffic heatmaps, using existing surveillance cameras. Which Azure Computer Vision capability is most suitable?

A.Object detection

B.Optical character recognition (OCR)

C.Spatial analysis

D.Image classification

AnswerC

Spatial analysis is designed for tracking people in video, measuring dwell time, and generating insights like foot traffic patterns and heatmaps, making it ideal for retail analytics.

Why this answer

Spatial analysis is the correct choice because it is specifically designed to analyze people's presence, movement, and interactions within a physical space using video feeds. It can measure dwell time in front of displays and generate foot traffic heatmaps by tracking individuals across camera frames, which directly matches the retail store's requirements.

Exam trap

The trap here is that candidates confuse object detection (which finds objects in a single frame) with spatial analysis (which tracks movement over time across multiple frames), leading them to pick object detection for a scenario that requires temporal and spatial tracking.

How to eliminate wrong answers

Option A is wrong because object detection identifies and locates objects (e.g., products, shelves) within an image but does not track human movement patterns or measure dwell time. Option B is wrong because optical character recognition (OCR) extracts text from images, which is irrelevant to analyzing customer movement or foot traffic. Option D is wrong because image classification assigns a single label to an entire image (e.g., 'store interior') and cannot provide per-person tracking or spatial metrics like heatmaps.
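
Dwell time is a good example of why tracking over time is needed. The sketch below assumes each tracked person yields observations of the form (track id, timestamp in seconds, whether the person is in front of the display); that tuple layout and the sample data are hypothetical, not the output format of Spatial Analysis.

```python
from typing import Dict, List, Tuple

# (track_id, timestamp_seconds, in_zone)
Observation = Tuple[str, float, bool]

def dwell_times(observations: List[Observation]) -> Dict[str, float]:
    """Sum, per track, the time between consecutive in-zone observations."""
    last_seen: Dict[str, Tuple[float, bool]] = {}
    totals: Dict[str, float] = {}
    for track_id, ts, in_zone in sorted(observations, key=lambda o: o[1]):
        totals.setdefault(track_id, 0.0)
        prev = last_seen.get(track_id)
        # Only count an interval when the person was in the zone at both ends.
        if prev is not None and prev[1] and in_zone:
            totals[track_id] += ts - prev[0]
        last_seen[track_id] = (ts, in_zone)
    return totals

observations = [
    ("a", 0.0, True), ("a", 2.0, True), ("a", 5.0, True), ("a", 6.0, False),
    ("b", 1.0, False), ("b", 3.0, True), ("b", 4.0, True),
]
print(dwell_times(observations))  # {'a': 5.0, 'b': 1.0}
```

Without a stable track id per person, the intervals could not be attributed to anyone, which is what single-frame object detection lacks.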


194

MCQmedium

A manufacturing company uses cameras on an assembly line to inspect products for defects such as scratches, dents, and discoloration. They need to identify the specific type of defect and its location on each product. Which Azure Computer Vision capability should they use?

A.Image classification

B.Object detection

C.Semantic segmentation

D.Optical character recognition

AnswerB

Object detection identifies and locates multiple objects (defects) in an image by returning bounding boxes and class labels for each.

Why this answer

Object detection is the correct capability because it not only identifies the presence of defects (like scratches, dents, or discoloration) but also localizes each defect by drawing bounding boxes around them. This meets the requirement to both classify the specific defect type and report its location on the product.

Exam trap

The trap here is that candidates confuse object detection with image classification, assuming that identifying the defect type alone is sufficient, but the question explicitly requires both the type and location, which only object detection provides.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label to the entire image, such as 'defective' or 'non-defective', but cannot identify multiple defect types or their locations. Option C is wrong because semantic segmentation labels every pixel in the image with a class (e.g., 'scratch', 'dent'), which provides pixel-level masks rather than bounding boxes and is overkill for simply locating defects; it also does not distinguish between individual instances of the same defect type. Option D is wrong because optical character recognition is designed to extract text from images, not to detect physical surface defects like scratches or dents.


195

MCQmedium

What is 'Azure AI Vision's image vectorisation' and how does it enable image search?

A.Converting image files to a vectorised (lossless) format like SVG for web use

B.Converting images to semantic embedding vectors for similarity-based search and retrieval

C.Drawing vector graphics from a description of an image's contents

D.Optimising image file size by converting to the most efficient vector format

AnswerB

Image vectorisation creates ML embeddings — enabling text-to-image search and finding visually similar content via vector comparison.

Why this answer

Azure AI Vision's image vectorisation converts images into semantic embedding vectors—numerical representations that capture the visual content and meaning of an image. These vectors enable similarity-based search by allowing the system to compare the vector of a query image against a database of pre-computed image vectors, returning the most visually or semantically similar results.

Exam trap

The trap here is confusing 'vectorisation' in the context of AI embeddings with the common computing term 'vectorisation' meaning converting raster images to vector graphics (like SVG), leading candidates to pick Option A.

How to eliminate wrong answers

Option A is wrong because it describes converting images to a lossless vector format like SVG, which is a file format for scalable graphics, not a semantic embedding for search. Option C is wrong because it describes generating vector graphics from a text description, which is a generative task (like DALL-E), not the process of creating searchable embeddings from existing images. Option D is wrong because it focuses on file size optimisation by converting to an efficient vector format, which is about compression, not about creating semantic representations for similarity search.
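
Similarity search over embeddings can be illustrated with cosine similarity. The three-dimensional vectors and file names below are made up for readability; real image embeddings have hundreds or thousands of dimensions and would come from the vectorisation service, and the vectors are assumed to be non-zero.

```python
import math
from typing import Dict, List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Rank indexed images by similarity to the query vector, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

# Hypothetical pre-computed image vectors.
index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "harbor.jpg": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # e.g. the vector for a query image or text prompt

for name, score in top_k(query, index):
    print(name, round(score, 3))
```

Because text and images can be embedded into the same vector space, the same ranking step serves both image-to-image and text-to-image search.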


196

MCQhard

What is 'image generation quality' evaluation — how do you measure if a generated image is good?

A.Only image resolution and file size — higher resolution means better quality

B.Metrics like FID (image distribution similarity) and CLIP score (prompt adherence), plus human evaluation

C.Simply asking the model what score it gives its own output

D.Counting the number of objects correctly included vs. missing from the prompt

AnswerB

FID measures image realism and CLIP score measures prompt alignment; both are combined with human mean opinion scores (MOS) for a full quality assessment.

Why this answer

Option B is correct because image generation quality is evaluated using a combination of automated metrics and human judgment. FID (Fréchet Inception Distance) measures how similar the distribution of generated images is to real images, while CLIP score assesses how well the image aligns with the given text prompt. Human evaluation is also critical to capture perceptual quality that automated metrics may miss, such as aesthetic appeal and contextual coherence.

Exam trap

The trap here is that candidates may assume objective, simple metrics like resolution or object counts are sufficient, but the AI-900 exam expects an understanding that quality evaluation requires both automated metrics and human judgment.

How to eliminate wrong answers

Option A is wrong because image resolution and file size alone do not determine quality; a high-resolution image can still be blurry, distorted, or fail to match the prompt. Option C is wrong because a model's rating of its own output is not an independent measurement; it is not calibrated against real images or human judgment, so the score would be biased or meaningless. Option D is wrong because counting objects is a simplistic, rule-based approach that ignores important aspects like image realism, style, and overall composition.
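
For reference, the two automated metrics can be written out as follows. The symbols are notation introduced here, not taken from the question: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of Inception features for real and generated images, and $E_I$, $E_T$ are the CLIP embeddings of the image and the prompt. Published CLIP score variants often rescale or clip the cosine term, so the second line shows only the core quantity.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

\mathrm{CLIP\ similarity} = \cos(E_I, E_T) = \frac{E_I \cdot E_T}{\lVert E_I \rVert \, \lVert E_T \rVert}
```

A lower FID means the generated images are statistically closer to real ones, while a higher CLIP similarity means the image matches the prompt more closely.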


197

MCQmedium

A real estate agency wants to create a feature on their website that automatically crops uploaded property photos to focus on the house itself, removing excess sky, ground, or other surroundings. Which Azure Computer Vision capability should they use?

A.OCR (Optical Character Recognition)

B.Image captioning

C.Smart cropping

D.Object detection

AnswerC

Smart cropping automatically identifies the most interesting region of an image and crops it to that area, perfect for focusing on the main subject like a house.

Why this answer

Smart cropping is the correct capability because it uses AI to identify the most visually salient region of an image and automatically crops it to focus on the main subject—in this case, the house—while removing irrelevant background like sky or ground. This is distinct from generic cropping as it leverages computer vision to detect the primary object and compositionally frame it.

Exam trap

The trap here is that candidates confuse object detection with smart cropping, assuming that detecting the house with a bounding box is equivalent to cropping, but object detection only provides coordinates and does not automatically perform the intelligent, composition-aware cropping that smart cropping does.

How to eliminate wrong answers

Option A is wrong because OCR (Optical Character Recognition) extracts text from images and does not locate the main subject for cropping. Option B is wrong because image captioning generates a textual description of the image content, not a cropped region. Option D is wrong because object detection identifies and locates objects with bounding boxes but does not automatically produce a cropped image optimized for the main subject; it requires additional logic to perform the crop.
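
The "additional logic" that object detection would need can be sketched as follows: take a detected bounding box and grow it to a target aspect ratio while staying inside the image. This is not how smart cropping is implemented; `expand_to_aspect` and the sample numbers are hypothetical and only show the kind of work the built-in feature spares you.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height

def expand_to_aspect(box: Box, image_w: float, image_h: float, aspect: float) -> Box:
    """Return a crop of the given aspect ratio (width / height) centred on the box."""
    x, y, w, h = box
    # Grow the shorter dimension so that width / height == aspect.
    if w / h < aspect:
        new_w, new_h = h * aspect, h
    else:
        new_w, new_h = w, w / aspect
    # Shrink uniformly if the grown crop would not fit in the image
    # (in that case the crop may no longer contain the whole box).
    scale = min(1.0, image_w / new_w, image_h / new_h)
    new_w, new_h = new_w * scale, new_h * scale
    # Centre on the original box, then shift back inside the image bounds.
    cx, cy = x + w / 2, y + h / 2
    new_x = min(max(cx - new_w / 2, 0.0), image_w - new_w)
    new_y = min(max(cy - new_h / 2, 0.0), image_h - new_h)
    return (new_x, new_y, new_w, new_h)

# A house detected at (300, 200) sized 200x300 in a 1000x800 photo; square crop wanted.
print(expand_to_aspect((300, 200, 200, 300), 1000, 800, 1.0))
# (250.0, 200.0, 300.0, 300.0)
```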


198

MCQmedium

What is the purpose of Azure AI Vision's 'product recognition' feature?

A.Recognizing counterfeit products in supply chain images

B.Identifying retail products in images to match them to a product catalog without barcodes

C.Recognizing products mentioned in customer text reviews

D.Detecting product defects in manufacturing quality control

AnswerB

Product recognition uses visual AI to identify products from appearance, enabling cashierless checkout and inventory automation.

Why this answer

Azure AI Vision's 'product recognition' feature is designed to identify retail products in images and match them to a product catalog without relying on barcodes. It uses computer vision models trained on product images to detect and recognize items based on visual features like packaging, logos, and shape, enabling inventory management and checkout automation in retail scenarios.

Exam trap

The trap here is that candidates may confuse product recognition with other computer vision tasks like defect detection or counterfeit analysis, but Azure AI Vision's product recognition is specifically for identifying known retail products from images, not for quality control or authentication.

How to eliminate wrong answers

Option A is wrong because product recognition does not detect counterfeit products; that would require specialized anomaly detection or authentication models, not standard product recognition. Option C is wrong because product recognition works on images, not text; analyzing product mentions in text reviews is a natural language processing (NLP) task, not a computer vision feature. Option D is wrong because detecting product defects in manufacturing is a separate computer vision capability (e.g., anomaly detection or quality control), not the product recognition feature which focuses on identifying known catalog items.


199

MCQmedium

A social media platform wants to automatically review user-uploaded images to flag any that contain explicit or suggestive adult content, as well as violent imagery. Which Azure Computer Vision feature should they use?

A.Optical Character Recognition (OCR)

B.Image Analysis - Tags

C.Image Analysis - Moderate content

D.Face Detection

AnswerC

This feature returns confidence scores for adult, racy, and violent content categories, enabling automatic flagging of inappropriate images.

Why this answer

Option C is correct because the 'Moderate content' feature of Azure Computer Vision is specifically designed to detect adult, suggestive, and violent content in images. It returns Boolean flags and confidence scores for the adult, racy, and gory categories, making it the appropriate choice for automatically flagging explicit or violent user-uploaded images.

Exam trap

The trap here is that candidates often confuse 'Image Analysis - Tags' (which describes objects) with content moderation, or assume Face Detection can infer inappropriate content based on facial expressions, but neither performs explicit adult or violence detection.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images and does not evaluate imagery for adult or violent content. Option B is wrong because Image Analysis - Tags returns a set of descriptive tags (e.g., 'person', 'tree') based on objects and scenes, but does not evaluate content for adult or violent categories. Option D is wrong because Face Detection locates human faces and returns attributes such as head pose, but does not detect explicit, suggestive, or violent content.


200

MCQhard

A retail company wants to use security cameras to analyze customer flow. They need to detect when a person enters a specific store zone, count how many people are in that zone at any given time, and track the direction each person moves within the zone. Which Azure Computer Vision capability should they use?

A.Object detection

B.Spatial Analysis

C.Optical Character Recognition (OCR)

D.Semantic segmentation

AnswerB

Spatial Analysis enables real-time analysis of people movement and occupancy in defined zones, making it ideal for this requirement.

Why this answer

Spatial Analysis is the correct Azure Computer Vision capability because it is specifically designed to analyze video feeds from cameras to detect people, count them in defined zones, and track their movement direction. Unlike general object detection, Spatial Analysis provides the specialized functions for zone occupancy and person trajectory tracking required by the retail scenario.

Exam trap

The trap here is that candidates often confuse object detection (which simply finds objects) with Spatial Analysis (which adds zone-aware tracking and counting), leading them to choose the more familiar 'Object detection' option without recognizing the need for directional tracking and zone occupancy.

How to eliminate wrong answers

Option A is wrong because object detection only identifies and locates objects (e.g., people) within an image or video frame, but it does not track movement direction or count people in a specific zone over time. Option C is wrong because Optical Character Recognition (OCR) extracts text from images, which is irrelevant to analyzing customer flow or tracking people. Option D is wrong because semantic segmentation classifies every pixel in an image into categories (e.g., floor, wall, person), but it does not provide zone-based counting or directional tracking of individuals.
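
Direction of movement only exists once a person's positions are linked across frames. The sketch below assumes a track is a list of (x, y) centroids in image coordinates, where y grows downward; the `movement_direction` helper and its `min_distance` threshold are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def movement_direction(track: List[Point], min_distance: float = 1.0) -> str:
    """Classify overall movement from the first to the last point of a track."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    dx, dy = x1 - x0, y1 - y0
    # Ignore jitter: very small displacements count as standing still.
    if math.hypot(dx, dy) < min_distance:
        return "stationary"
    # Pick the dominant axis; image y grows downward.
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"

print(movement_direction([(0.0, 0.0), (2.0, 1.0), (5.0, 1.0)]))  # right
print(movement_direction([(3.0, 3.0), (3.0, 0.5)]))              # up
print(movement_direction([(1.0, 1.0), (1.2, 1.1)]))              # stationary
```

Per-frame bounding boxes from object detection contain no such track, which is the gap that zone-aware tracking fills in this scenario.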


201

MCQhard

A travel booking website wants to automatically identify famous landmarks (e.g., Eiffel Tower, Taj Mahal) in photos uploaded by users. They want to use a prebuilt Azure Computer Vision feature without custom training. Which capability should they use?

A.Image classification

B.Optical character recognition (OCR)

C.Object detection

D.Domain-specific models (Landmark detection)

AnswerD

Azure Computer Vision includes prebuilt domain-specific models for landmarks, allowing identification of famous landmarks without custom training.

Why this answer

Option D is correct because Azure Computer Vision includes prebuilt domain-specific models for landmark detection that can identify famous landmarks like the Eiffel Tower or Taj Mahal without any custom training. This capability is specifically designed to recognize well-known structures from user-uploaded photos, making it the ideal choice for the travel booking website's requirement.

Exam trap

The trap here is that candidates often confuse object detection (which locates generic objects) with domain-specific models (which are pre-trained for specialized tasks like landmark recognition), leading them to choose Option C incorrectly.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label to an entire image (e.g., 'landscape' or 'building'), but it cannot identify specific landmarks like the Eiffel Tower without custom training. Option B is wrong because optical character recognition (OCR) extracts text from images, such as signs or documents, and has no capability to recognize landmarks. Option C is wrong because object detection identifies and locates generic objects (e.g., 'person', 'car') within an image, but it does not include prebuilt models for recognizing specific landmarks without custom training.


202

MCQmedium

What is 'Azure AI Vision's colour analysis' and what information does it return?

A.Converting colour images to greyscale for accessibility or artistic purposes

B.Returning dominant colours, accent colour, and B&W detection for image theming and organisation

C.Adjusting image brightness, saturation, and contrast to optimise visual quality

D.Detecting colour-related accessibility issues in user interface designs

AnswerB

Colour analysis extracts palette information — enabling automatic UI theming, image sorting, and colour-based search.

Why this answer

Azure AI Vision's color analysis extracts color information from images to support theming and organization tasks. It returns the dominant foreground and background colors, an accent color (the most vibrant color suitable for UI theming), and a boolean flag indicating whether the image is black-and-white. This is distinct from image editing or accessibility detection.
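As a minimal sketch of how an application might consume this metadata, the snippet below reads a colour section from an already-parsed JSON response. The sample payload is hypothetical, and the field names (`dominantColorForeground`, `dominantColorBackground`, `accentColor`, `isBWImg`) are assumptions modelled on the Image Analysis response; check them against the API reference for the version in use.

```python
# Hypothetical sample shaped like the colour section of an image-analysis
# response. Field names are assumptions, not a confirmed schema.
SAMPLE_RESPONSE = {
    "color": {
        "dominantColorForeground": "Brown",
        "dominantColorBackground": "White",
        "dominantColors": ["Brown", "White"],
        "accentColor": "873B59",
        "isBWImg": False,
    }
}


def summarize_colors(response):
    """Return the colour metadata an app would use for theming or sorting."""
    color = response.get("color", {})
    accent = color.get("accentColor")
    return {
        "foreground": color.get("dominantColorForeground"),
        "background": color.get("dominantColorBackground"),
        # The accent colour is treated here as a bare hex string.
        "accent_hex": "#" + accent if accent else None,
        "is_black_and_white": bool(color.get("isBWImg", False)),
    }


summary = summarize_colors(SAMPLE_RESPONSE)
print(summary)
```

Note that the function only reads values; nothing about the image itself is modified, which is the distinction the question tests.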

Exam trap

The trap here is that candidates confuse 'color analysis' (returning metadata about colors) with 'color editing' (modifying image pixels), leading them to pick options that describe image manipulation rather than analysis.

How to eliminate wrong answers

Option A is wrong because Azure AI Vision's color analysis does not convert images to greyscale; that would be a separate image processing operation, not an analysis feature. Option C is wrong because adjusting brightness, saturation, and contrast is an image enhancement or editing task, not part of the color analysis API which only returns metadata about existing colors. Option D is wrong because color-related accessibility detection in UI designs is not a capability of Azure AI Vision's color analysis; the service focuses on analyzing images, not evaluating UI accessibility.

203

Drag & Drop (medium)

Drag and drop the steps to process text with Azure Text Analytics (Language service) into the correct order.

Why this order

Using Text Analytics involves setting up a resource, making an API call, and interpreting results.

204

Multi-Select (medium)

A company needs to extract text from scanned invoices and receipts. Which Azure services are suitable for this task? (Select all that apply.)

Select 2 answers

A. Computer Vision

B. Form Recognizer

C. Text Analytics

D. Custom Vision

Answers: A, B

Computer Vision provides OCR that extracts text from images and scanned documents, and Form Recognizer extracts text, key-value pairs, and tables from invoices and receipts.

Why this answer

Computer Vision (A) is correct because its OCR (Optical Character Recognition) capability can extract printed and handwritten text from images, including scanned invoices and receipts. Form Recognizer (B), since renamed Azure AI Document Intelligence, is correct because it is specifically designed to extract text, key-value pairs, and tables from forms and documents such as invoices and receipts, using prebuilt models. Both services can handle the task, but Form Recognizer is the more specialized choice for structured document extraction.

Exam trap

The trap here is that candidates often confuse Text Analytics with OCR capabilities, assuming it can process images, when in fact it only works on raw text input.

205

MCQ (easy)

A logistics company needs to automatically extract printed and handwritten text from scanned shipping labels. Which Azure Computer Vision capability should they use?

A. Azure Face API

B. Azure Computer Vision Read API

C. Azure Custom Vision

D. Azure Video Indexer

Answer: B

Read API performs OCR to extract printed and handwritten text from images, suitable for shipping labels.

Why this answer

The Azure Computer Vision Read API is specifically designed to extract printed and handwritten text from images and documents, such as scanned shipping labels. It uses optical character recognition (OCR) to process text in various languages and formats, making it the correct choice for this logistics scenario.
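To make the output of the Read API concrete, here is a minimal sketch that pulls text lines out of a parsed result. It assumes the result has already been fetched and decoded from JSON, and that it follows a `status` / `analyzeResult` / `readResults` / `lines` layout; that layout and the sample label text are assumptions for illustration, so verify them against the API version you call.

```python
# Hypothetical sample shaped like a completed Read operation result.
# The nesting and field names are assumptions, not a confirmed schema.
SAMPLE_READ_RESULT = {
    "status": "succeeded",
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "lines": [
                    {"text": "SHIP TO: Contoso Ltd"},
                    {"text": "Tracking no. 12345"},
                ],
            }
        ]
    },
}


def extract_lines(result):
    """Return every recognized line of text, page by page, in reading order."""
    if result.get("status") != "succeeded":
        # The operation is still running or has failed; there is no text yet.
        return []
    pages = result.get("analyzeResult", {}).get("readResults", [])
    return [line["text"] for page in pages for line in page.get("lines", [])]


for text in extract_lines(SAMPLE_READ_RESULT):
    print(text)
```

The status check reflects that text extraction runs as an asynchronous operation, so a caller typically polls until the result is ready before reading any lines.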

Exam trap

The trap here is that candidates often confuse Azure Custom Vision with OCR capabilities, assuming it can be trained for text extraction, but Custom Vision is limited to object detection and classification, not text recognition.

How to eliminate wrong answers

Option A is wrong because Azure Face API is used for detecting, recognizing, and analyzing human faces in images, not for extracting text from documents. Option C is wrong because Azure Custom Vision is a tool for training custom image classification and object detection models, not for OCR or text extraction. Option D is wrong because Azure Video Indexer is designed to extract insights from video content, such as speech transcription and scene detection, not for extracting text from static scanned images.

206

MCQ (easy)

What does Azure AI Vision's image tagging feature return?

A. A JSON file with the image's color palette in hex codes

B. A list of descriptive keywords about the image content with confidence scores

C. GPS coordinates of where the photo was taken

D. The camera settings used to capture the image

Answer: B

Image tagging returns keyword tags describing objects, scenes, activities, and colors in the image with confidence scores.

Why this answer

Azure AI Vision's image tagging feature analyzes the content of an image and returns a list of descriptive keywords (tags) along with a confidence score for each tag. This allows applications to automatically identify objects, people, scenes, and actions within the image without requiring manual labeling.
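A short sketch of how the tag list and confidence scores are typically used: keep only tags above a threshold, ordered from most to least confident. The sample payload and the `tags` / `name` / `confidence` field names are assumptions modelled on the tagging response, and the 0.5 default threshold is an arbitrary choice for illustration.

```python
# Hypothetical sample shaped like a tagging response.
SAMPLE_TAGS = {
    "tags": [
        {"name": "grass", "confidence": 0.99},
        {"name": "frisbee", "confidence": 0.41},
        {"name": "dog", "confidence": 0.93},
    ]
}


def confident_tags(response, threshold=0.5):
    """Return tag names at or above the threshold, most confident first."""
    tags = response.get("tags", [])
    kept = [t for t in tags if t["confidence"] >= threshold]
    kept.sort(key=lambda t: t["confidence"], reverse=True)
    return [t["name"] for t in kept]


print(confident_tags(SAMPLE_TAGS))
```

Raising the threshold trades recall for precision: fewer tags are returned, but each one is more likely to describe the image correctly.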

Exam trap

The trap here is that candidates confuse image tagging with other image analysis features like optical character recognition (OCR), face detection, or metadata extraction, leading them to select options that describe unrelated capabilities.

How to eliminate wrong answers

Option A is wrong because image tagging does not return color palette information; that is the job of the separate color analysis feature, which returns dominant and accent colors. Option C is wrong because GPS coordinates are metadata that might be extracted from the image file's EXIF data, whereas image tagging describes visual content, not location. Option D is wrong because camera settings (e.g., aperture, shutter speed) are also EXIF metadata and are not part of the tagging output, which only describes what is visually present in the image.

207

Matching (medium)

Match each Azure AI service tier to its description.

Concepts

Matches

Limited usage for evaluation

Production usage with pay-as-you-go

Higher throughput than S0

Single key for multiple services

Train custom image classification models

Why these pairings

Tiers define pricing and capacity for Azure AI services.

208

MCQ (hard)

What is 'pose estimation' in computer vision and what is it used for?

A. Estimating the correct posture for employees based on ergonomics guidelines

B. Detecting body keypoint positions (joints) in images to infer posture and movement

C. Determining the camera angle and position used to capture a photograph

D. Classifying whether a person is sitting or standing in an image

Answer: B

Pose estimation locates skeletal keypoints (joints) to understand body position — enabling fitness tracking, animation, and gesture recognition.

Why this answer

Pose estimation is a computer vision technique that detects and localizes keypoints (joints) on a human body in an image or video. These keypoints, such as shoulders, elbows, wrists, hips, and knees, are used to infer the body's posture, orientation, and movement. Option B correctly describes this process of detecting body keypoint positions to infer posture and movement.
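To illustrate how posture can be inferred from keypoints, the sketch below computes the angle at one joint from three 2D keypoint coordinates. It is not tied to any particular pose-estimation model or Azure API; the keypoint names and coordinates are made up for the example.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints must not coincide with the joint")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Hypothetical keypoints (x, y) as a pose model might output them.
keypoints = {"shoulder": (0.0, 0.0), "elbow": (1.0, 0.0), "wrist": (1.0, 1.0)}

elbow_angle = joint_angle(
    keypoints["shoulder"], keypoints["elbow"], keypoints["wrist"]
)
print(round(elbow_angle, 1))
```

An application such as a fitness tracker could track this angle across video frames to count repetitions or flag poor form, which is a step beyond the keypoint detection that pose estimation itself provides.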

Exam trap

The trap here is confusing human pose estimation (detecting body keypoints) with camera pose estimation (determining camera position) or with simple classification tasks like sitting/standing, leading candidates to pick options C or D.

How to eliminate wrong answers

Option A is wrong because it describes an ergonomic assessment, not a computer vision technique; pose estimation outputs keypoint coordinates, not compliance with ergonomic guidelines. Option C is wrong because determining camera angle and position is a separate task called camera pose estimation or structure from motion, not human pose estimation. Option D is wrong because classifying a person as sitting or standing is a simpler action recognition task that could be derived from pose estimation, but pose estimation itself involves detecting specific joint keypoints, not just outputting a binary state.
