
CCNA Describe Features Of Computer Vision Workloads On Azure Questions



1

MCQ · easy

What does the Azure AI Vision 'Image Analysis' capability return when analyzing an image?

A. Only the file size and dimensions of the image

B. Descriptions, objects, tags, and other semantic information about the image content

C. Only a single category label for the entire image

D. A 3D point cloud of the scene

Answer: B

Image Analysis returns natural language descriptions, detected objects, tags, categories, and other semantic information about what's in the image.

Why this answer

Azure AI Vision's Image Analysis capability uses pre-trained deep learning models to extract rich semantic information from images, including human-readable descriptions, a list of detected objects with bounding boxes, and a set of relevant tags. This goes far beyond basic metadata, making option B correct because it accurately captures the breadth of semantic outputs the service provides.

Exam trap

The trap here is that candidates confuse basic image metadata (file size, dimensions) with the semantic analysis outputs of Azure AI Vision, leading them to choose option A, or they assume the service only returns a single label (option C) because they think of simpler classification models rather than the multi-output analysis capability.

How to eliminate wrong answers

Option A is wrong because Image Analysis is not limited to basic metadata; the response may include the image's width and height, but its purpose is to return semantic information about the content. Option C is wrong because the service returns multiple tags, objects, and descriptions, not just a single category label for the entire image. Option D is wrong because Azure AI Vision does not generate 3D point clouds; that capability is associated with depth-sensing cameras or specialized 3D reconstruction services, not the 2D image analysis API.
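To make the shape of that semantic output concrete, here is a minimal Python sketch that summarizes an analysis result. The dictionary is a hand-written sample, not real service output; its field names (`captionResult`, `tagsResult`, `objectsResult`) and the `summarize` helper are assumptions for illustration and should be checked against the API version you actually call.

```python
# Hand-written sample. The field names are assumptions modeled on an
# Image Analysis style JSON response, not captured service output.
sample_result = {
    "captionResult": {"text": "a dog sitting on a couch", "confidence": 0.91},
    "tagsResult": {"values": [
        {"name": "dog", "confidence": 0.99},
        {"name": "indoor", "confidence": 0.95},
        {"name": "furniture", "confidence": 0.62},
    ]},
    "objectsResult": {"values": [
        {"boundingBox": {"x": 40, "y": 60, "w": 200, "h": 180},
         "tags": [{"name": "dog", "confidence": 0.88}]},
    ]},
}


def summarize(result, min_confidence=0.7):
    """Collect the caption, confident tags, and located objects (hypothetical helper)."""
    caption = result.get("captionResult", {}).get("text")
    tags = [
        tag["name"]
        for tag in result.get("tagsResult", {}).get("values", [])
        if tag["confidence"] >= min_confidence
    ]
    objects = [
        (obj["tags"][0]["name"], obj["boundingBox"])
        for obj in result.get("objectsResult", {}).get("values", [])
        if obj["tags"]
    ]
    return {"caption": caption, "tags": tags, "objects": objects}


summary = summarize(sample_result)
print(summary["caption"])
print(summary["tags"])
print(summary["objects"])
```

A single call yields a caption, several tags, and located objects together, which is why options A and C undersell the service.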


2

MCQ · medium

A logistics company receives thousands of handwritten shipping forms daily. They need an automated solution to extract the destination address, sender name, and package weight from these forms. Which Azure Computer Vision capability should they use?

A. Optical Character Recognition (OCR)

B. Image Analysis

C. Face detection

D. Custom Vision

Answer: A

OCR is the technology used to extract text from images and documents, including handwritten text, and Azure AI Vision includes OCR capabilities.

Why this answer

The correct answer is A, Optical Character Recognition (OCR), because the task requires extracting text (destination address, sender name, package weight) from handwritten shipping forms. Azure's OCR API, part of Computer Vision, is specifically designed to detect and read printed and handwritten text from images, making it the appropriate capability for this document processing scenario.

Exam trap

The trap here is that candidates may confuse Image Analysis features such as tags and captions, which describe what an image shows, with the dedicated OCR capability, which returns the text itself along with its position in the image.

How to eliminate wrong answers

Option B (Image Analysis) is wrong because its tagging, captioning, and object features describe visual content (objects, scenes, tags) rather than extracting the written text. Option C (Face detection) is wrong because it identifies human faces and facial attributes, not textual data from documents. Option D (Custom Vision) is wrong because it is used to train custom image classification or object detection models, not to perform general-purpose text extraction from handwritten forms.


3

MCQ · medium

A home security system uses a camera to detect common household objects such as a person, a pet, a bag, or a package. The system needs to identify the presence and location (bounding box) of these objects in images. The development team wants to use a prebuilt Azure AI service without any custom training. Which Azure Computer Vision capability should they use?

A. Optical Character Recognition (OCR)

B. Image Analysis – Object Detection

C. Image Analysis – Image Captioning

D. Custom Vision

Answer: B

This prebuilt feature can detect common objects and provide bounding box coordinates without any custom training. It fits the requirement to identify and locate household objects.

Why this answer

Option B (Image Analysis – Object Detection) is correct because the requirement is to identify both the presence and location (bounding box) of common household objects in images using a prebuilt Azure AI service without custom training. Azure Computer Vision's Image Analysis – Object Detection provides pre-trained models that can detect multiple objects, including people, pets, bags, and packages, and return their bounding box coordinates, exactly matching the scenario.

Exam trap

The trap here is that candidates may confuse Image Captioning (which describes the scene) with Object Detection (which provides precise locations), or assume Custom Vision is needed when the prebuilt Object Detection model already covers the required object categories.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) is designed to extract text from images, not to detect or locate physical objects like people or pets. Option C is wrong because Image Captioning generates a human-readable description of an image's content but does not provide bounding box coordinates for individual objects. Option D is wrong because Custom Vision requires custom training with labeled images to detect specific objects, which contradicts the requirement to use a prebuilt service without any custom training.


4

MCQ · medium

What is 'smart cropping' in Azure AI Vision and how is it different from simple cropping?

A. Cropping images faster using GPU-accelerated image processing

B. AI-guided cropping that keeps the most important content in frame regardless of aspect ratio

C. Automatically cropping out people's faces from images for privacy protection

D. Cropping images to remove background noise and irrelevant context

Answer: B

Smart cropping identifies the visually important region — ensuring thumbnails include the subject rather than cutting it off.

Why this answer

Smart cropping in Azure AI Vision uses AI to analyze the image content and determine the most important region, then crops the image to the requested aspect ratio (within the range the service supports) while keeping that region in frame. This differs from simple cropping, which merely removes pixels from the edges without understanding the image's semantic content. The AI model identifies the salient area, such as a prominent object or face, so that the cropped result remains visually meaningful.

Exam trap

The trap here is that candidates confuse smart cropping with simple performance optimizations or privacy features, rather than recognizing it as an AI-driven content-preserving technique that adapts to any aspect ratio.

How to eliminate wrong answers

Option A is wrong because smart cropping is not about processing speed or GPU acceleration; it is about content-aware cropping guided by AI. Option C is wrong because smart cropping does not automatically remove faces for privacy; that would be a separate feature like face blurring or anonymization. Option D is wrong because smart cropping does not remove background noise or irrelevant context; it preserves the most important content, which may include background elements if they are salient.
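The difference between simple and content-aware cropping can be sketched with plain arithmetic. This is not the service's algorithm: the important region is assumed to be supplied already (in the real service a model finds it), and the function names and pixel values are hypothetical.

```python
def center_crop(img_w, img_h, aspect):
    """Largest crop with the given aspect ratio (width / height), centered in the image."""
    crop_w = min(img_w, img_h * aspect)
    crop_h = crop_w / aspect
    return ((img_w - crop_w) / 2, (img_h - crop_h) / 2, crop_w, crop_h)


def content_aware_crop(img_w, img_h, aspect, region):
    """Same crop size, but positioned around an important region given as (x, y, w, h)."""
    _, _, crop_w, crop_h = center_crop(img_w, img_h, aspect)
    rx, ry, rw, rh = region
    cx, cy = rx + rw / 2, ry + rh / 2
    # Center the window on the region, then clamp it to the image bounds.
    left = min(max(cx - crop_w / 2, 0), img_w - crop_w)
    top = min(max(cy - crop_h / 2, 0), img_h - crop_h)
    return (left, top, crop_w, crop_h)


def contains(crop, region):
    """True if the region lies entirely inside the crop window."""
    left, top, w, h = crop
    rx, ry, rw, rh = region
    return left <= rx and top <= ry and rx + rw <= left + w and ry + rh <= top + h


# Hypothetical 1600x900 photo whose subject stands near the right edge.
subject = (1300, 200, 200, 400)
simple = center_crop(1600, 900, 1.0)
smart = content_aware_crop(1600, 900, 1.0, subject)
print(contains(simple, subject), contains(smart, subject))
```

With these numbers the centered square cuts the subject off, while the content-aware window shifts right to keep it in frame.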


5

MCQ · medium

What does it mean to 'export' a model from Azure AI Custom Vision?

A. Sharing the model configuration with other Azure subscriptions

B. Downloading the trained model as a file for offline inference on edge devices

C. Moving the model from Custom Vision to Azure Machine Learning

D. Submitting the model for Microsoft certification review

Answer: B

Exporting Custom Vision models creates downloadable ONNX/TFLite/CoreML files that run locally without cloud API calls.

Why this answer

Exporting a model from Azure AI Custom Vision means downloading the trained model as a file (e.g., TensorFlow, ONNX, CoreML, or Docker container) so it can be run locally on edge devices without requiring an internet connection to the cloud API. This enables offline inference, reduced latency, and data privacy for scenarios like manufacturing or retail.

Exam trap

The trap here is that candidates confuse 'export' with 'sharing' or 'moving' the model to another Azure service, when in fact export specifically means downloading a deployable file for offline/edge use.

How to eliminate wrong answers

Option A is wrong because sharing model configuration with other Azure subscriptions is done via resource sharing or RBAC, not through an export operation; export produces a file, not a subscription transfer. Option C is wrong because moving the model to Azure Machine Learning would involve registering the model in AML, but Custom Vision's export feature is specifically for downloading a file for offline use, not for moving to another Azure service. Option D is wrong because submitting the model for Microsoft certification review is not a feature of Custom Vision; certification is unrelated to the export functionality.


6

MCQ · hard

A wildlife research team uses drone imagery to monitor penguin populations in a remote area. The penguins are small, blend into the rocky background, and are often only partially visible. The team has a limited set of 500 labeled drone images showing penguins. They want to build a system that accurately detects and counts penguins. Which approach should they take using Azure AI services?

A. Use the pre-built Computer Vision object detection API directly.

B. Train a Custom Vision object detection model using the labeled images.

C. Use the Computer Vision Image Analysis API with the 'dense captioning' feature.

D. Train a Custom Vision image classification model with the labeled images.

Answer: B

Custom Vision enables training a specialized object detection model from a relatively small set of labeled images. With only one object class ('penguin'), 500 labeled images are a reasonable starting point for detection and counting, and accuracy can be evaluated and improved by adding more varied examples.

Why this answer

The pre-built Computer Vision object detection API is optimized for common objects and may not perform well on small, camouflaged penguins in rocky terrain. Custom Vision allows the team to train a dedicated object detection model using their 500 labeled images, enabling the model to learn the specific visual features of penguins in this challenging environment. This approach is ideal for domain-specific detection tasks where off-the-shelf models lack accuracy.

Exam trap

The trap here is that candidates confuse image classification with object detection, assuming a single label per image can solve a counting problem, or overestimate the generic API's ability to handle niche, low-contrast objects without custom training.

How to eliminate wrong answers

Option A is wrong because the pre-built Computer Vision object detection API is trained on generic object classes and cannot reliably detect small, partially visible penguins that blend into the background. Option C is wrong because dense captioning generates descriptive text for image regions, not bounding boxes or counts for specific objects like penguins. Option D is wrong because image classification assigns a single label to the entire image, whereas the team needs to detect and count multiple penguins per image, which requires object detection.


7

MCQ · medium

An e-commerce website wants to automatically remove the background from product photos uploaded by sellers so that items appear on a consistent plain background. Which Azure Computer Vision capability should they use?

A. Optical Character Recognition (OCR)

B. Background Removal

C. Image Captioning

D. Object Detection

Answer: B

Background removal isolates the foreground object from the background, perfect for this use case.

Why this answer

Background Removal is the correct capability because it is specifically designed to isolate the foreground subject from the background in an image, so the subject can be placed on a transparent or solid-color background. This directly meets the requirement of automatically removing backgrounds from product photos to create a consistent plain background. The feature uses a deep learning model to segment the primary object from its surroundings.

Exam trap

The trap here is that candidates often confuse Object Detection (which identifies objects) with Background Removal (which segments the entire foreground), leading them to choose D because they think detecting the product is sufficient to remove the background.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images, not background removal. Option C is wrong because Image Captioning generates a human-readable description of the image content, not background manipulation. Option D is wrong because Object Detection identifies and locates objects within an image using bounding boxes, but does not remove or alter the background.


8

MCQ · easy

A retail company wants to use Azure Computer Vision to automatically monitor shelf inventory. They need to detect whether items are present on a shelf and count the number of items, without needing to identify the specific product type. Which prebuilt Computer Vision capability should they use?

A. Optical Character Recognition (OCR)

B. Image classification

C. Object detection

D. Semantic segmentation

Answer: C

Object detection identifies each object instance, provides bounding boxes, and allows counting of detected objects, even if product types are not distinguished.

Why this answer

Object detection (Option C) is the correct prebuilt Computer Vision capability because it can both locate items within an image using bounding boxes and support counting them, without requiring identification of the specific product type. This aligns directly with the requirement to detect presence and count items on a shelf: object detection returns one bounding box per detected object, so the count is simply the number of detections, and no fine-grained product classification is needed.

Exam trap

The trap here is that candidates confuse object detection with image classification, assuming that classifying the shelf as 'stocked' or 'empty' is sufficient, but the question explicitly requires counting individual items, which only object detection can provide.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images, not physical objects, so it cannot detect or count shelf items. Option B is wrong because image classification assigns a single label to an entire image (e.g., 'shelf with items'), but it does not provide bounding boxes or a count of individual items. Option D is wrong because semantic segmentation classifies every pixel in an image into a category (e.g., 'shelf', 'item'), but it does not distinguish between individual object instances, making it unsuitable for counting discrete items.
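Counting from detection output can be sketched in a few lines of Python. The `detections` list below is hand-written sample data, and its keys (`label`, `confidence`, `box`) and the 0.5 threshold are assumptions for illustration, not the service's response schema.

```python
from collections import Counter

# Hypothetical detections for one shelf image: one entry per detected object.
detections = [
    {"label": "bottle", "confidence": 0.92, "box": (12, 40, 60, 150)},
    {"label": "bottle", "confidence": 0.88, "box": (80, 42, 58, 148)},
    {"label": "box", "confidence": 0.75, "box": (150, 60, 90, 120)},
    {"label": "bottle", "confidence": 0.31, "box": (260, 45, 55, 140)},  # too uncertain to count
]


def count_items(detections, min_confidence=0.5):
    """Return the number of confident detections and a per-label breakdown."""
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    return len(kept), Counter(d["label"] for d in kept)


total, by_label = count_items(detections)
print(total, dict(by_label))
```

The total works even if the labels are ignored entirely, which matches the scenario: the retailer needs presence and quantity, not product identity.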


9

MCQ · easy

Which Azure AI service can read text from a photo of a street sign taken by a mobile device?

A. Azure AI Speech

B. Azure AI Vision (Read API / OCR)

C. Azure AI Language

D. Azure AI Translator

Answer: B

Azure AI Vision's Read API extracts printed and handwritten text from images, including real-world photos of signs.

Why this answer

Azure AI Vision's Read API (part of the Computer Vision service) is designed to extract printed and handwritten text from images, including photos of street signs. It uses optical character recognition (OCR) to detect and digitize text, making it the correct choice for reading text from a mobile device photo.

Exam trap

The trap here is that candidates may confuse Azure AI Vision's OCR capabilities with Azure AI Language's text analysis features, or mistakenly think Azure AI Speech can process visual text, when in fact only the Read API within Azure AI Vision is designed for extracting text from images.

How to eliminate wrong answers

Option A is wrong because Azure AI Speech focuses on speech-to-text, text-to-speech, and speech translation, not on extracting text from images. Option C is wrong because Azure AI Language provides natural language processing (e.g., sentiment analysis, key phrase extraction) but does not perform OCR or image-based text extraction. Option D is wrong because Azure AI Translator translates text between languages but cannot read or extract text from images.
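As a rough illustration of what OCR output looks like to the caller, the sketch below flattens a Read-style result into lines of text. The nested `blocks` / `lines` / `words` layout and the `extract_lines` helper are assumptions modeled on typical OCR responses, and the sample data is hand-written.

```python
# Hand-written sample shaped like a typical OCR result (assumed layout).
sample_read = {
    "blocks": [
        {"lines": [
            {"words": [
                {"text": "MAIN", "confidence": 0.99},
                {"text": "ST", "confidence": 0.97},
            ]},
            {"words": [
                {"text": "SPEED", "confidence": 0.98},
                {"text": "LIMIT", "confidence": 0.96},
                {"text": "25", "confidence": 0.94},
                {"text": "~", "confidence": 0.12},  # low-confidence noise
            ]},
        ]},
    ]
}


def extract_lines(read_result, min_word_confidence=0.5):
    """Join confident words into one string per detected line (hypothetical helper)."""
    lines = []
    for block in read_result.get("blocks", []):
        for line in block.get("lines", []):
            words = [
                word["text"]
                for word in line.get("words", [])
                if word["confidence"] >= min_word_confidence
            ]
            if words:
                lines.append(" ".join(words))
    return lines


sign_text = extract_lines(sample_read)
print(sign_text)
```

The result is machine-readable text, which could then be passed to a language or translation service; those services do not read the image themselves.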


10

MCQ · easy

A company needs to automatically extract text from scanned invoices that contain both printed text and handwritten notes. Which Azure AI service is specifically designed to handle this type of document?

A. Azure Face API

B. Azure AI Document Intelligence (formerly Form Recognizer)

C. Azure Custom Vision

D. Azure Video Indexer

Answer: B

This service is optimized for extracting text, including handwriting, and structured data from documents like invoices and forms.

Why this answer

Azure AI Document Intelligence (formerly Form Recognizer) is specifically designed to extract text, key-value pairs, and tables from scanned documents, including invoices with both printed text and handwritten notes. It uses optical character recognition (OCR) combined with deep learning models to handle mixed content, making it the correct choice for this scenario.

Exam trap

The trap here is that candidates may confuse Azure AI Document Intelligence with general OCR services like Azure AI Vision's Read API, but Document Intelligence is specifically optimized for structured document extraction with prebuilt models for invoices, receipts, and forms.

How to eliminate wrong answers

Option A is wrong because Azure Face API is designed for facial detection, recognition, and analysis, not for extracting text from documents. Option C is wrong because Azure Custom Vision is used for image classification and object detection, not for text extraction from scanned documents. Option D is wrong because Azure Video Indexer is used to extract insights from video content, such as speech and faces, not from static scanned invoices.


11

MCQ · medium

What is the difference between Azure AI Vision and Azure AI Custom Vision in terms of when to use each?

A. Use Azure AI Vision for large images; use Custom Vision for small images

B. Use Azure AI Vision for general image analysis; use Custom Vision when you need specialized domain-specific recognition

C. Use Azure AI Vision only in production; Custom Vision only in development

D. Use Azure AI Vision for images from cameras; Custom Vision for images from documents

Answer: B

Pre-built Vision for common objects/scenes/OCR; Custom Vision for training models to recognize domain-specific categories.

Why this answer

Azure AI Vision is a pre-trained service for general image analysis tasks like object detection, OCR, and description generation, requiring no custom training. Azure AI Custom Vision allows you to train a model on your own labeled images for specialized, domain-specific recognition tasks, such as identifying unique product defects or rare animal species. Option B correctly captures this distinction: use Azure AI Vision for broad, out-of-the-box capabilities and Custom Vision when you need tailored recognition for your specific use case.

Exam trap

The trap here is that candidates confuse 'general vs. specialized' with superficial attributes like image size or source, leading them to pick options that sound plausible but miss the core functional difference between pre-trained and custom-trained models.

How to eliminate wrong answers

Option A is wrong because the difference is not about image size; both services accept images of varying sizes, each subject to its own documented limits on file size and dimensions. Option C is wrong because both services can be used in development and in production; a Custom Vision model is typically trained during development and then deployed to production, and Azure AI Vision is used in both stages for general analysis. Option D is wrong because the distinction is not about the source of images (camera vs. documents); Azure AI Vision can analyze images from cameras or documents (e.g., OCR on scanned documents), and Custom Vision can be trained on any image type, including document images for custom classification.


12

MCQ · medium

What is the primary use case for Azure AI Vision's 'image retrieval' using multimodal embeddings?

A. Storing images in Azure Blob Storage with automatic tagging

B. Enabling natural language image search and finding visually similar images using semantic understanding

C. Automatically resizing images for different screen sizes

D. Detecting copyrighted images in user-uploaded content

Answer: B

Multimodal embeddings let you search image libraries with text queries ('red car on a road') or find images similar to a reference image.

Why this answer

Azure AI Vision's image retrieval using multimodal embeddings is designed to enable natural language image search and find visually similar images by leveraging semantic understanding. It converts both images and text into vector embeddings in a shared semantic space, allowing queries like 'a red car on a beach' to retrieve relevant images without relying on exact keyword matches or pre-defined tags.

Exam trap

The trap here is that candidates confuse 'image retrieval using multimodal embeddings' with simpler image tagging or metadata-based search, overlooking that the core innovation is semantic understanding across modalities rather than keyword or tag matching.

How to eliminate wrong answers

Option A is wrong because storing images in Azure Blob Storage with automatic tagging is a separate capability (e.g., using Azure AI Vision's image tagging or Custom Vision), not the primary use case of multimodal embeddings for retrieval. Option C is wrong because automatically resizing images for different screen sizes is an image-processing task, typically handled by a media pipeline or content delivery network, not by AI Vision's image retrieval. Option D is wrong because detecting copyrighted images in user-uploaded content is typically done with content moderation or fingerprinting approaches (e.g., hash-based matching), not by multimodal embeddings, which focus on semantic similarity search.
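The retrieval step itself reduces to comparing vectors. The sketch below ranks images by cosine similarity to a query vector; the three-dimensional vectors and file names are made-up toy values (real embeddings have far more dimensions and come from the service), so only the ranking logic is meant to carry over.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy image embeddings (hypothetical values and file names).
image_vectors = {
    "red_car.jpg": [0.9, 0.1, 0.0],
    "beach.jpg": [0.1, 0.9, 0.1],
    "red_car_on_beach.jpg": [0.7, 0.7, 0.0],
}

# Stands in for the embedding of the text query "a red car on a beach".
query_vector = [0.6, 0.8, 0.0]

ranked = sorted(
    image_vectors,
    key=lambda name: cosine_similarity(query_vector, image_vectors[name]),
    reverse=True,
)
print(ranked)
```

Because text and images share one vector space, the same ranking code serves both text-to-image search and image-to-image similarity; only the source of the query vector changes.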


13

MCQ · medium

What is 'video action recognition' in computer vision?

A. Recognising which video format (MP4, MOV) an uploaded file uses

B. Identifying human activities (running, cooking, falling) from temporal patterns across video frames

C. Detecting when inappropriate actions are performed in user-generated video content

D. Tracking when viewers take actions (like, share, comment) in response to a video

Answer: B

Action recognition understands motion sequences — classifying activities from temporal video patterns for sports, safety, and behaviour analysis.

Why this answer

Video action recognition is a computer vision technique that analyzes sequences of video frames to identify and classify human activities based on temporal patterns and motion cues. Option B correctly describes this as identifying activities like running, cooking, or falling from temporal patterns across frames, which is the standard definition of the task.

Exam trap

The trap here is confusing a specific application (like content moderation in Option C) with the general computer vision capability, leading candidates to pick a narrower, use-case-driven answer instead of the broad technical definition.

How to eliminate wrong answers

Option A is wrong because it describes file format detection (e.g., MP4 vs. MOV), which is a trivial metadata check, not a computer vision task involving visual content analysis. Option C is wrong because it describes a specific application (moderation of inappropriate actions), not the general capability of recognizing any predefined action from temporal patterns. Option D is wrong because it describes user engagement analytics (likes, shares, comments), which is a social media metric, not a computer vision workload analyzing video content.


14

MCQ · medium

A quality inspection system uses cameras to examine metal parts for surface defects. The system must identify the exact location and shape of each scratch, dent, or crack. Which Azure Computer Vision capability is best suited for this?

A. Image Classification

B. Object Detection

C. Semantic Segmentation

D. Dense Captioning

Answer: C

Semantic segmentation labels each pixel, allowing precise identification of defect shapes and boundaries at the pixel level.

Why this answer

Semantic segmentation is the correct choice because it classifies every pixel in an image, allowing the system to precisely delineate the exact location, shape, and boundaries of surface defects like scratches, dents, or cracks on metal parts. This pixel-level granularity is essential for quality inspection where the geometry of each defect must be measured and analyzed.

Exam trap

The trap here is that candidates often confuse Object Detection (bounding boxes) with Semantic Segmentation (pixel-level masks), failing to recognize that only segmentation can capture the exact shape of irregular defects like cracks or dents.

How to eliminate wrong answers

Option A is wrong because Image Classification assigns a single label to the entire image (e.g., 'defective' or 'non-defective'), but it cannot identify the location or shape of individual defects. Option B is wrong because Object Detection draws bounding boxes around objects, which is too coarse for irregularly shaped defects like scratches or cracks that require pixel-accurate boundaries. Option D is wrong because Dense Captioning generates descriptive captions for image regions, but it does not produce a pixel-level segmentation map needed to precisely outline defect shapes.
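A toy example shows why a bounding box is too coarse for a thin, irregular defect. The 5x5 mask below is made-up data representing a diagonal crack (1 marks a defect pixel); the helper names are hypothetical.

```python
# Toy pixel mask for a diagonal crack: 1 = defect pixel, 0 = sound metal.
mask = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]


def bounding_box(mask):
    """Smallest axis-aligned box (left, top, right, bottom) enclosing all defect pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (min(cols), min(rows), max(cols), max(rows))


def mask_area(mask):
    """Number of pixels actually labeled as defect."""
    return sum(sum(row) for row in mask)


left, top, right, bottom = bounding_box(mask)
box_area = (right - left + 1) * (bottom - top + 1)
defect_area = mask_area(mask)
print(box_area, defect_area)
```

Here the box covers 25 pixels while the crack occupies only 5, so the box overstates the defect's extent and says nothing about its shape; the mask preserves both.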


15

MCQ · easy

What does the 'image analysis' API in Azure AI Vision return when given an image?

A. The raw pixel data of the image in a compressed format

B. Rich metadata including captions, detected objects, tags, colour analysis, and content flags

C. A score from 1 to 10 rating the aesthetic quality of the photograph

D. A list of similar images found across the web

Answer: B

Image analysis returns comprehensive semantic metadata about the image content — from captions to objects to content moderation flags.

Why this answer

The Image Analysis API in Azure AI Vision returns rich metadata about the image content, including captions, detected objects, tags, color analysis, and content moderation flags. This is because the API applies pre-trained deep learning models to extract semantic information from the image, not raw pixel data or aesthetic scores.

Exam trap

The trap here is that candidates confuse the Image Analysis API with other Azure services like the Custom Vision API (which requires training) or the Bing Image Search API, leading them to choose options that describe unrelated functionalities.

How to eliminate wrong answers

Option A is wrong because the Image Analysis API does not return raw pixel data; the caller already has the pixels, and the API's job is to return metadata describing the image content. Option C is wrong because the API does not provide an aesthetic quality score; it focuses on content recognition and description, not subjective ratings. Option D is wrong because the API does not perform reverse image search or return similar images from the web; that functionality is provided by the Bing Image Search API, not Azure AI Vision.


16

MCQ · easy

What is the purpose of Azure AI Vision's 'thumbnail generation' feature?

A. Reducing file sizes of images for faster web page loading

B. Generating crop-focused preview images that highlight the most important content area

C. Creating thumbnail-sized AI model icons for the Azure portal

D. Generating multiple image variations in different artistic styles

Answer: B

Smart thumbnails use AI to identify key image regions and crop to them intelligently — ensuring thumbnails show the important content.

Why this answer

Azure AI Vision's thumbnail generation feature analyzes the image content to identify the most important region (e.g., a person's face or a prominent object) and then crops the image around that region to produce a focused preview. This is distinct from simple resizing or compression, as it uses AI-based spatial analysis to preserve the key subject while discarding irrelevant background areas.

Exam trap

The trap here is that candidates confuse 'thumbnail generation' with simple image resizing or compression, missing the key differentiator that Azure AI Vision uses AI to intelligently crop around the most important content rather than just scaling down the entire image.

How to eliminate wrong answers

Option A is wrong because thumbnail generation does not primarily reduce file sizes for faster loading; that is the purpose of image compression or resizing services, not the AI-driven cropping feature. Option C is wrong because the feature generates thumbnails of user-uploaded images, not icons for Azure portal UI elements. Option D is wrong because thumbnail generation produces a single cropped version, not multiple variations in different artistic styles (that would be a style transfer or generative AI capability).


17

MCQ · medium

What is 'health and safety monitoring' using computer vision and what scenarios does it address?

A. An employee wellness programme that tracks steps and exercise using wearables

B. Using computer vision to detect PPE compliance, hazards, restricted zone entry, and safety violations

C. AI-powered medical diagnosis from health data captured by wearable sensors

D. Monitoring employee screen time and break patterns for ergonomic health compliance

Answer: B

Safety monitoring AI analyses video for hard hat detection, zone violations, fire detection — reducing workplace accidents.

Why this answer

Health and safety monitoring using computer vision involves analyzing video feeds or images to automatically detect compliance with personal protective equipment (PPE) requirements, identify workplace hazards, monitor restricted zone entries, and flag safety violations. This is a common computer vision workload on Azure, using services such as Azure Video Indexer or Custom Vision to process visual data so that safety issues can be flagged promptly with less manual monitoring.

Exam trap

The trap here is that candidates confuse general AI health monitoring (like wearables or ergonomic software) with computer-vision-specific safety monitoring, leading them to pick options that involve non-visual sensor data or administrative tracking rather than image/video analysis.

How to eliminate wrong answers

Option A is wrong because it describes an employee wellness program using wearable step trackers, which relies on sensor data and not computer vision analysis of visual inputs. Option C is wrong because it refers to AI-powered medical diagnosis from wearable sensor health data, which is a healthcare AI scenario, not computer vision for physical safety monitoring. Option D is wrong because it involves monitoring screen time and break patterns for ergonomic compliance, which typically uses software logging or activity tracking, not computer vision to detect physical hazards or PPE.

18

MCQmedium

What is the purpose of training data labeling in computer vision model development?

A.Adding watermarks to images for copyright protection

B.Adding ground-truth annotations to training images so the model learns what to predict

C.Compressing images to reduce storage costs during training

D.Filtering out low-quality or blurry training images

AnswerB

Labeling provides correct answers for each training example — the model learns to predict those labels from the images.

Why this answer

Training data labeling is the process of adding ground-truth annotations (e.g., bounding boxes, segmentation masks, or class labels) to each training image. This supervised learning step provides the model with the correct answer for each example, enabling it to learn the mapping from image features to the desired output during training. Without labeled data, the model cannot be trained to recognize objects, classify scenes, or detect anomalies in computer vision tasks.

Exam trap

The trap here is that candidates confuse data cleaning (filtering bad images) or data preprocessing (compression) with the core supervised learning requirement of providing ground-truth annotations, leading them to select options that describe peripheral data management tasks rather than the essential labeling step.

How to eliminate wrong answers

Option A is wrong because adding watermarks is a post-processing step for copyright protection, not a training data preparation task; it does not provide any supervisory signal for model learning. Option C is wrong because compressing images reduces file size and storage costs but discards pixel detail that the model needs to learn visual patterns; labeling is about annotation, not compression. Option D is wrong because filtering out low-quality images is a data cleaning step that improves dataset quality, but it is not the same as labeling; labeling specifically adds semantic annotations to the images that remain.
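
As an illustration of what ground-truth annotations can look like, the sketch below shows labels for a classification task and for an object detection task. The file names, label names, and dictionary layout are hypothetical; real labeling tools use their own export formats.

```python
from collections import Counter

# Classification: one class label per image (hypothetical examples).
classification_labels = [
    {"image": "img_001.jpg", "label": "cat"},
    {"image": "img_002.jpg", "label": "dog"},
    {"image": "img_003.jpg", "label": "cat"},
]

# Object detection: a list of labelled boxes (x, y, w, h in pixels) per image.
# None marks an image that has not been annotated yet.
detection_labels = [
    {"image": "img_001.jpg", "boxes": [{"label": "cat", "box": (34, 50, 120, 90)}]},
    {"image": "img_004.jpg", "boxes": None},
]


def class_distribution(examples):
    """Count how many training examples carry each class label."""
    return Counter(e["label"] for e in examples)


def unlabeled_images(examples):
    """Images without annotations give the model nothing to learn from."""
    return [e["image"] for e in examples if e["boxes"] is None]


print(class_distribution(classification_labels))
print(unlabeled_images(detection_labels))
```

Checking the class distribution and the list of unannotated images is a common sanity step before training, since both affect what the model can learn.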

19

MCQeasy

A logistics company receives thousands of handwritten shipping labels each day. They want to use Azure AI to automatically read the handwritten addresses and convert them into digital text. Which Azure Cognitive Services capability should they use?

A.Image classification

B.Optical character recognition (OCR)

C.Object detection

D.Face detection

AnswerB

OCR extracts text from images, including handwritten content, and is ideal for this scenario.

Why this answer

Optical character recognition (OCR) is the correct Azure Cognitive Services capability because it is specifically designed to extract printed or handwritten text from images and convert it into machine-readable digital text. In this scenario, the logistics company needs to read handwritten addresses from shipping labels, which is a classic OCR workload. Azure's Computer Vision OCR API (including the Read API) can handle both printed and handwritten text, making it the ideal choice for this task.

Exam trap

Microsoft often tests the distinction between OCR and image classification, where candidates mistakenly choose image classification because they think 'reading text' is a form of classifying the image content, but OCR is a specialized text extraction service, not a classification task.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label or category to an entire image (e.g., 'shipping label' or 'document'), but it does not extract or read the text content from the image. Option C is wrong because object detection identifies and locates specific objects (e.g., boxes, barcodes) within an image using bounding boxes, but it cannot read or interpret the text written on those objects. Option D is wrong because face detection identifies and locates human faces in an image, which is irrelevant to reading handwritten addresses on shipping labels.

20

MCQeasy

Which Azure AI service can analyze an image and return a description of its contents in natural language?

A.Azure AI Language

B.Azure AI Vision (Computer Vision)

C.Azure AI Speech

D.Azure Bot Service

AnswerB

Azure AI Vision can analyze images and generate natural language descriptions, identify objects, and extract text from images.

Why this answer

Azure AI Vision (Computer Vision) includes an image analysis API that can generate a human-readable description of an image's contents. This feature uses deep learning models to identify objects, actions, and scenes, then produces a natural language caption describing the image. The correct answer is B because this is the specific service designed for image understanding and description generation.

Exam trap

The trap here is that candidates confuse Azure AI Language (which handles text) with Azure AI Vision, assuming that 'natural language' output implies a language service, when in fact the image-to-text description is a core feature of the Vision service.

How to eliminate wrong answers

Option A is wrong because Azure AI Language is focused on text analytics, sentiment analysis, and language understanding, not image analysis. Option C is wrong because Azure AI Speech handles speech-to-text, text-to-speech, and speech translation, with no capability to analyze images. Option D is wrong because Azure Bot Service is a framework for building conversational AI agents, not for processing or describing visual content.

21

MCQmedium

What is 'Azure AI Vision's landmark detection' and what does it return?

A.Detecting important milestones in a project timeline using AI

B.Identifying well-known physical landmarks (Eiffel Tower, Big Ben) in photographs with a confidence score

C.Creating highlighted markers on maps showing user-defined points of interest

D.Detecting major architectural features of any building regardless of whether it is famous

AnswerB

Landmark detection names famous locations from photos — enabling automatic location tagging and travel content analysis.

Why this answer

Azure AI Vision's landmark detection is a pre-built computer vision capability that identifies well-known physical landmarks (e.g., Eiffel Tower, Big Ben) in images. It returns the landmark name along with a confidence score indicating the likelihood of the match, enabling applications like automated photo tagging or travel content enrichment.

Exam trap

The trap here is confusing 'landmark detection' with generic object detection or architectural feature recognition, leading candidates to choose Option D, which incorrectly assumes any building can be identified.

How to eliminate wrong answers

Option A is wrong because it describes project management milestones, not physical landmarks; Azure AI Vision operates on visual image data, not abstract timelines. Option C is wrong because it describes user-defined map markers, which is a geospatial feature unrelated to Azure AI Vision's pre-trained landmark detection model. Option D is wrong because landmark detection only recognizes famous, pre-trained landmarks, not arbitrary architectural features of any building; it requires the landmark to be in the service's curated database.

22

MCQmedium

What is 'image generation' in Azure AI Vision (beyond DALL-E) and what model is used?

A.Creating image files from raw binary data uploaded to Azure Blob Storage

B.Florence-powered vision-language capabilities for dense captioning, grounded detection, and image-text search

C.Generating high-resolution versions of low-resolution input images

D.Automatically generating training image variations through data augmentation

AnswerB

Microsoft's Florence foundation model powers advanced Azure AI Vision features — multi-modal capabilities for image-text understanding and search.

Why this answer

Option B is correct because 'image generation' in Azure AI Vision (beyond DALL-E) refers to the Florence-powered vision-language capabilities that enable tasks like dense captioning, grounded object detection, and image-text search. These models generate textual descriptions or bounding boxes from images, not new pixel-based images, and are distinct from DALL-E's generative image creation.

Exam trap

The trap here is that candidates confuse 'image generation' with creating new images (like DALL-E), but Azure AI Vision's Florence model generates textual outputs (captions, detections) from images, not pixel-based images.

How to eliminate wrong answers

Option A is wrong because creating image files from raw binary data uploaded to Azure Blob Storage is a storage and file conversion operation, not a computer vision AI capability; Azure AI Vision does not generate images from raw bytes. Option C is wrong because generating high-resolution versions of low-resolution input images describes super-resolution, an image enhancement technique, not the vision-language capabilities referred to in the question. Option D is wrong because automatically generating training image variations through data augmentation is a preprocessing technique for model training, not a built-in Azure AI Vision feature for image generation or vision-language tasks.

23

MCQmedium

What does Azure AI Vision's 'dense captioning' feature do?

A.Creates very long detailed captions for entire images

B.Generates natural language descriptions for multiple regions within a single image

C.Extracts text from dense text-heavy images like documents

D.Analyzes the density of objects in an image for crowd counting

AnswerB

Dense captioning identifies regions of interest in an image and generates a localized caption for each region.

Why this answer

Azure AI Vision's dense captioning feature goes beyond generating a single caption for the entire image. It analyzes the image to identify multiple distinct regions (e.g., a person, a car, a building) and generates a natural language description for each region, along with bounding box coordinates. This is correct because the feature's core purpose is to provide granular, region-level descriptions, not just a single long caption.

Exam trap

The trap here is that candidates confuse 'dense captioning' with generating a single, verbose caption for the whole image (Option A), when in fact it produces multiple, region-specific descriptions.

How to eliminate wrong answers

Option A is wrong because dense captioning does not create 'very long detailed captions' for the entire image; it generates multiple shorter captions for specific regions. Option C is wrong because extracting text from dense text-heavy images is the function of Azure AI Vision's OCR (Optical Character Recognition) feature, not dense captioning. Option D is wrong because analyzing the density of objects for crowd counting is a separate capability often associated with object detection or specialized crowd analysis models, not the dense captioning feature.
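
The difference between one caption per image and one caption per region can be sketched in a few lines of Python. The `dense_captions` list below is a hypothetical result shape (caption text plus an `(x, y, w, h)` box), not the actual response format of the service.

```python
# Hypothetical dense-captioning output: one caption per region.
dense_captions = [
    {"text": "a street with parked cars", "box": (0, 0, 640, 480)},
    {"text": "a person walking a dog", "box": (50, 200, 150, 250)},
    {"text": "a red car parked by the curb", "box": (300, 220, 200, 120)},
]


def captions_at(x, y, captions):
    """Return captions whose region contains pixel (x, y), smallest region first."""
    hits = [
        c for c in captions
        if c["box"][0] <= x < c["box"][0] + c["box"][2]
        and c["box"][1] <= y < c["box"][1] + c["box"][3]
    ]
    hits.sort(key=lambda c: c["box"][2] * c["box"][3])
    return [c["text"] for c in hits]


print(captions_at(350, 250, dense_captions))
```

Because each caption is tied to a bounding box, an application can answer "what is at this spot in the image?", which a single whole-image caption cannot do.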

24

MCQmedium

What is Azure AI Content Safety used for in computer vision scenarios?

A.Compressing images to reduce storage costs

B.Detecting harmful or inappropriate content in images for content moderation

C.Enhancing image quality and resolution

D.Converting images to text descriptions for accessibility

AnswerB

Content Safety analyzes images for sexual, violent, and other harmful content categories to support automated content moderation.

Why this answer

Azure AI Content Safety is designed to detect harmful or inappropriate content in images, such as violence, hate, self-harm, or sexually explicit material. In computer vision scenarios, it analyzes visual features to classify content into severity levels, enabling automated content moderation. This directly supports safe user-generated content platforms by flagging or blocking prohibited imagery.

Exam trap

The trap here is that candidates confuse Azure AI Content Safety with Azure AI Vision's image analysis features, mistakenly thinking it handles enhancement or description tasks, when in fact it is strictly a content moderation service for detecting harmful material.

How to eliminate wrong answers

Option A is wrong because compressing images to reduce storage costs is handled by Azure Storage features or image optimization services, not by AI Content Safety, which focuses on content analysis rather than file size reduction. Option C is wrong because enhancing image quality and resolution is an image processing task, not content moderation; Content Safety classifies content and does not modify the image. Option D is wrong because converting images to text descriptions for accessibility is performed by Azure AI Vision's image captioning or OCR features, not by Content Safety, which does not generate descriptive text.
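
A moderation pipeline typically turns per-category severity levels into an action. The sketch below assumes a hypothetical result in which each harm category has an integer severity (higher means more severe); the category keys and both thresholds are illustrative choices, not values prescribed by the service.

```python
def moderate(severities, block_at=4, review_at=2):
    """Map per-category severities to 'block', 'review', or 'allow'.

    severities: dict of category name -> integer severity (assumed scale,
    higher is more severe). Thresholds are hypothetical policy settings.
    """
    worst = max(severities.values(), default=0)
    if worst >= block_at:
        return "block"
    if worst >= review_at:
        return "review"
    return "allow"


# Hypothetical analysis result for one uploaded image.
image_result = {"hate": 0, "sexual": 0, "violence": 4, "self_harm": 0}
print(moderate(image_result))
```

Taking the worst category is a conservative design choice: one severe category is enough to block the image, regardless of how benign the others are.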

25

MCQmedium

A logistics company uses security cameras to monitor boxes on warehouse shelves. They need an AI solution that can count the number of boxes on each shelf and also identify if any box is red (indicating a priority shipment). Which Azure Computer Vision capability should they use?

A.Image Analysis (object detection)

B.Optical Character Recognition (OCR)

C.Face detection

D.Spatial analysis

AnswerA

Object detection locates multiple instances of an object (e.g., boxes) with bounding boxes, which enables counting; color can then be checked by analyzing the pixels inside each detected box.

Why this answer

Option A is correct because Image Analysis with object detection can identify and localize multiple objects (boxes) within an image, count them, and detect specific attributes like color (red boxes) by analyzing pixel values in the detected bounding boxes. This directly meets the requirement to count boxes and identify priority shipments based on color.

Exam trap

The trap here is that candidates may confuse object detection with OCR or spatial analysis, thinking text extraction or motion tracking could somehow count boxes or detect colors, when in fact object detection is the only option that can both localize objects and support color analysis.

How to eliminate wrong answers

Option B is wrong because Optical Character Recognition (OCR) extracts text from images, not objects or colors; it cannot count boxes or detect red boxes. Option C is wrong because Face detection is specialized for locating human faces, not inanimate objects like boxes, and cannot identify colors or count non-face items. Option D is wrong because Spatial analysis focuses on tracking movement and presence of people or objects in a video feed over time, not static counting or color detection in single images.
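
The two-step idea (count the detections, then inspect the pixels inside each bounding box for color) can be sketched without any SDK. The tiny synthetic image, the detection tuples, and the redness rule below are all hypothetical; a real solution would take the boxes from an object detection model.

```python
RED = (220, 30, 30)
BROWN = (150, 110, 70)
GRAY = (128, 128, 128)

# Synthetic 4x6 RGB image: one red box and one brown box on a gray shelf.
image = [
    [GRAY, GRAY, GRAY, GRAY, GRAY, GRAY],
    [GRAY, RED, RED, GRAY, BROWN, BROWN],
    [GRAY, RED, RED, GRAY, BROWN, BROWN],
    [GRAY, GRAY, GRAY, GRAY, GRAY, GRAY],
]

# Hypothetical detector output: (label, x, y, w, h) in pixel coordinates.
detections = [("box", 1, 1, 2, 2), ("box", 4, 1, 2, 2)]


def mean_color(img, x, y, w, h):
    """Average RGB value of the pixels inside a bounding box."""
    pixels = [img[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def is_red(rgb, margin=60):
    """Illustrative rule: red channel clearly dominates green and blue."""
    r, g, b = rgb
    return r > g + margin and r > b + margin


boxes = [d for d in detections if d[0] == "box"]
red_boxes = [d for d in boxes if is_red(mean_color(image, *d[1:]))]
print(len(boxes), len(red_boxes))
```

Counting comes directly from the number of detections, while the priority check is a separate pixel-level step applied to each detected region.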

26

MCQmedium

What is 'image captioning' in Azure AI Vision and how is it different from image tagging?

A.Captioning adds user-written descriptions; tagging uses AI to detect objects automatically

B.Captioning generates a natural language sentence describing the scene; tagging returns individual concept keywords

C.Captioning works on video; tagging works only on still images

D.Image tagging is more accurate than captioning because it uses simpler classification

AnswerB

Caption: 'A cat sitting on a sofa.' Tags: ['cat', 'sofa', 'indoor']. Captions provide narrative context; tags enable efficient filtering.

Why this answer

Option B is correct because image captioning in Azure AI Vision uses a deep learning model to analyze the entire scene and generate a coherent, natural language sentence describing the image content, such as 'a group of people playing soccer in a park.' In contrast, image tagging returns a list of individual keywords or concepts (e.g., 'soccer,' 'grass,' 'people') without forming a complete sentence. This distinction is fundamental to understanding the different outputs of these two Azure AI Vision features.

Exam trap

The trap here is that candidates often confuse image captioning with manual annotation or assume tagging is always more accurate, when in fact the key difference is the output format—a full sentence versus a list of keywords—not the method of input or accuracy level.

How to eliminate wrong answers

Option A is wrong because image captioning does not rely on user-written descriptions; it automatically generates captions using AI models, not manual input. Option C is wrong because both image captioning and image tagging work on still images, not video; Azure Video Indexer is used for video analysis. Option D is wrong because accuracy is not inherently higher for tagging; both features use different models and serve different purposes, and captioning can be equally accurate for its task of generating descriptive sentences.
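
The practical consequence of the two output formats is easiest to see in code: a caption is a sentence to display, while tags are keywords to filter on. The `results` dictionary below is a hypothetical shape built from the examples in this question, not an actual service response.

```python
# Hypothetical analysis results: one caption string and a list of
# (tag, confidence) pairs per image.
results = {
    "photo1.jpg": {
        "caption": "A cat sitting on a sofa.",
        "tags": [("cat", 0.98), ("sofa", 0.95), ("indoor", 0.90)],
    },
    "photo2.jpg": {
        "caption": "A group of people playing soccer in a park.",
        "tags": [("soccer", 0.97), ("grass", 0.93), ("people", 0.91)],
    },
}


def images_with_tag(results, tag, min_confidence=0.5):
    """Return image names that carry the tag at or above the confidence threshold."""
    return sorted(
        name
        for name, r in results.items()
        if any(t == tag and c >= min_confidence for t, c in r["tags"])
    )


print(images_with_tag(results, "cat"))
print(results["photo1.jpg"]["caption"])
```

Filtering on an exact keyword is simple with tags, whereas doing the same with captions would require parsing free text.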

27

MCQeasy

What is 'ID document recognition' in Azure AI Document Intelligence?

A.Verifying whether a provided ID document is genuine or a counterfeit

B.Extracting structured fields (name, DOB, document number) from passports and identity cards

C.Assigning employee ID numbers to workers in an HR management system

D.Recognising which employees are present using their ID badge photos

AnswerB

ID document recognition extracts identity fields from passports, driving licences, and ID cards — for KYC and verification workflows.

Why this answer

ID document recognition in Azure AI Document Intelligence is a prebuilt model designed to extract structured fields such as name, date of birth, document number, and expiration date from passports, driver licenses, and identity cards. It uses optical character recognition (OCR) and trained machine learning models to parse the document layout and return key-value pairs, not to verify authenticity or perform identity matching.

Exam trap

The trap here is confusing document data extraction with identity verification or facial recognition, leading candidates to select options that imply authentication or person identification rather than structured field extraction.

How to eliminate wrong answers

Option A is wrong because ID document recognition does not perform forgery detection or authenticity verification; it only extracts structured data from the document. Option C is wrong because assigning employee ID numbers is a business process unrelated to document analysis; Azure AI Document Intelligence does not generate or assign identifiers. Option D is wrong because recognizing employees from ID badge photos is a facial recognition or object detection task, not a document analysis capability; ID document recognition processes the text on the document, not the person in the photo.

28

MCQeasy

A company wants to use Azure Computer Vision to automatically analyze images of handwritten forms and extract the text for data entry. Which prebuilt Azure Computer Vision capability should they use?

A.Optical Character Recognition (OCR)

B.Image Analysis

C.Face API

D.Object Detection

AnswerA

OCR, specifically the Read API in Azure Computer Vision, is designed to extract text from images, including handwritten text, and converts it into machine-readable text.

Why this answer

Azure Computer Vision's Optical Character Recognition (OCR) capability is specifically designed to extract printed or handwritten text from images, including forms. It uses the Read API, which is optimized for text-heavy documents and supports handwritten text recognition, making it the correct choice for this scenario.

Exam trap

The trap here is that candidates often confuse Image Analysis (which can detect text in images as a general feature) with the dedicated OCR capability, but Image Analysis does not provide the same level of handwritten text extraction accuracy or structured output as the Read API.

How to eliminate wrong answers

Option B (Image Analysis) is wrong because it is primarily aimed at describing visual content (e.g., objects, tags, captions); for handwritten forms, the dedicated OCR capability (the Read API) is the intended choice. Option C (Face API) is wrong because it is dedicated to detecting and analyzing human faces, not text. Option D (Object Detection) is wrong because it identifies and locates objects within an image, not handwritten text.

29

MCQmedium

A museum wants to create an application that automatically generates descriptive captions for uploaded photos of artworks. The captions should describe the main subject, scene, and artistic style. Which Azure Computer Vision capability should they use?

A.Optical Character Recognition (OCR)

B.Image Analysis (with description feature)

C.Face API

D.Custom Vision (object detection)

AnswerB

Image Analysis includes a description feature that generates human-readable captions summarizing the image content, which fits the requirement for artwork captions.

Why this answer

Option B is correct because the Image Analysis capability in Azure Computer Vision includes a 'description' feature that generates human-readable captions summarizing the main subject, scene, and artistic style of an image. This is achieved through pre-trained deep learning models that analyze visual content and produce natural language descriptions, making it ideal for automatically captioning artwork photos.

Exam trap

The trap here is that candidates often confuse OCR (Option A) with image description, assuming text extraction can generate captions, or they mistakenly think Custom Vision (Option D) is required for any custom analysis, when in fact the pre-built Image Analysis description feature handles general scene and style captioning without training.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) is designed to extract printed or handwritten text from images, not to generate descriptive captions about the subject, scene, or style of an artwork. Option C is wrong because Face API specializes in detecting, analyzing, and recognizing human faces, but it cannot describe the overall scene or artistic style of an artwork. Option D is wrong because Custom Vision (object detection) requires training a custom model to identify specific objects or regions in images, and it does not provide pre-built natural language caption generation for general scenes or artistic styles.

30

MCQmedium

A museum wants to automatically generate descriptive tags for its digital art collection. They need to identify objects, themes, and artistic styles in the images without any custom training. Which Azure Computer Vision feature should they use?

A.Azure AI Custom Vision

B.Azure AI Computer Vision Image Analysis

C.Azure AI Face service

D.Azure AI Form Recognizer

AnswerB

The prebuilt Image Analysis service can detect objects, themes, and generate tags and descriptions from images without any custom training.

Why this answer

Azure AI Computer Vision Image Analysis provides pre-built models that can automatically generate descriptive tags for images, identifying objects, themes, and artistic styles without any custom training. This feature uses a set of thousands of recognizable objects, living beings, scenery, and actions, making it ideal for the museum's requirement to tag digital art without custom model development.

Exam trap

The trap here is that candidates may confuse Custom Vision (which requires training) with the pre-built Image Analysis feature, mistakenly thinking custom training is needed for domain-specific tasks like art tagging, when in fact the pre-built model already covers common objects and themes.

How to eliminate wrong answers

Option A is wrong because Azure AI Custom Vision requires users to upload and label their own images to train a custom model, which contradicts the 'without any custom training' requirement. Option C is wrong because Azure AI Face service is specialized for detecting and analyzing human faces (e.g., face location and facial landmarks) and cannot identify general objects, themes, or artistic styles. Option D is wrong because Azure AI Form Recognizer is designed to extract text and structure from documents (e.g., invoices, receipts) and is not intended for image content analysis or tagging.

31

MCQmedium

What capability does Azure AI Vision's 'celebrity recognition' feature provide?

A.Automatically scheduling meetings with celebrities based on their availability

B.Identifying well-known public figures in images and returning their names with confidence scores

C.Generating fictional celebrity lookalikes for entertainment applications

D.Verifying celebrity identities for event access control

AnswerB

Celebrity recognition uses a specialized domain model to identify famous public figures in images for media and content applications.

Why this answer

Azure AI Vision's celebrity recognition feature is a specialized domain-specific model that identifies well-known public figures (e.g., actors, politicians, athletes) within images. It returns the recognized celebrity's name along with a confidence score, enabling applications like media indexing or social media analysis. The model is pre-trained on a curated dataset of celebrity faces, so no custom training is required.

Exam trap

The trap here is that candidates confuse celebrity recognition (a pre-built, domain-specific model for identifying famous people) with general facial recognition or verification, which are separate capabilities in Azure AI Vision with different use cases and APIs.

How to eliminate wrong answers

Option A is wrong because Azure AI Vision does not have any scheduling or calendar integration capabilities; it is an image analysis service, not a productivity or meeting management tool. Option C is wrong because the feature does not generate or synthesize fictional lookalikes; it only identifies real, known individuals from a pre-defined database. Option D is wrong because celebrity recognition is not designed for identity verification or access control; it lacks the liveness detection and high-accuracy matching required for security scenarios, and Azure Face API (with person groups) would be used for that purpose.

32

MCQmedium

A logistics company needs to automatically read shipping labels on packages. The labels contain printed text in various fonts and sizes, as well as handwritten addresses. Which Azure Computer Vision capability should they use to extract the text from the labels?

A.Image Analysis

B.Face API

C.Optical Character Recognition (OCR) - Read API

D.Custom Vision

AnswerC

The Read API is purpose-built for extracting printed and handwritten text from images and documents, supporting various fonts and sizes.

Why this answer

The Read API (part of Azure Computer Vision's OCR capabilities) is specifically designed to extract printed and handwritten text from images, handling varied fonts, sizes, and styles. This makes it the correct choice for reading shipping labels that contain both printed text and handwritten addresses.

Exam trap

The trap here is that candidates often confuse Image Analysis (which can detect text in images but not extract it reliably from mixed formats) with the dedicated OCR Read API, or they mistakenly think Custom Vision can be trained for text extraction when it is designed for custom visual patterns.

How to eliminate wrong answers

Option A is wrong because Image Analysis is the general-purpose option for image descriptions, object detection, and tags; the Read API is the capability built specifically for extracting mixed printed and handwritten text. Option B is wrong because Face API is dedicated to detecting, recognizing, and analyzing human faces, not text. Option D is wrong because Custom Vision is used to train custom image classification or object detection models, not for out-of-the-box text extraction from labels.

33

MCQhard

A medical research team needs to analyze CT scans to identify and outline the exact boundaries of lung nodules. Which Azure Computer Vision capability should they use?

A.Image Classification

B.Object Detection

C.Semantic Segmentation

D.Optical Character Recognition (OCR)

AnswerC

Semantic Segmentation classifies every pixel, providing exact boundaries of each object, which is ideal for outlining lung nodules.

Why this answer

Semantic segmentation is the correct capability because it classifies each pixel in an image, enabling precise delineation of object boundaries. For CT scans, this allows the model to outline the exact shape and contour of lung nodules, which is essential for medical analysis. Image classification and object detection only provide labels or bounding boxes, not pixel-level boundaries.

Exam trap

The trap here is that candidates confuse object detection with semantic segmentation, assuming bounding boxes are sufficient for boundary outlining, but the exam tests the distinction between rectangular region identification and pixel-level precision.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label to the entire image, not identifying or outlining individual objects like nodules. Option B is wrong because object detection provides bounding boxes around objects, which are rectangular and cannot capture the irregular, precise boundaries of lung nodules. Option D is wrong because OCR extracts text from images, which is irrelevant to analyzing CT scans for nodule boundaries.
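
The gap between a bounding box and a pixel-level mask can be shown with a toy example. The 5x5 binary mask below is hypothetical (1 marks a nodule pixel); the sketch derives the tightest bounding box from it and compares the two areas.

```python
# Hypothetical segmentation output: 1 = nodule pixel, 0 = background.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]


def bounding_box(mask):
    """Tightest (x, y, w, h) rectangle that encloses all mask pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (
        min(cols),
        min(rows),
        max(cols) - min(cols) + 1,
        max(rows) - min(rows) + 1,
    )


def mask_area(mask):
    """Number of pixels classified as part of the object."""
    return sum(sum(row) for row in mask)


x, y, w, h = bounding_box(mask)
print("mask pixels:", mask_area(mask), "box pixels:", w * h)
```

The rectangle covers pixels that do not belong to the object, which is why object detection alone cannot trace an irregular boundary while a segmentation mask can.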

34

MCQhard

A manufacturing company wants to use Azure Computer Vision to inspect products on an assembly line for defects. They have a labeled dataset with images of defective and non-defective products. They need to not only classify products as defective or not, but also identify the exact location of the defect (e.g., a crack) in the image. Which Azure Computer Vision capability should they use?

A.Custom Vision object detection

B.Custom Vision image classification

C.Azure Face API

D.Optical Character Recognition (OCR)

AnswerA

Correct. Object detection can be trained to identify and locate defects (e.g., cracks) within an image, providing both classification and location.

Why this answer

Custom Vision object detection is the correct choice because it not only classifies what it finds (e.g., a crack) but also localizes each defect by drawing a bounding box around it. Once the labeled images are annotated with defect locations, they can be used to train a model that outputs both class labels and spatial coordinates, which is exactly what object detection provides.

Exam trap

The trap here is that candidates confuse image classification (which only labels the whole image) with object detection (which provides both classification and localization), leading them to choose Custom Vision image classification despite the explicit need for defect location.

How to eliminate wrong answers

Option B is wrong because Custom Vision image classification only assigns a single label to the entire image (e.g., 'defective' or 'non-defective') and cannot identify the exact location of a defect. Option C is wrong because Azure Face API is specialized for detecting, analyzing, and recognizing human faces, not for industrial defect localization. Option D is wrong because Optical Character Recognition (OCR) extracts text from images and has no capability to detect or localize physical defects like cracks.

35

MCQeasy

What is 'Azure AI Vision's image analysis v4.0' and what new capability does it add?

A.A version supporting 4K resolution images for the first time

B.Florence-powered advanced capabilities including dense captioning, embeddings, and improved background removal

C.A version requiring 4x more compute than the previous version

D.The fourth iteration of Microsoft's Kinect 3D depth sensor SDK

AnswerB

v4.0 brings Florence's language-vision understanding — enabling dense regional captions, vector embeddings, and richer scene understanding.

Why this answer

Azure AI Vision's image analysis v4.0 is a major update that leverages the Florence foundation model to deliver advanced capabilities such as dense captioning (generating detailed descriptions for multiple regions in an image), image embeddings (vector representations for similarity search), and improved background removal. This version significantly enhances the depth and accuracy of image understanding compared to previous versions.

Exam trap

The trap here is that candidates confuse 'version 4.0' with a simple incremental update (like resolution or performance tweaks) rather than recognizing it as a paradigm shift powered by the Florence foundation model, which is the core new capability tested.

How to eliminate wrong answers

Option A is wrong because '4.0' is an API version number and has nothing to do with 4K resolution; the headline additions are the Florence-powered features. Option C is wrong because the version number does not denote a compute multiplier, and the question tests functional improvements rather than resource requirements. Option D is wrong because Azure AI Vision is a cloud-based image analysis service and is unrelated to the Kinect depth sensor SDK, a separate hardware product for motion sensing.
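
The "embeddings for similarity search" idea can be sketched without calling any service. The example below assumes the image and query vectors have already been obtained somehow; the three-dimensional vectors, file names, and the helper `cosine_similarity` are made up for illustration (real embeddings have many more dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up image embeddings keyed by hypothetical file names.
image_vectors = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.2],
    "harbor.jpg": [0.8, 0.2, 0.1],
}
# Made-up embedding of a search query.
query_vector = [1.0, 0.0, 0.0]

# Rank images from most to least similar to the query.
ranked = sorted(
    image_vectors,
    key=lambda name: cosine_similarity(query_vector, image_vectors[name]),
    reverse=True,
)
```

Ranking by cosine similarity is one common choice for comparing embeddings; it depends only on the direction of the vectors, not their length.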

36

Multi-Select · medium

A manufacturing company wants to use Azure Computer Vision to automatically inspect products on an assembly line for defects. They need to identify and locate specific types of defects (e.g., scratch, dent, crack) in product images. Which Azure Computer Vision capabilities could be used together to achieve this? (Select two options.)

Select 2 answers

A. Object Detection

B. Semantic Segmentation

C. Optical Character Recognition (OCR)

D. Image Classification

Answers: A, B

Object Detection identifies and locates multiple objects (defects) in an image with bounding boxes.

Why this answer

Option A is correct because Object Detection can identify and locate multiple specific defect types (e.g., scratch, dent, crack) within product images by drawing a bounding box around each defect. Option B is correct because Semantic Segmentation classifies every pixel, so it outlines the exact shape and extent of each defect, which complements the coarser bounding boxes. Together they meet the requirement to both identify and locate defects on the assembly line.

Exam trap

The trap here is that candidates often confuse Image Classification with Object Detection, not realizing that classification labels the image as a whole and cannot locate individual defects within it.
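
One way to see how bounding boxes and pixel masks relate: a segmentation mask marks a defect's exact pixels, and a bounding box is the coarser rectangle that encloses them. The sketch below is a plain-Python illustration with a hand-written mask; the function `mask_to_box` and the label value `1` for "scratch" are assumptions for the example, not part of any Azure API.

```python
def mask_to_box(mask, target):
    """Return (left, top, right, bottom) enclosing all pixels equal to `target`.

    `mask` is a list of rows; returns None if the label does not occur.
    """
    rows = [r for r, row in enumerate(mask) if target in row]
    cols = [c for row in mask for c, value in enumerate(row) if value == target]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


# 0 = background, 1 = scratch (hypothetical labels).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

scratch_box = mask_to_box(mask, 1)                  # where the defect is
scratch_area = sum(row.count(1) for row in mask)    # how many pixels it covers
```

The box answers "where", while the mask additionally gives the defect's area and shape, which a box alone cannot express.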

37

MCQ · medium

A museum wants to create an app that allows visitors to take a photo of a painting and receive information about the artist, year, and style. The app needs to identify the painting from a database of thousands of artworks. Which Azure Computer Vision capability is most suitable?

A. Optical Character Recognition (OCR)

B. Image classification

C. Object detection

D. Face detection

Answer: B

Image classification analyzes the entire image content and returns a label. This can be trained to recognize each specific painting and provide its details.

Why this answer

Image classification is the correct choice because the app needs to assign a single label (the specific painting) to the entire photo. Azure Computer Vision's image classification models are trained to recognize and categorize entire images into predefined classes, which matches the requirement of identifying a painting from a database of thousands of artworks based on the visual content of the photo.

Exam trap

The trap here is that candidates confuse image classification (labeling the whole image) with object detection (locating objects within the image), but the requirement to identify the painting from a photo of the entire artwork makes classification the precise fit.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images, not visual features of paintings; it would only work if the painting had a visible label or plaque. Option C is wrong because object detection identifies and locates multiple objects within an image (e.g., people, furniture) and returns bounding boxes, but the app needs to classify the entire painting as a single entity, not detect sub-objects. Option D is wrong because face detection specifically identifies human faces in images, which is irrelevant to recognizing a painting's artistic attributes.

38

MCQ · medium

What is the difference between face detection and face identification?

A. Face detection identifies who the person is; face identification counts how many faces are present

B. Face detection finds face locations; face identification determines who the person is from an enrolled database

C. Face detection works on videos; face identification works on static images only

D. They are the same operation with different names

Answer: B

Detection = locating faces in an image; identification = matching detected faces to known individuals in an enrolled group.

Why this answer

Face detection is a computer vision task that locates human faces in an image or video, returning bounding box coordinates. Face identification (or recognition) goes a step further by matching a detected face against a database of enrolled individuals to determine a specific identity. Option B correctly distinguishes these two operations: detection finds where faces are, while identification determines who the person is.

Exam trap

The trap here is confusing the terms 'detection' and 'identification' as interchangeable, when in fact detection is a prerequisite for identification and they serve fundamentally different roles in a computer vision pipeline.

How to eliminate wrong answers

Option A is wrong because it reverses the definitions: face detection does not identify who the person is, and face identification does not count faces; a face count is simply the number of faces that detection returns. Option C is wrong because both operations can be applied to still images and to frames taken from video. Option D is wrong because they are distinct operations with different purposes and outputs: detection returns bounding boxes, while identification returns identity matches from a person group.
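
The two-stage relationship (detect first, then identify against enrolled people) can be sketched as a generic nearest-neighbour match. This is not how the Azure Face service is implemented or called; the feature vectors, names, the `identify` helper, and the `max_distance` threshold are all hypothetical.

```python
import math

# Enrolled group: name -> reference feature vector (made-up numbers).
enrolled = {"alice": [0.1, 0.9], "bob": [0.8, 0.2]}


def identify(face_vector, enrolled, max_distance=0.3):
    """Return the closest enrolled name, or None if nobody is close enough."""
    best_name, best_distance = None, float("inf")
    for name, reference in enrolled.items():
        distance = math.dist(face_vector, reference)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= max_distance else None


# Hypothetical output of a detection step: face locations, no identities yet.
detected_faces = [
    {"box": (40, 30, 80, 80), "vector": [0.15, 0.85]},
    {"box": (200, 35, 78, 82), "vector": [0.5, 0.5]},
]

# Identification is a second step applied to each detected face.
identities = [identify(face["vector"], enrolled) for face in detected_faces]
face_count = len(detected_faces)  # counting falls out of detection
```

The first face matches an enrolled person; the second is detected but stays unidentified because it is not close enough to anyone in the group.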

39

MCQ · easy

A museum wants to automatically transcribe handwritten labels on historical artifacts. The handwriting varies in style and may include numbers and special characters. Which Azure Computer Vision capability should they use?

A. Image captioning

B. Optical Character Recognition (OCR)

C. Facial recognition

D. Object detection

Answer: B

OCR extracts text from images, including handwritten text, numbers, and special characters, which matches the requirement.

Why this answer

Optical Character Recognition (OCR) is the correct choice because it is specifically designed to extract printed or handwritten text from images, including numbers and special characters. Azure Computer Vision's OCR API can handle varied handwriting styles and convert them into machine-readable text, making it ideal for transcribing historical artifact labels.

Exam trap

The trap here is that candidates may confuse OCR with image captioning, thinking both can 'read' text, but captioning describes the image contextually rather than extracting exact characters.

How to eliminate wrong answers

Option A is wrong because image captioning generates a natural language description of the overall scene or objects in an image, not the extraction of specific text characters. Option C is wrong because facial recognition identifies or verifies individuals based on facial features, which is unrelated to text transcription. Option D is wrong because object detection identifies and locates objects (e.g., vases, tools) within an image, but it does not read or transcribe any text present on those objects.

40

MCQ · easy

What is the Azure AI Vision Image Analysis 4.0's 'Florence' foundation model capable of?

A. Only detecting faces in images

B. Advanced image understanding including detailed captions, dense captioning, and multimodal embeddings

C. Only processing medical imaging for diagnostic purposes

D. Converting images into 3D models

Answer: B

Florence foundation model enables detailed image captioning, multi-region dense captions, background removal, and vision-language embeddings.

Why this answer

Option B is correct because the Florence foundation model in Azure AI Vision Image Analysis 4.0 is a multimodal model designed for advanced image understanding. It can generate detailed image captions, produce dense captions (describing multiple regions within an image), and create multimodal embeddings that align visual and textual representations for tasks like image search and similarity.

Exam trap

The trap here is that candidates may assume 'foundation model' only applies to language tasks (like GPT) and overlook that Florence is a multimodal vision-language model, leading them to choose a narrow option like face detection or medical imaging.

How to eliminate wrong answers

Option A is wrong because the Florence model goes far beyond face detection; it is a general-purpose vision model capable of scene understanding, object recognition, and captioning, not limited to facial analysis. Option C is wrong because Florence is a general-purpose foundation model and is not specialized for medical imaging or diagnosis. Option D is wrong because Florence does not convert images into 3D models; Image Analysis 4.0 focuses on understanding 2D images.

41

MCQ · medium

What is 'Azure AI Custom Vision' and how does it differ from Azure AI Vision?

A. Azure AI Vision is for video; Custom Vision is for still images only

B. Azure AI Vision offers pre-built general models; Custom Vision lets you train models for your specific categories

C. Custom Vision is more expensive because it uses more advanced AI algorithms

D. Azure AI Vision requires GPU compute; Custom Vision runs on CPU only

Answer: B

Custom Vision trains on your labelled images for domain-specific classification or detection — while Azure AI Vision's models are general-purpose.

Why this answer

Azure AI Vision provides pre-trained models for common computer vision tasks like object detection, OCR, and image analysis without requiring custom training data. Azure AI Custom Vision, on the other hand, allows you to upload your own labeled images and train a model to recognize specific categories or objects that are unique to your business scenario. This distinction makes B correct because it highlights the key difference: pre-built general models versus custom-trained models.

Exam trap

The trap here is that candidates often confuse 'Custom Vision' with being a more advanced or expensive version of Azure AI Vision, when in fact the core distinction is about customization versus pre-built functionality, not cost or hardware requirements.

How to eliminate wrong answers

Option A is wrong because neither service is defined by media type: Azure AI Vision analyzes still images as well as video, and Custom Vision performs image classification and object detection on still images. Option C is wrong because Custom Vision does not cost more on account of 'more advanced AI algorithms'; both services are billed by usage, and both are built on similar deep learning techniques. Option D is wrong because both are managed cloud services that you call through an API, so you do not choose GPU or CPU hardware for either one.

42

MCQ · medium

A logistics company needs to automatically read shipping labels on packages, which include text printed in various fonts and sizes, as well as handwritten addresses. Which Azure Computer Vision capability should they use?

A. Optical Character Recognition (OCR) via the Read API

B. Dense Captioning

C. Image Analysis - Object Detection

D. Image Analysis - Tagging

Answer: A

The Read API extracts text from images, including handwritten and printed text, making it ideal for reading shipping labels.

Why this answer

The Read API is the correct choice because it is specifically designed for extracting printed and handwritten text from images, handling various fonts, sizes, and styles. This makes it ideal for reading shipping labels that contain both machine-printed text and handwritten addresses.

Exam trap

The trap here is that candidates may confuse general image analysis capabilities (like tagging or object detection) with text extraction, not realizing that OCR via the Read API is the dedicated service for reading text from images.

How to eliminate wrong answers

Option B is wrong because Dense Captioning generates descriptive captions for regions of an image, not text extraction. Option C is wrong because Object Detection identifies and locates objects (e.g., boxes, pallets) but does not read text. Option D is wrong because Image Analysis - Tagging assigns descriptive tags to the entire image (e.g., 'package', 'label') but does not extract the textual content.

43

MCQ · medium

An autonomous driving company is developing a system that needs to understand the road scene at a granular level. For each pixel in a camera image, the system must classify whether it belongs to the road, a pedestrian, a vehicle, a traffic sign, or the sky. Which Azure Computer Vision capability should they use?

A. Image classification

B. Object detection

C. Semantic segmentation

D. Optical character recognition (OCR)

Answer: C

Correct. Semantic segmentation classifies every pixel, providing a dense understanding of the scene.

Why this answer

Semantic segmentation is the correct choice because it classifies every pixel in an image into a predefined category, such as road, pedestrian, vehicle, traffic sign, or sky. This pixel-level classification is essential for autonomous driving to understand the road scene at a granular level, enabling precise boundary detection and scene understanding.

Exam trap

The trap here is that candidates confuse object detection with pixel-level classification, assuming bounding boxes provide enough detail, but semantic segmentation is required for granular scene understanding where every pixel matters.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label to the entire image, not individual pixels, so it cannot distinguish between road, pedestrian, and sky in the same scene. Option B is wrong because object detection identifies and locates objects with bounding boxes, but it does not classify every pixel, missing fine-grained boundaries like the edge of a road or the shape of a traffic sign. Option D is wrong because optical character recognition (OCR) extracts text from images, such as reading a speed limit sign, but it does not classify pixels into scene categories like road or sky.
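
A segmentation result can be pictured as a grid with one class label per pixel. The sketch below uses a tiny hand-written 4x4 mask to show what "classify every pixel" yields; the class ids, `CLASS_NAMES`, and the helper `class_shares` are illustrative assumptions rather than the output format of any specific service.

```python
from collections import Counter

# Hypothetical class ids for a road scene.
CLASS_NAMES = {0: "sky", 1: "road", 2: "vehicle", 3: "pedestrian"}

# One label per pixel; a real mask has the same height and width as the image.
mask = [
    [0, 0, 0, 0],
    [0, 2, 2, 0],
    [1, 2, 2, 3],
    [1, 1, 1, 1],
]


def class_shares(mask):
    """Return the fraction of pixels assigned to each class name."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    return {CLASS_NAMES[label]: count / total for label, count in counts.items()}


shares = class_shares(mask)
```

Because every pixel carries a label, several classes coexist in one image, which is exactly what whole-image classification cannot express.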

44

MCQ · medium

What is the purpose of the Azure AI Document Intelligence's prebuilt models?

A. Training custom document extraction models for unique business forms

B. Extracting structured data from common document types (invoices, receipts, IDs) without custom training

C. Translating documents from one language to another

D. Converting documents to PDF format for archiving

Answer: B

Prebuilt models are pre-trained for common document types — you just point them at the document and receive structured extracted fields.

Why this answer

Azure AI Document Intelligence's prebuilt models are designed to extract structured data from common document types such as invoices, receipts, and IDs without requiring any custom training. They leverage pre-trained neural networks that recognize fields like invoice totals, receipt line items, and ID numbers, enabling rapid data extraction for standard forms. This aligns with the purpose of reducing manual data entry and accelerating document processing workflows.

Exam trap

The trap here is that candidates often confuse prebuilt models with custom models, assuming that all Document Intelligence models require training, when in fact prebuilt models are ready-to-use for common document types.

How to eliminate wrong answers

Option A is wrong because training custom document extraction models for unique business forms is the purpose of Document Intelligence's custom model feature, not its prebuilt models. Option C is wrong because translating documents between languages is a function of Azure AI Translator, not Document Intelligence. Option D is wrong because converting documents to PDF format for archiving is a general file conversion task, not a capability of Document Intelligence, which focuses on extracting information from documents, not changing their format.
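
What "structured extracted fields" means in practice can be sketched as a dictionary of named fields with confidence scores. The example below assumes such a result has already been obtained; the field names, values, and the helper `split_by_confidence` are hypothetical and do not reproduce the service's actual response format.

```python
# Hypothetical extracted fields; names and layout are illustrative only.
invoice_fields = {
    "VendorName": {"value": "Contoso Ltd.", "confidence": 0.98},
    "InvoiceTotal": {"value": 1240.50, "confidence": 0.95},
    "DueDate": {"value": "2024-07-01", "confidence": 0.62},
}


def split_by_confidence(fields, threshold=0.8):
    """Separate fields that can be accepted from those needing human review."""
    accepted = {
        name: field["value"]
        for name, field in fields.items()
        if field["confidence"] >= threshold
    }
    needs_review = sorted(
        name for name, field in fields.items() if field["confidence"] < threshold
    )
    return accepted, needs_review


accepted, needs_review = split_by_confidence(invoice_fields)
```

Routing low-confidence fields to a reviewer is a common design choice when extraction feeds an accounting or ERP system; the threshold of 0.8 here is arbitrary.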

45

MCQ · medium

What is 'document intelligence' (Azure AI Document Intelligence) and what types of documents can it process?

A. A service that creates documents from structured data in a database

B. A service that extracts structured data (fields, tables, key-value pairs) from forms and documents

C. A document management system for storing and organising files in Azure

D. A grammar checking tool that reviews documents for writing quality

Answer: B

Document Intelligence understands document structure to extract fields and tables from invoices, receipts, forms, and IDs.

Why this answer

Azure AI Document Intelligence (formerly Form Recognizer) is a service that uses optical character recognition (OCR) and machine learning to extract structured data—such as fields, tables, and key-value pairs—from scanned forms and documents. This enables automated processing of invoices, receipts, business cards, and other structured documents without manual data entry.

Exam trap

The trap here is that candidates confuse 'document intelligence' with general document management or editing tools, but the exam specifically tests that it is an extraction service for structured data from forms and documents.

How to eliminate wrong answers

Option A is wrong because Document Intelligence does not create documents from structured data; that describes a document generation or templating service, not an extraction service. Option C is wrong because Document Intelligence is not a document management system for storing and organizing files; that describes Azure Blob Storage or SharePoint. Option D is wrong because Document Intelligence does not check grammar or review writing quality; that is the job of a writing-assistance tool, not a document extraction service.
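
Table extraction is easier to picture with a small example. The sketch below assumes a table arrives as a flat list of cells, each tagged with a row and column position, and rebuilds the grid. The cell layout (`row`, `column`, `text`) and the helper `cells_to_grid` are hypothetical names chosen for illustration.

```python
def cells_to_grid(cells, row_count, column_count):
    """Rebuild a 2D table from a flat list of positioned cells."""
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        grid[cell["row"]][cell["column"]] = cell["text"]
    return grid


# Hypothetical extracted cells for a two-row, two-column table.
cells = [
    {"row": 0, "column": 0, "text": "Item"},
    {"row": 0, "column": 1, "text": "Qty"},
    {"row": 1, "column": 0, "text": "Widget"},
    {"row": 1, "column": 1, "text": "3"},
]

grid = cells_to_grid(cells, row_count=2, column_count=2)
```

Cells that the extractor did not return simply stay as empty strings, so a partially recognized table still has a regular shape.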

46

Drag & Drop · medium

Drag and drop the steps to create a bot with Azure Bot Service into the correct order.

Why this order

Creating a bot involves provisioning a resource, developing logic, testing, and connecting channels.

47

MCQ · medium

What does the 'read' operation in Azure AI Vision do?

A. Reads and describes what's happening in a video

B. Extracts printed and handwritten text from images and documents

C. Reads and verifies digital signatures in documents

D. Reads metadata (EXIF data) embedded in image files

Answer: B

The Read API/OCR extracts text from images and PDFs, returning text content with spatial location information.

Why this answer

The 'read' operation in Azure AI Vision is specifically designed to extract printed and handwritten text from images and documents using Optical Character Recognition (OCR) technology. It returns the detected text along with bounding box coordinates and confidence scores, making it suitable for digitizing documents, processing forms, and extracting text from photos.

Exam trap

The trap here is that candidates confuse the 'read' operation with the 'analyze image' operation (which describes images) or assume it handles video, but the 'read' API is strictly for text extraction from static images and documents.

How to eliminate wrong answers

Option A is wrong because the 'read' operation does not analyze video content; video analysis is handled by Azure AI Video Indexer, not the 'read' API. Option C is wrong because the 'read' operation does not verify digital signatures; signature verification is a cryptographic function, not a computer vision task. Option D is wrong because the 'read' operation does not read metadata such as EXIF data; EXIF data is read with image processing libraries, while the 'read' API focuses solely on text content within the image.
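
Since OCR output pairs each piece of text with a confidence score, a consumer typically decides what to do with uncertain words. The sketch below assumes a result already parsed into lines of words; the structure `ocr_lines`, the helper `transcribe`, and the `[?]` placeholder are hypothetical and are not the Read API's response schema.

```python
# Hypothetical OCR output: each line holds words with a confidence score.
ocr_lines = [
    {"words": [
        {"text": "SHIP", "confidence": 0.99},
        {"text": "TO:", "confidence": 0.97},
    ]},
    {"words": [
        {"text": "42", "confidence": 0.95},
        {"text": "Elm", "confidence": 0.58},
        {"text": "St", "confidence": 0.91},
    ]},
]


def transcribe(lines, min_confidence=0.6, placeholder="[?]"):
    """Join words into text, masking words below the confidence threshold."""
    output = []
    for line in lines:
        tokens = [
            word["text"] if word["confidence"] >= min_confidence else placeholder
            for word in line["words"]
        ]
        output.append(" ".join(tokens))
    return "\n".join(output)


text = transcribe(ocr_lines)
```

Masking rather than dropping uncertain words keeps their position visible, which helps when handwritten text is later checked by a person.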

48

MCQ · medium

What is 'scene understanding' in Azure AI Vision?

A. Classifying images by the type of filming location (indoor, outdoor, urban, rural)

B. Holistic comprehension of an image's full context, relationships, and scene description

C. Breaking an image into individual scenes for video timeline analysis

D. Determining the camera settings (ISO, aperture) used to capture a photograph

Answer: B

Scene understanding produces rich contextual descriptions ('red car parked by glass office building') — beyond mere object lists.

Why this answer

Scene understanding in Azure AI Vision goes beyond simple image classification to provide a holistic comprehension of an image's full context, including objects, their relationships, and a descriptive scene summary. This capability leverages deep learning models to analyze the entire visual content and generate human-readable captions that describe what is happening in the image, such as 'a group of people playing soccer in a park.'

Exam trap

The trap here is that candidates often confuse scene understanding with simpler image classification or metadata extraction, leading them to pick options like A or D, which describe narrower tasks rather than the holistic contextual analysis that defines scene understanding.

How to eliminate wrong answers

Option A is wrong because classifying images by filming location (indoor, outdoor, urban, rural) is a specific type of image classification or domain detection, not the comprehensive scene understanding that includes object relationships and full context. Option C is wrong because breaking an image into individual scenes for video timeline analysis is a video analysis task (e.g., shot detection or keyframe extraction), not a core capability of Azure AI Vision's scene understanding feature, which operates on static images. Option D is wrong because determining camera settings like ISO and aperture is metadata extraction or EXIF analysis, which is unrelated to the semantic understanding of image content provided by scene understanding.

49

MCQ · medium

A retail company uses security cameras to monitor shelves. They want to identify whether a customer is holding a specific product (e.g., a green detergent bottle) and also determine the location of that product within the camera frame. Which Azure Computer Vision capability should they use?

A. Object detection

B. Image classification

C. Optical character recognition (OCR)

D. Semantic segmentation

Answer: A

Object detection finds instances of objects and provides bounding box coordinates, which meets both the identification and localization requirements.

Why this answer

Object detection is the correct capability because it not only identifies the presence of a specific product (like a green detergent bottle) in an image but also returns bounding box coordinates that indicate the product's location within the camera frame. This dual output—classification plus localization—directly matches the requirement to both recognize the object and determine its position.

Exam trap

The trap here is that candidates often confuse object detection with image classification, thinking that identifying the product is sufficient, but they overlook the explicit requirement for location information that only object detection provides.

How to eliminate wrong answers

Option B is wrong because image classification assigns a single label to the entire image (e.g., 'detergent bottle') but does not provide any spatial information about where the object is located. Option C is wrong because OCR is designed to extract text from images, not to identify or locate physical products like a detergent bottle. Option D is wrong because semantic segmentation labels every pixel with a class rather than returning one bounding box per product instance, so it gives more detail than the task needs and is not the usual choice for locating a product in a frame.

50

MCQ · medium

A retail company uses overhead cameras to monitor shelf inventory in a store. They want to build a system that automatically detects whether a shelf section is empty or stocked, and specifically identify product categories (e.g., 'soft drinks', 'chips', 'canned goods') and count the number of items in each category. The company has a large set of labeled images showing different shelf states. Which Azure Computer Vision service should they use to build this custom detection and counting solution?

A. Computer Vision Image Analysis with dense captioning

B. Custom Vision object detection

C. Optical Character Recognition (OCR)

D. Azure Machine Learning with a pre-trained YOLO model

Answer: B

Custom Vision object detection is specifically designed for training models to detect and locate objects of interest. With labeled images of product categories, you can create a model that outputs bounding boxes around each detected item, enabling counting.

Why this answer

Custom Vision object detection is the correct choice because it allows the company to train a model on their labeled images to detect and localize specific product categories (e.g., 'soft drinks', 'chips') and count items within each category. Unlike pre-built Computer Vision features, Custom Vision enables custom object detection with bounding boxes and classification, which directly supports the requirement for detecting shelf states and counting items per category.

Exam trap

The trap here is that candidates confuse pre-built Computer Vision features (like dense captioning or OCR) with Custom Vision, assuming the pre-built features can recognize company-specific categories without training; Custom Vision is the option that lets you train an object detector on your own labels, which yields the bounding boxes needed for counting.

How to eliminate wrong answers

Option A is wrong because Computer Vision Image Analysis with dense captioning generates descriptive captions for regions of an image but does not provide object detection with bounding boxes or item counting per category, making it unsuitable for precise inventory counting. Option C is wrong because Optical Character Recognition (OCR) extracts text from images, which is irrelevant for detecting non-textual objects like product categories on shelves. Option D is wrong because Azure Machine Learning with a pre-trained YOLO model, while technically capable, is not a managed Azure Computer Vision service; the question asks for an Azure Computer Vision service, and Custom Vision is the appropriate PaaS offering for custom object detection without requiring manual ML pipeline setup.
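
Counting per category follows directly from detection output: each detected item is one entry, so counting is a tally over labels. The sketch below assumes predictions are already available as a list; the field names, the `count_items` helper, and the 0.5 threshold are hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical detections for one shelf image (bounding boxes omitted for brevity).
detections = [
    {"label": "soft drinks", "probability": 0.93},
    {"label": "soft drinks", "probability": 0.88},
    {"label": "chips", "probability": 0.81},
    {"label": "canned goods", "probability": 0.35},
]


def count_items(detections, min_probability=0.5):
    """Tally detections per category, ignoring low-confidence ones."""
    return Counter(
        d["label"] for d in detections if d["probability"] >= min_probability
    )


counts = count_items(detections)
# A shelf section counts as empty when no confident detection remains.
shelf_is_empty = sum(counts.values()) == 0
```

The low-confidence 'canned goods' detection is ignored, so its count is zero; the threshold trades missed items against false counts and would need tuning on real images.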

51

MCQ · medium

What does Azure AI Vision's 'people detection' (spatial analysis) feature track?

A. Identifying the names of specific people in video footage

B. Counting, tracking movement, and measuring occupancy of people in defined zones from video

C. Detecting whether people are wearing masks or safety equipment

D. Measuring individual people's heights and body dimensions

Answer: B

Spatial analysis tracks anonymous people to measure occupancy, queue length, zone entry/exit, and dwell time.

Why this answer

Azure AI Vision's spatial analysis (people detection) tracks the movement of people in video feeds, counting individuals and measuring how long they stay in defined zones. It does not identify specific people, detect masks or safety equipment, or measure body dimensions. This feature is designed for occupancy monitoring and flow analysis in physical spaces.

Exam trap

The trap here is that candidates confuse 'people detection' with facial recognition or attribute detection (like masks), but Azure AI Vision's spatial analysis is strictly about anonymous tracking and counting, not identification or detailed attribute analysis.

How to eliminate wrong answers

Option A is wrong because Azure AI Vision's people detection does not perform facial recognition or identify specific individuals; it only detects and tracks people as anonymous objects. Option C is wrong because detecting masks or safety equipment is a separate custom vision capability, not part of the spatial analysis people detection feature. Option D is wrong because the feature does not measure individual heights or body dimensions; it only tracks presence, movement, and occupancy in zones.
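
Zone occupancy and dwell time can be derived from anonymous position tracks alone, which is why no identification is needed. The sketch below is a simplified illustration and not the spatial analysis service's event format: the observation tuples, the rectangular `queue_zone`, and both helper functions are assumptions, and one observation per track per second is assumed.

```python
def in_zone(point, zone):
    """True if (x, y) lies inside the rectangle (left, top, right, bottom)."""
    x, y = point
    left, top, right, bottom = zone
    return left <= x <= right and top <= y <= bottom


# Hypothetical observations: (timestamp_seconds, anonymous_track_id, (x, y)).
observations = [
    (0, "t1", (1, 1)), (0, "t2", (8, 8)),
    (1, "t1", (2, 2)), (1, "t2", (4, 4)),
    (2, "t1", (6, 6)), (2, "t2", (3, 3)),
]
queue_zone = (0, 0, 5, 5)


def occupancy_by_time(observations, zone):
    """Number of tracked people inside the zone at each timestamp."""
    occupancy = {}
    for timestamp, _track_id, point in observations:
        occupancy.setdefault(timestamp, 0)
        if in_zone(point, zone):
            occupancy[timestamp] += 1
    return occupancy


def dwell_seconds(observations, zone, step=1):
    """Total seconds each track spent in the zone, assuming `step` seconds per observation."""
    dwell = {}
    for _timestamp, track_id, point in observations:
        if in_zone(point, zone):
            dwell[track_id] = dwell.get(track_id, 0) + step
    return dwell


occupancy = occupancy_by_time(observations, queue_zone)
dwell = dwell_seconds(observations, queue_zone)
```

The track ids are arbitrary labels such as `t1`; nothing in the data says who the person is, only where a tracked figure was at each moment.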

52

MCQ · medium

What is Azure AI Document Intelligence's 'custom extraction model' used for?

A. Automatically generating new document templates from existing forms

B. Training on your labeled documents to extract business-specific fields not covered by prebuilt models

C. Translating documents into multiple languages simultaneously

D. Redacting sensitive information from documents automatically

Answer: B

Custom extraction models handle unique business forms — labeled with your specific field names to train field extraction for your document types.

Why this answer

Option B is correct because a custom extraction model lets you train on your own labeled documents to extract fields that are specific to your business domain. This matters for specialized forms, such as company-specific contracts or medical records, whose fields the prebuilt models do not cover.

Exam trap

The trap here is that candidates often confuse custom extraction models with template generation or translation, assuming Document Intelligence can create templates or translate text, when in reality it is strictly for extraction and classification of document content.

How to eliminate wrong answers

Option A is wrong because custom extraction models do not generate new document templates; they learn to extract specific fields from existing documents. Option C is wrong because document translation is handled by Azure AI Translator, not Document Intelligence, which focuses on extraction and classification. Option D is wrong because redaction of sensitive information is not a built-in feature of custom extraction models; it would require additional processing, such as custom logic or integration with another service like Microsoft Purview.

53

MCQ · easy

What is 'Azure AI Vision's Read API' and what makes it superior for OCR?

A. The standard API for reading data from Azure Storage accounts and databases

B. An advanced OCR service handling multi-page PDFs, handwriting, and complex layouts with word-level coordinates

C. An API for reading audio content and converting it to text transcripts

D. A feature for reading the metadata of image files stored in Azure Blob Storage

Answer: B

The Read API goes beyond basic OCR — multi-page, handwriting, complex layouts, and word positions — powered by deep learning.

Why this answer

Azure AI Vision's Read API is an advanced OCR service that extracts text from images and documents, including multi-page PDFs, handwritten text, and complex layouts. It is superior because it returns word-level bounding box coordinates and confidence scores, enabling precise text localization and structured output for downstream processing.

Exam trap

The trap here is that candidates may confuse the Read API with other Azure services like Storage APIs or Speech services, overlooking that it is specifically a computer vision OCR service for text extraction from images and documents.

How to eliminate wrong answers

Option A is wrong because the Read API is not for reading data from Azure Storage accounts or databases; it is an OCR service for extracting text from visual content. Option C is wrong because reading audio content and converting it to text is the function of Azure Speech-to-Text, not the Read API. Option D is wrong because reading metadata of image files is not the purpose of the Read API; it extracts text content from images, not file metadata.

54

MCQeasy

A hotel booking website wants to automatically analyze guest-submitted photos of hotel rooms to verify if they contain common amenities such as a bed, a desk, and a chair. They want to use a prebuilt Azure AI service without any custom training. Which feature should they use?

A.Optical Character Recognition (OCR)

B.Image Analysis (prebuilt)

C.Object Detection

D.Handwriting OCR

AnswerC

Azure Computer Vision's prebuilt object detection identifies common objects (such as bed, desk, chair) in an image and returns their locations with bounding boxes. This is the correct capability for verifying the presence of specific furniture items.

Why this answer

Object Detection (prebuilt) is the correct choice because it can identify and locate multiple specific objects (bed, desk, chair) within an image by drawing bounding boxes around them. This prebuilt Azure AI Vision feature requires no custom training and directly supports detecting common amenities in hotel room photos.

Exam trap

The trap here is that candidates confuse the general tags and captions returned by Image Analysis (which describe the scene but do not localize objects) with Object Detection (which returns bounding boxes for specific objects), leading them to choose Option B incorrectly.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images, not objects like furniture. Option B is wrong in this context because Image Analysis is the umbrella capability, and its general tags and descriptions alone do not locate specific objects (e.g., bed, desk, chair) with bounding boxes; Object Detection names the exact feature that does. Option D is wrong because Handwriting OCR is a specialized form of OCR for handwritten text, not for detecting physical objects.

55

MCQmedium

What is 'image segmentation' and how does it differ from object detection?

A.Dividing an image file into smaller files for distributed storage

B.Classifying every pixel in an image to identify precise boundaries — more detailed than bounding-box object detection

C.Removing the background from an image by detecting edges

D.Dividing the training dataset into segments for cross-validation

AnswerB

Segmentation goes beyond bounding boxes to pixel-level classification — enabling precise shape delineation rather than just location.

Why this answer

Image segmentation classifies every pixel in an image into a category, producing pixel-level masks that outline objects with precise boundaries. This differs from object detection, which only draws rectangular bounding boxes around objects and does not trace their actual outlines. Option B correctly captures this finer granularity.

Exam trap

The trap here is that candidates confuse 'image segmentation' with simple background removal or edge detection, overlooking the requirement for pixel-level classification across all object categories.

How to eliminate wrong answers

Option A is wrong because it describes file splitting for storage, not a computer vision technique; image segmentation operates on pixel data within a single image, not on file distribution. Option C is wrong because it oversimplifies segmentation as mere background removal via edge detection, whereas true segmentation assigns every pixel to a class (e.g., road, car, pedestrian) and handles multiple objects and overlapping regions. Option D is wrong because it confuses dataset partitioning for model validation with the computer vision task of partitioning an image into semantic regions.
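
The difference between a pixel mask and a bounding box can be made concrete with a toy example. The sketch below is only an illustration on a hand-written binary mask (the mask values and helper names are made up); it derives the bounding box an object detector would report and compares it with the pixels the mask actually covers.

```python
from typing import List, Optional, Tuple

# A tiny 5x6 "segmentation mask": 1 marks object pixels, 0 marks background.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]


def bounding_box(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Smallest inclusive (left, top, right, bottom) box containing every object pixel."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def object_pixels(mask: List[List[int]]) -> int:
    """Number of pixels labelled as belonging to the object."""
    return sum(sum(row) for row in mask)


left, top, right, bottom = bounding_box(mask)
box_area = (right - left + 1) * (bottom - top + 1)
```

The box spans 12 pixels, but only 7 of them belong to the object; the remaining pixels are background that a bounding box cannot exclude and a segmentation mask can.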

56

MCQeasy

A photo sharing app wants to automatically generate descriptive captions for uploaded photos to improve accessibility for visually impaired users. Which Azure Computer Vision feature should they use?

A.Optical Character Recognition (OCR)

B.Object Detection

C.Image Classification

D.Describe Image (Image Captioning)

AnswerD

This feature generates a descriptive caption for an image, making it suitable for accessibility applications.

Why this answer

Option D is correct because the Describe Image (Image Captioning) feature of Azure Computer Vision generates human-readable captions that describe the content of an image. This directly meets the requirement of automatically generating descriptive captions for uploaded photos to improve accessibility for visually impaired users.

Exam trap

The trap here is that candidates often confuse Object Detection (identifying objects) with Image Captioning (describing the scene), or assume OCR is sufficient for accessibility when it only handles text extraction, not scene understanding.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images, not descriptive captions about the image content. Option B is wrong because Object Detection identifies and locates specific objects within an image, but does not generate a natural language description of the overall scene. Option C is wrong because Image Classification assigns a label or category to an image, not a natural-language sentence describing the scene.

57

MCQmedium

A city traffic department wants to use Azure Computer Vision to automatically analyze live video feeds from traffic cameras. They need to detect and locate common objects such as cars, pedestrians, and bicycles in each frame. The department does not have a labeled dataset for custom training. Which prebuilt Azure Computer Vision capability should they use?

A.Image Analysis (descriptive tags and captions)

B.Optical Character Recognition (OCR) API

C.Object Detection (part of Image Analysis 4.0)

D.Custom Vision object detection

AnswerC

Correct. The Object Detection API in Azure Computer Vision can detect and locate common objects in images without any custom training. It returns bounding boxes for objects like cars, people, and bicycles.

Why this answer

Option C is correct because the Object Detection capability within Image Analysis 4.0 can detect and locate common objects (e.g., cars, pedestrians, bicycles) in images or video frames without requiring any labeled dataset. It provides bounding box coordinates for each detected object, which directly meets the requirement to 'detect and locate' objects in live traffic camera feeds.

Exam trap

The trap here is that candidates may confuse 'descriptive tags' (Option A) with object detection, not realizing that tags only describe the scene without providing spatial location, which is essential for the 'locate' requirement in the question.

How to eliminate wrong answers

Option A is wrong because Image Analysis (descriptive tags and captions) generates labels and natural language descriptions for the entire scene, but it does not provide bounding boxes or precise locations of individual objects. Option B is wrong because Optical Character Recognition (OCR) is designed to extract printed or handwritten text from images, not to detect or locate non-text objects like cars or pedestrians. Option D is wrong because Custom Vision object detection requires a labeled dataset for training a custom model, which the department explicitly does not have.

58

MCQeasy

What is the Azure AI Vision service's 'Image Analysis 4.0' major new capability compared to previous versions?

A.Support for processing video files, which was not available in version 3.x

B.The Florence foundation model enabling detailed captions, dense captioning, background removal, and multimodal embeddings

C.Support for the first time for color analysis features in images

D.The ability to process images larger than 4MB for the first time

AnswerB

Florence foundation model powers Image Analysis 4.0's advanced capabilities: detailed captions, multi-region descriptions, background removal, and vector embeddings.

Why this answer

Image Analysis 4.0 introduces the Florence foundation model, which significantly enhances image understanding capabilities. This model enables detailed captions, dense captioning (generating captions for multiple regions within an image), background removal, and multimodal embeddings that align images and text in a shared vector space. These features go far beyond the classification, object detection, and OCR capabilities of version 3.x.
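
To show what "a shared vector space" buys you, here is a minimal sketch of text-to-image search by cosine similarity. The three-dimensional vectors and file names are invented for illustration; real embeddings have many more dimensions and would come from the service rather than being typed in by hand.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up image embeddings, keyed by a hypothetical file name.
image_vectors: Dict[str, List[float]] = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "office.jpg": [0.1, 0.9, 0.2],
}

# Made-up embedding standing in for a text query such as "sunny beach".
query_vector = [0.8, 0.2, 0.1]

# The best match is the image whose vector points most nearly the same way.
best_match = max(
    image_vectors,
    key=lambda name: cosine_similarity(image_vectors[name], query_vector),
)
```

Because images and text land in the same space, the same similarity function ranks images against a text query without any tags or captions being stored.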

Exam trap

The trap here is that candidates may confuse Image Analysis 4.0's new Florence model with general AI improvements, mistakenly thinking video support or larger file sizes are the headline feature, when the core innovation is the foundational model's advanced image understanding.

How to eliminate wrong answers

Option A is wrong because video processing is not a new capability of Image Analysis 4.0; Azure Video Indexer and Azure Media Services handle video, while Image Analysis remains focused on still images. Option C is wrong because color analysis features, such as dominant colors and accent color detection, have been available since earlier versions (e.g., Image Analysis 3.x). Option D is wrong because the accepted image file size is an operational limit, not the headline capability of version 4.0; the major new feature the question asks about is the Florence foundation model.

59

MCQmedium

A retail chain uses ceiling-mounted cameras to monitor shelf inventory. They need to identify and locate individual products (e.g., a specific brand of cereal) within an image and count how many are present. Which Azure Computer Vision capability should they use?

A.Image classification

B.Object detection

C.Optical character recognition (OCR)

D.Semantic segmentation

AnswerB

Object detection identifies and locates multiple objects of interest within an image, providing bounding boxes and enabling counting of each object type.

Why this answer

Object detection is the correct capability because it not only identifies the presence of a specific product (e.g., a brand of cereal) within an image but also localizes each instance by drawing bounding boxes around them, enabling an accurate count. Image classification would only label the entire image as containing cereal without locating individual boxes, while OCR and semantic segmentation serve different purposes (text extraction and pixel-level labeling, respectively).

Exam trap

The trap here is that candidates confuse object detection with image classification, assuming that labeling the image as 'cereal' is sufficient to count items, when in fact object detection is required for instance-level localization and counting.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label to the entire image (e.g., 'cereal') and cannot distinguish multiple instances or provide their locations, making it impossible to count individual products. Option C is wrong because optical character recognition (OCR) extracts text from images, not objects, so it cannot identify or count non-textual products like cereal boxes. Option D is wrong because semantic segmentation classifies every pixel into categories (e.g., 'cereal box' vs. 'shelf') but does not differentiate between individual instances of the same class, so it cannot count separate boxes of the same brand.
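
Counting from detections is straightforward once each instance comes back as its own record. The sketch below assumes a simplified detection list (the field names `label`, `confidence`, and `box`, the product labels, and the threshold are all hypothetical) and tallies instances per label.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical detector output: one entry per detected instance.
detections: List[Dict] = [
    {"label": "cereal_brand_a", "confidence": 0.92, "box": {"x": 10, "y": 20, "w": 40, "h": 60}},
    {"label": "cereal_brand_a", "confidence": 0.88, "box": {"x": 55, "y": 20, "w": 40, "h": 60}},
    {"label": "cereal_brand_b", "confidence": 0.81, "box": {"x": 100, "y": 22, "w": 38, "h": 58}},
    {"label": "cereal_brand_a", "confidence": 0.35, "box": {"x": 150, "y": 25, "w": 30, "h": 50}},
]


def count_by_label(detections: List[Dict], min_confidence: float = 0.5) -> Counter:
    """Count detected instances per label, ignoring low-confidence detections."""
    return Counter(
        d["label"] for d in detections if d["confidence"] >= min_confidence
    )


counts = count_by_label(detections)
```

An image classifier would return a single label for the whole shelf photo, so there would be nothing to tally; the per-instance records are what make the count possible.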

60

MCQmedium

A warehouse uses AI to monitor inventory. They need to detect the presence and location of specific objects (e.g., forklifts, pallets) in real-time video feeds. Which Azure Computer Vision capability should they use?

A.Image classification

B.OCR (optical character recognition)

C.Object detection

D.Facial recognition

AnswerC

Object detection identifies multiple objects within an image and returns their bounding boxes and class labels, perfect for locating forklifts and pallets in warehouse video.

Why this answer

Object detection is the correct choice because it identifies specific objects (e.g., forklifts, pallets) within an image or video frame and returns bounding box coordinates indicating their location. This capability is designed for real-time spatial awareness, which directly matches the warehouse's need to detect both the presence and position of objects in video feeds.

Exam trap

The trap here is that candidates confuse image classification (which only labels the whole scene) with object detection (which locates individual objects), especially when the question emphasizes 'presence and location' — a classic AI-900 pitfall.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label to an entire image (e.g., 'warehouse') but does not locate multiple objects or provide their positions. Option B is wrong because OCR extracts text from images, not physical objects like forklifts or pallets. Option D is wrong because facial recognition identifies or verifies human faces, not inanimate objects such as warehouse inventory.

61

MCQmedium

Which Azure AI service is used to index and extract insights from large collections of videos at scale?

A.Azure AI Custom Vision

B.Azure AI Video Indexer

C.Azure Blob Storage media services

D.Azure AI Speech transcription only

AnswerB

Video Indexer extracts transcripts, faces, topics, scenes, and more from videos automatically, making video libraries searchable.

Why this answer

Azure AI Video Indexer is the correct service because it is specifically designed to ingest large collections of videos, extract metadata (such as transcripts, faces, emotions, and keyframes), and provide searchable insights at scale. Unlike other Azure AI services, Video Indexer combines multiple AI models (speech, vision, and language) into a single pipeline optimized for video content, making it the appropriate choice for indexing and extracting insights from video libraries.

Exam trap

The trap here is that candidates confuse Azure AI Video Indexer with Azure AI Speech transcription only, assuming that extracting insights from video is solely about transcribing audio, when in fact Video Indexer combines speech, vision, and language AI to provide comprehensive video insights.

How to eliminate wrong answers

Option A is wrong because Azure AI Custom Vision is a service for training custom image classification and object detection models on still images, not for indexing or extracting insights from video collections. Option C is wrong because Azure Blob Storage is a scalable object storage service for unstructured data (including video files), but it does not perform AI-based indexing or insight extraction; it only stores the media. Option D is wrong because Azure AI Speech transcription only handles audio-to-text conversion (speech recognition) and does not provide video-specific insights such as scene detection, facial recognition, or keyframe extraction.

62

MCQeasy

A security company wants to use Azure Computer Vision to monitor a restricted area. They need to count the number of people present in each camera frame and draw bounding boxes around each person. Which Azure Computer Vision capability should they use?

A.Optical Character Recognition (OCR)

B.Image Analysis (object detection)

C.Face detection

D.Image classification

AnswerB

Object detection identifies objects within an image and returns their bounding boxes, making it suitable for counting and locating people.

Why this answer

Option B (Image Analysis with object detection) is correct because Azure Computer Vision's object detection capability can identify and locate multiple instances of a specific object class—in this case, people—within an image. It returns bounding box coordinates for each detected person, enabling the security company to count individuals and draw boxes around them in each camera frame.

Exam trap

The trap here is confusing face detection (which only finds faces) with object detection (which finds full people), leading candidates to choose Face detection when the requirement is to count people regardless of face visibility.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images, not people or objects, so it cannot count people or draw bounding boxes around them. Option C is wrong because Face detection specifically identifies and locates human faces, not full bodies; it would miss people whose faces are not visible (e.g., turned away or partially occluded) and does not count people as whole objects. Option D is wrong because Image classification assigns a single label to the entire image (e.g., 'restricted area') and does not provide bounding boxes or count multiple instances of an object within the image.

63

MCQmedium

What does Azure AI Vision return when it detects that an image may contain adult content?

A.The image is immediately deleted from Azure Storage

B.Boolean flags and confidence scores for adult, racy, and gory content categories

C.A list of specific body parts detected in the image

D.An age verification requirement for the requesting user

AnswerB

Azure Vision returns isAdultContent, isRacyContent, and isGoryContent flags with confidence scores for content moderation decisions.

Why this answer

Azure AI Vision's content moderation feature analyzes images for adult, racy, and gory content. It returns Boolean flags (indicating whether content is detected) and confidence scores (ranging from 0 to 1) for each category, allowing applications to make policy-based decisions without deleting or altering the original image.
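
Since the service only reports flags and scores, the policy lives in the calling application. The following sketch shows one possible policy; the flag names follow the ones quoted above, while the score field names (`adultScore`, `racyScore`, `goreScore`), the threshold, and the three outcomes are assumptions chosen for this example.

```python
from typing import Dict


def moderation_decision(adult: Dict, review_threshold: float = 0.5) -> str:
    """Map content flags and scores to an application-level action."""
    flags = [
        adult["isAdultContent"],
        adult["isRacyContent"],
        adult["isGoryContent"],
    ]
    scores = [adult["adultScore"], adult["racyScore"], adult["goreScore"]]

    if any(flags):
        return "block"   # the service itself flagged at least one category
    if max(scores) >= review_threshold:
        return "review"  # not flagged, but close enough to send to a human
    return "allow"


# Example result with no flags set but an elevated racy score.
borderline = {
    "isAdultContent": False, "isRacyContent": False, "isGoryContent": False,
    "adultScore": 0.10, "racyScore": 0.60, "goreScore": 0.00,
}
decision = moderation_decision(borderline)
```

Nothing in this flow deletes or alters the image; the function only decides what the application should do next.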

Exam trap

The trap here is that candidates assume Azure AI Vision automatically deletes or blocks content (Option A), when in fact it only returns classification metadata, leaving action decisions to the calling application.

How to eliminate wrong answers

Option A is wrong because Azure AI Vision does not automatically delete images from Azure Storage; it only returns classification metadata, and deletion would require explicit application logic. Option C is wrong because Azure AI Vision does not return lists of specific body parts; the adult content result consists only of category flags and scores, so anything more granular would require a custom object detection model. Option D is wrong because Azure AI Vision does not enforce age verification on the requesting user; it simply analyzes the image content and returns scores, leaving access control to the application.

64

MCQmedium

What is the Azure AI Custom Vision portal used for?

A.Managing Azure subscription billing for AI services

B.Training and evaluating custom image classification and object detection models without code

C.Building chatbots using natural language understanding

D.Monitoring the health of deployed AI services

AnswerB

Custom Vision portal provides a no-code UI for labeling images, training models, evaluating results, and deploying prediction endpoints.

Why this answer

The Azure AI Custom Vision portal is a no-code web interface that allows users to upload images, label them, and train custom image classification or object detection models. It abstracts away the underlying machine learning code, making it accessible for non-developers to build and evaluate computer vision models tailored to their specific use cases.

Exam trap

The trap here is that candidates confuse the Custom Vision portal with other Azure AI services like Computer Vision or LUIS, assuming it handles general image analysis or NLP tasks, when it is specifically for training custom models with user-provided labeled data.

How to eliminate wrong answers

Option A is wrong because managing Azure subscription billing for AI services is handled through the Azure Cost Management + Billing portal, not the Custom Vision portal. Option C is wrong because building chatbots using natural language understanding is the purpose of conversational language understanding in Azure AI Language (the successor to LUIS) together with Azure Bot Service, not Custom Vision. Option D is wrong because monitoring the health of deployed AI services is done via Azure Monitor or Application Insights, not the Custom Vision portal.

65

MCQmedium

A logistics company uses overhead cameras at a shipping dock to read labels on packages. The labels contain text in various fonts, sizes, and orientations, and sometimes the text is partially obscured. Which Azure Computer Vision capability should they use to extract the text from these labels?

A.Object detection

B.Optical Character Recognition (OCR)

C.Image classification

D.Semantic segmentation

AnswerB

OCR extracts text from images and is ideal for reading labels with varying fonts, sizes, and orientations.

Why this answer

Optical Character Recognition (OCR) is the correct choice because it is specifically designed to extract printed or handwritten text from images, handling variations in fonts, sizes, orientations, and partial occlusion. Azure Computer Vision's OCR API (Read API) uses deep-learning models to detect and digitize text from natural scenes, making it ideal for reading labels on packages in a logistics environment.

Exam trap

The trap here is that candidates may confuse object detection (which finds objects) with OCR (which reads text), or assume image classification can handle text extraction, when in fact OCR is the only Azure Computer Vision capability purpose-built for digitizing text from images.

How to eliminate wrong answers

Option A is wrong because object detection identifies and locates objects (e.g., packages, people) within an image, but it does not extract text content from labels. Option C is wrong because image classification assigns a single label or category to an entire image (e.g., 'shipping dock'), but it cannot read or digitize the text on labels. Option D is wrong because semantic segmentation partitions an image into pixel-level regions belonging to different classes (e.g., package vs. floor), but it does not perform text extraction.

66

MCQeasy

A construction safety team wants to automatically detect whether workers on a job site are wearing hard hats by analyzing images from surveillance cameras. They have a large set of labeled images containing workers wearing hard hats and workers without hard hats. The team needs to train a model that can identify the location of each hard hat in an image. Which Azure Computer Vision service should they use?

A.Custom Vision – Object Detection

B.Computer Vision – Optical Character Recognition (OCR)

C.Face API

D.Custom Vision – Image Classification

AnswerA

Custom Vision object detection can be trained with labeled images that contain bounding boxes around objects of interest, such as hard hats, and then outputs predictions with bounding boxes for new images.

Why this answer

Option A is correct because Custom Vision – Object Detection is specifically designed to identify and locate multiple objects within an image by drawing bounding boxes around them. The construction safety team needs to detect the location of each hard hat, which requires object detection, not just classification. Custom Vision allows training a model with labeled images that include bounding box annotations for objects like hard hats.

Exam trap

The trap here is that candidates often confuse Image Classification with Object Detection, thinking that classifying an image as containing a hard hat is sufficient, but the question explicitly requires identifying the location of each hard hat, which only Object Detection can provide.

How to eliminate wrong answers

Option B is wrong because Computer Vision – Optical Character Recognition (OCR) is used to extract text from images, not to detect objects like hard hats. Option C is wrong because Face API is designed for detecting and analyzing human faces, not for detecting objects such as hard hats. Option D is wrong because Custom Vision – Image Classification assigns a single label to an entire image (e.g., 'hard hat present' or 'no hard hat'), but it does not provide the location or bounding boxes of objects, which is required for identifying where each hard hat is in the image.

67

MCQmedium

What is the purpose of image 'ground truth' in training computer vision models?

A.The physical location where training images were captured

B.The verified, accurate labels or annotations for training images that the model learns to predict

C.The minimum image resolution required for accurate model training

D.The baseline accuracy of a computer vision model before fine-tuning

AnswerB

Ground truth provides correct answers for training examples — the model's goal is to produce predictions matching the ground truth.

Why this answer

In computer vision, 'ground truth' refers to the verified, accurate labels or annotations for training images. The model uses these correct labels during supervised learning to learn the mapping from image features to outputs, enabling it to make accurate predictions on new, unseen data.
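
Ground truth is also what predictions are scored against after training. As a small sketch (the file names and labels are invented), the snippet below compares a model's predicted labels with the ground-truth labels and reports the fraction that match.

```python
from typing import Dict

# Verified labels supplied by human annotators (the ground truth).
ground_truth: Dict[str, str] = {
    "img1.jpg": "hard_hat",
    "img2.jpg": "no_hard_hat",
    "img3.jpg": "hard_hat",
    "img4.jpg": "hard_hat",
}

# Labels produced by a hypothetical model for the same images.
predictions: Dict[str, str] = {
    "img1.jpg": "hard_hat",
    "img2.jpg": "hard_hat",
    "img3.jpg": "hard_hat",
    "img4.jpg": "no_hard_hat",
}


def accuracy(ground_truth: Dict[str, str], predictions: Dict[str, str]) -> float:
    """Fraction of images whose predicted label matches the ground-truth label."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    correct = sum(
        1 for name, label in ground_truth.items() if predictions.get(name) == label
    )
    return correct / len(ground_truth)


score = accuracy(ground_truth, predictions)
```

If the ground-truth labels themselves are wrong, both training and this score become unreliable, which is why the labels need to be verified.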

Exam trap

The trap here is confusing 'ground truth' with a physical or performance-related concept, when it strictly refers to the authoritative labels used to supervise model training.

How to eliminate wrong answers

Option A is wrong because 'ground truth' is a data quality concept, not a physical location; the physical capture location is irrelevant metadata. Option C is wrong because 'ground truth' has nothing to do with image resolution; resolution is a preprocessing concern, not a labeling concept. Option D is wrong because 'ground truth' is the correct label set, not a baseline accuracy metric; baseline accuracy is a performance measure, not a data attribute.

68

MCQeasy

What is the purpose of Azure AI Vision's 'color analysis' feature?

A.Detecting color defects in manufactured products

B.Identifying dominant colors, accent colors, and whether images are black and white

C.Converting images to grayscale for accessibility

D.Measuring the color accuracy of display screens

AnswerB

Color analysis returns the dominant foreground/background colors, accent color, and black-and-white status of images.

Why this answer

Azure AI Vision's color analysis feature is designed to extract color information from images, including the dominant foreground and background colors, accent colors, and whether the image is black-and-white. This helps in understanding the visual composition and mood of an image, which is useful for applications like branding, content moderation, and image categorization.

Exam trap

The trap here is that candidates confuse the descriptive 'color analysis' feature with corrective or diagnostic tasks (like defect detection or display calibration), when in fact it only extracts and reports existing color properties from the image.

How to eliminate wrong answers

Option A is wrong because color analysis in Azure AI Vision does not perform defect detection in manufactured products; that would require a custom computer vision model trained on specific defect patterns, not the general-purpose color analysis API. Option C is wrong because converting images to grayscale is a simple image processing operation, not a feature of Azure AI Vision's color analysis, which instead identifies if an image is already black-and-white. Option D is wrong because measuring color accuracy of display screens is a hardware calibration task, unrelated to Azure AI Vision's cloud-based image analysis capabilities.

69

MCQeasy

A security system uses cameras to detect whether a person is present at a restricted door. Which Azure Computer Vision capability should they use to detect the presence of human faces in the camera images?

A.Optical Character Recognition (OCR)

B.Face Detection

C.Object Detection

D.Image Classification

AnswerB

Face Detection is designed to locate human faces in an image, making it the appropriate choice for detecting presence of a person via facial features.

Why this answer

Face Detection is the correct choice because it is specifically designed to locate human faces in images, returning bounding box coordinates for each detected face. This capability directly addresses the requirement to detect whether a person is present at a restricted door by finding faces in camera images, without needing to recognize who the person is.

Exam trap

The trap here is that candidates often confuse Face Detection with Object Detection, thinking that any object detection model can handle faces equally well, but Azure's Face Detection is a specialized, pre-trained service optimized solely for human faces, returning face-specific details such as facial landmarks that generic Object Detection does not provide.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images, not human faces, so it cannot detect the presence of a person. Option C is wrong because Object Detection identifies and locates a wide range of objects (e.g., cars, animals) but is not specialized for human faces; while it could be trained to detect people, the question specifically asks for detecting human faces, which is the precise domain of Face Detection. Option D is wrong because Image Classification assigns a single label to an entire image (e.g., 'person present' or 'no person'), but it does not provide the location or bounding box of faces, which is required for detecting presence at a specific door.

70

MCQmedium

What is 'object tracking' in computer vision and how does it differ from object detection?

A.Detecting the same object across multiple images in a photo album

B.Maintaining the identity of detected objects across consecutive video frames with persistent IDs

C.Monitoring GPS location of physical objects using IoT sensors

D.Detecting when a tracked object leaves the camera's field of view

AnswerB

Tracking gives each object a consistent ID across frames — enabling trajectory analysis and unique person counting.

Why this answer

Object tracking maintains the identity of detected objects across consecutive video frames by assigning persistent IDs, enabling the system to follow the same object over time. This differs from object detection, which identifies and locates objects in a single frame without preserving identity across frames. Tracking is essential for scenarios such as counting unique people or vehicles in a video stream.

Exam trap

The trap here is that candidates confuse object detection (locating objects in a single frame) with object tracking (maintaining identity across frames), often selecting Option A because they think 'same object across images' implies tracking, but without temporal video context it is just detection or matching.

How to eliminate wrong answers

Option A is wrong because detecting the same object across multiple images in a photo album is a form of image matching or content-based image retrieval, not object tracking, which requires temporal continuity across video frames. Option C is wrong because monitoring GPS location using IoT sensors is a geolocation or telemetry task, not a computer vision workload, and does not involve analyzing visual data. Option D is wrong because detecting when a tracked object leaves the camera's field of view is a specific event detection that relies on tracking, but it is not the definition of object tracking itself; tracking is the continuous assignment of IDs across frames, not just the detection of exit events.
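
One simple way to picture persistent IDs is a greedy tracker that matches each new box to the previous frame's box it overlaps most. This is a deliberately minimal sketch, not how any particular Azure service implements tracking; the class name, the IoU threshold, and the sample boxes are all assumptions.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class SimpleTracker:
    """Greedy IoU tracker: keeps only the previous frame's boxes as state."""

    def __init__(self, min_iou: float = 0.3) -> None:
        self.min_iou = min_iou
        self.tracks: Dict[int, Box] = {}
        self.next_id = 1

    def update(self, boxes: List[Box]) -> List[int]:
        """Return one track ID per input box, reusing IDs where boxes overlap."""
        unmatched = dict(self.tracks)
        new_tracks: Dict[int, Box] = {}
        assigned: List[int] = []
        for box in boxes:
            best_id, best_score = None, self.min_iou
            for track_id, previous in unmatched.items():
                score = iou(box, previous)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:
                best_id = self.next_id  # no overlap: start a new track
                self.next_id += 1
            else:
                del unmatched[best_id]  # each old track is matched at most once
            new_tracks[best_id] = box
            assigned.append(best_id)
        self.tracks = new_tracks  # tracks not seen in this frame are dropped
        return assigned


tracker = SimpleTracker()
ids_frame1 = tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])
ids_frame2 = tracker.update([(52, 50, 62, 60), (1, 0, 11, 10)])
ids_frame3 = tracker.update([(1, 0, 11, 10), (200, 200, 210, 210)])
```

In the second frame the detections arrive in a different order yet keep their original IDs, and in the third frame a box with no overlap receives a fresh ID. Detection alone would report two boxes per frame with no link between frames.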

71

MCQmedium

What is the Azure AI Vision background removal feature used for?

A.Blurring the background to create depth of field effects

B.Automatically separating foreground subjects from the background in images

C.Identifying what type of background (indoor/outdoor) is in an image

D.Replacing backgrounds in video calls

AnswerB

Background removal isolates the main subject from its background, useful for product photography and visual content creation.

Why this answer

Azure AI Vision background removal is designed to automatically separate foreground subjects from the background in images, producing a mask or a cut-out of the primary object. This feature uses deep learning models to identify and isolate the main subject, enabling further processing like compositing or analysis without the background.
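
To make the distinction between the mask and what is done with it concrete, here is a small sketch in plain Python. It assumes the matte has already been produced (the service call is not shown), represents images as nested lists of RGB tuples, and uses the hypothetical helper names `cut_out` and `composite`.

```python
def cut_out(image, matte):
    """Attach the matte as an alpha channel: 255 keeps a pixel, 0 hides it."""
    return [
        [(r, g, b, a) for (r, g, b), a in zip(image_row, matte_row)]
        for image_row, matte_row in zip(image, matte)
    ]


def composite(cutout, background):
    """Downstream step: blend the RGBA cut-out over a new RGB background."""
    result = []
    for cut_row, bg_row in zip(cutout, background):
        row = []
        for (r, g, b, a), (bg_r, bg_g, bg_b) in zip(cut_row, bg_row):
            alpha = a / 255
            row.append((
                round(r * alpha + bg_r * (1 - alpha)),
                round(g * alpha + bg_g * (1 - alpha)),
                round(b * alpha + bg_b * (1 - alpha)),
            ))
        result.append(row)
    return result


# 2x2 image: the left column is the subject, the right column is background.
image = [[(200, 100, 50), (10, 20, 30)],
         [(210, 110, 60), (15, 25, 35)]]
matte = [[255, 0],
         [255, 0]]  # assumed output of the separation step
white = [[(255, 255, 255)] * 2 for _ in range(2)]

subject = cut_out(image, matte)       # what background removal produces
on_white = composite(subject, white)  # what an application might do next
```

Only `cut_out` corresponds to the separation the question describes. Replacing or blurring the background is something an application does afterwards with the resulting matte.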

Exam trap

The trap here is that candidates confuse background removal (subject isolation) with background replacement or blurring, which are downstream applications of the mask, not the feature itself.

How to eliminate wrong answers

Option A is wrong because blurring the background to create depth of field effects is not a function of Azure AI Vision background removal; that would be a post-processing effect applied after segmentation, not the core separation task. Option C is wrong because classifying the background type (indoor/outdoor) is a scene classification task, not background removal, which focuses on isolating the foreground subject regardless of background category. Option D is wrong because replacing backgrounds in video calls is a real-time video feature typically built into conferencing applications or custom solutions, not the static image background removal API of Azure AI Vision.

72

MCQmedium

A retail company uses ceiling-mounted cameras to monitor shelf stock. They want an automated system that analyzes each camera image to detect if any product is missing from its expected location on the shelf (a product gap). Which Azure Computer Vision capability should they use?

A.Image classification

B.Optical Character Recognition (OCR)

C.Object detection

D.Face detection

AnswerC

Object detection finds and locates objects within an image. By detecting the expected products, the system can determine if any are missing, indicating a gap.

Why this answer

Object detection is the correct choice because it identifies and locates multiple objects (e.g., product boxes) within an image. The system can then compare those detections with the expected shelf layout and flag any position where the expected product was not found. Unlike image classification, which assigns a single label to the entire image, object detection provides bounding boxes and class labels for each detected object, enabling precise gap analysis.
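
The gap check itself happens after detection, by comparing the returned boxes with the expected layout. A minimal sketch follows, assuming a hypothetical planogram that maps slot IDs to an expected label and shelf region; the `Detection` type, `find_gaps` function, and the 0.5 confidence cut-off are illustrative choices, not part of any Azure API.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class Detection:
    label: str
    box: Box
    confidence: float


def contains_center(region: Box, box: Box) -> bool:
    """True when the centre of `box` falls inside `region`."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]


def find_gaps(expected, detections, min_confidence=0.5):
    """Return slot IDs whose expected product was not confidently detected."""
    gaps = []
    for slot_id, (label, region) in expected.items():
        filled = any(
            d.label == label
            and d.confidence >= min_confidence
            and contains_center(region, d.box)
            for d in detections
        )
        if not filled:
            gaps.append(slot_id)
    return gaps


# Hypothetical shelf layout and detector output for one camera image.
expected = {
    "A1": ("cereal", (0, 0, 100, 100)),
    "A2": ("soup", (100, 0, 200, 100)),
    "A3": ("pasta", (200, 0, 300, 100)),
}
detections = [
    Detection("cereal", (10, 10, 90, 90), 0.92),
    Detection("soup", (110, 10, 190, 90), 0.30),  # too uncertain to count
    Detection("pasta", (210, 5, 290, 95), 0.88),
]
gaps = find_gaps(expected, detections)
```

This kind of per-slot comparison is only possible because each detection carries a location. A single image-level label would not say which slot is empty.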

Exam trap

The trap here is that candidates confuse image classification (which labels the whole scene) with object detection (which locates individual objects), leading them to choose option A when the task requires spatial awareness of multiple items.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label to the entire image (e.g., 'shelf with products') and cannot identify individual product locations or detect missing items. Option B is wrong because Optical Character Recognition (OCR) extracts text from images, which is irrelevant for detecting physical product gaps on shelves. Option D is wrong because face detection is specialized for locating human faces and has no application in monitoring shelf stock or product gaps.

73

MCQeasy

What is the primary challenge of deploying computer vision AI in real-world environments?

A.Computer vision models are too large to fit in cloud storage

B.Handling real-world variability in lighting, occlusion, image quality, and domain differences

C.The difficulty of displaying results in different languages

D.Obtaining legal permission to use cameras

AnswerB

Real-world deployment faces lighting variation, partial occlusion, quality differences, and training/production data distribution mismatches.

Why this answer

Option B is correct because real-world computer vision systems must cope with significant environmental variability—such as changing lighting conditions, partial occlusions, varying image resolutions, and domain shifts (e.g., training on studio photos but deploying on security camera feeds). These factors directly degrade model accuracy and require robust data augmentation, domain adaptation, or retraining strategies. Azure's Computer Vision service addresses this through pre-built models trained on diverse datasets and the ability to fine-tune with Custom Vision, but the fundamental challenge remains handling this variability at scale.
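
Data augmentation, mentioned above as one mitigation, can be illustrated with a small sketch. It assumes grayscale images stored as nested lists of 0 to 255 values and simulates two of the listed sources of variability, lighting and occlusion. The function names and parameter ranges are hypothetical, and real pipelines would normally use an image library instead.

```python
import random


def adjust_brightness(image, factor):
    """Scale pixel values to mimic darker or brighter lighting, clamped to 0-255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def occlude(image, top, left, height, width, fill=0):
    """Cover a rectangular patch to mimic an object blocking part of the view."""
    out = [row[:] for row in image]  # copy so the original stays unchanged
    for y in range(top, min(top + height, len(out))):
        for x in range(left, min(left + width, len(out[y]))):
            out[y][x] = fill
    return out


def augment(image, rng):
    """Produce one randomly perturbed training variant of `image`."""
    factor = rng.uniform(0.5, 1.5)
    rows, cols = len(image), len(image[0])
    top, left = rng.randrange(rows), rng.randrange(cols)
    return occlude(adjust_brightness(image, factor), top, left, rows // 2, cols // 2)


image = [[100] * 4 for _ in range(4)]
rng = random.Random(0)  # seeded so runs are repeatable
variants = [augment(image, rng) for _ in range(3)]
```

Training on such variants exposes the model to conditions closer to what it will see in production, although it does not remove the need to evaluate on real deployment data.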

Exam trap

The trap here is that candidates confuse operational or compliance hurdles (like camera permissions or language display) with the core technical challenge of model robustness in uncontrolled environments, leading them to pick a superficially plausible but incorrect option.

How to eliminate wrong answers

Option A is wrong because computer vision models are not inherently too large for cloud storage; Azure Blob Storage can easily accommodate models of any size, and the real constraint is inference latency and compute cost, not storage capacity. Option C is wrong because displaying results in different languages is a localization concern handled by Azure Translator or UI frameworks, not a primary challenge of computer vision deployment. Option D is wrong because obtaining legal permission to use cameras is a compliance or policy issue, not a technical challenge of deploying computer vision AI; the core difficulty lies in algorithmic robustness, not legal permissions.

74

MCQmedium

A warehouse uses ceiling-mounted cameras to monitor inventory shelves. The system needs to determine whether each shelf is 'full', 'half full', or 'empty' based on the entire image of the shelf. Which Azure Computer Vision capability should they use?

A.Optical Character Recognition (OCR)

B.Object detection

C.Image classification

D.Semantic segmentation

AnswerC

Image classification assigns a category to the entire image. It is ideal for determining whether a shelf is full, half full, or empty based on the overall visual content.

Why this answer

Image classification (C) is the correct choice because the system needs to assign a single label (full, half full, or empty) to the entire image of a shelf. Azure Computer Vision's image classification analyzes the whole image and outputs a single category or tag, which directly matches the requirement of determining the overall state of the shelf. Object detection would identify and locate multiple objects within the image, not classify the entire scene, and semantic segmentation would assign a label to every pixel, which is overkill for this task.
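
The shape of a classification result can be sketched as follows. This is an illustration only: it assumes a model that emits one raw score per class for the whole image, and the `LABELS` list, `softmax`, and `classify` names are hypothetical.

```python
import math

LABELS = ["full", "half full", "empty"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the single most likely label for the whole image."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# One set of scores per image: no boxes, no per-pixel labels.
label, probability = classify([0.2, 2.5, -1.0])
```

The output is one label with a confidence for the entire shelf image, in contrast to the list of boxes object detection would return or the per-pixel map produced by semantic segmentation.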

Exam trap

The trap here is that candidates confuse 'object detection' (which finds and locates objects) with 'image classification' (which labels the entire image), leading them to choose object detection when the task is to assign a single category to the whole scene.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images, not visual content like shelf fullness, and is irrelevant to classifying inventory levels. Option B is wrong because object detection identifies and locates individual objects (e.g., boxes) within an image, but the requirement is to classify the entire shelf image into one of three categories, not to detect multiple items. Option D is wrong because semantic segmentation assigns a class label to every pixel in the image, which provides detailed pixel-level masks rather than a single overall classification for the shelf.

75

MCQeasy

What is 'liveness detection' in Azure AI Face service?

A.Detecting whether a celebrity face in a photograph is still alive or deceased

B.Verifying that a face presented to a camera is a real live person, not a photo or video replay

C.Detecting human faces in real-time video streaming from security cameras

D.Monitoring whether a face recognition model remains accurate after deployment

AnswerB

Liveness detection prevents face spoofing attacks — distinguishing a live face from a photograph or video used for fraudulent authentication.

Why this answer

Liveness detection in Azure AI Face service is a security feature that distinguishes between a real, live person and a spoofing attempt such as a printed photo, video replay, or a 3D mask. It analyzes subtle cues like eye blinking, skin texture, and depth to ensure the face presented to the camera is physically present and alive. This prevents unauthorized access in identity verification scenarios.
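
In an identity verification flow, the liveness result typically acts as a gate alongside the face match. The sketch below shows only that decision logic under stated assumptions: the `FaceCheck` type, its fields, and the 0.9 threshold are hypothetical and do not reflect the actual response format of the Face service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceCheck:
    is_live: bool            # assumed outcome of a liveness check
    match_confidence: float  # assumed similarity to the enrolled face, 0.0 to 1.0


def grant_access(check: FaceCheck, min_match: float = 0.9) -> bool:
    """Require both a live face and a sufficiently strong match."""
    return check.is_live and check.match_confidence >= min_match


live_match = grant_access(FaceCheck(is_live=True, match_confidence=0.95))
photo_replay = grant_access(FaceCheck(is_live=False, match_confidence=0.99))
```

The second case is the one liveness detection exists for: a printed photo can match the enrolled face very closely, so a match score alone would let it through.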

Exam trap

The trap here is that candidates confuse liveness detection with general face detection or recognition, assuming any real-time face processing qualifies, when in fact liveness detection specifically addresses anti-spoofing and presentation attack detection.

How to eliminate wrong answers

Option A is wrong because liveness detection has nothing to do with determining if a celebrity is alive or deceased; that would be a biographical or news-related query, not a computer vision feature. Option C is wrong because detecting human faces in real-time video streaming is a general face detection capability, not specifically liveness detection, which focuses on verifying the authenticity of the face rather than just its presence. Option D is wrong because monitoring model accuracy post-deployment is a model management or MLOps concern, not a feature of the Face service itself.
