CCNA Describe Features Of Computer Vision Workloads On Azure Questions — Page 2 of 3

MCQ (easy)

What does Azure AI Vision's 'optical character recognition' (OCR) feature do?

A. Converts text files into images for archival purposes

B. Extracts printed and handwritten text from images and documents

C. Recognises optical fibre cables in data centre photographs

D. Corrects spelling errors in text extracted from forms

Answer: B

OCR reads text from photos and scanned documents — enabling digitisation of printed/handwritten content for further processing.

Why this answer

Azure AI Vision's OCR feature is designed to extract printed and handwritten text from images and documents, converting visual text into machine-readable data. This is correct because OCR uses deep learning models to detect and read text characters from various visual sources, enabling downstream processing like search or analysis.

Exam trap

The trap here is that candidates may confuse OCR with other computer vision tasks like object detection (Option C) or assume OCR includes post-processing like spell checking (Option D), when in fact OCR is strictly about text extraction from visual media.

How to eliminate wrong answers

Option A is wrong because OCR extracts text from images, not converts text files into images; that would be a rendering or archival process, not OCR. Option C is wrong because OCR recognizes text characters, not optical fibre cables; cable recognition would require object detection or image classification, not OCR. Option D is wrong because OCR only extracts text as-is without correcting spelling errors; spell correction is a separate natural language processing task.

MCQ (medium)

Which Azure AI capability can analyze video to identify and track specific people or objects across frames?

A. Azure AI Custom Vision

B. Azure AI Video Indexer

C. Azure AI Face

D. Azure AI Vision OCR

Answer: B

Video Indexer analyzes video content using AI, providing face identification, object tracking, scene detection, and automatic transcription.

Why this answer

Azure AI Video Indexer is the correct choice because it is specifically designed to analyze video content, including the ability to detect, track, and identify people or objects across frames using AI-powered computer vision and audio analysis. It provides features like face detection, object tracking, and motion detection over time, making it suitable for this scenario.

Exam trap

The trap here is that candidates often confuse Azure AI Video Indexer with Azure AI Custom Vision or Azure AI Face, mistakenly thinking that image-based services can handle video analysis, but Video Indexer is the only option that natively supports temporal tracking across video frames.

How to eliminate wrong answers

Option A is wrong because Azure AI Custom Vision is a service for training custom image classification and object detection models on static images, not for analyzing video streams or tracking objects across frames. Option C is wrong because Azure AI Face is focused solely on facial detection, recognition, and analysis in images, lacking the capability to track arbitrary objects or perform cross-frame video analysis. Option D is wrong because Azure AI Vision OCR (Optical Character Recognition) is limited to extracting text from images and documents, with no ability to analyze video or track people/objects.

MCQ (medium)

A manufacturing company wants to use computer vision to inspect products on an assembly line. They need to identify and locate specific types of defects (e.g., scratch, dent, crack) in product images. Which Azure Computer Vision capability should they use?

A. Image Classification

B. Object Detection

C. Optical Character Recognition (OCR)

D. Face Detection

Answer: B

Object detection can both classify and localize multiple objects (defects) in an image, providing bounding boxes around each defect type, which matches the requirement.

Why this answer

Object Detection is the correct choice because it not only classifies defects (e.g., scratch, dent, crack) but also provides bounding box coordinates to locate each defect within the product image. This meets the requirement to both identify and locate specific defect types on the assembly line.

Exam trap

The trap here is that candidates confuse Image Classification (which only labels the whole image) with Object Detection (which both classifies and localizes), missing the critical 'locate' requirement in the question.

How to eliminate wrong answers

Option A is wrong because Image Classification assigns a single label to the entire image (e.g., 'defective' or 'non-defective') and cannot locate multiple defects or their positions. Option C is wrong because Optical Character Recognition (OCR) extracts text from images, not visual defects like scratches or dents. Option D is wrong because Face Detection identifies human faces, not product defects.

MCQ (easy)

What is 'object detection' in computer vision and how does it differ from image classification?

A. Object detection and image classification produce the same output: both label the entire image

B. Object detection locates each object with a bounding box and class label; classification labels the whole image

C. Image classification processes images faster than object detection because it is simpler

D. Object detection only works on images with a single object; classification handles multiple objects

Answer: B

Detection = where are the objects AND what are they? Classification = what is the dominant content of this image?

Why this answer

Option B is correct because object detection goes beyond image classification by not only identifying the class of objects present but also localizing each one with a bounding box. In contrast, image classification assigns a single label to the entire image, regardless of how many objects are present. This distinction is fundamental in computer vision workloads on Azure, where Custom Vision and Computer Vision API offer both capabilities.

Exam trap

The trap here is that candidates may confuse object detection with image classification because both involve labeling objects, but the key differentiator is localization—object detection provides spatial coordinates (bounding boxes), while classification does not.

How to eliminate wrong answers

Option A is wrong because object detection and image classification do not produce the same output; classification labels the entire image, while detection outputs bounding boxes and labels for each object. Option C is wrong because while image classification is generally simpler and can be faster, the statement is not a defining difference—object detection is not inherently slower in all implementations, and the question asks for the functional difference, not performance. Option D is wrong because object detection is specifically designed to handle multiple objects in a single image, not just a single object; classification can also handle multiple objects but only produces one label for the whole scene.
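
The difference in output shape can be sketched in a few lines of Python. This is a minimal illustration only: the `Detection` and `BoundingBox` types, the sample values, and the normalized coordinates are assumptions made for this example, not the response format of any particular Azure API.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    # Hypothetical normalized coordinates (0.0 to 1.0) relative to image size.
    left: float
    top: float
    width: float
    height: float


@dataclass
class Detection:
    label: str
    confidence: float
    box: BoundingBox


# Classification: a single label describing the whole image.
classification_result = {"label": "defective", "confidence": 0.93}

# Detection: one entry per object, each with its own label and location.
detection_result = [
    Detection("scratch", 0.91, BoundingBox(0.10, 0.20, 0.15, 0.05)),
    Detection("dent", 0.84, BoundingBox(0.55, 0.40, 0.10, 0.10)),
    Detection("scratch", 0.77, BoundingBox(0.70, 0.75, 0.12, 0.04)),
]


def describe(detections):
    """Return one human-readable line per detected object."""
    return [f"{d.label} at ({d.box.left:.2f}, {d.box.top:.2f})" for d in detections]


print("whole image:", classification_result["label"])
for line in describe(detection_result):
    print(line)
```

The classification result answers "what is this image?", while the detection list answers "what is in it, and where?".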

MCQ (medium)

What is 'model export' in Azure Custom Vision and what formats are supported?

A. Exporting model training logs and metrics to Excel for analysis

B. Exporting trained models as ONNX, TensorFlow, CoreML, or Docker for offline/edge deployment

C. Exporting the training data to another Azure service for fine-tuning

D. Exporting a Custom Vision project as a YAML configuration file for source control

Answer: B

Custom Vision export enables edge AI — shipping the model to devices where cloud calls aren't possible or desirable.

Why this answer

Model export in Azure Custom Vision allows you to export a trained image classification or object detection model in formats like ONNX, TensorFlow, CoreML, or Docker container images. Export is available for models trained on one of the compact domains. This enables the model to run offline on edge devices or local servers without requiring a continuous connection to the Azure cloud, which is critical for low-latency or disconnected scenarios.

Exam trap

The trap here is that candidates confuse 'model export' with exporting training data or logs, but the term refers exclusively to the trained model artifact for offline deployment.

How to eliminate wrong answers

Option A is wrong because model export does not involve exporting training logs or metrics to Excel; those are accessed via training APIs or the Azure portal for analysis, not as an export feature. Option C is wrong because exporting the training data to another Azure service is not a built-in Custom Vision feature; data can be exported manually, but the 'model export' feature specifically exports the trained model artifact, not the dataset. Option D is wrong because Custom Vision does not export projects as YAML configuration files; project configuration is managed through the portal or SDK, and YAML exports are not a supported format for model deployment.

MCQ (easy)

What is the primary use case for Azure AI Document Intelligence's layout model?

A. Generating visual layouts for new document templates

B. Extracting the structural layout of documents including tables, text blocks, and positions

C. Converting documents between different file formats (PDF to DOCX)

D. Checking documents for grammatical and spelling errors

Answer: B

The layout model extracts document structure — identifying tables, paragraphs, headers, and their spatial relationships on the page.

Why this answer

Azure AI Document Intelligence's layout model is designed to extract the structural layout of documents, including tables, text blocks, and their spatial positions. Because it returns the recognized text together with the original reading order and layout hierarchy, it supports downstream tasks such as form understanding and document analysis.

Exam trap

The trap here is that candidates confuse the layout model's structural extraction with format conversion or content generation, leading them to pick options like A or C instead of recognizing its true purpose of spatial layout analysis.

How to eliminate wrong answers

Option A is wrong because generating visual layouts for new document templates is not a capability of the layout model; it is an extraction tool, not a design tool. Option C is wrong because converting documents between file formats (e.g., PDF to DOCX) is not a function of the layout model; format conversion is handled by separate document processing libraries or services. Option D is wrong because checking for grammatical and spelling errors falls under natural language processing (NLP) services like Azure AI Language, not the layout model, which focuses on spatial and structural extraction.
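
To make "structural layout" concrete, the sketch below rebuilds a table grid from a list of cells that each carry a row index, a column index, and text. The dictionary shape (`rowCount`, `columnCount`, `cells`, `rowIndex`, `columnIndex`, `content`) is an assumption chosen for illustration and should be checked against the service's actual response schema before use.

```python
def table_to_grid(table):
    """Place each cell's text at its (row, column) position in a 2D list."""
    grid = [["" for _ in range(table["columnCount"])]
            for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"]
    return grid


# Hypothetical extracted table: cells may arrive in any order.
sample_table = {
    "rowCount": 2,
    "columnCount": 2,
    "cells": [
        {"rowIndex": 1, "columnIndex": 1, "content": "12.00"},
        {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
        {"rowIndex": 0, "columnIndex": 1, "content": "Price"},
        {"rowIndex": 1, "columnIndex": 0, "content": "Widget"},
    ],
}

grid = table_to_grid(sample_table)
for row in grid:
    print(" | ".join(row))
```

The point is that layout extraction yields positions as well as text, so the original table can be reconstructed rather than read as a flat string.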

MCQ (medium)

A retail company wants to use Azure Computer Vision to monitor shelf inventory. They need to detect whether specific products (e.g., 'Brand A cereal', 'Brand B cereal') are present on a shelf and count the number of units of each product. They have a labeled dataset with images of each product category. Which Azure Computer Vision capability should they use?

A. Custom object detection

B. Optical character recognition (OCR)

C. Prebuilt image analysis (Describe Image)

D. Facial recognition

Answer: A

With custom object detection, you can train a model on labeled images to detect specific products (e.g., Brand A cereal) and count their occurrences, meeting the requirement.

Why this answer

Custom object detection (A) is correct because the retail company needs to detect and count specific product categories (e.g., 'Brand A cereal', 'Brand B cereal') from images, which requires training a model on a labeled dataset of those products. Azure Custom Vision's object detection capability allows you to upload labeled images, train a model to identify and locate multiple instances of each product in an image, and return a bounding box for each instance; counting the boxes per class gives the number of units. This is the only option that supports custom, multi-class object detection from user-provided training data.

Exam trap

The trap here is that candidates confuse prebuilt image analysis (which can 'describe' a scene) with custom object detection, not realizing that prebuilt models cannot be trained on specific product categories or provide per-object counts.

How to eliminate wrong answers

Option B (OCR) is wrong because OCR extracts text from images (e.g., the wording on product labels) but does not detect or count physical objects like cereal boxes; it cannot learn to recognize 'Brand A cereal' as a visual object without text. Option C (Prebuilt image analysis - Describe Image) is wrong because it generates a human-readable caption of the scene (e.g., 'a shelf with boxes') and provides generic tags, but it cannot be trained on custom product categories or output per-product counts with bounding boxes. Option D (Facial recognition) is wrong because it is designed to detect, analyze, and verify human faces, not inanimate objects like cereal boxes on a shelf.
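
Counting units from detection output is a small post-processing step. The sketch below assumes a hypothetical list of predictions, each with a `label`, a `confidence`, and a `box`; the field names and the 0.5 cut-off are illustrative choices, not values taken from the service.

```python
from collections import Counter


def count_products(predictions, min_confidence=0.5):
    """Count detections per label, ignoring those below the confidence cut-off."""
    return Counter(
        p["label"] for p in predictions if p["confidence"] >= min_confidence
    )


# Hypothetical detections for one shelf image (box = left, top, width, height).
predictions = [
    {"label": "Brand A cereal", "confidence": 0.92, "box": (0.05, 0.10, 0.12, 0.30)},
    {"label": "Brand A cereal", "confidence": 0.81, "box": (0.20, 0.10, 0.12, 0.30)},
    {"label": "Brand B cereal", "confidence": 0.88, "box": (0.40, 0.10, 0.12, 0.30)},
    {"label": "Brand A cereal", "confidence": 0.31, "box": (0.60, 0.12, 0.10, 0.28)},
    {"label": "Brand B cereal", "confidence": 0.67, "box": (0.75, 0.10, 0.12, 0.30)},
]

counts = count_products(predictions)
for label, units in sorted(counts.items()):
    print(f"{label}: {units}")
```

Each bounding box corresponds to one detected unit, which is why detection, unlike whole-image classification, can support per-product counts.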

MCQ (medium)

A quality control team uses computer vision to inspect manufactured parts. They need to detect whether a part has any defects and also identify the type of defect (e.g., scratch, crack, dent) from an image. Which Azure Computer Vision capability should they use?

A. Image classification

B. Object detection

C. Semantic segmentation

D. Optical character recognition (OCR)

Answer: B

Object detection identifies and localizes multiple defect types within an image.

Why this answer

Object detection is the correct capability because it not only identifies the presence of defects in an image but also localizes each defect with a bounding box and classifies it into specific types (e.g., scratch, crack, dent). This meets both requirements: detecting whether a part has defects and identifying the type of each defect.

Exam trap

The trap here is that candidates often confuse image classification with object detection, assuming that classifying the entire image as 'defective' is sufficient, but the question explicitly requires identifying the type of each defect, which necessitates localization and multi-class output.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label to the entire image (e.g., 'defective' or 'non-defective'), but it cannot identify multiple defect types or their locations within the same image. Option C is wrong because semantic segmentation assigns a class label to every pixel, which is overkill for defect type identification and does not inherently separate individual defect instances or provide bounding boxes. Option D is wrong because optical character recognition (OCR) extracts text from images, which is irrelevant to detecting physical defects like scratches, cracks, or dents.

MCQ (easy)

A nature conservation organization wants to create an app that automatically identifies different species of birds from photos uploaded by birdwatchers. They have thousands of labeled images of bird species. Which Azure service should they use to train a custom model?

A. Azure Computer Vision Image Analysis

B. Azure Custom Vision

C. Azure Face API

D. Azure Form Recognizer

Answer: B

Custom Vision is designed for training custom image classification or object detection models using labeled images, perfect for identifying different bird species.

Why this answer

Azure Custom Vision is the correct choice because it allows you to train a custom image classification model using your own labeled dataset of bird species. Unlike the pre-built Computer Vision Image Analysis service, Custom Vision specializes in fine-grained classification tasks where you need to distinguish between dozens or hundreds of visually similar categories, such as different bird species.

Exam trap

The trap here is that candidates confuse the general-purpose Computer Vision Image Analysis (which cannot be retrained) with Custom Vision (which is specifically designed for custom classification), leading them to pick option A.

How to eliminate wrong answers

Option A is wrong because Azure Computer Vision Image Analysis provides pre-built models for general image tagging, OCR, and object detection, but it cannot be trained on custom datasets to recognize specific bird species. Option C is wrong because Azure Face API is designed specifically for detecting and recognizing human faces, not animals or birds. Option D is wrong because Azure Form Recognizer (now named Azure AI Document Intelligence) is optimized for extracting text and structured data from documents like invoices and forms, not for image classification of natural subjects.

MCQ (easy)

A construction company uses drone images to survey construction sites. They need an automated system that can identify specific types of heavy equipment (e.g., bulldozers, cranes, excavators) in an image and also draw precise pixel-level outlines around each equipment type. Which Azure Computer Vision capability should they use?

A. Object detection

B. Semantic segmentation

C. Image classification

D. Optical Character Recognition (OCR)

Answer: B

Semantic segmentation assigns a class label to every pixel in the image, enabling precise outlines of objects like heavy equipment.

Why this answer

Semantic segmentation is the correct capability because it assigns a class label (e.g., bulldozer, crane, excavator) to every pixel in the image, producing precise pixel-level outlines around each equipment type. Object detection only provides bounding boxes, not pixel-level masks, while image classification labels the entire image without localization. OCR is irrelevant as it extracts text, not equipment shapes.

Exam trap

The trap here is that candidates confuse object detection (bounding boxes) with semantic segmentation (pixel-level masks), because both localize objects, but only segmentation provides the precise outlines required for detailed spatial analysis.

How to eliminate wrong answers

Option A (Object detection) is wrong because it returns bounding boxes around objects, not pixel-level outlines, so it cannot draw precise outlines around each equipment type. Option C (Image classification) is wrong because it assigns a single label to the entire image, failing to identify multiple equipment types or their locations. Option D (Optical Character Recognition) is wrong because it extracts text from images, not heavy equipment, and has no capability for object localization or segmentation.

MCQ (medium)

A warehouse deploys cameras to automatically process incoming packages. The system must read the serial numbers printed on each package label to update inventory records. The labels often have varied fonts and sizes, and may be slightly rotated. Which Azure Computer Vision capability should be used to extract the serial numbers?

A. Object detection

B. Optical Character Recognition (OCR)

C. Image classification

D. Facial recognition

Answer: B

OCR extracts text from images, handling various fonts, sizes, and orientations. This is the standard Azure Computer Vision capability for reading printed text like serial numbers.

Why this answer

Optical Character Recognition (OCR) is the correct Azure Computer Vision capability because it is specifically designed to extract printed or handwritten text from images, including serial numbers with varied fonts, sizes, and rotations. Azure's OCR API (part of Computer Vision) can handle skewed or rotated text by automatically detecting and correcting orientation before recognizing characters, making it ideal for warehouse labels that are not perfectly aligned.

Exam trap

The trap here is that candidates may confuse object detection (which can 'see' labels) with OCR, not realizing that object detection only locates objects without reading any text content on them.

How to eliminate wrong answers

Option A is wrong because object detection identifies and locates objects (e.g., packages, boxes) within an image by drawing bounding boxes, but it does not extract text or read serial numbers. Option C is wrong because image classification assigns a single label or category to an entire image (e.g., 'package' or 'label'), but it cannot read or extract specific alphanumeric strings like serial numbers. Option D is wrong because facial recognition is designed to detect, analyze, and identify human faces, not to process text on labels or packages.
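
OCR returns the text it finds; picking the serial number out of that text is a separate step. The sketch below assumes the OCR output is already available as a list of strings and that serial numbers follow a made-up `SN-` plus eight alphanumeric characters format. Both the sample lines and the pattern are hypothetical.

```python
import re

# Hypothetical serial format: "SN-" followed by 8 letters or digits.
SERIAL_PATTERN = re.compile(r"\bSN-[A-Z0-9]{8}\b")


def extract_serials(ocr_lines):
    """Return every serial-like token found in the recognized text lines."""
    serials = []
    for line in ocr_lines:
        # Upper-case first so labels printed in lower case still match.
        serials.extend(SERIAL_PATTERN.findall(line.upper()))
    return serials


# Stand-in for text lines returned by an OCR call on one package label.
ocr_lines = [
    "ACME Logistics",
    "Serial: SN-4F7K29QX",
    "Weight 2.4 kg",
    "sn-00ab12cd (return)",
]

print(extract_serials(ocr_lines))
```

A validation step like this also gives a simple way to flag labels where no serial-like token was recognized at all.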

MCQ (easy)

A historical society has scanned hundreds of books printed in the 19th century. They want to convert the scanned images into searchable, editable text. Which Azure Computer Vision capability should they use?

A. Optical Character Recognition (OCR)

B. Object detection

C. Image classification

D. Facial detection

Answer: A

OCR extracts printed text from images, making it searchable and editable.

Why this answer

Optical Character Recognition (OCR) is the Azure Computer Vision capability designed to extract printed or handwritten text from images and convert it into machine-readable, searchable, and editable text. For the historical society's scanned books, OCR can detect characters and words from the 19th-century prints and output them as digital text, enabling full-text search and editing.

Exam trap

The trap here is that candidates may confuse OCR with general image analysis capabilities like object detection or classification, not realizing OCR is the specific service for text extraction from images.

How to eliminate wrong answers

Option B (Object detection) is wrong because it identifies and locates objects (e.g., cars, animals) within an image, not text characters or words. Option C (Image classification) is wrong because it assigns a single label or category to an entire image (e.g., 'book cover'), rather than extracting specific text content. Option D (Facial detection) is wrong because it detects human faces and their attributes (e.g., age, emotion), which is irrelevant to converting printed text into editable format.

MCQ (medium)

A retail warehouse uses a camera system to locate and count boxes on shelves. The system needs to output the exact positions of each box by drawing a rectangular frame around it in the image. Which Azure Computer Vision capability should they use?

A. Object detection

B. Image classification

C. Semantic segmentation

D. Optical Character Recognition (OCR)

Answer: A

Object detection finds objects and returns their bounding boxes, which is precisely what is needed to locate and frame each box in an image.

Why this answer

Object detection is the correct capability because it identifies and localizes multiple objects within an image by drawing bounding boxes around each detected instance. In this scenario, the system needs to locate and count individual boxes on shelves, which requires both classification (what is a box) and localization (where each box is), exactly what object detection provides.

Exam trap

The trap here is that candidates confuse semantic segmentation with object detection because both involve 'segments' or 'regions,' but segmentation does not separate individual instances of the same object type, making it unsuitable for counting distinct boxes.

How to eliminate wrong answers

Option B (Image classification) is wrong because it assigns a single label to the entire image, not identifying or locating individual objects. Option C (Semantic segmentation) is wrong because it classifies every pixel into a category (e.g., 'box' vs 'shelf') but does not separate individual instances of the same class, so it cannot draw distinct bounding boxes around each box. Option D (Optical Character Recognition) is wrong because it extracts text from images, not relevant to locating physical boxes.
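
The instance problem with semantic segmentation can be shown on a toy mask. In the sketch below, a tiny grid of characters stands in for a per-pixel class map (`B` for box, `.` for background). The mask says which pixels are "box" but not how many boxes there are; recovering a count needs an extra step, here a simple connected-component pass that is introduced purely for illustration and would merge boxes that touch.

```python
def class_pixels(mask, label):
    """Return (row, col) positions of every pixel carrying the given class label."""
    return [
        (r, c)
        for r, row in enumerate(mask)
        for c, ch in enumerate(row)
        if ch == label
    ]


def count_components(mask, label):
    """Count 4-connected groups of pixels that share the given class label."""
    remaining = set(class_pixels(mask, label))
    components = 0
    while remaining:
        stack = [remaining.pop()]
        components += 1
        while stack:
            r, c = stack.pop()
            for neighbour in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    stack.append(neighbour)
    return components


# Toy semantic mask: two separate boxes, both labelled with the same class.
mask = [
    "........",
    ".BB..BB.",
    ".BB..BB.",
    "........",
]

print("box pixels:", len(class_pixels(mask, "B")))
print("separate regions:", count_components(mask, "B"))
```

Object detection avoids this extra step because it returns one bounding box per instance directly.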

MCQ (easy)

What does 'confidence score' mean in Azure AI Custom Vision object detection results?

A. The percentage of training images that contained this type of object

B. The model's certainty about a detection, used to set thresholds balancing false positives vs misses

C. The accuracy of the model measured on the test dataset during training

D. A quality rating assigned by human reviewers to confirm the detection is correct

Answer: B

Confidence scores enable threshold tuning — higher thresholds reduce false positives; lower thresholds reduce misses.

Why this answer

In Azure AI Custom Vision, the confidence score is a numerical value (0 to 1) that represents the model's certainty that a detected object is correctly identified and localized. This score allows you to set a threshold to filter out low-certainty detections, balancing false positives (incorrect detections that pass the threshold) against misses (true objects whose detections fall below it). It is not a measure of training data composition, test accuracy, or human review.

Exam trap

The trap here is that candidates confuse the confidence score with overall model accuracy or training data statistics, when in fact it is a per-prediction certainty value used to filter results.

How to eliminate wrong answers

Option A is wrong because the confidence score is not the percentage of training images containing that object; that would be a class distribution metric, not a per-detection certainty. Option C is wrong because the confidence score is a per-prediction value, not the overall model accuracy measured on a test dataset; test accuracy is a separate evaluation metric. Option D is wrong because the confidence score is computed by the model algorithmically, not assigned by human reviewers; human review is a separate validation step.
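
The threshold trade-off can be sketched with a handful of scored detections. The scores and the ground-truth flags below are made up for illustration; in practice the flags would come from a labeled validation set.

```python
def evaluate(scored, threshold):
    """Return (false_positives, misses) when keeping scores at or above threshold.

    `scored` is a list of (confidence, is_real_object) pairs.
    """
    false_positives = sum(
        1 for conf, is_real in scored if conf >= threshold and not is_real
    )
    misses = sum(1 for conf, is_real in scored if conf < threshold and is_real)
    return false_positives, misses


# Hypothetical detections: confidence score and whether the object truly exists.
scored = [
    (0.95, True),
    (0.88, True),
    (0.72, False),
    (0.61, True),
    (0.40, False),
    (0.35, True),
]

for threshold in (0.3, 0.5, 0.8):
    fp, missed = evaluate(scored, threshold)
    print(f"threshold {threshold}: {fp} false positives, {missed} misses")
```

Raising the threshold removes incorrect detections but starts to drop real ones, so the right value depends on which error is more costly for the application.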

MCQ (medium)

What is 'visual question answering' (VQA) in multi-modal AI?

A. A quiz application that shows images and asks users multiple-choice questions

B. AI that answers natural language questions about the content of a specific image

C. An interview format where candidates answer questions while being recorded on video

D. Generating images in response to visual prompts provided by the user

Answer: B

VQA combines vision and language understanding — answering 'what colour is the car?' or 'how many people?' from image analysis.

Why this answer

Visual Question Answering (VQA) is a multi-modal AI capability that combines computer vision and natural language processing. The system takes an image as input along with a natural language question about that image, and outputs a relevant answer. This is correct because VQA specifically requires the AI to understand both visual content and textual queries to generate a response, which is exactly what option B describes.

Exam trap

The trap here is that candidates confuse 'visual question answering' with 'image captioning' or 'image generation,' but VQA specifically requires answering a natural language question about an image, not describing it generically or creating new images.

How to eliminate wrong answers

Option A is wrong because it describes a quiz application where users answer questions about images, which is a human-driven activity, not an AI system that itself answers questions about images. Option C is wrong because it describes a human interview process with video recording, which has no relation to AI answering questions about image content. Option D is wrong because it describes image generation from prompts (text-to-image or image-to-image), which is the reverse direction of VQA—VQA takes an image and a question to produce an answer, not generate an image.

MCQ (easy)

A social media platform wants to automatically generate alternative text descriptions for images posted by users to improve accessibility for visually impaired users. Which Azure Computer Vision capability should be used?

A. Optical Character Recognition (OCR)

B. Image Captioning

C. Object Detection

D. Face Detection

Answer: B

Image Captioning automatically generates a natural language description of an image, making it suitable for alt-text generation.

Why this answer

Image Captioning is the correct capability because it generates human-readable descriptions of image content, which directly meets the requirement to produce alternative text for accessibility. Unlike other options, it synthesizes a complete sentence describing the scene, objects, and actions, making it ideal for screen readers.

Exam trap

The trap here is that candidates confuse Object Detection (which only lists objects) with Image Captioning (which generates a full description), leading them to choose C because they think identifying objects is sufficient for accessibility, but screen readers need natural language descriptions, not just object labels.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images, not descriptions of visual content, so it cannot describe a photo of a landscape or object. Option C is wrong because Object Detection identifies and locates specific objects within an image but does not generate a coherent textual description of the overall scene. Option D is wrong because Face Detection only identifies human faces and their attributes, ignoring other image content and context needed for alternative text.

MCQ (easy)

What types of documents does Azure AI Document Intelligence's prebuilt 'receipt' model extract data from?

A. Only digital PDF receipts with standardized formatting

B. Sales receipts from stores and restaurants, extracting merchant details, items, and totals

C. Medical receipts and prescription records only

D. Electronic bank transfer receipts for financial transactions

Answer: B

The prebuilt receipt model extracts merchant name, date, line items, tax, and total from retail and restaurant receipts.

Why this answer

Option B is correct because Azure AI Document Intelligence's prebuilt 'receipt' model is specifically designed to extract key information from sales receipts, such as merchant details, transaction items, and totals. It uses optical character recognition (OCR) and deep learning models to parse both printed and handwritten receipts from stores and restaurants, handling various formats and layouts.

Exam trap

The trap here is that candidates may assume the receipt model is limited to a specific format or type of receipt, but it is designed for general sales receipts from stores and restaurants, not specialized documents like medical or bank records.

How to eliminate wrong answers

Option A is wrong because the receipt model is not limited to digital PDFs with standardized formatting; it can process scanned images, photos, and various receipt layouts, including those with non-standard formatting. Option C is wrong because the receipt model is not specialized for medical receipts or prescription records; those fall outside its scope and would typically call for a custom model. Option D is wrong because electronic bank transfer receipts are not the target of this model; the receipt model focuses on point-of-sale receipts, not financial transaction records from banking systems.
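
Once merchant, line items, tax, and total have been extracted, a common follow-up is a consistency check. The sketch below uses a hand-written dictionary in place of real model output; the field names (`MerchantName`, `Items`, `TotalPrice`, `TotalTax`, `Total`) and the values are assumptions for illustration and may not match the service's actual schema.

```python
from decimal import Decimal


def totals_match(receipt):
    """Check that line items plus tax add up to the stated total."""
    items_sum = sum((item["TotalPrice"] for item in receipt["Items"]), Decimal("0"))
    return items_sum + receipt["TotalTax"] == receipt["Total"]


# Hypothetical extracted fields for one restaurant receipt.
receipt = {
    "MerchantName": "Contoso Cafe",
    "Items": [
        {"Description": "Latte", "TotalPrice": Decimal("4.50")},
        {"Description": "Bagel", "TotalPrice": Decimal("3.50")},
    ],
    "TotalTax": Decimal("0.70"),
    "Total": Decimal("8.70"),
}

print(receipt["MerchantName"], "totals consistent:", totals_match(receipt))
```

`Decimal` is used instead of `float` so that currency arithmetic compares exactly.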

MCQ (easy)

What can Azure AI Vision's spatial analysis feature do?

A. Extract text from documents and images

B. Analyze video to detect people's presence and movement in physical spaces

C. Identify the 3D coordinates of objects in satellite imagery

D. Generate 3D models from 2D photographs

Answer: B

Spatial analysis uses computer vision on video to count people, track movements, and monitor occupancy in physical environments.

Why this answer

Azure AI Vision's spatial analysis feature is designed to analyze video streams from cameras to detect the presence and movement of people in physical spaces. It uses computer vision models to track individuals, count occupancy, and understand movement patterns in real-time, enabling applications like retail analytics or workplace safety.

Exam trap

The trap here is that candidates confuse spatial analysis with general computer vision features like OCR or 3D reconstruction, assuming it can handle any image or video analysis task, when it is specifically focused on people detection and movement in physical spaces from live or recorded camera feeds.

How to eliminate wrong answers

Option A is wrong because extracting text from documents and images is the function of Azure AI Vision's OCR (Optical Character Recognition) capability, not spatial analysis. Option C is wrong because spatial analysis operates on video feeds from physical cameras, not satellite imagery, and it does not identify 3D coordinates of objects in such imagery. Option D is wrong because generating 3D models from 2D photographs is not a feature of spatial analysis; that would relate to photogrammetry or 3D reconstruction services, not Azure's spatial analysis.

MCQhard

An autonomous vehicle system needs to both read the speed limit text on traffic signs and detect the presence and location of pedestrians crossing the road. Which combination of Azure Computer Vision capabilities should be used?

A.Image Classification and OCR

B.Semantic Segmentation and OCR

C.Optical Character Recognition (OCR) and Object Detection

D.Face Detection and OCR

AnswerC

OCR reads text from signs, and object detection finds and locates pedestrians, which together meet both requirements.

Why this answer

The autonomous vehicle system requires two distinct capabilities: reading text from speed limit signs (OCR) and detecting the presence and location of pedestrians (Object Detection). OCR extracts text from images, while Object Detection identifies objects and provides bounding boxes around them, making option C the correct combination.

Exam trap

The trap here is that candidates confuse Semantic Segmentation with Object Detection, assuming pixel-level classification is needed for pedestrian location, but Object Detection provides the required bounding boxes for location without the computational overhead of per-pixel segmentation.

How to eliminate wrong answers

Option A is wrong because Image Classification assigns a single label to an entire image but does not provide bounding boxes or locations for multiple objects, so it cannot detect pedestrians' positions. Option B is wrong because Semantic Segmentation classifies every pixel into a category (e.g., road, pedestrian) but does not extract text from signs, and OCR alone cannot detect pedestrians. Option D is wrong because Face Detection specifically identifies human faces, not full pedestrian bodies, and cannot detect pedestrians crossing the road or read speed limit text.
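As a rough illustration of how the two capabilities complement each other, the sketch below combines hypothetical OCR output with hypothetical object detection output for one camera frame. The data shapes, the `parse_speed_limit` helper, and the label names are assumptions made for this example, not the output format of any Azure API.

```python
import re

# Hypothetical OCR output: text lines read from a traffic sign.
ocr_lines = ["SPEED", "LIMIT", "50"]

# Hypothetical object detection output: label plus box (x, y, width, height).
detections = [
    {"label": "traffic sign", "box": (410, 60, 80, 110)},
    {"label": "person", "box": (220, 180, 45, 130)},
]


def parse_speed_limit(lines):
    """Return the first 2-3 digit number found among the OCR lines, else None."""
    for line in lines:
        match = re.fullmatch(r"\d{2,3}", line.strip())
        if match:
            return int(match.group())
    return None


# OCR answers "what does the sign say?"
speed_limit = parse_speed_limit(ocr_lines)

# Object detection answers "where are the pedestrians?"
pedestrian_boxes = [d["box"] for d in detections if d["label"] == "person"]

print(speed_limit, pedestrian_boxes)
```

Neither output alone covers both requirements: the OCR lines carry no pedestrian locations, and the detection list carries no sign text.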

MCQmedium

What is image classification and how is it different from object detection?

A.Image classification labels the whole image; object detection finds and locates multiple objects within it

B.Image classification is faster; object detection is slower but more accurate

C.Image classification works on videos; object detection works on static images only

D.They are the same task with different names

AnswerA

Classification = one label for whole image; object detection = multiple objects each with class label and bounding box coordinates.

Why this answer

Image classification assigns a single label to an entire image based on its dominant content, such as 'cat' or 'dog'. Object detection goes further by not only identifying multiple objects within an image but also drawing bounding boxes around each one, providing both class labels and spatial locations. This distinction is fundamental in computer vision workloads on Azure, where Custom Vision and Computer Vision API offer separate capabilities for classification and detection tasks.

Exam trap

The trap here is that candidates confuse the output granularity—thinking object detection is just a 'more detailed' version of classification rather than a fundamentally different task with spatial localization, leading them to choose Option B or D.

How to eliminate wrong answers

Option B is wrong because while image classification can be computationally simpler, the statement that object detection is 'slower but more accurate' is misleading—accuracy depends on the specific model and use case, not a general trade-off; object detection provides more detailed output (locations), not inherently higher accuracy. Option C is wrong because both image classification and object detection can work on videos (e.g., frame-by-frame analysis) and static images; there is no restriction that classification is for videos and detection only for static images. Option D is wrong because image classification and object detection are fundamentally different tasks—classification labels the whole image, while detection identifies and localizes multiple objects, so they are not the same task with different names.
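The difference in output shape can be sketched with two small data structures. The class names, fields, and values below are hypothetical and chosen only to illustrate the contrast; they are not taken from an Azure SDK.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClassificationResult:
    # One label (with a confidence score) for the whole image.
    label: str
    confidence: float


@dataclass
class DetectedObject:
    # One entry per object: a label plus a bounding box (x, y, width, height).
    label: str
    confidence: float
    box: Tuple[int, int, int, int]


# Hypothetical results for the same street photo.
classification = ClassificationResult(label="street", confidence=0.93)
detections: List[DetectedObject] = [
    DetectedObject("car", 0.91, (40, 120, 200, 90)),
    DetectedObject("person", 0.88, (260, 100, 45, 130)),
    DetectedObject("person", 0.79, (320, 105, 40, 125)),
]

print(classification.label)                     # a single label for the image
print([(d.label, d.box) for d in detections])   # several labels, each with a location
```

Classification yields one record per image, while detection yields one record per object, which is why only detection can answer "how many" and "where".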

MCQeasy

What industries benefit most from Azure AI Document Intelligence's capabilities?

A.Only the entertainment industry for processing movie scripts

B.Finance, healthcare, legal, government, and any industry processing high volumes of documents

C.Only manufacturing for quality control inspection

D.Only retail for product catalog management

AnswerB

Document Intelligence automates data extraction from invoices, medical forms, legal contracts, and government forms across many industries.

Why this answer

Azure AI Document Intelligence (formerly Form Recognizer) is designed to extract, analyze, and structure data from documents at scale using prebuilt and custom models. Industries like finance, healthcare, legal, and government process massive volumes of forms, invoices, medical records, and contracts, making them the primary beneficiaries of automated document processing.

Exam trap

The trap here is that candidates may assume Document Intelligence is limited to a single vertical (like entertainment or manufacturing), when in fact it is a general-purpose service for any industry that handles structured or semi-structured documents.

How to eliminate wrong answers

Option A is wrong because the entertainment industry is not the sole beneficiary; Document Intelligence is built for any high-volume document processing, not just movie scripts. Option C is wrong because manufacturing quality control typically relies on computer vision for object detection and defect analysis, not document extraction. Option D is wrong because retail product catalog management is only one narrow use case, and Document Intelligence is designed for broad document types across many industries.

MCQhard

An autonomous vehicle team needs a system that not only identifies objects like cars and pedestrians but also creates a precise pixel-level mask for each individual object instance, even when objects overlap. Which Azure Computer Vision capability should they use?

A.Image classification

B.Object detection

C.Semantic segmentation

D.Instance segmentation

AnswerD

Instance segmentation provides a separate segmentation mask for each object instance, enabling precise separation even when objects overlap.

Why this answer

Instance segmentation (Option D) is the correct choice because it combines object detection with semantic segmentation to identify each individual object instance and generate a precise pixel-level mask for it, even when objects overlap. This capability is essential for autonomous vehicles to distinguish between multiple cars or pedestrians that may partially occlude each other, enabling safe navigation.

Exam trap

The trap here is that candidates confuse semantic segmentation (which labels every pixel by class but not by instance) with instance segmentation, leading them to choose Option C when the question explicitly requires per-instance masks for overlapping objects.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label to an entire image, not individual objects or pixel-level masks. Option B is wrong because object detection draws bounding boxes around objects but does not create pixel-level masks, so overlapping objects cannot be precisely separated. Option C is wrong because semantic segmentation assigns a class label to every pixel in the image but does not differentiate between individual instances of the same class (e.g., two overlapping cars would be merged into one 'car' region).
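A toy example can make the semantic versus instance distinction concrete. The sketch below uses a single row of six pixels with two adjacent cars; the masks and the `count_regions` helper are made up for illustration.

```python
# Semantic segmentation: each pixel gets a class label only.
semantic_mask = ["road", "car", "car", "car", "car", "road"]

# Instance segmentation: each mask belongs to one specific object (1 = pixel in mask).
instance_masks = {
    "car_1": [0, 1, 1, 1, 0, 0],
    "car_2": [0, 0, 0, 0, 1, 0],  # partly hidden behind car_1
}


def count_regions(mask, target):
    """Count contiguous runs of `target` in a 1-D class mask."""
    runs, inside = 0, False
    for label in mask:
        if label == target and not inside:
            runs += 1
        inside = label == target
    return runs


semantic_car_regions = count_regions(semantic_mask, "car")  # the cars merge
instance_car_count = len(instance_masks)                    # the cars stay separate

print(semantic_car_regions, instance_car_count)
```

The semantic mask shows a single connected "car" region, so the number of cars cannot be recovered from it, whereas the instance masks keep one entry per car.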

MCQmedium

What is 'brand detection' in Azure AI Vision?

A.Detecting counterfeit products by analysing product images

B.Identifying well-known brand logos and their locations within images

C.Analysing brand sentiment from customer review text

D.Detecting when Azure resources have been tagged with incorrect brand naming conventions

AnswerB

Brand detection locates and names brand logos in images — enabling media monitoring, retail compliance, and content analysis.

Why this answer

Brand detection in Azure AI Vision is a specialized feature that uses computer vision models to identify well-known brand logos within images and return their locations as bounding box coordinates. It is part of the Image Analysis API, specifically under the 'brands' visual feature, and does not involve text analysis, resource tagging, or counterfeit detection.

Exam trap

The trap here is that candidates confuse 'brand detection' with general object detection or text analysis, mistakenly thinking it involves counterfeit detection (A) or sentiment analysis (C), when in fact it is a specific logo-recognition feature within Azure AI Vision's Image Analysis API.

How to eliminate wrong answers

Option A is wrong because brand detection identifies logos, not counterfeit products; counterfeit detection would require custom model training or additional verification logic beyond the built-in brand detection capability. Option C is wrong because brand detection operates on visual image content, not text; sentiment analysis from customer reviews is a natural language processing (NLP) task handled by Azure AI Language, not Azure AI Vision. Option D is wrong because brand detection analyzes image content for logos, not Azure resource tags or naming conventions; resource tagging is an Azure governance feature unrelated to computer vision.

MCQeasy

A library wants to digitize a collection of old printed books by converting scanned pages into searchable, editable text. Which Azure Computer Vision capability should they use?

A.Image Analysis (descriptions and tags)

B.Optical Character Recognition (OCR)

C.Object detection

D.Face detection

AnswerB

OCR is designed specifically to detect and extract text from images, making it the ideal choice for converting scanned book pages into editable and searchable text.

Why this answer

Optical Character Recognition (OCR) is the Azure Computer Vision capability specifically designed to extract printed or handwritten text from images and convert it into machine-readable, searchable, and editable text. For digitizing old printed books, OCR can process scanned pages to produce digital text that can be indexed and edited, directly meeting the library's requirement.

Exam trap

The trap here is that candidates may confuse Image Analysis (which can describe a scene containing text) with OCR (which specifically extracts the text itself), leading them to choose option A when the task requires editable text output.

How to eliminate wrong answers

Option A is wrong because Image Analysis provides descriptions and tags for visual content (e.g., objects, scenes, colors) but does not extract text characters from images. Option C is wrong because Object detection identifies and locates objects within an image (e.g., chairs, cars) but cannot read or convert text. Option D is wrong because Face detection locates human faces in images and returns face-related attributes, which is unrelated to text extraction from scanned documents.

100

MCQeasy

A city transportation department wants to use a live camera feed at a bus stop to estimate how many people are waiting for the bus. Which Azure Computer Vision capability should they use?

A.Optical Character Recognition (OCR)

B.Face detection

C.Object detection

D.Semantic segmentation

AnswerC

Object detection can detect and count occurrences of objects like 'person' across the image, including people whose faces are not visible.

Why this answer

Object detection is the correct capability because it can identify and locate multiple people in a live camera feed, providing bounding boxes around each person. This allows the system to count the number of individuals waiting at the bus stop, which is the core requirement. Optical Character Recognition (OCR) extracts text, face detection finds faces but misses people whose faces are not visible, and semantic segmentation classifies each pixel, which is more than simple counting needs.

Exam trap

The trap here is that candidates might confuse face detection with people counting, but face detection fails when faces are not visible, whereas object detection with the 'person' class is more robust for counting people in a crowd.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) is designed to extract printed or handwritten text from images, not to detect or count people. Option B is wrong because face detection identifies faces and can count faces, but it may miss people whose faces are not visible (e.g., turned away or partially occluded), making it unreliable for accurate crowd counting. Option D is wrong because semantic segmentation assigns a class label to every pixel in an image, which is more granular than needed for counting people and is computationally heavier than object detection for this task.
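Counting people from detection output can be sketched as a simple filter. The detection records, the `count_people` helper, and the 0.5 confidence threshold below are assumptions for illustration, not values prescribed by Azure.

```python
def count_people(detections, min_confidence=0.5):
    """Count detections labelled 'person' at or above the confidence threshold."""
    return sum(
        1
        for d in detections
        if d["label"] == "person" and d["confidence"] >= min_confidence
    )


# Hypothetical detections for one frame of the bus stop camera.
frame_detections = [
    {"label": "person", "confidence": 0.92, "box": (30, 80, 50, 140)},
    {"label": "person", "confidence": 0.81, "box": (95, 85, 48, 135)},   # facing away
    {"label": "bench", "confidence": 0.88, "box": (20, 180, 200, 60)},
    {"label": "person", "confidence": 0.35, "box": (300, 90, 20, 60)},   # too uncertain
]

waiting = count_people(frame_detections)
print(waiting)
```

A person facing away from the camera still counts here because the filter relies on the 'person' label rather than on a visible face.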

101

MCQmedium

A retail company wants to use security cameras to automatically detect when products are removed from shelves. They need to identify the specific product type (e.g., a cereal box, a soda can) and count how many units are taken. Which Azure Computer Vision capability should they use?

A.Optical Character Recognition (OCR)

B.Object detection

C.Image tagging

D.Face detection

AnswerB

Object detection identifies and locates multiple objects within an image, returning a list of detected objects with their labels and bounding boxes. This enables the system to recognize product types and count each instance.

Why this answer

Object detection is the correct capability because it can both locate objects within an image (via bounding boxes) and classify them into specific categories (e.g., cereal box, soda can). This allows the system to identify the product type and count the number of units removed from shelves, which aligns directly with the requirement.

Exam trap

The trap here is that candidates often confuse image tagging (which labels the whole scene) with object detection (which identifies and locates individual objects), leading them to choose option C when the question explicitly requires counting and identifying specific product types.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images, not objects or product types. Option C is wrong because image tagging assigns descriptive labels to the entire image (e.g., 'grocery store') but does not provide bounding boxes or per-object counts. Option D is wrong because face detection is specialized for identifying human faces, not inanimate objects like products on shelves.

102

MCQeasy

A social media platform wants to automatically generate a textual description for each user-uploaded image to assist visually impaired users. Which prebuilt Azure Computer Vision feature should they use?

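A.Optical Character Recognition (OCR)

B.Image Analysis (image captioning / Describe Image)

C.Object detection

D.Face detection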

AnswerB

Image Analysis - Describe Image generates a complete sentence describing the image content, ideal for accessibility purposes.

Why this answer

Option B is correct because the Azure Computer Vision Image Analysis API includes a 'caption' feature that generates a human-readable textual description of an image's content. This prebuilt capability is specifically designed to assist visually impaired users by automatically producing alt-text for images, making it the ideal choice for the social media platform's requirement.

Exam trap

The trap here is that candidates often confuse object detection (which lists objects) with image captioning (which describes the scene), leading them to select object detection when the question explicitly asks for a textual description of the entire image.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images, not a general description of the image content. Option C is wrong because object detection identifies and locates specific objects within an image, but does not produce a coherent textual description of the overall scene. Option D is wrong because facial detection identifies human faces and their attributes, which is unrelated to generating a description of the entire image.

103

MCQmedium

What is 'retail intelligence' using computer vision and what business value does it provide?

A.AI that recommends products to online shoppers based on browsing history

B.Using store video to analyse traffic flow, dwell time, queue length, and planogram compliance

C.An AI system that processes retail POS transaction data to forecast sales

D.Sentiment analysis of customer reviews from retail websites to improve products

AnswerB

Retail intelligence converts physical store video into actionable analytics — matching the data richness of online shopping analysis.

Why this answer

Option B is correct because retail intelligence using computer vision involves analyzing video feeds from in-store cameras to extract actionable insights such as customer traffic flow, dwell time at shelves, queue lengths, and planogram compliance. This is a classic computer vision workload on Azure, which can be built with services such as Azure AI Vision spatial analysis, Azure AI Video Indexer, or Custom Vision, and it processes visual data rather than transactional or textual data.

Exam trap

The trap here is that candidates confuse computer vision with other AI workloads like recommendation engines or NLP, assuming any retail AI is 'retail intelligence' without recognizing the specific visual data source.

How to eliminate wrong answers

Option A is wrong because it describes a recommendation engine based on browsing history, which relies on collaborative filtering or content-based filtering, not computer vision. Option C is wrong because it refers to processing POS transaction data for sales forecasting, which is a time-series analytics task, not a computer vision workload. Option D is wrong because sentiment analysis of customer reviews uses natural language processing (NLP), not computer vision, to analyze text.

104

MCQmedium

What is 'image embedding' in computer vision and how is it used in visual search?

A.Inserting an image into a Word document or web page as an embedded object

B.Converting images to vectors that capture visual meaning for similarity search and retrieval

C.Compressing images before embedding them in a database to reduce storage costs

D.Annotating images with GPS coordinates embedded in the file metadata

AnswerB

Image embeddings enable finding visually similar images — powering reverse image search, product matching, and visual deduplication.

Why this answer

Image embedding converts images into dense vector representations (embeddings) that capture semantic visual features such as shapes, colors, and textures. In visual search, these embeddings enable similarity comparisons by calculating distances (e.g., cosine similarity) between query image vectors and a pre-indexed database of image vectors, allowing retrieval of visually similar images even without textual metadata.

Exam trap

The trap here is confusing 'embedding' as a general computing term (e.g., embedding an object in a document) with the specific machine learning concept of vector embeddings that capture semantic meaning for similarity search.

How to eliminate wrong answers

Option A is wrong because inserting an image into a document as an embedded object is a file-embedding operation, not a computer vision technique for representing visual content. Option C is wrong because compressing images reduces file size but does not produce a vector representation that captures semantic meaning for similarity search. Option D is wrong because annotating images with GPS coordinates adds geospatial metadata, not a vector embedding that encodes visual features for retrieval.
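The similarity comparison described above can be sketched with cosine similarity over toy vectors. Real embeddings have hundreds or thousands of dimensions and come from a trained model; the three-dimensional vectors and file names here are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical pre-indexed image embeddings.
index = {
    "red_sneaker.jpg": [0.9, 0.1, 0.2],
    "blue_sneaker.jpg": [0.8, 0.3, 0.1],
    "coffee_mug.jpg": [0.1, 0.9, 0.4],
}

# Hypothetical embedding of the query image (another sneaker photo).
query = [0.88, 0.12, 0.18]

# Rank indexed images from most to least similar to the query.
ranked = sorted(
    index,
    key=lambda name: cosine_similarity(query, index[name]),
    reverse=True,
)
print(ranked)
```

No tags or captions are consulted: the ranking comes entirely from how closely the vectors point in the same direction.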

105

MCQhard

A manufacturing company uses Azure Computer Vision to analyze assembly line images. They need to identify specific product defects (e.g., scratches, dents) and also read serial numbers printed on the products in various fonts. Which combination of Azure Computer Vision features should they use?

A.Image Analysis (object detection) and OCR

B.Custom Vision (object detection) and OCR

C.Face API and OCR

D.Image Analysis (tags) and OCR

AnswerB

Custom Vision object detection can be trained to identify and locate defects, while OCR reads the serial numbers. This combination solves both tasks effectively.

Why this answer

Option B is correct because the scenario requires two distinct capabilities: identifying specific defect types (scratches, dents) and reading variable-font serial numbers. Custom Vision's object detection model can be trained on labeled defect images to recognize those specific patterns, while Azure's OCR (the Read API in Azure AI Vision) extracts printed text regardless of font. Combining these two features directly addresses both requirements.

Exam trap

The trap here is that candidates assume the built-in Image Analysis object detection can be customized for defects, but it is a pre-trained general model, whereas Custom Vision is required for custom training.

How to eliminate wrong answers

Option A is wrong because Image Analysis's built-in object detection is a general-purpose model that cannot be trained to recognize custom defects like scratches or dents; it only detects common objects (e.g., person, car). Option C is wrong because Face API is designed solely for human face detection, recognition, and analysis, not for product defects or text extraction. Option D is wrong because Image Analysis's tagging feature assigns descriptive labels (e.g., 'metal', 'industrial') based on pre-trained categories, not custom defect identification, and cannot be trained for specific product flaws.

106

MCQmedium

A retail chain wants to automatically detect which specific products are missing from store shelves by analyzing images from in-store cameras. Each product has a distinct shape and label. Which Azure Computer Vision capability is most appropriate for this task?

A.Image Classification

B.Object Detection

C.Optical Character Recognition (OCR)

D.Facial Recognition

AnswerB

Object detection can locate and label multiple objects (products) within an image, allowing detection of missing items.

Why this answer

Object Detection (Option B) is the correct choice because it can identify and locate multiple products within an image by drawing bounding boxes around each detected object. This allows the system to determine which specific products are missing by comparing detected items against an expected inventory list. Image Classification would only label the entire image, not individual products, while OCR focuses on text extraction and Facial Recognition identifies people.

Exam trap

The trap here is that candidates often confuse Image Classification with Object Detection, thinking that classifying the entire image as 'shelf with products' is sufficient, but the task requires locating and identifying individual missing products, which only Object Detection can do.

How to eliminate wrong answers

Option A is wrong because Image Classification assigns a single label to the entire image (e.g., 'shelf with products'), but cannot distinguish or locate individual products to detect which ones are missing. Option C is wrong because Optical Character Recognition (OCR) extracts text from images, but products are identified by shape and label, not solely by text; OCR would fail for products without readable text or with non-textual labels. Option D is wrong because Facial Recognition is designed to identify or verify individuals by facial features, not to detect inanimate objects like products on shelves.
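The comparison against an expected inventory list can be sketched with Python's `collections.Counter`. The product names, expected quantities, and detected labels below are hypothetical.

```python
from collections import Counter

# Expected shelf contents (assumed planogram for this example).
expected = Counter({"cereal_box": 3, "soda_can": 4, "pasta_pack": 2})

# Labels returned by a hypothetical object detection model for one shelf image.
detected_labels = [
    "cereal_box", "cereal_box",
    "soda_can", "soda_can", "soda_can", "soda_can",
]
detected = Counter(detected_labels)

# Counter subtraction keeps only positive differences, i.e. the shortfall.
missing = expected - detected
print(dict(missing))
```

This per-product shortfall depends on having one detection per item, which a single image-level label cannot provide.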

107

MCQeasy

A logistics company scans thousands of packages daily. They need an automated system to read handwritten shipping labels to sort packages correctly. Which Azure Computer Vision capability should they use?

A.Image Analysis (descriptions and tags)

B.Optical Character Recognition (OCR)

C.Object Detection

D.Face API

AnswerB

OCR in Azure AI Vision is designed to extract printed and handwritten text from images, making it ideal for reading shipping labels.

Why this answer

The correct answer is B, Optical Character Recognition (OCR), because the scenario requires extracting handwritten text from images of shipping labels to automate sorting. OCR is the specific Azure Computer Vision capability designed to detect and read printed or handwritten text from images, returning machine-readable text that can be used for downstream processing.

Exam trap

The trap here is that candidates may confuse Image Analysis (which can describe scenes) with OCR, but Image Analysis does not extract text—it only provides visual descriptions and tags.

How to eliminate wrong answers

Option A is wrong because Image Analysis provides descriptions and tags for visual content (e.g., objects, scenes, colors) but does not extract text from images. Option C is wrong because Object Detection identifies and locates objects within an image (e.g., 'package', 'person') but cannot read or interpret text on labels. Option D is wrong because Face API is specialized for detecting, analyzing, and recognizing human faces, not for reading text.

108

MCQmedium

A security company needs to monitor a warehouse using video cameras. They want to detect whether any persons are present in a given frame and also know their approximate locations. Which Azure Computer Vision capability should they use?

A.Image classification

B.Object detection

C.Semantic segmentation

D.Optical Character Recognition (OCR)

AnswerB

Object detection identifies multiple objects of interest and provides bounding box coordinates, exactly what is needed to know that persons are present and where they are located.

Why this answer

Object detection is the correct choice because it not only identifies whether persons are present in a video frame but also provides bounding box coordinates indicating their approximate locations. This capability is specifically designed to locate multiple objects of interest within an image, which directly matches the requirement of detecting persons and knowing where they are.

Exam trap

The trap here is that candidates confuse object detection with image classification, thinking that simply labeling an image as containing a person is sufficient, but the question explicitly requires 'approximate locations' which only object detection provides.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label to the entire image (e.g., 'person present') but does not provide any location information for detected objects. Option C is wrong because semantic segmentation assigns a class label to every pixel in the image, which is overkill for simply locating persons and does not differentiate between individual instances of the same class. Option D is wrong because Optical Character Recognition (OCR) is designed to extract text from images, not to detect or locate persons.
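As a minimal sketch of how bounding boxes answer both "is anyone there?" and "roughly where?", the example below derives a center point for each detected person. The detection records and the `box_center` helper are assumptions for illustration, with boxes given as (x, y, width, height) in pixels.

```python
def box_center(box):
    """Return the center point of an (x, y, width, height) bounding box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


# Hypothetical detections for one warehouse camera frame.
detections = [
    {"label": "person", "box": (100, 50, 40, 120)},
    {"label": "forklift", "box": (300, 200, 150, 100)},
]

# Approximate location of each person, taken from the box center.
person_centers = [box_center(d["box"]) for d in detections if d["label"] == "person"]

# Presence check: true when at least one person was detected.
person_present = bool(person_centers)

print(person_present, person_centers)
```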

109

MCQmedium

Which Azure AI service enables you to train a custom image classification model with your own labeled images?

A.Azure AI Vision (pre-built)

B.Azure AI Custom Vision

C.Azure Machine Learning

D.Azure AI Face

AnswerB

Custom Vision lets you train custom image classification and object detection models by uploading and labeling your own images.

Why this answer

Azure AI Custom Vision (option B) is the correct service because it is specifically designed to allow users to upload their own labeled images, train a custom image classification model, and then deploy it via a REST API endpoint. Unlike the pre-built Azure AI Vision service, Custom Vision provides the ability to fine-tune a model on domain-specific visual concepts using transfer learning, making it ideal for bespoke classification tasks.

Exam trap

The trap here is that candidates confuse the pre-built Azure AI Vision service (which cannot be retrained) with the Custom Vision service, assuming that 'AI Vision' includes custom training capabilities, when in fact Custom Vision is a separate Azure resource with a distinct training workflow.

How to eliminate wrong answers

Option A is wrong because Azure AI Vision (pre-built) offers only pre-trained models for general image analysis (e.g., object detection, OCR, landmark recognition) and does not allow you to train a custom model with your own labeled images. Option C is wrong because Azure Machine Learning is a broader platform for building, training, and deploying any type of machine learning model (including custom vision models), but it requires manual implementation of deep learning frameworks and is not a dedicated, out-of-the-box service for image classification with labeled images like Custom Vision. Option D is wrong because Azure AI Face is a specialized service for detecting and analyzing human faces (e.g., face location, landmarks, identity verification) and cannot be used to train a custom image classification model for arbitrary objects or scenes.

110

MCQeasy

A quality control manager at a bottling plant needs an automated system to inspect images of bottles coming off the production line. The system must determine whether each bottle has a correctly sealed cap or is defective (cap missing or crooked). The manager has a set of labeled images showing both acceptable and defective bottles. Which Azure Computer Vision service should they use to build a model that classifies each bottle image as 'acceptable' or 'defective'?

A.Azure Face API

B.Azure Custom Vision (Image Classification)

C.Azure Form Recognizer

D.Azure OCR (Read API)

AnswerB

Custom Vision enables you to train a custom image classifier using your own labeled dataset, which is exactly what is needed to distinguish acceptable bottles from defective ones.

Why this answer

Azure Custom Vision (Image Classification) is the correct service because it allows you to upload labeled images of bottles (acceptable and defective) and train a custom image classification model to distinguish between the two classes. This service is specifically designed for scenarios where you need to classify images into user-defined categories without requiring deep learning expertise.

Exam trap

The trap here is that candidates may confuse Azure Custom Vision with Azure OCR or Form Recognizer because all three involve image analysis, but only Custom Vision allows training a custom classifier for non-text visual features like bottle cap integrity.

How to eliminate wrong answers

Option A is wrong because Azure Face API is designed for detecting, recognizing, and analyzing human faces in images, not for classifying industrial objects like bottle caps. Option C is wrong because Azure Form Recognizer is used for extracting text and structure from documents (e.g., invoices, forms), not for image classification tasks. Option D is wrong because Azure OCR (Read API) extracts printed or handwritten text from images, but does not perform image classification to determine if a bottle cap is sealed or defective.

Practice this question →

111

MCQhard

What is 'zero-shot object detection' in computer vision?

A.Object detection that runs with zero latency for real-time applications

B.Detecting objects described in text without any training examples of that specific class

C.Detection that works on black and white images (zero colour channels)

D.An object detection model with zero false positives on the test set

AnswerB

Zero-shot detection uses vision-language alignment — finding objects from descriptions rather than class-specific labelled examples.

Why this answer

Zero-shot object detection refers to a model's ability to detect objects in images based on a textual description of the target class, without having been trained on any labeled examples of that specific class. This is achieved by leveraging a joint embedding space where visual features and text features are aligned, allowing the model to generalize to unseen categories at inference time.
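The joint-embedding idea can be illustrated with a toy sketch. This is not an Azure API: the three-dimensional vectors, the prompts, and the `zero_shot_label` helper are all hypothetical stand-ins for what a vision-language model would produce. The sketch only shows how a region can be labelled by comparing its embedding with text embeddings, without any class-specific training.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_label(region_embedding, text_embeddings, threshold=0.5):
    """Return the best-matching text prompt for a region, or None if nothing is similar enough."""
    best_prompt, best_score = None, threshold
    for prompt, emb in text_embeddings.items():
        score = cosine(region_embedding, emb)
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt

# Hypothetical embeddings: in a real system both sides would come from a vision-language model.
text_embeddings = {
    "a forklift": [0.9, 0.1, 0.0],
    "a safety cone": [0.0, 0.2, 0.9],
}
region = [0.8, 0.2, 0.1]  # embedding of one candidate image region

label = zero_shot_label(region, text_embeddings)
```

Adding a new category here means adding a new text prompt, not collecting labelled images, which is the property the term 'zero-shot' refers to.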

Exam trap

The trap here is confusing the term 'zero-shot' with performance metrics like latency, image color depth, or accuracy, rather than understanding it as a training paradigm where the model generalizes to unseen classes via natural language descriptions.

How to eliminate wrong answers

Option A is wrong because zero-shot object detection does not imply zero latency; latency depends on model architecture, hardware, and optimization, not on the zero-shot capability. Option C is wrong because zero-shot refers to the absence of training examples for a class, not to the number of color channels in the input image; models can process grayscale or color images regardless. Option D is wrong because zero-shot object detection makes no claim about false positive rate; a model can have false positives even in a zero-shot setting, and achieving zero false positives is an unrealistic performance metric.

Practice this question →

112

MCQeasy

What is the difference between 'face detection' and 'face identification' in Azure AI Vision?

A.Face detection and identification are the same feature with different names

B.Detection locates faces and returns attributes; identification matches faces to a known person database

C.Detection works on live video; identification works only on still images

D.Face detection requires a paid tier; identification is available in the free tier

AnswerB

Detection = where are the faces? Identification = who are they? — identification requires enrolment of known faces and additional responsible AI approval.

Why this answer

Option B is correct because face detection locates human faces in an image and returns bounding box coordinates, landmarks (e.g., eyes, nose), and optional attributes such as head pose or whether glasses are worn. Face identification, part of the Azure Face API, goes a step further by matching a detected face against a secured person database (PersonGroup) to recognize a specific individual. This distinction is fundamental: detection finds faces, identification assigns an identity.
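The split between the two operations can be sketched in a few lines. This is a toy model, not the Face API: the `detect` and `identify` functions, the two-dimensional embeddings, and the distance threshold are assumptions made for illustration only.

```python
def detect(faces_in_image):
    """Detection: report where each face is, with no identity attached."""
    return [{"box": f["box"]} for f in faces_in_image]

def identify(face_embedding, enrolled, max_distance=0.6):
    """Identification: match one face to the closest enrolled person, if close enough."""
    best_name, best_dist = None, max_distance
    for name, ref in enrolled.items():
        dist = sum((a - b) ** 2 for a, b in zip(face_embedding, ref)) ** 0.5
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

# Hypothetical data standing in for model output.
image_faces = [
    {"box": (40, 30, 80, 80), "embedding": [0.1, 0.9]},
    {"box": (200, 50, 70, 70), "embedding": [0.9, 0.9]},
]
enrolled = {"alice": [0.1, 0.8], "bob": [0.7, 0.1]}  # people enrolled in advance

boxes = detect(image_faces)
names = [identify(f["embedding"], enrolled) for f in image_faces]
```

Detection returns a box for both faces, while identification names only the face that is close to an enrolled person; the second face stays unidentified because nobody similar was enrolled.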

Exam trap

The trap here is that candidates treat 'detection' and 'identification' as interchangeable, when Azure explicitly separates them as two distinct API operations with different capabilities and access requirements.

How to eliminate wrong answers

Option A is wrong because face detection and identification are distinct operations with different purposes and API endpoints; detection uses the 'Detect' operation, while identification uses the 'Identify' operation against a PersonGroup. Option C is wrong because both detection and identification work on still images and on frames taken from video; neither is restricted to one kind of input. Option D is wrong because the two features are not divided by pricing tier in this way: detection is not limited to paid tiers, and identification is the more restricted feature, since it requires enrolled faces and additional responsible AI approval.

Practice this question →

113

MCQeasy

What is 'background removal' in Azure AI Vision and what is it used for?

A.Removing background noise from audio in video recordings

B.Automatically separating the foreground subject from the image background

C.Deleting metadata embedded in image files before uploading to Azure

D.Removing blurry or out-of-focus areas from photographs

AnswerB

Background removal produces a cut-out of the subject — useful for e-commerce photography, virtual backgrounds, and image compositing.

Why this answer

Background removal in Azure AI Vision uses deep learning models to automatically detect and separate the primary foreground subject (e.g., a person, object, or animal) from the rest of the image. The service outputs either a cut-out image with a transparent background or a binary mask, enabling downstream tasks like compositing, product catalog creation, or privacy-focused image processing. This is a core computer vision capability, not related to audio, metadata, or image sharpness.
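How a binary mask turns into a transparent cut-out can be shown with a small sketch. It assumes the mask has already been produced by a segmentation model; the `apply_mask` helper and the 2x2 image are hypothetical and are not part of any Azure SDK.

```python
def apply_mask(rgb_pixels, mask):
    """Combine an RGB image with a binary foreground mask into RGBA pixels.

    Foreground pixels keep their colour with alpha 255; background pixels
    become fully transparent.
    """
    cutout = []
    for row_pixels, row_mask in zip(rgb_pixels, mask):
        cutout.append([
            (r, g, b, 255) if keep else (0, 0, 0, 0)
            for (r, g, b), keep in zip(row_pixels, row_mask)
        ])
    return cutout

# Tiny 2x2 image: left column is background, right column is the subject.
image = [
    [(10, 20, 30), (200, 180, 160)],
    [(15, 25, 35), (210, 190, 170)],
]
mask = [
    [0, 1],
    [0, 1],
]
cutout = apply_mask(image, mask)
```

The two output modes mentioned above correspond to returning `mask` itself or returning `cutout`.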

Exam trap

The trap here is that candidates confuse 'background removal' with general image cleanup tasks like noise reduction or blur removal, or mistakenly associate it with audio processing because of the word 'background' in a different context.

How to eliminate wrong answers

Option A is wrong because background removal in Azure AI Vision operates on images, not audio; removing background noise from audio is a speech or audio processing task, not a computer vision feature. Option C is wrong because deleting metadata (e.g., EXIF data) is a file management or privacy operation, not a computer vision capability; Azure AI Vision does not remove metadata as part of its image analysis. Option D is wrong because removing blurry or out-of-focus areas is an image enhancement or deblurring task, not the foreground/background segmentation that background removal performs.

Practice this question →

114

MCQmedium

What is 'multi-modal AI' and how does Azure AI Vision support it?

A.AI that processes data in multiple programming languages simultaneously

B.AI that processes and relates multiple data types (text, images, audio) together

C.Deploying AI models across multiple Azure regions for global availability

D.Using multiple AI models in sequence where each model processes a different step

AnswerB

Multi-modal AI understands cross-modal relationships — enabling image-text search, visual QA, and audio-visual analysis in unified models.

Why this answer

Multi-modal AI refers to systems that can process and relate multiple types of data—such as text, images, and audio—simultaneously. Azure AI Vision supports this by providing pre-built models and APIs that extract information from images and video, which can then be combined with text or audio data in a multi-modal pipeline, enabling richer analysis like image captioning or visual question answering.

Exam trap

The trap here is that candidates confuse 'multi-modal' with 'multi-model' or 'multi-region'—Azure AI-900 often tests the precise definition of multi-modal as handling multiple data types (text, image, audio) together, not just using multiple models or deploying across regions.

How to eliminate wrong answers

Option A is wrong because multi-modal AI is not about processing data in multiple programming languages; that describes polyglot programming or multi-language support, not data modality. Option C is wrong because deploying AI models across multiple Azure regions for global availability is a geo-redundancy or high-availability strategy, not a characteristic of multi-modal AI. Option D is wrong because using multiple AI models in sequence where each processes a different step describes a pipeline or chained architecture, not the simultaneous processing and relating of multiple data types that defines multi-modal AI.

Practice this question →

115

MCQmedium

A logistics warehouse uses a conveyor belt system to move packages. They need to automatically read the alphanumeric serial numbers printed on labels attached to each box. The labels may have different fonts and be somewhat dusty. Which Azure Computer Vision feature should they use?

A.Image Classification

B.Optical Character Recognition (OCR) using the Read API

C.Object Detection

D.Image Analysis (captioning and tagging)

AnswerB

The Read API extracts text from images and is robust to various fonts and image quality issues. It can return the serial number as a string, making it ideal for this use case.

Why this answer

The Read API, part of Azure Computer Vision's OCR capabilities, is specifically designed to extract printed and handwritten text from images, including alphanumeric serial numbers. It can handle varying fonts and degraded image quality (e.g., dusty labels) by using deep-learning models optimized for text recognition. This makes it the correct choice for reading serial numbers from conveyor belt packages.
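OCR returns text, so a typical follow-up step is pulling the serial number out of the recognised lines. The sketch below assumes the OCR output has already been reduced to a list of strings, and it assumes a made-up serial format (two letters, a hyphen, six digits); neither assumption comes from the Read API itself.

```python
import re

# Assumed label format for illustration: two letters, a hyphen, six digits.
SERIAL_PATTERN = re.compile(r"\b[A-Z]{2}-\d{6}\b")

def find_serials(ocr_lines):
    """Return every serial-number-like token found in the OCR text lines."""
    serials = []
    for line in ocr_lines:
        # Upper-case first so the match tolerates inconsistent casing on labels.
        serials.extend(SERIAL_PATTERN.findall(line.upper()))
    return serials

# Hypothetical lines of text as an OCR step might return them for one label.
ocr_lines = ["FRAGILE", "S/N: ab-104233", "Handle with care"]
serials = find_serials(ocr_lines)
```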

Exam trap

The trap here is that candidates confuse Object Detection (finding objects) with OCR (reading text), or assume Image Classification can handle text extraction, when in fact only the Read API is designed for text recognition under challenging conditions.

How to eliminate wrong answers

Option A is wrong because Image Classification assigns a single label or category to an entire image (e.g., 'box' or 'package'), not extracting specific text characters. Option C is wrong because Object Detection identifies and locates objects (e.g., boxes, people) within an image using bounding boxes, but it does not read or interpret text content. Option D is wrong because Image Analysis (captioning and tagging) generates descriptive captions or tags about the image's content (e.g., 'a box on a conveyor belt'), not extracting alphanumeric strings.

Practice this question →

116

MCQhard

What is 'few-shot learning' in the context of Azure AI Custom Vision model training?

A.Training a model using only a small subset of available compute resources

B.Training an accurate vision model with very few labelled examples using transfer learning

C.A technique for running multiple small training experiments in parallel

D.Limiting training to the first few hundred iterations regardless of convergence

AnswerB

Few-shot vision training leverages pre-trained model knowledge — Azure Custom Vision can learn new categories from as few as 15 examples.

Why this answer

Few-shot learning in Azure AI Custom Vision refers to training an accurate vision model with very few labeled examples by leveraging transfer learning. This approach uses a pre-trained neural network as a starting point, so the model can learn new visual concepts from a small number of images per tag (Custom Vision's documented minimums are roughly 5 images per tag for classification and 15 per tag for object detection, although more images generally improve accuracy), significantly reducing the data collection burden.
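Why a pre-trained model makes a handful of examples sufficient can be illustrated with a nearest-class-mean classifier over frozen features. This is a swapped-in teaching technique, not Custom Vision's actual training procedure, and the two-dimensional feature vectors are hypothetical stand-ins for the output of a pre-trained network.

```python
def class_means(examples):
    """Average the feature vectors of each class's few labelled examples."""
    means = {}
    for label, vectors in examples.items():
        dims = len(vectors[0])
        means[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]
    return means

def predict(features, means):
    """Assign the label whose class mean is nearest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(means, key=lambda label: sq_dist(features, means[label]))

# Two labelled examples per class; features assumed to come from a frozen, pre-trained model.
few_examples = {
    "acceptable": [[0.9, 0.1], [0.8, 0.2]],
    "defective": [[0.2, 0.9], [0.1, 0.7]],
}
means = class_means(few_examples)
prediction = predict([0.75, 0.3], means)
```

Because the feature extractor already separates the classes, the only thing learned from the new examples is one mean vector per class, which is why so little data is needed.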

Exam trap

The trap here is confusing 'few-shot learning' with resource-saving techniques like reduced compute or early stopping, when the core concept is about achieving high accuracy with minimal labeled data through transfer learning.

How to eliminate wrong answers

Option A is wrong because it describes reducing compute resources, not the data efficiency technique of few-shot learning. Option C is wrong because it describes parallel training experiments, which is a resource optimization strategy unrelated to few-shot learning. Option D is wrong because it describes early stopping based on iteration count, which is a training termination heuristic, not a method for achieving accuracy with minimal labeled data.

Practice this question →

117

MCQmedium

A logistics company needs to automatically read handwritten addresses from package labels using cameras on a conveyor belt. The handwriting varies greatly in style, size, and orientation. Which Azure Computer Vision capability should they use?

A.Image Analysis (describing the image content)

B.OCR (Read API)

C.Face API

D.Custom Vision

AnswerB

The Read API is built for extracting printed and handwritten text from images, ideal for reading individual addresses.

Why this answer

The OCR (Read API) is specifically designed to extract text from images, including handwritten text, and is optimized for varied styles, sizes, and orientations. Unlike standard OCR, the Read API uses deep-learning models to handle unstructured documents and real-world scenarios like package labels on a conveyor belt.

Exam trap

The trap here is that candidates pick Image Analysis because it also works on whole images, but its captioning and tagging features describe a scene rather than transcribe text; extracting handwritten or irregular text is the job of the Read API.

How to eliminate wrong answers

Option A is wrong because Image Analysis describes the content of an image (objects, scenes, tags) but does not extract text, especially handwritten text. Option C is wrong because Face API is dedicated to detecting, recognizing, and analyzing human faces, not text. Option D is wrong because Custom Vision is used to train custom image classifiers or object detectors on specific visual features, not for general-purpose text extraction from varied handwriting.

Practice this question →

118

MCQeasy

A parking management company uses cameras at the entrance and exit of a lot. They need to automatically read the license plate numbers of each car as it enters and exits. Which Azure Computer Vision capability is specifically designed for this task?

A.Optical Character Recognition (OCR)

B.Object detection

C.Image classification

D.Facial recognition

AnswerA

OCR extracts text from images, including license plate numbers, without needing custom training.

Why this answer

Optical Character Recognition (OCR) is the Azure Computer Vision capability specifically designed to extract printed or handwritten text from images, including license plate numbers. In this scenario, the cameras capture images of cars entering and exiting, and OCR processes those images to read the alphanumeric characters on the license plates. This is the exact use case for OCR, as it can handle varied fonts, angles, and lighting conditions common in parking lot environments.

Exam trap

The trap here is that candidates often confuse object detection with OCR, thinking that detecting a license plate as an object is sufficient, but OCR is required to actually read the alphanumeric text on the plate.

How to eliminate wrong answers

Option B (Object detection) is wrong because it identifies and locates objects within an image (e.g., cars, pedestrians) but does not extract text characters from those objects. Option C (Image classification) is wrong because it assigns a single label or category to an entire image (e.g., 'car' or 'truck') and cannot read specific alphanumeric sequences like license plates. Option D (Facial recognition) is wrong because it detects and identifies human faces, not vehicle license plates, and is designed for biometric identification rather than text extraction.

Practice this question →

119

MCQmedium

A transportation company wants to automatically identify whether an image contains a car, a truck, or a motorcycle. The system should output a single label for the entire image. Which computer vision capability in Azure should they use?

A.Object detection

B.Image classification

C.Optical Character Recognition (OCR)

D.Semantic segmentation

AnswerB

Image classification assigns a label to the entire image (a single label in the multiclass case described here), matching the requirement to identify the type of vehicle shown.

Why this answer

Image classification assigns a single label to an entire image based on its dominant content. Since the requirement is to output one label (car, truck, or motorcycle) per image, this maps directly to Azure's Custom Vision image classification capability, which trains a model to categorize whole images into predefined classes.

Exam trap

The trap here is that candidates confuse object detection (which finds and labels multiple objects) with image classification (which labels the whole image), especially when the question mentions multiple vehicle types, leading them to incorrectly choose object detection.

How to eliminate wrong answers

Option A is wrong because object detection identifies and locates multiple objects within an image using bounding boxes, outputting multiple labels and positions, not a single label for the entire image. Option C is wrong because Optical Character Recognition (OCR) extracts text from images, not vehicle types. Option D is wrong because semantic segmentation assigns a class label to every pixel in the image, creating a pixel-level mask rather than a single image-level label.

Practice this question →

120

MCQeasy

What is the Azure AI Face service's 'liveness detection' feature used for?

A.Detecting whether a person is alive based on their vital signs

B.Determining whether a face is from a live person or a spoofing attempt (photo/video/mask)

C.Counting how many people are in a live video stream

D.Monitoring whether a person remains present during a video call

AnswerB

Liveness detection prevents authentication spoofing attacks by verifying the face is from a real, live person present at the camera.

Why this answer

Option B is correct because Azure AI Face's liveness detection is specifically designed to differentiate between a real, live human face and spoofing artifacts such as printed photos, video replays, or 3D masks. It analyzes subtle cues like micro-movements, texture, and depth to verify the presence of a living person, preventing unauthorized access in facial recognition systems.

Exam trap

The trap here is that candidates confuse liveness detection with general presence detection or vital sign monitoring, leading them to choose options A or D, which describe tasks unrelated to spoof prevention.

How to eliminate wrong answers

Option A is wrong because liveness detection does not measure vital signs like heart rate or blood pressure; it relies on visual cues to assess liveness, not biometric health indicators. Option C is wrong because counting people in a live video stream is a people-counting (spatial analysis) task, not a function of Face liveness detection. Option D is wrong because monitoring whether a person remains present during a video call is an ongoing presence-monitoring task, whereas the Face service's liveness detection focuses on spoof prevention at the moment of capture.

Practice this question →

121

MCQeasy

A logistics company receives thousands of handwritten shipping labels daily. They need an automated solution to extract the destination address, sender name, and package weight from these labels. Which prebuilt Azure Computer Vision capability should they use?

A.Optical Character Recognition (OCR)

B.Object detection

C.Image classification

D.Facial recognition

AnswerA

OCR extracts text (including handwriting) from images, perfect for reading shipping labels.

Why this answer

Option A is correct because Azure Computer Vision's Optical Character Recognition (OCR) API is specifically designed to extract printed or handwritten text from images. In this scenario, the handwritten shipping labels contain textual data (destination address, sender name, package weight), and OCR can read and digitize that text for automated processing. The other options address different visual tasks—object detection, classification, or facial recognition—none of which extract text content.

Exam trap

The trap here is that candidates may confuse OCR with object detection, thinking that 'extracting' information from an image is the same as identifying objects, but OCR is the only service that reads text characters from images.

How to eliminate wrong answers

Option B (Object detection) is wrong because it identifies and locates objects (e.g., boxes, vehicles) within an image, not text characters or words. Option C (Image classification) is wrong because it assigns a single label or category to an entire image (e.g., 'shipping label'), but does not extract specific textual details like addresses or weights. Option D (Facial recognition) is wrong because it detects and identifies human faces, which is irrelevant to reading handwritten text on labels.

Practice this question →

122

MCQmedium

A retail store uses ceiling-mounted cameras to analyze customer traffic flow. They need to detect when a person enters a specific aisle and determine the direction they are walking. Which Azure Computer Vision capability should they use?

A.Image Analysis dense captioning

B.Facial recognition

C.People counting (Spatial Analysis)

D.Optical Character Recognition (OCR)

AnswerC

Spatial analysis can detect people and track their movement, including direction, within a video feed.

Why this answer

Option C is correct because Spatial Analysis, part of Azure Computer Vision, uses ceiling-mounted cameras to track people's movement and direction in a physical space. It specifically provides people counting and trajectory analysis, making it ideal for detecting when a person enters an aisle and determining their walking direction.
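Direction detection can be thought of as watching a tracked position cross a virtual line. The sketch below is a simplified illustration under that assumption; the `crossing_direction` helper, the coordinate system, and the sample tracks are hypothetical and do not reflect the Spatial Analysis configuration format.

```python
def crossing_direction(track, line_x):
    """Given (x, y) centroids of one tracked person over time, report whether
    and how they crossed a vertical line at x = line_x."""
    for (x0, _), (x1, _) in zip(track, track[1:]):
        if x0 < line_x <= x1:
            return "entered"  # moved left-to-right across the line
        if x1 < line_x <= x0:
            return "exited"   # moved right-to-left across the line
    return None               # never crossed the line

# Hypothetical tracks: successive centroid positions from a ceiling-mounted camera.
walking_in = [(2, 5), (4, 5), (6, 5)]
walking_out = list(reversed(walking_in))

direction_in = crossing_direction(walking_in, line_x=5)
direction_out = crossing_direction(walking_out, line_x=5)
```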

Exam trap

The trap here is that candidates may confuse general image analysis or facial recognition with the specialized spatial tracking capability, not realizing that Spatial Analysis is the only Azure service designed for real-time people counting and direction detection in physical spaces.

How to eliminate wrong answers

Option A is wrong because Image Analysis dense captioning generates descriptive captions for images, not real-time spatial tracking of people's movement. Option B is wrong because Facial recognition identifies or verifies individuals by their face, not tracking movement or direction in a physical space. Option D is wrong because Optical Character Recognition (OCR) extracts text from images, not people detection or motion analysis.

Practice this question →

123

MCQmedium

What is 'product recognition' in Azure AI Vision for retail scenarios?

A.Scanning product barcodes to look up inventory information

B.Identifying retail products and checking shelf placement compliance using computer vision

C.Generating product descriptions from images for e-commerce listings

D.Detecting counterfeit or damaged products in a manufacturing quality line

AnswerB

Product recognition analyses shelf images to identify products and verify planogram compliance — enabling automated retail monitoring.

Why this answer

Product recognition in Azure AI Vision for retail scenarios is specifically designed to identify retail products and check shelf placement compliance using computer vision. It uses object detection and image analysis to recognize products in images or video streams, then compares their placement against a predefined planogram to ensure items are correctly stocked and positioned. This capability helps retailers automate inventory management and optimize shelf layouts.
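The planogram comparison step can be sketched as a dictionary lookup once products have been recognised. The data shapes here (shelf positions as tuples, product names as strings) and the `planogram_gaps` helper are assumptions for illustration, not the service's actual request or response format.

```python
def planogram_gaps(planogram, detected):
    """Compare expected products per shelf position with what was detected.

    Returns the positions that are empty or hold the wrong product.
    """
    issues = {}
    for position, expected in planogram.items():
        found = detected.get(position)
        if found is None:
            issues[position] = f"missing {expected}"
        elif found != expected:
            issues[position] = f"expected {expected}, found {found}"
    return issues

# Hypothetical planogram and detection results keyed by (shelf, slot).
planogram = {("shelf1", 0): "cola", ("shelf1", 1): "lemonade", ("shelf2", 0): "water"}
detected = {("shelf1", 0): "cola", ("shelf1", 1): "water"}

issues = planogram_gaps(planogram, detected)
```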

Exam trap

The trap here is that candidates confuse product recognition with general object detection or image tagging, but the exam specifically tests the retail-focused use case of identifying products and verifying shelf compliance against a planogram.

How to eliminate wrong answers

Option A is wrong because scanning product barcodes to look up inventory information relies on barcode scanning technology, not computer vision-based product recognition; Azure AI Vision product recognition identifies products visually without requiring barcodes. Option C is wrong because generating product descriptions from images for e-commerce listings is an image captioning task, not the specialized product recognition capability for retail. Option D is wrong because detecting counterfeit or damaged products in a manufacturing quality line falls under anomaly detection or custom vision models, not the prebuilt product recognition capability designed for retail shelf analysis.

Practice this question →

124

MCQeasy

A real estate company wants to create an application that automatically generates floor plans from photographs of rooms. The application needs to identify and delineate every pixel in the image that corresponds to walls, doors, windows, and furniture. Which Azure Computer Vision capability should the company use?

A.Object Detection

B.Semantic Segmentation

C.Image Classification

D.Optical Character Recognition (OCR)

AnswerB

Semantic Segmentation classifies each pixel in an image into predefined categories, making it ideal for identifying and outlining walls, doors, windows, and furniture.

Why this answer

Semantic segmentation is the correct choice because it classifies every pixel in an image into predefined categories (e.g., walls, doors, windows, furniture), producing a pixel-level mask. This is exactly what the application needs to delineate each structural element and object in the room photograph, enabling accurate floor plan generation.
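What a pixel-level mask looks like, and how it differs from a single image label, can be shown with a tiny example. The 3x4 mask and the `pixel_share` helper are hypothetical; a real mask would come from a segmentation model and would usually store integer class IDs rather than strings.

```python
from collections import Counter

def pixel_share(mask):
    """Fraction of the image covered by each class in a per-pixel label mask."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Every pixel carries its own class label, unlike classification (one label per image).
mask = [
    ["wall", "wall", "door", "wall"],
    ["wall", "window", "door", "wall"],
    ["floor", "floor", "floor", "floor"],
]
share = pixel_share(mask)
```

Because the mask keeps the exact pixels of each element, outlines of walls and doors can be traced from it, which a bounding box cannot provide.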

Exam trap

The trap here is that candidates confuse object detection (bounding boxes) with semantic segmentation (pixel-level masks), mistakenly thinking detection can delineate walls and doors, but only segmentation provides the per-pixel classification required for floor plan generation.

How to eliminate wrong answers

Option A is wrong because object detection only identifies and locates objects with bounding boxes, not pixel-level delineation, so it cannot separate walls from doors or furniture at the granularity required. Option C is wrong because image classification assigns a single label to the entire image (e.g., 'kitchen'), not per-pixel segmentation of multiple elements. Option D is wrong because OCR extracts text from images, which is irrelevant to identifying walls, doors, windows, or furniture in a room.

Practice this question →

125

MCQmedium

What is 'depth estimation' in computer vision and what are its applications?

A.Measuring the depth of colour in an image (number of bits per pixel)

B.Inferring the distance of objects from the camera to produce a spatial depth map

C.Analysing how deeply a subject is embedded in a complex background scene

D.Determining how much detail is captured in a photograph based on lens quality

AnswerB

Depth estimation produces per-pixel distance measurements — enabling obstacle avoidance, 3D reconstruction, and AR scene understanding.

Why this answer

Depth estimation is a computer vision technique that infers the distance of objects from the camera, producing a spatial depth map where each pixel represents a distance value. This is commonly achieved using stereo vision (two cameras) or monocular depth estimation (single camera with deep learning models). It is a general computer vision concept that underpins applications like augmented reality, autonomous navigation, and 3D scene reconstruction.
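For the stereo case, depth follows from the standard pinhole relation Z = f * B / d, where f is the focal length in pixels, B is the distance between the two cameras, and d is the disparity in pixels. The sketch below applies that relation to a small disparity grid; the camera parameters and disparity values are made-up numbers for illustration.

```python
def stereo_depth(focal_length_px, baseline_m, disparity_px):
    """Depth in metres from the pinhole stereo relation Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

def depth_map(disparities, focal_length_px, baseline_m):
    """Convert a 2D grid of disparities into a 2D grid of depths."""
    return [
        [stereo_depth(focal_length_px, baseline_m, d) for d in row]
        for row in disparities
    ]

# Hypothetical camera: 700 px focal length, cameras 10 cm apart.
disparities = [[35.0, 70.0], [14.0, 7.0]]
depths = depth_map(disparities, focal_length_px=700.0, baseline_m=0.1)
```

Larger disparities map to smaller depths, which matches the intuition that nearby objects shift more between the two camera views.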

Exam trap

The trap here is that candidates confuse 'depth estimation' with image quality metrics (color depth or lens resolution) or with scene understanding terms like 'depth of field' or 'background embedding', rather than recognizing it as a spatial distance inference task.

How to eliminate wrong answers

Option A is wrong because it describes color depth (bits per pixel), which is a property of image encoding, not a computer vision technique for measuring spatial distance. Option C is wrong because it confuses depth estimation with semantic segmentation or object detection in cluttered scenes; 'depth' here refers to physical distance, not how deeply a subject is embedded in a background. Option D is wrong because it refers to photographic detail determined by lens quality (optical resolution), which is unrelated to the algorithmic inference of object distances from camera data.

Practice this question →

126

Matchingmedium

Match each Azure AI service to its regional availability constraint.

Concepts

Matches

Limited to certain regions due to demand

Available in many regions

Some voices only in specific regions

Available globally

Available in most regions

Why these pairings

Regional availability can affect service usage.

Practice this question →

127

MCQeasy

A government agency needs to digitize thousands of handwritten application forms so that the text can be searched and processed. Which Azure Computer Vision capability should they use?

A.Object detection

B.Optical Character Recognition (Read API)

C.Image classification

D.Face detection

AnswerB

The Read API (OCR) extracts printed and handwritten text from images and documents.

Why this answer

The correct answer is B, Optical Character Recognition (Read API), because the agency needs to extract printed or handwritten text from images of application forms and make it searchable and processable. The Read API is specifically designed for this purpose, handling both printed and handwritten text, and is part of Azure Computer Vision's OCR capabilities.

Exam trap

The trap here is that candidates may confuse image classification (which categorizes the whole image) with OCR, not realizing that only OCR extracts actual text content for searchability.

How to eliminate wrong answers

Option A is wrong because object detection identifies and locates objects (e.g., cars, animals) within an image, not text characters, so it cannot digitize handwritten text. Option C is wrong because image classification assigns a single label or category to an entire image (e.g., 'form' or 'document'), but it does not extract or recognize individual text characters for search and processing. Option D is wrong because face detection locates human faces in images and returns face-related attributes; it has no capability to read or digitize text.

Practice this question →

128

MCQmedium

What is 'dense captioning' in Azure AI Vision v4.0?

A.Generating a very long and detailed caption for the entire image

B.Generating multiple region-specific captions each with a bounding box for different image areas

C.Adding caption text overlaid on top of the image like movie subtitles

D.Captions that include technical details like camera settings and lighting conditions

AnswerB

Dense captioning produces per-region natural language descriptions — richer than a single caption for accessibility and content analysis.

Why this answer

Dense captioning in Azure AI Vision v4.0 goes beyond describing the entire image; it identifies multiple distinct regions within the image and generates a separate caption for each region, along with a bounding box that pinpoints its location. This allows for granular understanding of complex scenes, such as recognizing 'a dog on a couch' and 'a lamp on a table' as separate, localized descriptions.
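Because each caption comes with a bounding box, the results can be filtered by location. The sketch below assumes a simplified result shape (a list of dictionaries with `text`, `box`, and `confidence` keys); that shape and the `captions_in_area` helper are hypothetical and only loosely modelled on the idea of one caption per region.

```python
def captions_in_area(dense_captions, area, min_confidence=0.5):
    """Return captions whose bounding box lies fully inside `area` (x, y, w, h)."""
    ax, ay, aw, ah = area
    selected = []
    for item in dense_captions:
        x, y, w, h = item["box"]
        inside = x >= ax and y >= ay and x + w <= ax + aw and y + h <= ay + ah
        if inside and item["confidence"] >= min_confidence:
            selected.append(item["text"])
    return selected

# Hypothetical dense-captioning output for one image.
dense_captions = [
    {"text": "a dog on a couch", "box": (10, 120, 200, 150), "confidence": 0.82},
    {"text": "a lamp on a table", "box": (300, 40, 80, 160), "confidence": 0.74},
    {"text": "a blurry object", "box": (20, 130, 30, 30), "confidence": 0.31},
]
left_side = captions_in_area(dense_captions, area=(0, 0, 256, 300))
```

A single whole-image caption could not support this kind of region query, which is the practical difference between standard and dense captioning.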

Exam trap

The trap here is that candidates confuse dense captioning with standard image captioning (Option A), assuming 'dense' simply means a longer or more detailed single caption, rather than recognizing it as a region-specific, multi-caption feature with bounding boxes.

How to eliminate wrong answers

Option A is wrong because dense captioning does not produce a single, very long caption for the whole image; that is the role of standard image captioning, not dense captioning. Option C is wrong because dense captioning does not overlay text onto the image like subtitles; it returns bounding box coordinates and captions as metadata, not as a visual overlay. Option D is wrong because dense captioning focuses on describing the content and context of image regions, not technical metadata like camera settings or lighting conditions, which are unrelated to the feature's purpose.

Practice this question →

129

MCQhard

A manufacturing company wants to use Azure AI to detect surface defects on metal parts. The team has a small set of labeled images of defective and non-defective parts, and images will be taken under various lighting conditions and angles. They need a solution that can leverage a pre-trained model and adapt it to their specific defect types with minimal new training data. Which approach should they take?

A.Use Custom Vision to train a classification or object detection model with transfer learning

B.Use the Optical Character Recognition (OCR) API

C.Use the Describe Image API (Image Captioning)

D.Use the Face API

AnswerA

Correct. Custom Vision uses transfer learning from pre-trained models, enabling effective training with a small dataset to detect specific defects.

Why this answer

Option A is correct because Custom Vision allows you to use transfer learning, which starts from a pre-trained model and fine-tunes it on your small labeled dataset of defective and non-defective parts. This approach is ideal when you have limited training data and need to adapt the model to specific defect types under varying lighting and angles, as Custom Vision supports both classification and object detection for surface defects.

Exam trap

The trap here is that candidates may confuse the general-purpose image analysis APIs (OCR, captioning, face) with Custom Vision's specialized ability to train custom models using transfer learning, assuming any Azure AI service can be adapted to a custom task without understanding the underlying training mechanism.

How to eliminate wrong answers

Option B is wrong because the OCR API is designed to extract printed or handwritten text from images, not to detect surface defects on metal parts. Option C is wrong because the Describe Image API (Image Captioning) generates natural language descriptions of image content, which is not suitable for defect detection or classification. Option D is wrong because the Face API is specialized for detecting, analyzing, and recognizing human faces, not for industrial defect inspection on metal parts.

130

MCQmedium

What is 'video indexer' (Azure Video Indexer) and what insights does it extract?

A.A tool that compresses videos to reduce storage costs in Azure Blob Storage

B.A service that extracts transcripts, faces, speakers, topics, and scenes from video content

C.A database index that speeds up queries on video metadata tables

D.A tool for creating video presentations from a series of images and text

AnswerB

Video Indexer applies multiple AI models to video — producing searchable insights including who speaks, what appears, and what topics are discussed.

Why this answer

Azure Video Indexer is a cloud-based service that uses AI to analyze video and audio content. It extracts rich insights such as transcripts (speech-to-text), identified faces, speaker diarization, topics, scenes, and even sentiment, making it a comprehensive media intelligence tool rather than a storage or indexing utility.

Exam trap

The trap here is that candidates confuse Azure Video Indexer with a storage or database optimization tool, because the word 'indexer' misleadingly suggests indexing for performance, whereas it is actually an AI-based video analysis service for extracting metadata and insights.

How to eliminate wrong answers

Option A is wrong because Azure Video Indexer does not compress videos; compression is a video encoding task, and storage cost is managed through Blob Storage features such as access tiers, not through Video Indexer. Option C is wrong because Video Indexer is not a database index; it is an AI service that analyzes video content, whereas database indexing is a feature of data stores such as Azure Cosmos DB or Azure SQL. Option D is wrong because creating video presentations from images and text is an authoring task for presentation or video editing tools such as PowerPoint; Video Indexer focuses on extracting insights from existing videos.

131

MCQmedium

What is object detection, and how does it differ from image classification?

A.Object detection identifies what is in an image; image classification also identifies where objects are located

B.Object detection identifies and locates multiple objects with bounding boxes; image classification labels the whole image

C.Object detection and image classification are the same task

D.Object detection is used only for face recognition

AnswerB

Object detection finds multiple objects AND their locations (bounding boxes); image classification assigns a single label to the entire image.

Why this answer

Object detection goes beyond image classification by not only identifying what objects are present in an image but also localizing each object with a bounding box. Image classification assigns a single label to the entire image, whereas object detection can handle multiple objects of different classes simultaneously. This makes object detection suitable for tasks like counting objects or tracking their positions.

Exam trap

The trap here is that candidates often confuse the terms 'classification' and 'detection' by thinking detection only identifies objects without localization, or they assume object detection is a subset of classification—when in fact detection includes both identification and localization.

How to eliminate wrong answers

Option A is wrong because it reverses the definitions: image classification labels the whole image, not the location of objects, while object detection identifies both what and where. Option C is wrong because object detection and image classification are distinct tasks with different outputs—classification outputs a single label, detection outputs multiple labels with coordinates. Option D is wrong because object detection is not limited to face recognition; it is used for a wide range of applications such as vehicle detection, defect inspection, and medical imaging.
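
The difference in outputs can be sketched with plain data structures. The dictionaries below are hypothetical, hand-written results for the same image (not a real service response), and `count_objects` is an illustrative helper showing why only the detection output supports counting.

```python
from collections import Counter

# Hypothetical classification output: one label for the whole image.
classification_result = {"label": "street scene", "confidence": 0.93}

# Hypothetical detection output: one entry per object, each with a bounding box
# given as (x, y, width, height) in pixels.
detection_result = [
    {"label": "car", "confidence": 0.95, "box": (12, 80, 200, 120)},
    {"label": "car", "confidence": 0.58, "box": (300, 90, 180, 110)},
    {"label": "person", "confidence": 0.81, "box": (520, 60, 60, 160)},
    {"label": "car", "confidence": 0.22, "box": (600, 100, 40, 30)},
]


def count_objects(detections, min_confidence=0.5):
    """Count detected objects per label, ignoring low-confidence detections."""
    return Counter(d["label"] for d in detections if d["confidence"] >= min_confidence)


counts = count_objects(detection_result)
```

The classification result has no notion of "how many" or "where"; the detection list carries both, at the cost of needing a confidence threshold (0.5 here is an arbitrary choice).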

132

MCQhard

A logistics company uses drone imagery to monitor a busy container yard. They need to count the exact number of individual shipping containers, even when containers are partially stacked on top of each other or overlapping in the image. Which Azure Computer Vision capability should they choose to achieve the most accurate individual object separation?

A.Image classification

B.Object detection

C.Instance segmentation

D.Semantic segmentation

AnswerC

Instance segmentation identifies each object instance separately and produces a pixel-level mask for each, enabling accurate counting even when objects overlap.

Why this answer

Instance segmentation is the correct choice because it not only detects each individual object in an image but also generates a pixel-level mask for each instance, allowing the model to distinguish between overlapping or stacked objects like shipping containers. This capability provides the most accurate separation of individual containers, even when they partially occlude each other, by assigning unique masks to each instance rather than grouping all containers into a single class.

Exam trap

The trap here is that candidates confuse semantic segmentation (which labels all pixels of a class as one group) with instance segmentation (which separates individual objects), leading them to pick D when they need per-object counting.

How to eliminate wrong answers

Option A is wrong because image classification assigns a single label to the entire image, which cannot count or separate individual objects. Option B is wrong because object detection draws bounding boxes around objects but does not separate overlapping instances at the pixel level, leading to merged or inaccurate counts when containers are stacked. Option D is wrong because semantic segmentation classifies every pixel by class (e.g., 'container') but does not distinguish between individual instances of the same class, so overlapping containers would be grouped together as one blob.
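
The "one blob" problem can be illustrated with tiny hand-made masks. This is a toy sketch under stated assumptions: the grids, `count_blobs`, and `count_instances` are hypothetical and stand in for real model output, where 0 means background.

```python
def count_blobs(mask):
    """Count 4-connected regions of non-zero pixels using an iterative flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or (r, c) in seen:
                continue
            blobs += 1
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] != 0 and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return blobs


def count_instances(instance_mask):
    """Count distinct non-zero instance ids."""
    return len({value for row in instance_mask for value in row if value != 0})


# Semantic mask: every container pixel carries the same class id (1).
semantic_mask = [
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
]

# Instance mask: each container carries its own id, even when two touch.
instance_mask = [
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 3, 3, 0, 0, 0],
]

semantic_count = count_blobs(semantic_mask)
instance_count = count_instances(instance_mask)
```

The two touching containers in the top rows merge into a single region in the semantic mask, so blob counting undercounts; the instance mask keeps them apart.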

133

MCQeasy

A company wants to automate the processing of expense reports by extracting printed text from images of receipts. Which Azure Computer Vision capability should they use?

A.Object detection

B.OCR (Read API)

C.Semantic segmentation

D.Image Analysis (description generation)

AnswerB

Correct. The Read API extracts printed text from images and documents.

Why this answer

The OCR (Read API) is the correct Azure Computer Vision capability for extracting printed text from images of receipts. It is specifically designed to detect and extract text from images and documents, supporting both printed and handwritten text, making it ideal for automating expense report processing.

Exam trap

The trap here is that candidates may confuse object detection (which finds objects like a receipt) with OCR (which reads the text on the receipt), leading them to select object detection for a text extraction task.

How to eliminate wrong answers

Option A is wrong because object detection identifies and locates objects within an image (e.g., a receipt in a photo) but does not extract the printed text from those objects. Option C is wrong because semantic segmentation assigns pixel-level labels to image regions (e.g., separating receipt from background) but does not perform text extraction. Option D is wrong because Image Analysis (description generation) produces human-readable captions describing the image content, not extracting specific text characters.

134

MCQmedium

A logistics company wants to automatically extract the tracking numbers, delivery addresses, and sender names from scanned shipping labels. Which prebuilt Azure Computer Vision capability should they use?

A.Object Detection

B.Optical Character Recognition (OCR)

C.Image Classification

D.Face Detection

AnswerB

Correct. OCR extracts text from images, making it suitable for reading shipping labels.

Why this answer

Option B (Optical Character Recognition, or OCR) is correct because the task requires extracting text (tracking numbers, addresses, sender names) from scanned images. Azure Computer Vision's OCR API is specifically designed to detect and read printed or handwritten text from images, returning the text content along with bounding boxes. Object Detection, Image Classification, and Face Detection do not extract text, making OCR the appropriate prebuilt capability.

Exam trap

The trap here is that candidates may confuse Object Detection (which finds objects) with OCR (which finds text), or assume Image Classification can read text, when in fact only OCR is designed for text extraction from images.

How to eliminate wrong answers

Option A (Object Detection) is wrong because it identifies and locates objects (e.g., boxes, vehicles) within an image, not text characters or words. Option C (Image Classification) is wrong because it assigns a single label or category to an entire image (e.g., 'shipping label'), but does not extract any textual content. Option D (Face Detection) is wrong because it locates human faces and returns face attributes, which is irrelevant to reading text from labels.
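
OCR returns raw text with positions; picking out a specific field such as a tracking number is a post-processing step on that text. A minimal sketch, assuming a hypothetical list of OCR lines and a made-up tracking-number format (`1Z` followed by 16 letters or digits); neither reflects a real service schema.

```python
import re

# Hypothetical OCR output: recognised lines with (x, y, width, height) boxes.
ocr_lines = [
    {"text": "FROM: Jane Smith", "box": (20, 10, 200, 24)},
    {"text": "SHIP TO: 42 Harbour Road, Leeds", "box": (20, 60, 340, 24)},
    {"text": "TRACKING # 1Z999AA10123456784", "box": (20, 120, 320, 24)},
]

# Assumed format for illustration: "1Z" plus 16 uppercase letters or digits.
TRACKING_PATTERN = re.compile(r"\b1Z[0-9A-Z]{16}\b")


def find_tracking_numbers(lines):
    """Return every substring of the OCR text that matches the assumed pattern."""
    return [
        match.group(0)
        for line in lines
        for match in TRACKING_PATTERN.finditer(line["text"])
    ]


tracking_numbers = find_tracking_numbers(ocr_lines)
```

Object detection or image classification would never produce the `text` values this step depends on, which is why OCR is the prerequisite.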

135

MCQmedium

What is the Azure AI Custom Vision service's 'compact' domain used for?

A.Training models on a compact (small) dataset with fewer than 50 images

B.Producing exportable models optimized for deployment on edge devices with limited compute

C.Creating more compact API responses with less metadata

D.Training models that use less storage in Azure blob containers

AnswerB

Compact domains create smaller, exportable model files (ONNX, TensorFlow Lite, CoreML) that run offline on edge devices.

Why this answer

The Azure AI Custom Vision service's 'compact' domain is specifically designed to produce models that can be exported to formats like TensorFlow, ONNX, or CoreML for deployment on edge devices with limited compute, memory, and power. This domain trades some accuracy for a smaller model footprint, enabling real-time inference on devices such as cameras, drones, or IoT gateways.

Exam trap

The trap here is that candidates confuse 'compact' with 'small dataset' or 'reduced API output', when in fact it specifically refers to the model's exportability and optimization for offline edge deployment.

How to eliminate wrong answers

Option A is wrong because the 'compact' domain refers to the model architecture and exportability, not the dataset size; choosing a domain does not impose a 50-image threshold. Option C is wrong because the 'compact' domain has no effect on API response metadata; the training domain changes the model that is produced, not the shape of the prediction response. Option D is wrong because the 'compact' domain does not affect storage in Azure blob containers; storage consumption depends on the number of training images and iterations, not the domain type.

136

MCQmedium

A library wants to automatically generate descriptive alt text for hundreds of historical photographs in their digital archive. For each photo, the system should produce a natural-language description that includes objects present (e.g., 'a horse', 'a carriage'), the action being performed (e.g., 'pulling'), and the scene type (e.g., 'city street'). Which Azure Computer Vision capability should they use?

A.Image Analysis (Describe image)

B.Optical Character Recognition (OCR)

C.Object detection

D.Face detection

AnswerA

The describe feature of Image Analysis creates natural-language captions summarizing the content of an image, including objects and actions.

Why this answer

The Image Analysis 'Describe image' capability is designed to generate human-readable captions that summarize the content of an image, including objects, actions, and scene context. This directly matches the library's requirement to produce natural-language descriptions for historical photographs, as it uses a combination of object detection and scene understanding to output a full sentence.

Exam trap

Microsoft often tests the distinction between 'Describe image' (which outputs a full sentence) and 'Object detection' (which only outputs labels and bounding boxes), causing candidates to confuse a component feature with the end-to-end captioning capability.

How to eliminate wrong answers

Option B (Optical Character Recognition) is wrong because it extracts text from images, not objects, actions, or scene types; it would only help if the photos contained written captions. Option C (Object detection) is wrong because it only identifies and locates objects within an image (e.g., bounding boxes and labels) but does not generate a natural-language description or infer actions or scene types. Option D (Face detection) is wrong because it only locates human faces and returns face-specific information; it does not describe objects, actions, or broader scene context.

137

MCQmedium

A retail company wants to build a system that can verify the identity of customers by comparing their live photo with an uploaded government-issued ID photo. Which Azure Computer Vision service should they use to perform the face comparison?

A.Azure Computer Vision - Image Analysis

B.Azure Face API

C.Azure Custom Vision

D.Azure Form Recognizer

AnswerB

Face API offers face verification, which checks if a live photo matches a reference photo (e.g., the ID photo) by comparing facial features.

Why this answer

The Azure Face API is specifically designed for face detection, verification, and comparison tasks. It can compare a live photo against a reference photo (such as a government-issued ID) using its 'Verify' operation, which returns a confidence score indicating whether the two faces belong to the same person. This makes it the correct choice for identity verification scenarios.

Exam trap

The trap here is that candidates may confuse the general-purpose Azure Computer Vision - Image Analysis service with the specialized Face API, assuming that any computer vision service can perform face comparison, when in fact only the Face API provides dedicated face verification functionality.

How to eliminate wrong answers

Option A is wrong because Azure Computer Vision - Image Analysis provides general image tagging, object detection, and optical character recognition, but it does not include face comparison or verification capabilities. Option C is wrong because Azure Custom Vision is used to train custom image classification or object detection models, not for pre-built face verification tasks. Option D is wrong because Azure Form Recognizer is designed to extract text and structure from documents (such as forms and invoices), not for face comparison or identity verification.
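
The idea of a verification call that returns a confidence score and a same-person decision can be sketched as a threshold on a similarity measure. This is an illustration only: cosine similarity over embedding vectors is a stand-in technique, not the Face API's documented internals, and the `verify` helper, its field names, and the 0.8 threshold are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(live_embedding, id_embedding, threshold=0.8):
    """Compare two face embeddings and report a score plus a yes/no decision."""
    score = cosine_similarity(live_embedding, id_embedding)
    return {"is_identical": score >= threshold, "confidence": score}


# Hypothetical embeddings; a real system would derive them from the two photos.
same_person = verify([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
different_person = verify([1.0, 0.0], [0.0, 1.0])
```

The threshold is a policy decision: raising it reduces false matches at the cost of rejecting more genuine customers.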

138

MCQmedium

What is Optical Character Recognition (OCR) and which Azure AI service provides it?

A.Speech recognition; provided by Azure AI Speech

B.Technology that extracts text from images; provided by Azure AI Vision

C.Language translation; provided by Azure AI Translator

D.Handwriting analysis for personality assessment; provided by Azure AI Face

AnswerB

OCR recognizes printed or handwritten text in images and documents — Azure AI Vision's Read API provides this capability.

Why this answer

Optical Character Recognition (OCR) is the technology that extracts printed or handwritten text from images, such as scanned documents or photos, and converts it into machine-readable text. This capability is provided by the Azure AI Vision service, specifically through its Read API, which can process both printed and handwritten text from a variety of image formats.

Exam trap

The trap here is that candidates often confuse OCR with speech recognition or translation, but the key distinction is that OCR specifically extracts text from visual sources like images, not audio or language conversion.

How to eliminate wrong answers

Option A is wrong because speech recognition converts spoken language into text, not text from images, and is provided by Azure AI Speech, not Azure AI Vision. Option C is wrong because language translation converts text from one language to another, not extracting text from images, and is provided by Azure AI Translator. Option D is wrong because handwriting analysis for personality assessment is not a standard OCR capability; Azure AI Face is used for facial recognition and analysis, not text extraction.

139

MCQmedium

What is the Responsible AI principle most relevant to Azure AI Face's attribute prediction features?

A.Reliability — ensuring Face API returns consistent results across all images

B.Fairness and privacy — preventing bias across demographic groups and avoiding surveillance misuse

C.Inclusiveness — ensuring Face API works for users of all abilities

D.Transparency — documenting how Face API determines attribute values

AnswerB

Fairness (accuracy disparities across skin tones/ages) and privacy (surveillance misuse risk) are the key responsible AI concerns for face attribute prediction.

Why this answer

Azure AI Face's attribute prediction features (e.g., age, emotion, hair color) have been restricted or retired due to concerns about demographic bias and potential misuse for surveillance. The Responsible AI principles of fairness and privacy directly address these issues by requiring that AI systems avoid bias across demographic groups and are not used for applications like unauthorized tracking or profiling, which is why these principles are the most relevant.

Exam trap

The trap here is that candidates may confuse Transparency (documentation) with the ethical requirement to actually remove biased or privacy-invasive features, not just explain them.

How to eliminate wrong answers

Option A is wrong because Reliability focuses on consistent performance and error handling, not on the ethical concerns of bias or privacy that led to the restriction of Face API attributes. Option C is wrong because Inclusiveness ensures the system works for users of all abilities (e.g., accessibility features), which is unrelated to the demographic bias and surveillance risks inherent in attribute prediction. Option D is wrong because Transparency involves documenting how the system works, but the core issue with Face API attributes is not a lack of documentation—it is the ethical violation of fairness and privacy that caused Microsoft to retire these features.

140

MCQmedium

What is 'people counting' in Azure AI Vision spatial analysis?

A.Counting how many different people have used a digital service over a time period

B.Using video AI to count people in zones for occupancy, footfall, and queue management

C.Identifying and counting employees who have completed mandatory training

D.Counting the number of faces detected in a photo album for tagging purposes

AnswerB

People counting applies spatial analysis to video — enabling real-time occupancy monitoring and footfall analytics.

Why this answer

People counting in Azure AI Vision spatial analysis uses video AI to detect and track individuals within defined zones, enabling accurate measurement of occupancy, footfall, and queue lengths. This is a core computer vision capability that processes live or recorded video streams to count people in real time, supporting retail, workplace, and public safety scenarios.

Exam trap

The trap here is that candidates confuse 'people counting' with generic face detection or user analytics, but Azure AI Vision spatial analysis specifically requires video input and spatial zone configuration, not static images or digital logs.

How to eliminate wrong answers

Option A is wrong because it describes digital service user analytics, not video-based spatial analysis; Azure AI Vision people counting operates on camera feeds, not digital service logs. Option C is wrong because it refers to HR training compliance tracking, which is unrelated to computer vision and spatial analysis. Option D is wrong because it describes face detection in static images for photo tagging, whereas people counting in spatial analysis focuses on counting individuals in video zones over time, not identifying or tagging faces.
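
Counting people in a configured zone comes down to testing whether each detected position falls inside a polygon. A minimal sketch using the standard ray-casting test; the zone coordinates, the detected positions, and both function names are hypothetical, with coordinates normalised to the 0..1 range of the camera frame.

```python
def point_in_polygon(point, polygon):
    """Ray casting: a point is inside if a ray to the right crosses an odd number of edges."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_people_in_zone(positions, zone):
    """Count detected positions that fall inside the zone polygon."""
    return sum(1 for p in positions if point_in_polygon(p, zone))


# Hypothetical queue zone (a rectangle) and per-frame person positions.
queue_zone = [(0.1, 0.5), (0.6, 0.5), (0.6, 0.9), (0.1, 0.9)]
people = [(0.2, 0.6), (0.5, 0.8), (0.8, 0.7), (0.3, 0.2)]

occupancy = count_people_in_zone(people, queue_zone)
```

Running this per frame gives occupancy over time; note that no face is identified at any point, only positions.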

141

MCQeasy

What is 'invoice analysis' in Azure AI Document Intelligence?

A.Analysing invoice data to predict future payment defaults by customers

B.Extracting vendor, customer, line items, dates, and totals from vendor invoice images

C.Generating invoices from pricing data stored in a database

D.Comparing invoice totals against purchase orders to detect discrepancies

AnswerB

Invoice analysis automates AP processing — extracting all key invoice fields from scanned or digital invoices.

Why this answer

Invoice analysis in Azure AI Document Intelligence is a prebuilt model specifically designed to extract structured data from vendor invoices. It uses optical character recognition (OCR) and deep learning to identify and extract key fields such as vendor name, customer name, line items, invoice date, due date, and totals. This enables automated data entry and downstream processing without manual effort.

Exam trap

The trap here is that candidates confuse 'invoice analysis' (extracting data from invoice images) with downstream business processes like fraud detection, invoice generation, or reconciliation, which are not part of the Document Intelligence service's prebuilt capabilities.

How to eliminate wrong answers

Option A is wrong because predicting payment defaults is a predictive analytics or machine learning task, not a document extraction capability of Azure AI Document Intelligence. Option C is wrong because generating invoices from database data is a business logic or application development task, not a document analysis or extraction feature. Option D is wrong because comparing invoice totals against purchase orders is a reconciliation or audit process that would require additional logic or integration, not a built-in feature of the invoice analysis model.

142

MCQmedium

Which Azure AI service allows you to analyze medical images for clinical decision support?

A.Azure AI Custom Vision with medical training data

B.Azure AI Health Insights and specialized medical imaging AI

C.Azure AI Translator for medical documents

D.Azure AI Speech for dictation only

AnswerB

Azure AI Health Insights provides clinical decision support capabilities for analyzing health data and medical imaging in healthcare contexts.

Why this answer

Azure AI Health Insights includes specialized medical imaging AI capabilities designed to analyze radiology and other medical images for clinical decision support. This service is built on domain-specific models trained on medical data, unlike general-purpose computer vision services.

Exam trap

The trap here is that candidates may assume Azure AI Custom Vision can be used for medical imaging by simply training it on medical data, but the exam expects knowledge of the dedicated, pre-built medical imaging service (Azure AI Health Insights) that is specifically designed for clinical decision support.

How to eliminate wrong answers

Option A is wrong because Azure AI Custom Vision is a general-purpose image classification and object detection service that requires custom training data; it does not come pre-trained with medical domain knowledge for clinical decision support. Option C is wrong because Azure AI Translator is a text translation service, not an image analysis service, and cannot process medical images. Option D is wrong because Azure AI Speech is for speech-to-text and text-to-speech, not image analysis, and its dictation functionality is unrelated to medical imaging.

143

MCQmedium

A beverage company uses a camera system to inspect bottles on a conveyor belt. The system must automatically identify which bottles are defective (e.g., cracked or chipped) and which are acceptable, based on the overall appearance of each bottle. The company has thousands of labeled images of bottles (defective and non-defective). Which Azure Computer Vision service should they use to train a custom model?

A.Custom Vision – Object detection

B.Custom Vision – Image classification

C.Optical Character Recognition (OCR)

D.Face API

AnswerB

Image classification assigns a label to the entire image, perfectly matching the need to classify bottles as defective or acceptable.

Why this answer

Option B is correct because the scenario requires classifying each bottle image into one of two categories (defective or acceptable) based on overall appearance. Custom Vision – Image classification is designed exactly for this: it trains a model on labeled images to predict a single label per image, making it ideal for binary or multi-class classification tasks like defect detection.

Exam trap

The trap here is that candidates confuse object detection with image classification, assuming that identifying defects requires bounding boxes, when the question only asks for overall bottle status (defective vs. acceptable) based on appearance.

How to eliminate wrong answers

Option A is wrong because Custom Vision – Object detection identifies and locates multiple objects within an image (e.g., bounding boxes around cracks), but the requirement is only to classify the entire bottle as defective or acceptable, not to pinpoint defects. Option C is wrong because Optical Character Recognition (OCR) extracts text from images, which is irrelevant to detecting physical defects like cracks or chips on bottles. Option D is wrong because Face API is specialized for detecting, analyzing, and recognizing human faces, not for inspecting inanimate objects like bottles.

144

MCQeasy

A warehouse uses video cameras to monitor a conveyor belt. They need to count the number of boxes passing by each hour to track throughput. Which Azure Computer Vision capability should they use?

A.Optical Character Recognition (OCR)

B.Face Detection

C.Image Classification

D.Object Detection

AnswerD

Object Detection finds and locates multiple objects in an image/video, enabling counting of boxes.

Why this answer

Object Detection is the correct capability because it can identify and locate multiple boxes within each video frame, allowing the system to count them as they move along the conveyor belt. Unlike image classification, which labels an entire image, object detection provides bounding boxes and counts for each detected object, making it ideal for real-time throughput tracking.

Exam trap

The trap here is that candidates confuse Image Classification with Object Detection, assuming that classifying an image as 'box' is sufficient, but classification cannot count multiple objects or provide their locations.

How to eliminate wrong answers

Option A is wrong because Optical Character Recognition (OCR) extracts text from images, not physical objects like boxes. Option B is wrong because Face Detection is specialized for identifying human faces, not inanimate objects such as boxes. Option C is wrong because Image Classification assigns a single label to an entire image, but cannot count or locate multiple instances of boxes within the same frame.

145

MCQhard

A robotic arm in a factory needs to pick parts from a bin. The system must identify each part and its exact outline to ensure precise grasping. Which Computer Vision capability should be used?

A.Object detection

B.Image classification

C.Semantic segmentation

D.Optical Character Recognition

AnswerC

Semantic segmentation assigns a class label to each pixel, producing pixel-accurate outlines of the parts (rather than bounding boxes) that can guide precise robotic grasping.

Why this answer

Semantic segmentation is the correct capability because it classifies each pixel in an image, providing a precise outline of each part. This pixel-level classification is essential for a robotic arm to determine the exact shape and boundaries of parts for accurate grasping, unlike object detection which only provides bounding boxes.

Exam trap

The trap here is that candidates often confuse object detection (bounding boxes) with semantic segmentation (pixel-level masks), especially when the question emphasizes 'exact outline' — they may incorrectly choose object detection thinking it provides sufficient location information.

How to eliminate wrong answers

Option A is wrong because object detection identifies objects and their locations using bounding boxes, which do not provide the exact pixel-level outline needed for precise grasping. Option B is wrong because image classification assigns a single label to the entire image, offering no spatial information about individual parts or their outlines. Option D is wrong because Optical Character Recognition (OCR) extracts text from images, which is irrelevant to identifying parts and their shapes.
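
To show what a pixel-level mask offers that a bounding box does not, the sketch below extracts the outline of a part from a small class mask. The mask and the `outline_pixels` helper are hypothetical; a pixel counts as outline if it belongs to the class and has at least one 4-neighbour that does not (or lies on the image border).

```python
def outline_pixels(mask, class_id):
    """Return (row, col) pixels of class_id that touch a different class or the border."""
    rows, cols = len(mask), len(mask[0])
    outline = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != class_id:
                continue
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            on_edge = any(
                not (0 <= nr < rows and 0 <= nc < cols) or mask[nr][nc] != class_id
                for nr, nc in neighbours
            )
            if on_edge:
                outline.append((r, c))
    return outline


# Hypothetical mask: 0 = bin background, 1 = part.
part_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

outline = outline_pixels(part_mask, class_id=1)
```

For a square part the outline and the bounding box coincide; for an irregular part, only the mask follows the true shape.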

146

MCQmedium

What is 'video summarisation' in Azure Video Indexer and how does it work?

A.Generating a text transcript summary of what was said in the video

B.Automatically creating a highlight reel of the most informative video segments from a longer video

C.Compressing video file size while maintaining acceptable visual quality

D.Adding automatic chapter markers and timestamps to a video for navigation

AnswerB

Video summarisation analyses content and selects the best clips — turning hours of video into a concise watchable summary.

Why this answer

Video summarization in Azure Video Indexer automatically creates a highlight reel by selecting the most informative and visually interesting segments from a longer video. It uses AI models to analyze visual content, audio, and scene dynamics to identify key moments, such as changes in activity, faces, or objects, and then stitches these segments into a concise summary. This is distinct from transcript generation or chapter markers, as it focuses on extracting a condensed video output rather than text or navigation aids.

Exam trap

The trap here is that candidates confuse 'video summarization' with 'transcript summarization' (Option A), because both involve summarization, but the key distinction is that video summarization outputs a video clip, not text.

How to eliminate wrong answers

Option A is wrong because generating a text transcript summary of spoken content is a separate feature called 'transcript summarization' or 'speech-to-text with summarization,' not video summarization, which produces a video output. Option C is wrong because compressing video file size while maintaining quality is a video encoding or compression task, unrelated to Azure Video Indexer's AI-driven content analysis and summarization. Option D is wrong because adding automatic chapter markers and timestamps is a feature known as 'scene segmentation' or 'chapter generation,' which provides navigation but does not create a condensed video highlight reel.

147

MCQhard

What is 'neural radiance field' (NeRF) technology and how does it relate to Azure AI Vision capabilities?

A.A technique for compressing neural network weights using magnetic fields

B.A method for learning 3D scene representations from multiple 2D photographs to enable novel view synthesis

C.A networking technology that transmits images with zero packet loss

D.A type of GPU shader program used for real-time 3D rendering in games

AnswerB

NeRF learns volumetric 3D scene representations from 2D image sets — enabling photorealistic synthesis of never-photographed viewpoints.

Why this answer

Neural Radiance Fields (NeRF) use a neural network to learn a continuous 5D representation of a scene (a 3D position plus a 2D viewing direction) from a sparse set of 2D photographs, enabling the synthesis of novel views from arbitrary camera angles. NeRF is not a prebuilt Azure AI Vision feature; the connection is that NeRF-based models can be combined with Azure's Computer Vision services in custom pipelines for 3D reconstruction and volumetric rendering, such as generating immersive 3D assets from 2D images in mixed reality or digital twin scenarios.
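
To make novel view synthesis less abstract, here is a minimal sketch of the discrete volume rendering step that NeRF applies to each camera ray. In a real NeRF the densities and colors come from the trained network queried at sample points along the ray; here they are hard-coded, and the function name `composite_ray` is an illustrative choice.

```python
import math


def composite_ray(densities, colors, deltas):
    """Discrete volume rendering along one camera ray.

    densities: volume density (sigma) at each sample along the ray
    colors:    (r, g, b) emitted at each sample
    deltas:    distance between adjacent samples
    Returns the rendered pixel color and the per-sample weights.
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this sample
        weight = transmittance * alpha           # contribution to the pixel
        weights.append(weight)
        for channel in range(3):
            pixel[channel] += weight * rgb[channel]
        transmittance *= 1.0 - alpha
    return tuple(pixel), weights


# Hard-coded samples: empty space, then a dense red surface, then blue behind it.
densities = [0.0, 50.0, 50.0]
colors = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
deltas = [0.1, 0.1, 0.1]

pixel, weights = composite_ray(densities, colors, deltas)
print("pixel:", pixel)
print("weights:", weights)
```

The empty sample contributes nothing and the red surface hides almost all of the blue sample behind it, so the rendered pixel is nearly pure red. Rendering a new viewpoint repeats this for every ray of the new camera.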

Exam trap

The trap here is that candidates may confuse NeRF with traditional 3D rendering techniques (like shaders or game engines) or unrelated networking concepts, rather than recognizing it as a neural 3D scene representation method for novel view synthesis.

How to eliminate wrong answers

Option A is wrong because NeRF does not involve compressing neural network weights using magnetic fields; that describes a hypothetical or unrelated concept, not a real computer vision technique. Option C is wrong because NeRF is not a networking technology; it is a 3D scene representation method, and zero packet loss is a networking reliability goal unrelated to NeRF. Option D is wrong because NeRF is not a GPU shader program for real-time game rendering; it is a neural rendering approach that typically requires offline training and inference, not real-time shader execution.

148

MCQmedium

A museum wants to automatically generate detailed descriptions of artwork for a mobile app. For each painting, the app should produce a natural-language description that includes the dominant colors, the objects present in the scene, and whether the scene is indoor or outdoor. Which Azure Computer Vision capability is best suited for this task?

A.Optical Character Recognition (OCR)

B.Image Analysis (Describe Image / Dense Captions)

C.Face API

D.Object Detection

AnswerB

Image Analysis can generate descriptive captions that include objects, colors, and scene context, meeting all requirements.

Why this answer

Image Analysis with the Describe Image or Dense Captions API is specifically designed to generate human-readable sentences summarizing the content of an image, including dominant colors, objects, and scene attributes like indoor/outdoor. This capability uses pre-trained deep learning models to produce natural-language descriptions, making it the ideal choice for the museum's requirement of detailed, automated artwork descriptions.

Exam trap

The trap here is that candidates often confuse Object Detection (which only identifies objects and their locations) with the full scene understanding and natural-language generation provided by the Describe Image / Dense Captions API, leading them to select option D.

How to eliminate wrong answers

Option A (OCR) is wrong because it extracts text from images, not visual content like colors, objects, or scene type. Option C (Face API) is wrong because it focuses on detecting and analyzing human faces (e.g., face location, landmarks, head pose), not general scene understanding or object descriptions. Option D (Object Detection) is wrong because it only identifies and locates specific objects within an image using bounding boxes, but does not generate natural-language descriptions or infer scene attributes like indoor/outdoor.
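
As a rough sketch of how the museum app might consume caption output, the snippet below post-processes a hand-written sample response. The field names (`captionResult`, `denseCaptionsResult`, `values`, `boundingBox`) are assumed to follow the Image Analysis 4.0 REST response shape, the sample values are invented, and `describe` is a hypothetical helper rather than part of any SDK.

```python
import json

# Hand-written sample, assumed to mirror an Image Analysis 4.0 response.
# This is not real service output.
SAMPLE_RESPONSE = json.loads("""
{
  "captionResult": {"text": "a painting of a boat on a lake", "confidence": 0.83},
  "denseCaptionsResult": {"values": [
    {"text": "a painting of a boat on a lake", "confidence": 0.83,
     "boundingBox": {"x": 0, "y": 0, "w": 800, "h": 600}},
    {"text": "a red sailboat", "confidence": 0.71,
     "boundingBox": {"x": 310, "y": 220, "w": 180, "h": 150}},
    {"text": "a blurry reflection", "confidence": 0.32,
     "boundingBox": {"x": 300, "y": 380, "w": 200, "h": 90}}
  ]}
}
""")


def describe(response, min_confidence=0.5):
    """Build an app-ready description: the overall caption plus any region
    captions that clear the confidence threshold and are not duplicates."""
    headline = response["captionResult"]["text"]
    details = []
    for item in response.get("denseCaptionsResult", {}).get("values", []):
        text = item["text"]
        confident = item["confidence"] >= min_confidence
        if confident and text != headline and text not in details:
            details.append(text)
    return {"headline": headline, "details": details}


description = describe(SAMPLE_RESPONSE)
print(description["headline"])
for detail in description["details"]:
    print(" -", detail)
```

Object detection output alone would stop at labels and boxes; the caption fields are what supply ready-made sentences.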

149

MCQmedium

A retail company wants to use Azure Computer Vision to monitor product availability on shelves. They need to detect the presence and location of any product (e.g., a box, a bottle) on a shelf image, but they do not need to identify the specific product brand or type. Which prebuilt Azure Computer Vision capability should they use?

A.Object detection

B.Image classification

C.Optical Character Recognition (OCR)

D.Semantic segmentation

AnswerA

Object detection identifies objects and their locations with bounding boxes, which directly fulfills the requirement without needing to identify the specific product type.

Why this answer

Object detection is the correct choice because it identifies and locates multiple objects within an image by drawing bounding boxes around each detected item. For monitoring product availability on shelves, the company needs to know both the presence and position of products (e.g., boxes, bottles) without identifying specific brands or types, which aligns exactly with object detection's capability to output class labels (e.g., 'product') and coordinates.

Exam trap

The trap here is that candidates often confuse object detection with image classification, thinking classification can locate items, but classification only provides a single label for the whole image, not per-object positions.

How to eliminate wrong answers

Option B (Image classification) is wrong because it assigns a single label to the entire image rather than detecting individual objects or their locations, so it cannot indicate where products are on a shelf. Option C (Optical Character Recognition) is wrong because it extracts text from images, not physical objects like boxes or bottles, and is irrelevant for detecting product presence. Option D (Semantic segmentation) is wrong because it classifies every pixel in the image into categories (e.g., shelf, product, background) but does not provide distinct bounding boxes or counts for individual product instances, making it unsuitable for locating each product separately.
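
To show why per-object bounding boxes are enough for shelf monitoring, here is a minimal sketch that turns detections into a product count and a list of empty stretches. The `x`/`y`/`w`/`h` box format, the pixel values, and the `empty_gaps` helper are assumptions for illustration, not the output of a specific Azure call.

```python
def empty_gaps(shelf_width, boxes, min_gap):
    """Return (start, end) horizontal ranges not covered by any detected box.

    boxes:   list of dicts with x (left edge) and w (width), in pixels
    min_gap: ignore gaps narrower than this many pixels
    """
    gaps = []
    cursor = 0  # right edge of the shelf area covered so far
    for box in sorted(boxes, key=lambda b: b["x"]):
        if box["x"] - cursor >= min_gap:
            gaps.append((cursor, box["x"]))
        cursor = max(cursor, box["x"] + box["w"])
    if shelf_width - cursor >= min_gap:
        gaps.append((cursor, shelf_width))
    return gaps


# Made-up detections for a 600-pixel-wide shelf image.
detections = [
    {"x": 10, "y": 40, "w": 90, "h": 200},
    {"x": 105, "y": 55, "w": 80, "h": 185},
    {"x": 400, "y": 45, "w": 95, "h": 195},
]
product_count = len(detections)
gaps = empty_gaps(shelf_width=600, boxes=detections, min_gap=50)
print("products detected:", product_count)
print("empty stretches (px):", gaps)
```

A single image-level label could not produce either number, and a pixel mask would need extra processing to separate touching products into individual instances.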

150

Multi-Selectmedium

A company needs to extract text from scanned invoices and receipts. Which Azure services are suitable for this task? (Choose two.)

Select 2 answers

A.Computer Vision

B.Azure AI Document Intelligence

C.Azure AI Language

D.Custom Vision

AnswersA, B

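Computer Vision includes an OCR capability that can detect and extract text from images and documents, and Azure AI Document Intelligence builds on OCR with prebuilt models for invoices and receipts.

Why this answer

Computer Vision (option A) is correct because it provides OCR capabilities to extract printed and handwritten text from images, including scanned invoices and receipts. Its Read API can process text from various surfaces and layouts, making it suitable for document digitization tasks. Azure AI Document Intelligence (option B) is correct because it is built for document processing: it applies OCR and offers prebuilt invoice and receipt models that return both the text and structured fields such as totals and dates.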
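
As a hedged illustration of what working with OCR output looks like, the sketch below walks a hand-written sample result, joins the recognized lines, and flags low-confidence words for manual review. The nesting (`readResult`, `blocks`, `lines`, `words`) is assumed to follow the Image Analysis 4.0 Read response shape, the values are invented, and `extract_text` is a hypothetical helper.

```python
# Hand-written sample, assumed to mirror a Read (OCR) response.
# This is not real service output.
SAMPLE_READ_RESULT = {
    "readResult": {"blocks": [{"lines": [
        {"text": "INVOICE 1042", "words": [
            {"text": "INVOICE", "confidence": 0.99},
            {"text": "1042", "confidence": 0.97}]},
        {"text": "Total: 118.50", "words": [
            {"text": "Total:", "confidence": 0.98},
            {"text": "118.50", "confidence": 0.62}]},
    ]}]}
}


def extract_text(result, review_below=0.8):
    """Return the full text (one line per recognized line) and the words
    whose confidence falls below the review threshold."""
    lines, needs_review = [], []
    for block in result["readResult"]["blocks"]:
        for line in block["lines"]:
            lines.append(line["text"])
            for word in line["words"]:
                if word["confidence"] < review_below:
                    needs_review.append(word["text"])
    return "\n".join(lines), needs_review


text, needs_review = extract_text(SAMPLE_READ_RESULT)
print(text)
print("check manually:", needs_review)
```

Plain OCR returns text and positions only; deciding that "118.50" is the invoice total is the kind of field-level extraction that a document-focused service adds.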

Exam trap

The trap here is that candidates may confuse Azure AI Language (a text analytics service) with OCR capabilities, or think Custom Vision can extract text, when in fact it only classifies or detects objects in images.
