Knowledge + Practice

CCNA Aio Implementing Ai Questions

75 of 125 questions · Page 1/2 · Aio Implementing Ai topic · Answers revealed

1

MCQ · easy

When implementing a vector store for a RAG system, which similarity search metric is MOST commonly used to find the most relevant document chunks for a given query embedding?

A. Manhattan distance

B. Euclidean distance

C. Dot product

D. Cosine similarity

Answer: D

Cosine similarity measures orientation similarity and is widely used for comparing dense embeddings.

Why this answer

Cosine similarity is the most common metric for comparing embedding vectors in RAG because it measures the angle between vectors, which works well for high-dimensional semantic embeddings.

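The retrieval step described above can be sketched in plain Python. This assumes the chunk embeddings have already been computed. The chunk IDs and 3-dimensional vectors are hypothetical toy values: real embedding models produce hundreds of dimensions, and a vector store would use an approximate index rather than a full sort.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Rank chunk IDs by cosine similarity to the query embedding (brute force)."""
    ranked = sorted(chunks, key=lambda cid: cosine_similarity(query, chunks[cid]), reverse=True)
    return ranked[:k]


# Hypothetical 3-dimensional embeddings for three document chunks
chunks = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-faq": [0.8, 0.3, 0.1],
}
print(top_k_chunks([1.0, 0.2, 0.0], chunks))
```

For this query, the two policy-related chunks rank ahead of the shipping chunk because their vectors point in a similar direction.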

2

MCQ · hard

A team is implementing a RAG system for legal document retrieval. The documents are long (50-100 pages) with clear section headings. They want to ensure that retrieved chunks are semantically coherent and respect document structure. Which chunking strategy is MOST appropriate?

A. Semantic chunking based on sentence embeddings

B. Fixed-size chunking with 256 tokens and no overlap

C. Recursive character text splitting with chunk size 1000 and chunk overlap 200

D. Hierarchical chunking: first split by sections, then further split each section into fixed-size chunks with overlap

Answer: D

Hierarchical chunking respects the document structure and provides coherent chunks within sections.

Why this answer

Hierarchical chunking preserves document structure by first splitting into sections, then further into chunks, maintaining semantic coherence.

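The two-level strategy in option D can be sketched as follows. This sketch makes three assumptions: headings are marked with a leading `## `, whitespace-separated words stand in for model tokens, and the default size and overlap values are illustrative rather than recommendations.

```python
def split_sections(text: str) -> list[tuple[str, str]]:
    """Level 1: split on lines starting with '## ' (assumed heading marker)."""
    sections: list[tuple[str, str]] = []
    heading, lines = "Preamble", []
    for line in text.splitlines():
        if line.startswith("## "):
            if lines:
                sections.append((heading, "\n".join(lines)))
            heading, lines = line[3:].strip(), []
        else:
            lines.append(line)
    if lines:
        sections.append((heading, "\n".join(lines)))
    return sections


def chunk_words(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Level 2: fixed-size windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break
    return chunks


def hierarchical_chunks(text: str, size: int = 200, overlap: int = 40) -> list[dict]:
    """Chunks never cross a section boundary, and each keeps its section heading."""
    result = []
    for heading, body in split_sections(text):
        for words in chunk_words(body.split(), size, overlap):
            result.append({"section": heading, "text": " ".join(words)})
    return result


doc = "## Scope\n" + " ".join(f"w{i}" for i in range(10)) + "\n## Terms\nalpha beta gamma"
for chunk in hierarchical_chunks(doc, size=4, overlap=1):
    print(chunk)
```

Keeping the heading with each chunk also lets a retriever filter or cite by section.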

3

Multi-Select · hard

A company is building a code generation assistant for internal developers. They want the assistant to generate code snippets consistent with the company's coding style and use private libraries. They have a few thousand examples of internal code. Which THREE considerations are critical when deciding between fine-tuning a base LLM and using RAG?

Select 3 answers

A. Fine-tuning on a few thousand examples is insufficient; millions are required for any meaningful adaptation.

B. RAG requires the model to have a large context window to accommodate retrieved code snippets.

C. RAG eliminates the need for any model updates when private libraries change, because it retrieves the latest documentation at inference time.

D. Security constraints may favour RAG because sensitive code is never part of the model's weights.

E. Fine-tuning can encode company-specific coding conventions directly into the model, reducing the need for style instructions in prompts.

Answers: C, D, E

RAG retrieves from a vector store that can be updated without retraining the model.

Why this answer

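Option C is correct because RAG retrieves from a vector store that can be re-indexed when private libraries change, so the model itself does not need retraining. Option D is correct because sensitive code stays in a retrieval store, where access can be controlled, instead of being encoded in the model's weights. Option E is correct because fine-tuning can encode company-specific conventions directly into the model, whereas RAG may miss stylistic nuances. Option A is wrong because a few thousand examples is a moderate volume, and parameter-efficient fine-tuning can still adapt a model meaningfully with that much data. Option B is not a deciding factor: retrieved snippets do consume context, but chunk size and the number of retrieved results can be tuned to fit the window.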

4

MCQ · medium

A team is building a RAG system with a large repository of technical manuals. They want to ensure that each retrieved chunk is semantically coherent and that related concepts are grouped together. Which chunking strategy is BEST?

A. Chunking by page number

B. Fixed-size chunking with 512 tokens

C. Hierarchical chunking with parent-child relationships

D. Semantic chunking using a sentence splitter with topic boundaries

Answer: D

Semantic chunking creates meaningful units, improving retrieval quality by keeping related text together.

Why this answer

Semantic chunking using a sentence splitter with topic boundaries ensures that each chunk is a self-contained, semantically coherent unit by detecting natural topic shifts (e.g., via embedding similarity or discourse markers). This directly supports the requirement for semantically coherent chunks and grouping of related concepts, unlike methods that ignore content meaning.

Exam trap

Cisco often tests the misconception that hierarchical chunking (Option C) is the best for semantic coherence, but its true purpose is multi-granularity retrieval, not ensuring each chunk is internally coherent.

How to eliminate wrong answers

Option A is wrong because chunking by page number ignores semantic boundaries; a single page may contain multiple unrelated topics or split a single concept across pages, breaking coherence. Option B is wrong because fixed-size chunking with 512 tokens treats all content uniformly, often cutting sentences or ideas in half, which destroys semantic coherence and fails to group related concepts. Option C is wrong because hierarchical chunking with parent-child relationships is designed for retrieval over multiple granularities (e.g., summarization or multi-hop QA), not for ensuring each individual chunk is semantically coherent; it can still contain mixed topics within a chunk.

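This toy sketch shows topic-boundary chunking: it starts a new chunk whenever the similarity between adjacent sentences drops below a threshold. To stay self-contained, it uses word-count vectors as a stand-in for a real sentence-embedding model. The 0.2 threshold is an arbitrary assumption that would need tuning on real data.

```python
import math
import re
from collections import Counter


def toy_embedding(sentence: str) -> Counter:
    """Stand-in for a sentence-embedding model: lower-cased word counts."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[str]:
    """Group consecutive sentences; a drop in similarity marks a topic boundary."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(toy_embedding(prev), toy_embedding(cur)) < threshold:
            chunks.append([cur])  # similarity dropped: start a new chunk
        else:
            chunks[-1].append(cur)
    return [" ".join(chunk) for chunk in chunks]


sentences = [
    "The pump motor requires monthly lubrication.",
    "Lubrication of the pump motor uses grade 2 grease.",
    "Network switches must run the latest firmware.",
    "Firmware updates for switches are released quarterly.",
]
for chunk in semantic_chunks(sentences):
    print(chunk)
```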

5

MCQ · medium

A data scientist is preparing a dataset for a binary classification model. The dataset has 95% majority class and 5% minority class. Which data preparation technique is BEST to address the class imbalance?

A. Min-max normalization of all features

B. Random undersampling of the majority class

C. Removing all minority class samples

D. SMOTE oversampling of the minority class

Answer: D

SMOTE creates synthetic minority samples by interpolating between existing minority instances, effectively balancing the classes without losing data.

Why this answer

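SMOTE (Synthetic Minority Oversampling Technique) generates synthetic samples for the minority class by interpolating between existing minority instances. This balances the dataset without simply duplicating them. Random undersampling (B) also balances the classes, but it discards majority-class data.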

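Here is a minimal pure-Python sketch of the SMOTE interpolation step for numeric features. In practice, a library implementation (for example, `SMOTE` in the imbalanced-learn package) would normally be used; this sketch omits its refinements. Like any resampling, it should be applied to the training split only. The sample points are made up.

```python
import math
import random


def smote(minority: list[tuple[float, ...]], n_new: int, k: int = 3, seed: int = 0) -> list[tuple[float, ...]]:
    """Create n_new synthetic minority samples, SMOTE-style:
    pick a minority sample, pick one of its k nearest minority neighbours,
    and place a new point at a random position on the segment between them."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours among the *other* minority samples (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote(minority, n_new=4, k=2)
print(new_points)
```

Because each new point lies between two real minority samples, it stays inside the region the minority class already occupies, rather than being an exact copy.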

6

MCQ · medium

An AI agent is designed to book flights by calling an external API. The agent must decide which tool to call based on user input, then generate the correct API parameters. Which pattern is MOST appropriate for this workflow?

A. Chain-of-thought prompting only

B. Zero-shot prompting with JSON mode

C. ReAct pattern with tool descriptions and function calling

D. Simple prompt with no tool descriptions

Answer: C

ReAct enables the agent to reason about the next action and call the appropriate tool with correct parameters.

Why this answer

The ReAct (Reasoning + Acting) pattern interleaves reasoning steps with tool calls, allowing the agent to decide when to call a function and what arguments to use.

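This is a stripped-down sketch of a ReAct loop with a tool registry. The `fake_llm` function is a scripted stand-in for a model with function calling. The `search_flights` tool, its arguments, and the flight data are hypothetical. A real agent would send the tool descriptions to the model and parse the tool call it returns.

```python
import json


def search_flights(origin: str, destination: str, date: str) -> list[dict]:
    """Hypothetical tool: a real version would call the airline's booking API."""
    return [{"flight": "XY123", "origin": origin, "destination": destination, "date": date}]


# Tool registry: name -> (description shown to the model, callable)
TOOLS = {
    "search_flights": ("search_flights(origin, destination, date): list available flights", search_flights),
}


def fake_llm(transcript: list[dict]) -> dict:
    """Scripted stand-in for an LLM: act first, answer once an observation exists."""
    if not any(step["type"] == "observation" for step in transcript):
        return {
            "thought": "I need available flights before I can book.",
            "action": {"tool": "search_flights",
                       "args": {"origin": "SFO", "destination": "JFK", "date": "2025-03-01"}},
        }
    return {"thought": "I have the search results.", "final_answer": "Found flight XY123 from SFO to JFK."}


def react(user_input: str, llm, max_steps: int = 5) -> tuple[str, list[dict]]:
    """Alternate Reason -> Act -> Observe until the model gives a final answer."""
    transcript = [{"type": "user", "content": user_input}]
    for _ in range(max_steps):
        step = llm(transcript)
        transcript.append({"type": "thought", "content": step["thought"]})
        if "final_answer" in step:
            return step["final_answer"], transcript
        call = step["action"]
        _, tool = TOOLS[call["tool"]]
        result = tool(**call["args"])  # execute the tool with model-chosen arguments
        transcript.append({"type": "action", "content": json.dumps(call)})
        transcript.append({"type": "observation", "content": json.dumps(result)})
    raise RuntimeError("no final answer within max_steps")


answer, trace = react("Book me a flight from SFO to JFK on 1 March", fake_llm)
print(answer)
```

The `max_steps` cap is a design choice that keeps a confused model from looping on tool calls indefinitely.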

7

MCQ · medium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A. Fine-tune a base LLM on the policy documents monthly

B. Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

C. Train a custom model from scratch on the policy documents each month

D. Use a larger foundation model with a longer context window and paste all documents into each prompt

Answer: B

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without model retraining.

Why this answer

RAG allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. Options A and C require expensive retraining for every monthly update. Option D is costly and limited by the context window, because every prompt would carry the entire document set.

8

Multi-Select · medium

An organisation is developing a document intelligence system that extracts information from scanned invoices. Which THREE data preparation steps are critical to ensure high extraction accuracy? (Choose THREE.)

Select 3 answers

A. Cleaning and correcting OCR output

B. Removing punctuation and stopwords

C. Normalising all text to lowercase

D. Annotating bounding boxes and field labels

E. Image preprocessing (e.g., deskewing, binarisation)

Answers: A, D, E

OCR errors must be fixed to avoid downstream extraction mistakes.

Why this answer

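Image preprocessing (such as deskewing and binarisation), OCR cleaning, and field annotation are essential for accurate extraction. Removing punctuation (B) or lowercasing all text (C) discards information that invoices depend on, such as decimal separators in amounts and case-sensitive invoice codes.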

9

MCQ · medium

A machine learning engineer is building a recommendation system for an e-commerce platform. The system should suggest products based on user purchase history and browsing behavior. Which model selection is BEST suited for this task?

A. Image classification model (e.g., CNN)

B. Linear regression

C. Random forest classifier

D. Collaborative filtering model (e.g., matrix factorization)

Answer: D

Collaborative filtering leverages patterns of user-item interactions to make personalized recommendations, ideal for this scenario.

Why this answer

Collaborative filtering models (e.g., matrix factorization) are effective for recommendation tasks using user-item interaction data. Linear regression is for regression, not recommendation. Image classification is unrelated.

A random forest classifier can be adapted for recommendation, but it does not directly model user-item interactions the way collaborative filtering does.

10

MCQ · medium

During testing of an AI system that classifies support tickets into categories, the team notices the model frequently misclassifies tickets about a new product feature that was introduced after the model was trained. Which type of testing should the team prioritize to catch this issue?

A. Unit tests for the data pipeline

B. Regression testing with a test set that includes examples of the new feature

C. Integration tests for API calls

D. Evaluation framework for LLM output quality

Answer: B

Regression testing involves re-running tests after changes; including new feature examples helps detect if the model fails on previously unseen categories.

Why this answer

The model's misclassification of the new product feature is a classic case of data drift, where the production data distribution differs from the training data. Regression testing with a test set that includes examples of the new feature directly validates whether the model still performs correctly on this unseen category. This is the most targeted approach to catch the regression in classification accuracy caused by the new feature.

Exam trap

Cisco often tests the distinction between testing the model's predictive behavior (regression testing) versus testing the infrastructure or data pipeline components, leading candidates to mistakenly choose unit or integration tests.

How to eliminate wrong answers

Option A is wrong because unit tests for the data pipeline verify data ingestion and transformation logic, not the model's classification performance on new feature categories. Option C is wrong because integration tests for API calls check the connectivity and response format between system components, not the semantic accuracy of model predictions. Option D is wrong because an evaluation framework for LLM output quality is designed for generative text tasks, not for a classification model that assigns predefined categories to support tickets.

11

MCQ · hard

A developer is fine-tuning a large language model for a code generation task. The available GPU has only 8GB of VRAM, and the base model is 7B parameters. Which fine-tuning technique is MOST feasible?

A. QLoRA (Quantized Low-Rank Adaptation)

B. LoRA (Low-Rank Adaptation)

C. Instruction tuning with a smaller model

D. Full fine-tuning of all parameters

Answer: A

QLoRA quantizes the base model to 4-bit and uses LoRA adapters, making it possible to fine-tune a 7B model on 8GB VRAM.

Why this answer

QLoRA (Quantized Low-Rank Adaptation) combines quantization and LoRA to fine-tune large models on limited VRAM.

12

MCQ · medium

During the evaluation phase of an AI project, the team measures the model's F1 score on a held-out test set. They find the F1 score is 0.92, but when deployed in production, the model performs poorly on new data. What is the MOST likely cause of this discrepancy?

A. The production data has a different distribution than the training data (concept drift)

B. The model's hyperparameters were not properly tuned

C. The model is overfitting to the training data

D. Data leakage occurred between the training and test sets during preparation

Answer: D

Data leakage artificially inflates evaluation metrics; the model may have seen test data during training, leading to a false sense of performance.

Why this answer

Data leakage during preparation can cause overly optimistic evaluation scores. If information from the test set reaches training (for example, duplicate records in both sets, or preprocessing fitted on the full dataset), the model appears better than it really is. Overfitting alone would normally show up as a lower score on a properly held-out test set.

Concept drift occurs over time, not immediately. Poor hyperparameter tuning usually yields lower scores, not inflated ones.

13

Multi-Select · medium

A developer is building an AI agent that needs to call external tools (e.g., weather API, database) and reason about the results to answer user queries. Which THREE components are essential for implementing this agentic workflow?

Select 3 answers

A. Planning capability (e.g., step-by-step decomposition)

B. ReAct (Reasoning + Acting) loop

C. Fine-tuned domain-specific model

D. A vector store for long-term memory

E. Function calling or tool use interface

Answers: A, B, E

Planning allows the agent to break down complex requests into sub-tasks and execute them in order.

Why this answer

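Option A is correct because planning capability enables the agent to decompose complex user queries into manageable sub-tasks, such as retrieving weather data before making a recommendation. This step-by-step reasoning is critical for multi-step workflows where the order of tool calls affects the final answer; without planning, the agent would lack the structured approach needed to handle dependencies between external tool outputs. Option B is correct because the ReAct loop interleaves reasoning with tool calls, letting the agent observe each result before deciding the next step. Option E is correct because a function-calling or tool-use interface is how the model actually invokes the weather API or database with structured arguments.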

Exam trap

Cisco often tests the misconception that fine-tuning or vector stores are mandatory for agentic workflows, when in fact the core requirements are planning, a reasoning-acting loop, and a tool-use interface, all achievable with a base model and prompt engineering.

14

MCQ · medium

A company is fine-tuning a large language model using PEFT (Parameter-Efficient Fine-Tuning) to reduce GPU memory usage. They have limited hardware and need to fine-tune a 70B parameter model on a single GPU with 24 GB VRAM. Which technique is MOST suitable?

A. Full fine-tuning with gradient checkpointing

B. QLoRA (Quantized LoRA) with 4-bit quantization

C. Instruction tuning with a smaller 7B model

D. LoRA (Low-Rank Adaptation) alone

Answer: B

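QLoRA quantizes the base model to 4-bit, drastically reducing memory usage, and trains only small LoRA adapters. Of the options listed, it comes closest to fitting the budget. Note, however, that 70B parameters at 4 bits still need roughly 35 GB for the weights alone, which is more than 24 GB of VRAM. In practice, a 70B model needs more GPU memory than stated; the original QLoRA paper fine-tuned a 65B model on a single 48 GB GPU.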

Why this answer

QLoRA combines quantization (4-bit) and LoRA to fine-tune very large models on limited hardware, achieving significant memory reduction while maintaining performance.

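A back-of-the-envelope calculation makes the memory budgets in this question and in question 11 concrete. It counts only the base-model weights (1 GB taken as 10^9 bytes). It ignores activations, LoRA adapter weights, optimizer state, and quantization overhead, so real usage is higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for label, n_params in [("7B", 7e9), ("70B", 70e9)]:
    for bits in (16, 4):
        print(f"{label} at {bits}-bit: {weight_memory_gb(n_params, bits):.1f} GB")
```

A 7B model at 4 bits needs about 3.5 GB for its weights, which leaves headroom on an 8 GB GPU. A 70B model at 4 bits needs about 35 GB, which exceeds a 24 GB GPU before any training overhead is counted.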

15

MCQ · hard

A team is fine-tuning a large language model using LoRA. They have limited GPU memory. Which technique can further reduce memory consumption while maintaining similar fine-tuning quality?

A. Fine-tune all layers instead of using LoRA

B. Increase the rank of LoRA adapters

C. Use QLoRA with 4-bit quantization of the base model

D. Use a larger batch size

Answer: C

QLoRA quantizes the base model to 4 bits, significantly reducing memory while LoRA adapters handle fine-tuning.

Why this answer

QLoRA combines 4-bit quantization of the base model with LoRA adapters, drastically reducing memory usage while preserving fine-tuning performance.

16

MCQ · easy

A data science team is preparing a dataset for a supervised learning task. Before splitting the data into training and test sets, the team normalizes the features using the mean and standard deviation calculated from the entire dataset. What issue does this introduce?

A. It improves model generalization

B. It introduces train/test leakage

C. It causes the model to overfit the training data

D. It reduces the variance of the features

Answer: B

Correct: normalizing using global statistics means test set information is used to transform training data, leaking information.

Why this answer

Using statistics from the entire dataset before splitting causes test data information to influence the training process, leading to train/test leakage and overly optimistic performance estimates.

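Here is a minimal sketch of the leak-free order: split first, compute the statistics on the training split, then apply those same statistics to the test split. It uses the population standard deviation (`statistics.pstdev`). The six data values are made up for illustration.

```python
import statistics


def fit_standardizer(values: list[float]) -> tuple[float, float]:
    """Learn mean and population standard deviation from the training split only."""
    return statistics.mean(values), statistics.pstdev(values)


def transform(values: list[float], mean: float, std: float) -> list[float]:
    return [(v - mean) / std for v in values]


data = [10.0, 12.0, 11.0, 13.0, 50.0, 55.0]  # made-up feature values
train, test = data[:4], data[4:]           # 1. split first
mean, std = fit_standardizer(train)        # 2. fit on training data only
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)   # 3. reuse training statistics on test
print(mean, statistics.mean(data))         # the leaky version would use the second value
```

The gap between the two printed means shows how much the test values would have shifted the training transformation.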

17

MCQ · medium

A data science team is training an image classification model for a medical imaging application. To prevent data leakage, they must partition the dataset correctly. Which approach ensures that no patient images appear in both training and test sets?

A. Split by patient ID so that all images of a patient go to one set only

B. Shuffle the dataset and take the first 80% for training and last 20% for testing

C. Use k-fold cross-validation without grouping

D. Randomly split all images into training and test sets

Answer: A

Splitting by patient ID ensures no patient appears in both training and test, preventing leakage.

Why this answer

Data leakage occurs when information from the test set leaks into training. Splitting by patient ID ensures that all images from the same patient are kept together in one partition.

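Here is a minimal sketch of a group-aware split that keeps every patient's images on one side. The record layout and IDs are hypothetical. scikit-learn's `GroupShuffleSplit` and `GroupKFold` serve the same purpose in library form.

```python
import random


def group_split(records: list[dict], group_key: str, test_fraction: float = 0.2, seed: int = 42):
    """Assign whole groups (e.g. patients) to one side so no group spans train and test."""
    groups = sorted({r[group_key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[group_key] not in test_groups]
    test = [r for r in records if r[group_key] in test_groups]
    return train, test


# 20 hypothetical scans from 5 patients (P0..P4), 4 scans each
images = [{"patient_id": f"P{i % 5}", "file": f"scan_{i}.png"} for i in range(20)]
train, test = group_split(images, "patient_id")
print(len(train), "train images,", len(test), "test images")
```

Because the fraction applies to patients rather than images, the image-level split ratio can differ from 80/20 when patients have different numbers of scans.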

18

MCQ · easy

During data preparation for a classification model, the data scientist notices that one class has 95% of the samples and the other has only 5%. Which technique is MOST appropriate to address this imbalance?

A. Shuffle the data randomly before each training epoch

B. Remove the minority class samples entirely

C. Use a larger learning rate to force the model to pay attention to the minority class

D. Apply SMOTE (Synthetic Minority Over-sampling Technique) to generate synthetic samples for the minority class

Answer: D

SMOTE creates synthetic minority samples, balancing the dataset and improving model performance without discarding data.

Why this answer

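Resampling addresses class imbalance directly: SMOTE generates synthetic samples for the minority class, while undersampling removes samples from the majority class. The other options either leave the class balance unchanged (A, C) or remove the minority signal entirely (B).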

19

MCQ · easy

Which similarity metric is MOST appropriate for comparing dense vector embeddings in a vector store used for document retrieval, when the embeddings are normalized to unit length?

A. Jaccard similarity

B. Manhattan distance

C. Cosine similarity

D. Euclidean distance

Answer: C

Cosine similarity measures the angle between vectors and is standard for normalized embeddings, yielding best semantic match.

Why this answer

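For unit-length vectors, cosine similarity equals the dot product, and it is the most common metric for semantic similarity. Euclidean distance is sensitive to vector magnitude, which matters for unnormalized embeddings. For unit-length vectors, however, Euclidean distance ranks results the same way as cosine similarity, because the squared distance equals 2 − 2 × cosine. Cosine is therefore preferred mainly as the conventional, directly interpretable score. Jaccard similarity compares sets, not dense vectors.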

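This small numeric check uses made-up 2-dimensional vectors to verify the relationship between these metrics. After normalization, cosine similarity equals the dot product, and squared Euclidean distance equals 2 − 2 × cosine, so both rank documents identically.

```python
import math


def normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u: list[float], v: list[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def cosine(u: list[float], v: list[float]) -> float:
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def squared_euclidean(u: list[float], v: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(u, v))


query = normalize([1.0, 0.5])
docs = {name: normalize(v) for name, v in {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [1.0, 1.0]}.items()}

for name, vec in docs.items():
    # For unit vectors: cosine == dot, and squared distance == 2 - 2 * cosine
    print(name, round(cosine(query, vec), 4), round(dot(query, vec), 4), round(squared_euclidean(query, vec), 4))

by_cosine = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
by_distance = sorted(docs, key=lambda d: squared_euclidean(query, docs[d]))
print(by_cosine, by_distance)
```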

20

Multi-Select · medium

A data scientist is preparing a dataset for training a customer churn prediction model. To prevent train/test leakage, which TWO practices should be followed? (Select TWO)

Select 2 answers

A. Remove duplicate records only from the test set to ensure uniqueness

B. Shuffle the entire dataset randomly before splitting into train and test sets

C. Split the data chronologically (e.g., use data before a certain date for training, after for testing)

D. Normalize numerical features using statistics computed on the entire dataset before splitting

E. Perform feature selection using only the training data, then apply the same features to the test set

Answers: C, E

Chronological splitting preserves the temporal order, preventing future data from leaking into the training set.

Why this answer

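To prevent leakage, split chronologically (C) so that no future data is used for training, and perform feature selection on the training data only (E) so that the test set does not influence which features are kept. The other options introduce leakage:

- Shuffling before splitting (B) mixes future records into training when churn data is time-dependent.
- Normalizing on the entire dataset (D) leaks test-set statistics into training.
- Deduplicating only the test set (A) leaves duplicates that can straddle both sets.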

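Here is a minimal sketch of option C, assuming each record carries a hypothetical `event_date` field; the customers, dates, and cutoff are made up. Feature selection (option E) would follow the same rule as scaling: fit it on `train` only, then apply the chosen columns to `test`.

```python
from datetime import date


def chronological_split(records: list[dict], cutoff: date):
    """Train on records strictly before the cutoff, test on records on or after it."""
    train = [r for r in records if r["event_date"] < cutoff]
    test = [r for r in records if r["event_date"] >= cutoff]
    return train, test


records = [  # hypothetical customer snapshots
    {"customer": "c1", "event_date": date(2024, 1, 15), "churned": False},
    {"customer": "c2", "event_date": date(2024, 3, 2), "churned": True},
    {"customer": "c3", "event_date": date(2024, 6, 20), "churned": False},
    {"customer": "c4", "event_date": date(2024, 8, 5), "churned": True},
]
train, test = chronological_split(records, cutoff=date(2024, 6, 1))
print([r["customer"] for r in train], [r["customer"] for r in test])
```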

21

MCQ · medium

A data science team is fine-tuning a large language model for a domain-specific task using LoRA. They have a limited GPU budget and want to minimize memory usage during training. Which technique should they use?

A. Use LoRA (Low-Rank Adaptation) with the base model in full precision

B. Use PEFT (Parameter-Efficient Fine-Tuning) without specifying a specific method

C. Use QLoRA (Quantized LoRA) with a 4-bit quantized base model

D. Full fine-tuning of the entire model

Answer: C

QLoRA quantizes the base model to 4-bit, significantly reducing memory usage while applying LoRA adapters for efficient fine-tuning.

Why this answer

QLoRA (Quantized LoRA) quantizes the base model to 4-bit, drastically reducing memory usage while still training LoRA adapters. Standard LoRA keeps the frozen base model at its original precision (for example, 16-bit). PEFT is a category of methods rather than a specific technique, and full fine-tuning uses the most memory.

22

Multi-Select · hard

A company is deploying an LLM-based chatbot that must output responses in a structured JSON format for downstream processing. Which THREE prompt engineering techniques should the team use to ensure the output is valid and correctly structured? (Select three.)

Select 3 answers

A. Include few-shot examples of correct JSON outputs

B. Set temperature to 0 to increase determinism

C. Enable JSON mode or structured output mode in the model API

D. Define the expected JSON schema in the system prompt

E. Use chain-of-thought prompting to reason before output

Answers: A, C, D

Why this answer

A system prompt defining JSON structure, few-shot examples of valid JSON, and JSON mode in the model API all help produce valid structured output. Chain-of-thought and temperature adjustment do not directly enforce JSON format.

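This sketch covers the client side of options A and D: a system prompt that states the expected fields and includes one example object, plus a validator for the model's reply. The field names are hypothetical. JSON or structured-output mode (option C) is a provider-specific API setting and is not shown here; validating the reply is still worthwhile as a final check.

```python
import json

# Hypothetical schema: field name -> required Python type
SCHEMA_FIELDS = {"intent": str, "order_id": str, "priority": int}

SYSTEM_PROMPT = (
    "Reply ONLY with a JSON object containing exactly these keys: "
    + ", ".join(f'"{name}" ({typ.__name__})' for name, typ in SCHEMA_FIELDS.items())
    + '.\nExample: {"intent": "refund", "order_id": "A-1001", "priority": 2}'
)


def parse_response(raw: str) -> dict:
    """Parse the model's reply and check keys and types; raise ValueError on failure
    so the caller can retry or fall back."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(obj, dict) or set(obj) != set(SCHEMA_FIELDS):
        raise ValueError(f"expected exactly the keys {sorted(SCHEMA_FIELDS)}")
    for name, typ in SCHEMA_FIELDS.items():
        if type(obj[name]) is not typ:  # exact check, so True is not accepted as an int
            raise ValueError(f"{name!r} should be {typ.__name__}")
    return obj


print(SYSTEM_PROMPT)
print(parse_response('{"intent": "refund", "order_id": "A-1001", "priority": 2}'))
```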

23

MCQ · medium

A developer is integrating an AI microservice that accepts image uploads and returns classification labels. The service must handle spikes of up to 1,000 requests per minute but average 100 requests per minute. Which deployment architecture BEST meets these requirements with cost efficiency?

A. Expose the model via a serverless function (e.g., AWS Lambda) with synchronous invocation

B. Use an async processing queue (e.g., RabbitMQ) with a pool of worker instances that auto-scale based on queue depth

C. Deploy the service as a synchronous REST API on a single always-on VM sized for peak load

D. Stream results directly from the model to the client using WebSockets

Answer: B

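An async queue buffers spikes, and workers scale only when needed, which reduces cost while still handling bursts.

Why this answer

Async processing with a queue allows buffering during spikes, with workers scaled according to queue depth. The other options are weaker:

- A synchronous always-on VM sized for peak load (C) would sit mostly idle at the average of 100 requests per minute.
- A synchronous serverless function (A) can also scale, but every spike is pushed straight to the function with no buffer, and cold starts that reload the model can add latency.
- WebSocket streaming (D) addresses how results are delivered, not how load is absorbed.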

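Here is a sketch of the queue-depth scaling rule in option B. The throughput of 60 images per worker per minute and the bounds of 1 to 20 workers are assumptions for illustration. Real autoscalers typically add cooldowns and smoothing on top of a rule like this.

```python
import math


def desired_workers(queue_depth: int, msgs_per_worker_per_min: int,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Size the worker pool so the current backlog drains in about one minute,
    clamped to [min_workers, max_workers]."""
    needed = math.ceil(queue_depth / msgs_per_worker_per_min)
    return max(min_workers, min(max_workers, needed))


for depth in (0, 100, 1000):
    print(depth, "queued ->", desired_workers(depth, msgs_per_worker_per_min=60), "workers")
```

At the average load of 100 requests per minute, this rule keeps 2 workers running, and a 1,000-request burst scales the pool to 17. Requests beyond the workers' capacity wait in the queue instead of failing.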

24

Multi-Select · medium

A data science team is developing a churn prediction model. Which TWO data preparation best practices are MOST important to prevent overfitting and ensure generalization?

Select 2 answers

A. Split data into training and test sets before any preprocessing

B. Normalize all features using the entire dataset's statistics

C. Use cross-validation to tune hyperparameters

D. Remove outliers based on the full dataset distribution

E. Encode categorical variables with target encoding on the full dataset

Answers: A, C

Splitting first avoids leaking test information into scaling or imputation.

Why this answer

Splitting data before any preprocessing prevents leakage from test set into training. Cross-validation provides a more robust estimate of generalization than a single hold-out set.

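Here is a minimal sketch of k-fold index generation for option C; scikit-learn's `KFold` provides the same behaviour with shuffling options. It should run on the training split only, with any preprocessing refit inside each fold, so that the held-out test set from option A stays untouched.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_idx, val_idx) pairs; every index lands in exactly one validation fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in k_fold_indices(10, k=3):
    print("validate on", val_idx, "train on", len(train_idx), "samples")
```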

25

MCQ · easy

In the AI project lifecycle, which phase involves splitting the dataset into training, validation, and test sets while ensuring no data leakage?

A. Data preparation

B. Problem definition

C. Data acquisition

D. Model evaluation

Answer: A

Data preparation includes splitting and ensuring no leakage from future information.

Why this answer

Splitting the dataset into training, validation, and test sets is a core data preparation step that must be performed before any model training begins. This phase ensures that data leakage is prevented by keeping the test set completely isolated until final evaluation, which is critical for obtaining an unbiased estimate of model performance. In the AI project lifecycle, data preparation encompasses cleaning, transforming, and partitioning the data, making option A the correct phase.

Exam trap

The trap here is that candidates confuse 'data acquisition' (collecting data) with 'data preparation' (cleaning and splitting), leading them to incorrectly choose option C when the question specifically asks about splitting and leakage prevention.

How to eliminate wrong answers

Option B is wrong because problem definition focuses on identifying business objectives and success criteria, not on technical data partitioning or leakage prevention. Option C is wrong because data acquisition involves collecting raw data from sources (e.g., databases, APIs, sensors) and does not include the splitting or leakage-avoidance steps. Option D is wrong because model evaluation occurs after training and uses the already-split test set to assess performance; it does not involve creating the splits or addressing data leakage.

26

Multi-Select · medium

A team is deploying an AI microservice for real-time object detection in streaming video. Which TWO integration patterns are most appropriate? (Choose two.)

Select 2 answers

A. Streaming responses for real-time inference

B. Batch processing with nightly jobs

C. Synchronous request-response with long timeouts

D. Monolithic application deployment

E. AI microservice architecture

Answers: A, E

Streaming enables low-latency, continuous output for video frames.

Why this answer

AI microservices are deployed as independent services, and streaming responses are needed for real-time video processing.

27

Multi-Select · hard

A data scientist is preparing a dataset for a text classification model. To prevent train/test leakage, which THREE practices should they follow?

Select 3 answers

A. Shuffle the entire dataset before splitting to ensure randomness

B. Use time-based splitting for temporal data

C. Perform train/test split before any data cleaning or normalization

D. Apply feature scaling to the entire dataset before splitting

E. Remove duplicate samples and ensure that no text from the same document appears in both sets

Answers: B, C, E

For time-series or evolving data, splitting by time ensures the model is not trained on future information.

Why this answer

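Option B is correct because time-based splitting preserves the temporal order of data, which is critical for time-series or temporal text data. It prevents the model from learning from future information that would not be available at inference time, which would otherwise leak into the training set and artificially inflate model performance. Option C is correct because splitting before cleaning or normalization means any statistics those steps learn come from the training split alone. Option E is correct because duplicates, or passages from the same document, appearing in both sets let the model score well by memorizing rather than generalizing.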

Exam trap

Cisco often tests the misconception that shuffling the entire dataset is always safe, but for temporal data or when duplicates exist, shuffling can introduce leakage by mixing future and past samples or spreading identical text across train and test sets.

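Here is a sketch of a leakage check for option E: fingerprint each text after light normalization, then flag test items whose fingerprint also appears in training. It only catches exact duplicates after lower-casing and whitespace collapsing; near-duplicates need fuzzier matching. Keeping passages from the same document together calls for splitting by document ID, as with patient IDs in question 17. The sample texts are made up.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """SHA-256 of lower-cased, whitespace-collapsed text, so trivial variants collide."""
    canonical = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def leaked_test_items(train_texts: list[str], test_texts: list[str]) -> list[str]:
    """Return test texts whose normalized form also appears in the training set."""
    train_prints = {fingerprint(t) for t in train_texts}
    return [t for t in test_texts if fingerprint(t) in train_prints]


train_texts = ["Reset my password please", "Invoice total looks wrong"]
test_texts = ["reset my  password PLEASE ", "Billing question about VAT"]
print(leaked_test_items(train_texts, test_texts))
```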

28

Multi-Select · easy

A company wants to use AI to automatically detect anomalies in server log data. The data is time-series and labeled with 'normal' and 'anomaly' for the past year. Which TWO techniques are appropriate for this use case?

Select 2 answers

A. Train an image classification model (CNN) on screenshots of log graphs

B. Use a time-series anomaly detection model (e.g., Isolation Forest with sliding windows)

C. Train a supervised classification model (e.g., XGBoost) on extracted features with the labels

D. Use a code generation model to fix the anomalies automatically

E. Build a recommendation system based on user activity logs

Answers: B, C

Isolation Forest works on numerical features; sliding windows capture temporal patterns.

Why this answer

Option B is correct because Isolation Forest with sliding windows is a well-suited unsupervised technique for detecting anomalies in time-series data by isolating outliers in feature windows extracted from the log stream. Option C is correct because the company has labeled data ('normal' and 'anomaly'), enabling a supervised classification model like XGBoost to learn patterns from engineered features and predict anomalies accurately.

Exam trap

Cisco often tests the distinction between supervised and unsupervised techniques. Here both are valid: the labels make a supervised classifier such as XGBoost possible, while an unsupervised detector such as Isolation Forest can still flag anomaly types that are not represented in the labels. Candidates sometimes wrongly rule out one approach just because the other is available.

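Here is a minimal sketch of sliding-window feature extraction, the step that both options B and C rely on. The per-minute error counts are made up. The resulting rows could be fed to an unsupervised detector such as scikit-learn's `IsolationForest`, or, with a label per window, to a supervised classifier.

```python
import statistics


def window_features(series: list[float], size: int, step: int = 1) -> list[dict]:
    """Summarise each sliding window of a 1-D series as one feature row."""
    rows = []
    for start in range(0, len(series) - size + 1, step):
        window = series[start:start + size]
        rows.append({
            "start": start,
            "mean": statistics.mean(window),
            "std": statistics.pstdev(window),
            "max": max(window),
        })
    return rows


errors_per_minute = [2, 3, 2, 2, 40, 3, 2]  # made-up counts with one spike
for row in window_features(errors_per_minute, size=3):
    print(row)
```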

29

MCQ · medium

A company is fine-tuning an LLM for a domain-specific task using LoRA. They have limited GPU memory and need to reduce memory footprint without sacrificing fine-tuning quality. Which approach should they consider?

A. Use QLoRA with 4-bit quantized base model

B. Use a larger batch size to speed up training

C. Fine-tune all layers of the base model

D. Increase the rank of LoRA adapters

Answer: A

QLoRA quantizes the base model to 4-bit, reducing memory while keeping LoRA adapters for fine-tuning.

Why this answer

QLoRA combines 4-bit NormalFloat quantization of the base model with LoRA adapters, drastically reducing GPU memory usage while preserving fine-tuning quality through techniques like double quantization and paged optimizers. This directly addresses the constraint of limited GPU memory without sacrificing the model's ability to learn domain-specific tasks effectively.

Exam trap

Cisco often tests the misconception that increasing model capacity (e.g., higher LoRA rank or full fine-tuning) always improves quality, when in fact memory-constrained environments require efficient techniques like QLoRA that balance resource usage and performance.

How to eliminate wrong answers

Option B is wrong because increasing batch size increases GPU memory consumption, which is counterproductive when memory is limited. Option C is wrong because fine-tuning all layers of the base model requires full gradient storage and optimizer states for every parameter, dramatically increasing memory footprint and defeating the purpose of memory reduction. Option D is wrong because increasing the rank of LoRA adapters increases the number of trainable parameters and their associated optimizer states, raising memory usage without guaranteeing improved fine-tuning quality.

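A quick calculation shows why option D raises memory use. LoRA learns a low-rank update B·A for a frozen d_out × d_in weight, where A is r × d_in and B is d_out × r, so trainable parameters grow linearly with the rank r. The 4096 × 4096 matrix size is a hypothetical example.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * d_in + d_out * rank


full = 4096 * 4096  # trainable parameters if the whole matrix were fine-tuned
for rank in (8, 16, 64):
    n = lora_trainable_params(4096, 4096, rank)
    print(f"rank {rank}: {n:,} trainable params ({n / full:.2%} of the full matrix)")
```

Each trainable parameter also carries optimizer state (two extra values per parameter with Adam), so doubling the rank roughly doubles that part of the memory as well.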

30

MCQ · medium

An AI system uses a pre-trained image classification model to detect defects in manufacturing. The team wants to deploy the model in an edge device with limited GPU memory. Which technique should they consider first?

A. Train the model from scratch using a smaller dataset

B. Apply quantization to reduce model size

C. Use a larger model with more parameters for higher accuracy

D. Increase the batch size to improve throughput

Answer: B

Quantization reduces memory footprint and speeds up inference on edge devices.

Why this answer

Quantization reduces the precision of the model's weights and activations (e.g., from 32-bit floating point to 8-bit integer), which significantly shrinks the model size and memory footprint while often maintaining acceptable accuracy. This is the most direct and effective first step for deploying a pre-trained model on an edge device with limited GPU memory, as it requires no retraining and immediately addresses the memory constraint.

Exam trap

Cisco often tests the misconception that increasing batch size or model size improves performance in resource-constrained environments, when in fact these actions increase memory demand and are counterproductive for edge deployment.

How to eliminate wrong answers

Option A is wrong because training from scratch on a smaller dataset would likely result in poor accuracy due to insufficient data and would not leverage the benefits of transfer learning, making it an inefficient and risky first step. Option C is wrong because using a larger model with more parameters would increase memory consumption, directly contradicting the goal of deploying on a device with limited GPU memory. Option D is wrong because increasing the batch size increases memory usage per inference step, which would worsen the memory constraint rather than alleviating it.
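
A minimal sketch of symmetric per-tensor int8 quantization in plain Python makes the memory saving concrete. It is illustrative only: the weight values are made up, and production toolchains typically add per-channel scales, calibration data, and integer inference kernels, none of which appear here.

```python
import array

def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.003, 0.9, -0.55]  # made-up fp32 weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

fp32_bytes = array.array("f", weights).itemsize * len(weights)  # 4 bytes per weight
int8_bytes = array.array("b", q).itemsize * len(q)              # 1 byte per weight
print(q, fp32_bytes, int8_bytes)
```

Each weight is stored as one signed byte plus a single shared scale, so weight storage drops fourfold, and the rounding error per weight is at most half the scale.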

31

MCQmedium

A data science team is building a binary classifier to detect fraudulent transactions. The dataset has only 2% fraud cases. Which data preparation technique is MOST critical to address this imbalance?

A.Use one-hot encoding on categorical features

B.Remove outliers from the transaction amounts

C.Apply synthetic minority oversampling (SMOTE) to the training set

D.Normalize all numerical features to have zero mean and unit variance

AnswerC

SMOTE creates synthetic fraud examples, balancing the classes and improving recall on the minority class.

Why this answer

With only 2% fraud cases, the dataset is severely imbalanced, which can cause the classifier to be biased toward the majority class (non-fraud) and achieve high accuracy without learning to detect fraud. SMOTE (Synthetic Minority Oversampling Technique) addresses this by generating synthetic examples of the minority class (fraud) in the training set, balancing the class distribution and improving the model's ability to generalize to fraud cases. This is the most critical technique among the options because it directly tackles the class imbalance problem, which is the primary challenge in this scenario.

Exam trap

Cisco often tests the misconception that data scaling or encoding is the primary fix for imbalance, when in fact techniques like SMOTE that directly modify the class distribution are required.

How to eliminate wrong answers

Option A is wrong because one-hot encoding is a technique for converting categorical variables into a numerical format, but it does not address class imbalance; it is relevant for feature representation, not for balancing the dataset. Option B is wrong because removing outliers from transaction amounts may discard legitimate high-value transactions or even some fraud cases, potentially worsening the imbalance and losing valuable information; outlier removal is for data cleaning, not for handling class imbalance. Option D is wrong because normalizing numerical features to have zero mean and unit variance is a scaling technique that helps gradient descent converge faster and ensures features contribute equally, but it does not alter the class distribution or mitigate imbalance.
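
The core SMOTE idea, interpolating between a minority sample and one of its nearest minority-class neighbors, fits in a few lines. This is a from-scratch sketch with made-up fraud feature vectors; libraries such as imbalanced-learn provide a tested implementation. Apply it only to the training split, after the train/test split, so synthetic points never leak into evaluation.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the line segment between a random
    minority sample and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic

# Made-up fraud examples: (amount, transactions in the last hour)
fraud = [(120.0, 3.0), (135.0, 2.5), (128.0, 4.0), (150.0, 3.5)]
new_fraud = smote(fraud, n_new=6)
print(len(new_fraud))
```

Because SMOTE relies on distances, features are normally scaled first; in this toy data the amount column dominates the neighbor search.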

32

Multi-Selecteasy

An AI engineer is selecting a PEFT technique to fine-tune a large language model. Which TWO are examples of PEFT (Parameter-Efficient Fine-Tuning)?

Select 2 answers

A.Instruction tuning on a large dataset

B.LoRA

C.QLoRA

D.Gradient checkpointing

E.Full fine-tuning of all parameters

AnswersB, C

LoRA adds low-rank matrices to transformer layers, a standard PEFT method.

Why this answer

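LoRA and QLoRA are popular PEFT methods: they train a small number of additional adapter parameters while keeping the base model's weights frozen. Gradient checkpointing (D) saves memory by recomputing activations during the backward pass, but it is a training-memory technique rather than a PEFT method; full fine-tuning (E) updates every parameter; and instruction tuning (A) describes the training data, not how many parameters are updated.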

33

MCQeasy

Which stage of the AI project lifecycle involves splitting data into training, validation, and test sets?

A.Model evaluation

B.Data acquisition

C.Problem definition

D.Data preparation

AnswerD

Data preparation includes cleaning, transforming, and splitting data into subsets.

Why this answer

Data preparation includes splitting data, cleaning, normalization, and preventing leakage.
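
A minimal sketch of the split step, assuming the rows can be shuffled independently (no time ordering or grouping to respect); the 70/15/15 proportions are an illustrative choice.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once with a fixed seed, then slice into disjoint subsets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

To prevent leakage, fit any preprocessing statistics (for example, normalization means and variances) on the training subset only, then apply them unchanged to the validation and test subsets.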

34

MCQmedium

A developer is building an AI agent that needs to call external APIs (e.g., get weather, send email) based on user requests. Which pattern is BEST for enabling the agent to autonomously decide when to call these APIs?

A.Hard-code the API calls in the agent's logic

B.Use a chain-of-thought prompt to reason about the steps

C.Implement function calling in the LLM to generate structured API calls

D.Use a planning agent with a predefined workflow

AnswerC

Function calling allows the LLM to produce a JSON object specifying which function to call and with what arguments, enabling the agent to decide autonomously.

Why this answer

Function calling allows the LLM to output structured commands that invoke specific API functions, enabling autonomous tool use in a controlled manner.
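
The sketch below shows the application side of function calling: the model returns a structured tool call, and the application checks it against an allow-list and executes it. The JSON shape {"name": ..., "arguments": {...}} and the tool names are hypothetical; each provider's API defines its own tool-call format, and the tools here are stubs.

```python
import json

# Hypothetical tools the agent is allowed to use (stubs, no real API calls).
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def send_email(to: str, subject: str) -> str:
    return f"Email '{subject}' queued for {to}"

TOOLS = {"get_weather": get_weather, "send_email": send_email}

def dispatch(tool_call_json: str) -> str:
    """Execute a model-produced tool call against the allow-listed TOOLS."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Model requested unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# Pretend the LLM returned this after seeing the tool definitions and the request.
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
print(dispatch(model_output))  # Sunny in Oslo
```

The model decides which tool to call and with what arguments; the application keeps control of what actually runs.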

35

MCQhard

A developer is implementing a RAG system and needs to chunk large legal documents. The documents contain nested clauses and cross-references that should not be split across chunks. Which chunking strategy is MOST suitable?

A.Random chunking with varying sizes

B.Hierarchical chunking combining small and large chunks

C.Semantic chunking based on sentence and paragraph boundaries

D.Fixed-size chunking with 256 tokens

AnswerC

Semantic chunking respects natural language boundaries, keeping related clauses together.

Why this answer

Semantic chunking uses sentence boundaries or natural breakpoints, preserving logical units.

36

MCQeasy

An AI engineer needs to select a similarity measure for comparing dense embedding vectors in a vector store for document retrieval. Which two measures are commonly used?

A.Pearson correlation and Spearman rank

B.Cosine similarity and Jaccard similarity

C.Dot product and cosine similarity

D.Euclidean distance and Manhattan distance

AnswerC

Both are commonly used for dense vectors; cosine similarity is dot product after normalization.

Why this answer

Option C is correct because dot product and cosine similarity are the two most commonly used measures for comparing dense embedding vectors in vector stores. Cosine similarity computes the cosine of the angle between vectors, making it length-invariant, while dot product is efficient and directly related to cosine similarity when vectors are normalized. Both are widely supported in vector databases like FAISS and Pinecone for document retrieval tasks.

Exam trap

Cisco often tests the distinction between similarity measures and distance metrics, and the trap here is that candidates confuse Euclidean distance (a distance metric) with a similarity measure, or incorrectly pair Jaccard similarity (for sets) with dense vectors.

How to eliminate wrong answers

Option A is wrong because Pearson correlation and Spearman rank correlation measure linear and monotonic relationships between variables, respectively; they are statistical measures, not standard similarity functions for comparing dense embedding vectors in vector stores. Option B is wrong because, although cosine similarity fits, Jaccard similarity compares sets or binary vectors and is not suited to dense embeddings. Option D is wrong because Euclidean and Manhattan distance are distance metrics rather than similarity measures; Euclidean distance is supported by many vector stores (and ranks normalized vectors the same way cosine similarity does), but this pair is a less common answer than dot product and cosine similarity for dense-embedding retrieval.
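
To illustrate how a vector store uses cosine similarity for retrieval, here is a brute-force top-k search over a tiny in-memory store. The chunk IDs and three-dimensional vectors are made up; real embeddings have hundreds of dimensions, and vector databases use approximate nearest-neighbor indexes instead of scoring every vector.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query, store, k=2):
    """Rank stored (chunk_id, vector) pairs by cosine similarity to the query."""
    scored = [(cosine(query, vec), chunk_id) for chunk_id, vec in store]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]

store = [
    ("refund-policy", [0.9, 0.1, 0.0]),
    ("shipping-times", [0.1, 0.8, 0.2]),
    ("warranty", [0.7, 0.0, 0.3]),
]
print(top_k([1.0, 0.0, 0.1], store))
```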

37

MCQhard

A team is evaluating an LLM-based code generation assistant. They want to measure the quality of generated code for correctness, security, and efficiency. Which evaluation framework is BEST suited for this task?

A.Human evaluation by a panel of experienced developers

B.Pass@k metric using unit tests from benchmarks like HumanEval

C.Perplexity of the model on a code corpus

D.BLEU score comparing generated code to reference code

AnswerB

Pass@k estimates the probability that at least one of k generated samples passes all of a problem's unit tests, which directly assesses functional correctness.

Why this answer

HumanEval and similar frameworks (e.g., MBPP) use unit tests to automatically assess functional correctness of generated code, which is the most objective measure for code generation tasks.
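
Pass@k is usually computed with the unbiased estimator published alongside HumanEval: generate n samples per problem (n at least k), count the c samples that pass all unit tests, and compute pass@k = 1 - C(n - c, k) / C(n, k), where C is the binomial coefficient. A minimal sketch for a single problem follows; the sample counts are made up.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass all tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# One problem: 10 samples generated, 3 pass the unit tests.
print(pass_at_k(10, 3, 1))
print(pass_at_k(10, 3, 5))
```

The benchmark score is the mean of this value over all problems.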

38

Multi-Selectmedium

A data science team is preparing a dataset for a binary classification model to detect fraudulent transactions. The dataset has 99% legitimate and 1% fraudulent examples. Which TWO techniques should the team apply to improve model performance on the minority class?

Select 2 answers

A.Use class weights in the loss function

B.Oversample the minority class using SMOTE

C.Undersample the majority class randomly

D.Apply data normalisation (z-score) to all features

E.Randomly shuffle the dataset to prevent train/test leakage

AnswersA, B

Class weights penalise misclassifications of the minority class more heavily.

Why this answer

Oversampling the minority class (e.g., with SMOTE) and weighting the loss by class are standard ways to handle imbalanced data. Random undersampling of the majority class (C) can also help, but at a 99:1 ratio it discards most of the legitimate examples. Shuffling to prevent train/test leakage (E) addresses a different problem, and z-score normalisation (D) changes feature scales, not the class balance.
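
As a sketch of the class-weight option, the heuristic below weights each class by n_samples / (n_classes * class_count), the same formula scikit-learn uses for class_weight="balanced", and applies the weights in a binary cross-entropy loss. The labels and predicted probabilities are made up.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

def weighted_bce(y_true, p_pred, weights):
    """Mean binary cross-entropy with each example scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(y_true)

labels = [0] * 990 + [1] * 10  # 99% legitimate (0), 1% fraud (1)
weights = balanced_class_weights(labels)
print(weights)  # class 1 gets 50.0, class 0 about 0.505

# Missing a fraud case (p=0.1 for a true fraud) now costs about 99x more
# than an equally confident false alarm on a legitimate transaction.
print(weighted_bce([1], [0.1], weights), weighted_bce([0], [0.9], weights))
```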

39

MCQhard

During testing a chatbot, the QA team observes that the bot sometimes responds with harmful content when given adversarial prompts. Which type of testing should be prioritised to catch these edge cases?

A.Red-teaming and adversarial testing

B.Unit tests for data pipeline functions

C.Regression testing on previously fixed bugs

D.Integration tests for API connectivity

AnswerA

Red-teaming systematically probes the model with harmful or tricky inputs to expose weaknesses.

Why this answer

Red-teaming and adversarial testing are specifically designed to probe an AI system for vulnerabilities, including generating harmful or unsafe outputs from adversarial prompts. This approach simulates real-world attacks to uncover edge cases that standard functional tests miss, making it the correct priority for catching harmful content in a chatbot.

Exam trap

Cisco often tests the distinction between functional testing (unit, regression, integration) and security-focused testing (red-teaming), trapping candidates who confuse general software testing with AI-specific adversarial evaluation.

How to eliminate wrong answers

Option B is wrong because unit tests for data pipeline functions verify data integrity and transformation logic, not the chatbot's response to malicious inputs. Option C is wrong because regression testing ensures previously fixed bugs remain resolved, but it does not proactively discover new adversarial vulnerabilities. Option D is wrong because integration tests for API connectivity check whether system components communicate correctly, not whether the chatbot produces harmful content under attack.

40

MCQhard

An AI practitioner is fine-tuning a large language model for a domain-specific task using a small labeled dataset (500 examples). They have limited GPU memory. Which technique is MOST suitable?

A.Full fine-tuning of all model parameters

B.QLoRA (Quantized Low-Rank Adaptation)

C.Instruction tuning with the full dataset

D.Retrieval-Augmented Generation (RAG) without fine-tuning

AnswerB

QLoRA quantizes the base model to 4-bit and trains low-rank adapters on top, enabling fine-tuning with minimal memory and, in reported results, little loss in quality.

Why this answer

QLoRA (Quantized Low-Rank Adaptation) is the most suitable technique because it combines 4-bit quantization of the base model with low-rank adapter modules, drastically reducing GPU memory usage while still allowing fine-tuning on a small dataset. Because the pre-trained weights stay frozen, the approach preserves the model's general knowledge and reduces the risk of catastrophic forgetting and overfitting, both important when only 500 labeled examples are available.

Exam trap

Cisco often tests the misconception that 'fine-tuning always means updating all parameters' or that 'RAG alone can replace fine-tuning for domain adaptation,' leading candidates to overlook memory-efficient adapter methods like QLoRA.

How to eliminate wrong answers

Option A is wrong because full fine-tuning updates all model parameters, requiring substantial GPU memory (often >24GB for a 7B model) and risks overfitting on a tiny dataset of 500 examples. Option C is wrong because instruction tuning typically requires a large, diverse dataset of instruction-response pairs (thousands to millions) and does not inherently reduce memory consumption; it is a data-formatting strategy, not a memory-saving technique. Option D is wrong because RAG without fine-tuning does not adapt the model's internal weights to the domain-specific task, so the model cannot learn the specialized patterns or terminology from the small labeled dataset.

41

MCQeasy

An AI system must extract text from scanned invoices and output structured fields (invoice number, date, total amount). Which type of AI application is this?

A.Chatbot/virtual assistant

B.Code generation

C.Image classification/object detection

D.Document intelligence

AnswerD

Document intelligence extracts structured information from documents using OCR and NLP.

Why this answer

Document intelligence (D) is the correct answer because it specifically refers to AI systems that extract, classify, and structure data from documents like invoices, receipts, and forms. This application uses optical character recognition (OCR) combined with natural language processing (NLP) to identify and output structured fields such as invoice number, date, and total amount, which is exactly what the question describes.

Exam trap

Cisco often tests the distinction between general image analysis (object detection) and specialized document processing (document intelligence), so candidates may mistakenly choose image classification because they think scanning an invoice is just 'looking at a picture,' but the key is that the system extracts structured text fields, not just identifies objects.

How to eliminate wrong answers

Option A is wrong because a chatbot/virtual assistant is designed for conversational interactions (e.g., answering questions or performing tasks via dialogue), not for extracting structured data from scanned documents. Option B is wrong because code generation focuses on producing programming code from natural language or other inputs, not on processing scanned invoices. Option C is wrong because image classification/object detection identifies objects or categories within an image (e.g., 'this is a cat' or 'there is a car'), but does not extract specific text fields like invoice numbers or amounts from documents.

42

MCQmedium

An AI system for detecting anomalies in manufacturing sensor data uses a model trained on normal operation data only. During monitoring, the model flags many false positives. Which adjustment is MOST likely to reduce false positives?

A.Switch from an autoencoder to a one-class SVM

B.Add synthetic anomalies to the training set and retrain as a supervised classifier

C.Adjust the anomaly detection threshold to be less sensitive (e.g., require a higher reconstruction error)

D.Increase the size of the training dataset with more normal operation data

AnswerC

Raising the threshold means only more extreme deviations are flagged, reducing false positives.

Why this answer

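Raising the anomaly detection threshold (i.e., making the detector less sensitive) reduces false positives, at the cost of possibly missing subtle anomalies (more false negatives). Retraining with labeled anomalies (B) is ideal but often infeasible, because real anomalies are rare and synthetic ones may not resemble them. Switching model type (A) or adding more normal data (D) may help indirectly, but neither changes the decision threshold that directly controls the false-positive rate.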
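
A common way to set the threshold is a percentile of the reconstruction errors on held-out normal data. The sketch below uses made-up error values and compares the 95th and 99.5th percentiles (both illustrative choices); raising the percentile flags fewer readings, which trades false positives for possible missed anomalies.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def flag_anomalies(errors, threshold):
    """Flag readings whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]

# Made-up reconstruction errors on held-out normal data: 0.01, 0.02, ..., 2.00
normal_errors = [0.01 * i for i in range(1, 201)]
live_errors = [0.5, 1.95, 1.97, 2.5, 0.2]  # made-up monitoring readings

for q in (95, 99.5):
    threshold = percentile(normal_errors, q)
    print(q, round(threshold, 2), sum(flag_anomalies(live_errors, threshold)))
```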

43

MCQhard

A developer is implementing a RAG system for legal document review. The documents are long (50-100 pages) with dense sections. They need to chunk the documents in a way that preserves semantic coherence while keeping chunks small enough for effective retrieval. Which chunking strategy is MOST appropriate?

A.Hierarchical chunking with parent-child relationships

B.Fixed-size chunking with 512 tokens and no overlap

C.Semantic chunking based on paragraph and section boundaries

D.Chunking by a fixed number of sentences without considering content

AnswerC

Semantic chunking preserves the natural units of legal text, maintaining coherence and improving retrieval quality.

Why this answer

Semantic chunking splits text at natural boundaries (e.g., paragraphs, sections) while ensuring each chunk is coherent, which is crucial for legal documents where meaning can span multiple sentences.
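
A minimal sketch of boundary-respecting chunking: whole paragraphs are packed into chunks up to a word budget, so no paragraph (here, no clause) is split. This is a structural approximation; some semantic chunkers instead compare sentence embeddings to find topic shifts. The 16-word budget in the example is deliberately tiny so the split is visible.

```python
def chunk_by_paragraphs(text: str, max_words: int = 200) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_words words.
    A paragraph longer than max_words becomes its own oversized chunk
    rather than being cut mid-sentence."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, current_len = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        if current and current_len + n > max_words:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

contract = (
    "Clause 1. The tenant shall pay rent.\n\n"
    "Clause 2. The landlord shall maintain the premises.\n\n"
    "Clause 3. Either party may terminate with notice."
)
for chunk in chunk_by_paragraphs(contract, max_words=16):
    print(repr(chunk))
```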

44

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Fine-tune a base LLM on the policy documents monthly

B.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

C.Train a custom model from scratch on the policy documents each month

D.Use a larger foundation model with a longer context window and paste all documents into each prompt

AnswerB

RAG retrieves relevant document chunks at query time, so once the updated documents are re-indexed, the chatbot answers from the latest versions without any model retraining.

Why this answer

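RAG (Retrieval-Augmented Generation) lets the LLM retrieve relevant document sections at inference time, so knowledge stays current without retraining; a monthly update only requires re-indexing the changed documents. Fine-tuning (A) and training from scratch (C) require expensive retraining for every update, and pasting all documents into each prompt (D) is costly per query and may exceed the context window as the document set grows.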

45

MCQhard

A company is building a recommendation system for an e-commerce site. They have historical user-item interaction data. Which approach is most appropriate?

A.Use a large language model to generate random product suggestions

B.Use a pre-trained image classification model to recommend visually similar products

C.Deploy a rule-based system that always recommends best-selling items

D.Train a collaborative filtering model on user-item interactions

AnswerD

Collaborative filtering leverages interaction data to find patterns and make personalized recommendations.

Why this answer

Collaborative filtering uses user-item interactions to recommend items based on patterns from similar users or items, without requiring content features.
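
A minimal item-based collaborative filtering sketch: each item is represented by the set of users who interacted with it, items are compared by cosine similarity, and a user's unseen items are scored by their similarity to items the user already has. The users and products are hypothetical.

```python
import math

# Hypothetical implicit-feedback data: user -> set of purchased items.
interactions = {
    "alice": {"laptop", "mouse", "keyboard"},
    "bob": {"laptop", "mouse"},
    "carol": {"laptop", "monitor"},
    "dave": {"mouse", "keyboard"},
}

def item_similarity(a, b):
    """Cosine similarity between two items' binary user vectors."""
    users_a = {u for u, items in interactions.items() if a in items}
    users_b = {u for u, items in interactions.items() if b in items}
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / math.sqrt(len(users_a) * len(users_b))

def recommend(user, k=2):
    """Score items the user has not interacted with by similarity to owned items."""
    owned = interactions[user]
    candidates = {i for items in interactions.values() for i in items} - owned
    scores = {c: sum(item_similarity(c, o) for o in owned) for c in candidates}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("bob"))
```

No product descriptions or images are needed; the recommendations come entirely from co-occurrence in the interaction history.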

46

MCQhard

A team is deploying a fine-tuned LLM for code generation. They need to ensure the model output is always valid JSON. Which prompt engineering technique should they use?

A.Chain-of-thought prompting

B.Few-shot examples of valid JSON outputs

C.Temperature setting to 0

D.Using a larger model variant

AnswerB

Including several examples of valid JSON in the prompt, combined with a system instruction, guides the model toward JSON output and makes it much more consistent, although prompting alone cannot guarantee validity.

Why this answer

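Of the listed options, few-shot examples are the only prompt engineering technique that directly shows the model the required output format. Chain-of-thought prompting (A) adds free-form reasoning text, a temperature of 0 (C) makes output more deterministic but not necessarily valid JSON, and a larger model (D) is not a prompting technique. Because prompting cannot guarantee valid JSON on its own, production systems typically also enable a structured output (JSON mode) feature where the API offers one and validate the output programmatically.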
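
Because prompting alone cannot guarantee valid JSON, a defensive application parses the output and retries on failure. In this sketch, call_model is a hypothetical stand-in for whatever client function sends the few-shot prompt to the model and returns its raw text.

```python
import json

def try_parse_json(raw: str):
    """Return (True, value) if raw is valid JSON, else (False, None)."""
    try:
        return True, json.loads(raw)
    except json.JSONDecodeError:
        return False, None

def generate_json(call_model, prompt: str, max_attempts: int = 3):
    """Call the model until it returns valid JSON or attempts run out.
    call_model is a hypothetical client function: prompt -> raw text."""
    for _ in range(max_attempts):
        ok, value = try_parse_json(call_model(prompt))
        if ok:
            return value
    raise ValueError(f"no valid JSON after {max_attempts} attempts")

# Fake model: the first reply wraps broken JSON in chatter, the second is clean.
replies = iter([
    'Sure! {"language": "python"',
    '{"language": "python", "code": "print(1)"}',
])
result = generate_json(lambda prompt: next(replies), "Write hello world as JSON.")
print(result)
```

Returning a (success, value) pair rather than None on failure keeps a legitimate JSON null distinguishable from a parse error.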

47

MCQhard

A data scientist is preparing a dataset for a binary classification model to detect fraudulent transactions. The dataset contains 1% fraudulent and 99% legitimate transactions. The goal is to maximize recall for the fraud class while maintaining a precision above 0.5. Which data preparation strategy is MOST effective?

A.Apply random undersampling of the majority class until the dataset is balanced

B.Remove all duplicate transactions from the dataset

C.Use the raw dataset without any resampling, relying on class weights during training

D.Apply SMOTE (Synthetic Minority Oversampling Technique) to generate synthetic fraud examples

AnswerD

SMOTE creates synthetic minority samples, balancing the classes without discarding majority data; this typically improves recall while keeping precision at a reasonable level.

Why this answer

Handling heavily imbalanced data typically requires resampling or reweighting. To maximize recall under a moderate precision constraint, oversampling the minority class (e.g., SMOTE) is effective. Undersampling to a balanced dataset (A) would discard roughly 99% of the legitimate transactions, and training on the raw data (C) tends to bias the model toward the majority class unless the class weights are tuned carefully.

48

Multi-Selectmedium

A data scientist is fine-tuning a large language model for a domain-specific task using QLoRA. Which TWO statements correctly describe QLoRA's advantages?

Select 2 answers

A.It enables fine-tuning on consumer-grade GPUs by reducing memory requirements

B.It reduces memory usage by quantizing the base model to 4-bit precision

C.It requires more training data than full fine-tuning to achieve comparable accuracy

D.It trains the full model parameters with low precision

E.It increases inference speed compared to the base model

AnswersA, B

Lower memory allows fine-tuning on smaller GPUs, such as consumer-grade hardware.

Why this answer

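QLoRA quantizes the frozen base model to 4-bit precision and trains only small low-rank adapters, which is what cuts memory enough for consumer-grade GPUs (A, B). It does not train the full model parameters (D), it does not require more data than full fine-tuning (C), and it does not speed up inference (E), because the full base model still runs at inference time.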

49

MCQmedium

A developer is building an AI agent that needs to call external APIs to complete user requests. The agent must decide which API to call based on the user's natural language input. Which technique should the developer use to enable the agent to invoke APIs?

A.Fine-tuning the LLM on API documentation

B.Chain-of-thought reasoning

C.Few-shot prompting with examples of API calls

D.Function calling

AnswerD

Function calling lets the model output a structured JSON object naming the API and its arguments, which the application then executes.

Why this answer

Function calling is a technique where the LLM outputs a structured request to call a predefined function/API, enabling tool use.

50

MCQhard

A team fine-tunes a 7B parameter LLM using LoRA on a custom instruction dataset. After training, they observe that the model's outputs are only marginally different from the base model. Which is the MOST likely cause?

A.The dataset contained too many examples, overfitting the adapter

B.The base model was too small to benefit from fine-tuning

C.The LoRA rank was set too low (e.g., r=1), limiting the adapter's capacity to learn the task

D.The learning rate was too high, causing the model to diverge

AnswerC

Low rank reduces the number of trainable parameters; the adapter may not have enough capacity to alter behavior significantly.

Why this answer

LoRA has a rank hyperparameter that controls adapter expressiveness. If the rank is too low, the adapter cannot capture the desired task. Other hyperparameters like learning rate affect convergence but rank directly impacts capacity.
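
A quick calculation shows why a very low rank limits the adapter. LoRA represents the update to a d_out x d_in weight matrix as B @ A, with B of shape d_out x r and A of shape r x d_in, so it adds r * (d_in + d_out) trainable parameters per adapted matrix. The 4096 x 4096 projection size is an assumption for illustration.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix:
    B is d_out x r and A is r x d_in, so the update B @ A has rank <= r."""
    return r * (d_in + d_out)

d = 4096  # hidden size assumed for illustration
full = d * d
for r in (1, 8, 16, 64):
    added = lora_params(d, d, r)
    print(f"r={r:>2}: {added:>9,} trainable params ({added / full:.3%} of the full matrix)")
```

At r=1 the update to each matrix is a single outer product, which can only shift the layer's behavior along one direction.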

51

MCQhard

A team is deploying a multi-modal AI model that processes both text and images. They need to ensure that inference requests are handled quickly even during traffic spikes. Which integration pattern is BEST suited for this use case?

A.Deploy a synchronous REST API with auto-scaling

B.Stream responses directly from the model to the client

C.Pre-compute all possible outputs and cache them

D.Use an event-driven architecture with a queue and worker instances

AnswerD

Queues decouple request submission from processing, enabling resilient scaling and handling spikes.

Why this answer

Asynchronous processing with a message queue allows requests to be buffered and processed without blocking the user, scaling out workers as needed.
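
The queue-and-worker pattern can be sketched in-process with asyncio; the fake_inference coroutine is a hypothetical stand-in for a model call. In production the queue would typically be a message broker and the workers separate, independently scaled processes, but the control flow is the same.

```python
import asyncio

async def fake_inference(request_id: int) -> str:
    await asyncio.sleep(0.01)  # stand-in for a multi-modal model call
    return f"result-{request_id}"

async def worker(queue: asyncio.Queue, results: dict):
    """Pull requests off the queue until cancelled."""
    while True:
        request_id = await queue.get()
        try:
            results[request_id] = await fake_inference(request_id)
        finally:
            queue.task_done()

async def main(n_requests=20, n_workers=4):
    queue: asyncio.Queue = asyncio.Queue()
    results = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    for i in range(n_requests):  # a burst of incoming requests is buffered
        queue.put_nowait(i)
    await queue.join()  # wait until every request has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results

results = asyncio.run(main())
print(len(results))
```

Absorbing a spike then means adding workers, not rejecting requests.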

52

MCQhard

A company wants to fine-tune a 70B-parameter LLM for a specialized domain but has limited GPU memory (e.g., 24 GB VRAM). Which technique allows fine-tuning with minimal memory footprint?

A.Instruction tuning with a smaller dataset

B.QLoRA (Quantized Low-Rank Adaptation)

C.LoRA (Low-Rank Adaptation) on the base model

D.Full fine-tuning with gradient checkpointing

AnswerB

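QLoRA quantizes the base model to 4-bit and trains LoRA adapters, giving the smallest memory footprint of the options listed. Even so, a 70B model's 4-bit weights alone occupy about 35 GB, so a single 24 GB GPU would still need CPU offloading or additional GPUs.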

Why this answer

QLoRA (Quantized Low-Rank Adaptation) uses 4-bit quantization and low-rank adapters, reducing memory requirements dramatically while preserving fine-tuning quality.
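
The arithmetic behind the weight footprint is simple. The sketch below counts weights only and ignores activations, adapter parameters, optimizer states, and quantization constants, so real usage is higher.

```python
def weight_memory_gb(n_params: float, bits: int) -> float:
    """Memory for the model weights alone, in GB (10**9 bytes)."""
    return n_params * bits / 8 / 1e9

for bits in (16, 8, 4):
    print(f"70B weights at {bits:>2}-bit: ~{weight_memory_gb(70e9, bits):.0f} GB")
```

Quantizing from 16-bit to 4-bit cuts the weight memory fourfold, which is the main reason QLoRA fits models that LoRA on a 16-bit base cannot.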

53

MCQmedium

Which chunking strategy for RAG is MOST appropriate when documents have a natural hierarchical structure (e.g., sections, subsections)?

A.Hierarchical chunking that preserves document structure

B.Fixed-size chunking with no overlap

C.Semantic chunking based on sentence boundaries

D.Random chunking with varying sizes

AnswerA

Hierarchical chunking maintains the document's section and subsection organization, improving retrieval relevance.

Why this answer

Hierarchical chunking respects the document structure, preserving context and enabling retrieval at different levels (e.g., section or paragraph).
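
A minimal sketch of hierarchical chunking, assuming the documents mark sections with Markdown-style # and ## headings (a hypothetical convention; real documents may need a format-specific parser). Each leaf chunk keeps its full heading path, which a parent-child retriever can use to return the enclosing section alongside a matched chunk.

```python
def hierarchical_chunks(lines):
    """Split a document whose headings use '#' / '##' markers into leaf chunks,
    each tagged with its full heading path (the parent context)."""
    chunks, path, body = [], [], []

    def flush():
        if body:
            chunks.append({"path": " > ".join(path), "text": " ".join(body)})
            body.clear()

    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#"):
            flush()
            level = len(stripped) - len(stripped.lstrip("#"))
            title = stripped.lstrip("#").strip()
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(title)
        elif stripped:
            body.append(stripped)
    flush()
    return chunks

doc = [
    "# Lease Agreement", "Preamble text.",
    "## 1. Rent", "Rent is due monthly.",
    "## 2. Termination", "Either party may terminate.",
    "# Schedule A", "Inventory list.",
]
for c in hierarchical_chunks(doc):
    print(c["path"], "|", c["text"])
```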

54

Multi-Selecthard

A company is deploying a generative AI application that produces structured JSON output for downstream processing. They want to ensure the output is consistently valid JSON and matches a specific schema. Which THREE techniques should they use? (Select THREE)

Select 3 answers

A.Fine-tune the model on a dataset of JSON outputs

B.Increase the temperature parameter to 1.5

C.Provide few-shot examples of the desired output

D.Include a system prompt specifying the expected JSON schema

E.Use JSON mode (structured output) in the API call

AnswersC, D, E

Correct: few-shot examples help the model understand the exact schema.

Why this answer

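JSON mode (structured output) makes the API return syntactically valid JSON, and where the API can also enforce a supplied schema, the output matches that schema as well; the system prompt states the expected schema; and few-shot examples demonstrate it. Fine-tuning (A) can also help but is costlier and unnecessary when these techniques suffice, while a temperature of 1.5 (B) increases randomness and makes malformed output more likely.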
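
Even with JSON mode, it is worth validating the parsed output against the schema before passing it downstream. The sketch below checks required fields and types with the standard library only; the INVOICE_SCHEMA fields are hypothetical, and a dedicated validator such as the jsonschema package handles full JSON Schema documents.

```python
import json

# Hypothetical schema for the downstream consumer: field name -> expected Python type.
INVOICE_SCHEMA = {"invoice_number": str, "date": str, "total": float}

def validate(raw: str, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the output is usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = [f"missing field: {k}" for k in schema if k not in data]
    problems += [
        f"wrong type for {k}: expected {t.__name__}"
        for k, t in schema.items()
        if k in data and not isinstance(data[k], t)
    ]
    return problems

print(validate('{"invoice_number": "INV-7", "date": "2024-05-01", "total": 99.5}', INVOICE_SCHEMA))
print(validate('{"invoice_number": "INV-7", "total": "99.5"}', INVOICE_SCHEMA))
```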

55

MCQmedium

A developer is implementing a RAG system and needs to choose a similarity metric for retrieving document chunks. The embedding model produces normalized vectors. Which metric is computationally efficient and equivalent to cosine similarity for normalized vectors?

A.Euclidean distance

B.Hamming distance

C.Manhattan distance

D.Dot product

AnswerD

When vectors are unit normalized, dot product equals cosine similarity, and it is computationally efficient.

Why this answer

For normalized vectors, dot product is equivalent to cosine similarity and is faster to compute.
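
A short numerical check of the equivalence, using made-up vectors: after normalization, the dot product equals cosine similarity, and the squared Euclidean distance equals 2 - 2 * cosine, so all three rank neighbors identically.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = normalize([3.0, 4.0, 0.0]), normalize([1.0, 2.0, 2.0])
print(dot(a, b), cosine(a, b))  # equal up to floating-point rounding
# For unit vectors, squared Euclidean distance is 2 - 2 * cosine.
print(math.dist(a, b) ** 2, 2 - 2 * dot(a, b))
```

The dot product skips the two norm computations, which is why vector stores prefer it when embeddings are already normalized.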

56

MCQhard

During testing of a customer service chatbot, the team notices that the model sometimes generates plausible-sounding but factually incorrect answers about company policies. Which evaluation approach is BEST to systematically detect and quantify this issue?

A.Regression testing comparing old and new model outputs

B.Unit tests on the data pipeline

C.Integration tests for API calls

D.Evaluation framework with faithfulness and answer relevancy metrics on a held-out test set

AnswerD

An evaluation framework using metrics like faithfulness (whether the answer is supported by the source) and answer relevancy can detect hallucination and quantify model performance.

Why this answer

A comprehensive LLM output evaluation framework using a ground-truth dataset of question-answer pairs and metrics like faithfulness and answer relevancy can detect hallucination systematically.

57

MCQhard

A team is deploying a generative AI model for a real-time customer-facing application. They need to balance cost and latency. Which deployment strategy is MOST suitable?

A.Monolithic API with serverless functions

B.Edge deployment on user devices

C.Batch processing with synchronous requests

D.AI microservices with streaming responses and async processing queues

AnswerD

Microservices with streaming and async queues reduce perceived latency and handle variable load efficiently.

Why this answer

Option D is correct because AI microservices with streaming responses and async processing queues decouple inference from the request lifecycle, allowing the system to handle variable loads efficiently while maintaining low latency for real-time interactions. This architecture balances cost by scaling only the necessary components (e.g., GPU-backed inference services) and uses streaming (e.g., Server-Sent Events or WebSockets) to deliver partial results, reducing perceived latency for the customer.

Exam trap

Cisco often tests the misconception that serverless functions (Option A) are always the cheapest and fastest option, but they ignore cold-start latency and the overhead of monolithic orchestration in real-time AI workloads.

How to eliminate wrong answers

Option A is wrong because a monolithic API with serverless functions introduces cold-start latency and tight coupling, which is unsuitable for real-time customer-facing applications where consistent sub-second response times are critical. Option B is wrong because edge deployment on user devices requires significant on-device compute resources, model compression, and frequent updates, which increases deployment complexity and cost, and may not be feasible for large generative models. Option C is wrong because batch processing with synchronous requests is designed for high-throughput, non-real-time workloads (e.g., nightly report generation) and would force users to wait for batch completion, violating the real-time requirement.

58

Multi-Selectmedium

A team is setting up a test suite for an AI system that includes a data pipeline, an LLM API call, and an output evaluation step. Which TWO types of tests should they prioritize to ensure the system's reliability?

Select 2 answers

A.End-to-end tests simulating full user workflows

B.Unit tests for data pipeline components

C.Regression tests comparing output to previous versions

D.Load tests to measure performance under high traffic

E.Integration tests for API calls to the LLM

AnswersB, E

Unit tests validate individual data transformations, catching bugs early.

Why this answer

Unit tests for data pipelines catch data transformation errors early, and integration tests for API calls verify that the LLM endpoint responds correctly. E2E tests and regression tests are also important but not the first priority for reliability.

59

Multi-Selecthard

An organization wants to fine-tune a 7B parameter LLM for a specialized legal document summarization task. They have a small labeled dataset (500 examples) and limited GPU budget. Which THREE techniques should they consider? (Choose three.)

Select 3 answers

A.Use LoRA (Low-Rank Adaptation)

B.Create an instruction-tuning dataset with input-summary pairs

C.Train a new model from scratch on legal text

D.Full fine-tuning of all model parameters

E.Use QLoRA with 4-bit quantization

AnswersA, B, E

LoRA freezes base weights and trains small adapters, drastically reducing memory.

Why this answer

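PEFT methods like LoRA and QLoRA are designed for efficient fine-tuning with limited resources, and an instruction-tuning dataset of input-summary pairs frames the 500 examples in the task's format. Full fine-tuning (D) is too expensive for a limited GPU budget, and training from scratch (C) is both far more expensive and unworkable with only 500 examples.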

60

MCQmedium

A machine learning engineer is deploying a real-time anomaly detection system for manufacturing sensor data. The system must process thousands of readings per second with minimal latency. Which deployment architecture is BEST suited?

A.Batch processing using Apache Spark jobs triggered hourly

B.Serverless functions deployed on a CDN

C.A monolithic web application with a relational database

D.AI microservices with an async processing queue and streaming responses

AnswerD

Microservices with async queues and streaming allow scalable, low-latency processing of high-throughput data.

Why this answer

AI microservices with async processing queues and streaming responses can handle high throughput and low latency for real-time data.

61

Multi-Selectmedium

A company is choosing between fine-tuning and RAG for a legal document assistant. Which TWO factors would MOST strongly favor RAG over fine-tuning?

Select 2 answers

A.The legal documents are updated frequently (weekly)

B.The model needs to understand complex legal terminology

C.The queries require deep reasoning across multiple documents

D.The assistant must cite specific sources for its answers

E.The company has limited compute budget for training

AnswersA, D

RAG retrieves current documents at inference time, avoiding costly retraining cycles.

Why this answer

RAG retrieves from a changing document base without retraining. Each retrieved passage keeps a reference to its source document, so answers can cite their sources. Both properties are critical in regulated domains such as law.
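To illustrate why RAG makes updates and citations cheap, the sketch below keeps a source ID attached to each chunk and passes the IDs into the prompt. Word-overlap (Jaccard) scoring stands in for embedding search, and the corpus, IDs, and prompt wording are made up for illustration. Updating a policy means editing `corpus`, not retraining a model.

```python
def tokenize(text):
    return set(text.lower().split())


def retrieve(query, corpus, k=2):
    """Rank chunks by word overlap (Jaccard); a stand-in for vector search."""
    q = tokenize(query)
    scored = []
    for doc_id, text in corpus.items():
        t = tokenize(text)
        scored.append((len(q & t) / len(q | t), doc_id))
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(query, corpus, doc_ids):
    """Prefix each retrieved chunk with its ID so the model can cite it."""
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below and cite them by ID.\n"
        f"{context}\n\nQuestion: {query}"
    )


corpus = {
    "contracts-12": "notice period for termination is 30 days",
    "handbook-7": "employees accrue vacation monthly",
}
query = "what is the notice period for termination"
top = retrieve(query, corpus)
print(top)
print(build_prompt(query, corpus, top))
```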

62

Multi-Selectmedium

A data scientist is preparing a dataset for a regression model. The dataset contains 100 features, some of which are highly correlated. To improve model performance and reduce overfitting, which TWO techniques should the data scientist apply? (Select TWO)

Select 2 answers

A.Feature selection

B.Dimensionality reduction (e.g., PCA)

C.Data augmentation

D.Adding more hidden layers to the neural network

E.Increasing the learning rate

AnswersA, B

Correct: selecting relevant features reduces noise and overfitting.

Why this answer

Feature selection reduces the number of features, and dimensionality reduction (e.g., PCA) handles multicollinearity, both helping to reduce overfitting.

63

MCQhard

A team is implementing a RAG system for a large legal document repository. They need to chunk the documents for efficient retrieval. The documents contain long sections with subsections, and the team wants to preserve the hierarchical structure. Which chunking strategy is MOST appropriate?

A.Hierarchical chunking that preserves section and subsection boundaries

B.Overlapping chunking with a 10% token overlap

C.Semantic chunking based on topic segmentation

D.Fixed-size chunking with 512 tokens per chunk

AnswerA

Hierarchical chunking maintains the document's structure, allowing retrieval of relevant subsections along with their parent context, essential for legal documents.

Why this answer

Hierarchical chunking preserves the document structure by maintaining parent-child relationships, which is crucial for legal documents where context from headings matters. Fixed-size may break logical sections; semantic chunking splits by topic but loses hierarchy; overlapping chunks help continuity but don't preserve structure.
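Below is a minimal sketch of hierarchical chunking. It assumes sections are marked with Markdown-style `#` headings; real legal documents may need a PDF or DOCX parser to recover their structure. Each chunk carries its full heading path, so a retrieved subsection still tells the model which article it belongs to. Words stand in for tokens, and text before the first heading is ignored.

```python
def hierarchical_chunks(text: str, max_words: int = 200, overlap: int = 20) -> list[dict]:
    """Split on Markdown-style headings, then window each section; every chunk keeps its heading path."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    path: list[str] = []
    sections: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            # Truncate the path to the parent level, then append this heading.
            path = path[: level - 1] + [line.lstrip("#").strip()]
            sections.append((" > ".join(path), []))
        elif line.strip() and sections:  # text before the first heading is ignored
            sections[-1][1].extend(line.split())
    chunks = []
    for heading_path, words in sections:
        start = 0
        while start < len(words):
            chunks.append({"path": heading_path, "text": " ".join(words[start : start + max_words])})
            if start + max_words >= len(words):
                break
            start += max_words - overlap  # consecutive windows share `overlap` words
    return chunks


doc = """# Article 4 Termination
## 4.1 Notice
Either party may terminate with thirty days written notice.
## 4.2 Effect
Fees already paid are non-refundable."""
for chunk in hierarchical_chunks(doc):
    print(chunk)
```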

64

MCQhard

A recommendation system for an e-commerce platform is experiencing a high false positive rate in its anomaly detection module, causing legitimate transactions to be flagged as fraudulent. The team wants to reduce false positives without significantly increasing false negatives. Which action is MOST effective?

A.Decrease the anomaly detection threshold

B.Increase the anomaly detection threshold

C.Use a different anomaly detection algorithm

D.Increase the size of the training dataset

AnswerB

Raising the threshold means only transactions with a very high anomaly score are flagged, reducing false positives.

Why this answer

Raising the threshold means a transaction needs a higher anomaly score before it is flagged, which reduces false positives. The cost is some increase in false negatives. The threshold should therefore be tuned on labeled validation data until false positives drop while recall stays acceptable. Lowering the threshold (A) would flag even more legitimate transactions.
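A toy threshold sweep shows the trade-off. The scores and labels below are made up for illustration. In practice the sweep runs on a labeled validation set.

```python
def confusion_at(scores, labels, threshold):
    """Flag a transaction when its anomaly score is >= threshold; label 1 = fraud, 0 = legitimate."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


scores = [0.10, 0.35, 0.55, 0.62, 0.70, 0.91, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1]
for t in (0.5, 0.65, 0.8):
    fp, fn = confusion_at(scores, labels, t)
    print(f"threshold={t}: false positives={fp}, false negatives={fn}")
```

On this data, raising the threshold from 0.5 to 0.65 removes both false positives without missing any fraud. Raising it further to 0.8 starts producing a false negative.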

65

MCQeasy

A data scientist is preparing a dataset for a classification model. The dataset has missing values in several features and features with very different scales. Which two data preparation steps should be applied?

A.Cleaning and normalization

B.Outlier removal and binning

C.Feature selection and dimensionality reduction

D.Data augmentation and one-hot encoding

AnswerA

Correct: cleaning addresses missing values, normalization addresses scale differences.

Why this answer

Cleaning handles missing values (e.g., imputation), and normalization scales features to a similar range, which is important for many ML algorithms.
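Below is a minimal sketch of both steps on a single numeric column. It assumes mean imputation and min-max scaling. Other choices, such as median imputation or z-score standardization, are equally common.

```python
from statistics import fmean
from typing import Optional


def impute_and_scale(column: list[Optional[float]]) -> list[float]:
    """Fill missing values with the column mean, then min-max scale to [0, 1]."""
    observed = [v for v in column if v is not None]
    mean = fmean(observed)
    filled = [mean if v is None else v for v in column]
    lo, hi = min(filled), max(filled)
    if hi == lo:
        return [0.0 for _ in filled]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in filled]


income = [30000, None, 90000, 60000]
age = [25, 35, None, 45]
print(impute_and_scale(income))
print(impute_and_scale(age))
```

After scaling, income and age both lie in [0, 1], so neither dominates distance- or gradient-based models. In a real pipeline, the mean, minimum, and maximum should be computed on the training split and reused on the test split.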

66

MCQmedium

A company wants to build a code generation tool that helps developers write Python functions. The tool must generate syntactically correct code. Which prompt engineering technique is MOST effective?

A.Chain-of-thought prompting with step-by-step reasoning

B.System prompt instructing the model to output JSON

C.Instruction fine-tuning on a large Python corpus

D.Few-shot prompting with examples of valid Python functions

AnswerD

Few-shot examples demonstrate the expected syntax and structure, guiding the LLM to produce correct Python code.

Why this answer

Few-shot examples showing valid Python function syntax help the model understand the expected output format and generate correct code.
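Below is a minimal sketch of a few-shot prompt builder. The example tasks and instruction wording are assumptions. The `ast.parse` check is an added safeguard, not part of the few-shot technique itself. Few-shot examples make valid syntax more likely but do not guarantee it, so a cheap parse before using the output is reasonable.

```python
import ast

FEW_SHOT_EXAMPLES = [
    ("Return the square of n.", "def square(n):\n    return n * n\n"),
    ("Return True if s is a palindrome.", "def is_palindrome(s):\n    return s == s[::-1]\n"),
]


def build_few_shot_prompt(task: str) -> str:
    """Show worked task/code pairs, then leave the final Code: slot for the model to fill."""
    parts = ["Write a single Python function. Reply with code only.\n"]
    for request, code in FEW_SHOT_EXAMPLES:
        parts.append(f"Task: {request}\nCode:\n{code}")
    parts.append(f"Task: {task}\nCode:\n")
    return "\n".join(parts)


def is_valid_python(code: str) -> bool:
    """Syntax check only: ast.parse never executes the generated code."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        return False
    return True


print(build_few_shot_prompt("Return n doubled."))
```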

67

MCQmedium

A data science team is building a model to detect fraudulent transactions. They have a dataset of 1 million normal transactions and 1,000 fraudulent ones. What is the MOST effective data preparation step to handle this imbalance?

A.Delete all normal transactions until the dataset is balanced

B.Duplicate the fraudulent transactions 1,000 times

C.Apply SMOTE to generate synthetic fraudulent transactions and randomly undersample normal transactions

D.Train the model on the original dataset; class imbalance does not affect model performance

AnswerC

SMOTE creates synthetic fraud samples, and undersampling reduces the majority class, creating a more balanced dataset.

Why this answer

Combining oversampling of the minority class (e.g., SMOTE) with undersampling of the majority class is a common and effective approach to balance the dataset.
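Below is a simplified, dependency-free sketch of the combination. It uses SMOTE-style interpolation for the minority class and random undersampling for the majority class. Production code would typically use a library implementation such as the one in `imbalanced-learn`. The toy points, `k`, and seed are assumptions. Resampling should be applied to the training split only, so that evaluation still reflects the real class ratio.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling: interpolate between a minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: dist2(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # random position on the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Random undersampling of the majority class, without replacement."""
    return random.Random(seed).sample(majority, n_keep)


fraud = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
normal = [(float(i % 10), float(i // 10)) for i in range(100)]
balanced = fraud + smote(fraud, n_new=6) + undersample(normal, n_keep=10)
print(len(balanced))
```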

68

MCQmedium

A company wants to build a conversational agent that can handle complex multi-step tasks such as booking a flight, reserving a hotel, and scheduling a car rental in a single session. The agent must be able to break down the user's request into sub-tasks, call external APIs, and reason about the results. Which design pattern is BEST suited for this requirement?

A.Retrieval-Augmented Generation (RAG) with a vector store

B.An agentic workflow implementing the ReAct pattern with tool use

C.A single large language model prompt with all instructions

D.Fine-tuning a model on a dataset of flight, hotel, and rental conversations

AnswerB

ReAct (Reasoning + Acting) agents iteratively decompose tasks, call APIs, and reason about the results, which makes them well suited to complex multi-step workflows.

Why this answer

Agentic workflows, particularly the ReAct pattern, combine reasoning and acting (tool calls) allowing the agent to iteratively decompose tasks, use APIs, and adapt based on results.
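The Thought → Action → Observation loop can be sketched in a few lines. Everything here is hypothetical: `scripted_llm` stands in for a real model call, and the tools return canned data. The loop structure is still the ReAct pattern: reason, call a tool, feed the observation back, and repeat until done.

```python
# Hypothetical tools; a real agent would call flight and hotel booking APIs.
def search_flight(city):
    return {"flight": f"FL-100 to {city}", "arrival_date": "2025-03-10"}


def book_hotel(city, check_in):
    return {"hotel": f"Central Hotel, {city}", "check_in": check_in}


TOOLS = {"search_flight": search_flight, "book_hotel": book_hotel}


def scripted_llm(history):
    """Stand-in for an LLM: returns (thought, action, args), or (thought, None, answer) when done."""
    observations = [h for h in history if h[0] == "observation"]
    if not observations:
        return ("I need a flight first.", "search_flight", {"city": "Lisbon"})
    if len(observations) == 1:
        arrival = observations[0][1]["arrival_date"]  # reason over the previous result
        return ("Hotel check-in must match arrival.", "book_hotel", {"city": "Lisbon", "check_in": arrival})
    return ("Both bookings are done.", None, "Flight and hotel booked.")


def react_loop(llm, tools, max_steps=5):
    history = []
    for _ in range(max_steps):
        thought, action, payload = llm(history)
        history.append(("thought", thought))
        if action is None:  # the model decided it is finished
            return payload, history
        observation = tools[action](**payload)  # act, then feed the result back
        history.append(("observation", observation))
    raise RuntimeError("step limit reached without a final answer")


answer, trace = react_loop(scripted_llm, TOOLS)
print(answer)
```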

69

MCQeasy

A data scientist is building a binary classification model to predict customer churn. The dataset has 90% non-churn and 10% churn. After training, the model achieves 90% accuracy, but the recall for the churn class is only 20%. Which metric should the team primarily focus on to evaluate the model's effectiveness?

A.Recall for the churn class

B.Accuracy

C.Area Under the ROC Curve (AUC-ROC)

D.Precision for the non-churn class

AnswerA

Recall measures how many actual churners are correctly identified, which is the key concern.

Why this answer

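With a 90/10 class split, a model that always predicts "non-churn" also scores 90% accuracy while catching no churners, so accuracy is misleading here. Recall (or F1) for the minority churn class shows how many actual churners the model finds, which is the key concern.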
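The point can be checked with illustrative confusion-matrix counts for 1,000 customers (100 churners) that match the scenario. The counts are made up to reproduce the 90% accuracy and 20% recall in the question.

```python
def metrics(tp, fp, fn, tn):
    """Return (accuracy, recall, precision) for the positive (churn) class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, recall, precision


print(metrics(tp=20, fp=20, fn=80, tn=880))  # the trained model
print(metrics(tp=0, fp=0, fn=100, tn=900))  # a model that always predicts "non-churn"
```

Both models report 90% accuracy. Only recall separates them: 0.2 for the trained model and 0.0 for the trivial one.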

70

Multi-Selectmedium

A team is implementing a RAG system. They are designing the document loading and chunking strategy. Which TWO techniques are commonly used for chunking documents? (Select two.)

Select 2 answers

A.Fixed-size chunking with a token limit

B.Frequency-based chunking by term occurrence

C.Character-level chunking with no overlap

D.Semantic chunking using sentence boundaries

E.Hierarchical chunking using document structure

AnswersA, D

Why this answer

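Fixed-size chunking (based on token count) and semantic chunking (based on natural boundaries such as sentences) are both standard approaches. Hierarchical chunking is a valid structure-aware strategy but is less often the default. Frequency-based chunking is not a standard technique. Character-level chunking with no overlap is rarely used because it can split words and sentences. Overlap is a parameter of a chunking strategy, not a strategy in its own right.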
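Both accepted strategies can be sketched with the standard library. Whitespace-separated words stand in for model tokens, and the default sizes are arbitrary. The regex sentence splitter is a simplification that will mis-split abbreviations such as "Inc.". Sentence packing is shown as a simple boundary-respecting form; embedding-based semantic chunking would additionally group sentences by similarity. Fixed-size windows can cut a sentence in half, while sentence packing never does.

```python
import re


def fixed_size_chunks(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Sliding window over whitespace-separated words (a rough proxy for model tokens)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i : i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def sentence_chunks(text: str, max_words: int = 200) -> list[str]:
    """Pack whole sentences into chunks so no sentence is cut in half."""
    if not text.strip():
        return []
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence]).split()) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "The lease ends in May. Rent is due monthly. Late fees apply after five days."
print(fixed_size_chunks(text, size=6, overlap=2))
print(sentence_chunks(text, max_words=10))
```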

71

MCQmedium

A team is developing an AI agent to assist users with multi-step tasks such as booking a flight, reserving a hotel, and scheduling a car rental. The agent needs to reason about the order of steps and handle dependencies. Which pattern is BEST suited?

A.Simple tool use without reasoning

B.Using a single prompt with all instructions

C.Fine-tuning a model to output all steps at once

D.ReAct pattern (Reasoning and Acting)

AnswerD

ReAct interleaves reasoning and acting, allowing the agent to plan and adjust.

Why this answer

The ReAct pattern (Reasoning and Acting) is best suited because it interleaves reasoning traces with tool calls, allowing the agent to dynamically plan and adjust steps based on intermediate results. For multi-step tasks with dependencies (e.g., booking a flight before a hotel), ReAct enables the agent to reason about order, handle failures, and call external APIs step-by-step, which is essential for robust task completion.

Exam trap

Cisco often tests the misconception that a single large prompt or a fine-tuned model can complete every step of a multi-step task at once. The trap is overlooking the need for dynamic reasoning and tool interaction; among the options, only the ReAct pattern provides both.

How to eliminate wrong answers

Option A is wrong because simple tool use without reasoning lacks the ability to plan or handle dependencies; it can only execute isolated function calls without context. Option B is wrong because using a single prompt with all instructions cannot adapt to dynamic changes or intermediate results; it assumes a static plan that fails if any step requires conditional logic or error recovery. Option C is wrong because fine-tuning a model to output all steps at once (single-shot generation) cannot handle real-time feedback from external systems or adapt to variable execution order, making it brittle for interactive multi-step workflows.
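The dependency reasoning can be made explicit. In this sketch the sub-tasks and prerequisites are hypothetical, and `graphlib.TopologicalSorter` from the standard library (Python 3.9+) computes a valid execution order. A ReAct agent derives this order in its own reasoning steps and can revise it when an observation, such as a sold-out hotel, changes the plan. The static sort only shows what a correct ordering looks like.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical sub-tasks mapped to the tasks that must finish first.
dependencies = {
    "book_flight": set(),
    "book_hotel": {"book_flight"},  # check-in date comes from the flight
    "book_car": {"book_flight", "book_hotel"},  # pick-up at the airport, drop-off at the hotel
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['book_flight', 'book_hotel', 'book_car']
```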

72

MCQeasy

A company wants to build a system that automatically tags uploaded images with objects they contain (e.g., 'car', 'tree', 'person'). Which AI application type is this?

A.Image classification/object detection

B.Recommendation system

C.Anomaly detection

D.Document intelligence

AnswerA

Object detection identifies and localizes objects in images, matching the requirement.

Why this answer

Option A is correct because the task of identifying and labeling objects (e.g., 'car', 'tree', 'person') within an image is a classic use case for image classification combined with object detection. Image classification assigns a single label to the entire image, while object detection localizes and classifies multiple objects within the image, which is exactly what the system requires.

Exam trap

Cisco often tests the distinction between image classification (a single label per image) and object detection (multiple localized objects). The system must tag every object it finds, so single-label classification alone would not be enough. The object-detection component is what makes option A fit.

How to eliminate wrong answers

Option B is wrong because recommendation systems analyze user behavior and preferences to suggest items (e.g., movies, products), not to identify objects in images. Option C is wrong because anomaly detection identifies unusual patterns or outliers in data (e.g., fraud detection), not the presence of common objects in images. Option D is wrong because document intelligence focuses on extracting text, structure, and information from documents (e.g., OCR, form processing), not on visual object recognition.
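Once a detector has localized the objects, tagging is a small post-processing step. The detection tuple format and the 0.5 confidence cutoff below are assumptions, not any particular model's output.

```python
# Hypothetical detector output: (label, confidence, bounding box as x, y, width, height).
detections = [
    ("car", 0.94, (12, 40, 200, 120)),
    ("person", 0.88, (230, 30, 60, 170)),
    ("car", 0.71, (300, 60, 150, 90)),
    ("tree", 0.32, (5, 0, 80, 200)),
]


def tags_from_detections(detections, min_confidence=0.5):
    """Turn per-object detections into a de-duplicated, sorted tag list for the image."""
    return sorted({label for label, score, _box in detections if score >= min_confidence})


print(tags_from_detections(detections))  # ['car', 'person']
```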

73

MCQmedium

An AI team is deploying a large language model for a customer-facing application. They need to ensure that the model's output is always in valid JSON format for downstream processing. Which prompt engineering technique should they use?

A.Enable JSON mode in the model's API parameters

B.Use few-shot examples of JSON outputs in the prompt

C.Post-process the output with a JSON validator and reject invalid responses

D.Add a system prompt that says 'You are a helpful assistant.'

AnswerA

JSON mode instructs the model to produce only valid JSON, ensuring downstream parsability.

Why this answer

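Structured output via JSON mode (available in many LLM APIs) constrains the model to emit syntactically valid JSON, which is critical for programmatic consumption. JSON mode does not by itself enforce a particular schema, so validating the parsed result downstream is still good practice.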
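Below is a minimal sketch of validating the model's JSON output downstream, as a complement to JSON mode rather than a replacement for it. It uses the standard `json` module. The required keys are a hypothetical schema, and a truncated response (for example, one cut off by a token limit) is treated as a parse failure.

```python
import json

REQUIRED_KEYS = {"ticket_id", "category"}  # hypothetical downstream schema


def parse_model_output(raw: str) -> dict:
    """Parse model output and check required keys; raise ValueError on any problem."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:  # e.g. a response cut off mid-object
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


print(parse_model_output('{"ticket_id": 7, "category": "billing"}'))
```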

74

MCQeasy

A startup wants to add an AI-powered virtual assistant to their mobile app. They have limited in-house AI expertise and need a solution that can be integrated quickly with minimal infrastructure management. Which deployment pattern is MOST suitable?

A.Implement an asynchronous processing queue for all user requests

B.Train and deploy a custom model on an on-premises server

C.Deploy the model on edge devices for offline inference

D.Use a cloud-based AI microservice (e.g., Amazon Lex, Azure Bot Service) with a pre-built model

AnswerD

Cloud AI microservices provide ready-to-use models, easy integration, and managed infrastructure, ideal for rapid development.

Why this answer

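Using AI microservices from a cloud provider (e.g., AWS, Azure, GCP) allows quick integration, scalability, and minimal infrastructure management. Training and deploying a custom model on-premises requires AI expertise and hardware the startup lacks. Edge deployment adds model-optimization and device-management complexity. An asynchronous processing queue suits background or batch workloads and does not, by itself, provide a conversational model for a real-time assistant.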

75

MCQeasy

Which embedding type is MOST suitable for capturing semantic meaning of text in a RAG pipeline?

A.Bag-of-words vectors

B.Dense embeddings from a pre-trained transformer model

C.TF-IDF vectors

D.One-hot encoding

AnswerB

Dense embeddings capture contextualized semantic meaning, enabling effective similarity search.

Why this answer

Dense embeddings represent semantic meaning in a continuous vector space, ideal for similarity search in RAG.
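The limitation of sparse lexical vectors is easy to demonstrate. The sketch below computes cosine similarity over bag-of-words counts. A paraphrase that shares only the word "a" scores lower than a sentence about a different object. A dense embedding model would be expected to rank the paraphrase higher; that part is not shown, since it would require a specific pretrained model.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


print(bow_cosine("buy a used car", "purchase a second-hand automobile"))  # 0.25
print(bow_cosine("buy a used car", "buy a used bike"))  # 0.75
```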

Page 1 of 2 · 125 questions total
