CCNA Aio Implementing AI Questions

50 of 125 questions · Page 2/2 · Aio Implementing AI topic

76
Multi-Select · hard

A team is implementing a RAG system for a legal document Q&A. They need to chunk documents effectively. Which THREE chunking strategies should they consider to improve retrieval accuracy for legal texts that contain hierarchical sections (clauses, sub-clauses, definitions)?

Select 3 answers
A. Hierarchical chunking that indexes chunks at clause and sub-clause levels with parent relationships
B. Overlapping chunks with a 10% overlap between consecutive chunks
C. Fixed-size chunking with a 512-token window and no overlap
D. Chunking based on the document's table of contents and section hierarchy
E. Semantic chunking that splits at natural boundaries (e.g., section headings, paragraph breaks)
Answers: A, D, E

Hierarchical chunking allows retrieval of granular chunks while maintaining broader context.

Why this answer

Semantic chunking splits at natural boundaries (e.g., paragraphs, sections), preserving meaning. Hierarchical chunking indexes with parent-child relationships for context. Fixed-size chunking is simple but may break sentences or clauses.

Overlapping chunks can help but are not a primary strategy for accuracy; a sliding window is one specific overlap technique.
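The parent relationships mentioned in option A can be sketched as follows. This is a minimal illustration, not a production indexer: the `Chunk` class, the dotted clause numbering, and the sample clauses are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    chunk_id: str              # hypothetical clause number, e.g. "2.1"
    text: str
    parent_id: Optional[str]   # None for top-level clauses


def build_chunks(sections: dict) -> dict:
    """Index every clause by its number; the parent is the number minus its last component."""
    chunks = {}
    for number, text in sections.items():
        parent = number.rsplit(".", 1)[0] if "." in number else None
        chunks[number] = Chunk(number, text, parent)
    return chunks


def with_context(chunks: dict, chunk_id: str) -> str:
    """Return the chunk's text preceded by the text of all its ancestors."""
    parts = []
    current = chunks.get(chunk_id)
    while current is not None:
        parts.append(current.text)
        current = chunks.get(current.parent_id) if current.parent_id else None
    return "\n".join(reversed(parts))


# Made-up clauses for illustration only.
sections = {
    "2": "2. Definitions",
    "2.1": "2.1 'Supplier' means the party named in Schedule A.",
    "2.1.a": "(a) including its permitted assignees.",
}
chunks = build_chunks(sections)
context = with_context(chunks, "2.1.a")
```

Retrieval can match on the small sub-clause chunk while the text passed to the model includes the parent clause and section heading.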

77
MCQ · medium

An organization wants to implement an AI system to automatically categorize support tickets into predefined categories. They have a labeled dataset of 10,000 tickets. Which approach is MOST appropriate?

A. Use a rule-based system with keyword matching
B. Use a prompt-based LLM with few-shot examples
C. Fine-tune a pre-trained text classification model
D. Train a custom neural network from scratch
Answer: C

Fine-tuning leverages existing knowledge and works well with 10k labeled examples.

Why this answer

Fine-tuning a pre-trained text classification model is a standard and effective approach for supervised classification when labeled data is available.

78
MCQ · easy

In the AI project lifecycle, which phase involves partitioning the dataset into training, validation, and test sets?

A. Data acquisition
B. Model selection
C. Data preparation
D. Problem definition
Answer: C

Data preparation encompasses cleaning, normalisation, and splitting into train/validation/test sets.

Why this answer

Data preparation includes splitting the data to evaluate model performance and prevent leakage.
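A minimal sketch of the split step, using only the standard library. The 80/10/10 ratios, the fixed seed, and the function name `split_dataset` are assumptions for illustration; real projects often use a library helper and stratify by label.

```python
import random


def split_dataset(rows, train=0.8, val=0.1, seed=42):
    """Shuffle once, then cut into disjoint train / validation / test partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # fixed seed keeps the split reproducible
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return (
        rows[:n_train],
        rows[n_train:n_train + n_val],
        rows[n_train + n_val:],          # whatever remains is the test set
    )


train_set, val_set, test_set = split_dataset(range(100))
```

Because the three slices never overlap, no example seen in training can appear in the test set, which is the leakage the explanation refers to.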

79
MCQ · medium

A company is building an AI-powered document intelligence system to extract key fields from scanned invoices. The data contains 95% of invoices from one vendor and 5% from others. After training, the F1 score is 0.95 on the overall test set, but the performance on the minority vendor invoices is very poor. What is the MOST likely cause?

A. The model is overfitting on the minority class
B. The dataset is imbalanced, and the model is biased toward the majority class
C. The data has a train/test leakage problem
D. The feature extraction is incorrect for the minority vendor invoices
Answer: B

Imbalanced data causes the model to optimize for the majority class, ignoring minority classes.

Why this answer

The imbalanced dataset causes the model to learn mostly from the majority class, leading to poor performance on the minority class. The other options are less likely given the high overall F1 score.

80
Multi-Select · medium

A company is building a RAG-based Q&A system for a large collection of technical manuals. They need to choose an embedding model and a similarity search method. Which TWO choices are most appropriate for this scenario? (Select TWO)

Select 2 answers
A. Use a general-purpose embedding model like text-embedding-ada-002
B. Use dot product as the similarity metric for non-normalized embeddings
C. Use Euclidean distance as the similarity metric for vector search
D. Use a domain-specific embedding model fine-tuned on technical documentation
E. Use cosine similarity as the similarity metric for vector search
Answers: D, E

A domain-specific model captures the nuances of technical language, improving retrieval precision for the Q&A system.

Why this answer

Cosine similarity is the standard metric for comparing embeddings (normalized vectors). Using a domain-specific embedding model (e.g., fine-tuned on technical text) yields better retrieval accuracy. Dot product can be used but is less common; Euclidean distance is not ideal for high-dimensional embeddings; a general-purpose model may perform poorly on domain-specific language.

81
MCQ · medium

A data scientist is training a binary classifier and observes that the training accuracy is 99% but the test accuracy is only 70%. Which of the following is the MOST likely cause?

A. The model is underfitting the training data
B. The learning rate is too high
C. The model is overfitting the training data
D. The test set contains data leakage from the training set
Answer: C

High training accuracy with much lower test accuracy is classic overfitting.

Why this answer

A large gap between training and test accuracy indicates overfitting: the model memorized training data but fails to generalize.

82
Multi-Select · medium

A team is designing an AI agent that needs to interact with external APIs, search the web, and perform multi-step reasoning. Which TWO architectural components are essential for this agentic workflow? (Choose TWO.)

Select 2 answers
A. Fine-tuning the base model
B. ReAct pattern (Reasoning + Acting)
C. Tool use / function calling
D. Single-turn response generation
E. Static prompt with no iterations
Answers: B, C

ReAct combines reasoning traces with actions, allowing the agent to plan and adapt.

Why this answer

Tool use allows the agent to call external APIs, and the ReAct pattern (Reasoning + Acting) enables iterative reasoning and action steps.
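The interleaving of reasoning and tool calls can be sketched as a small loop. Everything here is a stand-in: `scripted_model` replaces a real LLM call with two hard-coded steps, and `lookup_price` is a hypothetical tool, so the sketch shows only the control flow of the pattern.

```python
def scripted_model(history):
    """Stand-in for an LLM: picks the next step from the transcript so far."""
    if not any(step.startswith("Observation:") for step in history):
        return {"thought": "I need the current price.",
                "action": "lookup_price", "input": "widget"}
    return {"thought": "I have what I need.", "final": "A widget costs 4.50."}


def lookup_price(item):
    """Hypothetical tool; a real agent would call an external API here."""
    return {"widget": "4.50"}.get(item, "unknown")


TOOLS = {"lookup_price": lookup_price}


def react_loop(question, model, tools, max_steps=5):
    """Alternate Thought -> Action -> Observation until the model gives a final answer."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(history)
        history.append(f"Thought: {step['thought']}")
        if "final" in step:
            return step["final"], history
        observation = tools[step["action"]](step["input"])
        history.append(f"Action: {step['action']}({step['input']})")
        history.append(f"Observation: {observation}")
    return None, history   # step budget exhausted without a final answer


answer, trace = react_loop("How much is a widget?", scripted_model, TOOLS)
```

The `max_steps` cap is a design choice worth keeping in any real agent, since it bounds cost if the model never reaches a final answer.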

83
MCQ · medium

A team is evaluating a fine-tuned LLM for a code generation task. They notice the model rarely generates correct syntax but often produces plausible-looking code. Which evaluation metric is MOST appropriate to quantify this issue?

A. BLEU score
B. Perplexity
C. Pass@k (execution success rate)
D. Exact match accuracy
Answer: C

Pass@k runs the generated code against test cases, directly measuring functional correctness.

Why this answer

Pass@k measures the probability that at least one of k generated code samples passes a set of unit tests, directly quantifying execution correctness. Since the model produces plausible-looking but syntactically incorrect code, execution success rate (Pass@k) is the most appropriate metric to capture whether the code actually runs correctly, unlike surface-level similarity metrics.

Exam trap

Cisco often tests the misconception that BLEU or exact match are sufficient for code generation evaluation, but the trap here is that plausible-looking code can score high on n-gram overlap while being syntactically or semantically invalid, making execution-based metrics like Pass@k the only reliable measure of functional correctness.

How to eliminate wrong answers

Option A is wrong because BLEU score measures n-gram overlap between generated and reference text, which rewards lexical similarity but cannot detect syntax errors or functional correctness; plausible-looking code can have high BLEU while being invalid. Option B is wrong because perplexity measures how well a language model predicts a sequence, reflecting fluency rather than code execution validity; low perplexity can still yield syntactically invalid code. Option D is wrong because exact match accuracy requires the generated code to match a single reference exactly, which is too brittle for code generation where multiple valid solutions exist and does not assess runtime behavior.
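For a concrete sense of how Pass@k is computed, the sketch below uses the commonly cited unbiased estimator: generate n samples per problem, count the c samples that pass the unit tests, and estimate the chance that a random draw of k samples contains at least one pass. The parameter names `n`, `c`, and `k` are assumptions for this example.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k samples passes), given c of n samples passed."""
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k includes a pass.
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# 10 samples generated, 3 passed the tests.
single_try = pass_at_k(n=10, c=3, k=1)
five_tries = pass_at_k(n=10, c=3, k=5)
```

With k = 1 the estimate reduces to the plain pass rate c / n; larger k rewards a model that gets it right at least occasionally.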

84
MCQ · medium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A. Train a custom model from scratch on the policy documents each month
B. Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store
C. Fine-tune a base LLM on the policy documents monthly
D. Use a larger foundation model with a longer context window and paste all documents into each prompt
Answer: B

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

Retrieval-Augmented Generation (RAG) is the most appropriate approach because it allows the chatbot to answer questions based on the latest policy documents without retraining the model. By indexing the documents in a vector store, the system retrieves relevant chunks for each query and passes them to a pre-trained LLM for generation, ensuring up-to-date responses with minimal maintenance overhead.

Exam trap

Cisco often tests the misconception that fine-tuning or training from scratch is necessary for domain-specific knowledge, when in fact RAG provides a cost-effective, zero-retraining solution for frequently updated documents.

How to eliminate wrong answers

Option A is wrong because training a custom model from scratch each month is computationally expensive, time-consuming, and requires significant expertise and data, which contradicts the constraint of not being able to retrain a model each time. Option C is wrong because fine-tuning a base LLM monthly still means retraining on the new documents every cycle, which the team has said it cannot afford, so it does not solve the update problem efficiently. Option D is wrong because pasting all policy documents into each prompt can exceed the model's context window, and even when the documents fit, it leads to high latency, high token costs, and potential loss of relevant information due to the model's attention limitations.

85
Multi-Select · medium

A team is designing a RAG system for a large collection of PDFs. They need to choose document chunking strategies. Which TWO strategies are considered best practices? (Choose two.)

Select 2 answers
A. Semantic chunking (e.g., sentence or paragraph boundaries)
B. Fixed-size chunking with no overlap
C. Hierarchical chunking (sections, subsections)
D. Single chunk per document
E. Random character-length chunks
Answers: A, C

Semantic chunking maintains coherence within each chunk, improving embedding quality.

Why this answer

Semantic chunking preserves natural boundaries, and hierarchical chunking respects document structure, both improving retrieval quality.

86
MCQ · medium

A team is considering whether to fine-tune a base LLM or use RAG for a question-answering system over a large, static corpus of scientific papers. The answer must be highly accurate and grounded in the papers. Which approach is BEST and why?

A. Fine-tuning because it adapts the model to the scientific domain
B. Fine-tuning because it is faster at inference time
C. RAG because it retrieves and grounds answers in the source documents
D. RAG because it does not require any labeled data
Answer: C

RAG retrieves relevant passages and passes them as context, ensuring the answer is directly supported by the source, reducing hallucination.

Why this answer

RAG retrieves exact passages and grounds answers in the source, which is critical for accuracy and citation in scientific domains. A fine-tuned model may memorize facts from the papers but can still hallucinate.

87
Multi-Select · easy

An organization is deploying an image classification model to detect defects on a production line. Which TWO steps are essential during the model monitoring phase of the AI project lifecycle?

Select 2 answers
A. Track model performance metrics such as precision and recall on a held-out test set over time
B. Perform data cleaning and normalization on the production data before inference
C. Periodically retrain the model with newly labeled data
D. Automatically roll back to the previous model version if a metric drops below a threshold
E. Monitor for data drift between the training data distribution and incoming production data
Answers: A, E

Tracking metrics on live data (or periodic golden sets) identifies model drift.

Why this answer

Monitoring should track data drift (changes in input distribution) and model drift (performance degradation). Retraining is an action after monitoring, not a step of monitoring itself. Data preparation is completed before deployment.
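One simple way to flag data drift on a single numeric feature is to measure how far the production mean has moved from the training mean, in units of the training standard deviation. This is a deliberately crude sketch: the function names, the sample values, and the 0.5 threshold are assumptions, and production systems typically use richer statistical tests.

```python
from statistics import mean, stdev


def drift_score(reference, production):
    """Shift of the production mean, measured in reference standard deviations."""
    ref_std = stdev(reference)
    if ref_std == 0:
        return 0.0 if mean(production) == mean(reference) else float("inf")
    return abs(mean(production) - mean(reference)) / ref_std


def has_drifted(reference, production, threshold=0.5):
    """Flag drift when the shift exceeds an (arbitrarily chosen) threshold."""
    return drift_score(reference, production) > threshold


# Made-up feature values, e.g. average image brightness.
reference = [0.48, 0.50, 0.52, 0.49, 0.51]   # training distribution
stable = [0.50, 0.49, 0.51, 0.50]            # production, same conditions
shifted = [0.70, 0.72, 0.69, 0.71]           # production after a lighting change
```

A drift alert like this does not need labels, which is why it complements the labeled metric tracking in option A.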

88
MCQ · medium

A team is evaluating an LLM-based chatbot that frequently hallucinates when answering questions about internal policies. Which testing approach would MOST effectively quantify this issue?

A. Evaluation frameworks for LLM output quality
B. Integration tests for API calls
C. Unit tests for the data pipeline
D. Regression testing of model accuracy over time
Answer: A

Evaluation frameworks specifically measure output quality metrics like faithfulness and hallucination rate.

Why this answer

Option A is correct because evaluation frameworks for LLM output quality, using metrics such as faithfulness and factuality, are designed to detect and quantify hallucinations by comparing generated responses against a ground-truth knowledge base. Overlap scores such as ROUGE or BLEU may be reported alongside them, but they measure surface similarity rather than factual consistency. Faithfulness-style evaluation directly measures the rate at which the chatbot fabricates or misstates internal policy details, providing a quantitative baseline for improvement.

Exam trap

Cisco often tests the distinction between functional testing (e.g., API integration, data pipeline) and output quality evaluation, leading candidates to mistakenly choose integration or unit tests when the real issue is semantic accuracy of generated content.

How to eliminate wrong answers

Option B is wrong because integration tests for API calls verify that the chatbot's endpoints and external service interactions work correctly, but they do not assess the semantic accuracy or factual consistency of the generated text. Option C is wrong because unit tests for the data pipeline validate data ingestion, transformation, and storage logic, not the output quality of the LLM's responses. Option D is wrong because regression testing of model accuracy over time typically measures performance on a static benchmark (e.g., classification accuracy) rather than quantifying open-ended hallucination rates in a conversational context.

89
MCQ · medium

A recommendation system for an e-commerce site is producing stale suggestions that do not reflect recent user behavior. The system is updated offline every 24 hours. Which change would MOST directly address this issue?

A. Increase the number of features used in the model
B. Add more training data from the past year
C. Use a deeper neural network architecture
D. Implement online learning to update the model incrementally in real time
Answer: D

Online learning updates the model with each new interaction, reflecting recent behavior immediately.

Why this answer

Shorter update cycles or online learning can incorporate recent interactions more quickly, improving freshness.

90
MCQ · medium

A developer is building an AI microservice that processes document intelligence requests asynchronously. Users upload PDFs, and the service extracts text and analyzes it with an LLM. The processing time per document can be up to 5 minutes. Which integration pattern is MOST appropriate?

A. Synchronous REST API call that waits for the LLM response
B. Async processing with a message queue and separate worker service
C. WebSocket connection for real-time streaming
D. Serverless function triggered by HTTP request
Answer: B

The user submits the job, a queue holds tasks, workers process them asynchronously, and the user polls or gets a callback with results.

Why this answer

Async processing queues decouple the frontend from long-running tasks, allowing the user to receive results later without blocking.
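The submit, queue, worker, and result flow can be sketched in a single process with `asyncio`. This is only an illustration of the pattern: an in-memory `asyncio.Queue` stands in for a real message broker, the worker's slow extraction and LLM step is replaced by a placeholder, and the file names are made up.

```python
import asyncio


async def worker(queue, results):
    """Pull jobs off the queue forever; each job is (job_id, document)."""
    while True:
        job_id, document = await queue.get()
        await asyncio.sleep(0)   # placeholder for the slow text extraction + LLM call
        results[job_id] = f"processed:{document}"
        queue.task_done()


async def main():
    queue = asyncio.Queue()
    results = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(2)]

    # "Upload" step: enqueue and return immediately; the caller keeps only the job id.
    for job_id, document in enumerate(["a.pdf", "b.pdf", "c.pdf"]):
        queue.put_nowait((job_id, document))

    await queue.join()           # wait until every queued job has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


results = asyncio.run(main())
```

In a deployed service the `results` dictionary would be a database or cache that the client polls by job id, or the worker would fire a callback when done.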

91
MCQ · easy

A company is building a document intelligence system that extracts key fields from scanned invoices. They have a labeled dataset of 10,000 invoices but need to decide between a traditional OCR+rule-based pipeline and an AI-based model. Which use case characteristic STRONGLY favors the AI-based approach?

A. Invoice layouts vary significantly between different vendors and often change
B. The system must process invoices in real time with sub-second latency
C. The team has limited access to labeled training data
D. Invoices have a fixed, standardized layout across all vendors
Answer: A

AI models learn patterns from data and adapt to varying layouts, whereas rule-based systems require manual updates for each new layout.

Why this answer

When invoice layouts vary widely, AI models (especially computer vision + NLP) generalize better than fixed rules. If layouts were consistent, rules might suffice. The other options either favor traditional approaches or are neutral.

92
MCQ · medium

A data science team is preparing a dataset for a binary classification model. The dataset has 95% negative class and 5% positive class. Which technique should they apply to avoid biased model predictions?

A. Apply resampling techniques such as SMOTE or random undersampling
B. Normalise all numerical features to a [0,1] range
C. Shuffle the dataset randomly before splitting into train and test sets
D. Remove all rows with missing values
Answer: A

Resampling balances the class distribution, allowing the model to learn from both classes effectively.

Why this answer

Handling imbalanced data (e.g., oversampling the minority class or undersampling the majority class) is necessary to prevent the model from always predicting the majority class.

93
MCQ · medium

A chatbot application uses a system prompt to set the assistant's behavior. The developer wants the LLM to output structured JSON for downstream processing. Which technique BEST ensures the output is valid JSON?

A. Set the temperature to 0 to make output deterministic
B. Include a few-shot example showing a JSON output in the prompt
C. Add a chain-of-thought reasoning step before the output
D. Use the LLM's built-in JSON mode (e.g., response_format='json_object')
Answer: D

JSON mode enforces valid JSON output, reducing parsing errors and ensuring structure.

Why this answer

Many LLMs support a JSON mode that constrains output to valid JSON. System prompts can request JSON but may be ignored. Few-shot examples help but are not foolproof.

Chain-of-thought is for reasoning, not formatting.
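Whichever technique produces the JSON, downstream code usually still checks the reply before using it. The sketch below is one hedged way to do that with the standard library; the function name `parse_llm_json` and the `intent` and `order_id` keys are hypothetical.

```python
import json


def parse_llm_json(raw, required_keys):
    """Return the parsed object, or None if the reply is not a JSON object with the required keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None                      # not valid JSON at all
    if not isinstance(data, dict):
        return None                      # valid JSON, but not an object
    if not required_keys <= set(data):
        return None                      # object is missing expected fields
    return data


good = parse_llm_json('{"intent": "refund", "order_id": 7}', {"intent", "order_id"})
chatty = parse_llm_json('Sure! {"intent": "refund"}', {"intent"})
incomplete = parse_llm_json('{"intent": "refund"}', {"intent", "order_id"})
```

The `chatty` case shows the failure that prompt-only approaches risk: extra prose around the object makes the whole reply unparseable.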

94
MCQ · medium

A team is building a document intelligence application that extracts key fields from invoices. They have 10,000 labeled invoices. What is the first step in the AI project lifecycle?

A. Model selection – choose a pre-trained vision transformer
B. Data preparation – clean and normalize the invoice images
C. Problem definition – specify which fields to extract and accuracy targets
D. Data acquisition – collect additional invoices from public sources
Answer: C

Problem definition sets the scope and goals, ensuring the team works toward a clear objective before any other lifecycle stage.

Why this answer

Problem definition must come first to scope the project, define success criteria, and decide what fields to extract before any data work.

95
MCQ · medium

An AI application needs to generate structured JSON output from an LLM. The development team wants to ensure the output always conforms to a specific schema. Which prompt engineering technique is MOST suitable?

A. Few-shot examples showing correct JSON
B. System prompt with JSON schema and a 'respond only with valid JSON' instruction
C. Chain-of-thought prompting
D. Fine-tuning the model on JSON datasets
Answer: B

A system prompt that states the schema and instructs the model to respond only with valid JSON is the strongest prompt-level control over output format.

Why this answer

Option B is correct because, among prompt engineering techniques, providing the JSON schema directly in the system prompt together with an explicit instruction to respond only with valid JSON is the most direct way to constrain an LLM's output format. It relies on the model's instruction-following capability and needs no examples or retraining. Prompt instructions alone can still be ignored, so in practice this is paired with an API-level JSON mode or with validation of the parsed output.

Exam trap

Cisco often tests the misconception that few-shot examples alone are sufficient for format control, but the trap here is that without an explicit schema and strict instruction, the model may still produce inconsistent or non-compliant output, especially when the schema is complex or the prompt context shifts.

How to eliminate wrong answers

Option A is wrong because few-shot examples can guide the model but do not guarantee strict schema conformance; the model may still deviate from the schema, especially with complex or nested structures. Option C is wrong because chain-of-thought prompting encourages step-by-step reasoning, which often produces intermediate text or explanations, not a clean JSON output, and can actually increase the risk of malformed JSON. Option D is wrong because fine-tuning on JSON datasets is a resource-intensive process that requires significant data, compute, and time, and is overkill for a task that can be solved with a simple prompt-level constraint; it also does not dynamically adapt to schema changes as easily as a system prompt.

96
MCQ · easy

In prompt engineering, which technique involves providing a few correct input-output examples in the prompt to guide the model's response?

A. System prompt engineering
B. Chain-of-thought prompting
C. Few-shot prompting
D. Zero-shot prompting
Answer: C

Few-shot provides a handful of examples to steer the model's output.

Why this answer

Few-shot prompting includes examples in the prompt to demonstrate the desired output format and reasoning pattern.
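A few-shot prompt is just text assembled from an instruction, a handful of worked examples, and the new input. The sketch below shows one possible layout; the `Input:` / `Output:` labels, the function name, and the ticket examples are assumptions, not a required format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble instruction + worked examples + the new query into one prompt string."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    # End with an open "Output:" so the model completes it in the demonstrated style.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify each support ticket as 'billing' or 'technical'.",
    [
        ("I was charged twice this month.", "billing"),
        ("The app crashes when I log in.", "technical"),
    ],
    "My invoice shows the wrong amount.",
)
```

Passing an empty `examples` list to the same function yields a zero-shot prompt, which makes the difference between options C and D concrete.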

97
MCQ · easy

A team is deploying an anomaly detection system for real-time monitoring of server metrics. The system should alert when metrics deviate significantly from normal patterns. Which type of AI model is MOST suitable?

A. Autoencoder neural network
B. Recommendation system model
C. Linear regression model
D. Image classification model
Answer: A

Autoencoders learn to reconstruct normal data; high reconstruction error indicates an anomaly, making them ideal for this task.

Why this answer

Autoencoders learn normal patterns and detect anomalies by high reconstruction error. Linear regression predicts continuous values. Image classifiers are for images.

Recommendation systems are for user-item interactions.

98
Multi-Select · hard

A team is developing an AI agent that can answer questions by querying a SQL database and a REST API. The agent should decide which tool to call, parse the response, and reason about the next step. Which THREE concepts should be implemented to build this agent?

Select 3 answers
A. ReAct pattern for iterative reasoning and tool use
B. Content filtering to sanitize database results
C. Function calling to enable the LLM to invoke SQL and API tools
D. Chain-of-thought prompting without tool integration
E. Planning agent that decomposes the question into sub-tasks
Answers: A, C, E

ReAct combines reasoning traces with actions, perfect for multi-step tool use.

Why this answer

The ReAct pattern (Reasoning + Acting) enables iterative tool selection. Function calling allows the LLM to output structured tool calls. Planning agents can decompose a question into subtasks.

Chain-of-thought is a reasoning technique but not a full agent framework; content filtering is not essential to building the agent's tool-calling and reasoning loop.

99
Multi-Select · medium

An AI team is deploying a real-time document intelligence service that extracts key-value pairs from invoices. The pipeline includes an LLM that calls a function to parse structured output. Which TWO testing strategies are essential before production deployment?

Select 2 answers
A. An evaluation framework that compares extracted fields against ground truth for a test set of invoices
B. Load testing to simulate peak invoice volume (e.g., end of month)
C. Regression tests on the model training pipeline to ensure the base LLM hasn't changed
D. Integration tests that call the LLM API with sample invoices and verify the JSON output structure
E. Unit tests for the data pipeline that cleans and normalises invoice images
Answers: A, D

The evaluation framework measures the accuracy (e.g., precision, recall) of the extraction, which is critical for business requirements.

Why this answer

Integration tests for the API call ensure the function calling mechanism works end-to-end. An evaluation framework for LLM output quality measures extraction accuracy. Unit tests for data pipelines are important but less critical here; load testing is operational; model training tests are irrelevant.

100
MCQ · hard

A company is deploying a code generation AI assistant for internal developers. They want to ensure the assistant does not generate code with security vulnerabilities. Which testing approach is MOST critical?

A. Unit tests for the data pipeline that preprocesses prompts
B. Evaluation framework that measures BLEU score on a held-out set of code samples
C. Regression tests that compare outputs of new model versions against a golden dataset
D. Integration tests that send security-focused prompts and validate the generated code against a static analysis tool
Answer: D

Integration tests with security scanning directly validate that the model avoids generating vulnerable code.

Why this answer

Integration tests that call the model with security-related prompts and scan outputs for vulnerabilities directly assess this requirement. Unit tests on data pipelines are for data correctness, not security of generated code.

101
MCQ · medium

A company has an existing AI chatbot that uses a fine-tuned LLM to answer customer queries. They want to add the ability to retrieve real-time order status from their database. Which integration pattern should they use?

A. Implement function calling so the model can trigger a database query and receive the result
B. Prompt the user to check the order status manually
C. Use RAG to retrieve order status from a vector store
D. Embed the database query results directly into the model's training data
Answer: A

Function calling enables the model to request live data from external systems like a database.

Why this answer

Function calling allows the LLM to request database queries, and the application can execute them and return results.
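The application side of function calling can be sketched as a small dispatcher: the model emits a structured request naming a tool and its arguments, and the application runs the matching function and hands the result back. The JSON shape, the `get_order_status` tool, and the in-memory `ORDERS` table are all assumptions for illustration; real provider APIs define their own message formats.

```python
import json

# Stand-in for the real order database.
ORDERS = {"A100": "shipped", "A101": "processing"}


def get_order_status(order_id: str) -> str:
    """Hypothetical tool exposed to the model."""
    return ORDERS.get(order_id, "not found")


# Only functions registered here can be invoked by the model.
TOOLS = {"get_order_status": get_order_status}


def execute_tool_call(tool_call_json: str) -> str:
    """Run the tool the model asked for and return a JSON result to feed back to it."""
    call = json.loads(tool_call_json)
    func = TOOLS.get(call["name"])
    if func is None:
        return json.dumps({"error": f"unknown tool {call['name']}"})
    return json.dumps({"result": func(**call["arguments"])})


# What a model's tool-call output might look like (assumed format).
model_output = '{"name": "get_order_status", "arguments": {"order_id": "A100"}}'
tool_result = execute_tool_call(model_output)
```

The model never touches the database directly; the allow-list in `TOOLS` is what keeps it limited to the queries the application chooses to expose.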

102
Multi-Select · medium

A team is designing an AI microservice for image classification. Which THREE practices should they implement for effective integration and testing?

Select 3 answers
A. Use streaming responses for real-time feedback
B. Write unit tests for the data preprocessing pipeline
C. Train the model on the full dataset before deployment
D. Deploy the model as a monolithic application to simplify testing
E. Perform integration tests on the API endpoint to the model
Answers: A, B, E

Streaming reduces perceived latency for image processing results.

Why this answer

Unit tests for data pipelines ensure data quality, integration tests verify API communication, and streaming responses improve user experience. These cover the main integration concerns.

103
MCQ · medium

A data scientist is preparing a dataset for a binary classification model to detect fraudulent transactions. The dataset has 1% fraud cases (minority class) and 99% non-fraud cases. Which data preparation technique is MOST appropriate to address the class imbalance before training?

A. Duplicate the minority class samples until the class ratio is 50:50
B. Normalize all features to a range of 0 to 1
C. Random undersample the majority class to match the minority class size
D. Apply SMOTE (Synthetic Minority Over-sampling Technique) to the minority class
Answer: D

SMOTE creates synthetic examples for the minority class, effectively balancing the dataset without simply duplicating existing samples.

Why this answer

Synthetic Minority Over-sampling Technique (SMOTE) generates synthetic samples for the minority class, balancing the dataset without losing data. Undersampling would discard valuable majority samples, and oversampling by duplication can cause overfitting. Normalization does not fix class imbalance.
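The core idea behind SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration rather than the full algorithm or any library's implementation; the function name `smote_like`, the choice of `k=2`, and the sample points are assumptions.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the line between a sample and a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k closest other minority points, by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()   # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Made-up two-feature fraud samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2)]
new_points = smote_like(minority, n_new=6)
```

Because each new point is an interpolation rather than a copy, the classifier sees varied minority examples, which is the advantage over plain duplication in option A. Resampling should be applied to the training split only, so the test set keeps the real class ratio.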

104
MCQ · medium

A team is implementing a document intelligence solution to extract key-value pairs from invoices. They plan to use a pre-trained vision-language model with a RAG pipeline that indexes invoice images. Which chunking strategy is BEST suited for invoice documents that have a consistent layout but vary in length?

A. Semantic chunking based on sentence boundaries
B. Hierarchical chunking that groups lines into logical sections (header, line items, totals)
C. Fixed-size chunking with 512 tokens per chunk
D. No chunking; pass the entire invoice as one document per query
Answer: B

Hierarchical chunking preserves the invoice structure, enabling retrieval of complete sections for accurate key-value extraction.

Why this answer

Invoices typically have sections (header, line items, totals). Hierarchical chunking preserves this structure, enabling retrieval at the section level. Fixed-size chunking may split important fields, and semantic chunking is less predictable on structured documents.

105
MCQ · hard

An AI system performs anomaly detection on sensor data in a manufacturing plant. The model is deployed and running well. After two months, the plant installs new sensors that produce data with a different distribution. The anomaly detection starts failing with many false positives. Which action should the team take?

A. Retrain the model using data from the new sensors and re-deploy
B. Roll back the model to the previous version and ignore the new sensors
C. Increase the anomaly threshold to reduce false positives
D. Use an ensemble of the old and a new model without retraining
Answer: A

Retraining on the new distribution adapts the model to the changed sensor data, reducing false positives.

Why this answer

The model needs monitoring and retraining on data from the new sensors. This falls under model monitoring and lifecycle management in deployment.

106
MCQ · easy

A team is building a recommendation system for an e-commerce platform. They need to update recommendations in real-time as users browse. Which integration pattern is MOST suitable?

A. Streaming responses from a single monolithic model
B. Batch processing with nightly updates
C. Async processing queue with delayed responses
D. AI microservices with a REST API
Answer: D

Microservices with a REST API can handle individual requests on-demand, scale out, and integrate easily with the platform's frontend for real-time recommendations.

Why this answer

AI microservices deployed behind a REST API allow each recommendation request to be handled independently, scaling horizontally to meet real-time demands.

107
MCQ · easy

Which similarity search metric is BEST for comparing dense vector embeddings when the magnitude of the vectors is not important, only the direction?

A. Euclidean distance
B. Dot product
C. Manhattan distance
D. Cosine similarity
Answer: D

Cosine similarity normalizes the inner product by magnitudes, yielding a score that depends only on the angle between vectors.

Why this answer

Cosine similarity measures the cosine of the angle between vectors, focusing on direction and ignoring magnitude, which is ideal when only direction matters.
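A small worked example makes the direction-versus-magnitude point concrete. The vectors below are made up: `b` points the same way as `a` but is twice as long, so cosine similarity treats them as identical while Euclidean distance and dot product do not.

```python
from math import sqrt


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by both magnitudes: depends only on the angle."""
    return dot(a, b) / (sqrt(dot(a, a)) * sqrt(dot(b, b)))


def euclidean_distance(a, b):
    """Straight-line distance: sensitive to magnitude as well as direction."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice the magnitude

cos_ab = cosine_similarity(a, b)
dist_ab = euclidean_distance(a, b)
dot_ab = dot(a, b)
```

When embeddings are normalized to unit length, dot product and cosine similarity give the same ranking, which is why some vector stores use dot product internally for speed.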

108
Multi-Select · hard

A team is deploying a fine-tuned LLM for generating code snippets. They want to test the system thoroughly before production. Which THREE testing types should they include in their test plan? (Select THREE)

Select 3 answers
A. Integration tests for API calls between the application and the LLM endpoint
B. Regression testing only after model updates
C. Load testing to determine maximum concurrent users
D. Evaluation framework for LLM output quality, such as BLEU or custom correctness metrics
E. Unit tests for data pipelines that prepare training and inference data
Answers: A, D, E

Integration tests verify that the application correctly communicates with the LLM service, handling requests and responses.

Why this answer

Unit tests for data pipelines ensure data quality. Integration tests verify API interactions. Evaluation frameworks assess output quality (e.g., correctness, syntax).

Load testing measures performance rather than correctness. Regression testing matters, but option B restricts it to the period after model updates; the three chosen answers cover data, integration, and output quality.
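
One possible custom correctness metric for a code-generation model is a syntax pass rate. The sketch below assumes the model emits Python snippets and uses the standard-library `ast` module; the function names and sample outputs are hypothetical.

```python
import ast


def is_valid_python(snippet: str) -> bool:
    """Return True if the snippet parses as Python source."""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False


def syntax_pass_rate(snippets) -> float:
    """Fraction of generated snippets that are syntactically valid."""
    if not snippets:
        return 0.0
    return sum(is_valid_python(s) for s in snippets) / len(snippets)


# Hypothetical model outputs, written by hand for this example
generated = [
    "def add(a, b):\n    return a + b\n",
    "def broken(:\n    pass\n",  # deliberately malformed
    "print('hello')\n",
]

rate = syntax_pass_rate(generated)
print(f"syntax pass rate: {rate:.2f}")
```

A syntax check only shows that the code parses. Functional correctness would still need test cases executed against the generated code.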

109
Multi-Selecthard

An organization is deploying an AI agent that uses the ReAct pattern to answer customer queries by calling external APIs. Which THREE components are essential in this agentic workflow?

Select 3 answers
A.A planning algorithm to decompose tasks
B.Fine-tuning the LLM on domain-specific data
C.Multi-step reasoning with a reasoning trace
D.Tool use / function calling
E.A vector store for long-term memory
AnswersA, C, D

Planning agents decompose complex queries into sub-tasks, which is part of advanced agentic workflows.

Why this answer

The ReAct pattern interleaves reasoning traces with tool calls (function calling), enabling the agent to plan and execute actions step by step. Tool use and multi-step reasoning are core to the pattern.

110
MCQeasy

Which similarity measure is commonly used in vector search to find the angle between vectors, making it well-suited for high-dimensional embeddings?

A.Manhattan distance
B.Euclidean distance
C.Dot product
D.Cosine similarity
AnswerD

Cosine similarity is the standard metric for semantic similarity in vector databases.

Why this answer

Cosine similarity measures the cosine of the angle between two vectors, ranging from -1 to 1, and is robust to magnitude differences.

111
Multi-Selecteasy

A team is building an AI-powered recommendation system for an e-commerce platform. They want to test the system before deployment. Which TWO types of testing are MOST relevant for this AI system? (Select TWO)

Select 2 answers
A.Load testing the web server
B.Integration tests for API calls
C.Evaluation frameworks for model output quality
D.Unit tests for data pipelines
E.Regression testing on the UI
AnswersC, D

Evaluation frameworks measure recommendation accuracy and relevance.

Why this answer

Unit tests for data pipelines ensure data integrity, and evaluation frameworks for model output quality test whether the recommendations are relevant.

112
MCQhard

A developer is building a RAG system and needs to choose a similarity metric for retrieving document chunks. The embedding model they use produces normalized vectors (unit vectors). Which similarity metric is equivalent to cosine similarity in this case?

A.Jaccard similarity
B.Euclidean distance
C.Manhattan distance
D.Dot product
AnswerD

For unit vectors, dot product equals cosine similarity because ||a|| ||b|| = 1.

Why this answer

For normalized vectors, cosine similarity and dot product are equivalent because the dot product equals the cosine of the angle times the product of magnitudes (which are 1).
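
The equivalence can be checked numerically. The sketch below normalizes two hand-picked vectors (assumed for illustration) and compares the dot product with cosine similarity. It also shows that for unit vectors the squared Euclidean distance equals 2 - 2·cos, so Euclidean distance yields the same ranking without being numerically equal to cosine similarity.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Illustrative vectors, normalized to unit length
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

dot_ab = dot(a, b)
cos_ab = cosine_similarity(a, b)
squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))

print(f"dot product:         {dot_ab:.4f}")
print(f"cosine similarity:   {cos_ab:.4f}")
print(f"squared distance:    {squared_distance:.4f}")
print(f"2 - 2 * cosine:      {2 - 2 * cos_ab:.4f}")
```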

113
MCQeasy

Which component in a RAG system is responsible for converting document chunks into numerical representations that enable similarity search?

A.Vector store index
B.Document chunker
C.Large language model (LLM)
D.Embedding model
AnswerD

The embedding model converts text chunks into vector embeddings.

Why this answer

An embedding model (or encoder) transforms text into dense vectors. The vector store indexes these vectors, and the LLM generates answers. Chunking is the splitting step.
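
The division of labor between the embedding step and the vector store can be sketched as follows. This is a toy under stated assumptions: `embed` uses plain word counts as a stand-in for a real embedding model (which would produce dense vectors), the "vector store" is an in-memory list, and the chunks are invented.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: word counts as a sparse vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Output of the chunking step (hypothetical policy text)
chunks = [
    "Refunds are issued within 14 days of a return request.",
    "Employees accrue vacation days monthly.",
    "Passwords must be rotated every 90 days.",
]

# The "vector store": each chunk kept alongside its embedding
index = [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(query, k=1):
    """Embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


top = retrieve("How long do refunds take after a return?")
print(top)
```

In a full RAG system the retrieved chunks would then be passed to the LLM as context for answer generation.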

114
MCQhard

An AI team is deploying a fine-tuned LLM for a code generation assistant. They need to ensure the model outputs only syntactically valid JSON for integration with downstream systems. Which prompt engineering technique is MOST effective for enforcing structured output?

A.Enable JSON mode in the API call, specifying the desired JSON schema
B.Provide a few-shot example of a valid JSON response in the prompt
C.Include a system prompt that says 'You are a helpful coding assistant.'
D.Use chain-of-thought prompting to have the model reason step-by-step before answering
AnswerA

JSON mode forces the model to output valid JSON, often with schema constraints, ensuring downstream parsability.

Why this answer

JSON mode (or structured output) instructs the model to output valid JSON, often with schema enforcement. Few-shot examples may help but aren't as reliable as a dedicated mode. System prompts set the role but don't enforce syntax.

Chain-of-thought improves reasoning but not format.
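
Whichever technique is used, downstream code can still validate what it receives. The sketch below is a defensive check using only the standard-library `json` module; the two-field schema and the function names are assumptions for illustration and do not represent any particular provider's JSON mode API.

```python
import json

# Hypothetical schema: field name -> expected Python type
REQUIRED_FIELDS = {"language": str, "code": str}


def parse_model_output(raw: str) -> dict:
    """Parse and validate model output; raise ValueError if it is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    return data


def is_usable(raw: str) -> bool:
    """Convenience wrapper that reports validity as a boolean."""
    try:
        parse_model_output(raw)
        return True
    except ValueError:
        return False


# Hand-written sample outputs
print(is_usable('{"language": "python", "code": "print(1)"}'))
print(is_usable('Sure! Here is the JSON: {"language": "python"}'))
print(is_usable('{"language": "python"}'))
```

The second sample shows the typical failure without an enforced format: conversational text wrapped around the JSON makes the whole response unparsable.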

115
MCQeasy

In the AI project lifecycle, after a model is trained and evaluated, it is deployed to a production environment. What is the NEXT critical step to ensure the model continues to perform well over time?

A.Collect more training data
B.Archive the model and start a new project
C.Monitor the model's performance and data drift
D.Re-train the model from scratch
AnswerC

Continuous monitoring detects performance degradation, data drift, or concept drift, enabling proactive maintenance.

Why this answer

Monitoring tracks model performance, data drift, and concept drift in production, triggering retraining or alerts when degradation is detected.

116
MCQhard

An AI developer is building an agent that can book flights and hotels by calling external APIs. The agent needs to decide which API to call and in what order based on user requests. Which pattern is BEST suited for this multi-step reasoning and tool use?

A.Implement a simple Retrieval-Augmented Generation (RAG) pipeline
B.Fine-tune a model to output API call sequences directly
C.Use function calling with a fixed sequence of API calls
D.Apply the ReAct pattern (Reasoning and Acting)
AnswerD

ReAct interleaves reasoning steps (thinking about what to do next) with actions (calling APIs), enabling dynamic multi-step workflows.

Why this answer

The ReAct pattern (Reasoning + Acting) combines chain-of-thought reasoning with tool use, allowing the agent to plan, call APIs, and incorporate results. Function calling alone doesn't provide multi-step reasoning. Simple RAG is for retrieval, not action.

Fine-tuning doesn't give dynamic tool selection.
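
The interleaving of reasoning and acting can be sketched as a loop. In this minimal example the LLM is replaced by a scripted function and the tools are stubs, so every name and return value here is hypothetical. A real agent would prompt a model at each step and parse its chosen action.

```python
def search_flights(destination):
    """Stub tool standing in for a flight API."""
    return f"Flight to {destination} found: FL123"


def book_hotel(destination):
    """Stub tool standing in for a hotel API."""
    return f"Hotel in {destination} booked: HT456"


TOOLS = {"search_flights": search_flights, "book_hotel": book_hotel}


def scripted_model(goal, observations):
    """Stand-in for the LLM: returns (thought, action, argument)."""
    if len(observations) == 0:
        return ("I need a flight first.", "search_flights", goal)
    if len(observations) == 1:
        return ("Flight found; now a hotel.", "book_hotel", goal)
    return ("Both steps are done.", "finish", None)


def react_loop(goal, max_steps=5):
    """Alternate thought -> action -> observation until the model finishes."""
    observations, trace = [], []
    for _ in range(max_steps):
        thought, action, arg = scripted_model(goal, observations)
        trace.append(("thought", thought))
        if action == "finish":
            break
        observation = TOOLS[action](arg)  # act, then feed the result back
        trace.append(("action", action))
        trace.append(("observation", observation))
        observations.append(observation)
    return trace, observations


trace, observations = react_loop("Lisbon")
for kind, content in trace:
    print(f"{kind}: {content}")
```

Each observation is fed back into the next decision. That feedback is what separates this pattern from a fixed sequence of function calls.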

117
MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Fine-tune a base LLM on the policy documents monthly
B.Train a custom model from scratch on the policy documents each month
C.Use a larger foundation model with a longer context window and paste all documents into each prompt
D.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store
AnswerD

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.

118
Multi-Selectmedium

An AI team is evaluating whether to use AI for a customer segmentation task. They have a dataset of customer demographics and purchase history. Which TWO conditions would make AI a better choice than a traditional rule-based approach? (Select two.)

Select 2 answers
A.The segmentation criteria are well-understood and can be expressed in simple if-then rules
B.The data contains complex, non-linear patterns that are not easily captured by rules
C.The business requires the model to adapt automatically as new customer data arrives
D.The segmentation must be fully explainable to regulators
E.The team has no access to labeled data
AnswersB, C

Why this answer

Option B is correct because AI techniques like neural networks or gradient-boosted trees excel at capturing complex, non-linear interactions in high-dimensional data (e.g., purchase sequences combined with demographics) that rule-based systems cannot express without an explosion of brittle, hand-crafted conditions. This makes AI the better choice when the underlying patterns are not linearly separable or easily codified as if-then logic.

Exam trap

Cisco often tests the misconception that AI is always superior to rule-based systems, but the trap here is that candidates overlook the specific constraints of explainability (Option D) and data requirements (Option E) that make rule-based approaches more appropriate in those contexts.

119
Multi-Selecthard

A machine learning engineer is deploying a production model that requires strict monitoring. Which TWO monitoring strategies should be implemented to detect data drift and model degradation? (Choose TWO.)

Select 2 answers
A.Logging all input data for manual review
B.Monitoring prediction confidence scores over time
C.Monitoring the distribution of model predictions
D.Monitoring input feature distribution (e.g., via PSI)
E.Retraining the model weekly as a routine
AnswersC, D

Shifts in prediction distribution can signal model decay or data drift.

Why this answer

Tracking prediction distribution shifts and monitoring feature distribution over time help detect drift and degradation.
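
A Population Stability Index (PSI) check can be sketched with the standard library alone. PSI sums (actual - expected) × ln(actual / expected) over bins. The bin edges, sample values, and the small `eps` floor used to avoid division by zero are assumptions for this example.

```python
import bisect
import math


def bin_proportions(values, edges, eps=1e-6):
    """Share of values in each bin; out-of-range values go to the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        i = min(max(i, 0), len(counts) - 1)
        counts[i] += 1
    # Floor at eps so empty bins do not cause log(0) or division by zero
    return [max(c / len(values), eps) for c in counts]


def psi(expected, actual, edges):
    """Population Stability Index between a baseline and a new sample."""
    e = bin_proportions(expected, edges)
    a = bin_proportions(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


edges = [0, 25, 50, 75, 100]
baseline = [10, 20, 30, 40, 55, 60, 70, 80, 90, 95]    # training-time feature values
production = [40, 55, 60, 65, 70, 80, 85, 90, 95, 99]  # values shifted upward

psi_same = psi(baseline, baseline, edges)
psi_drift = psi(baseline, production, edges)
print(f"PSI baseline vs itself:     {psi_same:.3f}")
print(f"PSI baseline vs production: {psi_drift:.3f}")
```

A commonly cited rule of thumb treats PSI above roughly 0.25 as a significant shift, though appropriate thresholds depend on the feature and the use case. The same function can be applied to the distribution of model predictions.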

120
MCQmedium

A team is implementing a RAG system for legal document retrieval. The documents are long and cover multiple topics. Which chunking strategy is MOST appropriate to ensure each chunk contains coherent information?

A.Hierarchical chunking with overlapping windows
B.Semantic chunking based on topic boundaries
C.Fixed-size chunking with 512 tokens
D.Character-level chunking with no overlap
AnswerB

Semantic chunking preserves coherent blocks of text (e.g., paragraphs or sections), improving retrieval and downstream generation accuracy.

Why this answer

Semantic chunking based on topic boundaries is the most appropriate strategy because legal documents are long and cover multiple topics. By splitting at natural topic shifts (e.g., clauses, sections, or argument transitions), each chunk preserves coherent meaning, which is critical for accurate retrieval and generation in a RAG system. This approach avoids mixing unrelated content within a single chunk, which would degrade the quality of retrieved context.

Exam trap

Cisco often tests the misconception that fixed-size token chunking is always optimal for simplicity, but in domain-specific RAG systems with long, multi-topic documents, semantic boundaries are essential to maintain chunk coherence and retrieval accuracy.

How to eliminate wrong answers

Option A is wrong because hierarchical chunking with overlapping windows adds complexity and redundancy without guaranteeing topic coherence; overlapping windows can introduce duplicate or fragmented information across chunks, which is inefficient for retrieval. Option C is wrong because fixed-size chunking with 512 tokens ignores semantic boundaries, often splitting a single legal argument or clause across two chunks, leading to incomplete or misleading context for the LLM. Option D is wrong because character-level chunking with no overlap destroys all semantic structure, producing arbitrary fragments that are useless for coherent retrieval and generation.
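
Splitting at structural boundaries can be sketched with a regular expression. The heading pattern below ("Section N" or "Clause N.M" at the start of a line) and the sample contract text are assumptions. Real legal documents vary, and topic-based semantic chunking may also rely on embeddings rather than headings.

```python
import re

# Hypothetical heading convention: "Section 2" or "Clause 2.1" at line start
HEADING = re.compile(r"^(?:Section|Clause)\s+\d+(?:\.\d+)*", re.MULTILINE)


def chunk_by_headings(text):
    """Split text so that each chunk starts at a heading."""
    starts = [m.start() for m in HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    bounds = starts + [len(text)]
    chunks = [text[s:e].strip() for s, e in zip(bounds, bounds[1:])]
    return [c for c in chunks if c]


document = (
    "Section 1 Definitions\n"
    '"Supplier" means the party providing the goods.\n'
    "\n"
    "Section 2 Payment\n"
    "Invoices are due within 30 days.\n"
    "Clause 2.1 Late fees\n"
    "Late payments accrue interest.\n"
)

chunks = chunk_by_headings(document)
for i, chunk in enumerate(chunks, start=1):
    print(f"--- chunk {i} ---")
    print(chunk)
```

Unlike a fixed 512-token window, no clause is cut in half. The trade-off is that chunk sizes become uneven.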

121
MCQeasy

During the data preparation phase of an AI project, a data scientist discovers that the target variable in a binary classification dataset is heavily imbalanced: 95% negative class and 5% positive class. Which technique should be applied to improve model performance on the minority class?

A.Apply oversampling of the minority class using techniques like SMOTE
B.Remove all samples from the majority class to balance the dataset
C.Normalize all numerical features to have zero mean and unit variance
D.Use a train-test split of 80-20 without any modification
AnswerA

SMOTE generates synthetic samples for the minority class, balancing the dataset and improving recall.

Why this answer

Oversampling the minority class (e.g., with SMOTE) and undersampling the majority class are both standard techniques for handling imbalanced datasets and improving recall on the minority class.
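
The idea behind SMOTE, creating synthetic minority samples by interpolating between a sample and one of its nearest minority neighbours, can be sketched in plain Python. This is a simplified illustration with made-up data, not the reference algorithm or a library implementation, and the function names are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority samples and neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        # k nearest minority neighbours of the chosen base point
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Made-up 95/5 split: 95 majority points, 5 minority points
majority = [(float(i), float(i % 7)) for i in range(95)]
minority = [(100.0, 1.0), (101.0, 2.0), (102.0, 1.5), (103.0, 2.5), (104.0, 1.0)]

new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Oversampling is normally applied to the training split only, so that validation and test sets keep the original class distribution.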

122
MCQmedium

A company is deciding between fine-tuning and RAG for a domain-specific legal assistant that must provide accurate answers based on a changing set of statutes and regulations. The statutes are updated quarterly. Which approach is PREFERRED and why?

A.Fine-tuning, because it allows the model to internalize the statutes for faster inference
B.Fine-tuning with LoRA, because it is parameter-efficient and can be updated frequently
C.RAG, because it can retrieve the latest documents without retraining the model
D.RAG with fine-tuning on the initial statutes, then update the index quarterly
AnswerC

RAG indexes the latest documents and retrieves relevant chunks at query time, ensuring answers are based on current statutes without any model retraining.

Why this answer

RAG is preferred when the knowledge base changes frequently, as it retrieves the latest documents at inference time without requiring model retraining.

123
MCQmedium

A team is building a recommendation system for an e-commerce platform. They want to use collaborative filtering but have a cold-start problem for new users. Which hybrid approach BEST addresses cold start while leveraging collaborative signals?

A.Apply user clustering based on demographic data and then use collaborative filtering within clusters
B.Use only content-based filtering for all users
C.Use matrix factorization with implicit feedback only
D.Implement a hybrid model that combines content-based features with collaborative filtering via a weighted ensemble
AnswerD

Hybrid leverages content features for new users and collaborative signals for warm users, balancing both.

Why this answer

Content-based filtering relies on user and item features, so it can score items for users with no interaction history. Combining it with collaborative filtering preserves accuracy for users who do have history. Clustering or matrix factorization alone does not solve cold start.
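
A weighted ensemble can be sketched as a blend whose weight shifts toward collaborative filtering as a user accumulates interactions. The linear weighting schedule, the `warm_after` threshold, and the scores below are assumptions for illustration. The question itself does not prescribe a weighting scheme.

```python
def hybrid_score(content_score, collab_score, n_interactions, warm_after=20):
    """Blend two scores; the collaborative weight grows with user history."""
    w_collab = min(n_interactions / warm_after, 1.0)
    return (1 - w_collab) * content_score + w_collab * collab_score


# New user: no history, so the score comes entirely from content features
new_user = hybrid_score(content_score=0.8, collab_score=0.0, n_interactions=0)

# Partially warm user: both signals contribute equally
mid_user = hybrid_score(content_score=0.8, collab_score=0.4, n_interactions=10)

# Warm user: the score comes entirely from collaborative signals
warm_user = hybrid_score(content_score=0.8, collab_score=0.4, n_interactions=20)

print(new_user, mid_user, warm_user)
```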

124
MCQmedium

A team is training an image classification model. They split the dataset into training, validation, and test sets. After training, the model achieves 98% accuracy on the training set but only 72% on the test set. Which step in the AI project lifecycle should the team focus on?

A.Data acquisition – collect more data
B.Model selection – use regularization or reduce model complexity
C.Deployment – re-deploy with a different serving framework
D.Data preparation – check for train/test leakage
AnswerB

The high training accuracy and low test accuracy is classic overfitting. Regularization, dropout, or simpler models can reduce the gap.

Why this answer

The large gap between training and test accuracy indicates overfitting, which is a model selection and regularization issue. The team should apply techniques such as dropout or data augmentation, or reduce model complexity.

125
MCQhard

A data scientist notices that a model's performance on the training set is excellent, but validation accuracy is poor. The team used the same dataset for feature engineering and model selection. What is the MOST likely cause?

A.Train/test leakage caused by using the same data for feature engineering and model selection
B.The dataset is too small for the model complexity
C.The learning rate is too high
D.The model is overfitting due to high variance
AnswerA

Using validation data to guide feature engineering leaks information, inflating training performance and hurting generalization.

Why this answer

Reusing the same data for feature engineering and model selection leaks information from the validation set into training decisions, leading to overfitting and poor generalization.
