CCNA Data Operations Support Questions

12 of 387 questions · Page 6/6 · Data Operations Support topic · Answers revealed

376
MCQ · easy

A company uses Amazon Kinesis Data Streams to ingest clickstream data. The data is consumed by a custom consumer application that writes to Amazon S3 every 5 minutes. The consumer is falling behind and processing lag is increasing. Which action is MOST effective to reduce the lag?

A. Switch to Amazon Kinesis Data Firehose to deliver data directly to S3
B. Increase the batch size of records written to S3
C. Increase the number of shards in the Kinesis stream
D. Reduce the retention period of the stream
Answer: C

More shards increase parallelism and throughput, allowing the consumer to keep up.

Why this answer

The consumer is falling behind because the stream's throughput capacity is insufficient for the incoming data volume. Increasing the number of shards in the Kinesis stream directly increases the total read capacity (each shard provides 2 MB/s read throughput and 5 transactions/second), allowing the consumer to process more data in parallel and reduce lag.
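As a rough illustration of the shard arithmetic above, the sketch below estimates the minimum shard count for a given aggregate read rate. The 2 MB/s per-shard figure comes from the explanation; the function name and the 7 MB/s workload are hypothetical.

```python
import math

# Per-shard read limit for shared-throughput consumers, as stated above.
READ_MB_PER_SEC_PER_SHARD = 2


def min_shards_for_read_rate(read_mb_per_sec: float) -> int:
    """Return the smallest shard count whose combined read capacity covers the rate."""
    if read_mb_per_sec <= 0:
        return 1  # a stream always has at least one shard
    return math.ceil(read_mb_per_sec / READ_MB_PER_SEC_PER_SHARD)


# Hypothetical workload: the consumer must read 7 MB/s to keep up.
print(min_shards_for_read_rate(7))
```

This only sizes the stream for read bandwidth; write limits and the 5 reads per second per shard limit would need their own checks.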

Exam trap

The trap here is that candidates often confuse throughput scaling with batch size or delivery destination changes, but the only way to increase read throughput from a Kinesis stream is to increase the number of shards or use enhanced fan-out.

How to eliminate wrong answers

Option A is wrong because switching to Kinesis Data Firehose does not change the underlying stream's throughput; Firehose is a delivery service that still reads from the same shards, so it would not resolve the processing lag. Option B is wrong because increasing the batch size written to S3 only affects the write operation to S3, not the consumer's ability to read from the stream faster; the bottleneck is read throughput, not the S3 write batch size. Option D is wrong because the retention period does not affect read throughput. Retention defaults to 24 hours, which is also the minimum, so it cannot be reduced below the default; a shorter retention window would in any case only make data expire sooner, risking data loss without helping the consumer catch up.

377
MCQ · easy

A company uses Amazon S3 to store raw data and AWS Lambda to process files as they arrive. The Lambda function sometimes times out when processing large files. The team wants to improve reliability and scalability. Which approach should the team take?

A. Replace Lambda with AWS Batch and use S3 event notifications to trigger the batch job.
B. Use Amazon S3 event notifications to send events to an Amazon SNS topic, which triggers Lambda.
C. Increase the Lambda function timeout to 15 minutes and memory to 3 GB.
D. Use Amazon S3 event notifications to send events to an Amazon SQS queue, and then have Lambda poll the queue in batches.
Answer: D

Decoupling with SQS allows Lambda to process at its own pace.

Why this answer

Option D is correct because S3 event notifications to SQS decouple the producer and consumer, allowing Lambda to poll at its own pace and process in batches. Option B is wrong because SNS is push-based and can still cause timeouts. Option C is wrong because increasing the Lambda timeout is a temporary fix.

Option A is wrong because AWS Batch is for long-running batch jobs, not event-driven processing.
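A minimal sketch of the SQS-driven pattern, assuming each SQS message body carries one S3 event notification as JSON. The `process_object` helper is a hypothetical placeholder, and the `batchItemFailures` response only takes effect if partial batch responses (`ReportBatchItemFailures`) are enabled on the event source mapping.

```python
import json
from urllib.parse import unquote_plus


def process_object(bucket: str, key: str) -> None:
    """Hypothetical placeholder for the real per-file processing logic."""
    if not key:
        raise ValueError("empty object key")


def handler(event, context):
    """Process a batch of SQS messages, each wrapping an S3 event notification."""
    failures = []
    for message in event.get("Records", []):
        try:
            s3_event = json.loads(message["body"])
            for record in s3_event.get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                # Object keys arrive URL-encoded in S3 event notifications.
                key = unquote_plus(record["s3"]["object"]["key"])
                process_object(bucket, key)
        except Exception:
            # Report only this message as failed so the rest of the batch is not retried.
            failures.append({"itemIdentifier": message["messageId"]})
    return {"batchItemFailures": failures}
```

Messages that fail stay in the queue and become visible again after the visibility timeout, which is what lets the function work through large files at its own pace.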

378
MCQ · easy

A data engineer is troubleshooting a failed AWS Glue Crawler. The crawler logs show 'Insufficient permissions to access S3 bucket'. What should the engineer do to resolve this?

A. Grant the crawler's IAM user access to the bucket
B. Attach a VPC endpoint to the S3 bucket
C. Enable S3 default encryption on the bucket
D. Update the IAM role used by the crawler to include S3 read permissions
Answer: D

The role must have s3:GetObject and s3:ListBucket.

Why this answer

The AWS Glue Crawler uses an IAM role to access data sources. The error 'Insufficient permissions to access S3 bucket' indicates that the IAM role attached to the crawler lacks the necessary S3 read permissions (e.g., s3:GetObject, s3:ListBucket). Updating the IAM role's policy to include these permissions resolves the issue, as the crawler operates under that role, not under a specific IAM user.
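For illustration, a policy document granting the two read permissions named above might look like the following sketch. The bucket name is hypothetical, and a real crawler role also needs Glue service permissions that are not shown here.

```python
import json

BUCKET = "example-raw-data-bucket"  # hypothetical bucket name

crawler_s3_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # s3:ListBucket is granted on the bucket itself.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}"],
        },
        {
            # s3:GetObject is granted on the objects inside the bucket.
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/*"],
        },
    ],
}

print(json.dumps(crawler_s3_read_policy, indent=2))
```

The two statements use different resource ARNs because listing is a bucket-level action while reading is an object-level action.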

Exam trap

The trap here is that candidates may confuse the crawler's execution context with an IAM user, leading them to choose Option A, but AWS Glue Crawlers always run under an IAM role, not a user.

How to eliminate wrong answers

Option A is wrong because AWS Glue Crawlers do not use an IAM user for execution; they use an IAM role. Granting access to an IAM user would not affect the crawler's permissions. Option B is wrong because a VPC endpoint enables private connectivity between a VPC and S3 but does not grant or modify IAM permissions; the error is about authorization, not network connectivity.

Option C is wrong because enabling S3 default encryption controls server-side encryption settings and does not affect IAM permission policies; the crawler still needs explicit read access regardless of encryption.

379
MCQ · medium

A company runs a data pipeline using AWS Lambda to process records from an Amazon Kinesis Data Stream. Recently, the Lambda function has been experiencing high invocation errors and the stream is throttling. The function performs simple transformations and writes to Amazon S3. What is the most effective way to reduce throttling and errors?

A. Increase the Lambda function timeout.
B. Enable provisioned concurrency on the Lambda function.
C. Increase the number of shards in the Kinesis stream.
D. Increase the batch size in the Lambda event source mapping.
Answer: D

Larger batch sizes mean fewer invocations, reducing throttling and errors.

Why this answer

Increasing the batch size in the Lambda event source mapping allows each invocation to process more records from the Kinesis stream, reducing the number of total invocations. This lowers the rate at which Lambda polls the stream, which decreases the likelihood of hitting the Kinesis read throughput limits (5 transactions per second per shard) and reduces throttling errors. The simple transformations and S3 writes are likely I/O-bound, so larger batches improve throughput without increasing invocation concurrency.
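The relationship between batch size and invocation count can be sketched with simple arithmetic. This assumes every batch is filled completely, which real traffic will not guarantee; the one-million-record backlog and the function name are hypothetical.

```python
import math


def invocations_needed(record_count: int, batch_size: int) -> int:
    """Return how many invocations it takes to drain record_count records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(record_count / batch_size)


# Hypothetical backlog of one million records at three batch sizes.
for batch_size in (100, 1000, 10000):
    print(batch_size, invocations_needed(1_000_000, batch_size))
```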

Exam trap

The trap here is that candidates mistakenly believe throttling is caused by Lambda concurrency limits or cold starts, when in fact the root cause is the Kinesis stream's per-shard read throughput limit; increasing the batch size in the event source mapping relieves pressure on that limit.

How to eliminate wrong answers

Option A is wrong because increasing the Lambda function timeout does not reduce the invocation rate or the number of concurrent executions; it only allows a single invocation to run longer, which does not address throttling caused by excessive polling or read throughput limits. Option B is wrong because provisioned concurrency pre-warms execution environments to reduce cold starts, but it does not reduce the number of invocations or the rate at which Lambda polls the Kinesis stream; it may even increase concurrency and exacerbate throttling. Option C is wrong because increasing the number of shards would increase the total read throughput capacity of the stream, but it does not reduce the per-shard invocation rate or the number of Lambda invocations; it could actually increase the total number of concurrent invocations, potentially worsening throttling if the batch size remains small.

380
MCQ · medium

A data engineer is running an Amazon Athena query that scans a large amount of data in Amazon S3, resulting in high costs. The data is stored in Parquet format in a partitioned table. Which strategy would be MOST effective in reducing the amount of data scanned?

A. Ensure the query includes a WHERE clause that filters on partition columns.
B. Convert the Parquet files to CSV format and apply GZIP compression.
C. Use S3 Intelligent-Tiering storage class to reduce storage costs.
D. Increase the number of partitions by adding more partition columns.
Answer: A

Partition pruning reduces the amount of data scanned.

Why this answer

Option A is correct because using a WHERE clause on partition columns allows Athena to use partition pruning, scanning only the relevant partitions. Option B is incorrect because converting from Parquet to CSV would increase the data scanned; GZIP compression reduces storage size, but Athena still decompresses and scans the full data if no partition pruning is used. Option D is incorrect because increasing the number of partitions without querying on them does not reduce the scan.

Option C is incorrect because S3 Intelligent-Tiering reduces storage costs and has no effect on the amount of data a query scans.
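Partition pruning can be pictured with a toy model in which each partition is a separate group of files and a filter on the partition column decides which groups are read at all. The partition column `dt` and the byte counts below are hypothetical.

```python
# Hypothetical table layout: bytes stored under each value of a "dt" partition column.
PARTITION_BYTES = {
    "2024-01-01": 400_000_000,
    "2024-01-02": 350_000_000,
    "2024-01-03": 250_000_000,
}


def bytes_scanned(partition_filter=None) -> int:
    """Sum the bytes in partitions that pass the filter; no filter means a full scan."""
    return sum(
        size
        for dt, size in PARTITION_BYTES.items()
        if partition_filter is None or partition_filter(dt)
    )


print(bytes_scanned())                                # no WHERE clause on dt
print(bytes_scanned(lambda dt: dt == "2024-01-03"))  # WHERE dt = '2024-01-03'
```

In this model the filtered query reads a quarter of the data, and adding further partition columns changes nothing unless the query filters on them.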

381
Multi-Select · hard

A company is running a Redshift cluster and wants to improve query performance for a frequently used dashboard. Which THREE approaches are recommended?

Select 3 answers
A. Enable concurrency scaling
B. Apply column compression encoding
C. Define sort keys on columns used in WHERE clauses
D. Add more nodes to the cluster
E. Choose an appropriate distribution key for large tables
Answers: B, C, E

Column compression, sort keys, and distribution keys reduce I/O, storage, and data movement.

Why this answer

Option B is correct because column compression reduces I/O. Option C is correct because sort keys enable range-restricted scans. Option E is correct because distribution keys reduce data movement.

Option A is wrong because concurrency scaling addresses concurrent queries, not single-query speed. Option D is wrong because adding more nodes adds cost and may not be needed.
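The three table-design techniques can be combined in a single table definition. The sketch below holds a Redshift DDL statement in a Python string; the table, column names, and encoding choices are hypothetical and would need to be chosen from the real workload.

```python
# Hypothetical fact table; all names and encodings are illustrative only.
CREATE_SALES_TABLE = """
CREATE TABLE sales (
    sale_id     BIGINT        ENCODE az64,
    customer_id BIGINT        ENCODE az64,
    sale_date   DATE,
    amount      DECIMAL(12,2) ENCODE az64
)
DISTSTYLE KEY
DISTKEY (customer_id)   -- co-locates rows that join on customer_id
SORTKEY (sale_date);    -- lets range filters on sale_date skip blocks
"""

print(CREATE_SALES_TABLE.strip())
```

The distribution key here assumes the dashboard joins on `customer_id` and the sort key assumes it filters on `sale_date`; different access patterns would call for different keys.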

382
MCQ · easy

A company uses AWS DMS to migrate data from an on-premises Oracle database to Amazon RDS for PostgreSQL. The migration completes successfully, but the target database has inconsistent data. What should the team do to ensure data consistency?

A. Use 'Limited LOB mode' and set the maximum LOB size to a higher value.
B. Enable 'Full LOB mode' in the DMS task settings.
C. Restart the DMS task after truncating the target tables.
D. Configure the DMS task to use 'Full LOB mode' with parallel threads and enable 'BatchApply'.
Answer: D

This ensures all LOBs are migrated and applied efficiently.

Why this answer

Option D is correct because using Full LOB mode with parallel threads improves consistency and performance. Option A is wrong because limited LOB mode truncates data that exceeds the maximum LOB size. Option B is wrong because full LOB mode on its own can be slow, but it does not cause inconsistency.

Option C is wrong because restarting the task is not a solution for inconsistency.

383
MCQ · easy

A data engineer is tasked with setting up a data pipeline that moves data from an on-premises Oracle database to Amazon S3 every hour. The network bandwidth is limited, and the engineer needs to ensure data consistency. Which AWS service should the engineer use?

A. AWS DataSync.
B. Amazon Kinesis Data Firehose.
C. S3 Transfer Acceleration.
D. AWS Database Migration Service (DMS) with change data capture (CDC).
Answer: D

DMS supports continuous replication and ensures data consistency via CDC.

Why this answer

AWS DMS with CDC is the correct choice because it can continuously replicate ongoing changes from an on-premises Oracle database to Amazon S3 while ensuring data consistency. CDC captures only the incremental changes (inserts, updates, deletes) after an initial full load, minimizing the data transferred over limited bandwidth and maintaining transactional integrity.
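The bandwidth saving from CDC comes from shipping only the changes after the initial full load. The toy model below applies a list of change events to a snapshot; it is not DMS's implementation or record format, and the event shape and sample rows are hypothetical.

```python
def apply_changes(snapshot: dict, changes: list) -> dict:
    """Apply insert, update, and delete events, in order, to a copy of the snapshot."""
    result = dict(snapshot)
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            result[key] = change["row"]
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# Hypothetical full load followed by one hour of captured changes.
full_load = {1: {"name": "Ana"}, 2: {"name": "Ben"}}
hourly_changes = [
    {"op": "update", "key": 2, "row": {"name": "Benjamin"}},
    {"op": "insert", "key": 3, "row": {"name": "Cy"}},
    {"op": "delete", "key": 1},
]
print(apply_changes(full_load, hourly_changes))
```

Only the three change events cross the network in this model, rather than the whole table, and applying them in order keeps the target consistent with the source.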

Exam trap

The trap here is that candidates often confuse AWS DataSync (a file-transfer service) with database replication, or assume S3 Transfer Acceleration can solve bandwidth issues without addressing the need for change data capture and consistency from a live database.

How to eliminate wrong answers

Option A is wrong because AWS DataSync is designed for large-scale file and object transfers between on-premises storage and AWS, not for streaming database changes from a relational database like Oracle. Option B is wrong because Amazon Kinesis Data Firehose is a streaming ingestion service for real-time data into S3, but it cannot directly connect to an on-premises Oracle database or perform change data capture. Option C is wrong because S3 Transfer Acceleration only speeds up uploads to S3 over the public internet by using AWS edge locations; it does not handle database replication, CDC, or data consistency from an on-premises source.

384
MCQ · hard

A data engineer is troubleshooting an AWS Glue crawler that is not correctly inferring the schema of CSV files stored in Amazon S3. The files have headers, but the crawler is treating the header row as data. The crawler is configured with a custom classifier that has a CSV classifier with 'Column header' set to 'Use first row as header'. What is the most likely reason the crawler is not recognizing the header?

A. The CSV classifier's 'Quote symbol' setting does not match the files.
B. The CSV files have a varying number of columns across rows.
C. The CSV files have a different delimiter than the default comma.
D. The header row contains uppercase letters.
Answer: A

If the classifier expects a quote symbol but the files have none, the classifier may not apply, causing the crawler to treat header as data.

Why this answer

The CSV classifier may not be applied if the file does not match the classifier's 'Quote symbol' or 'Allow single column' settings. Option A is correct because if the classifier expects a quote symbol but the file has none, it may not match. Option C is wrong because the classifier is set to use the first row as the header.

Option D is wrong because the crawler does not require a particular letter case in the header. Option B is wrong because the number of columns is not a header recognition issue.
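For reference, the classifier settings discussed above correspond to fields of the `CsvClassifier` structure in Glue's CreateClassifier API. The sketch below only builds the request as a dictionary; the classifier name and the chosen values are hypothetical.

```python
# Request shape for Glue's CreateClassifier API (CsvClassifier variant).
csv_classifier_request = {
    "CsvClassifier": {
        "Name": "example-csv-classifier",  # hypothetical name
        "Delimiter": ",",
        "QuoteSymbol": '"',                # should match the quoting used in the files
        "ContainsHeader": "PRESENT",       # treat the first row as the header
        "AllowSingleColumn": False,
    }
}

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("glue").create_classifier(**csv_classifier_request)
print(csv_classifier_request["CsvClassifier"]["QuoteSymbol"])
```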

385
MCQ · medium

A company is ingesting streaming data from thousands of IoT devices into Amazon Kinesis Data Streams. The data is processed by a Kinesis Data Analytics application. Recently, the application started reporting high iterator age (millisBehindLatest). Which action would BEST reduce the iterator age?

A. Decrease the data retention period of the Kinesis stream.
B. Increase the data retention period of the Kinesis stream.
C. Increase the record size limit in the Kinesis stream.
D. Increase the number of shards in the Kinesis stream.
Answer: D

More shards allow higher throughput and reduce the backlog, decreasing iterator age.

Why this answer

Option D is correct because increasing the number of shards increases throughput and reduces iterator age. Option A is incorrect because decreasing retention may cause data loss. Option B is incorrect because increasing retention does not affect processing speed.

Option C is incorrect because a larger record size could increase processing time.

386
MCQ · easy

A company stores sensitive customer data in an S3 bucket. The data engineer needs to ensure that all data is encrypted at rest. Which S3 feature should be enabled?

A. S3 Versioning
B. S3 Block Public Access
C. Bucket policy requiring aws:SecureTransport
D. Default encryption
Answer: D

Default encryption automatically encrypts new objects.

Why this answer

Option D is correct because default encryption ensures all new objects are encrypted at rest with SSE-S3 or SSE-KMS. Option A is wrong because versioning does not encrypt data. Option B is wrong because S3 Block Public Access is a security feature for access control, not encryption.

Option C is wrong because a bucket policy requiring aws:SecureTransport enforces encryption in transit (HTTPS requests), not encryption at rest.
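A default-encryption rule is a small configuration document. The sketch below builds the structure that the S3 PutBucketEncryption API expects, using SSE-S3 (`AES256`); the bucket name in the commented call is hypothetical.

```python
# Default encryption rule using SSE-S3 (S3-managed keys).
DEFAULT_ENCRYPTION = {
    "Rules": [
        {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
    ]
}

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_encryption(
#       Bucket="example-bucket",
#       ServerSideEncryptionConfiguration=DEFAULT_ENCRYPTION,
#   )
print(DEFAULT_ENCRYPTION["Rules"][0]["ApplyServerSideEncryptionByDefault"]["SSEAlgorithm"])
```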

387
MCQ · hard

A data pipeline uses Amazon Kinesis Data Firehose to deliver data to an Amazon S3 bucket. The delivery stream is configured with a buffer size of 5 MB and a buffer interval of 60 seconds. The team notices that the S3 objects are much smaller than 5 MB. What is the most likely explanation?

A. The incoming data volume is low, so the 60-second buffer interval triggers delivery before the 5 MB buffer is filled.
B. The S3 bucket has event notifications that split the objects.
C. The S3 bucket has a lifecycle policy that transitions objects to Glacier.
D. The delivery stream is using GZIP compression, which reduces the object size.
Answer: A

Low volume causes frequent small deliveries.

Why this answer

Option A is correct because if the incoming data rate is low, the buffer interval (60 seconds) expires before the buffer size (5 MB) is reached, causing small objects. Option B is wrong because S3 event notifications do not split objects. Option C is wrong because an S3 lifecycle policy transitions objects to another storage class; it does not create small ones.

Option D is wrong because compression would reduce size but would not by itself explain objects much smaller than 5 MB.
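The size-or-interval rule can be sketched as a simplified model: delivery happens at whichever threshold is hit first, so at a steady input rate the object size is the smaller of the buffer size and the data accumulated in one interval. This ignores compression and record boundaries, and the example rates are hypothetical.

```python
def expected_object_bytes(
    rate_bytes_per_sec: float,
    buffer_bytes: int = 5 * 1024 * 1024,  # 5 MB buffer size from the question
    interval_sec: int = 60,               # 60-second buffer interval from the question
) -> int:
    """Estimate delivered object size for a steady input rate."""
    accumulated = int(rate_bytes_per_sec * interval_sec)
    return min(buffer_bytes, accumulated)


# Hypothetical low-volume stream at 10 KB/s: the interval fires first.
print(expected_object_bytes(10 * 1024))
# Hypothetical high-volume stream at 1 MB/s: the size threshold fires first.
print(expected_object_bytes(1024 * 1024))
```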
