CCNA Data Store Management Questions — Page 7 of 7

451

MCQhard

A company uses Amazon DynamoDB for a gaming application. The table has a partition key of 'user_id' and a sort key of 'game_timestamp'. The application frequently queries by 'user_id' and filters by 'game_timestamp' within a specific date range. The queries are slow. The table has a global secondary index (GSI) on 'game_timestamp'. What is the most likely cause of the slow queries?

A.The GSI has insufficient read capacity.

B.The GSI is used instead of the base table for queries on 'user_id'.

C.A hot partition exists due to uneven access pattern on 'user_id'.

D.The sort key is not used in the query.

AnswerC

If a few 'user_id' values are accessed frequently, they create hot partitions, slowing queries.

Why this answer

Option B is correct because querying by 'user_id' using the base table is efficient, but if the GSI is being used for queries that filter by 'game_timestamp' without the partition key, the GSI may not be designed optimally. However, the stem says queries are by 'user_id' and filter by 'game_timestamp' – that should use the base table. Option A is not likely because sort key filtering is efficient.

Option C (hot partition) is possible if 'user_id' distribution is skewed. Option D (GSI write capacity) doesn't affect reads. The most likely cause is a hot partition due to uneven 'user_id' distribution.

Practice this question →

452

MCQeasy

A company uses Amazon Redshift for its data warehouse. The data engineering team loads data daily from Amazon S3 using COPY commands. Recently, the load performance has degraded because the S3 bucket contains many small files. The team needs to optimize the COPY operation to improve performance. Which approach should they take?

A.Use Redshift Spectrum to query the data directly from S3 without loading.

B.Increase the number of nodes in the Redshift cluster.

C.Use a manifest file that lists only the necessary files, and consolidate small files into larger ones before loading.

D.Enable automatic compression on the Redshift table.

AnswerC

Fewer large files improve COPY performance.

Why this answer

Option C is correct because the performance degradation is caused by the overhead of processing many small files during the COPY command. Consolidating small files into larger ones (e.g., 100 MB–1 GB each) reduces the number of S3 GET requests and the metadata overhead on Redshift, directly improving load throughput. Using a manifest file further optimizes by explicitly listing only the required files, avoiding unnecessary S3 list operations.

Exam trap

The trap here is that candidates often confuse scaling the cluster (Option B) with optimizing data ingestion, failing to recognize that the bottleneck is the number of S3 objects, not the cluster's compute capacity.

How to eliminate wrong answers

Option A is wrong because Redshift Spectrum queries data in place from S3 without loading it into Redshift tables, which does not optimize the COPY operation for loading data into the warehouse. Option B is wrong because increasing the number of nodes adds compute and storage capacity but does not address the root cause of many small files; the COPY command still suffers from the same per-file overhead regardless of cluster size. Option D is wrong because automatic compression (via the COPY command with the COMPUPDATE option) optimizes column encoding for storage efficiency, not the file-level I/O performance during the load process.

Practice this question →

453

MCQeasy

A data engineer is designing a data lake on Amazon S3 for storing raw sensor data. The data is append-only and accessed infrequently after 30 days. Compliance requires that data be retained for 7 years. Which S3 storage class is the MOST cost-effective for data older than 30 days?

A.S3 Standard-IA

B.S3 Glacier Deep Archive

C.S3 One Zone-IA

D.S3 Intelligent-Tiering

AnswerB

This is the lowest-cost storage class for long-term archival data with infrequent access.

Why this answer

B is correct because Amazon S3 Glacier Deep Archive is the most cost-effective storage class for data that is accessed infrequently and must be retained for long periods (7 years). For data older than 30 days, the retrieval time of 12 hours is acceptable given the append-only, infrequent access pattern, and the storage cost is significantly lower than other classes.

Exam trap

The trap here is that candidates often choose S3 Glacier Flexible Retrieval (not listed) or S3 Standard-IA, mistakenly thinking that faster retrieval is necessary for compliance data, when in fact the 12-hour retrieval time of Glacier Deep Archive is sufficient for infrequent access patterns and offers the lowest cost.

How to eliminate wrong answers

Option A is wrong because S3 Standard-IA is designed for infrequently accessed data but has higher storage costs than Glacier Deep Archive, making it less cost-effective for 7-year retention. Option C is wrong because S3 One Zone-IA does not provide the durability of 99.999999999% across multiple Availability Zones, which is critical for compliance-retained data, and its storage cost is higher than Glacier Deep Archive. Option D is wrong because S3 Intelligent-Tiering automatically moves data between tiers but incurs a monthly monitoring and automation fee per object, and it does not include a Deep Archive tier by default, so it would not achieve the lowest cost for data older than 30 days without manual configuration.

Practice this question →

454

MCQhard

A company is running a production Amazon Aurora PostgreSQL database. The database experiences high write latency during peak hours. The data engineer suspects that the issue is due to a large number of small transactions. Which action would MOST effectively reduce write latency?

A.Enable parallel query for the database

B.Increase the instance size and use Provisioned IOPS storage

C.Enable Aurora Auto Scaling for read replicas

D.Enable Performance Insights to identify the bottleneck

AnswerB

Larger instances provide more CPU and memory, and Provisioned IOPS can reduce I/O latency, helping with write performance under high transaction loads.

Why this answer

Increasing the instance size and using Provisioned IOPS storage directly addresses high write latency by providing more CPU and memory resources to handle transaction processing, while Provisioned IOPS ensures consistent, low-latency I/O for write operations. This is the most effective action because small transactions create high I/O demand, and scaling up the instance with dedicated IOPS reduces contention and write queue depth.

Exam trap

The trap here is that candidates often confuse scaling read replicas (which only help read scaling) with solving write latency, or they mistake monitoring tools (like Performance Insights) for performance fixes, when the real solution is to provision more write capacity through larger instances and dedicated IOPS.

How to eliminate wrong answers

Option A is wrong because parallel query is designed for read-heavy analytical queries, not for reducing write latency from small transactions; it does not improve write throughput or I/O performance. Option C is wrong because Aurora Auto Scaling for read replicas only scales read capacity, not write capacity; write latency is a primary node issue and read replicas do not offload write operations. Option D is wrong because Performance Insights is a monitoring and diagnostic tool that helps identify bottlenecks but does not directly reduce write latency; it provides visibility but no performance improvement.

Practice this question →

455

Multi-Selectmedium

A data engineer is designing a data lake on Amazon S3 that will store sensitive financial data. The engineer needs to implement encryption at rest and ensure that only authorized users can access the data. Which TWO actions should the engineer take to meet these requirements? (Choose TWO.)

Select 2 answers

A.Configure a bucket policy that denies writes if the object is not encrypted.

B.Use server-side encryption with customer-provided keys (SSE-C).

C.Enable S3 Transfer Acceleration for the bucket.

D.Enable object-level access control lists (ACLs).

E.Create IAM policies that grant least privilege access to users.

AnswersA, E

Bucket policies can enforce encryption and control access.

Why this answer

Option A is correct because S3 bucket policies can be used to enforce encryption and control access. Option C is correct because IAM policies define user permissions. Option B is incorrect because SSE-C is not recommended as it requires managing your own keys.

Option D is incorrect because S3 Transfer Acceleration is for speed, not security. Option E is incorrect because ACLs are legacy and not recommended for access control.

Practice this question →

456

MCQmedium

A data engineer is troubleshooting a slow Amazon Redshift query that joins several large tables. The query plan shows a large number of broadcasts. Which design change would most likely reduce the broadcast operations?

A.Change the SORT KEY on all tables to match the join column.

B.Change the DISTSTYLE to EVEN on all tables.

C.Change the DISTKEY on all tables to match the join column.

D.Change the DISTSTYLE to ALL on all large tables.

AnswerC

Matching DISTKEY on join columns ensures data is co-located, avoiding broadcasts.

Why this answer

Option C is correct because setting the DISTKEY on all tables to the join column ensures that rows with the same join key value are co-located on the same compute node. This allows Redshift to perform a collocated join, eliminating the need to broadcast entire tables across the network, which is the primary cause of the slow query.

Exam trap

The trap here is that candidates confuse SORT KEY (which optimizes data skipping and range scans) with DISTKEY (which controls data distribution for joins), leading them to pick Option A, even though broadcast reduction is purely a distribution concern.

How to eliminate wrong answers

Option A is wrong because changing the SORT KEY affects the order of data on disk and can improve range-restricted scans, but it does not influence data distribution across nodes; broadcast operations are caused by distribution mismatches, not sort order. Option B is wrong because changing DISTSTYLE to EVEN distributes rows randomly across nodes, which maximizes the chance that join keys are scattered, forcing Redshift to broadcast rows to satisfy the join. Option D is wrong because changing DISTSTYLE to ALL on large tables copies the entire table to every node, which reduces broadcasts but at the cost of massive storage and maintenance overhead, making it impractical for large tables and often degrading overall performance.

Practice this question →