A company runs a daily batch ETL job using AWS Glue. The job processes 500 GB of data from Amazon RDS to Amazon S3. The job currently uses a single DPU and takes 6 hours to complete. The team wants to reduce runtime to under 1 hour without increasing costs significantly. Which approach should they use?
More workers enable parallelism, reducing runtime.
Why this answer
Option D is correct because increasing the number of workers (DPUs) allows parallel processing, reducing runtime. Option A is wrong because the job type (Spark vs Python) affects resource usage but increasing workers is more direct. Option B is wrong because using Spark in Glue (which is default) already offers parallelism.
Option C is wrong because using a larger instance type for RDS may improve read throughput but is not a Glue optimization and could increase database cost.