A data engineer notices that an AWS Glue job processing data from an Amazon S3 bucket frequently fails with 'OutOfMemoryError'. The job reads CSV files, applies transformations, and writes Parquet to another S3 bucket. It runs on 10 workers of type G.1X. Which change is MOST likely to resolve the issue?
Trap 1: Increase the number of workers to 20
Adding workers spreads the data across more executors but does not increase the memory available to each one (a G.1X worker still has 16 GB), so a task that exceeds executor memory still fails.
Trap 2: Change the worker type from G.1X to G.8X
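G.8X provides 8x the memory of G.1X (128 GB versus 16 GB per worker) and would likely stop the error, but it costs far more than the problem calls for; G.2X is the smaller, more cost-effective step.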
Trap 3: Enable the Spark UI to monitor memory and tune the job
The Spark UI helps diagnose where memory is consumed, but enabling it does not change the memory available to the job; a corrective change is still needed.
- A
Change the worker type from G.1X to G.2X
G.2X provides twice the memory of G.1X per worker (32 GB versus 16 GB), which directly addresses the OutOfMemoryError without the cost of a much larger worker type.
- B
Increase the number of workers to 20
Why wrong: Adding workers spreads the data across more executors but does not increase the memory available to each one (a G.1X worker still has 16 GB), so a task that exceeds executor memory still fails.
- C
Change the worker type from G.1X to G.8X
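Why wrong: G.8X provides 8x the memory of G.1X (128 GB versus 16 GB per worker) and would likely stop the error, but it costs far more than the problem calls for; G.2X is the smaller, more cost-effective step.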
- D
Enable the Spark UI to monitor memory and tune the job
Why wrong: The Spark UI helps diagnose where memory is consumed, but enabling it does not change the memory available to the job; a corrective change is still needed.
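
As a rough illustration of answer A, the sketch below compares per-worker memory across worker types and builds the arguments for a job run on G.2X. The job name `csv-to-parquet` and the helper `build_run_args` are hypothetical names chosen for this example; the memory figures are the per-worker values AWS documents for these worker types, and the commented-out call assumes boto3 and valid AWS credentials are available.

```python
# Memory per worker in GB for AWS Glue worker types.
WORKER_MEMORY_GB = {"G.1X": 16, "G.2X": 32, "G.4X": 64, "G.8X": 128}


def per_worker_memory_gb(worker_type: str) -> int:
    """Return the memory of a single worker; independent of worker count."""
    return WORKER_MEMORY_GB[worker_type]


def build_run_args(job_name: str, worker_type: str, number_of_workers: int) -> dict:
    """Build keyword arguments for a Glue job run (hypothetical helper)."""
    if worker_type not in WORKER_MEMORY_GB:
        raise ValueError(f"unknown worker type: {worker_type}")
    return {
        "JobName": job_name,
        "WorkerType": worker_type,
        "NumberOfWorkers": number_of_workers,
    }


if __name__ == "__main__":
    # Doubling the worker count (option B) leaves per-worker memory unchanged.
    print("G.1X per worker:", per_worker_memory_gb("G.1X"), "GB")
    # Switching to G.2X (option A) doubles memory on every worker.
    print("G.2X per worker:", per_worker_memory_gb("G.2X"), "GB")

    args = build_run_args("csv-to-parquet", "G.2X", 10)
    print(args)

    # To start the run for real (assumes boto3 and AWS credentials):
    # import boto3
    # boto3.client("glue").start_job_run(**args)
```

Keeping the worker count at 10 and changing only the worker type isolates the variable that matters here, memory per worker, which makes it easier to confirm that the change is what resolved the error.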