A company uses AWS Glue to run ETL jobs that prepare data for machine learning. The source data in S3 has a schema that evolves over time (new columns are added occasionally). The Glue job schema is defined as a fixed schema in the job script. After an update to the source data, the Glue job fails with an error about mismatched schemas. How should the data engineer modify the data preparation process to handle schema evolution?
Dynamic frames with schema detection can adapt to schema changes.
Why this answer
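Option A is correct because AWS Glue DynamicFrames are self-describing: each record carries its own schema, and Glue computes the frame's schema when the data is read, so the job script does not have to declare one up front. New columns in the source therefore show up in the frame instead of causing a mismatch. If a column ends up with more than one type across files, `resolveChoice` lets you decide how to resolve it (e.g., cast to a single type or keep the variants in a struct). `applyMapping` can still be used to rename or cast specific columns, but it outputs only the fields listed in its mapping, so it should not be the step that hard-codes the full schema. This avoids the rigidity of a fixed schema in the job script.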
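A minimal sketch of what this could look like in a Glue PySpark script, assuming JSON source files. The bucket path and the function names `load_source` and `main` are hypothetical and chosen only for illustration. `choice="make_cols"` is one of several documented `resolveChoice` strategies and is used here because it needs no column names in advance.

```python
# Sketch only: reads S3 data as a DynamicFrame without declaring a schema.
# The S3 path and function names are hypothetical.


def load_source(glue_context, path):
    """Read files under `path` as a DynamicFrame; the schema is inferred at read time."""
    dyf = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [path]},
        format="json",  # assumption: the source files are JSON
    )
    # Apply one resolution strategy to every ambiguous (choice) column.
    # "make_cols" splits such a column into one column per observed type.
    return dyf.resolveChoice(choice="make_cols")


def main():
    # Imported lazily because these modules exist only inside the Glue runtime.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())
    dyf = load_source(glue_context, "s3://example-bucket/raw/")  # hypothetical path
    dyf.printSchema()  # shows whatever columns the current files contain
```

In a real job, `main()` would be called at the end of the script, and later steps would work from the schema the frame reports rather than a hard-coded column list.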
Exam trap
The trap here is that candidates often assume updating the Data Catalog via a crawler is sufficient, but they miss that the job script's fixed schema must also be updated or made dynamic to avoid mismatches.
How to eliminate wrong answers
Option B is wrong because running a Glue crawler updates the Data Catalog but does not automatically adapt the fixed schema defined in the job script; the job will still fail if the script expects a specific schema.
Option C is wrong because storing the schema in a separate S3 file and reading it at runtime still requires manual updates to that file when the schema changes, which does not provide dynamic adaptation.
Option D is wrong because manually updating the job script each time the schema changes is error-prone, not scalable, and defeats the purpose of automated ETL processing.