DEA-C01 Data Ingestion and Transformation • Complete Question Bank
Complete DEA-C01 Data Ingestion and Transformation question bank with answers and detailed explanations.
Refer to the exhibit.
"Effect": "Allow",
"Action": [
"kinesis:DescribeStream",
"kinesis:GetShardIterator",
"kinesis:GetRecords",
"kinesis:ListShards"
],
"Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/input-stream"
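The four actions in this statement line up with the calls a basic shard-by-shard consumer makes. The sketch below is a minimal illustration, assuming a boto3 Kinesis client (or any object with the same methods) is passed in; the function name `read_stream_once` is hypothetical. It reads a single batch per shard and ignores `ListShards` pagination and throttling, so a production consumer would keep following `NextShardIterator` or use the Kinesis Client Library instead.

```python
from typing import Iterator


def read_stream_once(kinesis, stream_name: str, limit: int = 100) -> Iterator[bytes]:
    """Yield the data of one GetRecords batch per shard, oldest records first.

    `kinesis` is assumed to be a boto3 Kinesis client; a test double with the
    same method names works too.
    """
    # kinesis:DescribeStream. This sketch conservatively requires an ACTIVE stream.
    desc = kinesis.describe_stream(StreamName=stream_name)["StreamDescription"]
    if desc["StreamStatus"] != "ACTIVE":
        raise RuntimeError(f"stream {stream_name} is {desc['StreamStatus']}")

    # kinesis:ListShards. Pagination via NextToken is ignored for brevity.
    for shard in kinesis.list_shards(StreamName=stream_name)["Shards"]:
        # kinesis:GetShardIterator. TRIM_HORIZON starts at the oldest retained record.
        iterator = kinesis.get_shard_iterator(
            StreamName=stream_name,
            ShardId=shard["ShardId"],
            ShardIteratorType="TRIM_HORIZON",
        )["ShardIterator"]
        # kinesis:GetRecords. Only the first batch per shard is read here.
        response = kinesis.get_records(ShardIterator=iterator, Limit=limit)
        for record in response["Records"]:
            yield record["Data"]
```

Against a real stream this would be called as `list(read_stream_once(boto3.client("kinesis", region_name="us-east-1"), "input-stream"))`. Dropping any one of the four actions from the policy would make the corresponding call fail with an access-denied error.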
Match each concept to its description:
Serverless ETL and data catalog
Data warehousing and SQL analytics
Big data processing using Hadoop/Spark
Building and managing data lakes
Real-time streaming data ingestion
Match each concept to its description:
Migrate databases with minimal downtime
Physical device for large data transfer
Online data transfer between on-prem and AWS
Fast uploads over long distances
Combine data across sources into views
Match each concept to its description:
Relational database with managed operations
NoSQL key-value and document database
In-memory caching for low latency
Graph database for connected data
Time-series data for IoT and analytics
Match each concept to its description:
Managed encryption keys
User and role access control
Audit API activity
Discover and protect sensitive data
Web application firewall
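Most exhibits in this bank are IAM policies, and several mix Allow and Deny statements. For a single identity-based policy, the decision rule is: an explicit Deny overrides any Allow, a matching Allow grants access, and anything unmatched is implicitly denied. The sketch below models only that rule; it is a simplification that ignores resource policies, permission boundaries, and SCPs, does not evaluate `Condition` blocks, and uses `fnmatch` as a stand-in for IAM's wildcard matching. The helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Action, Resource and Statement may each be a single value or a list."""
    return value if isinstance(value, list) else [value]


def _applies(stmt: dict, action: str, resource: str) -> bool:
    # Action names compare case-insensitively; resource ARNs case-sensitively.
    # Condition blocks are not evaluated (treated as always applying), and
    # statements written with NotAction/NotResource never match in this model.
    actions = _as_list(stmt.get("Action", []))
    resources = _as_list(stmt.get("Resource", []))
    action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in actions)
    resource_ok = any(fnmatchcase(resource, r) for r in resources)
    return action_ok and resource_ok


def evaluate(policy: dict, action: str, resource: str) -> str:
    """Return 'ExplicitDeny', 'Allow' or 'ImplicitDeny' for a single request."""
    effects = {
        s["Effect"] for s in _as_list(policy["Statement"]) if _applies(s, action, resource)
    }
    if "Deny" in effects:
        return "ExplicitDeny"  # an explicit Deny overrides any Allow
    if "Allow" in effects:
        return "Allow"
    return "ImplicitDeny"  # no statement matched: denied by default


# The Allow plus explicit Deny policy from one of the exhibits in this bank.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
         "Resource": "arn:aws:s3:::my-data-bucket/*"},
        {"Effect": "Deny", "Action": ["s3:DeleteObject"],
         "Resource": "arn:aws:s3:::my-data-bucket/*"},
    ],
}
print(evaluate(policy, "s3:PutObject", "arn:aws:s3:::my-data-bucket/raw/a.csv"))     # Allow
print(evaluate(policy, "s3:DeleteObject", "arn:aws:s3:::my-data-bucket/raw/a.csv"))  # ExplicitDeny
print(evaluate(policy, "s3:ListBucket", "arn:aws:s3:::my-data-bucket"))              # ImplicitDeny
```

The distinction between the last two results matters in practice: an implicit deny can be lifted by an Allow elsewhere, while an explicit Deny cannot.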
Refer to the exhibit. You have the following IAM policy attached to an IAM role used by an AWS Glue job:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::my-data-bucket/*"
},
{
"Effect": "Allow",
"Action": [
"glue:GetTable",
"glue:UpdateTable"
],
"Resource": "*"
}
]
}

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::my-data-bucket/*"
},
{
"Effect": "Deny",
"Action": [
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::my-data-bucket/*"
}
]
}

Error Log: [ERROR] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 6, ip-10-0-0-12.ec2.internal, executor 1): java.lang.OutOfMemoryError: Java heap space at org.apache.spark.sql.catalyst.expressions.UnsafeRow.<init>(UnsafeRow.java:42)
Refer to the exhibit.
```
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::data-bucket/*"
},
{
"Effect": "Allow",
"Action": [
"glue:StartJobRun",
"glue:GetJobRun"
],
"Resource": "arn:aws:glue:us-east-1:123456789012:job/etl-job"
}
]
}
```

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::data-lake-bucket/*"
},
{
"Effect": "Allow",
"Action": [
"kinesis:DescribeStream",
"kinesis:GetShardIterator",
"kinesis:GetRecords"
],
"Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/clickstream"
}
]
}

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::my-data-bucket/*"
]
},
{
"Effect": "Allow",
"Action": [
"glue:StartJobRun",
"glue:GetJobRun"
],
"Resource": "*"
}
]
}

Refer to the exhibit.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject"
],
"Resource": "arn:aws:s3:::my-bucket/*"
},
{
"Effect": "Allow",
"Action": [
"kinesis:PutRecord",
"kinesis:PutRecords"
],
"Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream"
}
]
}

Refer to the exhibit.
{
"Events": [
{
"EventID": "1",
"EventVersion": "1.0",
"EventSource": "aws:s3",
"AwsRegion": "us-east-1",
"EventName": "ObjectCreated:Put",
"UserIdentity": {
"principalId": "AWS:AIDAEXAMPLE"
},
"RequestParameters": {
"sourceIPAddress": "192.0.2.1"
},
"ResponseElements": {
"x-amz-request-id": "EXAMPLE123"
},
"S3": {
"s3SchemaVersion": "1.0",
"bucket": {
"name": "source-bucket",
"arn": "arn:aws:s3:::source-bucket"
},
"object": {
"key": "data/file.csv",
"size": 1024,
"eTag": "abc123",
"sequencer": "0055AED6DCD90281E5"
}
}
}
]
}

Refer to the exhibit.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::example-bucket/*"
},
{
"Effect": "Deny",
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::example-bucket/public/*",
"Condition": {
"StringNotEquals": {
"s3:x-amz-server-side-encryption": "AES256"
}
}
}
]
}

Refer to the exhibit.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::my-bucket/*"
},
{
"Effect": "Allow",
"Action": "kinesis:PutRecord",
"Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream"
}
]
}

Refer to the exhibit.
2019-11-15T10:00:00Z ERROR: Task failed: 'NoneType' object has no attribute 'read'
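This message is Python's `AttributeError` text for calling `.read()` on `None`: the object the task expected to be a file-like handle was never obtained. The log alone does not say which call returned `None`. One common pattern that produces it is a lookup helper that returns `None` on a miss; the sketch below reproduces that with hypothetical `open_source`/`read_source` helpers and an in-memory dict standing in for the real data source, and fails fast with a clearer exception.

```python
import io
from typing import Dict, Optional


def open_source(sources: Dict[str, bytes], key: str) -> Optional[io.BytesIO]:
    """Hypothetical lookup: returns a file-like handle, or None if the key is missing."""
    data = sources.get(key)
    return io.BytesIO(data) if data is not None else None


def read_source(sources: Dict[str, bytes], key: str) -> bytes:
    handle = open_source(sources, key)
    if handle is None:
        # Without this guard, handle.read() would raise
        # AttributeError: 'NoneType' object has no attribute 'read'
        raise FileNotFoundError(f"input not found: {key}")
    return handle.read()


sources = {"raw/orders.csv": b"order_id,amount\n1,9.99\n"}
print(read_source(sources, "raw/orders.csv"))
```

Raising `FileNotFoundError` with the missing key puts the actual cause (an absent input) in the task log instead of a generic attribute error.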
Refer to the exhibit.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::example-bucket/*"
},
{
"Effect": "Allow",
"Action": [
"kinesis:PutRecord",
"kinesis:PutRecords"
],
"Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream"
},
{
"Effect": "Allow",
"Action": [
"lambda:InvokeFunction"
],
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-function"
}
]
}

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kinesis:GetRecords",
"kinesis:GetShardIterator",
"kinesis:DescribeStream",
"kinesis:ListShards"
],
"Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream"
},
{
"Effect": "Allow",
"Action": [
"firehose:PutRecord",
"firehose:PutRecordBatch"
],
"Resource": "arn:aws:firehose:us-east-1:123456789012:deliverystream/my-firehose"
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject"
],
"Resource": "arn:aws:s3:::my-bucket/*"
}
]
}

Refer to the exhibit.
Error log from AWS Glue job:
```
An error occurred while calling o123.pyWriteDynamicFrame. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 8, ip-10-0-1-45.ec2.internal): java.lang.OutOfMemoryError: Java heap space
```
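`java.lang.OutOfMemoryError: Java heap space` during a task means an executor ran out of JVM heap while processing its share of the data. Common remedies include spreading the data over more, smaller partitions or moving the Glue job to a larger worker type such as G.2X. The arithmetic below is a rough sizing helper for the first option, assuming evenly distributed data and a 128 MiB per-partition target (a widely used rule of thumb that matches Spark's default `spark.sql.files.maxPartitionBytes`, not an AWS requirement). The helper name is hypothetical; the resulting count would then be passed to something like Spark's `DataFrame.repartition(n)`.

```python
# Rule-of-thumb target size per partition (128 MiB).
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def partitions_for(total_bytes: int, target_bytes: int = TARGET_PARTITION_BYTES) -> int:
    """Smallest partition count keeping each partition at or below target_bytes,
    assuming the data is spread evenly. Uses integer ceiling division."""
    return max(1, -(-total_bytes // target_bytes))


print(partitions_for(50 * 1024**3))  # 400 partitions for 50 GiB
print(partitions_for(1))             # 1
```

If a single task keeps failing while the rest succeed, skewed keys or one large non-splittable input (for example a big gzip file) may be the cause, and a higher partition count alone may not help.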
Refer to the exhibit.
IAM policy for an IAM role used by an AWS Glue crawler:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::my-data-lake/*",
"arn:aws:s3:::my-data-lake"
]
},
{
"Effect": "Allow",
"Action": [
"glue:GetDatabase",
"glue:CreateTable",
"glue:UpdateTable",
"glue:DeleteTable"
],
"Resource": "*"
}
]
}
```

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::data-bucket",
"arn:aws:s3:::data-bucket/*"
]
},
{
"Effect": "Allow",
"Action": [
"kinesis:PutRecord",
"kinesis:PutRecords"
],
"Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/input-stream"
}
]
}

Refer to the exhibit.
```
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::data-lake-bucket/*"
},
{
"Effect": "Allow",
"Action": [
"kinesis:PutRecord",
"kinesis:PutRecords"
],
"Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/input-stream"
}
]
}
```

Refer to the exhibit.
```
{
"Records": [
{
"eventVersion": "2.0",
"eventSource": "aws:s3",
"awsRegion": "us-east-1",
"eventName": "ObjectCreated:Put",
"s3": {
"s3SchemaVersion": "1.0",
"bucket": {
"name": "my-bucket",
"arn": "arn:aws:s3:::my-bucket"
},
"object": {
"key": "data/2024/01/01/file.json",
"size": 1024,
"eTag": "abc123"
}
}
}
]
}
```

Refer to the exhibit.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::example-bucket/*"
},
{
"Effect": "Allow",
"Action": [
"glue:StartJobRun",
"glue:GetJobRun"
],
"Resource": "*"
}
]
}

Refer to the exhibit.
Resource: "arn:aws:logs:us-east-1:123456789012:log-group:my-log-group:*"
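A trailing wildcard determines exactly what a `Resource` entry covers. Under a simplified string match where `*` spans any characters (including `:` and `/`), the log-group pattern above covers the log streams inside `my-log-group` but not a similarly named group. Likewise, `arn:aws:s3:::my-data-lake/*` covers objects but not the bucket ARN itself, which is why policies granting the bucket-level `s3:ListBucket` action also list the bare bucket ARN. In this sketch `fnmatch` stands in for IAM's own matcher and `resource_covered` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def resource_covered(pattern: str, arn: str) -> bool:
    """Simplified IAM resource match: '*' spans any characters, '?' exactly one."""
    return fnmatchcase(arn, pattern)


LOG_GROUP = "arn:aws:logs:us-east-1:123456789012:log-group:my-log-group:*"
checks = [
    (LOG_GROUP, "arn:aws:logs:us-east-1:123456789012:log-group:my-log-group:log-stream:app-1"),
    (LOG_GROUP, "arn:aws:logs:us-east-1:123456789012:log-group:my-log-group-2:log-stream:app-1"),
    ("arn:aws:s3:::my-data-lake/*", "arn:aws:s3:::my-data-lake/raw/file.json"),
    ("arn:aws:s3:::my-data-lake/*", "arn:aws:s3:::my-data-lake"),
]
results = [resource_covered(pattern, arn) for pattern, arn in checks]
print(results)  # [True, False, True, False]
```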
Refer to the exhibit.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::my-data-lake/*"
},
{
"Effect": "Allow",
"Action": [
"glue:StartJobRun",
"glue:GetJobRun"
],
"Resource": "*"
}
]
}

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::data-lake-primary/*"
},
{
"Effect": "Allow",
"Action": [
"kinesis:PutRecord",
"kinesis:PutRecords"
],
"Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/clickstream"
}
]
}

CREATE EXTERNAL TABLE IF NOT EXISTS my_database.sales (
  order_id INT,
  customer_name STRING,
  product STRING,
  amount DECIMAL(10,2),
  order_date STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = ','
)
LOCATION 's3://my-bucket/sales/'
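This DDL tells `LazySimpleSerDe` to split each line on commas and nothing more, so it does not honor quoting. If a `customer_name` value contains a comma inside double quotes (the sample row below is hypothetical), the line yields an extra field, and `product`, `amount` and `order_date` all receive the wrong values. Python's `csv` module stands in here for a quote-aware parser; in Athena the usual quote-aware choice is `org.apache.hadoop.hive.serde2.OpenCSVSerde`.

```python
import csv
import io

# Hypothetical row matching the sales table: the name contains a quoted comma.
row = '1001,"Doe, Jane",Widget,19.99,2024-01-15'

naive = row.split(",")                       # delimiter-only split, like field.delim
parsed = next(csv.reader(io.StringIO(row)))  # honors double-quoted fields

print(len(naive), naive[1])    # 6 "Doe
print(len(parsed), parsed[1])  # 5 Doe, Jane
```

With the naive split, `amount` would receive `Widget`, which cannot be read as `DECIMAL(10,2)`. If the files also start with a header row, the table would need `TBLPROPERTIES ('skip.header.line.count'='1')` to avoid reading it as data.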
Refer to the exhibit.
```
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::my-bucket/*"
},
{
"Effect": "Allow",
"Action": [
"glue:StartJobRun",
"glue:GetJobRun"
],
"Resource": "*"
}
]
}
```

Refer to the exhibit.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::my-bucket/*"
},
{
"Effect": "Allow",
"Action": [
"glue:StartJobRun",
"glue:GetJobRun"
],
"Resource": "*"
}
]
}

Refer to the exhibit.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::data-lake-bucket/*"
}
]
}

Refer to the exhibit.
IAM Policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::data-bucket/*"
},
{
"Effect": "Allow",
"Action": [
"kinesis:DescribeStream",
"kinesis:GetRecords",
"kinesis:GetShardIterator"
],
"Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/input-stream"
}
]
}

Refer to the exhibit.
AWS Glue Job Script (PySpark):
```
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "sales_db", table_name = "orders", transformation_ctx = "datasource0")
datasource1 = glueContext.create_dynamic_frame.from_options(connection_type = "s3", connection_options = {"paths": ["s3://data-lake/raw/"]}, format = "json", transformation_ctx = "datasource1")
job.commit()
```

Refer to the exhibit.
Kinesis Data Firehose Configuration (Partial):
- DeliveryStreamName: "my-firehose"
- Destination: "s3"
- S3DestinationConfiguration:
    BucketARN: "arn:aws:s3:::data-bucket"
    Prefix: "data/"
    ErrorOutputPrefix: "errors/"
    BufferingHints:
      IntervalInSeconds: 300
      SizeInMBs: 5
- ProcessingConfiguration:
    Enabled: True
    Processors:
      - Type: "Lambda"
        Parameters:
          - ParameterName: "LambdaArn"
            ParameterValue: "arn:aws:lambda:us-east-1:123456789012:function:transform"
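With processing enabled, Firehose invokes the `transform` function on each batch before delivery to S3. Each input record arrives with a `recordId` and base64-encoded `data`, and the function must return every record with the same `recordId`, a `result` of `Ok`, `Dropped` or `ProcessingFailed`, and base64-encoded `data`. A minimal handler sketch, assuming the incoming records are JSON objects; the added `processed` field is a hypothetical transformation.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose data-transformation handler: one output entry per input record."""
    output = []
    for record in event["records"]:
        try:
            doc = json.loads(base64.b64decode(record["data"]))
            doc["processed"] = True  # hypothetical transformation
            # Newline-delimit so each S3 object holds one JSON document per line.
            encoded = base64.b64encode((json.dumps(doc) + "\n").encode("utf-8")).decode("ascii")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": encoded})
        except (ValueError, TypeError):
            # Bad base64, non-UTF-8 bytes, invalid JSON, or JSON that is not an
            # object: return the original payload and flag it as failed.
            output.append({"recordId": record["recordId"], "result": "ProcessingFailed",
                           "data": record["data"]})
    return {"records": output}


event = {"records": [
    {"recordId": "1", "data": base64.b64encode(b'{"id": 1}').decode("ascii")},
    {"recordId": "2", "data": base64.b64encode(b"not json").decode("ascii")},
]}
result = lambda_handler(event, None)
```

Records flagged `ProcessingFailed` are expected to be written to the configured error output location rather than under `data/`, so malformed input is kept for inspection instead of being silently lost.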