DA0-001 Analyzing and Modeling Data • Complete Question Bank
Complete DA0-001 Analyzing and Modeling Data question bank — all 0 questions with answers and detailed explanations.
Drag a concept onto its matching description — or click a concept then click the description.
Ensures data quality and adherence to policies
Manages technical environment and data access
Has accountability for specific data assets
Sets strategic direction for data management
Designs data structures and integration processes
Drag a concept onto its matching description — or click a concept then click the description.
Each member has equal chance of selection
Population divided into subgroups; random sample from each
Randomly select entire groups (clusters)
Select every k-th element from a list
Sample based on ease of access
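The five descriptions above correspond to simple random, stratified, cluster, systematic, and convenience sampling. A minimal sketch of each follows, using only the Python standard library. The function names, the `people` list, and the `dept` field are hypothetical and chosen for illustration; they do not come from the question bank.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Each member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(population, key, n_per_stratum, seed=0):
    """Divide the population into subgroups, then sample randomly from each."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly select entire groups and keep every member of each chosen group."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]


def systematic_sample(population, k, start=0):
    """Select every k-th element from a list, beginning at index `start`."""
    return population[start::k]


def convenience_sample(population, n):
    """Take whatever is easiest to reach; here, simply the first n items."""
    return population[:n]


# Hypothetical population: 20 people split across two departments.
people = [{"id": i, "dept": "A" if i % 2 == 0 else "B"} for i in range(20)]
```

For example, `stratified_sample(people, lambda p: p["dept"], 2)` returns two people from each department, whereas `simple_random_sample(people, 4)` gives no such guarantee about the department mix.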
SELECT department, COUNT(*) AS employee_count
FROM employees
WHERE hire_year > 2020
GROUP BY department
HAVING COUNT(*) > 5;
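To see how WHERE and HAVING interact in the query above, it can be run against a small in-memory SQLite database. This is only a sketch: the table columns other than `department` and `hire_year`, and all of the rows, are made-up assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, hire_year INTEGER)"
)

# Hypothetical data: Sales has 6 hires after 2020 and 2 earlier ones;
# Engineering has 7 employees in total, but only 4 hired after 2020.
rows = (
    [(f"s{i}", "Sales", 2021) for i in range(6)]
    + [(f"s_old{i}", "Sales", 2018) for i in range(2)]
    + [(f"e{i}", "Engineering", 2022) for i in range(4)]
    + [(f"e_old{i}", "Engineering", 2019) for i in range(3)]
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)

query = """
SELECT department, COUNT(*) AS employee_count
FROM employees
WHERE hire_year > 2020      -- filters individual rows before grouping
GROUP BY department
HAVING COUNT(*) > 5;        -- filters whole groups after counting
"""
result = conn.execute(query).fetchall()
print(result)  # [('Sales', 6)]
```

Engineering is excluded even though it has seven employees, because WHERE removes the pre-2021 hires before the groups are counted, leaving only four rows for HAVING to test.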
{"model_type": "random_forest", "n_estimators": 100, "max_depth": 5, "criterion": "gini"}
2024-01-15 10:23:45 ERROR: DataTypeMismatchException - Column 'age' contains mixed data types: INT and VARCHAR. Pipeline 'user_profile_etl' failed.
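The log entry above reports a column holding both integer and string values. One way such a problem might be detected and cleaned up before loading is sketched below with the standard library only. The `records` list and both helper functions are hypothetical; they are not part of the pipeline named in the log.

```python
from collections import Counter


def type_counts(records, column):
    """Count the Python type names found in one column, skipping missing values."""
    return Counter(
        type(r[column]).__name__ for r in records if r.get(column) is not None
    )


def coerce_int(value):
    """Return the value as an int, or None when it cannot be converted."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


# Hypothetical rows in which 'age' arrives as both numbers and text.
records = [{"age": 34}, {"age": "41"}, {"age": "unknown"}, {"age": 29}]

counts = type_counts(records, "age")
mixed = len(counts) > 1  # True when more than one type is present
cleaned = [coerce_int(r["age"]) for r in records]
```

Here `counts` shows two `int` and two `str` values, and `cleaned` becomes `[34, 41, None, 29]`, so the unparseable entry is surfaced as a missing value instead of failing the load.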
Refer to the exhibit.
Python pandas code and output:
```
import pandas as pd
df = pd.read_csv('employees.csv')
df['salary'] = df['salary'].fillna(df['salary'].median())
print(df['salary'].describe())
```
Output:
```
count 1000.000000
mean 55000.000000
std 15000.000000
min 25000.000000
25% 45000.000000
50% 52000.000000
75% 65000.000000
max 120000.000000
Name: salary, dtype: float64
```
Refer to the exhibit.
JSON policy:
```
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::data-lake/*"
}
]
}
```
Refer to the exhibit.
SELECT customer_id, COUNT(order_id) AS order_count
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY customer_id
HAVING COUNT(order_id) > 5;
Refer to the exhibit.
Call:
lm(formula = price ~ sqft_living + bedrooms + bathrooms, data = housing)
Residuals:
    Min      1Q  Median      3Q     Max
-1.2345 -0.3456 -0.0123  0.3456  2.3456
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.123456   0.012345  10.000   <2e-16 ***
sqft_living  0.001234   0.000123  10.000   <2e-16 ***
bedrooms    -0.056789   0.012345  -4.600 4.23e-06 ***
bathrooms    0.234567   0.045678   5.135 3.45e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4567 on 496 degrees of freedom
Multiple R-squared: 0.789, Adjusted R-squared: 0.787
F-statistic: 617.8 on 3 and 496 DF, p-value: < 2.2e-16
Refer to the exhibit.
import pandas as pd
df = pd.read_csv('data.csv')
df['total'] = df['price'] * df['quantity']
df.head()