Exam Name: | Databricks Certified Machine Learning Associate Exam | ||
Exam Code: | Databricks-Machine-Learning-Associate Dumps | ||
Vendor: | Databricks | Certification: | ML Data Scientist |
Questions: | 74 Q&A's | Shared By: | syeda |
A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column discount is less than or equal 0.
Which of the following code blocks will accomplish this task?
A data scientist has created two linear regression models. The first model uses price as a label variable and the second model uses log(price) as a label variable. When evaluating the RMSE of each model bycomparing the label predictions to the actual price values, the data scientist notices that the RMSE for the second model is much larger than the RMSE of the first model.
Which of the following possible explanations for this difference is invalid?
A data scientist wants to use Spark ML to impute missing values in their PySpark DataFrame features_df. They want to replace missing values in all numeric columns in features_df with each respective numeric column’s median value.
They have developed the following code block to accomplish this task:
The code block is not accomplishing the task.
Which reasons describes why the code block is not accomplishing the imputation task?
A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process.
Which of the following feature engineering tasks will be the least efficient to distribute?