Exam Name: | Databricks Certified Associate Developer for Apache Spark 3.0 Exam | ||
Exam Code: | Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Dumps | ||
Vendor: | Databricks | Certification: | Databricks Certification |
Questions: | 180 Q&A's | Shared By: | zaviyar |
Which of the following code blocks uses a schema fileSchema to read a parquet file at location filePath into a DataFrame?
The code block shown below should add a column itemNameBetweenSeparators to DataFrame itemsDf. The column should contain arrays of maximum 4 strings. The arrays should be composed of
the values in column itemsDf which are separated at - or whitespace characters. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Sample of DataFrame itemsDf:
1.+------+----------------------------------+-------------------+
2.|itemId|itemName |supplier |
3.+------+----------------------------------+-------------------+
4.|1 |Thick Coat for Walking in the Snow|Sports Company Inc.|
5.|2 |Elegant Outdoors Summer Dress |YetiX |
6.|3 |Outdoors Backpack |Sports Company Inc.|
7.+------+----------------------------------+-------------------+
Code block:
itemsDf.__1__(__2__, __3__(__4__, "[\s\-]", __5__))
In which order should the code blocks shown below be run in order to create a DataFrame that shows the mean of column predError of DataFrame transactionsDf per column storeId and productId,
where productId should be either 2 or 3 and the returned DataFrame should be sorted in ascending order by column storeId, leaving out any nulls in that column?
DataFrame transactionsDf:
1.+-------------+---------+-----+-------+---------+----+
2.|transactionId|predError|value|storeId|productId| f|
3.+-------------+---------+-----+-------+---------+----+
4.| 1| 3| 4| 25| 1|null|
5.| 2| 6| 7| 2| 2|null|
6.| 3| 3| null| 25| 3|null|
7.| 4| null| null| 3| 2|null|
8.| 5| null| null| null| 2|null|
9.| 6| 3| 2| 25| 2|null|
10.+-------------+---------+-----+-------+---------+----+
1. .mean("predError")
2. .groupBy("storeId")
3. .orderBy("storeId")
4. transactionsDf.filter(transactionsDf.storeId.isNotNull())
5. .pivot("productId", [2, 3])
The code block displayed below contains an error. The code block is intended to return all columns of DataFrame transactionsDf except for columns predError, productId, and value. Find the error.
Excerpt of DataFrame transactionsDf:
transactionsDf.select(~col("predError"), ~col("productId"), ~col("value"))