🧪 Databricks MCQ Quiz Hub

Databricks MCQ Question Set 1


1. Which one of the following is not an operation that can be performed using Azure Databricks?




2. To which one of the following sources does Azure Databricks connect to collect streaming data?




3. Which one of the following is a Databricks concept?




4. Which of the following ensures data reliability even after a cluster is terminated in Azure Databricks?




5. Which option is correct regarding ETL operations on data in Azure Databricks?




6. Which one of the following is incorrect regarding the Workspace concept in Azure Databricks?




7. Which of the following Azure data sources can be connected to Azure Databricks?




8. Streaming data can be captured by which of the following?




9. Authentication and authorization in Databricks can be managed for:




10. Which one of the following is a set of components that run on clusters of Azure Databricks?




11. Spark was initially started by ______ at UC Berkeley AMPLab in 2009.




12. ______ is a component on top of Spark Core.




13. Spark SQL provides a domain-specific language to manipulate ___________ in Scala, Java, or Python.




14. _______ leverages Spark Core's fast scheduling capability to perform streaming analytics.




15. ____ is a distributed machine learning framework on top of Spark.




16. Given a DataFrame df, select the code that returns its number of rows:




17. Users can easily run Spark on top of Amazon’s _____




18. Which of the following can be used to launch Spark jobs inside MapReduce?




19. Which of the following languages is not supported by Spark?




20. Spark is packaged with higher level libraries, including support for _________ queries.




21. Spark includes a collection of over ________ operators for transforming data and familiar DataFrame APIs for manipulating semi-structured data.




22. Given a DataFrame df that includes, among other columns, a column named quantity and a column named price, complete the code below so that it creates a DataFrame containing all the original columns plus a new column revenue defined as quantity*price:




23. Spark is engineered from the bottom up for performance, running ______ faster than Hadoop by exploiting in-memory computing and other optimizations.




24. Spark powers a stack of high-level tools including Spark SQL, MLlib for _____




25. For a multiclass classification problem, which algorithm is not a suitable solution?




26. Which of the following is a tool of the Machine Learning Library (MLlib)?




27. Which of the following is true for Spark Core?




28. Given a DataFrame df that has some null values in the column created_date, choose the code that sorts the rows in ascending order by created_date, with null values appearing last.




29. Which of the following is true for Spark MLlib?




30. Which of the following is true for RDD?




31. RDD is fault-tolerant and immutable




32. The read operation on an RDD is




33. The write operation on an RDD is




34. Which one of the following commands does NOT trigger an eager evaluation?




35. Which one of the following commands triggers an eager evaluation?




36. Is it possible to mitigate stragglers in RDD?




37. Fault tolerance in RDDs is achieved using




38. What is an action in Spark RDD?




39. The shortcomings of Hadoop MapReduce were overcome by Spark RDD by




40. In which language is Spark developed?




41. Which of the following is NOT an action?




42. Which of the following is an action?




43. Which of the following is a transformation?




44. Which of the following is not a component of the Spark Ecosystem?




45. Which of the following algorithms is not present in MLlib?




46. Which of the following is not a feature of Spark?




47. Which of the following is the reason Spark is faster than MapReduce?




48. Which of the following statements are NOT true for broadcast variables?




49. Broadcast variables are shared, immutable variables that are cached on every machine in the cluster instead of being serialized with every single task.




50. Broadcast variables are ______ and lazily replicated across all nodes in the cluster when an action is triggered.