🧪 Databricks MCQ Quiz Hub
Databricks MCQ Question Set 2
Choose a topic to test your knowledge and improve your Databricks skills
1. Which of the following returns a new DataFrame containing a random sample of 50 percent of the records in DataFrame df, without replacement?
df.sample(False, 0.5, 5)
df.random(False, 0.5, 5)
df.sample(False, 5, 25)
df.sample(False, 50, 5)
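A quick way to check the correct call for question 1 (a minimal PySpark sketch; the toy DataFrame built with spark.range stands in for df): sample(withReplacement, fraction, seed) treats fraction as a per-row probability, so the result is approximately, not exactly, 50 percent of the rows.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1000)                # toy DataFrame with 1000 rows
sampled = df.sample(False, 0.5, 5)    # no replacement, ~50% fraction, seed 5
print(sampled.count())                # roughly 500 rows, varies with the seed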
2. Which of the following DataFrame commands will NOT generate a shuffle of data from each executor across the cluster?
df.map()
df.collect()
df.orderBy()
df.repartition()
3. Which of the following DataFrame commands is a narrow transform?
df.drop()
df.collect()
df.orderBy()
df.repartition()
4. Which of the following DataFrame commands is a wide transform?
df.drop()
df.contains()
df.filter()
df.repartition()
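Questions 2 through 4 turn on the same distinction, sketched below in PySpark (the toy DataFrame and column names are assumptions for illustration): narrow transforms such as filter and drop process each partition independently, while wide transforms such as repartition and orderBy must shuffle rows across executors.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(100).selectExpr("id", "id % 10 AS bucket")

narrow = df.filter("id > 10").drop("bucket")  # narrow: no data moves between partitions
wide = df.repartition(8)                      # wide: full shuffle across the cluster
also_wide = df.orderBy("id")                  # a global sort also requires a shuffle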
5. When Spark runs in Cluster Mode, which of the following statements about nodes is correct?
There is one single worker node that contains the Spark driver and all the executors.
The Spark Driver runs in a worker node inside the cluster.
There is always more than one worker node.
There are fewer executors than the total number of worker nodes.
6. The DataFrame df includes a time string column named timestamp_1. Which is the correct syntax for creating a new DataFrame df1 that contains only the time string field converted to a Unix timestamp?
df1 = df.select(unix_timestamp(col("timestamp_1"),"MM-dd-yyyy HH:mm:ss").as("timestamp_1"))
df1 = df.select(unix_timestamp(col("timestamp_1"),"MM-dd-yyyy HH:mm:ss", "America/Los Angeles").alias("timestamp_1"))
df1 = df.select(unix_timestamp(col("timestamp_1"),"America/Los Angeles").alias("timestamp_1"))
df1 = df.select(unix_timestamp(col("timestamp_1"),"MM-dd-yyyy HH:mm:ss").alias("timestamp_1"))
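A minimal PySpark sketch of the last option above, which is the idiomatic form (the sample timestamp string is an assumption chosen to match the MM-dd-yyyy HH:mm:ss pattern). Note that in Python .as(...) is a syntax error, since as is a reserved word, which is why alias is used:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, unix_timestamp

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("12-25-2021 10:30:00",)], ["timestamp_1"])

df1 = df.select(
    unix_timestamp(col("timestamp_1"), "MM-dd-yyyy HH:mm:ss").alias("timestamp_1")
)
df1.show()  # seconds since the Unix epoch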
7. Suppose you want to: 1. cache a DataFrame df as serialized Java objects in the JVM; 2. if df does not fit in memory, store the partitions that don't fit on disk and read them from there when they're needed; and 3. replicate each partition on two cluster nodes. Which command would you choose?
df.persist(StorageLevel.MEMORY_ONLY)
df.persist(StorageLevel.MEMORY_AND_DISK_SER)
df.persist(StorageLevel.MEMORY_AND_DISK_SER_2)
df.cache(StorageLevel.MEMORY_AND_DISK_SER_2)
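A minimal PySpark sketch (the toy DataFrame is an assumption). In the Scala API the level described in question 7 is StorageLevel.MEMORY_AND_DISK_SER_2: serialized, spilling to disk, with each partition replicated on two nodes. PySpark data is always serialized, so MEMORY_AND_DISK_2 is the closest Python equivalent; also note that cache() takes no arguments, which rules out the last option.

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1000)
df.persist(StorageLevel.MEMORY_AND_DISK_2)  # memory first, spill to disk, 2 replicas
df.count()                                  # an action materializes the cache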
8. Spark is best suited for ______ data.
Real-time
Virtual
Structured
All of the above
9. Which of the following is a feature of Apache Spark?
Speed
Supports multiple languages
Advanced Analytics
All of the above
10. In how many ways can Spark use Hadoop?
2
3
4
5
11. When was Apache Spark developed?
2007
2008
2009
2010
12. Which of the following is not a valid Spark deployment mode?
Standalone
Hadoop Yarn
Spark in MapReduce
Spark SQL
13. _____ is a distributed graph processing framework on top of Spark.
MLlib
Spark Streaming
GraphX
none of the above
14. Point out the correct statement.
Spark enables Apache Hive users to run their unmodified queries much faster
Spark interoperates only with Hadoop
Spark is a popular data warehouse solution running on top of Hadoop
All of the above
15. Which of the following is true regarding Azure Cosmos DB?
It supports a relational data model.
It can be scaled horizontally.
The data can be distributed to a fixed number of Azure regions.
Both A and B
16. Which one of the following is a data model supported by Azure Cosmos DB?
key-value
graph
table
All of the above
17. Which container is supported in Azure Cosmos DB?
Fixed container
Unlimited Container
Both Fixed container and Unlimited Container
Neither Fixed container nor Unlimited Container
18. What is the maximum size of a graph database that a Fixed Container in Cosmos DB can store?
10 GB
15 GB
100 GB
50 GB
19. Which of the following APIs does Azure Cosmos DB support?
Gremlin API
Apache Cassandra API
MongoDB API
All of the above
20. Which of the following can a single container in Cosmos DB not have as a logical partition?
PHYSICAL
SALES
HR
Product backlog
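In Cosmos DB, each distinct value of the container's partition key defines one logical partition (for example SALES or HR), while physical partitions are managed by the service and are not something you declare. A minimal sketch with the azure-cosmos Python SDK; the account endpoint, key, and all names are placeholders:

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
db = client.create_database_if_not_exists("company")
container = db.create_container_if_not_exists(
    id="employees",
    partition_key=PartitionKey(path="/department"),  # each distinct value (SALES, HR, ...) is one logical partition
)
container.upsert_item({"id": "1", "department": "SALES", "name": "Ann"})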
21. Elastically scalable throughput and storage are possible in:
Azure Cosmos DB Graph API
Azure Cosmos DB SQL API
Both A and B
none of the above
22. In which of the APIs of Azure Cosmos DB, is automatic indexing possible?
Graph API, SQL API, Table API
Graph API, SQL API
Graph API
SQL API, Table API
23. What is the maximum latency for reads and writes with the Azure Cosmos DB Table API?
100ms
10ms
50ms
5ms
24. Which one of the following is the feature of Azure Cosmos DB Graph API?
Automatic indexing
Fully managed
Multi-region replication
All of the above
25. Which one of the following is not correct regarding Azure Storage?
Data is highly available.
Data stored in Azure Storage is secure.
There is no data redundancy.
None of these.
26. Which one of the following is a data service provided by the Azure Storage platform?
Azure Blobs
Azure Tables
Azure Queues
All of these.
27. Which one of the following provides block level storage volumes for Azure VMs?
Azure Disks
Azure Blobs
Azure Queues
Azure Tables
28. Which one of the following is most suitable for storing streaming video and audio?
Azure Files
Azure Queues
Azure Blobs
Azure Tables
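Block blobs are the usual choice for media such as streaming video and audio. A minimal sketch with the azure-storage-blob Python SDK; the connection string, container, and file names are placeholders:

from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="media", blob="clip.mp4")
with open("clip.mp4", "rb") as data:
    blob.upload_blob(data, overwrite=True)  # uploaded as a block blob by default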
29. How many replication options are there while creating an Azure storage account?
2
5
8
4
30. While creating Azure Storage account, which replication option is the cheapest one?
Zone-redundant storage
Locally redundant storage
Geo-redundant storage
Read-access geo-redundant storage
31. How many copies of data are created with geo-redundant storage replication?
6
2
3
4
32. Choose the incorrect option regarding zone-redundant storage replication.
It can be used for blobs only
3 copies of data are created.
Copies of data must be created in facilities of the same region.
None of these
33. What is the maximum size of a queue message?
256 KB
64 KB
128 KB
There is no maximum size.
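A single Azure queue message can be up to 64 KB. A minimal sketch with the azure-storage-queue Python SDK; the connection string and queue name are placeholders:

from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string("<connection-string>", "tasks")
queue.send_message("process-order-42")      # message body must fit in 64 KB
msg = next(iter(queue.receive_messages()))  # pull one message back off the queue
queue.delete_message(msg)                   # remove it once processed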
34. Choose the correct option regarding Azure Storage.
It is possible to have role-based access control for the Blob and Queue storage services of Azure.
Shared key authorization is also possible.
It is possible to make a container and its blobs public.
All of these.
35. Which one of the following is orchestration software that can be used for scaling containers?
Azure Batch.
Azure Kubernetes.
Azure Data Factory.
Azure key vault.
36. What is the basic operational unit of Kubernetes?
Pod
Container
Nodes
Task
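The pod is the basic operational unit: each pod wraps one or more containers that are scheduled together onto a node. A minimal sketch with the official kubernetes Python client; it assumes a local kubeconfig and a reachable cluster:

from kubernetes import client, config

config.load_kube_config()  # read credentials from the local kubeconfig
v1 = client.CoreV1Api()
for pod in v1.list_namespaced_pod("default").items:
    print(pod.metadata.name, pod.status.phase)  # pods, not bare containers, are what gets scheduled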
37. Which one of the following can be done for a container-based application using Azure Kubernetes?
Make container scaling easy.
Make workloads portable.
Build more extensible apps.
All of the above.
38. Which one of the following helps set up the cluster autoscaler to add capacity on demand?
Virtual nodes
VM Scale sets
Container
None of the above.
39. Which one of the following is incorrect regarding Azure Kubernetes?
Azure Kubernetes does not strictly require resources to be created in the cloud.
Azure Kubernetes manages container-based applications and makes their deployment easy.
Azure Kubernetes helps in automatic scheduling of container-based applications.
None of these.
40. Choose the correct option.
Azure Kubernetes is an open source platform.
etcd is used to maintain the state and configuration of the Kubernetes cluster.
Both A and B.
Neither A nor B.
41. Choose the wrong statement regarding Azure Kubernetes.
Use of Azure Kubernetes demands a very low minimum monthly charge.
It can integrate with Visual Studio Code.
It provides elastic scalability.
None of these.
42. Which one of the following is correct regarding clusters of Azure Kubernetes?
The cluster name need not be unique within the selected resource group.
Azure CLI can be used to create clusters.
Both A and B.
Neither A nor B.
43. Choose the correct option.
Azure Kubernetes can integrate with Azure Active Directory.
Role-based access control is possible in Azure Kubernetes.
Both A and B.
None of these.
44. Which one of the following can be considered the primary data store of Kubernetes?
node
pod
VM scale sets.
etcd
45. Which of the following Azure services is used for performing high-performance parallel computing jobs in the cloud?
Azure Batch Service.
Azure Kubernetes Service.
Azure Key Vault.
Azure App Services.
46. What are compute nodes in Azure Batch?
Applications
Job
Virtual Machines
Task
47. How are data files stored in Azure Blob storage accessed by Azure Batch?
Separate software needs to be installed.
Azure Batch has built-in support for accessing those files.
With the help of Compute nodes.
Azure Batch cannot access those files.
48. For which of the following can Azure Batch be used?
Fluid Dynamics
Image processing.
Software test execution.
All of the above.
49. What does Azure Batch provide by default for parallelization?
cluster
container
multiple nodes
None of these.
50. Which one of the following is incorrect regarding Azure Batch?
It allows running large-scale parallel workloads, although the cost is high.
Auto scaling is possible whenever required.
Auto scaling means it can provision more nodes when more tasks are queued.
None of these.
51. With which of the following can Azure Batch integrate for fetching data?
Azure Blob Storage only
Azure Data Lake storage only
Both Azure Blob Storage and Azure Data Lake Storage.
Neither Azure Blob Storage nor Azure Data Lake Storage.
52. Which type of node is not supported by Azure Batch?
Linux nodes
Windows nodes
Dockers
None of these.
53. Choose the correct option.
A task is a collection of jobs.
A job is a collection of tasks.
A job is a collection of compute nodes.
A task is a collection of compute nodes.
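A job is a collection of tasks, and tasks run on the pool's compute nodes (VMs). A minimal sketch with the azure-batch Python SDK; the account details, pool ID, and command lines are placeholders:

from azure.batch import BatchServiceClient
from azure.batch import models as batchmodels
from azure.batch.batch_auth import SharedKeyCredentials

creds = SharedKeyCredentials("<account>", "<key>")
client = BatchServiceClient(creds, batch_url="https://<account>.<region>.batch.azure.com")

# Create a job bound to an existing pool of compute nodes.
client.job.add(batchmodels.JobAddParameter(
    id="render-job",
    pool_info=batchmodels.PoolInformation(pool_id="render-pool"),
))

# Add tasks to the job; Batch schedules them onto the pool's nodes.
for i in range(3):
    client.task.add("render-job", batchmodels.TaskAddParameter(
        id="task-{}".format(i),
        command_line="/bin/bash -c 'echo frame {}'".format(i),
    ))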
54. Choose the correct option.
Azure Batch is a non-visual tool.
Azure Batch allows users to fully configure the nodes.
Azure Batch provides job scheduling and automatically scales and manages the VMs running these jobs.
All of these.