Olete.in – MCQs, Mock Tests & Government Job Prep | Apache Spark

1. How much faster than MapReduce can Apache Spark potentially run batch-processing programs when the data is processed in memory?
10 times faster
20 times faster
100 times faster
200 times faster

2. Which of the following uses Spark Core’s fast scheduling capability to perform streaming analytics?
RDD
GraphX
Spark Streaming
SparkR

3. Which of the following is the reason Spark is faster than MapReduce?
DAG execution engine and in-memory computation
Support for different language APIs like Scala, Java, Python and R
RDDs are immutable and fault-tolerant
None of the above

4. Can you combine Apache Spark libraries such as MLlib, GraphX, SQL and DataFrames in the same application?
Yes
No
None
None of These

5. Which of the following is true for RDD?
RDD is a programming paradigm
RDD in Apache Spark is an immutable collection of objects
It is a database
None of the above

6. Which of the following is not a function of SparkContext in Apache Spark?
Entry point to Spark SQL
To Access various services
To set the configuration
To get the current status of Spark Application

7. What are the features of Spark RDD?
In-memory computation
Lazy evaluations
Fault Tolerance
All of the above

8. How many SparkContexts can be active per JVM?
More than one
Only one
Not specific
None of the above

9. In how many ways can an RDD be created?
4
3
2
1

10. How many tasks does Spark run on each partition?
Any number of tasks
One
More than one but fewer than five
None of These

11. Which of the following is not a transformation?
flatMap()
map()
reduce()
filter()

12. Which of the following is not an action?
collect()
take(n)
top()
map()

13. You can connect an R program to a Spark cluster from:
RStudio
R Shell
Rscript
All of the above

15. For a multiclass classification problem, which algorithm is not a solution?
Naive Bayes
Random Forests
Logistic Regression
Decision Trees

16. For a regression problem, which algorithm is not a solution?
Logistic Regression
Ridge Regression
Decision Trees
Gradient-Boosted Trees

17. Which of the following is true about DataFrame?
DataFrames provide a more user-friendly API than RDDs.
The DataFrame API provides compile-time type safety.
Both the above
None of the above

18. Which of the following is a tool provided by the Machine Learning Library (MLlib)?
Persistence
Utilities like linear algebra, statistics
Pipelines
All of the above

19. Which of the following is false for Apache Spark?
It provides high-level APIs in Java, Python, R and Scala
It can be integrated with Hadoop and can process existing Hadoop HDFS data
Spark is an open-source framework written in Java
Spark can be up to 100 times faster than Hadoop MapReduce

20. Which of the following is true for Spark SQL?
It is the kernel of Spark
Provides an execution platform for all the Spark applications
It enables users to run SQL / HQL queries on top of Spark.
It enables users to run SQL / HQL queries on the top of Spark.

21. Which of the following is true for Spark Core?
It is the kernel of Spark
It enables users to run SQL / HQL queries on top of Spark.
It is the scalable machine learning library which delivers efficiencies
It drastically improves the performance of iterative algorithms.

22. Which of the following is true for SparkR?
It allows data scientists to analyze large datasets and interactively run jobs
It is the kernel of Spark
It is the scalable machine learning library which delivers efficiencies
It enables users to run SQL / HQL queries on top of Spark.

23. Which of the following is true for Spark MLlib?
Provides an execution platform for all Spark applications
It is the scalable machine learning library which delivers efficiencies
Enables powerful interactive and data analytics applications across live streaming data
All of the above

24. Which of the following is true for Spark Shell?
It helps Spark applications to easily run on the command line of the system
It runs/tests application code interactively
It allows reading from many types of data sources
All of the above

25. Which of the following is true for RDD?
We can operate Spark RDDs in parallel with a low-level API
RDDs are similar to a table in a relational database
It allows processing of a large amount of structured data
It has a built-in optimization engine

26. In which of the following cases do we keep the data in-memory?
Iterative algorithms
Interactive data mining tools
Both the above
None of These

27. When does Apache Spark evaluate an RDD?
Upon action
Upon transformation
On both transformation and action
None of the above
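
The idea behind question 27 (transformations are only recorded, and work happens when an action is called) can be sketched in plain Python. This is a toy model under stated assumptions, not the Spark API: the class `LazyDataset` and the helper `double` are hypothetical names invented for illustration.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, data, steps=()):
        self._data = list(data)
        self._steps = tuple(steps)  # pending transformations, in order

    def map(self, fn):
        # Transformation: returns a new dataset and computes nothing yet.
        return LazyDataset(self._data, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded.
        return LazyDataset(self._data, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now are the recorded steps applied to the data.
        out = self._data
        for kind, fn in self._steps:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out


calls = []  # records every element that double() actually processes


def double(x):
    calls.append(x)
    return x * 2


pending = LazyDataset([1, 2, 3]).map(double).filter(lambda x: x > 2)
calls_before_action = len(calls)  # still 0: nothing has been evaluated
result = pending.collect()        # the action triggers evaluation
calls_after_action = len(calls)
```

In this model `double` is never called until `collect()` runs, which mirrors the behaviour the question asks about.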

28. The write operation on an RDD is:
Fine-grained
Coarse-grained
Either fine-grained or coarse-grained
Neither fine-grained nor coarse-grained

29. What is an action in Spark RDD?
A way to send results from the executors to the driver
Takes an RDD as input and produces one or more RDDs as output.
Creates one or many new RDDs
All of the above

30. Which of the following is true about a narrow transformation?
The data required to compute resides on multiple partitions.
The data required to compute resides on a single partition.
Both the above
None of the above

31. Which of the following is true about a wide transformation?
The data required to compute resides on multiple partitions.
The data required to compute resides on a single partition.
Both 1 and 2
Neither of the two
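
The distinction in questions 30 and 31 can be modelled in plain Python by treating a dataset as a list of partitions. This is a minimal sketch, not Spark code: `narrow_map_values`, `wide_group_by_key` and `toy_partitioner` are hypothetical helpers, and the partitioner is a simple deterministic stand-in chosen for illustration.

```python
from collections import defaultdict

# Two input partitions of (key, value) records.
partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]


def toy_partitioner(key, num_out):
    # Deterministic stand-in for a hash partitioner (assumption for the sketch).
    return sum(key.encode()) % num_out


def narrow_map_values(parts, fn):
    # Narrow: each output partition is built from exactly one input partition.
    return [[(k, fn(v)) for k, v in part] for part in parts]


def wide_group_by_key(parts, num_out):
    # Wide: an output partition may need records from every input partition,
    # so records are redistributed by key (a "shuffle" in this toy model).
    out = [defaultdict(list) for _ in range(num_out)]
    for part in parts:
        for k, v in part:
            out[toy_partitioner(k, num_out)][k].append(v)
    return [dict(d) for d in out]


doubled = narrow_map_values(partitions, lambda v: v * 2)
grouped = wide_group_by_key(partitions, num_out=2)
```

Here the values for key `"a"` start in two different input partitions and end up together in one output partition, which is what makes the grouping step wide.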

32. The shortcomings of Hadoop MapReduce were overcome by Spark RDD through:
Lazy-evaluation
DAG
In-memory processing
All of the above

33. Which of the following is the entry point of a Spark application?
SparkSession
SparkContext
Neither of the two
Only 1

34. Which of the following is the entry point of Spark SQL?
SparkSession
SparkContext
Both 1 and 2
None

35. Which of the following is open-source?
Apache Spark
Apache Hadoop
Apache Flink
All of the above

36. Apache Spark supports:
Batch processing
Stream processing
Graph processing
All of the above

37. Which of the following is not true for the map() operation?
map() transforms an RDD of length N into another RDD of length N.
In the map() operation, the developer can define custom business logic.
It is applied to each element of the RDD and returns the result as a new RDD.
map() allows returning 0, 1 or more elements from the map function.

38. flatMap() transforms an RDD of length N into another RDD of length M. Which of the following is true for N and M? (a) N > M (b) N < M (c) N <= M
Either a or b
Either b or c
Either a or c
None of the above
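
The length behaviour contrasted in questions 37 and 38 can be checked with a plain-Python model. This is a sketch under stated assumptions rather than Spark code: `rdd_map` and `rdd_flat_map` are hypothetical helpers that mimic the element-wise semantics on ordinary lists.

```python
from itertools import chain


def rdd_map(fn, items):
    # One output element per input element, so the length is preserved.
    return [fn(x) for x in items]


def rdd_flat_map(fn, items):
    # fn returns zero or more elements per input; the results are flattened.
    return list(chain.from_iterable(fn(x) for x in items))


lines = ["a b c", "", "d"]                       # N = 3
mapped = rdd_map(str.split, lines)               # 3 lists of words
grown = rdd_flat_map(str.split, lines)           # 4 words in total
shrunk = rdd_flat_map(str.split, ["", "", "x"])  # 1 word in total
```

Because each input may contribute zero, one or several outputs, the flattened result can be longer or shorter than the input, while the mapped result always has the same length.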

40. Which of the following is a transformation?
take(n)
top()
countByValue()
mapPartitionsWithIndex()

41. Which of the following is an action?
union(dataset)
intersection(otherDataset)
distinct()
countByValue()

42. For which of the following actions is the result not returned to the driver?
collect()
top()
countByValue()
foreach()