🧪 Apache Hadoop MCQ Quiz Hub
Hadoop Multiple Choice Question
Choose a topic to test your knowledge and improve your Apache Hadoop skills
1. What does commodity hardware mean in the Hadoop world?
Very cheap hardware
Industry standard hardware
Discarded hardware
Low specifications Industry grade hardware
2. Which of the following are NOT big data problem(s)?
Parsing 5 MB XML file every 5 minutes
Processing IPL tweet sentiments
Processing online bank transactions
both (a) and (c)
3. What does “Velocity” in Big Data mean?
Speed of input data generation
Speed of individual machine processors
Speed of ONLY storing data
Speed of storing and processing data
4. The term Big Data first originated from:
Stock Markets Domain
Banking and Finance Domain
Genomics and Astronomy Domain
Social Media Domain
5. Which of the following batch processing instances is NOT an example of Big Data batch processing?
Processing 10 GB sales data every 6 hours
Processing flights sensor data
Web crawling app
Trending topic analysis of tweets for last 15 minutes
6. Which of the following are example(s) of Real Time Big Data Processing?
Complex Event Processing (CEP) platforms
Stock market data analysis
Bank fraud transactions detection
both (a) and (c)
7. Sliding window operations typically fall in the category of __________.
OLTP Transactions
Big Data Batch Processing
Big Data Real Time Processing
Small Batch Processing
8. What is HBase used as?
Tool for Random and Fast Read/Write operations in Hadoop
Faster Read only query engine in Hadoop
MapReduce alternative in Hadoop
Fast MapReduce layer in Hadoop
9. What is Hive used as?
Hadoop query engine
MapReduce wrapper
Hadoop SQL interface
All of the above
10. Which of the following are NOT true for Hadoop?
It’s a tool for Big Data analysis
It supports structured and unstructured data analysis
It aims for vertical scaling out/in scenarios
Both (a) and (c)
11. Which of the following are the core components of Hadoop?
HDFS
Map Reduce
HBase
Both (a) and (b)
12. Hadoop is open source.
ALWAYS True
True only for Apache Hadoop
True only for Apache and Cloudera Hadoop
ALWAYS False
13. Hive can be used for real time queries.
True
False
True if a data set is small
True for some distributions
14. What is the default HDFS block size?
32 MB
64 KB
128 KB
64 MB
15. What is the default HDFS replication factor?
4
1
3
2
16. Which of the following is NOT a type of metadata in NameNode?
List of files
Block locations of files
No. of file records
File access control information
17. Which of the following is/are correct?
NameNode is the SPOF in Hadoop 1.x
NameNode is the SPOF in Hadoop 2.x
NameNode keeps the image of the file system also
Both (a) and (c)
18. The mechanism used to create replicas in HDFS is __________.
Gossip protocol
Replicate protocol
HDFS protocol
Store and Forward protocol
19. NameNode tries to keep the first copy of data nearest to the client machine.
ALWAYS true
ALWAYS False
True if the client machine is part of the cluster
True if the client machine is not part of the cluster
20. Where is the HDFS replication factor controlled?
mapred-site.xml
yarn-site.xml
core-site.xml
hdfs-site.xml
21. Which of the following Hadoop config files is used to define the heap size?
hdfs-site.xml
core-site.xml
hadoop-env.sh
Slaves
22. Which of the following is not a valid Hadoop config file?
mapred-site.xml
hadoop-site.xml
core-site.xml
Masters
23. Read the statement: NameNodes are usually high storage machines in the clusters.
True
False
Depends on cluster size
True if co-located with Job tracker
24. From the options listed below, select the suitable data sources for Flume.
Publicly open web sites
Local data folders
Remote web servers
Both (a) and (c)
25. Read the statement and select the correct options: distcp command ALWAYS needs fully qualified hdfs paths.
True
False
True, if source and destination are in the same cluster
False, if source and destination are in the same cluster
26. Which of the following statement(s) are true about the distcp command?
It invokes MapReduce in background
It invokes MapReduce if source and destination are in the same cluster
It can’t copy data from a local folder to an HDFS folder
You can’t overwrite the files through distcp command
27. Which of the following is NOT a component of Flume?
Sink
Database
Source
Channel
28. Which of the following is the correct sequence of MapReduce flow?
Combine → Reduce → Map
Map → Combine → Reduce
Reduce → Combine → Map
None of These
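The stages named in these options can be pictured with a toy word count. This is a minimal in-memory sketch in plain Python, not Hadoop code: the function names `map_phase`, `sum_by_key`, and `run_job` are hypothetical, and each inner list stands in for one input split. In a real job the combiner is optional and the framework may run it zero or more times.

```python
from collections import defaultdict

def map_phase(line):
    # Emit one (word, 1) pair per word in the input line.
    return [(word, 1) for word in line.split()]

def sum_by_key(pairs):
    # Sum the values for each key; used here for both the combine and reduce steps.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def run_job(splits):
    combined = []
    for split in splits:
        # Map runs once per record in the split.
        mapped = [pair for line in split for pair in map_phase(line)]
        # Combine pre-aggregates the output of this one split locally.
        combined.extend(sum_by_key(mapped).items())
    # Reduce aggregates the pre-aggregated pairs from every split.
    return sum_by_key(combined)

# Two hypothetical input splits.
splits = [["big data", "big cluster"], ["data node"]]
result = run_job(splits)
print(result)
```

Because summing is associative, the same function can serve as the local and the global aggregation step in this sketch.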
29. Which of the following can be used to control the number of part files in a map reduce program output directory?
Number of Mappers
Number of Reducers
Counter
Partitioner
30. Which of the following operations can’t use the Reducer as a combiner as well?
Group by Minimum
Group by Maximum
Group by Count
Group by Average
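Whether a reducer can double as a combiner depends on whether applying the operation to partial results gives the same answer as applying it to all values at once. The sketch below checks this with plain Python arithmetic, not Hadoop code; `map_outputs` is made-up data representing the values for one key as produced by two map tasks.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical values for a single key, split across two map tasks.
map_outputs = [[10, 20, 30], [40]]
all_values = [v for part in map_outputs for v in part]

# Minimum: taking the minimum of per-task minimums matches the global minimum.
true_min = min(all_values)
min_of_mins = min(min(part) for part in map_outputs)

# Average: averaging the per-task averages does not match the global average
# when the tasks hold different numbers of values.
true_mean = mean(all_values)
mean_of_means = mean([mean(part) for part in map_outputs])

# Carrying (sum, count) pairs instead keeps the partial results combinable.
partials = [(sum(part), len(part)) for part in map_outputs]
mean_from_partials = sum(s for s, _ in partials) / sum(c for _, c in partials)

print(true_min, min_of_mins, true_mean, mean_of_means, mean_from_partials)
```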
31. Which of the following is/are true about combiners?
Combiners can be used for a mapper-only job
Combiners can be used for any Map Reduce operation
Mappers can be used as a combiner class
Combiners are primarily aimed to improve Map Reduce performance
32. Reduce side join is useful for
Very large datasets
Very small data sets
One small and other big data sets
One big and other small datasets
33. Distributed Cache can be used in
Mapper phase only
Reducer phase only
In either phase, but not on both sides simultaneously
In either phase
34. What is the optimal size of a file for distributed cache?
<=10 MB
>=250 MB
<=100 MB
<=35 MB
35. Number of mappers is decided by the
Mappers specified by the programmer
Available Mapper slots
Available heap memory
Input Splits
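As a rough illustration of how input splits relate to file sizes, the sketch below estimates the number of splits for a set of files. It is a simplification in plain Python under stated assumptions: every file is splittable, the split size is fixed, and the small slack that Hadoop's `FileInputFormat` allows on the last split is ignored. The function name `estimate_map_tasks` and the sizes used are hypothetical.

```python
MB = 1024 * 1024

def estimate_map_tasks(file_sizes_bytes, split_size_bytes):
    # One split per full or partial chunk of each file; empty files are skipped
    # in this simplified model. -(-a // b) is integer ceiling division.
    return sum(-(-size // split_size_bytes)
               for size in file_sizes_bytes if size > 0)

# Hypothetical job input: a 200 MB file and a 10 MB file, with 64 MB splits.
tasks = estimate_map_tasks([200 * MB, 10 * MB], 64 * MB)
print(tasks)
```

In this model, many small files produce many splits even when the total data volume is small, because splits never span files.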
36. Which of the following types of joins can be performed in a reduce-side join operation?
Equi Join
Left Outer Join
Full Outer Join
All of the above
37. What should be an upper limit for counters of a Map Reduce job?
~5
~15
~150
~50
38. Which of the following classes is responsible for converting inputs to key-value pairs in MapReduce?
FileInputFormat
InputSplit
RecordReader
Mapper
39. Which of the following Writables can be used to know the value from a mapper/reducer?
Text
IntWritable
NullWritable
String
40. A Map reduce job can be written in:
Java
Ruby
Python
Any Language which can read from input stream
41. Pig is a:
Programming Language
Data Flow Language
Query Language
Database
42. Pig is good for:
Data Factory operations
Creating multiple datasets from a single large dataset
Implementing complex SQLs
Both (A) and (B)
43. Which of the following is the correct representation to access ‘Skill’ from the Bag {‘Skills’, 55, (‘Skill’, ‘Speed’), {2, (‘San’, ‘Mateo’)}}?
$3.$1
$3.$0
$2.$0
$2.$1
44. Maximum size allowed for small dataset in replicated join is:
10KB
10 MB
100 MB
500 MB
45. Parameters could be passed to Pig scripts from:
Parent Pig Scripts
Shell Script
Configuration File
All the above except (a)
46. The schema of a relation can be examined through:
ILLUSTRATE
DESCRIBE
DUMP
EXPLAIN
47. Data can be supplied to PigUnit tests from:
HDFS Location
Within Program
Both (a) and (b)
None of the above
48. Which of the following constructs are valid Pig Control Structures?
if-else
For Loop
Until Loop
None of the above
49. Which of the following is the return data type of a Filter UDF?
String
Integer
Boolean
None of the above
50. Which of the following are not possible in Hive?
Creating Synonym
Writing Update Statements
Creating Indexes
Both (a) and (b)
51. Who will initiate the mapper?
Task tracker
Job tracker
Combiner
Reducer
52. Which of the following are the Big Data Solutions Candidates?
Processing 30 minutes Flight sensor data
Interconnecting 50K data points (approx. 1 MB input file)
Processing User clicks on a website
All of the above
53. Hadoop is a framework that allows the distributed processing of:
Small Data Sets
Semi-Large Data Sets
Large Data Sets
Large and Small Data sets
54. Which of the following are NOT metadata items?
HDFS block locations
Replication factor of files
Access Rights
File Records distribution
55. What decides the number of Mappers for a MapReduce job?
File Location
mapred.map.tasks parameter
Input file size
Input Splits
56. The NameNode monitors the block replication process.
TRUE
FALSE
Depends on file type
All of the above
57. Which of the following are true for Hadoop Pseudo Distributed Mode?
It runs on multiple machines
Runs on multiple machines without any daemons
Runs on Single Machine with all daemons
Runs on Single Machine without all daemons
58. Which of the following statement(s) are correct?
Master and slaves files are optional in Hadoop 2.x
Master file has list of all name nodes
Core-site has hdfs and MapReduce related common properties
hdfs-site file is now deprecated in Hadoop 2.x
59. Which of the following is true for Hive?
Hive is the database of Hadoop
Hive supports schema checking
Hive doesn’t allow row level updates
Hive can replace an OLTP system
60. Which of the following is the highest level of Data Model in Hive?
Table
View
Database
Partitions
61. Hive query response time is in the order of
Hours at least
Minutes at least
Seconds at least
Milliseconds at least
62. Managed tables in Hive:
Can load the data only from HDFS
Can load the data only from local file system
Are useful for enterprise wide data
Are Managed by Hive for their data and metadata