[MCQs] Big Data - Last Moment Tuitions (2024)

1. A ________ serves as the master, and there is only one NameNode per cluster.
a) Data Node
b) NameNode
c) Data block
d) Replication
Answer: b
Explanation: All HDFS metadata, including information about DataNodes, the files stored on HDFS, and replication, is stored and maintained on the NameNode.

2. Point out the correct statement.
a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks
b) Each incoming file is broken into 32 MB by default
c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
d) None of the mentioned
Answer: a
Explanation: There can be any number of DataNodes in a Hadoop Cluster.

3. HDFS works in a __________ fashion.
a) master-worker
b) master-slave
c) worker/slave
d) all of the mentioned
Answer: a
Explanation: The NameNode serves as the master and each DataNode serves as a worker/slave.

4. ________ NameNode is used when the primary NameNode goes down.
a) Rack
b) Data
c) Secondary
d) None of the mentioned
Answer: c
Explanation: The Secondary NameNode periodically checkpoints the NameNode's metadata by merging the fsimage and the edits log, which helps restore the file system state if the primary NameNode goes down.

5. Point out the wrong statement.
a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level
b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode
c) User data is stored on the local file system of DataNodes
d) DataNode is aware of the files to which the blocks stored on it belong to
Answer: d
Explanation: Only the NameNode knows which file each block belongs to; a DataNode stores blocks without knowing the files they are part of.

6. Which of the following scenarios may not be a good fit for HDFS?
a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b) HDFS is suitable for storing data related to applications requiring low latency data access
c) HDFS is suitable for storing a large number of small files
d) None of the mentioned
Answer: a
Explanation: HDFS is well suited to archival storage because it stores data on low-cost commodity hardware while ensuring a high degree of fault tolerance; it is not suited to multiple simultaneous writers or low-latency access.

7. The need for data replication can arise in various scenarios like ____________
a) Replication Factor is changed
b) DataNode goes down
c) Data Blocks get corrupted
d) All of the mentioned
Answer: d
Explanation: Data is replicated across different DataNodes to ensure a high degree of fault-tolerance.
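
As a concrete illustration of the first option above (a minimal sketch using the standard org.apache.hadoop.fs.FileSystem API; the file path and new replication value are hypothetical), the replication factor can be changed for a single file, which triggers re-replication of its blocks:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetReplicationExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);       // handle to the configured file system
            Path file = new Path("/data/example.txt");  // hypothetical file path
            // Raise this file's replication factor to 5; the cluster-wide
            // default (dfs.replication, normally 3) stays unchanged.
            fs.setReplication(file, (short) 5);
            fs.close();
        }
    }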

8. ________ is the slave/worker node and holds the user data in the form of Data Blocks.
a) DataNode
b) NameNode
c) Data block
d) Replication
Answer: a
Explanation: A DataNode stores data in the Hadoop file system (HDFS). A functional filesystem has more than one DataNode, with data replicated across them.

9. HDFS provides a command line interface called __________ used to interact with HDFS.
a) “HDFS Shell”
b) “FS Shell”
c) “DFS Shell”
d) None of the mentioned
Answer: b
Explanation: The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS).
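
The same operations the FS shell exposes as commands are also available from Java. The sketch below (directory name hypothetical) mirrors hdfs dfs -mkdir and hdfs dfs -ls using the public FileSystem API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FsShellEquivalents {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Equivalent of: hdfs dfs -mkdir /user/demo   (path is hypothetical)
            fs.mkdirs(new Path("/user/demo"));
            // Equivalent of: hdfs dfs -ls /user/demo
            for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
                System.out.println(status.getPath());
            }
            fs.close();
        }
    }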

10. HDFS is implemented in the _____________ programming language.
a) C++
b) Java
c) Scala
d) None of the mentioned
Answer: b
Explanation: HDFS is implemented in Java, and any computer that can run Java can host a NameNode/DataNode.

11. For YARN, the ___________ Manager UI provides host and port information.
a) Data Node
b) NameNode
c) Resource
d) Replication
Answer: c
Explanation: The YARN ResourceManager web UI exposes the host and port information for the ResourceManager and the services it runs, along with cluster and application status.

12. Point out the correct statement.
a) The Hadoop framework publishes the job flow status to an internally running web server on the master nodes of the Hadoop cluster
b) Each incoming file is broken into 32 MB by default
c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
d) None of the mentioned
Answer: a
Explanation: Each Hadoop daemon runs an embedded web server on the master nodes; for example, the HDFS web interface shows information about the NameNode itself.

13. For ________, the HBase Master UI provides information about the HBase Master uptime.
a) HBase
b) Oozie
c) Kafka
d) All of the mentioned
Answer: a
Explanation: The HBase Master UI provides information about the number of live, dead and transitional servers, logs, ZooKeeper information, debug dumps, and thread stacks.

14. During startup, the ___________ loads the file system state from the fsimage and the edits log file.
a) DataNode
b) NameNode
c) ActionNode
d) None of the mentioned
Answer: b
Explanation: On startup, the NameNode reads the fsimage checkpoint and then applies the transactions recorded in the edits log to reconstruct the current state of the file system.

15. A ________ node acts as the slave and is responsible for executing a task assigned to it by the JobTracker.
a) MapReduce
b) Mapper
c) TaskTracker
d) JobTracker
Answer: c
Explanation: The TaskTracker receives the information necessary to execute a task from the JobTracker, executes the task, and sends the results back to the JobTracker.

16. Point out the correct statement.
a) MapReduce tries to place the data and the compute as close as possible
b) Map Task in MapReduce is performed using the Mapper() function
c) Reduce Task in MapReduce is performed using the Map() function
d) All of the mentioned
Answer: a
Explanation: This feature of MapReduce is known as "data locality".

17. The ___________ part of MapReduce is responsible for processing one or more chunks of data and producing the output results.
a) Maptask
b) Mapper
c) Task execution
d) All of the mentioned
Answer: a
Explanation: Map Task in MapReduce is performed using the Map() function.
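
To make the map side concrete, here is a minimal word-count Mapper written against the standard org.apache.hadoop.mapreduce API (the class name is illustrative): it processes one chunk of input and emits (word, 1) pairs as its output.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Processes one chunk (input split) of text, one line at a time.
    public class TokenizingMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);  // emit an intermediate (word, 1) pair
            }
        }
    }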

18. The _________ function is responsible for consolidating the results produced by each of the Map() functions/tasks.
a) Reduce
b) Map
c) Reducer
d) All of the mentioned
Answer: a
Explanation: The Reduce function collates the work of the map tasks and consolidates the results.
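
Continuing the word-count sketch from the previous question, a matching Reducer (again, class name illustrative) consolidates the (word, 1) pairs produced by all of the map tasks into one count per word:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Receives all values for one key and collates them into a single result.
    public class SummingReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();  // consolidate the counts from every map task
            }
            result.set(sum);
            context.write(key, result);
        }
    }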

19. Point out the wrong statement.
a) A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner
b) The MapReduce framework operates exclusively on <key, value> pairs
c) Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods
d) None of the mentioned
Answer: d
Explanation: The MapReduce framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks.

20. Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________
a) Java
b) C
c) C#
d) None of the mentioned
Answer: a
Explanation: Hadoop Pipes is a SWIG-compatible C++ API for implementing MapReduce applications (non-JNI based).

21. ________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer.
a) Hadoop Strdata
b) Hadoop Streaming
c) Hadoop Stream
d) None of the mentioned
Answer: b
Explanation: Hadoop streaming is one of the most important utilities in the Apache Hadoop distribution.
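
For example, the Hadoop Streaming documentation demonstrates a job run entirely with ordinary executables, passing -mapper /bin/cat and -reducer /usr/bin/wc to the streaming jar, so neither side has to be written in Java.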

22. __________ maps input key/value pairs to a set of intermediate key/value pairs.
a) Mapper
b) Reducer
c) Both Mapper and Reducer
d) None of the mentioned
Answer: a
Explanation: Maps are the individual tasks that transform input records into intermediate records.

23. The number of maps is usually driven by the total size of ____________
a) inputs
b) outputs
c) tasks
d) None of the mentioned
Answer: a
Explanation: Total size of inputs means the total number of blocks of the input files.
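
As a worked example from the Hadoop tutorial: with 10 TB of input data and a block size of 128 MB, a job ends up with roughly 82,000 maps, one per input block.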

24. _________ is the default Partitioner for partitioning the key space.
a) HashPar
b) Partitioner
c) HashPartitioner
d) None of the mentioned
Answer: c
Explanation: The default partitioner in Hadoop is HashPartitioner, whose getPartition method assigns each record to a reduce partition based on the hash of its key.
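
For reference, the logic of HashPartitioner fits in one line. The sketch below mirrors the library implementation: the sign bit of the key's hash is masked off so the modulus is non-negative, and equal keys always land in the same reduce partition.

    import org.apache.hadoop.mapreduce.Partitioner;

    // Mirrors org.apache.hadoop.mapreduce.lib.partition.HashPartitioner.
    public class HashLikePartitioner<K, V> extends Partitioner<K, V> {
        @Override
        public int getPartition(K key, V value, int numReduceTasks) {
            // Mask the sign bit, then spread keys evenly across reducers.
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }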

25. Running a ___________ program involves running mapping tasks on many or all of the nodes in the cluster.
a) MapReduce
b) Map
c) Reducer
d) All of the mentioned
Answer: a
Explanation: Map tasks are distributed across the nodes of the cluster so that each node processes the input data that is stored locally on it.
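
Tying the earlier Mapper and Reducer sketches together, a minimal driver (the job name and the input/output paths taken from args are placeholders) submits the job whose map tasks then fan out across the nodes of the cluster:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(TokenizingMapper.class);  // from the sketch above
            job.setReducerClass(SummingReducer.class);   // from the sketch above
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. /user/demo/in
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. /user/demo/out
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }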
