HDFS Summary

HDFS architecture is actually similar to the GFS system. It’s a popular topic especially for big data engineer. It can also be used during the inverview as a part of the system design.

So, what is HDFS?

Hadoop distributed file system
First, it is a file system. But it is not just a file system. It’s a fault-tolernat file system and it’s designed to run on inexpensive hardware

Why we need it?

It’s faster(compared with reading data within one machine) and easy to use (with all the configuration)

How to use it?

You can use hdfs commands to use it e.g. setup input and output

What’s the architecture of the HDFS?

Similar to GFS, it has master node and slave node. Master node will store the meta data and control the file storage location. Slave node will save the blocks.
It also has the replication function and using hdfs client to send requests.