This project is a Distributed File Storage System inspired by the Google File System (GFS) paper. The goal of this project is to create a file system that supports storing a file, appending to an existing file, and deleting a file. It should also be fault-tolerant against network partitions, node failures, disk failures, and Namenode crashes.

High Level View

In this section we will discuss the high-level design of this system. The system consists of three major components that together form one file storage cluster, each with its own role. The system also relies on external components: the Namenode uses the Log Server for persistent storage, ElasticSearch and Kibana for distributed log management, and an APM server for performance monitoring. Each component is explained below:

  1. Client : This is the user-facing component of the system. The user interacts only with the client, which is responsible for connecting to the Namenode and Datanodes to fetch and store data. It is essentially an abstraction over the whole system that makes it easy for users to use. The primary tasks of the client are listed below:

    1. Append file: It will support appending to an already stored file. This functionality is missing from the phase 1 implementation; more details on this will follow in the future.
    2. Store file : The client receives a file from the user and requests target Datanodes from the Namenode for storage. It then connects to the Datanodes returned by the Namenode to store the data. It is responsible for dividing the file into chunks based on the offsets provided by the Namenode and transferring these chunks to the assigned Datanodes.
    3. Fetch file : The client also fetches a file from the system as chunks and provides it to the user as a single file. It connects to the Namenode to obtain the details and locations of the stored file's chunks, then contacts the Datanodes to fetch the chunks, merges them into a single file, and returns it to the user.
    4. Delete file : The client can also request that the Namenode delete a file. Here the client's only responsibility is to send the request to the Namenode; after that, it is the Namenode's responsibility to remove all chunks associated with the file.
    5. Retry on failure: The client is also responsible for retrying when a recoverable/retryable error occurs in any of the above tasks. The retry mechanism is based on the selected policy (currently exponential backoff).
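The retry-on-failure behavior can be sketched as follows. This is a minimal illustration of an exponential-backoff policy, not the project's actual retry code; the function name and parameters are assumptions.

```rust
use std::time::Duration;

/// Hypothetical retry helper (illustrative only): retries a fallible
/// operation up to `max_attempts` times, doubling the delay each time.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base_delay_ms: u64,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                attempt += 1;
                if attempt >= max_attempts {
                    // Out of attempts: surface the last error to the caller.
                    return Err(err);
                }
                // Exponential backoff: base * 2^(attempt - 1).
                let delay = base_delay_ms * (1u64 << (attempt - 1));
                std::thread::sleep(Duration::from_millis(delay));
            }
        }
    }
}
```

A real implementation would also distinguish retryable errors (e.g. network timeouts) from permanent ones (e.g. file not found), which should fail immediately.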

    High level view of system


  2. Namenode : This is the primary component of the whole system. It is responsible for maintaining all metadata about the Datanodes and the locations of stored files, providing clients with those locations, and checking the Datanodes' heartbeats. It does not store actual file data, only metadata; persistent state is managed via the Log Server. It is also responsible for load distribution across Datanodes and for maintaining the replication factor (handling under- and over-replicated blocks). The primary tasks of the Namenode are listed below:

    1. Provide Datanodes for storage : The Namenode serves the client with a list of Datanodes for file storage. The list is chosen based on current Datanode status and load, using configurable policies to pick the best candidates.
    2. List block locations of a file : The client can ask the Namenode for the locations of a file's stored chunks. The answer is based on the metadata the Namenode maintains from state sync messages, as well as the current load on each Datanode, so as to avoid hotspots.
    3. Maintain the replication factor : The Namenode acts as a watchdog, checking whether any block is under- or over-replicated. It contacts Datanodes and orders them to replicate a block or delete it, handling both under- and over-replication.
    4. Maintain metadata : The Namenode maintains metadata about the current state of the cluster, e.g. which Datanode is under load and where each chunk is located, and it removes a node from a chunk's locations if that Datanode goes offline.
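A load-aware placement policy like the one described in task 1 could be sketched as below. The struct fields and ranking criteria (free space first, then active streams) are illustrative assumptions, not the project's real types or policy.

```rust
/// Illustrative view of the per-Datanode state the Namenode might track
/// from state sync messages; field names are assumptions.
#[derive(Clone, Debug)]
struct DatanodeInfo {
    id: String,
    free_bytes: u64,
    active_streams: u32,
}

/// Pick `replication` target Datanodes, preferring nodes with the most
/// free space and, on ties, the fewest active streams (to avoid hotspots).
fn pick_targets(mut nodes: Vec<DatanodeInfo>, replication: usize) -> Vec<DatanodeInfo> {
    nodes.sort_by(|a, b| {
        b.free_bytes
            .cmp(&a.free_bytes)
            .then(a.active_streams.cmp(&b.active_streams))
    });
    nodes.truncate(replication);
    nodes
}
```

Making this a trait with multiple implementations is one natural way to realize the "configurable policies" mentioned above.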
  3. Datanode : This component is primarily responsible for storing the data. It responds to client requests to store chunks and is also responsible for piping the data stream received from the client to other peers for replication. The primary tasks of the Datanode are listed below:

    1. Store data : The primary task of this component is to store data. It stores data in two cases: when a client sends data, or when another Datanode does (during data piping for replication). This is done using a TCP-based protocol.
    2. Delete data : A Datanode can delete a stored chunk, but only after receiving a delete-chunk message from the Namenode. A client cannot directly ask a Datanode to delete a chunk; the request must go through the Namenode. The Namenode can also instruct a Datanode to delete a chunk in case of over-replication.
    3. Heartbeat messages : The Datanode sends heartbeat messages to the Namenode to prove its liveness.
    4. State sync messages : The Datanode also shares its current state with the Namenode, which helps the Namenode maintain metadata and make decisions based on it. This message includes the stored chunks, the available storage, and details about the current load on the Datanode.
    5. Pipe data : When a Datanode receives a chunk store request (gRPC), it also receives the list of next peers to which the chunk should be replicated. Based on this, the Datanode forwards the data stream to those peer Datanodes for replication.
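The piping behavior in task 5 can be illustrated with a small in-memory simulation: each node stores the chunk locally, then forwards it to the next peer in the pipeline, passing along the rest of the peer list. This is a sketch only; the real system streams bytes over gRPC/TCP rather than recursing over an in-memory map.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the cluster: node id -> stored chunks.
/// Names and shapes are assumptions for illustration.
fn store_and_pipe(
    cluster: &mut HashMap<String, Vec<Vec<u8>>>,
    node: &str,
    chunk: Vec<u8>,
    remaining_peers: &[String],
) {
    // Store the chunk locally first.
    cluster.entry(node.to_string()).or_default().push(chunk.clone());
    // Forward to the next peer, handing it the rest of the pipeline.
    if let Some((next, rest)) = remaining_peers.split_first() {
        store_and_pipe(cluster, next, chunk, rest);
    }
}
```

In the real system each hop happens on a different machine, and a production pipeline would also report per-hop acknowledgements back toward the client so failures mid-pipeline can be detected.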

The above high-level view outlines the core responsibilities each component must fulfill for the system to work.

Implementation of this project is being done in multiple phases; currently (29/07/25) I have completed the 1st phase. All the sections below are based on this phase's implementation.

Tech Stack

For this project I am using Rust as the primary programming language. The project uses both gRPC and a custom TCP protocol for communication. I have used multiple Rust crates in this project; the most important among them are described below:

Deep Dive: How Things Work

In this section we will discuss the high-level view of the system and how its different tasks are supported. All the discussion in this section is based on the current (phase 1) implementation of Whispering Woods. It should make clear how the components of the system interact with each other. Currently we use a CLI as the medium for sending instructions from the user to the client: the user types an instruction with arguments into the client CLI, and the client completes the task as instructed. Complete details for each operation are discussed below:
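As a rough illustration of the CLI layer, a minimal sketch of parsing an instruction with arguments into a client command is shown below. The subcommand names (`store`, `fetch`, `delete`) and the `Command` enum are hypothetical, not the project's actual CLI surface.

```rust
/// Hypothetical client commands for illustration only.
#[derive(Debug, PartialEq)]
enum Command {
    Store { path: String },
    Fetch { name: String },
    Delete { name: String },
}

/// Map a typed instruction (already split into arguments) to a command.
/// Unrecognized input yields `None` so the CLI can print usage help.
fn parse_command(args: &[&str]) -> Option<Command> {
    match args {
        ["store", path] => Some(Command::Store { path: path.to_string() }),
        ["fetch", name] => Some(Command::Fetch { name: name.to_string() }),
        ["delete", name] => Some(Command::Delete { name: name.to_string() }),
        _ => None,
    }
}
```

In practice a crate such as `clap` would typically handle this parsing, but the dispatch idea is the same: one variant per user-facing operation.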