Hedvig: Architecture


Introduction

The Hedvig architecture is based on Apache Cassandra, an open source distributed database management system originally designed to handle large amounts of data across many commodity servers while providing high availability with no single point of failure.

Hedvig is a highly scalable, software-defined distributed storage platform that can be accessed via multiple industry-standard interfaces.

Terms and Definitions

  • Hedvig Storage Proxy - A lightweight storage access layer that runs as a guest VM or Docker container to provide storage resources to compute environments.
  • Hedvig Virtual Disk - The storage disk volume abstraction presented by the Hedvig Distributed Storage Platform.
  • Hedvig deploy - The server or VM responsible for deploying storage and managing services.
  • Storage nodes - Servers hosting storage.

Architecture

Hedvig Storage services

These services are responsible for presenting the storage hardware as virtual storage. The Hedvig Storage Service operates as an optimized key-value store and is responsible for writing data directly to the storage media. It captures all random writes into the system and orders them sequentially into a log-structured format, so that they are flushed to disk as sequential writes. This allows data to be ingested at a high rate and optimizes disk utilization.
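The write path described above can be illustrated with a minimal sketch. It is not Hedvig's implementation: the class name `LogStructuredStore`, the `flush_threshold` parameter, and the in-memory `io.BytesIO` standing in for the on-disk log are all assumptions made for illustration. Writes arrive in arbitrary key order, are buffered, and are then appended to the tail of the log in a single sequential pass.

```python
import io


class LogStructuredStore:
    """Toy log-structured key-value store (illustrative only, not Hedvig code)."""

    def __init__(self, flush_threshold=4):
        self.buffer = []          # pending writes, kept in arrival order
        self.log = io.BytesIO()   # stands in for the sequential on-disk log
        self.index = {}           # key -> (offset, length) of the latest flushed value
        self.flush_threshold = flush_threshold  # hypothetical tuning knob

    def put(self, key, value):
        """Accept a write for any key; nothing is written in place."""
        self.buffer.append((key, value))
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Append all buffered writes to the tail of the log in one pass."""
        self.log.seek(0, io.SEEK_END)
        for key, value in self.buffer:
            offset = self.log.tell()
            self.log.write(value)
            self.index[key] = (offset, len(value))  # newer entries shadow older ones
        self.buffer.clear()

    def get(self, key):
        """Return the latest value for key, or None if it was never written."""
        for buffered_key, value in reversed(self.buffer):
            if buffered_key == key:
                return value
        if key not in self.index:
            return None
        offset, length = self.index[key]
        self.log.seek(offset)
        return self.log.read(length)
```

Overwriting a key never modifies earlier log bytes; the index simply points at the newest copy. A real store would also need compaction to reclaim the space held by superseded values, which this sketch omits.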

The Hedvig Storage Service consists of two primary processes:

1. Data process: This process defines the layout of the data on raw disk. Hedvig storage has two main units:

   1. Storage pools: a logical grouping of 3 disks in a node.
   2. Containers: 16 GB chunks of data (mostly on SSDs).

   Virtual disks are stored across containers, each residing on a unique storage pool on each node.

2. Metadata process: Responsible for how and where data is read and written. The number of replicas, replica locations, storage pools, and containers are all tracked by the metadata process. Metadata is present across all nodes and can be cached at the proxies for faster access.
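As a rough illustration of the two layout units, the sketch below groups a node's disks into pools of three and maps a virtual disk byte offset to a container. The function names are hypothetical, and two details are assumptions: that "16 GB" means 16 GiB, and that disks left over after forming full pools are simply skipped.

```python
CONTAINER_SIZE = 16 * 1024**3  # 16 GB container, read here as 16 GiB (assumption)
DISKS_PER_POOL = 3             # a storage pool groups 3 disks in a node


def build_storage_pools(disks):
    """Group a node's disks into pools of three; leftover disks are skipped here."""
    usable = len(disks) - len(disks) % DISKS_PER_POOL
    return [disks[i:i + DISKS_PER_POOL] for i in range(0, usable, DISKS_PER_POOL)]


def container_for_offset(offset):
    """Map a virtual disk byte offset to (container index, offset inside it)."""
    return divmod(offset, CONTAINER_SIZE)
```

For example, a byte offset of 40 GiB falls 8 GiB into the third container (index 2). In the real system it is the metadata process, not a fixed formula, that records which storage pool on which node holds each container replica.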

Hedvig Storage proxies

Hedvig replication

  • Can create up to 6 replicas of data across 4 active data centers.
  • A single virtual disk configured with a replication factor of 3 reports a write as successful only once 2 storage nodes have sent an acknowledgment.
  • Wide-stripe policy for failed disks: this background process is independent of primary I/O and therefore has no performance impact on it.
  • Rack Aware virtual disk policy: uses the third octet of the storage node's IP address to determine the rack.
  • Datacenter Aware policy: uses the second octet of the storage node's IP address to determine the data center.
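The acknowledgment rule and the octet-based policies can be sketched as follows. This is an illustration under stated assumptions, not Hedvig's code: all function names are hypothetical, the majority formula generalises the single "2 of 3" case given above, and placing one replica per rack is an assumed reading of what "rack aware" implies.

```python
import ipaddress


def datacenter_of(ip):
    """Second octet of the IPv4 address identifies the data center."""
    return ipaddress.IPv4Address(ip).packed[1]


def rack_of(ip):
    """Third octet of the IPv4 address identifies the rack."""
    return ipaddress.IPv4Address(ip).packed[2]


def write_succeeded(acks, replication_factor=3):
    """Majority rule (assumption): 2 acknowledgments are enough for a factor of 3."""
    required = replication_factor // 2 + 1
    return acks >= required


def pick_rack_aware_replicas(node_ips, replication_factor):
    """Pick nodes from distinct racks, in list order (assumed placement rule)."""
    chosen, seen_racks = [], set()
    for ip in node_ips:
        rack = (datacenter_of(ip), rack_of(ip))
        if rack not in seen_racks:
            seen_racks.add(rack)
            chosen.append(ip)
        if len(chosen) == replication_factor:
            break
    return chosen
```

With nodes `10.1.1.10`, `10.1.1.11`, `10.1.2.10`, and `10.1.3.10` and a replication factor of 3, the second node is skipped because it shares rack 1 with the first.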


Hedvig Deduplication

Hedvig provides inline, global deduplication to optimize I/O. Data reduction efficiency benefits from all data being deduplicated and stored across the cluster.
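A minimal sketch of the idea, assuming fixed-size chunks fingerprinted with SHA-256: the source does not state Hedvig's chunk size or hashing scheme, so the `DedupStore` class and its `chunk_size` parameter are hypothetical. "Inline" here means a duplicate is detected before the chunk is stored; "global" means one fingerprint table is shared by every writer.

```python
import hashlib


class DedupStore:
    """Toy inline, global deduplicating chunk store (illustrative only)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size  # hypothetical fixed chunk size
        self.chunks = {}              # fingerprint -> chunk bytes, shared by all writers

    def write(self, data):
        """Store data and return its recipe: the ordered list of chunk fingerprints."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Inline check: a chunk that is already present is not stored again.
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        return recipe

    def read(self, recipe):
        """Reassemble the original data from its recipe."""
        return b"".join(self.chunks[fingerprint] for fingerprint in recipe)
```

Writing `b"AAAABBBBAAAA"` and then `b"BBBBCCCC"` with `chunk_size=4` leaves only three unique chunks in the store, because the repeated `AAAA` and `BBBB` chunks are shared across both writes.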