
Mimori

distributed kv store written from scratch in go, on top of raft. strong consistency, follower reads, dynamic membership, full observability stack.

Go, Raft, gRPC, Prometheus, Grafana

Mimori is a distributed key-value store I built in Go to actually understand Raft. Not "wrap an existing library and call it learning" — the consensus protocol, persistence, membership transitions, all implemented by hand against the paper.

Related writing: "raft: how distributed systems actually agree".

why build this

Distributed systems blogs make Raft sound clean. Implementing it teaches you how many edge cases the diagrams quietly skip: split votes, stale leaders, log entries replicated under one term that can only be committed under another, snapshots that arrive before the receiver has caught up. The only way to internalise the algorithm is to write the parts that hurt.

what works

  • Raft from scratch: leader election (under 500ms), log replication, snapshotting
  • Strong consistency for writes through the leader, plus follower reads for scale: bounded staleness of around 300ms and roughly 72× the throughput of going through Raft consensus (~2,310 vs ~32 ops/sec in the numbers below)
  • Dynamic membership: add and remove nodes without downtime
  • Leader transfer for graceful maintenance
  • Pebble (LSM) as the on-disk storage engine
  • gRPC for all cluster RPCs, with health and readiness endpoints

interfaces

A Go client library, a CLI (mimorictl) with leader discovery and auto-retry, an embedded web dashboard, a REST + JSON API, and direct gRPC access. Pick whatever fits.
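Leader discovery with auto-retry boils down to a redirect loop. A minimal sketch, assuming a hypothetical `NotLeaderError` that carries a leader hint and a pluggable `PutFunc` standing in for the real gRPC call; the names are illustrative, not the actual Mimori client API.

```go
package main

import (
	"errors"
	"fmt"
)

// NotLeaderError is a hypothetical error a follower returns for writes,
// carrying the address of the node it believes is the current leader.
type NotLeaderError struct{ LeaderHint string }

func (e *NotLeaderError) Error() string { return "not leader; try " + e.LeaderHint }

// PutFunc stands in for the real write RPC.
type PutFunc func(addr, key, value string) error

// Client caches the last known leader and follows hints on redirects.
type Client struct {
	peers      []string
	leader     string
	put        PutFunc
	maxRetries int
}

func (c *Client) Put(key, value string) error {
	addr := c.leader
	if addr == "" {
		addr = c.peers[0] // no cached leader yet: start anywhere
	}
	var lastErr error
	for attempt := 0; attempt < c.maxRetries; attempt++ {
		err := c.put(addr, key, value)
		if err == nil {
			c.leader = addr // remember who accepted the write
			return nil
		}
		lastErr = err
		var nl *NotLeaderError
		if errors.As(err, &nl) && nl.LeaderHint != "" {
			addr = nl.LeaderHint // follow the redirect
			continue
		}
		// No hint (node down, election in progress): try the next peer.
		// A production client would also back off here.
		addr = c.peers[(attempt+1)%len(c.peers)]
	}
	return fmt.Errorf("put %q failed after %d attempts: %w", key, c.maxRetries, lastErr)
}

func main() {
	// Fake transport: only n2 is leader; everyone else redirects to it.
	fake := func(addr, key, value string) error {
		if addr != "n2" {
			return &NotLeaderError{LeaderHint: "n2"}
		}
		return nil
	}
	c := &Client{peers: []string{"n1", "n2", "n3"}, put: fake, maxRetries: 3}
	fmt.Println(c.Put("feature/x", "on"), c.leader)
}
```

Caching the leader means steady-state writes cost one round trip; the redirect only happens after an election or a leader transfer.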

observability

Every node exposes Prometheus metrics — term, commit index, applied index, RPC latencies. The repo ships a Docker Compose stack with Prometheus + Grafana so you can actually watch the cluster: see leader churn, replication lag, the moment a partition heals.

numbers

Measured on a 3-node cluster running on Docker (M1 MacBook Pro, 8-core, 16GB):

  • ~32 ops/sec writes through Raft consensus
  • ~2,310 ops/sec stale reads from followers
  • p95 latency: 253ms write, 10ms read
  • 24-hour stress tests, plus chaos tests covering partitions and node crashes

Good enough for what it's actually for: configuration stores, metadata services, coordination tasks, learning. Not built to replace etcd in a real production deployment — and the README is honest about that.