I wanted to see how the original Google MapReduce paper actually works under the hood, so I built a fault-tolerant version from scratch in Go.
It’s a Coordinator/Worker setup talking over gRPC. You can plug in your own Map/Reduce functions (I included wordcount and invertedindex), and the whole thing runs as an 8-worker cluster in Docker.
Grab the sample data and spin up the cluster:
# Get the sample text corpus
./scripts/download_corpus.sh
# Boot the Coordinator and 8 Workers
docker-compose up --build
The coolest part of MapReduce is watching it heal. While the cluster is crunching data, try this:
- Find a worker:
docker ps | grep worker - Nuke it:
docker kill <container_id> - Watch the
docker-composelogs.
The Coordinator tracks heartbeats. After 10 seconds of silence, it marks the missing worker as dead. Crucially, it takes all the tasks assigned to that worker—including the completed Map tasks, since the intermediate data was sitting on that dead worker's local disk—and reassigns them to the survivors. The job just keeps going and finishes successfully.
I wrote a few scripts to prove it actually works and scales:
# Compare the distributed output against a single-threaded reference implementation
./test/test_correctness.sh
# Automated chaos testing
./test/test_fault_tolerance.sh
# See how it scales from 1 to 8 workers
./bench/bench.sh
# (You can also throw a huge file at it: ./bench/bench.sh input/large_10gb_corpus.txt)
- No GFS: The real MapReduce relies heavily on the Google File System. I faked this by having Map workers write to local folders and serve the intermediate files over an HTTP server to the Reduce workers. The Reduce workers then just dump their final output to a shared local folder.
- Simple Partitioning: Instead of writing a custom partition function, I just used Go's built-in
hash/fnvto divide up the keys.