GoMapReduce

I wanted to see how the original Google MapReduce paper actually works under the hood, so I built a fault-tolerant version from scratch in Go.

It’s a Coordinator/Worker setup talking over gRPC. You can plug in your own Map/Reduce functions (I included wordcount and invertedindex), and the whole thing runs as an 8-worker cluster in Docker.

Getting it Running

Grab the sample data and spin up the cluster:

# Get the sample text corpus
./scripts/download_corpus.sh

# Boot the Coordinator and 8 Workers
docker-compose up --build

Pulling the Plug (Testing Fault Tolerance)

The coolest part of MapReduce is watching it heal. While the cluster is crunching data, try this:

Find a worker: docker ps | grep worker
Nuke it: docker kill <container_id>
Watch the docker-compose logs.

The Coordinator tracks heartbeats. After 10 seconds of silence, it marks the missing worker as dead. Crucially, it takes all the tasks assigned to that worker—including the completed Map tasks, since the intermediate data was sitting on that dead worker's local disk—and reassigns them to the survivors. The job just keeps going and finishes successfully.

Tests & Benchmarks

I wrote a few scripts to prove it actually works and scales:

# Compare the distributed output against a single-threaded reference implementation
./test/test_correctness.sh

# Automated chaos testing
./test/test_fault_tolerance.sh

# See how it scales from 1 to 8 workers
./bench/bench.sh 

# (You can also throw a huge file at it: ./bench/bench.sh input/large_10gb_corpus.txt)

Deviations from the Paper

No GFS: The real MapReduce relies heavily on the Google File System. I faked this by having Map workers write to local folders and serve the intermediate files over an HTTP server to the Reduce workers. The Reduce workers then just dump their final output to a shared local folder.
Simple Partitioning: Instead of writing a custom partition function, I just used Go's built-in hash/fnv to divide up the keys.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
apps		apps
bench		bench
coordinator		coordinator
input		input
mapreduce		mapreduce
pb		pb
scripts		scripts
sequential		sequential
test		test
worker		worker
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
go.mod		go.mod
go.sum		go.sum
main.go		main.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GoMapReduce

Getting it Running

Pulling the Plug (Testing Fault Tolerance)

Tests & Benchmarks

Deviations from the Paper

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

GoMapReduce

Getting it Running

Pulling the Plug (Testing Fault Tolerance)

Tests & Benchmarks

Deviations from the Paper

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages