I understood Go’s concurrency model conceptually. Goroutines are lightweight threads, channels handle communication, the sync package provides synchronization. But I had never compared the patterns side by side with code and benchmarks.

I decided to implement and benchmark them myself. One project, three approaches: mutex, channel, and lock-free.

Mutex

The first implementation was a concurrency-safe map using sync.RWMutex. Writes take the exclusive lock with Lock(); reads take the shared lock with RLock(), so multiple readers can access the map concurrently.
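The structure is a minimal sketch of that idea. The type and method names (RWMap, Store, Load) are illustrative, not the project's actual code; only the locking discipline matters here.

```go
package main

import (
	"fmt"
	"sync"
)

// RWMap guards a plain map with sync.RWMutex.
type RWMap[K comparable, V any] struct {
	mu sync.RWMutex
	m  map[K]V
}

func NewRWMap[K comparable, V any]() *RWMap[K, V] {
	return &RWMap[K, V]{m: make(map[K]V)}
}

// Store takes the exclusive lock: one writer at a time.
func (r *RWMap[K, V]) Store(k K, v V) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.m[k] = v
}

// Load takes the shared lock: many readers proceed concurrently.
func (r *RWMap[K, V]) Load(k K) (V, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	v, ok := r.m[k]
	return v, ok
}

func main() {
	m := NewRWMap[string, int]()
	m.Store("a", 1)
	v, ok := m.Load("a")
	fmt.Println(v, ok) // 1 true
}
```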

After implementing it, I benchmarked against Go’s standard sync.Map. I created three scenarios: contended writes on the same key, disjoint writes across different keys per goroutine, and a read-heavy workload at 90% reads.

On contended same-key writes, both performed similarly. But on disjoint-key writes, sync.Map was 2-3x faster, and on the read-heavy workload it was 33% faster. The results matched exactly the conditions the sync.Map documentation states it is optimized for. Conversely, on concentrated same-key writes, sync.Map only used more memory with no performance advantage.

Channel

For the channel pattern, I implemented data flow control. FanOut distributes data from one input channel to multiple output channels. It uses a select statement to send to whichever output channel is ready first.

TurnOut routes from multiple inputs to multiple outputs while handling shutdown signals through a quit channel. Including the quit channel in the select statement lets the loop handle both data processing and graceful shutdown naturally. I also implemented the cleanup step of draining remaining data after the channels are closed.
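A simplified sketch of the routing-with-quit idea, reduced to two inputs and one output to stay short (the project's TurnOut is many-to-many, so details differ): one select handles data on both inputs and the shutdown signal in the same loop.

```go
package main

import "fmt"

// TurnOut routes values from two inputs to one output until both
// inputs are drained or a quit signal arrives.
func TurnOut[T any](in1, in2 <-chan T, out chan<- T, quit <-chan struct{}) {
	go func() {
		defer close(out)
		for {
			select {
			case v, ok := <-in1:
				if !ok {
					in1 = nil // a nil channel case never fires again
				} else {
					out <- v
				}
			case v, ok := <-in2:
				if !ok {
					in2 = nil
				} else {
					out <- v
				}
			case <-quit:
				return // graceful shutdown
			}
			if in1 == nil && in2 == nil {
				return // both inputs drained
			}
		}
	}()
}

func main() {
	in1, in2 := make(chan int), make(chan int)
	out := make(chan int)
	quit := make(chan struct{})
	TurnOut(in1, in2, out, quit)

	go func() { in1 <- 1; in1 <- 2; close(in1) }()
	go func() { in2 <- 3; close(in2) }()

	sum := 0
	for v := range out {
		sum += v
	}
	fmt.Println(sum) // 6
}
```

Setting a closed channel variable to nil is the standard trick for retiring a select case: receives on a nil channel block forever, so that arm simply stops firing.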

Generics ([T any]) made the implementations reusable across types.

Lock-free

This was the most interesting part. I implemented two lock-free patterns.

SpinningCAS implements a lock using atomic.CompareAndSwapInt32. When another goroutine holds the lock, instead of entering a wait queue, it spins by repeating the CAS operation. runtime.Gosched() proved critical here. Without yielding the CPU during the spin loop, other goroutines, including the one holding the lock, couldn't get scheduled, creating a near-deadlock. One line of code changed the entire behavior.

I benchmarked SpinningCAS against the standard sync.Mutex. On a high-contention scenario incrementing a single shared variable, SpinningCAS was about 7x faster. Mutex carries the overhead of parking and unparking goroutines in a wait queue, while CAS retries immediately. The numbers confirmed that spinning wins on short critical sections.

TicketStorage addresses cases requiring ordering guarantees. atomic.AddUint64 issues ticket numbers, and each goroutine spins until the "now serving" counter reaches its ticket. It guarantees fairness (FIFO) at the cost of longer wait times under high contention.

Retrospective

Understanding concurrency patterns conceptually and experiencing them through benchmarks were different things.

The biggest lesson was benchmark methodology. I initially wrote benchmarks that spawned a fixed number of goroutines, and results varied between runs. Switching to Go’s b.RunParallel, which lets the framework auto-calibrate iteration counts, stabilized results and made pattern differences clear. Benchmark code accuracy determines result quality.
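The shape of such a benchmark, using the read-heavy scenario as an example (the function name and the exact read/write mix mechanics are illustrative, not the project's code): b.RunParallel calibrates b.N and spreads the iterations across GOMAXPROCS worker goroutines.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// benchReadHeavy: roughly 90% reads, 10% writes against sync.Map.
func benchReadHeavy(b *testing.B) {
	var m sync.Map
	m.Store("key", 0)
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			if i%10 == 0 {
				m.Store("key", i) // ~10% writes
			} else {
				m.Load("key") // ~90% reads
			}
			i++
		}
	})
}

func main() {
	// testing.Benchmark runs a benchmark function outside "go test".
	fmt.Println(testing.Benchmark(benchReadHeavy))
}
```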

sync.Map is not “always a faster map” — its advantage appeared only under the conditions stated in the official documentation. SpinningCAS dominated Mutex on short critical sections, but longer sections or lower contention could reverse the result. Each tool has optimal conditions, and verifying those conditions is what benchmarks are for.

The experience of runtime.Gosched() changing behavior with a single line also stayed with me. In concurrent code, a theoretically correct implementation can behave differently in practice.

Knowing concurrency patterns conceptually is one thing; implementing them and facing the numbers is another. This project confirmed the difference.

References