Introduction
While learning about goroutines and channels, I found something interesting about pre-allocating versus allocating on the fly in Go. That is the reason I wrote this article xD
TLDR: Pre-allocate when you know the size of your data in advance. Examples:
// Create a slice with length 0 and capacity 10
newSlice := make([]int, 0, 10)
// Create a map with a size hint of 10 entries
newMap := make(map[string]int, 10)
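One gotcha worth calling out (my own note, not part of the TLDR): `make([]int, 10)` creates ten zero values, so appending to it gives you *more* than ten elements. The length-0, capacity-10 form is the one that pairs with `append`. A minimal sketch:

```go
package main

import "fmt"

func main() {
	// Length 0, capacity 10: append fills the reserved space.
	a := make([]int, 0, 10)
	a = append(a, 1)
	fmt.Println(len(a), cap(a)) // 1 10

	// Length 10: append adds AFTER the ten zero values.
	b := make([]int, 10)
	b = append(b, 1)
	fmt.Println(len(b)) // 11
}
```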
Story
I was just comparing two functions to see which one is cleaner (preparing for a new article related to the lazy-hole project).
First example:
func slowCheck(servers []string) []Server {
	var result []Server
	for _, server := range servers {
		result = append(result, healthCheck(server))
	}
	return result
}
Second example:
func checkAllSlow(names []string) []Server {
	var results []Server
	for _, name := range names {
		result := checkHealth(name) // Blocks here, waiting for each host to finish
		results = append(results, result)
	}
	return results
}
Both of them give the same result, just slightly different syntax, nothing more. But while chatting with Grok, it gave me a little optimization recommendation.
func slowCheck(servers []string) []Server {
	result := make([]Server, 0, len(servers)) // pre-allocate: one allocation instead of repeated growth
	for _, server := range servers {
		result = append(result, healthCheck(server))
	}
	return result
}
I was like, wow, what is that? How is it different or better? Then a new question: how can I actually see that it is better? It's time to benchmark, bro!!!
Difference between pre-allocate and allocate
We need to init the project first!
go mod init demo
touch main_test.go # yes, only need main_test
Let me copy a simple example (generated by Grok) to show the difference between pre-allocating and allocating in Go xD
// main_test.go
package main

import (
	"testing"
)

const n = 1000000 // 1M

func BenchmarkNoPrealloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var s []int
		for j := 0; j < n; j++ {
			s = append(s, j)
		}
	}
}

func BenchmarkPrealloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s := make([]int, 0, n)
		for j := 0; j < n; j++ {
			s = append(s, j)
		}
	}
}
// Run: go test -bench=. -benchmem
First test result:
go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: demo
cpu: AMD Ryzen 9 5900X 12-Core Processor
BenchmarkNoPrealloc-24 68 15625847 ns/op 41678453 B/op 40 allocs/op
BenchmarkPrealloc-24 1100 997340 ns/op 8003586 B/op 1 allocs/op
PASS
ok demo 2.294s
So before we compare, we need to understand what those values mean:
- 68/1100: Number of iterations. Why 68 vs 1100? The Go benchmark framework automatically increases b.N until the benchmark has run long enough (about one second by default) to give a stable measurement, so the faster version gets more iterations in the same time budget.
- ns/op: nanoseconds per operation
- B/op: bytes allocated per operation
- allocs/op: number of allocations per operation
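These same metrics can also be read programmatically. A minimal sketch (my own example, not from the benchmark file above) using the standard library's `testing.Benchmark` helper, which runs a benchmark function outside of `go test` and returns the result:

```go
package main

import (
	"fmt"
	"testing"
)

func main() {
	// Run the no-prealloc loop (smaller n here) under the benchmark
	// harness and read the same metrics that `go test -bench` prints.
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			var s []int
			for j := 0; j < 1000; j++ {
				s = append(s, j)
			}
		}
	})
	fmt.Println("ns/op:    ", r.NsPerOp())
	fmt.Println("B/op:     ", r.AllocedBytesPerOp())
	fmt.Println("allocs/op:", r.AllocsPerOp())
}
```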
So we can see that pre-alloc beats no pre-alloc: it completes far more iterations in the same time and allocates much less memory.
I already converted the units for easier comparison:
- ns -> ms
- B -> MB
Note: Mallocs = memory allocations
| Metric | No Pre-alloc | Pre-alloc | Improvement |
|---|---|---|---|
| Time | 15.6ms | 0.99ms | 15.6/0.99 ~= 15.76x faster |
| RAM | 41MB | 8MB | 41MB/8MB ~= 5.12x less |
| Mallocs | 40 allocs/op | 1 allocs/op | 40/1 ~= 40x less |
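As a sanity check on the pre-alloc number (my own arithmetic, not from the benchmark output): one million `int`s at 8 bytes each on amd64 is exactly 8,000,000 bytes, which lines up with the single 8,003,586 B/op allocation above. The extra ~33MB on the no-prealloc side is all the intermediate arrays that were grown, copied, and discarded along the way.

```go
package main

import "fmt"

func main() {
	const n = 1000000 // same n as the benchmark
	const intSize = 8 // bytes per int on amd64
	fmt.Println(n * intSize) // exact payload: 8000000 bytes ~= 8MB
}
```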
Ok, let's change n to 1B (1000000000) and run again.
Holy shiet, and I have 18GB of memory free...
go test -bench=. -benchmem
signal: killed
FAIL demo 21.587s
Better reduce it to 100M LOL.
goos: linux
goarch: amd64
pkg: demo
cpu: AMD Ryzen 9 5900X 12-Core Processor
BenchmarkNoPrealloc-24 1 2057327668 ns/op 4589028512 B/op 85 allocs/op
BenchmarkPrealloc-24 5 216684883 ns/op 800006144 B/op 1 allocs/op
PASS
ok demo 4.373s
| Metric | No Pre-alloc | Pre-alloc | Improvement |
|---|---|---|---|
| Time | 2057ms | 216ms | 2057/216 ~= 9.5x faster |
| RAM | 4589MB | 800MB | 4589MB/800MB ~= 5.7x less |
| Mallocs | 85 allocs/op | 1 allocs/op | 85/1 ~= 85x less |
Ok, how about 500M (500000000):
go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: demo
cpu: AMD Ryzen 9 5900X 12-Core Processor
BenchmarkNoPrealloc-24 1 10683335060 ns/op 21885821088 B/op 111 allocs/op
BenchmarkPrealloc-24 1 1084275671 ns/op 4000011168 B/op 8 allocs/op
PASS
ok demo 12.742s
| Metric | No Pre-alloc | Pre-alloc | Improvement |
|---|---|---|---|
| Time | 10683ms | 1084ms | 10683/1084 ~= 9.8x faster |
| RAM | 21885MB | 4000MB | 21885MB/4000MB ~= 5.47x less |
| Mallocs | 111 allocs/op | 8 allocs/op | 111/8 ~= 13.87x less |
Hmm, only 1 iteration and 8 allocations. No idea why it needs 8 allocations (and with only one iteration, the numbers are noisier, too). I will try to learn more about it someday, not today xD
Growth Factor of Slice
Another interesting thing to research. What I understand:
- Each resize equals one malloc
- append copies all existing data to the new memory
- The old underlying array, now unreferenced, is freed by the Garbage Collector (GC)
When append grows past the capacity, it creates a new underlying array with a larger capacity, copies the elements over, and updates the slice header to point to the new array. Nothing references the old array anymore, so the GC cleans it up.
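To watch that growth happen, here is a minimal sketch (my own example) that records every capacity an appended-to slice goes through; each new capacity corresponds to one malloc plus one copy. The exact capacities depend on your Go version (newer runtimes double small slices, then grow more slowly past 256 elements):

```go
package main

import "fmt"

// trackGrowth appends n ints to an empty slice and records every
// capacity the underlying array goes through along the way.
func trackGrowth(n int) []int {
	var s []int
	caps := []int{cap(s)} // starts at 0 for a nil slice
	for i := 0; i < n; i++ {
		s = append(s, i)
		if cap(s) != caps[len(caps)-1] {
			caps = append(caps, cap(s)) // a reallocation happened here
		}
	}
	return caps
}

func main() {
	// Each printed capacity below is one malloc + one full copy.
	for _, c := range trackGrowth(2000) {
		fmt.Println("cap:", c)
	}
}
```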
But there is a big advanced topic behind it; if you want to know more, here you go (I'm done already, that content is complicated AF for me! I have no fucking idea about it!)
Conclusion
Resource: https://alexanderobregon.substack.com/p/go-testing-benchmarks-and-how-they
Just remember to pre-allocate when you know the size of your data in advance.