Introduction

While learning about Goroutines and Channels, I found something interesting about pre-allocation vs. allocation in Go. That is the reason I wrote this article xD

TLDR: Use pre-allocation when you know the size of your data in advance. For example:

// Create a slice with length 0 and capacity 10
newSlice := make([]int, 0, 10)
// Create a map with space hinted for 10 entries
newMap := make(map[string]int, 10)

Story

I was just comparing two functions to see which one is cleaner (preparing for a new article related to the lazy-hole project).

First example:

func slowCheck(servers []string) []Server {
    var result []Server
    for _, server := range servers {
        result = append(result, healthCheck(server))
    }

    return result
}

Second example:

func checkAllSlow(names []string) []Server {
  var results []Server
  for _, name := range names {
    result := checkHealth(name) // Blocks here, waiting for each host to finish
    results = append(results, result)
  }
  return results
}

Both of them give the same result, just slightly different syntax, nothing more. But while chatting with Grok, it gave me a little recommendation for optimization.

func slowCheck(servers []string) []Server {
    result := make([]Server, 0, len(servers)) // pre-allocate, better than append multiple times
    for _, server := range servers {
        result = append(result, healthCheck(server))
    }
    return result
}

I was like, wow, what is that? How is it different/better? Then a new question: how can I tell it is better? Is there any way to see that? It's time to benchmark bro!!!


Difference between pre-allocate and allocate

We need to init the project first!

go mod init demo
touch main_test.go # yes, only need main_test

Let me copy a simple example (generated by Grok) to show the difference between pre-allocate and allocate in Go xD

// main_test.go
package main

import (
    "testing"
)

const n = 1000000 // 1M

func BenchmarkNoPrealloc(b *testing.B) {
    for i := 0; i < b.N; i++ {
        var s []int
        for j := 0; j < n; j++ {
            s = append(s, j)
        }
    }
}

func BenchmarkPrealloc(b *testing.B) {
    for i := 0; i < b.N; i++ {
        s := make([]int, 0, n)
        for j := 0; j < n; j++ {
            s = append(s, j)
        }
    }
}

// Run: go test -bench=. -benchmem

First test result:

go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: demo
cpu: AMD Ryzen 9 5900X 12-Core Processor            
BenchmarkNoPrealloc-24            68      15625847 ns/op    41678453 B/op         40 allocs/op
BenchmarkPrealloc-24            1100        997340 ns/op     8003586 B/op          1 allocs/op
PASS
ok      demo    2.294s

So before we compare, we need to understand what those values mean:

  • 68/1100: Number of iterations. Why 68 vs 1100? The Go benchmark framework automatically adjusts b.N, the iteration count, until each benchmark runs long enough (about one second by default) to give a stable measurement, so the faster benchmark simply fits in more iterations.
  • ns/op: nanoseconds per operation
  • B/op: bytes allocated per operation
  • allocs/op: number of allocations per operation
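If you want to watch the framework pick the iteration count without going through go test, the testing package lets you invoke a benchmark directly. A minimal sketch (the inner loop is just a stand-in workload, not the health-check code from earlier):

```go
package main

import (
	"fmt"
	"testing"
)

func main() {
	// testing.Benchmark runs the function with increasing b.N
	// until the measurement stabilizes (~1s of work by default),
	// then reports the final b.N and the timing.
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			s := make([]int, 0, 1000)
			for j := 0; j < 1000; j++ {
				s = append(s, j)
			}
		}
	})
	fmt.Println("iterations chosen:", r.N)
	fmt.Println("ns/op:", r.NsPerOp())
}
```

The same mechanism is why the slower BenchmarkNoPrealloc only reached 68 iterations while BenchmarkPrealloc reached 1100.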

So we can see that pre-alloc is faster than no pre-alloc: it completes more iterations in the same time and allocates far less memory.

I converted the units for easier comparison:

  • ns -> ms
  • B -> MB

Note: Mallocs = memory allocations

Metric     No Pre-alloc    Pre-alloc     Improvement
Time       15.6 ms         0.99 ms       15.6/0.99 ~= 15.76x faster
RAM        41 MB           8 MB          41/8 ~= 5.12x less
Mallocs    40 allocs/op    1 alloc/op    40/1 = 40x less
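The table values come straight from the raw benchmark output; here is a quick sketch of the unit arithmetic (using decimal megabytes, i.e. dividing by 10^6):

```go
package main

import "fmt"

func main() {
	// Raw values from the BenchmarkNoPrealloc run above.
	const nsPerOp = 15625847.0    // nanoseconds per operation
	const bytesPerOp = 41678453.0 // bytes allocated per operation

	fmt.Printf("Time: %.2f ms\n", nsPerOp/1e6)    // 1 ms = 1e6 ns
	fmt.Printf("RAM:  %.1f MB\n", bytesPerOp/1e6) // 1 MB = 1e6 B (decimal)

	// Speedup relative to the BenchmarkPrealloc time.
	fmt.Printf("Speedup: %.1fx\n", nsPerOp/997340.0)
}
```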

Ok, let's change n to 1B (1000000000) and run again.

Holy shiet, I only have 18GB of memory free...

go test -bench=. -benchmem
signal: killed
FAIL    demo    21.587s

Better to reduce it to 100M LOL.

goos: linux
goarch: amd64
pkg: demo
cpu: AMD Ryzen 9 5900X 12-Core Processor            
BenchmarkNoPrealloc-24             1    2057327668 ns/op    4589028512 B/op       85 allocs/op
BenchmarkPrealloc-24               5     216684883 ns/op    800006144 B/op         1 allocs/op
PASS
ok      demo    4.373s

Metric     No Pre-alloc    Pre-alloc     Improvement
Time       2057 ms         216 ms        2057/216 ~= 9.5x faster
RAM        4589 MB         800 MB        4589/800 ~= 5.7x less
Mallocs    85 allocs/op    1 alloc/op    85/1 = 85x less

Ok, how about 500M (500000000):

go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: demo
cpu: AMD Ryzen 9 5900X 12-Core Processor            
BenchmarkNoPrealloc-24             1    10683335060 ns/op   21885821088 B/op         111 allocs/op
BenchmarkPrealloc-24               1    1084275671 ns/op    4000011168 B/op        8 allocs/op
PASS
ok      demo    12.742s

Metric     No Pre-alloc     Pre-alloc      Improvement
Time       10683 ms         1084 ms        10683/1084 ~= 9.8x faster
RAM        21885 MB         4000 MB        21885/4000 ~= 5.47x less
Mallocs    111 allocs/op    8 allocs/op    111/8 ~= 13.87x less

Hmm, only 1 iteration and 8 allocations. No idea why it needs 8 allocations. I will try to learn more about it someday, not today xD


Growth Factor of Slice

Another interesting thing to research. What I understand:

  • Each resize equals one malloc
  • append copies all existing data to the new memory
  • The old memory (the old underlying array) will be deallocated by the garbage collector (GC)

When append grows past the capacity, it creates a new underlying array with a larger capacity, copies the elements over, and updates the slice to point to the new array. The old array no longer has any references to it, so the GC cleans it up.
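You can actually watch this growth happen by printing the capacity every time append reallocates. A small sketch (the exact growth factors depend on the Go version; recent versions roughly double small slices and grow larger ones by about 25%):

```go
package main

import "fmt"

func main() {
	var s []int
	oldCap := cap(s)
	for i := 0; i < 2000; i++ {
		s = append(s, i)
		if cap(s) != oldCap {
			// Capacity changed: append allocated a new underlying
			// array and copied the old elements into it.
			fmt.Printf("len=%d cap: %d -> %d\n", len(s), oldCap, cap(s))
			oldCap = cap(s)
		}
	}
}
```

Each printed line corresponds to one reallocation, which is why pre-allocating to the final size brings allocs/op down to 1.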

But there is a big advanced topic behind it; if you want to know more, here you go (I'm done already, those contents are complicated AF for me! I have no idea about them!)


Conclusion

Resource: https://alexanderobregon.substack.com/p/go-testing-benchmarks-and-how-they

Just remember: use pre-allocation when you know the size of your data in advance.


Category

Coding
