Introduction
While learning about goroutines and channels, I found something interesting about pre-allocating versus allocating on the fly in Go. That is the reason I wrote this article xD
TLDR: Pre-allocate when you know the size of your data in advance. Examples:
// Create a slice with length 0 and capacity 10
newSlice := make([]int, 0, 10)
// Create a map with a size hint of 10 entries
newMap := make(map[string]int, 10)
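One gotcha worth calling out (my own note, not part of the TLDR): `make([]int, 10)` creates ten zero values, so appending to it gives you *more* than ten elements. The length-0, capacity-10 form is the one that pairs with `append`. A minimal sketch:

```go
package main

import "fmt"

func main() {
	// Length 0, capacity 10: append fills the reserved space.
	a := make([]int, 0, 10)
	a = append(a, 1)
	fmt.Println(len(a), cap(a)) // 1 10

	// Length 10: append adds AFTER the ten zero values.
	b := make([]int, 10)
	b = append(b, 1)
	fmt.Println(len(b)) // 11
}
```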
Story
I was just comparing two functions to see which one is cleaner (preparing for a new article related to the lazy-hole project).
First example:
func slowCheck(servers []string) []Server {
	var result []Server
	for _, server := range servers {
		result = append(result, healthCheck(server))
	}
	return result
}
Second example:
func checkAllSlow(names []string) []Server {
	var results []Server
	for _, name := range names {
		result := checkHealth(name) // Blocks here, waiting for each host to finish
		results = append(results, result)
	}
	return results
}
Both of them give the same result, just slightly different syntax, nothing more. But while chatting with Grok, it gave me a little optimization recommendation.
func slowCheck(servers []string) []Server {
	result := make([]Server, 0, len(servers)) // pre-allocate: one allocation instead of repeated growth
	for _, server := range servers {
		result = append(result, healthCheck(server))
	}
	return result
}
I was like, wow, what is that? How is it different or better? Then a new question: how can I actually see that it is better? It's time to benchmark, bro!!!
Difference between pre-allocate and allocate
We need to init the project first!
go mod init demo
touch main_test.go # yes, only need main_test
Let me copy a simple example (generated by Grok) to show the difference between pre-allocating and allocating in Go xD
// main_test.go
package main

import (
	"testing"
)

const n = 1000000 // 1M

func BenchmarkNoPrealloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var s []int
		for j := 0; j < n; j++ {
			s = append(s, j)
		}
	}
}

func BenchmarkPrealloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s := make([]int, 0, n)
		for j := 0; j < n; j++ {
			s = append(s, j)
		}
	}
}
// Run: go test -bench=. -benchmem
First test result:
go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: demo
cpu: AMD Ryzen 9 5900X 12-Core Processor
BenchmarkNoPrealloc-24 68 15625847 ns/op 41678453 B/op 40 allocs/op
BenchmarkPrealloc-24 1100 997340 ns/op 8003586 B/op 1 allocs/op
PASS
ok demo 2.294s
So before we compare, we need to understand what those values mean:
- 68/1100: Number of iterations. Why 68 vs 1100? The Go benchmark framework automatically increases b.N until the benchmark has run long enough (about one second by default) to give a stable measurement, so the faster version gets more iterations in the same time budget.
- ns/op: nanoseconds per operation
- B/op: bytes allocated per operation
- allocs/op: number of allocations per operation
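These same metrics can also be read programmatically. A minimal sketch (my own example, not from the benchmark file above) using the standard library's `testing.Benchmark` helper, which runs a benchmark function outside of `go test` and returns the result:

```go
package main

import (
	"fmt"
	"testing"
)

func main() {
	// Run the no-prealloc loop (smaller n here) under the benchmark
	// harness and read the same metrics that `go test -bench` prints.
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			var s []int
			for j := 0; j < 1000; j++ {
				s = append(s, j)
			}
		}
	})
	fmt.Println("ns/op:    ", r.NsPerOp())
	fmt.Println("B/op:     ", r.AllocedBytesPerOp())
	fmt.Println("allocs/op:", r.AllocsPerOp())
}
```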
So we can see that pre-alloc beats no pre-alloc: it completes far more iterations in the same time and allocates much less memory.
I already converted the units for easier comparison:
- ns -> ms
- B -> MB
Note: Mallocs = memory allocations
| Metric | No Pre-alloc | Pre-alloc | Improvement |
|---|---|---|---|
| Time | 15.6ms | 0.99ms | 15.6/0.99 ~= 15.76x faster |
| RAM | 41MB | 8MB | 41MB/8MB ~= 5.12x less |
| Mallocs | 40 allocs/op | 1 allocs/op | 40/1 ~= 40x less |
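As a sanity check on the pre-alloc number (my own arithmetic, not from the benchmark output): one million `int`s at 8 bytes each on amd64 is exactly 8,000,000 bytes, which lines up with the single 8,003,586 B/op allocation above. The extra ~33MB on the no-prealloc side is all the intermediate arrays that were grown, copied, and discarded along the way.

```go
package main

import "fmt"

func main() {
	const n = 1000000 // same n as the benchmark
	const intSize = 8 // bytes per int on amd64
	fmt.Println(n * intSize) // exact payload: 8000000 bytes ~= 8MB
}
```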
Ok, let's change n to 1B (1000000000) and run again.
Holy shiet, and I have 18GB of memory free...
go test -bench=. -benchmem
signal: killed
FAIL demo 21.587s
Better reduce it to 100M LOL.
goos: linux
goarch: amd64
pkg: demo
cpu: AMD Ryzen 9 5900X 12-Core Processor
BenchmarkNoPrealloc-24 1 2057327668 ns/op 4589028512 B/op 85 allocs/op
BenchmarkPrealloc-24 5 216684883 ns/op 800006144 B/op 1 allocs/op
PASS
ok demo 4.373s
| Metric | No Pre-alloc | Pre-alloc | Improvement |
|---|---|---|---|
| Time | 2057ms | 216ms | 2057/216 ~= 9.5x faster |
| RAM | 4589MB | 800MB | 4589MB/800MB ~= 5.7x less |
| Mallocs | 85 allocs/op | 1 allocs/op | 85/1 ~= 85x less |
Ok, how about 500M (500000000):
go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: demo
cpu: AMD Ryzen 9 5900X 12-Core Processor
BenchmarkNoPrealloc-24 1 10683335060 ns/op 21885821088 B/op 111 allocs/op
BenchmarkPrealloc-24 1 1084275671 ns/op 4000011168 B/op 8 allocs/op
PASS
ok demo 12.742s
| Metric | No Pre-alloc | Pre-alloc | Improvement |
|---|---|---|---|
| Time | 10683ms | 1084ms | 10683/1084 ~= 9.8x faster |
| RAM | 21885MB | 4000MB | 21885MB/4000MB ~= 5.47x less |
| Mallocs | 111 allocs/op | 8 allocs/op | 111/8 ~= 13.87x less |
Hmm, only 1 iteration and 8 allocations. No idea why it needs 8 allocations (and with only one iteration, the numbers are noisier, too). I will try to learn more about it someday, not today xD
Growth Factor of Slice
Another interesting thing to research. What I understand:
- Each resize equals one malloc
- append copies all existing data to the new memory
- The old underlying array, now unreferenced, is freed by the Garbage Collector (GC)
When append grows past the capacity, it creates a new underlying array with a larger capacity, copies the elements over, and updates the slice header to point to the new array. Nothing references the old array anymore, so the GC cleans it up.
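To watch that growth happen, here is a minimal sketch (my own example) that records every capacity an appended-to slice goes through; each new capacity corresponds to one malloc plus one copy. The exact capacities depend on your Go version (newer runtimes double small slices, then grow more slowly past 256 elements):

```go
package main

import "fmt"

// trackGrowth appends n ints to an empty slice and records every
// capacity the underlying array goes through along the way.
func trackGrowth(n int) []int {
	var s []int
	caps := []int{cap(s)} // starts at 0 for a nil slice
	for i := 0; i < n; i++ {
		s = append(s, i)
		if cap(s) != caps[len(caps)-1] {
			caps = append(caps, cap(s)) // a reallocation happened here
		}
	}
	return caps
}

func main() {
	// Each printed capacity below is one malloc + one full copy.
	for _, c := range trackGrowth(2000) {
		fmt.Println("cap:", c)
	}
}
```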
But there is a big advanced topic behind it; if you want to know more, here you go (I'm done already, that content is complicated AF for me! I have no fucking idea about it!)
Conclusion
Resource: https://alexanderobregon.substack.com/p/go-testing-benchmarks-and-how-they
Just remember to pre-allocate when you know the size of your data in advance.