High Performance Code in GO GoTO, August 2019 Tags: GraphQL, API, GoLang, Postgres Vikram Rangnekar https://twitter.com/dosco * About me Co-founder of *movremote.com* a platform to connect developers with Silicon Valley companies hiring remote. Previously worked on Platform, Frontend and Ads @ Linkedin building the distributed targeting and serving infrastructure behind Linkedin Ads. Also currently building Super Graph an open source instant GraphQL engine for Postgres and Rails. Written in GO MOV Remote .link https://movremote.com Super Graph .link https://supergraph.dev * Why does it matter? - Computer are not getting dramatically faster - Our software is getting slower - Demands on our software are increasing - Scale of internet poducts is accelerating - Faster = More money (For you) * What does high performance mean? - Code that runs fast (relative) - Minimizes I/O latency - Efficient in terms of GC "Premature optimization is the root of all evil (or at least most of it) in programming." -- Donald Knuth "Measure twice cut once" -- Someone * Code that runs fast 1. Algorithm choices 2. Rewrite in GO 3. Reuse Memory 4. Parallelize I/O 4. Keep it simple * Benchmarking * $ benchcmp bench.1 bench.2 .image https://pbs.twimg.com/media/D8_uRFWU0AUdWkM?format=jpg&name=large _ 970 * Howto benchmark Single threaded func BenchmarkYourFunc(b *testing.B) { for n := 0; n < b.N; n++ { _, err := yourFunction(data) ... } } Parallel func BenchmarkYourFuncP(b *testing.B) { b.RunParallel(func(pb *testing.PB) { for pb.Next() { _, err := yourFunction(data) ... } }) } * Profileing Your Code go test -bench=. -benchmem -memprofile mem.out -run=XXX go tool pprof -cum mem.out Get a nice command line pkg: github.com/dosco/super-graph/psql BenchmarkCompile-8 100000 15138 ns/op 3553 B/op 35 allocs/op BenchmarkCompileParallel-8 300000 4760 ns/op 3583 B/op 35 allocs/op PASS ok github.com/dosco/super-graph/psql 3.174s Type: alloc_space Time: Aug 21, 2019 at 11:56am (EDT) Entering interactive mode (type "help" for commands, "o" for options) (pprof) Powerful commands top, web, png, pdf, ... and more * Top - Shows you the top allocating functions (pprof) top Showing nodes accounting for 1.07GB, 77.92% of 1.37GB total Showing top 10 nodes out of 34 flat flat% sum% cum cum% 0.01GB 0.89% 0.89% 1.11GB 80.77% github.com/[...]/qcode.(*Compiler).Compile 0 0% 0.89% 1.02GB 74.22% github.com/dosco/super-graph/ 0.52GB 37.74% 38.63% 0.94GB 68.50% github.com/[...]/qcode.(*Compiler).compileQuery 0.54GB 39.29% 77.92% 0.54GB 39.29% github.com/dosco/super-graph/util.NewStack Digging deeper (pprof) top .compileQuery focus=.compileQuery Showing nodes accounting for 1006.59MB, 69.36% of 1451.18MB total Showing top 10 nodes out of 26 flat flat% sum% cum cum% 0 0% 0% 1006.05MB 69.33% github.com/[...]/qcode.(*Compiler).Compile 579.44MB 39.93% 39.93% 1006.05MB 69.33% github.com/[...]/qcode.(*Compiler).compileQuery * Cool Graphs .image https://matoski.com/article/golang-profiling-flamegraphs/cpu-profile-graph-001.png 500 _ * Reducing Allocations - Part 1 Pre-allocate m := make(map[string]someStruct{}, len(whatever)) mp := &m[i] Work with bytes if possible inlineToLower(&value) instead of bytes.ToLower(value) Reuse Memory var nodePool = sync.Pool{ New: func() interface{} { return new(Node) }, } Use Builders var b strings.Builder b.WriteString("hello "); b.WriteString("world") * Reducing Allocations - Part 2 Use streaming (io.Reader and io.Writer) r := strings.NewReader("some io.Reader stream to be read\n") _, err := io.Copy(os.Stdout, r); Allocate Together type Node struct { Children []Child childA [5]Child } n := Node{} n.Children = n.childA[:0] Use 'Append' functions strconv.AppendInt(b10, 42, 10) instead of strconv.FormatInt(42, 10) * Reducing Allocations - Part 3 Use 'unsafe' if you know what you're doing func bytesToString(b []byte) string { return *(*string)(unsafe.Pointer(&b)) } * Squeezing out more performance Avoid reflection use generators Inlined Assembly (Crazy) // add.go package main import "fmt" func add(x, y int64) int64 func main() { fmt.Println(add(2, 3)) } // add_amd64.s TEXT ·add(SB),NOSPLIT,$0 MOVQ x+0(FP), BX MOVQ y+8(FP), BP ADDQ BP, BX MOVQ BX, ret+16(FP) RET