High Performance Code in Go
GoTO, August 2019
Tags: GraphQL, API, GoLang, Postgres

Vikram Rangnekar
https://twitter.com/dosco

* About me

Co-founder of *movremote.com*, a platform that connects developers with Silicon Valley companies hiring remotely.

Previously worked on Platform, Frontend and Ads @ LinkedIn, building the distributed targeting and serving infrastructure behind LinkedIn Ads.

Also currently building Super Graph, an open source instant GraphQL engine for Postgres and Rails, written in Go.

MOV Remote

.link https://movremote.com

Super Graph

.link https://supergraph.dev

* Why does it matter?

- Computers are not getting dramatically faster
- Our software is getting slower
- Demands on our software are increasing
- The scale of internet products is accelerating
- Faster = more money (for you)

* What does high performance mean?

- Code that runs fast (relatively speaking)
- Minimizes I/O latency
- Efficient in terms of GC

"Premature optimization is the root of all evil (or at least most of it) in programming."

-- Donald Knuth

"Measure twice, cut once"

-- Someone

* Code that runs fast

1. Algorithm choices
2. Rewrite in Go
3. Reuse memory
4. Parallelize I/O
5. Keep it simple

* Benchmarking

* $ benchcmp bench.1 bench.2

.image https://pbs.twimg.com/media/D8_uRFWU0AUdWkM?format=jpg&name=large _ 970
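
benchcmp (from golang.org/x/tools) compares two `go test -bench` outputs. A minimal sketch of the workflow, assuming the psql package used later in this talk and arbitrary file names:

    go get golang.org/x/tools/cmd/benchcmp
    go test -run=XXX -bench=. ./psql > bench.1
    # ...apply the optimization...
    go test -run=XXX -bench=. ./psql > bench.2
    benchcmp bench.1 bench.2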

* How to benchmark

Single threaded

    func BenchmarkYourFunc(b *testing.B) {
        for n := 0; n < b.N; n++ {
            _, err := yourFunction(data)
            ...
        }
    }

Parallel

    func BenchmarkYourFuncP(b *testing.B) {
        b.RunParallel(func(pb *testing.PB) {
            for pb.Next() {
                _, err := yourFunction(data)
                ...
            }
        })
    }

* Profiling Your Code

    go test -bench=. -benchmem -memprofile mem.out -run=XXX
    go tool pprof -cum mem.out

Get a nice command line

    pkg: github.com/dosco/super-graph/psql
    BenchmarkCompile-8           100000   15138 ns/op   3553 B/op   35 allocs/op
    BenchmarkCompileParallel-8   300000    4760 ns/op   3583 B/op   35 allocs/op
    PASS
    ok   github.com/dosco/super-graph/psql   3.174s

    Type: alloc_space
    Time: Aug 21, 2019 at 11:56am (EDT)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof)

Powerful commands

    top, web, png, pdf, ... and more
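
The same profile can also be rendered without the interactive prompt; a sketch (the output file name is arbitrary, and the graph views need Graphviz installed):

    go tool pprof -png mem.out > mem.png
    go tool pprof -http=:8080 mem.out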

* Top - Shows you the top allocating functions

    (pprof) top
    Showing nodes accounting for 1.07GB, 77.92% of 1.37GB total
    Showing top 10 nodes out of 34
          flat  flat%   sum%        cum   cum%
        0.01GB  0.89%  0.89%     1.11GB 80.77%  github.com/[...]/qcode.(*Compiler).Compile
             0     0%  0.89%     1.02GB 74.22%  github.com/dosco/super-graph/
        0.52GB 37.74% 38.63%     0.94GB 68.50%  github.com/[...]/qcode.(*Compiler).compileQuery
        0.54GB 39.29% 77.92%     0.54GB 39.29%  github.com/dosco/super-graph/util.NewStack

Digging deeper

    (pprof) top .compileQuery
    focus=.compileQuery
    Showing nodes accounting for 1006.59MB, 69.36% of 1451.18MB total
    Showing top 10 nodes out of 26
          flat  flat%   sum%        cum   cum%
             0     0%     0%  1006.05MB 69.33%  github.com/[...]/qcode.(*Compiler).Compile
      579.44MB 39.93% 39.93%  1006.05MB 69.33%  github.com/[...]/qcode.(*Compiler).compileQuery

* Cool Graphs

.image https://matoski.com/article/golang-profiling-flamegraphs/cpu-profile-graph-001.png 500 _

* Reducing Allocations - Part 1

Pre-allocate

    m := make(map[string]someStruct, len(whatever))
    s := make([]someStruct, len(whatever))

    sp := &s[i] // take a pointer into the slice instead of copying the element

Work with bytes if possible

    inlineToLower(&value) instead of bytes.ToLower(value)
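
inlineToLower isn't shown in this talk; a sketch of what such a helper could look like (an assumption), lower-casing ASCII bytes in place so no new slice is allocated:

    // hypothetical sketch: ASCII-only ToLower that mutates the bytes in place
    func inlineToLower(b *[]byte) {
        for i, c := range *b {
            if c >= 'A' && c <= 'Z' {
                (*b)[i] = c + ('a' - 'A')
            }
        }
    }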

Reuse Memory

    var nodePool = sync.Pool{
        New: func() interface{} { return new(Node) },
    }
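
Getting a node out of the pool and handing it back when done, a minimal sketch (Node is whatever struct your code pools):

    n := nodePool.Get().(*Node)
    // ...use n...
    *n = Node{} // reset before returning it to the pool
    nodePool.Put(n)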

Use Builders

    var b strings.Builder
    b.WriteString("hello ")
    b.WriteString("world")

* Reducing Allocations - Part 2

Use streaming (io.Reader and io.Writer)

    r := strings.NewReader("some io.Reader stream to be read\n")
    _, err := io.Copy(os.Stdout, r)

Allocate Together

    type Node struct {
        Children []Child
        childA   [5]Child // inline backing array, allocated together with the Node
    }

    n := Node{}
    n.Children = n.childA[:0] // small child lists need no separate allocation

Use 'Append' functions

    strconv.AppendInt(b10, 42, 10) instead of strconv.FormatInt(42, 10)
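
A sketch of the difference: the Append variant writes into a buffer you already own, so repeated calls can reuse it (buffer name and size are arbitrary):

    buf := make([]byte, 0, 64)           // reusable buffer
    buf = strconv.AppendInt(buf, 42, 10) // appends "42" to buf; no allocation while cap lasts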

* Reducing Allocations - Part 3

Use 'unsafe' if you know what you're doing

    func bytesToString(b []byte) string {
        return *(*string)(unsafe.Pointer(&b))
    }

The returned string shares its memory with the byte slice, so the bytes must not be modified afterwards.

* Squeezing out more performance

Avoid reflection; use code generators instead (see the sketch below)
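
For example, the stringer tool from golang.org/x/tools generates a String() method at build time instead of deriving names via reflection at run time; a sketch with a made-up Color type:

    //go:generate stringer -type=Color
    type Color int

    const (
        Red Color = iota
        Green
        Blue
    )

Running 'go generate' writes a color_string.go file containing the String() implementation for Color.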

Inlined Assembly (Crazy)

    // add.go
    package main

    import "fmt"

    // implemented in add_amd64.s
    func add(x, y int64) int64

    func main() {
        fmt.Println(add(2, 3))
    }

    // add_amd64.s
    #include "textflag.h"

    TEXT ·add(SB),NOSPLIT,$0
        MOVQ x+0(FP), BX     // load x
        MOVQ y+8(FP), BP     // load y
        ADDQ BP, BX          // BX = x + y
        MOVQ BX, ret+16(FP)  // store the result
        RET