Melvin's digital garden

Designing for Performance

speaker: Martin Thompson
event: Singapore Java User Group

** what is performance?

throughput, response time, scalability

leaf cutter ants drop the leaf 50% of the time; we could optimise that, but the ants would need to be a lot smarter, and the real bottleneck is bringing the leaves back to the nest anyway

queueing theory

  • service time, queueing time
  • response time = queueing time + service time
  • products often quote only the service time, not the response time
  • at high utilisation, response time increases sharply (see the sketch after this list)
    • a team that is highly utilised is not very responsive
  • pro tip: ensure you have sufficient capacity
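
A minimal sketch of how queueing dominates response time at high utilisation, assuming a simple M/M/1 queue (the model and numbers are illustrative, not from the talk):

  // Sketch only: response time for an M/M/1 queue.
  // response time = queueing time + service time, and the queueing
  // component blows up as utilisation approaches 100%.
  public class ResponseTime {
      public static void main(String[] args) {
          double serviceTimeMs = 1.0;
          for (double utilisation : new double[] {0.5, 0.7, 0.9, 0.95, 0.99}) {
              double responseTimeMs = serviceTimeMs / (1.0 - utilisation);
              double queueingTimeMs = responseTimeMs - serviceTimeMs;
              System.out.printf("utilisation %.0f%%: queueing %.1f ms + service %.1f ms = response %.1f ms%n",
                      utilisation * 100, queueingTimeMs, serviceTimeMs, responseTimeMs);
          }
      }
  }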

using more processes

  • amdahl’s law (see the sketch after this list)
    • a task that is 95% parallel can achieve at most a 20x speedup
    • the remaining 5% must be done sequentially
  • meetings are the sequential part of a company
  • universal scalability law
    • coherence penalty
    • contention penalty
  • java logging frameworks don’t scale up
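
A minimal sketch of Amdahl’s law next to the universal scalability law; the contention and coherence coefficients are illustrative assumptions, not numbers from the talk:

  // Sketch only: Amdahl's law vs the universal scalability law.
  public class Scalability {
      // Amdahl: speedup limited by the sequential fraction of the work
      static double amdahl(double parallelFraction, int n) {
          return 1.0 / ((1.0 - parallelFraction) + parallelFraction / n);
      }

      // USL: adds contention (sigma) and a coherence penalty (kappa) that
      // makes speedup fall off again as n grows
      static double usl(double sigma, double kappa, int n) {
          return n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1));
      }

      public static void main(String[] args) {
          for (int n : new int[] {1, 2, 4, 8, 16, 32, 64, 128}) {
              System.out.printf("n=%3d  amdahl=%6.2fx  usl=%6.2fx%n",
                      n, amdahl(0.95, n), usl(0.05, 0.0005, n));
          }
      }
  }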

** what is clean and representative?

code is the best place to capture our current understanding of a model

don’t rush to create abstractions; only introduce one when you have seen three instances

abstractions must pay for themselves

  • megamorphism (multiple implementations of an interface)
    • => branch hell (see the sketch after this list)
  • say no to big frameworks
  • pro tip: abstract when you are sure of the benefits
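
A minimal sketch of the effect, assuming HotSpot’s usual behaviour; the domain and names are made up for illustration:

  // Illustrative only: a call site's cost depends on how many concrete types
  // it has seen. HotSpot typically inlines mono/bimorphic call sites, while a
  // megamorphic one falls back to virtual dispatch (a hard-to-predict branch).
  interface PriceRule {
      long apply(long price);
  }

  final class NoDiscount implements PriceRule {
      public long apply(long price) { return price; }
  }

  final class TenPercentOff implements PriceRule {
      public long apply(long price) { return price - price / 10; }
  }

  final class FlatFee implements PriceRule {
      public long apply(long price) { return price + 50; }
  }

  class Checkout {
      // If this loop only ever sees NoDiscount, the call can be inlined.
      // If it sees many different implementations, the same line becomes megamorphic.
      static long total(long[] prices, PriceRule rule) {
          long sum = 0;
          for (long p : prices) {
              sum += rule.apply(p);
          }
          return sum;
      }
  }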

abstraction allows us to be more precise about what we mean (see Dijkstra’s quote)

** implementing efficient models

memory subsystem assumptions

  • temporal bet
    • caches
  • spatial bet
    • cache line (see the sketch after this list)
  • pattern bet
    • predictable memory access pattern
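
A minimal sketch of the spatial bet: memory is fetched a cache line at a time, so walking addresses sequentially is much cheaper than striding across them. The timing here is illustrative only, not a proper benchmark:

  public class Locality {
      static final int N = 4096;
      static final int[][] grid = new int[N][N];

      // touches consecutive addresses, so each cache line is fully used
      static long rowMajor() {
          long sum = 0;
          for (int row = 0; row < N; row++)
              for (int col = 0; col < N; col++)
                  sum += grid[row][col];
          return sum;
      }

      // jumps a whole row's worth of memory on every step
      static long columnMajor() {
          long sum = 0;
          for (int col = 0; col < N; col++)
              for (int row = 0; row < N; row++)
                  sum += grid[row][col];
          return sum;
      }

      public static void main(String[] args) {
          long t0 = System.nanoTime();
          long a = rowMajor();
          long t1 = System.nanoTime();
          long b = columnMajor();
          long t2 = System.nanoTime();
          System.out.printf("row-major %d ms, column-major %d ms (sums %d, %d)%n",
                  (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a, b);
      }
  }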

playing well with memory can have big wins

coupling and cohesion

  • feature envy: reaching into another object’s fields to do the work
  • train wreck: long chains of getter calls (see the sketch after this list)
  • pro tip: respect the locality of reference
  • B+ trees have fewer cache misses
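
A minimal sketch of the smell and one way to remove it; the domain and names are made up for illustration:

  // Illustrative domain only.
  record Country(double shippingRate) {}
  record Address(Country country) {}
  record Customer(Address address) {}

  record Order(Customer customer, double weightKg) {
      // better cohesion: the object that owns the data does the work
      double shippingCost() {
          return customer.address().country().shippingRate() * weightKg;
      }
  }

  class Pricing {
      // feature envy / train wreck: the caller digs through other objects for
      // data, dragging every intermediate object through the cache
      static double shippingCostSmelly(Order order) {
          return order.customer().address().country().shippingRate() * order.weightKg();
      }

      // tell, don't ask: one call keeps the computation next to its data
      static double shippingCost(Order order) {
          return order.shippingCost();
      }
  }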

relationships

  • name the relationships
  • ordered or unordered
  • fifo or lifo (see the sketch after this list)
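
A minimal sketch of making the relationship and its ordering explicit in the type and the field name; the domain is made up for illustration:

  import java.util.ArrayDeque;
  import java.util.Deque;
  import java.util.Queue;

  // name the relationship and state its ordering, rather than passing a bare List around
  class Till {
      // FIFO: customers are served in arrival order
      private final Queue<String> waitingCustomers = new ArrayDeque<>();

      // LIFO: undo history is unwound most-recent-first
      private final Deque<String> undoHistory = new ArrayDeque<>();

      void arrive(String customer) { waitingCustomers.add(customer); }
      String serveNext()           { return waitingCustomers.poll(); }

      void record(String action)   { undoHistory.push(action); }
      String undoLast()            { return undoHistory.pop(); }
  }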

batching

  • amortise the expensive costs
  • writing to disk is expensive, write a bunch at once
  • pro tip: batch processing is not just for offline (see the sketch after this list)
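
A minimal sketch of amortising an expensive sync-to-disk by batching writes; the class, file handling, and buffer size are illustrative assumptions:

  import java.io.IOException;
  import java.nio.ByteBuffer;
  import java.nio.channels.FileChannel;
  import java.nio.file.Path;
  import java.nio.file.StandardOpenOption;

  // instead of forcing every message to disk individually, append a batch of
  // messages and pay the expensive force() once for the whole batch
  class BatchingJournal {
      private final FileChannel channel;
      private final ByteBuffer batch = ByteBuffer.allocateDirect(64 * 1024);

      BatchingJournal(Path file) throws IOException {
          this.channel = FileChannel.open(file,
                  StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
      }

      void append(byte[] message) throws IOException {
          if (batch.remaining() < message.length) {
              flush();                       // batch full: pay the write once
          }
          batch.put(message);
      }

      void flush() throws IOException {
          batch.flip();
          channel.write(batch);              // one write for many messages
          channel.force(false);              // one expensive fsync for the batch
          batch.clear();
      }
  }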

branches

  • don’t pass null around and require null checks everywhere
  • don’t check isEmpty before iterating (see the sketch after this list)
  • pro tip: respect the principle of least surprise
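
A minimal sketch of removing those branches; the names are illustrative:

  import java.util.List;

  class Orders {
      // return an empty list rather than null, so callers never branch on it
      List<String> pendingIds(boolean anyPending) {
          return anyPending ? List.of("A-1", "A-2") : List.of();
      }

      long process(List<String> ids) {
          // no null check, no isEmpty() check: an empty list simply does zero iterations
          long processed = 0;
          for (String id : ids) {
              processed++;
          }
          return processed;
      }
  }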

loops

  • write it once, work on something else, reread it again
  • small inner loops can fit into the instruction cache on the CPU
    • avoids the cost of decoding the x86 instructions
  • pro tip: craft major loops like good prose

composition

  • “inlining is THE optimisation.” – Cliff Click
  • prefer smaller methods
  • single responsibility
  • pro tip: small atoms can combine to build anything (see the sketch after this list)
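
A minimal sketch of composing small, single-purpose methods that the JIT can inline into one tight piece of code; the domain and names are illustrative:

  // small, single-responsibility methods compose into the hot path and are
  // trivially inlinable, so the abstraction costs little at runtime
  class Money {
      static long toCents(long dollars)        { return dollars * 100; }
      static long addTax(long cents, int pct)  { return cents + cents * pct / 100; }
      static long clampToZero(long cents)      { return Math.max(cents, 0); }

      static long finalPriceCents(long dollars, int taxPct, long discountCents) {
          // composed, the small methods read like the domain they model
          return clampToZero(addTax(toCents(dollars) - discountCents, taxPct));
      }
  }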

good APIs allow the caller to pass in the collection to be used, rather than allocating a new collection and returning it (see the sketch below)
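
A minimal sketch of that API shape; the names are illustrative:

  import java.util.ArrayList;
  import java.util.Collection;
  import java.util.List;

  class MatchFinder {
      // allocation-heavy shape: every call creates a new list for the caller
      static List<Integer> findMatchesAllocating(int[] values, int threshold) {
          List<Integer> result = new ArrayList<>();
          for (int v : values) {
              if (v > threshold) result.add(v);
          }
          return result;
      }

      // caller-supplied collection: the caller controls allocation, can reuse
      // the same collection across calls, and picks the concrete type
      static void findMatches(int[] values, int threshold, Collection<Integer> results) {
          for (int v : values) {
              if (v > threshold) results.add(v);
          }
      }
  }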

data

  • have a predictable access pattern
  • store data in column format (see the sketch after this list)
  • pro tip: embrace Set theory and FP techniques
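
A minimal sketch of column-oriented storage in Java (a structure of arrays rather than an array of objects); the trade domain is illustrative:

  // row format would scatter Trade objects across the heap; column format keeps
  // each field in its own primitive array, so scanning one field is a sequential,
  // cache-line-friendly walk
  class TradeColumns {
      final long[] timestamps;
      final long[] prices;
      final int[] quantities;

      TradeColumns(int capacity) {
          timestamps = new long[capacity];
          prices = new long[capacity];
          quantities = new int[capacity];
      }

      // scanning one column touches only the data it needs, in order
      long totalQuantity(int count) {
          long total = 0;
          for (int i = 0; i < count; i++) {
              total += quantities[i];
          }
          return total;
      }
  }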

** why performance test?

  • use histograms, not averages
    • HdrHistogram (see the sketch after this list)
  • coordinated omission
  • JMH for micro benchmarks
  • CPU performance counters
  • performance test as part of CI
  • build telemetry into production systems
    • see how F1 does it
    • see performance counters in Aeron
  • different profilers give different perspectives
    • safepoint-sampling profilers are good for finding IO bottlenecks
    • honest profiler / mission control are good for hot CPU, but miss blocked or waiting threads
    • mission control helps to find excessive allocations
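
A minimal sketch of recording latencies into an HdrHistogram and reading percentiles instead of an average, assuming the HdrHistogram library is on the classpath; the workload is simulated:

  import org.HdrHistogram.Histogram;
  import java.util.concurrent.ThreadLocalRandom;

  public class LatencyRecording {
      public static void main(String[] args) {
          // track values from 1 us up to 1 hour with 3 significant digits
          Histogram histogram = new Histogram(3_600_000_000L, 3);

          // simulated latencies in microseconds; in a real system you would
          // record the measured response time of each operation
          for (int i = 0; i < 1_000_000; i++) {
              long latencyUs = 100 + ThreadLocalRandom.current().nextLong(900);
              histogram.recordValue(latencyUs);
          }

          System.out.printf("p50 %d us, p99 %d us, p99.99 %d us, max %d us%n",
                  histogram.getValueAtPercentile(50.0),
                  histogram.getValueAtPercentile(99.0),
                  histogram.getValueAtPercentile(99.99),
                  histogram.getMaxValue());
      }
  }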
