Melvin's digital garden

Designing for Performance

speaker: Martin Thompson
event: Singapore Java User Group

** what is performance?

throughput, response time, scalability

leaf cutter ants drop the leaf 50% of the time; we could optimise that, but the ants would need to be a lot smarter, and the real bottleneck is bringing the leaves back to the nest anyway

queueing theory

  • service time, queueing time
  • response time = queueing time + service time
  • products often quote only the service time, not the response time
  • at high utilisation, response time increases sharply (see the sketch after this list)
    • a team that is highly utilised is not very responsive
  • pro tip: ensure you have sufficient capacity
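
A minimal sketch of how queueing dominates response time at high utilisation, assuming a simple M/M/1 queue (the model and numbers are illustrative, not from the talk):

  // Sketch only: response time for an M/M/1 queue.
  // response time = queueing time + service time, and the queueing
  // component blows up as utilisation approaches 100%.
  public class ResponseTime {
      public static void main(String[] args) {
          double serviceTimeMs = 1.0;
          for (double utilisation : new double[] {0.5, 0.7, 0.9, 0.95, 0.99}) {
              double responseTimeMs = serviceTimeMs / (1.0 - utilisation);
              double queueingTimeMs = responseTimeMs - serviceTimeMs;
              System.out.printf("utilisation %.0f%%: queueing %.1f ms + service %.1f ms = response %.1f ms%n",
                      utilisation * 100, queueingTimeMs, serviceTimeMs, responseTimeMs);
          }
      }
  }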

using more processes

  • amdahl’s law (see the sketch after this list)
    • a task that is 95% parallel can achieve at most a 20x speedup
    • the remaining 5% must be done sequentially
  • meetings are the sequential part of a company
  • universal scalability law
    • coherence penalty
    • contention penalty
  • java logging frameworks don’t scale up
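
A minimal sketch of Amdahl’s law next to the universal scalability law; the contention and coherence coefficients are illustrative assumptions, not numbers from the talk:

  // Sketch only: Amdahl's law vs the universal scalability law.
  public class Scalability {
      // Amdahl: speedup limited by the sequential fraction of the work
      static double amdahl(double parallelFraction, int n) {
          return 1.0 / ((1.0 - parallelFraction) + parallelFraction / n);
      }

      // USL: adds contention (sigma) and a coherence penalty (kappa) that
      // makes speedup fall off again as n grows
      static double usl(double sigma, double kappa, int n) {
          return n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1));
      }

      public static void main(String[] args) {
          for (int n : new int[] {1, 2, 4, 8, 16, 32, 64, 128}) {
              System.out.printf("n=%3d  amdahl=%6.2fx  usl=%6.2fx%n",
                      n, amdahl(0.95, n), usl(0.05, 0.0005, n));
          }
      }
  }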

** what is clean and representative?

code is the best place to capture our current understanding of a model

don’t rush to create abstractions; only introduce one when you have seen three instances

abstractions must pay for themselves

  • megamorphism (multiple implementations of an interface)
    • => branch hell (see the sketch after this list)
  • say no to big frameworks
  • pro tip: abstract when you are sure of the benefits
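
A minimal sketch of the effect, assuming HotSpot’s usual behaviour; the domain and names are made up for illustration:

  // Illustrative only: a call site's cost depends on how many concrete types
  // it has seen. HotSpot typically inlines mono/bimorphic call sites, while a
  // megamorphic one falls back to virtual dispatch (a hard-to-predict branch).
  interface PriceRule {
      long apply(long price);
  }

  final class NoDiscount implements PriceRule {
      public long apply(long price) { return price; }
  }

  final class TenPercentOff implements PriceRule {
      public long apply(long price) { return price - price / 10; }
  }

  final class FlatFee implements PriceRule {
      public long apply(long price) { return price + 50; }
  }

  class Checkout {
      // If this loop only ever sees NoDiscount, the call can be inlined.
      // If it sees many different implementations, the same line becomes megamorphic.
      static long total(long[] prices, PriceRule rule) {
          long sum = 0;
          for (long p : prices) {
              sum += rule.apply(p);
          }
          return sum;
      }
  }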

abstraction allows us to be more precise about what we mean (see Dijkstra’s quote)

** implementing efficient models

memory subsystem assumptions

  • temporal bet
    • caches
  • spatial bet
    • cache line (see the sketch after this list)
  • pattern bet
    • predictable memory access pattern
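
A minimal sketch of the spatial bet: memory is fetched a cache line at a time, so walking addresses sequentially is much cheaper than striding across them. The timing here is illustrative only, not a proper benchmark:

  public class Locality {
      static final int N = 4096;
      static final int[][] grid = new int[N][N];

      // touches consecutive addresses, so each cache line is fully used
      static long rowMajor() {
          long sum = 0;
          for (int row = 0; row < N; row++)
              for (int col = 0; col < N; col++)
                  sum += grid[row][col];
          return sum;
      }

      // jumps a whole row's worth of memory on every step
      static long columnMajor() {
          long sum = 0;
          for (int col = 0; col < N; col++)
              for (int row = 0; row < N; row++)
                  sum += grid[row][col];
          return sum;
      }

      public static void main(String[] args) {
          long t0 = System.nanoTime();
          long a = rowMajor();
          long t1 = System.nanoTime();
          long b = columnMajor();
          long t2 = System.nanoTime();
          System.out.printf("row-major %d ms, column-major %d ms (sums %d, %d)%n",
                  (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a, b);
      }
  }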

playing well with memory can have big wins

coupling and cohesion

  • feature envy: reaching into another object’s fields to do the work
  • train wreck: long chains of getter calls (see the sketch after this list)
  • pro tip: respect the locality of reference
  • B+ trees have fewer cache misses
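
A minimal sketch of the smell and one way to remove it; the domain and names are made up for illustration:

  // Illustrative domain only.
  record Country(double shippingRate) {}
  record Address(Country country) {}
  record Customer(Address address) {}

  record Order(Customer customer, double weightKg) {
      // better cohesion: the object that owns the data does the work
      double shippingCost() {
          return customer.address().country().shippingRate() * weightKg;
      }
  }

  class Pricing {
      // feature envy / train wreck: the caller digs through other objects for
      // data, dragging every intermediate object through the cache
      static double shippingCostSmelly(Order order) {
          return order.customer().address().country().shippingRate() * order.weightKg();
      }

      // tell, don't ask: one call keeps the computation next to its data
      static double shippingCost(Order order) {
          return order.shippingCost();
      }
  }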

relationships

  • name the relationships
  • ordered or unordered
  • fifo or lifo (see the sketch after this list)
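
A minimal sketch of making the relationship and its ordering explicit in the type and the field name; the domain is made up for illustration:

  import java.util.ArrayDeque;
  import java.util.Deque;
  import java.util.Queue;

  // name the relationship and state its ordering, rather than passing a bare List around
  class Till {
      // FIFO: customers are served in arrival order
      private final Queue<String> waitingCustomers = new ArrayDeque<>();

      // LIFO: undo history is unwound most-recent-first
      private final Deque<String> undoHistory = new ArrayDeque<>();

      void arrive(String customer) { waitingCustomers.add(customer); }
      String serveNext()           { return waitingCustomers.poll(); }

      void record(String action)   { undoHistory.push(action); }
      String undoLast()            { return undoHistory.pop(); }
  }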

batching

  • amortise the expensive costs
  • writing to disk is expensive, write a bunch at once
  • pro tip: batch processing is not just for offline (see the sketch after this list)
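
A minimal sketch of amortising an expensive sync-to-disk by batching writes; the class, file handling, and buffer size are illustrative assumptions:

  import java.io.IOException;
  import java.nio.ByteBuffer;
  import java.nio.channels.FileChannel;
  import java.nio.file.Path;
  import java.nio.file.StandardOpenOption;

  // instead of forcing every message to disk individually, append a batch of
  // messages and pay the expensive force() once for the whole batch
  class BatchingJournal {
      private final FileChannel channel;
      private final ByteBuffer batch = ByteBuffer.allocateDirect(64 * 1024);

      BatchingJournal(Path file) throws IOException {
          this.channel = FileChannel.open(file,
                  StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
      }

      void append(byte[] message) throws IOException {
          if (batch.remaining() < message.length) {
              flush();                       // batch full: pay the write once
          }
          batch.put(message);
      }

      void flush() throws IOException {
          batch.flip();
          channel.write(batch);              // one write for many messages
          channel.force(false);              // one expensive fsync for the batch
          batch.clear();
      }
  }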

branches

  • don’t pass null around and require null checks everywhere
  • don’t check isEmpty before iterating (see the sketch after this list)
  • pro tip: respect the principle of least surprise
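
A minimal sketch of removing those branches; the names are illustrative:

  import java.util.List;

  class Orders {
      // return an empty list rather than null, so callers never branch on it
      List<String> pendingIds(boolean anyPending) {
          return anyPending ? List.of("A-1", "A-2") : List.of();
      }

      long process(List<String> ids) {
          // no null check, no isEmpty() check: an empty list simply does zero iterations
          long processed = 0;
          for (String id : ids) {
              processed++;
          }
          return processed;
      }
  }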

loops

  • write it once, work on something else, reread it again
  • small inner loops can fit into the instruction cache on the CPU
    • avoids the cost of decoding the x86 instructions
  • pro tip: craft major loops like good prose

composition

  • “inlining is THE optimisation.” – Cliff Click
  • prefer smaller methods
  • single responsibility
  • pro tip: small atoms can combine to build anything (see the sketch after this list)
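
A minimal sketch of composing small, single-purpose methods that the JIT can inline into one tight piece of code; the domain and names are illustrative:

  // small, single-responsibility methods compose into the hot path and are
  // trivially inlinable, so the abstraction costs little at runtime
  class Money {
      static long toCents(long dollars)        { return dollars * 100; }
      static long addTax(long cents, int pct)  { return cents + cents * pct / 100; }
      static long clampToZero(long cents)      { return Math.max(cents, 0); }

      static long finalPriceCents(long dollars, int taxPct, long discountCents) {
          // composed, the small methods read like the domain they model
          return clampToZero(addTax(toCents(dollars) - discountCents, taxPct));
      }
  }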

good APIs allow the caller to pass in the collection to be used, rather than allocating a new collection and returning it (see the sketch below)
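
A minimal sketch of that API shape; the names are illustrative:

  import java.util.ArrayList;
  import java.util.Collection;
  import java.util.List;

  class MatchFinder {
      // allocation-heavy shape: every call creates a new list for the caller
      static List<Integer> findMatchesAllocating(int[] values, int threshold) {
          List<Integer> result = new ArrayList<>();
          for (int v : values) {
              if (v > threshold) result.add(v);
          }
          return result;
      }

      // caller-supplied collection: the caller controls allocation, can reuse
      // the same collection across calls, and picks the concrete type
      static void findMatches(int[] values, int threshold, Collection<Integer> results) {
          for (int v : values) {
              if (v > threshold) results.add(v);
          }
      }
  }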

data

  • have a predictable access pattern
  • store data in column format (see the sketch after this list)
  • pro tip: embrace Set theory and FP techniques
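
A minimal sketch of column-oriented storage in Java (a structure of arrays rather than an array of objects); the trade domain is illustrative:

  // row format would scatter Trade objects across the heap; column format keeps
  // each field in its own primitive array, so scanning one field is a sequential,
  // cache-line-friendly walk
  class TradeColumns {
      final long[] timestamps;
      final long[] prices;
      final int[] quantities;

      TradeColumns(int capacity) {
          timestamps = new long[capacity];
          prices = new long[capacity];
          quantities = new int[capacity];
      }

      // scanning one column touches only the data it needs, in order
      long totalQuantity(int count) {
          long total = 0;
          for (int i = 0; i < count; i++) {
              total += quantities[i];
          }
          return total;
      }
  }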

** why performance test?

  • use histograms, not averages
    • HdrHistogram (see the sketch after this list)
  • coordinated omission
  • JMH for micro benchmarks
  • CPU performance counters
  • performance test as part of CI
  • build telemetry into production systems
    • see how F1 does it
    • see performance counters in Aeron
  • different profilers give different perspectives
    • safepoint-sampling profilers are good for finding IO bottlenecks
    • honest profiler / mission control are good for hot CPU, but miss blocked or waiting threads
    • mission control helps to find excessive allocations
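
A minimal sketch of recording latencies into an HdrHistogram and reading percentiles instead of an average, assuming the HdrHistogram library is on the classpath; the workload is simulated:

  import org.HdrHistogram.Histogram;
  import java.util.concurrent.ThreadLocalRandom;

  public class LatencyRecording {
      public static void main(String[] args) {
          // track values from 1 us up to 1 hour with 3 significant digits
          Histogram histogram = new Histogram(3_600_000_000L, 3);

          // simulated latencies in microseconds; in a real system you would
          // record the measured response time of each operation
          for (int i = 0; i < 1_000_000; i++) {
              long latencyUs = 100 + ThreadLocalRandom.current().nextLong(900);
              histogram.recordValue(latencyUs);
          }

          System.out.printf("p50 %d us, p99 %d us, p99.99 %d us, max %d us%n",
                  histogram.getValueAtPercentile(50.0),
                  histogram.getValueAtPercentile(99.0),
                  histogram.getValueAtPercentile(99.99),
                  histogram.getMaxValue());
      }
  }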
