Program
Performance on Modern Architectures
- Quick assembly basics (needed for the examples that follow)
- Architecture: Memory/cache hierarchy, Out-of-order execution, Prediction, Effects of decoded instruction caches
- Consequences: Interpreting results from instruction sampling tools (perf), Interpreting microbenchmarks
- Operating system and system calls: What is expensive, What is blocking
OCaml Runtime
- Comprehensive overview of how the garbage collector (GC) operates
- Cost of different parts of the GC: Worst-case and average algorithmic costs, Unusual effects, Example cases of unexpected effects
- FFI (performance-relevant aspects): Allocations/invariants, Annotations [@...] / boxing, Roots
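As a small illustration of the allocation costs covered in this part, the sketch below uses the standard `Gc.minor_words` counter to compare a loop over immediate integers with a function that builds a list of tuples. The helper names (`minor_words_of`, `sum_ints`, `make_pairs`) are hypothetical and chosen for this example only. Exact counts depend on the compiler version and backend, so treat the printed numbers as indicative.

```ocaml
(* Run [f] and return its result together with the number of words
   allocated in the minor heap while it ran. *)
let minor_words_of f =
  let before = Gc.minor_words () in
  let result = f () in
  let after = Gc.minor_words () in
  (result, after -. before)

(* Works only on immediate integers: little or no heap allocation expected. *)
let sum_ints n =
  let acc = ref 0 in
  for i = 1 to n do
    acc := !acc + i
  done;
  !acc

(* Each element needs a cons cell (header + 2 fields) and a tuple
   (header + 2 fields), so at least 6 words per element. *)
let make_pairs n = List.init n (fun i -> (i, i + 1))

let () =
  let _, w_sum = minor_words_of (fun () -> sum_ints 10_000) in
  let _, w_pairs = minor_words_of (fun () -> make_pairs 10_000) in
  Printf.printf "sum_ints: %.0f words, make_pairs: %.0f words\n" w_sum w_pairs
```

Measuring words allocated rather than wall-clock time keeps the comparison stable across machines, which is convenient when the question is "does this construct allocate?" rather than "how long does it take?".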
OCaml Compiler
- Architecture: Tracing how various constructs are transformed through the compiler passes
- Cost of language constructs
- Guided tour of optimizations
- Differences with C-like compilers/runtimes
Tools
- perf and derived tools
- Proper benchmarking: Identifying the kind of performance sought (speed or latency, average or worst case, ...), Analyzing benchmark statistics, Regression benchmarks
- Valgrind/callgrind/cachegrind...
- The gdb debugger
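To make the benchmarking item above concrete, here is a minimal sketch of a timing harness that reports the mean, median, and worst sample, since average and worst-case behaviour are distinct targets. All names (`time_once`, `sample`, `summarize`, `work`) are hypothetical. The sketch assumes `Sys.time` (CPU time, coarse resolution) is precise enough for illustration. A real harness would likely use a monotonic clock, warm-up runs, and many more samples.

```ocaml
(* Time one call of [f], in seconds of CPU time.
   [Sys.opaque_identity] keeps the compiler from discarding the result. *)
let time_once f =
  let t0 = Sys.time () in
  ignore (Sys.opaque_identity (f ()));
  Sys.time () -. t0

(* Collect [runs] timing samples for [f]. *)
let sample ~runs f = Array.init runs (fun _ -> time_once f)

(* Return (mean, median, worst) of a non-empty array of samples.
   For an even number of samples the upper median is returned. *)
let summarize samples =
  let n = Array.length samples in
  if n = 0 then invalid_arg "summarize: no samples";
  let sorted = Array.copy samples in
  Array.sort compare sorted;
  let mean = Array.fold_left ( +. ) 0.0 samples /. float_of_int n in
  (mean, sorted.(n / 2), sorted.(n - 1))

let () =
  (* Hypothetical workload: build and sum a list of integers. *)
  let work () = List.fold_left ( + ) 0 (List.init 100_000 (fun i -> i)) in
  let mean, median, worst = summarize (sample ~runs:20 work) in
  Printf.printf "mean %.6fs  median %.6fs  worst %.6fs\n" mean median worst
```

Reporting the median next to the mean helps spot samples skewed by a GC cycle or scheduler noise, and the worst sample is the figure that matters when latency is the goal.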
Approaches to Improve Performance
- Using optimizations / choosing appropriate code structures: Controlling optimizations, Ensuring they apply
- Translating parts into C-like code: Analyzing relevance, Effectively using the FFI
- Identifying, recognizing, and fixing anti-patterns
- Fixing memory leaks and over-allocations: Reducing lifetimes, List of leak types
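One well-known example of the kind of anti-pattern and over-allocation this part deals with is appending to the end of a list inside a loop. The sketch below contrasts it with the usual fix. The function names `squares_slow` and `squares_fast` are hypothetical, and the example is an illustration rather than a case taken from the course material.

```ocaml
(* Anti-pattern: [acc @ [x]] copies all of [acc] at every step, so building
   a list of n elements costs O(n^2) time and allocation. *)
let squares_slow n =
  let rec go i acc =
    if i >= n then acc else go (i + 1) (acc @ [ i * i ])
  in
  go 0 []

(* Fix: cons onto the front (O(1) per step), then reverse once at the end. *)
let squares_fast n =
  let rec go i acc =
    if i >= n then List.rev acc else go (i + 1) ((i * i) :: acc)
  in
  go 0 []

let () =
  (* Both versions compute the same list; only their cost differs. *)
  assert (squares_slow 1_000 = squares_fast 1_000);
  print_endline "squares_slow and squares_fast agree"
```

The slow version also produces a large amount of short-lived garbage, so its cost shows up in GC activity as well as in the running time of the function itself.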
Organizing Optimization in a Project
- Estimating achievable performance for a program: Breaking the program into segments and analyzing each separately, Comparing with reality
- Identifying meaningful optimizations: Prioritizing the most useful aspects, What not to optimize