FPGA Resource Utilization Journey
From baseline to optimized: the resource tradeoffs that enabled a 49.8× speedup
❌ Baseline (Unoptimized)
Flip-Flops: 25,211 / 106,400
❌ Problems:
- 83% LUT utilization → routing congestion
- Low BRAM usage (23%) → insufficient on-chip buffering
- Clock: 10 ns → tight timing
- Latency: 397 ms → memory-bound
✅ Optimized
BRAM 18K: 196 / 280 blocks
Flip-Flops: 23,210 / 106,400
✅ Optimizations Applied:
- Loop pipelining with II=1
- Array partitioning for parallel access
- Tiled convolution with on-chip buffers
- Clock relaxed to 15 ns → timing closure
- Dataflow pragma for layer pipelining
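The optimizations listed above can be sketched together in a small HLS-style kernel. This is a minimal illustration, not the design's actual source: the 1D convolution, the names `conv1d_tiled`, `load_tile`, `compute_tile`, `store_tile`, and the sizes `TILE` and `K` are all assumptions made for the example. The pragmas follow Vitis HLS syntax; whether the tool actually reaches II=1 or accepts the dataflow region depends on the tool version and the surrounding design.

```cpp
// Hypothetical sizes, chosen only for illustration.
constexpr int TILE = 64;            // outputs produced per tile
constexpr int K = 3;                // convolution kernel taps
constexpr int BUF = TILE + K - 1;   // input samples needed per tile (with halo)

// Stage 1: burst-copy one input tile from external memory into an on-chip buffer.
static void load_tile(const float *in, float in_buf[BUF], int base) {
    for (int i = 0; i < BUF; ++i) {
#pragma HLS PIPELINE II=1
        in_buf[i] = in[base + i];
    }
}

// Stage 2: convolve entirely out of on-chip buffers.
// With the outer loop pipelined, the K-tap inner loop is unrolled,
// so K buffer reads are needed per cycle (hence the partitioning below).
static void compute_tile(const float in_buf[BUF], const float w_local[K],
                         float out_buf[TILE]) {
    for (int i = 0; i < TILE; ++i) {
#pragma HLS PIPELINE II=1
        float acc = 0.0f;
        for (int k = 0; k < K; ++k) {
            acc += in_buf[i + k] * w_local[k];
        }
        out_buf[i] = acc;
    }
}

// Stage 3: write the finished tile back to external memory.
static void store_tile(const float out_buf[TILE], float *out, int base) {
    for (int i = 0; i < TILE; ++i) {
#pragma HLS PIPELINE II=1
        out[base + i] = out_buf[i];
    }
}

// Top level: `in` holds num_tiles * TILE + K - 1 samples,
// `out` receives num_tiles * TILE results.
void conv1d_tiled(const float *in, const float *w, float *out, int num_tiles) {
    // Keep the weights in registers so all K taps are readable in one cycle.
    float w_local[K];
#pragma HLS ARRAY_PARTITION variable=w_local complete dim=1
    for (int k = 0; k < K; ++k) {
        w_local[k] = w[k];
    }

    for (int t = 0; t < num_tiles; ++t) {
        // Let load, compute, and store of successive tiles overlap.
#pragma HLS DATAFLOW
        float in_buf[BUF];
        // Cyclic factor matches K (3) so in_buf[i], [i+1], [i+2] sit in separate banks.
#pragma HLS ARRAY_PARTITION variable=in_buf cyclic factor=3 dim=1
        float out_buf[TILE];

        load_tile(in, in_buf, t * TILE);
        compute_tile(in_buf, w_local, out_buf);
        store_tile(out_buf, out, t * TILE);
    }
}
```

A standard C++ compiler ignores the `#pragma HLS` lines (possibly with a warning), so the same file can be used for a functional check on the host before synthesis. The on-chip `in_buf` and `out_buf` arrays are where the extra block RAM goes: each external-memory sample is fetched once per tile instead of once per tap.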
The Key Tradeoff: We spent more BRAM (23% → 70%) to cut LUT usage (83% → 62%) and gain a 49.8× speedup.
On-chip buffering eliminated the memory bottleneck at the cost of more block RAM. This is the classic FPGA
optimization pattern: spend scarce on-chip memory to avoid slow DDR accesses.
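The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes the 49.8× speedup refers to wall-clock latency; the optimized latency and cycle counts it derives are not reported in the source, and the helper names are hypothetical.

```cpp
// Percent of a resource in use.
constexpr double utilization_pct(double used, double available) {
    return 100.0 * used / available;
}

// Latency after applying a wall-clock speedup factor (assumption: speedup is wall-clock).
constexpr double optimized_latency_ms(double baseline_ms, double speedup) {
    return baseline_ms / speedup;
}

// Clock cycles elapsed for a given latency and clock period.
constexpr double cycles(double latency_ms, double period_ns) {
    return latency_ms * 1.0e6 / period_ns;   // 1 ms = 1e6 ns
}

// Ratio of baseline cycles to optimized cycles.
constexpr double cycle_reduction(double baseline_ms, double baseline_period_ns,
                                 double speedup, double optimized_period_ns) {
    return cycles(baseline_ms, baseline_period_ns) /
           cycles(optimized_latency_ms(baseline_ms, speedup), optimized_period_ns);
}
```

With the numbers on this page, `utilization_pct(196, 280)` gives the 70% BRAM figure, and `optimized_latency_ms(397, 49.8)` comes to roughly 8 ms. Because the clock period was relaxed from 10 ns to 15 ns, `cycle_reduction(397, 10, 49.8, 15)` is about 74.7: under the wall-clock assumption, the design would need about 75× fewer cycles to deliver a 49.8× faster result at the slower clock.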
[Figure: Resource Utilization Comparison]
[Figure: Latency Journey]