- Read path: eliminate a redundant allocation in bsPerformCommand. Remove
the pre-allocation before bs.Read() and the append loop used for zero-fill;
use a direct copy and in-place zero-fill instead
- parseHeader: use command pool (getCommand) instead of direct allocation,
reducing GC pressure on the hot path
- Unmap: use a shared 1MB zero buffer instead of allocating one per descriptor,
so large unmap operations no longer allocate a buffer for every descriptor
- Network I/O: add a 256KB bufio.Writer to iSCSI connections, batching
small PDU writes into fewer syscalls. The writer is flushed after txHandler
completes
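
For illustration only, a minimal Go sketch of two of the patterns listed above: a shared zero buffer and batched writes through a bufio.Writer. This is not the code from this change. The names zeroBuf, writeZeros, countingWriter and sendPDUs are hypothetical, and an io.Writer stands in for both the backing store and the iSCSI connection.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// zeroBuf is a shared 1 MiB buffer of zeros (hypothetical name).
// It is only ever read from, so it can be reused by every caller.
var zeroBuf = make([]byte, 1<<20)

// writeZeros writes n zero bytes to w in chunks sliced from zeroBuf,
// so no buffer is allocated per call.
func writeZeros(w io.Writer, n int64) error {
	for n > 0 {
		chunk := int64(len(zeroBuf))
		if n < chunk {
			chunk = n
		}
		if _, err := w.Write(zeroBuf[:chunk]); err != nil {
			return err
		}
		n -= chunk
	}
	return nil
}

// countingWriter records how many Write calls and bytes reach it.
// It stands in for a network connection, where each Write is a syscall.
type countingWriter struct {
	calls int
	bytes int64
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	c.bytes += int64(len(p))
	return len(p), nil
}

// sendPDUs writes count small PDUs of the given size through a 256 KiB
// bufio.Writer and flushes once at the end.
func sendPDUs(dst io.Writer, count, size int) error {
	bw := bufio.NewWriterSize(dst, 256<<10)
	pdu := make([]byte, size)
	for i := 0; i < count; i++ {
		if _, err := bw.Write(pdu); err != nil {
			return err
		}
	}
	return bw.Flush()
}

func main() {
	conn := &countingWriter{}
	if err := sendPDUs(conn, 100, 48); err != nil {
		panic(err)
	}
	fmt.Printf("PDU path: %d underlying writes, %d bytes\n", conn.calls, conn.bytes)

	store := &countingWriter{}
	if err := writeZeros(store, 3<<20+5); err != nil {
		panic(err)
	}
	fmt.Printf("zero-fill: %d underlying writes, %d bytes\n", store.calls, store.bytes)
}
```

In this sketch the 100 small writes stay inside the 256 KiB buffer, so they reach the underlying writer as a single call on Flush. The trade-off is that nothing is sent until the buffer fills or Flush is called, which is why the flush point matters.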

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>