Implement an allocator for memory regions of different lengths.
The allocator registers 4MB memory chunks as memory regions, and selects a
free segment from one of them on each allocation.
4KB is the minimum allocation unit, and free segments in the same chunk can be
merged into a larger free segment by the rule of buddy allocation. As a result,
the size of every allocated segment is a power of 2. This may waste some space
but produces less fragmentation.
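The rounding described above can be sketched as follows. This is only an
illustration: size_to_order() and the two macros are hypothetical names, not
taken from this patch, and the orders follow the numbering used in this
message (order 1 = 4KB, order 11 = 4MB).

```c
#include <assert.h>
#include <stddef.h>

#define MIN_UNIT   ((size_t)4096)              /* 4KB, order 1 */
#define CHUNK_SIZE ((size_t)4 * 1024 * 1024)   /* 4MB, order 11 */

/*
 * Map a requested length to the smallest order whose segment can hold it.
 * Returns 0 when the request is empty or does not fit in a single chunk.
 */
static unsigned size_to_order(size_t len)
{
    size_t seg = MIN_UNIT;
    unsigned order = 1;

    if (len == 0 || len > CHUNK_SIZE)
        return 0;
    while (seg < len) {     /* double the segment until it is large enough */
        seg <<= 1;
        order++;
    }
    return order;
}
```

A 5000-byte request, for example, maps to order 2 (8KB), so part of the
segment is left unused in exchange for simple merging.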
In each chunk, a complete binary tree (which is actually an array) is used
to maintain free segments. Each node records the order of the largest segment
that can be allocated from its subtree. Here's a miniature example.
A chunk with all segments free:
level 4 4(0x1)
level 3 3(0x2) 3(0x3)
level 2 2(0x4) 2(0x5) 2(0x6) 2(0x7)
level 1 1(0x8) 1(0x9) 1(0xa) 1(0xb) 1(0xc) 1(0xd) 1(0xe) 1(0xf)
After allocating a 16KB (order=3) memory region:
level 4 3(0x1)
level 3 0(0x2) 3(0x3)
level 2 2(0x4) 2(0x5) 2(0x6) 2(0x7)
level 1 1(0x8) 1(0x9) 1(0xa) 1(0xb) 1(0xc) 1(0xd) 1(0xe) 1(0xf)
It takes 1 comparison to determine whether a chunk can satisfy a request, and
at most 11 loop iterations to find the leftmost free segment that meets the
requirements. The value of each node is never more than 11, so an 8-bit
integer is enough to store it and only 2048 bytes are required for each tree.
And since the entire tree is in a contiguous piece of memory and no rotations
are needed, it's far more efficient than a self-balancing tree of the same
size.
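The tree walk described above can be sketched roughly as follows. This is a
minimal illustration under stated assumptions, not the code from this patch:
buddy_tree, buddy_init(), buddy_alloc() and buddy_free() are hypothetical
names, and passing the order to buddy_free() is a simplification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ORDER  11                        /* order 11 = 4MB, order 1 = 4KB */
#define NUM_NODES  (1u << MAX_ORDER)         /* 2048 bytes, index 0 unused */
#define UNIT_SIZE  4096L

struct buddy_tree {
    uint8_t order[NUM_NODES];  /* largest order allocatable from each subtree */
};

static void buddy_init(struct buddy_tree *t)
{
    unsigned level, first;

    memset(t->order, 0, sizeof(t->order));
    /* Nodes of one level are contiguous: the root is index 1, leaves 1024.. */
    for (level = MAX_ORDER, first = 1; level >= 1; level--, first <<= 1)
        memset(&t->order[first], (int)level, first);
}

/* Returns the byte offset of the segment inside the chunk, or -1. */
static long buddy_alloc(struct buddy_tree *t, unsigned order)
{
    unsigned idx = 1, level = MAX_ORDER;
    long offset;

    /* A single comparison against the root decides whether the chunk fits. */
    if (order < 1 || order > MAX_ORDER || t->order[1] < order)
        return -1;
    /* Descend to the requested level, preferring the left child. */
    while (level > order) {
        idx = (t->order[2 * idx] >= order) ? 2 * idx : 2 * idx + 1;
        level--;
    }
    assert(t->order[idx] == order);
    offset = (long)(idx - (1u << (MAX_ORDER - order))) * (UNIT_SIZE << (order - 1));
    t->order[idx] = 0;                       /* mark the segment as in use */
    while (idx > 1) {                        /* refresh the ancestors */
        uint8_t l, r;
        idx >>= 1;
        l = t->order[2 * idx];
        r = t->order[2 * idx + 1];
        t->order[idx] = l > r ? l : r;
    }
    return offset;
}

static void buddy_free(struct buddy_tree *t, long offset, unsigned order)
{
    unsigned idx = (1u << (MAX_ORDER - order)) +
                   (unsigned)(offset / (UNIT_SIZE << (order - 1)));
    unsigned level = order;

    assert(t->order[idx] == 0);
    t->order[idx] = (uint8_t)order;
    while (idx > 1) {
        uint8_t l, r;
        idx >>= 1;
        level++;
        l = t->order[2 * idx];
        r = t->order[2 * idx + 1];
        if (l == level - 1 && r == level - 1)
            t->order[idx] = (uint8_t)level;  /* both buddies free: merge */
        else
            t->order[idx] = l > r ? l : r;
    }
}
```

As in the miniature example, descendants of an allocated node are left
untouched; only the node itself and its ancestors are updated.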
Different 4MB chunks are linked as a list, and chunks are tried from head to
tail on each allocation. If no existing chunk can satisfy the allocation,
the allocator registers another 4MB chunk and adds it to the tail.
+---------+ +---------+ +---------+
|4MB chunk| ---> |4MB chunk| ---> |4MB chunk|
+---------+ +---------+ +---------+
In most cases, smaller IOs get memory regions from the first or second chunk
without traversing much of the list, and if we really send a lot of large
IOs, the cost of the traversal is rarely critical.
Finally, a chunk can only provide a memory region of at most 4MB. If a larger
memory region is needed, the allocator registers/deregisters a memory region
directly, bypassing the chunk list.
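The head-to-tail selection and the fallback for oversized requests can be
sketched as below. All names are hypothetical, and root_order merely stands
in for the root node of a chunk's tree; registration of the chunk memory
itself is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ORDER 11                 /* order 11 = one whole 4MB chunk */

struct chunk {
    uint8_t root_order;              /* stand-in for the root of the tree */
    struct chunk *next;
};

struct chunk_list {
    struct chunk *head, *tail;
    size_t count;
};

/* Append a fresh, completely free chunk to the tail of the list. */
static struct chunk *chunk_append(struct chunk_list *l)
{
    struct chunk *c = calloc(1, sizeof(*c));

    if (!c)
        return NULL;
    c->root_order = MAX_ORDER;
    if (l->tail)
        l->tail->next = c;
    else
        l->head = c;
    l->tail = c;
    l->count++;
    return c;
}

/*
 * Pick the first chunk, from head to tail, whose root can satisfy the order.
 * Returns NULL for orders beyond one chunk, in which case the caller would
 * register a dedicated memory region instead.
 */
static struct chunk *chunk_select(struct chunk_list *l, unsigned order)
{
    struct chunk *c;

    if (order < 1 || order > MAX_ORDER)
        return NULL;
    for (c = l->head; c; c = c->next)
        if (c->root_order >= order)
            return c;
    return chunk_append(l);          /* nothing fits: grow the list */
}
```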
Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
Align iscsi_header[] and data[] on an 8-byte boundary instead of on a 4-byte
boundary. With this patch applied, pahole produces the following output:
struct iser_rx_desc {
        struct iser_hdr            iser_header;          /*     0    28 */
        char                       pad1[4];              /*    28     4 */
        char                       iscsi_header[48];     /*    32    48 */
        /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
        char                       data[128];            /*    80   128 */
        /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
        struct ibv_sge             rx_sg;                /*   208    16 */
        struct ibv_mr *            hdr_mr;               /*   224     8 */
        char                       pad2[24];             /*   232    24 */

        /* size: 256, cachelines: 4, members: 7 */
};
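The layout reported above could also be checked at compile time. The sketch
below is an assumption-laden illustration: hdr_standin and sge_standin are
hypothetical stand-ins for struct iser_hdr and struct ibv_sge with the sizes
pahole reports, a 64-bit target with 8-byte pointers is assumed, and no
packed attribute is used (whether the real struct keeps one is not shown
here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in with the 28-byte size pahole reports for struct iser_hdr. */
struct hdr_standin {
    uint8_t bytes[28];
};

/* Stand-in with the 16-byte size pahole reports for struct ibv_sge. */
struct sge_standin {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

struct rx_desc_sketch {
    struct hdr_standin iser_header;
    char pad1[4];            /* pushes iscsi_header to offset 32 */
    char iscsi_header[48];
    char data[128];
    struct sge_standin rx_sg;
    void *hdr_mr;            /* stands in for struct ibv_mr * */
    char pad2[24];           /* rounds the descriptor up to 256 bytes */
};

/* Both buffers start on an 8-byte boundary. */
_Static_assert(offsetof(struct rx_desc_sketch, iscsi_header) == 32, "iscsi_header");
_Static_assert(offsetof(struct rx_desc_sketch, data) == 80, "data");
_Static_assert(offsetof(struct rx_desc_sketch, rx_sg) == 208, "rx_sg");
_Static_assert(offsetof(struct rx_desc_sketch, hdr_mr) == 224, "hdr_mr");
_Static_assert(sizeof(struct rx_desc_sketch) == 256, "descriptor size");
```

Because rx_sg sits at a naturally aligned offset in a non-packed struct,
taking its address does not trigger -Waddress-of-packed-member.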
Additionally, this patch fixes the following build errors:
iser.c: In function 'iser_alloc_rx_descriptors':
iser.c:916:11: error: taking address of packed member of 'struct iser_rx_desc' may result in an unaligned pointer value [-Werror=address-of-packed-member]
916 | rx_sg = &rx_desc->rx_sg;
| ^~~~~~~~~~~~~~~
iser.c: In function 'iser_post_recvm':
iser.c:955:20: error: taking address of packed member of 'struct iser_rx_desc' may result in an unaligned pointer value [-Werror=address-of-packed-member]
955 | rx_wr->sg_list = &rx_desc->rx_sg;
| ^~~~~~~~~~~~~~~
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
This splits the transport into static, driver-specific functions for the
common iSCSI commands. An optional driver-specific opaque memory area is
introduced, which is currently only used by the iSER transport.
Finally, a lot of functions were changed to static.
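One common way to structure such a split is a table of function pointers
per transport plus an opaque pointer in the context. The sketch below only
illustrates that pattern; every name in it (transport_ops, ctx, ctx_init,
iser_private and so on) is hypothetical and not the API introduced by this
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ctx;

/* Per-transport entry points; each driver keeps its functions static. */
struct transport_ops {
    const char *name;
    int (*connect)(struct ctx *c);
    int (*queue_pdu)(struct ctx *c, const void *pdu, size_t len);
};

struct ctx {
    const struct transport_ops *ops;
    void *opaque;            /* driver-specific state, NULL if unused */
    size_t queued;           /* bytes queued so far */
};

/* Private state that only the iSER driver understands. */
struct iser_private {
    int pdus_sent;
};

static int tcp_connect(struct ctx *c)
{
    c->opaque = NULL;        /* the TCP driver needs no private state here */
    return 0;
}

static int tcp_queue_pdu(struct ctx *c, const void *pdu, size_t len)
{
    (void)pdu;
    c->queued += len;
    return 0;
}

static int iser_connect(struct ctx *c)
{
    static struct iser_private priv;   /* one connection only in this sketch */

    priv.pdus_sent = 0;
    c->opaque = &priv;
    return 0;
}

static int iser_queue_pdu(struct ctx *c, const void *pdu, size_t len)
{
    struct iser_private *priv = c->opaque;

    (void)pdu;
    if (!priv)
        return -1;
    priv->pdus_sent++;
    c->queued += len;
    return 0;
}

static const struct transport_ops tcp_ops  = { "tcp",  tcp_connect,  tcp_queue_pdu };
static const struct transport_ops iser_ops = { "iser", iser_connect, iser_queue_pdu };

/* Select a driver by name and connect through its function table. */
static int ctx_init(struct ctx *c, const char *transport)
{
    memset(c, 0, sizeof(*c));
    if (strcmp(transport, "tcp") == 0)
        c->ops = &tcp_ops;
    else if (strcmp(transport, "iser") == 0)
        c->ops = &iser_ops;
    else
        return -1;
    return c->ops->connect(c);
}
```

Common code then calls c->ops->queue_pdu() without knowing which driver is
behind it.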
Signed-off-by: Peter Lieven <pl@kamp.de>
This commit adds the complete iSER implementation to the libscsi
library and utilities.
It also adds an iser option to the URL.
Change-Id: I55ca8a9d4db802e72eb991061260dbb0bd0ef9ba
Signed-off-by: Roy Shterman <roysh@mellanox.com>