Commit Graph

9 Commits

Author SHA1 Message Date
Hou Pu
03fa3f627c iser: fix segmentation fault when task management pdu is received
As iser_pdu->desc->data_dir is not initialised when sending a PDU.
The value remains what it was when it was used last time. Thus
a PDU could be considered to have data if it previously had and
might cause segmentation fault.

For example if a pdu is a reset task management task with no data
to transfer and the pdu is previously used as a read task. Thus
it would cause fault like below:

> struct scsi_iovector *iovector_in = &task->iovector_in;

0  0x00007ffff7bcb2d1 in iser_rcv_completion (rx_desc=0x555555b79e48, iser_conn=0x555555b573a0) at iser.c:1349
1  0x00007ffff7bcb53e in iser_handle_wc (wc=0x7fffffffdc00, iser_conn=0x555555b573a0) at iser.c:1426
2  0x00007ffff7bcb685 in cq_event_handler (iser_conn=0x555555b573a0) at iser.c:1468
3  0x00007ffff7bcb81b in cq_handle (iser_conn=0x555555b573a0) at iser.c:1516
4  0x00007ffff7bc8b28 in iscsi_iser_service (iscsi=0x555555b58710, revents=1) at iser.c:118
5  0x00007ffff7bc3862 in iscsi_service (iscsi=0x555555b58710, revents=1) at socket.c:1016
6  0x00007ffff7bc3f6c in event_loop (iscsi=0x555555b58710, state=0x7fffffffe000) at sync.c:71
7  0x00007ffff7bc4605 in iscsi_task_mgmt_sync (iscsi=0x555555b58710, lun=0, function=ISCSI_TM_LUN_RESET, ritt=4294967295, rcmdsn=0) at sync.c:281
8  0x00007ffff7bc46cf in iscsi_task_mgmt_lun_reset_sync (iscsi=0x555555b58710, lun=0) at sync.c:312
9  0x000055555555500d in iscsi_lun_reset_sync (iscsi=0x555555b58710) at iscsiclient_lun_reset.c:34
10 0x0000555555555680 in main (argc=7, argv=0x7fffffffe1c8) at iscsiclient_lun_reset.c:211

Signed-off-by: Hou Pu <houpu@bytedance.com>
2020-11-08 15:45:00 +08:00
wanghonghao
843a01cbd8 iser: aggregate ack completion queue (CQ) events
Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
2020-04-11 12:46:40 +08:00
wanghonghao
2212021747 iser: enhance connection procedure
This patch is used to fix the following problems in the current connection
method:
1. iscsi_iser_connect() waits until the connection is established or failed,
and may block the caller for a long time.
2. Although there's a cm_thread handles communication events, but in fact it
has no effects after the connection is established.
3. Resources are not released properly after reconnection failed. And once we
try to reconnect again, the resources will leak permanently.
(see iscsi_reconnect()).

This patch eliminate cm_thread and handle communication events in the caller
thread.
Connection procedure:
1. Create a mock fd by eventfd() (or just use old_iscsi->fd while reconnecting),
and assign it to iscsi->fd.
2. Create communication event channel, make it non-blocking and dup the
notifier fd to iscsi->fd.
3. Handle communication events by iscsi_which_events()/iscsi_service() loop
until connection established or falied.
4. If connection is established successfully, dup the notifier fd of completion
queue (CQ) events to iscsi->fd.
5. Handle completion queue (CQ) events by iscsi_which_events()/iscsi_service()
loop.
The entire procedure is non-blocking.

After established, whenever iscsi_service() is called with revents=0 or
queue_pdu() is called with a NOP pdu, communication events will be checked.

When connection failed, iser transport cleanup itself before callbacks.

Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
2020-04-07 10:38:37 +08:00
wanghonghao
68ce3363aa iser: dynamic memory region allocator
Implement an allocator for allocating memory region of different lengths.
The allocator registers 4MB memory chunks as memory regions, and select a
free segment from one of them each time.

4KB is the minimum allocation unit, and free segments in the same chunk can be
merged into a larger free segment by the rule of buddy allocation. As a result,
size of allocated segments will be power of 2, this may waste some space but
produces less fragments.

In each chunk, a complete binary tree (which is actully an array) is used
to maintain free segments. Each node records the order of the largest segment
can be allocated from its subtree. Here's a miniature example.

A chunk with all segments free:
level 4                         4(0x1)
level 3            3(0x2)                      3(0x3)
level 2     2(0x4)        2(0x5)       2(0x6)        2(0x7)
level 1 1(0x8) 1(0x9) 1(0xa) 1(0xb) 1(0xc) 1(0xd) 1(0xe) 1(0xf)

After allocate a 16KB(order=3) memory region:
level 4                         3(0x1)
level 3            0(0x2)                      3(0x3)
level 2     2(0x4)        2(0x5)       2(0x6)        2(0x7)
level 1 1(0x8) 1(0x9) 1(0xa) 1(0xb) 1(0xc) 1(0xd) 1(0xe) 1(0xf)

It tooks 1 comparison to determine if a chunk can satisfy and at most 11
loops to find the leftmost free segment meets the requirments.
The value of each node is not more than 11, and a 8-bit integer is enough
to store it, so only 2048 bytes is required for each tree. And since the
entire tree is in a contiguous piece of memory and no rotations are needed,
it's far more efficient than self-balancing trees of the same size.

Different 4MB chunks are linked as a list, and the selection order is from
head to tail each time. If no existing chunks can satisfy the allocation,
the allocator will register another 4M chunk and add it to the tail.
+---------+       +---------+      +---------+
|4MB chunk| --->  |4MB chunk| ---> |4MB chunk|
+---------+       +---------+      +---------+

In most cases, smaller IOs can always get memory regions from the first or
second chunk and never traverse the list too much, and if we really send a lot
of large IOs, the cost of the traversal is rarely critical.

At last, obviously, the chunks can only allocate a maximum of 4MB memory region,
if a larger memory region is needed, the allocater registers/deregisters a
memory region directly regardless of buffer.

Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
2020-04-01 11:35:42 +08:00
wanghonghao
51391285d8 iser: remove __packed from struct iser_cm_hdr declaration
`__packed` is not defined previously, and was treated as a varible
declaration.

Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
2019-12-10 14:42:38 +08:00
wanghonghao
22d7360b5e iser: fix struct iser_rx_desc
iSER header is followed by iSCSI PDU without any pad in an RCaP Message.

Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
2019-12-09 13:17:55 +08:00
Bart Van Assche
a55f11ee68 Improve iser_rx_desc alignment
Align iscsi_header[] and data[] on an 8-byte boundary instead of on a 4-byte
boundary. With this patch applied pahole produces the following output:

struct iser_rx_desc {
        struct iser_hdr    iser_header;                  /*     0    28 */
        char                       pad1[4];              /*    28     4 */
        char                       iscsi_header[48];     /*    32    48 */
        /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
        char                       data[128];            /*    80   128 */
        /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
        struct ibv_sge     rx_sg;                        /*   208    16 */
        struct ibv_mr *            hdr_mr;               /*   224     8 */
        char                       pad2[24];             /*   232    24 */

        /* size: 256, cachelines: 4, members: 7 */
};

Additionally, this patch fixes the following build errors:

iser.c: In function 'iser_alloc_rx_descriptors':
iser.c:916:11: error: taking address of packed member of 'struct iser_rx_desc' may result in an unaligned pointer value [-Werror=address-of-packed-member]
  916 |   rx_sg = &rx_desc->rx_sg;
      |           ^~~~~~~~~~~~~~~
iser.c: In function 'iser_post_recvm':
iser.c:955:20: error: taking address of packed member of 'struct iser_rx_desc' may result in an unaligned pointer value [-Werror=address-of-packed-member]
  955 |   rx_wr->sg_list = &rx_desc->rx_sg;
      |                    ^~~~~~~~~~~~~~~

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
2019-10-31 14:16:56 -07:00
Peter Lieven
fa123fc397 abstract transport to static driver functions and opaque driver specific information.
This splits a transport into static driver specific functions for the common
iscsi commands. Optionally, a driver specific opaque memory is introduced
which is currently only used by iSER transport.
Last a lot of functions changed to static.

Signed-off-by: Peter Lieven <pl@kamp.de>
2016-08-05 11:28:43 +02:00
Roy Shterman
a628264ef0 Libiscsi: iSER implementation
This commit includes all iSER implementation in libscsi
library and utilities.

Also, adding iser option in url.

Change-Id: I55ca8a9d4db802e72eb991061260dbb0bd0ef9ba
Signed-off-by: Roy Shterman <roysh@mellanox.com>
2016-06-03 18:59:01 -07:00