The PDU indata is allocated by iscsi_malloc with an undetermined size but
freed by iscsi_sfree. iscsi_sfree can only be used to free memory whose
size is equal to iscsi->smalloc_size.
Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>
Instead of adding __attribute__((unused)) to unused arguments, add the
-Wno-unused-parameter compiler flag.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
iser_pdu->desc->data_dir is not initialised when sending a PDU, so the
value remains what it was when the descriptor was last used. Thus
a PDU could be considered to have data if it previously had, which
might cause a segmentation fault.
For example, if a PDU is a reset task management task with no data
to transfer and the PDU was previously used as a read task, it
would cause a fault like the one below:
> struct scsi_iovector *iovector_in = &task->iovector_in;
0 0x00007ffff7bcb2d1 in iser_rcv_completion (rx_desc=0x555555b79e48, iser_conn=0x555555b573a0) at iser.c:1349
1 0x00007ffff7bcb53e in iser_handle_wc (wc=0x7fffffffdc00, iser_conn=0x555555b573a0) at iser.c:1426
2 0x00007ffff7bcb685 in cq_event_handler (iser_conn=0x555555b573a0) at iser.c:1468
3 0x00007ffff7bcb81b in cq_handle (iser_conn=0x555555b573a0) at iser.c:1516
4 0x00007ffff7bc8b28 in iscsi_iser_service (iscsi=0x555555b58710, revents=1) at iser.c:118
5 0x00007ffff7bc3862 in iscsi_service (iscsi=0x555555b58710, revents=1) at socket.c:1016
6 0x00007ffff7bc3f6c in event_loop (iscsi=0x555555b58710, state=0x7fffffffe000) at sync.c:71
7 0x00007ffff7bc4605 in iscsi_task_mgmt_sync (iscsi=0x555555b58710, lun=0, function=ISCSI_TM_LUN_RESET, ritt=4294967295, rcmdsn=0) at sync.c:281
8 0x00007ffff7bc46cf in iscsi_task_mgmt_lun_reset_sync (iscsi=0x555555b58710, lun=0) at sync.c:312
9 0x000055555555500d in iscsi_lun_reset_sync (iscsi=0x555555b58710) at iscsiclient_lun_reset.c:34
10 0x0000555555555680 in main (argc=7, argv=0x7fffffffe1c8) at iscsiclient_lun_reset.c:211
Signed-off-by: Hou Pu <houpu@bytedance.com>
The target sometimes sends a logout request to libiscsi
in case it is going down or for some other reason.
The opcode of such a request is ISCSI_PDU_ASYNC_MSG.
On receiving this kind of PDU, there is no related PDU on the
iscsi->waitpdu list, so skip searching iscsi->waitpdu for it.
Otherwise a segmentation fault might happen.
Also rename the nop_target label to no_waitpdu to make it clearer.
Signed-off-by: Hou Pu <houpu@bytedance.com>
Allocate `iser_pdu` from the small allocation pool.
The lifecycle of `iscsi_in_pdu` is confined to a single function in the iSER
transport, so allocate it on the stack.
Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
This commit fixes compatibility with CHAP.
The iSER transport only posts `login_resp_buf` (which is larger than `rx_desc`)
as a work request (WR) once, but there may be multiple requests and responses
during the login phase (e.g. when CHAP is used), and login can't be finished in
such cases.
Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
tx_desc and memory region buffers assigned to iSER PDUs should be given back to
the tx_desc list and the allocator before freeing all memory regions.
This may happen during reconnecting/disconnecting.
Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
This patch fixes the following problems in the current connection
method:
1. iscsi_iser_connect() waits until the connection is established or has failed,
and may block the caller for a long time.
2. Although there is a cm_thread that handles communication events, it has no
effect after the connection is established.
3. Resources are not released properly after a reconnection fails, and once we
try to reconnect again, the resources leak permanently
(see iscsi_reconnect()).
This patch eliminates cm_thread and handles communication events in the caller
thread.
Connection procedure:
1. Create a mock fd by eventfd() (or just use old_iscsi->fd while reconnecting),
and assign it to iscsi->fd.
2. Create communication event channel, make it non-blocking and dup the
notifier fd to iscsi->fd.
3. Handle communication events in the iscsi_which_events()/iscsi_service() loop
until the connection is established or has failed.
4. If connection is established successfully, dup the notifier fd of completion
queue (CQ) events to iscsi->fd.
5. Handle completion queue (CQ) events by iscsi_which_events()/iscsi_service()
loop.
The entire procedure is non-blocking.
Once the connection is established, communication events are checked whenever
iscsi_service() is called with revents=0 or queue_pdu() is called with a NOP PDU.
When the connection fails, the iSER transport cleans itself up before invoking
callbacks.
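The descriptor handling in steps 1, 2 and 4 can be sketched as follows. This is
a minimal sketch, not the code from this patch: the helper names are
hypothetical, and it assumes that "dup the notifier fd to iscsi->fd" is done
with dup2() so that the descriptor number polled by the caller stays stable.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Step 1: create a placeholder descriptor the caller can start polling. */
static int placeholder_fd_create(void)
{
    return eventfd(0, EFD_NONBLOCK);
}

/* Step 2 (first half): make an event source non-blocking. */
static int fd_set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Steps 2 and 4: point the stable descriptor number at a new event source.
 * dup2() silently closes whatever stable_fd referred to before.
 */
static int fd_retarget(int stable_fd, int source_fd)
{
    assert(stable_fd >= 0 && source_fd >= 0);
    return dup2(source_fd, stable_fd) == stable_fd ? 0 : -1;
}
```

Because O_NONBLOCK is a property of the open file description, the retargeted
descriptor is non-blocking as well, so the caller's poll loop never stalls on it.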
Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
Implement an allocator for memory regions of different lengths.
The allocator registers 4MB memory chunks as memory regions, and selects a
free segment from one of them each time.
4KB is the minimum allocation unit, and free segments in the same chunk can be
merged into a larger free segment by the rule of buddy allocation. As a result,
the size of an allocated segment is always a power of 2; this may waste some
space but produces less fragmentation.
In each chunk, a complete binary tree (which is actually an array) is used
to maintain free segments. Each node records the order of the largest segment
that can be allocated from its subtree. Here's a miniature example.
A chunk with all segments free:
level 4                          4(0x1)
level 3            3(0x2)                      3(0x3)
level 2     2(0x4)        2(0x5)        2(0x6)        2(0x7)
level 1  1(0x8) 1(0x9) 1(0xa) 1(0xb) 1(0xc) 1(0xd) 1(0xe) 1(0xf)
After allocating a 16KB (order=3) memory region:
level 4                          3(0x1)
level 3            0(0x2)                      3(0x3)
level 2     2(0x4)        2(0x5)        2(0x6)        2(0x7)
level 1  1(0x8) 1(0x9) 1(0xa) 1(0xb) 1(0xc) 1(0xd) 1(0xe) 1(0xf)
It takes 1 comparison to determine whether a chunk can satisfy a request, and
at most 11 loop iterations to find the leftmost free segment that meets the
requirements.
The value of each node is not more than 11, so an 8-bit integer is enough
to store it and only 2048 bytes are required for each tree. And since the
entire tree is in a contiguous piece of memory and no rotations are needed,
it's far more efficient than self-balancing trees of the same size.
Different 4MB chunks are linked as a list, and the selection order is from
head to tail each time. If no existing chunk can satisfy the allocation,
the allocator registers another 4MB chunk and adds it to the tail.
+---------+ +---------+ +---------+
|4MB chunk| ---> |4MB chunk| ---> |4MB chunk|
+---------+ +---------+ +---------+
In most cases, smaller IOs can get memory regions from the first or second
chunk without traversing much of the list, and if we really send a lot
of large IOs, the cost of the traversal is rarely critical.
Finally, the chunks can only serve memory regions of up to 4MB. If a larger
memory region is needed, the allocator registers/deregisters a memory region
directly, without going through the chunks.
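The tree bookkeeping from the miniature example can be sketched as below. This
is an illustrative sketch only, not the allocator added by this patch: all
names are hypothetical, the tree has 4 levels instead of 11, and memory
registration is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ORDER 4                   /* miniature tree: 8 leaves of one unit each */
#define NUM_NODES (1u << MAX_ORDER)   /* array slots; index 0 is unused */

struct buddy_tree {
    uint8_t largest[NUM_NODES];       /* largest allocatable order per subtree */
};

/* Order of the segment that a node itself represents. */
static unsigned node_order(unsigned node)
{
    unsigned order = MAX_ORDER;

    while (node > 1) {
        node >>= 1;
        order--;
    }
    return order;
}

static void buddy_init(struct buddy_tree *t)
{
    t->largest[0] = 0;
    for (unsigned n = 1; n < NUM_NODES; n++)
        t->largest[n] = (uint8_t)node_order(n);
}

/* Recompute the ancestors of a node after its value changed. */
static void buddy_update_parents(struct buddy_tree *t, unsigned node)
{
    while (node > 1) {
        unsigned parent = node >> 1;
        uint8_t l = t->largest[2 * parent];
        uint8_t r = t->largest[2 * parent + 1];
        unsigned child_order = node_order(parent) - 1;

        if (l == child_order && r == child_order)
            t->largest[parent] = (uint8_t)(child_order + 1); /* buddies merge */
        else
            t->largest[parent] = l > r ? l : r;
        node = parent;
    }
}

/* Returns the node index of the leftmost fitting segment, or 0 on failure. */
static unsigned buddy_alloc(struct buddy_tree *t, unsigned order)
{
    assert(order >= 1 && order <= MAX_ORDER);
    if (t->largest[1] < order)        /* single comparison at the root */
        return 0;

    unsigned node = 1;
    for (unsigned cur = MAX_ORDER; cur > order; cur--)
        node = t->largest[2 * node] >= order ? 2 * node : 2 * node + 1;

    t->largest[node] = 0;             /* mark the segment as taken */
    buddy_update_parents(t, node);
    return node;
}

/* Give back a segment previously returned by buddy_alloc(). */
static void buddy_free(struct buddy_tree *t, unsigned node)
{
    assert(node >= 1 && node < NUM_NODES && t->largest[node] == 0);
    t->largest[node] = (uint8_t)node_order(node);
    buddy_update_parents(t, node);
}
```

Allocating order 3 on a fresh tree returns node 0x2 and leaves the root at 3,
which matches the second diagram; freeing it restores the root to 4.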
Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
Hit iser hang in rdma_destroy_id with trace:
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f96ecbbcbb3 in rdma_destroy_id () from /usr/lib/librdmacm.so.1
#2 0x00005632027311d4 in iser_conn_release (iser_conn=iser_conn@entry=0x7f96d4027440) at iser.c:261
#3 0x0000563202731428 in iscsi_iser_connect (iscsi=0x563205206c70, sa=<optimized out>, ai_family=<optimized out>)
at iser.c:1516
#4 0x000056320273dd3c in iscsi_connect_async (iscsi=iscsi@entry=0x563205206c70,
portal=portal@entry=0x563205207084 "210.32.124.205:3260", cb=cb@entry=0x56320272b220 <iscsi_connect_cb>,
private_data=private_data@entry=0x7f96d4008b00) at socket.c:389
#5 0x000056320272b325 in iscsi_full_connect_async (iscsi=0x563205206c70,
portal=0x563205207084 "210.32.124.205:3260", lun=1, cb=cb@entry=0x56320272aef0 <iscsi_reconnect_cb>,
private_data=private_data@entry=0x0) at connect.c:230
#6 0x000056320272b711 in iscsi_reconnect (iscsi=<optimized out>) at connect.c:473
#7 0x00005632026810a8 in iscsi_timed_check_events (opaque=0x563205206ae0) at block/iscsi.c:387
Currently pthread_cancel is used to kill the cm thread forcefully. The cm
thread may exit without calling rdma_ack_cm_event, leaving an unacknowledged
event in librdmacm. rdma_destroy_id then hangs until the upper layer acks
all cm events.
Since the cm thread handles the DISCONNECTED event when the qp is destroyed
and exits by itself, join the cm thread to wait for it to exit
gracefully.
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Hit the crash stack:
#0 iser_initialize_headers (iser_pdu=0x7f1a3404ef50, iser_conn=0x0) at iser.c:514
#1 iscsi_iser_send_pdu (iscsi=0x7f1a3406d700, pdu=0x7f1a3404ef50) at iser.c:714
#2 0x000055e3160f0157 in iscsi_scsi_command_async (iscsi=0x7f1a3406d700, iscsi@entry=0x55e317fbcc70,
lun=lun@entry=1, task=task@entry=0x7f1a34026610, cb=cb@entry=0x55e316044c10 <iscsi_co_generic_cb>,
d=d@entry=0x7f15feeb7710, private_data=private_data@entry=0x7f15feeb77e0) at iscsi-command.c:282
#3 0x000055e3160f1616 in iscsi_write10_iov_task (iscsi=0x55e317fbcc70, lun=1, lba=lba@entry=10401896,
data=data@entry=0x0, datalen=4096, blocksize=<optimized out>, wrprotect=0, dpo=0, fua=0, fua_nv=0,
group_number=0, cb=0x55e316044c10 <iscsi_co_generic_cb>, private_data=0x7f15feeb77e0, iov=0x7f1a34042090,
niov=1) at iscsi-command.c:1107
#4 0x000055e31604680f in iscsi_co_writev (bs=<optimized out>, sector_num=<optimized out>,
nb_sectors=<optimized out>, iov=0x7f1a3404e380, flags=<optimized out>) at block/iscsi.c:640
#5 0x000055e31601e89c in bdrv_driver_pwritev (bs=bs@entry=0x55e317fb6570, offset=offset@entry=5325770752,
bytes=bytes@entry=4096, qiov=qiov@entry=0x7f1a3404e380, qiov_offset=qiov_offset@entry=0, flags=flags@entry=0)
at block/io.c:1220
The reason is that during async reconnection, before the reconnect
callback function gets invoked, we have closed the old connection
and the new connection is not ready.
At the same time, the upper layer still sends PDUs to the old iscsi context.
In this patch, until reconnection succeeds, just add the PDU to
waitpdu without sending it.
As suggested by Bart, do not show iser related logs here.
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
[ bvanassche: reformatted patch ]
Checking whether a pointer is NULL after it has been dereferenced is not
useful. This was detected by Coverity.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reduce the size of struct iscsi_context by reordering the members of this
data structure. Additionally, change the rdma_ack_timeout value from
'unsigned char' into 'uint8_t' to make it clear that this variable
represents an integer.
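The effect of member reordering can be illustrated with two hypothetical
structs (these are not the real members of struct iscsi_context). Grouping
members of similar size removes the padding that the compiler otherwise has to
insert between small members and pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with small members scattered between pointers. */
struct scattered {
    uint8_t  flag_a;
    void    *ptr_a;
    uint8_t  flag_b;
    void    *ptr_b;
    uint8_t  ack_timeout;
};

/* Same members, with the pointers first and the small members together. */
struct grouped {
    void    *ptr_a;
    void    *ptr_b;
    uint8_t  flag_a;
    uint8_t  flag_b;
    uint8_t  ack_timeout;
};

/* Number of bytes saved by the grouped layout on the current ABI. */
static size_t bytes_saved_by_grouping(void)
{
    assert(sizeof(struct grouped) <= sizeof(struct scattered));
    return sizeof(struct scattered) - sizeof(struct grouped);
}
```

On a typical ABI with 8-byte pointers the first layout should occupy 40 bytes
and the second 24, although the exact numbers depend on the platform.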
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Since commit 2c1619edef61a03cb516efaa81750784c3071d10 in the Linux kernel and
55843c4ab8f559679d28c559cc4d681836be769b in rdma-core, rdma cma
supports RDMA_OPTION_ID_ACK_TIMEOUT. It's useful for the RDMA
out-of-sequence case. Because this feature was added recently, we have to
check for it in autogen.sh before building the source code.
Depending on the production environment, tuning the rdma ack timeout could
get the best performance. As suggested by Bart and Ronnie, instead of using
a fixed timeout value, add two methods to set the rdma ack timeout value.
1. Add URL variable 'LIBISCSI_RDMA_ACK_TIMEOUT'. This works
for a specific connection.
2. Add environment variable 'LIBISCSI_RDMA_ACK_TIMEOUT'. This works as a
common setting for all the connections of a process.
Testing under different packet loss rates and different ack timeouts, running
fio (iodepth=1) in a guest OS, I got these results:
latency under packet loss rate 0.00001:
timeout 19: avg 170.22, pct99.9 215
timeout 10: avg 160.08, pct99.9 215
timeout 8 : avg 146.39, pct99.9 177
timeout 7 : avg 148.37, pct99.9 211
latency under packet loss rate 0.0001:
timeout 19: avg 949.23, pct99.9 306
timeout 10: avg 818.53, pct99.9 378
timeout 8 : avg 615.84, pct99.9 189
timeout 7 : avg 618.89, pct99.9 310
Based on this test report, setting the ack timeout to 8 (1048.576 usec) is
a good choice in my test environment.
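The 1048.576 usec figure follows from the InfiniBand encoding of the ACK
timeout, where a value n stands for 4.096 usec * 2^n. A small sketch of the
conversion (the function name is hypothetical, and the bound of 32 is only
there to keep the shift in this sketch well-defined):

```c
#include <assert.h>
#include <stdint.h>

/* Convert an encoded ACK timeout to microseconds: 4.096 usec * 2^value. */
static double ack_timeout_usec(uint8_t value)
{
    assert(value < 32);
    return 4.096 * (double)(UINT64_C(1) << value);
}
```

With this formula a value of 19 corresponds to roughly 2.1 seconds, and each
step down halves the timeout.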
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
After iser reconnects successfully, the iser driver should close the old
connection and release its resources.
Fix the resource leak in this patch. After extensive testing, this patch
works fine.
Test env:
192.168.122.204: run as a software gateway
192.168.122.205: run iser target, default gateway 192.168.122.204
192.168.122.206: run QEMU as initiator, default gateway 192.168.122.204
run script on 192.168.122.204:
for i in `seq 1 100`
do
iptables -s 192.168.122.205/32 -A FORWARD -m statistic --mode random --probability 1 -j DROP
iptables -s 192.168.122.206/32 -A FORWARD -m statistic --mode random --probability 1 -j DROP
sleep 30
iptables -F
sleep 30
done
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
If a thread is created without any attr, it runs in joinable (attached) mode.
This means we need to call pthread_join to reclaim the thread's stack.
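A minimal sketch of the pattern, with a hypothetical worker standing in for the
real thread function:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker standing in for a helper thread. */
static void *worker(void *arg)
{
    int *counter = arg;

    (*counter)++;
    return arg;
}

/* Start a thread with default (joinable) attributes and reclaim it. */
static int run_and_join(int *counter)
{
    pthread_t tid;
    void *ret = NULL;

    if (pthread_create(&tid, NULL, worker, counter) != 0)
        return -1;
    /* Without this join, the thread's stack and bookkeeping stay allocated. */
    if (pthread_join(tid, &ret) != 0)
        return -1;
    assert(ret == counter);
    return 0;
}
```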
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Hit segfault at iser_reg_mr during attaching disk with backtrace:
#0 0x000055ace9635b0f in iser_reg_mr (iser_conn=0x55aceca33820) at iser.c:1060
#1 iser_connected_handler (cma_id=<optimized out>) at iser.c:1300
#2 iser_cma_handler (event=0x7f29ef1f7950, cma_id=<optimized out>, iser_conn=0x55aceca33820) at iser.c:1326
#3 cm_thread (arg=0x55aceca33820) at iser.c:1380
#4 0x00007f2e2c31c4a4 in start_thread (arg=0x7f29ef1f8700) at pthread_create.c:456
#5 0x00007f2e2c05ed0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
(gdb) p *iser_conn->tx_desc
Cannot access memory at address 0x20
This issue can be reproduced easily by attaching several disks of iser
protocol:
# virsh attach-device stretch iser0.xml
# virsh attach-device stretch iser1.xml
...
Initialize instances with zero to avoid pointers holding random values.
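One common way to get zero-initialized instances is calloc(). The sketch below
uses a hypothetical struct rather than the real iser_conn, and relies on the
usual assumption that an all-zero pointer is NULL on the target platform.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a connection object holding pointers. */
struct conn_sketch {
    void *tx_desc;
    void *login_buf;
};

/* calloc() zero-fills the object, unlike malloc(). */
static struct conn_sketch *conn_sketch_new(void)
{
    return calloc(1, sizeof(struct conn_sketch));
}

/* True when no member points anywhere yet. */
static int conn_sketch_is_clean(const struct conn_sketch *c)
{
    assert(c != NULL);
    return c->tx_desc == NULL && c->login_buf == NULL;
}
```

With such an object, a check like "if (conn->tx_desc)" is reliable before the
descriptor has been set up.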
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
libiscsi is usually linked by QEMU, and QEMU sets the thread proc name
per function. But the iser cm thread is created by libiscsi privately, so
QEMU can't name this thread. After attaching an iser disk, we can find
a new thread 'qemu-system-x86' in the QEMU process.
With this patch, the iser cm thread runs with the thread name
'iscsi_cm_thread'.
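On Linux, one way for a thread to name itself is prctl(PR_SET_NAME). The sketch
below is an assumption about how this can be done, not necessarily the call
used by this patch, and the helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>

/* Linux limits thread names to 16 bytes including the terminating NUL. */
#define THREAD_NAME_MAX 16

/* Name the calling thread; returns 0 on success, -1 on failure. */
static int name_current_thread(const char *name)
{
    assert(strlen(name) < THREAD_NAME_MAX);
    return prctl(PR_SET_NAME, name, 0, 0, 0) == 0 ? 0 : -1;
}

/* Read the calling thread's name back into buf. */
static int read_current_thread_name(char buf[THREAD_NAME_MAX])
{
    return prctl(PR_GET_NAME, buf, 0, 0, 0) == 0 ? 0 : -1;
}
```

'iscsi_cm_thread' is 15 characters long, so it just fits within the limit.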
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
A PDU is sent directly in iscsi_iser_queue_pdu even if its cmdsn
exceeds maxcmdsn, in which case it may be ignored by the target.
Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
When ImmediateData=Yes, DataSegmentLength is set in iSCSI layer
but immediate data is not sent in the RCaP message.
Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
Discovered this while running the iSCSI.iSCSITMF AbortTaskSimpleAsync
test case. For a Task Management command, iser_pdu->iscsi_pdu.scsi_cbdata
is not set. When the test case tries to send a Task Management command
via the common API iser_send_command(), it calls overflow_data_size,
which tries to dereference scsi_cbdata, leading to a SEGFAULT.
Added a non-NULL check for scsi_cbdata before accessing it.
Added support for negotiating the keys below:
RDMAExtensions, TargetRecvDataSegmentLength, and
InitiatorRecvDataSegmentLength.
These are required to support iSER. See RFC5046 Section 6.
The old code is effectively always posting iser_conn->min_posted_rx
descriptors, since it is
    if (outstanding + iser_conn->min_posted_rx <= iser_conn->qp_max_recv_dtos) {
        if(iser_conn->qp_max_recv_dtos - outstanding > iser_conn->min_posted_rx)
            count = iser_conn->min_posted_rx;
        else
            count = iser_conn->qp_max_recv_dtos - outstanding;
which is equivalent to
    if(iser_conn->qp_max_recv_dtos - outstanding >= iser_conn->min_posted_rx)
        if(iser_conn->qp_max_recv_dtos - outstanding > iser_conn->min_posted_rx)
            count = iser_conn->min_posted_rx;
        else
            count = iser_conn->min_posted_rx;
So the "if" is redundant and the "min_posted_rx" is actually behaving more
like a _maximum_ number of posted descriptors in one iser_post_recvm.
Fix it with the (presumably) intended logic and remove a goto along
the way.
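The claim about the old code can be checked by brute force. The sketch below
copies the old condition into a standalone function and assumes that nothing
is posted (count 0) when the outer condition is false; the function names are
hypothetical.

```c
#include <assert.h>

/* The count computed by the old logic; 0 is assumed to mean "post nothing". */
static int old_count(int outstanding, int min_posted_rx, int qp_max_recv_dtos)
{
    int count = 0;

    assert(outstanding >= 0 && min_posted_rx >= 0);
    if (outstanding + min_posted_rx <= qp_max_recv_dtos) {
        if (qp_max_recv_dtos - outstanding > min_posted_rx)
            count = min_posted_rx;
        else
            count = qp_max_recv_dtos - outstanding;
    }
    return count;
}

/* Exhaustively check small inputs: the count is either 0 or min_posted_rx. */
static int old_count_is_all_or_nothing(int limit)
{
    for (int max = 0; max <= limit; max++)
        for (int min = 0; min <= max; min++)
            for (int out = 0; out <= max; out++) {
                int c = old_count(out, min, max);

                if (c != 0 && c != min)
                    return 0;
            }
    return 1;
}
```

The else branch is only reached when qp_max_recv_dtos - outstanding equals
min_posted_rx, which is why it never yields a different count.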
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The code was implicitly dependent on container_of from
infiniband/verbs.h; however, that has been removed in the latest
rdma-core release:
ce0274acff
Define container_of locally if it's not already defined.
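A guarded local definition typically looks like the sketch below. The macro
body is the common offsetof-based form and may differ from the one in this
patch; the struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#ifndef container_of
/* Recover a pointer to the enclosing struct from a pointer to one member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#endif

/* Hypothetical types for illustration only. */
struct inner_sketch {
    int value;
};

struct outer_sketch {
    int id;
    struct inner_sketch inner;
};

static struct outer_sketch *outer_from_inner(struct inner_sketch *in)
{
    assert(in != NULL);
    return container_of(in, struct outer_sketch, inner);
}
```

The #ifndef guard keeps the build working with older headers that still
provide the macro.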
Win32 has been rotting for a while. This patch adds vs17 build files
as well as fixing up all build errors that have accumulated.
There are still build warnings but those can be addressed in a followup
patch.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
This splits a transport into static driver-specific functions for the common
iSCSI commands. Optionally, driver-specific opaque memory is introduced,
which is currently only used by the iSER transport.
Lastly, a lot of functions were changed to static.
Signed-off-by: Peter Lieven <pl@kamp.de>
This commit includes the complete iSER implementation in the libiscsi
library and utilities.
It also adds an iser option to the URL.
Change-Id: I55ca8a9d4db802e72eb991061260dbb0bd0ef9ba
Signed-off-by: Roy Shterman <roysh@mellanox.com>