From Adam Endrodi <adam.endrodi@nsn.com>
Fix a bug and clear is_corked during socket disconnect so that we
can re-use the context for a new connection.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
set_tcp_sockopt in particular conflicts with libnfs's function of the same
name and prevents a a program from statically linking against both libnfs and
libiscsi.
Similar fix should also go into libnfs.
This issue was introduced via patch "pdu: introduce ISCSI_PDU_CORK_WHEN_SENT"
on June 13, 2014 (commit 99585b6996).
Valgrind reported this use-after-free as follows:
Invalid read of size 4
at 0x5267606: iscsi_write_to_socket (socket.c:721)
by 0x5267A72: iscsi_service (socket.c:823)
by 0x526827C: event_loop (sync.c:67)
by 0x52698A4: iscsi_compareandwrite_sync (sync.c:823)
by 0x408111: compareandwrite (iscsi-support.c:1752)
by 0x4139E2: test_compareandwrite_simple (test_compareandwrite_simple.c:88)
by 0x503D260: ??? (in /usr/lib64/libcunit.so.1.0.1)
by 0x503D578: ??? (in /usr/lib64/libcunit.so.1.0.1)
by 0x503D8B5: CU_run_all_tests (in /usr/lib64/libcunit.so.1.0.1)
by 0x4046C6: main (iscsi-test-cu.c:1241)
Address 0x639f258 is 8 bytes inside a block of size 256 free'd
at 0x4C291E7: free (vg_replace_malloc.c:473)
by 0x525321B: iscsi_free (init.c:68)
by 0x52532F0: iscsi_sfree (init.c:110)
by 0x5257AD9: iscsi_free_pdu (pdu.c:179)
by 0x5267601: iscsi_write_to_socket (socket.c:719)
by 0x5267A72: iscsi_service (socket.c:823)
by 0x526827C: event_loop (sync.c:67)
by 0x52698A4: iscsi_compareandwrite_sync (sync.c:823)
by 0x408111: compareandwrite (iscsi-support.c:1752)
by 0x4139E2: test_compareandwrite_simple (test_compareandwrite_simple.c:88)
by 0x503D260: ??? (in /usr/lib64/libcunit.so.1.0.1)
by 0x503D578: ??? (in /usr/lib64/libcunit.so.1.0.1)
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Peter Lieven <pl@kamp.de>
Since writing headers and payload in a single iov has never been
implementend and after thinking about it several times seems to
be very hairy I would like to revert this change since
the original implementation is in O(1) while the changed one
is in O(n). This results in a complexity of O(n^2) instead of
O(n) for the whole send operation.
This reverts commit 06eab264f6.
if the socket is corked we otherwise set POLLOUT and then
do not sent. Depending on the event loop implementation this
can result in a busy wait.
Signed-off-by: Peter Lieven <pl@kamp.de>
Rename the macros for managing the linked lists from SLIST_* to ISCSI_LIST_*
to avoid a clash on *BSD which already have other macros SLIST_*
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
The iSCSI protocol adds padding to a data packet if the data size is not
a multiple of four. The iovector provided by QEMU does not include such
padding, and libiscsi then complains that there was a protocol error.
This patch fixes this by reading the padding in a separate "recv"
system call. These packets anyway do not happen in the data path,
where the packet size is a multiple of 512.
This fixes QEMU's scsi-generic backend, which triggered the problem when
the target sent a 66-byte INQUIRY response.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch finally introduces a small allocation pool
which recycles all the small portions of memory that
are used for headers and pdu structures. This was
the initial idea behind wrapping all memory functions
in libiscsi.
The results of booting are test system up to the login
prompt are quite impressive:
BEFORE:
libiscsi:5 memory is clean at iscsi_destroy_context() after 10712 mallocs, 18 realloc(s) and 10712 free(s)
AFTER:
libiscsi:5 memory is clean at iscsi_destroy_context() after 41 mallocs, 18 realloc(s), 41 free(s) and 10584 reused small allocations
Signed-off-by: Peter Lieven <pl@kamp.de>
Default to 0 meaning no timeout.
Implement a test for iSCS to test what happens if we send a command
with CMDSN being higher than the target allows.
In this case we dont strictly know what will happen, just that what should
NOT happen is the target responding with success.
But we have to be prepared for any kind of failure, including a timeout,
scsi sense, or even iscsi reject or session failure.
iscsi_data.size is used as parameter to memory
operation functions (such as malloc) which except
size_t rather than int.
Signed-off-by: Peter Lieven <pl@kamp.de>
Ordering by itt adds the special case of handling
itt == 0xffffffff. Order by CmdSN instead as described
by RFC3720.
Signed-off-by: Peter Lieven <pl@kamp.de>
Queing packets with itt = 0xffffffff (e.g. NOP-Out replies) ahead
of queue because they might have a higher CmdSN than following
packets. This again could lead to a deadlock AND its a protocol
violotion:
RFC3720 Section 3.2.2.1 second last paragraph on Page 21:
"On any connection, the iSCSI initiator MUST send the commands in
increasing order of CmdSN, except for commands that are retransmitted
due to digest error recovery and connection recovery."
Signed-off-by: Peter Lieven <pl@kamp.de>
This patch adds support for read/writev to directly read and write
from/to iovectors. Before this patch on read and write from/to socket
the operation was limited by the iovec boundaries. If there is enough
data in the buffer or enough buffer space available its now possible
to transfer the whole data in one atomic operaion.
Signed-off-by: Peter Lieven <pl@kamp.de>
This will set TCP_NODELAY on the socket connection
to the target. This is the first step to improve latency.
For systems supporting TCP_CORK we plan to add a cork
around certain PDUs e.g. DATA-Out, but this needs further
testing.
Signed-off-by: Peter Lieven <pl@kamp.de>
We where modifying out_offset and out_len in iscsi_write_to_socket().
If the packet that was being sent before reconnect was a write command
the was a significant change that out_offset and out_len where already
touched. When requeing the packet after reconnect we where
sending garbage!
Signed-off-by: Peter Lieven <pl@kamp.de>
A storage might sent an R2T response for a WRITE command while
we still sending out the WRITE command PDU. This is especially
the case when the command PDU carries immediata data.
Without this patch the R2T response will get lost as
the cmdpdu for the R2T cannot be found in iscsi_process_pdu()
leading to a deadlock.
Signed-off-by: Peter Lieven <pl@kamp.de>
This patch fixes a deadlock case where all available
cmdsns have been used for command PDUs which need
additional Data-OUT PDUs to succeed (e.g. a write16
which is larger than first_burst_len).
In this case the target will never increase the maxcmdsn
leading to a deadlock in iscsi_write_to_socket().
If we receive the R2T for such a write request we will
queue the Data-OUT PDUs but at the end of queue. As
a result they will never be sent as the outqueue is already
stuck.
We fix this by sorting the outqueue by ascending itt.
Signed-off-by: Peter Lieven <pl@kamp.de>
This finally makes T1040 trigger the cmdsn overflow deadlock.
Its a bad idea to mangle on negotiated parameters as this leads
to protocol errors.
We also need to avoid overlapping writes as this leads also
to protocol errors.
Additionally we test with and with (if available) immediate data.
Signed-off-by: Peter Lieven <pl@kamp.de>
RFC3720 says that cmdsn comparison must be done using
serial32 arithmetic. This will definetly avoid a deadlock
if cmdsn wraps from 2^32-1 to 0.
Signed-off-by: Peter Lieven <pl@kamp.de>
Change iscsi_scsi_command_async() to write data-out using iovectors
attached to the scsi task structure instead of copying the data into
the buffer holding the header.
Still allow passing the data via an argument to the funtcion so that the
ABI does not change but then just conver the data to an iovector.
Update the write_to_socket functions to know about the iovectors and write them
as part of the pdu.
Convert write10_task to use iovectors.
This will allow 'zero-copy' writes through libiscsi.
However, as 'zero-copy writes does mean that we do more send() calls into
the kernel this may degrade performance for very small i/o.
A scsi write will not take at least 2 send() calls.
One send call for the iscsi header structure and a second send call for the
payload data.
This will be more expensive than the old memcpy() of payload data plus one send() call since the send() will be a lot more expensive than memcpy() of a small amount of data.