We where modifying out_offset and out_len in iscsi_write_to_socket().
If the packet that was being sent before reconnect was a write command
the was a significant change that out_offset and out_len where already
touched. When requeing the packet after reconnect we where
sending garbage!
Signed-off-by: Peter Lieven <pl@kamp.de>
This patch adds the abilitiy to libiscsi to count the number
of consecutive outstanding NOPs.
With this ability its fairly easy to implement a keepalive
check with NOPs in your application.
Periodically (e.g. every 5 secs) create a NOP-Out with:
iscsi_nop_out_async(iscsi, NULL, NULL, 0, NULL);
At that time check the number of consecutive missing NOP-Ins
with
iscsi_get_nops_in_flight(iscsi) > N.
Where N is the number of missing NOP-Ins you will allow.
Please note that it is legitime for the Target to ignore
a NOP if the load is very high as those packet are mark
as IMMEDIATE.
Signed-off-by: Peter Lieven <pl@kamp.de>
Just like DATA_OUT we should just discard NOPs instead of requeueing them
on session reconnect.
Add new flag that to indicate this behaviour on reconnect and set it for
both data out and nops
In specific situation it might be useful to give up if a reconnect
is not successful or after a given number reconnect retries.
This patch adds the ability to change that. The default remains
the same: retry forever.
Signed-off-by: Peter Lieven <pl@kamp.de>
The preallocation logic was messing with size and offset of pdu->indata.
This was broken when we removed iscsi_data->alloc_size. Removing
entirely since reading without iovectors is deprecated.
Signed-off-by: Peter Lieven <pl@kamp.de>
A storage might sent an R2T response for a WRITE command while
we still sending out the WRITE command PDU. This is especially
the case when the command PDU carries immediata data.
Without this patch the R2T response will get lost as
the cmdpdu for the R2T cannot be found in iscsi_process_pdu()
leading to a deadlock.
Signed-off-by: Peter Lieven <pl@kamp.de>
Add comments to explain how/why we send unsolicited data.
This is a somewhat hairy area and it is easy to make mistakes here, so
extra comments on the how/why is useful.
When deciding if to generate a data out queue for the first sequence
the test was wrong.
We should not check if INITIAL_R2T is NO and IMMEDIATE_DATA=NO.
Immediate data does not matter here.
What we should check is IF we have more data we need to send and IF
INITIAL_R2T allows us to send more data, then we generate a train of DATA-OUT.
If MaxRecvDataSegmentLength is less than FirstBurstLength we will often have
to send unsolicited data as both ImmediateData and also as unsolicited
DATA-OUT PDUs.
Example:
Assume Target has responded :
MaxRecvDataSegmentLength = 8k
FirstBurstLength = 64k
ImmediateData=YES
InitialR2T=NO
Then this should generate the following sequence :
I->T ISCSI_COMMAND + 8K of immediate data. F-Bit is not set.
I->T DATA-OUT 8K
I->T DATA-OUT 8K
I->T DATA-OUT 8K
I->T DATA-OUT 8K
I->T DATA-OUT 8K
I->T DATA-OUT 8K
I->T DATA-OUT 8K. Final PDU in sequence so F-bit is set.
T->I R2T
...
If we do a write that spans more than first-burst-length amount of data
and we can use immediate data.
This will lead to a sequence
I->T SCSI_COMMAND + furst_burst_length of immediate data
T->I R2T
I->T DATA-OUT with the remainder of the data.
T->I SCSI_RESPONSE
In this situation we still have to set the F-bit in the flags for the ISCSI_COMMAND.
The F-bit indicates that the PDU is the final PDU in the current sequence
and is what will trigger the target to process what it has currently received
after which the target will either respond with a SCSI-RESULT or DATA-IN if this
was the final sequence of the command,
Or in the case this was not the last sequence, then the target will send a R2T
to request the next sequence for this command.
In this scenario above we have two sequences.
the first sequence is the SCSI_COMMAND with immediate data and since it has filled the full amount of immediate data and can not send any more data until an R2T, this means this pdu ends the sequence and thus has the F-bit set.
Once we receive the R2T the second sequence starts, which also will have the F-bit set in the PDU.
This fixes the IMMEDIATE_DATA_YES case if the paylaod did
not fit into the first PDU and fixes no unsolicited data
sent out iff IMMEDIATE_DATA_NO and INITIAL_R2T_NO.
Signed-off-by: Peter Lieven <pl@kamp.de>
set the cmdsn in the pdu struct so we can compare with
maxcmdsn when sending to socket even if data-out pdus
do not carry a cmdsn on the wire.
if we would not set pdu->cmdsn comparsion with
iscsi->maxcmdsn would cause a deadlock after
maxcmdsn increases to 2^31.
Signed-off-by: Peter Lieven <pl@kamp.de>
Remove the size field as it is not used. If we would keep it
we would have to calculate it in scsi_task_set_iov_in/out which
would add unneccassry wals to the iovec array.
Signed-off-by: Peter Lieven <pl@kamp.de>
The send logic was completely broken for any cases except
ISCSI_INITIAL_R2T_NO and ISCSI_IMMEDIATE_DATA_YES.
The final flag was set wrong or no data was sent.
It was also broken if the data did not fit into the cmd_pdu as
the consecutive pdus did not have the scsi_cbdata set which
lead to a segfault in iscsi_get_user_out_buffer().
Unfortunately we need to include scsi-lowlevel.h again in iscsi-private.h.
This should be fixed asap by introduction of an iscsi_task struct
to avoid to store iscsi relevant data in the scsi_task.
Signed-off-by: Peter Lieven <pl@kamp.de>
Conflicts:
lib/iscsi-command.c
Signed-off-by: Peter Lieven <pl@kamp.de>
obsolete the old api.
After discussions, It is probably not the right thing to do.
Lets leave the old api as is. Most simpler applications will
continue to pass a single linear buffer to the iscsi_<send-data-out-to-device>_task() for convenience so it would be wrong to force them all to
migrate to amore complex iovector api.
Convert any linear buffers, if provided, to a one element data-out iovector
if a buffer was provided.
This patch fixes a deadlock case where all available
cmdsns have been used for command PDUs which need
additional Data-OUT PDUs to succeed (e.g. a write16
which is larger than first_burst_len).
In this case the target will never increase the maxcmdsn
leading to a deadlock in iscsi_write_to_socket().
If we receive the R2T for such a write request we will
queue the Data-OUT PDUs but at the end of queue. As
a result they will never be sent as the outqueue is already
stuck.
We fix this by sorting the outqueue by ascending itt.
Signed-off-by: Peter Lieven <pl@kamp.de>
This finally makes T1040 trigger the cmdsn overflow deadlock.
Its a bad idea to mangle on negotiated parameters as this leads
to protocol errors.
We also need to avoid overlapping writes as this leads also
to protocol errors.
Additionally we test with and with (if available) immediate data.
Signed-off-by: Peter Lieven <pl@kamp.de>
This patch avoid incrementing itt to 0xffffffff which is
a reserved value for immediate pdus. Avoid incrementing
it to 0xfffffff to avoid unexpected behaviour.
Signed-off-by: Peter Lieven <pl@kamp.de>
RFC3720 says that cmdsn comparison must be done using
serial32 arithmetic. This will definetly avoid a deadlock
if cmdsn wraps from 2^32-1 to 0.
Signed-off-by: Peter Lieven <pl@kamp.de>
We can simply next the iscsi_scsi_cbdata in the iscsi_pdu struct
since it is only used inside the iscsi_pdu.
This saves one malloc for each pdu.
Signed-off-by: Peter Lieven <pl@kamp.de>
Change iscsi_scsi_command_async() to write data-out using iovectors
attached to the scsi task structure instead of copying the data into
the buffer holding the header.
Still allow passing the data via an argument to the funtcion so that the
ABI does not change but then just conver the data to an iovector.
Update the write_to_socket functions to know about the iovectors and write them
as part of the pdu.
Convert write10_task to use iovectors.
This will allow 'zero-copy' writes through libiscsi.
However, as 'zero-copy writes does mean that we do more send() calls into
the kernel this may degrade performance for very small i/o.
A scsi write will not take at least 2 send() calls.
One send call for the iscsi header structure and a second send call for the
payload data.
This will be more expensive than the old memcpy() of payload data plus one send() call since the send() will be a lot more expensive than memcpy() of a small amount of data.
This has the nice side effect to remove the compiler warning
"dereferencing type-punned pointer will break strict-aliasing rules"
which occur since gcc-4.7.
There are 79 locations where the warning occurs. All of them are in
statements where the htonl/htons/ntohl/ntohs functions are used, e.g.:
in lib/pdu.c itt = ntohl(*(uint32_t *)&in->hdr[16]);
in lib/scsi-lowlevel.c *(uint32_t *)&task->cdb[2] = htonl(lba);
The warning is not related to the htonl/htons/ntohl/ntohs functions but
to the casting/dereferencing operation. If the dereferenced variable is
already a pointer, the warning does not not occur, e.g. this one:
in lib/pdu.c itt = ntohl(*(uint32_t *)&in->data[16]);
The warning is caused by the -fstrict-aliasing option. The
-fstrict-aliasing option is enabled at optimization levels -O2, -O3, -Os.
Signed-off-by: Bernhard Kohl <bernhard.kohl@gmx.net>
As long as we have no struct iscsi_task we cannot track the
free() of data.indata buffer. This leads to a leaked memory
report in iscsi_context_destroy().
We fix this by assuming it has been freed when we pass it
it to scsi_task.
Signed-off-by: Peter Lieven <pl@kamp.de>
This patch defines an scsi_iovec struct which is guaranteed
to be POSIX compatible. It furthermore adds support for
in+out iovectors for bi-directional operations
Signed-off-by: Peter Lieven <pl@kamp.de>
If an application passes buffers to libiscsi for reading they are
currently added to a linked list one by one. This leads to a malloc
for each buffer object plus O(n) walks trough the list to add
the buffer to then end of the list. Additionally the buffer read
routine takes up to O(n) iterations to find the right buffer
for a request.
This patch introduces an scsi_iovector struct to pass buffers to
an scsi task. Adding a new buffer is in O(1) and finding the
right buffer to also. Malloc requirements are in O(log(n)).
Additionally the scsi_iovector struct is itended to be binary
compatible to an QEMUIOVector allowing to pass this structure
directly to the library.
Initial tests have been made booting an Ubuntu LTS 12.04.1
Desktop server up to the login prompt. The following observations
have been made with regards to scsi_malloc calls:
original implementation: ~11.500 mallocs
using iovector instead of list: ~ 7.500 mallocs
passing the iovector directly: 0 mallocs
To enable this feature in qemu for testing the following patch might
be used:
diff --git a/block/iscsi.c b/block/iscsi.c
index a6a819d..2809c15 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -390,11 +390,16 @@ iscsi_aio_readv(BlockDriverState *bs, int64_t sector_num,
return NULL;
}
+#if defined(LIBISCSI_FEATURE_IOVECTOR)
+ assert(sizeof(struct QEMUIOVector) == sizeof(struct scsi_iovector));
+ scsi_iovector_assign(acb->task, (struct scsi_iovector*) acb->qiov);
+#else
for (i = 0; i < acb->qiov->niov; i++) {
scsi_task_add_data_in_buffer(acb->task,
acb->qiov->iov[i].iov_len,
acb->qiov->iov[i].iov_base);
}
+#endif
iscsi_set_events(iscsilun);
---
Signed-off-by: Peter Lieven <pl@kamp.de>