Patchwork [Open-FCoE,v3,02/16] scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly

login
register
mail settings
Submitter Johannes Thumshirn
Date Oct. 14, 2016, 7:38 a.m.
Message ID <20161014073821.qvkpjtwraqnwzzs3@linux-x5ow.site>
Download mbox | patch
Permalink /patch/362/
State New
Headers show

Comments

Johannes Thumshirn - Oct. 14, 2016, 7:38 a.m.
On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote:
> Hm, still behaves for me like I reported for v2:
> http://marc.info/?l=linux-scsi&m=147637177902937&w=2

Hi Steffen,

Can you please try the following on top of 2/16?




The rational behind this is, in fc_req_to_bsgjob() we're assigning
job->request as req->cmd and job->request_len = req->cmd_len. But without
checkinf job->request_len we don't know whether we're save to touch
job->request (a.k.a. bsg_request).

In the meanwhile I try to reproduce your report here.

Thanks,
	Johannes
Johannes Thumshirn - Oct. 25, 2016, 7:43 a.m.
On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote:
> On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote:
> > Hm, still behaves for me like I reported for v2:
> > http://marc.info/?l=linux-scsi&m=147637177902937&w=2
> 
> Hi Steffen,
> 
> Can you please try the following on top of 2/16?
> 
> diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
> index 4149dac..baebaab 100644
> --- a/drivers/scsi/scsi_transport_fc.c
> +++ b/drivers/scsi/scsi_transport_fc.c
> @@ -3786,6 +3786,12 @@ enum fc_dispatch_result {
>  	int cmdlen = sizeof(uint32_t);	/* start with length of msgcode */
>  	int ret;
>  
> +	/* check if we really have all the request data needed */
> +	if (job->request_len < cmdlen) {
> +		ret = -ENOMSG;
> +		goto fail_host_msg;
> +	}
> +
>  	/* Validate the host command */
>  	switch (bsg_request->msgcode) {
>  	case FC_BSG_HST_ADD_RPORT:
> @@ -3831,12 +3837,6 @@ enum fc_dispatch_result {
>  		goto fail_host_msg;
>  	}
>  
> -	/* check if we really have all the request data needed */
> -	if (job->request_len < cmdlen) {
> -		ret = -ENOMSG;
> -		goto fail_host_msg;
> -	}
> -
>  	ret = i->f->bsg_request(job);
>  	if (!ret)
>  		return FC_DISPATCH_UNLOCKED;
> @@ -3887,6 +3887,12 @@ enum fc_dispatch_result {
>  	int cmdlen = sizeof(uint32_t);	/* start with length of msgcode */
>  	int ret;
>  
> +	/* check if we really have all the request data needed */
> +	if (job->request_len < cmdlen) {
> +		ret = -ENOMSG;
> +		goto fail_rport_msg;
> +	}
> +
>  	/* Validate the rport command */
>  	switch (bsg_request->msgcode) {
>  	case FC_BSG_RPT_ELS:
> 
> 
> 
> The rational behind this is, in fc_req_to_bsgjob() we're assigning
> job->request as req->cmd and job->request_len = req->cmd_len. But without
> checkinf job->request_len we don't know whether we're save to touch
> job->request (a.k.a. bsg_request).

Hi Steffen,
Did you have any chance testing this? I hacked fcping to work with non-FCoE
and rports as well and tested with FCoE and lpfc. No problems seen from my
side. I've also pused the series (With this change folded in) to my git 
tree at [1] if this helps you in any way.

[1] https://git.kernel.org/cgit/linux/kernel/git/jth/linux.git/log/?h=scsi-bsg-rewrite-v4

Thanks a lot,
	Johannes
Christoph Hellwig - Nov. 3, 2016, 3:17 p.m.
On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote:
> On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote:
> > Hm, still behaves for me like I reported for v2:
> > http://marc.info/?l=linux-scsi&m=147637177902937&w=2
> 
> Hi Steffen,
> 
> Can you please try the following on top of 2/16?

How does this fit into the patch we're replying to?
Johannes Thumshirn - Nov. 3, 2016, 5:42 p.m.
On Thu, Nov 03, 2016 at 08:17:51AM -0700, Christoph Hellwig wrote:
> On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote:
> > On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote:
> > > Hm, still behaves for me like I reported for v2:
> > > http://marc.info/?l=linux-scsi&m=147637177902937&w=2
> > 
> > Hi Steffen,
> > 
> > Can you please try the following on top of 2/16?
> 
> How does this fit into the patch we're replying to?

Sorry but I don't quite get you.

If you look at a mixed source assembly listing for fc_host_dispatch() [1] we load
the the offset of 24 from %r12 into %r1 and then the beginning of %r1 into %r2
and crash. So if we now check if  job->request_len is smaller than uint32_t we
know we can't have a messagecode in the request and bail out early, instead of
accessing a possible NULL pointer. At least that's my analysis, feel free to
correct me if I'm wrong.

[1] Listing:
static int fc_bsg_host_dispatch(struct Scsi_Host *shost, struct bsg_job *job)
{
        struct fc_internal *i = to_fc_internal(shost->transportt);
        struct fc_bsg_request *bsg_request = job->request;
    5d54:       e3 10 c0 18 00 04       lg      %r1,24(%r12)
        struct fc_bsg_reply *bsg_reply = job->reply;
    5d5a:       e3 b0 c0 20 00 04       lg      %r11,32(%r12)
        int cmdlen = sizeof(uint32_t);  /* start with length of msgcode */
        int ret;

        /* check if we really have all the request data needed */
        if (job->request_len < cmdlen) {
    5d60:       a7 c4 00 48             jle     5df0 <fc_bsg_dispatch+0x1f0>
                ret = -ENOMSG;
                goto fail_host_msg;
        }

        /* Validate the host command */
        switch (bsg_request->msgcode) {
    5d64:       58 20 10 00             l       %r2,0(%r1)
    5d68:       c2 2f 80 00 00 03       clfi    %r2,2147483651
    5d6e:       a7 84 00 1a             je      5da2 <fc_bsg_dispatch+0x1a2>
    5d72:       a7 24 00 0a             jh      5d86 <fc_bsg_dispatch+0x186>
    5d76:       c0 19 80 00 00 01       iilf    %r1,2147483649
    5d7c:       ec 21 00 2b a0 77       clrj    %r2,%r1,10,5dd2 <fc_bsg_dispatch+0x1d2>
    5d82:       a7 f4 00 3b             j       5df8 <fc_bsg_dispatch+0x1f8>
    5d86:       c0 59 80 00 00 04       iilf    %r5,2147483652
    5d8c:       ec 25 00 0b 80 76       crj     %r2,%r5,8,5da2 <fc_bsg_dispatch+0x1a2>
    5d92:       c0 59 80 00 00 ff       iilf    %r5,2147483903
    5d98:       ec 25 00 13 80 76       crj     %r2,%r5,8,5dbe <fc_bsg_dispatch+0x1be>
    5d9e:       a7 f4 00 2d             j       5df8 <fc_bsg_dispatch+0x1f8>


Thanks,
	Johannes
Johannes Thumshirn - Nov. 15, 2016, 11:56 a.m.
On Tue, Oct 25, 2016 at 09:43:14AM +0200, Johannes Thumshirn wrote:
> On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote:
> > On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote:
> > > Hm, still behaves for me like I reported for v2:
> > > http://marc.info/?l=linux-scsi&m=147637177902937&w=2

[...]

> > 
> > The rational behind this is, in fc_req_to_bsgjob() we're assigning
> > job->request as req->cmd and job->request_len = req->cmd_len. But without
> > checkinf job->request_len we don't know whether we're save to touch
> > job->request (a.k.a. bsg_request).
> 
> Hi Steffen,
> Did you have any chance testing this? I hacked fcping to work with non-FCoE
> and rports as well and tested with FCoE and lpfc. No problems seen from my
> side. I've also pused the series (With this change folded in) to my git 
> tree at [1] if this helps you in any way.
> 
> [1] https://git.kernel.org/cgit/linux/kernel/git/jth/linux.git/log/?h=scsi-bsg-rewrite-v4
> 

So I finally have a test system up and running. I have good and bad news. The
good news is, I can't get the system crashing with my patches, the bad news is
I can't get zfcp_ping and zfcp_show to output something but HBA_STATUS_ERROR
with my patches and without.

And btw, I renamed the branch to fc-bsg-rewrite-v4 in case you want to clone
from it (it has patch 2/16 changed to the v3 submission).

Can you please have a look with your setup?

Thanks,
	Johannes
Johannes Thumshirn - Nov. 15, 2016, 2:48 p.m.
On Tue, Nov 15, 2016 at 03:31:27PM +0100, Steffen Maier wrote:
> Hi Johannes,
> 
> On 11/15/2016 12:56 PM, Johannes Thumshirn wrote:
> > On Tue, Oct 25, 2016 at 09:43:14AM +0200, Johannes Thumshirn wrote:
> > > On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote:
> > > > On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote:
> > > > > Hm, still behaves for me like I reported for v2:
> > > > > http://marc.info/?l=linux-scsi&m=147637177902937&w=2
> > 
> > [...]
> > 
> > > > 
> > > > The rational behind this is, in fc_req_to_bsgjob() we're assigning
> > > > job->request as req->cmd and job->request_len = req->cmd_len. But without
> > > > checkinf job->request_len we don't know whether we're save to touch
> > > > job->request (a.k.a. bsg_request).
> > > 
> > > Hi Steffen,
> > > Did you have any chance testing this? I hacked fcping to work with non-FCoE
> > > and rports as well and tested with FCoE and lpfc. No problems seen from my
> > > side. I've also pused the series (With this change folded in) to my git
> > > tree at [1] if this helps you in any way.
> > > 
> > > [1] https://git.kernel.org/cgit/linux/kernel/git/jth/linux.git/log/?h=scsi-bsg-rewrite-v4
> > > 
> > 
> > So I finally have a test system up and running. I have good and bad news. The
> > good news is, I can't get the system crashing with my patches, the bad news is
> > I can't get zfcp_ping and zfcp_show to output something but HBA_STATUS_ERROR
> > with my patches and without.
> 
> Assuming you run the latest package version on s390x:
> 
> Do steps 2 and 3 of the procedure in
> http://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.lhdd/lhdd_t_fcp_api_runappl.html
> help?

No, I do have a correct /etc/hba.conf:
$ grep libzfcphbaapi /etc/hba.conf 
com.ibm.libzfcphbaapi /usr/lib64/libzfcphbaapi.so.0
and bumping the log level via export LIB_ZFCP_HBAAPI_LOG_LEVEL=1 just adds 
'(vlib.c:105)_initvlib: libzfcphbaapi.so loaded at Nov 15 15:43:23' to the 1st
line of zfcp_ping output.

Here's the full output:
$ zfcp_ping -v -a 0.0.fc00 -d 0x50050763051b473a
(vlib.c:105)_initvlib: libzfcphbaapi.so loaded at Nov 15 15:45:33
Sending PNG from BUS_ID=0.0.fc00 WWPN=0xc05076e0f3002344 ID=0x760c1c
dev=/dev/bsg/fc_host1 speed=8 GBit/s
--- REQUEST cmd = 0x0401 ---
03 00 00 00 fa 01 00 00 
04 01 00 01 00 00 00 00 
00 00 00 01 00 02 00 08 
50 05 07 63 05 1b 47 3a 
00 00 00 00 
--- RESPONSE rc = 0x1 ---
00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 
00 00 00 00 
--- REQUEST cmd = 0x0401 ---
03 00 00 00 fa 01 00 00 
04 01 00 01 00 00 00 00 
00 00 00 01 00 02 00 08 
50 05 07 63 05 1b 47 3a 
00 00 00 00 
--- RESPONSE rc = 0x1 ---
00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 
00 00 00 00 
--- REQUEST cmd = 0x0401 ---
03 00 00 00 fa 01 00 00 
04 01 00 01 00 00 00 00 
00 00 00 01 00 02 00 08 
50 05 07 63 05 1b 47 3a 
00 00 00 00 
--- RESPONSE rc = 0x1 ---
00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 
00 00 00 00 
Error received for FPNG request, aborting.

---------- ping statistics -----------
min/avg/max = 0.000/0.000/0.000 ms
--------------------------------------

I agree this looks a bit suspicious and I'll add some debugs in the kernel to
see what's going on.

> 
> The only other thing I can think of from the top of my head is that BSG
> ioctls are sensitive regarding ABI and I once had the kernel ioctl return
> EINVAL due to unmatching kernel-headers and libzfcphbaapi maps this EINVAL
> to HBA_STATUS_ERROR because there is no more specifically suitable HBA
> constant [old SUSE bugs 834498 and 834500].
> 
> > And btw, I renamed the branch to fc-bsg-rewrite-v4 in case you want to clone
> > from it (it has patch 2/16 changed to the v3 submission).
> > 
> > Can you please have a look with your setup?
> 
> I'm going to re-test hopefully within the next few days.

That'll be great, thanks.

Byte,
	Johannes
Johannes Thumshirn - Nov. 15, 2016, 3:39 p.m.
On Tue, Nov 15, 2016 at 03:31:27PM +0100, Steffen Maier wrote:
> Hi Johannes,
> 
> On 11/15/2016 12:56 PM, Johannes Thumshirn wrote:
> > On Tue, Oct 25, 2016 at 09:43:14AM +0200, Johannes Thumshirn wrote:
> > > On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote:
> > > > On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote:
> > > > > Hm, still behaves for me like I reported for v2:
> > > > > http://marc.info/?l=linux-scsi&m=147637177902937&w=2
> > 
> > [...]
> > 
> > > > 
> > > > The rational behind this is, in fc_req_to_bsgjob() we're assigning
> > > > job->request as req->cmd and job->request_len = req->cmd_len. But without
> > > > checkinf job->request_len we don't know whether we're save to touch
> > > > job->request (a.k.a. bsg_request).
> > > 
> > > Hi Steffen,
> > > Did you have any chance testing this? I hacked fcping to work with non-FCoE
> > > and rports as well and tested with FCoE and lpfc. No problems seen from my
> > > side. I've also pused the series (With this change folded in) to my git
> > > tree at [1] if this helps you in any way.
> > > 
> > > [1] https://git.kernel.org/cgit/linux/kernel/git/jth/linux.git/log/?h=scsi-bsg-rewrite-v4
> > > 
> > 
> > So I finally have a test system up and running. I have good and bad news. The
> > good news is, I can't get the system crashing with my patches, the bad news is
> > I can't get zfcp_ping and zfcp_show to output something but HBA_STATUS_ERROR
> > with my patches and without.

Please ignore my last mails, apparently it's a wise idea to check which user
id one has before running zfcp_ping...

The good news for this is, I can now recreate the crashes you have and thus
have a chance to fix them :-)

Byte,
	Johannes
Johannes Thumshirn - Nov. 16, 2016, 11:35 a.m.
On Tue, Nov 15, 2016 at 04:39:33PM +0100, Johannes Thumshirn wrote:
> On Tue, Nov 15, 2016 at 03:31:27PM +0100, Steffen Maier wrote:
> > Hi Johannes,
> > 
> > On 11/15/2016 12:56 PM, Johannes Thumshirn wrote:
> > > On Tue, Oct 25, 2016 at 09:43:14AM +0200, Johannes Thumshirn wrote:
> > > > On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote:
> > > > > On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote:
> > > > > > Hm, still behaves for me like I reported for v2:
> > > > > > http://marc.info/?l=linux-scsi&m=147637177902937&w=2
> > > 
> > > [...]
> > > 
> > > > > 
> > > > > The rational behind this is, in fc_req_to_bsgjob() we're assigning
> > > > > job->request as req->cmd and job->request_len = req->cmd_len. But without
> > > > > checkinf job->request_len we don't know whether we're save to touch
> > > > > job->request (a.k.a. bsg_request).
> > > > 
> > > > Hi Steffen,
> > > > Did you have any chance testing this? I hacked fcping to work with non-FCoE
> > > > and rports as well and tested with FCoE and lpfc. No problems seen from my
> > > > side. I've also pused the series (With this change folded in) to my git
> > > > tree at [1] if this helps you in any way.
> > > > 
> > > > [1] https://git.kernel.org/cgit/linux/kernel/git/jth/linux.git/log/?h=scsi-bsg-rewrite-v4
> > > > 
> > > 
> > > So I finally have a test system up and running. I have good and bad news. The
> > > good news is, I can't get the system crashing with my patches, the bad news is
> > > I can't get zfcp_ping and zfcp_show to output something but HBA_STATUS_ERROR
> > > with my patches and without.
> 
> Please ignore my last mails, apparently it's a wise idea to check which user
> id one has before running zfcp_ping...
> 
> The good news for this is, I can now recreate the crashes you have and thus
> have a chance to fix them :-)

So JFTR, I was able to fix the 1 problem introduced by this patch, it's the
follwoing hunk:
@@ -3726,9 +3729,9 @@ fc_req_to_bsgjob(struct Scsi_Host *shost, struct fc_rport *rport,
        if (i->f->dd_bsg_size)
                job->dd_data = (void *)&job[1];
        spin_lock_init(&job->job_lock);
-       job->request = (struct fc_bsg_request *)req->cmd;
+       bsg_request = (struct fc_bsg_request *)req->cmd;
        job->request_len = req->cmd_len;
-       job->reply = req->sense;
+       bsg_reply = req->sense;
        job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
                                                 * allocated */
        if (req->bio) {

But as fc_req_to_bsgjob() get's deleted in Patch 15/16 the problem is
re-introduced. Unfortunately the fix isn't as trivial as for 2/16 so I'm still
trying to nail it down.

Thanks,
	Johannes

Patch

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 4149dac..baebaab 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3786,6 +3786,12 @@  enum fc_dispatch_result {
 	int cmdlen = sizeof(uint32_t);	/* start with length of msgcode */
 	int ret;
 
+	/* check if we really have all the request data needed */
+	if (job->request_len < cmdlen) {
+		ret = -ENOMSG;
+		goto fail_host_msg;
+	}
+
 	/* Validate the host command */
 	switch (bsg_request->msgcode) {
 	case FC_BSG_HST_ADD_RPORT:
@@ -3831,12 +3837,6 @@  enum fc_dispatch_result {
 		goto fail_host_msg;
 	}
 
-	/* check if we really have all the request data needed */
-	if (job->request_len < cmdlen) {
-		ret = -ENOMSG;
-		goto fail_host_msg;
-	}
-
 	ret = i->f->bsg_request(job);
 	if (!ret)
 		return FC_DISPATCH_UNLOCKED;
@@ -3887,6 +3887,12 @@  enum fc_dispatch_result {
 	int cmdlen = sizeof(uint32_t);	/* start with length of msgcode */
 	int ret;
 
+	/* check if we really have all the request data needed */
+	if (job->request_len < cmdlen) {
+		ret = -ENOMSG;
+		goto fail_rport_msg;
+	}
+
 	/* Validate the rport command */
 	switch (bsg_request->msgcode) {
 	case FC_BSG_RPT_ELS: