Patchwork [Open-FCoE,RFC] libfcoe: Make fcoe_sysfs optional / fix fnic NULL exception

login
register
mail settings
Submitter Robert Love
Date Sept. 4, 2013, 11:47 p.m.
Message ID <20130904234727.6982.11442.stgit@fritz>
Download mbox | patch
Permalink /patch/104/
State Accepted
Headers show

Comments

Robert Love - Sept. 4, 2013, 11:47 p.m.
fnic doesn't use any of the create/destroy/enable/disable interfaces
either from the (legacy) module paramaters or the (new) fcoe_sysfs
interfaces. When fcoe_sysfs was introduced fnic wasn't changed since
it wasn't using the interfaces. libfcoe incorrectly assumed that that
all of its users were using fcoe_sysfs and when adding and deleting
FCFs would assume the existance of a fcoe_ctlr_device. fnic was not
allocating this structure because it doesn't care about the standard
user interfaces (fnic starts on link only). If/When libfcoe tried to use
the fcoe_ctlr_device's lock for the first time a NULL pointer exception
would be triggered.

Since fnic doesn't care about sysfs or user interfaces, the solution
is to drop libfcoe's assumption that all drivers are using fcoe_sysfs.

This patch accomplishes this by changing some of the structure
relationships.

We need a way to determine when a LLD is using fcoe_sysfs or not and
we can do that by checking for the existance of the fcoe_ctlr_device.
Prior to this patch, it was assumed that the fcoe_ctlr structure was
allocated with the fcoe_ctlr_device and immediately followed it in
memory. To reach the fcoe_ctlr_device we would simply go back in memory
from the fcoe_ctlr to get the fcoe_ctlr_device.

Since fnic doesn't allocate the fcoe_ctlr_device, we cannot keep that
assumption. This patch adds a pointer from the fcoe_ctlr to the
fcoe_ctlr_device. For bnx2fc and fcoe we will continue to allocate the
two structures together, but then we'll set the ctlr->cdev pointer
to point at the fcoe_ctlr_device. fnic will not change and will continue
to allocate the fcoe_ctlr itself, and ctlr->cdev will remain NULL.

When libfcoe adds fcoe_fcf's to the fcoe_ctlr it will check if ctlr->cdev
is set and only if so will it continue to interact with fcoe_sysfs.

Cc: Hiral Patel <hiralpat@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
---
 drivers/scsi/bnx2fc/bnx2fc_fcoe.c |    1 
 drivers/scsi/fcoe/fcoe.c          |    1 
 drivers/scsi/fcoe/fcoe_ctlr.c     |   93 +++++++++++++++++++++++++------------
 include/scsi/libfcoe.h            |    7 ++-
 4 files changed, 70 insertions(+), 32 deletions(-)
Neil Horman - Sept. 5, 2013, 12:45 p.m.
On Wed, Sep 04, 2013 at 04:47:27PM -0700, Robert Love wrote:
> fnic doesn't use any of the create/destroy/enable/disable interfaces
> either from the (legacy) module paramaters or the (new) fcoe_sysfs
> interfaces. When fcoe_sysfs was introduced fnic wasn't changed since
> it wasn't using the interfaces. libfcoe incorrectly assumed that that
> all of its users were using fcoe_sysfs and when adding and deleting
> FCFs would assume the existance of a fcoe_ctlr_device. fnic was not
> allocating this structure because it doesn't care about the standard
> user interfaces (fnic starts on link only). If/When libfcoe tried to use
> the fcoe_ctlr_device's lock for the first time a NULL pointer exception
> would be triggered.
> 
> Since fnic doesn't care about sysfs or user interfaces, the solution
> is to drop libfcoe's assumption that all drivers are using fcoe_sysfs.
> 
> This patch accomplishes this by changing some of the structure
> relationships.
> 
> We need a way to determine when a LLD is using fcoe_sysfs or not and
> we can do that by checking for the existance of the fcoe_ctlr_device.
> Prior to this patch, it was assumed that the fcoe_ctlr structure was
> allocated with the fcoe_ctlr_device and immediately followed it in
> memory. To reach the fcoe_ctlr_device we would simply go back in memory
> from the fcoe_ctlr to get the fcoe_ctlr_device.
> 
> Since fnic doesn't allocate the fcoe_ctlr_device, we cannot keep that
> assumption. This patch adds a pointer from the fcoe_ctlr to the
> fcoe_ctlr_device. For bnx2fc and fcoe we will continue to allocate the
> two structures together, but then we'll set the ctlr->cdev pointer
> to point at the fcoe_ctlr_device. fnic will not change and will continue
> to allocate the fcoe_ctlr itself, and ctlr->cdev will remain NULL.
> 
> When libfcoe adds fcoe_fcf's to the fcoe_ctlr it will check if ctlr->cdev
> is set and only if so will it continue to interact with fcoe_sysfs.
> 
> Cc: Hiral Patel <hiralpat@cisco.com>
> Signed-off-by: Robert Love <robert.w.love@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Neil Horman - Oct. 7, 2013, 1:42 p.m.
On Thu, Sep 05, 2013 at 07:47:27AM -0000, Robert Love wrote:
> fnic doesn't use any of the create/destroy/enable/disable interfaces
> either from the (legacy) module paramaters or the (new) fcoe_sysfs
> interfaces. When fcoe_sysfs was introduced fnic wasn't changed since
> it wasn't using the interfaces. libfcoe incorrectly assumed that that
> all of its users were using fcoe_sysfs and when adding and deleting
> FCFs would assume the existance of a fcoe_ctlr_device. fnic was not
> allocating this structure because it doesn't care about the standard
> user interfaces (fnic starts on link only). If/When libfcoe tried to use
> the fcoe_ctlr_device's lock for the first time a NULL pointer exception
> would be triggered.
> 
> Since fnic doesn't care about sysfs or user interfaces, the solution
> is to drop libfcoe's assumption that all drivers are using fcoe_sysfs.
> 
> This patch accomplishes this by changing some of the structure
> relationships.
> 
> We need a way to determine when a LLD is using fcoe_sysfs or not and
> we can do that by checking for the existance of the fcoe_ctlr_device.
> Prior to this patch, it was assumed that the fcoe_ctlr structure was
> allocated with the fcoe_ctlr_device and immediately followed it in
> memory. To reach the fcoe_ctlr_device we would simply go back in memory
> from the fcoe_ctlr to get the fcoe_ctlr_device.
> 
> Since fnic doesn't allocate the fcoe_ctlr_device, we cannot keep that
> assumption. This patch adds a pointer from the fcoe_ctlr to the
> fcoe_ctlr_device. For bnx2fc and fcoe we will continue to allocate the
> two structures together, but then we'll set the ctlr->cdev pointer
> to point at the fcoe_ctlr_device. fnic will not change and will continue
> to allocate the fcoe_ctlr itself, and ctlr->cdev will remain NULL.
> 
> When libfcoe adds fcoe_fcf's to the fcoe_ctlr it will check if ctlr->cdev
> is set and only if so will it continue to interact with fcoe_sysfs.
> 
> Cc: Hiral Patel <hiralpat@cisco.com>
> Signed-off-by: Robert Love <robert.w.love@intel.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> 
Rob, whats the state of this?  I'm getting some questions about it, and I
noticed that you proposed it, I acked it, but its been sitting in NEW state in
patchwork since the start of september.  Are you planning on integrating it
still?  If not, what was the alternate fix?

Best
Neil
Robert Love - Oct. 7, 2013, 5:20 p.m.
> From: Neil Horman [mailto:nhorman@redhat.com]
> Sent: Monday, October 07, 2013 6:33 AM
> To: Love, Robert W
> Cc: fcoe-devel@open-fcoe.org; hiralpat@cisco.com; hare@suse.com
> Subject: Re: [Open-FCoE, RFC] libfcoe: Make fcoe_sysfs optional / fix fnic
> NULL exception
> 
> On Thu, Sep 05, 2013 at 07:47:27AM -0000, Robert Love wrote:
> > fnic doesn't use any of the create/destroy/enable/disable interfaces
> > either from the (legacy) module paramaters or the (new) fcoe_sysfs
> > interfaces. When fcoe_sysfs was introduced fnic wasn't changed since
> > it wasn't using the interfaces. libfcoe incorrectly assumed that that
> > all of its users were using fcoe_sysfs and when adding and deleting
> > FCFs would assume the existance of a fcoe_ctlr_device. fnic was not
> > allocating this structure because it doesn't care about the standard
> > user interfaces (fnic starts on link only). If/When libfcoe tried to
> > use the fcoe_ctlr_device's lock for the first time a NULL pointer
> > exception would be triggered.
> >
> > Since fnic doesn't care about sysfs or user interfaces, the solution
> > is to drop libfcoe's assumption that all drivers are using fcoe_sysfs.
> >
> > This patch accomplishes this by changing some of the structure
> > relationships.
> >
> > We need a way to determine when a LLD is using fcoe_sysfs or not and
> > we can do that by checking for the existance of the fcoe_ctlr_device.
> > Prior to this patch, it was assumed that the fcoe_ctlr structure was
> > allocated with the fcoe_ctlr_device and immediately followed it in
> > memory. To reach the fcoe_ctlr_device we would simply go back in
> > memory from the fcoe_ctlr to get the fcoe_ctlr_device.
> >
> > Since fnic doesn't allocate the fcoe_ctlr_device, we cannot keep that
> > assumption. This patch adds a pointer from the fcoe_ctlr to the
> > fcoe_ctlr_device. For bnx2fc and fcoe we will continue to allocate the
> > two structures together, but then we'll set the ctlr->cdev pointer to
> > point at the fcoe_ctlr_device. fnic will not change and will continue
> > to allocate the fcoe_ctlr itself, and ctlr->cdev will remain NULL.
> >
> > When libfcoe adds fcoe_fcf's to the fcoe_ctlr it will check if
> > ctlr->cdev is set and only if so will it continue to interact with fcoe_sysfs.
> >
> > Cc: Hiral Patel <hiralpat@cisco.com>
> > Signed-off-by: Robert Love <robert.w.love@intel.com>
> > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> >
> Rob-
>      Whats going on with this patch?  I just noticed its been sitting in patchwork
> for a month, and its still in the new state.  I acked it, but still don't see it in the
> fcoe tree.  Are you still planning on merging it?
> 

I was hoping the fnic maintainers would test it, and send a Tested-by line.
I can't reproduce the problem and I don't know if this is all that is needed to
get fnic working again. Do you think I should move it forward on the strength
of your Ack alone?

If people really want it to move forward without testing, I don't necessarily
have a problem with that either. 

Thanks, //Rob
Hiral Patel (hiralpat) - Oct. 7, 2013, 6:33 p.m.
On 9/5/13 5:45 AM, "Neil Horman" <nhorman@tuxdriver.com> wrote:

>On Wed, Sep 04, 2013 at 04:47:27PM -0700, Robert Love wrote:
>> fnic doesn't use any of the create/destroy/enable/disable interfaces
>> either from the (legacy) module paramaters or the (new) fcoe_sysfs
>> interfaces. When fcoe_sysfs was introduced fnic wasn't changed since
>> it wasn't using the interfaces. libfcoe incorrectly assumed that that
>> all of its users were using fcoe_sysfs and when adding and deleting
>> FCFs would assume the existance of a fcoe_ctlr_device. fnic was not
>> allocating this structure because it doesn't care about the standard
>> user interfaces (fnic starts on link only). If/When libfcoe tried to use
>> the fcoe_ctlr_device's lock for the first time a NULL pointer exception
>> would be triggered.
>> 
>> Since fnic doesn't care about sysfs or user interfaces, the solution
>> is to drop libfcoe's assumption that all drivers are using fcoe_sysfs.
>> 
>> This patch accomplishes this by changing some of the structure
>> relationships.
>> 
>> We need a way to determine when a LLD is using fcoe_sysfs or not and
>> we can do that by checking for the existance of the fcoe_ctlr_device.
>> Prior to this patch, it was assumed that the fcoe_ctlr structure was
>> allocated with the fcoe_ctlr_device and immediately followed it in
>> memory. To reach the fcoe_ctlr_device we would simply go back in memory
>> from the fcoe_ctlr to get the fcoe_ctlr_device.
>> 
>> Since fnic doesn't allocate the fcoe_ctlr_device, we cannot keep that
>> assumption. This patch adds a pointer from the fcoe_ctlr to the
>> fcoe_ctlr_device. For bnx2fc and fcoe we will continue to allocate the
>> two structures together, but then we'll set the ctlr->cdev pointer
>> to point at the fcoe_ctlr_device. fnic will not change and will continue
>> to allocate the fcoe_ctlr itself, and ctlr->cdev will remain NULL.
>> 
>> When libfcoe adds fcoe_fcf's to the fcoe_ctlr it will check if
>>ctlr->cdev
>> is set and only if so will it continue to interact with fcoe_sysfs.
>> 
>> Cc: Hiral Patel <hiralpat@cisco.com>
>> Signed-off-by: Robert Love <robert.w.love@intel.com>
>Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Hiral Patel <hiralpat@cisco.com>

Patch

diff --git a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
index 69ac554..27f2cc4 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
@@ -1381,6 +1381,7 @@  struct bnx2fc_interface *bnx2fc_interface_create(struct bnx2fc_hba *hba,
 		return NULL;
 	}
 	ctlr = fcoe_ctlr_device_priv(ctlr_dev);
+	ctlr->cdev = ctlr_dev;
 	interface = fcoe_ctlr_priv(ctlr);
 	dev_hold(netdev);
 	kref_init(&interface->kref);
diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
index 07453bb..e142992 100644
--- a/drivers/scsi/fcoe/fcoe.c
+++ b/drivers/scsi/fcoe/fcoe.c
@@ -408,6 +408,7 @@  static struct fcoe_interface *fcoe_interface_create(struct net_device *netdev,
 	}
 
 	ctlr = fcoe_ctlr_device_priv(ctlr_dev);
+	ctlr->cdev = ctlr_dev;
 	fcoe = fcoe_ctlr_priv(ctlr);
 
 	dev_hold(netdev);
diff --git a/drivers/scsi/fcoe/fcoe_ctlr.c b/drivers/scsi/fcoe/fcoe_ctlr.c
index 203415e..e230101 100644
--- a/drivers/scsi/fcoe/fcoe_ctlr.c
+++ b/drivers/scsi/fcoe/fcoe_ctlr.c
@@ -160,18 +160,22 @@  void fcoe_ctlr_init(struct fcoe_ctlr *fip, enum fip_state mode)
 }
 EXPORT_SYMBOL(fcoe_ctlr_init);
 
+/**
+ * fcoe_sysfs_fcf_add() - Add a fcoe_fcf{,_device} to a fcoe_ctlr{,_device}
+ * @new: The newly discovered FCF
+ *
+ * Called with fip->ctlr_mutex held
+ */
 static int fcoe_sysfs_fcf_add(struct fcoe_fcf *new)
 {
 	struct fcoe_ctlr *fip = new->fip;
-	struct fcoe_ctlr_device *ctlr_dev = fcoe_ctlr_to_ctlr_dev(fip);
+	struct fcoe_ctlr_device *ctlr_dev;
 	struct fcoe_fcf_device temp, *fcf_dev;
 	int rc = 0;
 
 	LIBFCOE_FIP_DBG(fip, "New FCF fab %16.16llx mac %pM\n",
 			new->fabric_name, new->fcf_mac);
 
-	mutex_lock(&ctlr_dev->lock);
-
 	temp.fabric_name = new->fabric_name;
 	temp.switch_name = new->switch_name;
 	temp.fc_map = new->fc_map;
@@ -181,53 +185,80 @@  static int fcoe_sysfs_fcf_add(struct fcoe_fcf *new)
 	temp.fka_period = new->fka_period;
 	temp.selected = 0; /* default to unselected */
 
-	fcf_dev = fcoe_fcf_device_add(ctlr_dev, &temp);
-	if (unlikely(!fcf_dev)) {
-		rc = -ENOMEM;
-		goto out;
-	}
-
 	/*
-	 * The fcoe_sysfs layer can return a CONNECTED fcf that
-	 * has a priv (fcf was never deleted) or a CONNECTED fcf
-	 * that doesn't have a priv (fcf was deleted). However,
-	 * libfcoe will always delete FCFs before trying to add
-	 * them. This is ensured because both recv_adv and
-	 * age_fcfs are protected by the the fcoe_ctlr's mutex.
-	 * This means that we should never get a FCF with a
-	 * non-NULL priv pointer.
+	 * If ctlr_dev doesn't exist then it means we're a libfcoe user
+	 * who doesn't use fcoe_syfs and didn't allocate a fcoe_ctlr_device.
+	 * fnic would be an example of a driver with this behavior. In this
+	 * case we want to add the fcoe_fcf to the fcoe_ctlr list, but we
+	 * don't want to make sysfs changes.
 	 */
-	BUG_ON(fcf_dev->priv);
+	ctlr_dev = fcoe_ctlr_to_ctlr_dev(fip);
+	if (ctlr_dev) {
+		mutex_lock(&ctlr_dev->lock);
+		fcf_dev = fcoe_fcf_device_add(ctlr_dev, &temp);
+		if (unlikely(!fcf_dev)) {
+			rc = -ENOMEM;
+			goto out;
+		}
+
+		/*
+		 * The fcoe_sysfs layer can return a CONNECTED fcf that
+		 * has a priv (fcf was never deleted) or a CONNECTED fcf
+		 * that doesn't have a priv (fcf was deleted). However,
+		 * libfcoe will always delete FCFs before trying to add
+		 * them. This is ensured because both recv_adv and
+		 * age_fcfs are protected by the the fcoe_ctlr's mutex.
+		 * This means that we should never get a FCF with a
+		 * non-NULL priv pointer.
+		 */
+		BUG_ON(fcf_dev->priv);
 
-	fcf_dev->priv = new;
-	new->fcf_dev = fcf_dev;
+		fcf_dev->priv = new;
+		new->fcf_dev = fcf_dev;
+		mutex_unlock(&ctlr_dev->lock);
+	}
 
 	list_add(&new->list, &fip->fcfs);
 	fip->fcf_count++;
 
 out:
-	mutex_unlock(&ctlr_dev->lock);
 	return rc;
 }
 
+/**
+ * fcoe_sysfs_fcf_del() - Remove a fcoe_fcf{,_device} to a fcoe_ctlr{,_device}
+ * @new: The FCF to be removed
+ *
+ * Called with fip->ctlr_mutex held
+ */
 static void fcoe_sysfs_fcf_del(struct fcoe_fcf *new)
 {
 	struct fcoe_ctlr *fip = new->fip;
-	struct fcoe_ctlr_device *ctlr_dev = fcoe_ctlr_to_ctlr_dev(fip);
+	struct fcoe_ctlr_device *cdev;
 	struct fcoe_fcf_device *fcf_dev;
 
 	list_del(&new->list);
 	fip->fcf_count--;
 
-	mutex_lock(&ctlr_dev->lock);
-
-	fcf_dev = fcoe_fcf_to_fcf_dev(new);
-	WARN_ON(!fcf_dev);
-	new->fcf_dev = NULL;
-	fcoe_fcf_device_delete(fcf_dev);
-	kfree(new);
-
-	mutex_unlock(&ctlr_dev->lock);
+	/*
+	 * If ctlr_dev doesn't exist then it means we're a libfcoe user
+	 * who doesn't use fcoe_syfs and didn't allocate a fcoe_ctlr_device
+	 * or a fcoe_fcf_device.
+	 *
+	 * fnic would be an example of a driver with this behavior. In this
+	 * case we want to remove the fcoe_fcf from the fcoe_ctlr list (above),
+	 * but we don't want to make sysfs changes.
+	 */
+	cdev = fcoe_ctlr_to_ctlr_dev(fip);
+	if (cdev) {
+		mutex_lock(&cdev->lock);
+		fcf_dev = fcoe_fcf_to_fcf_dev(new);
+		WARN_ON(!fcf_dev);
+		new->fcf_dev = NULL;
+		fcoe_fcf_device_delete(fcf_dev);
+		kfree(new);
+		mutex_unlock(&cdev->lock);
+	}
 }
 
 /**
diff --git a/include/scsi/libfcoe.h b/include/scsi/libfcoe.h
index 4427393..de7e3ee 100644
--- a/include/scsi/libfcoe.h
+++ b/include/scsi/libfcoe.h
@@ -90,6 +90,7 @@  enum fip_state {
  * @lp:		   &fc_lport: libfc local port.
  * @sel_fcf:	   currently selected FCF, or NULL.
  * @fcfs:	   list of discovered FCFs.
+ * @cdev:          (Optional) pointer to sysfs fcoe_ctlr_device.
  * @fcf_count:	   number of discovered FCF entries.
  * @sol_time:	   time when a multicast solicitation was last sent.
  * @sel_time:	   time after which to select an FCF.
@@ -127,6 +128,7 @@  struct fcoe_ctlr {
 	struct fc_lport *lp;
 	struct fcoe_fcf *sel_fcf;
 	struct list_head fcfs;
+	struct fcoe_ctlr_device *cdev;
 	u16 fcf_count;
 	unsigned long sol_time;
 	unsigned long sel_time;
@@ -168,8 +170,11 @@  static inline void *fcoe_ctlr_priv(const struct fcoe_ctlr *ctlr)
 	return (void *)(ctlr + 1);
 }
 
+/*
+ * This assumes that the fcoe_ctlr (x) is allocated with the fcoe_ctlr_device.
+ */
 #define fcoe_ctlr_to_ctlr_dev(x)					\
-	(struct fcoe_ctlr_device *)(((struct fcoe_ctlr_device *)(x)) - 1)
+	(x)->cdev
 
 /**
  * struct fcoe_fcf - Fibre-Channel Forwarder