[nfsv4] draft notes from pNFS meeting held in Vancouver

Mike Eisler <email2mre-ietf@yahoo.com> Wed, 05 December 2007 15:55 UTC

(these notes are draft. please send me any corrections)

Date of Meeting: December 4, 2007
Where: Vancouver
Purpose: to make headway on the pile of pNFS issues

Attendees

- Andy Adamson - CITI
- Mike Eisler - NetApp
- Sam Falkner - Sun
- Jason Glasgow - EMC
- Garth Goodson - NetApp
- Robert Gordon - Sun
- Suchit Kaura
- Trond Myklebust - NetApp
- Anatoly Pinchuk
- Spencer Shepler - Sun
- Renu Tewari - IBM
- Brent Welch - Panasas

Phone

- Marc Eschel - IBM
- Benny Halevy - Panasas
- Dean Hildebrand - IBM
- Rahul Iyer - NetApp
- Lisa Week - Sun

Major Issues

- Recall processing and referential items to fore operations
-- refers to language in pNFS and sessions about how
   callbacks refer to fore operations

- Sessions major/minor and impact on pNFS
- persistent sessions usage (required or not)
- layout hint usage (OPEN and/or SETATTR)
- Overall model of device map stateids (one per major
  device) needs to be explained (dependence on GETDEVICELIST)

-- filehandle to be used for recall
-- Recall by FSID, ALL, etc.

- Layout stateids (one per clientID/layout-type/filehandle) need
  a description

- Relationship of open mode and mode of layout needs
  description in LAYOUTGET

- Layout ranges and accounting: strict or forgetful
-- Layout return not being complete until all the range
   is returned

- LAYOUTCOMMIT and access-time
- GETDEVICELIST and device map stateids
- Obtaining the device list for a major id
- sparse versus dense layouts
- granularity of device id for the file layouts
- layoutcommit vs. commit

------------------------------------------------------------

Decided to deal with device ID issues first.

Spencer: issues with device ID mapping are around recovery
in the event of server restart. How do clients and
servers discover inconsistencies? The general idea was to use
stateids, leases, etc. to deal with issues. For layouts,
this is solved, but for deviceIDs, the issue was how to
recall individual deviceID mappings without recalling
layouts. We also changed device IDs to 64 bits (from 32
bits). We also wanted to deal with recalling deviceIDs
associated with a particular fsid without perturbing the
layouts. This led to major and minor devices. The server
could align major deviceIDs to fsids to get the desired
effect. We are now looking at 128 bit deviceIDs.

Tom: If we have something as wide as a UUID, why not call
it a UUID?

Andy: Don't we have to fence off I/O when recalling device
IDs? If an I/O is in progress to one data server address,
and I change the device ID to another data server, how is
this different from a layout recall?

Andy: Who wants this feature? Blocks and objects don't
need it; files don't need it?

Brent: if you have a model where device ID is 1-1 to the
layout, then the device ID recall is superfluous.

Jason: If the IP address changes, you should be able to
recall the device ID without the layout.

(Editor: this was the aha moment.)

Jason: The clearest thing is to support the recall of a
device ID.

Brent: We would like to be able to recall device IDs to handle
the IP address case.

Brent: Do we need major/minor device IDs?

Garth: I see the need to recall major device IDs to deal
with the case where all dev mappings for an fsid are affected.

Mike: We have two methods for breaking device ID to device
address mappings.

Brent: Notify vs. recall: Notify is for optimizing GETATTRs
in the directory case. Recalling deviceIDs should be rare.

Mike: My rationale for notifications was to avoid having
zillions of device IDs.

Suchit: the most common use case is to update a device
address for device ID. An update notification is the
least intrusive.

Trond: stateids only make sense if you are going to recall
by stateid.

Sam: Would it make sense to state that recall of layouts
by fsid does not invalidate deviceIDs?

Garth/Sam: What does a notify delete callback response
mean: that the client got the notify, or that it deleted the
mapping?

Jason: We need to be very clear: does deleting a device ID
mapping mean the client's I/O is drained?

Trond: You should be able to fence the client.

Andy: Here the client is sending I/O to a device. It gets a
device ID delete notify, and the client's data falls on
the floor.

Brent: It's actually a benefit, because if the mapping is
deleted, the client gets told that writing to the device
address is futile.

Garth: If you get a recall of a device ID, the server
should allow outstanding I/O to be flushed to completion
before the recall completes.

Trond: But the server should be allowed to fence
the client.

Spencer: Recall processing. There is a fundamental issue:
how long is a client allowed to process the callback? In
NFSv4.0, the client could acknowledge the recall, and then do
delegreturn. In NFSv4.1 we seem to have added the notion
that a client gets a device ID recall, and the client can
wait for I/O to flush before acknowledging the recall.

Mike: If deviceIDs are delegated, then we cannot have
a different model for device IDs than for file object
recalls.

Spencer: But layouts don't behave like delegations.

Mike: Right, because they aren't delegations. I can get
behind making them behave the same, but for most non-file
(objects excepted) layout types, making them equivalent
doesn't work (because of the fencing issue; most native
data access protocols don't use NFSv4.1 stateids in the
I/O operation).

Spencer: Note: we don't use the layout stateid on READ/WRITE.

Andy: What is a layout stateid used for?

Spencer: It is used to keep track of the set of layouts
for a file (stateids have seqids)

Jason: If the client sends a LAYOUTGET and a LAYOUTRETURN at
the same time, won't the sequence id be wrong?

Spencer: In that case, the client sets the seqid to zero
to indicate it doesn't care about sequencing.
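A minimal sketch of the seqid rule Spencer describes, assuming the server compares the presented seqid with its current one and treats zero as a wildcard. The names `LayoutStateid`, `BadStateid`, `next_seqid` and `check_layout_stateid` are hypothetical; skipping zero when the seqid wraps is an assumption that follows from reserving zero for "don't care".

```python
from dataclasses import dataclass

SEQID_MAX = 0xFFFFFFFF  # the stateid seqid is a 32-bit unsigned integer


@dataclass(frozen=True)
class LayoutStateid:
    seqid: int
    other: bytes  # opaque; one layout stateid per clientID/layout-type/filehandle


class BadStateid(Exception):
    """Raised when a presented layout stateid does not match server state."""


def next_seqid(seqid: int) -> int:
    """Advance a seqid, skipping 0 on wraparound because 0 means "don't care"."""
    return 1 if seqid >= SEQID_MAX else seqid + 1


def check_layout_stateid(presented: LayoutStateid, current: LayoutStateid) -> None:
    """Hypothetical server-side check of a stateid sent with LAYOUTGET/LAYOUTRETURN."""
    if presented.other != current.other:
        raise BadStateid("stateid does not identify this layout")
    if presented.seqid == 0:
        return  # client opted out of sequencing (e.g. concurrent LAYOUTGET/LAYOUTRETURN)
    if presented.seqid != current.seqid:
        raise BadStateid(f"seqid {presented.seqid} != current {current.seqid}")
```

This leaves open Trond's question about the current stateid; the sketch only covers explicitly presented stateids.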

Trond: What does it mean when the current stateid is
used? What does that do to the seqid?

Spencer: Need to clarify. It is undefined what the
seqid is.

Spencer: If we use recall to recall a device ID delegation,
then we need an fh for every device ID

Mike: This is why I wanted notifications for the entire
device map versus a delegation on a device ID.

Jason: How about: server should do CB_NOTIFY remove device
ID. Client immediately responds. When client has drained
I/O, it does a delegreturn.

Sam: If server knows no client is using device ID 1,
what does the server do with that knowledge?

Mike/Trond: so add a generation number to device ID.

Marc: With 128-bit device IDs, there is room for a generation
number.
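To make Marc's point concrete, here is one way a server might carve a generation number out of a 128-bit device ID. The 64-bit slot / 64-bit generation split, the big-endian encoding, and the names `DeviceId` and `reuse` are all assumptions for illustration; clients would treat the 16 bytes as opaque.

```python
import struct
from dataclasses import dataclass

DEVICE_ID_LEN = 16  # 128-bit opaque device ID


@dataclass(frozen=True)
class DeviceId:
    """Hypothetical server-internal split of a 128-bit device ID."""
    slot: int        # which data server mapping (64 bits)
    generation: int  # bumped whenever the slot is reused (64 bits)

    def encode(self) -> bytes:
        # Big-endian: 8 bytes of slot followed by 8 bytes of generation.
        return struct.pack(">QQ", self.slot, self.generation)

    @classmethod
    def decode(cls, raw: bytes) -> "DeviceId":
        if len(raw) != DEVICE_ID_LEN:
            raise ValueError("device ID must be 16 bytes")
        slot, generation = struct.unpack(">QQ", raw)
        return cls(slot, generation)


def reuse(old: DeviceId) -> DeviceId:
    """Reuse a slot for a new mapping; the new ID always differs from the old one."""
    return DeviceId(old.slot, (old.generation + 1) % 2**64)
```

Because a reused slot gets a new generation, a client holding the old 16 bytes can never confuse them with the new mapping, which is what lets device IDs be re-used without recalling layouts.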

Spencer: We need to establish requirements now and then
move forward.

Requirements:

- large opaque device IDs to accommodate generation IDs

- want to allow the server to stop using a data server
  and switch to another

- recalling layouts to change device ID is too burdensome

-- because shared deviceIDs result in zillions of
   layoutgets

- The spec should not mandate that when
  changing the deviceID mapping the client MUST drain I/O.

-- Server gives client an indication whether old mapping
   exists for a while or is gone. Regardless, the indication
   of mapping change says that the client needs to stop
   sending new I/Os.

-- spec should allow server, if it wants, to preserve the
   writes to the old data server

- Should allow device ID to device address mapping to
  change, and device IDs to be re-used without recalling
  layouts because file layouts use a device ID

- recall layouts by fsid/clientID. Previously the notion
  was to use major deviceIDs to abstract this concept but
  as we've seen that introduced much complexity. Recall
  layout by fsid is seen as the lesser evil.

- recall layouts by clientID (ALL) required.

- if you delete a device ID, you have to recall the layouts
  that use the device ID (also must be addressed in IANA
  requirements section for new layout types. Not all layout
  types encode deviceIDs the same way)

- we will delete one deviceID at a time
-- no need for major/minor device IDs
-- no stateids for device IDs

(document GETDEVICEINFO / CB_NOTIFY race and how the client
deals with it).

- new notification op for deviceID changes (not CB_NOTIFY). This
  eliminates need for filehandle and stateid.

- deviceID pertains to a clientID not an fsid.
- no stateids needed for entire deviceID list.
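A client-side sketch of the requirements above, assuming each notification carries a device ID, a kind (change or delete), and, for a change, the new address plus the server's indication of whether the old mapping persists. `DeviceMap`, `DeviceNotify` and `on_notify` are hypothetical names, not the new notification op the group agreed to define.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class DeviceNotify(Enum):
    CHANGE = "change"  # device ID now maps to new address(es)
    DELETE = "delete"  # device ID gone; layouts using it were recalled first


@dataclass
class DeviceMap:
    """Client-side deviceID -> address map, scoped to a clientID (not an fsid)."""
    addresses: Dict[bytes, List[str]] = field(default_factory=dict)

    def address_for(self, dev_id: bytes) -> str:
        # KeyError means the client must issue GETDEVICEINFO before new I/O.
        return self.addresses[dev_id][0]

    def on_notify(self, dev_id: bytes, kind: DeviceNotify,
                  new_addresses: Optional[List[str]] = None,
                  old_mapping_persists: bool = False) -> bool:
        """Apply one notification; return True if I/O already sent to the
        old address should be reissued against the new mapping."""
        if dev_id not in self.addresses:
            # Possible GETDEVICEINFO/notification race: nothing cached yet.
            return False
        if kind is DeviceNotify.DELETE:
            # Layouts referencing this ID were already recalled, so no I/O
            # should still be using it.
            del self.addresses[dev_id]
            return False
        if not new_addresses:
            raise ValueError("CHANGE notification needs new addresses")
        # New I/O goes to the new address immediately; the server's indication
        # decides whether writes already sent to the old address survive.
        self.addresses[dev_id] = list(new_addresses)
        return not old_mapping_persists
```

Note that the map carries no stateids and no major/minor split; each notification touches exactly one device ID, matching the "delete one deviceID at a time" requirement.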


------------------------------------------------------------

Strict and/or forgetful layout ranges

Spencer: Client can forget it has a layout on the range.

Spencer: recommendation: client can give up accounting for ranges.

Spencer: client can either respond to LAYOUTRECALL with "I
don't have that range", or issue LAYOUTRETURN on the range.

Brent: the client still has to track the seqid field in the
layout stateid.

Mike: that means an error "I don't have that range" might
cause the layout stateid seqid to be out of sync.

Consensus: client must do an explicit LAYOUTRETURN
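Spencer's second option, answering a recall by issuing LAYOUTRETURN on the recalled range, can be sketched for a "forgetful" client as follows. Representing ranges as half-open `(start, end)` byte offsets and passing the RPC in as a `send_layoutreturn` callback are assumptions; `subtract` and `handle_layoutrecall` are hypothetical names.

```python
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end)


def subtract(held: List[Range], start: int, end: int) -> List[Range]:
    """Remove [start, end) from the ranges the client still remembers."""
    remaining: List[Range] = []
    for s, e in held:
        if e <= start or s >= end:
            remaining.append((s, e))      # no overlap
            continue
        if s < start:
            remaining.append((s, start))  # piece before the recalled range
        if e > end:
            remaining.append((end, e))    # piece after the recalled range
    return remaining


def handle_layoutrecall(held: List[Range], start: int, end: int,
                        send_layoutreturn: Callable[[int, int], None]) -> List[Range]:
    """Even if the client no longer tracks the range, it answers the recall
    with an explicit LAYOUTRETURN so the layout stateid seqid stays in sync."""
    send_layoutreturn(start, end - start)  # (offset, length)
    return subtract(held, start, end)
```

The design choice is that the client's local bookkeeping may be lossy, but the wire exchange never is: every recall produces a LAYOUTRETURN, which is what keeps the seqid Brent and Mike mention from drifting.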

------------------------------------------------------------

Sparse and Dense

Garth: Need to make clear that if two layout ranges
on a file have two different device IDs, the ranges are
not mergeable.

Trond: If you can't merge layout ranges, you can't append
(mre: huh?)

Brent: 3 choices
  - when you add devices to your device set, you have to
    recall the layout
  - fix the math with dense
  - throw out dense
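To see why adding devices is awkward for dense packing ("fix the math with dense"), here is a simplified sketch of the offset math. It assumes plain round-robin striping that starts at device 0 with no first-stripe index or pattern offset, which a real file layout may carry; `map_offset` is a hypothetical name.

```python
from typing import Tuple


def map_offset(file_offset: int, stripe_unit: int, stripe_count: int,
               dense: bool) -> Tuple[int, int]:
    """Return (device index in the stripe, offset on that data server)."""
    su_number = file_offset // stripe_unit   # which stripe unit
    device_index = su_number % stripe_count  # round-robin across devices
    if dense:
        # Dense: each data server stores only its own stripe units back to back.
        ds_offset = (su_number // stripe_count) * stripe_unit + file_offset % stripe_unit
    else:
        # Sparse: data server offset equals file offset; other units stay holes.
        ds_offset = file_offset
    return device_index, ds_offset
```

With a 4096-byte stripe unit, file offset 20490 lands at data server offset 4106 with four devices but 8202 with two, so changing the device count changes where existing dense data lives; under sparse packing the offset is 20490 either way.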

Mike: Having different striping patterns for different
ranges of a file is an absolute requirement. Cannot require
the entire file to be re-striped when new devices come
online in order to benefit from those devices.

Brent: I don't really have a stake in this, since it is a file
layout issue.

Mike: What happens if Panasas adds an NFSv4.1 wrapper to its data
server?

Brent: OK, in that case, since objects uses dense, it
would be much better if I give a file and object layout
that they both use the same packing. So I would prefer
DENSE be supported.

Rough consensus is that DENSE is needed.

Mike: A simple proposal is desired. The proposal Anatoly
emailed to nfsv4@ietf.org on November 28, 2007 looked
complex.

Anatoly/Suchit: We will produce a proposal.


------------------------------------------------------------

Persistent Sessions and Layout Hints

Spencer: current text requires persistent sessions.

Spencer: wording needs to change, and the text must justify
why persistent sessions are needed.

Jason: This sentence:

  Once the client has a client ID that supports pNFS,
  it creates a persistent session over the client ID,
  requesting persistent.

seemed to come out of nowhere.

Rough consensus: add an arm to the OPEN arguments and
results "EXCL2" that encodes a bitmap of attributes that
can be specified in exclusive create. Would include ACL
and layout hint.

(mre: I realized overnight that we could add a
recommended, read-only attribute that is the bitmap of
attributes permitted to be specified in the new EXCL2 arm).
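A sketch of how the proposed read-only bitmap attribute could be used to validate an EXCL2 create. It uses the NFSv4 bitmap layout (attribute n is bit n % 32 of 32-bit word n // 32); the attribute numbers below and the names `to_bitmap` and `subset_of` are assumptions chosen for illustration.

```python
from typing import Iterable, List

# Attribute numbers assumed for illustration only.
FATTR4_ACL = 12
FATTR4_MODE = 33
FATTR4_LAYOUT_HINT = 63


def to_bitmap(attrs: Iterable[int]) -> List[int]:
    """Encode attribute numbers as a list of 32-bit bitmap words."""
    words: List[int] = []
    for a in attrs:
        w, b = divmod(a, 32)
        while len(words) <= w:
            words.append(0)
        words[w] |= 1 << b
    return words


def subset_of(requested: List[int], permitted: List[int]) -> bool:
    """True if every attribute set in `requested` is also set in `permitted`."""
    for i, word in enumerate(requested):
        allowed = permitted[i] if i < len(permitted) else 0
        if word & ~allowed:  # bits requested but not permitted
            return False
    return True
```

A client could fetch the permitted bitmap once per file system and check it locally before sending an exclusive create with an ACL or layout hint, rather than learning about an unsupported attribute from an error.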

------------------------------------------------------------


- Recall processing and referential items to fore operations


Mike: asserts that the seqid in the layout stateid and the
removal of device ID stateids make this moot.

Mike: need text to explain how the client deals with
the race.

Tom: What if the server loses its mind? The client then has
to cache some bogus, temporary state from the server.

Mike: The slot table on the fore channel has N slots. The
client has at most N layoutgets outstanding. When the
server sends N+1 recalls that don't match any stateids,
the client knows the server is insane, and destroys the
session and client ID.
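Mike's slot-counting argument can be sketched as a small client-side counter. It assumes the client can tell when a recall names a stateid it does not know yet, and when a late LAYOUTGET reply resolves one of those; `RecallSanityCheck` and its methods are hypothetical names.

```python
class RecallSanityCheck:
    """With N fore-channel slots, at most N LAYOUTGETs can be outstanding, so
    at most N recalls can legitimately name stateids the client has not seen."""

    def __init__(self, fore_channel_slots: int) -> None:
        self.slots = fore_channel_slots
        self.unmatched = 0  # recalls naming stateids we do not (yet) know

    def on_recall(self, stateid_known: bool) -> bool:
        """Return True if the client should destroy the session and client ID."""
        if not stateid_known:
            self.unmatched += 1
        return self.unmatched > self.slots

    def on_layoutget_reply_matches_recall(self) -> None:
        """A LAYOUTGET reply arrived for a stateid an earlier recall named."""
        if self.unmatched > 0:
            self.unmatched -= 1
```

The same bound would apply to delegation recalls, in line with Spencer's remark that follows.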

Spencer: This issue applies to delegations.

Mike: Need common text in spec.

------------------------------------------------------------


- LAYOUTCOMMIT and access-time

Spencer/Brent: this is in place

Spencer: will verify text.

Mike: found this text in the i-d:

   "The loca_time_modify and loca_time_access"


------------------------------------------------------------


- layoutcommit vs. commit

Mike: asserts that until layoutcommit succeeds, the data
is not committed.

Tom: what about a client that writes beyond its stripe?

Mike: need text stating that the NFSv4.1 layout type must
prevent this.

Issue tabled for now.




