[nfsv4] draft notes from pNFS meeting held in Vancouver
Mike Eisler <email2mre-ietf@yahoo.com> Wed, 05 December 2007 15:55 UTC
Date: Wed, 05 Dec 2007 07:55:34 -0800
From: Mike Eisler <email2mre-ietf@yahoo.com>
To: nfsv4@ietf.org
(These notes are draft. Please send me any corrections.)

Date of Meeting: December 4, 2007
Where: Vancouver
Purpose: to make headway on the pile of pNFS issues

Attendees
- Andy Adamson - CITI
- Mike Eisler - NetApp
- Sam Falkner - Sun
- Jason Glasgow - EMC
- Garth Goodson - NetApp
- Robert Gordon - Sun
- Suchit Kaura
- Trond Myklebust - NetApp
- Anatoly Pinchuk
- Spencer Shepler - Sun
- Renu Tewari - IBM
- Brent Welch - Panasas

Phone
- Marc Eshel - IBM
- Benny Halevy - Panasas
- Dean Hildebrand - IBM
- Rahul Iyer - NetApp
- Lisa Week - Sun

Major Issues
- Recall processing and referential items to fore operations
  -- refers to language in pNFS and sessions about how callbacks refer to fore operations
- Sessions major/minor and impact on pNFS
- persistent sessions usage (required or not)
- layout hint usage (OPEN and/or SETATTR)
- Overall model of device map stateids (one per major device) needs to be explained (dependence on GETDEVICELIST)
  -- filehandle to be used for recall
  -- Recall by FSID, ALL, etc.
- Layout stateids (one per clientID/layout-type/filehandle) need description
- Relationship of open mode and mode of layout needs description in LAYOUTGET
- Layout ranges and accounting: strict or forgetful
  -- Layout return not being complete until all of the range is returned
- LAYOUTCOMMIT and access-time
- GETDEVICELIST and device map stateids
- Obtaining the device list for a major id
- sparse versus dense layouts
- granularity of device id for the file layouts
- layoutcommit vs. commit

------------------------------------------------------------

Decided to deal with device ID issues first.

Spencer: Issues with device ID mapping are around recovery in the event of server restart. How do clients and servers discover inconsistencies? The general idea was to use stateids, leases, etc. to deal with these issues.
For layouts, this is solved, but for deviceIDs, the issue was how to recall individual deviceID mappings without recalling layouts. We also changed devices to 64 bits (from 32 bits). We also wanted to deal with recalling deviceIDs associated with a particular fsid without perturbing the layouts. This led to major and minor devices. The server could align major deviceIDs to fsids to get the desired effect. We are now looking at 128 bit deviceIDs.
Tom: If we have something as wide as a uuid, why not call it a uuid?
Andy: Don't we have to fence off I/O when recalling device IDs? If an I/O is in progress to one data server address, and I change the device ID to another data server, how is this different from layout recall?
Andy: Who wants this feature? Blocks and objects don't need it, files don't need it?
Brent: If you have a model where device ID is 1-1 to the layout, then the device ID recall is superfluous.
Jason: If the IP address changes, you should be able to recall the device ID without the layout. (Editor: this was the aha moment.)
Jason: The clearest thing is to support the recall of a device ID.
Brent: We would like to be able to recall a device ID to handle the IP address case.
Brent: Do we need major/minor device IDs?
Garth: I see the need to recall major device IDs to deal with the case where all dev mappings for an fsid are affected.
Mike: We have two methods for breaking device ID to device address mappings.
Brent: Notify vs. recall: Notify is for optimizing GETATTRs in the directory case. Recalling deviceIDs should be rare.
Mike: My rationale for notifications was to avoid having zillions of device IDs.
Suchit: The most common use case is to update a device address for a device ID. An update notification is the least intrusive.
Trond: Stateids only make sense if you are going to recall by stateid.
Sam: Would it make sense to state that recall of layout by fsid does not invalidate deviceIDs?
Garth/Sam: What does a notify delete callback response mean: the client got the notify, or it deleted the mapping?
Jason: We need to be very clear: does deleting a device ID mapping mean the client's I/O is drained?
Trond: You should be able to fence the client.
Andy: Here is the client sending I/O to a device. It gets a device ID delete notify, and the client's data falls on the floor.
Brent: It's actually a benefit, because if the mapping is deleted, the client gets told that writing to the device address is futile.
Garth: If you get a recall of a device ID, the server should allow the I/O to be flushed until the recall completes.
Trond: But it should be allowed to have a server fence the client.
Spencer: Recall processing. There is a fundamental issue: how long is a client allowed to process the callback? In NFSv4.0, the client could acknowledge the recall, and then do delegreturn. In NFSv4.1 we seem to have added the notion that a client gets a device ID recall, and the client can wait for I/O flush before acknowledging the recall.
Mike: If deviceIDs are delegated, then we cannot have a different model for device IDs than for file object recalls.
Spencer: But layouts don't behave like delegations.
Mike: Right, because they aren't delegations. I can get behind making them behave the same, but for most non-file (objects excepted) layout types, making them equivalent doesn't work (because of the fencing issue; most native data access protocols don't use NFSv4.1 stateids in the I/O operation).
Spencer: Note: we don't use the layout stateid on READ/WRITE.
Andy: What is a layout stateid used for?
Spencer: It is used to keep track of the set of layouts for a file (stateids have seqids).
Jason: If a client sent a LAYOUTGET/LAYOUTRETURN at the same time, won't the sequence id be wrong?
Spencer: In that case, the client sets the seqid to zero, to indicate it doesn't care about sequencing.
Trond: What does it mean when the current stateid is used? What does that do to the seqid?
Spencer: Need to clarify. It is undefined what the seqid is.
Spencer: If we use recall to recall a device ID delegation, then we need an fh for every device ID.
Mike: This is why I wanted notifications for the entire device map versus a delegation on a device ID.
Jason: How about: server should do CB_NOTIFY remove device ID. Client immediately responds. When client has drained I/O, it does a delegreturn.
Sam: If the server knows no client is using device ID 1, what does the server do with that knowledge?
Mike/Trond: So add a generation number to the device ID.
Marc: At 128 bit device IDs, there is room for a generation number.
Spencer: We need to establish requirements now and then move forward.

Requirements:
- large opaque device IDs to accommodate generation IDs
- want to allow the server to stop using a data server and switch to another
- recalling layouts to change device ID is too burdensome
  -- because shared deviceIDs result in zillions of layoutgets
- The spec should not mandate that when changing the deviceID mapping the client MUST drain I/O.
  -- Server gives client an indication whether the old mapping exists for a while or is gone. Regardless, the indication of mapping change says that the client needs to stop sending new I/Os.
  -- spec should allow server, if it wants, to preserve the writes to the old data server
- Should allow device ID to device address mapping to change, and device IDs to be re-used without recalling layouts because file layouts use a device ID
- recall layouts by fsid/clientID. Previously the notion was to use major deviceIDs to abstract this concept, but as we've seen that introduced much complexity. Recall layout by fsid is seen as the lesser evil.
- recall layouts by clientID (ALL) required.
- if you delete a device ID, you have to recall the layouts that use the device ID (also must be addressed in IANA requirements section for new layout types. Not all layout types encode deviceIDs the same way)
- we will delete one deviceID at a time
  -- no need for major/minor device IDs
  -- no stateids for device IDs (document GETDEVICEINFO / CB_NOTIFY race and how the client deals with it).
- new notification op for deviceID changes (not CB_NOTIFY). This eliminates the need for filehandle and stateid.
- deviceID pertains to a clientID not an fsid.
- no stateids needed for entire deviceID list.

------------------------------------------------------------

Strict and/or forgetful layout ranges

Spencer: Client can forget it has a layout on the range.
Spencer: Recommendation: client can give up accounting for ranges.
Spencer: Client can either return to LAYOUTRECALL: "I don't have that range", or issue LAYOUTRETURN on the range.
Brent: Client still has to track the seqid field in the layout stateid.
Mike: That means an error "I don't have that range" might cause the layout stateid seqid to be out of sync.
Consensus: client must do an explicit LAYOUTRETURN.

------------------------------------------------------------

Sparse and Dense

Garth: It needs to be clear that if two layout ranges on a file have two different device IDs, the ranges are not mergeable.
Trond: If you can't merge layout ranges, you can't append (mre: huh?)
Brent: 3 choices
- when you add devices to your device set, you have to recall the layout
- fix the math with dense
- throw out dense
Mike: Having different striping patterns for different ranges of a file is an absolute requirement. Cannot require the entire file to be re-striped when new devices come online in order to benefit from those devices.
Brent: I don't really have a stake in this, since it is a file layout issue.
Mike: What happens if Panasas adds an NFSv4.1 wrapper to its data server?
Brent: OK, in that case, since objects uses dense, it would be much better if, when I give out a file layout and an object layout, they both use the same packing. So I would prefer DENSE be supported.
Rough consensus is that DENSE is needed.
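
To make the sparse versus dense distinction concrete, here is a minimal sketch of the offset arithmetic for a simple round-robin striped file. It is not taken from the notes or from the draft text: the function names, the assumption that striping starts at logical offset 0, and the round-robin device order are all illustrative assumptions.

```python
# Hypothetical helpers: simple round-robin striping starting at offset 0.
# stripe_unit is the number of bytes written to one device before moving
# to the next; num_devices is the number of devices in the stripe.

def device_index(offset, stripe_unit, num_devices):
    """Index of the device that holds the byte at this logical offset."""
    return (offset // stripe_unit) % num_devices

def sparse_offset(offset, stripe_unit, num_devices):
    """Sparse packing: the data file uses the logical offset unchanged,
    leaving holes where other devices hold the data."""
    return offset

def dense_offset(offset, stripe_unit, num_devices):
    """Dense packing: each data file holds only its own stripe units,
    packed back to back with no holes."""
    stripe_width = stripe_unit * num_devices
    full_stripes = offset // stripe_width
    return full_stripes * stripe_unit + (offset % stripe_unit)

if __name__ == "__main__":
    for off in (0, 4, 13):
        print(off,
              device_index(off, 4, 3),
              sparse_offset(off, 4, 3),
              dense_offset(off, 4, 3))
```

In this sketch the dense offset depends on the number of devices, while the sparse offset does not. That is one way to see why ranges of a file striped over different device sets need care under dense packing.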
Mike: A simple proposal is desired. The proposal Anatoly emailed to nfsv4@ietf.org on November 28, 2007 looked complex.
Anatoly/Suchit: We will produce a proposal.

------------------------------------------------------------

Persistent Sessions and Layout Hints

Spencer: Current text requires persistent sessions.
Spencer: Wording needs to change, and justify why persistent sessions.
Jason: This sentence: "Once the client has a client ID that supports pNFS, it creates a persistent session over the client ID, requesting persistent." seemed to come out of nowhere.
Rough consensus: add an arm to the OPEN arguments and results "EXCL2" that encodes a bitmap of attributes that can be specified in exclusive create. Would include ACL and layout hint. (mre: I realized overnight that we could add a recommended, read-only attribute that is the bitmap of attributes permitted to be specified in the new EXCL2 arm).

------------------------------------------------------------

- Recall processing and referential items to fore operations

Mike: Asserts the seqid in the layout stateid and the removal of device ID stateids makes this moot.
Mike: Need text to understand how the client deals with the race.
Tom: What if the server loses its mind, the client has to cache some bogus, temporal state that the server.
Mike: The slot table on the fore channel has N slots. The client has at most N layoutgets outstanding. When the server sends N+1 recalls that don't match any stateids, the client knows the server is insane, and destroys the session and client ID.
Spencer: This issue applies to delegations.
Mike: Need common text in spec.

------------------------------------------------------------

- LAYOUTCOMMIT and access-time

Spencer/Brent: This is in place.
Spencer: Will verify text.
Mike: Found this text in the i-d: "The loca_time_modify and loca_time_access"

------------------------------------------------------------

- layoutcommit vs. commit

Mike: Asserts that until layoutcommit succeeds, the data is not committed.
Tom: What about a client that writes beyond its stripe?
Mike: Need text to describe that the NFSv4.1 layout type must prevent this.
Issue tabled for now.
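
The slot-table argument Mike gives under recall processing can be sketched roughly as follows. This is only an illustration of the counting idea, not text from the draft: the class and method names are hypothetical, and resetting the counter when a LAYOUTGET reply arrives is an added assumption.

```python
class RecallTracker:
    """Hypothetical client-side bookkeeping for unmatched layout recalls."""

    def __init__(self, fore_slots):
        # N: number of slots in the fore channel slot table, which bounds
        # the number of LAYOUTGETs the client can have outstanding.
        self.fore_slots = fore_slots
        self.known_stateids = set()
        self.unmatched_recalls = 0

    def layoutget_reply(self, stateid):
        """Record a stateid learned from a LAYOUTGET reply."""
        self.known_stateids.add(stateid)
        # Assumption: a reply resolves the race, so start counting afresh.
        self.unmatched_recalls = 0

    def recall(self, stateid):
        """Classify an incoming recall."""
        if stateid in self.known_stateids:
            return "process"
        # Unknown stateid: possibly a race with an outstanding LAYOUTGET.
        self.unmatched_recalls += 1
        if self.unmatched_recalls > self.fore_slots:
            # N+1 unmatched recalls cannot all be explained by at most N
            # outstanding LAYOUTGETs, so the server state is not trusted.
            return "destroy_session"
        return "defer"


if __name__ == "__main__":
    tracker = RecallTracker(fore_slots=2)
    print([tracker.recall(s) for s in ("a", "b", "c")])
```

With two slots, the first two unmatched recalls are deferred as possible races, and the third leads the client to destroy the session and client ID.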