
Re: [nfsv4] question about pNFS.



On Jun. 29, 2008, 17:32 +0300, "???" <lonat.front at gmail.com> wrote:
> 
> 
> 2008/6/29, Benny Halevy <bhalevy at panasas.com>:
> 
>     Yi,
> 
>     >The server strategy for granting overlapping write layouts
>     >and preventing corruption due to concurrent writing of the
>     >file depends on the layout type and other parameters (e.g.
>     >objects RAID level for pnfs-obj).
> 
>     >For example with the files layout or with a striped (RAID-0)
>     >file over the objects layout, if the clients write *only*
>     >the application buffers, i.e. they do not read around that
>     >to pad and align to page/buffer size, then the clients should
>     >not step on each other's data, since writing to any data server
>     >in this case is no different than writing to the metadata
>     >server.
> 
> 
>  
> 
>     Other cases require serialization of writes that is out of
>     scope for the client.  For example, with objects RAID, the
>     whole stripe needs to be locked while writing to it,
>     therefore the server will not grant more than one outstanding
>     layout for any specific stripe.
> 
>  
>   As to locking the whole stripe, do you mean RAID-5, RAID-6, ... (all RAID
> levels with parity, where the client must cache data to compute parity), but
> not RAID-0, RAID-1, ...? I do not think RAID-0 or RAID-1 needs to lock the
> whole stripe.

Indeed, RAID-0 (which isn't really RAID as far as redundancy goes...) needs no
locking. RAID-1, in this particular case where the clients write into
disjoint areas, needs no locking either; however, once the server has given out
the layouts it has no guarantee that the clients' writes won't overlap.
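
As a rough illustration of the stripe rule, here is a minimal sketch of a
server-side grant check. Everything in it is hypothetical (the Layout type,
stripes(), may_grant(), and the needs_stripe_lock policy flag); it is not taken
from any pNFS server, and it only models "at most one outstanding write layout
per stripe" for layouts that need whole-stripe locking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical record of a write layout held by a client."""
    client: str
    offset: int  # first byte covered
    length: int  # number of bytes covered (assumed > 0)


def stripes(layout, stripe_size):
    """Return the range of stripe indices touched by the layout's byte range."""
    first = layout.offset // stripe_size
    last = (layout.offset + layout.length - 1) // stripe_size
    return range(first, last + 1)


def may_grant(request, outstanding, stripe_size, needs_stripe_lock):
    """Decide whether a new write layout may coexist with outstanding ones.

    needs_stripe_lock is an assumed per-file policy flag: False for e.g.
    RAID-0 striping, True when the whole stripe must be locked for writing.
    """
    if not needs_stripe_lock:
        return True
    wanted = set(stripes(request, stripe_size))
    # Refuse if any outstanding layout touches a stripe the request touches.
    return all(wanted.isdisjoint(stripes(held, stripe_size))
               for held in outstanding)
```

With a 64 KiB stripe, a request for bytes 60K-68K would be refused while another
client holds a write layout on the first stripe, whereas a request that starts
at 64K would be granted.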

>  
> 
>  
> 
>     Another example could be the block layout, for which the server
>     provisionally allocates blocks for newly written data.
>     In this case, false sharing may result in data corruption if
>     two clients are concurrently given write layouts overlapping
>     a block, if the server is unable to perform LAYOUTCOMMIT
>     for extents that are not block aligned.
> 
>  
>   With pre-allocation (i.e. more than one block is allocated at a time), I
> think that if write layouts overlapping a block are granted to two
> clients, and the server can perform LAYOUTCOMMIT, the data may be
> corrupted. For example, client A allocates new blocks (file logical
> addresses 1M-2M), and client B gets the layout for 1M-2M (of course, both
> clients are doing "write only" or "write_read" on the file). Client A then
> commits the file with a size of 1.5M. Client B will still write the
> 1.5M-2M range of the file, but by then these blocks may have been allocated
> to other files.
> So I think the layout should be granted to only one writing client at any time.

That's definitely true, to be on the safe side.  I think there could be
one exception, as I mentioned above: if the server supports committing
layouts with single-byte granularity, it can, in theory, correctly merge
LAYOUTCOMMITs that overlap a block by copying data out of provisionally
allocated blocks rather than committing them into the file, if the respective
byte ranges are already allocated to the file and the client LAYOUTCOMMITs
only a part of the block.
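
To make the merge idea concrete, here is a minimal sketch under stated
assumptions: blocks are modeled as in-memory byte strings, BLOCK is an assumed
block size, and commit_block() is a hypothetical helper, not part of any
block-layout server. It copies only the committed byte range out of the
provisional block when the file already has a block there, and otherwise
commits the provisional block itself.

```python
BLOCK = 4096  # assumed block size in bytes


def commit_block(file_block, provisional, start, end):
    """Return the block contents the file should hold after a LAYOUTCOMMIT.

    file_block:  bytes already allocated to the file for this block, or None
    provisional: bytes of the provisionally allocated block the client wrote
    start, end:  committed byte range [start, end) within the block
    """
    if file_block is None or (start == 0 and end == BLOCK):
        # Nothing to merge with, or the whole block is committed:
        # assume the server commits the provisional block into the file.
        return provisional
    # Partial commit over an allocated block: copy only the committed bytes.
    merged = bytearray(file_block)
    merged[start:end] = provisional[start:end]
    return bytes(merged)
```

For example, if client A commits the first half of a block and client B later
commits the second half, applying commit_block() twice yields a block holding
A's bytes followed by B's bytes, and neither commit discards the other's data.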

> 
>  
> 
>     Benny
> 

_______________________________________________
nfsv4 mailing list
nfsv4 at ietf.org
https://www.ietf.org/mailman/listinfo/nfsv4