[tahoe-dev] maximum file sizes: 2GiB with helper, 12GiB without

zooko zooko at zooko.com
Tue Mar 11 18:31:47 UTC 2008


Brian:

Thanks for figuring out the file size limitations.

It's too bad that it isn't "basically unlimited", the way you and I  
both thought that it was.

In the future, let's always use 8-byte lengths/counts instead of 4-byte,
unless we are very, very sure that nobody could ever want more
items of that type, *and* saving 4 bytes in that data structure is a
significant efficiency improvement.
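To make the trade-off concrete: a 4-byte unsigned length tops out just under 4 GiB, while an 8-byte length costs only four more bytes per field. A minimal sketch using Python's standard struct module (pack_length is a hypothetical helper, not a Tahoe function):

```python
import struct

MAX_U32 = 2**32 - 1  # largest value a 4-byte unsigned length can hold (~4 GiB)
MAX_U64 = 2**64 - 1  # largest value an 8-byte unsigned length can hold

def pack_length(length, width):
    """Pack a byte count big-endian as a 4- or 8-byte unsigned integer.

    Hypothetical helper for illustration; struct.pack raises struct.error
    if the value does not fit in the chosen width.
    """
    fmt = {4: ">L", 8: ">Q"}[width]
    return struct.pack(fmt, length)

five_gib = 5 * 2**30
try:
    pack_length(five_gib, 4)
except struct.error:
    print("5 GiB does not fit in a 4-byte length field")
print(len(pack_length(five_gib, 8)), "bytes for an 8-byte length field")
```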

I'm re-arranging your quoted letter in order to respond to the most  
urgent point first:


On Mar 11, 2008, at 2:41 AM, Brian Warner wrote, in some order:

> The storage server's share format (with the 4-byte share size) is a
> tougher limit to raise, since it requires a new version number for the
> sharefiles (and backwards-compatibility code).

Hm.  As you mention, the backwards compatibility issues in the foolscap
protocol are not too troublesome -- if we simply replace the "int"
constraint with IntegerConstraint(8), then old servers and old
clients will continue to function normally.  (Thank you, duck typing.)

Also, the backwards compatibility issues on share files are not too
troublesome, because we do not need storage servers running code
< 0.9.0 to be able to read share files produced by storage servers
running code >= 0.9.0.

So we could, today, change the code in storage.py [1] that detects
the share version number when a share file is read in, so that if the
version is 2, it treats the share data length as 8 bytes instead of 4.

At the same time, we can change storage.py to write out shares in the  
new version 2 format.
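A rough sketch of that read/write change, assuming a share header of three big-endian fields (version, share data length, lease count), with version 1 using a 4-byte length and version 2 an 8-byte one. This layout and the names HEADER_FORMATS, read_share_header and write_share_header are assumptions for illustration, not the actual storage.py code:

```python
import struct

# Assumed header layouts (hypothetical; the real storage.py layout may differ):
#   version 1: 4-byte version, 4-byte share data length, 4-byte lease count
#   version 2: 4-byte version, 8-byte share data length, 4-byte lease count
HEADER_FORMATS = {1: ">LLL", 2: ">LQL"}

def read_share_header(data):
    """Return (version, share_data_length, num_leases) from a share file prefix."""
    # The version number is the first 4 bytes in both layouts.
    (version,) = struct.unpack(">L", data[:4])
    if version not in HEADER_FORMATS:
        raise ValueError("unknown share version %d" % version)
    fmt = HEADER_FORMATS[version]
    return struct.unpack(fmt, data[:struct.calcsize(fmt)])

def write_share_header(share_data_length, num_leases=0):
    """New shares are always written in the version 2 (8-byte length) format."""
    return struct.pack(HEADER_FORMATS[2], 2, share_data_length, num_leases)
```

A server running this code still reads old version 1 shares, but a server that only understands version 1 would reject or misparse a version 2 header, which is the one-way restriction on moving shares back to a < 0.9.0 server.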

If we did this, the effective limit on file size that can be uploaded  
with v0.9.0 would go from 2 GiB to 8.8 TB (with helper), or from 12  
GiB to 8.8 TB (without helper).

Doing this today would not cause any further backwards compatibility  
issues, other than that you cannot move share files from a >= 0.9.0  
storage server to a < 0.9.0 server, nor replace a running >= 0.9.0  
server with a < 0.9.0 server.


> The hash trees that we use to validate the contents of immutable files
> require filesize/segsize * 2 * 32 bytes each, and they use 4-byte offsets.
> So this imposes a limit of 67M segments, which even for the new+smaller
> 128KiB-sized segments is only an 8.8TB limit.
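The arithmetic behind those numbers, as a quick check. This assumes a 4-byte offset can address up to 2^32 bytes of hash tree and treats the tree as exactly two 32-byte nodes per segment, per the formula quoted above:

```python
HASH_SIZE = 32             # bytes per hash tree node
NODES_PER_SEGMENT = 2      # a binary tree has about 2 nodes per leaf segment
MAX_OFFSET = 2**32         # bytes addressable with a 4-byte offset (assumed unsigned)
SEGMENT_SIZE = 128 * 1024  # the new, smaller 128 KiB segments

max_segments = MAX_OFFSET // (NODES_PER_SEGMENT * HASH_SIZE)
max_file_size = max_segments * SEGMENT_SIZE

print(max_segments)                          # 67108864, i.e. ~67M segments
print(max_file_size)                         # 8796093022208 bytes
print(round(max_file_size / 1e12, 1), "TB")  # 8.8 TB (= 8 TiB)
```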

Eventually we should change these 4-byte offsets to 8-byte offsets,
too, but this is a bigger backwards compatibility issue, because
downloaders of immutable files have to know what to do with the new
hash trees.  It is also less urgent, because most users will not need
to upload files larger than 8.8 TB for now.

We could, for example, make the v0.10.0 release of allmydata.org  
"Tahoe" Don't-Call-It-Laugfs accept hash trees with 8-byte offsets  
while still producing hash trees with 4-byte offsets.
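The "accept both, produce the old one" transition could look roughly like this. The offset-table encoding here is hypothetical: it assumes the caller already knows (e.g. from a version field) whether offsets are 4 or 8 bytes wide, and parse_offsets and build_offsets are illustrative names only:

```python
import struct

# Hypothetical offset-table encoding: a run of big-endian unsigned offsets,
# 4 or 8 bytes each, with the width known from elsewhere (e.g. a version field).
OFFSET_FORMATS = {4: "L", 8: "Q"}

def parse_offsets(data, width):
    """Accept either 4-byte or 8-byte offsets (the tolerant reader)."""
    if width not in OFFSET_FORMATS:
        raise ValueError("unsupported offset width %d" % width)
    if len(data) % width:
        raise ValueError("offset table length is not a multiple of %d" % width)
    count = len(data) // width
    return list(struct.unpack(">%d%s" % (count, OFFSET_FORMATS[width]), data))

def build_offsets(offsets):
    """Keep producing 4-byte offsets so older downloaders can still read them."""
    return struct.pack(">%dL" % len(offsets), *offsets)
```

Once enough deployed downloaders understand 8-byte offsets, a later release could switch the writer to 8-byte offsets without stranding anyone.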


Regards,

Zooko



