[tahoe-dev] zfec right for me?

Mon Apr 11 18:12:19 UTC 2011

My memory is hazy about all this, but I think my comment there was
with regards to the C zfec library, vs the Python wrapper library
(which offers more functionality).

I did know that zfec does have a header (there is an example encoder
that produces zfec-compatible output in the fecpp sources), but the
problems (as I perceive them) with the header were that the padding
bytes had to be known prior to writing the output (ie, you need to
either buffer the entire dataset, or use a seekable medium), you can't
modifiy the block size, and the bit-packing scheme used to encode k/n
is absurdly complicated for something that will only save 2 bytes.

I've attached the format spec as I wrote it circa 2009. Some of this
may state things which are patently untrue about zfec-in-2009 and/or
zfec-in-2011; if you spot such falsehoods please point them out. It's
also never been implemented, so it may be proposing things which
are foolish, impossible, or both.

-Jack

On Mon, Apr 11, 2011 at 09:52:55AM -0600, Zooko O'Whielacronx wrote:
> Reading back through the last year's mailing list archive in search of
> Kyle Markley's benchmark reports and commentary thereon, I happened to
> see this letter:
> 
> http://tahoe-lafs.org/pipermail/tahoe-dev/2010-July/004688.html
> 
> I'm not sure, but perhaps Jack is overlooking an under-documented
> feature of zfec. Zfec does offer a header which contains K, N (which
> in zfec code and docs is named "M"), how many bytes of 0-padding are
> needed, and the number of the share:
> 
> http://tahoe-lafs.org/trac/zfec/browser/trunk/zfec/zfec/filefec.py?annotate=blame&rev=363#L35
> 
> Tahoe-LAFS does not use this feature since that information is already
> present elsewhere in the Tahoe-LAFS data structures.
> 
> Zfec does not have an implementation of identifiable magic numbers nor
> per-share checksums. That would be useful. Perhaps a good way to get
> that would be to combine zfec with Ludovic Courtès's libchop [1],
> which has checksums (including Merkle Trees) but doesn't have erasure
> coding.
> 
> Jack: want to share that draft of the file format that you wrote up?
> 
> Regards,
> 
> Zooko
> 
> [1] http://www.nongnu.org/libchop
-------------- next part --------------

Proposal for a standardized FEC format, with partial justifications why
the current zfec format isn't cutting it.

Header
---------------

- 4 bytes: magic number

A magic number is useful for automated identification by applications.
A canonical example would be file(1). Without such a magic number an
application has to rely on other information (file extensions,
user-provided information, blind guessing) to figure out how to
process the file.

The value I initially preferred was 0xFE 0xCC 0x0D 0xEC, which is the
magic number used internally by Rizzo's FEC. Unfortunately this
sequence is also a legal zfec header, which would mean it cannot be
unambiguously parsed (though the use of n=255 is probably not high),
assuming you are decoding zfec data. So instead we reverse the bytes
(0xEC, 0x0D, 0xCC, 0xFE) which is not a legal zfec header, because as
zfec would interpret this, the share number (0xFE) is greater than N
(0xEC), so it "must" be a FEC file.

- 2 bytes: chunk size in units of 1024 bytes (big endian)

Currently zfec's encoding works splitting the file into chunks, K*4096
bytes. This cannot be changed in the zfec format because there is no
info in the header about it. In the future (or even now) (depending on
characteristics of CPU, caches, memory, disks, and operating systems)
it might be much faster to encode 16K, 32K or even larger chunks at a
time, so why not let the encoder choose (if it wants).

Supporting sub-kilobyte values does not seem particularly useful, and
expressing it in 1 KiB units allows up to 64 MiB chunks, which seems
more than sufficient.

- 1 byte: N-1
- 1 byte: K-1
- 1 byte: share #

Why use full byte for N, K, share#? My feeling is the bit packing
scheme used by zfec is not worth it: saves only 2 bytes (at most) and
makes it much harder/more bug prone to write and parse the header.

- 1 byte: hash id

0: no hash
1: SHA-1
2: SHA-256
3: SHA-256d [SHA-256d(m) == SHA-256(SHA-256(m)]

Allow error checking of recovered inputs

Semantics of this are currently undefined.

- 2 bytes: reserved (0)

Round the header to 12 bytes, and leave some room for expansion.

If necessary this could be treated as a big-endian length field, with
N following bytes being whatever extra information was needed.

Trailing Padding
----------

zfec handles data that isn't a multiple of k bytes by adding
sufficient number of 0 bytes. The number of bytes added is
included in header.

Problem: this requires knowing the length of the input prior to
writing out the header. For files, this can be done (stat the file
ahead of time) but it seems pretty inelegant. Also it doesn't deal at
all well with streaming inputs.

Instead, prior to encoding, form padding like so:

0x80 0x00* <HASH> <LENGTH>

where <LENGTH> is a 2 byte big endian integer specifying the number of
padding bytes.

<HASH> is the hash of the entire plaintext, using whatever algorithm
was specified in the header (if any).

The minimum number of 0x00 bytes are used to pad the total (input +
0x80 + hash + 2 byte len + zeros) to a multiple of k bytes.

The decoder then decodes the entire trailing block, and, knowing that
it is looking at the final block, strip off the trailing bytes, saving
the hash to compare with the final value.

Other thoughts on hashing
-------------------

Doing a straight end-to-end hash of the plaintext would require
processing the blocks in-order when decoding, which would require
saving copies instead of emitting them and forgetting about them as
soon as we find them.

Instead, perhaps, a construction like this:

H_i = H(share_i)

Final H = H(H_1 || H_2 || ... || H_n)

Which would require saving only the hash values instead of the entire
input.

This also offers reasonable parallelism opportunities, especially
since the inputs (except possibly the last block) will all be the same
size, so you can relatively easily do 4-way hashing on SIMD
processors.

As always, if hash_id=0 (no hash) is selected, this is all a no-op and
the hash value placed in the trailer is empty.