[tahoe-dev] Tahoe on large filesystems

Jan-Benedict Glaw jbglaw at lug-owl.de
Fri Feb 4 16:30:37 UTC 2011


On Fri, 2011-02-04 05:51:12 -0700, Shawn Willden <shawn at willden.org> wrote:
> On Fri, Feb 4, 2011 at 3:51 AM, Jan-Benedict Glaw <jbglaw at lug-owl.de> wrote:
> > Consider a 2TB
> > filesystem on a 2TB disk. Sooner or later, you will face read (or
> > even write) errors, which will easily result in a r/o filesystem.
> > For reading other shares, that's not much of a problem. But you're
> > instantly also losing a huge *writeable* area.
> >
> > So with disks that large, do you use a small number of large
> > partitions/filesystems (or even only one), or do you cut it down to,
> > say, 10 filesystems of 200GB each, starting a separate tahoe node
> > for each filesystem? Or do you link the individual filesystems into
> > the storage directory?
> >
> 
> I don't think using lots of partitions really helps.  I've long used many
> partitions on my big disks (starting back when 10 GB was a "big disk") for
> reasons of flexibility.  I have multiple large disks, each broken into many
> partitions, then I create RAID arrays on the "parallel" partitions,
> including one from each disk, then bind the RAID arrays together with LVM
> and finally carve out logical volumes for actual use.  Without getting into
> the advantages/disadvantages of that approach, the reason I mention it is
> because what I've observed is that when a disk gets an I/O error on any one
> of the partitions, the OS assumes that the whole disk is having trouble and
> drops _all_ the partitions out of their RAID arrays.

That's not my experience. There are usually two quite distinct error
types. On the one hand, there are errors that affect the whole disk;
even in the datacenter of the company I work for, a disk dying
completely is quite rare. Single-block read errors, in comparison,
happen quite often, but they usually don't affect much beyond the
filesystem's error handling code, which may set the filesystem r/o.
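
Whether a single bad block takes the whole filesystem read-only is
itself a mount-time policy, by the way; for ext3/ext4 it's the errors=
mount option. A sketch of an /etc/fstab line (device and mount point
are just made-up examples):

    # remount read-only as soon as the filesystem detects an error
    /dev/sdb1   /srv/tahoe   ext4   defaults,errors=remount-ro   0   2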

These read errors have different causes, though. One is decay of the
magnetic recording, so writing the sector anew "fixes" the problem.
That is also why regular reads of the whole disk seem to prevent part
of these problems; some drives apparently re-write a sector after
reading it, IFF it's still intact but "hard" to read.
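
A rough sketch of such a whole-disk read pass (Python; /dev/sdb is just
an example device and you need enough privileges to read it -- the only
point is to make the drive touch every sector, so its firmware gets a
chance to re-write or remap weak ones):

    #!/usr/bin/env python
    # Sequentially read a whole block device and report unreadable spots.
    import os, sys

    DEVICE = sys.argv[1] if len(sys.argv) > 1 else "/dev/sdb"  # example path
    CHUNK = 4 * 1024 * 1024  # read 4 MiB at a time

    fd = os.open(DEVICE, os.O_RDONLY)
    offset, errors = 0, 0
    while True:
        try:
            data = os.read(fd, CHUNK)
        except OSError:
            errors += 1
            offset += CHUNK                    # skip past the bad region
            os.lseek(fd, offset, os.SEEK_SET)
            continue
        if not data:
            break
        offset += len(data)
    os.close(fd)
    print("scanned %d bytes, %d unreadable chunks" % (offset, errors))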

A different thing is disks with real write errors. Usually, disks have
spare (non-addressable, invisible) blocks to which they *internally*
remap bad sectors. An application (i.e. the operating system's kernel)
doesn't see this happen, so a disk that reports its first write error
can instantly be considered bricked: its spare area is used up, and
there is nowhere left to remap further problematic (user-visible)
blocks to.
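
How far along a drive is in eating up its spare blocks can be guessed
from its SMART counters. A rough sketch (assumes smartmontools is
installed and that the drive reports the usual ATA attribute names;
/dev/sda is just an example):

    #!/usr/bin/env python
    # Print a drive's remapping-related SMART attributes via smartctl.
    import subprocess, sys

    device = sys.argv[1] if len(sys.argv) > 1 else "/dev/sda"  # example path
    try:
        out = subprocess.check_output(["smartctl", "-A", device])
    except subprocess.CalledProcessError as e:
        out = e.output  # smartctl exits non-zero for some drive states

    for line in out.decode("ascii", "replace").splitlines():
        if "Reallocated_Sector_Ct" in line or "Current_Pending_Sector" in line:
            print(line)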

> I believe the same thing happens if you place file systems directly on the
> partitions; an I/O error on one of them will cause them all to be put in
> read-only mode.  Given that, unless you have other reasons to prefer many
> partitions, I think a single big partition makes more sense.

I haven't seen that in the wild either. Usually, individual partitions
(and their filesystems) seem to be handled truly independently, at
least with Linux. (Of course, a disk that died completely will make all
of its partitions/filesystems go boom.)

> > Running like 10 tahoe nodes on one physical HDD would create another
> > problem: what if all (or most) of the shares get stored on that
> > single HDD, all of them being lost in a single drive crash?
> 
> Yeah, multiple nodes on one HDD hugely increases the impact of a common
> failure mode.  I think it's just a bad idea.

...as long as you cannot direct Tahoe to distribute the shares across
disks, as I suggested on this mailing list the other day :)
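
For reference, the placement knobs Tahoe has today live in the [client]
section of tahoe.cfg; if I read the docs right, shares.happy counts
distinct servers, so it only helps once the storage nodes really are
separate machines/disks, not ten nodes on one HDD. The numbers below
are just the usual example encoding:

    [client]
    shares.needed = 3
    shares.happy = 7
    shares.total = 10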

Regards, JBG

-- 
      Jan-Benedict Glaw      jbglaw at lug-owl.de              +49-172-7608481
Signature of:              Lately I've had a bit too much of a reality check.
the second  :                 Slowly I'd like to be able to dream on again.
                             -- Maximilian Wilhelm (18 May 2005, #lug-owl.de)