[tahoe-dev] crash-only design (self-citation by zooko)

zooko zooko at zooko.com
Thu Jan 10 22:50:11 UTC 2008


Folks:

Since Rob is making it possible to run Tahoe as a Windows service, we  
are faced again with the issue of "crash-only design" versus "clean  
shutdown".  Brian mentioned today that he still wasn't happy with the  
fact that when servers get stopped or restarted they might corrupt  
any mutable file shares that they were in the middle of updating at  
that moment.

I looked at the relevant tickets (#181 and #200), and here I quote  
myself at my most eloquent:

"Since SDMFs get overwritten in their entirety each time, why is it  
more I/O expense? Oh, I know, because of the metadata such as leases.
...
"
I don't necessarily object to giving the process a SIGTERM warning  
before the SIGKILL. I think this change to crash-only was useful  
because it prompted us to think through these kinds of questions  
about intermediate persistent state, and because it led us to not  
waste developer time (and sysadmin time) on "clean shutdown" behavior  
that we didn't really need.

If letting the filesystem I/O buffers flush is some clean shutdown  
behavior that we *do* really need, I'm okay with that, as long as it  
doesn't mislead us into adding other (harder to maintain) behavior  
that we don't really need. (Or lead us to forget about the chance of  
leaving inconsistent persistent state that we don't know how to deal  
with afterward.)

Make sense?

I guess there are two orthogonal reasons why I like crash-only:

    1. force us to think about the effect of unclean shutdown
    2. don't add maintenance burden for something we can live without

Basically, I regard the behavior of Python, the operating system,  
file system, etc., in response to kill -15 $pid ; sleep 5 ; kill -9  
$pid as being easy for us to maintain. I regard the behavior of  
Twisted and any Python code of ours in response to that sequence as  
relatively hard to maintain. :-)

So overall I'm +1 on leaving the SIGKILL shutdown as is, in order to  
keep reminding us to think about consistency of intermediate  
persistent state, but I'm also +0 on adding a SIGTERM in order to let  
filesystem updates flush."
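
For concreteness, here is a rough Python sketch of the stop sequence
quoted above (kill -15 $pid ; sleep 5 ; kill -9 $pid). It is only an
illustration, not Tahoe's actual stop code: the name stop_process and
the grace_seconds parameter are made up for this example, and it
assumes a POSIX system where the process was started with
subprocess.Popen. Unlike the shell one-liner, it returns early if the
process exits before the grace period is over.

```python
import signal
import subprocess
import sys


def stop_process(proc, grace_seconds=5.0):
    """Hypothetical helper: SIGTERM warning first, SIGKILL if that is ignored.

    Returns signal.SIGTERM if the process exited within the grace period,
    or signal.SIGKILL if it had to be killed.
    """
    proc.send_signal(signal.SIGTERM)      # kill -15 $pid
    try:
        # Plays the role of "sleep 5", but returns as soon as the process exits.
        proc.wait(timeout=grace_seconds)
        return signal.SIGTERM
    except subprocess.TimeoutExpired:
        proc.kill()                       # kill -9 $pid
        proc.wait()                       # reap the killed process
        return signal.SIGKILL


if __name__ == "__main__":
    # Stand-in for a server process; it does not handle SIGTERM specially.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print(stop_process(child, grace_seconds=5.0))
```

Nothing in the sketch asks the server to do any cleanup of its own.
The SIGTERM only gives the operating system a few seconds before the
SIGKILL, so the server still has to cope with whatever persistent
state it finds the next time it starts.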


