[tahoe-dev] debugging Tahoe in its current form

Fri Jan 18 05:33:40 UTC 2008

robk mentions in the [1923] checkin..

> though curiously still my test logs a NotEnoughWritersError error, and I'm
> not currently able to fathom why that exception isn't leading to any
> overall failure of the unit test itself.

That unit test does two uploads: the first one is interrupted, the second one
resumes where it left off. The first one is supposed to fire the
NotEnoughWritersError, since it ran out of writers (i.e. uploading clients)
before the upload could be completed. The second one should complete
successfully. I think that's probably why you were still seeing that error
message.

> for general interest, a large part of the time spent trying to track this
> down was lost to the state of logging. I added a whole bunch of logging to
> try and track down where the tests were failing, but then spent a bunch of
> time searching in vain for that log output. as far as I can tell at this
> point the unit tests are themselves logging to foolscap's log module, but
> that isn't being directed anywhere, so all the test's logging is being
> black holed.

Ouch.. I'm sorry you had to go through all that pain. I saw the buildbot
failure about 3am last night, tried to fix it, but my patch didn't actually
work.

Tahoe's logging is in a funny state right now, because Foolscap's logging
code is in a funny state. I'm in the slow process of converting over a lot of
tahoe event logging to foolscap, because it gives much more structured
information that normal twisted.log (but only when foolscap event logging is
enabled). Here's my recipe for using foolscap event logging (aka "flogging")
with the current codebase (note that this will be easier in the future):

 When running the unit tests, if anything goes wrong, re-run them with:

  make test FLOGFILE=flog.out.bz2 FLOGTWISTED=1 FLOGLEVEL=1

   (if you want to run just a specific test, add TEST=allmydata.test.test_foo)

 that will write pickled log events to flog.out.bz2 . Then you can view those
 events with the 'flogtool' executable that is installed along with foolscap.
 (if your foolscap was built by tahoe's automatic 'make build-deps' step, then
 it can probably be found in support/bin/flogtool).

  flogtool dump flog.out.bz2

   this will just dump the log events to stdout, one per line. Each is
   emitted as a dictionary of events. Gross, but complete.

  flogtool web-viewer -p 9999 flog.out.bz2

   this will run a small web server on localhost and display the URL you
   should visit. From this view, you can see a hierarchical view of log
   events, with color-coded severities, and nicely-formatted tracebacks

While writing the offloaded-uploader, I've been making heavy use of the
web-viewer tool. I've also been making improvements to Foolscap's upstream to
make it easier to use. In particular there is now a way to remove all the
foolscap.negotiation events from the logfile, since it is likely that they'll
just get in the way. Here's the recipe:

  hg clone http://foolscap.lothar.com/repos/trunk ~/foolscap-trunk
  PYTHONPATH=~/foolscap-trunk make test FLOGFILE=flog.out.bz2 FLOGTWISTED=1 FLOGLEVEL=1
  ~/foolscap-trunk/bin/flogtool filter --strip-facility foolscap flog.out.bz2 flog2.out.bz2
  ~/foolscap-trunk/bin/flogtool web-viewer -p 9999 flog2.out.bz2

Eventually I plan to move this filtering functionality into the web-viewer
pages directly.

When you're running an actual node, you've got a few different ways to get at
the logged events:

  1. run the node with FLOGFILE=flog.out.bz2 like you do with the unit tests.
     Just be sure that when you stop the node, you do so gently (i.e. with
     SIGINT), so the logfile has a chance to get flushed before the process
     exits. The .bz2 output format does not flush events in a useful way any
     time before reactor shutdown. Note that you can use FLOGFILE=flog.out if
     you like, same information only not compressed, and flushing happens
     more frequently. Then use 'flogtool dump' or 'flogtool web-viewer' on the
     resulting file.

  2. run 'flogtool tail BASEDIR/private/logport.furl' . This will connect to
     the foolscap logport and subscribe to log events, then emit them to
     stdout. The presentation isn't quite as nice as web-viewer, but it's in
     real-time. Note that at the moment you don't get any events that
     occurred before you connected, so you'll miss the startup events. Within
     a week I should have the foolscap changes in place to let you "catch up"
     on events that were recorded before you connect.

  3. use a log-gatherer: run 'flogtool gather' in an empty directory, paste
     the FURL it gives you into your node's BASEDIR/log-gatherer.furl, then
     run the node normally. This will cause the node to connect to the log
     gatherer and send a copy of all its log messages to the gatherer. The
     gatherer will write all events to a log.pickle file, which can then be
     used by 'flogtool dump' or 'flogtool web-viewer'. We're using this on
     the test grid to gather log messages from all storage nodes in a single
     place.

My plan is to change both the way we run the unit tests and the way we start
up a node to provide a simple switch that causes flogging to write all log
events to a simple text file, to bring back the functionality of twistd.log
or _trial_temp/test.log . But it will still put the structured data somewhere
useful, so you can use tools like 'flogtool web-viewer' and get better
insight into what's really going on.

I'm equally frustrated with the current half-finished state of foolscap's
logging code.. I apologize for the time it's cost you!

Also, for the record, while debugging a problem in the unit tests, I usually
add a lot of "print" messages all over the place. These go directly to stdout
during unit tests (although in a node running under twistd they get sent to
the log instead), and are very handy. The twisted.log or flog messages that
are most useful to me are the Unhandled Error in Deferred ones. I remove the
print messages before checking anything in, of course.

cheers,
 -Brian

[1923]: http://allmydata.org/trac/tahoe/changeset/1923