Brian Warner [Thu, 19 Mar 2009 18:07:56 +0000 (11:07 -0700)]
storage status: report expiration-cutoff-date like 19-Mar-2009 (as opposed to the tahoe.cfg input format of 2009-03-19), for redundancy: someone who gets the month and day switched will have a better chance to spot the problem in the storage-status output if it's in a different format
Brian Warner [Thu, 19 Mar 2009 00:58:14 +0000 (17:58 -0700)]
util/time_format: new routine to parse dates like 2009-03-18, switch expirer to use it. I'd prefer to use 18-Mar-2009, but it is surprisingly non-trivial to build a parser that will take UTC dates instead of local dates
Brian Warner [Mon, 16 Mar 2009 00:51:34 +0000 (17:51 -0700)]
test_deepcheck: remove the 10s timeout: our dapper buildslave requires 30s, and the reduced timeout was only there because this tests fails by timeout rather than explicitly
Brian Warner [Sun, 15 Mar 2009 23:19:58 +0000 (16:19 -0700)]
tahoe cp -r: add --caps-only flag, to write filecaps into local files instead of actual file contents. Used only for debugging and as a quick tree-comparison tool.
Brian Warner [Fri, 13 Mar 2009 23:31:35 +0000 (16:31 -0700)]
dirnode deep_traverse: insert a turn break (fireEventually) at least once every 100 files, otherwise a CHK followed by more than 158 LITs can overflow the stack, sort of like #237.
Brian Warner [Thu, 12 Mar 2009 20:56:06 +0000 (13:56 -0700)]
add 'tahoe debug consolidate' command, to merge directories created by repeated 'tahoe cp -r' or the allmydata win32 backup tool, into the form that would have been created by 'tahoe backup'.
nodeadmin: node stops itself if a hotline file hasn't been touched in 120 seconds now, instead of in 60 seconds
A test failed on draco (MacPPC) because it took 67.1 seconds to get around to running the test, and the node had already stopped itself when the hotline file was 60 seconds old.
Brian Warner [Sat, 7 Mar 2009 11:56:01 +0000 (04:56 -0700)]
web: when a dirnode can't be read, emit a regular HTML page but with the child-table and upload-forms replaced with an apologetic message. Make sure to include the 'get info' links so the user can do a filecheck
Brian Warner [Sat, 7 Mar 2009 11:54:08 +0000 (04:54 -0700)]
web/common: split out exception-to-explanation+code mapping to a separate humanize_failure() function, so it can be used by other code. Add explanation for mutable UnrecoverableFileError.
Brian Warner [Sat, 7 Mar 2009 05:45:17 +0000 (22:45 -0700)]
storage: add a lease-checker-and-expirer crawler, plus web status page.
This walks slowly through all shares, examining their leases, deciding which
are still valid and which have expired. Once enabled, it will then remove the
expired leases, and delete shares which no longer have any valid leases. Note
that there is not yet a tahoe.cfg option to enable lease-deletion: the
current code is read-only. A subsequent patch will add a tahoe.cfg knob to
control this, as well as docs. Some other minor items included in this patch:
tahoe debug dump-share has a new --leases-only flag
storage sharefile/leaseinfo code is cleaned up
storage web status page (/storage) has more info, more tests coverage
space-left measurement on OS-X should be more accurate (it was off by 2048x)
(use stat .f_frsize instead of f_bsize)
Brian Warner [Fri, 27 Feb 2009 08:04:26 +0000 (01:04 -0700)]
servermap add-lease: fix the code that's supposed to catch remote IndexErrors, I forgot that they present as ServerFailures instead. This should stop the deluge of Incidents that occur when you do add-lease against 1.3.0 servers
Brian Warner [Wed, 25 Feb 2009 04:42:13 +0000 (21:42 -0700)]
#165: make 'tahoe restart --force' the default behavior: warn but do not stop if restart is used on something that wasn't a running node, and always try to start it afterwards. This is particularly important for #315 (restart -m), because otherwise a single not-already-running node will prevent all nodes from being restarted, resulting in longer downtime than necessary
Brian Warner [Wed, 25 Feb 2009 00:56:20 +0000 (17:56 -0700)]
test_cli: exercise the recent tolerate-'c:\dir\file.txt' fix in scripts/common, recorded in a separate match to make it easier to merge the fix to prod
Brian Warner [Tue, 24 Feb 2009 22:40:17 +0000 (15:40 -0700)]
test_web: add (disabled) test to see what happens when deep-check encounters an unrecoverable directory. We still need code changes to improve this behavior.
Brian Warner [Tue, 24 Feb 2009 05:14:05 +0000 (22:14 -0700)]
immutable/checker.py: trap ShareVersionIncompatible too. Also, use f.check
instead of examining the value returned by f.trap, because the latter appears
to squash exception types down into their base classes (i.e. since
ShareVersionIncompatible is a subclass of LayoutInvalid,
f.trap(Failure(ShareVersionIncompatible)) == LayoutInvalid).
All this resulted in 'incompatible' shares being misclassified as 'corrupt'.
Brian Warner [Tue, 24 Feb 2009 00:42:27 +0000 (17:42 -0700)]
test_repairer: change Repairer to use much-faster no_network.GridTestMixin. As a side-effect, fix what I think was a bug: some of the assert-minimal-effort-expended checks were mixing write counts and allocate counts