From: Zooko O'Whielacronx
Date: Fri, 19 Aug 2011 04:41:03 +0000 (-0700)
Subject: docs: reflow docs/logging.rst to fill-column 77
X-Git-Tag: pre-393~8
X-Git-Url: https://git.rkrishnan.org/reliability?a=commitdiff_plain;h=41999430e04f96ead8fd67c5629a6aec0f4409ef;p=tahoe-lafs%2Ftahoe-lafs.git

docs: reflow docs/logging.rst to fill-column 77
---
diff --git a/docs/logging.rst b/docs/logging.rst
index 247429ae..5a24700e 100644
--- a/docs/logging.rst
+++ b/docs/logging.rst
@@ -26,9 +26,9 @@ went wrong.
 The foolscap logging system is documented at
 ``_.
 
-The foolscap distribution includes a utility named "``flogtool``" (usually
-at ``/usr/bin/flogtool`` on Unix) which is used to get access to many
-foolscap logging features.
+The foolscap distribution includes a utility named "``flogtool``" (usually at
+``/usr/bin/flogtool`` on Unix) which is used to get access to many foolscap
+logging features.
 
 Realtime Logging
 ================
@@ -45,11 +45,10 @@ this port and start emitting log information::
   flogtool tail BASEDIR/private/logport.furl
 
 The ``--save-to FILENAME`` option will save all received events to a file,
-where then can be examined later with "``flogtool dump``" or
-"``flogtool web-viewer``". The ``--catch-up`` option will ask the node to
-dump all stored events before subscribing to new ones (without ``--catch-up``,
-you will only hear about events that occur after the tool has connected and
-subscribed).
+where then can be examined later with "``flogtool dump``" or "``flogtool
+web-viewer``". The ``--catch-up`` option will ask the node to dump all stored
+events before subscribing to new ones (without ``--catch-up``, you will only
+hear about events that occur after the tool has connected and subscribed).
 
 Incidents
 =========
@@ -65,18 +64,18 @@ level or higher, but other criteria could be implemented.
 The typical "incident report" we've seen in a large Tahoe grid is about 40kB
 compressed, representing about 1800 recent events.
 
-These "flogfiles" have a similar format to the files saved by
-"``flogtool tail --save-to``". They are simply lists of log events, with a
-small header to indicate which event triggered the incident.
+These "flogfiles" have a similar format to the files saved by "``flogtool
+tail --save-to``". They are simply lists of log events, with a small header
+to indicate which event triggered the incident.
 
 The "``flogtool dump FLOGFILE``" command will take one of these ``.flog.bz2``
 files and print their contents to stdout, one line per event. The raw event
 dictionaries can be dumped by using "``flogtool dump --verbose FLOGFILE``".
 
-The "``flogtool web-viewer``" command can be used to examine the flogfile
-in a web browser. It runs a small HTTP server and emits the URL on stdout.
-This view provides more structure than the output of "``flogtool dump``":
-the parent/child relationships of log events is displayed in a nested format.
+The "``flogtool web-viewer``" command can be used to examine the flogfile in
+a web browser. It runs a small HTTP server and emits the URL on stdout. This
+view provides more structure than the output of "``flogtool dump``": the
+parent/child relationships of log events is displayed in a nested format.
 "``flogtool web-viewer``" is still fairly immature.
 
 Working with flogfiles
@@ -86,10 +85,10 @@ The "``flogtool filter``" command can be used to take a large flogfile
 (perhaps one created by the log-gatherer, see below) and copy a subset of
 events into a second file. This smaller flogfile may be easier to work with
 than the original. The arguments to "``flogtool filter``" specify filtering
-criteria: a predicate that each event must match to be copied into the
-target file. ``--before`` and ``--after`` are used to exclude events outside
-a given window of time. ``--above`` will retain events above a certain
-severity level. ``--from`` retains events send by a specific tubid.
+criteria: a predicate that each event must match to be copied into the target
+file. ``--before`` and ``--after`` are used to exclude events outside a given
+window of time. ``--above`` will retain events above a certain severity
+level. ``--from`` retains events send by a specific tubid.
 ``--strip-facility`` removes events that were emitted with a given facility
 (like ``foolscap.negotiation`` or ``tahoe.upload``).
 
@@ -102,8 +101,8 @@ modest) storage requirements to a different host and provides access to
 logfiles from multiple nodes (web-API, storage, or helper) in a single place.
 
 There are two kinds of gatherers: "log gatherer" and "stats gatherer". Each
-produces a FURL which needs to be placed in the ``NODEDIR/tahoe.cfg`` file
-of each node that is to publish to the gatherer, under the keys
+produces a FURL which needs to be placed in the ``NODEDIR/tahoe.cfg`` file of
+each node that is to publish to the gatherer, under the keys
 "log_gatherer.furl" and "stats_gatherer.furl" respectively. When the Tahoe
 node starts, it will connect to the configured gatherers and offer its
 logport: the gatherer will then use the logport to subscribe to hear about
@@ -127,35 +126,38 @@ provided in ``misc/incident-gatherer/support_classifiers.py`` . There is
 roughly one category for each ``log.WEIRD``-or-higher level event in the
 Tahoe source code.
 
-The incident gatherer is created with the "``flogtool create-incident-gatherer
-WORKDIR``" command, and started with "``tahoe start``". The generated
-"``gatherer.tac``" file should be modified to add classifier functions.
+The incident gatherer is created with the "``flogtool
+create-incident-gatherer WORKDIR``" command, and started with "``tahoe
+start``". The generated "``gatherer.tac``" file should be modified to add
+classifier functions.
 
 The incident gatherer writes incident names (which are simply the relative
 pathname of the ``incident-\*.flog.bz2`` file) into ``classified/CATEGORY``.
 For example, the ``classified/mutable-retrieve-uncoordinated-write-error``
-file contains a list of all incidents which were triggered by an uncoordinated
-write that was detected during mutable file retrieval (caused when somebody
-changed the contents of the mutable file in between the node's mapupdate step
-and the retrieve step). The ``classified/unknown`` file contains a list of all
-incidents that did not match any of the classification functions.
+file contains a list of all incidents which were triggered by an
+uncoordinated write that was detected during mutable file retrieval (caused
+when somebody changed the contents of the mutable file in between the node's
+mapupdate step and the retrieve step). The ``classified/unknown`` file
+contains a list of all incidents that did not match any of the classification
+functions.
 
 At startup, the incident gatherer will automatically reclassify any incident
 report which is not mentioned in any of the ``classified/\*`` files. So the
 usual workflow is to examine the incidents in ``classified/unknown``, add a
 new classification function, delete ``classified/unknown``, then bound the
 gatherer with "``tahoe restart WORKDIR``". The incidents which can be
-classified with the new functions will be added to their own ``classified/FOO``
-lists, and the remaining ones will be put in ``classified/unknown``, where
-the process can be repeated until all events are classifiable.
+classified with the new functions will be added to their own
+``classified/FOO`` lists, and the remaining ones will be put in
+``classified/unknown``, where the process can be repeated until all events
+are classifiable.
 
 The incident gatherer is still fairly immature: future versions will have a
 web interface and an RSS feed, so operations personnel can track problems in
 the storage grid.
 
-In our experience, each incident takes about two seconds to transfer from
-the node that generated it to the gatherer. The gatherer will automatically
-catch up to any incidents which occurred while it is offline.
+In our experience, each incident takes about two seconds to transfer from the
+node that generated it to the gatherer. The gatherer will automatically catch
+up to any incidents which occurred while it is offline.
 
 Log Gatherer
 ------------
@@ -170,12 +172,11 @@ contain events from many different sources, making it easier to correlate
 things that happened on multiple machines (such as comparing a client node
 making a request with the storage servers that respond to that request).
 
-Create the Log Gatherer with the "``flogtool create-gatherer
-WORKDIR``" command, and start it with "``tahoe start``". Then copy the
-contents of the ``log_gatherer.furl`` file it creates into the
-``BASEDIR/tahoe.cfg`` file (under the key ``log_gatherer.furl`` of the
-section ``[node]``) of all nodes that should be sending it log
-events. (See ``_.)
+Create the Log Gatherer with the "``flogtool create-gatherer WORKDIR``"
+command, and start it with "``tahoe start``". Then copy the contents of the
+``log_gatherer.furl`` file it creates into the ``BASEDIR/tahoe.cfg`` file
+(under the key ``log_gatherer.furl`` of the section ``[node]``) of all nodes
+that should be sending it log events. (See ``_.)
 
 The "``flogtool filter``" command, described above, is useful to cut down the
 potentially large flogfiles into a more focussed form.
@@ -194,11 +195,11 @@ Local twistd.log files
 
 In addition to the foolscap-based event logs, certain high-level events will
 be recorded directly in human-readable text form, in the
-``BASEDIR/logs/twistd.log`` file (and its rotated old versions: ``twistd.log.1``,
-``twistd.log.2``, etc). This form does not contain as much information as the
-flogfiles available through the means described previously, but they are
-immediately available to the curious developer, and are retained until the
-twistd.log.NN files are explicitly deleted.
+``BASEDIR/logs/twistd.log`` file (and its rotated old versions:
+``twistd.log.1``, ``twistd.log.2``, etc). This form does not contain as much
+information as the flogfiles available through the means described
+previously, but they are immediately available to the curious developer, and
+are retained until the twistd.log.NN files are explicitly deleted.
 
 Only events at the ``log.OPERATIONAL`` level or higher are bridged to
 ``twistd.log`` (i.e. not the ``log.NOISY`` debugging events). In addition,
@@ -224,22 +225,22 @@ but a few notes are worth stating here:
   clustered with its parent. For example, a download process that involves
   three sequential hash fetches could announce the send and receipt of those
   hash-fetch messages with a ``parent=`` argument that ties them to the
-  overall download process. However, each new web-API download request
-  should be unparented.
+  overall download process. However, each new web-API download request should
+  be unparented.
 
 * use the ``format=`` argument in preference to the ``message=`` argument.
   E.g. use ``log.msg(format="got %(n)d shares, need %(k)d", n=n, k=k)``
   instead of ``log.msg("got %d shares, need %d" % (n,k))``. This will allow
-  later tools to analyze the event without needing to scrape/reconstruct
-  the structured data out of the formatted string.
+  later tools to analyze the event without needing to scrape/reconstruct the
+  structured data out of the formatted string.
 
 * Pass extra information as extra keyword arguments, even if they aren't
   included in the ``format=`` string. This information will be displayed in
   the "``flogtool dump --verbose``" output, as well as being available to
   other tools. The ``umid=`` argument should be passed this way.
 
-* use ``log.err`` for the catch-all ``addErrback`` that gets attached to
-  the end of any given Deferred chain. When used in conjunction with
+* use ``log.err`` for the catch-all ``addErrback`` that gets attached to the
+  end of any given Deferred chain. When used in conjunction with
   ``LOGTOTWISTED=1``, ``log.err()`` will tell Twisted about the error-nature
   of the log message, causing Trial to flunk the test (with an "ERROR"
   indication that prints a copy of the Failure, including a traceback).
@@ -270,12 +271,12 @@ by running the tests like this::
   make test FLOGFILE=flog.out.bz2 FLOGLEVEL=1 FLOGTOTWISTED=1
 
 The first environment variable will cause foolscap log events to be written
-to ``./flog.out.bz2`` (instead of merely being recorded in the circular buffers
-for the use of remote subscribers or incident reports). The second will cause
-all log events to be written out, not just the higher-severity ones. The
-third will cause twisted log events (like the markers that indicate when each
-unit test is starting and stopping) to be copied into the flogfile, making it
-easier to correlate log events with unit tests.
+to ``./flog.out.bz2`` (instead of merely being recorded in the circular
+buffers for the use of remote subscribers or incident reports). The second
+will cause all log events to be written out, not just the higher-severity
+ones. The third will cause twisted log events (like the markers that indicate
+when each unit test is starting and stopping) to be copied into the flogfile,
+making it easier to correlate log events with unit tests.
 
 Enabling this form of logging appears to roughly double the runtime of the
 unit tests. The ``flog.out.bz2`` file is approximately 2MB.
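
The logging notes in this patch recommend ``log.msg(format=..., n=n, k=k)`` over a pre-formatted ``message=`` string so that later tools can read the structured values. A minimal, self-contained Python sketch of that idea follows. It does not use foolscap: ``make_event`` and ``render`` are hypothetical helpers, and the assumption that an event is a plain dictionary holding the format string and its keyword arguments as separate fields is illustrative only, not foolscap's actual event structure.

```python
# Hypothetical sketch: NOT foolscap's API or event layout.
# It only illustrates why keeping format= and its arguments separate
# is friendlier to later tools than a pre-formatted message= string.


def make_event(format=None, message=None, **kwargs):
    """Build a log-event dict; extra keyword arguments become fields."""
    event = dict(kwargs)
    if format is not None:
        event["format"] = format
    if message is not None:
        event["message"] = message
    return event


def render(event):
    """Produce the human-readable text for an event."""
    if "format" in event:
        # %-style named placeholders are filled from the event's own fields.
        return event["format"] % event
    return event.get("message", "")


n, k = 3, 10
structured = make_event(format="got %(n)d shares, need %(k)d", n=n, k=k)
flat = make_event(message="got %d shares, need %d" % (n, k))

print(render(structured))  # got 3 shares, need 10
print(render(flat))        # got 3 shares, need 10

# A later tool can test the structured fields directly...
print(structured["n"] < structured["k"])  # True
# ...whereas the flat event only carries a string that would need scraping.
print("n" in flat)  # False
```

Both events render to the same text, but only the structured one lets a filter compare ``n`` and ``k`` without parsing the string back apart.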