From: warner-tahoe <warner-tahoe@allmydata.com> Date: Wed, 8 Aug 2007 01:55:47 +0000 (-0700) Subject: update foolscap to foolscap-0.1.5, the latest release X-Git-Url: https://git.rkrishnan.org/pf/vdrive/somewhere?a=commitdiff_plain;h=afe006d70007edd7ae79ce77408e01daabed5f6d;p=tahoe-lafs%2Ftahoe-lafs.git update foolscap to foolscap-0.1.5, the latest release --- diff --git a/src/foolscap/ChangeLog b/src/foolscap/ChangeLog index bade5f83..9c887ab9 100644 --- a/src/foolscap/ChangeLog +++ b/src/foolscap/ChangeLog @@ -1,3 +1,129 @@ +2007-08-07 Brian Warner <warner@lothar.com> + + * foolscap/__init__.py: release Foolscap-0.1.5 + * misc/{sid|sarge|dapper|edgy|feisty}/debian/changelog: same + +2007-08-07 Brian Warner <warner@lothar.com> + + * NEWS: update for the upcoming release + + * foolscap/pb.py (Tub.registerNameLookupHandler): new function to + augment Tub.registerReference(). This allows names to be looked up + at request time, rather than requiring all Referenceables be + pre-registered with registerReference(). The chief use of this + would be for FURLs which point at objects that live on disk in + some persistent state until they are needed. Closes #6. + (Tub.unregisterNameLookupHandler): allow handlers to be removed + (Tub.getReferenceForName): use the handler during lookup + * foolscap/test/test_tub.py (NameLookup): test it + +2007-07-27 Brian Warner <warner@lothar.com> + + * foolscap/referenceable.py (LocalReferenceable): implement an + adapter that allows code to do IRemoteReference(t).callRemote(...) + and have it work for both RemoteReferences and local + Referenceables. You might want to do this if you're getting back + introductions to a variety of remote Referenceables, some of which + might actually be on your local system, and you want to treat all + of them the same way. Local Referenceables will be wrapped with a + class that implements callRemote() and makes it behave like an + actual remote callRemote() would. Closes ticket #1. 
+ * foolscap/test/test_reference.py (LocalReference): test it + +2007-07-26 Brian Warner <warner@lothar.com> + + * foolscap/call.py (AnswerUnslicer.receiveChild): accept a + ready_deferred, to accommodate Gifts in return values. Closes #5. + (AnswerUnslicer.receiveClose): .. and don't fire the response + until any such Gifts resolve + * foolscap/test/test_gifts.py (Gifts.testReturn): test it + (Gifts.testReturnInContainer): same + (Bad.testReturn_swissnum): and test the failure case too + + * foolscap/test/test_pb.py (TestAnswer.testAccept1): fix a test + which wasn't calling start() properly and was broken by that change + (TestAnswer.testAccept2): same + + * foolscap/test/test_gifts.py (Bad.setUp): disable these tests when + we don't have crypto, since TubIDs are not mangleable in the same + way without crypto. + + * foolscap/slicer.py (BaseUnslicer.receiveChild): new convention: + Unslicers should accumulate their children's ready_deferreds into + an AsyncAND, and pass it to the parent. If something goes wrong, + the ready_deferred should errback, which will abandon the method + call that contains it. + * foolscap/slicers/dict.py (DictUnslicer.receiveClose): same + * foolscap/slicers/tuple.py (TupleUnslicer.receiveClose): same + (TupleUnslicer.complete): same + * foolscap/slicers/set.py (SetUnslicer.receiveClose): same + * foolscap/slicers/list.py (ListUnslicer.receiveClose): same + * foolscap/call.py (CallUnslicer.receiveClose): same + + * foolscap/referenceable.py (TheirReferenceUnslicer.receiveClose): + use our ready_deferred to signal whether the gift resolves + correctly or not. If it fails, errback ready_deferred (to prevent + the message from being delivered without the resolved gift), but + callback obj_deferred with a placeholder to avoid causing too much + distress to the container. + + * foolscap/broker.py (PBRootUnslicer.receiveChild): accept + ready_deferred in the InboundDelivery, stash both of them in the + broker. 
(Broker.scheduleCall): rewrite inbound delivery handling: use a + self._call_is_running flag to prevent concurrent deliveries, and + wait for the ready_deferred before delivering the top-most + message. If the ready_deferred errbacks, that gets routed to + self.callFailed so the caller hears about the problem. This closes + ticket #2. + + * foolscap/call.py (InboundDelivery): remove whenRunnable, relying + upon the ready_deferred to let the Broker know when the message + can be delivered. + (ArgumentUnslicer): significant cleanup, using ready_deferred. + Remove isReady and whenReady. + + * foolscap/test/test_gifts.py (Base): factor setup code out + (Base.createCharacters): registerReference(tubname), for debugging + (Bad): add a bunch of tests to make sure that gifts which fail to + resolve (for various reasons) will inform the caller about the + problem, via an errback on the original callRemote()'s Deferred. + +2007-07-25 Brian Warner <warner@lothar.com> + + * foolscap/util.py (AsyncAND): new utility class, which is like + DeferredList but is specifically for control flow rather than data + flow. + * foolscap/test/test_util.py: test it + + * foolscap/call.py (CopiedFailure.setCopyableState): set .type to + a class that behaves (at least as far as reflect.qual() is + concerned) just like the original exception class. This improves + the behavior of derived Failure objects, as well as trial's + handling of CopiedFailures that get handed to log.err(). + CopiedFailures are now a bit more like actual Failures. See ticket + #4 (http://foolscap.lothar.com/trac/ticket/4) for more details. + (CopiedFailureSlicer): make sure that CopiedFailures can be + serialized, so that A-calls-B-calls-C can return a failure all + the way back. + * foolscap/test/test_call.py (TestCall.testCopiedFailure): test it + * foolscap/test/test_copyable.py: update to match, now we must + compare reflect.qual(f.type) against some extension classname, + rather than just f.type. 
+ * foolscap/test/test_pb.py: same + * foolscap/test/common.py: same + +2007-07-15 Brian Warner <warner@lothar.com> + + * foolscap/test/test_interfaces.py (TestInterface.testStack): + don't look for a '/' in the stacktrace, since it won't be there + under windows. Thanks to 'strank'. Closes Twisted#2731. + +2007-06-29 Brian Warner <warner@lothar.com> + + * foolscap/__init__.py: bump revision to 0.1.4+ while between releases + * misc/{sid|sarge|dapper|edgy|feisty}/debian/changelog: same + 2007-05-14 Brian Warner <warner@lothar.com> * foolscap/__init__.py: release Foolscap-0.1.4 diff --git a/src/foolscap/Makefile b/src/foolscap/Makefile index d1c1612e..99496904 100644 --- a/src/foolscap/Makefile +++ b/src/foolscap/Makefile @@ -53,3 +53,4 @@ docs: lore -p --config template=$(DOC_TEMPLATE) --config ext=.html \ `find doc -name '*.xhtml'` + diff --git a/src/foolscap/NEWS b/src/foolscap/NEWS index f45ecb20..839c6de9 100644 --- a/src/foolscap/NEWS +++ b/src/foolscap/NEWS @@ -1,5 +1,66 @@ User visible changes in Foolscap (aka newpb/pb2). -*- outline -*- +* Release 0.1.5 (07 Aug 2007) + +** Compatibility + +This release is fully compatible with 0.1.4 and 0.1.3 . + +** CopiedFailure improvements + +When a remote method call fails, the calling side gets back a CopiedFailure +instance. These instances now behave slightly more like the (local) Failure +objects that they are intended to mirror, in that .type now behaves much like +the original class. This should allow trial tests which result in a +CopiedFailure to be logged without exploding. In addition, chained failures +(where A calls B, and B calls C, and C fails, so C's Failure is eventually +returned back to A) should work correctly now. + +** Gift improvements + +Gifts inside return values should properly stall the delivery of the response +until the gift is resolved. Gifts in all sorts of containers should work +properly now. 
Gifts which cannot be resolved successfully (either because the +hosting Tub cannot be reached, or because the name cannot be found) will now +cause a proper error rather than hanging forever. Unresolvable gifts in +method arguments will cause the message to not be delivered and an error to +be returned to the caller. Unresolvable gifts in method return values will +cause the caller to receive an error. + +** IRemoteReference() adapter + +The IRemoteReference() interface now has an adapter from Referenceable which +creates a wrapper that enables the use of callRemote() and other +IRemoteReference methods on a local object. + +This might be useful when you have a central introducer and a bunch of +clients, the clients are introducing themselves to each other (to create a +fully-connected mesh), and the introductions are using live references (i.e. +Gifts). When a specific client learns about itself from the introducer, that +client will receive a local object instead of a RemoteReference. Each client +will wind up with n-1 RemoteReferences and a single local object. + +This adapter allows the client to treat all these introductions as equal. A +client that wishes to send a message to everyone it's been introduced to +(including itself) can use: + + for i in introductions: + IRemoteReference(i).callRemote("hello", args) + +In the future, if we implement coercing Guards (instead of +compliance-asserting Constraints), then IRemoteReference will be useful as a +guard on methods that want to ensure that they can do callRemote (and +notifyOnDisconnect, etc) on their argument. + +** Tub.registerNameLookupHandler + +This method allows a one-argument name-lookup callable to be attached to the +Tub. This augments the table maintained by Tub.registerReference, allowing +Referenceables to be created on the fly, or persisted/retrieved on disk +instead of requiring all of them to be generated and registered at startup. 
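The lookup order described above (the registerReference table first, then the request-time handlers) can be sketched with a minimal stand-in class; this is an illustration of the mechanism, not foolscap's actual implementation, and MiniTub and its attribute names are invented for the sketch.

```python
# Minimal stand-in for the new Tub lookup behavior (sketch only, not
# foolscap's real code): names registered via registerReference() win,
# and otherwise each name-lookup handler is consulted at request time.
class MiniTub:
    def __init__(self):
        self.registered = {}   # name -> object, via registerReference()
        self.handlers = []     # one-argument callables: name -> object or None

    def registerReference(self, obj, name):
        self.registered[name] = obj

    def registerNameLookupHandler(self, handler):
        self.handlers.append(handler)

    def unregisterNameLookupHandler(self, handler):
        self.handlers.remove(handler)

    def getReferenceForName(self, name):
        if name in self.registered:
            return self.registered[name]
        for handler in self.handlers:
            obj = handler(name)        # looked up at request time
            if obj is not None:
                return obj
        raise KeyError(name)

tub = MiniTub()
tub.registerReference("static-obj", "alice")
on_disk = {"bob": "revived-obj"}       # stands in for persisted state
tub.registerNameLookupHandler(lambda name: on_disk.get(name))
assert tub.getReferenceForName("alice") == "static-obj"
assert tub.getReferenceForName("bob") == "revived-obj"
```

The point of the handler is that "bob" never had to be instantiated or registered at startup; it is produced only when someone asks for it.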
+ + * Release 0.1.4 (14 May 2007) ** Compatibility diff --git a/src/foolscap/doc/jobs.txt b/src/foolscap/doc/jobs.txt new file mode 100644 index 00000000..d671b466 --- /dev/null +++ b/src/foolscap/doc/jobs.txt @@ -0,0 +1,619 @@ +-*- outline -*- + +Reasonably independent newpb sub-tasks that need doing. Most important come +first. + +* decide on a version negotiation scheme + +Should be able to telnet into a PB server and find out that it is a PB +server. Pointing a PB client at an HTTP server (or an HTTP client at a PB +server) should result in an error, not a timeout. Implement in +banana.Banana.connectionMade(). + +desiderata: + + negotiation should take place with regular banana sequences: don't invent a + new protocol that is only used at the start of the connection + + Banana should be useable one-way, for storage or high-latency RPC (the mnet + folks want to create a method call, serialize it to a string, then encrypt + and forward it on to other nodes, sometimes storing it in relays along the + way if a node is offline for a few days). It should be easy for the layer + above Banana to feed it the results of what its negotiation would have been + (if it had actually used an interactive connection to its peer). Feeding the + same results to both sides should have them proceed as if they'd agreed to + those results. + + negotiation should be flexible enough to be extended but still allow old + code to talk with new code. Magically predict every conceivable extension + and provide for it from the very first release :). + +There are many levels to banana, all of which could be useful targets of +negotiation: + + which basic tokens are in use? Is there a BOOLEAN token? a NONE token? Can + it accept a LONGINT token or is the target limited to 32-bit integers? + + are there any variations in the basic Banana protocol being used? 
Could the + smaller-scope OPEN-counter decision be deferred until after the first + release and handled later with a compatibility negotiation flag? + + What "base" OPEN sequences are known? 'unicode'? 'boolean'? 'dict'? This is + an overlap between expressing the capabilities of the host language, the + Banana implementation, and the needs of the application. How about + 'instance', probably only used for StorageBanana? + + What "top-level" OPEN sequences are known? PB stuff (like 'call', and + 'your-reference')? Are there any variations or versions that need to be + known? We may add new functionality in the future, it might be useful for + one end to know whether this functionality is available or not. (the PB + 'call' sequence could some day take numeric argument names to convey + positional parameters, a 'reference' sequence could take a string to + indicate globally-visible PB URLs, it could become possible to pass + target.remote_foo directly to a peer and have a callable RemoteMethod object + pop out the other side). + + What "application-level" sequences are available? (Which RemoteInterface + classes are known and valid in 'call' sequences? Which RemoteCopy names are + valid for targets of the 'copy' sequence?). This is not necessarily within + the realm of Banana negotiation, but applications may need to negotiate this + sort of thing, and any disagreements will be manifested when Banana starts + raising Violations, so it may be useful to include it in the Banana-level + negotiation. + +On the other hand, negotiation is only useful if one side is prepared to +accommodate a peer which cannot do some of the things it would prefer to use, +or if it wants to know about the limitations so it can report a useful +failure rather than have an obscure protocol-level error message pop up an +hour later. So negotiation isn't the only goal: simple capability awareness +is a useful lesser goal. 
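One simple shape for this kind of capability awareness is a client-offers, server-decides exchange over sets of capability names. The sketch below is purely illustrative (the capability names and the preference-order rule are invented here, not taken from foolscap):

```python
# Illustrative "client offers, server decides" capability selection.
# server_supports is listed in the server's preference order; the server
# picks its most-preferred capability that the client also offered.
def server_decide(client_offers, server_supports):
    common = [c for c in server_supports if c in client_offers]
    if not common:
        raise ValueError("no common capabilities")
    return common[0]

assert server_decide({"banana-v2", "banana-v1"},
                     ["banana-v2", "banana-v1"]) == "banana-v2"
assert server_decide({"banana-v1"},
                     ["banana-v2", "banana-v1"]) == "banana-v1"
```

Because the inputs are plain data, the same decision can be fed to both sides of a one-way (stored/high-latency) connection, matching the desideratum above.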
+ +It kind of makes sense for the first object of a stream to be a negotiation +blob. We could make a new 'version' opentype, and declare that the contents +will be something simple and forever-after-parseable (like a dict, with heavy +constraints on the keys and values, all strings emitted in full). + +DONE, at least the framework is in place. Uses HTTP-style header-block +exchange instead of banana sequences, with client-sends-first and +server-decides. This correctly handles PB-vs-HTTP, but requires a timeout to +detect oldpb clients vs newpb servers. No actual feature negotiation is +performed yet, because we still only have the one version of the code. + +* connection initiation + +** define PB URLs + +[newcred is the most important part of this, the URL stuff can wait] + +A URL defines an endpoint: a pb.Referenceable, with methods. Somewhere along +the way it defines a transport (tcp+host+port, or unix+path) and an object +reference (pathname). It might also define a RemoteInterface, or that might +be put off until we actually invoke a method. + + URL = f("pb:", host, port, pathname) + d = pb.callRemoteURL(URL, ifacename, methodname, args) + +probably give an actual RemoteInterface instead of just its name + +a pb.RemoteReference claims to provide access to zero-or-more +RemoteInterfaces. You may choose which one you want to use when invoking +callRemote. + +TODO: decide upon a syntax for URLs that refer to non-TCP transports + pb+foo://stuff, pby://stuff (for yURL-style self-authenticating names) + +TODO: write the URL parser, implementing pb.getRemoteURL and pb.callRemoteURL + DONE: use a Tub/PBService instead + +TODO: decide upon a calling convention for callRemote when specifying which +RemoteInterface is being used. + + +DONE, PB-URL is the way to go. +** more URLs + +relative URLs (those without a host part) refer to objects on the same +Broker. Absolute URLs (those with a host part) refer to objects on other +Brokers. 
+ +SKIP, interesting but not really useful + +** build/port pb.login: newcred for newpb + +Leave cred work for Glyph. + +<thomasvs> has some enhanced PB cred stuff (challenge/response, pb.Copyable +credentials, etc). + +URL = pb.parseURL("pb://lothar.com:8789/users/warner/services/petmail", + IAuthorization) +URL = doFullLogin(URL, "warner", "x8yzzy") +URL.callRemote(methodname, args) + +NOTDONE + +* constrain ReferenceUnslicer properly + +The schema can use a ReferenceConstraint to indicate that the object must be +a RemoteReference, and can also require that the remote object be capable of +handling a particular Interface. + +This needs to be implemented. slicer.ReferenceUnslicer must somehow actually +ask the constraint about the incoming tokens. + +An outstanding question is "what counts". The general idea is that +RemoteReferences come over the wire as a connection-scoped ID number and an +optional list of Interface names (strings and version numbers). In this case +it is the far end which asserts that its object can implement any given +Interface, and the receiving end just checks to see if the schema-imposed +required Interface is in the list. + +This becomes more interesting when applied to local objects, or if a +constraint is created which asserts that its object is *something* (maybe a +RemoteReference, maybe a RemoteCopy) which implements a given Interface. In +this case, the incoming object could be an actual instance, but the class +name must be looked up in the unjellyableRegistry (and the class located, and +the __implements__ list consulted) before any of the object's tokens are +accepted. + +* security TODOs: + +** size constraints on the set-vocab sequence + +* implement schema.maxSize() + +In newpb, schemas serve two purposes: + + a) make programs safer by reducing the surprises that can appear in their + arguments (i.e. 
factoring out argument-checking in a useful way) + + b) remove memory-consumption DoS attacks by putting an upper bound on the + memory consumed by any particular message. + +Each schema has a pair of methods named maxSize() and maxDepth() which +provide this upper bound. While the schema is in effect (say, during the +receipt of a particular named argument to a remotely-invokable method), at +most X bytes and Y slicer frames will be in use before either the object is +accepted and processed or the schema notes the violation and the object is +rejected (whereupon the temporary storage is released and all further bytes +in the rejected object are simply discarded). Strictly speaking, the number +returned by maxSize() is the largest string on the wire which has not yet +been rejected as violating the constraint, but it is also a reasonable +metric to describe how much internal storage must be used while processing +it. (To achieve greater accuracy would involve knowing exactly how large +each Python type is; not a sensible thing to attempt). + +The idea is that someone who is worried about an attacker throwing a really +long string or an infinitely-nested list at them can ask the schema just what +exactly their current exposure is. The tradeoff between flexibility ("accept +any object whatsoever here") and exposure to DoS attack is then user-visible +and thus user-selectable. + +To implement maxSize() for a basic schema (like a string), you simply need +to look at banana.xhtml and see how basic tokens are encoded (you will also +need to look at banana.py and see how deserialization is actually +implemented). 
For a schema.StringConstraint(32) (which accepts strings <= 32 +characters in length), the largest serialized form that has not yet been +either accepted or rejected is: + + 64 bytes (header indicating 0x000000..0020 with lots of leading zeros) + + 1 byte (STRING token) + + 32 bytes (string contents) + = 97 + +If the header indicates a conforming length (<=32) then just after the 32nd +byte is received, the string object is created and handed up the stack, so +the temporary storage tops out at 97. If someone is trying to spam us with a +million-character string, the serialized form would look like: + + 64 bytes (header indicating 1-million in hex, with leading zeros) ++ 1 byte (STRING token) += 65 + +at which point the receive parser would check the constraint, decide that +1000000 > 32, and reject the remainder of the object. + +So (with the exception of pass/fail maxSize values, see below), the following +should hold true: + + schema.StringConstraint(32).maxSize() == 97 + +Now, schemas which represent containers have size limits that are the sum of +their contents, plus some overhead (and a stack level) for the container +itself. For example, a list of two small integers is represented in newbanana +as: + + OPEN(list) + INT + INT + CLOSE() + +which really looks like: + + opencount-OPEN + len-STRING-"list" + value-INT + value-INT + opencount-CLOSE + +This sequence takes at most: + + opencount-OPEN: 64+1 + len-STRING-"list": 64+1+1000 (opentypes are confined to be <= 1k long) + value-INT: 64+1 + value-INT: 64+1 + opencount-CLOSE: 64+1 + +or 5*(64+1)+1000 = 1325, or rather: + + 3*(64+1)+1000 + N*(IntConstraint().maxSize()) + +So ListConstraint.maxSize is computed by doing some math involving the +.maxSize value of the objects that go into it (the ListConstraint.constraint +attribute). This suggests a recursive algorithm. 
If any constraint is +unbounded (say a ListConstraint with no limit on the length of the list), +then maxSize() raises UnboundedSchema to indicate that there is no limit on +the size of a conforming string. Clearly, if any constraint is found to +include itself, UnboundedSchema must also be raised. + +This is a loose upper bound. For example, one non-conforming input string +would be: + + opencount-OPEN: 64+1 + len-STRING-"x"*1000: 64+1+1000 + +The entire string would be accepted before checking to see which opentypes +were valid: the ListConstraint only accepts the "list" opentype and would +reject this string immediately after the 1000th "x" was received. So a +tighter upper bound would be 2*65+1000 = 1130. + +In general, the bound is computed by walking through the deserialization +process and identifying the largest string that could make it past the +validity checks. There may be later checks that will reject the string, but +if it has not yet been rejected, then it still represents exposure for a +memory consumption DoS. + +** pass/fail sizes + +I started to think that it was necessary to have each constraint provide two +maxSize numbers: one of the largest sequence that could possibly be accepted +as valid, and a second which was the largest sequence that could be still +undecided. This would provide a more accurate upper bound because most +containers will respond to an invalid object by abandoning the rest of the +container: i.e. if the current active constraint is: + + ListConstraint(StringConstraint(32), maxLength=30) + +then the first thing that doesn't match the string constraint (say an +instance, or a number, or a 33-character string) will cause the ListUnslicer +to go into discard-everything mode. This makes a significant difference when +the per-item constraint allows opentypes, because the OPEN type (a string) is +constrained to 1k bytes. 
The item constraint probably imposes a much smaller +limit on the set of actual strings that would be accepted, so no +kilobyte-long opentype will possibly make it past that constraint. That means +there can only be one outstanding invalid object. So the worst case (maximal +length) string that has not yet been rejected would be something like: + + OPEN(list) + validthing [0] + validthing [1] + ... + validthing [n-1] + long-invalid-thing + +because if the long-invalid thing had been received earlier, the entire list +would have been abandoned. + +This suggests that the calculation for ListConstraint.maxSize() really needs +to be like + overhead + +(len-1)*itemConstraint.maxSize(valid) + +(1)*itemConstraint.maxSize(invalid) + +I'm still not sure about this. I think it provides a significantly tighter +upper bound. The deserialization process itself does not try to achieve the +absolute minimal exposure (i.e., the opentype checker could take the set of +all known-valid open types, compute the maximum length, and then impose a +StringConstraint with that length instead of 1000), because it is, in +general, an inefficient hassle. There is a tradeoff between computational +efficiency and removing the slack in the maxSize bound, both in the +deserialization process (where the memory is actually consumed) and in +maxSize (where we estimate how much memory could be consumed). + +Anyway, maxSize() and maxDepth() (which is easier: containers add 1 to the +maximum of the maxDepth values of their possible children) need to be +implemented for all the Constraint classes. There are some tests (disabled) +in test_schema.py for this code: those tests assert specific values for +maxSize. Those values are probably wrong, so they must be updated to match +however maxSize actually works. + +* decide upon what the "Shared" constraint should mean + +The idea of this one was to avoid some vulnerabilities by rejecting arbitrary +object graphs. 
Fundamentally Banana can represent most anything (just like +pickle), including objects that refer to each other in exciting loops and +whorls. There are two problems with this: it is hard to enforce a schema that +allows cycles in the object graph (indeed it is tricky to even describe one), +and the shared references could be used to temporarily violate a schema. + +I think these might be fixable (the sample case is where one tuple is +referenced in two different places, each with a different constraint, but the +tuple is incomplete until some higher-level node in the graph has become +referenceable, so [maybe] the schema can't be enforced until somewhat after +the object has actually finished arriving). + +However, Banana is aimed at two different use-cases. One is kind of a +replacement for pickle, where the goal is to allow arbitrary object graphs to +be serialized but have more control over the process (in particular we still +have an unjellyableRegistry to prevent arbitrary constructors from being +executed during deserialization). In this mode, a larger set of Unslicers are +available (for modules, bound methods, etc), and schemas may still be useful +but are not enforced by default. + +PB will use the other mode, where the set of conveyable objects is much +smaller, and security is the primary goal (including putting limits on +resource consumption). Schemas are enforced by default, and all constraints +default to sensible size limits (strings to 1k, lists to [currently] 30 +items). Because complex object graphs are not commonly transported across +process boundaries, the default is to not allow any Copyable object to be +referenced multiple times in the same serialization stream. The default is to +reject both cycles and shared references in the object graph, allowing only +strict trees, making life easier (and safer) for the remote methods which are +being given this object tree. 
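The "strict tree" default described above can be sketched as a simple reachability check: reject any object graph in which a container is reachable twice, which catches both shared references and cycles. This is an illustration of the policy, not foolscap's serializer (the function name is invented here):

```python
# Sketch of the strict-tree default: a container seen twice means a
# shared reference or a cycle, and the graph is rejected.
def check_strict_tree(obj, seen=None):
    if seen is None:
        seen = set()
    if isinstance(obj, (list, tuple, dict)):
        if id(obj) in seen:
            raise ValueError("shared reference or cycle in object graph")
        seen.add(id(obj))
        children = obj.values() if isinstance(obj, dict) else obj
        for child in children:
            check_strict_tree(child, seen)

check_strict_tree([1, [2, 3], (4,)])     # a strict tree: accepted

shared = [1, 2]
try:
    check_strict_tree([shared, shared])  # same list referenced twice
    rejected = False
except ValueError:
    rejected = True
assert rejected
```

A "Shared" constraint would then amount to marking particular nodes as exempt from this check, which is exactly the scoping question raised below.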
+ +The "Shared" constraint is intended as a way to turn off this default +strictness and allow the object to be referenced multiple times. The +outstanding question is what this should really mean: must it be marked as +such on all places where it could be referenced, what is the scope of the +multiple-reference region (per- method-call, per-connection?), and finally +what should be done when the limit is violated. Currently Unslicers see an +Error object which they can respond to any way they please: the default +containers abandon the rest of their contents and hand an Error to their +parent, the MethodCallUnslicer returns an exception to the caller, etc. With +shared references, the first recipient sees a valid object, while the second +and later recipient sees an error. + + +* figure out Deferred errors for immutable containers + +Somewhat related to the previous one. The now-classic example of an immutable +container which cannot be created right away is the object created by this +sequence: + + t = ([],) + t[0].append((t,)) + +This serializes into (with implicit reference numbers on the left): + +[0] OPEN(tuple) +[1] OPEN(list) +[2] OPEN(tuple) +[3] OPEN(reference #0) + CLOSE + CLOSE + CLOSE + +In newbanana, the second TupleUnslicer cannot return a fully-formed tuple to +its parent (the ListUnslicer), because that tuple cannot be created until the +contents are all referenceable, and that cannot happen until the first +TupleUnslicer has completed. So the second TupleUnslicer returns a Deferred +instead of a tuple, and the ListUnslicer adds a callback which updates the +list's item when the tuple is complete. + +The problem here is that of error handling. In general, if an exception is +raised (perhaps a protocol error, perhaps a schema violation) while an +Unslicer is active, that Unslicer is abandoned (all its remaining tokens are +discarded) and the parent gets an Error object. (the parent may give up too.. 
+the basic Unslicers all behave this way, so any exception will cause +everything up to the RootUnslicer to go boom, and the RootUnslicer has the +option of dropping the connection altogether). When the error is noticed, the +Unslicer stack is queried to figure out what path was taken from the root of +the object graph to the site that had an error. This is really useful when +trying to figure out which exact object caused a SchemaViolation: rather than +being told a call trace or a description of the *object* which had a problem, +you get a description of the path to that object (the same series of +dereferences you'd use to print the object: obj.children[12].peer.foo.bar). + +When references are allowed, these exceptions could occur after the original +object has been received, when that Deferred fires. There are two problems: +one is that the error path is now misleading, the other is that it might not +have been possible to enforce a schema because the object was incomplete. + +The most important thing is to make sure that an exception that occurs while +the Deferred is being fired is caught properly and flunks the object just as +if the problem were caught synchronously. This may involve discarding an +otherwise complete object graph and blaming the problem on a node much closer +to the root than the one which really caused the failure. + +* adaptive VOCAB compression + +We want to let banana figure out a good set of strings to compress on its +own. In Banana.sendToken, keep a list of the last N strings that had to be +sent in full (i.e. they weren't in the table). If the string being sent +appears more than M times in that table, before we send the token, emit an +ADDVOCAB sequence, add a vocab entry for it, then send a numeric VOCAB token +instead of the string. + +Make sure the vocab mapping is not used until the ADDVOCAB sequence has been +queued. 
Sending it inline should take care of this, but if for some reason we +need to push it on the top-level object queue, we need to make sure the vocab +table is not updated until it gets serialized. Queuing a VocabUpdate object, +which updates the table when it gets serialized, would take care of this. The +advantage of doing it inline is that later strings in the same object graph +would benefit from the mapping. The disadvantage is that the receiving +Unslicers must be prepared to deal with ADDVOCAB sequences at any time (so +really they have to be stripped out). This disadvantage goes away if ADDVOCAB +is a token instead of a sequence. + +Reasonable starting values for N and M might be 30 and 3. + +* write oldbanana compatibility code? + +An oldbanana peer can be detected because the server side sends its dialect +list from connectionMade, and oldbanana lists are sent with OLDLIST tokens +(the explicit-length kind). + + +* add .describe methods to all Slicers + +This involves setting an attribute between each yield call, to indicate what +part is about to be serialized. + + +* serialize remotely-callable methods? + +It might be useful to be able to do something like: + + class Watcher(pb.Referenceable): + def remote_foo(self, args): blah + + w = Watcher() + ref.callRemote("subscribe", w.remote_foo) + +That would involve looking up the method and its parent object, reversing +the remote_*->* transformation, then sending a sequence which contained both +the object's RemoteReference and the appropriate method name. + +It might also be useful to generalize this: passing a lambda expression to +the remote end could stash the callable in a local table and send a Callable +Reference to the other side. I can smell a good general-purpose object +classification framework here, but I haven't quite been able to nail it down +exactly. 
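The "look up the method and its parent object, reversing the remote_*->* transformation" step above can be sketched in plain Python, since bound methods carry both pieces; the helper name below is invented for the sketch:

```python
# Sketch of the remote_* -> * reversal: given a bound method like
# w.remote_foo, recover its parent object and the name that would go
# on the wire ("foo").
def describe_bound_remote_method(m):
    obj = m.__self__               # the parent object (w)
    name = m.__func__.__name__     # "remote_foo"
    if not name.startswith("remote_"):
        raise ValueError("not a remotely-callable method: %s" % name)
    return obj, name[len("remote_"):]

class Watcher:
    def remote_foo(self, args):
        return args

w = Watcher()
obj, name = describe_bound_remote_method(w.remote_foo)
assert obj is w and name == "foo"
```

The serialized sequence would then contain obj's RemoteReference plus the string "foo", which is enough for the receiving side to reconstruct a callable.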
+
+* testing
+
+** finish testing of LONGINT/LONGNEG
+
+test_banana.InboundByteStream.testConstrainedInt needs implementation
+
+** thoroughly test failure-handling at all points of in/out serialization
+
+places where BananaError or Violation might be raised
+
+sending side:
+ Slicer creation (schema pre-validation? no): no no
+ pre-validation is done before sending the object, Broker.callFinished,
+ RemoteReference.doCall
+ slicer creation is done in newSlicerFor
+
+ .slice (called in pushSlicer) ?
+ .slice.next raising Violation
+ .slice.next returning Deferrable when streaming isn't allowed
+ .sendToken (non-primitive token, can't happen)
+ .newSlicerFor (no ISlicer adapter)
+ top.childAborted
+
+receiving side:
+ long header (>64 bytes)
+ checkToken (top.openerCheckToken)
+ checkToken (top.checkToken)
+ typebyte == LIST (oldbanana)
+ bad VOCAB key
+ too-long vocab key
+ bad FLOAT encoding
+ top.receiveClose
+ top.finish
+ top.reportViolation
+ oldtop.finish (in from handleViolation)
+ top.doOpen
+ top.start
+plus all of these when discardCount != 0
+OPENOPEN
+
+send-side uses:
+ f = top.reportViolation(f)
+receive-side should use it too (instead of f.raiseException)
+
+** test failure-handling during callRemote argument serialization
+
+** implement/test some streaming Slicers
+
+** test producer Banana
+
+* profiling/optimization
+
+Several areas where I suspect performance issues but am unwilling to fix
+them before having proof that there is a problem:
+
+** Banana.produce
+
+This is the main loop which creates outbound tokens. It is called once at
+connectionMade() (after version negotiation) and thereafter is fired as the
+result of a Deferred whose callback is triggered by a new item being pushed
+on the output queue. It runs until the output queue is empty, or the
+production process is paused (by a consumer who is full), or streaming is
+enabled and one of the Slicers wants to pause.
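The loop structure just described might be sketched like this (a toy model with invented names; the real loop is Banana.produce and is woken by a Twisted Deferred rather than a direct call):

```python
# Toy sketch of a queue-driven production loop that drains until the
# queue is empty or production is paused. Names are invented; this is
# not foolscap's actual Banana.produce.

from collections import deque

class Producer:
    def __init__(self, transport):
        self.queue = deque()
        self.transport = transport    # anything with .append(), for the sketch
        self.paused = False

    def send(self, item):
        self.queue.append(item)
        self.produce()                # in Twisted this would fire via a Deferred

    def produce(self):
        # run until the output queue is empty or a consumer pauses us
        while self.queue and not self.paused:
            self.transport.append(self.queue.popleft())

out = []
p = Producer(out)
p.send("token-1"); p.send("token-2")
assert out == ["token-1", "token-2"]
p.paused = True                       # a full consumer pauses production
p.send("token-3")
assert out == ["token-1", "token-2"]  # queued, not yet written
```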
+
+Each pass through the loop pushes a single token into the transport,
+resulting in a number of short writes. We can do better than this by telling
+the transport to buffer the individual writes and calling a flush() method
+when we leave the loop. I think Itamar's new cprotocol work provides this
+sort of hook, but it would be nice if there were a generalized Transport
+interface so that Protocols could promise their transports that they will
+use flush() when they've stopped writing for a little while.
+
+Also, I want to be able to move produce() into C code. This means defining a
+CSlicer in addition to the cprotocol stuff mentioned before. The goal is to
+be able to slice a large tree of basic objects (lists, tuples, dicts,
+strings) without surfacing into Python code at all, only coming "up for air"
+when we hit an object type that we don't recognize as having a CSlicer
+available.
+
+** Banana.handleData
+
+The receive-tokenization process wants to be moved into C code. It's
+definitely on the critical path, but it's ugly because it has to keep
+calling into python code to handle each extracted token. Maybe there is a
+way to have fast C code peek through the incoming buffers for token
+boundaries, then give a list of offsets and lengths to the python code. The
+b128 conversion should also happen in C. The data shouldn't be pulled out of
+the input buffer until we've decided to accept it (i.e. the
+memory-consumption guarantees that the schemas provide do not take any
+transport-level buffering into account, and doing cprotocol tokenization
+would represent memory that an attacker can make us spend without triggering
+a schema violation). Itamar's CLineReceiver is a good example: you tokenize
+a big buffer as much as you can, pass the tokens upstairs to Python code,
+then hand the leftover tail to the next read() call. The tokenizer always
+works on the concatenation of two buffers: the tail of the previous read()
+and the complete contents of the current one.
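The keep-the-tail pattern described above can be sketched in Python (the framing here is an invented 4-digit decimal length prefix for illustration, not Banana's real b128 token headers):

```python
# Illustrative sketch of the tokenize-what-you-can, keep-the-tail pattern:
# tokenize the concatenation of the previous read()'s leftover bytes and
# the new data, then stash whatever incomplete token remains.

class Tokenizer:
    def __init__(self):
        self.tail = b""             # leftover bytes from the previous read()

    def data_received(self, data):
        """Tokenize tail + data; keep any incomplete token as the new tail."""
        buf = self.tail + data
        tokens = []
        while len(buf) >= 4:
            length = int(buf[:4].decode())   # invented framing, not b128
            if len(buf) < 4 + length:
                break               # incomplete token: wait for the next read()
            tokens.append(buf[4:4 + length])
            buf = buf[4 + length:]
        self.tail = buf             # hand the leftover tail to the next call
        return tokens

t = Tokenizer()
assert t.data_received(b"0003abc00") == [b"abc"]   # b"00" kept as the tail
assert t.data_received(b"02de") == [b"de"]         # tail + new data completes it
```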
+
+** Unslicer.doOpen delegation
+
+Unslicers form a stack, and each Unslicer gets to exert control over the way
+that its descendants are deserialized. Most don't bother, they just delegate
+the control methods up to the RootUnslicer. For example, doOpen() takes an
+opentype and may return a new Unslicer to handle the new OPEN sequence. Most
+of the time, each Unslicer delegates doOpen() to its parent, all the way
+up the stack to the RootUnslicer who actually performs the UnslicerRegistry
+lookup.
+
+This provides an optimization point. In general, the Unslicer knows ahead of
+time whether it cares to be involved in these methods or not (i.e. whether
+it wants to pay attention to its children/descendants or not). So instead of
+delegating all the time, we could just have a separate Opener stack.
+Unslicers that care would be pushed on the Opener stack at the same time
+they are pushed on the regular unslicer stack, likewise removed. The
+doOpen() method would only be invoked on the top-most Opener, removing a lot
+of method calls. (I think the math is something like turning
+avg(treedepth)*avg(nodes) into avg(nodes)).
+
+There are some other methods that are delegated in this way. open() is
+related to doOpen(). setObject()/getObject() keep track of references to
+shared objects and are typically only intercepted by a second-level object
+which defines a "serialization scope" (like a single remote method call), as
+well as connection-wide references (like pb.Referenceables) tracked by the
+PBRootUnslicer. These would also be targets for optimization.
+
+The fundamental reason for this optimization is that most Unslicers don't
+care about these methods. There are far more uses of doOpen() (one per
+object node) than there are changes to the desired behavior of doOpen().
+
+** CUnslicer
+
+Like CSlicer, the unslicing process wants to be able to be implemented (for
+built-in objects) entirely in C.
This means a CUnslicer "object" (a struct
+full of function pointers), a table accessible from C that maps opentypes to
+both CUnslicers and regular python-based Unslicers, and a CProtocol
+tokenization code fed by a CTransport. It should be possible for the
+python->C transition to occur in the reactor when it calls ctransport.doRead
+and then not come back up to Python until Banana.receivedObject(),
+at least for built-in types like dicts and strings.
diff --git a/src/foolscap/doc/newpb-jobs.txt b/src/foolscap/doc/newpb-jobs.txt
deleted file mode 100644
index d671b466..00000000
--- a/src/foolscap/doc/newpb-jobs.txt
+++ /dev/null
@@ -1,619 +0,0 @@
--*- outline -*-
-
-Reasonably independent newpb sub-tasks that need doing. Most important come
-first.
-
-* decide on a version negotiation scheme
-
-Should be able to telnet into a PB server and find out that it is a PB
-server. Pointing a PB client at an HTTP server (or an HTTP client at a PB
-server) should result in an error, not a timeout. Implement in
-banana.Banana.connectionMade().
-
-desiderata:
-
- negotiation should take place with regular banana sequences: don't invent a
- new protocol that is only used at the start of the connection
-
- Banana should be usable one-way, for storage or high-latency RPC (the mnet
- folks want to create a method call, serialize it to a string, then encrypt
- and forward it on to other nodes, sometimes storing it in relays along the
- way if a node is offline for a few days). It should be easy for the layer
- above Banana to feed it the results of what its negotiation would have been
- (if it had actually used an interactive connection to its peer). Feeding the
- same results to both sides should have them proceed as if they'd agreed to
- those results.
-
- negotiation should be flexible enough to be extended but still allow old
- code to talk with new code. Magically predict every conceivable extension
- and provide for it from the very first release :).
- -There are many levels to banana, all of which could be useful targets of -negotiation: - - which basic tokens are in use? Is there a BOOLEAN token? a NONE token? Can - it accept a LONGINT token or is the target limited to 32-bit integers? - - are there any variations in the basic Banana protocol being used? Could the - smaller-scope OPEN-counter decision be deferred until after the first - release and handled later with a compatibility negotiation flag? - - What "base" OPEN sequences are known? 'unicode'? 'boolean'? 'dict'? This is - an overlap between expressing the capabilities of the host language, the - Banana implementation, and the needs of the application. How about - 'instance', probably only used for StorageBanana? - - What "top-level" OPEN sequences are known? PB stuff (like 'call', and - 'your-reference')? Are there any variations or versions that need to be - known? We may add new functionality in the future, it might be useful for - one end to know whether this functionality is available or not. (the PB - 'call' sequence could some day take numeric argument names to convey - positional parameters, a 'reference' sequence could take a string to - indicate globally-visible PB URLs, it could become possible to pass - target.remote_foo directly to a peer and have a callable RemoteMethod object - pop out the other side). - - What "application-level" sequences are available? (Which RemoteInterface - classes are known and valid in 'call' sequences? Which RemoteCopy names are - valid for targets of the 'copy' sequence?). This is not necessarily within - the realm of Banana negotiation, but applications may need to negotiate this - sort of thing, and any disagreements will be manifested when Banana starts - raising Violations, so it may be useful to include it in the Banana-level - negotiation. 
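One way to picture the "feed both sides the same results" idea from the desiderata (the offer structure and decision rule here are invented for illustration, not the actual negotiation format):

```python
# Sketch: each side builds an "offer" dict listing what it can do; a
# deterministic decision function turns the two offers into one result.
# Feeding the same pair of offers to both sides -- even offline, as the
# one-way storage case requires -- yields the same agreement.

def decide(my_offer, their_offer):
    """Pick the highest mutually-supported value for each category."""
    decision = {}
    for key in my_offer:
        if key in their_offer:
            common = set(my_offer[key]) & set(their_offer[key])
            if common:
                decision[key] = max(common)
    return decision

client = {"banana-versions": [1, 2], "tokens": ["LONGINT", "BOOLEAN"]}
server = {"banana-versions": [2, 3], "tokens": ["BOOLEAN"]}

# both sides compute the same thing, regardless of who "sends first"
assert decide(client, server) == decide(server, client) == \
       {"banana-versions": 2, "tokens": "BOOLEAN"}
```

A category with no overlap simply drops out of the decision, which is where the "report a useful failure" concern below comes in.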
-
-On the other hand, negotiation is only useful if one side is prepared to
-accommodate a peer which cannot do some of the things it would prefer to use,
-or if it wants to know about the incapabilities so it can report a useful
-failure rather than have an obscure protocol-level error message pop up an
-hour later. So negotiation isn't the only goal: simple capability awareness
-is a useful lesser goal.
-
-It kind of makes sense for the first object of a stream to be a negotiation
-blob. We could make a new 'version' opentype, and declare that the contents
-will be something simple and forever-after-parseable (like a dict, with heavy
-constraints on the keys and values, all strings emitted in full).
-
-DONE, at least the framework is in place. Uses HTTP-style header-block
-exchange instead of banana sequences, with client-sends-first and
-server-decides. This correctly handles PB-vs-HTTP, but requires a timeout to
-detect oldpb clients vs newpb servers. No actual feature negotiation is
-performed yet, because we still only have the one version of the code.
-
-* connection initiation
-
-** define PB URLs
-
-[newcred is the most important part of this, the URL stuff can wait]
-
-A URL defines an endpoint: a pb.Referenceable, with methods. Somewhere along
-the way it defines a transport (tcp+host+port, or unix+path) and an object
-reference (pathname). It might also define a RemoteInterface, or that might
-be put off until we actually invoke a method.
-
- URL = f("pb:", host, port, pathname)
- d = pb.callRemoteURL(URL, ifacename, methodname, args)
-
-probably give an actual RemoteInterface instead of just its name
-
-a pb.RemoteReference claims to provide access to zero-or-more
-RemoteInterfaces. You may choose which one you want to use when invoking
-callRemote.
- -TODO: decide upon a syntax for URLs that refer to non-TCP transports - pb+foo://stuff, pby://stuff (for yURL-style self-authenticating names) - -TODO: write the URL parser, implementing pb.getRemoteURL and pb.callRemoteURL - DONE: use a Tub/PBService instead - -TODO: decide upon a calling convention for callRemote when specifying which -RemoteInterface is being used. - - -DONE, PB-URL is the way to go. -** more URLs - -relative URLs (those without a host part) refer to objects on the same -Broker. Absolute URLs (those with a host part) refer to objects on other -Brokers. - -SKIP, interesting but not really useful - -** build/port pb.login: newcred for newpb - -Leave cred work for Glyph. - -<thomasvs> has some enhanced PB cred stuff (challenge/response, pb.Copyable -credentials, etc). - -URL = pb.parseURL("pb://lothar.com:8789/users/warner/services/petmail", - IAuthorization) -URL = doFullLogin(URL, "warner", "x8yzzy") -URL.callRemote(methodname, args) - -NOTDONE - -* constrain ReferenceUnslicer properly - -The schema can use a ReferenceConstraint to indicate that the object must be -a RemoteReference, and can also require that the remote object be capable of -handling a particular Interface. - -This needs to be implemented. slicer.ReferenceUnslicer must somehow actually -ask the constraint about the incoming tokens. - -An outstanding question is "what counts". The general idea is that -RemoteReferences come over the wire as a connection-scoped ID number and an -optional list of Interface names (strings and version numbers). In this case -it is the far end which asserts that its object can implement any given -Interface, and the receiving end just checks to see if the schema-imposed -required Interface is in the list. - -This becomes more interesting when applied to local objects, or if a -constraint is created which asserts that its object is *something* (maybe a -RemoteReference, maybe a RemoteCopy) which implements a given Interface. 
In -this case, the incoming object could be an actual instance, but the class -name must be looked up in the unjellyableRegistry (and the class located, and -the __implements__ list consulted) before any of the object's tokens are -accepted. - -* security TODOs: - -** size constraints on the set-vocab sequence - -* implement schema.maxSize() - -In newpb, schemas serve two purposes: - - a) make programs safer by reducing the surprises that can appear in their - arguments (i.e. factoring out argument-checking in a useful way) - - b) remove memory-consumption DoS attacks by putting an upper bound on the - memory consumed by any particular message. - -Each schema has a pair of methods named maxSize() and maxDepth() which -provide this upper bound. While the schema is in effect (say, during the -receipt of a particular named argument to a remotely-invokable method), at -most X bytes and Y slicer frames will be in use before either the object is -accepted and processed or the schema notes the violation and the object is -rejected (whereupon the temporary storage is released and all further bytes -in the rejected object are simply discarded). Strictly speaking, the number -returned by maxSize() is the largest string on the wire which has not yet -been rejected as violating the constraint, but it is also a reasonable -metric to describe how much internal storage must be used while processing -it. (To achieve greater accuracy would involve knowing exactly how large -each Python type is; not a sensible thing to attempt). - -The idea is that someone who is worried about an attacker throwing a really -long string or an infinitely-nested list at them can ask the schema just what -exactly their current exposure is. The tradeoff between flexibility ("accept -any object whatsoever here") and exposure to DoS attack is then user-visible -and thus user-selectable. 
-
-To implement maxSize() for a basic schema (like a string), you simply need
-to look at banana.xhtml and see how basic tokens are encoded (you will also
-need to look at banana.py and see how deserialization is actually
-implemented). For a schema.StringConstraint(32) (which accepts strings <= 32
-characters in length), the largest serialized form that has not yet been
-either accepted or rejected is:
-
- 64 bytes (header indicating 0x000000..0020 with lots of leading zeros)
- + 1 byte (STRING token)
- + 32 bytes (string contents)
- = 97
-
-If the header indicates a conforming length (<=32) then just after the 32nd
-byte is received, the string object is created and handed up the stack, so
-the temporary storage tops out at 97. If someone is trying to spam us with a
-million-character string, the serialized form would look like:
-
- 64 bytes (header indicating 1-million in hex, with leading zeros)
-+ 1 byte (STRING token)
-= 65
-
-at which point the receive parser would check the constraint, decide that
-1000000 > 32, and reject the remainder of the object.
-
-So (with the exception of pass/fail maxSize values, see below), the following
-should hold true:
-
- schema.StringConstraint(32).maxSize() == 97
-
-Now, schemas which represent containers have size limits that are the sum of
-their contents, plus some overhead (and a stack level) for the container
-itself.
For example, a list of two small integers is represented in newbanana -as: - - OPEN(list) - INT - INT - CLOSE() - -which really looks like: - - opencount-OPEN - len-STRING-"list" - value-INT - value-INT - opencount-CLOSE - -This sequence takes at most: - - opencount-OPEN: 64+1 - len-STRING-"list": 64+1+1000 (opentypes are confined to be <= 1k long) - value-INT: 64+1 - value-INT: 64+1 - opencount-CLOSE: 64+1 - -or 5*(64+1)+1000 = 1325, or rather: - - 3*(64+1)+1000 + N*(IntConstraint().maxSize()) - -So ListConstraint.maxSize is computed by doing some math involving the -.maxSize value of the objects that go into it (the ListConstraint.constraint -attribute). This suggests a recursive algorithm. If any constraint is -unbounded (say a ListConstraint with no limit on the length of the list), -then maxSize() raises UnboundedSchema to indicate that there is no limit on -the size of a conforming string. Clearly, if any constraint is found to -include itself, UnboundedSchema must also be raised. - -This is a loose upper bound. For example, one non-conforming input string -would be: - - opencount-OPEN: 64+1 - len-STRING-"x"*1000: 64+1+1000 - -The entire string would be accepted before checking to see which opentypes -were valid: the ListConstraint only accepts the "list" opentype and would -reject this string immediately after the 1000th "x" was received. So a -tighter upper bound would be 2*65+1000 = 1130. - -In general, the bound is computed by walking through the deserialization -process and identifying the largest string that could make it past the -validity checks. There may be later checks that will reject the string, but -if it has not yet been rejected, then it still represents exposure for a -memory consumption DoS. 
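The recursive algorithm suggested above might be sketched like this (simplified stand-ins for the real Constraint classes in foolscap's schema module; the numbers follow the worked examples above, and the UnboundedSchema escape hatch covers both unbounded and self-referential constraints):

```python
# Rough sketch of recursive maxSize(). Class and constant names loosely
# mirror foolscap.schema but this is not the actual implementation.

class UnboundedSchema(Exception):
    """No finite bound can be placed on a conforming byte stream."""

HEADER = 64      # maximum header length in bytes
TOKEN = 1        # one type byte
OPENTYPE = 1000  # opentypes are confined to <= 1k

class IntConstraint:
    def maxSize(self, seen=None):
        return HEADER + TOKEN                     # 65

class StringConstraint:
    def __init__(self, maxLength=None):
        self.maxLength = maxLength
    def maxSize(self, seen=None):
        if self.maxLength is None:
            raise UnboundedSchema                 # no length limit, no bound
        return HEADER + TOKEN + self.maxLength    # e.g. 32 -> 97

class ListConstraint:
    def __init__(self, constraint, maxLength=None):
        self.constraint = constraint
        self.maxLength = maxLength
    def maxSize(self, seen=None):
        seen = seen or set()
        if id(self) in seen:                      # constraint includes itself
            raise UnboundedSchema
        seen.add(id(self))
        if self.maxLength is None:
            raise UnboundedSchema
        # OPEN + opentype string + CLOSE, plus N conforming items
        overhead = 3 * (HEADER + TOKEN) + OPENTYPE
        return overhead + self.maxLength * self.constraint.maxSize(seen)

assert StringConstraint(32).maxSize() == 97
assert ListConstraint(IntConstraint(), maxLength=2).maxSize() == 1325
```

This is the loose upper bound; the pass/fail refinement below would tighten it by distinguishing the last (invalid) item from the valid ones.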
-
-** pass/fail sizes
-
-I started to think that it was necessary to have each constraint provide two
-maxSize numbers: one for the largest sequence that could possibly be accepted
-as valid, and a second for the largest sequence that could still be
-undecided. This would provide a more accurate upper bound because most
-containers will respond to an invalid object by abandoning the rest of the
-container: i.e. if the current active constraint is:
-
- ListConstraint(StringConstraint(32), maxLength=30)
-
-then the first thing that doesn't match the string constraint (say an
-instance, or a number, or a 33-character string) will cause the ListUnslicer
-to go into discard-everything mode. This makes a significant difference when
-the per-item constraint allows opentypes, because the OPEN type (a string) is
-constrained to 1k bytes. The item constraint probably imposes a much smaller
-limit on the set of actual strings that would be accepted, so no
-kilobyte-long opentype will possibly make it past that constraint. That means
-there can only be one outstanding invalid object. So the worst case (maximal
-length) string that has not yet been rejected would be something like:
-
- OPEN(list)
- validthing [0]
- validthing [1]
- ...
- validthing [n-1]
- long-invalid-thing
-
-because if the long-invalid thing had been received earlier, the entire list
-would have been abandoned.
-
-This suggests that the calculation for ListConstraint.maxSize() really needs
-to be like
- overhead
- +(len-1)*itemConstraint.maxSize(valid)
- +(1)*itemConstraint.maxSize(invalid)
-
-I'm still not sure about this. I think it provides a significantly tighter
-upper bound. The deserialization process itself does not try to achieve the
-absolute minimal exposure (i.e., the opentype checker could take the set of
-all known-valid open types, compute the maximum length, and then impose a
-StringConstraint with that length instead of 1000), because it is, in
-general, an inefficient hassle.
There is a tradeoff between computational -efficiency and removing the slack in the maxSize bound, both in the -deserialization process (where the memory is actually consumed) and in -maxSize (where we estimate how much memory could be consumed). - -Anyway, maxSize() and maxDepth() (which is easier: containers add 1 to the -maximum of the maxDepth values of their possible children) need to be -implemented for all the Constraint classes. There are some tests (disabled) -in test_schema.py for this code: those tests assert specific values for -maxSize. Those values are probably wrong, so they must be updated to match -however maxSize actually works. - -* decide upon what the "Shared" constraint should mean - -The idea of this one was to avoid some vulnerabilities by rejecting arbitrary -object graphs. Fundamentally Banana can represent most anything (just like -pickle), including objects that refer to each other in exciting loops and -whorls. There are two problems with this: it is hard to enforce a schema that -allows cycles in the object graph (indeed it is tricky to even describe one), -and the shared references could be used to temporarily violate a schema. - -I think these might be fixable (the sample case is where one tuple is -referenced in two different places, each with a different constraint, but the -tuple is incomplete until some higher-level node in the graph has become -referenceable, so [maybe] the schema can't be enforced until somewhat after -the object has actually finished arriving). - -However, Banana is aimed at two different use-cases. One is kind of a -replacement for pickle, where the goal is to allow arbitrary object graphs to -be serialized but have more control over the process (in particular we still -have an unjellyableRegistry to prevent arbitrary constructors from being -executed during deserialization). 
In this mode, a larger set of Unslicers are
-available (for modules, bound methods, etc), and schemas may still be useful
-but are not enforced by default.
-
-PB will use the other mode, where the set of conveyable objects is much
-smaller, and security is the primary goal (including putting limits on
-resource consumption). Schemas are enforced by default, and all constraints
-default to sensible size limits (strings to 1k, lists to [currently] 30
-items). Because complex object graphs are not commonly transported across
-process boundaries, the default is to not allow any Copyable object to be
-referenced multiple times in the same serialization stream. The default is to
-reject both cycles and shared references in the object graph, allowing only
-strict trees, making life easier (and safer) for the remote methods which are
-being given this object tree.
-
-The "Shared" constraint is intended as a way to turn off this default
-strictness and allow the object to be referenced multiple times. The
-outstanding question is what this should really mean: must it be marked as
-such on all places where it could be referenced, what is the scope of the
-multiple-reference region (per-method-call, per-connection?), and finally
-what should be done when the limit is violated. Currently Unslicers see an
-Error object which they can respond to any way they please: the default
-containers abandon the rest of their contents and hand an Error to their
-parent, the MethodCallUnslicer returns an exception to the caller, etc. With
-shared references, the first recipient sees a valid object, while the second
-and later recipients see an error.
-
-
-* figure out Deferred errors for immutable containers
-
-Somewhat related to the previous one.
The now-classic example of an immutable
-container which cannot be created right away is the object created by this
-sequence:
-
- t = ([],)
- t[0].append((t,))
-
-This serializes into (with implicit reference numbers on the left):
-
-[0] OPEN(tuple)
-[1] OPEN(list)
-[2] OPEN(tuple)
-[3] OPEN(reference #0)
- CLOSE
- CLOSE
- CLOSE
-
-In newbanana, the second TupleUnslicer cannot return a fully-formed tuple to
-its parent (the ListUnslicer), because that tuple cannot be created until the
-contents are all referenceable, and that cannot happen until the first
-TupleUnslicer has completed. So the second TupleUnslicer returns a Deferred
-instead of a tuple, and the ListUnslicer adds a callback which updates the
-list's item when the tuple is complete.
-
-The problem here is that of error handling. In general, if an exception is
-raised (perhaps a protocol error, perhaps a schema violation) while an
-Unslicer is active, that Unslicer is abandoned (all its remaining tokens are
-discarded) and the parent gets an Error object. (the parent may give up too..
-the basic Unslicers all behave this way, so any exception will cause
-everything up to the RootUnslicer to go boom, and the RootUnslicer has the
-option of dropping the connection altogether). When the error is noticed, the
-Unslicer stack is queried to figure out what path was taken from the root of
-the object graph to the site that had an error. This is really useful when
-trying to figure out which exact object caused a SchemaViolation: rather than
-being told a call trace or a description of the *object* which had a problem,
-you get a description of the path to that object (the same series of
-dereferences you'd use to print the object: obj.children[12].peer.foo.bar).
-
-When references are allowed, these exceptions could occur after the original
-object has been received, when that Deferred fires.
There are two problems: -one is that the error path is now misleading, the other is that it might not -have been possible to enforce a schema because the object was incomplete. - -The most important thing is to make sure that an exception that occurs while -the Deferred is being fired is caught properly and flunks the object just as -if the problem were caught synchronously. This may involve discarding an -otherwise complete object graph and blaming the problem on a node much closer -to the root than the one which really caused the failure. - -* adaptive VOCAB compression - -We want to let banana figure out a good set of strings to compress on its -own. In Banana.sendToken, keep a list of the last N strings that had to be -sent in full (i.e. they weren't in the table). If the string being sent -appears more than M times in that table, before we send the token, emit an -ADDVOCAB sequence, add a vocab entry for it, then send a numeric VOCAB token -instead of the string. - -Make sure the vocab mapping is not used until the ADDVOCAB sequence has been -queued. Sending it inline should take care of this, but if for some reason we -need to push it on the top-level object queue, we need to make sure the vocab -table is not updated until it gets serialized. Queuing a VocabUpdate object, -which updates the table when it gets serialized, would take care of this. The -advantage of doing it inline is that later strings in the same object graph -would benefit from the mapping. The disadvantage is that the receiving -Unslicers must be prepared to deal with ADDVOCAB sequences at any time (so -really they have to be stripped out). This disadvantage goes away if ADDVOCAB -is a token instead of a sequence. - -Reasonable starting values for N and M might be 30 and 3. - -* write oldbanana compatibility code? 
-
-An oldbanana peer can be detected because the server side sends its dialect
-list from connectionMade, and oldbanana lists are sent with OLDLIST tokens
-(the explicit-length kind).
-
-
-* add .describe methods to all Slicers
-
-This involves setting an attribute between each yield call, to indicate what
-part is about to be serialized.
-
-
-* serialize remotely-callable methods?
-
-It might be useful to be able to do something like:
-
- class Watcher(pb.Referenceable):
- def remote_foo(self, args): blah
-
- w = Watcher()
- ref.callRemote("subscribe", w.remote_foo)
-
-That would involve looking up the method and its parent object, reversing
-the remote_*->* transformation, then sending a sequence which contained both
-the object's RemoteReference and the appropriate method name.
-
-It might also be useful to generalize this: passing a lambda expression to
-the remote end could stash the callable in a local table and send a Callable
-Reference to the other side. I can smell a good general-purpose object
-classification framework here, but I haven't quite been able to nail it down
-exactly.
-
-* testing
-
-** finish testing of LONGINT/LONGNEG
-
-test_banana.InboundByteStream.testConstrainedInt needs implementation
-
-** thoroughly test failure-handling at all points of in/out serialization
-
-places where BananaError or Violation might be raised
-
-sending side:
- Slicer creation (schema pre-validation? no): no no
- pre-validation is done before sending the object, Broker.callFinished,
- RemoteReference.doCall
- slicer creation is done in newSlicerFor
-
- .slice (called in pushSlicer) ?
- .slice.next raising Violation
- .slice.next returning Deferrable when streaming isn't allowed
- .sendToken (non-primitive token, can't happen)
- .newSlicerFor (no ISlicer adapter)
- top.childAborted
-
-receiving side:
- long header (>64 bytes)
- checkToken (top.openerCheckToken)
- checkToken (top.checkToken)
- typebyte == LIST (oldbanana)
- bad VOCAB key
- too-long vocab key
- bad FLOAT encoding
- top.receiveClose
- top.finish
- top.reportViolation
- oldtop.finish (in from handleViolation)
- top.doOpen
- top.start
-plus all of these when discardCount != 0
-OPENOPEN
-
-send-side uses:
- f = top.reportViolation(f)
-receive-side should use it too (instead of f.raiseException)
-
-** test failure-handling during callRemote argument serialization
-
-** implement/test some streaming Slicers
-
-** test producer Banana
-
-* profiling/optimization
-
-Several areas where I suspect performance issues but am unwilling to fix
-them before having proof that there is a problem:
-
-** Banana.produce
-
-This is the main loop which creates outbound tokens. It is called once at
-connectionMade() (after version negotiation) and thereafter is fired as the
-result of a Deferred whose callback is triggered by a new item being pushed
-on the output queue. It runs until the output queue is empty, or the
-production process is paused (by a consumer who is full), or streaming is
-enabled and one of the Slicers wants to pause.
-
-Each pass through the loop pushes a single token into the transport,
-resulting in a number of short writes. We can do better than this by telling
-the transport to buffer the individual writes and calling a flush() method
-when we leave the loop. I think Itamar's new cprotocol work provides this
-sort of hook, but it would be nice if there were a generalized Transport
-interface so that Protocols could promise their transports that they will
-use flush() when they've stopped writing for a little while.
-
-Also, I want to be able to move produce() into C code.
This means defining a
-CSlicer in addition to the cprotocol stuff mentioned before. The goal is to
-be able to slice a large tree of basic objects (lists, tuples, dicts,
-strings) without surfacing into Python code at all, only coming "up for air"
-when we hit an object type that we don't recognize as having a CSlicer
-available.
-
-** Banana.handleData
-
-The receive-tokenization process wants to be moved into C code. It's
-definitely on the critical path, but it's ugly because it has to keep
-calling into python code to handle each extracted token. Maybe there is a
-way to have fast C code peek through the incoming buffers for token
-boundaries, then give a list of offsets and lengths to the python code. The
-b128 conversion should also happen in C. The data shouldn't be pulled out of
-the input buffer until we've decided to accept it (i.e. the
-memory-consumption guarantees that the schemas provide do not take any
-transport-level buffering into account, and doing cprotocol tokenization
-would represent memory that an attacker can make us spend without triggering
-a schema violation). Itamar's CLineReceiver is a good example: you tokenize
-a big buffer as much as you can, pass the tokens upstairs to Python code,
-then hand the leftover tail to the next read() call. The tokenizer always
-works on the concatenation of two buffers: the tail of the previous read()
-and the complete contents of the current one.
-
-** Unslicer.doOpen delegation
-
-Unslicers form a stack, and each Unslicer gets to exert control over the way
-that its descendants are deserialized. Most don't bother, they just delegate
-the control methods up to the RootUnslicer. For example, doOpen() takes an
-opentype and may return a new Unslicer to handle the new OPEN sequence. Most
-of the time, each Unslicer delegates doOpen() to its parent, all the way
-up the stack to the RootUnslicer who actually performs the UnslicerRegistry
-lookup.
-
-This provides an optimization point.
In general, the Unslicer knows ahead of -time whether it cares to be involved in these methods or not (i.e. whether -it wants to pay attention to its children/descendants or not). So instead of -delegating all the time, we could just have a separate Opener stack. -Unslicers that care would be pushed on the Opener stack at the same time -they are pushed on the regular unslicer stack, likewise removed. The -doOpen() method would only be invoked on the top-most Opener, removing a lot -of method calls. (I think the math is something like turning -avg(treedepth)*avg(nodes) into avg(nodes)). - -There are some other methods that are delegated in this way. open() is -related to doOpen(). setObject()/getObject() keep track of references to -shared objects and are typically only intercepted by a second-level object -which defines a "serialization scope" (like a single remote method call), as -well as connection-wide references (like pb.Referenceables) tracked by the -PBRootUnslicer. These would also be targets for optimization. - -The fundamental reason for this optimization is that most Unslicers don't -care about these methods. There are far more uses of doOpen() (one per -object node) than there are changes to the desired behavior of doOpen(). - -** CUnslicer - -Like CSlicer, the unslicing process wants to be able to be implemented (for -built-in objects) entirely in C. This means a CUnslicer "object" (a struct -full of function pointers), a table accessible from C that maps opentypes to -both CUnslicers and regular python-based Unslicers, and a CProtocol -tokenization code fed by a CTransport. It should be possible for the -python->C transition to occur in the reactor when it calls ctransport.doRead -and then not come back up to Python until Banana.receivedObject(), -at least for built-in types like dicts and strings. 
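The CLineReceiver-style tokenization described under Banana.handleData (tokenize as much of the buffer as possible, pass complete tokens upstairs, keep the leftover tail for the next read) can be sketched in Python. This is illustrative only: the framing assumed here (a b128 length header of low bytes, one type byte with the high bit set, then `length` bytes of body) is a simplification of banana's real token formats, and `tokenize` is a hypothetical name, not an actual foolscap API.

```python
def tokenize(tail, data):
    """Extract complete tokens from tail+data.

    Works on the concatenation of the previous read()'s leftover tail
    and the current read()'s contents, as described in the text.
    Returns (tokens, new_tail); incomplete trailing bytes stay in
    new_tail untouched, so transport buffering never commits us to
    partially-received tokens.
    """
    buf = tail + data
    tokens = []
    pos = 0
    while True:
        # scan the b128 length header: bytes with the high bit clear
        end = pos
        while end < len(buf) and buf[end] < 0x80:
            end += 1
        if end >= len(buf):
            break  # incomplete header, keep it as the tail
        length = 0
        for i, b in enumerate(buf[pos:end]):
            length |= b << (7 * i)   # little-endian base-128
        typebyte = buf[end]          # high bit set marks the type byte
        body_start = end + 1
        if body_start + length > len(buf):
            break  # body not fully received yet
        tokens.append((typebyte, buf[body_start:body_start + length]))
        pos = body_start + length
    return tokens, buf[pos:]
```

The same shape is what a C tokenizer could hand back to Python: a list of (offset, length) pairs instead of copied slices, deferring the copy until the schema has accepted the token.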
diff --git a/src/foolscap/doc/newpb-todo.txt b/src/foolscap/doc/newpb-todo.txt deleted file mode 100644 index 14dc1608..00000000 --- a/src/foolscap/doc/newpb-todo.txt +++ /dev/null @@ -1,1304 +0,0 @@ --*- outline -*- - -non-independent things left to do on newpb. These require deeper magic or -can not otherwise be done casually. Many of these involve fundamental -protocol issues, and therefore need to be decided sooner rather than later. - -* summary -** protocol issues -*** negotiation -*** VOCABADD/DEL/SET sequences -*** remove 'copy' prefix from RemoteCopy type sequences? -*** smaller scope for OPEN-counter reference numbers? -** implementation issues -*** cred -*** oldbanana compatibility -*** Copyable/RemoteCopy default to __getstate__ or self.__dict__ ? -*** RIFoo['bar'] vs RIFoo.bar (should RemoteInterface inherit from Interface?) -*** constrain ReferenceUnslicer -*** serialize target.remote_foo usefully - -* decide whether to accept positional args in non-constrained methods - -DEFERRED until after 2.0 -<glyph> warner: that would be awesome but let's do it _later_ - -This is really a backwards-source-compatibility issue. In newpb, the -preferred way of invoking callRemote() is with kwargs exclusively: glyph's -felt positional arguments are more fragile. If the client has a -RemoteInterface, then they can convert any positional arguments into keyword -arguments before sending the request. - -The question is what to do when the client is not using a RemoteInterface. -Until recently, callRemote("bar") would try to find a matching RI. I changed -that to have callRemote("bar") never use an RI, and instead you would use -callRemote(RIFoo['bar']) to indicate that you want argument-checking. - -That makes positional arguments problematic in more situations than they were -before. The decision to be made is if the OPEN(call) sequence should provide -a way to convey positional args to the server (probably with numeric "names" -in the (argname, argvalue) tuples). 
If we do this, the server (which always -has the RemoteInterface) can do the positional-to-keyword mapping. But -putting this in the protocol will oblige other implementations to handle them -too. - -* change the method-call syntax to include an interfacename -DONE - -Scope the method name to the interface. This implies (I think) one of two -things: - - callRemote() must take a RemoteInterface argument - - each RemoteReference handles just a single Interface - -Probably the latter, maybe have the RR keep both default RI and a list of -all implemented ones, then adapting the RR to a new RI can be a simple copy -(and change of the default one) if the Referenceable knows about the RI. -Otherwise something on the local side will need to adapt one RI to another. -Need to handle reference-counting/DECREF properly for these shared RRs. - -From glyph: - - callRemote(methname, **args) # searches RIs - callRemoteInterface(remoteinterface, methname, **args) # single RI - - getRemoteURL(url, *interfaces) - - URL-RRefs should turn into the original Referenceable (in args/results) - (map through the factory's table upon receipt) - - URL-RRefs will not survive round trips. leave reference exchange for later. - (like def remote_foo(): return GlobalReference(self) ) - - move method-invocation code into pb.Referenceable (or IReferenceable - adapter). Continue using remote_ prefix for now, but make it a property of - that code so it can change easily. - <warner> ok, for today I'm just going to stick with remote_foo() as a - low-budget decorator, so the current restrictions are 1: subclass - pb.Referenceable, 2: implements() a RemoteInterface with method named "foo", - 3: implement a remote_foo method - <warner> and #1 will probably go away within a week or two, to be replaced by - #1a: subclass pb.Referenceable OR #1b: register an IReferenceable adapter - - try serializing with ISliceable first, then try IReferenceable. 
The - IReferenceable adapter must implements() some RemoteInterfaces and gets - serialized with a MyReferenceSlicer. - -http://svn.twistedmatrix.com/cvs/trunk/pynfo/admin.py?view=markup&rev=44&root=pynfo - -** use the methods of the RemoteInterface as the "method name" -DONE (provisional), using RIFoo['add'] - - rr.callRemote(RIFoo.add, **args) - -Nice and concise. However, #twisted doesn't like it, adding/using arbitrary -attributes of Interfaces is not clean (think about IFoo.implements colliding -with RIFoo.something). - - rr.callRemote(RIFoo['add'], **args) - RIFoo(rr).callRemote('add', **args) - adaptation, or narrowing? - -<warner> glyph: I'm adding callRemote(RIFoo.bar, **args) to newpb right now -<radix> wow. -<warner> seemed like a simpler interface than callRemoteInterface("RIFoo", -"bar", **args) -<radix> warner: Does this mean that IPerspective can be parameterized now? -<glyph> warner: bad idea -<exarkun> warner: Zope hates you! -<glyph> warner: zope interfaces don't support that syntax -<slyphon> zi does support multi-adapter syntax -<slyphon> but i don't really know what that is -<exarkun> warner: callRemote(RIFoo.getDescriptionFor("bar"), *a, **k) -<warner> glyph: yeah, I fake it. In RemoteInterfaceClass, I remove those -attributes, call InterfaceClass, and then put them all back in -<glyph> warner: don't add 'em as attributes -<glyph> warner: just fix the result of __getitem__ to add a slot actually -refer back to the interface -<glyph> radix: the problem is that IFoo['bar'] doesn't point back to IFoo -<glyph> warner: even better, make them callable :-) -<exarkun> glyph: IFoo['bar'].interface == 'IFoo' -<glyph> RIFoo['bar']('hello') -<warner> glyph: I was thinking of doing that in a later version of -RemoteInterface -<glyph> exarkun: >>> type(IFoo['bar'].interface) -<glyph> <type 'str'> -<exarkun> right -<exarkun> 'IFoo' -<exarkun> Just look through all the defined interfaces for ones with matching -names -<glyph> exarkun: ... e.g. 
*NOT* __main__.IFoo -<glyph> exarkun: AAAA you die -<radix> hee hee -* warner struggles to keep up with his thoughts and those of people around him -* glyph realizes he has been given the power to whine -<warner> glyph: ok, so with RemoteInterface.__getitem__, you could still do -rr.callRemote(RIFoo.bar, **kw), right? -<warner> was your objection to the interface or to the implementation? -<itamar> I really don't think you should add attributes to the interface -<warner> ok -<warner> I need to stash a table of method schemas somewhere -<itamar> just make __getitem__ return better type of object -<itamar> and ideally if this is generic we can get it into upstream -<exarkun> Is there a reason Method.interface isn't a fully qualified name? -<itamar> not necessarily -<itamar> I have commit access to zope.interface -<itamar> if you have any features you want added, post to -interface-dev@zope.org mailing list -<itamar> and if Jim Fulton is ok with them I can add them for you -<warner> hmm -<warner> does using RIFoo.bar to designate a remote method seem reasonable? -<warner> I could always adapt it to something inside callRemote -<warner> something PB-specific, that is -<warner> but that adapter would have to be able to pull a few attributes off -the method (name, schema, reference to the enclosing RemoteInterface) -<warner> and we're really talking about __getattr__ here, not __getitem__, -right? -<exarkun> for x.y yes -<itamar> no, I don't think that's a good idea -<itamar> interfaces have all kinds od methods on them already, for -introspection purposes -<itamar> namespace clashes are the suck -<itamar> unless RIFoo isn't really an Interface -<itamar> hm -<itamar> how about if it were a wrapper around a regular Interface? -<warner> yeah, RemoteInterfaces are kind of a special case -<itamar> RIFoo(IFoo, publishedMethods=['doThis', 'doThat']) -<itamar> s/RIFoo/RIFoo = RemoteInterface(/ -<exarkun> I'm confused. Why should you have to specify which methods are -published? 
-<itamar> SECURITY! -<itamar> not actually necessary though, no -<itamar> and may be overkill -<warner> the only reason I have it derive from Interface is so that we can do -neat adapter tricks in the future -<itamar> that's not contradictory -<itamar> RIFoo(x) would still be able to do magic -<itamar> you wouldn't be able to check if an object provides RIFoo, though -<itamar> which kinda sucks -<itamar> but in any case I am against RIFoo.bar -<warner> pity, it makes the callRemote syntax very clean -<radix> hm -<radix> So how come it's a RemoteInterface and not an Interface, anyway? -<radix> I mean, how come that needs to be done explicitly. Can't you just -write a serializer for Interface itself? - -* warner goes to figure out where the RemoteInterface discussion went after he - got distracted -<warner> maybe I should make RemoteInterface a totally separate class and just -implement a couple of Interface-like methods -<warner> cause rr.callRemote(IFoo.bar, a=1) just feels so clean -<Jerub> warner: why not IFoo(rr).bar(a=1) ? -<warner> hmm, also a possibility -<radix> well -<radix> IFoo(rr).callRemote('bar') -<radix> or RIFoo, or whatever -<Jerub> hold on, what does rr inherit from? 
-<warner> RemoteReference -<radix> it's a RemoteReference -<Jerub> then why not IFoo(rr) / -<warner> I'm keeping a strong distinction between local interfaces and remote -ones -<Jerub> ah, oka.y -<radix> warner: right, you can still do RIFoo -<warner> ILocal(a).meth(args) is an immediate function call -<Jerub> in that case, I prefer rr.callRemote(IFoo.bar, a=1) -<radix> .meth( is definitely bad, we need callRemote -<warner> rr.callRemote("meth", args) returns a deferred -<Jerub> radix: I don't like from foo import IFoo, RIFoo -<warner> you probably wouldn't have both an IFoo and an RIFoo -<radix> warner: well, look at it this way: IFoo(rr).callRemote('foo') still -makes it obvious that IFoo isn't local -<radix> warner: you could implement RemoteReferen.__conform__ to implement it -<warner> radix: I'm thinking of providing some kind of other class that would -allow .meth() to work (without the callRemote), but it wouldn't be the default -<radix> plus, IFoo(rr) is how you use interfaces normally, and callRemote is -how you make remote calls normally, so it seems that's the best way to do -interfaces + PB -<warner> hmm -<warner> in that case the object returned by IFoo(rr) is just rr with a tag -that sets the "default interface name" -<radix> right -<warner> and callRemote(methname) looks in that default interface before -looking anywhere else -<warner> for some reason I want to get rid of the stringyness of the method -name -<warner> and the original syntax (callRemoteInterface('RIFoo', 'methname', -args)) felt too verbose -<radix> warner: well, isn't that what your optional .meth thing is for? 
-<radix> yes, I don't like that either -<warner> using callRemote(RIFoo.bar, args) means I can just switch on the -_name= argument being either a string or a (whatever) that's contained in a -RemoteInterface -<warner> a lot of it comes down to how adapters would be most useful when -dealing with remote objects -<warner> and to what extent remote interfaces should be interchangeable with -local ones -<radix> good point. I have never had a use case where I wanted to adapt a -remote object, I don't think -<radix> however, I have had use cases to send interfaces across the wire -<radix> e.g. having a parameterized portal.login() interface -<warner> that'll be different, just callRemote('foo', RIFoo) -<radix> yeah. -<warner> the current issue is whether to pass them by reference or by value -<radix> eugh -<radix> Can you explain it without using those words? :) -<warner> hmm -<radix> Do you mean, Referenceable style vs Copyable style? -<warner> at the moment, when you send a Referenceable across the wire, the -id-number is accompanied with a list of strings that designate which -RemoteInterfaces the original claims to provide -<warner> the receiving end looks up each string in a local table, and -populates the RemoteReference with a list of RemoteInterface classes -<warner> the table is populated by metaclass magic that runs when a 'class -RIFoo(RemoteInterface)' definition is complete -<radix> ok -<radix> so a RemoteInterface is simply serialized as its qual(), right? 
-<warner> so as long as both sides include the same RIFoo definition, they'll -wind up with compatible remote interfaces, defining the same method names, -same method schemas, etc -<warner> effectively -<warner> you can't just send a RemoteInterface across the wire right now, but -it would be easy to add -<warner> the places where they are used (sending a Referenceable across the -wire) all special case them -<radix> ok, and you're considering actually writing a serializer for them that -sends all the information to totally reconstruct it on the other side without -having the definiton -<warner> yes -<warner> or having some kind of debug method which give you that -<radix> I'd say, do it the way you're doing it now until someone comes up with -a use case for actually sending it... -<warner> right -<warner> the only case I can come up with is some sort of generic object -browser debug tool -<warner> everything else turns into a form of version negotiation which is -better handled elsewhere -<warner> hmm -<warner> so RIFoo(rr).callRemote('bar', **kw) -<warner> I guess that's not too ugly -<radix> That's my vote. :) -<warner> one thing it lacks is the ability to cleanly state that if 'bar' -doesn't exist in RIFoo then it should signal an error -<warner> whereas callRemote(RIFoo.bar, **kw) would give you an AttributeError -before callRemote ever got called -<warner> i.e. "make it impossible to express the incorrect usage" -<radix> mmmh -<radix> warner: but you _can_ check it immediately when it's called -<warner> in the direction I was heading, callRemote(str) would just send the -method request and let the far end deal with it, no schema-checking involved -<radix> warner: which, 99% of the time, is effectively the same time as -IFoo.bar would happen -<warner> whereas callRemote(RIFoo.bar) would indicate that you want schema -checking -<warner> yeah, true -<radix> hm. 
-<warner> (that last feature is what allowed callRemote and callRemoteInterface -to be merged) -<warner> or, I could say that the normal RemoteReference is "untyped" and does -not do schema checking -<warner> but adapting one to a RemoteInterface results in a -TypedRemoteReference which does do schema checking -<warner> and which refuses to be invoked with method names that are not in the -schema -<radix> warner: we-ell -<radix> warner: doing method existence checking is cool -<radix> warner: but I think tying any further "schema checking" to adaptation -is a bad idea -<warner> yeah, that's my hunch too -<warner> which is why I'd rather not use adapters to express the scope of the -method name (which RemoteInterface it is supposed to be a part of) -<radix> warner: well, I don't think tying it to callRemote(RIFoo.methName) -would be a good idea just the same -<warner> hm -<warner> so that leaves rr.callRemote(RIFoo['add']) and -rr.callRemoteInterface(RIFoo, 'add') -<radix> OTOH, I'm inclined to think schema checking should happen by default -<radix> It's just a the matter of where it's parameterized -<warner> yeah, it's just that the "default" case (rr.callRemote('name')) needs -to work when there aren't any RemoteInterfaces declared -<radix> warner: oh -<warner> but if we want to encourage people to use the schemas, then we need -to make that case simple and concise -* radix goes over the issue in his head again -<radix> Yes, I think I still have the same position. -<warner> which one? :) -<radix> IFoo(rr).callRemote("foo"); which would do schema checking because -schema checking is on by default when it's possible -<warner> using an adaptation-like construct to declare a scope of the method -name that comes later -<radix> well, it _is_ adaptation, I think. -<radix> Adaptation always has plugged in behavior, we're just adding a bit -more :) -<warner> heh -<warner> it is a narrowing of capability -<radix> hmm, how do you mean? 
-<warner> rr.callRemote("foo") will do the same thing -<warner> but rr.callRemote("foo") can be used without the remote interfaces -<radix> I think I lost you. -<warner> if rr has any RIs defined, it will try to use them (and therefore -complain if "foo" does not exist in any of them, or if the schema is violated) -<radix> Oh. That's strange. -<radix> So it's really quite different from how interfaces regularly work... -<warner> yeah -<warner> except that if you were feeling clever you could use them the normal -way -<radix> Well, my inclination is to make them work as similarly as possible. -<warner> "I have a remote reference to something that implements RIFoo, but I -want to use it in some other way" -<radix> s/possible/practical/ -<warner> then IBar(rr) or RIBar(rr) would wrap rr in something that knows how -to translate Bar methods into RIFoo remote methods -<radix> Maybe it's not practical to make them very similar. -<radix> I see. - -rr.callRemote(RIFoo.add, **kw) -rr.callRemote(RIFoo['add'], **kw) -RIFoo(rr).callRemote('add', **kw) - -I like the second one. Normal Interfaces behave like a dict, so IFoo['add'] -gets you the method-describing object (z.i.i.Method). My RemoteInterfaces -don't do that right now (because I remove the attributes before handing the -RI to z.i), but I could probably fix that. I could either add attributes to -the Method or hook __getitem__ to return something other than a Method -(maybe a RemoteMethodSchema). - -Those Method objects have a .getSignatureInfo() which provides almost -everything I need to construct the RemoteMethodSchema. Perhaps I should -post-process Methods rather than pre-process the RemoteInterface. I can't -tell how to use the return value trick, and it looks like the function may -be discarded entirely once the Method is created, so this approach may not -work. - -On the server side (Referenceable), subclassing Interface is nice because it -provides adapters and implements() queries. 
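The RIFoo['add'] idea above (hook __getitem__ to return something richer than a z.i Method, maybe a RemoteMethodSchema) can be sketched without zope.interface. Every name here (RemoteMethodSchema, RemoteInterfaceSketch, interface_name, argConstraints) is hypothetical illustration, not foolscap's actual classes:

```python
class RemoteMethodSchema:
    """Stand-in for the method-describing object RIFoo['add'] could
    return: it knows its own name and which interface it belongs to."""
    def __init__(self, interface_name, name, argConstraints):
        self.interface_name = interface_name
        self.name = name
        self.argConstraints = argConstraints

class RemoteInterfaceSketch:
    """Toy RemoteInterface that behaves like a dict of method schemas."""
    def __init__(self, name, methods):
        self.name = name
        self._methods = {
            m: RemoteMethodSchema(name, m, sig)
            for m, sig in methods.items()
        }
    def __getitem__(self, methname):
        # unknown names raise immediately, before callRemote is ever
        # invoked -- the "impossible to express the incorrect usage"
        # property discussed above
        return self._methods[methname]

RIFoo = RemoteInterfaceSketch("RIFoo", {"add": {"a": int, "b": int}})
```

With this shape, rr.callRemote(RIFoo['add'], a=1, b=2) can dispatch on receiving either a plain string (no schema checking) or a schema object (checked), which is what allowed callRemote and callRemoteInterface to be merged.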
- -On the client side (RemoteReference), subclassing Interface is a hassle: I -don't think adapters are as useful, but getting at a method (as an attribute -of the RI) is important. We have to bypass most of Interface to parse the -method definitions differently. - -* create UnslicerRegistry, registerUnslicer -DONE (PROVISIONAL), flat registry (therefore problematic for len(opentype)>1) - -consider adopting the existing collection API (getChild, putChild) for this, -or maybe allow registerUnslicer() to take a callable which behaves kind of -like a twisted.web isLeaf=1 resource (stop walking the tree, give all index -tokens to the isLeaf=1 node) - -also some APIs to get a list of everything in the registry - -* use metaclass to auto-register RemoteCopy classes -DONE - -** use metaclass to auto-register Unslicer classes -DONE - -** and maybe Slicer classes too -DONE with name 'slices', perhaps change to 'slicerForClasses'? - - class FailureSlicer(slicer.BaseSlicer): - classname = "twisted.python.failure.Failure" - slicerForClasses = (failure.Failure,) # triggers auto-register - -** various registry approaches -DONE - -There are currently three kinds of registries used in banana/newpb: - - RemoteInterface <-> interface name - class/type -> Slicer (-> opentype) -> Unslicer (-> class/type) - Copyable subclass -> copyable-opentype -> RemoteCopy subclass - -There are two basic approaches to representing the mappings that these -registries implement. The first is implicit, where the local objects are -subclassed from Sliceable or Copyable or RemoteInterface and have attributes -to define the wire-side strings that represent them. On the receiving side, -we make extensive use of metaclasses to perform automatic registration -(taking names from class attributes and mapping them to the factory or -RemoteInterface used to create the remote version). 
The second approach is explicit, where pb.registerRemoteInterface, -pb.registerRemoteCopy, and pb.registerUnslicer are used to establish the -receiving-side mapping. There isn't a clean way to do it explicitly on the -sending side, since we already have instances whose classes can give us -whatever information we want. - -The advantage of implicit is simplicity: no more questions about why my -pb.RemoteCopy is giving "not unserializable" errors. The mere act of -importing a module is enough to let PB create instances of its classes. - -The advantage of doing it explicitly is to remind the user about the -existence of those maps, because the set of factory classes in the receiving -map is precisely equal to the user's exposure (from a security point of -view). See the E paper on secure-serialization for some useful concepts. - -A disadvantage of implicit is that you can't quite be sure what, exactly, -you're exposed to: the registrations take place all over the place. - -To make the explicit approach less painful, we can use quotient's .wsv files -(whitespace-separated values) which map from class to string and back again. -The file could list fully-qualified classname, wire-side string, and -receiving factory class on each line. The Broker (or rather the RootSlicer -and RootUnslicer) would be given a set of .wsv files to define their -mapping. It would get all the registrations at once (instead of having them -scattered about). They could also demand-load the receive-side factory -classes. - -For now, go implicit. Put off the decision until we have some more -experience with using newpb. - -* move from VocabSlicer sequence to ADDVOCAB/DELVOCAB tokens - -Requires a .wantVocabString flag in the parser, which is kind of icky but -fixes the annoying asymmetry between set (vocab sequence) and get (VOCAB -token). Might want a CLEARVOCAB token too. - -On second thought, this won't work. There isn't room for both a vocab number -and a variable-length string in a single token. 
It must be an open sequence. -However, it could be an add/del/set-vocab sequence, allowing the vocab to be -modified incrementally. - -** VOCABize interface/method names - -One possibility is to make a list of all strings used by all known -RemoteInterfaces and all their methods, then send it at broker connection -time as the initial vocab map. A better one (maybe) is to somehow track what -we send and add a word to the vocab once we've sent it more than three -times. - -Maybe vocabize the pairs, as "ri/name1","ri/name2", etc, or maybe do them -separately. Should do some handwaving math to figure out which is better. - -* nail down some useful schema syntaxes - -This has two parts: parsing something like a __schema__ class attribute (see -the sketches in schema.xhtml) into a tree of FooConstraint objects, and -deciding how to retrieve schemas at runtime from things like the object being -serialized or the object being called from afar. To be most useful, the -syntax needs to mesh nicely (read "is identical to") things like formless and -(maybe?) atop or whatever has replaced the high-density highly-structured -save-to-disk scheme that twisted.world used to do. - -Some lingering questions in this area: - - When an object has a remotely-invokable method, where does the appropriate - MethodConstraint come from? Some possibilities: - - an attribute of the method itself: obj.method.__schema__ - - from inside a __schema__ attribute of the object's class - - from inside a __schema__ attribute of an Interface (which?) that the object - implements - - Likewise, when a caller holding a RemoteReference invokes a method on it, it - would be nice to enforce a schema on the arguments they are sending to the - far end ("be conservative in what you send"). Where should this schema come - from? It is likely that the sender only knows an Interface for their - RemoteReference. 
- - When PB determines that an object wants to be copied by value instead of by - reference (pb.Copyable subclass, Copyable(obj), schema says so), where - should it find a schema to define what exactly gets copied over? A class - attribute of the object's class would make sense: most objects would do - this, some could override jellyFor to get more control, and others could - override something else to push a new Slicer on the stack and do streaming - serialization. Whatever the approach, it needs to be paralleled by the - receiving side's unjellyableRegistry. - -* RemoteInterface instances should have an "RI-" prefix instead of "I-" - -DONE - -* merge my RemoteInterface syntax with zope.interface's - -I hacked up a syntax for how method definitions are parsed in -RemoteInterface objects. That syntax isn't compatible with the one -zope.interface uses for local methods, so I just delete them from the -attribute dictionary to avoid causing z.i indigestion. It would be nice if -they were compatible so I didn't have to do that. This basically translates -into identifying the nifty extra flags (like priority classes, no-response) -that we want on these methods and finding a z.i-compatible way to implement -them. It also means thinking of SOAP/XML-RPC schemas and having a syntax -that can represent everything at once. - - -* use adapters to enable pass-by-reference or pass-by-value - -It should be possible to pass a reference with variable forms: - - rr.callRemote("foo", 1, Reference(obj)) - rr.callRemote("bar", 2, Copy(obj)) - -This should probably adapt the object to IReferenceable or ICopyable, which -are like ISliceable except they can pass the object by reference or by -value. 
The slicing process should be: - - look up the type() in a table: this handles all basic types - else adapt the object to ISliceable, use the result - else raise an Unsliceable exception - (and point the user to the docs on how to fix it) - -The adapter returned by IReferenceable or ICopyable should implement -ISliceable, so no further adaptation will be done. - -* remove 'copy' prefix from remotecopy banana type names? - -<glyph> warner: did we ever finish our conversation on the usefulness of the -(copy foo blah) namespace rather than just (foo blah)? -<warner> glyph: no, I don't think we did -<glyph> warner: do you still have (copy foo blah)? -<warner> glyph: yup -<warner> so far, it seems to make some things easier -<warner> glyph: the sender can subclass pb.Copyable and not write any new -code, while the receiver can write an Unslicer and do a registerRemoteCopy -<warner> glyph: instead of the sender writing a whole slicer and the receiver -registering at the top-level -<glyph> warner: aah -<warner> glyph: although the fact that it's easier that way may be an artifact -of my sucky registration scheme -<glyph> warner: so the advantage is in avoiding registration of each new -unslicer token? -<glyph> warner: yes. 
I'm thinking that a metaclass will handily remove the -need for extra junk in the protocol ;) -<warner> well, the real reason is my phobia about namespace purity, of course -<glyph> warner: That's what the dots are for -<warner> but ease of dispatch is also important -<glyph> warner: I'm concerned about it because I consider my use of the same -idiom in the first version of PB to be a serious wart -* warner nods -<warner> I will put together a list of my reasoning -<glyph> warner: I think it's likely that PB implementors in other languages -are going to want to introduce new standard "builtin" types; our "builtins" -shouldn't be limited to python's provided data structures -<moshez> glyph: wait -<warner> ok -<moshez> glyph: are you talking of banana types -<moshez> glyph: or really PB -<warner> in which case (copy blah blah) is a non-builtin type, while -(type-foo) is a builtin type -<glyph> warner: plus, our namespaces are already quite well separated, I can -tell you I will never be declaring new types outside of quotient.* and -twisted.* :) -<warner> moshez: this is mostly banana (or what used to be jelly, really) -<glyph> warner: my inclination is to standardize by convention -<glyph> warner: *.* is a non-builtin type, [~.] is a builtin -<moshez> glyph: ? -<glyph> sorry [^.]* -<glyph> my regular expressions and shell globs are totally confused but you -know what I mean -<glyph> moshez: yes -<moshez> glyph: hrm -<saph_w> glyph: you're making crazy anime faces -<moshez> glyph: why do we need any non-Python builtin types -<glyph> moshez: because I want to destroy SOAP, and doing that means working -with people I don't like -<glyph> moshez: outside of python -<moshez> glyph: I meant, "what specific types" -<moshez> I'd appreciate a blog on that - -* have Copyable/RemoteCopy default to __getstate__/__setstate__? - -At the moment, the default implementations of getStateToCopy() and -setCopyableState() get and set __dict__ directly. 
Should the default instead -be to call __getstate__() or __setstate__()? - -* make slicer/unslicers for pb.RemoteInterfaces - -exarkun's use case requires these Interfaces to be passable by reference -(i.e. by name). It would also be interesting to let them be passed (and -requested!) by value, so you can ask a remote peer exactly what their -objects will respond to (the method names, the argument values, the return -value). This also requires that constraints be serializable. - -do this, should be referenceable (round-trip should return the same object), -should use the same registration lookup that RemoteReference(interfacelist) -uses - -* investigate decref/Referenceable race - -Any object that includes some state when it is first sent across the wire -needs more thought. The far end could drop the last reference (at time t=1) -while a method is still pending that wants to send back the same object. If -the method finishes at time t=2 but the decref isn't received until t=3, the -object will be sent across the wire without the state, and the far end will -receive it for the "first" time without that associated state. - -This kind of conserve-bandwidth optimization may be a bad idea. Or there -might be a reasonable way to deal with it (maybe request the state if it -wasn't sent and the recipient needs it, and delay delivery of the object -until the state arrives). - -DONE, the RemoteReference is held until the decref has been acked. As long as -the methods are executed in-order, this will prevent the race. TODO: -third-party references (and other things that can cause out-of-order -execution) could mess this up. 
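The decref race above is commonly closed with counted decrefs: the peer's decref message carries how many references it is releasing, so a reference that crossed the wire while the decref was in flight keeps the entry alive. A minimal sketch of the send-side bookkeeping; `SendTable` and its method names are hypothetical, not foolscap's actual API:

```python
class SendTable:
    """Send-side table of published objects, held strongly until the
    peer has acknowledged dropping every reference it was given."""
    def __init__(self):
        self.table = {}    # refid -> [obj, outstanding reference count]
        self.ids = {}      # id(obj) -> refid
        self.next_id = 0

    def send(self, obj):
        # called each time obj crosses the wire; bump its count
        refid = self.ids.get(id(obj))
        if refid is None:
            refid = self.next_id
            self.next_id += 1
            self.ids[id(obj)] = refid
            self.table[refid] = [obj, 0]
        self.table[refid][1] += 1
        return refid

    def decref(self, refid, count):
        # the peer reports how many references it is giving up; a
        # re-send that raced with this decref leaves the count > 0,
        # so the object (and its state) is never forgotten early
        entry = self.table[refid]
        entry[1] -= count
        if entry[1] == 0:
            del self.ids[id(entry[0])]
            del self.table[refid]
```

The "method finishes at t=2, decref arrives at t=3" scenario from the text is exactly the case where the count stays positive and the entry survives.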
- -* sketch out how to implement glyph's crazy non-compressed sexpr encoding - -* consider a smaller scope for OPEN-counter reference numbers - -For newpb, we moved to implicit reference numbers (counting OPEN tags -instead of putting a number in the OPEN tag) because we didn't want to burn -so much bandwidth: it isn't feasible to predict whether your object will -need to be referenced in the future, so you always have to be prepared to -reference it, so we always burn the memory to keep track of them (generally -in a ScopedSlicer subclass). If we used explicit refids then we'd have to -burn the bandwidth too. - -The sorta-problem is that these numbers will grow without bound as long as -the connection remains open. After a few hours of sending 100-byte objects -over a 100MB connection, you'll hit 1G-references and will have to start -sending them as LONGINT tokens, which is annoying and slightly verbose (say -3 or 4 bytes of number instead of 1 or 2). You never keep track of that many -actual objects, because the references do not outlive their parent -ScopedSlicer. - -The fact that the references themselves are scoped to the ScopedSlicer -suggests that the reference numbers could be too. Each ScopedSlicer would -track the number of OPEN tokens emitted (actually the number of -slicerForObject calls made, except you'd want to use a different method to -make sure that children who return a Slicer themselves don't corrupt the -OPEN count). - -This requires careful synchronization between the ScopedSlicers on one end -and the ScopedUnslicers on the other. I suspect it would be slightly -fragile. - -One sorta-benefit would be that a somewhat human-readable sexpr-based -encoding would be even more human readable if the reference numbers stayed -small (you could visually correlate objects and references more easily). 
The ScopedSlicer's open-parenthesis could be represented with a curly brace
or something, then the refNN number would refer to the NN'th left-paren from
the last left-brace. It would also make it clear that the recipient will not
care about objects outside that scope.

* implement the FDSlicer

Over a unix socket, you can pass fds. exarkun had a presentation at PyCon04
describing the use of this to implement live application upgrade. I think
that we could make a simple FDSlicer to hide the complexity of the
out-of-band part of the communication. A sketch (this assumes sendmsg() and
recvmsg() helpers from a platform extension; modern Pythons provide
socket.sendmsg/socket.recvmsg directly):

import struct
import socket
from socket import SCM_RIGHTS
from twisted.internet import unix
from twisted.python import log

class Server(unix.Server):
    def sendFileDescriptors(self, fileno, data="Filler"):
        """
        @param fileno: An iterable of the file descriptors to pass.
        """
        payload = struct.pack("%di" % len(fileno), *fileno)
        r = sendmsg(self.fileno(), data, 0,
                    (socket.SOL_SOCKET, SCM_RIGHTS, payload))
        return r

class Client(unix.Client):
    def doRead(self):
        if not self.connected:
            return
        try:
            msg, flags, ancillary = recvmsg(self.fileno())
        except Exception:
            log.msg('recvmsg():')
            log.err()
        else:
            # each fd arrives as a packed 32-bit integer in the
            # ancillary (SCM_RIGHTS) data
            buf = ancillary[0][2]
            fds = []
            while buf:
                fd, buf = buf[:4], buf[4:]
                fds.append(struct.unpack("i", fd)[0])
            try:
                self.protocol.fileDescriptorsReceived(fds)
            except Exception:
                log.msg('protocol.fileDescriptorsReceived')
                log.err()
        return unix.Client.doRead(self)

* implement AsyncDeferred returns

dash wanted to implement a TransferrableReference object with a scheme that
would require creating a new connection (to a third-party Broker) during
ReferenceUnslicer.receiveClose. This would cause the object deserialization
to be asynchronous.

At the moment, Unslicers can return a Deferred from their receiveClose
method. This is used by immutable containers (like tuples) to indicate that
their object cannot be created yet. Other containers know to watch for these
Deferreds and add a callback which will update their own entries
appropriately.
The implicit requirement is that all these Deferreds fire before the
top-level parent object (usually a CallUnslicer) finishes. This allows
circular references involving immutable containers to be resolved into the
final object graph before the target method is invoked.

To accommodate Deferreds which will fire at arbitrary points in the future,
it would be useful to create a marker subclass named AsyncDeferred. If an
unslicer returns such an object, the container parent starts by treating it
like a regular Deferred, but it also knows that its object is not
"complete", and therefore returns an AsyncDeferred of its own. When the
child completes, the parent can complete, etc. The difference between the
two types: Deferred means that the object will be complete before the
top-level parent is finished, while AsyncDeferred makes no claims about when
the object will be finished.

CallUnslicer would know that if any of its arguments are Deferreds or
AsyncDeferreds then it needs to hold off on the broker.doCall until all
those Deferreds have fired. Top-level objects are not required to
differentiate between the two types, because they do not return an object to
an enclosing parent (the CallUnslicer is a child of the RootUnslicer, but it
always returns None).

Other issues: we'll need a schema to let you say whether you'll accept these
late-bound objects or not (because if you do accept them, you won't be able
to impose the same sorts of type-checks as you would on immediate objects).
Also this will impact the in-order-invocation promises of PB method calls,
so we may need to implement the "it is ok to run this asynchronously" flag
first, then require that TransferrableReference objects are only passed to
methods with the flag set.
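The marker-subclass idea can be sketched with a tiny stand-in for Twisted's Deferred (real code would subclass twisted.internet.defer.Deferred; the classification helper is hypothetical):

```python
class Deferred:
    # minimal stand-in for twisted.internet.defer.Deferred, just enough
    # to illustrate the marker idea
    def __init__(self):
        self._callbacks = []
        self._fired = False
        self.result = None

    def addCallback(self, f):
        if self._fired:
            self.result = f(self.result)
        else:
            self._callbacks.append(f)
        return self

    def callback(self, result):
        self._fired = True
        self.result = result
        for f in self._callbacks:
            self.result = f(self.result)
        self._callbacks = []


class AsyncDeferred(Deferred):
    """Marker subclass: the object may not exist until an arbitrary point
    in the future, so enclosing parents must report 'incomplete' upward."""


def child_completeness(child_results):
    # how a container parent would classify itself after unslicing children
    if any(isinstance(r, AsyncDeferred) for r in child_results):
        return "async"      # parent must itself return an AsyncDeferred
    if any(isinstance(r, Deferred) for r in child_results):
        return "deferred"   # will resolve before the top-level call finishes
    return "complete"       # all children are immediately available
```

The key point is that the parent's classification propagates upward, so a single AsyncDeferred child makes the whole containing call late-bound.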
- -Also, it may not be necessary to have a marker subclass of Deferred: perhaps -_any_ Deferred which arrives from a child is an indication that the object -will not be available until an unknown time in the future, and obligates the -parent to return another Deferred upwards (even though their object could be -created synchronously). Or, it might be better to implement this some other -way, perhaps separating "here is my object" from "here is a Deferred that -will fire when my object is complete", like a call to -parent.addDependency(self.deferred) or something. - -DONE, needs testing - -* TransferrableReference - -class MyThing(pb.Referenceable): pass -r1 = MyThing() -r2 = Facet(r1) -g1 = Global(r1) -class MyGlobalThing(pb.GloballyReferenceable): pass -g2 = MyGlobalThing() -g3 = Facet(g2) - -broker.setLocation("pb://hostname.com:8044") - -rem.callRemote("m1", r1) # limited to just this connection -rem.callRemote("m2", Global(r1)) # can be published -g3 = Global(r1) -rem.callRemote("m3", g1) # can also be published.. -g1.revoke() # but since we remember it, it can be revoked too -g1.restrict() # and, as a Facet, we can revoke some functionality but not all - -rem.callRemote("m1", g2) # can be published - -E tarball: jsrc/net/captp/tables/NearGiftTable - -issues: - 1: when A sends a reference on B to C, C's messages to the object - referenced must arrive after any messages A sent before the reference forks - - in particular, if A does: - B.callRemote("1", hugestring) - B.callRemote("2_makeYourSelfSecure", args) - C.callRemote("3_transfer", B) - - and C does B.callRemote("4_breakIntoYou") as soon as it gets the reference, - then the A->B queue looks like (1, 2), and the A->C queue looks like (3). - The transfer message can be fast, and the resulting 4 message could be - delivered to B before the A->B queue manages to deliver 2. 

 2: an object which gets passed through multiple external brokers and
    eventually comes home must be recognized as a local object

 3: Copyables that contain RemoteReferences must be passable between hosts

E cannot do all three of these at once:
http://www.erights.org/elib/distrib/captp/WormholeOp.html

I think that it's ok to tell people who want this guarantee to explicitly
serialize it like this:

  B.callRemote("1", hugestring)
  d = B.callRemote("2_makeYourSelfSecure", args)
  d.addCallback(lambda res: C.callRemote("3_transfer", B))

Note that E might not require that method calls even have a return value, so
they might not have had a convenient way to express this enforced
serialization.

** more thoughts

To enforce the partial-ordering, you could do the equivalent of:
  A:
    B.callRemote("1", hugestring)
    B.callRemote("2_makeYourSelfSecure", args)
    nonce = makeNonce()
    B.callRemote("makeYourSelfAvailableAs", nonce)
    C.callRemote("3_transfer", (nonce, B.name))
  C:
    B.callRemote("4_breakIntoYou")

C uses the nonce when it connects to B. It knows the name of the reference,
so it can compare it against some other reference to the same thing, but it
can't actually use that name alone to get access.

When the connection request arrives at B, it sees B.name (which is also
unguessable), so that gives it reason to believe that it should queue C's
request (that it isn't just a DoS attack). It queues it until it sees A's
request to makeYourSelfAvailableAs with the matching nonce. Once that
happens, it can provide the reference back to C.

This implies that C won't be able to send *any* messages to B until that
handshake has completed. It might be desirable to avoid the extra round-trip
this would require.
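The queue-until-the-nonce-arrives behavior at B could look like this (hypothetical helper names, no wire protocol shown):

```python
class NonceGiftTable:
    """B's side of the handshake sketched above: C's redemption request
    is queued until A's makeYourSelfAvailableAs arrives with a matching
    nonce. This is an illustrative sketch, not foolscap API."""

    def __init__(self):
        self.available = {}   # nonce -> referenced object
        self.waiting = {}     # nonce -> callbacks queued by early arrivals

    def make_available(self, nonce, obj):
        # A -> B: "makeYourSelfAvailableAs(nonce)"
        self.available[nonce] = obj
        for cb in self.waiting.pop(nonce, []):
            cb(obj)

    def redeem(self, nonce, cb):
        # C -> B: redeem the gift; queue if A's message hasn't arrived yet
        if nonce in self.available:
            cb(self.available[nonce])
        else:
            self.waiting.setdefault(nonce, []).append(cb)
```

This is exactly why C can send no messages to B before the handshake completes: until make_available() fires, C's request sits in the waiting table.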

** more thoughts

  url = PBServerFactory.registerReference(ref, name=None)
  creates human-readable URLs or random identifiers

the factory keeps a bidirectional mapping of names and Referenceables

when a Referenceable gets serialized, if the factory's table doesn't have a
name for it, the factory creates a random one. This entry in the table is
kept alive by two things:

  a live reference by one of the factory's Brokers
  an entry in a Broker's "gift table"

When a RemoteReference gets serialized (and it doesn't point back to the
receiving Broker, and thus get turned into a your-reference sequence),

<warner> A->C: "I'm going to send somebody a reference to you, incref your
  gift table", C->A: roger that, here's a gift nonce
<warner> A->B: "here's Carol's reference: URL plus nonce"
<warner> B->C: "I want a liveref to your 'Carol' object, here's my ticket
  (nonce)", C->B: "ok, ticket redeemed, here's your liveref"

once more, without nonces:
  A->C: "I'm going to send somebody a reference to you, incref your
    gift table", C->A: roger that
  A->B: "here's Carol's reference: URL"
  B->C: "I want a liveref to your 'Carol' object", C->B: "ok, here's your
    liveref"

really:
  on A: c.vat.callRemote("giftYourReference", c).addCallback(step2)
        c is serialized as (your-reference, clid)
  on C: vat.remote_giftYourReference(which): self.table[which] += 1; return
  on A: step2: b.introduce(c)
        c is serialized as (their-reference, url)
  on B: deserialization sees their-reference
        newvat = makeConnection(URL)
        newvat.callRemote("redeemGift", URL).addCallback(step3)
  on C: vat.remote_redeemGift(URL):
        ref = self.urls[URL]; self.table[ref] -= 1; return ref
        ref is serialized as (my-reference, clid)
  on B: step3(c): b.remote_introduce(c)

problem: if alice sends a thousand copies, that means these 5 messages are
each sent a thousand times. The makeConnection is cached, but the rest are
not.
We don't remember that we've already made this gift before, that the
other end probably still has it. Hm, but we also don't know that they didn't
lose it already.

** ok, a plan:

concern 1: objects must be kept alive as long as there is a RemoteReference
to them.

concern 2: we should be able to tell when an object is being sent for the
first time, to add metadata (interface list, public URL) that would be
expensive to add to every occurrence.

  each (my-reference) sent over the wire increases the broker's refcount on
  both ends.

  the receiving Broker retains a weakref to the RemoteReference, and retains
  a copy of the metadata necessary to create it in the clid table (basically
  the entire contents of the RemoteReference). When the weakref expires, it
  marks the clid entry as "pending-free", and sends a decref(clid,N) to the
  other Broker. The decref is actually sent with
  broker.callRemote("decref", clid, N), so it can be acked.

  the sending broker gets the decref and reduces its count by N. If another
  reference was sent recently, this count may not drop all the way to zero,
  indicating there is a reference "in flight" and the far end should be
  ready to deal with it (by making a new RemoteReference with the same
  properties as the old one). If the remaining count != 0, it returns False
  to indicate that this was not the last decref message for the clid. If the
  remaining count == 0, it returns True, since it is the last decref, and
  removes the entry from its table. Once remote_decref returns True, the
  clid is retired.

  the receiving broker receives the ack from the decref. If the ack says
  last==True, the clid table entry is freed. If it says last==False, then
  there should have been another (my-reference) received before the ack, so
  the refcount should be non-zero.
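The sender-side bookkeeping described above can be sketched as follows (an illustrative sketch assuming in-order message delivery, not the foolscap Broker itself):

```python
class SendingBrokerSketch:
    def __init__(self):
        self.myrefs = {}   # clid -> number of references sent minus decrefs

    def reference_sent(self, clid):
        # called each time a (my-reference clid) sequence goes out
        self.myrefs[clid] = self.myrefs.get(clid, 0) + 1

    def remote_decref(self, clid, n):
        # the far end saw its RemoteReference expire after n receipts
        self.myrefs[clid] -= n
        if self.myrefs[clid] == 0:
            del self.myrefs[clid]   # clid is retired
            return True             # this was the last decref
        return False                # a reference is still in flight
```

The False return is what tells the receiving broker that a fresh (my-reference) crossed paths with its decref, so it must be prepared to re-create the RemoteReference from its stashed metadata.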
- - message sequence: - - A-> : (my-reference clid metadata) [A.myrefs[clid].refcount++ = 1] - A-> : (my-reference clid) [A.myrefs[clid].refcount++ = 2] - ->B: receives my-ref, creates RR, B.yourrefs[clid].refcount++ = 1 - ->B: receives my-ref, B.yourrefs[clid].refcount++ = 2 - : time passes, B sees the reference go away - <-B: d=brokerA.callRemote("decref", clid, B.yourrefs[clid].refcount) - B.yourrefs[clid].refcount = 0; d.addCallback(B.checkref, clid) - A-> : (my-reference clid) [A.myrefs[clid].refcount++ = 3] - A<- : receives decref, A.myrefs[clid].refcount -= 2, now =1, returns False - ->B: receives my-ref, re-creates RR, B.yourrefs[clid].refcount++ = 1 - ->B: receives ack(False), B.checkref asserts refcount != 0 - : time passes, B sees the reference go away again - <-B: d=brokerA.callRemote("decref", clid, B.yourrefs[clid].refcount) - B.yourrefs[clid].refcount = 0; d.addCallback(B.checkref, clid) - A<- : receives decref, A.myrefs[clid].refcount -= 1, now =0, returns True - del A.myrefs[clid] - ->B: receives ack(True), B.checkref asserts refcount==0 - del B.yourrefs[clid] - -B retains the RemoteReference data until it receives confirmation from A. -Therefore whenever A sends a reference that doesn't already exist in the clid -table, it is sending it to a B that doesn't know about that reference, so it -needs to send the metadata. - -concern 3: in the three-party exchange, Carol must be kept alive until Bob -has established a reference to her, even if Alice drops her carol-reference -immediately after sending the introduction to Bob. - -(my-reference, clid, [interfaces, public URL]) -(your-reference, clid) -(their-reference, URL) - -Serializing a their-reference causes an entry to be placed in the Broker's -.theirrefs[URL] table. Each time a their-reference is sent, the entry's -refcount is incremented. - -Receiving a their-reference may initiate a PB connection to the target, -followed by a getNamedReference request. 
When this completes (or if the -reference was already available), the recipient sends a decgift message to -the sender. This message includes a count, so multiple instances of the same -gift can be acked as a group. - -The .theirrefs entry retains a reference to the sender's RemoteReference, so -it cannot go away until the gift is acked. - -DONE, gifts are implemented, we punted on partial-ordering - -*** security, DoS - -Bob can force Alice to hold on to a reference to Carol, as long as both -connections are open, by never acknowledging the gift. - -Alice can cause Bob to open up TCP connections to arbitrary hosts and ports, -by sending third-party references to him, although the only protocol those -connections will speak is PB. - -Using yURLs and StartTLS should be enough to secure and authenticate the -connections. - -*** partial-ordering - -If we need it, the gift (their-reference message) can include a nonce, Alice -sends a makeYourSelfAvailableAs message to Carol with the nonce, and Bob must -do a new getReference with the nonce. - -Kragen came up with a good use-case for partial-ordering: - A: - B.callRemote("updateDocument", bigDocument) - C.callRemote("pleaseReviewLatest", B) - C: - B.callRemote("getLatestDocument") - - -* PBService / Tub - -Really, PB wants to be a Service, since third-party references mean it will -need to make connections to arbitrary targets, and it may want to re-use -those connections. - - s = pb.PBService() - s.listenOn(strport) # provides URL base - swissURL = s.registerReference(ref) # creates unguessable name - publicURL = s.registerReference(ref, "name") # human-readable name - s.unregister(URL) # also revokes all clids - s.unregisterReference(ref) - d = s.getReference(URL) # Deferred which fires with the RemoteReference - d = s.shutdown() # close all servers and client connections - -DONE, this makes things quite clean - -* promise pipelining - -Even without third-party references, we can do E-style promise pipelining. 
- -<warner> hmm. subclass of Deferred that represents a Promise, can be - serialized if it's being sent to the same broker as the RemoteReference it was - generated for -<dash> warner: hmmm. how's that help us? -<dash> oh, pipelining? -<warner> maybe a flag on the callRemote to say that "yeah, I want a - DeferredPromise out of you, but I'm only going to include it as an argument to - another method call I'm sending you, so don't bother sending *me* the result" -<dash> aah -<dash> yeah -<dash> that sounds like a reasonable approach -<warner> that would actually work -<warner> dash: do you know if E makes any attempt to handle >2 vats in their - pipelining implementation? seems to me it could turn into a large network - optimization problem pretty quickly -<dash> warner: Mmm -<warner> hmm -<dash> I do not think you have to -<warner> so you have: t1=a.callRemote("foo",args1); - t2=t1.callRemote("bar",args2), where callRemote returns a Promise, which is a - special kind of Deferred that remembers the Broker its answer will eventually - come from. If args2 consists of entirely immediate things (no Promises) or - Promises that are coming from the same broker as t1 uses, then the "bar" call - is eligible for pipelining and gets sent to the remote broker -<warner> in the resulting newpb banana sequence, the clid of the target method - is replaced by another kind of clid, which means "the answer you're going to - send to method call #N", where N comes from t1 -<dash> mmm yep -<warner> using that new I-can't-unserialize-this-yet hook we added, the second - call sequence doesn't finish unserializing until the first call finishes and - sends the answer. Sending answer #N fires the hook's deferred. 
-<warner> that triggers the invocation of the second method -<dash> yay -<warner> hm, of course that totally blows away the idea of using a Constraint - on the arguments to the second method -<warner> because you don't even know what the object is until after the - arguments have arrived -<warner> but -<dash> well -<warner> the first method has a schema, which includes a return constraint -<dash> okay you can't fail synchronously -<warner> so you *can* assert that, whatever the object will be, it obeys that - constraint -<dash> but you can return a failure like everybody else -<warner> and since the constraint specifies an Interface, then the Interface - plus mehtod name is enough to come up with an argument constraint -<warner> so you can still enforce one -<warner> this is kind of cool -<dash> the big advantage of pipelining is that you can have a lot of - composable primitives on your remote interfaces rather than having to smush - them together into things that are efficient to call remotely -<warner> hm, yeah, as long as all the arguments are either immediate or - reference something on the recipient -<warner> as soon as a third party enters the equation, you have to decide - whether to wait for the arguments to resolve locally or if it might be faster - to throw them at someone else -<warner> that's where the network-optimization thing I mentioned before comes - into play -<dash> mmm -<warner> you send messages to A and to B, once you get both results you want - to send the pair to C to do something with them -<dash> spin me an example scenario -<dash> Hmm -<warner> if all three are close to each other, and you're far from all of - them, it makes more sense to tell C about A and B -<dash> how _does_ E handle that -<warner> or maybe tell A and B about C, tell them "when you get done, send - your results to C, who will be waiting for them" -<dash> warner: yeah, i think that the right thing to do is to wait for them to - resolve locally -<Tv> assuming that C can 
talk to A and B is bad -<dash> no it isn't -<Tv> well, depends on whether you live in this world or not :) -<dash> warner: if you want other behaviour then you should have to set it up - explicitly, i think -<warner> I'm not even sure how you would describe that sort of thing. It'd be - like routing protocols, you assign a cost to each link and hope some magical - omniscient entity can pick an optimal solution - -** revealing intentions - -<zooko> Now suppose I say "B.your_fired(C.revoke_his_rights())", or such. -<warner> A->C: sell all my stock. A->B: declare bankruptcy - -If B has access to C, and the promises are pipelined, then B has a window -during which they know something's about to happen, and they still have full -access to C, so they can do evil. - -Zooko tried to explain the concern to MarkM years ago, but didn't have a -clear example of the problem. The thing is, B can do evil all the time, -you're just trying to revoke their capability *before* they get wind of your -intentions. Keeping intentions secret is hard, much harder than limiting -someone's capabilities. It's kind of the trailing edge of the capability, as -opposed to the leading edge. - -Zooko feels the language needs clear support for expressing how the -synchronization needs to take place, and which domain it needs to happen in. - -* web-calculus integration - -Tyler pointed out that it is vital for a node to be able to grant limited -access to some held object. Specifically, Alice may want to give Bob a -reference not to Carol as a whole, but to just a specific Carol.remote_foo -method (and not to any other methods that Alice might be allowed to invoke). -I had been thinking of using RemoteInterfaces to indicate method subsets, -something like this: - - bob.callRemote("introduce", Facet(self, RIMinimal)) - -but Tyler thinks that this is too coarse-grained and not likely to encourage -the right kinds of security decisions. 
In his web-calculus, recipients can -grant third-parties access to individual bound methods. - - bob.callRemote("introduce", carol.getMethod("howdy")) - -If I understand it correctly, his approach makes Referenceables into a -copy-by-value object that is represented by a dictionary which maps method -names to these RemoteMethod objects, so there is no actual -callRemote(methname) method. Instead you do something like: - - rr = tub.getReference(url) - d = rr['introduce'].call(args) - -These RemoteMethod objects are top-level, so unguessable URLs must be -generated for them when they are sent, and they must be reference-counted. It -must not be possible to get from the bound method to the (unrestricted) -referenced object. - -TODO: how does the web-calculus maintain reference counts for these? It feels -like there would be an awful lot of messages being thrown around. - -To implement this, we'll need: - - banana sequences for bound methods - ('my-method', clid, url) - ('your-method', clid) - ('their-method', url, RI+methname?) - syntax to carve a single method out of a local Referenceable - A: self.doFoo (only if we get rid of remote_) - B: self.remote_doFoo - C: self.getMethod("doFoo") - D: self.getMethod(RIFoo['doFoo']) - leaning towards C or D - syntax to carve a single method out of a RemoteReference - A: rr.doFoo - B: rr.getMethod('doFoo') - C: rr.getMethod(RIFoo['doFoo']) - D: rr['doFoo'] - E: rr[RIFoo['doFoo']] - leaning towards B or C - decide whether to do getMethod early or late - early means ('my-reference') includes a big dict of my-method values - and a whole bunch of DECREFs when that dict goes away - late means there is a remote_tub.getMethod(your-ref, methname) call - and an extra round-trip to retrieve them - dash thinks late is better - -We could say that the 'my-reference' sequence for any RemoteInterface-enabled -Referenceable will include a dictionary of bound methods. The receiving end -will just stash the whole thing. 
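One way the bound-method idea could look in Python (a hypothetical wrapper sketch, not the web-calculus wire format or foolscap API):

```python
class RemoteMethod:
    """A single carved-out method: holds the target reference and the
    method name, but offers no path back to the unrestricted object."""

    def __init__(self, rref, methname):
        self._rref = rref
        self._methname = methname

    def call(self, **kwargs):
        return self._rref.callRemote(self._methname, **kwargs)


def as_method_dict(rref, methnames):
    # copy-by-value view of a Referenceable: method name -> RemoteMethod,
    # so there is no callRemote(methname) on the received object itself
    return {name: RemoteMethod(rref, name) for name in methnames}
```

With this shape, `rr['introduce'].call(args)` works as described, and passing `methods['howdy']` to a third party grants exactly one method.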
- -* do implicit "doFoo" -> RIFoo["doFoo"] conversion - -I want rr.callRemote("doFoo", args) to take advantage of a RemoteInterface, -if one is available. RemoteInterfaces aren't supposed to be overlapping (at -least not among RemoteInterfaces that are shared by a single Referenceable), -so there shouldn't be any ambiguity. If there is, we can raise an error. - -* accept Deferreds as arguments? - - bob.callRemote("introduce", target=self.tub.getReference(pburl)) - or - bob.callRemote("introduce", carol.getMethod("doFoo")) - instead of - carol.getMethod("doFoo").addCallback(lambda r: bob.callRemote("introduce", r)) - -If one of the top-level arguments to callRemote is a Deferred, don't send the -method request until all the arguments resolve. If any of the arguments -errback, the callRemote will fail with some new exception (that can contain a -reference to the argument's exception). - -however, this would mean the method would be invoked out-of-order w.r.t. an -immediately-following bob.callRemote - -put this off until we get some actual experience. - -* batch decrefs? - -If we implement the copy-by-value Referenceable idea, then a single gc may -result in dozens of simultaneous decrefs. It would be nice to reduce the -traffic generated by that. - -* promise pipelining - -Promise(Deferred).__getattr__ - -DoS prevention techniques in CapIDL (MarkM) - -pb://key@ip,host,[ipv6],localhost,[/unix]/swissnumber -tubs for lifetime management -separate listener object, share tubs between listeners - distinguish by key number - - actually, why bother with separate keys? Why allow the outside world to - distinguish between these sub-Tubs? Use them purely for lifetime management, - not security properties. That means a name->published-object table for each - SubTub, maybe a hierarchy of them, and the parent-most Tub gets the - Listeners. Incoming getReferenceByURL requests require a lookup in all Tubs - that descend from the one attached to that listener. 
- -So one decision is whether to have implicitly-published objects have a name -that lasts forever (well, until the Tub is destroyed), or if they should be -reference-counted. If they are reference counted, then outstanding Gifts need -to maintain a reference, and the gift must be turned into a live -RemoteReference right away. It has bearing on how/if we implement SturdyRefs, -so I need to read more about them in the E docs. - -Hrm, and creating new Tubs from within a remote_foo method.. to make that -useful, you'd need to have a way to ask for the Tub through which you were -being invoked. hrm. - -* creating new Tubs - -Tyler suggests using Tubs for namespace management. Tubs can share TCP -listening ports, but MarkS recommends giving them all separate keys (which -means separate SSL sessions, so separate TCP connections). Bill Frantz -discourages using a hierarchy of Tubs, says it's not the sort of thing you -want to be locked into. - -That means I'll need a separate Listener object, where the rule is that the -last Tub to be stopped makes the Listener stop too.. probably abuse the -Service interface in some wacky way to pull this off. - -Creating a new Tub.. how to conveniently create it with the same Listeners as -the current one? If the method that's creating the Tub is receiving a -reference, the Tub can be an attribute of the inbound RemoteReference. If -not, that's trickier.. the _tub= argument may still be a useful way to go. -Once you've got a source tub, then tub.newTub() should create a new one with -the same Listeners as the source (but otherwise unassociated with it). - -Once you have the new Tub, registering an object in it should return -something that can be directly serialized into a gift. 

class Target(pb.Referenceable):
    def remote_startGame(self, player_black, player_white):
        tub = player_black.tub.newTub()
        game = self.createGame()
        gameref = tub.register(game)
        game.setPlayer("black", tub.something(player_black))
        game.setPlayer("white", tub.something(player_white))
        return gameref

Hmm. So, create a SturdyRef class, which remembers the tubid (key), list of
location hints, and object name. These have a url() method that renders out
a URL string, and a compare method which compares the tubid and object name
but ignores the location hints. Serializing a SturdyRef creates a
their-reference sequence. Tub.register takes an object (and maybe a name)
and returns a SturdyRef. Tub.getReference takes either a URL or a SturdyRef.
RemoteReferences should have a .getSturdyRef method.

Actually, I think SturdyRefs should be serialized as Copyables, and create
SturdyRefs on the other side. The new-tub sequence should be:

  create new tub, using the Listener from an existing tub
  register the objects in the new tub, obtaining a SturdyRef
  send/return SendLiveRef(sturdyref) to the far side
  SendLiveRef is a wrapper that causes a their-reference sequence to be
  sent. The alternative is to obtain an actual live reference (via
  player_black.tub.getReference(sturdyref) first), then send that, but it's
  kind of a waste if you don't actually want to use the liveref yourself.

Note that it becomes necessary to provide for local references here: ones in
different Tubs which happen to share a Listener. These can use real TCP
connections (unless the Listener hint is only valid from the outside world).
It might be possible to use some tricks to cut out some of the network
overhead, but I suspect there are reasons why you wouldn't actually want to
do that.
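The SturdyRef class just described could be sketched like this (the pb://key@hints/name URL shape is assumed from the notes earlier in this file; the class name is illustrative):

```python
class SturdyRefSketch:
    def __init__(self, tubid, location_hints, name):
        self.tubid = tubid                        # the key
        self.location_hints = list(location_hints)
        self.name = name                          # object name

    def url(self):
        # render pb://key@hint1,hint2/name
        return "pb://%s@%s/%s" % (
            self.tubid, ",".join(self.location_hints), self.name)

    def __eq__(self, other):
        # compare tubid and object name, ignore the location hints
        return (self.tubid, self.name) == (other.tubid, other.name)

    def __hash__(self):
        return hash((self.tubid, self.name))
```

Keeping the hints out of the equality check is what lets two refs obtained via different routes compare as the same object.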
diff --git a/src/foolscap/doc/todo.txt b/src/foolscap/doc/todo.txt new file mode 100644 index 00000000..14dc1608 --- /dev/null +++ b/src/foolscap/doc/todo.txt @@ -0,0 +1,1304 @@ +-*- outline -*- + +non-independent things left to do on newpb. These require deeper magic or +can not otherwise be done casually. Many of these involve fundamental +protocol issues, and therefore need to be decided sooner rather than later. + +* summary +** protocol issues +*** negotiation +*** VOCABADD/DEL/SET sequences +*** remove 'copy' prefix from RemoteCopy type sequences? +*** smaller scope for OPEN-counter reference numbers? +** implementation issues +*** cred +*** oldbanana compatibility +*** Copyable/RemoteCopy default to __getstate__ or self.__dict__ ? +*** RIFoo['bar'] vs RIFoo.bar (should RemoteInterface inherit from Interface?) +*** constrain ReferenceUnslicer +*** serialize target.remote_foo usefully + +* decide whether to accept positional args in non-constrained methods + +DEFERRED until after 2.0 +<glyph> warner: that would be awesome but let's do it _later_ + +This is really a backwards-source-compatibility issue. In newpb, the +preferred way of invoking callRemote() is with kwargs exclusively: glyph's +felt positional arguments are more fragile. If the client has a +RemoteInterface, then they can convert any positional arguments into keyword +arguments before sending the request. + +The question is what to do when the client is not using a RemoteInterface. +Until recently, callRemote("bar") would try to find a matching RI. I changed +that to have callRemote("bar") never use an RI, and instead you would use +callRemote(RIFoo['bar']) to indicate that you want argument-checking. + +That makes positional arguments problematic in more situations than they were +before. The decision to be made is if the OPEN(call) sequence should provide +a way to convey positional args to the server (probably with numeric "names" +in the (argname, argvalue) tuples). 
If we do this, the server (which always +has the RemoteInterface) can do the positional-to-keyword mapping. But +putting this in the protocol will oblige other implementations to handle them +too. + +* change the method-call syntax to include an interfacename +DONE + +Scope the method name to the interface. This implies (I think) one of two +things: + + callRemote() must take a RemoteInterface argument + + each RemoteReference handles just a single Interface + +Probably the latter, maybe have the RR keep both default RI and a list of +all implemented ones, then adapting the RR to a new RI can be a simple copy +(and change of the default one) if the Referenceable knows about the RI. +Otherwise something on the local side will need to adapt one RI to another. +Need to handle reference-counting/DECREF properly for these shared RRs. + +From glyph: + + callRemote(methname, **args) # searches RIs + callRemoteInterface(remoteinterface, methname, **args) # single RI + + getRemoteURL(url, *interfaces) + + URL-RRefs should turn into the original Referenceable (in args/results) + (map through the factory's table upon receipt) + + URL-RRefs will not survive round trips. leave reference exchange for later. + (like def remote_foo(): return GlobalReference(self) ) + + move method-invocation code into pb.Referenceable (or IReferenceable + adapter). Continue using remote_ prefix for now, but make it a property of + that code so it can change easily. + <warner> ok, for today I'm just going to stick with remote_foo() as a + low-budget decorator, so the current restrictions are 1: subclass + pb.Referenceable, 2: implements() a RemoteInterface with method named "foo", + 3: implement a remote_foo method + <warner> and #1 will probably go away within a week or two, to be replaced by + #1a: subclass pb.Referenceable OR #1b: register an IReferenceable adapter + + try serializing with ISliceable first, then try IReferenceable. 
The + IReferenceable adapter must implements() some RemoteInterfaces and gets + serialized with a MyReferenceSlicer. + +http://svn.twistedmatrix.com/cvs/trunk/pynfo/admin.py?view=markup&rev=44&root=pynfo + +** use the methods of the RemoteInterface as the "method name" +DONE (provisional), using RIFoo['add'] + + rr.callRemote(RIFoo.add, **args) + +Nice and concise. However, #twisted doesn't like it, adding/using arbitrary +attributes of Interfaces is not clean (think about IFoo.implements colliding +with RIFoo.something). + + rr.callRemote(RIFoo['add'], **args) + RIFoo(rr).callRemote('add', **args) + adaptation, or narrowing? + +<warner> glyph: I'm adding callRemote(RIFoo.bar, **args) to newpb right now +<radix> wow. +<warner> seemed like a simpler interface than callRemoteInterface("RIFoo", +"bar", **args) +<radix> warner: Does this mean that IPerspective can be parameterized now? +<glyph> warner: bad idea +<exarkun> warner: Zope hates you! +<glyph> warner: zope interfaces don't support that syntax +<slyphon> zi does support multi-adapter syntax +<slyphon> but i don't really know what that is +<exarkun> warner: callRemote(RIFoo.getDescriptionFor("bar"), *a, **k) +<warner> glyph: yeah, I fake it. In RemoteInterfaceClass, I remove those +attributes, call InterfaceClass, and then put them all back in +<glyph> warner: don't add 'em as attributes +<glyph> warner: just fix the result of __getitem__ to add a slot actually +refer back to the interface +<glyph> radix: the problem is that IFoo['bar'] doesn't point back to IFoo +<glyph> warner: even better, make them callable :-) +<exarkun> glyph: IFoo['bar'].interface == 'IFoo' +<glyph> RIFoo['bar']('hello') +<warner> glyph: I was thinking of doing that in a later version of +RemoteInterface +<glyph> exarkun: >>> type(IFoo['bar'].interface) +<glyph> <type 'str'> +<exarkun> right +<exarkun> 'IFoo' +<exarkun> Just look through all the defined interfaces for ones with matching +names +<glyph> exarkun: ... e.g. 
*NOT* __main__.IFoo +<glyph> exarkun: AAAA you die +<radix> hee hee +* warner struggles to keep up with his thoughts and those of people around him +* glyph realizes he has been given the power to whine +<warner> glyph: ok, so with RemoteInterface.__getitem__, you could still do +rr.callRemote(RIFoo.bar, **kw), right? +<warner> was your objection to the interface or to the implementation? +<itamar> I really don't think you should add attributes to the interface +<warner> ok +<warner> I need to stash a table of method schemas somewhere +<itamar> just make __getitem__ return better type of object +<itamar> and ideally if this is generic we can get it into upstream +<exarkun> Is there a reason Method.interface isn't a fully qualified name? +<itamar> not necessarily +<itamar> I have commit access to zope.interface +<itamar> if you have any features you want added, post to +interface-dev@zope.org mailing list +<itamar> and if Jim Fulton is ok with them I can add them for you +<warner> hmm +<warner> does using RIFoo.bar to designate a remote method seem reasonable? +<warner> I could always adapt it to something inside callRemote +<warner> something PB-specific, that is +<warner> but that adapter would have to be able to pull a few attributes off +the method (name, schema, reference to the enclosing RemoteInterface) +<warner> and we're really talking about __getattr__ here, not __getitem__, +right? +<exarkun> for x.y yes +<itamar> no, I don't think that's a good idea +<itamar> interfaces have all kinds od methods on them already, for +introspection purposes +<itamar> namespace clashes are the suck +<itamar> unless RIFoo isn't really an Interface +<itamar> hm +<itamar> how about if it were a wrapper around a regular Interface? +<warner> yeah, RemoteInterfaces are kind of a special case +<itamar> RIFoo(IFoo, publishedMethods=['doThis', 'doThat']) +<itamar> s/RIFoo/RIFoo = RemoteInterface(/ +<exarkun> I'm confused. Why should you have to specify which methods are +published? 
+<itamar> SECURITY! +<itamar> not actually necessary though, no +<itamar> and may be overkill +<warner> the only reason I have it derive from Interface is so that we can do +neat adapter tricks in the future +<itamar> that's not contradictory +<itamar> RIFoo(x) would still be able to do magic +<itamar> you wouldn't be able to check if an object provides RIFoo, though +<itamar> which kinda sucks +<itamar> but in any case I am against RIFoo.bar +<warner> pity, it makes the callRemote syntax very clean +<radix> hm +<radix> So how come it's a RemoteInterface and not an Interface, anyway? +<radix> I mean, how come that needs to be done explicitly. Can't you just +write a serializer for Interface itself? + +* warner goes to figure out where the RemoteInterface discussion went after he + got distracted +<warner> maybe I should make RemoteInterface a totally separate class and just +implement a couple of Interface-like methods +<warner> cause rr.callRemote(IFoo.bar, a=1) just feels so clean +<Jerub> warner: why not IFoo(rr).bar(a=1) ? +<warner> hmm, also a possibility +<radix> well +<radix> IFoo(rr).callRemote('bar') +<radix> or RIFoo, or whatever +<Jerub> hold on, what does rr inherit from? 
+<warner> RemoteReference +<radix> it's a RemoteReference +<Jerub> then why not IFoo(rr) / +<warner> I'm keeping a strong distinction between local interfaces and remote +ones +<Jerub> ah, oka.y +<radix> warner: right, you can still do RIFoo +<warner> ILocal(a).meth(args) is an immediate function call +<Jerub> in that case, I prefer rr.callRemote(IFoo.bar, a=1) +<radix> .meth( is definitely bad, we need callRemote +<warner> rr.callRemote("meth", args) returns a deferred +<Jerub> radix: I don't like from foo import IFoo, RIFoo +<warner> you probably wouldn't have both an IFoo and an RIFoo +<radix> warner: well, look at it this way: IFoo(rr).callRemote('foo') still +makes it obvious that IFoo isn't local +<radix> warner: you could implement RemoteReferen.__conform__ to implement it +<warner> radix: I'm thinking of providing some kind of other class that would +allow .meth() to work (without the callRemote), but it wouldn't be the default +<radix> plus, IFoo(rr) is how you use interfaces normally, and callRemote is +how you make remote calls normally, so it seems that's the best way to do +interfaces + PB +<warner> hmm +<warner> in that case the object returned by IFoo(rr) is just rr with a tag +that sets the "default interface name" +<radix> right +<warner> and callRemote(methname) looks in that default interface before +looking anywhere else +<warner> for some reason I want to get rid of the stringyness of the method +name +<warner> and the original syntax (callRemoteInterface('RIFoo', 'methname', +args)) felt too verbose +<radix> warner: well, isn't that what your optional .meth thing is for? 
+<radix> yes, I don't like that either +<warner> using callRemote(RIFoo.bar, args) means I can just switch on the +_name= argument being either a string or a (whatever) that's contained in a +RemoteInterface +<warner> a lot of it comes down to how adapters would be most useful when +dealing with remote objects +<warner> and to what extent remote interfaces should be interchangeable with +local ones +<radix> good point. I have never had a use case where I wanted to adapt a +remote object, I don't think +<radix> however, I have had use cases to send interfaces across the wire +<radix> e.g. having a parameterized portal.login() interface +<warner> that'll be different, just callRemote('foo', RIFoo) +<radix> yeah. +<warner> the current issue is whether to pass them by reference or by value +<radix> eugh +<radix> Can you explain it without using those words? :) +<warner> hmm +<radix> Do you mean, Referenceable style vs Copyable style? +<warner> at the moment, when you send a Referenceable across the wire, the +id-number is accompanied with a list of strings that designate which +RemoteInterfaces the original claims to provide +<warner> the receiving end looks up each string in a local table, and +populates the RemoteReference with a list of RemoteInterface classes +<warner> the table is populated by metaclass magic that runs when a 'class +RIFoo(RemoteInterface)' definition is complete +<radix> ok +<radix> so a RemoteInterface is simply serialized as its qual(), right? 
+<warner> so as long as both sides include the same RIFoo definition, they'll +wind up with compatible remote interfaces, defining the same method names, +same method schemas, etc +<warner> effectively +<warner> you can't just send a RemoteInterface across the wire right now, but +it would be easy to add +<warner> the places where they are used (sending a Referenceable across the +wire) all special case them +<radix> ok, and you're considering actually writing a serializer for them that +sends all the information to totally reconstruct it on the other side without +having the definiton +<warner> yes +<warner> or having some kind of debug method which give you that +<radix> I'd say, do it the way you're doing it now until someone comes up with +a use case for actually sending it... +<warner> right +<warner> the only case I can come up with is some sort of generic object +browser debug tool +<warner> everything else turns into a form of version negotiation which is +better handled elsewhere +<warner> hmm +<warner> so RIFoo(rr).callRemote('bar', **kw) +<warner> I guess that's not too ugly +<radix> That's my vote. :) +<warner> one thing it lacks is the ability to cleanly state that if 'bar' +doesn't exist in RIFoo then it should signal an error +<warner> whereas callRemote(RIFoo.bar, **kw) would give you an AttributeError +before callRemote ever got called +<warner> i.e. "make it impossible to express the incorrect usage" +<radix> mmmh +<radix> warner: but you _can_ check it immediately when it's called +<warner> in the direction I was heading, callRemote(str) would just send the +method request and let the far end deal with it, no schema-checking involved +<radix> warner: which, 99% of the time, is effectively the same time as +IFoo.bar would happen +<warner> whereas callRemote(RIFoo.bar) would indicate that you want schema +checking +<warner> yeah, true +<radix> hm. 
+<warner> (that last feature is what allowed callRemote and callRemoteInterface +to be merged) +<warner> or, I could say that the normal RemoteReference is "untyped" and does +not do schema checking +<warner> but adapting one to a RemoteInterface results in a +TypedRemoteReference which does do schema checking +<warner> and which refuses to be invoked with method names that are not in the +schema +<radix> warner: we-ell +<radix> warner: doing method existence checking is cool +<radix> warner: but I think tying any further "schema checking" to adaptation +is a bad idea +<warner> yeah, that's my hunch too +<warner> which is why I'd rather not use adapters to express the scope of the +method name (which RemoteInterface it is supposed to be a part of) +<radix> warner: well, I don't think tying it to callRemote(RIFoo.methName) +would be a good idea just the same +<warner> hm +<warner> so that leaves rr.callRemote(RIFoo['add']) and +rr.callRemoteInterface(RIFoo, 'add') +<radix> OTOH, I'm inclined to think schema checking should happen by default +<radix> It's just a the matter of where it's parameterized +<warner> yeah, it's just that the "default" case (rr.callRemote('name')) needs +to work when there aren't any RemoteInterfaces declared +<radix> warner: oh +<warner> but if we want to encourage people to use the schemas, then we need +to make that case simple and concise +* radix goes over the issue in his head again +<radix> Yes, I think I still have the same position. +<warner> which one? :) +<radix> IFoo(rr).callRemote("foo"); which would do schema checking because +schema checking is on by default when it's possible +<warner> using an adaptation-like construct to declare a scope of the method +name that comes later +<radix> well, it _is_ adaptation, I think. +<radix> Adaptation always has plugged in behavior, we're just adding a bit +more :) +<warner> heh +<warner> it is a narrowing of capability +<radix> hmm, how do you mean? 
+<warner> rr.callRemote("foo") will do the same thing +<warner> but rr.callRemote("foo") can be used without the remote interfaces +<radix> I think I lost you. +<warner> if rr has any RIs defined, it will try to use them (and therefore +complain if "foo" does not exist in any of them, or if the schema is violated) +<radix> Oh. That's strange. +<radix> So it's really quite different from how interfaces regularly work... +<warner> yeah +<warner> except that if you were feeling clever you could use them the normal +way +<radix> Well, my inclination is to make them work as similarly as possible. +<warner> "I have a remote reference to something that implements RIFoo, but I +want to use it in some other way" +<radix> s/possible/practical/ +<warner> then IBar(rr) or RIBar(rr) would wrap rr in something that knows how +to translate Bar methods into RIFoo remote methods +<radix> Maybe it's not practical to make them very similar. +<radix> I see. + +rr.callRemote(RIFoo.add, **kw) +rr.callRemote(RIFoo['add'], **kw) +RIFoo(rr).callRemote('add', **kw) + +I like the second one. Normal Interfaces behave like a dict, so IFoo['add'] +gets you the method-describing object (z.i.i.Method). My RemoteInterfaces +don't do that right now (because I remove the attributes before handing the +RI to z.i), but I could probably fix that. I could either add attributes to +the Method or hook __getitem__ to return something other than a Method +(maybe a RemoteMethodSchema). + +Those Method objects have a .getSignatureInfo() which provides almost +everything I need to construct the RemoteMethodSchema. Perhaps I should +post-process Methods rather than pre-process the RemoteInterface. I can't +tell how to use the return value trick, and it looks like the function may +be discarded entirely once the Method is created, so this approach may not +work. + +On the server side (Referenceable), subclassing Interface is nice because it +provides adapters and implements() queries. 
+ +On the client side (RemoteReference), subclassing Interface is a hassle: I +don't think adapters are as useful, but getting at a method (as an attribute +of the RI) is important. We have to bypass most of Interface to parse the +method definitions differently. + +* create UnslicerRegistry, registerUnslicer +DONE (PROVISIONAL), flat registry (therefore problematic for len(opentype)>1) + +consider adopting the existing collection API (getChild, putChild) for this, +or maybe allow registerUnslicer() to take a callable which behaves kind of +like a twisted.web isLeaf=1 resource (stop walking the tree, give all index +tokens to the isLeaf=1 node) + +also some APIs to get a list of everything in the registry + +* use metaclass to auto-register RemoteCopy classes +DONE + +** use metaclass to auto-register Unslicer classes +DONE + +** and maybe Slicer classes too +DONE with name 'slices', perhaps change to 'slicerForClasses'? + + class FailureSlicer(slicer.BaseSlicer): + classname = "twisted.python.failure.Failure" + slicerForClasses = (failure.Failure,) # triggers auto-register + +** various registry approaches +DONE + +There are currently three kinds of registries used in banana/newpb: + + RemoteInterface <-> interface name + class/type -> Slicer (-> opentype) -> Unslicer (-> class/type) + Copyable subclass -> copyable-opentype -> RemoteCopy subclass + +There are two basic approaches to representing the mappings that these +registries implement. The first is implicit, where the local objects are +subclassed from Sliceable or Copyable or RemoteInterface and have attributes +to define the wire-side strings that represent them. On the receiving side, +we make extensive use of metaclasses to perform automatic registration +(taking names from class attributes and mapping them to the factory or +RemoteInterface used to create the remote version). 
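The metaclass-driven implicit registration described above can be sketched as follows. This is a minimal sketch with hypothetical names (`copyable_registry`, `AutoRegister`, `typeToCopy`-as-shown), in modern Python; foolscap's real registration machinery has its own registries and class attributes.

```python
# Sketch of implicit (metaclass-driven) registration: defining a subclass
# is enough to populate the receiving-side table, so merely importing a
# module registers its classes. All names here are hypothetical.
copyable_registry = {}

class AutoRegister(type):
    def __init__(cls, name, bases, dct):
        super().__init__(name, bases, dct)
        typename = dct.get("typeToCopy")   # wire-side string, a class attribute
        if typename is not None:
            copyable_registry[typename] = cls  # receiving-side factory

class RemoteCopySketch(metaclass=AutoRegister):
    typeToCopy = None  # the base class itself is not registered

class RemoteFoo(RemoteCopySketch):
    typeToCopy = "copy-example.Foo"  # what the sender puts on the wire

# The receiver maps the wire-side string back to a factory class:
factory = copyable_registry["copy-example.Foo"]
```

Note that the contents of `copyable_registry` are then exactly the receiver's exposure, however the registrations got there, which is the security point made about the receiving map.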
+ +The second approach is explicit, where pb.registerRemoteInterface, +pb.registerRemoteCopy, and pb.registerUnslicer are used to establish the +receiving-side mapping. There isn't a clean way to do it explicitly on the +sending side, since we already have instances whose classes can give us +whatever information we want. + +The advantage of implicit is simplicity: no more questions about why my +pb.RemoteCopy is giving "not unserializable" errors. The mere act of +importing a module is enough to let PB create instances of its classes. + +The advantage of doing it explicitly is to remind the user about the +existence of those maps, because the set of factory classes in the receiving +map is precisely equal to the user's exposure (from a security point of view). +See the E paper on secure-serialization for some useful concepts. + +A disadvantage of implicit is that you can't quite be sure what, exactly, +you're exposed to: the registrations take place all over the place. + +To make the explicit approach less painful, we can use quotient's .wsv files +(whitespace-separated values) which map from class to string and back again. +The file could list fully-qualified classname, wire-side string, and +receiving factory class on each line. The Broker (or rather the RootSlicer +and RootUnslicer) would be given a set of .wsv files to define their +mapping. It would get all the registrations at once (instead of having them +scattered about). They could also demand-load the receive-side factory +classes. + +For now, go implicit. Put off the decision until we have some more +experience with using newpb. + +* move from VocabSlicer sequence to ADDVOCAB/DELVOCAB tokens + +Requires a .wantVocabString flag in the parser, which is kind of icky but +fixes the annoying asymmetry between set (vocab sequence) and get (VOCAB +token). Might want a CLEARVOCAB token too. + +On second thought, this won't work. There isn't room for both a vocab number +and a variable-length string in a single token.
It must be an open sequence. +However, it could be an add/del/set-vocab sequence, allowing the vocab to be +modified incrementally. + +** VOCABize interface/method names + +One possibility is to make a list of all strings used by all known +RemoteInterfaces and all their methods, then send it at broker connection +time as the initial vocab map. A better one (maybe) is to somehow track what +we send and add a word to the vocab once we've sent it more than three +times. + +Maybe vocabize the pairs, as "ri/name1","ri/name2", etc, or maybe do them +separately. Should do some handwaving math to figure out which is better. + +* nail down some useful schema syntaxes + +This has two parts: parsing something like a __schema__ class attribute (see +the sketches in schema.xhtml) into a tree of FooConstraint objects, and +deciding how to retrieve schemas at runtime from things like the object being +serialized or the object being called from afar. To be most useful, the +syntax needs to mesh nicely (read "is identical to") things like formless and +(maybe?) atop or whatever has replaced the high-density highly-structured +save-to-disk scheme that twisted.world used to do. + +Some lingering questions in this area: + + When an object has a remotely-invokable method, where does the appropriate + MethodConstraint come from? Some possibilities: + + an attribute of the method itself: obj.method.__schema__ + + from inside a __schema__ attribute of the object's class + + from inside a __schema__ attribute of an Interface (which?) that the object + implements + + Likewise, when a caller holding a RemoteReference invokes a method on it, it + would be nice to enforce a schema on the arguments they are sending to the + far end ("be conservative in what you send"). Where should this schema come + from? It is likely that the sender only knows an Interface for their + RemoteReference. 
+ + When PB determines that an object wants to be copied by value instead of by + reference (pb.Copyable subclass, Copyable(obj), schema says so), where + should it find a schema to define what exactly gets copied over? A class + attribute of the object's class would make sense: most objects would do + this, some could override jellyFor to get more control, and others could + override something else to push a new Slicer on the stack and do streaming + serialization. Whatever the approach, it needs to be paralleled by the + receiving side's unjellyableRegistry. + +* RemoteInterface instances should have an "RI-" prefix instead of "I-" + +DONE + +* merge my RemoteInterface syntax with zope.interface's + +I hacked up a syntax for how method definitions are parsed in +RemoteInterface objects. That syntax isn't compatible with the one +zope.interface uses for local methods, so I just delete them from the +attribute dictionary to avoid causing z.i indigestion. It would be nice if +they were compatible so I didn't have to do that. This basically translates +into identifying the nifty extra flags (like priority classes, no-response) +that we want on these methods and finding a z.i-compatible way to implement +them. It also means thinking of SOAP/XML-RPC schemas and having a syntax +that can represent everything at once. + + +* use adapters to enable pass-by-reference or pass-by-value + +It should be possible to pass a reference with variable forms: + + rr.callRemote("foo", 1, Reference(obj)) + rr.callRemote("bar", 2, Copy(obj)) + +This should probably adapt the object to IReferenceable or ICopyable, which +are like ISliceable except they can pass the object by reference or by +value. 
The slicing process should be: + + look up the type() in a table: this handles all basic types + else adapt the object to ISliceable, use the result + else raise an Unsliceable exception + (and point the user to the docs on how to fix it) + +The adapter returned by IReferenceable or ICopyable should implement +ISliceable, so no further adaptation will be done. + +* remove 'copy' prefix from remotecopy banana type names? + +<glyph> warner: did we ever finish our conversation on the usefulness of the +(copy foo blah) namespace rather than just (foo blah)? +<warner> glyph: no, I don't think we did +<glyph> warner: do you still have (copy foo blah)? +<warner> glyph: yup +<warner> so far, it seems to make some things easier +<warner> glyph: the sender can subclass pb.Copyable and not write any new +code, while the receiver can write an Unslicer and do a registerRemoteCopy +<warner> glyph: instead of the sender writing a whole slicer and the receiver +registering at the top-level +<glyph> warner: aah +<warner> glyph: although the fact that it's easier that way may be an artifact +of my sucky registration scheme +<glyph> warner: so the advantage is in avoiding registration of each new +unslicer token? +<glyph> warner: yes. 
I'm thinking that a metaclass will handily remove the +need for extra junk in the protocol ;) +<warner> well, the real reason is my phobia about namespace purity, of course +<glyph> warner: That's what the dots are for +<warner> but ease of dispatch is also important +<glyph> warner: I'm concerned about it because I consider my use of the same +idiom in the first version of PB to be a serious wart +* warner nods +<warner> I will put together a list of my reasoning +<glyph> warner: I think it's likely that PB implementors in other languages +are going to want to introduce new standard "builtin" types; our "builtins" +shouldn't be limited to python's provided data structures +<moshez> glyph: wait +<warner> ok +<moshez> glyph: are you talking of banana types +<moshez> glyph: or really PB +<warner> in which case (copy blah blah) is a non-builtin type, while +(type-foo) is a builtin type +<glyph> warner: plus, our namespaces are already quite well separated, I can +tell you I will never be declaring new types outside of quotient.* and +twisted.* :) +<warner> moshez: this is mostly banana (or what used to be jelly, really) +<glyph> warner: my inclination is to standardize by convention +<glyph> warner: *.* is a non-builtin type, [~.] is a builtin +<moshez> glyph: ? +<glyph> sorry [^.]* +<glyph> my regular expressions and shell globs are totally confused but you +know what I mean +<glyph> moshez: yes +<moshez> glyph: hrm +<saph_w> glyph: you're making crazy anime faces +<moshez> glyph: why do we need any non-Python builtin types +<glyph> moshez: because I want to destroy SOAP, and doing that means working +with people I don't like +<glyph> moshez: outside of python +<moshez> glyph: I meant, "what specific types" +<moshez> I'd appreciate a blog on that + +* have Copyable/RemoteCopy default to __getstate__/__setstate__? + +At the moment, the default implementations of getStateToCopy() and +setCopyableState() get and set __dict__ directly. 
Should the default instead +be to call __getstate__() or __setstate__()? + +* make slicer/unslicers for pb.RemoteInterfaces + +exarkun's use case requires these Interfaces to be passable by reference +(i.e. by name). It would also be interesting to let them be passed (and +requested!) by value, so you can ask a remote peer exactly what their +objects will respond to (the method names, the argument values, the return +value). This also requires that constraints be serializable. + +do this, should be referenceable (round-trip should return the same object), +should use the same registration lookup that RemoteReference(interfacelist) +uses + +* investigate decref/Referenceable race + +Any object that includes some state when it is first sent across the wire +needs more thought. The far end could drop the last reference (at time t=1) +while a method is still pending that wants to send back the same object. If +the method finishes at time t=2 but the decref isn't received until t=3, the +object will be sent across the wire without the state, and the far end will +receive it for the "first" time without that associated state. + +This kind of conserve-bandwidth optimization may be a bad idea. Or there +might be a reasonable way to deal with it (maybe request the state if it +wasn't sent and the recipient needs it, and delay delivery of the object +until the state arrives). + +DONE, the RemoteReference is held until the decref has been acked. As long as +the methods are executed in-order, this will prevent the race. TODO: +third-party references (and other things that can cause out-of-order +execution) could mess this up. 
+ +* sketch out how to implement glyph's crazy non-compressed sexpr encoding + +* consider a smaller scope for OPEN-counter reference numbers + +For newpb, we moved to implicit reference numbers (counting OPEN tags +instead of putting a number in the OPEN tag) because we didn't want to burn +so much bandwidth: it isn't feasible to predict whether your object will +need to be referenced in the future, so you always have to be prepared to +reference it, so we always burn the memory to keep track of them (generally +in a ScopedSlicer subclass). If we used explicit refids then we'd have to +burn the bandwidth too. + +The sorta-problem is that these numbers will grow without bound as long as +the connection remains open. After a few hours of sending 100-byte objects +over a 100MB connection, you'll hit 1G-references and will have to start +sending them as LONGINT tokens, which is annoying and slightly verbose (say +3 or 4 bytes of number instead of 1 or 2). You never keep track of that many +actual objects, because the references do not outlive their parent +ScopedSlicer. + +The fact that the references themselves are scoped to the ScopedSlicer +suggests that the reference numbers could be too. Each ScopedSlicer would +track the number of OPEN tokens emitted (actually the number of +slicerForObject calls made, except you'd want to use a different method to +make sure that children who return a Slicer themselves don't corrupt the +OPEN count). + +This requires careful synchronization between the ScopedSlicers on one end +and the ScopedUnslicers on the other. I suspect it would be slightly +fragile. + +One sorta-benefit would be that a somewhat human-readable sexpr-based +encoding would be even more human readable if the reference numbers stayed +small (you could visually correlate objects and references more easily). 
The +ScopedSlicer's open-parenthesis could be represented with a curly brace or +something, then the refNN number would refer to the NN'th left-paren from +the last left-brace. It would also make it clear that the recipient will not +care about objects outside that scope. + +* implement the FDSlicer + +Over a unix socket, you can pass fds. exarkun had a presentation at PyCon04 +describing the use of this to implement live application upgrade. I think +that we could make a simple FDSlicer to hide the complexity of the +out-of-band part of the communication. + +class Server(unix.Server): + def sendFileDescriptors(self, fileno, data="Filler"): + """ + @param fileno: An iterable of the file descriptors to pass. + """ + payload = struct.pack("%di" % len(fileno), *fileno) + r = sendmsg(self.fileno(), data, 0, (socket.SOL_SOCKET, SCM_RIGHTS, payload)) + return r + +class Client(unix.Client): + def doRead(self): + if not self.connected: + return + try: + msg, flags, ancillary = recvmsg(self.fileno()) + except: + log.msg('recvmsg():') + log.err() + else: + buf = ancillary[0][2] + fds = [] + while buf: + fd, buf = buf[:4], buf[4:] + fds.append(struct.unpack("i", fd)[0]) + try: + self.protocol.fileDescriptorsReceived(fds) + except: + log.msg('protocol.fileDescriptorsReceived') + log.err() + return unix.Client.doRead(self) + +* implement AsyncDeferred returns + +dash wanted to implement a TransferrableReference object with a scheme that +would require creating a new connection (to a third-party Broker) during +ReferenceUnslicer.receiveClose . This would cause the object deserialization +to be asynchronous. + +At the moment, Unslicers can return a Deferred from their receiveClose +method. This is used by immutable containers (like tuples) to indicate that +their object cannot be created yet. Other containers know to watch for these +Deferreds and add a callback which will update their own entries +appropriately. 
The implicit requirement is that all these Deferreds fire before the
top-level parent object (usually a CallUnslicer) finishes. This allows for
circular references involving immutable containers to be resolved into the
final object graph before the target method is invoked.

To accommodate Deferreds which will fire at arbitrary points in the future,
it would be useful to create a marker subclass named AsyncDeferred. If an
unslicer returns such an object, the container parent starts by treating it
like a regular Deferred, but it also knows that its object is not
"complete", and therefore returns an AsyncDeferred of its own. When the
child completes, the parent can complete, etc. The difference between the
two types: Deferred means that the object will be complete before the
top-level parent is finished, while AsyncDeferred makes no claims about when
the object will be finished.

CallUnslicer would know that if any of its arguments are Deferreds or
AsyncDeferreds then it needs to hold off on the broker.doCall until all
those Deferreds have fired. Top-level objects are not required to
differentiate between the two types, because they do not return an object to
an enclosing parent (the CallUnslicer is a child of the RootUnslicer, but it
always returns None).

Other issues: we'll need a schema to let you say whether you'll accept these
late-bound objects or not (because if you do accept them, you won't be able
to impose the same sorts of type-checks as you would on immediate objects).
Also this will impact the in-order-invocation promises of PB method calls,
so we may need to implement the "it is ok to run this asynchronously" flag
first, then require that TransferrableReference objects are only passed to
methods with the flag set.
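A minimal sketch of the marker-subclass idea. These are toy stand-ins, not
Twisted's real Deferred or foolscap's unslicers, and `container_deferred`
is a hypothetical name for the container-parent logic described above:

```python
class Deferred:
    """Toy stand-in for twisted.internet.defer.Deferred."""
    def __init__(self):
        self._callbacks = []
        self._called = False
        self.result = None

    def addCallback(self, fn):
        if self._called:
            self.result = fn(self.result)
        else:
            self._callbacks.append(fn)
        return self

    def callback(self, result):
        self._called = True
        self.result = result
        callbacks, self._callbacks = self._callbacks, []
        for fn in callbacks:
            self.result = fn(self.result)

class AsyncDeferred(Deferred):
    """Marker: the object may not exist until an arbitrary future time,
    possibly after the top-level parent has finished."""

def container_deferred(child_deferreds):
    """A container's placeholder for its not-yet-complete object. If any
    child is late-bound (AsyncDeferred), the container must propagate
    that by returning an AsyncDeferred of its own."""
    late = any(isinstance(d, AsyncDeferred) for d in child_deferreds)
    placeholder = AsyncDeferred() if late else Deferred()
    pending = [len(child_deferreds)]
    results = [None] * len(child_deferreds)

    def make_cb(i):
        def cb(value):
            results[i] = value
            pending[0] -= 1
            if pending[0] == 0:
                placeholder.callback(tuple(results))
            return value
        return cb

    for i, d in enumerate(child_deferreds):
        d.addCallback(make_cb(i))
    return placeholder
```

A CallUnslicer built on this would hold off broker.doCall until the
placeholder fires, exactly as the notes describe.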
Also, it may not be necessary to have a marker subclass of Deferred: perhaps
_any_ Deferred which arrives from a child is an indication that the object
will not be available until an unknown time in the future, and obligates the
parent to return another Deferred upwards (even though their object could be
created synchronously). Or, it might be better to implement this some other
way, perhaps separating "here is my object" from "here is a Deferred that
will fire when my object is complete", like a call to
parent.addDependency(self.deferred) or something.

DONE, needs testing

* TransferrableReference

class MyThing(pb.Referenceable): pass
r1 = MyThing()
r2 = Facet(r1)
g1 = Global(r1)
class MyGlobalThing(pb.GloballyReferenceable): pass
g2 = MyGlobalThing()
g3 = Facet(g2)

broker.setLocation("pb://hostname.com:8044")

rem.callRemote("m1", r1) # limited to just this connection
rem.callRemote("m2", Global(r1)) # can be published
g3 = Global(r1)
rem.callRemote("m3", g1) # can also be published..
g1.revoke() # but since we remember it, it can be revoked too
g1.restrict() # and, as a Facet, we can revoke some functionality but not all

rem.callRemote("m1", g2) # can be published

E tarball: jsrc/net/captp/tables/NearGiftTable

issues:

 1: when A sends a reference on B to C, C's messages to the object
    referenced must arrive after any messages A sent before the reference
    forks

    in particular, if A does:
      B.callRemote("1", hugestring)
      B.callRemote("2_makeYourSelfSecure", args)
      C.callRemote("3_transfer", B)

    and C does B.callRemote("4_breakIntoYou") as soon as it gets the
    reference, then the A->B queue looks like (1, 2), and the A->C queue
    looks like (3). The transfer message can be fast, and the resulting
    message 4 could be delivered to B before the A->B queue manages to
    deliver 2.
 2: an object which gets passed through multiple external brokers and
    eventually comes home must be recognized as a local object

 3: Copyables that contain RemoteReferences must be passable between hosts

E cannot do all three of these at once
http://www.erights.org/elib/distrib/captp/WormholeOp.html

I think that it's ok to tell people who want this guarantee to explicitly
serialize it like this:

  B.callRemote("1", hugestring)
  d = B.callRemote("2_makeYourSelfSecure", args)
  d.addCallback(lambda res: C.callRemote("3_transfer", B))

Note that E might not require that method calls even have a return value, so
they might not have had a convenient way to express this enforced
serialization.

** more thoughts

To enforce the partial-ordering, you could do the equivalent of:
 A:
  B.callRemote("1", hugestring)
  B.callRemote("2_makeYourSelfSecure", args)
  nonce = makeNonce()
  B.callRemote("makeYourSelfAvailableAs", nonce)
  C.callRemote("3_transfer", (nonce, B.name))
 C:
  B.callRemote("4_breakIntoYou")

C uses the nonce when it connects to B. It knows the name of the reference,
so it can compare it against some other reference to the same thing, but it
can't actually use that name alone to get access.

When the connection request arrives at B, it sees B.name (which is also
unguessable), so that gives it reason to believe that it should queue C's
request (that it isn't just a DoS attack). It queues it until it sees A's
request to makeYourSelfAvailableAs with the matching nonce. Once that
happens, it can provide the reference back to C.

This implies that C won't be able to send *any* messages to B until that
handshake has completed. It might be desirable to avoid the extra round-trip
this would require.
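B's queue-until-authorized behavior could be sketched like this. The
`GiftTable` class, its attributes, and the callback signature are
hypothetical; only the message names (`makeYourSelfAvailableAs`,
`getReference`) come from the notes above:

```python
class GiftTable:
    """Sketch of B's bookkeeping for the nonce handshake: C's request is
    queued until A's makeYourSelfAvailableAs arrives with the same nonce."""
    def __init__(self):
        self.objects = {}       # unguessable name -> object
        self.available = set()  # (name, nonce) pairs authorized by A
        self.pending = {}       # (name, nonce) -> callbacks waiting on C's behalf

    def register(self, name, obj):
        self.objects[name] = obj

    def makeYourSelfAvailableAs(self, name, nonce):
        # A's authorization arrives: release any queued request from C
        self.available.add((name, nonce))
        for cb in self.pending.pop((name, nonce), []):
            cb(self.objects[name])

    def getReference(self, name, nonce, cb):
        # C's request. An unknown name is probably a DoS attempt: drop it.
        if name not in self.objects:
            return
        if (name, nonce) in self.available:
            cb(self.objects[name])
        else:
            self.pending.setdefault((name, nonce), []).append(cb)
```

Either arrival order works: if A's makeYourSelfAvailableAs gets there
first, C's getReference is answered immediately; otherwise it waits.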
** more thoughts

  url = PBServerFactory.registerReference(ref, name=None)
  creates human-readable URLs or random identifiers

the factory keeps a bidirectional mapping of names and Referenceables

when a Referenceable gets serialized, if the factory's table doesn't have a
name for it, the factory creates a random one. This entry in the table is
kept alive by two things:

  a live reference by one of the factory's Brokers
  an entry in a Broker's "gift table"

When a RemoteReference gets serialized (and it doesn't point back to the
receiving Broker, and thus get turned into a your-reference sequence):

<warner> A->C: "I'm going to send somebody a reference to you, incref your
  gift table", C->A: roger that, here's a gift nonce
<warner> A->B: "here's Carol's reference: URL plus nonce"
<warner> B->C: "I want a liveref to your 'Carol' object, here's my ticket
  (nonce)", C->B: "ok, ticket redeemed, here's your liveref"

once more, without nonces:
  A->C: "I'm going to send somebody a reference to you, incref your
    gift table", C->A: roger that
  A->B: "here's Carol's reference: URL"
  B->C: "I want a liveref to your 'Carol' object", C->B: "ok, here's your
    liveref"

really:
  on A: c.vat.callRemote("giftYourReference", c).addCallback(step2)
    c is serialized as (your-reference, clid)
  on C: vat.remote_giftYourReference(which): self.table[which] += 1; return
  on A: step2: b.introduce(c)
    c is serialized as (their-reference, url)
  on B: deserialization sees their-reference
    newvat = makeConnection(URL)
    newvat.callRemote("redeemGift", URL).addCallback(step3)
  on C: vat.remote_redeemGift(URL):
    ref = self.urls[URL]; self.table[ref] -= 1; return ref
    ref is serialized as (my-reference, clid)
  on B: step3(c): b.remote_introduce(c)

problem: if Alice sends a thousand copies, that means these 5 messages are
each sent a thousand times. The makeConnection is cached, but the rest are
not.
We don't remember that we've already made this gift before, that the other
end probably still has it. Hm, but we also don't know that they didn't lose
it already.

** ok, a plan:

concern 1: objects must be kept alive as long as there is a RemoteReference
to them.

concern 2: we should be able to tell when an object is being sent for the
first time, to add metadata (interface list, public URL) that would be
expensive to add to every occurrence.

  each (my-reference) sent over the wire increases the broker's refcount on
  both ends.

  the receiving Broker retains a weakref to the RemoteReference, and retains
  a copy of the metadata necessary to create it in the clid table (basically
  the entire contents of the RemoteReference). When the weakref expires, it
  marks the clid entry as "pending-free", and sends a decref(clid,N) to the
  other Broker. The decref is actually sent with broker.callRemote("decref",
  clid, N), so it can be acked.

  the sending broker gets the decref and reduces its count by N. If another
  reference was sent recently, this count may not drop all the way to zero,
  indicating there is a reference "in flight" and the far end should be
  ready to deal with it (by making a new RemoteReference with the same
  properties as the old one). If the count is still nonzero, it returns
  False to indicate that this was not the last decref message for the clid.
  If the count is now zero, it returns True, since it is the last decref,
  and removes the entry from its table. Once remote_decref returns True, the
  clid is retired.

  the receiving broker receives the ack from the decref. If the ack says
  last==True, the clid table entry is freed. If it says last==False, then
  there should have been another (my-reference) received before the ack, so
  the refcount should be non-zero.
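The sender-side arithmetic above can be sketched in a few lines; a bare
hypothetical class stands in for the real Broker, and clids are plain ints:

```python
class SendingBroker:
    """Sketch of the sending broker's half of the decref protocol."""
    def __init__(self):
        self.myrefs = {}   # clid -> refcount

    def send_reference(self, clid):
        # each (my-reference clid) sent over the wire bumps the refcount
        self.myrefs[clid] = self.myrefs.get(clid, 0) + 1

    def remote_decref(self, clid, n):
        # far end's weakref died; it sent decref(clid, n).
        # Returns True if this was the last decref (clid retired),
        # False if references are still in flight.
        self.myrefs[clid] -= n
        if self.myrefs[clid] == 0:
            del self.myrefs[clid]
            return True
        return False
```

The message sequence that follows is exactly this: two sends, a decref(2)
crossing a third in-flight send (returns False, count 1), then a final
decref(1) (returns True, clid retired).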
+ + message sequence: + + A-> : (my-reference clid metadata) [A.myrefs[clid].refcount++ = 1] + A-> : (my-reference clid) [A.myrefs[clid].refcount++ = 2] + ->B: receives my-ref, creates RR, B.yourrefs[clid].refcount++ = 1 + ->B: receives my-ref, B.yourrefs[clid].refcount++ = 2 + : time passes, B sees the reference go away + <-B: d=brokerA.callRemote("decref", clid, B.yourrefs[clid].refcount) + B.yourrefs[clid].refcount = 0; d.addCallback(B.checkref, clid) + A-> : (my-reference clid) [A.myrefs[clid].refcount++ = 3] + A<- : receives decref, A.myrefs[clid].refcount -= 2, now =1, returns False + ->B: receives my-ref, re-creates RR, B.yourrefs[clid].refcount++ = 1 + ->B: receives ack(False), B.checkref asserts refcount != 0 + : time passes, B sees the reference go away again + <-B: d=brokerA.callRemote("decref", clid, B.yourrefs[clid].refcount) + B.yourrefs[clid].refcount = 0; d.addCallback(B.checkref, clid) + A<- : receives decref, A.myrefs[clid].refcount -= 1, now =0, returns True + del A.myrefs[clid] + ->B: receives ack(True), B.checkref asserts refcount==0 + del B.yourrefs[clid] + +B retains the RemoteReference data until it receives confirmation from A. +Therefore whenever A sends a reference that doesn't already exist in the clid +table, it is sending it to a B that doesn't know about that reference, so it +needs to send the metadata. + +concern 3: in the three-party exchange, Carol must be kept alive until Bob +has established a reference to her, even if Alice drops her carol-reference +immediately after sending the introduction to Bob. + +(my-reference, clid, [interfaces, public URL]) +(your-reference, clid) +(their-reference, URL) + +Serializing a their-reference causes an entry to be placed in the Broker's +.theirrefs[URL] table. Each time a their-reference is sent, the entry's +refcount is incremented. + +Receiving a their-reference may initiate a PB connection to the target, +followed by a getNamedReference request. 
When this completes (or if the +reference was already available), the recipient sends a decgift message to +the sender. This message includes a count, so multiple instances of the same +gift can be acked as a group. + +The .theirrefs entry retains a reference to the sender's RemoteReference, so +it cannot go away until the gift is acked. + +DONE, gifts are implemented, we punted on partial-ordering + +*** security, DoS + +Bob can force Alice to hold on to a reference to Carol, as long as both +connections are open, by never acknowledging the gift. + +Alice can cause Bob to open up TCP connections to arbitrary hosts and ports, +by sending third-party references to him, although the only protocol those +connections will speak is PB. + +Using yURLs and StartTLS should be enough to secure and authenticate the +connections. + +*** partial-ordering + +If we need it, the gift (their-reference message) can include a nonce, Alice +sends a makeYourSelfAvailableAs message to Carol with the nonce, and Bob must +do a new getReference with the nonce. + +Kragen came up with a good use-case for partial-ordering: + A: + B.callRemote("updateDocument", bigDocument) + C.callRemote("pleaseReviewLatest", B) + C: + B.callRemote("getLatestDocument") + + +* PBService / Tub + +Really, PB wants to be a Service, since third-party references mean it will +need to make connections to arbitrary targets, and it may want to re-use +those connections. + + s = pb.PBService() + s.listenOn(strport) # provides URL base + swissURL = s.registerReference(ref) # creates unguessable name + publicURL = s.registerReference(ref, "name") # human-readable name + s.unregister(URL) # also revokes all clids + s.unregisterReference(ref) + d = s.getReference(URL) # Deferred which fires with the RemoteReference + d = s.shutdown() # close all servers and client connections + +DONE, this makes things quite clean + +* promise pipelining + +Even without third-party references, we can do E-style promise pipelining. 
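A toy sketch of the idea, which the chat log that follows works out in more
detail. `Broker.send_call` and the `('answer', N)` target encoding are
invented for illustration; a real implementation would serialize these:

```python
class Broker:
    """Records outbound calls; each call is assigned an answer number."""
    def __init__(self):
        self.sent = []
        self._counter = 0

    def send_call(self, target, methname, args):
        self._counter += 1
        self.sent.append((target, methname, args))
        return Promise(self, self._counter)

class Promise:
    """A placeholder for a remote answer that remembers which broker the
    answer will come from, so further calls on it can be pipelined."""
    def __init__(self, broker, answer_id):
        self.broker = broker
        self.answer_id = answer_id

    def callRemote(self, methname, *args):
        # pipelining is only eligible if every Promise argument comes
        # from the same broker; otherwise wait for local resolution
        for a in args:
            if isinstance(a, Promise) and a.broker is not self.broker:
                raise ValueError("foreign promise: must resolve locally first")
        # the target is named indirectly: "the answer to call #N"
        return self.broker.send_call(("answer", self.answer_id),
                                     methname, args)
```

So t2 = t1.callRemote("bar", args) ships the second call immediately,
addressed to whatever answer #N of the first call turns out to be.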
+ +<warner> hmm. subclass of Deferred that represents a Promise, can be + serialized if it's being sent to the same broker as the RemoteReference it was + generated for +<dash> warner: hmmm. how's that help us? +<dash> oh, pipelining? +<warner> maybe a flag on the callRemote to say that "yeah, I want a + DeferredPromise out of you, but I'm only going to include it as an argument to + another method call I'm sending you, so don't bother sending *me* the result" +<dash> aah +<dash> yeah +<dash> that sounds like a reasonable approach +<warner> that would actually work +<warner> dash: do you know if E makes any attempt to handle >2 vats in their + pipelining implementation? seems to me it could turn into a large network + optimization problem pretty quickly +<dash> warner: Mmm +<warner> hmm +<dash> I do not think you have to +<warner> so you have: t1=a.callRemote("foo",args1); + t2=t1.callRemote("bar",args2), where callRemote returns a Promise, which is a + special kind of Deferred that remembers the Broker its answer will eventually + come from. If args2 consists of entirely immediate things (no Promises) or + Promises that are coming from the same broker as t1 uses, then the "bar" call + is eligible for pipelining and gets sent to the remote broker +<warner> in the resulting newpb banana sequence, the clid of the target method + is replaced by another kind of clid, which means "the answer you're going to + send to method call #N", where N comes from t1 +<dash> mmm yep +<warner> using that new I-can't-unserialize-this-yet hook we added, the second + call sequence doesn't finish unserializing until the first call finishes and + sends the answer. Sending answer #N fires the hook's deferred. 
<warner> that triggers the invocation of the second method
<dash> yay
<warner> hm, of course that totally blows away the idea of using a Constraint
  on the arguments to the second method
<warner> because you don't even know what the object is until after the
  arguments have arrived
<warner> but
<dash> well
<warner> the first method has a schema, which includes a return constraint
<dash> okay you can't fail synchronously
<warner> so you *can* assert that, whatever the object will be, it obeys that
  constraint
<dash> but you can return a failure like everybody else
<warner> and since the constraint specifies an Interface, then the Interface
  plus method name is enough to come up with an argument constraint
<warner> so you can still enforce one
<warner> this is kind of cool
<dash> the big advantage of pipelining is that you can have a lot of
  composable primitives on your remote interfaces rather than having to smush
  them together into things that are efficient to call remotely
<warner> hm, yeah, as long as all the arguments are either immediate or
  reference something on the recipient
<warner> as soon as a third party enters the equation, you have to decide
  whether to wait for the arguments to resolve locally or if it might be faster
  to throw them at someone else
<warner> that's where the network-optimization thing I mentioned before comes
  into play
<dash> mmm
<warner> you send messages to A and to B, once you get both results you want
  to send the pair to C to do something with them
<dash> spin me an example scenario
<dash> Hmm
<warner> if all three are close to each other, and you're far from all of
  them, it makes more sense to tell C about A and B
<dash> how _does_ E handle that
<warner> or maybe tell A and B about C, tell them "when you get done, send
  your results to C, who will be waiting for them"
<dash> warner: yeah, i think that the right thing to do is to wait for them to
  resolve locally
<Tv> assuming that C can
talk to A and B is bad +<dash> no it isn't +<Tv> well, depends on whether you live in this world or not :) +<dash> warner: if you want other behaviour then you should have to set it up + explicitly, i think +<warner> I'm not even sure how you would describe that sort of thing. It'd be + like routing protocols, you assign a cost to each link and hope some magical + omniscient entity can pick an optimal solution + +** revealing intentions + +<zooko> Now suppose I say "B.your_fired(C.revoke_his_rights())", or such. +<warner> A->C: sell all my stock. A->B: declare bankruptcy + +If B has access to C, and the promises are pipelined, then B has a window +during which they know something's about to happen, and they still have full +access to C, so they can do evil. + +Zooko tried to explain the concern to MarkM years ago, but didn't have a +clear example of the problem. The thing is, B can do evil all the time, +you're just trying to revoke their capability *before* they get wind of your +intentions. Keeping intentions secret is hard, much harder than limiting +someone's capabilities. It's kind of the trailing edge of the capability, as +opposed to the leading edge. + +Zooko feels the language needs clear support for expressing how the +synchronization needs to take place, and which domain it needs to happen in. + +* web-calculus integration + +Tyler pointed out that it is vital for a node to be able to grant limited +access to some held object. Specifically, Alice may want to give Bob a +reference not to Carol as a whole, but to just a specific Carol.remote_foo +method (and not to any other methods that Alice might be allowed to invoke). +I had been thinking of using RemoteInterfaces to indicate method subsets, +something like this: + + bob.callRemote("introduce", Facet(self, RIMinimal)) + +but Tyler thinks that this is too coarse-grained and not likely to encourage +the right kinds of security decisions. 
In his web-calculus, recipients can +grant third-parties access to individual bound methods. + + bob.callRemote("introduce", carol.getMethod("howdy")) + +If I understand it correctly, his approach makes Referenceables into a +copy-by-value object that is represented by a dictionary which maps method +names to these RemoteMethod objects, so there is no actual +callRemote(methname) method. Instead you do something like: + + rr = tub.getReference(url) + d = rr['introduce'].call(args) + +These RemoteMethod objects are top-level, so unguessable URLs must be +generated for them when they are sent, and they must be reference-counted. It +must not be possible to get from the bound method to the (unrestricted) +referenced object. + +TODO: how does the web-calculus maintain reference counts for these? It feels +like there would be an awful lot of messages being thrown around. + +To implement this, we'll need: + + banana sequences for bound methods + ('my-method', clid, url) + ('your-method', clid) + ('their-method', url, RI+methname?) + syntax to carve a single method out of a local Referenceable + A: self.doFoo (only if we get rid of remote_) + B: self.remote_doFoo + C: self.getMethod("doFoo") + D: self.getMethod(RIFoo['doFoo']) + leaning towards C or D + syntax to carve a single method out of a RemoteReference + A: rr.doFoo + B: rr.getMethod('doFoo') + C: rr.getMethod(RIFoo['doFoo']) + D: rr['doFoo'] + E: rr[RIFoo['doFoo']] + leaning towards B or C + decide whether to do getMethod early or late + early means ('my-reference') includes a big dict of my-method values + and a whole bunch of DECREFs when that dict goes away + late means there is a remote_tub.getMethod(your-ref, methname) call + and an extra round-trip to retrieve them + dash thinks late is better + +We could say that the 'my-reference' sequence for any RemoteInterface-enabled +Referenceable will include a dictionary of bound methods. The receiving end +will just stash the whole thing. 
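The bound-method facet could be sketched like this. Both classes are
hypothetical stand-ins: over the wire the RemoteMethod would get its own
unguessable URL and refcount, and a real implementation must also prevent
reaching the original object via Python introspection (a bound method's
__self__, for instance), which a pure in-process sketch cannot:

```python
class RemoteMethod:
    """Grants access to exactly one method of a Referenceable; the
    wrapper itself exposes no other attributes of the original."""
    def __init__(self, bound_method):
        self._bound = bound_method

    def call(self, *args, **kwargs):
        return self._bound(*args, **kwargs)

class Referenceable:
    """Stand-in for pb.Referenceable with the proposed getMethod() syntax
    (option C from the list above)."""
    def getMethod(self, methname):
        return RemoteMethod(getattr(self, "remote_" + methname))
```

Then bob.callRemote("introduce", carol.getMethod("howdy")) hands Bob the
"howdy" capability and nothing else.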
+ +* do implicit "doFoo" -> RIFoo["doFoo"] conversion + +I want rr.callRemote("doFoo", args) to take advantage of a RemoteInterface, +if one is available. RemoteInterfaces aren't supposed to be overlapping (at +least not among RemoteInterfaces that are shared by a single Referenceable), +so there shouldn't be any ambiguity. If there is, we can raise an error. + +* accept Deferreds as arguments? + + bob.callRemote("introduce", target=self.tub.getReference(pburl)) + or + bob.callRemote("introduce", carol.getMethod("doFoo")) + instead of + carol.getMethod("doFoo").addCallback(lambda r: bob.callRemote("introduce", r)) + +If one of the top-level arguments to callRemote is a Deferred, don't send the +method request until all the arguments resolve. If any of the arguments +errback, the callRemote will fail with some new exception (that can contain a +reference to the argument's exception). + +however, this would mean the method would be invoked out-of-order w.r.t. an +immediately-following bob.callRemote + +put this off until we get some actual experience. + +* batch decrefs? + +If we implement the copy-by-value Referenceable idea, then a single gc may +result in dozens of simultaneous decrefs. It would be nice to reduce the +traffic generated by that. + +* promise pipelining + +Promise(Deferred).__getattr__ + +DoS prevention techniques in CapIDL (MarkM) + +pb://key@ip,host,[ipv6],localhost,[/unix]/swissnumber +tubs for lifetime management +separate listener object, share tubs between listeners + distinguish by key number + + actually, why bother with separate keys? Why allow the outside world to + distinguish between these sub-Tubs? Use them purely for lifetime management, + not security properties. That means a name->published-object table for each + SubTub, maybe a hierarchy of them, and the parent-most Tub gets the + Listeners. Incoming getReferenceByURL requests require a lookup in all Tubs + that descend from the one attached to that listener. 
+ +So one decision is whether to have implicitly-published objects have a name +that lasts forever (well, until the Tub is destroyed), or if they should be +reference-counted. If they are reference counted, then outstanding Gifts need +to maintain a reference, and the gift must be turned into a live +RemoteReference right away. It has bearing on how/if we implement SturdyRefs, +so I need to read more about them in the E docs. + +Hrm, and creating new Tubs from within a remote_foo method.. to make that +useful, you'd need to have a way to ask for the Tub through which you were +being invoked. hrm. + +* creating new Tubs + +Tyler suggests using Tubs for namespace management. Tubs can share TCP +listening ports, but MarkS recommends giving them all separate keys (which +means separate SSL sessions, so separate TCP connections). Bill Frantz +discourages using a hierarchy of Tubs, says it's not the sort of thing you +want to be locked into. + +That means I'll need a separate Listener object, where the rule is that the +last Tub to be stopped makes the Listener stop too.. probably abuse the +Service interface in some wacky way to pull this off. + +Creating a new Tub.. how to conveniently create it with the same Listeners as +the current one? If the method that's creating the Tub is receiving a +reference, the Tub can be an attribute of the inbound RemoteReference. If +not, that's trickier.. the _tub= argument may still be a useful way to go. +Once you've got a source tub, then tub.newTub() should create a new one with +the same Listeners as the source (but otherwise unassociated with it). + +Once you have the new Tub, registering an object in it should return +something that can be directly serialized into a gift. 
+ +class Target(pb.Referenceable): + def remote_startGame(self, player_black, player_white): + tub = player_black.tub.newTub() + game = self.createGame() + gameref = tub.register(game) + game.setPlayer("black", tub.something(player_black)) + game.setPlayer("white", tub.something(player_white)) + return gameref + +Hmm. So, create a SturdyRef class, which remembers the tubid (key), list of +location hints, and object name. These have a url() method that renders out a +URL string, and a compare method which compares the tubid and object name but +ignores the location hints. Serializing a SturdyRef creates a their-reference +sequence. Tub.register takes an object (and maybe a name) and returns a +SturdyRef. Tub.getReference takes either a URL or a SturdyRef. +RemoteReferences should have a .getSturdyRef method. + +Actually, I think SturdyRefs should be serialized as Copyables, and create +SturdyRefs on the other side. The new-tub sequence should be: + + create new tub, using the Listener from an existing tub + register the objects in the new tub, obtaining a SturdyRef + send/return SendLiveRef(sturdyref) to the far side + SendLiveRef is a wrapper that causes a their-reference sequence to be sent. + The alternative is to obtain an actual live reference (via + player_black.tub.getReference(sturdyref) first), then send that, but it's + kind of a waste if you don't actually want to use the liveref yourself. + +Note that it becomes necessary to provide for local references here: ones in +different Tubs which happen to share a Listener. These can use real TCP +connections (unless the Listener hint is only valid from the outside world). +It might be possible to use some tricks cut out some of the network overhead, +but I suspect there are reasons why you wouldn't actually want to do that. 
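The SturdyRef described above might look like the following sketch (the
real class would live in foolscap and deal with actual tub keys; the URL
layout follows the pb://key@hints/swissnumber shape noted earlier):

```python
class SturdyRef:
    """Sketch: identity is (tubid, name); location hints are routing
    information only and are excluded from comparison."""
    def __init__(self, tubid, locations, name):
        self.tubid = tubid
        self.locations = list(locations)  # "host:port" hint strings
        self.name = name

    def url(self):
        # render back out to a URL string
        return "pb://%s@%s/%s" % (self.tubid,
                                  ",".join(self.locations),
                                  self.name)

    def __eq__(self, other):
        return (isinstance(other, SturdyRef)
                and self.tubid == other.tubid
                and self.name == other.name)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # hints ignored, so refs that differ only in hints collide
        return hash((self.tubid, self.name))
```

Two SturdyRefs that name the same object on the same tub compare equal even
if they were learned with different location hints.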
diff --git a/src/foolscap/foolscap/__init__.py b/src/foolscap/foolscap/__init__.py index ab58d559..b3d41909 100644 --- a/src/foolscap/foolscap/__init__.py +++ b/src/foolscap/foolscap/__init__.py @@ -1,6 +1,6 @@ """Foolscap""" -__version__ = "0.1.4" +__version__ = "0.1.5" # here are the primary entry points from foolscap.pb import Tub, UnauthenticatedTub, getRemoteURL_TCP diff --git a/src/foolscap/foolscap/broker.py b/src/foolscap/foolscap/broker.py index 1977f4d6..4da9ebc3 100644 --- a/src/foolscap/foolscap/broker.py +++ b/src/foolscap/foolscap/broker.py @@ -111,8 +111,7 @@ class PBRootUnslicer(RootUnslicer): def receiveChild(self, token, ready_deferred): if isinstance(token, call.InboundDelivery): - assert ready_deferred is None - self.broker.scheduleCall(token) + self.broker.scheduleCall(token, ready_deferred) @@ -215,6 +214,7 @@ class Broker(banana.Banana, referenceable.Referenceable): self.disconnectWatchers = [] # receiving side uses these self.inboundDeliveryQueue = [] + self._call_is_running = False self.activeLocalCalls = {} # the other side wants an answer from us def setTub(self, tub): @@ -506,33 +506,31 @@ class Broker(banana.Banana, referenceable.Referenceable): return m return None - def scheduleCall(self, delivery): - self.inboundDeliveryQueue.append(delivery) + def scheduleCall(self, delivery, ready_deferred): + self.inboundDeliveryQueue.append( (delivery,ready_deferred) ) eventually(self.doNextCall) - def doNextCall(self, ignored=None): - if not self.inboundDeliveryQueue: + def doNextCall(self): + if self._call_is_running: return - nextCall = self.inboundDeliveryQueue[0] - if nextCall.isRunnable(): - # remove it and arrange to run again soon - self.inboundDeliveryQueue.pop(0) - delivery = nextCall - if self.inboundDeliveryQueue: - eventually(self.doNextCall) - - # now perform the actual delivery - d = defer.maybeDeferred(self._doCall, delivery) - d.addCallback(self._callFinished, delivery) - d.addErrback(self.callFailed, delivery.reqID, delivery) + 
if not self.inboundDeliveryQueue: return - # arrange to wake up when the next call becomes runnable - d = nextCall.whenRunnable() - d.addCallback(self.doNextCall) + delivery, ready_deferred = self.inboundDeliveryQueue.pop(0) + self._call_is_running = True + if not ready_deferred: + ready_deferred = defer.succeed(None) + d = ready_deferred + d.addCallback(lambda res: self._doCall(delivery)) + d.addCallback(self._callFinished, delivery) + d.addErrback(self.callFailed, delivery.reqID, delivery) + def _done(res): + self._call_is_running = False + eventually(self.doNextCall) + d.addBoth(_done) + return None def _doCall(self, delivery): obj = delivery.obj - assert delivery.allargs.isReady() args = delivery.allargs.args kwargs = delivery.allargs.kwargs for i in args + kwargs.values(): diff --git a/src/foolscap/foolscap/call.py b/src/foolscap/foolscap/call.py index 2f1ccd6a..e675a7b8 100644 --- a/src/foolscap/foolscap/call.py +++ b/src/foolscap/foolscap/call.py @@ -3,11 +3,11 @@ from twisted.python import failure, log, reflect from twisted.internet import defer from foolscap import copyable, slicer, tokens -from foolscap.eventual import eventually from foolscap.copyable import AttributeDictConstraint from foolscap.constraint import ByteStringConstraint from foolscap.slicers.list import ListConstraint from tokens import BananaError, Violation +from foolscap.util import AsyncAND class FailureConstraint(AttributeDictConstraint): @@ -162,21 +162,6 @@ class InboundDelivery: self.methodname = methodname self.methodSchema = methodSchema self.allargs = allargs - if allargs.isReady(): - self.runnable = True - self.runnable = False - - def isRunnable(self): - if self.allargs.isReady(): - return True - return False - - def whenRunnable(self): - if self.allargs.isReady(): - return defer.succeed(self) - d = self.allargs.whenReady() - d.addCallback(lambda res: self) - return d def logFailure(self, f): # called if tub.logLocalFailures is True @@ -211,7 +196,8 @@ class 
ArgumentUnslicer(slicer.ScopedUnslicer): self.argname = None self.argConstraint = None self.num_unreferenceable_children = 0 - self.num_unready_children = 0 + self._all_children_are_referenceable_d = None + self._ready_deferreds = [] self.closed = False def checkToken(self, typebyte, size): @@ -248,7 +234,7 @@ class ArgumentUnslicer(slicer.ScopedUnslicer): if self.debug: log.msg("%s.receiveChild: %s %s %s %s %s args=%s kwargs=%s" % (self, self.closed, self.num_unreferenceable_children, - self.num_unready_children, token, ready_deferred, + len(self._ready_deferreds), token, ready_deferred, self.args, self.kwargs)) if self.numargs is None: # this token is the number of positional arguments @@ -273,12 +259,10 @@ class ArgumentUnslicer(slicer.ScopedUnslicer): # resolved yet. self.num_unreferenceable_children += 1 argvalue.addCallback(self.updateChild, argpos) - argvalue.addErrback(self.explode) if ready_deferred: if self.debug: log.msg("%s.receiveChild got an unready posarg" % self) - self.num_unready_children += 1 - ready_deferred.addCallback(self.childReady) + self._ready_deferreds.append(ready_deferred) if len(self.args) < self.numargs: # more to come ms = self.methodSchema @@ -291,6 +275,7 @@ class ArgumentUnslicer(slicer.ScopedUnslicer): if self.argname is None: # this token is the name of a keyword argument + assert ready_deferred is None self.argname = token # if the argname is invalid, this may raise Violation ms = self.methodSchema @@ -308,12 +293,10 @@ class ArgumentUnslicer(slicer.ScopedUnslicer): if isinstance(argvalue, defer.Deferred): self.num_unreferenceable_children += 1 argvalue.addCallback(self.updateChild, self.argname) - argvalue.addErrback(self.explode) if ready_deferred: if self.debug: log.msg("%s.receiveChild got an unready kwarg" % self) - self.num_unready_children += 1 - ready_deferred.addCallback(self.childReady) + self._ready_deferreds.append(ready_deferred) self.argname = None return @@ -333,70 +316,31 @@ class 
ArgumentUnslicer(slicer.ScopedUnslicer): else: self.kwargs[which] = obj self.num_unreferenceable_children -= 1 - self.checkComplete() + if self.num_unreferenceable_children == 0: + if self._all_children_are_referenceable_d: + self._all_children_are_referenceable_d.callback(None) return obj - def childReady(self, obj): - self.num_unready_children -= 1 - if self.debug: - log.msg("%s.childReady, now %d left" % - (self, self.num_unready_children)) - log.msg(" obj=%s, args=%s, kwargs=%s" % - (obj, self.args, self.kwargs)) - self.checkComplete() - return obj - - def checkComplete(self): - # this is called each time one of our children gets updated or - # becomes ready (like when a Gift is finally resolved) - if self.debug: - log.msg("%s.checkComplete: %s %s %s args=%s kwargs=%s" % - (self, self.closed, self.num_unreferenceable_children, - self.num_unready_children, self.args, self.kwargs)) - - if not self.closed: - return - if self.num_unreferenceable_children: - return - if self.num_unready_children: - return - # yup, we're done. Notify anyone who is still waiting - if self.debug: - log.msg(" we are ready") - for d in self.watchers: - eventually(d.callback, self) - del self.watchers def receiveClose(self): if self.debug: log.msg("%s.receiveClose: %s %s %s" % (self, self.closed, self.num_unreferenceable_children, - self.num_unready_children)) + len(self._ready_deferreds))) if (self.numargs is None or len(self.args) < self.numargs or self.argname is not None): raise BananaError("'arguments' sequence ended too early") self.closed = True - self.watchers = [] - # we don't return a ready_deferred. Instead, the InboundDelivery - # object queries our isReady() method directly. 
- return self, None - - def isReady(self): - assert self.closed + dl = [] if self.num_unreferenceable_children: - return False - if self.num_unready_children: - return False - return True - - def whenReady(self): - assert self.closed - if self.isReady(): - return defer.succeed(self) - d = defer.Deferred() - self.watchers.append(d) - return d + d = self._all_children_are_referenceable_d = defer.Deferred() + dl.append(d) + dl.extend(self._ready_deferreds) + ready_deferred = None + if dl: + ready_deferred = AsyncAND(dl) + return self, ready_deferred def describe(self): s = "<arguments" @@ -409,11 +353,9 @@ class ArgumentUnslicer(slicer.ScopedUnslicer): else: s += " arg[?]" if self.closed: - if self.isReady(): - # waiting to be delivered - s += " ready" - else: - s += " waiting" + s += " closed" + # TODO: it would be nice to indicate if we still have unready + # children s += ">" return s @@ -430,6 +372,7 @@ class CallUnslicer(slicer.ScopedUnslicer): self.interface = None self.methodname = None self.methodSchema = None # will be a MethodArgumentsConstraint + self._ready_deferreds = [] def checkToken(self, typebyte, size): # TODO: limit strings by returning a number instead of None @@ -472,13 +415,13 @@ class CallUnslicer(slicer.ScopedUnslicer): def receiveChild(self, token, ready_deferred=None): assert not isinstance(token, defer.Deferred) - assert ready_deferred is None if self.debug: log.msg("%s.receiveChild [s%d]: %s" % (self, self.stage, repr(token))) if self.stage == 0: # reqID # we don't yet know which reqID to send any failure to + assert ready_deferred is None self.reqID = token self.stage = 1 if self.reqID != 0: @@ -488,6 +431,7 @@ class CallUnslicer(slicer.ScopedUnslicer): if self.stage == 1: # objID # this might raise an exception if objID is invalid + assert ready_deferred is None self.objID = token self.obj = self.broker.getMyReferenceByCLID(token) #iface = self.broker.getRemoteInterfaceByName(token) @@ -517,6 +461,7 @@ class 
CallUnslicer(slicer.ScopedUnslicer): # class). If this expectation were to go away, a quick # obj.__class__ -> RemoteReferenceSchema cache could be built. + assert ready_deferred is None self.stage = 3 if self.objID < 0: @@ -548,6 +493,8 @@ class CallUnslicer(slicer.ScopedUnslicer): # queue the message. It will not be executed until all the # arguments are ready. The .args list and .kwargs dict may change # before then. + if ready_deferred: + self._ready_deferreds.append(ready_deferred) self.stage = 4 return @@ -559,7 +506,10 @@ class CallUnslicer(slicer.ScopedUnslicer): self.interface, self.methodname, self.methodSchema, self.allargs) - return delivery, None + ready_deferred = None + if self._ready_deferreds: + ready_deferred = AsyncAND(self._ready_deferreds) + return delivery, ready_deferred def describe(self): s = "<methodcall" @@ -600,6 +550,11 @@ class AnswerUnslicer(slicer.ScopedUnslicer): resultConstraint = None haveResults = False + def start(self, count): + slicer.ScopedUnslicer.start(self, count) + self._ready_deferreds = [] + self._child_deferred = None + def checkToken(self, typebyte, size): if self.request is None: if typebyte != tokens.INT: @@ -633,15 +588,20 @@ class AnswerUnslicer(slicer.ScopedUnslicer): return unslicer def receiveChild(self, token, ready_deferred=None): - assert not isinstance(token, defer.Deferred) - assert ready_deferred is None if self.request == None: + assert not isinstance(token, defer.Deferred) + assert ready_deferred is None reqID = token # may raise Violation for bad reqIDs self.request = self.broker.getRequest(reqID) self.resultConstraint = self.request.constraint else: - self.results = token + if isinstance(token, defer.Deferred): + self._child_deferred = token + else: + self._child_deferred = defer.succeed(token) + if ready_deferred: + self._ready_deferreds.append(ready_deferred) self.haveResults = True def reportViolation(self, f): @@ -652,7 +612,32 @@ class AnswerUnslicer(slicer.ScopedUnslicer): return f # give up our 
sequence def receiveClose(self): - self.request.complete(self.results) + # three things must happen before our request is complete: + # receiveClose has occurred + # the receiveChild object deferred (if any) has fired + # ready_deferred has finished + # If ready_deferred errbacks, provide its failure object to the + # request. If not, provide the request with whatever receiveChild + # got. + + if not self._child_deferred: + raise BananaError("Answer didn't include an answer") + + if self._ready_deferreds: + d = AsyncAND(self._ready_deferreds) + else: + d = defer.succeed(None) + + def _ready(res): + return self._child_deferred + d.addCallback(_ready) + + def _done(res): + self.request.complete(res) + def _fail(f): + self.request.fail(f) + d.addCallbacks(_done, _fail) + return None, None def describe(self): @@ -818,6 +803,30 @@ class CopiedFailure(failure.Failure, copyable.RemoteCopyOldStyle): self.frames = [] self.stack = [] + # MAYBE: for native exception types, be willing to wire up a + # reference to the real exception class. For other exception types, + # our .type attribute will be a string, which (from a Failure's point + # of view) looks as if someone raised an old-style string exception. + # This is here so that trial will properly render a CopiedFailure + # that comes out of a test case (since it unconditionally does + # reflect.qual(f.type) + + # ACTUALLY: replace self.type with a class that looks a lot like the + # original exception class (meaning that reflect.qual() will return + # the same string for this as for the original). If someone calls our + # .trap method, resulting in a new Failure with contents copied from + # this one, then the new Failure.printTraceback will attempt to use + # reflect.qual() on our self.type, so it needs to be a class instead + # of a string. 
+ + assert isinstance(self.type, str) + typepieces = self.type.split(".") + class ExceptionLikeString: + pass + self.type = ExceptionLikeString + self.type.__module__ = ".".join(typepieces[:-1]) + self.type.__name__ = typepieces[-1] + def __str__(self): return "[CopiedFailure instance: %s]" % self.getBriefTraceback() @@ -829,3 +838,21 @@ class CopiedFailure(failure.Failure, copyable.RemoteCopyOldStyle): file.write(self.traceback) copyable.registerRemoteCopy(FailureSlicer.classname, CopiedFailure) + +class CopiedFailureSlicer(FailureSlicer): + # A calls B. B calls C. C fails and sends a Failure to B. B gets a + # CopiedFailure and sends it to A. A should get a CopiedFailure too. This + # class lives on B and slices the CopiedFailure as it is sent to A. + slices = CopiedFailure + + def getStateToCopy(self, obj, broker): + state = {} + for k in ('value', 'type', 'parents'): + state[k] = getattr(obj, k) + if broker.unsafeTracebacks: + state['traceback'] = obj.traceback + else: + state['traceback'] = "Traceback unavailable\n" + if not isinstance(state['type'], str): + state['type'] = reflect.qual(state['type']) # Exception class + return state diff --git a/src/foolscap/foolscap/ipb.py b/src/foolscap/foolscap/ipb.py index 795ef7f5..22a98c0d 100644 --- a/src/foolscap/foolscap/ipb.py +++ b/src/foolscap/foolscap/ipb.py @@ -62,6 +62,10 @@ class IRemoteReference(Interface): notifyOnDisconnect handlers are cancelled. """ + def dontNotifyOnDisconnect(cookie): + """Deregister a callback that was registered with notifyOnDisconnect. + """ + def callRemote(name, *args, **kwargs): """Invoke a method on the remote object with which I am associated.
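The CopiedFailure hunk above replaces a string `self.type` with a class whose `reflect.qual()` matches the original dotted name. That trick can be sketched in isolation (helper names here are hypothetical, and `qual` is a simplified stand-in for `twisted.python.reflect.qual`):

```python
def class_like_string(dotted_name):
    # Build a class whose __module__ and __name__ round-trip to the
    # original dotted string, mimicking what CopiedFailure does so that
    # reflect.qual() (and trial's failure rendering) keep working.
    pieces = dotted_name.split(".")
    cls = type(pieces[-1], (), {})
    cls.__module__ = ".".join(pieces[:-1])
    return cls

def qual(cls):
    # simplified stand-in for twisted.python.reflect.qual
    return "%s.%s" % (cls.__module__, cls.__name__)

restored = class_like_string("exceptions.RuntimeError")
```

This is why `.trap()` on a CopiedFailure can later feed `reflect.qual()` without blowing up on a bare string.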
diff --git a/src/foolscap/foolscap/pb.py b/src/foolscap/foolscap/pb.py index 735f1dbc..e40056e0 100644 --- a/src/foolscap/foolscap/pb.py +++ b/src/foolscap/foolscap/pb.py @@ -230,6 +230,8 @@ class Tub(service.MultiService): self.nameToReference = weakref.WeakValueDictionary() self.referenceToName = weakref.WeakKeyDictionary() self.strongReferences = [] + self.nameLookupHandlers = [] + # remote stuff. Most of these use a TubRef (or NoAuthTubRef) as a # dictionary key self.tubConnectors = {} # maps TubRef to a TubConnector @@ -487,7 +489,15 @@ class Tub(service.MultiService): return name def getReferenceForName(self, name): - return self.nameToReference[name] + if name in self.nameToReference: + return self.nameToReference[name] + for lookup in self.nameLookupHandlers: + ref = lookup(name) + if ref: + if ref not in self.referenceToName: + self.referenceToName[ref] = name + return ref + raise KeyError("unable to find reference for name '%s'" % (name,)) def getReferenceForURL(self, url): # TODO: who should this be used by? @@ -526,6 +536,46 @@ class Tub(service.MultiService): self.strongReferences.remove(ref) self.revokeReference(ref) + def registerNameLookupHandler(self, lookup): + """Add a function to help convert names to Referenceables. + + When remote systems pass a FURL to their Tub.getReference(), our Tub + will be asked to locate a Referenceable for the name inside that + furl. The normal mechanism for this is to look at the table + maintained by registerReference() and unregisterReference(). If the + name does not exist in that table, other 'lookup handler' functions + are given a chance. Each lookup handler is asked in turn, and the + first which returns a non-None value wins. + + This may be useful for cases where the furl represents an object that + lives on disk, or is generated on demand: rather than creating all + possible Referenceables at startup, the lookup handler can create or + retrieve the objects only when someone asks for them. 
+ + Note that constructing the FURLs of these objects may be non-trivial. + It is safe to create an object, use tub.registerReference in one + invocation of a program to obtain (and publish) the furl, parse the + furl to extract the name, save the contents of the object on disk, + then in a later invocation of the program use a lookup handler to + retrieve the object from disk. This approach means the objects that + are created in a given invocation stick around (inside + tub.strongReferences) for the rest of that invocation. An alternative + approach is to create the object but *not* use tub.registerReference, + but in that case you have to construct the FURL yourself, and the Tub + does not currently provide any support for doing this robustly. + + @param lookup: a callable which accepts a name (as a string) and + returns either a Referenceable or None. Note that + these strings should not contain a slash, a question + mark, or an ampersand, as these are reserved in the + FURL for later expansion (to add parameters beyond the + object name) + """ + self.nameLookupHandlers.append(lookup) + + def unregisterNameLookupHandler(self, lookup): + self.nameLookupHandlers.remove(lookup) + def getReference(self, sturdyOrURL): """Acquire a RemoteReference for the given SturdyRef/URL.
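The lookup-handler fallback added to pb.py above can be sketched with a toy Tub (MiniTub and the on-demand store are hypothetical; the real API is `Tub.registerNameLookupHandler` as added in this patch):

```python
class MiniTub:
    """Toy stand-in for foolscap's Tub, showing only the name-lookup
    fallback added in this patch (not the real class)."""
    def __init__(self):
        self.nameToReference = {}
        self.nameLookupHandlers = []

    def registerReference(self, ref, name):
        self.nameToReference[name] = ref
        return name

    def registerNameLookupHandler(self, lookup):
        self.nameLookupHandlers.append(lookup)

    def unregisterNameLookupHandler(self, lookup):
        self.nameLookupHandlers.remove(lookup)

    def getReferenceForName(self, name):
        # pre-registered names win; otherwise ask each handler in turn,
        # and the first non-None answer is used
        if name in self.nameToReference:
            return self.nameToReference[name]
        for lookup in self.nameLookupHandlers:
            ref = lookup(name)
            if ref:
                return ref
        raise KeyError("unable to find reference for name '%s'" % (name,))

# a handler that "loads" objects on demand instead of pre-registering them;
# in real code this might unpickle persistent state from disk
on_disk = {"dave": "dave-object"}
tub = MiniTub()
tub.registerNameLookupHandler(on_disk.get)
```

With this in place, a FURL naming "dave" resolves lazily even though nothing was registered at startup, which is the use case ticket #6 describes.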
diff --git a/src/foolscap/foolscap/referenceable.py b/src/foolscap/foolscap/referenceable.py index d940429f..8c2e06a8 100644 --- a/src/foolscap/foolscap/referenceable.py +++ b/src/foolscap/foolscap/referenceable.py @@ -11,7 +11,7 @@ from zope.interface import implements from twisted.python.components import registerAdapter Interface = interface.Interface from twisted.internet import defer -from twisted.python import failure +from twisted.python import failure, log from foolscap import ipb, slicer, tokens, call BananaError = tokens.BananaError @@ -21,7 +21,7 @@ from foolscap.remoteinterface import getRemoteInterface, \ getRemoteInterfaceByName, RemoteInterfaceConstraint from foolscap.schema import constraintMap from foolscap.copyable import Copyable, RemoteCopy -from foolscap.eventual import eventually +from foolscap.eventual import eventually, fireEventually class OnlyReferenceable(object): implements(ipb.IReferenceable) @@ -538,6 +538,33 @@ class RemoteMethodReference(RemoteReference): methodSchema = None return interfaceName, methodName, methodSchema +class LocalReferenceable: + implements(ipb.IRemoteReference) + def __init__(self, original): + self.original = original + + def notifyOnDisconnect(self, callback, *args, **kwargs): + # local objects never disconnect + return None + def dontNotifyOnDisconnect(self, marker): + pass + + def callRemote(self, methname, *args, **kwargs): + def _try(ignored): + meth = getattr(self.original, "remote_" + methname) + return meth(*args, **kwargs) + d = fireEventually() + d.addCallback(_try) + return d + + def callRemoteOnly(self, methname, *args, **kwargs): + d = self.callRemote(methname, *args, **kwargs) + d.addErrback(lambda f: None) + return None + +registerAdapter(LocalReferenceable, ipb.IReferenceable, ipb.IRemoteReference) + + class YourReferenceSlicer(slicer.BaseSlicer): """I handle pb.RemoteReference objects (being sent back home to the @@ -635,11 +662,26 @@ class TheirReferenceUnslicer(slicer.LeafUnslicer): # but the 
message delivery must still wait for the getReference to # complete. See to it that we fire the object deferred before we fire # the ready_deferred. - obj_deferred, ready_deferred = defer.Deferred(), defer.Deferred() + + obj_deferred = defer.Deferred() + ready_deferred = defer.Deferred() + def _ready(rref): obj_deferred.callback(rref) ready_deferred.callback(rref) - d.addCallback(_ready) + def _failed(f): + # if an error in getReference() occurs, log it locally (with + # priority UNUSUAL), because this end might need to diagnose some + # connection or networking problems. + log.msg("gift (%s) failed to resolve: %s" % (self.url, f)) + # deliver a placeholder object to the container, but signal the + # ready_deferred that we've failed. This will bubble up to the + # enclosing InboundDelivery, and when it gets to the top of the + # queue, it will be flunked. + obj_deferred.callback("Place holder for a Gift which failed to " + "resolve: %s" % f) + ready_deferred.errback(f) + d.addCallbacks(_ready, _failed) return obj_deferred, ready_deferred diff --git a/src/foolscap/foolscap/slicer.py b/src/foolscap/foolscap/slicer.py index 17365d0d..b36569ab 100644 --- a/src/foolscap/foolscap/slicer.py +++ b/src/foolscap/foolscap/slicer.py @@ -1,6 +1,7 @@ # -*- test-case-name: foolscap.test.test_banana -*- from twisted.python.components import registerAdapter +from twisted.python import log from zope.interface import implements from twisted.internet.defer import Deferred import tokens @@ -190,6 +191,11 @@ class BaseUnslicer: return self.open(opentype) def receiveChild(self, obj, ready_deferred=None): + """Unslicers for containers should accumulate their children's + ready_deferreds, then combine them in an AsyncAND when receiveClose() + happens, and return the AsyncAND as the ready_deferreds half of the + receiveClose() return value. 
+ """ pass def reportViolation(self, why): @@ -221,16 +227,20 @@ class BaseUnslicer: return None def explode(self, failure): - """If something goes wrong in a Deferred callback, it may be too - late to reject the token and to normal error handling. I haven't - figured out how to do sensible error-handling in this situation. - This method exists to make sure that the exception shows up - *somewhere*. If this is called, it is also likely that a placeholder - (probably a Deferred) will be left in the unserialized object about - to be handed to the RootUnslicer. + """If something goes wrong in a Deferred callback, it may be too late + to reject the token and to normal error handling. I haven't figured + out how to do sensible error-handling in this situation. This method + exists to make sure that the exception shows up *somewhere*. If this + is called, it is also likely that a placeholder (probably a Deferred) + will be left in the unserialized object graph about to be handed to + the RootUnslicer. """ - print "KABOOM" - print failure + + # RootUnslicer pays attention to this .exploded attribute and refuses + # to deliver anything if it is set. But PBRootUnslicer ignores it. + # TODO: clean this up, and write some unit tests to trigger it (by + # violating schemas?) 
+ log.msg("BaseUnslicer.explode: %s" % failure) self.protocol.exploded = failure class ScopedUnslicer(BaseUnslicer): diff --git a/src/foolscap/foolscap/slicers/dict.py b/src/foolscap/foolscap/slicers/dict.py index 23a69ca3..b5207c1d 100644 --- a/src/foolscap/foolscap/slicers/dict.py +++ b/src/foolscap/foolscap/slicers/dict.py @@ -1,10 +1,11 @@ # -*- test-case-name: foolscap.test.test_banana -*- from twisted.python import log -from twisted.internet.defer import Deferred, DeferredList +from twisted.internet.defer import Deferred from foolscap.tokens import Violation, BananaError from foolscap.slicer import BaseSlicer, BaseUnslicer from foolscap.constraint import OpenerConstraint, Any, UnboundedSchema, IConstraint +from foolscap.util import AsyncAND class DictSlicer(BaseSlicer): opentype = ('dict',) @@ -105,7 +106,7 @@ class DictUnslicer(BaseUnslicer): def receiveClose(self): ready_deferred = None if self._ready_deferreds: - ready_deferred = DeferredList(self._ready_deferreds) + ready_deferred = AsyncAND(self._ready_deferreds) return self.d, ready_deferred def describe(self): diff --git a/src/foolscap/foolscap/slicers/list.py b/src/foolscap/foolscap/slicers/list.py index 70ad55f8..357a1059 100644 --- a/src/foolscap/foolscap/slicers/list.py +++ b/src/foolscap/foolscap/slicers/list.py @@ -1,10 +1,11 @@ # -*- test-case-name: foolscap.test.test_banana -*- from twisted.python import log -from twisted.internet.defer import Deferred, DeferredList +from twisted.internet.defer import Deferred from foolscap.tokens import Violation from foolscap.slicer import BaseSlicer, BaseUnslicer from foolscap.constraint import OpenerConstraint, Any, UnboundedSchema, IConstraint +from foolscap.util import AsyncAND class ListSlicer(BaseSlicer): @@ -105,7 +106,7 @@ class ListUnslicer(BaseUnslicer): def receiveClose(self): ready_deferred = None if self._ready_deferreds: - ready_deferred = DeferredList(self._ready_deferreds) + ready_deferred = AsyncAND(self._ready_deferreds) return self.list, 
ready_deferred def describe(self): diff --git a/src/foolscap/foolscap/slicers/set.py b/src/foolscap/foolscap/slicers/set.py index f2afa09c..85a469b6 100644 --- a/src/foolscap/foolscap/slicers/set.py +++ b/src/foolscap/foolscap/slicers/set.py @@ -9,6 +9,7 @@ from foolscap.slicer import BaseUnslicer from foolscap.tokens import Violation from foolscap.constraint import OpenerConstraint, UnboundedSchema, Any, \ IConstraint +from foolscap.util import AsyncAND class SetSlicer(ListSlicer): opentype = ("set",) @@ -136,7 +137,7 @@ class SetUnslicer(BaseUnslicer): def receiveClose(self): ready_deferred = None if self._ready_deferreds: - ready_deferred = defer.DeferredList(self._ready_deferreds) + ready_deferred = AsyncAND(self._ready_deferreds) return self.set, ready_deferred class FrozenSetUnslicer(TupleUnslicer): diff --git a/src/foolscap/foolscap/slicers/tuple.py b/src/foolscap/foolscap/slicers/tuple.py index 6469f822..3c85aefe 100644 --- a/src/foolscap/foolscap/slicers/tuple.py +++ b/src/foolscap/foolscap/slicers/tuple.py @@ -1,10 +1,11 @@ # -*- test-case-name: foolscap.test.test_banana -*- -from twisted.internet.defer import Deferred, DeferredList +from twisted.internet.defer import Deferred from foolscap.tokens import Violation from foolscap.slicer import BaseUnslicer from foolscap.slicers.list import ListSlicer from foolscap.constraint import OpenerConstraint, Any, UnboundedSchema, IConstraint +from foolscap.util import AsyncAND class TupleSlicer(ListSlicer): @@ -91,7 +92,7 @@ class TupleUnslicer(BaseUnslicer): def complete(self): ready_deferred = None if self._ready_deferreds: - ready_deferred = DeferredList(self._ready_deferreds) + ready_deferred = AsyncAND(self._ready_deferreds) t = tuple(self.list) if self.debug: @@ -111,7 +112,7 @@ class TupleUnslicer(BaseUnslicer): print " not finished yet" ready_deferred = None if self._ready_deferreds: - ready_deferred = DeferredList(self._ready_deferreds) + ready_deferred = AsyncAND(self._ready_deferreds) return 
self.deferred, ready_deferred # the list is already complete diff --git a/src/foolscap/foolscap/test/common.py b/src/foolscap/foolscap/test/common.py index b53d0a72..d31c7aa8 100644 --- a/src/foolscap/foolscap/test/common.py +++ b/src/foolscap/foolscap/test/common.py @@ -266,6 +266,9 @@ class Target(Referenceable): return 24 def remote_fail(self): raise ValueError("you asked me to fail") + def remote_fail_remotely(self, target): + return target.callRemote("fail") + def remote_failstring(self): raise "string exceptions are annoying" diff --git a/src/foolscap/foolscap/test/test_call.py b/src/foolscap/foolscap/test/test_call.py index 79e19beb..745f9ed5 100644 --- a/src/foolscap/foolscap/test/test_call.py +++ b/src/foolscap/foolscap/test/test_call.py @@ -143,6 +143,20 @@ class TestCall(TargetMixin, unittest.TestCase): self.failUnless(f.check("string exceptions are annoying"), "wrong exception type: %s" % f) + def testCopiedFailure(self): + # A calls B, who calls C. C fails. B gets a CopiedFailure and reports + # it back to A. What does A get?
+ rr, target = self.setupTarget(TargetWithoutInterfaces()) + d = rr.callRemote("fail_remotely", target) + def _check(f): + # f should be a CopiedFailure + self.failUnless(isinstance(f, failure.Failure), + "Hey, we didn't fail: %s" % f) + self.failUnless(f.check(ValueError), + "wrong exception type: %s" % f) + self.failUnlessSubstring("you asked me to fail", f.value) + d.addBoth(_check) + return d def testCall2(self): # server end uses an interface this time, but not the client end diff --git a/src/foolscap/foolscap/test/test_copyable.py b/src/foolscap/foolscap/test/test_copyable.py index f9df5e9c..5830e598 100644 --- a/src/foolscap/foolscap/test/test_copyable.py +++ b/src/foolscap/foolscap/test/test_copyable.py @@ -1,6 +1,6 @@ from twisted.trial import unittest -from twisted.python import components, failure +from twisted.python import components, failure, reflect from foolscap.test.common import TargetMixin, HelperTarget from foolscap import copyable, tokens @@ -121,13 +121,15 @@ class Copyable(TargetMixin, unittest.TestCase): def _testFailure1_1(self, (f,)): #print "CopiedFailure is:", f #print f.__dict__ - self.failUnlessEqual(f.type, "exceptions.RuntimeError") + self.failUnlessEqual(reflect.qual(f.type), "exceptions.RuntimeError") + self.failUnless(f.check(RuntimeError)) self.failUnlessEqual(f.value, "message here") self.failUnlessEqual(f.frames, []) self.failUnlessEqual(f.tb, None) self.failUnlessEqual(f.stack, []) # there should be a traceback - self.failUnless(f.traceback.find("raise RuntimeError") != -1) + self.failUnless(f.traceback.find("raise RuntimeError") != -1, + "no 'raise RuntimeError' in '%s'" % (f.traceback,)) def testFailure2(self): self.callingBroker.unsafeTracebacks = False @@ -141,7 +143,8 @@ class Copyable(TargetMixin, unittest.TestCase): def _testFailure2_1(self, (f,)): #print "CopiedFailure is:", f #print f.__dict__ - self.failUnlessEqual(f.type, "exceptions.RuntimeError") + self.failUnlessEqual(reflect.qual(f.type), 
"exceptions.RuntimeError") + self.failUnless(f.check, RuntimeError) self.failUnlessEqual(f.value, "message here") self.failUnlessEqual(f.frames, []) self.failUnlessEqual(f.tb, None) diff --git a/src/foolscap/foolscap/test/test_gifts.py b/src/foolscap/foolscap/test/test_gifts.py index db6dfd78..02a394ec 100644 --- a/src/foolscap/foolscap/test/test_gifts.py +++ b/src/foolscap/foolscap/test/test_gifts.py @@ -1,12 +1,15 @@ from zope.interface import implements from twisted.trial import unittest -from twisted.internet import defer -from twisted.internet.error import ConnectionDone, ConnectionLost +from twisted.internet import defer, protocol, reactor +from twisted.internet.error import ConnectionDone, ConnectionLost, \ + ConnectionRefusedError +from twisted.python import failure from foolscap import Tub, UnauthenticatedTub, RemoteInterface, Referenceable -from foolscap.referenceable import RemoteReference +from foolscap.referenceable import RemoteReference, SturdyRef from foolscap.test.common import HelperTarget, RIHelper from foolscap.eventual import flushEventualQueue +from foolscap.tokens import BananaError, NegotiationError crypto_available = False try: @@ -38,18 +41,13 @@ class ConstrainedHelper(Referenceable): def remote_set(self, obj): self.obj = obj -class Gifts(unittest.TestCase): - # Here we test the three-party introduction process as depicted in the - # classic Granovetter diagram. Alice has a reference to Bob and another - # one to Carol. Alice wants to give her Carol-reference to Bob, by - # including it as the argument to a method she invokes on her - # Bob-reference. 
+class Base: debug = False def setUp(self): - self.services = [GoodEnoughTub(), GoodEnoughTub(), GoodEnoughTub()] - self.tubA, self.tubB, self.tubC = self.services + self.services = [GoodEnoughTub() for i in range(4)] + self.tubA, self.tubB, self.tubC, self.tubD = self.services for s in self.services: s.startService() l = s.listenOn("tcp:0:interface=127.0.0.1") @@ -63,9 +61,9 @@ class Gifts(unittest.TestCase): def createCharacters(self): self.alice = HelperTarget("alice") self.bob = HelperTarget("bob") - self.bob_url = self.tubB.registerReference(self.bob) + self.bob_url = self.tubB.registerReference(self.bob, "bob") self.carol = HelperTarget("carol") - self.carol_url = self.tubC.registerReference(self.carol) + self.carol_url = self.tubC.registerReference(self.carol, "carol") # cindy is Carol's little sister. She doesn't have a phone, but # Carol might talk about her anyway. self.cindy = HelperTarget("cindy") @@ -75,6 +73,8 @@ class Gifts(unittest.TestCase): self.clarisse = HelperTarget("clarisse") self.colette = HelperTarget("colette") self.courtney = HelperTarget("courtney") + self.dave = HelperTarget("dave") + self.dave_url = self.tubD.registerReference(self.dave, "dave") def createInitialReferences(self): # we must start by giving Alice a reference to both Bob and Carol. 
@@ -90,49 +90,80 @@ class Gifts(unittest.TestCase): def _aliceGotCarol(acarol): if self.debug: print "Alice got carol" self.acarol = acarol # Alice's reference to Carol + d = self.tubB.getReference(self.dave_url) + return d d.addCallback(_aliceGotCarol) + def _bobGotDave(bdave): + self.bdave = bdave + d.addCallback(_bobGotDave) return d def createMoreReferences(self): # give Alice references to Carol's sisters dl = [] - url = self.tubC.registerReference(self.charlene) + url = self.tubC.registerReference(self.charlene, "charlene") d = self.tubA.getReference(url) def _got_charlene(rref): self.acharlene = rref d.addCallback(_got_charlene) dl.append(d) - url = self.tubC.registerReference(self.christine) + url = self.tubC.registerReference(self.christine, "christine") d = self.tubA.getReference(url) def _got_christine(rref): self.achristine = rref d.addCallback(_got_christine) dl.append(d) - url = self.tubC.registerReference(self.clarisse) + url = self.tubC.registerReference(self.clarisse, "clarisse") d = self.tubA.getReference(url) def _got_clarisse(rref): self.aclarisse = rref d.addCallback(_got_clarisse) dl.append(d) - url = self.tubC.registerReference(self.colette) + url = self.tubC.registerReference(self.colette, "colette") d = self.tubA.getReference(url) def _got_colette(rref): self.acolette = rref d.addCallback(_got_colette) dl.append(d) - url = self.tubC.registerReference(self.courtney) + url = self.tubC.registerReference(self.courtney, "courtney") d = self.tubA.getReference(url) def _got_courtney(rref): self.acourtney = rref d.addCallback(_got_courtney) dl.append(d) + return defer.DeferredList(dl) + def shouldFail(self, res, expected_failure, which, substring=None): + # attach this with: + # d = something() + # d.addBoth(self.shouldFail, IndexError, "something") + # the 'which' string helps to identify which call to shouldFail was + # triggered, since certain versions of Twisted don't display this + # very well. 
+ + if isinstance(res, failure.Failure): + res.trap(expected_failure) + if substring: + self.failUnless(substring in str(res), + "substring '%s' not in '%s'" + % (substring, str(res))) + else: + self.fail("%s was supposed to raise %s, not get '%s'" % + (which, expected_failure, res)) + +class Gifts(Base, unittest.TestCase): + # Here we test the three-party introduction process as depicted in the + # classic Granovetter diagram. Alice has a reference to Bob and another + # one to Carol. Alice wants to give her Carol-reference to Bob, by + # including it as the argument to a method she invokes on her + # Bob-reference. + def testGift(self): #defer.setDebugging(True) self.createCharacters() @@ -164,7 +195,6 @@ class Gifts(unittest.TestCase): d.addCallback(_carolCalled) return d - def testImplicitGift(self): # in this test, Carol was registered in her Tub (using # registerReference), but Cindy was not. Alice is given a reference @@ -226,6 +256,42 @@ class Gifts(unittest.TestCase): d.addCallback(_carolAndCindyCalled) return d + # test gifts in return values too + + def testReturn(self): + self.createCharacters() + d = self.createInitialReferences() + def _introduce(res): + self.bob.obj = self.bdave + return self.abob.callRemote("get") + d.addCallback(_introduce) + def _check(adave): + # this ought to be a RemoteReference to dave, usable by alice + self.failUnless(isinstance(adave, RemoteReference)) + return adave.callRemote("set", 12) + d.addCallback(_check) + def _check2(res): + self.failUnlessEqual(self.dave.obj, 12) + d.addCallback(_check2) + return d + + def testReturnInContainer(self): + self.createCharacters() + d = self.createInitialReferences() + def _introduce(res): + self.bob.obj = {"foo": [(set([self.bdave]),)]} + return self.abob.callRemote("get") + d.addCallback(_introduce) + def _check(obj): + adave = list(obj["foo"][0][0])[0] + # this ought to be a RemoteReference to dave, usable by alice + self.failUnless(isinstance(adave, RemoteReference)) + return 
adave.callRemote("set", 12) + d.addCallback(_check) + def _check2(res): + self.failUnlessEqual(self.dave.obj, 12) + d.addCallback(_check2) + return d def testOrdering(self): self.createCharacters() @@ -303,9 +369,11 @@ class Gifts(unittest.TestCase): def create_constrained_characters(self): self.alice = HelperTarget("alice") self.bob = ConstrainedHelper("bob") - self.bob_url = self.tubB.registerReference(self.bob) + self.bob_url = self.tubB.registerReference(self.bob, "bob") self.carol = HelperTarget("carol") - self.carol_url = self.tubC.registerReference(self.carol) + self.carol_url = self.tubC.registerReference(self.carol, "carol") + self.dave = HelperTarget("dave") + self.dave_url = self.tubD.registerReference(self.dave, "dave") def test_constraint(self): self.create_constrained_characters() @@ -319,6 +387,8 @@ class Gifts(unittest.TestCase): d.addCallback(_checkBob) return d + + # this was used to test that alice's reference to carol (self.acarol) appeared in # alice's gift table at the right time, to make sure that the # RemoteReference is kept alive while the gift is in transit. The whole @@ -359,3 +429,127 @@ class Gifts(unittest.TestCase): d.addCallback(lambda res: d1) return d + +class Bad(Base, unittest.TestCase): + + # if the recipient cannot claim their gift, the caller should see an + # errback. + + def setUp(self): + if not crypto_available: + raise unittest.SkipTest("crypto not available") + Base.setUp(self) + + def test_swissnum(self): + self.createCharacters() + d = self.createInitialReferences() + d.addCallback(lambda res: self.tubA.getReference(self.dave_url)) + def _introduce(adave): + # now break the gift to insure that Bob is unable to claim it. + # The first way to do this is to simply mangle the swissnum, + # which will result in a failure in remote_getReferenceByName. + # NOTE: this will have to change when we modify the way gifts are + # referenced, since tracker.url is scheduled to go away. 
+            r = SturdyRef(adave.tracker.url)
+            r.name += ".MANGLED"
+            adave.tracker.url = r.getURL()
+            return self.acarol.callRemote("set", adave)
+        d.addCallback(_introduce)
+        d.addBoth(self.shouldFail, KeyError, "Bad.test_swissnum")
+        # make sure we can still talk to Carol, though
+        d.addCallback(lambda res: self.acarol.callRemote("set", 14))
+        d.addCallback(lambda res: self.failUnlessEqual(self.carol.obj, 14))
+        return d
+    test_swissnum.timeout = 10
+
+    def test_tubid(self):
+        self.createCharacters()
+        d = self.createInitialReferences()
+        d.addCallback(lambda res: self.tubA.getReference(self.dave_url))
+        def _introduce(adave):
+            # The second way is to mangle the tubid, which will result in a
+            # failure during negotiation. NOTE: this will have to change when
+            # we modify the way gifts are referenced, since tracker.url is
+            # scheduled to go away.
+            r = SturdyRef(adave.tracker.url)
+            r.tubID += ".MANGLED"
+            adave.tracker.url = r.getURL()
+            return self.acarol.callRemote("set", adave)
+        d.addCallback(_introduce)
+        d.addBoth(self.shouldFail, BananaError, "Bad.test_tubid",
+                  "unknown TubID")
+        return d
+    test_tubid.timeout = 10
+
+    def test_location(self):
+        self.createCharacters()
+        d = self.createInitialReferences()
+        d.addCallback(lambda res: self.tubA.getReference(self.dave_url))
+        def _introduce(adave):
+            # The third way is to mangle the location hints, which will
+            # result in a failure during negotiation as it attempts to
+            # establish a TCP connection.
+            r = SturdyRef(adave.tracker.url)
+            # highly unlikely that there's anything listening on this port
+            r.locationHints = ["127.0.0.47:1"]
+            adave.tracker.url = r.getURL()
+            return self.acarol.callRemote("set", adave)
+        d.addCallback(_introduce)
+        d.addBoth(self.shouldFail, ConnectionRefusedError, "Bad.test_location")
+        return d
+    test_location.timeout = 10
+
+    def test_hang(self):
+        f = protocol.Factory()
+        f.protocol = protocol.Protocol # ignores all input
+        p = reactor.listenTCP(0, f, interface="127.0.0.1")
+        self.createCharacters()
+        d = self.createInitialReferences()
+        d.addCallback(lambda res: self.tubA.getReference(self.dave_url))
+        def _introduce(adave):
+            # The next form of mangling is to connect to a port which never
+            # responds, which could happen if a firewall were silently
+            # dropping the TCP packets. We can't accurately simulate this
+            # case, but we can connect to a port which accepts the connection
+            # and then stays silent. This should trigger the overall
+            # connection timeout.
+            r = SturdyRef(adave.tracker.url)
+            r.locationHints = ["127.0.0.1:%d" % p.getHost().port]
+            adave.tracker.url = r.getURL()
+            self.tubD.options['connect_timeout'] = 2
+            return self.acarol.callRemote("set", adave)
+        d.addCallback(_introduce)
+        d.addBoth(self.shouldFail, NegotiationError, "Bad.test_hang",
+                  "no connection established within client timeout")
+        def _stop_listening(res):
+            d1 = p.stopListening()
+            def _done_listening(x):
+                return res
+            d1.addCallback(_done_listening)
+            return d1
+        d.addBoth(_stop_listening)
+        return d
+    test_hang.timeout = 10
+
+
+    def testReturn_swissnum(self):
+        self.createCharacters()
+        d = self.createInitialReferences()
+        def _introduce(res):
+            # now break the gift to ensure that Alice is unable to claim it.
+            # The first way to do this is to simply mangle the swissnum,
+            # which will result in a failure in remote_getReferenceByName.
+            # NOTE: this will have to change when we modify the way gifts are
+            # referenced, since tracker.url is scheduled to go away.
+            r = SturdyRef(self.bdave.tracker.url)
+            r.name += ".MANGLED"
+            self.bdave.tracker.url = r.getURL()
+            self.bob.obj = self.bdave
+            return self.abob.callRemote("get")
+        d.addCallback(_introduce)
+        d.addBoth(self.shouldFail, KeyError, "Bad.testReturn_swissnum")
+        # make sure we can still talk to Bob, though
+        d.addCallback(lambda res: self.abob.callRemote("set", 14))
+        d.addCallback(lambda res: self.failUnlessEqual(self.bob.obj, 14))
+        return d
+    testReturn_swissnum.timeout = 10
diff --git a/src/foolscap/foolscap/test/test_interfaces.py b/src/foolscap/foolscap/test/test_interfaces.py
index ba3001bf..29c3d810 100644
--- a/src/foolscap/foolscap/test/test_interfaces.py
+++ b/src/foolscap/foolscap/test/test_interfaces.py
@@ -121,7 +121,7 @@ class TestInterface(TargetMixin, unittest.TestCase):
         for i in range(len(s)):
             line = s[i]
             #print line
-            if ("test/test_interfaces.py" in line
+            if ("test_interfaces.py" in line
                 and i+1 < len(s)
                 and "rr.callRemote" in s[i+1]):
                 return # all good
diff --git a/src/foolscap/foolscap/test/test_pb.py b/src/foolscap/foolscap/test/test_pb.py
index a1cc1645..4d69a554 100644
--- a/src/foolscap/foolscap/test/test_pb.py
+++ b/src/foolscap/foolscap/test/test_pb.py
@@ -7,7 +7,7 @@ if False:
     from twisted.python import log
     log.startLogging(sys.stderr)
 
-from twisted.python import failure, log
+from twisted.python import failure, log, reflect
 from twisted.internet import defer
 from twisted.trial import unittest
 
@@ -117,6 +117,7 @@ class TestAnswer(unittest.TestCase):
         req = TestRequest(12)
         self.broker.addRequest(req)
         u = self.newUnslicer()
+        u.start(0)
         u.checkToken(INT, 0)
         u.receiveChild(12) # causes broker.getRequest
         u.checkToken(STRING, 8)
@@ -130,6 +131,7 @@ class TestAnswer(unittest.TestCase):
         req.setConstraint(IConstraint(str))
         self.broker.addRequest(req)
         u = self.newUnslicer()
+        u.start(0)
         u.checkToken(INT, 0)
         u.receiveChild(12) # causes broker.getRequest
         u.checkToken(STRING, 15)
@@ -617,7 +619,7 @@ class TestService(unittest.TestCase):
         return d
     testBadMethod2.timeout = 5
     def _testBadMethod2_eb(self, f):
-        self.failUnlessEqual(f.type, 'exceptions.AttributeError')
+        self.failUnlessEqual(reflect.qual(f.type), 'exceptions.AttributeError')
         self.failUnlessSubstring("TargetWithoutInterfaces", f.value)
         self.failUnlessSubstring(" has no attribute 'remote_missing'", f.value)
diff --git a/src/foolscap/foolscap/test/test_reference.py b/src/foolscap/foolscap/test/test_reference.py
new file mode 100644
index 00000000..930f593c
--- /dev/null
+++ b/src/foolscap/foolscap/test/test_reference.py
@@ -0,0 +1,71 @@
+
+from zope.interface import implements
+from twisted.trial import unittest
+from twisted.python import failure
+from foolscap.ipb import IRemoteReference
+from foolscap.test.common import HelperTarget, Target
+from foolscap.eventual import flushEventualQueue
+
+class Remote:
+    implements(IRemoteReference)
+    pass
+
+
+class LocalReference(unittest.TestCase):
+    def tearDown(self):
+        return flushEventualQueue()
+
+    def ignored(self):
+        pass
+
+    def test_remoteReference(self):
+        r = Remote()
+        rref = IRemoteReference(r)
+        self.failUnlessIdentical(r, rref)
+
+    def test_callRemote(self):
+        t = HelperTarget()
+        t.obj = None
+        rref = IRemoteReference(t)
+        marker = rref.notifyOnDisconnect(self.ignored, "args", kwargs="foo")
+        rref.dontNotifyOnDisconnect(marker)
+        d = rref.callRemote("set", 12)
+        # the callRemote should be put behind an eventual-send
+        self.failUnlessEqual(t.obj, None)
+        def _check(res):
+            self.failUnlessEqual(t.obj, 12)
+            self.failUnlessEqual(res, True)
+        d.addCallback(_check)
+        return d
+
+    def test_callRemoteOnly(self):
+        t = HelperTarget()
+        t.obj = None
+        rref = IRemoteReference(t)
+        rc = rref.callRemoteOnly("set", 12)
+        self.failUnlessEqual(rc, None)
+
+    def shouldFail(self, res, expected_failure, which, substring=None):
+        # attach this with:
+        #  d = something()
+        #  d.addBoth(self.shouldFail, IndexError, "something")
+        # the 'which' string helps to identify which call to shouldFail was
+        # triggered, since certain versions of Twisted don't display this
+        # very well.
+
+        if isinstance(res, failure.Failure):
+            res.trap(expected_failure)
+            if substring:
+                self.failUnless(substring in str(res),
+                                "substring '%s' not in '%s'"
+                                % (substring, str(res)))
+        else:
+            self.fail("%s was supposed to raise %s, not get '%s'" %
+                      (which, expected_failure, res))
+
+    def test_fail(self):
+        t = Target()
+        d = IRemoteReference(t).callRemote("fail")
+        d.addBoth(self.shouldFail, ValueError, "test_fail",
+                  "you asked me to fail")
+        return d
diff --git a/src/foolscap/foolscap/test/test_tub.py b/src/foolscap/foolscap/test/test_tub.py
index 8f36a524..9bd6f73e 100644
--- a/src/foolscap/foolscap/test/test_tub.py
+++ b/src/foolscap/foolscap/test/test_tub.py
@@ -11,7 +11,7 @@ try:
 except ImportError:
     pass
 
-from foolscap import Tub, UnauthenticatedTub
+from foolscap import Tub, UnauthenticatedTub, SturdyRef, Referenceable
 from foolscap.referenceable import RemoteReference
 from foolscap.eventual import eventually, flushEventualQueue
 from foolscap.test.common import HelperTarget, TargetMixin
@@ -117,3 +117,93 @@ class QueuedStartup(TargetMixin, unittest.TestCase):
             eventually(t1.startService)
         return d
 
+
+class NameLookup(TargetMixin, unittest.TestCase):
+
+    # test registerNameLookupHandler
+
+    def setUp(self):
+        TargetMixin.setUp(self)
+        self.tubA, self.tubB = [GoodEnoughTub(), GoodEnoughTub()]
+        self.services = [self.tubA, self.tubB]
+        self.tubA.startService()
+        self.tubB.startService()
+        l = self.tubB.listenOn("tcp:0:interface=127.0.0.1")
+        self.tubB.setLocation("127.0.0.1:%d" % l.getPortnum())
+        self.url_on_b = self.tubB.registerReference(Referenceable())
+        self.lookups = []
+        self.lookups2 = []
+        self.names = {}
+        self.names2 = {}
+
+    def tearDown(self):
+        d = TargetMixin.tearDown(self)
+        def _more(res):
+            return defer.DeferredList([s.stopService() for s in self.services])
+        d.addCallback(_more)
+        d.addCallback(flushEventualQueue)
+        return d
+
+    def lookup(self, name):
+        self.lookups.append(name)
+        return self.names.get(name, None)
+
+    def lookup2(self, name):
+        self.lookups2.append(name)
+        return self.names2.get(name, None)
+
+    def testNameLookup(self):
+        t1 = HelperTarget()
+        t2 = HelperTarget()
+        self.names["foo"] = t1
+        self.names2["bar"] = t2
+        self.names2["baz"] = t2
+        self.tubB.registerNameLookupHandler(self.lookup)
+        self.tubB.registerNameLookupHandler(self.lookup2)
+        # hack up a new furl pointing at the same tub but with a name that
+        # hasn't been registered.
+        s = SturdyRef(self.url_on_b)
+        s.name = "foo"
+
+        d = self.tubA.getReference(s)
+
+        def _check(res):
+            self.failUnless(isinstance(res, RemoteReference))
+            self.failUnlessEqual(self.lookups, ["foo"])
+            # the first lookup should short-circuit the process
+            self.failUnlessEqual(self.lookups2, [])
+            self.lookups = []; self.lookups2 = []
+            s.name = "bar"
+            return self.tubA.getReference(s)
+        d.addCallback(_check)
+
+        def _check2(res):
+            self.failUnless(isinstance(res, RemoteReference))
+            # if the first lookup fails, the second handler should be asked
+            self.failUnlessEqual(self.lookups, ["bar"])
+            self.failUnlessEqual(self.lookups2, ["bar"])
+            self.lookups = []; self.lookups2 = []
+            # make sure that loopbacks use this too
+            return self.tubB.getReference(s)
+        d.addCallback(_check2)
+
+        def _check3(res):
+            self.failUnless(isinstance(res, RemoteReference))
+            self.failUnlessEqual(self.lookups, ["bar"])
+            self.failUnlessEqual(self.lookups2, ["bar"])
+            self.lookups = []; self.lookups2 = []
+            # and make sure we can de-register handlers
+            self.tubB.unregisterNameLookupHandler(self.lookup)
+            s.name = "baz"
+            return self.tubA.getReference(s)
+        d.addCallback(_check3)
+
+        def _check4(res):
+            self.failUnless(isinstance(res, RemoteReference))
+            self.failUnlessEqual(self.lookups, [])
+            self.failUnlessEqual(self.lookups2, ["baz"])
+            self.lookups = []; self.lookups2 = []
+        d.addCallback(_check4)
+
+        return d
+
diff --git a/src/foolscap/foolscap/test/test_util.py b/src/foolscap/foolscap/test/test_util.py
new file mode 100644
index 00000000..c98c976c
--- /dev/null
+++ b/src/foolscap/foolscap/test/test_util.py
@@ -0,0 +1,92 @@
+
+from twisted.trial import unittest
+from twisted.internet import defer
+from twisted.python import failure
+from foolscap import util, eventual
+
+
+class AsyncAND(unittest.TestCase):
+    def setUp(self):
+        self.fired = False
+        self.failed = False
+
+    def callback(self, res):
+        self.fired = True
+    def errback(self, res):
+        self.failed = True
+
+    def attach(self, d):
+        d.addCallbacks(self.callback, self.errback)
+        return d
+
+    def shouldNotFire(self, ignored=None):
+        self.failIf(self.fired)
+        self.failIf(self.failed)
+    def shouldFire(self, ignored=None):
+        self.failUnless(self.fired)
+        self.failIf(self.failed)
+    def shouldFail(self, ignored=None):
+        self.failUnless(self.failed)
+        self.failIf(self.fired)
+
+    def tearDown(self):
+        return eventual.flushEventualQueue()
+
+    def test_empty(self):
+        self.attach(util.AsyncAND([]))
+        self.shouldFire()
+
+    def test_simple(self):
+        d1 = eventual.fireEventually(None)
+        a = util.AsyncAND([d1])
+        self.attach(a)
+        a.addBoth(self.shouldFire)
+        return a
+
+    def test_two(self):
+        d1 = defer.Deferred()
+        d2 = defer.Deferred()
+        self.attach(util.AsyncAND([d1, d2]))
+        self.shouldNotFire()
+        d1.callback(1)
+        self.shouldNotFire()
+        d2.callback(2)
+        self.shouldFire()
+
+    def test_one_failure_1(self):
+        d1 = defer.Deferred()
+        d2 = defer.Deferred()
+        self.attach(util.AsyncAND([d1, d2]))
+        self.shouldNotFire()
+        d1.callback(1)
+        self.shouldNotFire()
+        d2.errback(RuntimeError())
+        self.shouldFail()
+
+    def test_one_failure_2(self):
+        d1 = defer.Deferred()
+        d2 = defer.Deferred()
+        self.attach(util.AsyncAND([d1, d2]))
+        self.shouldNotFire()
+        d1.errback(RuntimeError())
+        self.shouldFail()
+        d2.callback(1)
+        self.shouldFail()
+
+    def test_two_failure(self):
+        d1 = defer.Deferred()
+        d2 = defer.Deferred()
+        self.attach(util.AsyncAND([d1, d2]))
+        def _should_fire(res):
+            self.failIf(isinstance(res, failure.Failure))
+        def _should_fail(f):
+            self.failUnless(isinstance(f, failure.Failure))
+        d1.addBoth(_should_fire)
+        d2.addBoth(_should_fail)
+        self.shouldNotFire()
+        d1.errback(RuntimeError())
+        self.shouldFail()
+        d2.errback(RuntimeError())
+        self.shouldFail()
+
+
diff --git a/src/foolscap/foolscap/util.py b/src/foolscap/foolscap/util.py
new file mode 100644
index 00000000..367d7086
--- /dev/null
+++ b/src/foolscap/foolscap/util.py
@@ -0,0 +1,52 @@
+
+from twisted.internet import defer
+
+
+class AsyncAND(defer.Deferred):
+    """Like DeferredList, but results are discarded and failures handled
+    in a more convenient fashion.
+
+    Create me with a list of Deferreds. I will fire my callback (with None)
+    if and when all of my component Deferreds fire successfully. I will fire
+    my errback when and if any of my component Deferreds errbacks, in which
+    case I will absorb the failure. If a second Deferred errbacks, I will not
+    absorb that failure.
+
+    This means that you can put a bunch of Deferreds together into an
+    AsyncAND and then forget about them. If all succeed, the AsyncAND will
+    fire. If one fails, that Failure will be propagated to the AsyncAND. If
+    multiple ones fail, the first Failure will go to the AsyncAND and the
+    rest will be left unhandled (and therefore logged).
+    """
+
+    def __init__(self, deferredList):
+        defer.Deferred.__init__(self)
+
+        if not deferredList:
+            self.callback(None)
+            return
+
+        self.remaining = len(deferredList)
+        self._fired = False
+
+        for d in deferredList:
+            d.addCallbacks(self._cbDeferred, self._cbDeferred,
+                           callbackArgs=(True,), errbackArgs=(False,))
+
+    def _cbDeferred(self, result, succeeded):
+        self.remaining -= 1
+        if succeeded:
+            if not self._fired and self.remaining == 0:
+                # the last input has fired. We fire.
+                self._fired = True
+                self.callback(None)
+                return
+        else:
+            if not self._fired:
+                # the first Failure is carried into our output
+                self._fired = True
+                self.errback(result)
+                return None
+            else:
+                # second and later Failures are not absorbed
+                return result
diff --git a/src/foolscap/misc/dapper/debian/changelog b/src/foolscap/misc/dapper/debian/changelog
index 9699a6df..d9b00653 100644
--- a/src/foolscap/misc/dapper/debian/changelog
+++ b/src/foolscap/misc/dapper/debian/changelog
@@ -1,3 +1,9 @@
+foolscap (0.1.5) unstable; urgency=low
+
+  * new release
+
+ -- Brian Warner <warner@lothar.com>  Tue, 07 Aug 2007 17:47:53 -0700
+
 foolscap (0.1.4) unstable; urgency=low
 
   * new release
diff --git a/src/foolscap/misc/edgy/debian/changelog b/src/foolscap/misc/edgy/debian/changelog
index 9699a6df..d9b00653 100644
--- a/src/foolscap/misc/edgy/debian/changelog
+++ b/src/foolscap/misc/edgy/debian/changelog
@@ -1,3 +1,9 @@
+foolscap (0.1.5) unstable; urgency=low
+
+  * new release
+
+ -- Brian Warner <warner@lothar.com>  Tue, 07 Aug 2007 17:47:53 -0700
+
 foolscap (0.1.4) unstable; urgency=low
 
   * new release
diff --git a/src/foolscap/misc/feisty/debian/changelog b/src/foolscap/misc/feisty/debian/changelog
index 9699a6df..d9b00653 100644
--- a/src/foolscap/misc/feisty/debian/changelog
+++ b/src/foolscap/misc/feisty/debian/changelog
@@ -1,3 +1,9 @@
+foolscap (0.1.5) unstable; urgency=low
+
+  * new release
+
+ -- Brian Warner <warner@lothar.com>  Tue, 07 Aug 2007 17:47:53 -0700
+
 foolscap (0.1.4) unstable; urgency=low
 
   * new release
diff --git a/src/foolscap/misc/sarge/debian/changelog b/src/foolscap/misc/sarge/debian/changelog
index 9699a6df..d9b00653 100644
--- a/src/foolscap/misc/sarge/debian/changelog
+++ b/src/foolscap/misc/sarge/debian/changelog
@@ -1,3 +1,9 @@
+foolscap (0.1.5) unstable; urgency=low
+
+  * new release
+
+ -- Brian Warner <warner@lothar.com>  Tue, 07 Aug 2007 17:47:53 -0700
+
 foolscap (0.1.4) unstable; urgency=low
 
   * new release
diff --git a/src/foolscap/misc/sid/debian/changelog b/src/foolscap/misc/sid/debian/changelog
index 9699a6df..d9b00653 100644
--- a/src/foolscap/misc/sid/debian/changelog
+++ b/src/foolscap/misc/sid/debian/changelog
@@ -1,3 +1,9 @@
+foolscap (0.1.5) unstable; urgency=low
+
+  * new release
+
+ -- Brian Warner <warner@lothar.com>  Tue, 07 Aug 2007 17:47:53 -0700
+
 foolscap (0.1.4) unstable; urgency=low
 
   * new release
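
The new `foolscap/util.py` above introduces AsyncAND, whose contract (fire once when every input succeeds, deliver only the first failure, pass later failures through unabsorbed) is what the Unslicers rely on for their accumulated ready_deferreds. As an editorial aside, the sketch below illustrates that same contract with a dependency-free toy class, using plain callables instead of Twisted Deferreds; the `ToyAsyncAND` name and its method names are invented for illustration and are not part of the foolscap API.

```python
# Toy illustration of the AsyncAND contract from foolscap/util.py:
# fire once when all inputs succeed, deliver only the first failure,
# and let later failures pass through unabsorbed.

class ToyAsyncAND:
    def __init__(self, count, on_success, on_failure):
        # count: how many component results we are waiting for
        self.remaining = count
        self._fired = False
        self.on_success = on_success
        self.on_failure = on_failure
        if count == 0:
            # an empty list fires immediately, like AsyncAND([])
            self._fired = True
            on_success()

    def child_succeeded(self):
        self.remaining -= 1
        if not self._fired and self.remaining == 0:
            # the last input has fired; we fire
            self._fired = True
            self.on_success()

    def child_failed(self, err):
        self.remaining -= 1
        if not self._fired:
            # the first failure is carried into our output (absorbed)
            self._fired = True
            self.on_failure(err)
            return None
        # second and later failures are not absorbed
        return err

results = []
a = ToyAsyncAND(2,
                lambda: results.append("fired"),
                lambda e: results.append(("failed", e)))
a.child_succeeded()                 # one input left; nothing fires yet
a.child_failed("boom")              # first failure -> errback, absorbed
leftover = a.child_failed("boom2")  # later failure passes through
print(results)   # [('failed', 'boom')]
print(leftover)  # boom2
```

This mirrors the branch structure of `AsyncAND._cbDeferred`: the `None` return models absorbing the first Failure, while returning `err` unchanged models leaving later Failures for Twisted to log.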