[tahoe-dev] proposal for an HTTP-based storage protocol

Ravi Pinjala ravi at p-static.net
Mon Sep 27 03:21:20 UTC 2010


On Sun, Sep 26, 2010 at 12:44 PM, Kevin Reid <kpreid at switchb.org> wrote:
> On Sep 26, 2010, at 12:03, Ravi Pinjala wrote:
>
>> The trouble with using the xmlns to identify the interface is that it
>> complicates parsing a bit - clients have to support separate formats
>> for each interface.
>
> Could you explain this further?
>

I was envisioning just a simple key-value configuration system for
interfaces; arbitrary XML configuration would still be allowed, under
a new namespace, but I feel like the main namespace should contain
enough functionality to get basic stuff working. If every interface
is required (not just allowed) to define its own ad-hoc configuration,
then people will invent all sorts of one-off formats, and clients will
have to support each of them. (Okay, so it's a pretty weak argument.)
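
Just to sketch what I mean, a discovery document might look something
like this (the element names and namespaces are made up for
illustration, not a spec proposal):

    <storage xmlns="http://example.net/ns/tahoe-storage">
      <interface type="data" root="/data/">
        <!-- simple key-value options live in the main namespace -->
        <option name="max-share-size" value="1073741824"/>
      </interface>
      <interface type="metadata" root="/metadata/"/>
      <interface type="management" root="/admin/">
        <!-- arbitrary extension config goes under its own namespace -->
        <m:settings xmlns:m="http://example.net/ns/management-ext"
                    theme="dark"/>
      </interface>
    </storage>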

>>>> * URL of a document stored on the server:
>>>> http://server.address/data/foo/bar
>>>>
>>>> * URL of the metadata for said document:
>>>> http://server.address/metadata/foo/bar
>>>>
>>>> * Example of direct access to a metadata key:
>>>> http://server.address/metadata/foo/bar?mtime
>>>
>>> It should be explicitly part of the definition of the data and metadata
>>> modules that they define these path patterns (underneath the path= URL).
>>
>> Mmm, what do you mean? I'm not really seeing what you're saying here.
>
> It's one of the REST principles: the client should never construct a URL
> according to its own rules, but rather only use links and forms returned by
> the service. In this case since the objects are identified by pathnames it
> is natural for the "form" to be "use the pathname as a URL relative to the
> root specified for this interface in the discovery document", but this
> should be explicitly part of the specification of these interfaces rather
> than a global assumption that URLs are of the form /<interface path>/<file
> path>.
>
> I admit this is a particularly degenerate case of this principle, but I
> think following the principle is a good thing in principle. Ahem.
>

Oh, that's a sensible principle. The way I was imagining it, that was
already defined by the interface itself (i.e., the URL format would be
explicitly stated in the specification for that interface), and
clients would never blindly construct a URL for an interface they
didn't understand. There could certainly be interfaces that don't
follow the interface_path/file_path pattern - I could see a management
interface being useful, for example, to let users administer files
through a web interface. So clients couldn't safely construct a URL by
tacking on the file path unless they knew for sure that the interface
worked that way.
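
In code terms, I'd expect the client-side rule to look roughly like
this (a Python sketch; the interface names and discovery-document
handling are hypothetical placeholders, not part of the proposal):

    from urllib.parse import quote, urljoin

    # path patterns this client knows from the interface specs
    KNOWN_PATTERNS = {"data", "metadata"}

    def url_for(roots, iface, file_path):
        """roots: interface type -> root URL, taken from the server's
        discovery document. Only build URLs for interfaces whose spec
        explicitly defines the root + file-path pattern."""
        if iface not in KNOWN_PATTERNS:
            raise ValueError("won't guess URLs for unknown interface %r"
                             % iface)
        return urljoin(roots[iface], quote(file_path))

    # e.g. url_for({"data": "http://server.address/data/"},
    #              "data", "foo/bar")
    # -> "http://server.address/data/foo/bar"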

>> Do we actually need server-side verification of data? We already let
>> clients upload whatever they want to servers, as long as it's properly
>> formatted as a share.
>
> Yes, but they can't upload something that looks like a share of file A but
> actually contains some other content (unless they find a hash collision).
>

Ah! So if I'm understanding correctly: a client could try to overwrite
a share for an existing file, and the server currently prevents this
by verifying that the share is for the correct file? And if the server
doesn't verify the share, there's the possibility of a DoS if a
malicious client overwrites all shares for a given file?

Hrmm. I was hoping that the server could be treated as a dumb file
store as much as possible, to simplify things. I'll have to look more
into how much verification the server actually needs to do for things
to work.

--Ravi


