[tahoe-dev] help: how should you tell a web browser what name to use for a file?

Brian Warner warner-tahoe at allmydata.com
Sat May 10 18:19:16 UTC 2008


> One means of telling the browser to name a download is specified in
> HTTP/1.1:
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.5.1

(i.e. Content-Disposition: attachment; filename="allmydata-org.png")

.. and that's exactly what Tahoe does when you give it a query argument of
save=true. It isn't enough to make the "Save Link As.." menu item work, nor
to make wget work, probably because both of these decide upon a filename
before ever talking to the server.

For concreteness, paste the following snippet of HTML into a file and view it
from your browser (or better yet use lwp's GET or wget or curl to do a fetch
and look at the response headers):


<ul>
  <li>
  <a href="http://tahoebs1.allmydata.com:8123/uri/URI%3ACHK%3Atuext4iyoc7fht3ryqby73rohe%3Athdtizipy746gvjnievhntn6nn2wkkvhnoo4kvbxpjr5cfk6kmwa%3A3%3A10%3A7628">1: unadorned URI</a>
  </li>
  <li>
  <a href="http://tahoebs1.allmydata.com:8123/uri/URI%3ACHK%3Atuext4iyoc7fht3ryqby73rohe%3Athdtizipy746gvjnievhntn6nn2wkkvhnoo4kvbxpjr5cfk6kmwa%3A3%3A10%3A7628?filename=allmydata-org.png">2: URI + filename=</a>
  </li>
  <li>
  <a href="http://tahoebs1.allmydata.com:8123/uri/URI%3ACHK%3Atuext4iyoc7fht3ryqby73rohe%3Athdtizipy746gvjnievhntn6nn2wkkvhnoo4kvbxpjr5cfk6kmwa%3A3%3A10%3A7628?filename=allmydata-org.png&save=true">3: URI + filename= + save=true</a>
  </li>
</ul>

Here are the use cases:

 * A: clicking on the link in your browser
  * B: if that shows you the document (i.e. if the document type can be
    displayed inline, like for .txt/.jpg/.html), use the "Save Page As"
    button.
 * C: point at the link, then use the "Save Link As" context menu item
 * D: wget LINK

1: The unadorned URI doesn't give the tahoe node enough information to
provide a useful Content-Type, so it always gives back text/plain. (There's a
good argument to make this application/octet-stream, or whatever it's called,
but I picked text/plain because that way I can see *something* in the
browser). Browsers usually respond to this by making you save the file
instead of trying to display it. 1A: my browser (firefox 2) says "You have
chosen to open URI... which is a: BIN file", and offers to save it, using a
filename that starts with the URI and ends with .htm . 1C does the same. 1D:
wget doesn't care about content-type, so it will just save the bytes to a
file named URI:..7628 . So in all cases, we wind up with a file named after
the URI, plus an extension that depends upon how misguidedly helpful the
browser attempted to be.

2: The URI + filename= gives tahoe an extension to work with, so it uses the
mime.types table to come up with a Content-Type. This gives browsers that
follow the link something to work with, so 2A will display the document. 2B
knows that it's a .png file, but doesn't know a useful filename, so it offers
to save it as "URI:...7628.png". 2C picks the filename before it follows the
link, and throws out the query string in the process, so it winds up with the
same "URI:...7628.htm" that 1C got. 2D: again, all wget cares about is the
name after the last slash, so we get "URI:..7628?filename=allmydata-org.png".
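
As a rough illustration of why cases C and D never see the filename= hint,
here is a small Python sketch of the two client-side naming rules described
above. It only models what this message reports, not the real wget or
Firefox code. The function names are made up for the example, the URI in the
usage note is a shortened placeholder, and the ".htm" extension that the
browser tacks on is not modelled.

```python
from urllib.parse import urlsplit, unquote


def wget_style_name(url):
    """Everything after the last slash, query string included (case D)."""
    parts = urlsplit(url)
    last = parts.path.rsplit("/", 1)[-1]
    if parts.query:
        last += "?" + parts.query
    return unquote(last)


def save_link_as_style_name(url):
    """Last path component with the query string thrown away (case C)."""
    parts = urlsplit(url)
    return unquote(parts.path.rsplit("/", 1)[-1])
```

With a placeholder link such as
http://HOST:8123/uri/URI%3ACHK%3Aabc?filename=allmydata-org.png the first
function yields "URI:CHK:abc?filename=allmydata-org.png" and the second
yields "URI:CHK:abc". Neither one involves a request to the server, so no
response header can influence the result.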

3: Adding save= causes tahoe to add the Content-Disposition header with
"attachment" and a filename. This causes browsers to refuse to show the
document inline (even when they are perfectly capable of doing so, for
.txt/.html/.jpg) and instead prompt the user to choose a filename for saving.
3A works perfectly, at least on firefox2. But 3C gets the same annoying
filename as 2C and 1C, and 3D (wget) behaves just as badly as 2D and 1D. In
addition, this requires us to provide separate links for viewing a document
versus saving it, and one of my pet peeves is being forced to save a text
file and leave the browser to read it, rather than just viewing it inside the
browser. (e.g. overzealous mime.types which cause Trac attachments to appear
with type application/x-diff or application/x-python, unviewable within the
browser).
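
To summarize the server side of cases 1 through 3, here is a minimal Python
sketch of how the response headers could be chosen. This is not Tahoe's
actual code: response_headers is a hypothetical helper, the stdlib mimetypes
module stands in for the mime.types table, and skipping Content-Disposition
when save= is given without filename= is my assumption for the example.

```python
import mimetypes


def response_headers(filename=None, save=False):
    """Pick headers from the filename= and save= query arguments."""
    headers = {}
    ctype = None
    if filename:
        # Case 2: the extension lets us look up a real Content-Type.
        ctype, _encoding = mimetypes.guess_type(filename)
    # Case 1: with nothing to go on, fall back to text/plain.
    headers["Content-Type"] = ctype or "text/plain"
    if save and filename:
        # Case 3: ask the browser to save rather than display.
        # (No escaping of quotes in the filename; this is only a sketch.)
        headers["Content-Disposition"] = (
            'attachment; filename="%s"' % filename
        )
    return headers
```

Calling response_headers("allmydata-org.png", save=True) gives a
Content-Type of image/png plus the attachment header, while
response_headers() gives only text/plain.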

So the only thing I can think of that will let you use the same URL for all
of [viewing, "Save Page As", "Save Link As", and wget] is one that ends in
/allmydata-org.png .

> > ... I'd advise against the double slash hack; since
> > I've seen a lot of code in my day that reduces multiple slashes to
> > single
> > ones; as multiples are so easily created in shell scripts; e.g. $dir/
> > $name  
> 
> I strongly second this.  There are quite a few different rfcs on urls,
> but the original explicitly forbids this construct in the grammar.
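
To make the slash-collapsing worry concrete, here is a tiny Python sketch of
the kind of cleanup that intermediate code often performs. The helper name
and the example path are hypothetical, and the exact shape of the //-marker
URL is assumed for illustration.

```python
import re


def collapse_slashes(path):
    """Reduce every run of slashes in a path to a single slash."""
    return re.sub(r"/+", "/", path)
```

A path like /uri/URI%3ACHK%3Aabc//allmydata-org.png comes out as
/uri/URI%3ACHK%3Aabc/allmydata-org.png, so any marker carried by the double
slash is lost before the server can see it.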

So if zooko's //-marker trick is out-of-spec, then I think the *only* place
left to tell tahoe to ignore the last pathname component is at the beginning
of the URL, with /named or /file or /download or some other spelling of
option 1b/1c from ticket #221. And of these, I prefer 1b (in which this is
only used for file URIs, not for directory-URI-plus-subpath, to avoid the
visual awkwardness of http://named/DIRURI/subdir/foo.txt/foo.txt). And if
this is really limited to files and not DIRURI+subpath, then I think the
namespace indicator could express this: using /file or /file-as or
/named-file or /file-named or something. (/download isn't appropriate, since
this isn't just for downloading).
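
Here is a minimal Python sketch of what the 1b-style parsing might look
like, assuming the /file spelling wins. Nothing here is implemented in
Tahoe: split_named_file_path is a hypothetical helper and the "/file/"
prefix is just one of the candidate names above. The trailing component is
ignored for lookup and only serves as the name (and, via its extension, the
Content-Type hint).

```python
from urllib.parse import unquote


def split_named_file_path(path, prefix="/file/"):
    """Split /file/FILEURI/NAME into (file_uri, name), else return None."""
    if not path.startswith(prefix):
        return None
    rest = path[len(prefix):]
    uri, sep, name = rest.partition("/")
    # Exactly one trailing component is allowed: this form is for file
    # URIs only, not for directory-URI-plus-subpath.
    if not sep or not uri or not name or "/" in name:
        return None
    return unquote(uri), unquote(name)
```

For /file/URI%3ACHK%3Aabc/allmydata-org.png this returns
("URI:CHK:abc", "allmydata-org.png"), and since the URL now ends in the
desired filename, the browser, "Save Link As" and wget should all arrive at
the same name without any query arguments.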


cheers,
 -Brian


