tor hidden service endpoint designs

Sun May 11 19:52:03 UTC 2014

Hi Meejah,

Briefly changing subject to the IListeningPort interface:

http://foolscap.lothar.com/trac/ticket/203

Please read Trac comment #19:
http://foolscap.lothar.com/trac/ticket/203#comment:19

Reading Foolscap trac ticket 203 made me think about exactly which
object implementing the IListeningPort interface should be fired by
the Tor Hidden Service endpoint's `listen` method. A common use case
for TCP servers is to create an endpoint object with port=0, which
means a randomly assigned port. Then, when `listen` fires with the
IListeningPort, you grab your host-port tuple from the object
implementing IAddress that is returned by the getHost method of the
object implementing IListeningPort. You may want to pass this
information to other clients in the network so that they can connect
to your server. In my opinion the Tor Hidden Service endpoint should
not return the TCP-specific IAddress object but instead its own,
carrying only the needed information (onion address and port).

The Twisted documentation specifies that the getHost method of an
IListeningPort should return an IAddress object. If we take a close
look at how the various Twisted endpoints work, we see that they have
their own objects which implement IListeningPort and IAddress where
appropriate. I'm working on a simple changeset which will cause the
Tor hidden service endpoint's `listen` method to fire a Tor onion
address specific object which implements the IListeningPort
interface, and whose getHost method will return a proper Tor onion
address specific object which implements the IAddress interface.
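
A minimal sketch of that shape, using only the standard library so it
runs without Twisted installed. The names `OnionAddress`,
`OnionListeningPort`, `onion_uri` and `onion_port` are made up for
illustration and are not taken from the branch; real code would also
declare IAddress and IListeningPort through zope.interface, which is
left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnionAddress:
    """Hypothetical address object holding only the onion host and port."""
    onion_uri: str
    onion_port: int


class OnionListeningPort:
    """Hypothetical wrapper around the local port that tor forwards to.

    It mirrors the three IListeningPort methods (startListening,
    stopListening, getHost) but reports the onion address to callers.
    """

    def __init__(self, wrapped_port, onion_uri, onion_port):
        self._wrapped = wrapped_port
        self._address = OnionAddress(onion_uri, onion_port)

    def startListening(self):
        # Delegate to the underlying local listening port.
        return self._wrapped.startListening()

    def stopListening(self):
        # Delegate to the underlying local listening port.
        return self._wrapped.stopListening()

    def getHost(self):
        # Return the onion address, never the local TCP address.
        return self._address
```

With something like this, an application that publishes
`port.getHost()` to its peers hands out the onion and port rather than
a 127.0.0.1 address that nobody else can reach.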

My unfinished branch implementing these interfaces is here:
https://github.com/david415/txtorcon/tree/endpoint_parser_plugin-rewrite3_tor_onion_address

> Now I'm a little confused: doesn't the txsocksx client-endpoint do "TCP
> over Tor" connections? That is, client-side connections? It shouldn't
> really care about which Tor instance it's using (except, of course, that
> you don't want to launch lots of them), right?
> I guess maybe you mean you want "launch if 0 tors, re-use if 1" logic
> here too?

No... not exactly.

In my previous e-mail I was slightly wrong and unclear when I
mentioned a couple of things:
1. A Tahoe-LAFS introducer or storage node does listen on multiple
ports, but for "onion-grids" it will need only one Tor Hidden
Service onion address. Maybe it will be desirable to have a second
onion for the diagnostic "web port listener"...
2. I did not mean to imply that the txsocksx tor client endpoint
should launch tor. It should not. What it currently does is try a list
of local TCP ports on which tor might have its socks port listening.
9050 is the default tor socks port and 9150 is the Tor Browser Bundle
default socks port:
socks_ports_to_try = [9050, 9150]

What I would like to do is have the txsocksx tor client endpoint
optionally (if txtorcon is installed) check for a
txtorcon-launched-tor socks port and add it to the beginning of
`socks_ports_to_try`, so that it tries the txtorcon tor socks port
first. If that connection fails, it retries with the next port in the
list, and repeats until the `socks_ports_to_try` list is consumed.
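
A rough standard-library sketch of that ordering and retry logic. The
function names and the `launched_socks_port` parameter are
hypothetical, and the probe is a blocking stand-in: a real endpoint
would attempt the connection through Twisted without blocking and fall
through to the next port on failure.

```python
import socket

# Defaults mentioned above: system tor, then the Tor Browser Bundle.
DEFAULT_SOCKS_PORTS = [9050, 9150]


def build_socks_ports_to_try(launched_socks_port=None):
    """Return candidate ports, with the txtorcon-launched port first.

    `launched_socks_port` is assumed to be supplied by txtorcon when it
    has launched a tor; None means no such tor is known.
    """
    ports = list(DEFAULT_SOCKS_PORTS)
    if launched_socks_port is not None:
        if launched_socks_port in ports:
            ports.remove(launched_socks_port)  # avoid trying it twice
        ports.insert(0, launched_socks_port)
    return ports


def port_accepts_connections(port, host="127.0.0.1", timeout=1.0):
    """Blocking probe: True if something accepts TCP on host:port.

    This only shows that the port is open, not that it speaks SOCKS.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_usable_port(ports, probe=port_accepts_connections):
    """Try each port in order; return the first accepted one, else None."""
    for port in ports:
        if probe(port):
            return port
    return None
```

For example, `build_socks_ports_to_try(9999)` gives
`[9999, 9050, 9150]`, and `first_usable_port` walks that list until a
probe succeeds or the list is consumed.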

This should be good enough. Users (developers using txsocksx and
txtorcon) have 3 options:
1. launch their own tor proc by calling your launch_tor API directly
2. launch tor by calling the `listen` method on a Tor Hidden Service endpoint object
3. let their system tor be used, if it is listening on the default ports

In cases 1 and 2 the launching of tor could cause a txtorcon module
attribute to record the socks port and control port. These port
numbers can then be used later by txsocksx tor client endpoints and
txtorcon tor hidden service endpoints. Some applications may want many
outgoing tor connections, some may want many hidden services and some
may want both. In this way we can accommodate all these uses
efficiently.

> One thing I need to bear in mind a little as "the library author" is
> that I know some people will want a "private tor no matter what" API
> (which is of course still possible; just use txtorcon.launch_tor).

Yes... you are right that it should have this option.

> Perhaps providing a txtorcon API for "tell me all the tors you have
> launched so far" would be a good thing. This would also be useful for
> use-cases where you launch lots of tor instances (e.g. things like
> Chutney, which launched a tor test network).

Yeah I agree that txtorcon should accommodate this use-case.
Let's try to clearly work out a good way to do this while still
implementing the above mentioned logic for also ensuring we have only
one instance of tor per python process. I think one instance of Tor
per python process should be the simple default way...

> I think we've already rejected overloading controlPort= for two
> different use-cases, right?

Yeah... Tahoe-LAFS and Foolscap will never need to mess with the
controlPort. Most uses of the endpoint class will never need to set
the control port, but someone out there might want the ability to set
it, so we should give it to them while keeping our sane defaults.

The tor hidden service endpoint just needs to either use the system
tor's controlPort or launch its own tor proc with a control port of
its choosing. It can choose the control port by using the
user-specified control port or, if none is specified, by selecting an
unused TCP port to listen on.
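
One common way to pick an unused TCP port is to bind to port 0 and
read back what the OS assigned. This is only a sketch, and
`pick_unused_tcp_port` is a hypothetical helper name, not an existing
txtorcon function.

```python
import socket


def pick_unused_tcp_port(host="127.0.0.1"):
    """Ask the OS for a currently free TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (host, port) for an AF_INET socket.
        return sock.getsockname()[1]
```

The socket is closed before tor gets to bind the port, so another
process could grab the port in between; a caller would need to handle
tor failing to start on it.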

> So how about somthing like this:
>
> 1. if controlPort is specified, and we can connect to it, great: we use
>    that Tor (if we can't connect, it's an error and things fail)

Precisely.

> 2. if controlPort is NOT specified, we do some "maybe auto launch"
>    logic:
>
>     a. if our list of launched tors isn't empty, we use the first one
>     b. if there's no launched tors yet, we launch one (and remember it)

Yes. Correct.
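
The two cases agreed on above can be sketched roughly as follows.
`TorRegistry`, `get_tor`, and the injected `connect` and `launch`
callables are all hypothetical names for illustration; this version is
synchronous, whereas the real thing would return Deferreds.

```python
class TorRegistry:
    """Hypothetical per-process record of the tors we have launched."""

    def __init__(self, connect, launch):
        self._connect = connect  # callable(control_port) -> tor handle
        self._launch = launch    # callable() -> tor handle
        self.launched = []

    def get_tor(self, control_port=None):
        if control_port is not None:
            # Case 1: use exactly that tor. If the connection fails,
            # the exception propagates and nothing is launched.
            return self._connect(control_port)
        if self.launched:
            # Case 2a: re-use the first tor we launched earlier.
            return self.launched[0]
        # Case 2b: nothing launched yet, so launch one and remember it.
        tor = self._launch()
        self.launched.append(tor)
        return tor
```

Keeping the record in one object also gives a natural place to answer
"tell me all the tors you have launched so far".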

> I'm still a little tempted to do something with overloading controlPort=
> or adding another option so people get explicit control over whether a
> new tor is launched or not ("launch=auto" which would be the default, as
> above versus "launch=private" or "launch=never" maybe?)

Yes, I totally agree that there will be other weird use cases that
txtorcon should support. However, we should also keep in mind that
these other ways may break the beautiful programming interfaces and
patterns provided by Twisted. Therefore the "proper" Twisted way could
be the default, and the other stuff could be optional.

> [ Thanks for the explanation of your usecase :) ]

Sure thing! I know it is very different to the way most people are
using Twisted networking interfaces... but I believe this to be the
most "correct" way to use Twisted for a large complex application like
Tahoe-LAFS. Otherwise making Tahoe-LAFS Tor friendly would be much
more difficult.

> Also, thanks for your work on this!

Sure thing... it is my pleasure to be able to help out!

> I'm really glad we're getting this into txtorcon

Yes... and by the way: Thanks for writing txtorcon!

OK... sounds like we both have code to write.
Let's worry about resolving merge conflicts later.
Perhaps Leif will chime in with some comments or code later.

I'd like to think that we are no longer "bike-shedding"... that these
design decisions are actually really important because they affect
what use cases will be possible with the txtorcon endpoint.

;-)

Let's keep up the good work and write beautiful code and unit tests
for our design decisions!

Sincerely,

David