[tahoe-dev] [tahoe-lafs] #867: use ipv6

Mon Feb 18 14:25:49 UTC 2013

Randall Mason <clashthebunny at gmail.com> writes:

No worries on the delay - I realize there are more things to do than
hack on tahoe :-)

I've tried to trim to only what I'm replying to.

> What happens currently with 169.254.0.0/16 addresses in Tahoe?

I'm not 100% sure.  But it seems the way they work is that you only have
a 169.254.0.0/16 address on an interface when you don't have a global
address.  So no one would normally create an introducer like that, and
then one wouldn't be able to connect.  So in the
all-nodes-have-169.254-only case, it would work, but in the normal case
with a disconnected node, it wouldn't have any real effect.

(I have tried to use 169.254 on multiple interfaces at once.  This
doesn't really work, at least on NetBSD.)

> What about RFC1918 addresses?

I believe they are currently not treated specially.  But typically the
introducer will have a global address instead.  On the old pubgrid, we
saw nodes connect and advertise RFC1918 addresses, and I saw my client
trying to connect, and fail.   That was slightly annoying, but there was
no other way to get to that node, so it was just clutter in the
connection table and making me look bad to my ISP for sending them
packets to RFC1918 addresses, but it didn't stop anything from working
that could have worked.

> What about 127.0.0.0/8?  Are they deprioritized so that connections
> happen to them in the last case?

"tahoe create-introducer ." (and start) puts that in the furl, at the
end.  I think that's a bug, and maybe there should be a variable to use
127.0.0.1 instead of the global address.  I try to remember to remove
these.  But, 1) they are last and 2) connecting to that port on some
other machine will probably get a connection refused very fast.  So I
think this is messy, but it's not causing real trouble.

> What is the delay if the connection times out?

I think it's typically a minute or so.  On a NetBSD 5 system, it was 75s
(to try to open a TCP connection to an address that did not respond).

> Does Tahoe only connect in serial, as opposed to starting to open up x
> different connections and take the first one that connects?  Does
> Tahoe use the order of the address in the furl?  What's the current
> algorithm for IPv4 addresses and the justification for it?

I believe that the connections are tried in order, serially.
I would say that the current rule for putting v4 addresses in the
introducer furl from the node on create was not necessarily well thought
out, but has turned out to be ok enough not to get revisited.  If someone
really did have a rationale, I'd like to hear it.

The typical behavior now is to query for AAAA and A records, and to try
the AAAA records in sequence, and then the A (e.g., ssh does this).
There is some sort according to local prefixes, I think.  The introducer
has explicit addresses for v6, so it makes sense to try each address in
order.  The pro-IPv6 position would be to list the v6 ones first.
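
As a rough illustration of the ordering described above (v6 addresses
first, then v4, each tried serially), here is a minimal Python sketch.
The names order_hints and connect_serially and the timeout parameter are
hypothetical; they are not foolscap or Tahoe APIs, and connection hints
are assumed to be plain (host, port) pairs with literal addresses.

```python
import ipaddress
import socket

def order_hints(hints):
    """Return (host, port) hints with IPv6 literals first, then IPv4.

    The sort is stable, so the order within each family is preserved.
    """
    def family_rank(hint):
        host, _port = hint
        return 0 if ipaddress.ip_address(host).version == 6 else 1
    return sorted(hints, key=family_rank)

def connect_serially(hints, timeout=10.0):
    """Try each hint in order; return the first connected socket or None."""
    for host, port in order_hints(hints):
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError:
            continue  # refused, unreachable, or timed out: try the next hint
    return None
```

Passing an explicit timeout keeps an unresponsive address from holding
up the later ones for the full OS default (the 75s mentioned above).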

>> In the v6 specifications, the intent of link-local addresses (in
>> fe80::/10) is that they are only used for things like neighbor discovery
>> and routing protocols.
>
> This is true, but in [RFC4291](http://tools.ietf.org/html/rfc4291#page-11)
> Link-Local is for three cases, the two cited, and when no routers are
> present:
>
>    Link-Local addresses are designed to be used for addressing on a
>    single link for purposes such as automatic address configuration,
>    neighbor discovery, or when no routers are present.
>
> It is not a requirement that Link-Local addresses are used, but it is one
> of the purposes of Link-Local addresses.  My use only falls in the third
> category, but I think that may be enough.

That's true, but it's only on a single link, and when no routers are
present.  So I could see using them if there are no global addresses.
Sending them as data gets into the scopeid hair, though, and I don't
think it's worth it.

>>  When you use a link-local address, the address itself does not
>> specify to the host stack which interface to use.  That's why you
>> have the %wm0 or whatever showing up in the display representation,
>> which is based on the ifindex being inserted in one of the bytes of
>> the address.  To have the address be used by another host, it has to
>> not have that, and has to insert its own index.  So passing them
>> around is non-obvious; typically a routing protocol will just record
>> the other side's address and reuse it - but there it has its *own*
>> ifindex, specifying the interface the packet arrived on, rather than
>> the remote side's index.
>
> After a lot of research I get what you're saying.  I do filter off the
> ifindex that BSD includes in the ifconfig -a results so it looked familiar,
> but I didn't know how necessary it was on Linux.  Picturing a multihomed
> computer, say WiFi + Ethernet, there is no way to know WHICH fe80::
> interface we should use.

Note that your boxes with lo0, ethernet, and tunnel have 3 interfaces, 2
of which have reachable peers.

> Why is it that BSD and Windows don't seem to have
> a problem?  I ping6 fe80 addresses without an interface identifier on those
> two platforms and don't have a problem.  Even with multihomed devices.  Is
> this a deviation from the standards with Linux or some bizarre speed
> optimization?

That's odd.  It doesn't work that way on my NetBSD 6 box; you need the
scopeid.

> Why can't it ARP all interfaces?

It's not that it can't do NDP (v6 name for ARP :-) on all interfaces,
but rather that the packet has to be routed to an interface (since it
isn't multicast), and the address without a scopeid doesn't specify an
interface.

>> tahoe is designed primarily as a wide-area protocol; in the general
>> case all nodes are expected to be able to open a TCP connection to
>> the introducer, and then all clients are expected to open a
>> connection to all servers.  To do this sensibly, the introducer and
>> each server node need a globally-routable address.
>
> This is the main purpose for my work.  I think that the global nature
> of IPv6 and Tahoe are a match made in heaven.

True, but it's just an example of how the Internet has been broken
because of the notion of client-only systems created by NAT, and with v6
we can once again operate in the way that used to be normal.

> It will be a wonderful day when I can have Mobile IPv6 on my laptop and
> my phone and never think about which link to communicate over,
> multiple DNS entries for a single host, what network it's hooked up
> to, and which interface is actually connected right now.  At least
> right now it's light years ahead of NAT and port forwarding.  :-)

I have a dynamic tunnel for my notebook, which is not Mobile IPv6, but
it's easier and actually works today.

>> I should point out that my bias is that running tahoe only locally
>> doesn't make sense in the first place.  There are other filesystems
>> that deal with resilience against disk failure and some of them have
>> better performance.  The real point is to get resilience against
>> hazards that affect an entire site, so wide-area connectivity is IMHO
>> intrinsically tied to the main use case.
>
> This exactly.  If somebody thinks this is a replacement for ceph or
> RAID, they will be sorely disappointed with the speed and how it is a
> file store and not a file system.

See my other rants about the fuse interface :-)

>>   Other advantages are that they are not routed, so that they can be
>>   more "secret" than other addresses.  If you didn't want the world
>>   to know that you were using Tahoe, preferring more local over more
>>   remote addresses could be better.
>>
>> I think this is the usual security-through-obscurity issue, and I don't
>> think it makes sense.  Outsiders can no more tell that you are using
>> tahoe with global addresses on your ethernet than they can tell about
>> private addresses.
>
> True, but this isn't about security through obscurity, it's about security
> and while we're at it, why not obscurity as well.  There is nothing wrong
> with obscurity, until people think that it can replace security.  Security
> and obscurity is even better than just security.  For example I would never
> send unencrypted information about dissidents in an oppressive regime, but
> I'd be happy if I didn't have to spend the rest of my life in jail to not
> reveal the encryption keys.  If I can both protect the information, and
> conceal that I'm transmitting it, I'm better off.  There aren't causes
> worth giving one's life for...

OK, but I don't understand how using link-local on a private grid on a
LAN instead of global on that same LAN is any different.

>> I don't really understand your dual tunnel broker case.  With a
>> normal tunnel broker, you get a routable address for your end of the
>> tunnel (typically ::2 where the tunnel far end is ::1), and often an
>> entire /64.  So just using the routable addresses would work fine.
>> (I have 3 tunnels, one of which feeds a routed network of about 6
>> subnets carved up from a /48, used by my group of about 100 at work).
>
> It's a pretty far out case and I don't think I did a good job
> explaining it.  I would never design a network to work that way and
> I'm not that sad about punishing people who (ab)use IPv6 that way.
> This would also be a case where IPv4 should be used instead of IPv6,
> but the way other major IPv6 capable applications work, here's what
> would happen:
>
> Let's say that my site gives out real IPv4 addresses liberally, allows me
> to use tunnel brokers, but also forbids me running any DHCPv6 or router
> advertisements.  Because of this I need Hurricane Electric to give me two
> tunnels for two computers that I control: host1.mason.ch and host2.mason.ch.
> Google Chrome on host2 looks up host1.mason.ch.  It gets a few addresses,
> let's say I have in DNS the ipv6 tunnel address for home 2001:123::1/64 and
> my work address 2001:1234::1/64, and an IPv4 address of 1.2.3.4.  Google
> Chrome does connect() to all different IPv6 addresses first, then IPv4
> address, and then keeps the first connection it gets.  If it were on the
> same IPv6 network as host1, it would not matter which address of those it
> connected to, but it isn't.  It's on a separate network because I NEED two
> tunnels.  Chrome (and other applications) are set up to prefer IPv6
> connectivity.  In this case, I will end up communicating over two tunnels
> and thousands of miles to get to the computer 2 meters of Ethernet cable
> away.  It would be nice if we could prioritize IPv6 addresses on a local
> computer, but we would need the whole routing table locally.  The way
> Chrome gets around this is that it just takes the fastest response (which
> may not be the highest bandwidth response).  If I had included an fe80::
> address in my DNS, Google Chrome would connect to that first because it's
> the least latent link.  This is one idea behind including all possible
> addresses of the host in the furl.  If Tahoe will issue connects to each
> address in the list and choose the first response, or if Tahoe will issue
> connects to each in order of configuration, there is the possibility of
> getting the best behavior, and not just the most reliable.

I see.  But putting fe80:: in DNS means other people in other places will
try to connect to it as well, which is unhelpful.  I would argue that
putting an address that is not globally routable in DNS is wrong.

In your tunnel case, what I recommend is to set up another tunnel
between the two hosts and add routes pointing across the tunnel.  (Or you
can run ripng which will advertise (over link-local) prefixes and set up
routes.  But people that won't let you run rtadvd/dhcp6 probably won't
like that either.)  The point of this approach is that it works for all
programs with all normal behaviors, rather than having to do something
special at the application layer to work around bizarreness at the
network layer.

>>   If you bring up a host, or set of hosts, in an environment without
>>   a DHCP server, and no IPv6 router, and don't run Avahi/Bonjour the
>>   only address that you'll come up with is the fe80 address.  With them
>>   included, your tahoe cluster can be brought up and connected to
>>   without any configuration, without any infrastructure, it would even
>>   work with only a crossover cable.
>>
>> I really don't follow this.  You're saying that if you bring up 4
>> computers with no addressing, and then somehow figure out the
>> link-local address of the introducer, and put it in a furl and config
>> files, and then run tahoe, that this is somehow better than manually
>> configuring 4 addresses (which then makes all sorts of other things
>> easier)?  This is IMHO a degenerate case, and it seems odd to want to
>> add complexity to tahoe to support it.  The furl is already a
>> capability, so auto-discovery seems inconsistent with tahoe's
>> security goals.
>
> Link-local addresses can either be randomly generated, or deterministically
> generated, or specifically assigned.  If there is no router on your
> network, and you want to run IPv6, you are only able to use FE80::
> addresses.

Per spec, maybe, but you can grab a global prefix (from a /48 you own)
and then statically configure addresses from it.  I think this is a
fringe case that is not of actual interest.

> With all the Ubuntu/Debian boxen that I have and the two OS X
> boxes that my wife owns, they are always fe80:2::0 & MAC address.  I don't
> know about internal representations of this in memory, but ifconfig -a is
> where I see it.  I don't ever need to figure out the Link-Local addresses
> of my computers because I know their mac addresses.

This is why I meant to suggest that letting someone manually put an
fe80:: address in an introducer furl, and including fe80:: addresses
when connecting to such an introducer if there are no global addresses,
or something like that, could be reasonable.  But it will be more
complicated because of the scopeid issue, and I think it's a distraction
from support of useful tahoe configurations.

>> A further complexity is that when you look at your link-local address on
>> the introducer, you'll see (assuming a:b:c:d:e:f ethernet address)
>> something that looks like
>>   fe80::a:b:c:d:e:f%wm0
>> But if you grab that out of buffer, it will look something like
>>
>>   fe80::2:a:b:c:d:e:f
>>
>> assuming 2 is the ifindex of wm0.  I don't remember which byte is used,
>> but I did figure this out in NetBSD recently.  (If my job weren't
>> developing new network protocols (some of which do use link-local
>> addresses), I can't imagine I would have dug into this.)  This is a
>> local OS decision how to represent the interface.  BSD does this (as
>> implemented by KAME), and I am 95% sure Linux does essentially the same
>> thing.  So to make this work, the server has to send the fe80:: address
>> without the interface ID to an introducer which is on the *same link*,
>> and the introducer can then send it to clients which are also on the same
>> link.
>
> As I said above, I only see the second address everywhere I've looked.  Is
> this due to an old KAME version in OS X?  And as I said above I also don't
> know what buffer it would be grabbed out of.

I meant if you do the ioctl to list addresses, and look at the bytes
instead of using the function that does the scopeid translation.

On a mac with 10.7, I see from ifconfig -a:

  inet6 fe80::aaaa:bbbb:cccc:dddd%en1 prefixlen 64 scopeid 0x4 

and ping6 to fe80:4::aaaa:bbbb:cccc:dddd not only answers but prints as
above, with %en1 *and the 4 removed in the print representation*.
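
To make the two representations concrete, here is a small Python sketch
that converts the embedded form (such as the fe80:4:: address above)
into a bare address plus a numeric scope id.  The function name is
hypothetical, and the assumption that the interface index sits in bytes
2 and 3 (the second 16-bit word) applies to KAME-derived stacks only.

```python
import ipaddress

def split_kame_embedded(addr):
    """Split a KAME-embedded link-local address into (bare_address, ifindex).

    Assumes the interface index occupies bytes 2-3 of the address, which
    are zero in a link-local address as it appears on the wire.
    """
    raw = bytearray(ipaddress.IPv6Address(addr).packed)
    if not (raw[0] == 0xfe and (raw[1] & 0xc0) == 0x80):
        raise ValueError("not a link-local address: %s" % addr)
    ifindex = (raw[2] << 8) | raw[3]
    raw[2] = raw[3] = 0  # clear the embedded index
    return str(ipaddress.IPv6Address(bytes(raw))), ifindex
```

For the address above, this yields the bare fe80::aaaa:bbbb:cccc:dddd
and the index 4, which matches "scopeid 0x4" in the ifconfig output.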

>> But, how does the client know which interface to use to contact the
>> introducer?  So you need the client to take the actual LL address and
>> then add a e.g. %wm0 scopeid when they configure their client.
>>
> Strangely, on Linux, scopeid is required, on Windows and OS X, it just
> works.  Ping6 fe80:: addresses to your heart's desire.

It doesn't work on my mac; with the fe80:: address for en1 (my own,
therefore), without the scopeid, I get "no route to host".

>> So if you really want to use fe80:: addresses, I would say the following
>> should let you do everything which will actually work, and avoid
>> cluttering everyone else with them:
>>
>>   Users have to configure each client and server node with a LL address
>>   for the introducer, and put a %intfN scopeid on it.  Of course the
>>   introducer has to share a link with each node.
>>
>>   Nodes contact the introducer, and send their bare (no scopeid) LL
>>   address.  The introducer keeps track of the incoming scopeid (it could
>>   have multiple interfaces), and applies it.  When sending addresses to
>>   a node, it checks that the scopeid matches the interface over which
>>   that node connected, and if so sends the bare LL address, and if not
>>   does not (because the two nodes are on different interfaces and
>>   therefore different links and thus will not interoperate).
>
> This does seem like what would be needed.  The other option would be to run
> a connect() with each interface that's up as a source interface.  A little
> less scientific, but maybe functional.  Is scopeid somehow global?

I don't think it's good practice to try to connect to the wrong place.
This will result in spurious NDP packets on multiple interfaces, and I
put that in the mess category.
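
For comparison, the introducer-side bookkeeping proposed in the quoted
text (record the scopeid each node arrived on, and hand out bare LL
addresses only to nodes on the same interface) could be sketched in
Python roughly as follows.  The class and method names are hypothetical
and nothing like this exists in Tahoe or foolscap; it is only meant to
show how little state is involved.

```python
class LinkLocalRegistry:
    """Track bare link-local addresses by the interface they arrived on."""

    def __init__(self):
        self._nodes = {}  # node_id -> (bare_ll_address, incoming_scope_id)

    def announce(self, node_id, bare_address, incoming_scope_id):
        # The scope id comes from the introducer's own socket, never from
        # the peer: a remote ifindex is meaningless on this host.
        self._nodes[node_id] = (bare_address, incoming_scope_id)

    def addresses_for(self, requester_id):
        """Return bare LL addresses of other nodes on the requester's link."""
        _addr, scope = self._nodes[requester_id]
        return sorted(
            addr
            for node_id, (addr, node_scope) in self._nodes.items()
            if node_id != requester_id and node_scope == scope
        )
```

Each client still has to attach its own scopeid to whatever bare
address it receives before it can connect.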

> How does OS X and Windows get around not using it?

My theory is that they don't get around it.  Or on windows, there is
only one interface on Windows, and id 0 is "the interface".

The fact that handling LL seems to be different on different systems is
a further reason to avoid it (in terms of balancing complexity of doing
it right and gain).

> One of the reasons I'm pushing back on this is that it seems to kludge the
> code to just start filtering out addresses in a big if then statement.  The
> fe80 block is not a 4-bit-aligned block, so it's not even a simple if
> addr[:4] == statement.

It's not a kludge; it's how it is supposed to work.  In (NetBSD/KAME)
netinet6/in6.h:

  #define IN6_IS_ADDR_LINKLOCAL(a)	\
          (((a)->s6_addr[0] == 0xfe) && (((a)->s6_addr[1] & 0xc0) == 0x80))

There's also SITELOCAL, but that has been deprecated.
So there's no big if/switch, it's just take all the configured
addresses, and omit the LL ones.
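
A Python rendering of the same test, as a minimal sketch: the function
names are hypothetical, and the addresses are assumed to be bare
literals without a %scopeid suffix.  The standard ipaddress module
already provides is_link_local, which covers fe80::/10 and also
169.254.0.0/16 for v4.

```python
import ipaddress

def is_link_local_v6(addr):
    """Python equivalent of IN6_IS_ADDR_LINKLOCAL: match fe80::/10."""
    raw = ipaddress.IPv6Address(addr).packed
    return raw[0] == 0xfe and (raw[1] & 0xc0) == 0x80

def omit_link_local(addresses):
    """Return the configured addresses with link-local ones removed."""
    return [a for a in addresses if not ipaddress.ip_address(a).is_link_local]
```

So the filter is one predicate applied to the address list, not a
string comparison on the first four characters.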

> It also, architecturally, may not be at the Tahoe
> level that this should be filtered out.  It could be better put in
> foolscap.  The whole point of foolscap is to abstract away networking into
> a simple RPC statement.  That's one reason why adding so little to Tahoe
> got some functionality.

That's a fair point.  Nothing in foolscap has any business using LL
addresses, so the foolscap command to "get me all the addresses" should
omit them.  Then tahoe can ignore this.

> Would foolscap be able to hide this away for all hosts that don't just work
> with link local addresses?  Is it possible to just have foolscap figure out
> if Tahoe (or any foolscap service) is on any interface with that Link-Local
> address?

I'm really not sure.  But I think it does not make sense for any service
which is not *intrinsically limited to one link* to ever use LL
addresses.

> I think I now understand why getaddrinfo returns a 4-tuple for IPv6 when
> it just spit out a 2-tuple for IPv4.  The documentation says that "Note,
> however, omission of *scopeid* can cause problems in manipulating scoped
> IPv6 addresses."  Such a small note, such a big issue.

I presume you mean the python-wrapped version of the libc
getaddrinfo(3).  In the modern world, I think scopeids are limited to
LL.
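
For reference, the Python IPv6 socket address is the 4-tuple (host,
port, flowinfo, scope_id).  Here is a minimal sketch of building one
from a textual address with an optional %zone suffix.  The function
name and the resolve_zone parameter are hypothetical; by default an
interface name is resolved with socket.if_nametoindex on the local host.

```python
import socket

def scoped_sockaddr(host, port, resolve_zone=None):
    """Build the (host, port, flowinfo, scope_id) tuple for an IPv6 address.

    `host` may carry a zone such as "fe80::1%wm0" or "fe80::1%4".  A
    numeric zone is used directly; a name is resolved on the local host.
    """
    address, _sep, zone = host.partition("%")
    if not zone:
        scope_id = 0
    elif zone.isdigit():
        scope_id = int(zone)
    else:
        resolve = resolve_zone or socket.if_nametoindex
        scope_id = resolve(zone)
    return (address, port, 0, scope_id)
```

A global address comes out with scope_id 0, which is why the extra
fields can usually be ignored outside the link-local case.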
