added some notes and ideas

author: Christian Pointner <equinox@anytun.org> 2017-09-09 01:50:51 +0200
committer: Christian Pointner <equinox@anytun.org> 2017-09-09 01:50:51 +0200
commit: f501d1780e69821363045e427b8dcbc7351e0735 (patch)
tree: e918c71f633d199975afdc77fbb219ff510109b3
parent: added benchmarks for packet marshal and unmarshal (diff)
1 files changed, 176 insertions, 0 deletions
diff --git a/NOTES b/NOTES
new file mode 100644
index 0000000..e145c51
--- /dev/null
+++ b/NOTES
@@ -0,0 +1,176 @@
+Ideas and Notes from Brainstorming Sessions (2017-09-08)
+========================================================
+
+Protocol:
+~~~~~~~~~
+
+sender-id/mux:
+
+  We already discussed the possibility to split up the mux in order to have
+  support for link-local OOB messages. The downside is that this reduces the
+  number of concurrent virtual connections...
+
+  New Idea: don't sub-assign parts of mux but reduce sender-id to 12 bits. This
+  frees up 4 bits for additional signaling.
+  The new header would look like this:
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                         sequence number                       |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |X ? ? ?|       sender ID       |              MUX              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+   X .. key exchange flag or unencrypted flag?
+   ? .. reserved
+
+
+Key-Exchange (inline):
+
+  Idea: Key-Exchange daemons can communicate with the other side via link-local
+  IPv6 addresses (works with tun and tap, at least on Linux...)
+  If packets incoming on tun/tap interface are IPv6 and have a link-local source
+  or destination IP, messages are sent to the other side unencrypted and with
+  the X flag set.
+
+  Idea: use crypto role (server/client, left/right, alice/bob) for addressing
+
+  possible addressing scheme:
+
+     role(server,left,alice) -> fe80::xxxx:0:0:1:0/64
+     role(client,right,bob)  -> fe80::xxxx:0:0:0:<mux>/64
+
+      (xxxx is a well known number for SATP, i.e. it is always '5ADB')
+
+  Question: How to handle systems with IPv6 disabled? No inline Key-Exchange
+  support in that case? IPv4 link-local addresses only have a /16 range and we
+  would lose one mux value in that case (or 3 if we also omit network and
+  broadcast addresses -> not too bad...)
+
+  The advantage of using link-local addresses is that the key-exchange can use
+  TCP from the OS kernel, which is already resilient against packet duplication
+  and does retransmits -> very nice for RAIL-mode which will produce a lot of
+  duplicates and probably still has packet loss.
+  Possible downside is that not all programs/key-exchange daemons support
+  link-local addresses -> write proxy application for that case!
+
+  An anycast receiver will send a "redirect" message when it receives a packet
+  with the X flag set on its anycast address. This redirect will point to a
+  unicast address on the same host. This way key-exchanges can be sure they only
+  talk to a single host. For some key-exchanges it should be possible to send
+  early data with the initial packet and the "redirect" message to save some
+  round-trips.
+  E.g. IKEv2 needs two round trips to establish an SA. The first two messages can
+  be in the initial packet and the "redirect" message. The remaining 2 packets
+  will then be sent to the unicast address of the anycast host which guarantees
+  to reach an IKEv2 daemon which has already seen the first part of the
+  handshake.
+  Does this work together with the IPv6 Link-Local address idea from above?
+
+
+  Question: for the first key exchange it makes sense to update the remote
+  address in the SA even if the received packets are unauthenticated, but during
+  normal operation it is very bad to update the remote addresses, which are the
+  result of authenticated packets, in favor of unauthenticated info (aka packets
+  with X flag set).
+  Idea: have a separate address list for encrypted/authenticated packets and for
+  unauthenticated packets. If key exchange succeeds the addresses learned by it
+  are copied to the address list for encrypted packets.
+
+
+
+Golang Implementation:
+~~~~~~~~~~~~~~~~~~~~~~
+
+Packet Handling (Marshal/Unmarshal):
+
+  Encrypted- and PlainPacket have an internal buffer using fixed pre-allocated
+  memory. This might even be 64k (the UDP maximum size) because there won't be a
+  lot of them allocated at once (maximum one per NumCPU?!).
+  Header, Payload and Authtag of EncryptedPacket as well as Type and Payload of
+  PlainPacket are Go slices pointing to the underlying buffer. The Header of
+  EncryptedPacket and Type of PlainPacket have Getter and Setter which directly
+  encode/decode using BigEndian.(Put)?Uint(16|32). All of this shouldn't need any
+  mallocs and would therefore be pretty fast.
+
+  EncryptedPacket has a function DecryptAndVerify() which returns a PlainPacket.
+  PlainPacket has an EncryptAndAuthenticate() which returns an EncryptedPacket.
+  The implicit copy operations of the crypto functions are free because the
+  encrypt/decrypt process needs to read and write the memory anyway and it makes
+  no difference whether the destination is the same or some other memory area.
+  Conclusion: Any packet handling goroutine holds one EncryptedPacket and one
+  PlainPacket.
+
+  Idea: Have NumCPU goroutines for receiving and NumCPU goroutines for sending.
+
+    Receiving:    UDP   --> decrypt&verify --> tun/tap
+    Sending:    tun/tap -->  encrypt&auth  -->   UDP
+
+
+  Question: How can multiple goroutines listen to multiple UDP sockets while the
+  overall system allows only NumCPU packets to be handled at once?
+  And what about the NumCPU goroutines in the other direction?
+
+  different approach:
+    - one goroutine listening on all udp sockets + tun/tap using select()
+    - when the dispatcher goroutine wakes up it starts up to NumCPU goroutines
+      for all the sockets and tun/tap device ready for read.
+    - only if all the file descriptors returned by select() are assigned to
+      a running goroutine the dispatcher goroutine calls select() again.
+    - if a worker goroutine is done it returns its resources to the dispatcher's
+      pool (resources = EncryptedPacket + PlainPacket)
+    - number of available resources (aka packets) = NumCPU
+
+
+
+Security Assoc DB:
+
+  A map with mux as key with a single RW lock. Only if clients are added or
+  removed the writers lock needs to be acquired. Any other goroutine only needs
+  to acquire the readers lock. The values of the map have their own RW lock for
+  locking concurrent access to them.
+
+  The value struct contains:
+    - RW-mutex (see above)
+    - timestamp when the SA was generated/updated by key-exchange
+    - last sequence number used for outgoing packets
+    - a list of remote addresses, one for each socket (RAIL-mode)
+      possibly: a second list of remote addresses for unauthenticated packets
+    - a list of sequence windows, one for each sender-id (anycast cluster)
+    - the master key and salt and algo for the key derivation function
+    - the cipher and auth algo to use (might be the same -> AES-GCM)
+    - auth tag length
+
+  For sending goroutines the next sequence number to be used can be calculated
+  using AddUint32() from sync/atomic hence only the readers lock is required.
+  EncryptedPacket.DecryptAndVerify possibly needs to update the remote address(es)
+  after the packet is verified. In RAIL-mode this needs to be done regardless of
+  the packet being accepted by the sequence window. If RAIL-mode is off the remote
+  address should only be updated if the sequence window accepts the packet.
+
+  Question: the check if remote addresses need to be changed only needs the
+  readers lock but in case it differs the goroutine needs to release the readers
+  lock and acquire the writers lock. Is this a problem? Shall we acquire the
+  writers lock in any case?
+  For IPv4 addresses we could use sync/atomic CompareAndSwapUint32 but there is
+  no such thing for IPv6 aka 128bit values.
+  (And we would even need to include the port!)
+
+
+Sequence Window:
+
+  EncryptedPacket.DecryptAndVerify needs to check the sequence window which is a
+  compare and write operation.
+  Idea: Sequence window consists of one uint64 and a number of uint32 slices. The
+  first uint64 is split into a 32bit part for the current top sequence number
+  and 32 bit of flags. Each flag represents one sequence number (aligned to
+  multiples of the 32bit sequence number). Any subsequent 32bit value contains
+  flags for older packets.
+  The 64bit and all subsequent 32bit slices can be modified using functions from
+  sync/atomic. When the bitmaps need to be rotated (i.e. when the new sequence
+  number advances the window to the next 32bit boundary) the writers lock for
+  the window needs to be held. In any other cases the readers lock is enough and
+  the bit test & set ops are atomic. This minimizes the number of times the
+  writers lock is held to roughly 1 in 32 incoming packets for that
+  sequence-window (Note: there is one sequence-window per mux and sender-id).
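
The sequence window idea at the end of the notes could be sketched roughly as
follows. This is a minimal illustration under stated assumptions, not the
project's implementation: the names SeqWindow, NewSeqWindow and Check are
hypothetical, sequence number wrap-around is ignored, and the start of the top
block is kept in a plain field guarded by the RW lock instead of being packed
into a uint64 together with the newest flags. The locking pattern is the one the
notes describe: bit test & set under the readers lock, rotation under the
writers lock.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const blockBits = 32 // flags per bitmap word

// SeqWindow is a hypothetical replay window (assumption, not from the notes).
type SeqWindow struct {
	mu     sync.RWMutex
	base   uint32   // first sequence number of blocks[0], a multiple of 32
	blocks []uint32 // blocks[i] holds flags for base-32*i .. base-32*i+31
}

func NewSeqWindow(numBlocks int) *SeqWindow {
	return &SeqWindow{blocks: make([]uint32, numBlocks)}
}

// testAndSet atomically sets one bit and reports whether it was clear before.
func testAndSet(word *uint32, bit uint32) bool {
	mask := uint32(1) << bit
	for {
		old := atomic.LoadUint32(word)
		if old&mask != 0 {
			return false // already seen
		}
		if atomic.CompareAndSwapUint32(word, old, old|mask) {
			return true
		}
	}
}

// mark flags seq if it lies inside the current window. The caller must hold
// at least the readers lock. handled is false if the window must be rotated.
func (w *SeqWindow) mark(seq uint32) (accepted, handled bool) {
	blockStart := seq &^ (blockBits - 1)
	if blockStart > w.base {
		return false, false // ahead of the window
	}
	idx := (w.base - blockStart) / blockBits
	if idx >= uint32(len(w.blocks)) {
		return false, true // too old
	}
	return testAndSet(&w.blocks[idx], seq%blockBits), true
}

// Check reports whether seq is new and records it as seen.
func (w *SeqWindow) Check(seq uint32) bool {
	w.mu.RLock()
	accepted, handled := w.mark(seq)
	w.mu.RUnlock()
	if handled {
		return accepted
	}

	// Rotation needed: take the writers lock and re-check, because another
	// goroutine may have advanced the window in the meantime.
	w.mu.Lock()
	defer w.mu.Unlock()
	blockStart := seq &^ (blockBits - 1)
	if blockStart > w.base {
		shift := int((blockStart - w.base) / blockBits)
		for i := len(w.blocks) - 1; i >= 0; i-- {
			if i >= shift {
				w.blocks[i] = w.blocks[i-shift]
			} else {
				w.blocks[i] = 0
			}
		}
		w.base = blockStart
	}
	accepted, _ = w.mark(seq)
	return accepted
}

func main() {
	w := NewSeqWindow(2)
	for _, seq := range []uint32{7, 7, 40, 7, 8} {
		fmt.Printf("seq %d accepted: %v\n", seq, w.Check(seq))
	}
}
```

In this sketch the writers lock is only taken when a packet crosses into a new
32-number block, so most packets are handled with the readers lock and a single
compare-and-swap. One such window would exist per mux and sender-id.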