author    | Christian Pointner <equinox@anytun.org> | 2017-09-09 01:50:51 +0200
committer | Christian Pointner <equinox@anytun.org> | 2017-09-09 01:50:51 +0200
commit    | f501d1780e69821363045e427b8dcbc7351e0735
tree      | e918c71f633d199975afdc77fbb219ff510109b3 /NOTES
parent    | added benchmarks for packet marshal and unmarshal
added some notes and ideas
Diffstat (limited to 'NOTES')
-rw-r--r-- | NOTES | 176 |
1 files changed, 176 insertions, 0 deletions
Ideas and Notes from Brainstorming Sessions (2017-09-08)
========================================================

Protocol:
~~~~~~~~~

sender-id/mux:

  We already discussed the possibility of splitting up the mux in order to
  support link-local OOB messages. The downside is that this reduces the
  number of concurrent virtual connections...

  New idea: don't sub-assign parts of the mux but reduce the sender-id to 12
  bits. This frees up 4 bits for additional signaling.
  The new header would look like this:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        sequence number                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |X ? ? ?|       sender ID       |              MUX              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    X .. key exchange flag or unencrypted flag?
    ? .. reserved


Key-Exchange (inline):

  Idea: key-exchange daemons can communicate with the other side via
  link-local IPv6 addresses (works with tun and tap, at least on Linux...).
  If packets arriving on the tun/tap interface are IPv6 and have a link-local
  source or destination IP, they are sent to the other side unencrypted and
  with the X flag set.

  Idea: use the crypto role (server/client, left/right, alice/bob) for
  addressing.

  Possible addressing scheme:

    role(server,left,alice) -> fe80::xxxx:0:0:1:0/64
    role(client,right,bob)  -> fe80::xxxx:0:0:0:<mux>/64

    (xxxx is a well-known number for SATP, i.e. always '5ADB')

  Question: how to handle systems with IPv6 disabled? No inline key-exchange
  support in that case? IPv4 link-local addresses only have a /16 range
  (169.254.0.0/16) and we would lose one mux value in that case (or 3 if we
  also omit the network and broadcast addresses -> not too bad...)
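  The header layout and the addressing scheme above can be sketched in Go. This is a minimal sketch under assumptions, not the anytun code: it assumes X is the most significant bit of the second 32-bit word and that the three reserved bits stay zero, and `header`, `linkLocalAddr` and the constants are hypothetical names. The accessors decode and encode in place with `encoding/binary`, so they need no allocations. The two addresses use the well-known number 5ADB from the notes.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

const (
	headerLen    = 8      // 32-bit sequence number + 16-bit flags/sender ID + 16-bit MUX
	flagX        = 0x8000 // assumed: X is the most significant bit of the second word
	senderIDMask = 0x0FFF // lower 12 bits of the second word hold the sender ID
)

// header is a hypothetical view onto the first headerLen bytes of a packet
// buffer. All accessors decode/encode in place, so they do not allocate.
type header []byte

func (h header) Seq() uint32      { return binary.BigEndian.Uint32(h[0:4]) }
func (h header) SetSeq(s uint32)  { binary.BigEndian.PutUint32(h[0:4], s) }
func (h header) X() bool          { return binary.BigEndian.Uint16(h[4:6])&flagX != 0 }
func (h header) SenderID() uint16 { return binary.BigEndian.Uint16(h[4:6]) & senderIDMask }
func (h header) Mux() uint16      { return binary.BigEndian.Uint16(h[6:8]) }
func (h header) SetMux(m uint16)  { binary.BigEndian.PutUint16(h[6:8], m) }

// SetSenderID replaces the lower 12 bits and keeps the four flag bits.
func (h header) SetSenderID(id uint16) {
	w := binary.BigEndian.Uint16(h[4:6])
	binary.BigEndian.PutUint16(h[4:6], (w&^senderIDMask)|(id&senderIDMask))
}

// SetX sets or clears the X flag without touching the sender ID.
func (h header) SetX(x bool) {
	w := binary.BigEndian.Uint16(h[4:6])
	if x {
		w |= flagX
	} else {
		w &^= flagX
	}
	binary.BigEndian.PutUint16(h[4:6], w)
}

// linkLocalAddr builds the proposed address fe80::5adb:0:0:1:0 for the
// server role and fe80::5adb:0:0:0:<mux> for the client role.
func linkLocalAddr(server bool, mux uint16) net.IP {
	ip := make(net.IP, net.IPv6len)
	ip[0], ip[1] = 0xfe, 0x80 // link-local prefix
	ip[6], ip[7] = 0x5a, 0xdb // well-known SATP number in the 4th group
	if server {
		ip[13] = 0x01 // 7th group = 1, 8th group = 0
	} else {
		binary.BigEndian.PutUint16(ip[14:16], mux) // 8th group = mux
	}
	return ip
}

func main() {
	buf := make([]byte, headerLen)
	h := header(buf)
	h.SetSeq(1)
	h.SetSenderID(0x123)
	h.SetX(true)
	h.SetMux(42)
	fmt.Printf("% x\n", buf) // 00 00 00 01 81 23 00 2a
	fmt.Println(h.X(), h.SenderID(), h.Mux())
	fmt.Println(linkLocalAddr(true, 0), linkLocalAddr(false, 42))
}
```

  With X set and sender ID 0x123, the second word becomes 0x8123, so the flag bits and the 12-bit sender ID never overlap. `SetSenderID` masks its argument, which means an out-of-range ID is silently truncated. A real implementation might prefer to reject it.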

  The advantage of using link-local addresses is that the key-exchange can
  then use TCP from the OS kernel, which already copes with packet duplication
  and does retransmits -> very nice for RAIL-mode, which will produce a lot of
  duplicates and probably still has packet loss.
  A possible downside is that not all programs/key-exchange daemons support
  link-local addresses -> write a proxy application for that case!

  An anycast receiver will send a "redirect" message when it receives a packet
  with the X flag set on its anycast address. This redirect will point to a
  unicast address on the same host. This way key-exchanges can be sure they
  only talk to a single host. For some key-exchanges it should be possible to
  send early data with the initial packet and the "redirect" message to save
  some round-trips.
  E.g. IKEv2 needs two round trips to establish an SA. The first two messages
  can be carried in the initial packet and the "redirect" message. The
  remaining 2 packets will then be sent to the unicast address of the anycast
  host, which guarantees that they reach the IKEv2 daemon that has already
  seen the first part of the handshake.
  Does this work together with the IPv6 link-local address idea from above?


  Question: for the first key exchange it makes sense to update the remote
  address in the SA even if the received packets are unauthenticated, but
  during normal operation it is very bad to replace remote addresses learned
  from authenticated packets with unauthenticated info (aka packets with the
  X flag set).
  Idea: have separate address lists for encrypted/authenticated packets and
  for unauthenticated packets. If the key exchange succeeds, the addresses
  learned by it are copied to the address list for encrypted packets.



Golang Implementation:
~~~~~~~~~~~~~~~~~~~~~~

Packet Handling (Marshal/Unmarshal):

  EncryptedPacket and PlainPacket have an internal buffer using fixed,
  pre-allocated memory.
  This might even be 64k (the maximum UDP datagram size) because there won't
  be many of them allocated at once (at most one per CPU, i.e. NumCPU?!).
  Header, Payload and Authtag of EncryptedPacket as well as Type and Payload
  of PlainPacket are Go slices pointing into the underlying buffer. The Header
  of EncryptedPacket and the Type of PlainPacket have getters and setters
  which directly encode/decode using binary.BigEndian.(Put)?Uint(16|32). None
  of this should need any mallocs, so it should be pretty fast.

  EncryptedPacket has a function DecryptAndVerify() which returns a
  PlainPacket. PlainPacket has an EncryptAndAuthenticate() which returns an
  EncryptedPacket.
  The implicit copy operations of the crypto functions are free because the
  encrypt/decrypt process needs to read and write the memory anyway, and it
  makes no difference whether the destination is the same or some other
  memory area.
  Conclusion: any packet-handling goroutine holds one EncryptedPacket and one
  PlainPacket.

  Idea: have NumCPU goroutines for receiving and NumCPU goroutines for
  sending.

    Receiving: UDP --> decrypt&verify --> tun/tap
    Sending:   tun/tap --> encrypt&auth --> UDP


  Question: how can multiple goroutines listen on multiple UDP sockets while
  the overall system only allows NumCPU packets to be handled at once?
  And what about the NumCPU goroutines in the other direction?

  Different approach:
  - one goroutine listening on all UDP sockets + tun/tap using select()
  - when the dispatcher goroutine wakes up, it starts up to NumCPU goroutines
    for all the sockets and the tun/tap device that are ready for reading.
  - only once all the file descriptors returned by select() are assigned to
    a running goroutine does the dispatcher goroutine call select() again.
  - when a worker goroutine is done, it returns its resources to the
    dispatcher's pool (resources = EncryptedPacket + PlainPacket)
  - number of available resources (aka packets) = NumCPU



Security Assoc DB:

  A map with the mux as key and a single RW lock. Only when clients are added
  or removed does the writer lock need to be acquired. Any other goroutine
  only needs to acquire the reader lock. Each value in the map has its own RW
  lock protecting concurrent access to it.

  The value struct contains:
  - RW mutex (see above)
  - timestamp when the SA was generated/updated by the key-exchange
  - last sequence number used for outgoing packets
  - a list of remote addresses, one for each socket (RAIL-mode)
    possibly: a second list of remote addresses for unauthenticated packets
  - a list of sequence windows, one for each sender-id (anycast cluster)
  - the master key, salt and algorithm for the key derivation function
  - the cipher and auth algorithm to use (might be the same -> AES-GCM)
  - auth tag length

  For sending goroutines, the next sequence number to be used can be
  calculated using AddUint32() from sync/atomic, hence only the reader lock is
  required.
  EncryptedPacket.DecryptAndVerify possibly needs to update the remote
  address(es) after the packet is verified. In RAIL-mode this needs to be done
  regardless of whether the packet is accepted by the sequence window. If
  RAIL-mode is off, the remote address should only be updated if the sequence
  window accepts the packet.

  Question: checking whether the remote addresses need to be changed only
  needs the reader lock, but if they differ, the goroutine needs to release
  the reader lock and acquire the writer lock. Is this a problem? Shall we
  acquire the writer lock in any case?
  For IPv4 addresses we could use sync/atomic CompareAndSwapUint32, but there
  is no such thing for IPv6 aka 128-bit values.
  (And we would even need to include the port!)
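  A minimal sketch of the SA DB and of the separate address lists from the key-exchange notes, assuming Go 1.18+ for `net/netip`. `SA`, `SADB` and all method names are hypothetical, and only the sequence-number and address fields of the value struct are shown. `netip.AddrPort` is a comparable value that includes the port, so a change check is a single `==`. `UpdateRemote` shows one possible answer to the lock question: check under the reader lock and only take the writer lock when the address really changed. Since `sync.RWMutex` cannot be upgraded, the reader lock must be released first. Calling `Lock()` while the same goroutine still holds `RLock()` deadlocks.

```go
package main

import (
	"fmt"
	"net/netip"
	"sync"
	"sync/atomic"
)

// SA is a hypothetical security association value; only the fields needed
// for sequence numbers and remote addresses are shown.
type SA struct {
	mu      sync.RWMutex
	seq     uint32           // last sequence number used for outgoing packets
	remotes []netip.AddrPort // one per socket (RAIL-mode), from authenticated packets
	unauth  []netip.AddrPort // one per socket, from packets with the X flag set
}

// NewSA allocates both address lists for the given number of sockets.
func NewSA(sockets int) *SA {
	return &SA{
		remotes: make([]netip.AddrPort, sockets),
		unauth:  make([]netip.AddrPort, sockets),
	}
}

// SADB maps mux -> SA; its lock is only write-locked when clients come or go.
type SADB struct {
	mu  sync.RWMutex
	sas map[uint16]*SA
}

func (db *SADB) Add(mux uint16, sa *SA) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.sas[mux] = sa
}

func (db *SADB) Get(mux uint16) *SA {
	db.mu.RLock()
	defer db.mu.RUnlock()
	return db.sas[mux]
}

// NextSeq hands out sequence numbers without taking the SA lock at all.
func (sa *SA) NextSeq() uint32 { return atomic.AddUint32(&sa.seq, 1) }

// UpdateRemote is called after a packet on socket idx was verified. The
// common case (address unchanged) only takes the reader lock.
func (sa *SA) UpdateRemote(idx int, addr netip.AddrPort) {
	sa.mu.RLock()
	same := sa.remotes[idx] == addr
	sa.mu.RUnlock() // must be released before Lock(): RWMutex has no upgrade
	if same {
		return
	}
	sa.mu.Lock()
	sa.remotes[idx] = addr // another goroutine may have written in between; last writer wins
	sa.mu.Unlock()
}

// LearnUnauth records an address from an unauthenticated packet; it never
// touches the list used for encrypted packets.
func (sa *SA) LearnUnauth(idx int, addr netip.AddrPort) {
	sa.mu.Lock()
	sa.unauth[idx] = addr
	sa.mu.Unlock()
}

// PromoteUnauth copies the addresses learned during a successful key
// exchange into the list used for encrypted packets.
func (sa *SA) PromoteUnauth() {
	sa.mu.Lock()
	defer sa.mu.Unlock()
	for i, a := range sa.unauth {
		if a.IsValid() {
			sa.remotes[i] = a
		}
	}
}

func main() {
	db := &SADB{sas: make(map[uint16]*SA)}
	db.Add(42, NewSA(2))
	sa := db.Get(42)

	sa.LearnUnauth(0, netip.MustParseAddrPort("192.0.2.1:4444"))
	fmt.Println(sa.remotes[0].IsValid()) // false: not promoted yet
	sa.PromoteUnauth()                   // key exchange succeeded
	sa.UpdateRemote(1, netip.MustParseAddrPort("[2001:db8::1]:4444"))
	fmt.Println(sa.NextSeq(), sa.NextSeq(), sa.remotes)
}
```

  In the common case (address unchanged), a verified packet only ever takes reader locks. The writer lock is needed only when an address really moves or a key exchange promotes its addresses.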


Sequence Window:

  EncryptedPacket.DecryptAndVerify needs to check the sequence window, which
  is a compare-and-write operation.
  Idea: the sequence window consists of one uint64 and a slice of uint32
  values. The uint64 is split into a 32-bit part for the current top sequence
  number and 32 bits of flags. Each flag represents one sequence number (the
  flag block is aligned to multiples of 32 sequence numbers). Each subsequent
  uint32 value contains the flags for the next-older block of packets.
  The uint64 and all subsequent uint32 values can be modified using functions
  from sync/atomic. When the bitmaps need to be rotated (i.e. when the new
  sequence number advances the window past the next 32-bit boundary), the
  writer lock for the window needs to be held. In all other cases the reader
  lock is enough and the bit test & set ops are atomic. For in-order traffic
  this means the writer lock is taken for only about 1 in 32 incoming packets
  for that sequence window (note: there is one sequence window per mux and
  sender-id).
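  The sequence window idea can be sketched as follows. It is an illustration under assumptions, not a reference implementation. Sequence-number wrap-around is ignored, `Window`, `Accept`, `testAndSet` and `rotate` are hypothetical names, and the atomic bit test & set is done with a compare-and-swap loop. `Accept` takes only the reader lock unless the sequence number lands in a later 32-number block. After acquiring the writer lock, it re-checks, because another goroutine may already have rotated the window.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Window sketches the proposed sequence window. head holds the highest
// sequence number seen so far (upper 32 bits) and the flags of its 32-aligned
// block (lower 32 bits); older[i] holds the flags of the block i+1 blocks
// before that. Sequence number wrap-around is not handled.
type Window struct {
	head  uint64 // first field so 64-bit atomics stay aligned on 32-bit platforms
	mu    sync.RWMutex
	older []uint32
}

// NewWindow creates a window covering the current block plus olderBlocks
// blocks of 32 sequence numbers each.
func NewWindow(olderBlocks int) *Window {
	return &Window{older: make([]uint32, olderBlocks)}
}

// Accept reports whether seq was not seen before and marks it as seen.
func (w *Window) Accept(seq uint32) bool {
	w.mu.RLock()
	ok, needRotate := w.testAndSet(seq)
	w.mu.RUnlock()
	if !needRotate {
		return ok
	}
	// seq lies in a later block: rotating the bitmaps needs the writer lock.
	w.mu.Lock()
	defer w.mu.Unlock()
	// Re-check: another goroutine may have rotated while we waited.
	if ok, needRotate = w.testAndSet(seq); !needRotate {
		return ok
	}
	w.rotate(seq)
	return true
}

// testAndSet runs with at least the reader lock held; all writes are atomic
// compare-and-swap operations, so concurrent readers do not lose bits.
func (w *Window) testAndSet(seq uint32) (accepted, needRotate bool) {
	bit := uint32(1) << (seq & 31)
	for {
		head := atomic.LoadUint64(&w.head)
		top := uint32(head >> 32)
		switch {
		case seq>>5 > top>>5:
			return false, true
		case seq>>5 == top>>5:
			if uint32(head)&bit != 0 {
				return false, false // duplicate
			}
			newTop := top
			if seq > top {
				newTop = seq
			}
			next := uint64(newTop)<<32 | uint64(uint32(head)|bit)
			if atomic.CompareAndSwapUint64(&w.head, head, next) {
				return true, false
			}
			// CAS failed: another reader changed head, retry.
		default:
			dist := int((top >> 5) - (seq >> 5)) // >= 1
			if dist > len(w.older) {
				return false, false // too old
			}
			p := &w.older[dist-1]
			for {
				old := atomic.LoadUint32(p)
				if old&bit != 0 {
					return false, false // duplicate
				}
				if atomic.CompareAndSwapUint32(p, old, old|bit) {
					return true, false
				}
			}
		}
	}
}

// rotate advances the window so that seq becomes the new top. The caller must
// hold the writer lock, so plain writes to older are safe here.
func (w *Window) rotate(seq uint32) {
	head := atomic.LoadUint64(&w.head)
	k := int((seq >> 5) - (uint32(head>>32) >> 5)) // blocks to advance, >= 1
	for j := len(w.older) - 1; j >= 0; j-- {
		switch d := j + 1 - k; {
		case d == 0:
			w.older[j] = uint32(head) // flags of the old top block
		case d > 0:
			w.older[j] = w.older[d-1]
		default:
			w.older[j] = 0 // block lies between old and new top: nothing seen yet
		}
	}
	atomic.StoreUint64(&w.head, uint64(seq)<<32|uint64(uint32(1)<<(seq&31)))
}

func main() {
	w := NewWindow(3) // current block + 3 older blocks = 128 sequence numbers
	for _, s := range []uint32{0, 5, 5, 40, 5, 3, 200, 40} {
		fmt.Println(s, w.Accept(s))
	}
	// prints 0 true, 5 true, 5 false, 40 true, 5 false, 3 true, 200 true, 40 false (one per line)
}
```

  In the run above, 200 advances the window by five blocks, which clears every older bitmap, so the following 40 is rejected as too old.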