Introduction
============

There are two types of interfaces: stateful and stateless.

Stateful interfaces use persistent connections and send an init message after the
connection has been established. The values in this message are treated as defaults
which will be used if the corresponding value is missing in subsequent data-update
messages. In any case the values from data updates override values from init messages.
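
The default handling described above can be sketched as follows. This is only an
illustration, not the hub's actual code: the function name `apply_init_defaults` is
hypothetical, and the merge is assumed to be shallow (a nested object such as "stream" is
replaced as a whole when the data update contains it), which the text does not specify.

```python
def apply_init_defaults(init_msg: dict, update_msg: dict) -> dict:
    """Return the effective update: init values act as defaults, update values win."""
    effective = dict(init_msg)      # start with the defaults from the init message
    effective.update(update_msg)    # values from the data update override them
    return effective


init_msg = {
    "version": 2,
    "hostname": "myhostname",
    "stream": {"content": "av-orig", "format": "flash", "quality": "medium"},
}
update_msg = {
    "version": 2,
    "hostname": "otherhost",
    "start-time": "2014-08-03T12:34:56.123Z",
    "duration-ms": 5000,
    "data": {"client-count": 12, "bytes-sent": 921734098},
}

effective = apply_init_defaults(init_msg, update_msg)
# "hostname" comes from the update, "stream" is inherited from the init message.
```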

Stateless interfaces do not use persistent connections but are datagram oriented;
therefore all values must be defined in every data-update message.


Structure of data and meaning of data fields:
---------------------------------------------

Sources of data updates are called streamers. A streamer is identified by the hostname of
the machine it runs on, a content specifier (room1-audio, room2-av, audio-english, ...),
a format specifier (flash, webm, hls, dash, ...) and a quality specifier (high, low, ...).
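
As an illustration, the four identifying values can be collected into one hashable key.
This is a minimal sketch: the class `StreamerId` and the helper `streamer_id` are
hypothetical names, and the field names are taken from the example messages further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamerId:
    """The four values that together identify a streamer."""
    hostname: str
    content: str
    format: str
    quality: str


def streamer_id(message: dict) -> StreamerId:
    """Extract the streamer identity from a message that carries all four values."""
    stream = message["stream"]
    return StreamerId(
        hostname=message["hostname"],
        content=stream["content"],
        format=stream["format"],
        quality=stream["quality"],
    )


msg = {
    "hostname": "myhostname",
    "stream": {"content": "av-orig", "format": "flash", "quality": "medium"},
}
key = streamer_id(msg)
```

Because the dataclass is frozen, such a key can be used in a dictionary or set, for example
to group data updates per streamer.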

Any data update has a start time and a duration. Those two values specify the timespan
during which a source gathered the data. Both these values are processed and stored
with millisecond precision.
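
A small sketch of how the timespan could be derived from the two values, assuming the
"start-time" and "duration-ms" encoding shown in the data-update example further down. The
function name `timespan` is hypothetical.

```python
from datetime import datetime, timedelta


def timespan(start_time: str, duration_ms: int) -> tuple:
    """Return (start, end) of the interval covered by a data update."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat(),
    # so it is replaced by an explicit UTC offset first.
    start = datetime.fromisoformat(start_time.replace("Z", "+00:00"))
    end = start + timedelta(milliseconds=duration_ms)
    return start, end


start, end = timespan("2014-08-03T12:34:56.123Z", 5000)
print(end.isoformat(timespec="milliseconds"))  # 2014-08-03T12:35:01.123+00:00
```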

The actual data of the update consists of three aggregated values: client count, bytes sent
and bytes received.
Client count is the number of clients that are or have been connected for at least some
time within the timespan specified by start time and duration. Bytes sent is the overall
number of bytes sent by the source to all the clients combined. Bytes received is the number
of bytes that the source received from its stream producer to be sent out to the clients.
In an ideal world those three values have the following relation:

     bytes-sent = bytes-received * client-count
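
To illustrate the relation, the following sketch compares the reported bytes sent with the
ideal value. The helper names and the idea of expressing the deviation as a ratio are
assumptions made for this example; the protocol itself does not define such a metric.

```python
from typing import Optional


def ideal_bytes_sent(bytes_received: int, client_count: int) -> int:
    """bytes-sent as it would be if every client received every byte."""
    return bytes_received * client_count


def delivery_ratio(bytes_sent: int, bytes_received: int, client_count: int) -> Optional[float]:
    """Reported bytes sent relative to the ideal value, or None if the ideal is 0."""
    ideal = ideal_bytes_sent(bytes_received, client_count)
    if ideal == 0:
        return None
    return bytes_sent / ideal


# 12 clients, 1000 bytes received from the producer, 9000 bytes sent out in total.
ratio = delivery_ratio(bytes_sent=9000, bytes_received=1000, client_count=12)
print(ratio)  # 0.75
```

A ratio below 1 is expected in practice, for example when clients are connected for only
part of the timespan.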

In addition to the aggregated data, data updates may contain a list of all connected clients.
To be useful, a client entry must contain the IP address of the client as well as the
number of bytes sent to it. Client list entries may also contain the port and other
information such as user agent strings or GeoIP information.



Messages
========

init
----

{
  "version": 2,
  "SourceHubUuid": "f7df89b4-171e-4b2f-a8a4-e58ac99e5dc5",
  "SourceHubUpdateId": 23,
  "ForwardHubUuid": "b041315e-5039-4c75-81e8-9fd42250b011",
  "ForwardHubUpdateId": 42,
  "hostname": "myhostname",
  "stream": { "content": "av-orig", "format": "flash", "quality": "medium" },
  "tags": [ "elevate", "2014", "discourse" ]
}

All fields except "version" are optional.


data-update
-----------

{
  "version": 2,
  "SourceHubUuid": "f7df89b4-171e-4b2f-a8a4-e58ac99e5dc5",
  "SourceHubUpdateId": 23,
  "ForwardHubUuid": "b041315e-5039-4c75-81e8-9fd42250b011",
  "ForwardHubUpdateId": 42,
  "hostname": "myhostname",
  "stream": { "content": "av-orig", "format": "flash", "quality": "medium" },
  "tags": [ "elevate", "2014", "discourse" ],
  "start-time": "2014-08-03T12:34:56.123Z",
  "duration-ms": 5000,
  "data": {
    "clients": [
       { "ip": "127.0.0.1", "port": 1234, "bytes-sent": 12094, "user-agent": "Mozilla Version 28", .... },
        .....
    ],
    "client-count": 12,
    "bytes-received": 12345,
    "bytes-sent": 921734098,
     ....
  }
}

All values which have been defined by the init message are optional.
"SourceHubUuid", "SourceHubUpdateId", "ForwardHubUuid", "ForwardHubUpdateId", "tags",
"data.bytes-received" and "data.clients" may be omitted and are then treated as an empty
string, 0 or an empty array, depending on the type of the field. If "data.clients" is
present, the "port" and "user-agent" fields of its entries may be empty or missing. In this
case "data.client-count" and "data.bytes-sent" may also be 0 or omitted, as the hub
calculates those values from the contents of "data.clients" while ingesting the data.
In addition to the user-agent string, a client entry may have the following geo-info
fields (all of which may be omitted):

    "country"  .........  the name of the country
    "country-code2"  ...  the 2-letter country code
    "region"  ..........  the name of the region
    "region-code"  .....  the 2-letter code for the region as defined by the
                          MaxMind GeoIP2 database
    "city"  ............  the name of the city
    "latitude"  ........  latitude in degrees as float value
    "longitude"  .......  longitude in degrees as float value
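
The calculation of "data.client-count" and "data.bytes-sent" from "data.clients" mentioned
above might look roughly like this. It is a sketch under assumptions, not the hub's
implementation: the function name `fill_aggregates` is hypothetical, the client count is
assumed to be the number of list entries, and bytes sent is assumed to be the sum of the
per-client "bytes-sent" values.

```python
def fill_aggregates(data: dict) -> dict:
    """Return a copy of the "data" object with aggregates derived from "clients"."""
    result = dict(data)
    # Omitted "bytes-received" is treated as 0.
    result.setdefault("bytes-received", 0)
    clients = result.get("clients") or []
    if clients:
        # Assumption: one list entry per client, per-client byte counts add up.
        result["client-count"] = len(clients)
        result["bytes-sent"] = sum(c.get("bytes-sent", 0) for c in clients)
    return result


data = {
    "clients": [
        {"ip": "127.0.0.1", "port": 1234, "bytes-sent": 12094},
        {"ip": "127.0.0.2", "bytes-sent": 6, "country-code2": "AT"},
    ],
}
filled = fill_aggregates(data)
# filled["client-count"] is 2 and filled["bytes-sent"] is 12100
```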