Following up on the “What’s Normal” diary from a couple of weeks ago, here is a new one: the size of connections. I am going to focus on the number of bytes transmitted.
First of all, how to get the data. I am again using my JSON-formatted Zeek logs to extract the raw data (this may be easier with NetFlow data):
zcat conn.*gz | jq 'select(.proto=="udp") | (.orig_ip_bytes+.resp_ip_bytes)' > /tmp/udpsize.txt
zcat conn.*gz | jq 'select(.proto=="tcp") | (.orig_ip_bytes+.resp_ip_bytes)' > /tmp/tcpsize.txt
Each file holds one byte count per connection, which is the input the datamash commands below expect.
For additional analysis, I use the “datamash” tool, available via apt on Debian Linux or via Homebrew on macOS.
datamash count 1 mean 1 median 1 min 1 max 1 < /tmp/tcpsize.txt
741776 81431.352935388 1275 0 829044805
datamash count 1 mean 1 median 1 min 1 max 1 < /tmp/udpsize.txt
1084957 10447.352885875 200 0 687501036
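If datamash isn't installed, the same five numbers can be computed with sort and awk. This is a sketch, not the method used above: `size_stats` is a hypothetical helper name, and it assumes one byte count per line, as in /tmp/tcpsize.txt. Like datamash, it reports the mean of the two middle values as the median when the count is even.

```shell
# size_stats: print "count mean median min max" for one number per line on stdin.
size_stats() {
  sort -n | awk '
    { v[NR] = $1 + 0; sum += $1 }
    END {
      if (NR == 0) exit 1
      # input is sorted, so min, max and median come straight from the array
      med = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
      print NR, sum / NR, med, v[1], v[NR]
    }'
}

# usage: size_stats < /tmp/tcpsize.txt
```

awk prints the mean with six significant digits by default, so it will look less precise than the datamash output.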
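Making this a bit more readable as a table (sizes in bytes, mean rounded):

Protocol   Connections   Mean     Median   Min   Max
TCP        741,776       81,431   1,275    0     829,044,805
UDP        1,084,957     10,447   200      0     687,501,036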
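Overall, this is what I expected. There are more UDP connections than TCP connections, and they tend to be smaller (a median of 200 bytes for UDP versus 1,275 bytes for TCP). Both protocols include some extremely large connections, with several hundred MBytes transferred (up to about 829 MBytes for TCP and 688 MBytes for UDP).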
Let’s visualize this quickly with gnuplot:
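The plot itself isn't reproduced here. Since sizes range from 0 bytes to several hundred MBytes, one way to prepare plot-ready data is to bin connections by order of magnitude. This sketch assumes one byte count per line (as in /tmp/tcpsize.txt); `digit_bins` is a hypothetical helper, and the binning is my choice, not necessarily what the original plot used. Bin d holds sizes with d decimal digits, so bin 2 covers 10 to 99 bytes and bin 1 also includes 0.

```shell
# digit_bins: count connections per order of magnitude.
# Reads one byte count per line on stdin and prints "digits count",
# where "digits" is the number of decimal digits in the byte count.
# Non-numeric lines (e.g. "null" from records without byte counts) are skipped.
digit_bins() {
  awk '$1 ~ /^[0-9]+$/ { n[length($1)]++ }
       END { for (d in n) print d, n[d] }' | sort -n
}

# usage: digit_bins < /tmp/tcpsize.txt > /tmp/tcpbins.txt
```

The two columns (bin, connection count) can then be drawn with gnuplot, for example using the `boxes` plot style with a logarithmic y axis.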
What is surprising is the large number of very small TCP connections. The raw data confirms this:
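The listing itself isn't reproduced here. A minimal way to produce one, assuming one byte count per line as in /tmp/tcpsize.txt (the helper name `size_histogram` is hypothetical), is to count each distinct size and sort by frequency:

```shell
# size_histogram: print "count size" for each distinct byte count on stdin,
# most frequent sizes first.
size_histogram() {
  sort -n | uniq -c | sort -rn
}

# usage: size_histogram < /tmp/tcpsize.txt | head
```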
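There are many TCP connections with just 44 or 60 bytes. In hindsight, this isn't surprising: these are likely incomplete connections (port scans?). 40 bytes is an IPv4 and TCP header with no options. 44 bytes adds a single 4-byte TCP option, like the maximum segment size (MSS). 60 bytes is consistent with a lone SYN carrying the full option set typical of Linux clients (MSS, SACK permitted, timestamps, NOP and window scale, 20 option bytes in total).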
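The header arithmetic can be checked with a small sketch. It assumes IPv4 without IP options and no payload; the TCP option lengths are the standard ones (MSS 4, SACK permitted 2, timestamps 10, window scale 3, NOP 1), and `syn_bytes` is a hypothetical helper.

```shell
# syn_bytes: size in bytes of an IPv4 TCP segment that carries no payload.
# Argument: number of TCP option bytes (0 to 40). The TCP header length
# must be a multiple of 4 bytes, so option bytes are padded up to 4.
syn_bytes() {
  # 20-byte IPv4 header + 20-byte TCP base header + padded options
  echo $(( 40 + (${1:-0} + 3) / 4 * 4 ))
}

syn_bytes 0                          # 40: no options
syn_bytes 4                          # 44: MSS only
syn_bytes $(( 4 + 2 + 10 + 1 + 3 ))  # 60: MSS, SACK permitted, timestamps, NOP, window scale
```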
So I probably should have excluded them as anomalies before computing the statistics.
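Excluding them is a one-line filter before running datamash again. The 60-byte cutoff is my assumption, not part of the original analysis; `drop_small` is a hypothetical helper that keeps only connections strictly larger than the cutoff.

```shell
# drop_small: keep only byte counts greater than a cutoff (default 60,
# an assumed threshold that removes header-only connections such as lone SYNs).
drop_small() {
  awk -v min="${1:-60}" '$1 + 0 > min + 0'
}

# usage: drop_small 60 < /tmp/tcpsize.txt | datamash count 1 mean 1 median 1 min 1 max 1
```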
Regarding the very large connections:
The largest TCP connection went to Wasabi, a cloud storage provider I use for backups. The largest UDP connection turned out to be a device using a VPN (as it should in this case). So nothing “bad”, just two more things ruled out and explained :).
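To find out where the largest connections went, the Zeek records can be sorted by size directly. A sketch, assuming the standard conn.log field names (proto, id.orig_h, id.resp_h, id.resp_p, orig_ip_bytes, resp_ip_bytes); `top_talkers` is a hypothetical helper, and missing byte counts are treated as 0.

```shell
# top_talkers: print the N largest connections (default 5) from Zeek
# conn.log JSON records on stdin, one per line, tab-separated:
# total_ip_bytes, proto, orig_h, resp_h, resp_p.
top_talkers() {
  jq -r '[(.orig_ip_bytes // 0) + (.resp_ip_bytes // 0),
          .proto, ."id.orig_h", ."id.resp_h", ."id.resp_p"]
         | map(tostring) | join("\t")' \
    | sort -n -r | head -n "${1:-5}"
}

# usage: zcat conn.*gz | top_talkers 3
```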
(c) SANS Internet Storm Center. https://isc.sans.edu Creative Commons Attribution-Noncommercial 3.0 United States License.