Winter Is Here: Comparing Methodologies for Sampling Game of Thrones 703 and 705


This is a map of unique peers generated from the second day of sampling Game of Thrones episode 703. Each unique peer appears as a very small colored dot, placed at the map location associated with the internet (IP) address where peer activity was measured at any point during the day. A peer that was active several times in the day, say for five minutes in the morning and again after work in the evening, is counted only once. A peer that was active for only ten minutes at lunch is also counted once.

That is, only if the swarm happens to be sampled at that particular moment. Alpha60 might be on a lunch break itself!
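The once-per-day counting rule amounts to taking the set of distinct addresses across every sampling pass that falls on that day. A minimal sketch, assuming each observation is a hypothetical (timestamp, IP address) pair; Alpha60's real record format is not described here, and the dates and addresses below are illustrative only:

```python
from datetime import datetime

# Hypothetical record layout: (time of the sampling pass, peer IP address).
Observation = tuple[datetime, str]


def unique_peers_for_day(observations: list[Observation], day: datetime) -> set[str]:
    """Return the distinct peer addresses seen on the given calendar day.

    A peer seen in several sampling passes (morning and evening, say)
    is counted once, because it lands in the set only once.
    """
    return {ip for ts, ip in observations if ts.date() == day.date()}


if __name__ == "__main__":
    obs = [
        (datetime(2017, 1, 2, 8, 0), "203.0.113.5"),    # morning session
        (datetime(2017, 1, 2, 12, 0), "198.51.100.7"),  # lunch-only peer
        (datetime(2017, 1, 2, 20, 0), "203.0.113.5"),   # same peer rejoins after work
        (datetime(2017, 1, 3, 0, 0), "192.0.2.9"),      # next day, excluded
    ]
    print(len(unique_peers_for_day(obs, datetime(2017, 1, 2))))  # 2
```

Note that a peer active only between two sampling passes never enters any observation, which is exactly the gap the lunch-break remark points at.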

Alpha60 has tried two different methods of sampling the swarm activity for Game of Thrones episodes during season seven, and is likely to try a third starting with episode 706 or 707.

This is the map of unique peers found during the second day of sampling Game of Thrones episode 705.

For episode 703, the swarm was sampled every 4 hours, implying that the reported peer count for this episode could be multiplied by four or more to approximate the real swarm. For episode 705, the swarm was sampled every 2 hours, implying that the reported count could be multiplied by roughly two (perhaps more, perhaps less).


703 with 4-hour sampling, peak day: 1.038M unique peers (4.12M estimated)

705 with 2-hour sampling, peak day: 2.321M unique peers (4.6M estimated)
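The multiply-by-the-interval rule can be written as a one-line heuristic. This sketch assumes the estimate scales linearly with the sampling interval in hours; the post does not spell out its exact formula, and the function name is hypothetical:

```python
def naive_swarm_estimate(observed_unique_peers: int, sampling_interval_hours: float) -> float:
    # Assumption: the real swarm scales linearly with the sampling interval,
    # as the "multiply by four" / "multiply by two" rule above suggests.
    # This is a back-of-the-envelope heuristic, not a statistical model.
    return observed_unique_peers * sampling_interval_hours


if __name__ == "__main__":
    # Peak-day unique-peer counts reported above.
    runs = [("703", 1_038_000, 4), ("705", 2_321_000, 2)]
    for episode, peers, interval_hours in runs:
        estimate = naive_swarm_estimate(peers, interval_hours)
        print(f"{episode}: {peers / 1e6:.3f}M observed, ~{estimate / 1e6:.2f}M estimated")
```

Because it ignores peer churn and overlap between passes, this rule only gives an order-of-magnitude figure.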

What is the true size and composition of the peer swarm? The answer remains unknown, for more reasons than the sampling issue described above. To estimate the peer swarm properly, a statistical model must be developed that accounts for both the sampling rate and the sampling duration, in order to derive how quickly the sampling methodology approaches saturation: the point where sampling more often stops revealing new peers.
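One standard family of models for this kind of problem, swapped in here for illustration rather than taken from Alpha60, is capture-recapture estimation. Chapman's bias-corrected form of the Lincoln-Petersen estimator uses two independent samples of sizes n1 and n2, with m peers appearing in both: N = (n1 + 1)(n2 + 1) / (m + 1) - 1. It assumes a closed population in which every peer is equally likely to be seen; a swarm with peers constantly joining and leaving only approximates that, so treat this as a starting point under those assumptions:

```python
def chapman_estimate(sample_a: set[str], sample_b: set[str]) -> float:
    """Chapman's bias-corrected Lincoln-Petersen population estimate.

    sample_a, sample_b: peer addresses seen in two independent sampling passes.
    Assumes a closed population and equal detectability, which a churning
    BitTorrent swarm only roughly satisfies.
    """
    n1, n2 = len(sample_a), len(sample_b)
    m = len(sample_a & sample_b)  # peers "recaptured" in both passes
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


if __name__ == "__main__":
    first_pass = {"p1", "p2", "p3", "p4"}
    second_pass = {"p3", "p4", "p5", "p6"}
    print(round(chapman_estimate(first_pass, second_pass), 2))  # 7.33
```

Low overlap between passes pushes the estimate up; passes that keep seeing the same peers pull it toward the observed count, which is the saturation signal the post is after.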

To build a more complete picture of the peer swarm, and to test whether sampling saturation has been reached, imagine these hypothetical results for 706 sampled every hour:

706 with 1-hour sampling, peak day: 4.5M (hypothetical)

If 706 with hourly sampling yields fewer than 4.6M peers, as in this hypothetical, then saturation has been reached, and statistical oversampling and subsampling methods can be devised to estimate the peer swarm more precisely and build a model of it. In that case there would be high confidence that the sampling is seeing most of the peer swarm. The hope is that hour-by-hour sampling will reach saturation.
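The decision rule above compares the hourly count against the naive estimate from the previous, coarser run. A minimal sketch, assuming that estimate is the previous count times its interval (which gives the 4.6M figure for 705); the function name is hypothetical:

```python
def hourly_run_saturated(prev_count: int, prev_interval_hours: float, hourly_count: int) -> bool:
    """Sketch of the saturation rule described above.

    The previous run's naive estimate is its observed count scaled by its
    sampling interval. If hourly sampling finds no more unique peers than
    that estimate, treat sampling as saturated; otherwise keep shortening.
    """
    prior_estimate = prev_count * prev_interval_hours
    return hourly_count <= prior_estimate


if __name__ == "__main__":
    # 705: 2.321M peers with 2-hour sampling; 706 at 4.5M is the hypothetical above.
    print(hourly_run_saturated(2_321_000, 2, 4_500_000))  # True
    print(hourly_run_saturated(2_321_000, 2, 5_000_000))  # False
```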

However, if 706 with hourly sampling yields more than 4.6M peers, then saturation has not been reached, and the race continues: the sampling duration must be shortened again, a duplicate scraper added, and the sampling interval cut to half an hour. At some point in this madness, the assumption is that adding another duplicate scraper will no longer increase the total number of unique peers, and that point marks saturation.
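The stopping condition for adding scrapers can be expressed as a marginal-gain check on sets of peer addresses. A minimal sketch, assuming each scraper's results for the day are available as a set of IP addresses; the function names and the `min_new_peers` tolerance are hypothetical:

```python
def marginal_gain(existing: set[str], duplicate: set[str]) -> int:
    """Number of peers the newest scraper found that earlier scrapers missed."""
    return len(duplicate - existing)


def scrapers_saturated(scraper_results: list[set[str]], min_new_peers: int = 0) -> bool:
    """True if the last scraper added at most min_new_peers new unique peers
    over the union of all scrapers before it."""
    if len(scraper_results) < 2:
        return False  # nothing to compare against yet
    seen_before = set().union(*scraper_results[:-1])
    return marginal_gain(seen_before, scraper_results[-1]) <= min_new_peers


if __name__ == "__main__":
    a = {"p1", "p2"}
    b = {"p2", "p3"}  # finds p3, which a missed
    c = {"p1", "p3"}  # finds nothing new
    print(scrapers_saturated([a, b]))     # False
    print(scrapers_saturated([a, b, c]))  # True
```

In practice a small nonzero `min_new_peers` may be more realistic than zero, since a live swarm keeps producing a trickle of new addresses.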